Coercive Mirroring

Request For Comment
      
Coercive Mirroring is a simple technique for coercing downloaders
into helping to mirror content in an automated and secure manner.


    The Pitch: 

Wouldn't it be great if some of the people who downloaded your
files would help mirror your content on an as-needed basis?

The only reason these folks don't is that the mechanisms are not 
in place to make it easy to manage mirroring. Manual mirror maintainers
must select which sites need to be mirrored,
establish a relationship with the source site to gain reliable 
access, and set up a script to keep the mirror up-to-date.
The result is that sites start with no mirrors, a wave of
downloads swamps their serving capacity, and the maintainer solicits
friends and neighbors to mirror bits and pieces of his site.
Mirrors get set up, then their bandwidth gets swamped, or not enough
traffic is redirected from the primary site, or traffic is redirected
but users have no automated way to test the authenticity of files from
the mirror sites.  The site maintainer doesn't have an automated way
of keeping track of his mirrors or of keeping them up-to-date.
What a big mess! I can't believe this works at all! 

Manual mirroring works on a time scale of weeks or months.
Today, when a site gets posted to Slashdot, traffic spikes through the
roof in 30 seconds. The maintainer has only minutes to respond,
and often ends up getting a huge bill or hitting a bandwidth cutoff.
The lucky ones have 'unlimited' bandwidth hosting services,
but even then it won't take long for the host to kindly suggest
you take your bandwidth-hogging data elsewhere.
I once mirrored a single file that had just been released.
It saturated an entire 10BaseT link for a solid week.
That was not my hosting provider's idea of 'unlimited'.
Mirrors need to be distributed if you want to serve millions.


Game theory suggests that one of the most effective ways to
encourage cooperation is to play Tit for Tat:
I'll be nice to you if you are nice to me.
If you don't keep your promises, I'll lock you out.

Coercive Mirroring is a means of automating a
request for mirroring and the corresponding tit-for-tat 
reward or possible punishment. It lets low-bandwidth 
servers deliver internet services in a scalable manner
to a greatly magnified number of clients while retaining 
central control over the content.

This protocol does *not* address the needs of
database-backed applications. These require more
complicated resources at the mirror and are 
beyond the scope of this document.

Coercive Mirroring delivers flat files and 
appropriately segmented streamed media.
The following two mechanisms are added to the
server side:

      
	RFM -- REQUEST_FOR_MIRROR
		Request that the client provide a local mirror after
                downloading the requested document.

	RTUM -- REDIRECT_TO_UNTRUSTED_MIRROR
		Forward the client on to an alternative source.
		Give them the data they need to test the
		integrity of the resulting file.
		Include a URL for reporting a bad cache.

	REQUEST_FOR_MIRROR
		REQUEST_URL	= HTTP://XXXX originally requested URL
		START_TIME	= GMT DATE/TIME of when mirroring will start
		STOP_TIME       = GMT DATE/TIME of when mirroring will end
                MAX_TRAFFIC     = maximum bytes of data expected to be served 
	        MAX_CLIENTS     = Number of clients expected to be redirected
		--- client specifies ---
		CACHE_URL	= [where the proxy will keep the files]
                --- optional ---
                CONFIRM_URL     = Web page to hit when content is ready
		FRAGMENT_LIST	= encrypted fragments that may need caching
	
	REDIRECT_TO_UNTRUSTED_MIRROR
                REQUEST_URL     = originally requested URL
		REDIRECT_URL	= url of proxied document HTTP://YYYY
	 	HASH		= cryptographic digest to check document integrity
	 			  (not a CRC, which a hostile mirror can easily forge)
	 	PUBLIC_KEY	= key to check document authenticity
		FEEDBACK_URL	= url used to report on deadbeats
                --- client specifies ---
                DOWNLOAD_TIME   = time it took to download from the mirror
		--- optional ---
		FRAGMENT_LIST	= list of URLs for encrypted fragments
		DECRYPTION_KEY	= key to decrypt assembled fragments
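	The field lists above do not yet fix a wire format. One possibility,
	assumed here rather than specified by this draft, is to carry each
	message as URL query parameters on an HTTP GET, the way the client
	answers a REQUEST_FOR_MIRROR in the conversations below. The class
	and method names are hypothetical; FRAGMENT_LIST is left out, and
	REDIRECT_TO_UNTRUSTED_MIRROR would follow the same pattern.

```python
from dataclasses import dataclass, fields
from typing import Optional
from urllib.parse import parse_qsl, urlencode


@dataclass
class RequestForMirror:
    """REQUEST_FOR_MIRROR terms as a hypothetical Python binding."""
    REQUEST_URL: str              # originally requested URL
    START_TIME: str               # GMT date/time when mirroring will start
    STOP_TIME: str                # GMT date/time when mirroring will end
    MAX_TRAFFIC: int              # max bytes the mirror is expected to serve
    MAX_CLIENTS: int              # clients expected to be redirected to it
    CACHE_URL: str = ""           # filled in by the client when it agrees
    CONFIRM_URL: Optional[str] = None   # optional: page to hit when content is ready

    def to_query(self) -> str:
        """Encode as a query string, skipping unset optional fields."""
        pairs = [(f.name, getattr(self, f.name)) for f in fields(self)
                 if getattr(self, f.name) is not None]
        return urlencode(pairs)

    @classmethod
    def from_query(cls, query: str) -> "RequestForMirror":
        """Decode a query string produced by to_query()."""
        data = dict(parse_qsl(query))
        data["MAX_TRAFFIC"] = int(data["MAX_TRAFFIC"])
        data["MAX_CLIENTS"] = int(data["MAX_CLIENTS"])
        return cls(**data)


# The client's acceptance from the first conversation below.
terms = RequestForMirror(
    REQUEST_URL="HTTP://S/SSSS",
    START_TIME="Jun 23 23:34:10 UTC 2000",
    STOP_TIME="Jun 24 23:34:10 UTC 2000",
    MAX_TRAFFIC=3 * 1024 * 1024,   # "3MB", read here as 3 MiB
    MAX_CLIENTS=10,
    CACHE_URL="HTTP://B/BBBB",
)
query = terms.to_query()
```

	A query string keeps the negotiation usable from an ordinary browser
	form, which matters while the client end of CM is still done by hand.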

	The server component of CM can be 
	implemented as an Apache module.
	Initially, the client end of CM can be
	done manually; however, CM really takes off
	when clients and caching proxies cooperate
	with CM automatically.

	Bandwidth may be rising, but in the wireless space
	it is very difficult to balance broadband with
	spectrum allocation and power constraints.

	Intelligent caching proxies are essential to
	delivering high-quality wireless services.
	Coercive mirroring is one way to accelerate
	the practical adoption of intelligent caching
	at the edges of the network.



A complete protocol specification is not yet available.
The following conversations are intended to give a
sense of how CM works in day-to-day use.
  

[B] Ordinary Web Browser, user has some space on a web server
[S] Coercive Mirroring Server 
[M] Employee in the mirroring business


  B->S 	Standard HTTP Request* for HTTP://S/SSSS
	(*) NOTE: Clients may pre-agree 
	    to participate in CM within the
	    HTTP request, eliminating the
	    next two steps of negotiation.

  B<-S 	ErrorCode: NNN Too Busy
	"I'm under too much load, however,
	you can have it if you agree to my terms."
	REQUEST_FOR_MIRROR
                action = HTTP://S/cgi-bin/RFM
		REQUEST_URL	= HTTP://S/SSSS
		START_TIME	= Jun 23 23:34:10 UTC 2000
                STOP_TIME       = Jun 24 23:34:10 UTC 2000
		MAX_TRAFFIC	= 3MB
		MAX_CLIENTS	= 10
		FRAGMENT_LIST	= [ not used ]
		--- client specifies ---
		CACHE_URL	= [ blank ]
        
  B->S 	"Yes, I agree to the following terms:"
	HTTP-GET: HTTP://S/cgi-bin/RFM
		REQUEST_URL	= HTTP://S/SSSS
		START_TIME	= Jun 23 23:34:10 UTC 2000
		STOP_TIME	= Jun 24 23:34:10 UTC 2000
		MAX_TRAFFIC	= 3MB
		MAX_CLIENTS	= 10
		--- client specifies ---
		CACHE_URL	= HTTP://B/BBBB

  B<-S 	"OK, now I'm giving you a redirect.
	 The data has a hash and is signed with a private key.
	 Send me feedback if the guy is a deadbeat and I'll 
	 get you another cache."
	REDIRECT_TO_UNTRUSTED_MIRROR
		action = HTTP://S/cgi-bin/FEEDBACK
		REQUEST_URL	= HTTP://S/SSSS
                REDIRECT_URL    = HTTP://M/MMMM
	 	HASH		= HHHH
	 	PUBLIC_KEY	= KKKK
		--- client specifies ---
		DOWNLOAD_TIME	= [ blank ]


  B->M	Standard HTTP request for HTTP://M/MMMM

  B<-M   HTTP... DATA

  B->S	HTTP-GET: HTTP://S/cgi-bin/FEEDBACK
		REQUEST_URL	= HTTP://S/SSSS
                REDIRECT_URL    = HTTP://M/MMMM
		HASH		= HHHH
		PUBLIC_KEY	= KKKK
		DOWNLOAD_TIME	= 36s

  B<-S	"OK, Thanks! This guy did well. I'll up his rankings."
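	The client's work between B<-M and the FEEDBACK request can be
	sketched as: time the download, check the bytes against HASH, and
	build the feedback GET. This assumes HASH is a hex SHA-256 digest and
	that feedback fields travel as query parameters; the function name and
	the VERIFIED field are hypothetical, and `fetch` stands in for a real
	HTTP client. The PUBLIC_KEY signature check needs a public-key library
	and is only marked by a comment.

```python
import hashlib
import time
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode


def fetch_and_report(redirect: dict,
                     fetch: Callable[[str], bytes],
                     clock: Callable[[], float] = time.monotonic
                     ) -> Tuple[Optional[bytes], str]:
    """Fetch from an untrusted mirror, check it, and build the feedback URL.

    `redirect` holds the REDIRECT_TO_UNTRUSTED_MIRROR fields.  Returns the
    data (None if it failed the check) and the URL to GET on FEEDBACK_URL.
    """
    start = clock()
    data = fetch(redirect["REDIRECT_URL"])
    elapsed = clock() - start

    # Integrity: HASH is assumed to be a hex SHA-256 digest of the document.
    ok = hashlib.sha256(data).hexdigest() == redirect["HASH"]
    # Authenticity (a signature checked against PUBLIC_KEY) would go here,
    # using a public-key library; it is not shown in this sketch.

    report = {
        "REQUEST_URL": redirect["REQUEST_URL"],
        "REDIRECT_URL": redirect["REDIRECT_URL"],
        "HASH": redirect["HASH"],
        "DOWNLOAD_TIME": f"{elapsed:.0f}s",
        "VERIFIED": "1" if ok else "0",   # hypothetical field, not in the draft
    }
    feedback = redirect["FEEDBACK_URL"] + "?" + urlencode(report)
    return (data if ok else None), feedback
```

	A client that gets None back would discard the data, send the
	feedback, and ask the CM-Server for another cache.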
 


	VOCABULARY

	    CM-Server
		A webserver extended with the special 
		ability to negotiate with users using
		Coercive Mirroring
	    Untrusted Mirror
		A client that has filled in a
		REQUEST_FOR_MIRROR and
		agreed to the specified terms;
		also called an employee.
	    REQUEST_FOR_MIRROR
		the form a client must fill out to
		participate in the Tit-for-Tat (TFT)
		protocol
	    REDIRECT_TO_UNTRUSTED_MIRROR
		forward a client to an untrusted
		mirror and give the client enough
		information to make his/her own
		decisions about data validity.


	STREAMING

	CM can let content creators publish their
	content at extremely low cost. Streaming
	media is more difficult because packets
	have a very short time-to-live.

	By breaking a stream into short, separately
	addressable files ("packets"), CM makes it
	practical to create streamed interfaces that
	scale as readily as flat files.

	Establish a starting URL for your stream:
	   HTTP://pehr.net/mystream/
	which redirects to the latest media packet
	   HTTP://pehr.net/mystream/2000-05-22-GMT1734.mp3

	Clients automatically fetch the newest mp3 as it
	becomes available, caching as they go. To conserve 
	space, the server may begin deleting mp3s older 
	than one hour.
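	The naming and cleanup rules above can be sketched as follows. The
	file-name pattern copies the example URL; one-minute segments (the
	name carries only HHMM) and the helper names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

SEGMENT_FORMAT = "%Y-%m-%d-GMT%H%M.mp3"   # e.g. 2000-05-22-GMT1734.mp3


def segment_name(t: datetime) -> str:
    """Name of the segment recorded at minute t (t should be timezone-aware)."""
    return t.astimezone(timezone.utc).strftime(SEGMENT_FORMAT)


def segment_time(name: str) -> datetime:
    """Recover the UTC recording time from a segment name."""
    return datetime.strptime(name, SEGMENT_FORMAT).replace(tzinfo=timezone.utc)


def latest_segment(names: Iterable[str]) -> str:
    """The segment the stream's starting URL should redirect to."""
    return max(names, key=segment_time)


def prune(names: Iterable[str], now: datetime,
          keep: timedelta = timedelta(hours=1)) -> List[str]:
    """Segments the server keeps; anything older than `keep` may be deleted."""
    return [n for n in names if now - segment_time(n) <= keep]
```

	Because the time is in the name, a client or an employee can tell
	which segment is newest without asking the CM-Server.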

	Playback at the client lags the source by recording time
	+ encoding time + download time.  Download time
	may vary considerably as employees take in 
	and serve back data.  

	The client will be jockeying for 
	employee positions with other clients. 
	START_TIME becomes very important.
	A smart CM-Server will give priority to 
	clients with the best latency track record.

	An employee position is desirable because
	it means that the data is delivered sooner
	and more reliably, but it requires the client
	to *perform* to keep their employee status.
	If an employee does a bad job delivering content,
	they must go to the back of the line with
	the rest of the clients.
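	One way a CM-Server might keep that line is sketched below: rank
	employees by the average DOWNLOAD_TIME clients report, and drop an
	employee's record entirely when it is fired as a deadbeat. The class
	names and the plain-average ranking are assumptions, not part of the
	draft.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrackRecord:
    total_seconds: float = 0.0   # sum of reported DOWNLOAD_TIME values
    reports: int = 0

    def average(self) -> float:
        # No reports yet: rank behind everyone who has a record.
        return self.total_seconds / self.reports if self.reports else float("inf")


class EmployeePool:
    """Hypothetical server-side ranking of employees by latency track record."""

    def __init__(self) -> None:
        self.records: Dict[str, TrackRecord] = {}

    def hire(self, cache_url: str) -> None:
        self.records.setdefault(cache_url, TrackRecord())

    def report(self, cache_url: str, download_seconds: float) -> None:
        rec = self.records.get(cache_url)
        if rec is not None:              # ignore reports about non-employees
            rec.total_seconds += download_seconds
            rec.reports += 1

    def fire(self, cache_url: str) -> None:
        # A deadbeat loses its record and rejoins the line with other clients.
        self.records.pop(cache_url, None)

    def ranked(self) -> List[str]:
        # Lowest average download time first; the URL breaks ties.
        return sorted(self.records, key=lambda u: (self.records[u].average(), u))

    def pick(self) -> Optional[str]:
        order = self.ranked()
        return order[0] if order else None
```

	In this sketch a new employee sorts last until it has a track record;
	a real server would probably route some trial traffic to newcomers so
	they can earn one.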
	
	Scalable streaming is likely to require 2-3 times
	as much client-side bandwidth. However, systems
	like DSL are often somewhat symmetric, having 
	pre-allocated throughput for uplink and downlink.
	On such systems, being a TFT employee requires
	very little additional overhead.


	STATE TABLES

	This section is an incomplete attempt to outline the
	state tables required to implement the
	CM Protocol.

	    CM-Server Global State
		HTTP      - files are served to anybody who wants 'em
		HTTP+CM  - files are only served to "employees"
	
 	    CM-Server File State
		Generous  - served to everybody
		Normal    - auto-enable CM when busy
		Streaming - requires very low MAX_SETUP_DELAY
		Force     - always require CM (preferred for big files)
	
	    Client Download State
		Requestor  - requested a file
		Applicant  - agreed to the REQUEST_FOR_MIRROR terms, no files yet
		Pending    - received file, now confirming receipt
		Employee   - serving files as needed until expiration
		Deadbeat   - somebody tattled, reported non-delivery
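	The client states above are listed without transitions. The sketch
	below encodes one plausible reading of them; the event names, and the
	choice that an expired Employee returns to Requestor, are assumptions.
	The draft does not say whether a Deadbeat may reapply, so no
	transition leaves that state here.

```python
from enum import Enum


class ClientState(Enum):
    REQUESTOR = "requested a file"
    APPLICANT = "agreed to the REQUEST_FOR_MIRROR terms, no files yet"
    PENDING = "received file, now confirming receipt"
    EMPLOYEE = "serving files as needed until expiration"
    DEADBEAT = "reported for non-delivery"


# (state, event) -> next state.  Event names are hypothetical.
TRANSITIONS = {
    (ClientState.REQUESTOR, "agree"): ClientState.APPLICANT,
    (ClientState.APPLICANT, "received"): ClientState.PENDING,
    (ClientState.PENDING, "confirmed"): ClientState.EMPLOYEE,
    (ClientState.PENDING, "reported"): ClientState.DEADBEAT,
    (ClientState.EMPLOYEE, "reported"): ClientState.DEADBEAT,
    (ClientState.EMPLOYEE, "expired"): ClientState.REQUESTOR,
}


def step(state: ClientState, event: str) -> ClientState:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None
```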


	BANDWIDTH CALCULATIONS
	    Gross assumptions on improvements in performance
	    
	    worst case scenario: new employee, downloading from a proxy
		transaction overhead = 512 bytes * 6
	    	total latency = round-trip * 3

	    typical scenario: new employee, downloading direct
		transaction overhead = 512 bytes * 4
		total latency = round-trip * 2

	    best case scenario: direct, unchanged
		transaction overhead = 0
		total latency = round-trip * 1

	    If there are many deadbeats, latency starts to get ugly.
	    A CM-Server can do smart things to monitor and police
	    its employees.  Punishing deadbeats with denied access
	    may ultimately be required if smart algorithms aren't
	    enough.
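	    How ugly latency gets can be estimated with a simple model,
	    assumed here rather than taken from the draft: each attempt costs
	    two round trips (one to the CM-Server for a redirect, one to the
	    mirror), mirrors are picked independently, a fraction p of them
	    are deadbeats, and a bad mirror is detected with no timeout. The
	    number of attempts is then geometric with mean 1/(1-p), so the
	    expected latency is 2 * round-trip / (1 - p).

```python
def expected_latency(rtt: float, deadbeat_fraction: float,
                     rtts_per_attempt: int = 2) -> float:
    """Expected time to reach a good mirror under the simple model above.

    Attempts are geometric with mean 1 / (1 - p); timeouts are ignored,
    so real numbers would be worse.
    """
    if not 0.0 <= deadbeat_fraction < 1.0:
        raise ValueError("deadbeat_fraction must be in [0, 1)")
    expected_attempts = 1.0 / (1.0 - deadbeat_fraction)
    return rtt * rtts_per_attempt * expected_attempts
```

	    With a 100 ms round trip this gives 0.2 s with no deadbeats,
	    0.4 s when half the mirrors are deadbeats, and 2 s when 90% are.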
	
	    Stable delivery is achieved if any one of the following holds
	    (each works out to about one served copy per requesting client):
	      .1% of clients are willing to serve 1000 copies
	       1% of clients are willing to serve 100 copies
	      10% of clients are willing to serve 10 copies
	      33% of clients are willing to serve 3 copies
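	    The arithmetic behind each row is just fraction times copies. The
	    sketch below checks it and, under the same linear assumption (not
	    stated in the draft), shows that tiers can also be mixed.

```python
from typing import Iterable, Tuple

# (fraction of clients willing to mirror, copies each will serve), from the list above
TIERS = [(0.001, 1000), (0.01, 100), (0.10, 10), (0.33, 3)]


def copies_per_client(offers: Iterable[Tuple[float, int]]) -> float:
    """Mirrored copies available per requesting client.

    About 1.0 means the volunteers, between them, can serve roughly one
    copy for every client who asks, so demand is covered by mirrors
    rather than by the origin.
    """
    return sum(fraction * copies for fraction, copies in offers)


for fraction, copies in TIERS:
    ratio = copies_per_client([(fraction, copies)])
    print(f"{fraction:6.1%} x {copies:4d} copies -> {ratio:.2f}")

# Half of the 0.1% volunteers plus half of the 10% volunteers.
mixed = copies_per_client([(0.0005, 1000), (0.05, 10)])
```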
	    
	    Transaction overhead is capped at ~2KB per request when stable.

	    Line speed   Rate        Req/sec   Req/day (absolute max)
	     56 Kb/s       7 KB/s          3   0.25M
	    1.4 Mb/s     180 KB/s         90   7.8M
	     10 Mb/s     800 KB/s        400   34M
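	    The table follows from dividing each line's rate by the ~2KB
	    overhead, which matches the typical scenario (512 bytes * 4). A
	    sketch, assuming 1 KB = 1024 bytes and a line that carries nothing
	    but negotiation traffic:

```python
OVERHEAD_BYTES = 512 * 4       # typical scenario above: 2048 bytes, ~2KB
SECONDS_PER_DAY = 24 * 60 * 60

# Rate per line in KB/s (1 KB = 1024 bytes), as listed in the table
LINES = [("56 Kb/s", 7), ("1.4 Mb/s", 180), ("10 Mb/s", 800)]


def max_requests(rate_kb_per_s: int, overhead_bytes: int = OVERHEAD_BYTES):
    """Upper bound on CM negotiations a line can carry, per second and per day.

    Assumes the line carries only negotiation overhead; the files
    themselves come from employees.
    """
    per_sec = (rate_kb_per_s * 1024) // overhead_bytes
    return per_sec, per_sec * SECONDS_PER_DAY


for name, rate in LINES:
    per_sec, per_day = max_requests(rate)
    print(f"{name:>9}  {rate:4d} KB/s  {per_sec:4d} req/s  {per_day / 1e6:5.2f}M req/day")
```

	    The table takes 800 KB/s as the rate for a 10 Mb/s line, below
	    the nominal 1,250 KB/s, presumably to allow for link overhead; the
	    sketch simply uses the table's rates as given.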

	    In the limit, the performance improvement is
	    proportional to file size: the server pays a fixed
	    ~2KB of negotiation per request instead of sending
	    the whole file.

	    The most important improvement is the reduction
	    in total bandwidth consumption.
	    Many low-end ISP accounts cap total bandwidth
	    and charge large premiums for sending excess data.
	
	    CM provides a way for ISPs to offer nearly
	    unlimited distribution to their clients at 
	    an extremely low total cost.


	PRIVACY EXTENSIONS

	It is important to preserve user privacy. 
	The standard implementation of CM 
	allows employed mirrors to know what 
	documents people are requesting.

	This can be corrected with server side restrictions
	and more aggressive employment contracts.

	The CM-Server may require that clients agree to 
	cache other documents than the one they are already
	requesting.  These other documents may be encrypted,
	broken into blocks, and padded with chaff.
	The employee can see the data and log where it
	came from and where it went, but has no idea
	what it contains.

	The CM-server tells requestors how to decode the 
	data from the random blocks they gather from 
	servers around the world.
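	The draft leaves the fragment scheme open. The sketch below assumes
	the server encrypts a document, cuts it into fixed-size blocks stored
	under random ids, and adds random chaff blocks; FRAGMENT_LIST is the
	ordered list of real ids, and DECRYPTION_KEY and HASH let the
	requestor rebuild and check the document. The cipher is a toy
	SHA-256 keystream, used only because it needs nothing beyond the
	standard library; a real deployment would use a vetted cipher and a
	fresh key per document. All names are hypothetical.

```python
import hashlib
import secrets
from typing import Callable, Dict, List, Tuple

BLOCK = 1024  # hypothetical block size in bytes


def toy_keystream(key: bytes, length: int) -> bytes:
    """SHA-256(key || counter) keystream.  A stand-in only, NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_crypt(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))


def publish(doc: bytes, key: bytes, chaff: int = 2
            ) -> Tuple[Dict[str, bytes], List[str], str]:
    """Server side: encrypt, cut into blocks, and mix in random chaff blocks.

    Returns every block under an opaque id (each would go to some employee),
    the ordered FRAGMENT_LIST of real ids, and a SHA-256 digest of the plaintext.
    """
    sealed = toy_crypt(doc, key)
    blocks = {secrets.token_hex(8): sealed[i:i + BLOCK]
              for i in range(0, len(sealed), BLOCK)}
    fragment_list = list(blocks)              # insertion order = document order
    for _ in range(chaff):
        blocks[secrets.token_hex(8)] = secrets.token_bytes(BLOCK)
    return blocks, fragment_list, hashlib.sha256(doc).hexdigest()


def assemble(fetch: Callable[[str], bytes], fragment_list: List[str],
             key: bytes, digest: str) -> bytes:
    """Client side: gather the listed blocks from wherever they are mirrored,
    decrypt with DECRYPTION_KEY, and check the result against HASH."""
    sealed = b"".join(fetch(fid) for fid in fragment_list)
    doc = toy_crypt(sealed, key)
    if hashlib.sha256(doc).hexdigest() != digest:
        raise ValueError("assembled document does not match HASH")
    return doc
```

	An employee holding any single block sees only ciphertext or chaff
	under a random id; only a requestor given the FRAGMENT_LIST and key
	can put the document back together.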

	Clients must set their own thresholds as to what
	kinds of contracts they are willing to participate
	in.  With a cooperative public, the contents of
	requests can readily be kept private.


Last modified: Sat Jun 24 00:03:40 EDT 2000