Coercive Mirroring
Request For Comment
Coercive Mirroring is a simple technique for coercing downloaders
into helping to mirror content in an automated and secure manner.
The Pitch:
Wouldn't it be great if some of the people who downloaded your
files would help mirror your content on an as-needed basis?
The main reason these folks don't is that no mechanisms are in
place to make mirroring easy to manage. Manual mirror maintainers
must select which sites need to be mirrored,
establish a relationship with the source site to gain reliable
access, and set up a script to keep the mirror up-to-date.
The result is that sites start with no mirrors, a wave of
downloads swamps their serving capacity, and the maintainer solicits
friends and neighbors to mirror bits and pieces of his site.
Mirrors get set up, then their bandwidth gets swamped, or not enough
traffic is redirected from the primary site, or traffic is redirected
but users have no automated way to test the authenticity of files from
the mirror sites. The site maintainer doesn't have an automated way
of keeping track of his mirrors or of keeping them up-to-date.
What a big mess! I can't believe this works at all!
Manual mirroring works on a time scale of weeks or months.
Today when a site gets posted to Slashdot, traffic spikes through the
roof in 30 seconds. The maintainer has only minutes to respond,
and often ends up getting a huge bill or hitting a bandwidth cutoff.
The lucky are those with 'unlimited' bandwidth hosting services.
In the 'unlimited' case, it won't take long for them to kindly suggest
you take your bandwidth hogging data elsewhere.
I once mirrored a single file that had just been released.
It ended up saturating an entire 10BaseT link for a solid week.
For my hosting provider, this was not their idea of 'unlimited'.
Mirrors need to be distributed if you want to serve millions.
Game theory suggests that a very effective way to encourage
cooperation in repeated interactions is to play Tit for Tat (TFT).
I'll be nice to you if you are nice to me.
If you don't keep your promises, I'll lock you out.
Coercive Mirroring is a means of automating a
request for mirroring and the corresponding tit-for-tat
reward or possible punishment. It lets low-bandwidth
servers deliver internet services in a scalable manner
to a greatly magnified number of clients while retaining
central control over the content.
This protocol does *not* address the needs of
database-backed applications. These require more
complicated resources at the mirror and are
beyond the scope of this document.
Coercive Mirroring delivers flat files and
appropriately segmented streamed media.
The following two mechanisms are added to the
server side:
RFM -- REQUEST_FOR_MIRROR
Request that the client provide a local mirror after
downloading the requested document.
RTUM -- REDIRECT_TO_UNTRUSTED_MIRROR
Forward the client on to an alternative source.
Give them the data they need to test the
integrity of the resulting file.
Include a URL for reporting a bad cache.
REQUEST_FOR_MIRROR
REQUEST_URL = HTTP://XXXX originally requested URL
START_TIME = GMT DATE/TIME of when mirroring will start
STOP_TIME = GMT DATE/TIME of when mirroring will end
MAX_TRAFFIC = maximum bytes of data expected to be served
MAX_CLIENTS = Number of clients expected to be redirected
--- client specifies ---
CACHE_URL = [where the proxy will keep the files]
--- optional ---
CONFIRM_URL = Web page to hit when content is ready
FRAGMENT_LIST = encrypted fragments that may need caching
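A minimal sketch, in Python, of a client filling in and returning this form. It assumes the form travels as a URL-encoded query string (the exchange shows an HTTP-GET but does not fix an encoding), that MAX_TRAFFIC is a byte count, and that unset optional fields are simply omitted. The class and method names are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional
from urllib.parse import urlencode, parse_qs


@dataclass
class RequestForMirror:
    """REQUEST_FOR_MIRROR fields, named after the list above (hypothetical class)."""
    request_url: str                     # originally requested URL
    start_time: str                      # GMT date/time when mirroring starts
    stop_time: str                       # GMT date/time when mirroring ends
    max_traffic: int                     # assumed: maximum bytes to be served
    max_clients: int                     # clients expected to be redirected
    cache_url: Optional[str] = None      # filled in by the client
    confirm_url: Optional[str] = None    # optional
    fragment_list: Optional[str] = None  # optional

    def to_query(self) -> str:
        """URL-encode the form with upper-case keys, omitting unset fields."""
        fields = {k.upper(): v for k, v in asdict(self).items() if v is not None}
        return urlencode(fields)

    @classmethod
    def from_query(cls, query: str) -> "RequestForMirror":
        """Parse a query string produced by to_query()."""
        fields = {k.lower(): v[0] for k, v in parse_qs(query).items()}
        fields["max_traffic"] = int(fields["max_traffic"])
        fields["max_clients"] = int(fields["max_clients"])
        return cls(**fields)


# Server offers terms; the client agrees and names its cache.
offer = RequestForMirror("http://S/SSSS", "Jun 23 23:34:10 UTC 2000",
                         "Jun 24 23:34:10 UTC 2000", 3_000_000, 10)
offer.cache_url = "http://B/BBBB"
print(offer.to_query())
```

Omitting unset fields keeps the "--- optional ---" entries out of the request entirely, so a server can tell "not offered" apart from "offered but blank".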
REDIRECT_TO_UNTRUSTED_MIRROR
REQUEST_URL = originally requested URL
REDIRECT_URL = url of proxied document HTTP://YYYY
HASH = cryptographic hash (not a plain CRC) to check document integrity
PUBLIC_KEY = key to check document authenticity
FEEDBACK_URL = url used to report on deadbeats
--- client specifies ---
DOWNLOAD_TIME = time it took to download from the mirror
--- optional ---
FRAGMENT_LIST = list of URLs for encrypted fragments
DECRYPTION_KEY = key to decrypt assembled fragments
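A sketch of the client-side integrity check, assuming the CM-Server publishes HASH as a SHA-256 hex digest rather than a CRC. Verifying the server's signature with PUBLIC_KEY would additionally need a public-key library and is not shown. The function name is hypothetical.

```python
import hashlib
import hmac


def verify_mirror_copy(data: bytes, expected_hex: str) -> bool:
    """Return True if bytes fetched from REDIRECT_URL match the server's HASH.

    Assumes HASH is a SHA-256 hex digest. A CRC would not do here: a
    malicious mirror can easily craft altered data with a matching CRC.
    """
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_hex.lower())


published = hashlib.sha256(b"original file").hexdigest()   # from the CM-Server
print(verify_mirror_copy(b"original file", published))     # good mirror
print(verify_mirror_copy(b"tampered file", published))     # report to FEEDBACK_URL
```

A failed check is exactly the case in which the client should hit FEEDBACK_URL and ask for another cache.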
The server component of CM can be
implemented as an apache module.
Initially, the client end of CM can be
done manually; however, CM really takes off
when clients and caching proxies cooperate
with CM automatically.
Bandwidth may be rising, but in the wireless space
it is very difficult to balance broadband with
spectrum allocation and power constraints.
Intelligent caching proxies are essential to
delivering high-quality wireless services.
Coercive mirroring is one way to accelerate
the practical adoption of intelligent caching
to the ends of the network.
A complete protocol specification is not yet available.
The following conversations are intended to give a
sense of how CM works in day-to-day use.
[B] Ordinary Web Browser, user has some space on a web server
[S] Coercive Mirroring Server
[M] Employee in the mirroring business
B->S Standard HTTP Request* for HTTP://S/SSSS
(*) NOTE: Clients may pre-agree
to participate in CM within the
HTTP request, eliminating the
next two steps of negotiation.
B<-S ErrorCode: NNN Too Busy
"I'm under too much load; however,
you can have it if you agree to my terms."
REQUEST_FOR_MIRROR
action = HTTP://S/cgi-bin/RFM
REQUEST_URL = HTTP://S/SSSS
START_TIME = Jun 23 23:34:10 UTC 2000
STOP_TIME = Jun 24 23:34:10 UTC 2000
MAX_TRAFFIC = 3MB
MAX_CLIENTS = 10
FRAGMENT_LIST = [ not used ]
--- client specifies ---
CACHE_URL = [ blank ]
B->S "Yes, I agree to the following terms:"
HTTP-GET: HTTP://S/cgi-bin/RFM
REQUEST_URL = HTTP://S/SSSS
START_TIME = Jun 23 23:34:10 UTC 2000
STOP_TIME = Jun 24 23:34:10 UTC 2000
MAX_TRAFFIC = 3MB
MAX_CLIENTS = 10
--- client specifies ---
CACHE_URL = HTTP://B/BBBB
B<-S "OK, now I'm giving you a redirect.
The data has a hash and is signed with a private key.
Send me feedback if the guy is a deadbeat and I'll
get you another cache."
REDIRECT_TO_UNTRUSTED_MIRROR
action = HTTP://S/cgi-bin/FEEDBACK
REQUEST_URL = HTTP://S/SSSS
REDIRECT_URL = HTTP://M/MMMM
HASH = CRC
PUBLIC_KEY = KKKK
--- client specifies ---
DOWNLOAD_TIME = [ blank ]
B->M Standard HTTP request for HTTP://M/MMMM
B<-M HTTP... DATA
B->S HTTP-GET: HTTP://S/cgi-bin/FEEDBACK
REQUEST_URL = HTTP://S/SSSS
REDIRECT_URL = HTTP://M/MMMM
HASH = CRC
PUBLIC_KEY = KKKK
DOWNLOAD_TIME = 36s
B<-S "OK, Thanks! This guy did well. I'll up his rankings."
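One way a CM-Server might turn FEEDBACK reports into the rankings mentioned above. The scoring rule (up to one point per successful download, scaled by DOWNLOAD_TIME against a hypothetical 60-second target, and a two-point penalty plus a failure mark for non-delivery) and the deadbeat threshold are assumptions for illustration, not part of the protocol.

```python
from dataclasses import dataclass


@dataclass
class MirrorRecord:
    """Hypothetical per-employee bookkeeping on the CM-Server."""
    url: str
    score: float = 0.0
    reports: int = 0
    failures: int = 0


def apply_feedback(rec: MirrorRecord, delivered: bool,
                   download_time_s: float = 0.0, target_s: float = 60.0) -> None:
    """Update one mirror's ranking from a single FEEDBACK report (assumed scoring)."""
    rec.reports += 1
    if delivered:
        # Faster downloads earn more, up to 1 point; slower than target_s earns 0.
        rec.score += max(0.0, 1.0 - download_time_s / target_s)
    else:
        rec.score -= 2.0
        rec.failures += 1


def is_deadbeat(rec: MirrorRecord, threshold: float = 0.5) -> bool:
    """Assumed rule: a deadbeat is a mirror whose failure rate exceeds threshold."""
    return rec.reports > 0 and rec.failures / rec.reports > threshold


m = MirrorRecord("http://M/MMMM")
apply_feedback(m, delivered=True, download_time_s=36)   # the exchange above
print(m.score, is_deadbeat(m))
```

Using a failure rate rather than a single report keeps one false tattle from turning a good mirror into a deadbeat.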
VOCABULARY
CM-Server
A webserver extended with the special
ability to negotiate with users using
Coercive Mirroring
Untrusted Mirror
A client that has filled in a
REQUEST_FOR_MIRROR and
agreed to the specified terms.
REQUEST_FOR_MIRROR
form a client must fill out to
participate in the Tit-for-Tat (TFT) protocol
REDIRECT_TO_UNTRUSTED_MIRROR
forward a client to an untrusted
proxy, give the client enough
information to make his/her own
decisions about data validity.
STREAMING
CM has the capacity to allow content creators
to publish their content at extremely low cost.
Streaming media is more difficult in that
packets have a very short Time-to-live.
By breaking a stream into packets, the CM
protocol makes it practical to create streamed
interfaces that scale as readily as flat files.
Establish a starting URL for your stream:
HTTP://pehr.net/mystream/
which redirects to the latest media packet
HTTP://pehr.net/mystream/2000-05-22-GMT1734.mp3
Clients automatically fetch the newest mp3 as it
becomes available, caching as they go. To conserve
space, the server may begin deleting mp3s older
than one hour.
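A sketch of the packet naming and clean-up described above, assuming packets are named by UTC minute exactly as in the example URL and that "older than one hour" is measured from the time in the name. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PACKET_FMT = "%Y-%m-%d-GMT%H%M.mp3"   # matches names like 2000-05-22-GMT1734.mp3


def packet_name(t: datetime) -> str:
    """Name for the packet recorded at aware datetime t, in UTC."""
    return t.astimezone(timezone.utc).strftime(PACKET_FMT)


def packet_time(name: str) -> datetime:
    """Recover the UTC time encoded in a packet name."""
    return datetime.strptime(name, PACKET_FMT).replace(tzinfo=timezone.utc)


def latest_packet(names: list) -> str:
    """The packet the starting URL should currently redirect to."""
    return max(names, key=packet_time)


def expired(names: list, now: datetime, max_age: timedelta = timedelta(hours=1)) -> list:
    """Packets the server may delete to conserve space."""
    return [n for n in names if now - packet_time(n) > max_age]


now = datetime(2000, 5, 22, 17, 34, tzinfo=timezone.utc)
names = ["2000-05-22-GMT1530.mp3", "2000-05-22-GMT1634.mp3", packet_name(now)]
print("redirect to", "HTTP://pehr.net/mystream/" + latest_packet(names))
print("delete", expired(names, now))
```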
Playback at the client lags by recording time
+ encoding time + download time. Download time
may vary considerably as employees take in
and serve back data.
The client will be jockeying for
employee positions with other clients.
START_TIME becomes very important, because each packet is only useful for a short time.
A smart CM-Server will give priority to
clients with the best latency track record.
An employee position is desirable because
it means that the data is delivered sooner
and more reliably, but it requires the client
to *perform* to keep their employee status.
If an employee does a bad job delivering content,
they must go to the back of the line with
the rest of the clients.
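A minimal sketch of the line of would-be employees, under assumed rules: clients with a delivery record are offered positions first, ordered by a moving average of their latency; everyone else waits in arrival order; a failed delivery erases the record and sends the client to the back of the line. The class and its smoothing factor are hypothetical.

```python
class EmployeePool:
    """Hypothetical ranking of clients competing for employee positions."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha      # weight of the newest latency sample
        self.latency = {}       # client -> smoothed latency (s), proven performers only
        self.queue = []         # everyone else, in arrival order

    def join(self, client: str) -> None:
        if client not in self.latency and client not in self.queue:
            self.queue.append(client)

    def report(self, client: str, delivered: bool, latency_s: float = 0.0) -> None:
        if client in self.queue:
            self.queue.remove(client)
        if delivered:
            old = self.latency.get(client)
            self.latency[client] = latency_s if old is None else (
                self.alpha * latency_s + (1 - self.alpha) * old)
        else:
            # Bad delivery: lose the track record, go to the back of the line.
            self.latency.pop(client, None)
            self.queue.append(client)

    def hire(self, n: int) -> list:
        """The n clients to offer employee positions to, best first."""
        proven = sorted(self.latency, key=self.latency.get)
        return (proven + self.queue)[:n]


pool = EmployeePool()
for c in ["a", "b", "c"]:
    pool.join(c)
pool.report("c", delivered=True, latency_s=0.2)
print(pool.hire(2))
```

A moving average lets a client recover from one slow delivery, while an outright failure costs it the whole record, which is the "perform to keep your status" pressure described above.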
Scalable streaming is likely to require 2-3 times
as much client-side bandwidth. However, systems
like DSL pre-allocate throughput separately for uplink
and downlink, and the uplink often goes underused.
On such systems, being a TFT employee requires
very little additional overhead.
STATE TABLES
This section is an incomplete attempt to outline the
state tables needed to implement the
CM protocol.
CM-Server Global State
HTTP - files are served to anybody who wants 'em
HTTP+CM - files are only served to "employees"
CM-Server File State
Generous - served to everybody
Normal - auto-enable CM when busy
Streaming - requires very low MAX_SETUP_DELAY
Force - always require CM (preferred for big files)
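A sketch of how a CM-Server might decide whether to answer a request with a REQUEST_FOR_MIRROR instead of the file. The precedence rules are assumptions: HTTP+CM is read as forcing CM for every file, and under plain HTTP the per-file state decides, with Streaming treated like Force.

```python
from enum import Enum


class GlobalState(Enum):
    HTTP = "HTTP"          # files are served to anybody
    HTTP_CM = "HTTP+CM"    # files are only served to employees


class FileState(Enum):
    GENEROUS = "Generous"
    NORMAL = "Normal"
    STREAMING = "Streaming"
    FORCE = "Force"


def requires_cm(server: GlobalState, file_state: FileState, busy: bool) -> bool:
    """True if the request should get a REQUEST_FOR_MIRROR instead of the file."""
    if server is GlobalState.HTTP_CM:
        return True                     # assumed: server-wide setting wins
    if file_state is FileState.GENEROUS:
        return False
    if file_state is FileState.NORMAL:
        return busy                     # auto-enable CM only under load
    return True                         # Force, and (assumed) Streaming


print(requires_cm(GlobalState.HTTP, FileState.NORMAL, busy=True))
```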
Client Download State
Requestor - requested a file
Applicant - agreed to the REQUEST_FOR_MIRROR terms but has no files yet
Pending - received file, now confirming receipt
Employee - serving files as needed until expiration
Deadbeat - somebody tattled, reported non-delivery
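The client states can be sketched as a small state machine. The event names and the transition out of Employee when STOP_TIME passes are assumptions inferred from the state descriptions, not part of a specification.

```python
from enum import Enum, auto


class ClientState(Enum):
    REQUESTOR = auto()
    APPLICANT = auto()
    PENDING = auto()
    EMPLOYEE = auto()
    DEADBEAT = auto()


# Assumed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (ClientState.REQUESTOR, "agree"): ClientState.APPLICANT,     # filled in the form
    (ClientState.APPLICANT, "received"): ClientState.PENDING,    # got the file
    (ClientState.PENDING, "confirmed"): ClientState.EMPLOYEE,    # receipt confirmed
    (ClientState.EMPLOYEE, "reported"): ClientState.DEADBEAT,    # non-delivery reported
    (ClientState.EMPLOYEE, "expired"): ClientState.REQUESTOR,    # STOP_TIME passed
}


def step(state: ClientState, event: str) -> ClientState:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


s = ClientState.REQUESTOR
for e in ["agree", "received", "confirmed"]:
    s = step(s, e)
print(s.name)
```

Deadbeat has no outgoing transitions here, so a reported client stays locked out until the server decides otherwise.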
BANDWIDTH CALCULATIONS
Rough assumptions about performance
worst case scenario: new employee, downloading from a proxy
transaction overhead = 512 bytes * 6
total latency = round-trip * 3
typical scenario: new employee, downloading direct
transaction overhead = 512 bytes * 4
total latency = round-trip * 2
best case scenario: direct, unchanged
transaction overhead = 0
total latency = round-trip * 1
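A back-of-the-envelope model of the three scenarios, assuming a single bottleneck link and ignoring TCP effects such as slow start. It also shows why the fixed CM overhead matters less as files grow. The function names are hypothetical.

```python
# (overhead bytes, round trips) per scenario, from the list above.
SCENARIOS = {
    "worst": (512 * 6, 3),     # new employee, downloading from a proxy
    "typical": (512 * 4, 2),   # new employee, downloading direct
    "best": (0, 1),            # direct, unchanged
}


def fetch_time(scenario: str, file_bytes: int, bytes_per_s: float, rtt_s: float) -> float:
    """Seconds to fetch one file: round trips plus transfer of overhead and payload.

    Assumes one bottleneck link of bytes_per_s and ignores TCP slow start.
    """
    overhead, round_trips = SCENARIOS[scenario]
    return round_trips * rtt_s + (overhead + file_bytes) / bytes_per_s


def overhead_share(scenario: str, file_bytes: int) -> float:
    """Fraction of the bytes moved that is CM overhead rather than payload."""
    overhead, _ = SCENARIOS[scenario]
    return overhead / (overhead + file_bytes)


for size in (10_000, 1_000_000, 100_000_000):
    print(size, f"{overhead_share('worst', size):.4%}",
          f"{fetch_time('worst', size, 180_000, 0.1):.1f}s")
```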
If there are many deadbeats, latency starts to get ugly.
A CM-Server can do smart things to monitor and police
its employees. Punishing deadbeats with denied access
may ultimately be required if smart algorithms aren't
enough.
Stable delivery is achieved if any one of the following holds
(each works out to about one served copy per client):
.1% of clients are willing to serve 1000 copies
1% of clients are willing to serve 100 copies
10% of clients are willing to serve 10 copies
33% of clients are willing to serve 3 copies
Transaction overhead is capped at ~2KB when stable.
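A quick check of the list above, reading "33%" as one third: each option supplies about one served copy per client. The is_stable helper, which also allows a mix of volunteers, assumes that stability simply means mirrors can serve at least one copy per downloading client.

```python
from fractions import Fraction

# (share of clients willing to mirror, copies each will serve), from the list above.
OPTIONS = [
    (Fraction(1, 1000), 1000),
    (Fraction(1, 100), 100),
    (Fraction(1, 10), 10),
    (Fraction(1, 3), 3),      # "33%" read as one third
]


def copies_per_client(pool) -> Fraction:
    """Served copies the volunteers supply for each client that downloads."""
    return sum((share * copies for share, copies in pool), Fraction(0))


def is_stable(pool) -> bool:
    """Assumed criterion: volunteers can serve at least one copy per client."""
    return copies_per_client(pool) >= 1


for share, copies in OPTIONS:
    print(f"{float(share):.1%} x {copies} copies ->",
          copies_per_client([(share, copies)]))
```

Exact fractions avoid the rounding noise that floats would add to a comparison against exactly 1.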
Line speed   Throughput   Req/sec   Req/day (absolute max)
56 Kb/s      7 KB/s       3         0.25M
1.4 Mb/s     180 KB/s     90        7.8M
10 Mb/s      800 KB/s     400       34M
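The table is consistent with each request costing the origin about 2KB of transaction overhead, with requests per second rounded down. That reading is an inference; the sketch below reproduces the figures to within the table's rounding.

```python
OVERHEAD_KB = 2          # assumed per-request cost at the origin when stable (~2KB)
SECONDS_PER_DAY = 86_400


def max_requests(rate_kb_per_s: int) -> tuple:
    """Requests/sec and requests/day if every request costs OVERHEAD_KB of origin bandwidth."""
    per_sec = rate_kb_per_s // OVERHEAD_KB      # round down, as the table does
    return per_sec, per_sec * SECONDS_PER_DAY


for line, rate in [("56 Kb/s", 7), ("1.4 Mb/s", 180), ("10 Mb/s", 800)]:
    per_sec, per_day = max_requests(rate)
    print(f"{line:>8}  {rate:>4} KB/s  {per_sec:>4} req/s  {per_day / 1e6:6.2f}M req/day")
```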
In the limit, the performance improvement is
directly related to file size.
The most important improvement is in total overall
bandwidth consumption at the origin server.
Many low-end ISP accounts cap total bandwidth
and charge large premiums for sending excess data.
CM provides a way for ISPs to offer nearly
unlimited distribution to their clients at
an extremely low total cost.
PRIVACY EXTENSIONS
It is important to preserve user privacy.
The standard implementation of CM
allows employed mirrors to know what
documents people are requesting.
This can be corrected with server side restrictions
and more aggressive employment contracts.
The CM-Server may require that clients agree to
cache documents other than the one they are already
requesting. These other documents may be encrypted,
broken into blocks, and padded with chaff.
The employee can only look at the data and log where
it goes; he has no idea what it contains, and knows
only where it came from and where it went.
The CM-server tells requestors how to decode the
data from the random blocks they gather from
servers around the world.
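A sketch of the block-and-chaff scheme, with loudly labeled assumptions: the 1024-byte block size is arbitrary; the "recipe" the CM-Server hands to requestors is a list of block positions plus the original length (roughly the role FRAGMENT_LIST and DECRYPTION_KEY play in the redirect); and the cipher is a toy XOR keystream built from SHA-256 over key and offset, standing in for a real cipher only so the example runs with the standard library. It is not a vetted encryption scheme.

```python
import hashlib
import os
import random

BLOCK = 1024  # hypothetical block size in bytes


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256(key || offset) keystream. Toy only; applying it twice decrypts."""
    out = bytearray()
    for off in range(0, len(data), 32):
        pad = hashlib.sha256(key + off.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[off:off + 32], pad))
    return bytes(out)


def make_blocks(doc: bytes, key: bytes, chaff: int):
    """Encrypt, pad, split into blocks, add chaff, and shuffle.

    Returns (blocks, recipe). Employees cache the blocks; only requestors
    get the recipe (positions of the real blocks plus the true length).
    """
    enc = toy_xor_cipher(key, doc)
    enc += os.urandom(-len(enc) % BLOCK)            # pad the last block
    real = [enc[i:i + BLOCK] for i in range(0, len(enc), BLOCK)]
    pool = real + [os.urandom(BLOCK) for _ in range(chaff)]
    order = list(range(len(pool)))
    random.SystemRandom().shuffle(order)
    blocks = [pool[j] for j in order]               # what the mirrors see
    where = {src: pos for pos, src in enumerate(order)}
    recipe = ([where[k] for k in range(len(real))], len(doc))
    return blocks, recipe


def reassemble(blocks: list, recipe: tuple, key: bytes) -> bytes:
    """Requestor side: pick the real blocks in order, trim padding, decrypt."""
    positions, length = recipe
    enc = b"".join(blocks[p] for p in positions)[:length]
    return toy_xor_cipher(key, enc)


blocks, recipe = make_blocks(b"x" * 3000, b"secret", chaff=5)
print(len(blocks), "blocks cached;", len(recipe[0]), "of them are real")
```

Because chaff blocks are random bytes and real blocks are ciphertext, an employee holding any block cannot tell which kind it has without the recipe.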
Clients must set their own thresholds as to what
kinds of contracts they are willing to participate
in. With a cooperative public, privacy can be
substantially protected.
Last modified: Sat Jun 24 00:03:40 EDT 2000