So, what is a CDN?

Very bluntly, a Content Delivery Network (or CDN) is a set of computers that delivers your content to users in place of your own server infrastructure. By "content", I mean any kind of data that flows to the user: typically HTML/CSS/JS, Flash apps, images and possibly videos.

This content is either stored entirely on the CDN itself, or the CDN first requests a copy from your origin server, caches it, and then serves it directly until it "expires".

This second strategy (based on caching) relies on HTTP's ability to specify a time-to-live (TTL) for a given resource: the amount of time during which HTTP clients (be it a caching proxy or the final user-agent) may keep the resource without requesting it again from the "origin".
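
To make the TTL idea concrete, here is a minimal sketch in Python of how a shared cache (such as a CDN node) might read the TTL from a Cache-Control response header. The function name shared_cache_ttl is mine, and the parsing is deliberately simplified: it only looks at the standard no-store, private, s-maxage and max-age directives and ignores everything else (Expires, Age, no-cache, and so on).

```python
def shared_cache_ttl(cache_control):
    """Return the TTL in seconds a shared cache may use, or None if it must not cache.

    Simplified illustration: only a handful of Cache-Control directives are handled.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')

    # "no-store" forbids storing at all; "private" forbids storing in shared caches.
    if "no-store" in directives or "private" in directives:
        return None

    # For shared caches, s-maxage takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None
```

With this sketch, a response carrying "public, max-age=3600" may be kept for an hour, while "private, max-age=600" is left to the end user's browser only.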

Why should you care?

You possibly pay for your bandwidth, or your bandwidth is limited

Unless you own an ISP, it's quite likely you are paying for the bandwidth you consume, and/or that your hosting provider limits your traffic.

So a traffic spike, or a general rise in visits to your site, will either cost you extra money or badly degrade your users' experience.

Using a CDN lets you reduce your origin bandwidth drastically. You may be able to cut your traffic by more than 90% on purely static content.

You certainly have limited computing power, and you sure do pay for that

Serving resources to users, be they static (images that don't change) or dynamic (server-generated web pages), costs not only bandwidth but also computing power. Likewise, that power is limited by your servers' capacity, and scaling it up may be as "simple" as adding a new machine ($), or as complex as entirely redesigning your software application ($$$).

While reducing your origin traffic on static content will not cut your computing resource consumption by the same magnitude as your bandwidth, the gain on that front is far from negligible.

You have a limited number of servers at your disposal (worldwide)

You possibly have a couple of machines in a datacenter near your office. If you can afford it, you possibly also have a US or European box somewhere, so that visitors abroad don't have a crappy experience. Either way, it's unlikely you will be able to buy as many points of presence as Akamai - or Google, for that matter...

That means you will never get a homogeneous user experience: people "close" to your datacenter (say, in the same country and with good ISPs) will have a great experience, and everyone else will have a degraded one.

CDN providers usually have huge networks (IIRC Akamai owns 40k servers) and place them on the "border" (the edge) of the internet, housed at ISPs. Not only does this make the user experience more homogeneous, it also improves it drastically.

Such a global user experience improvement is likely out of reach for you otherwise.

You can't guarantee 100% uptime

Improving your uptime is not a simple game. Identifying every single point of failure in your software architecture, then deploying load balancers and redundant machines, is certainly a costly operation that still won't guarantee 100% uptime.

By taking advantage of your CDN's ability to keep serving cached resources even if the origin is down, you can mitigate "origin disasters" and keep at least parts of your service running while you fix the problem.
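
As a rough illustration of that fallback, here is a small Python sketch. Everything in it is hypothetical (the serve function, the OriginDown exception, the in-memory dict used as a cache), and whether a given CDN behaves this way, and for how long, depends on the product and its configuration.

```python
class OriginDown(Exception):
    """Raised by the (hypothetical) fetch function when the origin is unreachable."""


def serve(path, cache, fetch_from_origin, now, ttl=300):
    """Return (body, state), where state is 'fresh', 'fetched' or 'stale'.

    `cache` maps a path to a (body, expires_at) tuple.
    """
    entry = cache.get(path)
    if entry and now < entry[1]:
        return entry[0], "fresh"          # still within its TTL

    try:
        body = fetch_from_origin(path)
    except OriginDown:
        if entry:
            return entry[0], "stale"      # origin is down: fall back to the expired copy
        raise                             # nothing cached: the failure reaches the user

    cache[path] = (body, now + ttl)
    return body, "fetched"
```

The point of the sketch is the except branch: an expired copy is better than an error page, as long as you accept that users may see slightly outdated content during the outage.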

How does it work concretely?

When a resource is first requested by a visitor, the CDN fetches it from your origin server, then keeps a local copy of it. Any subsequent visitor gets the cached copy, with no round-trip to the origin, until the TTL expires. When it does, the CDN queries the origin again and either gets an updated copy or is told that the resource hasn't changed, then goes back to serving it directly for another TTL span.

Some requests are not cached and are forwarded directly to the origin instead (typically, POST). Deciding what gets cached and what doesn't is under your control, through cache-control directives.
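
The fetch, serve-from-cache and revalidate cycle described above can be sketched in a few lines of Python. This is a toy model, not how any particular CDN is implemented: the Origin and EdgeCache classes are made up for the example, a single TTL applies to every resource, and revalidation is modelled on HTTP's conditional requests (an ETag sent back in If-None-Match, answered with 304 Not Modified when nothing changed).

```python
import time


class Origin:
    """Stand-in for an origin server (hypothetical, for illustration only)."""

    def __init__(self):
        self.resources = {}   # path -> (etag, body)
        self.hits = 0         # number of requests that reached the origin

    def get(self, path, if_none_match=None):
        self.hits += 1
        etag, body = self.resources[path]
        if if_none_match == etag:
            return 304, etag, None        # not modified: no body is sent
        return 200, etag, body


class EdgeCache:
    """Toy CDN node: serves from cache for `ttl` seconds, then revalidates."""

    def __init__(self, origin, ttl, clock=time.time):
        self.origin = origin
        self.ttl = ttl
        self.clock = clock
        self.entries = {}     # path -> (etag, body, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry and now < entry[2]:
            return entry[1]               # fresh: no round-trip to the origin

        # Missing or expired: ask the origin, mentioning the version we already hold.
        known_etag = entry[0] if entry else None
        status, etag, body = self.origin.get(path, if_none_match=known_etag)
        if status == 304:
            body = entry[1]               # unchanged: keep serving the cached body
        self.entries[path] = (etag, body, now + self.ttl)
        return body
```

However many visitors hit the EdgeCache within one TTL span, origin.hits only grows by one, and a revalidation that ends in a 304 costs the origin a tiny response instead of the full resource.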

Choosing adequate TTLs for each resource determines the shape of your traffic. Typically, assets (like JavaScript files, stylesheets, images) can use very long TTLs: whenever your developers change a resource, they can append a query string to its name wherever it's requested, so that a new cache entry is created.
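
One possible way to automate that query-string trick is to derive the string from the file's content, so the URL changes exactly when the file does. The sketch below is an assumption on my part (the versioned_url helper and the "v" parameter name are made up, and a manual version number or a timestamp would work just as well):

```python
import hashlib


def versioned_url(path, content):
    """Append a short content hash as a query string.

    `content` is the asset's bytes; a different content yields a different URL,
    and therefore a new cache entry, while unchanged content keeps the same URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    return "%s?v=%s" % (path, digest)
```

Your templates would then emit versioned_url("/css/site.css", css_bytes) instead of the bare path, and the old cache entry simply stops being requested.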

This is exactly the same thing as the cache-control optimizations meant to have a single user request some content only once (e.g. by having their user-agent cache it), except that it applies to all users at once (from the "origin" POV).

While some people may think this is limited to purely static content, that is not accurate. You may very well apply it to dynamically generated content as well. The only question you have to ask yourself is: how quickly do you want changes to dynamic content to propagate? Typically, a blog can very well tolerate a one-hour delay before a new post hits the planet, and a data feed may just as easily tolerate a 20-minute refresh policy. Either way, this all depends on your application's structure, but if you are running anything that doesn't need to be fully real-time in its entirety, then it's quite likely you can get significant gains from a CDN's HTTP caching capabilities.
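
In practice this boils down to a small refresh policy that your application turns into Cache-Control headers. Here is a hedged Python sketch: the kinds of content and the cache_control_for helper are hypothetical, the one-hour and 20-minute values come from the examples above, and the one-year asset TTL and the "account_page" entry are my own illustrative assumptions.

```python
# Hypothetical refresh policy, in seconds.
TTL_BY_KIND = {
    "asset": 365 * 24 * 3600,   # versioned JS/CSS/images: very long TTL (assumed one year)
    "blog_post": 60 * 60,       # a new post may take up to one hour to show up
    "data_feed": 20 * 60,       # 20-minute refresh policy
    "account_page": 0,          # assumed to need real-time data: never cached
}


def cache_control_for(kind):
    """Return the Cache-Control header value for a kind of content.

    Unknown kinds are treated as uncacheable, which is the safe default.
    """
    ttl = TTL_BY_KIND.get(kind, 0)
    if ttl <= 0:
        return "no-store"
    return "public, max-age=%d" % ttl
```

Defaulting to "no-store" for anything not listed means forgetting an entry costs you some origin traffic rather than serving someone stale or private data.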

What is so cool about CirruxCache?

Commercial CDNs usually cost quite a lot. This is expected, and that's the way it should be: you typically get excellent support, a knowledge base, and a ready-to-use product. People who can afford it should go for such a commercial provider.

Now, CirruxCache on the other hand is an open-source application meant to run on top of the Google AppEngine platform, whose usage is considerably cheaper than any commercial CDN provider (or even free, if you don't hit the quota).

Note that shad is still in the early phase as far as documentation is concerned, but it's coming. As for the platform's robustness, we have already stressed it quite a lot at Zoomorama! :-)

IMHO, CirruxCache really is a great app that lives up to its promises: both the idea and the implementation are simple and to the point - providing a straightforward, robust, CDN-like application that can compete at a bargain cost.