Understanding Cache control and enabling it for optimal results
November 7, 2009
Cache control is a mechanism for controlling the caching behavior of a web browser. By default, all modern browsers decide which content to cache and reuse it when it is needed again. Each browser uses its own algorithm to decide whether cached content can safely be reused. Normally, a browser caches all the images in a web page and checks their freshness against the server’s copy whenever a new request is made. If the content has been modified on the server, the browser simply downloads it again.
The mechanism of content caching is fairly simple, but only a properly configured server lets you take full advantage of it.
Five response headers and their values mainly control caching behavior in HTTP client-server communication.
1. Date
2. Last-Modified
3. Cache-Control
4. ETag
5. Expires
Date & Last-Modified
A web server sets the Date and Last-Modified headers in its response by default unless directed otherwise. The Last-Modified value helps a browser identify whether the content has been modified on the server. Browsers can send conditional requests to verify the freshness of a resource whenever required. This header is the minimum requirement when no other cache control headers are present.
Cache-Control
Cache-Control header values are a set of control instructions that one end can use to make the other follow a particular caching behavior. A browser can send its cache preferences to the server, and a web server can instruct the browser to follow certain caching behaviors. Commonly used directives include “no-cache”, “no-store”, “max-age” and “must-revalidate”. These instructions always take precedence over other cache control header values and over the browser’s own algorithms.
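As a rough illustration of a server sending such directives, the following Apache sketch uses mod_headers to attach Cache-Control values to responses. The file patterns and the one-week lifetime are assumptions chosen for the example, not recommendations taken from this article.

```apache
<IfModule headers_module>
    # Assumption: static assets may be stored by any cache for one week (604800 seconds).
    <FilesMatch "\.(css|js|png|gif|jpe?g)$">
        Header set Cache-Control "public, max-age=604800"
    </FilesMatch>

    # Assumption: HTML pages should be revalidated with the server before each reuse.
    <FilesMatch "\.html$">
        Header set Cache-Control "no-cache"
    </FilesMatch>
</IfModule>
```

Note that “Header set” replaces a Cache-Control value already present on the response rather than appending to it, so it is worth checking the final headers in a browser or with a command-line client.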
ETag & Expires
These are two different methods of improving client-side caching. An ETag (Entity Tag) is effectively a checksum of the resource’s attributes: a unique value generated dynamically by the server or the application. For a ‘static file’ resource, it can be the ‘sum’ of size, modification time and inode number. An application can generate the tag value based on its own criteria and set it in the resource’s response header.
Expires is an entirely different mechanism: it tells the client that the cached resource remains valid until the expiry time is reached. A combination of Expires and Last-Modified values helps manage caching of static files simply and efficiently.
How to set up my server to use appropriate cache control headers
Many people prefer to avoid ETag unless it is required for a specific reason, and many believe that removing it can improve the server’s own performance. The concept of ETag is similar to that of the Last-Modified header: the client can send a conditional request based on the last known ETag value to verify the freshness of a resource, and the server answers with a 304 (Not Modified) response if the resource has not changed. ETag is more accurate than Last-Modified, but for static file resources this extra header is usually unnecessary. ETag is useful when you need strong validation of dynamically generated content. There is also a known issue with ETag when multiple servers are used to load balance the traffic: the ETag generated for the same file can differ from server to server, which confuses clients.
You can set or unset the ETag header as shown below:
To generate ETag for static files in Apache using all available attributes,
FileETag All
To unset this behavior explicitly,
<IfModule headers_module>
Header unset ETag
</IfModule>
FileETag None
Note: ‘headers_module’ needs to be enabled for the Header directive to take effect.
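For the load-balancing issue mentioned earlier, one commonly used workaround (offered here as an assumption, not something this article prescribes) is to keep ETag but leave the inode number out of it, since inode numbers usually differ between machines even for identical files. A minimal sketch:

```apache
# Build the ETag from modification time and size only, omitting the inode number.
# Assumption: every server behind the load balancer holds identical copies of the
# files, with the same sizes and modification times.
FileETag MTime Size
```

This only helps if the deployment process preserves modification times across servers; otherwise the ETag values will still differ.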
The combination of Expires and Last-Modified headers with appropriate values gives the best result. The following shows how to set Expires directives in Apache:
<IfModule expires_module>
ExpiresActive On
ExpiresByType application/javascript A604800
ExpiresByType application/x-javascript A604800
ExpiresByType text/css A604800
ExpiresByType image/gif A604800
ExpiresByType image/jpeg A604800
ExpiresByType image/png A604800
ExpiresByType image/x-icon A604800
ExpiresByType application/x-shockwave-flash A604800
</IfModule>
If ‘expires_module’ is enabled and ExpiresActive is set to ‘On’, the Expires functionality is turned on. The ExpiresByType directive lets you specify a different ‘interval’ value for each MIME type. In the example above, A604800 means 604800 seconds (one week) after the client’s access time. See http://httpd.apache.org/docs/2.2/mod/mod_expires.html to understand the interval syntax and how to set a default expiry time using ExpiresDefault.
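As a small sketch of the alternative interval syntax and ExpiresDefault described in that documentation, the fragment below sets a fallback lifetime and restates one of the rules above in the more readable form. The one-day default is an assumption made for the example, not a value suggested by this article.

```apache
<IfModule expires_module>
    ExpiresActive On
    # Fallback for any MIME type without its own ExpiresByType rule.
    # Assumption: one day is an acceptable default lifetime for this site.
    ExpiresDefault "access plus 1 day"
    # Same meaning as A604800: expire one week after the client's access time.
    ExpiresByType text/css "access plus 1 week"
</IfModule>
```

The readable form trades brevity for clarity; both forms can be mixed in the same configuration.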
The Expires module also sets a ‘max-age’ value equal to the specified ‘interval’ and adds it to the Cache-Control header:
Cache-Control: max-age=604800
Note: if you remove or unset the Last-Modified header while Expires is active, browsers are left unable to perform any conditional checking at all. This is not recommended, even though it eliminates the extra overhead of sending conditional requests. With Last-Modified in place, browsers handle the situation intelligently and only revalidate the cached content when you re-request (Refresh) a web page or when max-age is reached. The HTTP specification recommends that servers send Last-Modified with responses whenever feasible.