Put your website into maintenance mode – an SEO-friendly solution using the .htaccess file

Here is a simple trick that lets you put your website into maintenance mode using the .htaccess file.

If you want to show an informative message such as “We are under maintenance”, create a custom HTML page containing the message and put it in the website root. For example, create an HTML page called “under-maintenance.html” and place it in the website root so that it can be reached at the URL http://www.yourdomain.com/under-maintenance.html

Set up status “503” and header “Retry-After”

Depending on how long the maintenance lasts, we also have to think about the search engines that may visit the website in the meantime. To handle this, we send the HTTP status code 503, “Service Unavailable”, along with the HTML page containing the maintenance information. We also send a Retry-After header, which tells search engines to visit the website again only after the maintenance is completed.
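A minimal sketch of how this could look in .htaccess is shown below. It is one common approach rather than the only one, and it assumes that mod_rewrite and mod_headers are enabled, that the maintenance page is the under-maintenance.html file described above, and that one hour (3600 seconds) is a suitable Retry-After value. Adjust the value to the expected maintenance window.

```apache
# Serve the maintenance page as the body of every 503 response
ErrorDocument 503 /under-maintenance.html

<IfModule rewrite_module>
    RewriteEngine On
    # Skip the maintenance page itself, otherwise the error document would loop
    RewriteCond %{REQUEST_URI} !^/under-maintenance\.html$
    # Answer every other request with 503 Service Unavailable
    RewriteRule ^ - [R=503,L]
</IfModule>

<IfModule headers_module>
    # "always" adds the header to error responses too;
    # 3600 seconds is an assumed value, not a recommendation
    Header always set Retry-After "3600"
</IfModule>
```

With rules like these, images or stylesheets referenced by the maintenance page would also receive a 503, so the page is best kept self-contained (inline styles, no external files). Remove the rules once the maintenance is over.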


Understanding Cache control and enabling it for optimal results

Cache control is a mechanism used to control the caching behavior of a web browser. By default, all modern browsers know which content to cache and reuse when it is needed again. Browsers use their own algorithms to decide whether cached content can safely be reused. Normally, a browser caches all the images on a web page and checks their freshness against the server’s copy whenever a new request comes in. If the content has been modified on the server, the browser simply downloads it again.

The mechanism of content caching is pretty simple, but this feature can only be used fully and properly when the server is configured effectively.

Five response headers and their values mainly control the caching behavior in HTTP client-server communication.

1. Date
2. Last-Modified
3. Cache-Control
4. ETag
5. Expires

Date & Last-Modified

A web server sets the Date and Last-Modified headers by default in its responses for static files unless directed otherwise. The Last-Modified value helps a browser determine whether the content has been modified on the server. Browsers can send conditional requests (If-Modified-Since) to verify the freshness of a resource whenever required. This header is the minimum requirement when no other cache control headers are present.

Cache-Control

Cache-Control header values are a set of directives that one end can require the other to obey. A browser can state its cache preferences in its request to the server, and a web server can require the browser to follow certain caching behaviors. Commonly used directives include “no-cache”, “no-store”, “max-age” and “must-revalidate”. These directives always take precedence over any other cache control header values and over the browser’s own algorithms.
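As an illustration, these directives can be sent from Apache with mod_headers. The sketch below is only an example under assumptions: the file extensions and the one-week lifetime are chosen for illustration, and it is meant as an alternative to Expires-based rules rather than something to combine with them.

```apache
<IfModule headers_module>
    # Static assets: any cache may store them for one week (604800 seconds)
    <FilesMatch "\.(css|js|gif|jpe?g|png|ico)$">
        Header set Cache-Control "public, max-age=604800"
    </FilesMatch>

    # HTML pages: may be stored, but must be revalidated with the server before reuse
    <FilesMatch "\.html?$">
        Header set Cache-Control "no-cache"
    </FilesMatch>
</IfModule>
```

Note that “no-cache” does not forbid caching; it forces revalidation on every use. “no-store” is the directive that forbids storing the response at all.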

ETag & Expires

These are two different methods of implementing improved client-side caching. An ETag (Entity Tag) is effectively a checksum of the resource’s attributes: a unique value generated dynamically by the server or the application. For a static file, it can be derived from the size, modification time and inode number. An application can generate the tag based on its own criteria and set it in the resource’s response header.

Expires is an entirely different mechanism, which guarantees that a cached resource stays valid until the expiry time is reached. A combination of Expires and Last-Modified values helps manage the cache control of static files simply and efficiently.

How to set up my server to use appropriate cache control headers

Many people prefer to avoid ETag unless it is required for a specific reason, and many believe that removing it can improve the server’s own performance. The concept of ETag is similar to that of the Last-Modified header: a conditional request based on the last known ETag value (If-None-Match) can be sent to the server to verify the freshness of a resource, and the server answers with a 304 (Not Modified) response if the resource has not changed. ETag is more accurate than Last-Modified, but for static files it is usually not necessary to generate this extra header. ETag is useful when you need strong validation of dynamically generated content. There is also a known issue with ETag when multiple servers are used to load-balance the traffic: the generated ETag can differ from server to server (for example, because the inode number is part of it), which confuses clients.

You can set or unset the ETag header as shown below.
To generate ETags for static files in Apache using all available attributes:

FileETag All

To unset this behavior explicitly,

<IfModule headers_module>
   Header unset ETag
</IfModule>
FileETag None

Note: ‘headers_module’ needs to be enabled for this to work.

The combination of Expires and Last-Modified headers with appropriate values gives the best result. The following shows how to set Expires directives in Apache:
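For the load-balancing issue mentioned above, a commonly used alternative to removing ETag altogether is to leave the inode number out of the tag. The sketch below assumes that each file is deployed with the same size and modification time on every server; Apache 2.4 already uses this setting as its default.

```apache
# Build the ETag from modification time and size only, not the inode number
FileETag MTime Size
```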

<IfModule expires_module>
   ExpiresActive On
   ExpiresByType application/javascript A604800
   ExpiresByType application/x-javascript A604800
   ExpiresByType text/css A604800
   ExpiresByType image/gif A604800
   ExpiresByType image/jpeg A604800
   ExpiresByType image/png A604800
   ExpiresByType image/x-icon A604800
   ExpiresByType application/x-shockwave-flash A604800
</IfModule>

If ‘expires_module’ is enabled and ExpiresActive is set to ‘On’, the Expires functionality is turned on. The ExpiresByType directive allows different ‘interval’ values to be specified for different MIME types; A604800 means 604800 seconds (one week) after the time of access. See http://httpd.apache.org/docs/2.2/mod/mod_expires.html to understand the interval syntax and how to set a default expiry time using ExpiresDefault.

The Expires module also sets a ‘max-age’ value equal to the specified ‘interval’ and adds it to the Cache-Control header:

Cache-Control: max-age=604800

Note that if you remove or unset the Last-Modified header while Expires is active, browsers are forced not to use any conditional checking at all. This is not recommended, even though it eliminates the extra overhead of sending conditional requests. Browsers handle this situation intelligently: they re-validate the cached content only when you re-request (refresh) a web page or when max-age is reached. The HTTP RFCs recommend that servers send the Last-Modified header whenever feasible.
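The same module also accepts a more readable form of the interval, which may be easier to maintain than raw seconds. The sketch below is illustrative only; the chosen MIME types and lifetimes are assumptions, not recommendations.

```apache
<IfModule expires_module>
    ExpiresActive On

    # Fallback for any MIME type without its own ExpiresByType rule
    ExpiresDefault "access plus 1 day"

    # Equivalent to A604800: one week counted from the time of access
    ExpiresByType text/css "access plus 1 week"

    # "modification" (M prefix in the short form) counts from the
    # file's modification time instead of the access time
    ExpiresByType image/png "modification plus 1 month"
</IfModule>
```

An expiry based on modification time makes every cached copy expire at the same moment, which suits files that are replaced on a fixed schedule; access-based expiry is the usual choice otherwise.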

HTTP / Web server troubleshooting using Wget.

There are a few useful options to the powerful wget command, a non-interactive Linux/Unix command-line downloader, that help you identify various HTTP server responses, performance-related issues and support for optional features.

To probe an HTTP server and check its response, we can use the --spider option.

wget --spider http://www.google.com

--06:24:36--  http://www.google.com/
Resolving www.google.com... 74.125.53.103, 74.125.53.99, 74.125.53.104, ...
Connecting to www.google.com|74.125.53.103|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
200 OK
