Easy commandline argument parsing in shell script

Here is a fairly easy method for parsing all the arguments passed to a shell script on the command line. The block of code can either be included at the top of the script or sourced from an external file.

Using this code within a shell script is simple: define a global variable named ARGS_INPUT_FORMAT and set it to the appropriate value. The argument parsing code reads this variable and sets the inputs and outputs accordingly.

Set values to the variable ARGS_INPUT_FORMAT as shown below.


Turn your website into maintenance mode – SEO friendly solution using .htaccess file

Here is a simple trick for putting your website into maintenance mode using the .htaccess file.

If you want to show an informative message such as “We are under maintenance”, create a custom HTML page containing the message and put it in the website root. For example, create an HTML page called “under-maintenance.html” and place it in the website root so that it can be accessed through the URL http://www.yourdomain.com/under-maintenance.html

Set up status “503” and header “Retry-After”

Depending on the maintenance duration, we also have to consider the search engines that may visit the website during maintenance. To handle this, we send the HTTP status code 503, “Service Unavailable”, along with the HTML page containing the maintenance information. We also use the Retry-After header to tell search engines to visit the website again only after the maintenance is complete.


Understanding Cache control and enabling it for optimal results

Cache control is a mechanism for controlling the caching behavior of a web browser. By default, all modern browsers know which content to cache and reuse when it is needed again. Browsers use their own algorithms to decide whether cached content can be safely reused. Normally, a browser caches all the images in a web page and checks their freshness against the server’s copy whenever a new request comes in. If the content has been modified on the server, the browser simply downloads it again.

The mechanism of content caching is simple, but only an effective server configuration makes it possible to use this feature fully and properly.

Five response headers and their values mainly control the caching behavior in HTTP client-server communication.

1. Date
2. Last-Modified
3. Cache-Control
4. ETag
5. Expires

Date & Last-Modified

A web server sets the Date and Last-Modified headers by default in its response unless directed otherwise. The Last-Modified value helps a browser identify whether the content has been modified on the server. Browsers can send conditional requests to verify the freshness of a resource whenever required. This header is the minimum requirement, at least when no other cache control headers are present.

Cache-Control

Cache-Control header values are a set of control instructions which one end can ask the other to obey. A browser can send its cache preferences to the server, and a web server can direct the browser to follow certain caching behaviors. These include commonly used directives such as “no-cache”, “no-store”, “max-age” and “must-revalidate”. These control instructions always take precedence over any other cache control header values and browser algorithms.
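To make the directive list concrete, here is a minimal sketch that pulls the “max-age” value out of a Cache-Control header value. The function name cache_max_age is hypothetical and used only for illustration, and the sketch only matches the lowercase spelling of the directive.

```shell
#!/bin/sh
# Hypothetical helper: print the max-age value (in seconds) found in a
# Cache-Control header value, or print nothing if the directive is absent.
cache_max_age() {
    # Split the comma-separated directives onto separate lines,
    # then keep only the number that follows "max-age=".
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -n 's/^[[:space:]]*max-age=\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

cache_max_age 'public, max-age=604800, must-revalidate'   # prints 604800
```

A header value without the directive, such as 'no-store', produces no output.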

ETag & Expires

ETag and Expires are two different methods of implementing improved client-side caching. An ETag (Entity Tag) is effectively a checksum of the resource’s attributes: a unique value generated dynamically by the server or the application. For a ‘static file’ resource, this can be the ‘sum’ of the size, modification time and inode number. An application can generate this tag value based on its own criteria and set it in the resource’s response header.

Expires is an entirely different mechanism which guarantees that a cached resource is valid until the expiry time is reached. A combination of Expires and Last-Modified values helps manage the cache control of static files simply and efficiently.
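As a rough illustration of how such a tag could be derived for a static file, the sketch below combines the inode number, size and modification time into one quoted string. It assumes GNU stat (the -c option); the function name file_etag is hypothetical, and the result is not necessarily the exact value Apache would produce.

```shell
#!/bin/sh
# Hypothetical helper: build an ETag-like validator from a file's
# inode number, size and modification time. Assumes GNU stat.
file_etag() {
    # %i = inode number, %s = size in bytes, %Y = mtime in seconds since epoch
    attrs=$(stat -c '%i %s %Y' -- "$1") || return 1
    # Split the three numbers into the positional parameters.
    set -- $attrs
    # Print the three values in hexadecimal, joined by dashes and quoted.
    printf '"%x-%x-%x"\n' "$1" "$2" "$3"
}

# Usage: tag a temporary file, then clean up.
tmp=$(mktemp)
file_etag "$tmp"
rm -f "$tmp"
```

Because the inode number is part of the value, the same file copied to another server would normally get a different tag.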

How to set up my server to use appropriate cache control headers

Many people prefer to avoid ETag unless it is required for a specific reason, and many believe that removing it can improve the server’s own performance. The concept of ETag is similar to that of the Last-Modified header: a conditional request based on the last known ETag value can be sent to the server to verify the freshness of a resource, and the server answers with a 304 (Not Modified) response if the resource has not changed. ETag is more accurate than Last-Modified, but for static files it is usually not necessary to generate this extra header. ETag is useful when you need strong validation of dynamically generated content. There is also a known issue with ETag when multiple servers are used to load balance the traffic: the generated ETag differs from server to server, because attributes such as the inode number are server-specific, and this confuses clients.

You can set or unset the ETag header as shown below.
To generate ETag for static files in Apache using all available attributes,

FileETag All

To unset this behavior explicitly,

<IfModule headers_module>
   Header unset ETag
</IfModule>
FileETag None

Note: ‘headers_module’ needs to be enabled for this to work.

The combination of Expires and Last-Modified headers with appropriate values gives the best result. The Expires directives can be set in Apache as follows:

<IfModule expires_module>
   ExpiresActive On
   ExpiresByType application/javascript A604800
   ExpiresByType application/x-javascript A604800
   ExpiresByType text/css A604800
   ExpiresByType image/gif A604800
   ExpiresByType image/jpeg A604800
   ExpiresByType image/png A604800
   ExpiresByType image/x-icon A604800
   ExpiresByType application/x-shockwave-flash A604800
</IfModule>

If ‘expires_module’ is enabled and ExpiresActive is set to ‘On’, the Expires functionality is turned on. The ExpiresByType directive allows specifying different ‘interval’ values for different MIME types. See http://httpd.apache.org/docs/2.2/mod/mod_expires.html to understand the interval syntax and how to set a default expiry time using ExpiresDefault.

The Expires module also sets a ‘max-age’ value equal to the specified ‘interval’ and adds it to the Cache-Control header.

Cache-Control: max-age=604800
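The value 604800 used throughout is simply seven days expressed in seconds. A small sketch of the arithmetic, where days_to_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a number of days into seconds,
# the unit used by the "A<seconds>" interval and by max-age.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7    # prints 604800

# Build an ExpiresByType line for a one-week expiry.
printf 'ExpiresByType %s A%s\n' text/css "$(days_to_seconds 7)"
# prints: ExpiresByType text/css A604800
```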

Note that if you remove or unset the Last-Modified header while Expires is active, browsers are forced to skip conditional checking altogether. This is not recommended, even though it eliminates the overhead of sending conditional requests. Browsers handle this situation intelligently and re-validate the cached content only when you re-request (refresh) a web page or when max-age is reached. The HTTP RFCs recommend that the Last-Modified header be sent with every response.

HTTP / Web server troubleshooting using Wget.

There are a few useful options to the powerful wget command, a non-interactive Linux/Unix command-line downloader, which help you identify various HTTP server responses, performance-related issues and optional feature support.

To probe an HTTP server and identify its response, we can use the spider option.

wget --spider http://www.google.com

--06:24:36--  http://www.google.com/
Resolving www.google.com... 74.125.53.103, 74.125.53.99, 74.125.53.104, ...
Connecting to www.google.com|74.125.53.103|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
200 OK


kSar – an easy sar grapher utility

kSar is an easy-to-use application for creating graphical reports from your daily sar reports. kSar is written in Java and can generate reports in PDF, HTML, CSV, JPG and PNG formats.

Screenshot: kSar CPU Statistics

Sar is a system activity reporting tool which comes with the sysstat RPM package. Sar collects your system activity every 10 minutes and stores it in /var/log/sa/saXX, where XX is the zero-padded two-digit day of the month. Once sar is set up on your Linux system, kSar can read the daily report file generated by sar from /var/log/sa/sarXX and generate the graphical report.

kSar can work in GUI as well as CLI mode. GUI mode lets you visualize the sar report with customizable, easy-to-read graphs.

CLI mode gives you the flexibility of reading sar reports from a specified input file and creating graphical output in various formats such as PDF, HTML or images. The command-line interface is useful if you want to generate a report daily and send it via email using a cron job.
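The file naming scheme can be sketched as a small shell function that builds the path of the daily report for a given day of the month, defaulting to today. The function name sar_report_path is hypothetical; only the /var/log/sa/sarXX pattern comes from the description above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the sar daily report for a
# given day of the month (defaults to today).
sar_report_path() {
    day=${1:-$(date +%d)}
    # Strip a leading zero so printf does not read "08" or "09" as octal.
    day=${day#0}
    # Zero-pad the day back to two digits.
    printf '/var/log/sa/sar%02d\n' "$day"
}

sar_report_path 5    # prints /var/log/sa/sar05
```

Called without an argument, the function uses the current day of the month.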


FTP firewall issues in Passive mode

In Linux, the default FTP mode is “Passive”, whereas it is “Active” in Windows. In Passive mode, the client connects to a high port on the server for the data connection. This port is unpredictable and can be anywhere from 1024 to 65535. Different client connections use different ports, so it is difficult to know which port must be opened on the server side to allow the data connection from the client. If you use a firewall (say iptables) and block all ports except 21 (the FTP control port), data transfer between client and server will be blocked in Passive mode.


Rsync: The powerful network copy tool

rsync is a small, lightweight, easy-to-use Linux command-line tool which can transfer any number of files from a source to a destination over any kind of network. Especially when copying files over limited bandwidth, rsync is much faster and more reliable.
