Good caching procedure (HTML and images)
I have long been curious about how IE handles its cache and how the browser decides whether to take an image from the cache or fetch it from the server. So I used Network Monitor (part of SMS 2.0) to capture all communication between the browser and the server and started experimenting with IE. After an hour of digging through the IE documentation, I found out that prior to version 4.0 there was no such thing as a "cached object": by default (if the server does not send any special HTTP headers), IE always contacts the server to check whether an updated version of the resource is available.
(Standard IIS installation, no special HTTP headers sent):
1. On the first request to the server, the browser sends an ordinary GET request, and the server responds with the resource itself.
Note the Last-Modified header, which corresponds to the file's last-modified time in the file system. This value plays a big role in all subsequent requests sent to the server.
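As a rough illustration, a server could derive the Last-Modified value from the file's modification time roughly as below. This is a minimal Python sketch, not how IIS actually implements it; the function names `http_date` and `last_modified_header` are hypothetical.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    # usegmt=True emits the literal "GMT" zone that HTTP dates require,
    # e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    return formatdate(timestamp, usegmt=True)


def last_modified_header(path: str) -> str:
    # Hypothetical helper: use the file-system modification time of the
    # requested file as the Last-Modified header value
    return http_date(os.path.getmtime(path))
```

For example, `last_modified_header("index.html")` (a hypothetical file) would return the file's modification time formatted as an HTTP date.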
2. All subsequent requests for the resource are a little different. The request now carries an additional header, If-Modified-Since, whose value equals the Last-Modified value the browser received in the previous exchange. The server's response is also quite different from the one above: if the file has not been modified since the time the browser sent, the server returns status code 304 (Not Modified; 0x0130 in hexadecimal), which tells the browser that no updated version is available and that it can load the resource from its cache.
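The server side of this conditional GET can be sketched in a few lines. This is a simplified Python model, assuming the server only compares If-Modified-Since against the file's modification time; `handle_get` is a hypothetical name and the response body is omitted.

```python
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def handle_get(last_modified_ts: float,
               if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET; the body itself is omitted."""
    last_modified = formatdate(last_modified_ts, usegmt=True)
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        # HTTP dates have one-second resolution, so compare whole seconds
        if since is not None and int(last_modified_ts) <= since:
            # 304 Not Modified (0x0130): the browser may use its cached copy
            return 304, {"Last-Modified": last_modified}
    # First request, or the file changed: send the full resource
    return 200, {"Last-Modified": last_modified}
```

A request without If-Modified-Since always gets 200; a request echoing back the same Last-Modified value gets 304 until the file changes.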
This approach has a couple of obvious disadvantages. First, there is always a delay: the browser must establish a TCP/IP connection, send the request, and then wait for the response. Second, it puts additional load on the bandwidth of both client and server, which can be significant over SSL and modem connections. That is why additional headers are available to modify this behavior (see below).
This is what the modified exchange looks like (IIS configured to send the Expires header):
1. On the first request to the server, the browser sends an ordinary GET request, and the server responds with the resource itself.
Note that in addition to the standard Last-Modified header used in the first case, there are now two more headers: Expires and Cache-Control. Frankly, I still don't understand why the browser needs both of them, but based on the values of these headers the browser stamps the resource with an expiration date and puts it into its cache. In this example the cache lifetime is 432000 s, which corresponds to 5 days (5 × 86400 s). The browser uses this information for all subsequent requests to the server.
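The three headers can be produced together as in the Python sketch below. It assumes a 5-day lifetime to match the example; `caching_headers` and `FIVE_DAYS` are hypothetical names. One likely reason both lifetime headers appear is that Expires (an absolute date) comes from HTTP/1.0, while Cache-Control: max-age (a lifetime in seconds relative to the response) was introduced in HTTP/1.1, so sending both covers older and newer caches.

```python
from email.utils import formatdate
from typing import Dict

FIVE_DAYS = 5 * 24 * 60 * 60  # 432000 seconds, the lifetime used in the example


def caching_headers(now_ts: float, last_modified_ts: float,
                    max_age: int = FIVE_DAYS) -> Dict[str, str]:
    return {
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
        # HTTP/1.0 style: absolute expiry date
        "Expires": formatdate(now_ts + max_age, usegmt=True),
        # HTTP/1.1 style: lifetime in seconds, counted from the response
        "Cache-Control": f"max-age={max_age}",
    }
```

For a response generated at the Unix epoch, this yields `Expires: Tue, 06 Jan 1970 00:00:00 GMT` and `Cache-Control: max-age=432000`.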
2. For all subsequent requests, IE just checks whether the cached content has expired; if it has not, IE does not make any trip to the server for the resource at all.
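The browser-side decision can be modeled roughly as follows. This is a simplified Python sketch, not IE's actual logic: it ignores the Age header and heuristic freshness, and the names `max_age_seconds` and `is_fresh` are hypothetical. It follows the HTTP/1.1 rule that max-age takes precedence over Expires when both are present.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    # Return the first max-age directive's value, if any
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(headers: Mapping[str, str], received_ts: float, now_ts: float) -> bool:
    """True means the cached copy can be used without contacting the server."""
    max_age = max_age_seconds(headers.get("Cache-Control", ""))
    if max_age is not None:
        # HTTP/1.1: max-age overrides Expires
        return now_ts - received_ts < max_age
    expires = headers.get("Expires")
    if expires is not None:
        try:
            return now_ts < parsedate_to_datetime(expires).timestamp()
        except (TypeError, ValueError):
            return False  # unparseable Expires: treat as already expired
    # No explicit lifetime: revalidate with If-Modified-Since
    return False
```

When `is_fresh` returns False, the browser falls back to the conditional GET described in the first case.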
This method is extremely helpful for company home pages, which usually do not change as frequently as other pages and so can be cached on the client side.
If you have questions or any additions (especially if you know why the browser needs both Expires and Cache-Control: max-age), please email me.
P.S. Below is a very nasty problem I ran into when implementing this method on a cluster of web servers. Please read it, because there is a chance you will face the same problem. I was managing a cluster of 3 web servers, and this configuration worked fine on server no. 1 but not on servers 2 or 3. The machines were identical, and I spent several hours digging through the registry and the metabase for a possible solution. Network Monitor helped me again: capturing the traffic from the faulty servers showed that the Cache-Control header carried an incorrect value, with max-age for some reason set to 0 instead of the intended 432000. I still don't understand why that was happening, but I tried the simplest solution and, to my surprise, it worked: I added another header with the same name, so the response carried two Cache-Control headers. That fixed the problem; the browser gave precedence to the larger value, and after that all the servers worked fine.
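The workaround can be modeled as below. HTTP allows several header fields with the same name to be treated as one comma-separated value, so the two headers become `max-age=0, max-age=432000`. The "largest max-age wins" rule in this Python sketch only reproduces the behavior observed above; later HTTP caching specifications (RFC 7234, RFC 9111) treat conflicting duplicate max-age directives as a problem and allow a cache to consider such a response stale, so other browsers may not behave this way. Both function names are hypothetical.

```python
from typing import List, Optional


def combine_header(values: List[str]) -> str:
    # Several fields with the same name may be merged into one comma-separated value
    return ", ".join(values)


def max_age_like_observed_ie(cache_control: str) -> Optional[int]:
    """Model of the observed behavior: the largest max-age directive wins.
    Not what current HTTP caching specs require."""
    ages = []
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            ages.append(int(value))
    return max(ages) if ages else None
```

With the faulty `max-age=0` header plus the added `max-age=432000` header, this model yields a lifetime of 432000 seconds, matching what the browser did on the cluster.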