Four Tips for Setting up HTTP File Downloads

calendarMarch 24, 2010 in Firefox , HTTP , HTTPS , HttpWatch , Internet Explorer , Javascript , Optimization

Web sites don’t just contain pages; sometimes you need to provide files that users can download. Putting a file on your web server and linking to it from an HTML page is just the first step. You also need to be aware of the HTTP response headers that affect file downloads.

These four tips cover some of the issues you may run into:

Tip #1: Forcing a Download and Controlling the File Name

Providing a download link in the HTML is easy:

...
<a href="http://download.httpwatch.com/httpwatch.exe">Download</a>
...

It works well for binary files like setup programs and ZIP archives that the browser doesn’t know how to display. A dialog is displayed allowing the user to save the file locally:

IE File Save Dialog

The trouble is that the browser behaves differently if the file is something that it can display itself. For example, if you link to a plain text file the browser just opens it and doesn’t prompt to save the download:

Plain Text in IE

You can force the use of the file download dialog by adding the following response header:

Content-Disposition: attachment; filename=<file name.ext>

The header also allows you to control the default file name. This can be handy if you’re generating the content in something like getfile.aspx but you want to supply a more meaningful file name to the user.

For static content you can manually configure the additional header in your web server. For example, here’s the setting in IIS:

content_disposition_header

For dynamically generated content you would need to add this header in the page’s server side code.

After adding the header, the browser will always prompt the user to download the file:

plain_text_download

Tip #2: Use Effective HTTP Caching

Like any other content, it’s worth setting up HTTP caching to maximize the speed of download and minimize your bandwidth costs. Usually content needs to expire immediately or be cached forever.

Our example download of the HTTP spec (RFC2616) could be cached forever because it is not expected to change. You can see here in HttpWatch we have set up a far futures Expires value and set Cache-Control to public :

effective_caching

This allows future downloads of the file to be delivered from the local browser cache or an intermediate proxy. If the file is subject to frequent changes, you may want to expire it immediately so that a fresh copy is always downloaded. You can do this by setting Expires to -1 or any date in the past.

Tip #3: Don’t break HTTPS downloads in IE

It’s tempting to use the no-store and no-cache directives with the Cache-Control response header to prevent any caching of a file that is often updated:

Cache-Control: no-store, no-cache

This works in Firefox, but watch out for Internet Explorer. It interprets these flags as meaning that the content should never be saved to the disk when HTTPS is being used and causes the file download dialog to hang at 0% for several minutes:

https_ie_hang

It eventually displays an error message:

https_ie_error

There’s more information about this problem and other possible causes in a post on Eric Lawrence’s IEInternals blog.

Tip #4: Don’t Forget to Setup Analytics

You’ll probably want to track file downloads along with other metrics from your web site. Javascript based solutions such as Google Analytics are very popular, but will not show file downloads by default. This is because downloading a file does not cause any Javascript to be executed.

With Google Analytics you need to add an onlick handler to enable download tracking:

...
<a onclick="pageTracker._trackPageview('/httpwatch.exe');" href="...">Download</a>
...

You can see the Google Analytics call being made just before the file download starts:

ga_download

Ajax Caching: Two Important Facts

calendarAugust 7, 2009 in Caching , Firefox , HTTP , HttpWatch , Internet Explorer

Ajax calls are just like any other HTTP request that might be used to build a web page. However, due to their dynamic nature people often overlook the benefit of caching them.

Rule 14 of High Performance Web Sites states:

Make Ajax Cacheable

Make sure your Ajax requests follow the performance guidelines, especially having a far future Expires header.

The rest of this blog post covers two important facts that will help you understand and effectively apply caching to Ajax requests.

Fact #1 : Ajax Caching Is The Same As HTTP Caching

The HTTP and Cache sub-systems of modern browsers are at a much lower level than Ajax’s XMLHttpRequest object. At this level, the browser doesn’t know or care about Ajax requests. It simply obeys the normal HTTP caching rules based on the response headers returned from the server.

If you know about HTTP caching already, you can apply that knowledge to Ajax caching. The only real difference is that you may need to setup response headers in a different way to static files.

The following response headers are used to make your Ajax cacheable:

  • Expires: This should be set to an appropriate time in the future depending on how often the content changes. For example, if it is a stock price you might set an Expires value 10 seconds in the future. For a photograph, you might set a far futures Expires header because you don’t ever expect it to change. The Expires header allows the browser to reuse the cached content for a period of time and avoid any unnecessary round-trips to the server.
  • Last-Modified: It’s a good idea to set this so that the browser can use an If-Modified-Since header in a conditional GET request to check its locally cached content. The server would respond with a 304 status code if the data doesn’t require an update.
  • Cache-Control: If appropriate, this should be set to ‘Public’ so that intermediate proxies and caches can store and share the content with other users  It will also enable caching of HTTPS requests on Firefox.

Of course, this doesn’t apply if you use the POST method in your Ajax requests, because POST requests are never cached. You should always use the POST method if your Ajax request has side effects, e.g. moving money between bank accounts.

We’ve setup a Ajax caching demo that shows these headers in action. In HttpWatch, you can see that we’ve set all three of these headers in the Ajax response:

Ajax Caching Headers

If you click on the ‘Ajax Update’ button at regular intervals, the time only changes approximately once a minute because the Expires header is set to one minute in the future. In this HttpWatch screenshot you can see that repeated clicks of the update button cause Ajax requests that read directly from the browser cache and result in no network activity (i.e. the value in the Sent and Received columns is zero bytes) :

Ajax Caching

The final click at 1:06.531 does result in an Ajax request that requires a network round-trip, because the cached data is now more than one minute old. The 200 response from the server indicates that a fresh copy of the content was downloaded.

Fact #2: IE Doesn’t Refresh Ajax Based Content Before Its Expiration Date

Sometimes Ajax is used at load time to populate sections of a page (e.g. a price list). Instead of being triggered by a user event such as a button click, it is directly called from the Javascript that runs when the page is loaded. This makes the Ajax call behave as if it were a request for an embedded resource.

As you develop a page like this, it is tempting to refresh the page in an attempt to update the embedded Ajax content. With other embedded resources such as CSS or images, the browser automatically sends the following types of requests depending on whether F5 (Refresh) or Ctrl+F5 (Forced Refresh) is used:

  1. F5(Refresh) causes the browser to build a conditional update request if the content originally had a Last-Modified response header. It uses the If-Modified-Since request header so that server can avoid unnecessary downloads where possible by returning the HTTP 304 response code.
  2. Ctrl+F5 (Forced Refresh) causes the browser to send an unconditional GET request with a Cache-Control request header set to ‘no-cache’. This indicates to all intermediate proxies and caches that the browser needs the latest version of the resource regardless of what has already been cached.

Firefox propagates the type of refresh down to any Ajax request that is made during the loading of the page and will therefore update any Ajax derived content as if it were an embedded resource. This screen shot of the HttpWatch plugin-in for Firefox shows the effect of refreshing our Ajax Caching demo page:

Refresh of Ajax Request in Firefox

Firefox ensured that the Ajax request was issued as a conditional GET. The server responds with a 304 in our demo if the cached data is less than 10 seconds old or a 200 response with the updated content if it is out of date.

In Internet Explorer, the load-time Ajax request is treated as though it is unrelated to the rest of the page refresh and there is no propagation of the user’s Refresh action. No GET request is sent to the server if the cached Ajax content has not yet expired. It simply reads the content directly from the cache, resulting in the (Cache) result value in HttpWatch. Here’s the effect of F5 in IE before the content has expired:

IE Refresh of Ajax Request

Even with Ctrl+F5, the Ajax derived content is still read from the cache:

IE Forced Refresh

This means that any Ajax derived content in IE is never updated before its expiration date – even if you use a forced refresh (Ctrl+F5). The only way to ensure you get an update is to manually remove the content from the cache. In HttpWatch, you can do this using the Tools menu:

Clear cache Entry

HTTPS Performance Tuning

calendarJanuary 15, 2009 in Caching , Firefox , HTTP , HTTPS , HttpWatch , Internet Explorer , Optimization

An often overlooked aspect of web performance tuning is the effect of using the HTTPS protocol to create a secure web site. As applications move from the desktop onto the web, the need for security and privacy means that HTTPS is now heavily used by web sites that need to be responsive enough for every day use.

The tips shown below may help you to avoid some of the common performance and development problems encountered on sites using HTTPS:

Tip #1: Use Keep-Alive Connections

Whenever a browser accesses a web site it must create one or more TCP connections. That can be in lengthy operation even with normal unsecured HTTP.  The use of Keep-Alive connections reduces this overhead by reusing TCP connections for multiple HTTP requests. The screen shot below from HttpWatch shows the TCP connection time over HTTP is approximately 130 milliseconds for our web site when it is accessed from the UK:

Using HTTPS, the lack of Keep-Alive connections can lead to an even larger degradation in performance because an SSL connection also has to be setup once the TCP connection has been made. This requires several roundtrips between the browser and web server to allow them to agree on an encryption key (often referred to as the SSL session key) . The corresponding connection time to the same server using HTTPS is nearly four times longer as it includes the HTTPS overhead:

If a HTTPS connection is re-used the overhead of the both the TCP connection and SSL handshake are avoided.

Some web browsers and servers now allow the re-use of these SSL session keys across connections, but you may not always have control over the web server configuration or the type of browser used.

Tip #2: Avoid Mixed Content Warnings

In a previous blog post we talked about the confusing and annoying dialog that is displayed by default if a secure page uses any HTTP resources:

To stop this warning dialog interrupting the page download you need to ensure that everything on the page is accessed over HTTPS. It doesn’t have to be from the same site but it must use HTTPS. For example, the addition of HTTPS support allows the Google Ajax libraries to be safely loaded from secure pages.

Tip #3: Use Persistent Caching For Static Content

If you follow Tip #2 then everything that your page needs, including images, CSS and Javascript, will be accessed over HTTPS. You would normally want to persistently cache static content like this for as long as possible to reduce load on the web site and improve performance when a user revisits your site.

Of course, you wouldn’t want to cache anything on the disk that was specific to the user (e.g. HTML page with account details or a pie chart of their monthly spending). On most pages though, nearly all of the non-HTML content can be safely stored, shared and re-used.

There seems to be some confusion over whether caching is possible with HTTPS. For example, Google say this about Gmail over HTTPS:

You may find that Gmail is considerably slower over the HTTPS connection, because browsers do not cache these pages and must reload the code that makes Gmail work each time you change screens.

Although, 37Signals acknowledge that in-memory caching is possible, they say that persistent caching is not possible:

The problem is that browsers don’t like caching SSL content. So when you have an image or a style sheet on SSL, it’ll generally only be kept in memory and may even be scrubbed from there if the user is low on RAM (though you can kinda get around that).

Even when you do your best to limit the number of style sheets and javascript files and gzip them for delivery, it’s still mighty inefficient and slow to serve them over SSL every single time the user comes back to your site. Even when nothing changed. HTTP caching was supposed to help you with that, but over SSL it’s almost all for naught.

In reality, persistent caching is possible with HTTPS using both IE and Firefox.

Using HttpWatch you can see if content is loaded from the cache by looking for (Cache) in the Result column or by looking for the blue Cache Read block in the time chart. Here is an example of visiting https://www.httpwatch.com with a primed cache in IE. You can see that all the static resources are reloaded directly from the cache without a round trip to the web server:

If you try this in Firefox 3.0 without adjusting your response headers you will see this instead:

The ‘200’ values in the Result column indicate that the static content is being reloaded even though the site was previously visited and a valid Expires setting was used. Unless you specify otherwise, Firefox will put HTTPS resources into the in-memory cache so that they can only be re-used within a browser session. When Firefox closes the contents of the in-memory cache is lost.

The about:cache page in Firefox confirms that these files are stored using the in-memory cache:

To allow persistent caching of secure content in Firefox you need to set the Cache-Control response header to Public:

This value moves HTTPS based content into the persistent Firefox disk cache and in the case of https://www.httpwatch.com it more than halves the page load time due to the decrease in network round trips and TCP/SSL connections:

The Public cache setting is normally used to indicate that the content is not per user and can be safely stored in shared caches such as HTTP proxies. With HTTPS this is meaningless as proxies are unable to see the contents of HTTPS requests. So Firefox cleverly uses this header to control whether the content is stored persistently to the disk.

This feature was only added in Firefox version 3.0 so it won’t work with older versions. Fortunately, the take up version 3.0 is reported to be much faster than IE 6 to 7.

Tip #4: Use an HTTPS Aware Sniffer

A network sniffer is an invaluable tool for optimizing and debugging any client server application. But if you use a network level tool like Netmon or WireShark you cannot view HTTPS requests without access to the private key that the web site uses to encrypt SSL connections.

Often organizations will not allow the use of private keys outside of a production environment, and even if you have administrative access to your web site, getting the private key and using it to decrypt network traffic is not an easy task.

HttpWatch was originally born out of the frustration of trying to debug secure sites. Viewing HTTPS traffic in HttpWatch is easy as it integrates directly with IE or Firefox, and therefore has access to the unencrypted version of the data that is transmitted over HTTPS.

The free Basic Edition shows high level data and performance time charts for any site, and lower level data for a number of well know sites including Amazon.com, eBay.com, Google.com and HttpWatch.com . For example, you can try it with HTTPS by going to https://www.httpwatch.com .

You can check SSL/TLS configuration our new SSL test tool SSLRobot . It will also look for potential issues with the certificates, ciphers and protocols used by your site. Try it now for free!

Ready to get started? TRY FOR FREE Buy Now