6 Things You Should Know About Fragment URLs

calendarMarch 1, 2011 in HttpWatch

1. A Fragment URL Specifies A Location Within A Page

Any URL that contains a # character is a fragment URL. The portion of the URL to the left of the # identifies a resource that can be downloaded by a browser and the portion on the right, known as the fragment identifier, specifies a location within the resource:

In HTML documents, the browser looks for an anchor tag <a> with an id attribute matching the fragment. For example, in the URL shown above the browser finds a matching tag in the Printing Support heading:

Printing Support

 

and scrolls the page to display that section:

2. Fragments Are not Sent in HTTP Request Messages

If you try using fragment URLs in an HTTP sniffer like HttpWatch, you’ll never see the fragment IDs in the requested URL or Referer header. The reason is that the fragment identifier is only used by the browser – it doesn’t affect which resource is returned from the server.

Here’s a screen shot of HttpWatch showing the traffic generated by refreshing a fragment URL:

So don’t expect to see fragments identifiers in your server side code.

3. Anything After the First # is a Fragment Identifier

It doesn’t matter if the first # appears to be contained within the host name, path or query string – it always indicates where the fragment identifier starts.

For example, here’s a URL that attempts to encode an HTML color and shape into the query string:

http://example.com/?color=#ffff&amp;shape=circle

Unfortunately, the # in the HTML color makes the rest of the URL a fragment identifier and the server will see a single, empty color parameter in the query string:

4. Changing A Fragment ID Doesn’t Reload a Page but Does Create History

Fragments have a couple of handy features. First, if you manually change a fragment URL from something like this:

http://www.httpwatch.com/features.htm#filter

to this:

http://www.httpwatch.com/features.htm#print

and the browser scrolls the page to the new location but doesn’t reload the page.

However, it does add an entry in the browser’s history so that clicking the Back button will go back to the original location in the page.

These features are particularly useful when used with JavaScript (see below) to create linkable URLs and history for pages that either use top level HTML frames or update their content dynamically with Ajax calls.

5. JavaScript Can Use window.location.hash to Change Fragment IDs

The window object’s hash property allows JavaScript to manipulate the current page’s fragment identifier. As described in 4) this can be used to add history entries for a page without forcing a complete reload.

We recently deployed the help and automation reference for HttpWatch on our web site using the frame based HTML generated by the help authoring tool. Although the content was easily accessible in the browser, the URL in the location bar  didn’t change as you moved between topics making it practically impossible to share URLs for topics of interest.

The solution was to use fragment identifiers and JavaScript to create linkable URLs. The fragment identifier specifies the embedded help topic page:

6. Googlebot Ignores Fragments By Default

The Googlebot is responsible for crawling sites to find content and embedded links that will become part of the Google search index. It fetches and parses HTML, but it’s not a full blown browser and doesn’t have a JavaScript engine. As a consequence, it will normally ignore fragment identifiers and just look at the resource returned from the web server. Any JavaScript used by your page to load or build content will not be executed.

This means it would be impossible for Ajax driven sites to be indexed and have their fragment URLs returned directly in Google searches. To overcome this problem Google supports a convention that allows the Googlebot to turn fragment identifiers into query string parameters.

To use this indexing scheme you would first need to change all your fragment identifiers to start with a ! symbol:

http://www.example.com/ajax.html#mystate

would need to change to:

http://www.example.com/ajax.html#!mystate

The presence of the leading ! indicates to Google that you support this scheme.

Also, your page needs to be able supply the HTML for a given state in response to a query string parameter named _escaped_fragment_ . When the Googlebot needs the content for a given state it supplies the fragment identifier using a simple GET request and a query string value:

http://www.example.com/ajax.html?_escaped_fragment_=mystate

Ajax Caching: Two Important Facts

calendarAugust 7, 2009 in Caching , Firefox , HTTP , HttpWatch , Internet Explorer

Ajax calls are just like any other HTTP request that might be used to build a web page. However, due to their dynamic nature people often overlook the benefit of caching them.

Rule 14 of High Performance Web Sites states:

Make Ajax Cacheable

Make sure your Ajax requests follow the performance guidelines, especially having a far future Expires header.

The rest of this blog post covers two important facts that will help you understand and effectively apply caching to Ajax requests.

Fact #1 : Ajax Caching Is The Same As HTTP Caching

The HTTP and Cache sub-systems of modern browsers are at a much lower level than Ajax’s XMLHttpRequest object. At this level, the browser doesn’t know or care about Ajax requests. It simply obeys the normal HTTP caching rules based on the response headers returned from the server.

If you know about HTTP caching already, you can apply that knowledge to Ajax caching. The only real difference is that you may need to setup response headers in a different way to static files.

The following response headers are used to make your Ajax cacheable:

  • Expires: This should be set to an appropriate time in the future depending on how often the content changes. For example, if it is a stock price you might set an Expires value 10 seconds in the future. For a photograph, you might set a far futures Expires header because you don’t ever expect it to change. The Expires header allows the browser to reuse the cached content for a period of time and avoid any unnecessary round-trips to the server.
  • Last-Modified: It’s a good idea to set this so that the browser can use an If-Modified-Since header in a conditional GET request to check its locally cached content. The server would respond with a 304 status code if the data doesn’t require an update.
  • Cache-Control: If appropriate, this should be set to ‘Public’ so that intermediate proxies and caches can store and share the content with other users  It will also enable caching of HTTPS requests on Firefox.

Of course, this doesn’t apply if you use the POST method in your Ajax requests, because POST requests are never cached. You should always use the POST method if your Ajax request has side effects, e.g. moving money between bank accounts.

We’ve setup a Ajax caching demo that shows these headers in action. In HttpWatch, you can see that we’ve set all three of these headers in the Ajax response:

Ajax Caching Headers

If you click on the ‘Ajax Update’ button at regular intervals, the time only changes approximately once a minute because the Expires header is set to one minute in the future. In this HttpWatch screenshot you can see that repeated clicks of the update button cause Ajax requests that read directly from the browser cache and result in no network activity (i.e. the value in the Sent and Received columns is zero bytes) :

Ajax Caching

The final click at 1:06.531 does result in an Ajax request that requires a network round-trip, because the cached data is now more than one minute old. The 200 response from the server indicates that a fresh copy of the content was downloaded.

Fact #2: IE Doesn’t Refresh Ajax Based Content Before Its Expiration Date

Sometimes Ajax is used at load time to populate sections of a page (e.g. a price list). Instead of being triggered by a user event such as a button click, it is directly called from the Javascript that runs when the page is loaded. This makes the Ajax call behave as if it were a request for an embedded resource.

As you develop a page like this, it is tempting to refresh the page in an attempt to update the embedded Ajax content. With other embedded resources such as CSS or images, the browser automatically sends the following types of requests depending on whether F5 (Refresh) or Ctrl+F5 (Forced Refresh) is used:

  1. F5(Refresh) causes the browser to build a conditional update request if the content originally had a Last-Modified response header. It uses the If-Modified-Since request header so that server can avoid unnecessary downloads where possible by returning the HTTP 304 response code.
  2. Ctrl+F5 (Forced Refresh) causes the browser to send an unconditional GET request with a Cache-Control request header set to ‘no-cache’. This indicates to all intermediate proxies and caches that the browser needs the latest version of the resource regardless of what has already been cached.

Firefox propagates the type of refresh down to any Ajax request that is made during the loading of the page and will therefore update any Ajax derived content as if it were an embedded resource. This screen shot of the HttpWatch plugin-in for Firefox shows the effect of refreshing our Ajax Caching demo page:

Refresh of Ajax Request in Firefox

Firefox ensured that the Ajax request was issued as a conditional GET. The server responds with a 304 in our demo if the cached data is less than 10 seconds old or a 200 response with the updated content if it is out of date.

In Internet Explorer, the load-time Ajax request is treated as though it is unrelated to the rest of the page refresh and there is no propagation of the user’s Refresh action. No GET request is sent to the server if the cached Ajax content has not yet expired. It simply reads the content directly from the cache, resulting in the (Cache) result value in HttpWatch. Here’s the effect of F5 in IE before the content has expired:

IE Refresh of Ajax Request

Even with Ctrl+F5, the Ajax derived content is still read from the cache:

IE Forced Refresh

This means that any Ajax derived content in IE is never updated before its expiration date – even if you use a forced refresh (Ctrl+F5). The only way to ensure you get an update is to manually remove the content from the cache. In HttpWatch, you can do this using the Tools menu:

Clear cache Entry

New Ajax Page in the HTTP Gallery

calendarJune 18, 2008 in HTTP , HttpWatch , Javascript

We’ve added a page to our HTTP Gallery that provides an introduction to Ajax and some simple working examples. The new page is available here:

http://www.httpwatch.com/httpgallery/ajax/

BTW, you can view the AJAX requests made by this page using the free Basic Edition of HttpWatch.

Ready to get started? TRY FOR FREE Buy Now