History Sniffing

Tech @ FTC 2013-03-21

We all know about links on web pages.  We’ve all also noticed that links we’ve visited look different.

You’ve never visited this one, right?  (I’m pretty sure you haven’t, since the link is to a randomly-generated URL.)  On the other hand, if you’re reading this blog you probably have visited http://www.ftc.gov/ or http://techatftc.wordpress.com/.  (If not, visit one of them now and come back to this page.)  You can see from this example how they look on this blog.

Visited and unvisited links are displayed differently on different web sites.  Here’s what a piece of this page looked like when I was creating this blog post:

[Screenshot: part of this page as it appeared while the post was being written]

It looks different on the live blogging site, though.  The ability to customize the look and feel of a site is very important in today’s Web.  If nothing else, being able to change how links are displayed can make that aspect consistent with the rest of a web page’s visual design.  Unfortunately, there’s a privacy risk: doing that makes it possible for web sites you visit to figure out where you’ve been.

This technique, sometimes called “history sniffing”, isn’t a newly-discovered problem; a Firefox bug report about it was created more than a dozen years ago.  It’s not a simple problem to solve, though, because of the complexity of modern web pages: there are many different ways in which link-styling can be indicated, and many ways in which it can be queried.

The simplest setup starts with a Cascading Style Sheet entry for link-coloring:

:link {
    /* for unvisited links */
    color: blue;
}
:visited {
    /* for visited links */
    color: purple;
}

(You can find many different variants by searching the web for “history sniffing”; this particular one is taken from an excellent web page by L. David Baron, a long-time Firefox developer.)  So far, so good; problems arise because it’s possible to query the color of any given link, and hence its “visited” status.

Again, there are many different ways to do this.  Some rely on the JavaScript function getComputedStyle(); here’s Baron’s example:

// "blue" in the style sheet above computes to rgb(0, 0, 255)
if (getComputedStyle(link, "").color == "rgb(0, 0, 255)") {
    // we know link.href has not been visited
} else {
    // we know link.href has been visited
}

(I’ve omitted the code that sets the variable “link”; again, see his page for the gory details.)  Others use built-in features of CSS.  (See this paper for a technical analysis of different schemes.)  Once a web page has detected that you’ve visited some site of interest, it can request some specific page, not because it wants the content (it might just be a transparent GIF) but as a signaling mechanism: the request itself tells the server what was found.  The net result is a violation of user privacy.
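As a rough illustration of how a check like the one above might be applied to a whole page, here is a minimal sketch.  The helper name classifyLinks, its parameters, and the stand-in data are all hypothetical; this is not Baron’s code.  The style lookup is passed in as a function so the sketch runs outside a browser.

```javascript
// Hypothetical helper (not from Baron's page): sort links into visited and
// unvisited by comparing each link's computed color to the unvisited color.
// `getStyle` stands in for the browser's getComputedStyle().
function classifyLinks(links, getStyle, unvisitedColor) {
  const result = { visited: [], unvisited: [] };
  for (const link of links) {
    const color = getStyle(link, "").color;
    if (color === unvisitedColor) {
      result.unvisited.push(link.href);
    } else {
      result.visited.push(link.href);
    }
  }
  return result;
}

// Stand-in data so the sketch runs without a browser (all values are made up).
const fakeLinks = [
  { href: "http://www.ftc.gov/", color: "rgb(128, 0, 128)" },
  { href: "http://example.com/never-seen", color: "rgb(0, 0, 255)" },
];
const fakeGetStyle = (link) => ({ color: link.color });

const sniffed = classifyLinks(fakeLinks, fakeGetStyle, "rgb(0, 0, 255)");
console.log(sniffed.visited);   // [ 'http://www.ftc.gov/' ]
console.log(sniffed.unvisited); // [ 'http://example.com/never-seen' ]
```

In a browser without defenses, the same function could presumably be called with document.links and a wrapper around getComputedStyle().  In a browser that reports only the unvisited color to scripts, every link would end up in the “unvisited” list.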

What should you do?  If you’re a web site developer, you certainly should not engage in history-sniffing; apart from being unethical, you might run into legal difficulties.  Indeed, the FTC has just announced a settlement in a history-sniffing case.  Equally important, if you include content from other sites on your pages (and most commercial sites do), make sure they’re not doing anything nasty.

Consumers face a harder problem.  The simplest thing to do is to upgrade to a modern browser; today’s browsers incorporate certain defenses.  Firefox, for example, now has a variety of features to defeat most forms of history-sniffing; Internet Explorer has similar defenses as of IE9.  In these browsers, JavaScript code only sees the “unvisited” colors, and URLs loaded via CSS directives are always loaded, regardless of whether the directives apply to the visited or unvisited variants.  (The IE9 explanation also has a nice demo page at http://www.debugtheweb.com/test/cssvisited.htm; try it to see if your browser has fixed the problem.)  (Installing the latest version of a browser tends to solve many other security problems as well.  Unless you have a really good reason to refrain, you’re probably well-advised to turn on autoupdate.)
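The first of those defenses, scripts seeing only the “unvisited” colors, can be modeled in a few lines.  The sketch below is a toy model, not any browser’s actual implementation: makeStyleLookup, the history set, and the color strings are all assumptions chosen for illustration.  It shows why a color comparison stops revealing anything once the style lookup ignores history.

```javascript
// Toy model (an assumption, not real browser code) of a style lookup.
// With `defended` set, it reports the unvisited color for every link,
// which mirrors what a script sees in a browser with the fix.
function makeStyleLookup(history, defended) {
  const UNVISITED = "rgb(0, 0, 255)";
  const VISITED = "rgb(128, 0, 128)";
  return function lookup(link) {
    const seen = history.has(link.href);
    return { color: seen && !defended ? VISITED : UNVISITED };
  };
}

const browsingHistory = new Set(["http://www.ftc.gov/"]); // made-up history
const probe = { href: "http://www.ftc.gov/" };

const oldBrowser = makeStyleLookup(browsingHistory, false);
const newBrowser = makeStyleLookup(browsingHistory, true);

console.log(oldBrowser(probe).color); // rgb(128, 0, 128): the visit leaks
console.log(newBrowser(probe).color); // rgb(0, 0, 255): the visit is hidden
```

The page still renders visited links in their own color for the person looking at the screen; in this model only the value handed back to scripts changes.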

You can always clear your browser history before visiting a site you suspect of doing this, though of course that deprives you of history information.  Better yet, switch to “private browsing mode” (sometimes known as “incognito mode”).  These solutions, of course, assume that you know which sites are doing this.  A study published two years ago showed some modest incidence of history-sniffing in the wild, including on one Alexa “Top 100” site.