The internet’s original data collection tool is still ubiquitous, even as the web moves away from it
By: Colin Lecher
Originally published on The Markup.
Cookies may be one of the most maligned parts of the internet, but they weren’t always so notorious. Back in 1994, a young man named Lou Montulli developed the cookie as a way for website operators to help users save work by remembering them across multiple visits.
The humble idea quickly caught on and morphed into a tool for advertisers to closely track user behavior across the internet and target their ads accordingly.
“When advertisements became popular, especially with Google and all these ad markets, then there was more momentum toward finding and tracking data because the advertising had to be personalized,” said Rahul Telang, a professor of information systems at Carnegie Mellon University.
Today, cookies are pervasive on the modern web. But there are also signs that they’re on their way out: In 2019, Mozilla announced that its popular Firefox browser would block third-party cookies by default, describing the change in a blog post as “a major step in our multi-year effort to bring stronger, usable privacy protections to everyone using Firefox.” Last year, Apple introduced similar default protections for the Safari browser.
Google, which brought the business model of tracking users for ad targeting to massive scale, has been slower to adopt similar changes. After initially pledging in 2020 to phase out third-party cookies in its Chrome browser by 2022, Google pushed the date for the change back to 2023.
For now, however, cookies are still nearly ubiquitous. When The Markup scanned more than 80,000 popular websites using our web privacy inspection tool Blacklight, we found that 87 percent loaded cookies from third parties or from tracking network requests.
And even once cookies are gone, the technologies slated to replace them come with concerns of their own.
So what, exactly, is a cookie? And what would getting rid of them actually solve?
Here’s a rundown.
What Is a Cookie?
Simply put, it’s a small file that tags website visitors so they can be recognized later. When you browse a website that uses cookies, your browser stores the file on your computer. Later, websites and tracking companies can read that file to see who you are and learn certain things about your behavior, like whether you return to the site frequently or which items you put in your shopping cart on your last visit.
In one commonly used analogy, it’s like a coat check. You hand over your coat and get a ticket in exchange, so the attendant can determine what belongs to you when you return.
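To make the coat-check analogy concrete, here is a minimal Python sketch of how a site might hand out a cookie and recognize it on a later visit, using the standard library's `http.cookies` module. The cookie name `visitor_id`, the in-memory `visitors` store, and the `handle_request` function are all hypothetical; a real site would run behind a web framework and keep this data in a database.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical server-side store: each cookie ID (the "ticket") maps to
# what the site remembers about that visitor.
visitors: Dict[str, dict] = {}


def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, Set-Cookie header or None) for one request."""
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)  # parse the browser's Cookie header

    morsel = incoming.get("visitor_id")
    if morsel is not None and morsel.value in visitors:
        # Returning visitor: the browser handed back a ticket we issued.
        visitors[morsel.value]["visits"] += 1
        return morsel.value, None

    # New (or unrecognized) visitor: issue a random ticket and remember it.
    visitor_id = secrets.token_hex(8)
    visitors[visitor_id] = {"visits": 1}
    outgoing = SimpleCookie()
    outgoing["visitor_id"] = visitor_id
    outgoing["visitor_id"]["path"] = "/"
    return visitor_id, outgoing.output(header="Set-Cookie:")


# Usage: a first visit, then a return visit carrying the cookie back.
first_id, set_cookie = handle_request(None)
cookie_sent_back = set_cookie.split(": ", 1)[1].split(";")[0]  # "visitor_id=..."
second_id, no_new_cookie = handle_request(cookie_sent_back)
print(first_id == second_id, visitors[first_id]["visits"])  # True 2
```

The first visit produces a `Set-Cookie` header (the ticket). When the browser sends that value back in its `Cookie` header, the site finds the same entry and counts a return visit instead of issuing a new ticket.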
So when you visit, say, a shopping website, one cookie might record which products you look at. Another cookie might be used to remember your login information so you don’t have to reenter your password every time you visit.
Cookies come in different flavors. There are first-party cookies, which come from the site you’re visiting, and third-party cookies, which load when you visit a site but come from a different company. You may be shopping for shoes on a retail store’s site, for example, when a Facebook tracker starts to follow you around.
Cookies can also be either “session” cookies or “persistent” cookies. Session cookies, as the name suggests, expire when you end your session, by closing your browser, for example. Persistent cookies, by contrast, can stick around until they reach an expiration date, possibly months or even years later.
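In HTTP terms, the difference comes down to whether the server attaches an expiration to the cookie. The sketch below, again using Python's `http.cookies` module with made-up cookie names and values, builds one cookie with no `Max-Age` or `Expires` attribute (a session cookie) and one with a `Max-Age` of about a year (a persistent cookie).

```python
from http.cookies import SimpleCookie


def build_set_cookie_headers() -> list:
    """Build Set-Cookie headers for one session and one persistent cookie."""
    cookies = SimpleCookie()

    # Session cookie: no Expires or Max-Age attribute, so the browser is
    # expected to discard it when the browsing session ends.
    cookies["cart_session"] = "abc123"

    # Persistent cookie: Max-Age is measured in seconds; this one asks the
    # browser to keep it for 365 days unless the user clears it first.
    cookies["returning_visitor"] = "xyz789"
    cookies["returning_visitor"]["max-age"] = 60 * 60 * 24 * 365

    return [morsel.output() for morsel in cookies.values()]


for header in build_set_cookie_headers():
    print(header)
# Set-Cookie: cart_session=abc123
# Set-Cookie: returning_visitor=xyz789; Max-Age=31536000
```

A server could instead send an `Expires` attribute with a specific date; either way, the browser keeps the cookie until that deadline passes or the user deletes it.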
What’s the Problem?
For one, the information collected by cookies can be extraordinarily sensitive. To build a profile of you, cookies can track information about your browsing history to guess your demographics and interests. If you’re a 45-year-old woman who frequents websites for soccer scores, for example, that’s a data point that could be valuable to advertisers looking to sell soccer jerseys.
Using data obtained from cookies, advertisers can then target ads directly to people they think might interact with them. They can also check whether someone has already seen or interacted with an ad. Over time, they can build a dossier that reveals your age and interests and, with some effort, could even identify exactly who you are.
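A simplified sketch of how that cross-site picture can come together: a third-party tracker embedded on many sites receives its own cookie with every request (in browsers that allow third-party cookies), along with a `Referer` header, when the browser sends one, naming the page that loaded it. Everything here, including the site names, the `SITE_CATEGORIES` table, and the function names, is hypothetical and far cruder than real ad-tech profiling.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical lookup table a tracker might keep, mapping sites to topics.
SITE_CATEGORIES = {
    "soccer-scores.example": "sports",
    "jersey-shop.example": "shopping",
    "recipes.example": "cooking",
}

# Interest counts keyed by the tracker's own cookie ID.
profiles = defaultdict(lambda: defaultdict(int))


def log_tracker_request(tracker_cookie_id: str, referer: str) -> None:
    """Record one request to the tracker's embedded pixel or script.

    Where third-party cookies are allowed, the browser attaches the
    tracker's cookie regardless of which site embedded the tracker;
    the Referer header reveals that site.
    """
    site = urlparse(referer).hostname or ""
    profiles[tracker_cookie_id][SITE_CATEGORIES.get(site, "other")] += 1


def top_interest(tracker_cookie_id: str) -> str:
    """Return the most frequent category seen for this cookie ID."""
    interests = profiles.get(tracker_cookie_id)
    if not interests:
        return "unknown"
    return max(interests, key=interests.get)


# The same cookie ID shows up on three pages across two different sites.
log_tracker_request("id-42", "https://soccer-scores.example/match/1")
log_tracker_request("id-42", "https://soccer-scores.example/match/2")
log_tracker_request("id-42", "https://jersey-shop.example/cart")
print(top_interest("id-42"))  # sports
```

No single site handed over a profile here; the tracker assembled one simply because the same cookie ID appeared everywhere its code was embedded.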
All of it happens in a way that’s invisible to most people.
“Your browsing history could be shared with dozens of different companies that you’ve never heard of,” said Bennett Cyphers, a staff technologist at the Electronic Frontier Foundation (EFF) who has followed recent changes in web-tracking technology. “It’s very difficult to figure out that it’s happening at all, and then it’s almost impossible to figure out what happens to that data after it leaves your computer.”
An investigation by The Markup using Blacklight uncovered just how sensitive that information can be. Last year we found user data being tracked for advertisers on more than 100 websites offering services for undocumented immigrants, domestic and sexual abuse survivors, sex workers, and LGBTQ people.
Protections in the United States are limited. One state law, the California Consumer Privacy Act, or CCPA, requires disclosures about how data is collected and stored but does not require consent for cookies.
There are some cookie-blocking options built by third parties. Tools like the browser extension uBlock or the EFF-built Privacy Badger can stop unwanted cookies from loading, but they often block ads as well, which has led some websites to try to block users of those tools.
The good news is the internet seems to be trending away from the cookie. Cyphers said consumer awareness of web tracking and more ways for those consumers to opt out have led to diminishing returns for advertisers. “Most people don’t want to go around sharing their browser history with random strangers,” Cyphers said.
Bowing to that consumer demand, Mozilla’s Firefox and Apple’s Safari both moved to block third-party tracking by default on their popular browsers in the past few years, and Google has pledged to follow suit with its Chrome browser. The changes have led to uncertainty for companies that have built their businesses around advertising based on user behavior. Some have taken to calling it the “cookiepocalypse.”
But even if the cookie meets its demise, there are hints that the tracking tech of the future may introduce its own concerns.
Google, for example, has proposed a series of technologies such as FLoC, short for Federated Learning of Cohorts. Instead of letting advertisers use third-party cookies to track visitors, FLoC would track user behavior directly in the Chrome browser, sort users into groups, and share that group information in bulk with advertisers. Google describes it as a “privacy-first future” solution, but privacy advocates aren’t so certain.
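Google's public FLoC proposal described computing a browser's group, or cohort, from the domains it visited using a locality-sensitive hash along the lines of SimHash. The sketch below is a simplified, hypothetical illustration of that idea, not Chrome's code: `COHORT_BITS`, `domain_signs`, and `simhash_cohort` are made-up names, and SHA-256 is simply a convenient stand-in hash.

```python
import hashlib

COHORT_BITS = 8  # hypothetical; 8 bits gives at most 256 cohorts


def domain_signs(domain: str, bits: int) -> list:
    """Hash a domain and turn its low `bits` bits into +1/-1 votes."""
    value = int.from_bytes(hashlib.sha256(domain.encode("utf-8")).digest(), "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def simhash_cohort(visited_domains: set, bits: int = COHORT_BITS) -> int:
    """Compute a SimHash over visited domains and use it as a cohort ID.

    Each domain votes on every bit; a bit is set when its total is
    positive. Histories that share most domains tend to get the same ID,
    or one differing in only a few bits, so similar browsers cluster.
    """
    totals = [0] * bits
    for domain in visited_domains:
        for i, sign in enumerate(domain_signs(domain, bits)):
            totals[i] += sign
    cohort = 0
    for i, total in enumerate(totals):
        if total > 0:
            cohort |= 1 << i
    return cohort


history = {"soccer-scores.example", "jersey-shop.example", "news.example"}
print(simhash_cohort(history))  # an integer between 0 and 255
```

Advertisers would see only the cohort number, not the list of domains. A real deployment would also need the safeguards described in Google's statement, such as keeping groups large and removing cohorts tied to sensitive topics; this sketch does neither.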
“FLoC is based on large anonymous groups, not tracking individuals across the web as third-party cookies do today,” Vinay Goel, privacy engineering director for Chrome, said in a statement. “Chrome has also built into FLoC robust measures removing groupings/classifications that may be more strongly associated with sensitive topics such as race, sexuality, or personal hardships, without learning specifically which sensitive topics.”
Cyphers, for one, has been skeptical of Google’s plan, recently describing it in a blog post as “a terrible idea” that simply trades one form of surveillance for another.
Telang, the Carnegie Mellon professor, said he’s encouraged by the push for improved privacy—but that it’s not clear whether changes made by companies will ultimately lead to a better future for consumers. “Right now, we only know that, hey, they’ll stop the private information being shared,” said Telang. “But will it lead to improved security? That’s a question that I don’t know the answer to right now.”
As Cyphers pointed out in a recent blog post for the EFF, some smaller advertisers are also pitching their own plans to keep tracking users in a post-cookie world, possibly by pressing users to share unique IDs like email addresses more often.
Cyphers said changes like that would be relatively transparent for users—but would also mean handing over personal information closely tied to your identity that could be used to track you for years into the future. “It’s better and it’s worse,” Cyphers said. “I think it’s mostly worse.”
Whether one of these ideas gains steam or none does, the web is at a clear turning point as it moves beyond the cookie.
“It’s still the most common way that people are tracked on the web,” Cyphers said, “but I think that over the next few years, that is going to change.”
This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.