When you sign up for a newsletter, make a hotel reservation, or check out online, you probably assume that it doesn’t matter if you mistype your email address three times, or if you change your mind and close the page. Nothing actually happens until you click the submit button, right? Well, maybe not. As with so many assumptions about the internet, this isn’t always the case, according to new research: A surprising number of websites collect some or all of your data as you type it into a digital form.
Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, examining scenarios in which a user visits a site from the European Union and from the United States. They found that 1,844 websites collected an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites do not appear to intend to log the data; instead, they integrate third-party marketing and analytics services that cause the behavior.
After specifically crawling websites for password leaks in May 2021, the researchers also found 52 websites where third parties, including Russian tech giant Yandex, were incidentally collecting password data before submission. The group shared their findings with these websites, and all 52 cases have since been resolved.
“If there is a submit button on a form, the reasonable expectation is that it will do something, that it will submit your data when you click it,” says Güneş Acar, a professor and researcher in Radboud University’s Digital Security Group and one of the leaders of the study. “We were super surprised by these results. We thought we’d find maybe a few hundred websites that collect your emails before you send them, but this far exceeded our expectations.”
The researchers, who will present their findings at the Usenix security conference in August, said they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties collecting form data regardless of submission status. They point out that at its core, the behavior is similar to that of keyloggers, which are typically malicious programs that log everything a target types. But on a mainstream top-1,000 site, users probably don’t expect their information to be keylogged. And in practice, the researchers saw some variations in behavior. Some websites logged data keystroke by keystroke, but many grabbed the complete entry from one field when users clicked on the next.
“In some cases, when you click the next field, they collect the previous one, like you click the password field and they collect the email, or you just click anywhere and they immediately collect all the information,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the co-authors of the study. “We did not expect to find thousands of websites, and in the US the numbers are really high, which is interesting.”
The researchers say the regional differences may be related to companies being more cautious about user tracking, and perhaps even less likely to integrate third-party providers, because of the EU’s General Data Protection Regulation. However, they emphasize that this is only one possibility and that the study did not examine explanations for the differences.
Through a substantial effort to notify websites and third parties that collect data in this way, the researchers found that one explanation for some of the unexpected data collection may be the challenge of distinguishing a “submit” action from other user actions on certain web pages. However, the researchers emphasize that this is not a sufficient justification from a privacy perspective.
Since completing the paper, the group has also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users across the web and show them ads. Both claimed in their documentation that customers could enable “automatic advanced matching,” which would trigger data collection when a user submits a form. In practice, however, the researchers found that these tracking pixels captured hashed email addresses, an obfuscated version of email addresses used to identify web users across platforms, before submission. For US users, 8,438 websites may have shared data with Meta, Facebook’s parent company, via pixels, and 7,379 websites may be affected for EU users. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
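To see why a hashed email address still works as a tracking identifier, consider the minimal Python sketch below. It is illustrative only: the function name `hashed_email` is hypothetical, and the choice of SHA-256 with trim-and-lowercase normalization is an assumption made for the example, not a detail reported by the researchers.

```python
import hashlib


def hashed_email(address: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    # Assumed normalization for this sketch: trim whitespace and lowercase.
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same address entered on two different sites produces the same digest,
# so the hash can link a person across sites even though it hides the text.
site_a = hashed_email("Alice@example.com")
site_b = hashed_email("  alice@example.com ")
print(site_a == site_b)  # True
```

Because the digest is deterministic, anyone who already holds the address can recompute it and match records, which is why hashing obscures the address without making it anonymous.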
Researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group hasn’t heard an update since. The researchers notified TikTok on April 21 — they only recently discovered the TikTok behavior — and have received no response. Meta and TikTok did not immediately respond to WIRED’s request for comment on the results.
“The privacy risks for users are that they will be tracked even more efficiently; they can be tracked across different websites, across different sessions, across mobile and desktop,” says Acar. “An email address is such a useful identifier for tracking because it is global, unique, and constant. You cannot delete it like you delete your cookies. It’s a very powerful identifier.”
Acar also notes that as tech companies seek to phase out cookie-based tracking for privacy reasons, marketers and other analysts will increasingly rely on static IDs like phone numbers and email addresses.
Since the results indicate that deleting data from a form before submitting it may not be enough to prevent its collection, the researchers developed a Firefox extension called LeakInspector to detect rogue form collection. They hope their findings will raise awareness of the issue, not only among regular web users but also among website developers and administrators, who can proactively check whether their own systems, or the third-party providers they use, are collecting data from forms without consent.
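For developers who want to run that kind of check on their own pages, one rough approach is to type a test address into a form without submitting it, capture the outgoing requests, and search them for the address in plain, encoded, or hashed form. The Python sketch below shows the idea under stated assumptions: it is not how LeakInspector works, the names `email_variants` and `find_leaks` are hypothetical, the list of encodings is far from exhaustive, and the captured request bodies are made up for illustration.

```python
import base64
import hashlib
from urllib.parse import quote


def email_variants(email: str) -> dict[str, str]:
    """Build a few encodings and hashes a tracker might send instead of plain text."""
    raw = email.encode("utf-8")
    return {
        "plain": email,
        "url-encoded": quote(email, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, request_bodies: list[str]) -> list[tuple[int, str]]:
    """Return (request index, variant name) for every variant found in a request."""
    variants = email_variants(email)
    hits = []
    for index, body in enumerate(request_bodies):
        for name, value in variants.items():
            if value in body:
                hits.append((index, name))
    return hits


# Hypothetical request bodies captured before the form was submitted.
captured = [
    "event=pageview&path=/signup",
    "event=blur&field=email&value=test%40example.com",
    "uid=" + hashlib.sha256(b"test@example.com").hexdigest(),
]
print(find_leaks("test@example.com", captured))
```

Any hit on a request sent before the submit button was clicked is worth investigating, although a real audit would also need to handle compression, nested encodings, and salted hashes, which this sketch ignores.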
Leaky forms are just another type of data collection to be wary of in an already extremely crowded online space.
This story originally appeared on wired.com.