The Great Scrape: The Clash Between Scraping and Privacy

Artificial intelligence (AI) systems rely heavily on vast amounts of data, often obtained through “scraping,” which involves the automated extraction of large datasets from the internet. Much of this scraped data is personal information. This data fuels AI tools like facial recognition, deep fakes, and generative AI. While scraping supports web searching, archiving, and valuable scientific research, its use for AI can raise ethical and societal concerns.

The rate and scale of data scraping are rapidly increasing, even though many privacy laws appear to conflict with such practices. This article argues for a serious reevaluation of scraping in light of privacy laws. Scraping often contravenes key privacy principles, including fairness, individual rights and control, transparency, consent, purpose limitation and secondary use restrictions, data minimization, onward transfer, and data security. Data protection laws that incorporate these principles are frequently disregarded in the context of scraping.

Scraping has largely avoided scrutiny under privacy law because those who scrape often assume that publicly available data is free to use without restriction. However, the accessibility of data does not exempt scrapers from legal and ethical considerations. Privacy laws often protect publicly available data, and privacy principles are relevant even when personal data can be accessed by others.

This article examines the inherent conflict between scraping and privacy law. With the relentless advancement and exponential growth of AI, we are experiencing what we term the “great scrape.” A significant reconciliation with privacy principles is essential.