Several AI firms have reportedly been disregarding a protocol designed to let websites opt out of scraping. The Robots Exclusion Protocol, better known as robots.txt, was established in 1994 to tell web crawlers which pages they are allowed to access, yet it is being overlooked by these companies. The development has raised concerns among publishers and content creators about how their data is being used.
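For context, compliance with robots.txt is entirely voluntary: a crawler must choose to download the file and honor its rules. The sketch below, written with Python's standard urllib.robotparser module and using a placeholder site and user-agent string, shows roughly how a well-behaved crawler checks permission before fetching a page.

    # Minimal sketch of a crawler consulting robots.txt before fetching a page.
    # "example.com" and "ExampleBot" are placeholders, not real entities.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()  # download and parse the site's robots.txt rules

    # Ask whether this crawler (identified by its user-agent) may fetch a URL.
    url = "https://example.com/articles/some-story"
    if robots.can_fetch("ExampleBot", url):
        print("Allowed: fetch the page")
    else:
        print("Disallowed: skip the page")  # honoring this is voluntary, not enforced

Nothing technically stops a crawler from skipping this check and fetching the page anyway, which is exactly the behavior publishers are objecting to.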
According to a letter from TollBit, a startup that facilitates licensing agreements between publishers and AI firms, "AI agents from multiple sources (not limited to one company) are choosing to bypass the robots.txt protocol for retrieving content from websites." While adherence to robots.txt is not legally mandatory, it has long been standard practice on the web. The letter did not name specific companies, but reports indicate that OpenAI and Anthropic, creators of the popular ChatGPT and Claude chatbots respectively, are among those disregarding the protocol.
Implications for Publishers and AI Companies
The unauthorized copying of website content by AI companies poses challenges for both publishers and the AI industry. Publishers worry about misuse of their intellectual property and loss of control over their content. They are concerned that AI-generated summaries or rephrased versions of their articles could be inaccurate or fail to give credit, potentially harming their reputation and income.
On the other hand, AI companies argue that the Robots Exclusion Protocol is not a legally binding framework and that a new understanding between publishers and AI firms may be necessary. Some AI leaders have defended their actions, suggesting that publishers may have to adjust to an evolving landscape of content creation and distribution in the age of artificial intelligence. Nevertheless, the lack of transparency and the apparent disregard for website owners' preferences have raised concerns about these companies' practices.