- Large language models are trained on vast amounts of data scraped from the internet to generate human-like responses.
- Website publishers previously had no way to opt out of their data being used to train AI models.
Website publishers can now easily opt out of having their data used to train Google Bard or any future AI models Google builds.
Google announced on Thursday (Sept. 28) that publishers can opt out by disallowing the “Google-Extended” user agent in their site’s robots.txt file.
The new control still lets crawlers such as Googlebot crawl and index these sites, with search indexing as the only use of their data.
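To illustrate the robots.txt rule described above, here is a minimal Python sketch that parses a hypothetical robots.txt with the standard library's `urllib.robotparser` and checks which user agents may fetch a page. The file contents and the `example.com` URL are placeholders, and `build_parser` is a hypothetical helper. The parser only treats `Google-Extended` as a user-agent token; it does not model how Google applies the rule on its side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block only the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without any network fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        # Google-Extended matches the Disallow group; Googlebot matches no group.
        print(agent, "allowed:", rp.can_fetch(agent, url))
```

Because the only group names `Google-Extended`, Googlebot matches no group and falls back to being allowed, which mirrors the split the article describes between AI training and search indexing.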
This follows a similar move by OpenAI, the creator of ChatGPT, which recently launched its own web crawler along with instructions on how to block it.
Many sites have already blocked OpenAI’s crawler, notably Medium, CNN, Reuters and the New York Times.
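A site that wants to block both Google's and OpenAI's AI-training crawlers can list one robots.txt group per user-agent token. The sketch below assumes the tokens `GPTBot` (the token OpenAI documents for its crawler) and `Google-Extended`; the helper name `ai_block_rules` is hypothetical, and the token list should be checked against each vendor's current documentation.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens: GPTBot (OpenAI's crawler) and Google-Extended
# (Google's AI-training control). Verify against each vendor's docs.
AI_TOKENS = ("GPTBot", "Google-Extended")


def ai_block_rules(tokens: Iterable[str] = AI_TOKENS) -> str:
    """Hypothetical helper: build robots.txt text that disallows the whole
    site for each token, one group per token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # A blank line separates consecutive groups.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    rules = ai_block_rules()
    print(rules)

    # Sanity check: the AI tokens are blocked, ordinary Googlebot is not.
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(f"{agent}: allowed={rp.can_fetch(agent, 'https://example.com/')}")
```

Giving each token its own group keeps the file easy to edit one crawler at a time; search crawlers that are not listed remain unaffected.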