AI models have more than likely been trained on your website

While it’s already well known that ChatGPT and similar AI models have been trained on websites and books, a recent investigation from the Washington Post makes that fact clearer than ever. The Post analyzed one of the datasets used to train the biggest chatbots and made it searchable, so you can see how your own websites have contributed. The three biggest sites in the dataset are Google’s patent search, Wikipedia, and Scribd. Also on the list is b-ok.org, a site that offered pirated ebooks until it was seized by the US Justice Department last year.