this post was submitted on 21 Apr 2025
11 points (86.7% liked)
Technik
the community for everything you could describe as technology
Posts in German or English
founded 10 months ago
My guess is that a major motivation for this is to let them identify their own output when training future AI models. Training LLMs on LLM-generated data degrades their performance over successive generations (often called "model collapse"), yet an ever-growing share of the data they scrape from the internet is itself LLM-generated. Measures like this might let them filter a significant chunk of their own generated text out of future training data.
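Roughly, the filtering step could look like this. It's only a sketch: the post doesn't say what the marker actually is, so `ASSUMED_MARKERS` (here U+202F, NARROW NO-BREAK SPACE) and the helper names `looks_generated` / `filter_corpus` are hypothetical placeholders, not anyone's real pipeline.

```python
from typing import Callable, Iterable

# Hypothetical marker set: placeholder characters standing in for whatever
# signal the real measure uses. U+202F is only an illustrative assumption.
ASSUMED_MARKERS: frozenset[str] = frozenset({"\u202f"})


def looks_generated(text: str, markers: frozenset[str] = ASSUMED_MARKERS) -> bool:
    """Hypothetical detector: flag text containing any assumed marker character."""
    return not markers.isdisjoint(text)


def filter_corpus(
    docs: Iterable[str],
    is_generated: Callable[[str], bool] = looks_generated,
) -> list[str]:
    """Keep only the scraped documents the detector does not flag."""
    return [doc for doc in docs if not is_generated(doc)]


if __name__ == "__main__":
    scraped = ["Hand-written forum post.", "Generated\u202fanswer."]
    print(filter_corpus(scraped))  # prints ['Hand-written forum post.']
```

A single-character check like this is crude: legitimate text uses U+202F too (French typography, for example), and anyone can strip or normalize such characters, so a real filter would likely need a more robust signal.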