this post was submitted on 21 Apr 2025
11 points (86.7% liked)

Technik

the community for everything you could describe as technology


Posts in German or English

founded 10 months ago
excral 2 points 1 week ago

My guess is that one of the major motivations for this is to let them identify their own output when training future AI models. Training LLMs on LLM-generated data hurts their performance and leads to regression, yet more and more of the data they scrape from the internet is LLM-generated. With measures like this, they may be able to filter a significant chunk of their own generated text out of future training data.