I mean, training an LLM on publicly available data is pretty much fair game and legal in basically the entire world. It's legal to inform yourself with publicly available data and it's legal to reference it, which is basically what an LLM does; it doesn't copy anything. This is also why the artists "fighting" AI realistically have zero chance unless they can prove that the LLMs were trained on data from shadow archives that aren't publicly accessible. That is basically impossible to prove, and the companies that build the training datasets are allowed to do it because they do it for research and not for profit. They then hand the datasets to the companies that train their commercial neural networks on them, which makes it a legal gray zone at worst.
I don't like "AI" (LLMs, image generators and so on), but the truth is they can do this.