this post was submitted on 18 Jul 2024

TechTakes

we appear to be the first to write up the outrage coherently too. much thanks to the illustrious @self

[–] Lumisal@lemmy.world -3 points 4 months ago (2 children)

Mistral isn't trained on copyrighted data. It's based on selected databases that were open for use. This article in general is full of false information. But I suppose most people only read the headlines.

[–] sailor_sega_saturn@awful.systems 26 points 4 months ago (2 children)

https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8#6527a6fca6eaf92e6c26fa59

Unfortunately we're unable to share details about the training and the datasets (extracted from the open Web) due to the highly competitive nature of the field.

The "open web" is full of copyrighted material.

[–] fasterandworse@awful.systems 13 points 4 months ago* (last edited 4 months ago)

We had a social contract!

Mustafa Suleyman

[–] froztbyte@awful.systems 9 points 4 months ago

but it's apache2 sega! tooooootes freebies!

[–] fasterandworse@awful.systems 12 points 4 months ago (1 children)
[–] Lumisal@lemmy.world -5 points 4 months ago (1 children)
[–] fasterandworse@awful.systems 11 points 4 months ago (1 children)

if you're not gonna read the fucken thing then fuck off.

[–] Lumisal@lemmy.world -5 points 4 months ago (2 children)

I did read the thing, then provided an article explaining why detecting copyrighted material / determining if something is written by AI is very inaccurate.

Perhaps take your own advice to "read the fucken thing" next time instead of making yourself look like an idiot. Though I doubt you've ever heard of "better to stay silent and let them think you the fool than to speak and remove all doubt".

Btw, I even recall that Ars specifically covered the company you linked to in a separate article as well. I'd be glad to provide it once you've come to your senses and want to discuss things like an adult.

[–] froztbyte@awful.systems 14 points 4 months ago (2 children)

Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts.

did you know that a lesser-known side effect of the infinite monkeys approach is that they will produce whole sections of copyright content abso-dupo-lutely by accident? wild, I know! totes coinkeedink!

I’d be glad to provide it once you’ve come to your senses and want to discuss things like an adult

jesus fucking christ you must be a fucking terrible person to work with

I've seen toddlers throw more mature tantrums

[–] fasterandworse@awful.systems 9 points 4 months ago

she wrote harry potter with an llm, didn't she?

[–] fasterandworse@awful.systems 11 points 4 months ago (1 children)

you're conflating "detecting ai text" with "detecting an ai trained on copyrighted material"

send the relevant article or shut up