this post was submitted on 20 Dec 2024
207 points (97.3% liked)

/0


Meta community. Discuss this Lemmy instance or Lemmy in general.


One year ago I developed the first (and, as far as I know, still the only) real-time CSAM detection tool for the fediverse. It has been in use by this instance, and recently the real-time version was put into use by lemmy.world. Unfortunately, the false-positive rate was a tad too high, as it was still using my original implementation in horde-safety. Through our demands on the AI Horde, we've had to constantly tweak and improve that implementation over the past year, and therefore we've had an improved checker for a while, but it wasn't used in fedi-safety.

Unfortunately I haven't had the time/motivation to update it recently, so when lemmy.world pinged me about its false-positive rate being a tad too high, I felt it was a good time to do so.

So now horde-safety has been updated and it should already be more accurate. The lemmy.world admins have already put it into production, and since they have the most demand, they'll report back with their findings in a week. If this is not sufficient for Lemmy's purposes, I have some other ideas for tweaking it.

And yes, memes and pressure on the admins are what caused me to look into it, but remember, we're all just volunteers here. I would have looked into it if y'all had asked nicely as well ;)

Speaking of volunteers, if you want to support my work in providing tooling for lemmy and the Fediverse, feel free to send some support my way which covers all of my FOSS project work.

top 22 comments
[–] SpiceDealer@lemmy.world 13 points 1 day ago

Keep up the good work. Much respect to unsung heroes such as yourself.

[–] InFerNo@lemmy.ml 4 points 1 day ago (3 children)

Can we go back to calling it CP? CSAM is my country's online authentication and authorization platform. Every citizen uses it, it's big. I use it daily and, as a developer, maintain the implementation in our software. I often have to search for issues and use the term in my search 😭

Using CSAM to describe CP seemingly came out of nowhere; this platform is much older.

[–] theonlytruescotsman@sh.itjust.works 8 points 22 hours ago (1 children)

CSAM is the official term in pretty much all English-speaking countries and covers material not otherwise classed as pornography. It's also at least 15 years old, since that's when I started in trust and safety and it was being used in training materials then. Your country chose poorly.

[–] InFerNo@lemmy.ml 5 points 22 hours ago

CSAM (the platform) was created in 2011, and our country isn't English-speaking, so it's indeed a poor choice. The abbreviation may have been used for longer in specialised circles, but I've only seen it broadly used in recent years.

[–] db0@lemmy.dbzer0.com 9 points 1 day ago (1 children)

No because "porn" implies consent

[–] InFerNo@lemmy.ml 5 points 22 hours ago (2 children)

Many people are in porn against their will. That's extremely short-sighted.

[–] _cryptagion@lemmy.dbzer0.com 1 points 14 hours ago (1 children)

No, short sighted was using CSAM for your software’s name.

[–] InFerNo@lemmy.ml 1 points 5 hours ago

It's not my software's name, it's the name of a government service that we need to use. It's widely in use.

[–] Agent641@lemmy.world 3 points 22 hours ago

Well that would more accurately be called rape, then.

[–] cows_are_underrated 1 points 21 hours ago

Your search history must look quite suspicious.

[–] Akagigahara@lemmy.world 31 points 2 days ago (1 children)

This is really cool and interesting!

I am especially curious how the detection works. Do you use an LLM model to analyze the pictures or something else?

[–] db0@lemmy.dbzer0.com 38 points 2 days ago (1 children)

It uses a CLIP to interrogate images for various weights related to underage and lewd content and triangulates potential CSAM this way. You can look at the code yourself here: https://github.com/Haidra-Org/horde-safety/
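The triangulation db0 describes can be sketched roughly like this: embed the image with CLIP, measure its similarity to several text concepts, and only flag when both the underage signal and the lewd signal are high. The concept names, toy embeddings, and the multiplicative combination below are illustrative assumptions, not the actual horde-safety logic:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def csam_risk(image_emb, concept_embs):
    """Triangulate: require BOTH an underage signal and a lewd signal
    to be high; either one alone keeps the score near zero.
    (Hypothetical combination rule, for illustration only.)"""
    underage = cosine_sim(image_emb, concept_embs["child"])
    lewd = cosine_sim(image_emb, concept_embs["nsfw"])
    return max(underage, 0.0) * max(lewd, 0.0)

# Toy 3-d vectors standing in for real CLIP text embeddings:
concepts = {
    "child": np.array([1.0, 0.0, 0.0]),
    "nsfw": np.array([0.0, 0.0, 1.0]),
}
```

A real checker would get `image_emb` from CLIP's image encoder and the concept vectors from its text encoder; an image scoring high on both axes (e.g. `[0.7, 0.0, 0.7]`) gets a high product, while one matching only a single axis scores zero.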

[–] Akagigahara@lemmy.world 16 points 2 days ago

Thanks, I am not particularly well versed in the analytical process, still early in my studies, but I'll give it a shot.

It's a great tool to have, cool to see it around!

[–] rickdg@lemmy.world 26 points 2 days ago

Thank you for your service o7

[–] hok@lemmy.dbzer0.com 14 points 1 day ago (2 children)

Curious, how do you evaluate the performance without breaking the law?

[–] Batman@lemmy.world 12 points 1 day ago (1 children)

Looking at this it looks like the author just... has csam to evaluate with: https://github.com/Haidra-Org/horde-safety/blob/main/tests/test_csam_checker.py

Guess we don't know the laws where they live though. Or where they run the program.

[–] chicken@lemmy.dbzer0.com 9 points 1 day ago

It seems like it could be legally problematic, but I'm not sure what the alternative would be other than accepting the privacy/autonomy nightmare of funneling all traffic through a government affiliated centralized service.

[–] Railcar8095@lemm.ee 9 points 1 day ago (1 children)

I didn't delve very deep, but it seems it uses a pre-trained model that classifies images with anime tags (loli, lewd...) and assigns weights to those. I guess at some point the author will review the results on real images from Lemmy and use them to tweak those weights, which I understand might be at least temporarily illegal (they could destroy the image and keep only the transformed version, which is "impossible" to turn back into the original image, so it still works as training data).

DeepDanbooru is the model, if you're interested.
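The tag-weighting scheme guessed at above could look something like the sketch below: a tagger (e.g. DeepDanbooru) emits per-tag probabilities, and a weighted sum is compared against a threshold. The tags, weights, and threshold are made up for illustration and are not the values horde-safety actually uses:

```python
# Hypothetical per-tag weights: positive tags push toward flagging,
# negative ones (evidence the subject is an adult) push away.
TAG_WEIGHTS = {"loli": 2.0, "lewd": 1.0, "mature_female": -1.5}

def weighted_score(tag_probs):
    """Combine a tagger's per-tag probabilities into a single score."""
    return sum(TAG_WEIGHTS.get(tag, 0.0) * p for tag, p in tag_probs.items())

def should_flag(tag_probs, threshold=1.0):
    """Flag the image when the weighted score crosses the threshold."""
    return weighted_score(tag_probs) >= threshold
```

Under this scheme, "tweaking the false-positive rate" would amount to adjusting the weights and threshold against a labelled sample.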

[–] db0@lemmy.dbzer0.com 9 points 1 day ago* (last edited 1 day ago) (1 children)

DeepDanbooru is one of the two models. We also use OpenAI's CLIP.

[–] Railcar8095@lemm.ee 3 points 1 day ago (1 children)

Well, if that was my only mistake then your code was surprisingly easy to follow (rule number one of GitHub: never read the readme.md).

How do you deal with tweaking? Do you get random samples from Lemmy? Anything you can share about the legal aspect of it?

[–] db0@lemmy.dbzer0.com 4 points 1 day ago

horde-safety is there primarily to protect the AI Horde. We have no shortage of creeps and pedos trying to use our crowdsourced resources.

[–] araneae@beehaw.org 6 points 1 day ago

That kicks ass! I was witness to, and arguably part of, a circular firing squad shooting at you for the art you make with the Horde, but this is undeniably a positive application and a wonderful tool for the 'verse. Thanks for your work.