this post was submitted on 31 Aug 2024

Any AI tool to analyse a git repo for malicious code? (discuss.tchncs.de)

submitted 2 months ago by unknowing8343@discuss.tchncs.de to c/programming@programming.dev

32 comments

I'm trying to feel more comfortable using random GitHub projects, basically.

TootSweet@lemmy.world 31 points 2 months ago (last edited 2 months ago)

I don't think "AI" is going to add anything (positive) to such a use case. And if you remove "AI" as a requirement, you'll probably get more promising candidates than if you restrict yourself to "AI" (whatever that means) solutions.

unknowing8343@discuss.tchncs.de -16 points 2 months ago (last edited 2 months ago)

Honestly, I don't care whether the solution is AI-based or not.

I guess I thought of it like that because AI is quite fit for the task of understanding what might be the purpose of code in a few seconds/minutes without you having to review it. I don't know how some non-AI tool could be better for such a task.

Edit: so many people are against the idea. Have you guys used GitHub Copilot? It understands the context of your repo to help you write the next thing... right? Well, what if you applied the same idea to simply reviewing third-party repos for malicious or unexpected behaviour? That doesn't seem too weird to me.

trashgirlfriend@lemmy.world 25 points 2 months ago

> AI is quite fit for the task

EXTREMELY LOUD INCORRECT BUZZER

TootSweet@lemmy.world 13 points 2 months ago (last edited 2 months ago)

> AI is quite fit for the task of understanding what might be the purpose of code

Disagree.

> I don’t know how some non-AI tool could be better for such a task.

ClamAV has been filling a somewhat similar use case for a long time, and I don't think I've ever heard anyone call it "AI".
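To make the ClamAV comparison concrete: the simplest form of signature scanning hashes every file and looks the digest up in a list of known-bad digests. This is a minimal sketch under stated assumptions, not how ClamAV itself is implemented: the `known_bad` set is hypothetical (it would have to come from some signature feed), and exact-hash matching misses any file that differs by even one byte.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_repo(root: Path, known_bad: set[str]) -> list[Path]:
    """Return files under root whose SHA-256 hex digest appears in known_bad.

    known_bad is a hypothetical signature set. Real scanners such as ClamAV
    also use byte patterns, wildcards and archive unpacking, which this
    sketch does not attempt.
    """
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip Git's own object store; only the checked-out files are scanned.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and sha256_of(path) in known_bad:
            hits.append(path)
    return hits
```

Called as `scan_repo(Path("cloned-repo"), digests)`, it returns the paths of exact matches, which is cheap and has essentially no false positives, but only catches files someone has already seen and catalogued.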

I guess Bayesian filters, like the ones email providers use to filter spam, could be considered "AI" (though old-school AI, not the kind of stuff that's such a bubble now) and might be applicable to your use case.
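For reference, a Bayesian spam filter in its simplest form is a multinomial naive Bayes classifier over tokens. The sketch below applies that technique to code snippets; the `"bad"`/`"good"` labels, the identifier-only tokenizer and any training strings are assumptions for illustration, not a tested malware detector or any particular provider's filter.

```python
import math
import re
from collections import Counter

LABELS = ("bad", "good")
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text: str) -> list[str]:
    """Lower-cased identifier-like tokens; punctuation is ignored."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text): log prior plus summed log likelihoods."""
        vocab_size = len(set().union(*self.word_counts.values()))
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        if any(self.doc_counts[label] == 0 for label in LABELS):
            raise ValueError("train at least one example per label first")
        return max(LABELS, key=lambda label: self.log_posterior(text, label))
```

After a few `train(...)` calls, `classify(...)` returns whichever label has the higher log posterior. With so little training data the result mostly reflects word overlap rather than intent, which is the usual weakness of this approach.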

lemmyvore@feddit.nl 1 point 2 months ago

Bayesian filters are statistical; they have nothing to do with machine learning.

TootSweet@lemmy.world 6 points 2 months ago (last edited 2 months ago)

The A* algorithm doesn't have anything to do with machine learning either, but the first time I ever learned about it was in a computer science class in college called something like "Introduction To Artificial Intelligence".

But it's very much the case that the term "AI" has a very different meaning nowadays, during this cringy bubble, than it did back in 2004 or 2005 or whenever that was.

Today "AI" is basically synonymous with "BS". Lol.

31337@sh.itjust.works 6 points 2 months ago

If you're talking about naive Bayes filtering, it most definitely is an ML model. Modern spam filters use more complex ML models (or at least I know Yahoo Mail did about 15 years ago, because I saw a lecture where John Langford talked a little bit about it). Statistical ML is an "AI" field. Techniques like anomaly detection are also usually ML models.
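As a deliberately simple, non-learned stand-in for anomaly detection, the sketch below flags lines whose character entropy is a statistical outlier within the same file, a crude signal for embedded encoded or obfuscated blobs. An ML anomaly detector would instead learn what "normal" looks like from data; the entropy feature, the z-score rule and the default thresholds here are all assumptions.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character (0.0 for an empty string)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def anomalous_lines(source: str, z_threshold: float = 3.0, min_len: int = 20) -> list[int]:
    """Return 1-based numbers of lines with unusually high entropy.

    Only lines of at least min_len characters (after stripping) are scored,
    and each is compared with the mean and population standard deviation
    of the scored lines in the same file.
    """
    scored = [
        (n, shannon_entropy(line.strip()))
        for n, line in enumerate(source.splitlines(), start=1)
        if len(line.strip()) >= min_len
    ]
    if len(scored) < 2:
        return []
    values = [e for _, e in scored]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [n for n, e in scored if (e - mean) / stdev > z_threshold]
```

With the population standard deviation, a single outlier among n scored lines can reach a z-score of at most √(n−1), so the default threshold of 3.0 cannot fire on files with fewer than 11 qualifying lines; the threshold would need tuning per codebase.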

Shareni@programming.dev 4 points 2 months ago

> AI is quite fit for the task of understanding

Sure, and parrots are amazing at spotting fallacies like cherry picking...

FizzyOrange@programming.dev -5 points 2 months ago

Don't listen to the idiots downvoting you. This is absolutely a good task for AI. I suspect current AI isn't quite clever enough to detect this sort of thing reliably unless the malicious code is very blatant. But a lot of malicious code is fairly blatant if you have the time to actually read an entire codebase in detail, which of course AI can do and humans can't.

For example, the extra `.` that broke a feature test in xz's CMake build and silently disabled the Landlock sandbox? I think current AI would easily be capable of highlighting it as wrong. It probably wouldn't yet be able to figure out that it was malicious rather than a mistake, though.

thesmokingman@programming.dev 4 points 2 months ago

I mean anything is a good fit for future, science fiction AI if we imagine hard enough.

What you describe as “blatant malicious code” is probably only things like very specific command-and-control (C&C) domains or instruction sets. We already have very efficient string matching tools for those, though, and they don’t burn power at an atrocious rate.
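The string matching mentioned here can be as simple as compiling a list of known indicators, such as C&C domains, into one regular expression and scanning each line. In this sketch the domains are placeholders under reserved example names, not real indicators, and matching is case-insensitive because domain names are.

```python
import re


def build_ioc_pattern(indicators: list[str]) -> re.Pattern:
    """Compile indicator strings into one case-insensitive alternation."""
    if not indicators:
        raise ValueError("need at least one indicator")  # "" would match everywhere
    # Longest first: at a given position the regex tries alternatives left to
    # right, so a longer indicator wins over one that is its prefix.
    ordered = sorted(indicators, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered), re.IGNORECASE)


def find_iocs(text: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based line number, matched text) pairs."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        for m in pattern.finditer(line):
            hits.append((n, m.group(0)))
    return hits
```

From the shell, `grep -rniF -f indicators.txt .` performs the same fixed-string search, reading one indicator per line from a file; for very large indicator lists, dedicated multi-pattern matchers scale better than a single regex alternation.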

You’ve given us an example so PoC||GTFO. Major code AI tools like Copilot struggle to explain test files with a variety of styles, skips, and comments, so I think you have your work cut out for you.

FizzyOrange@programming.dev -6 points 2 months ago

> We already have very efficient string matching tools for those, though

How is a string matching tool going to find a single `.`?

> You’ve given us an example so PoC||GTFO

🙄

thesmokingman@programming.dev 5 points 2 months ago

A single character, per your definition, is not blatant malicious code. Stop moving the goalposts.

It’s clear you don’t understand the space, and based on your other comments you don’t seem to have any interest in acting in good faith, so good luck.

FizzyOrange@programming.dev -2 points 2 months ago

I'm not moving any goalposts. The addition of the `.` was very blatant: they literally just added a syntax error. It went undetected because humans don't have the stamina to exhaustively review code down to that level. Computers (even AI) don't have that issue.

You are clearly out of your depth here.