this post was submitted on 14 Oct 2024
105 points (94.9% liked)

Asklemmy

43940 readers
379 users here now

A loosely moderated place to ask open-ended questions


If your post meets the following criteria, it's welcome here!

  1. Open-ended question
  2. Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
  3. Not regarding using or support for Lemmy: context, see the list of support communities and tools for finding communities below
  4. Not ad nauseam inducing: please make sure it is a question that would be new to most members
  5. An actual topic of discussion

Icon by @Double_A@discuss.tchncs.de

founded 5 years ago

Google, DuckDuckGo and Bing now all return the same shitty LLM-generated nonsense sites to most of my searches, and don't respect my literal search terms even when I put them in quotes.

I'm not ready to pay for search, yet.

Is there any alternative?

[–] PhilipTheBucket@ponder.cat 65 points 1 month ago (1 child)

What are you talking about? I just tried two test queries on DDG, and neither one had LLM-generated nonsense, and the one that was in double-quotes returned only five results, all of which had the double-quoted phrase and one of which was the thing I was challenging it to find.

Can you give an example of a query where DDG returns LLM results or doesn't respect your double-quotes?

[–] bananahammock@lemmy.ca 48 points 1 month ago (2 children)

I think they are referring to the search engines returning LLM content farm websites.

[–] LadyMeow@lemmy.blahaj.zone 7 points 1 month ago (4 children)

Maybe I’m a little out of the loop, what are llm content farm websites?

[–] fjordbasa@lemmy.world 34 points 1 month ago (2 children)

Low-effort websites made easier by LLM-generated text. It’s not new, just made easier with the ubiquity of LLM tools. Think of it as the latest generation of spam websites 🙃

[–] LadyMeow@lemmy.blahaj.zone 15 points 1 month ago (1 child)

Ah, I see. Junk ‘news’ and other regurgitated blah. Yeah, I’d guess any free search engine will probably be bloated with that. Not to mention that it’s Google, Bing, and orange Bing. Not a ton of crawlers out there indexing everything, are there?

[–] Darorad@lemmy.world 4 points 1 month ago

The only other (not absolutely tiny) one I'm aware of is Brave, but it has its own issues.

[–] anothermember@lemmy.zip 2 points 1 month ago* (last edited 1 month ago)

Ironically one of DDG's early selling points, before they fully jumped on the privacy bandwagon, was that they would filter out results for low-effort content farms (this was pre-LLM stuff).

I had used DDG since almost the beginning and it was one of the things I was originally sold on. It's difficult to find a source for it now but I did find this: https://web.archive.org/web/20110608072253/https://www.technologyreview.com/blog/post.aspx?bid=377&bpid=25532

[–] Bell@lemmy.world 6 points 1 month ago

Forbes for example

[–] JackbyDev@programming.dev 5 points 1 month ago (1 child)

Think recipe websites that take forever to get to the recipe, but for other topics. Like a simple question, "What is the release date for new game X?" And then there will be 5+ paragraphs of jibber-jabber about the game, and only the last paragraph finally says when it releases.

This sort of site has been around for a while but supposedly they're more common nowadays. Personally I think people just have a better eye for things not written entirely by humans. Either way it's annoying to deal with them.

[–] LadyMeow@lemmy.blahaj.zone 5 points 1 month ago

Ugh, I feel like I have been seeing more of that. Asked how many ml are in a wine pour and got like 5 sites that wouldn’t just come out and say it. All kinds of gobbledegook dancing around the topic, but no one would just freaking say it. 140 ml, in case you needed it.

[–] GammaGames@beehaw.org 4 points 1 month ago

Sites that mass-generate garbage using LLMs

[–] SkavarSharraddas@gehirneimer.de 34 points 1 month ago (1 child)

https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist is supposed to remove AI slop from the results of various search engines. It requires the uBlacklist browser add-on, though.
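For anyone curious what a blocklist like that does to a results page: roughly, it hides results whose domain is on the list. Below is a minimal Python sketch of the idea, assuming a plain set of blocked domains. It is not uBlacklist's code or its rule syntax, and the domains are made-up placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical blocked domains (placeholders, not taken from the real list).
BLOCKED_DOMAINS = {"content-farm.example", "slop.example"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Require an exact match or a "." boundary so "notslop.example" is not
    # caught by "slop.example".
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(urls: list[str]) -> list[str]:
    """Keep only results whose domain is not blocked, preserving order."""
    return [u for u in urls if not is_blocked(u)]


results = [
    "https://www.content-farm.example/best-game-release-date",
    "https://en.wikipedia.org/wiki/Search_engine",
    "https://slop.example/recipes",
    "https://notslop.example/page",
]
print(filter_results(results))
```

Matching on the hostname rather than on the raw URL string avoids accidentally blocking a page that merely mentions a bad domain in its path or query.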

[–] WolfLink@sh.itjust.works 7 points 1 month ago

uBlacklist is an excellent add-on anyway

[–] KLISHDFSDF@lemmy.ml 33 points 1 month ago (4 children)

Posted this previously:


Yes. Use any of the following, in no particular order:

  • ecosia.org - A non-profit certified B corp that plants trees by serving ads in your search results. Bing search underneath.
  • duckduckgo.com - A privacy friendly search engine. Primarily sourced from Bing but mixes in a few other sources.
  • any SearXNG instance - A self-hostable search front-end to various search engines.
  • marginalia.nu - specifically 'random' - An independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of in favor of the sort of sites you probably already knew existed.
[–] i_am_not_a_robot@feddit.uk 11 points 1 month ago (2 children)
[–] logging_strict@lemmy.ml 2 points 1 month ago

Thx for sharing

Stract on github

Can self host

[–] anothermember@lemmy.zip 1 points 1 month ago (1 child)

Are you (or is anyone here) daily-driving Stract yet? I discovered it a few months ago and thought it was everything I was looking for in a search engine, but also concluded that its search results aren't up to the standard I can use for now, so I filed it as one to look out for. Would be interested in hearing others' experiences.

[–] i_am_not_a_robot@feddit.uk 1 points 1 month ago

Unfortunately not. I'd like to, but as you say it's not quite there yet. I probably should try it more frequently.

[–] Dave@lemmy.nz 10 points 1 month ago (2 children)

I think Searx is a good suggestion. Can be a bit slow to return results because it runs the search on a bunch of search engines and compiles the results, but that helps to make sure better stuff rises to the top.
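The "compile the results" step can be illustrated with reciprocal rank fusion, a common way to merge ranked lists so that URLs returned near the top by several engines rise above ones only a single engine found. This is a swapped-in technique for illustration, not SearXNG's actual scoring, and the engine names and URLs are hypothetical.

```python
from collections import defaultdict


def fuse(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine ranked URL lists with reciprocal rank fusion.

    Each engine contributes 1 / (k + rank) for every URL it returns,
    so URLs that several engines rank highly get the largest totals.
    k = 60 is the constant conventionally used with this method.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=lambda u: scores[u], reverse=True)


# Hypothetical results from three upstream engines.
rankings = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "a.example", "d.example"],
    "engine_c": ["b.example", "e.example"],
}
print(fuse(rankings))
```

Here `b.example` comes out on top because all three engines returned it, which is the "better stuff rises to the top" effect; the price is waiting for the slowest engine before merging.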

[–] logging_strict@lemmy.ml 5 points 1 month ago* (last edited 1 month ago)

The other suggestions aren't suggestions at all. They are obsoleted by searx.space

DDG ... obsolete

startpage.com ... obsolete

Browsers have default search engines. I curse every time DDG is accidentally queried.

DDG is a curse word!

Any centralized site with privacy claims is treated as lying thru its teeth. Front-run future news.

[–] BigBootyBoy@sh.itjust.works 2 points 1 month ago

Just tried it and it actually worked surprisingly well

[–] limitsomething@lemmy.ml 1 points 1 month ago

Microsoft invests in Ecosia

[–] oozynozh@lemm.ee 11 points 1 month ago (2 children)

I use Qwant sometimes but it's sourced from Bing. Searx is better if you can self-host. Kagi is better if you can afford to pay (but you asked for free).

[–] Taalnazi@lemmy.world 2 points 1 month ago* (last edited 1 month ago)

There's also Startpage, also pretty good. That has search results from Google and a bit of Bing though, so if that's a dealbreaker, then yeah.

[–] abbenm@lemmy.ml 2 points 1 month ago (1 child)

Doesn't Kagi offer X amount of free searches per month before you pay?

[–] festus@lemmy.ca 5 points 1 month ago

No. They have a trial of 100 one-time searches, but that's it.

[–] lastweakness@lemmy.world 7 points 1 month ago (2 children)

Brave Search is pretty nice

[–] idotherock@lemm.ee 2 points 1 month ago

I enjoy using Brave Search. But it sucks for images for some reason. I use DDG for images.

[–] AdamBomb@lemmy.sdf.org 1 points 1 month ago

Yeah, agree. I don’t see many LLM results but I can’t say I never see any.

[–] scottmeme@sh.itjust.works 7 points 1 month ago (1 child)
[–] root@lemmy.world -1 points 1 month ago
[–] nirodhaavidya@lemmy.world 6 points 1 month ago (2 children)

Kagi if you are willing to pay for the service. I think that's reasonable but your needs may vary.

[–] davel@lemmy.ml 3 points 1 month ago (1 child)

I might pay for quality search if my searches weren’t linked back to my credit card and therefore identity.

[–] majestictechie@lemmy.fosshost.com 1 points 1 month ago* (last edited 1 month ago)

If it wasn't something I use every day and even for work, I wouldn't bother. But yeah, Kagi results are definitely better. Being able to rank sites higher or lower in results and block some altogether has really helped filter out the nonsense that a lot of search engines give.

[–] codenul@lemmy.ml 6 points 1 month ago (1 child)

Not sure if it's untrustworthy or not, but I switched over to Startpage and have been liking the results. I just wish it would implement the !bang system from DDG.

www.startpage.com

[–] Avero 6 points 1 month ago

They use Google's results with a bit of Bing mixed in. Bangs should work too, like !wiki for Wikipedia or !d for DeepL. They're partly owned by an adtech company, though (and say they don't share anything).
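A bang is just a token in the query that turns the search into a redirect to another site's search page. A minimal Python sketch, assuming a hypothetical two-entry table (the Wikipedia and DuckDuckGo search URL formats are real; `search.example` is a placeholder for whatever default engine you use). Real bang lists are far larger, and this is not DDG's or Startpage's implementation.

```python
from urllib.parse import quote_plus

# Hypothetical bang table; {q} is replaced by the URL-encoded query.
BANGS = {
    "wiki": "https://en.wikipedia.org/w/index.php?search={q}",
    "ddg": "https://duckduckgo.com/?q={q}",
}
# Placeholder default engine used when no known bang is present.
DEFAULT = "https://search.example/?q={q}"


def resolve(query: str) -> str:
    """Return the URL a query should go to, honoring the first known bang."""
    words = query.split()
    for i, word in enumerate(words):
        name = word[1:].lower()
        if word.startswith("!") and name in BANGS:
            # Drop the bang token and search the remaining words.
            rest = " ".join(words[:i] + words[i + 1:])
            return BANGS[name].format(q=quote_plus(rest))
    # No known bang: send the whole query to the default engine.
    return DEFAULT.format(q=quote_plus(query))


print(resolve("!wiki search engine"))
print(resolve("SEO spam !wiki"))
print(resolve("hello world"))
```

The bang can appear anywhere in the query; unknown bangs are left in the text and go to the default engine.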

[–] lordnikon@lemmy.world 4 points 1 month ago

Startpage.com works well enough

I've been using Brave Search on my PC and phone for maybe 6 months now. I still use Google about 10% of the time if I'm searching for something that isn't in English, but otherwise I'd even say that for many things Brave returns better results than Google.

[–] FlashMobOfOne@lemmy.world 3 points 1 month ago

udm14.org is about as close as you can get
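As far as I know, udm14.org just sends you to a Google search URL with `udm=14` added, which selects Google's plain "Web" results filter (no AI overview). A sketch of building such a URL yourself in Python; the meaning of that parameter is Google's choice, undocumented, and could change at any time.

```python
from urllib.parse import urlencode


def web_only_search_url(query: str) -> str:
    """Build a Google search URL that asks for the plain 'Web' results view."""
    # urlencode percent-encodes the query, so double-quoted phrases survive.
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})


print(web_only_search_url('"exact phrase" test'))
```

Some browsers let you register a URL like this as a custom search engine, so every search from the address bar uses the Web view.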

[–] random@lemmy.blahaj.zone 2 points 1 month ago (1 child)
[–] AsudoxDev@programming.dev 1 points 1 month ago

Startpage uses Google.

[–] abbadon420@lemm.ee 2 points 1 month ago

You can go old school like me. I just keep a list of trusted sources. For example, I work a lot with Spring Boot, so I add "baeldung" to my search. If I want to look up a vacation spot, I usually add the name of my national television's vacation review show, because they always give good advice and have been doing so for over 20 years.

[–] CanadaPlus@lemmy.sdf.org 1 points 1 month ago* (last edited 1 month ago)

shitty LLM-generated nonsense sites

It's practically impossible to filter these out computationally, regardless of brand. You pretty much need to run them through a bigger LLM than the one that generated them, and the economics of doing that for every indexed site are obviously bad. Doing it by hand may or may not be workable either, depending on how quickly you can detect bad domains versus how quickly a new domain can be put up.

It's on us to figure out who's trustworthy, and who just sounds authoritative, unfortunately.