this post was submitted on 29 Apr 2025
533 points (97.5% liked)
Technology
69545 readers
4093 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Anyone who writes a spider that's going to inspect all the content out there is already going to have to have dealt with this, along with about a bazillion other kinds of oddball or bad data.
Competent ones, yes. Most developers aren't competent, scraper writers even less so.
That's true. Scrapping is a gold mine for the people that don't know. I worked for a place which crawls the internet and beyond (fetches some internal dumps we pay for). There is no chance a zip bomb would crash the workers as there are strict timeouts and smell tests (even if a does it will crash an ECS task at worst and we will be alerted to fix that within a short time). We were as honest as it gets though, following GDPR, honoring the robots file, no spiders or scanners allowed, only home page to extract some insights.
I am aware of some big name EU non-software companies very interested in keeping an eye on some key things that are only possible with scraping.