this post was submitted on 17 Feb 2024

Technology


A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

top 23 comments
[–] SorteKanin@feddit.dk 0 points 7 months ago (1 children)

Remember the whole "if you aren't paying for the product, you are the product"?

It wasn't enough to turn you into a product. Now they also want to turn you into a resource. Farming your comments and posts to feed to an AI model.

What an economy we've built.

[–] tux0r@feddit.de 0 points 7 months ago (1 children)

I wonder why I don't pay for Lemmy.

[–] SorteKanin@feddit.dk 0 points 7 months ago (3 children)

The kind of frightening thing is that anyone could start an instance on the Fediverse, collect all the posts and comments coming in as all instances usually do and then use it to do the same thing, and I'm not sure there's currently anything (legally or otherwise) stopping them.

But at least we have the option to defederate such an instance. If we can find out which ones do it...

[–] BitOneZero@beehaw.org 0 points 7 months ago (1 children)

Free and open information, like Wikipedia, used to be an ideal. I have used Reddit since 2008 or earlier because it got on search engines and shared information consistently on precise topics. Twitter used to also be this way, but now mostly only puts paid subscribers on search engines.

If you are to organize information around topics, such as a Commodore 64 community, and the protocol openly allows copies to be made via federation, I encourage people to have the attitude that information should be treated like Wikipedia content. It sucks that so much information from 10 years ago has been entirely lost now that so many people have deliberately purged their Reddit comments, etc. Tragedy of the commons. And it drags down the entire planet that people squirrel away discussions on topics that are generally public. It's like now everyone wants to monetize even their discussions on Commodore 64 or automotive repair, or keep them under absolute control, behind paywalls, etc.

[–] tryptaminev@feddit.de 0 points 7 months ago

I wouldn't consider this a tragedy of the commons situation. People entrusted reddit to remain a somewhat acceptable company, and reddit betrayed that trust.

People didn't purge their comments to remove this information from the public; they purged them to stop reddit from making money off limiting access to this information.

[–] lemmyingly@lemm.ee 0 points 7 months ago (1 children)

If an instance is defederated, the owners can just spin up a new instance.

I've always thought about what you've said about Lemmy when people start talking about how Lemmy is more privacy focused than Reddit.

As one of your replies has said, many people (hundreds or thousands of them) have a copy of your data on Lemmy: the instance owners. If you decide you've shared too much information, you end up asking every owner to delete that nugget of information. And realistically there is nothing to enforce it. This is one benefit of walled gardens like Reddit: they are legally obligated to delete the information, especially in places like the EU.

[–] SorteKanin@feddit.dk 0 points 7 months ago (1 children)

"This is one benefit of the walled garden of places like Reddit because they are legally obligated to delete the information especially in places like the EU."

In theory yes, but anyone can also scrape reddit for all its posts and comments (and someone likely is). And nobody is making them delete the data. And then there's stuff like the Internet archive complicating stuff further.

[–] lemmyingly@lemm.ee 0 points 7 months ago (1 children)

Whilst it's true that anyone can scrape data off Reddit, I think it's more of a pain, since before the API updates the rate limit was 2 API calls per second. You also have to find or create a scraper. With Lemmy, you follow the instructions (copy and paste) on join-lemmy.org to create your instance and you're done. With both methods you have to configure it to subscribe to communities, so they're about the same in that respect.

In the EU at least there is a right to be forgotten, so yeah, Reddit and other platforms are forced to delete the data on request. I'm not sure how the same can be applied to a distributed network like Lemmy.

There were publicly available archives of Reddit. The last time I checked, you couldn't find the latest submissions and comments. Maybe things have changed, maybe newer alternatives have appeared.

[–] tryptaminev@feddit.de 0 points 7 months ago

The right to be forgotten only applies to personal information, i.e. information that can be associated with information that could be used to identify you.

Since you usually have an email for signup, that would make the data fall under personal information. But reddit could just delete the email address and your user name and show something like:

[deleted]
When does the Narwhal bacon?

And well, it is pretty difficult to find out if, when and where there are backups that still contain your information and could be given to the AI model trainers too. To find these things out, we'd need a precedent-setting case that makes a data protection agency investigate reddit thoroughly.

[–] Sibbo@sopuli.xyz 0 points 7 months ago (1 children)

Legally, in the EU, you probably cannot scrape someone else's instance because of the database copyright law. But I have no idea if that applies to being part of the network, since the other instances send you their content willingly.

Maybe someone should make a license extension to ActivityPub, where instances can communicate what can and what can't be done with the information they publish. Then at least there would be legal clarity. If it can be enforced is another question.

[–] Kichae@lemmy.ca 0 points 7 months ago (1 children)

The thing is, the license probably doesn't mean a whole lot in that case because of the way content is shared on the Fediverse.

As you say, you actively send your content to other websites, and licenses need at least some degree of active acceptance. Including a license field in the metadata almost certainly does not meet any kind of legal threshold. It's significantly weaker than the EULAs that everyone knows nobody reads.

[–] tux0r@feddit.de 0 points 7 months ago

The content posted here has no obvious license. I wonder if an administrator could just put any license of his choice on your posts.

[–] DragonTypeWyvern@literature.cafe 0 points 7 months ago (1 children)

Funny, I don't see anyone saying the AI companies have free right to Reddit's content.

[–] Natanael@slrpnk.net 0 points 7 months ago (1 children)

Can users opt out? Because the content belongs to the users.

[–] tryptaminev@feddit.de 0 points 7 months ago

My layman's understanding would be that they include it in the TOS, and your only option would be to leave the platform and demand that they delete all your content, which they may or may not do. E.g. they could just train the AI on an older backup. Good luck getting your rights recognized and abided by.

[–] DeltaTangoLima@reddrefuge.com 0 points 7 months ago (1 children)

And that's why I deleted all my posts and comments before deleting my account. Sure, they could probably go back and restore it if they wanted but, so far, they haven't.

Glad I landed here on Lemmy.

[–] Phen@lemmy.eco.br 0 points 7 months ago (2 children)

I deleted all my comments last year. Recently I got a notification for a response to one of those comments. When I clicked the notification link, my comment and the response were visible. The comment doesn't show up in my profile.

[–] Hubi@feddit.de 0 points 7 months ago

I've had the same experience. Most scripts just erase the comments available directly through your reddit profile, which is limited to the most recent ~2000 posts that you've made. To fully erase anything and everything, you need to request all your data from reddit, download the .zip and feed it into an application like shreddit.
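
A minimal sketch of the first step of that approach: reading every comment ID out of the unzipped data export rather than relying on what the profile page shows. It assumes the export contains a `comments.csv` file with an `id` column; both names are assumptions here, so check them against your own export. `load_comment_ids` is a hypothetical helper for illustration, not part of shreddit.

```python
import csv
from pathlib import Path


def load_comment_ids(export_dir, filename="comments.csv", id_column="id"):
    """Return all comment IDs listed in an unzipped data export.

    The file name and column name are assumptions; adjust them to
    match the actual export.
    """
    path = Path(export_dir) / filename
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Skip rows where the ID cell is missing or empty.
        return [row[id_column] for row in reader if row.get(id_column)]
```

The resulting list can then be handed to whatever tool or script does the actual editing or deleting.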

[–] thatsnothowyoudoit@lemmy.ca 0 points 7 months ago* (last edited 7 months ago) (1 children)

Reddit was aggressively rate limiting tools used to delete and edit content in a funny way when the API pricing was announced. The API wouldn’t return an error, the rate limiting was silent, and the tools would report successful deletion or edits even when the edit or deletion wasn’t made.

I had to modify an existing script to handle the 5-second rate limit and, in lieu of deleting, I just rewrote each comment with a farewell.

Even then I did 3 passes (with minor additional edits) in case Reddit was saving previous edits.

My content has stayed edited.
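
A rough sketch of the approach described above, not the actual script: wait between calls, overwrite instead of deleting, repeat over several passes, and read each comment back because a "success" response cannot be trusted when rate limiting is silent. `edit_comment` and `fetch_body` are hypothetical callables you would supply yourself; they are not real Reddit API functions. The 5-second delay and 3 passes are taken from the description above.

```python
import time


def overwrite_comments(comment_ids, edit_comment, fetch_body, farewell,
                       delay_seconds=5.0, passes=3, sleep=time.sleep):
    """Overwrite each comment, then re-read it to confirm the edit stuck.

    edit_comment(comment_id, text) and fetch_body(comment_id) are
    hypothetical callables supplied by the caller.
    Returns the IDs whose text still did not match after the last pass.
    """
    failed = []
    for pass_number in range(1, passes + 1):
        # Vary the text slightly per pass so each pass is a real edit.
        text = f"{farewell} (edit {pass_number})"
        failed = []
        for comment_id in comment_ids:
            edit_comment(comment_id, text)
            sleep(delay_seconds)  # stay under the assumed rate limit
            # Do not trust a "success" response: read the comment back.
            if fetch_body(comment_id) != text:
                failed.append(comment_id)
    return failed
```

Any IDs returned at the end are comments where the edit was silently dropped and that need another attempt.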

[–] dubyakay@lemmy.ca 0 points 7 months ago* (last edited 7 months ago) (1 children)

Do you still have the Python script available?

I was fine with keeping my comments up before for the future searchers, but I'm not fine with that shithole making profit off of it.

[–] Hubi@feddit.de 0 points 7 months ago

I recently used shreddit with the --gdpr-export-dir flag and it worked perfectly.

[–] bilboswaggings@sopuli.xyz 0 points 7 months ago (1 children)
[–] Hubi@feddit.de 0 points 7 months ago* (last edited 7 months ago)

And the outputs of bots. There has been a shocking increase in auto-generated comments on reddit in the past years and it's turning the training data into a minefield.