this post was submitted on 14 Nov 2024
49 points (98.0% liked)

Fediverse

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc.).

I don't actually want to do this right now, but I do want to know if it's really decentralized yet. Doing it completely looks like it means running each of:

  • A client ✅
  • A personal data server ✅
  • A relay ❓
  • Labelers ✅
  • Feed generators ✅

It looks like the relay might be the bottleneck. If I'm understanding the protocol correctly, a relay could consume less than the whole network, so it wouldn't have to be ridiculously expensive to operate, but I'm not finding examples of people doing that.
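As a rough illustration of a relay that consumes less than the whole network, here is a minimal Python sketch that forwards only events from a chosen set of accounts. It assumes a simplified event record: the real firehose (`com.atproto.sync.subscribeRepos`) is a CBOR-encoded stream with more fields, and the DIDs, the `CommitEvent` shape, and the `partial_relay` name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class CommitEvent:
    """Simplified stand-in for a firehose commit event (hypothetical shape)."""
    repo: str  # DID of the account whose repository changed
    seq: int   # position of the event in the upstream stream
    path: str  # record path, e.g. "app.bsky.feed.post/<rkey>"


def partial_relay(events: Iterable[CommitEvent], tracked: Set[str]) -> Iterator[CommitEvent]:
    """Yield only events whose repository is in `tracked`.

    Storage and bandwidth then scale with the tracked accounts
    rather than with the whole network.
    """
    for event in events:
        if event.repo in tracked:
            yield event


if __name__ == "__main__":
    upstream = [
        CommitEvent("did:plc:alice", 1, "app.bsky.feed.post/1"),
        CommitEvent("did:plc:bob", 2, "app.bsky.feed.post/2"),
        CommitEvent("did:plc:carol", 3, "app.bsky.feed.like/3"),
    ]
    kept = partial_relay(upstream, {"did:plc:alice", "did:plc:carol"})
    print([event.seq for event in kept])  # [1, 3]
```

The trade-off is that anything outside `tracked` never reaches this relay at all.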

Blaze 1 point 6 days ago

I've been asking about this for a long time. I've yet to be presented with a non-Bluesky-controlled relay instance. This is a linchpin of the protocol, and it prevents true federation.

Happy to be proven wrong someday, but Bluesky is just Twitter with user-controlled nodes. They can decide to remove nodes at their whim.

surfrock66@lemmy.world 10 points 1 week ago

This is a good breakdown. A firehose relay takes TBs of storage and is not practical for self-hosting, and the AppView isn't hostable yet: https://alice.bsky.sh/post/3laega7icmi2q

originalucifer@moist.catsweat.com 5 points 1 week ago

A firehose relay takes TBs of storage

which is similar to the nonsense ActivityPub has with replicating whole datasets everywhere... except here it's one company controlling the whole shebang. It's a failure of design.

9point6@lemmy.world 5 points 1 week ago* (last edited 1 week ago)

My friend, it's not nonsense, it's basically how decentralised communication has to work if you want any reasonable level of recency & history in the data.

Usenet was basically the original, and I believe a modern news provider needs something like 50 petabytes of storage to run a 10-year data retention service.
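Taking the 50 PB / 10-year figure at face value (it is the commenter's recollection, not a verified number), a quick Python calculation shows the sustained ingest rate it implies, assuming decimal units (1 PB = 1,000 TB) and a 365.25-day year.

```python
RETENTION_PB = 50      # commenter's recollection, not a measured value
RETENTION_YEARS = 10
DAYS_PER_YEAR = 365.25
SECONDS_PER_DAY = 86_400

total_tb = RETENTION_PB * 1_000  # decimal units: 1 PB = 1,000 TB
per_day_tb = total_tb / (RETENTION_YEARS * DAYS_PER_YEAR)
per_second_mb = per_day_tb * 1_000_000 / SECONDS_PER_DAY  # 1 TB = 1,000,000 MB

print(f"~{per_day_tb:.1f} TB/day, ~{per_second_mb:.0f} MB/s sustained")
```

That works out to roughly 13.7 TB of new data per day, or about 158 MB/s around the clock, far beyond what a typical home server handles.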

Cris_Color@lemmy.world 2 points 1 week ago

Not the person you replied to-

I don't follow why it would be necessary. Would you mind expanding on why it's needed for decentralized interaction to function the way users would expect?

(Also, I recognize that might be a huge can of worms; if you do mind, that's perfectly understandable. You seem more knowledgeable than me, and it's an issue I'm very curious about, so it seemed worth asking :)

Kichae@lemmy.ca 2 points 6 days ago

From my experience, how people expect it to work is to be centralized and neutrally hosted, with instances acting as dumb portals, mainframe+terminal style. So it cannot work as people expect and be decentralized.

9point6@lemmy.world 6 points 1 week ago* (last edited 1 week ago)

Essentially for something to be decentralised and not ephemeral, everyone needs a copy of the data.

To go into a bit more detail: one of the biggest benefits of decentralised systems is redundancy, and it generally has to be built in, otherwise you have a Single Point Of Failure™️ and you get data loss when that point goes away. Since any sensible decentralised system is designed to avoid that scenario, the data has to be somewhere, and generally the simplest and least expensive (in terms of processing) way to improve on having it in one place is to have it in every place. Any time the data is in neither one place nor every place, you then have the exercise of figuring out where it actually is. That "finding it" step takes time and effort: imagine a standard semi-popular Lemmy post, where the data is potentially coming from all sorts of different places, which may or may not be reachable at the time. This would inevitably make request times ridiculous, and basically no one would use it.

At the end of the day, any kind of processing is energy, cost & time expensive, whereas storage makes that part of the process effectively instant and is much cheaper than increasing processing power in both cost and energy.

So basically, in this use case and many like it, if you have to pick what to optimise, it makes sense to optimise for lower processing and higher storage requirements rather than vice versa.

The history aspect is more straightforward to understand given the above: if you expect people to care what happened a year ago and want to support that, that data needs to live somewhere.
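The "finding it" cost can be put in rough numbers with a toy model. This Python sketch assumes, hypothetically, that each comment in a thread lives on one of `servers` equally likely hosts and that each comment's host is reachable with probability `availability`; the function names and parameter values are illustrative, not measurements of Lemmy or any other system.

```python
def expected_hosts_to_contact(comments: int, servers: int) -> float:
    """Expected number of distinct hosts holding at least one comment,
    when each comment independently lands on one of `servers` hosts."""
    return servers * (1 - (1 - 1 / servers) ** comments)


def expected_missing_comments(comments: int, availability: float) -> float:
    """Expected comments lost when each comment's host is reachable
    with probability `availability`."""
    return comments * (1 - availability)


if __name__ == "__main__":
    comments, servers, availability = 300, 500, 0.95  # hypothetical values
    # Fully replicated: one local read, nothing missing (assuming the local copy is complete).
    print("replicated: 1 local read, 0 missing")
    # Scattered: one remote request per distinct host, and some hosts will be down.
    hosts = expected_hosts_to_contact(comments, servers)
    missing = expected_missing_comments(comments, availability)
    print(f"scattered: ~{hosts:.0f} remote requests, ~{missing:.0f} comments missing")
```

Keeping a local copy turns a couple of hundred network round trips, some of which fail, into one local read: the storage-for-processing trade described above.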

Cris_Color@lemmy.world 1 point 6 days ago

Thank you very much for the full explanation; there's a lot of that I hadn't considered before. Intuitively it feels like it defeats a lot of the point for everything to be hosted over and over, but I can see how it'd be really hard for things to work as expected if there's only one copy of EVERY comment in a big thread and they're all in different places.

Zak@lemmy.world 4 points 1 week ago

That's enlightening. It links to an article about self hosting a relay, which explains that, as I suspected, a relay does not have to mirror the entire network. It also seems that using a relay at all is an optional optimization.

It looks like the Bluesky AppView is not (yet?) open source. I wonder why nobody has built an alternative yet.

pablo@lemm.ee 1 point 1 week ago* (last edited 1 week ago)

For better or for worse, this fragmentation on the fediverse (which causes the missing replies and posts in smaller instances) is what allows fediverse instances to be hosted by smaller actors, and it is one of the most important reasons I feel the fediverse can and will survive (regardless of how popular other services become).
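The missing-replies effect can be illustrated with a deliberately simplified, Mastodon-style delivery model in which a server only receives a post if one of its local users follows the author. Real ActivityPub software also delivers to mentioned users and may forward replies, so this Python sketch overstates the gap; the account names and the `visible_replies` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Reply:
    author: str  # fully qualified handle, e.g. "ana@big.example"
    text: str


def visible_replies(thread: List[Reply], local_follows: Set[str]) -> List[Reply]:
    """Replies a server would hold if posts are pushed only to servers
    where some local user follows the author (simplified model)."""
    return [reply for reply in thread if reply.author in local_follows]


if __name__ == "__main__":
    thread = [
        Reply("ana@big.example", "First!"),
        Reply("ben@other.example", "Good point."),
        Reply("cy@third.example", "Disagree."),
    ]
    # A large server: its many users collectively follow all three authors.
    big = visible_replies(thread, {"ana@big.example", "ben@other.example", "cy@third.example"})
    # A small server: its few users only follow one of them.
    small = visible_replies(thread, {"ben@other.example"})
    print(len(big), len(small))  # 3 1
```

The small server stores far less, which is what keeps it cheap to run, but its view of the thread is incomplete.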

It feels quite unrealistic that a hobbyist or even small organisations would ever be able to fully host bluesky. Unless I’m fundamentally misunderstanding something about how it works.

Can relays be lazy, i.e. be used by a limited number of users and just stream those users' activities plus what they follow? If that is possible, then I guess you'd lose the benefit of algorithmic feeds, though… or again, I fundamentally don't understand something about the protocol.
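The "lazy relay" idea can be sketched as computing the set of accounts worth tracking: the relay's own users plus everyone they follow, one hop out. This is a hypothetical Python sketch (the follow graph is a plain dictionary, not a real AT Protocol API, and `relay_scope`/`coverage` are illustrative names); it also shows the algorithmic-feed trade-off raised above, since a network-wide feed built on such a relay would only ever see the fraction of accounts inside that set.

```python
from typing import Dict, Set


def relay_scope(local_users: Set[str], follows: Dict[str, Set[str]]) -> Set[str]:
    """Accounts a lazy relay would need to track: its users plus the
    accounts they directly follow."""
    scope = set(local_users)
    for user in local_users:
        scope |= follows.get(user, set())
    return scope


def coverage(scope: Set[str], all_accounts: Set[str]) -> float:
    """Fraction of the known network that falls inside the relay's scope."""
    return len(scope & all_accounts) / len(all_accounts)


if __name__ == "__main__":
    follows = {
        "me": {"friend", "news"},
        "friend": {"me", "artist"},
        "stranger": {"artist"},
    }
    everyone = {"me", "friend", "news", "artist", "stranger"}
    scope = relay_scope({"me"}, follows)
    print(sorted(scope))              # ['friend', 'me', 'news']
    print(coverage(scope, everyone))  # 0.6
```

Widening the scope (two hops, popular accounts, and so on) improves coverage at the cost of storage, which is the same trade-off the rest of the thread describes.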