this post was submitted on 25 Jul 2024

The Rule (lemmy.ml)

submitted 4 months ago by roon@lemmy.ml to c/196@lemmy.blahaj.zone

AdrianTheFrog@lemmy.world 5 points 4 months ago

I don't have access to Llama 3.1 405B, but I can see that Llama 3 70B takes up ~145 GB, so 405B would probably take around 840 GB just to download the uncompressed fp16 (16 bits per weight) model. With 8-bit quantization it would probably be closer to 420 GB, and with 4-bit closer to 210 GB. 4-bit quantization is really going to start harming the model outputs, and it's still probably not going to fit in your RAM, let alone VRAM.
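The napkin math above is either scaling the observed 70B download linearly by parameter count, or bytes = parameters × bits per weight ÷ 8. A quick Python sketch of both, assuming weights only (no KV cache, activations, or file-format overhead); the function names are made up for illustration:

```python
# Rough weight-storage estimates for a dense model.
# Weights only: ignores KV cache, activations, and checkpoint/file overhead.


def scaled_size_gb(known_size_gb: float, known_params_b: float, target_params_b: float) -> float:
    """Scale an observed checkpoint size linearly by parameter count (billions)."""
    return known_size_gb * target_params_b / known_params_b


def quantized_size_gb(params_b: float, bits_per_weight: int) -> float:
    """Raw weight size in decimal GB: (params_b * 1e9) * bits / 8 bytes / 1e9."""
    return params_b * bits_per_weight / 8


if __name__ == "__main__":
    # ~145 GB observed for the 70B fp16 download, scaled up to 405B
    print(f"scaled fp16: {scaled_size_gb(145, 70, 405):.0f} GB")
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit raw: {quantized_size_gb(405, bits):.1f} GB")
```

The raw formula gives 810 / 405 / 202.5 GB; the scaled 70B figure lands a bit higher (~839 GB) because the real download includes some extra beyond the bare weights.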

So yes, it is a crazy model. You'd probably need at least 3 or 4 A100s to have a good experience with it.
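For the GPU count, divide the weight size by per-card memory and round up. This sketch assumes the 80 GB A100 (the comment doesn't say which variant) and ignores KV cache and activations, so real requirements are higher; `min_gpus` is a hypothetical helper:

```python
import math

# Assumption: 80 GB A100s; the comment doesn't specify the variant.
A100_MEMORY_GB = 80


def min_gpus(model_size_gb: float, gpu_memory_gb: float = A100_MEMORY_GB) -> int:
    """Fewest GPUs whose combined memory holds the weights (no KV cache or overhead)."""
    return math.ceil(model_size_gb / gpu_memory_gb)


if __name__ == "__main__":
    for label, size_gb in (("fp16", 810), ("8-bit", 405), ("4-bit", 202.5)):
        print(f"{label}: at least {min_gpus(size_gb)} x 80 GB A100")
```

That works out to 11 cards for fp16, 6 for 8-bit and 3 for 4-bit, so "3 or 4 A100s" presumably assumes the 4-bit quant with a little headroom.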