this post was submitted on 27 Jun 2024
3 points (61.5% liked)

Technology

33636 readers
217 users here now

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in a DM before posting product reviews or ads; otherwise, such posts are subject to removal.


Rules:

1: All Lemmy rules apply

2: No low-effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived versions as sources, NOT screenshots. This helps blind users.

5: Personal rants about Big Tech CEOs like Elon Musk are unwelcome (this does not include posts about their companies affecting a wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: Crypto-related posts, unless essential, are disallowed

founded 5 years ago
MODERATORS
top 4 comments
[–] cygnus@lemmy.ca 3 points 1 week ago

Finally some good "AI" news. Those things aren't going away, so I'm happy to see any improvements to their energy efficiency.

[–] theshatterstone54@feddit.uk 2 points 1 week ago* (last edited 1 week ago) (1 children)

Why are people downvoting? This is huge and should make LLMs more power efficient and memory efficient.

[–] yogthos@lemmy.ml 0 points 1 week ago

Indeed, this seems like a big step forward. Here's a link to the model: https://github.com/ridgerchu/matmulfreellm

[–] autotldr@lemmings.world 0 points 1 week ago

This is the best summary I could come up with:


The researchers' approach involves two main innovations: first, they created a custom LLM and constrained it to use only ternary values (-1, 0, 1) instead of traditional floating-point numbers, which allows for simpler computations.
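To make the ternary idea concrete, here is a minimal sketch (not the paper's exact quantization scheme; the threshold and functions are illustrative assumptions) showing why weights restricted to {-1, 0, +1} let a matrix-vector product be computed with only additions and subtractions:

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize weights to {-1, 0, +1} by thresholding.
    Illustrative only -- not the paper's exact scheme."""
    t = np.zeros_like(w, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t

def ternary_matvec(t, x):
    """With ternary weights, each dot product reduces to summing the
    inputs where the weight is +1 and subtracting those where it is -1:
    no multiplications are needed."""
    out = np.zeros(t.shape[0])
    for i in range(t.shape[0]):
        out[i] = x[t[i] == 1].sum() - x[t[i] == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
T = ternarize(W)
print(ternary_matvec(T, x))   # add/subtract-only result
print(T.astype(float) @ x)    # same values as an ordinary matmul with ternary weights
```

The two printed vectors agree, which is the point: once weights are ternary, the expensive multiply-accumulate hardware path can be replaced by cheap accumulation.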

Second, the researchers redesigned the computationally expensive self-attention mechanism in traditional language models with a simpler, more efficient unit (that they called a MatMul-free Linear Gated Recurrent Unit—or MLGRU) that processes words sequentially using basic arithmetic operations instead of matrix multiplications.
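A toy recurrent step in the spirit of that design might look like the sketch below. The gate names, wiring, and initialization here are my own illustrative assumptions, not the paper's exact MLGRU formulation; the point is that tokens are processed one at a time, the "matmuls" are ternary add/subtract accumulations, and the state update itself is purely elementwise:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tmv(T, x):
    # Ternary mat-vec: with weights in {-1, 0, +1}, the "matmul"
    # is just masked additions and subtractions.
    return (x * (T == 1)).sum(axis=1) - (x * (T == -1)).sum(axis=1)

def mlgru_step(x_t, h_prev, Wf, Wc, Wg):
    # One toy step: token-by-token processing with elementwise gating.
    # Structure is an illustrative assumption, not the paper's exact unit.
    f = sigmoid(tmv(Wf, x_t))          # forget gate
    c = np.tanh(tmv(Wc, x_t))          # candidate hidden state
    g = sigmoid(tmv(Wg, x_t))          # output gate
    h = f * h_prev + (1.0 - f) * c     # purely elementwise state update
    return g * h, h

rng = np.random.default_rng(1)
d_in, d_hid, seq_len = 8, 4, 5
Wf, Wc, Wg = (rng.integers(-1, 2, size=(d_hid, d_in)) for _ in range(3))
h = np.zeros(d_hid)
for x_t in rng.normal(size=(seq_len, d_in)):
    y, h = mlgru_step(x_t, h, Wf, Wc, Wg)
```

Because the recurrence mixes the previous state with elementwise gates rather than with token-to-token attention matrices, the per-token cost stays constant in sequence length, which is where the efficiency claim comes from.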

These changes, combined with a custom hardware implementation to accelerate ternary operations through the aforementioned FPGA chip, allowed the researchers to achieve what they claim is performance comparable to state-of-the-art models while reducing energy use.

Researchers claim the MatMul-free LM achieved competitive performance against the Llama 2 baseline on several benchmark tasks, including answering questions, commonsense reasoning, and physical understanding.

The researchers project that their approach could theoretically intersect with and surpass the performance of standard LLMs at scales around 10²³ FLOPS, which is roughly equivalent to the training compute required for models like Meta's Llama-3 8B or Llama-2 70B.

The article was updated on June 26, 2024 at 9:20 AM to remove the author's inaccurate power estimate for running an LLM locally on an RTX 3060.


The original article contains 570 words, the summary contains 206 words. Saved 64%. I'm a bot and I'm open source!