Machine Learning

1

3

New AI model “learns” how to simulate Super Mario Bros. from video footage (arstechnica.com)

submitted 6 days ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

2

11

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o) (huggingface.co)

submitted 1 week ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

"Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close.

The technique that drives Reflection 70B is simple, but very powerful.

Current LLMs have a tendency to hallucinate, and can’t recognize when they do so.

Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users.

Important to note: We have checked for decontamination against all benchmarks mentioned using @lmsysorg’s LLM Decontaminator.

The weights of our 70B model are available today on @huggingface here: https://huggingface.co/mattshumer/Reflection-70B

@hyperbolic_labs API available later today.

Next week, we will release the weights of Reflection-405B, along with a short report going into more detail on our process and findings.

Most importantly, a huge shoutout to @csahil28 and @GlaiveAI.

I’ve been noodling on this idea for months, and finally decided to pull the trigger a few weeks ago. I reached out to Sahil and the data was generated within hours.

If you’re training models, check Glaive out.

This model is quite fun to use and insanely powerful.

Please check it out — with the right prompting, it’s an absolute beast for many use-cases.

Demo here: https://reflection-playground-production.up.railway.app/

405B is coming next week, and we expect it to outperform Sonnet and GPT-4o by a wide margin.

But this is just the start. I have a few more tricks up my sleeve.

I’ll continue to work with @csahil28 to release even better LLMs that make this one look like a toy.

Stay tuned."

https://x.com/mattshumer_/status/1831767014341538166

3

6

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI (www.lifeiscomputation.com)

submitted 1 week ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

4 comments fedilink

4

5

The Difference Between Speaking and Thinking (www.theatlantic.com)

submitted 1 week ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

https://archive.is/SXZMe

5

Diffusion Models Are Real-Time Game Engines (gamengen.github.io)

submitted 2 weeks ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

6

4

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%. (github.com)

submitted 2 weeks ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

7

1

Transformer Explainer (poloclub.github.io)

submitted 4 weeks ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

8

4

Alibaba claims no. 1 spot in AI math models with Qwen2-Math (venturebeat.com)

submitted 1 month ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

9

4

How to convert a positionally encoded predicted embedding from a decoder to its matching token? (infosec.pub)

submitted 1 month ago by yboutros@infosec.pub to c/machinelearning@lemmy.ml

0 comments fedilink

When training a transformer on positionally encoded embeddings, should the tgt output embeddings also be positionally encoded? If so, wouldn't the predicted/decoded embeddings also be positionally encoded?
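A minimal sketch of the usual arrangement, assuming the standard encoder-decoder Transformer setup (which may not match the poster's model): the positional encoding is added to the shifted target embeddings that go into the decoder, while the decoder's output vectors are never "un-encoded". Instead, each output vector is scored against the plain token embedding table (or a separate output projection) to get logits, and the highest-scoring index is the predicted token. The toy embedding table, the dimensions, and the function names below are hypothetical, and the decoder itself is left out.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal encoding for one position (sin on even dims, cos on odd dims)."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


# Hypothetical toy embedding table: 4 tokens, d_model = 4 (one-hot rows for clarity).
EMBEDDINGS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def encode_decoder_inputs(token_ids):
    """Embed the (shifted) target tokens and add the positional encoding."""
    d_model = len(EMBEDDINGS[0])
    return [
        [e + p for e, p in zip(EMBEDDINGS[tok], positional_encoding(pos, d_model))]
        for pos, tok in enumerate(token_ids)
    ]


def hidden_to_token(hidden):
    """Score a decoder output vector against every token embedding; return the best index."""
    logits = [sum(h * e for h, e in zip(hidden, row)) for row in EMBEDDINGS]
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Positional information is only injected on the input side.
    print(encode_decoder_inputs([2, 0]))
    # A made-up decoder output vector is mapped to a token id via the logits.
    print(hidden_to_token([0.1, 0.2, 0.9, 0.3]))
```

In this arrangement the training loss is a cross-entropy between the logits and the target token ids, so the targets themselves are indices rather than positionally encoded embeddings.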

10

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow (decrypt.co)

submitted 1 month ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

1 comment fedilink

11

20

AI models collapse when trained on recursively generated data (www.nature.com)

submitted 1 month ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

3 comments fedilink

12

1

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing (lmsys.org)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

13

1

Alibaba's Qwen LLM model leading open source rankings (huggingface.co)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

14

1

By using the same techniques Google used to solve Go (MCTS and backprop), Llama 8B gets 96.7% on math benchmark GSM8K. That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters! (arxiv.org)

submitted 2 months ago* (last edited 2 months ago) by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

15

1

Mixture of Agents (MoA) leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0 (www.together.ai)

submitted 2 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

16

1

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate (huggingface.co)

submitted 3 months ago by ylai@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

17

1

Torrent tracker for open models (aitracker.art)

submitted 3 months ago by keepthepace@slrpnk.net to c/machinelearning@lemmy.ml

0 comments fedilink

Someone (Dreamertist on Reddit) got tired of depending on Hugging Face for downloading models and proposes a torrent tracker to share these huge blobs more efficiently.

It has just started and only a few models have been uploaded so far, but I think it is worth all of us putting our local stashes online there. Making a new torrent is super easy, with one missing step in the instructions: when "re-downloading" the model, you need to save it in the directory where it already exists. That way the client will "resume" at 100% completion and switch to seeding mode.

18

2

Can gpt generate a gpt model? (sh.itjust.works)

submitted 3 months ago by wargreymon@sh.itjust.works to c/machinelearning@lemmy.ml

1 comment fedilink

Imagine AI producing offspring...

19

1

Sakuga-42M Dataset: Scaling Up Cartoon Research (arxiv.org)

submitted 3 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

20

1

How AI 'Understands' Images (CLIP) (www.youtube.com)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

21

1

Where do these stains come from and how can I fix them? (sopuli.xyz)

submitted 4 months ago* (last edited 4 months ago) by smokinliver@sopuli.xyz to c/machinelearning@lemmy.ml

0 comments fedilink

Hey guys,

I have been experimenting with self-supervised visual learning a bit. Until now I have only ever used U-Nets and related architectures.

No matter which task, images, or other parameters I changed, I always ended up with these stains on my output images (marked in green here), sometimes more, sometimes fewer.

Can anybody tell me where they come from and how I can prevent them?

In the attached picture the input (left) and target (right) are the same, so that I can be sure these stains do not come from a badly designed learning task, yet they still appear (output is the middle image).

Thanks in advance and all the best :D

Edit: added line breaks

22

1

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models (hidiffusion.github.io)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

23

1

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior (animate-your-word.github.io)

submitted 4 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

24

1

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (arxiv.org)

submitted 5 months ago by yogthos@lemmy.ml to c/machinelearning@lemmy.ml

0 comments fedilink

25

0

What are your thoughts on Microsoft Copilot? (lemmy.blahaj.zone)

submitted 5 months ago by Kit@lemmy.blahaj.zone to c/machinelearning@lemmy.ml

0 comments fedilink

Copilot sounds amazing on paper. The free (to 365 subs) version on the web is just ChatGPT (GPT-4), so that's familiar enough. The integration with 365 applications is really what grabs me. Stuff like tossing it 10 spreadsheets and asking it to analyze and compare the data, having a virtual assistant remind me of upcoming action items, and getting a meeting summary when I zone out - it all sounds really handy.

I met with Microsoft last week and they're willing to give me a 90-day trial if I want to take it for a spin. Any thoughts or suggestions? Ideally, I want to determine whether this will improve productivity for my end users enough to be worth the insane cost of $30/user/mo.