overview for Blaed

44

submitted 1 month ago by Blaed@lemmy.world to c/fosai@lemmy.world

10 comments fedilink

Meta has released and open-sourced Llama 3.1 in three different sizes: 8B, 70B, and 405B

This new Llama iteration and update brings state-of-the-art performance to open-source ecosystems.

If you've had a chance to use Llama 3.1 in any of its variants - let us know how you like it and what you're using it for in the comments below!

Llama 3.1 Megathread

For this release, we evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, we performed extensive human evaluations that compare Llama 3.1 with competing models in real-world scenarios. Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. Additionally, our smaller models are competitive with closed and open models that have a similar number of parameters.

As our largest model yet, training Llama 3.1 405B on over 15 trillion tokens was a major challenge. To enable training runs at this scale and achieve the results we have in a reasonable amount of time, we significantly optimized our full training stack and pushed our model training to over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this scale.

Official Meta News & Documentation

See also: The Llama 3 Herd of Models paper here:

https://ai.meta.com/research/publications/the-llama-3-herd-of-models/

HuggingFace Download Links

`8B`

Meta-Llama-3.1-8B

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

Meta-Llama-3.1-8B-Instruct

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

Llama-Guard-3-8B

https://huggingface.co/meta-llama/Llama-Guard-3-8B

Llama-Guard-3-8B-INT8

https://huggingface.co/meta-llama/Llama-Guard-3-8B-INT8

`70B`

Meta-Llama-3.1-70B

https://huggingface.co/meta-llama/Meta-Llama-3.1-70B

Meta-Llama-3.1-70B-Instruct

https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

`405B`

Meta-Llama-3.1-405B-FP8

https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-FP8

Meta-Llama-3.1-405B-Instruct-FP8

https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

Meta-Llama-3.1-405B

https://huggingface.co/meta-llama/Meta-Llama-3.1-405B

Meta-Llama-3.1-405B-Instruct

https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

Getting the models

You can download the models directly from Meta or one of our download partners: Hugging Face or Kaggle.

Alternatively, you can work with ecosystem partners to access the models through the services they provide. This approach can be especially useful if you want to work with the Llama 3.1 405B model.

Note: Llama 3.1 405B requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.

Learn more at:

https://llama.meta.com/docs/getting_the_models

Running the models

More guides and resources

How-to Fine-tune Llama 3.1 models

https://llama.meta.com/docs/how-to-guides/fine-tuning

Quantizing Llama 3.1 models

https://llama.meta.com/docs/how-to-guides/quantization

Prompting Llama 3.1 models

https://llama.meta.com/docs/how-to-guides/prompting

Llama 3.1 recipes

https://github.com/meta-llama/llama-recipes

YouTube media

Rowan Cheung - Mark Zuckerberg on Llama 3.1, Open Source, AI Agents, Safety, and more

https://www.youtube.com/watch?v=Vy3OkbtUa5k

Matthew Berman - BREAKING: LLaMA 405b is here! Open-source is now FRONTIER!

https://www.youtube.com/watch?v=JLEDwO7JEK4

Wes Roth - Zuckerberg goes SCORCHED EARTH.... Llama 3.1 BREAKS the "AGI Industry"*

https://www.youtube.com/watch?v=QyRWqJehK7I

1littlecoder - How to DOWNLOAD Llama 3.1 LLMs

https://www.youtube.com/watch?v=R_vrjOkGvZ8

Bloomberg - Inside Mark Zuckerberg's AI Era | The Circuit

https://www.youtube.com/watch?v=YuIc4mq7zMU

1

Direct Preference Optimization: Your Language Model is Secretly a Reward Model (lemmy.world)

submitted 6 months ago by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

Hello everyone. Today I'd like to catch up on another paper, a popular one that has pushed a new fine-tuning trend called DPO (Direct Preference Optimization).

Included with the paper are a few open-source projects and code repos that support DPO training. If you are fine-tuning models, this is worth looking into!

DPO Arxiv Paper

https://arxiv.org/abs/2305.18290

Try Fine-tuning w/ DPO using Axolotl

https://github.com/OpenAccess-AI-Collective/axolotl

Try Fine-tuning w/ DPO using Llama Factory

https://github.com/hiyouga/LLaMA-Factory

Try Fine-tuning w/DPO using Unsloth

https://github.com/unslothai/unsloth

Now.. onto the paper!

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).

However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model.

In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss.

The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.

Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.

Figure 1: DPO optimizes for human preferences while avoiding reinforcement learning. Existing methods for fine-tuning language models with human feedback first fit a reward model to a dataset of prompts and human preferences over pairs of responses, and then use RL to find a policy that maximizes the learned reward. In contrast, DPO directly optimizes for the policy best satisfying the preferences with a simple classification objective, fitting an implicit reward model whose corresponding optimal policy can be extracted in closed form

Figure 2: Left. The frontier of expected reward vs KL to the reference policy. DPO provides the highest expected reward for all KL values, demonstrating the quality of the optimization.

Right. TL;DR summarization win rates vs. human-written summaries, using GPT-4 as evaluator. DPO exceeds PPO’s best-case performance on summarization, while being more robust to changes in the sampling temperature.

Learning from preferences is a powerful, scalable framework for training capable, aligned language models. We have introduced DPO, a simple training paradigm for training language models from preferences without reinforcement learning.

Rather than coercing the preference learning problem into a standard RL setting in order to use off-the-shelf RL algorithms, DPO identifies a mapping between language model policies and reward functions that enables training a language model to satisfy human preferences directly, with a simple cross-entropy loss, without reinforcement learning or loss of generality.

With virtually no tuning of hyperparameters, DPO performs similarly or better than existing RLHF algorithms, including those based on PPO; DPO thus meaningfully reduces the barrier to training more language models from human preferences.

Our results raise several important questions for future work. How does the DPO policy generalize out of distribution, compared with learning from an explicit reward function?

Our initial results suggest that DPO policies can generalize similarly to PPO-based models, but more comprehensive study is needed. For example, can training with self-labeling from the DPO policy similarly make effective use of unlabeled prompts? On another front, how does reward over-optimization manifest in the direct preference optimization setting, and is the slight decrease in performance in Figure 3-right an instance of it?

Additionally, while we evaluate models up to 6B parameters, exploration of scaling DPO to state-of-the-art models orders of magnitude larger is an exciting direction for future work. Regarding evaluations, we find that the win rates computed by GPT-4 are impacted by the prompt; future work may study the best way to elicit high-quality judgments from automated systems. Finally, many possible applications of DPO exist beyond training language models from human preferences, including training generative models in other modalities.

MoE-Mamba

Efficient Selective State Space Models with Mixture of Experts

Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur

State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based LLMs, including recent state-of-the-art open-source models.

We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable, Transformer-like performance.

Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer.

Category	Hyperparameter	Value
Model	Total Blocks	8 (16 in Mamba)
	dmodel	512
Feed-Forward	df f	2048 (with Attention) or 1536 (with Mamba)
Mixture of Experts	dexpert	2048 (with Attention) or 1536 (with Mamba)
	Experts	32
Attention	nheads	8
Training	Training Steps	100k
	Context Length	256
	Batch Size	256
	LR	1e-3
	LR Warmup	1% steps
	Gradient Clipping	0.5

MoE seems like the logical way to move forward with Mamba, at this point, I'm wondering could there anything else holding it back? Curious to see more tools and implementations compare against some of the other trending transformer-based LLM stacks.

1

Mamba: Linear-Time Sequence Modeling with Selective State Spaces (lemmy.world)

submitted 7 months ago* (last edited 7 months ago) by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

Hello everyone, I have a very exciting paper to share with you today. This came out a little while ago, (like many other papers since my hiatus) so allow me to catch you up if you haven't read it already.

Mamba

Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language.

We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements.

First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token.

Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba).

Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences.

As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.

On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

(...) Mamba achieves state-of-the-art results on a diverse set of domains, where it matches or exceeds the performance of strong Transformer models. We are excited about the broad applications of selective state space models to build foundation models for different domains, especially in emerging modalities requiring long context such as genomics, audio, and video. Our results suggest that Mamba is a strong candidate to be a general sequence model backbone.

What are your thoughts on Mamba?

1

Develop Alongside Local LLMs w/ Open Interpreter (lemmy.world)

submitted 7 months ago by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

I don't think this has been shared here before. Figured now is as good time as ever.

I'd like to share with everyone Open Interpreter.

Open Interpreter

Check it out here: https://github.com/KillianLucas/open-interpreter

Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer's general-purpose capabilities:

Create and edit photos, videos, PDFs, etc.

Control a Chrome browser to perform research

Plot, clean, and analyze large datasets

...etc. ⚠️ Note: You'll be asked to approve code before it's run.

Comparison to ChatGPT's Code Interpreter

OpenAI's release of Code Interpreter with GPT-4 presents a fantastic opportunity to accomplish real-world tasks with ChatGPT.

However, OpenAI's service is hosted, closed-source, and heavily restricted:

No internet access.

Limited set of pre-installed packages.

100 MB maximum upload, 120.0 second runtime limit.

State is cleared (along with any generated files or links) when the environment dies.

Open Interpreter overcomes these limitations by running in your local environment. It has full access to the internet, isn't restricted by time or file size, and can utilize any package or library.

This combines the power of GPT-4's Code Interpreter with the flexibility of your local development environment.

Open Interpreter Roadmap

1

What open-source LLMs are you using in 2024? (lemmy.world)

submitted 7 months ago by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

There has been an overwhelming amount of new models hitting HuggingFace. I wanted to kick off a thread and see what open-source LLM has been your new daily driver?

Personally, I am using many Mistral/Mixtral models and a few random OpenHermes fine-tunes for flavor. I was also pleasantly surprised by some of the DeepSeek models. Those were fun to test.

I believe 2024 is the year open-source LLMs will catchup with GPT-3.5 and GPT-4. We're already most of the way there. Curious to hear what new contenders are on the block and how others feel about their performance/precision compared to other state-of-the-art (closed) source models.

1

FOSAI 2024 (lemmy.world)

submitted 7 months ago by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

Hello everyone.

I'm back!

To anyone still reading - I hope you have been enjoying the rapid amount of progress we've seen in the space since my hiatus.

You'll be happy to hear I'm going to be periodically cleaning up some of the outdated resources in favor of new, updated documentation both on our frontpage and on our sidebar.

I know I also promised you all official FOSAI models on HuggingFace. I did not forget. Those are still in the pipeline. More info on that and other updates coming soon.

In the meantime, is there anything in terms of guides, resources, or notes that you'd like to see in particular? Let me know in the comments and I'll see where it might fit on the list.

Cheers!

Blaed

Why do you like LLMs? in c/fosai@lemmy.world

[–] Blaed@lemmy.world 0 points 10 months ago (1 children)

What I find particularly exciting is that we’re seeing this evolution in real-time.

Can you imagine what these models might look like in 2 years? 5? 10?

There is a remarkable future on the horizon. I hope everyone gets an equal chance to be a part of it.

0

Why do you like LLMs? (lemmy.world)

submitted 10 months ago by Blaed@lemmy.world to c/fosai@lemmy.world

3 comments fedilink

Genuinely curious.

Why do you like LLMs? What hopes do you have for AI & AGI in our near and distant future?

3

We're building FOSAI models! Cast your votes and pick your tunings. (lemmy.world)

submitted 10 months ago* (last edited 10 months ago) by Blaed@lemmy.world to c/fosai@lemmy.world

0 comments fedilink

Hey everyone!

I think it's time we had a fosai model on HuggingFace. I'd like to start collecting ideas, strategies, and approaches for fine-tuning our first community model.

I'm open to hearing what you think we should do. We will release more in time. This is just the beginning.

For now, I say let's pick a current open-source foundation model and fine-tune on datasets we all curate together, built around a loose concept of using a fine-tuned LLM to teach ourselves more bleeding-edge technologies (and how to build them using technical tools and concepts).

FOSAI is a non-profit movement. You own everything fosai as much as I do. It is synonymous with the concept of FOSS. It is for everyone to champion as they see fit. Anyone is welcome to join me in training or tuning using the workflows I share along the way.

You are encouraged to leverage fosai tools to create and express ideas of your own. All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.

We're Building FOSAI Models! 🤖

Our goal is to fine-tune a foundation model and open-source it. We're going to start with one foundation family with smaller parameters (7B/13B) then work our way up to 40B (or other sizes), moving to the next as we vote on what foundation we should fine-tune as a community.

Fine-Tuned Use Case ☑️

Technical

FOSAI Model Idea #1 - Research & Development Assistant
FOSAI Model Idea #2 - Technical Project Manager
FOSAI Model Idea #3 - Personal Software Developer
FOSAI Model Idea #4 - Life Coach / Teacher / Mentor
FOSAI Model Idea #5 - FOSAI OS / System Assistant

Non-Technical

FOSAI Model Idea #6 - Dungeon Master / Lore Master
FOSAI Model Idea #7 - Sentient Robot Character
FOSAI Model Idea #8 - Friendly Companion Character
FOSAI Model Idea #9 - General RPG or Sci-Fi Character
FOSAI Model Idea #10 - Philosophical Character

OR

FOSAI Foundation Model ☑️

Foundation Model ☑️

(Pick one)

Mistral
Llama 2
Falcon
..(Your Submission Here)

Model Name & Convention

snake_case_example
CamelCaseExample
kebab-case-example

0.) FOSAI ☑️

fosai-7B
fosai-13B

1.) FOSAI Assistant ☑️

fosai-assitant-7B
fosai-assistant-13B

2.) FOSAI Atlas ☑️

fosai-atlas-7B
fosai-atlas-13B

3.) FOSAI Navigator ☑️

fosai-navigator-7B
fosai-navigator-13B

4.) ?

Datasets ☑️

TBD!
What datasets do you think we should fine-tune on?

Alignment ☑️

To embody open-source mentalities, I think it's worth releasing both censored and uncensored versions of our models. This is something I will consider as we train and fine-tune over time. Like any tool, you are responsible for your usage and how you choose to incorporate into your business and/or personal life.

License ☑️

All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.

This will be a fine-tuned model, so it may inherit some of the permissions and license agreements as its foundation model and have other implications depending on your country or local law.

Generally speaking, you can expect that all fosai models will be commercially viable through the selection process of its foundation family and the post-processing steps that are fine-tuning the model.

Costs

I will be personally covering all training and deployment costs. This may change if I choose to put together some sort of patronage, but for now - don't worry about this. I will be using something like RunPod or some other custom deployed solution for training.

Cast Your Votes! ☑️

Share Your Ideas & Vote in the Comments Below! ✅

What do you want to see out of this first community model? What are some of the fine-tuning ideas you've wanted to try, but never had the time or chance to test? Let me know in the comments and we'll brainstorm together.

I am in no rush to get this out, so I will leave this up for everyone to see and interact with until I feel we have a solid direction we can all agree upon. There will be plenty of more opportunities to create, curate, and customize more fosai models I plan to release in the future.

Update [10/25/23]: I may have found a fine-tuning workflow for both Llama (2) and Mistral, but I haven't had any time to validate the first test run. Once I have a chance to do this and test some inference I'll be updating this post with the workflow, the models, and some sample output with example datasets. Unfortunately, I have ran out of personal funds to allocate to training, so it is unsure when I will have a chance to make another attempt at this if this first attempt doesn't pan out. Will keep everyone posted as we approach the end of 2023.

0

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B (lemmy.world)

submitted 1 year ago by Blaed@lemmy.world to c/technology@lemmy.ml

0 comments fedilink

cross-posted from: https://lemmy.world/post/3879861

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Hello everyone! This post marks an exciting moment for !fosai@lemmy.world and everyone in the open-source large language model and AI community.

We appear to have a new contender on the block, a model apparently capable of surpassing OpenAI's state of the art ChatGPT-4 in coding evals (evaluations).

This is huge. Not too long ago I made an offhand comment on us catching up to GPT-4 within a year. I did not expect that prediction to end up being reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI.

Buckle up, it's going to get interesting!

Here's some notes from the blog, which you should visit and read in its entirety:

https://www.phind.com/blog/code-llama-beats-gpt4

Blog Post

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

CodeLlama-34B achieved 48.8% pass@1 on HumanEval

CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent a native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples.

The methodology is:

For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.

A match was identified if any sampled substring was a substring of the processed training example.

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval

Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.

https://huggingface.co/Phind/Phind-CodeLlama-34B-v1

https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1

If you get a chance to try either of these models out, let us know how it goes in the comments below!

If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world.

Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.

0

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B (lemmy.world)

submitted 1 year ago by Blaed@lemmy.world to c/technology@lemmy.world

0 comments fedilink

cross-posted from: https://lemmy.world/post/3879861

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Hello everyone! This post marks an exciting moment for !fosai@lemmy.world and everyone in the open-source large language model and AI community.

We appear to have a new contender on the block, a model apparently capable of surpassing OpenAI's state of the art ChatGPT-4 in coding evals (evaluations).

This is huge. Not too long ago I made an offhand comment on us catching up to GPT-4 within a year. I did not expect that prediction to end up being reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI.

Buckle up, it's going to get interesting!

Here's some notes from the blog, which you should visit and read in its entirety:

https://www.phind.com/blog/code-llama-beats-gpt4

Blog Post

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

CodeLlama-34B achieved 48.8% pass@1 on HumanEval

CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent a native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples.

The methodology is:

For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.

A match was identified if any sampled substring was a substring of the processed training example.

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval

Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.

https://huggingface.co/Phind/Phind-CodeLlama-34B-v1

https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1

If you get a chance to try either of these models out, let us know how it goes in the comments below!

If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world.

Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.

0

Introducing Stable-Diffusion.cpp (Inference in Pure C/C++) (lemmy.world)

submitted 1 year ago by Blaed@lemmy.world to c/technology@lemmy.world

0 comments fedilink

cross-posted from: https://lemmy.world/post/3549390

stable-diffusion.cpp

Introducing stable-diffusion.cpp, a pure C/C++ inference engine for Stable Diffusion! This is a really awesome implementation to help speed up home inference of diffusion models.

Tailored for developers and AI enthusiasts, this repository offers a high-performance solution for creating and manipulating images using various quantization techniques and accelerated inference.

https://github.com/leejet/stable-diffusion.cpp

Key Features:

Efficient Implementation: Utilizing plain C/C++, it operates seamlessly like llama.cpp and is built on the ggml framework.

Multiple Precision Support: Choose between 16-bit, 32-bit float, and 4-bit to 8-bit integer quantization.

Optimized Performance: Experience memory-efficient CPU inference with AVX, AVX2, and AVX512 support for x86 architectures.

Versatile Modes: From original txt2img to img2img modes and negative prompt handling, customize your processing needs.

Cross-Platform Compatibility: Runs smoothly on Linux, Mac OS, and Windows.

Getting Started

Cloning, building, and running are made simple, and detailed examples are provided for both text-to-image and image-to-image generation. With an array of options for precision and comprehensive usage guidelines, you can easily adapt the code for your specific project requirements.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
If you have already cloned the repository, you can use the following command to update the repository to the latest code.
cd stable-diffusion.cpp
git pull origin master
git submodule update

More Details

Plain C/C++ implementation based on ggml, working in the same way as llama.cpp

16-bit, 32-bit float support

4-bit, 5-bit and 8-bit integer quantization support

Accelerated memory-efficient CPU inference

Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image

AVX, AVX2 and AVX512 support for x86 architectures

Original txt2img and img2img mode

Negative prompt

stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)

Sampling method

Euler A

Supported platforms

Linux

Mac OS

Windows

This is a really exciting repo. I'll be honest, I don't think I am as well versed in what's going on for diffusion inference - but I do know more efficient and effective methods running those models are always welcome by people frequently using diffusers. Especially for those who need to multi-task and maintain performance headroom.

Blaed

Llama 3.1 Megathread

Official Meta News & Documentation

HuggingFace Download Links

8B

70B

405B

Getting the models

Running the models

Linux

Windows

Mac

Cloud

More guides and resources

YouTube media

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

MoE-Mamba

Mamba

Open Interpreter

Comparison to ChatGPT's Code Interpreter

We're Building FOSAI Models! 🤖

Fine-Tuned Use Case ☑️

Foundation Model ☑️

Model Name & Convention

Datasets ☑️

Alignment ☑️

License ☑️

Costs

Cast Your Votes! ☑️

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Blog Post

Download

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Blog Post

Download

stable-diffusion.cpp

Key Features:

Getting Started

More Details

`8B`

`70B`

`405B`

`Linux`

`Windows`

`Mac`

`Cloud`