this post was submitted on 04 May 2025
12 points (92.9% liked)

LocalLLaMA

2939 readers
15 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

founded 2 years ago
MODERATORS
 

Hi, I'm not too informed about LLMs so I'll appreciate any correction to what I might be getting wrong. I have a collection of books I would like to train an LLM on so I could use it as a quick source of information on the topics covered by the books. Is this feasible?

you are viewing a single comment's thread
view the rest of the comments
[–] will_a113@lemm.ee 1 points 1 day ago

Making your own embeddings is for RAG. Most base model providers have standardized on OpenAIs embeddings scheme, but there are many ways. Typically you embed a few tokens worth of data at a time and store that in your vector database. This lets your AI later do some vector math (usually cosine similarity search) to see how similar (related) the embeddings are to each other and to what you asked about. There are fine tuning schemes where you make embeddings before the tuning as well but most people today use whatever fine tuning services their base model provider offers, which usually has some layers of abstraction.