AI

Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality. The distinction between the former and the latter categories is often revealed by the acronym chosen.

I have 64 zipped megabytes of AIM conversations I had in high school. How hard would it be to train an LLM to be me from 15 years ago? (lemmy.world)

submitted 1 month ago by ch00f@lemmy.world to c/artificial_intel@lemmy.ml

[–] keepthepace@slrpnk.net 12 points 1 month ago

It is called fine-tuning. I haven't tried it, but oobabooga's text-generation-webui has a tab to do it and I believe it is pretty straightforward.

Fine-tune a base model on your dataset, and you will then need to format your prompt the same way your AIM logs are organized, e.g. you will need to add "" at the end of your text completion task. The model will complete it the way it learned to.

If you don't have the GPU for it, many companies, such as Mistral, offer fine-tuning as a service.
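
To make the prompt-formatting point concrete, here is a minimal sketch, assuming the logs have already been parsed into (screen name, message) pairs. The "name: message" layout, the function names, and the screen names are all hypothetical; real AIM logs will need their own parser, and whatever layout you train on is the one the prompt has to match.

```python
from typing import List, Tuple

# One chat turn: (screen_name, message). Parsing the raw AIM logs into
# this shape is assumed to have happened already.
Turn = Tuple[str, str]


def format_turn(speaker: str, message: str) -> str:
    """Render a single turn in the (hypothetical) 'name: message' layout."""
    return f"{speaker}: {message.strip()}"


def build_training_text(turns: List[Turn]) -> str:
    """Join turns into one training document, one turn per line."""
    return "\n".join(format_turn(s, m) for s, m in turns) + "\n"


def build_prompt(history: List[Turn], me: str) -> str:
    """Format the history like the training data, then end with my own
    speaker tag so the model completes the next line as 'me'."""
    return build_training_text(history) + f"{me}:"


if __name__ == "__main__":
    history = [("friend42", "hey, you coming tonight?")]
    print(build_prompt(history, "me2004"))
```

The fine-tuned model only sees text, so ending the prompt with your own speaker tag is what asks it to write the next line in your voice.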

[–] PerogiBoi@lemmy.ca 8 points 1 month ago (2 children)

Why would you want this??? Anything I wrote from 16 years ago is so beyond cringey. You must have been a stellar kid.

[–] DaGeek247@fedia.io 11 points 1 month ago

Because funy

[–] corsicanguppy@lemmy.ca 5 points 1 month ago* (last edited 1 month ago)

I have 26 years of saved outgoing email.

Recently I needed to redo a fix I learned about in 1998 and implemented then. I implemented it again to install a crappy software project that from its composition canNOT have been from before the post-y2k firing of so many mentors.

Only remembered after 3 hours of searching, saving myself another few hours and surely a nervous breakdown. But, after filtering AD on the client end, the project installed easily.

That's the best example, but the things I don't discover I already answered on Stack Overflow, I discover I answered years ago in email.

[–] wuphysics87@lemmy.ml 4 points 1 month ago (1 children)

The real question is why do you have 64 MB of AIM conversations?

[–] ch00f@lemmy.world 3 points 1 month ago

Because I communicated with a lot of people over AIM? It’s actually more than just high school. Covers 2004 to around 2012. Also it’s 64 MB zipped. Actual size is much larger.

[–] istanbullu@lemmy.ml 2 points 1 month ago

Not hard with Hugging Face PEFT.

[–] will_a113@lemmy.ml 2 points 1 month ago

Putting aside why you'd want to do this, it'd be pretty easy, actually. You'd still use a big model like GPT-4 or Claude as your "base" but you would do two things:

1. Give it a knowledge base using your conversations. You can manually vectorize them into a vector database like Pinecone and build yourself an agent using a toolchain like LangChain, or just use a service (OpenAI Agents lets you upload data from your browser).
2. Have one of the big LLMs (with a large context size) ingest all of those conversations and build out a prompt that describes "you".

You would then feed that generated prompt (with your own edits, of course) back into either your custom LangChain agent or OpenAI Agent.
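
As a rough illustration of the retrieval idea, here is a sketch that uses only the Python standard library. It is not the Pinecone or LangChain API: a bag-of-words count stands in for a real embedding model, an in-memory list stands in for the vector database, and every function name and sample message is made up for the example.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real setup would call an
    embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(persona: str, query: str, chunks: List[str], k: int = 2) -> str:
    """Combine the persona description with retrieved past messages."""
    context = "\n".join(top_k(query, chunks, k))
    return f"{persona}\n\nRelevant past messages:\n{context}\n\nReply to: {query}"


if __name__ == "__main__":
    logs = ["i love pizza so much", "homework is due tomorrow", "band practice at 5"]
    print(build_prompt("You are me in 2005.", "want pizza tonight?", logs, k=1))
```

The resulting string is what would be sent to the big model: the persona prompt supplies the voice, and the retrieved messages supply the specifics.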

[–] fcano@infosec.pub 2 points 1 month ago (1 children)

You may try https://github.com/instructlab. You will need to transform those conversations to a specific YAML format.

[–] ch00f@lemmy.world 1 points 1 month ago

Great tip! I got the demo project up and running in around 30 minutes. Glad to see it's running locally (and not too slowly on my CPU build).

Now to actually train the thing...