this post was submitted on 28 Aug 2024
Technologie: Nachrichten aus Wissenschaft, Bio und Technik
A community for collecting and discussing news from science, medicine, and technology.
Somewhat confusing how they present it. As I understand it:
So basically they're just doing AI video generation, but specialized to only make videos of Doom? I have mixed feelings. This doesn't seem much more novel than using any other game or activity to train a specialist video generator. Given the unique properties of Doom, I don't understand why they didn't make more use of demos. Demos (basically a list of player inputs) would seem easier to parse with an LLM and easier to generate. Then you could run the demo on an actual copy of Doom and not have these weird visual artifacts.
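For what it's worth, a vanilla Doom demo really is just a handful of bytes of player input per game tic. If memory serves, the classic .lmp layout looks roughly like this; the parser below is a hypothetical sketch to illustrate the point, not anything from the article:

```python
import struct

# Rough sketch of a vanilla Doom 1.9 .lmp demo parser (hypothetical
# helper, not from the article). Each tic is just 4 bytes of input
# per active player: forward, strafe, turn, buttons.
def parse_demo(data: bytes):
    # 13-byte header: version, skill, episode, map, deathmatch,
    # respawn, fast, nomonsters, consoleplayer, playeringame[4]
    version, skill, episode, level = data[0], data[1], data[2], data[3]
    players = sum(data[9:13])  # how many player slots are active
    tics = []
    i = 13
    while i < len(data) and data[i] != 0x80:  # 0x80 marks end of demo
        for _ in range(players):
            fwd, side, turn = struct.unpack_from("bbb", data, i)
            buttons = data[i + 3]
            tics.append((fwd, side, turn, buttons))
            i += 4
    return {"version": version, "skill": skill,
            "episode": episode, "map": level, "tics": tics}
```

The point being: this is a tiny, exact, machine-readable record of gameplay, which is why replaying it through the real engine gives pixel-perfect output where a video model hallucinates.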
I suppose the implication is that the LLM has somehow learned the internal rules of Doom and is able to simulate them in the course of generating video. But I don't buy it. Watch the UI: sometimes pickups register, sometimes they don't. Picking up items often yields random amounts of ammo, sometimes of multiple types. It's like how text generators can easily fool people into seeing intelligence by using, essentially, cold-reading techniques on them. To a first approximation it seems like the model "understands" Doom, but it could just as easily be other rules or inferences at work. Maybe when the screen flashes red, it knows health is supposed to decrease by some amount? Maybe it's good at picking up on regular temporal events, so it can replicate the DoT from standing in nukage, but doesn't truly understand why that's happening, so sometimes health just randomly ticks down by a point or two for no reason?
Mostly, I just see an attempt to normalize the idea that "Generative AI belongs in (and can replace, lol) actual game engines". Please hire us and pay us exorbitant amounts of money to ruin your game engines.
Not quite.
They probably had that lying around anyway. Traditional bots follow pre-programmed paths over a navmesh; I don't think you'd get the more organic, exploratory gameplay out of those that's needed here. They say there isn't enough recorded gameplay from humans.
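To make the contrast concrete: a classic navmesh bot just runs graph search over hand-placed waypoints, so its routes are deterministic. A toy sketch (the level graph and names here are invented for illustration):

```python
from collections import deque

# Hypothetical waypoint graph for a level. A traditional bot walks
# pre-computed shortest paths over a graph like this.
NAVMESH = {
    "spawn":   ["hallway"],
    "hallway": ["spawn", "armor", "exit"],
    "armor":   ["hallway"],
    "exit":    ["hallway"],
}

def bot_path(start, goal):
    # Plain BFS: given the same level, the bot always takes the same
    # shortest route -- which is why such traces lack the "organic"
    # exploration a trained agent produces.
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in NAVMESH[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None
```

Every run of `bot_path("spawn", "exit")` yields the identical route, so a dataset of such traces would be far less varied than human (or RL-agent) play.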
No, they used Stable Diffusion 1.4, a so-called latent diffusion model that does text-to-image. You're probably aware of such image generators.
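The core idea of diffusion, stripped of everything else: start from pure noise and repeatedly nudge the sample toward the data distribution. This is a deliberately toy cartoon of that loop, not the real SD 1.4 pipeline; the "data" is a single 2-D point and the "denoiser" moves straight toward it, standing in for the learned U-Net:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of the diffusion idea behind latent models like
# Stable Diffusion. The real model predicts the denoising direction
# from training data; here we cheat and use the known target.
TARGET = np.array([3.0, -1.0])

def denoise_step(x, noise_level):
    # Ideal denoiser: moves x a fraction of the way toward the data.
    return x + noise_level * (TARGET - x)

def sample(steps=50):
    x = rng.standard_normal(2) * 5.0   # start from pure noise
    for _ in range(steps):
        x = denoise_step(x, noise_level=0.2)
    return x
```

After 50 steps the sample sits essentially on the target; a real latent diffusion model does this in a compressed latent space with a learned denoiser, then decodes the result to pixels.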
They trained it further for their task: instead of text, it takes previous frames and player input as the prompt. On Google hardware they get 20 fps out of the model, which is just enough for a human to play the game, in a sense.
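Structurally, that conditioning swap turns generation into an autoregressive loop: each new frame is predicted from a short history of frames and inputs, then fed back in. A sketch of that loop shape, with all names and the history length invented for illustration (none of this is from the paper):

```python
from collections import deque
import numpy as np

HISTORY = 4  # assumed history window, purely illustrative

def fake_model(frames, actions):
    # Stand-in for the fine-tuned diffusion model: takes recent
    # frames and inputs, returns the next "frame".
    return np.zeros((64, 64, 3)) + len(actions)

def play(num_tics, get_input):
    # Conditioning context: the last few frames and player inputs
    # take the place of a text prompt.
    frames = deque([np.zeros((64, 64, 3))] * HISTORY, maxlen=HISTORY)
    actions = deque([0] * HISTORY, maxlen=HISTORY)
    out = []
    for tic in range(num_tics):
        actions.append(get_input(tic))       # current player input
        frame = fake_model(list(frames), list(actions))
        frames.append(frame)                 # autoregressive feedback
        out.append(frame)
    return out
```

The feedback step is also where the UI glitches in the videos plausibly come from: any error in one generated frame becomes part of the conditioning for the next.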