this post was submitted on 26 Aug 2024
17 points (100.0% liked)

TechTakes

[–] self@awful.systems 14 points 2 months ago* (last edited 2 months ago) (10 children)

yeah, this is weirdly sneerable for a 404 article, and I hope this isn’t an early sign they’re enshittifying. let’s do what they should have and take a critical look at, ah, GameNGen, a name for their research they surely won’t regret

Diffusion Models Are Real-Time Game Engines

wow! it’s a shame that creating this model involved plagiarizing every bit of recorded doom footage that’s ever existed, exploiting an uncounted number of laborers from the global south for RLHF, and burning an amount of rainforest in energy that also won’t be counted. but fuck it, sometimes I shop at Walmart so I can’t throw stones and this sounds cool, so let’s grab the source and see how it works!

just kidding, this thing’s hosted on github but there’s no source. it’s just a static marketing page, a selection of videos, and a link to their paper on arXiv, which comes in at a positively ultralight 10 LaTeX-formatted letter-sized pages when you ignore the many unhelpful screenshots and graphs they included

so we can’t play with it, but it’s a model implementing a game engine, right? so the evaluation strategy given in the paper has to involve the innovative input mechanism they’ve discovered that enables the model to simulate a gameplay loop (and therefore a game engine), right? surely that’s what convinced a pool of observers with more-than-random-chance certainty that the model was accurately simulating doom?

Human Evaluation. As another measurement of simulation quality, we provided 10 human raters with 130 random short clips (of lengths 1.6 seconds and 3.2 seconds) of our simulation side by side with the real game. The raters were tasked with recognizing the real game (see Figure 14 in Appendix A.6). The raters only choose the actual game over the simulation in 58% or 60% of the time (for the 1.6 seconds and 3.2 seconds clips, respectively).

of course not. nowhere in this paper is their supposed innovation in input actually evaluated; at no point is this work treated experimentally like a real-time game engine. also, and you pointed this out already: were the human raters drunk? (honestly, I couldn’t blame them. I wouldn’t give a shit either if my mturk was “which of these 1.6 second clips is doom”) the fucking thing doesn’t even simulate doom’s main gameplay loop right; dead possessed marines just turn to a blurry mess, health and armor only make sense in the loosest possible way, it doesn’t seem to think imps exist at all but does randomly place their fireballs where they should be, and sometimes the geometry it’s simulating just casually turns into a visual paradox. chances are this experimental setup was tuned for the result they wanted: they managed to trick 40% of a group of people who absolutely don’t give a fuck into thinking the incredibly short video clip they were looking at was probably a video game. amazing!

if we ever get our hands on the code for this thing, I’m gonna make a prediction: it barely listens to input, if at all. the video clips they’ve released on their site and YouTube are the most coherent this thing gets, and it falls apart the instant you do anything that wasn’t in its training set (aka, the instant you use this real-time game engine to play a game and do something unremarkably weird, like try to ram yourself through a wall)

[–] ibt3321@lemmy.blahaj.zone 12 points 2 months ago (5 children)

The paper is so bad...

the agent's policy π ... the environment ε

What is up with AI papers using fancy symbols to notate abstract concepts when there isn't a single other instance of the concept to be referred to?

They offer a bunch of tables with numbers in a metric that isn't explained, showing that they are exactly the same for "random" and "agent" policy, in other words, inputs don't actually matter! And they say they want to use these metrics for training future versions. Good luck.

For the sample size they are using, 60% seems like a statistically significant rate, and they only tested at most 3 seconds after real gameplay footage.
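
To put a rough number on that, here is a minimal sketch of an exact two-sided binomial test against coin-flip guessing. The trial counts are my assumptions, since the quoted passage doesn't say how the 130 clips were split across the 10 raters or the two clip lengths, and it treats every judgment as independent. `binom_two_sided_p` is just an illustrative helper, not anything from the paper.

```python
from math import comb


def binom_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for `successes` out of `trials` vs. a fair coin."""
    # With p = 0.5 the binomial is symmetric, so double the more extreme tail.
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# ASSUMED trial counts: 65 (130 clips split over two lengths),
# 130 (one judgment per clip), 1300 (every rater saw every clip).
for trials in (65, 130, 1300):
    successes = round(0.60 * trials)  # raters picked the real game 60% of the time
    p_value = binom_two_sided_p(successes, trials)
    print(f"n={trials:5d}  correct={successes:4d}  p={p_value:.4g}")
```

Under these assumptions a 60% hit rate clears the usual 0.05 threshold at 130 judgments but not at 65, so how significant it is hinges on a trial count the quoted passage doesn't pin down.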

Sidenote: Auto-regressive models for much shorter periods are really useful for when audio is cutting out. Those use really simple math, so they aren't burning any rainforests.
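
To show how simple that math can be, here is a toy sketch: fit a second-order auto-regressive model to the samples before a dropout and extrapolate across the gap. Everything here is an assumption for illustration (the order-2 model, the pure test tone, the `fit_ar2` and `conceal` names); real concealment schemes typically use higher orders and blend back into the recovered signal.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] (2x2 normal equations)."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        r11 += p1 * p1
        r12 += p1 * p2
        r22 += p2 * p2
        b1 += x[n] * p1
        b2 += x[n] * p2
    det = r11 * r22 - r12 * r12
    if abs(det) < 1e-12:
        raise ValueError("history is too short or too flat to fit an AR(2) model")
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    return a1, a2


def conceal(history, gap_len):
    """Predict `gap_len` missing samples from the samples before the gap."""
    a1, a2 = fit_ar2(history)
    out = list(history[-2:])  # seed the recursion with the last two known samples
    for _ in range(gap_len):
        out.append(a1 * out[-1] + a2 * out[-2])
    return out[2:]


# Hypothetical test signal: a steady tone with the last 16 samples "lost".
tone = [math.sin(0.3 * n) for n in range(80)]
known, lost = tone[:64], tone[64:]
filled = conceal(known, len(lost))
err = max(abs(f - t) for f, t in zip(filled, lost))
print(f"max reconstruction error over the gap: {err:.3g}")
```

A pure tone is the best case for this, since a sinusoid satisfies a two-term recurrence exactly; on real audio the prediction drifts quickly, which is why it only works for very short gaps.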

I'm willing to retract my statement that these guys don't have any ulterior motives.

[–] sailor_sega_saturn@awful.systems 12 points 2 months ago* (last edited 2 months ago) (3 children)

The paper starts with a weirdly bad definition of "computer game" too. It almost makes me think that (gasp) the paper was written by non-gamers.

Computer games are manually crafted software systems centered around the following game loop: (1) gather user inputs, (2) update the game state, and (3) render it to screen pixels. This game loop, running at high frame rates, creates the illusion of an interactive virtual world for the player.

No rendering: Myst

No frame rate: Zork

No pixels: Asteroids

No virtual world: Wordle

No screen: Soundvoyager, Audio Defense (well, these examples have a vestigial screen, but they supposedly don't really need it)

[–] Soyweiser@awful.systems 10 points 2 months ago (1 children)
[–] self@awful.systems 12 points 2 months ago (1 children)

things that are games:

  • the control circuitry for a $1 solar-powered calculator
  • my car
  • X11

things that aren’t games:

  • pinball, unless it has an electronic score display
  • Quake-style dedicated servers
  • rogue (nethack)
[–] bitofhope@awful.systems 9 points 2 months ago

More computer games:

  • web browsers
  • stock market trackers
  • election watch

More computer non-games:

  • hangman on a paper teletype
  • ARGs
  • anything on the Vectrex