flowerysong

[–] flowerysong@awful.systems 22 points 2 months ago (17 children)

"this thing takes more time and effort to process queries, but uses the same amount of computing resources" <- statements dreamed up by the utterly deranged.

[–] flowerysong@awful.systems 9 points 2 months ago (1 children)

It means that Yudkowsky remains a terrible writer. He really just wanted to say "seizing [control of] the executive branch", but couldn't resist adding some ornamentation.

[–] flowerysong@awful.systems 11 points 3 months ago (1 children)

It makes me wonder if these people have ever seen a moving picture before, because being caught up in the slack-jawed astonishment of a novel experience seems the only plausible explanation for thinking this looks good.

[–] flowerysong@awful.systems 12 points 3 months ago (13 children)

Whatever shambling corpse of Toys "R" Us still exists has released "the first OpenAI SORA generated brand commercial."

still frame from the commercial of a child who looks slightly melted

It looks absolutely dire, particularly in motion, but posters are still falling over themselves to call it "cool" and "exciting" or claim that people "might not notice" that it's complete and utter arse.

Twitter screenshot of a user admitting that it's a bad ad but claiming that people might not notice it's computer generated

[–] flowerysong@awful.systems 9 points 4 months ago

I host my own email and for my day job I run an institutional email system that handles ~50 million messages per week. I can't recommend hosting email at either end of that scale (or anywhere in between), and I find it difficult to believe that anyone with experience running a mail server would claim it's reasonable or straightforward.

[–] flowerysong@awful.systems 1 points 4 months ago

I tried using Claude 3.5 sonnet and … it’s actually not bad. Can someone please come up with a simple logic puzzle that it abysmally fails on so I can feel better?

It's still very good at autocompleting an answer to look like its training data (because that's what it's doing) and very bad at logic (which it is not doing).

"I have two chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?"

This is a classic river-crossing puzzle. Let's think through it step-by-step:

  • First, note that if left alone, a chicken will eat the corn, and one chicken might eat the other.
  • Start by taking a chicken across the river.

...and we'll stop there, since it came up with the wrong constraints, failed to realize that these constraints are impossible to fulfill, and also violated the actual constraints in its first step.

Will more detail help? No. "I have two vegetarian chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?"

First, we need to consider the constraints:

  • You can only take one item (chicken or corn) across at a time.
  • You can't leave a chicken alone with the corn, as the chicken would eat it.
  • The chickens are vegetarian, so they won't eat each other.

Here's a solution:

  • Step 1: Take one chicken across the river.
  • Step 2: Return alone.
  • Step 3: Take the second chicken across.
  • Step 4: Bring the first chicken back.
  • Step 5: Take the corn across.
  • Step 6: Return alone.
  • Step 7: Take the remaining chicken across.

Explanation:

  • By never leaving a chicken alone with the corn, we prevent the corn from being eaten.
  • We use one chicken as a "guard" for the corn on the original shore while transporting the other items.

Because it's not actually doing logic, it got the first step wrong, violated the constraints multiple times, and also included an irrelevant idea (the guard) from a variant that it was trained on. I only got a correct, efficient answer one time out of ten trials of trivial variations on this classic puzzle.
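For contrast, here is a rough sketch of what actually doing the logic could look like: a breadth-first search over the puzzle's states, plus a checker for a proposed plan. It assumes the classic unstated rule that a chicken left on a bank with the corn and without the farmer will eat it, and that chickens are no threat to each other. The names (`ITEMS`, `is_safe`, `cross`, `solve`, `violations`) and the labels `chicken_a` and `chicken_b` are made up for illustration.

```python
from collections import deque

# Hypothetical labels for the three things to ferry across.
ITEMS = frozenset({"chicken_a", "chicken_b", "corn"})


def is_safe(bank):
    """Assumed rule: a bank without the farmer must not hold a chicken and the corn."""
    has_chicken = any(item.startswith("chicken") for item in bank)
    return not (has_chicken and "corn" in bank)


def cross(near, farmer_near, cargo):
    """Apply one crossing; cargo is an item name, or None for an empty boat.

    Returns (items on the starting bank, farmer on starting bank?, bank left unattended).
    """
    here = near if farmer_near else ITEMS - near
    if cargo is not None and cargo not in here:
        raise ValueError(f"{cargo} is not on the farmer's bank")
    if cargo is None:
        new_near = near
    elif farmer_near:
        new_near = near - {cargo}
    else:
        new_near = near | {cargo}
    unattended = new_near if farmer_near else ITEMS - new_near
    return new_near, not farmer_near, unattended


def solve():
    """Breadth-first search, so the first plan found uses the fewest crossings."""
    start = (ITEMS, True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, farmer_near), path = queue.popleft()
        if (near, farmer_near) == goal:
            return path
        here = near if farmer_near else ITEMS - near
        for cargo in [None, *sorted(here)]:
            new_near, new_farmer, unattended = cross(near, farmer_near, cargo)
            state = (new_near, new_farmer)
            if is_safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None  # no plan satisfies the constraints


def violations(plan):
    """Return the 1-based step numbers after which an unattended bank is unsafe."""
    near, farmer_near, bad = ITEMS, True, []
    for number, cargo in enumerate(plan, start=1):
        near, farmer_near, unattended = cross(near, farmer_near, cargo)
        if not is_safe(unattended):
            bad.append(number)
    return bad
```

Under those assumptions, `solve()` returns a seven-crossing plan that starts by taking the corn, not a chicken. Feeding the quoted seven-step answer to `violations()`, reading "the first chicken" as `chicken_a`, flags steps 1 and 6.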

[–] flowerysong@awful.systems 1 points 9 months ago

Turns out that trying to correct for biased models by biasing your input does not make your results more reliable.