this post was submitted on 31 Jul 2024
426 points (98.4% liked)

News

23367 readers
3103 users here now

Welcome to the News community!

Rules:

1. Be civil


Attack the argument, not the person. No racism/sexism/bigotry. Good faith argumentation only; accusing another user of being a bot or paid actor counts as arguing in bad faith. Trolling is uncivil and is grounds for removal and/or a community ban. Do not respond to rule-breaking content; report it and move on.


2. All posts should contain a source (URL) that is as reliable and unbiased as possible, and must contain only one link.


Obvious right- or left-wing sources will be removed at the mods' discretion. We have an actively updated blocklist, which you can see here: https://lemmy.world/post/2246130. If you feel a website is missing, contact the mods. Supporting links can be added in comments or posted separately, but not in the post body.


3. No bots, spam or self-promotion.


Only approved bots, which follow the guidelines for bots set by the instance, are allowed.


4. Post titles should be the same as the article used as source.


Posts whose titles don't match the source won't be removed, but AutoMod will notify you; if your title misrepresents the original article, the post will be deleted. If the site changed its headline, the bot might still contact you; just ignore it, and we won't delete your post.


5. Only recent news is allowed.


Posts must be news from the most recent 30 days.


6. All posts must be news articles.


No opinion pieces, listicles, editorials, or celebrity gossip are allowed. All posts will be judged on a case-by-case basis.


7. No duplicate posts.


If a source you used was already posted by someone else, AutoMod will leave a message. Please remove your post if AutoMod is correct. If the post that matches yours is very old, see rule 5.


8. Misinformation is prohibited.


Misinformation / propaganda is strictly prohibited. Any comment or post containing or linking to misinformation will be removed. If you feel that your post was removed in error, provide credible sources to support it.


9. No link shorteners.


AutoMod will contact you if a link shortener is detected; please delete your post if it is right.


10. Don't copy the entire article into your post body


For copyright reasons, you are not allowed to copy an entire article into your post body. This is an instance-wide rule that is strictly enforced in this community.

founded 1 year ago
MODERATORS
 
  • Delta Air Lines CEO Ed Bastian said the massive IT outage earlier this month that stranded thousands of customers will cost it $500 million.
  • The airline canceled more than 4,000 flights in the wake of the outage, which was caused by a botched CrowdStrike software update and took thousands of Microsoft systems around the world offline.
  • Bastian, speaking from Paris, told CNBC’s “Squawk Box” on Wednesday that the carrier would seek damages from the disruptions, adding, “We have no choice.”
top 50 comments
[–] ASDraptor@lemmy.autism.place 142 points 3 months ago (3 children)

499,999,990

Remember that you got your $10 gift card for Uber Eats.

[–] FlyingSquid@lemmy.world 47 points 3 months ago (1 children)
[–] JohnnyCanuck@lemmy.ca 20 points 3 months ago

It worked but there was a $10 convenience fee.

[–] Railcar8095@lemm.ee 14 points 3 months ago

Technically, it was a $10 gift card for each IT technician, so that could have been a whole $100!

Not so bad after all

[–] Evotech@lemmy.world 8 points 3 months ago

No, only the partners did

[–] dhork@lemmy.world 93 points 3 months ago* (last edited 3 months ago) (1 children)

Bastian said the figure includes not just lost revenue but “the tens of millions of dollars per day in compensation and hotels” over a period of five days. The amount is roughly in line with analysts’ estimates. Delta didn’t disclose how many customers were affected or how many canceled their flights.

It's important to note that the DOT recently clarified a rule reinforcing that if an airline cancels a flight, it has to compensate the customer. That's the real reason Delta had to spend so much: it couldn't ignore its customers and had to pay out for the inconvenience.

https://www.kxan.com/news/can-you-get-compensation-if-your-flight-was-delayed-or-canceled-by-the-crowdstrike-outage/

So think about how much worse it might have been for fliers if a more industry-friendly Transportation Secretary were in charge. The airlines might not have had to pay out nearly as much to stranded customers, and we'd be hearing about how stranded fliers got nothing at all.

[–] corsicanguppy@lemmy.ca 6 points 3 months ago* (last edited 3 months ago) (1 children)

Now do Canada.

Our best airline just got bought by pretty much a Broadcom; mechanics are striking because, well, Canada isn't an at-will state near Jersey; and everyone's looking to bail because now they have to be the dicks to customers that they didn't like being at the other (national) airline. The whole enshittification enchilada.

Late flights? Check. Missed connections? Check. Luggage? Laughable. And extra. Compensation? "No hablo canadiensis".

We need that hard rule where they fuck up and they gotta make it rain too.

Like, is it so hard to keep a working but dark airplane in a parking spot for when that flight's delayed because the lav check valve is jammed? This seems to be basic capacity planning and business continuity. They need to get a clue under their skin or else they get the hose again.

[–] Poem_for_your_sprog@lemmy.world 65 points 3 months ago (6 children)

Why do news outlets keep calling it a Microsoft outage? It's only a CrowdStrike issue, right? Microsoft doesn't have anything to do with it?

[–] echodot@feddit.uk 36 points 3 months ago* (last edited 3 months ago) (3 children)

It's sort of 90% one and 10% the other. Mostly it's a CrowdStrike problem, but Microsoft really should make it so that its operating system doesn't continuously boot-loop when a driver is failing. It should be able to detect that and shut the affected driver down. Of course, equally, the driver shouldn't be crashing just because it doesn't understand some code it's being fed.


Also, there's an argument to be made that Microsoft should have pushed back harder on allowing CrowdStrike to effectively bypass its kernel testing policies, since that obviously negates the whole point of the tests.


Of course, both of these issues also exist on Linux, so it's not as if this is a problem unique to Microsoft.
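
To make the "detect the crash loop and disable the driver" point concrete, here's a minimal sketch of that kind of guard. It's illustrative only: the state file, threshold, and quarantine_driver helper are hypothetical, not anything Windows or CrowdStrike actually expose.

```python
import json
from pathlib import Path

# Hypothetical crash-loop guard: count consecutive failed boots attributed to a
# driver and stop loading it once a threshold is hit. Not how Windows actually
# implements boot recovery; purely an illustration of the idea above.
STATE_FILE = Path("boot_failures.json")   # assumed to persist across reboots
MAX_FAILURES = 3                          # threshold before quarantining

def record_failed_boot(driver_name: str) -> int:
    """Increment and return the consecutive-failure count for a driver."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[driver_name] = state.get(driver_name, 0) + 1
    STATE_FILE.write_text(json.dumps(state))
    return state[driver_name]

def quarantine_driver(driver_name: str) -> None:
    """Placeholder: mark the driver so it is skipped on the next boot."""
    print(f"{driver_name} has crashed too many boots in a row; disabling it")

# "csagent.sys" is used here only as a stand-in name for the failing driver.
if record_failed_boot("csagent.sys") >= MAX_FAILURES:
    quarantine_driver("csagent.sys")
```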

[–] themeatbridge@lemmy.world 6 points 3 months ago (1 children)

There's a good 20% of the blame belonging to the penny-pinchers who choose to allow third-party security updates without testing environments, because the corporation is too cheap for proper infrastructure and disaster-recovery architecture.

Like, imagine if there was a new airbag technology that promised to reduce car crashes. And so everyone stopped wearing seatbelts. And then those airbags caused every car on the road to crash at the same time.

Obviously, the airbags that caused all the crashes are the primary cause. And the car manufacturers that allowed airbags to crash their cars bear some responsibility. But then we should also remind everyone that seatbelts are important and we should all be wearing them. The people who did wear their seatbelts were probably fine.

Just because everyone is tightening IT budgets and buying licenses to panacea security services doesn't make it smart business.

[–] ricecake@sh.itjust.works 7 points 3 months ago

In this case, it's less like they stopped wearing seatbelts, and more like the airbags silently disabled the seatbelts from being more than a fun sash without telling anyone.

To drop the analogy: the way the update deployed didn't inform the owners of the systems affected, and didn't pay attention to any of their configuration regarding update management.

[–] cheddar@programming.dev 31 points 3 months ago* (last edited 3 months ago) (1 children)

The answer is simple: they have no idea what they are talking about. And that is true for almost every topic they are reporting about.

[–] Rekhyt@lemmy.world 14 points 3 months ago (2 children)

It was a CrowdStrike-triggered issue that only affected Microsoft Windows machines. CrowdStrike on Linux didn't have issues, and Windows without CrowdStrike didn't have issues. It's appropriate to refer to it as a Microsoft-CrowdStrike outage.

[–] ricecake@sh.itjust.works 28 points 3 months ago (4 children)

Funnily enough, CrowdStrike on Linux had a very similar issue a few months back.

[–] Poem_for_your_sprog@lemmy.world 4 points 3 months ago (18 children)

I guess microsoft-crowdstrike is fair, since the OS doesn't have any kind of protection against a shitty antivirus destroying it.

I keep seeing articles that just say "Microsoft outage", even on major outlets like CNN.

[–] skuzz@discuss.tchncs.de 11 points 3 months ago (5 children)

Honestly, with how badly Windows 11 has been degrading over the last 8 or 9 months, it's probably good to turn up the heat on MS even if it isn't completely deserved. They're pissing away their operating system goodwill so fast.

There have been some discussions on other Lemmy threads, the tl;dr is basically:

  • Microsoft has a driver certification process called WHQL.
  • This would have caught the CrowdStrike glitch before it ever went to production, as the process goes through an extreme set of tests and validations.
  • AV companies get to circumvent this process, even though other driver vendors have to use it.
  • The part of CrowdStrike that broke Windows, however, likely wouldn't have been part of the WHQL certification anyways.
  • Some could argue software like this shouldn't be kernel drivers, maybe they should be treated like graphics drivers and shunted away from the kernel.
  • These tech companies are all running too fast and loose with software and it really needs to stop, but they're all too blinded by the cocaine dreams of AI to care.
[–] hydrashok@sh.itjust.works 62 points 3 months ago (5 children)

Pretty sure their software’s legal agreement, and the corresponding enterprise legal agreement, already cover this.

The update was the first domino, but the real issue was the disarray of Delta’s IT Operations and their inability to adequately recover in a timely fashion. Sounds like a customer skimping on their lifecycle and capacity planning so that Ed can get just a bit bigger bonus for meeting his budget numbers.

[–] Brkdncr@lemmy.world 32 points 3 months ago (1 children)

Negligence can make contracts a little less permanent.

[–] hydrashok@sh.itjust.works 8 points 3 months ago (1 children)

Delta was the only airline to suffer a long outage. That's why I say CrowdStrike was the kickoff, but the poor, drawn-out response and the time to resolve it are totally on Delta.

[–] Brkdncr@lemmy.world 5 points 3 months ago (1 children)

Idk, CrowdStrike had a few screwups in their pocket before this one. They might be on the hook for costs associated with an outage caused by negligence. I’m not a lawyer, but I do stand next to one in the elevator.

[–] modeler@lemmy.world 21 points 3 months ago (4 children)

Couldn't agree more.

And now that this has happened, and cost $500M, perhaps some enterprise companies will finally resource their IT departments properly and allow them to do their work. But who am I kidding, that's never going to happen if it hits bonuses and dividends :(

[–] xmunk@sh.itjust.works 10 points 3 months ago

We just lost 500 million - we can't afford that right now! /s

[–] echodot@feddit.uk 4 points 3 months ago

Judging by the headhunters constantly trying to recruit me for jobs that aren't a fit, it is starting to get traction with companies: they are actually starting to hire fully skilled IT departments, as opposed to the ones merely willing to work for near minimum wage, which is what they had before.

In some ways it won't really make a difference, because a fully staffed IT department also needs to be listened to by management, and that doesn't happen often in corporate environments. But still, they'll pay the big bucks, so that's good enough for me.

[–] Semi_Hemi_Demigod@lemmy.world 7 points 3 months ago (2 children)

I wasn't affected by this at all and only followed it on the news and through memes, but I thought this was something that needed hands-on-keyboard to fix, which I could see not being the fault of IT because they stopped planning for issues that couldn't be handled remotely.

Was there some kind of automated way to fix all the machines remotely? Is there a way Delta could have gotten things working faster? I'm genuinely curious because this is one of those Windows things that I'm too Macintosh to understand.

[–] Shadow@lemmy.ca 17 points 3 months ago (1 children)

All the servers and infrastructure should have "lights-out management". I can power on a server, reconfigure the BIOS, and install Windows from scratch from the other side of the world.

Potentially all the workstations / endpoint devices would need hands-on repairs, though.

The initial day or two I'll happily blame on CrowdStrike. After that, it's on their IT department for not having good DR plans.
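
For the hands-on-keyboard question above: the widely reported manual workaround was to boot each affected machine into safe mode (or the recovery environment), delete the faulty CrowdStrike channel file, and reboot. A rough Python sketch of just that cleanup step, with the path and filename pattern based on the public remediation guidance but best treated as an assumption here:

```python
import glob
import os

# Sketch of the published manual fix: once the machine is in safe mode, remove
# the bad channel file so the sensor stops crashing the kernel at startup.
# Follow vendor guidance on real systems; the pattern below is an assumption.
CROWDSTRIKE_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
BAD_CHANNEL_FILE = "C-00000291*.sys"

for path in glob.glob(os.path.join(CROWDSTRIKE_DIR, BAD_CHANNEL_FILE)):
    print(f"removing {path}")
    os.remove(path)

# Reboot normally afterwards. Getting into safe mode in the first place (and
# entering BitLocker recovery keys) is the part that couldn't be automated
# remotely for most endpoints, which is why recovery was so labor-intensive.
```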

[–] exanime@lemmy.world 47 points 3 months ago (1 children)

Don't worry, everyone... each and every one of the CEOs involved in this debacle will earn millions this year and next, and will eventually retire with more money than they could possibly spend in 10 lifetimes.

If anything, they'll continue to fall upwards completely deserving even more money

[–] corsicanguppy@lemmy.ca 47 points 3 months ago (1 children)

No, POOR PLANNING and allowing an external entity the ability to take you down, that's what did it. Pretend you're pros, Delta, and be adequate.

Holy halfwit projection, batman.

[–] Xanis@lemmy.world 12 points 3 months ago (1 children)

The stories I could tell about how companies will hire a team to run tests on their digital and physical systems, while limiting them to outside nodes that are disconnected or screened from their core, primary, IMPORTANT systems.

Kicker is that plenty of the people who work for these companies get it. Very rarely does someone in a position to do something about it actually understand. For a few thousand dollars they could have hired a white hat or two to run penetration tests on their systems and fixed the vulnerabilities, or at least shored them up so this fucking 000 bug didn't hit them so harshly.

But naaaaaaah. Gotta cut payroll, brb.

[–] emax_gomax@lemmy.world 15 points 3 months ago (3 children)

I'm not sure any kind of pentest would prevent CrowdStrike's backdoor access to release updates at its own discretion and cadence. The only way to avoid that would be blocking CrowdStrike from accessing the internet, but I'd bet they'd 100% brick the host over letting that happen. If anything, this is a good lesson in not installing malware to prevent even worse malware. You handed the keys to your security to a party that clearly doesn't care, and paid the price. As for CrowdStrike's legal disclaimer stating they take no responsibility for anything they do... responsibility is the only reason anyone would buy anything from them (aside from being forced by legal requirements that clearly didn't have anyone who understood them involved in the legislation).

[–] TheAuthor_13@lemm.ee 29 points 3 months ago (2 children)

Good. They’ve been stealing from their customers for decades; this is fuckin’ karmic.

[–] themeatbridge@lemmy.world 12 points 3 months ago (1 children)

Also, maybe don't put all your eggs into one single basket, from an infrastructure perspective.

[–] stoy@lemmy.zip 5 points 3 months ago (4 children)

Yeah, I say as I migrate another service to Azure...

[–] JJROKCZ@lemmy.world 23 points 3 months ago (8 children)

I can’t wait to see CrowdStrike get liquidated over all of this. Microsoft is getting so much flak when this straight up wasn’t their fault.

[–] kubica@fedia.io 12 points 3 months ago

The "reboot up to 15 times" solution, etc., is a bit on their side. But in general I agree: CrowdStrike and the industries that need that kind of service should know better.

[–] kevindqc@lemmy.world 8 points 3 months ago (2 children)

Their stock is up 44% since July 2023; they might be fine.

[–] riskable@programming.dev 22 points 3 months ago (1 children)

Yeah... Maybe don't put all your IT eggs in one basket next time.

Delta is the one that chose to use CrowdStrike on so many critical systems; therefore, the fault still lies with Delta.

Every big company thinks that when they outsource a solution or buy software they're getting out of some responsibility. They're not. When that 3rd party causes a critical failure the proverbial finger still points at the company that chose to use the 3rd party.

The shareholders of Delta should hold this guy responsible for this failure. They shouldn't let him get away with blaming Crowdstrike.

[–] clstrfck@lemdro.id 17 points 3 months ago (10 children)

So you think Delta should’ve had a different antivirus/EDR running on every computer?

[–] Th4tGuyII@fedia.io 9 points 3 months ago (2 children)

I think what @riskable@programming.dev was saying is you shouldn't have multiple mission critical systems all using the same 3rd party services. Have a mix of at least two, so if one 3rd party service goes down not everything goes down with it

[–] partial_accumen@lemmy.world 12 points 3 months ago (1 children)

That sounds easy to say, but in execution it would be massively complicated. Modern enterprises are littered with 3rd-party services all over the place. The alternative is writing and maintaining your own solution in house, which is an incredibly heavy lift to cover the entirety of all services needed in the enterprise. Most large enterprises are resource-starved as it is, and this suggestion of having redundancy for any 3rd-party service that touches mission-critical workloads would probably increase burden and costs by at least 50%. I don't see that happening in commercial companies.

[–] Th4tGuyII@fedia.io 6 points 3 months ago (16 children)

As far as the companies go, their lack of resources is an entirely self-inflicted problem, because they won't invest in increasing those resources, like more IT infrastructure and staff. It's the same as the many companies that keep terrible backups of their data (if any) when they're not bound to by law, because they simply don't want to pay for it, even though it could very well save them from ruin.

The CrowdStrike incident was as bad as it was exactly because loads of companies had their eggs in one basket. Those that didn't recovered much more quickly. Redundancy is the lesson to take from this that none of them will learn.

[–] ricecake@sh.itjust.works 6 points 3 months ago (1 children)

In this case, it's a local third-party tool, and they thought they could control the cadence of updates. There was no reason to think there was anything particularly unstable about the situation.

This is closer to saying that half of your servers should be Linux and half should be windows in case one has a bug.

Crowdstrike bypassed user controls on updates.
The normal responsible course of action is to deploy an update to a small test environment, test to make sure it doesn't break anything, and then slowly deploy it to more places while watching for unexpected errors.
Crowdstrike shotgunned it to every system at once without monitoring, with grossly inadequate testing, and entirely bypassed any user configurable setting to avoid or opt out of the update.
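
As a rough illustration of that staged-rollout idea, here's a minimal sketch; the ring names, host counts, and health check are invented for the example and aren't anyone's real deployment pipeline.

```python
import time

# Toy staged rollout: push to a small ring first, watch for crashes, and only
# widen the blast radius if the previous ring stays healthy.
RINGS = [
    ("canary", 50),        # internal test machines
    ("early", 5_000),      # a small slice of customer hosts
    ("broad", 500_000),    # everyone else
]

def deploy(ring: str, host_count: int) -> None:
    print(f"deploying update to {host_count} hosts in ring '{ring}'")

def ring_is_healthy(ring: str) -> bool:
    """Placeholder health check: e.g. crash/telemetry rate below a threshold."""
    return True  # assumed; a real check would query monitoring

for ring, host_count in RINGS:
    deploy(ring, host_count)
    time.sleep(1)  # stand-in for a real soak period (hours or days)
    if not ring_is_healthy(ring):
        print(f"ring '{ring}' is unhealthy; halting rollout and rolling back")
        break
```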

I was much more willing to put the blame on the organizations that had the outages, for failing to follow best practices, before I learned that the way the update was pushed would have entirely bypassed any of those safeguards.

It's unreasonable to say that an organization needs to run multiple copies of every service with different fundamental infrastructure choices for each in case one magics itself broken.

[–] kbin_space_program@kbin.run 6 points 3 months ago

Crowdstrike also bypassed Microsoft's driver signing as part of their update process, just to make the updates release faster.

That MS is getting any flak for this is just shit journalism.

[–] hperrin@lemmy.world 19 points 3 months ago* (last edited 3 months ago) (2 children)

Sure, but they did send a $10 Uber Eats gift card, so you gotta take that into account.

[–] billwashere@lemmy.world 16 points 3 months ago

Good thing they got that $10 Uber Eats card.

[–] ulkesh@lemmy.world 10 points 3 months ago

Aw that’s a shame. Poor rich company.

[–] solsangraal@lemmy.zip 5 points 3 months ago

womp.

womp.

[–] mrecom@lemmy.world 5 points 3 months ago