this post was submitted on 20 Jul 2024
155 points (97.5% liked)


Here are the details about what went wrong on Friday.

top 36 comments
[–] gravitas_deficiency@sh.itjust.works 43 points 3 months ago (5 children)

I feel like that’s not even close to what the real number is, considering the impact it had.

[–] Godort@lemm.ee 28 points 3 months ago

If this figure is accurate, the massive impact was likely due to collateral damage. If this took down every server at an enterprise but left most of the workstations online, those workstations were still basically paperweights.

[–] Sami@lemmy.zip 17 points 3 months ago* (last edited 3 months ago)

They have about 24,000 clients, so that comes out to around 350 impacted machines per client, which is reasonable. It only takes a few impacted machines for thousands of people to be affected if those machines are important enough.
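Back-of-the-envelope, using the figures as quoted in the thread (8.5 million impacted devices, roughly 24,000 customers), just as a sanity check:

```python
# Rough sanity check on the per-customer impact, using the figures quoted
# above (8.5 million impacted devices, ~24,000 CrowdStrike customers).
impacted_devices = 8_500_000
customers = 24_000

per_customer = impacted_devices / customers
print(round(per_customer))  # ~354 impacted machines per customer
```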

[–] SchmidtGenetics@lemmy.world 8 points 3 months ago* (last edited 3 months ago) (1 children)

My brother's work uses VMs, so if the server is down there are probably 50k computers right there. But it counts as only one affected computer.

[–] gravitas_deficiency@sh.itjust.works 5 points 3 months ago (2 children)

As far as I know, none of the OSes used for virtualization hosts at scale by any of the major cloud infra players are Windows.

Not to mention: any company that uses any AWS, Azure, or GCP service is “using VMs” in one form or another (yes, I know I'm hand-waving away the difference between VMs and containers). It's basically what they build all of their other services on.

[–] Godort@lemm.ee 3 points 3 months ago

No, but Hyper-V is used extensively in the SMB space.

VMware is popular for a reason, but it's also insanely expensive if you only need an AD server and a file share.

[–] SchmidtGenetics@lemmy.world 2 points 3 months ago (1 children)

Banks use VMs, and banks were down because staff couldn't log into the VMs to access the systems they needed to work. They were bricked by extension.

[–] gravitas_deficiency@sh.itjust.works 1 points 3 months ago (1 children)

No, the clients were bricked. The VMs themselves were probably fine, and in fact probably auto-rolled back to a working savepoint after the update failed (assuming the VM infrastructure was properly set up).
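For illustration only: reverting a VM to a known-good snapshot is a one-liner on most hypervisors. A minimal sketch using libvirt-python (the environments in question are more likely Hyper-V or VMware; the connection URI, VM name, and snapshot name here are hypothetical):

```python
# Sketch: roll a guest back to a pre-update snapshot via libvirt.
# Connection URI, domain name, and snapshot name are illustrative only.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("worker-vm-01")
snap = dom.snapshotLookupByName("pre-update", 0)
dom.revertToSnapshot(snap, 0)  # guest disk/memory state returns to the snapshot
conn.close()
```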

[–] SchmidtGenetics@lemmy.world 2 points 3 months ago* (last edited 3 months ago)

He couldn't log in to the VM to access his work portals or email; call it what you will, but one bricked computer/server affected thousands.

It's weird that you're arguing when you're the one who asked how it was possible in the first place. VMs are the answer, dude. Argue all you want, but it's making you look foolish for (a) not understanding and (b) arguing against the answer. Also, why this one thread? Multiple other people told you the exact same thing. Are you just looking for an argument here or something?

[–] ByteOnBikes@slrpnk.net 6 points 3 months ago

I wonder if a large percentage of the impact is internal-facing systems.

And we won't know until Monday.

[–] biscuitswalrus@aussie.zone 5 points 3 months ago (1 children)

That's how supply chains work: one link in the chain breaks and the whole thing doesn't work. Also, 10% of major companies being affected is still giant. But you're here using online services, probably still buying bread, probably got fuel, probably playing video games. It's huge in the media, and it had massive effects, but there were heaps of things that just weren't touched at all, which is how news about it kept spreading. TV news networks, for instance, seemingly kept going well enough to report on it non-stop, unaffected. Tbh though, any good continuity and disaster recovery plan should handle this with some impact but with continuity.

[–] remotelove@lemmy.ca 3 points 3 months ago (1 children)

The only companies I have seen with workable BCDR plans are banks, and that is because they handle money for rich people. It wouldn't surprise me if many core banking systems are hyper-legacy as well.

I honestly think the only reason a majority of our infrastructure didn't collapse is the widespread lack of security controls and shitty patch management programs.

Sure, compliance programs work for some aspects of business, but since the advent of "the cloud", BCDR plans have been a paperwork drill.

(There are probably some awesome places out there with quadruple-redundant networks and the ability to outlast a nuclear winter. I personally haven't seen them, though.)

[–] biscuitswalrus@aussie.zone 3 points 3 months ago (1 children)

It's impossible to tell, but you're probably closer to the truth than not.

One fact alone: BCDR isn't solely an IT responsibility. Business continuity should cover things like: when your CNC machine no longer has power, what do you do?

- Cause 1: power loss. Process: get the backup diesel generator running, following that SOP.
- Cause 2: machine broken. Process: get the mechanic over, or work the warranty action item list; rely on the maintenance SLA.
- Cause 3: no network connectivity. Process: use USB, following the SOP.

I've been a part of a half dozen or more of these over time, which is not that many for over 200 companies I've supported.

I've even done simulations, round-table "Dungeons and Dragons" style, with one person running the scenario and different people having to follow the responsibilities in their documented process. Be it calling clients, customers, and vendors, alerting their insurer, or posting to social media, all the way through to the warehouse manager using a biro and a ruler to track incoming and outgoing stock by hand until systems are operational again.

So I only mention this because you talk about IT redundancy, but business continuity is not an IT responsibility, although IT has a role in it. It's a business responsibility.

Further, it kind of proves your point: anyone who's worked a decade without being part of a simulation, or at least contributing to improving one, has probably worked at companies that don't do them. Which isn't their fault, but it's an indicator of how fragile business is and how little businesses are held accountable for it.

[–] remotelove@lemmy.ca 2 points 3 months ago (1 children)

You aren't wrong about my description. My direct experience with compliance is limited to small/medium tech companies where IT is the business. As long as there is an alternate work location and tech redundancy, the business can chug along as usual. (Data centers are becoming more rare so cloud redundancy is more important than ever.) Of course, there is still quite a bit that needs to be done depending on the type of emergency, as you described: It's just all IT, customer and partner centric.

Unfortunately, that does make compliance an IT function, because a majority of the company is in some IT engineering function, minus sales and marketing.

I can't speak to companies in different industries, whereas you can. When physical products and manufacturing are at stake, that's way outside the scope of what I could deal with.

[–] biscuitswalrus@aussie.zone 2 points 3 months ago (1 children)

Hmm, yeah. Thanks for sharing. Because of 15-odd years of IT managed services, I only have non-technical companies on the brain, and in my world view I hadn't considered technology provider companies at all. They typically don't need managed service providers (right or wrong :p).

[–] remotelove@lemmy.ca 1 points 3 months ago* (last edited 3 months ago) (1 children)

It gets worse. Tech companies are service providers that typically work with a chain of other service providers. About 40%-50% of the controls for the last SOC2 audit I ran were carved out and deferred to our service providers. (Also, there are limited applicable frameworks: SOC2, PCI, ISO 27001, HIPAA, and HITRUST are common for me, but usually related to cloud services.)

Yeah, I tend to break the brains of auditors who have never dealt with startups and are used to Fortune 500 mega-companies. What's funnier is that I am just a lowly security engineer. A very experienced security engineer, but a lowly one nonetheless.

Auditor: So what is your documented process for this?

Me: Uhh, we don't have one?

Auditor: What about when X or Y catastrophic issue happens?

Me: Anyone just pushes this button and activates that widget.

Auditor: Ok. Uh. Is that process documented?

Me: Nope. We probably do it about 2-3 times a week anyway.

[–] biscuitswalrus@aussie.zone 2 points 3 months ago

Yeah, we do a lot around frameworks at my current place, and previously we worked directly with customers on ISO and ACSC Essential Eight frameworks. For us, non-compliance = revenue opportunity, which means we're financially rewarded for aligning customers and encouraged to do so.

On that same note, I wrote up a "sysadmin best practices" checklist aimed at driving reviews, checks, and remediation opportunities for small businesses; it's useful in that space. I got an overwhelming amount of response on the MSP subreddit from people asking about it in DMs (not hundreds, just dozens, but too many for me). It's quiet here on Lemmy. Happy to share my updated version of course, though I think if you're working in your sector it'll look like child's play lol. But I kind of want to encourage a bit of community among professionals here. I just don't want to spend time on it...

I feel you about the "lowly but experienced" bit though. An account manager or business development manager, or even the CTO, won't listen to me. I have a business degree; most of them don't. I try to apply critical decision-making in my solutions and risk advisory, but the words fall on deaf ears. I take a small but very guilty pleasure in watching the very thing I warn against happen, both to clients and to my employers. Especially when the prevention was trivial and all it needed was any amount of attention.

After nearly 20 years of IT and about 15 in MSP I'm so tired. I'm very much resonating with that "lowly engineer" comment.

[–] DogPeePoo@lemm.ee 9 points 3 months ago* (last edited 3 months ago)

CrowdStrike lives up to its name

[–] Canopyflyer@lemmy.world 5 points 3 months ago* (last edited 3 months ago) (1 children)

Hey CrowdStrike...

That's not imposter syndrome you're feeling right now.

[–] possiblylinux127@lemmy.zip 3 points 3 months ago (1 children)
[–] Canopyflyer@lemmy.world 3 points 3 months ago (1 children)

It's not imposter syndrome for me either. At least I didn't bring down millions of systems all across the world

[–] possiblylinux127@lemmy.zip 2 points 3 months ago* (last edited 3 months ago)

My bad

Sorry about all those blue screens

[–] unfnknblvbl@beehaw.org 4 points 3 months ago* (last edited 3 months ago) (1 children)

This number seems quite low. My organisation alone would have had something like 3,000 employee devices taken down. Since it happened on a day when most people were WFH, there are at least another thousand static devices in my building alone that may not have been in use at the time and will shit the bed tomorrow morning.

The same thing applies to our much larger sister companies interstate. So that's another 6,000 or so devices.

The two largest energy retailers were affected too, so that's another 5,000 devices at a conservative estimate.

Then there's all the self-service checkouts that went down across Australia. I have no idea how many there are, but if every Coles and Woolworths has ten of them, that's another ~40,000 devices.

That's just the organisations that I am personally aware of as being affected in Australia and can get ballpark figures for.

Obviously Microsoft are getting their figures from the auto-reporting that happened on each crash, but it really does seem too low.

It's beyond time to diversify our IT infrastructure. Enough with sticking everything "in the cloud" and paying for software (and devices!!) we don't own.

[–] Chozo@fedia.io 4 points 3 months ago* (last edited 3 months ago) (1 children)

So, those numbers all account for about 54,000 of the 8.5 million devices. Using fairly generous rounding, that still leaves approximately 8.5 million more devices.

A million is a lot.
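Adding up the ballpark figures quoted above (all of them the commenter's rough estimates, not measured data), just to show the scale of what's left over:

```python
# Tally of the Australian ballpark estimates quoted upthread; every figure
# here is a rough guess from the comment above, not measured data.
estimates = {
    "own org, employee devices": 3_000,
    "static devices in one building": 1_000,
    "sister companies interstate": 6_000,
    "two energy retailers": 5_000,
    "Coles/Woolworths self-checkouts": 40_000,
}

known_locally = sum(estimates.values())   # ~55,000, roughly the ~54k cited above
print(known_locally)
print(8_500_000 - known_locally)          # still ~8.45 million devices elsewhere
```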

[–] unfnknblvbl@beehaw.org 1 points 3 months ago

Way to miss the point. That's 54,000 that one person knows of, across a small handful of organisations in one small country. I'm not even including the dozens more organisations I know were affected but can't come up with ballpark figures for.

[–] Irremarkable@fedia.io 3 points 3 months ago* (last edited 3 months ago) (1 children)

Y'know, I almost majored in IT or something in that realm. Real glad I didn't right now. And at most other times too, but especially right now.

[–] buddascrayon@lemmy.world 2 points 3 months ago (2 children)

If you had majored in IT, you would know that this CrowdStrike thing is an easy, though somewhat tedious, fix. There are honestly far more annoying problems that IT people have to contend with.
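For anyone wondering, the "easy but tedious" fix being referenced is the widely reported manual workaround: boot each affected machine into Safe Mode or the Windows Recovery Environment and delete the faulty channel file from the CrowdStrike driver folder. A minimal sketch of just the deletion step (paths and file pattern per the public guidance; it needs admin rights, and in practice BitLocker recovery keys often got in the way before you could even get this far):

```python
# Sketch of the manual workaround's deletion step: remove the faulty
# channel file(s) matching C-00000291*.sys from the CrowdStrike driver
# directory. Run from Safe Mode / WinRE with administrator rights.
from pathlib import Path

driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")

for channel_file in driver_dir.glob("C-00000291*.sys"):
    print(f"deleting {channel_file}")
    channel_file.unlink()
```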

[–] Irremarkable@fedia.io 1 points 3 months ago* (last edited 3 months ago) (1 children)

I'm well aware that it's not a complicated fix; I'm more than capable of doing it. But being a guy on an understaffed IT team in an office of hundreds right now sounds fucking miserable.

[–] buddascrayon@lemmy.world 2 points 3 months ago* (last edited 3 months ago)

Not really. It's a ton of overtime, the problem is not my fault, and no one can yell at me for taking too long because there's no way to get it done faster.

If you want to talk about a giant pain in the ass, look at what happens when a malicious virus runs rampant in an office. Then you have to clean each computer individually, sometimes having to wipe and reload whole machines, which can take fucking hours because you have to update each computer after the wipe and reload. Even if you're working from images, there are going to be at least half a dozen updates, if not more, waiting to be redownloaded and reinstalled. And company bosses tend not to think it takes all that long, and therefore blame you for the delay in getting everyone up and running. So I'd rather they be mad at somebody else, like CrowdStrike, for the extreme downtime.

[–] whoisearth@lemmy.ca 0 points 3 months ago

Like justifying staffing and budgets. Fuck office politics.

[–] Greyghoster@aussie.zone 2 points 3 months ago (1 children)

How many systems in the world's militaries went down? You know, in the war machines of Russia, Israel, and Ukraine?

[–] Avg@lemm.ee 3 points 3 months ago (3 children)

Those computers don't have auto-update enabled.

[–] lemming741@lemmy.world 2 points 3 months ago (1 children)

CrowdStrike’s channel file updates were pushed to computers regardless of any settings meant to prevent such automatic updates, Wardle noted.

https://x.com/patrickwardle/status/1814367918425079934

[–] Avg@lemm.ee 1 points 3 months ago

I work at an enterprise software company and have some well-known, security-conscious customers. The above is only true for us humans; if you have the money, you can dictate whatever the fuck you want.

[–] remotelove@lemmy.ca 1 points 3 months ago* (last edited 3 months ago)

Absolutely that. For networks that matter, patches are usually tested independently. While I wouldn't trust the average military command to do patch testing, any civilian/corporate contractors absolutely would, because money. (Microsoft is likely at the top of that stack...)

There are other conditions as well. EDR infrastructure, if it exists, would need to be isolated in a "government cloud", which is a different beast completely. Plus, there are different levels of networks, some being air-gapped.

[–] Greyghoster@aussie.zone 1 points 3 months ago

Normally I would agree; however, this doesn't appear to have been a Microsoft update but a CrowdStrike update, and given that everyone is worried about ransomware etc., that kind of update tends to get waved through.