CrowdStrike effectively bricked Windows, Mac and Linux today.

Windows machines won’t boot, and Mac and Linux work is abandoned because all their users are on twitter making memes.

Incredible work.

  • Klanky@sopuli.xyz
    English
    arrow-up
    7
    ·
    2 months ago

    I wish my Windows work machine wouldn’t boot. Everything worked fine for us. :-(

    • Affidavit@lemm.ee
      arrow-up
      2
      ·
      2 months ago

      Could be worse. I was the only member of my entire team who didn’t get stuck in a boot loop, meaning I had to do their work as well as my own… Can’t even blame being on Linux as my work computer is Windows 11, I got ‘lucky’; I just got a couple of BSODs and the system restarted just fine.

      • Rivalarrival@lemmy.today
        English
        arrow-up
        2
        ·
        2 months ago

        Funny, mine did a couple BSODs then restarted just fine, at first. Then a fist shaped hole appeared in the monitor and it wouldn’t turn on again.

        Weird bug.

    • cheesepotatoes@lemmy.world
      arrow-up
      4
      ·
      2 months ago

      Good lord I would hope critical surgical computers like that aren’t networked externally… Somehow I’m guessing I’m wrong.

    • half coffee@lemy.lol
      arrow-up
      1
      ·
      2 months ago

      Anecdotal, but my spouse was in surgery during the outage and it went fine, so I imagine they take precautions (like probably having a test machine for updates before they install anything on the real one, maybe)

      • Blank@lemmy.world
        arrow-up
        1
        ·
        2 months ago

        There were no test rings for this one and it wasn’t a user controlled update. It was pushed by CS in a way that couldn’t be intercepted/tested/vetted by the consumer unless your device either doesn’t have CS installed or isn’t on an external network… or I suppose you could block CS connections at the firewall. 🤷‍♂️

      • Zacryon@feddit.org
        arrow-up
        0
        ·
        2 months ago

        Depending on the machine, I guess it’s likely that those aren’t using Windoofs at all. I would be surprised if there were devices in use during surgery that run on it.

    • smb@lemmy.ml
      English
      arrow-up
      1
      ·
      2 months ago

      well maybe making them pay compensation to all(!) victims (not just their customers) for all losses, including lost time, would already solve that problem.

      that would leave the decades-long unsolved problem of microsoft not being held liable for their buggy products (which is the reason for all security-products-as-a-workaround-to-compensate-that-crappy-os companies’ existence) open.

      why not in general hold companies liable for the damage they cause so they CAN learn to be more cautious with what they do? i mean not ONLY cs should be sued to hell, but ALL of them should be sued until they are reasonably cautious about all the damage they can cause (and already have caused in the past)

  • PrettyFlyForAFatGuy@feddit.uk
    arrow-up
    2
    ·
    edit-2
    2 months ago

    As a career QA, I just do not understand how this got through. Do they not use their own software? Do they not have a UAT program?

    Heads will roll for this

    • HyperMegaNet@lemm.ee
      arrow-up
      1
      ·
      2 months ago

      From what I’ve read, it sounds like the update file that was causing the problems was entirely filled with zeros; the patched file was the same size but had data in it.

      My entirely speculative theory is that the update file that they intended to deploy was okay (and possibly passed internal testing), but when it was being deployed to customers there was some error which caused the file to be written incorrectly (or somehow a blank dummy file was used). Meaning the original update could have been through testing but wasn’t what actually ended up being deployed to customers.

      I also assume that it’s very difficult for them to conduct UAT given that a core part of their protection comes from being able to fix possible security issues before they are exploited. If they did extensive UAT prior to deploying updates, it would both slow down the speed with which they can fix possible issues (and therefore allow more time for malicious actors to exploit them) and give malicious parties time to update their attacks in response to the upcoming changes, which may become public knowledge when they are released for UAT.

      There’s also just an issue of scale; they apparently regularly release several updates like this per day, so I’m not sure how UAT could even be conducted at that pace. Granted, I’ve only ever personally been involved with UAT for applications that had quarterly (major) updates, so there might be ways to get it done several times a day that I’m not aware of.

      None of that is to take away from the fact that this was an enormous cock up, and that whatever processes they have in place are clearly not sufficient. I completely agree that whatever they do for testing these updates has failed in a monumental way. My work was relatively unaffected by this, but I imagine there are lots of angry customers who are rightly demanding answers for how exactly this happened, and how they intend to avoid something like this happening again.
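
      A minimal sketch of the kind of pre-publish check that theory would call for, assuming the artifact that passed testing has a recorded SHA-256 digest. The function names (`is_all_zero`, `sha256_of`, `safe_to_publish`) are hypothetical and this is not a description of CrowdStrike’s actual pipeline; it only shows that a blank or altered file can be caught before it ships.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 65536  # read files in 64 KiB pieces


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_all_zero(path: Path) -> bool:
    """Return True if the file contains no non-zero byte (or is empty)."""
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            if any(chunk):  # any() is True as soon as one byte is non-zero
                return False
    return True


def safe_to_publish(path: Path, tested_sha256: str) -> bool:
    """Hypothetical gate: refuse blank files and files that differ
    from the artifact that went through testing."""
    if path.stat().st_size == 0 or is_all_zero(path):
        return False
    return sha256_of(path) == tested_sha256
```

      The digest comparison covers the "tested file was fine, deployed file was not" case; the zero check is redundant with it but gives a clearer failure reason.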

        • smb@lemmy.ml
          English
          arrow-up
          1
          ·
          2 months ago

          or maybe even automatically like in any well done CD or CI environment. at least their customers now know that they ARE the only test environment CS actually has or uses. ¯_(ツ)_/¯

          “if only” - poem (“3 seconds” edition):

          if only.

          if only there would exist CEOs in the world that could learn from their noob-dumb-brain-dead-faults instead of always ever speaking about their successes which were always-ever really done by others instead.

          if only.

          if only there were shareholders willing to really look at that wreck that tells all his false success stories and lies, so CEOs could then maybe develop at least a minimum of willingness to learn. maybe a minimum of 3 seconds of learning per decade and per ceo could already help lots of companies a really huge lot.

          if only.

          if only there was damage compensation in effect so that shareholders would be actually willing to take at least some seconds - maybe 3 seconds of really looking at new CEOs could already help, but it’s only shareholders, not sure if they would be able to concentrate that long or maybe are already too much degenerated over the generations of being parasitic only - to look at the CEOs and the damage they cause before giving them ability to cause that damage over and over again.

          if only.

  • danc4498@lemmy.world
    English
    arrow-up
    1
    ·
    2 months ago

    Is there a good ELI5 on what CrowdStrike is, why it is so massively used, why it seems to be so heavily associated with Microsoft and what the hell happened?

    • Baggie@lemmy.zip
      arrow-up
      4
      ·
      2 months ago

      Gonna try my best here:

      CrowdStrike is an anti-virus program that everyone in the corporate world uses for their Windows machines. They released an update that made the program fail badly enough that Windows crashes. When it crashes like this, it tries to restart in case that fixes the issue, but here it doesn’t, and computers get stuck in a loop of restarting.

      Because anti-virus programs are there to prevent bad things from happening, you can’t just automatically disable the program when it crashes. This means a lot of computers cannot start properly, which means you also cannot tell the computers to fix the problem remotely like you usually would.

      The end result is a bunch of low-level techs are spending their weekends manually going to each computer individually, and swapping out the bad update file so the computer can boot. It’s a massive failure on CrowdStrike’s part, and a good reason you shouldn’t outsource all your IT like people have been doing.

      • themeatbridge@lemmy.world
        arrow-up
        2
        ·
        2 months ago

        It’s also a strong indicator that companies are not doing enough to protect their own infrastructure. Production servers shouldn’t have third party software that auto-updates without going through a test environment. It’s one thing to push emergency updates if there is a timely concern or vulnerability, but routine maintenance should go through testing before being promoted to prod.
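
        A rough sketch of that promotion rule, assuming three rings named `test`, `canary` and `prod` and a caller-supplied health check. The ring names, the `Update` class and `roll_out` are all hypothetical and only illustrate the idea: routine updates stop at the first unhealthy ring, while emergency updates are allowed to go straight to prod.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical ring order for routine updates; prod is always last.
ROUTINE_RINGS = ["test", "canary", "prod"]


@dataclass
class Update:
    name: str
    emergency: bool = False  # True for time-critical fixes


def roll_out(update: Update, is_healthy: Callable[[str], bool]) -> List[str]:
    """Deploy ring by ring and return the rings the update reached.

    Routine updates halt at the first ring whose health check fails,
    so a bad update never gets past the ring that exposed it.
    Emergency updates skip the earlier rings entirely.
    """
    rings = ["prod"] if update.emergency else ROUTINE_RINGS
    reached: List[str] = []
    for ring in rings:
        reached.append(ring)  # "deploy" to this ring
        if not is_healthy(ring):
            break  # do not promote any further
    return reached
```

        For example, `roll_out(Update("routine"), lambda ring: ring != "test")` returns `["test"]`: the update broke the test ring and never reached prod.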

    • Captain Aggravated@sh.itjust.works
      English
      arrow-up
      1
      ·
      2 months ago

      CrowdStrike is a cybersecurity company that makes security software for Windows. It apparently operates at the kernel level, so it’s running in the critical path of the OS. So if their software crashes, it takes Windows down with it.

      This is very popular software. Many large entities including Fortune 500 companies, transport authorities, hospitals etc. use this software.

      They pushed a bad update which caused their software to crash, which took Windows down with it on an extremely large number of machines worldwide.

      Hilariously bad.

                • Dashi@lemmy.world
                  arrow-up
                  1
                  ·
                  2 months ago

                  If there is any software you want running at the kernel level, though, it is your AV. Not saying Spotify has a reason for running at the kernel level… But running AV at the kernel level is, in theory, a better way to protect the machine and you.

                • rottingleaf@lemmy.world
                  arrow-up
                  0
                  ·
                  edit-2
                  2 months ago

                  Third parties love their trojans just being treated as a normal way of life.

                  “Anti-cheats” instead of not being imbeciles while designing protocols for multiplayer, “anti-viruses” which need to run kernel-level and download databases with executable code, video drivers which just can’t be packaged with Windows.

                  One thing I’ve realized is that large parts of social structure are dependent on cheating. We all want to cheat, so we all agree to a system where cheating is possible, but pretend it’s not happening until someone gets caught and then just behave as if nothing happened.

                  One necessary part of someone’s upbringing is honesty. There’s an amazingly deep moment in LOTR where Eomer says that Rohirrim don’t lie, so they are not easily deceived.

                  This is not a poetic device. This is how it works. Ponzi schemes usually target people who think they are smarter and more cunning and will gain something from them. And rigged security systems work because most participants think they are the ones who may at some point abuse those systems, but most of them are the ones eventually becoming victims of such abuse.

      • smb@lemmy.ml
        English
        arrow-up
        0
        ·
        2 months ago

        This is very popular software.

        if that’s a “good” argument for you, then i’ve already heard that, and it nearly never really fits. here is another one for you that is an argument as generic as yours: “maybe try eating poo, trillions of flies cannot be wrong, poo is VERY popular food, much more popular than any human food !!! (as in mass per day as well as in its number of consumers)”

        • Captain Aggravated@sh.itjust.works
          English
          arrow-up
          0
          ·
          2 months ago

          I wasn’t making a case for adopting this software. Just pointing out that it is widely used, which is why it had such a wide effect.

          I think you’ll find most corporations would jump off a bridge if they saw their competitors jump.

          • smb@lemmy.ml
            English
            arrow-up
            1
            ·
            2 months ago

            so i misunderstood. sry then.

            and yes, every company running an alltime-ever-in-news-due-to-critical-exploitable-bugs-in-the-mailclient already IS in freefall after that said jump.

  • Lucidlethargy@sh.itjust.works
    arrow-up
    1
    ·
    2 months ago

    Lol, they only bricked specific machines running their product. Everyone else was fine.

    This was a business problem, not a user problem.

  • thecodeboss@lemmy.world
    arrow-up
    0
    ·
    2 months ago

    What are the criteria for a Windows machine to be affected? I use Windows but haven’t had any issues today.

  • hsdkfr734r@feddit.nl
    arrow-up
    0
    ·
    edit-2
    2 months ago

    Can an OS be bricked?:

    A brick (or bricked device) is a mobile device, game console, router, computer or other electronic device that is no longer functional due to corrupted firmware, a hardware problem, or other damage.[1] The term analogizes the device to a brick’s modern technological usefulness.[2]

    Edit: you may click the tiny down arrow if you think it can’t. ;)

        • macniel@feddit.org
          arrow-up
          0
          ·
          2 months ago

          and the one I was replying to was asking about an OS being bricked, not about the bios or firmware.

          AND even then you can reflash the bios, it’s time-consuming and costly but you can.

          • Saik0@lemmy.saik0.com
            English
            arrow-up
            1
            ·
            2 months ago

            AND even then you can reflash the bios, it’s time-consuming and costly but you can.

            then nothing can be bricked because on paper you can desolder the rom chip and put another one in place.

            If you want to be stupidly pedantic about shit, then nothing is anything.