• CrowdStrike and drivers (was Re: why reliable linux hasn't gained more

    From The Wanderer@21:1/5 to jeremy ardley on Sun Jul 21 00:40:01 2024
    On 2024-07-20 at 09:19, jeremy ardley wrote:

    > On 20/7/24 18:35, George at Clug wrote:
    >
    >> On Saturday, 20-07-2024 at 13:54 hlyg wrote:
    >>
    >>> crowdstrike makes news headlines, many Windows become blue
    >>> screens
    >>
    >> The CrowdStrike issue was not a Windows issue, it was a CrowdStrike
    >> issue.
    >>
    >> The problem did not affect our Windows computers as we have not
    >> installed CrowdStrike software.
    >>
    >> I think the media have a habit of over exaggerating things.
    >
    > The problem was not CrowdStrike as such. It happens in the best of
    > operations.
    >
    > The problem is the Windows Systems Administrators who contracted for
    > / allowed unattended remote updates of kernel drivers on live
    > hardware systems. This is the height of folly and there is no
    > recovery if it causes a BSOD.

    Speaking as someone who administers (part of) a CrowdStrike Falcon
    deployment at my workplace, although I was not involved in selecting it
    and would not be able to decide to switch to something else: I do not
    believe this is a fair description of what happened.

    CrowdStrike Falcon does not manage kernel drivers in general. It manages
    its own locally-installed client, which happens to include some
    kernel-level drivers. The update in this case does not appear to have
    actually modified any of those drivers; it appears to have added a new
    data file for use by such a driver, and those data files appear to be
    misleadingly named in such a way that they look like drivers.

    (I have not confirmed that personally yet, although I have access to the
    files in question and intend to do so, but people who are more familiar
    with Windows drivers than I am have stated that the files in question do
    not comport with the binary file format used by Windows driver files.)
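
    As a rough illustration of that kind of check: a Windows driver (.sys) file is a PE image, which begins with the bytes "MZ" and stores, at offset 0x3C, the offset of a "PE\0\0" signature. The Python sketch below tests only for that header. The function names are made up for this example, and passing the check does not prove a file is a driver, only that it is not obviously something else.

```python
import struct
from pathlib import Path


def looks_like_pe(data: bytes) -> bool:
    """Return True if `data` begins with a plausible PE image header."""
    # Every PE image starts with a DOS header whose magic is "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The little-endian 32-bit field at offset 0x3C (e_lfanew) holds the
    # file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def file_looks_like_pe(path: Path) -> bool:
    """Read a file from disk and apply the header check (hypothetical helper)."""
    return looks_like_pe(path.read_bytes())
```

    A file that fails this check, despite a .sys extension, is consistent with the claim that it is a data file rather than a driver binary.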

    All the sysadmins involved did was agree to let an antivirus-equivalent
    utility update itself and its definitions. I would be surprised if this
    could not just as easily have happened with *any* antivirus-type utility
    which has self-update capability; I'm fairly sure all modern
    broad-spectrum antivirus-etc. suites on Windows use kernel-level access
    in a similar fashion. CrowdStrike just happens to be the company involved
    when it *did* happen.

    That the sysadmins decided to deploy CrowdStrike does not make it
    reasonable to fault them for this consequence. Compare a gamer who
    installs a game, which then requires a patch to keep playing, and that
    patch silently includes new/updated DRM which installs a driver which
    breaks the system (as I recall some past DRM implementations have
    reportedly done): it would not be reasonable to fault the gamer either.
    In neither case was the consequence foreseeable from the decision.

    > The situation is recoverable if all the windows machines are virtual
    > with a good backup/restore plan. The situation is not recoverable if
    > the kernel updates are on raw iron running Windows.

    The situation is trivially recoverable if you can get access to the
    machine in a way which lets you either boot to safe mode and get
    local-administrator access, or lets you boot an alternative environment
    (e.g. live-boot media) from which you can read and write to the hard
    drive.

    I've spent a fair chunk of my workday today going around to affected
    computers and performing a variant of the latter process.

    Once you've done that, the fix is simple: delete, or move out of the
    way, a single file whose name claims that it's a driver. With that file
    gone, you can reboot, and Windows will come up normally without the
    bluescreen.
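
    A minimal sketch of the "move out of the way" step, in Python, assuming the Windows volume is already mounted read-write from a live environment. The function name, the quarantine directory and the example paths are illustrative, not part of any CrowdStrike tooling. The message above does not name the file; CrowdStrike's public remediation guidance referred to files matching C-00000291*.sys under Windows\System32\drivers\CrowdStrike, and that pattern appears only in the usage note below.

```python
import shutil
from pathlib import Path


def move_matching_files(directory: Path, pattern: str, quarantine: Path) -> list[Path]:
    """Move every file in `directory` matching the glob `pattern` into `quarantine`.

    Returns the new locations in sorted order. Nothing is deleted, so the
    step can be undone by moving the files back.
    """
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(directory.glob(pattern)):
        if src.is_file():
            dest = quarantine / src.name
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved
```

    With the Windows volume mounted at a hypothetical /mnt/win, the call would look like move_matching_files(Path("/mnt/win/Windows/System32/drivers/CrowdStrike"), "C-00000291*.sys", Path("/mnt/win/quarantine")). Moving rather than deleting keeps the step reversible.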

    > Heads should roll but obviously won't

    What good would decapitation do, here? At most, CrowdStrike's people are
    guilty of rolling out an insufficiently-tested update, or of designing a
    system such that it's too easy for an update to break things in this
    way, or that it's possible to break things in this way not with an
    actual new client version (which goes through a release cascade, with
    each organization deciding which of the most recent three versions each
    of their computers will get) but just with a data-files update (which,
    as we have seen here, appears to go out to all clients regardless of
    version).

    The first would be poor institutional practice; the others would be
    potentially-questionable software design, although it's hard to know
    without seeing the internal architecture of the software in question and
    understanding *why* it's designed that way.
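
    To make the distinction between the two update paths concrete, here is a toy Python model. Everything in it (the Host class, the version_offset field, the release strings in the usage note) is hypothetical and is not CrowdStrike's actual design, which is not visible from outside. It shows client builds being selected per host from the most recent releases, while a data-file update reaches every host regardless of that policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    # Hypothetical policy field: 0 = newest client (N), 1 = N-1, 2 = N-2.
    version_offset: int


def client_version_for(host: Host, releases: list[str]) -> str:
    """Pick the client build a host runs; `releases` is ordered newest first."""
    return releases[host.version_offset]


def hosts_receiving_data_update(hosts: list[Host]) -> list[str]:
    """Data-file updates ignore the version policy: every host receives them."""
    return [h.name for h in hosts]
```

    In this model, a conservative host pinned to N-2 is shielded from a bad client build, e.g. client_version_for(Host("db1", 2), ["7.16", "7.15", "7.14"]) selects the oldest of the three, but it gets no such shielding from a bad data file.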

    In either case, it's not obvious to me why decapitating a few scapegoats
    would *improve* the situation going forward, unless it can be determined
    that specific people were actually negligent.

    --
    The Wanderer

    The reasonable man adapts himself to the world; the unreasonable one
    persists in trying to adapt the world to himself. Therefore all
    progress depends on the unreasonable man. -- George Bernard Shaw


  • From George at Clug@21:1/5 to All on Sun Jul 21 02:00:01 2024
    On Sunday, 21-07-2024 at 08:38 The Wanderer wrote:
    > [...]

    Thanks Wanderer,

    Please no 'decapitating', or I would have lost my head many years ago, and often (if that is possible).

    Testing is important. Like 'backup and restore verification', it is often judged insufficient in hindsight after an incident, but rarely before one.

    Even with our best testing, we all make mistakes from time to time, and I have made my fair share.

    My aim is not to blame, but it is necessary to identify the cause and to carefully consider how to mitigate further occurrences.

    Overreaction is not good. One decision might be to stop using anti-virus software, which would eliminate outages caused by anti-virus software bugs, but that would be a far worse solution than a rare outage.

    As for testing: it IS necessary, but it will only ever be testing. 1) It is not possible to test for everything; 2) over-testing can cause issues too, while still not catching all potential issues.

    I want to thank all the people from CrowdStrike and all the people applying the fixes for quickly restoring services. Keep up the great work of protecting our Internet services.

    George.



