Jon Masters’ Post

Computer Architect #ArmServers @Google | Previously @Red Hat, @NUVIA | Author of several Linux programming books

1mo

In case anyone is still looking for a super simplified explanation of what happened with that CrowdStrike issue today:

Your Windows computer runs different types of software. The most critical piece, the one that controls the hardware (chips, memory, etc.), is called the "kernel". The kernel can be extended to support new hardware using "drivers". Usually, these are for something you added to your computer (e.g. a GPU/graphics card). But some drivers are used to do other things.

One thing that is popular (but wrong, in my opinion) is to do "cyber" "security" or anti-virus type stuff by extending the kernel with a driver. A driver is needed to intercept certain basic operations (like opening files or making network connections to other computers) and monitor them for "compliance" and the like. Using a driver, you can (for example) have code that monitors every file that is opened and screens it for malware. It plugs in at such a low level that anything opening a file on the computer will be detected. That's why drivers are used.

The problem is that drivers extend the operating system kernel, and the kernel is not a forgiving environment. While normal software with a bug might "crash" and need to be restarted, a bug in OS code stops the whole computer (blue screen). It appears that in this case the driver itself was OK, but an updated file that tells the driver what to do was shipped. A bug in the driver (it couldn't handle corrupted config files) meant it tried to access a bad memory location and crashed, taking down the machine.

Because this driver is loaded so early during boot, it can cause a "boot loop": it loads, crashes the machine, the machine restarts, and the process repeats until someone manually boots into a special recovery mode and deletes the bad update.

Anyway, that's the super simplified explanation folks haven't given you all day.
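As a rough illustration of the "screen every opened file" idea, here is a user-space toy in Python. The signature list and the names `find_signature`, `screened_open`, and `MalwareDetected` are hypothetical, invented for this sketch; real products use far richer detection logic, and on Windows the interception itself is done by a kernel-mode file-system filter driver, not by a wrapper function like this.

```python
from typing import Iterable, Optional

# Made-up byte patterns standing in for a real malware signature database.
KNOWN_BAD_SIGNATURES = (b"EVIL_PAYLOAD", b"\xde\xad\xbe\xef")


class MalwareDetected(Exception):
    """Raised when a file's contents match a known-bad signature."""


def find_signature(data: bytes,
                   signatures: Iterable[bytes] = KNOWN_BAD_SIGNATURES) -> Optional[bytes]:
    """Return the first signature found in `data`, or None if it looks clean."""
    for sig in signatures:
        if sig in data:
            return sig
    return None


def screened_open(path: str) -> bytes:
    """Read a file, but refuse to return its contents if it matches a signature.

    Only code that calls this function gets screened; a kernel-level filter
    sees every program's file opens, which is why vendors go there.
    """
    with open(path, "rb") as f:
        data = f.read()
    sig = find_signature(data)
    if sig is not None:
        raise MalwareDetected(f"{path} matches signature {sig!r}")
    return data
```

The limitation is the point: only code that calls `screened_open` gets checked. Moving the same logic into the kernel makes every program's file opens pass through it, but a bug there is no longer contained to one process.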

80 Comments

Eric Curtin

Principal Software Engineer working on Red Hat In-Vehicle Operating System

1mo

A lot of Linux systems and other OSes have automated protection against this early-boot-loop problem: Android, ChromeOS, and rpm-ostree-based systems with greenboot all do, which in theory prevents it. When a system upgrade is applied, you atomically switch root into the new, updated rootfs on reboot. You store a boot counter somewhere (some Android devices store it as a GPT partition attribute) and set it to, say, 7. Every time the bootloader attempts a boot, the counter is decremented; if a boot completes successfully, counting stops and the boot is marked as healthy. But if the counter reaches zero, you roll back to the old rootfs with the old software, so you still have a bootable system. RHEL for Edge and Red Hat In-Vehicle OS work like this. In fact, any modern OS built from scratch should work like this, at least for core components (things like CrowdStrike that have kernel drivers are in this bucket) 👆 Legacy OSes like Windows have an excuse, though: when they were developed, nobody was doing A/B system upgrades with automated rollbacks.
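The boot-counter scheme described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the `BootState` record, the slot labels "A"/"B", and the function names are hypothetical, and real systems (Android A/B slots, greenboot on rpm-ostree) keep this state in bootloader or partition metadata rather than in a Python object.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class BootState:
    active_slot: str      # rootfs the bootloader tries first ("A" or "B")
    fallback_slot: str    # previous rootfs, kept intact for rollback
    tries_left: int       # boot counter (the comment's example uses 7)
    healthy: bool         # set once a boot of active_slot completes


def apply_update(state: BootState, new_slot: str, tries: int = 7) -> BootState:
    """Stage an update in the other slot and arm the boot counter."""
    return BootState(active_slot=new_slot, fallback_slot=state.active_slot,
                     tries_left=tries, healthy=False)


def bootloader_pick(state: BootState) -> str:
    """Run on every boot attempt: decrement the counter, or roll back at zero."""
    if not state.healthy:
        if state.tries_left == 0:
            # Counter exhausted: switch back to the old rootfs (assumed known-good).
            state.active_slot, state.fallback_slot = state.fallback_slot, state.active_slot
            state.healthy = True
        else:
            state.tries_left -= 1
    return state.active_slot


def simulate(state: BootState, boots_ok: Callable[[str], bool],
             max_attempts: int = 100) -> Tuple[str, int]:
    """Reboot until some slot boots successfully; return (slot, attempts)."""
    for attempt in range(1, max_attempts + 1):
        slot = bootloader_pick(state)
        if boots_ok(slot):
            state.healthy = True  # boot completed: stop counting
            return slot, attempt
        # Otherwise the machine crashed and reboots.
    raise RuntimeError("no bootable slot")
```

With a counter of 7, a broken update gets seven boot attempts; the eighth lands back on the old rootfs instead of looping forever, while a good update is marked healthy on its first boot.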

65 Reactions

Jeffrey Chamberlain

Principal Engineer at Intel Corporation

1mo

I have to agree with your parenthetical point: is this level of under-the-hood access to the machine's internals through an OS driver even worth the risk/reward tradeoff? It is a conversation that needs to be had. It is at least arguable that the risk this kind of error poses is as great as, if not greater than, the risk reduction that this level of driver access adds to the overall security solution.

Irvan Krantzler (he/him)

Leading software teams to accomplish great things

1mo

It sure seems like they should have figured out that a buggy or compromised driver would make them vulnerable, so rethinking that will almost certainly happen. I’m curious why they didn’t catch it during a slow rollout or something of that ilk. Why did the entire world have to be disrupted? Knowing how hard all this is, it’s easy to be a Monday morning quarterback. But I would like to understand that part of it, because the scope was what really surprised me.

David Baeumler

Marketing Director at Red Hat Inc. | Creative brand & product storytelling

1mo

We should have made a video together about this.

1 Reaction

Tim Ocock

Interim CTO and product delivery leader

1mo

This is a terrible layman's explanation. Why is it even necessary to introduce the concept of a kernel to explain this to laymen? Here's a better explanation: "Antivirus software needs full access to the whole computer to spot and catch viruses. Unfortunately, that means if there's a bug in the antivirus itself, it can crash the whole system. Antivirus software updates regularly, and a new update arrived with exactly that kind of bug in it, so every computer running that antivirus crashed."

4 Reactions

Roberto Avanzi

Security and Cryptology Architect, Research Fellow of the CRI, University of Haifa — Security Engineering Veteran — Designer of QARMA, co-submitter of Kyber (FIPS 203: ML-KEM)

1mo

Well, no. The driver was not OK. If a malformed config file can cause it to crash, then the driver was defective. Software should always properly sanitize its input, and the driver did not.
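Input sanitization, in the sense used above, means checking every length and count in an untrusted file before using it. The record format below (a 16-bit field count followed by that many 32-bit values) and the names `parse_fields`, `load_config_or_keep_old`, `ConfigError`, and `MAX_FIELDS` are made up for illustration; they are not CrowdStrike's channel file format.

```python
import struct
from typing import List

MAX_FIELDS = 64  # hypothetical sanity limit on the declared field count


class ConfigError(ValueError):
    """Raised for malformed input instead of letting the caller crash."""


def parse_fields(blob: bytes) -> List[int]:
    """Parse a made-up format: little-endian u16 count, then `count` u32 values."""
    if len(blob) < 2:
        raise ConfigError("truncated header")
    (count,) = struct.unpack_from("<H", blob, 0)
    if count > MAX_FIELDS:
        raise ConfigError(f"field count {count} exceeds limit {MAX_FIELDS}")
    expected = 2 + 4 * count
    if len(blob) != expected:
        # The declared count and the actual size disagree: reject, don't guess.
        raise ConfigError(f"expected {expected} bytes, got {len(blob)}")
    if count == 0:
        return []
    return list(struct.unpack_from(f"<{count}I", blob, 2))


def load_config_or_keep_old(blob: bytes, current: List[int]) -> List[int]:
    """Apply a new config only if it parses cleanly; otherwise keep the old one."""
    try:
        return parse_fields(blob)
    except ConfigError:
        return current
```

In Python a missing check would raise an exception; in kernel C code the same missing check can become a read past the end of a buffer, i.e. the bad memory access described in the original post. Keeping the last good configuration on rejection is one possible policy, not the only one.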

2 Reactions

Berenice Mann PhD, FCIM, Chartered Marketer

1mo

Isn't the real problem that it seems to have been released without proper testing on all current OSs? Surely testing on *just one* real computer would have shown the issue up instantly.

3 Reactions

D. Scott Bonomi

Playing Bridge and waiting for the right gig

1mo

Anyone playing in kernel space should have an automated rollback if the system fails to boot. Keep a copy of the last successful file set, and if you get some number of boot failures, roll back to the last good boot set and flag the update as bad: mark it in the history as Failed instead of Installed, and if another attempt is made to load that update, report immediate failure. I am sure the amateur wizards in Redmond have never considered such an option. I work in the embedded space, and a remote update cannot leave a system in an unusable state. I recall being told to fix an issue with the instruction "do not allow us to make any boat anchors", where the only possible use for a system that failed to boot was as an anchor.
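The rollback-and-blacklist policy in this comment can be sketched as a small state machine. Everything here is a hypothetical toy: `UpdateManager`, `MAX_BOOT_FAILURES = 3`, and the in-memory "file set" stand in for whatever an embedded updater actually persists to disk.

```python
from typing import Dict, Optional

MAX_BOOT_FAILURES = 3  # hypothetical value for "some number of boot fails"


class UpdateManager:
    """Toy model: keep the last good file set, roll back and blacklist bad updates."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.active = dict(files)            # file set the next boot will use
        self.last_good = dict(files)         # copy of the last set that booted
        self.history: Dict[str, str] = {}    # update id -> "Installed" or "Failed"
        self.pending: Optional[str] = None   # update awaiting its first good boot
        self.failures = 0

    def install(self, update_id: str, files: Dict[str, bytes]) -> bool:
        """Stage an update; report immediate failure if it already failed once."""
        if self.history.get(update_id) == "Failed":
            return False
        self.active = {**self.last_good, **files}
        self.pending = update_id
        self.failures = 0
        return True

    def record_boot(self, booted_ok: bool) -> None:
        """Called after each boot attempt (e.g. by a watchdog or the bootloader)."""
        if booted_ok:
            self.last_good = dict(self.active)
            if self.pending is not None:
                self.history[self.pending] = "Installed"
                self.pending = None
            self.failures = 0
            return
        self.failures += 1
        if self.pending is not None and self.failures >= MAX_BOOT_FAILURES:
            self.active = dict(self.last_good)     # roll back to the last good set
            self.history[self.pending] = "Failed"  # refuse this update from now on
            self.pending = None
            self.failures = 0
```

A good update is recorded as Installed on its first successful boot; a bad one is rolled back after three failed boots and can never be reinstalled.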

1 Reaction

James Cuff

unix whisperer | hpc apprentice | advisor

1mo

Great write-up. New info today: “On Windows systems, Channel Files reside in the following directory: C:\Windows\System32\drivers\CrowdStrike\ and have a file name that starts with “C-”. Each channel file is assigned a number as a unique identifier. The impacted Channel File in this event is 291 and will have a filename that starts with “C-00000291-” and ends with a .sys extension. Although Channel Files end with the SYS extension, they are not kernel drivers. Channel File 291 controls how Falcon evaluates named pipe execution on Windows systems. Named pipes are used for normal, interprocess or intersystem communication in Windows.” https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/
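Going by the naming rule quoted above, an audit or cleanup script could pick out channel files by name. This sketch assumes the eight-digit, zero-padded number seen in the `C-00000291-` example and matches case-insensitively, as Windows file names are; `channel_number` and `find_channel_files` are illustrative names, and the input list would come from listing `C:\Windows\System32\drivers\CrowdStrike\`.

```python
import re
from typing import Iterable, List, Optional

# "C-", an (assumed) eight-digit channel number, "-", anything, then ".sys".
CHANNEL_FILE = re.compile(r"^C-(\d{8})-.*\.sys$", re.IGNORECASE)


def channel_number(filename: str) -> Optional[int]:
    """Return the channel number encoded in a file name, or None if it isn't a channel file."""
    m = CHANNEL_FILE.match(filename)
    return int(m.group(1)) if m else None


def find_channel_files(filenames: Iterable[str], channel: int) -> List[str]:
    """Select the names that belong to one channel, e.g. 291."""
    return [name for name in filenames if channel_number(name) == channel]
```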

2 Reactions

Omer Tunali

1mo

Considering the abundance of computing resources (number of cores per processor), maybe it is time for desktops and workstations to switch to hybrid kernel architectures like those of macOS and iOS (which derive from NeXTSTEP, the OS of Steve Jobs's NeXT workstations). Hybrid kernels are safer: core functionality like memory management and process/thread management runs in ring 0, while device drivers, file system services, and the TCP/UDP/IP stack run in a separate memory space (not user space, but somewhere in between). This is much safer IMHO. It is not necessary to start from scratch; FreeBSD and the Mach kernel could be used as a starting point.
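The fault-isolation argument can be demonstrated at the process level: code that makes a bad memory access in its own address space takes down only itself. The sketch below is an analogy, not a hybrid kernel; `run_isolated` and the two "driver" strings are hypothetical, and the crash is forced with `ctypes.string_at(0)`, which reads from address 0.

```python
import subprocess
import sys

# Hypothetical "driver" programs. The crashing one reads from address 0,
# a deliberately bad memory access.
CRASHING_DRIVER = "import ctypes; ctypes.string_at(0)"
WORKING_DRIVER = "print('driver loaded')"


def run_isolated(driver_source: str, max_restarts: int = 2) -> bool:
    """Run driver code in its own process and restart it if it dies.

    Returns True once a run exits cleanly, False if every attempt crashed.
    The supervising process survives the crash either way.
    """
    for attempt in range(1 + max_restarts):
        result = subprocess.run([sys.executable, "-c", driver_source],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt + 1}: driver died with status {result.returncode}",
              file=sys.stderr)
    return False
```

On Linux the crashed child typically reports return code -11 (killed by SIGSEGV), and the supervising process simply carries on. A kernel-mode driver making the same access has no such supervisor above it.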


More Relevant Posts

Kadir Gökhan Sezer

Software Developer Specializing in Unix Environments
1mo
Every programmer should read

Kai Lampka, Dr.-Ing. (habil.)
1mo Edited
The basics of the problems with monolithic kernel designs and of what a basic A/B software update scheme looks like (first comment on the original post)... a good explanation for the not-so-informed reader.

Boris Lotkov

Software Developer at Micro88
2mo Edited
About our new product releases, and more. How can you protect your computers from hacker attacks? You can, for example, attend a webinar where they will tell you in detail how to write secure code for a Turing machine. Or you can stay put and just install Micro88 DoNotDisturb or Micro88 Minefield. New versions of Micro88 DoNotDisturb and Micro88 Minefield have been released. We were still fighting hacker attacks from China all last week, and we've added something new to our products. These new versions will protect you from dangerous attackers. Everything is free. Protect your servers and client computers with our products: Micro88 Minefield https://lnkd.in/eXvBmkzh Micro88 DoNotDisturb https://lnkd.in/ekVzht8D They use reliable protection methods and do not consume your computer's CPU or memory. Also use our flagship, Micro88 Continuous Backup, to protect your client computers. It backs up all changes to your files and documents without any backup schedule: changed files are backed up immediately. No backup schedule is used at all. We save you time. Everything is done without your involvement. Also, your clients' backup archives are reliably protected by our backup server, rather than sitting somewhere on a disk where nobody looks at them. Maybe someone will say this is nonsense, but no, it is not nonsense at all. Later, you will thank us 20 times over for taking care of you and your client computers' data. You can get information about us and our products from MS Copilot or simply by visiting our website https://lnkd.in/ezrQfAx7 Kind regards, Micro88 Software Group
Tomasz D.

A good engineer.
3mo
It is sometimes very disturbing, even for me, to find that popular websites we think are safe are in fact full of viruses, JavaScript injections, etc. I know this because we sometimes get requests from our customers to verify websites that Zscaler has marked as malicious. Unfortunately, the vast majority of those verifications turn out to be true positives. The internet is no longer the place it was 30 years ago, when I searched for stuff using AltaVista and websites took 10 minutes to load. It has changed, and not all the changes have been positive, I'm afraid. If you own even a small company and want to protect your business, talk to Zscaler about protecting your internet access with our ZIA as well as replacing your VPN solution with ZPA. If you're an employee, talk to your IT department. If you're a school headmaster, consider that the school's internet access needs to be protected and supervised. These days everyone needs to be concerned with the security of their IT environment. If you own a computer, you're vulnerable already. At least try to be a difficult target.
Rüdiger Küpper

Chief Information Security Officer & DevOps Engineer at mogenius Machen ist wie wollen, nur krasser. Doing is like wanting, only more intense.
9mo
LitterDrifter USB Worm https://ift.tt/U7wYn60

A new worm that spreads via USB sticks is infecting computers in Ukraine and beyond. The group—known by many names, including Gamaredon, Primitive Bear, ACTINIUM, Armageddon, and Shuckworm—has been active since at least 2014 and has been attributed to Russia’s Federal Security Service by the Security Service of Ukraine. Most Kremlin-backed groups take pains to fly under the radar; Gamaredon doesn’t care to. Its espionage-motivated campaigns targeting large numbers of Ukrainian organizations are easy to detect and tie back to the Russian government. The campaigns typically revolve around malware that aims to obtain as much information from targets as possible. One of those tools is a computer worm designed to spread from computer to computer through USB drives. Tracked by researchers from Check Point Research as LitterDrifter, the malware is written in the Visual Basic Scripting language. LitterDrifter serves two purposes: to promiscuously spread from USB drive to USB drive and to permanently infect the devices that connect to such drives with malware that permanently communicates with Gamaredon-operated command-and-control servers.

via Schneier on Security https://ift.tt/uD98yi5
November 24, 2023 at 01:04PM
Yokeshwar Raja

MSc in Applied Cybersecurity • Certified Ethical Hacker (CEH) v11 • VAPT • GRC • Risk Management • Cyber Forensics • • • 𝘵𝘺𝘱𝘪𝘯𝘨...
1mo
What happened on Friday, 19 July 2024? Can you believe that a tiny 40 KB Rapid Response Content file caused Friday's issue? CrowdStrike released an update to its content configuration for all Windows sensors to gather data on possible threats. These updates are said to be a regular part of how the Falcon platform works. But this one was problematic and caused Windows systems to crash. The systems in scope were Windows hosts running sensor version 7.11 and above that were online between 05:09 BST and 06:27 BST on 19 July 2024 and received the update. To dive deeper: CrowdStrike releases configuration updates in two different ways: 1.) Sensor Content: this directly updates CrowdStrike's own Falcon sensor, which runs at the kernel level. 2.) Rapid Response Content: this updates how the sensor behaves to detect malware. This is the one that caused Friday's outage. The faulty update was only distributed for 78 minutes, which might not sound like much, but the crashes it caused affected everyone from shoppers to people boarding flights. CrowdStrike has promised to improve its Rapid Response Content testing with local developer testing and various other checks to prevent this from happening again. Stay aware, stay vigilant! Read more: https://lnkd.in/e4AaXnRJ https://lnkd.in/eVxWRyGu #CrowdStrike #Microsoft #SecurityUpdate
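The 78-minute figure follows directly from the window quoted in the post. As a quick check using only the date and times given there (both in BST, so the time zone cancels out):

```python
from datetime import datetime

# Window quoted in the post: 05:09 to 06:27 BST on 19 July 2024.
start = datetime(2024, 7, 19, 5, 9)
end = datetime(2024, 7, 19, 6, 27)

# Length of the distribution window in minutes.
window_minutes = (end - start).total_seconds() / 60
print(window_minutes)
```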
Arwa'a Naji

Penetration Tester | CTF Player
8mo Edited
Here's a detailed explanation of Staged, Inline, and Meterpreter payloads in Metasploit, along with examples:

1. Staged Payloads
Characteristics:
- Delivered in two phases to enhance stealth and accommodate larger payloads.
- Stager: a small initial payload that establishes communication with the attacker.
- Stage: the main payload with full functionality, downloaded by the stager.
Example: windows/shell/reverse_tcp
- The stager establishes a reverse TCP connection to the attacker's machine.
- The stager downloads the larger reverse shell payload.
- Complete reverse shell functionality is executed on the target.
Advantages:
- Can evade antivirus detection thanks to a smaller initial footprint.
- Can handle more complex payloads that might exceed memory constraints.

2. Inline (Single) Payloads
Characteristics:
- Self-contained, delivering full functionality in a single payload.
- Generally smaller and simpler than staged payloads.
- Often more stable and consistent due to fewer dependencies.
Example: windows/shell_reverse_tcp
- Delivers a reverse TCP shell in a single payload.
- No need for a separate stager or download phase.
Advantages:
- Simplicity and potentially better stability.
- Suitable for smaller payloads or when stealth is less critical.

3. Meterpreter Payloads
Characteristics:
- Advanced, feature-rich payload offering extensive control over compromised systems.
- Memory-resident, avoiding disk writes and detection.
- Uses TLS encryption for communication with Metasploit.
- Provides extensive post-exploitation capabilities.
Example: windows/meterpreter/reverse_tcp
- Establishes a reverse TCP connection to the attacker's machine.
- Loads Meterpreter into memory on the target system.
- Offers a wide range of commands for:
  1. File system interaction
  2. Process manipulation
  3. Network pivoting
  4. Password dumping
  5. Keylogging
  and more.

When to Choose Which Type:
- Staged: when stealth is paramount or the payload is large.
- Inline: when simplicity and stability are priorities or the payload is small.
- Meterpreter: when extensive post-exploitation capabilities are needed.

#ExploitationFramework #PenetrationTesting #CybersecurityTools #InfoSecExploits #PayloadDevelopment #CyberSecurityResearch #VulnerabilityAssessment #EthicalHacking
2 Dog Digital

373 followers
2mo
🐾 Is your business barking up the wrong cybersecurity tree? 🐾 Fake IT support sites are out there, and they pose a threat to your business. These cyber threats can compromise your data and take control of your systems, leaving you in a real doghouse. 🐶🔒 At 2 Dog Digital, we’ve sniffed out the best tips to keep your business secure. Don't let your guard down! Read our latest blog post to learn how to protect your small business and stay one step ahead of the cyber hounds. 🐕💻 🌐➡️ https://lnkd.in/ec7BBhxn

Protecting Your Business from Fake IT Support Sites

https://2dogdigital.com
Rahma ElGewely

Soc Analyst T2 @Securenass
5mo
#System Monitor (Sysmon)
Sysmon includes the following capabilities:
- Logs process creation with the full command line for both current and parent processes.
- Records the hash of process image files using SHA1 (the default), MD5, SHA256 or IMPHASH. Multiple hashes can be used at the same time.
- Includes a process GUID in process create events to allow correlation of events even when Windows reuses process IDs.
- Includes a session GUID in each event to allow correlation of events in the same logon session.
- Logs loading of drivers or DLLs with their signatures and hashes.
- Logs opens for raw read access of disks and volumes.
- Optionally logs network connections, including each connection’s source process, IP addresses, port numbers, hostnames and port names.
- Detects changes in file creation time to understand when a file was really created. Modification of file creation timestamps is a technique commonly used by malware to cover its tracks.
- Automatically reloads its configuration if it changes in the registry.
- Rule filtering to include or exclude certain events dynamically.
- Generates events from early in the boot process to capture activity by even sophisticated kernel-mode malware.
#List_of_Sysmon_Event_IDs_for_Threat_Hunting
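Computing several hashes "at the same time" is cheap because the file only needs to be read once, with each chunk fed to every hasher. This sketch uses Python's `hashlib`; the function names are illustrative, it is not how Sysmon itself is implemented, and IMPHASH (a hash over a PE file's import table) is omitted because it requires parsing the PE format.

```python
import hashlib
from typing import Dict, Iterable

DEFAULT_ALGORITHMS = ("sha1", "md5", "sha256")  # SHA1 listed first, as Sysmon's default


def hash_chunks(chunks: Iterable[bytes],
                algorithms: Iterable[str] = DEFAULT_ALGORITHMS) -> Dict[str, str]:
    """Feed every chunk to all hashers, so the data is read only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def hash_file(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Hash a file (e.g. a process image) with several algorithms in one pass."""
    with open(path, "rb") as f:
        return hash_chunks(iter(lambda: f.read(chunk_size), b""))
```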
1 Comment


More from this author

On addressing #Meltdown and #Spectre in future silicon...

Reflections on building RHEL for Arm

Happy 6th birthday Red Hat ARM Team (RHAT)!
