Feedback & Followups
- 🇺🇸 Kaspersky officially announce they are leaving the US – www.bleepingcomputer.com/…
- Google are not the only company losing the battle against malicious ads ATM: Facebook ads for Windows desktop themes push info-stealing malware – www.bleepingcomputer.com/…
- Watering hole attacks against developers continue (attackers lie in wait for their victims in places they naturally visit):
- A real-world example of why we advise users to get rid of unsupported devices, especially routers: Chinese APT40 hackers hijack SOHO routers to launch attacks – www.bleepingcomputer.com/…
- ATM, it seems Apple are winning the cat-and-mouse game against grey-hat cybersecurity companies like Cellebrite: Newer iPhones Running iOS 17.4 May Be Immune To Cellebrite, At Least For Now – www.macobserver.com/…
Deep Dive – The Confused CrowdStrike & Microsoft Kerfuffles
From an American perspective, two significant outages happened on the same night, and in case that wasn’t confusing enough, they were not entirely unconnected.
The First Outage – Microsoft Azure Central US
Things started to go wrong when Microsoft pushed a configuration change to one of the regions in their global cloud network, and it just happened to be the one serving much of the United States (Central US).
This change caused the VMs powering the region to respond to glitches in connectivity to their backend storage by rebooting rather than pausing, so the available compute capacity in the region plummeted and was no longer enough to meet demand.
This had two effects:
- Some of Microsoft’s first-party services in the region became overloaded; Teams & Xbox Live in particular seem to have generated a lot of complaints.
- Some of the services Microsoft sell to corporations became overloaded, particularly the Power Platform, which is used to power business logic in the cloud with serverless functions (really cool tech, actually).
Microsoft seem to have dealt with the first problem pretty quickly by migrating the Teams service for US customers to other regions, but that added latency and probably stressed those regions, so the service was likely sluggish for a while.
By the time I woke up in Ireland, the Teams issue looked to be under control, but the Power Platform was still orange on the service health dashboard.
For context, like the market leader, Amazon Web Services (AWS), Microsoft offer Azure services in multiple geographic regions, and when you provision something you choose not only a primary region but also the level of resilience you want to pay for. The scale starts at none and goes up to full geo-redundancy, with resources mirrored in different parts of the world.
American corporations using the Power Platform that chose to take the risk and save money with lesser resilience would have had problems running their business processes, causing outages of their own.
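To make the resilience trade-off concrete, here is a minimal Python sketch. It is not how Azure actually routes traffic: the `Deployment` type, the two `Resilience` tiers, the `serving_region` function, and the region names are illustrative assumptions, and real geo-redundancy also involves replicating data, which this ignores. It simply shows why a customer who opted out of redundancy had nowhere to go when Central US lost capacity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Resilience(Enum):
    NONE = "none"          # single region: if it goes down, so do you
    GEO_REDUNDANT = "geo"  # resources mirrored to a second region elsewhere


@dataclass
class Deployment:
    primary: str
    secondary: Optional[str]
    resilience: Resilience


def serving_region(dep: Deployment, healthy: set[str]) -> Optional[str]:
    """Return the region that should serve traffic, or None if the service is down."""
    if dep.primary in healthy:
        return dep.primary
    # Only a geo-redundant deployment has a second region to fail over to.
    if dep.resilience is Resilience.GEO_REDUNDANT and dep.secondary in healthy:
        return dep.secondary
    return None


# Hypothetical scenario: Central US has lost capacity, other regions are fine.
healthy_regions = {"East US", "West US"}
budget = Deployment("Central US", None, Resilience.NONE)
resilient = Deployment("Central US", "East US", Resilience.GEO_REDUNDANT)

print(serving_region(budget, healthy_regions))     # None
print(serving_region(resilient, healthy_regions))  # East US
```

The cheaper deployment is not broken by design, it just has no second option, which is exactly the risk its owner accepted to save money.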
The Second Problem – The Bad CrowdStrike Update
As morning dawned on the other side of the world, a new problem emerged. Some Australians and New Zealanders arriving at their offices found their Windows PCs & servers stuck on Blue Screens of Death, and reboots didn’t help – MEEP!
It wasn’t all Windows computers, just some, and after some initial confusion the pattern soon became clear: it was Windows devices protected by the enterprise AV product Falcon from the very well-regarded cybersecurity experts CrowdStrike.
Falcon is a cloud-first, real-time, AI-driven AV product that uses lightweight local agents, which stream their telemetry up to the cloud and get high-frequency updates pushed down to keep the protection as current as possible. Because all the agents stream their data to the cloud in real time, CrowdStrike can use AI to learn about attacks as they happen and quickly send rules to all their other clients, theoretically nipping even novel attacks in the bud.
You can see why this product is popular with large enterprise customers: unlike more traditional AV, which is great at protecting against known threats and very poor at protecting against newly emerging ones, this is architected to give good protection against even the most novel attacks. Novel attacks first emerge against valuable targets, so the bigger a company is, the more appealing a product like Falcon looks!
A subtle but important point to note is that there is a trade-off here. All updates need testing before they go out, but that adds lag to the process, and the whole point is that the system should be really reactive. The way you balance this is with a massive bank of virtual machines running a wide array of tests in an entirely automated way. In theory, your test suite should cover every possible configuration in use in the real world, but it simply can’t, so there will be gaps.
These kinds of systems tend to follow a power-law distribution, so small errors affecting a few customers are massively more probable than big errors affecting lots of customers, but sometimes you get unlucky!
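A little arithmetic shows just how lopsided a power law is. In this Python sketch the exponent `ALPHA = 2` is an assumption picked purely for illustration, not a figure anyone has measured for CrowdStrike's updates; the model says the probability of a bug affecting k customers is proportional to k raised to the power of minus alpha.

```python
# Hypothetical model: P(a bad update affects k customers) is proportional to k ** -ALPHA.
ALPHA = 2.0  # assumed exponent, chosen only for illustration


def relative_likelihood(small: int, large: int, alpha: float = ALPHA) -> float:
    """How many times more likely a bug hitting `small` customers is than one hitting `large`."""
    return (large / small) ** alpha


def probabilities(max_k: int, alpha: float = ALPHA) -> list[float]:
    """Normalised probabilities for k = 1..max_k affected customers."""
    weights = [k ** -alpha for k in range(1, max_k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


print(relative_likelihood(1, 1000))  # 1000000.0
probs = probabilities(10_000)
print(f"P(exactly 1 customer affected) = {probs[0]:.2f}")
print(f"P(1,000 or more affected)      = {sum(probs[999:]):.6f}")
```

Under that assumed exponent, a bug hitting exactly one customer is a million times more likely than one hitting exactly a thousand, yet the probability of a huge event never reaches zero, which is the "sometimes you get unlucky" part.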
At this stage, I don’t think we know enough to understand how something that affected so many customers got through testing, but one worrying piece of anecdata is that this is not the first time this year an entire OS family seems to have been affected: there were bugs crashing two different flavours of Linux earlier this year. They just didn’t get the same kind of press because they didn’t have the same scale of impact.
Why is Recovery so Slow?
CrowdStrike figured out the root cause pretty quickly, and they revoked the problem update, but that only stops more machines from being knocked out, it does nothing to bring the dead machines back!
To compound the problem, it can’t be fixed remotely or automatically, because the fix is to boot the device into Safe Mode, delete a single file, and then reboot. In a corporate environment most users don’t have local administrator rights on their PCs, so they literally can’t fix the problem themselves; they have to wait for someone from IT to physically restore their device.
Having said that, servers should be easier to restore because most are virtual these days, and any company run in even a reasonably responsible manner will have daily, if not hourly, snapshot backups they can roll back to. But an office with servers and no PCs is still not a very functional place!
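For the curious, the manual fix for PCs described above amounts to very little logic. This Python sketch encodes the directory and the `C-00000291*.sys` filename pattern from CrowdStrike's public workaround guidance; the function name and the `dry_run` safeguard are hypothetical additions for illustration, and in reality technicians did this by hand from Safe Mode rather than by running a script.

```python
from pathlib import Path

# Directory and filename pattern from CrowdStrike's published workaround;
# adjust the drive letter if Windows is not installed on C:.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
BAD_CHANNEL_FILE = "C-00000291*.sys"


def remove_bad_channel_files(driver_dir: Path = DRIVER_DIR, dry_run: bool = True) -> list[Path]:
    """Find the faulty channel file(s) and, unless dry_run is set, delete them.

    Returns the matching paths either way. Deleting them needs administrator
    rights, which is exactly what most corporate users don't have.
    """
    matches = sorted(driver_dir.glob(BAD_CHANNEL_FILE))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

The code is trivial; the hard part was needing admin rights and hands on every affected machine.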
One Final Connection Back to Microsoft
In case there wasn’t already enough confusion between Microsoft’s part in the day’s drama and CrowdStrike’s, one of the services Microsoft sell to enterprise customers is virtual desktop PCs (Windows 365). You run your actual work PC in the cloud and use a thin client to access it from anywhere, even a web browser. Companies manage these virtual PCs as if they were physical, so they push out AV tools to them like they would to any other PC, including Falcon in some cases. As a result, Microsoft reported that many of their cloud desktops also got stuck in infinite reboot loops because of the CrowdStrike bug.
A Sting in the Tail – Cybercriminals try to Cash In
As always happens when something nasty makes headlines, cybercriminals are targeting companies with fake ‘fixes’ from CrowdStrike that are actually malware 🙁
This is a timely reminder that the same dynamic is in play whenever there is any kind of bad news, be it a natural disaster, an accident, or a war: baddies will try to exploit the situation for profit.
Can we Learn any Lessons from all this?
Let’s start easy: does the Azure region outage teach us anything? To be honest, nothing new. We’ve seen this before with Amazon, Google, and Microsoft cloud services; it doesn’t happen often, but entire regions do sometimes go down. This is why all these providers offer resiliency as a feature.
When companies choose to accept a higher risk of failure to save money, the risk is real.
Moving on to the CrowdStrike event, I don’t see a clear-cut answer.
You might assume the lesson is not to rely on one vendor for all your AV, but that’s a terrible idea. To have any chance of running an effective cybersecurity operation you need a unified platform. Yes, having all your eggs in one basket is a risk, but having a total hodge-podge is actually worse. Instead of a low risk of a really spectacular outage, you’ll suffer lots of smaller incidents very frequently, and you’ll struggle to contain them. Your cybersecurity team will spend all their time firefighting and filing breach reports, and your reputation will suffer. Better to have a small chance of being one of many, many companies affected at the same time, when everyone knows it’s not your fault but the vendor’s!
You might assume CrowdStrike must be some kind of fly-by-night operation, but they are extremely well respected. The reason so many big companies use them is that they are one of the best, and that’s a reputation they’ve earned over many years of hard work.
I’m a little concerned that they seem to have had warnings a few months back that their testing systems were leaky, so it’s possible they deserve some criticism for not reacting to those warnings better, but it’s equally possible they are busy re-architecting things behind the scenes and that changes are already in the pipeline. We have far too little information today to draw any conclusions about whether or not CrowdStrike were in some way negligent. Expect to learn much more in the future, because it seems inevitable that CrowdStrike will need to publish a detailed incident report once they’ve had time to gather all the facts and do the analysis needed to engineer an appropriate response.
For now, my advice is to ignore anyone who tells you that the blame for this is in any way clear. That’s a sign of someone who just doesn’t get that this is a tradeoff all the way down:
- You need a rapid response, and you need testing; the more you test, the slower your response
- You need a single cybersecurity platform to be able to run an effective operation, but that makes you vulnerable to a catastrophic failure
Maybe this is a good argument for allowing your users to choose their end-user OS as long as it’s supported by your cybersecurity platform, and you allow your sysadmins to use multiple server solutions as long as they too are supported by your platform.
Links
- Major Microsoft 365 outage caused by Azure configuration change – www.bleepingcomputer.com/…
- Microsoft confirms CrowdStrike update also hit Windows 365 PCs – www.bleepingcomputer.com/…
- Context: CrowdStrike’s review on Gartner – www.gartner.com/… (~4.8 out of five on all metrics)
- There were some humorous responses:
- XKCD released an unusually topical comic: CrowdStrike (2961) – xkcd.com/…
❗ Action Alerts
- Patch Tuesday: Microsoft July 2024 Patch Tuesday fixes 142 flaws, 4 zero-days – www.bleepingcomputer.com/…
- Netgear warns users to patch auth bypass, XSS router flaws – www.bleepingcomputer.com/… (impacts popular gaming routers like the Nighthawk series)
Worthy Warnings
- AT&T leaked the call & SMS metadata for nearly all of their customers, and hence for everyone their customers called or messaged, between the 1st of May and the 31st of October 2022, and on the 2nd of January 2023 – krebsonsecurity.com/… (part of the Snowflake breach)
- 🇺🇸 Rite Aid says June data breach impacts 2.2 million people – www.bleepingcomputer.com/…
“This data included purchaser name, address, date of birth and driver’s license number or other form of government-issued ID presented at the time of a purchase between June 6, 2017, and July 30, 2018.”
- Three breaches expose users to automated targeted phishing:
- Email addresses of 15 million Trello users leaked on hacking forum – www.bleepingcomputer.com/…
- Over 400,000 Life360 user phone numbers leaked via unsecured API – www.bleepingcomputer.com/…
- If you ever bought anything from Zotac (very popular in the PC gaming sphere), beware that they accidentally exposed all their RMA information to search engines – www.bleepingcomputer.com/…
- Beware, there is a Smishing (SMS-based phishing) attack targeting Apple IDs; the messages take victims to a fake iCloud login page – www.macobserver.com/…
- Related: Apple have updated their support document on avoiding being phished: Recognise and avoid social engineering schemes, including phishing messages, phoney support calls and other scams – support.apple.com/… (worth bookmarking for sharing with friends & family as needed)
- Advice from Bart: remember, when entering details on a web page, always look up and check the address bar
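Bart's address-bar advice boils down to one question: is the hostname the one you expect? This Python sketch, using only the standard library's `urllib.parse.urlsplit`, shows the check a careful human makes by eye. The `is_genuine` helper is hypothetical and deliberately simple (it ignores punycode look-alikes, for example), so treat it as an illustration of the habit rather than a phishing filter.

```python
from urllib.parse import urlsplit


def is_genuine(url: str, expected_domain: str) -> bool:
    """True only if the URL uses HTTPS and its hostname is expected_domain or a subdomain of it."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".").lower()
    domain = expected_domain.lower()
    return parts.scheme == "https" and (host == domain or host.endswith("." + domain))


print(is_genuine("https://www.icloud.com/", "icloud.com"))                     # True
# Starts with the brand, but the real domain is account-verify.example:
print(is_genuine("https://icloud.com.account-verify.example/", "icloud.com"))  # False
# Everything before the @ is a username; the real host is evil.example:
print(is_genuine("https://icloud.com@evil.example/", "icloud.com"))            # False
```

The tricks in the last two examples are exactly why it pays to read the whole address rather than glance at its start.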
Notable News
- 🇪🇺 X (formerly Twitter) joins the ranks of companies with preliminary findings against them for breaking the EU Digital Services Act (DSA) – ec.europa.eu/… (Digital Services, not Digital Markets!)
- Complaints revolve around the Blue checkmark being misleading, the absence of required advertisement transparency reporting, and the lack of data access for researchers.
- Remember, preliminary findings are official accusations, not convictions; the company now gets to offer a defence
- Google have been caught with their fingers in the proverbial cookie jar, though in a surprisingly open way: Google Chrome, Along With Other Popular Chromium Browsers, Grants System Monitoring Privileges to *.google.com Domains – daringfireball.net/…
- Google have made their Advanced Protection program for at-risk people a little more accessible by allowing users to choose passkeys rather than requiring hardware FIDO tokens – www.bleepingcomputer.com/…
- After downplaying the weakness for years, Signal have agreed to start encrypting local copies of chats in their desktop apps, using OS-level key stores to securely store the keys (e.g. the Keychain on Macs) – www.bleepingcomputer.com/…
- MacPaw have previewed technology they have developed for real-time on-device phishing detection that promises to be a lot more effective than our existing block-listing approach – appleinsider.com/…
- Making use of the AI hardware on modern chips, they use on-device AI to pre-load link destinations in the background and check if they imitate known brands
- This was presented at a research conference, it was not a product demo, so we don’t know how or when we’ll get to purchase this, but it looks very promising
- Two nice cybersecurity-related announcements from Microsoft:
- Windows Updates will be evolving to give smaller downloads and make installs more robust: Microsoft announces new Windows ‘checkpoint’ cumulative updates – www.bleepingcomputer.com/…
- All versions of Exchange Online (free services like Hotmail as well as paid offerings like Office365) will support DNSSEC+DANE for inbound email validation – www.bleepingcomputer.com/…
- 🇸🇬 Singapore leads the way, and hopefully many other countries will soon follow: Banks in Singapore to phase out one-time passwords in 3 months – www.bleepingcomputer.com/… (only phishing-resistant MFA is acceptable now: no more codes users have to type in, whether they come via SMS or an authenticator app)
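Why are typed codes the weak point? This Python sketch implements standard RFC 6238 TOTP (HMAC-SHA1, 30-second steps), the algorithm most authenticator apps use; the `totp` function name and its parameters are my illustrative choices, not any bank's code. Notice its only inputs are a shared secret and the clock. Nothing in the calculation identifies the website, so a code typed into a convincing phishing page is just as valid when the attacker relays it to the real bank within the time window. SMS codes are generated differently but share the same flaw.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 time-based one-time password using HMAC-SHA1."""
    now = time.time() if at is None else at
    counter = int(now // step)                      # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                         # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T = 59 seconds.
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

Passkeys avoid this because the browser only offers a passkey to the site it was created for, and the signed response includes the site's origin, so a look-alike domain gets nothing it can replay.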
Palate Cleansers
- From Allison:
- 🎧 An excellent interview with TikTok creator Sanjana Curtis: Clear+Vivid with Alan Alda: Sanjana Curtis- Sprinkling Stardust on TikTok – overcast.fm/…
- 🎦 Sanjana’s Stardust series of videos on astrophysics – www.tiktok.com/…
- From Bart:
- A fascinating (and possibly nostalgic) long read from FastCompany: What the internet looked like in 1994, according to 15 webpages born that year – www.fastcompany.com/…
- 🎦 A video of Steve Jobs speaking to the 1983 International Design Conference – tidbits.com/… & stevejobsarchive.com/…
Legend
When the textual description of a link is part of the link, it is the title of the page being linked to; when the text describing a link is not part of the link, it is a description written by Bart.
Emoji | Meaning |
---|---|
🎧 | A link to audio content, probably a podcast. |
❗ | A call to action. |
flag | The story is particularly relevant to people living in a specific country, or, the organisation the story is about is affiliated with the government of a specific country. |
📊 | A link to graphical content, probably a chart, graph, or diagram. |
🧯 | A story that has been over-hyped in the media, or, “no need to light your hair on fire” 🙂 |
💵 | A link to an article behind a paywall. |
📌 | A pinned story, i.e. one to keep an eye on that’s likely to develop into something significant in the future. |
🎩 | A tip of the hat to thank a member of the community for bringing the story to our attention. |