By Lee Brotherston, Founding Engineer, Opshelm
If you talk to anyone in the defensive information security profession who has been doing this for more than a few years, you will recognize that we share many things in common; one big thing is that we’re all exhausted. Defensive security teams are tasked with a mind-boggling variety of responsibilities, told that we have to prevent breaches and incidents that would put the company in the news at all costs, but also often told, “Oh, and don’t get in anyone’s way or do anything that might interfere with sales/marketing/product development/the CEO’s pet project.” We’re all overwhelmed and under-resourced, and we are constantly looking for new tools that can help us make better progress on our goals within the limitations that we’ve been given.
Unfortunately, our options there are often pretty limited. The push toward AI and ML as the saviors of the tech industry has left us with noisy tools that are difficult to interpret, with results we can’t trust. The focus on near-zero false negatives (no missed alerts) from these and other tools makes them so noisy and full of false positives that we can’t take any action without extensive investigation and research by the security team, increasing the workload that we were hoping to decrease.
The deceptive allure of AI
Having spent many years on the security side of various companies, I can tell you from first-hand experience how difficult it is to trust security solutions. As a member of the security team, the responsibility to keep your systems functioning falls mostly on your shoulders. How are you supposed to know which security solution to trust when vendors are never all that interested in explaining how their product actually works? It’s always: “Trust us, trust the black box, it does — insert AI magic here.” AI and anomaly detection are great buzzwords to justify a higher price tag and let a vendor’s sales team claim that, of course, the tool can detect X, Y, or Z, but letting them form the core of your security tooling is a recipe for disaster.
For example, I had the pleasure of working with a network security tool that used full packet capture devices to collect packets from a number of choke points, analyze them for security events, and output alerts to the security operations team. This tool provided a pretty graphical representation of the traffic it was analyzing (another single pane of glass to check) along with alerts. However, because it was apparently using a proprietary, heuristic, machine learning, super secret, anomaly detection engine (ahem), the reason for the alerts, and sometimes their source, was opaque to the security team. Alerts would arrive declaring that “something bad” had occurred “somewhere”, and here was the offending packet.
Of course, this is not actionable information without further context. Is this alert valid? How can we tell if it’s valid without knowing how it was derived? If this determination cannot be made, do we blindly trust the alert and risk wasting time and resources on tracking it down, or do we discard it and risk failing to act on a valid security event? Can we even do anything?
So, if security doesn’t have the time to investigate every one of these, why don’t they just pass them on to the responsible teams to review and either justify or act upon? Here’s why: security teams tend to have a finite amount of social capital to spend with other internal teams, which they draw down each time they involve others in something which turns out to be a false positive. It is rare that a security team can repeatedly ask other teams to investigate a false positive without earning a reputation. It has been my experience that security teams need to be careful with spending their social capital, which means only raising the alarm once they have high-quality, actionable information to share.
Returning to the vendor in question, their overall approach to alert opacity could be summarized as “trust us, we know best”. It’s entirely possible that they did, in fact, know best with regard to whether an alert was valid or not. However, this opacity forces the team tasked with triaging such alerts to expend time determining the cause of the issue, the location of the issue, and sometimes whether there even is an issue. When a security alert is received, the team’s ability to quickly and easily triage it for false positives and make a determination on its potential impact makes a massive difference in their ability to respond effectively.
While this decision by the vendor’s legal and marketing teams does make some business sense, it has some unpleasant side effects for their customers. Often the desire that we saw from this vendor to protect intellectual property (or sometimes to obfuscate a product that does not completely work) inadvertently leads to a product that customers feel they cannot trust. On top of that, “we trust the third-party closed system” is not an answer that satisfies regulatory compliance requirements or customers.
Going beyond alerts
What’s more, even if you do trust these companies to solve your problems in a black box, they often are just finding problems; the AI magic doesn’t actually solve anything. The security engineer gets a list of alerts in their inbox, and it is now their job to either identify and fix the underlying cause or to convince another team to do so. The vendors know that their alerts can’t be relied upon, so they stop there and leave everything else up to you. This creates loads of extra work and stress for the security team.
What that means on the security side is that you spend a lot of time testing and assessing which solutions you can trust and which will actually reduce your workload. For many years I searched for a tool that openly and transparently tells me how it works and how it’s going to optimize my system, my controls, and my time.
I have worked with multiple solutions over the years, which, in theory, attempt to do just that: provide optimization of systems, controls, and processes. These solutions end up attempting to ensure a security event would never be missed and create alerts for everything, whether it is relevant to the environment it is protecting or not. For example, one network intrusion detection solution I have worked with, in an effort to never miss an event, had rules which were as simple as a TCP connection to a specific port. Why? Because a remote access tool was known to run on that port, and the vendor believed they could not risk failing to alert on a connection to a remote access tool.
The quality of the rule, and hence the alert, was so low that any connection to that port could trigger the alert, and did, regularly. The result of this is not some kind of utopian security “full spectrum visibility”, but rather a deluge of alerts that frustrates the security operations team, uses up valuable resources, and rapidly earns the platform a reputation for generating false positives. Alert fatigue leads to teams ignoring, or at least deprioritizing, alerts from that system. This quite famously and publicly happened to Target in 2014 (https://www.pcmag.com/news/target-ignored-data-breach-warning-signs) when their vendor created a valid alert for a security event which led to a breach, but it “did not warrant immediate follow-up”. This is not to say that a vendor has to be infallible. However, the quality and fidelity of alerts should be of a net benefit to the team, not a source of additional work.
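To make the problem concrete, here is a minimal sketch of the kind of port-only rule described above, next to one way it might be tightened. Everything in it is hypothetical: the port number, the handshake marker, and the `Connection` type are illustrative assumptions, not details of any real product or remote access tool.

```python
from dataclasses import dataclass

# Hypothetical port; the article does not name the real one.
RAT_PORT = 31337
# Hypothetical handshake marker, invented purely for illustration.
RAT_HANDSHAKE = b"RAT-HELLO"


@dataclass(frozen=True)
class Connection:
    src_ip: str
    dst_ip: str
    dst_port: int
    payload_prefix: bytes  # first bytes of the TCP payload, if any


def port_only_rule(conn: Connection) -> bool:
    """Low-fidelity rule: fires on any TCP connection to the port."""
    return conn.dst_port == RAT_PORT


def port_and_payload_rule(conn: Connection) -> bool:
    """Tighter rule: also requires the (assumed) tool handshake."""
    return (
        conn.dst_port == RAT_PORT
        and conn.payload_prefix.startswith(RAT_HANDSHAKE)
    )


connections = [
    Connection("10.0.0.5", "10.0.1.9", RAT_PORT, b"GET / HTTP/1.1"),
    Connection("10.0.0.6", "10.0.1.9", RAT_PORT, b""),
    Connection("10.0.0.7", "10.0.1.9", RAT_PORT, b"RAT-HELLO v2"),
    Connection("10.0.0.8", "10.0.1.9", 443, b"\x16\x03\x01"),
]

port_only_alerts = [c for c in connections if port_only_rule(c)]
refined_alerts = [c for c in connections if port_and_payload_rule(c)]

print(len(port_only_alerts), "alerts from the port-only rule")
print(len(refined_alerts), "alerts from the port-and-payload rule")
```

In this toy data set the port-only rule fires on every connection to the port, including ordinary web traffic, while the tighter rule fires only on the connection that actually looks like the tool. A real rule would need far more context than a payload prefix, but the trade-off has the same shape.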
So what is the solution? A security platform that finds, analyzes, and fixes exposed misconfigurations in your system without asking anything from the security engineer. A platform that continuously ingests cloud log events and periodically scans your environment to map and understand its current configuration. As changes come in, they can be reviewed against predefined lists of rules or even custom rules. Sure, the security engineer can go back and look at what’s been done and even undo it if they find it necessary. But the work is done for them, and that is the value of automated remediation.
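The loop described above (a change arrives, it is checked against rules, a fix is applied, and the fix can be reviewed or undone) could be sketched roughly as follows. This is an illustrative assumption about the general pattern, not a description of how any particular platform is implemented; the `Rule` and `Remediator` names, the rule names, and the configuration keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Config = dict[str, object]


@dataclass
class Rule:
    name: str
    violates: Callable[[Config], bool]  # True if the config breaks the rule
    fix: Callable[[Config], Config]     # returns a corrected copy


@dataclass
class Remediator:
    rules: list[Rule]
    # Each entry: (resource id, rule name, config before the fix).
    history: list[tuple[str, str, Config]] = field(default_factory=list)

    def handle_change(self, resource_id: str, config: Config) -> Config:
        """Check a changed config against every rule and fix violations."""
        for rule in self.rules:
            if rule.violates(config):
                # Record the prior state so the fix can be reviewed or undone.
                self.history.append((resource_id, rule.name, dict(config)))
                config = rule.fix(config)
        return config

    def undo_last(self) -> tuple[str, Config]:
        """Return the resource id and the config as it was before the last fix."""
        resource_id, _rule_name, previous = self.history.pop()
        return resource_id, previous


# A "predefined" rule and a "custom" rule, both hypothetical.
no_public_storage = Rule(
    name="no-public-storage",
    violates=lambda c: c.get("public") is True,
    fix=lambda c: {**c, "public": False},
)
require_encryption = Rule(
    name="require-encryption",
    violates=lambda c: not c.get("encrypted", False),
    fix=lambda c: {**c, "encrypted": True},
)

remediator = Remediator(rules=[no_public_storage, require_encryption])
fixed = remediator.handle_change(
    "bucket-1", {"public": True, "encrypted": False}
)
print(fixed)
print([entry[1] for entry in remediator.history])
```

Keeping the pre-fix configuration in `history` is what makes the "go back and look, or undo" step possible: every automated change leaves a record of which rule fired and what the resource looked like beforehand.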
Lee Brotherston is the Founding Engineer of Opshelm. He is a seasoned security leader with decades of experience at all levels of security, and is the co-author of the hugely successful O’Reilly “Defensive Security Handbook.” With a knack for security research (most notably research into TLS Fingerprinting techniques), Lee is regularly invited to speak at security conferences like B-sides, BlackHat, and RSA.