Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[jschauma@netmeister.org] [@jschauma]

Defense at Scale

January 16th, 2016

The following is an approximate transcript of the talk I gave today at BSidesNYC 2016. The slides are also available here and on slideshare.


Last year, my colleague Chris Rohlf gave a keynote at BSidesNOLA entitled "Offense at Scale". Offense sounds fun. Pwn all the things. And you're always going to win! And normally I'm a big fan of being massively offensive. Unfortunately, I find myself on the defense when it comes to information security.

So I figured I'd pitch a talk on "Defense at Scale". And since that's a fairly easy topic, we can wrap it up in just two slides.

First of all, are you on Twitter? If so, cut that shit out. You have better things to do with your time. How are you going to defend your infrastructure if you're dicking around social media all day?

Also: follow me on Twitter.

Here's how you defend at scale. Can't be done. The end. Everything's fucked. You're pwned.

And this concludes my talk. Thank you for your time, you were great. Try the veal!

Did you know that infosec people can be a bit cynical at times? True story. This talk might as well be entitled "This shit does not scale." But let's pretend for a moment that not everything is completely broken. Or that even though we know everything is completely broken we somehow still haven't lost our motivation to fix things. (Because somebody has to.)

But scale warps everything, and the ways in which scale affects attackers necessarily must also influence how we defend our systems and infrastructure. And in doing so, large scale can quickly help identify which (defensive) tools, techniques and tactics are effective -- and which ones are a waste of time, money, and effort. These lessons can then also be applied at small scale, helping you grow.

I believe that this tweet really hits it on the head. The one thing that matters when trying to defend large scale systems is reducing your attack surface. There are many other things you can do, but this should be your overall goal, which is sometimes easy to lose sight of.

But let us begin by defining what 'scale' is.

"But is it webscale?"

"At scale" is one of those words used a lot on the internet. What do we mean by "at scale"? Normally: "a fuckton". "A fuckton" of users. "A fuckton" of systems. "A fuckton" of connections, and "a fuckton" of packets going across "a fuckton" of wires.

At small scale, you may still have the illusion that you can actually secure your infrastructure. You and I know that you can't, but at small scale, you want to believe. But even though small scale has a significantly different attack surface and thus requires a different focus, a different threat model, it's important to take in the lessons from large scale, since you want to lay the foundation for scalable security practices early.

When you're a startup, you may think that "a fuckton of microsystems in the cloud" means "at scale" -- it does not. "At scale" is when you are the cloud, when you run your own private cloud. And weird things happen "at scale". Large scale gets you well acquainted with Murphy's Law. One in a million is next Tuesday.
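
To put a rough number on "one in a million is next Tuesday", here is a back-of-the-envelope sketch. The daily event volumes are made up purely for illustration, and the calculation assumes events are independent, which real systems rarely guarantee.

```python
def expected_rare_events(events_per_day: float, probability: float) -> float:
    """Expected number of times a rare event fires per day."""
    return events_per_day * probability


def chance_of_at_least_one(events_per_day: int, probability: float) -> float:
    """Probability the rare event happens at least once in a day,
    assuming every event is independent."""
    return 1.0 - (1.0 - probability) ** events_per_day


ONE_IN_A_MILLION = 1e-6

# Hypothetical volumes, chosen only to contrast small and large scale.
VOLUMES = {
    "small shop": 10_000,          # events per day
    "large scale": 1_000_000_000,  # events per day
}

for label, volume in VOLUMES.items():
    expected = expected_rare_events(volume, ONE_IN_A_MILLION)
    chance = chance_of_at_least_one(volume, ONE_IN_A_MILLION)
    print(f"{label}: expect {expected:g} per day, P(at least one) = {chance:.4f}")
```

At the small volume the event is a once-in-a-few-months curiosity; at the large volume it is a daily certainty, many times over.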

Large scale means that you have a lot of everything.

A lot of employees. Who download malware, get pwned by script kiddies, actually get targeted by the NSA, who will get phished, attacked, bribed, compromised; employees who come, change roles, and go -- and sometimes come back. Employees who use the internet with Flash enabled, also known as "barebacking".

A lot of machines housing a lot of VMs and containers. (With a lot of hardware components, using a lot of firmware, being shipped to you by a lot of vendors.)

And with a lot of hardware, and if you're of a certain age, you also accumulate a fuckton of baggage, cruft, tech debt:

It's worth noting that this system upon which I stumbled the other day is almost older than the social media service on which I bragged about it. (But seriously, shut that fucker down, what's wrong with you?)

At large scale, you have a lot of physical locations. With a lot of different security controls, mechanisms, physical protections, in a lot of jurisdictions with a lot of laws. A lot of data travelling across a lot of networks across a lot of jurisdictions, which you are now subject to.

And on top of all those many systems, VMs, and containers, you run a lot of different operating systems and versions with a fuckton of different pieces of software and libraries, all with a shitload of vulnerabilities.

In a word, you have a fuckton of attack surface.

You also have a lot of users (doing a lot of unexpected things), handing you a lot of their private data. (Access to the realtime stream of user updates at Twitter was called the 'Firehose'. Trying to process that data felt a lot like this.)

And having all this data and all these users also gets you a lot of attention. Attention from all sorts of attackers, ranging from competition, to script kiddies, to the various black hats, to government agencies and organized crime. We know that most attackers are resource bound, but at scale, you also do have those who are not. And it's important to acknowledge this: if you're big enough, then capable adversaries will have an incentive to go after you, both passively as well as actively. You and your employees become targets.

(There's a reason my official job title is 'Paranoid'.)

Here's what you do not have:

You do not have a fuckton of staff. You do not have a shitload of budget.

I don't care where you work, I'd bet that you're not appropriately staffed. And this is where we're getting somewhere: you're never going to have enough people to do all the work you want to do.

Suppose you had all the staff and budget you currently think is missing. How would that change what you're focusing on? I submit that you'd still be understaffed. Which means that you have to prioritize and figure out what's important.

Ok, let's take a quick detour here: Who here has heard of DevOps? DevOps has drastically changed how organizations develop, deploy, and maintain software. I'll get back to some of the concepts such as continuous deployment and continuous integration in a moment, but how did the emergence of DevOps change the game? By the principle of Skin in the Game.

Developers, who used to toss a WAR file over the fence to Ops to deploy, suddenly get paged when things break. Ops were empowered to fix their stack -- and got the responsibility for doing so. The teams began working together.

This is what we need to do. All too often, the security team is tossing a Proof-of-Concept over the wall to developers and ops to fix their stack.

Who triages your Bug Bounty intake? Who analyzes any possible exploits and builds Proofs of Concept and digs into the libraries and frameworks? Who spends most of their time, effort, energy on resolving these issues? The one that repeatedly ships code with code injection paths and persistent XSS?

Do your developers and ops people have skin in the game of security?

We need to get everybody in the company to take responsibility for the security of the code you ship, the infrastructure you build. You know what is a really good motivator in large organizations? Money.

Which department pays out the bug bounty?

If you're a CISO or Director of Security, consider billing other departments for security failures repeatedly introduced. Make it clear that your team is there to help: your team can educate the others, can build secure libraries, help you design secure infrastructure, but everybody else needs to have some skin in the game.

Security is not going to be prioritized if it remains a cost center paid for outside of your organization.

You're low on staff and low on budget. You'll always be. You will never be able to do all the things you think are important. You'll always have to make difficult choices. Prioritize.

But don't just prioritize based on gut instinct. Measure your impact, measure your outcomes. Focus on what makes a difference. Accept what you cannot change (right now).

Remember as well that most of the time you do not need to accomplish perfect security -- you can't anyway -- but that it's frequently sufficient to just raise the cost for the attacker. You may not be able to defend against NSA's Tailored Access Operations, but if you can force them to burn an 0-day because you made passive surveillance impossible, then you've significantly increased the security of the majority of your users.

We know it's impossible to achieve a 100% secure state. We cannot possibly defend against all possible threats all the time. Can't be done. Which, of course, means that we have to figure out what we're throwing our weight behind. We have to prioritize. We also have to prioritize, because there's one thing that simply won't scale: Fucks.

And Infosec is hard on you here. You can't fix everything, and if you try, you'll soon find yourself out of fucks. Sometimes you simply have to very consciously and specifically not give a fuck. Cause otherwise, you'll suddenly find yourself exhausted, burned out, unable to give even just a minor fuck.

Prioritize. Do what matters.

In the meantime, how do we currently defend?

"Prevent, Detect, React"; somehow this was translated to "Firewalls, Alerting, Patching". What the fuck? Appliances? SIEM? I almost literally can't even. Almost. But don't you dare say "vulnerability management". 'Cause that shit does not scale.

Oh, you have a tool that tracks CVEs and opens tickets for each new vulnerability and then you correlate those tickets with your package inventory and you identify vulnerable hosts and then you open tickets with the teams who own the software stack and they have to update their machines? How's that working for you?

Yes, I rant. Because I wrote crap like that. Seriously. In come CVEs, out come Jira tickets. The whole thing. Bad idea. Won't scale. I repent.

Here's why this shit won't scale. Can't scale!

Vulnerabilities are not sparse.

Creating a lot of busy work for all your internal teams to upgrade hundreds of thousands of systems based on your mapping of thousands of ill-categorized vulnerabilities is not an efficient way to reduce your attack surface if vulnerabilities are plentiful. It's a laborious, futile, and purely reactive approach.

And it burns your own political capital as well as, exponentially, the fucks your partners can possibly give. Every time you bug another team to upgrade a package on their systems, you are burning their fucks. Remember, fucks are limited!

This is how many departments and engineers view the security team.

Anybody here a fan of taking their shoes off at the airport? Why is it that we ridicule the TSA, but use the same reactive approach when it comes to software?

Taking off our shoes at the airport has not in any way reduced their attack surface in a meaningful manner. The same goes for individually patched vulnerabilities. Don't be the TSA.

So I've had this rule for a while: when you think about whether or not you should do something, think about whether or not it makes things worse; if so, stop doing it. If not, think about whether or not it makes things better; if not, don't do it.

Use the resources large scale offers you. Focus on the big things, on how to make a difference. Don't just look a quarter ahead, or a year. Define goals for where you want to be in five years. Small scale environments don't have the luxury to think long term -- you do.

The only way to make a meaningful difference, to reduce the attack surface without negatively impacting your productivity here, is by automatically updating all software regularly. No human interaction necessary. Just think how much we've increased end-user security in the last few years by moving our browsers to auto-update. We need to get there on the server.

It's important to remember that large scale also has significant advantages: large scale environments are more likely to have the resources to build the significant infrastructure components needed to regularly and automatically deploy updates across the board. However, I believe that we still have a long way to go before our servers and services are at a point even close to where our desktop OS and mobile applications are with their automatically and regularly applied updates.

Think big. Think long term. Consider how Microsoft turned itself around from being the butt of all jokes to running a tight secure software development program. This didn't come about overnight; it took them around 10 years. But at large scale, you can have a 10 year vision and execute on it.

We suck at 'prevent'. We suck, because all we do is 'react'. We're chasing individual bugs. But vulnerabilities are dense. If you fix an SQLi on service X, you have not reduced your attack surface in a meaningful way. Again, this shit doesn't scale. Instead, you should focus on fixing not the individual bug, but the entire class of vulnerability. Don't fix an SQLi in framework F; fix all code/command injection across all frameworks.
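
As a rough illustration of "fix the class, not the bug": instead of patching one vulnerable query, a shared data-access helper can make the unsafe pattern unavailable. This sketch uses Python's standard `sqlite3` module and bound parameters; the `users` table and the `find_user` helper are hypothetical, and it only covers SQL injection, one member of the broader injection class.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up users by name using a bound parameter, never string formatting."""
    # The "?" placeholder makes the driver treat `name` strictly as data,
    # so quotes inside it cannot change the structure of the query.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


def demo() -> tuple[list[tuple], list[tuple]]:
    """Run a legitimate lookup and a classic injection attempt side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    legit = find_user(conn, "alice")
    # With string concatenation this input would match every row;
    # as a bound parameter it is just an odd name that matches nothing.
    injected = find_user(conn, "alice' OR '1'='1")
    conn.close()
    return legit, injected
```

If every service has to go through helpers like this one, a whole family of bugs stops being something you chase ticket by ticket.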

At large scale, you need to have a team that provides secure libraries that need to be used across your entire company. But you need to commit. Don't fire and forget tickets. Identify the underlying vulnerability class and then prioritize addressing that. That may take a bit longer to solve, but it significantly impacts your attack surface. You're in the big league -- play like you mean it. Large scale can be an enabler here.

But you have to be agile for this.

We suck at 'prevent'. How about 'detect'? Surely we can't be all that terrible, what with all the cool products out there to help you detect APTs.

Here's a myth Security Vendors would like you to believe: they'd like you to think that you can detect compromises in real time. Sounds awesome, right? They'll use the term 'defense in depth' to sell you not one, but ten different platforms and modules to integrate into their proprietary system. Which comes with neither an API nor a command-line client, and you can only query it using IE6 with both Flash and an outdated Java enabled, and the results are emailed to you in PDF or Excel format.

THIS SHIT DOES NOT SCALE.

For the last ten years now I've seen many sales pitches and products that sound really awesome. Not a single one has been able to handle the traffic, the number of users, the number of firewall rules, the number of systems or what have you that we threw at them. Not. A. Single. One.

(Same goes for open source tools, but at least those we can fix or adapt.)

But as I said earlier, let's suspend disbelief and suppose your fancy IDS can actually process all the events and traffic generated by hundreds of thousands of systems with thousands of users. Suppose it doesn't just croak.

Who here thinks that their infrastructure is currently compromised? That you have attackers in your systems?

Ok, who here was paged by their SIEM today to alert them to the compromise?

Here's how a real compromise is detected:

  • random engineer gets paged because some frontend operations are taking a long time
  • frontend oncall digs in, finds that database updates are taking a long time, pages DB oncall
  • DB oncall digs around, finds that reads work fine, but writes take a while
  • digs some more, finds the DB is locked
  • goes "huh, weird"
  • digs some more, finds DB is locked b/c some process is reading it. And has been for a while. Because that process is reading all of it. Dumping it, exfiltrating it.
  • goes "ooooh..."

Your fancy-ass netflow monitoring did not catch this; your fancy-ass malware detection mechanism didn't catch this; your snakeoil-alert-on-anything-unusual-except-when-the-unusual-looks-legit product didn't catch it. A human going "huh, weird" detected it.

But across the industry, we continue to deploy more and more SIEMs and other vendor crap. We build our own, too. We replace syslog with syslog-ng, then syslog-ng with rsyslog, then rsyslog with fluentd and we feed all the data into splunk and then we add osquery and grr and bro and we collect all the processes running on every system everywhere, and we're very proud that we can do this "at scale", because that is in fact hard to do.

But what are we doing here, really? We're just creating a bigger haystack. I remain convinced that increasing your haystack does not make it easier to find the needle.

Sure, collecting a lot of data for forensics is useful. But this ain't no free lunch. With the limited resources you have, are you making a meaningful difference in spending time and effort on collecting more data? Are you actually acting on it?

I'd like to suggest that you should stop collecting data that you do not actually use. Don't build an infrastructure to gather information that would be nice to have just in case. Doing so reminds me of another government agency -- no, not the TSA.

This gets me to another rule I have: lead by example. All too often security teams present a strong "do as I say, not as I do" rule. That is, they feel exempt from the rules they enforce for others.

Likewise, if a team came to you and asked to collect all information about all processes on all systems and to store that information without having a good and immediate use case, would you greenlight that?

If another team asked you to do what you just proposed, would you say 'no'? If so, don't go and do it yourself.

Other people's fucks are also limited.

"No"s are something security teams are big on. But they shouldn't be. Every time you tell another team "no", you're spending political capital and good will.

You're wasting other people's fucks, and I thought we had established that we all only have a limited number to give. You know what happens when other people run out of fucks? They stop talking to you. They begin working around the clever little restrictions you put in place.

Nothing is more detrimental to your goals than putting security restrictions in the way of an engineer who just wants to get her job done.

We can't continue to react. We don't have the resources. Defense is far too reactive. We need to become much more proactive all around. One way to do this is to embrace automation.

Any task that requires humans to perform follow-up actions is doomed to fail to make a significant impact. Vulnerabilities are dense. Stop reacting.

In operations and development, the industry has embraced continuous integration and continuous deployment. Infosec professionals everywhere struggle with this. As do I. Much of this poses major security headaches, but there's no way around it.

The benefits are too many. And Infosec should not stand in the way of something that makes most engineers' lives easier. Instead, embrace it. Use it for your own goals: CI/CD allows you to integrate with all your software. You can perform tests here, ensure your security libraries are used (correctly), ensure vulnerable versions are not pushed out, etc. etc.
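
One way to picture "ensure vulnerable versions are not pushed out" is a small gate that runs as a CI step and fails the build when a pinned dependency is on a deny list. This is a minimal sketch, not a real tool: the `DENIED` table, the package name `examplelib`, and the `name==version` requirements format are all assumptions made for illustration.

```python
import sys

# Hypothetical deny list: package name -> versions known to be vulnerable.
# In practice this would be fed from whatever advisory source you trust.
DENIED = {
    "examplelib": {"1.0.0", "1.0.1"},
}


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Parse 'name==version' lines; skip blanks, comments, and unpinned entries."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_violations(pins: dict[str, str], denied: dict[str, set]) -> list[tuple]:
    """Return sorted (name, version) pairs that appear on the deny list."""
    return sorted(
        (name, version)
        for name, version in pins.items()
        if version in denied.get(name, set())
    )


def main(path: str) -> int:
    """Return a non-zero exit status if any pinned version is denied."""
    with open(path, encoding="utf-8") as handle:
        violations = find_violations(parse_pins(handle.read()), DENIED)
    for name, version in violations:
        print(f"blocked: {name}=={version} is on the deny list", file=sys.stderr)
    return 1 if violations else 0
```

A pipeline step could call `sys.exit(main("requirements.txt"))`; nobody has to file or chase a ticket, the build simply does not ship until the pin moves.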

So, with all of this in mind, here are a few things I think we should already be doing today:

Note that all of these apply equally to your mobile, desktop, laptop, and server/container environments. There are some things we do well on the servers (configuration management, CI/CD) and some things we do well on the desktops/mobile (auto-updates). Focus on doing all the good things on all the platforms.

There are also a few things that I think we should be looking at in the future. Accept the fact that your network/endpoints are already compromised. Rethink your 'trusted network' model. Review Google's BeyondCorp paper.

Rethink whether or not you really need to have hundreds of user accounts on hundreds of thousands of machines just in case an engineer has to interactively log in. (Mark Burgess once noted something to the effect that any time a user interactively logs in on a system, you have ruined its configuration management state. Be more resilient.)

Figure out your strategy for how to deal with containers; what visibility do you have into what software they were built with? How do you keep track of where they are deployed? What if they are short-lived and spun up on demand for specific requests?
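
The visibility questions above boil down to keeping two records and being able to join them: what each image was built with, and where each image is currently running. Here is a toy in-memory sketch of that join; the class names, the digest strings, and the idea that build and scheduler events call these methods are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    digest: str
    packages: frozenset  # (name, version) tuples recorded at build time


@dataclass
class Inventory:
    images: dict = field(default_factory=dict)       # digest -> Image
    deployments: dict = field(default_factory=dict)  # digest -> set of hosts

    def record_build(self, digest: str, packages) -> None:
        """Called by the (hypothetical) build pipeline when an image is produced."""
        self.images[digest] = Image(digest, frozenset(packages))

    def record_start(self, digest: str, host: str) -> None:
        """Called when a container from this image starts on a host."""
        self.deployments.setdefault(digest, set()).add(host)

    def record_stop(self, digest: str, host: str) -> None:
        """Called when the last container from this image stops on a host."""
        self.deployments.get(digest, set()).discard(host)

    def hosts_running(self, package: str, version: str) -> set:
        """Hosts currently running any image built with the given package version."""
        hosts = set()
        for digest, image in self.images.items():
            if (package, version) in image.packages:
                hosts |= self.deployments.get(digest, set())
        return hosts
```

Short-lived, on-demand containers are exactly where this gets hard: the start and stop events have to be recorded automatically by whatever launches the containers, because no human will keep such a list current.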

Consider relying on remote attestation of hardware, signed software, trustworthy environment.

Think big.


