Scotty Bauer (a Utah grad student), Pascal Cuoq, and I have an article in the latest PoC||GTFO about introducing a backdoor into sudo using a compiler bug. In other words, the C code implementing sudo does not contain a backdoor, but a backdoor appears when sudo is built using a particular compiler version (clang 3.3, here). The advantages of this kind of backdoor include subtlety, deniability, and target-specificity.
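To make this concrete, here’s a purely hypothetical sketch of the kind of source a wrong-code bug could turn against us (neither the code nor the bug below is the one from the article, and password_ok is an imaginary helper):

```c
/* Purely hypothetical illustration -- not the code or the clang 3.3
 * bug from the article. At the source level this function is correct:
 * it grants access only when uid is 0 or the password check succeeds.
 * A wrong-code bug that, say, mis-folds the comparison against zero
 * under some pattern of surrounding code could make the compiled
 * binary take the "allow" path for other uids, yielding a backdoor
 * that no source audit will ever find. */
#include <stdbool.h>

extern bool password_ok(const char *user, const char *pass);

bool may_run_as_root(unsigned uid, const char *user, const char *pass)
{
    if (uid == 0)
        return true;             /* root needs no password */
    return password_ok(user, pass);
}
```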
Of course I have no idea who, if anyone, is trying to use compiler bugs to create backdoors. However, if I worked for the offensive wing of a well-funded security service, this would be one of the tools in my toolbox. But that’s neither here nor there. What follows are some ideas about how various groups of people can help defend against these backdoors.
Compiler Developers
- Fix known miscompilation bugs as rapidly as possible
- Consider back-porting fixes to miscompilation bugs into maintenance releases
- Go looking for trouble using fuzz tools
Maintainers of Open Source Packages
- Be suspicious of baroque patch submissions
- Consider rewriting patches
OS Packagers
- Assess compilers for reliability before selecting a system compiler
- Aggressively test compiled code for all platforms before deployment
- Run a trusting trust test on the system compiler (our attack in the PoC||GTFO article isn’t a trusting trust attack, but this won’t hurt)
End Users
- Recompile the system from source, using a different compiler or compiler version
- Run your own acceptance tests on precompiled applications
Researchers
- Create practical proved-correct compilers — which won’t contain the kind of bug that we exploited
- Create practical translation validation schemes — which would detect the miscompilation as it occurs
- Create practical N-version programming systems — which would detect the backdoor as it executes (a rough sketch of the idea follows this list)
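To illustrate the N-version idea, a wrapper can run the same input through two independently compiled builds of a program and flag any divergence. A minimal sketch; the binary names and the one-line-of-output assumption are placeholders, not part of any real system:

```c
/* Minimal N-version sketch: run the same input through two
 * independently built binaries and compare what they print.
 * The binary names and the one-line-of-output assumption are
 * placeholders; POSIX popen() is assumed to be available. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void run(const char *cmd, char *out, size_t n)
{
    FILE *p = popen(cmd, "r");
    if (!p || !fgets(out, (int)n, p)) {
        fprintf(stderr, "failed to run: %s\n", cmd);
        exit(2);
    }
    pclose(p);
}

int main(void)
{
    char a[4096], b[4096];
    run("./app.gcc < input.txt", a, sizeof a);    /* build 1 */
    run("./app.clang < input.txt", b, sizeof b);  /* build 2 */
    if (strcmp(a, b) != 0) {
        fprintf(stderr, "builds diverged -- possible miscompilation or backdoor\n");
        return 1;
    }
    fputs(a, stdout);
    return 0;
}
```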
Overall, this kind of attack is not easy to defend against, and my guess is that most instances of it (if any exist) will never be detected.
19 responses to “Defending Against Compiler-Based Backdoors”
I wonder if differential compilation could be automated for detecting this sort of attack. Likely at the individual function level and via static analysis and/or fuzzing. Sort of an offline form of the N-version option already mentioned.
It would have the advantage of providing notice to the developers rather than the users, but the downside of being fiendishly difficult to get authoritative results from.
Offhand, I expect the hard problem would be accurately filtering out false positives caused by “impossible” inputs, kind of like the requirement that Csmith not provoke UB.
Tangential thought: differential fuzzing of the inputs of production code would be one way to detect attacks *and the compiler bugs that enable them*. Differential fuzzing of the inputs/initial state of random programs would still detect compiler bugs. Have you considered combining Csmith and AFL?
I was certain I was going to see at least a fleeting reference to the “Ken Thompson Hack” (http://c2.com/cgi/wiki?TheKenThompsonHack), but alas, got to the end without one.
It is an interesting problem. I have no doubt that certain agencies are trying to exploit this attack vector; in fact, I think it was just a few months ago that we read about some government contractor giving a presentation about tampering with Apple’s code generation tools (in theory, not in deployed attacks).
Dan, the link to “trusting trust test” will make you happy.
What if it’s your CPU that, once activated, induces your compiler to produce backdoored code?
Or, for that matter, any other part of your trusted(?) hardware platform that has write access to the relevant bits? Hacked microcode updates, anyone?
E.g., ship trojaned parts to the target, wait for the audit to pass, then, once the parts have been in production long enough, start producing trojaned code in bulk.
I suppose this is just one part of the trust chain that is usually ignored or omitted.
John, why should compiler writers do anything? It’s the users’ problem, and if they are not willing to pay their compiler vendor to fix problems then they are obviously not interested in the fallout from compiler faults.
Compiler writing is a business like any other. Perhaps compiler writers should start introducing fault triggers into their products and selling details to the highest bidders.
@Derek: Your “corrupt compiler writer” model only works so long as no-one suspects what you are up to. As soon as they do they will run a mile and switch to a compiler maintained according to higher standards. Even if the faults are deniable all the way through, and you have lock-in because you are the only compiler developer for a platform (for example), your platform will eventually get a reputation for being hacked, even if no-one can directly put a finger on who is to blame. So hopefully your evil plan will fail. Hopefully!
Hi Derek,
All the more reason to use open source tools and avoid imbalanced trust relations with vendors. In an ecosystem with a plethora of languages and tools, I think vendors are going to find themselves on the losing end with customer-hostile features like those you propose.
Ben,
My comment is more applicable to open source than closed source (where the vendor sells compilers, has an incentive to keep customers happy, and can be sued). What does an open source compiler writer care what you think (unless you happen to be one of the large vendors that employs them)?
bcs, I’ve wanted to work on this for a long time. My idea is basically to compile each function twice and use a (hypothetical, but plausible) binary comparison tool to prove equivalence. When verification fails (it will, for big functions), split the function into 2 parts and try again, recursively. Eventually the verification will work, although efficiency will suffer due to added function calls. Obviously the splitter has to be trusted. I even have a prototype splitter sitting around somewhere. Getting a working equivalence checker was the problem. I talked to multiple research groups about this but nobody seemed to have one that worked very well.
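In the meantime, even a dynamic differential harness gives some signal. A minimal sketch, assuming the function under test has been compiled with two different compilers and the copies renamed (say, using objcopy --redefine-sym) to f_a and f_b, both of which are hypothetical names:

```c
/* Differential harness sketch: call two separately compiled copies of
 * the same function on random inputs and compare results. f_a and f_b
 * are hypothetical names produced by renaming the symbol in each
 * object file; this only tests the inputs we happen to try (and those
 * inputs must not trigger undefined behavior), so it is far weaker
 * than a real equivalence proof. */
#include <stdio.h>
#include <stdlib.h>

extern int f_a(int);   /* built with compiler/version A */
extern int f_b(int);   /* built with compiler/version B */

int main(void)
{
    srand(1);   /* fixed seed so runs are reproducible */
    for (long i = 0; i < 10000000; i++) {
        int x = rand();
        int ra = f_a(x), rb = f_b(x);
        if (ra != rb) {
            printf("mismatch on input %d: %d vs. %d\n", x, ra, rb);
            return 1;
        }
    }
    puts("no divergence observed");
    return 0;
}
```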
Unspecified and undefined behaviors are a problem with this story!
Differential fuzzing at the function level is a good vision but very difficult when you don’t know the function’s preconditions.
Jon, these are all separate (and hard) problems…
Derek, the LLVM and GCC people generally fix wrong-code bugs fairly quickly, and that’s good enough for me.
I can’t speak for my coauthors, but one of the reasons I worked on the PoC||GTFO article was to help change the economics of compiler correctness.
What about fuzzing at the whole program / Csmith level? Can the global variables that get dumped at the end be initialized arbitrarily, or are there forbidden values/states that must be avoided?
bcs, fuzzing at the whole-program level is important and necessary, but there’s not going to be any way to get anything close to the level of coverage that you’d want in order to be sure there aren’t any miscompilations.
I’m pretty sure that in the short and medium terms, we’ll need to exploit existing modularity in the program (e.g. functions) to get something like translation validation or equivalence checking to work. The only ways to avoid this are verified compilers or supernatural-level testing of whole programs.
I wonder how far CompCert is from being able to compile a barebones Linux distribution? That would be a good project. Performance would suffer, but perhaps not that much. Compiling an OS kernel with CompCert would no doubt be a project in itself.
In Csmith, are there any constraints on the initial values of the global variables that eventually get dumped? If not, then it should be reasonably simple to set up AFL to init them for each run. For data-dependent control flows this might produce interesting results that pure Csmith might miss (or take longer to find).
[Grrr. I only intended to post one of my last two comments.]
Yeah, detailed fuzzing of production code is likely unobtainable in the near term. Thankfully that’s not what I’m suggesting. Rather, it’s a way to possibly get better (or maybe just different) coverage out of Csmith’s synthetic random code.
bcs, I can delete a comment for you if you tell me the number.
Regarding amplifying Csmith using something like afl-fuzz: it’s worth trying, though in effect Csmith already has data values that are opaque to the compiler (due to reads from volatile variables) as well as complex code that performant compilers have no hope whatsoever of analyzing.
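If someone wants to wire this up, the driver can be as simple as overwriting the generated globals with fuzzer-chosen bytes before calling the top-level function. A sketch, where the global and function names mimic Csmith’s conventions but the exact interface is an assumption:

```c
/* Hypothetical AFL driver for a Csmith-generated translation unit.
 * Assumes the generated code exposes globals g_1 and g_2 and an entry
 * point func_1; the names mimic Csmith's style, but this exact
 * interface is an assumption, not something Csmith emits for you. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

extern int32_t g_1;            /* stand-ins for the generated globals */
extern int32_t g_2;
extern uint32_t func_1(void);  /* the generated top-level function */

int main(int argc, char **argv)
{
    uint8_t buf[sizeof g_1 + sizeof g_2];
    FILE *f;
    if (argc < 2 || !(f = fopen(argv[1], "rb")))
        return 1;
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 1;   /* reject short inputs so AFL discards them */
    fclose(f);
    memcpy(&g_1, buf, sizeof g_1);
    memcpy(&g_2, buf + sizeof g_1, sizeof g_2);
    printf("checksum = %u\n", (unsigned)func_1());
    return 0;
}
```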
Another possible mitigation would be that, when wrong-code bugs are fixed, the compiler is also updated to warn when it compiles code that would have triggered the wrong-code bug on the unfixed compiler.
This way, as the fixed compiler percolates out into the ecosystem, you’ll pretty quickly find any examples of real-world code that triggered the bug. They can then be examined for any security / backdoor implications, and any bad actors should be exposed relatively quickly.
kme, I’ve long wanted to do something similar to your suggestion. For each of a large collection of miscompilation bugs:
1. isolate a patch for that bug
2. compile an entire distribution using a pair of compilers that differ by only this patch
3. inspect all instances where the object code differs
There are practical difficulties — just isolating minimal patches for GCC is often not straightforward. Step 3 will be impractical in some cases since there might be lots of binary differences that aren’t miscompilations.
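Even a crude disassembly diff would be a usable first pass at step 3. A sketch, assuming binutils objdump is on the PATH:

```c
/* Sketch for step 3: disassemble two builds of the same object file
 * and count differing lines. Assumes binutils objdump on the PATH.
 * A real tool must filter benign noise first: the objdump header
 * embeds the input file name, and addresses, symbol names, and
 * register allocation can legitimately differ between builds. */
#include <stdio.h>
#include <string.h>

static FILE *disasm(const char *obj)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd, "objdump -d --no-show-raw-insn '%s'", obj);
    return popen(cmd, "r");
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s a.o b.o\n", argv[0]);
        return 2;
    }
    FILE *a = disasm(argv[1]), *b = disasm(argv[2]);
    if (!a || !b)
        return 2;
    char la[1024], lb[1024];
    long diffs = 0;
    for (;;) {
        char *ra = fgets(la, sizeof la, a);
        char *rb = fgets(lb, sizeof lb, b);
        if (!ra && !rb)
            break;
        if (!ra || !rb || strcmp(la, lb) != 0)
            diffs++;
        if (!ra || !rb)
            break;  /* one stream ended early; remainder uncounted */
    }
    pclose(a);
    pclose(b);
    printf("%ld differing line(s)\n", diffs);
    return diffs != 0;
}
```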
I like your suggestion but suspect that the necessary buy-in from compiler developers won’t be there.
kme, just for the record, the company I work for (IAR Systems) does something similar to what you propose from time to time, i.e., creates a special compiler that reports whether code is compiled based on the same faulty assumptions that caused the original bug. And it’s nothing recent; we’ve been doing it for 10+ years. I suspect that if you talk to the big commercial build chain vendors, they probably do the same.
However, routinely doing this for all bugs and weaving the bug detection code into the production compiler would most likely create a maintenance nightmare down the line.