A lot of software defects stem from developers who lack a proper sense of paranoia. While paranoia can of course be taken to idiotic extremes, a healthy respect for Murphy’s Law is useful in programming. This post is a list of useful paranoid techniques that I hope to use as the basis for a lecture on paranoid programming for my “solid code” class — please leave a note if you have a favorite paranoid technique that I’ve missed.
- using plenty of assertions
- validating and sanitizing inputs
- checking for all possible errors returned by library functions, not just likely ones
- erring on the side of caution when corrupted state is detected, for example by halting or restarting part or all of the system instead of attempting to repair and recover
- checksumming RAM and ROM contents in embedded systems
- minimizing the scope from which identifiers are visible
- using and paying attention to compiler warnings and static analysis results
- using and paying attention to the output of dynamic tools like Valgrind and Clang’s UBSan
- using (to whatever extent this is feasible) formal methods tools like Frama-C to verify tricky, important functions
- writing fuzzers even for software that shouldn’t need to be fuzzed
- leveraging the type system to get better guarantees, for example by avoiding unnecessary type casts and by putting some kinds of integer values into single-member structs when writing C (see the sketch after this list)
- for critical loops, writing down the loop variant and invariant
- if possible, building the code using different compilers and testing the resulting executables against each other
- using timeouts, retry loops, and watchdog timers when appropriate
- aggressively keeping functions simple
- aggressively avoiding mutable global state
- not using concurrency unless absolutely necessary
- taking advantage of whatever language support is available for controlling mutable state: const, pure functions, ownership types, etc.
- giving subsystems the least privileges they need to do their jobs, for example using seccomp
- testing, testing, testing, and coverage
- being aware of current best practices for things like choice of libraries
- tracking down transient failures rather than ignoring them
- disabling interrupts inside interrupt handlers, even when timing arguments suggest that nested interrupts cannot occur
- getting people who didn’t write the code to read the code
- conservative allocation of stack memory in environments where stacks are a constrained resource
- unit-testing library functions, particularly mathematical ones, instead of trusting that they are correct for all inputs
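To make the single-member-struct item above concrete, here is a minimal sketch; the type and function names are invented for illustration:

```c
#include <stdio.h>

/* Wrapping integer IDs in distinct struct types so the compiler rejects
   accidental mixing of logically different values. */
typedef struct { int id; } user_id;
typedef struct { int id; } order_id;

static void cancel_order(order_id o) {
    printf("cancelling order %d\n", o.id);
}

int main(void) {
    user_id u = { 42 };
    order_id o = { 17 };
    cancel_order(o);        /* fine */
    /* cancel_order(u); */  /* refuses to compile: incompatible struct type */
    (void)u;
    return 0;
}
```

The compiler now rejects a call that mixes up the two kinds of ID, at no runtime cost.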
As a random example, several commenters (and Bruce Dawson in email) took issue with the volatile-qualified lock variable in this post; my view is that this qualifier probably does no harm and might do some good by encouraging the compiler to be careful in how it touches the lock variable. This seems to me to be a reasonable application of paranoia; obviously others disagree.
The other day someone asked me how paranoid programming differs from regular old defensive programming. I don’t necessarily have a great answer but (1) the top 10 google hits for “defensive programming” seemed to be missing many of the techniques I’ve listed here and (2) “defensive” doesn’t quite capture the depth of my mistrust of computer programs. Just because you’re paranoid doesn’t mean they’re not really out to get you, and make no mistake: the computer is out to get you (as are the people on the other side of the network).
35 responses to “Paranoid Programming”
As part of my own paranoia I use ‘hard’ compiler options. My templated Makefile for C++ has these:
-Wall -Werror -Wextra -Weffc++ -pedantic
0. Code coverage, and set a required percent.
1. Comments – good, useful comments. I spent a couple of hours commenting and documenting today (post code review). One section of the comments was more an apology for a hack, and while writing out the apology I figured out how to implement the feature correctly.
const
I’m not sure if this counts as paranoid, or something more extreme, but MISRA has a whole bunch of rules like eschewing recursion and dynamic memory allocation. (I’d give a link to a PDF copy I found on Google, but I’m pretty sure the MISRA C standards are not meant to be freely available.)
While some of those rules seem very extreme, they may be prudent when the software directly controls human safety features. It sounds like Toyota could have benefited from them: http://www.safetyresearch.net/Library/Bookout_v_Toyota_Barr_REDACTED.pdf
These probably fall within the realm of “validating and sanitizing inputs”, but to this (quite excellent) list I would add: (0) always-always-always null-check pointers, and (1) never trust communication channels, and protect communicated data with an appropriate checksum or hash in response.
We can increase the paranoia level of (0) by insisting on runtime pointer range checking (such as to within the valid memory space of an embedded machine) or even a live stack keepout or keepin check, while we can make (1) more paranoid by insisting on application- or presentation-layer checksums even when the data link layer of an underlying protocol (such as Ethernet) already provides a CRC. In the case of high-bandwidth interfaces like this, the extra bits are cheap insurance against “but-this-should-never-happen!” sorts of corruptions.
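As a rough sketch of the application-layer check described above — the framing (a trailing two-byte checksum) and the choice of Fletcher-16 are illustrative, not any real protocol:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over a byte buffer. */
static uint16_t fletcher16(const uint8_t *data, size_t len) {
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Returns true only if the trailing checksum matches the payload. */
bool message_is_intact(const uint8_t *msg, size_t len) {
    if (len < 2)
        return false;   /* too short to even hold a checksum */
    uint16_t expected = (uint16_t)((msg[len - 2] << 8) | msg[len - 1]);
    return fletcher16(msg, len - 2) == expected;
}
```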
I’ll be linking to this post in this week’s Embedded Link Roundup on UpEndian.com – thanks for the great information!
That’s an awesome list BUT one way to have all the code you write be solid is to write no code at all. Paranoia is not the most productive emotional state; I still struggle with balancing paranoia (new tests, new checks) and new features. Would love your list to be annotated with some sort of difficulty score (how long would it take to implement this?) and some sort of value score (how useful is it?).
Write redundant code. That is, code that enforces requirements in more than one way.
Make code fail as early as possible. Failures at compile time (e.g. via the type system) are better than failures in the unit tests, which are better than failures at boot time (i.e. before accepting any inputs), which are better than failures at run time.
Do things so that you end up thinking more up front and less while coding.
I often use these in combination when changing a program. Find a change I can make that will break the build in ways that have obvious fixes that will always be the correct things to do. Then I just keep building the code and fixing compiler/test errors until the tests pass again.
++ not using concurrency unless absolutely necessary
Quick story that helped reinforce this one for me: I am advising a student on his senior project. He recently needed to implement some asynchronous network communication code. (This is a student who has heard me rant at great length (and hopefully cogently) against the use of threads).
Student: So I used threads in this part because I need to wait for a response to this request, but still remain responsive.
Me: Well, you’re probably better off using some kind of event loop framework, but it’s just a prototype now so no big deal.
S: I know, I know, but it’s such a simple use of threads, I’m sure it’s okay.
… later …
S: You see, one thread just loops until the other thread sets this flag to True. [with no synchronization.]
M: Textbook data race.
S: Huh? But only one thread is writing.
M: [Hangs head in mock despair.]
Lesson (to the extent that there is one): Normal programmers don’t know how to use threads properly. They just shouldn’t.
Probably not going to happen except for serious embedded code, but showing either actual WCET or at least constant upper bound on every loop is nice and somewhat doable if you’re paranoid.
In C instrumenting to do fixed initialization of all stack-local structs, and having a debug/test mode where you can scan for non-initialized fields. Can do on pass-in structs also, but failing to initialize a field in a locally constructed struct seems to be a common problem.
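A rough sketch of what that debug-mode instrumentation might look like; the struct, macro names, and poison value are all made up:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POISON_BYTE 0xA5u
#define POISON_U32  0xA5A5A5A5u

struct motor_cmd {
    uint32_t speed;
    uint32_t direction;
};

#ifdef PARANOID_DEBUG
#  define POISON_STRUCT(p) memset((p), POISON_BYTE, sizeof *(p))
#else
#  define POISON_STRUCT(p) ((void)0)
#endif

void send_motor_cmd(const struct motor_cmd *cmd) {
#ifdef PARANOID_DEBUG
    /* A field still holding the poison pattern was probably never assigned. */
    if (cmd->speed == POISON_U32 || cmd->direction == POISON_U32)
        fprintf(stderr, "motor_cmd: possibly uninitialized field\n");
#endif
    /* ... actually transmit the command ... */
    (void)cmd;
}

int main(void) {
    struct motor_cmd cmd;
    POISON_STRUCT(&cmd);
    cmd.speed = 100;
    /* forgot cmd.direction: the debug scan above would complain */
    send_motor_cmd(&cmd);
    return 0;
}
```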
> via the type system
Yes! Even in C, if I were more paranoid, I would use single-member structs all the time.
Thanks for the comments, folks! I’ve done some minor updates to the post incorporating some of your feedback.
Ryan, I like the MISRA rules and teach them to my embedded software class. It does contain a few hilarious ones like “you shall not execute undefined behavior” and some of them are obviously nonsense for general-purpose programming.
Eddie I like the idea of scoring with costs and benefits but those things are so situational that I don’t know how to do it. Re. the wisdom to not write new code, I wish I had it!
David, single-member structs are a great tool, it’s too bad they uglify the code so much. Maybe a better solution is bolting on a stronger type system. I’ve used (and loved) PC-Lint.
Another technique is the use of ‘const’ on statically allocated data structures that don’t need to change, allowing them to be allocated in the read-only data segment and read-only memory at runtime. This is particularly useful for tables of function pointers such as you might find in a state machine.
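For example, a minimal sketch of a const handler table; the state names and handlers are invented for illustration:

```c
#include <stddef.h>

enum state { STATE_IDLE, STATE_RUN, STATE_FAULT, STATE_COUNT };

typedef enum state (*state_handler)(void);

static enum state handle_idle(void)  { return STATE_RUN; }
static enum state handle_run(void)   { return STATE_RUN; }
static enum state handle_fault(void) { return STATE_FAULT; }

/* const: any attempt to overwrite an entry is a compile-time error, and the
   table can be placed in ROM on an embedded target. */
static const state_handler handlers[STATE_COUNT] = {
    [STATE_IDLE]  = handle_idle,
    [STATE_RUN]   = handle_run,
    [STATE_FAULT] = handle_fault,
};

enum state step(enum state s) {
    if ((size_t)s >= STATE_COUNT || handlers[s] == NULL)
        return STATE_FAULT;   /* paranoid bounds check before the indirect call */
    return handlers[s]();
}
```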
There’s also the technique of filling dynamic memory with poison values immediately before it is freed.
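A small sketch of that idea, assuming the caller can supply the allocation size; the wrapper name and poison value are made up:

```c
#include <stdlib.h>
#include <string.h>

#define FREE_POISON 0xDD

/* Scribble over a block just before freeing it, so stale pointers that are
   still used tend to fail loudly rather than quietly reading plausible data. */
void paranoid_free(void *p, size_t size) {
    if (p != NULL)
        memset(p, FREE_POISON, size);
    free(p);
}
```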
I’ve only ever used this during testing, but if the output absolutely must be correct, implement multiple algorithms to compute the same thing and compare the outputs. If the outputs don’t match, fail as gracefully/safely as possible. Obviously this is not practical in all or even many situations, but in some it is useful. I’ve personally used this approach when implementing something particularly complicated by first implementing the simple/slow version, and then the fast version. The outputs get compared with unit tests and a fuzzer. This serves as a nice double check that the fast version didn’t miss handling some case.
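A minimal sketch of this pattern, using population count as a stand-in for a trickier algorithm and an assumed PARANOID_CHECKS build flag:

```c
#include <assert.h>
#include <stdint.h>

/* Simple, obviously correct version. */
unsigned popcount_slow(uint32_t x) {
    unsigned n = 0;
    for (; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Faster bit-twiddling version. */
unsigned popcount_fast(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    return (unsigned)((((x + (x >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}

unsigned popcount(uint32_t x) {
    unsigned result = popcount_fast(x);
#ifdef PARANOID_CHECKS
    assert(result == popcount_slow(x));   /* cross-check the optimized path */
#endif
    return result;
}
```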
@Will: A post here a while back is relevant to that:
http://blog.regehr.org/archives/467
Paranoid programming can also very fruitfully be augmented with hygienic programming which among its basic tenets includes having a CI/build pipeline from day 1 instead of day 100 and following best practices for packaging your software.
The slogan “make illegal states unrepresentable” fits here. It comes from the world of functional programming, but it applies anywhere — choose data structures, argument and return types, etc so that illegal values don’t have a representation at all. As a trivial example, if your function can be commanded to start and stop, taking an enum with values START and STOP is preferable to taking a string that’s intended to be either “START” or “STOP”.
Use weak guards for cycles (i.e. != instead of <). This makes off-by-one errors painfully visible.
Check for class invariants, either manually or using aspect-oriented-programming (and possibly disable this in production).
Good list and comments.
I’m not sure if process is in bounds here, but if you can afford it:
– Survey existing products/projects before writing code; consider buying (carefully) before making. This isn’t a programming technique, but in my experience skipping it is an incredibly common, extremely expensive, massively bug-generating mistake
– Carefully select and ensure adherence to a coding standard
– Continuous integration– if you can afford it, do it
– For embedded programming, do as much testing as possible off the target, then do as much testing as you can on the target (even including running unit tests; investments needed to support this will pay off on most projects of reasonable scale)
– To follow “conservative allocation of stack memory in environments where stacks are a constrained resource” — consider maintaining a canary for stack bounds (see the sketch after this list). Diagnosing and fixing a stack overflow bug in a production embedded system is… expensive. Note: I’m referring to stack overflow, not stack buffer overflow
– If at all possible, give the code to someone else to beat on; it is best if they are detail- or security- oriented, but the more, the merrier
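As a rough illustration of the stack-canary suggestion in the list above, for a hypothetical bare-metal target where the linker script exports the bottom of the stack region; the symbol name and canary value are assumptions:

```c
#include <stdint.h>

#define STACK_CANARY 0xDEADBEEFu

/* Assumed to be provided by the linker script at the lowest address of the
   stack region (stacks grow downward on this hypothetical target). */
extern uint32_t __stack_limit[];

void stack_canary_init(void) {
    __stack_limit[0] = STACK_CANARY;
}

/* Call periodically, e.g. from a timer interrupt or the main loop; a clobbered
   canary means the stack has grown into memory it does not own. */
int stack_canary_intact(void) {
    return __stack_limit[0] == STACK_CANARY;
}
```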
Use compile-time asserts whenever you can; some kinds of assumptions can be validated with no runtime cost at all. e.g. if you have two structs or typedefs that are supposed to be the same size, you can simultaneously verify that and document the assumption right next to the code that depends on it, using static_assert or your own similar macro.
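For instance (the struct names here are made up):

```c
#include <assert.h>   /* C11 static_assert */
#include <stdint.h>

struct wire_header   { uint32_t id; uint16_t len; uint16_t flags; };
struct legacy_header { uint32_t id; uint16_t len; uint16_t flags; };

/* If someone adds a field to one struct but not the other, the build breaks
   here with a readable message instead of corrupting data at run time. */
static_assert(sizeof(struct wire_header) == sizeof(struct legacy_header),
              "wire_header and legacy_header must stay the same size");
```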
When making an optimized version of an algorithm, always keep the original version around (disabled by #ifdef or equivalent). Ideally this can give you a debug mode that can be enabled at compile-time or run-time, where the optimized and original versions will both be run and the results will be asserted to be the same. Even if that is not practical, at a minimum it gives you the ability to switch back to the original algorithm if a problem is discovered with the optimized code. Or if you have to port the code to a new platform and the optimized algorithm doesn’t work on it for some reason.
For the same reason, when writing “tricky” code that depends on compiler- and platform-specific features (such as vector intrinsics), it’s good practice to always write a “simple”, portable implementation at the same time, even if it is disabled by an #ifdef.
When discussing assertions previously, you mentioned having two levels of asserts: “light” asserts that can be used all the time, even in production builds, and “heavy” asserts that are too expensive to have on all the time so they are only enabled in some kind of heavy-checking build. It’s funny to me because I call those heavy asserts “paranoid asserts”, and name the macros etc. the same as my standard assert macros but with “_PARANOID” at the end of the name. It then makes sense to have a “paranoid build” which is just like a regular optimized build except with all of the paranoid error-checking and validation stuff enabled. If you have test suites, fuzzers, etc. then it makes sense to run them against that build too.
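A sketch of that two-level scheme, with ASSERT_PARANOID compiling away unless an assumed PARANOID_BUILD flag is defined; the binary search is just a stand-in example:

```c
#include <assert.h>
#include <stddef.h>

#define ASSERT(cond) assert(cond)

#ifdef PARANOID_BUILD
#  define ASSERT_PARANOID(cond) assert(cond)
#else
#  define ASSERT_PARANOID(cond) ((void)0)
#endif

/* O(n) precondition check: too expensive for every release-build call. */
int is_sorted(const int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

int binary_search(const int *a, size_t n, int key) {
    ASSERT(a != NULL || n == 0);        /* cheap, always on */
    ASSERT_PARANOID(is_sorted(a, n));   /* expensive, paranoid build only */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (int)mid;
    }
    return -1;
}
```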
I personally recommend a good coding style. It could have helped avoid the recent Apple security bug: goto fail; goto fail;… I mean: fail! 🙂
Automatic re-formatting of code that forces a particular coding style can also be handy. There are systems around Git that don’t accept a change if it’s not in the right coding style (e.g. wrong indentation). That could have caught that bug, too.
Given Apple’s recent ‘goto fail’ embarrassment, perhaps basic unit testing ought to head the list…
http://arstechnica.com/security/2014/02/extremely-critical-crypto-flaw-in-ios-may-also-affect-fully-patched-macs/
Thanks again, folks, this is great stuff.
One really nice technique no-one has yet mentioned is Spolsky’s tip about naming your variables and functions to represent what types of values they can hold. This makes the code look wrong straight away if you mix different types without converting them first.
http://www.joelonsoftware.com/articles/Wrong.html
@Magnus This reminds me of Hungarian notation. It sounds bad. Modern IDEs should help the programmer with the types needed by the function, and a good type system should not allow dangerous or ill-defined casts to compile (C is not very strong on this…). I.e. both the programming and the checking can be (and for the sake of sanity, probably should be) automated.
Re: “erring on the side of caution when corrupted state is detected, for example by halting or restarting part or all of the system instead of attempting to repair and recover”. This is good for security but not so much for safety or availability. Think of a fly-by-wire system, or even a Web server under denial-of-service attacks. I would rather rephrase this as “Think of a fail-safe mode to be entered when something unexpected happens. Make sure you test or otherwise verify this degraded mode of operation 10 times as much as the rest of the code.”
Xavier, I agree, thanks!
1. In C, uninitialized static variables get initialized to zero. Try hard to adjust your logic so that zero is an illegal value and trap for it. For example, enum {FIRST=1, SECOND, THIRD} var;
2. If possible, arrange your build scripts so that each version you build has a visible unique label (perhaps a automatically incrementing version number; or the date and time; CRC; etc). This makes it harder to get confused about which version you’re running or which version you saw the bug in.
3. When entities have nontrivial amounts of both read-only and read-write attributes, organize the read-only portion as an instance of a const structure in read-only memory and include in it a pointer to the corresponding instance of the read-write structure. Reference the entity by a pointer to the read-only structure. (Use of const provides compile-time protection against writes to the read-only structure; allocating it in read-only memory provides run-time protection.)
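A small sketch of the split described in (3); the device and field names are invented for illustration:

```c
#include <stdint.h>

struct uart_state {           /* read-write portion, lives in RAM */
    uint32_t bytes_sent;
    uint32_t overrun_errors;
};

struct uart_config {          /* read-only portion, can live in ROM */
    uintptr_t base_address;
    uint32_t  baud_rate;
    struct uart_state *state; /* link to the mutable part */
};

static struct uart_state uart0_state;

static const struct uart_config uart0 = {
    .base_address = 0x40011000u,   /* made-up register address */
    .baud_rate    = 115200u,
    .state        = &uart0_state,
};

void uart_note_sent(const struct uart_config *u, uint32_t n) {
    /* Writing through u->state is allowed; writing to *u itself is rejected
       by the compiler and would fault at run time if *u is placed in ROM. */
    u->state->bytes_sent += n;
}

int main(void) {
    uart_note_sent(&uart0, 16);
    return 0;
}
```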
Paranoid with a capital “P”…
1. I have worked on a few programs where the requirements mandated periodic testing of the CPU instruction set. In 25+ years I can recall two instances where this sort of test actually detected a CPU failure.
2. Perform the power-up RAM test with code that is not dependent on RAM. That is, no use of stack, all variables are kept in registers, etc.
I’m reminded of the first time I wrote a RAM test. The algorithm was simple: I wrote a value into RAM, read it back to see if it was the expected value, and then went on to the next location. My boss smiled and did the silliest thing: he pulled out a RAM chip. My test still passed. The RAM circuitry was isolated by line drivers from the executable memory, so with no memory chip to load the data bus, bus capacitance acted as a short-term 1-bit memory.
A few jobs back I worked for a company that made controllers for newspaper printing presses. Paper races through the machine at a prodigious rate. The worst possible event is a “web break” – when the paper tears. The result is paper shooting out of the machine and snaking around in all directions. Serious or fatal injuries can be expected. The software did periodic hardware tests and had lots of robustness checks. If a problem was detected, we entered a tight loop that lit a failure lamp and kept the paper moving at a uniform speed. The operators used software-free backup controls to halt the press.
I forgot to point out that with the print press controllers, the worst way to handle an error would be to halt the software — that would all but guarantee a web break.
Re: #24 Mate Soos
Maybe I should have used a word different from ‘type’. The technique helps you avoid mistakes like writing unescaped text to a database (SQL injection). The function that writes to database is named so the programmer can tell it should only be given escaped text. Have a look at the link in my post, it’s a nice technique.
Hello guys,
Since comments are blocked in the assertions article and you mentioned the use of assertions in this one, I just want to ask you something about the use of assertions with TDD.
I started studying TDD about a month ago and I read somewhere (I can’t remember where) that when you develop code with TDD there is no need to use assertions, since you have already tested all the code in your application, both for the common cases and for the cases where the inputs contain errors, so it should always work as expected.
What is your opinion on that statement?
“you shall not execute undefined behavior” isn’t as hilarious as it should be. I’m sure you’ve heard someone say “I know it’s technically undefined but that’s okay because when I try it it does the right thing”
#27 Don Bockenfeld wrote:
> In C, uninitialized static variables get initialized to zero. Try hard to adjust your logic so that zero is an illegal value and trap for it. For example, enum {FIRST=1, SECOND, THIRD} var;
I have used Don’s approach. I have also made zero correspond to a legal but failed state e.g. enum { BROKEN=0, FIRST, SECOND, THIRD} var;
I can see arguments for and against both approaches. These arguments depend heavily on context – I especially favour the second where static analysis is used to enforce a stricter type model for enums.
I would love to understand why the second approach might be considered a bad idea, particularly from a paranoid perspective.