Straight Man

Hank Devereaux, chair of the dysfunctional English department at a small university, is having a midlife crisis.  His wife, leaving town, fears he’ll be either in jail or the hospital before she returns — and she is not disappointed.  Straight Man is hilarious, I had to stop reading it in bed because it was too hard to giggle quietly enough to keep from waking Sarah up.  I don’t know anything about Russo but it is clear that he has spent time in academia, his portrayal of the odd personalities and petty politics is too perfect to be guesswork.  This is one of the best books I’ve read in years, I highly recommend it.

Margin in Software Systems

Margin of safety is a fundamental engineering concept where a system is built to tolerate loads exceeding the maximum expected load by some factor.  For example, structural elements of buildings typically have a margin of safety of 100%: they can withstand twice the expected maximum load.  Pressure vessels have more margin, in the range 250%-300%, whereas the margin for airplane landing gear may be only 25%.  (All these examples are from the Wikipedia article.)

We can say that a software system has a margin of safety S with respect to some external load or threat L only when the expected maximum load Lmax can be quantified and the system can be shown to function properly when subjected to a load of (1+S)Lmax.  Software systems are notoriously low on margin: a single flaw will often compromise the entire system.  For example, a buffer overflow vulnerability in a networked application can permit an attacker to run arbitrary code at the same privilege level as the application, subverting its function and providing a vector of attack to the rest of the system.

Software is often defined to be correct when, for every input, it produces the correct output.  Margin is an orthogonal concern.  For example, there exist systems that are formally verified to be correct, such as CompCert and seL4, that have no margin at all with respect to flaws not covered by the proof — a bug or trojan in the assembler, linker, or loader invalidates the safety argument of either system.  Similarly, there exist systems that are obviously not correct, that have considerable margin.  For example, my Windows laptop has enough RAM that it can tolerate memory leak bugs for quite a while before finally becoming unusable.

There are other examples of margin in software systems:

  • Many classes of real-time systems have a characteristic utilization bound: a fraction of total available CPU time that, if not exceeded, permits all sub-computations to meet their time constraints.  Real safety-critical systems are usually designed to use less CPU time than their theoretical utilization bounds, providing margin against spurious interrupts or other unforeseen demands on the processor.
  • Forward error correction provides margin against data corruption.
  • n-version programming and replication provide margin respectively against software and hardware defects.
  • It is common to pick a cryptographic key larger than the smallest currently-unbreakable key, providing some margin against future improvements in processing power and cryptanalysis.

The piece that is missing, as far as I can tell, is a collection of broader results about how margin in software systems relates to overall system reliability, and how useful kinds of software margin can be obtained at acceptable cost.

What are some general ways to gain margin?  Overprovisioning a resource, as in the examples above, is very common.  Defense in depth is also important: many software systems have only two lines of defense against attack: safety checks at the application level, and safety checks in the OS kernel.  If both of these defenses fall — as is common — the entire system has been subverted.  Virtual machines, safe language runtimes, and similar mechanisms can add layers of defense, as can firewalls and other external barriers.

Something I want to see is “margin-aware formal methods.”  That is, ways to reason about software systems under attack or load.  The result, ideally, would be analogous to well-understood principles of safety engineering.  Examples already exist:

  • Symbolic robustness analysis is an early proof-of-concept technique for showing that small perturbations in the input to a control system result in only small changes to the control signal
  • The critical scaling factor in scheduling theory is the largest degree of slowdown computations can incur before deadlines start being missed
  • Byzantine fault tolerance is a body of theory showing that a distributed system can produce correct results if fewer than a third of its nodes are compromised

An important obstacle to margin in software systems is the high degree of coupling between components.  Coupling can be obvious, as when multiple activities run in the same address space, or it can be insidiously indirect, including common failure modes in independent implementations, as Knight and Leveson observed in 1986.  There are many other kinds of coupling, including reliance on multiple copies of a flawed library, a single password used across multiple systems, etc.  It can be very hard to rule out all possible forms of coupling between components — for example, even if we use Byzantine fault tolerance and n-version programming, we can still be compromised if all chips come from the same (faulty or malicious) fab.

In summary: engineered physical systems almost always have margin with respect to failures.  Too often, software does not.  This should be fixed.  I want my OS and web browser to each come with a statement such as “We estimate that no more than 25 exploitable buffer overflows remain in this product, therefore we have designed it to be secure in the presence of up to 70 such problems.”

Picking a Research Topic in Computer Systems

This post is a collection of observations and advice for people who want to choose a research topic in computer systems.  I’m not claiming to be some kind of genius in this area, but I have enough ideas that they seemed worth writing down. This advice is probably most useful for graduate students in CS, but may also be helpful for junior profs and for undergrads interested in doing research.

Picking a research area

By “research area” I mean a sub-area of computer systems, such as object-oriented operating systems, distributed hash tables, or whatever.  This often ends up being a pretty pragmatic decision and it is understood that a good researcher can work concurrently in multiple areas or change area as often as every few years.

The easiest way is to choose an area that a lot of other people are working on.  I’m quite serious, and this method has some important advantages.  First, all those people at MIT and Berkeley aren’t stupid; if they’re working on ultra-reliable byzantine transcoders, there’s probably something to it.  Second, if you work on the same thing others are working on, the odds are good they’ll be interested in your work and will be more willing to fund and publish it.  The potentially serious drawback is that you’ll likely miss the initial, exciting phase of research where lots of new things are being discovered and there’s not yet a lot of commonly accepted wisdom about what the best solutions are.  In the worst case, you’ll miss the entire main period of interest and arrive late when the best people have moved on to other topics.

The research area that you choose should have a chance of making a difference.  Systems research tends to be applied, and a great deal of it is engineering rather than science.  The focus, in most cases, is making something work better for someone.  The more people benefit, the better.

Fundamentally, the area you choose needs to be one that you have some natural affinity for.  You should enjoy working on it and your intuition should strongly suggest that all is not right: there is work that needs to be done.

If you’re a PhD student seeking a research assistantship, it would be practical to choose a research area that your advisor has a grant to work on.

Departing from the accepted wisdom

Every good research idea is a departure from the accepted wisdom, but it’s important to depart at the right level.  Consider these extremes:

  1. You reject the notion of binary computers.  Instead, ternary computation will sidestep most of the fundamental problems faced by computer science.  Everything from instruction sets to complexity theory must be thrown out and rebuilt from scratch.
  2. You reject the notion of the semicolon as a statement terminator in programming languages.  Instead, the dollar sign should be used.  A revolution in software productivity will ensue.

The first idea diverges from the status quo in a fundamental way, but it will be difficult to get buy-in from people.  The second departure is too minute to matter, and nobody will care even if you do a big user study showing that the dollar sign is better with p<0.05.  In both cases, the research idea does not feel like one that will change the world.  In contrast, some great examples of departing from the conventional wisdom in the right way can be found in David Patterson’s work: see RISC and RAID.

Focusing a skeptical personality

If the point is to challenge the accepted wisdom in some fashion, you can’t very well go believing everything people tell you.  Computer systems is not exactly a rigorous area of science and you will hear all manner of ridiculous explanations for things.

Good systems problems, in my experience, come from noticing something wrong, some discrepancy between how the world works and how it should work.  However, it doesn’t appear possible to do good work based on someone else’s observations.  For example, you can tell me that parallel programming is hard or software transactional memory is too slow, but the fact is that if I haven’t seen the problems for myself, if long and bitter experience hasn’t trained my intuition with a thousand little facts about what works and what does not, then I’m probably going to come up with an irrelevant or otherwise bad solution to the problem.

How does this work in practice?  You go about your business building kernels or web servers or whatever, and along the way there are a hundred irritations.  Most of them are incidental, but some are interesting enough that you start pulling at the thread.  Most of these threads die quickly when you discover that the problem was adequately solved elsewhere, is boring, or is too difficult to solve at present.  Every now and then it becomes clear that there’s a real problem to be solved, that nobody else is attacking it (at least in the right way), and that it would be fun to work on.  These are your potential research projects.  There are probably other ways to find research ideas in computer systems, but this is the only one I know of.

Here’s an anecdote.  As an instructor of embedded systems courses I’d long been annoyed by buggy cross compilers for microcontrollers.  Students who are struggling to write correct code do not need this extra level of hassle.  Finally, one day I was showing a small snippet of assembly code in lecture and a student noticed that it was wrong.  I assumed that it was a cut-and-paste error, but it turned out the compiler was mistranslating a 2-line function.  This seemed beyond the pale, so I wrote some more test cases and tested some more compilers and kept finding more and more bugs.  Even at this point, after I’d spent probably 40 hours on the problem, it was not at all clear that I was moving towards any kind of research result.  It was only after hundreds of additional hours of work that it became obvious the project had legs: everyone knows that embedded compilers contain bugs, but probably most people would be surprised that we were able to find (so far) 200 compiler bugs including major wrong-code bugs in every C compiler we’ve tested.  So in this case, the surprising result is quantitative.

A few more observations

  • The best research problems are often those that are not yet of major industrial interest, but that will be addressed by billion-dollar industries in 10-20 years.  Once a problem becomes the focus of intense industrial interest, it becomes a difficult target to attack from academia.  At the very least, you need a seriously new angle.
  • Don’t get into a research area late in its life cycle.
  • Thomas Edison said it’s 1% inspiration, 99% perspiration.  In computer systems it’s more like 99.9% perspiration: if you’re not careful you can build things for years without getting any research results.
  • It’s really not possible to figure out in advance which promising ideas are going to pan out and which ones are distractions.  Keep a number of different ideas in mind and work in directions that cut off as few options as possible.
  • Ignore sunk costs.  Always be ready to drop an idea, no matter how proud of it you are.
  • If you become aware of someone credible who is working on your exact idea, drop it.  First, at this point you have a 50% chance of winning the race to publication.  Second, duplicated work wastes time and tax dollars.  Life’s too short.
  • Often, problems and obstacles that seem insurmountable at first can be flipped around and turned into interesting features of your work or even advantages.
  • Periodic re-examination of assumptions is useful.  A recent example I like is Google native client.  Most efforts to isolate untrusted binary code just take whatever the compiler outputs.  The Google project gets good mileage by hacking the compiler so that the code doesn’t contain so many stupid things.  It’s a good idea — who said the compiler was set in stone?
  • If your project seems boring, think about dropping it.  If you’re not excited why would anyone else be?
  • Writing a couple of paragraphs about an idea has the effect of revealing bad ideas for what they are, and making good ideas better.  It’s almost comical how many of my ideas look silly when written down.  It’s definitely possible to read too much, but it’s probably impossible to write too much.
  • Smart people love to solve problems, regardless of their relevance.  Avoid the trap where you improve a result far beyond the point of diminishing returns.
  • Code tweaking is seductive; if you do it, consider it time spent on a hobby, not time spent doing research.
  • Once you’re invested in a research area, it’s tempting to stay there since you’re on top of the learning curve.  This can turn into a trap.  When it’s time to move on, just do it.  Many of the best researchers change areas several times during their careers.
  • Having a grand vision up-front is OK as long as it’s somewhat vague and doesn’t prevent you from seeing the actual situation.
  • Computer systems speak to you, in their own way.  Always listen to what they’re telling you.


Picking a good research problem is at least half the battle, especially for PhD students.  It’s worth studying why some ideas and approaches are great while others are boring.

The Compiler Doesn’t Care About Your Intent

A misunderstanding that I sometimes run into when teaching programming is that the compiler can and should guess what the programmer means.  This isn’t usually quite what people say, but it’s what they’re thinking.  A great example appeared in a message sent to the avr-gcc mailing list.  The poster had upgraded his version of GCC, causing a delay loop that previously worked to be completely optimized away.  (Here’s the original message in the thread and the message I’m quoting from.)  The poster said:

… the delay loop is just an example, of how simple, intuitive code can throw the compiler into a tizzy. I’ve used SDCC(for mcs51) where the compiler ‘recognises’ code patterns, and says “Oh, I know what this is – it’s a delay loop! – Let it pass.”(for example).

Another example comes from the pre-ANSI history of C, before the volatile qualifier existed.  The compiler would attempt to guess whether the programmer was accessing a hardware register, in order to avoid optimizing away the critical accesses.  I’m sure there are more examples out there (if you know of a good one, please post a comment or mail me).

The problem with this kind of thinking is that the correctness of code now depends on the compiler correctly guessing what the programmer means.  Since the heuristics used to guess aren’t specified or documented, they are free to change across compilers and compiler versions.  In the short term, these hacks are attractive, but in the long run they’re little disasters waiting to happen.

One of the great innovations of the past 50 years of programming language research was to separate the semantics of a language from its implementation.  This permits the correctness of an implementation to be judged solely by how closely it conforms to the standard, and it also permits programs to be reasoned about as mathematical objects.  C is not the most suited to mathematical reasoning, but there are some excellent research projects that do exactly this.  For example Michael Norrish’s PhD thesis formalized C in the HOL theorem prover, and Xavier Leroy’s CompCert compiler provably preserves the meaning of a C program as it is translated into PPC or ARM assembly.

Of course, the intent of the programmer does matter sometimes.  First, a well-designed programming language takes programmer intent into account and makes programs mean what they look like they mean.  Second, intent is important for readability and maintainability.  In other words, there are usually many ways to accomplish a given task, and good programmers choose one that permits subsequent readers of the code to easily grasp what is being done, and why.  But the compiler does not, and should not, care about intent.

C and C++ Make It Hard to Read a Register for Its Side Effects

[ This post was co-written with Nigel Jones, who maintains an excellent embedded blog Stack Overflow.  Nigel and I share an interest in volatile pitfalls in embedded C/C++ and this post resulted from an email discussion we had.  Since we both have blogs, we decided to both post it.   However, since comments are not enabled here, all discussion should take place at Nigel’s post. ]

Once in awhile one finds oneself having to read a device register, but without needing nor caring what the value of the register is. A typical scenario is as follows. You have written some sort of asynchronous communications driver. The driver is set up to generate an interrupt upon receipt of a character. In the ISR, the code first of all examines a status register to see if the character has been received correctly (e.g. no framing, parity or overrun errors). If an error has occurred, what should the code do? Well, in just about every system we have worked on, it is necessary to read the register that contains the received character — even though the character is useless. If you don’t perform the read, then you will almost certainly get an overrun error on the next character. Thus you find yourself in the position of having to read a register even though its value is useless. The question then becomes, how does one do this in C? In the following examples, assume that SBUF is the register holding the data to be discarded and that SBUF is understood to be volatile. The exact semantics of the declaration of SBUF vary from compiler to compiler.

If you are programming in C and if your compiler correctly supports the volatile qualifier, then this simple code suffices:

void cload_reg1 (void)

This certainly looks a little strange, but it is completely legal C and should generate the requisite read, and nothing more. For example, at the -Os optimization level, the MSP430 port of GCC gives this code:

   mov &SBUF, r15

Unfortunately, there are two practical problems with this C code. First, quite a few C compilers incorrectly translate this code, although the C standard gives it an unambiguous meaning. We tested the code on a variety of general-purpose and embedded compilers, and present the results below. These results are a little depressing.

The second problem is even scarier. The problem is that the C++ standard is not 100% clear about what the code above means. On one hand, the standard says this:

In general, the semantics of volatile are intended to be the same in C++ as they are in C.

A number of C++ compilers, including GCC and LLVM, generate the same code for cload_reg1() when compiling in C++ mode as they do in C mode. On the other hand, several high-quality C++ compilers, such as those from ARM, Intel, and IAR, turn the function cload_reg1() into object code that does nothing. We discussed this issue with people from the compiler groups at Intel and IAR, and both gave essentially the same response. Here we quote (with permission) from the Intel folks:

The operation that turns into a load instruction in the executable code is what the C++ standard calls the lvalue-to-rvalue conversion; it converts an lvalue (which identifies an object, which resides in memory and has an address) into an rvalue (or just value; something whose address can’t be taken and can be in a register). The C++ standard is very clear and explicit about where the lvalue-to-rvalue conversion happens. Basically, it happens for most operands of most operators – but of course not for the left operand of assignment, or the operand of unary ampersand, for example. The top-level expression of an expression statement, which is of course not the operand of any operator, is not a context where the lvalue-to-rvalue conversion happens.

In the C standard, the situation is somewhat different. The C standard has a list of the contexts where the lvalue-to-rvalue conversion doesn’t happen, and that list doesn’t include appearing as the expression in an expression-statement.

So we’re doing exactly what the various standards say to do. It’s not a matter of the C++ standard allowing the volatile reference to be optimized away; in C++, the standard requires that it not happen in the first place.

We think the last sentence sums it up beautifully. How many readers were aware that the semantics for the volatile qualifier are significantly different between C and C++? The additional implication is that as shown below GCC, the Microsoft compiler, and Open64, when compiling C++ code, are in error.

We asked about this on the GCC mailing list and received only one response which was basically “Why should we change the semantics, since this will break working code?” This is a fair point. Frankly speaking, the semantics of volatile in C are a bit of mess and C++ makes the situation much worse by permitting reasonable people to interpret it in two totally different ways.

Experimental Results

To test C and C++ compilers, we compiled the following two functions to object code at a reasonably high level of optimization:

extern volatile unsigned char foo;
void cload_reg1 (void)
void cload_reg2 (void)
  volatile unsigned char sink;
  sink = foo;

For embedded compilers that have built-in support for accessing hardware registers, we tested two additional functions where as above, SBUF is understood to be a hardware register defined by the semantics of the compiler under test:

void cload_reg3 (void)

void cload_reg4 (void)
  volatile unsigned char sink;
  sink = SBUF;

The results were as follows.


We tested version 4.4.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2() . No warnings are generated. The C++ compiler shows the same behavior as the C compiler.

Intel Compiler

We tested icc version 11.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler emits code loading from foo for both cload_reg1() and cload_reg2(), without giving any warnings. The C++ compiler emits a warning “expression has no effect” for cload_reg1() and this function does not load from foo. cload_reg2() does load from foo and gives no warnings.

Sun Compiler

We tested suncc version 5.10, hosted on x86 Linux and also targeting x86 Linux, using optimization level -O. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.


We tested opencc version 4.2.3, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

LLVM / Clang

We tested subversion rev 98508, which is between versions 2.6 and 2.7, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2() . A warning about unused value is generated for cload_reg1(). The C++ compiler shows the same behavior as the C compiler.

CrossWorks for MSP430

We tested version, hosted on x86 Linux, using optimization level -O. This compiler supports only C. foo was not loaded in cload_reg1(), but it was loaded in cload_reg2().


We tested version, hosted on Windows XP, using maximum speed optimization. The C compiler performed the load in all four cases. The C++ compiler did not perform the load for cload_reg1() or cload_reg3(), but did for cload_reg2() and cload_reg4().

Keil 8051

We tested version 8.01, hosted on Windows XP, using optimization level 8, configured to favor speed. The Keil compiler failed to generate the required load in cload_reg1() (but did give at least give a warning), yet did perform the load in all other cases including cload_reg3() suggesting that for the Keil compiler, its IO register (SFR) semantics are treated differently to volatile variable semantics.


We tested version 9.70, hosted on Windows XP, using Global optimization level 9, configured to favor speed. This was very interesting in that the results were almost a mirror image to the Keil compiler. In this case the load was performed in all cases except cload_reg3(). Thus the HI-TECH semantics for IO registers and volatile variables also appears to be different – just the opposite to Keil! No warnings was generated by the Hi-TECH compiler when it failed to generate code.

Microchip Compiler for PIC18

We tested version 3.35, hosted on Windows XP, using full optimization level. This rounded out the group of embedded compilers quite nicely in that it didn’t perform the load in either cload_reg1() or cload_reg3() – but did in the rest. It also failed to warn about the statements having no effect. This was the worst performing of all the compilers we tested.


The level of non-conformance with the C compilers, together with the genuine uncertainty as to what the C++ compilers should do, provides a real quandary. If you need the most efficient code possible, then you have no option other than to investigate what your compiler does. If you are looking for a generally reliable and portable solution, then the methodology in cload_reg2() is probably your best bet. However it would be just that: a bet. Naturally, we (and the other readers of this blog) would be very interested to hear what your compiler does. So if you have a few minutes, please run the sample code through your compiler and let us know the results.


We’d like to thank Hans Boehm at HP, Arch Robison at Intel, and the compiler groups at both Intel and IAR for their valuable feedback that helped us construct this post. Any mistakes are, of course, ours.

Computer Science Fiction

Science fiction explores the effect of technological progress on society.  It is ironic, then, that the majority of SF authors miserably failed to predict the impact of computers and information technology. Why does Google return no meaningful hits for “computer science fiction?”  Is it not obvious that this term needs to exist, if we wish to understand the next 50 years at all?

Looking back, it is clear that most SF authors made the same mistake: they took whatever computers existed at the time and envisioned larger, more powerful, smarter versions of the same kind of machine.  The genre is filled with this kind of obvious — and, in retrospect, boring — extrapolation. In contrast, the interesting thing that has happened over the past few decades is the development and deployment of new kinds of computer-based systems, which are now pervasively embedded in the world.  Something like 10 billion processors are manufactured each year and only a tiny fraction looks anything like a mainframe.

Of course there were other mis-steps, such as overlooking the impact of wireless communication.  One of my favorite mispredictions comes from Neuromancer, where Gibson makes a point of showing us that pay phones still exist in the middle of the 21st century.  People in Neuromancer have their brains wired directly into the network, but they don’t carry cell phones.

As a professional computer scientist, part of my job is a form of applied science fiction.  Instead of trying to predict the impact of science, we try to make the impact happen.  Doing this job well, however, requires real insight into the future.  Therefore, one of the questions I’m interested in is: Who in SF got it right?  That is, who saw beyond the obvious extrapolations about computer technology and really understood what the future might look like?  And how did they do it?

The best single example of a computer science fiction author is Vernor Vinge.  His story True Names preceded the cyberpunk movement and paved the way for subsequent work like Neuromancer and Snow Crash.  Vinge’s books contain a high density of good ideas, on a par with some of the best earlier idea-driven writers like Asimov and Niven.  A Fire Upon the Deep gets solid mileage out of pack animals that function as distributed systems and also provides a reasonable exploration of what really powerful machines might be like (“applied theology” indeed).  Subsequently, A Deepness in the Sky gave us the term “software archeology” — an ugly but highly plausible window into the future of software maintenance — and is the first and perhaps only work in the sub-sub-genre “sensor network science fiction.”  Of course, Vinge depicts pervasive embedded sensing as invariably leading to totalitarian control and then societal collapse.

There are two major CS-related themes running across Vinge’s work.  The first is the singularity, and probably too much has been written about it elsewhere.  It seems uninteresting to speculate about a point in time that is defined as being immune to speculation.  The second centers on the fickle nature of control in computer systems.  Very early, Vinge foresaw the kinds of battles for control in networked resources that are playing out now in botnets and in corporate and government networks.  Concepts like a trusted computing base and subversion via network attack look like they are probably fundamental.  Our understanding of how these ideas will evolve is flawed at best.  In fact, the entire history of computer security has been adversary driven: we put an insecure system out into the world, wait for it to be exploited, and then react.  This happens over and over, and I have seen very few signs that a more proactive approach is emerging.

How did Vinge do such a good job?  This is tough to analyze.  Like all good extrapolators, he separated the fundamental from the incidental, and pushed the fundamentals forward in useful ways.  He knew enough CS to avoid technical gaffes but somehow failed to let this knowledge interfere with his predictions.  It is no coincidence that few of the now-common top-level uses of the Internet were thought of by computer scientists: we’re too busy dealing with other levels of the system.  We know how computers are supposed to be used and this is a huge handicap.

As a group, the cyberpunk authors did a good job in seeing the impact of information technology, but as far as I can tell their contributions to computer science fiction were largely derived from things that Vinge and others had already predicted.  It’s interesting that almost all of these authors insisted on making the network into a physical space accessed using VR.  This was more than a plot device, I think: people really want cyberspace to be somehow analogous to physical space.  The real-world failure of VR as a metaphor for understanding and navigating networked computer systems has been interesting to watch; I’m not sure that I understand what is going on, but as I see it the network simply does not demand to be understood as a physical space.  Of course, spaces are useful for games and other kinds of interaction, but it’s not at all clear that VR will ever be the primary metaphor for interacting with the network.  The poor state of interface technology (Where are those brain plugs? Why are we still using stupid mice?) is obviously a factor as well.

200 Compiler Bugs

This morning I reported the 200th bug found by our compiler testing tool.  It is a new way to crash GCC.  The failure-inducing input is not pretty so I won’t give it here, but it can be found in GCC’s bugzilla.  Although the testing tool is now entirely developed by some excellent PhD students, I have remained the primary bug reporting person.  It is basically grunt work but it keeps me technically involved with the project.

It took us about two years of real time and maybe three person-years of effort to find these 200 bugs.  When will we reach 300?  It’s not clear.  On one hand, our testing tool is becoming extremely powerful and we are branching out into more target architectures.  On the other hand, both GCC and LLVM have developed a significant degree of immunity to our tests.  The co-evolution of the testing tool and the tested systems has been interesting to watch and I hope to write more about this at some point.  In any case, at this point our tester is the bottleneck for reporting bugs in LLVM and GCC (for x86/x64 and for the most common compiler flags, at least).  For most other compilers we have looked at, the bottleneck is either in creating good bug reports — this is time consuming — or in the compiler team’s ability (and sometimes, unfortunately, willingness) to fix the bugs that we report.

Our (generally) uncomplaining and silent partners in this effort are the compiler developers who fix the bugs that we report.  I often watch the progress of each bug report as it is confirmed, discussed, and eventually fixed.  Over the course of this project I’ve become very impressed with the kind of hacking talent the LLVM and GCC projects have attracted.  Hopefully their effort, and ours, is well-spent and in the long run we will end up with several highly reliable open-source compilers.

Finally, I should add that we are exceptionally grateful to DARPA for funding this effort.

How to Evaluate a Computer Systems Research Paper

Some excellent resources exist about how to write a good systems paper. This post is about a slightly different topic.

In a typical recent year I review about 100 papers, mostly conference papers 8-14 pages long in 9 or 10 point font. People in similar positions — mid-career computer systems professors — are generally in the same situation and some have it worse. Since a good review takes three or four hours to write, it’s important to develop some shortcuts. Many of mine take the form of “sniff tests”: ways to rapidly discover that a paper contains bogus or useless results. Perhaps one third of papers that I review fall into this category. If I can save time by writing relatively brief reviews of them, then I can spend more time reviewing the marginal-to-good papers: these are the ones that stand to benefit most from detailed feedback. The best papers, like the worst, require relatively little effort to review.

I almost always skip the abstract.  For one thing, I’ve probably already read it when deciding how to rank the paper in my reviewing preferences.  Additionally, the introduction to the paper almost always contains the same information but with more motivation and background.

The first thing I do when evaluating a paper is to read its conclusion. Authors are almost invariably more truthful about their contributions in the conclusion than in the abstract and introduction. Why? First, people often write their papers front-to-back, and usually the introduction gets written before the final results have arrived. The authors are still optimistic. Second, it is simply psychologically harder to oversell the results of paper in the conclusion, where the authors are aware that the reader has just finished reading a perhaps not totally conclusive evaluation section.

The most important questions to ask when reading a paper are: Does it contain a new idea? A useful idea? Both can be hard to answer.

One complication in evaluating novelty is that the actual contribution of a paper is often different than the one(s) stated in the paper. Most of the time this is not due to any deliberate misrepresentation by the authors. Rather, people usually emphasize what they thought was hard or fun about the work, neglecting the fact that many hard and fun — and often substantially similar — research projects have been done in the past. Also, the authors’ perception of their contributions tend to be heavily colored by their previous work and their backgrounds.  Furthermore, sometimes the actual contributions of a piece of work become clear only years after its initial publication.

Another problem is that evaluating the novelty of an idea requires a huge breadth of knowledge covering many thousands of papers in related, and not-obviously-related, research areas. It is not uncommon for zero or one of the people evaluating a paper submitted to a conference to have a really solid idea about where the paper fits into the literature. Complicating matters, many people submit papers to a venue that they know and like, where it will be evaluated by people they know and like, even if these are not the most appropriate people to be evaluating the paper. Changing research sub-areas seems to be much easier and more appealing than changing communities.

To show that an idea is useful, it is customary to evaluate it analytically and/or experimentally. Experimental results come in many sub-flavors, including those based on simulation and implementation. Regardless of the actual technique, there are many sniff-tests that should be applied to any computer systems paper.

For analytical results: Does the result make sense? Is it grounded in reality? Does it tell us anything new? Are the theorems and lemmas actually formal or are they “pseudo-formal”: written in the formal style but lacking key definitions and steps in reasoning?  Would a theoretician with the appropriate background find the results useful or interesting?

For simulation results: Did the authors have to use simulation or were they simply too lazy to find a better evaluation method? Are the simulation parameters realistic? Does the simulation tell us anything new? I once reviewed a paper that presented a collection of analytical results and then a collection of simulation results that were based on an exact implementation of the analytical model. Of course, the “experimental” results matched the predicted results nearly perfectly (and trivially). Another paper that I once reviewed performed its evaluation on a simulation of a large multiprocessor computer where the parameters and workload were chosen in such a way that the aggregate throughput of the multiprocessor was around one instruction per cycle. Of course any conclusions reached from this kind of simulation are useless.

For experimental results: Did the authors measure the right quantities? Were appropriate tests of statistical significance used? (In computer systems, confidence intervals are quite avant-garde.) Is the measured effect robust? Is the baseline a sensible one? A commonly used trick is to compare the new work against an obsolete or otherwise obviously defective baseline, rather than the state of the art. Another common trick is to report on the degree of improvement offered by a new technique, but to omit the absolute numbers.

One sniff test I like to apply to a piece of research is to ask the following questions. First, what percent of actual systems cannot benefit from the proposed technique no matter what? Second, what percent of systems get the benefit of the proposed technique without any special effort? Third, what percent of systems are left? Perhaps surprisingly, a lot of research fails this trivial test. As an example, let’s suppose I’m proposing a new CPU scheduling technique for desktop Linux/Windows/MacOS boxes (this is not a random example: my PhD thesis was basically about this).  First we ask: what class of systems cannot benefit from the proposed technique?  There are several answers, but probably the most obvious one is overloaded machines: those that cannot finish their workloads (displaying video frames, decoding audio chunks, etc.) on time regardless of scheduling discipline.  Second, we ask: what class of systems gets the benefits of good scheduling without a good scheduler?  The answer is: those with low CPU loads.  If the average length of the run queue is not greater than one, then all work-conserving scheduling algorithms are equivalent.  Finally, we ask: which systems are left?  That is, who actually benefits from the smart new scheduler?  The answer is: systems whose load is in a fairly narrow range between too low and too high.  The narrowness of this band, and the presence of techniques for getting into the underload region (for example manually reducing the degree of multiprogramming), were the ultimate reasons why I stopped working on scheduling problems.

Other common problems include: A solution to a problem that comes with significant, unavoidable costs that are not discussed. Solutions that cannot scale to the situations they are meant to address. Improvements that are far too small to be significant. Techniques that exploit the same source of benefit as an existing technique, but that are more complicated or are otherwise inferior.

Looking back over this post I see that it could be read as being very cynical: just looking for ways to reject papers.  But the fact is, if the research community is unwilling to self-censor, then the pushback has to come from somewhere else.  It’s also worth noting that the interaction between the relentless pressure to publish and program committees full of people like me has resulted in an incredible proliferation of conferences and workshops.  To a limited extent, this kind of community diversification and evolution is good, it helps the field adapt to changes.  On the other hand, most people would agree that things have gone too far.  But this is a subject for another post.

There is plenty of excellent systems research being done. Fantastic new ideas are changing how systems are built and making it possible to build new kinds of computer systems. The problem, on the other hand, is that the research community produces a lot of mediocre and useless work (I have produced some myself).  Lacking a marketplace, we rely on humans to evaluate both the good work and the bad, and those of us who have to evaluate a broad cross-section of research results need to adapt our strategies in order to make effective use of time. In summary, Sturgeon’s Law applies.

50 Vertical Miles

A little over a year ago my family moved to a house near the north edge of Salt Lake City.  Although access to real mountains is not great — it’s about a three-hour walk to the nearest 8000′ peak and a major slog to a 9000′ peak — the foothill access is excellent.  At the same time, after way too much sedentary work, sedentary travel, and time at home with small kids, I found myself with high blood pressure and needing to lose weight, so I started doing a 45-minute hike each day, with a bit over 750′ elevation gain/loss.

After a year of this I ended up in decent shape and around 20 pounds lighter.  The cool part, though, is that 365 days of 750 feet comes out to 50 vertical miles hiked.  I was a little disappointed to compute that I’ll never be able to hike to the equivalent of geosynchronous orbit, but low Earth orbit should be attainable this year.  Of course due to travel and being sick, I missed some days, but also there were plenty of days where I hiked 2000-3000 vertical feet, so probably the average was maintained.  The hardest part is not missing days when weather is crappy or work and kids make life busy.  The solution, however, turned out to be easy: a good facemask and a powerful headlamp.

Although hiking the same set of trails day after day threatens to become boring, there has been a nice unintended benefit.  Since little brain power is required, I get a lot of unstructured time to think.  As far as I can tell, this has improved the quality of my work quite a bit; I usually return from a hike with three or four new ideas for me or my students to try out.  Even if only a few percent of these ideas are useful, the time is still well spent.  Hiking is even better than the shower for generating new ideas — who knew?

Nine ways to break your systems code using volatile

The volatile qualifier in C/C++ is a little bit like the C preprocessor: an ugly, blunt tool that is easy to misuse but that — in a very narrow set of circumstances — gets the job done.  This article will first briefly explain volatile and its history and then, through a series of examples about how not to use it, explain how to most effectively create correct systems software using volatile.  Although this article focuses on C, almost everything in it also applies to C++.

What does a C program mean?

The C standard defines the meaning of a C program in terms of an “abstract machine” that you can think of as a simple, non-optimizing interpreter for C.  The behavior of any given C implementation (a compiler plus target machine) must produce the same side effects as the abstract machine, but otherwise the standard mandates very little correspondence between the abstract machine and computation that actually happens on the target platform.  In other words, the C program can be thought of as a specification of desired effects, and the C implementation decides how to best go about making those effects happen.

As a simple example, consider this function:

int loop_add3 (int x) {
  int i;
  for (i=0; i<3; i++) x++;
  return x;

The behavior of the abstract machine is clear: it creates a function-scoped variable named i which loops from 0 through 2, adding 1 to x on each iteration of the loop.  On the other hand, a good compiler emits code like this:

  movl 4(%esp), %eax
  addl $3, %eax

In the actual machine, i is never incremented or even instantiated, but the net effect is the same as if it had been.  In general, this gap between the abstract and actual semantics is considered to be a good thing, and the “as if” wording in the standard is what gives the optimizer the freedom to generate efficient code from a high-level specification.

What does volatile mean?

The problem with the gap between the abstract and concrete semantics is that C is a low-level language that was designed for implementing operating systems.  Writing OS code often requires that the gap between the abstract and actual semantics be narrowed or closed.  For example, if an OS wants to create a new page table, the abstract semantics for C fails to capture an important fact: that the programmer requires an actual page table that sits in RAM where it can be traversed by hardware.  If the C implementation concludes that the page table is useless and optimizes it away, the developers’ intent has not been served.  The canonical example of this problem is when a compiler decides that a developer’s code for zeroing sensitive data is useless and optimizes it away.  Since the C abstract machine was not designed to consider cases where this data may be snooped later, the optimization is legitimate (though obviously undesirable).

The C standard gives us just a few ways to establish a connection between the abstract and actual machines:

  • the arguments to main()
  • the return value from main()
  • the side-effecting functions in the C standard library
  • volatile variables

Most C implementations offer additional mechanisms, such as inline assembly and extra library functions not mandated by the standard.

The way the volatile connects the abstract and real semantics is this:

For every read from a volatile variable by the abstract machine, the actual machine must load from the memory address corresponding to that variable.  Also, each read may return a different value.  For every write to a volatile variable by the abstract machine, the actual machine must store to the corresponding address.  Otherwise, the address should not be accessed (with some exceptions) and also accesses to volatiles should not be reordered (with some exceptions).

In summary:

  • Volatile has no effect on the abstract machine; in fact, the C standard explicitly states that for a C implementation that closely mirrors the abstract machine (i.e. a simple C interpreter), volatile has no effect at all.
  • Accesses to volatile-qualified objects obligate the C implementation to perform certain operations at the level of the actual computation.

Where did volatile come from?

Historically, the connection between the abstract and actual machines was established mainly through accident: compilers weren’t good enough at optimizing to create an important semantic gap.  As optimizers improved, it became increasingly clear that a systematic solution was needed.  In an excellent USENET post 20 years ago Doug Gwyn explained how volatile came about:

To take a specific example, UNIX device drivers are almost always coded entirely in C, and on the PDP-11 and similar memory-mapped I/O architectures, some device registers perform different actions upon a “read-byte”, “read-word”, “write-byte”, “write-word”, “read-modify-write”, or other variations of the memory-bus access cycles involved.  Trying to get the right type of machine code generated while coding the driver in C was quite tricky, and many hard-to-track-down bugs resulted.  With compilers other than Ritchie’s, enabling optimization often would change this behavior, too.  At least one version of the UNIX Portable C Compiler (PCC) had a special hack to recognize constructs like

((struct xxx *)0177450)->zzz

as being potential references to I/O space (device registers) and would avoid excessive optimization involving such expressions (where the constant lay within the Unibus I/O address range).  X3J11 decided that this problem had to be faced squarely, and introduced “volatile” to obviate the need for such hacks.  However, although it was proposed that conforming implementations be required to implement the minimum possible access “width” for volatile-qualified data, and that is the intent of requiring an implementation definition for it, it was not practical to insist on it in every implementation; thus, some latitude was allowed implementors in that regard.

It’s hard to overstate how bad an idea it is for a compiler to use strange heuristics about code structure to guess the developer’s intent.

Nine ways to break your systems code using volatile

1. Not enough volatile

The most obvious kind of volatile error is to leave it out when it is required.  Let’s look at a specific example.  Suppose we’re developing software for an AVR 8-bit embedded processor, which (on some models) has no hardware multiplier.  Since multiplies are going to happen in software, we’re probably interested in seeing how slow they are, so we know how hard to try to avoid them.  So we write a little benchmark program like this:

#define TCNT1 (*(uint16_t *)(0x4C))
signed char a, b, c;
uint16_t time_mul (void) {
  uint16_t first = TCNT1;
  c = a * b;
  uint16_t second = TCNT1;
  return second - first;

Here TCNT1 points to a hardware register living at address 0x4C.  This register provides access to Timer/Counter 1: a free-running 16-bit timer that we assume is configured to run at some rate appropriate for this experiment.  We read the register before and after the multiply operation, and subtract to find the duration.  Side note: although at first glance this code looks like it fails to account for the case where TCNT1 overflows from 65535 to 0 during the timing run, it actually works properly for all durations between 0 and 65535 ticks.

Unfortunately, when we run this code, it always reports that the multiply operation required zero clock ticks.  To see what went wrong, let us look at the assembly language:

$ avr-gcc -Os -S -o - reg1.c
  lds r22,a
  lds r24,b
  rcall __mulqi3
  sts c,r24
  ldi r24,lo8(0)
  ldi r25,hi8(0)

Now the problem is obvious: both reads from the TCNT1 register have been eliminated and the function is simply returning the constant zero (avr-gcc returns a 16-bit value in the r24:r25 register pair).

How can the compiler get away with never reading from TCNT1?  First, let’s remember that the meaning of a C program is defined by the abstract machine described in the C standard.  Since the rules for the abstract machine say nothing about hardware registers (or concurrent execution) the C implementation is permitted to assume that two reads from an object, with no intervening stores, both return the same value.  Of course, any value subtracted from itself is zero.  So the translation performed by avr-gcc here is perfectly correct; it is our program that’s wrong.

To fix the problem, we need to change the code so that TCNT1 points to a volatile location.

#define TCNT1 (*(volatile uint16_t *)(0x4C))

Now, the C implementation is not free to eliminate the reads and also it cannot assume the same value is read both times.  This time the compiler outputs better code:

$ avr-gcc -Os -S -o - reg2.c
  in r18,0x2c
  in r19,0x2d
  lds r22,a
  lds r24,b
  rcall __mulqi3
  sts c,r24
  in r24,0x2c
  in r25,0x2d
  sub r24,r18
  sbc r25,r19

Although this assembly code is correct, our C code still contains a latent error.  We’ll explore it later.

Normally, you will find definitions for device registers in system header files.  If so, you will not need to use volatile in this case.  But it may be worth checking that the definitions are correct, they aren’t always.

Let’s look at another example.  In an embedded system you are implementing, some computation must wait for an interrupt handler to fire.  Your code looks like this:

int done;
__attribute((signal)) void __vector_4 (void) {
  done = 1;
void wait_for_done (void) {
  while (!done) ;

Here wait_for_done() is designed to be called from the non-interrupt context, whereas __vector_4() will be invoked by the interrupt controller in response to some external event.  We compile this code into assembly:

$ avr-gcc -Os wait.c -S -o -
  push r0
  in r0,__SREG__
  push r0
  push r24
  ldi r24,lo8(1)
  sts done,r24
  pop r24
  pop r0
  out __SREG__,r0
  pop r0
  lds r24,done
  tst r24
  breq .L3

The code for the interrupt handler looks good: it stores to done as intended.  The rest of the interrupt handler is just AVR interrupt boilerplate.  However, the code for wait_for_done() contains an important flaw: it is spinning on r24 instead of spinning on a RAM location.  This happens because the C abstract machine has no notion of communication between concurrent flows (whether they are threads, interrupts, or anything else).  Again, the translation is perfectly correct, but does not match the developer’s intent.

If we mark done as a volatile variable, the interrupt handler code does not change, but wait_for_done() now looks like this:

  lds r24,done
  tst r24
  breq .L3

This code will work.  The issue here is one of visibility.  When you store to a global variable in C, what computations running on the machine are guaranteed to see the store?  When you load from a global variable, what computations are assumed to have produced the value?  In both cases, the answer is “the computation that performs the load or store is assumed to be the only computation that matters.”  That is, C makes no visibility guarantees for normal variable references.  The volatile qualifier forces stores to go to memory and loads to come from memory, giving us a way to ensure visibility across multiple computations (threads, interrupts, coroutines, or whatever).

Again, our C code contains a latent bug that we’ll investigate later.

A few other legitimate uses of volatile, including making variables in UNIX programs visible to signal handlers, are discussed by Hans Boehm.

Summary: The abstract C machine is connected to the actual machine in only a few places.  The memory behavior of the actual machine may be very different from the operations specified in source code.  If you require additional connections between the two levels of abstraction, for example to access device registers, the volatile qualifier can help.

2. Too much volatile

In a well-designed piece of software, volatile is used exactly where it is needed.  It serves as documentation, saying in effect “this variable does not play by the C rules: it requires a strong connection with the memory subsystem.”  In a system that uses too much volatile, variables will be indiscriminately labeled as volatile, without any technical justification.  There are three reasons why this is bad.  First, it’s bad documentation and will confuse subsequent maintainers.  Second, volatile sometimes has the effect of hiding program bugs such as race conditions.  If your code needs volatile and you don’t understand why, this is probably what is happening.  Far better to actually fix the problem than to rely on a hack you do not understand to solve a problem you do not understand.  Finally, volatile causes inefficiency by handicapping the optimizer.  The overhead that it introduces is hard to track down since it is spread out all over the system– a profiler will be little help in finding it.

Using volatile is a little like deciding what kind of insurance policy to buy.  Too little insurance and you may run into problems down the road.  Too much insurance and you’re covered but in the long run you end up paying too much.

Summary: Use volatile only when you can provide a precise technical justification.  Volatile is not a substitute for thought (Nigel Jones said this).

3. Misplaced qualification

At the level of C syntax, volatile is a type qualifier.  It can be applied to any type, following rules that are similar to, but not quite the same as, the rules for the const qualifier.  The situation can become confusing when qualified types are used to build up more complex types.  For example, there are four possible ways to qualify a single-level pointer:

int *p;                              // pointer to int
volatile int *p_to_vol;              // pointer to volatile int
int *volatile vol_p;                 // volatile pointer to int
volatile int *volatile vol_p_to_vol; // volatile pointer to volatile int

In each case, either the pointer is volatile or not, and the pointer target is volatile or not.  The distinction is crucial: if you use a “volatile pointer to regular int” to access a device register, the compiler is free to optimize away accesses to the register.  Also, you will get slow code since the compiler will not be free to optimize accesses to the pointer.  This problem comes up pretty often on embedded mailing lists; it’s an easy mistake to make.  It’s also easy to overlook when vetting code since your eye may just be looking for a volatile somewhere.

For example, this code is wrong:

int *volatile REGISTER = 0xfeed;
*REGISTER = new_val;

To write clear, maintainable code using volatile, a reasonable idea is to build up more complex types using typedefs (of course this is a good idea anyway).  For example we could first make a new type “vint” which is a volatile int:

typedef volatile int vint;

Next, we create a pointer-to-vint:

vint *REGISTER = 0xfeed;

Members of a struct or union can be volatile, and structs/unions can also be volatile.  If an aggregate type is volatile, the effect is the same as making all members volatile.

We might ask, does it make sense to declare an object as both const and volatile?

const volatile int *p;

Although this initially looks like a contradiction, it is not.  The semantics of const in C are “I agree not to try to store to it” rather than “it does not change.”  So in fact this qualification is perfectly meaningful and would even be useful, for example, to declare a timer register than spontaneously changes value, but that should not be stored to (this example is specifically pointed out in the C standard).

Summary: Since C’s type declaration syntax is not particularly readable or intuitive, volatile qualifiers must be placed with care.  Typedefs are a useful way to structure complex declarations.

4. Inconsistent qualification

The last version of Linux 2.2 was 2.2.26.  In that version, in the file arch/i386/kernel/smp.c at line 125, we find this definition:

volatile unsigned long ipi_count;

So far, no problem: we’re declaring a long to store the number of inter-processor interrupts and making it volatile.  However, in the header file include/asm-i386/smp.h at line 178 we find this definition:

extern unsigned long ipi_count;

C files that include this header will not treat ipi_count as volatile, and this could easily cause problems.  Kernels in the 2.3 series also contain this error.

Recent versions of gcc treat this kind of inconsistent qualification as a compile-time error, so these problems have disappeared.  However, it’s a good bet that some embedded compilers (obviously including those based on older versions of gcc) will permit you to make this mistake.

Another way to get inconsistent qualification is through typecasts.  Casts can be implicit, for example passing a pointer-to-volatile to a function expecting an unqualified pointer.  The compiler will warn about this; these warnings should never be ignored or suppressed.  Explicit typecasts that remove qualifiers should be avoided, these generally do not cause any warnings.  The C standard explicitly states that a program’s behavior is undefined if you access a volatile object through an unqualified pointer.

Summary: Never use inconsistent qualification.  If a variable is declared as volatile, then all accesses to it, direct or indirect, must be through volatiles and pointers-to-volatile.

5. Expecting volatile to enforce ordering with non-volatile accesses

Next we come to an issue that even some experts in embedded software development get wrong, and that even experts in C language semantics have arguments about.

The question is: What was wrong with the fixed C code examples above, where we added volatile to the TCNT1 register handle and to the done flag?  The answer, depending on who you believe, is either “nothing” or else “the compiler may reorder the operations in such a way as to create broken output.”

One school of thought is that compilers may not move accesses to global variables around accesses to volatile variables.  There seems to be a consistent reading of the standard that backs this up.  The problem with this reading is that important compilers are based on a different interpretation, which says that accesses to non-volatile objects can be arbitrarily moved around volatile accesses.

Take this simple example (which originated with Arch Robison):

volatile int ready;
int message[100];
void foo (int i) {
  message[i/10] = 42;
  ready = 1;

The purpose of foo() is to store a value into the message array and then set the ready flag so that another interrupt or thread can see the value.  From this code, GCC, Intel CC, Sun CC, and Open64 emit very similar assembly:

$ gcc -O2 barrier1.c -S -o -
  movl 4(%esp), %ecx
  movl $1717986919, %edx
  movl $1, ready
  movl %ecx, %eax
  imull %edx
  sarl $31, %ecx
  sarl $2, %edx
  subl %ecx, %edx
  movl $42, message(,%edx,4)

Obviously the programmer’s intent is not respected here, since the flag is stored prior to the value being written into the array.  As of this writing LLVM does not do this reordering but, as far as I know, this is a matter of chance rather than design.  A number of embedded compilers refuse to do this kind of reordering as a deliberate choice to prefer safety over performance.  I’ve heard, but not checked, that recent Microsoft C/C++ compilers also take a very conservative stance on volatile accesses.  This is probably the right choice, but it doesn’t help people who have to write portable code.

One way to fix this problem is to declare message as a volatile array.  The C standard is unambiguous that volatile side effects must not move past sequence points, so this will work.  On the other hand, adding more volatile qualifiers may suppress interesting optimizations elsewhere in the program.  Wouldn’t it be nice if we could force data to memory only at selected program points without making things volatile everywhere?

The construct that we need is a “compiler barrier.”  The C standard does not provide this, but many compilers do.  For example, GCC and sufficiently compatible compilers (including LLVM and Intel CC) support a memory barrier that looks like this:

asm volatile ("" : : : "memory");

It means roughly “this inline assembly code, although it contains no instructions, may read or write all of RAM.”  The effect is that the compiler dumps all registers to RAM before the barrier and reloads them afterwards.  Moreover, code motion is not permitted around the barrier in either direction.  Basically a compiler barrier is to an optimizing compiler as a memory barrier is to an out-of-order processor.

We can use a barrier in the code example:

volatile int ready;
int message[100];
void foo (int i) {
  message[i/10] = 42;
  asm volatile ("" : : : "memory");
  ready = 1;

Now the output is what we wanted:

$ gcc -O2 barrier2.c -S -o -
  movl 4(%esp), %ecx
  movl $1717986919, %edx
  movl %ecx, %eax
  imull %edx
  sarl $31, %ecx
  sarl $2, %edx
  subl %ecx, %edx
  movl $42, message(,%edx,4)
  movl $1, ready

What about compilers that fail to support memory barriers?  One bad solution is to hope that this kind of compiler isn’t aggressive enough to move accesses around in a harmful way.  Another bad solution is to insert a call to an external function where you would put the barrier.  Since the compiler doesn’t know what memory will be touched by this function, it may have a barrier-like effect.  A better solution would be to ask your compiler vendor to fix the problem and also to recommend a workaround in the meantime.

Summary: Most compilers can and will move accesses to non-volatile objects around accesses to volatile objects, so don’t rely on the program ordering being respected.

6. Using volatile to get atomicity

Earlier we saw a case where volatile was used to make a value visible to a concurrently running computation.  This was — in limited circumstances — a valid implementation choice.  On the other hand it is never valid to use volatile to get atomicity.

Somewhat surprisingly for a systems programming language, C does not provide guarantees about atomicity of its memory operations, regardless of the volatility of objects being accessed.  Generally, however, individual compilers will make guarantees such as “aligned accesses to word-sized variables are atomic.”

In most cases, you use locks to get atomicity.  If you’re lucky, you have access to well-designed locks that contain compiler barriers.  If you’re programming on bare metal on an embedded processor, you may not be so lucky.  If you have to devise your own locks, it would be wise to add compiler barriers.  For example, older versions of TinyOS for AVR chips used these functions to acquire and release the global interrupt lock:

char __nesc_atomic_start (void) {
  char result = SREG;
  return result;
void __nesc_atomic_end (char save) {
  SREG = save;

Since these functions can be (and generally are) inlined, it was always possible for the compiler to move code outside of a critical section.  We changed the locks to look like this:

char__nesc_atomic_start(void) {
  char result = SREG;
  asm volatile("" : : : "memory");
  return result;
void __nesc_atomic_end(char save) {
  asm volatile("" : : : "memory");
  SREG = save;

Perhaps interestingly, this had no effect on TinyOS efficiency and even made the code smaller in some cases.

Summary: Volatile has nothing to do with atomicity.  Use locks.

7. Using volatile on a modern machine

Volatile is of very limited usefulness on a machine that is out-of-order, multiprocessor, or both.  The problem is that while volatile forces the compiler to emit memory operations on the actual machine, these loads and stores by themselves do not constrain the hardware’s behavior very much.  It is vastly preferable to find good locking primitives and use them, as opposed to rolling them yourself.  Consider this spin-unlock function from the ARM port of Linux:

static inline void arch_spin_unlock(arch_spinlock_t *lock) {
  __asm__ __volatile__("str %1, [%0]\n" : : "r" (&lock->lock), "r" (0) : "cc");

Before unlocking, smp_mb() is executed, which boils down to something like this:

__asm__ __volatile__ ("dmb" : : : "memory");

This is both a compiler barrier and a memory barrier.

Summary: Without help from properly designed synchronization libraries, writing correct concurrent code on an out-of-order machine or multiprocessor is extremely hard, and volatile is of little help.

8. Using volatile in multi-threaded code

This issue overlaps with the previous two but it’s important enough to be worth repeating.  Arch Robison says that volatile is almost useless for multi-threaded programming.  And he’s right.  If you have threads then you should also have locks, and should use them.  There’s a wonderful result showing that properly synchronized code — where shared variables are always accessed from within critical sections — executes in a sequentially consistent fashion (provided that the locks are properly implemented, and you shouldn’t have to worry about that).  This means that if you use locks, you don’t have to worry about compiler barriers, memory system barriers, or volatile.  None of it matters.

Summary: To write correct multi-threaded code, you need primitives providing (at least) atomicity and visibility.  On modern hardware volatile provides neither.  You should write properly synchronized multi-threaded code whenever possible.  Idioms like double-checked locking are best avoided.

9. Assume volatile accesses are translated correctly

Compilers are not totally reliable in their translation of accesses to volatile-qualified objects.  I’ve written extensively about this subject elsewhere, but here’s a quick example:

volatile int x;
void foo (void) {
  x = x;

The proper behavior of this code on the actual machine is unambiguous: there should be a load from x, then a store to it.  However, the port of GCC to the MSP430 processor behaves differently:

$ msp430-gcc -O vol.c -S -o -

The emitted function is a nop.  It is wrong.  In general, compilers based on gcc 4.x are mostly volatile-correct, as are recent versions of LLVM and Intel CC.  Pre-4.0 versions of gcc have problems, as do a number of other compilers.

Summary: If your code makes correct use of volatiles and still does not work, consider reading the compiler’s output to make sure it has emitted the proper memory operations.

What about the Linux people?

You can find various rants, screeds, and diatribes against volatile on Linux mailing lists and web pages.  These are largely correct, but you have to keep in mind that:

  1. Linux often runs on out-of-order multicores where volatile by itself is nearly useless.
  2. The Linux kernel provides a rich collection of functions for synchronization and hardware access that, properly used, eliminate almost all need for volatile in regular kernel code.

If you are writing code for an in-order embedded processor and have little or no infrastructure besides the C compiler, you may need to lean more heavily on volatile.


Optimizing compilers are tricky to reason about, as are out-of-order processors.  Also, the C standard contains some very dark corners.  The volatile qualifier invites all of these difficulties to come together in one place and interact with one another.  Furthermore, it provides weaker guarantees than people commonly assume.  Careful thought is required to create correct software using volatile.

Happily, most programmers who write user-mode C and C++ can get away without ever using volatile.  Also, the vast majority of kernel mode programmers will seldom if ever need volatile.  It is primarily needed by people working near bare metal; for example, working on an embedded microcontroller or porting an OS to a new platform.


I’ve been programming computers for 26 years and embedded systems for 17 years.  For the last eight years I’ve taught operating systems, embedded systems, compilers, and related courses to undergrads and graduate students.  I’ve tried to educate them about volatile and have a pretty good idea about where people go wrong with it.

In February 2010 I gave some lectures at RWTH Aachen, including about an hour on getting the volatile qualifier wrong.  This post expands on that material.