The PhD Grind, and Why Research Isn’t Like Sex

Phil Guo’s short online book, The PhD Grind, is the best description of the modern PhD experience in CS that I know of. People working on, or thinking about working on, a PhD in CS should read it. In this post I just want to comment on a few things.

Phil vividly describes the sinking feeling that he was just doing grunt work, not research. This feeling is extremely common, as is the actual fact of doing grunt work. In systemsy areas of CS there’s just no way to avoid building lots of stuff, fixing broken crap that other students (or, worse, professors) have built, running repetitive experiments that are hard to fully automate, etc. But here’s the thing: doing research isn’t like having sex where you might suddenly say to yourself: “Hey, I’m doing it!” Rather, most of the time, the research comes out only in hindsight. You look back over a year or three of hard work and say: What was interesting about what we did there? What did we do and learn that was new? If you can’t find anything then you’re screwed, but usually this only happens when the project was poorly conceived in the first place.

The “but I’m not doing research” problem comes up so often that I have a little canned speech about it that boils down to: “Work hard and trust me.” Really, this is the only way forward, because consciously trying to do research every day just makes no sense when you’re working on a large software system. Students who fail to heed this advice risk getting stuck in a kind of research paralysis where they stop making progress. I’m not saying that “Aha!” moments don’t exist. They do, and they’re great. But their importance is greatly overstated in narratives about research breakthroughs. For one thing, nine out of ten of these moments result in a useless or non-novel insight. For another, these moments are only possible because of all the preceding hard work. So who’s to say that the grunt work isn’t part of the research process too?

Another common experience that Phil illustrates nicely is the difficulty of coping with truly open problems. Our high schools and universities have not prepared us for this. I have a hunch that some of the brightest people from the best schools may be especially vulnerable to this problem because they’re so used to winning. Nothing has ever made them feel stupid before. Research makes you feel stupid—if it doesn’t, you’re not doing it right.

Although Phil shows far better than average judgement throughout his career as a PhD student, he seems to have made a subtle error regarding the role of publications. He clearly describes the Stanford culture: publish or perish, even as a student. It’s much the same elsewhere. But it’s easy to buy into this idea a little too much (this is very similar to the disastrous phenomenon where students become too invested in grades and test scores). For example:

At this point, I had given up all hope of getting a job as a professor, since I still had not published a single paper for my dissertation; competitive computer science faculty job candidates have all already published a few acclaimed first-author papers by this time in their Ph.D. careers.

This may be true, but even if so it’s largely a matter of luck. Most of the time, a student who publishes a solid top-tier paper early in their career does this because they were handed a meaty project in a hot area. Another way to say this is that even very talented students flounder when faced with the problem of selecting a topic to work on—so they rely (rightly) on their advisor. But here’s the thing: we, on the faculty hiring committee, are aware of all this. We know that there’s a fine, but crucial, line between the student with six top-tier papers who was fed every one of the ideas by his advisor, and the student who published only three or four papers but on her own ideas and agenda. Top-10 CS departments produce many of the first kind of student each year. Phil (who I know slightly in person) is one of the second kind. Even top-tier schools produce only a few of these people and they are exceptionally valuable. On a hiring committee, I would fight to interview this kind of person, though this doesn’t happen very often—these people are rare in the first place and then many of them do not continue on to academia. So, when Phil says this:

… I sensed that my current publication record wasn’t impressive enough to earn me a respectable tenure-track faculty job. My hunch was confirmed months later when I saw that, sadly, fellow students with slightly superior publication records still didn’t get any job offers.

He’s assuming that we just count top-tier papers instead of looking at the individual and their context. This happens sometimes, of course, but we try not to do it. The best hiring committees do not do this. Alas, as the bar for new hires continues to rise, it’s getting harder and harder to hire people like Phil. This is not good for our field.

Here Phil gets at a separate, and perhaps deeper, problem:

Of the 26 Stanford Computer Science Department Ph.D. graduates in my year, I consider myself fairly mediocre from an academic perspective since most of my papers were second-tier and not well-received by the establishment. My dissertation work awkwardly straddled several computer science subfields–Programming Languages, Human-Computer Interaction, and Operating Systems–so it wasn’t taken seriously by the top people in any one particular subfield.

Similarly:

The kinds of research topics I’m deeply passionate about aren’t very amenable to winning grant funding, because they aren’t well-accepted by today’s academic establishment.

There’s a large element of truth here. The research process (and particularly its funding aspect) is unfortunately very conservative. But this doesn’t necessarily mean that a person should accept defeat. Rather, take it as a challenge. The question is: How can I impose my vision on the paper committees and the funding committees? It’s hard, and it’ll take some time, but it can be done. People on these committees want to make the right decisions—you just have to give them the ammunition they need in order to actually do it instead of funding or accepting the next piece of boring, incremental work.

Anyway, I am not criticizing Phil’s decision to give academia a miss. It’s his decision and there’s every chance it’s the right one. But I do think he might have overestimated the importance of the top-tier paper and underestimated people’s ability to appreciate the subtleties of a budding research career and to cope with novel, interdisciplinary work.

Finally, Phil has a nice answer to the question: “If you are not going to become a professor, then why even bother pursuing a Ph.D.?”

Here is an imperfect analogy: Why would anyone spend years training to excel in a sport such as the Ironman Triathlon—a grueling race consisting of a 2.4-mile swim, 112-mile bike ride, and a 26.2-mile run—when they aren’t going to become professional athletes? In short, this experience pushes people far beyond their physical limits and enables them to emerge stronger as a result. In some ways, doing a Ph.D. is the intellectual equivalent of intense athletic training.

This rings true.

Maybird

Over the weekend my friend Derek and I did a short backpacking trip into Maybird Gulch, a minor sub-drainage of Little Cottonwood Canyon near Salt Lake City. Amazingly, on a Friday night our only company was a pack of coyotes (or perhaps this isn’t so amazing—most of Maybird is very steep and/or rocky terrain).

There’s a nameless peak on the ridge dividing Maybird from Hogum Fork that I’ve long been interested in climbing. We traversed along its knife-edged south ridge but turned around when we started doubting our ability to cling to the rocks in the 40+ MPH wind gusts. Down at our campsite the winds were lower but it was still a pretty noisy night. Unlike most years in mid-June, the snow was almost gone; we were lucky to find enough to cool down the beer.

Burning in a Module with Random Unit Testing

Sometimes a class or subsystem makes us uneasy; when something goes wrong in our software, we’ll immediately suspect the shady module is somehow involved. Often this code needs to be scrapped or at least refactored, but other times it’s just immature and needs to be burned in. Randomized unit testing can help with this burn-in process, increasing our confidence in a module.

This post isn’t about creating a random unit tester. Sometimes you can reuse a tester such as Quickcheck or Randoop. If not, it’s easy to create one by hand; I’ll cover that in a separate post.

The point is to increase our confidence in a module. But what does that mean? I’ll try to clarify with an example. Let’s say I implement a new data structure, perhaps a B-tree. Even after doing some unit testing, I probably won’t be particularly confident in its correctness. The question is: Under what conditions can I use random testing to gain as much confidence in my new B-tree as I would have in a balanced search tree from the C++ STL? For that matter, how confident am I in a random data structure from the STL in the first place? I would say “moderately confident.” That is, in general I would be happy to just use STL code in my program without specifically unit-testing it first. On the other hand, I would not just reuse the STL if I were developing safety-critical software.

Almost every time I write a software module that has a clean API and that might be reused, I write a random tester for it. Over the years I’ve developed a sort of informal procedure that, if successful, results in a burned-in module that I’m fairly confident about. Here are the necessary conditions.

  • Understandable code, clean API — I have to be able to understand the entire module at once. If not, it needs to be broken into units that are tested separately. The module can’t contain spaghetti logic or make use of extraneous state. If it does, it needs to be refactored or rewritten before burning in.
  • Heavy use of assertions — Every precondition and postcondition that can be checked, is. A repOk() / checkRep() method exists that does strong invariant checking. During fuzzing it is invoked after every regular API call.
  • Mature fuzzer for the module’s API — The random unit tester for the module is strong: it has been iteratively improved to the point where its tests reach into all parts of the module’s logic. (A minimal sketch of such a fuzzing loop appears just after this list.)
  • Fault injection — APIs used (as opposed to provided) by the module, such as system calls, have been tested using mocks that inject all error conditions that might happen in practice.
  • Good coverage — The maturity of the fuzzer is demonstrated by 100% coverage of branches that I believe should be covered. This includes error checking branches, but of course does not include assertion failure branches. At the system testing level, 100% coverage is generally impossible, but at the unit level I consider it to be mandatory. Coverage failures indicate bad code, a bad API, or a bad random tester.
  • Separate validation of code that is sneaky with respect to coverage — Branch coverage is, in some cases, a very weak criterion. Examples include complex conditionals and code that uses lookup tables. These have to be separately validated.
  • Checkers are happy — Valgrind, IOC, gcc -Wall, pylint, or whatever tools apply to the code in question are happy, at least up to the point of diminishing returns.
  • Oracles are happy — If a strong oracle is available, such as an alternative implementation of the same API, then it has been used during random testing and no important differences in output have been found.
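
To make this concrete, here is a minimal sketch of the kind of fuzzing loop I have in mind; it is just an illustration of mine, using a toy bounded stack as a stand-in for whatever module is being burned in, with a checkRep()-style invariant check run after every API call. A real burn-in would add fault injection for any APIs the module consumes, coverage measurement, and a stronger invariant.

```c
/* Toy example: a bounded stack standing in for the module under test, plus a
   random tester that checks the representation invariant after every call. */
#include <assert.h>
#include <stdlib.h>

#define CAP 8

struct stack {
  int data[CAP];
  int top;                             /* number of elements currently stored */
};

static void stack_init(struct stack *s) { s->top = 0; }
static int  stack_full(const struct stack *s)  { return s->top == CAP; }
static int  stack_empty(const struct stack *s) { return s->top == 0; }

static void stack_push(struct stack *s, int v) {
  assert(!stack_full(s));              /* precondition */
  s->data[s->top++] = v;
}

static int stack_pop(struct stack *s) {
  assert(!stack_empty(s));             /* precondition */
  return s->data[--s->top];
}

/* repOk()/checkRep(): strong invariant check */
static void check_rep(const struct stack *s) {
  assert(s->top >= 0 && s->top <= CAP);
}

int main(void) {
  srand(1);                            /* fixed seed so failures reproduce */
  for (int test = 0; test < 10000; test++) {
    struct stack s;
    stack_init(&s);
    check_rep(&s);
    for (int op = 0; op < 1000; op++) {
      if (rand() % 2) {
        if (!stack_full(&s))
          stack_push(&s, rand());
      } else {
        if (!stack_empty(&s))
          (void)stack_pop(&s);
      }
      check_rep(&s);                   /* invariant holds after every API call */
    }
  }
  return 0;
}
```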

You could argue that this list has little to do with random testing, but I’d disagree. I seldom if ever trust a software module unless it has been subjected to a broad variety of inputs, and it can be very hard to get these diverse inputs without using random numbers somehow. A haphazard (or even highly systematic) collection of unit tests written by myself or some other developer does not accomplish this, for code of any complexity.

A lot of real-world software, particularly in web-land, seems to be burned in by deploying it and watching the error logs and mailing lists. This development style has its place, especially since there’s a lot of software that’s just not easy to unit test. Test via deployment is how most of my group’s open-source projects work, in fact. But that doesn’t mean that it’s not satisfying and useful to be able to produce a piece of high-quality software the first time.

Might it be possible to bypass the burn-in process, for example using formal verification? Absolutely not, though we would hope that verified software contains fewer errors. Also, the errors found will tend to have a different character. The relationship between software testing and verification is a tricky issue that will increase in importance over the next few decades.

This post would be incomplete without mentioning that people with a background in academic software engineering sometimes claim that random testing can improve our confidence in software in a totally different sense from what I’m talking about here. In that line of thinking, you create an operational profile for what real inputs look like, you generate random test cases that “look like” inputs in the profile, and then finally you devise a statistical argument that gives a lower bound for the reliability of the software. I don’t happen to believe that this kind of argument is useful very often.

Slightly More Sensible Signed Left-Shifts in C11 and C++11

Left-shift of signed integers in C99, C11, and C++11 is difficult to use because shifting a 1 bit into or past the sign bit (assuming two’s complement, of course) is an undefined behavior. Many medium and large C and C++ programs do this. For example, many codes use 1<<31 for INT_MIN. IOC can detect this problem, but it happens so commonly, and developers are so uninterested in hearing about it, that we added command line flags that tell IOC to not check for it, in order to avoid annoying people.
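
To illustrate, here is a tiny example of mine, assuming a 32-bit two’s complement int (which is what essentially every current platform provides):

```c
#include <limits.h>
#include <stdio.h>

int shift_into_sign_bit(void) {
  int amount = 31;
  return 1 << amount;   /* undefined behavior in C99/C11/C++11: a 1 bit is
                           shifted into the sign position */
}

unsigned shift_unsigned(void) {
  return 1u << 31;      /* well defined: unsigned left shift */
}

int main(void) {
  /* on most implementations this prints -2147483648 2147483648 -2147483648,
     but the first value relies on undefined behavior */
  printf("%d %u %d\n", shift_into_sign_bit(), shift_unsigned(), INT_MIN);
  return 0;
}
```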

Today a reader pointed me to this active issue that the C++ standards committee is evaluating. Basically the proposal is to make it permissible to shift a 1 into the sign bit, but not past it. The effect will be to un-break a significant number of C/C++ programs that currently execute undefined behavior when compiled as C99, C11, or C++11. Issue 1457 is marked as “ready,” meaning (as far as I can tell) that it stands a good chance of being ratified and incorporated into future versions of the C11 and C++11 standards. This seems like a step in the right direction.

The Central Limit Theorem Makes Random Testing Hard

I believe that the central limit theorem provides a partial explanation for why it can be very difficult to create an effective random tester for a software system.

Random testing is carpet bombing for software: the more of the system you can hit, the better it works. The central limit theorem, however, tells us that the sum or average of many independent random choices is going to follow an approximately normal distribution. This is important because:

  1. Sophisticated random test-case generators take as input not one but many random numbers, effectively combining them in order to create a random test case.
  2. A normal distribution, whose probability decays exponentially as we move away from its center, is a very poor choice for carpet bombing.

An Example

Let’s say we’re testing a container data structure such as a bounded queue. It has a capacity bug that fires only when an element is added to a full queue. The obvious way to generate random test cases for a container is to randomly generate a sequence of API calls. Let’s assume that only the enqueue and dequeue operations are available, and that we generate each with 50% probability. How likely is it that our random tester will discover the capacity bug? The number of elements in the queue is just a 1-dimensional random walk (bounded below by zero, since dequeuing from an empty queue does nothing), telling us that the probability of filling the queue within a fixed-length test case will decrease exponentially as the capacity of the bounded queue increases.

What does this mean, concretely? If the queue is 64 elements in size and a random test case is 10,000 enqueue/dequeue operations, each test case has about a 90% chance of finding the queue bug by attempting to add an element to a full queue. This drops to 50% if the queue holds 128 elements, and to just a few percent if the queue holds 256 elements. Larger queues make it quite unlikely that this corner case will be tested.
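
These numbers are easy to check with a quick simulation. Here is a rough sketch of mine, assuming (as above) that a dequeue on an empty queue is a no-op; changing CAPACITY to 128 or 256 reproduces the drop-off.

```c
/* Monte Carlo estimate of the probability that a random test case of NUM_OPS
   50/50 enqueue/dequeue operations ever attempts to enqueue onto a full queue
   of the given capacity. */
#include <stdio.h>
#include <stdlib.h>

#define CAPACITY 64      /* try 128 and 256 as well */
#define NUM_OPS  10000
#define TRIALS   10000

int main(void) {
  int hits = 0;
  srand(1);
  for (int t = 0; t < TRIALS; t++) {
    int size = 0;
    for (int i = 0; i < NUM_OPS; i++) {
      if (rand() % 2) {              /* enqueue */
        if (size == CAPACITY) {      /* would trigger the capacity bug */
          hits++;
          break;
        }
        size++;
      } else if (size > 0) {         /* dequeue; no-op on an empty queue */
        size--;
      }
    }
  }
  printf("fraction of test cases reaching the full-queue case: %.3f\n",
         (double)hits / TRIALS);
  return 0;
}
```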

In this trivial example, the central limit theorem is not so likely to bite us. We’ll probably notice (using a code coverage tool or similar) that the “enqueue on full” condition is not tested, and we’ll cover this behavior by hand.

Another Example

A while ago I posted this bit puzzle. I won’t repeat the problem description, but the quick summary is that the goal is to find an extremely efficient implementation for some fairly simple functions, each of which operates on a pair of bit vectors. The obvious way to test a function like this is to generate two random bit vectors and then inspect the output. In general “inspect the output” is tricky but it’s trivial here since my slow example code serves as an oracle.

But now let’s ask a simple question: given a pair of random 64-bit numbers, what is the Hamming distance between them? This is the distribution:

This is, of course, a binomial distribution centered at 32, which is essentially the normal distribution that the central limit theorem predicts. This distribution shows up here because each bit is effectively a separate random number, for purposes of the distance computation.

But why do we care? This matters because the functions of interest in my earlier blog post have a special case when the Hamming distance is 0. That is, when the input bit vectors are equal it’s not totally clear what to do unless you look at my reference implementation. In fact, if you read the comments you can see that a couple of people wrote buggy code due to missing this case. As the graph above makes clear, the probability of generating two 64-bit vectors whose Hamming distance is zero is quite low.
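
Concretely, the probability that two independent uniform 64-bit values are equal (Hamming distance zero) is 2^-64, or roughly 5 × 10^-20, so a fuzzer that draws both inputs independently will essentially never exercise this case.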

More Examples

Can we find more examples where the central limit theorem makes it hard to do good random testing? Definitely. Consider fuzzing a filesystem. If the API calls in a test case are equally likely to open and close a file, we’re not likely to reach the maximum number of open files. If a sequence of API calls is equally likely to create and remove a directory, we’re not likely to test the maximum number of subdirectories. Similar arguments can be made for almost any resource limit or edge case in a filesystem.

I suspect that normal distributions in test cases limit the effectiveness of Csmith. For example, since the number of local variables in a function is determined by many calls to the PRNG, we might expect this quantity to be normally distributed. However, certain numbers of local variables (I’m simplifying of course) cause interesting code in the register allocator, such as its spilling and rematerialization logic, to be called. Is Csmith adequately stress-testing that kind of logic for all compilers that we test? Probably not.

My hypothesis is that many similar effects occur at larger scale as we apply random testing to big, complex software systems. The larger the test case, the more likely it will be that some derived property of that test case follows a normal distribution.

Solutions

Here are some possible ways to prevent normally distributed properties from making random testing less effective.

First, we could try to execute an extremely large number of random test cases in order to saturate even parts of the space of test cases that are generated with low probability. But this is not the right solution: the rapid falling off of the normal distribution means we’ll need to increase the number of test cases exponentially.

Second, we can impose (small) arbitrary limits on system parameters. For example, if we test queues of size 4, we’re likely to reach the overflow condition. If we test bit vectors of size 8, we’re likely to test the case where the inputs are equal. This is an extremely powerful testing technique: by reducing the size of the state space, we are more likely to reach interesting edge cases. However, it has a couple of drawbacks. For one thing, it requires modifying the software under test. For another, it requires whitebox knowledge, which may be hard to come by.

Third, we can hack the random tester to reshape its distributions. For the queue example, we might generate enqueue operations 75% of the time and dequeue operations 25% of the time. For the bit functions, we can start with a single random 64-bit number and then create the second input by flipping a random number of its bits (this actually makes the graph of frequencies of occurrences of Hamming distances completely flat). This will fix the particular problems above, but it is not clear how to generalize the technique. Moreover, any bias that we insert into the test case generator to make it easier to find one problem probably makes it harder to find others. For example, as we increase the bias towards generating enqueue operations, we become less and less likely to discover the opposite bug of the one we were looking for: a bug in the “dequeue from empty” corner case.
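
For what it’s worth, here is a minimal sketch of the bit-flipping trick (my code, not the reference implementation from the puzzle post): choosing the number of flipped bits uniformly from 0 to 64 is what makes the resulting Hamming distances uniform. The rand64() helper is a crude stand-in for a real PRNG, and __builtin_popcountll is a GCC/Clang builtin.

```c
/* Generate pairs of 64-bit test inputs whose Hamming distance is uniform on
   0..64 instead of binomially concentrated around 32. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t rand64(void) {
  /* crude 64-bit value built from rand(); a real fuzzer would do better */
  uint64_t r = 0;
  for (int i = 0; i < 4; i++)
    r = (r << 16) ^ ((uint64_t)rand() & 0xffff);
  return r;
}

static void make_input_pair(uint64_t *a, uint64_t *b) {
  *a = rand64();
  *b = *a;
  int k = rand() % 65;                    /* target Hamming distance: 0..64 */
  while (k > 0) {
    uint64_t mask = (uint64_t)1 << (rand() % 64);
    if (!((*a ^ *b) & mask)) {            /* this bit not flipped yet */
      *b ^= mask;
      k--;
    }
  }
}

int main(void) {
  long hist[65] = {0};
  srand(1);
  for (long i = 0; i < 1000000; i++) {
    uint64_t a, b;
    make_input_pair(&a, &b);
    hist[__builtin_popcountll(a ^ b)]++;  /* Hamming distance */
  }
  for (int d = 0; d <= 64; d++)           /* each bucket gets ~1/65 of the total */
    printf("%2d %ld\n", d, hist[d]);
  return 0;
}
```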

Fourth, we can bias the probabilities that parameterize the random tester, randomly, each time we generate a test case. So, for the queue example, one test case might get 2% enqueue operations, the next might get 55%, etc. Each of these test cases will be vulnerable to the constraints of the central limit theorem, but across test cases the centers of the distributions will change. This is perhaps more satisfactory than the previous method but it ignores the fact that the probabilities governing a random tester are often very finely tuned, by hand, based on intuition and experience. In Csmith we tried randomly setting all parameters and it did not work well at all (the code is still there and can be triggered using the --random-random command line option) due to complex and poorly understood interactions between various parameters. Swarm testing represents a more successful attempt to do the same thing, though our efforts so far were limited to randomly toggling binary feature inclusion switches—this seems much less likely to “go wrong” than does what we tried in Csmith. My current hypothesis is that some combination of hand-tuning and randomization of parameters is the right answer, but it will take time to put this to the test and even more time to explain why it works.
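
As a toy version of this idea for the queue example (again just a sketch of mine, nothing to do with Csmith’s actual machinery), give each generated test case its own randomly chosen enqueue probability and see how often the hard-to-reach full-queue corner gets exercised:

```c
#include <stdio.h>
#include <stdlib.h>

#define CAPACITY 64
#define NUM_OPS  10000
#define TESTS    1000

int main(void) {
  int hit_full = 0;
  srand(1);
  for (int t = 0; t < TESTS; t++) {
    /* each test case gets a fresh, randomly chosen enqueue probability */
    double p = (double)rand() / RAND_MAX;
    int size = 0, full = 0;
    for (int i = 0; i < NUM_OPS; i++) {
      if ((double)rand() / RAND_MAX < p) {
        if (size == CAPACITY) full = 1; else size++;
      } else if (size > 0) {
        size--;
      }
    }
    hit_full += full;
  }
  printf("test cases reaching enqueue-on-full: %d/%d\n", hit_full, TESTS);
  return 0;
}
```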

(Just to be clear, in case anyone cares: Swarm testing was Alex’s idea. The “random-random” mode for Csmith was my idea. The connection with the central limit theorem was my idea.)

Summary

A well-tuned random tester (such as the one I blogged about yesterday) can be outrageously effective. However, creating these tools is difficult and is more art than science.

As far as I know, this connection between random testing and the central limit theorem has not yet been studied—if anyone knows differently, please say. I’d also appreciate hearing any stories about instances where this kind of issue bit you, and how you worked around it. This is a deep problem and I hope to write more about it in the future, either here or in a paper.

Update from evening of June 19

Thanks for all the feedback, folks! I agree with the general sentiment that CLT applies a lot more narrowly than I tried to make it sound. This is what you get from reading hastily written blog posts instead of peer reviewed papers, I guess!

Anyway, the tendency for important properties of random test cases to be clumped in center-heavy distributions is a real and painful effect, and the solutions that I outline are reasonable (if partial) ones. I’ll try to reformulate my arguments and present a more developed version later on.

Also, thanks to the person on HN who mentioned the law of the iterated logarithm. I slept through a lot of college and apparently missed the day they taught that one!

1500+ Bugs from One Fuzzer

This metabug links to all of the defects found in Firefox’s JavaScript engine by jsfunfuzz. The surprise here isn’t that bugs were found, but rather that more than 1500 bugs were found in a single language runtime by a single test case generator. I’m interested in exactly what is going on here. One possibility would be that JS performance has become so important in the last five years that it supersedes all other goals, making these bugs inevitable.  Another possibility is that something is wrong with the architecture or the development process of this particular JS engine. It’s also possible that I’m simply out of touch with bug rates in real software development efforts and this kind of result is perfectly normal and expected. Regrettably, jsfunfuzz is no longer public, most likely because it was like handing a loaded gun to the enemy. Anyhow, jsfunfuzz serves as an excellent example of how powerful random testing can be.

PLDI in Beijing

PLDI 2012 was in Beijing earlier this week. Unfortunately I had only one full day to be a tourist; it would have been nice to bail out of the conference for another half day to see more stuff but that didn’t end up happening. My student Yang Chen went to college in Beijing and knew his way around; he showed me and my colleague Mary Hall around on our first day in Beijing. We walked around the Olympic village (right next to the conference hotel) and then took the subway to Tiananmen Square. I liked the way the maps on the subway trains show your current location. The system is huge, probably comparable with London’s; it’s hard to believe it had just two lines in the 1990s.

We spent a half-day in the Forbidden City; we got lucky and had weather in the mid 80s with clear skies. Then we walked around a while and ate Szechuan beef, frogs, and eels. I was a little surprised how spicy everything was, but really it was perfect and didn’t get in the way of the flavors at all. I was a bit worried that my poor chopstick technique would make me look silly, but the Chinese attitude seems to be very practical. I’d have liked to spend a lot more time just walking around but at some point in the afternoon jet lag caught up with us and we took a taxi back to the hotel. My almost-former student Lu Zhao took us out for dinner and the food was superb.

One of the main reasons I like going to conferences is that while I’m naturally monomaniacal, talking to people and listening to talks can kind of shake my brain out of whatever rut it has gotten itself into. This isn’t something that has to happen all that often, but it feels important. Something funny about going to conferences these days is that often there are so many people I want to talk to that I end up missing presentations that I’d have liked to attend. This is in sharp contrast to my experiences at conferences 10 or 15 years ago where I basically didn’t know anybody.

Operating behind the Chinese firewall was annoying, mainly because Google seemed to be probabilistically blocked. Also, it took a while to deprogram myself from clicking on Twitter and Facebook any time I got a bit bored. That was a habit that needed breaking anyway. One of the interesting quirks was fine-grained blocking of Wikipedia pages. For example, the main entry for China loaded perfectly well but I couldn’t get to any page regarding the history of China.

One of the most pleasant things about this trip was learning about a number of Csmith users who I hadn’t known about. The tool appears to be pretty well-known these days and I guess it’s easy enough to use that people actually use it. Open source is weird that way: there are these silent users that you never hear about except by random chance. One user, Wei Huo from the Chinese Academy of Sciences, invited me to visit and give a talk about Csmith; this was great to do. There are also some Csmith users at Tsinghua and two of my students–Xuejun Yang, the main Csmith developer, and Yang Chen–gave talks there on the day that I flew home. A guy from a compiler consulting company told me that they’re getting a lot of bug reports from Csmith, which means that at least some embedded compiler practitioners are aware of Csmith. This is important first because these practitioners are highly non-academic, and second because it is the embedded compiler domain where I think tools like Csmith are most important. In other words, it’s not like GCC for x86 is so buggy that it’s hard to use–but some fraction of embedded compilers are buggy enough to make everyday development painful.

I gave the talk on C-Reduce, which was fun since we mid-career professors don’t get to do this very often–it is almost always the students who give the talks. My reasoning was that even though this paper had some student authors, C-Reduce has been my pet project for several years and I did basically all of the heavy lifting to get our submission out the door. However, between the submission deadline last November and now, Yang Chen did a ton of work on source-to-source transformations for C-Reduce and now his code dominates mine by maybe 10x. So probably I should have let him give the talk, but this possibility didn’t even occur to me until it was too late.

Announcing C-Reduce: A Better Test-Case Reducer for C/C++ Compiler Debugging

Test-case reduction means taking a large input to a computer program (for compiler debugging, the input is itself a program) and turning it into a much smaller input that still triggers the bug. It is a very important part of the debugging process.

Delta, an excellent open-source implementation of the delta debugging algorithm ddmin, has been the test-case reduction tool of choice for compiler developers. We’ve just released C-Reduce: a test case reducer that often gives output 25 times smaller than Delta’s. The contribution of C-Reduce is adding a lot of domain specificity. For example, it knows how to resolve a typedef, flatten a namespace, inline a function call, remove a dead argument from a function, and rename a variable. These (and dozens of other transformations) use Clang as a source-to-source transformation engine. Although domain specificity is displeasing in a reduction tool, C/C++ programs cannot be fully reduced without it. This earlier blog post contains many examples of C-Reduce’s output.
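
To give a flavor of what these transformations buy you, here is a hand-written illustration (to be clear, this is my own toy example and not actual C-Reduce output): resolving the typedef, inlining the call, and removing the dead argument let a fragment like the one below collapse to a single line, something a purely line-oriented reducer cannot do.

```c
/* Before reduction: the kind of code that line-based tools struggle with. */
typedef unsigned long word_t;

static word_t helper(word_t x, word_t unused) {
  return x + 1;
}

word_t f(word_t a) {
  return helper(a, 0);
}

/*
 * After resolving the typedef, inlining helper(), and removing its dead
 * second argument, the whole fragment could be reduced to one line:
 *
 *   unsigned long f(unsigned long a) { return a + 1; }
 */
```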

Resources:

  • Download source from the C-Reduce home page.
  • Learn how to use C-Reduce and see some examples here.
  • See our PLDI 2012 paper for a more thorough explanation and evaluation of C-Reduce.

We’d be happy to hear feedback.

Street Fighting Computer Science

One of my favorite recent books is Street Fighting Mathematics: a collection of techniques and heuristics for rapidly and roughly estimating the solutions to problems that may be very difficult to solve exactly. The book is important because estimation is incredibly useful for understanding the world and because our education system does not do a very good job in teaching it. I think the tacit assumption is that people will pick up street fighting methods on their own.

During the last couple of years I’ve started to wonder how to teach street fighting computer science. The need for rapid estimation comes up all the time, for example, when studying operating systems. How long does it take to read all of a 2 TB hard disk? How long does it take a packet to cross a data center, a city, or a continent? How big is a 36-bit address space? These are very simple problems compared to the ones that Mahajan tackles but even so, not everyone can quickly rattle off approximate answers.
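
To pick on the first question: a reasonable guess for the sustained sequential throughput of a spinning disk is on the order of 100-150 MB/s, so reading all of a 2 TB disk takes roughly 2,000,000 MB / 150 MB/s ≈ 13,000 seconds, or a bit under four hours (and considerably longer if the access pattern isn’t sequential).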

Mahajan’s book is structured around a collection of general-purpose techniques:

  • Dimensional analysis — In CS terms, the universe has a type system and it can be exploited to make inferences and catch mistakes.
  • Easy cases — A correct solution must work for all input values — so set parameters to zero, one, infinity, or whatever makes the math simple.
  • Lumping — Use discrete approximations (perhaps a very coarse one, such as invoking the trapezoid rule with just one trapezoid) instead of calculus.
  • Pictorial proofs — Use visualization to gain intuition into problems whose symbolic forms are hard to understand.
  • Taking out the big part — Split an answer into a big part and a correction; don’t even compute the correction unless more precision is needed.
  • Analogy — Find an analogous but simpler problem, solve it, and project the result back to the original problem.

These apply to street fighting CS just as well as street fighting mathematics. For example, taking out the big part is something we do all the time when determining efficiency of algorithms. On the other hand, the bulk of the technical content in Mahajan’s book is a collection of calculus-based problems about the physical world. These aren’t really where we want to be spending most of our time in a CS course (especially if everyone else’s calculus is as rusty as mine). What might we cover instead? Here are a few ideas:

  • Powers of 2 and 10 — This is basic, but all CS folks need to be able to rapidly convert any power of 2 up to about 40 to the nearest (or at least a close) power of 10, and vice versa. Similarly, converting between hex and binary should be quick. (A quick worked example follows this list.)
  • Important constants — These are mainly sizes (hard disk, CD/DVD, RAM on a PC and smartphone, various caches, etc.), times (packet transmission, cycles, seeks, interrupt latencies, human perceptual constants), and rates (signal propagation in copper and fiber, bandwidth of various interconnects, power consumption of common devices, etc.). Here’s a nice list of times. A few important ASCII codes (such as those for 0, A, and a) should be memorized.
  • Counting arguments — The pigeonhole principle and such.
  • Lopsided distributions — For example, when estimating flight time through a network using measurements, all outliers will be on one side.
  • Rules of thumb — Amdahl’s Law, Occam’s Razor, etc.
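
As a quick example of the first item: 2^10 ≈ 10^3 (the approximation is off by only about 2.4%), so 2^36 = 2^6 × (2^10)^3 ≈ 64 × 10^9, which answers the earlier question about a 36-bit address space: it covers roughly 64 billion bytes.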

Next, we want a collection of diverse example problems to work out. Probably these will mostly come from people’s everyday experience making computer systems work, but we can also draw on other sources.

Basically I’m just fishing around here to see if this idea has legs; I’d appreciate feedback.