The Zion Subway

When Josh, my older son’s best friend’s dad, suggested that we take our combined kids through the Left Fork of North Creek in Zion National Park (more commonly called The Subway), I wasn’t immediately excited. For one thing, it’s a somewhat technical canyon, and for another the permit that we got was for mid-May, towards the end of the spring runoff, when the canyon would contain plenty of deep, cold water.

The first problem to be solved was figuring out a process where we could get our kids safely down multiple rappels in the canyon. We couldn’t just hire an experienced guide since Zion NP doesn’t allow that. We settled on spending a couple of sessions practicing rappelling with an instructor before leaving home, and then doing some more practicing in Zion before entering the canyon. I was ready to nix the trip if any of the kids seemed unsafe but they all did really well. We ended up with a process where Josh would rappel first, then the kids would rappel down with me backing them up with a belay, and then finally I’d rappel with Josh giving me a fireman’s belay. The rappels in this canyon are fairly short so we knew that a number of common failure modes (unable to communicate, rapping off the end of the rope, etc.) weren’t going to be an issue. Also the canyon is bolted so we wouldn’t have to worry about building anchors. We spent a lot of time making sure the kids wouldn’t get fingers/gloves/clothes pinched in their belay devices.

The second problem was dealing with 40°F / 4°C water. We ended up renting drysuits for the three younger children and putting the rest of us in wetsuits; this worked well. I saw a bit of chattering teeth in the longer water sections but luckily we were in the deepest part of the canyon in the middle of the day and there were always patches of sun to warm up in.

Here the kids and I are hanging out at the upper trailhead while Josh makes the car shuttle happen:

The upper part of the hike is a short section of alpine forest and then some gorgeous slickrock:

Finally we’re looking directly into the narrows, but still a couple hundred vertical feet above the canyon bottom:

A steep gully bypasses the cliffs:

And finally we’re in the canyon, getting suited up at the first sign of deep water:

Alas I have no rappel pictures since I was managing the process from above. The second rappel was challenging: it had an awkward start, running water, and finished in waist-deep water. Here Josh is coiling a rope at the bottom of the first, easy rappel, which was down the face of this boulder:

Since each person’s backpack had a drybag inside of it (with as much air trapped as possible), the backpacks could be used as flotation devices. Also, all of the kids are decent swimmers and the drysuits kept them pretty warm. They found the wet parts of this canyon to be tremendously fun:

Plenty of short, slippery downclimbs:

A little unnerving to watch the kids swimming off into the dark:

The scenery was really spectacular:

Finally we arrive at the actual “subway” section where the canyon bottom is rounded out:

Perhaps the most-photographed log in the world, not particularly photogenic here due to the harsh light, but other people have done better:

After this there’s a final technical obstacle, a 15 m rappel, and then a long and not particularly easy or fun walk out.

Still smiling at the end, but tired after 10 hours on the move:

Pointer Overflow Checking is in LLVM

Production-grade memory safety for legacy C and C++ code has proven to be a frustratingly elusive goal: plenty of research solutions exist but none of them appear to be deployable as-is. So instead, we have a patchwork of partial solutions such as CFI, ASLR, stack canaries, hardened allocators, and NX.

Today’s quick post is about another piece of the puzzle that very recently landed in LLVM: pointer overflow checking. At the machine level a pointer overflow looks just like an unsigned integer overflow, but of course at the language level the overflowing operation is pointer arithmetic, not unsigned integer arithmetic. Keep in mind that in these languages, unsigned overflow is defined but signed overflow is undefined. Pointer overflow is a weak indicator of undefined behavior (UB): the stricter rule is that it is UB to create a pointer that lies before the start of an allocated object or more than one element past its end. It is UB merely to create such a pointer; it does not need to be dereferenced. Also, it is still UB even if the overflowed pointer happens to refer to some other allocated object.

Here is the patch; it was originally developed by Will Dietz (who is doing his PhD at UIUC under Vikram Adve) and then pushed into the tree by Vedant Kumar (a compiler hacker at Apple). In 2013, Will wrote a great blog post about the patch, showing lots of examples of pointer overflows in open source programs. Also see an earlier post of mine.

To see pointer overflow checking in action you’ll need to build a very recent Clang/LLVM (r304461 or later) from source, and then you can try out this stupid little program:

$ cat pointer-overflow.c
#include <stdio.h>
#include <stdint.h>

int main(void) {
  for (int i, *p = &i; ; p += 1000)
    printf("%p\n", p);
}
$ clang -O3 pointer-overflow.c -Wall -fsanitize=pointer-overflow -fsanitize-trap=pointer-overflow -m32
$ ./a.out
0xff8623c4
...
Illegal instruction

Of course the result is much the same if the pointer is decremented in the loop, instead of incremented; it just takes longer to hit the overflow.

The transformation implemented by the compiler here is pretty straightforward. Here’s IR for the uninstrumented program (I cleaned it up a bit):

define i32 @main() {
entry:
  %i = alloca i32, align 4
  br label %for.cond

for.cond:
  %p.0 = phi i32* [ %i, %entry ], [ %add.ptr, %for.cond ]
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i32* %p.0)
  %add.ptr = getelementptr inbounds i32, i32* %p.0, i32 1000
  br label %for.cond
}

To instrument the program, the last two instructions are changed into these three instructions (and also a trap basic block is added, which simply calls the LLVM trap intrinsic):

  %1 = icmp ult i32* %p.0, inttoptr (i32 -4000 to i32*)
  %add.ptr = getelementptr inbounds i32, i32* %p.0, i32 1000
  br i1 %1, label %for.cond, label %trap

The icmp checks whether the not-yet-incremented pointer is below 0xfffff060, in which case it can be incremented without overflowing.
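The same comparison can be mirrored in plain C. This is only a sketch of the arithmetic, not the code that the sanitizer emits: it models a 32-bit address as a uint32_t, it assumes a 4-byte int, and the helper names add_wraps_u32 and increment_is_safe are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does base + bytes wrap around a 32-bit address space?
   Addresses are modeled as plain unsigned integers, where wraparound is
   well defined, so this function itself contains no UB. */
bool add_wraps_u32(uint32_t base, uint32_t bytes) {
    return base > UINT32_MAX - bytes;
}

/* The loop in the example adds 1000 ints per iteration, which is 4000 bytes
   when int is 4 bytes wide (an assumption of this sketch). The increment is
   safe exactly when base is below 0xfffff060. */
bool increment_is_safe(uint32_t base) {
    return !add_wraps_u32(base, 1000u * 4u);
}
```

Calling increment_is_safe on 0xfffff05f and on 0xfffff060 from a small main shows the boundary: the first address can still be incremented, the second cannot.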

Can pointer overflow checking be used as a mitigation in production code? This should be fine if you (as I did above) use the -fsanitize-trap=pointer-overflow flag to avoid dragging in any of the UBSan runtime library. But how efficient is it? I ran SPEC INT 2006 with and without pointer overflow checking. 400.perlbench actually contains pointer overflows so we’ll leave it out. Here are the raw scores with and without pointer overflow checking, and here are the increases in runtime due to pointer overflow checking, sorted from best to worst:

462.libquantum -1%
429.mcf 5%
471.omnetpp 5%
403.gcc 9%
483.xalancbmk 12%
473.astar 27%
401.bzip2 34%
445.gobmk 50%
458.sjeng 79%
464.h264ref 113%
456.hmmer 119%

Keep in mind that this implementation is totally untuned (the patch landed just today). No doubt these scores could be improved by teaching LLVM to eliminate unnecessary overflow checks and, when that doesn’t work, to hoist checks out of inner loops.

Although, in the example above, I enabled pointer overflow checking using an explicit flag, these checks are now part of UBSan and -fsanitize=undefined will enable them.

Compiler Optimizations are Awesome

This piece, which I hadn’t gotten around to writing until now since I thought it was all pretty obvious, explains why Daniel J. Bernstein’s talk, The death of optimizing compilers (audio) is wrong, and in fact compiler optimizations are extremely wonderful and aren’t going anywhere.

First, the thesis of the talk is that almost all code is either hot, and therefore worth optimizing by hand, or else cold, and therefore not worth optimizing at all (even with -O). Daniel Berlin, a compiler person at Google, has looked at the data and disagrees. We can also refute Bernstein’s argument from first principles: the kind of people who can effectively hand-optimize code are expensive and not incredibly plentiful. If an optimizing compiler can speed up code by, for example, 50%, then suddenly we need to optimize a lot less code by hand. Furthermore, hand-optimized code has higher ongoing maintenance costs than does portable source code; we’d like to avoid it when there’s a better way to meet our performance goals.

Second, size matters. Most of the computers in the world are embedded and many of these are storage-constrained. Compiler optimization reduces code size and this phenomenon is completely independent of the hot/cold issue. Without optimization we’d have to buy more expensive deeply-embedded processors that have more on-chip flash memory, and we’d also have to throw away many of those 16 GB phones that are cheap and plentiful and fairly useful today.

Third, most future software won’t be written in C and C++ but rather in higher-level languages, which more or less by definition rely on the optimizer to destroy abstraction layers, do compile-time memory management, etc.

Finally, I claim that the economics of compiler optimization are excellent. A lot of dollars are spent each year making code run faster, either by buying hardware resources or by paying programmers to write faster code. In contrast, there are probably a few thousand people actively doing compiler optimization work, and just about everyone benefits from this. If we can centralize on fewer compiler infrastructures, like GCC and LLVM and V8, then the economics get even better.

In summary, of course there’s plenty of hot code that wants to be optimized by hand, and of course there’s plenty of cold code that sees little benefit due to optimizing compilers. But neither of these facts forms an argument against optimizing compilers, which are amazingly useful and will continue to be for the indefinite future.

Translation Validation of Bounded Exhaustive Test Cases

This piece is jointly authored by Nuno Lopes and John Regehr.

Compilers should be correct, but it is not straightforward to formally verify a production-quality compiler implementation. It is just too difficult to recover the high-level algorithms by looking at an enormous mess of arithmetic, loops, and memory side effects. One solution is to write a new compiler such as CompCert that is designed to be verified. Alternatively, we keep our large, low-level code base such as GCC or LLVM and settle for weaker forms of validation than formal verification. This piece is about a new way to do the second thing. Our focus is the middle-end optimizers, which seem to be the most difficult part of a compiler to get right. The target is LLVM.

End-to-end compiler testing, supported by a random source code generator like Csmith, is great — but it only gets us so far. The expressiveness of the program generator is one limitation, but a more serious problem is the normalization that happens in the compiler frontend. The issue is that there are a lot of valid code patterns that Clang will never emit and that are therefore impossible to test by driving Clang. As a Clang user you may not happen to care about this, but as LLVM people we want the middle-end optimizations to be free of logic errors and also the non-Clang-emittable code is important in practice since there are lots of frontends out there besides Clang.

The first step is to generate lots of LLVM IR. Rather than creating a relatively small number of large functions, as Csmith would do, this IR generator generates lots of tiny functions: it uses bounded exhaustive test generation to create every LLVM function up to a certain size. A fun thing about this kind of generator is its choose() operator. In random mode, choose() returns a random number; in exhaustive mode, it uses fork() to explore all alternatives. While this isn’t the most efficient way to do search, leveraging the OS keeps the generator very simple. The most vexing thing about this design is allowing it to use multiple cores while stopping it from being a fork bomb. The current version doesn’t contain code that tries to do this.
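To make the fork()-based idea concrete, here is a minimal sketch of an exhaustive-mode choose() in C. It is not the generator’s actual code: the function bodies, the count_paths demo, and the pipe-based counting are all assumptions made for illustration. This version sidesteps the fork-bomb problem in the bluntest possible way, by having each parent wait for its child, which also means it uses only one core.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exhaustive-mode choose(): every value in [0, n) is returned, each in its
   own process. The parent waits for each child before forking the next, so
   only one path runs at a time (slow, but it cannot become a fork bomb). */
int choose(int n) {
    assert(n > 0);
    for (int i = 0; i < n - 1; i++) {
        pid_t pid = fork();
        if (pid < 0)
            _exit(EXIT_FAILURE);     /* cannot fork: abandon this path */
        if (pid == 0)
            return i;                /* child explores alternative i */
        waitpid(pid, NULL, 0);       /* parent resumes once that subtree is done */
    }
    return n - 1;                    /* the caller itself takes the last one */
}

/* Demo: make two choices and count how many complete paths get explored.
   Each path reports itself by writing one byte to a pipe. */
long count_paths(int first, int second) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t root = fork();
    if (root < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (root == 0) {
        close(fds[0]);
        int a = choose(first);
        int b = choose(second);
        char tag = (char)('a' + a + b);  /* a real generator would emit a function here */
        if (write(fds[1], &tag, 1) != 1)
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);         /* every path ends here and never returns */
    }
    close(fds[1]);
    long paths = 0;
    char buf[64];
    ssize_t got;
    while ((got = read(fds[0], buf, sizeof buf)) > 0)
        paths += got;                /* one byte per explored path */
    close(fds[0]);
    waitpid(root, NULL, 0);
    return paths;
}
```

With choices of size 2 and 3, count_paths reports 6 paths, one per combination. The code that follows a call to choose() is ordinary straight-line code; the operating system does the backtracking.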

The next step is to run LLVM optimizations on the generated functions. One thing we want to try is the collection of passes that implements “-O2,” but it also makes sense to run some individual passes since it is possible for sequences of passes to miss bugs: early passes can destroy constructs that would trigger bugs in later ones, and the late passes can clean up problems introduced by earlier ones. In fact both of those things seem to happen quite often.

We end up with lots of pairs of unoptimized and optimized LLVM functions. The obvious thing to do is run them with the same inputs and make sure that the outputs are the same, but that only works when the executions encounter no undefined behaviors. Solutions to the UB problem include:

  1. Generate UB-free code, as Csmith would. At the level of these tiny functions that would be a major handicap on the generator’s expressiveness and we definitely do not wish to do it.
  2. Create an LLVM interpreter that detects UB instead of silently skipping over it. The rule is that the optimizer is allowed to preserve UB or remove it, but never to add it. In other words, the correctness criterion for any compiler transformation isn’t input/output equivalence but rather input/output refinement. Someone needs to write this interpreter, perhaps using lli as a starting point (though the last time we looked, the slow/simple interpreter mode of lli had suffered some bit rot).
  3. Formally verify the refinement relation using Alive. This is better than an interpreter because Alive verifies the optimization for all inputs to the function, but worse because Alive doesn’t support all of LLVM, but rather a loop-free subset.

It is option three that we chose. The Alive language isn’t LLVM but rather an LLVM-like DSL, but it is not too hard to automatically translate the supported subset of LLVM into Alive.
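The refinement criterion from option 2 can be illustrated by brute force at tiny bit widths. The sketch below is not how Alive works, and it simplifies heavily: every kind of UB is collapsed into a single "undefined" flag, and the names outcome, tiny_fn, refines, add_wrapping, and add_nuw are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS 4
#define MASK ((1u << BITS) - 1u)

/* Outcome of running a tiny function: either "undefined" or a BITS-wide value. */
typedef struct { bool undefined; uint8_t value; } outcome;
typedef outcome (*tiny_fn)(uint8_t x);

/* tgt refines src if, for every input where src is defined, tgt is defined
   and returns the same value. Where src is undefined, tgt may do anything. */
bool refines(tiny_fn src, tiny_fn tgt) {
    for (unsigned x = 0; x <= MASK; x++) {
        outcome s = src((uint8_t)x), t = tgt((uint8_t)x);
        if (s.undefined)
            continue;
        if (t.undefined || t.value != s.value)
            return false;
    }
    return true;
}

/* x + 1 with wrapping arithmetic: defined for every input. */
outcome add_wrapping(uint8_t x) {
    return (outcome){ false, (uint8_t)((x + 1u) & MASK) };
}

/* x + 1 in the style of "add nuw": undefined when the unsigned sum overflows. */
outcome add_nuw(uint8_t x) {
    if (x == MASK)
        return (outcome){ true, 0 };
    return (outcome){ false, (uint8_t)(x + 1u) };
}
```

Here refines(add_nuw, add_wrapping) holds, since dropping the flag only removes undefinedness, while refines(add_wrapping, add_nuw) fails at x = 15, since adding the flag introduces it.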

In the configuration that we tested (2- and 4-bit integers, three instructions per function, including select but not including real control flow, floating point, memory, or vectors) about 44.8 million functions are generated and binned into 1000 files. We identified seven configurations of the LLVM optimizers that we wanted to test: -O2, SCCP, GVN, NewGVN, Reassociate, InstSimplify, and InstCombine. Then, to make the testing happen we allocated 4000 CPU cores (in an Azure HPC cluster) to process in batch the 7000 combinations of file + optimization options. Each combination takes between one and two hours, depending on how many functions are transformed and how long Alive takes to verify the changes.

If we could generate all LLVM functions and verify optimization of them, then we’d have done formal verification under another name. Of course, practically speaking, there’s a massive combinatorial explosion and we can only scratch the surface. Nevertheless, we found bugs. They fall into two categories: those that we reported and that were fixed, and those that cannot be fixed at this time.

We found six fixable LLVM bugs. The most common problem was transformations wrongly preserving the nsw/nuw/exact attributes that enable undefined behaviors in some LLVM instructions. This occurred with InstCombine [1], GVN [1], and Reassociate [1,2]. InstSimplify generated code that produces the wrong output for some inputs [1]. Finally, we triggered a crash in llc [1].

The unfixable bugs stem from problems with LLVM’s undefined behavior model. One way to fix these bugs is to delete the offending optimizations, but some of them are considered important. You might be tempted to instead fix them by tweaking the LLVM semantics in such a way that all of the optimizations currently performed by LLVM are valid. We believe this to be impossible: that there does not exist a useful and consistent semantics that can justify all of the observed optimizations.

A common kind of unfixable bug is seen in the simplification logic that LLVM has for select: it transforms “select %X, undef, %Y” into “%Y”. This is incorrect (more details in the post linked above) and, worse, has been shown to trigger end-to-end miscompilations [1]. Another source of problems is the different semantics that different parts of LLVM assume for branches: these can also cause end-to-end miscompilations [1,2].

In summary, this is a kind of compiler testing that should be done; it’s relatively easy and the resulting failing test cases are always small and understandable. If someone builds a UB-aware LLVM interpreter then no tricky formal-methods-based tools are required. This method could be easily extended to cover other compilers.

There are some follow-on projects that would most likely provide a good return on investment. Our test cases will reveal many, many instances where an LLVM pass erases a UB flag that it could have preserved; these could be turned into patches. We can do differential testing of passes against their replacements (for example, NewGVN vs. GVN) to look for precision regressions. The set of instructions that we generate should be extended; for example, opt-fuzz already has some limited support for control flow.

The code to run these tests is here.

Spring 2017

The hills above Salt Lake City are finally turning green.

Earlier in the year my family took a short trip to southeast Utah but it rained so much one day that I didn’t think the dirt roads would be passable, so we visited Ratio, a land art installation near Green River UT.

The next day started out foggy and cold; here’s an unassuming stretch of Muddy Creek shortly before it joins the Fremont River to become the Dirty Devil.

Later it cleared up and we explored the San Rafael Desert. This track didn’t seem to have seen much traffic over the winter.

In a nearby canyon I found a grinding stone that someone had stashed between 700 and a few thousand years ago.

Later in spring it turned out my kids’ school vacations were misaligned so instead of getting out into the desert as a family I took each kid individually on a short trip. Here we’re partway up a trail that was used in the first half of the 20th century to give sheep access to a remote mesa top.

The weather was imperfect but showy; here the Henry Mountains, the last part of the lower 48 to be mapped and explored, are getting stormed on. I feel like deserts are supposed to be dry but it seems like we get rained on on almost every trip.

Wind and grass.

North Caineville Mesa and Factory Butte.

Indian paintbrush.

This is the kind of photograph you only seem to get when you’re soaked from one rain storm and another is approaching. We had gotten the tent up during the first shower, so were mostly dry and happy. I accidentally grabbed a one-person tent for this trip so the ten year old and I had a pretty cozy night.

During his break, my older son and I explored some areas around Escalante, UT. This Anasazi granary under an arch is something I’d been wanting to see for a long time, but had previously been thwarted by logistical problems such as a long, rugged drive.

The masonry is in about as good condition as any I’ve seen, and notice the sticks at the top of the opening.

We also ran across some less well-preserved granaries.

I always wonder about the circumstances that lead to this kind of thing being abandoned. Perhaps it broke inside an animal or when it hit the ground after a miss? Often you find broken arrowheads along with chippings indicating a site where people sat and worked, but this point was all by itself.

Afternoon light in Alvey Wash, a large canyon draining the Kaiparowits Plateau.

The next day we visited the Red Breaks canyon system, which has some spectacular slots filled with nice sandstone and small climbing problems. Not shown: climbing problems and freezing, waist-deep water.

A bizarre landform in the Red Breaks area that is often called the Escalante Volcano (though it is not, as far as I know, of volcanic origin). It’s hard to tell from this photo but this thing is enormous; the sandstone dome in the center of the “volcano” is about 80 feet tall.

A neat area of petrified logs in Egg Canyon off the Burr Trail near Boulder, UT.

Some of the logs bridged the waterway.

I hope everyone else had a nice spring too!

Taming Undefined Behavior in LLVM

Earlier I wrote that Undefined Behavior != Unsafe Programming, a piece intended to convince you that there’s nothing inherently wrong with undefined behavior as long as it isn’t in developer-facing parts of the system.

Today I want to talk about a new paper about undefined behavior in LLVM that’s going to be presented in June at PLDI 2017. I’m an author of this paper, but not the main one. This work isn’t about debating the merits of undefined behavior; its goal is to describe and try to fix some unintended consequences of the design of undefined behavior at the level of LLVM IR.

Undefined behavior in C and C++ is sort of like a bomb: either it explodes or it doesn’t. We never try to reason about undefined programs because a program becomes meaningless once it executes UB. LLVM IR contains this same kind of UB, which we’ll call “immediate UB.” It is triggered by bad operations such as an out-of-bounds store (which is likely to corrupt RAM) or a division by zero (which may cause the processor to trap).

Our problems start because LLVM also contains two kinds of “deferred UB” which don’t explode, but rather have a contained effect on the program. We need to reason about the meaning of these “slightly undefined” programs, which can be challenging. There have been long threads on the LLVM developers’ mailing list going back and forth about this.

The first kind of deferred UB in LLVM is the undef value that acts like an uninitialized register: an undef evaluates to an arbitrary value of its type. Undef is useful because sometimes we want to say that a value doesn’t matter, for example because we know a location is going to be over-written later. If we didn’t have something like undef, we’d be forced to initialize locations like this to specific values, which costs space and time. So undef is basically a note to the compiler that it can choose whatever value it likes. During code generation, undef usually gets turned into “whatever was already in the register.”

Unfortunately, the semantics of undef don’t justify all of the optimizations that we’d like to perform on LLVM code. For example, consider this LLVM function:

define i1 @f(i32) {
  %2 = add nsw i32 %0, 1
  %3 = icmp sgt i32 %2, %0
  ret i1 %3
}

This is equivalent to “return x+1 > x;” in C and we’d like to be able to optimize it to “return true;”. In both languages the undefinedness of signed overflow needs to be recognized to make the optimization go. Let’s try to do that using undef. In this case the semantics of “add nsw” are to return undef if signed overflow occurs and to return the mathematical answer otherwise. So this example has two cases:

  1. The input is not INT_MAX, in which case the addition returns input + 1.
  2. The input is INT_MAX, in which case the addition returns undef.

In case 1 the comparison returns true. Can we make the comparison true for case 2, giving us the overall result that we want? Recall that undef resolves as an arbitrary value of its type. The compiler is allowed to choose this value. Alas, there’s no value of type i32 that is larger than INT_MAX, when we use a signed comparison. Thus, this optimization is not justified by the semantics of undef.

One choice we could make is to give up on performing this optimization (and others like it) at the LLVM level. The choice made by the LLVM developers, however, was to introduce a second, stronger, form of deferred UB called poison. Most instructions, taking a poison value on either input, evaluate to poison. If poison propagates to a program’s output, the result is immediate UB. Returning to the “x + 1 > x” example above, making “add nsw INT_MAX, 1” evaluate to poison allows the desired optimization: the resulting poison value makes the icmp also return poison. To justify the desired optimization we can observe that returning 1 is a refinement of returning poison. Another way to say the same thing is that we’re always allowed to make code more defined than it was, though of course we’re never allowed to make it less defined.
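The difference between the two semantics can be checked by brute force in a toy model. The following C sketch shrinks the example from i32 to 8 bits so that every input can be enumerated; the type val8 and all function names are assumptions made for this illustration, and the model covers only the two instructions used in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an 8-bit IR value that may be poison. */
typedef struct { bool poison; int8_t v; } val8;

/* "add nsw": poison on signed overflow, the mathematical sum otherwise. */
val8 add_nsw(val8 a, val8 b) {
    if (a.poison || b.poison)
        return (val8){ true, 0 };
    int sum = a.v + b.v;             /* computed in int, so no C-level overflow */
    if (sum > INT8_MAX || sum < INT8_MIN)
        return (val8){ true, 0 };
    return (val8){ false, (int8_t)sum };
}

/* "icmp sgt": poison if either operand is poison, otherwise 0 or 1. */
val8 icmp_sgt(val8 a, val8 b) {
    if (a.poison || b.poison)
        return (val8){ true, 0 };
    return (val8){ false, (int8_t)(a.v > b.v) };
}

/* Is "return true" a refinement of f(x) = (x + 1 > x) for every 8-bit x?
   It is, as long as f never produces a non-poison false. */
bool return_true_refines_f(void) {
    for (int x = INT8_MIN; x <= INT8_MAX; x++) {
        val8 in = { false, (int8_t)x };
        val8 one = { false, 1 };
        val8 r = icmp_sgt(add_nsw(in, one), in);
        if (!r.poison && r.v != 1)
            return false;
    }
    return true;
}

/* Under undef semantics the overflowing add could become any 8-bit value.
   Does any choice compare greater than INT8_MAX? */
bool undef_can_exceed_max(void) {
    for (int u = INT8_MIN; u <= INT8_MAX; u++)
        if (u > INT8_MAX)
            return true;
    return false;
}
```

In this model return_true_refines_f() holds because the only input that could make the comparison false yields poison instead, while undef_can_exceed_max() is false because no concrete choice for undef rescues the comparison.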

The most important optimizations enabled by deferred undefined behavior are those involving speculative execution such as hoisting loop-invariant code out of a loop. Since it is often difficult to prove that a loop executes at least once, loop-invariant code motion threatens to take a defined program where UB sits inside a loop that executes zero times and turn it into an undefined program. Deferred UB lets us go ahead and speculatively execute the code without triggering immediate UB. There’s no problem as long as the poisonous results don’t propagate somewhere that matters.

So far so good! Just to be clear: we can make the semantics of an IR anything we like. There will be no problem as long as:

  1. The front-ends correctly refine C, C++, etc. into IR.
  2. Every IR-level optimization implements a refinement.
  3. The backends correctly refine IR into machine code.

The problem is that #2 is hard. Over the years some very subtle mistakes have crept into the LLVM optimizer where different developers have made different assumptions about deferred UB, and these assumptions can work together to introduce bugs. Very few of these bugs can result in end-to-end miscompilation (where a well-formed source-level program is compiled to machine code that does the wrong thing) but even this can happen. We spent a lot of time trying to explain this clearly in the paper and I’m unlikely to do better here! But the details are all there in Section 3 of the paper. The point is that so far these bugs have resisted fixing: nobody has come up with a way to make everything consistent without giving up optimizations that the LLVM community is unwilling to give up.

The next part of the paper (Sections 4, 5, 6) introduces and evaluates our proposed fix, which is to remove undef, leaving only poison. To get undef-like semantics we introduce a new freeze instruction to LLVM. Freezing a normal value is a nop and freezing a poison value evaluates to an arbitrary value of the type. Every use of a given freeze instruction will produce the same value, but different freezes may give different values. The key is to put freezes in the right places. My colleagues have implemented a fork of LLVM 4.0 that uses freeze; we found that it more or less doesn’t affect compile times or the quality of the generated code.
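The behavior of freeze described above can be sketched in a few lines of C. This is a toy model rather than anything taken from the paper or from LLVM: the val32 type is invented, and the extra pick parameter is an assumption that stands in for whatever arbitrary value the compiler settles on for one particular freeze instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a 32-bit IR value that may be poison. */
typedef struct { bool poison; int32_t v; } val32;

/* One freeze instruction. `pick` is the arbitrary value chosen for this
   particular freeze in case its operand turns out to be poison. */
val32 freeze(val32 x, int32_t pick) {
    if (x.poison)
        return (val32){ false, pick };
    return x;                        /* freezing a normal value is a no-op */
}

/* y = freeze(x); y - y. Both uses see the same frozen value. */
int64_t diff_one_freeze(val32 x, int32_t pick) {
    val32 y = freeze(x, pick);
    return (int64_t)y.v - (int64_t)y.v;
}

/* freeze(x) - freeze(x) with two separate freeze instructions, each of
   which is free to make its own choice. */
int64_t diff_two_freezes(val32 x, int32_t pick1, int32_t pick2) {
    val32 a = freeze(x, pick1);
    val32 b = freeze(x, pick2);
    return (int64_t)a.v - (int64_t)b.v;
}
```

For a poison input, diff_one_freeze is always 0, whereas diff_two_freezes can be nonzero whenever the two picks differ. For a normal input both are 0.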

We are in the process of trying to convince the LLVM community to adopt our proposed solution. The change is somewhat fundamental and so this is going to take some time. There are lots of details that need to be ironed out, and I think people are (rightfully) worried about subtle bugs being introduced during the transition. One secret weapon we have is Alive, where Nuno has implemented the new semantics in the newsema branch, and we can use this to test a large number of optimizations.

Finally, we noticed that there has been an interesting bit of convergent evolution in compiler IRs: basically all heavily optimizing AOT compilers (including GCC, MSVC, and Intel CC) have their own versions of deferred UB. The details differ from those described here, but the effect is the same: deferred UB gives the compiler freedom to perform useful transformations that would otherwise be illegal. The semantics of deferred UB in these compilers has not, as far as we know, been rigorously defined and so it is possible that they have issues analogous to those described here.

Fun at the UNIX Terminal Part 1

This post is aimed at kids, like the 6th graders who I was recently teaching about programming in Python. It is more about having fun than about learning, but I hope that if you enjoy playing around at the UNIX terminal, you’ll eventually learn to use this kind of system for real. Keep in mind this immortal scene from Jurassic Park.

To run the commands in this post, you’ll need a UNIX machine: either Linux or Mac OS X will work. You’ll also need the ability to install software. There are two options:

  • Install precompiled binaries using a package manager. I’ll give command lines for Homebrew on OS X and for Apt on Ubuntu Linux. You’ll need administrator access to run Apt or to install Homebrew, but you do not need administrator access to install packages after Homebrew has been installed. Other versions of Linux have their own package managers and they are all pretty easy to use.
  • Build a program from source and install it in your home directory. This does not require administrator access but it’s more work and I’m not going to go into the details, though I hope to do this in a later post.

ROT13 using tr

The tr utility should be installed by default on an OS X or Ubuntu machine. It translates the characters in a string into different characters according to rules that you provide. To learn more about tr (or any other command in this post) type this command (without typing the dollar sign):

$ man tr

This will show you the UNIX system’s built-in documentation for the command.

In this and subsequent examples, I’ll show text that you should type on a line starting with a dollar sign, which is the default UNIX prompt. Text printed by the system will be on lines not starting with a dollar sign.

We’re going to use tr to encrypt some text as ROT13, which simply moves each letter forward in the alphabet by 13 places, wrapping around from Z to A if necessary. Since there are 26 letters, encrypting twice using ROT13 gives back the original text. ROT13 is fun but you would not want to use it for actual secret information since it is trivial to decrypt. It is commonly used to make it hard for people to accidentally read spoilers when discussing things like movie plot twists.

Type this:

$ echo 'Hello this is a test' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Uryyb guvf vf n grfg

Now to decrypt:

$ echo 'Uryyb guvf vf n grfg' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Hello this is a test

Just two more things before moving on to the next command.

First, the UNIX pipe operator (the “|” character in the commands above, which looks a little bit like a piece of pipe) is plumbing for UNIX commands: it “pipes” the output of one command to the input of a second command. We’ll be using it quite a bit.

Second, how exactly did we tell tr to implement ROT13? Well, the first argument, ‘A-Za-z’, gives it a set of characters to work with. Here A-Z stands for A through Z and a-z stands for a through z (computers treat the capital and lowercase versions of letters as being separate characters). So we are telling tr that it is going to translate any letter of the alphabet and leave any other character (spaces, punctuation, numbers, etc.) alone. The second argument to tr, ‘N-ZA-Mn-za-m’, specifies a mapping for the characters in the first argument. So the first character in the first argument (A) will be translated to the first character of the second argument (N), and so on. We could just as easily use tr to put some text in all uppercase or all lowercase; you might try this as an exercise.
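If you are curious what a program like tr is doing on the inside, here is a rough sketch of the same idea in C, a language that many UNIX tools are written in. This is not the real tr: the function names translate and rot13 are made up for this example, and the two character sets are spelled out in full instead of using ranges like A-Z.

```c
#include <assert.h>
#include <string.h>

/* Made-up helper that mimics the idea behind tr: a character found at
   position i of `from` becomes to[i], and everything else is left alone.
   Both sets must have the same length. */
void translate(char *text, const char *from, const char *to) {
    assert(strlen(from) == strlen(to));
    for (char *p = text; *p != '\0'; p++) {
        const char *hit = strchr(from, *p);
        if (hit != NULL)
            *p = to[hit - from];
    }
}

/* ROT13, using the spelled-out forms of 'A-Za-z' and 'N-ZA-Mn-za-m'. */
void rot13(char *text) {
    translate(text,
              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
              "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm");
}
```

Running rot13 on the text from the example above gives the same scrambled result that tr printed, and running it a second time gives back the original.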


fortune

Tragically, the fortune command isn’t installed by default on a Mac or on an Ubuntu Linux machine. On a Mac you can install it like this:

$ brew install fortune

If this doesn’t work then you need to get Homebrew set up; try this page.

On Ubuntu try this:

$ sudo apt-get install fortune

The “sudo” command will ask you to enter your password before running the next command, apt-get, with elevated privileges, in order to install the fortune program in a system directory that you are normally not allowed to modify. This will only work if your machine has been specifically configured to allow you to run programs with elevated privileges.

In any case, if you can’t get fortune installed, don’t worry about it, just proceed to the next command.
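If you aren’t sure whether a program is already on your machine, one way to check (a small sketch, not the only way) is the shell’s built-in “command -v”, which prints the location of a program when the shell can find it:

```shell
# tr is installed on every UNIX machine, so this prints where it lives.
command -v tr

# If a program is missing, command -v prints nothing and reports failure.
# The || means "if that failed, run the next command instead".
command -v fortune || echo 'fortune is not installed yet'
```

The same check works for any of the other programs mentioned here; just replace fortune with the program’s name.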

Fortune randomly chooses from a large collection of mildly humorous quotes:

$ fortune 
I have never let my schooling interfere with my education.
		-- Mark Twain


The say command is installed by default on a Mac; on Ubuntu you’ll need to type “sudo apt install gnustep-gui-runtime”.

Type this:

$ say "you just might be a genius"

Make sure you have sound turned up.

The Linux say command, for whatever reason, requires its input to be a command line argument, so we cannot use a pipe to send fortune’s output to say. So this command will not work on Linux (though it does work on OS X):

$ fortune | say

However, there’s another trick we can use: we can turn the output of one command into the command-line arguments for another command by putting the first command in parentheses and prefixing this with a dollar sign. So this will cause your computer to read a fortune aloud:

$ say $(fortune)

Another way to accomplish the same thing is to put fortune’s output into a file and then ask say to read the file aloud:

$ fortune > my_fortune.txt
$ say -f my_fortune.txt

Here the greater-than symbol “redirects” the output of fortune into a file. Redirection works like piping but the output goes to a file instead of into another program. It is super useful.
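Both of these tricks work with any commands, so you can try them even if fortune and say aren’t installed. Here is a small sketch using only echo, tr, and cat (which prints a file to the terminal); the file name greeting.txt is just an example, pick any name you like:

```shell
# Command substitution: the shell first runs the commands inside $( ),
# then pastes their output into the command line of the outer echo.
echo "Shouting: $(echo 'hello this is a test' | tr 'a-z' 'A-Z')"

# Redirection: put the output of echo into a file instead of showing it
# in the terminal, then print the file to see what ended up inside.
echo 'Hello this is a test' > greeting.txt
cat greeting.txt
```

The first command prints “Shouting: HELLO THIS IS A TEST” and the last one prints “Hello this is a test”. Be a little careful with the greater-than symbol: if greeting.txt already exists, its old contents are replaced.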

If you run “say” on both a Linux box and a Mac you will notice that the Mac’s speech synthesis routines are better.


The extremely important cowsay command uses ASCII art to show you a cow saying something. Use it like this:

$ fortune | cowsay
/ What is mind? No matter. What is \
| matter? Never mind.              |
|                                  |
\ -- Thomas Hewitt Key, 1799-1875  /
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Both Homebrew and Apt have a package called “cowsay” that you can install using the same kind of command line you’ve already been using.

Cowsay has some exciting options, such as “-d” which makes the cow appear to be dead:

$ fortune | cowsay -d
/ Laws are like sausages. It's better not \
| to see them being made.                 |
|                                         |
\ -- Otto von Bismarck                    /
        \   ^__^
         \  (xx)\_______
            (__)\       )\/\
             U  ||----w |
                ||     ||

Use “man cowsay” to learn more.


Don’t install ponysay unless you feel that cowsay is too restrictive. Also, it isn’t available as a precompiled package. You can build it from source code by first installing the “git” package using apt-get or brew and then running the following commands:

$ git clone
$ cd ponysay
$ ./ --freedom=partial --private install

This procedure puts ponysay into an odd location, but whatever. Here (assuming Linux, on a Mac you’ll need to pipe a different command’s output to ponysay) a cute pony tells us the prime factorization of a number:


Figlet (actually called FIGlet but that’s not what you type to run the command) prints text using large letters made up of regular terminal characters. For example:

$ whoami | figlet
 _ __ ___  __ _  ___| |__  _ __ 
| '__/ _ \/ _` |/ _ \ '_ \| '__|
| | |  __/ (_| |  __/ | | | |   
|_|  \___|\__, |\___|_| |_|_|   

Figlet has lots of options for controlling the appearance of its output. For example, you can change the font:

$ echo 'hello Eddie' | figlet -f script
 _          _   _          ___                    
| |        | | | |        / (_)   |     |  o      
| |     _  | | | |  __    \__   __|   __|      _  
|/ \   |/  |/  |/  /  \_  /    /  |  /  |  |  |/  
|   |_/|__/|__/|__/\__/   \___/\_/|_/\_/|_/|_/|__/

Another command, toilet, is similar to figlet but has even more options. Install both of these programs using the same kinds of commands we’ve already been using to talk to package managers.


The UNIX “cat” program prints a file, or whatever is piped into it, to the terminal. Lolcat is similar but it prints the text in awesome colors:


The bb program doesn’t seem to be available from Homebrew, but on Ubuntu you can install it using “sudo apt-get install bb”. It is a seriously impressive ASCII art demo.


You know how lots of web sites want you to sign up using your name and address, but your parents hopefully have trained you not to reveal your identity online? Well, the rig utility can help by creating a random identity:

$ rig
Juana Waters
647 Hamlet St
Austin, TX  78710
(512) xxx-xxxx

The zip codes and telephone area codes are even correct. For some reason rig will never generate an identity that lives in Utah.


The bc program is a calculator but unlike almost every other calculator you use, it can handle numbers of unlimited size (or, more precisely, numbers limited only by the amount of RAM in your computer) without losing precision. Try this:

$ echo '2 ^ 100' | bc

Unfortunately bc does not have a built-in factorial function but you can write one easily enough using bc’s built-in programming language. Start bc in interactive mode (this will happen by default if you don’t pipe any text into bc) by just typing “bc”. Then enter this code:

define fact(n) {
  if (n < 2) return 1;
  return n * fact(n - 1);
}

Now you can compute very large factorials:


While we're at it, you should figure out why the factorial of any large number contains a lot of trailing zeroes.


We've only scratched the surface; I'll share more entertaining UNIX commands in a follow-up post. I hadn't even heard of some of these programs until recently: a bunch of people on Twitter gave me awesome suggestions that I've used to write this up. If you want to see a serious (but hilariously outdated) video about what you can do using the UNIX command line, check out this video in which one of the original creators of UNIX shows how to build a spell checker by piping together some simple commands.