What PC-lint is Really For: Enforcing Language Dialects

John Carmack’s piece about using static analyzers is really nice, and he’s right that applying PC-lint to a significant piece of C code results in page after page of warnings. Faced with this problem on a code base I cared about, and not wanting to give up, I first created a config file for PC-lint that suppressed the vast majority of warnings and then, over a few weeks, cleaned up the remaining issues. Once the code base linted clean, I could turn on classes of warnings whenever I had the time and inclination.

As an example, PC-lint supports fine-grained application of stronger typing rules. So one day I might assume that all uses of “bool” or some other enumerated type are segregated from uses of integers. Of course, in any real code base, such an assumption is wrong and a pile of annoying warnings ensues. I’d either fix them or else decide that it wasn’t worth it until some major rearchitecting could be done.
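To make the "bool versus integer" case concrete, here is a minimal sketch in C. The `Bool` typedef, the `TRUE`/`FALSE` macros, and all three helper functions are hypothetical names invented for illustration. The `-strong(AJX, Bool)` lint comment is my recollection of PC-lint's strong-type option, so treat the exact flags as an assumption and check them against the manual. The C compiler accepts both counting functions. A checker configured to keep `Bool` apart from integers would be expected to object to the first one, though the precise diagnostics depend on the flags chosen.

```c
#include <stddef.h>

/* Assumed PC-lint directive asking for Bool to be treated as a strong type.
   The flag letters are an assumption; confirm them in your PC-lint manual. */
/*lint -strong(AJX, Bool) */
typedef int Bool;
#define TRUE  1
#define FALSE 0

/* Hypothetical helper: returns TRUE when the queue length is zero. */
static Bool queue_is_empty(size_t length)
{
    return (length == 0) ? TRUE : FALSE;
}

/* Legal C, but outside the stricter dialect: a Bool is added to an int. */
static int count_empty_queues(const size_t *lengths, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        /* This is the kind of mixing a strong-type check is meant to catch. */
        total += queue_is_empty(lengths[i]);
    }
    return total;
}

/* Same behavior, but the Bool is only ever tested, never used in arithmetic. */
static int count_empty_queues_strict(const size_t *lengths, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (queue_is_empty(lengths[i])) {
            total++;
        }
    }
    return total;
}
```

Both versions compute the same count, which is the point: the cleanup changes nothing the compiler cares about, only whether the code stays inside the dialect the checker enforces.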

The cool thing was that after a while I realized that I wasn’t really writing C code anymore, but rather a much more strongly typed dialect where I actually had a degree of (justifiable) confidence that certain kinds of errors were not being made. The C compiler, with its near-nonexistent error checking, was acting as a mere code generator. It wasn’t even the case that PC-lint was uncovering a lot of serious bugs in my code (though it did find a few). Rather, it gave me some peace of mind and assured me that what I thought was happening in the code was in fact happening.

Of course, we always write code in dialects, whether we realize it or not. In many cases, language dialects are formalized as coding conventions or standards so a group of people can all read or write the same dialect — this has enormous advantages. The things that I found charming about PC-lint are that (1) it supported a very flexible range of dialects and (2) it quickly and mechanically ensured that my code stayed within the dialect. When writing non-C code (especially Perl) I often wish for a similar dialect checker.


13 responses to “What PC-lint is Really For: Enforcing Language Dialects”

  1. Boy, this posting really makes me think you should explore other languages that have good type systems.

  2. Robby, I’ve used some of those languages and like them well enough, but they seem to fall over for some of the more demanding tasks. And it is for demanding tasks where I’m actually interested in static analysis and such to harden the code.

    For example, the code base under discussion in this post plugged into an extremely hairy, performance-critical part of an OS kernel. Which language with a good type system would you recommend for that situation?

  3. I was reacting to the last paragraph of your post (sorry for the lack of clarity!), where you seem to be suggesting that a good way to make your programming experience better is to add a “dialect checker” for perl. I think this is a worse idea than using any number of different languages (and I hope I’m not “making an ass of u and me” when I assume that you aren’t using perl in said performance-critical code).

    As for the other, I think we agree that C is very good at what it is very good at, and that there are a lot of downsides to its singular focus. In particular, I think it helps with “performance-critical” but not at all with “hairy”. (Your post also seems to suggest that you agree that it doesn’t deal well with hairy!).

    Perhaps better type systems for C are the answer. There have been a few projects over the years that try to do this: deputy and cyclone come to mind. (I have not actually tried to program with them, tho.)

    Also, if you have found Racket to “fall over” in this manner, please do submit a bug report.

  4. PS: I should have said this earlier, but I think the idea of having program-specific type systems is a great one and Racket is even taking some first steps in this direction by supporting two type systems in a single program.

  5. > For example, the code base under discussion in this post plugged into an extremely hairy,
    > performance-critical part of an OS kernel. Which language with a good type system
    > would you recommend for that situation?

    If you are ready to pay the price in complexity¹, ATS is what I would recommend.

    ¹: Low-level programming has a lot of subtle invariants that programmers reason about statically, only they usually are not expressed precisely in the type system. ATS makes those small differences explicit, and as a result it feels like C annotated with a lot of subtle stuff.

    Cyclone is also nice.

  6. Robby — hairy systems code will have all sorts of non-functional requirements such as not allocating, not incurring a page fault, being safe with respect to hardware interrupts, not referencing the stack pointer or some other set of registers, doing odd things with memory mappings, etc. C does pretty well at giving the developer the kind of control that is necessary for dealing with these constraints, and they have little to do with performance. Racket and similar high-level languages are not (at present, anyway) remotely the answer to these problems, unless I’ve missed something.

    Cyclone and Deputy do not seem to be actively maintained. Unfortunately I never used Cyclone. I liked Deputy a lot but its type system only dealt with weaknesses that lead to memory safety errors; PC-lint addresses a different kind of weakness that is also worth eliminating.

    Re. Perl, I just like it!

  7. Could I suggest that you check out professional static analysis tools from _this_ millennium? E.g., for C/C++, Coverity 4.x or Klocwork, or perhaps for Java and security stuff, Fortify. The expensive tools generally give you a free trial, and you don’t have to give the bugs back.

  8. Static analysis tools from this millennium are impressive machines that hold the user in awe for a moment before he realises that they are only there to compensate for languages from deep inside the previous one. At that point, they become massive monuments of sadness.

    Our task should not be to ameliorate the tools used for dusty decks but to design truly modern replacements. ATS has already been mentioned; I’ve found it a tad cumbersome, but it is definitely a step in the right direction. Cyclone always felt too limited to me.

    (I spent a couple of days before Christmas wading through the reports from one of the expensive tools mentioned by Flash Sheridan above in the vain hope of finding a serious bug in millions of lines of legacy code. No catch, but tons of false positives and many bugs in the tool.)

  9. The warning policy strategy you describe is pretty much what I advocate when you run PC-lint on a new codebase. By doing that, we’ve gradually ramped up our warning policy to the point where it’s exceptionally aggressive – and despite that the core projects within our main product (Visual Lint) are lint clean.

    I have some ideas for automating this process, which could be interesting. One of these days we’ll implement them in Visual Lint, which I can imagine will make it even easier to use PC-lint on a new codebase.

    The same techniques are applicable to other analysis tools as well, of course.

  10. In my experience, static code analysis needs to be built into the compiler or be part of the acceptance criteria for the build and test labs. If it is separate from regular software development, it is unlikely to be taken seriously. Of course, this means that static analysis needs to be acceptably fast.