I wanted to write a manifesto-ish sort of piece about what compilers are supposed to look like in the future. Nuno was the obvious coauthor since I’ve talked to him about this topic so much that I’m overall not really sure which parts started out as his ideas and which were mine. The article didn’t end up feeling like a good fit for a series of blog posts, so we’ve placed it on arXiv: Future Directions for Optimizing Compilers. The first paragraph of the paper gives the main idea:
As software becomes larger, programming languages become higher-level, and processors continue to fail to be clocked faster, we’ll increasingly require compilers to reduce code bloat, eliminate abstraction penalties, and exploit interesting instruction sets. At the same time, compiler execution time must not increase too much and also compilers should never produce the wrong output. This paper examines the problem of making optimizing compilers faster, less buggy, and more capable of generating high-quality output.
We plan to keep revising the paper and would be happy to receive feedback on it.
34 responses to “Future Directions for Optimizing Compilers”
This is a good read. One thing I hated about LLVM was the pattern matching in the InstCombine pass. There was code for every possible permutation of the patterns, and it kind of killed the essence by having a brute-force-ish approach.
Two things I’ve wondered about optimizing compilers:
1) If they were allowed to take significantly more time and resources (e.g. >=10x), how much better a job could they do with little or no fundamentally new R&D, just what is already known today? Clearly that’s not viable for most builds, but for actual release candidates or weekly builds of some projects, spending an extra day making the code 1% faster might actually be worth the time.
2) What changes to the languages could be made to expand optimization opportunities even more? C++ move semantics are an example of this, but can we go further? E.g. what advantage would it provide if a type were able to allow relaxed copy semantics? I.e. allow the same kind of fast and loose treatment of copies and lifetimes that integers get? Would that permit more aggressive copy-elision and (N)RVO in enough places to warrant its use for types like std::string and std::shared_ptr?
Many optimizations resolve around the idea that in cases where there are a variety of ways a compiler could perform an action, *all of which would meet application requirements*, it is helpful to allow compilers to freely select among them. A key point which I think often gets missed is that giving a compiler the freedom to choose in various circumstances from among actions which don’t meet application requirements will impede optimizations by forcing programmers to write code whose behavior is always rigidly defined.
Because C is used on so many different platforms and for so many purposes, any fixed set of behavior which is defined uniformly, without extensions, on every implementation will be defined sub-optimally on most. According to the published Rationale, the authors of the Standard intended UB to encourage useful variety among quality implementations which could extend the range of semantics available to the programmer in various ways. I agree that good optimization would require a language specification that is “widely agreed upon” as fully specifying the semantics that should be available to programmers in a range of programming fields. The only way a single C Standard can hope to accomplish that, however, is by recognizing dialects (perhaps selectable via directives, or–at minimum–testable via predefined macros) which are targeted toward various purposes, and which offer behavioral guarantees appropriate to those purposes. As it is, the behavioral guarantees offered by the Standard are severely sub-optimal for anything other than a few specialized applications which will never be called upon to process data from possibly-malicious sources.
Most applications outside such specialized niches will need to offer one of two general behavioral guarantees in cases where they are fed data they cannot process (perhaps because it is invalid, or it places higher demands on parts of the program than were expected).
1. Programs for some purposes may need to guarantee that if they report successful completion all output will have been computed as specified; when given data they cannot process, they may process an arbitrary amount of it (producing output in specified fashion) before giving up and reporting an error.
Note that optimizers may benefit significantly from being allowed to correctly process input in cases beyond those required by the Standard (as would happen if e.g. (x*15/3) was refactored into (x*5) without bothering to check whether computation of x*15 would overflow), and also from being allowed to exit early in cases where the program is going to fail without having to process all data up to the point of failure (thus allowing operations that might fail to be performed eagerly in cases where that would benefit efficiency).
2. Programs for other purposes may meet requirements if they produce essentially arbitrary output when given invalid data, provided that they uphold some loose behavioral constraints. Although different applications may need different constraints, that would often be best handled by having an implementation offer loose constraints but provide ways of tightening them.
For example, guaranteeing that arithmetic overflow will at worst yield a “wobbly” value with no other side effects, and that some operations, given a “wobbly” value, will select a non-wobbly result from among the wobbly value’s possibilities, would allow programmers to give compilers much more freedom than would be possible without such guarantees.
If implementations that extend the Standard with guarantees designed to facilitate the above requirements are fed programs that exploit such guarantees, they will be able to generate more efficient code than any implementation would be able to produce given code that avoids at all cost any situations that presently invoke UB. If compiler writers want to generate optimal code for a wide range of applications, they should work on trying to define and support optional semantic extensions that would fit well with various applications’ needs.
Hi Benjamin, nobody really knows the answer to #1, though there have been compilers like Stalin that try really, really hard to make code fast. It’s a pretty interesting question for sure.
The thing that most comes to mind for #2 is Rust, where the front-end has super duper aliasing information that the middle-end is basically oblivious to. One can imagine future versions of the Rust compiler using this to good effect. Also languages like ISPC avoid some of the pitfalls of C and C++ and can therefore be compiled to vector code far more effectively. CUDA and OpenCL also seem like they’ve hit similar design points. It has been argued that functional languages also cater to optimizing compilers but so far that vision has been quite hard to realize.
Found an error:
“Second, −a+−b can be rewritten as −(a − b), saving one operation.”
ITYM −(a + b)
Hi John, yeah, these are all interesting areas to explore, and I’ve seen research doing some of the kinds of things you outline (don’t have links handy, sorry).
Regarding UB and dialects, I take what seems to be more or less the opposite view from John Payson’s above. And the people on the C++ committee that I’ve seen discuss the topic seem to hold the same view: that any program that invokes UB is ill-formed and no compiler should ever promise to enforce any semantics for it.
A big chunk of the reasoning for holding that stance is that if you expect your code to remain correct, then you need to know that the compiler you developed it under and the compiler you eventually build it under do the same thing. Without a well-defined semantics, the only way to ensure that is to never change even the minor version number of the compiler you use, let alone the vendor or (heaven forbid) the architecture you target. If on the other hand you have a well-defined semantics (e.g. via pre-defined macros or some other approach), that would become ipso facto part of the standard and no longer UB. At that point you might as well just make it part of the de jure standard.
As to how to allow programs to better avoid over-constraining the compiler: allow more precise specification of what code should do, so that the author can constrain it in exactly and *only* the ways that actually matter. This could either be by allowing code to add selected constraints to particular usages (like Payson proposed, but in a more controlled way so that different choices can be made in different places) or by allowing particular usages to relax constraints (e.g. relaxed rules around observable lifetimes and copies of objects).
Perhaps I failed to adequately express myself, since I’m essentially in agreement with what Benjamin Shropshire is saying. Programmers shouldn’t generally *need* to use UB, because the Standard should define all the behaviors needed to accomplish tasks as cleanly and efficiently as would be possible on implementations that extend the language in common ways, and programmers seeking to write quality code should use behaviors defined by the Standard when practical. A fundamental difficulty, however, is that the design of C89, and every C or C++ Standard since, fundamentally relies upon the assumption that implementations will use UB as a means to fill in any necessary behaviors not provided by the Standard, and thus makes no effort to define all the behaviors necessary to make an implementation particularly suitable for anything other than a few specialized purposes.
If the Standard were to define a concept of a “Selectively Conforming” program which will not invoke UB when given any possible input, but need not run usefully on any particular kind of implementation, then it could meaningfully seek to define enough behaviors that a substantial fraction of the existing corpus of C programs could be made Selectively Conforming (but still usable for their intended purpose on their intended platforms) merely by adding various #if/#error and #pragma directives. In many cases, a number of actions would have their behavior essentially defined as “Do XXX in a defined fashion characteristic of the environment”, leaving the programmer responsible for knowing what environment the code would run in and how it would define the behavior in question, thus avoiding the need to have the Standard itself worry about the particulars. Indeed, if there were no need to allow for any optimization, a complete specification for a category of compilers that could handle the majority of tasks that cannot be accommodated in Standard C could be quite short. It wouldn’t have to do a whole lot besides say that if an implementation defines __SIMPLE_LOW_LEVEL_C, it must use a target platform’s native data types when they exist, or build longer data types by concatenating shorter ones without padding bits, and that in cases where one part of the Standard defines the behavior of an action and another part says an overlapping category of actions invokes UB, the defined behavior takes precedence.
If, however, the Standard continues to ignore the fact that many tasks can be accomplished easily by implementations that offer guarantees beyond those mandated by the Standard, but can’t be accomplished practically–or even at all–without such guarantees, then a demand that programmers refrain from using behaviors beyond those defined by the Standard would be a demand that the language not be regarded as suitable for purposes that it should be able to handle quite well.
I believe the largest opportunity for optimising compilers is to engage in a two sided conversation with programmers using assert-like functions.
Certainly, as you enable high optimization levels, compilers get smarter about spotting and warning about defects in your code.
A while back, I discovered that, counter-intuitively, adding asserts in some cases disabled the optimizer’s ability to spot defects.
The link at the bottom is to an extended thread on the subject.
But the point is this provides an excellent mechanism for a two way flow of information between the compiler and the programmer.
If compilers understood assert-like functions, then through the data flow analysis they do anyway, they could warn the programmer if any assert expression could possibly be false on any data path.
Conversely, asserts could inform the optimiser that their expressions will always be true, and the optimiser could use and rely on that information.
I do warn that this usage of asserts does _not_ conform to the usage of asserts made in many existing programs, or conform to the intent of many existing programmers.
Thus I strongly recommend that such a facility be given a different name, thus neatly side stepping any controversy.
@John Payson: are you conflating “implementation defined behavior” with “undefined behavior”? (FWIW those terms of art are defined by the standard as very different things.)
If you make that substitution, then I more or less agree with you. With regards to UB, I strongly stand by my claims:
– Programmers shouldn’t *ever* use or need to use UB.
– implementations *shouldn’t* extend the language where the standard speaks of UB.
– programmers seeking to write quality code should *only* use behaviors defined by the Standard (including where the standard defines something as up to the implementation).
Also, from what I can tell, at least the people currently writing the C++ standard are *NOT* assuming that implementations will use UB as a means to fill in any necessary behaviors not provided by the standard; rather, that is what *implementation defined* behavior is for. UB is intended to be used as a way to allow the compiler to make assumptions about the code that are prohibitive or impossible to check.
As for the concept of “Selectively Conforming”, I’m not sure that would provide anything on top of writing the majority of the code in a fully standard and portable way, with the platform-specific code segregated behind a common API where multiple implementations can be selected from (at link time) that each use implementation defined behavior to get the job done. And all that is available today.
The difference between Undefined Behavior and Implementation-Defined Behavior is that the latter–at minimum–implies that quality implementations should behave in some consistent fashion, even in cases where the cost of guaranteeing any sort of consistency would exceed the value thereof. A decision to classify an action as invoking UB implies nothing more than a judgment that some implementations might plausibly benefit from not having to define the behavior.
Consider, for example, a ones’-complement or sign-magnitude implementation which has a rotate-physical-bits-left-by-N instruction, but where rotating a number and masking will take longer than adding a number to itself. Under C89, such an implementation would have had to shift the physical bits in all cases, even when shifting by one bit. Changing the behavior to Implementation Defined would have allowed a compiler to always use repeated addition or always shift the physical bits, but either choice would have significantly degraded performance in some cases.
On the vast majority of C implementations, including so far as I can tell every non-contrived non-sanitizing C99 implementation, left-shifts of all values–positive or negative–work the same as multiplication by a power of two in every case where the product would fit in the result type. Platforms where this imposes any performance penalty whatsoever are relatively rare, and even on the platforms where it would occasionally require one more instruction, every compiler I’ve seen adds that instruction. Nonetheless, even though behavior is defined *identically* on nearly every C implementation, the Standard treats left shifts of negative values as Undefined Behavior to allow for hypothetical implementations like the ones’-complement or sign-magnitude ones mentioned above.
Incidentally, according to the C89 Rationale, one of the reasons the Committee decided to have short unsigned types promote consistently to signed int is that the majority of then-current implementations would treat “uint1 = ushort1 * ushort2;” as though the operands were unsigned, even when the result is between INT_MAX+1u and UINT_MAX. Any idea why the authors would have given that as a reason if they didn’t expect that the fraction of implementations behaving that way would, if anything, increase?
Thanks Paul! This is fixed in our repo now, will propagate to arXiv next time we refresh.
I don’t think Proebsting’s Law is meant to be anything more than a suggestion to consider whether various kinds of optimization improvements are worth the effort. If it were trying to measure anything, it should compare the time required to perform a task using the fastest possible machine code that would meet requirements to the time required to process it with code an optimizer actually produces, to establish an upper bound on the fraction of a program’s execution time that could be eliminated while still meeting requirements.
I think Proebsting’s Law both overestimates and underestimates compiler improvements, based upon whether one is measuring the best machine code an implementation can produce when given source code that exploits any useful features and guarantees it offers, or the machine code an implementation can produce when given source code that is not tailored toward it in any way. A modern compiler given something like "(x >> y) | (x << ((32-y) & 31))" may generate code that is six times as fast as what would have been produced by a 1990s compiler that guaranteed that "x << 32" will yield either x or 0, chosen in Unspecified fashion, but it would be no better than what the 1990s compiler would have produced if fed code that exploits the behavioral guarantee that nearly all platforms will facilitate at zero cost.
While there might be some application fields where highly aggressive optimizations are useful, there are many situations where adding more "hinting" directives could have a more useful effect. In many cases, for example, a programmer will know that a "search" loop will be likely to receive data that will allow it to exit very early, or will be likely to receive data that lets it run to completion without finding anything, but a compiler will have no way of knowing. Letting a programmer specify an expected average number of iterations would let an implementation generate code that's optimized for that. If a programmer predicts a loop will usually run 2-4 iterations, but on one particular occasion it happens to run 4,000,000, performance might be an order of magnitude slower than if the prediction had been correct. On the other hand, if every time the loop runs 4,000,000 iterations is balanced by a billion where it runs only twice, the time saved by foregoing vectorization setup on loops that exit almost immediately may outweigh the time lost by foregoing it on longer loops that run to completion.
If one judges compilers by how efficiently they can perform tasks when the programmer "helps them out", optimizations based on hinting directives would be tasty low-hanging fruit. Trying to perform more complicated analysis while not bothering to look for low-hanging fruit seems premature (i.e. the root of all evil).
In my opinion, the notion that compilers may act as if undefined behavior can never happen is horribly damaging to safe programming. I believe that “optimizationism” (the principle that compiling code sequences into short and pretty assembly language trumps all other considerations) has distorted the proper specification of programming language behavior, leading to useless “object models,” uselessly under-specified behavior (such as order of evaluation), and useless “functions” like std::launder in C++. (Some experts in C++ will tell you, for example, that they believe that it is impossible to write std::vector in standard C++.)
In C and C++ programmers have always behaved with a notional “object model” that objects are bags of bits, and you get out whatever you put in. There are millions of lines written to that expectation, and all of these brilliant new optimizers that adhere to the de jure instead of de facto standard break old, working code.
Well-written C programs–even those performing low-level tasks–should not be reliant upon compilers executing code exactly as specified and keeping the abstract machine synchronized perfectly with the physical one at all times. Instead, they should ensure that compilers that make a bona fide effort to identify places where such synchronization is required will be able to recognize them.
Further, when processing those rare applications where it is possible to absolutely 100.000% guarantee that all inputs will be valid, precondition-based optimizations may be genuinely useful even though violation of those preconditions could reformat hard drives, launch nasal demons, etc. Because I don’t write such applications, a compiler that would be unsuitable for anything else would be useless to me, but that doesn’t mean research in them won’t be useful to other people, *especially if preconditions could be specified explicitly by programmers rather than implicitly generated based upon UB* [IMHO, the latter point is where research should be directed, since such explicit preconditions are in just about every way imaginable better than implicit ones].
What’s unfortunate is that neither compiler writers nor programmers seem able to grasp the fact that different implementations are, and should be, suitable for different purposes. The fact that a compiler may be useless for some purposes unless it behaves predictably in some case doesn’t mean the Standard should mandate that all implementations support such a behavior. On the other hand, the fact that the Standard fails to mandate a behavior does not imply any judgment as to whether an implementation could be suitable for any particular purpose without supporting it.
@Payson IIRC the standards go into great detail about what behaviors of the abstract machine are and are not observable, and the compiler is only required to ensure that the actual machine exhibits the same observable behaviors *as if* the code were executed on the abstract machine.
FWIW: I have no issue with compilers that in fact exhibit stable predictable behavior under UB. I just consider any code that exploits that (particularly if it does so *intentionally*) to be bad enough that nothing else matters. As such, I view any effort to build such a compiler to be a pointless exercise and have no interest in it.
OTOH, work to extend what can be done without provoking UB (as defined by the standard, not the tool-chain), via things like intrinsics, seems like a very valid and valuable area for R&D.
Thanks for this article. 🙂
Considering that you mention superoptimizers, here is one concern I have with them: Reproducible builds. I think there is a lot of value in being able to produce an *identical* binary from the same source code on two different machines — it provides a way to make sure that a given binary is actually produced from the sources it claims to originate from (which is why e.g. Tor uses it). The Debian Reproducible Builds effort is a great example of how far one can get on that avenue with today’s technology.
With superoptimizers, if search is controlled e.g. by a timeout, we lose any hope of compiling in a deterministic fashion. Are you aware of any work on superoptimizers that attempts to preserve the ability to have a reproducible build? I could imagine either some deterministic criterion to abort the search (number of queries, or so), or something where the result of the search is committed to the source code repository.
Compilers subset ISAs. This was noticed at least as far back as Patterson Ditzel (“compilers are often unable to utilize complex instructions”). It’s probably a good idea to raise this subsetting from implicit and ad hoc to part of the ISA formal model.
Indeed, do you really want a formal model of all of x86? The AAD instruction, ASCII Adjust AX Before Division, is still valid in 32b mode. But that’s just cruft to a compiler. Do you really want x87 floating point? It’s really hard to formalize x87 and the guy at Transmeta who tried almost had a nervous breakdown.
After aggressively subsetting x86, it looks more like a variable length RISC ISA. Use the ARM v8 Machine Readable Specification as a starting point and try to map x86_64 onto that. The alternative is waiting for a formal specification of all of x86 which will never happen.
BTW, if it’s hard to formalize an instruction then it’s a really good candidate for omission. And cheating by looking at the ARM v8 MRS is allowed. Other ISAs can follow suit.
Ralf, I agree about deterministic compiles — and that isn’t the only concern about using solvers at compile time. Solvers are also very memory-hungry and are not fast. I believe that most of the time, solver results will be cached or otherwise used indirectly; this indirection has the side effect of fixing the determinism problem.
In the paper you talk about Hamming weight computations. It’s very interesting. Is it possible nowadays to detect other algorithms in sources and optimize them? E.g. since C++17 we have std::gcd, but in old codebases we have a lot of different implementations of this algorithm, and the compiler cannot replace them with std::gcd (if we compile with -std=c++17) because it cannot recognize what the function does. How can we help with recognition here? Can we (ab)use C++ contracts as a hint for the compiler?
Ralf: there are many SAT/SMT solvers (& model checkers) that support “deterministic timeouts”. They can count units of work instead of seconds.
Notably hardware model checkers introduced such capabilities in the 90s precisely because customers of EDA tools wanted reproducible results.
I have this uncomfortable feeling you are diving deeper and deeper into the wrong rabbit hole.
ie. You believe the only path forward is to implement better and better optimisers that output faster and faster code for existing language standards.
Yet the most obvious (to the working programmer) benefit of the improving optimisers over the years, is _not_ faster code, but better warnings.
ie. The deeper dataflow analysis of better optimisers enable the compiler to better detect defective source code.
Up to now, this has mostly been a pleasant, but unintended side effect.
This really, really should be making everybody in the field sit up and think: how can we do a lot better on this front?
How can we create a two way flow of information between the programmer and the optimiser to communicate what each side knows and believes about the code.
**Even if it means add small tweaks to the language standard to do so!**
gcc has been adding function and variable attributes that permit the programmer to communicate information the compiler can use when optimizing code that references those symbols, e.g. “pure”, “const”, ….
Alas, last I looked (gcc 7.3), the current implementation had the near-fatal flaw that even if the optimiser can conclude from compiling the implementation that the attribute isn’t justified, it doesn’t warn the programmer that he got it wrong.
ie. It makes the function attributes a hazardous tool to use.
Reachability and uninitialized variables are a curious one.
gcc will warn you if it knows a statement is unreachable. That “knowledge” clearly depends on optimization level, i.e. you will often only get a warning if you’re at -O2 or above.
Another “tool” is __builtin_unreachable(); it will alter what the optimizer outputs, but again, it’s a one-way conversation: you cannot get gcc to warn you if that point _is_ indeed provably reachable.
I would love to see the concept of a precondition baked into an optimizers understanding of the code.
ie. Everything _after_ the precondition should be optimized under the assumption that the precondition expression _will_ hold.
Everything _before_ the precondition should warn if on some code path the precondition could fail.
When a function is being compiled as a standalone object file, the compiler can’t tell.
But as the compiler (and the linker now with LTO) inlines more and more code…. it often _can_ and _does_ know that preconditions have been violated!
Ralf: People don’t actually want reproducible builds; rather, they want verification that a given binary is a valid translation of a given source. It just turns out the most direct solution to doing that with normal tools is to fully reproduce the exact same build process and compare the results.
Another solution to that problem is to publish your binary along with enough artifacts from building it that someone else can re-generate it. That way you don’t need to bother with a fully deterministic build at all; instead you enable proving that it is valid to convert the published source into the eventual binary by handing them a trace of all the useful inspection and transformation steps you did (“note that A is true because of X, Y and Z”, “Apply transformation #1 at point ‘label_42’, justified by ‘A’”, etc.). If your compiler is valid, then (with or without such a script) it should only ever produce valid translations, which (if it matches the published binary) proves that the published binary itself is a valid translation.
If the tools supported that, then this would have other advantages as well: the non-determinism is entirely in the “deciding what to do” stage, and that stage does a lot of work that can be skipped in a verification process (it’s way simpler to walk someone through a proof than to find it in the first place). Also, aside from verification, using the last build’s results as a crib sheet for where to look first may be useful for speeding up builds even of new sources, if they aren’t too different from the last build.
Hi Alexander, I’m not sure about contracts, but my belief is that solvers will be instrumental in updating older codes to take advantage of modern idioms such as std::gcd. This could be part of a source-level refactoring tool instead of an optimizing compiler.
John, the state of the art in compiler warnings and similar static analyses has made a giant amount of progress in the last 15-20 years, and it is great stuff. It also happens to be orthogonal to optimization, which is what I’m writing about here. I certainly do not believe that optimizations are the only path forward and have never said this.
Benjamin, people sometimes do want reproducible builds, as a separate thing from correct compiles.
Very interesting read. A couple of minor nits… “As of fall 2018…” To an international audience, it is not obvious when this refers to. “…such as fun addressing modes…” I’m not sure which ones are considered “fun.”
@John Carter, as you may have discovered in your GCC discussions, “if (!expr) __builtin_unreachable();” is one way to teach GCC assumptions. However, as you’ve said, it won’t warn you if it knows you made a mistake.
There is a similar problem with reproducibility in static analysis tooling (e.g. Codepeer, Spark2014, cbmc), where you set timeouts and ‘give up’ on a path in the analysis or give up on a solver. It can give different results from machine to machine, loaded or not, etc.
One solution is to define ‘steps’ instead of ‘timeouts’. You break everything in ‘steps’ and stop after a defined number. You might need to calibrate once what ‘timeout’ is acceptable for you and infer the ‘steps’ value :-).
Nuno, Lionel: Thanks, I was not aware that SAT/SMT solvers already do this. That’s great. 🙂
Benjamin: Reproducibility has other advantages (e.g. for compiler developers, it is easier to debug compiler performance if the compiler is deterministic), but I agree reproducibility is not necessary for verifying compilation. What you describe sounds like an extreme form of “let’s store solver output during compilation and make that an input to the second translation to get the same result”. That works, but of course it is nicer if we don’t have to also manage that extra piece of state and communicate it between all involved parties.
> Benjamin, people sometimes do want reproducible
> builds, as a separate thing from correct compiles.
Yes, an extreme example: After PalmSource sold the name "Palm" to Palm, we had to release a new version of the development kit which changed only some user-visible strings. If our build engineers could have released a new build that passed binary comparison except for those strings, I could have done a lot less testing; instead I ended up missing an unexpected social engagement. (They couldn't, for procedural documentation reasons.)
A less extreme example: The compiler lead had discovered a forward-compatibility bug through pure reason after code freeze. We were fairly confident in the determinism of our compiler build system and in our automated tests, so we didn't feel obliged to redo _all_ of the manual testing, and I didn't need to cancel my trip to meet my new mother-in-law in the Ukraine.
I liked some of the comments above about an ultra-slow massively-optimising final build; normally I care about how fast the compiler itself compiles only for the benefit of our engineers' code-build-oops-debug cycle, and would be happy to set up another Jenkins server devoted to a slower but better build. But if it differed enough I would need to think hard about the testing implications.
At the beginning of Section 3 in arXiv:1809.02161, `((x << 31) >> 31) + 1` is proposed as an inefficient idiom to flip and return the lowest bit of `x`. If `x = 1`, then the result should be zero according to the text description and the expression `~x & 1`. Yet `((1 << 31) >> 31) + 1 == 2`. I think `((x + 1) << 31) >> 31` is the correct expression (and there is no overflow whether `x` is signed or unsigned, whether `x` has exactly 32 bits, and so on and so forth).
In arXiv:1809.02161, it says that
> After compiling LLVM itself (3.5 MSLOC of C++) Souper's cache occupies
> 362 MB of RAM; when dumped to disk, the file is 149 MB. This is
> acceptable for a research prototype but it isn't the solution that we
> want to deploy.
Please provide reference points for what is small and what is large. Compiling small projects with C++ header-only libraries can easily consume several hundred megabytes and, as far as I remember, when I was still building TensorFlow (written in C++) on a regular basis, the C++ compiler would often consume more than 500 MB. One reason is certainly that the TensorFlow developers follow the Google C++ Style Guide, which requires self-contained headers and forbids forward declarations.
I would like to see three more aspects discussed:
* Similar to John Carter’s point of view, I find that assert-like functions and enforcements in C++ are essential development tools but are treated like any other statement in the code.
* C and C++ are full of undefined and unspecified behavior and it would be interesting to have diagnostics when these two behaviors are exploited by the optimizer.
* Execution time requirements of the compiler. Short of unicorn compilers that finish by the time my finger leaves the "enter" key, I am perfectly OK with my compiler running for up to a week on a state-of-the-art computer for /release/ builds, but during development, instant code generation at the `-O1` optimization level is absolutely desirable so that the developer does not idle.
The code at the start of section 3 is correct as described. The value 1, shifted left by 31, gives INT_MIN. Shifting right by 31 gives -1, and adding 1 gives 0. We could quibble about the definedness of left shifting a 1 into or past the sign bit, but that doesn’t seem to be what you are describing.
Regarding “small” and “large” we were deliberately vague about the context, but it seems clear that a superoptimizer database that is multiple GB will be inconvenient, and since there’s no technical reason for this to exist (outside of our research prototype) we should avoid it.
Regarding the other points you bring up, these would all be interesting, but our document ended up being pretty long already and we had to cut off the discussions somewhere.
> The code at the start of section 3 is correct as described.
You are right. I assumed unsigned arithmetic, although `x` is described as a signed 32-bit integer in the second line of Section 3.
> Regarding “small” and “large” we were deliberately vague about the context, but it seems clear that a superoptimizer database that is multiple GB will be inconvenient
My understanding of Souper is that it operates in a fully automated fashion, that it generates only provably correct peephole optimizations for any front-end and any target architecture, and that it increases compile time by only 10% compared to `-O3`. This comes at the cost of 360 MB of memory per 3.5 MSLOC. Compare this to machine learning (ML): training models is time- and resource-demanding, but if done properly, many insights can be acquired much more cheaply and quickly than by employing human analysts. I claim that the economics are already in favor of deploying Souper.
Hi Christoph, unfortunately the 10% slowdown is only for the case where we get 100% cache hits. If we are forced to run synthesis during compilation (to fill misses in Souper’s cache) the compiler slowdown can be enormous (depending on how hard we want synthesis to try).
Anyhow, I totally agree with you in principle, but fast compile times have not stopped being something that software developers really, really like, so when I write this kind of document I try to be sensitive to that.