A Month of Invalid GCC Bug Reports, and How to Eliminate Some of Them

During July 2016 the GCC developers marked 38 bug reports as INVALID. Here’s the full list. They fall into these (subjective) categories:

8 bug reports stemmed from undefined behavior in the test case (71753, 71780, 71803, 71813, 71885, 71955, 71957, 71746)
1 bug report was complaining about UB exploitation in general (71892)
15 bug reports came from a misunderstanding (or disagreement) about the non-UB semantics of a programming language, usually C++ but also C and Fortran (71786, 71788, 71794, 71804, 71809, 71890, 71914, 71939, 71963, 71967, 72580, 72750, 72761, 71844, 71772)
4 bug reports stemmed from a misunderstanding of something besides the language semantics, such as command line flags (72736, 71729, 71995, 71777)
5 bug reports were caused by an unrelated problem on the reporter’s system such as an out-of-memory error, a borked Cygwin installation, out-of-date files in a build tree, etc. (71735, 71770, 71903, 71918, 71978)
1 bug report was about a bug that the devs didn’t want to fix since it was in an inactive branch and had been fixed in all active branches (72051)
4 bug reports didn’t end up demonstrating any reproducible problem (71940, 71944, 71986, 72076)

I’ve often thought that it would be nice for a compiler bug reporting system to be active instead of passively serving up files and discussions. An active bug reporting system would be able to run:

a wide variety of compiler versions,
the compiler’s output (that is, the compiled test program), and
tools for finding undefined behaviors.

Of course not all bug reports would be able to make use of these features. However, one can imagine that there is a significant subset of compiler bug reports where the reporter, cooperating with the system, would be able to conclusively demonstrate that the compiler crashes or generates wrong code. In cases where this cannot be demonstrated, the process of interacting with an active bug reporting system will help the reporter understand what the actual issue is without wasting a compiler developer’s time.
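To make this a bit more concrete, here is a minimal Python sketch of a triage step that such a system might run. It is not an existing tool. The compiler names gcc-5 and gcc-6, the chosen optimization levels, and the helpers compile_and_run, triage, and check_report are all hypothetical. The sketch also assumes that the test program exits with status 0 on success and prints deterministic output. It compiles the test case once with GCC’s UndefinedBehaviorSanitizer (-fsanitize=undefined) and once for each compiler/optimization combination. It then labels the report as a compiler crash, as UB in the test case, as a disagreement between configurations, or as showing no discrepancy. UBSan catches only some kinds of undefined behavior, so a clean sanitizer run does not prove that a test case is valid.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass

# Hypothetical compiler names; a real service would use whatever GCC builds it has.
COMPILERS = ["gcc-5", "gcc-6"]
OPT_LEVELS = ["-O0", "-O2", "-O3"]
# GCC's UndefinedBehaviorSanitizer; -fno-sanitize-recover=all makes the program
# terminate at the first detected UB instead of printing a report and continuing.
UBSAN_FLAGS = ["-fsanitize=undefined", "-fno-sanitize-recover=all"]


@dataclass
class Outcome:
    ok: bool     # True if the program compiled, ran, and exited with status 0
    output: str  # program stdout+stderr, or compiler stderr if compilation failed


def compile_and_run(compiler, flags, source):
    """Compile `source` with `compiler` and `flags`, then run the executable."""
    with tempfile.TemporaryDirectory() as tmp:
        exe = os.path.join(tmp, "a.out")
        build = subprocess.run([compiler, *flags, source, "-o", exe],
                               capture_output=True, text=True)
        if build.returncode != 0:
            return Outcome(False, build.stderr)
        try:
            run = subprocess.run([exe], capture_output=True, text=True, timeout=10)
        except subprocess.TimeoutExpired:
            return Outcome(False, "timeout")
        return Outcome(run.returncode == 0, run.stdout + run.stderr)


def triage(ubsan, results):
    """Classify a report from one UBSan run plus a dict mapping
    (compiler, opt_level) -> Outcome."""
    everything = [ubsan, *results.values()]
    # GCC prints "internal compiler error" when it crashes.
    if any("internal compiler error" in o.output for o in everything):
        return "compiler crash"
    # UBSan reports start with "<file>:<line>:<col>: runtime error: ...".
    if "runtime error:" in ubsan.output:
        return "undefined behavior in test case"
    # Different exit status or output across configurations is suspicious.
    if len({(o.ok, o.output) for o in results.values()}) > 1:
        return "configurations disagree: possible wrong-code bug"
    return "no discrepancy found"


def check_report(source):
    """Run every configuration on one test case and return a verdict."""
    ubsan = compile_and_run(COMPILERS[-1], ["-O0", "-g", *UBSAN_FLAGS], source)
    results = {(cc, opt): compile_and_run(cc, [opt], source)
               for cc in COMPILERS for opt in OPT_LEVELS}
    return triage(ubsan, results)
```

A service built this way would call check_report("testcase.c") and show the verdict to the reporter before the report ever reaches a compiler developer.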

An active bug reporting system can run lots of experiments to determine how many compiler versions, how many target platforms, and how many optimization levels are affected by the bug. It can also determine which revision introduced the problem and who committed the breaking change — suggesting an initial owner for the bug. It can run a testcase reducer to make the program triggering the bug smaller. All of these things will help compiler developers prioritize among reported bugs. The system should also be able to automatically flag duplicate bug reports. When a bug stops reproducing, the bug reporting system will notice this and flag the PR as being ready to close, and could also help out by packaging up an addition to the regression test suite.
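Finding the revision that introduced a bug is a binary search over the compiler’s history. The sketch below assumes a hypothetical is_bad(rev) callback that builds the compiler at a given revision and reports whether the test case still fails. Nearly all of the cost lies in that callback. The sketch also assumes that the bug, once introduced, stays present, so the revision list is a run of good revisions followed by a run of bad ones.

```python
from typing import Callable, Sequence, TypeVar

Rev = TypeVar("Rev")


def first_bad_revision(revs: Sequence[Rev], is_bad: Callable[[Rev], bool]) -> Rev:
    """Return the earliest revision for which is_bad() holds.

    Assumes revs is in commit order, revs[0] is known good, revs[-1] is
    known bad, and every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] is good, revs[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return revs[hi]
```

Each iteration halves the remaining range, so locating the culprit among N revisions takes about log2 N compiler builds. Tools such as `git bisect run` perform the same search over a Git history. The author of the returned revision is the natural initial owner for the bug.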

Update:

A few additional details. During July a total of 328 bugs were reported, ignoring those marked as spam. 143 of these were resolved: 22 as duplicates, 81 as fixed, 38 as invalid, 1 as wontfix, and 1 as worksforme. Out of the remaining 185 unresolved bugs, 15 are assigned, 86 are new, 1 is reopened, 74 are unconfirmed, and 9 are waiting.

I believe that an active bug reporting system will make many of these 290 non-invalid bug reports easier to deal with, as opposed to only helping with the invalid ones!

Note: In the initial version of this post I mentioned 36 invalid bugs, not 38, because I was only searching for bugs that were marked as resolved. Including closed bugs in the search brings the total to 38.

August 15, 2016

regehr

Compilers, Computer Science, Software Correctness

7 responses to “A Month of Invalid GCC Bug Reports, and How to Eliminate Some of Them”

Peter Maydell says:

August 15, 2016 at 11:22 am

Assuming I didn’t drive bugzilla wrong, those 36 bug reports are part of a total of 328 bugs opened during that time frame, so about 10% of the total.
regehr says:

August 15, 2016 at 11:50 am

Peter, right! I should figure out what happened to the rest of these; I’ll do that and update the post.
octoploid says:

August 15, 2016 at 1:47 pm

Yes, no doubt such a system would be very nice to have.
But who will implement it? It will be a huge effort to get it robust and reliable.

The last time I mentioned this idea on the #gcc IRC channel, the reply was something like:
“why not make it automatically fix the compiler, too?”
regehr says:

August 15, 2016 at 4:31 pm

Hi octoploid, indeed there are plenty of computer scientists working on automatic bug fixing. I suppose it is only a matter of time until they turn their efforts towards compilers.
Seo Sanghyeon says:

August 16, 2016 at 6:10 am

PHP has such an active system, https://3v4l.org/ which tests more than 150 different versions of two independent implementations (original PHP and HHVM). 3v4l.org is officially endorsed by HHVM developers: https://github.com/facebook/hhvm/wiki/How-to-Report-Issues
regehr says:

August 17, 2016 at 1:00 am

Seo, thanks for the links, this is great stuff.
Anton Ertl says:

August 27, 2016 at 10:15 am

I found the large number of “invalid” bugs interesting, as I did something similar last year and found only three. Looking at my writeup explains the difference:

“Of the 25 bug reports for gcc components rtl-optimization and tree-optimization resolved or closed between 2015-07-01 and 2015-07-16, three were marked as invalid, and all three were due to “optimizations”, none due to optimizations”

So the difference is due to the components; you did not restrict that, and from the 8 reports “from undefined behavior in the test case”, only one was reported against the components above, most were against components C or C++, and one against libstdc++. Looking at these 8 reports:

71753 (integer overflow), 71803 (misleading warning from integer overflow), 71885 (memset() “optimized” away) are compiler bugs in my book. 71957 (something involving C++ virtual functions): code that worked for years is broken by a recent gcc version. I don’t understand the program well enough to tell what is happening, but gcc certainly does not follow the principle “we don’t break the user experience” there.

71780 (comparison does not satisfy contract), 71813 (out-of-scope access to local), 71746 (out-of-scope access to a local), 71955 (uninitialized data) look like programming bugs that usually show up independent of optimization or compiler version (but I don’t understand the code for 71780 or 71955 enough to tell for sure), so I tend to see these bugs as really invalid. Note that none of the reporters mention that the program has worked in earlier versions of gcc.

As for automating bug handling, there is not that much effort required: If the test program terminates, close the bug report as invalid because it is not a strictly conformant program (at least as far as C programs are concerned); if it does not terminate, close it as invalid because it does not terminate. And if there is no test program, close the bug report as invalid because of the lack of a test program.

More seriously, I expect that the kind of treatment you envision will make the gcc people even less considerate of the people who spend their time on writing bug reports, which will have the effect of deterring people from spending their time on bug reporting; whether that is good or bad depends on your POV.