This week we finished up the triangle classifier. Not all students’ implementations pass all test cases, which is totally fine with me as long as people learned some lessons about when not to use floating point (I did). On Tuesday we also chose a smallish UNIX utility for each student to evaluate as if it were going to be used as mission-critical software (not sure how our lives would depend on `less` but that’s beside the point). Next Tuesday we’ll read some students arguments and decide whether the evidence is convincing.

Thursday I lectured about assertions, one of my favorite topics. For several years I’ve been avoiding writing a blog post about assertions because there’s plenty of good information out there, but I really do need to write that post now that I’ve sort of collected all my thoughts in one place. The triangle classifier offered poor opportunities for writing assertions, but later on we’ll play with codes where assertions are basically mandatory. Anyway, I’ll try to get that written up soon.

Hello, John.

I am not sure what you mean by “some lessons about when not to use floating point”, so my comment may not apply here, but as a general remark inspired by what this phrase could mean:

The problem with floating-point is that people start with a vague overconfident intuition of what should work, and progressively refine this intuition by removing belief when they are bitten by implementations not doing what they expected.

There aren’t many sub-fields of computer science that are approached this way. A few years ago, one could perhaps name “concurrency”.

It is not a good learning process: at any point in time, one’s knowledge is only the bunch of assumptions one hasn’t tested yet. This is no way to build bridges.

_____

This said, I wanted to submit a single-precision floating-point implementation for the triangle classifier, and I didn’t, because I could not convince myself that it worked or have the time to find an easy tweak it so that I would be sure that it worked.

Hi Pascal, I had been guessing that since a point coordinate in this problem fits into 31 bits, almost any not-stupid FP implementation using doubles (with 52 bits or whatever of mantissa) would be able to solve the problem precisely. But that turned out to be wrong, as I probably would have seen if I were a little further along the progression you mention here. I don’t use FP very much and the one course in numerical analysis that I took in college was not very good, alas.

I would be very interested to see the single-precision solution.

The source of the problems with double precision was that, when the coordinates are between 0 and n, there are approximately 3n^2/pi^2 different possible slopes/angles that are between 0 and 45 degrees, and n is 2^31-1. The test cases that I dropped in the discussion required solutions to distinguish between two angles differing by much less than DBL_EPSILON = 2^-52. The 64-bit significands of the available long double type seemed only barely sufficient (per my handwritten proof) for my FP solution, and the cargl() variant is probably incorrect given an uncooperative cargl().

If n had been, e.g., one million ~ 2^20, things would have worked out better for the FP solutions.

My single-precision code is at:

http://ideone.com/1uBk3Z

The first part of the file contains functions borrowed from CRlibm. The naming convention is like this:

OpD1D2D3[Cond] implements the operation Op. Arguments and results are represented as sums of single-precision floating-point numbers, in the style of http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic

The last digit D3 is the number of floats used to represent the result. The other digit(s) represent the number of floating-point numbers used to represent the arguments.

For those functions that are easier to implement knowing which of the arguments is larger, the Cond version works for all cases, and non-Cond versions expect the largest argument to be passed first.

The versions in CRlibm are for double-precision numbers, and are well documented. I did not dare copy the documentation because I did not know how to adapt the detailed accuracy specifications.

Two single-precision floating-point numbers give you 48 bits of precision, enough to represent one coordinate or the subtraction of two such coordinates. Three single-precision floating-point numbers provide 72 bits of precision, enough to represent the product of one coordinate or difference by another.

The second part of the file is function parse(), which converts the decimal representation of a number between 0 and 2^31-1 to the sum of two floats. After having parsed a few digits in one float, it moves to a second float while computing the power of ten by which the first float needs to be multiplied. The final result is first_float * power_of_ten + second_float, computed with the functions of the first part.

All natural numbers up to 2^24 (16777216 ) can be represented as floats. Because (5 ** 7) * 214 = 16718750 is slightly lower than this limit, it seems possible to use a simpler algorithm, parsing the first few digits up to 214 into a first float and then directly multiplying the first float by ten using float multiplication while up to 7 digits are parsed into the second float. The two floats could be put together with a single Add12 operation in the end.

(“214” is the first three digits of 2^31-1, the largest value for the first three digits of a decimal representation of a number below 2^31 that can be followed by 7 more digits)

The third part, the main() function, relies on cross-product to determine whether two vectors defined by sides of the triangle are colinear, computes squares of lengths of sides as the sum of three floating-point numbers. It compares the squares directly for equality, and uses Pythagoras’ theorem to classify the triangle as right, acute or obtuse.

In this third part, it is the comparisons for equality that require more work. According to the comments in CRlibm, some of the more complex operations on composite numbers are approximate to more than one ULP of the last float (meaning that two triple-single numbers that mathematically represent the same number could end up slightly different). This would require further thinking.

“””The problem with floating-point is that people start with a vague overconfident intuition of what should work, and progressively refine this intuition by removing belief when they are bitten by implementations not doing what they expected.

There aren’t many sub-fields of computer science that are approached this way. A few years ago, one could perhaps name “concurrency”.”””

The commonality here, I think, is that the contracts offered by the primitives don’t compose very well. To some extent we’ve figured out better primitives for concurrency, but with FP, we’re pretty much doomed to complexity or lack of performance. It takes a lot of work to get completely correct, efficient FP code for even relatively simple things like the triangle software; in practice, authors of robust numerical software seem content to detect when things have gone seriously wrong.

On the subject of software-testing and floating point is this recent post:

http://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/