Type Coercions and Floating-Point Types

Consider a language where we only allow automatic, implicit type conversions (coercion) among numeric types if every value of the source type can be represented as a value of the target type, i.e., there is no truncation and no round-off. Let us call this kind of coercion value-preserving coercion. With such a strict coercion rule, a four-byte unsigned integer can be coerced into an eight-byte signed integer, but the compiler (or interpreter) will not coerce signed integers to unsigned integers. In this post, we highlight a reason why coercions from single-precision to double-precision floating-point types may be undesirable even though they are value preserving.

An IEEE 754 single-precision float has 24 bits of precision, whereas an IEEE 754 double-precision float has 53 bits of precision. From a programmer's perspective, value-preserving coercion therefore boils down to a few rules:

• coercion from floats to integers is always forbidden because of fractional values,
• signed and unsigned integers with 8 or 16 bits can be coerced to single-precision floats,
• signed and unsigned integers with 8, 16, or 32 bits can be coerced to double-precision floats, and
• single-precision floats can be coerced to double-precision floats.

These rules are simple and intuitive.
[The number of bits of precision refers to normalized floats. A floating-point variable holding a nonzero integer value is always normalized.]
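The rules above can be checked mechanically. The following is a minimal sketch in Python using only the standard library. The names `PRECISION_BITS`, `int_to_float_is_value_preserving`, and `to_float32` are made up for this illustration, and the check assumes integer widths of at most 64 bits so that the exponent range of the float types never becomes the limiting factor.

```python
import struct

# Significand precision in bits, including the implicit leading bit.
PRECISION_BITS = {"float32": 24, "float64": 53}


def int_to_float_is_value_preserving(int_bits, signed, float_type):
    """Return True if every integer of the given width is exact in float_type."""
    # A signed n-bit integer needs n - 1 bits for its magnitude; the most
    # negative value -2**(n-1) is a power of two and therefore also exact.
    magnitude_bits = int_bits - 1 if signed else int_bits
    return magnitude_bits <= PRECISION_BITS[float_type]


def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# The rules from the list above.
assert int_to_float_is_value_preserving(16, signed=False, float_type="float32")
assert not int_to_float_is_value_preserving(32, signed=True, float_type="float32")
assert int_to_float_is_value_preserving(32, signed=False, float_type="float64")
assert not int_to_float_is_value_preserving(64, signed=True, float_type="float64")

# Brute-force confirmation: every 16-bit integer survives a round trip
# through single precision, but 2**24 + 1 does not.
assert all(to_float32(float(i)) == i for i in range(-2**15, 2**16))
assert to_float32(float(2**24 + 1)) != 2**24 + 1
```

The last assertion shows where the 24 bits run out: `2**24 + 1` needs 25 significant bits, so it is rounded when stored in single precision.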

We will now highlight a scenario where coercion from single to double precision is undesirable. Let `x` and `y` be positive real numbers of similar magnitude such that `x+y` is an accurate solution to a given problem. Naturally, if both `x` and `y` are accurate to seven decimal digits (about 24 binary digits), then `x+y` will be accurate to about seven digits, too; if both `x` and `y` are accurate to 16 decimal digits (about 53 binary digits), then `x+y` will be accurate to about 16 digits, too. Now, if `x` is accurate to seven decimal digits and `y` is accurate to 16 decimal digits, then `x+y` will be accurate to only about seven digits because the error in `x` dominates the error of the sum. The observation in this simple example can be extended to more complex problems involving vector-valued quantities and vector norms. Thus, from the point of view of a numerical analyst, it makes sense to coerce double-precision to single-precision floats when combining these two types: the additional digits of a double-precision result would carry no additional accuracy.
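To make the argument concrete, here is a small worked example in Python. The choice `x = 1/3` and `y = 2/3` is mine and purely illustrative, as is the helper `to_float32`, which emulates single precision with the standard `struct` module. Exact rational arithmetic from `fractions` is used to measure the errors without introducing further round-off.

```python
import struct
from fractions import Fraction


def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Exact quantities: x = 1/3 and y = 2/3, so x + y = 1.
x_exact, y_exact = Fraction(1, 3), Fraction(2, 3)

x_single = to_float32(1 / 3)  # accurate to about seven decimal digits
y_double = 2 / 3              # accurate to about 16 decimal digits

# Fraction(float) converts exactly, so these errors contain no extra round-off.
error_x = abs(Fraction(x_single) - x_exact)
error_y = abs(Fraction(y_double) - y_exact)
error_sum = abs(Fraction(x_single) + Fraction(y_double) - (x_exact + y_exact))

print("error in x:    ", float(error_x))
print("error in y:    ", float(error_y))
print("error in x + y:", float(error_sum))
```

The error of the sum is practically identical to the error of the single-precision operand; the high accuracy of `y` does not help.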

Both lines of thought made their way into programming languages. For example, the following Python 2 code using NumPy prints `<type 'numpy.float64'>`, meaning the single-precision float was coerced to a double-precision float:

```
import numpy
x = numpy.float32(1)
y = numpy.float64(1)
print type(x*y)
```

Contrast this behavior with the following Matlab snippet:

```
x = single(1);
y = double(1);
disp(class(x*y));
```

It prints `single`, implying a coercion from double to single precision.

Let us call a calculation that uses both single-precision and double-precision floats a mixed-precision calculation. Computations with double-precision floats are typically more expensive than computations with single-precision floats (each value takes twice the memory, for one), so we can assume that a numerical analyst performs mixed-precision calculations only intentionally. Consequently, I argue that coercion from single-precision to double-precision floats is undesirable for a numerical analyst because it may hide unintentional mixed-precision calculations. Yet coercions from double to single precision are undesirable as well because they are not value preserving. Hence, explicit type conversions between single- and double-precision floating-point types seem to be the only safe option reconciling the justified but contradictory points of view of programmers and numerical analysts.
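What "explicit conversions only" could look like is sketched below in Python. The classes `Single` and `Double` and their methods `to_single` and `to_double` are hypothetical and exist only for this illustration; they do not correspond to any existing library. Single precision is emulated by rounding through the standard `struct` module.

```python
import struct


class Double:
    """A double-precision value that refuses implicit mixing with Single."""

    def __init__(self, value):
        self.value = float(value)

    def __mul__(self, other):
        if not isinstance(other, Double):
            raise TypeError("mixed precision: convert explicitly first")
        return Double(self.value * other.value)

    def to_single(self):
        # Explicit and possibly lossy: rounds to the nearest single.
        return Single(self.value)


class Single:
    """A single-precision value that refuses implicit mixing with Double."""

    def __init__(self, value):
        # Round to the nearest single-precision value.
        self.value = struct.unpack("<f", struct.pack("<f", float(value)))[0]

    def __mul__(self, other):
        if not isinstance(other, Single):
            raise TypeError("mixed precision: convert explicitly first")
        # The constructor rounds the product back to single precision.
        return Single(self.value * other.value)

    def to_double(self):
        # Explicit and value preserving.
        return Double(self.value)


x = Single(3.0)
half = Double(0.5)

try:
    half * x  # unintentional mixed precision is rejected
except TypeError as error:
    print("rejected:", error)

intentional = half * x.to_double()  # mixed precision, but visible in the code
print(type(intentional).__name__, intentional.value)
```

With this design, neither point of view is silently overruled: the programmer never loses digits without asking for it, and the numerical analyst sees every place where precisions are mixed.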

As an example of an unintentional mixed-precision computation, consider the following piece of code, where `x` is a single-precision float:

`0.5 * x`

The value 0.5 can be represented exactly by all floating-point types with radix 2, so the programmer's intent may be a multiplication of two single-precision floats. Nevertheless, many programming languages like Python, C, C#, and Java treat `0.5` as a double-precision constant (in C, C#, and Java, `0.5f` would be the corresponding single-precision constant), and in conjunction with coercion, `0.5 * x` will be a double-precision float. Matlab avoids the problem by interpreting `0.5` as a double-precision value and coercing to single precision if needed. Haskell avoids the problem, too:

```
Prelude> let x = 1 :: Float
Prelude> let y = 1 :: Double
Prelude> :t 0.5
0.5 :: Fractional a => a
Prelude> :t 0.5 * x
0.5 * x :: Float
Prelude> :t 0.5 * y
0.5 * y :: Double
```

Here, we executed statements in the Haskell interpreter GHCi. `:t` prints the type of the expression that follows it. As we can see in line 4, the constant `0.5` is neither a Double nor a Float: its type is polymorphic over all `Fractional` types, and it takes on whatever type is required in the expressions `0.5 * x` and `0.5 * y`.
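The type of the constant is not merely a cosmetic matter. The following Python sketch emulates both readings of a constant times a single-precision variable; since Python has no built-in single-precision scalar, the hypothetical helper `to_float32` rounds through the standard `struct` module, and the constant `0.1` is my own choice for contrast with `0.5`.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x = to_float32(3.0)  # stands in for a single-precision variable

# 0.5 is exact in both formats, so both readings yield the same value;
# only the type (and the cost) of the result differs.
single_half = to_float32(to_float32(0.5) * x)  # single * single -> single
double_half = 0.5 * x                          # double * (coerced) single
assert single_half == double_half

# 0.1 is not exactly representable in radix 2, so the two readings of the
# constant now produce different values as well.
single_tenth = to_float32(to_float32(0.1) * x)
double_tenth = 0.1 * x
assert single_tenth != double_tenth

print(single_tenth, double_tenth)
```

For `0.5`, silent coercion to double precision changes nothing but the type; for constants such as `0.1`, it also changes the numerical result.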