Consider a language where we only allow automatic, implicit type conversions (coercion) among numeric types if every value of the source type can be represented as a value of the target type, i.e., there is no truncation and no round-off. Let us call this kind of coercion *value-preserving coercion*. With such a strict coercion rule, an unsigned integer with four bytes can be coerced into a signed integer with eight bytes but the compiler (interpreter) will not coerce signed integers to unsigned integers. In this post, we highlight a reason why coercions from single-precision to double-precision floating-point types may be undesirable although they are value preserving.
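NumPy's "safe" casting predicate captures essentially this notion and can be used to check the integer example above; a minimal sketch (here `casting="safe"` is NumPy's name for value-preserving casts):

```python
import numpy as np

# Every uint32 value fits in an int64, so the "safe" (value-preserving)
# cast is allowed...
assert np.can_cast(np.uint32, np.int64, casting="safe")

# ...but negative signed values have no unsigned counterpart, so the
# reverse direction is rejected.
assert not np.can_cast(np.int64, np.uint64, casting="safe")
```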

An IEEE 754 single-precision float has 24 bits of precision, whereas an IEEE 754 double-precision float has 53 bits of precision, so from a programmer's perspective, value-preserving coercion is simple:

- coercion from floats to integers is always forbidden because of fractional values,
- signed and unsigned integers with 8 or 16 bits can be coerced to single-precision floats,
- signed and unsigned integers with 8, 16, or 32 bits can be coerced to double-precision floats, and
- single-precision floats can be coerced to double-precision floats.

These rules are simple and intuitive.

*[The number of bits in the precision refers to normalized floats. A floating-point variable representing integer values is always normalized.]*
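These rules line up with NumPy's "safe" casting predicate; a small sketch verifying each bullet (assuming NumPy is available):

```python
import numpy as np

# Floats never coerce to integers.
assert not np.can_cast(np.float32, np.int64, casting="safe")

# 8- and 16-bit integers fit into the 24-bit significand of a single...
assert np.can_cast(np.int16, np.float32, casting="safe")
assert np.can_cast(np.uint16, np.float32, casting="safe")
# ...but 32-bit integers do not: they may need up to 32 significand bits.
assert not np.can_cast(np.int32, np.float32, casting="safe")

# Integers with up to 32 bits fit into the 53-bit significand of a double.
assert np.can_cast(np.int32, np.float64, casting="safe")
assert np.can_cast(np.uint32, np.float64, casting="safe")

# Single to double precision is value preserving.
assert np.can_cast(np.float32, np.float64, casting="safe")
```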

We will now highlight a scenario where coercion from single to double precision is undesirable. Let `x` and `y` be positive real numbers of similar modulus such that `x+y` is an accurate solution to a given problem. Naturally, if both `x` and `y` are accurate to seven decimal digits (about 24 binary digits), then `x+y` will be accurate to about seven digits, too; if both `x` and `y` are accurate to 16 decimal digits (about 53 binary digits), then `x+y` will be accurate to about 16 digits, too. Now, if `x` is accurate to seven decimal digits and `y` is accurate to 16 decimal digits, then `x+y` will be accurate to only about *seven* digits. The observation in this simple example extends to more complex problems involving vector-valued quantities and vector norms. Thus, from the point of view of a numerical analyst it makes sense to coerce double-precision to single-precision floats when combining these two types.
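The effect is easy to reproduce; a sketch with NumPy (the choice of π and e as test values is arbitrary):

```python
import numpy as np

exact = np.float64(np.pi) + np.float64(np.e)   # reference sum, accurate to ~16 digits

x = np.float32(np.pi)                          # accurate to ~7 decimal digits only
y = np.float64(np.e)                           # accurate to ~16 decimal digits

mixed = np.float64(x) + y                      # coercing x to double does not help
rel_err = abs(float(mixed) - float(exact)) / float(exact)

# The error is dominated by the single-precision rounding of x,
# i.e., the sum is accurate to about seven digits, not sixteen.
assert 1e-9 < rel_err < 1e-6
```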

Both lines of thought made their way into programming languages. For example, the following Python 2 code using NumPy prints `<type 'numpy.float64'>`, meaning the single-precision float was coerced to a double-precision float:

```python
import numpy

x = numpy.float32(1)
y = numpy.float64(1)
print type(x*y)
```

Contrast this behavior with the following Matlab snippet:

```matlab
x = single(1);
y = double(1);
disp(class(x*y));
```

It prints `single`, implying a coercion from double to single precision.

Let us call a calculation that uses both single-precision and double-precision floats a *mixed-precision* calculation. Computations with double-precision floats are decidedly more expensive than computations with single-precision floats, so we can assume that a numerical analyst's mixed-precision calculations are intentional. Consequently, I argue that coercion from single-precision to double-precision floats is undesirable for a numerical analyst because it may hide unintentional mixed-precision calculations. Yet coercions from double to single precision are undesirable as well, because they are not value preserving. Hence, explicit type conversions between single- and double-precision floating-point types seem to be the only safe option reconciling the justified but contradictory points of view of programmers and numerical analysts.
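With NumPy, such explicit conversions are straightforward to write; a sketch showing both intention-revealing directions:

```python
import numpy as np

x = np.float32(1.0)   # single precision
y = np.float64(1.0)   # double precision

# Explicit, intention-revealing conversions instead of silent coercion:
up = np.float64(x) * y       # deliberately compute in double precision
down = x * np.float32(y)     # deliberately compute in single precision

assert up.dtype == np.float64
assert down.dtype == np.float32
```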

As an example of an unintentional mixed-precision computation, consider the following piece of code, where `x` is a single-precision float:

```
0.5 * x
```

The value 0.5 can be represented exactly by all floating-point types (with radix 2), so the programmer's intent may be a multiplication of two single-precision floats. Nevertheless, many programming languages like Python, C, C#, and Java treat `0.5` as a double-precision constant (`0.5f` would be the corresponding single-precision constant) and, in conjunction with coercion, `0.5 * x` will be a double-precision float. Matlab avoids the problem by interpreting `0.5` as a double-precision value and coercing to single precision if needed. Haskell avoids the problem, too:

```haskell
Prelude> let x = 1 :: Float
Prelude> let y = 1 :: Double
Prelude> :t 0.5
0.5 :: Fractional a => a
Prelude> :t 0.5 * x
0.5 * x :: Float
Prelude> :t 0.5 * y
0.5 * y :: Double
```

Here, we executed statements in the Haskell interpreter GHCi. `:t` prints the type of the following expression, and as we can see in line 4, the constant `0.5` is neither a Double nor a Float; it assumes the required type in the expressions `0.5 * x` and `0.5 * y`.