Consider a language where we only allow automatic, implicit type conversions (coercion) among numeric types if every value of the source type can be represented as a value of the target type, i.e., there is no truncation and no round-off. Let us call this kind of coercion value-preserving coercion. With such a strict coercion rule, an unsigned integer with four bytes can be coerced into a signed integer with eight bytes but the compiler (interpreter) will not coerce signed integers to unsigned integers. In this post, we highlight a reason why coercions from single-precision to double-precision floating-point types may be undesirable although they are value preserving.
An IEEE 754 single-precision float has 24 bits of precision whereas an IEEE 754 double-precision float has 53 bits of precision so from a programmer's perspective, value-preserving coercion is simple:
- coercion from floats to integers is always forbidden because of fractional values,
- signed and unsigned integers with 8 or 16 bits can be coerced to single-precision floats,
- signed and unsigned integers with 8, 16, or 32 bits can be coerced to double-precision floats, and
- single-precision floats can be coerced to double-precision floats.
These rules are simple and intuitive.
[The number of bits in the precision refers to normalized floats. A floating-point variable representing integer values is always normalized.]
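NumPy's `can_cast` function with its default "safe" casting rule implements a closely related notion and can be used to check these rules. (A caveat: NumPy is slightly more permissive than the rules above; for instance, it also regards the lossy cast from 64-bit integers to double-precision floats as "safe".)

```python
import numpy as np

# Value-preserving: every uint32 value fits into an int64.
assert np.can_cast(np.uint32, np.int64)

# Not value-preserving: negative int32 values have no uint32 counterpart.
assert not np.can_cast(np.int32, np.uint32)

# 16-bit integers fit into the 24-bit significand of a float32 ...
assert np.can_cast(np.int16, np.float32)
# ... but 32-bit integers do not:
assert not np.can_cast(np.int32, np.float32)

# Single precision is a subset of double precision.
assert np.can_cast(np.float32, np.float64)

# Concretely, 2**24 + 1 is the smallest positive integer that a
# single-precision float cannot represent:
assert float(np.float32(2**24 + 1)) == 2**24
```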
We will now highlight a scenario where coercion from single to double precision is undesirable. Let x and y be positive real numbers such that x + y is an accurate solution to a given problem and let x and y have similar magnitude. Naturally, if both x and y are accurate to seven decimal digits (about 24 binary digits), then x + y will be accurate to about seven digits, too; if both x and y are accurate to 16 decimal digits (about 53 binary digits), then x + y will be accurate to about 16 digits, too. Now, if x is accurate to seven decimal digits and y is accurate to 16 decimal digits, then x + y will be accurate to only about seven digits. The observation in this simple example extends to more complex problems involving vector-valued quantities and vector norms. Thus, from the point of view of a numerical analyst it makes sense to coerce double-precision to single-precision floats when combining these two types.
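The loss of accuracy is easy to demonstrate numerically. In the Python sketch below (the variable names are illustrative), x carries only single-precision accuracy while y is accurate to double precision; the relative error of the sum is then on the order of single-precision round-off:

```python
import numpy as np

# True values: x + y should equal 1 exactly.
true_x, true_y = 1.0 / 3.0, 2.0 / 3.0

x = np.float64(np.float32(true_x))  # accurate to ~7 decimal digits only
y = np.float64(true_y)              # accurate to ~16 decimal digits

rel_err = abs((x + y) - 1.0)
# The error is dominated by the single-precision rounding of x:
# roughly 1e-8, i.e., about seven accurate decimal digits.
print(rel_err)
```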
Both lines of thought made their way into programming languages. For example, the following Python 2 code using NumPy prints `<type 'numpy.float64'>`, meaning the single-precision float was coerced to a double-precision float:

```python
import numpy
x = numpy.float32(1)
y = numpy.float64(1)
print type(x*y)
```
Contrast this behavior with the following Matlab snippet:

```matlab
x = single(1);
y = double(1);
disp(class(x*y));
```

It prints `single`, implying a coercion from double to single precision.
Let us call a calculation involving both single-precision and double-precision floats a mixed-precision calculation. Computations with double-precision floats are decidedly more expensive than computations with single-precision floats, so we can assume that a numerical analyst's mixed-precision calculations are intentional. Consequently, I argue that coercion from single-precision to double-precision floats is undesirable for a numerical analyst because it may hide unintentional mixed-precision calculations. Yet coercions from double to single precision are undesirable as well because they are not value preserving. Hence, explicit type conversions between single- and double-precision floating-point types seem to be the only safe option reconciling the justified but contradictory points of view of programmers and numerical analysts.
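With explicit conversions, a mixed-precision expression has to spell out the conversion, which makes the intent visible. A NumPy sketch of the two possible intents:

```python
import numpy as np

x = np.float32(1.0)  # single precision
y = np.float64(1.0)  # double precision

# Intent: stay in single precision (deliberately discard y's extra digits).
z_single = x * np.float32(y)
assert z_single.dtype == np.float32

# Intent: a genuine mixed-precision step, promoting x on purpose.
z_double = np.float64(x) * y
assert z_double.dtype == np.float64
```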
As an example of an unintentional mixed-precision computation, consider the following piece of code, where `x` is a single-precision float:

```
0.5 * x
```

The value 0.5 can be represented exactly by all floating-point types (with radix 2), so the programmer's intent may be a multiplication of two single-precision floats. Nevertheless, many programming languages like Python, C, C#, and Java treat `0.5` as a double-precision constant (in C, `0.5f` would be the corresponding single-precision constant) and in conjunction with coercion, `0.5 * x` will be a double-precision float. Matlab avoids the problem by interpreting `0.5` as a double-precision value and coercing to single precision where needed. Haskell avoids the problem, too:
```
Prelude> let x = 1 :: Float
Prelude> let y = 1 :: Double
Prelude> :t 0.5
0.5 :: Fractional a => a
Prelude> :t 0.5 * x
0.5 * x :: Float
Prelude> :t 0.5 * y
0.5 * y :: Double
```

Here, we executed statements in the Haskell interpreter GHCi. The command `:t` prints the type of the following expression and as we can see in line 4, the constant `0.5` is neither a `Double` nor a `Float`; it is instantiated at the appropriate type in the expressions `0.5 * x` and `0.5 * y`.