ExactFloat

ExactFloat is a multiple-precision floating-point type based on the OpenSSL Bignum library. It has the same interface as the built-in "float" and "double" types, but only supports the subset of operators and intrinsics where it is possible to compute the result exactly. For example, ExactFloat supports addition and multiplication but not division (since in general, the quotient of two floating-point numbers cannot be represented exactly). Exact arithmetic is useful for geometric algorithms, especially for disambiguating cases where ordinary double-precision arithmetic yields an uncertain result.

ExactFloat is a subset of the faster and more capable MPFloat class (which is based on the GNU MPFR library). The main reason to use this class rather than MPFloat is that it is subject to a BSD-style license rather than the much more restrictive LGPL license.

It has the following features:

- ExactFloat uses the same syntax as the built-in "float" and "double" types, for example: x += 4 + fabs(2*y*y - z*z). There are a few differences (see below), but the syntax is compatible enough so that ExactFloat can be used as a template argument to templatized classes such as Vector2, VectorN, Matrix3x3, etc.

- Results are not rounded; instead, precision is increased so that the result can be represented exactly. An inexact result is returned only in the case of underflow or overflow (yielding signed zero or infinity respectively), or if the maximum allowed precision is exceeded (yielding NaN). ExactFloat uses IEEE 754-2008 rules for handling infinities, NaN, rounding to integers, etc.

- ExactFloat only supports calculations where the result can be represented exactly. Therefore it supports intrinsics such as fabs() but not functions whose results are generally inexact, such as sin() and sqrt().
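To illustrate why sums and products can always be kept exact when the mantissa is allowed to grow, here is a minimal sketch using a toy type. `Dyadic`, `Add`, `Mul`, and `BitLength` are hypothetical names invented for this example and are not part of ExactFloat; the sketch uses a 64-bit mantissa where a real implementation would use an arbitrary-precision integer.

```cpp
#include <cstdint>

// Hypothetical toy type: represents the value mant * 2^exp exactly.
// A real implementation needs an arbitrary-precision mantissa; int64_t is
// used here only to keep the sketch short, and it overflows for large inputs.
struct Dyadic {
  std::int64_t mant;
  int exp;
};

// Exact sum: align both operands to the smaller exponent, then add.
Dyadic Add(Dyadic a, Dyadic b) {
  if (a.exp < b.exp) {
    const Dyadic t = a;
    a = b;
    b = t;
  }
  // Now a.exp >= b.exp, so rescaling a's mantissa loses nothing.
  const std::int64_t scale = std::int64_t{1} << (a.exp - b.exp);
  return Dyadic{a.mant * scale + b.mant, b.exp};
}

// Exact product: multiply mantissas, add exponents.
Dyadic Mul(Dyadic a, Dyadic b) {
  return Dyadic{a.mant * b.mant, a.exp + b.exp};
}

// Number of bits needed to hold |m|.
int BitLength(std::int64_t m) {
  std::uint64_t u = m < 0 ? 0 - static_cast<std::uint64_t>(m)
                          : static_cast<std::uint64_t>(m);
  int n = 0;
  for (; u != 0; u >>= 1) ++n;
  return n;
}
```

For instance, `Add(Dyadic{1, 60}, Dyadic{1, 0})` yields a mantissa of 2^60 + 1, which needs 61 bits. A double has only 53 mantissa bits and would round the same sum back to 2^60. Division has no counterpart here because a quotient such as 1/3 has no finite binary expansion.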

Syntax Compatibility with "float" and "double"

ExactFloat supports a subset of the operators and intrinsics for the
built-in "double" type.  (Thus it supports fabs() but not fabsf(), for
example.)  The syntax is different only in the following cases:

- Casts and implicit conversions to built-in types (including "bool") are
not supported.  So for example, the following will not compile:

ExactFloat x = 7.5;
double y = x;            // ERROR: use x.ToDouble() instead
long z = x;              // ERROR: use x.ToDouble() or lround(trunc(x))
q = static_cast<int>(x); // ERROR: use x.ToDouble() or lround(trunc(x))
if (x) { ... }           // ERROR: use (x != 0) instead

- The glibc floating-point classification macros (fpclassify, isfinite,
isnormal, isnan, isinf) are not supported.  Instead there are
zero-argument methods:

ExactFloat x;
if (isnan(x)) { ... }  // ERROR: use (x.is_nan()) instead
if (isinf(x)) { ... }  // ERROR: use (x.is_inf()) instead

Using ExactFloat with Vector3, etc.

ExactFloat can be used with templatized classes such as Vector2 and Vector3 (see "util/math/vector.h"), with the following limitations:

- Cast() can be used to convert other vector types to an ExactFloat vector type, but not the other way around. This is because there are no implicit conversions from ExactFloat to built-in types. You can work around this by calling an explicit conversion method such as ToDouble(). For example:

typedef Vector3<ExactFloat> Vector3_xf;
Vector3_xf x;
Vector3_d y;
x = Vector3_xf::Cast(y);   // This works.
y = Vector3_d::Cast(x);    // This doesn't.
y = Vector3_d(x[0].ToDouble(), x[1].ToDouble(), x[2].ToDouble());  // OK

- IsNaN() is not supported because it calls isnan(), which is defined as a macro in <math.h> and therefore can't easily be overridden.

Precision Semantics

Unlike MPFloat, ExactFloat does not allow a maximum precision to be specified (it is always unbounded). Therefore it does not have any of the corresponding constructors.

The current precision of an ExactFloat (i.e., the number of bits in its mantissa) is returned by prec(). The precision is increased as necessary so that the result of every operation can be represented exactly.

Constructors

this
this(ExactFloat b)

Copy constructor.

this
this(T v)
Undocumented in source.

Members

Enums

RoundingMode
enum RoundingMode

Rounding modes. kRoundTiesToEven and kRoundTiesAwayFromZero both round to the nearest representable value unless two values are equally close. In that case kRoundTiesToEven rounds to the nearest even value, while kRoundTiesAwayFromZero always rounds away from zero.
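The two tie-breaking rules differ only at exact halfway cases. As a rough illustration on plain integers (a sketch, not the library's implementation; `TieMode` and `RoundShift` are hypothetical names), dropping the low `shift` bits of an integer mantissa could look like this:

```cpp
#include <cstdint>

enum class TieMode { kToEven, kAwayFromZero };  // hypothetical, for illustration

// Returns m / 2^shift rounded to the nearest integer, breaking exact ties
// according to `mode`. Requires 0 < shift < 63.
std::int64_t RoundShift(std::int64_t m, int shift, TieMode mode) {
  const bool negative = m < 0;
  const std::uint64_t mag = negative ? 0 - static_cast<std::uint64_t>(m)
                                     : static_cast<std::uint64_t>(m);
  const std::uint64_t half = std::uint64_t{1} << (shift - 1);
  const std::uint64_t low = mag & ((half << 1) - 1);  // bits being discarded
  std::uint64_t q = mag >> shift;                     // truncated magnitude
  if (low > half) {
    ++q;  // strictly closer to the next value up
  } else if (low == half) {
    // Exact tie: always round away from zero, or only when q is odd.
    if (mode == TieMode::kAwayFromZero || (q & 1) != 0) ++q;
  }
  const std::int64_t r = static_cast<std::int64_t>(q);
  return negative ? -r : r;
}
```

With `shift = 1`, the input 5 represents 2.5: ties-to-even gives 2 and ties-away-from-zero gives 3. The input 7 represents 3.5 and rounds to 4 under both rules.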

Functions

abs
ExactFloat abs()
Undocumented in source. Be warned that the author may not have intended to support it.

exp
int exp()

Return the exponent of this ExactFloat given that the mantissa is in the range [0.5, 1). It is an error to call this method if the value is zero, infinity, or NaN.
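The same normalization, with the mantissa in [0.5, 1), is what the standard `std::frexp` function uses for doubles. The following sketch shows the convention on built-in doubles only; `BinaryExponent` is a hypothetical helper, not an ExactFloat method.

```cpp
#include <cmath>

// Returns the binary exponent e such that |x| = m * 2^e with m in [0.5, 1).
// Only meaningful for finite, nonzero x, mirroring the restriction above.
int BinaryExponent(double x) {
  int e = 0;
  std::frexp(x, &e);  // frexp returns the mantissa; we only need e here
  return e;
}
```

Under this convention 8.0 is 0.5 * 2^4, so its exponent is 4, and 1.0 is 0.5 * 2^1, so its exponent is 1 rather than 0.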

fabs
ExactFloat fabs()

Absolute value.

isFinite
bool isFinite()

Return true if this value is a normal floating-point number or zero, i.e. it is not infinity or NaN.

isInf
bool isInf()

Return true if this value is infinity (positive or negative).

isNan
bool isNan()

Return true if this value is NaN (Not-a-Number).

isNormal
bool isNormal()

Return true if this value is a normal floating-point number. Non-normal values (zero, infinity, and NaN) often need to be handled separately because they are represented using special exponent values and their mantissa is not defined.

isZero
bool isZero()

Return true if this value is zero (including negative zero).

opAssign
void opAssign(T v)

Construct an ExactFloat from a "double". The constructor is implicit so that this class can be used as a replacement for "float" or "double" in templatized libraries. (With an explicit constructor, code such as "ExactFloat f = 2.5;" would not compile.) All double-precision values are supported, including denormalized numbers, infinities, and NaNs.

opAssign
void opAssign(T v)

Construct an ExactFloat from an "int". Note that in general, ints are automatically converted to doubles and so would be handled by the constructor above. However, the particular argument (0) would be ambiguous; the compiler wouldn't know whether to treat it as a "double" or "const char*" (since 0 is a valid null pointer constant). Adding an "int" constructor solves this problem.
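The ambiguity described above can be reproduced in C++ with a small stand-in class. `Num` is a hypothetical type used only to show the overload resolution; it records which constructor was selected.

```cpp
// Hypothetical stand-in type; `which` records the constructor that ran.
struct Num {
  int which;
  Num(double) : which(1) {}
  Num(const char*) : which(2) {}
  // Without this overload, Num(0) is ambiguous in C++: the literal 0
  // converts equally well to double and to a null const char*.
  Num(int) : which(3) {}
};
```

With the `int` overload present, `Num(0)` is an exact match and selects it, while `Num(2.5)` and `Num("1.5")` still select the `double` and string constructors respectively.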

opAssign
void opAssign(T s)

Construct an ExactFloat from a string (such as "1.2e50"). Requires that the value is exactly representable as a floating-point number (so for example, "0.125" is allowed but "0.1" is not).
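A decimal string is exactly representable in binary floating point only if, after reducing the fraction, the denominator is a power of two. The following sketch checks that condition for plain decimal literals without an exponent part. `IsExactBinaryDecimal` is a hypothetical helper written for this illustration and says nothing about how the library itself parses strings.

```cpp
#include <cstdint>
#include <string>

// Returns true if a plain decimal literal such as "0.125" (optional sign,
// digits, optional fractional part, no exponent) has a finite binary
// expansion. Sketch only: assumes well-formed input that fits in 64 bits.
bool IsExactBinaryDecimal(const std::string& s) {
  std::uint64_t numerator = 0;  // all digits, read as one integer
  int frac_digits = 0;          // number of digits after the '.'
  bool seen_point = false;
  for (char c : s) {
    if (c == '.') {
      seen_point = true;
      continue;
    }
    if (c < '0' || c > '9') continue;  // skip a leading sign
    numerator = numerator * 10 + static_cast<std::uint64_t>(c - '0');
    if (seen_point) ++frac_digits;
  }
  // value = numerator / (2^k * 5^k) with k = frac_digits, so every factor
  // of 5 in the denominator must cancel against the numerator.
  for (int i = 0; i < frac_digits; ++i) {
    if (numerator % 5 != 0) return false;
    numerator /= 5;
  }
  return true;
}
```

"0.125" is 125/1000 = 1/8, so the check succeeds. "0.1" is 1/10, which keeps a factor of 5 in the denominator, so the check fails.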

opBinary
ExactFloat opBinary(ExactFloat b)

Addition.

opBinary
ExactFloat opBinary(ExactFloat b)

Subtraction.

opBinary
ExactFloat opBinary(ExactFloat b)

Multiplication.

opBinary
ExactFloat opBinary(T b)

Support operations with any convertible types.

opBinaryRight
ExactFloat opBinaryRight(T a)

Support operations with any convertible types.

opCmp
int opCmp(ExactFloat b)

Comparison operators (<, <=, >, >=).

opCmp
int opCmp(T b)

Support operations with any convertible types.

opEquals
bool opEquals(ExactFloat b)

Comparison operators (==, !=).

opEquals
bool opEquals(T b)

Support operations with any convertible types.

opOpAssign
ExactFloat opOpAssign(ExactFloat b)
Undocumented in source. Be warned that the author may not have intended to support it.

opUnary
ExactFloat opUnary()

Unary plus.

opUnary
ExactFloat opUnary()

Unary minus.

roundToMaxPrec
ExactFloat roundToMaxPrec(int max_prec, RoundingMode mode)

Round the ExactFloat so that its mantissa has at most "max_prec" bits using the given rounding mode. Requires "max_prec" to be at least 2 (since kRoundTiesToEven doesn't make sense with fewer bits than this).

scaleAndCompare
int scaleAndCompare(ExactFloat b)
Undocumented in source.

setInf
void setInf(int sign)

Set the value of the ExactFloat to positive infinity (if sign >= 0) or negative infinity (if sign < 0).

setNan
void setNan()

Set the value of the ExactFloat to NaN (Not-a-Number).

setZero
void setZero(int sign)

Set the value of the ExactFloat to +0 (if sign >= 0) or -0 (if sign < 0).

toDouble
double toDouble()

Round to double precision. Note that since doubles have a much smaller exponent range than ExactFloats, very small values may be rounded to (positive or negative) zero, and very large values may be rounded to infinity.
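The saturation behaviour described above can be observed with the standard `std::ldexp` function, which scales a double by a power of two. This is only an analogy on built-in doubles; `ScaleToDouble` is a hypothetical wrapper and not how the library performs the conversion.

```cpp
#include <cmath>

// Builds mant * 2^exp as a double. Results outside the range of double
// saturate to infinity or to a signed zero.
double ScaleToDouble(double mant, int exp) {
  return std::ldexp(mant, exp);
}
```

For example, `ScaleToDouble(1.0, 2000)` overflows to positive infinity, while `ScaleToDouble(-1.0, -2000)` underflows to negative zero, so `std::signbit` still reports the original sign.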

toHash
size_t toHash()
Undocumented in source. Be warned that the author may not have intended to support it.

toString
string toString()

Return a human-readable string such that if two values with the same precision are different, then their string representations are different. The format is similar to printf("%g"), except that the number of significant digits depends on the precision (with a minimum of 10). Trailing zeros are stripped (just like "%g").

toStringWithMaxDigits
string toStringWithMaxDigits(int max_digits)

Return a string formatted according to printf("%.Ng") where N is the given maximum number of significant digits.
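For built-in doubles, the equivalent printf-style formatting passes the digit count as the precision of the `%g` conversion. The sketch below shows that behaviour using `std::snprintf`; `FormatG` is a hypothetical helper for doubles and is not part of ExactFloat.

```cpp
#include <cstdio>
#include <string>

// Formats `value` like printf("%.Ng") with N = max_digits, where N is
// supplied at run time through the "%.*g" form.
std::string FormatG(double value, int max_digits) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*g", max_digits, value);
  return std::string(buf);
}
```

As with `%g` in general, trailing zeros are stripped and exponential notation is used when the exponent is too large for the requested digit count, so `FormatG(1234567.0, 3)` produces "1.23e+06" and `FormatG(2.5, 10)` produces "2.5".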

toUniqueString
string toUniqueString()

Return a human-readable string such that if two ExactFloats have different values, then their string representations are always different. This method is useful for debugging. The string has the form "value<prec>", where "prec" is the actual precision of the ExactFloat (e.g., "0.215<50>").

unsignedLess
bool unsignedLess(ExactFloat b)
Undocumented in source. Be warned that the author may not have intended to support it.

Properties

maxPrec
int maxPrec [@property getter]

Return the maximum precision of the ExactFloat. This method exists only for compatibility with MPFloat.

prec
int prec [@property getter]

Return the actual precision of this ExactFloat (the current number of bits in its mantissa). Returns 0 for non-normal numbers such as NaN.

sign
int sign [@property getter]

Return +1 if this ExactFloat is positive, -1 if it is negative, and 0 if it is zero or NaN. Note that unlike signBit, sign returns 0 for both positive and negative zero.
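The difference between a three-valued sign and the raw sign bit is easiest to see at negative zero. The following sketch reproduces the distinction for built-in doubles; `Sgn` is a hypothetical helper, and `std::signbit` is the standard library function that inspects the sign bit.

```cpp
#include <cmath>

// Returns +1 for positive values, -1 for negative values, and 0 otherwise.
// Both zeros and NaN fall through to 0 because every comparison with them
// below is false.
int Sgn(double x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;
}
```

Here `Sgn(-0.0)` is 0 even though `std::signbit(-0.0)` is true.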

signBit
bool signBit [@property getter]

Return true if the sign bit is set (this includes negative zero).

Static functions

infinity
ExactFloat infinity(int sign)

Return an ExactFloat equal to positive infinity (if sign >= 0) or negative infinity (if sign < 0).

nan
ExactFloat nan()

Return an ExactFloat that is NaN (Not-a-Number).

numSignificantDigitsForPrec
int numSignificantDigitsForPrec(int prec)

Return an upper bound on the number of significant digits required to distinguish any two floating-point numbers with the given precision when they are formatted as decimal strings in exponential format.
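One standard bound of this kind is 1 + ceil(prec * log10(2)) decimal digits. Whether the library uses exactly this expression is an assumption; the sketch below only illustrates the bound, and `DecimalDigitsForPrec` is a hypothetical name.

```cpp
#include <cmath>

// Standard bound: 1 + ceil(prec * log10(2)) significant decimal digits are
// enough to distinguish any two binary floating-point values that have
// `prec` mantissa bits.
int DecimalDigitsForPrec(int prec) {
  return 1 + static_cast<int>(std::ceil(prec * std::log10(2.0)));
}
```

For the 53-bit mantissa of a double this gives 17 digits, and for the 24-bit mantissa of a float it gives 9, matching the familiar round-trip digit counts for those types.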

signedZero
ExactFloat signedZero(int sign)

Return an ExactFloat equal to positive zero (if sign >= 0) or negative zero (if sign < 0).

Static variables

MAX_EXP
int MAX_EXP;

The maximum exponent supported. If a value has an exponent larger than this, it is replaced by infinity (with the appropriate sign).

MAX_PREC
int MAX_PREC;

The maximum number of mantissa bits supported. If a value has more mantissa bits than this, it is replaced with NaN. (It is expected that users of this class will never want this much precision.)

MIN_EXP
int MIN_EXP;

The minimum exponent supported. If a value has an exponent less than this, it is replaced by zero (with the appropriate sign).

Meta