ExactFloat

ExactFloat is a multiple-precision floating point type based on the OpenSSL Bignum library. It has the same interface as the built-in "float" and "double" types, but only supports the subset of operators and intrinsics where it is possible to compute the result exactly. So for example, ExactFloat supports addition and multiplication but not division (since in general, the quotient of two floating-point numbers cannot be represented exactly). Exact arithmetic is useful for geometric algorithms, especially for disambiguating cases where ordinary double-precision arithmetic yields an uncertain result.
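To make the motivation concrete, the following sketch shows that even a single multiplication of two doubles can lose information. It does not use ExactFloat at all; `ProductRoundingError` is a hypothetical helper name, and it relies only on `std::fma` from the C++ standard library, which evaluates `a * b + c` with one rounding.

```cpp
#include <cassert>
#include <cmath>

// Returns the part of the exact product a * b that is lost when the
// product is rounded to double precision. std::fma evaluates
// a * b - rounded with a single rounding, and that difference is itself
// representable as a double (barring underflow or overflow), so the
// returned value is the exact rounding error.
double ProductRoundingError(double a, double b) {
  assert(std::isfinite(a) && std::isfinite(b));
  const double rounded = a * b;
  return std::fma(a, b, -rounded);
}
```

For `a = 1 + 2^-30`, the exact square is `1 + 2^-29 + 2^-60`, which needs 61 mantissa bits; a double keeps only 53, so the helper reports a lost term of `2^-60`. An exact type would instead widen its mantissa to hold all 61 bits.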

ExactFloat is a subset of the faster and more capable MPFloat class (which is based on the GNU MPFR library). The main reason to use this class rather than MPFloat is that it is subject to a BSD-style license rather than the much more restrictive LGPL license.

It has the following features:

- ExactFloat uses the same syntax as the built-in "float" and "double" types, for example: x += 4 + fabs(2*y*y - z*z). There are a few differences (see below), but the syntax is compatible enough so that ExactFloat can be used as a template argument to templatized classes such as Vector2, VectorN, Matrix3x3, etc.

- Results are not rounded; instead, precision is increased so that the result can be represented exactly. An inexact result is returned only in the case of underflow or overflow (yielding signed zero or infinity respectively), or if the maximum allowed precision is exceeded (yielding NaN). ExactFloat uses IEEE 754-2008 rules for handling infinities, NaN, rounding to integers, etc.

- ExactFloat only supports calculations where the result can be represented exactly. Therefore it supports intrinsics such as fabs() but not functions such as sin() or sqrt(), whose results generally cannot be represented exactly.

Syntax Compatibility with "float" and "double"

ExactFloat supports a subset of the operators and intrinsics for the
built-in "double" type.  (Thus it supports fabs() but not fabsf(), for
example.)  The syntax is different only in the following cases:

- Casts and implicit conversions to built-in types (including "bool") are
not supported.  So for example, the following will not compile:

ExactFloat x = 7.5;
double y = x;            // ERROR: use x.ToDouble() instead
long z = x;              // ERROR: use x.ToDouble() or lround(trunc(x))
q = static_cast<int>(x); // ERROR: use x.ToDouble() or lround(trunc(x))
if (x) { ... }           // ERROR: use (x != 0) instead

- The glibc floating-point classification macros (fpclassify, isfinite,
isnormal, isnan, isinf) are not supported.  Instead there are
zero-argument methods:

ExactFloat x;
if (isnan(x)) { ... }  // ERROR: use (x.is_nan()) instead
if (isinf(x)) { ... }  // ERROR: use (x.is_inf()) instead

Using ExactFloat with Vector3, etc.

ExactFloat can be used with templatized classes such as Vector2 and Vector3 (see "util/math/vector.h"), with the following limitations:

- Cast() can be used to convert other vector types to an ExactFloat vector type, but not the other way around. This is because there are no implicit conversions from ExactFloat to built-in types. You can work around this by calling an explicit conversion method such as ToDouble(). For example:

typedef Vector3<ExactFloat> Vector3_xf;
Vector3_xf x;
Vector3_d y;
x = Vector3_xf::Cast(y);  // This works.
y = Vector3_d::Cast(x);   // This doesn't.
y = Vector3_d(x[0].ToDouble(), x[1].ToDouble(), x[2].ToDouble());  // OK

- IsNaN() is not supported because it calls isnan(), which is defined as a macro in <math.h> and therefore can't easily be overridden.

Precision Semantics

Unlike MPFloat, ExactFloat does not allow a maximum precision to be specified (it is always unbounded). Therefore it does not have any of the corresponding constructors.

The current precision of an ExactFloat (i.e., the number of bits in its mantissa) is returned by prec(). The precision is increased as necessary so that the result of every operation can be represented exactly.
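As a rough illustration of why the precision has to grow, consider mantissas modelled as plain integers. The sketch below is an integer-only model, not ExactFloat's implementation, and `BitLength` and `ProductBitLength` are hypothetical helper names. The exact product of a p-bit and a q-bit mantissa needs either p + q or p + q - 1 bits.

```cpp
#include <cstdint>

// Number of significant bits in v (0 when v == 0).
int BitLength(std::uint64_t v) {
  int bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

// Bits needed to hold the exact product of two mantissas of up to
// 32 bits each. The product always fits in 64 bits, so nothing is lost.
int ProductBitLength(std::uint32_t a, std::uint32_t b) {
  return BitLength(static_cast<std::uint64_t>(a) * b);
}
```

For example, 3 (2 bits) times 5 (3 bits) is 15, which needs 4 bits, while two full 32-bit mantissas can need all 64. A chain of multiplications can therefore grow the mantissa at each step.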

struct ExactFloat {

void opAssign(T v);

void opAssign(T s);

static ExactFloat signedZero(int sign);

static ExactFloat infinity(int sign);

static ExactFloat nan();

int maxPrec [@property getter];

int prec [@property getter];

int exp();

void setZero(int sign);

void setInf(int sign);

bool signBit [@property getter];

int sign [@property getter];

double toDouble();

string toString();

string toStringWithMaxDigits(int max_digits);

string toUniqueString();

size_t toHash();

static int numSignificantDigitsForPrec(int prec);

ExactFloat roundToMaxPrec(int max_prec, RoundingMode mode);

ExactFloat opUnary();

ExactFloat opBinary(ExactFloat b);

ExactFloat opBinary(T b);

ExactFloat opBinaryRight(T a);

ExactFloat opOpAssign(ExactFloat b);

bool opEquals(ExactFloat b);

bool opEquals(T b);

int scaleAndCompare(ExactFloat b);

bool unsignedLess(ExactFloat b);

int opCmp(ExactFloat b);

int opCmp(T b);

ExactFloat fabs();

ExactFloat abs();

}

Constructors

this this(ExactFloat b): Copy constructor.
this this(T v): Undocumented in source.

Members

Enums

RoundingMode enum RoundingMode: Rounding modes. kRoundTiesToEven and kRoundTiesAwayFromZero both round to the nearest representable value unless two values are equally close. In that case kRoundTiesToEven rounds to the nearest even value, while kRoundTiesAwayFromZero always rounds away from zero.
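The two tie-breaking rules can be seen with built-in doubles rounded to integers. This is only an analogy: roundToMaxPrec rounds a mantissa to a given number of bits rather than to an integer. The wrapper names below are hypothetical, and the first one assumes the default floating-point rounding mode (FE_TONEAREST) is in effect.

```cpp
#include <cmath>

// Ties-to-even: std::nearbyint follows the current rounding mode, which
// is round-to-nearest, ties-to-even unless the program has changed it.
double RoundTiesToEven(double x) { return std::nearbyint(x); }

// Ties-away-from-zero: std::round always breaks ties away from zero,
// regardless of the current rounding mode.
double RoundTiesAwayFromZero(double x) { return std::round(x); }
```

The two functions agree everywhere except on exact ties: 2.5 rounds to 2 under ties-to-even but to 3 under ties-away-from-zero, and -2.5 rounds to -2 and -3 respectively.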

Functions

abs ExactFloat abs(): Undocumented in source. Be warned that the author may not have intended to support it.
exp int exp(): Return the exponent of this ExactFloat given that the mantissa is in the range \[0.5, 1\). It is an error to call this method if the value is zero, infinity, or NaN.
fabs ExactFloat fabs(): Absolute value.
isFinite bool isFinite(): Return true if this value is a normal floating-point number or zero, i.e. it is not infinity or NaN.
isInf bool isInf(): Return true if this value is infinity (positive or negative).
isNan bool isNan(): Return true if this value is NaN (Not-a-Number).
isNormal bool isNormal(): Return true if this value is a normal floating-point number. Non-normal values (zero, infinity, and NaN) often need to be handled separately because they are represented using special exponent values and their mantissa is not defined.
isZero bool isZero(): Return true if this value is zero (including negative zero).
opAssign void opAssign(T v): Construct an ExactFloat from a "double". The constructor is implicit so that this class can be used as a replacement for "float" or "double" in templatized libraries. (With an explicit constructor, code such as "ExactFloat f = 2.5;" would not compile.) All double-precision values are supported, including denormalized numbers, infinities, and NaNs.
opAssign void opAssign(T v): Construct an ExactFloat from an "int". Note that in general, ints are automatically converted to doubles and so would be handled by the constructor above. However, the particular argument (0) would be ambiguous; the compiler wouldn't know whether to treat it as a "double" or "const char*" (since 0 is a valid null pointer constant). Adding an "int" constructor solves this problem.
opAssign void opAssign(T s): Construct an ExactFloat from a string (such as "1.2e50"). Requires that the value is exactly representable as a floating-point number (so for example, "0.125" is allowed but "0.1" is not).
opBinary ExactFloat opBinary(ExactFloat b): Addition.
opBinary ExactFloat opBinary(ExactFloat b): Subtraction.
opBinary ExactFloat opBinary(ExactFloat b): Multiplication.
opBinary ExactFloat opBinary(T b): Support operations with any convertible types.
opBinaryRight ExactFloat opBinaryRight(T a): Support operations with any convertible types.
opCmp int opCmp(ExactFloat b): Comparison operators (<, <=, >, >=).
opCmp int opCmp(T b): Support operations with any convertible types.
opEquals bool opEquals(ExactFloat b): Comparison operators (==, !=).
opEquals bool opEquals(T b): Support operations with any convertible types.
opOpAssign ExactFloat opOpAssign(ExactFloat b): Undocumented in source. Be warned that the author may not have intended to support it.
opUnary ExactFloat opUnary(): Unary plus.
opUnary ExactFloat opUnary(): Unary minus.
roundToMaxPrec ExactFloat roundToMaxPrec(int max_prec, RoundingMode mode): Round the ExactFloat so that its mantissa has at most "max_prec" bits using the given rounding mode. Requires "max_prec" to be at least 2 (since kRoundTiesToEven doesn't make sense with fewer bits than this).
scaleAndCompare int scaleAndCompare(ExactFloat b): Undocumented in source.
setInf void setInf(int sign): Set the value of the ExactFloat to positive infinity (if sign >= 0) or negative infinity (if sign < 0).
setNan void setNan(): Set the value of the ExactFloat to NaN (Not-a-Number).
setZero void setZero(int sign): Set the value of the ExactFloat to +0 (if sign >= 0) or -0 (if sign < 0).
toDouble double toDouble(): Round to double precision. Note that since doubles have a much smaller exponent range than ExactFloats, very small values may be rounded to (positive or negative) zero, and very large values may be rounded to infinity.
toHash size_t toHash(): Undocumented in source. Be warned that the author may not have intended to support it.
toString string toString(): Return a human-readable string such that if two values with the same precision are different, then their string representations are different. The format is similar to printf("%g"), except that the number of significant digits depends on the precision (with a minimum of 10). Trailing zeros are stripped (just like "%g").
toStringWithMaxDigits string toStringWithMaxDigits(int max_digits): Return a string formatted according to printf("%Ng") where N is the given maximum number of significant digits.
toUniqueString string toUniqueString(): Return a human-readable string such that if two ExactFloats have different values, then their string representations are always different. This method is useful for debugging. The string has the form "value<prec>", where "prec" is the actual precision of the ExactFloat (e.g., "0.215<50>").
unsignedLess bool unsignedLess(ExactFloat b): Undocumented in source. Be warned that the author may not have intended to support it.
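The exponent convention used by exp() above, with the mantissa magnitude in [0.5, 1), is the same one the standard `std::frexp` function uses for doubles. The sketch below uses that function to show the convention on built-in values. `MantissaExp` and `Decompose` are hypothetical names, and the assertion mirrors the documented restriction that the value must not be zero, infinity, or NaN.

```cpp
#include <cassert>
#include <cmath>

struct MantissaExp {
  double mantissa;  // Magnitude in [0.5, 1), carrying the sign of x.
  int exp;          // Power of two such that x == mantissa * 2^exp.
};

// Splits a finite, nonzero double into mantissa and exponent.
MantissaExp Decompose(double x) {
  assert(std::isfinite(x) && x != 0.0);
  int e = 0;
  const double m = std::frexp(x, &e);
  return {m, e};
}
```

Under this convention 8.0 decomposes as 0.5 * 2^4, so its exponent is 4 rather than 3, and 0.75 has exponent 0.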

Properties

maxPrec int maxPrec [@property getter]: Return the maximum precision of the ExactFloat. This method exists only for compatibility with MPFloat.
prec int prec [@property getter]: Return the actual precision of this ExactFloat (the current number of bits in its mantissa). Returns 0 for non-normal numbers such as NaN.
sign int sign [@property getter]: Return +1 if this ExactFloat is positive, -1 if it is negative, and 0 if it is zero or NaN. Note that unlike signBit, sign returns 0 for both positive and negative zero.
signBit bool signBit [@property getter]: Return true if the sign bit is set (this includes negative zero).
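The difference between the two sign queries is easiest to see at negative zero. The sketch below reproduces the documented behaviour for built-in doubles. `Sgn` and `SignBit` are hypothetical stand-ins for illustration, not the ExactFloat properties themselves.

```cpp
#include <cmath>

// Returns +1, -1, or 0. Both comparisons are false for zero (of either
// sign) and for NaN, so those inputs yield 0.
int Sgn(double x) { return (x > 0.0) - (x < 0.0); }

// Returns true whenever the sign bit is set, including for -0.0.
bool SignBit(double x) { return std::signbit(x); }
```

Here `Sgn(-0.0)` is 0 while `SignBit(-0.0)` is true, which is the distinction the property descriptions draw.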

Static functions

infinity ExactFloat infinity(int sign): Return an ExactFloat equal to positive infinity (if sign >= 0) or negative infinity (if sign < 0).
nan ExactFloat nan(): Return an ExactFloat that is NaN (Not-a-Number).
numSignificantDigitsForPrec int numSignificantDigitsForPrec(int prec): Return an upper bound on the number of significant digits required to distinguish any two floating-point numbers with the given precision when they are formatted as decimal strings in exponential format.
signedZero ExactFloat signedZero(int sign): Return an ExactFloat equal to positive zero (if sign >= 0) or negative zero (if sign < 0).
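For numSignificantDigitsForPrec, one well-known upper bound on the decimal digits needed to distinguish any two binary floating-point values of precision p is 1 + ceil(p * log10(2)). The sketch below computes that bound. Whether the library uses exactly this formula is an assumption, and `DigitsForPrec` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Classic upper bound on the number of significant decimal digits that
// distinguish any two binary floating-point values with prec mantissa bits.
int DigitsForPrec(int prec) {
  assert(prec > 0);
  return 1 + static_cast<int>(std::ceil(prec * std::log10(2.0)));
}
```

The bound gives 17 digits for a 53-bit mantissa and 9 digits for a 24-bit mantissa, matching the familiar round-trip digit counts for "double" and "float".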

Static variables

MAX_EXP int MAX_EXP;: The maximum exponent supported. If a value has an exponent larger than this, it is replaced by infinity (with the appropriate sign).
MAX_PREC int MAX_PREC;: The maximum number of mantissa bits supported. If a value has more mantissa bits than this, it is replaced with NaN. (It is expected that users of this class will never want this much precision.)
MIN_EXP int MIN_EXP;: The minimum exponent supported. If a value has an exponent less than this, it is replaced by zero (with the appropriate sign).
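Built-in doubles saturate in the same way at their own, much narrower exponent limits, and this is also what happens when toDouble() narrows a very large or very small ExactFloat. The following sketch shows the behaviour with `std::ldexp`. `ScaleByPowerOfTwo` is a hypothetical wrapper, and the example assumes IEEE 754 doubles with the default rounding mode.

```cpp
#include <cmath>

// Returns x * 2^exp as a double. Results beyond the exponent range of a
// double become infinity (overflow) or zero (underflow), keeping the
// sign of x in both cases.
double ScaleByPowerOfTwo(double x, int exp) { return std::ldexp(x, exp); }
```

For instance, `ScaleByPowerOfTwo(-1.0, 2000)` is negative infinity and `ScaleByPowerOfTwo(-1.0, -2000)` is negative zero.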
