Explain each of the following
concepts, along with at least one suitable example for each:
(i) round-off error (ii) chopping
error (iii) truncation error (iv)
floating-point representation (v) significant digits in a decimal
representation
Ans
round-off error
Error caused by approximating a number with one that has fewer digits than the original. For example, rounding off 99.987 to 100. Whereas rounding errors may be harmless in manual computations, they can become serious mistakes in computer calculations involving thousands or millions of arithmetic operations. Also called roundoff error. Compare with truncation error.
Roundoff error is the difference
between an approximation of a number used in computation and its exact
(correct) value. In certain types of computation, roundoff error can be
magnified as any initial errors are carried through one or more intermediate
steps.
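As a quick illustration (a Python sketch, not part of the original definition), round-off error accumulates even in a trivial loop, because 0.1 has no exact binary representation:

    total = 0.0
    for _ in range(1000):
        total += 0.1          # each addition is rounded to the nearest double

    print(total)              # 99.9999999999986, not exactly 100.0
    print(total == 100.0)     # False: the accumulated round-off error
    print(abs(total - 100.0)) # on the order of 1e-12

Each individual rounding is tiny, but the loop carries it through a thousand intermediate steps, which is exactly the magnification described above.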
An egregious example of
roundoff error is provided by a short-lived index devised at the Vancouver
stock exchange (McCullough and Vinod 1999). At its inception in 1982, the index
was given a value of 1000.000. After 22 months of recomputing the index and
truncating to three decimal places at each change in market value, the index
stood at 524.881, despite the fact that its "true" value should have
been 1009.811.
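The mechanism can be sketched in Python; the per-update changes below are made up (the real trade data is not given here), so this only illustrates the direction of the drift, not the actual Vancouver figures:

    import random

    random.seed(1)
    exact = truncated = 1000.000
    for _ in range(100_000):                  # many small recomputations
        change = random.uniform(-0.1, 0.1)    # hypothetical change in index points
        exact += change
        # chop (truncate) to three decimal places instead of rounding
        truncated = int((truncated + change) * 1000) / 1000

    print(round(exact, 3))      # stays in the neighbourhood of 1000
    print(round(truncated, 3))  # systematically below exact, since chopping always discards value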
Other sorts of roundoff
error can also occur. A notorious example is the fate of the Ariane rocket
launched on June 4, 1996 (European Space Agency 1996). In the 37th second of
flight, the inertial reference system attempted to convert a
64-bit floating-point number to a 16-bit number, but instead
triggered an overflow error which was interpreted by the guidance system as
flight data, causing the rocket to veer off course and be destroyed.
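The core failure, trying to squeeze a 64-bit floating-point value into a 16-bit signed integer whose range is only -32768 to 32767, can be sketched in Python as follows; the value 64000.0 is invented purely for illustration:

    import struct

    value = 64_000.0                      # held as a 64-bit float
    try:
        struct.pack('>h', int(value))     # '>h' is a 16-bit signed integer
    except struct.error as exc:
        print('conversion failed:', exc)  # the value does not fit in 16 bits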
The Patriot missile
defense system used during the Gulf War was also rendered ineffective due to
roundoff error (Skeel 1992, U.S. GAO 1992). The system used an integer timing
register which was incremented at intervals of 0.1 s. However, the integers
were converted to decimal numbers by multiplying by
the binaryapproximation of 0.1,
0.00011001100110011001100_2=(209715)/(2097152).
As a result, after 100
hours (3.6×10^6 ticks), an error of
(1/(10)-(209715)/(2097152))(3600·100·10)=(5625)/(16384)
approx 0.3433 second
had accumulated. This
discrepancy caused the Patriot system to continuously recycle itself instead of
targeting properly. As a result, an Iraqi Scud missile could not be targeted
and was allowed to detonate on a barracks, killing 28 people.
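The arithmetic above can be reproduced exactly with rational numbers; a short Python check:

    from fractions import Fraction

    stored_tenth = Fraction(209715, 2097152)   # the binary approximation of 0.1
    per_tick_error = Fraction(1, 10) - stored_tenth

    ticks = 3600 * 100 * 10                    # 100 hours of 0.1 s ticks
    drift = per_tick_error * ticks

    print(per_tick_error)   # 1/10485760, about 9.54e-8 s per tick
    print(drift)            # 5625/16384
    print(float(drift))     # 0.34332275390625, the ~0.3433 s quoted above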
truncation error
Truncation error is the difference between a truncated value and the actual value. A truncated quantity is represented by a numeral with a fixed number of allowed digits, with any excess digits "chopped off" (hence the expression "truncated"). Truncating in this way is also called chopping, so the resulting error is also known as chopping error.
As an example of truncation error, consider the speed of light in a vacuum. The official value is 299,792,458 meters per second. In scientific (power-of-10) notation, that quantity is expressed as 2.99792458 x 10^8. Truncating it to two decimal places yields 2.99 x 10^8. The truncation error is the difference between the actual value and the truncated value, or 0.00792458 x 10^8. Expressed properly in scientific notation, it is 7.92458 x 10^5.
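The same subtraction, done directly in Python for the figures above:

    c = 2.99792458e8        # official value of the speed of light, in m/s
    truncated = 2.99e8      # mantissa chopped after two decimal places

    print(c - truncated)    # 792458.0, i.e. 7.92458 x 10^5 m/s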
In
computing applications, truncation error is the discrepancy that arises from
executing a finite number of steps to approximate an infinite process. For
example, the infinite series 1/2 + 1/4 + 1/8 + 1/16 + 1/32 ... adds up to
exactly 1. However, if we truncate the series to only the first four terms, we
get 1/2 + 1/4 + 1/8 + 1/16 = 15/16, producing a truncation error of 1 - 15/16,
or 1/16.
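A short check of this partial sum with exact rational arithmetic in Python:

    from fractions import Fraction

    partial = sum(Fraction(1, 2**k) for k in range(1, 5))   # first four terms

    print(partial)       # 15/16
    print(1 - partial)   # 1/16, the truncation error from stopping early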
floating-point representation
Floating-point representation is a way of encoding real numbers in a fixed number of bits. The following description explains the terminology and primary details of the IEEE 754 binary floating-point representation; the discussion is confined to the single- and double-precision formats.
Usually, a real number in binary is represented in the following format:

I_m I_{m-1} ... I_2 I_1 I_0 . F_1 F_2 ... F_{n-1} F_n

where each I_k and F_k is either 0 or 1, the I's being the integer-part bits and the F's the fraction-part bits.
A finite number can also be represented by four integer components: a sign (s), a base (b), a significand (m), and an exponent (e). The numerical value of the number is then evaluated as

(-1)^s x m x b^e, where m < |b|.
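As a quick sanity check of this formula (in Python), using the values that the 4.5 example below will produce, s = 0, b = 2, m = 1.001(2) = 1.125 and e = 2:

    s, b, e = 0, 2, 2
    m = 1 + 0/2 + 0/4 + 1/8          # 1.001 in binary is 1.125 in decimal

    print((-1)**s * m * b**e)        # 4.5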
Depending on the base and the number of bits used to encode the various components, the IEEE 754 standard defines five basic formats. Among them, the binary32 and binary64 formats are the single-precision and double-precision formats respectively, in which the base is 2.
Table 1 – Precision Representation

Precision        | Base | Sign | Exponent | Significand
Single precision |  2   |  1   |    8     |    23+1
Double precision |  2   |  1   |   11     |    52+1
Single Precision Format:
As mentioned in Table 1, the single-precision format has 23 bits for the significand (the +1 is the implied bit, details below), 8 bits for the exponent and 1 bit for the sign.
For example, the rational number 9÷2 can be converted to single-precision float format as follows:

9(10) ÷ 2(10) = 4.5(10) = 100.1(2)
The result is said to be normalized if it is represented with a leading 1 bit, i.e. 1.001(2) x 2^2. (Similarly, when the number 0.000000001101(2) x 2^3 is normalized, it appears as 1.101(2) x 2^-6.) Omitting this implied 1 at the left extreme gives us the mantissa of the float number. A normalized number provides more accuracy than the corresponding de-normalized number. Because the most significant bit is implied rather than stored, the stored 23 bits effectively give a 23 + 1 = 24-bit significand. Floating-point numbers are normally stored in this normalized form.
Subnormal numbers fall into the category of de-normalized numbers. The subnormal representation slightly reduces the exponent range and can't be normalized, since that would result in an exponent which doesn't fit in the field. Subnormal numbers are less accurate, i.e. they have less room for nonzero bits in the fraction field, than normalized numbers. Indeed, the accuracy drops as the size of the subnormal number decreases. However, the subnormal representation is useful in filling the gaps of the floating-point scale near zero.
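Python floats are double precision, but the same gap-filling effect near zero is easy to see there (the single-precision analogue behaves the same way, just with larger thresholds):

    import sys

    smallest_normal = sys.float_info.min   # 2.2250738585072014e-308
    smallest_subnormal = 5e-324            # 2**-1074, a de-normalized double

    print(smallest_normal / 2)             # still nonzero: a subnormal fills the gap
    print(smallest_subnormal / 2)          # 0.0: below the smallest subnormal we hit zero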
In other words, the above result can be written as (-1)^0 x 1.001(2) x 2^2, which yields the integer components s = 0, b = 2, significand (m) = 1.001(2), mantissa = 001 and e = 2. The corresponding single-precision floating-point number is laid out in binary as

sign = 0, exponent field = 10000001, mantissa field = 00100000000000000000000

where the exponent field is supposed to hold 2, yet is encoded as 129 (127 + 2), called the biased exponent. Negative exponents could be represented with an ordinary signed encoding (sign magnitude, 1's complement, 2's complement, etc.), but the biased exponent is used instead because it allows two floating-point numbers to be compared bitwise.
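This layout can be confirmed in Python by reinterpreting the bits of 4.5 as a 32-bit integer:

    import struct

    bits = struct.unpack('>I', struct.pack('>f', 4.5))[0]
    pattern = f'{bits:032b}'

    print(pattern[0])        # '0'                        -> sign
    print(pattern[1:9])      # '10000001' = 129 = 127 + 2 -> biased exponent
    print(pattern[9:])       # '00100000000000000000000'  -> mantissa (implied 1 dropped)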
A bias of (2^(n-1) - 1), where n is the number of bits used in the exponent, is added to the exponent (e) to get the biased exponent (E). So the biased exponent (E) of a single-precision number is obtained as

E = e + 127
The range of the exponent in single-precision format is -126 to +127. Other values (the all-zeros and all-ones biased exponents) are reserved for special values such as zero, subnormal numbers, infinities and NaNs.
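These reserved exponent patterns can also be seen by inspecting the bits directly in Python (the exact NaN payload may vary by platform):

    import struct

    for value in (float('inf'), float('nan'), 0.0):
        bits = struct.unpack('>I', struct.pack('>f', value))[0]
        pattern = f'{bits:032b}'
        print(value, pattern[1:9], pattern[9:])
    # inf  11111111 00000000000000000000000   (all-ones exponent, zero mantissa)
    # nan  11111111 with a nonzero mantissa   (payload varies)
    # 0.0  00000000 00000000000000000000000   (all-zeros exponent)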