by Richard Valent, SCD technical consultant
When computations go awry in your program,
you may notice incorrect numbers in some output fields, even though your program
continues to execute. Sometimes you may notice strings like INF and NaN in fields
where only numbers should be; these indicate certain kinds of floating-point exceptions
(FPEs). INF means "infinity" and NaN means "not a number." Sometimes it's
hard to find where these FPEs occur in your code, but you must find and fix them.
They are useful only as diagnostics, and they harm performance since each FPE
interrupts the processor on which it occurs.
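As a brief illustration of why INF and NaN are diagnostics rather than results, here is a minimal Python sketch. It assumes Python floats are IEEE doubles, as they are on common platforms, and the variable names are ours rather than anything from the codes discussed in this article. An overflow produces INF, an operation on INF produces NaN, and the NaN then contaminates everything computed from it while the program keeps running.

```python
import math

big = 1.0e308
overflowed = big * 10.0             # exceeds the largest double, becomes inf
invalid = overflowed - overflowed   # inf - inf has no meaningful value, becomes nan

# Once a NaN exists, it silently contaminates later arithmetic.
total = sum([1.0, 2.0, invalid, 4.0])

for name, value in [("overflowed", overflowed),
                    ("invalid", invalid),
                    ("total", total)]:
    print(name, "=", value, "| finite:", math.isfinite(value))
```

A check such as math.isfinite on key results tells you that something went wrong, but not where; that is the job of the trapping methods described below.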
So how do you find where FPEs are occurring in your code?
Salting your code with print statements is hit-or-miss and invasive, and we
do not recommend it. If you believe you have only a few FPEs, you are well advised
to use a debugger like TotalView or dbx, which will often automatically point
at the first FPE in your core file. But if you have many FPEs, weeding them out
in this manner can be tedious.
Error trapping
An alternative and reliable method is
called "trapping." By trapping, we mean setting a trap at your program's
runtime that gets tripped when an FPE occurs, after which the program execution
follows a prescribed course of your choice. This course is referred to as "handling"
the error, where the handling you choose may cause the program to abort, print
a diagnostic message, or provide a traceback. With certain methods of trapping,
you can even provide a subroutine or function that changes the behavior of the
floating-point arithmetic, though you should consult a numerical analyst about
the consequences before handling errors in this manner.
Since trapping and handling require extra processor time, you may wish to
remove trapping/handling subroutine calls and compiler options after you have
removed your program's FPEs.
Both trapping and handling are implemented via "signals," and you often find
their documentation under the broader topics of "signals" or "signal handling".
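The difference between an untrapped and a trapped exception can be sketched outside Fortran. The example below uses Python's standard decimal module, which works in decimal rather than binary arithmetic and raises language-level exceptions rather than operating-system signals, so treat it only as an analogy to the vendor interfaces in this article. The precision and exponent limits are arbitrary values chosen for illustration, and the helper name scale is ours.

```python
from decimal import Context, Decimal, Overflow

def scale(ctx, value):
    """Multiply value by 10 under the given arithmetic context."""
    return ctx.multiply(value, Decimal(10))

largest = Decimal("9E+99")

# Untrapped: the operation returns Infinity and only records a flag.
quiet = Context(prec=9, Emax=99, Emin=-99, traps=[])
result = scale(quiet, largest)
print("untrapped result:", result, "| overflow flag:", quiet.flags[Overflow])

# Trapped: the same operation transfers control to a handler instead.
trapping = Context(prec=9, Emax=99, Emin=-99, traps=[Overflow])
try:
    scale(trapping, largest)
    handled = False
except Overflow as exc:
    handled = True
    print("trapped:", type(exc).__name__)
```

The except clause plays the role of the handler: it is where you would abort, print a diagnostic, or substitute a value.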
All computers discussed in this article utilize IEEE binary
floating-point arithmetic [1], with the exception of Cray, which
uses Cray floating-point arithmetic. (Please consult a Cray Research CPU hardware
reference manual if you need information about Cray's format.)
Six floating-point error types
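We distinguish six FPE types here. Five of them (underflow, overflow,
divide by zero, invalid operand, and inexact result) are the exceptions defined
by the IEEE standard; integer overflow is not an IEEE exception, but some systems
report it through the same signal.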
- Underflow
- One form of underflow exception is signaled by the creation of a tiny nonzero
result whose magnitude is below that of the smallest normalized number (2 raised
to the minimum expressible exponent), which, because it is tiny, may
cause some other exception later. The other form of underflow exception is signaled
by an extraordinary loss of accuracy during the approximation of such tiny numbers
by denormalized numbers.
- Overflow
- The overflow exception is signaled when what would have been the magnitude
of the rounded floating-point result, were the exponent range unbounded, is larger
than the destination format's largest finite number.
- Integer overflow
- The integer overflow exception is signaled when an integer quantity is larger
than the destination format's largest integer.
- Divide by zero
- The divide-by-zero exception is signaled on an implemented divide operation
if the divisor is zero and the dividend is a finite nonzero number.
- Invalid operand (infinity)
- The invalid operand exception is signaled when one or both of the operands
are invalid for an implemented operation. The result (if not trapped) is NaN for
floating-point numbers and not defined for fixed-point numbers.
- Inexact result
- The inexact result exception is signaled when the rounded result of an operation
is not exact or if it overflows without an overflow trap. Users normally do not
trap or handle this type of FPE, since it accompanies nearly every rounded operation
and the other types deserve attention first.
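To see which exceptions a given operation signals, the following Python sketch runs one operation per exception type in a non-trapping context and reports the flags that were raised. It relies on the standard decimal module, whose flags mirror the five IEEE exception types but whose arithmetic is decimal, not binary; integer overflow is not covered. The helper name signaled, the case labels, and the precision and exponent limits are our own illustrative choices.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

IEEE_SIGNALS = (Underflow, Overflow, DivisionByZero, InvalidOperation, Inexact)

def signaled(operation):
    """Run operation in a fresh non-trapping context; return raised flag names."""
    ctx = Context(prec=9, Emax=99, Emin=-99, traps=[])
    operation(ctx)
    return {sig.__name__ for sig in IEEE_SIGNALS if ctx.flags[sig]}

cases = {
    # Tiny result that loses digits when stored as a subnormal number.
    "underflow": lambda c: c.multiply(Decimal("1.23456789E-99"), Decimal("1E-5")),
    # Result larger than the largest finite number in this context.
    "overflow": lambda c: c.multiply(Decimal("1E+99"), Decimal("10")),
    # Finite nonzero dividend, zero divisor.
    "divide by zero": lambda c: c.divide(Decimal(1), Decimal(0)),
    # 0/0 has no meaningful value; the result is NaN.
    "invalid": lambda c: c.divide(Decimal(0), Decimal(0)),
    # 1/3 cannot be represented exactly and must be rounded.
    "inexact": lambda c: c.divide(Decimal(1), Decimal(3)),
}

report = {name: signaled(op) for name, op in cases.items()}
for name, flags in report.items():
    print(f"{name:15s} -> {sorted(flags)}")
```

Note that the underflow and overflow cases also raise the inexact flag, consistent with the descriptions above.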
Variables, utilities, and calls
Vendors may choose among three interfaces
for trapping FPEs: environment variables, utilities, and subroutine calls. Environment
variables provide the least invasive interface of the three, but only SGI provides
it.
Each vendor discussed in this note provides the subroutine-call
interface for trapping FPEs in Fortran, but each has its own implementation, so
portability is lost. A proposal for IEEE floating-point exception handling in
Fortran is given in [2]. However, no vendor has implemented it,
to our knowledge.
To find out what a vendor offers for FPE trapping and handling, you can browse
the vendor's online documentation, using the search engine and search words like
"FPE" and "signal." Looking at man pages and hardcopy manuals helps
too. Vendors provide too little documentation in the area of trapping; one almost
feels it is done as an afterthought.
The remainder of this note is devoted to showing interfaces for trapping FPEs,
in the context of our experience at NCAR. We include the Cray example at the end,
since in our opinion it is least helpful. Here is a table of the interfaces you
will see below:
| Machine | Environment variable | Utility interface | Subroutine interface |
| IBM | -- | dbx | external fhandler_ |
| SGI | TRAP_FPE | ssrun, prof | call handle_sigfpes |
| SUN | -- | dbx | call ieee_handler |
| CRAY | -- | -- | call sigon, sigoff, fsigctl |
IBM trapping FPEs via subroutine fhandler_
- Documentation
- XL Fortran for AIX: User's Guide, Version 6, Release 1: "Detecting
and Trapping Floating-Point Exceptions," p. 312. Also -qflttrap option,
pp. 192-193.
- OS and compiler
- AIX 4.3.3.10, Fortran Version 07.01
- Compilation
-
xlf -c -qfree -qflttrap=und:en -qsigtrap=fhandler_ job.f -lmass
cc -c flttrap_handler.c
xlf job.o flttrap_handler.o
- job.f explanation
- The program calls the IBM-provided subroutine fhandler_ when an underflow
is encountered so that underflows are cut over to zero. This is accomplished via
compiler flags and an external statement, rather than by instrumenting the code
with explicit calls to fhandler_.
Note: subroutine fhandler_ handles each type of FPE, not just underflows.
You will want to study subroutine fhandler_ to see if it handles FPEs
according to your needs, and modify it accordingly.
- Comments
- Not straightforward until you read the documentation and know to pick up file
flttrap_handler.c from directory /usr/lpp/xlf/samples/floating_point.
- job.f
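program main
  implicit none
  integer i
  real*8 u, v
  ! fhandler_ is named in -qsigtrap; it is declared here, never called directly
  external fhandler_
  ! start near the bottom of the real*8 range and shrink v tenfold
  ! on each pass so that v underflows inside the loop
  v = 1.0d-300
  u = exp(v)
  do i = 1, 25
    v = v*1.0d-01
    u = exp(v)
    write(6,*) 'i,u,v=', i, u, v
  end do
  stop
end program main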
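To make "cut over to zero" concrete, the Python sketch below repeats the loop from job.f in IEEE double precision and compares the default gradual underflow with a flush-to-zero policy. The function flush_to_zero is a hypothetical stand-in that we wrote for illustration; it is not IBM's fhandler_ and makes no claim about how that routine is implemented.

```python
import sys

SMALLEST_NORMAL = sys.float_info.min   # about 2.2e-308 for IEEE doubles

def flush_to_zero(x):
    """Hypothetical handler policy: replace any subnormal value with zero."""
    return 0.0 if 0.0 < abs(x) < SMALLEST_NORMAL else x

v = 1.0e-300
first_subnormal = None   # first pass on which v is a nonzero subnormal
first_zero = None        # first pass on which v has underflowed to zero
for i in range(1, 26):
    v = v * 1.0e-01
    if first_subnormal is None and 0.0 < v < SMALLEST_NORMAL:
        first_subnormal = i
    if first_zero is None and v == 0.0:
        first_zero = i
    print(i, v, flush_to_zero(v))
```

Without a handler, v passes through a range of progressively less accurate subnormal values before reaching zero; with the flush-to-zero policy, it becomes zero as soon as it leaves the normalized range.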
SGI trapping FPEs via environment variable TRAP_FPE
- Documentation
- man sigfpe
- OS and compiler:
- IRIX 64 6.5, MIPSpro f90 Version 7.2.1
- Compilation
- See the script immediately below. Note that -l fpe is required.
- Comments
- None. TRAP_FPE is an excellent trapping interface.
SGI provides environment variable TRAP_FPE as a convenient way to count errors
and trace overflows in your program without having to add routine calls or code
to your program. To use it, you must set TRAP_FPE and compile your code with library
option -l fpe. See the fsigfpe man page for more information. To
duplicate the floating-point behavior on UNICOS, set TRAP_FPE as follows:
setenv TRAP_FPE \
"UNDERFL=FLUSH_ZERO; OVERFL=ABORT,TRACE; DIVZERO=ABORT,TRACE; \
INVALID=ABORT,TRACE"
f90 -64 -mips4 job.f -l fpe
a.out
SGI trapping FPEs via ssrun and prof utilities
- Documentation
- man ssrun; man prof
- OS and compiler:
- IRIX 64 6.5, MIPSpro f90 Version 7.2.1
- Compilation
- See the script immediately below. Note that -l fpe and -l fpe_ss are required.
- Comments
- None.
SGI provides utilities ssrun and prof which may be used together
to determine where floating-point exceptions occur in your code. Use SpeedShop
utility ssrun with option -fpe on your executable to build an
intermediate file, which you then profile with the prof command to make
a report that counts FPEs routine by routine. You must build your executable
with library options -l fpe and -l fpe_ss. The resulting report
is helpful for overviewing your code's FPEs, but it is not a replacement for a
full trace report obtainable by the above methods. Example:
f90 -64 -mips4 job.f -l fpe -l fpe_ss
ssrun -fpe a.out
prof a.out.fpe.m1043136
a.out
SGI trapping FPEs via subroutine calls
- Documentation
- /usr/include/f90sigfpe (text file)
- OS and compiler:
- IRIX 64 6.5, MIPSpro f90 Version 7.2.1
- Compilation
- f90 -mips4 -64 -fixedform job.f -L/usr/lib64 -lfpe
- job.f explanation
- The program calls user-provided subroutine abort_overfl when an overflow
is encountered. Statements like fsigfpe(2) % abort =2 set Fortran 90
structure component values.
- Comments
- The vendor should provide a man page at the very least.
- job.f
      include '/usr/include/f90sigfpe.h'
      external abort_overfl
      real x
! set OVERFL abort and trace values in f90sigfpe.h common block
! sigfpe via f90 structures as per documentation in f90sigfpe.h
      fsigfpe(2) % abort =2
      fsigfpe(2) % trace =2
! turn on handler
      call handle_sigfpes
     1  (FPE_ON, FPE_EN_OVERFL, 0, FPE_ABORT_ON_ERROR,abort_overfl)
! do the work here
      x = pow(10.0,10)
      x = pow(10.0,40)
      x = pow(10.0,50)
! turn off handling
      call handle_sigfpes(FPE_OFF, FPE_EN_OVERFL, 0, 0, 0)
      stop
      end

      real function pow(x,n)
! raise x to the integer power n and print the result
      integer n
      real x
      pow = x**n
      print*, ' x,n,pow=', x,n,pow
      return
      end

      subroutine abort_overfl(pc)
! user-provided routine passed to handle_sigfpes; prints its argument
      integer*4 pc
      print *, 'subroutine abort_overfl: pc=', pc
      return
      end
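The three calls to pow in job.f are chosen so that the first succeeds and the others overflow. The Python sketch below checks this, assuming that the default real in the Fortran program is a 32-bit IEEE single-precision number (an assumption on our part; a compiler option that widens reals would change the outcome). The helper name fits_in_single is ours.

```python
import struct

def fits_in_single(x):
    """Return True if the finite double x is within 32-bit IEEE float range."""
    try:
        struct.pack("<f", x)   # raises OverflowError when x is too large
    except OverflowError:
        return False
    return True

# The same powers of ten that job.f computes with pow(10.0, n).
results = {n: fits_in_single(10.0 ** n) for n in (10, 40, 50)}
for n, ok in results.items():
    print(f"10**{n}: {'fits' if ok else 'overflows'} in single precision")
```

The largest finite single-precision value is roughly 3.4e38, which is why 10**40 is already out of range.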
SUN trapping FPEs via the dbx utility
- Documentation
- AnswerBook (SUN's online documentation)
- OS and compiler:
- SunOS 5.5.1, SunSoft F90 Version 1.0.1.0
- Comments
- dbx's catch FPE is handy for finding an FPE's line number.
f90 -g job.f90
dbx a.out
(dbx) catch FPE
(dbx) run
SUN trapping FPEs via subroutine calls
- Documentation
- man -s 3f f77_ieee_environment
- OS and compiler:
- Solaris 2.5.1, f90 WorkShop Compilers 4.2
- Compilation
- f77 job.f
- job.f explanation
- The program calls user-provided subroutine sample_handler when an
overflow is encountered. It does this by passing the routine name through SUN's
IEEE_HANDLER routine.
- Comments
- The documentation is hard to follow.
- job.f
      program sun
C
C Sample program to illustrate using SUN Fortran ieee exception handling.
C
C There are three types of action : get, set, and clear.
C There are five types of exception :
C      inexact
C      division ... division by zero exception
C      underflow
C      overflow
C      invalid
C The exception argument may also name a group :
C      all ... all five exceptions above
C      common ... invalid, overflow, and division
C                 exceptions
C
C Note: all and common only make sense with set or clear.
C
C Individual calls to ieee_handler accumulate the requests.
C
      external sample_handler
C
C Set up traps on the common exceptions (invalid, overflow, division).
C
      ieeer = ieee_handler ( 'set', 'common', sample_handler)
      if (ieeer .ne. 0) print *,' ieee_handler cannot set exceptions '
C
C The division by zero below trips the division trap.
C
      a = 0.
      print *,a
      b = 1./a
      print *,b
      c = 5.
      print *,c
      stop
      end
      integer function sample_handler ( sig, code, sigcontext)
C
C User-supplied exception handler.
C
      integer sig, code, sigcontext(5)
      print *, 'ieee exception'
      stop
      end
Cray trapping FPEs via subroutine calls
Caveat: Cray C90 and J90 series run Cray arithmetic, not IEEE binary
floating-point arithmetic. The only FPEs you can trap on these Crays are floating-point
overflow and divide by zero. For completeness, we provide an example showing how
to do this on Cray machines.
- Documentation
- man signal, man fsigctl
- OS and compiler:
- UNICOS 10.0.0.3 and f90 Version 3.1.0.0
- Compilation
- f90 job.f
- job.f explanation
- Use routines fsigctl, sigoff, and sigon to trap floating-point
exceptions and other signals. Provide your own routine sighndlr to do
what you want when an exception is encountered, e.g., call routine tracebk
for a trace.
- Comments
- There is no way to distinguish between different kinds of floating-point exceptions.
But underflow is not a problem since Cray arithmetic automatically cuts over to
zero.
- job.f
      program main
      real x
      external sighndlr
C
C register to catch signals 8==SIGFPE, floating-point error
      call fsigctl('REGISTER',8,sighndlr)
C
C no interruptions
      call sigoff()
C
C force overflow
      x = 1.0
      do 20 k=1,3000
         x = 10.0*x
   20 continue
      print*, x
C
C release signals
      call sigon()
      write(*,*) 'after sigrelease'
      stop 'test'
      end
      subroutine sighndlr()
      write(*,*) ' in signal handler.'
      write(*,*) ' do whatever needs to be done,'
      write(*,*) ' then return to point of interruption'
      return
      end
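For comparison only, the Python sketch below runs the same "force overflow" loop in IEEE double precision and records the pass on which the value first becomes infinite. It says nothing about Cray arithmetic, whose exponent range differs; it merely shows how a loop of this kind provokes an overflow on the IEEE machines discussed earlier. The variable first_overflow is our own name.

```python
import math

x = 1.0
first_overflow = None          # pass on which x first becomes infinite
for k in range(1, 3001):
    x = 10.0 * x
    if math.isinf(x):
        first_overflow = k
        break

print("x became infinite on pass", first_overflow)
```

On an IEEE machine with overflow trapping enabled, the handler would be invoked at this pass rather than at the end of the loop.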
References
- [1] IEEE Standard for Binary Floating-Point Arithmetic
(IEEE Std 754-1985)
- [2] WG5 (1997), "Technical Report for Floating-Point Exception
Handling in Fortran," ISO/IEC/JTC1/SC22/WG5 N1281