Most Fortran users will want to use no optimization when developing and
testing programs, and use -O or -O2 when compiling programs for
late-cycle testing and for production use.
However, note that certain diagnostics--such as for uninitialized
variables--depend on the flow analysis done by -O, i.e. you must use -O
or -O2 to get such diagnostics.
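
As a rough illustration of the point about diagnostics, the following sketch contains a local variable that is used before it is set. The function name BUGGY, the file name uninit.f, and the use of -Wuninitialized (a gcc warning option not discussed in this section) are assumptions made for the example; whether a warning actually appears depends on the compiler version.

```fortran
C     Hypothetical file uninit.f.  Possible invocations:
C       g77 -c -Wuninitialized uninit.f      (no flow analysis)
C       g77 -c -O -Wuninitialized uninit.f   (flow analysis enabled)
      REAL FUNCTION BUGGY(Y)
      REAL X, Y
C     X is a local variable that is never assigned before this use.
      BUGGY = X * Y
      END
```

Only the second invocation gives the compiler the flow information it needs to notice that X has no defined value.
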
The following flags have particular applicability when compiling Fortran programs:
-malign-double
Noticeably improves performance of g77 programs making heavy use of
REAL(KIND=2) (DOUBLE PRECISION) data on some systems.
In particular, systems using Pentium, Pentium Pro, 586, and 686
implementations of the i386 architecture execute programs faster when
REAL(KIND=2) (DOUBLE PRECISION) data are aligned on 64-bit boundaries
in memory.
This option can, at least, make benchmark results more consistent across various system configurations, versions of the program, and data sets.
Note: The warning in the gcc documentation about this option does not
apply, generally speaking, to Fortran code compiled by g77.
See Aligned Data, for more information on alignment issues.
Also note: The negative form of -malign-double is -mno-align-double,
not -benign-double.
-ffloat-store
This option is effective when the floating-point unit is set to work in
IEEE 854 `extended precision'--as it typically is on x86 and m68k GNU
systems--rather than IEEE 754 double precision. -ffloat-store tries to
remove the extra precision by spilling data from floating-point
registers into memory, and this typically involves a big performance
hit. However, it doesn't affect intermediate results, so it is only
partially effective. `Excess precision' is avoided in code like:
     a = b + c
     d = a * e
but not in code like:
     d = (b + c) * e
For another, potentially better, way of controlling the precision,
see Floating-point precision.
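
The distinction can be observed with a small test program along the following lines. The names EXCESS and TWOWAY and the chosen values are assumptions made for illustration. In strict IEEE 754 double precision both printed results are zero; on a system with excess precision they may differ, depending on the compiler, optimization level, and options used.

```fortran
      PROGRAM EXCESS
      DOUBLE PRECISION B, C, D1, D2
      B = 1.0D0
C     C is too small to change B when the sum is rounded to double.
      C = 2.0D0**(-60)
      CALL TWOWAY(B, C, D1, D2)
      PRINT *, 'via named variable: ', D1
      PRINT *, 'single expression:  ', D2
      END

      SUBROUTINE TWOWAY(B, C, D1, D2)
      DOUBLE PRECISION A, B, C, D1, D2
C     The sum is assigned to the variable A before it is used.
      A = B + C
      D1 = A - B
C     The sum exists only as an intermediate result.
      D2 = (B + C) - B
      END
```

A subtraction is used here instead of the multiplication shown earlier only because it makes any extra bits in the intermediate sum visible in the printed result.
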
-fforce-mem
-fforce-addr
-fno-inline
-ffast-math
Sets -funsafe-math-optimizations and -fno-trapping-math.
-funsafe-math-optimizations
-fno-trapping-math
-fstrength-reduce
-frerun-cse-after-loop
-fexpensive-optimizations
-fdelayed-branch
-fschedule-insns
-fschedule-insns2
-fcaller-saves
-funroll-loops
Unrolls iterative DO loops, and is probably generally appropriate for
Fortran, though it is not turned on at any optimization level.
Note that outer loop unrolling isn't done specifically; decisions about
whether to unroll a loop are made on the basis of its instruction count.
Also, no `loop discovery'(1) is done, so only loops written with DO
benefit from loop optimizations, including--but not limited
to--unrolling. Loops written with IF and GOTO are not currently
recognized as such. This option unrolls only iterative DO loops, not
DO WHILE loops.
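
To make the distinction concrete, here is a sketch of the same summation written both ways; the program name, array size, and statement labels are arbitrary choices for illustration. Both sums come out as 1000, but only the first summation loop is written in the form that the text above describes as benefiting from loop optimizations.

```fortran
      PROGRAM LOOPS
      INTEGER N
      PARAMETER (N = 1000)
      REAL X(N), S1, S2
      INTEGER I
      DO 10 I = 1, N
         X(I) = 1.0
   10 CONTINUE
C     Iterative DO loop: recognized as a loop by the compiler.
      S1 = 0.0
      DO 20 I = 1, N
         S1 = S1 + X(I)
   20 CONTINUE
C     Same summation with IF and GOTO: not recognized as a loop.
      S2 = 0.0
      I = 1
   30 IF (I .GT. N) GOTO 40
      S2 = S2 + X(I)
      I = I + 1
      GOTO 30
   40 CONTINUE
      PRINT *, S1, S2
      END
```
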
-funroll-all-loops
Unrolls DO WHILE loops in addition to iterative DO loops. In the
absence of DO WHILE, this option is equivalent to -funroll-loops but
possibly slower.
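
For comparison, a loop of the kind this option adds to the unrolling candidates might look like the following sketch. The program name and values are made up for the example; the loop body runs ten times before X reaches 1.0, so the program prints 10.

```fortran
      PROGRAM HALVE
      REAL X
      INTEGER K
      X = 1024.0
      K = 0
C     DO WHILE loop: the exit test is a general logical condition,
C     not an iteration count fixed on entry.
      DO WHILE (X .GT. 1.0)
         X = X / 2.0
         K = K + 1
      END DO
      PRINT *, K
      END
```
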
-fno-move-all-movables
-fno-reduce-all-givs
-fno-rerun-loop-opt
g77 based on gcc version 2.8.
Each of these might improve performance on some code.
Analysis of Fortran code optimization and the resulting optimizations triggered by the above options were contributed by Toon Moene.
These three options are intended to be removed someday, once they have helped determine the efficacy of various approaches to improving the performance of Fortran code.
Please let us know how use of these options affects the performance of
your production code.
We're particularly interested in code that runs faster when these
options are disabled, and in non-Fortran code that benefits when they
are enabled via the above gcc command-line options.
See Options That Control Optimization, for more information on options to optimize the generated machine code.
(1) Loop discovery refers to the process by which a compiler, or indeed
any reader of a program, determines which portions of the program are
more likely to be executed repeatedly as it is being run. Such
discovery typically is done early when compiling using optimization
techniques, so the ``discovered'' loops get more attention--and more
run-time resources, such as registers--from the compiler. It is easy to
``discover'' loops that are constructed out of looping constructs in
the language (such as Fortran's DO). For some programs, ``discovering''
loops constructed out of lower-level constructs (such as IF and GOTO)
can lead to generation of more optimal code than otherwise.