Inlining of Subprograms - GNAT User's Guide


7.1.5 Inlining of Subprograms

A call to a subprogram in the current unit is inlined if all the following conditions are met:

The optimization level is at least -O1.
The called subprogram is suitable for inlining: it must be small enough and must not contain anything that gcc cannot support in an inlined subprogram.
Any one of the following applies:
   pragma Inline is applied to the subprogram and the -gnatn switch is specified;
   the subprogram is local to the unit and called once from within it;
   the subprogram is small and optimization level -O2 is specified;
   optimization level -O3 is specified.
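The following is a minimal single-unit sketch of the first two cases. The unit name Inline_Demo and the subprograms Square and Report are hypothetical, chosen only for illustration. Square qualifies through pragma Inline plus -gnatn. Report qualifies because it is local and called exactly once. In both cases gcc still makes the final decision for each call.

```ada
with Ada.Text_IO;

--  Build, for example, with:
--     gnatmake -O1 -gnatn inline_demo.adb
procedure Inline_Demo is

   --  Hypothetical small function: a candidate for inlining because
   --  pragma Inline is present and -gnatn is given.
   function Square (X : Integer) return Integer;
   pragma Inline (Square);

   function Square (X : Integer) return Integer is
   begin
      return X * X;
   end Square;

   --  Hypothetical local procedure called exactly once: a candidate for
   --  inlining at -O1 even without pragma Inline (unless the
   --  -fno-inline-functions-called-once switch is given).
   procedure Report (Value : Integer) is
   begin
      Ada.Text_IO.Put_Line ("Square (7) =" & Integer'Image (Value));
   end Report;

begin
   Report (Square (7));
end Inline_Demo;
```

Without -gnatn, Square is inlined only under the remaining conditions in the list: a small subprogram at -O2, or any subprogram at -O3.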

Calls to subprograms in with'ed units are normally not inlined. To achieve actual inlining (that is, replacement of the call by the code in the body of the subprogram), the following conditions must all be true:

The optimization level is at least -O1.
The called subprogram is suitable for inlining: it must be small enough and must not contain anything that gcc cannot support in an inlined subprogram.
The call appears in a body (not in a package spec).
There is a pragma Inline for the subprogram.
The -gnatn switch is used on the command line.

Even if all these conditions are met, the compiler may still be unable to inline the call, because the body is too long or contains features that prevent inlining.

Note that specifying the -gnatn switch causes additional compilation dependencies. Consider the following:

     --  r.ads
     package R is
        procedure Q;
        pragma Inline (Q);
     end R;

     --  r.adb
     package body R is
        ...
     end R;

     --  main.adb
     with R;
     procedure Main is
     begin
        ...
        R.Q;
     end Main;

With the default behavior (no -gnatn switch specified), the compilation of the Main procedure depends only on its own source, main.adb, and the spec of the package in file r.ads. This means that editing the body of R does not require recompiling Main.

On the other hand, the call R.Q is not inlined under these circumstances. If the -gnatn switch is present when Main is compiled, the call will be inlined if the body of Q is small enough, but now Main depends on the body of R in r.adb as well as on the spec. This means that if this body is edited, the main program must be recompiled. Note that this extra dependency occurs whether or not the call is in fact inlined by gcc.

The use of front-end inlining with -gnatN generates similar additional dependencies.

Note: The -fno-inline switch can be used to prevent all inlining. This switch overrides all other conditions and ensures that no inlining occurs. The extra dependencies resulting from -gnatn remain in effect even when this switch is used to suppress the resulting inlining.

Note: The -fno-inline-functions switch can be used to prevent automatic inlining of subprograms if -O3 is used.

Note: The -fno-inline-small-functions switch can be used to prevent automatic inlining of small subprograms if -O2 is used.

Note: The -fno-inline-functions-called-once switch can be used to prevent inlining of subprograms local to the unit and called once from within it if -O1 is used.

Note regarding the use of -O3: -gnatn is made up of two sub-switches, -gnatn1 and -gnatn2, which can be specified directly in lieu of it; -gnatn itself is translated into one of them based on the optimization level. With -O2 or below, -gnatn is equivalent to -gnatn1, which activates pragma Inline with moderate inlining across modules. With -O3, -gnatn is equivalent to -gnatn2, which activates pragma Inline with full inlining across modules.

If you have used pragma Inline in appropriate cases, it is usually much better to use -O2 and -gnatn and avoid -O3, which has the additional effect of inlining subprograms you did not think should be inlined. We have found that -O3 may slow down compilation and increase code size through excessive inlining. The larger code puts more pressure on the instruction cache, so any performance improvement tends to be minor. The bottom line is that you should not automatically assume that -O3 is better than -O2; use -O3 only if tests show that it actually improves performance for your program.
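One way to follow this advice is to mark the hot helper with pragma Inline and build the same program under each setting, keeping -O3 only if measurements favor it. The sketch below assumes a single-unit program. The name Sum_Squares and its Square helper are hypothetical. The -gnatn1/-gnatn2 distinction is described in terms of inlining across modules, so it becomes most relevant once Square moves into a with'ed package.

```ada
with Ada.Text_IO;

--  Example build commands for comparing the settings discussed above:
--     gnatmake -O2 -gnatn sum_squares.adb     (-gnatn acts as -gnatn1)
--     gnatmake -O2 -gnatn2 sum_squares.adb    (full cross-module inlining)
--     gnatmake -O3 -gnatn sum_squares.adb     (-gnatn acts as -gnatn2)
--  Keep -O3 only if timing runs show a real gain over the -O2 builds.
procedure Sum_Squares is

   --  Small helper on the hot path, marked explicitly for inlining.
   function Square (X : Integer) return Integer;
   pragma Inline (Square);

   function Square (X : Integer) return Integer is
   begin
      return X * X;
   end Square;

   Total : Integer := 0;

begin
   --  Sum of I**2 for I in 1 .. 1000.
   for I in 1 .. 1_000 loop
      Total := Total + Square (I);
   end loop;
   Ada.Text_IO.Put_Line ("Total =" & Integer'Image (Total));
end Sum_Squares;
```

The program prints "Total = 333833500" under every setting. Only code size and speed should differ between builds.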