17.18.2 RTL to RTL Peephole Optimizers

The define_peephole2 definition tells the compiler how to substitute one sequence of instructions for another sequence, what additional scratch registers may be needed and what their lifetimes must be.

(define_peephole2
  [insn-pattern-1
   insn-pattern-2
   …]
  "condition"
  [new-insn-pattern-1
   new-insn-pattern-2
   …]
  "preparation-statements")

The definition is almost identical to define_split (see Defining How to Split Instructions) except that the pattern to match is not a single instruction, but a sequence of instructions.

It is possible to request additional scratch registers for use in the output template. If appropriate registers are not free, the pattern will simply not match.

Scratch registers are requested with a match_scratch pattern at the top level of the input pattern. The allocated register (initially) will be dead at the point requested within the original sequence. If the scratch is used at more than a single point, a match_dup pattern at the top level of the input pattern marks the last position in the input sequence at which the register must be available.

Here is an example from the IA-32 machine description:

(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC 17))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC 17))])]
  "")

This pattern tries to split a load from its use in the hopes that we’ll be able to schedule around the memory load latency. It allocates a single SImode register of class GENERAL_REGS ("r") that needs to be live only at the point just before the arithmetic.

A real example requiring extended scratch lifetimes is harder to come by, so here’s a silly made-up example:

(define_peephole2
  [(match_scratch:SI 4 "r")
   (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
   (set (match_operand:SI 2 "" "") (match_dup 1))
   (match_dup 4)
   (set (match_operand:SI 3 "" "") (match_dup 1))]
  "/* determine 1 does not overlap 0 and 2 */"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0) (match_dup 4))
   (set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (match_dup 4))]
  "")

There are two special macros defined for use in the preparation statements: DONE and FAIL. Use them with a following semicolon, as a statement.

DONE

Use the DONE macro to end RTL generation for the peephole. The only RTL insns generated as replacement for the matched input insn will be those already emitted by explicit calls to emit_insn within the preparation statements; the replacement pattern is not used.

FAIL

Make the define_peephole2 fail on this occasion. When a define_peephole2 fails, it means that the replacement was not truly available for the particular inputs it was given. In that case, GCC may still apply a later define_peephole2 that also matches the given insn pattern. (Note that this is different from define_split, where FAIL prevents the input insn from being split at all.)

If the preparation falls through (invokes neither DONE nor FAIL), then the define_peephole2 uses the replacement template.

If we had not added the (match_dup 4) in the middle of the input sequence, it might have been the case that the register we chose at the beginning of the sequence is killed by the first or second set.