In the simplest cases, a regexp is just a literal string that must match exactly. For example, the pattern:
regexp
matches the string "regexp" and no others.
Some characters have a special meaning when they occur in a regexp.
They aren't matched literally as in the previous example, but instead
denote a more general pattern. For example, the character *
is used to indicate that the preceding element of a regexp may be
repeated 0, 1, or more times. In the pattern:
smooo*th
the * indicates that the preceding o (the third one) can occur 0 or
more times, so at least two o characters are required. The pattern
therefore matches:
smooth smoooth smooooth smoooooth ...
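The behavior of * can be checked with a short sketch. It uses Python's re module as a stand-in, on the assumption that it treats literal characters and * the same way as the regexps described here; the candidate list and variable names are illustrative only.

```python
import re

# Assumption: Python's re module stands in for the regexp syntax in the text.
# For literal characters and *, the two agree.
pattern = re.compile(r"smooo*th")

# "smoth" has only one o, so it should be rejected; the rest have two or more.
candidates = ["smoth", "smooth", "smoooth", "smooooth", "smoooooth"]

# fullmatch requires the whole string to match, not just a substring.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)  # ['smooth', 'smoooth', 'smooooth', 'smoooooth']
```

Using fullmatch rather than search keeps the check in line with the text, which talks about the strings a pattern matches in their entirety.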
Suppose you want to write a pattern that literally matches a special
character like * -- in other words, you don't want * to
indicate a permissible repetition, but to match * literally. This
is accomplished by quoting the special character with a backslash.
The pattern:
smoo\*th
matches the string:
smoo*th
and no other strings.
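A minimal sketch of backslash quoting, again assuming Python's re module as a stand-in (it handles \* the same way). The names quoted and escaped are illustrative; re.escape is a Python convenience for quoting special characters and is not part of the regexp syntax described here.

```python
import re

# Assumption: Python's re module stands in for the regexp syntax in the text.
# The backslash quotes *, so it matches a literal asterisk.
quoted = re.compile(r"smoo\*th")

print(bool(quoted.fullmatch("smoo*th")))   # True: the literal string
print(bool(quoted.fullmatch("smooth")))    # False: * no longer means repetition
print(bool(quoted.fullmatch("smooooth")))  # False

# re.escape adds the backslash for you when the literal text is not known in advance.
escaped = re.escape("smoo*th")
print(escaped)  # smoo\*th
```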
In seven cases, the rule is reversed -- a backslash makes the
character special instead of making a special character normal. The
characters +, ?, |, (, ), {, and } are
normal but the sequences \+, \?, \|, \(,
\), \{, and \} are special (their meaning is
described later).
The remaining sections of this chapter introduce and explain the various special characters that can occur in regexps.