Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the re module under two names, a long name such as IGNORECASE, and a short, one-letter form such as I. (If you're familiar with Perl's pattern modifiers, the one-letter forms use the same letters; the short form of re.VERBOSE is re.X.) Multiple flags can be specified by bitwise OR-ing them; re.I | re.M sets both the I and M flags, for example.
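As a quick illustration of combining flags, here's a minimal sketch. The sample text and the variable names are made up for the example; only the flag names come from the re module.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "first line\nSecond LINE\nthird line"

# re.I | re.M turns on case-insensitive matching and multi-line anchors together.
pattern = re.compile(r"^second line$", re.I | re.M)
match = pattern.search(text)

# With no flags, the same pattern finds nothing in this text.
no_flags = re.search(r"^second line$", text)

# The long and short names refer to the same flag.
same_flag = re.I == re.IGNORECASE
```

Here match.group() is the middle line exactly as it appears in the text, while no_flags is None because neither the case nor the line anchors line up without the flags.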
Here's a table of the available flags, followed by a more detailed explanation of each one.
Flag | Meaning |
---|---|
DOTALL, S | Make . match any character, including newlines |
IGNORECASE, I | Do case-insensitive matches |
LOCALE, L | Do a locale-aware match |
MULTILINE, M | Multi-line matching, affecting ^ and $ |
VERBOSE, X | Enable verbose REs, which can be organized more cleanly and understandably. |
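To make two of the table's entries concrete, the following sketch compares matching with and without DOTALL and MULTILINE. The input strings are invented for the example.

```python
import re

# DOTALL: by default "." refuses to match a newline.
without_dotall = re.findall(r"a.b", "a\nb")
with_dotall = re.findall(r"a.b", "a\nb", re.DOTALL)

# MULTILINE: by default ^ and $ anchor only at the ends of the whole string.
lines = "alpha\nbeta\ngamma"
without_multiline = re.findall(r"^\w+$", lines)
with_multiline = re.findall(r"^\w+$", lines, re.MULTILINE)

print(without_dotall, with_dotall)
print(without_multiline, with_multiline)
```

Without the flags both searches come back empty; with them, the first finds the three-character string spanning the newline and the second finds each line as a separate word.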
Locales are a feature of the C library intended to help in writing programs that take account of language differences. For example, if you're processing French text, you'd want to be able to write \w+ to match words, but \w only matches the character class [A-Za-z0-9_]; it won't match "é" or "ç". If your system is configured properly, and a French locale is selected, certain C functions will tell the program that "é" should also be considered a letter. Setting the LOCALE flag when compiling a regular expression will cause the resulting compiled object to use these C functions for \w; this is slower, but also enables \w+ to match French words as you'd expect.
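The sketch below shows roughly how this plays out, under some loudly stated assumptions: it targets Python 3, where re.LOCALE is only accepted for bytes patterns, so the French text is encoded as Latin-1 first. The locale name fr_FR.ISO8859-1 is a guess that varies by platform, and the helper locale_words is hypothetical; it returns None when the locale isn't installed.

```python
import locale
import re

# Latin-1 bytes for "café crème" (assumption: an 8-bit encoding is in use).
text = "café crème".encode("latin-1")

# Without LOCALE, \w in a bytes pattern matches only ASCII letters, digits and "_".
ascii_words = re.findall(rb"\w+", text)  # [b'caf', b'cr', b'me']

def locale_words(data, locale_name="fr_FR.ISO8859-1"):
    """Hypothetical helper: find words using re.LOCALE, or return None
    if the requested locale is not available on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None
    try:
        return re.findall(rb"\w+", data, re.LOCALE)
    finally:
        # setlocale changes process-wide state, so put it back afterwards.
        locale.setlocale(locale.LC_CTYPE, saved)

french_words = locale_words(text)
```

Because the result depends on which locales the system provides, french_words may be None; where a suitable French locale exists, the accented bytes should be treated as letters and each word matched whole.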
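The VERBOSE flag lets you lay out a RE over several lines: whitespace in the pattern is ignored, except inside a character class or when preceded by a backslash, and # starts a comment that runs to the end of the line. For example, here's a RE that uses re.VERBOSE; see how much easier it is to read?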
    import re

    charref = re.compile(r"""
     &[#]                              # Start of a numeric entity reference ([#] is a literal #)
     (?P<char>
         [0-9]+[^0-9]                  # Decimal form
       | 0[0-7]+[^0-7]                 # Octal form
       | x[0-9a-fA-F]+[^0-9a-fA-F]     # Hexadecimal form
     )
    """, re.VERBOSE)

Without the verbose setting, the RE would look like this:

    charref = re.compile("&#(?P<char>[0-9]+[^0-9]"
                         "|0[0-7]+[^0-7]"
                         "|x[0-9a-fA-F]+[^0-9a-fA-F])")
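One way to convince yourself that the two spellings are equivalent is to run them side by side. This is a sketch only: the names verbose_ref and compact_ref and the sample strings are invented here, and the literal # is written as [#] in the verbose pattern because a bare # would start a comment.

```python
import re

# Verbose spelling: whitespace is ignored and "#" starts a comment.
verbose_ref = re.compile(r"""
 &[#]                              # Start of a numeric entity reference
 (?P<char>
     [0-9]+[^0-9]                  # Decimal form
   | 0[0-7]+[^0-7]                 # Octal form
   | x[0-9a-fA-F]+[^0-9a-fA-F]     # Hexadecimal form
 )
""", re.VERBOSE)

# Compact spelling of the same pattern, without re.VERBOSE.
compact_ref = re.compile("&#(?P<char>[0-9]+[^0-9]"
                         "|0[0-7]+[^0-7]"
                         "|x[0-9a-fA-F]+[^0-9a-fA-F])")

def char_group(pattern, s):
    """Return the 'char' group if the pattern matches at the start of s, else None."""
    m = pattern.match(s)
    return m.group("char") if m else None

# Hypothetical inputs: decimal, hexadecimal, and a non-numeric entity.
samples = ["&#123;", "&#x1F;", "&amp;"]
results = [(char_group(verbose_ref, s), char_group(compact_ref, s)) for s in samples]
```

Both patterns capture the same text for each sample. Note that the captured group includes the terminating character, since each alternative ends with a negated character class.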