4. String Substitutions

regsub.sub(pattern, replacement, string) performs a single substitution, replacing the first substring matched by pattern with the replacement string. regsub.gsub is more commonly used; it replaces all non-overlapping matches of the pattern in the string. Both functions have been replaced by re.sub(pattern, replacement, string, count). count is optional and gives the maximum number of substitutions to perform; without a count argument, all possible substitutions are performed.

regsub code:

regsub.sub('[ \t\n]+', ' ', s)

re code:

re.sub('[ \t\n]+', ' ', s, count=1)

regsub code:

regsub.gsub('[ \t\n]+', ' ', s)

re code:

re.sub('[ \t\n]+', ' ', s)
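
As a rough illustration of how the count argument separates the two old behaviours, the following sketch runs both forms on a sample string. The sample value of s and the variable names are assumptions made for this example, and count is passed by keyword, since recent Python releases discourage passing it positionally.

```python
import re

# Sample input, assumed for illustration only.
s = "one  two\t\tthree\n four"

# count=1 replaces only the first run of whitespace,
# which is what regsub.sub used to do.
first_only = re.sub('[ \t\n]+', ' ', s, count=1)

# With no count, every non-overlapping run is replaced,
# which is what regsub.gsub used to do.
all_runs = re.sub('[ \t\n]+', ' ', s)

print(repr(first_only))  # 'one two\t\tthree\n four'
print(repr(all_runs))    # 'one two three four'
```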

regsub.split(string, pattern) splits the string into fields separated by delimiters matching the pattern, and returns a list containing the fields. The optional maxsplit parameter limits the number of splits performed. The regsub.splitx() function does the same, but also returns the delimiters as part of the list.

Both tasks are performed by the re.split(pattern, string) function. Note that the order of the arguments has been reversed! All the other re functions take the pattern first, so the order was changed for consistency. There's still an optional maxsplit parameter, with the same meaning and the same default value. To keep the delimiters, put a capturing group in the pattern; each time a delimiter is found, the contents of .groups() for the MatchObject instance are added to the list after the preceding field.

regsub code:

regsub.split(s, ',[ \t]*')

re code:

re.split(',[ \t]*', s)

regsub code:

regsub.splitx(s, ',[ \t]*')

re code:

re.split('(,[ \t]*)', s)
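
The differences between these forms are easiest to see from the returned lists. The sketch below uses a sample string and variable names chosen for this example (they are assumptions, not part of the original text), and passes maxsplit by keyword.

```python
import re

# Sample input, assumed for illustration only.
s = "a, b,c,  d"

# Plain split: the delimiters are discarded.
fields = re.split(',[ \t]*', s)

# A capturing group around the delimiter pattern keeps each
# delimiter in the result, between the fields it separated.
with_delims = re.split('(,[ \t]*)', s)

# maxsplit=1 stops after the first split; the rest of the
# string is returned unsplit as the final element.
limited = re.split(',[ \t]*', s, maxsplit=1)

print(fields)       # ['a', 'b', 'c', 'd']
print(with_delims)  # ['a', ', ', 'b', ',', 'c', ',  ', 'd']
print(limited)      # ['a', 'b,c,  d']
```

Because the delimiters sit at the odd indices of the second list, with_delims[::2] recovers the plain fields.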