Overview

Python’s re module implements Perl-style regular expressions; the syntax is close to PCRE but not identical (no recursion, and possessive quantifiers only from Python 3.11). Compile patterns that are used more than once; the module caches up to 512 compiled patterns, but an explicit re.compile() makes intent clear. For heavyweight parsing, reach for a grammar library instead. This card covers the function surface, flags, character classes, quantifier behavior, and the patterns that appear most often in practice. See regex-patterns for a flavor-agnostic reference and common validation patterns.

re module functions

Choose the function that matches your search intent.

| Function | Returns | When to use |
| --- | --- | --- |
| re.match(pat, s) | Match object or None | Pattern anchored to start of string. |
| re.fullmatch(pat, s) | Match object or None | Pattern must cover the entire string. |
| re.search(pat, s) | Match object or None | First match anywhere in the string. |
| re.findall(pat, s) | List of strings (or tuples) | All non-overlapping matches as plain strings. |
| re.finditer(pat, s) | Iterator of Match objects | All matches with positions; prefer over findall when groups matter. |
| re.sub(pat, repl, s) | String | Replace all matches; repl can be a string or callable. |
| re.subn(pat, repl, s) | (string, count) | Like sub but also returns the substitution count. |
| re.split(pat, s) | List of strings | Split on pattern; capturing groups appear in the list. |
| re.compile(pat, flags) | Pattern object | Pre-compile for reuse; exposes the same methods as above. |
| re.escape(s) | String | Escape all special characters; use before embedding user input in a pattern. |
import re
 
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
 
# search vs match
re.search(r"\d+", "abc123")   # matches; position 3
re.match(r"\d+", "abc123")    # None; no digits at start
 
# finditer preserves span information
for m in EMAIL.finditer(text):
    print(m.group(), m.span())
 
# sub with a callable
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2 c3")
# "a2 b4 c6"

Flags

Pass flags to re.compile() or embed them inline with (?flags).

| Flag | Short | Inline | Effect |
| --- | --- | --- | --- |
| re.IGNORECASE | re.I | (?i) | Case-insensitive matching. |
| re.MULTILINE | re.M | (?m) | ^ and $ match at the start/end of each line. |
| re.DOTALL | re.S | (?s) | . matches newline as well. |
| re.VERBOSE | re.X | (?x) | Whitespace and # comments in the pattern are ignored; write readable patterns. |
| re.ASCII | re.A | (?a) | \w, \d, \s match ASCII only, not full Unicode. |
| re.UNICODE | re.U | (?u) | Default in Python 3; \w etc. match Unicode characters. |
# VERBOSE for a complex pattern
DATE = re.compile(r"""
    (?P<year>  \d{4}) -   # four-digit year
    (?P<month> \d{2}) -   # two-digit month
    (?P<day>   \d{2})     # two-digit day
""", re.VERBOSE)
 
# Combine flags with |
re.compile(r"foo", re.IGNORECASE | re.MULTILINE)
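The inline forms in the table can also be scoped to part of a pattern. A quick sketch (the example strings are illustrative):

```python
import re

# (?i) at the start of the pattern is equivalent to passing re.IGNORECASE
assert re.search(r"(?i)error", "ERROR: disk full")

# Scoped inline flags, (?i:...), apply only inside the group (Python 3.6+)
assert re.fullmatch(r"(?i:warn)ing", "WARNing")
assert re.fullmatch(r"(?i:warn)ing", "WARNING") is None
```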

Character classes

The shorthands and custom classes that matter in Python.

| Class | Matches | Note |
| --- | --- | --- |
| \d | Unicode digits (0-9 plus others) | Use [0-9] or re.ASCII for ASCII-only. |
| \D | Non-digit | Complement of \d. |
| \W | Non-word character | |
| \w | [a-zA-Z0-9_] (plus Unicode by default) | Does not include hyphen or dot. |
| \s | Whitespace, including \t, \n, \r, \f, \v | |
| \S | Non-whitespace | |
| [abc] | Any of a, b, c | |
| [^abc] | Anything except a, b, c | |
| [a-z0-9_.-] | Range plus literals | Hyphen must be first, last, or escaped inside []. |
| . | Any char except newline | Use re.DOTALL to include newline. |
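The Unicode-by-default behavior of \d is easy to check; the Devanagari digit below is just one example of a non-ASCII decimal digit:

```python
import re

# \d matches any Unicode decimal digit, e.g. Devanagari "३" (three)
assert re.fullmatch(r"\d", "३")

# re.ASCII (or an explicit [0-9] class) restricts matching to ASCII digits
assert re.fullmatch(r"\d", "३", re.ASCII) is None
assert re.fullmatch(r"[0-9]", "३") is None
```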

Greedy vs lazy quantifiers

Greedy quantifiers consume as much as possible. Lazy quantifiers stop as soon as the rest of the pattern can match.

| Quantifier | Greedy | Lazy |
| --- | --- | --- |
| 0 or 1 | ? | ?? |
| 0 or more | * | *? |
| 1 or more | + | +? |
| exactly n | {n} | n/a |
| n to m | {n,m} | {n,m}? |
s = "<a>link</a>"
 
re.search(r"<.+>", s).group()   # "<a>link</a>"  greedy
re.search(r"<.+?>", s).group()  # "<a>"          lazy
 
# Greedy is almost always correct for whole-token patterns.
# Lazy is useful for extracting content between delimiters.
html_tags = re.findall(r"<.*?>", "<b>bold</b> and <i>italic</i>")
# ["<b>", "</b>", "<i>", "</i>"]

Avoid nested quantifiers like (a+)+; they can cause catastrophic backtracking on mismatched input.
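The usual fix for a nested quantifier is to flatten it so each input position can match only one way. A sketch of the rewrite (the exact slowdown of the dangerous form depends on input length and machine, so only the safe variant is run here):

```python
import re

# (a+)+b on a long run of "a" with no trailing "b" forces the engine to
# try every way of splitting the run between the inner and outer +,
# which is exponential. The flattened equivalent a+b fails in linear time.
SAFE = re.compile(r"a+b")

assert SAFE.fullmatch("aaab")
assert SAFE.fullmatch("a" * 30) is None  # quick failure, no blowup

# Python 3.11+ also adds possessive quantifiers (e.g. a++) and atomic
# groups (?>...), which never give back characters once matched.
```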

Common patterns

Pre-compiled patterns worth keeping in a project utilities module.

  • UUID v4: [0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12} (lowercase hex).
  • ISO date: \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (validates ranges; not calendar truth).
  • Slug: [a-z0-9]+(?:-[a-z0-9]+)* (URL-safe identifier).
  • Python identifier: [A-Za-z_]\w* (matches valid variable names).
  • IPv4: ((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d) (bounded octets).
  • Semver: \d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)? (simplified; no full spec validation).
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
 
def is_slug(s: str) -> bool:
    return bool(SLUG.fullmatch(s))
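The IPv4 pattern above works the same way once anchored for whole-string validation; is_ipv4 is just an illustrative wrapper name:

```python
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$"
)

def is_ipv4(s: str) -> bool:
    return bool(IPV4.match(s))

assert is_ipv4("192.168.0.1")
assert not is_ipv4("256.1.1.1")  # octet out of range
assert not is_ipv4("1.2.3")      # too few octets
```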

Named groups and back-references

Use named groups for readable extraction; use back-references to match repeated content.

# Named groups
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-05-14")
m.group("year")   # "2026"
m.groupdict()     # {"year": "2026", "month": "05", "day": "14"}
 
# Back-reference: match doubled words
re.search(r"\b(\w+)\s+\1\b", "the the problem").group()  # "the the"
 
# sub with named groups in replacement
re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>", "Smith, John")
# "John Smith"

Common gotchas

  • re.match() anchors to the start but not the end. Use re.fullmatch() for complete-string validation.
  • \d matches non-ASCII digits by default in Python 3. Use [0-9] or add re.ASCII when you mean decimal digits only.
  • re.findall() returns a list of whole-match strings when the pattern has no groups, a list of the group’s strings when there is exactly one group, and a list of tuples when there are two or more groups. This shape change surprises people; use re.finditer() and call .group() explicitly.
  • Passing a bytes pattern to a str input (or vice versa) raises TypeError. Keep types consistent.
  • re.split(r"(\s+)", s) includes the matched separator in the output list because the group is capturing. Use (?:\s+) if you want separators dropped.
  • re.escape() is required before inserting user-controlled text into a pattern. Skipping it lets users inject pattern syntax, which can break matching or open a ReDoS vector.
  • The module-level functions (re.search, re.sub) compile and cache the pattern on each call. For tight loops, compile explicitly and call methods on the Pattern object.
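The findall shape change described above, made concrete:

```python
import re

s = "2026-05-14 2027-01-02"

# No groups: list of whole-match strings
assert re.findall(r"\d{4}-\d{2}-\d{2}", s) == ["2026-05-14", "2027-01-02"]

# One group: list of that group's strings, not the whole matches
assert re.findall(r"(\d{4})-\d{2}-\d{2}", s) == ["2026", "2027"]

# Two or more groups: list of tuples
assert re.findall(r"(\d{4})-(\d{2})-\d{2}", s) == [("2026", "05"), ("2027", "01")]

# finditer sidesteps the shape change: it always yields Match objects
assert [m.group() for m in re.finditer(r"(\d{4})-\d{2}-\d{2}", s)] \
    == ["2026-05-14", "2027-01-02"]
```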