Overview
Python’s re module implements Perl-style regular expressions; the syntax overlaps heavily with PCRE but is not fully compatible with it. Compile patterns that are used more than once: the module caches recently compiled patterns (up to 512 in current CPython), but an explicit re.compile() makes intent clear and skips the cache lookup. For heavyweight parsing, reach for a grammar library instead. This card covers the function surface, flags, character classes, quantifier behavior, and the patterns that appear most often in practice. See regex-patterns for flavor-agnostic reference and common validation patterns.
re module functions
Choose the function that matches your search intent.
| Function | Returns | When to use |
|---|---|---|
| `re.match(pat, s)` | Match object or None | Pattern anchored to start of string. |
| `re.fullmatch(pat, s)` | Match object or None | Pattern must cover the entire string. |
| `re.search(pat, s)` | Match object or None | First match anywhere in the string. |
| `re.findall(pat, s)` | List of strings (or tuples) | All non-overlapping matches as plain strings. |
| `re.finditer(pat, s)` | Iterator of Match objects | All matches with positions; prefer over findall when groups matter. |
| `re.sub(pat, repl, s)` | String | Replace all matches; repl can be a string or callable. |
| `re.subn(pat, repl, s)` | (string, count) | Like sub but also returns the substitution count. |
| `re.split(pat, s)` | List of strings | Split on pattern; capturing groups appear in the list. |
| `re.compile(pat, flags)` | Pattern object | Pre-compile for reuse; exposes the same methods above. |
| `re.escape(s)` | String | Escape all special characters; use before embedding user input in a pattern. |
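As a rough illustration of four table entries (fullmatch, subn, split, escape), here is a minimal sketch. The sample strings and variable names such as `user_input` are made up for the example.

```python
import re

# fullmatch: the whole string must match, so trailing text is rejected.
is_number = re.fullmatch(r"\d+", "12345") is not None
is_number_bad = re.fullmatch(r"\d+", "12345abc") is not None

# subn: same as sub, plus the number of replacements made.
collapsed, count = re.subn(r"\s+", " ", "a  b \t c")

# split: a pattern without capturing groups drops the separators.
fields = re.split(r"\s*,\s*", "x , y,z")

# escape: treat (hypothetical) user input as literal text, not regex syntax.
user_input = "1+1=2"
literal = re.compile(re.escape(user_input))
found = literal.search("proof: 1+1=2") is not None
```

Without `re.escape()`, the `+` in `user_input` would be read as a quantifier and the literal text would not be found.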
import re
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
text = "alice@example.com, bob@example.org"  # illustrative sample input
# search vs match
re.search(r"\d+", "abc123")  # matches; position 3
re.match(r"\d+", "abc123")   # None; no digits at start
# finditer preserves span information
for m in EMAIL.finditer(text):
    print(m.group(), m.span())
# sub with a callable
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2 c3")
# "a2 b4 c6"

Flags
Pass flags to re.compile() or to any module-level function, or embed them inline with (?flags) at the start of the pattern.
| Flag | Short | Inline | Effect |
|---|---|---|---|
| `re.IGNORECASE` | `re.I` | `(?i)` | Case-insensitive matching. |
| `re.MULTILINE` | `re.M` | `(?m)` | `^` and `$` match start/end of each line. |
| `re.DOTALL` | `re.S` | `(?s)` | `.` matches newline as well. |
| `re.VERBOSE` | `re.X` | `(?x)` | Whitespace and `#` comments ignored; write readable patterns. |
| `re.ASCII` | `re.A` | `(?a)` | `\w`, `\d`, `\s` match ASCII only, not full Unicode. |
| `re.UNICODE` | `re.U` | `(?u)` | Default in Python 3; `\w` etc. match Unicode characters. |
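To make the MULTILINE, DOTALL, and inline-flag rows concrete, here is a small sketch. The `log` and `block` strings are invented sample data.

```python
import re

log = "INFO start\nerror: disk full\nINFO done"

# Without MULTILINE, ^ only matches at the very start of the string.
no_flag = re.findall(r"^error.*$", log)
# With MULTILINE, ^ and $ also match at each line boundary.
multiline = re.findall(r"^error.*$", log, re.MULTILINE)

# DOTALL lets . cross newlines.
block = "<p>one\ntwo</p>"
without_dotall = re.search(r"<p>.*</p>", block)
with_dotall = re.search(r"<p>.*</p>", block, re.DOTALL)

# Inline flag at the start of the pattern, equivalent to re.IGNORECASE.
inline = re.findall(r"(?i)info", log)
```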
# VERBOSE for a complex pattern
DATE = re.compile(r"""
(?P<year> \d{4}) - # four-digit year
(?P<month> \d{2}) - # two-digit month
(?P<day> \d{2}) # two-digit day
""", re.VERBOSE)
# Combine flags with |
re.compile(r"foo", re.IGNORECASE | re.MULTILINE)

Character classes
The shorthands and custom classes that matter in Python.
| Class | Matches | Note |
|---|---|---|
| `\d` | Unicode digits (0-9 plus others) | Use `[0-9]` or `re.ASCII` for ASCII-only. |
| `\D` | Non-digit | Complement of `\d`. |
| `\w` | `[a-zA-Z0-9_]` (+ Unicode by default) | Does not include hyphen or dot. |
| `\W` | Non-word | Complement of `\w`. |
| `\s` | Whitespace including space, `\t`, `\n`, `\r`, `\f`, `\v` | |
| `\S` | Non-whitespace | Complement of `\s`. |
| `[abc]` | Any of a, b, c | |
| `[^abc]` | Anything except a, b, c | |
| `[a-z0-9_.-]` | Range plus literals | A literal hyphen must be first, last, or escaped inside `[]`. |
| `.` | Any char except newline | Use `re.DOTALL` to include newline. |
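The difference between Unicode and ASCII digit matching is easy to miss, so here is a brief sketch of it, together with the `\w` and literal-hyphen notes from the table. The sample strings are illustrative only.

```python
import re

arabic_indic = "\u0663\u0664"  # Arabic-Indic digits for 3 and 4

# \d accepts any Unicode decimal digit by default.
unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None
# re.ASCII or an explicit [0-9] class restricts matching to 0-9.
ascii_flag = re.fullmatch(r"\d+", arabic_indic, re.ASCII) is not None
explicit_class = re.fullmatch(r"[0-9]+", arabic_indic) is not None

# \w covers letters, digits, and underscore, but not hyphen or dot.
words = re.findall(r"\w+", "my-file_v2.txt")

# A hyphen placed last in the class is a literal, not a range.
tokens = re.findall(r"[a-z0-9_.-]+", "my-file_v2.txt other")
```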
Greedy vs lazy quantifiers
Greedy quantifiers consume as much as possible. Lazy quantifiers stop as soon as the rest of the pattern can match.
| Quantifier | Greedy | Lazy |
|---|---|---|
| 0 or 1 | `?` | `??` |
| 0 or more | `*` | `*?` |
| 1 or more | `+` | `+?` |
| exactly n | `{n}` | n/a |
| n to m | `{n,m}` | `{n,m}?` |
s = "<a>link</a>"
re.search(r"<.+>", s).group() # "<a>link</a>" greedy
re.search(r"<.+?>", s).group() # "<a>" lazy
# Greedy is almost always correct for whole-token patterns.
# Lazy is useful for extracting content between delimiters.
html_tags = re.findall(r"<.*?>", "<b>bold</b> and <i>italic</i>")
# ["<b>", "</b>", "<i>", "</i>"]

Avoid nested quantifiers like (a+)+; they can cause catastrophic backtracking on mismatched input.
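One common way to remove a nested quantifier is to flatten it. The sketch below assumes a pattern of the form `(a+)+b`, which accepts the same strings as `a+b`; the names `NESTED` and `FLAT` are illustrative. The inputs are kept short on purpose, because long near-miss inputs are what make the nested form slow.

```python
import re

# Nested: a run of 'a' can be split between the inner and outer + in many
# ways, so a failing match may try all of them before giving up.
NESTED = re.compile(r"(a+)+b")
# Flat: only one way to consume the run, same accepted strings.
FLAT = re.compile(r"a+b")

samples = ["ab", "aaab", "aaa", "b", "aab "]
nested_results = [bool(NESTED.fullmatch(s)) for s in samples]
flat_results = [bool(FLAT.fullmatch(s)) for s in samples]
```

Both lists agree on every sample, which is a cheap sanity check when rewriting a pattern this way.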
Common patterns
Pre-compiled patterns worth keeping in a project utilities module.
| Pattern name | Regex | Notes |
|---|---|---|
| UUID v4 | `[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}` | Lowercase hex. |
| ISO date | `\d{4}-(0[1-9]\|1[0-2])-(0[1-9]\|[12]\d\|3[01])` | Validates ranges; not calendar truth. |
| Slug | `[a-z0-9]+(?:-[a-z0-9]+)*` | URL-safe identifier. |
| Python identifier | `[A-Za-z_]\w*` | Simplified: ASCII first character only, and keywords also match. `str.isidentifier()` applies the full rule. |
| IPv4 | `((25[0-5]\|2[0-4]\d\|1\d\d\|[1-9]?\d)\.){3}(25[0-5]\|2[0-4]\d\|1\d\d\|[1-9]?\d)` | Bounded octets. |
| Semver | `\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?` | Simplified; no full spec validation. |
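As a sketch of how the ISO date and IPv4 rows might be wrapped as validators, the block below uses `fullmatch` so that partial matches are rejected. The helper names `is_iso_date` and `is_ipv4` and the `OCTET` fragment are hypothetical conveniences, not part of any library.

```python
import re

ISO_DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

# Build the IPv4 pattern from one octet fragment to avoid repeating it.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

def is_iso_date(s: str) -> bool:
    """Range check only: month 01-12, day 01-31."""
    return ISO_DATE.fullmatch(s) is not None

def is_ipv4(s: str) -> bool:
    """Four dot-separated octets, each 0-255 with no leading zeros."""
    return IPV4.fullmatch(s) is not None
```

Note that `is_iso_date("2026-02-31")` still returns True, since the pattern checks ranges rather than the calendar; parsing with `datetime.date.fromisoformat()` is one way to catch such dates.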
SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # no ^ or $ needed; fullmatch anchors both ends
def is_slug(s: str) -> bool:
    return bool(SLUG.fullmatch(s))

Named groups and back-references
Use named groups for readable extraction; use back-references to match repeated content.
# Named groups
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-05-14")
m.group("year") # "2026"
m.groupdict() # {"year": "2026", "month": "05", "day": "14"}
# Back-reference: match doubled words
re.search(r"\b(\w+)\s+\1\b", "the the problem").group() # "the the"
# sub with named groups in replacement
re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>", "Smith, John")
# "John Smith"

Common gotchas
- `re.match()` anchors to the start but not the end. Use `re.fullmatch()` for complete-string validation.
- `\d` matches non-ASCII digits by default in Python 3. Use `[0-9]` or add `re.ASCII` when you mean the ASCII digits 0-9 only.
- `re.findall()` returns a list of whole-match strings when there are no groups, a list of that group's strings when there is exactly one group, and a list of tuples when there are multiple groups. This inconsistency surprises people; use `re.finditer()` and call `.group()` explicitly.
- Passing a bytes pattern to a str input (or vice versa) raises `TypeError`. Keep types consistent.
- `re.split(r"(\s+)", s)` includes the matched separator in the output list because the group is capturing. Use `(?:\s+)` if you want separators dropped.
- `re.escape()` is required before inserting user-controlled text into a pattern. Skipping it lets the input be interpreted as regex syntax, which can change what matches and opens a ReDoS vector.
- The module-level functions (`re.search`, `re.sub`) look the pattern up in the module cache on each call and compile it on a miss. For tight loops, compile explicitly and call methods on the `Pattern` object.
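Several of these gotchas can be seen directly in a few lines. The following is a minimal sketch with an invented sample string; the variable names are illustrative.

```python
import re

s = "k1=v1 k2=v2"

# findall changes its return shape with the number of groups.
no_groups = re.findall(r"\w+=\w+", s)        # whole matches
one_group = re.findall(r"(\w+)=\w+", s)      # group 1 only
two_groups = re.findall(r"(\w+)=(\w+)", s)   # tuples of groups

# finditer always yields Match objects, so the shape is up to the caller.
uniform = [m.group() for m in re.finditer(r"(\w+)=(\w+)", s)]

# A capturing group in split keeps the separators in the result.
kept = re.split(r"(\s+)", "a  b")
dropped = re.split(r"(?:\s+)", "a  b")

# Mixing a bytes pattern with a str input raises TypeError.
try:
    re.search(rb"\d+", "abc123")
    mixed_types_error = False
except TypeError:
    mixed_types_error = True
```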