Overview

Python’s re module implements Perl-style regular expressions; the syntax is close to PCRE but not identical (no recursion, and possessive quantifiers only from Python 3.11). Compile patterns that are used more than once; the module caches up to 512 compiled patterns, but an explicit re.compile() makes intent clear. For heavyweight parsing, reach for a grammar library instead. This card covers the function surface, flags, character classes, quantifier behavior, and the patterns that appear most often in practice. See regex-patterns for a flavor-agnostic reference and common validation patterns.

re module functions

Choose the function that matches your search intent.

| Function | Returns | When to use |
| --- | --- | --- |
| re.match(pat, s) | Match object or None | Pattern anchored to start of string. |
| re.fullmatch(pat, s) | Match object or None | Pattern must cover the entire string. |
| re.search(pat, s) | Match object or None | First match anywhere in the string. |
| re.findall(pat, s) | List of strings (or tuples) | All non-overlapping matches as plain strings. |
| re.finditer(pat, s) | Iterator of Match objects | All matches with positions; prefer over findall when groups matter. |
| re.sub(pat, repl, s) | String | Replace all matches; repl can be a string or callable. |
| re.subn(pat, repl, s) | (string, count) | Like sub but also returns the substitution count. |
| re.split(pat, s) | List of strings | Split on pattern; capturing groups appear in the list. |
| re.compile(pat, flags) | Pattern object | Pre-compile for reuse; exposes the same methods as above. |
| re.escape(s) | String | Escape all special characters; use before embedding user input in a pattern. |
import re
 
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
 
# search vs match
re.search(r"\d+", "abc123")   # matches; position 3
re.match(r"\d+", "abc123")    # None; no digits at start
 
# finditer preserves span information
for m in EMAIL.finditer(text):
    print(m.group(), m.span())
 
# sub with a callable
re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2 c3")
# "a2 b4 c6"

Flags

Pass flags to re.compile() or embed them inline with (?flags).

| Flag | Short | Inline | Effect |
| --- | --- | --- | --- |
| re.IGNORECASE | re.I | (?i) | Case-insensitive matching. |
| re.MULTILINE | re.M | (?m) | ^ and $ match at the start/end of each line. |
| re.DOTALL | re.S | (?s) | . matches newline as well. |
| re.VERBOSE | re.X | (?x) | Whitespace and # comments in the pattern are ignored; write readable patterns. |
| re.ASCII | re.A | (?a) | \w, \d, \s match ASCII only, not full Unicode. |
| re.UNICODE | re.U | (?u) | Default in Python 3; \w etc. match Unicode characters. |
# VERBOSE for a complex pattern
DATE = re.compile(r"""
    (?P<year>  \d{4}) -   # four-digit year
    (?P<month> \d{2}) -   # two-digit month
    (?P<day>   \d{2})     # two-digit day
""", re.VERBOSE)
 
# Combine flags with |
re.compile(r"foo", re.IGNORECASE | re.MULTILINE)
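The inline forms in the table can also be scoped to part of a pattern. A quick sketch (the example strings are illustrative):

```python
import re

# (?i) at the start of the pattern is equivalent to passing re.IGNORECASE
assert re.search(r"(?i)error", "ERROR: disk full")

# Scoped inline flags, (?i:...), apply only inside the group (Python 3.6+)
assert re.fullmatch(r"(?i:warn)ing", "WARNing")
assert re.fullmatch(r"(?i:warn)ing", "WARNING") is None
```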

Character classes

The shorthands and custom classes that matter in Python.

| Class | Matches | Note |
| --- | --- | --- |
| \d | Unicode digits (0-9 plus others) | Use [0-9] or re.ASCII for ASCII-only. |
| \D | Non-digit | Complement of \d. |
| \W | Non-word character | |
| \w | [a-zA-Z0-9_] (plus Unicode by default) | Does not include hyphen or dot. |
| \s | Whitespace, including \t, \n, \r, \f, \v | |
| \S | Non-whitespace | |
| [abc] | Any of a, b, c | |
| [^abc] | Anything except a, b, c | |
| [a-z0-9_.-] | Range plus literals | Hyphen must be first, last, or escaped inside []. |
| . | Any char except newline | Use re.DOTALL to include newline. |
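The Unicode-by-default behavior of \d is easy to check; the Devanagari digit below is just one example of a non-ASCII decimal digit:

```python
import re

# \d matches any Unicode decimal digit, e.g. Devanagari "३" (three)
assert re.fullmatch(r"\d", "३")

# re.ASCII (or an explicit [0-9] class) restricts matching to ASCII digits
assert re.fullmatch(r"\d", "३", re.ASCII) is None
assert re.fullmatch(r"[0-9]", "३") is None
```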

Greedy vs lazy quantifiers

Greedy quantifiers consume as much as possible. Lazy quantifiers stop as soon as the rest of the pattern can match.

| Quantifier | Greedy | Lazy |
| --- | --- | --- |
| 0 or 1 | ? | ?? |
| 0 or more | * | *? |
| 1 or more | + | +? |
| exactly n | {n} | n/a |
| n to m | {n,m} | {n,m}? |
s = "<a>link</a>"
 
re.search(r"<.+>", s).group()   # "<a>link</a>"  greedy
re.search(r"<.+?>", s).group()  # "<a>"          lazy
 
# Greedy is almost always correct for whole-token patterns.
# Lazy is useful for extracting content between delimiters.
html_tags = re.findall(r"<.*?>", "<b>bold</b> and <i>italic</i>")
# ["<b>", "</b>", "<i>", "</i>"]

Avoid nested quantifiers like (a+)+; they can cause catastrophic backtracking on mismatched input.
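The usual fix for a nested quantifier is to flatten it so each input position can match only one way. A sketch of the rewrite (the exact slowdown of the dangerous form depends on input length and machine, so only the safe variant is run here):

```python
import re

# (a+)+b on a long run of "a" with no trailing "b" forces the engine to
# try every way of splitting the run between the inner and outer +,
# which is exponential. The flattened equivalent a+b fails in linear time.
SAFE = re.compile(r"a+b")

assert SAFE.fullmatch("aaab")
assert SAFE.fullmatch("a" * 30) is None  # quick failure, no blowup

# Python 3.11+ also adds possessive quantifiers (e.g. a++) and atomic
# groups (?>...), which never give back characters once matched.
```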

Common patterns

Pre-compiled patterns worth keeping in a project utilities module.

  • UUID v4: [0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12} (lowercase hex).
  • ISO date: \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (validates ranges; not calendar truth).
  • Slug: [a-z0-9]+(?:-[a-z0-9]+)* (URL-safe identifier).
  • Python identifier: [A-Za-z_]\w* (matches valid variable names).
  • IPv4: ((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d) (bounded octets).
  • Semver: \d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)? (simplified; no full spec validation).
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
 
def is_slug(s: str) -> bool:
    return bool(SLUG.fullmatch(s))
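The IPv4 pattern above works the same way once anchored for whole-string validation; is_ipv4 is just an illustrative wrapper name:

```python
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$"
)

def is_ipv4(s: str) -> bool:
    return bool(IPV4.match(s))

assert is_ipv4("192.168.0.1")
assert not is_ipv4("256.1.1.1")  # octet out of range
assert not is_ipv4("1.2.3")      # too few octets
```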

Named groups and back-references

Use named groups for readable extraction; use back-references to match repeated content.

# Named groups
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-05-14")
m.group("year")   # "2026"
m.groupdict()     # {"year": "2026", "month": "05", "day": "14"}
 
# Back-reference: match doubled words
re.search(r"\b(\w+)\s+\1\b", "the the problem").group()  # "the the"
 
# sub with named groups in replacement
re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>", "Smith, John")
# "John Smith"

Common gotchas

  • re.match() anchors to the start but not the end. Use re.fullmatch() for complete-string validation.
  • \d matches non-ASCII digits by default in Python 3. Use [0-9] or add re.ASCII when you mean decimal digits only.
  • re.findall() returns a list of whole-match strings when the pattern has no groups, a list of the group’s strings when there is exactly one group, and a list of tuples when there are two or more groups. This shape change surprises people; use re.finditer() and call .group() explicitly.
  • Passing a bytes pattern to a str input (or vice versa) raises TypeError. Keep types consistent.
  • re.split(r"(\s+)", s) includes the matched separator in the output list because the group is capturing. Use (?:\s+) if you want separators dropped.
  • re.escape() is required before inserting user-controlled text into a pattern. Skipping it lets users inject pattern syntax, which can break matching or open a ReDoS vector.
  • The module-level functions (re.search, re.sub) compile and cache the pattern on each call. For tight loops, compile explicitly and call methods on the Pattern object.
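The findall shape change described above, made concrete:

```python
import re

s = "2026-05-14 2027-01-02"

# No groups: list of whole-match strings
assert re.findall(r"\d{4}-\d{2}-\d{2}", s) == ["2026-05-14", "2027-01-02"]

# One group: list of that group's strings, not the whole matches
assert re.findall(r"(\d{4})-\d{2}-\d{2}", s) == ["2026", "2027"]

# Two or more groups: list of tuples
assert re.findall(r"(\d{4})-(\d{2})-\d{2}", s) == [("2026", "05"), ("2027", "01")]

# finditer sidesteps the shape change: it always yields Match objects
assert [m.group() for m in re.finditer(r"(\d{4})-\d{2}-\d{2}", s)] \
    == ["2026-05-14", "2027-01-02"]
```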