Overview
Regex flags modify how an engine interprets the pattern and the input. The same flag letter usually means the same thing across engines, but the API differs: Python uses re.IGNORECASE or the inline (?i) flag; JavaScript uses a trailing /gi literal. This card lists flags, anchors, quantifiers, and zero-width assertions. For full pattern vocabulary see regex-patterns.
Flags by language
| Flag letter | Python constant | JS | Effect |
|---|---|---|---|
i | re.IGNORECASE | /i | Case-insensitive match |
m | re.MULTILINE | /m | ^ and $ match start/end of each line, not the string |
s | re.DOTALL | /s | . matches newline characters |
x | re.VERBOSE | N/A | Ignore unescaped whitespace and # comments in pattern |
g | N/A (findall) | /g | Global: find all matches, not just the first |
u | Default in Python 3 | /u | Unicode mode; required for \p{} in JS |
A | re.ASCII | N/A | \w, \d, \s match ASCII only (Python default is Unicode) |
L | re.LOCALE | N/A | Locale-aware \w etc.; deprecated in Python 3 |
Combine flags: re.compile(r"pattern", re.I | re.M) in Python; /pattern/gim in JS.
Inline flags in the pattern string apply from that point: (?i)foo matches foo, FOO, Foo. Inline flags are useful when the regex is passed as a string and you cannot set the flags argument.
Anchors
Anchors are zero-width: they match a position, not a character.
| Anchor | Matches | re.M effect (Python/JS) |
|---|---|---|
^ | Start of string | Start of each line |
$ | End of string (before optional final \n) | End of each line |
\A | Absolute start of string | Not affected by M |
\Z | Absolute end of string | Not affected by M |
\b | Word boundary (between \w and \W) | Not affected by M |
\B | Non-word boundary | Not affected by M |
Use \A and \Z when validating whole strings and you cannot guarantee the re.MULTILINE flag is absent.
Quantifiers
| Quantifier | Meaning | Example | Matches |
|---|---|---|---|
? | 0 or 1 (greedy) | colou?r | color, colour |
* | 0 or more (greedy) | a*b | b, ab, aab |
+ | 1 or more (greedy) | a+b | ab, aab |
{n} | Exactly n | \d{4} | exactly 4 digits |
{n,} | n or more | \d{2,} | 2 or more digits |
{n,m} | n to m (greedy) | \d{2,4} | 2, 3, or 4 digits |
?? | 0 or 1 (lazy) | <.+?> | shortest match |
*? | 0 or more (lazy) | ".*?" | shortest quoted string |
+? | 1 or more (lazy) | <.+?> | non-greedy HTML tag |
{n,m}? | Lazy bounded | \d{2,4}? | prefer 2, stop there |
+ + (?=...) | Possessive (Python 3.11+ atomic) | — | no backtrack into group |
Greedy quantifiers grab as much as possible and backtrack; lazy quantifiers grab as little as possible and expand. Lazy matching is not always faster; it still backtracks.
Lookahead and lookbehind
Zero-width assertions match without consuming characters.
| Syntax | Name | Matches if |
|---|---|---|
(?=...) | Lookahead | Pattern ahead matches |
(?!...) | Negative lookahead | Pattern ahead does not match |
(?<=...) | Lookbehind | Pattern behind matches |
(?<!...) | Negative lookbehind | Pattern behind does not match |
Examples:
| Expression | Matches |
|---|---|
\d+(?= dollars) | Digits followed by ” dollars” (digits only, not the word) |
\d+(?! dollars) | Digits NOT followed by ” dollars” |
(?<=\$)\d+ | Digits preceded by $ (digits only) |
(?<!\$)\d+ | Digits NOT preceded by $ |
(?<=\bpre)\w+ | Word part after “pre” prefix |
Python lookbehind must be fixed-width. (?<=ab+) raises an error; (?<=ab) works.
Named groups and backreferences
| Syntax | Python | JS | Meaning |
|---|---|---|---|
| Named capture | (?P<name>...) | (?<name>...) | Named group |
| Named backref | (?P=name) | \k<name> | Reference named group in pattern |
| Non-capturing | (?:...) | (?:...) | Group without capture |
| Alternation | cat|dog | cat|dog | Either alternative |
Access named groups: m.group("name") in Python; m.groups.name in JS with /d flag or named group syntax.
Common gotchas
re.MULTILINEchanges^/$but not.; you still needre.DOTALLfor.to match newlines.- In Python, raw strings
r"..."prevent double-escaping."\n"is a newline;r"\n"is backslash-n for the regex engine. - JavaScript’s global
/gflag maintains alastIndexstate on the regex object. Reusing the same compiled regex in a loop without resettinglastIndexproduces incorrect results. \bis a word boundary between\wand\W. In Unicode mode,\wincludes non-ASCII letters, so\bpositions shift compared to ASCII mode.- Lookbehind in Python must be fixed-width. Use
re2or restructure the pattern if you need variable-length lookbehind. - Catastrophic backtracking happens with nested quantifiers like
(a+)+. Test ambiguous patterns on adversarial input before deploying in a server context.