Overview
Use a real parser when one exists; reach for regex only when the input is small, structured, and one-shot. This card lists the patterns that come up most often along with the caveats that bite. Flavors differ: PCRE (most languages), POSIX (grep, sed), and the JavaScript engine each have small but real divergences. Patterns below use PCRE/Perl syntax unless noted.
Character classes
The shorthands worth memorizing.
| Class | Matches |
|---|---|
. | Any character except newline (with s flag: any). |
\d | Digit [0-9]. |
\D | Non-digit. |
\w | Word char [A-Za-z0-9_]. |
\W | Non-word. |
\s | Whitespace (space, tab, newline). |
\S | Non-whitespace. |
[abc] | Any of a, b, c. |
[^abc] | Any except a, b, c. |
[a-z] | Range a to z. |
\b | Word boundary (zero-width). |
\B | Non-word-boundary. |
^ / $ | Start / end of string or line (with m flag). |
\w does not include hyphen or dot. Spell those out: [\w.-].
Quantifiers
How many times to match the preceding element.
| Quantifier | Meaning |
|---|---|
? | 0 or 1. |
* | 0 or more. |
+ | 1 or more. |
{n} | Exactly n. |
{n,} | At least n. |
{n,m} | Between n and m. |
*?, +?, ?? | Lazy (non-greedy) versions. |
Default quantifiers are greedy. <.+> on <a><b> matches <a><b>. Lazy <.+?> matches <a>.
Grouping and lookarounds
The features that let regex compose.
| Construct | Effect |
|---|---|
(abc) | Capturing group. |
(?:abc) | Non-capturing group. |
(?<name>abc) | Named capturing group. |
\1, \2 | Backreference to capture. |
(?=abc) | Positive lookahead (zero-width). |
(?!abc) | Negative lookahead. |
(?<=abc) | Positive lookbehind. |
(?<!abc) | Negative lookbehind. |
| | Alternation. |
Prefer non-capturing groups (?:...) when no backreference is needed. They keep the capture array clean.
Common patterns
The patterns to copy, paste, and adjust.
Email (RFC-pragmatic)
^[\w.!#$%&'*+/=?^`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$
The official RFC 5322 grammar runs hundreds of characters and matches strings real mail systems reject. Use the pragmatic form for client-side validation; send a confirmation email for the real check.
URL (http and https)
^https?://[^\s/$.?#].[^\s]*$
This catches obvious garbage. For host validation use a URL parser (URL in JS, urllib.parse in Python).
IPv4 address
^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$
(\d{1,3}\.){3}\d{1,3} will match 999.999.999.999. The octet-bounded pattern above rejects invalid ranges.
IPv6 address (basic)
^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
This rejects the :: shorthand and embedded IPv4. Use a parser (ipaddress in Python, net.ParseIP in Go) for anything user-facing.
Port number (1 to 65535)
^([1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$
\d{1,5} lets through 0 and 99999. The bounded form is necessary if the value matters.
ISO date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
This validates digit ranges but not month-length truth (Feb 30 passes). For real date validation parse with the language’s date library.
ISO datetime (RFC 3339)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?(Z|[+-]([01]\d|2[0-3]):[0-5]\d)$
This covers the form most APIs accept.
UUID v4
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
The 4 and [89ab] fix the version and variant bits. Drop them for any UUID version.
US ZIP and Canadian postcode
^(\d{5}(-\d{4})?|[A-CEGHJ-NPR-TVXY]\d[A-Z]\s?\d[A-Z]\d)$
International addresses are too varied for regex; do not try.
Hex color
^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$
Covers #rgb, #rgba, #rrggbb, #rrggbbaa.
Common gotchas
- Regex is not a parser. HTML, JSON, and YAML have nesting rules regex cannot honor; use a real parser.
^and$in PCRE without themflag mean start and end of string, not line.\din some flavors (Python withre.UNICODE, JavaScript withuflag) matches digit characters in other scripts. Use[0-9]when you mean ASCII digits only.- POSIX (grep, sed) does not support lookaround. Use Perl-compatible mode (
grep -P,pcregrep) when you need it. - Greedy quantifiers in pathological inputs can hang. Avoid nested quantifiers like
(a+)+; they cause catastrophic backtracking. - Always anchor validation regex with
^and$. Without anchors,\d{3}matches insideabc123def. - Case-sensitivity defaults differ. Pass the
iflag or(?i)when needed.