Overview

Use a real parser when one exists; reach for regex only when the input is small, structured, and one-shot. This card lists the patterns that come up most often along with the caveats that bite. Flavors differ: PCRE (most languages), POSIX (grep, sed), and the JavaScript engine each have small but real divergences. Patterns below use PCRE/Perl syntax unless noted.

Character classes

The shorthands worth memorizing.

ClassMatches
.Any character except newline (with s flag: any).
\dDigit [0-9].
\DNon-digit.
\wWord char [A-Za-z0-9_].
\WNon-word.
\sWhitespace (space, tab, newline).
\SNon-whitespace.
[abc]Any of a, b, c.
[^abc]Any except a, b, c.
[a-z]Range a to z.
\bWord boundary (zero-width).
\BNon-word-boundary.
^ / $Start / end of string or line (with m flag).

\w does not include hyphen or dot. Spell those out: [\w.-].

Quantifiers

How many times to match the preceding element.

QuantifierMeaning
?0 or 1.
*0 or more.
+1 or more.
{n}Exactly n.
{n,}At least n.
{n,m}Between n and m.
*?, +?, ??Lazy (non-greedy) versions.

Default quantifiers are greedy. <.+> on <a><b> matches <a><b>. Lazy <.+?> matches <a>.

Grouping and lookarounds

The features that let regex compose.

ConstructEffect
(abc)Capturing group.
(?:abc)Non-capturing group.
(?<name>abc)Named capturing group.
\1, \2Backreference to capture.
(?=abc)Positive lookahead (zero-width).
(?!abc)Negative lookahead.
(?<=abc)Positive lookbehind.
(?<!abc)Negative lookbehind.
|Alternation.

Prefer non-capturing groups (?:...) when no backreference is needed. They keep the capture array clean.

Common patterns

The patterns to copy, paste, and adjust.

Email (RFC-pragmatic)

^[\w.!#$%&'*+/=?^`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$

The official RFC 5322 grammar runs hundreds of characters and matches strings real mail systems reject. Use the pragmatic form for client-side validation; send a confirmation email for the real check.

URL (http and https)

^https?://[^\s/$.?#].[^\s]*$

This catches obvious garbage. For host validation use a URL parser (URL in JS, urllib.parse in Python).

IPv4 address

^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$

(\d{1,3}\.){3}\d{1,3} will match 999.999.999.999. The octet-bounded pattern above rejects invalid ranges.

IPv6 address (basic)

^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

This rejects the :: shorthand and embedded IPv4. Use a parser (ipaddress in Python, net.ParseIP in Go) for anything user-facing.

Port number (1 to 65535)

^([1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$

\d{1,5} lets through 0 and 99999. The bounded form is necessary if the value matters.

ISO date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

This validates digit ranges but not month-length truth (Feb 30 passes). For real date validation parse with the language’s date library.

ISO datetime (RFC 3339)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?(Z|[+-]([01]\d|2[0-3]):[0-5]\d)$

This covers the form most APIs accept.

UUID v4

^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$

The 4 and [89ab] fix the version and variant bits. Drop them for any UUID version.

US ZIP and Canadian postcode

^(\d{5}(-\d{4})?|[A-CEGHJ-NPR-TVXY]\d[A-Z]\s?\d[A-Z]\d)$

International addresses are too varied for regex; do not try.

Hex color

^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$

Covers #rgb, #rgba, #rrggbb, #rrggbbaa.

Common gotchas

  • Regex is not a parser. HTML, JSON, and YAML have nesting rules regex cannot honor; use a real parser.
  • ^ and $ in PCRE without the m flag mean start and end of string, not line.
  • \d in some flavors (Python with re.UNICODE, JavaScript with u flag) matches digit characters in other scripts. Use [0-9] when you mean ASCII digits only.
  • POSIX (grep, sed) does not support lookaround. Use Perl-compatible mode (grep -P, pcregrep) when you need it.
  • Greedy quantifiers in pathological inputs can hang. Avoid nested quantifiers like (a+)+; they cause catastrophic backtracking.
  • Always anchor validation regex with ^ and $. Without anchors, \d{3} matches inside abc123def.
  • Case-sensitivity defaults differ. Pass the i flag or (?i) when needed.