Why your regex is slow: catastrophic backtracking explained
The reason `(a+)+b` hangs on `aaaaaaaaac` — and the five patterns that turn a 10ms validator into a 30-second ReDoS attack.
# The 30-second example
// Regex: (a+)+b
// String: "aaaaaaaaaaaaaaaaaaaaaaaaac"
// Time to fail: 30 seconds on an M1.
Yes, a 26-character string. Let's see why.
# What a backtracking engine does
When a regex like (a+)+ meets "aaaa", it has to decide how to split that string across the two quantifiers. Options:
- Outer
(a+)+matches everything once:(aaaa) - Or twice:
(aaa)(a),(aa)(aa),(a)(aaa) - Or three times, four times...
Each split is a potential match. The engine tries the greediest first, then backtracks.
For n a's, the number of splits is 2^(n-1). At 20 characters, that's 524,288 combinations. At 30, over a billion.
If the overall regex then fails at the end (the b doesn't match), the engine tries every last split before giving up. That's when your CPU catches fire.
# The fingerprint of the problem
You have catastrophic backtracking when:
1. Two nested quantifiers can match the same character — (a+)+, (.+)+, (a|aa)+
2. Greedy + failure — the regex eventually fails, forcing the engine to try all combinations
The second part matters. (a+)+b on "aaaaab" finishes fast because it matches. The pain comes when the match fails.
# Five patterns that are bombs
(a+)+b // nested + quantifier
(a*)*b // nested * quantifier
(a|a)*b // alternation with overlapping branches
(a|aa|aaa)+b // alternation that can match one char multiple ways
(.*a){11} // bounded quantifier around an unbounded one
All of these are O(2^n) in pathological cases.
# Real examples
# Email validation (yes, really)
^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}...
An "email regex" found in production. The (...)* can match every way an email domain can be split into dots. Attacker submits a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a@ and the validator hangs.
# Cloudflare's 2019 outage
.*(?:.*=.*)
Two .* patterns with overlapping matches. When the input had many = characters, matching exploded. Caused a 27-minute global outage in July 2019.
# Stack Overflow's 2016 outage
A single regex applied to a single long post brought down stackoverflow.com for 34 minutes. Pattern:
^[\s\u200c]+|[\s\u200c]+$
Alternating spaces and zero-width non-joiners defeated the trimmer.
# How to fix it
# 1. Use possessive quantifiers (when supported)
(a+)+b // vulnerable
(a++)+b // safe: no backtracking inside (a++)
Possessive ++ means "match as many as possible and never give any back." Supported in PCRE, Java, Ruby. Not in JavaScript or Python re.
# 2. Use atomic groups
(?>a+)+b // atomic — once matched, no backtracking allowed
Also PCRE/Java/Ruby. JavaScript added them in ES2018, so modern Node works.
# 3. Rewrite to eliminate ambiguity
(a+)+b // bad
a+b // same intent, no ambiguity
Nine times out of ten the nested quantifier isn't actually needed.
# 4. Anchor tightly
^.*foo.*$ // can backtrack wildly
^[^f]*foo.*$ // can't
The second form fails fast when there's no 'f' in the string.
# 5. Use a non-backtracking engine
Go's regexp, Rust's regex crate, and Google's RE2 all use the Thompson NFA algorithm. Guaranteed linear time. They reject some features (backreferences, lookbehinds) in exchange.
// Go: safe by construction
re := regexp.MustCompile("(a+)+b")
re.MatchString("aaaaaaaaaaaac")
// Linear time, always
# Detection tools
- regex101.com shows step counts for a test input — a useful canary.
- recheck (npm) statically analyses a regex for ReDoS vulnerability.
- rxxr2 is a research tool that finds attack strings given a vulnerable pattern.
Any regex that touches user input should pass one of these before shipping.
# The rules
1. Never use a regex with nested quantifiers on user input.
2. For user-facing validation, prefer simple character-class-based patterns over alternation.
3. If you need complex matching, use a non-backtracking engine (Go, Rust) or write a real parser.
4. Every regex on user input needs a timeout. Node's /regex/.exec has no timeout — you have to Promise.race it.
<div class="callout callout-tip" role="note"><div class="callout-title">Tip</div><div class="callout-body"><p>When in doubt, test against an input of 100 characters that forces the regex to fail. If it doesn't finish in under 100ms, rewrite it.</p></div></div>
# Test your patterns
Paste any regex into our Regex Tester — live matching, flag toggles, and you can copy-paste adversarial inputs directly. It uses the browser's ECMAScript engine so if it's slow there, it's slow in your app.
# Related reading
Frequently asked questions
›Is this a real security issue?
Yes. ReDoS (regex denial of service) is CWE-1333. In 2019, a regex in Cloudflare's WAF brought down a significant fraction of the internet for 27 minutes. An unreviewed regex in user input validation is a real attack surface.
›Which engines avoid catastrophic backtracking?
Go's RE2, Rust's `regex` crate, and RE2-based engines use a linear-time algorithm — they don't backtrack. JavaScript (V8), Python's `re`, Perl, PCRE all use backtracking and are vulnerable.
›How do I test if my regex is safe?
Run it against a string with a long repeating prefix followed by a character that fails the last part. If it takes more than a few ms on 100 characters, you have a backtracking problem.