Skip to content
All posts

Why your regex is slow: catastrophic backtracking explained

The reason `(a+)+b` hangs on `aaaaaaaaac` — and the five patterns that turn a 10ms validator into a 30-second ReDoS attack.

DDDev DeskDeveloper Tools EditorPublished April 27, 20262 min readadvanced

# The 30-second example


// Regex: (a+)+b
// String: "aaaaaaaaaaaaaaaaaaaaaaaaac"
// Time to fail: 30 seconds on an M1.

Yes, a 26-character string. Let's see why.

# What a backtracking engine does

When a regex like (a+)+ meets "aaaa", it has to decide how to split that string across the two quantifiers. Options:

  • Outer (a+)+ matches everything once: (aaaa)
  • Or twice: (aaa)(a), (aa)(aa), (a)(aaa)
  • Or three times, four times...

Each split is a potential match. The engine tries the greediest first, then backtracks.

For n a's, the number of splits is 2^(n-1). At 20 characters, that's 524,288 combinations. At 30, over a billion.

If the overall regex then fails at the end (the b doesn't match), the engine tries every last split before giving up. That's when your CPU catches fire.

# The fingerprint of the problem

You have catastrophic backtracking when:

1. Two nested quantifiers can match the same character — (a+)+, (.+)+, (a|aa)+

2. Greedy + failure — the regex eventually fails, forcing the engine to try all combinations

The second part matters. (a+)+b on "aaaaab" finishes fast because it matches. The pain comes when the match fails.

# Five patterns that are bombs


(a+)+b            // nested + quantifier
(a*)*b            // nested * quantifier
(a|a)*b           // alternation with overlapping branches
(a|aa|aaa)+b      // alternation that can match one char multiple ways
(.*a){11}         // bounded quantifier around an unbounded one

All of these are O(2^n) in pathological cases.

# Real examples

# Email validation (yes, really)


^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}...

An "email regex" found in production. The (...)* can match every way an email domain can be split into dots. Attacker submits a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a@ and the validator hangs.

# Cloudflare's 2019 outage


.*(?:.*=.*)

Two .* patterns with overlapping matches. When the input had many = characters, matching exploded. Caused a 27-minute global outage in July 2019.

# Stack Overflow's 2016 outage

A single regex applied to a single long post brought down stackoverflow.com for 34 minutes. Pattern:


^[\s\u200c]+|[\s\u200c]+$

Alternating spaces and zero-width non-joiners defeated the trimmer.

# How to fix it

# 1. Use possessive quantifiers (when supported)


(a+)+b        // vulnerable
(a++)+b       // safe: no backtracking inside (a++)

Possessive ++ means "match as many as possible and never give any back." Supported in PCRE, Java, Ruby. Not in JavaScript or Python re.

# 2. Use atomic groups


(?>a+)+b      // atomic — once matched, no backtracking allowed

Also PCRE/Java/Ruby. JavaScript added them in ES2018, so modern Node works.

# 3. Rewrite to eliminate ambiguity


(a+)+b        // bad
a+b           // same intent, no ambiguity

Nine times out of ten the nested quantifier isn't actually needed.

# 4. Anchor tightly


^.*foo.*$     // can backtrack wildly
^[^f]*foo.*$  // can't

The second form fails fast when there's no 'f' in the string.

# 5. Use a non-backtracking engine

Go's regexp, Rust's regex crate, and Google's RE2 all use the Thompson NFA algorithm. Guaranteed linear time. They reject some features (backreferences, lookbehinds) in exchange.


// Go: safe by construction
re := regexp.MustCompile("(a+)+b")
re.MatchString("aaaaaaaaaaaac")
// Linear time, always

# Detection tools

  • regex101.com shows step counts for a test input — a useful canary.
  • recheck (npm) statically analyses a regex for ReDoS vulnerability.
  • rxxr2 is a research tool that finds attack strings given a vulnerable pattern.

Any regex that touches user input should pass one of these before shipping.

# The rules

1. Never use a regex with nested quantifiers on user input.

2. For user-facing validation, prefer simple character-class-based patterns over alternation.

3. If you need complex matching, use a non-backtracking engine (Go, Rust) or write a real parser.

4. Every regex on user input needs a timeout. Node's /regex/.exec has no timeout — you have to Promise.race it.

<div class="callout callout-tip" role="note"><div class="callout-title">Tip</div><div class="callout-body"><p>When in doubt, test against an input of 100 characters that forces the regex to fail. If it doesn't finish in under 100ms, rewrite it.</p></div></div>

# Test your patterns

Paste any regex into our Regex Tester — live matching, flag toggles, and you can copy-paste adversarial inputs directly. It uses the browser's ECMAScript engine so if it's slow there, it's slow in your app.

Common questions

Frequently asked.

Is this a real security issue?

Yes. ReDoS (regex denial of service) is CWE-1333. In 2019, a regex in Cloudflare's WAF brought down a significant fraction of the internet for 27 minutes. An unreviewed regex in user input validation is a real attack surface.

Which engines avoid catastrophic backtracking?

Go's RE2, Rust's `regex` crate, and RE2-based engines use a linear-time algorithm — they don't backtrack. JavaScript (V8), Python's `re`, Perl, PCRE all use backtracking and are vulnerable.

How do I test if my regex is safe?

Run it against a string with a long repeating prefix followed by a character that fails the last part. If it takes more than a few ms on 100 characters, you have a backtracking problem.

Nieuwe berichten, één keer per week.

Praktische handleidingen voor ontwikkelaars. Geen spam. Uitschrijven kan op elk moment.

Tools mentioned

Pick up where the post leaves off.

Keep reading

More from the field notes.