Custom Regex
Operator-supplied regex patterns with named placeholders for project-specific identifiers
Custom Regex runs operator-supplied regex patterns against the input. It runs in the preflight stage so its matches are masked before LLM-stage checks run, and so its placeholders merge with PII / Secret Keys / URL placeholders in the parent's longest-first mask pass.
This is the right component when no built-in detector covers your domain — internal ticket IDs, proprietary order numbers, customer-account formats, study identifiers, etc.
Properties
| Property | Description |
|---|---|
| Patterns | Array of named regex entries. Each entry has a Name (used as the placeholder [name] in sanitize mode and as the violation identifier in check mode) and a Regex (a Java regular expression). At least one entry is required |
Regex Syntax
Each entry's Regex field accepts two forms:
- Plain Java regex:
ORD-\d{4}— matched with default flags (case-sensitive). - JavaScript-style literal:
/pattern/flags— supportsi(case-insensitive),m(multiline),s(dotall), andu(Unicode). Example:/ssn-\d{4}/i.
Both forms are validated up front:
- Maximum expression length so a workflow author can't paste a multi-megabyte pathological pattern.
- Per-match character budget so catastrophic-backtracking inputs abort with an execution-budget error rather than hanging the worker thread.
A bad pattern is treated as a configuration error and the request is blocked — a broken regex is an operator bug, not a transient outage, and silently allowing requests through a dead detector is worse than blocking until someone fixes it.
Two Variants
- Custom Regex (check) — flags every match. The violation diagnostic info carries the pattern names so you can see which entries fired (useful when several entries are configured).
- Custom Regex (sanitize) — replaces each match with
[name](the entry's own name in square brackets). Stable, operator-controlled placeholder.
The sanitize variant uses [name] rather than the shared <TYPE> convention used by PII / Secret Keys / URLs because the operator owns the naming. Name your entries in upper-case (CUSTOMER_ID) if you want LLM-stage downstream checks to see entries as the same shape as built-in masks — the classifier doesn't care about the brackets vs. angle brackets.
Examples
One internal-ID pattern, with a JS-style literal for case insensitivity:
{
"type": "guardrails/v1/customRegexCheck",
"parameters": {
"patterns": [
{ "name": "ORDER_ID", "regex": "/ORD-\\d{4,}/i" }
]
}
}Multiple named patterns running together, mask in place:
{
"type": "guardrails/v1/customRegexSanitize",
"parameters": {
"patterns": [
{ "name": "internal-id", "regex": "MY-INTERNAL-\\d{4}" },
{ "name": "ticket", "regex": "TCK-\\d{6}" },
{ "name": "study-id", "regex": "STUDY-[A-Z]{2}-\\d{3}" }
]
}
}A workflow run with "ref MY-INTERNAL-1234 for TCK-987654" produces "ref [internal-id] for [ticket]".
Edge Cases
- Replacement-string metacharacters: a pattern name containing
$or\would normally be interpreted as a regex back-reference. The sanitize variant quotes them safely so a name like$1\0renders literally as[$1\0]. - Multiple-entry budget failures: each entry runs in its own try/catch. If two entries are pathological, both surface in the aggregated exception (as suppressed causes) so you can fix all bad patterns at once instead of finding them one at a time across runs.
How is this guide?
Last updated on