std.regex: multi-digit backreference silently drops the overshooting digit #11035

Description

@kubo39

When a multi-digit backreference like \12 is written but fewer than 12 capturing groups exist, std.regex silently drops the trailing digit: it is treated as neither part of the backreference, nor a literal, nor an octal escape, and no error is raised.

import std.regex, std.stdio;

void main()
{
    // only one group exists, so `\12` cannot be group 12
    writeln(matchFirst("aa2", regex(`(a)\12`)).hit);   // prints "aa", not "aa2"
}

(a)\12 is reduced to \1 and the 2 disappears, so the pattern also matches "aa" with no trailing 2. With 12 or more groups, \12 correctly refers to group 12; only this fallback path is broken.

I think the cause is, in std/regex/internal/parser.d:

//perl's disambiguation rule i.e.
//get next digit only if there is such group number
popFront();
while (nref < maxBackref && !empty && std.ascii.isDigit(front))
{
    nref = nref * 10 + front - '0';
    popFront();                 // the extra digit is consumed here
}
if (nref >= maxBackref)
    nref /= 10;                 // number reverted, but cursor already past the digit

The loop forms nref = 12 and consumes the 2; when group 12 doesn't exist, nref /= 10 reverts the number, but the cursor has already moved past the digit, so it is lost. This already contradicts the comment's own rule — "get next digit only if there is such group number".
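One possible way to honour the comment's rule is to peek at the next digit and consume it only if the extended number is still a valid group. The following is a minimal standalone sketch of that idea, not a patch against parser.d: `parseBackref` and its `rest` parameter are hypothetical names, and `maxBackref` is assumed to be an exclusive upper bound, as the `nref >= maxBackref` check in the snippet above suggests.

```d
import std.ascii : isDigit;

/// Hypothetical helper, not part of std.regex.
/// Expects `rest` to start with the first digit after the backslash.
/// Returns the group number and advances `rest` past only the digits
/// that were actually used for it.
uint parseBackref(ref string rest, uint maxBackref)
{
    assert(rest.length && isDigit(rest[0]));

    // The first digit is always taken; validating it against
    // maxBackref is left to the caller.
    uint nref = rest[0] - '0';
    rest = rest[1 .. $];

    while (rest.length && isDigit(rest[0]))
    {
        // Peek: compute the extended number without consuming the digit.
        immutable candidate = nref * 10 + (rest[0] - '0');
        if (candidate >= maxBackref)
            break;              // no such group: leave the digit in the input
        nref = candidate;
        rest = rest[1 .. $];    // consume only after the check passes
    }
    return nref;
}
```

Under these assumptions, calling it on "12" with `maxBackref` equal to 2 returns 1 and leaves "2" in `rest`, so the digit is still available to be parsed as whatever the fallback is decided to be. With `maxBackref` equal to 13 it returns 12 and consumes both digits.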

The rule is also labelled "perl's disambiguation rule", but that isn't how Perl resolves \12: in Perl, an out-of-range multi-digit reference is an octal escape.
