-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix incorrect regex pattern in regex_replace_posix_groups #19827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix incorrect regex pattern in regex_replace_posix_groups #19827
Conversation
The regex pattern was using (\d*) which matches zero or more digits,
causing a lone backslash to be incorrectly converted to ${}. Changed
to (\d+) which requires at least one digit, ensuring only actual POSIX
capture group references like \1, \2, etc. are converted to , .
Added unit tests to verify the correct behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in the regex_replace_posix_groups function where the regex pattern (\d*) incorrectly matched zero or more digits, causing a lone backslash \ to be transformed into ${}. The fix changes the pattern to (\d+) to require at least one digit.
Changes:
- Modified the regex pattern in
regex_replace_posix_groupsfrom(\d*)to(\d+)to match one or more digits - Added comprehensive unit tests to validate the fix and prevent regression
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
xanderbailey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix. Thanks for the PR. This looks correct to me but I don't have approval permissions!
The regex pattern was only matching single backslash (\1) but SQL string
literals with escaped backslashes pass through as double backslash (\\1).
Changed the pattern from (\\)(\d+) to \\{1,2}(\d+) to match 1 or 2
backslashes followed by digits, ensuring proper handling of SQL escaped
backreferences like 'X\\1Y'.
Added unit tests for the double backslash case.
|
Hi @GaneshPatil7517 Thank you for contribution. Please try to use the PR templates instead of the manual descriptions, it will allow the reviewers to easily understand the context. |
ok sir ill follow from you instruction.... |
|
FYI @shivbhatia10 -- is there any chance you would like to help review this PR too? |
shivbhatia10
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you @GaneshPatil7517 !
|
Thank you so much to all for feedback, all checks are passed successfully passed Excited to see it merged... |
kosiew
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GaneshPatil7517 for contributing.
| fn regex_replace_posix_groups(replacement: &str) -> String { | ||
| static CAPTURE_GROUPS_RE_LOCK: LazyLock<Regex> = | ||
| LazyLock::new(|| Regex::new(r"(\\)(\d*)").unwrap()); | ||
| LazyLock::new(|| Regex::new(r"\\{1,2}(\d+)").unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also matches \0, which will be rewritten as ${0}. In Rust’s regex replacement syntax, ${0} substitutes the entire match, not a numbered capture.
Is \0 is meant to be a valid capture reference or treated literally?
Let's add a test to explicitly document the behaviour of \0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback Great catch on the \0 behavior.
I've added documentation and a test to explicitly document this behavior:
\0 is converted to ${0}, which in Rust's regex replacement syntax substitutes the entire match
This is consistent with POSIX behavior where \0 (or &) refers to the entire matched string
Added test cases:
// Test \0 behavior: \0 is converted to ${0}, which in Rust's regex
// replacement syntax substitutes the entire match. This is consistent
// with POSIX behavior where \0 (or &) refers to the entire matched string.
assert_eq!(regex_replace_posix_groups(r"\0"), "${0}");
assert_eq!(regex_replace_posix_groups(r"prefix\0suffix"), "prefix${0}suffix");
So \0 is intentionally treated as a valid capture reference (for the entire match), not literally. This aligns with standard regex behavior in most systems.
The
regex_replace_posix_groupsmethod was using the pattern(\d*)to matchPOSIX capture group references like
\1. However,*matches zero or moredigits, which caused a lone backslash
\to incorrectly become${}.Changed to
(\d+)which requires at least one digit, fixing the issue.Added unit tests to validate correct behavior.