Open
Description
diffWords throws an error in replaceSuffix when both of the following conditions are met:
- The intlSegmenter option is provided
- Input text contains an orphaned combining mark (e.g. U+0300) preceded by whitespace
Minimal reproduction
import { diffWords } from 'diff';
const segmenter = new Intl.Segmenter(undefined, { granularity: 'word' });
diffWords(
'* BHG, N 2029; \r\n* Λόγος εἰς τὸν ... \u0300Α next words',
'* BHG, N 2029; \r\n* Λόγος εἰς τὸν ... \u0300Α changed text',
{ intlSegmenter: segmenter }
);
// Error: string "* BHG, N 2029; \r\n* Λόγος εἰς τὸν ... ̀"
// doesn't end with suffix " "; this is a bug
Even simpler:
diffWords(
'abc \u0300X def',
'abc \u0300Y ghi',
{ intlSegmenter: segmenter }
);
// Error: string "abc ̀" doesn't end with suffix " "; this is a bug
Without intlSegmenter, both examples work fine.
Root cause
Intl.Segmenter treats " \u0300" (space + combining grave accent) as a single non-word segment:
const segments = [...segmenter.segment('abc \u0300X def')];
// "abc" -> isWordLike: true
// " \u0300" -> isWordLike: false (space + combining mark merged)
// "X" -> isWordLike: true
// " " -> isWordLike: false
// "def" -> isWordLike: true
This causes dedupeWhitespaceInChangeObjects in postProcess to fail: it calls replaceSuffix(startKeep.value, newWsPrefix, commonWsPrefix) expecting startKeep.value to end with a space, but the space was merged with the combining mark into a single token, so the keep-chunk ends with " \u0300" instead of " ".
Without intlSegmenter, the regex-based word splitter treats space and combining mark separately, so the bug doesn't occur.
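To check whether a given input is likely to hit this, one option is to scan the segmenter output for non-word segments that contain whitespace followed by a combining mark. The sketch below is only a diagnostic aid: findMergedWhitespaceMarks is a hypothetical helper name, not part of the diff package, and it assumes a runtime with Intl.Segmenter and Unicode property escapes (\p{M}).

```javascript
// Hypothetical diagnostic helper (not part of the diff package).
// Returns the non-word segments in which Intl.Segmenter has merged
// whitespace with one or more following combining marks.
function findMergedWhitespaceMarks(text, segmenter) {
  const merged = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // \s followed by characters in the Unicode "Mark" general category.
    if (!isWordLike && /\s\p{M}+/u.test(segment)) {
      merged.push({ index, segment });
    }
  }
  return merged;
}

const segmenter = new Intl.Segmenter(undefined, { granularity: 'word' });
const hits = findMergedWhitespaceMarks('abc \u0300X def', segmenter);

// Print each hit with its code points so the invisible mark is visible.
for (const { index, segment } of hits) {
  const codePoints = [...segment].map(
    ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
  console.log(index, codePoints.join(' '));
}
```

A non-empty result means the input contains at least one token of the shape described above, so it is a candidate for the crash when passed to diffWords with intlSegmenter.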
Versions
- diff: 7.0.0, 8.0.0–8.0.3 (all affected)
- Node.js: v22.19.0
- Also reproduced in Chrome 133
Workaround
Strip orphaned combining marks (combining mark preceded by whitespace) before diffing:
const clean = text => text.replace(/(\s)[\u0300-\u036F\u0483-\u0489]+/g, '$1');
diffWords(clean(oldStr), clean(newStr), { intlSegmenter: segmenter });
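The character class in the workaround covers only U+0300–U+036F and U+0483–U+0489. A possibly more general variant, sketched here under the assumption that the runtime supports Unicode property escapes (ES2018 and later), matches any character in the Mark category. The name stripOrphanedMarks is hypothetical, and whether every such mark triggers the same failure has not been verified here.

```javascript
// Hypothetical variant of the workaround: remove any run of Unicode Mark
// characters (\p{M}) that directly follows a whitespace character,
// keeping the whitespace itself.
function stripOrphanedMarks(text) {
  return text.replace(/(\s)\p{M}+/gu, '$1');
}

// Marks attached to a base letter are left untouched; only marks that
// follow whitespace are dropped.
console.log(JSON.stringify(stripOrphanedMarks('abc \u0300X def')));
console.log(JSON.stringify(stripOrphanedMarks('cafe\u0301 ok')));
```

Like the original workaround, this changes the input text, so the resulting diff describes the cleaned strings rather than the originals.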