fix: surrogate pair (emoji) handling for a, r, s, ~, ga commands#9929
Open
k1832 wants to merge 1 commit into VSCodeVim:master
`position.getRight()` increments by 1 UTF-16 code unit, but emojis outside the Basic Multilingual Plane are encoded as 2 code units (a surrogate pair). Moving by 1 code unit lands between the pair, and VSCode's `validatePosition` clamps it back to the start, so the cursor effectively goes backward.

Add `getSurrogateAwareRight()`/`getSurrogateAwareLeft()` helpers on the Position prototype that skip past surrogate pairs, and use them in the affected commands:

- `a` (Append): cursor now lands after the emoji
- `r` (ReplaceCharacter): replaces the full emoji, not half
- `s` (ChangeOperator): deletes the full emoji before entering insert
- `~` (ToggleCase): advances past the emoji without corrupting it
- `ga` (UnicodeInfo): shows full codepoint (e.g. U+1F604) via `codePointAt()` instead of the half-surrogate from `charCodeAt()`

Commands already protected (no changes needed): `x`/`X` (DeleteOperator), `l`/`h` (MoveRight/MoveLeft), `y` (YankOperator).
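To make the stepping rule above concrete, here is a minimal standalone sketch working on plain string offsets. The function names and signatures are hypothetical and chosen for illustration only; they are not the PR's actual `getSurrogateAwareRight()`/`getSurrogateAwareLeft()` implementations, which live on the Position prototype.

```typescript
// Illustrative sketch only: hypothetical helpers on string offsets,
// not the Position-prototype methods added by the PR.

/** True if `code` is a UTF-16 high (leading) surrogate. */
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

/** True if `code` is a UTF-16 low (trailing) surrogate. */
function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

/** Offset one character to the right, stepping over a whole surrogate pair. */
function surrogateAwareRight(text: string, offset: number): number {
  if (offset >= text.length) return text.length;
  const onPair =
    offset + 1 < text.length &&
    isHighSurrogate(text.charCodeAt(offset)) &&
    isLowSurrogate(text.charCodeAt(offset + 1));
  return offset + (onPair ? 2 : 1);
}

/** Offset one character to the left, stepping over a whole surrogate pair. */
function surrogateAwareLeft(text: string, offset: number): number {
  if (offset <= 0) return 0;
  const afterPair =
    offset >= 2 &&
    isLowSurrogate(text.charCodeAt(offset - 1)) &&
    isHighSurrogate(text.charCodeAt(offset - 2));
  return offset - (afterPair ? 2 : 1);
}

// "a", U+1F604 (written as its surrogate pair), "b": 4 UTF-16 code units.
const sample = "a\uD83D\uDE04b";
console.log(surrogateAwareRight(sample, 1)); // 3: lands after the emoji, not inside it
console.log(surrogateAwareLeft(sample, 3)); // 1: lands on the start of the emoji
```

A naive `offset + 1` from offset 1 would give 2, which is the position between the two halves of the pair described above.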
This was referenced Feb 10, 2026
What this PR does / why we need it:

When the cursor is on a character encoded as a UTF-16 surrogate pair, such as emojis (😄), rare CJK characters (𩸽), mathematical symbols (𝒟, 𝔸), or musical symbols (𝄞), several character-level commands break. `position.getRight()` increments by 1 UTF-16 code unit, but these characters are 2 code units. Moving by 1 lands between the pair, and VSCode's `validatePosition` clamps it back to the start, so the cursor effectively goes backward.

Affected commands:

- `a` (Append)
- `r` (Replace)
- `s` (Change char)
- `~` (Toggle case)
- `ga` (Unicode info)

Previous fixes addressed insert mode (PR #7977 / #6046), motions (`l`/`h`), and operators (`x`/`X`/`y`), but these 5 commands were missed. This PR completes the surrogate pair handling for character-level commands and adds 12 regression tests to prevent future breakage.

Adds `getSurrogateAwareRight()`/`getSurrogateAwareLeft()` helpers on the Position prototype that skip past surrogate pairs, following the same pattern already used by `MoveRight` (`l`) / `MoveLeft` (`h`). Uses these helpers in the 5 affected commands. Also switches `ga` from `charCodeAt()` to `codePointAt()` to report the full Unicode codepoint.

Which issue(s) this PR fixes

Fixes #9931

Partially addresses #8321: this PR fixes `r` and other character-level commands reported there. Remaining issues not addressed: easymotion uses raw character arithmetic for match positioning and marker decorations; `xp` (transpose) on surrogate pairs is also still broken (TODO added in `put.ts`).

Special notes for your reviewer:
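For anyone who wants to check the `ga` change by hand, the following is a minimal standalone sketch of the difference between `charCodeAt()` and `codePointAt()` on a surrogate pair. The helper `formatCodePoint` is hypothetical and exists only for this illustration; it is not the PR's `ga` implementation, and the `U+XXXX` formatting is an assumption about the display.

```typescript
// Illustrative sketch only: `formatCodePoint` is a hypothetical helper,
// not code from the PR.

/** Format a numeric code as "U+XXXX", zero-padded to at least 4 hex digits. */
function formatCodePoint(code: number): string {
  let hex = code.toString(16).toUpperCase();
  while (hex.length < 4) {
    hex = "0" + hex;
  }
  return "U+" + hex;
}

// U+1F604 written as its UTF-16 surrogate pair.
const smile = "\uD83D\uDE04";

// charCodeAt() reads a single code unit, so it sees only the high surrogate.
console.log(formatCodePoint(smile.charCodeAt(0))); // U+D83D

// codePointAt() combines the pair into the full code point.
const full = smile.codePointAt(0);
if (full !== undefined) {
  console.log(formatCodePoint(full)); // U+1F604
}
```

For characters inside the Basic Multilingual Plane the two calls return the same value, so switching to `codePointAt()` should only change what is reported for surrogate pairs.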