fix(llm): sanitize control characters in function call JSON arguments #4196

ArpitKotecha · 2025-12-08T12:27:15Z

Description

Problem

LLMs sometimes generate function call JSON with literal control characters (e.g., newlines, tabs) inside string values. For example:

{"prompt": "A timeline showing:
- Event 1
- Event 2"}

The literal newline violates the JSON spec, causing pydantic_core.from_json() to fail with:

ValueError: control character (\u0000-\u001F) found while parsing a string

This breaks function tool execution when the LLM outputs multi-line content in tool arguments.
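The failure can be reproduced without any LLM in the loop. The sketch below uses the standard library's `json.loads` as a stand-in for `pydantic_core.from_json()` (an assumption made to keep the example dependency-free; the exact error text differs between the two parsers), and the sample payload mirrors the one above.

```python
import json

# The \n escapes in this Python literal become real newline characters,
# so the JSON string value contains raw control characters.
raw = '{"prompt": "A timeline showing:\n- Event 1\n- Event 2"}'

try:
    json.loads(raw)
    parsed_ok = True
    error_message = ""
except json.JSONDecodeError as exc:
    parsed_ok = False
    error_message = str(exc)

# Escaping the newlines makes the same payload parse. A blanket replace is
# only safe here because this sample has no newlines outside the string.
escaped = raw.replace("\n", "\\n")
prompt = json.loads(escaped)["prompt"]
```

After escaping, `prompt` holds the original multi-line text, which is the behavior a tool function would expect to receive.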

Solution

Add a _sanitize_json_control_chars() helper that escapes control characters within JSON string values before parsing:

\n → \\n
\r → \\r
\t → \\t
Other control chars → \\uXXXX

The function preserves already-escaped sequences and only modifies content inside JSON string values.
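A minimal sketch of the approach described above, not the code from this PR: it matches each JSON string literal with a regular expression and escapes only the control characters found inside it. The names `sanitize_json_control_chars`, `_JSON_STRING`, `_CONTROL_CHAR`, and `_SHORT_ESCAPES` are hypothetical, and the standard library's `json.loads` stands in for `pydantic_core.from_json()`.

```python
import json
import re

# A JSON string literal: opening quote, then any mix of ordinary characters
# and backslash escape pairs, then the closing quote.
_JSON_STRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)
_CONTROL_CHAR = re.compile(r"[\x00-\x1f]")
_SHORT_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def sanitize_json_control_chars(json_str: str) -> str:
    """Escape raw control characters that appear inside JSON string values."""
    if not json_str:
        return json_str

    def escape_char(match: re.Match) -> str:
        ch = match.group(0)
        if ch in _SHORT_ESCAPES:
            return _SHORT_ESCAPES[ch]
        return f"\\u{ord(ch):04x}"

    def escape_string(match: re.Match) -> str:
        # Only the matched string literal is rewritten, so whitespace
        # between JSON tokens is left untouched.
        return _CONTROL_CHAR.sub(escape_char, match.group(0))

    return _JSON_STRING.sub(escape_string, json_str)


raw = '{"prompt": "A timeline showing:\n- Event 1\n- Event 2"}'
args = json.loads(sanitize_json_control_chars(raw))
```

Because already-escaped sequences such as `\\n` contain no raw control characters, they pass through unchanged, and valid JSON with newlines between tokens is returned as-is.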

Changes

Added _sanitize_json_control_chars() helper function in utils.py
Modified prepare_function_arguments() to sanitize JSON before calling from_json()

Testing

Tested with real-world LLM output containing multi-line prompts that previously caused the error.

LLMs sometimes generate JSON with literal control characters (e.g., newlines, tabs) inside string values. These violate the JSON spec and cause pydantic_core's from_json() to fail with:

ValueError: control character (\u0000-\u001F) found while parsing a string

This adds a sanitization step before parsing that escapes control characters (\n, \r, \t, etc.) within JSON string values while preserving already-escaped sequences.

Fixes an issue where function tools with multi-line content in arguments would fail to parse.

CLAassistant · 2025-12-08T12:27:22Z

All committers have signed the CLA.

Copilot

Pull request overview

This PR adds sanitization for control characters in LLM-generated JSON function call arguments to prevent parsing errors. The main issue addressed is that LLMs sometimes generate JSON with literal control characters (newlines, tabs, etc.) inside string values, which violates the JSON specification and causes pydantic_core.from_json() to fail.

Key changes:

Added _sanitize_json_control_chars() helper function that escapes control characters within JSON string values
Modified prepare_function_arguments() to sanitize JSON before parsing
Updated mistralai dependency from 1.9.3 to 1.9.11 (with new invoke and pyyaml dependencies)

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

File	Description
uv.lock	Updated mistralai dependency from 1.9.3 to 1.9.11 and added invoke 2.2.1 and pyyaml dependencies
livekit-agents/livekit/agents/llm/utils.py	Added `_sanitize_json_control_chars()` function and integrated it into `prepare_function_arguments()` to escape control characters before JSON parsing



davidzhao · 2026-01-09T07:13:32Z

livekit-agents/livekit/agents/llm/utils.py

+    if not json_str:
+        return json_str
+
+    def escape_control_chars_in_string(match: re.Match[str]) -> str:


Could you add some tests for the parsing function? It's important to ensure that it handles the wide variety of valid JSON inputs without breaking them.

Also, isn't it better to run a regex replacement to strip them out?
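To make the two options in this question concrete, here is a small sketch comparing stripping with escaping on a sample payload. It uses the standard library's `json.loads` in place of `pydantic_core.from_json()`, and the payload is invented for illustration; it is one data point for the discussion, not a verdict on either approach.

```python
import json
import re

# The JSON string value contains a raw newline character.
raw = '{"prompt": "Line one\nLine two"}'

# Option 1: strip every control character with a single regex replacement.
stripped = re.sub(r"[\x00-\x1f]", "", raw)
stripped_prompt = json.loads(stripped)["prompt"]

# Option 2: escape the newline instead. A plain replace is enough for this
# sample because it has no newlines outside the string value.
escaped = raw.replace("\n", "\\n")
escaped_prompt = json.loads(escaped)["prompt"]
```

Both variants parse, but stripping joins the two lines into `"Line oneLine two"`, while escaping keeps the line break in the parsed value.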

Copilot AI review requested due to automatic review settings January 8, 2026 15:17

Copilot started reviewing on behalf of ArpitKotecha January 8, 2026 15:17

format: agent/llm/utils.py

7bf3b00

ArpitKotecha force-pushed the fix/sanitize-json-control-chars branch from 592c79e to 7bf3b00 Compare January 8, 2026 15:18

Copilot AI reviewed Jan 8, 2026
