Skip to content

Conversation

@AAgnihotry
Copy link
Contributor

@AAgnihotry AAgnihotry commented Jan 12, 2026

Summary

Adds live span tracking for evaluation runs using the upsert_span API, enabling real-time visibility into evaluation progress.

Changes

New Feature: LiveTrackingSpanProcessor

  • Real-time span updates: Sends span status as evaluations execute

    • on_start: Upserts span with RUNNING status when evaluation begins
    • on_end: Upserts span with final status (OK/ERROR) when complete
  • Smart span filtering: Only tracks evaluation-related spans

    • Span types: eval, evaluator, evaluation, eval_set_run, evalOutput
    • Spans with execution.id attribute
    • Filters out non-eval spans (agents, tools, etc.)
  • Graceful error handling: Catches and logs exceptions without failing evaluations

Implementation Details

  • Added LiveTrackingSpanProcessor class in src/uipath/_cli/_evals/_runtime.py
  • Integrated with LlmOpsHttpExporter for upsert_span API calls
  • Registered as span processor in UiPathEvalRuntime initialization

Testing

Added comprehensive test suite with 35+ test cases:

  • tests/cli/eval/test_live_tracking_span_processor.py
  • Tests cover initialization, span detection, status updates, exception handling, and edge cases
  • All tests passing ✅ (1613 passed, 7 skipped)
  • Full mypy type checking compliance

Benefits

  • Real-time monitoring: See evaluation progress as it happens
  • Better debugging: Track which evaluations are running vs completed
  • Non-intrusive: Doesn't affect evaluation execution or results
  • Production-ready: Comprehensive error handling and test coverage

Version

Bumped version to 2.4.16

Development Package

  • Use uipath pack --nolock to get the latest dev build from this PR (requires version range).
  • Add this package as a dependency in your pyproject.toml:
[project]
dependencies = [
  # Exact version:
  "uipath==2.4.16.dev1010943791",

  # Any version from PR
  "uipath>=2.4.16.dev1010940000,<2.4.16.dev1010950000"
]

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
publish-url = "https://test.pypi.org/legacy/"
explicit = true

[tool.uv.sources]
uipath = { index = "testpypi" }

[tool.uv]
override-dependencies = [
    "uipath>=2.4.16.dev1010940000,<2.4.16.dev1010950000",
]

Live Tracking Video

GMT20260113-234429_Clip_Anirudh.Agnihotry.s.Clip.01_13_2026.mp4

@github-actions github-actions bot added test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository labels Jan 12, 2026
@AAgnihotry AAgnihotry added build:dev Create a dev build from the pr test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository and removed test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository labels Jan 12, 2026
AAgnihotry added a commit that referenced this pull request Jan 13, 2026
Added test_live_tracking_span_processor.py with 35+ test cases covering:

- Initialization and setup
- on_start() method behavior with RUNNING status
- on_end() method with final status
- Span type detection (_is_eval_span)
  - All eval span types: eval, evaluator, evaluation, eval_set_run, evalOutput
  - execution.id attribute detection
  - Non-eval span filtering
- Exception handling in on_start/on_end
- Edge cases: no attributes, empty attributes
- Processor lifecycle: shutdown, force_flush
- Parent context handling

All tests verify that:
- Only eval-related spans are tracked
- upsert_span is called with correct parameters
- Exceptions are caught and logged gracefully
- Non-eval spans are properly filtered out

Resolves the missing test coverage for PR #1094.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
AAgnihotry added a commit that referenced this pull request Jan 13, 2026
Added test_live_tracking_span_processor.py with 35+ test cases covering:

- Initialization and setup
- on_start() method behavior with RUNNING status
- on_end() method with final status
- Span type detection (_is_eval_span)
  - All eval span types: eval, evaluator, evaluation, eval_set_run, evalOutput
  - execution.id attribute detection
  - Non-eval span filtering
- Exception handling in on_start/on_end
- Edge cases: no attributes, empty attributes
- Processor lifecycle: shutdown, force_flush
- Parent context handling

All tests verify that:
- Only eval-related spans are tracked
- upsert_span is called with correct parameters
- Exceptions are caught and logged gracefully
- Non-eval spans are properly filtered out

Resolves the missing test coverage for PR #1094.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@AAgnihotry AAgnihotry marked this pull request as ready for review January 13, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build:dev Create a dev build from the pr test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant