Evaluation: Fix score format #520

@Prajna1999

Describe the bug

The score column of the evaluation_run table stores JSON in two different schemas:

  • Newer schema: {"traces": [{"scores": [{"name": "...", "value": ...}]}]}
  • Older legacy schema: {"cosine_similarity": {"avg": ..., "per_item_scores": [...]}}

This causes frontend rendering issues. In addition, the older schema contains only cosine similarity scores (judge scores are missing),
while the newer schema contains both.
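
To gauge how many rows are affected, each stored score could be classified by its top-level keys. The following is a minimal sketch, not the project's actual code: the function name detect_score_schema is hypothetical, and it assumes the column value has already been decoded into a Python object.

```python
from typing import Any


def detect_score_schema(score: Any) -> str:
    """Classify a decoded score value as 'traces', 'legacy', or 'unknown'.

    Hypothetical helper: it relies only on the two shapes shown in this issue.
    """
    if not isinstance(score, dict):
        return "unknown"
    # Newer schema: top-level "traces" list.
    if isinstance(score.get("traces"), list):
        return "traces"
    # Older schema: top-level "cosine_similarity" object with per-item scores.
    legacy = score.get("cosine_similarity")
    if isinstance(legacy, dict) and "per_item_scores" in legacy:
        return "legacy"
    return "unknown"
```

Running this over all evaluation runs would show which rows use the older schema and whether any rows match neither shape.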

To Reproduce

The issue occurs sporadically. It is likely related to the order in which cosine similarity and judge scores are fetched and merged.

Expected behavior

A single, consistent schema for the score column across all evaluation runs.

Additional context

Older schema example:

{
    "cosine_similarity": {
        "avg": 0.6414,
        "std": 0.0800,
        "total_pairs": 24,
        "per_item_scores": [
            {
                "trace_id": "9b80f66b-...",
                "cosine_similarity": 0.7379
            }
        ]
    }
}

Newer schema example:

{
    "traces": [
        {
            "scores": [
                {
                    "name": "SNEHA correctness",
                    "value": ...
                }
            ]
        }
    ]
}
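
One possible way to converge on the newer schema is to convert older rows into the traces shape. The sketch below is an illustration under stated assumptions, not the project's implementation: normalize_score is a hypothetical name, the per-trace "trace_id" key and the top-level "summary" key are assumptions (neither appears in the newer schema example above), and "cosine_similarity" is reused as the score name.

```python
from typing import Any


def normalize_score(score: dict[str, Any]) -> dict[str, Any]:
    """Return the score in the newer 'traces' shape (hypothetical helper)."""
    if "traces" in score:
        return score  # already in the newer schema

    legacy = score.get("cosine_similarity")
    if not isinstance(legacy, dict):
        raise ValueError("unrecognized score schema")

    traces = [
        {
            # Assumption: keeping trace_id on each trace is acceptable.
            "trace_id": item["trace_id"],
            "scores": [
                {"name": "cosine_similarity", "value": item["cosine_similarity"]}
            ],
        }
        for item in legacy.get("per_item_scores", [])
    ]

    # Assumption: aggregates (avg, std, total_pairs) are kept under a
    # hypothetical "summary" key so no information is dropped.
    summary = {k: v for k, v in legacy.items() if k != "per_item_scores"}
    return {"traces": traces, "summary": {"cosine_similarity": summary}}
```

This conversion cannot restore judge scores, since older rows never stored them; those runs would still show cosine similarity only.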

Labels: bug

Status: To Do