Open
Labels
bug: Something isn't working
Description
Describe the bug
The score column of the evaluation_run table stores two different JSON schemas:
- Newer schema: {"traces": [{"scores": [{"name": "...", "value": ...}]}]}
- Older legacy schema: {"cosine_similarity": {"avg": ..., "per_item_scores": [...]}}
Because the frontend receives two different shapes, rendering breaks for some evaluation runs. The older schema also contains only cosine similarity scores and is missing judge scores, while the newer schema has both.
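To gauge how many rows use each shape, stored values could be classified by their top-level keys. The following is a minimal sketch, not code from the project: the function name classify_score_schema is hypothetical, and it assumes the score value has already been parsed from JSON into a Python object.

```python
from typing import Any


def classify_score_schema(score: Any) -> str:
    """Return "new", "legacy", or "unknown" for one parsed score value.

    Hypothetical helper: it only inspects the top-level keys described
    in this issue ("traces" and "cosine_similarity").
    """
    if not isinstance(score, dict):
        return "unknown"
    # Newer schema: {"traces": [{"scores": [...]}]}
    if isinstance(score.get("traces"), list):
        return "new"
    # Older schema: {"cosine_similarity": {"avg": ..., "per_item_scores": [...]}}
    cosine = score.get("cosine_similarity")
    if isinstance(cosine, dict) and "per_item_scores" in cosine:
        return "legacy"
    return "unknown"
```

Running this over all rows and counting the results would show whether the legacy shape is still being written or only exists in older runs.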
To Reproduce
Occurs sporadically. Likely related to the order in which cosine similarity and judge scores are fetched/merged.
Expected behavior
Single consistent schema for the score column across all evaluation runs.
Additional context
Older schema example:
{
  "cosine_similarity": {
    "avg": 0.6414,
    "std": 0.0800,
    "total_pairs": 24,
    "per_item_scores": [
      {
        "trace_id": "9b80f66b-...",
        "cosine_similarity": 0.7379
      }
    ]
  }
}
Newer schema example:
{
  "traces": [
    {
      "scores": [
        {
          "name": "SNEHA correctness",
          "value": ...
        }
      ]
    }
  ]
}
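One possible way to reach a single schema is to convert legacy values into the traces shape, either on read or in a one-off backfill. The sketch below rests on several assumptions that the examples above do not confirm: that trace entries in the newer schema may carry a trace_id, that "cosine_similarity" is an acceptable score name, and that the aggregates (avg, std, total_pairs) can live under a "summary" key. The function name legacy_to_traces and the "summary" key are both hypothetical.

```python
from typing import Any


def legacy_to_traces(score: dict[str, Any]) -> dict[str, Any]:
    """Convert a legacy score value into the newer traces shape.

    Assumptions (not confirmed by the issue): trace entries may hold a
    "trace_id", and aggregates are kept under a hypothetical "summary" key.
    """
    cosine = score["cosine_similarity"]
    traces = [
        {
            "trace_id": item["trace_id"],
            "scores": [
                {"name": "cosine_similarity", "value": item["cosine_similarity"]}
            ],
        }
        for item in cosine.get("per_item_scores", [])
    ]
    # Keep every aggregate field (avg, std, total_pairs) so nothing is lost.
    summary = {k: v for k, v in cosine.items() if k != "per_item_scores"}
    return {"traces": traces, "summary": {"cosine_similarity": summary}}
```

This conversion cannot restore the judge scores that legacy rows are missing; those would need to be fetched again or left absent.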