Skip to content

Fix: Ensure proper embedding dimensions#54

Open
Daniel-Vaz wants to merge 4 commits intokagent-dev:mainfrom
Daniel-Vaz:embedding-dimensions-fix
Open

Fix: Ensure proper embedding dimensions#54
Daniel-Vaz wants to merge 4 commits intokagent-dev:mainfrom
Daniel-Vaz:embedding-dimensions-fix

Conversation

@Daniel-Vaz
Copy link
Contributor

Notes

  • Fixes silent INSERT failures by replacing NULL with empty strings for branch and repo in vec0 TEXT columns.
  • Embedding dimension is now set dynamically based on model name (e.g., 1536 for text-embedding-3-small).

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

@Daniel-Vaz I'm already fixing it in #53

@Daniel-Vaz
Copy link
Contributor Author

@djannot
Thank you !
I was just today noticing this behaviour. Could you please also address then the static dimension sizes in that same PR ? If not I can at least change this PR to just focus on that instead.

Locally I'm testing using Azure Open AI text-embedding-3-small model, and the dimension size difference from the one statically defined makes all my queries fail even when using that same model to build the db.

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

I'll ping you as soon as the other PR is merged, so you can rebase yours from main at that time, ok ?

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

Done, you can rebase

Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
Signed-off-by: dvaz-external <dvaz.external@epo.org>
@Daniel-Vaz Daniel-Vaz force-pushed the embedding-dimensions-fix branch from 0ae049d to 921188a Compare February 13, 2026 15:56
@Daniel-Vaz
Copy link
Contributor Author

@djannot Done. Feel free to have a look whenever you can.

@Daniel-Vaz Daniel-Vaz changed the title Fix: Prevent NULL values in vec0 TEXT columns, ensure proper embedding dimension Fix: Ensure proper embedding dimensions Feb 13, 2026
@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

npm test is returning errors

Signed-off-by: dvaz-external <dvaz.external@epo.org>
@Daniel-Vaz
Copy link
Contributor Author

My bad. Updated the test accordingly:

 Test Files  7 passed (7)
      Tests  456 passed (456)
   Start at  17:00:21
   Duration  6.72s (transform 1.48s, setup 0ms, import 7.33s, tests 7.30s, environment 1ms)

@djannot
Copy link
Collaborator

djannot commented Feb 13, 2026

Thanks. I used AI to analyze it and got some good feedback:

  1. Default value discrepancy - getEmbeddingDimension defaults to 1536 for unknown models, but all the function signatures default embeddingDimension parameter to 3072. This inconsistency could cause mismatches if someone passes a default through one path vs. the other. I'd recommend making the default consistent everywhere, or removing defaults from the function signatures to force callers to always pass the value explicitly.
  2. console.warn in a library function - utils.ts:115 uses console.warn directly instead of the project's Logger class. This is inconsistent with the rest of the codebase and won't be captured by the logging infrastructure.
  3. Hardcoded model knowledge is fragile - The model-to-dimension mapping will need updates whenever new models appear. Consider:
    • Making this configurable in the YAML/config (e.g., an embedding_dimensions field)
    • Falling back to a test API call to determine dimensions at startup
    • At minimum, documenting which models are supported
  4. getMetadataValue in Qdrant makes an extra API call (database.ts:281-290) - Every metadata read now calls getCollection to discover the dimension. This adds latency. It would be better to pass the dimension as a parameter here too, like the other methods.

@Daniel-Vaz
Copy link
Contributor Author

Thanks will look into this next week !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants