Skip to content

Conversation

@ahmedkansulum
Copy link

Summary

While working with VectorStore I noticed that reverse_search's
docstring does not match its behaviour:

  • The implementation simply joins a list of document IDs onto the existing
    vector store and returns the matching rows.
  • The docstring, however, describes a cosine-similarity based search and
    mentions n_results, which is not used at all.

In addition, reverse_search accepts an optional ids argument but does
not validate it. If ids has the wrong length or contains incompatible
types, this can lead to confusing errors from Polars rather than a clear
ValueError.

This PR tidies up both of these points without changing the behaviour of
reverse_search.


Changes

  • Update the VectorStore.reverse_search docstring to accurately describe
    the current behaviour:

    • Emphasise that the method looks up rows by document id and does not
      compute new embeddings or similarity scores.
    • Clarify the roles of query, ids, and the returned columns.
    • Document that n_results is currently unused but retained for public
      API compatibility.
  • Add input validation for the optional ids argument:

    • Require ids to be:
      • a list,
      • containing only str or int values,
      • the same length as query, and
      • containing only unique values.
    • If these conditions are not met, raise a clear ValueError with a
      helpful message (mirroring the behaviour already used in
      VectorStore.search).
  • Add a short inline comment noting that n_results is currently unused.


Rationale

  • Keeps the public docstrings in sync with the actual implementation,
    reducing confusion for users of the public API.
  • Makes the behaviour of reverse_search more robust and predictable
    when a caller provides custom ids.
  • Aligns argument validation between search and reverse_search:
    both now enforce a consistent contract for ids.

Testing

Locally I:

make check-python

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant