Conversation

@viclafargue
Contributor

@viclafargue viclafargue commented Jan 21, 2026

Answers #1720

In multi-GPU replicated mode, the search query is divided into batches. These batches are run in parallel with OpenMP. In some cases there may be more batches than available GPUs, which causes a thread-safety issue (at least for CAGRA indices). This change solves the issue: each rank gets its own thread, and that thread handles all batches for that rank sequentially. This prevents concurrent access to the same GPU from multiple threads.
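
To illustrate the scheduling idea described above, here is a minimal sketch, not the actual change in this PR. It assumes batches are assigned to ranks round-robin (the description does not say how batches map to ranks), and `run_batches_per_rank` is a hypothetical name; the per-batch search call is only indicated by a comment. The parallel loop runs over ranks rather than batches, so each rank's batches are processed one after another by a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuVS API): returns, for each rank, the batch
// indices that rank processed, in the order they were processed.
std::vector<std::vector<std::size_t>> run_batches_per_rank(int n_ranks, std::size_t n_batches)
{
  assert(n_ranks > 0);
  std::vector<std::vector<std::size_t>> processed(static_cast<std::size_t>(n_ranks));

  // Parallelize over ranks, not over batches: one thread per rank.
  // If OpenMP is disabled, the pragma is ignored and ranks run one by one.
#pragma omp parallel for num_threads(n_ranks)
  for (int rank = 0; rank < n_ranks; ++rank) {
    // Assumed round-robin assignment: rank r handles batches r, r + n_ranks, ...
    for (std::size_t b = static_cast<std::size_t>(rank); b < n_batches;
         b += static_cast<std::size_t>(n_ranks)) {
      // The per-batch search on this rank's GPU would be issued here.
      // Only this thread touches this rank, so there is no concurrent
      // access to the same GPU from multiple threads.
      processed[static_cast<std::size_t>(rank)].push_back(b);
    }
  }
  return processed;
}
```

With 2 ranks and 5 batches, this sketch has rank 0 handle batches 0, 2, 4 and rank 1 handle batches 1, 3, each sequentially. The trade-off is that extra batches no longer run concurrently on the same GPU.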

@tfeher
Contributor

tfeher commented Jan 21, 2026

I would expect CAGRA to be thread safe. We expect that we can search using multiple threads to achieve large throughput for concurrent small batch size queries. We should fix this in CAGRA.

Contributor

@tfeher tfeher left a comment

The changes look good to me as a workaround. However, we should find the root cause; please open an issue to track it.

@cjnolet
Member

cjnolet commented Jan 26, 2026

@viclafargue can you also link the issue in a comment in the code before this is merged?

@viclafargue viclafargue requested a review from a team as a code owner January 26, 2026 09:33

Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change)

3 participants