[FEA] Binary IVF Flat Index #1099
base: release/26.02
…nto binary-kmeans
tfeher left a comment
Hi Tarang, thank you for your work on this PR, it looks good to me!
  }
};

template <int Veclen, typename T, typename AccT>
We do not have any input or output of type T, so I do not see why we need this parameter. Or is the static assert participating in SFINAE logic?
In any case, this does not need to hold up the PR.
uint32_t masked_val = xor_val & 0xffu;
int popcount = __popc(masked_val);
Do we expect use cases where padding the dims to be divisible by 32 would be a problem? This need not hold up the PR; I have created #1613 to continue the discussion.
/ok to test 91c6734
42f1bb7 to e59a357
/ok to test 07354d1
/ok to test 07e1837
Depends on rapidsai/raft#2770
Implementation of a binary IVF Flat index (BitwiseHamming metric for the IVF Flat index).
Key Features
1. Binary Index Structure
binary_centers_ field to store cluster centers as packed uint8_t arrays for binary data
uint8_t inputs with BitwiseHamming and add only single instantiations of newly added kernels
2. K-means Clustering for Binary Data
The clustering approach for binary data required special handling:
Expanded Space Clustering: Binary data (uint8_t) is expanded to signed representation (int8_t) where each bit becomes ±1
Centroid Quantization: After computing float centroids in expanded space, they are converted back to binary format:
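The exact conversion used in the PR is not reproduced in this description. As a minimal host-side sketch of the two steps above, the following assumes that bit j of a vector is stored LSB-first (byte j / 8, bit position j % 8) and that a float centroid component is quantized to 1 when it is positive and to 0 otherwise. The function names `expand_bits` and `quantize_centroid` are hypothetical and are not taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand packed bits into the signed representation: bit 1 -> +1, bit 0 -> -1.
// Assumption: LSB-first bit order within each byte.
std::vector<int8_t> expand_bits(const std::vector<uint8_t>& packed)
{
  std::vector<int8_t> expanded(packed.size() * 8);
  for (std::size_t j = 0; j < expanded.size(); ++j) {
    const bool bit = ((packed[j / 8] >> (j % 8)) & 1u) != 0;
    expanded[j]    = bit ? int8_t{1} : int8_t{-1};
  }
  return expanded;
}

// Quantize a float centroid (computed in the expanded space) back to packed bits.
// Assumption: a component > 0 maps to bit 1, anything else to bit 0.
std::vector<uint8_t> quantize_centroid(const std::vector<float>& centroid)
{
  std::vector<uint8_t> packed((centroid.size() + 7) / 8, 0);
  for (std::size_t j = 0; j < centroid.size(); ++j) {
    if (centroid[j] > 0.0f) { packed[j / 8] |= static_cast<uint8_t>(1u << (j % 8)); }
  }
  return packed;
}
```

Under these assumptions, expanding a packed vector and quantizing the result returns the original bytes, and a centroid averaged over several expanded vectors becomes a per-bit majority vote.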
3. Distance Kernels
Coarse Search (Cluster Selection)
bitwise_hamming_distance_op for query-to-centroid distances in order to compute PairwiseDistances
Fine-Grained Search (Within Clusters)
Extended the interleaved scan kernel (ivf_flat_interleaved_scan.cuh) with specialized templates for BitwiseHamming:
Veclen-based optimization: Different code paths based on vectorization width
uint32_t, use __popc(x ^ y) for 4-byte Hamming distance
Efficient memory access patterns:
loadAndComputeDist templates for uint8_t that leverage vectorized loads
Binary size increase (as of 10/17/2025):
branch-25.12 (CUDA 12.9 + X86): 1232.414 MB
This PR (CUDA 12.9 + X86): 1251.051 MB
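For readers unfamiliar with the popcount-based Hamming distance used in the fine-grained search, here is a minimal host-side sketch. It is not the kernel code from this PR: `popc32` is a hypothetical stand-in for CUDA's `__popc`, and `hamming_distance` is an illustrative name. It processes 4 bytes at a time as a `uint32_t` word and finishes the remaining bytes one at a time, masking with `0xffu` as in the reviewed snippet.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side stand-in for CUDA's __popc: number of set bits in a 32-bit word.
inline int popc32(uint32_t x) { return static_cast<int>(std::bitset<32>(x).count()); }

// Hamming distance between two packed bit vectors of n_bytes bytes each.
int hamming_distance(const uint8_t* x, const uint8_t* y, std::size_t n_bytes)
{
  int dist      = 0;
  std::size_t i = 0;
  // 4-byte path: load one 32-bit word from each vector, XOR, count differing bits.
  for (; i + 4 <= n_bytes; i += 4) {
    uint32_t xw, yw;
    std::memcpy(&xw, x + i, sizeof(xw));  // memcpy avoids unaligned-access issues
    std::memcpy(&yw, y + i, sizeof(yw));
    dist += popc32(xw ^ yw);
  }
  // 1-byte path for the tail: keep only the low 8 bits before counting.
  for (; i < n_bytes; ++i) {
    const uint32_t xor_val = static_cast<uint32_t>(x[i] ^ y[i]);
    dist += popc32(xor_val & 0xffu);
  }
  return dist;
}
```

The word-wise path is why a vector length that is a multiple of 4 bytes (32 bits) is convenient: the tail loop then never runs.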