Auto select CAGRA build algorithm for hnsw::build #1719
Conversation
Force-pushed from bb78635 to 23a0b16.
```cpp
size_t total_host =
  graph_host_mem + host_workspace_size + 2e9;  // added 2 GB extra workspace (IVF-PQ search)
size_t total_dev =
  std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // addet 1 GB extra workspace size
```
This is still optimistic, we need to update and test before the PR can be merged.
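For context, a minimal sketch of the selection rule these totals feed into, per the PR description (the names here are illustrative, not the actual cuVS API):

```cpp
#include <cstddef>

enum class build_algo { IN_MEMORY, ACE };

// Choose the in-memory CAGRA build only if both the host and device
// estimates fit into what is available; otherwise fall back to the
// disk-based ACE build.
inline build_algo select_build_algo(std::size_t total_host,
                                    std::size_t total_dev,
                                    std::size_t avail_host,
                                    std::size_t avail_dev)
{
  return (total_host <= avail_host && total_dev <= avail_dev)
           ? build_algo::IN_MEMORY
           : build_algo::ACE;
}
```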
```cpp
hnsw::index_params params;
params.M               = 24;
params.ef_construction = 200;
params.hierarchy       = cuvs::neighbors::hnsw::HnswHierarchy::GPU;
```
We should make this the default: #1617
```cpp
int64_t topk = 12;

// HNSW index parameters
hnsw::index_params params;
```
We need to figure out how to handle `ace.build_dir` in this setup: the user does not set `ace_params`, and the algorithm is selected automatically. But if we happen to choose ACE, we need to know which disk location to use. Do you have suggestions @julianmi?
I think this is a somewhat challenging problem. In general, I agree with @benfred's comment that hardcoding it is not a good approach.
I think the two most important properties are available disk space and speed. How about a layered approach:
1. Environment variable, e.g. `CUVS_ACE_BUILD_DIR`.
2. We could read `/sys/block` on Linux to query for fast disks (NVMe > SSD > HDD) and check for free space.
3. Fall back to `/tmp` or a generated temporary path like @benfred suggested.
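A rough sketch of tiers (1) and (3) only; `CUVS_ACE_BUILD_DIR` is the variable proposed above, everything else is illustrative:

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

// Resolve the ACE build directory: honor the proposed CUVS_ACE_BUILD_DIR
// environment variable first, then fall back to a generated path under
// the system temp directory. Tier (2), probing /sys/block for fast
// disks, is intentionally omitted here.
inline std::string resolve_ace_build_dir()
{
  if (const char* env = std::getenv("CUVS_ACE_BUILD_DIR")) { return env; }
  auto fallback = std::filesystem::temp_directory_path() / "cuvs_ace_build";
  std::filesystem::create_directories(fallback);
  return fallback.string();
}
```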
I agree with a layered approach using only (1) and (3), which are both locations the user has agreed to write to -- I don't think we can just write to a drive without user consent.
I agree that (2) is problematic. But (3) has similar problems, right? `/tmp` could be very small or on a slow disk.
Another approach would be to fail with a helpful message stating that, given the graph size, a disk will be used and that the user should provide a suitable directory.
In general, I am not sure environment variables are a good approach, since the project does not use them much. @tfeher, @cjnolet What do you think?
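A sketch of that failure mode using RAFT's existing `RAFT_EXPECTS` macro; the condition, the message wording, and the `ace_params.build_dir` check are all illustrative:

```cpp
// Illustrative only: when the disk-based ACE path is selected and no
// directory was provided, fail early with an actionable message instead
// of silently picking a location.
RAFT_EXPECTS(!ace_params.build_dir.empty(),
             "The graph does not fit in memory, so the build will use disk. "
             "Please set ace_params.build_dir to a directory with sufficient "
             "free space.");
```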
mfoerste4 left a comment
I did not go over all memory estimates in detail, but I suggest aligning the predictions with real data.
Is autotuning of the ACE params part of a different PR? Besides the open question on the file location, we might want to at least set the number of partitions dynamically.
```cpp
    raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_));
}

auto dataset_view = raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_);
```
Is the data expected to always reside in host memory?
ACE only supports host memory right now. The main reason is that we expect the data to be large and memory-mapped. Further, we do the partitioning and reordering on the host, since there is no benefit to moving the data to the GPU only to write it to disk afterwards.
Anyway, I think we can support device datasets easily, since those should not end up using ACE with this heuristic. @tfeher What do you think?
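For illustration, a memory-mapped dataset can be wrapped the same way as in the diff above. This sketch uses POSIX `mmap`, assumes row-major float32 data, and omits error handling:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

#include <raft/core/host_mdspan.hpp>

// Map a raw float32 dataset file and expose it as a host matrix view,
// matching the host-memory expectation described above.
inline auto map_dataset(const char* path, int64_t n_rows, int64_t dim)
{
  int fd             = ::open(path, O_RDONLY);
  std::size_t nbytes = sizeof(float) * n_rows * dim;
  auto* data         = static_cast<const float*>(
    ::mmap(nullptr, nbytes, PROT_READ, MAP_PRIVATE, fd, 0));
  return raft::make_host_matrix_view<const float, int64_t>(data, n_rows, dim);
}
```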
```cpp
namespace helpers {
/** Calculates the workspace for graph optimization
 *
 * @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
```
Suggested change:
```diff
- * @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
+ * @param[in] n_rows number of rows in the dataset (or number of points in the graph)
```
Also, `mst_optimize` is not documented.
```cpp
// ACE build and search example.
cagra_build_search_ace(res);
```
Maybe we want to rename this to something generic now that the selection is hidden from the user.
```cpp
// Configure ACE parameters for CAGRA
cuvs::neighbors::cagra::graph_build_params::ace_params cagra_ace_params;
cagra_ace_params.npartitions        = ace_params.npartitions;
cagra_ace_params.ef_construction    = params.ef_construction;
cagra_ace_params.build_dir          = ace_params.build_dir;
cagra_ace_params.use_disk           = ace_params.use_disk;
cagra_ace_params.max_host_memory_gb = ace_params.max_host_memory_gb;
cagra_ace_params.max_gpu_memory_gb  = ace_params.max_gpu_memory_gb;
cagra_params.graph_build_params     = cagra_ace_params;
```
Are you planning to add a heuristic for npartitions depending on dimensions here as well?
We have added heuristics in #1603.
julianmi left a comment
I did not get a chance to fully review the memory heuristics yet. I wonder how we can test them, though. Should `max_host_memory_gb` and `max_gpu_memory_gb` be optional HNSW parameters that we could use to test that the expected algorithm is chosen for a given set of memory limits?
```cpp
constexpr static uint32_t kIndexGroupSize   = 32;
constexpr static uint32_t kIndexGroupVecLen = 16;

std::cout << "pq_dim " << params.pq_dim << ", pq_bits " << params.pq_bits << ", n_lists"
```
Is there a specific reason not to use RAFT_LOG_INFO here and in the following lines?
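For reference, the equivalent with RAFT's logger, using printf-style formatting as in the `RAFT_LOG_INFO` calls elsewhere in this PR (the field types are assumed to be `uint32_t`):

```cpp
// Suggested replacement for the std::cout diagnostics above.
RAFT_LOG_INFO("pq_dim %u, pq_bits %u, n_lists %u",
              params.pq_dim, params.pq_bits, params.n_lists);
```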
```cpp
  }
}

inline std::pair<size_t, size_t> get_available_memory(
```
Should this helper be placed in cuvs::util:: like get_free_host_memory?
```diff
  }();

- RAFT_LOG_DEBUG("# Building IVF-PQ index %s", model_name.c_str());
+ RAFT_LOG_INFO("# Building IVF-PQ index %s", model_name.c_str());
```
Were this and the following logging changes intentional? Logging every 10 seconds might write a lot of output on a large run.
```cpp
size_t total_dev =
  std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // addet 1 GB extra workspace size

std::cout << "IVF-PQ build memory requirements\ndataset_gpu " << dataset_gpu_mem / 1e9 << " GB"
```
Similar comment about using RAFT_LOG_INFO.
```cpp
size_t total_host =
  graph_host_mem + host_workspace_size + 2e9;  // added 2 GB extra workspace (IVF-PQ search)
size_t total_dev =
  std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // addet 1 GB extra workspace size
```
Suggested change:
```diff
-   std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // addet 1 GB extra workspace size
+   std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // added 1 GB extra workspace size
```
```cpp
 *
 * @param[in] res raft resource
 * @param[in] dataset shape of the dataset
 * @param[in] param ivf-pq compression pramas
```
Suggested change:
```diff
- * @param[in] param ivf-pq compression pramas
+ * @param[in] param ivf-pq compression params
```
Configuring an HNSW graph build using CAGRA is complicated because CAGRA offers multiple build algorithms. This PR implements automatic algorithm selection. The goal is a simplified API where the user sets only the two parameters that control graph size and quality (`M` and `ef_construction`, respectively). This should be familiar to HNSW users and allows easier adoption of cuVS-accelerated HNSW graph building.

If we have enough memory (host and GPU) to do both the KNN graph building and the graph optimization in memory, we choose the in-memory build and let `cagra::index_params::from_hnsw_params` derive the additional configuration parameters. If the build would require more memory than is available, we choose the ACE method and derive the number of partitions using the heuristics from #1603.

For the host we query the OS for available memory; for the GPU, the whole device memory is assumed to be available.
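A usage sketch of the simplified API as described; the parameter names follow the diffs above, and the exact `hnsw::build` signature may differ in the merged version:

```cpp
#include <cuvs/neighbors/hnsw.hpp>

#include <raft/core/host_mdspan.hpp>
#include <raft/core/resources.hpp>

// The user sets only the two HNSW-familiar knobs; the build algorithm
// (in-memory CAGRA vs. disk-based ACE) is selected automatically from
// the memory estimates.
void build_hnsw_auto(raft::resources const& res,
                     const float* dataset,
                     int64_t n_rows,
                     int64_t dim)
{
  cuvs::neighbors::hnsw::index_params params;
  params.M               = 24;   // graph degree
  params.ef_construction = 200;  // build-time candidate list size
  params.hierarchy       = cuvs::neighbors::hnsw::HnswHierarchy::GPU;

  auto dataset_view =
    raft::make_host_matrix_view<const float, int64_t>(dataset, n_rows, dim);
  auto index = cuvs::neighbors::hnsw::build(res, params, dataset_view);
  (void)index;  // search etc. omitted
}
```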