@tfeher tfeher commented Jan 21, 2026

Configuring HNSW graph build using CAGRA is complicated because CAGRA offers multiple build algorithms. This PR implements automatic algorithm selection. The goal is a simplified API where the user needs to set only two parameters that control graph size and quality (M and ef_construction, respectively). This should be familiar to HNSW users and allows easier adoption of cuvs-accelerated HNSW graph building.

  hnsw::index_params params;
  params.M               = 24;
  params.ef_construction = 200;
  params.hierarchy       = cuvs::neighbors::hnsw::HnswHierarchy::GPU;

  auto hnsw_index = hnsw::build(res, params, dataset_host_view);
  cuvs::neighbors::hnsw::serialize(res, "hnsw_index.bin", *hnsw_index);

If we have enough memory (host and GPU) to do both the KNN graph building and the optimization in memory, then we choose the in-memory build and let cagra::index_params::from_hnsw_params derive the additional configuration parameters.

If the build would require more memory than is available, then we choose the ACE method and derive the number of partitions using #1603.

For the host we query the OS for available memory; for the GPU it is assumed that the whole device memory is available.
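To make the heuristic concrete, here is a rough sketch, not the exact code in this PR, of how the available memory could be queried and compared against the estimates. The names query_available_memory and fits_in_memory are illustrative placeholders, and the host query uses Linux sysinfo:

  #include <sys/sysinfo.h>       // Linux-only: sysinfo()
  #include <cuda_runtime_api.h>  // cudaMemGetInfo()
  #include <cstddef>
  #include <utility>

  // Returns {available host memory, total device memory} in bytes.
  inline std::pair<size_t, size_t> query_available_memory()
  {
    struct sysinfo info{};
    sysinfo(&info);
    size_t host_avail = static_cast<size_t>(info.freeram) * info.mem_unit;

    size_t dev_free = 0, dev_total = 0;
    cudaMemGetInfo(&dev_free, &dev_total);
    // As described above, the whole device memory is assumed to be available.
    return {host_avail, dev_total};
  }

  // In-memory build if the estimates fit on both host and GPU, otherwise ACE.
  inline bool fits_in_memory(size_t total_host_needed, size_t total_dev_needed)
  {
    auto [host_avail, dev_avail] = query_available_memory();
    return total_host_needed <= host_avail && total_dev_needed <= dev_avail;
  }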

copy-pr-bot bot commented Jan 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tfeher tfeher force-pushed the auto_selec_cagra_build branch from bb78635 to 23a0b16 on January 21, 2026 17:43
@tfeher tfeher removed request for a team January 21, 2026 17:46
@tfeher tfeher added the breaking (Introduces a breaking change) and improvement (Improves an existing functionality) labels on January 21, 2026
Comment on lines +80 to +83
size_t total_host =
graph_host_mem + host_workspace_size + 2e9; // added 2 GB extra workspace (IVF-PQ search)
size_t total_dev =
std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9; // addet 1 GB extra workspace size
Contributor Author

This is still optimistic; we need to update and test it before the PR can be merged.

@tfeher tfeher requested a review from mfoerste4 January 21, 2026 17:53
hnsw::index_params params;
params.M = 24;
params.ef_construction = 200;
params.hierarchy = cuvs::neighbors::hnsw::HnswHierarchy::GPU;
Contributor Author

We should make this the default (#1617).

int64_t topk = 12;

// HNSW index parameters
hnsw::index_params params;
Contributor Author

We need to figure out how to handle ace.build_dir in this setup: the user does not set ace_params. The algorithm is automatically selected. But if we happen to choose ace, then we need to know which disk space to use. Do you have suggestions @julianmi?

Contributor

I think this is a somewhat challenging problem. In general, I agree with @benfred's comment that hardcoding it is not a good approach.

I think the two most important properties are available disk space and speed. How about a layered approach (a rough sketch follows the list):

  1. Environment variable. E.g., CUVS_ACE_BUILD_DIR.
  2. We could read /sys/block on Linux to query for fast disks (NVMe > SSD > HDD) and check for free space.
  3. Fall back to /tmp or a generated temporary path like @benfred suggested.
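As a self-contained sketch of steps (1) and (3) only: the environment variable name CUVS_ACE_BUILD_DIR is the one proposed above, and resolve_ace_build_dir is a hypothetical helper name, not an existing cuvs function.

  #include <cstdlib>
  #include <filesystem>
  #include <string>

  inline std::string resolve_ace_build_dir()
  {
    // (1) Explicit user choice via an environment variable.
    if (const char* env = std::getenv("CUVS_ACE_BUILD_DIR")) { return std::string{env}; }
    // (3) Fall back to a generated path under the system temporary directory.
    auto dir = std::filesystem::temp_directory_path() / "cuvs_ace_build";
    std::filesystem::create_directories(dir);
    return dir.string();
  }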

Contributor

I agree with a layered approach using only (1) and (3), both of which are locations the user has agreed to write to -- I don't think we can just write to a drive without user consent.

Contributor

I agree that (2) is problematic. But (3) has similar problems, right? /tmp could be very small or on a slow disk.

Another approach I see would be to fail with a helpful message stating that, given the graph size, a disk will be used and that the user should provide a suitable directory.
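A minimal illustration of this fail-fast alternative, using a plain C++ exception as a stand-in for whatever error mechanism cuvs actually uses; require_ace_build_dir is a hypothetical name:

  #include <stdexcept>
  #include <string>

  inline void require_ace_build_dir(const std::string& build_dir)
  {
    if (build_dir.empty()) {
      throw std::runtime_error(
        "The graph is too large to build in memory, so the disk-based ACE build was selected. "
        "Please provide a build directory with sufficient free disk space.");
    }
  }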

In general, I am not sure if environment variables are a good approach since the project does not use them a lot. @tfeher, @cjnolet What do you think?

Contributor

@mfoerste4 mfoerste4 left a comment

I did not go over all the memory estimates in detail, but I suggest aligning the predictions with real data.

Is autotuning of ACE params part of a different PR? Besides the open question on the file location, we might want to at least set the number of partitions dynamically.

raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_));
}

auto dataset_view = raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_);
Contributor

Is the data expected to always reside in host memory?

Contributor

ACE only supports host memory right now. The main reason is that we expect the data size to be large and memory-mapped. Further, we do the partitioning and reordering on the host, since there is no benefit in moving the data to the GPU only to write it to disk afterwards.

Anyway, I think we can support device datasets easily, since these should not end up using ACE with this heuristic. @tfeher What do you think?

namespace helpers {
/** Calculates the workspace for graph optimization
*
* @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
Contributor

Suggested change
* @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
* @param[in] n_rows number of rows in the dataset (or number of points in the graph)

Contributor

Also, mst_optimize is not documented.

Comment on lines +100 to +101
// ACE build and search example.
cagra_build_search_ace(res);
Contributor

Maybe we want to rename this to something more generic now that the selection is hidden from the user.

Comment on lines +1341 to +1349
// Configure ACE parameters for CAGRA
cuvs::neighbors::cagra::graph_build_params::ace_params cagra_ace_params;
cagra_ace_params.npartitions = ace_params.npartitions;
cagra_ace_params.ef_construction = params.ef_construction;
cagra_ace_params.build_dir = ace_params.build_dir;
cagra_ace_params.use_disk = ace_params.use_disk;
cagra_ace_params.max_host_memory_gb = ace_params.max_host_memory_gb;
cagra_ace_params.max_gpu_memory_gb = ace_params.max_gpu_memory_gb;
cagra_params.graph_build_params = cagra_ace_params;
Contributor

Are you planning to add a heuristic for npartitions depending on dimensions here as well?

Contributor

We have added heuristics in #1603.

Contributor

@julianmi julianmi left a comment

I did not get a chance to fully review the memory heuristics yet. I wonder how we can test them, though. Should max_host_memory_gb and max_gpu_memory_gb be optional HNSW parameters that we could use to test that the expected algorithm is chosen based on the memory limits set?

constexpr static uint32_t kIndexGroupSize = 32;
constexpr static uint32_t kIndexGroupVecLen = 16;

std::cout << "pq_dim " << params.pq_dim << ", pq_bits " << params.pq_bits << ", n_lists"
Contributor

Is there a specific reason not to use RAFT_LOG_INFO here and in the following lines?

}
}

inline std::pair<size_t, size_t> get_available_memory(
Contributor

Should this helper be placed in cuvs::util:: like get_free_host_memory?

}();

RAFT_LOG_DEBUG("# Building IVF-PQ index %s", model_name.c_str());
RAFT_LOG_INFO("# Building IVF-PQ index %s", model_name.c_str());
Contributor

Were this and the following logging changes intentional? Logging every 10 seconds might write a lot of output on a large run.

size_t total_dev =
std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9; // addet 1 GB extra workspace size

std::cout << "IVF-PQ build memory requirements\ndataset_gpu " << dataset_gpu_mem / 1e9 << " GB"
Contributor

Similar comment about using RAFT_LOG_INFO.

size_t total_host =
graph_host_mem + host_workspace_size + 2e9; // added 2 GB extra workspace (IVF-PQ search)
size_t total_dev =
std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9; // addet 1 GB extra workspace size
Contributor

Suggested change
std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9; // addet 1 GB extra workspace size
std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9; // added 1 GB extra workspace size

*
* @param[in] res raft resource
* @param[in] dataset shape of the dataset
* @param[in] param ivf-pq compression pramas
Contributor

Suggested change
* @param[in] param ivf-pq compression pramas
* @param[in] param ivf-pq compression params
