Vamana, the engine behind DiskANN

Wednesday, April 29, 2026

Vamana, the engine behind DiskANN

Vamana is a flat, single-layer graph structure used for fast Approximate Nearest Neighbour (ANN) searches, most notably as the core index for the DiskANN system developed by Microsoft Research. Unlike the multi-layered HNSW (Hierarchical Navigable Small Worlds), Vamana organises data into a single directed graph that balances local and long-range connections to enable rapid, "greedy" navigation across billion-scale datasets.

Core Characteristics of Vamana

Structure: Unlike hierarchical approaches like HNSW, Vamana builds a flat, single-layer graph.
Construction: Starts from an empty graph or a random graph, iteratively inserting nodes. It uses greedy search to find candidates and then applies robust pruning to limit neighbors (R).
Algorithm: Often run twice: first with a lower alpha (1) to build, then with a higher alpha (>1, e.g., 1.2) to prune, which improves recall.
Key Parameters: R (max edges per node) and L (search window size during construction) define the trade-off between index size, query speed, and accuracy

In General,

Flat Architecture: It avoids the memory-heavy hierarchy of multi-layered graphs, making it much more suitable for storage on SSDs rather than just RAM.
Small Diameter: The structure is optimized to have a "small diameter," meaning you can reach any point in the graph using very few jumps (hops), which minimizes expensive disk reads.
Robust Pruning: During construction, Vamana uses a unique RobustPrune heuristic. It checks if a potential new neighbor is "too close" in direction to an existing one; if so, it prunes the edge to ensure angular diversity and prevent redundant links.
Alpha Parameter (alpha): This is a tunable parameter that controls how aggressively the graph is pruned. Higher values of alpha generally lead to more long-range edges, which helps the search converge faster.

How the Graph is Built

Vamana construction typically involves the following steps:

Initialization: The graph starts as a random directed graph where each node has a fixed number of outgoing edges (\(R\)).
Greedy Search: For each point in the dataset, the algorithm performs a greedy search from a central starting point (the medoid) to find its approximate neighbors.
Refinement & Pruning: It then applies the RobustPrune procedure to refine these connections, often running two full passes—one with \(\alpha=1\) to build local connectivity and a second with \(\alpha > 1\) to add high-quality long-range "shortcuts".

Usage Examples

Disk-Based ANN Search: Powers DiskANN for searching datasets larger than RAM.
Similarity Search in Databases: Integrated into systems like Redis (SVS) and RAPIDS cuVS to enable fast vector indexing.
Embedding Search: Ideal for high-dimensional vector search tasks in ML pipelines.

Synonyms/Related Concepts

Proximity Graph: Vamana is a specific type of optimized graph.
Disk-based Index: Often referred to by its application in DiskANN.
Graph-based ANN: Classification of the algorithm, related to NSG (Neighborhood Search Graph).

Vamana differs from HNSW by avoiding a multi-layer structure, which often makes it more effective for scenarios where the index must reside on a disk.

RS Chandras Tech Blog | AI, ML, Agentic AI

Wednesday, April 29, 2026

Vamana, the engine behind DiskANN

No comments:

Post a Comment

Linear Regression

Pages

Search This Blog