Building a Tiny Vector Database in Rust: A Technical Walkthrough 🦀

Vector databases have existed for years, but recent advancements in scaling Retrieval Augmented Generation (RAG) systems have drawn significant attention from engineering teams seeking to implement AI solutions. These implementations employ distinct design philosophies. Some prioritize low-latency operations, while others emphasize scalability and distributed architecture support. This raises practical questions: How might researchers experiment with basic vector database implementations to test novel retrieval enhancements or semantic search improvements? What about developers seeking to understand the underlying mechanisms? Highly optimized production-grade systems present a challenge here – they often incorporate multiple abstraction layers that obscure implementation details from users.

nano-vectordb-rs provides a simple, minimalistic implementation using Rust - a systems programming language that ensures memory safety through its ownership and borrowing system.

This approach empowers developers with precise control over memory allocation/deallocation without garbage collection overhead, while zero-cost abstractions guarantee high-level code compiles to optimized machine instructions with no runtime penalty. These features make Rust particularly well-suited for building vector databases, where performance is critical. Additionally, Rust's built-in concurrency safety enables safe parallel execution, which we'll leverage to accelerate our algorithms through straightforward parallelism.

This implementation strips away that complexity to reveal core vector database mechanics in ~350 lines of focused code. With education in mind, we'll explore:

  • Basics of vector databases
  • Design choices
  • Implementation details

Basics of vector databases

What is a vector database?

Think of it as a specialized library to store vectors:

  • Books = Vectors (arrays of numbers representing images, text, etc.)
  • Card Catalog = Metadata (additional info like timestamps or categories)
  • Librarian = Query engine (finds similar vectors quickly)

What are the core components of a vector database?

Vector Storage

Like a chef turning ingredients into recipes, embedding models convert raw data (text, images) into numerical vectors - think of it as creating a unique "flavor profile" for each ingredient. Just as a master chef can describe tomatoes as "sweet, acidic, umami," the embedding model turns "dog" into [0.7, -0.3, 0.5...], capturing its essence in machine language.

To store these vectors, we need a data structure. Hence a vector storage component is essential.

Similarity Search and Optimization

To retrieve ingredients whose flavour profiles are similar to what the chef is looking for, we have to perform a similarity search. The most naive approach is to compare a query ingredient (vector) against every ingredient in the database, which is essentially brute-force search.

When there are only a few ingredients to search over, brute force works well. However, for applications searching over millions or even billions of vectors, it becomes too slow. We need Approximate Nearest Neighbor (ANN) algorithms, a class of algorithms designed to efficiently find the data points closest to a query in high-dimensional spaces, trading a little accuracy for speed.
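To make the baseline concrete, brute-force search is just a linear scan: score the query against every stored vector, sort, and keep the best matches. A minimal sketch (function names here are illustrative, not the library's API):

```rust
// Brute-force nearest-neighbor search: score every vector, sort, take top_k.
// With unit-normalized vectors, the dot product equals cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn brute_force_search(db: &[Vec<f32>], query: &[f32], top_k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = db
        .iter()
        .enumerate()
        .map(|(i, v)| (i, dot(v, query)))
        .collect();
    // Sort by descending similarity: O(n log n) on top of the O(n * d) scan.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(top_k);
    scored
}

fn main() {
    let db = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.6, 0.8]];
    let hits = brute_force_search(&db, &[1.0, 0.0], 2);
    println!("{:?}", hits); // best match is index 0, then index 2
}
```

The O(n * d) cost of the scan is exactly what ANN indexes avoid at scale.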

There are several ANN algorithms, most of them graph based (such as HNSW), but there are also hashing techniques, tree-based structures, etc. Besides ANN algorithms, production-grade vector databases also use quantization techniques to reduce memory footprint, allowing scalability.

Metadata Store

For every ingredient, we might want to store additional information tied to it, such as calorific value or when the ingredient was bought (a timestamp). These are exact details we may want to fetch once we have found similar ingredients.

Vector Index

Similarity search and optimization is usually a two-step process. First, algorithms like Hierarchical Navigable Small World (HNSW) map vectors into search-optimized structures as they are inserted into the database. At query time, these maps, or indexes, are then used to retrieve similar vectors very quickly.

Besides these core components, there are several other techniques and tools used to build scalable and reliable vector databases such as sharding, replication, multi-tenancy, etc. We will not go into these details, since we are focusing on the basics.

Design choices

Before diving into the implementation details, a few design choices are worth explaining.

Flattened matrix storage

Vectors are stored in a contiguous Vec<Float> with row-major order, enabling:

  • Better cache locality during similarity calculations
  • Efficient SIMD-friendly memory access patterns
  • Compact serialization/deserialization via base64 encoding
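Concretely, row-major flattening means vector i lives at offset i * dim in one contiguous buffer, so reading a row is a cheap slice. A sketch of the layout (illustrative names, not the crate's exact code):

```rust
// Row-major flattened storage: all vectors live in one contiguous buffer,
// which keeps similarity scans cache-friendly and SIMD-friendly.
struct FlatMatrix {
    dim: usize,
    data: Vec<f32>, // length = number_of_vectors * dim
}

impl FlatMatrix {
    fn push(&mut self, v: &[f32]) {
        assert_eq!(v.len(), self.dim, "dimensionality must match");
        self.data.extend_from_slice(v);
    }

    // Vector i occupies the half-open range [i * dim, (i + 1) * dim).
    fn row(&self, i: usize) -> &[f32] {
        &self.data[i * self.dim..(i + 1) * self.dim]
    }

    fn len(&self) -> usize {
        self.data.len() / self.dim
    }
}

fn main() {
    let mut m = FlatMatrix { dim: 3, data: Vec::new() };
    m.push(&[1.0, 2.0, 3.0]);
    m.push(&[4.0, 5.0, 6.0]);
    println!("{:?}", m.row(1)); // [4.0, 5.0, 6.0]
}
```

Compare this with a Vec<Vec<f32>>, where each row is a separate heap allocation and a scan hops around memory.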

Rayon-based parallelism

Query processing uses data parallelism across CPU cores for:

  • Near-linear scaling with core count
  • Automatic work stealing for load balancing
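The workload is embarrassingly parallel: each thread can score a disjoint chunk of the matrix with no coordination. rayon's par_chunks expresses this in one line and adds work stealing; the std-only sketch below (illustrative, not the crate's code) shows the same data-parallel shape:

```rust
use std::thread;

// Score a query against every row of a flattened matrix, splitting rows
// across threads. rayon's par_chunks would replace this boilerplate and add
// automatic work stealing; this version only illustrates the pattern.
fn parallel_scores(matrix: &[f32], dim: usize, query: &[f32], threads: usize) -> Vec<f32> {
    let rows = matrix.len() / dim;
    let chunk_rows = (rows + threads - 1) / threads; // ceiling division
    let mut scores = vec![0.0f32; rows];

    thread::scope(|s| {
        for (chunk, out) in matrix
            .chunks(chunk_rows * dim)
            .zip(scores.chunks_mut(chunk_rows))
        {
            s.spawn(move || {
                for (row, slot) in chunk.chunks(dim).zip(out.iter_mut()) {
                    *slot = row.iter().zip(query).map(|(x, y)| x * y).sum();
                }
            });
        }
    });
    scores
}

fn main() {
    let matrix = vec![1.0, 0.0, 0.0, 1.0, 0.5, 0.5]; // three 2-d rows
    let scores = parallel_scores(&matrix, 2, &[1.0, 0.0], 2);
    println!("{:?}", scores); // [1.0, 0.0, 0.5]
}
```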

Hybrid JSON/base64 format

The serialization strategy - i.e., the approach for converting data into a storable or transmittable format - uses JSON for metadata and base64 for the actual vector data. This hybrid approach is chosen for:

  • Human-readable metadata
  • Compact binary data
  • Efficient data transfer
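The binary half of the format boils down to reinterpreting the f32 matrix as raw bytes, which base64 then turns into a JSON-safe string (the base64 crate handles that encoding step; the byte packing itself needs only the standard library). A sketch with illustrative function names:

```rust
// Pack a slice of f32 into little-endian bytes and back. In the on-disk
// format, these bytes would then be base64-encoded into a JSON string field,
// giving human-readable metadata alongside compact binary vector data.
fn floats_to_bytes(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn bytes_to_floats(b: &[u8]) -> Vec<f32> {
    b.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let original = vec![0.25f32, -1.5, 3.0];
    let bytes = floats_to_bytes(&original); // 12 bytes: 4 per f32
    let restored = bytes_to_floats(&bytes); // lossless round trip
    println!("{} bytes -> {:?}", bytes.len(), restored);
}
```

Storing the floats as JSON arrays instead would roughly triple the file size and lose the exact bit patterns.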

Implementation details

Core data structures

Data Struct

pub struct Data {
    pub id: String,          // Unique identifier
    pub vector: Vec<Float>,  // Normalized vector data
    pub fields: HashMap<String, serde_json::Value> // Metadata
}
  • Stores both the vector and arbitrary metadata
  • Uses f32 for vector elements (type alias Float)
  • Vectors are normalized during storage

DataBase Struct (Internal)

struct DataBase {
    embedding_dim: usize,     // Vector dimensionality
    data: Vec<Data>,          // All entries
    matrix: Vec<Float>,       // Flattened vectors for SIMD
    additional_data: HashMap<String, serde_json::Value> // DB metadata
}

Key optimization: Stores vectors in a flattened matrix for:

  • Memory efficiency
  • SIMD-friendly data layout
  • Batch operations

NanoVectorDB Main Class

pub struct NanoVectorDB {
    pub embedding_dim: usize,  // Vector dimensionality
    pub metric: String,        // Distance metric ("cosine")
    storage_file: PathBuf,     // Persistence location
    storage: DataBase,         // Core data storage
}
Key methods
  1. Initialization
pub fn new(embedding_dim: usize, storage_file: &str) -> Result<Self>
  • Creates new DB instance or loads from file
  • Validates matrix consistency
  • Enforces dimensionality constraints
  2. Upsert mechanism
pub fn upsert(&mut self, mut datas: Vec<Data>) -> Result<(Vec<String>, Vec<String>)>
  • Update or Insert semantics:
    • Updates existing vectors by ID
    • Appends new vectors
  • Normalizes all vectors before storage
  • Returns tuple of (updated_ids, inserted_ids)
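The update-or-insert semantics above can be sketched as follows (a simplified model with illustrative types; the real method also normalizes vectors and keeps the flattened matrix in sync):

```rust
// Simplified upsert: existing ids are overwritten in place, unseen ids are
// appended. Returns (updated_ids, inserted_ids), mirroring the description.
#[derive(Clone)]
struct Entry {
    id: String,
    vector: Vec<f32>,
}

fn upsert(store: &mut Vec<Entry>, batch: Vec<Entry>) -> (Vec<String>, Vec<String>) {
    let mut updated = Vec::new();
    let mut inserted = Vec::new();
    for entry in batch {
        match store.iter_mut().find(|e| e.id == entry.id) {
            Some(existing) => {
                existing.vector = entry.vector; // update existing vector by ID
                updated.push(existing.id.clone());
            }
            None => {
                inserted.push(entry.id.clone()); // append new vector
                store.push(entry);
            }
        }
    }
    (updated, inserted)
}

fn main() {
    let mut store = vec![Entry { id: "a".into(), vector: vec![1.0] }];
    let batch = vec![
        Entry { id: "a".into(), vector: vec![2.0] },
        Entry { id: "b".into(), vector: vec![3.0] },
    ];
    let (updated, inserted) = upsert(&mut store, batch);
    println!("updated: {:?}, inserted: {:?}", updated, inserted);
}
```

A production version would use a HashMap from id to row index rather than a linear scan per entry.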
  3. Vector Search (query)
pub fn query(&self, query: &[Float], top_k: usize, ...) -> Vec<HashMap...>
  • Query normalization
  • Parallel similarity calculation using Rayon
  • Threshold filtering (better_than)
  • Custom filtering support via DataFilter
  • Top-k results using max-heap
  • Result formatting with metadata
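The top-k step uses a heap so that only k candidates are held at once instead of sorting all n scores. A sketch with std's BinaryHeap, used as a min-heap of the current best k via Reverse (illustrative code, not the crate's exact implementation):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// f32 is not Ord, so wrap scores in a newtype with a total ordering.
#[derive(PartialEq)]
struct Score(f32, usize); // (similarity, vector index)

impl Eq for Score {}
impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Score {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.0.total_cmp(&other.0)
    }
}

// Keep the k highest scores with a min-heap of size k: the root is the worst
// of the current best, so any better candidate replaces it in O(log k).
fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<Score>> = BinaryHeap::with_capacity(k + 1);
    for (i, &s) in scores.iter().enumerate() {
        heap.push(Reverse(Score(s, i)));
        if heap.len() > k {
            heap.pop(); // drop the current worst candidate
        }
    }
    let mut out: Vec<usize> = heap.into_iter().map(|Reverse(Score(_, i))| i).collect();
    out.sort_by(|&a, &b| scores[b].total_cmp(&scores[a])); // best first
    out
}

fn main() {
    let scores = [0.1, 0.9, 0.4, 0.7];
    println!("{:?}", top_k(&scores, 2)); // [1, 3]
}
```

This is O(n log k) instead of the O(n log n) of a full sort, which matters when k is small and n is large.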
  4. Persistence
pub fn save(&self) -> Result<()>
  5. Helper Functions

Normalization

pub fn normalize(vector: &[Float]) -> Vec<Float>
  • Ensures unit vectors for cosine similarity
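With unit-length vectors, cosine similarity reduces to a plain dot product, which is why normalization happens once at storage time instead of at every query. A sketch of such a helper (using f32 directly in place of the Float alias):

```rust
// Scale a vector to unit length (L2 norm = 1). After this, the dot product
// of two normalized vectors equals their cosine similarity.
pub fn normalize(vector: &[f32]) -> Vec<f32> {
    let norm: f32 = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return vector.to_vec(); // avoid dividing by zero for the zero vector
    }
    vector.iter().map(|x| x / norm).collect()
}

fn main() {
    let v = normalize(&[3.0, 4.0]);
    println!("{:?}", v); // [0.6, 0.8]
}
```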

Dot Product

pub fn dot_product(...) -> Float
  • Optimized with 4-element chunks
  • SIMD-friendly memory layout
  • Handles remainder elements
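Processing four elements per iteration gives the compiler independent multiply-adds that it can auto-vectorize. A sketch of the chunked pattern (illustrative, not necessarily the crate's exact code):

```rust
// Dot product in 4-element chunks: four independent accumulators shorten the
// dependency chain, making SIMD auto-vectorization easier for the compiler.
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut sums = [0.0f32; 4];
    let chunks = a.len() / 4 * 4; // largest multiple of 4 within the length
    for i in (0..chunks).step_by(4) {
        sums[0] += a[i] * b[i];
        sums[1] += a[i + 1] * b[i + 1];
        sums[2] += a[i + 2] * b[i + 2];
        sums[3] += a[i + 3] * b[i + 3];
    }
    let mut total = sums[0] + sums[1] + sums[2] + sums[3];
    // Handle the remainder when the length is not a multiple of 4.
    for i in chunks..a.len() {
        total += a[i] * b[i];
    }
    total
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0];
    let b = [1.0, 1.0, 1.0, 1.0, 1.0];
    println!("{}", dot_product(&a, &b)); // 15
}
```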

Usage

Basic usage example

A simple example demonstrating basic usage of the library:

  1. Creating a 3-dimensional vector database
  2. Upserting sample data
  3. Querying for similar vectors
  4. Deleting a vector

Advanced usage example

A more advanced example uses a real-world dataset (Wikipedia embeddings) and queries it for similar entries. It relies on the hf-hub crate to fetch the dataset and the parquet crate to read the Parquet files. The example demonstrates:

  • Loading a dataset from Hugging Face Hub
  • Reading Parquet files
  • Upserting embedding data into the database
  • Querying for similar entries
  • Displaying results