Implementation details

Core data structures

Data Struct

#![allow(unused)]
fn main() {
pub struct Data {
    pub id: String,          // Unique identifier
    pub vector: Vec<Float>,  // Normalized vector data
    pub fields: HashMap<String, serde_json::Value> // Metadata
}
}
  • Stores both the vector and arbitrary metadata
  • Uses f32 for vector elements (type alias Float)
  • Vectors are normalized during storage

DataBase Struct (Internal)

#![allow(unused)]
fn main() {
struct DataBase {
    embedding_dim: usize,     // Vector dimensionality
    data: Vec<Data>,          // All entries
    matrix: Vec<Float>,       // Flattened vectors for SIMD
    additional_data: HashMap<String, serde_json::Value> // DB metadata
}
}

Key optimization: Stores vectors in a flattened matrix for:

  • Memory efficiency
  • SIMD-friendly data layout
  • Batch operations

NanoVectorDB Main Class

#![allow(unused)]
fn main() {
pub struct NanoVectorDB {
    pub embedding_dim: usize,  // Vector dimensionality
    pub metric: String,        // Distance metric ("cosine")
    storage_file: PathBuf,     // Persistence location
    storage: DataBase,         // Core data storage
}
}
Key methods
  1. Initialization
#![allow(unused)]
fn main() {
pub fn new(embedding_dim: usize, storage_file: &str) -> Result<Self>
}
  • Creates new DB instance or loads from file
  • Validates matrix consistency
  • Enforces dimensionality constraints
  1. Upsert mechanism
#![allow(unused)]
fn main() {
pub fn upsert(&mut self, mut datas: Vec<Data>) -> Result<(Vec<String>, Vec<String>)>
}
  • Update or Insert semantics:
    • Updates existing vectors by ID
    • Appends new vectors
  • Normalizes all vectors before storage
  • Returns tuple of (updated_ids, inserted_ids)
  1. Vector Search (query)
#![allow(unused)]
fn main() {
pub fn query(&self, query: &[Float], top_k: usize, ...) -> Vec<HashMap...>
}
  • Query normalization
  • Parallel similarity calculation using Rayon
  • Threshold filtering (better_than)
  • Custom filtering support via DataFilter
  • Top-k results using max-heap
  • Result formatting with metadata
  1. Persistence
#![allow(unused)]
fn main() {
pub fn save(&self) -> Result<()>
}
  1. Helper Functions

Normalization

#![allow(unused)]
fn main() {
pub fn normalize(vector: &[Float]) -> Vec<Float>
}
  • Ensures unit vectors for cosine similarity

Dot Product

#![allow(unused)]
fn main() {
pub fn dot_product(...) -> Float
}
  • Optimized with 4-element chunks
  • SIMD-friendly memory layout
  • Handles remainder elements