Profiling similarity search databases to match use case criteria
- Aju John
- Nov 2
- 3 min read
Updated: Nov 19

The setup and benchmarking work described in this post were carried out by Raghu Shankar of Austin, TX, with consulting support from ADS. We welcome use case suggestions for performance evaluations.
Target audience: Organizations evaluating similarity search use cases for business advantage and database/IT professionals evaluating search databases.
Fast similarity search is gaining adoption across a wide range of use cases. However, selecting and tuning a similarity search database requires careful evaluation of many parameters so that the deployment meets or exceeds the performance and business criteria for the use case. This starts with performance profiling: tuning selected parameters and observing the resulting metrics to verify compliance with the desired use case criteria.
First, the database is deployed and profiled on a cloud native (Kubernetes) stack. This provides four advantages with minimal incremental effort:
Consistent and faster deployment using cloud native methods
Managing and scaling database resources to match use case criteria
Database-related performance metrics
Metrics and traces of cloud native resources through the observability stack
Second, the underlying engine for similarity search is a vector database (see 5 Surprising Facts about Vector Databases powering modern AI). This can be a vector extension to a SQL or NoSQL database, or a native vector database.
This investigation starts with the Milvus vector database and its benchmarking suite, running on a local on-prem Kubernetes cluster.
Milvus is an open-source, native vector database used for workloads such as natural language processing and product recommendation systems. It stores vector embeddings, which are numerical representations of text, images, video, and audio. This enables semantic search: retrieving multimedia content with related context or meaning, in contrast to keyword-based search.
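As a rough illustration of that flow, here is a minimal sketch using the pymilvus client (2.x ORM-style API). The collection name, host, dimension, and random vectors are placeholder assumptions for illustration, not the configuration used in this benchmark.

```python
import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Connect to a Milvus instance (19530 is the default gRPC port; host is a placeholder).
connections.connect(host="localhost", port="19530")

# A collection with an auto-generated primary key and a 1,536-dimension vector field.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
]
collection = Collection("demo_docs", CollectionSchema(fields, description="demo collection"))

# Insert placeholder embeddings (in practice these come from an embedding model).
embeddings = np.random.rand(1000, 1536).tolist()
collection.insert([embeddings])
collection.flush()

# Build an index and load the collection before searching.
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "HNSW", "metric_type": "L2",
                  "params": {"M": 16, "efConstruction": 200}},
)
collection.load()

# Semantic search: return the 10 approximate nearest neighbors of a query embedding.
query = np.random.rand(1, 1536).tolist()
results = collection.search(
    data=query,
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"ef": 64}},
    limit=10,
)
for hit in results[0]:
    print(hit.id, hit.distance)
```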
When profiling performance, it's important to assess various metrics and parameters that impact desired use case criteria.
Decision criteria for search databases
Below are the key parameters for architecting vector databases. For a chosen use case, some will be tagged as critical criteria ("hard requirements") for a successful implementation. These parameters are closely inter-related; meeting the critical criteria for the use case therefore means tuning many parameters together. This also includes nodes, cores, memory, network, and pod scaling, all of which the cloud native stack exposes through observability.
Dataset type: Datasets can be multi-modal, i.e., text, images, video, and audio. Data needs to be converted to vector embeddings and quantized before being stored in the database.
Dataset size: Larger datasets lead to longer index build and rebuild times. Index rebuilds need to be balanced against the consistency criteria for fresh data.
Vector dimensions: Higher dimensions give higher resolution between data points, but also lead to longer index build and rebuild times.
Indexing algorithms: There are several options (for example, HNSW and IVF) that trade off latency, queries per second, and accuracy; a configuration sketch appears after this list.
Types of usage: These range across search performance, capacity, and streaming.
Queries per sec (QPS): Total queries per sec and the number of concurrent searches at desired latency and accuracy.
Recall: The accuracy of search results, expressed as the percentage of the "true" nearest neighbors returned. Vector databases perform approximate nearest neighbor search, so this is a tunable metric set by the input criteria; a rough way to compute it, along with latency percentiles, is sketched after this list.
Latency: The time taken by search operations, stated as P95 and P99 percentiles. Latency must stay below the threshold set by the use case.
Load duration: The time taken to load the dataset. For constantly refreshing large datasets this can be an important metric to track.
Index build & optimize duration: The time required to build the indexes in the vector database. Data refreshes trigger index rebuilds, so a lower duration is better.
Cloud native observability: Provides the number of cores in use, cache and memory consumption, network utilization, packet flows, and storage.
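To make the indexing trade-off concrete, here is a hedged sketch of how two common Milvus index types might be configured with pymilvus. The specific parameter values are illustrative assumptions, not the settings used in this benchmark.

```python
# Graph-based HNSW: typically higher QPS and recall at a given latency,
# at the cost of longer build times and more memory.
hnsw_index = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 16, "efConstruction": 200},  # larger values: better recall, slower build
}
hnsw_search = {"metric_type": "L2", "params": {"ef": 64}}  # larger ef: better recall, higher latency

# Cluster-based IVF_FLAT: faster, cheaper builds; recall vs. latency is tuned at query time.
ivf_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 1024},  # number of clusters built over the vectors
}
ivf_search = {"metric_type": "L2", "params": {"nprobe": 16}}  # more probes: better recall, higher latency

# Applied to a collection (see the earlier sketch):
# collection.create_index(field_name="embedding", index_params=hnsw_index)
# collection.search(data=queries, anns_field="embedding", param=hnsw_search, limit=10)
```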
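And as a rough way to compute recall and latency percentiles of the kind reported below, assuming ground-truth neighbor IDs are available (for example from an exact brute-force search):

```python
import numpy as np

def recall_at_k(ann_ids, true_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    hits = sum(len(set(a[:k]) & set(t[:k])) for a, t in zip(ann_ids, true_ids))
    return hits / (k * len(true_ids))

def latency_percentiles(latencies_ms):
    """P95 and P99 of per-query latencies, in milliseconds."""
    p95, p99 = np.percentile(latencies_ms, [95, 99])
    return p95, p99
```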
Summary of results:
In the cloud native cluster, the benchmark pods ran on one worker node with 40 cores and 64 GB of memory.
A 500k dataset of 1,536-dimension vectors was loaded from the public cloud. The benchmark also provides results for other databases on a similar dataset size and dimension count.
Index build time (compute-centric): 9 minutes, consuming 21 cores and 3.6 GB of memory. This was top of class compared to similar benchmark runs, due to the larger number of available cores.
Queries per second (compute- and memory-centric): Topped out at 980 QPS, consuming 28 cores and 6.6 GB of memory. Throughput flattened out at 30 concurrent searches, limited by the available cores. Search consumes more memory than indexing, but memory was not a limitation on this system. The system was again top of class compared to peers, mainly due to the large number of cores. A simplified measurement loop is sketched after this summary.
Recall was at 98%, middle of the pack comparatively.
Lastly, P99 latency was 12 ms, top of the pack comparatively.
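The numbers above come from the benchmarking suite, but as a simplified sketch of how QPS and tail latency can be measured against a loaded collection at a fixed concurrency level (the search parameters and concurrency value here are illustrative assumptions):

```python
import time
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def measure_qps(collection, query_vectors, concurrency=30, topk=10):
    """Run searches from `concurrency` worker threads; return QPS, P95 and P99 latency (ms)."""
    latencies = []

    def one_search(vec):
        t0 = time.perf_counter()
        collection.search(
            data=[vec], anns_field="embedding",
            param={"metric_type": "L2", "params": {"ef": 64}}, limit=topk,
        )
        latencies.append((time.perf_counter() - t0) * 1000.0)  # list.append is thread-safe in CPython

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_search, query_vectors))
    elapsed = time.perf_counter() - start

    qps = len(query_vectors) / elapsed
    p95, p99 = np.percentile(latencies, [95, 99])
    return qps, p95, p99
```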
For detailed results, read Actionable Performance Tuning for vector databases.
In summary, the combination of a cloud native database and a cloud native observability stack makes it easy to deploy, run, and profile the database with benchmarks.




