
Deploying Similarity Search simplified with databases and cloud native stacks

Updated: Dec 1


The setup and benchmarking work described in this post was carried out by Raghu Shankar of Austin, TX, with consulting support from ADS. We welcome use case suggestions for performance evaluations.


Target audience: Organizations evaluating similarity search use cases for business advantage and database/IT professionals evaluating search databases.


Fast similarity search solves a real-world need: finding approximate nearest matches in large datasets extremely quickly. It is applied widely, including in large language models (LLMs) and other AI workloads, genomics, and product recommendations. Many emerging use cases will benefit from similarity search over large multi-modal datasets.


Traditional databases with vector extensions and native vector databases are the underlying engines for these fast, approximate similarity searches.
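
To make this concrete, the sketch below shows what such a query looks like through a native vector database client (pymilvus, the client for Milvus, which we deploy later in this post): insert embedding vectors, then ask for the top-k approximate nearest neighbors. It assumes a reachable Milvus instance at a hypothetical local URI; the collection name, dimensionality, and data are illustrative only.

```python
# Minimal similarity-search sketch with pymilvus (illustrative only).
# Assumes a Milvus instance at the hypothetical URI below; names and sizes are made up.
import random
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumed endpoint

# Create a small collection of 128-dimensional vectors using the default auto-generated schema.
client.create_collection(collection_name="demo_items", dimension=128)

# Insert random embeddings; real workloads insert model-generated vectors.
client.insert(
    collection_name="demo_items",
    data=[{"id": i, "vector": [random.random() for _ in range(128)]} for i in range(1000)],
)

# Ask for the 5 approximate nearest neighbors of a query vector.
query_vector = [random.random() for _ in range(128)]
results = client.search(collection_name="demo_items", data=[query_vector], limit=5)
print(results[0])  # hits for the first (and only) query vector
```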


Meanwhile, cloud native technologies enable building and running scalable applications in private and hybrid clouds. Containers, service meshes, microservices, and declarative resources exemplify this approach, enabling loosely coupled systems that are resilient, manageable, and observable.


Many databases are cloud native, which simplifies deployment and monitoring in Kubernetes environments. Importantly, select databases provide Kubernetes operators that make deploying and managing their resources simpler still: database resources can be configured, deployed, managed, and scaled like any native Kubernetes resource.

Selecting and tuning vector databases requires careful evaluation of use-case-specific criteria. We deployed one similarity search database, Milvus, on bare-metal Kubernetes alongside an observability stack. The database benchmarking and profiling results, combined with the observability tools, offer a powerful way to tune the system to meet the desired criteria.


Our cloud native stack with one cloud native vector database is shown in the diagram below. 

Cloud native stack

The bare-metal layer consists of server nodes, each with 40 cores, 64 GB of memory, and local disk storage, running Ubuntu and hosting the Kubernetes control plane and worker nodes.

eBPF is enabled via Cilium and Beyla. It implements observability, security, and networking functionality inside the Ubuntu kernel, where it runs with kernel privileges and can observe the entire system. eBPF allows kernel-level profiling driven from user space without application code changes.


Observability is enabled via 2 paths:

  1. Metrics via Cilium - Hubble - Prometheus - Grafana

  2. Traces via Beyla - Alloy - Tempo - Prometheus - Grafana 


Cilium is an open source, cloud native eBPF-based solution for providing, securing, and observing network connectivity for workloads.

Hubble is enabled via Cilium. It collects network flow and bare-metal metrics (Path #1) and forwards them to Prometheus.

Beyla: An OpenTelemetry (OTel) component that uses eBPF to capture traces for service graphs and span metrics (Path #2). Data capture occurs without any modifications to the cloud application, i.e., auto-instrumentation.

Alloy: Acts as a central data processor. It receives data from Beyla, processes it, and routes traces to Tempo and span metrics to Prometheus.

Prometheus: Collects and stores metrics in a time-series database, including flow metrics from Hubble and span metrics from Alloy.

Tempo: The dedicated tracing system that receives and manages traces exported by Alloy. Local storage was deployed for Tempo.

Grafana: The primary visualization platform used to display metrics (CPU, memory, network flows, and L7 flows). Grafana also builds service graphs using span metrics from Prometheus and traces from Tempo.
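
As a simple illustration of how these metrics can be consumed outside Grafana, the sketch below queries Prometheus over its HTTP API for a Hubble flow counter. It assumes Prometheus has been port-forwarded to a local port; the metric name is only an example and depends on which Hubble metrics are enabled.

```python
# Minimal sketch: query Prometheus' HTTP API for a metric collected via Path #1.
# Assumes Prometheus is reachable locally (e.g. via kubectl port-forward); the metric
# name below is an example Hubble flow counter and depends on the Hubble configuration.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"  # assumed local endpoint
QUERY = "sum(rate(hubble_flows_processed_total[5m]))"

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```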



Similarity search databases 

Depending on the use case and phase of development, similarity search can be built on:

  • SQL and NoSQL databases with vector support

  • Native “dedicated” vector databases

These come in many flavors: open source, proprietary, cloud, and on-prem. For overviews of vector databases, see Vector Databases - Understanding the Internals and 5 Surprising Facts about Vector Databases powering modern AI.


As a first step, we deployed the cloud native vector database Milvus and its Kubernetes operator from Zilliz Tech. The operator runs as a pod in its own namespace.

Milvus can then be deployed in standalone or distributed mode using either Helm charts or YAML files.
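
Because the operator exposes Milvus as a Kubernetes custom resource, its state can be inspected with standard Kubernetes tooling. The sketch below uses the kubernetes Python client; the group, version, plural, namespace, and status layout reflect our understanding of the Zilliz operator's CRD and should be treated as assumptions.

```python
# Minimal sketch: list Milvus custom resources managed by the operator.
# The group/version/plural and the status field layout are assumptions about the
# Zilliz Milvus operator's CRD; the "milvus" namespace is an example.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

milvuses = api.list_namespaced_custom_object(
    group="milvus.io", version="v1beta1", namespace="milvus", plural="milvuses"
)
for item in milvuses.get("items", []):
    name = item["metadata"]["name"]
    status = item.get("status", {}).get("status", "unknown")
    print(f"{name}: {status}")
```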

In standalone mode, persistent storage needs to be set up for Milvus, MinIO object storage, and logs. These can be provisioned from Kubernetes. We set up storage classes in Kubernetes via an auto-provisioner (Rancher). The Milvus Helm files then request persistent volume claims (PVCs) pointing to those storage classes; applying the Helm files binds the claims to the storage classes. The resulting volumes and claims can be monitored from Kubernetes.
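
For example, the bound claims can be checked programmatically with the kubernetes Python client (a minimal sketch; the "milvus" namespace is an assumption about where Milvus was deployed):

```python
# Minimal sketch: list the persistent volume claims created for Milvus, MinIO, and logs.
# Assumes the kubernetes Python client and that Milvus was deployed in the "milvus" namespace.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pvc in v1.list_namespaced_persistent_volume_claim("milvus").items:
    print(
        pvc.metadata.name,
        pvc.status.phase,                     # e.g. "Bound" once the claim binds
        pvc.spec.storage_class_name,          # storage class from the auto-provisioner
        pvc.spec.resources.requests.get("storage"),
    )
```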


The Helm deployment also creates 3 pods: standalone Milvus, etcd, and MinIO. Milvus uses etcd to store metadata snapshots of collection schemas and message consumption checkpoints with high availability, strong consistency, and transaction support. Milvus uses MinIO to store snapshots of logs, index files for scalar and vector data, and intermediate query results. These pods can be managed, monitored, and scaled like any other Kubernetes resource.
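
A quick way to confirm the 3 pods are up is to list them like any other Kubernetes workload (a sketch using the same namespace assumption as above):

```python
# Minimal sketch: check the Milvus, etcd, and MinIO pods created by the Helm deployment.
# Assumes the "milvus" namespace used in the earlier sketches.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("milvus").items:
    ready = all(cs.ready for cs in (pod.status.container_statuses or []))
    print(pod.metadata.name, pod.status.phase, "ready" if ready else "not ready")
```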


Zilliz Tech also provides VectorDBBench, which can be run against many supported databases with a range of dataset sizes, search algorithms, and test cases. Profiling and benchmarking will be covered in the next part of this series, Profiling Similarity Search databases to match use cases and criteria. Areas to investigate include vectorization and quantization of datasets.


In summary, similarity search on cloud native stacks provides 4 key advantages with minimal incremental effort:

  • Consistent and faster deployment using cloud native methods

  • Managing and scaling database resources to match use case criteria

  • Database-related performance metrics

  • Metrics and traces of cloud native resources through the observability stack




