Vector Quantization
Weaviate’s Guide to
VECTOR
QUANTIZATION
Data accumulating.
Retrieval slowing.
Costs skyrocketing.
Sound familiar?
Qendel AI
We are in the same boat.
VECTOR
QUANTIZATION
in 2 ways
1
Product
Quantization
(PQ)
Patrick Middleton
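What PQ does
Compresses your vector embeddings by
splitting each vector into smaller segments
and replacing each segment with the ID of
its nearest centroid from a learned codebook,
so each segment typically takes a single byte.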
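The segment-and-codebook idea can be sketched in plain Python. This is a generic, textbook product quantizer (one k-means codebook per segment, one byte per segment code), not Weaviate's implementation; the helper names `train_pq`, `encode`, and `decode`, the tiny 16-dimensional vectors, and the 16-centroid codebooks are illustrative assumptions chosen so the sketch runs quickly.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, segments, centroids_per_segment=256):
    """Hypothetical helper: learn one codebook per segment."""
    dim = len(vectors[0])
    assert dim % segments == 0, "dimensions must split evenly into segments"
    assert centroids_per_segment <= 256, "codes must fit in one byte"
    seg = dim // segments
    return [kmeans([v[s * seg:(s + 1) * seg] for v in vectors], centroids_per_segment)
            for s in range(segments)]

def encode(vector, codebooks):
    """Replace each segment with the index of its nearest centroid (1 byte each)."""
    seg = len(codebooks[0][0])
    return bytes(nearest(vector[s * seg:(s + 1) * seg], book)
                 for s, book in enumerate(codebooks))

def decode(code, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [x for s, c in enumerate(code) for x in codebooks[s][c]]

# Usage: 200 random 16-dim vectors, 4 segments, 16 centroids per segment.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
books = train_pq(data, segments=4, centroids_per_segment=16)
code = encode(data[0], books)
print(len(code))  # 4 bytes instead of 16 float32 values (64 bytes)
```

The decoded vector is only an approximation of the original, which is where the small loss in recall comes from.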
PQ Benefits
Reduces vector memory usage by almost 24x
(the exact ratio depends on how many
segments you choose) while maintaining a
balance between performance and recall.
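The arithmetic behind a figure like 24x can be checked with a quick calculation. This sketch rests on assumptions the guide does not state: float32 embeddings with 768 dimensions, 128 segments, 256 centroids per segment (so each code is one byte), and one million vectors. It counts only the vectors and the shared codebook, not the HNSW graph. The function name `pq_memory_ratio` is hypothetical.

```python
def pq_memory_ratio(num_vectors, dims, segments, centroids=256, bytes_per_float=4):
    """Raw float vector memory divided by PQ memory (codes + shared codebook)."""
    assert centroids <= 256, "assumes each segment code fits in one byte"
    raw = num_vectors * dims * bytes_per_float
    codes = num_vectors * segments  # one byte per segment per vector
    # Every segment has `centroids` sub-vectors of length dims / segments,
    # so all codebooks together hold centroids * dims floats.
    codebook = centroids * dims * bytes_per_float
    return raw / (codes + codebook)

# Per vector: 768 * 4 = 3072 bytes shrink to 128 bytes, exactly 24x.
print(768 * 4 / 128)  # 24.0
# With the codebook overhead included, the overall ratio is just under 24x.
print(round(pq_memory_ratio(1_000_000, 768, 128), 2))  # 23.85
```

Keeping more segments tends to preserve more recall but saves less: with 256 segments, the same 3072-byte vector shrinks only 12x.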
Best for
Those who use HNSW indexes and need a
fine balance between speed and
accuracy.
2
Binary
Quantization
(BQ)
What BQ does
Converts each vector into a binary
format, storing one bit per dimension
(1 if the value is positive, 0 otherwise),
which shrinks each dimension from 4 bytes
(a 32-bit float) to a single bit.
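A minimal sketch of that conversion in plain Python, assuming the common sign rule (a dimension becomes 1 when its value is positive) and packing eight dimensions into each byte. The `binarize` helper is hypothetical and is not Weaviate's code.

```python
def binarize(vector):
    """Pack a float vector into bytes: bit i is 1 if vector[i] > 0, else 0."""
    packed = bytearray((len(vector) + 7) // 8)  # one bit per dimension, rounded up to whole bytes
    for i, value in enumerate(vector):
        if value > 0:
            packed[i // 8] |= 1 << (i % 8)  # set bit (i mod 8) of byte (i div 8)
    return bytes(packed)

code = binarize([0.3, -1.2, 0.0, 2.5, -0.1, 0.7, 0.9, -0.4])
print(list(code))  # [105], i.e. bits 0, 3, 5 and 6 are set

# A 768-dim float32 vector (768 * 4 = 3072 bytes) becomes 96 bytes.
print(len(binarize([0.1] * 768)))  # 96
```

Note that a value of exactly 0.0 maps to 0 under this rule. The magnitude of each dimension is discarded entirely, which is why BQ loses more information than PQ.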
BQ Benefits
Achieves a 32x reduction in storage
requirements (32 bits per dimension down
to 1) and speeds up search, because
comparing binary codes takes only cheap
bitwise operations.
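Two pieces of arithmetic explain these benefits. First, a float32 dimension takes 32 bits and a binary one takes 1. Second, two binary codes can be compared by counting their differing bits (Hamming distance) with an XOR. The sketch below stores codes as Python integers for brevity. The names `to_bits` and `hamming` are hypothetical, and a production engine would more likely use packed machine words and hardware popcount.

```python
def to_bits(vector):
    """Encode a float vector as an int whose bit i is 1 when vector[i] > 0."""
    code = 0
    for i, value in enumerate(vector):
        if value > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of dimensions whose bits differ: XOR, then count the 1s."""
    return bin(a ^ b).count("1")

dims = 768
print((dims * 32) // (dims * 1))  # 32: bits per vector before vs after BQ

query = to_bits([0.5, -0.2, 0.1, -0.9])
close = to_bits([0.4, -0.1, 0.3, -0.5])  # same signs as the query
far = to_bits([-0.4, 0.1, 0.3, 0.5])     # three signs flipped
print(hamming(query, close), hamming(query, far))  # 0 3
```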
Best for
Projects where speed is critical, and
slight compromises on accuracy are
acceptable.
📚 Trade-offs
PQ might slightly reduce recall while
cutting memory almost 24x. BQ saves even
more memory (32x) and searches faster,
but can give up more accuracy.
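One way to see the recall side of this trade-off is to compare compressed search against exact search and measure recall@k. The sketch below does this for BQ on random vectors. These vectors stand in for real embeddings, so the numbers it prints say nothing about any particular model. It also over-fetches candidates by Hamming distance and then re-ranks them with the full vectors, a common mitigation often called rescoring. All names and parameters (`bq_recall_at_k`, `oversample`) are hypothetical.

```python
import math
import random

def normalize(v):
    """Scale to unit length so the dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_bits(vector):
    """BQ code as an int: bit i is 1 when vector[i] > 0."""
    code = 0
    for i, value in enumerate(vector):
        if value > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    return bin(a ^ b).count("1")

def bq_recall_at_k(data, queries, k=10, oversample=1):
    """Fraction of the exact top-k neighbours that BQ search recovers."""
    codes = [to_bits(v) for v in data]
    ids = range(len(data))
    hits = 0
    for q in queries:
        exact = sorted(ids, key=lambda i: -dot(q, data[i]))[:k]
        q_code = to_bits(q)
        # Over-fetch k * oversample candidates by Hamming distance ...
        candidates = sorted(ids, key=lambda i: hamming(q_code, codes[i]))[:k * oversample]
        # ... then re-rank them with the full-precision vectors.
        approx = sorted(candidates, key=lambda i: -dot(q, data[i]))[:k]
        hits += len(set(exact) & set(approx))
    return hits / (k * len(queries))

rng = random.Random(0)
data = [normalize([rng.gauss(0, 1) for _ in range(64)]) for _ in range(1000)]
queries = [normalize([rng.gauss(0, 1) for _ in range(64)]) for _ in range(20)]
plain = bq_recall_at_k(data, queries, k=10, oversample=1)
rescored = bq_recall_at_k(data, queries, k=10, oversample=4)
print(f"recall@10 plain BQ: {plain:.2f}, with 4x rescoring: {rescored:.2f}")
```

Because the larger candidate list always contains the smaller one, rescoring can only keep or raise recall. The price is extra full-precision distance computations per query.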