
Vector search and state-of-the-art retrieval for generative AI apps

Pamela Fox
Principal Cloud Advocate (Python)
Agenda
• Retrieval-augmented generation (RAG)
• Vectors and vector databases
• State-of-the-art retrieval with Azure AI Search
• Data and platform integrations
• Use cases
Retrieval-augmented generation (RAG)

The limitations of LLMs

Outdated public knowledge

No internal knowledge
Incorporating domain knowledge

• Prompt engineering: in-context learning
• Fine tuning: learn new skills (permanently)
• Retrieval augmentation: learn new facts (temporarily)
The benefit of RAG
Up-to-date public knowledge

Access to internal knowledge


RAG – Retrieval Augmented Generation

[Diagram: user question → search → retrieved document → large language model → grounded answer]

User question: "Do my company perks cover underwater activities?"

Retrieved document (PerksPlus.pdf#page=2): "Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding lessons. These lessons provide employees with the opportunity to try new things, challenge themselves, and improve their physical skills. …"

LLM answer: "Yes, your company perks cover underwater activities such as scuba diving lessons [1]"
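To make the flow concrete, here is a minimal sketch of the two RAG steps (retrieve, then generate) in Python. It assumes an Azure AI Search index with hypothetical "content" and "sourcepage" fields plus an Azure OpenAI chat deployment; the endpoints, keys, and names are placeholders, not the demo's actual code.

# Minimal RAG sketch: retrieve passages, then ask the model to answer from them.
# Index fields ("content", "sourcepage"), endpoints, keys, and deployment names
# are assumptions for illustration only.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="perks-index",                      # hypothetical index name
    credential=AzureKeyCredential("<search-key>"))
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<openai-key>",
    api_version="2024-02-01")

question = "Do my company perks cover underwater activities?"

# 1. Retrieve: find passages related to the question.
results = search_client.search(question, top=3)
sources = "\n".join(f"{doc['sourcepage']}: {doc['content']}" for doc in results)

# 2. Generate: answer grounded in the retrieved sources.
response = openai_client.chat.completions.create(
    model="gpt-35-turbo",                          # your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using ONLY the provided sources."},
        {"role": "user", "content": f"{question}\n\nSources:\n{sources}"},
    ])
print(response.choices[0].message.content)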
Robust retrieval for RAG apps
• Responses only as good as retrieved data
• Keyword search recall challenges
  • The "vocabulary gap"
  • Gets worse with natural language questions
• Vector-based retrieval finds documents by semantic similarity
  • Robust to variation in how concepts are articulated (word choices, morphology, specificity, etc.)

Example
Question: "Looking for lessons on underwater activities"
Won't match: "Scuba classes", "Snorkeling group sessions"
Vectors and vector databases
Vector embeddings
An embedding encodes an input as a list of floating-point numbers.

”dog” → [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245,…]

Different models output different embeddings, with varying lengths.


Model                           Encodes                    Vector length
word2vec                        words                      300
SBERT (Sentence-Transformers)   text (up to ~400 words)    768
OpenAI ada-002                  text (up to 8191 tokens)   1536
Azure Computer Vision           image or text              1024
…and many more models!

Demo: Compute a vector with ada-002 (aka.ms/aitour/vectors)
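As a sketch of what the demo does, computing an ada-002 embedding with the OpenAI Python SDK looks roughly like this; the model name and environment setup are assumptions, not the demo's exact code.

# Minimal sketch: compute an embedding for a short text with OpenAI ada-002.
# Assumes the `openai` Python package (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="dog")

vector = response.data[0].embedding  # list of 1536 floats
print(len(vector), vector[:6])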


Vector similarity
We compute embeddings so that we can calculate similarity between inputs.
The most common distance measurement is cosine similarity.

def cosine_sim(a, b):
    return dot(a, b) / (mag(a) * mag(b))

Similar: θ near 0, cos(θ) near 1
Orthogonal: θ near 90, cos(θ) near 0
Opposite: θ near 180, cos(θ) near -1

*For ada-002, cos(θ) values range from 0.7-1

Demo: Compare vectors with cosine similarity (aka.ms/aitour/vectors)


Demo: Vector Embeddings Comparison (aka.ms/aitour/vector-similarity)
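A runnable version of the cosine similarity pseudocode above, as a small sketch with NumPy (the dot/mag helpers from the slide are spelled out here):

# Runnable cosine similarity, filling in the dot()/mag() helpers from the slide.
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (real embeddings would come from a model such as ada-002):
print(cosine_sim([1.0, 0.0], [1.0, 0.0]))   # 1.0  -> similar
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))   # 0.0  -> orthogonal
print(cosine_sim([1.0, 0.0], [-1.0, 0.0]))  # -1.0 -> opposite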
Vector search
1. Compute the embedding vector for the query
2. Find the K closest vectors for the query vector
   • Search exhaustively or using approximations

[Diagram: query → compute embedding → query vector → search existing vectors → K closest vectors]

Example: "tortoise" → OpenAI ada-002 → [-0.003335318, -0.0176891904, …] → search existing vectors (e.g., ["snake", [-0.122, ..]], ["frog", [-0.045, ..]]) → K closest vectors

Demo: Search vectors with query vector (aka.ms/aitour/vectors)
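A brute-force sketch of those two steps over a tiny in-memory collection; a real application would delegate step 2 to a vector database or Azure AI Search, and the word list here is only illustrative.

# Brute-force "find K closest vectors" sketch, reusing the embedding and
# cosine similarity helpers shown earlier.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text):
    return client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding

def cosine_sim(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Compute the embedding vector for the query.
query_vector = embed("tortoise")

# 2. Find the K closest existing vectors (exhaustive search).
existing = {word: embed(word) for word in ["snake", "frog", "bicycle"]}
k = 2
closest = sorted(existing, key=lambda w: cosine_sim(query_vector, existing[w]), reverse=True)[:k]
print(closest)  # most semantically similar stored items first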


Vector databases

• Durably store and index vectors and metadata at scale
• Various indexing & retrieval strategies
• Combine vector queries with metadata filters
• Enable access control

PostgreSQL with pgvector example:

CREATE EXTENSION vector;

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(1536));

INSERT INTO items (embedding) VALUES
  ('[0.0014701404143124819, 0.0034404152538627386, -0.012805989943444729,...]');

SELECT * FROM items
  ORDER BY embedding <=> '[-0.01266181, -0.0279284,...]'
  LIMIT 5;

CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
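One way to run the similarity query above from Python is sketched below, assuming the psycopg2 driver and a pgvector-enabled database; the connection string is a placeholder, and the query vector is passed as a bracketed string literal cast to the vector type (the pgvector Python package also offers adapters for this).

# Sketch: similarity query against pgvector from Python with psycopg2.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection string
query_vector = [-0.01266181, -0.0279284]            # normally 1536 floats from ada-002
vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 5",
        (vector_literal,))
    print(cur.fetchall())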
Vector databases in Azure

Vectors in Azure databases
• Keep your data where it is: native vector search capabilities
• Built into Azure Cosmos DB for MongoDB vCore and Azure Cosmos DB for PostgreSQL

Azure AI Search*
• Best relevance: highest quality of results out of the box
• Automatically index data from Azure data sources: SQL DB, Cosmos DB, Blob Storage, ADLSv2, and more
• Feature-rich, enterprise-ready vector database
• Data and platform integration
• State-of-the-art retrieval system

*Previously known as Azure Cognitive Search

Azure AI Search

• Feature-rich vector database
• Ingest any data type, from any source
• Seamless data & platform integrations
• State-of-the-art search ranking
• Enterprise-ready foundation

Generally available: Vector search, Semantic ranker
Public preview: Azure AI Search in Azure AI Studio, Integrated vectorization
Vector search in Azure AI Search
Feature-rich, enterprise-ready

Vector search in Azure AI Search (Generally available)
• Comprehensive vector search solution
• Enterprise-ready → scalability, security and compliance
• Integrated with Semantic Kernel, LangChain, LlamaIndex, Azure OpenAI Service, Azure AI Studio, and more

Demo: Azure AI Search with vectors (aka.ms/aitour/azure-search)
Vector search strategies

ANN search
• ANN = Approximate Nearest Neighbors
• Fast vector search at scale
• Uses HNSW, a graph method with an excellent performance-recall profile
• Fine control over index parameters

r = search_client.search(
    None,
    top=5,
    vector_queries=[VectorizedQuery(
        vector=search_vector,
        k_nearest_neighbors=5,
        fields="embedding")])

Exhaustive KNN search
• KNN = K Nearest Neighbors
• Per-query or built into schema
• Useful to create recall baselines
• Scenarios with highly selective filters, e.g., dense multi-tenant apps

r = search_client.search(
    None,
    top=5,
    vector_queries=[VectorizedQuery(
        vector=search_vector,
        k_nearest_neighbors=5,
        fields="embedding",
        exhaustive=True)])
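The snippets above assume a ready-made search_client and query vector; a minimal setup sketch with the azure-search-documents Python SDK might look like the following. The endpoint, key, index name, and the "id" field are placeholders, and the embedding call reuses the ada-002 example from earlier.

# Sketch: creating the SearchClient and query vector used by the snippets above.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import OpenAI

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"))

openai_client = OpenAI()
search_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input="underwater activities").data[0].embedding

r = search_client.search(
    None,
    top=5,
    vector_queries=[VectorizedQuery(
        vector=search_vector,
        k_nearest_neighbors=5,
        fields="embedding")])
for doc in r:
    print(doc["id"])  # assumes the index defines an "id" field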
Rich vector search query capabilities

Filtered vector search
• Scope to date ranges, categories, geographic distances, access control groups, etc.
• Rich filter expressions
• Pre-/post-filtering
  • Pre-filter: great for selective filters, no recall disruption
  • Post-filter: better for low-selectivity filters, but watch for empty results

r = search_client.search(
    None,
    top=5,
    vector_queries=[VectorizedQuery(
        vector=query_vector,
        k_nearest_neighbors=5,
        fields="embedding")],
    vector_filter_mode=VectorFilterMode.PRE_FILTER,
    filter="tag eq 'perks' and created gt 2023-11-15T00:00:00Z")

https://learn.microsoft.com/azure/search/vector-search-filters

Multi-vector scenarios
• Multiple vector fields per document
• Multi-vector queries
• Can mix and match as needed

r = search_client.search(
    None,
    top=5,
    vector_queries=[
        VectorizedQuery(
            vector=query1, fields="body_vector",
            k_nearest_neighbors=5),
        VectorizedQuery(
            vector=query2, fields="title_vector",
            k_nearest_neighbors=5)
    ])
Enterprise-ready vector database

Data Encryption: including option for customer-managed encryption keys
Secure Authentication: managed identity and RBAC support
Network Isolation: private endpoints, virtual networks
Compliance Certifications: extensive certifications across finance, healthcare, government, etc.
Not just text

• Images, sounds, graphs, and more
• Multi-modal embeddings, e.g., images + sentences in Azure AI Vision
• Still vectors → vector search applies
• RAG with images using GPT-4 Turbo with Vision

Demo: Searching images (aka.ms/aitour/image-search)


Azure AI Search:
State-of-the-art retrieval system

Relevance
• Relevance is critical for RAG apps
• Lots of passages in the prompt → degraded quality
  → Can't only focus on recall
• Incorrect passages in the prompt → possibly well-grounded yet wrong answers
  → Helps to establish thresholds for "good enough" grounding data

[Chart: accuracy vs. number of documents in the input context (5-30); quality degrades as more documents are included]
Source: Lost in the Middle: How Language Models Use Long Contexts, Liu et al., arXiv:2307.03172
Improving relevance

All information retrieval tricks apply!

Complete search stacks do better:
• Hybrid retrieval (keywords + vectors) > pure-vector or keyword
• Hybrid + reranking > hybrid (see the RRF sketch below)

Identify good & bad candidates:
• Normalized scores from the semantic ranker
• Exclude documents below a threshold

[Diagram: keyword and vector results → fusion (RRF) → reranking]

Demo: Compare text, vector, hybrid, reranker (aka.ms/aitour/search-relevance)
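As a rough illustration of the fusion step, here is a generic Reciprocal Rank Fusion (RRF) sketch that merges a keyword result list with a vector result list; it shows the standard technique hybrid retrieval builds on, not Azure AI Search's internal implementation (which also applies the semantic ranker on top).

# Generic Reciprocal Rank Fusion: each document's score is the sum of
# 1 / (k + rank) over every result list it appears in.
from collections import defaultdict

def rrf_fuse(result_lists, k=60):
    """result_lists: iterable of ranked lists of document ids (best first)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc7", "doc2", "doc9"]   # from keyword (BM25) search
vector_results = ["doc2", "doc5", "doc7"]    # from vector similarity search
print(rrf_fuse([keyword_results, vector_results]))  # e.g. ['doc2', 'doc7', ...]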
SOTA re-ranking model

Semantic ranker* (Generally available)
• Highest-performing retrieval mode
• New pay-as-you-go pricing: 1k free requests/month, $1 per additional 1k
• Multilingual capabilities
• Includes extractive answers, captions and ranking

*Formerly semantic search
Retrieval relevance across methods

[Bar chart: accuracy scores for Keyword, Vector (ada-002), Hybrid, and Hybrid + reranking retrieval modes across customer datasets, the Beir dataset, and the Miracl dataset]

Retrieval comparison using Azure AI Search in various retrieval modes on customer and academic benchmarks.
Source: Outperforming vector search with hybrid + reranking
Impact of query types on relevance

Query type                    Keyword    Vector     Hybrid     Hybrid + Semantic ranker
                              [NDCG@3]   [NDCG@3]   [NDCG@3]   [NDCG@3]
Concept seeking queries       39         45.8       46.3       59.6
Fact seeking queries          37.8       49         49.1       63.4
Exact snippet search          51.1       41.5       51         60.8
Web search-like queries       41.8       46.3       50         58.9
Keyword queries               79.2       11.7       61         66.9
Low query/doc term overlap    23         36.1       35.9       49.1
Queries with misspellings     28.8       39.1       40.6       54.6
Long queries                  42.7       41.6       48.1       59.4
Medium queries                38.1       44.7       46.7       59.9
Short queries                 53.1       38.8       53         63.9

Source: Outperforming vector search with hybrid + reranking
Azure AI Search:
Seamless Data and Platform Integrations

Data preparation for RAG applications

Chunking
• Split long-form text into short passages
  • LLM context length limits
  • Focused subset of the content
  • Multiple independent passages
• Basics (see the sketch below)
  • ~200-500 tokens per passage
  • Maintain lexical boundaries
  • Introduce overlap
• Layout
  • Layout information is valuable, e.g., tables

Vectorization
• Indexing-time: convert passages to vectors
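A minimal chunking sketch along these lines splits on sentence boundaries into roughly fixed-size passages with overlap. Token counts are approximated by word counts here; a production pipeline (such as integrated vectorization, below) would use a real tokenizer and layout-aware splitting.

# Minimal chunking sketch: ~N-"token" passages with sentence boundaries and overlap.
# Words approximate tokens; real pipelines use an actual tokenizer and handle layout.
import re

def chunk_text(text, max_tokens=500, overlap_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, new_in_current = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        new_in_current += 1
        if sum(len(s.split()) for s in current) >= max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # carry overlap into the next passage
            new_in_current = 0
    if new_in_current:
        chunks.append(" ".join(current))
    return chunks

sample = "PerksPlus covers skiing lessons. It also covers scuba diving lessons. " * 100
passages = chunk_text(sample)
print(len(passages), len(passages[0].split()))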

Example: Data preparation process


Integrated vectorization (In preview)
End-to-end data processing tailored to RAG

Data source access → file format cracking → chunking → vectorization → indexing

• Data source access: Blob Storage, ADLSv2, SQL DB, Cosmos DB, … (+ incremental change tracking)
• File format cracking: PDFs, Office documents, JSON files, … (+ extract images and text, OCR as needed)
• Chunking: split text into passages, propagate document metadata
• Vectorization: turn chunks into vectors (OpenAI embeddings or your custom model)
• Indexing: document index, chunk index, or both

https://learn.microsoft.com/azure/search/vector-search-integrated-vectorization
Azure AI Studio & Azure AI SDK

• First-class integration
• Build indexes from data in Blob Storage, Microsoft Fabric, etc.
• Attach to existing Azure AI Search indexes
Use cases

Example uses
Developers have used Azure AI Search to create RAG apps for…
• Public government data
• Internal HR documents, company meetings, presentations
• Customer support requests and call transcripts
• Technical documentation and issue trackers
• Product manuals
Next steps

• Learn more about Azure AI Search
  https://aka.ms/AzureAISearch

• Dig deeper into quality evaluation details and why Azure AI Search will make your application generate better results
  https://aka.ms/ragrelevance

• Deploy a RAG chat application for your organization's data
  https://aka.ms/azai/python

• Explore Azure AI Studio for a complete RAG development experience
  https://aka.ms/AzureAIStudio
Join us to learn together!

Today's workshops: Developing a production-level RAG workflow
• 12:00-1:15pm
• 2:15-3:30pm
• Build a RAG workflow with Prompt Flow, Azure AI Studio, Azure AI Search, Cosmos DB and Azure OpenAI

Upcoming virtual event: aka.ms/hacktogether/chatapp

See you there!
