01 Create An Azure AI Search Solution
Introduction
• The problem for most organizations is not a lack of information, but the
challenge of finding and extracting the information from the massive set of
documents, databases, and other sources in which the information is stored.
• This data is a valuable source of insights for travel agents and customers as
they plan trips, but the sheer volume of data can make it difficult to find
relevant information to answer a specific customer question.
• This solution enables agents and customers to query the index to find relevant
documents and extract information from them.
Manage capacity
Azure AI Search
• Azure AI Search provides a cloud-based solution for indexing and querying a
wide range of data sources, and creating comprehensive and high-scale search
solutions.
• Depending on the specific solution you intend to build, you may also need
Azure resources for data storage and other application services.
• The pricing tier you select determines the capacity limitations of your search
service and the configuration options available to you, as well as the cost of
the service.
• Basic (B): Use this tier for small-scale search solutions that include a
maximum of 15 indexes and 2 GB of index data.
• Standard (S): Use this tier for enterprise-scale solutions. There are multiple
variants of this tier, including S, S2, and S3, which offer increasing capacity in
terms of indexes and storage, and S3HD, which is optimized for fast read
performance on smaller numbers of indexes.
• Storage Optimized (L): Use a storage optimized tier (L1 or L2) when you
need to create large indexes, at the cost of higher query latency.
Manage capacity
Replicas and partitions
• Depending on the pricing tier you select, you can optimize your solution for
scalability and availability by creating replicas and partitions.
• Replicas are instances of the search service - you can think of them as
nodes in a cluster. Increasing the number of replicas can help ensure there
is sufficient capacity to service multiple concurrent query requests while
managing ongoing indexing operations.
• Put simply, the number of search units is the number of replicas multiplied by
the number of partitions (R x P = SU).
• For example, a resource with four replicas and three partitions is using 12
search units.
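• The R x P = SU rule can be sketched in a few lines of Python. The function name search_units is an illustrative assumption, not part of any Azure SDK.

```python
def search_units(replicas: int, partitions: int) -> int:
    """Return the search units (SU) consumed: replicas multiplied by partitions."""
    if replicas < 1 or partitions < 1:
        raise ValueError("replicas and partitions must both be at least 1")
    return replicas * partitions


# Four replicas and three partitions, as in the example above.
units = search_units(4, 3)
print(units)
```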
Understand search
components
• An AI Search solution consists of multiple components, each playing an
important part in the process of extracting, enriching, indexing, and searching
data.
Data source
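• Most search solutions start with a data source containing the data you want
to search.
• Azure AI Search supports multiple types of data source, including:
  o Unstructured files in Azure Blob Storage containers
  o Tables in Azure SQL Database
  o Documents in Azure Cosmos DB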
• Azure AI Search can pull data from these data sources for indexing.
• Alternatively, applications can push JSON data directly into an index, without
pulling it from an existing data store.
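• As a rough sketch of the push model, the Python below builds the JSON batch that an application would send to the index. The field names (id, author, content) and the sample values are hypothetical; the "value" array and "@search.action" property follow the shape of the REST request body for adding documents.

```python
import json


def build_upload_batch(documents):
    """Wrap plain dictionaries in the batch shape used to push documents."""
    return {"value": [{"@search.action": "upload", **doc} for doc in documents]}


# Hypothetical documents; a real index defines its own key and fields.
batch = build_upload_batch([
    {"id": "1", "author": "Reviewer", "content": "A hotel review."},
    {"id": "2", "author": "Editor", "content": "A destination guide."},
])
print(json.dumps(batch, indent=2))
```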
Understand search
components
Skillset
• In a basic search solution, you might index the data extracted from the data
source.
• For example, when indexing data in a database, the fields in the database
tables might be extracted; or when indexing a set of documents, file metadata
such as file name, modified date, size, and author might be extracted along
with the text content of the document.
• While a basic search solution that indexes data values extracted directly from
the data source can be useful, the expectations of modern application users
have driven a need for richer insights into the data.
• In Azure AI Search, you can apply artificial intelligence (AI) skills as part of the
indexing process to enrich the source data with new information, which can be
mapped to index fields.
• The indexer is the engine that drives the overall indexing process.
• It takes the outputs extracted using the skills in the skillset, along with the
data and metadata values extracted from the original data source, and
maps them to fields in the index.
• In some cases, such as when you add new fields to an index or new skills
to a skillset, you may need to reset the index before re-running the
indexer.
Understand search
components
Index
• Client applications can query the index to retrieve, filter, and sort
information.
• Each index field can be configured with the following attributes:
• facetable: Fields that can be used to determine values for facets (user
interface elements used to filter the results based on a list of known field
values).
• retrievable: Fields that can be included in search results (by default, all
fields are retrievable unless this attribute is explicitly removed).
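• A minimal sketch of how field attributes might appear in an index definition, written as a Python dictionary. The index name and the id key field are assumptions for illustration; the other field names are borrowed from examples elsewhere in this deck.

```python
index_definition = {
    "name": "travel-docs",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "author", "type": "Edm.String", "filterable": True,
         "facetable": True},
        {"name": "modified_date", "type": "Edm.DateTimeOffset",
         "sortable": True, "retrievable": True},
    ],
}


def fields_with(definition, attribute):
    """List the names of fields that have the given attribute switched on."""
    return [f["name"] for f in definition["fields"] if f.get(attribute)]


print(fields_with(index_definition, "facetable"))
```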
Understand the indexing
process
• The indexing process works by creating a document for each indexed entity.
• You can think of each indexed document as a JSON structure, which initially
consists of a document with the index fields you have mapped to fields
extracted directly from the source data, like this:
• document
o metadata_storage_name
o metadata_author
o content
• When the documents in the data source contain images, you can configure the
indexer to extract the image data and place each image in a
normalized_images collection, like this:
• document
o metadata_storage_name
o metadata_author
o content
o normalized_images
  o image0
  o image1
• Normalizing the image data in this way enables you to use the collection of
images as an input for skills that extract information from image data.
• Each skill adds fields to the document, so for example a skill that detects the
language in which a document is written might store its output in a language
field, like this:
• document
o metadata_storage_name
o metadata_author
o content
o normalized_images
  o image0
  o image1
o language
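• A language detection step like the one described above could be declared in a skillset roughly as follows. The skillset name is hypothetical; the skill type, its text input, and its languageCode output are those of the built-in language detection skill, with targetName choosing the language field in the document structure.

```python
language_skill = {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "context": "/document",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "languageCode", "targetName": "language"}],
}

skillset = {
    "name": "demo-skillset",  # hypothetical name
    "skills": [language_skill],
}


def output_paths(skill):
    """Return where each skill output lands in the enriched document."""
    return [f'{skill["context"]}/{out["targetName"]}' for out in skill["outputs"]]


print(output_paths(language_skill))
```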
• The output fields from each skill can be used as inputs for other skills later in
the pipeline, which in turn store their outputs in the document structure. For
example, we could use a merge skill to combine the original text content with
the text extracted from each image to create a new merged_content field that
contains all of the text in the document, including image text.
• document
o metadata_storage_name
o metadata_author
o content
o normalized_images
  o image0
    o text
  o image1
    o text
o language
o merged_content
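• The chaining described above might be declared with two built-in skills: an OCR skill that runs once per normalized image, and a merge skill that consumes its output. This is a sketch of the skill definitions only, not a complete skillset, and the small consistency check at the end is purely illustrative.

```python
ocr_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",  # runs once per image
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}

merge_skill = {
    "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
    "context": "/document",
    "inputs": [
        {"name": "text", "source": "/document/content"},
        {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
        {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
    ],
    "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
}

# The OCR output path must match one of the merge skill's input sources.
ocr_output = f'{ocr_skill["context"]}/{ocr_skill["outputs"][0]["targetName"]}'
merge_sources = [i["source"] for i in merge_skill["inputs"]]
print(ocr_output in merge_sources)
```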
• The fields in the final document structure at the end of the pipeline are
mapped to index fields by the indexer in one of two ways:
1. Fields extracted directly from the source data are all mapped to index
fields. These mappings can be implicit (fields are automatically mapped to
fields with the same name in the index) or explicit (a mapping is
defined to match a source field to an index field, often to rename the field
to something more useful or to apply a function to the data value as it is
mapped).
2. Output fields from the skills in the skillset are explicitly mapped from their
hierarchical location in the output to the target field in the index.
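• The two kinds of mapping can be sketched in an indexer definition like the one below. All resource names (demo-indexer, demo-datasource, travel-docs, demo-skillset) and the target fields id and file_name are hypothetical. The fieldMappings entries rename source fields, one of them applying the base64Encode mapping function, while outputFieldMappings map skill outputs from their location in the document structure.

```python
indexer = {
    "name": "demo-indexer",
    "dataSourceName": "demo-datasource",
    "targetIndexName": "travel-docs",
    "skillsetName": "demo-skillset",
    "parameters": {
        # Ask the indexer to extract images into normalized_images.
        "configuration": {"imageAction": "generateNormalizedImages"}
    },
    # Explicit mappings for fields extracted directly from the source.
    "fieldMappings": [
        {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id",
         "mappingFunction": {"name": "base64Encode"}},
        {"sourceFieldName": "metadata_storage_name", "targetFieldName": "file_name"},
    ],
    # Skill outputs are always mapped explicitly from their document path.
    "outputFieldMappings": [
        {"sourceFieldName": "/document/language", "targetFieldName": "language"},
        {"sourceFieldName": "/document/merged_content", "targetFieldName": "merged_content"},
    ],
}

targets = [m["targetFieldName"] for m in indexer["outputFieldMappings"]]
print(targets)
```

Source fields that are not listed in fieldMappings, such as content, would still be mapped implicitly if the index has a field with the same name.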
Search an index
• After you have created and populated an index, you can query it to search for
information in the indexed document content.
• While you could retrieve index entries based on simple field value matching,
most search solutions use full text search semantics to query an index.
• Full text search queries in Azure AI Search are based on the Lucene query
syntax, which provides a rich set of query operations for searching, filtering,
and sorting data in indexes.
• Azure AI Search supports both of these capabilities through the search query API.
Filtering results
• You can apply filters to queries in two ways:
• You can achieve this result by submitting the following simple search expression:
• Alternatively, you can use an OData filter in a $filter parameter with a full Lucene
search expression like this:
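• As an illustration of the OData approach, the sketch below builds a search request body in Python. The search term London and the author value Reviewer are hypothetical; search, queryType, and filter are request body properties of the search query API.

```python
import json

query = {
    "search": "London",                  # full text search expression
    "queryType": "full",                 # use the full Lucene syntax
    "filter": "author eq 'Reviewer'",    # OData filter on a filterable field
}

print(json.dumps(query))
```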
Apply filtering and sorting
Filtering with facets
• Facets are a useful way to present users with filtering criteria based on field values
in a result set.
• They work best when a field has a small number of discrete values that can be
displayed as links or options in the user interface.
• To use facets, you must specify facetable fields for which you want to retrieve the
possible values in an initial query.
• For example, you could use the following parameters to return all of the possible
values for the author field:
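• A minimal sketch of a facet request for the author field, and of reading the facet values back. The sample response is invented for illustration; it mimics the "@search.facets" section that a response carries when facets are requested.

```python
def facet_request(field):
    """Build a query that matches everything and asks for one facet."""
    return {"search": "*", "facets": [field]}


def facet_values(response, field):
    """Pull the discrete facet values for a field out of a response."""
    entries = response.get("@search.facets", {}).get(field, [])
    return [entry["value"] for entry in entries]


# Hypothetical response, trimmed to the parts used here.
sample_response = {
    "@search.facets": {
        "author": [{"value": "Reviewer", "count": 3},
                   {"value": "Editor", "count": 1}]
    },
    "value": [],
}

print(facet_request("author"))
print(facet_values(sample_response, "author"))
```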
• The results from this query include a collection of discrete facet values that you can
display in the user interface for the user to select.
• Then in a subsequent query, you can use the selected facet value to filter the
results:
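• One way to turn a selected facet value into a filter is sketched below. The helper names are assumptions; the quote handling follows the OData convention of doubling a single quote inside a string literal, which matters because facet values come from user-visible data.

```python
def odata_string(value):
    """Quote a string for OData, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def facet_filter_query(field, selected_value):
    """Build a follow-up query filtered to the facet value the user chose."""
    return {"search": "*", "filter": f"{field} eq {odata_string(selected_value)}"}


print(facet_filter_query("author", "Reviewer"))
print(facet_filter_query("author", "O'Brien"))
```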
Apply filtering and sorting
Sorting results
• By default, results are sorted based on the relevancy score assigned by the query
process, with the highest scoring matches listed first.
• However, you can override this sort order by including an OData orderby parameter
that specifies one or more sortable fields and a sort order (asc or desc).
• For example, to sort the results so that the most recently modified documents are
listed first, you could use the following parameter values:
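• A sketch of building the sort expression in Python, assuming a sortable field named modified_date. The build_orderby helper is hypothetical; multiple clauses are joined with commas.

```python
def build_orderby(*clauses):
    """Join (field, direction) pairs into an orderby expression."""
    parts = []
    for field, direction in clauses:
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction: {direction}")
        parts.append(f"{field} {direction}")
    return ", ".join(parts)


query = {"search": "*", "orderby": build_orderby(("modified_date", "desc"))}
print(query)
```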
Enhance the index
• With a basic index and a client that can submit queries and display results, you can
achieve an effective search solution.
• This topic describes some of the ways in which you can extend your search solution.
Search-as-you-type
• By adding a suggester to an index, you can enable two forms of search-as-you-type
experience to help users find relevant results more easily:
• Suggestions - retrieve and display a list of suggested results as the user types
into the search box, without needing to submit the search query.
• Autocomplete - complete partial search terms based on values in index fields.
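• A suggester is declared in the index definition and then named in each suggestion request. The sketch below assumes a suggester called sg over hypothetical file_name and author fields, and a partial input of "lon".

```python
# Fragment of an index definition that declares the suggester.
suggesters = [{
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": ["file_name", "author"],
}]


def suggest_request(partial_text, suggester_name, top=5):
    """Build a suggestions request for the text typed so far."""
    return {"search": partial_text, "suggesterName": suggester_name, "top": top}


request = suggest_request("lon", suggesters[0]["name"])
print(request)
```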
• You can customize the way the relevancy score is calculated by defining a scoring profile that
applies a weighting value to specific fields - essentially increasing the search score
for documents when the search term is found in those fields.
• Additionally, you can boost results based on field values - for example, increasing
the relevancy score for documents based on how recently they were modified or
their file size.
• After you've defined a scoring profile, you can specify its use in an individual search,
or you can modify an index definition so that it uses your custom scoring profile by
default.
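• A scoring profile combining both ideas might look like the sketch below. The profile name, weights, boost value, and 30-day duration are illustrative assumptions, as is the file_name field; the text weights raise the score for matches in specific fields, and the freshness function boosts recently modified documents.

```python
scoring_profile = {
    "name": "boost-recent",
    "text": {"weights": {"file_name": 3, "content": 1}},
    "functions": [{
        "type": "freshness",
        "fieldName": "modified_date",
        "boost": 2,
        "interpolation": "linear",
        "freshness": {"boostingDuration": "P30D"},  # ISO 8601: 30 days
    }],
}

# Option 1: make the profile the default in the index definition.
index_fragment = {
    "scoringProfiles": [scoring_profile],
    "defaultScoringProfile": scoring_profile["name"],
}

# Option 2: name the profile in an individual search.
query = {"search": "London", "scoringProfile": scoring_profile["name"]}
print(query)
```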
Enhance the index
Synonyms
• Often, the same thing can be referred to in multiple ways. For example, someone
searching for information about the United Kingdom might use any of the following
terms:
• United Kingdom
• UK
• Great Britain*
• GB*
• *To be accurate, the UK and Great Britain are different entities - but they're
commonly confused with one another; so it's reasonable to assume that someone
searching for "United Kingdom" might be interested in results that reference "Great
Britain".
• To help users find the information they need, you can define synonym maps that link
related terms together. You can then apply those synonym maps to individual fields
in an index, so that when a user searches for a particular term, documents with
fields that contain the term or any of its synonyms will be included in the results.
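• As a sketch, the terms above could be declared as one equivalence rule in a synonym map and attached to a searchable field. The map name country-synonyms and the content field are assumptions; the comma-separated rule uses the solr format.

```python
terms = ["United Kingdom", "UK", "Great Britain", "GB"]

synonym_map = {
    "name": "country-synonyms",
    "format": "solr",
    "synonyms": ", ".join(terms),  # one rule: all terms are equivalent
}

# Field definition fragment that applies the map to a searchable field.
content_field = {
    "name": "content",
    "type": "Edm.String",
    "searchable": True,
    "synonymMaps": [synonym_map["name"]],
}

print(synonym_map["synonyms"])
```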
Exercise - Create a search
solution
Knowledge check
1. You want to find information in Microsoft Word documents that are stored in an Azure Storage
blob container. What should you do to ensure the files can be accessed by Azure AI Search?
a) Add a JSON file that defines an Azure AI Search index to the blob container
b) Enable anonymous access for the blob container
c) In an Azure AI Search resource, add a data source that references the container
where the files are stored
2. You are creating an index that includes a field named modified_date. You want to ensure that
the modified_date field can be included in search results. Which attribute must you apply to the
modified_date field in the index definition?
a) searchable
b) filterable
c) retrievable
3. You have created a data source and an index. What must you create to map the data values
in the data source to the fields in the index?
a) A synonym map
b) An indexer
c) A suggester
4. You want to create a search solution that uses a built-in AI skill to determine the language in
which each indexed document is written, and enrich the index with a field indicating the
language. Which kind of Azure AI Search object must you create?
a) Synonym map
b) Skillset
c) Scoring Profile
5. You want your search solution to show results in descending order of the file_size field value.
What is the simplest way to accomplish this goal?
a) Create a scoring profile that boosts results based on the file_size field
b) Make the file_size field facetable, and include a facet parameter that specifies
the file_size field in queries
c) Make the file_size field sortable, and include an orderby parameter that
specifies the file_size field in queries
6. You have created a search solution. Users want to be able to enter a partial search expression
and have the user interface automatically complete the input. What should you add to the index?
a) A suggester
b) A synonym map
c) A scoring profile