Azure search: Introduction (part 1 of 3)

Date posted
7 February 2019
Reading time
17 Minutes
James Taylor





<h2><strong>What is Azure Search?</strong></h2>



<p>Azure Search is a search-as-a-service cloud solution that gives developers an API for adding powerful search functionality to an application, without having to install or manage the underlying search technology. The service is managed and queried through a REST API, which hides most of that complexity.</p>



<p>To get started, you can use a free trial in the Azure portal to try out different configurations and solutions and see what Azure Search can do. Once the trial is set up, there are four main steps: creating the service, creating the index, loading the data and searching. I will explain these steps, along with example code, in the section 'Using Azure Search'.</p>



<p>A typical scenario is outlined in the diagram below. Azure Search is used alongside an application's database, which can populate and sync the data held in the search service. The data store can be either a relational database or a NoSQL database. Data can be pushed directly to Azure Search via the REST API, or pulled in by crawling your data store with indexers and data sources.</p>
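<p>As a rough sketch of the direct route, the snippet below builds (but does not send) a request that pushes a batch of documents to an index. The service name, index name, key and document fields are hypothetical placeholders, and the api-version is an assumption based on the REST API version that was current when this post was written.</p>

```python
import json
import urllib.request

# Hypothetical placeholders; replace with your own service details.
SERVICE_NAME = "my-search-service"
INDEX_NAME = "houses"
API_KEY = "<admin-api-key>"
API_VERSION = "2017-11-11"  # assumed REST API version


def build_upload_request(documents):
    """Build (without sending) a POST request that uploads documents."""
    url = (
        f"https://{SERVICE_NAME}.search.windows.net"
        f"/indexes/{INDEX_NAME}/docs/index?api-version={API_VERSION}"
    )
    # Each document carries an action telling the service what to do with it.
    batch = {"value": [{"@search.action": "upload", **doc} for doc in documents]}
    return urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


request = build_upload_request(
    [{"id": "1", "description": "Spacious house with large garden"}]
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

<p>Passing the request to <code>urllib.request.urlopen</code> would actually send it; that step is left out so the sketch can be run without a live service.</p>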



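<figure><figcaption>[Diagram: a typical Azure Search scenario, with the search service sitting beside the application's data store]</figcaption></figure>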

Searchable - Fields marked as searchable can be queried with full-text search through the REST API. A searchable field undergoes lexical analysis: its text is broken down into tokens (words) and, depending on the analyser, word inflections are handled as well. In the example above the description of the house is searchable. If the description was "Spacious house with large garden", then with an analyser that handles inflections a search for "gardening" or "gardens" would match it, as they are inflections of "garden". An important point about searchable fields is that they take up more space in the index, because Azure Search stores a tokenised version of the field value in addition to the original.

Filterable - Fields marked as filterable can be used in classic filter expressions, such as equals, less than or greater than. In the example above "lastRenovationDate" is marked as filterable. This lets the user restrict results to houses that were renovated within a certain time frame, or only recently.

Sortable - Fields marked as sortable can be used to tell Azure Search how to order the results it returns. By default, Azure Search returns results in order of search score, highest first (the score reflects how closely a result in the index matches the search text).

Analysers - An analyser can be specified on a field to tell Azure Search how to analyse its text. For example, "fr.lucene" is used in the above example for the "description_fr" field, so tokenisation, suggestions and other analysis are performed in a way that suits the French language. Analysers are available for a range of languages, and several different analysers can be chosen.
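
The attributes above all live in the index definition. The original example is not reproduced here, so the following is only a sketch of what such a definition, and a query that uses the filterable and sortable attributes, might look like. The index name, the "id" key field, the field types and the decision to make "lastRenovationDate" sortable are all assumptions.

```python
import json

# Hypothetical index definition. Only description, description_fr and
# lastRenovationDate come from the text; everything else is assumed.
index_definition = {
    "name": "houses",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {
            "name": "description_fr",
            "type": "Edm.String",
            "searchable": True,
            "analyzer": "fr.lucene",  # French Lucene analyser
        },
        {
            "name": "lastRenovationDate",
            "type": "Edm.DateTimeOffset",
            "filterable": True,
            "sortable": True,  # assumption, so the query below can sort on it
        },
    ],
}

# Body of a search request: full-text search for "garden", limited to houses
# renovated since the start of 2015, most recently renovated first.
search_request = {
    "search": "garden",
    "filter": "lastRenovationDate ge 2015-01-01T00:00:00Z",
    "orderby": "lastRenovationDate desc",
}

print(json.dumps(index_definition, indent=2))
print(json.dumps(search_request, indent=2))
```

Without the "orderby" entry the results would come back ordered by search score instead.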

Custom Analysers

In some scenarios you may want text to be analysed differently from the default approach taken by Azure Search. This can be done by choosing one of the predefined analysers or by defining a custom analyser. An analyser is a configuration that controls how input text is split into tokens and which characters and symbols are filtered out or replaced. The example above defines a custom analyser called "phonetic_ascii_analyzer". It uses the standard tokeniser and then applies three filters: lower case (so search matching is case-insensitive), ASCII folding (normalises accented characters to their ASCII equivalents to allow for easier matching) and phonetic (matches on phonetically similar words).
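
The original definition is not shown here, but a custom analyser of this kind might be declared in the index roughly as follows. The tokeniser and token filter names are assumptions based on the predefined names in the Azure Search REST API, so check them against the current documentation before relying on them.

```python
import json

# Sketch of the "analyzers" section of an index definition.
# The tokenizer and token filter names are assumed predefined names.
custom_analyzers = {
    "analyzers": [
        {
            "name": "phonetic_ascii_analyzer",
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "standard_v2",
            # Filters are applied in the order listed.
            "tokenFilters": ["lowercase", "asciifolding", "phonetic"],
        }
    ]
}

# A field opts in to the analyser by name (field name is hypothetical).
field = {
    "name": "description",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "phonetic_ascii_analyzer",
}

print(json.dumps(custom_analyzers, indent=2))
print(json.dumps(field, indent=2))
```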

As well as custom analysers, a custom tokeniser can be created. A tokeniser defines how the input text is split into independent tokens, for example separating a sentence into words.
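
To make the idea of tokenising and filtering more concrete, here is a small local illustration in plain Python. It is not how Azure Search implements analysis; it only imitates a tokeniser followed by lower-casing and ASCII folding (the phonetic step is left out), and all function names are made up for this sketch.

```python
import re
import unicodedata


def tokenise(text):
    """Split text into tokens made of runs of word characters."""
    return re.findall(r"\w+", text)


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyse(text):
    """Tokenise, then lower-case and ASCII-fold each token."""
    return [ascii_fold(token.lower()) for token in tokenise(text)]


# "\u00e9" is an e with an acute accent.
print(analyse("Spacious Maison avec Caf\u00e9"))
# prints ['spacious', 'maison', 'avec', 'cafe']
```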

Data Sources and Indexers

A data source is used, together with an indexer, to sync data between a database and the Azure Search index. The sync can be run manually as a one-off job, or on a schedule with an interval as short as 5 minutes. A data source definition holds the connection information for your database, which the indexer uses to sync the data.

Currently, there are four types of data source: "azuresql", "documentdb" (Azure Cosmos DB), "azureblob" and "azuretable". A more advanced feature that can be specified as part of the data source definition is a change detection policy. The high watermark change detection policy names a column whose value increases whenever a row is modified, such as a row version or a last-updated timestamp, so that the indexer can pick up only the rows that have changed. The alternative is the SQL integrated change tracking policy. This is the most efficient change detection policy, but it can only be used with data sources that support change tracking (e.g. Azure SQL DB V12). It does not require a column name, as changes are detected automatically.
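
A data source definition with a high watermark policy might look something like the sketch below. The data source name, table name, column name and connection string are hypothetical placeholders.

```python
import json

# Hypothetical data source definition for an Azure SQL table.
data_source = {
    "name": "houses-datasource",
    "type": "azuresql",
    "credentials": {"connectionString": "<your-connection-string>"},
    "container": {"name": "Houses"},  # table or view to crawl
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        # Column whose value increases whenever a row changes.
        "highWaterMarkColumnName": "LastUpdated",
    },
}

# The SQL integrated alternative needs no column name.
sql_integrated_policy = {
    "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
}

print(json.dumps(data_source, indent=2))
```

Swapping the "dataChangeDetectionPolicy" value for sql_integrated_policy would switch the data source to integrated change tracking, provided the database supports it.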

Once the data source has been defined, an indexer can be defined. The indexer extracts information from the data source by crawling through it. A schedule can be given as a parameter when creating the indexer, telling Azure Search how often to run it and check for changes; the shortest interval is every 5 minutes. Additional settings include 'batchSize' (the number of items in a batch, which can be tuned to improve performance), and 'maxFailedItems' and 'maxFailedItemsPerBatch' (the number of failures tolerated, which can be set to 0 to allow no errors or -1 to allow any number).
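
As a sketch, an indexer definition using these settings could be put together as follows. The indexer, data source and index names and the batch size are hypothetical, and schedule_interval is a helper invented for this example that formats the interval as an ISO 8601 duration.

```python
import json


def schedule_interval(minutes):
    """Format a schedule interval as an ISO 8601 duration, e.g. PT5M."""
    if minutes < 5:
        raise ValueError("the shortest supported interval is 5 minutes")
    return f"PT{minutes}M"


# Hypothetical indexer definition.
indexer = {
    "name": "houses-indexer",
    "dataSourceName": "houses-datasource",
    "targetIndexName": "houses",
    "schedule": {"interval": schedule_interval(5)},
    "parameters": {
        "batchSize": 500,             # items per batch, tune for performance
        "maxFailedItems": 0,          # 0 = no errors allowed
        "maxFailedItemsPerBatch": 0,  # -1 would allow any number of errors
    },
}

print(json.dumps(indexer, indent=2))
```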

If the fields in the index and the fields in the data source do not match, field mappings can be defined. These map field names in the data source to differently named fields in the index. Indexers and data sources can be created, updated, deleted and listed through the REST API. You can also check the status of an indexer to see details of any failures that occurred during indexing.
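
The sketch below shows what field mappings might look like in an indexer definition, along with a small local function that imitates the renaming they perform. The source column names, service name and api-version are assumptions, and apply_field_mappings and indexer_status_url are helpers made up for this illustration.

```python
# Hypothetical mappings from data source columns to index fields.
field_mappings = [
    {"sourceFieldName": "HouseId", "targetFieldName": "id"},
    {"sourceFieldName": "Desc", "targetFieldName": "description"},
]


def apply_field_mappings(row, mappings):
    """Rename the keys of a source row as the mappings describe."""
    renames = {m["sourceFieldName"]: m["targetFieldName"] for m in mappings}
    return {renames.get(key, key): value for key, value in row.items()}


def indexer_status_url(service, indexer_name, api_version="2017-11-11"):
    """Build the URL used to check an indexer's status (version assumed)."""
    return (
        f"https://{service}.search.windows.net"
        f"/indexers/{indexer_name}/status?api-version={api_version}"
    )


row = {"HouseId": "1", "Desc": "Spacious house with large garden"}
print(apply_field_mappings(row, field_mappings))
print(indexer_status_url("my-search-service", "houses-indexer"))
```

Columns without a mapping keep their original name, which mirrors the case where the source and index field names already match.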

Next up, I will describe the basics of how to use Azure Search through the REST API in my 'Using Azure Search' blog post.

About the author

James Taylor