
An overview on Azure Cognitive Search

5 March 2021
10 min

This blog explains what Azure Cognitive Search is, the technology behind it, the unique capabilities it offers, and how you can start building great apps for web, mobile or line of business. It also highlights specific capabilities for ingesting, enriching, indexing and visualizing data.


Search-based apps are crucial for any business or enterprise, across a variety of scenarios and end goals. Azure Cognitive Search is a platform-as-a-service offering that allows you, even if you're not a search expert, to create wonderful search-enabled applications for web, for mobile, or for your line-of-business or enterprise applications. Azure Cognitive Search has all the capabilities that you would expect in a search application today. People have got used to these kinds of capabilities when they look for a restaurant or a job, or when they want to buy something. Search boxes are in every application that you use every day, and customers have come to expect these capabilities in your application as well. They want great search applications, and a great search application does much more than match a term character by character against each of the documents in a database. Users really expect a higher level of experience.

Imagine you are creating, let's say, a jobs site with a search box right there on the page. People expect to find results even if they make mistakes: they may misspell a word or a term and still want to find the relevant content. As they type, they expect type-ahead capabilities such as autocomplete and suggestions, and they expect synonyms to be supported as part of their query. People want facets, filters and all kinds of other capabilities that let them drill a little deeper into the type of data they care about, so they can get, in this case, to the job they really want. And just like these, there are many other capabilities, such as geo-spatial queries, hit highlighting with snippets, paging, and different ranking algorithms. The beautiful thing about Azure Cognitive Search is that even if you're not a search expert, you have all of these capabilities available to you as a PaaS offering.
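To make this a bit more concrete, here is a minimal sketch of what a single query for such a jobs site could look like when sent to the Search Documents REST endpoint. It only builds the request; it does not call the service. The service name, index name and field names (location, jobType, description) are assumptions made up for this example, so adapt them to your own index.

```python
import json

API_VERSION = "2020-06-30"  # a generally available REST API version


def search_url(service, index):
    """URL of the Search Documents endpoint; service and index names are hypothetical."""
    return (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )


def build_search_request(terms, page=0, page_size=10, odata_filter=None):
    """Build the JSON body for a fuzzy, faceted, paged search request."""
    body = {
        # A trailing "~" requests fuzzy matching in the full Lucene query syntax,
        # so a misspelled term can still match.
        "search": " ".join(word + "~" for word in terms.split()),
        "queryType": "full",
        "facets": ["location", "jobType"],  # hypothetical facetable fields
        "highlight": "description",         # hypothetical searchable field
        "top": page_size,                   # page size
        "skip": page * page_size,           # paging offset
        "count": True,                      # also return the total hit count
    }
    if odata_filter:
        body["filter"] = odata_filter       # OData filter expression
    return body


request_body = build_search_request(
    "sofware enginer", page=1, odata_filter="location eq 'Amsterdam'"
)
print(search_url("my-service", "jobs"))
print(json.dumps(request_body, indent=2))
```

You would POST this body to the URL with an api-key header. Type-ahead works a little differently: autocomplete and suggestions have their own endpoints and rely on a suggester defined on the index.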

All the information

So far, this all looks like data that could come out of a database. In the jobs example, you would connect to, let's say, a SQL database or maybe a Cosmos DB database that holds all this information. But what if your data is not highly structured? In many cases it is completely unstructured. One of the newer capabilities of Azure Cognitive Search is the ability to connect to different types of data sources, structured or unstructured, and to understand the content inside those data sources.

For instance, say I have Blob Storage with documents; that's completely unstructured content. I'm essentially just putting any type of document in there: PDFs, PowerPoint files, Word documents, Excel spreadsheets, all kinds of file formats. I want to be able to find text inside those documents. The way we can do this with Azure Cognitive Search is through a concept Microsoft calls indexers. An indexer essentially drives the pipeline shown in the picture below. First, a document cracking step understands the different file formats and extracts the content out of those files: the text, the images, and the metadata. Then this information is enriched through what Microsoft calls skills. There are built-in skills, backed by Azure Cognitive Services, and custom skills: your own AI or machine learning algorithms that you develop and plug into the pipeline.

The result is structured content stored in a search index. The indexer's job is basically to crack the documents and then run all of those intelligent skills over them.
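As a rough sketch, the pieces of such a pipeline could be described with three JSON definitions: a data source pointing at Blob Storage, a skillset with one built-in and one custom skill, and an indexer that ties them to a target index. The definitions below follow the shape of the REST API, but all names (docs-blob, docs-skills, docs-index, the category output and the custom skill URL) are assumptions for illustration only.

```python
import json


def build_pipeline(connection_string, container, custom_skill_url):
    """Return (data_source, skillset, indexer) definitions as plain dicts."""
    data_source = {
        "name": "docs-blob",                      # hypothetical name
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    skillset = {
        "name": "docs-skills",                    # hypothetical name
        "skills": [
            {   # built-in skill: key phrase extraction on the cracked text
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
            {   # custom skill: your own model behind a web API (hypothetical)
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "uri": custom_skill_url,
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "category", "targetName": "category"}],
            },
        ],
    }
    indexer = {
        "name": "docs-indexer",                   # hypothetical name
        "dataSourceName": data_source["name"],
        "skillsetName": skillset["name"],
        "targetIndexName": "docs-index",          # index assumed to exist already
        # Copy enriched values from the enrichment tree into index fields.
        "outputFieldMappings": [
            {"sourceFieldName": "/document/keyPhrases", "targetFieldName": "keyPhrases"},
            {"sourceFieldName": "/document/category", "targetFieldName": "category"},
        ],
    }
    return data_source, skillset, indexer


data_source, skillset, indexer = build_pipeline(
    "<storage-connection-string>", "documents", "https://example.com/api/classify"
)
print(json.dumps(indexer, indent=2))
```

Each definition would be sent to its own REST endpoint (data sources, skillsets, indexers). Once the indexer is created it runs the whole pipeline and fills the index.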

Cracking the document

So, cracking the document: what does that mean? It's just like cracking a nut. You want the meat out of the nut; you want to eat the substance inside the shell. Each document, whether a PDF, a Word document, or even a JSON or HTML file, has a textual representation inside it, and you want to extract that text. Some documents contain images as well, so you may also want to extract, for example, the images inside a PowerPoint presentation.
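To illustrate the idea in miniature, here is a toy version of cracking an HTML document using only the Python standard library. This is not how Azure Cognitive Search implements document cracking; it is just a sketch of the concept: take a file, pull out the text, and note which images it contains. The function and class names are made up for this example.

```python
from html.parser import HTMLParser


class TextAndImageExtractor(HTMLParser):
    """Collects visible text fragments and image references from HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_sources = []

    def handle_data(self, data):
        # Keep non-empty text fragments found between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)

    def handle_starttag(self, tag, attrs):
        # Remember where each image lives so it could be processed later.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)


def crack_html(html):
    """Return the text content and image references of an HTML document."""
    parser = TextAndImageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "content": " ".join(parser.text_parts),
        "images": parser.image_sources,
    }


page = (
    "<html><body><h1>Job offer</h1>"
    "<p>Senior <b>engineer</b> wanted.</p>"
    '<img src="team.png"></body></html>'
)
cracked = crack_html(page)
print(cracked)
```

The real service does the equivalent for many formats (PDF, Office files, and so on) and hands the extracted text and images to the next stage.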

And not only that: you want to do further processing on those pieces of information, and that's what the next stage, the skillset, is for. For instance, you may want to use OCR to get the handwritten or printed text out of the images that you extracted from the document. Basically, the indexer is like the hammer: it cracks the document open, runs a bunch of intelligent skills on it, and then puts the relevant content into the index. The index can then be searched to find document contents, and it doesn't matter whether you search in English or Spanish in this case; it will still find the document contents.
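For the OCR case, a sketch of the two settings involved could look like this: the indexer is told to extract images while cracking, and an OCR skill reads text from each extracted image. The shapes follow the REST API, but the indexer, data source, index and skillset names and the imageText field are assumptions for this example.

```python
import json


def build_ocr_enrichment(data_source_name, index_name, skillset_name):
    """Return (ocr_skill, indexer) definitions that enable OCR on embedded images."""
    ocr_skill = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        # Run once per image that document cracking produced.
        "context": "/document/normalized_images/*",
        "detectOrientation": True,
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    indexer = {
        "name": "docs-ocr-indexer",               # hypothetical name
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "skillsetName": skillset_name,
        "parameters": {
            "configuration": {
                # Extract text and metadata, and also emit images for the skills.
                "dataToExtract": "contentAndMetadata",
                "imageAction": "generateNormalizedImages",
            }
        },
        # Store the recognized text in a (hypothetical) collection field.
        "outputFieldMappings": [
            {
                "sourceFieldName": "/document/normalized_images/*/text",
                "targetFieldName": "imageText",
            }
        ],
    }
    return ocr_skill, indexer


ocr_skill, ocr_indexer = build_ocr_enrichment("docs-blob", "docs-index", "docs-skills")
print(json.dumps(ocr_skill, indent=2))
```

The OCR skill would go into the skills list of the skillset that the indexer references. Without the imageAction setting, no images are passed on, so the skill would have nothing to read.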

This is just a brief introduction to Azure Cognitive Search. If you want to learn more about how you can use it, or how other companies have used it to their benefit, please visit our product page!

There is also a podcast available in which Liam Cavanagh, Principal Program Manager – Azure Search at Microsoft, John Koot, Director Alliances at OrangeNXT, and Mane Lambeens, Lead Data Scientist and Product Owner – digitalNXT Search at OrangeNXT, go through WHY Microsoft is entering this domain of search. They share lessons learned and upcoming exciting news. Please listen to our podcast here:

OrangeNXT · Podcast: An overview on Azure Cognitive Search with Microsoft and OrangeNXT
