So what does cracking the document mean? It's just like cracking a nut: you want the meat out of the nut, the substance inside the shell. Documents such as PDF, Word, JSON, or HTML files all have a textual representation inside them, and you want to extract that text. Some of them contain images as well, so you may also want to extract, for example, the images inside a PowerPoint presentation. And not only that, you want to do further processing on those pieces of information, and that's what the next stage, the skillset, is for. For instance, you may want to use OCR (optical character recognition) to get the handwritten or printed text out of the images that you extracted from the document. Basically, the indexer is like the hammer: it cracks the document open, runs a set of intelligent skills on it, and then puts the relevant content into the index. The index can then be searched to find document contents, and it doesn't matter whether you search in English or Spanish; it will still find the document contents.
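To make the flow a bit more concrete, here is a small Python sketch of the three steps described above: crack the document, run a skill on it, and put the result into a searchable index. It is only a toy illustration, not the Azure Cognitive Search API. The names `RawDocument`, `crack`, `ocr_skill`, `run_indexer`, and `search` are all hypothetical, and the "OCR" step is a placeholder that assumes each image already carries the text a real OCR engine would read from it.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class RawDocument:
    """Hypothetical stand-in for a PDF, Word, or PowerPoint file."""
    doc_id: str
    body_text: str
    # Assumption: each image is represented by the text visible in it.
    images: List[str] = field(default_factory=list)


def crack(doc: RawDocument) -> dict:
    """Document cracking: pull the text and the images out of the 'shell'."""
    return {"id": doc.doc_id, "text": doc.body_text, "images": list(doc.images)}


def ocr_skill(cracked: dict) -> dict:
    """Placeholder OCR skill: copies the text 'seen' in each image."""
    enriched = dict(cracked)
    enriched["ocr_text"] = list(cracked["images"])
    return enriched


def run_indexer(docs: List[RawDocument],
                skills: List[Callable[[dict], dict]]) -> Dict[str, Set[str]]:
    """Crack each document, run every skill, and build an inverted index."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        enriched = crack(doc)
        for skill in skills:
            enriched = skill(enriched)
        all_text = " ".join([enriched["text"], *enriched.get("ocr_text", [])])
        for token in all_text.lower().split():
            index[token].add(enriched["id"])
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Return the ids of all documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


docs = [
    RawDocument("slides.pptx", "Quarterly results overview",
                images=["Revenue chart 2020"]),
    RawDocument("memo.docx", "Meeting notes about revenue targets"),
]
index = run_indexer(docs, skills=[ocr_skill])
print(search(index, "revenue"))
```

In this sketch the word "revenue" appears in the body of the memo but only inside an image on the slides. Because the skill runs before indexing, a search for "revenue" returns both documents, which is the point of enriching content before it goes into the index.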
This is just a brief introduction to Azure Cognitive Search. If you want to learn more about how you can use it, or how other companies have used it to their benefit, please visit our website: digitalNXT Search | OrangeNXT
There is also a podcast available in which Liam Cavanagh (Principal Program Manager – Azure Search at Microsoft), John Koot (Director Alliances at OrangeNXT), and Mane Lambeens (Lead Data Scientist and Product Owner – digitalNXT Search at OrangeNXT) discuss why Microsoft is entering the search domain. They also share lessons learned and upcoming exciting news. Please listen to our podcast here: