News

AI-UniBot: Faster Indexing, Lower Costs

Preparing large-scale corporate data for AI-powered search has always been resource-intensive, and the introduction of semantic clustering made it more so. Synchronous indexing demanded significant computing power, slowed down the system, and substantially increased operational costs. We knew we would have to solve this problem for our AI-UniBot Personal Assistant & Corporate Chatbot sooner or later, and we understood that the sooner, the better.

Today, AI-UniBot integrates Azure OpenAI Batch Processing, a Microsoft technology for asynchronous batch data processing. This solution speeds up dataset indexing several times over and cuts its cost by 50%, turning indexing into an efficient and scalable process. Below, we explain why this matters.

When AI-UniBot analyzes documents for semantic clustering (for example, determining whether a technical report belongs to the "Finance" or "R&D" cluster), it relies on powerful Artificial Intelligence (AI) models. Indexing tens of thousands of files creates a heavy load: synchronous requests slow the system down, real-time computation drives up costs, and API limits make parallel processing complex.
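
To give a sense of the workload, here is a minimal sketch of what one such classification request could look like in the JSONL format that Azure OpenAI batch jobs accept. The deployment name, prompt wording, and cluster labels are illustrative assumptions, not AI-UniBot's actual configuration, and load_documents() is a hypothetical loader:

    import json

    # Illustrative cluster labels and deployment name -- assumptions for
    # this sketch, not AI-UniBot's actual configuration.
    CLUSTERS = ["Finance", "R&D"]
    DEPLOYMENT = "gpt-4o-batch"  # a Global-Batch deployment in Azure OpenAI

    def make_request(doc_id, text):
        # One classification request in the batch JSONL format; custom_id
        # maps each answer back to its source document.
        return {
            "custom_id": doc_id,
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": DEPLOYMENT,
                "messages": [
                    {"role": "system",
                     "content": "Assign the document to exactly one cluster: "
                                + ", ".join(CLUSTERS)
                                + ". Reply with the cluster name only."},
                    {"role": "user", "content": text[:8000]},  # truncate long docs
                ],
            },
        }

    # Accumulate one JSON object per line -- this file becomes the batch input.
    with open("indexing_requests.jsonl", "w", encoding="utf-8") as f:
        for doc_id, text in load_documents():  # load_documents() is hypothetical
            f.write(json.dumps(make_request(doc_id, text)) + "\n")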

Azure OpenAI Batch Processing works differently. First, it is asynchronous: indexing requests are accumulated in a batch and processed in the background. Second, batch requests run on a separate quota, so indexing does not compete with User requests and the system is not slowed down. Finally, the batch AI requests generated during indexing cost 50% less than standard User requests. This lets AI-UniBot process hundreds of thousands of documents at once, with completion guaranteed within 24 hours.
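
As a minimal sketch of what submitting such a batch could look like with the openai Python SDK against an Azure OpenAI resource (the endpoint, key, API version, and file name are placeholders):

    from openai import AzureOpenAI  # pip install openai

    # Placeholder endpoint, key, and API version -- substitute your own.
    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-10-21",
    )

    # Upload the accumulated requests as the batch input file.
    batch_file = client.files.create(
        file=open("indexing_requests.jsonl", "rb"),
        purpose="batch",
    )

    # Create the batch job: it runs asynchronously on the separate batch
    # quota, at the discounted rate, within a 24-hour completion window.
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)  # e.g. "validating" while the file is checked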

We have tested this solution in several business cases.

At a Law Firm with high confidentiality requirements, AI-UniBot uses batch processing for semantic clustering of court decisions. Previously, synchronous AI model requests would block the bot's operation during peak loads, which typically occur during working hours, when dozens of Users flood the system with "urgent" requests; after business hours, the number of such requests drops noticeably.

Microsoft's technology performs indexing in the background via the separate quota, ensuring uninterrupted AI-UniBot operation for real-time, urgent User requests (such as searching for precedents in the chat). It also eliminates the risk of data loss from exceeded response time limits.

For an international R&D Laboratory, the ability to process 500+ technical reports a day simultaneously was critical. Azure OpenAI Batch Processing automates the clustering of documents by area (such as "Nanotechnology," "Bioengineering," etc.) with a guarantee of completion within 24 hours, regardless of volume. This keeps the knowledge base up to date for the DeepSearch (in-depth search) and DeepThink (relationship analysis) functions, without additional server investments.

Here's how it works: the technology separates resource-intensive indexing from AI-UniBot's daily operations. Document processing requests are queued and batched, then sent asynchronously to Azure OpenAI's resources, where AI models perform semantic analysis (such as determining contract topics or categorizing financial reports). Costs are reduced through Microsoft's dedicated batch quota, which is 50% cheaper than the standard API. The system automatically monitors progress, retries failures, and integrates the results into the AI-UniBot knowledge base without human intervention.
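
Continuing the sketch above, that monitoring loop could look roughly like the following; update_knowledge_base() is a hypothetical integration step, and a production system would poll less aggressively:

    import json
    import time

    # Poll until the batch reaches a terminal state.
    while True:
        batch = client.batches.retrieve(batch.id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            break
        time.sleep(60)

    results, retry_ids = {}, []

    # Successful responses land in the output file, one JSON object per line.
    if batch.output_file_id:
        for line in client.files.content(batch.output_file_id).text.splitlines():
            item = json.loads(line)
            response = item.get("response") or {}
            if response.get("status_code") == 200:
                answer = response["body"]["choices"][0]["message"]["content"]
                results[item["custom_id"]] = answer.strip()  # e.g. "Finance"
            else:
                retry_ids.append(item["custom_id"])

    # Requests that errored out are listed in a separate file; re-batch them.
    if batch.error_file_id:
        for line in client.files.content(batch.error_file_id).text.splitlines():
            retry_ids.append(json.loads(line)["custom_id"])

    update_knowledge_base(results)  # hypothetical: write clusters into the index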

In short, after integrating Azure OpenAI Batch Processing, AI-UniBot has become an optimal solution for working with Big Data. The technology makes it possible to index the gigantic archives of large Companies in hours instead of days, while system stability is guaranteed under heavy loads and computational costs are cut in half.


Have a question? Let us know!