I have been working on a lab to integrate and automate the review of legacy documents. The idea is quite simple: take legacy documents, whether scanned from paper or already stored as PDF or image files, and make them available to an AI agent. From there, the agent can analyse the extracted data to review, assemble, and retrieve customer case information.

For this lab, I manually upload documents into a Blob Storage account. These documents are synthetic, generated using AI. In a real-world scenario, this ingestion could be fully automated, but for this lab I kept it manual.
Once the document is uploaded, a Logic App is triggered. This Logic App picks up the file and sends it to Azure AI Content Understanding, using the prebuilt Layout model, which is well suited for this type of scenario.
The service then extracts and parses all the content from the document and returns structured data, which is stored in Cosmos DB.
The Logic App also handles document lifecycle. If processing is successful, the document is moved to a container called processed. If something fails, which can happen if an unexpected document format is submitted, the Logic App records the failure along with the reason in a separate container. In both cases, the original file in the incoming container is deleted.
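The lifecycle handling above can be sketched in plain code. This is a minimal illustration only: the real implementation is a Logic App workflow, and here the Content Understanding call and Blob Storage are stubbed out with an in-memory dictionary. The container names (incoming, processed, failed) match the lab; the function names and stubs are hypothetical.

```python
# Sketch of the Logic App's document lifecycle, expressed as plain Python.
# extract_layout and the storage dict are stand-ins for Azure AI Content
# Understanding and Blob Storage respectively.

def extract_layout(file_bytes: bytes) -> dict:
    """Stub for the Content Understanding prebuilt Layout analysis."""
    if not file_bytes:
        raise ValueError("unexpected document format")
    return {"content": file_bytes.decode("utf-8", errors="replace")}

def process_document(storage: dict, name: str) -> dict:
    """Move a document from 'incoming' to 'processed' or 'failed'."""
    file_bytes = storage["incoming"][name]
    try:
        result = extract_layout(file_bytes)
        storage["processed"][name] = file_bytes
        outcome = {"status": "processed", "data": result}
    except ValueError as err:
        # Record the failure together with the reason.
        storage["failed"][name] = {"file": file_bytes, "reason": str(err)}
        outcome = {"status": "failed", "reason": str(err)}
    finally:
        # In both cases, the original file in 'incoming' is deleted.
        del storage["incoming"][name]
    return outcome

storage = {"incoming": {"doc1.pdf": b"Invoice 123"}, "processed": {}, "failed": {}}
print(process_document(storage, "doc1.pdf"))
# → {'status': 'processed', 'data': {'content': 'Invoice 123'}}
```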

Up to this point, something quite interesting has already happened. What was previously offline or static data is now structured, queryable, and stored in a database. From here onwards, the possibilities expand significantly.
Instead of connecting the agent directly to the database, I introduced an API layer using Azure Functions. This acts as a controlled interface between the data and the AI agent, avoiding direct exposure of the database and enforcing a structured way to retrieve information.
I created the following functions:
GetCaseById
Retrieves a specific case and its related information using a case identifier.
GetCasesByCustomer
Returns all cases associated with a given customer ID.
GetDocumentsByCaseId
Retrieves all processed documents linked to a specific case.
GetOpenApiSpec
Exposes the OpenAPI definition so the agent can understand how to interact with the API. This effectively enables grounding, ensuring the agent retrieves real data instead of relying on assumptions. In practice, this behaves similarly to a RAG-style pattern, where the agent retrieves structured information rather than generating it.
GetProcessedDocuments
Provides access to processed documents across the dataset, useful for broader queries.
SearchCustomerByName
Allows searching for a customer using a name instead of an ID, which becomes essential for more natural interaction.
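To give a feel for what two of these functions do, here is a sketch with the Cosmos DB container simulated as an in-memory list of JSON-like documents. The field names (caseId, customerId, customerName) and the sample records are assumptions for illustration, not the lab's actual schema.

```python
# Illustrative versions of GetCasesByCustomer and SearchCustomerByName,
# with Cosmos DB simulated as a list of documents. Field names are assumed.

CASES = [
    {"caseId": "CASE-1001", "customerId": "CUST-1001", "customerName": "Jane Smith"},
    {"caseId": "CASE-1002", "customerId": "CUST-1001", "customerName": "Jane Smith"},
    {"caseId": "CASE-2001", "customerId": "CUST-2001", "customerName": "Michael White"},
]

def get_cases_by_customer(customer_id: str) -> list[dict]:
    """GetCasesByCustomer: all cases associated with a given customer ID."""
    return [c for c in CASES if c["customerId"] == customer_id]

def search_customer_by_name(name: str) -> list[dict]:
    """SearchCustomerByName: case-insensitive substring match on the name."""
    needle = name.lower()
    return [c for c in CASES if needle in c["customerName"].lower()]

print([c["caseId"] for c in get_cases_by_customer("CUST-1001")])
# → ['CASE-1001', 'CASE-1002']
print([c["caseId"] for c in search_customer_by_name("michael")])
# → ['CASE-2001']
```

In the real API, each function wraps an equivalent Cosmos DB query behind an HTTP endpoint, so the agent never touches the database directly.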

I also built a simple web page to visualise and validate the stored data. However, the real highlight was creating an agent in Azure AI Foundry. Through the Functions API, the agent can filter information, assemble case views, analyse document content, and retrieve financial or contextual data.
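To show how the grounding works, here is a minimal sketch of the kind of OpenAPI definition GetOpenApiSpec might return for one operation. The path, parameter names, and schema are illustrative assumptions, not the lab's actual spec.

```python
# A minimal, illustrative OpenAPI 3.0 fragment describing one operation.
# The agent reads a definition like this to learn which operations exist
# and what parameters they take. Paths and names here are hypothetical.
import json

OPENAPI_SPEC = {
    "openapi": "3.0.1",
    "info": {"title": "Case API", "version": "1.0"},
    "paths": {
        "/api/GetCasesByCustomer": {
            "get": {
                "operationId": "GetCasesByCustomer",
                "summary": "Returns all cases for a given customer ID.",
                "parameters": [{
                    "name": "customerId",
                    "in": "query",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {"200": {"description": "List of cases"}},
            }
        }
    },
}

# Serialised to JSON, this is what the GetOpenApiSpec endpoint would serve.
print(json.dumps(OPENAPI_SPEC, indent=2))
```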


Below are a few examples of how the agent can assist.
Prompt 1
What cases do we have for customer CUST-1001?
Agent retrieves data via API and summarises results

Prompt 2
Summarise the documents in case CASE-1001
Agent extracts key information such as amounts and references

Prompt 3
Is there any financial impact in case CASE-1001?
Agent combines multiple data points to provide a structured answer

Prompt 4
What do we have for Michael White?

Prompt 5
Is there any Jane in the database? Retrieve core customer details for Jane, including identifiers and address information.

It is quite impressive how quickly this becomes useful. And this is still a simplified scenario, based on synthetic data. The same pattern applies directly to real-world problems.
Lessons learned
One thing that became very clear during this lab is something we already know from traditional IT. Trying to centralise everything into a single component rarely works well. It creates bottlenecks, increases complexity, and reduces overall quality. In the emerging agentic world, the same principle applies.
An agent should not be responsible for everything. Delegation becomes essential.
A key capability that would be particularly valuable in this scenario is the introduction of business jargon and acronym mapping. In most organisations, data is not described in a generic way. It is full of internal terminology, abbreviations, and domain-specific language. Enabling the system to understand and map these terms correctly significantly improves both retrieval accuracy and response quality.
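As a trivial sketch of what such jargon and acronym mapping could look like, applied to a query before retrieval, consider the following. The glossary entries are invented examples, not real terms from the lab's dataset.

```python
# A naive jargon/acronym normaliser applied to queries before retrieval.
# The glossary entries below are invented examples for illustration.
import re

GLOSSARY = {
    "po": "purchase order",
    "kyc": "know your customer",
    "ref": "reference",
}

def normalise_query(query: str) -> str:
    """Replace known acronyms with their canonical terms, word by word."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return GLOSSARY.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", expand, query)

print(normalise_query("Any KYC issues on the PO for case CASE-1001?"))
# → Any know your customer issues on the purchase order for case CASE-1001?
```

In practice this mapping would sit in front of retrieval, so the agent matches internal terminology against the structured data rather than missing results because of an unexpanded acronym.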
This is where Azure AI Foundry IQ could naturally fit into the architecture as a next step. Instead of relying on a single agent and traditional retrieval approaches, Foundry IQ introduces a more advanced model based on agentic retrieval and orchestration.
This would allow multiple specialised agents to collaborate, using structured knowledge bases and shared context across different systems. Another important advantage is reuse. Knowledge bases created for one scenario can be reused across multiple agents, making the solution far more scalable and manageable.
It also improves observability. With built-in evaluation capabilities, it becomes possible to measure how effective the responses are, identify gaps, and continuously improve the system.
If I were to evolve this lab further, I would move towards a model where retrieval is handled by dedicated components, document understanding is separated from orchestration, and agents collaborate instead of operating in isolation, much like teams do in real life.
It is important to mention that both Foundry IQ and Evaluation are currently in public preview. If you are interested in exploring this further, here is a useful reference:
Foundry IQ, agentic retrieval, and RAG evaluation for knowledge bases: https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/foundry-iq-boost-response-relevance-by-36-with-agentic-retrieval/4470720
If anyone is interested in the full implementation, including the Logic App and Functions code, leave a comment and I’ll publish it on GitHub for everyone to use as a reference.

