In early 2025, our manufacturing AI team became the first team across all of Microsoft Cloud for Industries (MCI) to ship the Azure OpenAI Assistant API in a production product. We then demonstrated it live at HMI 2025 — one of the world's largest industrial automation events — with Rolls-Royce as the customer.

This is what it took to get there.

The Starting Point: Copilot V3 Was Good, But Limited

Our existing AI layer — Copilot V3 — was a RAG-based pipeline that generated KQL queries from natural language. We'd gotten query accuracy up to 75%+, which was solid. But customers wanted more:

These weren't things we could bolt onto a query-generation pipeline. They required a fundamentally different model — the Azure OpenAI Assistant API.

The Challenge: No Playbook

Nobody in MCI had shipped the Assistant API to production before. There was no internal reference implementation, and no team to ask, "How did you handle X?" We were building the playbook as we went.

The three hardest problems were:

1. Manufacturing Data Is Large — Too Large for the Context Window

A manufacturing plant dataset can have hundreds of thousands of records across production, downtime, consumption, and scheduling. You can't dump all of that into a context window and ask the assistant to reason over it.

My solution: build a filtering layer that sits between the plant dataset and the assistant. Before the assistant ever sees data, the layer queries and extracts only the relevant subset — based on the user's question, the entity graph, and the time range. The assistant then operates on a focused, manageable slice.

Key insight: Filtering data before it reaches the model is not just a performance optimisation — it directly improves accuracy. An assistant that sees 200 relevant rows gives better answers than one swimming in 200,000 rows of noise.
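To make the idea concrete, here is a minimal sketch of that filtering layer. The real system is C#/.NET querying ADX; this Python version is illustrative only, and the record shape, field names, and `filter_for_question` helper are assumptions, not the actual product code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

# Hypothetical record shape: each plant record carries an entity id,
# a timestamp, and a payload of measurements.
@dataclass
class PlantRecord:
    entity_id: str
    timestamp: datetime
    payload: dict

def filter_for_question(
    records: Iterable[PlantRecord],
    relevant_entities: set[str],  # resolved from the user's question + entity graph
    start: datetime,
    end: datetime,
) -> list[PlantRecord]:
    """Return only the slice of the dataset the assistant needs to see."""
    return [
        r for r in records
        if r.entity_id in relevant_entities and start <= r.timestamp <= end
    ]
```

The assistant never sees the full dataset; it only receives the focused slice this function returns, which keeps the context window small and the signal-to-noise ratio high.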

2. Views Instead of Complex Traversal Queries

One of the biggest sources of hallucinations in LLM-based systems is complex query generation. The more logic the model has to reason about to construct a query, the more chances for error.

I created five purpose-built ADX views — pre-computed, materialised views that the assistant could query directly:

Each view handled all the traversal logic internally. The assistant just needed to pick the right view and apply filters — dramatically simpler than generating traversal queries from scratch.

"View creation instead of creating complex traversal logic [was key in] avoiding hallucinations." — Manager feedback
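The payoff of this design is how little the model has to generate. The sketch below, a Python illustration rather than the product's C# code, shows the shape of the idea: the model's only choices are a view name from a fixed allowlist and some filter values, and the traversal logic stays baked into the view. The view names and filter format here are invented for illustration.

```python
# Hypothetical view names; the real system used five materialised ADX
# views with all entity-traversal logic pre-computed inside them.
ALLOWED_VIEWS = {
    "ProductionByLine",
    "DowntimeByAsset",
}

def build_view_query(view: str, filters: dict[str, str]) -> str:
    """Compose a simple KQL query over a pre-built view.

    The model only picks a view and filter values; it never generates
    join/traversal logic itself, which shrinks the space in which it
    can hallucinate an invalid query.
    """
    if view not in ALLOWED_VIEWS:
        raise ValueError(f"unknown view: {view}")
    clauses = [f'{col} == "{val}"' for col, val in sorted(filters.items())]
    query = view
    if clauses:
        query += " | where " + " and ".join(clauses)
    return query
```

Rejecting anything outside the allowlist also gives a hard guarantee: even a badly hallucinated view name fails fast instead of producing a plausible-looking but wrong query.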

3. Conversation Threading

Implementing conversation IDs so users could ask follow-up questions was conceptually simple but required careful threading logic: each conversation needed its own thread ID, and the assistant had to pick up context from previous turns consistently, without cross-contamination between users.
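The bookkeeping can be sketched as a small store that maps each (user, conversation) pair to exactly one thread. This is a hedged Python illustration, assuming an Assistants-style API where a conversation corresponds to one server-side thread; `ThreadStore` and its method names are invented for this sketch.

```python
import uuid

class ThreadStore:
    """Maps (user_id, conversation_id) -> thread_id.

    Keying on BOTH ids is what prevents cross-contamination: two users
    with the same conversation id can never share a thread.
    """

    def __init__(self) -> None:
        self._threads: dict[tuple[str, str], str] = {}

    def thread_for(self, user_id: str, conversation_id: str) -> str:
        """Return the thread for this conversation, creating one on first use."""
        key = (user_id, conversation_id)
        if key not in self._threads:
            # In the real system this would call the Assistant API to
            # create a thread; here we just mint a local id.
            self._threads[key] = str(uuid.uuid4())
        return self._threads[key]
```

Follow-up questions from the same user and conversation always land on the same thread, so the assistant sees the prior turns; a different user always gets a fresh thread.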

The HMI 2025 Demo

The target was to be ready for HMI 2025 in April, one of the most visible industrial automation events globally, where Rolls-Royce would give a live customer demonstration of our platform.

We worked with the Rolls-Royce team on their specific dataset: ingesting their data, creating custom ADX functions to match their P0 questions, and building a demo flow that showed end-to-end assistant capability — natural language → data → chart — live on stage.

Outcome: The assistant achieved accuracy exceeding 75% on the Rolls-Royce dataset, was successfully demonstrated live at HMI 2025, and made us the first team in MCI to take the Azure OpenAI Assistant API to production.

What We Shipped

Lessons

Tags: Azure OpenAI · LLM · RAG · Azure Data Explorer · C# · .NET · Prompt Engineering