In early 2025, our manufacturing AI team became the first team across all of Microsoft Cloud for Industries (MCI) to ship the Azure OpenAI Assistant API in a production product. We then demonstrated it live at HMI 2025, one of the world's largest industrial automation events, with Rolls-Royce as the customer.
This is what it took to get there.
The Starting Point: Copilot V3 Was Good, But Limited
Our existing AI layer — Copilot V3 — was a RAG-based pipeline that generated KQL queries from natural language. We'd gotten query accuracy up to 75%+, which was solid. But customers wanted more:
- Follow-up questions that remembered context from earlier in the conversation
- Ability to generate charts and visualisations directly from the AI
- Advanced reasoning, not just query generation
These weren't things we could bolt onto a query-generation pipeline. They required a fundamentally different model — the Azure OpenAI Assistant API.
The Challenge: No Playbook
Nobody in MCI had shipped the Assistant API to production before. There was no internal reference implementation, no team to ask "how did you handle X?". We were building the playbook as we went.
The three hardest problems were:
1. Manufacturing Data Is Large — Too Large for the Context Window
A manufacturing plant dataset can have hundreds of thousands of records across production, downtime, consumption, and scheduling. You can't dump all of that into a context window and ask the assistant to reason over it.
My solution: build a filtering layer that sits between the plant dataset and the assistant. Before the assistant ever sees data, the layer queries and extracts only the relevant subset — based on the user's question, the entity graph, and the time range. The assistant then operates on a focused, manageable slice.
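A minimal sketch of that filtering layer, with an illustrative record shape and field names (the production schema and entity graph are richer than this):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Record:
    entity: str        # e.g. a machine or production line from the entity graph
    domain: str        # "production", "downtime", "consumption", "scheduling"
    timestamp: datetime
    value: float

def filter_for_question(records, domains, entities, start, end):
    """Return only the slice the assistant actually needs:
    matching domain(s), entity-graph members, and time range."""
    return [
        r for r in records
        if r.domain in domains
        and r.entity in entities
        and start <= r.timestamp <= end
    ]

# Example: a question about downtime on Line-A in March.
records = [
    Record("Line-A", "downtime", datetime(2025, 3, 5), 42.0),
    Record("Line-A", "production", datetime(2025, 3, 5), 990.0),
    Record("Line-B", "downtime", datetime(2025, 3, 6), 13.0),
    Record("Line-A", "downtime", datetime(2025, 4, 1), 7.0),
]
subset = filter_for_question(
    records,
    domains={"downtime"},
    entities={"Line-A"},
    start=datetime(2025, 3, 1),
    end=datetime(2025, 3, 31),
)
# Only the first record survives; that focused slice is what the assistant sees.
```

The point of the design: the assistant never reasons over hundreds of thousands of rows, only over the pre-filtered subset.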
2. Views Instead of Complex Traversal Queries
One of the biggest sources of hallucinations in LLM-based systems is complex query generation. The more logic the model has to reason about to construct a query, the more chances for error.
I created 5 purpose-built ADX views — pre-computed, materialised views that the assistant could query directly:
- Downtime View
- Actual Production View
- Actual Consumption View
- Scheduled Production View
- Scheduled Consumption View
Each view handled all the traversal logic internally. The assistant just needed to pick the right view and apply filters — dramatically simpler than generating traversal queries from scratch.
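With the views in place, the assistant's job collapses to a routing decision plus flat filters. A sketch of that shape (the view identifiers and KQL layout here are illustrative, not the production definitions):

```python
# Map question intent to one of the five pre-computed ADX views.
VIEW_ROUTES = {
    ("downtime", "actual"): "DowntimeView",
    ("production", "actual"): "ActualProductionView",
    ("consumption", "actual"): "ActualConsumptionView",
    ("production", "scheduled"): "ScheduledProductionView",
    ("consumption", "scheduled"): "ScheduledConsumptionView",
}

def build_query(domain, kind, entity, start, end):
    """Pick the right view and emit flat KQL filters --
    no joins or graph traversal for the model to get wrong."""
    view = VIEW_ROUTES[(domain, kind)]
    return (
        f"{view}\n"
        f"| where Entity == '{entity}'\n"
        f"| where Timestamp between (datetime({start}) .. datetime({end}))"
    )

query = build_query("downtime", "actual", "Line-A", "2025-03-01", "2025-03-31")
```

Every generated query is just "view + filters", so the surface area for hallucinated joins disappears.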
"View creation instead of creating complex traversal logic [was key in] avoiding hallucinations." — Manager feedback
3. Conversation Threading
Implementing conversation IDs so users could ask follow-up questions was conceptually simple but required careful threading logic: each conversation needed its own thread ID, and the assistant had to pick up context from previous turns consistently, with no cross-contamination between users.
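The isolation logic can be sketched as a registry keyed on user and conversation (class and field names are hypothetical; in production the thread would be created via the Assistant API rather than minted locally):

```python
import uuid

class ThreadRegistry:
    """Maps (user_id, conversation_id) to a dedicated assistant thread,
    so follow-up turns reuse context and users never share a thread."""
    def __init__(self):
        self._threads = {}

    def thread_for(self, user_id, conversation_id):
        key = (user_id, conversation_id)
        if key not in self._threads:
            # Stand-in for an Assistant API thread-creation call:
            # here we just mint an opaque ID.
            self._threads[key] = f"thread_{uuid.uuid4().hex}"
        return self._threads[key]

registry = ThreadRegistry()
t1 = registry.thread_for("alice", "conv-1")
t2 = registry.thread_for("alice", "conv-1")  # same conversation -> same thread
t3 = registry.thread_for("bob", "conv-1")    # different user -> isolated thread
```

Keying on the pair, not the conversation ID alone, is what prevents two users with colliding conversation IDs from reading each other's context.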
The HMI 2025 Demo
The target was to be ready for HMI 2025 in April, where Rolls-Royce would give a live customer demonstration of our platform.
We worked with the Rolls-Royce team on their specific dataset: ingesting their data, creating custom ADX functions to match their P0 questions, and building a demo flow that showed end-to-end assistant capability (natural language → data → chart) live on stage.
What We Shipped
- Filtering layer — data subsetting before assistant context
- Conversation IDs — persistent multi-turn conversations
- 5 ADX materialised views — pre-computed data for each domain
- File search capability — users can attach domain-specific files; assistant uses similarity search to include relevant content
- Configurable base prompt — CRUD APIs for users to add custom instructions to the assistant's base prompt
- Validate assistant job — automated scoring of assistant response quality
- L0 and L2 tests — full test coverage for all assistant flows
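The validate-assistant job's scoring can be illustrated with a toy rubric check (the production job's scoring criteria are richer; `score_response` and its "required facts" input are assumptions made for this sketch):

```python
def score_response(response: str, must_mention: list[str]) -> float:
    """Toy quality score: the fraction of required facts the
    assistant's response actually mentions."""
    if not must_mention:
        return 1.0
    hits = sum(1 for fact in must_mention if fact.lower() in response.lower())
    return hits / len(must_mention)

score = score_response(
    "Line-A had 42 minutes of downtime in March, mostly from changeovers.",
    must_mention=["Line-A", "42", "March"],
)
# All three required facts are present, so the score is 1.0.
```

Running a job like this over a fixed question set gives a regression signal for assistant quality, in the same spirit as the L0/L2 tests above.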
Lessons
- Design for the model's limitations, not its capabilities. The assistant is powerful, but hallucinations happen at the boundary of what it knows. Every design decision I made was about reducing what the model had to reason about from scratch.
- Get customer data early. We would have found more edge cases before HMI if we'd had Rolls-Royce data in the system sooner. Ship early, test on real data.
- Materialised views are underrated. Pre-computing common query shapes dramatically reduces LLM hallucination and query failure rates. The model's job should be filtering and reasoning, not constructing complex joins.