In early 2025, our manufacturing AI team became the first team across all of Microsoft Cloud for Industries (MCI) to ship the Azure OpenAI Assistant API in a production product. We then demonstrated it live at HMI 2025, one of the world's largest industrial automation events, with Rolls-Royce as the customer.
This is what it took to get there.
The Starting Point: Copilot V3 Was Good, But Limited
Our existing AI layer — Copilot V3 — was a RAG-based pipeline that generated KQL queries from natural language. We'd gotten query accuracy up to 75%+, which was solid. But customers wanted more:
- Follow-up questions that remembered context from earlier in the conversation
- Ability to generate charts and visualisations directly from the AI
- Advanced reasoning, not just query generation
These weren't things we could bolt onto a query-generation pipeline. They required a fundamentally different model — the Azure OpenAI Assistant API.
The Challenge: No Playbook
Nobody in MCI had shipped the Assistant API to production before. There was no internal reference implementation, no team to ask "how did you handle X?". We were building the playbook as we went.
The three hardest problems were:
1. Manufacturing Data Is Large — Too Large for the Context Window
A manufacturing plant dataset can have hundreds of thousands of records across production, downtime, consumption, and scheduling. You can't dump all of that into a context window and ask the assistant to reason over it.
My solution: build a filtering layer that sits between the plant dataset and the assistant. Before the assistant ever sees data, the layer queries and extracts only the relevant subset — based on the user's question, the entity graph, and the time range. The assistant then operates on a focused, manageable slice.
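A minimal sketch of that filtering layer, with an illustrative record shape and field names (the production schema and entity graph are richer than this):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Record:
    entity: str        # e.g. a machine or production line from the entity graph
    domain: str        # "production", "downtime", "consumption", "scheduling"
    timestamp: datetime
    value: float

def filter_for_question(records, domains, entities, start, end):
    """Return only the slice the assistant actually needs:
    matching domain(s), entity-graph members, and time range."""
    return [
        r for r in records
        if r.domain in domains
        and r.entity in entities
        and start <= r.timestamp <= end
    ]

# Example: a question about downtime on Line-A in March.
records = [
    Record("Line-A", "downtime", datetime(2025, 3, 5), 42.0),
    Record("Line-A", "production", datetime(2025, 3, 5), 990.0),
    Record("Line-B", "downtime", datetime(2025, 3, 6), 13.0),
    Record("Line-A", "downtime", datetime(2025, 4, 1), 7.0),
]
subset = filter_for_question(
    records,
    domains={"downtime"},
    entities={"Line-A"},
    start=datetime(2025, 3, 1),
    end=datetime(2025, 3, 31),
)
# Only the first record survives; that focused slice is what the assistant sees.
```

The point of the design: the assistant never reasons over hundreds of thousands of rows, only over the pre-filtered subset.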
2. Views Instead of Complex Traversal Queries
One of the biggest sources of hallucinations in LLM-based systems is complex query generation. The more logic the model has to reason about to construct a query, the more chances for error.
I created 5 purpose-built ADX views — pre-computed, materialised views that the assistant could query directly:
- Downtime View
- Actual Production View
- Actual Consumption View
- Scheduled Production View
- Scheduled Consumption View
Each view handled all the traversal logic internally. The assistant just needed to pick the right view and apply filters — dramatically simpler than generating traversal queries from scratch.
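With the views in place, the assistant's job collapses to a routing decision plus flat filters. A sketch of that shape (the view identifiers and KQL layout here are illustrative, not the production definitions):

```python
# Map question intent to one of the five pre-computed ADX views.
VIEW_ROUTES = {
    ("downtime", "actual"): "DowntimeView",
    ("production", "actual"): "ActualProductionView",
    ("consumption", "actual"): "ActualConsumptionView",
    ("production", "scheduled"): "ScheduledProductionView",
    ("consumption", "scheduled"): "ScheduledConsumptionView",
}

def build_query(domain, kind, entity, start, end):
    """Pick the right view and emit flat KQL filters --
    no joins or graph traversal for the model to get wrong."""
    view = VIEW_ROUTES[(domain, kind)]
    return (
        f"{view}\n"
        f"| where Entity == '{entity}'\n"
        f"| where Timestamp between (datetime({start}) .. datetime({end}))"
    )

query = build_query("downtime", "actual", "Line-A", "2025-03-01", "2025-03-31")
```

Every generated query is just "view + filters", so the surface area for hallucinated joins disappears.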
"View creation instead of creating complex traversal logic [was key in] avoiding hallucinations." — Manager feedback
3. Conversation Threading
Implementing conversation IDs so users could ask follow-up questions was conceptually simple but required careful threading logic: each conversation needed its own thread ID, and the assistant had to pick up context from previous turns consistently, with no cross-contamination between users.
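The isolation logic can be sketched as a registry keyed on user and conversation (class and field names are hypothetical; in production the thread would be created via the Assistant API rather than minted locally):

```python
import uuid

class ThreadRegistry:
    """Maps (user_id, conversation_id) to a dedicated assistant thread,
    so follow-up turns reuse context and users never share a thread."""
    def __init__(self):
        self._threads = {}

    def thread_for(self, user_id, conversation_id):
        key = (user_id, conversation_id)
        if key not in self._threads:
            # Stand-in for an Assistant API thread-creation call:
            # here we just mint an opaque ID.
            self._threads[key] = f"thread_{uuid.uuid4().hex}"
        return self._threads[key]

registry = ThreadRegistry()
t1 = registry.thread_for("alice", "conv-1")
t2 = registry.thread_for("alice", "conv-1")  # same conversation -> same thread
t3 = registry.thread_for("bob", "conv-1")    # different user -> isolated thread
```

Keying on the pair, not the conversation ID alone, is what prevents two users with colliding conversation IDs from reading each other's context.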
The HMI 2025 Demo
The target was to be ready for HMI 2025 in April, where Rolls-Royce would give a live customer demonstration of our platform.
We worked with the Rolls-Royce team on their specific dataset: ingesting their data, creating custom ADX functions to match their P0 questions, and building a demo flow that showed end-to-end assistant capability (natural language → data → chart) live on stage.
What We Shipped
- Filtering layer — data subsetting before assistant context
- Conversation IDs — persistent multi-turn conversations
- 5 ADX materialised views — pre-computed data for each domain
- File search capability — users can attach domain-specific files; assistant uses similarity search to include relevant content
- Configurable base prompt — CRUD APIs for users to add custom instructions to the assistant's base prompt
- Validate assistant job — automated scoring of assistant response quality
- L0 and L2 tests — full test coverage for all assistant flows
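The validate-assistant job's scoring can be illustrated with a toy rubric check (the production job's scoring criteria are richer; `score_response` and its "required facts" input are assumptions made for this sketch):

```python
def score_response(response: str, must_mention: list[str]) -> float:
    """Toy quality score: the fraction of required facts the
    assistant's response actually mentions."""
    if not must_mention:
        return 1.0
    hits = sum(1 for fact in must_mention if fact.lower() in response.lower())
    return hits / len(must_mention)

score = score_response(
    "Line-A had 42 minutes of downtime in March, mostly from changeovers.",
    must_mention=["Line-A", "42", "March"],
)
# All three required facts are present, so the score is 1.0.
```

Running a job like this over a fixed question set gives a regression signal for assistant quality, in the same spirit as the L0/L2 tests above.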
Lessons
- Design for the model's limitations, not its capabilities. The assistant is powerful, but hallucinations happen at the boundary of what it knows. Every design decision I made was about reducing what the model had to reason about from scratch.
- Get customer data early. We would have found more edge cases before HMI if we'd had Rolls-Royce data in the system sooner. Ship early, test on real data.
- Materialised views are underrated. Pre-computing common query shapes dramatically reduces LLM hallucination and query failure rates. The model's job should be filtering and reasoning, not constructing complex joins.