When I joined the Manufacturing Data Platform team at Microsoft, we were ingesting IoT data for customers with 330,000 nodes and 450,000 relationships. The pipeline worked, but it was slow, expensive, and had one major bottleneck hiding in plain sight: Azure Digital Twins (ADT).

This is the story of how I removed it, what I learned, and the numbers we ended up with.

The Problem: ADT Was Doing Too Much

Azure Digital Twins sat as an intermediate layer in the ingestion pipeline. Every node and relationship we ingested had to pass through ADT before landing in Azure Data Explorer (ADX), which was our actual query and analytics layer.

ADT was also the most expensive resource in the entire solution, and customers were paying significantly for it. More importantly, it was slowing us down in ways that became harder to ignore as customer datasets grew.

The core issue: ADT was designed to model digital twins, not to act as a high-throughput ingestion intermediary. We were using it for something it wasn't optimised for, and paying the performance and cost penalty.

The Hypothesis

We had already been storing everything in ADX anyway. ADT was essentially a pass-through that added latency and cost without providing meaningful analytical value in our architecture. The hypothesis was simple: what if we wrote directly to ADX and skipped ADT entirely?

I'd done a prior POC showing ADX was 5–6x faster than ADT for graph queries. That gave us confidence the data model could live entirely in ADX. Now the question was whether we could redesign the ingestion flows to match.

The Work

There were three distinct ingestion flows to redesign: node creation, relationship creation, and OPC-UA data ingestion. Each had its own quirks.

Node Ingestion

Straightforward to redesign — instead of creating twins in ADT then syncing to ADX, we wrote directly to ADX property tables. The main work was mapping the DTDL schema correctly and ensuring the batching logic was preserved.
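To make the direct write more concrete, here is a minimal sketch of flattening a twin document into property-table rows and splitting those rows into batches. It is in Python purely for illustration. The row columns (`TwinId`, `ModelId`, `Key`, `Value`), the helper names, and the batch size are assumptions rather than the schema we actually shipped; only `$dtId` and `$metadata.$model` come from the standard twin JSON shape.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def twin_to_rows(twin: Dict[str, Any]) -> List[Row]:
    """Flatten one twin document into one row per property.

    The output columns are a hypothetical property-table layout.
    """
    twin_id = twin["$dtId"]
    model_id = twin.get("$metadata", {}).get("$model", "")
    rows: List[Row] = []
    for key, value in twin.items():
        # Keys starting with "$" are twin metadata, not model properties.
        if key.startswith("$"):
            continue
        rows.append(
            {"TwinId": twin_id, "ModelId": model_id, "Key": key, "Value": value}
        )
    return rows


def batched(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

Each batch would then be handed to whatever ADX ingestion client the pipeline uses; that call is deliberately left out because it depends on the client library and table mapping.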

Relationship Ingestion — Where the Real Gain Was

This is where the 54% improvement came from. The original flow was processing relationships one batch at a time through ADT, which involved multiple round trips. The key fix was modifying the GROUP BY clause logic to batch-process multiple twins in a single ADX call.

Instead of: for each relationship → call ADT → sync to ADX

We did: group relationships → single batch call to ADX

This eliminated the relationship ingestion bottleneck almost entirely.
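The shape of that change can be sketched roughly as follows. This is an illustration in Python, not the production code: `Relationship`, `group_by_source`, `ingest_batched`, and the `send` callback are hypothetical names, and grouping by source twin is an assumption about how the groups were formed. The point is only that the number of calls drops from one per relationship to one per batch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Relationship(NamedTuple):
    source_id: str
    target_id: str
    name: str


def group_by_source(rels: List[Relationship]) -> Dict[str, List[Relationship]]:
    """Group relationships by their source twin (assumed grouping key)."""
    groups: Dict[str, List[Relationship]] = defaultdict(list)
    for rel in rels:
        groups[rel.source_id].append(rel)
    return dict(groups)


def ingest_batched(
    rels: List[Relationship],
    send: Callable[[List[Relationship]], None],
    max_batch_size: int,
) -> int:
    """Pack whole per-twin groups into batches and return the number of calls.

    A single group larger than max_batch_size is still sent as one batch,
    so that all relationships of a twin travel together.
    """
    calls = 0
    batch: List[Relationship] = []
    for group in group_by_source(rels).values():
        # Flush before adding a group that would overflow the current batch.
        if batch and len(batch) + len(group) > max_batch_size:
            send(batch)
            calls += 1
            batch = []
        batch.extend(group)
    if batch:
        send(batch)
        calls += 1
    return calls
```

With four relationships across three source twins and `max_batch_size=3`, this makes two calls instead of four. At hundreds of thousands of relationships, the saved round trips are where the time goes.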

Infrastructure and Configuration

I also updated the Bicep IaC templates to reflect the new architecture: the ADT resources were removed and the ADX configuration was updated. In addition, I tuned configuration parameters (RequestBatchSize, PollingInterval, partitionCount) based on benchmarking data I had collected earlier.
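One simple way to turn benchmark runs into a parameter choice is to record the duration per configuration and pick the fastest. The sketch below assumes that approach; it is not a description of the tooling I used. The field names mirror the three parameters, the unit for the polling interval is an assumption, and every number in the example is a placeholder rather than a measurement from this project.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IngestionConfig:
    request_batch_size: int
    polling_interval_seconds: float  # unit is an assumption
    partition_count: int


def fastest(results: Dict[IngestionConfig, float]) -> IngestionConfig:
    """Return the configuration with the lowest measured duration."""
    if not results:
        raise ValueError("no benchmark results supplied")
    return min(results, key=results.get)


# Placeholder durations in minutes, for illustration only.
example_results = {
    IngestionConfig(500, 5.0, 4): 21.0,
    IngestionConfig(1000, 5.0, 8): 18.5,
    IngestionConfig(2000, 10.0, 8): 19.2,
}
best = fastest(example_results)
```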

Validation

Before shipping, I validated correctness across all 330k nodes and 450k relationships — zero validation errors. The concern was always: does removing ADT break anything downstream? Answer: no, because ADX was already the source of truth for all queries.
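A correctness check of this kind can be as simple as comparing the identifiers you expected to ingest against the identifiers read back from ADX. The following is a minimal sketch under that assumption; `ValidationReport` and `validate_ids` are hypothetical names, and the read-back query itself is omitted.

```python
from typing import Iterable, NamedTuple, Set


class ValidationReport(NamedTuple):
    missing: Set[str]      # expected but not found after ingestion
    unexpected: Set[str]   # found after ingestion but never expected

    @property
    def ok(self) -> bool:
        return not self.missing and not self.unexpected


def validate_ids(expected: Iterable[str], actual: Iterable[str]) -> ValidationReport:
    """Compare two collections of IDs and report the differences."""
    expected_set, actual_set = set(expected), set(actual)
    return ValidationReport(
        missing=expected_set - actual_set,
        unexpected=actual_set - expected_set,
    )
```

Running the same comparison once for node IDs and once for relationship IDs gives a report that is empty on both sides when nothing was lost or duplicated under a different key.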

Results

Node ingestion: 39% faster (18 min → 11 min)
Relationship ingestion: 54% faster (39 min → 18 min)
ADT cost: $0 (the most expensive resource eliminated)
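For anyone checking the arithmetic, the percentages are the reduction in wall-clock time relative to the original duration. A small sketch (the function name is mine):

```python
def percent_faster(before_min: float, after_min: float) -> float:
    """Reduction in duration as a percentage of the original duration."""
    if before_min <= 0:
        raise ValueError("before_min must be positive")
    return (before_min - after_min) / before_min * 100


print(round(percent_faster(18, 11)))  # node ingestion: 39
print(round(percent_faster(39, 18)))  # relationship ingestion: 54
```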
Manager feedback: "His passion for picking up challenges with ADT removal work... tackling them systematically and delivering impact with consistency."

What I'd Tell Someone Doing This

Tags: Azure, Azure Data Explorer, IoT, Performance, C#, Bicep, KQL