Manufacturing

Extracting Ducts from HVAC Drawings: A Vision-LLM Reference Build

Computer Vision, Vision LLMs, Technical Drawings, HVAC, Multi-Model AI, Reference Build

The Challenge

Facilities teams, MEP consultants, and industrial project managers spend hours reading mechanical drawings. Before any retrofit, compliance review, or equipment change, someone has to go through the drawing and manually count ducts, classify them by pressure, and flag the high-pressure runs that need special attention. On a complex floor plan, this is a full afternoon of work, and it has to happen before anything else moves.

The obvious solution is "throw a vision model at it." That doesn't work out of the box, because general-purpose vision LLMs don't know what matters on a technical drawing. They see lines and text, but they don't know that a 14" round duct running from a kitchen hood falls in the HIGH pressure class and triggers different fire-code requirements, or that a branch runout under 10" is a LOW pressure item you can batch-review. Without domain encoding, the output is either too generic to trust or too verbose to use.

Our Solution

We built a reference implementation that accepts an uploaded HVAC mechanical PDF, sends it through a vision LLM with HVAC-specific prompting, and overlays the results directly on the drawing as color-coded, clickable annotations. The build is deliberately end-to-end (upload, analysis, rendering, and interaction) because the engineering challenge is not "call the vision API." It's making the output trustworthy enough that an engineer will actually use it.

The analysis layer runs through a multi-model fallback chain: Claude Sonnet 4.6 first, Gemini 2.0 Flash as the fallback, and GPT-4o-mini as the budget option. Each model is tried in order, and the first to return a non-empty duct list wins. The response records which model was actually used and which ones failed, so an operator can audit output quality over time without guessing.

The classification layer encodes HVAC domain knowledge directly: duct service and dimensions drive the pressure class (HIGH for kitchen hood exhaust and grease ducts, MEDIUM for trunks over 10 inches, LOW for branch runouts under 10 inches), with pressure ratings mapped to water-gauge thresholds. Round and rectangular ducts get different visual treatments on the canvas: round ducts show end-cap circles, and rectangular ducts scale stroke width by size, so an engineer can read the drawing at a glance.

The rendering layer uses react-konva for interactive canvas overlays. Each detected duct is a clickable polyline with a dimension label, pressure class, pressure rating, and model confidence. A sidebar groups ducts by pressure class with collapsible sections, so reviewers can work top-down: high-pressure runs first, then medium, then low.

The backend streams progress over SSE during analysis, because a 30-second vision call with no feedback feels broken. Users see which model is being tried, which ones returned an empty result, and which ones failed and why.
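The fallback chain and its progress reporting can be sketched roughly as follows. This is a minimal illustration, not the build's actual code: the adapter signature (`ModelFn`), the `AnalysisResult` fields, the event dictionaries, and the `on_progress` callback are all hypothetical names introduced here. The only fixed detail is the SSE wire format, where each event is a `data:` line followed by a blank line.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical contract: a model adapter takes PDF bytes and returns duct dicts.
ModelFn = Callable[[bytes], List[dict]]
ProgressFn = Callable[[dict], None]


@dataclass
class AnalysisResult:
    ducts: List[dict]
    model_used: Optional[str]                    # None if every model failed
    failures: Dict[str, str] = field(default_factory=dict)  # model -> reason


def sse_frame(event: dict) -> str:
    """Serialize one progress event as a Server-Sent Events data frame."""
    return f"data: {json.dumps(event)}\n\n"


def analyze_with_fallback(
    pdf: bytes,
    chain: List[Tuple[str, ModelFn]],
    on_progress: ProgressFn = lambda event: None,
) -> AnalysisResult:
    """Try each model in order; the first non-empty duct list wins."""
    failures: Dict[str, str] = {}
    for name, call_model in chain:
        on_progress({"model": name, "status": "trying"})
        try:
            ducts = call_model(pdf)
        except Exception as exc:  # any adapter error moves us to the next model
            failures[name] = f"error: {exc}"
            on_progress({"model": name, "status": "failed", "reason": str(exc)})
            continue
        if ducts:
            on_progress({"model": name, "status": "succeeded"})
            return AnalysisResult(ducts, name, failures)
        failures[name] = "empty duct list"
        on_progress({"model": name, "status": "empty"})
    return AnalysisResult([], None, failures)
```

In a web backend, `on_progress` would typically write `sse_frame(event)` to the open response stream. Keeping the failure reasons on the result is what lets an operator audit, after the fact, which model produced a given set of annotations.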

What This Proves

Vision LLMs work on technical drawings when you encode the domain. The classification schema, the dimension extraction rules, the pressure mapping — none of that is the model's job. That's our job, before the model ever sees the PDF.
Multi-model fallback is a production requirement, not a nicety. Vision models fail in different ways on different drawings. A single-model pipeline breaks the first time Claude returns an empty duct list for a drawing that Gemini handles fine.
Confidence scores change how engineers use the output. Every duct annotation carries a confidence value. Engineers self-triage — high-confidence detections get a scan, low-confidence ones get a second look. The tool becomes an assistant, not an oracle.
The pattern generalizes. The same architecture — vision LLM + domain-encoded classification + interactive overlay — applies to P&IDs, electrical single-line diagrams, and equipment layouts. The domain schema changes; the engineering pattern stays.
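To make "domain-encoded classification" concrete, here is a small sketch of the pressure-class rule as plain code that runs outside the model. The rule itself comes from the description above; several details are assumptions made for illustration: the `Duct` fields, the service labels, using the largest dimension of a rectangular duct, and treating a duct of exactly 10 inches as LOW (the source only says "over" and "under"). The water-gauge rating mapping is omitted because its values are not given here.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class PressureClass(str, Enum):
    # Definition order doubles as review order: HIGH first, LOW last.
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Hypothetical service labels; a real schema would come from the senior engineer.
HIGH_PRESSURE_SERVICES = {"kitchen_hood_exhaust", "grease"}


@dataclass(frozen=True)
class Duct:
    shape: str                    # "round" or "rectangular"
    dims_in: Tuple[float, ...]    # (diameter,) or (width, height), in inches
    service: str = "supply"       # hypothetical default


def classify(duct: Duct, trunk_threshold_in: float = 10.0) -> PressureClass:
    """Service overrides size; otherwise size decides MEDIUM vs LOW."""
    if duct.service in HIGH_PRESSURE_SERVICES:
        return PressureClass.HIGH
    size = max(duct.dims_in)  # assumption: the largest dimension governs
    if size > trunk_threshold_in:
        return PressureClass.MEDIUM
    return PressureClass.LOW  # assumption: exactly 10 inches lands here


def group_for_review(ducts: Iterable[Duct]) -> Dict[PressureClass, List[Duct]]:
    """Group ducts by class, keyed in top-down review order."""
    groups: Dict[PressureClass, List[Duct]] = {cls: [] for cls in PressureClass}
    for duct in ducts:
        groups[classify(duct)].append(duct)
    return groups
```

Swapping in a different domain (P&IDs, single-line diagrams) would mean replacing the enum, the service set, and the body of `classify`, while the grouping and everything downstream stay the same.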

What a Real Engagement Looks Like

Adapting this reference to a specific drawing domain takes 4–6 weeks:

Week 1: Domain encoding. Work with the client's senior engineer to capture the classification rules that matter for their drawings — the equivalent of the pressure-class schema in this build, but for P&IDs, single-lines, or layouts.

Weeks 2–3: Prompt and schema tuning. Iterate on the vision prompt until the model reliably extracts the right elements with the right attributes. Calibrate confidence thresholds against a labeled set of the client's real drawings.

Week 4: Canvas overlay and interaction design. Configure the visual treatments (colors, labels, shape cues) to match how the client's engineers actually read drawings.

Weeks 5–6: Integration and review. Connect to the client's drawing storage (SharePoint, Procore, Autodesk Construction Cloud, or a custom CMS). Run an evaluation round with the senior engineer. Tune, re-evaluate, ship.

Because the engineering pattern is already proven, the engagement focuses on what matters: capturing the client's domain knowledge and getting the tool into engineers' hands.
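The confidence calibration step in Weeks 2–3 could look something like the sketch below. It is one plausible approach, not the method used in the build: given labeled detections as `(confidence, is_correct)` pairs, it picks the lowest cutoff at which precision among detections at or above the cutoff still meets a target. The function names, the pair format, the `"confidence"` key, and the default target are assumptions for illustration.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Sequence, Tuple


def calibrate_threshold(
    labeled: Sequence[Tuple[float, bool]],
    target_precision: float = 0.9,
) -> Optional[float]:
    """Return the lowest confidence cutoff whose precision meets the target.

    Precision at a cutoff = correct detections / all detections with
    confidence >= cutoff. Returns None if no cutoff meets the target.
    """
    ranked = sorted(labeled, key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    correct = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        correct += int(ok)
        # Evaluate only after the last detection sharing this confidence,
        # so tied scores are treated as one group.
        if i < len(ranked) and ranked[i][0] == conf:
            continue
        if correct / i >= target_precision:
            best = conf
    return best


def triage(ducts: List[dict], threshold: float) -> Dict[str, List[dict]]:
    """Split detections into a quick-scan pile and a second-look pile."""
    piles: Dict[str, List[dict]] = {"scan": [], "second_look": []}
    for duct in ducts:
        key = "scan" if duct["confidence"] >= threshold else "second_look"
        piles[key].append(duct)
    return piles
```

Precision is not monotonic in the cutoff, which is why the loop checks every distinct confidence value instead of stopping at the first one that falls short.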

Outcomes

3 models: Vision-LLM fallback chain

3 pressure classes: HVAC-specific classification

SSE streaming: Real-time analysis progress

Interactive overlay: Click-through duct detail
