Least-cost routing, reborn.
For decades, telecom carriers have picked the cheapest route that clears the quality bar, call by call. XSI-AIMS Compass™ applies that discipline to model calls. Declared requirements in, cheapest qualifying model out, every decision on the record.
The oldest routing discipline in telecom.
Walk into a Tier 2 carrier's switching office around 2005 and somewhere (taped beside a console, or burned into the softswitch database) lives the least-cost routing (LCR) table. Every termination route the carrier can buy is in it: the per-minute rate, the quality history (answer-seizure ratio, post-dial delay), the time-of-day bands, the failover order. When a call hits the switch, nobody asks which carrier the engineers prefer. The switch asks what this call needs — destination, quality floor, margin — and picks the cheapest route that clears the bar. The deck updates weekly (daily, on contested routes). Hardcoding a carrier would have been an absurdity no operator could afford.
Agent fleets route like it's 1980.
Most agent deployments in 2026 do the opposite. Model choice is a constant in the code — one flagship model for every call, from the 40-token classification ping to the 150,000-token contract review. The spread that constant ignores is wide. As of June 11, 2026, published per-million-token input list prices across the three major vendors' current model lineups run from $0.10 (Gemini 2.5 Flash-Lite) to $10 (Claude Fable 5), a 100× spread. Output rates run $0.40 to $50 — wider still, at 125×. A carrier facing a 100× spread across termination routes would route per call without a second thought. An engineering team facing the same spread across model vendors mostly ships a constant.
Declare the call — not the model.
XSI-AIMS Compass is the commercial model router for the XSI-AIMS™ standard — the Agent Instrumentation and Management Specification, an open standard for agent governance that XSI published June 12. Register provider keys once: Anthropic, OpenAI, Google, locally served, anything that speaks a known protocol. Compass profiles each registered model's cost and capability surface. Then, per call, the agent declares requirements instead of naming a model — max cost, required tool support, a latency budget, JSON-mode, minimum context window, modality, an adversarial-challenge round for safety-critical calls, vendor exclusions or pinned overrides. Eight dimensions. A requirements vector. Compass picks the cheapest registered model that meets all of them.
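The selection rule described above (filter on every declared requirement, then take the cheapest survivor) can be sketched in a few lines. This is a minimal illustration, not Compass's scoring policy, which is not public. Every name here (`ModelProfile`, `Requirements`, `first_failure`, `route`), the two sample models, and the choice to let a pin bypass scoring are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    # Hypothetical profile fields; the real Compass profile is not public.
    name: str
    vendor: str
    in_rate: float          # USD per million input tokens
    out_rate: float         # USD per million output tokens
    context_window: int     # tokens
    latency_s: float        # typical response latency, seconds
    tools: bool
    json_mode: bool
    modalities: frozenset


@dataclass(frozen=True)
class Requirements:
    # One field per declared dimension; vendor exclusions and pinned
    # overrides are one dimension in the text but need two fields here.
    max_cost_usd: Optional[float] = None
    tools: bool = False
    latency_budget_s: Optional[float] = None
    json_mode: bool = False
    min_context: int = 0
    modality: str = "text"
    adversarial_challenge: bool = False  # acted on by the caller, not the filter
    excluded_vendors: frozenset = frozenset()
    pinned_model: Optional[str] = None


def call_cost(m: ModelProfile, in_tok: int, out_tok: int) -> float:
    """USD for one call at the model's per-million-token rates."""
    return (in_tok * m.in_rate + out_tok * m.out_rate) / 1_000_000


def first_failure(m, req, in_tok, out_tok) -> Optional[str]:
    """Name of the first requirement the model misses, or None if it qualifies."""
    if m.vendor in req.excluded_vendors:
        return "vendor_excluded"
    if req.tools and not m.tools:
        return "tools"
    if req.json_mode and not m.json_mode:
        return "json_mode"
    if m.context_window < req.min_context:
        return "min_context"
    if req.modality not in m.modalities:
        return "modality"
    if req.latency_budget_s is not None and m.latency_s > req.latency_budget_s:
        return "latency_budget"
    if req.max_cost_usd is not None and call_cost(m, in_tok, out_tok) > req.max_cost_usd:
        return "max_cost"
    return None


def route(models, req, in_tok, out_tok) -> ModelProfile:
    if req.pinned_model is not None:
        # Assumption: a pin bypasses scoring entirely.
        pinned = next((m for m in models if m.name == req.pinned_model), None)
        if pinned is None:
            raise LookupError("pinned model is not registered")
        return pinned
    qualifiers = [m for m in models if first_failure(m, req, in_tok, out_tok) is None]
    if not qualifiers:
        raise LookupError("no registered model meets the declared requirements")
    # Cost only ranks models that already cleared every requirement.
    return min(qualifiers, key=lambda m: call_cost(m, in_tok, out_tok))


# Two made-up registered models, priced like the tiers in the scenario below.
FIELD = [
    ModelProfile("small", "vendor-a", 1.0, 5.0, 200_000, 0.8, True, True, frozenset({"text"})),
    ModelProfile("frontier", "vendor-b", 5.0, 25.0, 1_000_000, 3.0, True, True, frozenset({"text", "image"})),
]

cheap_pick = route(FIELD, Requirements(tools=True, latency_budget_s=2.0), 2_000, 500)
image_pick = route(FIELD, Requirements(modality="image"), 2_000, 500)
print(cheap_pick.name, image_pick.name)
```

Under these assumptions the first call routes to the cheaper model because it clears every declared floor, and the second routes to the expensive one because the cheaper model fails the modality requirement. Registering a third, cheaper model that clears the same floors would change the result without touching the calling code.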
Declaring needs instead of naming models also inverts the maintenance burden. When a vendor ships a cheaper model that clears your existing floors, fleet costs fall with no code change — the same way a carrier's costs fell when a cheaper route entered the deck. The agent's code encodes what its calls require (stable for years) rather than which model to use (obsolete within a quarter).
How Compass scores a vector against the registered field is XSI's rate deck — implementation IP, the way a carrier's deck was the carrier's margin. The contract is open. The policy is the product.
What a constant costs.
XSI has published no XSI-AIMS Compass benchmarks, and what follows is not one: it is arithmetic at published vendor list prices. Take a 10-million-call monthly fleet at 2,000 input and 500 output tokens per call (a midsize agent estate with an ordinary mix of classification, extraction, drafting, and review traffic).
Illustrative scenario — list-price arithmetic, not a benchmark.
Workload: 10,000,000 calls/month · 2,000 input + 500 output tokens per call
All-frontier (every call at $5 in / $25 out per MTok — Claude Opus 4.8 list rate):
(2,000 × $5 + 500 × $25) / 1,000,000 = $0.0225 per call
10,000,000 × $0.0225 = $225,000 / month
Requirements-routed (70% of calls clear at $1 in / $5 out — Claude Haiku 4.5 list rate):
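(2,000 × $1 + 500 × $5) / 1,000,000 = $0.0045 per call
7,000,000 × $0.0045 = $31,500
3,000,000 × $0.0225 = $67,500 (the remaining 30%, at the frontier rate)
$31,500 + $67,500 = $99,000 / month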
Difference: $126,000 / month (56%) — same workload, same prompts.
Prices: published vendor list rates as of June 11, 2026.
Fifty-six percent off the same workload: not from discounts, not from caching, from routing. The 30% of calls that declared frontier requirements still got frontier models. The dollar figure scales linearly with call volume and tokens per call; the percentage is set by the share of calls that genuinely needs the premium route and by the price gap between the two tiers (here 70% of calls at one-fifth the per-call cost, so 0.70 × 0.80 = 56%). And the within-model levers still stack on top: prompt caching (cache hits price at 0.1× base input on Anthropic's published multipliers) and batch tiers (50% off at all three vendors) discount whichever model the call routed to.
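The list-price arithmetic in this scenario is easy to re-run with your own volume, token shape, and routing share. The sketch below reproduces the figures above using `decimal.Decimal` to avoid floating-point drift; the function names are illustrative, and the rates are the list prices quoted in the scenario, not live data.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def per_call_cost(in_tok, out_tok, in_rate, out_rate):
    """USD per call at per-million-token list rates."""
    return (in_tok * Decimal(in_rate) + out_tok * Decimal(out_rate)) / MTOK


def routed_monthly_cost(calls, cheap_share, cheap_cost, frontier_cost):
    """Monthly USD when `cheap_share` of calls clear on the cheaper tier."""
    cheap_calls = calls * Decimal(cheap_share)
    return cheap_calls * cheap_cost + (calls - cheap_calls) * frontier_cost


CALLS = 10_000_000
frontier = per_call_cost(2_000, 500, "5", "25")   # frontier tier from the scenario
cheap = per_call_cost(2_000, 500, "1", "5")       # cheaper tier from the scenario

all_frontier = CALLS * frontier
routed = routed_monthly_cost(CALLS, "0.70", cheap, frontier)
saving = all_frontier - routed
saving_share = saving / all_frontier

# Same percentage without the volume: share routed cheap x relative price gap.
closed_form = Decimal("0.70") * (1 - cheap / frontier)

print(f"all-frontier: ${int(all_frontier):,} / month")
print(f"routed:       ${int(routed):,} / month")
print(f"difference:   ${int(saving):,} / month ({saving_share:.0%})")
```

Changing `"0.70"` is the quickest way to see how sensitive the result is to workload mix: at a cheap-tier share of zero the saving is zero, whatever the volume.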
Cheaper is not worse.
The objection writes itself — cheap models give bad answers. Two responses. First, requirements-scored routing never trades below the declared floor: a call that declares tool support, a 200k context window, and a two-second latency budget routes only to models that meet all three. Cost breaks ties among qualifiers — it never overrides a requirement. Second, the published evidence says harness structure can move outcomes more than model premium — on the workloads they measured. A University of Wisconsin–Madison study (arXiv:2604.13151, April 2026) measured +30–37 percentage points of task success from structured harness state alone, with no weight changes (Gemini-3.1-Flash-Lite went from 51.9% to 88.9% on their benchmark environments). That model lists at $0.25 per million input tokens. Their result, not an XSI-AIMS benchmark — and exactly the kind of result that makes requirements-routing pay.
For the calls where a wrong answer costs real money, Compass spends more on purpose. Declare an adversarial-challenge round and the call runs a primary model plus a different-vendor challenge model in parallel — disagreement surfaces to the agent before it commits. Two inferences instead of one, for the calls where a wrong answer costs more than two inferences.
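A minimal sketch of that challenge round, assuming each model is reachable as a plain callable that takes a prompt and returns text. The names `run_with_challenge` and `ChallengeResult` are hypothetical, and the default agreement check (case-insensitive string equality) is a placeholder; a real deployment would need a comparison suited to the task.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class ChallengeResult:
    primary_answer: str
    challenge_answer: str
    agreed: bool


def _same(a: str, b: str) -> bool:
    # Placeholder agreement test; an assumption, not a recommended comparator.
    return a.strip().lower() == b.strip().lower()


def run_with_challenge(prompt: str, primary: ModelFn, challenger: ModelFn,
                       same: Callable[[str, str], bool] = _same) -> ChallengeResult:
    """Run both models concurrently and report whether their answers agree."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary_future = pool.submit(primary, prompt)
        challenge_future = pool.submit(challenger, prompt)
        a, b = primary_future.result(), challenge_future.result()
    return ChallengeResult(a, b, same(a, b))


# Stub models standing in for two different vendors.
result = run_with_challenge(
    "Is clause 7 enforceable?",
    primary=lambda p: "Yes",
    challenger=lambda p: "No",
)
if not result.agreed:
    print("disagreement: hold the action and escalate")
```

The agent sees both answers and the disagreement flag before it commits, which is the point of paying for the second inference.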
Every decision on the record.
Telecom billing settles on call detail records (CDRs) — every routed call leaves one, and interconnect disputes die by the CDR. Compass keeps the same discipline. Every routing decision — which model served the call, which models lost and on what requirement — is recorded to the public XSI-AIMS agent registry audit log. Conformance reviewers see the decisions without seeing prompts or completions. The record also answers the question every fleet operator is currently guessing at: what share of traffic actually needs frontier capability. Once calls declare requirements, the requirement mix becomes measurable from production traffic (it is the number the arithmetic above turns on). When finance asks why the March invoice doubled, the answer is a query, not an archaeology project.
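To make the shape of such a record concrete, here is a hedged sketch. The field names (`call_id`, `served_by`, `rejected`) and the `frontier_share` query are assumptions for illustration; the XSI-AIMS audit-log schema is not given in this piece. The record deliberately carries no prompt or completion text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RoutingDecision:
    # Hypothetical record layout: decisions only, no prompts or completions.
    call_id: str
    served_by: str
    rejected: dict   # model name -> the requirement it failed


def frontier_share(decisions, frontier_models) -> float:
    """Share of logged calls that were served by a frontier-tier model."""
    if not decisions:
        return 0.0
    hits = sum(d.served_by in frontier_models for d in decisions)
    return hits / len(decisions)


# Ten made-up decisions: seven cleared on the small model, three did not.
log = [RoutingDecision(f"call-{i}", "small", {}) for i in range(7)]
log += [
    RoutingDecision(f"call-{i}", "frontier", {"small": "min_context"})
    for i in range(7, 10)
]

print(json.dumps(asdict(log[-1])))
print(frontier_share(log, {"frontier"}))
```

With records like these, the share of traffic that needs the premium route is a count over the log rather than an estimate.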
Limits: routing does not fix a bad prompt. A mis-specified call routes cheaply and fails cheaply, and no router can tell the difference. Savings depend entirely on workload mix: a fleet whose every call genuinely needs frontier reasoning and a frontier context window sees single-digit savings, not 56%. And the requirements taxonomy is engineering work the customer owns: deciding what each call class actually needs is the hard part, and Compass does not decide it for you.
The same table. New traffic.
Back to the switching office. The least-cost routing table never disappeared — it stopped being paper and became a database row consulted thousands of times a second, invisible and load-bearing, the quiet machinery under every call placed for four decades. Now read that table with 2026 traffic. Per-minute termination rates become per-token list prices. Destination and quality floor become eight declared requirements. The carrier's deck becomes a requirements vector scored across every registered model — with each decision logged where an auditor can find it. The discipline is decades old. Only the calls changed.
If you ran LCR, or you're paying for an agent fleet that should be running it, talk to us.