
Entity Resolution for Enterprise AI Agents: Why ERP Data Breaks Every LLM

ERP data is full of duplicates, aliases, and inconsistent identifiers across systems. Without entity resolution, LLM agents silently produce wrong answers — and you won't catch them.


Ask an LLM agent: "How much did we sell to Acme last year?"

The agent will answer. It will sound right. It will probably be wrong.

Here's what's happening in the data layer below that question. The customer "Acme" exists in your CRM as Acme Corp. In your ERP, the same company is two separate accounts: Acme Corporation (parent) and Acme Corp NA (the US billing entity), because finance and sales never reconciled the merger. Your support ticketing system tracks them as a third record, acme-corp-prod, because that's how IT systems get named. Your data lake stitched the three together with a fragile fuzzy match built in 2023 that hasn't been retrained since.

The agent retrieves whichever records its most confident match surfaces and sums them. The number it gives you excludes one of the three records, double-counts a renewal, and silently uses the wrong fiscal year boundary because two of the systems disagree on what "FY24" means.
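
The shape of this failure is easy to reproduce. Here is a minimal sketch in which every record, ID, and amount is invented for illustration; the point is only the aggregation logic, not the data:

```python
# Hypothetical records from three systems, all referring to one real company.
records = [
    {"system": "CRM",     "name": "Acme Corp",        "revenue": 1_200_000},
    {"system": "ERP",     "name": "Acme Corporation", "revenue": 1_200_000},  # same deal, re-synced
    {"system": "ERP",     "name": "Acme Corp NA",     "revenue": 450_000},
    {"system": "tickets", "name": "acme-corp-prod",   "revenue": 310_000},
]

# Naive retrieval: substring match on the name the user typed.
query = "Acme Corp"
matched = [r for r in records if query.lower() in r["name"].lower()]

naive_total = sum(r["revenue"] for r in matched)
# Misses "acme-corp-prod" entirely (hyphens break the substring match)
# and double-counts the CRM/ERP pair.
print(naive_total)  # 2_850_000 -- neither one system's figure nor the true total
```

The agent's answer is fluent either way; nothing in the output signals that a record was dropped or duplicated.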

This is the entity resolution problem. It's the most underrated bug in enterprise AI, and it breaks every LLM that touches a real ERP.

Why LLMs are systematically bad at this

Entity resolution — deciding whether two records refer to the same real-world thing — is not a language problem. It's a graph and probabilistic-matching problem. LLMs treat strings as evidence, but they don't model the joint distribution over identifiers across systems, and they don't have access to the historical record of what got merged when and by whom.

Even with perfect retrieval, an LLM will:

  • Treat Acme Corp and Acme Corporation as plausibly distinct, then guess.
  • Pick the record with the most matching tokens, regardless of whether it's the system of record.
  • Miss aliases that don't share lexical similarity (ACM-001 vs. Acme).
  • Fail to merge records that should be merged, and merge records that shouldn't (the two unrelated "Smith Industries" subsidiaries).

The failure mode is the worst possible: confident, fluent, and wrong by 5-30%, with no trace.
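
Two of the failure modes above fall straight out of string similarity itself. A quick check with Python's stdlib difflib (the example pairs are illustrative):

```python
from difflib import SequenceMatcher

def sim(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# An alias with no lexical overlap scores as a non-match: missed merge.
print(sim("ACM-001", "Acme Corp"))  # ~0.38

# Two unrelated companies with near-identical names score as a match: wrong merge.
print(sim("Smith Industries (Ohio)", "Smith Industries (Texas)"))  # ~0.81
```

Any system that leans on lexical similarity alone, LLM or otherwise, inherits both errors.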

Why this is harder than it looks

The textbook solution is "use deterministic IDs." The reason every enterprise has this problem anyway is that IDs are local to systems, and the systems were built at different times by different teams with different conventions. SAP's customer ID is not Salesforce's account ID. NetSuite's vendor ID is not the procurement system's supplier ID. The mapping between them lives in spreadsheets, in tribal knowledge, and in a half-finished MDM project from 2022.

A modern entity resolution system has to:

  • Combine string similarity, contextual features (address, tax ID, parent company), and graph signals (who-pays-whom, who-ships-where).
  • Handle the temporal dimension (Acme acquired BetaCo in Q2 — when does the consolidation start?).
  • Surface uncertainty rather than auto-merging — wrong merges are catastrophic and very hard to unwind.
  • Stay current as new records arrive, without retraining from scratch.
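
The decision structure those requirements imply can be sketched in a few lines. This is not a production matcher: the feature weights and thresholds are invented for illustration, and a real system would learn them and add the graph and temporal signals described above. The key move is the three-way outcome that surfaces uncertainty instead of auto-merging:

```python
from difflib import SequenceMatcher

def match_score(a: dict, b: dict) -> float:
    """Blend name similarity with contextual evidence (illustrative weights)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_tax_id = bool(a.get("tax_id") and a.get("tax_id") == b.get("tax_id"))
    same_country = a.get("country") == b.get("country")
    return 0.5 * name_sim + 0.4 * same_tax_id + 0.1 * same_country

def decide(a: dict, b: dict, merge_at: float = 0.8, review_at: float = 0.5) -> str:
    """Three-way decision: never auto-merge a borderline pair."""
    s = match_score(a, b)
    if s >= merge_at:
        return "merge"
    if s >= review_at:
        return "human_review"  # wrong merges are catastrophic and hard to unwind
    return "distinct"

a = {"name": "Acme Corp",        "tax_id": "12-3456789", "country": "US"}
b = {"name": "Acme Corporation", "tax_id": "12-3456789", "country": "US"}
c = {"name": "Smith Industries", "tax_id": "98-7654321", "country": "US"}
d = {"name": "Smith Industries", "tax_id": "55-1112223", "country": "US"}

print(decide(a, b))  # strong name + tax-ID evidence -> "merge"
print(decide(c, d))  # identical names, conflicting tax IDs -> "human_review"
```

Note that c and d, the "two unrelated Smith Industries" case, land in human review rather than being silently merged on a perfect name match.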

This is the kind of system that takes a small team six to twelve months to build poorly, and that we've spent years building well.

CrossER and what it measures

We benchmarked our entity resolution against the public CrossER dataset — the only open benchmark we know of that tests cross-system entity resolution with the schema variation and noise you actually encounter inside an enterprise.

We hold the top published score on it.

The benchmark itself is open and reproducible. We mention this as a forcing function: if you're evaluating an enterprise AI vendor and they claim to handle entity resolution, ask them what their CrossER number is. If they don't have one, that's a signal.

What this means for your agent stack

If you're running an LLM agent over enterprise data and you don't have explicit entity resolution upstream of the model, every aggregation the agent does is suspect. That includes:

  • Customer-level reporting ("revenue from Acme")
  • Vendor consolidation ("total spend with Acme")
  • Supply chain views ("lead time for the Acme parts family")
  • Risk and compliance ("our exposure to Acme")

The right architecture is to resolve entities once, at the knowledge-layer level, and then let the model query against resolved entities — not raw records. This is one of the core jobs of an institutional knowledge layer, and it's why we don't think this problem gets solved by larger context windows or better RAG.
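
Concretely, "resolve once, query resolved" means the agent aggregates over canonical entity IDs, never over raw names. A minimal sketch, with a hypothetical mapping table standing in for the knowledge layer's output:

```python
from collections import defaultdict

# Output of upstream entity resolution: (system, raw record ID) -> canonical entity.
resolved = {
    ("crm", "ACME-CORP"):      "ent_acme",
    ("erp", "100234"):         "ent_acme",   # Acme Corporation (parent)
    ("erp", "100891"):         "ent_acme",   # Acme Corp NA (US billing)
    ("tix", "acme-corp-prod"): "ent_acme",
}

transactions = [
    {"system": "erp", "record": "100234",         "amount": 1_200_000},
    {"system": "erp", "record": "100891",         "amount": 450_000},
    {"system": "tix", "record": "acme-corp-prod", "amount": 310_000},
]

# The agent's query runs against resolved entities, not raw records.
totals: dict[str, int] = defaultdict(int)
for t in transactions:
    entity = resolved[(t["system"], t["record"])]
    totals[entity] += t["amount"]

print(totals["ent_acme"])  # 1_960_000: all three records, each counted once
```

The model never sees the name-matching problem at all; it sees one entity with one total.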

A practical test

If you want to know whether your current agent stack has this problem, run this experiment. Pick five customers, vendors, or products that you know exist under multiple names across your systems. Ask the agent the same factual question about each of them through different name variants. Compare the answers.
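
The experiment is scriptable. In this sketch, `ask_agent` is a placeholder for however you call your agent, and the stub below mimics naive name matching so the gap is visible; the variants are illustrative:

```python
def consistency_gap(ask_agent, variants, question="total revenue last fiscal year"):
    """Ask the same factual question via each name variant; flag divergence."""
    answers = {v: ask_agent(f"What is {question} for {v}?") for v in variants}
    return answers, len(set(answers.values())) > 1

# Stub agent that does substring matching on the name -- the failure under test.
def stub_agent(prompt: str) -> int:
    return 1_200_000 if "Acme Corp" in prompt else 2_850_000

answers, has_gap = consistency_gap(
    stub_agent, ["Acme Corp", "Acme Corporation", "acme-corp-prod"]
)
print(has_gap)  # True -> the variants produced different answers
```

With a real agent, swap `stub_agent` for your actual API call; any divergence across variants is the gap.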

If the numbers don't match, you have an entity resolution gap, and every downstream answer the agent gives you is built on it.

Ready to make AI understand your enterprise?

See how Phyvant gives your AI tools the context they need to get things right.

Talk to us