MegatronLead

Perspectives

Why lead attribution dies in dedupe and how to keep it

Every dedupe operation in a CRM destroys attribution data, silently, by design. The fix is not a better merge rule; it is a different data model.

By Founder, MegatronLead · 7 min read

Builds operational software for multi-market sales organizations. Twenty years across enterprise IT, M365, and revenue operations.


If you have ever asked marketing "which channel drove this customer," and the answer was a confident number that turned out to be wrong, the cause is almost always the same: somewhere between the first touch and the closed deal, a merge happened, and a source disappeared.

This is not anyone's fault. It is the predictable consequence of the data model most CRMs ship by default. The model is wrong, and it distorts your budget decisions for as long as you keep using it.

The mechanic

Most CRMs store source as a field on the contact record, and the field holds one value. When a new touch arrives (a form submission, an ad click, a CSV import) or two records are merged, the CRM applies one of three policies:

  1. First-touch attribution. The original source is preserved. New touches do not update the field.
  2. Last-touch attribution. The most recent source overwrites whatever was there.
  3. Manual selection. When records are merged, an admin picks which source wins.

All three destroy data. The first throws away every touch after the first. The second throws away every touch except the last. The third throws away whichever source the admin did not pick.

Most CRMs let you configure which destruction model to use. None of them solve the problem, because the problem is the assumption that source is one value.

The cost

This is not an academic concern. It shows up in three specific ways:

Channel ROI math is wrong. You spend $50K on Meta, $50K on LinkedIn. Your CRM says Meta drove $400K of pipeline. LinkedIn drove $200K. Cut LinkedIn? Maybe. Maybe not, because what you cannot see is that 30% of the Meta-attributed deals also had a LinkedIn touch upstream. Cutting LinkedIn might shrink the Meta number too.
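Here is a worked version of that arithmetic, using the hypothetical figures above. The code rests on two illustrative assumptions, not data. First, the overlapping deals carry a proportional 30% share of Meta's pipeline dollars. Second, each of those deals splits credit 50/50.

```python
# Hypothetical figures from the example above, in thousands of dollars.
spend = {"meta": 50, "linkedin": 50}
single_field_pipeline = {"meta": 400, "linkedin": 200}


def roi(pipeline: dict, spend: dict) -> dict:
    """Pipeline dollars generated per dollar of spend, per channel."""
    return {channel: pipeline[channel] / spend[channel] for channel in spend}


# ASSUMPTION: the 30% of Meta deals with an upstream LinkedIn touch also hold
# 30% of Meta's pipeline dollars, and each such deal splits credit 50/50.
overlap_dollars = single_field_pipeline["meta"] * 30 // 100  # 120
shared_credit = {
    "meta": single_field_pipeline["meta"] - overlap_dollars // 2,         # 340
    "linkedin": single_field_pipeline["linkedin"] + overlap_dollars // 2,  # 260
}

print(roi(single_field_pipeline, spend))  # {'meta': 8.0, 'linkedin': 4.0}
print(roi(shared_credit, spend))          # {'meta': 6.8, 'linkedin': 5.2}
```

Under those assumptions, Meta's apparent lead over LinkedIn shrinks from 2x to about 1.3x (6.8 versus 5.2). A single source field cannot show any of it.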

Funnel analysis breaks at merge points. Reports show conversion from MQL to opportunity per source, but they compute it against the current source field, not the historical one. After a year of merges, the source field reflects each merge's winner, so the per-source funnel rate is computed against a fictional distribution.

Customer reference questions get answered with hedges. "How did you find us" is a question your customer success team asks during onboarding. The answer in the CRM is one source. The customer's actual answer is "I saw your ad, then I read your blog, then a colleague mentioned you on LinkedIn." Three sources, one field, no way to record the full story.

The structural fix

The fix is not a better merge rule. The fix is to stop modeling source as a field.

Source is an event. Each touch is an event. Each event has a timestamp, a channel, a campaign or ad set, and a reference to the original payload. A contact has zero or more source events. When a new touch happens, the platform appends an event. It does not overwrite anything.
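A minimal sketch of that model in Python follows. The class and field names (SourceEvent, Contact, record_touch, payload_ref) are hypothetical and chosen only for illustration. The point is that touches are immutable records appended to a list, never values written over a field.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class SourceEvent:
    """One touch. Frozen: once recorded, an event is never edited."""
    occurred_at: datetime
    channel: str      # e.g. "meta", "linkedin", "webinar"
    campaign: str     # campaign or ad set identifier
    payload_ref: str  # pointer to the original raw payload (hypothetical format)


@dataclass
class Contact:
    contact_id: str
    source_events: list[SourceEvent] = field(default_factory=list)  # zero or more

    def record_touch(self, event: SourceEvent) -> None:
        # Append-only: a new touch never overwrites an earlier one.
        self.source_events.append(event)


contact = Contact("c-001")
contact.record_touch(SourceEvent(datetime(2024, 2, 3), "meta", "spring-promo", "raw/forms/123"))
contact.record_touch(SourceEvent(datetime(2024, 5, 10), "linkedin", "cfo-series", "raw/ads/456"))
print([e.channel for e in contact.source_events])  # ['meta', 'linkedin']
```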

When two records merge, the merge preserves both contacts' source events on the surviving record. The merge does not pick a winner; it computes a set union.
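Here is a sketch of the merge, using a hypothetical SourceEvent shape. It treats two events as the same touch only when every field matches. A real platform may need a looser identity rule for events, so that choice is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SourceEvent:
    occurred_at: datetime
    channel: str
    campaign: str
    payload_ref: str


def merge_source_events(survivor: list, duplicate: list) -> list:
    """Set union of two contacts' source events, oldest first.

    Frozen dataclasses are hashable, so an event recorded on both records
    (identical in every field) is kept once; nothing else is dropped.
    """
    union = set(survivor) | set(duplicate)
    return sorted(union, key=lambda e: (e.occurred_at, e.channel, e.campaign, e.payload_ref))


shared = SourceEvent(datetime(2024, 8, 14), "webinar", "q3-webinar", "raw/reg/789")
record_a = [SourceEvent(datetime(2024, 2, 3), "meta", "spring-promo", "raw/forms/123"), shared]
record_b = [SourceEvent(datetime(2024, 5, 10), "linkedin", "cfo-series", "raw/ads/456"), shared]

merged = merge_source_events(record_a, record_b)
print([e.channel for e in merged])  # ['meta', 'linkedin', 'webinar']
```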

If your reports need a single "primary source" value, it becomes a derived field: the most recent event, the first event ever, or the highest-weighted event under whatever attribution model you choose. The field is derived; the events are authoritative.
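Deriving the primary source can be sketched as a function computed over the events rather than a stored value. The rule names and the weight table below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class SourceEvent:
    occurred_at: datetime
    channel: str


def first_touch(events: list) -> Optional[SourceEvent]:
    return min(events, key=lambda e: e.occurred_at, default=None)


def last_touch(events: list) -> Optional[SourceEvent]:
    return max(events, key=lambda e: e.occurred_at, default=None)


def highest_weighted(weights: dict) -> Callable:
    """Build a rule that picks the event whose channel has the largest weight."""
    return lambda events: max(events, key=lambda e: weights.get(e.channel, 0), default=None)


def primary_source(events: list, rule: Callable) -> Optional[str]:
    # Computed on read from the events; never written back as the truth.
    event = rule(events)
    return event.channel if event is not None else None


events = [
    SourceEvent(datetime(2024, 2, 3), "meta"),
    SourceEvent(datetime(2024, 5, 10), "linkedin"),
    SourceEvent(datetime(2024, 8, 14), "webinar"),
]
print(primary_source(events, first_touch))  # meta
print(primary_source(events, last_touch))   # webinar
print(primary_source(events, highest_weighted({"linkedin": 3, "meta": 1})))  # linkedin
```

Changing the rule changes the reported value without touching the stored events. A contact with no events simply has no primary source.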

What this unlocks

Three concrete improvements:

Multi-touch attribution becomes real. You can run first-touch, last-touch, U-shaped, time-decay, and custom-weight models against the same data. You do not have to pick one and live with it. The data supports every model.
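The sketch below runs several attribution models over the same touch list. Three parameters are common or illustrative choices, not a standard, and you would tune each one:

- the 40/40/20 U-shaped split;
- the 30-day half-life;
- the custom weights.

Touches are (timestamp, channel) pairs, assumed non-empty and sorted oldest first.

```python
from collections import defaultdict
from datetime import datetime


def _credit(touches, weights):
    """Sum per-touch weights into per-channel credit shares."""
    credit = defaultdict(float)
    for (_, channel), weight in zip(touches, weights):
        credit[channel] += weight
    return dict(credit)


def first_touch(touches):
    return {touches[0][1]: 1.0}


def last_touch(touches):
    return {touches[-1][1]: 1.0}


def u_shaped(touches):
    # Assumed convention: 40% first, 40% last, 20% spread over the middle.
    n = len(touches)
    if n <= 2:
        return _credit(touches, [1.0 / n] * n)
    middle = 0.2 / (n - 2)
    return _credit(touches, [0.4] + [middle] * (n - 2) + [0.4])


def time_decay(touches, converted_at, half_life_days=7.0):
    # Assumed: a touch's weight halves every half_life_days before conversion.
    raw = [0.5 ** ((converted_at - ts).total_seconds() / 86400 / half_life_days)
           for ts, _ in touches]
    total = sum(raw)
    return _credit(touches, [w / total for w in raw])


def custom_weight(touches, channel_weights):
    # Hypothetical per-channel weights, normalized so shares sum to 1.
    raw = [channel_weights.get(channel, 0.0) for _, channel in touches]
    total = sum(raw) or 1.0  # if nothing is weighted, every share stays 0
    return _credit(touches, [w / total for w in raw])


# The journey from the article: Meta, LinkedIn, a webinar, sales outreach.
touches = sorted([
    (datetime(2024, 10, 1), "sales"),
    (datetime(2024, 2, 1), "meta"),
    (datetime(2024, 8, 1), "webinar"),
    (datetime(2024, 5, 1), "linkedin"),
])
closed_at = datetime(2024, 11, 1)

models = {
    "first-touch": first_touch(touches),
    "last-touch": last_touch(touches),
    "u-shaped": u_shaped(touches),
    "time-decay": time_decay(touches, closed_at, half_life_days=30),
    "custom": custom_weight(touches, {"webinar": 2, "sales": 1}),
}
for name, credit in models.items():
    print(name, {channel: round(share, 2) for channel, share in credit.items()})
```

Every model reads the same list. Switching models is a query-time choice, not a data migration.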

Dedupe stops being a budget risk. Marketing no longer has to argue with operations about whether to merge two records, because the merge does not destroy data. Both source histories survive, and the merge becomes a structural improvement rather than a loss.

Source attribution stays accurate across years. A deal closes two years after the first touch. You can still answer "how did this customer find us" with the full sequence: Meta in February, LinkedIn in May, a webinar in August, sales outreach in October.

Why CRMs do not ship this

This is a fair question. The answer is mostly historical. CRMs were designed around the deal, and the deal has one source for accounting purposes. The source field is the accountant's view, codified into the data model.

The marketer's view, "which touches contributed," was a later addition, bolted on as custom multi-touch attribution objects, third-party tools, or paid add-ons. Each is a workaround for the underlying data model not being right.

The right model is straightforward: source is an event, not a field. Implementing this in a CRM that has the field-based model baked in is a multi-quarter migration, so most CRMs add a workaround instead. The customers who care about attribution end up with a Lead Intelligence platform above the CRM that is built on the event-based model from day one.

What to ask your platform

If you are evaluating any system that handles lead data, three questions surface whether the source model is right:

  1. "Show me a contact that was merged. Can I see both contributing sources?" The answer should be a screen with both source events listed, with their timestamps. If the answer is "we keep the most recent" or "the admin picks," the model is field-based.

  2. "Can your reports compute first-touch and last-touch attribution against the same data?" The answer should be yes, with examples. If the answer is "we use one model and you configure which," the model is field-based.

  3. "What happens to attribution when two records merge?" The answer should be "both sets of sources are preserved on the survivor." If the answer involves picking a winner, the model is field-based.
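You can also check questions 1 and 3 directly if the platform lets you export a record's source events before and after a test merge. The sketch below assumes a hypothetical export of (date, channel) pairs per record:

```python
from datetime import date


def merge_preserves_attribution(record_a, record_b, survivor):
    """True if the survivor still holds every source event from both inputs.

    Each argument is an iterable of (date, channel) pairs exported from the
    platform before and after a test merge (export format assumed).
    """
    return (set(record_a) | set(record_b)) <= set(survivor)


a = [(date(2024, 2, 3), "meta")]
b = [(date(2024, 5, 10), "linkedin")]

event_based = [(date(2024, 2, 3), "meta"), (date(2024, 5, 10), "linkedin")]
field_based = [(date(2024, 5, 10), "linkedin")]  # "we keep the most recent"

print(merge_preserves_attribution(a, b, event_based))  # True
print(merge_preserves_attribution(a, b, field_based))  # False
```

A field-based merge that keeps only the most recent source fails the check.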

The right answers are precise. The wrong answers are hedges.

The honest conclusion

Most CRMs cannot fix this without a data-model migration. The customers who care end up running a Lead Intelligence platform above the CRM that handles the canonical lead model correctly, and lets the CRM be the system of record for the deal.

This is the operational layer thesis: the CRM is excellent for opportunity work, and the upstream operational layer that handles ingestion, dedupe, attribution, and routing is a separate concern with its own data model. The two systems complement each other. The attribution stays intact because the source-as-event model lives upstream.

For how MegatronLead implements multi-source attribution specifically, see "What is multi-source lead attribution" and the platform overview.
