
Why is Relationship Discovery in data so hard?

I’ve been pondering this question for more than two decades. Tools to help with the problem have been available for longer than that. Experts have been telling us that this is an urgent problem to solve for even longer. And yet here we are, in 2024, and the problem is no easier to solve than it was way back then.

We’re left with a host of questions that must be answered for a digital or data project to run smoothly, and yet in many cases the answers we produce are incomplete or wrong:

[Image: Data prep mapping readiness questions]

The result is that the business rules we use must be refined in what can seem like an endless iteration of test, analyse, and fix. This is an example of the OODA loop (Observe, Orient, Decide, Act), which is great when you are in production and learning about customer behaviour. It’s expensive and frustrating when each iteration of the loop delays customer insight and production value, and blows your budget and business case.

The fundamental reason why current tools and approaches don’t work is the clumsy way we have driven the process over the decades. We have relied on variations of two themes: brute force and, more recently, Artificial Intelligence.

The brute force method joins candidate columns together, two at a time, to see whether their values match. This is accurate and deterministic. Plus 10 points for that. On the other hand, it comes with high labour costs and even higher compute costs. We did some back-of-the-envelope calculations and we reckon that, for a medium-sized data warehouse project with 1 TB of data, around 5,000 candidate columns, and a modern cloud data warehouse, the compute time needed is around 15 million seconds (roughly 174 days). The labour comes on top of this. Minus 15 million points for that.
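To make the brute force idea concrete, here is a minimal sketch that tests every ordered pair of columns and reports how many of one column's distinct values appear in the other (a "containment" score, a common way to spot a likely foreign key). It is an illustration under stated assumptions, not how any particular tool works: in-memory Python sets stand in for the SQL joins a warehouse would run, the function names `containment`, `brute_force_candidates` and `pair_count` are hypothetical, and the 0.95 threshold is an arbitrary example value. The pair count shows why the cost explodes: 5,000 candidate columns give 12,497,500 unordered pairs, and twice that many ordered pairs if you test containment in both directions.

```python
from itertools import permutations


def containment(a, b):
    """Fraction of the distinct non-null values in column a that also appear in column b.

    Assumption: set intersection stands in for the join a warehouse would execute.
    """
    values_a = {v for v in a if v is not None}
    if not values_a:
        return 0.0
    values_b = {v for v in b if v is not None}
    return len(values_a & values_b) / len(values_a)


def brute_force_candidates(columns, threshold=0.95):
    """Test every ordered pair of columns (hypothetical helper).

    Returns (from_column, to_column, score) for pairs whose containment
    meets the (arbitrary, illustrative) threshold.
    """
    results = []
    for (name_a, col_a), (name_b, col_b) in permutations(columns.items(), 2):
        score = containment(col_a, col_b)
        if score >= threshold:
            results.append((name_a, name_b, score))
    return results


def pair_count(n):
    """Number of unordered column pairs: n choose 2."""
    return n * (n - 1) // 2


# Tiny made-up example: three columns from three hypothetical tables.
example = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "products.sku": [100, 200, 300],
}
print(brute_force_candidates(example))
# [('orders.customer_id', 'customers.id', 1.0)]
print(pair_count(5000))
# 12497500
```

Even in this toy version the work grows with the square of the number of columns, and each test scans the full contents of both columns, which is where the compute bill comes from.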

The AI method is less accurate and non-deterministic. That is, its answers can shift as it learns more. Well, learning is good, right? True, but not great for your early results. It has other problems, though. AI is good at predicting answers within the range of things it has seen before, and that usually means you need a mature body of metadata, including business rules, data lineage, data owners, and so on.

What if you need something new that it hasn’t seen before? Yeah, that’s tricky. What if you have a greenfield site? Or your metadata capture and management is immature? A much bigger problem. You’ll have to wait until you’ve collected and carefully curated all that metadata, which could take years. It’s a shame that your CEO won’t wait years to start the new project.

Let’s cut to the chase. We have a new technology that can give you the accurate answers you need, several orders of magnitude faster than the approaches described above: around a week, elapsed time. Oh, and it fits neatly into your existing role-based access control (RBAC).

If the problem we describe is one you face now, get in touch and find out how you can Relait your data. We’d love to prove it to you.