In any data-intensive project, you will find an iterative process that looks much like the one shown in Figure 1 below.
It is rare for any project to deliver the business requirement in a single pass through this process. Most project plans expect to iterate a handful of times, and projects with higher complexity tend to need more iterations. Often, there are repeated surprises in the data that lie undiscovered during the analysis steps and are revealed only in development, which generates unplanned iterations.
A surprise occurs when the data behaves differently to expectations. These expectations are set by reading system documentation, analysing data quality, reviewing existing data reports or analytics, and engaging in surveys or workshops with Subject Matter Experts. Expectations are then documented and become the data and system requirements.
Surprises in the data are usually linked to a misunderstanding of how data from one system relates to data in another. The more systems in scope, the higher the number of surprises. (These surprises are often unpleasant.)
At DataJPS, we believe that the main reason for these surprises is that the data relationships, within and across applications and systems, are not well understood. We discuss why this is hard here and the impact on projects and business value here. (Our next blog post will discuss why betting on GenAI won't deliver the panacea you need.)
TL;DR - data relationship discovery is hard because traditional methods don't scale, so even though people know it's important, the task gets dropped. The impact comes from the extra iterations discussed above and in the linked posts. We reckon they can add around a third to your delivery costs.
And that's assuming that you can complete the project as imagined. The data might not support it - and you only find out after spending a bucket of money.
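To make the scaling point concrete, here is a minimal sketch of a brute-force relationship check. It assumes "traditional methods" means comparing the values of every column in one system against every column in another, which is our illustration rather than a description of any specific tool. The system names, column names, and the `min_overlap` threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: system name -> column name -> sample values.
systems = {
    "crm": {"customer_id": [1, 2, 3, 4], "region": ["N", "S", "N", "E"]},
    "billing": {"cust_ref": [2, 3, 4, 5], "amount": [10, 20, 30, 40]},
}


def candidate_relationships(systems, min_overlap=0.5):
    """Compare every cross-system column pair and keep pairs sharing values.

    Returns the matching pairs and the number of comparisons performed.
    """
    # Flatten to (system, column, distinct values) triples.
    columns = [
        (sys_name, col_name, set(values))
        for sys_name, cols in systems.items()
        for col_name, values in cols.items()
    ]
    matches = []
    comparisons = 0
    for (s1, c1, v1), (s2, c2, v2) in combinations(columns, 2):
        if s1 == s2:
            continue  # this sketch only looks across systems
        comparisons += 1
        smaller = min(len(v1), len(v2))
        if smaller == 0:
            continue  # an empty column cannot overlap with anything
        # Share of the smaller column's distinct values found in the other.
        overlap = len(v1 & v2) / smaller
        if overlap >= min_overlap:
            matches.append((f"{s1}.{c1}", f"{s2}.{c2}", overlap))
    return matches, comparisons


matches, comparisons = candidate_relationships(systems)
print(matches)       # candidate cross-system relationships
print(comparisons)   # number of column pairs that had to be compared
```

With four columns this needs only four comparisons. The number of column pairs grows roughly with the square of the column count, though: 1,000 columns give 499,500 possible pairs, each requiring a scan of the underlying values. That growth is why this kind of check tends to be dropped on real estates.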
What if you had an accurate knowledge graph that mapped your whole enterprise data estate? What if you could actually document the real relationships across your enterprise systems and applications? What if this could happen before the project started? What if you could get results inside a week, with near-instantaneous response times?
This capability is at the core of Relait - Data Relationship Discovery from DataJPS. We're confident that we can reduce project risk, delivery timescales, and implementation costs. At the same time, we can improve data quality and business results for your pipeline of data-intensive projects.
Get in touch via the contact form below for more information.