Over the last year, I have sat in more AI strategy meetings, steering committees, and “innovation sessions” than I can count.
And the same pattern repeats every single time.
A team walks in with a polished prototype. Everyone is excited. The demo looks great. The slides make it seem like the entire company is one quarter away from becoming AI-powered.
Then someone asks a simple question: “Where is the data coming from?” The room goes quiet. Then three different answers get thrown around. Then someone mentions lineage. Then someone pulls up an old dashboard. And the meeting ends with, get ready for it, “we will need to look into that.”
And that is when the truth becomes impossible to ignore. The real blocker is not AI at all; it is that most companies do not understand their data well enough to use it responsibly, consistently, or at scale.
I have watched this pattern play out in banking, fintech, healthcare, retail, and tech. The domain changes; the underlying problem does not.
The Pattern That Always Shows Up
Teams can build impressive prototypes. But the moment basic data questions enter the conversation, definitions, ownership, controls, lineage, everything starts to unwind.
AI makes the existing problems with data impossible to ignore. And inside most organizations, those problems are everywhere:
conflicting definitions across teams
multiple “sources of truth”
lineage that only exists in Visio diagrams from 2013
undocumented transformations buried inside SQL scripts
quality checks that no one monitors
critical data elements with no owner
metrics recalculated differently in every tool
AI simply amplifies whatever foundation it sits on. If the foundation is inconsistent, everything built on top of it becomes unpredictable.
Where AI Projects Actually Fall Apart
Every company has the same story:
POC works
Demo impresses leadership
Someone suggests scaling it
Then the project exposes reality:
the data feeding the model is not reconciled anywhere
the logic behind key metrics is not documented
two systems use the same field name but mean different things
the “training dataset” was cleaned manually for the demo
the refresh process is unclear
nobody knows which upstream changes would break the model
And suddenly the “AI initiative” turns into a six-month data remediation effort. The project that looked simple becomes complex, and it stalls. It stalls because the underlying data cannot support what the model needs to do.
What AI Actually Depends On
AI can work with imperfect data; what it cannot work with is unknowable data.
Most organizations today can’t answer the simplest questions about their own information:
What does this field represent in business terms?
Where exactly does it originate?
What transformations shape it?
Who is accountable for its accuracy?
What controls validate it?
Which decisions rely on it downstream?
If you do not have clarity on these, AI will behave unpredictably, and it will do so with confidence, which is far worse than being wrong. AI is limited by the maturity of the data sitting underneath it.
The Framework I Use to Evaluate AI Readiness
After seeing the same issues across different companies, I started assessing AI readiness in four categories. This is the same lens I use before approving anything that touches AI.
Definitions Are Clear and Agreed Upon
If five teams define a metric differently, the model will simply inherit the confusion.
A usable definition looks like this:
what the metric measures
the exact formula
the system of record
the transformation logic
who owns it
which reports depend on it
If humans can’t align on the definition, AI never will.
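To make that concrete, here is a minimal sketch (in Python, chosen only for illustration) of a metric definition captured as data instead of tribal knowledge. The MetricDefinition structure and the monthly-active-users example are hypothetical, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class MetricDefinition:
    """One agreed-upon definition for a business metric."""
    name: str                  # what the metric measures
    formula: str               # the exact formula
    system_of_record: str      # the single source the numbers come from
    transformation_logic: str  # where the transformation code lives
    owner: str                 # who is accountable for the definition
    dependent_reports: list[str] = field(default_factory=list)

# Hypothetical example: every team computing MAU inherits this one definition.
mau = MetricDefinition(
    name="monthly_active_users",
    formula="COUNT(DISTINCT user_id) WHERE last_login >= start_of_month",
    system_of_record="events_warehouse.logins",
    transformation_logic="marts/metrics/monthly_active_users.sql",
    owner="analytics-engineering@company.example",
    dependent_reports=["Executive KPI dashboard", "Board deck"],
)
```

The point is not the exact schema; it is that the definition lives in one reviewable place, so five teams cannot quietly maintain five versions of it.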
Lineage Exists Beyond PowerPoint
Real lineage is not a diagram created once and forgotten.
It is:
continuously updated
tied to actual business rules
mapped at the field level
connected to controls
owned by someone
Without this, every data pipeline becomes a black box, and black boxes break AI.
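As a sketch of the difference, here is what a single field-level lineage hop can look like when it is recorded as data rather than a diagram. The FieldLineage structure, system names, and rule text are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class FieldLineage:
    """One field-level hop, queryable and maintainable, unlike a static diagram."""
    target_field: str   # the field as it appears downstream
    source_field: str   # the upstream field it derives from
    business_rule: str  # the rule applied along the way
    control: str        # the check that validates this hop
    owner: str          # who is accountable for keeping the mapping current

# Hypothetical hop: a reporting field traced back to its operational source.
hop = FieldLineage(
    target_field="reporting.customers.annual_revenue",
    source_field="crm.accounts.revenue_usd",
    business_rule="convert to fiscal year, exclude test accounts",
    control="daily row-count and sum reconciliation",
    owner="data-platform@company.example",
)
```

Stored this way, lineage can be updated in the same change that modifies a pipeline, instead of decaying in a slide deck.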
Quality Checks Run Automatically and Tell You Something Useful
Most companies have “quality checks” that have not run in months. Effective controls:
run on a schedule
alert someone when they fail
tie to business impact
feed into an escalation process
are monitored, not assumed
If you can’t maintain quality for reporting, you won’t maintain it for AI.
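Here is a minimal sketch of a control with those properties. It is meant to run on a schedule under whatever orchestrator you already have; the 2% threshold, the customer_id field, and the alerting stub are hypothetical:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data_quality")

def null_rate_ok(rows: list[dict], field: str, max_null_rate: float) -> bool:
    """Pass only if the share of records missing a critical field is within the threshold."""
    if not rows:
        return False  # an empty feed is itself a failure worth surfacing
    null_rate = sum(1 for r in rows if r.get(field) is None) / len(rows)
    return null_rate <= max_null_rate

def run_check(rows: list[dict]) -> None:
    # Hypothetical rule: more than 2% missing customer IDs blocks downstream use.
    if null_rate_ok(rows, field="customer_id", max_null_rate=0.02):
        log.info("customer_id null-rate check passed")
    else:
        # In a real pipeline this branch would page the named owner and open
        # a ticket, feeding the escalation process instead of a silent log.
        log.error("customer_id null-rate check FAILED, escalating to data owner")

run_check([{"customer_id": "c1"}, {"customer_id": None}, {"customer_id": "c3"}])
```

The mechanics are trivial; what matters is that the check fails loudly, names an owner, and states the business impact.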
Training Data is Curated, Not Dumped
People treat training data like a warehouse dump: “throw it all in and let the model learn.” But the model learns exactly what you give it.
If the data is inconsistent, incomplete, biased, stale, or undocumented, you get those characteristics back at scale. Training data does not need to be perfect, but it does need to be intentionally selected and governed.
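A minimal sketch of what “intentionally selected and governed” can mean in practice: the selection criteria are written down as code, and every exclusion is counted, so the dataset’s documentation explains what was left out and why. The cutoff date and the two filters are hypothetical:

```python
from datetime import date

def curate_training_rows(rows: list[dict], cutoff: date) -> tuple[list[dict], dict]:
    """Select training rows deliberately and record why the rest were dropped."""
    kept, dropped = [], {"stale": 0, "incomplete": 0}
    for row in rows:
        if row["as_of"] < cutoff:       # stale: predates the current product
            dropped["stale"] += 1
        elif row.get("label") is None:  # incomplete: unusable for supervised training
            dropped["incomplete"] += 1
        else:
            kept.append(row)
    # The drop counts ship with the dataset, so the training set is
    # governed and explainable rather than assembled by accident.
    return kept, dropped

rows = [
    {"as_of": date(2019, 1, 1), "label": 1},
    {"as_of": date(2024, 6, 1), "label": None},
    {"as_of": date(2024, 6, 1), "label": 0},
]
kept, dropped = curate_training_rows(rows, cutoff=date(2023, 1, 1))
print(len(kept), dropped)  # 1 {'stale': 1, 'incomplete': 1}
```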
Who Actually Wins With AI
You would think the companies buying the most GPUs or running the most pilots are winning, but that is not the case. The winners are the companies that treated data as core infrastructure, the ones that:
cleaned up definitions years ago
reconciled metrics across functions
fixed lineage early
established ownership
documented logic
automated quality checks
created governed training datasets
AI rewards organizations that respected their data long before AI became a headline. Everyone else is discovering how expensive it is to skip the fundamentals.
At some point, every company hits the same realization: AI is not a shortcut around data maturity. It is in fact a stress test for it. The organizations that move faster are the ones that invested in the fundamentals long before the AI hype cycle began.

