Lots of data isn't the same as useful data. Structure it first, then automate. That's the part most AI-for-business pitches skip.
Most owners we talk to start from the same place. They've been running the business for years, so they assume they're sitting on a goldmine of data. Years of invoices. A CRM full of contacts. Inboxes, spreadsheets, a shared drive nobody's cleaned since 2019. The logic goes: we've got all this data, so plugging in an AI tool should be easy money.
Then they plug it in, and it's underwhelming. The chatbot gives confidently wrong answers. The automated workflow needs a human to fix it every second run. The thing that was supposed to save ten hours a week saves about two and creates new headaches.
The tool usually isn't the problem. The data feeding it is.
The numbers nobody mentions in the sales deck
This isn't a hunch. Gartner predicts that through 2026, organisations will abandon 60% of AI projects that aren't supported by AI-ready data (Gartner, Feb 2025). In the same research, 63% of organisations said they either don't have the right data management practices for AI or aren't sure whether they do.
It gets blunter. Informatica's 2025 CDO survey found data quality and readiness was the single biggest obstacle to AI success, and only 12% of organisations reported data of sufficient quality and accessibility for AI applications. MIT's 2025 research on generative AI got the headline that did the rounds: roughly 95% of GenAI pilots delivered no measurable impact on the bottom line.
You can read those stats as "AI is hype." We read them differently. The models are fine. Most of the failures trace back to the same boring root cause: the data was a mess before anyone tried to automate on top of it.
Automation doesn't fix a mess. It speeds it up.
There's a line from a piece on small business automation that stuck with us: when a process is unclear, AI doesn't remove the mess, it "gives the mess a nicer interface and makes it move faster."
That's exactly what we see. A typical Adelaide SMB has a CRM for sales, a separate accounting system, jobs tracked somewhere else, and customer history scattered across email and someone's memory. Each system works on its own. Together, they disagree.
Sales counts a deal as "won" at the quote stage. Finance counts it as revenue when the invoice is paid. Both are right inside their own tool. Point an AI agent at both and ask "how many customers did we win last quarter," and it'll give you a number that's technically computed and completely useless, because the underlying definitions never matched.
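To make that concrete, here's a minimal Python sketch of the mismatch. The deal records, the quarter dates and the function names are all made up for illustration. The point is only that the same three records produce two different "wins" depending on whose definition you apply.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deal:
    customer: str
    quoted_on: date
    paid_on: Optional[date]  # None until the invoice is paid


# Hypothetical records, invented for this example.
deals = [
    Deal("Acme Plumbing", date(2025, 1, 14), date(2025, 2, 3)),
    Deal("Bright Electrical", date(2025, 2, 20), None),
    Deal("Coastal Cafe", date(2025, 3, 28), date(2025, 4, 15)),
]

# Assumed quarter boundaries (Jan to Mar 2025).
Q_START, Q_END = date(2025, 1, 1), date(2025, 3, 31)


def in_quarter(d: Optional[date]) -> bool:
    return d is not None and Q_START <= d <= Q_END


def won_by_sales(deals: list) -> int:
    # Sales definition: a deal is "won" once it is quoted.
    return sum(1 for d in deals if in_quarter(d.quoted_on))


def won_by_finance(deals: list) -> int:
    # Finance definition: a deal counts once the invoice is paid.
    return sum(1 for d in deals if in_quarter(d.paid_on))


print(won_by_sales(deals))    # 3
print(won_by_finance(deals))  # 1
```

Neither function is wrong. An AI agent asked for "customers won last quarter" has to pick one of them, or blend both, and nothing in the data tells it which.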
CompTIA's research found mid-sized companies are actually the most likely to have a high degree of data silos, because they grew fast and accumulated systems without a plan for how they'd all fit together (Computer Weekly). That's most of the businesses we work with.
LLMs don't reason their way around this. They pattern-match on what you give them. Feed them contradictory, half-labelled, duplicated records and you get answers that sound right and aren't.
It's worth running the same logic against your AI tool spend. Our post on how many of your AI tools actually touch revenue covers a quick audit that surfaces exactly this kind of misalignment.
What "structured" actually means (and what it doesn't)
When engineers say "structured data," owners sometimes hear "expensive six-month data warehouse project." That's not what we mean, and you almost never need that to start.
Structured, in practical terms, means a few things. Your key facts live in one agreed place instead of three. The fields are consistent, so a date is always a date and a customer name isn't spelled four ways. The important terms have one definition the whole business uses. And the data is machine-readable, not buried in PDF attachments or free-text notes.
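As a rough illustration of the "consistent fields" point, here's a short Python sketch that takes one customer spelled four ways, with dates in three formats, and reduces it to a single record. The sample rows, the list of date formats and the helper names are assumptions for the example. Day-first dates are assumed too, so check what your own systems export before reusing the idea.

```python
from datetime import date, datetime

# Assumed formats, tried in order. Day-first is an assumption here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_name(raw: str) -> str:
    # Drop full stops and commas, collapse whitespace, ignore case.
    cleaned = raw.replace(".", "").replace(",", "")
    return " ".join(cleaned.split()).casefold()


def parse_date(raw: str) -> date:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    # Fail loudly rather than guess at an unknown format.
    raise ValueError(f"Unrecognised date: {raw!r}")


# Hypothetical rows: the same customer and date, entered four ways.
rows = [
    ("Smith & Sons Pty Ltd", "2024-03-05"),
    ("smith & sons pty ltd", "05/03/2024"),
    ("Smith & Sons Pty. Ltd.", "05.03.2024"),
    ("SMITH  &  SONS PTY LTD", "2024-03-05"),
]

cleaned = {(normalise_name(name), parse_date(d)) for name, d in rows}
print(len(cleaned))  # 1
```

Before normalising, a tool sees four customers. Afterwards it sees one, which is the difference between a usable answer and a confident wrong one.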
Here's the technical bit worth knowing, because it explains why structure matters so much for these tools specifically. Most business AI tools use a technique called RAG (retrieval-augmented generation): before the model answers, it goes and fetches the relevant records from your data, then writes its answer from those. The quality of what it fetches sets the ceiling on the quality of the answer. Research on adding structure and metadata to that retrieval step measured around a 30% improvement in answer quality on analytical and comparative questions (SRAG paper, arXiv). A separate review of RAG over structured data found models routinely fail on the harder queries when the underlying data is loosely organised (AI21).
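Here's a toy Python sketch of the retrieval step only, to show why labels matter. It is not how any particular product or the cited research works: real systems typically rank with embeddings rather than word overlap, and the records, field names (doc_type, year) and functions below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    doc_type: str  # metadata label, e.g. "invoice" or "support_ticket"
    year: int


# Hypothetical records, made up for this example.
records = [
    Record("Acme Plumbing pump repair, total 480 dollars", "invoice", 2024),
    Record("Acme Plumbing asked how much pump work would cost this year",
           "support_ticket", 2024),
    Record("Acme Plumbing pump install, total 2100 dollars", "invoice", 2019),
]


def score(question: str, record: Record) -> int:
    # Crude relevance: count of words shared by question and record.
    q_words = set(question.lower().split())
    r_words = set(record.text.lower().replace(",", " ").split())
    return len(q_words & r_words)


def retrieve(question: str, records: list, top_k: int = 1, **filters) -> list:
    # Keep only records whose metadata matches every filter, then rank.
    candidates = [
        r for r in records
        if all(getattr(r, key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda r: score(question, r), reverse=True)
    return ranked[:top_k]


question = "how much did acme plumbing pay for pump work this year"

unlabelled = retrieve(question, records)
labelled = retrieve(question, records, doc_type="invoice", year=2024)

print(unlabelled[0].doc_type)  # support_ticket
print(labelled[0].text)        # Acme Plumbing pump repair, total 480 dollars
```

Without the labels, the record that sounds most like the question wins, and here that's a support ticket with no price in it. With a document type and year to filter on, the retrieval lands on the invoice the question was actually about.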
The short version: tidy, well-labelled data isn't a nice-to-have for these systems. It's the input they're built to run on.
Where to start (without boiling the ocean)
You don't fix everything at once, and you shouldn't try.
Write down the definitions first. The single most revealing exercise we run with clients is asking each department what "active customer" or "revenue" means to them. The disagreements that surface tell you exactly where automation would have gone wrong. This is the business-glossary idea, and it costs nothing but an afternoon.
Pick one source of truth per thing. One system that owns customer records. One that owns financials. Everything else reads from those, rather than keeping its own conflicting copy.
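A quick way to see how far your systems have drifted is to compare the owning system's records against the copies held elsewhere. The Python sketch below assumes the CRM owns customer records and that both systems share a customer ID. The IDs, fields and email addresses are hypothetical.

```python
# Hypothetical export from the system that owns customer records.
crm_customers = {
    "C001": {"name": "Acme Plumbing", "email": "accounts@acme.example"},
    "C002": {"name": "Bright Electrical", "email": "hello@bright.example"},
}

# Hypothetical copy kept by the accounting system.
accounting_customers = {
    "C001": {"name": "Acme Plumbing", "email": "old-address@acme.example"},
    "C002": {"name": "Bright Electrical", "email": "hello@bright.example"},
}


def find_conflicts(owner_copy: dict, other_copy: dict) -> list:
    # Report every field where the other system disagrees with the owner.
    conflicts = []
    for cust_id, owned in owner_copy.items():
        other = other_copy.get(cust_id, {})
        for field, value in owned.items():
            if field in other and other[field] != value:
                conflicts.append((cust_id, field, value, other[field]))
    return conflicts


conflicts = find_conflicts(crm_customers, accounting_customers)
for cust_id, field, owned, stale in conflicts:
    print(f"{cust_id}.{field}: keep {owned!r} (CRM), replace {stale!r}")
```

The list that comes out is your clean-up queue. Deciding which system owns each field is the business decision. The comparison itself is the easy part.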
Clean the dataset you're about to automate on, not all of them. If you're building an AI assistant for customer support, get the support history and product info in order. Leave the rest for later. Start where the payoff is obvious.
Then automate. Once the data underneath is consistent, the same tools that were disappointing tend to start working, because you finally gave them something to work with. That's the same principle behind getting real ROI from AI in year one: start narrow, prove value, then expand.
The honest summary
More data was never the goal. We've worked with businesses sitting on a decade of records that couldn't get a useful answer out of any of it, and smaller ones with a clean, well-defined customer list that got real value from AI in a couple of weeks.
If you've tried an AI tool and walked away unimpressed, it's worth checking the data before you blame the tool or write off the whole idea. In our experience, that's usually where the problem is hiding.
If you'd like a second pair of eyes on your current setup, that's the kind of assessment we do through our AI automation practice. Happy to tell you honestly what's worth fixing and what can wait.
