|
My friend Christian Martinez had a mess.

Three Excel tabs. Billing transactions on one. GL accounts on another. Bank reconciliation on the third.

Cross-referenced rows were incomplete, and dates came in three different formats. There were duplicates, and GL codes were missing.

He needed to clean it, or he'd have a complete nightmare on his hands.

So he did something super smart. Instead of spending two days manually fixing things, he used AI to automate the entire process. (This is why he teaches inside my AI Finance Club.)

But, and this is really important, he never changed the raw data. He created a cleaned version beside it.

Most people don't do what Christian did. They take that same mess, give it to AI, and expect it to work.

But AI doesn't always push back. It doesn't say "hey, your dates are in two different formats, which one do you want?" It just picks one. Then it gives you something that looks good but is completely wrong underneath.

I see this all the time. A forecast that looked perfect but had double-counted revenue. Why? Because nobody flagged the duplicate rows before uploading. The AI treated every single row as fact.

And this is what I want to tell you today: you do not need perfect data to use AI. But you need to understand what's broken first, because AI can take your existing problems and make them worse. In a way that looks really good on screen!

Your Data Has 3 Problems

You have messy data. There is no getting away from this. I have never met anyone with perfect data.

57% of companies say data reliability is their biggest blocker against AI success (Informatica - CDO Insights 2026). Plus, most companies' governance hasn't kept up with their speed of AI adoption.

So, you've bought the AI, trained people and run pilots. But you have not given your data the same attention.

As I've said before: "If you use AI with those tables that are made for humans, you are not going to get a good output."

And the reason is simple.
Your data has three core problems, which AI can make worse if you are not careful.

Problem 1: Inconsistent formats

Your dates are in three different formats. Some columns use thousands, others use ones. One system says "United States", another says "US", a third says "USA."

When you're in the spreadsheet yourself, you notice this. And you fix it as you go. AI might not notice. It just makes an assumption about which format is correct and moves on.

Problem 2: Missing data

What if GL codes are blank for 15% of your transactions? You'd flag them and ask someone to fill them in. AI might fill them in too, but it doesn't ask you first. It looks at the patterns in your data and makes its best guess.

And those guesses are the dangerous part because they look like they could work. Nothing tells you which ones are real and which ones aren't.

Problem 3: Duplicates and dirty records

When you work with this data manually, duplicates are frustrating but you spot them eventually. With AI, every row can get treated as truth. So your duplicates don't cancel out and things get inflated. But nobody questions this until someone tries to reconcile back to the general ledger and the numbers don't match.

So. These three problems are nothing new. You've been dealing with them your entire career. The difference now is that AI sits between you and the data. And instead of showing you the errors, it hides behind a clean-looking output.

But… You already know how to fix this! You reconcile monthly. You audit internal controls. You understand IFRS and SOX. And those same skills apply to data and AI. You just need an AI method to clean and audit.

Red, Amber or Green?

Before you even begin thinking about cleaning your data, you need to understand it. So, pick your top three data sources. Then score each one across these five dimensions using RAG (Red/Amber/Green):

- Completeness. Are all required rows and columns populated? Green = everything there. Red = more than 10% missing.
- Consistency. Are all formats standardized? One dataset has dates as DD/MM/YYYY, another as MM/DD/YYYY. One has currency in thousands, another in ones. One codes country as "United States", another as "US". Green = all matching format. Red = multiple formats across the dataset.
- Timeliness. How fresh is the data? Green = refreshed daily. Red = refreshed monthly or slower.
- Accuracy. Spot-check 10 rows. Do they match your general ledger or bank statements? Green = 100% of spot checks match. Red = less than 80% match.
- Accessibility. Can AI actually read it? Is it locked? Is it a PDF? Does it have hidden columns? Is it multiple nested tabs? Green = CSV or clean single-tab structure. Red = PDF, image, locked, or deeply nested.

You'll usually find Amber ratings on Completeness and Red on Consistency. And this is exactly where you start. Don't try to fix everything at once.

How to Clean Your Data Using AI

So, let me show you how to do what Christian did, step by step.

Step 1: Prepare your raw data

Export your messiest dataset, the one with duplicates, formatting issues, and missing values (the ones that showed up Amber or Red). Have it ready in Excel or CSV.

Step 2: Use this prompt

Upload your data (make sure to use a secure AI tool and its reasoning mode) and ask:

I have a financial dataset combining [describe: e.g., billing transactions, GL accounts, bank reconciliation]. The specific problems are:
1. [List: e.g., duplicate rows based on invoice ID]
2. [List: e.g., dates in three different formats: DD/MM/YYYY, MM/DD/YYYY, and text like 'Jan 15']
3. [List: e.g., missing GL account codes for 15% of transactions]
4. [List: e.g., data spread across three tabs that need to be consolidated]
I need you to:
- Identify all duplicates and flag them on a new tab (don't delete yet)
- Standardize all dates to YYYY-MM-DD
- Fill missing GL codes by matching transaction descriptions to this reference list [paste your GL mapping if you have one]
- Consolidate multi-tab data into one clean output
- Build everything using formulas, not static values
- Create a summary tab showing record counts before and after, by key dimensions
- Leave the original raw data completely untouched
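
If you want to see what those instructions amount to in practice, here is a minimal Python sketch of the same idea: flag duplicates instead of deleting them, standardize dates to YYYY-MM-DD, and never touch the raw rows. The sample rows, the column names (`invoice_id`, `date`, `amount`) and the helper names are all assumptions made up for illustration, and the text date includes a year so it can be parsed. Note that it refuses to guess when a date could be either DD/MM or MM/DD.

```python
from copy import deepcopy
from datetime import datetime

# Hypothetical sample rows; column names and values are invented for illustration.
RAW_ROWS = [
    {"invoice_id": "INV-001", "date": "15/01/2025", "amount": 1200.0},
    {"invoice_id": "INV-002", "date": "2025-01-16", "amount": 450.0},
    {"invoice_id": "INV-001", "date": "15/01/2025", "amount": 1200.0},  # repeated invoice ID
    {"invoice_id": "INV-003", "date": "Jan 17 2025", "amount": 300.0},
    {"invoice_id": "INV-004", "date": "03/04/2025", "amount": 80.0},  # 3 April or 4 March?
]

# Formats we are willing to try (an assumption; extend for your own data).
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%b %d %Y")


def standardize_date(text):
    """Return (iso_date, status). Never guesses between DD/MM and MM/DD."""
    parsed = set()
    for fmt in FORMATS:
        try:
            parsed.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            continue
    if len(parsed) == 1:
        return parsed.pop().isoformat(), "ok"
    if not parsed:
        return "", "unparseable"
    return "", "ambiguous"  # more than one valid reading: a human decides


def build_cleaned(rows, key="invoice_id"):
    """Build a cleaned copy beside the raw data; the raw rows stay untouched."""
    cleaned = deepcopy(rows)
    seen = set()
    for row in cleaned:
        row["is_duplicate"] = row[key] in seen  # flag only, do not delete
        seen.add(row[key])
        row["date_iso"], row["date_status"] = standardize_date(row["date"])
    return cleaned


def summary(raw, cleaned):
    """Record counts for a simple before/after summary."""
    return {
        "rows_raw": len(raw),
        "rows_flagged_duplicate": sum(r["is_duplicate"] for r in cleaned),
        "dates_needing_review": sum(r["date_status"] != "ok" for r in cleaned),
    }


if __name__ == "__main__":
    cleaned_rows = build_cleaned(RAW_ROWS)
    print(summary(RAW_ROWS, cleaned_rows))
```

The ambiguous date is the interesting case: a sketch like this leaves it blank and marks it for review, which is the behavior you want to check for in any AI-cleaned output too.
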
This prompt will clean your data, and it will also give you an audit trail.

Step 3: Verify the output

Check four things:
Step 4: Build your governance register

Now that you have clean data, you need control. Christophe Atten in a recent workshop raised the accountability question: "Who is accountable if AI produces wrong outputs?"

Without clear governance, you're at risk (after 7 years as an auditor at PwC, trust me when I say this). So, make sure to create one simple Excel register:

Update it monthly. This register answers: What AI is running? Who owns it? Is the data ready? What's the risk? It's your control layer, and it also provides accountability.

Step 5: Run an AI self-audit, then manually verify

Don't ask AI to fix things to begin with. Ask it to audit and flag. Upload your cleaned data and use this prompt:

Review this financial dataset. Identify any anomalies or gaps. For each issue you find, tell me:
1. What you found (e.g., 'July 2025 shows zero revenue for Market Segment B')
2. Why it's unusual (e.g., 'This segment typically averages £400k monthly')
3. What might explain it (e.g., 'Reporting lag, data exclusion error, or business shutdown')
4. Do NOT change anything. Just flag it.
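
If you want an independent cross-check on what the AI flags, the same "flag, don't fix" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the revenue figures are invented, the function name `flag_anomalies` is hypothetical, and the 50% threshold against the segment's median month is an arbitrary choice, not a rule.

```python
from statistics import median

# Hypothetical monthly revenue by segment (GBP); values invented for illustration.
MONTHLY_REVENUE = {
    "Segment A": {"2025-05": 210_000, "2025-06": 205_000, "2025-07": 215_000},
    "Segment B": {"2025-05": 390_000, "2025-06": 410_000, "2025-07": 0},
}


def flag_anomalies(revenue, threshold=0.5):
    """Return a list of flags. Reads the data only; nothing is changed."""
    flags = []
    for segment, months in revenue.items():
        baseline = median(months.values())  # typical month for this segment
        if not baseline:
            continue  # no meaningful baseline to compare against
        for month, value in months.items():
            deviation = abs(value - baseline) / baseline
            if deviation > threshold:
                flags.append({
                    "segment": segment,
                    "month": month,
                    "value": value,
                    "baseline": baseline,
                    "note": "deviates more than 50% from the median month"
                    if threshold == 0.5 else "deviates beyond the threshold",
                })
    return flags


if __name__ == "__main__":
    for flag in flag_anomalies(MONTHLY_REVENUE):
        print(flag)
```

With the sample figures above, only the zero-revenue month for Segment B gets flagged. The explanation (reporting lag, exclusion error, shutdown) is still something a person has to work out.
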
Take every flag and reproduce it in Excel using formulas. Build a reconciliation tab:
This reconciliation tab becomes your audit trail. When someone asks "Why did the forecast number change?" you can show them.

One last tip: if you're working with a lot of data that needs many different kinds of cleaning, I find that chunking it up works better. Remember, AI can still only work with so much data at once, so splitting it up will produce better results.

The One Thing to Remember

If you skip the method and just throw your data at AI, you already know what happens. You can end up with more problems than you started with.

So, use AI to help you fix your data gaps. But do it the way Christian did. Keep the raw data untouched, build an audit trail and put governance around it.

Do this right, and every time you use AI, instead of getting worse, your data gets cleaner, your outputs get better, and your team trusts the process more.

The sooner you do this, the sooner your data will improve. So make sure you start now.

Best,
Your AI Finance Expert,
- Nicolas

P.S. Are you struggling with data right now? Hit reply and tell me. I read every reply.

P.P.S. If you'd like more tips like these, here's 9 Power Moves to Make Your Finance Work 7x More Efficient with ChatGPT.

*Datarails Study - The CFO’s Office 2.0: The 2026 AI Transformation facing Finance Teams
|