How to use this audit
Score each of the 50 questions: 0 = No, 1 = Partial / planned, 2 = Yes, in production. Maximum score is 100. The scoring guide at the end tells you where to focus next.
Be honest. The audit only works if you tick "partial" when you mean partial. Overstating maturity early in an AI program is how organizations end up with abandoned proofs of concept and write-offs.
It's normal for organizations under 50 employees to score 25–45 on the first pass. That's not a failing grade — it's a starting line.
Category 1 — Data foundations
If your data is unreliable, AI built on it will be unreliable in interesting ways. Fix this first.
- ☐ Q1. Do you have a single source of truth for each key data domain (customer, product, transaction)? (0 / 1 / 2)
- ☐ Q2. Is data quality measured (completeness, accuracy, freshness) and tracked over time, as in the sketch after this list? (0 / 1 / 2)
- ☐ Q3. For supervised use cases, do you have a labeling process with documented guidelines and inter-annotator agreement checks? (0 / 1 / 2)
- ☐ Q4. Is there a documented data retention policy, applied automatically in production systems? (0 / 1 / 2)
- ☐ Q5. Can you trace any record's lineage from source through transformations to the system that uses it? (0 / 1 / 2)
- ☐ Q6. Are access controls in place at the row or column level for sensitive data, with reviewable audit logs? (0 / 1 / 2)
- ☐ Q7. Do you have a reliable anonymization / pseudonymization pipeline for analytical and AI workloads? (0 / 1 / 2)
- ☐ Q8. Are schema changes versioned, reviewed, and rolled out without breaking downstream consumers? (0 / 1 / 2)
- ☐ Q9. Are consent records (where applicable) tracked at the record level and enforced when data is used? (0 / 1 / 2)
- ☐ Q10. Is every dataset classified (public / internal / confidential / restricted) and handled accordingly? (0 / 1 / 2)
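To make Q2 concrete, here is a minimal Python sketch of two of those metrics, completeness and freshness, computed over a toy dataset. The records, field names, and the 30-day window are illustrative assumptions; the point is that the numbers are computed on a schedule and tracked, not eyeballed:

```python
from datetime import datetime, timezone

# Illustrative records; in practice these come from your warehouse.
customers = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "email": None, "updated_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
]

def completeness(rows: list[dict], field: str) -> float:
    """Share of rows where `field` is populated."""
    return sum(row[field] is not None for row in rows) / len(rows)

def freshness(rows: list[dict], max_age_days: int = 30) -> float:
    """Share of rows updated within the freshness window."""
    now = datetime.now(timezone.utc)
    return sum((now - row["updated_at"]).days <= max_age_days for row in rows) / len(rows)

# Emit these to a dashboard or time series so trends are visible over time.
print(f"email completeness: {completeness(customers, 'email'):.0%}")
print(f"freshness (30 days): {freshness(customers):.0%}")
```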
Category 2 — Infrastructure
AI workloads need elastic, observable, and reproducible infrastructure. If you can't deploy a service in an hour, you can't iterate on a model.
- ☐ Q11. Is your production stack cloud-native (containers, managed services, autoscaling) rather than tied to bare metal or fixed VMs? (0 / 1 / 2)
- ☐ Q12. Do you have a documented internal API layer (REST or GraphQL) that exposes core data and operations to other services? (0 / 1 / 2)
- ☐ Q13. Is there an event-driven backbone (queue / event bus) that decouples producers and consumers? (0 / 1 / 2)
- ☐ Q14. Is observability in place across logs (structured), metrics, and tracing — centralized and queryable (see the sketch after this list)? (0 / 1 / 2)
- ☐ Q15. Is your infrastructure defined as code (Terraform, Pulumi, CloudFormation) and version-controlled? (0 / 1 / 2)
- ☐ Q16. Do you have CI/CD pipelines that build, test, and deploy automatically with rollback support? (0 / 1 / 2)
- ☐ Q17. Are secrets and credentials managed through a dedicated secrets manager (not in code, not in env files)? (0 / 1 / 2)
- ☐ Q18. Are dev, staging, and production environments materially similar so what works in staging works in prod? (0 / 1 / 2)
- ☐ Q19. Is autoscaling configured for the production tier so traffic spikes don't degrade the user experience? (0 / 1 / 2)
- ☐ Q20. Are you aware of vendor lock-in risks (model providers, cloud-specific services) and have you documented exit paths? (0 / 1 / 2)
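As one slice of Q14, a minimal sketch of structured logging using only Python's standard library: one JSON object per line, carrying a request_id that a tracing backend could correlate across services. The event names and fields are illustrative assumptions:

```python
import json
import logging
import sys
import time
import uuid

# One JSON object per log line, queryable in any log aggregator.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("inference")

def log_event(event: str, **fields) -> None:
    log.info(json.dumps({"ts": time.time(), "event": event, **fields}))

# A request_id shared across events lets a tracing backend stitch them together.
request_id = str(uuid.uuid4())
log_event("model.request", request_id=request_id, model="example-model-v1")
log_event("model.response", request_id=request_id, latency_ms=412, tokens=280)
```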
Category 3 — Team and culture
AI projects are at least 60% organizational change. Without the right team and culture, the technology underperforms.
- ☐ Q21. Is there senior engineering oversight on AI initiatives (not just a junior data team running unsupervised)? (0 / 1 / 2)
- ☐ Q22. Does the team have working AI / ML literacy — can they evaluate a model card, read a confusion matrix, reason about LLM costs? (0 / 1 / 2)
- ☐ Q23. Is the engineering culture experiment-friendly — can the team ship a small experiment within 2 weeks and learn from it? (0 / 1 / 2)
- ☐ Q24. Are feature flags used routinely so AI features can be turned off without a deploy (see the sketch after this list)? (0 / 1 / 2)
- ☐ Q25. Is there a blameless post-mortem culture for incidents, with written documents and tracked actions? (0 / 1 / 2)
- ☐ Q26. Do significant decisions get written design docs (or ADRs) before implementation? (0 / 1 / 2)
- ☐ Q27. For each AI use case, is there a named KPI owner accountable for the business metric? (0 / 1 / 2)
- ☐ Q28. Are teams cross-functional (engineering + product + domain expert + ops) rather than siloed? (0 / 1 / 2)
- ☐ Q29. Is decision velocity high — can a senior engineer get a green light on a $20k experiment within a week? (0 / 1 / 2)
- ☐ Q30. Does the organization tolerate change well — including rolling back a feature when it underperforms? (0 / 1 / 2)
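A minimal sketch of the Q24 pattern: the AI feature sits behind a flag with a deterministic fallback, so it can be switched off at runtime. The JSON flag file and call_llm_summarizer are hypothetical stand-ins for a real flag service and model call:

```python
import json
import os

# Hypothetical flag store: a JSON file (or flag service) ops can change at runtime.
FLAGS_PATH = os.environ.get("FLAGS_PATH", "flags.json")

def flag_enabled(name: str, default: bool = False) -> bool:
    try:
        with open(FLAGS_PATH) as f:
            return bool(json.load(f).get(name, default))
    except FileNotFoundError:
        return default

def call_llm_summarizer(text: str) -> str:
    return "...summary..."  # hypothetical stand-in for a real model call

def summarize(ticket_text: str) -> str:
    # AI path behind a flag, with a deterministic fallback when it's off.
    if flag_enabled("ai_ticket_summary"):
        return call_llm_summarizer(ticket_text)
    return ticket_text[:200]

print(summarize("Long support ticket text ..."))
```

Setting "ai_ticket_summary" to false in the flag store disables the AI path immediately, with no deploy.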
Category 4 — Governance
Governance isn't paperwork. It's how you avoid the AI program embarrassing the company in a press release.
- ☐ Q31. Is there a written data classification policy applied consistently across teams and systems? (0 / 1 / 2)
- ☐ Q32. Is there an internal AI use policy covering what data may be sent to third-party models and what may not? (0 / 1 / 2)
- ☐ Q33. Does each model in production have a model card (purpose, training data, performance, known limitations)? (0 / 1 / 2)
- ☐ Q34. Are models reviewed for bias against protected attributes appropriate to your domain? (0 / 1 / 2)
- ☐ Q35. For high-risk decisions, is there a human-in-the-loop with veto power and a clear audit trail? (0 / 1 / 2)
- ☐ Q36. Are AI-influenced decisions logged with enough context that they can be reconstructed later (see the sketch after this list)? (0 / 1 / 2)
- ☐ Q37. Are models versioned with prompts, weights, and evaluation results retained for at least 12 months? (0 / 1 / 2)
- ☐ Q38. Have you mapped applicable regulations (EU AI Act, GDPR Article 22, HIPAA, sector-specific) to your AI use cases? (0 / 1 / 2)
- ☐ Q39. Is there an executive sponsor for the AI program who can clear blockers and own the strategy? (0 / 1 / 2)
- ☐ Q40. Is AI risk recorded in the company risk register with owners and mitigations? (0 / 1 / 2)
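For Q36, a sketch of what "enough context to reconstruct" can look like: each decision record ties the output to model and prompt versions plus a hash of the exact inputs. The field names are illustrative assumptions, not a standard:

```python
import hashlib
import json
from datetime import datetime, timezone

def log_decision(inputs: dict, model_version: str, prompt_version: str,
                 output: str, human_override: bool = False) -> dict:
    """Record enough context to reconstruct the decision later."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_version": prompt_version,
        # The hash pins the exact inputs even if the raw payload lives elsewhere.
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()
        ).hexdigest(),
        "inputs": inputs,  # or a pointer, if the inputs are sensitive
        "output": output,
        "human_override": human_override,
    }
    print(json.dumps(record))  # in production: an append-only store
    return record

log_decision({"applicant_id": 42, "score": 0.71},
             "credit-risk-v3", "prompt-v12", "refer_to_human")
```

An append-only store plus the version fields also makes Q37's 12-month retention straightforward.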
Category 5 — Business case
Without a sharp business case, AI projects become a research budget that produces blog posts instead of revenue.
- ☐ Q41. Does each AI use case have a measurable KPI tied to revenue, cost, or risk reduction? (0 / 1 / 2)
- ☐ Q42. Is the current baseline metric measured before any AI is introduced? (0 / 1 / 2)
- ☐ Q43. Has the cost of failure been considered (wrong prediction, hallucination, biased recommendation) and bounded? (0 / 1 / 2)
- ☐ Q44. Has the ROI been sized — best case, expected, worst case — with assumptions documented (see the sketch after this list)? (0 / 1 / 2)
- ☐ Q45. Is the value pool large enough to justify the build (revenue impact, cost avoidance, hours saved)? (0 / 1 / 2)
- ☐ Q46. Has the initial scope been minimized to the smallest useful thing you can ship? (0 / 1 / 2)
- ☐ Q47. Is there a credible MVP timeline (8–16 weeks) for delivering the first useful version? (0 / 1 / 2)
- ☐ Q48. Are there pre-agreed kill criteria — conditions under which you stop the project? (0 / 1 / 2)
- ☐ Q49. Is there a comms plan to set internal expectations (especially that "AI" doesn't mean "magic")? (0 / 1 / 2)
- ☐ Q50. Has rollout been sequenced (internal alpha → pilot users → full population) rather than launched to everyone at once? (0 / 1 / 2)
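To make Q44 concrete, a back-of-envelope sketch of three-scenario ROI sizing for a hypothetical support-ticket deflection use case. Every number is a made-up assumption; the point is that the assumptions are explicit and the worst case is computed rather than hoped away:

```python
# All numbers are made-up assumptions for a hypothetical use case.
build_cost = 120_000       # engineering + infra for the MVP
annual_run_cost = 30_000   # inference, hosting, maintenance

scenarios = {
    "worst":    {"tickets_deflected": 3_000,  "cost_per_ticket": 8.0},
    "expected": {"tickets_deflected": 12_000, "cost_per_ticket": 8.0},
    "best":     {"tickets_deflected": 25_000, "cost_per_ticket": 8.0},
}

for name, s in scenarios.items():
    annual_saving = s["tickets_deflected"] * s["cost_per_ticket"]
    annual_net = annual_saving - annual_run_cost
    payback = f"{build_cost / annual_net:.1f} years" if annual_net > 0 else "never"
    print(f"{name:9s} annual net ${annual_net:>9,.0f}  payback: {payback}")
```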
Scoring guide
Add up your score across all 50 questions. Maximum: 100.
What to do with this score
Three actions:
- Find the weakest category. A balanced score of 60 is more useful than a lopsided 80. If one category scores under half of its possible 20 points, fix that first (see the sketch after this list).
- Identify three "zero" items in your weakest category. Pick the three highest-leverage ones. Make them quarterly OKRs.
- Re-run the audit in 6 months. Track the score. Movement is the metric — not the absolute number.
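If it helps, here is a small Python sketch of that triage with example answers filled in. The 0 / 1 / 2 lists below are illustrative; replace them with your own. It totals the score and flags any category below half of its 20 points:

```python
# Example answers (0 / 1 / 2 per question); replace with your own.
CATEGORIES = {
    "Data foundations": [2, 1, 0, 1, 0, 2, 1, 1, 0, 1],
    "Infrastructure":   [2, 2, 1, 1, 2, 2, 1, 1, 1, 0],
    "Team and culture": [1, 1, 2, 0, 1, 0, 0, 1, 1, 1],
    "Governance":       [0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    "Business case":    [1, 2, 1, 1, 1, 2, 1, 0, 1, 1],
}

totals = {name: sum(scores) for name, scores in CATEGORIES.items()}
print(f"total: {sum(totals.values())}/100")

# Weakest first; flag any category under half of its possible 20 points.
for name, total in sorted(totals.items(), key=lambda kv: kv[1]):
    note = "  <- under half, fix first" if total < 10 else ""
    print(f"  {name:16s} {total:2d}/20{note}")
```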
Want a written assessment?
We offer a free 45-minute call to walk through your audit results and identify the 3–5 actions we'd prioritize for your specific situation. Book a call. No commitment.