AI Deployment Assessment · Est. 2026

The ROI Formula

An honest evaluation tool for AI Proofs-of-Concept
Foundation · Stephanie Gradwell
Adoption Function · Douglas Laney
DTRM Extensions · Rowland Agidee
ITPM · ADD · DDC · Structural Optionality
A two-minute orientation
This tool tells you whether an AI POC is worth building — and if not, what would make it worthwhile.
It replaces vendor optimism with structured honesty, using evidence-backed parameters and Monte Carlo simulation to produce a distribution of possible ROI outcomes rather than a single hopeful number. The output is board-grade: a verdict, a confidence interval, and a diagnosis of what to change if the verdict is unfavourable.
In one paragraph
§1

What this is

A structured decision instrument for evaluating whether a proposed AI project will deliver the ROI its vendor promises. It takes the vendor's annual value claim (Vᵢ) and applies a chain of evidence-backed discounts — realisation probability, infrastructure trust, attribution, decay, and adoption ramp — then simulates thousands of possible outcomes to produce a distribution of realistic ROI rather than a single optimistic number.
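To make the chain concrete, a minimal sketch, assuming multiplicative composition of the discounts and a compounding decay term; the function name and parameter values are illustrative, not the tool's internals:

```python
# Minimal sketch of the benefit-side discount chain (illustrative only).
# p     = realisation probability p_i
# alpha = attribution share alpha_i
# d     = annual decay rate d_i
# adoption[t] = adoption ramp A_i(t) for year t+1

def realised_value(v_claim: float, p: float, alpha: float,
                   d: float, adoption: list[float]) -> float:
    """Sum of discounted annual value over the horizon."""
    total = 0.0
    for year, a_t in enumerate(adoption, start=1):
        total += v_claim * p * alpha * a_t * (1 - d) ** (year - 1)
    return total

# A 500k claim over three years, with illustrative discounts applied.
print(realised_value(500_000, p=0.6, alpha=0.7, d=0.1,
                     adoption=[0.3, 0.6, 0.8]))
```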

It is built on the formula from Gradwell's AI ROI work, Laney's adoption function from Infonomics, and Agidee's Digital Transformation Resilience Model (DTRM). The evidence base covers RAND, MIT, BCG, Brynjolfsson-Rock-Syverson, Microsoft Copilot deployment benchmarks, and the Nature Scientific Reports paper on ML model degradation. Every default has a citation.

Three possible verdicts, three commercial paths
§2

What you get out

Green verdict
PROCEED
The project clears the CFO hurdle with adequate confidence. Infrastructure and exit feasibility are both adequate. The path forward is the AI build engagement itself.
Amber verdict
CONDITIONAL
The project could be approvable if specific structural preconditions are remediated. The Infrastructure tab identifies exactly which sub-questions need to move. Infrastructure remediation is the engagement.
Red verdict
DO NOT PROCEED
The central estimate is negative, with low probability of any positive return. The vendor's Vᵢ is structurally too small relative to cost, or the infrastructure is too weak. Either the claim must rise or the infrastructure must be rebuilt first.

Notice that all three outcomes are commercially meaningful. Green opens an AI build conversation. Amber opens an infrastructure remediation conversation. Red opens both: challenge the vendor's claim or rebuild the infrastructure first. There is no outcome that wastes the assessment.
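For orientation, a hypothetical sketch of how the three verdicts could be derived from the outputs; the thresholds and band inputs below are assumptions, not the tool's published logic:

```python
# Hypothetical verdict mapping, for orientation only. The thresholds
# (0.25, 0.70) and the band inputs are assumptions; the tool derives
# its verdict from the simulated distribution and the ITI/EFI bands.

def verdict(p50_roi: float, p_clear_hurdle: float,
            iti_band: str, efi_band: str) -> str:
    if p50_roi < 0 and p_clear_hurdle < 0.25:
        return "DO NOT PROCEED"   # Red: negative centre, little upside
    if iti_band == "Green" and efi_band == "Green" and p_clear_hurdle >= 0.70:
        return "PROCEED"          # Green: adequate confidence and readiness
    return "CONDITIONAL"          # Amber: approvable after remediation

print(verdict(0.12, 0.55, "Amber", "Green"))  # -> "CONDITIONAL"
```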

The suggested workflow, in order
§3

How to use it

1
Setup · ~5 minutes
Tell the tool what you're evaluating.
Enter the project name, the vendor's annual value claim (Vᵢ), the time horizon you're evaluating over, and your CFO's hurdle rate. Set the benefit-side parameters (pᵢ, αᵢ, dᵢ) — the defaults are evidence-backed cross-industry averages, so you can leave them alone on first pass. Set the adoption ramp across three sub-curves: Exposure, Utilisation, and Absorption.
2
Infrastructure · ~20 minutes
Score your organisation's readiness honestly.
This is the most important tab. Work through the five ITI domains (infrastructure trust) and seven EFI domains (exit feasibility), each with 4–5 sub-questions. Score each sub-question 0–5 using the rubric text that appears next to the slider. Most organisations score themselves at Amber (2.5–3.9). The band is derived — you cannot cheat the output by clicking a band.
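A sketch of how the derived band could work, assuming the sub-question scores are aggregated by a simple mean; the Amber range of 2.5–3.9 comes from this step, but the aggregation method is an assumption:

```python
# Band derivation from sub-question scores (0-5). The Amber range
# (2.5-3.9) is stated in the text; aggregation by simple mean is assumed.

def derived_band(scores: list[float]) -> str:
    mean = sum(scores) / len(scores)
    if mean >= 4.0:
        return "Green"
    if mean >= 2.5:
        return "Amber"
    return "Red"

print(derived_band([3.0, 2.5, 4.0, 3.5]))  # mean 3.25 -> "Amber"
```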
3
Evidence · ~10 minutes, optional
Understand and adjust the evidence-backed defaults.
Each of the four benefit-side variables (pᵢ, αᵢ, dᵢ, Aᵢ(t)) has its own evidence panel with research citations, industry benchmarks, and calibration guidance. Use this tab if your context materially differs from the defaults — for instance, if your AI model has well-documented drift behaviour you'd like to reflect in dᵢ.
4
Finance · ~5 minutes to read
See the ROI distribution and sensitivity.
The headline output. Shows P10, P50, P90 ROI as a distribution from 3,000 Monte Carlo samples. The Vᵢ Sensitivity panel shows how the verdict depends on the size of the vendor's claim — the critical commercial diagnostic. Year-by-year and Cost Structure breakdowns follow.
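A minimal sketch of how the headline numbers are read off a simulated distribution; the normal sampling distribution here is a stand-in for the tool's evidence-backed parameter distributions:

```python
import random

# Read-out of the headline numbers: 3,000 ROI samples, P10/P50/P90,
# and the fraction clearing the CFO hurdle. The normal distribution
# is a placeholder, not the tool's actual sampling model.

def percentile(sorted_xs: list[float], q: float) -> float:
    return sorted_xs[int(q * (len(sorted_xs) - 1))]

samples = sorted(random.gauss(0.20, 0.15) for _ in range(3000))
hurdle = 0.15
p10, p50, p90 = (percentile(samples, q) for q in (0.10, 0.50, 0.90))
p_clear = sum(s > hurdle for s in samples) / len(samples)
print(f"P10 {p10:+.0%}  P50 {p50:+.0%}  P90 {p90:+.0%}  P(clear) {p_clear:.0%}")
```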
5
Board · ~3 minutes to read
The executive summary.
The verdict (PROCEED / CONDITIONAL / DO NOT PROCEED), the probability of clearing the CFO hurdle, and the three structural levers that can change the outcome: ITI, EFI, adoption. Formatted as board-grade output suitable for a signed recommendation.
6
Scenario · ~5 minutes
See what it would take.
The 'all three levers pulled' comparison shows what the ROI distribution would look like if ITI and EFI were at Green, adoption were lifted to structured-rollout benchmarks, and the hurdle rate were your current setting. The delta between current and target is the commercial opportunity — expressed as ROI points, probability shift, and narrowing of the confidence interval.
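A sketch of the current-versus-target delta, with placeholder figures; it shows the three forms in which the opportunity is expressed:

```python
# Current-vs-target delta from the 'all three levers pulled' comparison.
# All figures are illustrative placeholders, not benchmarks.

current = {"p50": 0.08, "p_clear": 0.45, "ci": (-0.10, 0.30)}
target  = {"p50": 0.26, "p_clear": 0.78, "ci": (0.12, 0.38)}

def ci_width(ci: tuple[float, float]) -> float:
    return ci[1] - ci[0]

roi_points = target["p50"] - current["p50"]
prob_shift = target["p_clear"] - current["p_clear"]
narrowing = ci_width(current["ci"]) - ci_width(target["ci"])
print(f"+{roi_points:.0%} ROI points, +{prob_shift:.0%} P(clear), "
      f"CI narrower by {narrowing:.0%}")
```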
The inputs, in order of importance
§4

What it needs from you

Critical inputs
Vᵢ — Vendor's annual value claim. The single most consequential number in the framework. Where did it come from? A client-measured baseline? A vendor calculator? Adjust skepticism accordingly.
ITI sub-question scores. Five domains × 4–5 sub-questions. These drive f(ITI) and δ. Most of the framework's commercial output depends on these being honest.
EFI sub-question scores. Seven domains × 4–5 sub-questions. These drive PV(SO). The three expanded domains — contractual terms, reversibility, model transparency — are the most consequential.
Hurdle rate. The CFO's minimum acceptable ROI. Typically 10–25%.
Important inputs
Adoption sub-curves. Exposure × Utilisation × Absorption per year. Most business cases assume 100% day-one adoption; reality is 20–40% (see the sketch after this list).
Cost structure. Build, run, governance, maintenance. Vendor quotes often understate these by 20–50%.
Time horizon. Typically 3 years for ROI evaluation.
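A sketch of the adoption arithmetic referenced above, with illustrative sub-curve values; the product structure follows the Exposure × Utilisation × Absorption framing:

```python
# Effective adoption per year as the product of the three sub-curves.
# The sub-curve values are illustrative, not benchmarks.

exposure    = [0.80, 0.95, 1.00]  # share of staff with access, years 1-3
utilisation = [0.50, 0.70, 0.80]  # share of those who actually use it
absorption  = [0.60, 0.75, 0.85]  # share of usage embedded in workflow

adoption = [e * u * a for e, u, a in zip(exposure, utilisation, absorption)]
print([round(a, 2) for a in adoption])  # year 1 is ~0.24, not the 1.0 many cases assume
```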
Defaults
The benefit-side variables pᵢ, αᵢ, dᵢ are pre-set to evidence-backed cross-industry averages. The Evidence tab documents every citation. Adjust only if your context materially differs.
What the numbers actually mean
§5

Reading the output

P50 ROI — the central estimate

The median of 3,000 Monte Carlo simulations. Half of the outcomes consistent with your inputs are above this, half below. Not a prediction — a median of defensible possibilities. If this is positive, the project is more likely than not to create value; if negative, more likely than not to destroy it.

80% confidence interval (P10 → P90)

The range within which 80% of simulated outcomes fall. Width matters as much as centre. A P50 of +20% with CI of −5% to +45% is a different investment proposition than +20% with CI of +15% to +25%. Narrow distributions mean predictable outcomes; wide distributions mean the project could go badly even when the central estimate looks healthy.

Probability of clearing hurdle

The fraction of simulated outcomes that exceed your CFO's hurdle rate. The most commercially consequential number in the output. A P50 above the hurdle with only a 55% probability of clearing it tells you the project is marginal: technically CFO-approvable, but the distribution is wide enough that nearly half of outcomes disappoint.

Vᵢ sensitivity

The curve showing how ROI changes as Vᵢ varies from 0.25× to 3× current value. Identifies the break-even Vᵢ (where ROI = 0) and the hurdle-clearing Vᵢ (where ROI = hurdle rate). The gap between your current Vᵢ and the hurdle-clearing Vᵢ is the diagnostic bridge to a commercial conversation: either the vendor's claim needs to rise, or the infrastructure needs to lift.
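A sketch of the sensitivity sweep, using a linear placeholder for the full simulation; the 0.25×–3× range is from this panel, everything else is illustrative:

```python
# Sweep the claim from 0.25x to 3x and locate the break-even multiple
# (ROI = 0) and the hurdle-clearing multiple (ROI = hurdle). roi_at()
# is a linear placeholder standing in for the full simulation.

def roi_at(v_mult: float) -> float:
    realised_per_cost = 0.55 * v_mult   # discounted value per unit of cost
    return realised_per_cost - 1.0      # ROI = (value - cost) / cost

hurdle = 0.15
multiples = [m / 100 for m in range(25, 301)]
breakeven = next(m for m in multiples if roi_at(m) >= 0)
clearing  = next(m for m in multiples if roi_at(m) >= hurdle)
print(f"break-even at {breakeven:.2f}x, hurdle-clearing at {clearing:.2f}x")
```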

What first-time users get wrong
§6

Common pitfalls

Over-scoring the domains
IT leaders routinely overestimate their own infrastructure maturity. Provenance reconstructability at 4.0 means you can trace every data element across the entire estate in real time. If you cannot, your score is lower. When in doubt, score one level below your first instinct.
Treating Vᵢ as fixed
The vendor's claim is an input to the framework, not an output. If the framework says your project fails, one possibility is that the vendor's Vᵢ is optimistic. Use the Vᵢ Sensitivity panel to see what the claim would need to be for the project to work.
Reading P50 as a prediction
P50 is the median of simulated outcomes, not a forecast of what will happen. The real output is the distribution. A P50 of +20% means the project is more likely positive than negative; it does not mean the project will definitely return +20%.
Ignoring the width
Two projects with identical P50 can have very different risk profiles. A narrow distribution means predictable outcomes; a wide one means the project could disappoint even if the central estimate is good. CFOs increasingly ask about CI width, not just the point estimate.
Scoring without evidence
If you're scoring a sub-question 4.0 based on what you think should be true rather than on documented assessment, the output is speculation dressed up in precision. Either lower your score, or commission the actual assessment.
Ready?
Start with Setup. Then score your Infrastructure honestly. Everything else follows.