Every CI leader eventually faces the same conversation. A senior executive leans across the table and asks: "What are we getting from all this improvement work?" If your answer requires a week of spreadsheet assembly, a caveat about data quality, and a story about how the real value is cultural -- you have a measurement problem.
The measurement problem isn't that impact doesn't exist. It does. Organizations running disciplined improvement programs produce measurable gains in cost, quality, safety, time, and satisfaction every month. The problem is that most organizations lack the infrastructure to capture, attribute, and aggregate that impact in a way that's credible enough to survive a board meeting.
This isn't a post about why measurement matters. You already know that. This is about what to measure, how the numbers actually work in practice, and what organizations with mature programs have learned about connecting CI metrics to outcomes that executives and boards care about.
The data structure problem nobody talks about
Before you can measure impact, you need data you can trust. And the most common reason CI teams can't produce credible impact numbers isn't a lack of effort -- it's a data structure that wasn't designed for the question being asked.
CareSource, a healthcare organization in Dayton, Ohio, ran into this problem with their Six Sigma Green Belt program. They had a single template in KaiNexus for both general CI projects and Green Belt student projects. On the surface, this seemed efficient. In practice, it created data misattribution, inconsistent documentation, and an inability to isolate the Green Belt program's specific results. When leadership asked "what impact are the Green Belts having?" the CI team couldn't answer cleanly.
The fix wasn't a technology change. It was a design conversation. Working with their KaiNexus customer success manager, the CI team built a dedicated Green Belt template with fields specific to the program's reporting needs. Green Belt projects now flow into their own boards, produce their own metrics, and generate clean data that the communications team uses to tell individual graduate stories internally.
The lesson applies well beyond Green Belts. If your improvement data lives in a single undifferentiated bucket -- or worse, scattered across spreadsheets owned by different people -- you can't slice it by program type, department, strategic priority, or time period. You can count things, but you can't answer the questions that matter: Which types of improvements produce the most value? Which teams are generating the highest impact? Is the Green Belt investment paying off differently than the daily kaizen program?
At Tirlan, the pre-KaiNexus state was a patchwork of Excel templates scattered across departments. The CI team relied on individual surveys for project updates, which made it impossible to establish accurate benchmarks. Progress tracking was inconsistent. Nobody trusted the numbers because the numbers depended on who remembered to update their spreadsheet that month.
The structural fix in both cases was the same: move from a system where data capture is an afterthought to one where every improvement carries structured metadata from the moment it's submitted -- impact type, strategic alignment, owning team, methodology, status, and measurable outcome. That structure is what makes aggregation possible later. Without it, you're assembling a jigsaw puzzle where half the pieces are from different boxes.
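To make "structured metadata" concrete, here is a minimal sketch of what such a record might look like. The field names and category values are illustrative assumptions for this post, not KaiNexus's actual schema:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class ImpactType(Enum):
    FINANCIAL = "financial"
    QUALITY = "quality"
    SAFETY = "safety"
    TIME = "time"
    SATISFACTION = "satisfaction"

@dataclass
class Improvement:
    """One improvement record, structured at the moment of submission."""
    title: str
    owning_team: str
    methodology: str               # e.g. "daily_kaizen", "green_belt", "a3"
    strategic_priority: str        # which organizational goal it supports
    impact_types: list[ImpactType] = field(default_factory=list)
    status: str = "submitted"      # submitted -> in_progress -> completed
    submitted_on: date = field(default_factory=date.today)
    completed_on: date | None = None
    measured_impact_usd: float | None = None  # financial impacts only
```

Every question in the previous paragraph -- value by improvement type, impact by team, Green Belt versus daily kaizen -- is a filter or group-by over fields like these. Without them, it's manual forensics.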
What to measure beyond dollars
Financial impact is the metric that gets the most executive attention, and it should be tracked. But organizations that measure only dollars distort their improvement programs in predictable ways. People stop submitting ideas that improve safety, quality, or satisfaction because those improvements are harder to dollarize. The program skews toward cost reduction and misses the categories where CI often has the deepest effect.
KaiNexus customer data across thousands of organizations breaks down like this: 28% of all improvements have a direct financial impact. 36% impact quality. 31% increase staff or customer satisfaction. Many improvements affect multiple categories at once. If you're only counting dollars, you're only seeing about a quarter of the picture.
A more complete measurement framework tracks five categories:
Financial impact -- cost savings, cost avoidance, and revenue generation. Among KaiNexus customers, 12% of improvements produce cost savings averaging $70,000 in first-year impact (about 30% annually recurring). 1.2% increase revenue with an average first-year impact just under $210,000 (75% annually recurring). These numbers are skewed by the long tail: most individual improvements are small, but about 1 in 100 generates over $100,000 in impact. The value comes from volume, not from waiting for home runs.
Quality outcomes -- defect rates, error rates, rework, compliance. These are often the improvements that matter most to patients, customers, and regulators, even when they're hard to express in dollars.
Safety -- incidents, near-misses, hazard identification. Safety improvements are among the most important work a CI program produces and among the hardest to dollarize (what's the financial value of a fall that didn't happen?). Track them separately. Report them separately. Don't try to force them into a financial frame.
Time -- cycle time, lead time, wait time, time saved per occurrence. Time savings compound. Five minutes saved per patient discharge, multiplied across 40 discharges per day, multiplied across 365 days, is over 1,200 hours per year. That math is often more persuasive to operations leaders than a dollar figure.
Satisfaction -- staff engagement, patient or customer experience, internal NPS equivalents. These metrics are leading indicators. When satisfaction drops, quality and safety problems follow. When it rises, retention improves and recruitment gets easier.
The point isn't to track all five for every improvement. It's to build a system where every improvement is tagged to at least one category, so you can aggregate and report across the full spectrum of value your program creates.
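With every improvement tagged that way, cross-category reporting becomes a simple aggregation. A sketch, assuming the hypothetical record type from earlier -- note that an improvement tagged to multiple categories counts in each, which is why the percentages cited above sum past 100%:

```python
from collections import Counter

def impact_breakdown(improvements):
    """Count completed improvements per impact category.
    Multi-category improvements count once in each category."""
    counts = Counter()
    for imp in improvements:
        if imp.status == "completed":
            for impact in imp.impact_types:
                counts[impact.value] += 1
    return counts

# The time-savings compounding from above, made explicit:
# 5 min/discharge * 40 discharges/day * 365 days ~= 1,217 hours/year
annual_hours_saved = 5 * 40 * 365 / 60
```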
The metrics that predict long-term success
Impact metrics tell you what your program has produced. Activity and engagement metrics tell you whether it will keep producing.
UMass Memorial Health provides the most detailed case study in how engagement metrics evolve as a program matures. Their measurement approach changed deliberately across three phases, and each phase revealed something different:
Phase 1: Participation rate. When UMass first rolled out KaiNexus to 16,000 caregivers, they measured the percentage of caregivers actively contributing. Baseline was 20%. Year one target was 50%. Year two pushed to 75%. This metric answered the most basic question: are people using the system?
Phase 2: Team-level activity. In year three, UMass shifted from individual participation to a team-level metric: each team should submit and complete at least one idea per month, targeting 12 completed ideas per team per year. This was counterintuitive -- asking for fewer individual ideas, but asking for more consistent team engagement. Active teams grew from 261 to 380. The metric shift surfaced something participation rate had hidden: some teams were generating high volume from a few prolific individuals while most team members remained disengaged. The new metric forced managers to involve their whole teams.
Phase 3: Completion and quality. As UMass surpassed 200,000 ideas, the focus shifted again -- toward implementation quality, impact tracking, and connecting improvement work to strategic priorities. The system had volume. Now it needed to demonstrate that volume was producing outcomes.
That three-phase evolution -- from "are people using it" to "are teams engaged" to "is the work producing results" -- is a useful model for any organization. The mistake most programs make is measuring phase 3 metrics (impact, ROI) when they're still in a phase 1 reality (trying to get people to submit ideas at all). Match the metric to the maturity.
Other engagement metrics worth tracking: idea cycle time (how quickly improvements move from submission to implementation), queue depth (how many ideas are waiting for action, and for how long), spread rate (how often a successful improvement is replicated elsewhere), and leader engagement (are managers reviewing, commenting on, and coaching improvement work in the system).
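Two of those -- cycle time and queue depth -- fall straight out of the same timestamped records. A sketch, again assuming the hypothetical fields from earlier:

```python
from datetime import date
from statistics import median

def idea_cycle_time_days(improvements):
    """Median days from submission to completion (completed items only)."""
    durations = [(imp.completed_on - imp.submitted_on).days
                 for imp in improvements if imp.completed_on is not None]
    return median(durations) if durations else None

def queue_depth(improvements, as_of: date, stale_after_days: int = 30):
    """Count ideas still waiting for action, and how many have
    waited longer than the staleness threshold."""
    waiting = [imp for imp in improvements if imp.completed_on is None]
    stale = sum(1 for imp in waiting
                if (as_of - imp.submitted_on).days > stale_after_days)
    return len(waiting), stale
```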
Connecting CI metrics to outcomes the board cares about
The gap between "we implemented 4,000 improvements" and "our bond rating is the highest in 35 years" is where most CI measurement falls short. Activity metrics live in the CI department's world. Outcome metrics live in the C-suite's world. The organizations that sustain executive investment in CI are the ones that build an explicit bridge between the two.
UMass Memorial Health built that bridge more completely than any other customer story in this set. Their improvement program produced over 200,000 frontline ideas. But the metrics that earned sustained executive attention were organizational outcomes:
- Patient safety composite scores (PSI-90) dropped from 1.47 to 0.69
- CMS star ratings rose from 1 star to 4 stars
- Patient satisfaction moved from the 10th percentile to the top quartile
- Emergency department length of stay was cut nearly in half
- The system achieved its highest bond rating in 35 years
Did KaiNexus and the improvement program cause all of those outcomes? Not alone. But the CI infrastructure provided the mechanism through which thousands of frontline caregivers identified and solved problems that collectively moved those numbers. Leadership could draw a credible line from "583 teams running weekly improvement" to "PSI-90 cut in half" because the data existed to support the connection.
Mary Greeley Medical Center tells a similar story at a different scale. A 220-bed hospital in Ames, Iowa, Mary Greeley tracked over 1,600 improvements with more than 75% implemented and nearly $2 million in measured impact. In 2019, they became the first Iowa organization to win the Malcolm Baldrige National Quality Award. The improvement data wasn't an afterthought in the Baldrige application -- it was evidence that the organization had built the systems, leadership behaviors, and measurement discipline the Baldrige criteria demand.
Electrolux connects improvement activity to strategy deployment through KaiNexus, linking individual projects to the organization's most critical strategic goals. This means leadership doesn't just see "how many improvements happened" -- they see whether improvement activity is aligned with the things that matter most. The platform is now active in 60% of Electrolux sites globally, functioning as both an improvement management system and a global knowledge repository where a process improvement in Brazil can be discovered and adopted by a team in Poland.
The pattern across all three: measurement isn't just about counting. It's about connecting the count to outcomes that exist in the language of the boardroom -- patient safety scores, bond ratings, national quality awards, strategic goal attainment, and cross-facility knowledge transfer.
The compound math that makes volume matter
CI leaders intuitively understand that small improvements add up. But "they add up" is not a business case. The math needs to be explicit.
KaiNexus customer data shows that the average improvement generates approximately $15,000 in tracked impact. Most individual improvements are modest. But the distribution has a long right tail -- about 1 in 100 improvements generates over $100,000 in impact. You can't predict which ideas will be the high-impact ones. You can only generate enough volume that the probability of finding them is high.
This is why engagement metrics and impact metrics are inseparable. An organization that generates 50 ideas per year can expect about one six-figure improvement every two years -- and in any given year has only about a 40% chance of finding even one. An organization generating 5,000 ideas per year will find approximately 50 of them. The difference isn't creativity or methodology. It's volume, and volume is a function of how many people participate, which is a function of how easy the system is to use and how quickly people see results from their contributions. (For the rollout and adoption practices that drive participation, see How to Roll Out Continuous Improvement Software Without Losing Momentum.)
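That arithmetic is worth making explicit. Treating each idea as an independent 1-in-100 shot at a six-figure result (the hit rate is the observed average cited above, not a law):

```python
def six_figure_odds(ideas_per_year, hit_rate=0.01):
    """Expected number of six-figure improvements per year, and the
    probability of finding at least one, at a given idea volume."""
    expected = ideas_per_year * hit_rate
    p_at_least_one = 1 - (1 - hit_rate) ** ideas_per_year
    return expected, p_at_least_one

print(six_figure_odds(50))    # (0.5, ~0.39): too little volume to count on a win
print(six_figure_odds(5000))  # (50.0, ~1.0): the big wins become inevitable
```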
Trinity Industries demonstrated this math in compressed form. A physical idea board produced 4 ideas in 12 months. KaiNexus produced 653 ideas in the first 6 months. The ideas didn't get better because the software was shinier. More ideas surfaced because the friction between having a thought and submitting it dropped to near zero, and because the visible feedback loop -- a kanban board showing your idea moving through stages -- taught people that contributing was worth the effort.
Oceania Dairy demonstrated the cultural side of this math. As their technical assistant Dion Batchelor put it: "Our frontline users that operate the machinery feel comfortable enough to submit ideas. We have crazy things come through, but those crazy things work." The volume equation requires psychological safety. People need to feel safe submitting imperfect, unconventional, or "crazy" ideas -- because the filtering happens after submission, not before. An organization where people self-censor before submitting is leaving its best ideas on the table.
Building the reporting cadence
Measurement only drives behavior if the numbers are visible to the right people at the right frequency. A quarterly impact report that arrives six weeks after the quarter ends is a historical document, not a management tool.
The organizations in this set use different cadences for different audiences:
Daily/weekly -- team-level metrics visible in huddles and on boards. Active ideas, items needing attention, recently completed improvements. This is where frontline managers see whether their team is engaged and where coaching conversations start.
Monthly -- department and leadership-level scorecards. UMass published monthly participation scorecards by team and shared them with leadership. These created healthy visibility: managers could see how their teams compared, and senior leaders could see which areas of the organization needed attention. Tirlan's CI team used KaiNexus dashboards to replace the hours they'd previously spent chasing updates and assembling PowerPoint decks.
Quarterly/annually -- aggregate impact reporting for executive leadership and boards. Cumulative financial impact, quality and safety trends, participation growth, strategic alignment. This is where the connection between CI activity and organizational outcomes gets made explicitly.
The key principle: the people closest to the work need the most frequent data (so they can act on it), while senior leaders need the most aggregated data (so they can see the pattern). Trying to give everyone the same report at the same frequency serves no one well.
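One practical implication: all three cadences can be rollups of the same underlying records, differing only in grain. A sketch, assuming each record carries a completion date as in the earlier examples:

```python
from collections import defaultdict

def rollup(improvements, period_key):
    """Bucket completed improvements by a period key function,
    so one dataset feeds every audience at its own grain."""
    buckets = defaultdict(list)
    for imp in improvements:
        if imp.completed_on is not None:
            buckets[period_key(imp.completed_on)].append(imp)
    return buckets

# Weekly grain for huddles, monthly for scorecards, quarterly for
# the board -- same data, three views:
# rollup(items, lambda d: d.isocalendar()[:2])             # (year, week)
# rollup(items, lambda d: (d.year, d.month))               # (year, month)
# rollup(items, lambda d: (d.year, (d.month - 1) // 3 + 1))  # (year, quarter)
```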
UMass's CEO, Dr. Eric Dickson, led monthly manager meetings to discuss KaiNexus goals, strategic planning, and manager accountability. That cadence -- monthly, led by the CEO, focused on improvement metrics alongside strategic priorities -- is what made improvement data part of how the organization was managed rather than a separate reporting exercise.
What measurement makes possible
When measurement works, it changes the conversation about CI from defensive to strategic. Instead of "can we justify this program?" the question becomes "where should we invest more?" Instead of "is this working?" the question becomes "which types of improvements produce the highest return and how do we get more of them?"
The 2025 Nexie Awards provide a snapshot of what mature measurement enables. Hilti won Outstanding Achievement in their first year -- credible because engagement data backed the claim. Mayo Clinic won Collaboration Excellence for optimizing their mortality review process post-implementation. UMass won Team of the Year for continuously innovating how they energize their program and set evolving goals. James Hardie won Rookie of the Year for gathering stakeholder input throughout implementation and following recommendations about when to change versus when to observe.
Every one of those awards was grounded in evidence, not anecdotes. The organizations could show what they'd done, how many people were involved, what the results were, and how they compared to where they started. That's the difference between a CI program that feels good and one that proves its value.
The spreadsheet got you to your first hundred improvements. It can't tell you what those hundred improvements were worth, which ones should be replicated across other facilities, or whether the program is accelerating or plateauing. That's what measurement infrastructure is for.
Frequently Asked Questions
What's the single most important metric for a CI program?
There isn't one. The right metric depends on your program's maturity. Early-stage programs should measure participation and engagement (are people using the system?). Maturing programs should measure team-level activity and cycle time (is improvement embedded in how teams work?). Mature programs should measure aggregate impact and strategic alignment (is the work producing outcomes that matter to the organization?). Trying to measure ROI before you have broad engagement is measuring the wrong thing at the wrong time.
How do we measure improvements that don't have a financial impact?
Tag every improvement to at least one impact category: financial, quality, safety, time, or satisfaction. Report each category separately. Don't force non-financial improvements into dollar figures -- a medication error prevented or 15 minutes saved per patient discharge is meaningful on its own terms. Across KaiNexus customers, 36% of improvements impact quality and 31% impact satisfaction. Ignoring those categories means ignoring most of what your program produces.
How do we know if an improvement actually caused the result?
Use control charts or process behavior charts to distinguish real shifts from normal variation. If you make a change and the process data shifts -- the center line moves, the variation tightens -- you have evidence of real improvement. If the data stays within previous control limits, the change didn't have the effect you expected, regardless of what this week's number looks like compared to last week's.
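A minimal sketch of that test using XmR (individuals and moving range) limits, the standard process behavior chart for individual values. This shows only the simplest out-of-limits signal; full SPC practice adds run rules for sustained shifts:

```python
def xmr_limits(baseline):
    """Center line and natural process limits from a baseline series.
    2.66 is the standard XmR constant converting the average moving
    range into three-sigma limits for individual values."""
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr

def shows_real_shift(baseline, after_change):
    """Evidence of real improvement: post-change points fall outside
    the limits computed from pre-change data alone."""
    lower, _, upper = xmr_limits(baseline)
    return any(x < lower or x > upper for x in after_change)
```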
What if leadership only cares about financial impact?
Start by reporting financial impact clearly -- KaiNexus tracks it automatically at the individual improvement level and aggregates it across the program. Then pair financial data with one or two non-financial outcomes that leadership already tracks (patient safety scores, customer satisfaction, employee retention). Over time, the pattern becomes visible: the organizations with the strongest improvement cultures also have the strongest organizational outcomes. UMass's bond rating and PSI-90 improvements didn't show up in an ROI spreadsheet, but they proved the program's value more powerfully than any dollar figure could.
How many improvements do we need before the numbers are meaningful?
The compound math favors volume. With 50 improvements per year, your data is anecdotal. With 500, patterns emerge. With 5,000, you can identify which types of improvements, which teams, and which methodologies produce the highest returns. About 1 in 100 improvements generates over $100,000 in impact -- but you only find those if you're generating enough total volume. The measurement system and the engagement system are the same system.

