How to use this checklist
This is a 40-item, five-phase checklist for modernizing a legacy software platform. It is opinionated: it assumes you want zero unplanned downtime, a parallel run before cutover, and a real plan for the 90 days after go-live.
You don't have to do all 40 items. You do have to consciously decide which ones you're skipping and why. The "why this matters" line under each item helps that conversation.
Phase 1 — Discovery
You can't replace a system you don't understand. Discovery is where most projects underspend and later overspend.
- ☐ Codebase audit. Walk every directory of the existing codebase. Note dead code, copy-pasted modules, framework versions, and language features in use.
  Why: estimates are wrong without an honest line count and dead-code map. Surprises here surface as scope creep later.
- ☐ Dependency map. List every external library, microservice, third-party API, scheduled job, and queue the system touches. Note the version and support status of each.
  Why: forgotten dependencies are the single biggest cause of cutover failure. Half the time the dependency was undocumented.
- ☐ Runtime metrics baseline. Capture current p50, p95, and p99 response times, error rates, and CPU and memory profiles for the top 20 endpoints (see the sketch after this list).
  Why: you can't claim the new system is faster without a number to compare against. Take this measurement before touching anything.
- ☐ Traffic patterns. Document daily, weekly, and seasonal load patterns. Note batch windows, peak hours, and quiet windows.
  Why: cutover windows live in the quiet hours. If you don't know when those are, you'll schedule downtime into peak.
- ☐ Undocumented logic owners. For every weird-looking conditional, find the human who knows why it's there. Names go in the doc.
  Why: every legacy system has 5–20 of these. Replacing one without understanding it produces regressions that take weeks to find.
- ☐ Regulatory footprint. Identify which data, modules, or workflows are subject to GDPR, HIPAA, PCI DSS, SOC 2, or sector-specific rules.
  Why: regulated data has different handling requirements. Migrating it casually can put you in breach.
- ☐ Downtime tolerance per workflow. Map each business workflow to its maximum tolerable downtime. Some can tolerate an overnight outage; some cannot tolerate 30 seconds.
  Why: this dictates whether you can do a big-bang cutover or must use a parallel run and feature flags.
- ☐ Key business rules catalog. Write down the 30–50 business rules the system enforces, in plain English, ranked by criticality.
  Why: the new system has to enforce these. Putting them in a doc makes the parity check easy; not doing it makes the project fail UAT.
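A minimal sketch for the runtime metrics baseline item, assuming you can pull per-request latencies out of an access log. The whitespace-delimited log format and the field positions are hypothetical; adjust them to whatever your web server actually emits. The point is to have hard per-endpoint numbers on record before any code changes.

```python
"""Latency baseline (p50/p95/p99) per endpoint from an access log.

Assumes a whitespace-delimited log where field 7 is the request path and
field 10 is the response time in milliseconds -- adjust to your format.
"""
from collections import defaultdict
import sys

def percentile(samples, pct):
    """Nearest-rank percentile; good enough for a baseline snapshot."""
    ordered = sorted(samples)
    idx = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[idx]

latencies = defaultdict(list)
with open(sys.argv[1]) as log:
    for line in log:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip malformed lines rather than crash mid-audit
        path, ms = fields[6], fields[9]
        try:
            latencies[path].append(float(ms))
        except ValueError:
            continue

# Report the busiest 20 endpoints -- the ones the checklist asks you to baseline.
for path, samples in sorted(latencies.items(), key=lambda kv: -len(kv[1]))[:20]:
    print(f"{path}  n={len(samples)}  "
          f"p50={percentile(samples, 50):.0f}ms  "
          f"p95={percentile(samples, 95):.0f}ms  "
          f"p99={percentile(samples, 99):.0f}ms")
```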
Phase 2 — Planning
Decisions made here determine whether you'll finish on time. Decide once, document, revisit only with cause.
- ☐ Target architecture decision. Pick monolith vs. modular monolith vs. microservices. Write an Architecture Decision Record (ADR).
  Why: most teams default to microservices and regret it. Pick the simplest thing that fits your real constraints.
- ☐ Stack choice. Choose the language, framework, database, cache, queue, and frontend stack. Justify each in writing.
  Why: stack choices determine what you can hire for, deploy to, and modernize to next time. Treat it as a 5-year decision.
- ☐ Data migration strategy. Decide between dump-and-load, dual-write, or CDC streaming (a dual-write sketch follows this list). Estimate volume and downtime per option.
  Why: this is the single most expensive technical decision. The wrong strategy means a multi-day outage or data loss.
- ☐ Parallel run vs. big-bang. Decide whether old and new systems run side by side for a stabilization window, or whether you cut over in one move.
  Why: a parallel run is safer but costs ~30% more. Big-bang is cheaper but unforgiving. Pick consciously.
- ☐ Rollback plan. Write the rollback runbook before you write the cutover runbook. Include the trigger criteria and the named decision-maker.
  Why: if you cannot answer "how do we go back?", you cannot go forward safely.
- ☐ Training plan. Identify who needs training (admins, ops, end users) and produce one written guide per audience.
  Why: untrained users will judge the new system harshly in the first week. That perception lasts.
- ☐ Stakeholder comms plan. A schedule of who hears what, when, and from whom. Weekly status, monthly steering, ad-hoc incidents.
  Why: stakeholder anxiety scales with silence. Predictable comms buy you patience when things slip.
- ☐ Code and config freeze windows. Decide when changes to the legacy system stop. Communicate this to all teams.
  Why: cutting over a moving target is impossible. A short freeze beats a chaotic merge.
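For the data migration item, here is a minimal dual-write sketch, one of the three strategies named above. The `legacy_db` and `new_db` objects are hypothetical DB-API-style clients and the `orders` schema is invented; the pattern to take away is that the legacy write stays authoritative while new-system failures are logged for reconciliation rather than surfaced to users.

```python
"""Dual-write sketch: write to the legacy store first (still the source of
truth), then best-effort to the new store, and record any divergence.

`legacy_db` and `new_db` are hypothetical clients with an
`execute(sql, params)` method -- substitute your real drivers.
"""
import logging

log = logging.getLogger("dual_write")

def save_order(legacy_db, new_db, order_id, payload):
    # Legacy write is authoritative: if it fails, the request fails.
    legacy_db.execute(
        "INSERT INTO orders (id, payload) VALUES (%s, %s)", (order_id, payload)
    )
    # New-system write is shadowed: failures are logged, never user-facing,
    # so a bug in the new stack can't break production during the run-up.
    try:
        new_db.execute(
            "INSERT INTO orders (id, payload) VALUES (%s, %s)", (order_id, payload)
        )
    except Exception:
        log.exception("dual-write divergence for order %s", order_id)
        # A reconciliation job replays rows logged here before cutover.
```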
Phase 3 — Execution
The build phase. The goal is not just to write the new system but to write it in a way that lets you cut over confidently.
- ☐ Strangler-fig pattern. Route new functionality through the new system; route legacy functionality through the old. Migrate routes one at a time.
  Why: this lets you ship incrementally and roll back if something breaks. Big-bang rewrites have a famously poor track record.
- ☐ Feature parity matrix. A spreadsheet listing every feature in the legacy system and its status in the new system: Done / In Progress / Out of Scope.
  Why: this is what you show in steering meetings. It also makes "are we done?" a yes/no question.
- ☐ Feature flags. Every new module sits behind a flag, off by default. Toggle it on per user, then per cohort, then per environment (see the first sketch after this list).
  Why: this is your rollback at the feature level. You'll use it more than you think.
- ☐ Monitoring and alerting in place before cutover. Dashboards live, alerts wired, and the on-call rota agreed before a single user hits the new system.
  Why: cutover is the moment you most need observability. Setting it up afterwards is too late.
- ☐ Observability stack. Logs (structured), metrics (Prometheus-compatible), tracing (OpenTelemetry). Centralized, queryable.
  Why: when prod misbehaves, you have minutes to understand it. Without structured observability you have hours.
- ☐ Cutover runbook. Step-by-step, with timings, decision points, named owners, and rollback triggers. Rehearse it.
  Why: 2am on cutover night is not the time to invent steps. The runbook turns a stressful night into a checklist.
- ☐ Smoke tests. A short set of automated tests that prove the system is alive: login, key workflow, write, read, logout (see the second sketch after this list).
  Why: this is the first thing you run after cutover. If smoke tests fail, you've failed cutover, full stop.
- ☐ Sign-off matrix. Who must sign off to release each module: tech lead, product owner, security, sponsor.
  Why: ambiguity here causes either rubber-stamping or paralysis. Codify the matrix once.
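Two sketches for the items above. First, feature flags: a minimal gate with an allow-list plus a stable percentage cohort, assuming you don't yet have a flag service (a real one such as LaunchDarkly or Unleash is preferable in production). The flag name, handlers, and rollout percentage are all hypothetical.

```python
"""Minimal feature-flag gate with per-user and percentage-cohort rollout."""
import hashlib

FLAGS = {
    # flag name -> (explicitly enabled user ids, percent of all users)
    "new_checkout": ({"qa-1", "qa-2"}, 5),
}

def is_enabled(flag: str, user_id: str) -> bool:
    allow_list, percent = FLAGS.get(flag, (set(), 0))  # unknown flags are off
    if user_id in allow_list:
        return True
    # Hash the user id so each user gets a stable yes/no across requests.
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < percent

# Placeholder handlers standing in for the real routes.
def new_checkout(user_id: str) -> str:
    return f"new:{user_id}"

def legacy_checkout(user_id: str) -> str:
    return f"legacy:{user_id}"

# Route per request: new system behind the flag, legacy as the safe default.
def checkout(user_id: str) -> str:
    if is_enabled("new_checkout", user_id):
        return new_checkout(user_id)
    return legacy_checkout(user_id)
```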
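Second, smoke tests: a sketch of the login, write, read-back, logout loop using the `requests` package. The base URL, endpoints, and payloads are hypothetical stand-ins for your system's real key workflow; the essential property is a nonzero exit code on any failure so the cutover runbook can gate on it.

```python
"""Post-cutover smoke test: login, write, read back, logout.

Endpoints and payloads are hypothetical -- replace with your system's real
key workflow. Requires the `requests` package.
"""
import sys
import requests

BASE = "https://app.example.com"  # assumption: the new system's base URL

def smoke():
    s = requests.Session()
    # 1. Login -- proves auth, sessions, and the database are alive.
    r = s.post(f"{BASE}/api/login",
               json={"user": "smoke", "password": "smoke-test"}, timeout=10)
    r.raise_for_status()
    # 2. Write -- proves the primary write path works end to end.
    r = s.post(f"{BASE}/api/notes", json={"body": "cutover smoke"}, timeout=10)
    r.raise_for_status()
    note_id = r.json()["id"]
    # 3. Read back -- proves the write actually landed and is queryable.
    r = s.get(f"{BASE}/api/notes/{note_id}", timeout=10)
    r.raise_for_status()
    assert r.json()["body"] == "cutover smoke"
    # 4. Logout -- proves session teardown.
    s.post(f"{BASE}/api/logout", timeout=10).raise_for_status()

if __name__ == "__main__":
    try:
        smoke()
        print("SMOKE: PASS")
    except Exception as exc:
        print(f"SMOKE: FAIL -- {exc}")
        sys.exit(1)
```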
Phase 4 — Cutover
The night of go-live. If discovery and planning were honest, this is uneventful. Aim for boring.
- ☐ DNS / traffic strategy. Decide on a DNS TTL window, blue/green switch, or weighted routing. Test the switch in staging first.
  Why: DNS propagation surprises catch teams every year. Drop the TTL 48 hours before cutover.
- ☐ Database cutover. Final delta migration, integrity checks, primary handover. Keep the old DB read-only during this window.
  Why: the database is the riskiest part. A read-only old DB protects against split-brain.
- ☐ Data integrity check. Row-count parity, checksums on critical tables, and a spot-check of recent records (see the sketch after this list).
  Why: if the data isn't right post-cutover, nothing else matters. Verify before opening traffic.
- ☐ Monitoring window. A pre-agreed window (e.g. 4 hours) during which the cutover team watches dashboards live before declaring success.
  Why: most cutover issues surface in the first hour or two. Don't leave the room early.
- ☐ Rollback rehearsal. Within the last week before cutover, do a dry-run rollback in staging, end to end.
  Why: an unrehearsed rollback fails in production. Rehearse the worst case so it becomes routine.
- ☐ Support team on call. Tier 1 support trained, FAQ ready, escalation path to engineering documented and tested.
  Why: users will hit support in the first 24 hours. Ready support deflects engineering interrupts.
- ☐ Comms to stakeholders. "Cutover complete, monitoring green, no user impact" sent within the agreed SLA. Even (especially) at 3am.
  Why: silence after cutover is interpreted as failure. A boring status note is reassuring.
- ☐ Post-cutover checks. Hourly checks for the first 4 hours, then 4-hourly for 24 hours, then daily for the first week.
  Why: regressions can surface hours or days after cutover, especially around batch jobs and reports.
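A sketch of the data integrity check above: row-count parity plus an order-independent checksum per critical table, compared between the old and new databases. The connections are standard DB-API objects (e.g. psycopg2), the table list is hypothetical, and the checksum query is PostgreSQL-flavoured (`hashtext`), so adapt both to your engine and schema.

```python
"""Data integrity spot-check: row-count parity and a per-table checksum,
old DB vs. new DB. Run this before opening traffic."""

CRITICAL_TABLES = ["users", "orders", "payments"]  # hypothetical list

def scalar(conn, sql):
    """Run a query and return the single value it produces."""
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchone()[0]

def check(old_conn, new_conn):
    failures = []
    for table in CRITICAL_TABLES:
        counts = [scalar(c, f"SELECT count(*) FROM {table}")
                  for c in (old_conn, new_conn)]
        # Order-independent checksum over whole rows; cheap enough to run
        # inside the cutover window on tables of modest size.
        sums = [
            scalar(c, f"SELECT coalesce(sum(hashtext(t::text)), 0) FROM {table} t")
            for c in (old_conn, new_conn)
        ]
        print(f"{table}: counts {counts[0]} vs {counts[1]}, "
              f"checksums {'match' if sums[0] == sums[1] else 'DIFFER'}")
        if counts[0] != counts[1] or sums[0] != sums[1]:
            failures.append(table)
    return failures  # an empty list means safe to open traffic
```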
Phase 5 — Post-cutover
The 90 days after go-live. This is where modernization projects either pay off or quietly collapse.
- ☐ 30 / 60 / 90 day stabilization plan. Pre-agreed checkpoints with defined exit criteria for each.
  Why: without checkpoints, "stabilization" becomes permanent and the team never moves to roadmap work.
- ☐ Retrospective. A structured retro within 2 weeks of cutover. What worked, what didn't, what we'd do differently.
  Why: organizational learning happens in retros or not at all. Hold one before the team disperses.
- ☐ Decommission timeline for the old system. A date by which the old system is fully retired, with milestones for code freeze, read-only mode, and shutdown.
  Why: paying for two systems forever is a hidden cost. Set the date and defend it.
- ☐ Knowledge transfer. A written handover from the build team to the ongoing maintenance team, including an architecture overview, the runbook, and known issues.
  Why: if the build team leaves and nothing is written down, the next team rebuilds tribal knowledge from scratch.
- ☐ Training delivered and signed off. All identified audiences trained, with attendance records and feedback collected.
  Why: training that isn't measured isn't real. Confirm uptake.
- ☐ Retainer or maintenance contract in place. A defined support arrangement for at least the first 6 months post-cutover.
  Why: the first 6 months produce most of the bugs. A retainer gets them fixed fast.
- ☐ Post-mortem documented for any cutover incidents. Even minor ones. Especially the near-misses.
  Why: post-mortems compound into institutional learning. Skipping them resets the clock.
- ☐ Success metrics measured and reported. The KPIs you defined in planning, measured against the baseline and shared with the sponsor.
  Why: this is how you prove the project succeeded — or learn that it didn't and recover.
Scoring your project
Count how many of the 40 items you can honestly tick as complete (or have consciously decided to skip) right now. Bands:
- 0–15: You are not ready. Pause the project, return to discovery and planning. Better to slip by a month than fail at cutover.
- 16–28: Workable but with risk. Identify the 5 most important unchecked items and address them before cutover.
- 29–35: Solid plan. Cutover with confidence; rehearse rollback.
- 36–40: Excellent. You're in the top 10% of modernization projects.
Want a hand?
We've run dozens of modernization projects. If you want a free 30-minute review of your modernization plan against this checklist, book a call. We'll tell you which items we'd prioritize for your specific situation — even if you don't end up working with us.