How to use this scorecard
Eight domains, each with 4–6 sub-criteria. Score each sub-criterion 1 to 5, average the sub-criteria into a domain score, then weight the domain scores by importance; the weighted total converts to a percentage out of 100. Tier bands at the end tell you who makes the shortlist.
Use the same scorer (or scoring team) for every vendor. Score within 48 hours of each vendor's pitch or proposal so impressions are fresh and consistent. Don't let one charismatic founder's visit skew your scoring — anchor on the evidence.
| Domain | Weight |
| --- | --- |
| 1. Technical capability | 20% |
| 2. Process & methodology | 15% |
| 3. References & track record | 15% |
| 4. Communication | 12% |
| 5. Commercial | 12% |
| 6. Security & compliance | 10% |
| 7. Cultural fit | 8% |
| 8. Business stability | 8% |
| Total | 100% |
Domain 1 — Technical capability (weight 20%)
Can they actually build the thing? Score each sub-criterion 1 (poor) to 5 (excellent).
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Stack fit (do they specialise in your tech?) | __ | __ | __ |
| AI native tooling adoption (Cursor, Claude Code, Copilot — in daily practice?) | __ | __ | __ |
| Code-quality evidence (public repos, sample PRs, linting/testing standards) | __ | __ | __ |
| Architecture references (relevant systems they've designed at your scale) | __ | __ | __ |
| Security posture (security headers, dependency scanning, IaC hygiene) | __ | __ | __ |
| Domain 1 average (1–5) | __ | __ | __ |
What 5 looks like: Multiple senior engineers with 7+ years in your exact stack. Public artefacts (blog, GitHub) demonstrating taste. AI tooling embedded in daily workflow, not "we use ChatGPT sometimes". Architecture references at or beyond your scale.
What 1 looks like: Generalist team. Stack named in proposal but no senior on call. AI tooling absent or treated as forbidden. No demonstrable artefacts of quality.
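For the security-posture row, one fast, objective probe is to look at the response headers on the vendor's own public site; teams that take security seriously tend to set them everywhere. Below is a minimal sketch using only Python's standard library; `https://vendor.example` is a placeholder, and the three headers checked are one reasonable baseline, not a full audit.

```python
# Quick probe for the "Security posture" sub-criterion: does the
# vendor's own site send baseline security headers? Stdlib only.
from urllib.request import urlopen

BASELINE = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
]

def check_headers(url: str) -> dict[str, bool]:
    """Return which baseline security headers the site sends."""
    with urlopen(url, timeout=10) as response:
        sent = {name.lower() for name in response.headers.keys()}
    return {h: h.lower() in sent for h in BASELINE}

if __name__ == "__main__":
    # "vendor.example" is a placeholder; substitute the real domain.
    for header, found in check_headers("https://vendor.example").items():
        print(f"{'ok  ' if found else 'MISS'} {header}")
```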
Domain 2 — Process & methodology (weight 15%)
Do they ship predictably? Look at how they describe their week, not just the slide titled "Agile".
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Sprint cadence (fixed cadence, end-of-sprint demo) | __ | __ | __ |
| Demo discipline (every sprint, working software, not slides) | __ | __ | __ |
| Written reports (weekly status with risks, blockers, asks) | __ | __ | __ |
| Change-request protocol (written change orders, no ambush invoices) | __ | __ | __ |
| CI/CD maturity (automated pipelines, every PR builds & tests) | __ | __ | __ |
| Test coverage standards (named target, enforced in CI) | __ | __ | __ |
| Domain 2 average (1–5) | __ | __ | __ |
What 5 looks like: Two-week sprints. Every sprint ends with a demo of working software. Weekly written status with risks named. Change requests in writing before work starts. 70%+ test coverage non-negotiable.
What 1 looks like: "We're flexible / agile" — meaning nothing in particular. Demos when convenient. Status by Slack only. Changes verbal. No coverage standard.
Domain 3 — References & track record (weight 15%)
Have they done this before — recently, at your scale, with someone who'll vouch for them?
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Years operating (5+ is meaningful; under 2 is risky) | __ | __ | __ |
| Similar industry case studies (3+ projects in your domain) | __ | __ | __ |
| Named, contactable references (you called and they confirmed) | __ | __ | __ |
| Employee tenure (senior staff averaging 3+ years in role) | __ | __ | __ |
| Churn rate of past clients (multi-year relationships, repeat work) | __ | __ | __ |
| Domain 3 average (1–5) | __ | __ | __ |
What 5 looks like: 5+ years operating. 3+ relevant case studies. References answer the phone and praise the work without prompting. Long-tenured seniors. Multiple repeat clients.
What 1 looks like: Under 2 years operating, or rebranded recently. No relevant case studies. References vague or unreachable. High visible turnover.
Domain 4 — Communication (weight 12%)
In an offshore engagement, communication discipline is the #1 predictor of success. Don't underweight this.
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Timezone overlap (4+ hours per working day live overlap) | __ | __ | __ |
| Dedicated PM / engagement manager (named, single point of contact) | __ | __ | __ |
| Response SLAs (e.g. 4 business hours to first response, defined in MSA) | __ | __ | __ |
| English fluency (written and spoken, across the team not just sales) | __ | __ | __ |
| Written-update discipline (weekly status doc, not just Slack) | __ | __ | __ |
| Domain 4 average (1–5) | __ | __ | __ |
What 5 looks like: 4+ hour live overlap. Named PM you've already met. Response SLAs in the contract. Senior engineers as articulate as the sales lead. Weekly written status is a habit.
What 1 looks like: No overlap or under 2 hours. Account manager and engineers are different people who don't speak. Vague response timing. Junior engineers hard to follow.
Domain 5 — Commercial (weight 12%)
Beyond headline rate: what does it actually cost to engage, and what does it cost to leave?
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Rate transparency (per-role rates published or stated up-front) | __ | __ | __ |
| Lock-in terms (notice period, minimum commitment) | __ | __ | __ |
| Payment flexibility (milestone vs. monthly, currency options) | __ | __ | __ |
| Currency & FX handling (your currency or one you can hedge) | __ | __ | __ |
| Exit terms (handover obligations, IP transfer, source code escrow if applicable) | __ | __ | __ |
| Domain 5 average (1–5) | __ | __ | __ |
What 5 looks like: Rate card public or on first request. 30-day notice. Milestone-based with reasonable payment terms. Will quote in your currency. Detailed handover clause in MSA, IP transfers cleanly.
What 1 looks like: Rates only after long sales process. 6-month lock-in, 90-day notice. Pre-pay required. Their currency only. Vague IP terms, no handover process.
Domain 6 — Security & compliance (weight 10%)
If you handle regulated or sensitive data, this domain is non-negotiable. Award a 5 only where you've seen evidence, not assurances.
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Encryption (TLS 1.2+ in transit, AES-256 at rest, key management) | __ | __ | __ |
| Regulatory posture (GDPR / HIPAA / SOC 2 evidence appropriate to your domain) | __ | __ | __ |
| Sub-processor list (named third parties, ability to update with notice) | __ | __ | __ |
| Pen test cadence (annual minimum, recent report shareable under NDA) | __ | __ | __ |
| NDA / MSA quality (data protection, breach notification, audit rights) | __ | __ | __ |
| Internal security training (employees trained, attested annually) | __ | __ | __ |
| Domain 6 average (1–5) | __ | __ | __ |
What 5 looks like: Encryption everywhere, key management explained. Certifications appropriate to industry. Annual pen test, willing to share summary. Strong MSA with named DPO. Documented training.
What 1 looks like: Vague answers on encryption. No certifications relevant to your sector. No pen test or refusal to discuss. Boilerplate MSA missing data protection clauses.
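For the encryption row, you can at least verify that a vendor's own endpoints refuse pre-1.2 TLS. A minimal sketch with Python's stdlib `ssl` module; `vendor.example` is a placeholder host, and this checks transport encryption only, not data at rest or key management.

```python
# Check which TLS version a vendor endpoint negotiates, refusing
# anything older than TLS 1.2 to match the scorecard's bar.
import socket
import ssl

def negotiated_tls(host: str, port: int = 443) -> str:
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. "TLSv1.3"

if __name__ == "__main__":
    print(negotiated_tls("vendor.example"))  # placeholder host
```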
Domain 7 — Cultural fit (weight 8%)
Often dismissed as "soft". It's actually a strong predictor of whether the engagement survives stress.
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Decision velocity (small decisions in hours, not days) | __ | __ | __ |
| Founder / exec accessibility (you can get the CEO when it matters) | __ | __ | __ |
| Transparency on slips (will they tell you a milestone is at risk early?) | __ | __ | __ |
| Willingness to push back (will they disagree with you when you're wrong?) | __ | __ | __ |
| Domain 7 average (1–5) | __ | __ | __ |
What 5 looks like: Decisions same-day. Exec on a call within 24 hours when escalated. Slips flagged early with mitigation. Engineers comfortable disagreeing in writing.
What 1 looks like: Decisions take a week. Sales rep is the only contact. Slips revealed late, framed as "almost done". Yes-people on every call.
Domain 8 — Business stability (weight 8%)
You don't want your vendor to go out of business mid-project. Quick sanity checks here.
| Sub-criterion | Vendor A | Vendor B | Vendor C |
| --- | --- | --- | --- |
| Years in business (5+ years operating under current name) | __ | __ | __ |
| Financial stability (filed accounts, no major write-downs) | __ | __ | __ |
| Parent group / investors (any concentration risk?) | __ | __ | __ |
| Key-person risk (does the whole business depend on one founder?) | __ | __ | __ |
| Domain 8 average (1–5) | __ | __ | __ |
What 5 looks like: 5+ years operating, healthy filed accounts, no recent layoffs, multiple senior leaders, distributed delivery capability.
What 1 looks like: Under 2 years, financials unavailable or weak, single founder is the whole technical bench, recent visible turnover.
Total score & weighted roll-up
| Domain | Weight | Vendor A score | A weighted | Vendor B score | B weighted | Vendor C score | C weighted |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Technical capability | 20% | __ | __ | __ | __ | __ | __ |
| 2. Process & methodology | 15% | __ | __ | __ | __ | __ | __ |
| 3. References & track record | 15% | __ | __ | __ | __ | __ | __ |
| 4. Communication | 12% | __ | __ | __ | __ | __ | __ |
| 5. Commercial | 12% | __ | __ | __ | __ | __ | __ |
| 6. Security & compliance | 10% | __ | __ | __ | __ | __ | __ |
| 7. Cultural fit | 8% | __ | __ | __ | __ | __ | __ |
| 8. Business stability | 8% | __ | __ | __ | __ | __ | __ |
| Total weighted (out of 5.00) | 100% | — | __ | — | __ | — | __ |
| Total % (× 20) | — | — | __ | — | __ | — | __ |
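If you'd rather compute the roll-up in code than in a spreadsheet, the arithmetic is simple: average each domain's sub-criteria (1 to 5), multiply by the domain weight, sum across domains, then multiply by 20 to turn the 5-point weighted total into a percentage. A minimal sketch with illustrative placeholder scores, not real vendor data:

```python
# Weighted roll-up: (domain average 1-5) x weight, summed, x20 for %.
WEIGHTS = {
    "technical": 0.20, "process": 0.15, "references": 0.15,
    "communication": 0.12, "commercial": 0.12, "security": 0.10,
    "culture": 0.08, "stability": 0.08,
}

def weighted_total(domain_averages: dict[str, float]) -> tuple[float, float]:
    """Return (weighted score out of 5.00, percentage out of 100)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 100%
    score = sum(WEIGHTS[d] * avg for d, avg in domain_averages.items())
    return score, score * 20

# Illustrative placeholder scores only.
vendor_a = {"technical": 4.2, "process": 4.0, "references": 3.5,
            "communication": 4.5, "commercial": 3.0, "security": 4.0,
            "culture": 4.0, "stability": 3.5}
total, pct = weighted_total(vendor_a)
print(f"Vendor A: {total:.2f}/5.00 = {pct:.0f}%")  # ~3.86/5.00 = 77%
```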
Tier bands
Tier-A vendor (80–100%): Strong fit. Shortlist. Move to commercial negotiation. Run a small two-week trial sprint to confirm delivery quality.
Tier-B vendor (60–79%): Workable with mitigation. Shortlist only if the gap in one domain is fixable (e.g. they'll add a dedicated PM, agree to written status). Don't pick a Tier-B unless no Tier-A exists.
Pass (under 60%): Decline. A score below 60% means at least one domain is fundamentally weak. The engagement will struggle. Save everyone the time.
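Expressed as code, tier assignment is a threshold check on that percentage. A minimal sketch consistent with the bands above:

```python
# Map the total percentage onto the tier bands above.
def tier(pct: float) -> str:
    if pct >= 80:
        return "Tier A: shortlist, negotiate, run a two-week trial sprint"
    if pct >= 60:
        return "Tier B: shortlist only if the weak domain is fixable"
    return "Pass: decline"

print(tier(77))  # -> Tier B, matching the illustrative roll-up above
```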
Practical scoring discipline
- Score within 48 hours of the vendor meeting. Memory of nuance fades faster than you think. Pull this scorecard up immediately after the call.
- Get two scorers. Compare independently scored sheets, then reconcile differences in a 30-minute conversation. Single-scorer scorecards drift.
- Don't average to "3" for the unknown. If you don't have evidence for a sub-criterion, score it 1 and ask the vendor. Reward effort to fill the gap.
- One veto criterion. Allow yourself one "below 2 here = automatic pass regardless of total" criterion. Common candidates: security, timezone overlap, English fluency. (A sketch of this rule follows the list.)
- Score your incumbent too. If you're considering switching, score your current vendor the same way. Sometimes the right answer is to keep them and renegotiate.
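The veto rule from the list above is easy to layer on top of the tier check; a minimal sketch, assuming timezone overlap is the veto criterion you chose:

```python
# One veto criterion: a sub-score below 2 on the chosen criterion is
# an automatic pass, regardless of the weighted total. A missing
# score defaults to 1, consistent with "don't average to 3".
VETO_CRITERION = "timezone_overlap"  # assumption: your chosen veto

def decision(pct: float, sub_scores: dict[str, int]) -> str:
    if sub_scores.get(VETO_CRITERION, 1) < 2:
        return "Pass: veto criterion failed"
    return tier(pct)  # tier() as defined in the sketch above

print(decision(77, {"timezone_overlap": 1}))  # -> automatic pass
```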
Common red flags (any one of these is grounds for an automatic pass, whatever the total)
- "We have 1,000 developers" but can't name the team that will work on your account.
- Sales lead promises a CV pack but it never arrives, or arrives generic.
- Cannot give a reference in your industry from the last 24 months.
- Pricing model that's "we'll figure it out" rather than per-role rates with assumptions.
- MSA missing IP transfer or data protection clauses.
- No public engineering blog, no GitHub presence, no conference talks. Quality engineering tends to leave fingerprints.
Want a second pair of eyes?
Send us your shortlist and your filled-in scorecard. We'll review for free — including blind-scoring ourselves so you have a comparison. Book a review. Even if you don't end up working with us, you'll make a sharper decision.