Case Study

AI Document Processing for Insurance Claims

How RG INSYS built an intelligent document extraction and claims triage pipeline for a UK insurance broker, cutting average claims processing time from 5 business days to under 4 hours.

The Challenge

A mid-sized UK insurance broker handling motor, home, and commercial property claims was drowning in paperwork. Every claim arrived as a bundle of PDFs, scanned forms, photos, and email attachments. Claims handlers manually opened each document, extracted key data points, cross-referenced policy details, and routed the claim to the correct team.

5 business days average processing time. From first notification of loss (FNOL) to triage decision, most claims sat in a queue for 3–5 days. High-value or complex claims could take weeks. Customers complained regularly about lack of updates, and the broker's NPS score had dropped 18 points in two years.

Error-prone manual extraction. Handlers typed data from scanned forms into the claims management system by hand. Transcription errors in policy numbers, dates, and amounts caused downstream failures: incorrect payouts, duplicated claims, and compliance audit flags. The operations team estimated a 12% error rate on manually entered claims.

No scalable path. Hiring more claims handlers was not viable. The broker processed 3,000 claims per month and expected volume to double within 18 months following a partnership with a major insurer. They needed automation that handled the complexity of real-world insurance documents without requiring clean, standardised inputs.

The Approach

RG INSYS designed a three-stage pipeline: document ingestion, AI-powered extraction, and automated triage with human-in-the-loop review for edge cases.

Intelligent document ingestion: Claims arrived via email, web upload, or API from partner systems. We built a unified ingestion layer that classified incoming files by type (claim form, police report, repair estimate, photo evidence, medical report) using a fine-tuned document classification model. Each document was OCR-processed using AWS Textract for structured forms and a vision LLM for handwritten or poorly scanned inputs.
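The classification step can be sketched with a simple TF-IDF text classifier over the OCR output. This is a minimal illustration, not the production model: the document-type labels come from the case study, but the training samples here are invented stand-ins for OCR'd historical claims.

```python
# Minimal sketch of document-type classification on OCR'd text,
# using a TF-IDF + logistic regression pipeline (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled examples standing in for OCR output of historical claims.
docs = [
    "claim form policy number incident date claimant signature",
    "police report officer badge incident reference statement",
    "repair estimate parts labour vat total garage",
    "claim form policy holder vehicle registration damage",
    "police report crime reference attending officer",
    "repair estimate bodywork respray courtesy car quote",
]
labels = ["claim_form", "police_report", "repair_estimate"] * 2

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)

def classify_document(ocr_text: str) -> str:
    """Return the predicted document type for a piece of OCR'd text."""
    return clf.predict([ocr_text])[0]
```

In production this would be trained on the labelled historical corpus and would route each file to the appropriate OCR path (Textract for structured forms, a vision LLM for poor scans).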

LLM-powered data extraction: Once classified, each document was processed by an extraction pipeline built on GPT-4 with structured output schemas. The system extracted policy numbers, claimant details, incident dates, damage descriptions, estimated amounts, and third-party information. Extracted data was validated against the broker's policy database via API, flagging mismatches for human review.
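The extract-then-validate pattern looks roughly like the sketch below. The LLM call is omitted; the field names, the `POLICY_DB` lookup shape, and the flag names are assumptions for illustration, not the broker's real schema or API.

```python
# Sketch of the validation step that runs after LLM extraction:
# cross-check extracted fields against the policy record and
# collect flags for the human review queue.
from dataclasses import dataclass

@dataclass
class ExtractedClaim:
    policy_number: str
    claimant_name: str
    incident_date: str        # ISO 8601 date string
    estimated_amount: float
    confidence: dict          # per-field confidence scores from the model

# Stand-in for the broker's policy database API.
POLICY_DB = {"PN-100234": {"holder": "A. Patel", "status": "active"}}

def validate_claim(claim: ExtractedClaim) -> list[str]:
    """Cross-check extracted fields against the policy record; return flags."""
    flags = []
    record = POLICY_DB.get(claim.policy_number)
    if record is None:
        flags.append("unknown_policy_number")
    elif record["status"] != "active":
        flags.append("policy_inactive")
    if claim.estimated_amount <= 0:
        flags.append("implausible_amount")
    return flags
```

Any non-empty flag list sends the claim to human review rather than straight-through processing.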

Automated triage and routing: A rules engine combined with an ML scoring model assessed each claim's complexity, estimated value, and fraud risk indicators. Low-risk, straightforward claims (under a configurable threshold) were auto-approved for fast-track settlement. Complex or flagged claims were routed to specialist handlers with a pre-populated summary and confidence scores for each extracted field.
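The triage decision itself reduces to a few hard rules plus a model score checked against a configurable threshold. The thresholds and rule set below are illustrative defaults, not the production configuration.

```python
# Hedged sketch of the triage decision: validation flags and fraud score
# gate the claim first; only clean, low-value claims are fast-tracked.
def triage(estimated_amount: float,
           fraud_score: float,
           flags: list[str],
           fast_track_limit: float = 2_500.0,
           fraud_threshold: float = 0.7) -> str:
    """Return 'fast_track' or 'specialist_review' for a validated claim."""
    if flags:                          # any validation mismatch -> human review
        return "specialist_review"
    if fraud_score >= fraud_threshold:  # ML fraud indicator over threshold
        return "specialist_review"
    if estimated_amount <= fast_track_limit:
        return "fast_track"
    return "specialist_review"
```

Claims routed to specialists carry the pre-populated summary and per-field confidence scores described above, so handlers start from context rather than raw documents.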

Timeline: Week by Week

Weeks 1–2: Discovery and data audit. We reviewed 500 historical claims to map document types, extraction fields, and edge cases, and finalised the architecture design and API contracts with the existing claims management system.

Weeks 3–5: Document ingestion pipeline (email parsing, web upload, API gateway). OCR integration with AWS Textract. Document classification model training on labelled historical data.

Weeks 6–8: LLM extraction pipeline with structured output schemas. Policy validation API. Confidence scoring and human review queue. Extraction accuracy testing against 1,000 manually verified claims.

Weeks 9–10: Triage rules engine and fraud risk scoring. Dashboard for claims managers showing pipeline status, extraction results, and override controls. Audit logging for compliance.

Weeks 11–12: UAT with live claims (shadow mode). Performance tuning. Security review. Production deployment with 30-day parallel run alongside manual process.

Tech Stack

  • Backend: Node.js 20, Express, TypeScript
  • AI/ML: OpenAI GPT-4 (structured extraction), AWS Textract (OCR), custom classification model (Python, scikit-learn)
  • Vector Store: Pinecone (policy document semantic search)
  • Database: PostgreSQL 16, Redis 7
  • Infrastructure: AWS (Lambda, S3, SQS, Step Functions, Textract, ECS)
  • Frontend: React 18, TypeScript, Tailwind CSS
  • Integrations: Acturis (claims management system), email ingestion via AWS SES
  • AI tooling: Claude Code, Cursor IDE

Results

  • 85% — claims auto-triaged
  • 4 hrs — average processing time (was 5 days)
  • 96% — extraction accuracy
  • 3K+ — claims processed monthly
  • 12 wks — concept to production

Key Features Delivered

  • Multi-format document ingestion: Unified pipeline accepting email attachments, web uploads, and API submissions. Automatic classification into 14 document types with 97% accuracy.
  • AI-powered data extraction: LLM extracts structured data from claim forms, police reports, repair estimates, and medical reports. Confidence scores on every field let handlers focus review on low-confidence items only.
  • Automated triage and fraud scoring: Rules engine plus ML model assesses claim complexity and flags fraud indicators. Low-risk claims fast-tracked; high-risk claims routed to specialists with full context.
  • Claims manager dashboard: Real-time pipeline visibility showing ingestion status, extraction results, triage decisions, and override controls. Full audit trail for FCA compliance.
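The confidence-driven review described above can be sketched as a simple filter: only fields whose score falls below a threshold are surfaced to handlers. The threshold value and field names are assumptions.

```python
# Sketch of how per-field confidence scores drive the review queue:
# handlers only see fields the model was unsure about.
def fields_needing_review(confidence: dict[str, float],
                          threshold: float = 0.9) -> list[str]:
    """Return extracted fields whose confidence falls below the threshold."""
    return sorted(f for f, c in confidence.items() if c < threshold)
```

With a 96% overall extraction accuracy, this keeps manual review focused on the small minority of uncertain fields rather than whole claims.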

Drowning in document processing?

We build AI pipelines that extract, validate, and route documents at scale. Get a scope, timeline, and cost estimate within 48 hours.

Book Free Consultation →
Free consultation, no commitment
