The Problem
Legal work is document-intensive and precedent-driven, and errors are costly. A lawyer handling 20+ active cases spends hours every week locating relevant documents, finding applicable precedents, and cross-referencing case notes.
Casewin gives every attorney an AI legal analyst that knows their entire caseload.
Core Features
RAG-Powered Case Intelligence
Every document in the system (contracts, filings, correspondence, court orders) is chunked, embedded, and stored in ChromaDB. Attorneys can then ask natural-language questions such as:
- "What were the key dates in the Johnson contract dispute?"
- "Find all communications where the defendant acknowledged the agreement"
- "What precedents apply to constructive dismissal in the Western Cape?"
The system retrieves the relevant chunks, ranks them by relevance, and generates a structured response with source citations.
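The retrieve, rank, and cite flow can be sketched in plain Python. This is a minimal illustration, not Casewin's actual code: the `Chunk` class, the `retrieve` and `build_prompt` helpers, and the three-dimensional toy embeddings are all assumptions, and in the real system the similarity search would be done by ChromaDB rather than by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str              # hypothetical citation label, e.g. a file name
    embedding: list[float]   # toy vector; a real one comes from an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Return the top_k chunks most similar to the query, best match first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(hits, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Numbering the sources inside the prompt is one simple way to get citations back: the generated answer can refer to `[1]` or `[2]`, and the interface can map those markers to the original documents.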
Document Analysis Pipeline
Uploaded documents are processed through a three-stage pipeline:
- Extraction — text pulled from PDFs, Word docs, scanned images (OCR)
- Classification — document type identified (contract, pleading, correspondence, etc.)
- Chunking — semantically chunked with metadata preserved (parties, dates, key clauses)
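The three stages above can be sketched as three small functions chained together. Everything here is an assumption made for illustration: the keyword table stands in for whatever classifier is really used, `extract_text` only decodes plain text (PDF, Word, and OCR handling are omitted), and the metadata is limited to ISO-style dates.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical keyword rules; a real classifier would likely be a model or an LLM call.
DOC_TYPE_KEYWORDS = {
    "contract": ("agreement", "hereinafter", "the parties"),
    "pleading": ("plaintiff", "defendant", "particulars of claim"),
    "correspondence": ("dear", "kind regards", "yours faithfully"),
}


def extract_text(raw: bytes) -> str:
    """Stage 1 stand-in: decode plain text only."""
    return raw.decode("utf-8", errors="replace")


def classify(text: str) -> str:
    """Stage 2: pick the document type whose keywords occur most often."""
    lowered = text.lower()
    scores = {t: sum(lowered.count(k) for k in kws) for t, kws in DOC_TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def chunk(text: str, doc_type: str) -> list[Chunk]:
    """Stage 3: split on blank lines and attach metadata to every chunk."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        Chunk(p, {
            "doc_type": doc_type,
            "position": i,
            "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", p),
        })
        for i, p in enumerate(paragraphs)
    ]


def process(raw: bytes) -> list[Chunk]:
    """Run extraction, classification, and chunking in order."""
    text = extract_text(raw)
    return chunk(text, classify(text))
```

Carrying the document type and dates on every chunk means a later query can filter on metadata before any similarity search runs.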
Precedent Research
A separate index holds South African case law (SAFLII corpus). When attorneys work on a matter, the system automatically surfaces potentially applicable precedents based on the case type and key facts.
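As a rough sketch of surfacing precedents from case type and key facts, the example below filters a small index by case type and ranks the remainder by term overlap. The `Precedent` class, the Jaccard scoring, and the `min_score` threshold are assumptions chosen to keep the example dependency-free; the real system presumably ranks by embedding similarity against the SAFLII index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    citation: str               # placeholder label, not a real citation
    case_type: str
    key_terms: frozenset[str]   # lower-case terms describing the judgment


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def surface_precedents(case_type, key_facts, index, min_score=0.2, limit=5):
    """Return (citation, score) pairs for same-type precedents, best first."""
    facts = frozenset(term.lower() for term in key_facts)
    scored = [(jaccard(facts, p.key_terms), p) for p in index if p.case_type == case_type]
    scored = [(s, p) for s, p in scored if s >= min_score]   # drop weak matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(p.citation, round(s, 2)) for s, p in scored[:limit]]
```

The threshold matters for this feature: returning nothing is usually safer than surfacing a weakly related judgment that an attorney then has to rule out.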
Architecture
Document Upload (Next.js frontend)
            ↓
FastAPI Processing Service
     ↓              ↓
Extraction    Classification
            ↓
ChromaDB Vector Store
            ↓
LangChain Retrieval Chain
            ↓
Anthropic Claude (generation + synthesis)
            ↓
Attorney Interface (Next.js)
Technical Challenges
Chunking strategy for legal documents was the hardest problem. Legal documents have a specific structure (recitals, definitions, operative clauses, schedules) that generic semantic chunking destroys. I built a custom parser that never splits a chunk across a clause boundary and keeps the cross-references inside each chunk.
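A minimal sketch of clause-aware chunking is shown below. It is not the custom parser itself: the heading pattern (a clause number such as `1.` or `12.3` at the start of a line), the cross-reference pattern (`clause 4.1`), and the `chunk_by_clause` name are all assumptions, and real contracts would need more patterns than these two.

```python
import re

# Assumed heading style: a clause number such as "1." or "12.3" at the start of a line.
# Any other line that begins with a number would be mistaken for a heading.
CLAUSE_START = re.compile(r"^(?P<num>\d+(?:\.\d+)*)\.?\s+", re.MULTILINE)
# Assumed cross-reference style: "clause 4.1" or "Clauses 7".
CROSS_REF = re.compile(r"\bclauses?\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def chunk_by_clause(text: str) -> list[dict]:
    """Split text at clause headings so that no chunk crosses a clause boundary."""
    starts = list(CLAUSE_START.finditer(text))
    chunks = []

    # Text before the first numbered clause (e.g. recitals) becomes its own chunk.
    preamble = text[: starts[0].start()].strip() if starts else text.strip()
    if preamble:
        chunks.append({"clause": None, "text": preamble, "refers_to": CROSS_REF.findall(preamble)})

    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.start():end].strip()
        number = match.group("num")
        # Record which other clauses this one points to, as chunk metadata.
        refs = [r for r in CROSS_REF.findall(body) if r != number]
        chunks.append({"clause": number, "text": body, "refers_to": refs})
    return chunks
```

Storing `refers_to` alongside each chunk lets retrieval pull in a referenced clause (for example a definition) together with the clause that cites it.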
Multi-matter context isolation: each case needs its own retrieval scope so results from Matter A don't bleed into Matter B. Giving each matter its own ChromaDB collection solved this cleanly.
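The isolation idea can be illustrated with a small in-memory stand-in. This sketch deliberately avoids the ChromaDB client so it runs anywhere: `MatterScopedStore` is a hypothetical class, each dictionary key plays the role of one collection, and the substring match stands in for vector similarity search.

```python
from collections import defaultdict


class MatterScopedStore:
    """In-memory stand-in for a vector store that keeps one collection per matter."""

    def __init__(self):
        # matter_id -> list of (doc_id, text); each key models one collection
        self._collections = defaultdict(list)

    def add(self, matter_id: str, doc_id: str, text: str) -> None:
        """Store a chunk in the collection belonging to matter_id."""
        self._collections[matter_id].append((doc_id, text))

    def search(self, matter_id: str, term: str) -> list[str]:
        """Search only the named matter; other matters are never read."""
        if matter_id not in self._collections:
            raise KeyError(f"unknown matter: {matter_id}")
        needle = term.lower()
        return [doc_id for doc_id, text in self._collections[matter_id] if needle in text.lower()]
```

Because the scope is chosen before the search runs, there is no filter to forget: a query against one matter cannot return another matter's documents even when the text matches.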
Status
Currently in beta with two law firm clients. The core RAG pipeline is production-ready; the precedent research feature is being validated for accuracy before GA release.