The Problem

Legal work is document-intensive and precedent-driven, and errors carry extremely high stakes. A lawyer handling 20+ active cases spends hours every week just locating relevant documents, finding applicable precedents, and cross-referencing case notes.

Casewin gives every attorney an AI legal analyst that knows their entire caseload.

Core Features

RAG-Powered Case Intelligence

Every document in the system — contracts, filings, correspondence, court orders — is chunked, embedded, and stored in ChromaDB. Attorneys can ask natural language questions:

"What were the key dates in the Johnson contract dispute?"
"Find all communications where the defendant acknowledged the agreement"
"What precedents apply to constructive dismissal in the Western Cape?"

The system retrieves the relevant chunks, ranks them by relevance, and generates a structured response with source citations.
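The retrieve, rank, and cite flow can be illustrated with a minimal, standard-library-only sketch. It is not Casewin's implementation: a bag-of-words cosine score stands in for real embeddings and ChromaDB, and the names Chunk, retrieve, and format_context, as well as the sample file names, are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # citation label shown to the attorney (sample values are made up)


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[tuple[float, Chunk]]:
    """Score every chunk against the query, drop non-matches, return the top k."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(c.text)), c) for c in chunks]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:k]


def format_context(hits: list[tuple[float, Chunk]]) -> str:
    """Number each retrieved chunk so a generated answer can cite it as [1], [2], ..."""
    return "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, (_, c) in enumerate(hits, start=1)
    )


# Made-up sample data for illustration only.
SAMPLE_CHUNKS = [
    Chunk("The agreement was signed on 3 March 2021 by both parties.",
          "johnson_contract.pdf p.1"),
    Chunk("The defendant acknowledged the agreement in an email dated 12 May 2021.",
          "email_0042.eml"),
    Chunk("Invoice for office supplies, due within 30 days.", "invoice_17.pdf"),
]

hits = retrieve("Where did the defendant acknowledge the agreement?", SAMPLE_CHUNKS)
print(format_context(hits))
```

In a setup like this, the numbered context block would be passed to the language model together with the question, so that every statement in the answer can point back to a numbered source.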

Document Analysis Pipeline

Uploaded documents are processed through a three-stage pipeline:

Extraction — text pulled from PDFs, Word docs, scanned images (OCR)
Classification — document type identified (contract, pleading, correspondence, etc.)
Chunking — semantically chunked with metadata preserved (parties, dates, key clauses)
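The three stages above can be sketched as three small functions composed in order. This is a simplified illustration under stated assumptions, not the production pipeline: extraction is reduced to decoding bytes, classification uses hypothetical keyword rules, and chunking splits on blank lines while attaching metadata. All names here (extract, classify, chunk, process, DOC_TYPE_KEYWORDS) are invented for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ProcessedChunk:
    text: str
    metadata: dict = field(default_factory=dict)


def extract(raw: bytes) -> str:
    """Stage 1 stand-in: real extraction would call a PDF/Word parser or OCR."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical keyword rules; a real classifier would more likely be a model.
DOC_TYPE_KEYWORDS = {
    "contract": ("agreement", "hereinafter", "the parties"),
    "pleading": ("plaintiff", "defendant", "particulars of claim"),
    "correspondence": ("dear", "kind regards", "yours faithfully"),
}


def classify(text: str) -> str:
    """Stage 2: pick the document type whose keywords occur most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(kw) for kw in keywords)
        for doc_type, keywords in DOC_TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


# Matches dates written like "3 March 2021".
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def chunk(text: str, doc_type: str) -> list[ProcessedChunk]:
    """Stage 3: split on blank lines and attach metadata to every chunk."""
    chunks = []
    for position, para in enumerate(p.strip() for p in re.split(r"\n\s*\n", text)):
        if not para:
            continue
        metadata = {
            "doc_type": doc_type,
            "position": position,
            "dates": DATE_RE.findall(para),
        }
        chunks.append(ProcessedChunk(para, metadata))
    return chunks


def process(raw: bytes) -> list[ProcessedChunk]:
    """Run the three stages in order: extract, classify, chunk."""
    text = extract(raw)
    return chunk(text, classify(text))
```

Classifying before chunking means every chunk can carry its document type as metadata, which later allows retrieval to be filtered by type.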

Precedent Research

A separate index holds South African case law drawn from the SAFLII corpus. When attorneys work on a matter, the system automatically surfaces potentially applicable precedents based on the matter's case type and key facts.
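One way to picture surfacing precedents by case type and key facts is a filter followed by a ranking step. The sketch below is an assumption-laden simplification: it uses exact case-type matching and set overlap (Jaccard similarity) on fact labels instead of embedding search, and the Precedent class, surface_precedents function, thresholds, and placeholder citations are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    citation: str  # placeholder label, not a real citation
    case_type: str
    key_facts: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of fact labels the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def surface_precedents(
    case_type: str,
    key_facts: set[str],
    index: list[Precedent],
    min_score: float = 0.2,  # assumed cut-off; would need tuning on real data
    k: int = 3,
) -> list[tuple[float, Precedent]]:
    """Keep precedents of the same case type, then rank them by fact overlap."""
    facts = frozenset(f.lower() for f in key_facts)
    scored = [
        (jaccard(facts, p.key_facts), p) for p in index if p.case_type == case_type
    ]
    ranked = sorted(
        (pair for pair in scored if pair[0] >= min_score),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:k]


# Placeholder index entries for illustration only.
INDEX = [
    Precedent("Placeholder Case A", "constructive dismissal",
              frozenset({"resignation", "intolerable conditions", "demotion"})),
    Precedent("Placeholder Case B", "constructive dismissal",
              frozenset({"retrenchment", "consultation"})),
    Precedent("Placeholder Case C", "breach of contract",
              frozenset({"resignation", "demotion"})),
]

results = surface_precedents(
    "constructive dismissal", {"Resignation", "demotion", "salary cut"}, INDEX
)
```

The minimum-score threshold matters here because a weakly related precedent presented as applicable is worse than none, which fits the decision to validate this feature for accuracy before release.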

Architecture

Document Upload (Next.js frontend)
        ↓
FastAPI Processing Service
    ↓             ↓
Extraction     Classification
    ↓
ChromaDB Vector Store
    ↓
LangChain Retrieval Chain
    ↓
Anthropic Claude (generation + synthesis)
    ↓
Attorney Interface (Next.js)

Technical Challenges

Choosing a chunking strategy for legal documents was the hardest problem. Legal documents have a specific structure (recitals, definitions, operative clauses, schedules) that generic semantic chunking destroys. I built a custom parser that splits on clause boundaries and keeps each clause's cross-references with its chunk.
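A rough sketch of clause-aware chunking follows. It is not the custom parser itself, only an illustration under simplifying assumptions: clause headings are lines that start with a number such as "1." or "4.2", and cross-references are phrased as "clause N". The regular expressions and the ClauseChunk type are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed heading format: a line starting with "1.", "2.1", "10.3.2", and so on.
# Limitation: a line that begins with a date such as "3 March 2021" would also match.
CLAUSE_HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+", re.MULTILINE)
# Assumed cross-reference phrasing: "clause 4.2", "Clause 7".
CROSS_REF = re.compile(r"\bclauses?\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


@dataclass
class ClauseChunk:
    number: str
    text: str
    cross_refs: list[str] = field(default_factory=list)


def make_chunk(number: str, text: str) -> ClauseChunk:
    """Record which other clauses this chunk refers to, as sorted unique numbers."""
    return ClauseChunk(number, text, sorted(set(CROSS_REF.findall(text))))


def chunk_by_clause(document: str) -> list[ClauseChunk]:
    """Split only at clause headings so that no clause is cut in half."""
    headings = list(CLAUSE_HEADING.finditer(document))
    chunks: list[ClauseChunk] = []
    first_start = headings[0].start() if headings else len(document)
    preamble = document[:first_start].strip()
    if preamble:
        # Text before the first numbered clause, for example recitals.
        chunks.append(make_chunk("preamble", preamble))
    for i, match in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(document)
        chunks.append(make_chunk(match.group(1), document[match.start():end].strip()))
    return chunks


# Made-up sample contract text.
SAMPLE = """RECITALS
The parties wish to record their agreement.

1. Definitions
"Fee" means the amount set out in clause 4.2.

2. Services
The Supplier shall perform the services subject to clause 1 and Clause 4.2.

4.2 Payment
The Fee is payable within 30 days.
"""

for c in chunk_by_clause(SAMPLE):
    print(c.number, c.cross_refs)
```

Storing the referenced clause numbers alongside each chunk makes it possible to pull in the referenced clauses at retrieval time, so a definition and the clause that depends on it can be shown together.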

Multi-matter context isolation was the other major challenge. Each case needs its own retrieval scope so that results from Matter A don't bleed into Matter B. Giving each matter its own ChromaDB collection solved this cleanly.
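The isolation idea can be shown with a small in-memory stand-in. This is not ChromaDB code: MatterScopedStore is a hypothetical class that mimics the one-collection-per-matter layout with a dictionary and uses substring matching in place of vector search. With ChromaDB itself, the same layout would presumably be built by calling the client's get_or_create_collection once per matter.

```python
from __future__ import annotations

from collections import defaultdict


class MatterScopedStore:
    """In-memory stand-in for a vector store with one collection per matter."""

    def __init__(self) -> None:
        # matter_id -> list of (doc_id, text); each list plays the role of a collection.
        self._collections: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, matter_id: str, doc_id: str, text: str) -> None:
        """Store a document in the collection that belongs to its matter."""
        self._collections[matter_id].append((doc_id, text))

    def query(self, matter_id: str, term: str) -> list[str]:
        """Search only the named matter's collection; others are never touched."""
        needle = term.lower()
        return [
            doc_id
            for doc_id, text in self._collections.get(matter_id, [])
            if needle in text.lower()
        ]


store = MatterScopedStore()
store.add("matter-a", "a1", "Settlement offer sent to opposing counsel")
store.add("matter-b", "b1", "Settlement conference scheduled")

# The same search term returns different documents depending on the matter.
print(store.query("matter-a", "settlement"))
print(store.query("matter-b", "settlement"))
```

Because the matter identifier selects the collection before any search runs, isolation does not depend on remembering to add a filter to every query.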

Status

Casewin is currently in beta with two law firm clients. The core RAG pipeline is production-ready; the precedent research feature is being validated for accuracy before the general availability (GA) release.