The problem: lawyers reviewing the wrong documents
A mid-sized technology company with an in-house legal team of six handled over 400 contracts per quarter: vendor agreements, SaaS subscriptions, NDAs, partnership contracts, and employment agreements. The team was experienced and capable. The bottleneck was not skill. It was volume.
First-pass document review (reading a contract to identify key clauses, flag risks, and determine whether it required negotiation) was consuming approximately 70% of each lawyer's working hours. That left little time for the work that actually required legal expertise: negotiation, advice, risk mitigation, and escalation to external counsel.
The Head of Legal put it directly: "We are spending lawyer time doing work that a structured process could handle. The problem is we don't have the structured process."
The diagnosis: a workflow problem, not a staffing problem
The instinct was to hire. Before recommending headcount, GoMeasure AI ran a three-week workflow diagnostic to understand where lawyer time was actually going.
The findings were consistent with what we see in most document-heavy legal teams:
- 68% of contracts reviewed were standard agreements with minor variations — NDAs, MSAs, SaaS terms — where first-pass review followed a predictable pattern
- Only 22% of contracts contained genuinely novel clauses requiring substantive lawyer judgment
- The remaining 10% were escalation-worthy: unusual risk, non-standard terms, or regulatory exposure
The implication was clear: standard contracts, 68% of the volume, were getting the same lawyer-led first-pass review as the 22% that actually needed it. The workflow had no triage layer.
The solution: AI-assisted triage with lawyer review
GoMeasure AI designed and deployed a three-stage workflow agent over an eight-week engagement.
Stage 1 — Document intake and classification
Contracts were ingested through a structured intake form connected to the legal team's existing document management system. The AI classified each contract by type, flagged the applicable review template, and extracted the key clauses relevant to that contract type — parties, term, payment, liability cap, indemnification, termination, and governing law.
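To make the Stage 1 output concrete, here is a minimal sketch of what an intake record might look like. The class names, field names, the 0-to-1 confidence scale, and the sample contract are assumptions for illustration; the case study does not describe the actual schema.

```python
from dataclasses import dataclass, field

# Clause types named in the case study; the identifiers are illustrative.
KEY_CLAUSES = (
    "parties", "term", "payment", "liability_cap",
    "indemnification", "termination", "governing_law",
)


@dataclass
class ExtractedClause:
    clause_type: str   # one of KEY_CLAUSES
    text: str          # verbatim text pulled from the contract
    confidence: float  # extraction confidence in [0, 1] (assumed scale)


@dataclass
class IntakeRecord:
    contract_id: str
    contract_type: str    # e.g. "NDA", "MSA", "SaaS"
    review_template: str  # hypothetical template identifier
    clauses: list = field(default_factory=list)

    def missing_clauses(self):
        """Key clause types that the extraction did not find."""
        found = {c.clause_type for c in self.clauses}
        return [k for k in KEY_CLAUSES if k not in found]


# Invented sample data, not taken from the engagement.
record = IntakeRecord(
    contract_id="C-001",
    contract_type="NDA",
    review_template="nda_v1",
    clauses=[
        ExtractedClause("parties", "Acme Ltd and Beta Inc", 0.97),
        ExtractedClause("governing_law", "England and Wales", 0.91),
    ],
)
print(record.missing_clauses())
```

Tracking which key clauses were not found matters because a missing clause is itself a signal: under this sketch it would be a natural reason to send the contract to a lawyer rather than to a confirmation queue.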
Stage 2 — Evidence scoring against a risk rubric
Each extracted clause was scored against a risk rubric co-developed with the legal team. The rubric codified what the team already knew — which clause variations were standard, which were borderline, and which were unacceptable. The AI produced a structured evidence report: clause by clause, with a risk score, the extracted text, and a comparison against the team's standard position.
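The clause-by-clause evidence report can be sketched as follows. This is an illustration only: the rubric entries, the keyword-matching approach, and all names are assumptions, and the three risk levels are used in place of a numeric score. A real rubric built with the legal team would be far richer than a keyword list.

```python
from dataclasses import dataclass

# Hypothetical rubric. Each entry holds the team's standard position and
# phrases that mark a variation as borderline or unacceptable.
RUBRIC = {
    "liability_cap": {
        "standard": "Liability capped at fees paid in the prior 12 months",
        "borderline": ["24 months"],
        "unacceptable": ["unlimited", "uncapped"],
    },
    "termination": {
        "standard": "Either party may terminate on 30 days' written notice",
        "borderline": ["90 days"],
        "unacceptable": ["may not terminate", "non-cancellable"],
    },
}


@dataclass
class ClauseFinding:
    clause_type: str
    extracted_text: str
    standard_position: str
    risk: str  # "acceptable", "borderline", or "unacceptable"


def score_clause(clause_type, text):
    """Compare one extracted clause against the rubric entry for its type."""
    entry = RUBRIC[clause_type]  # raises KeyError for clause types not in the rubric
    lowered = text.lower()
    if any(phrase in lowered for phrase in entry["unacceptable"]):
        risk = "unacceptable"
    elif any(phrase in lowered for phrase in entry["borderline"]):
        risk = "borderline"
    else:
        risk = "acceptable"
    return ClauseFinding(clause_type, text, entry["standard"], risk)


# Invented clause text for illustration.
extracted = {
    "liability_cap": "The Supplier's liability under this Agreement shall be unlimited.",
    "termination": "Either party may terminate on 30 days' written notice.",
}
report = [score_clause(kind, text) for kind, text in extracted.items()]
for finding in report:
    print(finding.clause_type, "->", finding.risk)
```

Each finding keeps the extracted text next to the standard position, so a reviewing lawyer can see why a clause was scored the way it was instead of trusting a bare label.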
Stage 3 — Confidence-based routing
Contracts where all clauses scored within acceptable ranges with high confidence were flagged as "standard — lawyer confirmation required." A lawyer could approve these in under five minutes. Contracts with borderline clauses, low-confidence extractions, or flagged risk terms were routed directly to the reviewer queue as "requires substantive review." Contracts with critical flags went immediately to the team lead.
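The routing rule described above can be sketched in a few lines. The threshold value, the queue identifiers, and the `ScoredClause` fields are assumptions; the deployed thresholds were calibrated on the team's own contracts and are not given in the case study.

```python
from dataclasses import dataclass
from enum import Enum


class Queue(Enum):
    CONFIRMATION = "confirmation"       # standard, lawyer confirmation required
    SUBSTANTIVE = "substantive_review"  # requires substantive review
    TEAM_LEAD = "team_lead"             # critical flag


@dataclass
class ScoredClause:
    clause_type: str
    risk: str              # "acceptable", "borderline", or "unacceptable"
    confidence: float      # extraction confidence in [0, 1] (assumed scale)
    critical: bool = False


CONFIDENCE_THRESHOLD = 0.90  # assumed value for illustration


def route(clauses, threshold=CONFIDENCE_THRESHOLD):
    """Pick a review queue for one contract from its scored clauses."""
    if not clauses:
        # Nothing was extracted, so fail safe and send it to a lawyer.
        return Queue.SUBSTANTIVE
    if any(c.critical for c in clauses):
        return Queue.TEAM_LEAD
    for c in clauses:
        if c.risk != "acceptable" or c.confidence < threshold:
            return Queue.SUBSTANTIVE
    return Queue.CONFIRMATION


clean = [
    ScoredClause("term", "acceptable", 0.97),
    ScoredClause("liability_cap", "acceptable", 0.95),
]
print(route(clean))
```

Note that every path ends with a lawyer. The confirmation queue shortens the review; it does not remove it. The sketch also treats an empty extraction as a reason for full review, so a failed extraction cannot slip through as "standard".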
Results at 90 days
The system went live with a two-week parallel run (AI and manual review running simultaneously) to calibrate the confidence thresholds and validate the extraction accuracy. At 90 days post-deployment:
- 62% reduction in average lawyer time per standard contract (from 47 minutes to 18 minutes)
- 74% of NDAs and SaaS agreements handled through the confirmation queue rather than full review
- 100% audit coverage — every contract had a structured evidence log with the AI's extraction, score, and the lawyer's decision recorded
- 3 liability clauses identified in the first month that manual review had missed: the AI's structured extraction caught inconsistencies that skimming had overlooked
- Lawyer time redirected to negotiation, external counsel coordination, and strategic advisory — work that had been consistently deprioritised due to review volume
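As an illustration of the audit coverage result, a structured evidence log entry might look like the sketch below. The field names, identifiers, and the JSON-lines format are assumptions; the case study only states that the AI's extraction, its score, and the lawyer's decision were recorded for every contract.

```python
import json
from datetime import datetime, timezone


def audit_entry(contract_id, clause_findings, routed_to, lawyer, decision):
    """Build one audit record covering extraction, score, and lawyer decision."""
    return {
        "contract_id": contract_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "clause_findings": clause_findings,  # extraction and score per clause
        "routed_to": routed_to,
        "lawyer": lawyer,
        "lawyer_decision": decision,
    }


# Invented sample data, not taken from the engagement.
entry = audit_entry(
    contract_id="C-001",
    clause_findings=[
        {
            "clause_type": "liability_cap",
            "extracted_text": "Liability is capped at fees paid in the prior 12 months.",
            "risk": "acceptable",
            "confidence": 0.96,
        }
    ],
    routed_to="confirmation",
    lawyer="reviewer-1",
    decision="approved",
)

# One JSON object per line is easy to append to and to search later.
line = json.dumps(entry, sort_keys=True)
print(line)
```

Storing the lawyer's decision alongside the AI's output is what makes the log useful for governance: disagreements between the two are exactly the cases worth reviewing when the rubric or thresholds are revisited.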
What made it work — and what nearly didn't
The technical components were straightforward. The harder work was the risk rubric. Getting six lawyers to agree on what "acceptable," "borderline," and "unacceptable" meant for each clause type took three working sessions and multiple iterations. That codified knowledge became the foundation of the entire system — and it would not have been possible without the lawyers themselves building it.
The near-failure was the first calibration run. The initial confidence thresholds were too conservative — routing 60% of contracts to full review rather than the target 30%. The legal team almost lost confidence in the system before it had a chance to prove itself. Recalibrating over two weeks using real examples from their own contract library brought the routing accuracy to an acceptable level.
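The recalibration step can be sketched as a search over candidate thresholds using contracts from the parallel run. This is a simplified illustration, not the method used in the engagement: it assumes a contract goes to full review whenever its lowest clause confidence falls below the threshold, and the sample confidences and candidate values are invented.

```python
def full_review_share(min_confidences, threshold):
    """Fraction of contracts whose lowest clause confidence is below the threshold."""
    routed = sum(1 for c in min_confidences if c < threshold)
    return routed / len(min_confidences)


def calibrate(min_confidences, target_share, candidates):
    """Return the highest candidate threshold that keeps the share of
    contracts sent to full review at or below the target."""
    feasible = [
        t for t in candidates
        if full_review_share(min_confidences, t) <= target_share
    ]
    if not feasible:
        raise ValueError("no candidate threshold meets the target share")
    return max(feasible)


# Invented data: the lowest clause confidence for each of ten contracts.
sample = [0.99, 0.98, 0.97, 0.96, 0.94, 0.93, 0.91, 0.89, 0.84, 0.72]

print(full_review_share(sample, 0.95))  # 0.6, the over-conservative starting point
chosen = calibrate(sample, target_share=0.30, candidates=[0.80, 0.85, 0.90, 0.95])
print(chosen)  # 0.9
```

Choosing the highest threshold that meets the target keeps the system as cautious as the routing budget allows. A threshold tuned only to hit a routing percentage can hide real errors, so any chosen value should still be checked against the lawyers' own decisions on the same contracts.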
The lesson: AI systems for legal work need to be calibrated on the organisation's own contracts, against the organisation's own standards. Generic legal AI models are a starting point, not a solution.
What the team said at six months
The Head of Legal, six months post-deployment: "The honest answer is that we do not think about the AI much any more. It is just part of how contracts move through the team. What changed is that my lawyers are doing legal work again — not document administration."
Key takeaways
- Most legal document review bottlenecks are routing problems, not capacity problems.
- AI should triage and structure — lawyers should judge and decide.
- The risk rubric (what counts as standard vs. risky) must be built with the lawyers, not for them.
- Calibrate confidence thresholds on your own contracts, not generic benchmarks.
- Audit coverage is a governance benefit, not just a compliance checkbox.
- Run a parallel period before relying on the system: it surfaces calibration problems early and helps prevent the trust collapse that kills early AI deployments.