Discovery OCR & Party Information Extraction Tool

Capstone Project: Discovery OCR & Party Information Extraction Tool

1. Project Overview

The Santa Barbara County Public Defender’s Office (SBC PDO) is sponsoring a capstone project to build a cloud-based pipeline that converts discovery stored in Box.com into structured, verified party data for eDefender. On retrieval, the system will capture and retain the Bates stamp number(s) that indicate where each party appears in the discovery. The extraction covers all party types (victims, witnesses, involved parties, and law enforcement officers) and includes any available contact information (address, phone, email). Staff will verify extracted data before it is submitted to eDefender. Project Lead: AJ Voisan. Project Sponsor: Deepak Budwani. SMEs: Shawna Mateer (LOP/Discovery Intake) and Angie Stokke (eDefender/CM).
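As a rough illustration of the "structured, verified party data" described above, the following Python sketch models one extracted party together with its Bates references and a staff-verification flag. All names here (PartyType, PartyRecord, verified, ready_for_submission) are assumptions made for illustration. They do not reflect eDefender's actual schema or any decided project design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PartyType(Enum):
    """Party types named in the project overview."""
    VICTIM = "victim"
    WITNESS = "witness"
    INVOLVED_PARTY = "involved party"
    LAW_ENFORCEMENT_OFFICER = "law enforcement officer"


@dataclass
class PartyRecord:
    """One extracted party (hypothetical structure, not the eDefender schema)."""
    name: str
    party_type: PartyType
    # Bates stamp number(s) of the pages where this party appears.
    bates_numbers: List[str] = field(default_factory=list)
    # Contact information is optional because it may not appear in discovery.
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    # Set to True only after a staff member has reviewed the record.
    verified: bool = False


def ready_for_submission(records: List[PartyRecord]) -> List[PartyRecord]:
    """Return only the records that staff have verified."""
    return [r for r in records if r.verified]
```

Keeping the verification flag on the record itself is one simple way to ensure that unreviewed extractions are never passed downstream; a real implementation might instead track review status in a separate audit log.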

2. Current Process (Initial & Supplemental Discovery)

• Channels: e-Disclosure portal, Box.com handoff from DA, email attachments, and physical media.
• Intake: Discovery receipt emails are categorized; files are saved to case folders in Box.com using naming conventions; police reports trigger party review and entry in eDefender.
• Manual effort: Staff read reports, identify parties (including officers), note references, and hand-enter contact details; the conflict check is initiated after entries are saved.
• Pain points: Non-searchable PDFs, retyping data, risk of missed names, and multi-hour reviews for large packets.

3. Project Rationale

Automating extraction and review reduces manual data entry, improves accuracy, supports reliable conflict checks, and delivers faster access to contactable parties for attorneys and investigators. Capturing Bates stamps enables precise source tracing.

4. Project Components

Component: Document Processing & OCR

Scope: Retrieve discovery from Box.com (case folders) via API or scheduled jobs. Normalize file types (PDF, images) and run OCR to generate machine-readable text. Capture document metadata (filename, PD#, Disc#, received date, source agency) and page-to-Bates mapping when available.

Inputs: Discovery PDFs/images, Box folder paths/IDs, Disc numbers, Bates-stamped pages.

Outputs: Searchable text per document/page, metadata record, Bates index.
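The outputs listed above (per-page text, a metadata record, and a Bates index) could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes OCR text is already available per page, that a Bates stamp looks like a short uppercase prefix followed by at least four digits (for example "ABC000123"), and that the stamp is the last such token on a page. The names DocumentRecord, BATES_PATTERN, find_bates, and build_bates_index are hypothetical, and the real Bates format used in SBC PDO discovery may differ.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed Bates format: 2-10 uppercase letters, an optional separator,
# then 4 or more digits. Adjust to match the stamps actually used.
BATES_PATTERN = re.compile(r"\b([A-Z]{2,10}[-_ ]?\d{4,})\b")


@dataclass
class DocumentRecord:
    """Metadata and OCR output for one discovery document (hypothetical)."""
    filename: str
    pd_number: str
    disc_number: str
    received_date: str
    source_agency: str
    # Page number -> OCR text for that page.
    page_text: Dict[int, str] = field(default_factory=dict)
    # Page number -> Bates number, or None when no stamp was found.
    bates_index: Dict[int, Optional[str]] = field(default_factory=dict)


def find_bates(text: str) -> Optional[str]:
    """Return the last Bates-like token in the page text, or None."""
    matches = BATES_PATTERN.findall(text)
    return matches[-1] if matches else None


def build_bates_index(pages: Dict[int, str]) -> Dict[int, Optional[str]]:
    """Map each page number to the Bates number found on that page."""
    return {page: find_bates(text) for page, text in pages.items()}
```

Pages that map to None correspond to the "when available" case in the scope: the page text is still searchable, but there is no Bates reference to attach to parties found on it.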



Role                                Name               E-mail
Faculty Advisor                     Jungsoo Lim        jlim34@calstatela.edu
Project Lead                        Jennifer Lias      jlias2@calstatela.edu
Customer Liaison/Requirements Lead  Nadia Hernandez    nherna170@calstatela.edu
Architecture/Design Lead            Joseph Lam         jlam87@calstatela.edu
UI Lead                             Lemeng Zhao        lzhao25@calstatela.edu
Backend Lead                        Addison Zhou       azhou19@calstatela.edu
QA/QC Lead                          Jesus Villa        jvilla24@calstatela.edu
Documentation Lead                  Daniel Concepcion  dconcep@calstatela.edu
Demo Lead                           Thomas Ogden       togden3@calstatela.edu
Presentation Lead                   Tommy Works        tworks@calstatela.edu
Support Lead                        Jose Holguin       jholgu21@calstatela.edu
Co-Lead                             Peter Uy           puy@calstatela.edu




Team                           Members
Staff Review UI                Thomas, Jen, Lemeng
Data Structure and Validation  Peter, Joseph
Document OCR and Extraction    Nadia, Daniel, Tommy
Logging and Audit              Jesus, Addison, Jose



Meeting                       Day     Time
Weekly advisor group meeting  Friday  8:00 AM - 9:00 AM
Bi-weekly liaison meeting     Friday  9:00 AM - 10:00 AM
Weekly team meeting           Friday  10:00 AM - 11:00 AM

Student Team
  • Daniel Concepcion
  • Nadia Hernandez
  • Jose Holguin
  • Joseph Lam
  • Jennifer Lias
  • Thomas Ogden Jr
  • Peter Uy
  • Jesus Villa
  • Tommy Works
  • Lemeng Zhao
  • Addison Zhou