Discovery OCR & Party Information Extraction Tool
Capstone Project: Discovery OCR & Party Information Extraction Tool
1. Project Overview
The Santa Barbara County Public Defender’s Office (SBC PDO) is sponsoring a capstone project to build a cloud-based pipeline that converts discovery stored in Box.com into structured, verified party data for eDefender. On retrieval, the system will capture and retain the Bates stamp number(s) that indicate where each party appears in the discovery. The extraction covers all party types (victims, witnesses, involved parties, and law enforcement officers) and includes any available contact information (address, phone, email). Staff will verify extracted data before it is submitted to eDefender.
Project Lead: AJ Voisan. Project Sponsor: Deepak Budwani. SMEs: Shawna Mateer (LOP/Discovery Intake) and Angie Stokke (eDefender/CM).
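The party data described above could be modeled roughly as follows. This is a minimal sketch, not the eDefender schema: the class names, field names, and the `ready_for_submission` helper are all hypothetical, chosen only to mirror the party types, contact fields, Bates references, and staff verification step named in this overview.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PartyType(Enum):
    """Party types named in the project overview."""
    VICTIM = "victim"
    WITNESS = "witness"
    INVOLVED_PARTY = "involved_party"
    LAW_ENFORCEMENT_OFFICER = "law_enforcement_officer"


@dataclass
class PartyRecord:
    """One extracted party (hypothetical shape, not the eDefender schema)."""
    name: str
    party_type: PartyType
    # Bates stamp number(s) showing where the party appears in discovery.
    bates_numbers: List[str] = field(default_factory=list)
    # Contact details are optional because discovery may not contain them.
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    # Set to True only after a staff member has reviewed the record.
    verified: bool = False

    def ready_for_submission(self) -> bool:
        """A record is submitted to eDefender only after staff verification."""
        return self.verified


record = PartyRecord(
    name="Jane Doe",
    party_type=PartyType.WITNESS,
    bates_numbers=["SBPD000123"],  # illustrative value only
)
```

Keeping `verified` as an explicit field makes the staff review step a hard gate in the data itself rather than a convention in the UI.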
2. Current Process (Initial & Supplemental Discovery)
• Channels: e-Disclosure portal, Box.com handoff from DA, email attachments, and physical media.
• Intake: Discovery receipt emails are categorized; files are saved to case
folders in Box.com using naming conventions; police reports trigger party
review and entry in eDefender.
• Manual effort: Staff read reports, identify parties (including officers), note references, and hand-enter contact details; a conflict check is initiated after entries are saved.
• Pain points: Non-searchable PDFs, retyping data, risk of missed names, and
multi-hour reviews for large packets.
3. Project Rationale
Automating extraction and review reduces manual data entry, improves accuracy, supports reliable conflict checks, and delivers faster access to contactable parties for attorneys and investigators. Capturing Bates stamps enables precise source tracing.
4. Project Components
Component: Document Processing & OCR
Scope: Retrieve discovery from Box.com (case folders) via API or scheduled jobs. Normalize file types (PDF, images) and run OCR to generate machine-readable text. Capture document metadata (filename, PD#, Disc#, received date, source agency) and page-to-Bates mapping when available.
Inputs: Discovery PDFs/images, Box folder paths/IDs, Disc numbers, Bates-stamped pages.
Outputs: Searchable text per document/page, metadata record, Bates index.
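The page-to-Bates mapping in this component could be built from per-page OCR text along these lines. This is a sketch under stated assumptions: the stamp format (2 to 8 uppercase letters, an optional hyphen or underscore, then 4 to 8 digits) is a guess, since the actual Bates format used in SBC PDO discovery is not specified here, and the function names are hypothetical. It also assumes the stamp is the last such token on the page, which fits stamps placed in a footer but would need checking against real documents.

```python
import re
from typing import Dict, List, Optional

# ASSUMED stamp shape, e.g. "SBPD-000123" or "SBPD000123".
# The real format must be confirmed with the discovery intake SMEs.
BATES_PATTERN = re.compile(r"\b([A-Z]{2,8})[-_]?(\d{4,8})\b")


def find_bates_stamp(page_text: str) -> Optional[str]:
    """Return the last Bates-like token in the page text, or None."""
    matches = BATES_PATTERN.findall(page_text)
    if not matches:
        return None
    prefix, number = matches[-1]  # assume the footer stamp comes last
    return f"{prefix}{number}"    # normalize away the separator


def build_bates_index(pages: List[str]) -> Dict[int, Optional[str]]:
    """Map 1-based page numbers to the Bates stamp found on each page."""
    return {
        page_no: find_bates_stamp(text)
        for page_no, text in enumerate(pages, start=1)
    }


ocr_pages = [
    "Officer Smith responded.\nSBPD-000123",
    "No stamp here",
    "Narrative continues\nSBPD000125",
]
bates_index = build_bates_index(ocr_pages)
```

Pages without a recognizable stamp map to `None` rather than being dropped, so the review UI can flag them for manual entry, consistent with capturing the mapping only "when available".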

| Role | Name | Email |
|---|---|---|
| Faculty Advisor | Jungsoo Lim | jlim34@calstatela.edu |
| Project Lead | Jennifer Lias | jlias2@calstatela.edu |
| Customer liaison/requirements lead | Nadia Hernandez | nherna170@calstatela.edu |
| Architecture/design lead | Joseph Lam | jlam87@calstatela.edu |
| UI Lead | Lemeng Zhao | lzhao25@calstatela.edu |
| Backend Lead | Addison Zhou | azhou19@calstatela.edu |
| QA/QC lead | Jesus Villa | jvilla24@calstatela.edu |
| Documentation Lead | Daniel Concepcion | dconcep@calstatela.edu |
| Demo Lead | Thomas Ogden | togden3@calstatela.edu |
| Presentation Lead | Tommy Works | tworks@calstatela.edu |
| Support Lead | Jose Holguin | jholgu21@calstatela.edu |
| Co-Lead | Peter Uy | puy@calstatela.edu |

| Team | Members |
|---|---|
| Staff Review UI | Thomas, Jen, Lemeng |
| Data Structure and Validation | Peter, Joseph |
| Document OCR and Extraction | Nadia, Daniel, Tommy |
| Logging and Audit | Jesus, Addison, Jose |

| Meeting | Day | Time |
|---|---|---|
| Weekly advisor group meeting | Friday | 8:00 AM - 9:00 AM |
| Bi-weekly liaison meeting | Friday | 9:00 AM - 10:00 AM |
| Weekly team meeting | Friday | 10:00 AM - 11:00 AM |
- Daniel Concepcion
- Nadia Hernandez
- Jose Holguin
- Joseph Lam
- Jennifer Lias
- Thomas Ogden Jr
- Peter Uy
- Jesus Villa
- Tommy Works
- Lemeng Zhao
- Addison Zhou