A time-consuming process
That manual reporting process, however, took far too long, hampering King County’s ability to shape its response. It took county employees about 10 to 12 hours to print and scan the documents associated with 200 fatal drug overdoses and another 4,000 hours to extract the data from those documents and fill in the reporting forms, each with dozens of fields to populate.
With funding assistance from the CDC and the US Department of Health and Human Services, the medical examiner’s office worked with the King County Department of Information Technology to develop a suite of tools that use natural language processing (NLP) and machine learning (ML) to automate the data extraction and form completion needed to report drug deaths.
“This project addresses the bottleneck issues in these programs, what’s preventing them from being more efficient,” Martin says.
Step by step
The new process involves three steps. Multipage incident and toxicology reports filed after a fatal drug overdose are first scanned so that information can be extracted using optical character recognition.
During the second phase, NLP and ML models created and trained by the King County Department of IT extract the pertinent information from these digitized reports. The models combine classic ML and deep learning techniques to predict category labels from the narrative text in the reports.
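To make the idea of predicting category labels from narrative text more concrete, here is a minimal sketch of one classic ML approach, a multinomial naive Bayes classifier written in plain Python. It is not King County's model: the training sentences, the "injection" and "ingestion" labels, and every name in the code are hypothetical and invented purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, invented snippets standing in for report narratives.
TRAIN_TEXTS = [
    "syringe and tourniquet found next to the decedent",
    "needle cap and cooker found on the bathroom floor",
    "empty pill bottle found on the nightstand",
    "several loose tablets and a prescription bottle on the table",
]
# Hypothetical category labels, one per snippet above.
TRAIN_LABELS = ["injection", "injection", "ingestion", "ingestion"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior for this label.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            # Add the smoothed log likelihood of each token.
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(clf.predict("a used syringe was found on the floor"))
```

On this toy data the sample narrative is labeled "injection," because words such as "syringe" and "floor" appear only in the injection examples. A production system would train on far more text and, as the county did, would likely pair a model like this with deep learning.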
King County’s NLP model was based on BERT, a transformer-based language model. The IT department also used Hugging Face, an online AI platform, and PyTorch, a Python framework for building deep learning models. Azure Databricks is also employed for data analytics as part of the solution.