Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

First Advisor

Hong Yu

Second Advisor

Madalina Fiterau Brostean

Third Advisor

Razieh Negin Rahimi

Fourth Advisor

Feifan Liu

Subject Categories

Artificial Intelligence and Robotics | Biomedical Informatics | Data Science | Public Health Education and Promotion

Abstract

Clinical decision support systems (CDSS) provide clinicians, nursing staff, and other healthcare professionals with intelligently filtered knowledge as well as patient-specific and population-level information. CDSS can significantly improve the quality, safety, efficiency, and effectiveness of health care. Over the last decade, American hospitals have widely adopted electronic health records (EHRs), resulting in a massive collection of clinical notes such as admission notes, physician notes, nursing notes, and discharge summaries. For the past couple of decades, most CDSS work has focused on knowledge-based systems built on structured data such as medications and ICD codes. EHR notes, by contrast, contain rich and important information, including adverse drug events, suicidal behaviors, and social determinants of health, all of which are substantially under-represented in structured data. This presents a unique opportunity for natural language processing (NLP), which can process volumes of EHR notes far beyond human capability, to provide new clinical evidence that existing CDSS have missed.

We contribute to the NLP and clinical communities by developing a robust multi-task learning framework for CDSS. First, we identified causal relationships between medications and their adverse drug reactions using the Naranjo Scale, a standardized clinical assessment instrument. Our multi-task learning framework takes a question from the Naranjo Scale along with a patient's note, identifies relevant evidence sentences and paragraphs in the note, and predicts the final answer to the question. Second, we extracted suicide attempt (SA) and suicide ideation (SI) events from patients' clinical notes. We created ScAN (Suicide Attempt and Ideation Events), the first publicly available dataset of its kind.
We then built ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task learning model that extracts relevant suicidal behavior evidence from clinical notes. Next, we applied multiple parameter-efficient transfer learning techniques to fine-tune ScANER on EHR datasets from different hospitals. By fine-tuning less than 2% of its parameters on a small annotated dataset, ScANER maintains similar performance. To provide evidence for population-level CDSS for suicide prevention, we identified risk factors for suicidal behavior in a large EHR dataset of about 7 million patients. We found that patients with traumatic brain injury and/or post-traumatic stress disorder are more than twice as likely to exhibit suicidal behavior as the control population. We also studied the prevalence of other risk factors, such as social determinants of health, extracted from EHR notes using various NLP approaches.
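The risk-factor finding compares the rate of suicidal behavior in an exposed group (patients with traumatic brain injury and/or post-traumatic stress disorder) with the rate in a control population. The abstract does not state which statistic underlies "more than twice as likely"; a common choice is the relative risk (RR) or odds ratio (OR) from a 2x2 contingency table. The minimal Python sketch below computes both, plus an approximate 95% confidence interval for the RR using the Katz log method. The counts, the `TwoByTwo` class, and the function names are hypothetical illustrations, not the dissertation's data or method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 contingency table of patient counts."""
    exposed_cases: int      # exposed (e.g., TBI and/or PTSD) with suicidal behavior
    exposed_noncases: int   # exposed without suicidal behavior
    control_cases: int      # control population with suicidal behavior
    control_noncases: int   # control population without suicidal behavior


def relative_risk(t: TwoByTwo) -> float:
    """Outcome rate in the exposed group divided by the rate in the control group."""
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_noncases)
    risk_control = t.control_cases / (t.control_cases + t.control_noncases)
    return risk_exposed / risk_control


def odds_ratio(t: TwoByTwo) -> float:
    """Cross-product ratio (a*d)/(b*c); close to the RR when the outcome is rare."""
    return (t.exposed_cases * t.control_noncases) / (t.exposed_noncases * t.control_cases)


def relative_risk_ci(t: TwoByTwo, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for the RR on the log scale (Katz method)."""
    a, b = t.exposed_cases, t.exposed_noncases
    c, d = t.control_cases, t.control_noncases
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_rr = math.log(relative_risk(t))
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)


if __name__ == "__main__":
    # Entirely made-up counts: 10,000 exposed patients and 100,000 controls.
    table = TwoByTwo(exposed_cases=300, exposed_noncases=9_700,
                     control_cases=1_200, control_noncases=98_800)
    rr = relative_risk(table)
    lo, hi = relative_risk_ci(table)
    print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f}), OR = {odds_ratio(table):.2f}")
```

With these invented counts the exposed group's rate is 3% versus 1.2% for controls, an RR of 2.5 whose lower confidence bound also exceeds 2. A real EHR analysis would typically also adjust for confounders such as age and sex, for example with logistic regression, rather than rely on crude ratios alone.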


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Sunday, May 26, 2024