YSPH Biostatistics Seminar: “Advancing Responsible Statistical and AI/ML Methods for Harnessing the Power of Electronic Health Records”
Note: BIS 526 students are required to attend in person. Others are invited to attend in person but may also attend via Zoom.
Speaker: Qi Long, Ph.D.
Title: “Advancing Responsible Statistical and AI/ML Methods for Harnessing the Power of Electronic Health Records”
Abstract
Although rich electronic health records (EHR) data offer remarkable opportunities for advancing precision medicine (Orcutt et al., 2025, Nature Medicine), they also present daunting analytical challenges. Multi-modal EHR data, recorded at irregular time intervals and with varying frequencies, include structured data such as labs and vitals, codified data such as diagnosis and procedure codes, and unstructured data such as clinical notes and pathology reports. These data are typically incomplete and fraught with other errors and biases. What’s more, data gaps and errors in EHRs are often unequally distributed across patient groups: people with less access to care, often those with lower socioeconomic status, tend to have more incomplete data in EHRs. Such data issues, if not adequately addressed, can lead to biased results and exacerbate health disparities (Getzen et al. 2023, JBI). In this talk, I will share my research group’s recent work on developing responsible statistical and AI/ML methods, including large language models (LLMs) and agentic AI, for addressing these challenges. Since LLMs are themselves plagued by various biases, I will also discuss our ongoing research on developing rigorous statistical and ML approaches for mitigating the pitfalls and risks of LLMs (e.g., Xiao et al. 2025a, JASA and 2025b, ICML; Li et al. 2025a, AoS and 2025b, JRSSB).
Host Organization
- Biostatistics