Subtype-Aware Registration of Longitudinal Electronic Health Records
Electronic Health Records (EHRs) contain extensive patient information that can support downstream clinical tasks such as mortality prediction, disease phenotyping, and disease onset prediction. A key challenge in EHR data analysis is the temporal gap between a condition's actual onset and the time it is first recorded. Such timeline misalignment can make biomarker trends appear artificially distinct among patients with similar disease progression, undermining the reliability of downstream analyses and complicating tasks such as disease subtyping and outcome prediction. To address this challenge, we propose a subtype-aware timeline registration method that leverages data projection and discrete optimization to correct timeline misalignment. Through simulation and real-world data analyses, we demonstrate that the proposed method effectively aligns distorted observed records with the true disease progression patterns, yielding clearer subtyping and improved performance in downstream clinical analyses.