Researcher profile

Yuanyun Zhang

Yuanyun Zhang contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
2 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models

Recent healthcare foundation models have achieved strong predictive performance through large-scale self-supervised learning, yet their latent representations frequently entangle physiologic severity, intervention intensity, observational structure, and institutional workflow into shared embedding directions. While effective for downstream prediction, such representations remain semantically opaque and unstable under contextual shift. We introduce AURORA (Adaptive Uncertainty-aware Representations through Orthogonalized Relational Alignment), a new framework for healthcare representation learning based on contextual latent geometry. Rather than optimizing a single unified embedding manifold, AURORA decomposes representations into orthogonal semantic subspaces corresponding to distinct contextual factors and learns relational consistency objectives within each subspace. This induces latent spaces that are both semantically disentangled and geometrically interpretable. Across multiple clinical prediction and retrieval tasks, AURORA consistently outperforms reconstruction, contrastive, and self-distillation baselines while substantially improving contextual disentanglement, neighborhood purity, and robustness under institutional distribution shift. Our results suggest that latent geometry itself constitutes an important axis of healthcare foundation model design and that explicitly structuring representation space according to contextual semantics provides a complementary direction beyond conventional predictive compression objectives.
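
The idea of keeping contextual factors in orthogonal subspaces can be illustrated with a generic orthogonality penalty. The sketch below is not AURORA's published objective, which the abstract does not specify. It assumes each factor has its own set of basis vectors and sums the squared inner products between bases of different factors, so the penalty is zero exactly when the subspaces are mutually orthogonal. The factor names `severity` and `workflow` and the function `orthogonality_penalty` are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(subspaces):
    """Sum of squared inner products between basis vectors of different subspaces.

    subspaces: dict mapping a (hypothetical) factor name to a list of basis vectors.
    Returns 0.0 when every pair of subspaces is mutually orthogonal.
    """
    total = 0.0
    for (_, basis_a), (_, basis_b) in combinations(subspaces.items(), 2):
        for u in basis_a:
            for v in basis_b:
                total += dot(u, v) ** 2
    return total


# Illustrative toy bases in a 3-dimensional embedding space (made-up values).
orthogonal = {"severity": [[1.0, 0.0, 0.0]], "workflow": [[0.0, 1.0, 0.0]]}
overlapping = {"severity": [[1.0, 0.0, 0.0]], "workflow": [[0.6, 0.8, 0.0]]}

penalty_orthogonal = orthogonality_penalty(orthogonal)
penalty_overlapping = orthogonality_penalty(overlapping)
```

In a training loop, a term like this would typically be added to the main loss with a small weight, so that factor-specific directions are pushed apart without dominating the predictive objective.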

Preprint, 2026, arXiv

Event Fields: Learning Latent Event Structure for Waveform Foundation Models

We propose a new class of waveform foundation models that departs from conventional sequence-based representations by modeling physiological time series as realizations of latent event processes. Rather than treating signals as collections of local tokens or patches, our approach assumes that clinically meaningful structure arises from temporally extended, interacting events whose boundaries and dynamics are not directly observed. To capture this structure, we introduce a self-supervised learning framework that enforces consistency across stochastic segmentations and time-frequency projections of the same waveform, encouraging representations that are invariant to signal-level perturbations while preserving event-level organization. The resulting model combines a segmentation-aware encoder with a latent interaction operator that captures dependencies among inferred events, and naturally extends to multimodal settings by aligning modalities through shared event representations. Across a range of physiological benchmarks, including arrhythmia classification, hemodynamic prediction, and waveform retrieval, the proposed method improves performance, robustness, and label efficiency relative to strong sequence-based baselines. These results suggest that shifting from signal-centric to event-centric representations provides a more appropriate inductive bias for modeling physiological dynamics and offers a complementary path to scaling foundation models in healthcare.

Preprint, 2026, arXiv

Learning Longitudinal Health Representations from EHR and Wearable Data

Foundation models trained on electronic health records show strong performance on many clinical prediction tasks but are limited by sparse and irregular documentation. Wearable devices provide dense, continuous physiological signals but lack semantic grounding. Existing methods usually model these data sources separately or combine them through late fusion. We propose a multimodal foundation model that jointly represents electronic health records and wearable data as a continuous-time latent process. The model uses modality-specific encoders and a shared temporal backbone pretrained with self-supervised and cross-modal objectives. This design produces representations that are temporally coherent and clinically grounded. Across forecasting physiological and risk modeling tasks, the model outperforms strong electronic-health-record-only and wearable-only baselines, especially at long horizons and under missing data. These results show that joint electronic health record and wearable pretraining yields more faithful representations of longitudinal health.

Preprint, 2026, arXiv

WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records

Representation learning in electronic health records (EHR) has largely followed paradigms inherited from natural language processing, relying on sequence modeling and reconstruction-based objectives that treat clinical labels as ground truth. However, real-world clinical supervision is inherently weak, arising from heterogeneous, noisy, and institution-specific labeling processes such as billing codes, heuristic phenotypes, and incomplete annotations. In this work, we propose WISTERIA, a weakly supervised representation learning framework that models labels as stochastic observations of an underlying latent clinical state. Instead of optimizing against a single supervision signal, WISTERIA constructs multiple weak supervision operators and learns representations by enforcing consistency across their induced label distributions. This multi-view formulation induces an implicit denoising mechanism, allowing the model to recover clinically meaningful structure by reconciling disagreement between noisy labelers. We further incorporate ontology-aware regularization in the label space to impose semantic structure over supervision signals. Empirically, WISTERIA improves predictive performance across standard EHR benchmarks, demonstrates strong robustness to label noise, and exhibits superior cross-institutional generalization compared to sequence-based pretraining objectives. These results suggest that explicitly modeling the supervision process rather than treating labels as fixed targets provides a more appropriate inductive bias for learning robust and clinically meaningful representations from EHR data.
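
Consistency across the label distributions induced by several weak labelers can be measured in many ways, and the abstract does not say which one WISTERIA uses. As one plausible illustration only, the sketch below computes a generalized Jensen-Shannon divergence: the average KL divergence from each labeler's distribution to their mean. It is zero when all labelers agree and grows as they disagree. The function names and the toy distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(views):
    """Generalized Jensen-Shannon divergence across label distributions.

    views: list of label distributions (each sums to 1), one per weak labeler,
    all describing the same record. The mean distribution is strictly positive
    wherever any view is, so the KL terms are always well defined.
    """
    k = len(views)
    n_classes = len(views[0])
    mean = [sum(v[c] for v in views) / k for c in range(n_classes)]
    return sum(kl_divergence(v, mean) for v in views) / k


# Illustrative toy distributions over two classes (made-up values).
agreeing = [[0.9, 0.1], [0.9, 0.1]]
conflicting = [[1.0, 0.0], [0.0, 1.0]]

loss_agreeing = consistency_loss(agreeing)
loss_conflicting = consistency_loss(conflicting)
```

For two labelers that disagree completely on a binary label, this quantity reaches its maximum of log 2, which gives a convenient scale when weighting it against other loss terms.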