Researcher profile

Huiyuan Wang

Huiyuan Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
4 topics
4 close collaborators

Published work

14 published item(s)

preprint | 2026 | arXiv

Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims

Evidence derived from large-scale real-world data (RWD) is increasingly informing regulatory evaluation and healthcare decision-making. Administrative claims provide population-scale, longitudinal records of healthcare utilization, expenditure, and detailed coding of diagnoses, procedures, and medications, yet their potential as a substrate for healthcare foundation models remains largely unexplored. Here we present ReClaim, a generative transformer trained from scratch on 43.8 billion medical events from more than 200 million enrollees in the MarketScan claims data spanning 2008-2022. ReClaim models longitudinal trajectories across diagnoses, procedures, medications, and expenditure, and was scaled to 140 million, 700 million, and 1.7 billion parameters. Across over 1,000 disease-onset prediction tasks, ReClaim achieved a mean AUC of 75.6%, substantially outperforming disease-specific LightGBM (66.3%) and the transformer-based Delphi model (69.4%), with the largest gains for rare diseases. These advantages held across retrospective and prospective evaluations and in external validation on two independent datasets. Performance improved monotonically with scale, and post-training added 13.8 percentage points over pre-training alone. Beyond disease prediction, ReClaim captured financial outcomes and improved real-world evidence (RWE) analyses: for healthcare expenditure forecasting it increased explained variance from 0.28 to 0.37 relative to LightGBM, and in a target trial emulation it reduced systematic bias by 72% on average relative to Delphi. Together, these results establish administrative claims as a scalable substrate for healthcare foundation models and show that learned representations generalize across time periods and data sources, supporting disease surveillance, expenditure forecasting, and RWE generation.

preprint | 2026 | arXiv

Radio AGN feedback sustains quiescence only in a minority of massive galaxies

Radio active galactic nuclei (AGNs) eject a huge amount of energy into the surrounding medium and are thought to potentially prevent gas cooling and maintain the quiescence of massive galaxies. The short-lived, sporadic, and anisotropic nature of radio activity, coupled with the detection of abundant cold gas around some massive quiescent galaxies, raises questions about the efficiency of radio feedback in massive galaxies. Here we present an innovative method rooted in artificial intelligence to separate galaxies in which radio feedback is effective (RFE), regardless of current radio emission, from those in which radio feedback is ineffective (RFI), according to their optical images. Galaxies categorized as RFE are all dynamically hot, whereas quiescent RFI (RFI-Q) galaxies usually have extended cold-disk components. At given stellar mass, dark matter halos hosting RFE galaxies are four to ten times more massive than those of RFI-Q galaxies. We find, for the first time, that almost all RFE galaxies have scant cold gas, irrespective of AGN activity. In contrast, many RFI-Q galaxies are surrounded by substantial amounts of condensed atomic gas, indicating a different evolutionary path from RFE galaxies. Our finding provides direct and compelling evidence that a radio AGN has gone through about 300 on-off cycles and that radio feedback can prevent gas cooling over a timescale much longer than that of radio activity. Contrary to general belief, our analysis shows that only a small fraction of massive galaxies are influenced by strong radio AGNs, suggesting that current galaxy formation models need serious revision.

preprint | 2022 | arXiv

An Extended Halo-based Group/Cluster finder: application to the DESI legacy imaging surveys DR8

We extend the halo-based group finder developed by Yang et al. (2005) to use data simultaneously with either photometric or spectroscopic redshifts. A mock galaxy redshift survey constructed from a high-resolution N-body simulation is used to evaluate the performance of this extended group finder. For galaxies with magnitude ${\rm z} \le 21$ and redshift $0 < z \le 1.0$ in the DESI legacy imaging surveys (the Legacy Surveys), our group finder successfully identifies more than 60% of the members in about 90% of halos with mass $\gtrsim 10^{12.5}\,h^{-1}{\rm M_\odot}$. Detected groups with mass $\gtrsim 10^{12.0}\,h^{-1}{\rm M_\odot}$ have a purity (the fraction of true groups) greater than 90%. The halo mass assigned to each group has an uncertainty of about 0.2 dex at the high mass end $\gtrsim 10^{13.5}\,h^{-1}{\rm M_\odot}$ and 0.40 dex at the low mass end. Groups with more than 10 members have a redshift accuracy of $\sim 0.008$. We apply this group finder to the Legacy Surveys DR8 and find 5.2 million groups with at least 3 members. About 387,000 of these groups have at least 10 members. The resulting catalog, containing 3D coordinates, richness, halo masses, and total group luminosities, is made publicly available.

preprint | 2022 | arXiv

ELUCID VII: Using Constrained Hydro Simulations to Explore the Gas Component of the Cosmic Web

Using reconstructed initial conditions in the SDSS survey volume, we carry out constrained hydrodynamic simulations in three regions representing different types of the cosmic web: the Coma cluster of galaxies; the SDSS great wall; and a large low-density region at $z\sim 0.05$. These simulations, which include star formation and stellar feedback but no AGN formation and feedback, are used to investigate the properties and evolution of intergalactic and intra-cluster media. About half of the warm-hot intergalactic gas is associated with filaments in the local cosmic web. Gas in the outskirts of massive filaments and halos can be heated significantly by accretion shocks generated by mergers of filaments and halos, respectively, and there is a tight correlation between gas temperature and the strength of the local tidal field. The simulations also predict some discontinuities associated with shock fronts and contact edges, which can be tested using observations of the thermal SZ effect and X-rays. A large fraction of the sky is covered by Ly$α$ and OVI absorption systems, and most of the OVI systems and low-column density HI systems are associated with filaments in the cosmic web. The constrained simulations, which follow the formation and heating history of the observed cosmic web, provide an important avenue to interpret observational data. With full information about the origin and location of the cosmic gas to be observed, such simulations can also be used to develop observational strategies.

preprint | 2022 | arXiv

Elucidating Galaxy Assembly Bias in SDSS

We investigate the level of galaxy assembly bias in the Sloan Digital Sky Survey (SDSS) main galaxy sample using ELUCID, a state-of-the-art constrained simulation that accurately reconstructed the initial density perturbations within the SDSS volume. On top of the ELUCID haloes, we develop an extended HOD model that includes the assembly bias of central and satellite galaxies, parameterized as $\mathcal{Q}_\mathrm{cen}$ and $\mathcal{Q}_\mathrm{sat}$, respectively, to predict a suite of one- and two-point observables. In particular, our fiducial constraint employs the probability distribution of the galaxy number counts measured on $8\,\mathrm{Mpc}\,h^{-1}$ scales $N_8^g$ and the projected cross-correlation functions of quintiles of galaxies selected by $N_8^g$ with our entire galaxy sample. We perform extensive tests of the efficacy of our method by fitting the same observables to mock data using both constrained and non-constrained simulations. We discover that in many cases the level of cosmic variance between the two simulations can produce biased constraints that lead to an erroneous detection of galaxy assembly bias if the non-constrained simulation is used. When applying our method to the SDSS data, the ELUCID reconstruction effectively removes an otherwise strong degeneracy between cosmic variance and galaxy assembly bias in SDSS, enabling us to derive an accurate and stringent constraint on the latter. Our fiducial ELUCID constraint, for galaxies above a stellar mass threshold $M_*{=}10^{10.2}\,h^{-2}\,M_\odot$, is $\mathcal{Q}_\mathrm{cen}{=}{-}0.09\pm{0.05}$ and $\mathcal{Q}_\mathrm{sat}{=}0.09\pm{0.10}$, indicating no evidence for a significant ($>2\sigma$) galaxy assembly bias in the local Universe probed by SDSS. Finally, our method provides a promising path to the robust modelling of the galaxy-halo connection within future surveys like DESI and PFS.

preprint | 2022 | arXiv

Evidence for quasar fast outflows being accelerated at the scale of tens of parsecs

Quasar outflows may play a crucial role in regulating the host galaxy, although the spatial scale of quasar outflows remains a major enigma, with their acceleration mechanism poorly understood. Kinematic information about the outflow is the key to understanding its origin and acceleration mechanism. Here, we report the galactocentric distances of different outflow components for both a sample and an individual quasar. We find that the outflow distance increases with velocity, with a typical value from several parsecs to more than one hundred parsecs, providing direct evidence for an acceleration happening at a scale of the order of 10 parsecs. These outflows carry ~1% of the total quasar energy, while their kinematics are consistent with a dust-driven model with a launching radius comparable to the scale of a dusty torus, indicating that the coupling between dust and quasar radiation may produce powerful feedback that is crucial to galaxy evolution.

preprint | 2022 | arXiv

Massive Star-Forming Galaxies Have Converted Most of Their Halo Gas into Stars

In the local Universe, the efficiency for converting baryonic gas into stars is very low. In dark matter halos where galaxies form and evolve, the average efficiency varies with galaxy stellar mass and has a maximum of about twenty percent for Milky-Way-like galaxies. The low efficiency at higher mass is believed to be produced by some quenching processes, such as the feedback from active galactic nuclei. We perform an analysis of weak lensing and satellite kinematics for SDSS central galaxies. Our results reveal that the efficiency is much higher, more than sixty percent, for a large population of massive star-forming galaxies around $10^{11}M_{\odot}$. This suggests that these galaxies acquired most of the gas in their halos and converted it into stars without being affected significantly by quenching processes. This population of galaxies is not reproduced in current galaxy formation models, indicating that our understanding of galaxy formation is incomplete. The implications of our results on circumgalactic media, star formation quenching and disc galaxy rotation curves are discussed. We also examine systematic uncertainties in halo-mass and stellar-mass measurements that might influence our results.

preprint | 2020 | arXiv

Detection of missing baryons in galaxy groups with kinetic Sunyaev-Zel'dovich effect

We present the detection of the kinetic Sunyaev-Zel'dovich effect (kSZE) signals from groups of galaxies as a function of halo mass down to $\log (M_{500}/{\rm M_\odot}) \sim 12.3$, using the Planck CMB maps and stacking about $40,000$ galaxy systems with known positions, halo masses, and peculiar velocities. The signals from groups of different mass are constrained simultaneously to account for projection effects of nearby halos. The estimated total kSZE flux within halos implies that the gas fraction in halos is about the universal baryon fraction, even in low-mass halos, indicating that the 'missing baryons' are found. Various tests show that our results are robust against systematic effects, such as contamination by infrared/radio sources and background variations, beam-size effects and contributions from halo exteriors. Combined with the thermal Sunyaev-Zel'dovich effect, our results indicate that the 'missing baryons' associated with galaxy groups are contained in warm-hot media with temperatures between $10^5$ and $10^6\,{\rm K}$.

preprint | 2020 | arXiv

Probing Primordial Chirality with Galaxy Spins

Chiral symmetry is maximally violated in weak interactions, and such microscopic asymmetries in the early Universe might leave observable imprints on astrophysical scales without violating the cosmological principle. In this Letter, we propose a helicity measurement to detect primordial chiral violation. We point out that observations of halo-galaxy angular momentum directions (spins), which are frozen in during the galaxy formation process, provide a fossil chiral observable. From the clustering mode of the large-scale structure of the Universe, we construct a spin mode in Lagrangian space and show in simulations that it is a good probe of halo-galaxy spins. In the standard model, a strong symmetric correlation between the left and right helical components of this spin mode and galaxy spins is expected. Measurements of these correlations will be sensitive to chiral breaking, providing a direct test of chiral symmetry breaking in the early Universe.

preprint | 2020 | arXiv

Relating the structure of dark matter halos to their assembly and environment

We use a large $N$-body simulation to study the relation of the structural properties of dark matter halos to their assembly history and environment. The complexity of individual halo assembly histories can be well described by a small number of principal components (PCs), which, compared to formation times, provide a more complete description of halo assembly histories and have a stronger correlation with halo structural properties. Using decision trees built with the random ensemble method, we find that about $60\%$, $10\%$, and $20\%$ of the variances in halo concentration, axis ratio, and spin, respectively, can be explained by combining four dominating predictors: the first PC of the assembly history, halo mass, and two environment parameters. Halo concentration is dominated by halo assembly. The local environment is found to be important for the axis ratio and spin but is degenerate with halo assembly. The small percentages of the variance in the axis ratio and spin that are explained by known assembly and environmental factors suggest that the variance is produced by many nuanced factors and should be modeled as such. The relations between halo intrinsic properties and environment are weak compared to their variances, with the anisotropy of the local tidal field having the strongest correlation with halo properties. Our method of dimension reduction and regression can help simplify the characterization of the halo population and clarify the degeneracy among halo properties.

preprint | 2020 | arXiv

The Breakdown Scale of HI Bias Linearity

The 21 cm intensity mapping experiments promise to obtain the large-scale distribution of HI gas at the post-reionization epoch. In order to reveal the underlying matter density fluctuations from the HI mapping, it is important to understand how HI gas traces the matter density distribution. Both nonlinear halo clustering and nonlinear effects modulating HI gas in halos may determine the scale below which the HI bias deviates from linearity. We employ three approaches to generate the mock HI density from a large-scale N-body simulation at low redshifts, and demonstrate that the assumption of HI linearity is valid at the scale corresponding to the first peak of baryon acoustic oscillations, but breaks down at $k \gtrsim 0.1\,h\, {\rm Mpc}^{-1}$. The nonlinear effects of halo clustering and HI content modulation counteract each other at small scales, and their competition results in a model-dependent "sweet-spot" redshift near $z=1$ where the HI bias is scale-independent down to small scales. We also find that the linear HI bias scales approximately linearly with redshift for $z\le 3$.

preprint | 2020 | arXiv

The Formation History of Subhalos and the Evolution of Satellite Galaxies

Satellites constitute an important fraction of the overall galaxy population and are believed to form in dark matter subhalos. Here we use the cosmological hydrodynamic simulation TNG100 to investigate how the formation histories of subhalos affect the properties and evolution of their host galaxies. We use a scaled formation time ($a_{\rm nf}$) to characterize the mass assembly histories of the subhalos before they are accreted by massive host halos. We find that satellite galaxies in young subhalos (low $a_{\rm nf}$) are less massive and more gas rich, and have stronger star formation and a higher fraction of ex situ stellar mass than satellites in old subhalos (high $a_{\rm nf}$). Furthermore, these low $a_{\rm nf}$ satellites require longer timescales to be quenched as a population than the high $a_{\rm nf}$ counterparts. We find very different merger histories between satellites in fast accretion (FA, $a_{\rm nf}<1.3$) and slow accretion (SA, $a_{\rm nf}>1.3$) subhalos. For FA satellites, the galaxy merger frequency dramatically increases just after accretion, which enhances the star formation at accretion. For SA satellites, by contrast, the mergers occur smoothly and continuously across the accretion time. Moreover, mergers with FA satellites happen mainly after accretion, while a contrary trend is found for SA satellites. Our results provide insight into the evolution and star formation quenching of the satellite population.

preprint | 2019 | arXiv

The Dearth of Difference between Central and Satellite Galaxies III. Environmental Dependence of Mass-Size and Mass-Structure Relations

As demonstrated in Paper I, the quenching properties of central and satellite galaxies are quite similar as long as both stellar mass and halo mass are controlled. Here we extend the analysis to the size and bulge-to-total light ratio (B/T) of galaxies. In general, central galaxies have size-stellar mass and B/T-stellar mass relations different from satellites. However, the differences are eliminated when halo mass is controlled. We also study the dependence of size and B/T on halo-centric distance and find a transitional stellar mass (M$_{*,t}$) at given halo mass (M$_h$), which is about one fifth of the mass of the central galaxies in halos of mass M$_h$. The transitional stellar masses for size, B/T and quenched fraction are similar over the whole halo mass range, suggesting a connection between the quenching of star formation and the structural evolution of galaxies. Our analysis further suggests that the classification based on the transitional stellar mass is more fundamental than the central-satellite dichotomy, and provides a more reliable way to understand the environmental effects on galaxy properties. We compare the observational results with the hydrodynamical simulation EAGLE and the semi-analytic model L-GALAXIES. The EAGLE simulation successfully reproduces the similarities of size for centrals and satellites and even M$_{*,t}$, while L-GALAXIES fails to recover the observational results.