Researcher profile

Xiaohu Yang

Xiaohu Yang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
31 works
0 followers
6 topics
4 close collaborators

Published work

31 published item(s)

preprint · 2026 · arXiv

DepRadar: Agentic Coordination for Context Aware Defect Impact Analysis in Deep Learning Libraries

Deep learning (DL) libraries such as Transformers and Megatron are now widely adopted in modern AI programs. However, when these libraries introduce defects, ranging from silent computation errors to subtle performance regressions, it is often challenging for downstream users to assess whether their own programs are affected. Such impact analysis requires not only understanding the defect semantics but also checking whether the client code satisfies complex triggering conditions involving configuration flags, runtime environments, and indirect API usage. We present DepRadar, an agent coordination framework for fine-grained defect and impact analysis in DL library updates. DepRadar coordinates four specialized agents across three steps: (1) the PR Miner and Code Diff Analyzer extract structured defect semantics from commits or pull requests; (2) the Orchestrator Agent synthesizes these signals into a unified defect pattern with trigger conditions; and (3) the Impact Analyzer checks downstream programs to determine whether the defect can be triggered. To improve accuracy and explainability, DepRadar integrates static analysis with DL-specific domain rules for defect reasoning and client-side tracing. We evaluate DepRadar on 157 PRs and 70 commits across two representative DL libraries. It achieves 90% precision in defect identification and generates high-quality structured fields (average field score 1.6). On 122 client programs, DepRadar identifies affected cases with 90% recall and 80% precision, substantially outperforming the baselines.
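The impact-analysis step amounts to checking a client program against a defect pattern's trigger conditions. The sketch below is a hypothetical, heavily simplified version: it assumes a pattern is a set of defective APIs plus required configuration and environment values, and that a client has already been summarized (for example by static analysis) into the APIs it reaches and its settings. All class, field, and API names are illustrative, not DepRadar's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DefectPattern:
    """Hypothetical structured defect pattern (illustrative fields only)."""
    apis: set[str]                                          # APIs whose use can expose the defect
    config: dict[str, object] = field(default_factory=dict)  # flag -> triggering value
    env: dict[str, str] = field(default_factory=dict)        # environment key -> required value


@dataclass
class ClientProgram:
    """Hypothetical summary of a downstream program, e.g. produced by static analysis."""
    called_apis: set[str]
    config: dict[str, object]
    env: dict[str, str]


def is_affected(pattern: DefectPattern, client: ClientProgram) -> bool:
    """Return True only if every trigger condition of `pattern` holds for `client`."""
    # 1. The client must reach at least one defective API (directly or indirectly).
    if not pattern.apis & client.called_apis:
        return False
    # 2. Every triggering configuration flag must be set to its triggering value.
    if any(client.config.get(k) != v for k, v in pattern.config.items()):
        return False
    # 3. Every runtime-environment condition must hold.
    return all(client.env.get(k) == v for k, v in pattern.env.items())
```

Treating the conditions as a conjunction keeps the check explainable: whichever condition fails first is the reason a client is reported as unaffected.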

preprint · 2026 · arXiv

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance

Fine-tuning safety-aligned large language models (LLMs) can substantially compromise their safety. Previous approaches require many safety samples or calibration sets, which not only incur significant computational overhead during realignment but also noticeably degrade model utility. Contrary to the belief that such amounts of data are necessary, we show that safety alignment can be fully recovered with only a single safety example, without sacrificing utility and at minimal cost. Remarkably, this recovery is effective regardless of the number of harmful examples used in fine-tuning or the size of the underlying model, and it converges within just a few epochs. Furthermore, we uncover the low-rank structure of the safety gradient, which explains why such efficient correction is possible. We validate our findings across five safety-aligned LLMs and multiple datasets, demonstrating the generality of our approach.

preprint · 2026 · arXiv

SolContractEval: A Benchmark for Evaluating Contract-Level Solidity Code Generation

The rise of blockchain has brought smart contracts into mainstream use, creating a demand for smart contract generation tools. While large language models (LLMs) excel at generating code in general-purpose languages, their effectiveness on Solidity, the primary language for smart contracts, remains underexplored. Solidity constitutes only a small portion of typical LLM training data and differs from general-purpose languages in its version-sensitive syntax and limited flexibility. These factors raise concerns about the reliability of existing LLMs for Solidity code generation. Critically, existing evaluations, focused on isolated functions and synthetic inputs, fall short of assessing models' capabilities in real-world contract development. To bridge this gap, we introduce SolContractEval, the first contract-level benchmark for Solidity code generation. It comprises 124 tasks drawn from real on-chain contracts across nine major domains. Each task input, consisting of complete context dependencies, a structured contract framework, and a concise task prompt, is independently annotated and cross-validated by experienced developers. To enable precise and automated evaluation of functional correctness, we also develop a dynamic evaluation framework based on historical transaction replay. Building on SolContractEval, we systematically evaluate six mainstream LLMs. First, Claude-3.7-Sonnet achieves the highest overall performance, although the evaluated models perform worse than they do on class-level generation tasks in general-purpose programming languages. Second, current models perform better on tasks that follow standard patterns but struggle with complex logic and inter-contract dependencies. Finally, they exhibit limited understanding of Solidity-specific features and contextual dependencies.
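Transaction-replay evaluation can be sketched as follows: historical calls are re-executed in order against a generated contract, and each outcome is compared with what was recorded on-chain. This is a hypothetical simplification: the contract is modelled as a Python object, a raised exception stands in for a revert, and the `RecordedTx` schema and `ToyToken` class are invented for illustration; the benchmark's own framework replays historical on-chain transactions against Solidity code.

```python
from dataclasses import dataclass
from typing import Any

REVERT = object()  # sentinel: the historical transaction reverted on-chain


@dataclass
class RecordedTx:
    """One historical call and its observed outcome (hypothetical schema)."""
    method: str
    args: tuple
    expected: Any


def replay(contract: Any, txs: list) -> float:
    """Replay `txs` in order against `contract`; return the fraction whose outcome matches."""
    if not txs:
        return 0.0
    matched = 0
    for tx in txs:
        try:
            outcome = getattr(contract, tx.method)(*tx.args)
        except Exception:
            outcome = REVERT  # a raised exception stands in for an EVM revert
        if outcome == tx.expected:
            matched += 1
    return matched / len(txs)


class ToyToken:
    """Stand-in for a generated contract; state persists across replayed calls."""

    def __init__(self) -> None:
        self.balances = {"alice": 100}

    def transfer(self, src: str, dst: str, amount: int) -> bool:
        if self.balances.get(src, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        return True

    def balance_of(self, who: str) -> int:
        return self.balances.get(who, 0)
```

Replaying in order matters because later calls observe state written by earlier ones, so a single wrong state update shows up as mismatches downstream.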

preprint · 2026 · arXiv

Understanding and Preserving Safety in Fine-Tuned LLMs

Fine-tuning is an essential and pervasive way to adapt large language models (LLMs) to downstream tasks. However, it can substantially degrade safety alignment, e.g., by greatly increasing susceptibility to jailbreak attacks, even when the fine-tuning data is entirely harmless. Although fine-tuning-stage defenses have attracted growing attention, existing methods struggle with a persistent safety-utility dilemma: emphasizing safety compromises task performance, whereas prioritizing utility typically requires deep fine-tuning that inevitably leads to a steep decline in safety. In this work, we address this dilemma by shedding new light on the geometric interaction between safety- and utility-oriented gradients in safety-aligned LLMs. Through systematic empirical analysis, we uncover three key insights: (I) safety gradients lie in a low-rank subspace, while utility gradients span a broader high-dimensional space; (II) these subspaces are often negatively correlated, causing directional conflicts during fine-tuning; and (III) the dominant safety direction can be efficiently estimated from a single sample. Building on these insights, we propose safety-preserving fine-tuning (SPF), a lightweight approach that explicitly removes gradient components conflicting with the low-rank safety subspace. Theoretically, we show that SPF guarantees utility convergence while bounding safety drift. Empirically, SPF consistently maintains downstream task performance and recovers nearly all pre-trained safety alignment, even under adversarial fine-tuning scenarios. Furthermore, SPF exhibits robust resistance to both deep fine-tuning and dynamic jailbreak attacks. Together, our findings provide new mechanistic understanding and practical guidance toward always-aligned LLM fine-tuning.
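The core operation, removing the part of a utility gradient that conflicts with the safety direction, can be sketched as a projection in the spirit of gradient-surgery methods such as PCGrad. This minimal version assumes a single (rank-1) safety direction, given as the gradient of a safety loss, and plain Python lists as vectors; SPF's actual subspace estimation and rank are not reproduced here, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_safety_conflict(grad, safety_grad):
    """Drop the part of `grad` that would increase the safety loss.

    Assumes a rank-1 safety subspace spanned by `safety_grad` (the nonzero
    gradient of a safety loss). A descent step -lr * grad raises that loss to
    first order when dot(grad, safety_grad) < 0; in that case the component
    along `safety_grad` is projected out, leaving a vector orthogonal to it.
    """
    coeff = dot(grad, safety_grad) / dot(safety_grad, safety_grad)
    if coeff >= 0:
        return list(grad)  # no conflict: keep the utility gradient unchanged
    return [g - coeff * s for g, s in zip(grad, safety_grad)]
```

Non-conflicting gradients pass through untouched, which is one way such a scheme can leave utility learning largely unaffected when the two directions do not clash.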

preprint · 2025 · arXiv

Introduction to the Chinese Space Station Survey Telescope (CSST)

The Chinese Space Station Survey Telescope (CSST) is an upcoming Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, namely the Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxies and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.

preprint · 2022 · arXiv

The Three Hundred project: The Gizmo-Simba run

We introduce Gizmo-Simba, a new suite of galaxy cluster simulations within The Three Hundred project. The Three Hundred consists of zoom re-simulations of 324 clusters with $M_{200}\gtrsim 10^{14.8}M_\odot$ drawn from the MultiDark-Planck $N$-body simulation, run using several hydrodynamic and semi-analytic codes. The Gizmo-Simba suite adds a state-of-the-art galaxy formation model based on the highly successful Simba simulation, mildly re-calibrated to match $z=0$ cluster stellar properties. In comparison with The Three Hundred zooms run with Gadget-X, we find intrinsic differences in the evolution of the stellar and gas mass fractions, BCG ages, and galaxy colour-magnitude diagrams, with Gizmo-Simba generally providing a good match to available data at $z \approx 0$. Gizmo-Simba's unique black hole growth and feedback model yields agreement with the observed BH scaling relations in the intermediate-mass range and predicts a slightly different slope at high masses, where few observations currently exist. Gizmo-Simba provides a novel platform to elucidate the co-evolution of galaxies, gas, and black holes within the densest cosmic environments.

preprint · 2022 · arXiv

An Extended Halo-based Group/Cluster finder: application to the DESI legacy imaging surveys DR8

We extend the halo-based group finder developed by Yang et al. (2005) to use data with either photometric or spectroscopic redshifts simultaneously. A mock galaxy redshift survey constructed from a high-resolution N-body simulation is used to evaluate the performance of this extended group finder. For galaxies with z-band magnitude $\le 21$ and redshift $0<z\le 1.0$ in the DESI Legacy Imaging Surveys (the Legacy Surveys), our group finder successfully identifies more than 60% of the members in about 90% of halos with mass $\gtrsim 10^{12.5}\,h^{-1}M_\odot$. Detected groups with mass $\gtrsim 10^{12.0}\,h^{-1}M_\odot$ have a purity (the fraction of true groups) greater than 90%. The halo mass assigned to each group has an uncertainty of about 0.2 dex at the high-mass end ($\gtrsim 10^{13.5}\,h^{-1}M_\odot$) and 0.40 dex at the low-mass end. Groups with more than 10 members have a redshift accuracy of $\sim 0.008$. We apply this group finder to the Legacy Surveys DR8 and find 5.2 million groups with at least 3 members. About 387,000 of these groups have at least 10 members. The resulting catalog, containing 3D coordinates, richness, halo masses, and total group luminosities, is publicly available.

preprint · 2022 · arXiv

Cross-correlation of Planck CMB lensing with DESI galaxy groups

We measure the cross-correlation between galaxy groups constructed from the DESI Legacy Imaging Survey DR8 and Planck CMB lensing, over an overlapping sky area of 16876 $\rm deg^2$. The detections are significant and consistent with the expected signal of the large-scale structure of the universe, across group samples of various redshift, mass, and richness $N_{\rm g}$, and across various scale cuts. The overall S/N is 40 for a conservative sample with $N_{\rm g}\geq 5$, and increases to 50 for the sample with $N_{\rm g}\geq 2$. Adopting the Planck 2018 cosmology, we constrain the density bias of groups with $N_{\rm g}\geq 5$ as $b_{\rm g}=1.31\pm 0.10$, $2.22\pm 0.10$, and $3.52\pm 0.20$ at $0.1<z\leq 0.33$, $0.33<z\leq 0.67$, and $0.67<z\leq1$, respectively. The group catalog provides estimates of group halo mass and therefore allows us to detect the dependence of bias on group mass with high significance. It also allows us to compare the measured bias with the theoretically predicted one using the estimated group mass. We find excellent agreement for the two high-redshift bins. However, the measured bias is lower than the theoretical prediction by $\sim 3σ$ for the lowest redshift bin. Another interesting finding is the significant impact of the thermal Sunyaev-Zel'dovich (tSZ) effect: it contaminates the galaxy group-CMB lensing cross-correlation at the $\sim 30\%$ level and must be deprojected first in the CMB lensing reconstruction.

preprint · 2022 · arXiv

Defect Identification, Categorization, and Repair: Better Together

Just-In-Time defect prediction (JIT-DP) models can identify defect-inducing commits at check-in time. Although previous studies have made great progress, they still have the following limitations: 1) useful information (e.g., semantic and structural information) is not fully used; 2) existing work can only classify a commit as buggy or clean, without indicating what type of defect it contains; 3) a commit may involve changes in many files, which makes the defect difficult to locate; and 4) prior studies treat defect identification and defect repair as separate tasks, and none handles both simultaneously. In this paper, to address these limitations, we propose a comprehensive defect prediction and repair framework named CompDefect, which can identify whether a changed function (a more fine-grained level) is defect-prone, categorize the type of defect, and repair such a defect automatically if it falls into several scenarios, e.g., defects with single-statement fixes or those that match a small set of defect templates. Generally, the first two tasks in CompDefect are treated as a multiclass classification task, while the last one is treated as a sequence generation task. The whole input of CompDefect consists of three parts (illustrated with positive functions): the clean version of a function (i.e., the version before the defect was introduced), the buggy version, and the fixed version. In the multiclass classification task, CompDefect categorizes the type of defect using information from both the clean and buggy versions. In the code sequence generation task, CompDefect repairs the defect once it is identified, or otherwise keeps the function unchanged.

preprint · 2022 · arXiv

ELUCID VII: Using Constrained Hydro Simulations to Explore the Gas Component of the Cosmic Web

Using reconstructed initial conditions in the SDSS survey volume, we carry out constrained hydrodynamic simulations in three regions representing different types of the cosmic web: the Coma cluster of galaxies; the SDSS great wall; and a large low-density region at $z\sim 0.05$. These simulations, which include star formation and stellar feedback but no AGN formation and feedback, are used to investigate the properties and evolution of intergalactic and intra-cluster media. About half of the warm-hot intergalactic gas is associated with filaments in the local cosmic web. Gas in the outskirts of massive filaments and halos can be heated significantly by accretion shocks generated by mergers of filaments and halos, respectively, and there is a tight correlation between gas temperature and the strength of the local tidal field. The simulations also predict some discontinuities associated with shock fronts and contact edges, which can be tested using observations of the thermal SZ effect and X-rays. A large fraction of the sky is covered by Ly$α$ and OVI absorption systems, and most of the OVI systems and low-column density HI systems are associated with filaments in the cosmic web. The constrained simulations, which follow the formation and heating history of the observed cosmic web, provide an important avenue to interpret observational data. With full information about the origin and location of the cosmic gas to be observed, such simulations can also be used to develop observational strategies.

preprint · 2022 · arXiv

Elucidating Galaxy Assembly Bias in SDSS

We investigate the level of galaxy assembly bias in the Sloan Digital Sky Survey (SDSS) main galaxy sample using ELUCID, a state-of-the-art constrained simulation that accurately reconstructs the initial density perturbations within the SDSS volume. On top of the ELUCID haloes, we develop an extended HOD model that includes the assembly bias of central and satellite galaxies, parameterized as $\mathcal{Q}_\mathrm{cen}$ and $\mathcal{Q}_\mathrm{sat}$, respectively, to predict a suite of one- and two-point observables. In particular, our fiducial constraint employs the probability distribution of the galaxy number counts measured on $8\,\mathrm{Mpc}\,h^{-1}$ scales $N_8^g$ and the projected cross-correlation functions of quintiles of galaxies selected by $N_8^g$ with our entire galaxy sample. We perform extensive tests of the efficacy of our method by fitting the same observables to mock data using both constrained and non-constrained simulations. We discover that in many cases the level of cosmic variance between the two simulations can produce biased constraints that lead to an erroneous detection of galaxy assembly bias if the non-constrained simulation is used. When applying our method to the SDSS data, the ELUCID reconstruction effectively removes an otherwise strong degeneracy between cosmic variance and galaxy assembly bias in SDSS, enabling us to derive an accurate and stringent constraint on the latter. Our fiducial ELUCID constraint, for galaxies above a stellar mass threshold $M_*{=}10^{10.2}\,h^{-2}\,M_\odot$, is $\mathcal{Q}_\mathrm{cen}{=}{-}0.09\pm{0.05}$ and $\mathcal{Q}_\mathrm{sat}{=}0.09\pm{0.10}$, indicating no evidence for a significant ($>2σ$) galaxy assembly bias in the local Universe probed by SDSS. Finally, our method provides a promising path to the robust modelling of the galaxy-halo connection within future surveys like DESI and PFS.

preprint · 2022 · arXiv

First measurement of the characteristic depletion radius of dark matter haloes from weak lensing

We use weak lensing observations to make the first measurement of the characteristic depletion radius, one of the three radii that characterize the region where matter is being depleted by growing haloes. The lenses are taken from the halo catalog produced by the extended halo-based group/cluster finder applied to DESI Legacy Imaging Surveys DR9, while the sources are extracted from the DECaLS DR8 imaging data with the Fourier_Quad pipeline. We study halo masses $12 < \log ( M_{\rm grp} ~[{\rm M_{\odot}}/h] ) \leq 15.3$ in the redshift range $0.2 \leq z \leq 0.3$. The virial and splashback radii are also measured and used to test the original findings on the depletion region. When binning haloes by mass, we find consistency between most of our measurements and predictions from the CosmicGrowth simulation, with the exception of the lowest mass bins. The characteristic depletion radius is found to be roughly $2.5$ times the virial radius and $1.7 - 3$ times the splashback radius, in line with an approximately universal outer density profile, and the average enclosed density within the characteristic depletion radius is roughly $29$ times the mean matter density of the Universe in our sample. When binning haloes by both mass and a proxy for halo concentration, we do not detect a significant variation of the depletion radius with concentration; the simulation prediction for this dependence is itself sensitive to the choice of concentration proxy. We also confirm that the measured splashback radius varies with concentration differently from simulation predictions.

preprint · 2022 · arXiv

Groups and protocluster candidates in the CLAUDS and HSC-SSP joint deep surveys

Using the extended halo-based group finder developed by Yang et al. (2021), which can handle galaxies with spectroscopic and photometric redshifts simultaneously, we construct galaxy group and candidate protocluster catalogs over a wide redshift range ($0 < z < 6$) from the joint CFHT Large Area $U$-band Deep Survey (CLAUDS) and Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) deep data set. Based on a selection of 5,607,052 galaxies with $i$-band magnitude $m_{i} < 26$ and a sky coverage of $34.41\ {\rm deg}^2$, we identify a total of 2,232,134 groups, of which 402,947 have at least three member galaxies. We have visually checked and discussed the general properties of the richest groups at redshift $z>2.0$. By checking the galaxy number distributions within a $5-7\ h^{-1}\mathrm{Mpc}$ projected separation and a redshift difference $Δz \le 0.1$ around those richest groups at redshift $z>2$, we identify 761, 343 and 43 protocluster candidates in the redshift bins $2\leq z<3$, $3\leq z<4$ and $z \geq 4$, respectively. These catalogs of galaxy groups and protocluster candidates will provide useful environmental information for probing galaxy evolution across cosmic time.
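The protocluster check counts galaxies inside a projected aperture and a redshift window around each rich group. A minimal sketch, assuming galaxies are given as `(x, y, z)` tuples where `x, y` are already projected comoving coordinates in $h^{-1}$Mpc and `z` is the (photometric or spectroscopic) redshift; the function name is hypothetical, and the default 7 $h^{-1}$Mpc aperture is simply one end of the quoted 5-7 range.

```python
import math


def count_neighbours(center, galaxies, r_max=7.0, dz_max=0.1):
    """Count galaxies within projected radius `r_max` and |dz| <= `dz_max` of `center`.

    center, galaxies: (x, y, z) tuples; x, y are assumed to be projected
    comoving coordinates in h^-1 Mpc (flat-sky approximation) and z is redshift.
    """
    cx, cy, cz = center
    return sum(
        1
        for x, y, z in galaxies
        if math.hypot(x - cx, y - cy) <= r_max and abs(z - cz) <= dz_max
    )
```

Comparing such counts around each rich $z>2$ group with counts around random positions would be one way to flag overdense regions; the thresholds used in practice are not reproduced here.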

preprint · 2022 · arXiv

Massive Star-Forming Galaxies Have Converted Most of Their Halo Gas into Stars

In the local Universe, the efficiency for converting baryonic gas into stars is very low. In dark matter halos where galaxies form and evolve, the average efficiency varies with galaxy stellar mass and has a maximum of about twenty percent for Milky-Way-like galaxies. The low efficiency at higher mass is believed to be produced by some quenching processes, such as the feedback from active galactic nuclei. We perform an analysis of weak lensing and satellite kinematics for SDSS central galaxies. Our results reveal that the efficiency is much higher, more than sixty percent, for a large population of massive star-forming galaxies around $10^{11}M_{\odot}$. This suggests that these galaxies acquired most of the gas in their halos and converted it into stars without being affected significantly by quenching processes. This population of galaxies is not reproduced in current galaxy formation models, indicating that our understanding of galaxy formation is incomplete. The implications of our results on circumgalactic media, star formation quenching and disc galaxy rotation curves are discussed. We also examine systematic uncertainties in halo-mass and stellar-mass measurements that might influence our results.
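The efficiency discussed here is commonly defined as the stellar mass divided by the baryon mass available in the halo, $\epsilon = M_*/(f_b M_h)$ with $f_b = \Omega_b/\Omega_m$. The sketch below assumes that common definition and illustrative Planck-like parameters ($\Omega_b = 0.049$, $\Omega_m = 0.31$); the paper's exact definition and cosmology may differ, and the function name is hypothetical.

```python
def star_conversion_efficiency(m_star, m_halo, omega_b=0.049, omega_m=0.31):
    """Fraction of a halo's baryon budget that has been turned into stars.

    Assumes epsilon = M_* / (f_b * M_h) with the cosmic baryon fraction
    f_b = Omega_b / Omega_m; m_star and m_halo must share the same mass units.
    """
    f_b = omega_b / omega_m  # cosmic baryon fraction (about 0.16 for these defaults)
    return m_star / (f_b * m_halo)
```

Under this definition, an efficiency above 0.6 means that more than sixty percent of the halo's expected baryons are locked in stars.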

preprint · 2022 · arXiv

The Universal Specific Merger Rate of Dark Matter Halos

We employ a set of high-resolution N-body simulations to study the merger rate of dark matter halos. We define a specific merger rate by normalizing the average number of mergers per halo by the logarithmic mass growth of the hosts at the time of accretion. Based on the simulation results, we find that this specific merger rate, $\mathrm{d}N_{\mathrm{merge}}(ξ|M,z)/\mathrm{d}ξ/\mathrm{d}\log M(z)$, has a universal form, which is only a function of the mass ratio of merging halo pairs, $ξ$, and does not depend on the host halo mass, $M$, or redshift, $z$, over a wide range of masses ($10^{12}\lesssim M \lesssim10^{14}\,M_\odot/h$) and mass ratios ($ξ\ge 10^{-2}$). We further test with simulations of different $Ω_m$ and $σ_8$ and obtain the same specific merger rate. The universality of the specific merger rate shows that halos in the universe are built up self-similarly, with a universal composition in the mass contributions and an absolute merger rate that grows in proportion to the halo mass growth. As a result, the absolute merger rate depends on redshift and cosmology only through the halo mass variable, whose evolution can be readily obtained from the universal mass accretion history (MAH) model of Zhao et al. (2009). Lastly, we show that this universal specific merger rate immediately predicts a universal unevolved subhalo mass function that is independent of the redshift, MAH, or final halo mass, and vice versa.
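The universality statement can be written compactly. Writing $f(\xi)$ for the universal specific merger rate (a label introduced here, not the paper's notation), the chain rule turns it into a rate per unit time, and integrating over a host's growth from $M_1$ to $M_2$ gives the accumulated number of mergers. These lines follow directly from the statements above and hold only within the quoted mass and mass-ratio range; they are not equations quoted from the paper.

```latex
\frac{\mathrm{d}N_{\rm merge}(\xi\,|\,M,z)}{\mathrm{d}\xi\,\mathrm{d}\log M} = f(\xi)
\quad\Longrightarrow\quad
\frac{\mathrm{d}N_{\rm merge}}{\mathrm{d}\xi\,\mathrm{d}t} = f(\xi)\,\frac{\mathrm{d}\log M}{\mathrm{d}t},
\qquad
\frac{\Delta N_{\rm merge}}{\mathrm{d}\xi} = f(\xi)\,\log\frac{M_2}{M_1}
```

Because only $\mathrm{d}\log M/\mathrm{d}t$ carries the time dependence, redshift and cosmology enter solely through the mass accretion history $M(t)$.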

preprint · 2022 · arXiv

What to expect from dynamical modelling of cluster haloes II. Investigating dynamical state indicators with Random Forest

We investigate the importance of various dynamical features in predicting the dynamical state (DS) of galaxy clusters, using the Random Forest (RF) machine learning approach. We use a large sample of galaxy clusters from the Three Hundred Project of hydrodynamical zoomed-in simulations, and construct dynamical features from the raw data as well as from the corresponding mock maps in the optical, X-ray, and Sunyaev-Zel'dovich (SZ) channels. Instead of relying on the impurity-based feature importance of the RF algorithm, we directly use the out-of-bag (OOB) scores to evaluate the importance of individual features and different feature combinations. Among all the features studied, we find the virial ratio, $η$, to be the most important single feature. The features calculated directly from the simulations in three dimensions carry more information on the DS than those constructed from the mock maps. Compared with the features based on X-ray or SZ maps, features related to the centroid positions are more important. Despite the large number of investigated features, a combination of up to three features of different types can already saturate the prediction score. Lastly, we show that the most sensitive feature, $η$, is strongly correlated with the well-known half-mass bias in dynamical modelling. Without a selection in DS, cluster halos have an asymmetric distribution in $η$, corresponding to an overall positive half-mass bias. Our work provides a quantitative reference for selecting the best features to discriminate the DS of galaxy clusters in both simulations and observations.
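Scoring "combinations of up to three features" is a small exhaustive search. The sketch assumes the caller supplies the scoring function, for instance the out-of-bag score of a forest retrained on each subset (scikit-learn exposes this as the `oob_score_` attribute when a forest is built with `oob_score=True`); the function name, the toy additive scorer, and the feature names used in testing are hypothetical.

```python
from itertools import combinations


def best_combinations(features, score, max_size=3):
    """Return {size: (best_score, best_combo)} by scoring every feature subset.

    `score` maps a tuple of feature names to a number, e.g. the OOB score of a
    model trained on only those features (computed by the caller).
    """
    best = {}
    for k in range(1, max_size + 1):
        # Tuples compare by score first; ties fall back to the feature names.
        best[k] = max((score(combo), combo) for combo in combinations(features, k))
    return best
```

Plotting the best score against subset size shows where adding features stops helping, which is the saturation behaviour described above.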

preprint · 2021 · arXiv

The Color Gradients of the Globular Cluster Systems in M87 and M49

Combining data from the ACS Virgo Cluster Survey (ACSVCS) and the Next Generation Virgo Cluster Survey (NGVS), we extend previous studies of color gradients of the globular cluster (GC) systems of the two most massive galaxies in the Virgo cluster, M87 and M49, to radii of $\sim 15~R_e$ ($\sim 200$ kpc for M87 and $\sim 250$ kpc for M49). We find significant negative color gradients, i.e., colors becoming bluer with increasing distance, out to these large radii. The gradients are driven mainly by the outward decrease of the ratio of red to blue GC numbers. The color gradients are also detected out to $\sim 15~R_e$ in the red and blue GC sub-populations taken separately. In addition, we find a negative color gradient when we treat the satellite low-mass elliptical galaxies as a system, i.e., satellite galaxies closer to the center of the host galaxy usually have redder color indices, both for their stars and their GCs. According to the "two-phase" formation scenario of massive early-type galaxies, the host galaxy accretes stars and GCs from low-mass satellite galaxies in the second phase. The accreted GC system therefore naturally inherits the negative color gradient present in the satellite population. This can explain why the color gradient of the GC system is still observed at large radii after multiple minor mergers.
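A radial color gradient is usually quoted as the slope of color against log radius, so a negative slope means bluer colors farther out. The sketch below fits that slope by ordinary least squares in pure Python; the function name, the absence of binning or weighting, and the choice of color index are assumptions, not the paper's procedure.

```python
import math


def color_gradient(radii, colors):
    """Least-squares slope d(color)/d log10(R); negative means bluer outwards.

    radii: positive radii (any consistent unit, e.g. R/R_e); colors: matching
    color indices. Requires at least two distinct radii.
    """
    xs = [math.log10(r) for r in radii]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(colors) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, colors))
    return s_xy / s_xx
```

Fitting in $\log R$ rather than $R$ keeps the inner and outer parts of a profile spanning roughly $0.1$ to $15\,R_e$ comparably weighted.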

preprint · 2020 · arXiv

Detection of missing baryons in galaxy groups with kinetic Sunyaev-Zel'dovich effect

We present the detection of kinetic Sunyaev-Zel'dovich effect (kSZE) signals from groups of galaxies as a function of halo mass down to $\log (M_{500}/{\rm M_\odot}) \sim 12.3$, using the Planck CMB maps and stacking about 40,000 galaxy systems with known positions, halo masses, and peculiar velocities. The signals from groups of different mass are constrained simultaneously to account for projection effects from nearby halos. The estimated total kSZE flux within halos implies that the gas fraction in halos is about the universal baryon fraction, even in low-mass halos, indicating that the 'missing baryons' are found. Various tests show that our results are robust against systematic effects, such as contamination by infrared/radio sources and background variations, beam-size effects, and contributions from halo exteriors. Combined with the thermal Sunyaev-Zel'dovich effect, our results indicate that the 'missing baryons' associated with galaxy groups reside in warm-hot media with temperatures between $10^5$ and $10^6\,{\rm K}$.
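Two standard kSZ relations, given here as textbook background rather than as the paper's estimator, explain why peculiar velocities are needed for stacking and how the total flux constrains gas mass. The temperature shift scales with the electron column times the line-of-sight velocity, so unweighted stacks of approaching and receding groups would cancel; integrating over a halo's projected physical area $A$ turns the electron column into the total electron number. Assumed here: fully ionized gas moving with a uniform bulk velocity, $v_{\rm los} > 0$ for receding systems, and $\mu_{\rm e}$ the mean mass per electron in units of $m_{\rm p}$.

```latex
\frac{\Delta T_{\rm kSZ}}{T_{\rm CMB}}
  = -\frac{\sigma_{\rm T}}{c}\int n_{\rm e}\,v_{\rm los}\,\mathrm{d}l
  = -\tau\,\frac{v_{\rm los}}{c},
\qquad
\int \frac{\Delta T_{\rm kSZ}}{T_{\rm CMB}}\,\mathrm{d}A
  = -\frac{\sigma_{\rm T}}{c}\,\frac{M_{\rm gas}}{\mu_{\rm e}\,m_{\rm p}}\,v_{\rm los}
```

With $v_{\rm los}$ known for each group, the integrated flux yields $M_{\rm gas}$, and dividing by the halo mass gives the gas fraction compared with the universal baryon fraction.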

preprint · 2020 · arXiv

Observing the Effects of Galaxy Interactions on the Circumgalactic Medium

We continue our empirical study of the emission line flux originating in the cool ($T\sim10^4$ K) gas that populates the halos of galaxies and their environments. Specifically, we present results obtained for a sample of galaxy pairs with a range of projected separations, $10 < S_p/{\rm kpc} < 200$, and mass ratios $<$ 1:5, intersected by 5,443 SDSS lines of sight at projected radii of 10 to 50 kpc from either or both of the two galaxies. We find significant enhancement in H$α$ emission and a moderate enhancement in [N II]6583 emission for low-mass pairs (mean stellar mass per galaxy $\overline{\rm M}_* <10^{10.4} {\rm M}_\odot$) relative to the results from a control sample. This enhanced H$α$ emission comes almost entirely from sight lines located between the galaxies, consistent with a short-term, interaction-driven origin for the enhancement. We find no enhancement in H$α$ emission, but significant enhancement in [N II]6583 emission for high-mass ($\overline{\rm M}_* >10^{10.4}{\rm M}_\odot$) pairs. Furthermore, we find a dependence of the emission line properties on the galaxy pair mass ratio such that those with a mass ratio below 1:2.5 have enhanced [N II]6583 and those with a mass ratio between 1:2.5 and 1:5 do not. In all cases, departures from the control sample are only detected for close pairs ($S_p <$ 100 kpc). Attributing an elevated [N II]6583/H$α$ ratio to shocks, we infer that shocks play a role in determining the CGM properties for close pairs that are among the more massive and have mass ratios closer to 1:1.

preprint · 2020 · arXiv

Populating HI gas in dark matter halos: I. method

We combine data from the Sloan Digital Sky Survey (SDSS) and the Arecibo Legacy Fast ALFA Survey (ALFALFA) to establish an empirical model for the HI gas content within dark matter halos. A cross-match between our SDSS DR7 galaxy group sample and the ALFALFA HI sources provides a catalog of 16,520 HI-galaxy pairs within 14,270 galaxy groups (halos). Using these matched pairs, we model the HI gas mass distributions within halos using two components: 1) in situ galaxy relations that involve the HI masses, colors $({\rm g-r})$, and stellar masses; and 2) an ex situ dependence of the HI mass on the halo mass/environment. We find that if we use only galaxy-associated scaling relations to predict the HI gas distribution (component 1 alone), the number of HI detections is significantly over-predicted with respect to the ALFALFA observations. We introduce a concept for the survival of HI masses/members within halos of different masses, labelled the 'efficiency' factor, to describe the probability that a halo retains its HI detections. Taking this into account, we construct a 'halo-based HI mass model' that not only predicts the HI masses of galaxies but also yields number, stellar mass, halo mass, and satellite fraction distributions similar to those of the HI detections retrieved from observational data.

preprint2020arXiv

Predictive Models in Software Engineering: Challenges and Opportunities

Predictive models are among the most important techniques in software engineering and are widely applied across many of its areas. A large number of primary studies apply predictive models and present well-performed and well-designed work in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This paper is a first attempt to systematically organize knowledge in this area by surveying a body of 139 papers on predictive models. We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results. Based on our findings, we also identify a set of open challenges that still need to be addressed in future work and propose a research road map for these opportunities.

preprint2020arXiv

Probing Primordial Chirality with Galaxy Spins

Chiral symmetry is maximally violated in weak interactions, and such microscopic asymmetries in the early Universe might leave observable imprints on astrophysical scales without violating the cosmological principle. In this Letter, we propose a helicity measurement to detect primordial chiral violation. We point out that observations of halo-galaxy angular momentum directions (spins), which are frozen in during the galaxy formation process, provide a fossil chiral observable. From the clustering mode of the large-scale structure of the Universe, we construct a spin mode in Lagrangian space and show in simulations that it is a good probe of halo-galaxy spins. In the standard model, a strong, symmetric correlation is expected between galaxy spins and the left- and right-helical components of this spin mode. Measurements of these correlations will be sensitive to chiral breaking, providing a direct test of chiral symmetry breaking in the early Universe.

preprint2020arXiv

Relating the structure of dark matter halos to their assembly and environment

We use a large $N$-body simulation to study the relation of the structural properties of dark matter halos to their assembly history and environment. The complexity of individual halo assembly histories can be well described by a small number of principal components (PCs), which, compared to formation times, provide a more complete description of halo assembly histories and correlate more strongly with halo structural properties. Using decision trees built with the random ensemble method, we find that about $60\%$, $10\%$, and $20\%$ of the variances in halo concentration, axis ratio, and spin, respectively, can be explained by combining four dominant predictors: the first PC of the assembly history, halo mass, and two environment parameters. Halo concentration is dominated by halo assembly. The local environment is found to be important for the axis ratio and spin but is degenerate with halo assembly. The small percentages of the variance in the axis ratio and spin that are explained by known assembly and environmental factors suggest that the variance is produced by many nuanced factors and should be modeled as such. The relations between halo intrinsic properties and environment are weak compared to their variances, with the anisotropy of the local tidal field having the strongest correlation with halo properties. Our method of dimension reduction and regression can help simplify the characterization of the halo population and clarify the degeneracy among halo properties.

preprint2020arXiv

The Breakdown Scale of HI Bias Linearity

21 cm intensity mapping experiments promise to map the large-scale distribution of HI gas in the post-reionization epoch. To recover the underlying matter density fluctuations from HI maps, it is important to understand how HI gas traces the matter density distribution. Both nonlinear halo clustering and nonlinear effects modulating the HI gas in halos may set the scale below which the HI bias deviates from linearity. We employ three approaches to generate mock HI density fields from a large-scale N-body simulation at low redshifts, and demonstrate that the assumption of a linear HI bias is valid at the scale corresponding to the first peak of the baryon acoustic oscillations but breaks down at $k \gtrsim 0.1\,h\,{\rm Mpc}^{-1}$. The nonlinear effects of halo clustering and HI content modulation counteract each other at small scales, and their competition results in a model-dependent "sweet-spot" redshift near $z=1$ at which the HI bias is scale-independent down to small scales. We also find that the linear HI bias scales approximately linearly with redshift for $z \le 3$.
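As a minimal sketch of what a "breakdown scale" means operationally, the Python snippet below estimates a scale-dependent bias $b(k)=\sqrt{P_{\rm HI}(k)/P_m(k)}$ from two auto power spectra and reports the first $k$ at which $b(k)$ departs from its large-scale value by more than a tolerance. The functions `hi_bias` and `breakdown_scale`, the 5% tolerance and the toy spectra are hypothetical illustrations, not the estimator or data used in the paper. In practice the HI auto spectrum also carries shot noise, and the cross-spectrum ratio $P_{{\rm HI},m}/P_m$ is a common alternative.

```python
import math

def hi_bias(p_hi, p_m):
    """Scale-dependent bias b(k) = sqrt(P_HI(k) / P_m(k)) from auto power spectra.

    Shot noise is not subtracted here (a simplification of this sketch).
    """
    return [math.sqrt(ph / pm) for ph, pm in zip(p_hi, p_m)]

def breakdown_scale(k, bias, n_large=3, tol=0.05):
    """Return the smallest k where b(k) differs from the large-scale bias by more than tol.

    Assumes k is sorted in ascending order. The large-scale (linear) bias is
    taken as the mean of the n_large lowest-k bins. Returns None if no bin deviates.
    """
    b_lin = sum(bias[:n_large]) / n_large
    for ki, bi in zip(k, bias):
        if abs(bi / b_lin - 1.0) > tol:
            return ki
    return None

# Toy input, made up for illustration only: b(k) = 1.2 * (1 + (k / 0.5)^2).
k = [0.02, 0.04, 0.06, 0.1, 0.2, 0.4]          # h / Mpc
p_m = [1.0] * len(k)                            # arbitrary normalisation
p_hi = [(1.2 * (1.0 + (ki / 0.5) ** 2)) ** 2 for ki in k]

bias = hi_bias(p_hi, p_m)
print("breakdown at k =", breakdown_scale(k, bias), "h/Mpc")
```

With real measurements one would also propagate error bars on $b(k)$ before declaring a departure from linearity, rather than relying on a fixed tolerance.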

preprint2020arXiv

The intrinsic SFRF and sSFRF of galaxies: comparing SDSS observation with IllustrisTNG simulation

The star formation rate function (SFRF) and specific star formation rate function (sSFRF) derived from observations are affected by the Eddington bias, owing to the uncertainties in the estimated SFRs. We develop a novel method to correct for the Eddington bias and obtain the intrinsic SFRF and sSFRF from the Sloan Digital Sky Survey Data Release 7. The intrinsic SFRF is in good agreement with previous measurements in the literature that relied on UV SFRs, but its high star-forming end is slightly lower than those based on IR and radio tracers. We demonstrate that the intrinsic sSFRF from SDSS has a bimodal form, with one peak at ${\rm sSFR \sim 10^{-9.7}\,yr^{-1}}$ representing the star-forming objects and the other at ${\rm sSFR \sim 10^{-12}\,yr^{-1}}$ representing the quenched population. Furthermore, we compare our observations with the predictions from the IllustrisTNG and Illustris simulations and show that the ``TNG'' model performs much better than its predecessor. However, we show that the simulated SFRF and cosmic star formation rate density (CSFRD) of the TNG simulations are highly dependent on resolution, reflecting the limitations of the model and of today's state-of-the-art simulations. We demonstrate that the bimodal, two-peaked sSFRF implied by the SDSS observations does not appear in TNG regardless of the adopted box size or resolution. This tension reflects the need to include an additional, efficient quenching mechanism in the TNG model.
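The Eddington bias itself can be illustrated with a toy simulation. The Python sketch below assumes, purely for illustration, a Gaussian intrinsic log-sSFR distribution and Gaussian measurement errors of fixed width: scattering moves objects from the well-populated centre into the sparse tails, so the observed high tail is inflated. Under those Gaussian assumptions the intrinsic width can be recovered by subtracting the error variance, which is a textbook deconvolution and not the correction method developed in the paper. The function name `simulate_eddington_bias` and all parameter values are hypothetical.

```python
import random
import statistics

def simulate_eddington_bias(n=100_000, mu=-10.0, sigma_int=0.3,
                            sigma_err=0.3, threshold=-9.0, seed=42):
    """Toy model of Eddington bias on a log-sSFR distribution.

    Assumes (hypothetically) a Gaussian intrinsic distribution and Gaussian,
    homoscedastic measurement errors; real SFR errors are neither.
    """
    rng = random.Random(seed)
    intrinsic = [rng.gauss(mu, sigma_int) for _ in range(n)]
    observed = [x + rng.gauss(0.0, sigma_err) for x in intrinsic]

    # Fraction of objects in the sparsely populated high tail.
    frac_int = sum(x > threshold for x in intrinsic) / n
    frac_obs = sum(x > threshold for x in observed) / n

    # Gaussian-only correction: intrinsic variance = observed variance - error variance.
    var_obs = statistics.pvariance(observed)
    sigma_rec = (var_obs - sigma_err ** 2) ** 0.5
    return frac_int, frac_obs, sigma_rec

frac_int, frac_obs, sigma_rec = simulate_eddington_bias()
print(f"high-tail fraction: intrinsic={frac_int:.4f}, observed={frac_obs:.4f}")
print(f"recovered intrinsic width: {sigma_rec:.3f}")
```

The variance subtraction only works because both distributions are Gaussian here; for a bimodal sSFRF like the one described above, a full forward-modelling or deconvolution approach would be needed.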

preprint2020arXiv

The parameter-free Finger-Of-God model and its application to 21cm intensity mapping

Using the galaxy catalog built from the ELUCID N-body simulation and a semi-analytic galaxy formation model, we have built a mock HI intensity map. We have implemented the Finger-of-God (FoG) effect in the map by accounting for the HI gas velocity dispersion of galaxies. By comparing the redshift-space HI power spectrum with the measurement from the IllustrisTNG simulation, we find that this FoG effect can explain the discrepancy between current mock maps built from N-body simulations and the IllustrisTNG simulation. We then build a parameter-free FoG model and a shot-noise model to calculate the HI power spectrum, and find that our model accurately fits both the monopole and quadrupole moments of the HI power spectrum. Our method of building mock HI intensity maps and the parameter-free FoG model will be widely useful for upcoming 21cm intensity mapping experiments such as CHIME, Tianlai, BINGO, FAST and SKA, and is also crucial for studying non-linear effects in 21cm intensity mapping.

preprint2020arXiv

The Three Hundred Project: the stellar and gas profiles

Using the catalogues of galaxy clusters from The Three Hundred project, modelled with both hydrodynamic simulations (Gadget-X and Gadget-MUSIC) and semi-analytic models (SAMs), we study the scatter and self-similarity of the profiles and distributions of the baryonic components of the clusters: the stellar and gas mass, metallicity, stellar age, gas temperature, and (specific) star formation rate. Through comparisons with observational results, we find that the shape and scatter of the gas density profiles match the observed trends well, including the reduced scatter at large radii, which is a signature of the self-similarity suggested in previous studies. One of our simulated sets, Gadget-X, reproduces the shape of the observed temperature profile well, while Gadget-MUSIC has a higher and flatter profile in the cluster centre and a lower and steeper profile at large radii. The gas metallicity profiles from both simulation sets follow the observed trend but have a relatively lower normalisation. The cumulative stellar density profiles from the SAMs agree better with the observed result than those from both hydrodynamic simulations, which are relatively higher. The scatter in these physical profiles, especially in the cluster centre region, depends on the cluster dynamical state and on the cool-core/non-cool-core dichotomy. The stellar age, metallicity and (s)SFR show very large scatter and are therefore presented as 2D maps. We also find no clear radial dependence of these properties. However, the brightest central galaxies have features distinguishable from those of the satellite galaxies.

preprint2020arXiv

UV & U-band luminosity functions from CLAUDS and HSC-SSP -- I. Using four million galaxies to simultaneously constrain the very faint and bright regimes to $z \sim 3$

We constrain the rest-frame FUV (1546Å), NUV (2345Å) and U-band (3690Å) luminosity functions (LFs) and luminosity densities (LDs) with unprecedented precision from $z\sim0.2$ to $z\sim3$ (FUV, NUV) and $z\sim2$ (U-band). Our sample of over 4.3 million galaxies, selected from the CFHT Large Area $U$-band Deep Survey (CLAUDS) and Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) data, lets us probe the very faint regime (down to $M_\mathrm{FUV},M_\mathrm{NUV},M_\mathrm{U} \simeq -15$ at low redshift) while simultaneously detecting very rare galaxies at the bright end down to comoving densities $<10^{-5}$ Mpc$^{-3}$. Our FUV and NUV LFs are well fitted by single Schechter functions, with faint-end slopes that are very stable up to $z\sim2$. We confirm, self-consistently and with much better precision than previous studies, that the LDs at all three wavelengths increase rapidly with lookback time to $z\sim1$, and then much more slowly at $1<z<2$--$3$. Evolution of the FUV and NUV LFs and LDs at $z<1$ is driven almost entirely by the fading of the characteristic magnitude, $M^\star_{UV}$, while at $z>1$ it is due to the evolution of both $M^\star_{UV}$ and the characteristic number density $\phi^\star_{UV}$. In contrast, the U-band LF has an excess of faint galaxies and is fitted with a double-Schechter form; $M^\star_\mathrm{U}$, both $\phi^\star_\mathrm{U}$ components, and the bright-end slope evolve throughout $0.2<z<2$, while the faint-end slope is constant over at least the measurable range $0.05<z<0.6$. We present tables of our Schechter parameters and LD measurements that can be used for testing theoretical galaxy evolution models and forecasting future observations.
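For reference, the single Schechter function in absolute-magnitude form is $\phi(M)\,dM = 0.4\ln 10\,\phi^\star\,x^{\alpha+1}e^{-x}\,dM$ with $x=10^{0.4(M^\star-M)}$. The Python sketch below evaluates it and a double-Schechter sum, and checks numerically that the total number density equals $\phi^\star\,\Gamma(\alpha+1)$, which is finite only for $\alpha>-1$ (for steeper faint-end slopes one integrates down to a magnitude limit instead). Tying both components of the double form to one $M^\star$ is an assumption of this sketch, and the parameter values are illustrative only, not the fitted values from the paper.

```python
import math

LN10 = math.log(10.0)

def schechter_mag(M, phi_star, M_star, alpha):
    """Single Schechter function in absolute-magnitude form.

    phi(M) = 0.4 ln(10) phi* x^(alpha+1) exp(-x), with x = 10^(0.4 (M* - M)),
    i.e. number density per unit magnitude (same volume units as phi*).
    """
    x = 10.0 ** (0.4 * (M_star - M))
    return 0.4 * LN10 * phi_star * x ** (alpha + 1.0) * math.exp(-x)

def double_schechter_mag(M, phi1, phi2, M_star, alpha1, alpha2):
    """Sum of two Schechter components sharing one characteristic magnitude.

    Sharing M* is an assumption of this sketch; published double-Schechter
    fits differ in how the two components are tied together.
    """
    return (schechter_mag(M, phi1, M_star, alpha1)
            + schechter_mag(M, phi2, M_star, alpha2))

def integrate(f, a, b, n=50_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative values only (alpha > -1 so the total number density converges).
phi_star, m_star, alpha = 1e-3, -19.0, -0.5
total = integrate(lambda M: schechter_mag(M, phi_star, m_star, alpha),
                  m_star - 10.0, m_star + 40.0)
print(f"numerical: {total:.6e}  analytic: {phi_star * math.gamma(alpha + 1.0):.6e}")
```

The analytic check follows from substituting $x$ for $M$, since $dM = -dx/(0.4\ln 10\,x)$ turns the integral into $\phi^\star\int_0^\infty x^{\alpha}e^{-x}\,dx$. Luminosity densities follow the same pattern with an extra factor of luminosity in the integrand.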

preprint2019arXiv

The Dearth of Difference between Central and Satellite Galaxies III. Environmental Dependence of Mass-Size and Mass-Structure Relations

As demonstrated in Paper I, the quenching properties of central and satellite galaxies are quite similar as long as both stellar mass and halo mass are controlled. Here we extend the analysis to the size and bulge-to-total light ratio (B/T) of galaxies. In general, central galaxies have size-stellar mass and B/T-stellar mass relations that differ from those of satellites. However, the differences are eliminated when halo mass is controlled. We also study the dependence of size and B/T on halo-centric distance and find a transitional stellar mass (M$_{*,t}$) at a given halo mass (M$_h$), which is about one fifth of the mass of the central galaxies in halos of mass M$_h$. The transitional stellar masses for size, B/T and quenched fraction are similar over the whole halo mass range, suggesting a connection between the quenching of star formation and the structural evolution of galaxies. Our analysis further suggests that a classification based on the transitional stellar mass is more fundamental than the central-satellite dichotomy and provides a more reliable way to understand environmental effects on galaxy properties. We compare the observational results with the hydrodynamical simulation EAGLE and the semi-analytic model L-GALAXIES. The EAGLE simulation successfully reproduces the similarity in size between centrals and satellites, and even M$_{*,t}$, while L-GALAXIES fails to recover the observational results.

preprint2019arXiv

Toward accurate measurement of property-dependent galaxy clustering I. Comparison of the Vmax method and the "shuffled" method

Galaxy clustering provides insightful clues to our understanding of galaxy formation and evolution, as well as of the Universe. Assigning redshifts to the random sample is one of the key steps in measuring galaxy clustering accurately. In this paper, using mock galaxy catalogs, we investigate how two redshift assignment methods, the Vmax method and the "shuffled" method, affect the measurement of galaxy two-point correlation functions (hereafter 2PCFs). We find that the shuffled method significantly underestimates both the projected 2PCFs and the two-dimensional 2PCFs in redshift space, while the Vmax method does not show any notable bias on the 2PCFs for volume-limited samples. For flux-limited samples, the bias produced by the Vmax method is less than half that of the shuffled method on large scales. Therefore, we strongly recommend the Vmax method for assigning redshifts to random samples in future galaxy clustering analyses.
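To make the two redshift assignment schemes concrete, the Python sketch below contrasts them. In the "shuffled" method each random point copies the redshift of a randomly chosen galaxy; in the Vmax-style method each galaxy is cloned and its clones are spread uniformly in volume out to the maximum redshift at which that galaxy would still pass the flux limit. The sketch assumes a low-redshift Euclidean distance $d = cz/H_0$, ignores K- and evolution corrections, and uses hypothetical galaxies and function names (`shuffled_redshifts`, `zmax_from_flux_limit`, `vmax_redshifts`); it is a simplified illustration, not the exact implementation compared in the paper. The 17.77 magnitude limit in the example is the SDSS main galaxy sample $r$-band limit.

```python
import random

C_KM_S = 299_792.458  # speed of light in km/s

def shuffled_redshifts(galaxy_z, n_random, rng):
    """'Shuffled' method: each random point copies a randomly drawn galaxy redshift."""
    return [rng.choice(galaxy_z) for _ in range(n_random)]

def zmax_from_flux_limit(abs_mag, m_lim, h0=70.0):
    """Maximum redshift at which a galaxy of absolute magnitude abs_mag stays
    brighter than the apparent-magnitude limit m_lim.

    Assumes d = c z / H0 (low-z Euclidean approximation) and no K- or
    evolution corrections; both are simplifications of this sketch.
    """
    d_max_mpc = 10.0 ** ((m_lim - abs_mag - 25.0) / 5.0)  # distance modulus in Mpc
    return h0 * d_max_mpc / C_KM_S

def vmax_redshifts(abs_mags, m_lim, z_min, z_max_survey, n_clones, rng, h0=70.0):
    """Vmax-style method: clone each galaxy n_clones times and place the clones
    uniformly in (Euclidean) volume between z_min and that galaxy's own z_max."""
    out = []
    for mag in abs_mags:
        z_hi = min(zmax_from_flux_limit(mag, m_lim, h0), z_max_survey)
        if z_hi <= z_min:
            continue  # this galaxy could not be observed anywhere in the survey range
        # With d proportional to z, uniform in volume means uniform in z**3.
        lo3, hi3 = z_min ** 3, z_hi ** 3
        for _ in range(n_clones):
            out.append((lo3 + rng.random() * (hi3 - lo3)) ** (1.0 / 3.0))
    return out

# Hypothetical example galaxies.
rng = random.Random(1)
galaxy_z = [0.05, 0.08, 0.12]
abs_mags = [-22.0, -20.0, -18.0]
rand_shuffled = shuffled_redshifts(galaxy_z, 1000, rng)
rand_vmax = vmax_redshifts(abs_mags, m_lim=17.77, z_min=0.01,
                           z_max_survey=0.2, n_clones=1000, rng=rng)
print(len(rand_shuffled), len(rand_vmax))
```

Because each clone inherits its parent's luminosity-dependent $z_{\max}$, the Vmax randoms follow the radial selection of each type of galaxy, whereas shuffled randoms copy the observed redshifts, including any radial clustering in the data; this is commonly cited as one reason shuffled randoms can suppress the measured clustering.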