Source author record

Youcai Zhang

Youcai Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: astro-ph.CO, astro-ph.GA, Computer Vision, Machine Learning

Catalog footprint

What is connected

18 works

4 topics

4 close collaborators

preprint, 2022, arXiv

An Extended Halo-based Group/Cluster finder: application to the DESI legacy imaging surveys DR8

We extend the halo-based group finder developed by Yang et al. (2005a) to use data simultaneously with either photometric or spectroscopic redshifts. A mock galaxy redshift survey constructed from a high-resolution N-body simulation is used to evaluate the performance of this extended group finder. For galaxies with magnitude ${\rm z}\le 21$ and redshift $0<z\le 1.0$ in the DESI legacy imaging surveys (the Legacy Surveys), our group finder successfully identifies more than 60% of the members in about 90% of halos with mass $\gtrsim 10^{12.5}\,h^{-1}{\rm M}_\odot$. Detected groups with mass $\gtrsim 10^{12.0}\,h^{-1}{\rm M}_\odot$ have a purity (the fraction of true groups) greater than 90%. The halo mass assigned to each group has an uncertainty of about 0.2 dex at the high mass end $\gtrsim 10^{13.5}\,h^{-1}{\rm M}_\odot$ and 0.40 dex at the low mass end. Groups with more than 10 members have a redshift accuracy of $\sim 0.008$. We apply this group finder to the Legacy Surveys DR8 and find 5.2 million groups with at least 3 members. About 387,000 of these groups have at least 10 members. The resulting catalog, containing 3D coordinates, richness, halo masses, and total group luminosities, is made publicly available.
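The membership statistic quoted above (the fraction of a halo's true members that the finder recovers) can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the input format of galaxy ID lists, and the default threshold of 0.6 are assumptions made for the example.

```python
def member_completeness(true_members, found_members):
    """Fraction of a halo's true member galaxies recovered in the matched group."""
    true_set = set(true_members)
    if not true_set:
        raise ValueError("halo has no true members")
    return len(true_set & set(found_members)) / len(true_set)


def fraction_of_halos_recovered(pairs, threshold=0.6):
    """Share of halos whose member completeness exceeds `threshold`.

    `pairs` is a list of (true_members, found_members) tuples, one per halo
    (a hypothetical input format chosen for this sketch).
    """
    hits = sum(1 for true, found in pairs
               if member_completeness(true, found) > threshold)
    return hits / len(pairs)
```

With mock data, where the true halo membership is known, running `fraction_of_halos_recovered` over all halos above a mass cut would give a number comparable to the "about 90% of halos" figure quoted in the abstract.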

preprint, 2022, arXiv

Elucidating Galaxy Assembly Bias in SDSS

We investigate the level of galaxy assembly bias in the Sloan Digital Sky Survey (SDSS) main galaxy sample using ELUCID, a state-of-the-art constrained simulation that accurately reconstructed the initial density perturbations within the SDSS volume. On top of the ELUCID haloes, we develop an extended HOD model that includes the assembly bias of central and satellite galaxies, parameterized as $\mathcal{Q}_\mathrm{cen}$ and $\mathcal{Q}_\mathrm{sat}$, respectively, to predict a suite of one- and two-point observables. In particular, our fiducial constraint employs the probability distribution of the galaxy number counts measured on $8\,\mathrm{Mpc}\,h^{-1}$ scales, $N_8^g$, and the projected cross-correlation functions of quintiles of galaxies selected by $N_8^g$ with our entire galaxy sample. We perform extensive tests of the efficacy of our method by fitting the same observables to mock data using both constrained and non-constrained simulations. We discover that in many cases the level of cosmic variance between the two simulations can produce biased constraints that lead to an erroneous detection of galaxy assembly bias if the non-constrained simulation is used. When applying our method to the SDSS data, the ELUCID reconstruction effectively removes an otherwise strong degeneracy between cosmic variance and galaxy assembly bias in SDSS, enabling us to derive an accurate and stringent constraint on the latter. Our fiducial ELUCID constraint, for galaxies above a stellar mass threshold $M_*{=}10^{10.2}\,h^{-2}\,M_\odot$, is $\mathcal{Q}_\mathrm{cen}{=}{-}0.09\pm{0.05}$ and $\mathcal{Q}_\mathrm{sat}{=}0.09\pm{0.10}$, indicating no evidence for a significant ($>2\sigma$) galaxy assembly bias in the local Universe probed by SDSS. Finally, our method provides a promising path to the robust modelling of the galaxy-halo connection within future surveys like DESI and PFS.

preprint, 2022, arXiv

IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training

Vision-Language Pre-training (VLP) with large-scale image-text pairs has demonstrated superior performance in various fields. However, the image-text pairs co-occurring on the Internet typically lack explicit alignment information, which is suboptimal for VLP. Existing methods adopt an off-the-shelf object detector to utilize additional image tag information. However, the object detector is time-consuming and can only identify the pre-defined object categories, limiting the model capacity. Inspired by the observation that the texts incorporate incomplete fine-grained image information, we introduce IDEA, which stands for increasing text diversity via online multi-label recognition for VLP. IDEA shows that multi-label learning with image tags extracted from the texts can be jointly optimized during VLP. Moreover, IDEA can identify valuable image tags online to provide more explicit textual supervision. Comprehensive experiments demonstrate that IDEA can significantly boost the performance on multiple downstream datasets with a small extra computational cost.

preprint, 2022, arXiv

Towards Communication-Efficient and Privacy-Preserving Federated Representation Learning

This paper investigates the feasibility of federated representation learning under the constraints of communication cost and privacy protection. Existing works either conduct annotation-guided local training, which requires frequent communication, or aggregate the client models via weight averaging, which has potential risks of privacy exposure. To tackle the above problems, we first identify that self-supervised contrastive local training is robust against non-identically distributed data, which provides the feasibility of longer local training and thus reduces the communication cost. Then, based on this robustness, we propose a novel Federated representation Learning framework with Ensemble Similarity Distillation (FLESD). At each round of communication, the server first gathers a fraction of the clients' inferred similarity matrices on a public dataset. Then it ensembles the similarity matrices and trains the global model via similarity distillation. We verify the effectiveness of FLESD by a series of empirical experiments and show that, despite stricter constraints, it achieves comparable results under multiple settings on multiple datasets.
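The server-side step described above (gather the clients' similarity matrices on a public dataset, then ensemble them) can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity and plain element-wise averaging are choices made for this example, and the function names are hypothetical. The abstract does not specify how FLESD computes or combines the matrices.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities of one client's embeddings of the public samples."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


def ensemble_similarity(matrices):
    """Element-wise mean of the clients' similarity matrices (assumed ensembling rule)."""
    k = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Two hypothetical clients embed the same two public samples.
client_a = cosine_similarity_matrix([[1.0, 0.0], [0.0, 1.0]])
client_b = cosine_similarity_matrix([[1.0, 0.0], [1.0, 0.0]])
target = ensemble_similarity([client_a, client_b])
```

Only the matrices leave the clients in this sketch, not model weights or raw data, which is the property the abstract contrasts with weight averaging. The global model would then be trained so that its own similarity matrix on the public data matches `target`.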

preprint, 2020, arXiv

Prime-Aware Adaptive Distillation

Knowledge distillation (KD) aims to improve the performance of a student network by mimicking the knowledge from a powerful teacher network. Existing methods focus on studying what knowledge should be transferred and treat all samples equally during training. This paper introduces adaptive sample weighting to KD. We discover that previous effective hard mining methods are not appropriate for distillation. Furthermore, we propose Prime-Aware Adaptive Distillation (PAD) by the incorporation of uncertainty learning. PAD perceives the prime samples in distillation and then emphasizes their effect adaptively. PAD is fundamentally different from and would refine existing methods with the innovative view of unequal training. For this reason, PAD is versatile and has been applied in various tasks including classification, metric learning, and object detection. With ten teacher-student combinations on six datasets, PAD promotes the performance of existing distillation methods and outperforms recent state-of-the-art methods.
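The abstract does not give the loss, so the following is a sketch of one common uncertainty-learning form, assumed here for illustration rather than taken from the paper. Each sample's squared teacher-student distance is divided by a learned variance, and a log-variance penalty keeps the variance from growing without bound. The function name and inputs are hypothetical.

```python
import math


def uncertainty_weighted_loss(sq_errors, log_vars):
    """Mean of e_i * exp(-s_i) + s_i, where s_i = log(sigma_i^2).

    sq_errors: squared teacher-student distance per sample.
    log_vars:  learned log-variance per sample (assumed to come from the student).
    """
    terms = [e * math.exp(-s) + s for e, s in zip(sq_errors, log_vars)]
    return sum(terms) / len(terms)


# With zero log-variance every sample has weight 1 (plain mean squared error).
baseline = uncertainty_weighted_loss([1.0, 4.0], [0.0, 0.0])

# Raising the variance of the high-error sample shrinks its weight to 1/4.
reweighted = uncertainty_weighted_loss([1.0, 4.0], [0.0, math.log(4.0)])
```

For a fixed error `e`, the per-sample term is minimized at `s = log(e)`, so samples the student fits poorly receive a larger variance and a smaller weight. Under this assumed form, that is one way low-uncertainty samples could end up emphasized adaptively.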

preprint, 2020, arXiv

Relating the structure of dark matter halos to their assembly and environment

We use a large $N$-body simulation to study the relation of the structural properties of dark matter halos to their assembly history and environment. The complexity of individual halo assembly histories can be well described by a small number of principal components (PCs), which, compared to formation times, provide a more complete description of halo assembly histories and have a stronger correlation with halo structural properties. Using decision trees built with the random ensemble method, we find that about $60\%$, $10\%$, and $20\%$ of the variances in halo concentration, axis ratio, and spin, respectively, can be explained by combining four dominating predictors: the first PC of the assembly history, halo mass, and two environment parameters. Halo concentration is dominated by halo assembly. The local environment is found to be important for the axis ratio and spin but is degenerate with halo assembly. The small percentages of the variance in the axis ratio and spin that are explained by known assembly and environmental factors suggest that the variance is produced by many nuanced factors and should be modeled as such. The relations between halo intrinsic properties and environment are weak compared to their variances, with the anisotropy of the local tidal field having the strongest correlation with halo properties. Our method of dimension reduction and regression can help simplify the characterization of the halo population and clarify the degeneracy among halo properties.

preprint, 2016, arXiv

An empirical model to form and evolve galaxies in dark matter halos

Based on the star formation histories (SFH) of galaxies in halos of different masses, we develop an empirical model to grow galaxies in dark matter halos. This model has very few ingredients, any of which can be associated with observational data and thus be efficiently assessed. By applying this model to a very high resolution cosmological $N$-body simulation, we predict a number of galaxy properties that are a very good match to relevant observational data. Namely, for both centrals and satellites, the galaxy stellar mass function (SMF) up to redshift $z\simeq4$ and the conditional stellar mass functions (CSMF) in the local universe are in good agreement with observations. In addition, the 2-point correlation is well predicted in the different stellar mass ranges explored by our model. Furthermore, after applying stellar population synthesis models to our stellar composition as a function of redshift, we find that the luminosity functions in $^{0.1}u$, $^{0.1}g$, $^{0.1}r$, $^{0.1}i$ and $^{0.1}z$ bands agree quite well with the SDSS observational results down to an absolute magnitude of about -17.0. The SDSS conditional luminosity function (CLF) itself is predicted well. Finally, the cold gas is derived from the star formation rate (SFR) to predict the HI gas mass within each mock galaxy. We find a remarkably good match to observed HI-to-stellar mass ratios. These features ensure that such galaxy/gas catalogs can be used to generate reliable mock redshift surveys.

preprint, 2016, arXiv

ELUCID - Exploring the Local Universe with reConstructed Initial Density field III: Constrained Simulation in the SDSS Volume

A method we developed recently for the reconstruction of the initial density field in the nearby Universe is applied to the Sloan Digital Sky Survey Data Release 7. A high-resolution N-body constrained simulation (CS) of the reconstructed initial condition, with $3072^3$ particles evolved in a 500 Mpc/h box, is carried out and analyzed in terms of the statistical properties of the final density field and its relation with the distribution of SDSS galaxies. We find that the statistical properties of the cosmic web and the halo populations are accurately reproduced in the CS. The galaxy density field is strongly correlated with the CS density field, with a bias that depends on both galaxy luminosity and color. Our further investigations show that the CS provides robust quantities describing the environments within which the observed galaxies and galaxy systems reside. Cosmic variance is greatly reduced in the CS so that the statistical uncertainties can be controlled effectively even for samples of small volumes.

preprint, 2016, arXiv

Galaxy groups in the 2MASS Redshift Survey

A galaxy group catalog is constructed from the 2MASS Redshift Survey (2MRS) with the use of a halo-based group finder. The halo mass associated with a group is estimated using a `GAP' method based on the luminosity of the central galaxy and its gap with other member galaxies. Tests using mock samples show that this method is reliable, particularly for poor systems containing only a few members. On average 80% of all the groups have completeness >0.8, and about 65% of the groups have zero contamination. Halo masses are estimated with a typical uncertainty $\sim 0.35\,{\rm dex}$. The application of the group finder to the 2MRS gives 29,904 groups from a total of 43,246 galaxies at $z \leq 0.08$, with 5,286 groups having two or more members. Some basic properties of this group catalog are presented, and comparisons are made with other group catalogs in overlap regions. With a depth to $z\sim 0.08$ and uniformly covering about 91% of the whole sky, this group catalog provides a useful data base to study galaxies in the local cosmic web, and to reconstruct the mass distribution in the local Universe.

preprint, 2016, arXiv

Mapping the real space distributions of galaxies in SDSS DR7: I. Two Point Correlation Functions

Using a method to correct redshift space distortion (RSD) for individual galaxies, we mapped the real space distributions of galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7). We use an ensemble of mock catalogs to demonstrate the reliability of our method. Here, as the first paper in a series, we mainly focus on the two point correlation function (2PCF) of galaxies. Overall the 2PCF measured in the reconstructed real space for galaxies brighter than $^{0.1}{\rm M}_r-5\log h=-19.0$ agrees with the direct measurement to an accuracy better than the measurement error due to cosmic variance, if the reconstruction uses the correct cosmology. Applying the method to the SDSS DR7, we construct a real space version of the main galaxy catalog, which contains 396,068 galaxies in the North Galactic Cap with redshifts in the range $0.01 \leq z \leq 0.12$. The Sloan Great Wall, the largest known structure in the nearby Universe, is not as dominant an over-dense structure as it appears to be in redshift space. We measure the 2PCFs in reconstructed real space for galaxies of different luminosities and colors. All of them show clear deviations from single power-law forms, and reveal clear transitions from 1-halo to 2-halo terms. A comparison with the corresponding 2PCFs in redshift space nicely demonstrates how RSDs boost the clustering power on large scales (by about $40-50\%$ at scales $\sim 10 h^{-1}{\rm {Mpc}}$) and suppress it on small scales (by about $70-80\%$ at a scale of $0.3 h^{-1}{\rm {Mpc}}$).

preprint, 2014, arXiv

Connections between galaxy mergers and Starburst: evidence from local Universe

Major mergers and interactions between gas-rich galaxies with comparable masses are thought to be the main triggers of starbursts. In this work, we study, for a large stellar mass range, the interaction rate of the starburst galaxies in the local universe. We focus independently on central and satellite star forming galaxies extracted from the Sloan Digital Sky Survey. Here the starburst galaxies are selected in the star formation rate (SFR) versus stellar mass plane as those with SFR five times larger than the median value found for "star forming" galaxies of the same stellar mass. Through visual inspection of their images together with close companions determined using spectroscopic redshifts, we find that ~50% of the "starburst" populations show evident merger features, i.e., tidal tails, bridges between galaxies, double cores and close companions. In contrast, in the control sample we selected from the normal star forming galaxies, only ~19% of galaxies are associated with evident mergers. The interaction rates may increase by ~5% for the starburst sample and 2% for the control sample if close companions determined using photometric redshifts are considered. The contrast of the merger rate between the two samples strengthens the hypothesis that mergers and interactions are indeed the main causes of starbursts.
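The selection rule stated above (SFR more than five times the median of star forming galaxies of the same stellar mass) can be sketched as follows. This is a minimal illustration: the function name, the fixed-width binning in log stellar mass, and the 0.5 dex bin width are assumptions for the example, since the abstract does not say how "the same stellar mass" is evaluated.

```python
from collections import defaultdict
from statistics import median


def flag_starbursts(log_mstar, sfr, bin_width=0.5, factor=5.0):
    """Flag galaxies whose SFR exceeds `factor` times the median SFR of their mass bin.

    log_mstar: log10 stellar mass of each star forming galaxy.
    sfr:       star formation rate of each galaxy, in the same order.
    """
    bins = defaultdict(list)
    for m, s in zip(log_mstar, sfr):
        bins[int(m // bin_width)].append(s)
    medians = {b: median(values) for b, values in bins.items()}
    return [s > factor * medians[int(m // bin_width)]
            for m, s in zip(log_mstar, sfr)]


# Toy sample spanning two mass bins: [10.0, 10.5) and [10.5, 11.0).
masses = [10.1, 10.2, 10.3, 10.4, 10.6, 10.7, 10.8]
rates = [1.0, 1.2, 0.8, 7.0, 2.0, 2.2, 9.0]
flags = flag_starbursts(masses, rates)
```

In the toy sample only the fourth galaxy is flagged: its SFR of 7.0 exceeds five times the bin median of 1.1, while the last galaxy's SFR of 9.0 falls short of five times its bin median of 2.2.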

preprint, 2014, arXiv

Spin alignments of spiral galaxies within the large-scale structure from SDSS DR7

Using a sample of spiral galaxies selected from the Sloan Digital Sky Survey Data Release 7 (SDSS DR7) and Galaxy Zoo 2 (GZ2), we investigate the alignment of spin axes of spiral galaxies with their surrounding large scale structure, which is characterized by the large-scale tidal field reconstructed from the data using galaxy groups above a certain mass threshold. We find that the spin axes of spiral galaxies have only a weak tendency to be aligned with (or perpendicular to) the intermediate (or minor) axis of the local tidal tensor. The signal is the strongest in a cluster environment where all three eigenvalues of the local tidal tensor are positive. Compared to the alignments between halo spins and local tidal field obtained in N-body simulations, the above observational results are in best agreement with those for the spins of inner regions of halos, suggesting that the disk material traces the angular momentum of dark matter halos in the inner regions.
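The environment label used above depends on the signs of the tidal tensor eigenvalues. The abstract only states the cluster case (all three positive). The sketch below assumes the usual eigenvalue-counting convention for the remaining cases (two positive for a filament, one for a sheet, none for a void), and both the function name and the adjustable threshold are hypothetical.

```python
def classify_web(eigenvalues, threshold=0.0):
    """Classify an environment from the three eigenvalues of the local tidal tensor.

    Counts eigenvalues above `threshold`:
    3 -> cluster, 2 -> filament, 1 -> sheet, 0 -> void (assumed convention).
    """
    if len(eigenvalues) != 3:
        raise ValueError("expected exactly three eigenvalues")
    n_positive = sum(1 for lam in eigenvalues if lam > threshold)
    return ("void", "sheet", "filament", "cluster")[n_positive]
```

For example, `classify_web((0.5, 0.2, 0.1))` returns `"cluster"`, the environment in which the abstract reports the strongest spin alignment signal.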

preprint, 2013, arXiv

Alignments of galaxies within cosmic filaments from SDSS DR7

Using a sample of galaxy groups selected from the Sloan Digital Sky Survey Data Release 7 (SDSS DR7), we examine the alignment between the orientation of galaxies and their surrounding large scale structure in the context of the cosmic web. The latter is quantified using the large-scale tidal field, reconstructed from the data using galaxy groups above a certain mass threshold. We find that the major axes of galaxies in filaments tend to be preferentially aligned with the directions of the filaments, while galaxies in sheets have their major axes preferentially aligned parallel to the plane of the sheets. The strength of this alignment signal is strongest for red, central galaxies, and in good agreement with that of dark matter halos in N-body simulations. This suggests that red, central galaxies are well aligned with their host halos, in quantitative agreement with previous studies based on the spatial distribution of satellite galaxies. There is a luminosity and mass dependence: brighter and more massive galaxies in filaments and sheets have stronger alignment signals. We also find that the orientation of galaxies is aligned with the eigenvector associated with the smallest eigenvalue of the tidal tensor. These observational results indicate that galaxy formation is affected by large-scale environments, and strongly suggest that galaxies are aligned with each other over scales comparable to those of sheets and filaments in the cosmic web.

preprint, 2013, arXiv

Nonlinearities in modified gravity cosmology. II. Impacts of modified gravity on the halo properties

The statistics of dark matter halos is an essential component of understanding the nonlinear evolution in modified gravity cosmology. Based on a series of modified gravity N-body simulations, we investigate the halo mass function, concentration and bias. We model the impact of modified gravity by a single parameter ζ, which determines the enhancement of particle acceleration with respect to GR, given the identical mass distribution (ζ=1 in GR). We select snapshot redshifts such that the linear matter power spectra of different gravity models are identical, in order to isolate the impact of gravity beyond modifying the linear growth rate. At the baseline redshift corresponding to z_S=1.2 in the standard ΛCDM, for a 10% deviation from GR (|ζ-1|=0.1), the measured halo mass function can differ by about 5-10%, the halo concentration by about 10-20%, while the halo bias differs significantly less. These results demonstrate that the halo mass function and/or the halo concentration are sensitive to the nature of gravity and may be used to make interesting constraints along this line.

preprint, 2012, arXiv

Evolution of the Galaxy - Dark Matter Connection and the Assembly of Galaxies in Dark Matter Halos

We present a new model to describe the galaxy-dark matter connection across cosmic time, which, unlike the popular subhalo abundance matching technique, is self-consistent in that it takes account of the facts that (i) subhalos are accreted at different times, and (ii) the properties of satellite galaxies may evolve after accretion. Using observations of galaxy stellar mass functions out to $z \sim 4$, the conditional stellar mass function at $z\sim 0.1$ obtained from SDSS galaxy group catalogues, and the two-point correlation function (2PCF) of galaxies at $z \sim 0.1$ as a function of stellar mass, we constrain the relation between galaxies and dark matter halos over the entire cosmic history from $z \sim 4$ to the present. This relation is then used to predict the median assembly histories of different stellar mass components within dark matter halos (central galaxies, satellite galaxies, and halo stars). We also make predictions for the 2PCFs of high-$z$ galaxies as a function of stellar mass. Our main findings are the following: (i) Our model reasonably fits all data within the observational uncertainties, indicating that the $Λ$CDM concordance cosmology is consistent with a wide variety of data regarding the galaxy population across cosmic time. (ii) ... [abridged]

preprint, 2012, arXiv

Measures of Galaxy Environment - I. What is "Environment"?

The influence of a galaxy's environment on its evolution has been studied and compared extensively in the literature, although differing techniques are often used to define environment. Most methods fall into two broad groups: those that use nearest neighbours to probe the underlying density field and those that use fixed apertures. The differences between the two inhibit a clean comparison between analyses and leave open the possibility that, even with the same data, different properties are actually being measured. In this work we apply twenty published environment definitions to a common mock galaxy catalogue constrained to look like the local Universe. We find that nearest neighbour-based measures best probe the internal densities of high-mass haloes, while at low masses the inter-halo separation dominates and acts to smooth out local density variations. The resulting correlation also shows that nearest neighbour galaxy environment is largely independent of dark matter halo mass. Conversely, aperture-based methods that probe super-halo scales accurately identify high-density regions corresponding to high mass haloes. Both methods show how galaxies in dense environments tend to be redder, with the exception of the largest apertures, but these are the strongest at recovering the background dark matter environment. We also warn against using photometric redshifts to define environment in all but the densest regions. When considering environment there are two regimes: the 'local environment' internal to a halo best measured with nearest neighbour and 'large-scale environment' external to a halo best measured with apertures. This leads to the conclusion that there is no universal environment measure and the most suitable method depends on the scale being probed.
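The two families of environment measures contrasted above can be sketched as follows. This is only an illustration: the abstract gives no formulas, so the surface density Σ_N = N / (π r_N²) for the nearest-neighbour measure, the projected 2D coordinates, and the function names are all assumptions made for this example.

```python
import math


def nth_neighbour_density(target, others, n=3):
    """Projected density from the distance to the n-th nearest neighbour.

    Uses sigma_n = n / (pi * r_n**2), a commonly used definition (assumed here).
    """
    distances = sorted(math.dist(target, p) for p in others)
    if n > len(distances):
        raise ValueError("not enough neighbours")
    r_n = distances[n - 1]
    return n / (math.pi * r_n ** 2)


def aperture_count(target, others, radius=1.0):
    """Number of neighbours inside a fixed circular aperture around the target."""
    return sum(1 for p in others if math.dist(target, p) <= radius)


# Hypothetical projected positions, in arbitrary length units.
galaxy = (0.0, 0.0)
neighbours = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
sigma_3 = nth_neighbour_density(galaxy, neighbours, n=3)
count = aperture_count(galaxy, neighbours, radius=2.5)
```

The design difference is visible in the signatures: the nearest-neighbour measure adapts its scale `r_n` to the local density, while the aperture measure fixes the scale and lets the count vary. That matches the abstract's split between a "local" and a "large-scale" regime.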

preprint, 2011, arXiv

An analytical model for the accretion of dark matter subhalos

An analytical model is developed for the mass function of cold dark matter subhalos at the time of accretion and for the distribution of their accretion times. Our model is based on the model of Zhao et al. (2009) for the median assembly histories of dark matter halos, combined with a simple log-normal distribution to describe the scatter in the main-branch mass at a given time for halos of the same final mass. Our model is simple, and can be used to predict the un-evolved subhalo mass function, the mass function of subhalos accreted at a given time, the accretion-time distribution of subhalos of a given initial mass, and the frequency of major mergers as a function of time. We test our model using high-resolution cosmological $N$-body simulations, and find that our model predictions match the simulation results remarkably well. Finally, we discuss the implications of our model for the evolution of subhalos in their hosts and for the construction of a self-consistent model to link galaxies and dark matter halos at different cosmic times.

preprint, 2010, arXiv

Genus statistics using the Delaunay tessellation field estimation method: (I) tests with the Millennium Simulation and the SDSS DR7

We study the topology of cosmic large-scale structure through the genus statistics, using galaxy catalogues generated from the Millennium Simulation and observational data from the latest Sloan Digital Sky Survey Data Release (SDSS DR7). We introduce a new method for constructing galaxy density fields and for measuring the genus statistics of their isodensity surfaces. It is based on a Delaunay tessellation field estimation (DTFE) technique that allows the definition of a piece-wise continuous density field and the exact computation of the topology of its polygonal isodensity contours, without introducing any free numerical parameter. Besides this new approach, we also employ the traditional approaches of smoothing the galaxy distribution with a Gaussian of fixed width, or of adaptively smoothing with a kernel that encloses a constant number of neighboring galaxies. Our results show that the Delaunay-based method extracts the largest amount of topological information. Unlike the traditional approach for genus statistics, it is able to discriminate between the different theoretical galaxy catalogues analyzed here, both in real space and in redshift space, even though they are based on the same underlying simulation model. In particular, the DTFE approach detects with high confidence a discrepancy of one of the semi-analytic models studied here compared with the SDSS data, while the other models are found to be consistent.
