Researcher profile

Arya Farahi

Arya Farahi contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
12 works
0 followers
6 topics
4 close collaborators

Published work

12 published items

Preprint · 2024 · arXiv

U-Trustworthy Models. Reliability, Competence, and Confidence in Decision-Making

With growing concerns regarding bias and discrimination in predictive models, the AI community has increasingly focused on assessing AI system trustworthiness. Conventionally, the trustworthy AI literature relies on the probabilistic framework and calibration as prerequisites for trustworthiness. In this work, we depart from this viewpoint by proposing a novel trust framework inspired by the philosophy literature on trust. We present a precise mathematical definition of trustworthiness, termed $\mathcal{U}$-trustworthiness, specifically tailored for a subset of tasks aimed at maximizing a utility function. We argue that a model's $\mathcal{U}$-trustworthiness is contingent upon its ability to maximize Bayes utility within this task subset. Our first set of results challenges the probabilistic framework by demonstrating its potential to favor less trustworthy models and introduce the risk of misleading trustworthiness assessments. Within the context of $\mathcal{U}$-trustworthiness, we prove that properly ranked models are inherently $\mathcal{U}$-trustworthy. Furthermore, we advocate for the adoption of the AUC metric as the preferred measure of trustworthiness. Backed by both theoretical guarantees and experimental validation, AUC enables robust evaluation of trustworthiness, thereby enhancing model selection and hyperparameter tuning to yield more trustworthy outcomes.
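
The abstract advocates AUC as a measure of trustworthiness. As a generic illustration (not the authors' code), AUC can be computed through its rank-sum (Mann-Whitney) identity: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the toy data are hypothetical. Because only the ordering of scores enters, any strictly increasing rescaling of the scores leaves the AUC unchanged, which fits the abstract's emphasis on properly ranked models.

```python
def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) identity; a hypothetical helper.

    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))                    # 0.75: 3 of 4 pairs ranked correctly
print(auc([s ** 3 for s in scores], labels))  # 0.75: monotone rescaling keeps AUC
```

The pairwise loop costs O(n_pos × n_neg); a sorting-based rank sum scales better for large samples but gives the same value.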

Preprint · 2022 · arXiv

Correlations of Dark Matter, Gas and Stellar Profiles in Dark Matter Halos

Halos of similar mass and redshift exhibit a large degree of variability in their differential properties, such as dark matter, hot gas, and stellar mass density profiles. This variability is an indicator of diversity in the formation history of these dark matter halos that is reflected in the coupling of scatters about the mean relations. In this work, we show that the strength of this coupling depends on the scale at which halo profiles are measured. By analyzing the outputs of the IllustrisTNG hydrodynamical cosmological simulations we report the radial- and mass-dependent couplings between the dark matter, hot gas, and stellar mass radial density profiles utilizing the population diversity in dark matter halos. We find that for halos of the same mass the scatters in the density of baryons and dark matter are strongly coupled at large scales ($r>R_{200}$); but the coupling between gas and dark matter density profiles fades near the core of halos ($r < 0.3 R_{200}$). We then show that the correlation between halo profile and integrated quantities induces a radius-dependent additive bias in the profile observables of halos when halos are selected on properties other than their mass. We discuss the impact of this effect on cluster abundance and cross-correlation cosmology with multi-wavelength cosmological surveys.

Preprint · 2022 · arXiv

HSC-XXL: Baryon budget of the 136 XXL Groups and Clusters

We present our determination of the baryon budget for an X-ray-selected XXL sample of 136 galaxy groups and clusters spanning nearly two orders of magnitude in mass ($M_{500}\sim 10^{13}-10^{15}M_\odot$) and the redshift range $0< z < 1$. Our joint analysis is based on the combination of HSC-SSP weak-lensing mass measurements, XXL X-ray gas mass measurements, and HSC and SDSS multiband photometry. We carry out a Bayesian analysis of multivariate mass-scaling relations of gas mass, galaxy stellar mass, stellar mass of brightest cluster galaxies (BCGs), and soft-band X-ray luminosity, taking into account the intrinsic covariance between cluster properties, selection effects, weak-lensing mass calibration, and the observational error covariance matrix. The mass-dependent slope of the gas mass--total mass ($M_{500}$) relation is found to be $1.29_{-0.10}^{+0.16}$, which is steeper than the self-similar prediction of unity, whereas the slope of the stellar mass--total mass relation is shallower than unity, $0.85_{-0.09}^{+0.12}$. The BCG stellar mass weakly depends on cluster mass with a slope of $0.49_{-0.10}^{+0.11}$. The baryon, gas mass, and stellar mass fractions as a function of $M_{500}$ agree with the results from numerical simulations and previous observations. We successfully constrain the full intrinsic covariance of the baryonic contents. The BCG stellar mass shows the largest intrinsic scatter at a given halo total mass, followed in order by stellar mass and gas mass. We find a significant positive intrinsic correlation coefficient between total (and satellite) stellar mass and BCG stellar mass and no evidence for intrinsic correlation between gas mass and stellar mass. All the baryonic components show no redshift evolution.
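
To make the quoted slopes concrete: if a property scales as $X \propto M_{500}^{\beta}$, its mass fraction $X/M_{500}$ scales as $M_{500}^{\beta-1}$. The sketch below assumes a single power law with the central slope values across the full mass range, ignoring the quoted uncertainties and any curvature, and evaluates how much the gas and stellar fractions change over the two decades in mass spanned by the sample. The function name is hypothetical.

```python
def fraction_ratio(slope, mass_ratio):
    """Change in X/M when M grows by mass_ratio, assuming X proportional to M**slope."""
    return mass_ratio ** (slope - 1.0)


# Central slopes quoted in the abstract; two decades in M500 (1e13 -> 1e15 Msun).
gas = fraction_ratio(1.29, 100.0)
stars = fraction_ratio(0.85, 100.0)
print(f"gas fraction grows by ~{gas:.1f}x")          # ~3.8x
print(f"stellar fraction changes by ~{stars:.2f}x")  # ~0.50x
```

A slope of exactly one (self-similar scaling) would leave the fraction unchanged, so the distance of each slope from unity is what sets these factors.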

Preprint · 2022 · arXiv

KLLR: A scale-dependent, multivariate model class for regression analysis

The underlying physics of astronomical systems governs the relation between their measurable properties. Consequently, quantifying the statistical relationships between system-level observable properties of a population offers insights into the astrophysical drivers of that class of systems. While purely linear models capture behavior over a limited range of system scale, the fact that astrophysics is ultimately scale-dependent implies the need for a more flexible approach to describing population statistics over a wide dynamic range. For such applications, we introduce and implement a class of Kernel-Localized Linear Regression (KLLR) models. KLLR is a natural extension to the commonly used linear models that allows the parameters of the linear model (normalization, slope, and covariance matrix) to be scale-dependent. KLLR performs inference in two steps: (1) it estimates the mean relation between a set of independent variables and a dependent variable; and (2) it estimates the conditional covariance of the dependent variables given a set of independent variables. We demonstrate the model's performance in a simulated setting and showcase an application of the proposed model in analyzing the baryonic content of dark matter halos. As a part of this work, we publicly release a Python implementation of the KLLR method.
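
The abstract describes KLLR as a linear fit whose normalization, slope, and scatter are allowed to vary with scale. A minimal pure-Python sketch of that idea, for one independent and one dependent variable, weights every data point with a Gaussian kernel centred on the evaluation point and performs a weighted least-squares fit there. The helper `kllr_point`, the Gaussian kernel choice, and the `width` parameter are illustrative assumptions; this is not the authors' released Python package or its interface.

```python
import math


def kllr_point(x, y, x0, width):
    """Kernel-weighted linear fit of y on x around x0 (hypothetical helper).

    Returns (normalization at x0, local slope, local scatter), where the
    scatter is the kernel-weighted RMS residual about the local line.
    """
    # Gaussian kernel weights centred on the evaluation point x0.
    w = [math.exp(-0.5 * ((xi - x0) / width) ** 2) for xi in x]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Step 1: local mean relation from weighted least squares.
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    # Step 2: local (conditional) variance of y about that relation.
    var = sum(
        wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)
    ) / sw
    return intercept + slope * x0, slope, math.sqrt(var)


# A curved relation: a single global line would miss the changing slope.
x = [i * 0.1 for i in range(-40, 61)]
y = [xi ** 2 for xi in x]
for x0 in (0.0, 1.0, 2.0, 3.0):
    norm, slope, scatter = kllr_point(x, y, x0, width=0.5)
    print(f"x0={x0:.1f}  norm={norm:.2f}  slope={slope:.2f}")
```

Sweeping `x0` traces how the local slope changes with scale; the kernel width sets the trade-off between locality and noise in the local estimates.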

Preprint · 2022 · arXiv

Optical selection bias and projection effects in stacked galaxy cluster weak lensing

Cosmological constraints from current and upcoming galaxy cluster surveys are limited by the accuracy of cluster mass calibration. In particular, optically identified galaxy clusters are prone to selection effects that can bias the weak lensing mass calibration. We investigate the selection bias of the stacked cluster lensing signal associated with optically selected clusters, using clusters identified by the redMaPPer algorithm in the Buzzard simulations as a case study. We find that at a given cluster halo mass, the residuals of redMaPPer richness and weak lensing signal are positively correlated. As a result, for a given richness selection, the stacked lensing signal is biased high compared with what we would expect from the underlying halo mass probability distribution. The cluster lensing selection bias can thus lead to overestimated mean cluster mass and biased cosmology results. We show that the lensing selection bias exhibits a strong scale-dependence and is approximately 20 to 60 percent for $ΔΣ$ at large scales. This selection bias largely originates from spurious member galaxies within ±20 to 60 Mpc/h along the line of sight, highlighting the importance of quantifying projection effects associated with the broad redshift distribution of member galaxies in photometric cluster surveys. While our results qualitatively agree with those in the literature, accurate quantitative modelling of the selection bias is needed to achieve the goals of cluster lensing cosmology and will require synthetic catalogues covering a wide range of galaxy-halo connection models.

Preprint · 2021 · arXiv

Galaxy Velocity Bias in Cosmological Simulations: Towards Percent-level Calibration

Galaxy cluster masses, rich with cosmological information, can be estimated from internal dark matter (DM) velocity dispersions, which in turn can be observationally inferred from satellite galaxy velocities. However, galaxies are biased tracers of the DM, and the bias can vary with host halo and galaxy properties as well as with time. We precisely calibrate the velocity bias, $b_v$, defined as the ratio of galaxy and DM velocity dispersions, as a function of redshift, host halo mass, and galaxy stellar mass threshold ($M_{\star,\rm sat}$), for massive halos ($M_{200c} > 10^{13.5}\,M_\odot$) from five cosmological simulations: IllustrisTNG, Magneticum, BAHAMAS + MACSIS, The Three Hundred Project, and MultiDark Planck-2. We first compare scaling relations for galaxy and DM velocity dispersion across simulations; the former is estimated using a new ensemble velocity likelihood method that is unbiased for low galaxy counts per halo, while the latter uses a local linear regression. The simulations show consistent trends of $b_v$ increasing with $M_{200c}$ and decreasing with redshift and $M_{\star,\rm sat}$. The ensemble-estimated theoretical uncertainty in $b_v$ is 2-3% but becomes percent-level when considering only the three highest-resolution simulations. We update the mass-richness normalization previously estimated by Farahi et al. (2016) for an SDSS redMaPPer cluster sample. The improved accuracy of our $b_v$ estimates reduces the mass normalization uncertainty from 22% to 8%, demonstrating that dynamical estimation techniques can be competitive with weak lensing in calibrating population mean masses. We discuss necessary steps for further improving this precision. Our estimates for $b_v(M_{200c}, M_{\star,\rm sat}, z)$ are made publicly available.
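
The abstract defines the velocity bias $b_v$ as the ratio of galaxy and DM velocity dispersions. A naive version of that definition, using population standard deviations of line-of-sight velocities, is sketched below with made-up velocities. It is not the ensemble velocity likelihood method described in the abstract, which is designed to stay unbiased when each halo hosts only a few galaxies; the function name is hypothetical.

```python
import statistics


def velocity_bias(galaxy_velocities, dm_velocities):
    """Naive b_v = sigma_gal / sigma_DM from population standard deviations."""
    sigma_gal = statistics.pstdev(galaxy_velocities)
    sigma_dm = statistics.pstdev(dm_velocities)
    return sigma_gal / sigma_dm


# Made-up line-of-sight velocities (km/s) relative to the halo centre.
galaxies = [-300.0, 0.0, 300.0]
dark_matter = [-200.0, 0.0, 200.0]
print(velocity_bias(galaxies, dark_matter))
```

With only a handful of galaxies per halo, this per-halo ratio is noisy and biased, which is the motivation the abstract gives for estimating dispersions across the ensemble instead.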

Preprint · 2021 · arXiv

SHAPing the Gas: Understanding Gas Shapes in Dark Matter Haloes with Interpretable Machine Learning

The non-spherical shapes of dark matter and gas distributions introduce systematic uncertainties that affect observable-mass relations and selection functions of galaxy groups and clusters. However, the triaxial gas distributions depend on the non-linear physical processes of halo formation histories and baryonic physics, which are challenging to model accurately. In this study we explore a machine learning approach for modelling the dependence of gas shapes on dark matter and baryonic properties. With data from the IllustrisTNG hydrodynamical cosmological simulations, we develop a machine learning pipeline that applies XGBoost, an implementation of gradient boosted decision trees, to predict radial profiles of gas shapes from halo properties. We show that XGBoost models can accurately predict gas shape profiles in dark matter haloes. We also explore model interpretability with SHAP, a method that identifies the most predictive properties at different halo radii. We find that baryonic properties best predict gas shapes in halo cores, whereas dark matter shapes are the main predictors in the halo outskirts. This work demonstrates the power of interpretable machine learning in modelling observable properties of dark matter haloes in the era of multi-wavelength cosmological surveys.

Preprint · 2021 · arXiv

The Role of Machine Learning in the Next Decade of Cosmology

In recent years, machine learning (ML) methods have remarkably improved how cosmologists can interpret data. The next decade will bring new opportunities for data-driven cosmological discovery, but will also present new challenges for adopting ML methodologies and understanding the results. ML could transform our field, but this transformation will require the astronomy community to both foster and promote interdisciplinary research endeavors.

Preprint · 2020 · arXiv

Aging Halos: Implications of the Magnitude Gap on Conditional Statistics of Stellar and Gas Properties of Massive Halos

The cold dark matter model predicts that large-scale structure grows hierarchically: small dark matter halos form first and then grow gradually through continuous mergers and accretion. These halos host the majority of baryonic matter in the Universe in the form of hot gas and a cold stellar phase. Determining how baryons are partitioned into these phases requires detailed modeling of galaxy formation and assembly history. It has been speculated that the formation time of halos of the same mass might be correlated with their baryonic content. To evaluate this hypothesis, we employ halos of mass above $10^{14}\,M_{\odot}$ realized in the TNG300 simulation of the IllustrisTNG project. Formation time is not directly observable; hence, we rely on the magnitude gap between the brightest and the fourth-brightest member galaxy of the halo, which has been shown to trace the formation time of the host halo. We compute the conditional statistics of the stellar and gas content of halos conditioned on their total mass and magnitude gap. We find a strong correlation between magnitude gap and gas mass, BCG stellar mass, and satellite galaxy stellar mass, but not the total stellar mass of the halo. Conditioning on the magnitude gap can reduce the scatter about the halo property--halo mass relation and has a significant impact on the conditional covariance. The reduction in scatter can be as large as 30%, which implies more accurate halo mass prediction. Incorporating the magnitude gap has the potential to improve cosmological constraints from halo abundance and allows us to gain insight into baryon evolution within these systems.
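
The magnitude gap used here is the difference between the magnitudes of the brightest and the fourth-brightest member galaxies. Because astronomical magnitudes decrease as brightness increases, the gap is the fourth-smallest magnitude minus the smallest one. A minimal sketch, with hypothetical member magnitudes and a hypothetical helper name:

```python
def magnitude_gap(member_magnitudes, n=4):
    """Magnitude difference between the brightest and the n-th brightest member."""
    if len(member_magnitudes) < n:
        raise ValueError(f"need at least {n} member galaxies")
    ranked = sorted(member_magnitudes)  # smallest magnitude = brightest galaxy
    return ranked[n - 1] - ranked[0]


# Hypothetical absolute magnitudes of one halo's member galaxies.
members = [-22.5, -21.0, -23.1, -20.4, -19.8]
print(round(magnitude_gap(members), 2))  # 2.7
```

In the analysis described above, this per-halo gap would then serve, together with total mass, as a conditioning variable for the stellar and gas statistics.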

Preprint · 2020 · arXiv

Dark Energy Survey Year 1 Results: Cosmological Constraints from Cluster Abundances and Weak Lensing

We perform a joint analysis of the counts and weak lensing signal of redMaPPer clusters selected from the Dark Energy Survey (DES) Year 1 dataset. Our analysis uses the same shear and source photometric redshift estimates as were used in the DES combined probes analysis. Our analysis results in surprisingly low values for $S_8 =σ_8(Ω_{\rm m}/0.3)^{0.5}= 0.65\pm 0.04$, driven by a low matter density parameter, $Ω_{\rm m}=0.179^{+0.031}_{-0.038}$, with $σ_8-Ω_{\rm m}$ posteriors in $2.4σ$ tension with the DES Y1 3x2pt results and in $5.6σ$ tension with the Planck CMB analysis. These results include the impact of post-unblinding changes to the analysis, which did not improve the level of consistency with other data sets compared to the results obtained at the unblinding. The fact that multiple cosmological probes (supernovae, baryon acoustic oscillations, cosmic shear, galaxy clustering and CMB anisotropies), and other galaxy cluster analyses all favor significantly higher matter densities suggests the presence of systematic errors in the data or an incomplete modeling of the relevant physics. Cross checks with X-ray and microwave data, as well as independent constraints on the observable--mass relation from SZ selected clusters, suggest that the discrepancy resides in our modeling of the weak lensing signal rather than the cluster abundance. Repeating our analysis using a higher richness threshold ($λ\ge 30$) significantly reduces the tension with other probes, and points to one or more richness-dependent effects not captured by our model.
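
The abstract defines $S_8 = \sigma_8(\Omega_{\rm m}/0.3)^{0.5}$. Inverting that definition at the quoted central values gives a rough sense of the implied $\sigma_8$. This is only arithmetic on point estimates, not a substitute for the joint posterior, and the function names are hypothetical.

```python
def s8(sigma8, omega_m):
    """S_8 = sigma_8 * (Omega_m / 0.3) ** 0.5, as defined in the abstract."""
    return sigma8 * (omega_m / 0.3) ** 0.5


def sigma8_from_s8(s8_value, omega_m):
    """Invert the S_8 definition for sigma_8 at fixed Omega_m."""
    return s8_value / (omega_m / 0.3) ** 0.5


implied = sigma8_from_s8(0.65, 0.179)  # central values quoted above
print(f"implied sigma_8 ~ {implied:.2f}")             # ~0.84
print(f"round trip S_8 = {s8(implied, 0.179):.2f}")   # 0.65
```

At these central values the low $S_8$ comes mostly from the factor $(\Omega_{\rm m}/0.3)^{0.5} \approx 0.77$ rather than from an unusually low $\sigma_8$, in line with the abstract's statement that the result is driven by a low matter density.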

Preprint · 2020 · arXiv

Driving with Data in the Motor City: Mining and Modeling Vehicle Fleet Maintenance Data

The City of Detroit maintains an active fleet of over 2500 vehicles, spending an annual average of over \$5 million on purchases and over \$7.7 million on maintenance. Modeling patterns and trends in this data is of particular importance to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the structure in such data is complex, and the city lacks dedicated resources for in-depth analysis. The City of Detroit's Operations and Infrastructure Group and the University of Michigan initiated a collaboration which seeks to address this unmet need by analyzing data from the City of Detroit's vehicle fleet. This work presents a case study and provides the first data-driven benchmark, demonstrating a suite of methods to aid in data understanding and prediction for large vehicle maintenance datasets. We present analyses to address three key questions raised by the stakeholders, related to discovering multivariate maintenance patterns over time; predicting maintenance; and predicting vehicle- and fleet-level costs. We present a novel algorithm, PRISM, for automating multivariate sequential data analyses using tensor decomposition. This work is the first of its kind to present both methodologies and insights to guide future civic data research.

Preprint · 2020 · arXiv

Stellar Property Statistics of Massive Halos from Cosmological Hydrodynamics Simulations: Common Kernel Shapes

We study stellar property statistics, including satellite galaxy occupation, of massive halo populations realized by three cosmological hydrodynamics simulations: BAHAMAS + MACSIS, TNG300 of the IllustrisTNG suite, and Magneticum Pathfinder. The simulations incorporate independent sub-grid methods for astrophysical processes with spatial resolutions ranging from $1.5$ to $6$ kpc, and each generates samples of $1000$ or more halos with $M_{\rm halo}> 10^{13.5} M_{\odot}$ at redshift $z=0$. Applying localized, linear regression (LLR), we extract halo mass-conditioned statistics (normalizations, slopes, and intrinsic covariance) for a three-element stellar property vector consisting of: i) $N_{sat}$, the number of satellite galaxies with stellar mass $M_{\star, \rm sat} > 10^{10} M_{\odot}$ within radius $R_{200c}$ of the halo; ii) $M_{\star,\rm tot}$, the total stellar mass within that radius; and iii) $M_{\star,\rm BCG}$, the gravitationally-bound stellar mass of the central galaxy within a $100 \, \rm kpc$ radius. Scaling parameters for the three properties with halo mass show mild differences among the simulations, in part due to numerical resolution, but there is qualitative agreement on property correlations, with halos having smaller than average central galaxies tending to also have smaller total stellar mass and a larger number of satellite galaxies. Marginalizing over total halo mass, we find the satellite galaxy kernel, $p(\ln N_{sat}\,|\,M_{\rm halo},z)$, to be consistently skewed left, with skewness parameter $γ= -0.91 \pm 0.02$, while that of $\ln M_{\star,\rm tot}$ is closer to log-normal, in all three simulations. The highest resolution simulations find $γ\simeq -0.8$ for the $z=0$ shape of $p(\ln M_{\star,\rm BCG}\,|\,M_{\rm halo},z)$ and also that the fractional scatter in total stellar mass is below $10\%$ in halos more massive than $10^{14.3} M_{\odot}$.
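
The abstract reports a skewness parameter $γ$ for the mass-conditioned kernels. Assuming $γ$ is the standardized third central moment (the paper may instead use a different estimator or a parametric fit), it can be computed from residuals about the mass-conditioned mean relation as below; the helper name and residual values are hypothetical. A negative value indicates a longer tail toward low values, that is, a left-skewed kernel.

```python
def skewness(residuals):
    """Standardized third central moment (population form) of a sample."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    return m3 / m2 ** 1.5


# Hypothetical residuals of ln N_sat about a mass-conditioned mean:
# one halo far below the relation produces a long low tail.
residuals = [-1.2, 0.1, 0.2, 0.2, 0.3, 0.4]
print(skewness(residuals))  # negative: skewed left
```

A symmetric (for example, log-normal in the original variable) kernel would give a value near zero, which is the sense in which the abstract contrasts $\ln N_{sat}$ with $\ln M_{\star,\rm tot}$.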