Researcher profile

Yifan Mai

Yifan Mai contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

Preprint · 2026 · arXiv

Adaptive auditing of AI systems with anytime-valid guarantees

A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptive testing paradigms have gained popularity, where one opportunistically decides which cases and how many to annotate based on past results. While this framework is highly practical, its extreme flexibility makes it difficult to draw statistically rigorous conclusions, as it violates classical assumptions: the number of observations is typically limited (often 10 to 50 cases) and decisions regarding sampling and stopping are made in the midst of data collection rather than based on a pre-specified rule. To characterize what statistical inferences can be drawn from highly adaptive audits, we introduce a hypothesis testing framework from two 'dueling' perspectives: (i) the model's null that asserts there is no failure mode with performance below a target threshold versus (ii) the auditor's null that asserts they have a sampling strategy that will uncover a failure mode. Leveraging Safe Anytime-Valid Inference (SAVI), we formalize the auditor as conducting 'testing by betting', which translates into simultaneous e-processes for testing the dueling null hypotheses. Furthermore, if the auditor is sufficiently powerful, we prove that these two hypotheses are asymptotically inverses of each other, in that passage of a stringent audit does in fact certify the AI system as being globally robust. Empirically, we demonstrate that our proposed testing procedures maintain anytime-valid type-I error control, outperform pre-specified testing methods, and can reach statistically rigorous conclusions sometimes with as few as 20 observations.
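
The 'testing by betting' idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's procedure: it is a textbook fixed-fraction betting test for a single null, "the pass rate is at least p0", and the function name `betting_audit` and the parameters `p0`, `lam` and `alpha` are hypothetical names chosen for illustration. Under that null the wealth process is a nonnegative supermartingale, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, regardless of when the auditor decides to stop.

```python
def betting_audit(outcomes, p0, lam, alpha=0.05):
    """Fixed-fraction betting test of H0: pass rate >= p0 (illustrative only).

    outcomes: iterable of 0/1 values, where 1 means the system passed a case.
    p0:       target pass rate claimed under the null (assumed name).
    lam:      bet size; must lie in [0, 1 / (1 - p0)] so wealth stays >= 0.
    alpha:    type-I error level; H0 is rejected once wealth >= 1 / alpha.

    Returns (stopping_time, wealth_history); stopping_time is None if the
    wealth never crosses the rejection threshold.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not 0.0 <= lam <= 1.0 / (1.0 - p0):
        raise ValueError("lam must lie in [0, 1 / (1 - p0)]")

    wealth = 1.0
    history = []
    for t, x in enumerate(outcomes, start=1):
        # Each factor has expectation 1 + lam * (p0 - p) <= 1 whenever p >= p0,
        # so the auditor cannot expect to gain wealth if the null is true.
        wealth *= 1.0 + lam * (p0 - x)
        history.append(wealth)
        if wealth >= 1.0 / alpha:
            return t, history
    return None, history


if __name__ == "__main__":
    stop, history = betting_audit([0, 0, 1, 1], p0=0.9, lam=5.0)
    print(stop, history)
```

With `p0=0.9` and `lam=5.0`, each observed failure multiplies the wealth by 5.5, so two consecutive failures push it past the 1/alpha = 20 threshold; each pass halves it. A fixed `lam` is the simplest choice; adaptive bet sizes are also valid as long as each bet depends only on past outcomes.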

Preprint · 2026 · arXiv

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

Current Large Language Model (LLM) evaluation frameworks use the same static prompt template across all models under evaluation. This differs from the common industry practice of using prompt optimization (PO) techniques to tailor the prompt to each model and maximize application performance. In this paper, we investigate the effect of PO on LLM evaluations. Our results on public academic and internal industry benchmarks show that PO greatly affects the final ranking of models. This highlights the importance of performing PO per model when conducting evaluations to choose the best model for a given task.

Preprint · 2026 · arXiv

The MAGPI Survey: co-evolution of baryons and dark matter in star-forming disk-like galaxies at $0.1 \lesssim z \lesssim 0.85$

We present a comprehensive analysis of the dark matter (DM) content and its structural dependence in star-forming disk-like galaxies at intermediate redshifts ($0.1 \lesssim z \lesssim 0.85$), utilizing spatially resolved kinematic data from the MAGPI survey. We report the following: (1) Low stellar mass galaxies ($M_{\rm star} < 10^{9.5}\, M_\odot$) are strongly DM dominated across all radii, with average $\langle f_{_{\rm DM}} \rangle \sim 0.85$, while high-mass ($M_{\rm star} > 10^{10.5}\, M_\odot$) systems exhibit relatively low DM fractions in their inner regions ($\langle f_{_{\rm DM}} \rangle \sim 0.47$), which is equivalent to that of local massive disk galaxies (e.g., Milky Way and Andromeda). This suggests a mass-dependent structural dichotomy, most likely governed by a combination of internal galactic processes and environmental influences. (2) A tight inverse correlation between $f_{_{\rm DM}}$ and baryon mass surface density ($\Sigma_{\rm bar}$), with intrinsic scatter of $\sim 0.11$ dex. This is consistent with an inside-out baryon assembly scenario and suggests that the fundamental structural correlations of galaxies were already established by $z\sim 0.85$. (3) No significant evolution in $f_{_{\rm DM}}$ with redshift across the MAGPI window, and when combined with higher-redshift ($0.6 \leq z \leq 1.5$) data from Sharma et al. 2025, we quantitatively show that the reported decline in $f_{_{\rm DM}}(z)$ is most likely due to observational biases against low-mass systems at $z > 1$. These results offer empirical evidence for a scenario in which disk-like galaxies evolve through a co-regulated build-up of baryonic and DM components, preserving internal structural regularities (such as the total mass distribution and rotation-curve shape) throughout cosmic time.

Preprint · 2022 · arXiv

The SAMI Galaxy Survey: The relationship between galaxy rotation and the motion of neighbours

Using data from the SAMI Galaxy Survey, we investigate the correlation between the projected stellar kinematic spin vector of 1397 SAMI galaxies and the line-of-sight motion of their neighbouring galaxies. We calculate the luminosity-weighted mean velocity difference between SAMI galaxies and their neighbours in the direction perpendicular to the SAMI galaxies' angular momentum axes. The luminosity-weighted mean velocity offset between SAMI galaxies and their neighbours, which indicates the signal of coherence between the rotation of the SAMI galaxies and the motion of neighbours, is 9.0 $\pm$ 5.4 km s$^{-1}$ (1.7 $\sigma$) for neighbours within 1 Mpc. In a large-scale analysis, we find that the average velocity offsets increase for neighbours out to 2 Mpc. However, the velocities are consistent with zero or negative for neighbours outside 3 Mpc. The negative signals for neighbours at distances around 10 Mpc are also significant at the $\sim 2$ $\sigma$ level, which indicates that the positive signals within 2 Mpc might come from the variance of large-scale structure. We also calculate average velocities of different subsamples, including galaxies in different regions of the sky and galaxies with different stellar masses, galaxy types, $\lambda_{Re}$ and inclinations. Although the low-mass, high-mass, early-type and low-spin galaxy subsamples show 2-3 $\sigma$ signals of coherence for neighbours within 2 Mpc, the results for different inclination subsamples and the large-scale results suggest that the $\sim 2 \sigma$ signals might result from coincidental scatter or variance of large-scale structure. Overall, the modest evidence of coherence signals for neighbouring galaxies within 2 Mpc needs to be confirmed by larger samples of observations and simulation studies.