Source author record

Ying Xiao

Ying Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms, Machine Learning, Computational Complexity, Artificial Intelligence, hep-ph, math.OC, math.PR, Social and Information Networks

Catalog footprint

What is connected

8 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

FairMedQA: Benchmarking Bias in Large Language Models for Medical Question Answering

Large language models (LLMs) are approaching expert-level performance in medical question answering (QA), demonstrating strong potential to improve public healthcare. However, underlying biases related to sensitive attributes such as sex and race pose life-critical risks. The extent to which such sensitive attributes affect diagnosis remains an open question and requires comprehensive empirical investigation. Additionally, even the latest Counterfactual Patient Variations (CPV) benchmark can hardly distinguish the bias levels of different LLMs. To further explore these dynamics, we propose a new benchmark, FairMedQA, and benchmark 12 representative LLMs. FairMedQA contains 4,806 counterfactual question pairs constructed from 801 clinical vignettes. Our results reveal substantial accuracy disparity ranging from 3 to 19 percentage points across sensitive demographic groups. Notably, FairMedQA exposes biases that are at least 12 percentage points larger than those identified by the latest CPV benchmark, presenting superior benchmarking sensitivity. Our results underscore an urgent need for targeted debiasing techniques and more rigorous, identity-aware validation protocols before LLMs can be safely integrated into practical clinical decision-support systems.
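To make the "accuracy disparity" figure concrete, here is a minimal sketch of how a gap in percentage points between demographic groups could be computed. The group labels and toy results are hypothetical, and the max-minus-min definition is an assumption: the abstract does not state the exact formula FairMedQA uses.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy in percent} from (group, is_correct) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {g: 100.0 * correct[g] / totals[g] for g in totals}


def accuracy_disparity(records):
    """Gap between the best and worst group accuracy, in percentage points.

    Assumed definition (max minus min); not taken from the paper.
    """
    acc = accuracy_by_group(records)
    return max(acc.values()) - min(acc.values())


# Hypothetical toy results: 9/10 correct for one group, 7/10 for another.
toy_records = (
    [("group_a", True)] * 9 + [("group_a", False)] * 1
    + [("group_b", True)] * 7 + [("group_b", False)] * 3
)
disparity = accuracy_disparity(toy_records)
```

As a side note on the stated counts, 4,806 pairs over 801 vignettes works out to 6 counterfactual pairs per vignette on average.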

preprint · 2022 · arXiv

TwHIN: Embedding the Twitter Heterogeneous Information Network for Personalized Recommendation

Social networks, such as Twitter, form a heterogeneous information network (HIN) where nodes represent domain entities (e.g., users, content, advertisers) and edges represent one of many entity interactions (e.g., a user re-sharing content or "following" another). Interactions from multiple relation types can encode valuable information about social network entities not fully captured by a single relation; for instance, a user's preference for accounts to follow may depend on both user-content engagement interactions and the other users they follow. In this work, we investigate knowledge-graph embeddings for entities in the Twitter HIN (TwHIN); we show that these pretrained representations yield significant offline and online improvements for a diverse range of downstream recommendation and classification tasks: personalized ads rankings, account follow-recommendation, offensive content detection, and search ranking. We discuss design choices and practical challenges of deploying industry-scale HIN embeddings, including compressing them to reduce end-to-end model latency and handling parameter drift across versions.

preprint · 2016 · arXiv

Semileptonic decays of $B_c$ meson to S-wave charmonium states in the perturbative QCD approach

Inspired by the recent measurement of the ratio of $B_c$ branching fractions to $J/ψπ^+$ and $J/ψμ^+ν_μ$ final states at the LHCb detector, we study the semileptonic decays of the $B_c$ meson to the S-wave ground and radially excited 2S and 3S charmonium states with the perturbative QCD approach. After evaluating the form factors for the transitions $B_c\rightarrow P,V$, where $P$ and $V$ denote pseudoscalar and vector S-wave charmonia, respectively, we calculate the branching ratios for all these semileptonic decays. The theoretical uncertainties from hadronic input parameters are reduced by utilizing the light-cone wave function for the $B_c$ meson. It is found that the predicted branching ratios range from $10^{-6}$ up to $10^{-2}$ and could be measured by the future LHCb experiment. Our prediction for the ratio of branching fractions $\frac{\mathcal{BR}(B_c^+\rightarrow J/ψπ^+)}{\mathcal{BR}(B_c^+\rightarrow J/ψμ^+ν_μ)}$ is in good agreement with the data. For $B_c\rightarrow V l ν_l$ decays, the relative contributions of the longitudinal and transverse polarizations are discussed in different regions of the momentum transfer squared. These predictions will be tested in ongoing and forthcoming experiments.

preprint · 2016 · arXiv

Statistical Algorithms and a Lower Bound for Detecting Planted Clique

We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing, can be implemented in this framework. Our framework is based on, and generalizes, the statistical query model in learning theory (Kearns, 1998). Our main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size $O(n^{1/2-δ})$ for any constant $δ > 0$. The assumed hardness of variants of these problems has been used to prove hardness of several other problems and as a guarantee for security in cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
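The oracle access described above can be illustrated with a small sketch. It is a toy model, not the paper's formal definition: the class name `StatOracle`, the finite distribution, and the use of uniform random noise are all assumptions made for illustration. In lower-bound arguments the error is typically allowed to be adversarial rather than random.

```python
import random


class StatOracle:
    """Toy statistical query oracle over a finite distribution.

    `distribution` maps each outcome to its probability. The caller never
    sees samples, only expectations answered up to an additive `tolerance`.
    """

    def __init__(self, distribution, tolerance, seed=0):
        self.distribution = distribution
        self.tolerance = tolerance
        self._rng = random.Random(seed)

    def exact_expectation(self, query):
        # Hidden from the algorithm; used here only to build the answer.
        return sum(p * query(x) for x, p in self.distribution.items())

    def ask(self, query):
        """Return E[query(x)] plus an error of magnitude at most `tolerance`."""
        # Assumption: random noise stands in for an arbitrary bounded error.
        noise = self._rng.uniform(-self.tolerance, self.tolerance)
        return self.exact_expectation(query) + noise


# Hypothetical example: a biased bit with P(x = 1) = 0.7.
biased_bit = {0: 0.3, 1: 0.7}
oracle = StatOracle(biased_bit, tolerance=0.05)
estimate = oracle.ask(lambda x: 1.0 if x == 1 else -1.0)
```

An algorithm in this model is charged for the number of queries it asks and for the tolerance it needs, which is what a lower bound in the framework constrains.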

preprint · 2015 · arXiv

Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity

We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. We use the technique to give a polynomial-time algorithm for standard ICA with sample complexity nearly linear in the dimension, thereby improving substantially on previous bounds. The analysis is based on properties of random polynomials, namely the spacings of an ensemble of polynomials. Our technique also applies to other applications of tensor decompositions, including spherical Gaussian mixture models.

preprint · 2014 · arXiv

Fourier PCA and Robust Tensor Decomposition

Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-$1$ decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an $n \times m$ matrix $A$ from observations $y=Ax$ where $x$ is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions $m$ can be arbitrarily higher than the dimension $n$ and the columns of $A$ only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise.

preprint · 2013 · arXiv

Compact Random Feature Maps

Kernel approximation using randomized feature maps has recently gained a lot of interest. In this work, we identify that previous approaches for polynomial kernel approximation create maps that are rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynomial kernels more concisely and accurately. We prove error bounds for CRAFTMaps, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes. We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. We present experiments on multiple standard datasets with performance competitive with state-of-the-art results.
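For readers unfamiliar with randomized feature maps, the sketch below shows the general idea for the homogeneous polynomial kernel $(x \cdot y)^p$. It is not the CRAFTMaps construction, which the abstract does not detail; it is a basic product-of-random-projections map, and all function names are hypothetical. Each feature multiplies $p$ independent random ±1 projections of the input, so the inner product of two feature vectors is an unbiased estimate of $(x \cdot y)^p$.

```python
import random


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def make_poly_feature_map(input_dim, degree, num_features, seed=0):
    """Build a random feature map z with E[z(x) . z(y)] = (x . y) ** degree."""
    rng = random.Random(seed)
    # weights[i][j] is the j-th random +/-1 vector used by feature i.
    weights = [
        [[rng.choice((-1.0, 1.0)) for _ in range(input_dim)]
         for _ in range(degree)]
        for _ in range(num_features)
    ]
    scale = num_features ** -0.5

    def feature_map(x):
        features = []
        for projections in weights:
            product = 1.0
            for w in projections:
                product *= dot(w, x)  # one random projection of x
            features.append(scale * product)
        return features

    return feature_map


# Hypothetical inputs; compare the approximation with the exact kernel value.
phi = make_poly_feature_map(input_dim=2, degree=3, num_features=2000)
x, y = [1.0, 2.0], [0.5, -1.0]
approx_kernel = dot(phi(x), phi(y))
exact_kernel = dot(x, y) ** 3
```

The approximation tightens as `num_features` grows; the abstract's point is that maps of this general kind can waste part of that feature budget.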

preprint · 2012 · arXiv

Structure from Local Optima: Learning Subspace Juntas via Higher Order PCA

We present a generalization of the well-known problem of learning k-juntas in R^n, and a novel tensor algorithm for unraveling the structure of high-dimensional distributions. Our algorithm can be viewed as a higher-order extension of Principal Component Analysis (PCA). Our motivating problem is learning a labeling function in R^n, which is determined by an unknown k-dimensional subspace. This problem of learning a k-subspace junta is a common generalization of learning a k-junta (a function of k coordinates in R^n) and learning intersections of k halfspaces. In this context, we introduce an irrelevant noisy attributes model where the distribution over the "relevant" k-dimensional subspace is independent of the distribution over the (n-k)-dimensional "irrelevant" subspace orthogonal to it. We give a spectral tensor algorithm which identifies the relevant subspace, and thereby learns k-subspace juntas under some additional assumptions. We do this by exploiting the structure of local optima of higher moment tensors over the unit sphere; PCA finds the global optima of the second moment tensor (covariance matrix). Our main result is that when the distribution in the irrelevant (n-k)-dimensional subspace is any Gaussian, the complexity of our algorithm is $T(k,ε) + \mathrm{poly}(n)$, where T is the complexity of learning the concept in k dimensions, and the polynomial is a function of the k-dimensional concept class being learned. This substantially generalizes existing results on learning low-dimensional concepts.