Source author record

Ziwei Zhu

Ziwei Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.ST Statistics Theory Information Retrieval Computation and Language Methodology Computer Vision Machine Learning

Catalog footprint

What is connected

11works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Bias Association Discovery Framework for Open-Ended LLM Generations

Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms -- unfair or distorted portrayals of demographic groups -- that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs.

preprint2026arXiv

Context-Aware Decoding for Faithful Vision-Language Generation

Hallucinations, generating responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs), especially in open-ended tasks such as image captioning and visual reasoning. In this work, we probe the layer-wise generation dynamics that drive hallucinations and propose a training-free mitigation strategy. Employing the Logit Lens, we examine how LVLMs construct next-token distributions across decoder layers, uncovering a pronounced commitment-depth gap: truthful tokens accumulate probability mass on their final candidates earlier than hallucinatory ones. Drawing on this discovery, we introduce Context Embedding Injection (CEI), a lightweight method that harnesses the hidden state of the last input token-the context embedding-as a grounding signal to maintain visual fidelity throughout decoding and curb hallucinations. Evaluated on the CHAIR, AMBER, and MMHal-Bench benchmarks (with a maximum token length of 512), CEI outperforms state-of-the-art baselines across three LVLMs, with its dynamic variant yielding the lowest overall hallucination rates. By integrating novel mechanistic insights with a scalable intervention, this work advances the mitigation of hallucinations in LVLMs.

preprint2026arXiv

WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents

Large language model (LLM)-based agents are widely deployed in user-facing services but remain error-prone in new tasks, tend to repeat the same failure patterns, and show substantial run-to-run variability. Fixing failures via environment-specific training or manual patching is costly and hard to scale. To enable self-evolving agents in user-facing service environments, we propose WISE-Flow, a workflow-centric framework that converts historical service interactions into reusable procedural experience by inducing workflows with prerequisite-augmented action blocks. At deployment, WISE-Flow aligns the agent's execution trajectory to retrieved workflows and performs prerequisite-aware feasibility reasoning to achieve state-grounded next actions. Experiments on ToolSandbox and $τ^2$-bench show consistent improvement across base models.

preprint2022arXiv

Evolution of Popularity Bias: Empirical Study and Debiasing

Popularity bias is a long-standing challenge in recommender systems. Such a bias exerts detrimental impact on both users and item providers, and many efforts have been dedicated to studying and solving such a bias. However, most existing works situate this problem in a static setting, where the bias is analyzed only for a single round of recommendation with logged data. These works fail to take account of the dynamic nature of real-world recommendation process, leaving several important research questions unanswered: how does the popularity bias evolve in a dynamic scenario? what are the impacts of unique factors in a dynamic recommendation process on the bias? and how to debias in this long-term dynamic process? In this work, we aim to tackle these research gaps. Concretely, we conduct an empirical study by simulation experiments to analyze popularity bias in the dynamic scenario and propose a dynamic debiasing strategy and a novel False Positive Correction method utilizing false positive signals to debias, which show effective performance in extensive experiments.

preprint2022arXiv

Quantifying and Mitigating Popularity Bias in Conversational Recommender Systems

Conversational recommender systems (CRS) have shown great success in accurately capturing a user's current and detailed preference through the multi-round interaction cycle while effectively guiding users to a more personalized recommendation. Perhaps surprisingly, conversational recommender systems can be plagued by popularity bias, much like traditional recommender systems. In this paper, we systematically study the problem of popularity bias in CRSs. We demonstrate the existence of popularity bias in existing state-of-the-art CRSs from an exposure rate, a success rate, and a conversational utility perspective, and propose a suite of popularity bias metrics designed specifically for the CRS setting. We then introduce a debiasing framework with three unique features: (i) Popularity-Aware Focused Learning to reduce the popularity-distorting impact on preference prediction; (ii) Cold-Start Item Embedding Reconstruction via Attribute Mapping, to improve the modeling of cold-start items; and (iii) Dual-Policy Learning, to better guide the CRS when dealing with either popular or unpopular items. Through extensive experiments on two frequently used CRS datasets, we find the proposed model-agnostic debiasing framework not only mitigates the popularity bias in state-of-the-art CRSs but also improves the overall recommendation performance.

preprint2022arXiv

Rank-Constrained Least-Squares: Prediction and Inference

In this work, we focus on the high-dimensional trace regression model with a low-rank coefficient matrix. We establish a nearly optimal in-sample prediction risk bound for the rank-constrained least-squares estimator under no assumptions on the design matrix. Lying at the heart of the proof is a covering number bound for the family of projection operators corresponding to the subspaces spanned by the design. By leveraging this complexity result, we perform a power analysis for a permutation test on the existence of a low-rank signal under the high-dimensional trace regression model. We show that the permutation test based on the rank-constrained least-squares estimator achieves non-trivial power with no assumptions on the minimum (restricted) eigenvalue of the covariance matrix of the design. Finally, we use alternating minimization to approximately solve the rank-constrained least-squares problem to evaluate its empirical in-sample prediction risk and power of the resulting permutation test in our numerical study.

preprint2022arXiv

Supervised Homogeneity Fusion: a Combinatorial Approach

Fusing regression coefficients into homogenous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide a MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion exhibits superiority over its competitors in terms of grouping accuracy.

preprint2022arXiv

Towards Fair Conversational Recommender Systems

Conversational recommender systems have demonstrated great success. They can accurately capture a user's current detailed preference -- through a multi-round interaction cycle -- to effectively guide users to a more personalized recommendation. Alas, conversational recommender systems can be plagued by the adverse effects of bias, much like traditional recommenders. In this work, we argue for increased attention on the presence of and methods for counteracting bias in these emerging systems. As a starting point, we propose three fundamental questions that should be deeply examined to enable fairness in conversational recommender systems.

preprint2020arXiv

Taming heavy-tailed features by shrinkage

In this work, we focus on a variant of the generalized linear model (GLM) called corrupted GLM (CGLM) with heavy-tailed features and responses. To robustify the statistical inference on this model, we propose to apply $\ell_4$-norm shrinkage to the feature vectors in the low-dimensional regime and apply elementwise shrinkage to them in the high-dimensional regime. Under bounded fourth moment assumptions, we show that the maximum likelihood estimator (MLE) based on the shrunk data enjoys nearly the minimax optimal rate with an exponential deviation bound. Our simulations demonstrate that the proposed feature shrinkage significantly enhances the statistical performance in linear regression and logistic regression on heavy-tailed data. Finally, we apply our shrinkage principle to guard against mislabeling and image noise in the human-written digit recognition problem. We add an $\ell_4$-norm shrinkage layer to the original neural net and reduce the testing misclassification rate by more than $30\%$ relatively in the presence of mislabeling and image noise.

preprint2016arXiv

Heterogeneity Adjustment with Applications to Graphical Model Inference

Heterogeneity is an unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the biases of batch effects and to enhance the inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasive assumption that the latent heterogeneity factors simultaneously affect a large fraction of observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the "Bless of Dimensionality". As an illustrative application of this generic framework, we consider a problem of estimating high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.

preprint2015arXiv

Distributed Estimation and Inference with Statistical Guarantees

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.

Ziwei Zhu

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

Bias Association Discovery Framework for Open-Ended LLM Generations

Context-Aware Decoding for Faithful Vision-Language Generation

WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents

Evolution of Popularity Bias: Empirical Study and Debiasing

Quantifying and Mitigating Popularity Bias in Conversational Recommender Systems

Rank-Constrained Least-Squares: Prediction and Inference

Supervised Homogeneity Fusion: a Combinatorial Approach

Towards Fair Conversational Recommender Systems

Taming heavy-tailed features by shrinkage

Heterogeneity Adjustment with Applications to Graphical Model Inference

Distributed Estimation and Inference with Statistical Guarantees