Source author record

Xueru Zhang

Xueru Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Computation and Language, Methodology

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators


Preprint, 2026, arXiv

Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop

The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop in which models are trained on their own outputs, which may cause performance drops and induce emerging biases. In real-world applications, previously deployed LLMs may influence the data they generate, leading to a dynamic system driven by user feedback. For example, if a model continues to underserve users from a group, less query data will be collected from that demographic. In this study, we introduce the concept of the Self-Consuming Performative Loop (SCPL) and investigate the role of synthetic data in shaping bias during these dynamic iterative training processes under controlled performative feedback. This controlled setting is motivated by the inaccessibility of real-world user preference data from dynamic production systems, and it enables us to isolate and analyze feedback-driven bias evolution in a principled manner. We focus on two types of loops: the typical retraining setting and the incremental fine-tuning setting, which is largely underexplored. Through experiments on three real-world tasks, we find that the performative loop increases preference bias and decreases disparate bias. We design a reward-based rejection sampling strategy to mitigate the bias, moving towards more trustworthy self-improving systems.
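The abstract mentions a reward-based rejection sampling strategy without giving details. The following is only a generic sketch of that idea, not the paper's method: candidates are scored by a reward function and only those at or above a threshold are kept for the next training round. The names `rejection_sample` and `toy_reward`, the threshold value, and the word-counting reward are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence


def rejection_sample(
    candidates: Sequence[str],
    reward_fn: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep only the candidates whose reward is at or above the threshold."""
    return [c for c in candidates if reward_fn(c) >= threshold]


def toy_reward(text: str) -> float:
    # Hypothetical stand-in for a learned reward model: the fraction of
    # words that are NOT the flagged token "biased".
    words = text.split()
    if not words:
        return 0.0
    flagged = sum(w == "biased" for w in words)
    return 1.0 - flagged / len(words)


# Synthetic outputs that would otherwise be fed back into training.
candidates = ["a neutral answer", "a biased answer", "biased biased reply"]
kept = rejection_sample(candidates, toy_reward, threshold=0.9)
```

In a real loop the reward function would presumably be a trained model and the threshold a tuned hyperparameter; both are assumptions here, since the abstract does not specify them.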

Preprint, 2022, arXiv

Circular designs for total effects under interference models

This paper studies circular designs for interference models, where a treatment assigned to a plot also affects its neighboring plots within a block. For the purpose of estimating total effects, the circular neighbor balanced design was shown to be universally optimal among designs which do not allow treatments to be neighbors of themselves. Our study shows that self-neighboring block sequences are actually the main ingredient for an optimal design. Here, we adopt the approximate design framework and study optimal designs in the whole design space. Our approach is flexible enough to accommodate all possible design parameters, that is, the block size, the number of blocks, and the number of treatments. This approach can be broken down into two main steps: the identification of the minimal supporting set of block sequences and the optimality condition built on it. The former is critical for reducing the computational time from almost infinity to seconds. The task of finding the minimal set is normally achieved through numerical methods, which can only handle small block sizes. Our approach is of a hybrid nature in order to deal with all design sizes. When the block size is not large, we provide explicit expressions of the minimal set instead of relying on numerical methods. For larger block sizes, where a typical numerical method would fail, we theoretically derive a reasonably sized intermediate set of sequences, from which the minimal set can be quickly derived through a customized algorithm. Furthermore, the optimality conditions allow us to obtain both symmetric and asymmetric designs. Lastly, we also investigate the trade-off between circular and noncircular designs, and provide guidelines on the choice between them.

Preprint, 2020, arXiv

Fairness in Learning-Based Sequential Decision Algorithms: A Survey

Algorithmic fairness in decision-making has been studied extensively in static settings where one-shot decisions are made on tasks such as classification. However, in practice most decision-making processes are of a sequential nature, where decisions made in the past may have an impact on future data. This is particularly the case when decisions affect the individuals or users generating the data used for future decisions. In this survey, we review existing literature on the fairness of data-driven sequential decision-making. We focus on two types of sequential decisions: (1) past decisions have no impact on the underlying user population and thus no impact on future data; (2) past decisions have an impact on the underlying user population and therefore on the future data, which can then impact future decisions. In each case, the impact of various fairness interventions on the underlying population is examined.