Source author record

David D. Yao

David D. Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence math.PR econ.GN Machine Learning math.OC q-fin.EC

Catalog footprint

What is connected

4works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

We propose a deterministic adjoint matching framework that formulates human preference alignment for flow-based generative models as an optimal control problem over velocity fields. One can directly regress the control toward a value-gradient-induced target under the current policy, leading to a simple and stable training objective. Building on this perspective, we introduce a truncated adjoint scheme that focuses computation on the terminal portion of the trajectory, where reward-relevant signals concentrate, which yields substantial computational savings while preserving alignment quality. We further generalize the framework beyond standard KL-based regularization, allowing more flexible trade-offs between alignment strength and distributional preservation. Experiments on SiT-XL/2 and FLUX.2-Klein-4B demonstrate consistent gains across multiple alignment metrics, along with substantially improved diversity and mode preservation.

preprint2026arXiv

RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences

Traditional preference tuning methods for LLMs/Visual Generative Models often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward hacking or overfitting. We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals from Vision Language Models (VLMs) to improve the curation of preference pairs for fine-tuning visual generative models like text-to-image diffusion models. Our approach begins with prompting VLMs to generate detailed critiques of synthesized images, from which we further prompt VLMs to extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.

preprint2024arXiv

Polynomial Voting Rules

We propose and study a new class of polynomial voting rules for a general decentralized decision/consensus system, and more specifically for the PoS (Proof of Stake) protocol. The main idea, inspired by the Penrose square-root law and the more recent quadratic voting rule, is to differentiate a voter's voting power and the voter's share (fraction of the total in the system). We show that while voter shares form a martingale process that converge to a Dirichlet distribution, their voting powers follow a super-martingale process that decays to zero over time. This prevents any voter from controlling the voting process, and thus enhances security. For both limiting results, we also provide explicit rates of convergence. When the initial total volume of votes (or stakes) is large, we show a phase transition in share stability (or the lack thereof), corresponding to the voter's initial share relative to the total. We also study the scenario in which trading (of votes/stakes) among the voters is allowed, and quantify the level of risk sensitivity (or risk averse) in three categories, corresponding to the voter's utility being a super-martingale, a sub-martingale, and a martingale. For each category, we identify the voter's best strategy in terms of participation and trading.

preprint2015arXiv

Matching Supply and Demand in Production-Inventory Systems: Asymptotics and Optimization

We consider a general class of high-volume, fast-moving production-inventory systems based on both lost-sales and backorder inventory models. Such systems require a fundamental understanding of the asymptotic behavior of key performance measures under various supply strategies, as well as the pre-planning of these strategies. Our analysis relies on a thorough study of the asymptotic behavior of a random walk with power drift, which is of independent interest. In addition to providing key insights, our analysis leads to approximations of the corresponding optimization problem that yield simple solutions which are close to optimal. We also establish an equivalence between the lost-sales and backorder models when both have the same penalty cost that becomes large.