Source author record

Yuan Zhou

Yuan Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

78 works

28 topics

4 close collaborators


preprint · 2026 · arXiv

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Representation autoencoders that reuse frozen pretrained vision encoders as visual tokenizers have achieved strong reconstruction and generation quality. However, existing methods universally extract features from only the last encoder layer, discarding the rich hierarchical information distributed across intermediate layers. We show that low-level visual details survive in the last layer merely as attenuated residuals after multiple layers of semantic abstraction, and that explicitly fusing multi-layer features can substantially recover this lost information. We propose DRoRAE (Depth-Routed Representation AutoEncoder), a lightweight fusion module that adaptively aggregates all encoder layers via energy-constrained routing and incremental correction, producing an enriched latent compatible with a frozen pretrained decoder. A three-phase decoupled training strategy first learns the fusion under the implicit distributional constraint of the frozen decoder, then fine-tunes the decoder to fully exploit the enriched representation. On ImageNet-256, DRoRAE reduces rFID from 0.57 to 0.29 and improves generation FID from 1.74 to 1.65 (with AutoGuidance), with gains also transferring to text-to-image synthesis. Furthermore, we uncover a log-linear scaling law ($R^2{=}0.86$) between fusion capacity and reconstruction quality, identifying \textit{representation richness} as a new, predictably scalable dimension for visual tokenizers analogous to vocabulary size in NLP.

preprint · 2026 · arXiv

Drivora: A Unified and Extensible Infrastructure for Search-based Autonomous Driving Testing

Search-based testing is critical for evaluating the safety and reliability of autonomous driving systems (ADSs). However, existing approaches are often built on heterogeneous frameworks (e.g., distinct scenario spaces, simulators, and ADSs), which require considerable effort to reuse and adapt across different settings. To address these challenges, we present Drivora, a unified and extensible infrastructure for search-based ADS testing built on the widely used CARLA simulator. Drivora introduces a unified scenario definition, OpenScenario, that specifies scenarios using low-level, actionable parameters to ensure compatibility with existing methods while supporting extensibility to new testing designs (e.g., multi-autonomous-vehicle testing). On top of this, Drivora decouples the testing engine, scenario execution, and ADS integration. The testing engine leverages evolutionary computation to explore new scenarios and supports flexible customization of core components. The scenario execution can run arbitrary scenarios using a parallel execution mechanism that maximizes hardware utilization for large-scale batch simulation. For ADS integration, Drivora provides access to 12 ADSs through a unified interface, streamlining configuration and simplifying the incorporation of new ADSs. Our tools are publicly available at https://github.com/MingfeiCheng/Drivora.

preprint · 2026 · arXiv

Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model assigning independent attraction parameters to every product--position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single feedback, and establish the first regret-optimal characterization. Besides, our round-based algorithms provide the prompt operations needed by modern platforms. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, we establish a minimax lower bound and propose GP2-UCB with a matching upper bound. Moreover, we design an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that our algorithms consistently outperform state-of-the-art benchmarks.
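The per-round optimization mentioned in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's subroutine: `r[i]` is a hypothetical per-product revenue, `v[i][k]` is the attraction of product `i` at position `k`, the no-purchase option has weight 1, and a brute-force search over permutations stands in for a real maximum-weight bipartite matching solver (it is only practical for tiny instances).

```python
from itertools import permutations


def best_matching(weights):
    """Brute-force stand-in for max-weight bipartite matching.

    weights[i][k] is the weight of product i at position k. Each position
    gets at most one distinct product and is left empty (None) when the
    weight is not positive. Assumes at least as many products as positions.
    """
    n, k = len(weights), len(weights[0])
    best_val, best_assign = 0.0, [None] * k
    for perm in permutations(range(n), k):
        assign = [i if weights[i][pos] > 0 else None for pos, i in enumerate(perm)]
        val = sum(weights[i][pos] for pos, i in enumerate(assign) if i is not None)
        if val > best_val:
            best_val, best_assign = val, assign
    return best_val, best_assign


def revenue(assign, r, v):
    """Expected revenue under MNL for a position -> product assignment."""
    num = sum(r[i] * v[i][pos] for pos, i in enumerate(assign) if i is not None)
    den = 1.0 + sum(v[i][pos] for pos, i in enumerate(assign) if i is not None)
    return num / den


def dinkelbach(r, v, tol=1e-12, max_iter=100):
    """Dinkelbach iteration: solve max_x N(x) - lam * D(x), then set lam = N/D."""
    lam, assign = 0.0, [None] * len(v[0])
    for _ in range(max_iter):
        # N(x) - lam * D(x) = sum of v[i][k] * (r[i] - lam) over assigned pairs, minus lam.
        weights = [[v[i][k] * (r[i] - lam) for k in range(len(v[0]))]
                   for i in range(len(r))]
        val, assign = best_matching(weights)
        if val - lam <= tol:  # parametric optimum is zero, so lam is the best ratio
            break
        lam = revenue(assign, r, v)
    return lam, assign
```

For larger instances the brute-force helper would be replaced by a proper assignment solver; the outer Dinkelbach loop stays the same.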

preprint · 2026 · arXiv

Perturbed Proximal Gradient ADMM for Nonconvex Composite Optimization

This paper proposes a Perturbed Proximal Gradient ADMM (PPG-ADMM) framework for solving general nonconvex composite optimization problems, where the objective function consists of a smooth nonconvex term and a nonsmooth weakly convex term for both primal variables. Unlike existing ADMM-based methods which necessitate the function associated with the last updated primal variable to be smooth, the proposed PPG-ADMM removes this restriction by introducing a perturbation mechanism, which also helps reduce oscillations in the primal-dual updates, thereby improving convergence stability. By employing a linearization technique for the smooth term and the proximal operator for the nonsmooth and weakly convex term, the subproblems have closed-form solutions, significantly reducing computational complexity. The convergence is established through a technically constructed Lyapunov function, which guarantees sufficient descent and has a well-defined lower bound. With properly chosen parameters, PPG-ADMM converges to an $ε$-approximate stationary point at a sublinear convergence rate of $\mathcal{O}(1/\sqrt{K})$. Furthermore, by appropriately tuning the perturbation parameter $β$, it achieves an $ε$-stationary point, providing stronger optimality guarantees. We further apply PPG-ADMM to two practical distributed nonconvex composite optimization problems, i.e., the distributed partial consensus problem and the resource allocation problem. The algorithm operates in a fully decentralized manner without a central coordinating node. Finally, numerical experiments validate the effectiveness of PPG-ADMM, demonstrating its improved convergence performance.
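As a rough illustration of why linearizing the smooth term and applying a proximal operator yields closed-form updates, here is a single proximal-gradient step. This is not the PPG-ADMM iteration itself: the L1 norm is used as a convenient stand-in for the nonsmooth term, and the names `soft_threshold`, `prox_grad_step`, `eta` and `lam` are hypothetical.

```python
def soft_threshold(x, tau):
    """Closed-form proximal operator of tau * |x| for a scalar x."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def prox_grad_step(x, grad_f, eta, lam):
    """One step x+ = prox_{eta * lam * ||.||_1}(x - eta * grad_f(x)).

    The smooth term is linearized through its gradient, and the nonsmooth
    term is handled elementwise by its proximal operator.
    """
    g = grad_f(x)
    return [soft_threshold(xi - eta * gi, eta * lam) for xi, gi in zip(x, g)]


# Example smooth term (an assumption): f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
b = [3.0, -0.2, 0.5]
x_next = prox_grad_step([0.0, 0.0, 0.0], lambda x: [xi - bi for xi, bi in zip(x, b)],
                        eta=1.0, lam=1.0)
```

With these inputs the gradient step lands on `b`, and the threshold then shrinks the large entry by 1 and zeroes the two small ones.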

preprint · 2026 · arXiv

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscure the causal impact of intermediate decisions, exacerbating temporal credit assignment challenges. To address this, we propose ReBel (Reward Belief), a process-level reinforcement learning algorithm that explicitly models structured belief states to summarize interaction history and guide subsequent policy learning. ReBel introduces belief-consistency supervision, converting discrepancies between predicted beliefs and observed feedback into dense self-supervised signals without requiring external step-wise annotations or verifiers. It also employs belief-aware grouping to compare trajectories under similar belief states, yielding more robust and lower-variance advantage estimates. We evaluate ReBel on challenging long-horizon benchmarks, including ALFWorld and WebShop. ReBel improves task success by up to $20.4$ percentage points over the episode-level baseline GRPO and increases sample efficiency by $2.1\times$. These results suggest that belief-aware self-supervision is a promising direction for reliable long-horizon decision-making under partial observability. Code is available at: https://github.com/Fateyetian/Rebel.git.
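The idea of comparing trajectories only within groups that share a similar belief state can be sketched with GRPO-style group normalization. The abstract does not say how beliefs are matched, so the discrete `belief_key` below is a hypothetical placeholder, and the function is an illustration rather than ReBel's implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grouped_advantages(samples, eps=1e-8):
    """Group-normalized advantages.

    samples is a list of (belief_key, reward) pairs. Each reward is
    standardized against the mean and standard deviation of its own
    belief group, and advantages are returned in input order.
    """
    groups = defaultdict(list)
    for key, reward in samples:
        groups[key].append(reward)
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    return [(reward - stats[key][0]) / (stats[key][1] + eps)
            for key, reward in samples]
```

Grouping before normalizing means a trajectory is only credited relative to others that faced a comparable situation, which is the intuition behind the lower-variance estimates the abstract mentions.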

preprint · 2026 · arXiv

SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models

Recent video diffusion models (VDMs) synthesize visually convincing clips, yet still drop entities, mis-bind attributes, and weaken the interactions specified in the prompt. Representation-alignment objectives such as VideoREPA and MoAlign improve fine-grained text following by distilling spatio-temporal token relations from a frozen visual foundation model, but their pairwise supervision budget is allocated by visual or motion cues rather than by how relevant each pair is to the prompt. We present SARA, Semantically Adaptive Relational Alignment, which keeps token-relation distillation (TRD) on a frozen VFM target and adds a text-conditioned saliency that decides which token pairs carry supervision. A lightweight Stage 1 aligner is trained with per-entity SAM 3.1 mask supervision and an InfoNCE regulariser, and its continuous saliency is fused into TRD through a pair-routing operator that assigns each token pair a weight whenever either of its two endpoints is salient, thereby routing supervision toward subject-subject and subject-background pairs and away from background-background ones. In the Wan2.2 continual-training setting, SARA improves both text alignment and motion quality over SFT, VideoREPA, and MoAlign on a 13-dimension VLM rubric, on the public VBench benchmarks, and in a blind user study.
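One way to picture the pair-routing operator is as a weight matrix over token pairs. The abstract only says a pair receives weight when either endpoint is salient; taking the maximum of the two saliencies and using a squared-error relation loss are both assumptions made for this sketch, not SARA's actual definitions.

```python
def pair_weights(saliency):
    """Weight for each token pair: large if either endpoint is salient (assumed max rule)."""
    n = len(saliency)
    return [[max(saliency[i], saliency[j]) for j in range(n)] for i in range(n)]


def routed_relation_loss(student_rel, teacher_rel, saliency):
    """Saliency-weighted mean squared error between two token-relation matrices."""
    w = pair_weights(saliency)
    n = len(saliency)
    num = sum(w[i][j] * (student_rel[i][j] - teacher_rel[i][j]) ** 2
              for i in range(n) for j in range(n))
    den = sum(w[i][j] for i in range(n) for j in range(n))
    return num / den if den > 0 else 0.0
```

Under this rule, subject-subject and subject-background pairs keep their supervision, while a pair of two background tokens (both saliencies near zero) contributes almost nothing.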

preprint · 2026 · arXiv

Video Generation Models Are Good Latent Reward Models

Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach incurs substantial memory overhead and increased training time, and its late-stage optimization lacks early-stage supervision, refining only visual quality rather than fundamental motion dynamics and structural coherence. In this work, we show that pre-trained video generation models are naturally suited for reward modeling in the noisy latent space, as they are explicitly designed to process noisy latent representations at arbitrary timesteps and inherently preserve temporal information through their sequential modeling capabilities. Accordingly, we propose Process Reward Feedback Learning (PRFL), a framework that conducts preference optimization entirely in latent space, enabling efficient gradient backpropagation throughout the full denoising chain without VAE decoding. Extensive experiments demonstrate that PRFL significantly improves alignment with human preferences, while achieving substantial reductions in memory consumption and training time compared to RGB ReFL.

preprint · 2022 · arXiv

A global second order Sobolev regularity for $p$-Laplacian type equations with variable coefficients in bounded domains

Let $Ω\subset \mathbb{R}^n$ be a bounded convex domain with $n\ge 2$. Suppose that $A$ is uniformly elliptic and belongs to $W^{1,n}$ when $n\ge 3$ or $W^{1,q}$ for some $q>2$ when $n=2$. For $1<p<\infty$, we establish a global second-order regularity estimate $$\|D[|Du|^{p-2} Du]\|_{L^2(Ω)}+\|D[ |\sqrt{A}Du|^{p-2} A Du]\|_{L^2(Ω)} \le C \|f\|_{L^2(Ω)}$$ for the inhomogeneous $p$-Laplace type equation $$-\mathrm{div}\big(\langle A Du,Du\rangle ^{\frac{p-2}{2}} A Du\big)=f \quad \text{in } Ω \text{ with Dirichlet/Neumann $0$-boundary.}$$ A similar result is also established for certain bounded Lipschitz domains whose boundaries are weakly second-order differentiable and satisfy some smallness assumptions.

preprint · 2022 · arXiv

AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning

Reinforcement Learning (RL) techniques have drawn great attention in many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups so that they generalize to different environments during deployment. However, they usually do not properly incorporate information about the underlying environmental factors that the agents interact with, and thus can be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL as a generalization problem using Contextual Markov Decision Processes (CMDPs). We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks. We experimentally demonstrate essential improvements in the performance of AACC over existing baselines in a range of simulated environments.

preprint · 2022 · arXiv

Adaptive Double-Exploration Tradeoff for Outlier Detection

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.

preprint · 2022 · arXiv

Automatic Map Generation for Autonomous Driving System Testing

High-definition (HD) maps are essential in testing autonomous driving systems (ADSs). HD maps essentially determine the potential diversity of the testing scenarios. However, current HD maps suffer from two main limitations: a lack of junction diversity in the publicly available HD maps and the high cost of building a new HD map. Hence, in this paper, we propose FEAT2MAP to automatically generate concise HD maps with scenario diversity guarantees. FEAT2MAP focuses on junctions as they significantly influence scenario diversity, especially in urban road networks. FEAT2MAP first defines a set of features to characterize junctions. Then, FEAT2MAP extracts and samples concrete junction features from a list of input HD maps or user-defined requirements. Each junction feature generates a junction. Finally, FEAT2MAP builds a map by connecting the junctions in a grid layout. To demonstrate the effectiveness of FEAT2MAP, we conduct experiments with the public HD maps from SVL and the open-source ADS Apollo. The results show that FEAT2MAP can (1) generate new maps of reduced size while maintaining scenario diversity in terms of the code coverage and motion states of the ADS under test, and (2) generate new maps of increased scenario diversity by merging intersection features from multiple maps or taking user inputs.

preprint · 2022 · arXiv

Dynamic Car Dispatching and Pricing: Revenue and Fairness for Ridesharing Platforms

A major challenge for ridesharing platforms is to guarantee profit and fairness simultaneously, especially in the presence of misaligned incentives of drivers and riders. We focus on the dispatching-pricing problem to maximize the total revenue while keeping both drivers and riders satisfied. We study the computational complexity of the problem, provide a novel two-phased pricing solution with revenue and fairness guarantees, extend it to stochastic settings and develop a dynamic (a.k.a., learning-while-doing) algorithm that actively collects data to learn the demand distribution during the scheduling process. We also conduct extensive experiments to demonstrate the effectiveness of our algorithms.

preprint · 2022 · arXiv

Generating Adjacency Matrix for Video Relocalization

In this paper, we continue our work on the video relocalization task. Building on the use of graph convolution to extract intra-video and inter-video frame features, we improve the method with a similarity-metric-based graph convolution, whose weighted adjacency matrix is obtained by calculating a similarity metric between the features of any two different time steps in the graph. Experiments on the ActivityNet v1.2 and Thumos14 datasets show the effectiveness of this improvement, and the method outperforms state-of-the-art methods.
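A minimal sketch of a similarity-based weighted adjacency matrix follows. The abstract does not name the similarity metric, so cosine similarity, the zero diagonal, the row normalization and the weight-free propagation step are all assumptions chosen to keep the example small.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_adjacency(features):
    """Weighted adjacency between different time steps, row-normalized."""
    n = len(features)
    adj = [[cosine(features[i], features[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    for row in adj:
        total = sum(abs(w) for w in row)
        if total > 0:
            row[:] = [w / total for w in row]
    return adj


def graph_conv(features, adj):
    """One propagation step: each node becomes the adjacency-weighted sum of the others."""
    dim = len(features[0])
    return [[sum(adj[i][j] * features[j][d] for j in range(len(features)))
             for d in range(dim)]
            for i in range(len(features))]
```

A learned layer would additionally multiply the propagated features by a weight matrix and apply a nonlinearity; those parts are omitted here.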

preprint · 2022 · arXiv

Graph Neural Network for Video Relocalization

In this paper, we focus on the video relocalization task, which uses a query video clip as input to retrieve a semantically related video clip in another untrimmed long video. We find that, in video relocalization datasets, there is no consistent relationship between frame-level feature similarity and video-level feature similarity, which affects feature fusion among frames. However, existing video relocalization methods do not fully consider this. Taking this phenomenon into account, we treat video features as a graph by concatenating the query video feature and the proposal video feature along the time dimension, where each time step is treated as a node and each row of the feature matrix is treated as the feature of a node. Then, using graph neural networks, we propose a Multi-Graph Feature Fusion Module to fuse the relation features of this graph. Evaluating our method on the ActivityNet v1.2 and Thumos14 datasets, we find that it outperforms state-of-the-art methods.

preprint · 2022 · arXiv

Imitation Learning from Observations under Transition Model Disparity

Learning to perform tasks by leveraging a dataset of expert observations, also known as imitation learning from observations (ILO), is an important paradigm for learning skills without access to the expert reward function or the expert actions. We consider ILO in the setting where the expert and the learner agents operate in different environments, with the source of the discrepancy being the transition dynamics model. Recent methods for scalable ILO utilize adversarial learning to match the state-transition distributions of the expert and the learner, an approach that becomes challenging when the dynamics are dissimilar. In this work, we propose an algorithm that trains an intermediary policy in the learner environment and uses it as a surrogate expert for the learner. The intermediary policy is learned such that the state transitions generated by it are close to the state transitions in the expert dataset. To derive a practical and scalable algorithm, we employ concepts from prior work on estimating the support of a probability distribution. Experiments using MuJoCo locomotion tasks highlight that our method compares favorably to the baselines for ILO with transition dynamics mismatch.

preprint · 2022 · arXiv

Interior Hölder regularity for stable solutions to semilinear elliptic equations up to dimension 5

Let $2\le n\le 5$. We establish an a priori interior Hölder regularity of $C^2$-stable solutions to the semilinear equation $-Δu=f(u)$ in any domain of $\mathbb{R}^n$ for any nonlinearity $f\in C^{0,1}(\mathbb{R})$. If, in addition, $f$ is nondecreasing and convex, we obtain an interior Hölder regularity, and hence the local boundedness, of $W^{1,2}(Ω)$-stable solutions by locally approximating them via $C^2(Ω)$-stable solutions. In particular, we do not require any lower bound on $f$.

preprint · 2022 · arXiv

Learning Long-Term Reward Redistribution via Randomized Return Decomposition

Many practical applications of reinforcement learning require agents to learn from sparse and delayed rewards. It challenges the ability of agents to attribute their actions to future outcomes. In this paper, we consider the problem formulation of episodic reinforcement learning with trajectory feedback. It refers to an extreme delay of reward signals, in which the agent can only obtain one reward signal at the end of each trajectory. A popular paradigm for this problem setting is learning with a designed auxiliary dense reward function, namely proxy reward, instead of sparse environmental signals. Based on this framework, this paper proposes a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning. We establish a surrogate problem by Monte-Carlo sampling that scales up least-squares-based reward redistribution to long-horizon problems. We analyze our surrogate loss function by connection with existing methods in the literature, which illustrates the algorithmic properties of our approach. In experiments, we extensively evaluate our proposed method on a variety of benchmark tasks with episodic rewards and demonstrate substantial improvement over baseline algorithms.
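The Monte-Carlo surrogate described above can be sketched as follows, under the assumption that the episodic return is regressed onto a rescaled sum of proxy rewards over a uniformly sampled subset of time steps. The function name, arguments and sampling scheme are illustrative and are not taken from the paper's code.

```python
import random


def rrd_surrogate_loss(proxy_rewards, episodic_return, subset_size, rng):
    """Squared error between the episodic return and a subsampled estimate of it.

    proxy_rewards holds the proxy reward predicted at each step of one
    trajectory. A random subset of steps is summed and rescaled by
    T / subset_size, which estimates the full sum without touching every
    step of a long trajectory.
    """
    horizon = len(proxy_rewards)
    idx = rng.sample(range(horizon), subset_size)
    estimate = (horizon / subset_size) * sum(proxy_rewards[t] for t in idx)
    return (episodic_return - estimate) ** 2


rng = random.Random(0)
# Using every step recovers the plain least-squares return decomposition loss.
full_loss = rrd_surrogate_loss([0.5, 1.5, 1.0, 1.0], 4.0, subset_size=4, rng=rng)
```

Smaller subsets make each update cheaper at the cost of a noisier estimate of the summed proxy reward.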

preprint · 2022 · arXiv

Monolithically integrated active passive waveguide array fabricated on thin film lithium niobate using a single continuous photolithography process

We demonstrate a robust low-loss optical interface by tiling passive (i.e., without doping of active ions) thin film lithium niobate (TFLN) and active (i.e., doped with rare earth ions) TFLN substrates for monolithic integration of passive/active lithium niobate photonics. The tiled substrates, composed of both active and passive areas, allow the mask of the integrated active-passive photonic device to be patterned at once using a single continuous photolithography process. The interface loss of the tiled substrate is measured to be as low as 0.26 dB. Thanks to the stability provided by this approach, a four-channel waveguide amplifier is realized in a straightforward manner, which shows a net gain of ~5 dB at 1550-nm wavelength and ~8 dB at 1530-nm wavelength for each channel. The robust low-loss optical interface for passive/active photonic integration will facilitate large-scale high-performance photonic devices that require on-chip light sources and amplifiers.

preprint · 2022 · arXiv

Monolithically integrated waveguide-coupled single-frequency microlaser on erbium-doped thin film lithium niobate

We overcome the difficulty in realizing a monolithic waveguide-coupled microring laser integrated on erbium-doped thin film lithium niobate (Er: TFLN) using the photolithography assisted chemo-mechanical etching (PLACE) technique. We demonstrate an integrated single-frequency microring laser operating around 1531 nm wavelength. The PLACE technique, enabling integrated Er: TFLN photonics with low propagation loss, can thus be used to realize low-cost mass production of monolithic on-chip microlasers with applications ranging from optical communication and photonic integrated circuits (PICs) to precision metrology and large-scale sensing.

preprint · 2022 · arXiv

On-chip integrated Yb3+-doped waveguide amplifiers on thin film lithium niobate

We report the fabrication and optical characterization of Yb3+-doped waveguide amplifiers (YDWA) on thin film lithium niobate fabricated by photolithography assisted chemo-mechanical etching. The fabricated Yb3+-doped lithium niobate waveguides demonstrate low propagation losses of 0.13 dB/cm at 1030 nm and 0.1 dB/cm at 1060 nm. Internal net gains of 5 dB at 1030 nm and 8 dB at 1060 nm are measured on a 4.0 cm long waveguide pumped by 976 nm laser diodes, indicating gains per unit length of 1.25 dB/cm at 1030 nm and 2 dB/cm at 1060 nm, respectively. The integrated Yb3+-doped lithium niobate waveguide amplifiers will benefit the development of a powerful gain platform and are expected to contribute to the high-density integration of thin film lithium niobate based photonic chips.
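The per-length figures follow directly from dividing each internal net gain by the 4.0 cm waveguide length; the symbols $g_{1030}$ and $g_{1060}$ are introduced here only for this check.

```latex
\[
g_{1030} = \frac{5\ \text{dB}}{4.0\ \text{cm}} = 1.25\ \text{dB/cm},
\qquad
g_{1060} = \frac{8\ \text{dB}}{4.0\ \text{cm}} = 2\ \text{dB/cm}.
\]
```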

preprint · 2022 · arXiv

Physical Backdoor Attacks to Lane Detection Systems in Autonomous Driving

Modern autonomous vehicles adopt state-of-the-art DNN models to interpret the sensor data and perceive the environment. However, DNN models are vulnerable to different types of adversarial attacks, which pose significant risks to the security and safety of the vehicles and passengers. One prominent threat is the backdoor attack, where the adversary can compromise the DNN model by poisoning the training samples. Although much effort has been devoted to investigating backdoor attacks on conventional computer vision tasks, their practicality and applicability to the autonomous driving scenario are rarely explored, especially in the physical world. In this paper, we target the lane detection system, which is an indispensable module for many autonomous driving tasks, e.g., navigation and lane switching. We design and realize the first physical backdoor attacks on such a system. Our attacks are comprehensively effective against different types of lane detection algorithms. Specifically, we introduce two attack methodologies (poison-annotation and clean-annotation) to generate poisoned samples. With those samples, the trained lane detection model will be infected with the backdoor, which can be activated by common objects (e.g., traffic cones) to make wrong detections, leading the vehicle to drive off the road or onto the opposite lane. Extensive evaluations on public datasets and physical autonomous vehicles demonstrate that our backdoor attacks are effective, stealthy and robust against various defense solutions. Our code and experimental videos can be found at https://sites.google.com/view/lane-detection-attack/lda.

preprint · 2022 · arXiv

Real-time Semantic Segmentation via Spatial-detail Guided Context Propagation

Nowadays, vision-based computing tasks play an important role in various real-world applications. However, many vision computing tasks, e.g., semantic segmentation, are usually computationally expensive, posing a challenge to computing systems that are resource-constrained but require fast response speed. Therefore, it is valuable to develop accurate and real-time vision processing models that only require limited computational resources. To this end, we propose the Spatial-detail Guided Context Propagation Network (SGCPNet) for achieving real-time semantic segmentation. In SGCPNet, we propose the strategy of spatial-detail guided context propagation. It uses the spatial details of shallow layers to guide the propagation of the low-resolution global contexts, in which the lost spatial information can be effectively reconstructed. In this way, the need for maintaining high-resolution features along the network is removed, which largely improves the model efficiency. On the other hand, due to the effective reconstruction of spatial details, the segmentation accuracy can still be preserved. In the experiments, we validate the effectiveness and efficiency of the proposed SGCPNet model. On the Cityscapes dataset, for example, our SGCPNet achieves 69.5% mIoU segmentation accuracy, while its speed reaches 178.5 FPS on 768x1536 images on a GeForce GTX 1080 Ti GPU card. In addition, SGCPNet is very lightweight and only contains 0.61 M parameters.

preprint · 2022 · arXiv

SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation

Automated segmentation in medical image analysis is a challenging task that requires a large amount of manually labeled data. However, most existing learning-based approaches usually suffer from limited manually annotated medical data, which poses a major practical problem for accurate and robust medical image segmentation. In addition, most existing semi-supervised approaches are usually not robust compared with the supervised counterparts, and also lack explicit modeling of geometric structure and semantic information, both of which limit the segmentation accuracy. In this work, we present SimCVD, a simple contrastive distillation framework that significantly advances state-of-the-art voxel-wise representation learning. We first describe an unsupervised training strategy, which takes two views of an input volume and predicts their signed distance maps of object boundaries in a contrastive objective, with only two independent dropout as mask. This simple approach works surprisingly well, performing on the same level as previous fully supervised methods with much less labeled data. We hypothesize that dropout can be viewed as a minimal form of data augmentation and makes the network robust to representation collapse. Then, we propose to perform structural distillation by distilling pair-wise similarities. We evaluate SimCVD on two popular datasets: the Left Atrial Segmentation Challenge (LA) and the NIH pancreas CT dataset. The results on the LA dataset demonstrate that, in two types of labeled ratios (i.e., 20% and 10%), SimCVD achieves an average Dice score of 90.85% and 89.03% respectively, a 0.91% and 2.22% improvement compared to previous best results. Our method can be trained in an end-to-end fashion, showing the promise of utilizing SimCVD as a general framework for downstream tasks, such as medical image synthesis, enhancement, and registration.

preprint · 2021 · arXiv

Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records

With the successful adoption of machine learning on electronic health records (EHRs), numerous computational models have been deployed to address a variety of clinical problems. However, due to the heterogeneity of EHRs, models trained on different patient groups suffer from poor generalizability. How to mitigate domain shifts between the source patient group on which the model is built and the target group on which the model will be deployed becomes a critical issue. In this paper, we propose a data augmentation method to facilitate domain adaptation, which leverages knowledge from the source patient group when training a model on the target one. Specifically, adversarially generated samples are used during domain adaptation to fill the generalization gap between the two patient groups. The proposed method is evaluated by a case study on different predictive modeling tasks on the MIMIC-III EHR dataset. Results confirm the effectiveness of our method and its generality across different tasks.

preprint · 2021 · arXiv

Dynamic Assortment Selection under the Nested Logit Models

We study a stylized dynamic assortment planning problem during a selling season of finite length $T$. At each time period, the seller offers an arriving customer an assortment of substitutable products and the customer makes the purchase among offered products according to a discrete choice model. The goal of the seller is to maximize the expected revenue, or equivalently, to minimize the worst-case expected regret. One key challenge is that utilities of products are unknown to the seller and need to be learned. Although the dynamic assortment planning problem has received increasing attention in revenue management, most existing work is based on the multinomial logit choice models (MNL). In this paper, we study the problem of dynamic assortment planning under a more general choice model -- the nested logit model, which models hierarchical choice behavior and is "the most widely used member of the GEV (generalized extreme value) family". By leveraging the revenue-ordered structure of the optimal assortment within each nest, we develop a novel upper confidence bound (UCB) policy with an aggregated estimation scheme. Our policy simultaneously learns customers' choice behavior and makes dynamic decisions on assortments based on the current knowledge. It achieves the accumulated regret at the order of $\tilde{O}(\sqrt{MNT})$, where $M$ is the number of nests and $N$ is the number of products in each nest. We further provide a lower bound result of $Ω(\sqrt{MT})$, which shows the near optimality of the upper bound when $T$ is much larger than $M$ and $N$. When the number of items per nest $N$ is large, we further provide a discretization heuristic for better performance of our algorithm. Numerical results are presented to demonstrate the empirical performance of our proposed algorithms.
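To make "revenue-ordered structure" concrete, the sketch below searches only over the top-k-by-revenue sets. It is a simplification under stated assumptions: it uses a plain MNL model with known attraction values `v` and a no-purchase weight of 1 rather than the nested logit model with learned parameters, and all names are hypothetical.

```python
def mnl_revenue(assortment, r, v):
    """Expected revenue of an assortment under MNL with no-purchase weight 1."""
    num = sum(r[i] * v[i] for i in assortment)
    den = 1.0 + sum(v[i] for i in assortment)
    return num / den


def best_revenue_ordered(r, v):
    """Search only the revenue-ordered sets: the k highest-revenue products, for each k."""
    order = sorted(range(len(r)), key=lambda i: -r[i])
    best_set, best_val = [], 0.0
    for k in range(1, len(order) + 1):
        candidate = order[:k]
        val = mnl_revenue(candidate, r, v)
        if val > best_val:
            best_set, best_val = candidate, val
    return best_set, best_val
```

The point of the structure is the search space: with N products there are only N revenue-ordered candidates instead of 2^N subsets, which is what makes a per-nest optimization step cheap.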

preprint2021arXiv

On-chip integrated waveguide amplifiers on Erbium-doped thin film lithium niobate on insulator

We demonstrate on-chip light amplification with an integrated optical waveguide fabricated on erbium-doped thin film lithium niobate on insulator (TFLNOI) using the photolithography assisted chemo-mechanical etching (PLACE) technique. A maximum internal net gain of 18 dB in the small-signal-gain regime is measured at the peak emission wavelength of 1530 nm for a waveguide length of 3.6 cm, indicating a differential gain per unit length of 5 dB/cm. This work paves the way to the monolithic integration of diverse active and passive photonic components on the TFLNOI platform.
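The gain per unit length quoted above follows directly from the two measured figures. As a quick check, with $G$ and $L$ as labels introduced here for the net gain and the waveguide length:

```latex
\[
g = \frac{G}{L} = \frac{18\ \mathrm{dB}}{3.6\ \mathrm{cm}} = 5\ \mathrm{dB/cm},
\qquad
10^{G/10} = 10^{1.8} \approx 63 .
\]
```

In linear terms, 18 dB therefore corresponds to roughly a 63-fold increase in signal power over the 3.6 cm waveguide.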

preprint2021arXiv

Probabilistic Programs with Stochastic Conditioning

We tackle the problem of conditioning probabilistic programs on distributions of observable variables. Probabilistic programs are usually conditioned on samples from the joint data distribution, which we refer to as deterministic conditioning. However, in many real-life scenarios, the observations are given as marginal distributions, summary statistics, or samplers. Conventional probabilistic programming systems lack adequate means for modeling and inference in such scenarios. We propose a generalization of deterministic conditioning to stochastic conditioning, that is, conditioning on the marginal distribution of a variable taking a particular form. To this end, we first define the formal notion of stochastic conditioning and discuss its key properties. We then show how to perform inference in the presence of stochastic conditioning. We demonstrate potential usage of stochastic conditioning on several case studies which involve various kinds of stochastic conditioning and are difficult to solve otherwise. Although we present stochastic conditioning in the context of probabilistic programming, our formalization is general and applicable to other settings.

preprint2021arXiv

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition

Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest assumptions, A-QED is both sound and complete. However, as is well-known, large design sizes significantly limit the scalability of formal verification, including A-QED. We overcome this scalability challenge through a new decomposition technique for A-QED, called A-QED with Decomposition (A-QED$^2$). A-QED$^2$ systematically decomposes an HA into smaller, functional sub-modules, called sub-accelerators, which are then verified independently using A-QED. We prove completeness of A-QED$^2$; in particular, if the full HA under verification contains a bug, then A-QED$^2$ ensures detection of that bug during A-QED verification of the corresponding sub-accelerators. Results on over 100 (buggy) versions of a wide variety of HAs with millions of logic gates demonstrate the effectiveness and practicality of A-QED$^2$.
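The self-consistency property described above can be illustrated on a toy model. The sketch below enumerates all input sequences up to a bound and reports a sequence in which the same input produces two different outputs. `ToyAccelerator`, its injected bug, and the explicit enumeration are assumptions made for illustration; the paper itself uses Bounded Model Checking on real hardware designs.

```python
from itertools import product


class ToyAccelerator:
    """Hypothetical stateful accelerator; `buggy` corrupts the third operation."""

    def __init__(self, buggy=False):
        self.buggy = buggy
        self.count = 0

    def step(self, x):
        self.count += 1
        out = (3 * x + 1) % 8
        if self.buggy and self.count == 3:
            out ^= 1  # injected bug: internal state leaks into the output
        return out


def find_inconsistency(make_accel, inputs, bound):
    """Return an input sequence that violates self-consistency, or None.

    Self-consistency: within one sequence of operations, identical inputs
    must always produce identical outputs.
    """
    for length in range(2, bound + 1):
        for seq in product(inputs, repeat=length):
            accel = make_accel()  # fresh accelerator for every sequence
            seen = {}
            for x in seq:
                y = accel.step(x)
                if x in seen and seen[x] != y:
                    return seq
                seen[x] = y
    return None


print(find_inconsistency(lambda: ToyAccelerator(buggy=False), range(4), 4))
print(find_inconsistency(lambda: ToyAccelerator(buggy=True), range(4), 4))
```

The check needs no functional specification of the accelerator: it only compares the design against itself, which is the point of the self-consistency formulation.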

preprint2021arXiv

Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Linear contextual bandits are an important class of sequential decision making problems with a wide range of applications to recommender systems, online advertising, healthcare, and many other machine learning related tasks. While there is a lot of prior research, tight regret bounds for linear contextual bandits with infinite action sets remain open. In this paper, we address this open problem by considering the linear contextual bandit with (changing) infinite action sets. We prove a regret upper bound on the order of $O(\sqrt{d^2T\log T})\times \text{poly}(\log\log T)$ where $d$ is the domain dimension and $T$ is the time horizon. Our upper bound matches the previous lower bound of $\Omega(\sqrt{d^2 T\log T})$ in [Li et al., 2019] up to iterated logarithmic terms.

78 published item(s)

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Drivora: A Unified and Extensible Infrastructure for Search-based Autonomous Driving Testing

Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

Perturbed Proximal Gradient ADMM for Nonconvex Composite Optimization

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models

Video Generation Models Are Good Latent Reward Models

A global second order Sobolev regularity for $p$-Laplacian type equations with variable coefficients in bounded domains

AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning

Adaptive Double-Exploration Tradeoff for Outlier Detection

Automatic Map Generation for Autonomous Driving System Testing

Dynamic Car Dispatching and Pricing: Revenue and Fairness for Ridesharing Platforms

Generating Adjacency Matrix for Video Relocalization

Graph Neural Network for Video Relocalization

Imitation Learning from Observations under Transition Model Disparity

Interior Hölder regularity for stable solutions to semilinear elliptic equations up to dimension 5

Learning Long-Term Reward Redistribution via Randomized Return Decomposition

Monolithically integrated active passive waveguide array fabricated on thin film lithium niobate using a single continuous photolithography process

Monolithically integrated waveguide-coupled single-frequency microlaser on erbium-doped thin film lithium niobate

On-chip integrated Yb3+-doped waveguide amplifiers on thin film lithium niobate

Physical Backdoor Attacks to Lane Detection Systems in Autonomous Driving

Real-time Semantic Segmentation via Spatial-detail Guided Context Propagation

SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation

Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records

Dynamic Assortment Selection under the Nested Logit Models

On-chip integrated waveguide amplifiers on Erbium-doped thin film lithium niobate on insulator

Probabilistic Programs with Stochastic Conditioning

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition

Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Anypath Routing Protocol Design via Q-Learning for Underwater Sensor Networks

Collaborative Top Distribution Identifications with Limited Interaction

Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support

Domain Adaptive Adversarial Learning Based on Physics Model Feedback for Underwater Image Enhancement

Dual-reference Age Synthesis

Efficient Competitive Self-Play Policy Optimization

Equivariant Perturbation in Gomory and Johnson's Infinite Group Problem. VII. Inverse semigroup theory, closures, decomposition of perturbations

Furnishing Your Room by What You See: An End-to-End Furniture Set Retrieval Framework with Rich Annotated Benchmark Dataset

Giant anisotropy of spin relaxation and spin-valley mixing in a silicon quantum dot

Improving mobility of silicon metal-oxide-semiconductor devices for quantum dots by high vacuum activation annealing

Multi-IF: An Approach to Anomaly Detection in Self-Driving Systems

Multinomial Logit Bandit with Low Switching Cost

Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

Phononic waveguide assisted steady state entanglement of SiV centers

Pooling Regularized Graph Neural Network for fMRI Biomarker Analysis

Soft-Root-Sign Activation Function

Stochastically Differentiable Probabilistic Programs

Quasicrystalline Chern Insulators

A quasiconformal composition problem for the Q-spaces

Hypercontractive inequalities via SOS, and the Frankl--Rödl graph

Lump solutions to nonlinear partial differential equations via Hirota bilinear forms

Parameterized Algorithms for Constraint Satisfaction Problems Above Average with Global Cardinality Constraints

Toward computer-assisted discovery and automated proofs of cutting plane theorems

A spatial compositional model (SCM) for linear unmixing and endmember uncertainty estimation

Satisfiability of Ordering CSPs Above Average

Uniqueness of absolute minimizers for $L^\infty$-functionals involving Hamiltonians $H(x,p)$

Constant Factor Lasserre Integrality Gaps for Graph Partitioning Problems

Everywhere differentiability of viscosity solutions to a class of Aronsson's equations

Hardness of robust graph isomorphism, Lasserre gaps, and asymmetry of random graphs

Hypercontractivity, Sum-of-Squares Proofs, and their Applications

Intrinsic Geometry and Analysis of Diffusion Processes and $L^\infty$-Variational Problems

Locally Testable Codes and Cayley Graphs

Resonance, Fermi surface topology, and Superconductivity in Cuprates

Approximability and proof complexity

Geometry and Analysis of Dirichlet forms

Left Derivations and Strong Commutativity Preserving Maps on Semiprime $\Gamma$-Rings

Transversal Propagation Behaviors of Helical Edge States In a QSH System

Approximation Algorithms and Hardness of the k-Route Cut Problem

Characterizations of Besov and Triebel-Lizorkin Spaces on Metric Measure Spaces

Polynomial integrality gaps for strong SDP relaxations of Densest k-subgraph

Anisotropic Singular Integrals in Product Spaces

Criteria for Optimal Global Integrability of Hajłasz-Sobolev Functions

Hajłasz-Sobolev Imbedding and Extension