Researcher profile

Chunlin Chen

Chunlin Chen contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
13 works
0 followers
12 topics
4 close collaborators

Published work

13 published items

Preprint, 2026, arXiv

Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation

Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning large-scale generative models, such as diffusion and flow models, to align with complex human preferences and user-specified tasks. A fundamental limitation remains the curse of diversity collapse, where the objective formulation and optimization landscape inherently collapse the policy to a Dirac delta distribution. To address this challenge, we propose DRIFT (DiveRsity-Incentivized Reinforcement Fine-Tuning for Versatile Image Generation), an innovative framework that systematically incentivizes output diversity throughout the on-policy fine-tuning process, reconciling strong task alignment with high generation diversity to enhance versatility essential for applications that demand diverse candidate generations. We approach the problem across three representative perspectives: i) sampling a reward-concentrated subset that filters out reward outliers to prevent premature collapse; ii) prompting with stochastic variations to expand the conditioning space; and iii) optimization of the intra-group diversity with a potential-based reward shaping mechanism. Experimental results show that DRIFT achieves superior Pareto dominance regarding task alignment and generation diversity, yielding a 9.08% to 43.46% increase in diversity at equivalent alignment levels and a 59.65% to 65.86% increase in alignment at equivalent levels of diversity.

Preprint, 2023, arXiv

Depthwise Convolution for Multi-Agent Communication with Enhanced Mean-Field Approximation

Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge in settings where a large number of agents coexist. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To more effectively coordinate behaviors of neighboring agents, we enhance the mean-field approximation by a supervised policy rectification network (PRN) for rectifying real-time agent interactions and by a learnable compensation term for correcting the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).

Preprint, 2022, arXiv

A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning

While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture, so that the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.

Preprint, 2022, arXiv

Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments

Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust the previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
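
The instance weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the IW-IES algorithm from the paper: the novelty metric used here (Euclidean distance from the previous optimum, normalized to sum to 1), the function names, and all hyperparameters are assumptions chosen for illustration. Each perturbed candidate's contribution to a plain ES update is scaled by its novelty weight.

```python
import math
import random


def novelty_weights(candidates, prev_optimum):
    # Hypothetical novelty metric (an assumption, not the paper's formula):
    # distance from the previous optimum, normalized so weights sum to 1.
    dists = [math.dist(c, prev_optimum) for c in candidates]
    total = sum(dists)
    if total == 0.0:
        return [1.0 / len(candidates)] * len(candidates)
    return [d / total for d in dists]


def weighted_es_step(theta, fitness, prev_optimum, rng,
                     sigma=0.1, lr=0.01, pop=20):
    # Sample Gaussian perturbations around the current parameters.
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop)]
    candidates = [[t + sigma * e for t, e in zip(theta, eps)] for eps in noises]
    # Evaluate candidates in the new environment and standardize the returns.
    returns = [fitness(c) for c in candidates]
    mean = sum(returns) / pop
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / pop) or 1.0
    advantages = [(r - mean) / std for r in returns]
    # Scale each candidate's contribution by its instance weight.
    weights = novelty_weights(candidates, prev_optimum)
    grad = [
        sum(w * a * eps[j] for w, a, eps in zip(weights, advantages, noises)) / sigma
        for j in range(len(theta))
    ]
    return [t + lr * g for t, g in zip(theta, grad)]


def adapt(theta, fitness, prev_optimum, steps=200, seed=0):
    # Repeatedly apply the weighted update, starting from the old policy.
    rng = random.Random(seed)
    for _ in range(steps):
        theta = weighted_es_step(theta, fitness, prev_optimum, rng)
    return theta


if __name__ == "__main__":
    old_optimum = [0.0, 0.0]   # optimum of the original environment
    new_target = [2.0, 2.0]    # optimum after the environment changes
    new_fitness = lambda x: -math.dist(x, new_target) ** 2
    print(adapt(old_optimum, new_fitness, old_optimum))
```

A quality-based weight, as the abstract describes, could be swapped in by replacing `novelty_weights` with a function of the candidates' returns in the new environment.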

Preprint, 2022, arXiv

On compression rate of quantum autoencoders: Control design, numerical and experimental realization

Quantum autoencoders, which aim to compress quantum information into a low-dimensional latent space, lie at the heart of automatic data compression in the field of quantum information. In this paper, we establish an upper bound on the compression rate for a given quantum autoencoder and present a learning control approach for training the autoencoder to achieve the maximal compression rate. The upper bound on the compression rate is theoretically proven using eigen-decomposition and matrix differentiation, and is determined by the eigenvalues of the density matrix representation of the input states. Numerical results on 2-qubit and 3-qubit systems are presented to demonstrate how to train the quantum autoencoder to achieve the theoretically maximal compression, and the training performance using different machine learning algorithms is compared. Experimental results of a quantum autoencoder using quantum optical systems are illustrated for compressing two 2-qubit states into two 1-qubit states.

Preprint, 2022, arXiv

Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction

Intelligent robots designed to interact with humans in real scenarios need to be able to refer to entities actively by natural language. In spatial referring expression generation, ambiguity is unavoidable due to the diversity of reference frames, which leads to an understanding gap between humans and robots. To narrow this gap, in this paper, we propose a novel perspective-corrected spatial referring expression generation (PcSREG) approach for human-robot interaction by considering the selection of reference frames. The task of referring expression generation is simplified into the process of generating diverse spatial relation units. First, we pick out all landmarks in these spatial relation units according to the entropy of preference and allow its updating through a stack model. Then all possible referring expressions are generated according to different reference frame strategies. Finally, we evaluate every expression using a probabilistic referring expression resolution model and find the best expression that satisfies both appropriateness and effectiveness. We implement the proposed approach on a robot system, and empirical experiments show that our approach can generate more effective spatial referring expressions for practical applications.

Preprint, 2022, arXiv

Shaping Visual Representations with Attributes for Few-Shot Recognition

Few-shot recognition aims to recognize novel categories under low-data regimes. Some recent few-shot recognition methods introduce an auxiliary semantic modality, i.e., category attribute information, into representation learning, which enhances the feature discrimination and improves the recognition performance. Most of these existing methods only consider the attribute information of the support set while ignoring the query set, resulting in a potential loss of performance. In this letter, we propose a novel attribute-shaped learning (ASL) framework, which can jointly perform query attribute generation and discriminative visual representation learning for few-shot recognition. Specifically, a visual-attribute predictor (VAP) is constructed to predict the attributes of queries. By leveraging the attribute information, an attribute-visual attention module (AVAM) is designed, which can adaptively utilize attributes and visual representations to learn more discriminative features. Under the guidance of the attribute modality, our method can learn enhanced semantic-aware representations for classification. Experiments demonstrate that our method can achieve competitive results on CUB and SUN benchmarks. Our source code is available at: https://github.com/chenhaoxing/ASL.

Preprint, 2021, arXiv

Curriculum-based Deep Reinforcement Learning for Quantum Control

Deep reinforcement learning has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve fast and precise control for quantum systems, we propose a novel deep reinforcement learning approach by constructing a curriculum consisting of a set of intermediate tasks defined by a fidelity threshold. Tasks in a curriculum can be statically determined using empirical knowledge or adaptively generated with the learning process. By transferring knowledge between two successive tasks and sequencing tasks according to their difficulties, the proposed curriculum-based deep reinforcement learning (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical simulations on closed quantum systems and open quantum systems demonstrate that the proposed method exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with fewer control pulses.

Preprint, 2021, arXiv

Deep Reinforcement Learning with Quantum-inspired Experience Replay

In this paper, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed deep reinforcement learning with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity of each experience (also called a transition) and the number of times it has been replayed, to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations, and then the preparation operation and the depreciation operation are performed on the transitions. In this process, the preparation operation reflects the relationship between the temporal difference errors (TD-errors) and the importance of the experiences, while the depreciation operation is taken into account to ensure the diversity of the transitions. The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency, and is also applicable to memory-based DRL approaches such as double network and dueling network.

Preprint, 2021, arXiv

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes, and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios. In this paper, we propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments. We develop and maintain a library that contains an infinite mixture of parameterized environment models, which is equivalent to clustering environment parameters in a latent space. The prior distribution over the mixture is formulated as a Chinese restaurant process (CRP), which incrementally instantiates new environment models without any external information to signal environmental changes in advance. During lifelong learning, we employ the expectation maximization (EM) algorithm with online Bayesian inference to update the mixture in a fully incremental manner. In EM, the E-step involves estimating the posterior expectation of environment-to-cluster assignments, while the M-step updates the environment parameters for future learning. This method allows for all environment models to be adapted as necessary, with new models instantiated for environmental changes and old models retrieved when previously seen environments are encountered again. Experiments demonstrate that LLIRL outperforms relevant existing methods, and enables effective incremental adaptation to various dynamic environments for lifelong learning.
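
The Chinese restaurant process prior and the E-step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LLIRL implementation: the function names, the concentration parameter `alpha`, and the externally supplied likelihood values are hypothetical. Under a CRP, an existing cluster `k` with `n_k` members has prior probability `n_k / (N + alpha)` and a new cluster has `alpha / (N + alpha)`. The posterior over assignments multiplies that prior by a per-cluster likelihood and renormalizes.

```python
def crp_prior(counts, alpha):
    # counts[k] is the number of past environments assigned to cluster k.
    # The final entry of the result is the probability of a new cluster.
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def assignment_posterior(counts, alpha, likelihoods):
    # likelihoods holds one value per existing cluster plus one for a
    # potential new cluster; how they are computed is left as an assumption.
    prior = crp_prior(counts, alpha)
    if len(likelihoods) != len(prior):
        raise ValueError("need one likelihood per existing cluster plus one")
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


if __name__ == "__main__":
    counts = [3, 1]  # two existing environment clusters
    print(crp_prior(counts, alpha=1.0))
    # Data that the existing models explain poorly favors a new cluster.
    print(assignment_posterior(counts, 1.0, [0.1, 0.1, 0.8]))
```

When the last entry of the posterior is the largest, a new environment model would be instantiated; otherwise the best-matching existing model is retrieved and updated.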

Preprint, 2021, arXiv

Measuring phonon dispersion at an interface

The breakdown of translational symmetry at heterointerfaces leads to the emergence of new phonon modes localized near the interface. These interface phonons play an essential role in thermal/electrical transport properties in devices, especially in miniature ones wherein the interface may dominate the entire response of the device. Knowledge of phonon dispersion at interfaces is therefore highly desirable for device design and optimization. Although theoretical work began decades ago, experimental research has been entirely absent due to challenges in achieving the combined spatial, momentum and spectral resolutions required to probe localized phonon modes. Here we use electron energy loss spectroscopy in an electron microscope to directly measure both the local phonon density of states and the interface phonon dispersion relation for an epitaxial cBN-diamond heterointerface. In addition to bulk phonon modes, we observe acoustic and optical phonon modes localized at the interface, and modes isolated away from the interface. These features only appear within ~1 nm around the interface. The experimental results can be nicely reproduced by ab initio calculations. Our findings provide insights into lattice dynamics at heterointerfaces and should be practically useful in thermal/electrical engineering.

Preprint, 2020, arXiv

Learning-based Quantum Robust Control: Algorithm, Applications and Experiments

Robust control design for quantum systems has been recognized as a key task in quantum information technology, molecular chemistry and atomic physics. In this paper, an improved differential evolution algorithm, referred to as msMS_DE, is proposed to search robust fields for various quantum control problems. In msMS_DE, multiple samples are used for fitness evaluation and a mixed strategy is employed for the mutation operation. In particular, the msMS_DE algorithm is applied to the control problems of (i) open inhomogeneous quantum ensembles and (ii) the consensus goal of a quantum network with uncertainties. Numerical results are presented to demonstrate the excellent performance of the improved machine learning algorithm for these two classes of quantum robust control problems. Furthermore, msMS_DE is experimentally implemented on femtosecond laser control applications to optimize two-photon absorption and control fragmentation of the molecule CH2BrI. Experimental results demonstrate excellent performance of msMS_DE in searching for effective femtosecond laser pulses for various tasks.

Preprint, 2020, arXiv

Realization of a quantum autoencoder for lossless compression of quantum data

As a ubiquitous aspect of modern information technology, data compression has a wide range of applications. Therefore, a quantum autoencoder which can compress quantum information into a low-dimensional space is fundamentally important to achieve automatic data compression in the field of quantum information. Such a quantum autoencoder can be implemented through training the parameters of a quantum device using classical optimization algorithms. In this article, we analyze the condition for achieving a perfect quantum autoencoder and theoretically prove that a quantum autoencoder can losslessly compress high-dimensional quantum information into a low-dimensional space (also called latent space) if the maximum number of linearly independent vectors among the input states is no more than the dimension of the latent space. Also, we experimentally realize a universal two-qubit unitary gate and design a quantum autoencoder device by applying a machine learning method. Experimental results demonstrate that our quantum autoencoder is able to compress two two-qubit states into two one-qubit states. Besides compressing quantum information, the quantum autoencoder is used to experimentally discriminate two groups of nonorthogonal states.
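
The lossless-compression condition stated in the abstract above can be checked numerically for pure states given as amplitude vectors. The sketch below is an illustration under assumptions, not code from the paper: the function names and the tolerance are hypothetical, and the rank is computed by plain Gaussian elimination over complex numbers. It returns whether the number of linearly independent input vectors is no more than the latent dimension.

```python
def vector_rank(vectors, tol=1e-9):
    # Rank of a set of complex vectors via Gaussian elimination with
    # partial pivoting; entries below tol are treated as zero.
    rows = [[complex(x) for x in v] for v in vectors]
    if not rows:
        return 0
    rank = 0
    for col in range(len(rows[0])):
        pivot = max(range(rank, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def losslessly_compressible(states, latent_dim):
    # Condition from the abstract: the number of linearly independent
    # input states must not exceed the dimension of the latent space.
    return vector_rank(states) <= latent_dim


if __name__ == "__main__":
    s = 2 ** -0.5
    ket00 = [1, 0, 0, 0]   # |00>
    bell = [s, 0, 0, s]    # (|00> + |11>) / sqrt(2)
    ket01 = [0, 1, 0, 0]   # |01>
    # Two two-qubit states fit a one-qubit (dimension 2) latent space.
    print(losslessly_compressible([ket00, bell], latent_dim=2))
    # A third independent state exceeds the latent dimension.
    print(losslessly_compressible([ket00, bell, ket01], latent_dim=2))
```

The first case mirrors the experiment described in the abstract, where two two-qubit states are compressed into one-qubit states.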