Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
57 works
0 followers
30 topics
4 close collaborators

Published work

57 published item(s)

preprint · 2026 · arXiv

RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation

Radiology report generation (RRG) has emerged as a promising approach to alleviate radiologists' workload and reduce human errors by automatically generating diagnostic reports from medical images. A key challenge in RRG is achieving fine-grained alignment between complex visual features and the hierarchical structure of long-form radiology reports. Although recent methods have improved image-text representation learning, they often treat reports as flat sequences, overlooking their structured sections and semantic hierarchies. This simplification hinders precise cross-modal alignment and weakens RRG accuracy. To address this challenge, we propose RIHA (Report-Image Hierarchical Alignment Transformer), a novel end-to-end framework that performs multi-level alignment between radiological images and their corresponding reports across paragraph, sentence, and word levels. This hierarchical alignment enables more precise cross-modal mapping, essential for capturing the nuanced semantics embedded in clinical narratives. Specifically, RIHA introduces a Visual Feature Pyramid (VFP) to extract multi-scale visual features and a Text Feature Pyramid (TFP) to represent multi-granularity textual structures. These components are integrated through a Cross-modal Hierarchical Alignment (CHA) module, leveraging optimal transport to effectively align visual and textual features across various levels. Furthermore, we incorporate Relative Positional Encoding (RPE) into the decoder to model spatial and semantic relationships among tokens, enhancing the token-level alignment between visual features and generated text. Extensive experiments on two benchmark chest X-ray datasets, IU-Xray and MIMIC-CXR, demonstrate that RIHA outperforms existing state-of-the-art models in both natural language generation and clinical efficacy metrics.
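The abstract above mentions optimal transport as the tool for aligning visual and textual features. The following is a minimal sketch of generic entropic optimal transport (Sinkhorn iterations) between two small feature sets, not the authors' CHA module. The function names `cosine_cost` and `sinkhorn`, the uniform marginals, the cosine cost, and the values of `eps` and `n_iters` are all assumptions chosen for illustration.

```python
import math


def cosine_cost(xs, ys):
    """Cost matrix 1 - cosine similarity between two lists of non-zero vectors."""
    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals (assumed) for a given cost matrix."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each "visual" feature (assumed uniform)
    b = [1.0 / m] * m  # mass on each "text" feature (assumed uniform)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy features standing in for visual and textual embeddings (hypothetical data).
visual_feats = [[1.0, 0.0], [0.0, 1.0]]
text_feats = [[1.0, 0.1], [0.1, 1.0]]
plan = sinkhorn(cosine_cost(visual_feats, text_feats))
```

Each entry `plan[i][j]` can be read as how much of visual feature `i` is matched to text feature `j`; a hierarchical scheme would presumably compute such a plan at each level, but that detail is not given in the abstract.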

preprint · 2022 · arXiv

A Survey on Model-based Reinforcement Learning

Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process interacting with the environment. While RL achieves outstanding success in playing complex video games that allow huge trial-and-error, making errors is always undesired in the real world. To improve the sample efficiency and thus reduce the errors, model-based reinforcement learning (MBRL) is believed to be a promising direction, which builds environment models in which the trial-and-errors can take place without real costs. In this survey, we take a review of MBRL with a focus on the recent progress in deep RL. For non-tabular environments, there is always a generalization error between the learned environment model and the real environment. As such, it is of great importance to analyze the discrepancy between policy training in the environment model and that in the real environment, which in turn guides the algorithm design for better model learning, model usage, and policy training. Besides, we also discuss the recent advances of model-based techniques in other forms of RL, including offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Moreover, we discuss the applicability and advantages of MBRL in real-world tasks. Finally, we end this survey by discussing the promising prospects for the future development of MBRL. We think that MBRL has great potential and advantages in real-world applications that were overlooked, and we hope this survey could attract more research on MBRL.

preprint · 2022 · arXiv

Accelerated quantum adiabatic transfer in superconducting qubits

Quantum adiabatic transfer is widely used in quantum computation and quantum simulation. However, the transfer speed is limited by the quantum adiabatic approximation condition, which hinders its application in quantum systems with a short decoherence time. Here we demonstrate quantum adiabatic state transfers that jump along geodesics in one-qubit and two-qubit superconducting transmons. This approach possesses the advantages of speed, robustness, and high fidelity compared with the usual adiabatic process. Our protocol provides feasible strategies for improving state manipulation and gate operation in superconducting quantum circuits.

preprint · 2022 · arXiv

Active Hierarchical Exploration with Stable Subgoal Representation Learning

Goal-conditioned hierarchical reinforcement learning (GCHRL) provides a promising approach to solving long-horizon tasks. Recently, its success has been extended to more general settings by concurrently learning hierarchical policies and subgoal representations. Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level policy learning relies on external rewards. As the high-level policy selects subgoals in an online learned representation space, the dynamic change of the subgoal space severely hinders effective high-level exploration. In this paper, we propose a novel regularization that contributes to both stable and efficient subgoal representation learning. Building upon the stable representation, we design measures of novelty and potential for subgoals, and develop an active hierarchical exploration strategy that seeks out new promising subgoals and states without intrinsic rewards. Experimental results show that our approach significantly outperforms state-of-the-art baselines in continuous control tasks with sparse rewards.

preprint · 2022 · arXiv

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches to allow adapting an existing language identification model without retraining the model parameters for a new domain. We perform a comparative study of different model topologies under different constraints of model size, and find that conformer-based models significantly outperform LSTM and transformer based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.
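One way to picture attention-weighted pooling in a recurrent, streaming form is a running softmax-weighted average that is updated one frame at a time. The sketch below is a generic illustration under stated assumptions, not the paper's architecture: the class name `StreamingAttentivePool`, the dot-product scoring against a fixed `query` vector, and the toy frames are all hypothetical.

```python
import math


class StreamingAttentivePool:
    """Running softmax-weighted mean of frames, updated one frame at a time."""

    def __init__(self, query):
        self.query = query              # hypothetical fixed scoring vector
        self.max_score = -math.inf      # running max, kept for numerical stability
        self.denom = 0.0                # sum of exp(score - max_score) so far
        self.numer = [0.0] * len(query)  # weighted sum of frames so far

    def update(self, frame):
        score = sum(q * x for q, x in zip(self.query, frame))
        new_max = max(self.max_score, score)
        # Rescale the accumulated state if the running max changed.
        scale = math.exp(self.max_score - new_max) if self.denom > 0 else 0.0
        weight = math.exp(score - new_max)
        self.numer = [scale * n + weight * x for n, x in zip(self.numer, frame)]
        self.denom = scale * self.denom + weight
        self.max_score = new_max
        return [n / self.denom for n in self.numer]


# Toy usage: the pooled vector after each frame depends only on the carried state.
pool = StreamingAttentivePool(query=[1.0, -1.0])
for frame in ([0.2, 0.1], [1.5, 0.3], [0.4, 2.0]):
    pooled = pool.update(frame)
```

Because only the numerator, denominator, and running maximum are carried forward, memory stays constant regardless of audio length, which is the property a streaming setting needs.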

preprint · 2022 · arXiv

Context-Aware Sparse Deep Coordination Graphs

Learning sparse coordination graphs adaptive to the coordination dynamics among agents is a long-standing problem in cooperative multi-agent learning. This paper studies this problem and proposes a novel method using the variance of payoff functions to construct context-aware sparse coordination topologies. We theoretically consolidate our method by proving that the smaller the variance of payoff functions is, the less likely action selection will change after removing the corresponding edge. Moreover, we propose to learn action representations to effectively reduce the influence of payoff functions' estimation errors on graph construction. To empirically evaluate our method, we present the Multi-Agent COordination (MACO) benchmark by collecting classic coordination problems in the literature, increasing their difficulty, and classifying them into different types. We carry out a case study and experiments on the MACO and StarCraft II micromanagement benchmark to demonstrate the dynamics of sparse graph learning, the influence of graph sparseness, and the learning performance of our method. (The MACO benchmark and codes are publicly available at https://github.com/TonghanWang/CASEC-MACO-benchmark.)
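The idea of using payoff variance to decide which edges to keep can be sketched as follows. This is an illustrative simplification, not the CASEC implementation: the abstract does not give the exact variance definition, so the sketch assumes the population variance over all joint actions of a pairwise payoff table, and the function name `sparsify` and the toy payoffs are hypothetical.

```python
from statistics import pvariance


def sparsify(payoffs, keep):
    """Keep the `keep` edges whose pairwise payoff tables vary the most.

    payoffs maps an edge (i, j) to a table q_ij[a_i][a_j] of payoff values.
    Assumption: the score is the variance over all entries of the table.
    """
    scores = {
        edge: pvariance([q for row in table for q in row])
        for edge, table in payoffs.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep]), scores


# Toy three-agent example (hypothetical numbers).
payoffs = {
    (0, 1): [[0.0, 5.0], [5.0, 0.0]],  # payoff depends strongly on the joint action
    (1, 2): [[1.0, 1.0], [1.0, 1.0]],  # constant payoff: the edge carries no signal
    (0, 2): [[0.0, 1.0], [1.0, 0.0]],
}
kept_edges, edge_scores = sparsify(payoffs, keep=2)
```

A constant payoff table has zero variance, so removing its edge cannot change which joint action is best; this matches the intuition in the abstract that low-variance edges are the safest to drop.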

preprint · 2022 · arXiv

Continuous-Time and Event-Triggered Online Optimization for Linear Multi-Agent Systems

This paper studies the decentralized online convex optimization problem for heterogeneous linear multi-agent systems. Agents have access to their time-varying local cost functions related to their own outputs, and there are also time-varying coupling inequality constraints among them. The goal of each agent is to minimize the global cost function by selecting appropriate local actions only through communication between neighbors. We design a distributed controller based on the saddle-point method which achieves constant regret bound and sublinear fit bound. In addition, to reduce the communication overhead, we propose an event-triggered communication scheme and show that the constant regret bound and sublinear fit bound are still achieved in the case of discrete communications with no Zeno behavior. A numerical example is provided to verify the proposed algorithms.

preprint · 2022 · arXiv

Convolutional Neural Networks with A Topographic Representation Module for EEG-Based Brain-Computer Interfaces

Objective: Convolutional Neural Networks (CNNs) have shown great potential in the field of Brain-Computer Interfaces (BCIs). The raw Electroencephalogram (EEG) signal is usually represented as a 2-Dimensional (2-D) matrix composed of channels and time points, which ignores the spatial topological information. Our goal is to give a CNN that takes the raw EEG signal as input the ability to learn EEG spatial topological features, and to improve its performance while essentially maintaining its original structure. Methods: We propose an EEG Topographic Representation Module (TRM). This module consists of (1) a mapping block from the raw EEG signal to a 3-D topographic map and (2) a convolution block from the topographic map to an output of the same size as the input. According to the size of the kernel used in the convolution block, we design 2 types of TRMs, namely TRM-(5,5) and TRM-(3,3). We embed the TRM into 3 widely used CNNs and test them on 2 publicly available datasets (Emergency Braking During Simulated Driving Dataset (EBDSDD), and High Gamma Dataset (HGD)). Results: The results show that the classification accuracies of all 3 CNNs are improved on both datasets after using the TRM. With TRM-(5,5), the average accuracies of DeepConvNet, EEGNet and ShallowConvNet are improved by 6.54%, 1.72% and 2.07% on EBDSDD, and by 6.05%, 3.02% and 5.14% on HGD, respectively; with TRM-(3,3), they are improved by 7.76%, 1.71% and 2.17% on EBDSDD, and by 7.61%, 5.06% and 6.28% on HGD, respectively. Significance: We improve the classification performance of 3 CNNs on 2 datasets by the use of TRM, indicating that it has the capability to mine the EEG spatial topological information. In addition, since the output of TRM has the same size as the input, CNNs with the raw EEG signal as input can use this module without changing their original structures.

preprint · 2022 · arXiv

Cosmological perturbations in the spatially covariant gravity with a dynamical lapse function

We investigate the scalar perturbations in a class of spatially covariant gravity theory with a dynamical lapse function. Generally, there are two scalar degrees of freedom due to the presence of the velocity of the lapse function. We treat the scalar perturbations as analogues of those in a two-field inflationary model, in which one is the light mode and the other is the heavy mode. This is justified by the fact that the scalar mode due to the dynamical lapse function becomes infinitely heavy in the limit where the lapse function reduces to an auxiliary variable. The standard approaches to multiple-field perturbations can be applied to our model. By integrating out the heavy mode and deriving the effective theory for the single light field, we find the solution to the single mode in the form of plane waves. Then we calculate the corrections to the power spectrum of the light mode from the heavy mode, making use of the standard perturbative method of field theory. Finally, when the two fields are not weakly coupled, we find a power-law mode for the coupled system on large scales.

preprint · 2022 · arXiv

EEG-Based Detection of Braking Intention During Simulated Driving

Accurately detecting and identifying drivers' braking intention is the basis of man-machine driving. In this paper, we proposed an electroencephalographic (EEG)-based braking intention measurement strategy. We used the Car Learning to Act (Carla) platform to build the simulated driving environment. 11 subjects participated in our study, and each subject drove a simulated vehicle to complete emergency braking and normal braking tasks. We compared the EEG topographic maps in different braking situations and used three different classifiers to predict the subjects' braking intention through EEG signals. The experimental results showed that the average response time of subjects in emergency braking was 762 ms; emergency braking and no braking can be well distinguished, while normal braking and no braking were not easy to be classified; for the two different types of braking, emergency braking and normal braking had obvious differences in EEG topographic maps, and the classification results also showed that the two were highly distinguishable. This study provides a user-centered driver-assistance system and a good framework to combine with advanced shared control algorithms, which has the potential to be applied to achieve a more friendly interaction between the driver and vehicle in real driving environment.

preprint · 2022 · arXiv

Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library

Mathematical reasoning has recently been shown to be a hard challenge for neural systems. Abilities including expression translation, logical reasoning, and mathematical knowledge acquisition appear to be essential to overcome the challenge. This paper demonstrates that some abilities can be achieved through abductive combination with discrete systems that have been programmed with human knowledge. On a mathematical reasoning dataset, we adopt the recently proposed abductive learning framework, and propose the ABL-Sym algorithm that combines the Transformer neural models with a symbolic mathematics library. ABL-Sym shows 9.73% accuracy improvement on the interpolation tasks and 47.22% accuracy improvement on the extrapolation tasks, over the state-of-the-art approaches. Online demonstration: http://math.polixir.ai

preprint · 2022 · arXiv

Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolution, convolutional LSTM), which only make use of the local context. In this paper, we propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance, by progressively capturing the global context. We firstly develop a hierarchy Transformer to capture intra-video relation that includes richer spatial and temporal cues from neighbor pixels and previous frames. A joint space-time window shift scheme is proposed to efficiently aggregate these two cues into each pixel embedding. Then, we explore inter-video relation via pixel-to-pixel contrastive learning, which well structures the global embedding space. A multi-source contrast training objective is developed to group the pixel embeddings across videos with the ground-truth guidance, which is crucial for learning the global property of the whole data. We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches. Code is available at https://github.com/YuemingJin/STswinCL.

preprint · 2022 · arXiv

Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

Value function estimation is an indispensable subroutine in reinforcement learning, which becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error, which trades off bias and variance by balancing between the value estimation from offline data and the learned model. Theoretical analysis discloses that HVE enjoys a better error bound than the direct methods. HVE can be leveraged in both off-policy evaluation and offline reinforcement learning settings. We, therefore, provide two concrete algorithms Off-policy HVE (OPHVE) and Model-based Offline HVE (MOHVE), respectively. Empirical evaluations on MuJoCo tasks corroborate the theoretical claim. OPHVE outperforms other off-policy evaluation methods in all three metrics measuring the estimation effectiveness, while MOHVE achieves better or comparable performance with state-of-the-art offline reinforcement learning algorithms. We hope that HVE could shed some light on further research on reinforcement learning from fixed data.

preprint · 2022 · arXiv

Interaction expansion inchworm Monte Carlo solver for lattice and impurity models

Multi-orbital quantum impurity models with general interaction and hybridization terms appear in a wide range of applications including embedding, quantum transport, and nanoscience. However, most quantum impurity solvers are restricted to a few impurity orbitals, discretized baths, diagonal hybridizations, or density-density interactions. Here, we generalize the inchworm quantum Monte Carlo method to the interaction expansion and explore its application to typical single- and multi-orbital problems encountered in investigations of impurity and lattice models. Our implementation generically outperforms bare and bold-line quantum Monte Carlo algorithms in the interaction expansion. So far, for the systems studied here, it remains inferior to the more specialized hybridization expansion and auxiliary field algorithms. The problem of convergence to unphysical fixed points, which hampers so-called bold-line methods, is not encountered in inchworm Monte Carlo.

preprint · 2022 · arXiv

LINDA: Multi-Agent Local Information Decomposition for Awareness of Teammates

In cooperative multi-agent reinforcement learning (MARL), where agents only have access to partial observations, efficiently leveraging local information is critical. During long-time observations, agents can build awareness for teammates to alleviate the problem of partial observability. However, previous MARL methods usually neglect this kind of utilization of local information. To address this problem, we propose a novel framework, multi-agent Local INformation Decomposition for Awareness of teammates (LINDA), with which agents learn to decompose local information and build awareness for each teammate. We model the awareness as stochastic random variables and perform representation learning to ensure the informativeness of awareness representations by maximizing the mutual information between awareness and the actual trajectory of the corresponding agent. LINDA is agnostic to specific algorithms and can be flexibly integrated to different MARL methods. Sufficient experiments show that the proposed framework learns informative awareness from local partial observations for better collaboration and significantly improves the learning performance, especially on challenging tasks.

preprint · 2022 · arXiv

Mode-selective Single-dipole Excitation and Controlled Routing of Guided Waves in a Multi-mode Topological Waveguide

Topology-linked binary degrees of freedom of guided waves have been used to expand the channel capacity of and to ensure robust transmission through photonic waveguides. However, selectively exciting optical modes associated with the desired degree of freedom is challenging and typically requires spatially extended sources or filters. Both approaches are incompatible with the ultimate objective of developing compact mode-selective sources powered by single emitters. In addition, the implementation of highly desirable functionalities, such as controllable distribution of guided modes between multiple detectors, becomes challenging in highly-compact devices due to photon loss to reflections. Here, we demonstrate that a linearly-polarized dipole-like source can selectively excite a topologically robust edge mode with the desired valley degree of freedom. Reflection-free routing of valley-polarized edge modes into two spatially-separated detectors with reconfigurable splitting ratios is also presented. An optical implementation of such a source will have the potential to broaden the applications of topological photonic devices.

preprint · 2022 · arXiv

Model Generation with Provable Coverability for Offline Reinforcement Learning

Model-based offline optimization with dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization, where the learned policy could adapt to different dynamics enumerated at the training stage. But due to the limitation under the offline setting, the learned model could not mimic real dynamics well enough to support reliable out-of-distribution exploration, which still hinders the policy from generalizing well. To narrow the gap, previous works roughly ensemble randomly initialized models to better approximate the real dynamics. However, such practice is costly and inefficient, and provides no guarantee on how well the real dynamics could be approximated by the learned models, which we name coverability in this paper. We actively address this issue by generating models with provable ability to cover real dynamics in an efficient and controllable way. To that end, we design a distance metric for dynamic models based on the occupancy of policies under the dynamics, and propose an algorithm to generate models optimizing their coverage for the real dynamics. We give a theoretical analysis of the model generation process and prove that our algorithm could provide enhanced coverability. As a downstream task, we train a dynamics-aware policy with minor or no conservative penalty, and experiments demonstrate that our algorithm outperforms prior offline methods on existing offline RL benchmarks. We also discover that policies learned by our method have better zero-shot transfer performance, implying their better generalization.

preprint · 2022 · arXiv

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

A promising way to improve the sample efficiency of reinforcement learning is model-based methods, in which many explorations and evaluations can happen in the learned models to save real-world samples. However, when the learned model has a non-negligible model error, sequential steps in the model are hard to be accurately evaluated, limiting the model's utilization. This paper proposes to alleviate this issue by introducing multi-step plans to replace multi-step actions for model-based RL. We employ the multi-step plan value estimation, which evaluates the expected discounted return after executing a sequence of action plans at a given state, and updates the policy by directly computing the multi-step policy gradient via plan value estimation. The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.

preprint · 2022 · arXiv

Multi-Agent Policy Transfer via Task Relationship Modeling

Team adaptation to new cooperative tasks is a hallmark of human intelligence, which has yet to be fully realized in learning agents. Previous work on multi-agent transfer learning accommodates teams of different sizes, heavily relying on the generalization ability of neural networks for adapting to unseen tasks. We believe that the relationship among tasks provides the key information for policy adaptation. In this paper, we try to discover and exploit common structures among tasks for more efficient transfer, and propose to learn effect-based task representations as a common space of tasks, using an alternatively fixed training scheme. We demonstrate that the task representation can capture the relationship among tasks, and can generalize to unseen tasks. As a result, the proposed method can help transfer learned cooperation knowledge to new tasks after training on a few source tasks. We also find that fine-tuning the transferred policies helps solve tasks that are hard to learn from scratch.

preprint · 2022 · arXiv

Offline Reinforcement Learning with Causal Structured World Models

Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected nets as world-models that map the states and actions to the next-step states. However, it is sensible that a world-model should adhere to the underlying causal effect such that it will support learning an effective policy generalizing well in unseen states. In this paper, we first provide theoretical results that causal world-models can outperform plain world-models for offline RL by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it performs better than the plain model-based offline RL algorithms and other causal model-based RL algorithms.

preprint · 2022 · arXiv

On Generalization of Adversarial Imitation Learning and Beyond

Despite massive empirical evaluations, one of the fundamental questions in imitation learning is still not fully settled: does AIL (adversarial imitation learning) provably generalize better than BC (behavioral cloning)? We study this open problem with tabular and episodic MDPs. For vanilla AIL that uses the direct maximum likelihood estimation, we provide both negative and positive answers under the known transition setting. For some MDPs, we show that vanilla AIL has a worse sample complexity than BC. The key insight is that the state-action distribution matching principle is weak so that AIL may generalize poorly even on visited states from the expert demonstrations. For another class of MDPs, vanilla AIL is proved to generalize well even on non-visited states. Interestingly, its sample complexity is horizon-free, which provably beats BC by a wide margin. Finally, we establish a framework in the unknown transition scenario, which allows AIL to explore via reward-free exploration strategies. Compared with the best-known online apprenticeship learning algorithm, the resulting algorithm improves the sample complexity and interaction complexity.

preprint · 2022 · arXiv

Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations

Surgical scene segmentation is fundamentally crucial for prompting cognitive assistance in robotic surgery. However, pixel-wise annotating surgical video in a frame-by-frame manner is expensive and time consuming. To greatly reduce the labeling burden, in this work, we study semi-supervised scene segmentation from robotic surgical video, which is practically essential yet rarely explored before. We consider a clinically suitable annotation situation under the equidistant sampling. We then propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation. It effectively leverages unlabeled data for a trusty and global model regularization that produces more discriminative feature representation. Concretely, for trusty representation learning, we propose to incorporate pseudo labels to instruct the pair selection, obtaining more reliable representation pairs for pixel contrast. Moreover, we expand the representation learning space from previous image-level to cross-video, which can capture the global semantics to benefit the learning process. We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS. Experimental results demonstrate the effectiveness of our method, consistently outperforming the state-of-the-art semi-supervised methods under different labeling ratios, and even surpassing fully supervised training on EndoVis18 with 10.1% labeling.

preprint · 2022 · arXiv

Rethinking ValueDice: Does It Really Improve Performance?

Since the introduction of GAIL, adversarial imitation learning (AIL) methods have attracted considerable research interest. Among these methods, ValueDice has achieved significant improvements: it beats the classical approach Behavioral Cloning (BC) under the offline setting, and it requires fewer interactions than GAIL under the online setting. Do these improvements come from more advanced algorithm designs? We answer this question with the following conclusions. First, we show that ValueDice could reduce to BC under the offline setting. Second, we verify that overfitting exists and regularization matters in the low-data regime. Specifically, we demonstrate that with weight decay, BC also nearly matches the expert performance as ValueDice does. The first two claims explain the superior offline performance of ValueDice. Third, we establish that ValueDice does not work when the expert trajectory is subsampled. Instead, the mentioned success of ValueDice holds when the expert trajectory is complete, in which case ValueDice is closely related to BC, which performs well as mentioned. Finally, we discuss the implications of our research for imitation learning studies beyond ValueDice.

preprint · 2022 · arXiv

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

This paper addresses policy learning in non-stationary environments and games with continuous actions. Rather than the classical reward maximization mechanism, inspired by the ideas of follow-the-regularized-leader (FTRL) and mirror descent (MD) update, we propose a no-regret style reinforcement learning algorithm PORL for continuous action tasks. We prove that PORL has a last-iterate convergence guarantee, which is important for adversarial and cooperative games. Empirical studies show that, in stationary environments such as MuJoCo locomotion controlling tasks, PORL performs equally well as, if not better than, the soft actor-critic (SAC) algorithm; in non-stationary environments including dynamical environments, adversarial training, and competitive games, PORL is superior to SAC in both a better final policy performance and a more stable training process.

preprint · 2021 · arXiv

A general framework for scintillation in nanophotonics

Bombardment of materials by high-energy particles (e.g., electrons, nuclei, X- and γ-ray photons) often leads to light emission, known generally as scintillation. Scintillation is ubiquitous and enjoys widespread applications in many areas such as medical imaging, X-ray non-destructive inspection, night vision, electron microscopy, and high-energy particle detectors. A large body of research focuses on finding new materials optimized for brighter, faster, and more controlled scintillation. Here, we develop a fundamentally different approach based on integrating nanophotonic structures into scintillators to enhance their emission. To start, we develop a unified and ab initio theory of nanophotonic scintillators that accounts for the key aspects of scintillation: the energy loss by high-energy particles, as well as the light emission by non-equilibrium electrons in arbitrary nanostructured optical systems. This theoretical framework allows us, for the first time, to experimentally demonstrate nearly an order-of-magnitude enhancement of scintillation, in both electron-induced, and X-ray-induced scintillation. Our theory also allows the discovery of structures that could eventually achieve several orders-of-magnitude scintillation enhancement. The framework and results shown here should enable the development of a new class of brighter, faster, and higher-resolution scintillators with tailored and optimized performances - with many potential applications where scintillators are used.

preprint2021arXiv

Active millimeter wave three-dimensional scan real-time imaging mechanism with a line antenna array

Active millimeter wave (AMMW) imaging is of interest as it has played important roles in a wide variety of applications, from nondestructive testing to medical diagnosis. Current AMMW imaging systems have a high spatial resolution and can realize three-dimensional (3D) imaging. However, conventional AMMW imaging systems based on the synthetic aperture require either time-consuming acquisition or time-consuming reconstruction. AMMW imaging systems based on a real aperture are capable of real-time imaging, but they need a large aperture and a complex two-dimensional (2D) scan structure to get 3D images. Besides, most AMMW imaging systems need the targets to keep still and hold a special posture while screening, limiting the throughput. Here, by using beam control techniques and fast post-processing algorithms, we demonstrate the AMMW 3D scan real-time imaging mechanism with a line antenna array, which can realize 3D real-time imaging by a simple one-dimensional (1D) linear movement while also offering a satisfactory throughput (over 2000 people per hour, 10 times that of commercial AMMW imaging systems) and a low system cost. First, the original spherical beam lines generated by the linear antenna array are modulated to fan beam lines via a bi-convex cylindrical lens. Then the holographic imaging algorithm is used to primarily focus the echo data of the imaged object. Finally, the defocus blur is corrected rapidly by deconvolution to get high-resolution images. Since our method does not need targets to keep still, has a low system cost, and can achieve 3D real-time imaging with a satisfactory throughput, this work has the potential to serve as a foundation for future short-range AMMW imaging systems, which can be used in a variety of fields such as security inspection and medical diagnosis.

preprint2021arXiv

ASBSO: An Improved Brain Storm Optimization With Flexible Search Length and Memory-Based Selection

Brain storm optimization (BSO) is a newly proposed population-based optimization algorithm, which uses a logarithmic sigmoid transfer function to adjust its search range during the convergent process. However, this adjustment only varies with the current iteration number and lacks flexibility and variety, which results in poor search efficiency and robustness of BSO. To alleviate this problem, an adaptive step length structure together with a success memory selection strategy is proposed to be incorporated into BSO. This proposed method, adaptive step length based on memory selection BSO, namely ASBSO, applies multiple step lengths to modify the generation process of new solutions, thus supplying a flexible search according to corresponding problems and convergent periods. The novel memory mechanism, which is capable of evaluating and storing the degree of improvements of solutions, is used to determine the selection possibility of step lengths. A set of 57 benchmark functions is used to test ASBSO's search ability, and four real-world problems are adopted to show its application value. All these test results indicate the remarkable improvement in solution quality, scalability, and robustness of ASBSO.

preprint2021arXiv

Derivative-Free Reinforcement Learning: A Review

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced in reinforcement learning approaches, under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize the methods in aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article could bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.

preprint2021arXiv

Exponential convergence of distributed optimization for heterogeneous linear multi-agent systems

In this work we study a distributed optimal output consensus problem for heterogeneous linear multi-agent systems where the agents aim to reach consensus with the purpose of minimizing the sum of private convex costs. Based on output feedback, a fully distributed control law is proposed by using the proportional-integral (PI) control technique. For strongly convex cost functions with Lipschitz gradients, the designed controller can achieve convergence exponentially in an undirected and connected network. Furthermore, to remove the requirement of continuous communications, the proposed control law is then extended to periodic and event-triggered communication schemes, which also achieve convergence exponentially. Two simulation examples are given to verify the proposed control algorithms.
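
The proportional-integral idea behind the control law can be illustrated on a much simpler setting than the one in the abstract. The sketch below assumes scalar single-integrator agents with state feedback and quadratic private costs f_i(x) = 0.5 * a_i * (x - b_i)^2, and uses a textbook PI consensus-optimization law integrated with forward Euler. It is not the paper's output-feedback controller for heterogeneous linear agents; the function name, gains, step size, and graph are illustrative assumptions.

```python
def simulate_pi_consensus(a, b, edges, steps=20000, dt=0.005, beta=1.0):
    """Forward-Euler simulation of a PI distributed optimization law.

    Dynamics for agent i (single integrator, an assumption):
        dx_i/dt = -grad f_i(x_i) - beta * sum_j (x_i - x_j) - v_i
        dv_i/dt =  beta * sum_j (x_i - x_j)
    The integral states v_i start at zero, so their sum stays zero on an
    undirected graph and the consensus value minimizes sum_i f_i.
    """
    n = len(a)
    x = [0.0] * n
    v = [0.0] * n
    for _ in range(steps):
        # Disagreement term (L x)_i computed from the undirected edge list.
        lx = [0.0] * n
        for i, j in edges:
            d = x[i] - x[j]
            lx[i] += d
            lx[j] -= d
        grad = [a[i] * (x[i] - b[i]) for i in range(n)]
        x_next = [x[i] + dt * (-grad[i] - beta * lx[i] - v[i]) for i in range(n)]
        v = [v[i] + dt * beta * lx[i] for i in range(n)]
        x = x_next
    return x


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]           # curvatures of the private costs
    b = [1.0, -2.0, 3.0, 0.5]          # private minimizers
    path = [(0, 1), (1, 2), (2, 3)]    # undirected, connected path graph
    optimum = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    print(simulate_pi_consensus(a, b, path), optimum)
```

Each agent uses only its own gradient and its neighbors' states, yet all states settle at the minimizer of the summed cost, which is the behavior the abstract's exponential-convergence result concerns.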

preprint2021arXiv

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims at learning a good policy from a batch of collected data, without extra interactions with the environment during training. However, current offline RL benchmarks commonly have a large reality gap, because they involve large datasets collected by highly exploratory policies, and the trained policy is directly evaluated in the environment. In real-world situations, running a highly exploratory policy is prohibited to ensure system safety, the data is commonly very limited, and a trained policy should be well validated before deployment. In this paper, we present a near real-world offline RL benchmark, named NeoRL, which contains datasets from various domains with controlled sizes, and extra test datasets for policy validation. We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward. The empirical results demonstrate that the tested offline RL algorithms become less competitive with the deterministic policy on many datasets, and the offline policy evaluation hardly helps. The NeoRL suite can be found at http://polixir.ai/research/neorl. We hope this work will shed some light on future research and draw more attention when deploying RL in real-world systems.

preprint2021arXiv

Self-supervised learning for fast and scalable time series hyper-parameter tuning

Hyper-parameters of time series models play an important role in time series analysis. Slight differences in hyper-parameters might lead to very different forecast results for a given model, and therefore, selecting good hyper-parameter values is indispensable. Most of the existing generic hyper-parameter tuning methods, such as Grid Search, Random Search, and Bayesian Optimal Search, are based on one key component, search, and thus they are computationally expensive and cannot be applied to fast and scalable time-series hyper-parameter tuning (HPT). We propose a self-supervised learning framework for HPT (SSL-HPT), which uses time series features as inputs and produces optimal hyper-parameters. The SSL-HPT algorithm is 6-20x faster at getting hyper-parameters compared to other search-based algorithms while producing comparably accurate forecasting results in various applications.

preprint2021arXiv

The Role of the Hercules Autonomous Vehicle During the COVID-19 Pandemic: An Autonomous Logistic Vehicle for Contactless Goods Transportation

Since early 2020, the coronavirus disease 2019 (COVID-19) has spread rapidly across the world. As of the date of writing this article, the disease has been reported globally in 223 countries and regions, has infected over 108 million people, and has caused over 2.4 million deaths (https://covid19.who.int/, accessed on Feb. 17, 2021). Avoiding person-to-person transmission is an effective approach to control and prevent the pandemic. However, many daily activities, such as transporting goods in our daily life, inevitably involve person-to-person contact. Using an autonomous logistic vehicle to achieve contact-less goods transportation could alleviate this issue. For example, it can reduce the risk of virus transmission between the driver and customers. Moreover, many countries have imposed tough lockdown measures to reduce the virus transmission (e.g., retail, catering) during the pandemic, which causes inconveniences for human daily life. An autonomous vehicle can deliver the goods bought by humans, so that humans can get the goods without going out. These demands motivate us to develop an autonomous vehicle, named Hercules, for contact-less goods transportation during the COVID-19 pandemic. The vehicle is evaluated through real-world delivery tasks under various traffic conditions.

preprint2021arXiv

TurboTransformers: An Efficient GPU Serving System For Transformer Models

The transformer is the most critical algorithm innovation of the Natural Language Processing (NLP) field in recent years. Unlike Recurrent Neural Network (RNN) models, Transformers can process the sequence-length dimension in parallel, therefore leading to better accuracy on long sequences. However, deploying them efficiently for online services in data centers equipped with GPUs is not easy. First, the additional computation introduced by transformer structures makes it more challenging to meet the latency and throughput constraints of serving. Second, NLP tasks take in sentences of variable length. The variability of input dimensions poses a severe problem for efficient memory management and serving optimization. This paper presents a transformer serving system called TurboTransformers, which consists of a computing runtime and a serving framework to solve the above challenges. Three innovative features make it stand out from other similar works. An efficient parallel algorithm is proposed for GPU-based batch reduction operations, like Softmax and LayerNorm, which are major hot spots besides BLAS routines. A memory allocation algorithm, which better balances the memory footprint and allocation/free efficiency, is designed for variable-length input situations. A serving framework equipped with a new batch scheduler using dynamic programming achieves the optimal throughput on variable-length requests. The system can achieve state-of-the-art transformer model serving performance on GPU platforms and can be seamlessly integrated into your PyTorch code with a few lines of code.

preprint2020arXiv

AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Learning-to-rank (LTR) has become a key technology in E-commerce applications. Most existing LTR approaches follow a supervised learning paradigm from offline labeled data collected from the online system. However, it has been noticed that previous LTR models can have a good validation performance over offline validation data but have a poor online performance, and vice versa, which implies a possible large inconsistency between the offline and online evaluation. We investigate and confirm in this paper that such inconsistency exists and can have a significant impact on AliExpress Search. Reasons for the inconsistency include the neglect of item context during learning and the insufficiency of the offline data set for learning the context. Therefore, this paper proposes an evaluator-generator framework for LTR with item context. The framework consists of an evaluator that generalizes to evaluate recommendations involving the context, a generator that maximizes the evaluator score by reinforcement learning, and a discriminator that ensures the generalization of the evaluator. Extensive experiments in simulation environments and the AliExpress Search online system show that, firstly, the classic data-based metrics on the offline dataset can show significant inconsistency with online performance, and can even be misleading. Secondly, the proposed evaluator score is significantly more consistent with the online performance than common ranking metrics. Finally, as a consequence, our method achieves a significant improvement (>2%) in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.

preprint2020arXiv

Application of Jordan Decomposition to Non-Hermitian Lattice Models with Spectrally-Isolated Lower Dimensional States

When analyzing non-Hermitian lattice systems, the standard eigenmode decomposition utilized for the analysis of Hermitian systems must be replaced by Jordan decomposition. This approach enables us to identify the correct number of the left and right eigenstates of a large finite-sized lattice system, and to form a complete basis for calculating the resonant excitation of the system. Specifically, we derive the procedure for applying Jordan decomposition to a system with spectrally-isolated states. We use a non-Hermitian quadrupole insulator with zero-energy corner states as an example of a large system whose dimensionality can be drastically reduced to derive a low-dimensional "defective" Hamiltonian describing such localized states. Counter-intuitive and non-local properties of the resonant response of the system near zero energy are explained using the Jordan decomposition approach. Depending on the excitation properties of the corner states, we classify our non-Hermitian quadrupolar insulator into three categories: trivial, near-Hermitian, and non-local.

preprint2020arXiv

Binary Representation for Non-binary LDPC Code with Decoder Design

The equivalent binary parity check matrices for the binary images of the cycle-free non-binary LDPC codes have numerous bit-level cycles. In this paper, we show how to transform these binary parity check matrices into their cycle-free forms. It is shown that the proposed methodology can be adopted not only for the binary images of non-binary LDPC codes but also for a large class of binary LDPC codes. Specifically, we present an extended $p$-reducible (EPR) LDPC code structure to eliminate the bit-level cycles. For the non-binary LDPC codes with short length symbol-level cycles, the EPR-LDPC codes can largely avoid the corresponding short length bit-level cycles. As to the decoding of the EPR-LDPC codes, we propose a hybrid hard-decision decoder and a hybrid parallel decoder for binary symmetric channel and binary input Gaussian channel, respectively. A simple code optimization algorithm for these binary decoders is also provided. Simulations show the comparative results and justify the advantages, i.e., better performance and lower decoding complexity, of the proposed binary constructions.

preprint2020arXiv

Day-to-Day Dynamic Traffic Assignment with Imperfect Information, Bounded Rationality and Information Sharing

This paper presents a doubly dynamic day-to-day (DTD) traffic assignment model with simultaneous route-and-departure-time (SRDT) choices while incorporating incomplete and imperfect information as well as bounded rationality. Two SRDT choice models are proposed to incorporate imperfect travel information: one based on the multinomial Logit (MNL) model and the other on a sequential, mixed multinomial/nested Logit model. These two variants, serving as base models, are further extended with two features: bounded rationality (BR) and information sharing. BR is considered by incorporating the indifference band into the random utility component of the MNL model, forming a BR-based DTD stochastic model. A macroscopic model of travel information sharing is integrated into the DTD dynamics to account for the impact of incomplete information on travelers' SRDT choices. These DTD choice models are combined with within-day dynamics following the Lighthill-Whitham-Richards (LWR) fluid dynamic network loading model. Simulations on large-scale networks (Anaheim) illustrate the interactions between users' adaptive decision making and network conditions (including local disruption) with different levels of information availability and user behavior. Our findings highlight the need for modeling network transient and disequilibriated states, which are often overlooked in equilibrium-constrained network design and optimization. The MATLAB package and computational examples are available at https://github.com/DrKeHan/DTD

preprint2020arXiv

Design of Convergence-Optimized Non-binary LDPC Codes over Binary Erasure Channel

In this letter, we present a hybrid iterative decoder for non-binary low density parity check (LDPC) codes over binary erasure channel (BEC), based on which the recursion of the erasure probability is derived to design non-binary LDPC codes with convergence-optimized degree distributions. The resulting one-step decoding tree is cycle-free and achieves lower decoding complexity. Experimental studies show that the proposed convergence-optimization algorithm accelerates the convergence process by 33%.

preprint2020arXiv

Design of Low Complexity Non-binary LDPC Codes with an Approximated Performance-Complexity Tradeoff

By presenting an approximated performance-complexity tradeoff (PCT) algorithm, a low-complexity non-binary low density parity check (LDPC) code over a q-ary-input symmetric-output channel is designed in this manuscript, which converges faster than the threshold-optimized non-binary LDPC codes in the low error rate regime. We examine our algorithm with both hard and soft decision decoders. Moreover, simulation shows that the approximated PCT algorithm accelerates the convergence process by 30% in terms of the number of decoding iterations.

preprint2020arXiv

Diabolical Points in Coupled Active Cavities with Quantum Emitters

In single microdisks, embedded active emitters intrinsically affect the cavity mode of microdisks, which results in a trivial symmetric backscattering and a low controllability. Here we propose a macroscopical control of the backscattering direction by optimizing the cavity size. The signature of positive and negative backscattering directions in each single microdisk is confirmed with two strongly coupled microdisks. Furthermore, the diabolical points are achieved at the resonance of two microdisks, which agrees well with the theoretical calculations considering backscattering directions. The diabolical points in active optical structures pave a way to implement quantum information processing with geometric phase in quantum photonic networks.

preprint2020arXiv

Experimental Observation of Tensor Monopoles with a Superconducting Qudit

Monopoles play a central role in gauge theories and topological matter. There are two fundamental types of monopoles in physics: vector monopoles and tensor monopoles. Examples of vector monopoles include the Dirac monopole in 3D and the Yang monopole in 5D, which have been extensively studied and observed in condensed matter or artificial systems. However, tensor monopoles are less studied, and their observation has not been reported. Here we experimentally construct a tunable spin-1 Hamiltonian to generate a tensor monopole and then measure its unique features with superconducting quantum circuits. The energy structure of a 4D Weyl-like Hamiltonian with three-fold degenerate points acting as tensor monopoles is imaged. Through quantum-metric measurements, we report the first experiment that measures the Dixmier-Douady invariant, the topological charge of the tensor monopole. Moreover, we observe topological phase transitions characterized by the topological Dixmier-Douady invariant, rather than the Chern numbers as used for conventional monopoles in odd-dimensional spaces.

preprint2020arXiv

Experimental Realization of Universal Time-optimal non-Abelian Geometric Gates

Based on the geometrical nature of quantum phases, non-adiabatic holonomic quantum control (NHQC) has become a standard technique for enhancing robustness in constructing quantum gates. However, the conventional approach of NHQC is sensitive to control instability, as it requires the driving pulses to cover a fixed pulse area. Furthermore, even for small-angle rotations, all operations need to be completed with the same duration of time. Here we experimentally demonstrate a time-optimal and unconventional approach of NHQC (called TOUNHQC), which can optimize the operation time of any holonomic gate. Compared with the conventional approach, TOUNHQC provides an extra layer of robustness to decoherence and control errors. The experiment involves a scalable architecture of superconducting circuit, where we achieved a fidelity of 99.51% for a single qubit gate using interleaved randomized benchmarking. Moreover, a two-qubit holonomic control-phase gate has been implemented where the gate error can be reduced by as much as 18% compared with NHQC.

preprint2020arXiv

Finite Temperature Auxiliary Field Quantum Monte Carlo in the Canonical Ensemble

Finite temperature auxiliary field-based Quantum Monte Carlo methods, including Determinant Quantum Monte Carlo (DQMC) and Auxiliary Field Quantum Monte Carlo (AFQMC), have historically assumed pivotal roles in the investigation of the finite temperature phase diagrams of a wide variety of multidimensional lattice models and materials. Despite their utility, however, these techniques are typically formulated in the grand canonical ensemble, which makes them difficult to apply to condensates like superfluids and difficult to benchmark against alternative methods that are formulated in the canonical ensemble. Working in the grand canonical ensemble is furthermore accompanied by the increased overhead associated with having to determine the chemical potentials that produce desired fillings. Given this backdrop, in this work, we present a new recursive approach for performing AFQMC simulations in the canonical ensemble that does not require knowledge of chemical potentials. To derive this approach, we exploit the convenient fact that AFQMC solves the many-body problem by decoupling many-body propagators into integrals over one-body problems to which non-interacting theories can be applied. We benchmark the accuracy of our technique on illustrative Bose and Fermi Hubbard models and demonstrate that it can converge more quickly to the ground state than grand canonical AFQMC simulations. We believe that our novel use of HS-transformed operators to implement algorithms originally derived for non-interacting systems will motivate the development of a variety of other methods and anticipate that our technique will enable direct performance comparisons against other many-body approaches formulated in the canonical ensemble.

preprint2020arXiv

Identifying defect-related quantum emitters in monolayer WSe$_2$

Monolayer transition metal dichalcogenides have recently attracted great interest because the quantum dots embedded in the monolayer can serve as optically active single photon emitters. Here, we provide an interpretation of the recombination mechanisms of these quantum emitters through polarization-resolved and magneto-optical spectroscopy at low temperature. Three types of defect-related quantum emitters in monolayer tungsten diselenide (WSe$_2$) are observed, with different exciton g factors of 2.02, 9.36 and unobservable Zeeman shift, respectively. The various magnetic responses of the spatially localized excitons strongly indicate that the radiative recombination stems from the different transitions between defect-induced energy levels and the valence and conduction bands. Furthermore, the different g factors and zero-field splittings of the three types of emitters strongly show that quantum dots embedded in the monolayer have various types of confining potentials for localized excitons, resulting in electron-hole exchange interaction with a range of values in the presence of anisotropy. Our work further sheds light on the recombination mechanisms of defect-related quantum emitters and paves a way toward understanding the role of defects in single photon emitters in atomically thin semiconductors.

preprint2020arXiv

Large photoluminescence enhancement by an out-of-plane magnetic field in exfoliated WS$_2$ flakes

We report an out-of-plane magnetic field induced large photoluminescence enhancement in WS${}_2$ flakes at $4$ K, in contrast to the photoluminescence enhancement provided by in-plane field in general. Two mechanisms for the enhancement are proposed. One is a larger overlap of electron and hole caused by the magnetic field induced confinement. The other is that the energy difference between $Λ$ and K valleys is reduced by magnetic field, and thus enhancing the corresponding indirect-transition trions. Meanwhile, the Landé g factor of the trion is measured as $-0.8$, whose absolute value is much smaller than normal exciton, which is around $|-4|$. A model for the trion g factor is presented, confirming that the smaller absolute value of Landé g factor is a behavior of this $Λ$-K trion. By extending the valley space, we believe this work provides a further understanding of the valleytronics in monolayer transition metal dichalcogenides.

preprint2020arXiv

Local Neighbor Propagation Embedding

Manifold Learning occupies a vital role in the field of nonlinear dimensionality reduction and its ideas also serve for other relevant methods. Graph-based methods such as Graph Convolutional Networks (GCN) show ideas in common with manifold learning, although they belong to different fields. Inspired by GCN, we introduce neighbor propagation into LLE and propose Local Neighbor Propagation Embedding (LNPE). With linear computational complexity increase compared with LLE, LNPE enhances the local connections and interactions between neighborhoods by extending $1$-hop neighbors into $n$-hop neighbors. The experimental results show that LNPE could obtain more faithful and robust embeddings with better topological and geometrical properties.
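
The step of extending 1-hop neighbors into n-hop neighbors can be pictured with a small graph routine. This is a sketch only, assuming the neighborhood graph is given as a dictionary from each node to its 1-hop neighbors (for example, from a k-nearest-neighbor search); the function name and data layout are hypothetical, and the routine does not reproduce how LNPE propagates reconstruction weights along these hops.

```python
def n_hop_neighbors(adjacency, n):
    """Return, for every node, the set of nodes reachable within n hops.

    adjacency: dict mapping node -> iterable of its 1-hop neighbors.
    The node itself is excluded from its own neighborhood.
    """
    result = {}
    for start in adjacency:
        visited = {start}
        frontier = {start}
        for _ in range(n):
            # Expand one hop outward, keeping only nodes not seen before.
            frontier = {
                neighbor
                for node in frontier
                for neighbor in adjacency.get(node, ())
            } - visited
            if not frontier:
                break
            visited |= frontier
        result[start] = visited - {start}
    return result


if __name__ == "__main__":
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for hops in (1, 2, 3):
        print(hops, n_hop_neighbors(chain, hops)[0])
```

With n = 1 the routine returns the original neighborhoods, so larger n only adds connections between neighborhoods that already overlap.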

preprint2020arXiv

Novelty-Prepared Few-Shot Classification

Few-shot classification algorithms can alleviate the data scarceness issue, which is vital in many real-world problems, by adopting models pre-trained from abundant data in other domains. However, the pre-training process was commonly unaware of the future adaptation to other concept classes. We disclose that a classically fully trained feature extractor can leave little embedding space for unseen classes, which keeps the model from well-fitting the new classes. In this work, we propose to use a novelty-prepared loss function, called self-compacting softmax loss (SSL), for few-shot classification. The SSL can prevent the full occupancy of the embedding space. Thus the model is more prepared to learn new classes. In experiments on CUB-200-2011 and mini-ImageNet datasets, we show that SSL leads to significant improvement of the state-of-the-art performance. This work may shed some light on considering the model capacity for few-shot classification tasks.

preprint2020arXiv

OrgMining 2.0: A Novel Framework for Organizational Model Mining from Event Logs

Providing appropriate structures around human resources can streamline operations and thus facilitate the competitiveness of an organization. To achieve this goal, modern organizations need to acquire an accurate and timely understanding of human resource grouping while faced with an ever-changing environment. The use of process mining offers a promising way to help address the need through utilizing event log data stored in information systems. By extracting knowledge about the actual behavior of resources participating in business processes from event logs, organizational models can be constructed, which facilitate the analysis of the de facto grouping of human resources relevant to process execution. Nevertheless, open research gaps remain to be addressed when applying the state-of-the-art process mining to analyze resource grouping. For one, the discovery of organizational models has only limited connections with the context of process execution. For another, a rigorous solution that evaluates organizational models against event log data is yet to be proposed. In this paper, we aim to tackle these research challenges by developing a novel framework built upon a richer definition of organizational models coupling resource grouping with process execution knowledge. By introducing notions of conformance checking for organizational models, the framework allows effective evaluation of organizational models, and therefore provides a foundation for analyzing and improving resource grouping based on event logs. We demonstrate the feasibility of this framework by proposing an approach underpinned by the framework for organizational model discovery, and also conduct experiments on real-life event logs to discover and evaluate organizational models.

preprint2020arXiv

Reinforced Epidemic Control: Saving Both Lives and Economy

Saving lives or the economy is a dilemma for epidemic control in most cities, while smart-tracing technology raises people's privacy concerns. In this paper, we propose a solution for the life-or-economy dilemma that does not require private data. We bypass the private-data requirement by suppressing epidemic transmission through a dynamic control on inter-regional mobility that only relies on Origin-Destination (OD) data. We develop the DUal-objective Reinforcement-Learning Epidemic Control Agent (DURLECA) to search for mobility-control policies that can simultaneously minimize infection spread and maximally retain mobility. DURLECA employs a novel graph neural network, namely Flow-GNN, to estimate the virus-transmission risk induced by urban mobility. The estimated risk is used to support a reinforcement learning agent to generate mobility-control actions. The training of DURLECA is guided by a well-constructed reward function, which captures the natural trade-off between epidemic control and mobility retention. Besides, we design two exploration strategies to improve the agent's search efficiency and help it escape local optima. Extensive experimental results on a real-world OD dataset show that DURLECA is able to suppress infections at an extremely low level while retaining 76% of the mobility in the city. Our implementation is available at https://github.com/anyleopeace/DURLECA/.

preprint2020arXiv

Residual Bootstrap Exploration for Bandit Algorithms

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot). ReBoot enforces exploration by injecting data-driven randomness through a residual-based perturbation mechanism. This novel mechanism captures the underlying distributional properties of fitting errors, and more importantly boosts exploration to escape from suboptimal solutions (for small sample sizes) by inflating variance level in an unconventional way. In theory, with appropriate variance inflation level, ReBoot provably secures instance-dependent logarithmic regret in Gaussian multi-armed bandits. We evaluate ReBoot in different synthetic multi-armed bandits problems and observe that ReBoot performs better for unbounded rewards and more robustly than Giro [kveton2018garbage] and PHE [kveton2019perturbed], with comparable computational efficiency to the Thompson sampling method.

preprint2020arXiv

Simultaneous Inference for Massive Data: Distributed Bootstrap

In this paper, we propose a bootstrap method for massive data processed in a distributed manner across a large number of machines. The new method is computationally efficient in that we bootstrap on the master machine without the over-resampling typically required by existing methods \cite{kleiner2014scalable,sengupta2016subsampled}, while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model; it only applies a multiplier bootstrap on the master machine to the gradients received from the worker machines. Simulations validate our theory.
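
As a rough illustration of a multiplier bootstrap run only on the master machine, the sketch below treats the simplest case, simultaneous confidence intervals for a mean vector. For squared loss, each worker's gradient at theta is theta minus its local mean, so sending local means is equivalent to sending gradients. The Gaussian multipliers, the equal worker sample sizes, and the function name `distributed_multiplier_bootstrap` are assumptions; this is not the paper's general algorithm.

```python
import numpy as np

def distributed_multiplier_bootstrap(worker_means, n_per_worker,
                                     alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous CI half-width for a mean vector from per-worker summaries."""
    rng = np.random.default_rng(seed)
    m = np.asarray(worker_means, dtype=float)  # shape (k, d): one row per worker
    k, _ = m.shape
    theta_hat = m.mean(axis=0)                 # pooled estimate
    # Centered, rescaled worker summaries; their spread estimates the covariance.
    centered = np.sqrt(n_per_worker) * (m - theta_hat)
    # Multiplier bootstrap: reweight workers with i.i.d. N(0, 1) multipliers.
    mult = rng.standard_normal((n_boot, k))
    draws = mult @ centered / np.sqrt(k)       # shape (n_boot, d)
    # Critical value of the max-norm statistic gives simultaneous coverage.
    crit = np.quantile(np.abs(draws).max(axis=1), 1.0 - alpha)
    half_width = crit / np.sqrt(n_per_worker * k)
    return theta_hat, half_width

# Simulated setup: 20 workers, 50 samples each, 3-dimensional data.
data_rng = np.random.default_rng(1)
data = data_rng.normal(size=(20, 50, 3))
theta_hat, half_width = distributed_multiplier_bootstrap(data.mean(axis=1), 50)
print(theta_hat - half_width, theta_hat + half_width)
```

Only one d-dimensional vector per worker is communicated, and no model is re-fitted during resampling: the bootstrap loop touches nothing but the k received summaries.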

preprint2020arXiv

Simultaneously exciting two atoms with photon-mediated Raman interaction

We propose an approach to simultaneously excite two atoms by using a cavity-assisted Raman process in combination with a cavity photon-mediated interaction. The system consists of a two-level atom and a $Λ$-type or V-type three-level atom, which are coupled together through a cavity mode. Having derived the effective Hamiltonian, we find that under certain circumstances a single photon can simultaneously excite two atoms. In addition, multiple photons and even a classical field can also simultaneously excite two atoms. As an example, we show a scheme to realize our proposal in a circuit QED setup, in which artificial atoms are coupled to a cavity. The dynamics and the quantum statistical properties of the process are investigated with experimentally feasible parameters.

preprint2020arXiv

Switchable next-nearest-neighbor coupling for controlled two-qubit operations

In a superconducting quantum processor with nearest-neighbor coupling, the dispersive interaction between adjacent qubits can result in an effective next-nearest-neighbor coupling whose strength depends on the state of the intermediary qubit. Here, we theoretically explore the possibility of engineering this next-nearest-neighbor coupling to implement controlled two-qubit operations in which the intermediary qubit controls the operation on the next-nearest-neighbor pair of qubits. In particular, in a system comprising two types of superconducting qubits with anharmonicities of opposite sign arranged in an -A-B-A- pattern, where the unwanted static ZZ coupling between adjacent qubits can be heavily suppressed, a switchable coupling between the next-nearest-neighbor qubits can be achieved via the intermediary qubit, whose state functions as an on/off switch for this coupling. Therefore, depending on the adopted activation scheme, various controlled two-qubit operations such as a controlled-iSWAP gate can be realized, potentially reducing circuit depth relative to a standard decomposition approach for implementing generic quantum algorithms.

preprint2020arXiv

Synthesis of Cu mono-component metallic glass by the deposition on amorphous SiO$_2$ substrate: a molecular dynamics study

In this work, we simulated the physical vapor deposition (PVD) process of Cu atoms on an amorphous SiO$_2$ substrate. The resulting Cu thin layer exhibits an amorphous structure. For comparison, quenching of liquid Cu from 2000 K to 50 K was also simulated at different cooling rates to form a Cu metallic glass. The Cu glasses from the two processes (PVD and quenching) showed the same radial distribution function but different local structures in the Voronoi tessellation analysis. The PVD glass exhibits higher density and lower potential energy than its melt-quenched counterpart, consistent with the properties of ultrastable glasses.

preprint2020arXiv

Temporal-adaptive Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) helps address large-scale and sparse-reward issues in reinforcement learning. In HRL, the policy model has an inner representation structured in levels. With this structure, the reinforcement learning task is expected to be decomposed into corresponding levels with sub-tasks, making learning more efficient. Although it is intuitive that a high-level policy only needs to make macro decisions at a low frequency, the exact frequency is hard to determine. Previous HRL approaches often employed a fixed-time skip strategy or learned a termination condition without taking the context into account, which not only requires manual adjustment but also sacrifices some decision granularity. In this paper, we propose the \emph{temporal-adaptive hierarchical policy learning} (TEMPLE) structure, which uses a temporal gate to adaptively control the decision frequency of the high-level policy. We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games. The results show that the TEMPLE structure can lead to improved performance in these environments with sequential adaptive high-level control.
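
The contrast between a fixed-time skip and a gate-controlled decision frequency can be sketched in a few lines. This is a toy illustration, not the TEMPLE architecture: the gate is given here as a precomputed 0/1 sequence, whereas a learned gate would be conditioned on the state, and the function name `gated_high_level_decisions` is hypothetical.

```python
def gated_high_level_decisions(gate_open, proposals):
    """Hold the last high-level decision whenever the gate is closed."""
    if len(gate_open) != len(proposals):
        raise ValueError("gate_open and proposals must have the same length")
    decisions = []
    current = None
    for t, (is_open, proposal) in enumerate(zip(gate_open, proposals)):
        # The first step always takes a decision; later steps only when the gate opens.
        if t == 0 or is_open:
            current = proposal
        decisions.append(current)
    return decisions

proposals = ["a", "b", "c", "d", "e", "f"]

# Fixed-time skip: the high-level policy acts every 3 steps regardless of context.
fixed_gate = [1 if t % 3 == 0 else 0 for t in range(len(proposals))]
print(gated_high_level_decisions(fixed_gate, proposals))

# Adaptive gate: opening times may be irregular (chosen by hand here).
adaptive_gate = [1, 0, 0, 0, 1, 1]
print(gated_high_level_decisions(adaptive_gate, proposals))
```

A fixed skip is the special case in which the gate opens on a regular schedule. Letting the gate vary per step removes the need to tune the skip length by hand.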

preprint2019arXiv

Realization of Superadiabatic Two-qubit Gates Using Parametric Modulation in Superconducting Circuits

Fast, robust two-qubit gate operations with low susceptibility to crosstalk are key to scalable quantum information processing. Parametrically driven gates are inherently insensitive to crosstalk, while superadiabatic control can speed up a gate without losing accuracy. We propose and experimentally implement superadiabatic two-qubit gates using parametric modulation on superconducting quantum circuits. Our results demonstrate the preservation of adiabaticity at a gate speed close to the quantum limit, in addition to robustness against control instability. We demonstrate a CZ gate with an error rate of 5.8%, limited largely by qubit decoherence, which promises future improvement and scalable implementation.

preprint2018arXiv

Multi-Layered Gradient Boosting Decision Trees

Multi-layered representation is believed to be the key ingredient of deep neural networks, especially in cognitive tasks such as computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are the dominant methods for modeling discrete or tabular data, they are hard to combine with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as building blocks. The model can be jointly trained by a variant of target propagation across layers, without requiring back-propagation or differentiability. Experiments and visualizations confirm the effectiveness of the model in terms of performance and representation learning ability.