Source author record

Qi Zhao

Qi Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, quant-ph, Artificial Intelligence, Computer Science and Game Theory, Cryptography and Security, eess.SY, Human-Computer Interaction, Information Theory, math.IT, Multimedia, Neurons and Cognition, physics.atom-ph, physics.data-an, physics.geo-ph, physics.optics, Robotics, Systems and Control

Catalog footprint

What is connected

33 works

18 topics

4 close collaborators


preprint, 2026, arXiv

RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

While modern diffusion models excel at generating diverse single images, extending this to sequential generation reveals a fundamental challenge: balancing narrative dynamism with multi-character coherence. Existing methods often falter at this trade-off, leading to artifacts where characters lose their identity or the story stagnates. To resolve this critical tension, we introduce RealDiffusion, a unified framework designed to reconcile robust coherence with narrative dynamism. Heat diffusion serves as a dissipative prior that averages neighboring features along the sequence and removes high-frequency noise within the subject region. This suppresses attribute drift and stabilizes identity across frames. A region-aware stochastic process then introduces small perturbations that explore nearby modes and prevent collapse so the story maintains pose change and scene evolution. We thus introduce a lightweight, training-free Physics-informed Attention mechanism that injects controllable physical priors into the self-attention layers during inference. By modeling feature evolution as a configurable physical system, our method regularizes spatio-temporal relationships without suppressing intentional, prompt-driven changes. Extensive experiments demonstrate that RealDiffusion achieves substantial gains in character coherence while preserving narrative dynamism, outperforming state-of-the-art approaches. Code is available at https://github.com/ShmilyQi-CN/RealDiffusion.
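As a rough illustration of the two ingredients named in this abstract, the sketch below applies one explicit heat-diffusion step along the frame axis of a toy feature sequence and then adds a small Gaussian perturbation. It is not the RealDiffusion implementation: the function names `heat_step` and `perturb`, the parameters `alpha` and `sigma`, the reflecting boundary, and the use of plain Python lists are all assumptions made for illustration, and the region masking and self-attention injection described in the abstract are omitted.

```python
import random


def heat_step(frames, alpha=0.25):
    """One explicit heat-diffusion step along the frame (sequence) axis.

    frames: list of feature vectors (lists of floats), one per frame.
    alpha:  hypothetical diffusion strength; the explicit scheme is
            stable for alpha <= 0.5.
    The first and last frames reuse themselves as the missing neighbor
    (a reflecting boundary), which keeps the per-feature sum unchanged.
    """
    n = len(frames)
    smoothed = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        smoothed.append(
            [f + alpha * (p - 2.0 * f + q) for f, p, q in zip(frames[t], prev, nxt)]
        )
    return smoothed


def perturb(frames, sigma=0.01, seed=0):
    """Add small zero-mean Gaussian noise to every feature (assumed noise model)."""
    rng = random.Random(seed)
    return [[f + rng.gauss(0.0, sigma) for f in frame] for frame in frames]


# Toy sequence: a single feature that spikes in the middle frame.
frames = [[0.0], [1.0], [0.0]]
print(heat_step(frames))  # [[0.25], [0.5], [0.25]]
print(perturb(heat_step(frames)))
```

The smoothing step pulls each frame toward its neighbors, which is the averaging behavior the abstract attributes to the dissipative prior; the perturbation step stands in for the stochastic process that keeps frames from collapsing onto one another.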

preprint, 2024, arXiv

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

Self-supervised representation learning methods have achieved significant success in computer vision and natural language processing, where data samples exhibit explicit spatial or semantic dependencies. However, applying these methods to tabular data is challenging due to the less pronounced dependencies among data samples. In this paper, we address this limitation by introducing SwitchTab, a novel self-supervised method specifically designed to capture latent dependencies in tabular data. SwitchTab leverages an asymmetric encoder-decoder framework to decouple mutual and salient features among data pairs, resulting in more representative embeddings. These embeddings, in turn, contribute to better decision boundaries and lead to improved results in downstream tasks. To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results showcase superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be utilized as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression, XGBoost, etc.). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.

preprint, 2023, arXiv

Numerical investigation of progressive damage and associated seismicity on a laboratory fault

Understanding rock shear failure behavior is crucial to gain insights into slip-related geohazards such as rock avalanches, landslides, and earthquakes. However, descriptions of the progressive damage on the shear surface are still incomplete or ambiguous. In this study, we use the hybrid finite-discrete element method (FDEM) to simulate a shear experiment and obtain a detailed comprehension of shear induced progressive damage and the associated seismic activity. We built a laboratory fault model from high resolution surface scans and micro-CT imaging. Our results show that under quasi-static shear loading, the fault surface experiences local dynamic seismic activities. We found that the seismic activity is related to the stress concentration on interlocking asperities. This interlocking behavior (i) causes stress concentration at the region of contact that could reach the compressive strength, and (ii) produces tensile stress up to the tensile strength in the region adjacent to the contact area. Thus, different failure mechanisms and damage patterns including crushing and sub-vertical fracturing are observed on the rough surface. Asperity failure creates rapid local slips resulting in significant stress perturbations that alter the overall stress condition and may trigger the slip of adjacent critically stressed asperities. We found that the spatial distribution of the damaged asperities and the seismic activity is highly heterogeneous; regions with intense asperity interactions formed gouge material, while others exhibit minimal to no damage. These results emphasize the important role of surface roughness in controlling the overall shear behavior and the local dynamic seismic activities on faults.

preprint, 2022, arXiv

Artificial Intelligence Enables Real-Time and Intuitive Control of Prostheses via Nerve Interface

Objective: The next generation prosthetic hand that moves and feels like a real hand requires a robust neural interconnection between the human mind and machines. Methods: Here we present a neuroprosthetic system to demonstrate that principle by employing an artificial intelligence (AI) agent to translate the amputee's movement intent through a peripheral nerve interface. The AI agent is designed based on the recurrent neural network (RNN) and can simultaneously decode six degrees of freedom (DOF) from multichannel nerve data in real time. The decoder's performance is characterized in motor decoding experiments with three human amputees. Results: First, we show the AI agent enables amputees to intuitively control a prosthetic hand with individual finger and wrist movements at up to 97-98% accuracy. Second, we demonstrate the AI agent's real-time performance by measuring the reaction time and information throughput in a hand gesture matching task. Third, we investigate the AI agent's long-term use and show the decoder's robust predictive performance over a 16-month implant duration. Conclusion & significance: Our study demonstrates the potential of AI-enabled nerve technology, underlying the next generation of dexterous and intuitive prosthetic hands.

preprint, 2022, arXiv

Attention in Reasoning: Dataset, Analysis, and Modeling

While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of attention progressively along the reasoning process and to differentiate the correct and incorrect attention patterns. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR

preprint, 2022, arXiv

Constrained Heterogeneous Two-facility Location Games with Max-variant Cost

In this paper, we propose a constrained heterogeneous facility location model where a set of alternative locations are feasible for building facilities and the number of facilities built at each location is limited. Supposing that a set of agents on the real line can strategically report their locations and each agent's cost is her distance to the further facility that she is interested in, we study deterministic mechanism design without money for constrained heterogeneous two-facility location games. Depending on whether agents have optional preference, the problem is considered in two settings: the compulsory setting and the optional setting. In the compulsory setting where each agent is served by the two heterogeneous facilities, we provide a 3-approximate deterministic group strategyproof mechanism for the sum/maximum cost objective respectively, which is also the best deterministic strategyproof mechanism under the corresponding social objective. In the optional setting where each agent can be interested in one of the two facilities or both, we propose a deterministic group strategyproof mechanism with approximation ratio of at most $2n+1$ for the sum cost objective and a deterministic group strategyproof mechanism with approximation ratio of at most 9 for the maximum cost objective.
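The max-variant cost described in this abstract can be made concrete with a small sketch: each agent pays the distance to the further of the facilities she is interested in, and the two social objectives aggregate those costs by sum or by maximum. The function names, the dictionary layout for facilities, and the example numbers are illustrative assumptions; the mechanisms and approximation bounds themselves are not reproduced here.

```python
def agent_cost(location, facilities, interests):
    """Max-variant cost: distance to the further facility of interest.

    location:   the agent's position on the real line.
    facilities: dict mapping a facility name to its position.
    interests:  the facility names this agent cares about (one or both).
    """
    return max(abs(location - facilities[name]) for name in interests)


def sum_cost(agents, facilities):
    """Sum cost objective over (location, interests) pairs."""
    return sum(agent_cost(x, facilities, s) for x, s in agents)


def max_cost(agents, facilities):
    """Maximum cost objective over (location, interests) pairs."""
    return max(agent_cost(x, facilities, s) for x, s in agents)


# Hypothetical instance: two facilities and two agents.
facilities = {"F1": 1.0, "F2": -3.0}
agents = [
    (0.0, {"F1", "F2"}),  # interested in both: cost is max(1, 3) = 3
    (2.0, {"F1"}),        # interested in F1 only: cost is 1
]
print(sum_cost(agents, facilities))  # 4.0
print(max_cost(agents, facilities))  # 3.0
```

In the compulsory setting every agent's interest set would contain both facilities; in the optional setting it may contain one or both, as in the second agent above.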

preprint, 2022, arXiv

Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.

preprint, 2022, arXiv

Improved Pump Setpoint Selection Using a Calibrated Hydraulic Model of a High-Pressure Irrigation System

This paper presents a case study of the operational management of the Robinvale high-pressure piped irrigation water delivery system (RVHPS) in Australia. Based on datasets available, improved pump setpoint selection using a calibrated hydraulic model is investigated. The first step was to implement pre-processing of measured flow and pressure data to identify errors in the data and possible faulty sensors. An EPANET hydraulic simulation model was updated with calibrated pipe roughness height values by using the processed pressure and flow data. Then, new pump setpoints were selected using the calibrated model given the actual measured demands such that the pressures in the network were minimized subject to required customer service standards. Based on a two-day simulation, it was estimated that 4.7% savings in pumping energy cost as well as 4.7% reduction in greenhouse gas emissions can be achieved by applying the new pump setpoints.

preprint, 2022, arXiv

Learning to Minimize the Remainder in Supervised Learning

The learning process of deep learning methods usually updates the model's parameters over multiple iterations. Each iteration can be viewed as a first-order approximation of the Taylor series expansion. The remainder, which consists of higher-order terms, is usually ignored in the learning process for simplicity. This learning scheme empowers various multimedia-based applications, such as image retrieval, recommendation systems, and video search. Generally, multimedia data (e.g., images) are semantics-rich and high-dimensional, hence the remainders of approximations are possibly non-zero. In this work, we consider the remainder to be informative and study how it affects the learning process. To this end, we propose a new learning approach, namely gradient adjustment learning (GAL), to leverage the knowledge learned from the past training iterations to adjust vanilla gradients, such that the remainders are minimized and the approximations are improved. The proposed GAL is model- and optimizer-agnostic, and is easy to adapt to the standard learning framework. It is evaluated on three tasks, i.e., image classification, object detection, and regression, with state-of-the-art models and optimizers. The experiments show that the proposed GAL consistently enhances the evaluated models, whereas the ablation studies validate various aspects of the proposed GAL. The code is available at https://github.com/luoyan407/gradient_adjustment.git.
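To make the notion of a remainder concrete, the sketch below measures the gap between the true loss after one gradient step and the first-order Taylor prediction of that loss, for a one-dimensional toy loss. It only illustrates the quantity the abstract talks about; it is not gradient adjustment learning, and the helper name `first_order_remainder` and the quadratic loss are assumptions chosen for illustration.

```python
def first_order_remainder(f, grad, w, lr):
    """Remainder of the first-order Taylor approximation after one step.

    f:    scalar loss function of a scalar parameter.
    grad: its derivative.
    w:    current parameter value.
    lr:   learning rate of a plain gradient-descent step.
    """
    g = grad(w)
    step = -lr * g                    # vanilla gradient-descent update
    actual = f(w + step)              # true loss after the step
    linear = f(w) + g * step          # first-order prediction of that loss
    return actual - linear            # higher-order terms left out by the prediction


# Toy loss f(w) = w^2, for which the remainder equals step^2 exactly.
loss = lambda w: w * w
dloss = lambda w: 2.0 * w
print(first_order_remainder(loss, dloss, w=1.0, lr=0.1))  # about 0.04
```

For this quadratic the remainder shrinks with the square of the step size, which is why it is often ignored for small learning rates; the abstract's point is that it still carries information.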

preprint, 2022, arXiv

Look in Different Views: Multi-Scheme Regression Guided Cell Instance Segmentation

Cell instance segmentation is a new and challenging task aiming at joint detection and segmentation of every cell in an image. Recently, many instance segmentation methods have been applied to this task. Despite their great success, there still exist two main weaknesses caused by the uncertainty of localizing cell center points. First, densely packed cells can easily be recognized as one cell. Second, an elongated cell can easily be recognized as two cells. To overcome these two weaknesses, we propose a novel cell instance segmentation network based on multi-scheme regression guidance. With multi-scheme regression guidance, the network has the ability to look at each cell from different views. Specifically, we first propose a Gaussian guidance attention mechanism that uses Gaussian labels to guide the network's attention. We then propose a point-regression module to assist the regression of the cell center. Finally, we utilize the output of the above two modules to further guide the instance segmentation. With multi-scheme regression guidance, we can take full advantage of the characteristics of different regions, especially the central region of the cell. We conduct extensive experiments on the benchmark datasets DSB2018, CA2.5 and SCIS. The encouraging results show that our network achieves state-of-the-art (SOTA) performance. On DSB2018 and CA2.5, our network surpasses previous methods by 1.2% (AP50). On the SCIS dataset in particular, our network outperforms them by a large margin (3.0% higher AP50). Visualization and analysis further prove that our proposed method is interpretable.

preprint, 2022, arXiv

REX: Reasoning-aware and Grounded Explanation

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the rationales behind the decisions. As a result, they commonly take advantage of spurious biases instead of actually reasoning on the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanation that explains the decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we identify the critical need to tightly couple important components across the visual and textual modalities for explaining the decisions, and propose a novel explanation generation method that explicitly models the pairwise correspondence between words and regions of interest. It improves the visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanation under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.

preprint, 2022, arXiv

Speckle-based optical cryptosystem and its application for human face recognition via deep learning

Face recognition has recently become ubiquitous in many scenarios for authentication or security purposes. Meanwhile, there are increasing concerns about the privacy of face images, which are sensitive biometric data that should be carefully protected. Software-based cryptosystems are widely adopted nowadays to encrypt face images, but the security level is limited by insufficient digital secret key length or computing power. Hardware-based optical cryptosystems can generate enormously longer secret keys and enable encryption at light speed, but most reported optical methods, such as double random phase encryption, are less compatible with other systems due to system complexity. In this study, a plain yet highly efficient speckle-based optical cryptosystem is proposed and implemented. A scattering ground glass is exploited to generate physical secret keys of gigabit length and encrypt face images via seemingly random optical speckles at light speed. Face images can then be decrypted from the random speckles by a well-trained decryption neural network, such that face recognition can be realized with up to 98% accuracy. The proposed cryptosystem has wide applicability, and it may open a new avenue for high-security complex information encryption and decryption by utilizing optical speckles.

preprint, 2021, arXiv

Embedded Self-Distillation in Compact Multi-Branch Ensemble Network for Remote Sensing Scene Classification

Remote sensing (RS) image scene classification faces many challenges due to interference from the differing characteristics of geographical elements. To solve this problem, we propose a multi-branch ensemble network that enhances feature representation ability by fusing features in the final output logits and intermediate feature maps. However, simply adding branches increases model complexity and reduces inference efficiency. To address this issue, we embed a self-distillation (SD) method to transfer knowledge from the ensemble network to its main branch. Through optimization with SD, the main branch achieves performance close to that of the ensemble network. During inference, we can cut the other branches to simplify the whole model. In this paper, we first design a compact multi-branch ensemble network, which can be trained in an end-to-end manner. Then, we apply the SD method to the output logits and feature maps. Compared to previous methods, our proposed architecture (ESD-MBENet) performs strongly on classification accuracy with a compact design. Extensive experiments are conducted on three benchmark RS datasets, AID, NWPU-RESISC45 and UC-Merced, with three classic baseline models, VGG16, ResNet50 and DenseNet121. Results prove that our proposed ESD-MBENet can achieve better accuracy than previous state-of-the-art (SOTA) complex models. Moreover, abundant visualization analysis makes our method more convincing and interpretable.

preprint, 2021, arXiv

Experimental exploration of five-qubit quantum error correcting code with superconducting qubits

Quantum error correction is an essential ingredient for universal quantum computing. Despite tremendous experimental efforts in the study of quantum error correction, to date, there has been no demonstration in the realisation of universal quantum error correcting code, with the subsequent verification of all key features including the identification of an arbitrary physical error, the capability for transversal manipulation of the logical state, and state decoding. To address this challenge, we experimentally realise the $[\![5,1,3]\!]$ code, the so-called smallest perfect code that permits corrections of generic single-qubit errors. In the experiment, having optimised the encoding circuit, we employ an array of superconducting qubits to realise the $[\![5,1,3]\!]$ code for several typical logical states including the magic state, an indispensable resource for realising non-Clifford gates. The encoded states are prepared with an average fidelity of $57.1(3)\%$ while with a high fidelity of $98.6(1)\%$ in the code space. Then, the arbitrary single-qubit errors introduced manually are identified by measuring the stabilizers. We further implement logical Pauli operations with a fidelity of $97.2(2)\%$ within the code space. Finally, we realise the decoding circuit and recover the input state with an overall fidelity of $74.5(6)\%$, in total with $92$ gates. Our work demonstrates each key aspect of the $[\![5,1,3]\!]$ code and verifies the viability of experimental realization of quantum error correcting codes with superconducting qubits.
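The step in which single-qubit errors are identified by measuring the stabilizers can be illustrated classically. The sketch below uses the standard textbook generators of the $[\![5,1,3]\!]$ code (the cyclic shifts of XZZXI); whether the experiment uses exactly this convention is an assumption, and the helper names are illustrative. It checks that all 15 single-qubit Pauli errors produce distinct, non-trivial syndromes, which is what makes the code "perfect".

```python
# Standard generators of the five-qubit code (assumed convention).
STABILIZERS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def anticommute(p, q):
    """Two Pauli strings anticommute iff they clash on an odd number of qubits.

    A clash is a position where both strings act non-trivially with
    different Pauli operators.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def syndrome(error):
    """One bit per stabilizer: 1 if the error flips that measurement outcome."""
    return tuple(int(anticommute(error, s)) for s in STABILIZERS)


def single_qubit_errors(n=5):
    """Yield every X, Y, and Z error acting on exactly one of n qubits."""
    for qubit in range(n):
        for pauli in "XYZ":
            yield "I" * qubit + pauli + "I" * (n - qubit - 1)


# Lookup table from syndrome to the single-qubit error that causes it.
table = {syndrome(e): e for e in single_qubit_errors()}
print(len(table))              # 15 distinct syndromes
print(table[(0, 0, 0, 1)])     # XIIII
```

Four syndrome bits give 16 patterns: one for "no error" and one for each of the 15 single-qubit errors, so a decoder can look up the correction directly from the measured syndrome.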

preprint, 2021, arXiv

Experimental Realization of Device-Independent Quantum Randomness Expansion

Randomness expansion where one generates a longer sequence of random numbers from a short one is viable in quantum mechanics but not allowed classically. Device-independent quantum randomness expansion provides a randomness resource of the highest security level. Here, we report the first experimental realization of device-independent quantum randomness expansion secure against quantum side information established through quantum probability estimation. We generate $5.47\times10^8$ quantum-proof random bits while consuming $4.39\times10^8$ bits of entropy, expanding our store of randomness by $1.08\times10^8$ bits at a latency of about $13.1$ h, with a total soundness error $4.6\times10^{-10}$. Device-independent quantum randomness expansion not only enriches our understanding of randomness but also sets a solid base to bring quantum-certifiable random bits into realistic applications.
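The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in the abstract; the variable names and the derived ratio and per-second rate are illustrative and inherit that rounding.

```python
# Figures as quoted in the abstract (already rounded there).
generated_bits = 547_000_000   # 5.47e8 quantum-proof random bits produced
consumed_bits = 439_000_000    # 4.39e8 bits of entropy consumed
latency_hours = 13.1           # approximate running time

net_bits = generated_bits - consumed_bits          # net expansion
expansion_ratio = generated_bits / consumed_bits   # output per input bit
net_rate = net_bits / (latency_hours * 3600)       # net bits per second

print(net_bits)                    # 108000000, i.e. 1.08e8
print(round(expansion_ratio, 3))   # roughly 1.246
print(round(net_rate))             # roughly 2290 bits per second
```

The net figure matches the $1.08\times10^8$ bits stated in the abstract; the ratio and rate are derived here, not reported by the authors.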

preprint, 2020, arXiv

$n$-Reference Transfer Learning for Saliency Prediction

Benefiting from deep learning research and large-scale datasets, saliency prediction has achieved significant success in the past decade. However, it still remains challenging to predict saliency maps on images in new domains that lack sufficient data for data-hungry models. To solve this problem, we propose a few-shot transfer learning paradigm for saliency prediction, which enables efficient transfer of knowledge learned from the existing large-scale saliency datasets to a target domain with limited labeled examples. Specifically, very few target domain examples are used as the reference to train a model with a source domain dataset such that the training process can converge to a local minimum in favor of the target domain. Then, the learned model is further fine-tuned with the reference. The proposed framework is gradient-based and model-agnostic. We conduct comprehensive experiments and ablation study on various source domain and target domain pairs. The results show that the proposed framework achieves a significant performance improvement. The code is publicly available at https://github.com/luoyan407/n-reference.

preprint, 2020, arXiv

Active Learning for Skewed Data Sets

Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training data. Both of these problems occur with surprising frequency in many web applications. For instance, detecting offensive or sensitive content in online communities (pornography, violence, and hate-speech) is receiving enormous attention from industry as well as research communities. Such problems have both the characteristics we describe -- a vast majority of content is not offensive, so the number of positive examples for such content is orders of magnitude smaller than the negative examples. Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems. To address both these issues, we propose a hybrid active learning algorithm (HAL) that balances exploiting the knowledge available through the currently labeled training examples with exploring the large amount of unlabeled data available. Through simulation results, we show that HAL makes significantly better choices for what points to label when compared to strong baselines like margin-sampling. Classifiers trained on the examples selected for labeling by HAL easily out-perform the baselines on target metrics (like area under the precision-recall curve) given the same budget for labeling examples. We believe HAL offers a simple, intuitive, and computationally tractable way to structure active learning for a wide range of machine learning applications.
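The exploit/explore balance described in this abstract can be sketched as a batch-selection routine. The abstract does not specify how HAL exploits or explores, so everything concrete below is an assumption made for illustration: exploitation is approximated by margin sampling (scores closest to 0.5), exploration by uniform random sampling from the remaining pool, and the split is controlled by a hypothetical `explore_frac` parameter.

```python
import random


def select_batch(scores, batch_size, explore_frac=0.5, seed=0):
    """Pick unlabeled items to label, mixing exploitation and exploration.

    scores:       dict mapping item id to the classifier's P(positive).
    batch_size:   number of items to label this round.
    explore_frac: assumed fraction of the batch reserved for exploration.
    """
    rng = random.Random(seed)
    n_explore = int(round(batch_size * explore_frac))
    n_exploit = batch_size - n_explore

    # Exploit: items the current classifier is least sure about.
    by_margin = sorted(scores, key=lambda item: abs(scores[item] - 0.5))
    exploit = by_margin[:n_exploit]

    # Explore: uniform draw from whatever was not already chosen.
    chosen = set(exploit)
    remaining = [item for item in scores if item not in chosen]
    explore = rng.sample(remaining, min(n_explore, len(remaining)))

    return exploit + explore


# Hypothetical pool of five unlabeled items.
scores = {"a": 0.50, "b": 0.90, "c": 0.10, "d": 0.45, "e": 0.99}
batch = select_batch(scores, batch_size=4)
print(batch[:2])  # ['a', 'd'] are the two most uncertain items
```

With heavy class skew and little initial data, the classifier's scores are unreliable early on, which is one reason to reserve part of the budget for points it would not have chosen itself.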

preprint, 2020, arXiv

AiR: Attention with Reasoning Capability

While attention has been an increasingly popular component in deep neural networks to both interpret and boost performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attentions on their reasoning capability and how they impact task performance. Furthermore, we propose a supervision method to jointly and progressively optimize attention, reasoning, and task performance so that models learn to look at regions of interest by following a reasoning process. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR

preprint, 2020, arXiv

Constructing Multipartite Bell inequalities from stabilizers

Bell inequalities with the self-testing property have played an important role in the quantum information field, with both fundamental and practical applications. However, it is generally challenging to find Bell inequalities with the self-testing property for multipartite states, and there are not many known candidates. In this work, we propose a systematic framework to construct Bell inequalities from stabilizers which are maximally violated by general stabilizer states, with two observables for each local party. We show that the constructed Bell inequalities can self-test any stabilizer state, which is essentially device-independent, if and only if these stabilizers can uniquely determine the state in a device-dependent manner. This bridges the gap between device-independent and device-dependent verification methods. Our framework can provide plenty of Bell inequalities for self-testing stabilizer states. Among them, we give two families of Bell inequalities with different advantages: (1) a family of Bell inequalities with a constant ratio of quantum and classical bounds using 2N correlations, and (2) single-pair inequalities improving on all previous robustness self-testing bounds using N+1 correlations, which are both efficient and suitable for realizations in multipartite systems. Our framework can not only inspire more fruitful multipartite Bell inequalities from conventional verification methods, but also pave the way for their practical applications.

preprint, 2020, arXiv

Direction Concentration Learning: Enhancing Congruency in Machine Learning

One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In this work, we first define such an agreement in a concepts learning process as congruency. Formally, given a particular task and sufficiently large dataset, the congruency issue occurs in the learning process whereby the task-specific semantics in the training data are highly varying. We propose a Direction Concentration Learning (DCL) method to improve congruency in the learning process, where enhancing congruency influences the convergence path to be less circuitous. The experimental results show that the proposed DCL method generalizes to state-of-the-art models and optimizers, as well as improves the performances of saliency prediction task, continual learning task, and classification task. Moreover, it helps mitigate the catastrophic forgetting problem in the continual learning task. The code is publicly available at https://github.com/luoyan407/congruency.

preprint, 2020, arXiv

GradMix: Multi-source Transfer across Domains and Tasks

The computer vision community is witnessing an unprecedented rate of new tasks being proposed and addressed, thanks to deep convolutional networks' capability to find complex mappings from X to Y. The advent of each task is often accompanied by the release of a large-scale annotated dataset for supervised training of deep networks. However, it is expensive and time-consuming to manually label a sufficient amount of training data. Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled datasets to learn useful knowledge for the target task. While previous works mostly focus on transfer learning from a single source, we study multi-source transfer across domains and tasks (MS-DTT), in a semi-supervised setting. We propose GradMix, a model-agnostic method applicable to any model trained with a gradient-based learning rule, to transfer knowledge via gradient descent by weighting and mixing the gradients from all sources during training. GradMix follows a meta-learning objective, which assigns layer-wise weights to the source gradients, such that the combined gradient follows the direction that minimizes the loss for a small set of samples from the target dataset. In addition, we propose to adaptively adjust the learning rate for each mini-batch based on its importance to the target task, and a pseudo-labeling method to leverage the unlabeled samples in the target domain. We conduct MS-DTT experiments on two tasks: digit recognition and action recognition, and demonstrate the advantageous performance of the proposed method against multiple baselines.
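The idea of weighting and mixing source gradients so that the combined update helps the target task can be sketched as follows. This is not the GradMix algorithm: the abstract describes a meta-learning objective with layer-wise weights, whereas the sketch swaps in a much simpler stand-in, weighting each source gradient by its non-negative dot-product alignment with a gradient computed on a few target samples. The function names and the flat-vector gradient representation are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mix_gradients(source_grads, target_grad):
    """Combine source gradients, favoring those aligned with the target.

    source_grads: list of gradient vectors, one per source dataset.
    target_grad:  gradient from a small set of target samples.
    Returns (weights, mixed_gradient). Sources whose gradient points
    away from the target direction receive weight zero (assumed rule).
    """
    alignment = [max(dot(g, target_grad), 0.0) for g in source_grads]
    total = sum(alignment)
    if total == 0.0:
        # No source is aligned with the target: fall back to a uniform mix.
        weights = [1.0 / len(source_grads)] * len(source_grads)
    else:
        weights = [a / total for a in alignment]
    mixed = [
        sum(w * g[i] for w, g in zip(weights, source_grads))
        for i in range(len(target_grad))
    ]
    return weights, mixed


# Three hypothetical sources; the third conflicts with the target direction.
sources = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, mixed = mix_gradients(sources, target_grad=[1.0, 1.0])
print(weights)  # [0.5, 0.5, 0.0]
print(mixed)    # [0.5, 0.5]
```

A layer-wise variant would apply the same weighting separately to each layer's slice of the gradient, which is closer in spirit to what the abstract describes.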

preprint2020arXiv

MGML: Multi-Granularity Multi-Level Feature Ensemble Network for Remote Sensing Scene Classification

Remote sensing (RS) scene classification is the challenging task of predicting scene categories of RS images. RS images have two main characteristics: large intra-class variance caused by large resolution variance, and confusing information from the large geographic area covered. To ease the negative influence of these two characteristics, we propose a Multi-granularity Multi-Level Feature Ensemble Network (MGML-FENet) to efficiently tackle the RS scene classification task. Specifically, we propose a Multi-granularity Multi-Level Feature Fusion Branch (MGML-FFB) to extract multi-granularity features at different levels of the network with a channel-separate feature generator (CS-FG). To avoid interference from confusing information, we propose a Multi-granularity Multi-Level Feature Ensemble Module (MGML-FEM), which can provide diverse predictions with a full-channel feature generator (FC-FG). Compared to previous methods, our proposed networks are able to use structure information and abundant fine-grained features. Furthermore, through ensemble learning, our proposed MGML-FENets can obtain more convincing final predictions. Extensive classification experiments on multiple RS datasets (AID, NWPU-RESISC45, UC-Merced and VGoogle) demonstrate that our proposed networks achieve better performance than previous state-of-the-art (SOTA) networks. The visualization analysis also shows the good interpretability of MGML-FENet.

preprint2020arXiv

MM-FSOD: Meta and metric integrated few-shot object detection

In the object detection task, CNN (convolutional neural network) models typically need a large number of annotated examples in the training process. To reduce the dependency on expensive annotations, few-shot object detection has become an increasing research focus. In this paper, we present an effective object detection framework (MM-FSOD) that integrates metric learning and meta-learning to tackle the few-shot object detection task. Our model is a class-agnostic detection model that can accurately recognize new categories that do not appear in the training samples. Specifically, to quickly learn the features of new categories without a fine-tuning process, we propose a meta-representation module (MR module) to learn intra-class mean prototypes. The MR module is trained with a meta-learning method to obtain the ability to reconstruct high-level features. To further measure the similarity between support prototypes and query RoI features, we propose a Pearson metric module (PR module), which serves as a classifier. Compared to the previously common metric, cosine distance, the PR module enables the model to align features into a discriminative embedding space. We conduct extensive experiments on the benchmark datasets FSOD, MS COCO, and PASCAL VOC to demonstrate the feasibility and efficiency of our model. Compared with previous methods, MM-FSOD achieves state-of-the-art (SOTA) results.
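
The contrast this abstract draws between a Pearson metric and the cosine metric can be made concrete with a short sketch. The Pearson correlation of two feature vectors equals the cosine similarity of the same vectors after each has its own mean subtracted, so it ignores a constant offset added to every feature dimension. The code below illustrates only that relationship; how the PR module applies the metric inside the detector is not specified here, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    return cosine_similarity(a - a.mean(), b - b.mean())


# A hypothetical prototype and a query feature that differ by a constant offset.
prototype = np.array([1.0, 2.0, 3.0])
query = prototype + 10.0

cos_score = cosine_similarity(prototype, query)       # below 1: the offset changes the angle
pearson_score = pearson_similarity(prototype, query)  # 1 up to rounding: the offset is removed
```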

preprint2020arXiv

Saliency Prediction with External Knowledge

The last decades have seen great progress in saliency prediction, with the success of deep neural networks that are able to encode high-level semantics. Yet, while humans have the innate ability to leverage their knowledge to decide where to look (e.g. people pay more attention to familiar faces such as celebrities), saliency prediction models have only been trained with large eye-tracking datasets. This work proposes to bridge this gap by explicitly incorporating external knowledge into saliency models, as humans do. We develop networks that learn to highlight regions by incorporating prior knowledge of semantic relationships, be it general or domain-specific, depending on the task of interest. At the core of the method is a new Graph Semantic Saliency Network (GraSSNet) that constructs a graph encoding semantic relationships learned from external knowledge. A Spatial Graph Attention Network is then developed to update saliency features based on the learned graph. Experiments show that the proposed model learns to predict saliency from the external knowledge and outperforms the state-of-the-art on four saliency benchmarks.

preprint2020arXiv

Video Storytelling: Textual Summaries for Events

Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focus on generating a single-sentence description for visual content, recent works have studied paragraph generation. In this work, we introduce the problem of video storytelling, which aims at generating coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address the challenges. First, we propose a context-aware framework for multimodal embedding learning, where we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from past and future. Second, we propose a Narrator model to discover the underlying storyline. The Narrator is formulated as a reinforcement learning agent which is trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we have collected to enable the study. We compare our method with multiple state-of-the-art baselines, and show that our method achieves better performance, in terms of quantitative measures and user study.

preprint2019arXiv

One-Shot Coherence Distillation: Towards Completing the Picture

The resource framework of quantum coherence was introduced by Baumgratz, Cramer and Plenio [PRL 113, 140401 (2014)] and further developed by Winter and Yang [PRL 116, 120404 (2016)]. We consider the one-shot problem of distilling pure coherence from a single instance of a given resource state. Specifically, we determine the distillable coherence with a given fidelity under incoherent operations (IO) through a generalisation of the Winter-Yang protocol. This is compared to the distillable coherence under maximal incoherent operations (MIO) and dephasing-covariant incoherent operations (DIO), which can be cast as a semidefinite programme, that has been presented previously by Regula et al. [PRL 121, 010401 (2018)]. Our results are given in terms of a smoothed min-relative entropy distance from the incoherent set of states, and a variant of the hypothesis-testing relative entropy distance, respectively. The one-shot distillable coherence is also related to one-shot randomness extraction. Moreover, from the one-shot formulas under IO, MIO, DIO, we can recover the optimal distillable rate in the many-copy asymptotics, yielding the relative entropy of coherence. These results can be compared with previous work by some of the present authors [Zhao et al., PRL 120, 070403 (2018)] on one-shot coherence formation under IO, MIO, DIO and also SIO. This shows that the amount of distillable coherence is essentially the same for IO, DIO, and MIO, despite the fact that the three classes of operations are very different. We also relate the distillable coherence under strictly incoherent operations (SIO) to a constrained hypothesis testing problem and explicitly show the existence of bound coherence under SIO in the asymptotic regime.
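
The relative entropy of coherence mentioned in this abstract has a standard closed form in the Baumgratz-Cramer-Plenio framework: C_r(ρ) = S(Δ(ρ)) − S(ρ), where S is the von Neumann entropy and Δ removes the off-diagonal elements in the incoherent basis. A minimal NumPy sketch of that formula follows, assuming the computational basis is the incoherent basis and measuring entropy in bits. It does not implement the one-shot quantities studied in the paper, and the function names are illustrative.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Entropy in bits, computed from the eigenvalues of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero (and tiny negative) eigenvalues
    return float(-np.sum(evals * np.log2(evals)))


def relative_entropy_of_coherence(rho):
    """C_r(rho) = S(dephased rho) - S(rho) in the computational basis."""
    dephased = np.diag(np.diag(rho))  # keep only the diagonal entries
    return von_neumann_entropy(dephased) - von_neumann_entropy(rho)


# Maximally coherent qubit state |+><+| and the maximally mixed state.
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
mixed = np.eye(2) / 2

c_plus = relative_entropy_of_coherence(plus)    # one bit of coherence
c_mixed = relative_entropy_of_coherence(mixed)  # an incoherent state has none
```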

preprint2019arXiv

Randomness expansion secured by quantum contextuality

The output randomness from a random number generator can be certified by observing the violation of quantum contextuality inequalities based on the Kochen-Specker theorem. Contextuality can be tested in a single quantum system, which significantly simplifies the experimental requirements for observing the violation compared with tests based on nonlocality. However, it is not yet resolved how to ensure the compatibility of sequential measurements that is required in contextuality tests. Here, we employ a modified Klyachko-Can-Binicioğlu-Shumovsky contextuality inequality, which can ease the strict compatibility requirement on measurements. On a trapped single $^{138}$Ba$^+$ ion system, we experimentally demonstrate violation of the contextuality inequality and realize self-testing quantum random number expansion by closing detection loopholes. We perform $1.29 \times 10^8$ trials of experiments and extract the randomness of $8.06 \times 10^5$ bits with a speed of 270 bits s$^{-1}$. Our demonstration paves the way for practical high-speed spot-checking quantum random number expansion and other secure information processing applications.
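
For orientation, the following sketch evaluates the standard Klyachko-Can-Binicioğlu-Shumovsky (KCBS) expression, not the modified inequality used in the paper, whose form is not given in this abstract. In the standard version, five dichotomic observables A_k = 1 − 2|v_k⟩⟨v_k| on a qutrit, with neighboring vectors orthogonal, satisfy Σ⟨A_k A_{k+1}⟩ ≥ −3 in any noncontextual model, while quantum mechanics reaches 5 − 4√5 ≈ −3.944. The pentagram construction and the function names below are illustrative assumptions.

```python
import numpy as np


def kcbs_vectors():
    """Five unit vectors with v_k orthogonal to v_(k+1), indices mod 5."""
    cos2 = np.cos(np.pi / 5) / (1 + np.cos(np.pi / 5))  # equals 1/sqrt(5)
    cos_t, sin_t = np.sqrt(cos2), np.sqrt(1 - cos2)
    return [
        np.array([sin_t * np.cos(4 * np.pi * k / 5),
                  sin_t * np.sin(4 * np.pi * k / 5),
                  cos_t])
        for k in range(5)
    ]


def kcbs_value(state, vectors):
    """Sum of <A_k A_(k+1)> for a real qutrit state vector."""
    total = 0.0
    for k in range(5):
        a = np.eye(3) - 2 * np.outer(vectors[k], vectors[k])
        nxt = vectors[(k + 1) % 5]
        b = np.eye(3) - 2 * np.outer(nxt, nxt)
        total += float(state @ a @ b @ state)
    return total


# The state along the symmetry axis gives the maximal quantum violation.
value = kcbs_value(np.array([0.0, 0.0, 1.0]), kcbs_vectors())
violates_noncontextual_bound = value < -3.0
```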

preprint2019arXiv

Simultaneous Certification of Entangled States and Measurements in Bounded Dimensional Semi-Quantum Games

Certification of quantum systems and operations is a central task in quantum information processing. Most current schemes rely on tomography with fully characterised devices, a requirement that may not be met in real experiments. Device characterisation can be removed with device-independent tests, although these are technically challenging at the moment. In this letter, we investigate the problem of certifying entangled states and measurements via semi-quantum games, a type of non-local quantum game with well-characterised quantum inputs, balancing practicality and device-independence. We first design a specific bounded-dimensional measurement-device-independent game, with which we simultaneously certify any pure entangled state and Bell state measurement operators. Afterwards, via a duality treatment of state and measurement, we interpret the dual form of this game as a source-independent bounded-dimensional entanglement swapping protocol and show that the whole process, including any entangled projector and Bell states, can be certified with this protocol. In particular, our results do not require a complete Bell state measurement, which is beneficial for experiments and practical use.

preprint2016arXiv

Efficient measurement-device-independent detection of multipartite entanglement structure

Witnessing entanglement is crucial in quantum information processing. It has been shown previously that, with properly prepared ancillary states, genuine entanglement can be witnessed without trusting measurement devices. In this work, we generalize the scenario and show that generic multipartite entanglement structures, including entanglement of subsystems and entanglement depth, can be witnessed via measurement-device-independent means. As the original measurement-device-independent entanglement witness scheme exploits only one out of four Bell measurement outcomes for each party, a direct generalization to multipartite quantum states will inevitably cause inefficiency in entanglement detection once statistical fluctuations are taken into account. To resolve this problem, we also present a way to utilize all the measurement outcomes. The scheme is efficient for multipartite entanglement detection and can be realized with state-of-the-art technologies.

preprint2016arXiv

Foveation-based Mechanisms Alleviate Adversarial Examples

We show that adversarial examples, i.e., the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations, i.e., applying the CNN to different image regions. To see this, first, we report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as linear classifiers: CNNs act locally linearly to changes in the image regions with objects recognized by the CNN, and in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation. This is because, hypothetically, the CNNs for ImageNet are robust to changes of scale and translation of the object produced by the foveation, but this property does not generalize to transformations of the perturbation. As a result, the accuracy after a foveation is almost the same as the accuracy of the CNN without the adversarial perturbation, even if the adversarial perturbation is calculated taking into account a foveation.

preprint2015arXiv

Clauser-Horne Bell test with imperfect random inputs

The Bell test is one of the most important tools in quantum information science. On the one hand, it enables fundamental tests of the physical laws of nature; on the other hand, it can also be applied in a variety of device-independent tasks such as quantum key distribution and random number generation. In practice, loopholes in experimental demonstrations of Bell tests may affect the validity of the conclusions. In this work, we focus on the randomness (free-will) loophole and investigate the randomness requirement in a well-known Bell test, the Clauser-Horne test, under various conditions. With partially random inputs, we explicitly bound the Bell value for all local hidden variable models by optimizing the classical strategy. Our result thus places an input randomness requirement on the Clauser-Horne test under a variety of practical scenarios. The analysis technique employed can be generalized to other Bell inequalities.
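
As a baseline for the bound this abstract refers to, the sketch below enumerates every deterministic local strategy for the standard Clauser-Horne expression P(11|x₁y₁) + P(11|x₁y₂) + P(11|x₂y₁) − P(11|x₂y₂) − P_A(1|x₁) − P_B(1|y₁), assuming the inputs are chosen independently of the hidden variable. Under that assumption the local maximum is 0. The paper's bounds for partially random inputs are not reproduced here, and the function names are illustrative.

```python
from itertools import product


def ch_value(a, b):
    """CH expression for a deterministic strategy.

    a[x] and b[y] are 1 if the party outputs the "click" outcome for
    setting x or y, and 0 otherwise.
    """
    return (a[0] * b[0] + a[0] * b[1] + a[1] * b[0]
            - a[1] * b[1] - a[0] - b[0])


def local_ch_range():
    """Minimum and maximum CH value over all 16 deterministic strategies."""
    values = [
        ch_value(a, b)
        for a in product((0, 1), repeat=2)
        for b in product((0, 1), repeat=2)
    ]
    return min(values), max(values)


lowest, highest = local_ch_range()
```

Because every local hidden variable model with free inputs is a mixture of these deterministic strategies, its CH value lies between `lowest` and `highest`. When the inputs can be correlated with the hidden variable, this argument no longer applies directly, which is the situation the paper analyzes.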

preprint2015arXiv

Measurement-device-independent quantum key distribution over untrustful metropolitan network

Quantum cryptography holds the promise of establishing an information-theoretically secure global network. All field tests of metropolitan-scale quantum networks to date are based on trusted relays. The security relies critically on the accountability of the trusted relays and breaks down if a relay is dishonest or compromised. Here, we construct a measurement-device-independent quantum key distribution (MDIQKD) network in a star topology over a 200-square-kilometer metropolitan area, which is secure against untrustworthy relays and against all detection attacks. In the field test, our system runs continuously for one week with a secure key rate ten times larger than the previous result. Our results demonstrate that the MDIQKD network, combining the best of both worlds, security and practicality, constitutes an appealing solution for secure metropolitan communications.

preprint2014arXiv

Learning of Proto-object Representations via Fixations on Low Resolution

While previous research in eye fixation prediction typically relies on integrating low-level features (e.g. color, edge) to form a saliency map, it has recently been found that the structural organization of these features into a proto-object representation can play a more significant role. In this work, we present a computational framework based on a deep network to demonstrate that proto-object representations can be learned from low-resolution image patches from fixation regions. We advocate the use of low-resolution inputs for the following reasons: (1) proto-objects are computed in parallel over an entire visual field; (2) people can perceive or recognize objects well even at low resolution; (3) fixations from lower-resolution images can predict fixations on higher-resolution images. In the proposed computational model, we extract multi-scale image patches on fixation regions from eye fixation datasets, resize them to low resolution and feed them into a hierarchical network. With layer-wise unsupervised feature learning, we find that many proto-object-like features responsive to different shapes of object blobs are learned. Visualizations also show that these features are selective to potential objects in the scene and that the responses of these features work well in predicting eye fixations on the images when combined with learned weights.

33 published item(s)

RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

Numerical investigation of progressive damage and associated seismicity on a laboratory fault

Artificial Intelligence Enables Real-Time and Intuitive Control of Prostheses via Nerve Interface

Attention in Reasoning: Dataset, Analysis, and Modeling

Constrained Heterogeneous Two-facility Location Games with Max-variant Cost

Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Improved Pump Setpoint Selection Using a Calibrated Hydraulic Model of a High-Pressure Irrigation System

Learning to Minimize the Remainder in Supervised Learning

Look in Different Views: Multi-Scheme Regression Guided Cell Instance Segmentation

REX: Reasoning-aware and Grounded Explanation

Speckle-based optical cryptosystem and its application for human face recognition via deep learning

Embedded Self-Distillation in Compact Multi-Branch Ensemble Network for Remote Sensing Scene Classification

Experimental exploration of five-qubit quantum error correcting code with superconducting qubits

Experimental Realization of Device-Independent Quantum Randomness Expansion

$n$-Reference Transfer Learning for Saliency Prediction

Active Learning for Skewed Data Sets

AiR: Attention with Reasoning Capability

Constructing Multipartite Bell inequalities from stabilizers

Direction Concentration Learning: Enhancing Congruency in Machine Learning

GradMix: Multi-source Transfer across Domains and Tasks

MGML: Multi-Granularity Multi-Level Feature Ensemble Network for Remote Sensing Scene Classification

MM-FSOD: Meta and metric integrated few-shot object detection

Saliency Prediction with External Knowledge

Video Storytelling: Textual Summaries for Events

One-Shot Coherence Distillation: Towards Completing the Picture

Randomness expansion secured by quantum contextuality

Simultaneous Certification of Entangled States and Measurements in Bounded Dimensional Semi-Quantum Games

Efficient measurement-device-independent detection of multipartite entanglement structure

Foveation-based Mechanisms Alleviate Adversarial Examples

Clauser-Horne Bell test with imperfect random inputs

Measurement-device-independent quantum key distribution over untrustful metropolitan network

Learning of Proto-object Representations via Fixations on Low Resolution