Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
38 works
0 followers
21 topics
4 close collaborators

Published work

38 published item(s)

preprint, 2026, arXiv

Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

Guided depth super-resolution (GDSR) reconstructs HR depth maps from LR inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, hindering the establishment of efficient and semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach enabling dense, semantically-aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves global modeling with linear complexity. Furthermore, a cross-modal matching transform module is introduced to enhance interactive modeling quality by utilizing representative features from both modalities. Extensive experiments demonstrate competitive performance against state-of-the-art methods.

preprint, 2023, arXiv

Absorption and scattering of a high dimensional noncommutative black hole

In this work, we investigate the scattering of massless plane scalar waves by the high-dimensional noncommutative Schwarzschild-Tangherlini black hole. We use the partial wave approach to determine the scattering and absorption cross sections in the incident wavelength range. Our numerical results demonstrate that the larger the noncommutative parameter, the smaller the maximum value of the related partial absorption cross section, although the effect is slight. We also find that when the noncommutative parameter is weak, the absorption cross section of the high-dimensional black hole oscillates in the low-frequency zone. The total absorption cross section oscillates around the geometrical optical limit in the high-frequency range, and the scattering characteristics of black holes with various parameters are visibly different. The influence on the differential scattering cross section is particularly pronounced at large angles.

preprint, 2023, arXiv

Theory of the microwave impedance microscopy of Chern insulators

Microwave impedance microscopy (MIM) has been utilized to directly visualize topological edge states in many quantum materials, from quantum Hall systems to topological insulators, across the GHz regime. While the microwave response for conventional metals and insulators can be accurately quantified using simple lumped-element circuits, the applicability of these classical models to more exotic quantum systems remains limited. In this work, we present a general theoretical framework of the MIM response of arbitrary quantum materials within linear response theory. As a special case, we model the microwave response of topological edge states in a Chern insulator and predict an enhanced MIM response at the crystal boundaries due to collective edge magnetoplasmon (EMP) excitations. The resonance frequency of these plasmonic modes should depend quantitatively on the topological invariant of the Chern insulator state and on the sample's circumference, which highlights their non-local, topological nature. To benchmark our analytical predictions, we experimentally probe the MIM response of quantum anomalous Hall edge states in a Cr-doped (Bi,Sb)2Te3 topological insulator and perform numerical simulations using a classical formulation of the EMP modes based on this realistic tip-sample geometry, both of which yield results consistent with our theoretical picture. We also show how the technique of MIM can be used to quantitatively extract the topological invariant of a Chern insulator, disentangle the signatures of topological versus trivial edge states, and shed light on the microscopic nature of dissipation along the crystal boundaries.

preprint, 2022, arXiv

A Measurement of Proton, Deuteron, Triton and Alpha Particle Emission after Nuclear Muon Capture on Al, Si and Ti with the AlCap Experiment

Heavy charged particles after nuclear muon capture are an important nuclear physics background to the muon-to-electron conversion experiments Mu2e and COMET, which will search for charged lepton flavor violation at an unprecedented level of sensitivity. The AlCap experiment measured the yield and energy spectra of protons, deuterons, tritons, and alpha particles emitted after the nuclear capture of muons stopped in Al, Si, and Ti in the low energy range relevant for the muon-to-electron conversion experiments. Individual charged particle types were identified in layered silicon detector packages and their initial energy distributions were unfolded from the observed energy spectra. Detailed information on yields and energy spectra for all observed nuclei is presented in the paper.

preprint, 2022, arXiv

Are Neural Ranking Models Robust?

Recently, we have witnessed the bloom of neural ranking models in the information retrieval (IR) field. So far, much effort has been devoted to developing effective neural ranking models that can generalize well on new data. Less attention has been paid to the robustness perspective. Unlike effectiveness, which concerns the average performance of a system under normal use, robustness concerns system performance in the worst case or under malicious operations. When a new technique enters real-world applications, it is critical to know not only how it works on average, but also how it would behave in abnormal situations. So we raise the question in this work: Are neural ranking models robust? To answer this question, we first need to clarify what we refer to when we talk about the robustness of ranking models in IR. We show that robustness is actually a multi-dimensional concept and there are three ways to define it in IR: 1) The performance variance under the independent and identically distributed (I.I.D.) setting; 2) The out-of-distribution (OOD) generalizability; and 3) The defensive ability against adversarial operations. The latter two definitions can each be further specified from two different perspectives, leading to 5 robustness tasks in total. Based on this taxonomy, we build corresponding benchmark datasets, design empirical experiments, and systematically analyze the robustness of several representative neural ranking models against traditional probabilistic ranking models and learning-to-rank (LTR) models. The empirical results show that there is no simple answer to our question. While neural ranking models are less robust than other IR models in most cases, some of them can still win 1 out of 5 tasks. This is the first comprehensive study on the robustness of neural ranking models.

preprint, 2022, arXiv

Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models

Neural ranking models (NRMs) have achieved promising results in information retrieval. NRMs have also been shown to be vulnerable to adversarial examples. A typical Word Substitution Ranking Attack (WSRA) against NRMs was proposed recently, in which an attacker promotes a target document in rankings by adding human-imperceptible perturbations to its text. This raises concerns when deploying NRMs in real-world applications. Therefore, it is important to develop techniques that defend against such attacks for NRMs. In empirical defenses, adversarial examples are found during training and used to augment the training set. However, such methods offer no theoretical guarantee on the models' robustness and may eventually be broken by other sophisticated WSRAs. To escape this arms race, rigorous and provable certified defense methods for NRMs are needed. To this end, we first define the Certified Top-$K$ Robustness for ranking models since users mainly care about the top ranked results in real-world scenarios. A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack. Then, we introduce a Certified Defense method, named CertDR, to achieve certified top-$K$ robustness against WSRA, based on the idea of randomized smoothing. Specifically, we first construct a smoothed ranker by applying random word substitutions on the documents, and then leverage the ranking property jointly with the statistical property of the ensemble to provably certify top-$K$ robustness. Extensive experiments on two representative web search datasets demonstrate that CertDR can significantly outperform state-of-the-art empirical defense methods for ranking models.

preprint, 2022, arXiv

Contrastive learning-based computational histopathology predict differential expression of cancer driver genes

Digital pathological analysis is the main examination used for cancer diagnosis. Recently, deep learning-driven feature extraction from pathology images has been able to detect genetic variations and the tumor environment, but few studies focus on differential gene expression in tumor cells. In this paper, we propose a self-supervised contrastive learning framework, HistCode, to infer differential gene expression from whole slide images (WSIs). We leverage contrastive learning on large-scale unannotated WSIs to derive slide-level histopathological features in latent space, and then transfer them to tumor diagnosis and prediction of differentially expressed cancer driver genes. Our extensive experiments show that our method outperforms other state-of-the-art models in tumor diagnosis tasks and also effectively predicts differential gene expression. Interestingly, we found that genes with higher fold changes can be predicted more precisely. To intuitively illustrate the ability to extract informative features from pathological images, we spatially visualized the WSIs colored by the attentive scores of image tiles. We found that the tumor and necrosis areas were highly consistent with the annotations of experienced pathologists. Moreover, the spatial heatmap generated by lymphocyte-specific gene expression patterns was also consistent with the manually labeled WSI.

preprint, 2022, arXiv

DeepPERF: A Deep Learning-Based Approach For Improving Software Performance

Improving software performance is an important yet challenging part of the software development cycle. Today, the majority of performance inefficiencies are identified and patched by performance experts. Recent advancements in deep learning approaches and the widespread availability of open source data create a great opportunity to automate the identification and patching of performance problems. In this paper, we present DeepPERF, a transformer-based approach to suggest performance improvements for C# applications. We pretrain DeepPERF on English and source code corpora, followed by fine-tuning for the task of generating performance improvement patches for C# applications. Our evaluation shows that our model can generate the same performance improvement suggestion as the developer fix in ~53% of the cases, getting ~34% of them verbatim in our expert-verified dataset of performance changes made by C# developers. Additionally, we evaluate DeepPERF on 50 open source C# repositories on GitHub using both benchmark and unit tests and find that our model is able to suggest valid performance improvements that can improve both CPU usage and memory allocations. So far we've submitted 19 pull requests with 28 different performance optimizations, and 11 of these PRs have been approved by the project owners.

preprint, 2022, arXiv

Dual-Tasks Siamese Transformer Framework for Building Damage Assessment

Accurate and fine-grained information about the extent of damage to buildings is essential for humanitarian relief and disaster response. However, as the most commonly used architecture in remote sensing interpretation tasks, Convolutional Neural Networks (CNNs) have limited ability to model the non-local relationship between pixels. Recently, Transformer architecture first proposed for modeling long-range dependency in natural language processing has shown promising results in computer vision tasks. Considering the frontier advances of Transformer architecture in the computer vision field, in this paper, we present the first attempt at designing a Transformer-based damage assessment architecture (DamFormer). In DamFormer, a siamese Transformer encoder is first constructed to extract non-local and representative deep features from input multitemporal image-pairs. Then, a multitemporal fusion module is designed to fuse information for downstream tasks. Finally, a lightweight dual-tasks decoder aggregates multi-level features for final prediction. To the best of our knowledge, it is the first time that such a deep Transformer-based network is proposed for multitemporal remote sensing interpretation tasks. The experimental results on the large-scale damage assessment dataset xBD demonstrate the potential of the Transformer-based architecture.

preprint, 2022, arXiv

Exploiting Feature Diversity for Make-up Temporal Video Grounding

This technical report presents the 3rd-place winning solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims at localizing the temporal boundary of the step in an untrimmed video based on a textual description. The biggest challenge of this task is the fine-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. As actions are more coarse-grained than make-up steps, action-based features are not sufficient to provide fine-grained cues. To address this issue, we propose to achieve fine-grained representation by exploiting feature diversity. Specifically, we propose a series of methods spanning feature extraction, network optimization, and model ensembling. As a result, we achieved 3rd place in the MTVG competition.

preprint, 2022, arXiv

Federated Unlearning with Knowledge Distillation

Federated Learning (FL) is designed to protect the data privacy of each client during the training process by transmitting only models instead of the original data. However, the trained model may memorize certain information about the training data. With the recent legislation on the right to be forgotten, it is essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's contribution by subtracting the accumulated historical updates from the model and leveraging the knowledge distillation method to restore the model's performance without using any data from the clients. This method does not have any restrictions on the type of neural networks and does not rely on clients' participation, so it is practical and efficient in the FL system. We further introduce backdoor attacks in the training process to help evaluate the unlearning effect. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.
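The subtraction step described in this abstract can be illustrated with a minimal sketch. It assumes equal-weight averaging of client updates (a FedAvg-style rule the abstract does not specify), and the names `remove_client_updates`, `history`, and `num_clients` are hypothetical. The knowledge distillation step that restores performance is not shown.

```python
from typing import Dict, List

Weights = List[float]


def remove_client_updates(
    global_weights: Weights,
    history: List[Dict[str, Weights]],
    client_id: str,
    num_clients: int,
) -> Weights:
    """Subtract one client's accumulated updates from the global model.

    Assumption (not from the abstract): each round adds the plain average
    of all client updates, so a client's share of a round is update / num_clients.
    `history` holds, per round, a mapping from client id to its update vector.
    """
    unlearned = list(global_weights)
    for round_updates in history:
        update = round_updates.get(client_id)
        if update is None:  # the client did not participate in this round
            continue
        for j, delta in enumerate(update):
            unlearned[j] -= delta / num_clients
    return unlearned


# Two clients, two rounds, starting from weights [0, 0].
history = [
    {"a": [2.0, 0.0], "b": [0.0, 2.0]},
    {"a": [4.0, 4.0], "b": [0.0, 0.0]},
]
global_weights = [3.0, 3.0]  # sum of the per-round averages
print(remove_client_updates(global_weights, history, "a", num_clients=2))
```

Under this averaging assumption, the result equals what client "b" alone contributed to the global model.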

preprint, 2022, arXiv

Fully Convolutional Change Detection Framework with Generative Adversarial Network for Unsupervised, Weakly Supervised and Regional Supervised Change Detection

Deep learning for change detection is one of the current hot topics in the field of remote sensing. However, most end-to-end networks are proposed for supervised change detection, and unsupervised change detection models depend on traditional pre-detection methods. Therefore, we propose a fully convolutional change detection framework with a generative adversarial network that unifies unsupervised, weakly supervised, regional supervised, and fully supervised change detection tasks in one framework. A basic U-Net segmentor is used to obtain the change detection map, an image-to-image generator is implemented to model the spectral and spatial variation between multi-temporal images, and a changed/unchanged discriminator is proposed for modeling the semantic changes in weakly and regional supervised change detection tasks. The iterative optimization of the segmentor and generator builds an end-to-end network for unsupervised change detection; the adversarial process between the segmentor and discriminator provides solutions for weakly and regional supervised change detection; and the segmentor itself can be trained for the fully supervised task. The experiments indicate the effectiveness of the proposed framework in unsupervised, weakly supervised and regional supervised change detection. This paper provides theoretical definitions for unsupervised, weakly supervised and regional supervised change detection tasks, and shows great potential for exploring end-to-end networks for remote sensing change detection.

preprint, 2022, arXiv

Kaon meson condensate in neutron star matter including hyperons

The recent measurement of the mass of neutron stars (PSR J1614-2230, PSR J0348+0432, MSP J0740+6620) restricts the lower limit $\sim 2M_{\odot}$ of the maximum mass of such compact stars, making it possible for dense matter to exist in massive stars. The relativistic mean field theory with parameter sets FSUGold including Kaon condensation is used to describe the properties of neutron stars in $β$ equilibrium. Through careful choice of the parameter of the $σ$-cut $c_σ$, we are able to produce a maximum mass neutron star with Kaon condensation heavier than $2M_{\odot}$, and we find that the parameter $Λ_ν$ of the $ρ-ω$ interaction term in this model has a significant effect on $K^{-}$ condensation. In the case of using $σ$-cut scheme, $K^{-}$ condensation occurs only when the $ρ-ω$ interaction $Λ_ν$ is switched off.

preprint, 2022, arXiv

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Unlike typical information retrieval tasks, code search requires bridging the semantic gap between the programming language and natural language to better describe intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods for neural code search first represent the code snippet and query text as separate embeddings, and then use vector distance (e.g. dot-product or cosine) to calculate the semantic similarity between them. There exist many different ways of aggregating the variable-length code or query tokens into a learnable embedding, including bi-encoder, cross-encoder, and poly-encoder. The goal of the query encoder and code encoder is to produce embeddings that are close to each other for a related pair of query and desired code snippet, so the choice and design of the encoder is very significant. In this paper, we propose a novel deep semantic model which makes use of not only the multi-modal sources, but also feature extractors such as self-attention, the aggregated vectors, and combinations of the intermediate representations. We apply the proposed model to tackle the CodeSearchNet challenge on semantic code search. We align cross-lingual embeddings for multi-modality learning with large batches and hard example mining, and combine different learned representations to enhance representation learning. Our model is trained on the CodeSearchNet corpus and evaluated on the held-out data; the final model achieves 0.384 NDCG and won first place in this benchmark. Models and code are available at https://github.com/overwindows/SemanticCodeSearch.git.

preprint, 2022, arXiv

Learning from Drivers to Tackle the Amazon Last Mile Routing Research Challenge

The goal of the Amazon Last Mile Routing Research Challenge is to integrate the real-life experience of Amazon drivers into the solution of optimal route planning and optimization. This paper presents our method that tackles this challenge by hierarchically combining machine learning and conventional Traveling Salesperson Problem (TSP) solvers. Our method reaps the benefits from both worlds. On the one hand, our method encodes driver know-how by learning a sequential probability model from historical routes at the zone level, where each zone contains a few parcel stops. It then uses a single step policy iteration method, known as the Rollout algorithm, to generate plausible zone sequences sampled from the learned probability model. On the other hand, our method utilizes proven methods developed in the rich TSP literature to sequence stops within each zone efficiently. The outcome of such a combination appeared to be promising. Our method obtained an evaluation score of $0.0374$, which is comparable to what the top three teams have achieved on the official Challenge leaderboard. Moreover, our learning-based method is applicable to driving routes that may exhibit distinct sequential patterns beyond the scope of this Challenge. The source code of our method is publicly available at https://github.com/aws-samples/amazon-sagemaker-amazon-routing-challenge-sol

preprint, 2022, arXiv

Multi-Temporal Spatial-Spectral Comparison Network for Hyperspectral Anomalous Change Detection

Hyperspectral anomalous change detection has been a challenging task for its emphasis on the dynamics of small and rare objects against the prevalent changes. In this paper, we have proposed a Multi-Temporal spatial-spectral Comparison Network for hyperspectral anomalous change detection (MTC-NET). The whole model is a deep siamese network, aiming at learning the prevalent spectral difference resulting from the complex imaging conditions from the hyperspectral images by contrastive learning. A three-dimensional spatial spectral attention module is designed to effectively extract the spatial semantic information and the key spectral differences. Then the gaps between the multi-temporal features are minimized, boosting the alignment of the semantic and spectral features and the suppression of the multi-temporal background spectral difference. The experiments on the "Viareggio 2013" datasets demonstrate the effectiveness of proposed MTC-NET.

preprint, 2022, arXiv

PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models

Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Adversarial attacks may become a new type of web spamming technique given our increased reliance on neural information retrieval models. Therefore, it is important to study potential adversarial attacks to identify vulnerabilities of NRMs before they are deployed. In this paper, we introduce the Word Substitution Ranking Attack (WSRA) task against NRMs, which aims to promote a target document in rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where the attackers cannot directly get access to the model information, but can only query the target model to obtain the rank positions of the partial retrieved list. This attack setting is realistic in real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA) that learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding the adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA can outperform existing attack strategies and successfully fool the NRM with small indiscernible perturbations of text.

preprint, 2022, arXiv

Programming Language Agnostic Mining of Code and Language Pairs with Sequence Labeling Based Question Answering

Mining aligned natural language (NL) and programming language (PL) pairs is a critical task to NL-PL understanding. Existing methods applied specialized hand-crafted features or separately-trained models for each PL. However, they usually suffered from low transferability across multiple PLs, especially for niche PLs with less annotated data. Fortunately, a Stack Overflow answer post is essentially a sequence of text and code blocks and its global textual context can provide PL-agnostic supplementary information. In this paper, we propose a Sequence Labeling based Question Answering (SLQA) method to mine NL-PL pairs in a PL-agnostic manner. In particular, we propose to apply the BIO tagging scheme instead of the conventional binary scheme to mine the code solutions which are often composed of multiple blocks of a post. Experiments on current single-PL single-block benchmarks and a manually-labeled cross-PL multi-block benchmark prove the effectiveness and transferability of SLQA. We further present a parallel NL-PL corpus named Lang2Code automatically mined with SLQA, which contains about 1.4M pairs on 6 PLs. Under statistical analysis and downstream evaluation, we demonstrate that Lang2Code is a large-scale high-quality data resource for further NL-PL research.
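To illustrate why a BIO scheme suits multi-block solutions, here is a minimal sketch of decoding BIO tags over the blocks of a post into solution spans. The function name `bio_to_spans` and the tag strings are illustrative assumptions, not the paper's code. A binary keep/drop label per block cannot separate two adjacent solutions, whereas a new `B` tag can.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode BIO tags into (start, end) block spans, with end exclusive.

    "B" opens a new span, "I" continues the current one, "O" is outside.
    An "I" with no open span is treated as opening one, a common lenient choice.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # close the previous span before opening a new one
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:  # close a span that runs to the end of the post
        spans.append((start, len(tags)))
    return spans


# One two-block solution followed by a separate single-block solution.
print(bio_to_spans(["O", "B", "I", "O", "B"]))
# Two adjacent solutions that a binary scheme would merge into one.
print(bio_to_spans(["B", "B", "I"]))
```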

preprint, 2022, arXiv

VLMAE: Vision-Language Masked Autoencoder

Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data. However, we observe that most existing VLP methods focus on modeling the interactions between image and text features while neglecting the information disparity between image and text, thus suffering from focal bias. To address this problem, we propose a vision-language masked autoencoder framework (VLMAE). VLMAE employs visual generative learning, facilitating the model to acquire fine-grained and unbiased features. Unlike the previous works, VLMAE pays attention to almost all critical patches in an image, providing more comprehensive understanding. Extensive experiments demonstrate that VLMAE achieves better performance in various vision-language downstream tasks, including visual question answering, image-text retrieval and visual grounding, even with up to 20% pre-training speedup.

preprint, 2022, arXiv

Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling

Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate the Class Activation Maps (CAMs) as the initial coarse segmentation labels. However, current WSSS methods still perform far from satisfactorily because their adopted CAMs 1) typically focus on partial discriminative object regions and 2) usually contain useless background regions. These two problems are attributed to the sole image-level supervision and aggregation of global information when training the classification networks. In this work, we propose the visual words learning module and hybrid pooling approach, and incorporate them in the classification network to mitigate the above problems. In the visual words learning module, we counter the first problem by enforcing the classification network to learn fine-grained visual word labels so that more object extents could be discovered. Specifically, the visual words are learned with a codebook, which could be updated via two proposed strategies, i.e. learning-based strategy and memory-bank strategy. The second drawback of CAMs is alleviated with the proposed hybrid pooling, which incorporates the global average and local discriminative information to simultaneously ensure object completeness and reduce background regions. We evaluated our methods on PASCAL VOC 2012 and MS COCO 2014 datasets. Without any extra saliency prior, our method achieved 70.6% and 70.7% mIoU on the $val$ and $test$ set of PASCAL VOC dataset, respectively, and 36.2% mIoU on the $val$ set of MS COCO dataset, which significantly surpassed the performance of state-of-the-art WSSS methods.

preprint, 2021, arXiv

Learning to Truncate Ranked Lists for Information Retrieval

Ranked list truncation is of critical importance in a variety of professional information retrieval applications such as patent search or legal search. The goal is to dynamically determine the number of returned documents according to some user-defined objectives, in order to reach a balance between the overall utility of the results and user efforts. Existing methods formulate this task as a sequential decision problem and take some pre-defined loss as a proxy objective, which suffers from the limitation of local decision and non-direct optimization. In this work, we propose a global decision based truncation model named AttnCut, which directly optimizes user-defined objectives for the ranked list truncation. Specifically, we take the successful transformer architecture to capture the global dependency within the ranked list for truncation decision, and employ the reward augmented maximum likelihood (RAML) for direct optimization. We consider two types of user-defined objectives which are of practical usage. One is the widely adopted metric such as F1 which acts as a balanced objective, and the other is the best F1 under some minimal recall constraint which represents a typical objective in professional search. Empirical results over the Robust04 and MQ2007 datasets demonstrate the effectiveness of our approach as compared with the state-of-the-art baselines.
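The two user-defined objectives named in this abstract can be made concrete with a small sketch that finds the best cutoff when relevance labels are known. This is an oracle computation of the objectives, not the AttnCut model, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def f1_at_cutoff(relevance: Sequence[int], k: int, total_relevant: int) -> float:
    """F1 obtained by returning only the top-k documents of a ranked list."""
    if k == 0 or total_relevant == 0:
        return 0.0
    hits = sum(relevance[:k])
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / total_relevant
    return 2 * precision * recall / (precision + recall)


def best_cutoff(relevance: Sequence[int], min_recall: float = 0.0) -> Optional[int]:
    """Cutoff k maximizing F1 subject to recall >= min_recall.

    `relevance` holds 0/1 labels in ranked order. Returns None when no
    cutoff satisfies the recall constraint. Ties keep the smallest k.
    """
    total_relevant = sum(relevance)
    best_k: Optional[int] = None
    best_f1 = -1.0
    for k in range(len(relevance) + 1):
        hits = sum(relevance[:k])
        recall = hits / total_relevant if total_relevant else 0.0
        if recall < min_recall:
            continue
        score = f1_at_cutoff(relevance, k, total_relevant)
        if score > best_f1:
            best_k, best_f1 = k, score
    return best_k


ranked = [1, 1, 0, 0, 0, 1]
print(best_cutoff(ranked))                  # balanced F1 objective
print(best_cutoff(ranked, min_recall=1.0))  # F1 under a full-recall constraint
```

On this list the balanced objective truncates early, while the recall constraint forces the cutoff to the end of the list, which shows how the two objectives can disagree.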

preprint, 2021, arXiv

Mitigating Backdoor Attacks in Federated Learning

Malicious clients can attack federated learning systems using malicious data, including backdoor samples, during the training phase. The compromised global model will perform well on the validation dataset designed for the task, but a small subset of data with backdoor patterns may trigger the model to make a wrong prediction. There has been an arms race between attackers who tried to conceal attacks and defenders who tried to detect attacks during the aggregation stage of training on the server-side. In this work, we propose a new and effective method to mitigate backdoor attacks after the training phase. Specifically, we design a federated pruning method to remove redundant neurons in the network and then adjust the model's extreme weight values. Our experiments conducted on distributed Fashion-MNIST show that our method can reduce the average attack success rate from 99.7% to 1.9% with a 5.5% loss of test accuracy on the validation dataset. To minimize the pruning influence on test accuracy, we can fine-tune after pruning, and the attack success rate drops to 6.4%, with only a 1.7% loss of test accuracy. Further experiments under Distributed Backdoor Attacks on CIFAR-10 also show promising results that the average attack success rate drops more than 70% with less than 2% loss of test accuracy on the validation dataset.

preprint, 2020, arXiv

Change Detection in Multi-temporal VHR Images Based on Deep Siamese Multi-scale Convolutional Networks

Very-high-resolution (VHR) images can provide abundant ground details and spatial geometric information. Change detection in multi-temporal VHR images plays a significant role in urban expansion and area internal change analysis. Nevertheless, traditional change detection methods can neither take full advantage of spatial context information nor cope with the complex internal heterogeneity of VHR images. In this paper, a powerful feature extraction model called the multi-scale feature convolution unit (MFCU) is adopted for change detection in multi-temporal VHR images. MFCU can extract multi-scale spatial-spectral features in the same layer. Based on this unit, two novel deep siamese convolutional neural networks, called the deep siamese multi-scale convolutional network (DSMS-CN) and the deep siamese multi-scale fully convolutional network (DSMS-FCN), are designed for unsupervised and supervised change detection, respectively. For unsupervised change detection, an automatic pre-classification is implemented to obtain reliable training samples; DSMS-CN then fits the statistical distribution of changed and unchanged areas from the selected training samples through MFCU modules and the deep siamese architecture. For supervised change detection, the end-to-end deep fully convolutional network DSMS-FCN can be trained on multi-temporal VHR images of any size and directly outputs the binary change map. In addition, to address inaccurate localization, the fully connected conditional random field (FC-CRF) is combined with DSMS-FCN to refine the results. The experimental results with challenging data sets confirm that the two proposed architectures perform better than the state-of-the-art methods.

preprint2020arXiv

Deep Siamese Domain Adaptation Convolutional Neural Network for Cross-domain Change Detection in Multispectral Images

Recently, deep learning has achieved promising performance in the change detection task. However, the deep models are task-specific and data set bias often exists, thus it is difficult to transfer a network trained on one multi-temporal data set (source domain) to another multi-temporal data set with very limited (even no) labeled data (target domain). In this paper, we propose a novel deep siamese domain adaptation convolutional neural network (DSDANet) architecture for cross-domain change detection. In DSDANet, a siamese convolutional neural network first extracts spatial-spectral features from multi-temporal images. Then, through multiple kernel maximum mean discrepancy (MK-MMD), the learned feature representation is embedded into a reproducing kernel Hilbert space (RKHS), in which the distribution of two domains can be explicitly matched. By optimizing the network parameters and kernel coefficients with the source labeled data and target unlabeled data, the DSDANet can learn transferable feature representation that can bridge the discrepancy between two domains. To the best of our knowledge, it is the first time that such a domain adaptation-based deep network is proposed for change detection. The theoretical analysis and experimental results demonstrate the effectiveness and potential of the proposed method.
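To make the MK-MMD term concrete, here is a minimal pure-Python sketch of a multi-kernel maximum mean discrepancy estimate between source and target feature vectors. It is an illustration under stated assumptions, not DSDANet's implementation: the kernels are Gaussian, the bandwidths `(0.5, 1.0, 2.0)` and the equal kernel weights are arbitrary choices (the abstract says the kernel coefficients are learned), and the simple biased estimator is used. All function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(a, b, bandwidths, weights):
    """Convex combination of Gaussian kernels with different bandwidths."""
    return sum(w * gaussian_kernel(a, b, s) for w, s in zip(weights, bandwidths))


def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0), weights=None):
    """Biased estimate of the squared MK-MMD between two sample sets."""
    if weights is None:
        # Assumption: equal weights; a real model would learn these.
        weights = [1.0 / len(bandwidths)] * len(bandwidths)

    def mean_kernel(xs, ys):
        total = sum(multi_kernel(x, y, bandwidths, weights) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )
```

The estimate is zero when the two sample sets coincide and grows as the feature distributions move apart, which is why it can serve as a training penalty that pulls source and target features together.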

preprint2020arXiv

DSDANet: Deep Siamese Domain Adaptation Convolutional Neural Network for Cross-domain Change Detection

Change detection (CD) is one of the most vital applications in remote sensing. Recently, deep learning has achieved promising performance in the CD task. However, deep models are task-specific and CD data set bias often exists, so deep CD models inevitably suffer degraded performance when transferred from the original CD data set to new ones. This makes manually labeling numerous samples in the new data set unavoidable, which costs a large amount of time and human labor. How can we learn a transferable CD model on a data set with enough labeled data (original domain) that can still detect changes well in another data set without labeled data (target domain)? This is defined as the cross-domain change detection problem. In this paper, we propose a novel deep siamese domain adaptation convolutional neural network (DSDANet) architecture for cross-domain CD. In DSDANet, a siamese convolutional neural network first extracts spatial-spectral features from multi-temporal images. Then, through multi-kernel maximum mean discrepancy (MK-MMD), the learned feature representation is embedded into a reproducing kernel Hilbert space (RKHS), in which the distribution of two domains can be explicitly matched. By optimizing the network parameters and kernel coefficients with the source labeled data and target unlabeled data, DSDANet can learn transferable feature representation that can bridge the discrepancy between two domains. To the best of our knowledge, it is the first time that such a domain adaptation-based deep network is proposed for CD. The theoretical analysis and experimental results demonstrate the effectiveness and potential of the proposed method.

preprint2020arXiv

Intelligent Radome Design Using Multilayer Metamaterial Structures to Realize Energy Isolation and Asymmetric Propagation of Electromagnetic Wave

An intelligent radome utilizing composite metamaterial structures is presented and investigated in this article, which can realize energy isolation and asymmetric propagation of electromagnetic (EM) waves self-adaptively by controlling the states of PIN diodes. The whole structure mainly consists of a broadband polarization-sensitive polarization converter (PC) and an active frequency selective rasorber (AFSR) that switches between a transmission mode and an absorption mode and is used as an energy-selective surface (ESS). The function of the PC is to make the EM waves transmit asymmetrically, and the purpose of the AFSR is to make high-power waves be reflected or absorbed, depending on the polarization type of the wave. Thus, the radome can realize both asymmetric propagation of EM waves and electromagnetic shielding. The equivalent circuit models (ECM) and parametric studies are considered to explain the physical operating mechanism of the PC and AFSR. The fabricated structure with 7×7 unit cells is experimentally demonstrated, and the measured results agree well with the simulated results. Considering the distinctive characteristic of self-actuation, the presented concept has potential applications in electromagnetic stealth and HPEMWs shielding to protect communication devices.

preprint2020arXiv

Low Precision Floating-point Arithmetic for High Performance FPGA-based CNN Acceleration

Low precision data representation is important to reduce storage size and memory access for convolutional neural networks (CNNs). Yet, existing methods have two major limitations: (1) requiring re-training to maintain accuracy for deep CNNs, and (2) needing 16-bit floating-point or 8-bit fixed-point for a good accuracy. In this paper, we propose a low precision (8-bit) floating-point (LPFP) quantization method for FPGA-based acceleration to overcome the above limitations. Without any re-training, LPFP finds an optimal 8-bit data representation with negligible top-1/top-5 accuracy loss (within 0.5%/0.3% in our experiments, respectively, and significantly better than existing methods for deep CNNs). Furthermore, we implement one 8-bit LPFP multiplication by one 4-bit multiply-adder (MAC) and one 3-bit adder, and therefore implement four 8-bit LPFP multiplications using one DSP slice of Xilinx Kintex 7 family (KC705 in this paper) while one DSP can implement only two 8-bit fixed-point multiplications. Experiments on six typical CNNs for inference show that on average, we improve throughput by 64.5x over Intel i9 CPU and by 1.5x over existing FPGA accelerators. Particularly for VGG16 and YOLO, compared to six recent FPGA accelerators, we improve average throughput by 3.5x and 27.5x and improve average throughput per DSP by 4.1x and 5x, respectively. To the best of our knowledge, this is the first in-depth study to simplify one multiplication for CNN inference to one 4-bit MAC and implement four multiplications within one DSP while maintaining comparable accuracy without any re-training.

preprint2020arXiv

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Classifying multi-temporal scene land-use categories and detecting their semantic scene-level changes for imagery covering urban regions can directly reflect land-use transitions. Existing methods for scene change detection rarely focus on the temporal correlation of bi-temporal features, and are mainly evaluated on small-scale scene change detection datasets. In this work, we propose a CorrFusion module that fuses the highly correlated components in bi-temporal feature embeddings. We first extract deep representations of the bi-temporal inputs with deep convolutional networks. The extracted features are then projected into a lower-dimensional space to compute the instance-level correlation. Cross-temporal fusion is performed in the CorrFusion module based on the computed correlation. The final scene classifications are obtained with softmax activation layers. In the objective function, we introduce a new formulation for calculating the temporal correlation. The detailed derivation of backpropagation gradients for the proposed module is also given in this paper. In addition, we present a much larger-scale scene change detection dataset and conduct experiments on it. The experimental results demonstrate that the proposed CorrFusion module can remarkably improve multi-temporal scene classification and scene change detection results.

preprint2020arXiv

On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond

Variational autoencoders (VAEs) combine latent variables with amortized variational inference, whose optimization usually converges into a trivial local optimum termed posterior collapse, especially in text modeling. By tracking the optimization dynamics, we observe the encoder-decoder incompatibility that leads to poor parameterizations of the data manifold. We argue that the trivial local optimum may be avoided by improving the encoder and decoder parameterizations since the posterior network is part of a transition map between them. To this end, we propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder with the same structure and improves the encoder and decoder parameterizations via encoder weight sharing and decoder signal matching. We apply the proposed Coupled-VAE approach to various VAE models with different regularization, posterior family, decoder structure, and optimization strategy. Experiments on benchmark datasets (i.e., PTB, Yelp, and Yahoo) show consistently improved results in terms of probability estimation and richness of the latent space. We also generalize our method to conditional language modeling and propose Coupled-CVAE, which largely improves the diversity of dialogue generation on the Switchboard dataset.

preprint2020arXiv

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Self-supervised pre-training, such as BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. The training goals of existing techniques are often inconsistent with the goals of many language generation tasks, such as generative question answering and conversational response generation, for producing new text given context. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. The new scheme alleviates the mismatch introduced by the existing denoising scheme between pre-training and fine-tuning where generation is more than reconstructing original text. An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on CNN/DailyMail as well as Gigaword, question generation on SQuAD, and conversational response generation on Cornell Movie Dialogues.

preprint2020arXiv

Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks

Convolutional neural networks (CNNs) achieve state-of-the-art performance at the cost of becoming deeper and larger. Although quantization (both fixed-point and floating-point) has proven effective for reducing storage and memory access, two challenges -- 1) accuracy loss caused by quantization without calibration, fine-tuning or re-training for deep CNNs and 2) hardware inefficiency caused by floating-point quantization -- prevent processors from completely leveraging the benefits. In this paper, we propose a low-precision floating-point quantization oriented processor, named Phoenix, to address the above challenges. We primarily have three key observations: 1) 8-bit floating-point quantization incurs less error than 8-bit fixed-point quantization; 2) without using any calibration, fine-tuning or re-training techniques, normalization before quantization further reduces accuracy degradation; 3) 8-bit floating-point multiplier achieves higher hardware efficiency than 8-bit fixed-point multiplier if the full-precision product is applied. Based on these key observations, we propose a normalization-oriented 8-bit floating-point quantization method to reduce storage and memory access with negligible accuracy loss (within 0.5%/0.3% for top-1/top-5 accuracy, respectively). We further design a hardware processor to address the hardware inefficiency caused by floating-point multiplier. Compared with a state-of-the-art accelerator, Phoenix is 3.32x and 7.45x better in performance with the same core area for AlexNet and VGG16, respectively.

preprint2020arXiv

ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data

Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state of the art performance for pixel level classification of objects. Here we propose a reliable framework for performant results for the task of semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a novel deep learning architecture, ResUNet-a, and a novel loss function based on the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone, in combination with residual connections, atrous convolutions, pyramid scene parsing pooling and multi-tasking inference. ResUNet-a infers sequentially the boundary of the objects, the distance transform of the segmentation mask, the segmentation mask and a colored reconstruction of the input. Each of the tasks is conditioned on the inference of the previous ones, thus establishing a conditioned relationship between the various tasks, as this is described through the architecture's computation graph. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has excellent convergence properties and behaves well even under the presence of highly imbalanced classes. The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.9% over all classes for our best model.
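For readers unfamiliar with the Dice loss that this abstract builds on, the following is a minimal Python sketch of the standard soft Dice loss for a single class. It is not the novel variant the paper introduces, which the abstract does not specify. The function name `soft_dice_loss` and the smoothing constant `eps` are assumptions made for illustration.

```python
def soft_dice_loss(pred, target, eps=1e-7):
    """Standard soft Dice loss for one class.

    pred:   flat list of predicted probabilities in [0, 1]
    target: flat list of ground-truth labels (0 or 1)
    eps:    small constant (assumed value) to avoid division by zero
    """
    # Overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total predicted mass plus total ground-truth mass.
    denom = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice
```

Because the score is a ratio of overlap to total mass rather than a per-pixel average, a small foreground class is not swamped by the background, which is the usual motivation for Dice-style losses on imbalanced segmentation masks.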

preprint2020arXiv

Searching for Dark Matter Signals from Local Dwarf Spheroidal Galaxies at Low Radio Frequencies in the GLEAM Survey

The search for emission from weakly interacting massive particle (WIMP) dark matter annihilation and decay has become a multi-pronged area of research not only targeting a diverse selection of astrophysical objects, but also taking advantage of the entire electromagnetic spectrum. The decay of WIMP particles into standard model particles has been suggested as a possible channel for synchrotron emission to be detected at low radio frequencies. Here, we present the stacking analysis of a sample of 33 dwarf spheroidal (dSph) galaxies with low-frequency (72 - 231 MHz) radio images from the GaLactic and Extragalactic All-sky Murchison Widefield Array (GLEAM) survey. We produce radial surface brightness profiles of images centred upon each dSph galaxy with background radio sources masked. We remove ten fields from the stacking due to contamination from either poorly subtracted, bright radio sources or strong background gradients across the field. The remaining 23 dSph galaxies are stacked in an attempt to obtain a statistical detection of any WIMP-induced synchrotron emission in these systems. We find that the stacked radial brightness profile does not exhibit a statistically significant detection above the 95% confidence level of $\sim$1.5 mJy beam$^{-1}$. This novel technique shows the potential of using low-frequency radio images to constrain fundamental properties of particle dark matter.
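The stacking procedure described here (masked radial profiles per field, then a combined profile) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' pipeline: it assumes each image is a small 2D list centred on the galaxy, uses an unweighted mean within each annulus and across fields, and ignores beam and noise weighting. The names `radial_profile`, `stack_profiles`, `n_bins` and `bin_width` are hypothetical.

```python
import math


def radial_profile(image, mask, n_bins, bin_width):
    """Mean brightness in concentric annuli around the image centre.

    image: 2D list of pixel values; mask: 2D list of booleans where
    True marks a pixel to skip (for example a background radio source).
    """
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y in range(rows):
        for x in range(cols):
            if mask[y][x]:
                continue
            # Annulus index from the pixel's distance to the centre.
            b = int(math.hypot(y - cy, x - cx) / bin_width)
            if b < n_bins:
                sums[b] += image[y][x]
                counts[b] += 1
    # None marks an annulus with no unmasked pixels.
    return [s / c if c else None for s, c in zip(sums, counts)]


def stack_profiles(profiles):
    """Unweighted mean of several radial profiles, bin by bin."""
    stacked = []
    for values in zip(*profiles):
        valid = [v for v in values if v is not None]
        stacked.append(sum(valid) / len(valid) if valid else None)
    return stacked
```

Averaging many fields reduces uncorrelated noise in each radial bin, so a faint signal common to all fields could become detectable even when no single field shows it.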

preprint2020arXiv

The GLEAM 4-Jy (G4Jy) Sample: I. Definition and the catalogue

The Murchison Widefield Array (MWA) has observed the entire southern sky (Declination, $δ<$ 30 deg) at low radio-frequencies, over the range 72-231 MHz. These observations constitute the GaLactic and Extragalactic All-sky MWA (GLEAM) Survey, and we use the extragalactic catalogue (Galactic latitude, $|b| >$ 10 deg) to define the GLEAM 4-Jy (G4Jy) Sample. This is a complete sample of the 'brightest' radio-sources ($S_{\mathrm{151MHz}} >$ 4 Jy), the majority of which are active galactic nuclei with powerful radio-jets. Crucially, low-frequency observations allow the selection of such sources in an orientation-independent way (i.e. minimising the bias caused by Doppler boosting, inherent in high-frequency surveys). We then use higher-resolution radio images, and information at other wavelengths, to morphologically classify the brightest components in GLEAM. We also conduct cross-checks against the literature, and perform internal matching, in order to improve sample completeness (which is estimated to be $>$ 95.5%). This results in a catalogue of 1,863 sources, making the G4Jy Sample over 10 times larger than that of the revised Third Cambridge Catalogue of Radio Sources (3CRR; $S_{\mathrm{178MHz}} >$ 10.9 Jy). Of these G4Jy sources, 78 are resolved by the MWA (Phase-I) synthesised beam ($\sim$2 arcmin at 200 MHz), and we label 67% of the sample as 'single', 26% as 'double', 4% as 'triple', and 3% as having 'complex' morphology at $\sim$1 GHz (45-arcsec resolution). Alongside this, our value-added catalogue provides mid-infrared source associations (subject to 6-arcsec resolution at 3.4 micron) for the radio emission, as identified through visual inspection and thorough checks against the literature. As such, the G4Jy Sample can be used as a reliable training set for cross-identification via machine-learning algorithms. [Abstract abridged for arXiv submission.]

preprint2020arXiv

The GLEAM 4-Jy (G4Jy) Sample: II. Host-galaxy identification for individual sources

The entire southern sky (Declination, $δ<$ 30 deg) has been observed using the Murchison Widefield Array (MWA), which provides radio imaging of $\sim$2-arcmin resolution at low frequencies (72-231 MHz). This is the GaLactic and Extragalactic All-sky MWA (GLEAM) Survey, and we have previously used a combination of visual inspection, cross-checks against the literature, and internal matching to identify the 'brightest' radio-sources ($S_{\mathrm{151MHz}} >$ 4 Jy) in the extragalactic catalogue (Galactic latitude, $|b| >$ 10 deg). We refer to these 1,863 sources as the GLEAM 4-Jy (G4Jy) Sample, and use radio images (of $\leq$ 45-arcsec resolution), and multi-wavelength information, to assess their morphology and identify the galaxy that is hosting the radio emission (where appropriate). Details of how to access all of the overlays used for this work are available at https://github.com/svw26/G4Jy. Alongside this we conduct further checks against the literature, which we document in this paper for individual sources. Whilst the vast majority of the G4Jy Sample are active galactic nuclei with powerful radio-jets, we highlight that it also contains a nebula, two nearby, star-forming galaxies, a cluster relic, and a cluster halo. There are also three extended sources for which we are unable to infer the mechanism that gives rise to the low-frequency emission. In the G4Jy catalogue we provide mid-infrared identifications for 86% of the sources, and flag the remainder as: having an uncertain identification (129 sources), having a faint/uncharacterised mid-infrared host (126 sources), or it being inappropriate to specify a host (2 sources). For the subset of 129 sources, there is ambiguity concerning candidate host-galaxies, and this includes four sources (B0424$-$728, B0703$-$451, 3C 198, and 3C 403.1) where we question the existing identification.

preprint2019arXiv

Candidate radio supernova remnants observed by the GLEAM survey over $345^\circ < l < 60^\circ$ and $180^\circ < l < 240^\circ$

We examined the latest data release from the GaLactic and Extragalactic All-sky Murchison Widefield Array (GLEAM) survey covering $345^\circ < l < 60^\circ$, $180^\circ < l < 240^\circ$, using these data and that of the Widefield Infrared Survey Explorer to follow up proposed candidate Supernova Remnants from other sources. Of the 101 candidates proposed in the region, we are able to definitively confirm ten as SNRs, tentatively confirm two as SNRs, and reclassify five as H II regions. A further two are detectable in our images but difficult to classify; the remaining 82 are undetectable in these data. We also investigated the 18 unclassified Multi-Array Galactic Plane Imaging Survey (MAGPIS) candidate SNRs, newly confirming three as SNRs, reclassifying two as H II regions, and exploring the unusual spectra and morphology of two others.

preprint2019arXiv

GaLactic and Extragalactic All-sky Murchison Widefield Array (GLEAM) survey II: Galactic Plane $345^\circ < l < 67^\circ$, $180^\circ < l < 240^\circ$

This work makes available a further 2,860deg$^2$ of the GLEAM survey, covering half of the accessible Galactic Plane, across twenty frequency bands sampling $72-231$MHz, with resolution $4'-2'$. Unlike previous GLEAM data releases, we used multi-scale clean to better deconvolve large-scale Galactic structure. For the Galactic longitude ranges $345^\circ < l < 67^\circ$, $180^\circ < l < 240^\circ$, we provide a compact source catalogue of 22,037 components selected from a 60-MHz bandwidth image centred at 200-MHz, with RMS noise $\approx10-20$mJy beam$^{-1}$ and position accuracy better than $2"$. The catalogue has a completeness of 50% at $\approx120$mJy, and a reliability of 99.86%. It covers Galactic latitudes $1^\circ\leq|b|\leq10^\circ$ toward the Galactic Centre and $|b|\leq10^\circ$ for other regions, and is available from Vizier; images covering $|b|\leq10^\circ$ for all longitudes are made available on the GLEAM VO server and SkyView.

preprint2019arXiv

New candidate radio supernova remnants detected in the GLEAM survey over $345^\circ < l < 60^\circ$, $180^\circ < l < 240^\circ$

We have detected 27 new supernova remnants (SNRs) using a new data release of the GLEAM survey from the Murchison Widefield Array (MWA) telescope, including the lowest surface-brightness SNR ever detected, G0.1-9.7. Our method uses spectral fitting to the radio continuum to derive spectral indices for 26/27 candidates, and our low-frequency observations probe a steeper-spectrum population than previously discovered. None of the candidates have coincident Wide-field Infrared Survey Explorer mid-IR emission, further showing that the emission is non-thermal. Using pulsar associations we derive physical properties for six candidate SNRs, finding G0.1-9.7 may be younger than 10kyr. 60% of the candidates subtend areas larger than 0.2deg$^{2}$ on the sky, compared to $<25$% of previously-detected SNRs. We also make the first detection of two SNRs in the Galactic longitude range $220^\circ-240^\circ$.