Researcher profile

Bin Luo

Bin Luo contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
30 works
0 followers
16 topics
4 close collaborators

Published work

30 published items

preprint 2026 arXiv

SARU: A Shadow-Aware and Removal Unified Framework for Remote Sensing Images with New Benchmarks

Shadows are a prevalent problem in remote sensing imagery (RSI), degrading visual quality and severely limiting the performance of downstream tasks like object detection and semantic segmentation. Most prior works treat shadow detection and removal as separate, cascaded tasks, which can lead to a cumbersome process and error accumulation. Furthermore, many deep learning methods rely on paired shadow and non-shadow images for training, which are often unavailable in practice. To address these challenges, we propose the Shadow-Aware and Removal Unified (SARU) framework, a cohesive two-stage framework. First, its dual-branch detection module (DBCSF-Net) fuses multi-color space and semantic features to generate high-fidelity shadow masks, effectively distinguishing shadows from dark objects. Then, leveraging these masks, a novel, training-free physical algorithm (N$^2$SGSR) restores illumination by transferring properties from adjacent non-shadow regions within the single input image. To facilitate rigorous evaluation and foster future work, we also introduce two new benchmark datasets: the RSI Shadow Detection (RSISD) dataset and the Single-image Shadow Removal Benchmark (SiSRB). Extensive experiments on the AISD and RSISD datasets demonstrate that SARU achieves SOTA shadow detection performance. For shadow removal, our training-free N$^2$SGSR algorithm attains an average processing speed of approximately $1.3$s, which is over $10$ times faster than the SOTA MAOSD while maintaining an SRI value close to 0.9 on both the AISD and SiSRB datasets, a level comparable to the advanced RS-GSSR method. By holistically integrating shadow detection and removal to mitigate error propagation and eliminating the dependency on paired training data, SARU establishes a robust, practical framework for real-world RSI analysis. The code and datasets are publicly available at: https://github.com/AeroVILab-AHU/SARU

preprint 2024 arXiv

Transformer RGBT Tracking with Spatio-Temporal Multimodal Tokens

Much RGBT tracking research focuses primarily on modal fusion design while overlooking the effective handling of target appearance changes. While some approaches have introduced historical frames or fused and replaced initial templates to incorporate temporal information, they risk disrupting the original target appearance and accumulating errors over time. To alleviate these limitations, we propose a novel Transformer RGBT tracking approach, which mixes spatio-temporal multimodal tokens from the static multimodal templates and multimodal search regions in Transformer to handle target appearance changes, for robust RGBT tracking. We introduce independent dynamic template tokens to interact with the search region, embedding temporal information to address appearance changes, while also retaining the involvement of the initial static template tokens in the joint feature extraction process to ensure the preservation of the original reliable target appearance information, which prevents deviations from the target appearance caused by traditional temporal updates. We also use attention mechanisms to enhance the target features of multimodal template tokens by incorporating supplementary modal cues, and make the multimodal search region tokens interact with multimodal dynamic template tokens via attention mechanisms, which facilitates the conveyance of multimodal-enhanced target change information. Our module is inserted into the transformer backbone network and inherits joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three RGBT benchmark datasets show that the proposed approach maintains competitive performance compared to other state-of-the-art tracking algorithms while running at 39.1 FPS.

preprint 2024 arXiv

Unifying Graph Contrastive Learning via Graph Message Augmentation

Graph contrastive learning is usually performed by first conducting Graph Data Augmentation (GDA) and then employing a contrastive learning pipeline to train GNNs. GDA is an important issue for graph contrastive learning. Various GDAs have been developed recently, which mainly involve dropping or perturbing edges, nodes, node attributes and edge attributes. However, to our knowledge, there is still no universal and effective augmentor that is suitable for different types of graph data. To address this issue, in this paper, we first introduce the graph message representation of graph data. Based on it, we then propose a novel Graph Message Augmentation (GMA), a universal scheme for reformulating many existing GDAs. The proposed unified GMA not only gives a new perspective to understand many existing GDAs but also provides a universal and more effective graph data augmentation for graph self-supervised learning tasks. Moreover, GMA introduces an easy way to implement the mixup augmentor, which is natural for images but usually challenging for graphs. Based on the proposed GMA, we then propose a unified graph contrastive learning, termed Graph Message Contrastive Learning (GMCL), that employs attribution-guided universal GMA for graph contrastive learning. Experiments on many graph learning tasks demonstrate the effectiveness and benefits of the proposed GMA and GMCL approaches.

preprint 2023 arXiv

Tracking with Human-Intent Reasoning

Advances in perception modeling have significantly improved the performance of object tracking. However, the current methods for specifying the target object in the initial frame either 1) use a box or mask template, or 2) provide an explicit language description. These approaches are cumbersome and do not allow the tracker to have self-reasoning ability. Therefore, this work proposes a new tracking task -- Instruction Tracking, which involves providing implicit tracking instructions that require the trackers to perform tracking automatically in video frames. To achieve this, we investigate the integration of knowledge and reasoning capabilities from a Large Vision-Language Model (LVLM) for object tracking. Specifically, we propose a tracker called TrackGPT, which is capable of performing complex reasoning-based tracking. TrackGPT first uses LVLM to understand tracking instructions and condense the cues of what target to track into referring embeddings. The perception component then generates the tracking results based on the embeddings. To evaluate the performance of TrackGPT, we construct an instruction tracking benchmark called InsTrack, which contains over one thousand instruction-video pairs for instruction tuning and evaluation. Experiments show that TrackGPT achieves competitive performance on referring video object segmentation benchmarks, such as a new state-of-the-art performance of 66.5 $\mathcal{J}\&\mathcal{F}$ on Refer-DAVIS. It also demonstrates superior performance on instruction tracking under new evaluation protocols. The code and models are available at https://github.com/jiawen-zhu/TrackGPT.

preprint 2023 arXiv

Universal adversarial perturbation for remote sensing images

Recently, with the application of deep learning in the remote sensing image (RSI) field, the classification accuracy of RSIs has been dramatically improved compared with traditional technology. However, even state-of-the-art object recognition convolutional neural networks are fooled by the universal adversarial perturbation (UAP). Research on UAPs is mostly limited to ordinary images, and RSIs have not been studied. To explore the basic characteristics of UAPs of RSIs, this paper proposes a novel method combining an encoder-decoder network with an attention mechanism to generate the UAP of RSIs. First, the encoder-decoder network is used to generate the UAP, which can learn the distribution of perturbations better, and then the attention mechanism is used to find the sensitive regions that the RSI classification model attends to. Finally, the generated regions are used to fine-tune the perturbation so that the model misclassifies with less perturbation. The experimental results show that the UAP can make the classification model misclassify, and the attack success rate of our proposed method on the RSI data set is as high as 97.09%.

preprint 2022 arXiv

A quasar shedding its dust cocoon at redshift 2

We present the first near-IR spectroscopy and joint analyses of multi-wavelength observations for SDSS J082747.14+425241.1, a dust-reddened, weak broad emission-line quasar (WLQ) undergoing a remarkable broad absorption line (BAL) transformation. The systemic redshift is more precisely measured to be $z=2.070\pm0.001$ using H$β$ compared to $z=2.040\pm0.003$ using Mg II from the literature, signifying an extreme Mg II blueshift of $2140\pm530$ km s$^{-1}$ relative to H$β$. Using the H$β$-based single-epoch scaling relation with a systematic uncertainty of 0.3 dex, its black hole (BH) mass and Eddington ratio are estimated to be $M_{\rm BH}\sim6.1\times10^8M_\odot$ and $λ_{\rm Edd}\sim0.71$, indicative of being in a rapidly accreting phase. Our investigations confirm the WLQ nature and the LoBAL$\rightarrow$HiBAL transformation, along with a factor of 2 increase in the Mg II+Fe II emission strength and a decrease of 0.1 in $E(B-V)$ over two decades. The kinetic power of this LoBAL wind at $R\sim$15 pc from its BH is estimated to be $\sim$43% of the Eddington luminosity, sufficient for quasar feedback upon its host galaxy albeit with an order-of-magnitude uncertainty. This quasar provides a clear example of the long-sought scenario where LoBAL quasars are surrounded by dust cocoons, and wide-angle nuclear winds play a key role in the transition for red quasars evolving into the commonly seen blue quasars.

preprint 2022 arXiv

An X-ray fading, UV brightening QSO at $z\approx6$

Explaining the existence of $\gtrsim10^8\,\mathrm{M_\odot}$ SMBHs at $z>6$ is a persistent challenge to modern astrophysics. Multi-wavelength observations of $z\gtrsim6$ QSOs reveal that, on average, their accretion physics is similar to that of their counterparts at lower redshift. However, QSOs showing properties that deviate from the general behavior can provide useful insights into the physical processes responsible for the rapid growth of SMBHs in the early universe. We present X-ray (XMM-Newton, 100 ks) follow-up observations of a $z\approx6$ QSO, J1641+3755, which was found to be remarkably X-ray bright in a 2018 Chandra dataset. J1641+3755 is not detected in the 2021 XMM-Newton observation, implying that its X-ray flux decreased by a factor $\gtrsim7$ on a notably short timescale (i.e., $\approx115$ rest-frame days), making it the $z>4$ QSO with the largest variability amplitude. We also obtained rest-frame UV spectroscopic and photometric data with LBT, and compared them with archival datasets. Surprisingly, we found that J1641+3755 became brighter in the rest-frame UV band from 2003 to 2016, while no strong variation occurred from 2016 to 2021. Multiple narrow absorption features are detected in its rest-frame UV spectrum, and several of them can be associated with an intervening system at $z=5.67$. The variability properties of J1641+3755 can be due to intrinsic variations of the accretion rate, a small-scale obscuration event, gravitational lensing due to an intervening object, or an unrelated X-ray transient in a foreground galaxy in 2018. Accounting for all of the $z>6$ QSOs with multiple X-ray observations separated by $>10$ rest-frame days, we found an enhancement of strongly (i.e., by a factor $>3$) X-ray variable objects compared to QSOs at later cosmic times. This finding may be related to the physics of fast accretion in high-redshift QSOs.

preprint 2022 arXiv

Connecting Low- and High-Redshift Weak Emission-Line Quasars via HST Spectroscopy of Ly$α$ Emission

We present ultraviolet spectroscopy covering the Ly$α$ + N V complex of six candidate low-redshift ($0.9 < z < 1.5$) weak emission-line quasars (WLQs) based on observations with the Hubble Space Telescope. The original systematic searches for these puzzling Type 1 quasars with intrinsically weak broad emission lines revealed an $N \approx 100$ WLQ population from optical spectroscopy of high-redshift ($z > 3$) quasars, defined by a Ly$α$ + N V rest-frame equivalent width (EW) threshold $< 15.4$ Å. Identification of lower-redshift ($z < 3$) WLQ candidates, however, has relied primarily on optical spectroscopy of weak broad emission lines at longer rest-frame wavelengths. With these new observations expanding existing optical coverage into the ultraviolet, we explore unifying the low- and high-$z$ WLQ populations via EW[Ly$α$+NV]. Two objects in the sample unify with high-$z$ WLQs, three others appear consistent with the intermediate portion of the population connecting WLQs and normal quasars, and the final object is consistent with typical quasars. The expanded wavelength coverage improves the number of available line diagnostics for our individual targets, allowing a better understanding of the shapes of their ionizing continua. The ratio of EW[Ly$α$+NV] to EW[MgII] in our sample is generally small but varied, favoring a soft ionizing continuum scenario for WLQs, and we find a lack of correlation between EW[Ly$α$+NV] and the X-ray properties of our targets, consistent with a "slim-disk" shielding gas model. We also find indications that weak absorption may be a more significant contaminant in low-$z$ WLQ populations than previously thought.

preprint 2022 arXiv

Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code

Recent years have seen the successful application of deep learning to software engineering (SE). In particular, the development and use of pre-trained models of source code has enabled state-of-the-art results to be achieved on a wide variety of SE tasks. This paper provides an overview of this rapidly advancing field of research and reflects on future research directions.

preprint 2022 arXiv

Definitive upper bound on the negligible contribution of quasars to cosmic reionization

Cosmic (hydrogen) reionization marks one of the major phase transitions of the universe at redshift z >= 6. During this epoch, hydrogen atoms in the intergalactic medium (IGM) were ionized by Lyman continuum (LyC) photons. However, it remains challenging to identify the major sources of the LyC photons responsible for reionization. In particular, individual contributions of quasars (or active galactic nuclei, AGNs) and galaxies are still under debate. Here we construct the far-ultraviolet (far-UV) luminosity function for type 1 quasars at z >= 6 that spans 10 magnitudes (-19 < M_UV < -29), conclusively showing that quasars made a negligible contribution to reionization. We mainly search for quasars in the low-luminosity range of M_UV > -23 mag that is critical to determine quasars' total LyC photon production but has been barely explored previously. We find that the quasar population can only provide less than 7% (95% confidence level) of the total photons needed to keep the universe ionized at z = 6.0 - 6.6. Our result suggests that galaxies, presumably low-luminosity star-forming systems, are the major sources of hydrogen reionization.

preprint 2022 arXiv

Detection and characterization of microseismic events from fiber-optic DAS data using deep learning

Microseismic analysis is a valuable tool for fracture characterization in the earth's subsurface. As distributed acoustic sensing (DAS) fibers are deployed at depth inside wells, they hold vast potential for high-resolution microseismic analysis. However, the accurate detection of microseismic signals in continuous DAS data is challenging and time-consuming. We design, train, and deploy a deep learning model to detect microseismic events in DAS data automatically. We create a curated dataset of nearly 7,000 manually-selected events and an equal number of background noise examples. We optimize the deep learning model's network architecture together with its training hyperparameters by Bayesian optimization. The trained model achieves an accuracy of 98.6% on our benchmark dataset and even detects low-amplitude events missed during manual labeling. Our methodology detects more than 100,000 events allowing the reconstruction of spatio-temporal fracture development far more accurately and efficiently than would have been feasible by traditional methods.

preprint 2022 arXiv

Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification

Few-shot classification, which aims to recognize unseen classes using very limited samples, has attracted more and more attention. Usually, it is formulated as a metric learning problem. The core issue of few-shot classification is how to learn (1) consistent representations for images in both support and query sets and (2) effective metric learning for images between support and query sets. In this paper, we show that the two challenges can be well modeled simultaneously via a unified Query-Support TransFormer (QSFormer) model. To be specific, the proposed QSFormer involves a global query-support sample Transformer (sampleFormer) branch and a local patch Transformer (patchFormer) learning branch. sampleFormer aims to capture the dependence of samples in support and query sets for image representation. It adopts the Encoder, Decoder and Cross-Attention to respectively model the Support, Query (image) representation and Metric learning for the few-shot classification task. Also, as a complement to the global learning branch, we adopt a local patch Transformer to extract structural representation for each image sample by capturing the long-range dependence of local image patches. In addition, a novel Cross-scale Interactive Feature Extractor (CIFE) is proposed to extract and fuse multi-scale CNN features as an effective backbone module for the proposed few-shot learning method. All modules are integrated into a unified framework and trained in an end-to-end manner. Extensive experiments on four popular datasets demonstrate the effectiveness and superiority of the proposed QSFormer.

preprint 2022 arXiv

High-dimensional robust approximated M-estimators for mean regression with asymmetric data

Asymmetry along with heteroscedasticity or contamination often occurs with the growth of data dimensionality. In ultra-high dimensional data analysis, such irregular settings are usually overlooked for both theoretical and computational convenience. In this paper, we establish a framework for estimation in high-dimensional regression models using Penalized Robust Approximated quadratic M-estimators (PRAM). This framework allows general settings such as random errors that lack symmetry and homogeneity, or covariates that are not sub-Gaussian. To reduce the possible bias caused by the data's irregularity in mean regression, PRAM adopts a loss function with a flexible robustness parameter growing with the sample size. Theoretically, we first show that, in the ultra-high dimension setting, PRAM estimators have local estimation consistency at the minimax rate enjoyed by the LS-Lasso. Then we show that PRAM with an appropriate non-convex penalty in fact agrees with the local oracle solution, and thus obtain its oracle property. Computationally, we demonstrate the performances of six PRAM estimators using three types of loss functions for approximation (Huber, Tukey's biweight and Cauchy loss) combined with two types of penalty functions (Lasso and MCP). Our simulation studies and real data analysis demonstrate satisfactory finite sample performances of the PRAM estimator under general irregular settings.

preprint 2022 arXiv

On the Robustness of "Robust reversible data hiding scheme based on two-layer embedding strategy"

In the paper "Robust reversible data hiding scheme based on two-layer embedding strategy" published in INS recently, Kumar et al. proposed a robust reversible data hiding (RRDH) scheme based on two-layer embedding. Secret data was embedded into the most significant bit (MSB) planes to increase robustness, and a sorting strategy based on local complexity was adopted to reduce distortion. However, Kumar et al.'s reversible data hiding (RDH) scheme is not as robust against joint photographic experts group (JPEG) compression as stated and cannot be called RRDH. This comment first gives a brief description of their RDH scheme, then analyses their scheme's robustness from the perspective of JPEG compression principles. JPEG compression will change pixel values, thereby destroying auxiliary information and pixel value ordering required to extract secret data correctly, making their scheme not robust. Next, the changes in both bit plane and pixel value ordering after JPEG compression are shown and analysed by different robustness-testing experiments. Finally, some suggestions are given to improve the robustness.

preprint 2022 arXiv

Spectral Energy Distributions in Three Deep-Drilling Fields of the Vera C. Rubin Observatory Legacy Survey of Space and Time: Source Classification and Galaxy Properties

W-CDF-S, ELAIS-S1, and XMM-LSS will be three Deep-Drilling Fields (DDFs) of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), but their extensive multi-wavelength data have not been fully utilized as done in the COSMOS field, another LSST DDF. To prepare for future science, we fit source spectral energy distributions (SEDs) from X-ray to far-infrared in these three fields mainly to derive galaxy stellar masses and star-formation rates. We use CIGALE v2022.0, a code that has been regularly developed and evaluated, for the SED fitting. Our catalog includes 0.8 million sources covering $4.9~\mathrm{deg^2}$ in W-CDF-S, 0.8 million sources covering $3.4~\mathrm{deg^2}$ in ELAIS-S1, and 1.2 million sources covering $4.9~\mathrm{deg^2}$ in XMM-LSS. Besides fitting normal galaxies, we also select candidates that may host active galactic nuclei (AGNs) or are experiencing recent star-formation variations and use models specifically designed for these sources to fit their SEDs; this increases the utility of our catalog for various projects in the future. We calibrate our measurements by comparison with those in well-studied smaller regions and briefly discuss the implications of our results. We also perform detailed tests of the completeness and purity of SED-selected AGNs. Our data can be retrieved from a public website.

preprint 2022 arXiv

SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations

Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application to SE tasks. First, the majority of the pre-trained models focus on pre-training only the encoder of the Transformer. For generation tasks that are addressed using models with the encoder-decoder architecture, however, there is no reason why the decoder should be left out during pre-training. Second, many existing pre-trained models, including state-of-the-art models such as T5-learning, simply reuse the pre-training tasks designed for natural languages. Moreover, to learn the natural language description of source code needed eventually for code-related tasks such as code summarization, existing pre-training tasks require a bilingual corpus composed of source code and the associated natural language description, which severely limits the amount of data for pre-training. To this end, we propose SPT-Code, a sequence-to-sequence pre-trained model for source code. In order to pre-train SPT-Code in a sequence-to-sequence manner and address the aforementioned weaknesses associated with existing pre-training tasks, we introduce three pre-training tasks that are specifically designed to enable SPT-Code to learn knowledge of source code, the corresponding code structure, as well as a natural language description of the code without relying on any bilingual corpus, and eventually exploit these three sources of information when it is applied to downstream tasks. Experimental results demonstrate that SPT-Code achieves state-of-the-art performance on five code-related downstream tasks after fine-tuning.

preprint 2022 arXiv

Tiny Object Tracking: A Large-scale Dataset and A Baseline

Tiny objects, frequently appearing in practical applications, have weak appearance and features, and are receiving increasing interest in many vision tasks, such as object detection and segmentation. To promote the research and development of tiny object tracking, we create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames. Each frame is carefully annotated with a high-quality bounding box. In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities, and annotate these attributes to facilitate attribute-based performance analysis. To provide a strong baseline in tiny object tracking, we propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework to effectively enhance the feature representation, discrimination and localization abilities in tracking tiny objects. Extensive experiments are performed on the proposed dataset, and the results prove the superiority and effectiveness of MKDNet compared with state-of-the-art methods. The dataset, the algorithm code, and the evaluation code are available at https://github.com/mmic-lcl/Datasets-and-benchmark-code.

preprint 2022 arXiv

Unified GCNs: Towards Connecting GCNs with CNNs

Graph Convolutional Networks (GCNs) have widely demonstrated their powerful ability in graph data representation and learning. Existing graph convolution layers are mainly designed from the perspective of graph signal processing and transforms, and usually suffer from limitations such as over-smoothing, over-squashing and non-robustness. Convolutional Neural Networks (CNNs) have achieved great success in many computer vision and machine learning tasks. One main reason is that CNNs leverage many learnable convolution filters (kernels) to obtain rich feature descriptors and thus have high capacity to encode complex patterns in visual data analysis. Also, CNNs are flexible in the design of their network architectures, such as MobileNet, ResNet, Xception, etc. Therefore, a natural question arises: can we design graph convolutional layers as flexibly as those in CNNs? In this paper, we consider connecting GCNs with CNNs deeply from a general perspective of depthwise separable convolution operation. Specifically, we show that GCN and GAT indeed perform some specific depthwise separable convolution operations. This novel interpretation enables us to better understand the connections between GCNs (GCN, GAT) and CNNs and further inspires us to design more Unified GCNs (UGCNs). As two showcases, we implement two UGCNs, i.e., Separable UGCN (S-UGCN) and General UGCN (G-UGCN) for graph data representation and learning. Promising experiments on several graph representation benchmarks demonstrate the effectiveness and advantages of the proposed UGCNs.

preprint 2022 arXiv

Vibration-Based Bridge Health Monitoring using Telecommunication Cables

Bridge Health Monitoring (BHM) enables early damage detection of bridges and is thus critical for avoiding more severe damage that might result in major financial and human losses. However, conventional BHM systems require dedicated sensors on bridges, which are costly to install and maintain and hard to scale up. To overcome this challenge, we introduce a new system that uses existing telecommunication cables for Distributed Acoustic Sensing (DAS) to collect bridge dynamic strain responses. In addition, we develop a two-module physics-guided system identification method to extract bridge damage-sensitive information (e.g., natural frequencies and mode shapes) from noisy DAS data by constraining strain and displacement mode shapes by bridge dynamics. This approach does not require installation and maintenance of dedicated sensors on bridges. We evaluate our system with field experiments on a concrete bridge with fiber cable running in a conduit under the deck. Our system successfully identified modal frequencies and reconstructed meter-scale mode shapes.

preprint 2021 arXiv

PICA: A Pixel Correlation-based Attentional Black-box Adversarial Attack

Studies on black-box adversarial attacks have become increasingly prevalent due to the intractable acquisition of the structural knowledge of deep neural networks (DNNs). However, the performance of emerging attacks is negatively impacted when fooling DNNs tailored for high-resolution images. One of the explanations is that these methods usually focus on attacking the entire image, regardless of its spatial semantic information, and thereby encounter the notorious curse of dimensionality. To this end, we propose a pixel correlation-based attentional black-box adversarial attack, termed PICA. First, we take only one of every two neighboring pixels in the salient region as the target by leveraging the attentional mechanism and pixel correlation of images, such that the dimension of the black-box attack is reduced. After that, a general multiobjective evolutionary algorithm is employed to traverse the reduced pixels and generate perturbations that are imperceptible to human vision. Extensive experimental results have verified the effectiveness of the proposed PICA on the ImageNet dataset. More importantly, PICA is computationally more efficient at generating high-resolution adversarial examples compared with the existing black-box attacks.

preprint 2021 arXiv

The Stellar Age Dependence of X-ray Emission from Normal Star-Forming Galaxies in the GOODS Fields

The Chandra Deep Field-South and North surveys (CDFs) provide unique windows into the cosmic history of X-ray emission from normal (non-active) galaxies. Scaling relations of normal galaxy X-ray luminosity (L_X) with star formation rate (SFR) and stellar mass (M_star) have been used to show that the formation rates of low-mass and high-mass X-ray binaries (LMXBs and HMXBs, respectively) evolve with redshift across z = 0-2 following L_HMXB/SFR ~ 1 + z and L_LMXB/M_star ~ (1 + z)^{2-3}. However, these measurements alone do not directly reveal the physical mechanisms behind the redshift evolution of X-ray binaries (XRBs). We derive star-formation histories for a sample of 344 normal galaxies in the CDFs, using spectral energy distribution (SED) fitting of FUV-to-FIR photometric data, and construct a self-consistent, age-dependent model of the X-ray emission from the galaxies. Our model quantifies how X-ray emission from hot gas and XRB populations vary as functions of host stellar-population age. We find that (1) the ratio L_X/M_star declines by a factor of ~1000 from 0-10 Gyr and (2) the X-ray SED becomes harder with increasing age, consistent with a scenario in which the hot gas contribution to the X-ray SED declines quickly for ages above 10 Myr. When dividing our sample into subsets based on metallicity, we find some indication that L_X/M_star is elevated for low-metallicity galaxies, consistent with recent studies of X-ray scaling relations. However, additional statistical constraints are required to quantify both the age and metallicity dependence of X-ray emission from star-forming galaxies.

preprint2020arXiv

cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks

Salient object detection (SOD) in images is an active research topic in computer vision and multimedia. Fusing the complementary information of RGB and depth has been shown to be effective for this task, which is known as the RGB-D salient object detection problem. The main challenge for RGB-D salient object detection is how to exploit the salient cues of both intra-modality (RGB, depth) and cross-modality simultaneously, which is known as the cross-modality detection problem. In this paper, we tackle this challenge by designing a novel cross-modality Saliency Generative Adversarial Network (cmSalGAN). cmSalGAN aims to learn an optimal view-invariant and consistent pixel-level representation for RGB and depth images via a novel adversarial learning framework, which thus incorporates both intra-view information and cross-view correlation information simultaneously for RGB-D saliency detection. To further improve the detection results, an attention mechanism and an edge detection module are also incorporated into cmSalGAN. The entire cmSalGAN can be trained end-to-end using a standard deep neural network framework. Experimental results show that cmSalGAN achieves new state-of-the-art RGB-D saliency detection performance on several benchmark datasets.

preprint2020arXiv

Can Synthetic Data Improve Object Detection Results for Remote Sensing Images?

Deep learning approaches require enough training samples to perform well, but it is a challenge to collect enough real training data and label them manually. In this letter, we propose the use of realistic synthetic data with a wide distribution to improve the performance of remote sensing image aircraft detection. Specifically, to increase the variability of synthetic data, we randomly set the parameters during rendering, such as the size of the instance and the class of background images. In order to make the synthetic images more realistic, we then refine the synthetic images at the pixel level using CycleGAN with real unlabeled images. We also fine-tune the model with a small amount of real data, to obtain a higher accuracy. Experiments on NWPU VHR-10, UCAS-AOD and DIOR datasets demonstrate that the proposed method can be applied for augmenting insufficient real data.

preprint2020arXiv

DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Most SLAM algorithms are based on the assumption that the scene is static. However, in practice, most scenes are dynamic and contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. Then the interest points belonging to different motion models (including ego-motion and the motion models of rigid moving objects) are segmented by a multi-model fitting approach. Based on the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate their motion relative to the camera and reconstruct the 3D models of the objects. We then transform the relative motion into the trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment by considering their motion trajectories to obtain a 4D (3D + time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows the robot to be employed for high-level tasks, such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects were moving in a wide range.
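
The step that turns camera-relative object motion into a global trajectory can be sketched as a composition of rigid transforms. This is a minimal illustration, assuming poses are stored as 4x4 homogeneous matrices and that the world pose of an object is the camera pose multiplied by the object pose in the camera frame; the abstract does not specify the representation, and all function names here are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    """Homogeneous transform with identity rotation and translation (x, y, z)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def object_trajectory_in_world(camera_poses, relative_object_poses):
    """Compose T_world_cam with T_cam_obj frame by frame (assumed convention)."""
    return [matmul4(t_wc, t_co)
            for t_wc, t_co in zip(camera_poses, relative_object_poses)]


# The camera moves 1 unit along x; the object stays 2 units ahead along y
# in the camera frame, so it must also have moved 1 unit along x in the world.
camera_poses = [translation(0.0, 0.0, 0.0), translation(1.0, 0.0, 0.0)]
relative_poses = [translation(0.0, 2.0, 0.0), translation(0.0, 2.0, 0.0)]
world_poses = object_trajectory_in_world(camera_poses, relative_poses)
positions = [(t[0][3], t[1][3], t[2][3]) for t in world_poses]
```

The example uses pure translations for readability; the same composition applies when the matrices also carry rotations.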

preprint2020arXiv

Ghost Imaging with the Optimal Binary Sampling

To extract the maximum information about the object from a series of binary samples in ghost imaging applications, we propose and demonstrate a framework for optimizing the performance of ghost imaging with binary sampling to approach the results without binarization. The method is based on maximizing the information content of the signal arm detection, by formulating and solving the appropriate parameter estimation problem: finding the binarization threshold that would yield the reconstructed image with optimal Fisher information properties. Applying the 1-bit quantized Poisson statistics to a ghost-imaging model with pseudo-thermal light, we derive the fundamental limit, i.e., the Cramer-Rao lower bound, as the benchmark for the evaluation of the accuracy of the estimator. Our theoretical model and experimental results suggest that, with the optimal binarization threshold, coincident with the statistical mean of all bucket samples, and a large number of measurements, the performance of binary sampling GI can approach that of the ordinary one without binarization.
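
A small sketch can make the thresholding idea concrete. It binarizes the bucket samples at their mean, as the abstract describes, and then reconstructs the image with a standard second-order correlation, which is an assumption: the paper's estimator may be different. The function names and the toy data are hypothetical.

```python
def binarize_at_mean(bucket):
    """1-bit quantization of bucket samples, thresholded at their mean."""
    threshold = sum(bucket) / len(bucket)
    return [1 if b > threshold else 0 for b in bucket]


def correlate(patterns, signal):
    """Correlation reconstruction: <s * I(x)> - <s> * <I(x)> for each pixel x.

    Assumption: this textbook estimator stands in for the paper's method.
    """
    n = len(signal)
    mean_s = sum(signal) / n
    image = []
    for x in range(len(patterns[0])):
        mean_i = sum(p[x] for p in patterns) / n
        mean_si = sum(s * p[x] for s, p in zip(signal, patterns)) / n
        image.append(mean_si - mean_s * mean_i)
    return image


# Toy two-pixel object: pixel 0 transmits, pixel 1 blocks.
obj = [1, 0]
patterns = [[1, 0], [0, 1], [1, 1], [0, 0]]
bucket = [sum(p * o for p, o in zip(pattern, obj)) for pattern in patterns]
binary = binarize_at_mean(bucket)
image = correlate(patterns, binary)
```

With this toy data the binarized reconstruction still ranks the transmitting pixel above the blocking one, which is the qualitative behavior the abstract claims for a mean-valued threshold.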

preprint2020arXiv

M$^5$L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking

Classifying confusing samples in the course of RGBT tracking is a challenging problem that has not yet been solved satisfactorily. Existing methods focus only on enlarging the boundary between positive and negative samples; however, the structured information of the samples might be harmed, e.g., confusing positive samples are closer to the anchor than normal positive samples. To handle this problem, we propose a novel Multi-Modal Multi-Margin Metric Learning framework, named M$^5$L, for RGBT tracking. In particular, we design a multi-margin structured loss to distinguish the confusing samples, which play a critical role in boosting tracking performance. To this end, we additionally enlarge the boundaries between confusing positive samples and normal ones, and between confusing negative samples and normal ones, with predefined margins, by exploiting the structured information of all samples in each modality. Moreover, a cross-modality constraint is employed to reduce the difference between modalities and push positive samples closer to the anchor than negative ones from the two modalities. In addition, to achieve quality-aware RGB and thermal feature fusion, we introduce modality attentions and learn them using a feature fusion module in our network. Extensive experiments on large-scale datasets show that our framework clearly improves tracking performance and outperforms state-of-the-art RGBT trackers.

preprint2020arXiv

On the relation between hard X-ray photon index versus accretion rate for super-Eddington accreting quasars

We investigate whether the hard X-ray photon index ($Γ$) versus accretion rate correlation for super-Eddington accreting quasars is different from that for sub-Eddington accreting quasars. We construct a sample of 113 bright quasars from the Sloan Digital Sky Survey Data Release 14 quasar catalog, including 38 quasars as the super-Eddington subsample and 75 quasars as the sub-Eddington subsample. We derive black-hole masses using a single-epoch virial mass formula based on the ${\rm Hβ}$ lines, and we use the standard thin disk model to derive the dimensionless accretion rates ($\dot{\mathscr{M}}$) for our sample. The X-ray data for these quasars are collected from the Chandra and XMM-Newton archives. We fit the hard X-ray spectra using a single power-law model to obtain $Γ$ values. We find a statistically significant ($R_{\rm S}=0.43$, $p=7.75\times{10}^{-3}$) correlation between $Γ$ and $\dot{\mathscr{M}}$ for the super-Eddington subsample. The $Γ$-$\dot{\mathscr{M}}$ correlation for the sub-Eddington subsample is also significant, but weaker ($R_{\rm S}=0.30$, $p=9.98\times{10}^{-3}$). Linear regression analysis shows that ${\rm Γ}=(0.34\pm0.11){\rm log}{\dot{\mathscr{M}}}+(1.71\pm0.17)$ and ${\rm Γ}=(0.09\pm0.04){\rm log}{\dot{\mathscr{M}}}+(1.93\pm0.04)$ for the super- and sub-Eddington subsamples, respectively. The $Γ$-$\dot{\mathscr{M}}$ correlations of the two subsamples are different, suggesting different disk-corona connections in these two types of systems. We propose one qualitative explanation of the steeper $Γ$-$\dot{\mathscr{M}}$ correlation in the super-Eddington regime that involves larger seed photon fluxes received by the compact coronae from the thick disks in super-Eddington accreting quasars.
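
The two best-fit relations quoted above can be evaluated directly. The sketch below uses only the central values from the abstract and ignores the quoted uncertainties; it assumes that "log" denotes a base-10 logarithm, which is the usual convention but is not stated explicitly. The names `FITS` and `photon_index` are hypothetical.

```python
import math

# (slope, intercept) central values taken from the abstract.
FITS = {
    "super": (0.34, 1.71),  # super-Eddington subsample
    "sub": (0.09, 1.93),    # sub-Eddington subsample
}


def photon_index(mdot, regime):
    """Predicted photon index for a dimensionless accretion rate mdot.

    Assumption: the regression uses log base 10.
    """
    slope, intercept = FITS[regime]
    return slope * math.log10(mdot) + intercept


gamma_super = photon_index(10.0, "super")
gamma_sub = photon_index(1.0, "sub")
```

For example, a tenfold increase in the accretion rate raises the predicted photon index by 0.34 in the super-Eddington fit but only by 0.09 in the sub-Eddington fit, which is the difference in steepness the abstract highlights.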

preprint2020arXiv

Supermassive black holes with high accretion rates in active galactic nuclei. XI. Accretion disk reverberation mapping of Mrk 142

We performed an intensive accretion disk reverberation mapping campaign on the high accretion rate active galactic nucleus Mrk 142 in early 2019. Mrk 142 was monitored with the Neil Gehrels Swift Observatory for 4 months in X-rays and 6 UV/optical filters. Ground-based photometric monitoring was obtained from the Las Cumbres Observatory, Liverpool Telescope and Dan Zowada Memorial Observatory in ugriz filters and the Yunnan Astronomical Observatory in V. Mrk 142 was highly variable throughout, displaying correlated variability across all wavelengths. We measure significant time lags between the different wavelength light curves, finding that through the UV and optical the wavelength-dependent lags, $τ(λ)$, generally follow the relation $τ(λ) \propto λ^{4/3}$, as expected for the $T\propto R^{-3/4}$ profile of a steady-state optically-thick, geometrically-thin accretion disk, though they can also be fit by $τ(λ) \propto λ^{2}$, as expected for a slim disk. The exceptions are the u and U bands, where an excess lag is observed, as has been observed in other AGN and attributed to continuum emission arising in the broad-line region. Furthermore, we perform a flux-flux analysis to separate the constant and variable components of the spectral energy distribution, finding that the flux-dependence of the variable component is consistent with the $f_ν \propto ν^{1/3}$ spectrum expected for a geometrically-thin accretion disk. Moreover, the X-ray to UV lag is significantly offset from an extrapolation of the UV/optical trend, with the X-rays showing a poorer correlation with the UV than the UV does with the optical. The magnitude of the UV/optical lags is consistent with a highly super-Eddington accretion rate.
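
The link between the temperature profile and the lag spectrum follows from a short scaling argument. The sketch below assumes that each disk radius emits mainly near the blackbody peak for its temperature (Wien's law) and that the lag is the light-travel time from the center to that radius; these are the standard simplifications, not details taken from the abstract.

```latex
\begin{align}
  T &\propto R^{-3/4}
    && \text{(thin-disk temperature profile)} \\
  \lambda &\propto T^{-1}
    && \text{(Wien's law: emission at radius } R \text{ peaks at } \lambda\text{)} \\
  R &\propto T^{-4/3} \propto \lambda^{4/3}
    && \text{(invert the profile and substitute)} \\
  \tau(\lambda) &= \frac{R(\lambda)}{c} \propto \lambda^{4/3}
    && \text{(light-travel time)}
\end{align}
```

Running the same steps backwards, a lag spectrum $τ(λ) \propto λ^{2}$ corresponds to $R \propto T^{-2}$, i.e. a shallower profile $T \propto R^{-1/2}$.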

preprint2020arXiv

The frequency of extreme X-ray variability of radio-quiet quasars

We analyze 1598 serendipitous Chandra X-ray observations of 462 radio-quiet quasars to constrain the frequency of extreme amplitude X-ray variability that is intrinsic to the quasar corona and innermost accretion flow. The quasars in this investigation are all spectroscopically confirmed, optically bright ($m_i \leq$ 20.2), and contain no identifiable broad absorption lines in their optical/ultraviolet spectra. This sample includes quasars spanning $z \approx$ 0.1 - 4 and probes X-ray variability on timescales of up to $\approx$ 12 rest-frame years. Variability amplitudes are computed between every epoch of observation for each quasar and are analyzed as a function of timescale and luminosity. The tail-heavy distributions of variability amplitudes at all timescales indicate that extreme X-ray variations are driven by an additional physical mechanism and not just typical random fluctuations of the coronal emission. Similarly, extreme X-ray variations of low-luminosity quasars seem to be driven by an additional physical mechanism, whereas high-luminosity quasars seem more consistent with random fluctuations. The amplitude at which an X-ray variability event can be considered extreme is quantified for different timescales and luminosities. Extreme X-ray variations occur more frequently at long timescales ($Δt \gtrsim$ 300 days) than at shorter timescales, and in low-luminosity quasars compared to high-luminosity quasars over a similar timescale. A binomial analysis indicates that extreme intrinsic X-ray variations are rare, with a maximum occurrence rate of <2.4% of observations. Finally, we present X-ray variability and basic optical emission-line properties of three archival quasars that have been newly discovered to exhibit extreme X-ray variability.
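
Computing amplitudes "between every epoch" amounts to enumerating all pairs of observations of a quasar. The sketch below assumes that the amplitude is the ratio of the later flux to the earlier flux, which may differ from the paper's exact definition, and converts observed time separations to the rest frame by dividing by $(1 + z)$. The function name and the toy light curve are hypothetical.

```python
from itertools import combinations


def pairwise_variability(epochs, redshift):
    """Return (rest-frame separation in days, flux ratio) for every epoch pair.

    epochs: list of (observed_time_days, flux) tuples sorted by time.
    Assumption: amplitude is defined as later flux divided by earlier flux.
    """
    pairs = []
    for (t1, f1), (t2, f2) in combinations(epochs, 2):
        rest_frame_dt = (t2 - t1) / (1.0 + redshift)  # cosmological time dilation
        amplitude = f2 / f1
        pairs.append((rest_frame_dt, amplitude))
    return pairs


# Toy light curve: three epochs of one quasar at z = 1 (arbitrary flux units).
epochs = [(0.0, 1.0), (100.0, 2.0), (400.0, 0.5)]
pairs = pairwise_variability(epochs, redshift=1.0)
```

Three epochs yield three pairs; in general a quasar with n epochs contributes n(n - 1)/2 amplitude measurements, which can then be binned by timescale and luminosity.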

preprint2020arXiv

U2-ONet: A Two-level Nested Octave U-structure with Multiscale Attention Mechanism for Moving Instances Segmentation

Most scenes in practical applications are dynamic scenes containing moving objects, so accurately segmenting moving objects is crucial for many computer vision applications. In order to efficiently segment all moving objects in the scene, regardless of whether the object has a predefined semantic label, we propose a two-level nested Octave U-structure network with a multiscale attention mechanism called U2-ONet. Each stage of U2-ONet is filled with our newly designed Octave ReSidual U-block (ORSU) to enhance the ability to obtain more context information at different scales while reducing the spatial redundancy of feature maps. In order to efficiently train our multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding a knowledge matching loss to maintain optimization consistency. Experimental results show that our method achieves state-of-the-art performance on several general moving object segmentation datasets.