Trust snapshot

Quick read

Trust 21 (Emerging)
Verification L1
Unclaimed author
115 works
0 followers
36 topics
4 close collaborators

Published work

115 published item(s)

preprint, 2026, arXiv

One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive $O(N)$ costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose ReQueR (Reinforcement Query Refinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner's evolving competence. ReQueR yields consistent absolute gains of 1.7% to 7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen models. Code is available at https://github.com/newera-xiao/ReQueR.

preprint, 2024, arXiv

MS-DETR: Efficient DETR Training with Mixed Supervision

DETR accomplishes end-to-end object detection through iteratively generating multiple object candidates based on image features and promoting one candidate for each ground-truth object. The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates. We aim at improving the DETR training efficiency by explicitly supervising the candidate generation procedure through mixing one-to-one supervision and one-to-many supervision. Our approach, namely MS-DETR, is simple, and places one-to-many supervision to the object queries of the primary decoder that is used for inference. In comparison to existing DETR variants with one-to-many supervision, such as Group DETR and Hybrid DETR, our approach does not need additional decoder branches or object queries. The object queries of the primary decoder in our approach directly benefit from one-to-many supervision and thus are superior in object candidate prediction. Experimental results show that our approach outperforms related DETR variants, such as DN-DETR, Hybrid DETR, and Group DETR, and the combination with related DETR variants further improves the performance.

preprint, 2024, arXiv

Newly Formed Dust within the Circumstellar Environment of SNIa-CSM 2018evt

Dust associated with various stellar sources in galaxies at all cosmic epochs remains a controversial topic, particularly whether supernovae (SNe) play an important role in dust production. We report evidence of dust formation in the cold, dense shell behind the ejecta-circumstellar medium (CSM) interaction in the Type Ia-CSM SN 2018evt three years after the explosion, characterized by a rise in the mid-infrared (MIR) emission accompanied by an accelerated decline in the optical radiation of the SN. Such a dust-formation picture is also corroborated by the concurrent evolution of the profiles of the Hα emission line. Our model suggests enhanced CSM dust concentration at increasing distances from the SN as compared to what can be expected from the density profile of the mass loss from a steady stellar wind. By the time of the last MIR observations at day +1041, a total of $(1.2 \pm 0.2) \times 10^{-2}\,M_\odot$ of new dust had been formed by SN 2018evt, making it one of the most prolific dust factories among SNe with evidence of dust formation. This unprecedented view of intense dust production may shed light on our understanding of dust formation in cosmic history.

preprint, 2024, arXiv

Rotating black hole mimicker surrounded by the string cloud

Traversable wormholes and regular black holes usually represent completely different scenarios. In the black bounce spacetime, however, they can be described by the same line element, which is very attractive. Furthermore, the black hole images taken by the EHT show that black holes have spin, so spin is an indispensable intrinsic property of black holes in the actual universe. In this work, we derive a rotating black hole mimicker surrounded by a string cloud (SC), which can be interpolated to represent regular black hole spacetime and traversable wormhole spacetime. We investigate the effect of the spin $a$ and the SC parameter $L$ on the observables (shadow radius $R_s$ and distortion $δ_s$) and the energy emission rate of the black hole mimicker surrounded by the SC. We find that the shadow for this spacetime is very sensitive to $L$, i.e., the SC parameter can significantly increase the boundary of the shadow.

preprint, 2023, arXiv

DR-WLC: Dimensionality Reduction cognition for object detection and pose estimation by Watching, Learning and Checking

Object detection and pose estimation are difficult tasks in robotics and autonomous driving. Existing object detection and pose estimation methods mostly use data of the same dimensionality for training. For example, 2D object detection usually requires a large amount of costly 2D annotation data. Using high-dimensional information to supervise lower-dimensional tasks is a feasible way to reduce dataset size. In this work, we propose DR-WLC, a dimensionality reduction cognitive model that performs object detection and pose estimation at the same time. The model requires only 3D models of objects and unlabeled environment images (with or without objects) for training. In addition, a bounding box generation strategy is proposed to build the relationship between the 3D model and the 2D object detection task. Experiments show that our method can accomplish these tasks without any manual annotations and is easy to deploy in practical applications. Source code is at https://github.com/IN2-ViAUn/DR-WLC.

preprint, 2023, arXiv

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a 2D view, and it also selects the most likely reconstruction from the set. To deal with the challenging unsupervised generation of non-rigid shapes, we develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net. The non-rigid shape is first expressed as the sum of a coarse shape basis and a flexible shape deformation, then multiple hypotheses are generated with uncertainty modeling of the deformation part. MHR-Net is optimized with reprojection loss on the basis and the best hypothesis. Furthermore, we design a new Procrustean Residual Loss, which reduces the rigid rotations between similar shapes and further improves the performance. Experiments show that MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL and 300-VW datasets.

preprint, 2023, arXiv

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

Neural Radiance Fields (NeRF) methods have proved effective as compact, high-quality and versatile representations for 3D scenes, and enable downstream tasks such as editing, retrieval, navigation, etc. Various neural architectures are vying for the core structure of NeRF, including the plain Multi-Layer Perceptron (MLP), sparse tensors, low-rank tensors, hashtables and their compositions. Each of these representations has its particular set of trade-offs. For example, the hashtable-based representations admit faster training and rendering but their lack of clear geometric meaning hampers downstream tasks like spatial-relation-aware editing. In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions. PVD consequently empowers downstream applications to optimally adapt the neural representations for the task at hand in a post hoc fashion. The conversions are fast, as distillation is progressively performed on different levels of volume representations, from shallower to deeper. We also employ special treatment of density to deal with its specific numerical instability problem. Empirical evidence is presented to validate our method on the NeRF-Synthetic, LLFF and TanksAndTemples datasets. For example, with PVD, an MLP-based NeRF model can be distilled from a hashtable-based Instant-NGP model 10X~20X faster than training the original NeRF from scratch, while achieving a superior level of synthesis quality. Code is available at https://github.com/megvii-research/AAAI2023-PVD.

preprint, 2023, arXiv

Temporal Perceiving Video-Language Pre-training

Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the local associations between videos and texts are not modeled, restricting the pre-training models' generality, especially for tasks requiring the temporal video boundary for certain query texts. This work introduces a novel text-video localization pre-text task to enable fine-grained temporal and semantic alignment such that the trained model can accurately perceive temporal boundaries in videos given the text description. Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features. To produce temporal boundaries, frame features in several videos are manually merged into a long video sequence that interacts with a text sequence. With the localization task, our method connects the fine-grained frame representations with the word representations and implicitly distinguishes representations of different instances in the single modality. Notably, comprehensive experimental results show that our method significantly improves the state-of-the-art performance on various benchmarks, covering text-to-video retrieval, video question answering, video captioning, temporal action localization and temporal moment retrieval. The code will be released soon.

preprint, 2022, arXiv

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

NLP researchers have proposed various word-substitute black-box attacks that can fool text classification models. In such an attack, an adversary keeps sending crafted adversarial queries to the target model until it achieves the intended outcome. State-of-the-art attack methods usually require hundreds or thousands of queries to find one adversarial example. In this paper, we study whether a sophisticated adversary can attack the system with far fewer queries. We propose a simple yet efficient method that can reduce the average number of adversarial queries by 3-30 times while maintaining the attack effectiveness. This research highlights that an adversary can fool a deep NLP model at much lower cost.

preprint, 2022, arXiv

Active Learning for Deep Visual Tracking

Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this approach is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model. Under the guidance of active learning, the tracker based on the trained deep CNNs model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of selected samples, we propose an active learning method based on multi-frame collaboration to select those training samples that should be and need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest neighbor discrimination method based on the average nearest neighbor distance to screen isolated samples and low-quality samples. Therefore, the training samples subset selected based on our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, which can ensure that the tracker achieves more accurate target states. Extensive experimental results confirm that our active learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks.
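
The abstract above names a Tversky loss but does not give its form or say how it is applied to bounding boxes. As a minimal sketch only, the standard Tversky index on soft binary values can be written as follows; the function name `tversky_loss`, the flat-list inputs, and the default weights `alpha` and `beta` are assumptions for illustration, not details from the paper.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Return 1 - Tversky index for two equal-length sequences of values in [0, 1].

    alpha weights false positives and beta weights false negatives
    (illustrative defaults; alpha = beta = 0.5 reduces to the Dice loss).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(p * t for p, t in zip(pred, target))          # soft true positives
    fp = sum(p * (1.0 - t) for p, t in zip(pred, target))  # soft false positives
    fn = sum((1.0 - p) * t for p, t in zip(pred, target))  # soft false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


if __name__ == "__main__":
    # One overlap, one false positive, one false negative.
    print(tversky_loss([1, 1, 0, 0], [1, 0, 1, 0]))
```

Raising `beta` above `alpha` penalizes missed target area more than spurious area, which is the usual reason for preferring this index over a symmetric overlap measure.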

preprint, 2022, arXiv

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

The expensive annotation cost is notoriously known as the main constraint for the development of the point cloud semantic segmentation technique. Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. Dubbed SSDR-AL, our method groups the original point clouds into superpoints and incrementally selects the most informative and representative ones for label acquisition. We achieve the selection mechanism via a graph reasoning network that considers both the spatial and structural diversities of superpoints. To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints. Extensive experiments on two point cloud benchmarks demonstrate the effectiveness of SSDR-AL in the semantic segmentation task. Particularly, SSDR-AL significantly outperforms the baseline method and reduces the annotation cost by up to 63.0% and 24.0% when achieving 90% performance of fully supervised learning, respectively.

preprint, 2022, arXiv

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

The vast computation requirements of Deep Neural Networks are a major hurdle to their real-world applications. Many recent Application Specific Integrated Circuit (ASIC) chips feature dedicated hardware support for Neural Network Acceleration. However, as ASICs take multiple years to develop, they are inevitably outpaced by the latest developments in Neural Architecture Research. For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy. In this paper, we propose Arch-Net, a family of Neural Networks made up of only operators efficiently supported across most ASIC architectures. When an Arch-Net is produced, less common network constructs, like Layer Normalization and Embedding Layers, are eliminated in a progressive manner through label-free Blockwise Model Distillation, while sub-eight-bit quantization is performed at the same time to maximize performance. Empirical results on machine translation and image classification tasks confirm that we can transform the latest Neural Architectures into fast-running and equally accurate Arch-Nets, ready for deployment on multiple mass-produced ASIC chips. The code will be available at https://github.com/megvii-research/Arch-Net.

preprint, 2022, arXiv

Automated Progressive Learning for Efficient Training of Vision Transformers

Recent advances in vision Transformers (ViTs) have come with a voracious appetite for computing power, highlighting the urgent need to develop efficient training methods for ViTs. Progressive learning, a training scheme where the model capacity grows progressively during training, has started showing its ability in efficient training. In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning. First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth. Then, we propose automated progressive learning (AutoProg), an efficient training scheme that aims to achieve lossless acceleration by automatically increasing the training overload on-the-fly; this is achieved by adaptively deciding whether, where and how much the model should grow during progressive learning. Specifically, we first relax the optimization of the growth schedule to a sub-network architecture optimization problem, then propose one-shot estimation of the sub-network performance via an elastic supernet. The searching overhead is minimized by recycling the parameters of the supernet. Extensive experiments of efficient training on ImageNet with two representative ViT models, DeiT and VOLO, demonstrate that AutoProg can accelerate ViT training by up to 85.1% with no performance drop. Code: https://github.com/changlin31/AutoProg

preprint, 2022, arXiv

Automatic Depth Optimization for Quantum Approximate Optimization Algorithm

The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid algorithm whose control parameters are classically optimized. In addition to the variational parameters, the right choice of hyperparameters is crucial for improving the performance of any optimization model. Control depth, or the number of variational parameters, is considered the most important hyperparameter for QAOA. In this paper we investigate control depth selection with an automatic algorithm based on proximal gradient descent. The performance of the automatic algorithm is demonstrated on 7-node and 10-node Max-Cut problems, which shows that the control depth can be significantly reduced during the iteration while achieving a sufficient level of optimization accuracy. With a theoretical convergence guarantee, the proposed algorithm can be used as an efficient tool for choosing the appropriate control depth in place of random search or empirical rules. Moreover, the reduction of control depth will induce a significant reduction in the number of quantum gates in the circuit, which improves the applicability of QAOA on Noisy Intermediate-scale Quantum (NISQ) devices.
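
The abstract above says the depth selection is based on proximal gradient descent but does not state the objective or the regularizer. The sketch below shows the general technique on a toy problem only: it minimizes `0.5 * sum((x_i - b_i)^2) + lam * sum(|x_i|)`, where the L1 penalty, the quadratic term, and the names `soft_threshold` and `proximal_gradient` are all assumptions chosen for illustration. Entries driven exactly to zero play the role of parameters that could be dropped.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient(b, lam, step=0.5, iters=200):
    """Minimize 0.5 * sum((x_i - b_i)^2) + lam * sum(|x_i|) by proximal gradient descent."""
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient of the smooth quadratic term.
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Gradient step on the smooth term, then the prox of the L1 term.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    print(proximal_gradient([3.0, 0.2, -1.5], lam=0.5))
```

For this toy objective the minimizer is known in closed form (each `b_i` soft-thresholded by `lam`), so the small middle entry ends at exactly zero while the other two shrink by 0.5 in magnitude.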

preprint, 2022, arXiv

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

A thriving trend in domain adaptive segmentation is to generate high-quality pseudo labels for the target domain and retrain the segmentor on them. Under this self-training paradigm, some competitive methods have turned to latent-space information, establishing the feature centroids (a.k.a. prototypes) of the semantic classes and determining the pseudo-label candidates by their distances from these centroids. In this paper, we argue that the latent space contains more information to be exploited, and we take one step further to capitalize on it. Firstly, instead of merely using the source-domain prototypes to determine the target pseudo labels as most traditional methods do, we bidirectionally produce the target-domain prototypes to degrade those source features which might be too hard or disturbed for the adaptation. Secondly, existing attempts simply model each category as a single, isotropic prototype while ignoring the variance of the feature distribution, which could lead to the confusion of similar categories. To cope with this issue, we propose to represent each category with multiple anisotropic prototypes via a Gaussian Mixture Model, in order to fit the de facto distribution of the source domain and estimate the likelihood of target samples based on the probability density. We apply our method to the GTA5->Cityscapes and Synthia->Cityscapes tasks and achieve 61.2 and 62.8 respectively in terms of mean IoU, substantially outperforming other competitive self-training methods. Notably, in some categories which suffer severely from categorical confusion, such as "truck" and "bus", our method achieves 56.4 and 68.8 respectively, which further demonstrates the effectiveness of our design.
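
To make the idea of multiple anisotropic prototypes concrete, here is a minimal sketch of likelihood-based pseudo-labelling with one Gaussian mixture per class. It assumes diagonal covariances (per-dimension variances), hand-set mixture parameters, and the hypothetical helper names shown; the abstract does not specify the covariance structure, how the mixtures are fitted, or any confidence threshold.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with per-dimension variances (axis-aligned anisotropy)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def class_log_likelihood(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(logs)
    # Log-sum-exp keeps the sum stable when the densities are tiny.
    return top + math.log(sum(math.exp(value - top) for value in logs))


def pseudo_label(x, prototypes):
    """Return the class whose mixture of prototypes gives x the highest likelihood."""
    return max(prototypes, key=lambda c: class_log_likelihood(x, prototypes[c]))


if __name__ == "__main__":
    # Toy 2-D feature space; all numbers are made up for illustration.
    prototypes = {
        "truck": [(0.5, [0.0, 0.0], [4.0, 0.25]), (0.5, [6.0, 0.0], [4.0, 0.25])],
        "bus": [(1.0, [3.0, 2.0], [0.25, 4.0])],
    }
    print(pseudo_label([5.5, 0.1], prototypes))
```

With a single isotropic prototype per class, the same rule would collapse to nearest-centroid assignment; the per-dimension variances and multiple components are what let two classes with nearby centroids be told apart by the shape of their distributions.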

preprint, 2022, arXiv

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.

preprint, 2022, arXiv

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

Cross-domain person re-identification (re-ID), such as unsupervised domain adaptive (UDA) re-ID, aims to transfer the identity-discriminative knowledge from the source to the target domain. Existing methods commonly treat the source and target domains as isolated from each other, i.e., no intermediate status is modeled between the two domains. Directly transferring the knowledge between two isolated domains can be very difficult, especially when the domain gap is large. From a novel perspective, we assume these two domains are not completely isolated, but can be connected through intermediate domains. Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer. To discover and utilize these intermediate domains, we propose an Intermediate Domain Module (IDM) and a Mirrors Generation Module (MGM). IDM has two functions: 1) it generates multiple intermediate domains by mixing the hidden-layer features from source and target domains and 2) it dynamically reduces the domain gap between the source / target domain features and the intermediate domain features. While IDM achieves good domain alignment, it introduces a side effect, i.e., the mix-up operation may mix the identities into a new identity and lose the original identities. To compensate for this, MGM maps the features into the IDM-generated intermediate domains without changing their original identity. This allows the model to focus on minimizing domain variations to promote the alignment between the source / target domain and intermediate domains, which reinforces IDM into IDM++. We extensively evaluate our method under both the UDA and domain generalization (DG) scenarios and observe that IDM++ yields consistent performance improvement for cross-domain re-ID, achieving a new state of the art.

preprint, 2022, arXiv

Compact Scintillator Array Detector (ComSAD) for sounding rocket and CubeSat missions

The development of CubeSats and more frequent launch opportunities for sounding rockets are a game changer for the space program, making space instruments more achievable and affordable to build. They therefore give us a good opportunity to build a small cosmic ray detector capable of measuring the flux, direction, and even energy of cosmic rays at altitudes above the limit of balloon experiments, and may open a new door to building a constellation of detectors to study cosmic ray physics. The Compact Scintillator Array Detector (ComSAD) is dedicated to the sounding rocket mission of Taiwan's National Space Organization. In this paper, we present the idea, design, and performance of ComSAD, which is also suitable for future CubeSat missions.

preprint, 2022, arXiv

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence. Thanks to the semantic diversity of natural language descriptions, temporal grounding allows activity grounding beyond pre-defined classes and has received increasing attention in recent years. The semantic diversity is rooted in the principle of compositionality in linguistics, where novel semantics can be systematically described by combining known words in novel ways (compositional generalization). However, current temporal grounding datasets do not specifically test for the compositional generalizability. To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. Evaluating the state-of-the-art methods on our new dataset splits, we empirically find that they fail to generalize to queries with novel combinations of seen words. To tackle this challenge, we propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies and learns fine-grained semantic correspondence among them. Experiments illustrate the superior compositional generalizability of our approach. The repository of this work is at https://github.com/YYJMJC/Compositional-Temporal-Grounding.

preprint, 2022, arXiv

Computational discovery of spin-polarized semimetals in spinel materials

Materials with spin-polarized electronic states have attracted a huge amount of interest due to their potential applications in spintronics. Based on first-principles calculations, we study the electronic characteristics of a series of AB2X4 chalcogenide spinel structures and propose that two promising candidates, VZn2O4 and VCd2S4, are spin-polarized semimetal materials. Both of them have ferromagnetic ground states. Their bands near the Fermi level are completely spin-polarized and form two types of nodal rings in the spin-up channel, and the large gaps in the spin-down channel prevent the spin-flip. Further symmetry analysis reveals that the nodal rings are protected by the glide mirror or mirror symmetries. Significantly, these nodal rings connect with each other and form a nodal chain structure, which can be well described by a simple four-band tight-binding (TB) model. The two ternary chalcogenide spinel materials with a fully spin-polarized nodal chain can serve as a prominent platform for future applications in spintronics.

preprint, 2022, arXiv

Constraints from LIGO O3 data on gravitational-wave emission due to r-modes in the glitching pulsar PSR J0537-6910

We present a search for continuous gravitational-wave emission due to r-modes in the pulsar PSR J0537-6910 using data from the LIGO-Virgo Collaboration observing run O3. PSR J0537-6910 is a young energetic X-ray pulsar and is the most frequent glitcher known. The inter-glitch braking index of the pulsar suggests that gravitational-wave emission due to r-mode oscillations may play an important role in the spin evolution of this pulsar. Theoretical models confirm this possibility and predict emission at a level that can be probed by ground-based detectors. In order to explore this scenario, we search for r-mode emission in the epochs between glitches by using a contemporaneous timing ephemeris obtained from NICER data. We do not detect any signals in the theoretically expected band of 86-97 Hz, and report upper limits on the amplitude of the gravitational waves. Our results improve on previous amplitude upper limits from r-modes in J0537-6910 by a factor of up to 3 and place stringent constraints on theoretical models for r-mode driven spin-down in PSR J0537-6910, especially for higher frequencies at which our results reach below the spin-down limit defined by energy conservation.

preprint, 2022, arXiv

COVID-19 Detection Using CT Image Based On YOLOv5 Network

Computer-aided diagnosis (CAD) increases diagnostic efficiency, helping doctors provide a quick and confident diagnosis, and it has played an important role in the treatment of COVID-19. In our task, we address abnormality detection and classification. The dataset is provided by the Kaggle platform, and we choose YOLOv5 as our model. In the related work section we introduce several object detection methods, which can be divided into two streams: one-stage and two-stage. The representative models are Faster RCNN and the YOLO series. We then describe the YOLOv5 model in detail. Comparative experiments and results are shown in Section IV. We choose mean average precision (mAP) as our evaluation metric; the higher the mAP, the better the model performs. The mAP@0.5 of our YOLOv5s is 0.623, which is 0.157 and 0.101 higher than Faster RCNN and EfficientDet, respectively.
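
The abstract reports the YOLOv5s score and the margins over the two baselines, but not the baseline scores themselves. The short sketch below simply works that subtraction through; the helper name `implied_baselines` is hypothetical, and the results are implied by the reported figures rather than quoted from the paper.

```python
def implied_baselines(score, margins):
    """Subtract each reported margin from the reported score, rounded to 3 decimals."""
    return {name: round(score - gap, 3) for name, gap in margins.items()}


if __name__ == "__main__":
    # Reported mAP@0.5 for YOLOv5s and its reported margins over each baseline.
    baselines = implied_baselines(0.623, {"Faster RCNN": 0.157, "EfficientDet": 0.101})
    for name, value in baselines.items():
        print(f"{name}: implied mAP@0.5 = {value}")
```

Under these figures the implied mAP@0.5 values are 0.466 for Faster RCNN and 0.522 for EfficientDet.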

preprint, 2022, arXiv

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

Brain networks characterize complex connectivities among brain regions as graph structures, which provide a powerful means to study brain connectomes. In recent years, graph neural networks have emerged as a prevalent paradigm of learning with structured data. However, most brain network datasets are limited in sample sizes due to the relatively high cost of data acquisition, which hinders the deep learning models from sufficient training. Inspired by meta-learning that learns new concepts fast with limited training examples, this paper studies data-efficient training strategies for analyzing brain connectomes in a cross-dataset setting. Specifically, we propose to meta-train the model on datasets of large sample sizes and transfer the knowledge to small datasets. In addition, we also explore two brain-network-oriented designs, including atlas transformation and adaptive task reweighing. Compared to other pre-training strategies, our meta-learning-based approach achieves higher and stabler performance, which demonstrates the effectiveness of our proposed solutions. The framework is also able to derive new insights regarding the similarities among datasets and diseases in a data-driven fashion.

preprint2022arXiv

Deep Hierarchical Semantic Segmentation

Humans are able to recognize structured relations in observation, allowing us to decompose complex scenes into simpler parts and abstract the visual world in multiple levels. However, such hierarchical reasoning ability of human perception remains largely unexplored in current literature of semantic segmentation. Existing work is often aware of flatten labels and predicts target classes exclusively for each pixel. In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. We devise HSSN, a general HSS framework that tackles two critical issues in this task: i) how to efficiently adapt existing hierarchy-agnostic segmentation networks to the HSS setting, and ii) how to leverage the hierarchy information to regularize HSS network learning. To address i), HSSN directly casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models. To solve ii), HSSN first explores inherent properties of the hierarchy as a training objective, which enforces segmentation predictions to obey the hierarchy structure. Further, with hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations and improve segmentation eventually. We conduct experiments on four semantic segmentation datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, and PASCAL-Person-Part), with different class hierarchies, segmentation network architectures and backbones, showing the generalization and superiority of HSSN.

preprint2022arXiv

Early-Time Ultraviolet Spectroscopy and Optical Follow-up Observations of the Type IIP Supernova 2021yja

We present three epochs of early-time ultraviolet (UV) and optical HST/STIS spectroscopy of the young, nearby Type IIP supernova (SN) 2021yja. We complement the HST data with two earlier epochs of Swift UVOT spectroscopy. The HST and Swift UVOT spectra are consistent with those of other well-studied Type IIP supernovae (SNe). The UV spectra exhibit rapid cooling at early times, while less dramatic changes are seen in the optical. We also present Lick/KAIT optical photometry up to the late-time-tail phase, showing a very long plateau and shallow decline compared with other SNe IIP. Our modeling of the UV spectrum with the TARDIS radiative-transfer code produces a good fit for a high-velocity explosion, a low total extinction $E(B-V) = 0.07$ mag, and a subsolar metallicity. We do not find a significant contribution to the UV flux from an additional heating source, such as interaction with the circumstellar medium, consistent with the observed flat plateau. Furthermore, the velocity width of the Mg II $λ$2798 line is comparable to that of the hydrogen Balmer lines, suggesting that the UV emission is confined to a region close to the photosphere.

preprint2022arXiv

Few-Shot Segmentation via Cycle-Consistent Transformer

Few-shot segmentation aims to train a segmentation model that can fast adapt to novel classes with few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on the features from support images. Previous methods only utilized the semantic-level prototypes of support images as conditional information. These methods cannot utilize all pixel-wise support information for the query predictions, which is however critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and query images to facilitate the few-shot segmentation task. We design a novel Cycle-Consistent TRansformer (CyCTR) module to aggregate pixel-wise support features into query ones. CyCTR performs cross-attention between features from different images, i.e. support and query images. We observe that there may exist unexpected irrelevant pixel-level support features. Directly performing cross-attention may aggregate these features from support to query and bias the query features. Thus, we propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features and encourage query features to attend to the most informative pixels from support images. Experiments on all few-shot segmentation benchmarks demonstrate that our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods. Specifically, on Pascal-$5^i$ and COCO-$20^i$ datasets, we achieve 67.5% and 45.6% mIoU for 5-shot segmentation, outperforming previous state-of-the-art methods by 5.6% and 7.1% respectively.

preprint2022arXiv

Filter Pruning by Switching to Neighboring CNNs with Good Attributes

Filter pruning is effective to reduce the computational costs of neural networks. Existing methods show that updating the previous pruned filter would enable large model capacity and achieve better performance. However, during the iterative pruning process, even if the network weights are updated to new values, the pruning criterion remains the same. In addition, when evaluating the filter importance, only the magnitude information of the filters is considered. However, in neural networks, filters do not work individually, but they would affect other filters. As a result, the magnitude information of each filter, which merely reflects the information of an individual filter itself, is not enough to judge the filter importance. To solve the above problems, we propose Meta-attribute-based Filter Pruning (MFP). First, to expand the existing magnitude information based pruning criteria, we introduce a new set of criteria to consider the geometric distance of filters. Additionally, to explicitly assess the current state of the network, we adaptively select the most suitable criteria for pruning via a meta-attribute, a property of the neural network at the current state. Experiments on two image classification benchmarks validate our method. For ResNet-50 on ILSVRC-2012, we could reduce more than 50% FLOPs with only 0.44% top-5 accuracy loss.

preprint2022arXiv

Free-electron-light interactions in nanophotonics

When impinging on optical structures or passing in their vicinity, free electrons can spontaneously emit electromagnetic radiation, a phenomenon generally known as cathodoluminescence. Free-electron radiation comes in many guises: Cherenkov, transition, and Smith-Purcell radiation, but also electron scintillation, commonly referred to as incoherent cathodoluminescence. While those effects have been at the heart of many fundamental discoveries and technological developments in high-energy physics in the past century, their recent demonstration in photonic and nanophotonic systems has attracted a lot of attention. Those developments arose from predictions that exploit nanophotonics for novel radiation regimes, now becoming accessible thanks to advances in nanofabrication. In general, the proper design of nanophotonic structures can enable shaping, control, and enhancement of free-electron radiation, for any of the above-mentioned effects. Free-electron radiation in nanophotonics opens the way to promising applications, such as widely-tunable integrated light sources from x-ray to THz frequencies, miniaturized particle accelerators, and highly sensitive high-energy particle detectors. Here, we review the emerging field of free-electron radiation in nanophotonics. We first present a general, unified framework to describe free-electron light-matter interaction in arbitrary nanophotonic systems. We then show how this framework sheds light on the physical underpinnings of many methods in the field used to control and enhance free-electron radiation. Namely, the framework points to the central role played by the photonic eigenmodes in controlling the output properties of free-electron radiation (e.g., frequency, directionality, and polarization). [... see full abstract in paper]

preprint2022arXiv

Gap Opening and Inner Disk Structure in the Strongly Accreting Transition Disk of DM Tau

Large inner dust gaps in transition disks are frequently posited as evidence of giant planets sculpting gas and dust in the disk, or the opening of a gap by photoevaporative winds. Although the former hypothesis is strongly supported by the observations of planets and deep depletions in gas within the gap of some disks, many T Tauri stars hosting transition disks accrete at rates typical for an undepleted disk, raising the question of how gap opening occurs in these objects. We thus present an analysis of the structure of the transition disk around the T Tauri star DM Tau, which is strongly accreting ($\sim 10^{-8.3}~\mathrm{M}_\odot~ \mathrm{yr}^{-1}$) and turbulent ($α=0.078 \pm 0.02$). Using the DALI thermochemical code, we fit disk models to simultaneously reproduce the accretion rate, high level of turbulence, the gas traced by ALMA band 6 observations of $^{12}$CO, $^{13}$CO, and C$^{18}$O J=2--1 lines, and the observed dust emission from the mm continuum and spectral energy distribution. We find that a shallow depletion in gas surface density of $\sim 10$ relative to the outer disk and a gas-rich inner disk are consistent with the observations. The planet mass of $<1$ M$_\mathrm{Jup}$ implied by the gap depth is in tension with predictions for dust trapping in a highly viscous disk, which requires a more massive planet of $\sim10$M$_\mathrm{Jup}$. Photoevaporative models including a dead zone can qualitatively reproduce some features of the DM Tau disk, but still struggle to explain the high accretion rates and the observed mm continuum flux.

preprint2022arXiv

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, which is composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units driven by knowledge sharing; (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions, output formats, data distributions, etc. Therefore, we develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrent training. SIMT lays the foundation of pre-training with large-scale multi-task multi-domain datasets and proves essential for stable training in our GPPF experiments. Excitingly, the exhaustive experiments show that our GPPF-R50 model achieves significant improvements of 2.5-5.8 over a strong baseline of the 8 pre-training tasks in GPPF-15M and harvests a range of SOTAs over the 22 downstream tasks with similar computation budgets. We also validate the generalization ability of GPPF to SOTA vision transformers with consistent improvements. These solid experimental results fully prove the effective knowledge learning, storing, sharing, and transfer provided by our novel GPPF framework.

preprint2022arXiv

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Modeling temporal information for both detection and tracking in a unified framework has been proved a promising solution to video instance segmentation (VIS). However, how to effectively incorporate the temporal information into an online model remains an open problem. In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way. In detail, IAI employs a novel identification module to predict identification number for tracking instances explicitly. For passing temporal information cross frame, IAI utilizes an association module which combines current features and past embeddings. Notably, IAI can be integrated with different image models. We conduct extensive experiments on three VIS benchmarks. IAI outperforms all the online competitors on YouTube-VIS-2019 (ResNet-101 43.7 mAP) and YouTube-VIS-2021 (ResNet-50 38.0 mAP). Surprisingly, on the more challenging OVIS, IAI achieves SOTA performance (20.6 mAP). Code is available at https://github.com/zfonemore/IAI

preprint2022arXiv

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

Recently, increasing efforts have been focused on Weakly Supervised Scene Graph Generation (WSSGG). The mainstream solutions for WSSGG typically follow the same pipeline: they first align text entities in the weak image-level supervisions (e.g., unlocalized relation triplets or captions) with image regions, and then train SGG models in a fully-supervised manner with aligned instance-level "pseudo" labels. However, we argue that most existing WSSGG works only focus on object-consistency, which means the grounded regions should have the same object category label as text entities. They neglect another basic requirement for an ideal alignment: interaction-consistency, which means the grounded region pairs should have the same interactions (i.e., visual relations) as text entity pairs. Hence, in this paper, we propose to enhance a simple grounding module with both object-aware and interaction-aware knowledge to acquire more reliable pseudo labels. To better leverage these two types of knowledge, we regard them as two teachers and fuse their generated targets to guide the training process of our grounding module. Specifically, we design two different strategies to adaptively assign weights to different teachers by assessing their reliability on each training sample. Extensive experiments have demonstrated that our method consistently improves WSSGG performance on various kinds of weak supervision.

preprint2022arXiv

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Sign language is a window through which differently-abled people express their feelings and emotions. However, it remains challenging for people to learn sign language in a short time. To address this real-world challenge, in this work, we study the motion transfer system, which can transfer the user photo to the sign language video of specific words. In particular, the appearance content of the output video comes from the provided user image, while the motion of the video is extracted from the specified tutorial video. We observe two primary limitations in adopting the state-of-the-art motion transfer methods to sign language generation: (1) Existing motion transfer works ignore the prior geometrical knowledge of the human body. (2) The previous image animation methods only take image pairs as input in the training stage, which could not fully exploit the temporal information within videos. In an attempt to address the above-mentioned limitations, we propose Structure-aware Temporal Consistency Network (STCNet) to jointly optimize the prior structure of the human body with the temporal consistency for sign language video generation. There are two main contributions in this paper. (1) We harness a fine-grained skeleton detector to provide prior knowledge of the body keypoints. In this way, we ensure the keypoint movement stays in a valid range and make the model more explainable and robust. (2) We introduce two cycle-consistency losses, i.e., short-term cycle loss and long-term cycle loss, which are used to ensure the continuity of the generated video. We optimize the two losses and keypoint detector network in an end-to-end manner.

preprint2022arXiv

Krylov complexity and orthogonal polynomials

Krylov complexity measures operator growth with respect to a basis, which is adapted to the Heisenberg time evolution. The construction of that basis relies on the Lanczos algorithm, also known as the recursion method. The mathematics of Krylov complexity can be described in terms of orthogonal polynomials. We provide a pedagogical introduction to the subject and work out analytically a number of examples involving the classical orthogonal polynomials, polynomials of the Hahn class, and the Tricomi-Carlitz polynomials.
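The Lanczos construction mentioned above can be sketched numerically for a small system. This is a minimal illustration, not code from the paper: it assumes a finite-dimensional Hermitian Hamiltonian `H`, a Hermitian initial operator `O`, the infinite-temperature inner product $(A|B) = \mathrm{Tr}(A^\dagger B)/d$, and the Liouvillian $\mathcal{L}A = [H, A]$. The function name and parameters are hypothetical.

```python
import numpy as np

def lanczos_coefficients(H, O, max_steps=10, tol=1e-10):
    """Return the Lanczos coefficients b_1, b_2, ... for operator O under H.

    Assumptions (not from the source): infinite-temperature inner product
    (A|B) = Tr(A^dagger B) / d and Liouvillian L(A) = [H, A].
    """
    d = H.shape[0]

    def inner(A, B):
        return np.trace(A.conj().T @ B) / d

    def liouvillian(A):
        return H @ A - A @ H

    O_prev = np.zeros_like(O, dtype=complex)
    O_curr = O.astype(complex) / np.sqrt(inner(O, O).real)  # normalize O_0
    b_prev = 0.0
    coefficients = []
    for _ in range(max_steps):
        # Three-term recursion: A_n = L(O_{n-1}) - b_{n-1} O_{n-2}
        A = liouvillian(O_curr) - b_prev * O_prev
        b = np.sqrt(inner(A, A).real)
        if b < tol:  # Krylov space exhausted
            break
        coefficients.append(b)
        O_prev, O_curr, b_prev = O_curr, A / b, b
    return coefficients

# Single-qubit example: H = sigma_z, O = sigma_x.
sigma_x = np.array([[0, 1], [1, 0]])
sigma_z = np.array([[1, 0], [0, -1]])
b_n = lanczos_coefficients(sigma_z, sigma_x)
print(b_n)
```

In this single-qubit case the Krylov space is two-dimensional (spanned by $\sigma_x$ and $\sigma_y$), so the recursion stops after one nonzero coefficient.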

preprint2022arXiv

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

The Scene Graph Generation (SGG) task aims to detect all the objects and their pairwise visual relationships in a given image. Although SGG has achieved remarkable progress over the last few years, almost all existing SGG models follow the same training paradigm: they treat both object and predicate classification in SGG as a single-label classification problem, and the ground-truths are one-hot target labels. However, this prevalent training paradigm has overlooked two characteristics of current SGG datasets: 1) For positive samples, some specific subject-object instances may have multiple reasonable predicates. 2) For negative samples, there are numerous missing annotations. Regardless of the two characteristics, SGG models are easy to be confused and make wrong predictions. To this end, we propose a novel model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG. Specifically, LS-KD dynamically generates a soft label for each subject-object instance by fusing a predicted Label Semantic Distribution (LSD) with its original one-hot target label. LSD reflects the correlations between this instance and multiple predicate categories. Meanwhile, we propose two different strategies to predict LSD: iterative self-KD and synchronous self-KD. Extensive ablations and results on three SGG tasks have attested to the superiority and generality of our proposed LS-KD, which can consistently achieve decent trade-off performance between different predicate categories.
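The soft-label idea described above can be illustrated with a short sketch. The abstract does not state how the Label Semantic Distribution is fused with the one-hot target, so the convex combination and the weight `alpha` below are assumptions made for illustration only, and `fuse_soft_label` is a hypothetical name.

```python
import numpy as np

def fuse_soft_label(one_hot, lsd, alpha=0.5):
    """Blend a one-hot target with a predicted distribution over predicates.

    Assumption: a convex combination with weight `alpha` on the one-hot
    label; the actual fusion rule used by LS-KD may differ.
    """
    one_hot = np.asarray(one_hot, dtype=float)
    lsd = np.asarray(lsd, dtype=float)
    return alpha * one_hot + (1.0 - alpha) * lsd

# Three predicate classes; the annotated predicate is class 1.
one_hot_target = [0.0, 1.0, 0.0]
predicted_lsd = [0.2, 0.5, 0.3]  # hypothetical predicted distribution
soft_label = fuse_soft_label(one_hot_target, predicted_lsd, alpha=0.5)
print(soft_label)
```

Because both inputs sum to one, the fused label remains a valid probability distribution while assigning some mass to other plausible predicates.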

preprint2022arXiv

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a locality-aware inter-and intra-video reconstruction framework that fills in three missing pieces, i.e., instance discrimination, location awareness, and spatial compactness, of the self-supervised correspondence learning puzzle. First, unlike most existing efforts, which focus on intra-video self-supervision only, we exploit cross-video affinities as extra negative samples within a unified, inter-and intra-video reconstruction scheme. This enables instance discriminative representation learning by contrasting desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching, and design a position shifting strategy to remove the side-effect of position encoding during inter-video affinity computation, making our LIIR location-sensitive. Third, to make full use of the spatial continuity nature of video data, we impose a compactness-based constraint on correspondence matching, yielding more sparse and reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks including objects, semantic parts, and keypoints.

preprint2022arXiv

Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data

Inferring the stereo structure of objects in the real world is a challenging yet practical task. To equip deep models with this ability usually requires abundant 3D supervision which is hard to acquire. It is promising that we can simply benefit from synthetic data, where pairwise ground-truth is easy to access. Nevertheless, the domain gaps are nontrivial considering the variant texture, shape and context. To overcome these difficulties, we propose a Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed VPAN. To generalize the model towards a real scenario, we propose to fulfill several aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct stereo-shape of target by transforming embeddings into the desired manifold. Extensive experiments on several benchmarks demonstrate the effectiveness and robustness of the proposed method in learning the 3D shape manifold from synthetic data via a single-view. The proposed method outperforms state-of-the-arts on Pix3D dataset with IoU 0.292 and CD 0.108, and reaches IoU 0.329 and CD 0.104 on Pascal 3D+.

preprint2022arXiv

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation. However, one key challenge remains: existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images. To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints. By leveraging the underlying 3D geometry information of generated images, i.e., depth and camera transformation matrix, we explicitly establish stereo correspondence between views to perform multi-view joint optimization. In particular, we enforce the photometric consistency between pairs of views and integrate a stereo mixup mechanism into the training process, encouraging the model to reason about the correct 3D shape. Besides, we design a two-stage training strategy with feature-level multi-view joint optimization to improve the image quality. Extensive experiments on three datasets demonstrate that MVCGAN achieves the state-of-the-art performance for 3D-aware image synthesis.

preprint2022arXiv

Non-Abelian nonsymmorphic chiral symmetries

The Hofstadter model exemplifies a large class of physical systems characterized by particles hopping on a lattice immersed in a gauge field. Recent advancements on various synthetic platforms have enabled highly-controllable simulations of such systems with tailored gauge fields featuring complex spatial textures. These synthetic gauge fields could introduce synthetic symmetries that do not appear in electronic materials. Here, in an SU(2) non-Abelian Hofstadter model, we theoretically show the emergence of multiple nonsymmorphic chiral symmetries, which combine an internal unitary anti-symmetry with fractional spatial translation. Depending on the values of the gauge fields, the nonsymmorphic chiral symmetries can exhibit non-Abelian algebra and protect Kramer quartet states in the bulk band structure, creating general four-fold degeneracy at all momenta. These nonsymmorphic chiral symmetries protect double Dirac semimetals at zero energy, which become gapped into quantum confined insulating phases upon introducing a boundary. Moreover, the parity of the system size can determine whether the resulting insulating phase is trivial or topological. Our work indicates a pathway for creating topology via synthetic symmetries emergent from synthetic gauge fields.

preprint2022arXiv

Observations of the Very Young Type Ia Supernova 2019np with Early-excess Emission

Early-time radiative signals from type Ia supernovae (SNe Ia) can provide important constraints on the explosion mechanism and the progenitor system. We present observations and analysis of SN 2019np, a nearby SN Ia discovered within 1-2 days after the explosion. Follow-up observations were conducted in optical, ultraviolet, and near-infrared bands, covering the phases from $\sim-$16.7 days to $\sim$+367.8 days relative to its $B-$band peak luminosity. The photometric and spectral evolution of SN 2019np resembles the average behavior of normal SNe Ia. The absolute B-band peak magnitude and the post-peak decline rate are $M_{\rm max}(B)=-19.52 \pm 0.47$mag and $Δm_{\rm15}(B) =1.04 \pm 0.04$mag, respectively. No hydrogen line has been detected in the near-infrared and nebular-phase spectra of SN 2019np. Assuming that the $^{56}$Ni powering the light curve is centrally located, we find that the bolometric light curve of SN 2019np shows a flux excess up to 5.0% in the early phase compared to the radiative diffusion model. Such extra radiation perhaps suggests the presence of an additional energy source beyond the radioactive decay of central nickel. Comparing the observed color evolution with that predicted by different models such as interactions of SN ejecta with circumstellar matter (CSM)/companion star, a double-detonation explosion from a sub-Chandrasekhar mass white dwarf (WD), and surface $^{56}$Ni mixing, the last of these is favored.

preprint2022arXiv

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.

preprint2022arXiv

ReGO: Reference-Guided Outpainting for Scenery Image

We aim to tackle the challenging yet practical scenery image outpainting task in this work. Recently, generative adversarial learning has significantly advanced image outpainting by producing semantically consistent content for the given image. However, the existing methods always suffer from blurry texture and artifacts in the generated part, making the overall outpainting results lack authenticity. To overcome this weakness, this work investigates a principled way to synthesize texture-rich results by borrowing pixels from its neighbors (i.e., reference images), named Reference-Guided Outpainting (ReGO). In particular, ReGO designs an Adaptive Content Selection (ACS) module to transfer the pixels of reference images for texture compensation of the target one. To prevent the style of the generated part from being affected by the reference images, a style ranking loss is further proposed to augment ReGO to synthesize style-consistent results. Extensive experiments on two popular benchmarks, NS6K \cite{yangzx} and NS8K \cite{wang}, demonstrate the effectiveness of our ReGO. Our code will be made publicly available.

preprint2022arXiv

Results and findings of the 2021 Image Similarity Challenge

The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark to evaluate recent image copy detection methods. There were 200 participants in the competition. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe image crops or hiding into unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.

preprint2022arXiv

Search for anisotropic gravitational-wave backgrounds using data from Advanced LIGO and Advanced Virgo's first three observing runs

We report results from searches for anisotropic stochastic gravitational-wave backgrounds using data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. For the first time, we include Virgo data in our analysis and run our search with a new efficient pipeline called PyStoch on data folded over one sidereal day. We use gravitational-wave radiometry (broadband and narrow band) to produce sky maps of stochastic gravitational-wave backgrounds and to search for gravitational waves from point sources. A spherical harmonic decomposition method is employed to look for gravitational-wave emission from spatially-extended sources. Neither technique found evidence of gravitational-wave signals. Hence we derive 95% confidence-level upper limit sky maps on the gravitational-wave energy flux from broadband point sources, ranging from $F_{α, Θ} < {\rm (0.013 - 7.6)} \times 10^{-8} {\rm erg \, cm^{-2} \, s^{-1} \, Hz^{-1}},$ and on the (normalized) gravitational-wave energy density spectrum from extended sources, ranging from $Ω_{α, Θ} < {\rm (0.57 - 9.3)} \times 10^{-9} \, {\rm sr^{-1}}$, depending on direction ($Θ$) and spectral index ($α$). These limits improve upon previous limits by factors of $2.9 - 3.5$. We also set 95% confidence-level upper limits on the frequency-dependent strain amplitudes of quasimonochromatic gravitational waves coming from three interesting targets, Scorpius X-1, SN 1987A and the Galactic Center, with the best upper limits in the range $h_0 < {\rm (1.7-2.1)} \times 10^{-25},$ a factor of $\geq 2.0$ improvement compared to previous stochastic radiometer searches.

preprint2022arXiv

Search for Axion(-like) Particles in Heavy-Ion Collisions

We propose a novel way to search for axion(-like) particles in heavy-ion collisions using prompt photons as the probe and the property of conversion between photon and axion(-like) particles under a strong magnetic field generated in the non-central collisions. The expected result reveals that a new phase space region of the coupling constant for photon and axion(-like) particles can be covered in the future high energy nuclear colliders.

preprint2022arXiv

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

In this paper, we propose an efficient and generalizable framework based on deep convolutional neural network (CNN) for multi-source remote sensing data joint classification. While recent methods are mostly based on multi-stream architectures, we use group convolution to construct equivalent network architectures efficiently within a single-stream network. We further adopt and improve dynamic grouping convolution (DGConv) to make group convolution hyperparameters, and thus the overall network architecture, learnable during network training. The proposed method therefore can theoretically adjust any modern CNN model to any multi-source remote sensing data set, and can potentially avoid sub-optimal solutions caused by manually decided architecture hyperparameters. In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark data sets (i.e., Houston2018 data, Berlin data, and MUUFL data). Experimental results demonstrate the effectiveness of the proposed single-stream CNNs, and in particular ResNet18-DGConv improves the state-of-the-art classification overall accuracy (OA) on the HS-SAR Berlin data set from $62.23\%$ to $68.21\%$. In the experiments we have two interesting findings. First, using DGConv generally reduces test OA variance. Second, multi-stream is harmful to model performance if imposed on the first few layers, but becomes beneficial if applied to deeper layers. Altogether, the findings imply that multi-stream architecture, instead of being a strictly necessary component in deep learning models for multi-source remote sensing data, essentially plays the role of model regularizer. Our code is publicly available at https://github.com/yyyyangyi/Multi-source-RS-DGConv. We hope our work can inspire novel research in the future.
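The claim that group convolution can reproduce a multi-stream architecture inside a single stream can be checked on a toy case. The sketch below is not the authors' DGConv code: it uses a 1x1 convolution written as a matrix product, two hypothetical input sources with two channels each, and random weights, and shows that two separate streams give the same output as one layer whose weight matrix is block-diagonal (the structure a group convolution with two groups imposes).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources, each with 2 channels over 5 pixels.
x_source_a = rng.normal(size=(2, 5))
x_source_b = rng.normal(size=(2, 5))

# One 1x1 convolution per stream (2 input -> 2 output channels each).
w_stream_a = rng.normal(size=(2, 2))
w_stream_b = rng.normal(size=(2, 2))

# Two-stream network: process each source separately, then stack outputs.
y_two_stream = np.concatenate(
    [w_stream_a @ x_source_a, w_stream_b @ x_source_b], axis=0
)

# Single-stream network: stack the inputs and apply one block-diagonal
# weight, which is what a group convolution with groups=2 computes.
w_grouped = np.zeros((4, 4))
w_grouped[:2, :2] = w_stream_a
w_grouped[2:, 2:] = w_stream_b
y_single_stream = w_grouped @ np.concatenate([x_source_a, x_source_b], axis=0)

print(np.allclose(y_two_stream, y_single_stream))
```

Filling in the off-diagonal blocks of `w_grouped` would let channels from the two sources mix, which is the kind of grouping decision the abstract describes as learnable.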

preprint2022arXiv

Spectropolarimetry of the Thermonuclear Supernova 2021rhu: High Calcium Polarization 79 Days After Peak Luminosity

We report spectropolarimetric observations of the Type Ia supernova (SN) 2021rhu at four epochs: $-$7, +0, +36, and +79 days relative to its $B$-band maximum luminosity. A wavelength-dependent continuum polarization peaking at $3890 \pm 93$ Angstroms and reaching a level of $p_{\rm max}=1.78\% \pm 0.02\%$ was found. The peak of the polarization curve is bluer than is typical in the Milky Way, indicating a larger proportion of small dust grains along the sightline to the SN. After removing the interstellar polarization, we found a pronounced increase of the polarization in the Ca II near-infrared triplet, from $\sim$0.3% at day $-$7 to $\sim$2.5% at day +79. No temporal evolution in high-resolution flux spectra across the Na I D and Ca II H&K features was seen from days +39 to +74, indicating that the late-time increase in polarization is intrinsic to the SN as opposed to being caused by scattering of SN photons in circumstellar or interstellar matter. We suggest that an explanation for the late-time rise of the Ca II near-infrared triplet polarization may be the alignment of calcium atoms in a weak magnetic field through optical excitation/pumping by anisotropic radiation from the SN.

preprint2022arXiv

Spectropolarimetry of the tidal disruption event AT 2019qiz: a quasispherical reprocessing layer

We present optical spectropolarimetry of the tidal disruption event (TDE) AT 2019qiz on days $+0$ and $+29$ relative to maximum brightness. Continuum polarization, which informs the shape of the electron-scattering surface, was found to be consistent with 0 per cent at peak brightness. On day $+29$, the continuum polarization rose to $\sim 1$ per cent, making this the first reported spectropolarimetric evolution of a TDE. These findings are incompatible with a naked eccentric disc that lacks significant mass outflow. Instead, the spectropolarimetry paints a picture wherein, at maximum brightness, high-frequency emission from the accretion disc is reprocessed into the optical band by a nearly spherical, optically thick, electron-scattering photosphere located far away from the black hole. We estimate the radius of the scattering photosphere to be $\sim 100\rm\, au$ at maximum brightness -- significantly larger than the tidal radius ($\sim 1\rm\, au$) and the thermalisation radius ($\sim 30\rm\, au$) where the optical continuum is formed. A month later, as the fallback rate drops and the scattering photosphere recedes, the continuum polarization increases, revealing a moderately aspherical interior. We also see evidence for smaller-scale density variations in the scattering photosphere, inferred from the scatter of the data in the Stokes $q-u$ plane. On day $+29$, the H$α$ emission-line peak is depolarized to $\sim 0.3$ per cent (compared to $\sim 1$ per cent continuum polarization), and displays a gradual rise toward the line's redder wavelengths. This observation indicates the H$α$ line formed near the electron-scattering radius.

preprint2022arXiv

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

Voice conversion aims to generate new speech with the source content and a target voice style. In this paper, we focus on one general setting, i.e., non-parallel many-to-many voice conversion, which is close to the real-world scenario. As the name implies, non-parallel many-to-many voice conversion does not require paired source and reference speech and can be applied to arbitrary voice transfer. In recent years, Generative Adversarial Networks (GANs) and other techniques such as Conditional Variational Autoencoders (CVAEs) have made considerable progress in this field. However, due to the sophistication of voice conversion, the style similarity of the converted speech is still unsatisfactory. Inspired by the inherent structure of the mel-spectrogram, we propose a new voice conversion framework, i.e., Subband-based Generative Adversarial Network for Voice Conversion (SGAN-VC). SGAN-VC converts each subband content of the source speech separately by explicitly utilizing the spatial characteristics between different subbands. SGAN-VC contains one style encoder, one content encoder, and one decoder. In particular, the style encoder network is designed to learn style codes for different subbands of the target speaker. The content encoder network captures the content information of the source speech. Finally, the decoder generates particular subband content. In addition, we propose a pitch-shift module to fine-tune the pitch of the source speaker, making the converted tone more accurate and explainable. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance on VCTK Corpus and AISHELL3 datasets both qualitatively and quantitatively, whether on seen or unseen data. Furthermore, the content intelligibility of SGAN-VC on unseen data even exceeds that of StarGANv2-VC with ASR network assistance.

preprint2022arXiv

The exact SL(K+3,C) symmetry of string theory

By using the on-shell recursion relation of string scattering amplitudes (SSA), we show that all n-point SSA of the open bosonic string theory can be expressed in terms of the Lauricella functions. This result extends the previous exact SL(K+3,C) symmetry of the 4-point Lauricella SSA (LSSA) of three tachyons and one arbitrary string state to the whole tree-level open bosonic string theory. Moreover, we present three applications of the SL(K+3,C) symmetry on the SSA. They are the solvability of all SSA in terms of one amplitude, the existence of iteration relations among residues of a given SSA so as to soften its hard scattering behavior, and finally the re-derivation of infinite linear relations among hard SSA [12].

preprint2022arXiv

The ringing of quantum corrected Schwarzschild black hole with GUP

Schwarzschild black holes with quantum corrections are studied under scalar field perturbations and electromagnetic field perturbations to analyze the effect of the correction term on the potential function and quasinormal mode (QNM). In classical general relativity, spacetime is continuous and there is no so-called minimal length. The introduction of the correction term of the generalized uncertainty principle (GUP), the parameter $β$, can change the singularity structure of the black hole gauge and may lead to discretization in time and space. We apply the sixth-order WKB method to approximate the QNM of Schwarzschild black holes with quantum corrections and perform numerical analysis to derive the results of the method. Also, we find that the effective potential and QNM in scalar fields are larger than those in electromagnetic fields.

preprint2022arXiv

The Type Icn SN 2021csp: Implications for the Origins of the Fastest Supernovae and the Fates of Wolf-Rayet Stars

We present observations of SN 2021csp, the second example of a newly-identified type of supernova (Type Icn) hallmarked by strong, narrow, P Cygni carbon features at early times. The SN appears as a fast and luminous blue transient at early times, reaching a peak absolute magnitude of -20 within 3 days due to strong interaction between fast SN ejecta (v ~ 30000 km/s) and a massive, dense, fast-moving C/O wind shed by the WC-like progenitor months before explosion. The narrow line features disappear from the spectrum 10-20 days after explosion and are replaced by a blue continuum dominated by broad Fe features, reminiscent of Type Ibn and IIn supernovae and indicative of weaker interaction with more extended H/He-poor material. The transient then abruptly fades ~60 days post-explosion when interaction ceases. Deep limits at later phases suggest minimal heavy-element nucleosynthesis, a low ejecta mass, or both, and imply an origin distinct from that of classical Type Ic supernovae. We place SN 2021csp in context with other fast-evolving interacting transients, and discuss various progenitor scenarios: an ultrastripped progenitor star, a pulsational pair-instability eruption, or a jet-driven fallback supernova from a Wolf-Rayet star. The fallback scenario would naturally explain the similarity between these events and radio-loud fast transients, and suggests a picture in which most stars massive enough to undergo a WR phase collapse directly to black holes at the end of their lives.

preprint2022arXiv

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Backdoor attacks pose a new threat to NLP models. A standard strategy to construct poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original label to a target label. This strategy comes with a severe flaw of being easily detected from both the trigger and the label perspectives: the trigger injected, which is usually a rare word, leads to an abnormal natural language expression, and thus can be easily detected by a defense model; the changed target label leads the example to be mistakenly labeled and thus can be easily detected by manual inspections. To deal with this issue, in this paper, we propose a new strategy to perform textual backdoor attacks which do not require an external trigger, and the poisoned samples are correctly labeled. The core idea of the proposed strategy is to construct clean-labeled examples, whose labels are correct but can lead to test label changes when fused with the training set. To generate poisoned clean-labeled examples, we propose a sentence generation model based on the genetic algorithm to cater to the non-differentiable characteristic of text data. Extensive experiments demonstrate that the proposed attacking strategy is not only effective, but more importantly, hard to defend due to its triggerless and clean-labeled nature. Our work marks the first step towards developing triggerless attacking strategies in NLP.

preprint2022arXiv

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation. NAS has been explosively studied to automate the discovery of top-performer neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works start to explore indicators that can predict a network's performance without training. However, they either leveraged limited properties of deep networks, or the benefits of their training-free indicators are not applied to more extensive search methods. By rigorous correlation analysis, we present a unified framework to understand and accelerate NAS, by disentangling "TEG" characteristics of searched networks - Trainability, Expressivity, Generalization - all assessed in a training-free manner. The TEG indicators could be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance from our TEG-NAS framework, leading to both improved search accuracy and over 56% reduction in search time cost. Moreover, we visualize search trajectories on three landscapes of "TEG" characteristics, observing that while a good local minimum is easier to find on NAS-Bench-201 given its simple topology, balancing "TEG" characteristics is much harder on the DARTS search space due to its complex landscape geometry. Our code is available at https://github.com/VITA-Group/TEGNAS.

preprint2022arXiv

Unified Transformer Tracker for Object Tracking

As an important area in computer vision, object tracking has formed two separate communities that respectively study Single Object Tracking (SOT) and Multiple Object Tracking (MOT). However, current methods in one tracking scenario are not easily adapted to the other due to the divergent training datasets and tracking objects of the two tasks. Although UniTrack \cite{wang2021different} demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking. In this work, we present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. A track transformer is developed in our UTT to track the target in both SOT and MOT. The correlation between the target and tracking frame features is exploited to localize the target. We demonstrate that both SOT and MOT tasks can be solved within this framework. The model can be simultaneously end-to-end trained by alternately optimizing the SOT and MOT objectives on the datasets of individual tasks. Extensive experiments are conducted on several benchmarks with a unified model trained on SOT and MOT datasets. Code will be available at https://github.com/Flowerfan/Trackron.

preprint2022arXiv

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

Product retrieval is of great importance in the ecommerce domain. This paper introduces our 1st-place solution in the eBay eProduct Visual Search Challenge (FGVC9), which features an ensemble of about 20 models drawn from vision models and vision-language models. While model ensembling is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor in our superiority. Specifically, for the vision models, we use a two-stage training pipeline which first learns from the coarse labels provided in the training set and then conducts fine-grained self-supervised training, yielding a coarse-to-fine metric learning manner. For the vision-language models, we use the textual description of the training image as the supervision signal for fine-tuning the image encoder (feature extractor). With these designs, our solution achieves 0.7623 MAR@10, ranking first among all the competitors. The code is available at https://github.com/WangWenhao0716/V2L.

preprint2022arXiv

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

One fundamental challenge of vehicle re-identification (re-id) is to learn robust and discriminative visual representation, given the significant intra-class vehicle variations across different camera views. As the existing vehicle datasets are limited in terms of training images and viewpoints, we propose to build a unique large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets, and design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet. The first stage of our approach is to learn the generic representation for all domains (i.e., source vehicle datasets) by training with the conventional classification loss. This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain. The second stage is to fine-tune the trained model purely based on the target vehicle set, by minimizing the distribution discrepancy between our VehicleNet and any target domain. We discuss our proposed multi-source dataset VehicleNet and evaluate the effectiveness of the two-stage progressive representation learning through extensive experiments. We achieve the state-of-the-art accuracy of 86.07% mAP on the private test set of AICity Challenge, and competitive results on two other public vehicle re-id datasets, i.e., VeRi-776 and VehicleID. We hope this new VehicleNet dataset and the learned robust representations can pave the way for vehicle re-id in real-world environments.

preprint2022arXiv

Visual Abductive Reasoning

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required to not only describe what is observed, but also infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, Reasoner (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, that yields discriminative representations for the premise and hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences. The prediction scores of the sentences are used to guide cross-sentence information flow in the cascaded reasoning procedure. Our VAR benchmarking results show that Reasoner surpasses many famous video-language models, while still being far behind human performance. This work is expected to foster future efforts in the reasoning-beyond-observation paradigm.

preprint2021arXiv

A general framework for scintillation in nanophotonics

Bombardment of materials by high-energy particles (e.g., electrons, nuclei, X- and $γ$-ray photons) often leads to light emission, known generally as scintillation. Scintillation is ubiquitous and enjoys widespread applications in many areas such as medical imaging, X-ray non-destructive inspection, night vision, electron microscopy, and high-energy particle detectors. A large body of research focuses on finding new materials optimized for brighter, faster, and more controlled scintillation. Here, we develop a fundamentally different approach based on integrating nanophotonic structures into scintillators to enhance their emission. To start, we develop a unified and ab initio theory of nanophotonic scintillators that accounts for the key aspects of scintillation: the energy loss by high-energy particles, as well as the light emission by non-equilibrium electrons in arbitrary nanostructured optical systems. This theoretical framework allows us, for the first time, to experimentally demonstrate nearly an order-of-magnitude enhancement of scintillation, in both electron-induced, and X-ray-induced scintillation. Our theory also allows the discovery of structures that could eventually achieve several orders-of-magnitude scintillation enhancement. The framework and results shown here should enable the development of a new class of brighter, faster, and higher-resolution scintillators with tailored and optimized performances - with many potential applications where scintillators are used.

preprint2021arXiv

A Survey on Concept Factorization: From Shallow to Deep Representation Learning

The quality of learned features by representation learning determines the performance of learning algorithms and the related application tasks (such as high-dimensional data clustering). As a relatively new paradigm for representation learning, Concept Factorization (CF) has attracted a great deal of interest in the areas of machine learning and data mining for over a decade. Many effective CF-based methods have been proposed based on different perspectives and properties, but it remains difficult to grasp the essential connections and figure out the underlying explanatory factors from existing studies. In this paper, we therefore survey the recent advances on CF methodologies and the potential benchmarks by categorizing and summarizing the current methods. Specifically, we first review the root CF method, and then explore the advancement of CF-based representation learning ranging from shallow to deep/multilayer cases. We also introduce the potential application areas of CF-based methods. Finally, we point out some future directions for studying CF-based representation learning. Overall, this survey provides an insightful overview of both the theoretical basis and current developments in the field of CF, which can also help interested researchers to understand the current trends of CF and find the most appropriate CF techniques to deal with particular applications.

preprint2021arXiv

Bilinear equations in Darboux transformations by Boson-Fermion correspondence

Bilinear equations are an important property of integrable nonlinear evolution equations. Many famous research objects in mathematical physics, such as Gromov-Witten invariants, can be described in terms of bilinear equations to show their connections with integrable systems. In this paper, we mainly discuss the bilinear equations of the transformed tau functions under the successive applications of the Darboux transformations for the KP hierarchy, the modified KP hierarchy (Kupershmidt-Kiso version) and the BKP hierarchy, by the method of the Boson-Fermion correspondence. The Darboux transformations are considered in the Fermionic picture, by multiplying the different Fermionic fields on the tau functions. Here the Fermionic fields correspond to the (adjoint) eigenfunctions, whose changes under the Darboux transformations are shown to be the ones of the squared eigenfunction potentials in the Bosonic picture, used in the spectral representations of the (adjoint) eigenfunctions. Then the successive applications of the Darboux transformations are given in the Fermionic picture. Based upon this, some new bilinear equations in the Darboux chain are derived, besides the ones of the $(l-l')$-th modified KP hierarchy. The corresponding examples of these new bilinear equations are given.

preprint2021arXiv

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

The goal of person search is to localize and match query persons from scene images. For high efficiency, one-step methods have been developed to jointly handle the pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. One is the mutual interference between the optimization objectives of multiple sub-tasks. The other is the sub-optimal identification feature learning caused by small batch size when end-to-end training. To overcome these problems, we propose a decoupled and memory-reinforced network (DMRNet). Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework. Further, we build a memory-reinforced mechanism to boost the identification feature learning. By queuing the identification features of recently accessed instances into a memory bank, the mechanism augments the similarity pair construction for pairwise metric learning. For better encoding consistency of the stored features, a slow-moving average of the network is applied for extracting these features. In this way, the dual networks reinforce each other and converge to robust solution states. Experimentally, the proposed method obtains 93.2% and 46.9% mAP on CUHK-SYSU and PRW datasets, which exceeds all the existing one-step methods.
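The two mechanisms named in the abstract above, a queue of recent identification features and a slow-moving average of the network, can be sketched as follows. This is a simplified illustration under assumed names (`FeatureQueue`, `ema_update`) with toy values; it is not the DMRNet implementation.

```python
from collections import deque


def ema_update(slow, fast, momentum=0.999):
    """Move the slow (feature-extracting) weights a small step toward the fast (trained) weights."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]


class FeatureQueue:
    """Fixed-size memory bank: the oldest features are dropped when capacity is reached."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def push(self, identity, feature):
        self.items.append((identity, feature))

    def pairs_for(self, identity):
        """Return stored features with the same identity (positives) and the rest (negatives)."""
        positives = [f for i, f in self.items if i == identity]
        negatives = [f for i, f in self.items if i != identity]
        return positives, negatives


# Toy usage: four instances pushed into a bank that holds three.
bank = FeatureQueue(capacity=3)
for pid, feat in [(1, [0.1]), (2, [0.2]), (1, [0.3]), (3, [0.4])]:
    bank.push(pid, feat)

positives, negatives = bank.pairs_for(1)

# A large step (momentum=0.5) is used here only to make the arithmetic easy to follow.
slow_weights = ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.5)
```

The queue enlarges the pool of similarity pairs beyond the current batch, while the slowly updated weights keep features stored at different iterations comparable with one another.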

preprint2021arXiv

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with sparse keypoints, is learnt and progressively improved over the network feature pyramid for robustness. Then, the difficult pixel grouping problem is cast as an easier, multi-person joint assembling task. By formulating joint association as maximum-weight bipartite matching, a differentiable solution is developed to exploit projected gradient descent and Dykstra's cyclic projection algorithm. This makes our method end-to-end trainable and allows back-propagating the grouping error to directly supervise multi-granularity human representation learning. This is distinguished from current bottom-up human parsers or pose estimators which require sophisticated post-processing or heuristic greedy algorithms. Experiments on three instance-aware human parsing datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.

preprint2021arXiv

Instance-Invariant Domain Adaptive Object Detection via Progressive Disentanglement

Most state-of-the-art methods of object detection suffer from poor generalization ability when the training and test data are from different domains, e.g., with different styles. To address this problem, previous methods mainly use holistic representations to align feature-level and pixel-level distributions of different domains, which may neglect the instance-level characteristics of objects in images. Besides, when transferring detection ability across different domains, it is important to obtain the instance-level features that are domain-invariant, instead of the styles that are domain-specific. Therefore, in order to extract instance-invariant features, we should disentangle the domain-invariant features from the domain-specific features. To this end, a progressive disentangled framework is first proposed to solve domain adaptive object detection. Particularly, based on disentangled learning used for feature decomposition, we devise two disentangled layers to decompose domain-invariant and domain-specific features. The instance-invariant features are then extracted based on the domain-invariant features. Finally, to enhance the disentanglement, a three-stage training mechanism including multiple loss functions is devised to optimize our model. In the experiments, we verify the effectiveness of our method on three domain-shift scenes. Our method is respectively 2.3%, 3.6%, and 4.0% higher than the baseline method \cite{saito2019strong}.

preprint2021arXiv

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

People can easily imagine the potential sound while seeing an event. This natural synchronization between audio and visual signals reveals their intrinsic correlations. To this end, we propose to learn the audio-visual correlations from the perspective of cross-modal generation in a self-supervised manner. The learned correlations can then be readily applied in multiple downstream tasks such as audio-visual cross-modal localization and retrieval. We introduce a novel Variational AutoEncoder (VAE) framework that consists of Multiple encoders and a Shared decoder (MS-VAE) with an additional Wasserstein distance constraint to tackle the problem. Extensive experiments demonstrate that the optimized latent representation of the proposed MS-VAE can effectively learn the audio-visual correlations and can be readily applied in multiple audio-visual downstream tasks to achieve competitive performance even without any given label information during training.

preprint2021arXiv

Learning to Anticipate Egocentric Actions by Imagination

Anticipating actions before they are executed is crucial for a wide range of practical applications, including autonomous driving and robotics. In this paper, we study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos. Previous approaches focus on summarizing the observed content and directly predicting future action based on past observations. We believe it would benefit action anticipation if we could mine some cues to compensate for the missing information of the unobserved frames. We then propose to decompose the action anticipation into a series of future feature predictions. We imagine how the visual feature changes in the near future and then predict future action labels based on these imagined representations. Unlike prior work, our ImagineRNN is optimized in a contrastive learning way instead of feature regression. We utilize a proxy task to train the ImagineRNN, i.e., selecting the correct future states from distractors. We further improve ImagineRNN by residual anticipation, i.e., changing its target to predicting the feature difference of adjacent frames instead of the frame content. This encourages the network to focus on our target, i.e., the future action, as the difference between adjacent frame features is more important for forecasting the future. Extensive experiments on two large-scale egocentric action datasets validate the effectiveness of our method. Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
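The proxy task described in the abstract above (picking the correct future state among distractors) and the residual target can be sketched with a softmax cross-entropy over similarity scores. This is an assumed, generic contrastive formulation with hypothetical names and toy vectors; it does not reproduce ImagineRNN or its exact loss.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(predicted, candidates, target_index, temperature=1.0):
    """Cross-entropy of choosing the true future feature among the candidates.

    Scores are dot products between the predicted feature and each candidate;
    the log-sum-exp is computed in a numerically stable way.
    """
    logits = [dot(predicted, c) / temperature for c in candidates]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[target_index]


def residual_target(current, nxt):
    """Residual anticipation target: the feature difference between adjacent frames."""
    return [n - c for c, n in zip(current, nxt)]


# Toy example: the first candidate is the true future state, the others are distractors.
predicted = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

loss_correct = contrastive_loss(predicted, candidates, target_index=0)
loss_wrong = contrastive_loss(predicted, candidates, target_index=2)
delta = residual_target([1.0, 2.0], [1.5, 1.0])
```

In this toy case the loss is lower when the prediction aligns with the true future state than when it aligns with a distractor, which is the signal a contrastive objective provides in place of direct feature regression.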

preprint2021arXiv

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation

Existing image segmentation networks mainly leverage large-scale labeled datasets to attain high accuracy. However, labeling medical images is very expensive since it requires sophisticated expert knowledge. Thus, it is more desirable to employ only a few labeled data in pursuing high segmentation performance. In this paper, we develop a data augmentation method for one-shot brain magnetic resonance imaging (MRI) image segmentation which exploits only one labeled MRI image (named atlas) and a few unlabeled images. In particular, we propose to learn the probability distributions of deformations (including shapes and intensities) of different unlabeled MRI images with respect to the atlas via 3D variational autoencoders (VAEs). In this manner, our method is able to exploit the learned distributions of image deformations to generate new authentic brain MRI images, and the number of generated samples will be sufficient to train a deep segmentation network. Furthermore, we introduce a new standard segmentation benchmark to evaluate the generalization performance of a segmentation network through a cross-dataset setting (collected from different sources). Extensive experiments demonstrate that our method outperforms the state-of-the-art one-shot medical segmentation methods. Our code has been released at https://github.com/dyh127/Modeling-the-Probabilistic-Distribution-of-Unlabeled-Data.

preprint2021arXiv

One-Shot Neural Architecture Search via Self-Evaluated Template Network

Neural architecture search (NAS) aims to automate the search procedure of architecture instead of manual design. Even if recent NAS approaches finish the search within days, lengthy training is still required for a specific architecture candidate to get the parameters for its accurate evaluation. Recently one-shot NAS methods are proposed to largely squeeze the tedious training process by sharing parameters across candidates. In this way, the parameters for each candidate can be directly extracted from the shared parameters instead of training them from scratch. However, they have no sense of which candidate will perform better until evaluation so that the candidates to evaluate are randomly sampled and the top-1 candidate is considered the best. In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates. SETN consists of two components: (1) an evaluator, which learns to indicate the probability of each individual architecture being likely to have a lower validation loss. The candidates for evaluation can thus be selectively sampled according to this evaluator. (2) a template network, which shares parameters among all candidates to amortize the training cost of generated candidates. In experiments, the architecture found by SETN achieves state-of-the-art performance on CIFAR and ImageNet benchmarks within comparable computation costs. Code is publicly available on GitHub: https://github.com/D-X-Y/AutoDL-Projects.

preprint2021arXiv

Sketch-Guided Scenery Image Outpainting

The outpainting results produced by existing approaches are often too random to meet users' requirements. In this work, we take image outpainting one step further by allowing users to obtain customized outpainting results using sketches as guidance. To this end, we propose an encoder-decoder based network to conduct sketch-guided outpainting, where two alignment modules are adopted to force the generated content to be realistic and consistent with the provided sketches. First, we apply a holistic alignment module to make the synthesized part similar to the real one from a global view. Second, we reversely produce the sketches from the synthesized part and encourage them to be consistent with the ground-truth ones using a sketch alignment module. In this way, the learned generator is compelled to pay more attention to fine details and to be sensitive to the guiding sketches. To our knowledge, this work is the first attempt to explore the challenging yet meaningful task of conditional scenery image outpainting. We conduct extensive experiments on two collected benchmarks to qualitatively and quantitatively validate the effectiveness of our approach compared with other state-of-the-art generative models.

preprint2021arXiv

Supervision by Registration and Triangulation for Landmark Detection

We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from massive amounts of freely available unlabeled data and not be limited by the quality and quantity of manual human annotations. To utilize unlabeled data, there are two key observations: (1) the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (2) the detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that do not require manual labeling, and thus they can be leveraged to augment existing training data during detector training. End-to-end training is made possible by differentiable registration and 3D triangulation modules. Experiments with 11 datasets and a newly proposed metric to measure precision demonstrate accuracy and precision improvements in landmark detection on both images and video. Code is available at https://github.com/D-X-Y/landmark-detection.

preprint2021arXiv

Toggling Near-field Directionality via Polarization Control of Surface Waves

Directional excitation of guidance modes is central to many applications ranging from light harvesting, optical information processing to quantum optical technology. Of paramount interest, especially, the active control of near-field directionality provides a new paradigm for the real-time on-chip manipulation of light. Here we find that for a given dipolar source, its near-field directionality can be toggled efficiently via tailoring the polarization of surface waves that are excited, for example, via tuning the chemical potential of graphene in a graphene-metasurface waveguide. This finding enables a feasible scheme for the active near-field directionality. Counterintuitively, we reveal that this scheme can transform a circular electric/magnetic dipole into a Huygens dipole in the near-field coupling. Moreover, for Janus dipoles, this scheme enables us to actively flip their near-field coupling and non-coupling faces.

preprint2020arXiv

Adaptive Exploration for Unsupervised Person Re-Identification

Due to domain bias, directly deploying a deep person re-identification (re-ID) model trained on one dataset often yields considerably poor accuracy on another dataset. In this paper, we propose an Adaptive Exploration (AE) method to address the domain-shift problem for re-ID in an unsupervised manner. Specifically, in the target domain, the re-ID model is induced to 1) maximize distances between all person images and 2) minimize distances between similar person images. In the first case, by treating each person image as an individual class, a non-parametric classifier with a feature memory is exploited to encourage person images to move far away from each other. In the second case, according to a similarity threshold, our method adaptively selects neighborhoods for each person image in the feature space. By treating these similar person images as the same class, the non-parametric classifier forces them to stay closer. However, a problem of the adaptive selection is that, when an image has too many neighborhoods, it is more likely to attract other images as its neighborhoods. As a result, a minority of images may select a large number of neighborhoods while a majority of images have only a few neighborhoods. To address this issue, we additionally integrate a balance strategy into the adaptive selection. We evaluate our methods with two protocols. The first one is called "target-only re-ID", in which only the unlabeled target data is used for training. The second one is called "domain adaptive re-ID", in which both the source data and the target data are used during training. Experimental results on large-scale re-ID datasets demonstrate the effectiveness of our method. Our code has been released at https://github.com/dyh127/Adaptive-Exploration-for-Unsupervised-Person-Re-Identification.
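
The threshold-based neighborhood selection mentioned above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the similarity measure, the function names `cosine` and `select_neighbors` are hypothetical, and the balance strategy from the abstract is not modeled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_neighbors(features, threshold):
    """For each image, list the indices of other images whose similarity
    to it reaches the threshold. The neighbor count varies per image,
    which is what makes the selection adaptive."""
    neighbors = []
    for i, f_i in enumerate(features):
        row = [j for j, f_j in enumerate(features)
               if j != i and cosine(f_i, f_j) >= threshold]
        neighbors.append(row)
    return neighbors
```

Because the count of selected neighbors depends on how densely an image is surrounded in feature space, some images can collect many neighbors and others none, which is the imbalance the abstract's balance strategy is meant to address.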

preprint2020arXiv

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

We address the problem of One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample is available when learning to adapt. This setting is realistic but more challenging, and conventional adaptation approaches are prone to failure in it due to the scarcity of unlabeled target data. To this end, we propose a novel Adversarial Style Mining (ASM) approach, which combines the style transfer module and the task-specific module in an adversarial manner. Specifically, the style transfer module iteratively searches for harder stylized images around the one-shot target sample according to the current learning state, leading the task model to explore potential styles that are difficult to solve in the almost unseen target domain, thus boosting the adaptation performance in a data-scarce scenario. The adversarial learning framework makes the style transfer module and the task-specific module benefit each other during the competition. Extensive experiments on both cross-domain classification and segmentation benchmarks verify that ASM achieves state-of-the-art adaptation performance under the challenging one-shot setting.

preprint2020arXiv

Analytic Study of Magnetic Catalysis in Holographic QCD

We explore the effect of the magnetic field on the QCD phase transition through the AdS/CFT correspondence. By introducing an anisotropic magnetic field in the Einstein-Maxwell-Scalar system, a family of analytic solutions is obtained by the potential reconstruction method, where the contribution of the magnetic field in the blackening background can be analytically derived. After fixing the kinetic gauge function by requiring the linear Regge spectrum of $J/ψ$ mesons, the contribution of the magnetic field to the phase diagram can be demonstrated. The results show that the transition temperature rises as the magnetic field increases, which is the so-called magnetic catalysis effect. However, if the system is in a strong enough magnetic environment, the transition temperature decreases, displaying the inverse catalysis effect.

preprint2020arXiv

Analytic Study on Chiral Phase Transition in Holographic QCD

The chiral symmetry breaking ($χ_{SB}$) is one of the most fundamental problems in QCD. In this paper, we calculate quark condensation analytically in a holographic QCD model dual to the Einstein-Maxwell-Dilaton (EMD) system coupled to a probe scalar field. We find that the black hole phase transition in the EMD system seriously affects $χ_{SB}$. At small chemical potential, $χ_{SB}$ behaves as a crossover. For large chemical potential $μ>μ_c$, $χ_{SB}$ becomes first order with exactly the same transition temperature as the black hole phase transition by a bypass mechanism. The phase diagram we obtained is qualitatively consistent with the recent results from lattice QCD simulations and NJL models.

preprint2020arXiv

Angle-Based Cost-Sensitive Multicategory Classification

Many real-world classification problems come with costs that vary for different types of misclassification. It is thus important to develop cost-sensitive classifiers that minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to construct K classification functions for a K-class problem and remove the redundancy by imposing a sum-to-zero constraint. However, such a method usually results in higher computational complexity and inefficient algorithms. In this paper, we propose a novel angle-based cost-sensitive classification framework for multicategory classification without the sum-to-zero constraint. Loss functions that are included in the angle-based cost-sensitive classification framework are further shown to be Fisher consistent. To show the usefulness of the framework, two cost-sensitive multicategory boosting algorithms are derived as concrete instances. Numerical experiments demonstrate that the proposed boosting algorithms yield competitive classification performance against other existing boosting approaches.

preprint2020arXiv

Collaborative Video Object Segmentation by Foreground-Background Integration

This paper investigates the principles of embedding learning to tackle the challenging task of semi-supervised video object segmentation. Different from previous practices that only explore embedding learning using pixels from the foreground object(s), we consider that the background should be treated equally and thus propose the Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach. Our CFBI implicitly imposes the feature embeddings from the target foreground object and its corresponding background to be contrastive, promoting the segmentation results accordingly. With the feature embeddings from both foreground and background, our CFBI performs the matching process between the reference and the predicted sequence at both the pixel and instance levels, making CFBI robust to various object scales. We conduct extensive experiments on three popular benchmarks, i.e., DAVIS 2016, DAVIS 2017, and YouTube-VOS. Our CFBI achieves a performance (J&F) of 89.4%, 81.9%, and 81.4%, respectively, outperforming all the other state-of-the-art methods. Code: https://github.com/z-x-yang/CFBI.

preprint2020arXiv

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

With the arising concerns for the AI systems provided with direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources. To this end, in this paper, we introduce a new task called video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames. Specifically, one of the intelligent agents - Q-BOT - is given two static frames from the beginning and the end of the video, as well as a finite number of opportunities to ask relevant natural language questions before describing the unseen video. A-BOT, the other agent who has already seen the entire video, assists Q-BOT to accomplish the goal by providing answers to those questions. We propose a QA-Cooperative Network with a dynamic dialog history update learning mechanism to transfer knowledge from A-BOT to Q-BOT, thus helping Q-BOT to better describe the video. Extensive experiments demonstrate that Q-BOT can effectively learn to describe an unseen video by the proposed model and the cooperative learning method, achieving the promising performance where Q-BOT is given the full ground truth history dialog.

preprint2020arXiv

Dialog Intent Induction with Deep Multi-View Clustering

We introduce the dialog intent induction task and present a novel deep multi-view clustering approach to tackle the problem. Dialog intent induction aims at discovering user intents from user query utterances in human-human conversations such as dialogs between customer support agents and customers. Motivated by the intuition that a dialog intent is not only expressed in the user query utterance but also captured in the rest of the dialog, we split a conversation into two independent views and exploit multi-view clustering techniques for inducing the dialog intent. In particular, we propose alternating-view k-means (AV-KMEANS) for joint multi-view representation learning and clustering analysis. The key innovation is that the instance-view representations are updated iteratively by predicting the cluster assignment obtained from the alternative view, so that the multi-view representations of the instances lead to similar cluster assignments. Experiments on two public datasets show that AV-KMEANS can induce better dialog intent clusters than state-of-the-art unsupervised representation learning methods and standard multi-view clustering approaches.

preprint2020arXiv

Direct measurement of temporal correlations above the spin-glass transition by coherent resonant magnetic x-ray spectroscopy

In the 1970s a new paradigm was introduced that interacting quenched systems, such as a spin-glass, have a phase transition in which long time memory of spatial patterns is realized without spatial correlations. The principal methods to study the spin-glass transition, besides some elaborate and elegant theoretical constructions, have been numerical computer simulations and neutron spin echo measurements. We show here that the dynamical correlations of the spin-glass transition are embedded in measurements of the four-spin correlations at very long times. This information is directly available in the temporal correlations of the intensity, which encode the spin-orientation memory, obtained by the technique of resonant magnetic x-ray photon correlation spectroscopy (RM-XPCS). We have implemented this method to observe and accurately characterize the critical slowing down of the spin orientation fluctuations in the classic metallic spin glass alloy Cu(Mn) over time scales of 1 to 1000 secs. Our method opens the way for studying phase transitions in systems such as spin ices, and quantum spin liquids, as well as the structural glass transition.

preprint2020arXiv

DONet: Dual Objective Networks for Skin Lesion Segmentation

Skin lesion segmentation is a crucial step in the computer-aided diagnosis of dermoscopic images. In the last few years, deep learning based semantic segmentation methods have significantly advanced skin lesion segmentation results. However, the current performance is still unsatisfactory due to challenging factors such as the large variety of lesion scales and the ambiguous difference between lesion region and background. In this paper, we propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve skin lesion segmentation. Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives. Concretely, the two objectives are defined by different loss functions. In this way, the two decoders are encouraged to produce differentiated probability maps to match different optimization targets, resulting in complementary predictions. The complementary information learned by these two objectives is further aggregated to make the final prediction, by which the uncertainty existing in segmentation maps can be significantly alleviated. Besides, to address the challenge of the large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM) to model the complex correlation among skin lesions, where features with different scale contexts are efficiently integrated to form a more robust representation. Extensive experiments on two popular benchmarks demonstrate the effectiveness of the proposed DONet. In particular, our DONet achieves Dice scores of 0.881 and 0.931 on ISIC 2018 and $\text{PH}^2$, respectively. Code will be made publicly available.

preprint2020arXiv

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

Though action recognition in videos has achieved great success recently, it remains a challenging task due to the massive computational cost. Designing lightweight networks is a possible solution, but it may degrade the recognition performance. In this paper, we propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos. The dynamic inference approach can be realized along the network depth, along the number of input video frames, or jointly in an input-wise and network depth-wise manner. In a nutshell, we treat the input frames and the network depth of the computational graph as a 2-dimensional grid, and several checkpoints, each with a prediction module, are placed on this grid in advance. The inference is carried out progressively on the grid by following some predefined route; whenever the inference process comes across a checkpoint, an early prediction can be made depending on whether the early stop criterion is met. As a proof of concept, we instantiate three dynamic inference frameworks using two well-known backbone CNNs. In these instances, we overcome the drawback of limited temporal coverage resulting from an early prediction with a novel frame permutation scheme, and alleviate the conflict between progressive computation and video temporal relation modeling by introducing an online temporal shift module. Extensive experiments are conducted to thoroughly analyze the effectiveness of our ideas and to inspire future research efforts. Results on various datasets also demonstrate the superiority of our approach.
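
The checkpoint-and-route idea above can be sketched in a few lines. This is a hedged illustration, not the paper's framework: the early stop criterion is assumed here to be a simple confidence threshold on the top class probability, and `predict_at` is a hypothetical callback standing in for the prediction module at each checkpoint.

```python
def dynamic_inference(route, predict_at, confidence_threshold):
    """Walk a predefined route of (num_frames, depth) checkpoints and stop
    at the first one whose top class probability reaches the threshold.

    Returns (predicted_label, checkpoint), or None for an empty route.
    If no checkpoint is confident enough, the last one on the route is used.
    """
    result = None
    for checkpoint in route:
        probs = predict_at(checkpoint)  # class probabilities at this point
        label = max(range(len(probs)), key=probs.__getitem__)
        result = (label, checkpoint)
        if probs[label] >= confidence_threshold:
            break  # assumed early stop criterion: confident enough
    return result
```

Easy videos exit at an early checkpoint with few frames and shallow depth, while hard videos continue along the route, which is how the average cost drops without a fixed lightweight network.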

preprint2020arXiv

Early Ultra-Violet observations of type IIn supernovae constrain the asphericity of their circumstellar material

We present a survey of the early evolution of 12 Type IIn supernovae (SNe IIn) in the Ultra-Violet (UV) and visible light. We use this survey to constrain the geometry of the circumstellar material (CSM) surrounding SN IIn explosions, which may shed light on their progenitor diversity. In order to distinguish between aspherical and spherical circumstellar material (CSM), we estimate the blackbody radius temporal evolution of the SNe IIn of our sample, following the method introduced by Soumagnac et al. We find that higher luminosity objects tend to show evidence for aspherical CSM. Depending on whether this correlation is due to physical reasons or to some selection bias, we derive a lower limit between 35% and 66% on the fraction of SNe IIn showing evidence for aspherical CSM. This result suggests that asphericity of the CSM surrounding SNe IIn is common - consistent with data from resolved images of stars undergoing considerable mass loss. It should be taken into account for more realistic modelling of these events.

preprint2020arXiv

FinBERT: A Pretrained Language Model for Financial Communications

Contextual pretrained language models, such as BERT (Devlin et al., 2019), have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled text resources. The financial sector also accumulates a large amount of financial communication text. However, there are no pretrained finance-specific language models available. In this work, we address the need by pretraining a financial domain-specific BERT model, FinBERT, using large-scale financial communication corpora. Experiments on three financial sentiment classification tasks confirm the advantage of FinBERT over the generic-domain BERT model. The code and pretrained models are available at https://github.com/yya518/FinBERT. We hope this will be useful for practitioners and researchers working on financial NLP tasks.

preprint2020arXiv

Gated Channel Transformation for Visual Recognition

In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine whether neurons compete or cooperate, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and the computational complexity. This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without much increase in parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics.

preprint2020arXiv

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

In this paper, we present an approach, namely Lexical Semantic Image Completion (LSIC), that may have potential applications in art, design, and heritage conservation, among several others. Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To permit a completion process that is both grounded and controllable, we advocate generating results faithful to both the visual and the lexical semantic context, i.e., the description of the holes or blank regions left in the image (e.g., a hole description). One major challenge for LSIC comes from modeling and aligning the structure of the visual-semantic context and translating across different modalities. We term this process structure completion, which is realized by multi-grained reasoning blocks in our model. Another challenge relates to unimodal biases, which occur when the model generates plausible results without using the textual description. This can happen because the annotated captions for an image are often semantically equivalent in existing datasets, and thus there is only one paired text for a masked image in training. We devise an unsupervised unpaired-creation learning path besides the over-explored paired-reconstruction path, as well as a multi-stage training strategy to mitigate the insufficiency of labeled data. We conduct extensive quantitative and qualitative experiments as well as ablation studies, which reveal the efficacy of our proposed LSIC.

preprint2020arXiv

Inter-Image Communication for Weakly Supervised Localization

Weakly supervised localization aims at finding target object regions using only image-level supervision. However, localization maps extracted from classification networks are often not accurate due to the lack of fine pixel-level supervision. In this paper, we propose to leverage pixel-level similarities across different objects for learning more accurate object locations in a complementary way. In particular, two kinds of constraints are proposed to promote the consistency of object features within the same categories. The first constraint is to learn the stochastic feature consistency among discriminative pixels that are randomly sampled from different images within a batch. The discriminative information embedded in one image can be leveraged to benefit its counterpart with inter-image communication. The second constraint is to learn the global consistency of object features throughout the entire dataset. We learn a feature center for each category and realize the global feature consistency by forcing the object features to approach class-specific centers. The global centers are actively updated during the training process. The two constraints can benefit each other to learn consistent pixel-level features within the same categories, and finally improve the quality of localization maps. We conduct extensive experiments on two popular benchmarks, i.e., ILSVRC and CUB-200-2011. Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set, surpassing the current state-of-the-art method by a large margin. The code is available at https://github.com/xiaomengyc/I2C.

preprint2020arXiv

Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer

Nowadays, deep learning techniques are widely used for lane detection, but application in low-light conditions remains a challenge to this day. Although multi-task learning and contextual-information-based methods have been proposed to solve the problem, they either require additional manual annotations or introduce extra inference overhead, respectively. In this paper, we propose a style-transfer-based data enhancement method, which uses Generative Adversarial Networks (GANs) to generate images in low-light conditions and thereby increases the environmental adaptability of the lane detector. Our solution consists of three parts: the proposed SIM-CycleGAN, light conditions style transfer, and a lane detection network. It requires neither additional manual annotations nor extra inference overhead. We validated our method on the lane detection benchmark CULane using ERFNet. Empirically, the lane detection model trained using our method demonstrated adaptability in low-light conditions and robustness in complex scenarios. Our code for this paper will be made publicly available.

preprint2020arXiv

Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning

We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset. Our framework considers cooperative optimization of shared weights between models for source and target tasks, and adjusts the constituent loss weights adaptively. The adaptation of the weights is based on a reinforcement learning (RL) selection policy, guided with a performance metric on the target validation set. We demonstrate that L2TL outperforms fine-tuning baselines and other adaptive transfer learning methods on eight datasets. In the regimes of small-scale target datasets and significant label mismatch between source and target datasets, L2TL shows particularly large benefits.

preprint2020arXiv

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS task in a more efficient way. Our MA-Net integrates the interaction and the propagation operations into a single network, which significantly improves the efficiency of iVOS in the multi-round interaction scheme. More importantly, we propose a simple yet effective memory aggregation mechanism to record the informative knowledge from the previous interaction rounds, greatly improving the robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, our MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming the state-of-the-art methods by more than 2.7%.

preprint2020arXiv

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search

Neural architecture search (NAS) has achieved breakthrough success in a great number of applications in the past few years. It could be time to take a step back and analyze the good and bad aspects of the field of NAS. A variety of algorithms search architectures under different search spaces. These searched architectures are trained using different setups, e.g., hyper-parameters, data augmentation, regularization. This raises a comparability problem when comparing the performance of various NAS algorithms. NAS-Bench-101 has shown success in alleviating this problem. In this work, we propose an extension to NAS-Bench-101: NAS-Bench-201, with a different search space, results on multiple datasets, and more diagnostic information. NAS-Bench-201 has a fixed search space and provides a unified benchmark for almost any up-to-date NAS algorithm. The design of our search space is inspired by the one used in the most popular cell-based searching algorithms, where a cell is represented as a DAG. Each edge here is associated with an operation selected from a predefined operation set. For it to be applicable to all NAS algorithms, the search space defined in NAS-Bench-201 includes all possible architectures generated by 4 nodes and 5 associated operation options, which results in 15,625 candidates in total. The training log and the performance for each architecture candidate are provided for three datasets. This allows researchers to avoid unnecessary repetitive training for selected candidates and focus solely on the search algorithm itself. The training time saved for every candidate also largely improves the efficiency of many methods. We provide additional diagnostic information such as fine-grained loss and accuracy, which can inspire new designs of NAS algorithms. In further support, we have analyzed it from many aspects and benchmarked 10 recent NAS algorithms.
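
The stated total of 15,625 candidates can be checked with a short calculation. The sketch below assumes a densely connected cell, where every earlier node has an edge to every later node; with 4 nodes that gives 6 edges, and 5 choices per edge gives 5^6 = 15,625, which matches the abstract. The operation names are hypothetical placeholders, since the abstract only states that there are 5 options.

```python
from itertools import product

# Placeholder operation names; only their count (5) comes from the abstract.
OPS = ["op0", "op1", "op2", "op3", "op4"]

def edges(num_nodes):
    """All (source, target) pairs with source < target: a dense DAG (assumed)."""
    return [(i, j) for j in range(num_nodes) for i in range(j)]

def count_architectures(num_nodes, num_ops):
    """Each edge independently picks one operation."""
    return num_ops ** len(edges(num_nodes))

def enumerate_architectures(num_nodes, ops):
    """Yield every cell as a mapping from edge to chosen operation."""
    cell_edges = edges(num_nodes)
    for assignment in product(ops, repeat=len(cell_edges)):
        yield dict(zip(cell_edges, assignment))

print(count_architectures(4, len(OPS)))  # 15625
```

Because the space is this small, every candidate can be enumerated and trained once, which is what makes a lookup-table benchmark feasible.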

preprint2020arXiv

One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets

Named entity recognition (NER) is a fundamental component in the modern language understanding pipeline. Public NER resources such as annotated data and model services are available in many domains. However, given a particular downstream application, there is often no single NER resource that supports all the desired entity types, so users must leverage multiple resources with different tag sets. This paper presents a marginal distillation (MARDI) approach for training a unified NER model from resources with disjoint or heterogeneous tag sets. In contrast to recent works, MARDI merely requires access to pre-trained models rather than the original training datasets. This flexibility makes it easier to work with sensitive domains like healthcare and finance. Furthermore, our approach is general enough to integrate with different NER architectures, including local models (e.g., BiLSTM) and global models (e.g., CRF). Experiments on two benchmark datasets show that MARDI performs on par with a strong marginal CRF baseline, while being more flexible in the form of required NER resources. MARDI also sets a new state of the art on the progressive NER task, significantly outperforming the prior state-of-the-art model.

preprint2020arXiv

OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in An Open World

In this paper, we tackle the problem of discovering new classes in unlabeled visual data given labeled data from disjoint classes. Existing methods typically first pre-train a model with labeled data, and then identify new classes in unlabeled data via unsupervised clustering. However, the labeled data that provide essential knowledge are often underexplored in the second step. The challenge is that the labeled and unlabeled examples are from non-overlapping classes, which makes it difficult to build the learning relationship between them. In this work, we introduce OpenMix to mix the unlabeled examples from an open set and the labeled examples from known classes, where their non-overlapping labels and pseudo-labels are simultaneously mixed into a joint label distribution. OpenMix dynamically compounds examples in two ways. First, we produce mixed training images by incorporating labeled examples with unlabeled examples. With the benefits of unique prior knowledge in novel class discovery, the generated pseudo-labels will be more credible than the original unlabeled predictions. As a result, OpenMix helps to prevent the model from overfitting on unlabeled samples that may be assigned with wrong pseudo-labels. Second, the first way encourages the unlabeled examples with high class-probabilities to have considerable accuracy. We introduce these examples as reliable anchors and further integrate them with unlabeled samples. This enables us to generate more combinations in unlabeled examples and exploit finer object relations among the new classes. Experiments on three classification datasets demonstrate the effectiveness of the proposed OpenMix, which is superior to state-of-the-art methods in novel class discovery.
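
The idea of mixing labels and pseudo-labels into a joint distribution can be illustrated with a mixup-style sketch. This is an assumption-laden illustration rather than the authors' implementation: the function name, the convex combination with a single coefficient `lam`, and the layout of the joint label vector (known classes first, novel classes after) are all hypothetical choices.

```python
import numpy as np


def mix_labeled_unlabeled(x_lab, y_lab, x_unl, p_unl, num_known, lam):
    """Mixup-style blend of one labeled and one unlabeled example (sketch).

    x_lab, x_unl : arrays of the same shape (e.g. images).
    y_lab        : integer class index among the known classes.
    p_unl        : pseudo-label distribution over the novel classes.
    Returns the mixed input and a label over known + novel classes.
    """
    num_novel = p_unl.shape[0]
    # Labeled example: one-hot on the known block, zeros on the novel block.
    y_joint_lab = np.zeros(num_known + num_novel)
    y_joint_lab[y_lab] = 1.0
    # Unlabeled example: zeros on the known block, pseudo-label on the novel block.
    y_joint_unl = np.concatenate([np.zeros(num_known), p_unl])
    x_mix = lam * x_lab + (1.0 - lam) * x_unl
    y_mix = lam * y_joint_lab + (1.0 - lam) * y_joint_unl
    return x_mix, y_mix


x_lab = np.ones((2, 2))
x_unl = np.zeros((2, 2))
p_unl = np.array([0.7, 0.3])  # toy pseudo-label over 2 novel classes
x_mix, y_mix = mix_labeled_unlabeled(x_lab, 1, x_unl, p_unl, num_known=3, lam=0.6)
```

In this sketch the probability mass on the known block always equals `lam` and is guaranteed to be correct, so only the remaining `1 - lam` depends on a possibly wrong pseudo-label.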

preprint2020arXiv

Progressive Local Filter Pruning for Image Retrieval Acceleration

This paper focuses on network pruning for image retrieval acceleration. Prevailing image retrieval works target discriminative feature learning, while little attention is paid to how to accelerate model inference, which should be taken into consideration in real-world practice. The challenge of pruning image retrieval models is that the middle-level feature should be preserved as much as possible. These differing requirements of retrieval and classification models make traditional pruning methods poorly suited to our task. To solve the problem, we propose a new Progressive Local Filter Pruning (PLFP) method for image retrieval acceleration. Specifically, layer by layer, we analyze the local geometric properties of each filter and select the one that can be replaced by its neighbors. Then we progressively prune the filter by gradually changing the filter weights. In this way, the representation ability of the model is preserved. To verify this, we evaluate our method on two widely-used image retrieval datasets, i.e., Oxford5k and Paris6K, and one person re-identification dataset, i.e., Market-1501. The proposed method achieves superior performance to conventional pruning methods, suggesting its effectiveness for image retrieval.

preprint2020arXiv

QCD Phase Diagram by Holography

We explore QCD phase diagram by constructing a holographic QCD model using the Einstein-Maxwell-Scalar system. The chiral transition is investigated by adding a probe scalar and confinement transition is studied by adding a probe string into the system. By interpreting the black hole phase transition in the bulk spacetime as the quarkyonic transition in the dual QCD theory and introducing the bypass mechanism for deconfinement transition, we give an explanation why chiral symmetry breaking and deconfinement transition lines coincide with each other despite their different physical origins.

preprint2020arXiv

Query-efficient Meta Attack to Deep Neural Networks

Black-box attack methods aim to infer suitable attack patterns for targeted DNN models by using only the output feedback of the models and the corresponding input queries. However, due to the lack of a prior and inefficiency in leveraging the query and feedback information, existing methods are mostly query-intensive for obtaining effective attack patterns. In this work, we propose a meta attack approach that is capable of attacking a targeted model with far fewer queries. Its high query efficiency stems from effective utilization of meta learning approaches in learning a generalizable prior abstraction from the previously observed attack patterns and exploiting such a prior to help infer attack patterns from only a few queries and outputs. Extensive experiments on MNIST, CIFAR10 and tiny-Imagenet demonstrate that our meta-attack method can remarkably reduce the number of model queries without sacrificing the attack performance. Besides, the obtained meta attacker is not restricted to a particular model but can be used easily with a fast adaptive ability to attack a variety of models. The code of our work is available at https://github.com/dydjw9/MetaAttack_ICLR2020/.

preprint2020arXiv

Research on the new form of higher-order generalized uncertainty principle in quantum system

This paper proposes a new high-order generalized uncertainty principle, which can modify the momentum operator and position operator simultaneously. Moreover, the new form of GUP is consistent with the viewpoint of the existence of the minimum length uncertainty and the maximum observable momentum proposed by the mainstream quantum gravity theory. By using the new GUP, the maximum localization state and position eigenfunction are discussed, and the corresponding conclusions are compared with the existing literature. The harmonic oscillator is further discussed at the end of this article as an example.

preprint2020arXiv

Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps

Recently, remarkable progress has been made in weakly supervised object localization (WSOL) to promote object localization maps. The common practice for evaluating these maps is indirect and coarse, i.e., obtaining tight bounding boxes which can cover high-activation regions and calculating intersection-over-union (IoU) scores between the predicted and ground-truth boxes. This measurement can evaluate the ability of localization maps to some extent, but we argue that the maps should be measured directly and finely, i.e., by comparing the maps with the ground-truth object masks pixel-wise. To enable the direct evaluation, we annotate pixel-level object masks on the ILSVRC validation set. We propose to use IoU-Threshold curves for evaluating the real quality of localization maps. Beyond the amended evaluation metric and annotated object masks, this work also introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision. We propose a two-stage approach to generate the localization maps by simply comparing the similarity of point-wise features between the high-activation pixels and the remaining pixels. Based on the predicted localization maps, we explore estimating object boundaries on a very large dataset. A hard-negative suppression loss is proposed for obtaining fine boundaries. We conduct extensive experiments on the ILSVRC and CUB benchmarks. In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC. The code and the annotated masks are released at https://github.com/xiaomengyc/SEM.
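
A pixel-wise IoU-Threshold curve of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released evaluation code: the function name, the `>=` binarization rule, and the threshold grid are hypothetical choices.

```python
import numpy as np


def iou_threshold_curve(loc_map, gt_mask, thresholds):
    """Return the pixel-wise IoU between (loc_map >= t) and gt_mask for each t."""
    gt = gt_mask.astype(bool)
    ious = []
    for t in thresholds:
        pred = loc_map >= t  # binarize the localization map at threshold t
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / union if union > 0 else 0.0)
    return np.array(ious)


# Toy 4x4 localization map whose top-left 2x2 block is the object.
loc_map = np.array([[0.9, 0.8, 0.1, 0.0],
                    [0.7, 0.6, 0.2, 0.0],
                    [0.1, 0.3, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0]])
gt_mask = np.zeros((4, 4), dtype=bool)
gt_mask[:2, :2] = True

thresholds = np.linspace(0.0, 1.0, 11)
curve = iou_threshold_curve(loc_map, gt_mask, thresholds)
```

Unlike a single box-level IoU, the curve shows how sensitive a map is to the binarization threshold: a good map keeps a high IoU over a wide range of thresholds.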

preprint2020arXiv

Revisiting EmbodiedQA: A Simple Baseline and Beyond

In Embodied Question Answering (EmbodiedQA), an agent interacts with an environment to gather the information necessary for answering user questions. Existing works have laid a solid foundation towards solving this interesting problem. But the current performance, especially in navigation, suggests that EmbodiedQA might be too challenging for contemporary approaches. In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that achieves promising performance; 2) an easier and practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions. In this new setting, we randomly place a few objects in new environments, and upgrade the agent policy by a distillation network to retain the generalization ability of the trained model. On the EmbodiedQA v1 benchmark, under the standard setting, our simple baseline achieves results that are very competitive with the state of the art; in the new setting, we find that the introduced small change in settings yields a notable gain in navigation.

preprint2020arXiv

SCExAO/CHARIS High-Contrast Imaging of Spirals and Darkening Features in the HD 34700 A Protoplanetary Disk

We present Subaru/SCExAO+CHARIS broadband ($JHK$-band) integral field spectroscopy of HD 34700 A. CHARIS data recover HD 34700 A's disk ring and confirm multiple spirals discovered in Monnier et al. (2019). We set limits on substellar companions of $\sim12\ M_{\rm Jup}$ at $0\farcs3$ (in the ring gap) and $\sim5\ M_{\rm Jup}$ at $0\farcs75$ (outside the ring). The data reveal darkening effects on the ring and spiral, although we do not identify the origin of each feature such as shadows or physical features related to the outer spirals. Geometric albedos converted from the surface brightness suggest a higher scale height and/or prominently abundant sub-micron dust at position angle between $\sim45^\circ$ and $90^\circ$. Spiral fitting resulted in very large pitch angles ($\sim30-50^\circ$) and a stellar flyby of HD 34700 B or infall from a possible envelope is perhaps a reasonable scenario to explain the large pitch angles.

preprint2020arXiv

SF-Net: Single-Frame Supervision for Temporal Action Localization

In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL). To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action. This can significantly reduce the labor cost of obtaining full supervision which requires annotating the action boundary. Compared to the weak supervision that only annotates the video-level label, the single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead. To make full use of such single-frame supervision, we propose a unified system called SF-Net. First, we propose to predict an actionness score for each video frame. Along with a typical category score, the actionness score can provide comprehensive information about the occurrence of a potential action and aid the temporal boundary refinement during inference. Second, we mine pseudo action and background frames based on the single-frame annotations. We identify pseudo action frames by adaptively expanding each annotated single frame to its nearby, contextual frames and we mine pseudo background frames from all the unannotated frames across multiple videos. Together with the ground-truth labeled frames, these pseudo-labeled frames are further used for training the classifier. In extensive experiments on THUMOS14, GTEA, and BEOID, SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization. Notably, SF-Net achieves comparable results to its fully-supervised counterpart which requires much more resource intensive annotations. The code is available at https://github.com/Flowerfan/SF-Net.

preprint2020arXiv

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

One-shot image semantic segmentation poses the challenging task of recognizing object regions from unseen categories with only one annotated example as supervision. In this paper, we propose a simple yet effective Similarity Guidance network to tackle the One-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category. To obtain a robust representative feature of the support image, we first adopt a masked average pooling strategy for producing the guidance features by only taking the pixels belonging to the support image into account. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adapted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework which can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves an mIoU score of 46.3%, surpassing the baseline methods.
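
The two operations named in the abstract, masked average pooling and cosine similarity, can be sketched in a few lines of NumPy. This is a minimal illustration, not the SG-One code: the function names, the (C, H, W) feature layout, and the toy features are assumptions, and a real model would take the features from a convolutional backbone.

```python
import numpy as np


def masked_average_pooling(feat, mask, eps=1e-8):
    """Average the (C, H, W) features over the pixels where mask is 1.

    Returns a (C,) guidance vector for the annotated support object.
    """
    mask = mask.astype(feat.dtype)
    return (feat * mask[None]).sum(axis=(1, 2)) / (mask.sum() + eps)


def cosine_similarity_map(query_feat, guidance, eps=1e-8):
    """Cosine similarity between the guidance vector and every query pixel.

    query_feat: (C, H, W); guidance: (C,). Returns an (H, W) map.
    """
    dot = np.tensordot(guidance, query_feat, axes=([0], [0]))
    norms = np.linalg.norm(query_feat, axis=0) * np.linalg.norm(guidance)
    return dot / (norms + eps)


# Toy support features: channel 0 fires on the object row, channel 1 elsewhere.
support_feat = np.zeros((2, 2, 2))
support_feat[0, 0, :] = 1.0
support_feat[1, 1, :] = 1.0
support_mask = np.array([[1, 1], [0, 0]])

# Toy query features: one object-like pixel and one background-like pixel.
query_feat = np.zeros((2, 2, 2))
query_feat[0, 0, 0] = 2.0
query_feat[1, 1, 1] = 3.0

guidance = masked_average_pooling(support_feat, support_mask)
sim = cosine_similarity_map(query_feat, guidance)
```

Because cosine similarity ignores feature magnitude, the object-like query pixel scores close to 1 even though its activation is twice as large as in the support image.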

preprint2020arXiv

Similarity-preserving Image-image Domain Adaptation for Person Re-identification

This article studies the domain adaptation problem in person re-identification (re-ID) under a "learning via translation" framework, consisting of two components, 1) translating the labeled images from the source to the target domain in an unsupervised manner, 2) learning a re-ID model using the translated images. The objective is to preserve the underlying human identity information after image translation, so that translated images with labels are effective for feature learning on the target domain. To this end, we propose a similarity preserving generative adversarial network (SPGAN) and its end-to-end trainable version, eSPGAN. Both aiming at similarity preserving, SPGAN enforces this property by heuristic constraints, while eSPGAN does so by optimally facilitating the re-ID model learning. More specifically, SPGAN separately undertakes the two components in the "learning via translation" framework. It first preserves two types of unsupervised similarity, namely, self-similarity of an image before and after translation, and domain-dissimilarity of a translated source image and a target image. It then learns a re-ID model using existing networks. In comparison, eSPGAN seamlessly integrates image translation and re-ID model learning. During the end-to-end training of eSPGAN, re-ID learning guides image translation to preserve the underlying identity information of an image. Meanwhile, image translation improves re-ID learning by providing identity-preserving training samples of the target domain style. In the experiment, we show that identities of the fake images generated by SPGAN and eSPGAN are well preserved. Based on this, we report the new state-of-the-art domain adaptation results on two large-scale person re-ID datasets.

preprint2020arXiv

Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning

A small ISO and a short exposure time are usually used to capture an image in backlit or low-light conditions, which results in an image that has negligible motion blur and little noise but looks dark. In this paper, a single image brightening algorithm is introduced to brighten such an image. The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times. The virtual images are first generated via intensity mapping functions (IMFs), which are computed using camera response functions (CRFs); this is a model-driven approach. Both virtual images are then enhanced using a data-driven approach, i.e., a residual convolutional neural network, to approach the ground truth images. The model-driven approach and the data-driven one complement each other in the proposed hybrid learning framework. The final brightened image is obtained by fusing the original image and the two virtual images via a multi-scale exposure fusion algorithm with properly defined weights. Experimental results show that the proposed brightening algorithm outperforms existing algorithms in terms of the MEF-SSIM metric.

preprint2020arXiv

Sub-micron single-particle perovskite plasmonic nanolasers at room temperature

Plasmonic nanolasers have received substantial interest for their promising applications in integrated photonics, optical sensing, and biomedical imaging. To date, a room-temperature plasmonic nanolaser, submicron in all dimensions, remains elusive in the visible regime due to high metallic losses. Here, we demonstrate single-particle lasing around 2.3 eV with full-submicron, cesium lead bromide perovskite (CsPbBr3) crystals atop polymer-coated gold substrates at room temperature. With a large number (~100) of devices in total, we systematically study the lasing action of plasmonic test and photonic control groups. The smallest plasmonic laser achieved was 0.56 micrometer x 0.58 micrometer x 0.32 micrometer in size, ten-fold smaller than our smallest photonic laser. Key elements of efficient plasmonic lasing are identified as enhanced optical gain by the Purcell effect, long carrier diffusivity, a large spontaneous emission factor, and a high group index. Our results shed light on three-dimensional miniaturization of plasmonic lasers.

preprint2020arXiv

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification. However, correlation studies between the verb and the noun branches have been largely ignored. Besides, the two branches fail to exploit local features due to the absence of a position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate the understanding of an actor's interaction with the object. We introduce these features in action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features but also exploits implicit guidance about the spatio-temporal position of an on-going action. We introduce novel symbiotic attention (SA) to enable effective communication. It first normalizes the detection guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of most action-relevant information. It identifies the most valuable and discriminative feature for classification. We validate the effectiveness of our SAP quantitatively and qualitatively. Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.

preprint2020arXiv

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

The traditional object retrieval task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which supposes that the objects in an image are manually or automatically pre-cropped exactly. However, in many real-world searching scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles, etc.) are seldom accurately detected or annotated. Therefore, object-level retrieval becomes intractable without bounding-box annotation, which leads to a new but challenging topic, i.e. image-level search. In this paper, to address the image search issue, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed. 2) A novel on-line pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem, by automatically generating a number of negative pairs to restrict the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of classification task by selecting hard categories. With the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) two modules are tailored to handle different tasks separately in the integrated framework, such that the task specification is guaranteed. 2) A class-center guided HEP loss (C2HEP) by exploiting the stored class centers is proposed, such that the intra-similarity and inter-dissimilarity can be captured for ultimate retrieval. Extensive experiments on famous image-level search oriented benchmark datasets demonstrate that the proposed DC-I-Net outperforms the state-of-the-art tasks-integrated and tasks-separated image search models.

preprint2020arXiv

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

The re-ranking approach leverages high-confidence retrieved samples to refine retrieval results, and has been widely adopted as a post-processing tool for image retrieval tasks. However, we notice one main flaw of re-ranking, i.e., high computational complexity, which leads to an unaffordable time cost for real-world applications. In this paper, we revisit re-ranking and demonstrate that re-ranking can be reformulated as a high-parallelism Graph Neural Network (GNN) function. In particular, we divide the conventional re-ranking process into two phases, i.e., retrieving high-quality gallery samples and updating features. We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph. In practice, the GNN only needs to consider vertices with connected edges. Since the graph is sparse, we can efficiently update the vertex features. On the Market-1501 dataset, we accelerate the re-ranking processing from 89.2s to 9.4ms with one K40m GPU, facilitating real-time post-processing. Similarly, we observe that our method achieves comparable or even better retrieval results on the other four image retrieval benchmarks, i.e., VeRi-776, Oxford-5k, Paris-6k and University-1652, with limited time cost. Our code is publicly available.
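
The two phases described above, building a k-nearest-neighbor graph and spreading messages along its edges, can be sketched with NumPy. This is an illustration of the general idea only: the function names, the cosine similarity, the mean aggregation, and the blending weight `alpha` are assumptions, and the paper's actual update rule may differ.

```python
import numpy as np


def knn_graph(features, k):
    """Phase 1: return (N, k) indices of each sample's nearest neighbours."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def propagate(features, neighbours, alpha=0.5):
    """Phase 2: blend each feature with the mean of its neighbours' features."""
    messages = features[neighbours].mean(axis=1)  # (N, k, D) -> (N, D)
    return (1.0 - alpha) * features + alpha * messages


# Toy gallery with two clusters of two samples each.
features = np.array([[1.0, 0.0],
                     [0.9, 0.1],
                     [0.0, 1.0],
                     [0.1, 0.9]])
neighbours = knn_graph(features, k=1)
updated = propagate(features, neighbours)
```

Each vertex touches only its k neighbours, so the update costs O(N·k·D) and every row can be computed independently, which is why this formulation parallelizes well on a GPU.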

preprint2020arXiv

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn the robust feature against large viewpoint changes. Existing benchmarks can help, but are limited in the number of viewpoints. Image pairs, containing two viewpoints, e.g., satellite and ground, are usually provided, which may compromise the feature learning. Besides phone cameras and satellites, in this paper, we argue that drones could serve as the third platform to deal with the geo-localization problem. In contrast to the traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and could provide a comprehensive view when flying around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization intends to predict the location of the target place via drone-view images. On the other hand, given a satellite-view query image, drone navigation is to drive the drone to the area of interest in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model to learn the viewpoint-invariant features and also has good generalization ability in the real-world scenario.

preprint2019arXiv

Cascaded Revision Network for Novel Object Captioning

Image captioning, a challenging task where the machine automatically describes an image by sentences, has drawn significant attention in recent years. Despite the remarkable improvements of recent approaches, however, these methods are built upon a large set of training image-sentence pairs. The expensive labor efforts hence limit the captioning model to describe the wider world. In this paper, we present a novel network structure, Cascaded Revision Network, which aims at relieving the problem by equipping the model with out-of-domain knowledge. CRN first tries its best to describe an image using the existing vocabulary from in-domain knowledge. Due to the lack of out-of-domain knowledge, the caption may be inaccurate or include ambiguous words for the image with unknown (novel) objects. We propose to re-edit the primary captioning sentence by a series of cascaded operations. We introduce a perplexity predictor to find out which words are most likely to be inaccurate given the input image. Thereafter, we utilize external knowledge from a pre-trained object detection model and select more accurate words from detection results by the visual matching module. In the last step, we design a semantic matching module to ensure that the novel object is fit in the right position. By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects. We validate the proposed method with state-of-the-art performance on the held-out MSCOCO dataset as well as scale to ImageNet, demonstrating the effectiveness of this method.

preprint2019arXiv

High-Resolution Near-Infrared Polarimetry and Sub-Millimeter Imaging of FS Tau A: Possible Streamers in Misaligned Circumbinary Disk System

We analyzed the young (2.8-Myr-old) binary system FS Tau A using near-infrared (H-band) high-contrast polarimetry data from Subaru/HiCIAO and sub-millimeter CO (J=2-1) line emission data from ALMA. Both the near-infrared and sub-millimeter observations reveal several clear structures extending to $\sim$240 AU from the stars. Based on these observations at different wavelengths, we report the following discoveries. One arm-like structure detected in the near-infrared band initially extends from the south of the binary with a subsequent turn to the northeast, corresponding to two bar-like structures detected in ALMA observations with an LSRK velocity of 1.19-5.64 km/s. Another feature detected in the near-infrared band extends initially from the north of the binary, relating to an arm-like structure detected in ALMA observations with an LSRK velocity of 8.17-16.43 km/s. From their shapes and velocities, we suggest that these structures can mostly be explained by two streamers that connect the outer circumbinary disk and the central binary components. These discoveries will be helpful for understanding the evolution of streamers and circumstellar disks in young binary systems.

preprint2019arXiv

SN 2016hil-- a Type II supernova in the remote outskirts of an elliptical host and its origin

Type II supernovae (SNe) stem from the core collapse of massive ($>8\ M_{\odot}$) stars. Owing to their short lifespan, we expect a very low rate of such events in elliptical host galaxies, where the star-formation rate is low, and which mostly consist of an old stellar population. SN 2016hil (iPTF16hil) is a Type II supernova located in the extreme outskirts of an elliptical galaxy at redshift $z=0.0608$ (projected distance $27.2$ kpc). It was detected near peak brightness ($M_{r} \approx -17$ mag) 9 days after the last nondetection. SN 2016hil has some potentially peculiar properties: while presenting a characteristic spectrum, the event was unusually short lived and declined by $\sim 1.5$ mag in $< 40$ days, following an apparently double-peaked light curve. Its spectra suggest a low metallicity ($Z<0.4\ Z_{\odot}$). We place a tentative upper limit on the mass of a potential faint host at $\log(M/M_{\odot}) =7.27^{+0.43}_{-0.24}$ using deep Keck optical imaging. In light of this, we discuss the possibility of the progenitor forming locally, and other more exotic formation scenarios such as a merger or common-envelope evolution causing a time-delayed explosion. Further observations of the explosion site in the ultraviolet are needed in order to distinguish between the cases. Regardless of the origin of the transient, observing a population of such seemingly hostless Type II SNe could have many uses, including an estimate of the number of faint galaxies in a given volume, and tests of the prediction of a time-delayed population of core-collapse SNe in locations otherwise unfavorable for the detection of such events.

preprint2019arXiv

Very Long Natural Scenery Image Prediction by Outpainting

Compared to image inpainting, image outpainting has received less attention due to two challenges. The first challenge is how to keep the spatial and content consistency between generated images and the original input. The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input. To solve the two problems, we devise two modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. With this design, our network can generate highly realistic outpainting predictions effectively and efficiently. In addition, our method can generate new images of very long size while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset demonstrate the efficacy of our proposed network. The code and dataset are available from https://github.com/z-x-yang/NS-Outpainting.

preprint2018arXiv

A solid approach to biopharmaceutical stabilisation

Ensilication is a technology we developed that can physically stabilise proteins in silica without use of a pre-formed particle matrix. Stabilisation is done by tailor fitting individual proteins with a silica coat using a modified sol-gel process. Biopharmaceuticals, for example, liquid-formulated vaccines with adjuvants, have poor thermal stability. Heating or freezing impairs their potency. As a result, there is an increase in the prevalence of vaccine-preventable diseases in low-income countries even when there are means to combat them. One of the root causes lies in the problematic vaccine cold-chain distribution. We believe that ensilication can improve vaccine availability by enabling transportation without refrigeration. Here, we show that ensilication stabilises tetanus toxoid C fragment (TTCF) and demonstrate that this material can be stored and transported at ambient temperature without compromising the immunogenic properties of TTCF in vivo. TTCF is a component of the diphtheria, tetanus and pertussis (DTP) vaccine. To further our understanding of the ensilication process, and its protective effect on proteins we have studied the formation of TTCF-silica nanoparticles via time-resolved Small Angle X-ray Scattering (SAXS). Our results reveal ensilication to be a staged diffusion-limited cluster aggregation (DLCA) type reaction, induced by the presence of TTCF protein at neutral pH. Analysis of scattering data indicates tailor fitting of TTCF protein. The experimental in vivo immunisation data confirms the retention of immunogenicity after release from silica. Our results suggest that we could utilise this technology for multicomponent vaccines, therapeutics or other biopharmaceuticals that are not compatible with lyophilisation.

preprint2008arXiv

Confront Holographic QCD with Regge Trajectories of vectors and axial-vectors

We derive the general 5-dimension metric structure of the $Dp-Dq$ system in type II superstring theory, and demonstrate the physical meaning of the parameters characterizing the 5-dimension metric structure of the \textit{holographic} QCD model by relating them to the parameters describing Regge trajectories. By matching the spectra of vector mesons $\rho_1$ with deformed $Dp-Dq$ soft-wall model, we find that the spectra of vector mesons $\rho_1$ can be described very well in the soft-wall $D3-Dq$ model, i.e., $AdS_5$ soft-wall model. We then investigate how well the $AdS_5$ soft-wall model can describe the Regge trajectory of axial-vector mesons $a_1$. We find that the constant component of the 5-dimension mass square of axial-vector mesons plays an efficient role to realize the chiral symmetry breaking in the vacuum, and a small negative $z^4$ correction in the 5-dimension mass square is helpful to realize the chiral symmetry restoration in high excitation states.