Source author record

Yi Yang

Yi Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

182 works

44 topics

4 close collaborators

preprint · 2026 · arXiv

One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive $O(N)$ costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose ReQueR (Reinforcement Query Refinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner's evolving competence. ReQueR yields consistent absolute gains of 1.7% to 7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen models. Code is available at https://github.com/newera-xiao/ReQueR.

preprint · 2024 · arXiv

MS-DETR: Efficient DETR Training with Mixed Supervision

DETR accomplishes end-to-end object detection through iteratively generating multiple object candidates based on image features and promoting one candidate for each ground-truth object. The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates. We aim at improving the DETR training efficiency by explicitly supervising the candidate generation procedure through mixing one-to-one supervision and one-to-many supervision. Our approach, namely MS-DETR, is simple, and places one-to-many supervision to the object queries of the primary decoder that is used for inference. In comparison to existing DETR variants with one-to-many supervision, such as Group DETR and Hybrid DETR, our approach does not need additional decoder branches or object queries. The object queries of the primary decoder in our approach directly benefit from one-to-many supervision and thus are superior in object candidate prediction. Experimental results show that our approach outperforms related DETR variants, such as DN-DETR, Hybrid DETR, and Group DETR, and the combination with related DETR variants further improves the performance.

preprint · 2024 · arXiv

Newly Formed Dust within the Circumstellar Environment of SNIa-CSM 2018evt

Dust associated with various stellar sources in galaxies at all cosmic epochs remains a controversial topic, particularly whether supernovae (SNe) play an important role in dust production. We report evidence of dust formation in the cold, dense shell behind the ejecta-circumstellar medium (CSM) interaction in the Type Ia-CSM SN 2018evt three years after the explosion, characterized by a rise in the mid-infrared (MIR) emission accompanied by an accelerated decline in the optical radiation of the SN. Such a dust-formation picture is also corroborated by the concurrent evolution of the profiles of the Hα emission line. Our model suggests enhanced CSM dust concentration at increasing distances from the SN as compared to what can be expected from the density profile of the mass loss from a steady stellar wind. By the time of the last MIR observations at day +1041, a total of $(1.2 \pm 0.2) \times 10^{-2}\,M_\odot$ of new dust has been formed by SN 2018evt, making SN 2018evt one of the most prolific dust factories among SNe with evidence of dust formation. This unprecedented view of an intense dust production process may shed light on our understanding of dust formation throughout cosmic history.

preprint · 2024 · arXiv

Rotating black hole mimicker surrounded by the string cloud

Traversable wormholes and regular black holes usually represent completely different scenarios, but in the black bounce spacetime they can be described by the same line element, which makes this spacetime attractive. Furthermore, the black hole images taken by the EHT show that black holes have spin, so spin is an indispensable intrinsic property of black holes in the actual universe. In this work, we derive a rotating black hole mimicker surrounded by a string cloud (SC), which can be interpolated to represent regular black hole spacetime and traversable wormhole spacetime. We investigate the effect of the spin $a$ and the SC parameter $L$ on the observables (shadow radius $R_s$ and distortion $\delta_s$) and on the energy emission rate of the black hole mimicker surrounded by the SC. We find that the shadow for this spacetime is very sensitive to $L$; that is, the SC parameter can significantly increase the boundary of the shadow.

preprint · 2023 · arXiv

DR-WLC: Dimensionality Reduction cognition for object detection and pose estimation by Watching, Learning and Checking

Object detection and pose estimation are difficult tasks in robotics and autonomous driving. Existing object detection and pose estimation methods mostly train on data of the same dimensionality as the task. For example, 2D object detection usually requires a large amount of costly 2D annotation data. Using high-dimensional information to supervise lower-dimensional tasks is a feasible way to reduce dataset size. In this work, we propose DR-WLC, a dimensionality reduction cognitive model that can perform both object detection and pose estimation at the same time. The model requires only 3D models of objects and unlabeled environment images (with or without objects) for training. In addition, a bounding box generation strategy is proposed to build the relationship between the 3D model and the 2D object detection task. Experiments show that our method can accomplish these tasks without any manual annotations and that it is easy to deploy in practical applications. Source code is at https://github.com/IN2-ViAUn/DR-WLC.

preprint · 2023 · arXiv

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a 2D view, and it also selects the most likely reconstruction from the set. To deal with the challenging unsupervised generation of non-rigid shapes, we develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net. The non-rigid shape is first expressed as the sum of a coarse shape basis and a flexible shape deformation, then multiple hypotheses are generated with uncertainty modeling of the deformation part. MHR-Net is optimized with reprojection loss on the basis and the best hypothesis. Furthermore, we design a new Procrustean Residual Loss, which reduces the rigid rotations between similar shapes and further improves the performance. Experiments show that MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL and 300-VW datasets.

preprint · 2023 · arXiv

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

Neural Radiance Fields (NeRF) methods have proved effective as compact, high-quality and versatile representations for 3D scenes, and enable downstream tasks such as editing, retrieval and navigation. Various neural architectures are vying for the core structure of NeRF, including the plain Multi-Layer Perceptron (MLP), sparse tensors, low-rank tensors, hashtables and their compositions. Each of these representations has its particular set of trade-offs. For example, the hashtable-based representations admit faster training and rendering but their lack of clear geometric meaning hampers downstream tasks like spatial-relation-aware editing. In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions. PVD consequently empowers downstream applications to optimally adapt the neural representations for the task at hand in a post hoc fashion. The conversions are fast, as distillation is progressively performed on different levels of volume representations, from shallower to deeper. We also employ special treatment of density to deal with its specific numerical instability problem. Empirical evidence is presented to validate our method on the NeRF-Synthetic, LLFF and TanksAndTemples datasets. For example, with PVD, an MLP-based NeRF model can be distilled from a hashtable-based Instant-NGP model 10x to 20x faster than training the original NeRF from scratch, while achieving a superior level of synthesis quality. Code is available at https://github.com/megvii-research/AAAI2023-PVD.

preprint · 2023 · arXiv

Temporal Perceiving Video-Language Pre-training

Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the local associations between videos and texts are not modeled, restricting the pre-training models' generality, especially for tasks requiring the temporal video boundary for certain query texts. This work introduces a novel text-video localization pre-text task to enable fine-grained temporal and semantic alignment such that the trained model can accurately perceive temporal boundaries in videos given the text description. Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features. To produce temporal boundaries, frame features in several videos are manually merged into a long video sequence that interacts with a text sequence. With the localization task, our method connects the fine-grained frame representations with the word representations and implicitly distinguishes representations of different instances in the single modality. Notably, comprehensive experimental results show that our method significantly improves the state-of-the-art performance on various benchmarks, covering text-to-video retrieval, video question answering, video captioning, temporal action localization and temporal moment retrieval. The code will be released soon.

preprint · 2022 · arXiv

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

NLP researchers have proposed various word-substitute black-box attacks that can fool text classification models. In such an attack, an adversary keeps sending crafted adversarial queries to the target model until it achieves the intended outcome. State-of-the-art attack methods usually require hundreds or thousands of queries to find one adversarial example. In this paper, we study whether a sophisticated adversary can attack the system with far fewer queries. We propose a simple yet efficient method that can reduce the average number of adversarial queries by 3-30 times while maintaining attack effectiveness. This research highlights that an adversary can fool a deep NLP model at much lower cost.

preprint · 2022 · arXiv

Active Learning for Deep Visual Tracking

Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this approach is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this paper, we propose an active learning method for deep visual tracking, which selects and annotates unlabeled samples to train the deep CNN model. Under the guidance of active learning, the tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of selected samples, we propose an active learning method based on multi-frame collaboration to select the training samples that most need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest neighbor discrimination method based on the average nearest neighbor distance to screen out isolated samples and low-quality samples. Therefore, the training sample subset selected by our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, which ensures that the tracker achieves more accurate target states. Extensive experimental results confirm that our active learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks.

preprint · 2022 · arXiv

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

The expensive annotation cost is notoriously known as the main constraint for the development of the point cloud semantic segmentation technique. Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates with similar shapes in a local area while missing other representative ones in the global environment. In this paper, we propose a new 3D region-based active learning method to tackle this problem. Dubbed SSDR-AL, our method groups the original point clouds into superpoints and incrementally selects the most informative and representative ones for label acquisition. We achieve the selection mechanism via a graph reasoning network that considers both the spatial and structural diversities of superpoints. To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints. Extensive experiments on two point cloud benchmarks demonstrate the effectiveness of SSDR-AL in the semantic segmentation task. Particularly, SSDR-AL significantly outperforms the baseline method and reduces the annotation cost by up to 63.0% and 24.0% when achieving 90% performance of fully supervised learning, respectively.

preprint · 2022 · arXiv

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

The vast computational requirements of Deep Neural Networks are a major hurdle to their real-world applications. Many recent Application Specific Integrated Circuit (ASIC) chips feature dedicated hardware support for Neural Network Acceleration. However, as ASICs take multiple years to develop, they are inevitably outpaced by the latest developments in Neural Architecture Research. For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy. In this paper, we propose Arch-Net, a family of Neural Networks made up of only operators efficiently supported across most ASIC architectures. When an Arch-Net is produced, less common network constructs, like Layer Normalization and Embedding Layers, are eliminated in a progressive manner through label-free Blockwise Model Distillation, while sub-eight-bit quantization is performed at the same time to maximize performance. Empirical results on machine translation and image classification tasks confirm that we can transform the latest Neural Architectures into fast-running and equally accurate Arch-Nets, ready for deployment on multiple mass-produced ASIC chips. The code will be available at https://github.com/megvii-research/Arch-Net.

preprint · 2022 · arXiv

Automated Progressive Learning for Efficient Training of Vision Transformers

Recent advances in vision Transformers (ViTs) have come with a voracious appetite for computing power, highlighting the urgent need to develop efficient training methods for ViTs. Progressive learning, a training scheme where the model capacity grows progressively during training, has started to show its potential for efficient training. In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning. First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth. Then, we propose automated progressive learning (AutoProg), an efficient training scheme that aims to achieve lossless acceleration by automatically increasing the training overload on-the-fly; this is achieved by adaptively deciding whether, where and how much the model should grow during progressive learning. Specifically, we first relax the optimization of the growth schedule to a sub-network architecture optimization problem, then propose one-shot estimation of the sub-network performance via an elastic supernet. The search overhead is minimized by recycling the parameters of the supernet. Extensive experiments on efficient training on ImageNet with two representative ViT models, DeiT and VOLO, demonstrate that AutoProg can accelerate ViT training by up to 85.1% with no performance drop. Code: https://github.com/changlin31/AutoProg

preprint · 2022 · arXiv

Automatic Depth Optimization for Quantum Approximate Optimization Algorithm

The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid algorithm whose control parameters are classically optimized. In addition to the variational parameters, the right choice of hyperparameters is crucial for improving the performance of any optimization model. Control depth, or the number of variational parameters, is considered the most important hyperparameter for QAOA. In this paper we investigate control depth selection with an automatic algorithm based on proximal gradient descent. The performance of the automatic algorithm is demonstrated on 7-node and 10-node Max-Cut problems, which show that the control depth can be significantly reduced during the iteration while achieving a sufficient level of optimization accuracy. With a theoretical convergence guarantee, the proposed algorithm can be used as an efficient tool for choosing the appropriate control depth as a replacement for random search or empirical rules. Moreover, the reduction of control depth induces a significant reduction in the number of quantum gates in the circuit, which improves the applicability of QAOA on Noisy Intermediate-scale Quantum (NISQ) devices.

preprint · 2022 · arXiv

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

A thriving trend in domain adaptive segmentation is to generate high-quality pseudo labels for the target domain and retrain the segmentor on them. Under this self-training paradigm, some competitive methods have turned to latent-space information, establishing the feature centroids (a.k.a. prototypes) of the semantic classes and determining the pseudo label candidates by their distances from these centroids. In this paper, we argue that the latent space contains more information to be exploited, and we take one step further to capitalize on it. Firstly, instead of merely using the source-domain prototypes to determine the target pseudo labels as most of the traditional methods do, we bidirectionally produce the target-domain prototypes to degrade those source features which might be too hard or disturbed for the adaptation. Secondly, existing attempts simply model each category as a single and isotropic prototype while ignoring the variance of the feature distribution, which could lead to the confusion of similar categories. To cope with this issue, we propose to represent each category with multiple and anisotropic prototypes via a Gaussian Mixture Model, in order to fit the de facto distribution of the source domain and estimate the likelihood of target samples based on the probability density. We apply our method to the GTA5->Cityscapes and Synthia->Cityscapes tasks and achieve 61.2 and 62.8 respectively in terms of mean IoU, substantially outperforming other competitive self-training methods. Notably, in some categories which suffer severely from categorical confusion, such as "truck" and "bus", our method achieves 56.4 and 68.8 respectively, which further demonstrates the effectiveness of our design.

preprint · 2022 · arXiv

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.

preprint · 2022 · arXiv

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

Cross-domain person re-identification (re-ID), such as unsupervised domain adaptive (UDA) re-ID, aims to transfer the identity-discriminative knowledge from the source to the target domain. Existing methods commonly treat the source and target domains as isolated from each other, i.e., no intermediate status is modeled between the two domains. Directly transferring the knowledge between two isolated domains can be very difficult, especially when the domain gap is large. From a novel perspective, we assume these two domains are not completely isolated, but can be connected through intermediate domains. Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer. To discover and utilize these intermediate domains, we propose an Intermediate Domain Module (IDM) and a Mirrors Generation Module (MGM). IDM has two functions: 1) it generates multiple intermediate domains by mixing the hidden-layer features from source and target domains and 2) it dynamically reduces the domain gap between the source / target domain features and the intermediate domain features. While IDM achieves good domain alignment, it introduces a side effect, i.e., the mix-up operation may mix the identities into a new identity and lose the original identities. To compensate for this, MGM is introduced, which maps the features into the IDM-generated intermediate domains without changing their original identity. This allows the model to focus on minimizing domain variations to promote the alignment between the source / target domain and intermediate domains, which reinforces IDM into IDM++. We extensively evaluate our method under both the UDA and domain generalization (DG) scenarios and observe that IDM++ yields consistent performance improvement for cross-domain re-ID, achieving a new state of the art.

preprint · 2022 · arXiv

Compact Scintillator Array Detector (ComSAD) for sounding rocket and CubeSat missions

The development of CubeSats and more frequent launch opportunities for sounding rockets are a game changer for space programs, making space instruments more achievable and affordable. This gives us a good opportunity to build a small cosmic ray detector capable of measuring the flux, direction, and even energy of cosmic rays at altitudes above the limit of balloon experiments, and it may open the door to building a constellation of detectors to study cosmic ray physics. The Compact Scintillator Array Detector (ComSAD) is dedicated to the sounding rocket mission of Taiwan's National Space Organization. In this paper, we present the idea, design, and performance of ComSAD, which is also suitable for future CubeSat missions.

preprint · 2022 · arXiv

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence. Thanks to the semantic diversity of natural language descriptions, temporal grounding allows activity grounding beyond pre-defined classes and has received increasing attention in recent years. The semantic diversity is rooted in the principle of compositionality in linguistics, where novel semantics can be systematically described by combining known words in novel ways (compositional generalization). However, current temporal grounding datasets do not specifically test for the compositional generalizability. To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. Evaluating the state-of-the-art methods on our new dataset splits, we empirically find that they fail to generalize to queries with novel combinations of seen words. To tackle this challenge, we propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies and learns fine-grained semantic correspondence among them. Experiments illustrate the superior compositional generalizability of our approach. The repository of this work is at https://github.com/YYJMJC/Compositional-Temporal-Grounding.

preprint · 2022 · arXiv

Computational discovery of spin-polarized semimetals in spinel materials

Materials with spin-polarized electronic states have attracted a huge amount of interest due to their potential applications in spintronics. Based on first-principles calculations, we study the electronic characteristics of a series of AB2X4 chalcogenide spinel structures and propose that two promising candidates, VZn2O4 and VCd2S4, are spin-polarized semimetal materials. Both of them have ferromagnetic ground states. Their bands near the Fermi level are completely spin-polarized and form two types of nodal rings in the spin-up channel, and the large gaps in the spin-down channel prevent the spin-flip. Further symmetry analysis reveals that the nodal rings are protected by the glide mirror or mirror symmetries. Significantly, these nodal rings connect with each other and form a nodal chain structure, which can be well described by a simple four-band tight-binding (TB) model. The two ternary chalcogenide spinel materials with a fully spin-polarized nodal chain can serve as a prominent platform for future applications in spintronics.

preprint · 2022 · arXiv

Constraints from LIGO O3 data on gravitational-wave emission due to r-modes in the glitching pulsar PSR J0537-6910

We present a search for continuous gravitational-wave emission due to r-modes in the pulsar PSR J0537-6910 using data from the LIGO-Virgo Collaboration observing run O3. PSR J0537-6910 is a young energetic X-ray pulsar and is the most frequent glitcher known. The inter-glitch braking index of the pulsar suggests that gravitational-wave emission due to r-mode oscillations may play an important role in the spin evolution of this pulsar. Theoretical models confirm this possibility and predict emission at a level that can be probed by ground-based detectors. In order to explore this scenario, we search for r-mode emission in the epochs between glitches by using a contemporaneous timing ephemeris obtained from NICER data. We do not detect any signals in the theoretically expected band of 86-97 Hz, and report upper limits on the amplitude of the gravitational waves. Our results improve on previous amplitude upper limits from r-modes in J0537-6910 by a factor of up to 3 and place stringent constraints on theoretical models for r-mode driven spin-down in PSR J0537-6910, especially for higher frequencies at which our results reach below the spin-down limit defined by energy conservation.

preprint · 2022 · arXiv

COVID-19 Detection Using CT Image Based On YOLOv5 Network

Computer-aided diagnosis (CAD) increases diagnostic efficiency, helping doctors provide a quick and confident diagnosis, and it has played an important role in the treatment of COVID-19. In this task, we address abnormality detection and classification. The dataset is provided by the Kaggle platform, and we choose YOLOv5 as our model. In the related work section, we introduce several object detection methods, which can be divided into two streams: one-stage and two-stage. Representative models are Faster RCNN and the YOLO series. We then describe the YOLOv5 model in detail. Comparative experiments and results are shown in Section IV. We choose mean average precision (mAP) as our evaluation metric; the higher the mAP, the better the model performs. The mAP@0.5 of our YOLOv5s is 0.623, which is 0.157 and 0.101 higher than Faster RCNN and EfficientDet, respectively.

preprint · 2022 · arXiv

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

Brain networks characterize complex connectivities among brain regions as graph structures, which provide a powerful means to study brain connectomes. In recent years, graph neural networks have emerged as a prevalent paradigm of learning with structured data. However, most brain network datasets are limited in sample sizes due to the relatively high cost of data acquisition, which hinders the deep learning models from sufficient training. Inspired by meta-learning that learns new concepts fast with limited training examples, this paper studies data-efficient training strategies for analyzing brain connectomes in a cross-dataset setting. Specifically, we propose to meta-train the model on datasets of large sample sizes and transfer the knowledge to small datasets. In addition, we also explore two brain-network-oriented designs, including atlas transformation and adaptive task reweighing. Compared to other pre-training strategies, our meta-learning-based approach achieves higher and stabler performance, which demonstrates the effectiveness of our proposed solutions. The framework is also able to derive new insights regarding the similarities among datasets and diseases in a data-driven fashion.

preprint2022arXiv

Deep Hierarchical Semantic Segmentation

Humans are able to recognize structured relations in observation, allowing us to decompose complex scenes into simpler parts and abstract the visual world in multiple levels. However, such hierarchical reasoning ability of human perception remains largely unexplored in the current literature on semantic segmentation. Existing work is often aware only of flat labels and predicts target classes exclusively for each pixel. In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. We devise HSSN, a general HSS framework that tackles two critical issues in this task: i) how to efficiently adapt existing hierarchy-agnostic segmentation networks to the HSS setting, and ii) how to leverage the hierarchy information to regularize HSS network learning. To address i), HSSN directly casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models. To solve ii), HSSN first explores inherent properties of the hierarchy as a training objective, which enforces segmentation predictions to obey the hierarchy structure. Further, with hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations and improve segmentation eventually. We conduct experiments on four semantic segmentation datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, and PASCAL-Person-Part), with different class hierarchies, segmentation network architectures and backbones, showing the generalization and superiority of HSSN.

preprint2022arXiv

Early-Time Ultraviolet Spectroscopy and Optical Follow-up Observations of the Type IIP Supernova 2021yja

We present three epochs of early-time ultraviolet (UV) and optical HST/STIS spectroscopy of the young, nearby Type IIP supernova (SN) 2021yja. We complement the HST data with two earlier epochs of Swift UVOT spectroscopy. The HST and Swift UVOT spectra are consistent with those of other well-studied Type IIP supernovae (SNe). The UV spectra exhibit rapid cooling at early times, while less dramatic changes are seen in the optical. We also present Lick/KAIT optical photometry up to the late-time-tail phase, showing a very long plateau and shallow decline compared with other SNe IIP. Our modeling of the UV spectrum with the TARDIS radiative-transfer code produces a good fit for a high-velocity explosion, a low total extinction $E(B-V) = 0.07$ mag, and a subsolar metallicity. We do not find a significant contribution to the UV flux from an additional heating source, such as interaction with the circumstellar medium, consistent with the observed flat plateau. Furthermore, the velocity width of the Mg II $λ$2798 line is comparable to that of the hydrogen Balmer lines, suggesting that the UV emission is confined to a region close to the photosphere.

preprint2022arXiv

Few-Shot Segmentation via Cycle-Consistent Transformer

Few-shot segmentation aims to train a segmentation model that can fast adapt to novel classes with few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on the features from support images. Previous methods only utilized the semantic-level prototypes of support images as conditional information. These methods cannot utilize all pixel-wise support information for the query predictions, which is however critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and query images to facilitate the few-shot segmentation task. We design a novel Cycle-Consistent TRansformer (CyCTR) module to aggregate pixel-wise support features into query ones. CyCTR performs cross-attention between features from different images, i.e. support and query images. We observe that there may exist unexpected irrelevant pixel-level support features. Directly performing cross-attention may aggregate these features from support to query and bias the query features. Thus, we propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features and encourage query features to attend to the most informative pixels from support images. Experiments on all few-shot segmentation benchmarks demonstrate that our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods. Specifically, on Pascal-$5^i$ and COCO-$20^i$ datasets, we achieve 67.5% and 45.6% mIoU for 5-shot segmentation, outperforming previous state-of-the-art methods by 5.6% and 7.1% respectively.

preprint2022arXiv

Filter Pruning by Switching to Neighboring CNNs with Good Attributes

Filter pruning is effective in reducing the computational costs of neural networks. Existing methods show that updating the previously pruned filters enables large model capacity and achieves better performance. However, during the iterative pruning process, even if the network weights are updated to new values, the pruning criterion remains the same. In addition, when evaluating the filter importance, only the magnitude information of the filters is considered. However, in neural networks, filters do not work individually; they also affect other filters. As a result, the magnitude information of each filter, which merely reflects the information of an individual filter itself, is not enough to judge the filter importance. To solve the above problems, we propose Meta-attribute-based Filter Pruning (MFP). First, to expand the existing magnitude-based pruning criteria, we introduce a new set of criteria to consider the geometric distance of filters. Additionally, to explicitly assess the current state of the network, we adaptively select the most suitable criteria for pruning via a meta-attribute, a property of the neural network at the current state. Experiments on two image classification benchmarks validate our method. For ResNet-50 on ILSVRC-2012, we could reduce more than 50% FLOPs with only 0.44% top-5 accuracy loss.

preprint2022arXiv

Free-electron-light interactions in nanophotonics

When impinging on optical structures or passing in their vicinity, free electrons can spontaneously emit electromagnetic radiation, a phenomenon generally known as cathodoluminescence. Free-electron radiation comes in many guises: Cherenkov, transition, and Smith-Purcell radiation, but also electron scintillation, commonly referred to as incoherent cathodoluminescence. While those effects have been at the heart of many fundamental discoveries and technological developments in high-energy physics in the past century, their recent demonstration in photonic and nanophotonic systems has attracted a lot of attention. Those developments arose from predictions that exploit nanophotonics for novel radiation regimes, now becoming accessible thanks to advances in nanofabrication. In general, the proper design of nanophotonic structures can enable shaping, control, and enhancement of free-electron radiation, for any of the above-mentioned effects. Free-electron radiation in nanophotonics opens the way to promising applications, such as widely-tunable integrated light sources from x-ray to THz frequencies, miniaturized particle accelerators, and highly sensitive high-energy particle detectors. Here, we review the emerging field of free-electron radiation in nanophotonics. We first present a general, unified framework to describe free-electron light-matter interaction in arbitrary nanophotonic systems. We then show how this framework sheds light on the physical underpinnings of many methods in the field used to control and enhance free-electron radiation. Namely, the framework points to the central role played by the photonic eigenmodes in controlling the output properties of free-electron radiation (e.g., frequency, directionality, and polarization). [... see full abstract in paper]

preprint2022arXiv

Gap Opening and Inner Disk Structure in the Strongly Accreting Transition Disk of DM Tau

Large inner dust gaps in transition disks are frequently posited as evidence of giant planets sculpting gas and dust in the disk, or the opening of a gap by photoevaporative winds. Although the former hypothesis is strongly supported by the observations of planets and deep depletions in gas within the gap of some disks, many T Tauri stars hosting transition disks accrete at rates typical for an undepleted disk, raising the question of how gap opening occurs in these objects. We thus present an analysis of the structure of the transition disk around the T Tauri star DM Tau, which is strongly accreting ($\sim 10^{-8.3}~\mathrm{M}_\odot~ \mathrm{yr}^{-1}$) and turbulent ($α=0.078 \pm 0.02$). Using the DALI thermochemical code, we fit disk models to simultaneously reproduce the accretion rate, high level of turbulence, the gas traced by ALMA band 6 observations of $^{12}$CO, $^{13}$CO, and C$^{18}$O J=2--1 lines, and the observed dust emission from the mm continuum and spectral energy distribution. We find that a shallow depletion in gas surface density of $\sim 10$ relative to the outer disk and a gas-rich inner disk are consistent with the observations. The planet mass of $<1$ M$_\mathrm{Jup}$ implied by the gap depth is in tension with predictions for dust trapping in a highly viscous disk, which requires a more massive planet of $\sim10$ M$_\mathrm{Jup}$. Photoevaporative models including a dead zone can qualitatively reproduce some features of the DM Tau disk, but still struggle to explain the high accretion rates and the observed mm continuum flux.

preprint2022arXiv

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, which is composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units driven by knowledge sharing; (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions, output formats, data distributions, etc. Therefore, we develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrent training. SIMT lays the foundation of pre-training with large-scale multi-task multi-domain datasets and proves essential for stable training in our GPPF experiments. The exhaustive experiments show that our GPPF-R50 model achieves significant improvements of 2.5-5.8 over a strong baseline of the 8 pre-training tasks in GPPF-15M and harvests a range of SOTAs over the 22 downstream tasks with similar computation budgets. We also validate the generalization ability of GPPF to SOTA vision transformers with consistent improvements. These experimental results demonstrate the effective knowledge learning, storing, sharing, and transfer provided by our GPPF framework.

preprint2022arXiv

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Modeling temporal information for both detection and tracking in a unified framework has proved to be a promising solution to video instance segmentation (VIS). However, how to effectively incorporate the temporal information into an online model remains an open problem. In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way. In detail, IAI employs a novel identification module to explicitly predict identification numbers for tracking instances. To pass temporal information across frames, IAI utilizes an association module which combines current features and past embeddings. Notably, IAI can be integrated with different image models. We conduct extensive experiments on three VIS benchmarks. IAI outperforms all the online competitors on YouTube-VIS-2019 (ResNet-101 43.7 mAP) and YouTube-VIS-2021 (ResNet-50 38.0 mAP). Surprisingly, on the more challenging OVIS, IAI achieves SOTA performance (20.6 mAP). Code is available at https://github.com/zfonemore/IAI

preprint2022arXiv

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

Recently, increasing efforts have been focused on Weakly Supervised Scene Graph Generation (WSSGG). The mainstream solution for WSSGG typically follows the same pipeline: it first aligns text entities in the weak image-level supervisions (e.g., unlocalized relation triplets or captions) with image regions, and then trains SGG models in a fully-supervised manner with aligned instance-level "pseudo" labels. However, we argue that most existing WSSGG works focus only on object-consistency, which means the grounded regions should have the same object category label as text entities, while neglecting another basic requirement for an ideal alignment: interaction-consistency, which means the grounded region pairs should have the same interactions (i.e., visual relations) as text entity pairs. Hence, in this paper, we propose to enhance a simple grounding module with both object-aware and interaction-aware knowledge to acquire more reliable pseudo labels. To better leverage these two types of knowledge, we regard them as two teachers and fuse their generated targets to guide the training process of our grounding module. Specifically, we design two different strategies to adaptively assign weights to different teachers by assessing their reliability on each training sample. Extensive experiments have demonstrated that our method consistently improves WSSGG performance on various kinds of weak supervision.

preprint2022arXiv

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Sign language is the window through which differently-abled people express their feelings and emotions. However, it remains challenging for people to learn sign language in a short time. To address this real-world challenge, in this work, we study the motion transfer system, which can transfer the user photo to the sign language video of specific words. In particular, the appearance content of the output video comes from the provided user image, while the motion of the video is extracted from the specified tutorial video. We observe two primary limitations in adopting the state-of-the-art motion transfer methods to sign language generation: (1) Existing motion transfer works ignore the prior geometrical knowledge of the human body. (2) The previous image animation methods only take image pairs as input in the training stage, and thus cannot fully exploit the temporal information within videos. In an attempt to address the above-mentioned limitations, we propose Structure-aware Temporal Consistency Network (STCNet) to jointly optimize the prior structure of the human body with the temporal consistency for sign language video generation. There are two main contributions in this paper. (1) We harness a fine-grained skeleton detector to provide prior knowledge of the body keypoints. In this way, we keep the keypoint movement within a valid range and make the model more explainable and robust. (2) We introduce two cycle-consistency losses, i.e., short-term cycle loss and long-term cycle loss, which are designed to ensure the continuity of the generated video. We optimize the two losses and keypoint detector network in an end-to-end manner.

preprint2022arXiv

Krylov complexity and orthogonal polynomials

Krylov complexity measures operator growth with respect to a basis, which is adapted to the Heisenberg time evolution. The construction of that basis relies on the Lanczos algorithm, also known as the recursion method. The mathematics of Krylov complexity can be described in terms of orthogonal polynomials. We provide a pedagogical introduction to the subject and work out analytically a number of examples involving the classical orthogonal polynomials, polynomials of the Hahn class, and the Tricomi-Carlitz polynomials.
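The Lanczos recursion mentioned above can be illustrated on a small symmetric matrix. The sketch below is a generic textbook three-term recursion in plain Python, not the paper's construction: in the Krylov-complexity setting the matrix would stand in for the generator of Heisenberg evolution acting on operator space, and the names `lanczos`, `matvec` and `dot` are illustrative assumptions.

```python
import math


def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [dot(row, v) for row in A]


def lanczos(A, v0, steps, tol=1e-12):
    """Run the three-term Lanczos recursion on a real symmetric matrix A.

    Returns the diagonal (alphas) and off-diagonal (betas) coefficients of
    the tridiagonal matrix that represents A in the generated Krylov basis.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * len(v0)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(steps):
        w = matvec(A, q)
        alpha = dot(q, w)
        # Orthogonalize against the current and the previous basis vector.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < tol:
            break  # The Krylov space is exhausted.
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


# Toy example: a three-site hopping chain, started from the first site.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0], steps=5)
print(alphas, betas)
```

For this toy chain the recursion stops after three steps, since the Krylov space spans all three sites. The sketch omits re-orthogonalization, which larger or ill-conditioned problems generally need.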

preprint2022arXiv

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

The Scene Graph Generation (SGG) task aims to detect all the objects and their pairwise visual relationships in a given image. Although SGG has achieved remarkable progress over the last few years, almost all existing SGG models follow the same training paradigm: they treat both object and predicate classification in SGG as a single-label classification problem, and the ground-truths are one-hot target labels. However, this prevalent training paradigm has overlooked two characteristics of current SGG datasets: 1) For positive samples, some specific subject-object instances may have multiple reasonable predicates. 2) For negative samples, there are numerous missing annotations. When these two characteristics are disregarded, SGG models are easily confused and make wrong predictions. To this end, we propose a novel model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG. Specifically, LS-KD dynamically generates a soft label for each subject-object instance by fusing a predicted Label Semantic Distribution (LSD) with its original one-hot target label. LSD reflects the correlations between this instance and multiple predicate categories. Meanwhile, we propose two different strategies to predict LSD: iterative self-KD and synchronous self-KD. Extensive ablations and results on three SGG tasks have attested to the superiority and generality of our proposed LS-KD, which can consistently achieve decent trade-off performance between different predicate categories.
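The soft-label idea can be sketched as follows. The abstract does not specify how the LSD and the one-hot target are fused, so the convex combination below, the weight `alpha`, and the function name `fuse_soft_label` are all assumptions made for illustration rather than the paper's actual rule.

```python
def fuse_soft_label(one_hot, lsd, alpha=0.5):
    """Blend a one-hot target with a predicted label distribution.

    alpha is a hypothetical mixing weight: 1.0 keeps the one-hot label,
    0.0 keeps only the predicted distribution.
    """
    if len(one_hot) != len(lsd):
        raise ValueError("one_hot and lsd must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * t + (1.0 - alpha) * p for t, p in zip(one_hot, lsd)]


# Toy example with three predicate categories (values are made up).
one_hot = [0.0, 1.0, 0.0]   # annotated predicate is category 1
lsd = [0.2, 0.5, 0.3]       # predicted Label Semantic Distribution
soft = fuse_soft_label(one_hot, lsd, alpha=0.5)
print(soft)
```

Because both inputs sum to one, any convex combination also sums to one, so the result stays a valid distribution while keeping some probability mass on other plausible predicates.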

preprint2022arXiv

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a locality-aware inter-and intra-video reconstruction framework that fills in three missing pieces, i.e., instance discrimination, location awareness, and spatial compactness, of the self-supervised correspondence learning puzzle. First, unlike most existing efforts, which focus on intra-video self-supervision only, we exploit cross-video affinities as extra negative samples within a unified, inter-and intra-video reconstruction scheme. This enables instance discriminative representation learning by contrasting desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching, and design a position shifting strategy to remove the side-effect of position encoding during inter-video affinity computation, making our LIIR location-sensitive. Third, to make full use of the spatial continuity nature of video data, we impose a compactness-based constraint on correspondence matching, yielding more sparse and reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks including objects, semantic parts, and keypoints.

preprint2022arXiv

Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data

Inferring the stereo structure of objects in the real world is a challenging yet practical task. Equipping deep models with this ability usually requires abundant 3D supervision, which is hard to acquire. It is promising that we can simply benefit from synthetic data, where pairwise ground-truth is easy to access. Nevertheless, the domain gaps are nontrivial considering the variant texture, shape and context. To overcome these difficulties, we propose a Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed VPAN. To generalize the model towards a real scenario, we propose to fulfill several aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct stereo-shape of target by transforming embeddings into the desired manifold. Extensive experiments on several benchmarks demonstrate the effectiveness and robustness of the proposed method in learning the 3D shape manifold from synthetic data via a single view. The proposed method outperforms state-of-the-art methods on the Pix3D dataset with IoU 0.292 and CD 0.108, and reaches IoU 0.329 and CD 0.104 on Pascal 3D+.

preprint2022arXiv

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation. However, one key challenge remains: existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images. To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints. By leveraging the underlying 3D geometry information of generated images, i.e., depth and camera transformation matrix, we explicitly establish stereo correspondence between views to perform multi-view joint optimization. In particular, we enforce the photometric consistency between pairs of views and integrate a stereo mixup mechanism into the training process, encouraging the model to reason about the correct 3D shape. Besides, we design a two-stage training strategy with feature-level multi-view joint optimization to improve the image quality. Extensive experiments on three datasets demonstrate that MVCGAN achieves the state-of-the-art performance for 3D-aware image synthesis.

preprint2022arXiv

Non-Abelian nonsymmorphic chiral symmetries

The Hofstadter model exemplifies a large class of physical systems characterized by particles hopping on a lattice immersed in a gauge field. Recent advancements on various synthetic platforms have enabled highly-controllable simulations of such systems with tailored gauge fields featuring complex spatial textures. These synthetic gauge fields could introduce synthetic symmetries that do not appear in electronic materials. Here, in an SU(2) non-Abelian Hofstadter model, we theoretically show the emergence of multiple nonsymmorphic chiral symmetries, which combine an internal unitary anti-symmetry with fractional spatial translation. Depending on the values of the gauge fields, the nonsymmorphic chiral symmetries can exhibit non-Abelian algebra and protect Kramer quartet states in the bulk band structure, creating general four-fold degeneracy at all momenta. These nonsymmorphic chiral symmetries protect double Dirac semimetals at zero energy, which become gapped into quantum confined insulating phases upon introducing a boundary. Moreover, the parity of the system size can determine whether the resulting insulating phase is trivial or topological. Our work indicates a pathway for creating topology via synthetic symmetries emergent from synthetic gauge fields.

preprint2022arXiv

Observations of the Very Young Type Ia Supernova 2019np with Early-excess Emission

Early-time radiative signals from type Ia supernovae (SNe Ia) can provide important constraints on the explosion mechanism and the progenitor system. We present observations and analysis of SN 2019np, a nearby SN Ia discovered within 1-2 days after the explosion. Follow-up observations were conducted in optical, ultraviolet, and near-infrared bands, covering the phases from $\sim-$16.7 days to $\sim$+367.8 days relative to its $B-$band peak luminosity. The photometric and spectral evolution of SN 2019np resembles the average behavior of normal SNe Ia. The absolute B-band peak magnitude and the post-peak decline rate are $M_{\rm max}(B)=-19.52 \pm 0.47$ mag and $Δm_{\rm15}(B) =1.04 \pm 0.04$ mag, respectively. No hydrogen line has been detected in the near-infrared and nebular-phase spectra of SN 2019np. Assuming that the $^{56}$Ni powering the light curve is centrally located, we find that the bolometric light curve of SN 2019np shows a flux excess up to 5.0% in the early phase compared to the radiative diffusion model. Such extra radiation perhaps suggests the presence of an additional energy source beyond the radioactive decay of central nickel. Comparing the observed color evolution with that predicted by different models such as interactions of SN ejecta with circumstellar matter (CSM)/companion star, a double-detonation explosion from a sub-Chandrasekhar mass white dwarf (WD), and surface $^{56}$Ni mixing, we find that the last of these is favored.

preprint2022arXiv

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid based convolutions for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely-used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet to model point cloud sequences.

preprint2022arXiv

ReGO: Reference-Guided Outpainting for Scenery Image

We aim to tackle the challenging yet practical scenery image outpainting task in this work. Recently, generative adversarial learning has significantly advanced image outpainting by producing semantically consistent content for the given image. However, the existing methods always suffer from blurry texture and artifacts in the generated part, making the overall outpainting results lack authenticity. To overcome this weakness, this work investigates a principled way to synthesize texture-rich results by borrowing pixels from its neighbors (i.e., reference images), named \textbf{Re}ference-\textbf{G}uided \textbf{O}utpainting (ReGO). In particular, ReGO designs an Adaptive Content Selection (ACS) module to transfer the pixels of reference images to compensate the texture of the target one. To prevent the style of the generated part from being affected by the reference images, a style ranking loss is further proposed to augment ReGO to synthesize style-consistent results. Extensive experiments on two popular benchmarks, NS6K \cite{yangzx} and NS8K \cite{wang}, demonstrate the effectiveness of our ReGO. Our code will be made publicly available.

preprint2022arXiv

Results and findings of the 2021 Image Similarity Challenge

The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark to evaluate recent image copy detection methods. There were 200 participants in the competition. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe image crops or hiding into unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.

preprint2022arXiv

Search for anisotropic gravitational-wave backgrounds using data from Advanced LIGO and Advanced Virgo's first three observing runs

We report results from searches for anisotropic stochastic gravitational-wave backgrounds using data from the first three observing runs of the Advanced LIGO and Advanced Virgo detectors. For the first time, we include Virgo data in our analysis and run our search with a new efficient pipeline called {\tt PyStoch} on data folded over one sidereal day. We use gravitational-wave radiometry (broadband and narrow band) to produce sky maps of stochastic gravitational-wave backgrounds and to search for gravitational waves from point sources. A spherical harmonic decomposition method is employed to look for gravitational-wave emission from spatially-extended sources. Neither technique found evidence of gravitational-wave signals. Hence we derive 95\% confidence-level upper limit sky maps on the gravitational-wave energy flux from broadband point sources, ranging from $F_{α, Θ} < {\rm (0.013 - 7.6)} \times 10^{-8} {\rm erg \, cm^{-2} \, s^{-1} \, Hz^{-1}},$ and on the (normalized) gravitational-wave energy density spectrum from extended sources, ranging from $Ω_{α, Θ} < {\rm (0.57 - 9.3)} \times 10^{-9} \, {\rm sr^{-1}}$, depending on direction ($Θ$) and spectral index ($α$). These limits improve upon previous limits by factors of $2.9 - 3.5$. We also set 95\% confidence-level upper limits on the frequency-dependent strain amplitudes of quasimonochromatic gravitational waves coming from three interesting targets, Scorpius X-1, SN 1987A and the Galactic Center, with the best upper limits in the range $h_0 < {\rm (1.7-2.1)} \times 10^{-25},$ a factor of $\geq 2.0$ improvement compared to previous stochastic radiometer searches.

preprint2022arXiv

Search for Axion(-like) Particles in Heavy-Ion Collisions

We propose a novel way to search for axion(-like) particles in heavy-ion collisions using prompt photons as the probe and the property of conversion between photon and axion(-like) particles under a strong magnetic field generated in the non-central collisions. The expected result reveals that a new phase space region of the coupling constant for photon and axion(-like) particles can be covered in the future high energy nuclear colliders.

preprint2022arXiv

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

In this paper, we propose an efficient and generalizable framework based on a deep convolutional neural network (CNN) for multi-source remote sensing data joint classification. While recent methods are mostly based on multi-stream architectures, we use group convolution to construct equivalent network architectures efficiently within a single-stream network. We further adopt and improve dynamic grouping convolution (DGConv) to make group convolution hyperparameters, and thus the overall network architecture, learnable during network training. The proposed method therefore can theoretically adjust any modern CNN models to any multi-source remote sensing data set, and can potentially avoid sub-optimal solutions caused by manually decided architecture hyperparameters. In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark data sets (i.e., Houston2018 data, Berlin data, and MUUFL data). Experimental results demonstrate the effectiveness of the proposed single-stream CNNs, and in particular ResNet18-DGConv improves the state-of-the-art classification overall accuracy (OA) on HS-SAR Berlin data set from $62.23\%$ to $68.21\%$. In the experiments we have two interesting findings. First, using DGConv generally reduces test OA variance. Second, multi-stream is harmful to model performance if imposed on the first few layers, but becomes beneficial if applied to deeper layers. Altogether, the findings imply that multi-stream architecture, instead of being a strictly necessary component in deep learning models for multi-source remote sensing data, essentially plays the role of model regularizer. Our code is publicly available at https://github.com/yyyyangyi/Multi-source-RS-DGConv. We hope our work can inspire novel research in the future.

preprint2022arXiv

Spectropolarimetry of the Thermonuclear Supernova 2021rhu: High Calcium Polarization 79 Days After Peak Luminosity

We report spectropolarimetric observations of the Type Ia supernova (SN) 2021rhu at four epochs: $-$7, +0, +36, and +79 days relative to its $B$-band maximum luminosity. A wavelength-dependent continuum polarization peaking at $3890 \pm 93$ Angstroms and reaching a level of $p_{\rm max}=1.78\% \pm 0.02\%$ was found. The peak of the polarization curve is bluer than is typical in the Milky Way, indicating a larger proportion of small dust grains along the sightline to the SN. After removing the interstellar polarization, we found a pronounced increase of the polarization in the Ca II near-infrared triplet, from $\sim$0.3% at day $-$7 to $\sim$2.5% at day +79. No temporal evolution in high-resolution flux spectra across the Na I D and Ca II H&K features was seen from days +39 to +74, indicating that the late-time increase in polarization is intrinsic to the SN as opposed to being caused by scattering of SN photons in circumstellar or interstellar matter. We suggest that an explanation for the late-time rise of the Ca II near-infrared triplet polarization may be the alignment of calcium atoms in a weak magnetic field through optical excitation/pumping by anisotropic radiation from the SN.

preprint2022arXiv

Spectropolarimetry of the tidal disruption event AT 2019qiz: a quasispherical reprocessing layer

We present optical spectropolarimetry of the tidal disruption event (TDE) AT 2019qiz on days $+0$ and $+29$ relative to maximum brightness. Continuum polarization, which informs the shape of the electron-scattering surface, was found to be consistent with 0 per cent at peak brightness. On day $+29$, the continuum polarization rose to $\sim 1$ per cent, making this the first reported spectropolarimetric evolution of a TDE. These findings are incompatible with a naked eccentric disc that lacks significant mass outflow. Instead, the spectropolarimetry paints a picture wherein, at maximum brightness, high-frequency emission from the accretion disc is reprocessed into the optical band by a nearly spherical, optically thick, electron-scattering photosphere located far away from the black hole. We estimate the radius of the scattering photosphere to be $\sim 100\rm\, au$ at maximum brightness -- significantly larger than the tidal radius ($\sim 1\rm\, au$) and the thermalisation radius ($\sim 30\rm\, au$) where the optical continuum is formed. A month later, as the fallback rate drops and the scattering photosphere recedes, the continuum polarization increases, revealing a moderately aspherical interior. We also see evidence for smaller-scale density variations in the scattering photosphere, inferred from the scatter of the data in the Stokes $q-u$ plane. On day $+29$, the H$α$ emission-line peak is depolarized to $\sim 0.3$ per cent (compared to $\sim 1$ per cent continuum polarization), and displays a gradual rise toward the line's redder wavelengths. This observation indicates the H$α$ line formed near the electron-scattering radius.

preprint2022arXiv

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

Voice conversion is to generate a new speech with the source content and a target voice style. In this paper, we focus on one general setting, i.e., non-parallel many-to-many voice conversion, which is close to the real-world scenario. As the name implies, non-parallel many-to-many voice conversion does not require the paired source and reference speeches and can be applied to arbitrary voice transfer. In recent years, Generative Adversarial Networks (GANs) and other techniques such as Conditional Variational Autoencoders (CVAEs) have made considerable progress in this field. However, due to the sophistication of voice conversion, the style similarity of the converted speech is still unsatisfactory. Inspired by the inherent structure of mel-spectrogram, we propose a new voice conversion framework, i.e., Subband-based Generative Adversarial Network for Voice Conversion (SGAN-VC). SGAN-VC converts each subband content of the source speech separately by explicitly utilizing the spatial characteristics between different subbands. SGAN-VC contains one style encoder, one content encoder, and one decoder. In particular, the style encoder network is designed to learn style codes for different subbands of the target speaker. The content encoder network can capture the content information on the source speech. Finally, the decoder generates particular subband content. In addition, we propose a pitch-shift module to fine-tune the pitch of the source speaker, making the converted tone more accurate and explainable. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance on VCTK Corpus and AISHELL3 datasets both qualitatively and quantitatively, whether on seen or unseen data. Furthermore, the content intelligibility of SGAN-VC on unseen data even exceeds that of StarGANv2-VC with ASR network assistance.

preprint2022arXiv

The exact SL(K+3,C) symmetry of string theory

By using the on-shell recursion relation of string scattering amplitudes (SSA), we show that all n-point SSA of the open bosonic string theory can be expressed in terms of the Lauricella functions. This result extends the previous exact SL(K+3,C) symmetry of the 4-point Lauricella SSA (LSSA) of three tachyons and one arbitrary string state to the whole tree-level open bosonic string theory. Moreover, we present three applications of the SL(K+3,C) symmetry on the SSA. They are the solvability of all SSA in terms of one amplitude, the existence of iteration relations among residues of a given SSA so as to soften its hard scattering behavior, and finally the re-derivation of infinite linear relations among hard SSA [12].

preprint2022arXiv

The ringing of quantum corrected Schwarzschild black hole with GUP

Schwarzschild black holes with quantum corrections are studied under scalar field perturbations and electromagnetic field perturbations to analyze the effect of the correction term on the potential function and quasinormal mode (QNM). In classical general relativity, spacetime is continuous and no minimal length exists. Introducing the correction term of the generalized uncertainty principle (GUP), with parameter $β$, can change the singularity structure of the black hole gauge and may lead to discretization in time and space. We apply the sixth-order WKB method to approximate the QNM of Schwarzschild black holes with quantum corrections and perform numerical analysis to derive the results of the method. Also, we find that the effective potential and QNM in scalar fields are larger than those in electromagnetic fields.

preprint2022arXiv

The Type Icn SN 2021csp: Implications for the Origins of the Fastest Supernovae and the Fates of Wolf-Rayet Stars

We present observations of SN 2021csp, the second example of a newly-identified type of supernova (Type Icn) hallmarked by strong, narrow, P Cygni carbon features at early times. The SN appears as a fast and luminous blue transient at early times, reaching a peak absolute magnitude of -20 within 3 days due to strong interaction between fast SN ejecta (v ~ 30000 km/s) and a massive, dense, fast-moving C/O wind shed by the WC-like progenitor months before explosion. The narrow line features disappear from the spectrum 10-20 days after explosion and are replaced by a blue continuum dominated by broad Fe features, reminiscent of Type Ibn and IIn supernovae and indicative of weaker interaction with more extended H/He-poor material. The transient then abruptly fades ~60 days post-explosion when interaction ceases. Deep limits at later phases suggest minimal heavy-element nucleosynthesis, a low ejecta mass, or both, and imply an origin distinct from that of classical Type Ic supernovae. We place SN 2021csp in context with other fast-evolving interacting transients, and discuss various progenitor scenarios: an ultrastripped progenitor star, a pulsational pair-instability eruption, or a jet-driven fallback supernova from a Wolf-Rayet star. The fallback scenario would naturally explain the similarity between these events and radio-loud fast transients, and suggests a picture in which most stars massive enough to undergo a WR phase collapse directly to black holes at the end of their lives.

preprint2022arXiv

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Backdoor attacks pose a new threat to NLP models. A standard strategy to construct poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original label to a target label. This strategy comes with a severe flaw of being easily detected from both the trigger and the label perspectives: the trigger injected, which is usually a rare word, leads to an abnormal natural language expression, and thus can be easily detected by a defense model; the changed target label leads the example to be mistakenly labeled and thus can be easily detected by manual inspections. To deal with this issue, in this paper, we propose a new strategy to perform textual backdoor attacks which do not require an external trigger, and the poisoned samples are correctly labeled. The core idea of the proposed strategy is to construct clean-labeled examples, whose labels are correct but can lead to test label changes when fused with the training set. To generate poisoned clean-labeled examples, we propose a sentence generation model based on the genetic algorithm to cater to the non-differentiable characteristic of text data. Extensive experiments demonstrate that the proposed attacking strategy is not only effective, but more importantly, hard to defend due to its triggerless and clean-labeled nature. Our work marks the first step towards developing triggerless attacking strategies in NLP.

preprint2022arXiv

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation. NAS has been explosively studied to automate the discovery of top-performer neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works start to explore indicators that can predict a network's performance without training. However, they either leveraged limited properties of deep networks, or the benefits of their training-free indicators are not applied to more extensive search methods. By rigorous correlation analysis, we present a unified framework to understand and accelerate NAS, by disentangling "TEG" characteristics of searched networks - Trainability, Expressivity, Generalization - all assessed in a training-free manner. The TEG indicators could be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance from our TEG-NAS framework, leading to both improved search accuracy and over 56% reduction in search time cost. Moreover, we visualize search trajectories on three landscapes of "TEG" characteristics, observing that while a good local minimum is easier to find on NAS-Bench-201 given its simple topology, balancing "TEG" characteristics is much harder on the DARTS search space due to its complex landscape geometry. Our code is available at https://github.com/VITA-Group/TEGNAS.

preprint2022arXiv

Unified Transformer Tracker for Object Tracking

As an important area in computer vision, object tracking has formed two separate communities that respectively study Single Object Tracking (SOT) and Multiple Object Tracking (MOT). However, current methods in one tracking scenario are not easily adapted to the other due to the divergent training datasets and tracking objects of both tasks. Although UniTrack demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking. In this work, we present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. A track transformer is developed in our UTT to track the target in both SOT and MOT. The correlation between the target and tracking frame features is exploited to localize the target. We demonstrate that both SOT and MOT tasks can be solved within this framework. The model can be simultaneously end-to-end trained by alternately optimizing the SOT and MOT objectives on the datasets of individual tasks. Extensive experiments are conducted on several benchmarks with a unified model trained on SOT and MOT datasets. Code will be available at https://github.com/Flowerfan/Trackron.

preprint2022arXiv

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

Product retrieval is of great importance in the ecommerce domain. This paper introduces our 1st-place solution in the eBay eProduct Visual Search Challenge (FGVC9), which features an ensemble of about 20 models from vision models and vision-language models. While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority. Specifically, for the vision models, we use a two-stage training pipeline which first learns from the coarse labels provided in the training set and then conducts fine-grained self-supervised training, yielding a coarse-to-fine metric learning manner. For the vision-language models, we use the textual description of the training image as the supervision signals for fine-tuning the image-encoder (feature extractor). With these designs, our solution achieves 0.7623 MAR@10, ranking first among all the competitors. The code is available at https://github.com/WangWenhao0716/V2L.

preprint2022arXiv

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

One fundamental challenge of vehicle re-identification (re-id) is to learn robust and discriminative visual representation, given the significant intra-class vehicle variations across different camera views. As the existing vehicle datasets are limited in terms of training images and viewpoints, we propose to build a unique large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets, and design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet. The first stage of our approach is to learn the generic representation for all domains (i.e., source vehicle datasets) by training with the conventional classification loss. This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain. The second stage is to fine-tune the trained model purely based on the target vehicle set, by minimizing the distribution discrepancy between our VehicleNet and any target domain. We discuss our proposed multi-source dataset VehicleNet and evaluate the effectiveness of the two-stage progressive representation learning through extensive experiments. We achieve the state-of-the-art accuracy of 86.07% mAP on the private test set of AICity Challenge, and competitive results on two other public vehicle re-id datasets, i.e., VeRi-776 and VehicleID. We hope this new VehicleNet dataset and the learned robust representations can pave the way for vehicle re-id in real-world environments.

preprint2022arXiv

Visual Abductive Reasoning

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required to not only describe what is observed, but also infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, Reasoner (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, that yields discriminative representations for the premise and hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences. The prediction scores of the sentences are used to guide cross-sentence information flow in the cascaded reasoning procedure. Our VAR benchmarking results show that Reasoner surpasses many famous video-language models, while still being far behind human performance. This work is expected to foster future efforts in the reasoning-beyond-observation paradigm.

preprint2021arXiv

A general framework for scintillation in nanophotonics

Bombardment of materials by high-energy particles (e.g., electrons, nuclei, X- and $γ$-ray photons) often leads to light emission, known generally as scintillation. Scintillation is ubiquitous and enjoys widespread applications in many areas such as medical imaging, X-ray non-destructive inspection, night vision, electron microscopy, and high-energy particle detectors. A large body of research focuses on finding new materials optimized for brighter, faster, and more controlled scintillation. Here, we develop a fundamentally different approach based on integrating nanophotonic structures into scintillators to enhance their emission. To start, we develop a unified and ab initio theory of nanophotonic scintillators that accounts for the key aspects of scintillation: the energy loss by high-energy particles, as well as the light emission by non-equilibrium electrons in arbitrary nanostructured optical systems. This theoretical framework allows us, for the first time, to experimentally demonstrate nearly an order-of-magnitude enhancement of scintillation, in both electron-induced, and X-ray-induced scintillation. Our theory also allows the discovery of structures that could eventually achieve several orders-of-magnitude scintillation enhancement. The framework and results shown here should enable the development of a new class of brighter, faster, and higher-resolution scintillators with tailored and optimized performances - with many potential applications where scintillators are used.

preprint2021arXiv

A Survey on Concept Factorization: From Shallow to Deep Representation Learning

The quality of the features learned by representation learning determines the performance of learning algorithms and the related application tasks (such as high-dimensional data clustering). As a relatively new paradigm for representation learning, Concept Factorization (CF) has attracted a great deal of interest in the areas of machine learning and data mining for over a decade. Many effective CF-based methods have been proposed based on different perspectives and properties, but it remains difficult to grasp the essential connections and figure out the underlying explanatory factors from existing studies. In this paper, we therefore survey the recent advances on CF methodologies and the potential benchmarks by categorizing and summarizing the current methods. Specifically, we first review the root CF method, and then explore the advancement of CF-based representation learning ranging from shallow to deep/multilayer cases. We also introduce the potential application areas of CF-based methods. Finally, we point out some future directions for studying CF-based representation learning. Overall, this survey provides an insightful overview of both the theoretical basis and current developments in the field of CF, which can also help interested researchers to understand the current trends of CF and find the most appropriate CF techniques to deal with particular applications.

preprint2021arXiv

Bilinear equations in Darboux transformations by Boson-Fermion correspondence

Bilinear equations are an important property of integrable nonlinear evolution equations. Many famous research objects in mathematical physics, such as Gromov-Witten invariants, can be described in terms of bilinear equations to show their connections with integrable systems. In this paper, we mainly discuss the bilinear equations of the transformed tau functions under successive applications of the Darboux transformations for the KP hierarchy, the modified KP hierarchy (Kupershmidt-Kiso version) and the BKP hierarchy, by the method of the Boson-Fermion correspondence. The Darboux transformations are considered in the Fermionic picture, by multiplying the different Fermionic fields on the tau functions. Here the Fermionic fields correspond to the (adjoint) eigenfunctions, whose changes under the Darboux transformations are shown to be the ones of the squared eigenfunction potentials in the Bosonic picture, used in the spectral representations of the (adjoint) eigenfunctions. Then the successive applications of the Darboux transformations are given in the Fermionic picture. Based upon this, some new bilinear equations in the Darboux chain are derived, besides the ones of the $(l-l')$-th modified KP hierarchy. The corresponding examples of these new bilinear equations are given.

preprint2021arXiv

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

The goal of person search is to localize and match query persons from scene images. For high efficiency, one-step methods have been developed to jointly handle the pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. One is the mutual interference between the optimization objectives of multiple sub-tasks. The other is the sub-optimal identification feature learning caused by the small batch size during end-to-end training. To overcome these problems, we propose a decoupled and memory-reinforced network (DMRNet). Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework. Further, we build a memory-reinforced mechanism to boost the identification feature learning. By queuing the identification features of recently accessed instances into a memory bank, the mechanism augments the similarity pair construction for pairwise metric learning. For better encoding consistency of the stored features, a slow-moving average of the network is applied for extracting these features. In this way, the dual networks reinforce each other and converge to robust solution states. Experimentally, the proposed method obtains 93.2% and 46.9% mAP on the CUHK-SYSU and PRW datasets, which exceeds all the existing one-step methods.
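
The two ingredients of the memory-reinforced mechanism described above, a queue of recent identification features and a slow-moving average of the network, can be sketched as follows. This is an illustration under stated assumptions rather than the DMRNet implementation: the class name `MemoryBank`, the function `ema_update`, the capacity, the momentum value, and the toy feature vectors are all hypothetical.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]


def ema_update(slow: Vector, fast: Vector, momentum: float = 0.99) -> Vector:
    """Move the slow network's parameters a small step toward the fast network's."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]


class MemoryBank:
    """Fixed-size FIFO queue of identification features."""

    def __init__(self, capacity: int) -> None:
        self.features: Deque[Vector] = deque(maxlen=capacity)

    def enqueue(self, batch: List[Vector]) -> None:
        # With maxlen set, the oldest entries are dropped automatically.
        self.features.extend(batch)

    def similarities(self, query: Vector) -> List[float]:
        """Dot-product similarity between a query and every stored feature."""
        return [sum(q * k for q, k in zip(query, key)) for key in self.features]


bank = MemoryBank(capacity=3)
bank.enqueue([[1.0, 0.0], [0.0, 1.0]])
bank.enqueue([[1.0, 1.0], [0.5, 0.5]])  # evicts the oldest feature, [1.0, 0.0]
sims = bank.similarities([1.0, 0.0])    # extra pairs beyond the current mini-batch
slow = ema_update([0.0, 0.0], [1.0, 1.0], momentum=0.9)
```

The queue lets a small mini-batch be compared against many more stored features when building similarity pairs, and the slowly updated copy of the network keeps features stored at different training steps comparable.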

preprint2021arXiv

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with sparse keypoints, is learnt and progressively improved over the network feature pyramid for robustness. Then, the difficult pixel grouping problem is cast as an easier, multi-person joint assembling task. By formulating joint association as maximum-weight bipartite matching, a differentiable solution is developed to exploit projected gradient descent and Dykstra's cyclic projection algorithm. This makes our method end-to-end trainable and allows back-propagating the grouping error to directly supervise multi-granularity human representation learning. This is distinguished from current bottom-up human parsers or pose estimators which require sophisticated post-processing or heuristic greedy algorithms. Experiments on three instance-aware human parsing datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.

preprint2021arXiv

Instance-Invariant Domain Adaptive Object Detection via Progressive Disentanglement

Most state-of-the-art methods of object detection suffer from poor generalization ability when the training and test data are from different domains, e.g., with different styles. To address this problem, previous methods mainly use holistic representations to align feature-level and pixel-level distributions of different domains, which may neglect the instance-level characteristics of objects in images. Besides, when transferring detection ability across different domains, it is important to obtain the instance-level features that are domain-invariant, instead of the styles that are domain-specific. Therefore, in order to extract instance-invariant features, we should disentangle the domain-invariant features from the domain-specific features. To this end, a progressive disentangled framework is first proposed to solve domain adaptive object detection. In particular, based on disentangled learning used for feature decomposition, we devise two disentangled layers to decompose domain-invariant and domain-specific features. The instance-invariant features are then extracted based on the domain-invariant features. Finally, to enhance the disentanglement, a three-stage training mechanism including multiple loss functions is devised to optimize our model. In the experiment, we verify the effectiveness of our method on three domain-shift scenes. Our method is respectively 2.3%, 3.6%, and 4.0% higher than the baseline method.

preprint2021arXiv

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

People can easily imagine the potential sound while seeing an event. This natural synchronization between audio and visual signals reveals their intrinsic correlations. To this end, we propose to learn the audio-visual correlations from the perspective of cross-modal generation in a self-supervised manner. The learned correlations can then be readily applied in multiple downstream tasks such as audio-visual cross-modal localization and retrieval. We introduce a novel Variational AutoEncoder (VAE) framework that consists of Multiple encoders and a Shared decoder (MS-VAE) with an additional Wasserstein distance constraint to tackle the problem. Extensive experiments demonstrate that the optimized latent representation of the proposed MS-VAE can effectively learn the audio-visual correlations and can be readily applied in multiple audio-visual downstream tasks to achieve competitive performance even without any given label information during training.

preprint2021arXiv

Learning to Anticipate Egocentric Actions by Imagination

Anticipating actions before they are executed is crucial for a wide range of practical applications, including autonomous driving and robotics. In this paper, we study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos. Previous approaches focus on summarizing the observed content and directly predicting the future action based on past observations. We believe action anticipation would benefit if we could mine some cues to compensate for the missing information of the unobserved frames. We then propose to decompose action anticipation into a series of future feature predictions. We imagine how the visual feature changes in the near future and then predict future action labels based on these imagined representations. Unlike prior work, our ImagineRNN is optimized in a contrastive learning way instead of feature regression. We utilize a proxy task to train the ImagineRNN, i.e., selecting the correct future states from distractors. We further improve ImagineRNN by residual anticipation, i.e., changing its target to predicting the feature difference of adjacent frames instead of the frame content. This encourages the network to focus on our target, i.e., the future action, as the difference between adjacent frame features is more important for forecasting the future. Extensive experiments on two large-scale egocentric action datasets validate the effectiveness of our method. Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
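
The proxy task of selecting the correct future state from distractors, combined with a residual target, can be sketched with a generic softmax contrastive loss. This is a hedged illustration and not the ImagineRNN training code: the dot-product scoring, the temperature, the function names, and the toy feature vectors are assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(predicted: Vector, candidates: List[Vector], positive: int,
                     temperature: float = 1.0) -> float:
    """Cross-entropy over similarity scores; the true future state is the positive."""
    logits = [dot(predicted, c) / temperature for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[positive]


def residual_target(current: Vector, future: Vector) -> Vector:
    """Residual anticipation: predict the feature difference, not the frame content."""
    return [f - c for c, f in zip(current, future)]


current, future = [1.0, 0.0], [1.0, 2.0]
target = residual_target(current, future)
distractors = [[2.0, 0.0], [0.0, -2.0]]
candidates = [target] + distractors  # index 0 is the correct future state

good = contrastive_loss([0.0, 2.0], candidates, positive=0)  # prediction matches target
bad = contrastive_loss([2.0, 0.0], candidates, positive=0)   # prediction matches a distractor
```

A prediction aligned with the true residual yields a much lower loss than one aligned with a distractor, so the model is trained to rank the correct future state first rather than to regress its exact feature values.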

preprint2021arXiv

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation

Existing image segmentation networks mainly leverage large-scale labeled datasets to attain high accuracy. However, labeling medical images is very expensive since it requires sophisticated expert knowledge. Thus, it is more desirable to employ only a few labeled data in pursuing high segmentation performance. In this paper, we develop a data augmentation method for one-shot brain magnetic resonance imaging (MRI) image segmentation which exploits only one labeled MRI image (named atlas) and a few unlabeled images. In particular, we propose to learn the probability distributions of deformations (including shapes and intensities) of different unlabeled MRI images with respect to the atlas via 3D variational autoencoders (VAEs). In this manner, our method is able to exploit the learned distributions of image deformations to generate new authentic brain MRI images, and the number of generated samples will be sufficient to train a deep segmentation network. Furthermore, we introduce a new standard segmentation benchmark to evaluate the generalization performance of a segmentation network through a cross-dataset setting (collected from different sources). Extensive experiments demonstrate that our method outperforms the state-of-the-art one-shot medical segmentation methods. Our code has been released at https://github.com/dyh127/Modeling-the-Probabilistic-Distribution-of-Unlabeled-Data.

preprint2021arXiv

One-Shot Neural Architecture Search via Self-Evaluated Template Network

Neural architecture search (NAS) aims to automate the search procedure of architecture instead of manual design. Even if recent NAS approaches finish the search within days, lengthy training is still required for a specific architecture candidate to get the parameters for its accurate evaluation. Recently one-shot NAS methods are proposed to largely squeeze the tedious training process by sharing parameters across candidates. In this way, the parameters for each candidate can be directly extracted from the shared parameters instead of training them from scratch. However, they have no sense of which candidate will perform better until evaluation so that the candidates to evaluate are randomly sampled and the top-1 candidate is considered the best. In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates. SETN consists of two components: (1) an evaluator, which learns to indicate the probability of each individual architecture being likely to have a lower validation loss. The candidates for evaluation can thus be selectively sampled according to this evaluator. (2) a template network, which shares parameters among all candidates to amortize the training cost of generated candidates. In experiments, the architecture found by SETN achieves state-of-the-art performance on CIFAR and ImageNet benchmarks within comparable computation costs. Code is publicly available on GitHub: https://github.com/D-X-Y/AutoDL-Projects.
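
The idea of sampling candidates according to an evaluator, instead of uniformly at random, can be sketched as below. This is a simplified illustration and not the SETN code: it assumes the evaluator emits one scalar score per whole architecture, converts the scores to probabilities with a softmax, and samples with replacement. The architecture names, scores, and validation losses are hypothetical.

```python
import math
import random
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_candidates(scores: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k candidates (with replacement) with probability softmax(score)."""
    names = list(scores)
    probs = softmax([scores[n] for n in names])
    return rng.choices(names, weights=probs, k=k)


# Hypothetical evaluator scores: higher means "more likely to reach a low validation loss".
scores = {"arch_a": 2.0, "arch_b": 0.0, "arch_c": -2.0}
# Hypothetical validation losses measured with the shared template-network weights.
val_loss = {"arch_a": 0.31, "arch_b": 0.35, "arch_c": 0.52}

picked = sample_candidates(scores, k=5, rng=random.Random(0))
best = min(set(picked), key=val_loss.__getitem__)  # keep the best evaluated candidate
```

Because high-scoring architectures are drawn more often, the small set that is actually evaluated is more likely to contain a competitive candidate than a uniformly sampled set of the same size.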

preprint2021arXiv

Sketch-Guided Scenery Image Outpainting

The outpainting results produced by existing approaches are often too random to meet users' requirements. In this work, we take image outpainting one step forward by allowing users to obtain customized outpainting results using sketches as guidance. To this end, we propose an encoder-decoder based network to conduct sketch-guided outpainting, where two alignment modules are adopted to make the generated content realistic and consistent with the provided sketches. First, we apply a holistic alignment module to make the synthesized part similar to the real one from a global view. Second, we reversely produce the sketches from the synthesized part and encourage them to be consistent with the ground-truth ones using a sketch alignment module. In this way, the learned generator is forced to pay more attention to fine details and to be sensitive to the guiding sketches. To our knowledge, this work is the first attempt to explore the challenging yet meaningful task of conditional scenery image outpainting. We conduct extensive experiments on two collected benchmarks to qualitatively and quantitatively validate the effectiveness of our approach compared with other state-of-the-art generative models.

preprint2021arXiv

Supervision by Registration and Triangulation for Landmark Detection

We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from the massive amounts of freely available unlabeled data and not be limited by the quality and quantity of manual human annotations. To utilize unlabeled data, we rely on two key observations: (1) the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (2) the detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that do not require manual labeling, so they can be leveraged to augment existing training data during detector training. End-to-end training is made possible by differentiable registration and 3D triangulation modules. Experiments with 11 datasets and a newly proposed metric to measure precision demonstrate accuracy and precision improvements in landmark detection on both images and video. Code is available at https://github.com/D-X-Y/landmark-detection.

preprint2021arXiv

Toggling Near-field Directionality via Polarization Control of Surface Waves

Directional excitation of guidance modes is central to many applications ranging from light harvesting, optical information processing to quantum optical technology. Of paramount interest, especially, the active control of near-field directionality provides a new paradigm for the real-time on-chip manipulation of light. Here we find that for a given dipolar source, its near-field directionality can be toggled efficiently via tailoring the polarization of surface waves that are excited, for example, via tuning the chemical potential of graphene in a graphene-metasurface waveguide. This finding enables a feasible scheme for the active near-field directionality. Counterintuitively, we reveal that this scheme can transform a circular electric/magnetic dipole into a Huygens dipole in the near-field coupling. Moreover, for Janus dipoles, this scheme enables us to actively flip their near-field coupling and non-coupling faces.

preprint2020arXiv

Adaptive Exploration for Unsupervised Person Re-Identification

Due to domain bias, directly deploying a deep person re-identification (re-ID) model trained on one dataset often yields considerably poor accuracy on another dataset. In this paper, we propose an Adaptive Exploration (AE) method to address the domain-shift problem for re-ID in an unsupervised manner. Specifically, in the target domain, the re-ID model is induced to 1) maximize distances between all person images and 2) minimize distances between similar person images. In the first case, by treating each person image as an individual class, a non-parametric classifier with a feature memory is exploited to encourage person images to move far away from each other. In the second case, according to a similarity threshold, our method adaptively selects neighbors for each person image in the feature space. By treating these similar person images as the same class, the non-parametric classifier forces them to stay closer. However, a problem of the adaptive selection is that, when an image has too many neighbors, it is more likely to attract other images as its neighbors. As a result, a minority of images may select a large number of neighbors while a majority of images have only a few. To address this issue, we additionally integrate a balance strategy into the adaptive selection. We evaluate our method with two protocols. The first one is called "target-only re-ID", in which only the unlabeled target data is used for training. The second one is called "domain adaptive re-ID", in which both the source data and the target data are used during training. Experimental results on large-scale re-ID datasets demonstrate the effectiveness of our method. Our code has been released at https://github.com/dyh127/Adaptive-Exploration-for-Unsupervised-Person-Re-Identification.
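
The threshold-based selection step described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity over non-zero feature vectors; the function names, the toy features and the threshold value are hypothetical, and the balance strategy mentioned in the abstract is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_neighbors(features, threshold):
    """For each feature, list the indices of other features above the similarity threshold."""
    neighbors = []
    for i, f in enumerate(features):
        neighbors.append([j for j, g in enumerate(features)
                          if j != i and cosine(f, g) > threshold])
    return neighbors

# Toy 2-D features: the first two images are similar, the third is not.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbors = select_neighbors(features, threshold=0.8)
```

Here the first two images select each other and the third selects nothing, which is the kind of imbalance in neighbor counts that the abstract's balance strategy is meant to address.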

preprint2020arXiv

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

We address the problem of One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample is available when learning to adapt. This setting is realistic but more challenging, and conventional adaptation approaches are prone to failure due to the scarcity of unlabeled target data. To this end, we propose a novel Adversarial Style Mining (ASM) approach, which combines a style transfer module and a task-specific module in an adversarial manner. Specifically, the style transfer module iteratively searches for harder stylized images around the one-shot target sample according to the current learning state, leading the task model to explore the potential styles that are difficult to solve in the almost unseen target domain, thus boosting the adaptation performance in a data-scarce scenario. The adversarial learning framework makes the style transfer module and the task-specific module benefit each other during the competition. Extensive experiments on both cross-domain classification and segmentation benchmarks verify that ASM achieves state-of-the-art adaptation performance under the challenging one-shot setting.

preprint2020arXiv

Analytic Study of Magnetic Catalysis in Holographic QCD

We explore the effect of the magnetic field on the QCD phase transition through the AdS/CFT correspondence. By introducing an anisotropic magnetic field in the Einstein-Maxwell-Scalar system, a family of analytic solutions is obtained by the potential reconstruction method, where the contribution of the magnetic field in the blackening background can be analytically derived. After fixing the kinetic gauge function by requiring a linear Regge spectrum of $J/ψ$ mesons, the contribution of the magnetic field to the phase diagram can be demonstrated. The results show that the transition temperature rises as the magnetic field increases, which is the so-called magnetic catalysis effect. However, if the magnetic field is strong enough, the transition temperature decreases, displaying the inverse catalysis effect.

preprint2020arXiv

Analytic Study on Chiral Phase Transition in Holographic QCD

The chiral symmetry breaking ($χ_{SB}$) is one of the most fundamental problems in QCD. In this paper, we calculate quark condensation analytically in a holographic QCD model dual to the Einstein-Maxwell-Dilaton (EMD) system coupled to a probe scalar field. We find that the black hole phase transition in the EMD system seriously affects $χ_{SB}$. At small chemical potential, $χ_{SB}$ behaves as a crossover. For large chemical potential $μ>μ_c$, $χ_{SB}$ becomes first order with exactly the same transition temperature as the black hole phase transition by a bypass mechanism. The phase diagram we obtained is qualitatively consistent with the recent results from lattice QCD simulations and NJL models.

preprint2020arXiv

Angle-Based Cost-Sensitive Multicategory Classification

Many real-world classification problems come with costs that vary across different types of misclassification. It is thus important to develop cost-sensitive classifiers that minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to construct K classification functions for a K-class problem and remove the redundancy by imposing a sum-to-zero constraint. However, such a method usually results in higher computational complexity and inefficient algorithms. In this paper, we propose a novel angle-based cost-sensitive classification framework for multicategory classification without the sum-to-zero constraint. Loss functions that are included in the angle-based cost-sensitive classification framework are further shown to be Fisher consistent. To show the usefulness of the framework, two cost-sensitive multicategory boosting algorithms are derived as concrete instances. Numerical experiments demonstrate that the proposed boosting algorithms yield competitive classification performance against other existing boosting approaches.
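
As background on how an angle-based classifier can avoid the sum-to-zero constraint, the sketch below builds K unit vectors in R^(K-1) with equal pairwise angles and predicts the class whose vector forms the smallest angle with a (K-1)-dimensional output f(x). This is a common simplex construction in angle-based classification and is assumed here rather than taken from the abstract; the cost-sensitive losses and the boosting algorithms of the paper are not shown, and all names are hypothetical.

```python
import math

def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) whose pairwise inner products all equal -1/(k-1)."""
    if k < 2:
        raise ValueError("need at least two classes")
    dim = k - 1
    first = [1.0 / math.sqrt(dim)] * dim
    shift = -(1.0 + math.sqrt(k)) / dim ** 1.5
    scale = math.sqrt(k / dim)
    vertices = [first]
    for j in range(dim):
        v = [shift] * dim
        v[j] += scale  # add the scaled j-th standard basis vector
        vertices.append(v)
    return vertices

def predict(f_x, vertices):
    """Pick the class whose vertex has the largest inner product with f(x)."""
    scores = [sum(a * b for a, b in zip(f_x, w)) for w in vertices]
    return max(range(len(vertices)), key=scores.__getitem__)

W = simplex_vertices(4)                 # four classes, three-dimensional outputs
label = predict([0.2, -0.1, 0.9], W)    # index of the closest vertex by angle
```

Because only K-1 functions are learned and the class vectors are fixed in advance, no additional constraint is needed to remove redundancy among the classification functions.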

preprint2020arXiv

Collaborative Video Object Segmentation by Foreground-Background Integration

This paper investigates the principles of embedding learning to tackle the challenging task of semi-supervised video object segmentation. Unlike previous practices that only explore embedding learning using pixels from the foreground object(s), we argue that the background should be treated equally and thus propose the Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach. Our CFBI implicitly imposes the feature embeddings of the target foreground object and its corresponding background to be contrastive, improving the segmentation results accordingly. With the feature embeddings from both foreground and background, our CFBI performs the matching process between the reference and the predicted sequence at both the pixel and instance levels, making CFBI robust to various object scales. We conduct extensive experiments on three popular benchmarks, i.e., DAVIS 2016, DAVIS 2017, and YouTube-VOS. Our CFBI achieves a performance (J&F) of 89.4%, 81.9%, and 81.4%, respectively, outperforming all the other state-of-the-art methods. Code: https://github.com/z-x-yang/CFBI.

preprint2020arXiv

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

With the arising concerns for the AI systems provided with direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources. To this end, in this paper, we introduce a new task called video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames. Specifically, one of the intelligent agents - Q-BOT - is given two static frames from the beginning and the end of the video, as well as a finite number of opportunities to ask relevant natural language questions before describing the unseen video. A-BOT, the other agent who has already seen the entire video, assists Q-BOT to accomplish the goal by providing answers to those questions. We propose a QA-Cooperative Network with a dynamic dialog history update learning mechanism to transfer knowledge from A-BOT to Q-BOT, thus helping Q-BOT to better describe the video. Extensive experiments demonstrate that Q-BOT can effectively learn to describe an unseen video by the proposed model and the cooperative learning method, achieving the promising performance where Q-BOT is given the full ground truth history dialog.

preprint2020arXiv

Dialog Intent Induction with Deep Multi-View Clustering

We introduce the dialog intent induction task and present a novel deep multi-view clustering approach to tackle the problem. Dialog intent induction aims at discovering user intents from user query utterances in human-human conversations such as dialogs between customer support agents and customers. Motivated by the intuition that a dialog intent is not only expressed in the user query utterance but also captured in the rest of the dialog, we split a conversation into two independent views and exploit multi-view clustering techniques for inducing the dialog intent. In particular, we propose alternating-view k-means (AV-KMEANS) for joint multi-view representation learning and clustering analysis. The key innovation is that the instance-view representations are updated iteratively by predicting the cluster assignment obtained from the alternative view, so that the multi-view representations of the instances lead to similar cluster assignments. Experiments on two public datasets show that AV-KMEANS can induce better dialog intent clusters than state-of-the-art unsupervised representation learning methods and standard multi-view clustering approaches.

preprint2020arXiv

Direct measurement of temporal correlations above the spin-glass transition by coherent resonant magnetic x-ray spectroscopy

In the 1970s a new paradigm was introduced that interacting quenched systems, such as a spin glass, have a phase transition in which long-time memory of spatial patterns is realized without spatial correlations. The principal methods to study the spin-glass transition, besides some elaborate and elegant theoretical constructions, have been numerical computer simulations and neutron spin echo measurements. We show here that the dynamical correlations of the spin-glass transition are embedded in measurements of the four-spin correlations at very long times. This information is directly available in the temporal correlations of the intensity, which encode the spin-orientation memory, obtained by the technique of resonant magnetic x-ray photon correlation spectroscopy (RM-XPCS). We have implemented this method to observe and accurately characterize the critical slowing down of the spin orientation fluctuations in the classic metallic spin-glass alloy Cu(Mn) over time scales of 1 to 1000 seconds. Our method opens the way for studying phase transitions in systems such as spin ices and quantum spin liquids, as well as the structural glass transition.

preprint2020arXiv

DONet: Dual Objective Networks for Skin Lesion Segmentation

Skin lesion segmentation is a crucial step in the computer-aided diagnosis of dermoscopic images. In the last few years, deep learning based semantic segmentation methods have significantly advanced skin lesion segmentation results. However, the current performance is still unsatisfactory due to challenging factors such as the large variety of lesion scales and the ambiguous difference between the lesion region and the background. In this paper, we propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve skin lesion segmentation. Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives. Concretely, the two objectives are defined by different loss functions. In this way, the two decoders are encouraged to produce differentiated probability maps to match different optimization targets, resulting in complementary predictions. The complementary information learned by these two objectives is further aggregated to make the final prediction, by which the uncertainty existing in segmentation maps can be significantly alleviated. Besides, to address the challenge of the large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM) to model the complex correlation among skin lesions, where the features with different scale contexts are efficiently integrated to form a more robust representation. Extensive experiments on two popular benchmarks demonstrate the effectiveness of the proposed DONet. In particular, our DONet achieves Dice scores of 0.881 and 0.931 on ISIC 2018 and $\text{PH}^2$, respectively. Code will be made publicly available.

preprint2020arXiv

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

Though action recognition in videos has achieved great success recently, it remains a challenging task due to its massive computational cost. Designing lightweight networks is a possible solution, but it may degrade the recognition performance. In this paper, we propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos. The dynamic inference approach can be realized along the network depth, along the number of input video frames, or even in a joint input-wise and network depth-wise manner. In a nutshell, we treat the input frames and the network depth of the computational graph as a 2-dimensional grid, and several checkpoints, each with a prediction module, are placed on this grid in advance. The inference is carried out progressively on the grid by following some predefined route. Whenever the inference process comes across a checkpoint, an early prediction can be made depending on whether the early stop criterion is met. For proof-of-concept purposes, we instantiate three dynamic inference frameworks using two well-known backbone CNNs. In these instances, we overcome the drawback of limited temporal coverage resulting from an early prediction by a novel frame permutation scheme, and alleviate the conflict between progressive computation and video temporal relation modeling by introducing an online temporal shift module. Extensive experiments are conducted to thoroughly analyze the effectiveness of our ideas and to inspire future research efforts. Results on various datasets also demonstrate the superiority of our approach.
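
The checkpoint-and-route idea in this abstract can be sketched as a simple early-exit loop. The example below is illustrative only: `predict_at` stands in for a prediction module attached to a checkpoint, the stop criterion (maximum class probability reaching a threshold) is an assumption, and the toy predictor is not a real network.

```python
def dynamic_inference(route, predict_at, threshold):
    """Walk checkpoints in order and stop at the first confident prediction.

    route: non-empty list of (num_frames, depth) checkpoints, cheapest first.
    predict_at: hypothetical callable returning class probabilities at a checkpoint.
    """
    if not route:
        raise ValueError("route must contain at least one checkpoint")
    probs, checkpoint = None, None
    for checkpoint in route:
        probs = predict_at(*checkpoint)
        if max(probs) >= threshold:  # assumed early-stop criterion
            break
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, checkpoint

def toy_predict(num_frames, depth):
    """Toy stand-in for a network: confidence grows with frames and depth."""
    confidence = min(0.5 + 0.05 * num_frames * depth, 0.99)
    return [1.0 - confidence, confidence]

route = [(2, 1), (2, 2), (4, 2), (8, 4)]
label, stopped_at = dynamic_inference(route, toy_predict, threshold=0.85)
```

With this toy predictor, inference stops at the third checkpoint, so the most expensive setting (8 frames at depth 4) is never evaluated. If no checkpoint is confident enough, the loop falls through and returns the prediction from the last one.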

preprint2020arXiv

Early Ultra-Violet observations of type IIn supernovae constrain the asphericity of their circumstellar material

We present a survey of the early evolution of 12 Type IIn supernovae (SNe IIn) in the Ultra-Violet (UV) and visible light. We use this survey to constrain the geometry of the circumstellar material (CSM) surrounding SN IIn explosions, which may shed light on their progenitor diversity. In order to distinguish between aspherical and spherical circumstellar material (CSM), we estimate the blackbody radius temporal evolution of the SNe IIn of our sample, following the method introduced by Soumagnac et al. We find that higher luminosity objects tend to show evidence for aspherical CSM. Depending on whether this correlation is due to physical reasons or to some selection bias, we derive a lower limit between 35% and 66% on the fraction of SNe IIn showing evidence for aspherical CSM. This result suggests that asphericity of the CSM surrounding SNe IIn is common - consistent with data from resolved images of stars undergoing considerable mass loss. It should be taken into account for more realistic modelling of these events.

preprint2020arXiv

FinBERT: A Pretrained Language Model for Financial Communications

Contextual pretrained language models, such as BERT (Devlin et al., 2019), have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled text resources. The financial sector also accumulates a large amount of financial communication text. However, there are no pretrained finance-specific language models available. In this work, we address this need by pretraining a financial domain-specific BERT model, FinBERT, using large-scale financial communication corpora. Experiments on three financial sentiment classification tasks confirm the advantage of FinBERT over the generic domain BERT model. The code and pretrained models are available at https://github.com/yya518/FinBERT. We hope this will be useful for practitioners and researchers working on financial NLP tasks.

preprint2020arXiv

Gated Channel Transformation for Visual Recognition

In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine whether neurons compete or cooperate, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and the computational complexity. This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without adding many parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics.

preprint2020arXiv

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

In this paper, we present an approach, namely Lexical Semantic Image Completion (LSIC), that may have potential applications in art, design, and heritage conservation, among several others. Existing image completion procedure is highly subjective by considering only visual context, which may trigger unpredictable results which are plausible but not faithful to a grounded knowledge. To permit both grounded and controllable completion process, we advocate generating results faithful to both visual and lexical semantic context, i.e., the description of leaving holes or blank regions in the image (e.g., hole description). One major challenge for LSIC comes from modeling and aligning the structure of visual-semantic context and translating across different modalities. We term this process as structure completion, which is realized by multi-grained reasoning blocks in our model. Another challenge relates to the unimodal biases, which occurs when the model generates plausible results without using the textual description. This can be true since the annotated captions for an image are often semantically equivalent in existing datasets, and thus there is only one paired text for a masked image in training. We devise an unsupervised unpaired-creation learning path besides the over-explored paired-reconstruction path, as well as a multi-stage training strategy to mitigate the insufficiency of labeled data. We conduct extensive quantitative and qualitative experiments as well as ablation studies, which reveal the efficacy of our proposed LSIC.

preprint2020arXiv

Inter-Image Communication for Weakly Supervised Localization

Weakly supervised localization aims at finding target object regions using only image-level supervision. However, localization maps extracted from classification networks are often not accurate due to the lack of fine pixel-level supervision. In this paper, we propose to leverage pixel-level similarities across different objects for learning more accurate object locations in a complementary way. Particularly, two kinds of constraints are proposed to prompt the consistency of object features within the same categories. The first constraint is to learn the stochastic feature consistency among discriminative pixels that are randomly sampled from different images within a batch. The discriminative information embedded in one image can be leveraged to benefit its counterpart with inter-image communication. The second constraint is to learn the global consistency of object features throughout the entire dataset. We learn a feature center for each category and realize the global feature consistency by forcing the object features to approach class-specific centers. The global centers are actively updated with the training process. The two constraints can benefit each other to learn consistent pixel-level features within the same categories, and finally improve the quality of localization maps. We conduct extensive experiments on two popular benchmarks, i.e., ILSVRC and CUB-200-2011. Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set, surpassing the current state-of-the-art method by a large margin. The code is available at https://github.com/xiaomengyc/I2C.
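
The second constraint in this abstract (class-specific feature centers that are updated during training and that object features are pulled toward) can be sketched as follows. The moving-average update rule, the squared-distance loss, and all names and values are assumptions for illustration; the abstract does not specify them.

```python
def update_centers(centers, features, labels, momentum=0.9):
    """Move each class center toward the mean of that class's features in the batch."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, value in enumerate(f):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    for y, acc in sums.items():
        batch_mean = [s / counts[y] for s in acc]
        old = centers.get(y, batch_mean)  # a new class starts at its batch mean
        centers[y] = [momentum * o + (1.0 - momentum) * m
                      for o, m in zip(old, batch_mean)]
    return centers

def center_loss(centers, features, labels):
    """Mean squared distance between each feature and its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - c) ** 2 for a, c in zip(f, centers[y]))
    return total / len(features)

centers = {0: [0.0, 0.0]}
features = [[1.0, 1.0], [3.0, 1.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = update_centers(centers, features, labels, momentum=0.5)
loss = center_loss(centers, features, labels)
```

Minimizing a loss of this form pulls features of the same category toward a shared center across the whole dataset, which is the global consistency the abstract describes.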

preprint2020arXiv

Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer

Nowadays, deep learning techniques are widely used for lane detection, but application in low-light conditions remains a challenge to this day. Although multi-task learning and contextual-information-based methods have been proposed to solve the problem, they require additional manual annotations or introduce extra inference overhead, respectively. In this paper, we propose a style-transfer-based data enhancement method, which uses Generative Adversarial Networks (GANs) to generate images in low-light conditions and thereby increases the environmental adaptability of the lane detector. Our solution consists of three parts: the proposed SIM-CycleGAN, light conditions style transfer, and a lane detection network. It requires neither additional manual annotations nor extra inference overhead. We validated our method on the lane detection benchmark CULane using ERFNet. Empirically, the lane detection model trained using our method demonstrated adaptability in low-light conditions and robustness in complex scenarios. Our code for this paper will be publicly available.

preprint2020arXiv

Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning

We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset. Our framework considers cooperative optimization of shared weights between models for source and target tasks, and adjusts the constituent loss weights adaptively. The adaptation of the weights is based on a reinforcement learning (RL) selection policy, guided with a performance metric on the target validation set. We demonstrate that L2TL outperforms fine-tuning baselines and other adaptive transfer learning methods on eight datasets. In the regimes of small-scale target datasets and significant label mismatch between source and target datasets, L2TL shows particularly large benefits.

preprint2020arXiv

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

Interactive video object segmentation (iVOS) aims at efficiently obtaining high-quality segmentation masks of the target object in a video with user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS task in a more efficient way. Our MA-Net integrates the interaction and the propagation operations into a single network, which significantly improves the efficiency of iVOS in the scheme of multi-round interactions. More importantly, we propose a simple yet effective memory aggregation mechanism to record the informative knowledge from the previous interaction rounds, greatly improving the robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, our MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming the state-of-the-art methods by more than 2.7%.

preprint2020arXiv

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search

Neural architecture search (NAS) has achieved breakthrough success in a great number of applications in the past few years. It could be time to take a step back and analyze the good and bad aspects of the field of NAS. A variety of algorithms search architectures under different search spaces. These searched architectures are trained using different setups, e.g., hyper-parameters, data augmentation, regularization. This raises a comparability problem when comparing the performance of various NAS algorithms. NAS-Bench-101 has shown success in alleviating this problem. In this work, we propose an extension to NAS-Bench-101: NAS-Bench-201, with a different search space, results on multiple datasets, and more diagnostic information. NAS-Bench-201 has a fixed search space and provides a unified benchmark for almost any up-to-date NAS algorithm. The design of our search space is inspired by the one used in the most popular cell-based searching algorithms, where a cell is represented as a DAG. Each edge here is associated with an operation selected from a predefined operation set. To be applicable to all NAS algorithms, the search space defined in NAS-Bench-201 includes all possible architectures generated by 4 nodes and 5 associated operation options, which results in 15,625 candidates in total. The training log and the performance for each architecture candidate are provided for three datasets. This allows researchers to avoid unnecessary repetitive training for selected candidates and focus solely on the search algorithm itself. The training time saved for every candidate also largely improves the efficiency of many methods. We provide additional diagnostic information such as fine-grained loss and accuracy, which can inspire new designs of NAS algorithms. In further support, we have analyzed the benchmark from many aspects and benchmarked 10 recent NAS algorithms.
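
The figure of 15,625 candidates can be checked with a short enumeration. The sketch assumes that the 4 nodes form a densely connected DAG, with one edge from every earlier node to every later node (6 edges), and that each edge independently takes one of 5 operations, so the count is 5^6. The operation names are illustrative placeholders, not taken from the abstract.

```python
from itertools import product

NUM_NODES = 4
# Illustrative operation names; only the number of options (5) comes from the abstract.
OPERATIONS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]

# Assumption: every node receives an edge from each earlier node (densely connected DAG).
edges = [(src, dst) for dst in range(NUM_NODES) for src in range(dst)]

def enumerate_cells():
    """Yield each cell as a dict mapping an edge (src, dst) to its operation."""
    for ops in product(OPERATIONS, repeat=len(edges)):
        yield dict(zip(edges, ops))

num_cells = sum(1 for _ in enumerate_cells())  # 5 ** 6 = 15625
```

A search space this small can be enumerated exhaustively, which is what makes it practical to train every candidate once and publish the results as a lookup table.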

preprint2020arXiv

One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets

Named entity recognition (NER) is a fundamental component in the modern language understanding pipeline. Public NER resources such as annotated data and model services are available in many domains. However, given a particular downstream application, there is often no single NER resource that supports all the desired entity types, so users must leverage multiple resources with different tag sets. This paper presents a marginal distillation (MARDI) approach for training a unified NER model from resources with disjoint or heterogeneous tag sets. In contrast to recent works, MARDI merely requires access to pre-trained models rather than the original training datasets. This flexibility makes it easier to work with sensitive domains like healthcare and finance. Furthermore, our approach is general enough to integrate with different NER architectures, including local models (e.g., BiLSTM) and global models (e.g., CRF). Experiments on two benchmark datasets show that MARDI performs on par with a strong marginal CRF baseline, while being more flexible in the form of required NER resources. MARDI also significantly outperforms the state-of-the-art model on the progressive NER task.
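
One way to read "marginal distillation" for a single token is sketched below: the unified student's probabilities over all tags are collapsed onto a teacher's smaller tag set, and the collapsed distribution is compared with the teacher's output. This is a simplified interpretation under stated assumptions (a local, per-token model; unknown entity types folded into the teacher's "O" tag; cross-entropy as the loss). The tag names and probabilities are made up, and the CRF case is not covered.

```python
import math

def marginalize(student_probs, teacher_tags):
    """Collapse unified-tag probabilities onto a teacher's tag set.

    Unified tags the teacher does not know are folded into its "O" (outside) tag.
    """
    marginal = {tag: 0.0 for tag in teacher_tags}
    for tag, p in student_probs.items():
        marginal[tag if tag in marginal else "O"] += p
    return marginal

def distillation_loss(teacher_probs, student_probs):
    """Cross-entropy of the marginalized student distribution against the teacher's."""
    marginal = marginalize(student_probs, teacher_probs.keys())
    return -sum(q * math.log(marginal[tag])
                for tag, q in teacher_probs.items() if q > 0)

# Hypothetical per-token distributions: the teacher only knows the PER entity type.
student = {"O": 0.1, "PER": 0.6, "LOC": 0.2, "ORG": 0.1}
teacher_per = {"O": 0.3, "PER": 0.7}
marginal = marginalize(student, teacher_per.keys())
loss = distillation_loss(teacher_per, student)
```

Because the loss only touches the marginal over each teacher's tag set, the student is free to distribute the "O" mass among entity types that other teachers recognize, and only the teachers' predictions are needed, not their training data.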

preprint2020arXiv

OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in An Open World

In this paper, we tackle the problem of discovering new classes in unlabeled visual data given labeled data from disjoint classes. Existing methods typically first pre-train a model with labeled data, and then identify new classes in unlabeled data via unsupervised clustering. However, the labeled data that provide essential knowledge are often underexplored in the second step. The challenge is that the labeled and unlabeled examples are from non-overlapping classes, which makes it difficult to build the learning relationship between them. In this work, we introduce OpenMix to mix the unlabeled examples from an open set and the labeled examples from known classes, where their non-overlapping labels and pseudo-labels are simultaneously mixed into a joint label distribution. OpenMix dynamically compounds examples in two ways. First, we produce mixed training images by incorporating labeled examples with unlabeled examples. With the benefits of unique prior knowledge in novel class discovery, the generated pseudo-labels will be more credible than the original unlabeled predictions. As a result, OpenMix helps to prevent the model from overfitting on unlabeled samples that may be assigned with wrong pseudo-labels. Second, the first way encourages the unlabeled examples with high class-probabilities to have considerable accuracy. We introduce these examples as reliable anchors and further integrate them with unlabeled samples. This enables us to generate more combinations in unlabeled examples and exploit finer object relations among the new classes. Experiments on three classification datasets demonstrate the effectiveness of the proposed OpenMix, which is superior to state-of-the-art methods in novel class discovery.

preprint2020arXiv

Progressive Local Filter Pruning for Image Retrieval Acceleration

This paper focuses on network pruning for image retrieval acceleration. Prevailing image retrieval works target discriminative feature learning, while little attention is paid to how to accelerate model inference, which should be taken into consideration in real-world practice. The challenge of pruning image retrieval models is that the middle-level feature should be preserved as much as possible. Such different requirements of retrieval and classification models make traditional pruning methods poorly suited to our task. To solve the problem, we propose a new Progressive Local Filter Pruning (PLFP) method for image retrieval acceleration. Specifically, layer by layer, we analyze the local geometric properties of each filter and select the one that can be replaced by the neighbors. Then we progressively prune the filter by gradually changing the filter weights. In this way, the representation ability of the model is preserved. To verify this, we evaluate our method on two widely-used image retrieval datasets, i.e., Oxford5k and Paris6K, and one person re-identification dataset, i.e., Market-1501. The proposed method achieves superior performance to conventional pruning methods, suggesting the effectiveness of the proposed method for image retrieval.

preprint2020arXiv

QCD Phase Diagram by Holography

We explore the QCD phase diagram by constructing a holographic QCD model using the Einstein-Maxwell-Scalar system. The chiral transition is investigated by adding a probe scalar, and the confinement transition is studied by adding a probe string into the system. By interpreting the black hole phase transition in the bulk spacetime as the quarkyonic transition in the dual QCD theory and introducing the bypass mechanism for the deconfinement transition, we explain why the chiral symmetry breaking and deconfinement transition lines coincide with each other despite their different physical origins.

preprint2020arXiv

Query-efficient Meta Attack to Deep Neural Networks

Black-box attack methods aim to infer suitable attack patterns to targeted DNN models by only using output feedback of the models and the corresponding input queries. However, due to the lack of a prior and inefficiency in leveraging the query and feedback information, existing methods are mostly query-intensive for obtaining effective attack patterns. In this work, we propose a meta attack approach that is capable of attacking a targeted model with far fewer queries. Its high query efficiency stems from effective utilization of meta learning approaches in learning a generalizable prior abstraction from the previously observed attack patterns and exploiting such a prior to help infer attack patterns from only a few queries and outputs. Extensive experiments on MNIST, CIFAR10 and tiny-Imagenet demonstrate that our meta-attack method can remarkably reduce the number of model queries without sacrificing the attack performance. Besides, the obtained meta attacker is not restricted to a particular model but can be used easily with a fast adaptive ability to attack a variety of models. The code of our work is available at https://github.com/dydjw9/MetaAttack_ICLR2020/.

preprint2020arXiv

Research on the new form of higher-order generalized uncertainty principle in quantum system

This paper proposes a new high-order generalized uncertainty principle, which can modify the momentum operator and position operator simultaneously. Moreover, the new form of GUP is consistent with the viewpoint of the existence of the minimum length uncertainty and the maximum observable momentum proposed by the mainstream quantum gravity theory. By using the new GUP, the maximum localization state and position eigenfunction are discussed, and the corresponding conclusions are compared with the existing literature. The harmonic oscillator is further discussed at the end of this article as an example.

preprint2020arXiv

Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps

Recently, remarkable progress has been made in weakly supervised object localization (WSOL) to promote object localization maps. The common practice of evaluating these maps is indirect and coarse, i.e., obtaining tight bounding boxes which can cover high-activation regions and calculating intersection-over-union (IoU) scores between the predicted and ground-truth boxes. This measurement can evaluate the ability of localization maps to some extent, but we argue that the maps should be measured directly and delicately, i.e., comparing the maps with the ground-truth object masks pixel-wise. To fulfill the direct evaluation, we annotate pixel-level object masks on the ILSVRC validation set. We propose to use IoU-Threshold curves for evaluating the real quality of localization maps. Beyond the amended evaluation metric and annotated object masks, this work also introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision. We propose a two-stage approach to generate the localization maps by simply comparing the similarity of point-wise features between the high-activation pixels and the remaining pixels. Based on the predicted localization maps, we explore estimating object boundaries on a very large dataset. A hard-negative suppression loss is proposed for obtaining fine boundaries. We conduct extensive experiments on the ILSVRC and CUB benchmarks. In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC. The code and the annotated masks are released at https://github.com/xiaomengyc/SEM.
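
A minimal sketch of the pixel-wise evaluation idea described in this abstract: binarize a localization map at a threshold, compute IoU against a ground-truth mask, and sweep the threshold to obtain an IoU-Threshold curve. The function names, the toy map and the threshold grid are illustrative assumptions, not the released evaluation code.

```python
def iou_at_threshold(loc_map, gt_mask, threshold):
    """Pixel-wise IoU between a thresholded localization map and a binary mask."""
    intersection = 0
    union = 0
    for map_row, mask_row in zip(loc_map, gt_mask):
        for value, truth in zip(map_row, mask_row):
            predicted = value >= threshold
            actual = bool(truth)
            intersection += predicted and actual
            union += predicted or actual
    # Convention chosen here: an empty union scores 0.0.
    return intersection / union if union else 0.0


def iou_threshold_curve(loc_map, gt_mask, thresholds):
    """Return (threshold, IoU) pairs, one per threshold."""
    return [(t, iou_at_threshold(loc_map, gt_mask, t)) for t in thresholds]


# Toy 2x2 example: the object occupies the top row.
loc_map = [[0.9, 0.6], [0.2, 0.1]]
gt_mask = [[1, 1], [0, 0]]
curve = iou_threshold_curve(loc_map, gt_mask, [0.15, 0.5, 0.8])
```

In this toy case the IoU is 1.0 at threshold 0.5 and drops to 0.5 at threshold 0.8, where only one of the two object pixels survives thresholding.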

preprint2020arXiv

Revisiting EmbodiedQA: A Simple Baseline and Beyond

In Embodied Question Answering (EmbodiedQA), an agent interacts with an environment to gather necessary information for answering user questions. Existing works have laid a solid foundation towards solving this interesting problem. But the current performance, especially in navigation, suggests that EmbodiedQA might be too challenging for the contemporary approaches. In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that achieves promising performance; 2) an easier and practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions. In this new setting, we randomly place a few objects in new environments, and upgrade the agent policy by a distillation network to retain the generalization ability from the trained model. On the EmbodiedQA v1 benchmark, under the standard setting, our simple baseline achieves results very competitive with the state of the art; in the new setting, we find that the introduced small change in settings yields a notable gain in navigation.

preprint2020arXiv

SCExAO/CHARIS High-Contrast Imaging of Spirals and Darkening Features in the HD 34700 A Protoplanetary Disk

We present Subaru/SCExAO+CHARIS broadband ($JHK$-band) integral field spectroscopy of HD 34700 A. CHARIS data recover HD 34700 A's disk ring and confirm multiple spirals discovered in Monnier et al. (2019). We set limits on substellar companions of $\sim12\ M_{\rm Jup}$ at $0\farcs3$ (in the ring gap) and $\sim5\ M_{\rm Jup}$ at $0\farcs75$ (outside the ring). The data reveal darkening effects on the ring and spiral, although we do not identify the origin of each feature such as shadows or physical features related to the outer spirals. Geometric albedos converted from the surface brightness suggest a higher scale height and/or prominently abundant sub-micron dust at position angles between $\sim45^\circ$ and $90^\circ$. Spiral fitting resulted in very large pitch angles ($\sim30-50^\circ$), and a stellar flyby of HD 34700 B or infall from a possible envelope is perhaps a reasonable scenario to explain the large pitch angles.

preprint2020arXiv

SF-Net: Single-Frame Supervision for Temporal Action Localization

In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL). To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action. This can significantly reduce the labor cost of obtaining full supervision which requires annotating the action boundary. Compared to the weak supervision that only annotates the video-level label, the single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead. To make full use of such single-frame supervision, we propose a unified system called SF-Net. First, we propose to predict an actionness score for each video frame. Along with a typical category score, the actionness score can provide comprehensive information about the occurrence of a potential action and aid the temporal boundary refinement during inference. Second, we mine pseudo action and background frames based on the single-frame annotations. We identify pseudo action frames by adaptively expanding each annotated single frame to its nearby, contextual frames and we mine pseudo background frames from all the unannotated frames across multiple videos. Together with the ground-truth labeled frames, these pseudo-labeled frames are further used for training the classifier. In extensive experiments on THUMOS14, GTEA, and BEOID, SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization. Notably, SF-Net achieves comparable results to its fully-supervised counterpart which requires much more resource intensive annotations. The code is available at https://github.com/Flowerfan/SF-Net.

preprint2020arXiv

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

One-shot image semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this paper, we propose a simple yet effective Similarity Guidance network to tackle the One-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category. To obtain a robust representative feature of the support image, we first adopt a masked average pooling strategy for producing the guidance features by only taking the pixels belonging to the support image into account. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adapted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework which can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves the mIoU score of 46.3%, surpassing the baseline methods.
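
The two operations named in this abstract, masked average pooling and cosine similarity, can be sketched in plain Python as follows. This is an illustrative sketch under simplifying assumptions (features stored as nested lists of shape height x width x channels, a binary support mask at the same resolution); it is not the authors' network, and all function names are hypothetical.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where the mask is non-zero."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, inside in zip(feature_row, mask_row):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [value / count for value in total]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_map(query_features, guidance):
    """Cosine similarity between the guidance vector and every query pixel."""
    return [[cosine_similarity(v, guidance) for v in row] for row in query_features]


# Toy example with 2-channel features.
support = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [3.0, 0.0]]]
support_mask = [[1, 0], [1, 0]]
guidance = masked_average_pooling(support, support_mask)  # [1.0, 0.0]
query = [[[2.0, 0.0], [0.0, 5.0]]]
sim = similarity_map(query, guidance)  # [[1.0, 0.0]]
```

Query pixels whose features point in the same direction as the pooled support feature score close to 1, which is what lets the similarity map act as guidance for segmentation.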

preprint2020arXiv

Similarity-preserving Image-image Domain Adaptation for Person Re-identification

This article studies the domain adaptation problem in person re-identification (re-ID) under a "learning via translation" framework, consisting of two components, 1) translating the labeled images from the source to the target domain in an unsupervised manner, 2) learning a re-ID model using the translated images. The objective is to preserve the underlying human identity information after image translation, so that translated images with labels are effective for feature learning on the target domain. To this end, we propose a similarity preserving generative adversarial network (SPGAN) and its end-to-end trainable version, eSPGAN. Both aiming at similarity preserving, SPGAN enforces this property by heuristic constraints, while eSPGAN does so by optimally facilitating the re-ID model learning. More specifically, SPGAN separately undertakes the two components in the "learning via translation" framework. It first preserves two types of unsupervised similarity, namely, self-similarity of an image before and after translation, and domain-dissimilarity of a translated source image and a target image. It then learns a re-ID model using existing networks. In comparison, eSPGAN seamlessly integrates image translation and re-ID model learning. During the end-to-end training of eSPGAN, re-ID learning guides image translation to preserve the underlying identity information of an image. Meanwhile, image translation improves re-ID learning by providing identity-preserving training samples of the target domain style. In the experiment, we show that identities of the fake images generated by SPGAN and eSPGAN are well preserved. Based on this, we report the new state-of-the-art domain adaptation results on two large-scale person re-ID datasets.

preprint2020arXiv

Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning

A small ISO and a small exposure time are usually used to capture an image in backlit or low-light conditions, which results in an image with negligible motion blur and little noise that nevertheless looks dark. In this paper, a single image brightening algorithm is introduced to brighten such an image. The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times. The virtual images are first generated via intensity mapping functions (IMFs), which are computed using camera response functions (CRFs); this is a model-driven approach. Both virtual images are then enhanced using a data-driven approach, i.e., a residual convolutional neural network, to approach the ground truth images. The model-driven approach and the data-driven one compensate for each other in the proposed hybrid learning framework. The final brightened image is obtained by fusing the original image and the two virtual images via a multi-scale exposure fusion algorithm with properly defined weights. Experimental results show that the proposed brightening algorithm outperforms existing algorithms in terms of the MEF-SSIM metric.

preprint2020arXiv

Sub-micron single-particle perovskite plasmonic nanolasers at room temperature

Plasmonic nanolasers have received substantial interest for their promising applications in integrated photonics, optical sensing, and biomedical imaging. To date, a room-temperature plasmonic nanolaser, submicron in all dimensions, remains elusive in the visible regime due to high metallic losses. Here, we demonstrate single-particle lasing around 2.3 eV with full-submicron, cesium lead bromide perovskite (CsPbBr3) crystals atop polymer-coated gold substrates at room temperature. With a large number (~100) of devices in total, we systematically study the lasing action of plasmonic test and photonic control groups. The smallest plasmonic laser achieved was 0.56 micrometer x 0.58 micrometer x 0.32 micrometer in size, ten-fold smaller than our smallest photonic laser. Key elements to efficient plasmonic lasing are identified as enhanced optical gain by the Purcell effect, long carrier diffusivity, a large spontaneous emission factor, and a high group index. Our results shed light on three-dimensional miniaturization of plasmonic lasers.

preprint2020arXiv

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification. However, correlation studies between the verb and the noun branches have been largely ignored. Besides, the two branches fail to exploit local features due to the absence of a position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate the understanding of an actor's interaction with the object. We introduce these features in action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features but also exploits implicit guidance about the spatio-temporal position of an on-going action. We introduce novel symbiotic attention (SA) to enable effective communication. It first normalizes the detection guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of the most action-relevant information. It identifies the most valuable and discriminative feature for classification. We validate the effectiveness of our SAP quantitatively and qualitatively. Notably, it achieves the state of the art on two large-scale egocentric video datasets.

preprint2020arXiv

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

The traditional object retrieval task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which supposes that the objects in an image are manually or automatically pre-cropped exactly. However, in many real-world searching scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles, etc.) are seldom accurately detected or annotated. Therefore, object-level retrieval becomes intractable without bounding-box annotation, which leads to a new but challenging topic, i.e. image-level search. In this paper, to address the image search issue, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed. 2) A novel on-line pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem, by automatically generating a number of negative pairs to restrict the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of classification task by selecting hard categories. With the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) two modules are tailored to handle different tasks separately in the integrated framework, such that the task specification is guaranteed. 2) A class-center guided HEP loss (C2HEP) by exploiting the stored class centers is proposed, such that the intra-similarity and inter-dissimilarity can be captured for ultimate retrieval. Extensive experiments on famous image-level search oriented benchmark datasets demonstrate that the proposed DC-I-Net outperforms the state-of-the-art tasks-integrated and tasks-separated image search models.

preprint2020arXiv

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

The re-ranking approach leverages high-confidence retrieved samples to refine retrieval results, and has been widely adopted as a post-processing tool for image retrieval tasks. However, we notice one main flaw of re-ranking, i.e., high computational complexity, which leads to an unaffordable time cost for real-world applications. In this paper, we revisit re-ranking and demonstrate that re-ranking can be reformulated as a high-parallelism Graph Neural Network (GNN) function. In particular, we divide the conventional re-ranking process into two phases, i.e., retrieving high-quality gallery samples and updating features. We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph. In practice, the GNN only needs to consider vertices with connected edges. Since the graph is sparse, we can efficiently update the vertex features. On the Market-1501 dataset, we accelerate the re-ranking processing from 89.2s to 9.4ms with one K40m GPU, facilitating real-time post-processing. Similarly, we observe that our method achieves comparable or even better retrieval results on the other four image retrieval benchmarks, i.e., VeRi-776, Oxford-5k, Paris-6k and University-1652, with limited time cost. Our code is publicly available.
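
The two phases described in this abstract, building a k-nearest-neighbor graph and spreading messages along its edges, can be illustrated with a deliberately simplified sketch. The mean-aggregation update, the `alpha` mixing weight and all function names are assumptions made for illustration; the paper's actual propagation rule and GPU implementation may differ.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_graph(features, k):
    """Phase 1: for each sample, keep the indices of its k most similar samples (k >= 1)."""
    graph = []
    for i, fi in enumerate(features):
        scored = sorted(
            ((cosine_similarity(fi, fj), j) for j, fj in enumerate(features) if j != i),
            reverse=True,
        )
        graph.append([j for _, j in scored[:k]])
    return graph


def propagate(features, graph, alpha=0.5):
    """Phase 2: mix each feature with the mean of its neighbors' features."""
    updated = []
    for fi, neighbors in zip(features, graph):
        mean = [
            sum(features[j][d] for j in neighbors) / len(neighbors)
            for d in range(len(fi))
        ]
        updated.append([(1 - alpha) * x + alpha * m for x, m in zip(fi, mean)])
    return updated


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(features, k=1)      # [[1], [0], [3], [2]]
refined = propagate(features, graph)  # each vector moves toward its neighbor
```

Each update touches only the k neighbors of a vertex, so the cost of propagation grows with the number of edges rather than with the square of the gallery size, which is the sparsity argument the abstract makes.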

preprint2020arXiv

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn features that are robust to large viewpoint changes. Existing benchmarks can help, but are limited in the number of viewpoints. Image pairs, containing two viewpoints, e.g., satellite and ground, are usually provided, which may compromise the feature learning. Besides phone cameras and satellites, in this paper, we argue that drones could serve as the third platform to deal with the geo-localization problem. In contrast to the traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and could provide a comprehensive view when flying around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras, of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization intends to predict the location of the target place via drone-view images. On the other hand, given a satellite-view query image, drone navigation is to drive the drone to the area of interest in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model to learn viewpoint-invariant features and also has good generalization ability in real-world scenarios.

preprint2019arXiv

Cascaded Revision Network for Novel Object Captioning

Image captioning, a challenging task where the machine automatically describes an image by sentences, has drawn significant attention in recent years. Despite the remarkable improvements of recent approaches, however, these methods are built upon a large set of training image-sentence pairs. The expensive labor efforts hence limit the captioning model to describe the wider world. In this paper, we present a novel network structure, Cascaded Revision Network, which aims at relieving the problem by equipping the model with out-of-domain knowledge. CRN first tries its best to describe an image using the existing vocabulary from in-domain knowledge. Due to the lack of out-of-domain knowledge, the caption may be inaccurate or include ambiguous words for the image with unknown (novel) objects. We propose to re-edit the primary captioning sentence by a series of cascaded operations. We introduce a perplexity predictor to find out which words are most likely to be inaccurate given the input image. Thereafter, we utilize external knowledge from a pre-trained object detection model and select more accurate words from detection results by the visual matching module. In the last step, we design a semantic matching module to ensure that the novel object is fit in the right position. By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects. We validate the proposed method with state-of-the-art performance on the held-out MSCOCO dataset as well as scale to ImageNet, demonstrating the effectiveness of this method.

preprint2019arXiv

High-Resolution Near-Infrared Polarimetry and Sub-Millimeter Imaging of FS Tau A: Possible Streamers in Misaligned Circumbinary Disk System

We analyzed the young (2.8-Myr-old) binary system FS Tau A using near-infrared (H-band) high-contrast polarimetry data from Subaru/HiCIAO and sub-millimeter CO (J=2-1) line emission data from ALMA. Both the near-infrared and sub-millimeter observations reveal several clear structures extending to $\sim$240 AU from the stars. Based on these observations at different wavelengths, we report the following discoveries. One arm-like structure detected in the near-infrared band initially extends from the south of the binary with a subsequent turn to the northeast, corresponding to two bar-like structures detected in ALMA observations with an LSRK velocity of 1.19-5.64 km/s. Another feature detected in the near-infrared band extends initially from the north of the binary, relating to an arm-like structure detected in ALMA observations with an LSRK velocity of 8.17-16.43 km/s. From their shapes and velocities, we suggest that these structures can mostly be explained by two streamers that connect the outer circumbinary disk and the central binary components. These discoveries will be helpful for understanding the evolution of streamers and circumstellar disks in young binary systems.

preprint2019arXiv

SN 2016hil-- a Type II supernova in the remote outskirts of an elliptical host and its origin

Type II supernovae (SNe) stem from the core collapse of massive ($>8\ M_{\odot}$) stars. Owing to their short lifespan, we expect a very low rate of such events in elliptical host galaxies, where the star-formation rate is low, and which mostly consist of an old stellar population. SN 2016hil (iPTF16hil) is a Type II supernova located in the extreme outskirts of an elliptical galaxy at redshift $z=0.0608$ (projected distance $27.2$ kpc). It was detected near peak brightness ($M_{r} \approx -17$ mag) 9 days after the last nondetection. SN 2016hil has some potentially peculiar properties: while presenting a characteristic spectrum, the event was unusually short lived and declined by $\sim 1.5$ mag in $< 40$ days, following an apparently double-peaked light curve. Its spectra suggest a low metallicity ($Z<0.4\ Z_{\odot}$). We place a tentative upper limit on the mass of a potential faint host at $\log(M/M_{\odot}) =7.27^{+0.43}_{-0.24}$ using deep Keck optical imaging. In light of this, we discuss the possibility of the progenitor forming locally, and other more exotic formation scenarios such as a merger or common-envelope evolution causing a time-delayed explosion. Further observations of the explosion site in the ultraviolet are needed in order to distinguish between the cases. Regardless of the origin of the transient, observing a population of such seemingly hostless Type II SNe could have many uses, including an estimate of the number of faint galaxies in a given volume, and tests of the prediction of a time-delayed population of core-collapse SNe in locations otherwise unfavorable for the detection of such events.

preprint2019arXiv

Very Long Natural Scenery Image Prediction by Outpainting

Compared to image inpainting, image outpainting receives less attention due to two challenges. The first challenge is how to keep the spatial and content consistency between generated images and the original input. The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input. To solve the two problems, we devise two modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. By this design, our network can generate highly realistic outpainting predictions effectively and efficiently. In addition, our method can generate new images with very long sizes while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset demonstrate the efficacy of our proposed network. The code and dataset are available from https://github.com/z-x-yang/NS-Outpainting.

preprint2018arXiv

A solid approach to biopharmaceutical stabilisation

Ensilication is a technology we developed that can physically stabilise proteins in silica without use of a pre-formed particle matrix. Stabilisation is done by tailor fitting individual proteins with a silica coat using a modified sol-gel process. Biopharmaceuticals, for example, liquid-formulated vaccines with adjuvants, have poor thermal stability. Heating or freezing impairs their potency. As a result, there is an increase in the prevalence of vaccine-preventable diseases in low-income countries even when there are means to combat them. One of the root causes lies in the problematic vaccine cold-chain distribution. We believe that ensilication can improve vaccine availability by enabling transportation without refrigeration. Here, we show that ensilication stabilises tetanus toxoid C fragment (TTCF) and demonstrate that this material can be stored and transported at ambient temperature without compromising the immunogenic properties of TTCF in vivo. TTCF is a component of the diphtheria, tetanus and pertussis (DTP) vaccine. To further our understanding of the ensilication process, and its protective effect on proteins we have studied the formation of TTCF-silica nanoparticles via time-resolved Small Angle X-ray Scattering (SAXS). Our results reveal ensilication to be a staged diffusion-limited cluster aggregation (DLCA) type reaction, induced by the presence of TTCF protein at neutral pH. Analysis of scattering data indicates tailor fitting of TTCF protein. The experimental in vivo immunisation data confirms the retention of immunogenicity after release from silica. Our results suggest that we could utilise this technology for multicomponent vaccines, therapeutics or other biopharmaceuticals that are not compatible with lyophilisation.

preprint2016arXiv

A Resolved Near-Infrared Image of The Inner Cavity in The GM Aur Transitional Disk

We present high-contrast H-band polarized intensity (PI) images of the transitional disk around the young solar-like star GM Aur. The near-infrared direct imaging of the disk was derived by polarimetric differential imaging using the Subaru 8.2-m Telescope and HiCIAO. An angular resolution and an inner working angle of 0."07 and r~0."05, respectively, were obtained. We clearly resolved a large inner cavity, with a measured radius of 18+/-2 au, which is smaller than that of a submillimeter interferometric image (28 au). This discrepancy in the cavity radii at near-infrared and submillimeter wavelengths may be caused by a 3-4M_Jup planet about 20 au away from the star, near the edge of the cavity. The presence of a near-infrared inner cavity is a strong constraint on hypotheses for inner cavity formation in a transitional disk. A dust filtration mechanism has been proposed to explain the large cavity in the submillimeter image, but our results suggest that this mechanism must be combined with an additional process. We found that the PI slope of the outer disk is significantly different from the intensity slope obtained from HST/NICMOS, and this difference may indicate the grain growth process in the disk.

preprint2016arXiv

ASASSN-15lh: A Superluminous Ultraviolet Rebrightening Observed by Swift and Hubble

We present and discuss ultraviolet and optical photometry from the Ultraviolet/Optical Telescope and X-ray limits from the X-Ray Telescope on Swift and imaging polarimetry and ultraviolet/optical spectroscopy with the Hubble Space Telescope of ASASSN-15lh. It has been classified as a hydrogen-poor superluminous supernova (SLSN I) more luminous than any other supernova observed. ASASSN-15lh is not detected in the X-rays in individual or coadded observations. From the polarimetry we determine that the explosion was only mildly asymmetric. We find the flux of ASASSN-15lh to increase strongly into the ultraviolet, with an ultraviolet luminosity a hundred times greater than the hydrogen-rich, ultraviolet-bright SLSN II SN 2008es. We find objects as bright as ASASSN-15lh are easily detectable beyond redshifts of ~4 with the single-visit depths planned for the Large Synoptic Survey Telescope. Deep near-infrared surveys could detect such objects past a redshift of ~20 enabling a probe of the earliest star formation. A late rebrightening -- most prominent at shorter wavelengths -- is seen about two months after the peak brightness, which is itself as bright as a superluminous supernova. The ultraviolet spectra during the rebrightening are dominated by the continuum without the broad absorption or emission lines seen in SLSNe or tidal disruption events and the early optical spectra of ASASSN-15lh. Our spectra show no strong hydrogen emission, showing only LyA absorption near the redshift previously found by optical absorption lines of the presumed host. The properties of ASASSN-15lh are extreme when compared to either SLSNe or tidal disruption events.

preprint2016arXiv

Attention to Scale: Scale-aware Semantic Image Segmentation

Incorporating multi-scale features in fully convolutional neural networks (FCNs) has been a key element to achieving state-of-the-art performance on semantic image segmentation. One common way to extract multi-scale features is to feed multiple resized input images to a shared deep network and then merge the resulting features for pixelwise classification. In this work, we propose an attention mechanism that learns to softly weight the multi-scale features at each pixel location. We adapt a state-of-the-art semantic image segmentation model, which we jointly train with multi-scale input images and the attention model. The proposed attention model not only outperforms average- and max-pooling, but allows us to diagnostically visualize the importance of features at different positions and scales. Moreover, we show that adding extra supervision to the output at each scale is essential to achieving excellent performance when merging multi-scale features. We demonstrate the effectiveness of our model with extensive experiments on three challenging datasets, including PASCAL-Person-Part, PASCAL VOC 2012 and a subset of MS-COCO 2014.
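As a rough illustration of the soft weighting described above, the sketch below merges per-scale class scores at a single pixel using softmax-normalised attention weights. The function names and toy numbers are assumptions made for this example; it is not the authors' implementation, in which the attention values are produced by a learned model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def merge_scales(scores_per_scale, attention_logits):
    """Weighted sum of per-scale class scores at one pixel.

    scores_per_scale[s][c] is the score of class c from the input at scale s;
    attention_logits[s] is a hypothetical unnormalised attention for scale s.
    """
    weights = softmax(attention_logits)
    num_classes = len(scores_per_scale[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, scores_per_scale))
        for c in range(num_classes)
    ]

# Toy pixel: three scales, two classes.
scores = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
uniform = merge_scales(scores, [0.0, 0.0, 0.0])  # equal weights reduce to average pooling
peaked = merge_scales(scores, [10.0, 0.0, 0.0])  # attention concentrated on the first scale
```

With equal logits the merge reduces to average pooling, which is one of the baselines the abstract compares against; unequal logits let each pixel favour the scale that suits it.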

preprint2016arXiv

Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

Despite the recent success of neural networks in image feature learning, a major problem in the video domain is the lack of sufficient labeled data for learning to model temporal information. In this paper, we propose an unsupervised temporal modeling method that learns from untrimmed videos. The speed of motion varies constantly, e.g., a man may run quickly or slowly. We therefore train a Multirate Visual Recurrent Model (MVRM) by encoding frames of a clip with different intervals. This learning process makes the learned model more capable of dealing with motion speed variance. Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context, and reconstruct the two temporal transitions, i.e., present$\rightarrow$past transition and present$\rightarrow$future transition, reflecting the temporal information in different views. The proposed method exploits the two transitions simultaneously by incorporating a bidirectional reconstruction which consists of a backward reconstruction and a forward reconstruction. We apply the proposed method to two challenging video tasks, i.e., complex event detection and video captioning, in which it achieves state-of-the-art performance. Notably, our method generates the best single feature for event detection with a relative improvement of 10.4% on the MEDTest-13 dataset and achieves the best performance in video captioning across all evaluation metrics on the YouTube2Text dataset.

preprint2016arXiv

CNN-RNN: A Unified Framework for Multi-label Image Classification

While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although they work well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification models.

preprint2016arXiv

Confinement-Deconfinement Phase Transition for Heavy Quarks

We study confinement-deconfinement phase transition for heavy quarks in a bottom-up holographic QCD model. We consider a black hole background in an Einstein-Maxwell-scalar system and add probe open strings to the background. Combining the various configurations of the open strings and the phase structure of the black hole background itself, we obtain the confinement-deconfinement phase diagram for heavy quarks in the holographic QCD model.

preprint2016arXiv

Dynamic Concept Composition for Zero-Example Event Detection

In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g. birthday party) can be described by multiple mid-level semantic concepts (e.g. "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept with respect to the event of interest and pick the relevant concept classifiers, which are applied to all test videos to get multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each testing video by exploring a set of online available videos with free-form text descriptions of their content. To validate the effectiveness of the proposed approach, we have conducted extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets. The experimental results confirm the superiority of the proposed approach.

preprint2016arXiv

Few-Shot Object Recognition from Machine-Labeled Web Images

With the tremendous advances of Convolutional Neural Networks (ConvNets) in object recognition, we can now easily obtain sufficiently reliable machine-labeled annotations from the predictions of off-the-shelf ConvNets. In this work, we present an abstraction memory based framework for few-shot learning, building upon machine-labeled image annotations. Our method takes large-scale machine-annotated datasets (e.g., OpenImages) as an external memory bank. In the external memory bank, the information is stored in memory slots in key-value form, where the image feature serves as the key and the label embedding serves as the value. When queried by the few-shot examples, our model selects visually similar data from the external memory bank, and writes the useful information obtained from related external data into another memory bank, i.e., abstraction memory. Long Short-Term Memory (LSTM) controllers and attention mechanisms are utilized to ensure that the data written to the abstraction memory is correlated with the query example. The abstraction memory concentrates information from the external memory bank, which makes the few-shot recognition effective. In the experiments, we first confirm that our model can learn to conduct few-shot object recognition on clean human-labeled data from the ImageNet dataset. Then, we demonstrate that with our model, machine-labeled image annotations are very effective and abundant resources for object recognition on novel categories. Experimental results show that our proposed model with machine-labeled annotations achieves great performance, with a gap of only 1% from the model trained with human-labeled annotations.

preprint2016arXiv

Fluid/Gravity Correspondence with Scalar Field and Electromagnetic Field

We consider fluid/gravity correspondence in a general rotating black hole background with scalar and electromagnetic fields. Using the method of Petrov-like boundary condition, we show that the scalar and the electromagnetic fields contribute external forces to the dual Navier-Stokes equation and the rotation of black hole induces the Coriolis force.

preprint2016arXiv

Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models

The Tweedie GLM is a widely used method for predicting insurance premiums. However, the structure of the logarithmic mean is restricted to a linear form in the Tweedie GLM, which can be too rigid for many applications. As a better alternative, we propose a gradient tree-boosting algorithm and apply it to Tweedie compound Poisson models for pure premiums. We use a profile likelihood approach to estimate the index and dispersion parameters. Our method is capable of fitting a flexible nonlinear Tweedie model and capturing complex interactions among predictors. A simulation study confirms the excellent prediction performance of our method. As an application, we apply our method to auto insurance claim data and show that the new method is superior to the existing methods in the sense that it generates more accurate premium predictions, thus helping solve the adverse selection issue. We have implemented our method in a user-friendly R package that also includes a visualization tool for interpreting the fitted model.
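To make the boosting target concrete, here is a minimal sketch of the Tweedie loss under a log link, together with the pseudo-residual that a regression tree would be fitted to at each boosting step. It assumes an index parameter 1 < p < 2, writes f = log(mu), and drops the dispersion scale and terms that do not depend on f. The function names are illustrative and do not come from the authors' R package.

```python
import math

def tweedie_loss(y, f, p=1.5):
    """Negative Tweedie log-likelihood (up to scale and terms free of f), mu = exp(f)."""
    return -y * math.exp((1.0 - p) * f) / (1.0 - p) + math.exp((2.0 - p) * f) / (2.0 - p)

def negative_gradient(y, f, p=1.5):
    """Pseudo-residual -dL/df: the target a tree is fitted to in one boosting step."""
    return y * math.exp((1.0 - p) * f) - math.exp((2.0 - p) * f)

# A zero claim and a positive claim, both currently predicted at mu = 1 (f = 0).
residual_zero = negative_gradient(0.0, 0.0)   # negative: pushes the prediction down
residual_claim = negative_gradient(3.0, 0.0)  # positive: pushes the prediction up
```

The pseudo-residual vanishes when exp(f) equals the observed value, so repeated tree fits move each prediction toward its target on the log scale without imposing a linear form.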

preprint2016arXiv

Optically Thin Metallic Films for High-radiative-efficiency Plasmonics

Plasmonics enables deep-subwavelength concentration of light and has become important for fundamental studies as well as real-life applications. Two major existing platforms of plasmonics are metallic nanoparticles and metallic films. Metallic nanoparticles allow efficient coupling to far field radiation, yet their synthesis typically leads to poor material quality. Metallic films offer substantially higher quality materials, but their coupling to radiation is typically jeopardized due to the large momentum mismatch with free space. Here, we propose and theoretically investigate optically thin metallic films as an ideal platform for high-radiative-efficiency plasmonics. For far-field scattering, adding a thin high-quality metallic substrate enables a higher quality factor while maintaining the localization and tunability that the nanoparticle provides. For near-field spontaneous emission, a thin metallic substrate, of high quality or not, greatly improves the field overlap between the emitter environment and propagating surface plasmons, enabling high-Purcell (total enhancement > $10^4$), high-quantum-yield (> 50 %) spontaneous emission, even as the gap size vanishes (3$\sim$5 nm). The enhancement has almost spatially independent efficiency and does not suffer from quenching effects that commonly exist in previous structures.

preprint2016arXiv

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to their substantial compute and memory operations, however, they require significant execution time. The massive parallel computing capability of GPUs makes them one of the ideal platforms to accelerate CNNs, and a number of GPU-based CNN libraries have been developed. While existing works mainly focus on the computational efficiency of CNNs, the memory efficiency of CNNs has been largely overlooked. Yet CNNs have intricate data structures and their memory behavior can have a significant impact on performance. In this work, we study the memory efficiency of various CNN layers and reveal the performance implications of both data layouts and memory access patterns. Experiments show the universal effect of our proposed optimizations on both single layers and various networks, with speedups of up to 27.9x for a single layer and up to 5.6x for whole networks.

preprint2016arXiv

Part-of-Speech Tagging for Historical English

As more historical texts are digitized, there is interest in applying natural language processing tools to these archives. However, the performance of these tools is often unsatisfactory, due to language change and genre differences. Spelling normalization heuristics are the dominant solution for dealing with historical texts, but this approach fails to account for changes in usage and vocabulary. In this empirical paper, we assess the capability of domain adaptation techniques to cope with historical texts, focusing on the classic benchmark task of part-of-speech tagging. We evaluate several domain adaptation methods on the task of tagging Early Modern English and Modern British English texts in the Penn Corpora of Historical English. We demonstrate that the Feature Embedding method for unsupervised domain adaptation outperforms word embeddings and Brown clusters, showing the importance of embedding the entire feature space, rather than just individual words. Feature Embeddings also give better performance than spelling normalization, but the combination of the two methods is better still, yielding a 5% raw improvement in tagging accuracy on Early Modern English texts.

preprint2016arXiv

Person Re-identification: Past, Present and Future

Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs some important yet under-developed issues.

preprint2016arXiv

Probing Neutrino Mass Hierarchy by Comparing the Charged-Current and Neutral-Current Interaction Rates of Supernova Neutrinos

The neutrino mass hierarchy is one of the fundamental neutrino properties yet to be determined. We introduce a method to determine the neutrino mass hierarchy by comparing the interaction rates of neutral-current (NC) interactions, $\nu(\bar{\nu}) + p \rightarrow \nu(\bar{\nu}) + p$, and inverse beta decays (IBD), $\bar{\nu}_e + p \rightarrow n + e^+$, of supernova neutrinos in scintillation detectors. Neutrino flavor conversions inside the supernova are sensitive to the neutrino mass hierarchy. Due to Mikheyev-Smirnov-Wolfenstein effects, a full swap of the $\bar{\nu}_e$ flux with the $\bar{\nu}_x$ ($x=\mu,~\tau$) flux occurs in the inverted hierarchy, while no such swap occurs in the normal hierarchy. As a result, more high-energy IBD events occur in the detector for the inverted hierarchy than for the normal hierarchy. By comparing the IBD interaction rate with the mass-hierarchy-independent NC interaction rate, one can determine the neutrino mass hierarchy.

preprint2016arXiv

Review on High energy String Scattering Amplitudes and Symmetries of String Theory

We review high energy symmetries of string theory in both the fixed angle or Gross regime (GR) and the fixed momentum transfer or Regge regime (RR). We calculated in detail high energy string scattering amplitudes at arbitrary mass levels for both regimes. We discovered infinite linear relations among fixed angle string amplitudes, conjectured by Gross in 1988, from the decoupling of high energy zero-norm states (ZNS), and infinite recurrence relations among Regge string amplitudes from the Kummer function U and the Appell function F_1. However, the linear relations we obtained in the GR corrected [27-32] the saddle point calculations of Gross, Gross and Mende, and Gross and Manes [1-5]. Our results were consistent with the decoupling of high energy ZNS, or unitarity of the theory, while theirs were not. In addition, for the case of high energy closed string scatterings, our results [36] differ from theirs by an oscillating prefactor which was crucial to recover the KLT relation valid for all energies. In the GR/RR regime, all high energy string amplitudes can be solved by these linear/recurrence relations so that all GR/RR string amplitudes can be expressed in terms of one single GR/RR string amplitude. In addition, we found an interesting link between string amplitudes of the two regimes, and discovered that at each mass level the ratios among fixed angle amplitudes can be extracted from Regge string scattering amplitudes. This result enables us to argue that the known SL(5,C) dynamical symmetry of the Appell function F_1 is crucial to probe the high energy spacetime symmetry of string theory.

preprint2016arXiv

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

Non-linear models have recently received a lot of attention as people are starting to discover the power of statistical and embedding features. However, tree-based models are seldom studied in the context of structured learning despite their recent success on various classification and ranking tasks. In this paper, we propose S-MART, a tree-based structured learning framework based on multiple additive regression trees. S-MART is especially suitable for handling tasks with dense features, and can be used to learn many different structures under various loss functions. We apply S-MART to the task of tweet entity linking, a core component of tweet information extraction, which aims to identify and link name mentions to entities in a knowledge base. A novel inference algorithm is proposed to handle the special structure of the task. The experimental results show that S-MART significantly outperforms state-of-the-art tweet entity linking systems.

preprint2016arXiv

Sparsity Oriented Importance Learning for High-dimensional Linear Regression

With model selection uncertainty now well recognized as non-negligible, data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective by taking into account the variable selection uncertainty via the use of a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings and real data examples with guided simulations show desirable properties of the SOIL importance in contrast to other importance measures.
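The abstract does not spell out the formula, so the sketch below only assumes the general form it suggests: the importance of a variable is the total weight of the candidate models that include it. The candidate models and weights are made up for illustration; in practice they would come from a model selection and weighting procedure.

```python
def weighted_importance(models, weights, num_vars):
    """Importance of variable j = sum of the weights of candidate models containing j.

    models: list of sets of variable indices.
    weights: non-negative model weights that sum to 1 (assumed, not checked).
    """
    importance = [0.0] * num_vars
    for variables, weight in zip(models, weights):
        for j in variables:
            importance[j] += weight
    return importance

# Three hypothetical candidate models over five variables.
candidate_models = [{0, 1}, {0, 1, 2}, {0, 3}]
model_weights = [0.6, 0.3, 0.1]
importance = weighted_importance(candidate_models, model_weights, num_vars=5)
```

In this toy case the variable that appears in every candidate model receives importance close to 1, while a variable that appears in none receives 0, which is the kind of separation the inclusion/exclusion property describes.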

preprint2016arXiv

Strategies for Searching Video Content with Text Queries or Video Examples

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches.

preprint2016arXiv

The Exact SL(K+3,C) Symmetry of String Scattering Amplitudes

We discover that the 26D open bosonic string scattering amplitudes (SSA) of three tachyons and one arbitrary string state can be expressed in terms of the D-type Lauricella functions with associated SL(K+3,C) symmetry. As a result, SSA and symmetries or relations among SSA of different string states at various limits calculated previously can be rederived. These include the linear relations conjectured by Gross [1-3] and proved in [4-9] in the hard scattering limit, the recurrence relations in the Regge scattering limit [14-16] and the extended recurrence relations in the nonrelativistic scattering limit [19] discovered recently. Finally, as an application, we calculate a new recurrence relation of SSA which is valid for all energies.

preprint2016arXiv

The expanding light echoes from supernova 2014J in M82

We present the measurement of the size and surface brightness of the expanding light echoes from supernova (SN) 2014J in the nearby starburst galaxy M82. Hubble Space Telescope (HST) ACS/WFC images were taken ~277 and ~416 days after the time of B-band maximum light in the filters F475W, F606W, and F775W, each combined with the three polarizing filters: POL0V, POL60V, and POL120V. The imaging at the two epochs reveals the time evolution of at least two major echoes. Three concentric bright regions between position angles (PA, 0^{\circ} from North, counterclockwise) 80^{\circ} and 170^{\circ} have a projected radius of 0.60" on the sky at ~277 days, expanding to 0.75" at ~416 days, corresponding to scattering material at a foreground distance of 222\pm37 pc. Another fainter but evident light echo extending over a wide range of PA has radii of 0.75" and 0.96" at ~277 and ~416 days. This corresponds to scattering material at a foreground distance of 367\pm61 pc. Multiple light echoes with S/N > 2.5 reside at smaller radii at ~277 days but become less significant at ~416 days, indicating a complex structure of the foreground interstellar medium (ISM). The light echo shows a bluer color than predicted for Rayleigh scattering. We also found that the light echo brightened from V_{echo}=21.68\pm0.07 on 2014 September 5 to V_{echo}=21.05\pm0.08 on 2014 November 6, suggesting an enhancement of echoing material at different distances projected onto the plane of the sky.
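The link between the angular radii and the quoted foreground distances follows from the standard single-scattering light echo geometry, in which dust at foreground distance z produces an echo of projected radius rho at time t after maximum light, with rho^2 = 2 z c t + (c t)^2. The sketch below inverts that relation. The distance to M82 used here (3.53 Mpc) is an assumption not stated in the abstract, so the output is only a consistency check against the quoted values.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi
LIGHT_DAYS_PER_PC = 3.26156 * 365.25  # one parsec expressed in light-days

def foreground_distance_pc(radius_arcsec, days_since_max, distance_mpc=3.53):
    """Foreground distance z (pc) of scattering dust, from rho^2 = 2 z ct + (ct)^2.

    distance_mpc is an assumed distance to the host galaxy, not taken from the abstract.
    """
    rho = radius_arcsec / ARCSEC_PER_RAD * distance_mpc * 1.0e6  # projected radius in pc
    ct = days_since_max / LIGHT_DAYS_PER_PC                      # light travel distance in pc
    return rho ** 2 / (2.0 * ct) - ct / 2.0

inner = foreground_distance_pc(0.60, 277)  # brighter echo at the first epoch
outer = foreground_distance_pc(0.75, 277)  # fainter, wider echo at the first epoch
```

Under these assumptions the two radii at ~277 days give about 227 pc and 354 pc, within the quoted uncertainties of 222\pm37 pc and 367\pm61 pc.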

preprint2016arXiv

The Lauricella Functions and Exact String Scattering Amplitudes

We discover that the 26D open bosonic string scattering amplitudes (SSA) of three tachyons and one arbitrary string state can be expressed in terms of the D-type Lauricella functions with associated SL(K+3,C) symmetry. As a result, SSA and symmetries or relations among SSA of different string states at various limits calculated previously can be rederived. These include the linear relations first conjectured by Gross [1-3] and later corrected and proved in [4-9] in the hard scattering limit, the recurrence relations in the Regge scattering limit with associated SL(5,C) symmetry [19-21] and the extended recurrence relations in the nonrelativistic scattering limit with associated SL(4,C) symmetry [24] discovered recently. Finally, as an application, we calculate a new recurrence relation of SSA which is valid for all energies.

preprint2016arXiv

The String BCJ Relations Revisited and Extended Recurrence relations of Nonrelativistic String Scattering Amplitudes

We review and extend high energy string BCJ relations in both the fixed angle and Regge regimes. We then give an explicit proof of four-point string BCJ relations for all energies. This calculation provides an alternative to the proof based on monodromy of integration in string amplitude calculation. In addition, we calculate both s-t and t-u channel nonrelativistic low energy string scattering amplitudes of three tachyons and one leading trajectory string state at arbitrary mass levels. We discover that the mass and spin dependent nonrelativistic string BCJ relations can be expressed in terms of Gauss hypergeometric functions. As an application, for each fixed mass level N, we derive extended recurrence relations among nonrelativistic low energy string scattering amplitudes of string states with different spins and different channels.

preprint2016arXiv

Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities

Entity linking is the task of identifying mentions of entities in text, and linking them to entries in a knowledge base. This task is especially difficult in microblogs, as there is little additional text to provide disambiguating context; rather, authors rely on an implicit common ground of shared knowledge with their readers. In this paper, we attempt to capture some of this implicit context by exploiting the social network structure in microblogs. We build on the theory of homophily, which implies that socially linked individuals share interests, and are therefore likely to mention the same sorts of entities. We implement this idea by encoding authors, mentions, and entities in a continuous vector space, which is constructed so that socially-connected authors have similar vector representations. These vectors are incorporated into a neural structured prediction model, which captures structural constraints that are inherent in the entity linking task. Together, these design decisions yield F1 improvements of 1%-5% on benchmark datasets, as compared to the previous state-of-the-art.

preprint2016arXiv

Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks

We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. Our hierarchical framework contains a sentence generator and a paragraph generator. The sentence generator produces one simple short sentence that describes a specific short video interval. It exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. The paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial state for the sentence generator. We evaluate our approach on two large-scale benchmark datasets: YouTubeClips and TACoS-MultiLevel. The experiments demonstrate that our approach significantly outperforms the current state-of-the-art methods with BLEU@4 scores 0.499 and 0.305 respectively.

preprint2015arXiv

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

Basic Linear Algebra Subprograms (BLAS) are a set of low-level linear algebra kernels widely adopted by applications in deep learning and scientific computing. The massive and economical computing power of emerging GPU architectures drives interest in implementing compute-intensive level-3 BLAS on multi-GPU systems. In this paper, we investigate existing multi-GPU level-3 BLAS and show that 1) issues such as improper load balancing, inefficient communication, and insufficient GPU stream-level concurrency and data caching impede current implementations from fully harnessing heterogeneous computing resources; and 2) inter-GPU Peer-to-Peer (P2P) communication remains unexplored. We then present BLASX, a highly optimized multi-GPU level-3 BLAS. We adopt the concept of algorithms-by-tiles, treating a matrix tile as the basic data unit and operations on tiles as the basic task. Tasks are guided by a dynamic asynchronous runtime that is cache and locality aware. The communication cost under BLASX becomes trivial as it perfectly overlaps communication and computation across multiple streams during asynchronous task progression. It also takes the current tile cache scheme one step further by proposing an innovative 2-level hierarchical tile cache, taking advantage of inter-GPU P2P communication. As a result, linear speedup is observable with BLASX under multi-GPU configurations, and extensive benchmarks demonstrate that BLASX consistently outperforms the related leading industrial and academic projects such as cuBLAS-XT, SuperMatrix, MAGMA and PaRSEC.
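The algorithms-by-tiles idea can be illustrated with a small sketch that splits C = A x B into independent tile tasks, each accumulating products of A tiles and B tiles into one C tile. This is a sequential, pure-Python illustration with made-up names; it shows only the task decomposition, not BLASX's runtime, scheduling, tile caching, or GPU execution.

```python
def tile_ranges(n, tile):
    """Half-open index ranges covering 0..n in steps of `tile`."""
    return [(start, min(start + tile, n)) for start in range(0, n, tile)]

def tiled_matmul(a, b, tile=2):
    """Compute C = A x B by iterating over tile-level tasks."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # One task per output tile C[i0:i1, j0:j1]; these tasks are mutually independent.
    for i0, i1 in tile_ranges(m, tile):
        for j0, j1 in tile_ranges(n, tile):
            # Each task consumes a row of A tiles and a column of B tiles.
            for k0, k1 in tile_ranges(k, tile):
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        c[i][j] += sum(a[i][x] * b[x][j] for x in range(k0, k1))
    return c

def naive_matmul(a, b):
    """Reference product used only to check the tiled version."""
    return [[sum(a[i][x] * b[x][j] for x in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
C = tiled_matmul(A, B, tile=2)
```

Because each output tile is written by exactly one task, the outer two loops could be handed to different devices or streams, which is where a runtime scheduler and tile cache of the kind the abstract describes would come in.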

preprint2015arXiv

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieve significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is www.stat.ucla.edu/~junhua.mao/m-RNN.html

preprint2015arXiv

DenseBox: Unifying Landmark Localization with End to End Object Detection

How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences through all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when landmark localization is incorporated through multi-task learning, DenseBox further improves object detection accuracy. We present experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, which indicate that DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.

preprint2015arXiv

Depth-based hand pose estimation: methods, data, and challenges

Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define consistent evaluation criteria, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.

preprint2015arXiv

Evolution of the 2012 July 12 CME from the Sun to the Earth: Data-Constrained Three-Dimensional MHD Simulations

The dynamic process of coronal mass ejections (CMEs) in the heliosphere provides us the key information for evaluating CMEs' geo-effectiveness and improving the accurate prediction of CME induced Shock Arrival Time (SAT) at the Earth. We present a data constrained three dimensional (3D) magnetohydrodynamic (MHD) simulation of the evolution of the CME in a realistic ambient solar wind for the July 12-16, 2012 event by using the 3D COIN-TVD MHD code. A detailed comparison of the kinematic evolution of the CME between the observations and the simulation is carried out, including the usage of the time-elongation maps from the perspectives of both Stereo A and Stereo B. In this case study, we find that our 3D COIN-TVD MHD model, with the magnetized plasma blob as the driver, is able to re-produce relatively well the real 3D nature of the CME in morphology and their evolution from the Sun to Earth. The simulation also provides a relatively satisfactory comparison with the in-situ plasma data from the Wind spacecraft.

preprint2015arXiv

Extremal RN/CFT in Both Hands Revisited

We study RN/CFT correspondence for four dimensional extremal Reissner-Nordstrom black hole. We uplift the 4d RN black hole to a 5d rotating black hole and make a geometric regularization of the 5d space-time. Both hands central charges are obtained correctly at the same time by Brown-Henneaux technique.

preprint2015arXiv

Flexible Expectile Regression in Reproducing Kernel Hilbert Space

Expectile, first introduced by Newey and Powell (1987) in the econometrics literature, has recently become increasingly popular in risk management and capital allocation for financial institutions due to its desirable properties such as coherence and elicitability. The current standard tool for expectile regression analysis is the multiple linear expectile regression proposed by Newey and Powell in 1987. The growing applications of expectile regression motivate us to develop a much more flexible nonparametric multiple expectile regression in a reproducing kernel Hilbert space. The resulting estimator is called KERE which has multiple advantages over the classical multiple linear expectile regression by incorporating non-linearity, non-additivity and complex interactions in the final estimator. The kernel learning theory of KERE is established. We develop an efficient algorithm inspired by majorization-minimization principle for solving the entire solution path of KERE. It is shown that the algorithm converges at least at a linear rate. Extensive simulations are conducted to show the very competitive finite sample performance of KERE. We further demonstrate the application of KERE by using personal computer price data.
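For orientation, the expectile itself can be computed for a plain sample with a few lines of code. The sketch below is not KERE and involves no kernel or covariates: it finds the scalar τ-expectile of a sample by minimizing the asymmetric squared loss through iteratively reweighted averaging. The function name `sample_expectile` and its parameters are hypothetical.

```python
import numpy as np

def sample_expectile(y, tau=0.5, tol=1e-10, max_iter=1000):
    """Return the tau-expectile of a sample via reweighted averaging.

    The tau-expectile m minimizes sum_i |tau - 1[y_i <= m]| * (y_i - m)**2.
    """
    y = np.asarray(y, dtype=float)
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = y.mean()  # the 0.5-expectile is the mean, a natural starting point
    for _ in range(max_iter):
        # Points above the current estimate get weight tau, the rest 1 - tau.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

With `tau=0.5` the weights are equal and the result is the sample mean; larger values of `tau` shift the estimate toward the upper tail.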

preprint2015arXiv

Group $K$-Means

We study how to learn multiple dictionaries from a dataset, and approximate any data point by the sum of the codewords each chosen from the corresponding dictionary. Although theoretically low approximation errors can be achieved by the global solution, an effective solution has not been well studied in practice. To solve the problem, we propose a simple yet effective algorithm \textit{Group $K$-Means}. Specifically, we take each dictionary, or any two selected dictionaries, as a group of $K$-means cluster centers, and then deal with the approximation issue by minimizing the approximation errors. Besides, we propose a hierarchical initialization for such a non-convex problem. Experimental results well validate the effectiveness of the approach.
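The representation described here, one codeword chosen from each dictionary and summed, can be sketched as follows. This is a greedy residual encoder used purely for illustration, not the Group $K$-Means algorithm from the paper; the function names `greedy_encode` and `reconstruct` are hypothetical, and the dictionaries are assumed to be given rather than learned.

```python
import numpy as np

def greedy_encode(x, dictionaries):
    """Pick one codeword per dictionary, greedily shrinking the residual."""
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for d in dictionaries:
        d = np.asarray(d, dtype=float)
        # Squared distance from the current residual to every codeword.
        k = int(np.argmin(np.sum((d - residual) ** 2, axis=1)))
        indices.append(k)
        residual -= d[k]
    return indices, residual

def reconstruct(indices, dictionaries):
    """Sum the selected codewords to approximate the original point."""
    return sum(np.asarray(d, dtype=float)[k] for d, k in zip(dictionaries, indices))
```

The returned residual is the approximation error for the point; a greedy pass like this is not guaranteed to find the globally best combination of codewords.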

preprint2015arXiv

Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning

Recently, deep learning approaches, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure with deep ConvNets for video representation becomes a fundamental problem for video content analysis. In this paper, we propose a new approach, namely Hierarchical Recurrent Neural Encoder (HRNE), to exploit temporal information of videos. Compared to recent video representation inference approaches, this paper makes the following three contributions. First, our HRNE is able to efficiently exploit video temporal structure in a longer range by reducing the length of input information flow, and compositing multiple consecutive inputs at a higher level. Second, computation operations are significantly lessened while attaining more non-linearity. Third, HRNE is able to uncover temporal transitions between frame chunks with different granularities, i.e., it can model the temporal transitions between frames as well as the transitions between segments. We apply the new method to video captioning where temporal information plays a crucial role. Experiments demonstrate that our method outperforms the state-of-the-art on video captioning benchmarks. Notably, even using a single network with only RGB stream as input, HRNE beats all the recent systems which combine multiple inputs, such as RGB ConvNet plus 3D ConvNet.

preprint2015arXiv

Improved Spectral Clustering via Embedded Label Propagation

Spectral clustering is a key research topic in the field of machine learning and data mining. Most of the existing spectral clustering algorithms are built upon Gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter free, distance consistent Locally Linear Embedding. The proposed distance consistent LLE promises that edges between closer data points have greater weight. Furthermore, we propose a novel improved spectral clustering via embedded label propagation. Our algorithm is built upon two advancements of the state of the art: 1) label propagation, which propagates a node's labels to neighboring nodes according to their proximity; and 2) manifold learning, which has been widely used in its capacity to leverage the manifold structure of data points. First we perform standard spectral clustering on original data and assign each cluster to k nearest data points. Next, we propagate labels through dense, unlabeled data regions. Extensive experiments with various datasets validate the superiority of the proposed algorithm compared to current state of the art spectral algorithms.

preprint2015arXiv

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently hypothesize the semantic meaning of new words and add them to its word dictionary so that they can be used to describe images which contain these novel concepts. Our method has an image captioning module based on m-RNN with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task. We propose methods to prevent overfitting the new concepts. In addition, three novel concept datasets are constructed for this new task. In the experiments, we show that our method effectively learns novel visual concepts from a few examples without disturbing the previously learned concepts. The project page is http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html

preprint2015arXiv

Multiclass Sparse Discriminant Analysis

In recent years many sparse linear discriminant analysis methods have been proposed for high-dimensional classification and variable selection. However, most of these proposals focus on binary classification and they are not directly applicable to multiclass classification problems. There are two sparse discriminant analysis methods that can handle multiclass classification problems, but their theoretical justifications remain unknown. In this paper, we propose a new multiclass sparse discriminant analysis method that estimates all discriminant directions simultaneously. We show that when applied to the binary case our proposal yields a classification direction that is equivalent to those by two successful binary sparse LDA methods in the literature. An efficient algorithm is developed for computing our method with high-dimensional data. Variable selection consistency and rates of convergence are established under the ultrahigh dimensionality setting. We further demonstrate the superior performance of our proposal over the existing methods on simulated and real data.

preprint2015arXiv

Regge Closed String Scattering and its Implication on Fixed angle Closed String Scattering

We calculate the complete closed string high energy scattering amplitudes (HSA) in the Regge regime for arbitrary mass levels. As an application, we deduce the complete ratios among closed string HSA in the fixed angle regime by using Stirling number identities. These results are in contrast with the incomplete set of closed string HSA in the fixed angle regime calculated previously. The complete forms of the fixed angle amplitudes, and hence the ratios, were not calculable previously without the input of zero-norm state calculation. This is mainly due to the lack of saddle point in the fixed angle closed string calculation.

preprint2015arXiv

Rotating Black Holes and Coriolis Effect

In this work, we consider the fluid/gravity correspondence for general rotating black holes. By using the Petrov-like boundary condition in near horizon limit, we study the correspondence between gravitational perturbation and fluid equation. We find that the dual fluid equation for rotating black holes contains a Coriolis force term, which is closely related to the angular velocity of the black hole horizon. This can be seen as a dual effect for the frame-dragging effect of rotating black hole under the holographic picture.

preprint2015arXiv

Uncovering Temporal Context for Video Question and Answering

In this work, we introduce Video Question Answering in temporal domain to infer the past, describe the present and predict the future. We present an encoder-decoder approach using Recurrent Neural Networks to learn temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using question form of "fill-in-the-blank", and managed to collect 109,895 video clips with duration over 1,000 hours from TACoS, MPII-MD, MEDTest 14 datasets, while the corresponding 390,744 questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.

preprint2015arXiv

Unsupervised Domain Adaptation with Feature Embeddings

Representation learning is the dominant technique for unsupervised domain adaptation, but existing approaches often require the specification of "pivot features" that generalize across domains, which are selected by task-specific heuristics. We show that a novel but simple feature embedding approach provides better performance, by exploiting the feature template structure common in NLP problems.

preprint2014arXiv

A Convex Formulation for Spectral Shrunk Clustering

Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process assisted by manifold learning in the original space. However, the manifold in reduced-dimensional subspace is likely to exhibit altered properties in contrast with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. Aiming to address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored for the clustering. Compared with other related methods, the proposed algorithm yields a more structured clustering result. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.

preprint2014arXiv

A Discriminative CNN Video Representation for Event Detection

In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional Neural Networks (CNNs) to advance event detection, where only frame level static descriptors can be extracted by the existing CNN toolkit. This paper makes two contributions to the inference of CNN video representation. First, while average pooling and max pooling have long been the standard approaches to aggregating frame level static features, we show that performance can be significantly improved by taking advantage of an appropriate encoding method. Second, we propose using a set of latent concept descriptors as the frame descriptor, which enriches visual information while keeping it computationally affordable. The integration of the two contributions results in a new state-of-the-art performance in event detection over the largest video datasets. Compared to improved Dense Trajectories, which has been recognized as the best video representation for event detection, our new representation improves the Mean Average Precision (mAP) from 27.6% to 36.8% for the TRECVID MEDTest 14 dataset and from 34.0% to 44.6% for the TRECVID MEDTest 13 dataset. This work is the core part of the winning solution of our CMU-Informedia team in TRECVID MED 2014 competition.

preprint2014arXiv

A Refined Holographic QCD Model and QCD Phase Structure

We consider the Einstein-Maxwell-dilaton system with an arbitrary kinetic gauge function and a dilaton potential. A family of analytic solutions is obtained by the potential reconstruction method. We then study its holographic dual QCD model. The kinetic gauge function can be fixed by requesting the linear Regge spectrum of mesons. We calculate the free energy to obtain the phase diagram of the holographic QCD model.

preprint2014arXiv

Analytical Perspective for Photonic Bound States in the Continuum in Photonic Crystal Slabs

We investigate the formation of photonic bound states in the continuum (BICs) in photonic crystal slabs from an analytical perspective. Unlike the stationary at-$Γ$ BICs which originate from the geometric symmetry, the tunable off-$Γ$ BICs are due to the weighted destructive via-the-continuum interference in the vicinity of accidental symmetry when the majority of the radiation is pre-canceled. The symmetric compatible nature of the off-$Γ$ BICs leads to a trapping of light that can be tuned through continuously varying the wavevector. With the analytical approach, we explain a reported experiment and predict the existence of a new BIC at an unrevealed symmetry.

preprint2014arXiv

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining to generate groups that are of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms have gained the most popularity due to their simplicity and efficacy. The classical k-means algorithm partitions a number of data points into several subsets by iteratively updating the clustering centers and the associated data points. By contrast, a weighted undirected graph is constructed in min-cut algorithms which partition the vertices of the graph into two sets. However, existing clustering algorithms tend to cluster a minority of data points into a subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to leverage exclusive lasso on k-means and min-cut to regulate the balance degree of the clustering results. By optimizing our objective functions that build on the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms compared to the state-of-the-art clustering algorithms.

preprint2014arXiv

Explain Images with Multimodal Recurrent Neural Networks

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.

preprint2014arXiv

Multiferroic interfaces composed of d0 perovskites oxides

We investigate the electronic, ferroelectric and magnetic properties of KTaO3/PbTiO3 interfaces by using conventional density functional theory (DFT) and advanced DFT such as hybrid functional HSE06. We show that doped holes in valence bands or electrons in conduction bands give rise to ferromagnetism at the interfaces. The ferromagnetic states are ground states for both hole-doped (p-type) and electron-doped (n-type) interfaces by comparison with their corresponding nonmagnetic and antiferromagnetic states. Carriers (holes or electrons) concentrate near the interface to screen the polarization charge and thus the concentration of carrier varies with the ferroelectric polarization. Furthermore, the interface magnetization, which is nearly proportional to the concentration of carrier, can be tuned by ferroelectric polarization reversal, leading to strong intrinsic magnetoelectric effects at the interface of originally nonmagnetic KTaO3 and PbTiO3. Interestingly, a ferromagnetic-nonmagnetic transition tuned by an applied electric field can be realized at the p-type interface. This suggests an illuminating approach to multiferroic materials beyond conventional single-phase multiferroics and multi-phase multiferroics such as ferroelectric/ferromagnet heterostructures. The KTaO3/PbTiO3 interfaces may be promising in future multiferroic devices applications.

preprint2014arXiv

Production of Gadolinium-loaded Liquid Scintillator for the Daya Bay Reactor Neutrino Experiment

We report on the production and characterization of liquid scintillators for the detection of electron antineutrinos by the Daya Bay Reactor Neutrino Experiment. One hundred eighty-five tons of gadolinium-loaded (0.1% by mass) liquid scintillator (Gd-LS) and two hundred tons of unloaded liquid scintillator (LS) were successfully produced from a linear-alkylbenzene (LAB) solvent in six months. The scintillator properties, the production and purification systems, and the quality assurance and control (QA/QC) procedures are described.

preprint2014arXiv

The Appell Function $F_1$ and Regge String Scattering Amplitudes

We show that each 26D open bosonic Regge string scattering amplitude (RSSA) can be expressed in terms of one single Appell function $F_1$ in the Regge limit. This result enables us to derive infinite number of recurrence relations among RSSA at arbitrary mass levels, which are conjectured to be related to the known SL(5,C) dynamical symmetry of $F_1$. In addition, we show that these recurrence relations in the Regge limit can be systematically solved so that all RSSA can be expressed in terms of one amplitude. All these results are dual to high energy symmetries of fixed angle string scattering amplitudes discovered previously [4-8].

preprint2013arXiv

A New Approach to the Yang-Mills Gauge Theory of Gravity and its Applications

We shall give dynamics to our spacetime manifold by first identifying the local affine symmetry as the characterizing symmetry for our geometry à la Felix Klein; this symmetry is imposed on us by the Law of Inertia and the Law of Causality. We then prescribe 16 gauge vector bosons to this symmetry à la Yang and Mills. The locally affine symmetric Yang-Mills Lagrangian in the presence of a background world metric, and the corresponding equations of motion, are respectively constructed and derived. Spontaneous breaking of the local affine symmetry to the local Lorentz symmetry is achieved by classical solutions to the equations of motion. In these classical solutions, the 16 gauge vector bosons are shown to select the Schwarzschild metric as one among the admissible background world metrics. Classical gravity is thus expressed by a spontaneously broken Erlangen program. We shall also show that this Yang-Mills gauge theory of gravity can give an explanation of the form of the galactic rotation curves, of the amount of intergalactic gravitational lensing, and of the accelerating expansion of the Universe.

preprint2013arXiv

A note on on-shell recursion relation of string amplitudes

In the application of on-shell recursion relation to string amplitudes, one challenge is the sum over infinite intermediate on-shell string states. In this note, we show how to sum these infinite states explicitly by including unphysical states to make complete Fock space.

preprint2013arXiv

BCFW Deformation and Regge Limit

BCFW deformation has served as an extremely useful tool in providing a recursive approach in studying color-ordered gauge amplitudes. This procedure has also been generalized to the study of graviton scattering. An important ingredient of this approach is the ability to identify amplitudes satisfying convergent dispersion relation when the BCFW parameter, z, is treated as a complex variable. In a modified BCFW treatment, we show in what sense the BCFW deformation in the large-z limit can be understood as the Regge limit. We also discuss how the issue of convergent dispersion integral for amplitudes involving external spins relates to the study of super-convergence relation which served as the precursor to the s-t duality relation for flat space string amplitudes.

preprint2013arXiv

Phase Structure in a Dynamical Soft-Wall Holographic QCD Model

preprint2013arXiv

Recurrence Relations of Higher Spin BPST Vertex Operators for Open String

We calculate higher spin BPST vertex operators for open bosonic string and express these operators in terms of Kummer function of the second kind. We derive an infinite number of recurrence relations among BPST vertex operators of different string states. These recurrence relations among BPST vertex operators lead to the recurrence relations among Regge string scattering amplitudes discovered recently.

preprint2012arXiv

Exponential fall-off Behavior of Regge Scatterings in Compactified Open String Theory

We calculate massive string scattering amplitudes of compactified open string in the Regge regime. We extract the complete infinite ratios among high-energy amplitudes of different string states in the fixed angle regime from these Regge string scattering amplitudes. The complete ratios calculated by this indirect method include and extend the subset of ratios calculated previously (Lee and Yang, 2007, and Lee, Takimi, and Yang, 2008) by the more difficult direct fixed angle calculation. In this calculation of compactified open string scattering, we discover a realization of arbitrary real values L in the identity Eq.(4.18), rather than only the integer values found in all previous high-energy string scattering amplitude calculations. The identity in Eq.(4.18) was explicitly proved recently in Lee, Yan, and Yang to link fixed angle and Regge string scattering amplitudes. In addition, we discover a kinematic regime with stringy highly winding modes, which shows the unusual exponential fall-off behavior in the Regge string scattering. This is complementary to a kinematic regime discovered previously (Lee, Takimi, and Yang, 2008), which shows the unusual power-law behavior in the high-energy fixed angle compactified string scatterings. Key words: Regge string scatterings; High-energy String

preprint2012arXiv

High-Energy String Scattering Amplitudes and Signless Stirling Number Identity

We give a complete proof of a set of identities (7) proposed recently from calculation of high-energy string scattering amplitudes. These identities allow one to extract ratios among high-energy string scattering amplitudes in the fixed angle regime from high-energy amplitudes in the Regge regime. The proof is based on a signless Stirling number identity in combinatorial theory. The results are valid for arbitrary real values $L$ rather than only for $L=0,1$ proved previously. The identities for non-integer real value $L$ were recently shown to be realized in high-energy compactified string scattering amplitudes [He S., Lee J.C., Yang Y., arXiv:1012.3158]. The parameter $L$ is related to the mass level of an excited string state and can take non-integer values for Kaluza-Klein modes.

preprint2012arXiv

Quadratic Gravitational Lagrangian with Torsion Can Give Possible Explanations of the Form of Galactic Rotation Curves, of the Amount of Intergalactic Lensings, and of the Accelerating Expansion of the Universe

The Quadratic Gravitational Lagrangian with torsion provides us with a richer number of solutions than the Einstein-Hilbert Lagrangian does. With proper interpretation, these solutions, together, seem to give good explanations of the form of the galactic rotation curves, of the amount of intergalactic gravitational lensings, and of the accelerating expansion of the Universe. The existence of Particle Families can also arise from the existence of these various microscopic metrics endowed to the respective particles.

preprint2011arXiv

Higher Spin String States Scattered from D-particle in the Regge Regime and Factorized Ratios of Fixed Angle Scatterings

We study scattering of higher spin closed string states at arbitrary mass levels from D-particle in the Regge regime. We extract the infinite ratios among high-energy amplitudes of different string states in the fixed angle regime from these Regge string scattering amplitudes. In this calculation, we have used an identity proved recently based on a signless Stirling number identity in combinatorial theory. The complete ratios calculated by this indirect method include a subset of ratios calculated previously by direct fixed angle calculation. Moreover, we discover that in spite of the non-factorizability of the closed string D-particle scattering amplitudes, the complete ratios derived for the fixed angle regime are found to be factorized. These ratios are consistent with the decoupling of high-energy zero norm states calculated previously.

preprint2011arXiv

Massive Superstring Scatterings in the Regge Regime

We calculate four classes of high energy massive string scattering amplitudes of fermionic string theory at arbitrary mass levels in the Regge regime (RR). We show that all four leading order amplitudes in the RR can be expressed in terms of the Kummer function of the second kind. Based on the summation algorithm of a set of extended signed Stirling number identities, we show that all four ratios calculated previously by the method of decoupling of zero-norm states among scattering amplitudes in the Gross Regime (GR) can be extracted from this Kummer function in the RR. Finally, we conjecture and give evidence that the existence of these four GR ratios in the RR persists to subleading orders in the Regge expansion of all high energy fermionic string scattering amplitudes.

preprint2011arXiv

Phase Structure of Kerr-AdS Black Hole

We study the critical phenomena of the Kerr-AdS black hole. Phase structures are observed at different temperatures, $T_{L}$, $T_{c1}$ and $T_{c2}$, with various features. We discuss the thermal stability considering the isothermal compressibility and how the phase transitions are related to each other. The asymptotic value of the angular momentum also has an implication for separating the stable and unstable parts. Near the critical temperature $T_{c1}$, the order parameter is determined to calculate the critical exponents. All the critical exponents ($α$,$β$,$γ$,$δ$)=(0,1/2,1,3) are identical to those of mean field systems. We plot the phase diagram near this critical point, and discuss the scaling symmetry of the free energy.
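As a quick consistency check, not taken from the paper itself, the quoted exponents can be inserted into the standard Rushbrooke and Widom scaling relations, both of which mean field exponents are expected to satisfy:

```latex
\begin{align}
\alpha + 2\beta + \gamma &= 0 + 2\cdot\tfrac{1}{2} + 1 = 2 && \text{(Rushbrooke)},\\
\beta(\delta - 1) &= \tfrac{1}{2}\,(3 - 1) = 1 = \gamma && \text{(Widom)}.
\end{align}
```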

preprint2011arXiv

String Scattering Amplitudes in High Energy Limits

A very brief review of string scattering amplitudes in two important high energy limits: hard scattering and Regge scattering. Recent results on the symmetries of string theory, obtained by studying high energy string scattering amplitudes, are shown.

preprint2010arXiv

High-energy String Scatterings of Compactified Open String

We calculate high-energy massive string scattering amplitudes of the compactified open string. We derive infinite linear relations, or stringy symmetries, among soft high-energy string scattering amplitudes of different string states in the Gross kinematic regime (GR). In addition, we systematically analyze all hard power-law and soft exponential fall-off regimes of high-energy compactified open string scatterings by comparing the scatterings with their 26D noncompactified counterparts. In particular, we discover the existence of a power-law regime at fixed angle and an exponential fall-off regime at small angle for high-energy compactified open string scatterings. The linear relations break down as expected in all power-law regimes. The analysis can be extended to the high-energy scatterings of the compactified closed string, which corrects and extends the previous results in [28].

preprint2009arXiv

Scalar Mesons and glueballs in $Dp-Dq$ hard-wall models

We investigate light scalar mesons and glueballs in the $Dp-Dq$ hard-wall models, including $D3-Dq$, $D4-Dq$, and $D6-Dq$ systems. It is found that only in the $D4-D6$ and $D4-D8$ hard-wall models are the predicted masses of the ${\bar q} q$ scalar meson $f_0$ and the scalar glueball consistent with their experimental or lattice results. This indicates that the $D4-D6$ and $D4-D8$ hard-wall models are favored candidates for a realistic holographic QCD model.

preprint2008arXiv

Confront Holographic QCD with Regge Trajectories of vectors and axial-vectors

We derive the general 5-dimensional metric structure of the $Dp-Dq$ system in type II superstring theory, and demonstrate the physical meaning of the parameters characterizing the 5-dimensional metric structure of the \textit{holographic} QCD model by relating them to the parameters describing Regge trajectories. By matching the spectra of vector mesons $ρ_1$ with the deformed $Dp-Dq$ soft-wall model, we find that the spectra of vector mesons $ρ_1$ can be described very well in the soft-wall $D3-Dq$ model, i.e., the $AdS_5$ soft-wall model. We then investigate how well the $AdS_5$ soft-wall model can describe the Regge trajectory of axial-vector mesons $a_1$. We find that the constant component of the 5-dimensional mass square of axial-vector mesons plays an effective role in realizing chiral symmetry breaking in the vacuum, and that a small negative $z^4$ correction in the 5-dimensional mass square helps to realize chiral symmetry restoration in highly excited states.

preprint2005arXiv

Comments on the high energy limit of bosonic open string theory

In previous works, ratios among four-point scattering amplitudes at the leading order in the high-energy limit were derived for the bosonic open string theory. The derivation was based on Ward identities derived from the decoupling of zero-norm states and was purely algebraic. The only assumption of the derivation was that the momentum polarization can be approximated by the longitudinal polarization at high energies. In this paper, using the decoupling of spurious states, we reduce this assumption to a much weaker one which can be easily verified by simple power counting in most cases. For the special cases which are less obvious, we verify the new assumption for an example by saddle-point approximation. We also provide a new perspective on our previous results in terms of DDF states. In particular, we show that, by using DDF states, one can easily see that there is only one independent high energy scattering amplitude for each fixed mass level.

preprint2005arXiv

Generalizations of Lunin-Maldacena transformation on the $AdS_{5}\times S^{5}$ background

In this paper we consider a simple generalization of the method of Lunin and Maldacena for generating new string backgrounds based on TsT-transformations. We study multi-shift Ts...sT-transformations applied to backgrounds with at least two U(1) isometries. We prove that the string currents in any two backgrounds related by Ts...sT-transformations are equal. Applying this procedure to the $AdS_{5}\times S^{5}$ background, we find a new background and study some properties of the semiclassical strings.

preprint2005arXiv

Solving all 4-point correlation functions for bosonic open string theory in the high energy limit

We study the implications of decoupling zero-norm states in the high-energy limit for the 26-dimensional bosonic open string theory. Infinitely many linear relations among 4-point functions are derived algebraically, and their unique solution is found. Equivalent results are also obtained by taking the high-energy limit of Virasoro constraints, and as an independent check, we compute all 4-point functions of 3 tachyons and an arbitrary massive state by saddle-point approximation.

182 published item(s)

One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

MS-DETR: Efficient DETR Training with Mixed Supervision

Newly Formed Dust within the Circumstellar Environment of SNIa-CSM 2018evt

Rotating black hole mimicker surrounded by the string cloud

DR-WLC: Dimensionality Reduction cognition for object detection and pose estimation by Watching, Learning and Checking

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

Temporal Perceiving Video-Language Pre-training

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

Active Learning for Deep Visual Tracking

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Automated Progressive Learning for Efficient Training of Vision Transformers

Automatic Depth Optimization for Quantum Approximate Optimization Algorithm

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

Compact Scintillator Array Detector (ComSAD) for sounding rocket and CubeSat missions

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Computational discovery of spin-polarized semimetals in spinel materials

Constraints from LIGO O3 data on gravitational-wave emission due to r-modes in the glitching pulsar PSR J0537-6910

COVID-19 Detection Using CT Image Based On YOLOv5 Network

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

Deep Hierarchical Semantic Segmentation

Early-Time Ultraviolet Spectroscopy and Optical Follow-up Observations of the Type IIP Supernova 2021yja

Few-Shot Segmentation via Cycle-Consistent Transformer

Filter Pruning by Switching to Neighboring CNNs with Good Attributes

Free-electron-light interactions in nanophotonics

Gap Opening and Inner Disk Structure in the Strongly Accreting Transition Disk of DM Tau

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Krylov complexity and orthogonal polynomials

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

Non-Abelian nonsymmorphic chiral symmetries

Observations of the Very Young Type Ia Supernova 2019np with Early-excess Emission

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

ReGO: Reference-Guided Outpainting for Scenery Image

Results and findings of the 2021 Image Similarity Challenge

Search for anisotropic gravitational-wave backgrounds using data from Advanced LIGO and Advanced Virgo's first three observing runs

Search for Axion(-like) Particles in Heavy-Ion Collisions

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

Spectropolarimetry of the Thermonuclear Supernova 2021rhu: High Calcium Polarization 79 Days After Peak Luminosity

Spectropolarimetry of the tidal disruption event AT 2019qiz: a quasispherical reprocessing layer

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

The exact SL(K+3,C) symmetry of string theory

The ringing of quantum corrected Schwarzschild black hole with GUP

The Type Icn SN 2021csp: Implications for the Origins of the Fastest Supernovae and the Fates of Wolf-Rayet Stars

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

Unified Transformer Tracker for Object Tracking

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

Visual Abductive Reasoning

A general framework for scintillation in nanophotonics

A Survey on Concept Factorization: From Shallow to Deep Representation Learning

Bilinear equations in Darboux transformations by Boson-Fermion correspondence

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Instance-Invariant Domain Adaptive Object Detection via Progressive Disentanglement

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

Learning to Anticipate Egocentric Actions by Imagination

Modeling the Probabilistic Distribution of Unlabeled Data for One-shot Medical Image Segmentation

One-Shot Neural Architecture Search via Self-Evaluated Template Network

Sketch-Guided Scenery Image Outpainting

Supervision by Registration and Triangulation for Landmark Detection

Toggling Near-field Directionality via Polarization Control of Surface Waves

Adaptive Exploration for Unsupervised Person Re-Identification

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

Analytic Study of Magnetic Catalysis in Holographic QCD