Source author record

Jian Sun

Jian Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

152 works

43 topics

4 close collaborators


preprint, 2026, arXiv

Are eHMIs always helpful? Investigating how eHMIs interfere with pedestrian behavior on multi-lane streets: An eye-tracking virtual reality experiment

Appropriate communication is crucial for efficient and safe interactions between pedestrians and autonomous vehicles (AVs). External human-machine interfaces (eHMIs) on AVs, which can be categorized as allocentric or egocentric, are considered a promising solution. While the effectiveness of eHMIs has been extensively studied, their potential to interfere with pedestrian crossing behavior in complex environments, such as unsignalized multi-lane streets, remains underexplored. Hence, a virtual reality-based experiment was conducted to examine how different types of eHMIs displayed on AVs affect the crossing behavior of pedestrians in multi-lane street environments, with a focus on the gaze patterns of pedestrians during crossing. The results revealed that the presence of eHMIs significantly influenced the cognitive load on pedestrians and increased the possibility of distraction, even misleading pedestrians in cases involving multiple AVs on multi-lane streets. Notably, allocentric eHMIs induced higher cognitive loads and greater distraction in pedestrians than egocentric eHMIs. This was primarily evidenced by longer gaze time and higher proportions of attention for the eHMI on the interacting vehicle, as well as a broader distribution of gaze toward vehicles in the non-interacting lane. However, misleading behavior was mainly triggered by eHMI signals from yielding vehicles in the non-interacting lane. Under such asymmetric signal configurations, egocentric eHMIs resulted in a higher misjudgment rate than allocentric eHMIs. These findings highlight the importance of enhancing eHMI designs to balance the clarity and consistency of the displayed information across different perspectives, especially in complex multi-lane traffic scenarios. This study provides valuable insights regarding the application and standardization of future eHMI systems for AVs.

preprint, 2026, arXiv

MobileGeo: Exploring Hierarchical Knowledge Distillation for Resource-Efficient Cross-view Drone Geo-Localization

Cross-view geo-localization (CVGL) plays a vital role in drone-based multimedia applications, enabling precise localization by matching drone-captured aerial images against geo-tagged satellite databases in GNSS-denied environments. However, existing methods rely on resource-intensive feature alignment and multi-branch architectures, incurring high inference costs that limit their deployment on edge devices. We propose MobileGeo, a mobile-friendly framework designed for efficient on-device CVGL: 1) During training, a Hierarchical Distillation (HD-CVGL) paradigm, coupled with Uncertainty-Aware Prediction Alignment (UAPA), distills essential information into a compact model without incurring inference overhead. 2) During inference, an efficient Multi-view Selection Refinement Module (MSRM) leverages mutual information to filter redundant views and reduce computational load. Extensive experiments demonstrate that MobileGeo outperforms previous state-of-the-art methods, achieving a 4.19% improvement in AP on the University1652 dataset while being over 5 times more efficient in FLOPs and 3 times faster. Crucially, MobileGeo runs at 251.5 FPS on an NVIDIA AGX Orin edge device, demonstrating its practical viability for real-time on-device drone geo-localization. The code is available at https://github.com/SkyEyeLoc/MobileGeo.

preprint, 2026, arXiv

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12 mathematical benchmarks. CES consistently improves average accuracy while reducing response length relative to DAPO, and supplementary experiments show similar trends on a smaller 1.5B backbone and on out-of-domain benchmarks.
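The conditional bidirectional policy described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the entropy threshold and the shaping coefficient are hypothetical, and the DAPO training loop is left out entirely. The sketch only shows the sign logic: high-entropy tokens are penalized on correct responses and rewarded on incorrect ones.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def shape_advantages(advantages, entropies, is_correct, threshold=1.0, coef=0.1):
    """Adjust per-token advantages at high-entropy ("forking point") tokens.

    threshold and coef are hypothetical hyperparameters, not values from the paper.
    """
    # Penalize forking tokens on correct paths, reward them on incorrect paths.
    sign = -1.0 if is_correct else 1.0
    return [
        adv + sign * coef * ent if ent > threshold else adv
        for adv, ent in zip(advantages, entropies)
    ]

# One uncertain token (uniform over 4 choices) and one confident token.
entropies = [token_entropy([0.25] * 4), token_entropy([1.0, 0.0, 0.0, 0.0])]
on_correct = shape_advantages([0.5, 0.5], entropies, is_correct=True)
on_incorrect = shape_advantages([0.5, 0.5], entropies, is_correct=False)
```

Only the first token crosses the threshold, so only its advantage moves: down when the response is correct, up when it is incorrect.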

preprint, 2024, arXiv

MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos

Deep machine learning models including Convolutional Neural Networks (CNN) have been successful in the detection of Mild Cognitive Impairment (MCI) using medical images, questionnaires, and videos. This paper proposes a novel Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to distinguish people with MCI from those with normal cognition by analyzing facial features. The data come from I-CONECT, a behavioral intervention trial aimed at improving cognitive function by providing frequent video chats. MC-ViViT extracts spatiotemporal features of videos in one branch and augments representations by the MC module. The I-CONECT dataset is challenging because it is imbalanced, containing Hard-Easy and Positive-Negative samples, which impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE loss to address the imbalance problem. Our experimental results on the I-CONECT dataset show the great potential of MC-ViViT in predicting MCI, with an accuracy of 90.63% on some of the interview videos.
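HP Loss is described as a combination of Focal loss and AD-CORRE loss. The sketch below covers only the standard binary focal loss component, to show how well-classified (easy) samples are down-weighted. It is not the HP Loss itself. The gamma and alpha defaults are the commonly used focal loss values, not settings reported for MC-ViViT.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident and correct: small loss
hard = focal_loss(0.1, 1)   # confident and wrong: much larger loss
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.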

preprint, 2024, arXiv

Quasi-invariant theorem on the Gaussian path space

In this article, we first introduce a class of Gaussian processes and prove the quasi-invariant theorem with respect to the Gaussian Wiener measure, which is the law of the associated Gaussian process. In particular, this includes the case of fractional Brownian motion. As applications, we establish the integration by parts formula and the Bismut-Elworthy-Li formula on the Gaussian path space, by means of which some logarithmic Sobolev inequalities are presented. Moreover, we also provide some applications in the field of financial mathematics.

preprint, 2023, arXiv

Dynamic Grained Encoder for Vision Transformers

Transformers, the de-facto standard for language modeling, have been recently applied for vision tasks. This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images and save computational costs. Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region. Thus it achieves a fine-grained representation in discriminative regions while keeping high efficiency. Besides, the dynamic grained encoder is compatible with most vision transformer frameworks. Without bells and whistles, our encoder allows the state-of-the-art vision transformers to reduce computational complexity by 40%-60% while maintaining comparable performance on image classification. Extensive experiments on object detection and segmentation further demonstrate the generalizability of our approach. Code is available at https://github.com/StevenGrove/vtpack.

preprint, 2023, arXiv

Pressure-Induced Superconductivity in Topological Heterostructure (PbSe)5(Bi2Se3)6

Recently, the natural heterostructure of (PbSe)5(Bi2Se3)6 has been theoretically predicted and experimentally confirmed as a topological insulator. In this work, we induce superconductivity in (PbSe)5(Bi2Se3)6 by applying high pressure. As the pressure increases up to 10 GPa, superconductivity with Tc ~ 4.6 K suddenly appears, followed by an abrupt decrease. Remarkably, upon further compression above 30 GPa, a new superconducting state arises, where pressure raises the Tc to an unsaturated 6.0 K within the limit of our research. Combining XRD and Raman spectroscopies, we suggest that the emergence of two distinct superconducting states occurs concurrently with the pressure-induced structural transition in this topological heterostructure (PbSe)5(Bi2Se3)6.

preprint, 2023, arXiv

XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks

Capsule Network is powerful at defining the positional relationship between features in deep neural networks for visual recognition tasks, but it is computationally expensive and not suitable for running on mobile devices. The bottleneck is in the computational complexity of the Dynamic Routing mechanism used between the capsules. On the other hand, XNOR-Net is fast and computationally efficient, though it suffers from low accuracy due to information loss in the binarization process. To address the computational burdens of the Dynamic Routing mechanism, this paper proposes new Fully Connected (FC) layers by xnorizing the linear projection outside or inside the Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers have two versions, XnODR (Xnorize the Linear Projection Outside Dynamic Routing) and XnIDR (Xnorize the Linear Projection Inside Dynamic Routing). To test the generalization of both XnODR and XnIDR, we insert them into two different networks, MobileNetV2 and ResNet-50. Our experiments on three datasets, MNIST, CIFAR-10, and MultiMNIST validate their effectiveness. The results demonstrate that both XnODR and XnIDR help networks to have high accuracy with lower FLOPs and fewer parameters (e.g., 96.14% correctness with 2.99M parameters and 311.74M FLOPs on CIFAR-10).
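The efficiency of XNOR-style layers comes from replacing multiply-accumulate with bitwise operations on binarized values. A minimal sketch of that primitive follows. It is not the XnODR or XnIDR layer, and the helper names are hypothetical. For two vectors with entries in {+1, -1}, the dot product equals twice the number of matching signs minus the length.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int, one bit per entry (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors stored as bit masks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = xnor_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))  # ordinary dot product for comparison
```

The information loss mentioned in the abstract happens earlier, when real-valued weights and activations are reduced to their signs. The bitwise step itself is exact.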

preprint, 2022, arXiv

A Survey on Neural Open Information Extraction: Current Status and Future Directions

Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora. The technique is well suited to many open-world natural language understanding scenarios, such as automatic knowledge base construction, open-domain question answering, and explicit reasoning. Thanks to the rapid development of deep learning technologies, numerous neural OpenIE architectures have been proposed and have achieved considerable performance improvements. In this survey, we provide an extensive overview of state-of-the-art neural OpenIE models, their key design decisions, strengths, and weaknesses. Then, we discuss the limitations of current solutions and the open issues in the OpenIE problem itself. Finally, we list recent trends that could help expand its scope and applicability, setting up promising directions for future research in OpenIE. To the best of our knowledge, this paper is the first review on this specific topic.

preprint, 2022, arXiv

A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by relational databases. Early text-to-SQL parsing systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interaction with the systems. In recent years, deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, large pre-trained language models have taken the state of the art of the text-to-SQL parsing task to a new level. In this survey, we present a comprehensive review of deep learning approaches for text-to-SQL parsing. First, we introduce the text-to-SQL parsing corpora, which can be categorized as single-turn and multi-turn. Second, we provide a systematic overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present readers with the challenges faced by text-to-SQL parsing and explore some potential future directions in this field.
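To make the task definition concrete, the sketch below shows one input/output pair for a single-turn text-to-SQL setting and checks the SQL by executing it. No parser is implemented here: the table, the question and the "predicted" query are invented for illustration and do not come from any corpus covered by the survey.

```python
import sqlite3

# A tiny in-memory database standing in for the relational context.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ana", 25), ("Bo", 41), ("Cy", 33)],
)

# Input to a text-to-SQL parser: the NL question (plus the schema above).
question = "How many singers are older than 30?"

# Output a correct parser would be expected to produce (hand-written here).
predicted_sql = "SELECT COUNT(*) FROM singer WHERE age > 30"

# Executing the query against the database yields the answer to the question.
answer = conn.execute(predicted_sql).fetchone()[0]
```

Comparing execution results in this way is one common method of judging whether a predicted query is correct, alongside comparing the query text itself.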

preprint, 2022, arXiv

Anchor DETR: Query Design for Transformer-Based Object Detection

In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning, and we cannot explain where it will focus. It is also difficult to optimize, as the prediction slot of each object query does not have a specific mode. In other words, each object query will not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors, so each object query focuses on the objects near its anchor point. Moreover, our query design can predict multiple objects at one position to solve the "one region, multiple objects" difficulty. In addition, we design an attention variant, which can reduce the memory cost while achieving similar or better performance than the standard attention in DETR. Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, can achieve better performance and run faster than DETR with 10× fewer training epochs. For example, it achieves 44.2 AP at 19 FPS on the MSCOCO dataset when using the ResNet50-DC5 feature and training for 50 epochs. Extensive experiments on the MSCOCO benchmark prove the effectiveness of the proposed methods. Code is available at https://github.com/megvii-research/AnchorDETR.

preprint, 2022, arXiv

Boosting Black-Box Adversarial Attacks with Meta Learning

Deep neural networks (DNNs) have achieved remarkable success in diverse fields. However, it has been demonstrated that DNNs are very vulnerable to adversarial examples even in black-box settings. A large number of black-box attack methods have been proposed in the literature. However, those methods usually suffer from low success rates and large query counts, which cannot fully satisfy practical purposes. In this paper, we propose a hybrid attack method which trains meta adversarial perturbations (MAPs) on surrogate models and performs black-box attacks by estimating gradients of the models. Our method uses the meta adversarial perturbation as an initialization and subsequently trains any black-box attack method for several epochs. Furthermore, the MAPs enjoy favorable transferability and universality, in the sense that they can be employed to boost the performance of other black-box adversarial attack methods. Extensive experiments demonstrate that our method can not only improve the attack success rates but also reduce the number of queries compared to other methods.

preprint, 2022, arXiv

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

This work uses a new architecture to address the Burst Super-Resolution (BurstSR) task, which requires restoring a high-quality image from a sequence of noisy, misaligned, and low-resolution RAW bursts. To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction. To achieve this goal, we propose a Pyramid Flow-Guided Deformable Convolution Network (Pyramid FG-DCN) and incorporate Swin Transformer Blocks and Groups as our main backbone. More specifically, we combine optical flows and deformable convolutions, so that our BSRT can handle misalignment and aggregate the potential texture information in multi-frames more efficiently. In addition, our Transformer-based structure can capture long-range dependency to further improve the performance. The evaluation on both synthetic and real-world tracks demonstrates that our approach achieves a new state-of-the-art in the BurstSR task. Further, our BSRT wins the championship in the NTIRE2022 Burst Super-Resolution Challenge.

preprint, 2022, arXiv

Data-driven Self-triggered Control via Trajectory Prediction

Self-triggered control, a well-documented technique for reducing the communication overhead while ensuring desired system performance, is gaining increasing popularity. However, existing methods for self-triggered control require explicit system models that are assumed perfectly known a priori. An end-to-end control paradigm known as data-driven control learns control laws directly from data, and offers a competing alternative to the routine system identification-then-control method. In this context, the present paper puts forth data-driven self-triggered control schemes for unknown linear systems using data collected offline. Specifically, for output feedback control systems, a data-driven model predictive control (MPC) scheme is proposed, which computes a sequence of control inputs while generating a predicted system trajectory. A data-driven self-triggering law is designed using the predicted trajectory, to determine the next triggering time once a new measurement becomes available. For state feedback control systems, instead of capitalizing on MPC to predict the trajectory, a data-fitting problem using the pre-collected input-state data is solved, whose solution is employed to construct the self-triggering mechanism. Both feasibility and stability are established for the proposed self-triggered controllers, which are validated using numerical examples.

preprint, 2022, arXiv

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

To date, the most powerful semi-supervised object detectors (SS-OD) are based on pseudo-boxes, which need a sequence of post-processing with fine-tuned hyper-parameters. In this work, we propose replacing the sparse pseudo-boxes with the dense prediction as a united and straightforward form of pseudo-label. Compared to the pseudo-boxes, our Dense Pseudo-Label (DPL) does not involve any post-processing method, thus retaining richer information. We also introduce a region selection technique to highlight the key information while suppressing the noise carried by dense labels. We name our proposed SS-OD algorithm that leverages the DPL as Dense Teacher. On COCO and VOC, Dense Teacher shows superior performance under various settings compared with the pseudo-box-based methods.

preprint, 2022, arXiv

Differentiable Architecture Search with Random Features

Differentiable architecture search (DARTS) has significantly promoted the development of NAS techniques because of its high search efficiency and effectiveness, but it suffers from performance collapse. In this paper, we make efforts to alleviate the performance collapse problem for DARTS from two aspects. First, we investigate the expressive power of the supernet in DARTS and then derive a new setup of the DARTS paradigm that trains only BatchNorm. Second, we theoretically find that random features dilute the auxiliary connection role of skip-connections in supernet optimization and enable the search algorithm to focus on fairer operation selection, thereby solving the performance collapse problem. We instantiate DARTS and PC-DARTS with random features to build an improved version of each, named RF-DARTS and RF-PCDARTS respectively. Experimental results show that RF-DARTS obtains 94.36% test accuracy on CIFAR-10 (which is the nearest optimal result in NAS-Bench-201), and achieves the newest state-of-the-art top-1 test error of 24.0% on ImageNet when transferring from CIFAR-10. Moreover, RF-DARTS performs robustly across three datasets (CIFAR-10, CIFAR-100, and SVHN) and four search spaces (S1-S4). Besides, RF-PCDARTS achieves even better results on ImageNet, that is, 23.9% top-1 and 7.1% top-5 test error, surpassing representative methods like single-path, training-free, and partial-channel paradigms directly searched on ImageNet.

preprint, 2022, arXiv

Distributed Momentum-based Frank-Wolfe Algorithm for Stochastic Optimization

This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are basically developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov's momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is $\mathcal{O}(k^{-\frac{1}{2}})$ for convex optimization, and $\mathcal{O}(1/\mathrm{log}_2(k))$ for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.
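The reason Frank-Wolfe avoids projections can be seen in a minimal sketch of the classical method: each step calls a linear minimization oracle over the feasible set and moves toward the returned point. The sketch is centralized and deterministic, so it is not the distributed momentum-based algorithm of this paper. The probability-simplex constraint, the quadratic objective and the 2/(k+2) step size are illustrative assumptions.

```python
def frank_wolfe_simplex(grad, x0, steps=200):
    """Minimize a smooth convex function over the probability simplex.

    grad: function returning the gradient (a list) at a point x.
    x0:   starting point on the simplex.
    """
    x = list(x0)
    for k in range(steps):
        g = grad(x)
        # Linear minimization oracle: on the simplex, the minimizer of <g, s>
        # is the vertex with the smallest gradient coordinate.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)  # classical diminishing step size
        # Convex combination of x and vertex e_i keeps the iterate feasible,
        # so no projection is ever needed.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# Illustrative objective: squared distance to a target inside the simplex.
target = [0.2, 0.5, 0.3]
grad = lambda x: [2.0 * (xj - tj) for xj, tj in zip(x, target)]
x_hat = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

For other constraint sets only the oracle line changes, which is what makes the method attractive when projection is expensive.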

preprint, 2022, arXiv

Distributed stochastic projection-free solver for constrained optimization

This paper proposes a distributed stochastic projection-free algorithm for large-scale constrained finite-sum optimization whose constraint set is complicated such that the projection onto the constraint set can be expensive. The global cost function is allocated to multiple agents, each of which computes its local stochastic gradients and communicates with its neighbors to solve the global problem. Stochastic gradient methods enable low computational cost, while they are hard and slow to converge due to the variance caused by random sampling. To construct a convergent distributed stochastic projection-free algorithm, this paper incorporates a variance reduction technique and gradient tracking technique in the Frank-Wolfe update. We develop a sampling rule for the variance reduction technique to reduce the variance introduced by stochastic gradients. Complete and rigorous proofs show that the proposed distributed projection-free algorithm converges with a sublinear convergence rate and enjoys superior complexity guarantees for both convex and non-convex objective functions. By comparative simulations, we demonstrate the convergence and computational efficiency of the proposed algorithm.

preprint, 2022, arXiv

Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex in telecommunication to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. Besides, we propose semi-supervised learning with multimodal data augmentation to leverage unlabeled data to increase model generalization. Experimental results on three sub-tasks show that the proposed method achieves consistent improvements compared with baselines. We deploy the Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.

preprint, 2022, arXiv

Efficient reversible data hiding via two layers of double-peak embedding

Reversible data hiding has continued to attract significant attention in recent years. In particular, an increasing number of authors focus on the higher significant bit (HSB) plane of an image, which can yield more redundant space. On the other hand, the lower significant bit planes are often ignored for embedding in existing schemes because they harm the embedding rate. This paper proposes an efficient reversible data hiding scheme via a double-peak two-layer embedding (DTLE) strategy with prediction error expansion. The higher six-bit planes of the image are assigned as the HSB plane, and double prediction error peaks are applied in either embedding layer. This makes fuller use of the redundancy space of images compared with the one error peak strategy. Moreover, we carry out the median-edge detector pre-processing for complex images to reduce the size of the auxiliary information. A series of experimental results show that our DTLE approach achieves up to 83% higher embedding rate on real-world datasets while guaranteeing better image quality.

preprint, 2022, arXiv

Focal Sparse Convolutional Networks for 3D Object Detection

Non-uniform 3D sparse data, e.g., point clouds or voxels in different spatial positions, contribute to the task of 3D object detection in different ways. Existing basic components in sparse convolutional networks (Sparse CNNs) process all sparse data, regardless of whether regular or submanifold sparse convolution is used. In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both of which are based on making feature sparsity learnable with position-wise importance prediction. They are focal sparse convolution (Focals Conv) and its multi-modal variant of focal sparse convolution with fusion, or Focals Conv-F for short. The new modules can readily substitute their plain counterparts in existing Sparse CNNs and be jointly trained in an end-to-end fashion. For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection. Extensive experiments on the KITTI, nuScenes and Waymo benchmarks validate the effectiveness of our approach. Without bells and whistles, our results outperform all existing single-model entries on the nuScenes test benchmark at the paper submission time. Code and models are at https://github.com/dvlab-research/FocalsConv.

preprint, 2022, arXiv

FS6D: Few-Shot 6D Pose Estimation of Novel Objects

6D object pose estimation networks are limited in their capability to scale to large numbers of object instances due to the closed-set assumption and their reliance on high-fidelity object CAD models. In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training. To tackle the problem, we point out the importance of fully exploring the appearance and geometric relationship between the given support views and query scene patches and propose a dense prototypes matching framework by extracting and matching dense RGBD prototypes with transformers. Moreover, we show that the priors from diverse appearances and shapes are crucial to the generalization capability under the problem setting and thus propose a large-scale RGBD photorealistic dataset (ShapeNet6D) for network pre-training. A simple and effective online texture blending approach is also introduced to eliminate the domain gap from the synthetic dataset, which enriches appearance diversity at a low cost. Finally, we discuss possible solutions to this problem and establish benchmarks on popular datasets to facilitate future research. The project page is at https://fs6d.github.io/.

preprint, 2022, arXiv

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.

preprint, 2022, arXiv

Ill-posed Surface Emissivity Retrieval from Multi-Geometry Hyperspectral Images using a Hybrid Deep Neural Network

Atmospheric correction is a fundamental task in remote sensing because observations are taken either of the atmosphere or looking through the atmosphere. Atmospheric correction errors can significantly alter the spectral signature of the observations, and lead to invalid classifications or target detection. This is even more crucial when working with hyperspectral data, where a precise measurement of spectral properties is required. State-of-the-art physics-based atmospheric correction approaches require extensive prior knowledge about sensor characteristics, collection geometry, and environmental characteristics of the scene being collected. These approaches are computationally expensive, prone to inaccuracy due to lack of sufficient environmental and collection information, and often impossible for real-time applications. In this paper, a geometry-dependent hybrid neural network is proposed for automatic atmospheric correction using multi-scan hyperspectral data collected from different geometries. The proposed network can characterize the atmosphere without any additional meteorological data. A grid-search method is also proposed to solve the temperature emissivity separation problem. Results show that the proposed network has the capacity to accurately characterize the atmosphere and estimate target emissivity spectra with a Mean Absolute Error (MAE) under 0.02 for 29 different materials. This solution can lead to accurate atmospheric correction to improve target detection for real time applications.

preprint, 2022, arXiv

Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation

Building models of natural language processing (NLP) is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.

preprint2022arXiv

Instance-Conditional Knowledge Distillation for Object Detection

Knowledge distillation has shown great success in classification; however, it is still challenging for detection. In a typical image for detection, representations from different locations may have different contributions to detection targets, making the distillation hard to balance. In this paper, we propose a conditional distillation framework to distill the desired knowledge, namely knowledge that is beneficial in terms of both classification and localization for every instance. The framework introduces a learnable conditional decoding module, which retrieves information given each target instance as query. Specifically, we encode the condition information as query and use the teacher's representations as key. The attention between query and key is used to measure the contribution of different features, guided by a localization-recognition-sensitive auxiliary task. Extensive experiments demonstrate the efficacy of our method: we observe impressive improvements under various settings. Notably, we boost RetinaNet with ResNet-50 backbone from 37.4 to 40.7 mAP (+3.3) under 1x schedule, which even surpasses the teacher (40.4 mAP) with ResNet-101 backbone under 3x schedule. Code has been released at https://github.com/megvii-research/ICD.
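The query-key attention described in this abstract can be illustrated with a minimal sketch. It computes softmax weights over scaled dot products between one instance query and the teacher features at several locations, then uses those weights in a feature-imitation loss. The scaling by the square root of the dimension, the squared-error loss, and all function names are assumptions for illustration; the abstract does not specify them, and this is not the released ICD code.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and many keys."""
    scale = math.sqrt(len(query))  # assumed standard scaling
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_imitation_loss(student, teacher, weights):
    """Attention-weighted squared error between student and teacher features.

    Using the attention as a per-location loss weight is an assumption.
    """
    loss = 0.0
    for s_feat, t_feat, w in zip(student, teacher, weights):
        loss += w * sum((s - t) ** 2 for s, t in zip(s_feat, t_feat))
    return loss


# Hypothetical toy data: one instance query and three teacher locations.
query = [1.0, 0.0]
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
student = [[0.5, 0.0], [0.0, 0.5], [1.0, 0.0]]

weights = attention_weights(query, teacher)
loss = weighted_imitation_loss(student, teacher, weights)
```

Locations whose teacher features align with the query receive larger weights, so mismatches there contribute more to the loss.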

preprint2022arXiv

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Building document-grounded dialogue systems has received growing interest as documents convey a wealth of human knowledge and commonly exist in enterprises. How to comprehend and retrieve information from documents remains a challenging research problem. Previous work ignores the visual property of documents and treats them as plain text, resulting in incomplete modality. In this paper, we propose a Layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents, becoming the largest VRD-based information extraction dataset to the best of our knowledge. We also develop benchmark methods that extend the token-based language model to consider layout features like humans. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.

preprint2022arXiv

LGD: Label-guided Self-distillation for Object Detection

In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge that could be unavailable in real-world scenarios. Instead, we generate instructive knowledge based only on student representations and regular labels. Our framework includes a sparse label-appearance encoder, an inter-object relation adapter and an intra-object knowledge mapper that jointly form an implicit teacher at the training phase, dynamically dependent on labels and evolving student representations. They are trained end-to-end with the detector and discarded at inference. Experimentally, LGD obtains decent results on various detectors, datasets, and extensive tasks like instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). It boosts much stronger detectors like FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training from 46.1% to 47.9% (+1.8%). Compared with a classical teacher-based method, FGFI, LGD not only performs better without requiring a pretrained teacher but also reduces the training cost beyond inherent student learning by 51%. Code is available at https://github.com/megvii-research/LGD.

preprint2022arXiv

Linking-Enhanced Pre-Training for Table Semantic Parsing

Recently, pre-training models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to improve the contextual representation ability of the neural network. Large pre-trained language models have also been applied in the area of table semantic parsing. However, existing pre-training approaches have not carefully explored explicit interaction relationships between a question and the corresponding database schema, which is a key ingredient for uncovering their semantic and structural correspondence. Furthermore, question-aware representation learning in the schema grounding context has received less attention in pre-training objectives. To alleviate these issues, this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. We evaluate our pre-trained framework by fine-tuning it on two benchmarks, Spider and SQUALL. The results demonstrate the effectiveness of our pre-training objective and curriculum compared to a variety of baselines.

preprint2022arXiv

MMChat: Multi-Modal Chat Dataset on Social Media

Incorporating multi-modal contexts in conversation is important for developing more engaging dialogue systems. In this work, we explore this direction by introducing MMChat: a large-scale Chinese multi-modal dialogue corpus (32.4M raw dialogues and 120.84K filtered dialogues). Unlike previous corpora that are crowd-sourced or collected from fictitious movies, MMChat contains image-grounded dialogues collected from real conversations on social media, in which the sparsity issue is observed. Specifically, image-initiated dialogues in common communications may deviate to some non-image-grounded topics as the conversation proceeds. To better investigate this issue, we manually annotate 100K dialogues from MMChat and further filter the corpus accordingly, which yields MMChat-hf. We develop a benchmark model to address the sparsity issue in dialogue generation tasks by adapting the attention routing mechanism on image features. Experiments demonstrate the usefulness of incorporating image features and the effectiveness of handling the sparsity of image features.

preprint2022arXiv

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with a focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that improved efficiency, measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks: the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed to determine a final ranking in this track. In sub-track two, all five metrics mentioned in the description of the challenge (runtime, parameter count, FLOPs, activations, and memory consumption) were considered. Similar to sub-track one, the rankings of the five metrics were summed to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. The submissions gauge the state of the art in efficient single image super-resolution.
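The rank-sum rule used in the two sub-tracks can be sketched as follows. The team names and metric values are made up for illustration, and the tie rule (tied values share the lowest rank) is an assumption, since the abstract does not describe tie handling.

```python
def rank_positions(values):
    """Return 1-based ranks for values where lower is better.

    Tied values share the same (lowest) rank; this tie rule is an assumption.
    """
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_sum_ranking(metrics):
    """Order teams by the sum of their per-metric ranks.

    metrics maps a team name to a tuple of metric values (lower is better).
    """
    teams = list(metrics)
    n_metrics = len(next(iter(metrics.values())))
    totals = {team: 0 for team in teams}
    for m in range(n_metrics):
        column = [metrics[team][m] for team in teams]
        for team, rank in zip(teams, rank_positions(column)):
            totals[team] += rank
    # sorted() is stable, so teams with equal totals keep their input order.
    return sorted(teams, key=lambda team: totals[team]), totals


# Hypothetical sub-track-one entries: (parameters in millions, FLOPs in G).
example = {
    "team_a": (0.30, 20.0),
    "team_b": (0.25, 25.0),
    "team_c": (0.28, 18.0),
}
order, totals = rank_sum_ranking(example)
```

Summing ranks rather than raw values keeps metrics with different units, such as parameter counts and FLOPs, on an equal footing.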

preprint2022arXiv

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions can not exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
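The fidelity metric mentioned here, PSNR computed both directly and after tonemapping, can be sketched in a few lines. The abstract does not name the tonemapping operator; the mu-law curve and the value `mu=5000.0` below are assumptions chosen as one common option for HDR evaluation, and images are represented as flat lists of values normalized to [0, 1].

```python
import math


def psnr(prediction, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized value lists."""
    if len(prediction) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - r) ** 2 for p, r in zip(prediction, reference)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def mu_law_tonemap(values, mu=5000.0):
    """Compress linear HDR values in [0, 1] with a mu-law curve (assumed operator)."""
    return [math.log1p(mu * v) / math.log1p(mu) for v in values]


# Hypothetical linear HDR pixels, normalized to [0, 1].
reference = [0.0, 0.2, 0.9]
prediction = [0.0, 0.25, 0.85]

psnr_linear = psnr(prediction, reference)
psnr_tonemapped = psnr(mu_law_tonemap(prediction), mu_law_tonemap(reference))
```

Tonemapping before measuring PSNR gives errors in dark regions more weight than they have in the linear domain.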

preprint2022arXiv

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing the 3D position-aware features. Object query can perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on standard nuScenes dataset and ranks 1st place on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.

preprint2022arXiv

Progressive End-to-End Object Detection in Crowded Scenes

In this paper, we propose a new query-based detection framework for crowd detection. Previous query-based detectors suffer from two drawbacks: first, multiple predictions will be inferred for a single object, typically in crowded scenes; second, the performance saturates as the depth of the decoding stage increases. Benefiting from the nature of the one-to-one label assignment rule, we propose a progressive predicting method to address the above issues. Specifically, we first select accepted queries prone to generate true positive predictions, then refine the remaining noisy queries according to the previously accepted predictions. Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes. Equipped with our approach, Sparse RCNN achieves 92.0% AP, 41.4% MR⁻² and 83.2% JI on the challenging CrowdHuman dataset, outperforming the box-based method MIP, which specializes in handling crowded scenarios. Moreover, the proposed method, robust to crowdedness, can still obtain consistent improvements on moderately and slightly crowded datasets like CityPersons and COCO. Code will be made publicly available at https://github.com/megvii-model/Iter-E2EDET.

preprint2022arXiv

Real-time Object Detection for Streaming Perception

Autonomous driving requires the model to perceive the environment and (re)act within a low latency for safety. While past works ignore the inevitable changes in the environment after processing, streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception. In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem. We build a simple and effective framework for streaming perception. It equips a novel DualFlow Perception module (DFP), which includes dynamic and static flows to capture the moving trend and basic detection feature for streaming prediction. Further, we introduce a Trend-Aware Loss (TAL) combined with a trend factor to generate adaptive weights for objects with different moving speeds. Our simple method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline, validating its effectiveness. Our code will be made available at https://github.com/yancie-yjr/StreamYOLO.

preprint2022arXiv

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

Deep neural networks perform poorly on heavily class-imbalanced datasets. Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition. Based on the mathematical analysis and simulation results, we claim that supervised contrastive learning suffers a dual class-imbalance problem at both the original batch and Siamese batch levels, which is more serious than long-tailed classification learning. In this paper, at the original batch level, we introduce a class-balanced supervised contrastive loss to assign adaptive weights for different classes. At the Siamese batch level, we present a class-balanced queue, which maintains the same number of keys for all classes. Furthermore, we note that the imbalanced contrastive loss gradient with respect to the contrastive logits can be decoupled into the positives and negatives, and easy positives and easy negatives will make the contrastive gradient vanish. We propose supervised hard positive and negative pairs mining to pick up informative pairs for contrastive computation and improve representation learning. Finally, to approximately maximize the mutual information between the two views, we propose Siamese Balanced Softmax and combine it with the contrastive loss for one-stage training. Extensive experiments demonstrate that ResCom outperforms the previous methods by large margins on multiple long-tailed recognition benchmarks. Our code and models are made publicly available at: https://github.com/dvlab-research/ResCom.
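The class-balanced queue mentioned in this abstract can be illustrated with a minimal sketch: one fixed-length FIFO per class, so that frequent classes cannot crowd out rare ones. The class name, method names, and the use of `collections.deque` are assumptions for illustration and are not taken from the ResCom code.

```python
from collections import defaultdict, deque


class ClassBalancedQueue:
    """Keep at most keys_per_class recent keys for every class label."""

    def __init__(self, keys_per_class):
        self.keys_per_class = keys_per_class
        # One bounded FIFO per class; the oldest key is dropped when full.
        self._queues = defaultdict(lambda: deque(maxlen=keys_per_class))

    def enqueue(self, label, key):
        self._queues[label].append(key)

    def keys(self, label):
        return list(self._queues[label])

    def sizes(self):
        return {label: len(queue) for label, queue in self._queues.items()}


# Hypothetical long-tailed stream: class 0 is frequent, class 1 is rare.
queue = ClassBalancedQueue(keys_per_class=2)
for step in range(5):
    queue.enqueue(0, f"head_{step}")
queue.enqueue(1, "tail_0")
```

Once each class has been seen at least `keys_per_class` times, every class contributes the same number of keys regardless of its frequency in the stream.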

preprint2022arXiv

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes. It causes severe biases of the head classes (with majority samples) against the tailed ones. This renders "how to appropriately define and alleviate the bias" one of the most important issues. Prior works mainly use label distribution or mean score information to indicate a coarse-grained bias. In this paper, we explore to excavate the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one. To this end, we propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences. PCB generates fightback soft labels for regularization during training. Besides, an iterative learning paradigm is developed to support a progressive and smooth regularization in such debiasing. PCB can be plugged and played to any existing method as a complement. Experimental results on LVIS demonstrate that our method achieves state-of-the-art performance without bells and whistles. Superior results across various architectures show the generalization ability. The code and trained models are available at https://github.com/megvii-research/PCB.

preprint2022arXiv

S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been successfully used in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, injecting Syntax to question-Schema graph encoder for Text-to-SQL parsers, which effectively leverages the syntactic dependency information of questions in text-to-SQL to improve the performance. We also employ the decoupling constraint to induce diverse relational edge embedding, which further improves the network's performance. Experiments on Spider and the robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used, ranking first on the Spider leaderboard.

preprint2022arXiv

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to Swin Transformer on ImageNet and a few typical downstream tasks, with lower latency. RepLKNet also shows nice scalability to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is very competitive among state-of-the-art methods with similar model sizes. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and higher shape bias rather than texture bias. Code & models at https://github.com/megvii-research/RepLKNet.

preprint2022arXiv

Simple Baselines for Image Restoration

Although there have been significant advances in the field of image restoration recently, the system complexity of the state-of-the-art (SOTA) methods is increasing as well, which may hinder the convenient analysis and comparison of methods. In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. To further simplify the baseline, we reveal that the nonlinear activation functions, e.g. Sigmoid, ReLU, GELU, Softmax, etc. are not necessary: they could be replaced by multiplication or removed. Thus, we derive a Nonlinear Activation Free Network, namely NAFNet, from the baseline. SOTA results are achieved on various challenging benchmarks, e.g. 33.69 dB PSNR on GoPro (for image deblurring), exceeding the previous SOTA 0.38 dB with only 8.4% of its computational costs; 40.30 dB PSNR on SIDD (for image denoising), exceeding the previous SOTA 0.28 dB with less than half of its computational costs. The code and the pre-trained models are released at https://github.com/megvii-research/NAFNet.
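The idea of replacing a nonlinear activation with multiplication can be illustrated with a small gating sketch: the channel vector is split into two halves, and the halves are multiplied elementwise. Reading the abstract's "multiplication" this way is an assumption, and the function name and toy values are hypothetical rather than taken from the released NAFNet code.

```python
def simple_gate(features):
    """Split a channel vector into two halves and multiply them elementwise.

    The output has half as many channels as the input.
    """
    if len(features) % 2:
        raise ValueError("channel count must be even")
    half = len(features) // 2
    return [a * b for a, b in zip(features[:half], features[half:])]


# Hypothetical four-channel feature at one pixel.
gated = simple_gate([1.0, -2.0, 3.0, 0.5])
```

The product of two linear features is already nonlinear in the input, which is why a gate like this can stand in for functions such as GELU.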

preprint2022arXiv

SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation

Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture the structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the L2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation model is pre-trained by language modeling. Results show that SPACE-3 achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE-3 has a stronger few-shot ability than existing models under the low-resource setting.

preprint2022arXiv

StreamYOLO: Real-time Object Detection for Streaming Perception

The perceptive models of autonomous driving require fast inference within a low latency for safety. While existing works ignore the inevitable environmental changes after processing, streaming perception jointly evaluates the latency and accuracy into a single metric for video online perception, guiding the previous works to search trade-offs between accuracy and speed. In this paper, we explore the performance of real-time models on this metric and endow the models with the capacity of predicting the future, significantly improving the results for streaming perception. Specifically, we build a simple framework with two effective modules. One is a Dual Flow Perception module (DFP). It consists of dynamic flow and static flow in parallel to capture moving tendency and basic detection feature, respectively. Trend Aware Loss (TAL) is the other module, which adaptively generates a loss weight for each object according to its moving speed. Realistically, we consider driving scenes with multiple velocities and further propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy. In this realistic setting, we design an efficient mix-velocity training strategy to guide the detector to perceive any velocity. Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively compared to the strong baseline, validating its effectiveness.

preprint2022arXiv

ThunderNet: Towards Real-time Generic Object Detection

Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Our code and models are available at https://github.com/qinzheng93/ThunderNet.

preprint2022arXiv

Towards Self-Supervised Category-Level Object Pose and Size Estimation

In this work, we tackle the challenging problem of category-level object pose and size estimation from a single depth image. Although previous fully-supervised works have demonstrated promising performance, collecting ground-truth pose labels is generally time-consuming and labor-intensive. Instead, we propose a label-free method that learns to enforce the geometric consistency between category template mesh and observed object point cloud under a self-supervision manner. Specifically, our method consists of three key components: differentiable shape deformation, registration, and rendering. In particular, shape deformation and registration are applied to the template mesh to eliminate the differences in shape, pose and scale. A differentiable renderer is then deployed to enforce geometric consistency between point clouds lifted from the rendered depth and the observed scene for self-supervision. We evaluate our approach on real-world datasets and find that our approach outperforms the simple traditional baseline by large margins while being competitive with some fully-supervised approaches.

preprint2022arXiv

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

Sparsely annotated semantic segmentation (SASS) aims to train a segmentation network with coarse-grained (i.e., point-, scribble-, and block-wise) supervisions, where only a small proportion of pixels are labeled in each image. In this paper, we propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels. The tree energy loss represents images as minimum spanning trees to model both low-level and high-level pair-wise affinities. By sequentially applying these affinities to the network prediction, soft pseudo labels for unlabeled pixels are generated in a coarse-to-fine manner, achieving dynamic online self-training. The tree energy loss is effective and easy to be incorporated into existing frameworks by combining it with a traditional segmentation loss. Compared with previous SASS methods, our method requires no multistage training strategies, alternating optimization procedures, additional supervised data, or time-consuming post-processing while outperforming them in all SASS settings. Code is available at https://github.com/megvii-research/TreeEnergyLoss.

preprint2022arXiv

Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns

Rapid advances in sensor, wireless communication, cloud computing and data science have brought an unprecedented amount of data to assist transportation engineers and researchers in making better decisions. However, traffic data in reality often has corrupted or incomplete values due to detector and communication malfunctions. Data imputation is thus required to ensure the effectiveness of downstream data-driven applications. To this end, numerous tensor-based methods treating the imputation problem as low-rank tensor completion (LRTC) have been attempted in previous works. To tackle rank minimization, which is at the core of LRTC, most of the aforementioned methods utilize the tensor nuclear norm (NN) as a convex surrogate for the minimization. However, the over-relaxation issue in NN prevents it from achieving desirable performance in practice. In this paper, we define an innovative nonconvex truncated Schatten p-norm for tensors (TSpN) to approximate tensor rank and impute missing spatiotemporal traffic data under the LRTC framework. We model traffic data into a third-order tensor structure of (time intervals, locations (sensors), days) and introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the tensor mode-n fibers. Despite nonconvexity of the objective function in our model, we derive the global optimal solutions by integrating the alternating direction method of multipliers (ADMM) with generalized soft-thresholding (GST). In addition, we design a truncation rate decay strategy to deal with varying missing rate scenarios. Comprehensive experiments are finally conducted using real-world spatiotemporal datasets, which demonstrate that the proposed LRTC-TSpN method performs well under various missing cases, while outperforming other SOTA tensor-based imputation models in almost all scenarios.

preprint2022arXiv

Voxel Field Fusion for 3D Object Detection

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, the learnable sampler is first designed to sample vital features from the image plane that are projected to the voxel grid in a point-to-ray manner, which maintains the consistency in feature representation with spatial context. In addition, ray-wise fusion is conducted to fuse features with the supplemental context in the constructed voxel field. We further develop mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains in various benchmarks and outperforms previous fusion-based methods on KITTI and nuScenes datasets. Code is made available at https://github.com/dvlab-research/VFF.

preprint2022arXiv

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS method called TNAS (NAS with trees), which improves search efficiency by exploring only a small number of architectures while also achieving a higher search accuracy. TNAS introduces an architecture tree and a binary operation tree, to factorize the search space and substantially reduce the exploration size. TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture. Impressively, TNAS finds the global optimal architecture on CIFAR-10 with test accuracy of 94.37% in four GPU hours in NAS-Bench-201. The average test accuracy is 94.35%, which outperforms the state-of-the-art. Code is available at: https://github.com/guochengqian/TNAS.

preprint2021arXiv

Distributed proximal gradient algorithm for non-smooth non-convex optimization over time-varying networks

This note studies the distributed non-convex optimization problem with non-smooth regularization, which has wide applications in decentralized learning, estimation and control. The objective function is the sum of different local objective functions, which consist of differentiable (possibly non-convex) cost functions and non-smooth convex functions. This paper presents a distributed proximal gradient algorithm for the non-smooth non-convex optimization problem over time-varying multi-agent networks. Each agent updates local variable estimate by the multi-step consensus operator and the proximal operator. We prove that the generated local variables achieve consensus and converge to the set of critical points with convergence rate $O(1/T)$. Finally, we verify the efficacy of proposed algorithm by numerical simulations.
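The update described in this abstract, a consensus step followed by a proximal step at each agent, can be sketched for scalar variables. Several choices below are assumptions made only for illustration: a single mixing step stands in for the multi-step consensus operator, the non-smooth term is an L1 penalty (whose proximal operator is soft-thresholding), the smooth costs are simple quadratics, and the mixing matrix is fixed rather than time-varying.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x| for a scalar x."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def distributed_prox_round(estimates, mixing, gradients, step, l1_weight):
    """One round: consensus mixing, local gradient step, then proximal step."""
    updated = []
    for i, row in enumerate(mixing):
        # Consensus: weighted average of the neighbours' current estimates.
        mixed = sum(w * x for w, x in zip(row, estimates))
        # Local gradient step on the smooth part of agent i's objective.
        stepped = mixed - step * gradients[i](mixed)
        # Proximal step on the non-smooth part (here l1_weight * |x|).
        updated.append(soft_threshold(stepped, step * l1_weight))
    return updated


# Hypothetical setup: three agents with smooth costs 0.5 * (x - a_i)^2.
targets = [1.0, 2.0, 3.0]
gradients = [lambda x, a=a: x - a for a in targets]
mixing = [[1.0 / 3.0] * 3 for _ in range(3)]  # fully connected, doubly stochastic

first_round = distributed_prox_round([0.0, 0.0, 0.0], mixing, gradients, 0.1, 0.5)

estimates = [0.0, 0.0, 0.0]
for _ in range(300):
    estimates = distributed_prox_round(estimates, mixing, gradients, 0.1, 0.5)
```

In this toy setting the average of the local estimates approaches the minimizer of the summed objective, while the individual estimates stay slightly apart because the step size is constant.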

preprint2021arXiv

Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing

Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging contextual information of both natural language utterance and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interaction as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, we demonstrate that the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains a 55.8% question-match and 30.8% interaction-match accuracy on SParC, and a 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.

preprint2021arXiv

Dynamic ordering transitions in charged solid

The phenomenon of group motion is common in nature, ranging from the schools of fish, birds and insects, to avalanches, landslides and sand drift. If we treat objects as collectively moving particles, such phenomena can be studied from a physical point of view, and the research on many-body systems has proved that marvelous effects can arise from the simplest individuals. The motion of numerous individuals presents different dynamic phases related to the ordering of the system. However, it is usually difficult to study the dynamic ordering and their transitions through experiments. Electron bubble states formed in a two-dimensional electron gas, as a type of electron solids, can be driven by an external electric field and provide a platform to study the dynamic collective behaviors. Here, we demonstrate that noise spectrum is a powerful method to investigate the dynamics of bubble states. We observed not only the phenomena from dynamically ordered and disordered structures, but also unexpected alternations between them. Our results show that a dissipative system can convert between chaotic structures and ordered structures when tuning global parameters, which is concealed in conventional transport measurements of resistance or conductance. Moreover, charging the objects to study electrical noise spectrum in collective motions can be an additional approach to revealing dynamic ordering transitions.

preprint2021arXiv

End-to-End Human Object Interaction Detection with HOI Transformer

We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, our method, named HOI Transformer, streamlines the HOI pipeline by eliminating the need for many hand-designed components. HOI Transformer reasons about the relations of objects and humans from global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to force HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves $26.61\%$ $AP$ on HICO-DET and $52.9\%$ $AP_{role}$ on V-COCO, surpassing previous methods with the advantage of being much simpler. We hope our approach will serve as a simple and effective alternative for HOI tasks. Code is available at https://github.com/bbepoch/HoiTransformer.

preprint2021arXiv

Enhancing Crystal Structure Prediction by decomposition methods based on graph theory

Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with system size. In this work, we propose two crossover-mutation schemes based on graph theory to accelerate the evolutionary structure search. These schemes can detect molecules or clusters inside periodic networks using quotient graphs for crystals, and the decomposition can dramatically reduce the search space. Test examples, including the high-pressure phases of methane, ammonia, MgAl2O4, and boron, show that these new evolution schemes can clearly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.

preprint2021arXiv

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

In this work, we present FFB6D, a Full Flow Bidirectional fusion network designed for 6D pose estimation from a single RGBD image. Our key insight is that appearance information in the RGB image and geometry information from the depth image are two complementary data sources, and it still remains unknown how to fully leverage them. Towards this end, we propose FFB6D, which learns to combine appearance and geometry information for representation learning as well as output representation selection. Specifically, at the representation learning stage, we build bidirectional fusion modules in the full flow of the two networks, where fusion is applied to each encoding and decoding layer. In this way, the two networks can leverage local and global complementary information from the other one to obtain better representations. Moreover, at the output representation stage, we designed a simple but effective 3D keypoints selection algorithm considering the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation. Experimental results show that our method outperforms the state-of-the-art by large margins on several benchmarks. Code and video are available at https://github.com/ethnhe/FFB6D.git.

preprint2021arXiv

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

In this paper, we present a novel approach, Momentum$^2$ Teacher, for student-teacher based self-supervised learning. The approach performs momentum updates on both network weights and batch normalization (BN) statistics. The teacher's weights are a momentum update of the student's, and the teacher's BN statistics are a momentum update of those in history. The Momentum$^2$ Teacher is simple and efficient. It can achieve state-of-the-art results (74.5%) under the ImageNet linear evaluation protocol using a small batch size (e.g., 128), without requiring large-batch training on special hardware like TPUs or inefficient cross-GPU operations (e.g., shuffling BN, synced BN). Our implementation and pre-trained models will be made available on GitHub (https://github.com/zengarden/momentum2-teacher).
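The two momentum updates named in this abstract can be pictured with a minimal sketch, assuming a plain exponential moving average for both. The dictionary layout, the momentum values, and the function names are hypothetical and are not taken from the authors' implementation.

```python
def momentum_update(old, new, m):
    """Exponential moving average: m * old + (1 - m) * new, elementwise."""
    return [m * o + (1.0 - m) * n for o, n in zip(old, new)]


def update_teacher(teacher, student, batch_stats, m_weight=0.99, m_stats=0.9):
    """Return a new teacher state (illustrative layout, not the paper's code).

    Weights follow the student's weights; BN statistics follow the statistics
    of the current batch, so both carry a history of earlier values.
    """
    return {
        "weights": momentum_update(teacher["weights"], student["weights"], m_weight),
        "bn_mean": momentum_update(teacher["bn_mean"], batch_stats["mean"], m_stats),
        "bn_var": momentum_update(teacher["bn_var"], batch_stats["var"], m_stats),
    }
```

Because the BN statistics are smoothed over past batches in this sketch, a small batch contributes only a fraction `1 - m_stats` of each new estimate, which is one way to read the abstract's claim about small-batch training.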

preprint2021arXiv

Negative linear compressibility and unusual dynamic behaviors of NaB3

First-principles calculations reveal that sodium boride (NaB3) undergoes a phase transition from a tetragonal P4/mbm phase to an orthorhombic Pbam phase at about 16 GPa, accompanied by counterintuitive lattice expansion along the crystallographic a-axis. This unusual compression behavior is identified as negative linear compressibility (NLC), which is mainly attributed to symmetry breaking of the boron framework. Meanwhile, the P4/mbm and Pbam phases form superionic conductors after undergoing a peculiar swap state at high temperature. Specifically, under warm conditions the Na cation pairs exhibit a rare local exchange (or rotation) behavior, which may originate from the asymmetric energy barriers of different diffusion paths. The study of the NaB3 compound sheds new light on a material that combines NLC and ion transport at extreme conditions.

preprint2021arXiv

Numerical analysis of a deep learning formulation of elastic full waveform inversion with high order total variation regularization in different parameterization

We have formulated elastic seismic full waveform inversion (FWI) within a deep learning environment. In our formulation, a recurrent neural network is set up with rules enforcing elastic wave propagation, with the wavefield projected onto a measurement surface acting as the synthetic data to be compared with observed seismic data. Gradients for iterative updating of an elastic model, with a variety of parameterizations and misfit functionals, can be efficiently constructed within the network through automatic differentiation. With this method, the inversion based on complex misfits can be calculated. We survey the impact of different complex misfits (based on the l2 and l1 norms) with high order total variation (TV) regularization on multiparameter elastic FWI recovery of models within velocity/density, modulus/density, and stiffness parameter/density parameterizations. We analyze parameter cross-talk. Inversion results on simple and complex models show that the RNN elastic FWI with high order TV regularization using the l1 norm can help mitigate cross-talk issues with gradient-based optimization methods.

preprint2021arXiv

Partially Diffusive Helium-Silica Compound in the Deep Interiors of Giant Planets

Helium is the second most abundant element in the universe, and together with silica, it is a major component of giant planets. Exploring the reactivity and state of helium and silica under high pressure is of fundamental importance for developing an understanding of the evolution and internal structure of giant planets. Here, using first-principles calculations and crystal structure predictions, we identify four stable phases of a helium-silica compound with seven/eight-coordinated silicon atoms in the pressure range of 600-4000 GPa, corresponding to the interior conditions of the outer planets in the solar system. The density of HeSiO2 agrees with current structure models of the planets. This helium-silica compound exhibits a superionic-like helium diffusive state at the high pressure and high temperature conditions along the isentropes of Saturn, a metallic fluid state in Jupiter, and a solid state in the deep interiors of Uranus and Neptune. The reaction of helium and silica may lead to the erosion of the rocky core of giant planets and form a diluted core region. These results highlight the reactivity of helium under high pressure to form new compounds, and also provide evidence to help build more sophisticated interior models of giant planets.

preprint2021arXiv

Resilient Control under Quantization and Denial-of-Service: Co-designing a Deadbeat Controller and Transmission Protocol

This paper is concerned with the problem of stabilizing continuous-time linear time-invariant systems subject to quantization and Denial-of-Service (DoS) attacks. In this context, two DoS-induced challenges emerge with the design of resilient encoding schemes, namely, the coupling between encoding strategies of different signals, and the synchronization between the encoder and decoder. To address these challenges, a novel structure that is equipped with a deadbeat controller as well as a delicate transmission protocol for the input and output channels, co-designed leveraging the controllability index, is put forward. When both input and output channels are subject to DoS attacks and quantization, the proposed structure is shown to be able to decouple the encoding schemes for input, output, and estimated output signals. This property is further corroborated by designing encoding schemes as well as conditions that ensure exponential stability of the closed-loop system. On the other hand, when only the output channel is subject to these network phenomena, the proposed structure can achieve exponential stabilization without acknowledgment (ACK) signals, in contrast to existing ACK-based results. Finally, a numerical example is given to demonstrate the practical merits of the proposed approach as well as the theory.

preprint2021arXiv

Superionic silica-water and silica-hydrogen compounds under high pressure

Silica, water and hydrogen are known to be the major components of celestial bodies, and have significant influence on the formation and evolution of giant planets, such as Uranus and Neptune. Thus, it is of fundamental importance to investigate their states and possible reactions under the planetary conditions. Here, using advanced crystal structure searches and first-principles calculations in the Si-O-H system, we find that a silica-water compound (SiO2)2(H2O) and a silica-hydrogen compound SiO2H2 can exist under high pressures above 450 and 650 GPa, respectively. Further simulations reveal that, at high pressure and high temperature conditions corresponding to the interiors of Uranus and Neptune, these compounds exhibit superionic behavior, in which protons diffuse freely like liquid while the silicon and oxygen framework is fixed as solid. Therefore, these superionic silica-water and silica-hydrogen compounds could be regarded as important components of the deep mantle or core of giants, which also provides an alternative origin for their anomalous magnetic fields. These unexpected physical and chemical properties of the most common natural materials at high pressure offer key clues to understand some abstruse issues including demixing and erosion of the core in giant planets, and shed light on building reliable models for solar giants and exoplanets.

preprint2021arXiv

Using Long Short-Term Memory (LSTM) and Internet of Things (IoT) for localized surface temperature forecasting in an urban environment

Rising temperature is one of the key indicators of a warming climate, and it can cause extensive stress to biological systems as well as built structures. Because of the heat island effect, the stress is most severe in urban environments, where a dense human-built environment is associated with reduced vegetation. It is essential to adequately monitor local temperature dynamics to mitigate risks associated with increasing temperatures; mitigation can range from short-term strategies to protect people and animals to long-term strategies for building new structures and coping with extreme events. Observed temperature is also a very important input for atmospheric models, and accurate data can lead to better future forecasts. Ambient temperature collected at ground level can have a higher variability when compared to regional weather forecasts, which fail to capture the local dynamics. There remains a clear need for accurate air temperature prediction at the sub-urban scale at high temporal and spatial resolution. This research proposes a framework based on a Long Short-Term Memory (LSTM) deep learning network to generate day-ahead hourly temperature forecasts with high spatial resolution. A case study is shown which uses historical in-situ observations and Internet of Things (IoT) observations for New York City, USA. By leveraging the historical air temperature data from in-situ observations, the LSTM model can be exposed to more historical patterns that might not be present in the IoT observations. Meanwhile, by using IoT observations, the spatial resolution of air temperature predictions is significantly improved.

preprint2021arXiv

Van Hove Singularity Arising from Mexican-Hat-Shaped Inverted Bands in the Topological Insulator Sn-doped Bi$_{1.1}$Sb$_{0.9}$Te$_{2}$S

The optical properties of Sn-doped Bi$_{1.1}$Sb$_{0.9}$Te$_{2}$S, the most bulk-insulating topological insulator thus far, have been examined at different temperatures over a broad frequency range. No Drude response is detected in the low-frequency range down to 30 cm$^{-1}$, corroborating the excellent bulk-insulating property of this material. Intriguingly, we observe a sharp peak at about 2200 cm$^{-1}$ in the optical conductivity at 5 K. Further quantitative analyses of the line shape and temperature dependence of this sharp peak, in combination with first-principles calculations, suggest that it corresponds to a van Hove singularity arising from Mexican-hat-shaped inverted bands. Such a van Hove singularity is a pivotal ingredient of various strongly correlated phases.

preprint2020arXiv

A Big Data Enabled Channel Model for 5G Wireless Communication Systems

The standardization process of the fifth generation (5G) wireless communications has recently been accelerated and the first commercial 5G services would be provided as early as 2018. The growing number of smartphones, new complex scenarios, large frequency bands, massive antenna elements, and dense small cells will generate big datasets and bring 5G communications to the era of big data. This paper investigates various applications of big data analytics, especially machine learning algorithms, in wireless communications and channel modeling. We propose a big data and machine learning enabled wireless channel model framework. The proposed channel model is based on artificial neural networks (ANNs), including a feed-forward neural network (FNN) and a radial basis function neural network (RBF-NN). The input parameters are transmitter (Tx) and receiver (Rx) coordinates, Tx-Rx distance, and carrier frequency, while the output parameters are channel statistical properties, including the received power, root mean square (RMS) delay spread (DS), and RMS angle spreads (ASs). Datasets used to train and test the ANNs are collected from both real channel measurements and a geometry based stochastic model (GBSM). Simulation results show good performance and indicate that machine learning algorithms can be powerful analytical tools for future measurement-based wireless channel modeling.

preprint2020arXiv

A Non-Stationary VVLC MIMO Channel Model for Street Corner Scenarios

In recent years, the application potential of visible light communication (VLC) technology as an alternative and supplement to radio frequency (RF) technology has attracted people's attention. The study of the underlying VLC channel is the basis for designing the VLC communication system. In this paper, a new non-stationary geometric street corner model is proposed for vehicular VLC (VVLC) multiple-input multiple-output (MIMO) channel. The proposed model takes into account changes in vehicle speed and direction. The category of scatterers includes fixed scatterers and mobile scatterers (MS). Based on the proposed model, we derive the channel impulse response (CIR) and explore the statistical characteristics of the VVLC channel. The channel gain and root mean square (RMS) delay spread of the VVLC channel are studied. In addition, the influence of velocity change on the statistical characteristics of the model is also investigated. The proposed channel model can guide future vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) optical communication system design.

preprint2020arXiv

A real-time multi-constraints obstacle avoidance method using LiDAR

Obstacle avoidance is one of the essential functions for autonomous mobile robots. Most existing solutions are based on a single constraint and cannot incorporate sensor data in real time, so they often fail to respond to unexpected moving obstacles in dynamic unknown environments. In this paper, a novel real-time multi-constraints obstacle avoidance method using Light Detection and Ranging (LiDAR) is proposed. Based on the latest estimation of the robot pose and environment, the method finds the sub-goal defined by a multi-constraints function within the explored region and plans a corresponding optimal trajectory at each time step iteratively, so that the robot approaches the goal over time. Meanwhile, at each time step, the improved Ant Colony Optimization (ACO) algorithm is also used to re-plan optimal paths from the latest robot pose to the latest defined sub-goal position. While ensuring convergence, planning in this method is done by repeated local optimizations, so that the latest sensor data from LiDAR and derived environment information can be fully utilized at each step until the robot reaches the desired position. This method facilitates real-time performance and requires little memory or computational power, so it has great potential to benefit small low-cost autonomous platforms. The method is evaluated against several existing technologies in both simulation and real-world experiments.

preprint2020arXiv

A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions via well-structured relation information between entities stored in knowledge bases. In order to make KBQA more applicable in actual scenarios, researchers have shifted their attention from simple questions to complex questions, which require more KB triples and constraint inference. In this paper, we introduce the recent advances in complex QA. Besides traditional methods relying on templates and rules, the research is categorized into a taxonomy that contains two main branches, namely Information Retrieval-based and Neural Semantic Parsing-based. After describing the methods of these branches, we analyze directions for future research and introduce the models proposed by the Alime team.

preprint2020arXiv

Angle-based Search Space Shrinking for Neural Architecture Search

In this work, we present a simple and general search space shrinking method, called Angle-Based search space Shrinking (ABS), for Neural Architecture Search (NAS). Our approach progressively simplifies the original search space by dropping unpromising candidates, and thus can reduce difficulties for existing NAS methods to find superior architectures. In particular, we propose an angle-based metric to guide the shrinking process. We provide comprehensive evidence showing that, in a weight-sharing supernet, the proposed metric is more stable and accurate than accuracy-based and magnitude-based metrics to predict the capability of child models. We also show that the angle-based metric can converge fast while training the supernet, enabling us to get promising shrunk search spaces efficiently. ABS can easily apply to most NAS approaches (e.g. SPOS, FairNAS, ProxylessNAS, DARTS and PDARTS). Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.

preprint2020arXiv

Approximation algorithms for general cluster routing problem

Graph routing problems have been investigated extensively in operations research, computer science and engineering due to their ubiquity and vast applications. In this paper, we study constant approximation algorithms for some variations of the general cluster routing problem. In this problem, we are given an edge-weighted complete undirected graph $G=(V,E,c),$ whose vertex set is partitioned into clusters $C_{1},\dots ,C_{k}.$ We are also given a subset $V'$ of $V$ and a subset $E'$ of $E.$ The weight function $c$ satisfies the triangle inequality. The goal is to find a minimum cost walk $T$ that visits each vertex in $V'$ only once, traverses every edge in $E'$ at least once and for every $i\in [k]$ all vertices of $C_i$ are traversed consecutively.

preprint2020arXiv

Attentive Normalization for Conditional Image Generation

Traditional convolution-based generative adversarial networks synthesize images based on hierarchical local operations, where long-range dependency relation is implicitly modeled with a Markov chain. It is still not sufficient for categories with complicated structures. In this paper, we characterize long-range dependence with attentive normalization (AN), which is an extension to traditional instance normalization. Specifically, the input feature map is softly divided into several regions based on its internal semantic similarity, which are respectively normalized. It enhances consistency between distant regions with semantic correspondence. Compared with self-attention GAN, our attentive normalization does not need to measure the correlation of all locations, and thus can be directly applied to large-size feature maps without much computational burden. Extensive experiments on class-conditional image generation and semantic inpainting verify the efficacy of our proposed module.

preprint2020arXiv

Content-Aware Unsupervised Deep Homography Estimation

Homography estimation is a basic image alignment method in many applications. It is usually conducted by extracting and matching sparse feature points, which are error-prone in low-light and low-texture images. On the other hand, previous deep homography approaches use either synthetic images for supervised learning or aerial images for unsupervised learning, both ignoring the importance of handling depth disparities and moving objects in real world applications. To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design. In the spirit of the RANSAC procedure in traditional methods, we specifically learn an outlier mask to only select reliable regions for homography estimation. We calculate loss with respect to our learned deep features instead of directly comparing image content as was done previously. To achieve the unsupervised training, we also formulate a novel triplet loss customized for our network. We verify our method by conducting comprehensive comparisons on a new dataset that covers a wide range of scenes with varying degrees of difficulty for the task. Experimental results reveal that our method outperforms the state-of-the-art including deep solutions and feature-based solutions.

preprint2020arXiv

Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks

In this paper, the problem of dynamic spectrum sensing and aggregation is investigated in a wireless network containing N correlated channels, where these channels are occupied or vacant following an unknown joint 2-state Markov model. At each time slot, a single cognitive user with a certain bandwidth requirement either stays idle or selects a segment comprising C (C < N) contiguous channels to sense. Then, the vacant channels in the selected segment will be aggregated to satisfy the user requirement. The user receives a binary feedback signal indicating whether the transmission is successful or not (i.e., ACK signal) after each transmission, and makes the next decision based on the sensed channel states. Here, we aim to find a policy that can maximize the number of successful transmissions without interrupting the primary users (PUs). The problem can be considered as a partially observable Markov decision process (POMDP) because the system environment is not fully observable. We implement a Deep Q-Network (DQN) to address the challenge of unknown system dynamics and computational expenses. The performance of DQN, Q-Learning, and the Improvident Policy with known system dynamics is evaluated through simulations. The simulation results show that DQN can achieve near-optimal performance across different system scenarios based only on partial observations and ACK signals.

preprint2020arXiv

Designing transformation-induced plasticity and twinning-induced plasticity Cr-Co-Ni medium entropy alloys: theory and experiment

In order to efficiently explore the nearly infinite composition space in multicomponent solid solution alloys, it is important to establish predictive design strategies and use computation-aided methods. In the present work, we demonstrate design routes, informed by density functional theory calculations, for realizing transformation-induced plasticity (TRIP) and twinning-induced plasticity (TWIP) in Cr-Co-Ni medium entropy alloys (MEAs). We systematically studied the effects of magnetism and chemical composition on the generalized stacking fault energy surface (gamma-surface) and showed that both chemistry and the coupled magnetic state strongly affect the gamma-surface and, consequently, the primary deformation modes. Based on the calculated effective energy barriers for the competing deformation modes, we constructed composition and magnetism dependent deformation maps at both room and cryogenic temperatures. Accordingly, we proposed various design routes for achieving desired primary deformation modes in the ternary Cr-Co-Ni alloys. The deformation mechanisms predicted by our theoretical models are in good agreement with available experimental observations in the literature. Furthermore, we fabricated two non-equiatomic Cr-Co-Ni MEAs possessing the designed TWIP and TRIP effects, showing excellent combinations of tensile strength and ductility.

preprint2020arXiv

Detection in Crowded Scenes: One Proposal, Multiple Predictions

We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes. The key of our approach is to let each proposal predict a set of correlated instances rather than a single one as in previous proposal-based frameworks. Equipped with new techniques such as EMD Loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects. On an FPN-Res50 baseline, our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and 1.0% $\text{MR}^{-2}$ improvements on the CityPersons dataset, without bells and whistles. Moreover, on less crowded datasets like COCO, our approach can still achieve moderate improvement, suggesting the proposed method is robust to crowdedness. Code and pre-trained models will be released at https://github.com/megvii-model/CrowdDetection.

preprint2020arXiv

Dimensionalities and multiplicities determination of crystal nets

Low-dimensional materials have attracted significant attention over the past decade. To discover new low-dimensional materials, high-throughput screening methods have been applied to different materials databases. For this purpose, the reliability of dimensionality identification is highly important. In this work, we find that the existence of self-penetrating nets may lead to incorrect results with previous methods. Instead, we use the quotient graph to analyze the topologies of structures and compute their dimensionalities. Based on the quotient graph, we can calculate not only the dimensionality but also the multiplicity of self-penetrating structures. As a demonstration, we screened the Crystallography Open Database using our method and found hundreds of structures with different dimensionalities and high multiplicities up to eleven.

preprint2020arXiv

Discrete Conformal Geometry of Polyhedral Surfaces and Its Convergence

The paper proves a result on the convergence of discrete conformal maps to the Riemann mappings for Jordan domains. It is a counterpart of Rodin-Sullivan's theorem on convergence of circle packing mappings to the Riemann mapping in the new setting of discrete conformality. The proof follows the same strategy that Rodin-Sullivan used by establishing a rigidity result for regular hexagonal triangulations of the plane and estimating the quasiconformal constants associated to the discrete conformal maps.

preprint2020arXiv

Dynamic Memory Induction Networks for Few-Shot Text Classification

This paper proposes Dynamic Memory Induction Networks (DMIN) for few-shot text classification. The model utilizes dynamic routing to provide more flexibility to memory-based few-shot learning in order to better adapt the support sets, which is a critical capacity of few-shot classification models. Based on that, we further develop induction models with query information, aiming to enhance the generalization ability of meta-learning. The proposed model achieves new state-of-the-art results on the miniRCV1 and ODIC datasets, improving the best performance (accuracy) by 2-4%. Detailed analysis is further performed to show the effectiveness of each component.

preprint2020arXiv

End-to-end Interpretable Learning of Non-blind Image Deblurring

Non-blind image deblurring is typically formulated as a linear least-squares problem regularized by natural priors on the corresponding sharp picture's gradients, which can be solved, for example, using a half-quadratic splitting method with Richardson fixed-point iterations for its least-squares updates and a proximal operator for the auxiliary variable updates. We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels. Using convolutions instead of a generic linear preconditioner allows extremely efficient parameter sharing across the image, and leads to significant gains in accuracy and/or speed compared to classical FFT and conjugate-gradient methods. More importantly, the proposed architecture is easily adapted to learning both the preconditioner and the proximal operator using CNN embeddings. This yields a simple and efficient algorithm for non-blind image deblurring which is fully interpretable, can be learned end to end, and whose accuracy matches or exceeds the state of the art, quite significantly, in the non-uniform case.

preprint2020arXiv

Evidence for magnon-phonon coupling in the topological magnet Cu$_3$TeO$_6$

We perform thermodynamic and inelastic neutron scattering (INS) measurements to study the lattice dynamics (phonons) of a cubic collinear antiferromagnet Cu$_3$TeO$_6$ which hosts topological spin excitations (magnons). While the specific heat and thermal conductivity results show that the thermal transport is dominated by phonons, the deviation of the thermal conductivity from a pure phononic model indicates that there is a strong coupling between magnons and phonons. In the INS measurements, we find a mode in the excitation spectra at 4.5 K, which exhibits a slight downward dispersion around the Brillouin zone center. This mode disappears above the Néel temperature, and thus cannot be a phonon. Furthermore, the dispersion is distinct from that of a magnon. Instead, it can be explained by the magnon-polaron mode, a new collective excitation resulting from the hybridization between magnons and phonons. We consider the suppression of the thermal conductivity and emergence of the magnon-polaron mode to be evidence for magnon-phonon coupling in Cu$_3$TeO$_6$.

preprint2020arXiv

Ferromagnetic MnSn monolayer epitaxially grown on silicon substrate

Two-dimensional (2D) ferromagnetic materials have shown promising potential in applications such as spintronics devices. Growing epitaxial magnetic films on a silicon substrate in the single-layer limit is practically important but challenging. In this study, we realized the epitaxial growth of a MnSn monolayer on a Si(111) substrate, with an atomically thin Sn/Si(111)-$2\sqrt{3}\times2\sqrt{3}$ buffer layer, and controlled the MnSn thickness with atomic-layer precision. We discovered ferromagnetism in the MnSn monolayer with a Curie temperature (Tc) of ~54 K. As the MnSn film is grown to 4 monolayers, Tc increases accordingly to ~235 K. The lattice of the epitaxial MnSn monolayer as well as the Sn/Si(111)-$2\sqrt{3}\times2\sqrt{3}$ is perfectly compatible with silicon, and thus a sharp interface is formed between MnSn, Sn and Si. This system provides a new platform for exploring 2D ferromagnetism, integrating magnetic monolayers into silicon-based technology, and engineering spintronics heterostructures.

preprint2020arXiv

Finite-thickness effect and spin polarization of the even-denominator fractional quantum Hall states

The spin-polarized even-denominator fractional quantum Hall (FQH) states in the second Landau level (LL), i.e. 5/2 and 7/2, may possess novel quasi-particle excitations obeying non-Abelian statistics. However, the spin polarization of the 7/2 FQH state has not been investigated experimentally and the spin polarization of the 5/2 FQH state from tilted field experiments remains controversial. Using a piezo-driven sample rotator with the lowest electron temperature down to 25 mK, we studied the energy gap of the even-denominator FQH states in the second LL by precise control of the tilt angles with a resolution of less than 0.1°. We observed two different energy gap dependences on the in-plane magnetic field for 5/2, 7/2, other FQH states (7/3 and 8/3) in the second LL and reentrant integer quantum Hall (RIQH) states in the third LL. Though the transition fields vary from state to state, their corresponding in-plane magnetic lengths are comparable to the quantum well thickness of the sample, which may result from the influence of the finite-thickness effect. At low in-plane magnetic fields, before the conjectured finite-thickness effect starts to dominate, the energy gaps of both 5/2 and 7/2 states show a non-decreasing behavior, supporting spin-polarized ground states. Our results also suggest that the 7/3, 8/3 FQH states, and the RIQH states in the third LL are spin-polarized or partially spin-polarized.

preprint2020arXiv

Funnel Activation for Visual Recognition

We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition. The forms of ReLU and PReLU are y = max(x, 0) and y = max(x, px), respectively, while FReLU is in the form of y = max(x,T(x)), where T(x) is the 2D spatial condition. Moreover, the spatial condition achieves a pixel-wise modeling capacity in a simple way, capturing complicated visual layouts with regular convolutions. We conduct experiments on ImageNet, COCO detection, and semantic segmentation tasks, showing great improvements and robustness of FReLU in the visual recognition tasks. Code is available at https://github.com/megvii-model/FunnelAct.
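The three activation forms quoted in this abstract can be compared side by side. The sketch below is illustrative only: the abstract does not define T(x) beyond calling it a 2D spatial condition, so `spatial_condition` here is a hypothetical stand-in (a fixed 3x3 window sum divided by 9, with zero padding) rather than the learned operator in the released code.

```python
def relu(x):
    """ReLU: y = max(x, 0)."""
    return max(x, 0.0)


def prelu(x, p):
    """PReLU: y = max(x, p * x), with p a slope parameter."""
    return max(x, p * x)


def spatial_condition(image, row, col):
    """Hypothetical T(x): 3x3 window sum / 9 with zero padding (an assumption)."""
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c]
    return total / 9.0


def frelu(image):
    """FReLU: y = max(x, T(x)), applied at every pixel of a 2D map."""
    return [
        [max(image[r][c], spatial_condition(image, r, c))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


image = [[-1.0, 2.0], [3.0, 5.0]]
activated = frelu(image)
```

On this 2x2 input every window covers all four pixels, so T(x) is 1.0 everywhere and the negative pixel is lifted to 1.0, whereas ReLU would map it to 0. This shows how the output at a pixel depends on its neighbourhood rather than on the pixel alone.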

preprint2020arXiv

Gauss-Newton Unrolled Neural Networks and Data-driven Priors for Regularized PSSE with Robustness

Distributed renewable generation, elastic loads, and purposeful manipulation of meter readings challenge the monitoring and control of today's power systems (PS). In this context, to maintain a comprehensive view of the system in real time, fast and robust power system state estimation (PSSE) methods are urgently needed. Conventional PSSE solvers typically entail minimizing a nonlinear and nonconvex least-squares cost by, e.g., the workhorse Gauss-Newton method. Those iterative solvers, however, are sensitive to initialization and may get stuck in local minima. To overcome these hurdles and inspired by recent image denoising techniques, this paper advocates a learnable regularization term for PSSE that uses a deep neural network (DNN) prior. For the resultant regularized PSSE problem, a "Gauss-Newton-like" alternating minimization solver is first developed. To accommodate real-time monitoring, a novel end-to-end DNN is constructed by unrolling the proposed alternating minimization solver. Interestingly, the power network topology can be easily incorporated into the DNN by designing a graph neural network (GNN) based prior. To further endow the physics-based DNN with robustness against bad data, an adversarial DNN training method is discussed. Numerical tests using real load data on the IEEE 118-bus benchmark system showcase the improved estimation and robustness performance of the proposed scheme compared with several state-of-the-art alternatives.

preprint2020arXiv

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

Occluded person re-identification (ReID) aims to match occluded person images to holistic ones across disjoint cameras. In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. First, we use a CNN backbone and a key-points estimation model to extract semantic local features. Even so, occluded images still suffer from occlusion and outliers. Then, we view the local features of an image as nodes of a graph and propose an adaptive direction graph convolutional (ADGC) layer to pass relation information between nodes. The proposed ADGC layer can automatically suppress the message-passing of meaningless features by dynamically learning the direction and degree of linkage. When aligning two groups of local features from two images, we view it as a graph matching problem and propose a cross-graph embedded-alignment (CGEA) layer to jointly learn and embed topology information to local features, and directly predict the similarity score. The proposed CGEA layer not only makes full use of the alignment learned by graph matching but also replaces sensitive one-to-one matching with a robust soft one. Finally, extensive experiments on occluded, partial, and holistic ReID tasks show the effectiveness of our proposed method. Specifically, our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.

preprint2020arXiv

LabelEnc: A New Intermediate Supervision Method for Object Detection

In this paper we propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems. The key idea is to introduce a novel label encoding function, mapping the ground-truth labels into latent embedding, acting as an auxiliary intermediate supervision to the detection backbone during training. Our approach mainly involves a two-step training procedure. First, we optimize the label encoding function via an AutoEncoder defined in the label space, approximating the "desired" intermediate representations for the target object detector. Second, taking advantage of the learned label encoding function, we introduce a new auxiliary loss attached to the detection backbones, thus benefiting the performance of the derived detector. Experiments show our method improves a variety of detection systems by around 2% on COCO dataset, no matter one-stage or two-stage frameworks. Moreover, the auxiliary structures only exist during training, i.e. it is completely cost-free in inference time. Code is available at: https://github.com/megvii-model/LabelEnc

preprint2020arXiv

Learning Delicate Local Representations for Multi-Person Pose Estimation

In this paper, we propose a novel method called Residual Steps Network (RSN). RSN aggregates features with the same spatial size (Intra-level features) efficiently to obtain delicate local representations, which retain rich low-level spatial information and result in precise keypoint localization. Additionally, we observe the output features contribute differently to final performance. To tackle this problem, we propose an efficient attention mechanism - Pose Refine Machine (PRM) to make a trade-off between local and global representations in output features and further refine the keypoint locations. Our approach won the 1st place of COCO Keypoint Challenge 2019 and achieves state-of-the-art results on both COCO and MPII benchmarks, without using extra training data and pretrained model. Our single model achieves 78.6 on COCO test-dev, 93.0 on MPII test dataset. Ensembled models achieve 79.2 on COCO test-dev, 77.1 on COCO test-challenge dataset. The source code is publicly available for further research at https://github.com/caiyuanhao1998/RSN/

preprint2020arXiv

Learning Dynamic Routing for Semantic Segmentation

Recently, numerous handcrafted and searched networks have been applied for semantic segmentation. However, previous works intend to handle inputs with various scales in pre-defined static architectures, such as FCN, U-Net, and DeepLab series. This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image. To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly. In addition, the computational cost can be further reduced in an end-to-end manner by giving budget constraints to the gating function. We further relax the network level routing space to support multi-path propagations and skip-connections in each forward, bringing substantial network capacity. To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space. Extensive experiments are conducted on Cityscapes and PASCAL VOC 2012 to illustrate the effectiveness of the dynamic framework. Code is available at https://github.com/yanwei-li/DynamicRouting.

preprint2020arXiv

Learning Human-Object Interaction Detection using Interaction Points

Understanding interactions between humans and objects is one of the fundamental problems in visual classification and an essential step towards detailed scene understanding. Human-object interaction (HOI) detection strives to localize both the human and an object as well as the identification of complex interactions between them. Most existing HOI detection approaches are instance-centric where interactions between all possible human-object pairs are predicted based on appearance features and coarse spatial information. We argue that appearance features alone are insufficient to capture complex human-object interactions. In this paper, we therefore propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. Our network predicts interaction points, which directly localize and classify the interaction. Paired with the densely predicted interaction vectors, the interactions are associated with human and object detections to obtain final predictions. To the best of our knowledge, we are the first to propose an approach where HOI detection is posed as a keypoint detection and grouping problem. Experiments are performed on two popular benchmarks: V-COCO and HICO-DET. Our approach sets a new state-of-the-art on both datasets. Code is available at https://github.com/vaesl/IP-Net.

preprint2020arXiv

Market Power in Convex Hull Pricing

The start up costs in many kinds of generators lead to complex cost structures, which in turn yield severe market loopholes in the locational marginal price (LMP) scheme. Convex hull pricing (a.k.a. extended LMP) is proposed to improve the market efficiency by providing the minimal uplift payment to the generators. In this letter, we consider a stylized model where all generators share the same generation capacity. We analyze the generators' possible strategic behaviors in such a setting, and then propose an index for market power quantification in the convex hull pricing schemes.

preprint2020arXiv

Method for Extracting Patterns of Coordinated Network Attacks on Electric Power CPS based on Temporal-Topological Correlation

In the analysis of coordinated network attacks on electric power cyber-physical systems (CPS), it is difficult to restore the complete attack path, and the intent of the attack cannot be identified automatically. A method is therefore proposed for extracting patterns of coordinated network attacks on electric power CPS based on temporal-topological correlation. First, the attack events are aggregated according to the alarm log of the cyber space, and a temporal-causal Bayesian network-based cyber attack recognition algorithm is proposed to parse out the cyber attack sequences of the same attacker. Then, according to the characteristic curves of different attack measurement data in physical space, a combination of physical attack event criteria algorithm is designed to distinguish the types of physical attack events. Finally, physical attack events and cyber attack sequences are matched via temporal-topological correlation, frequent patterns of attack sequences are extracted, and hidden multi-step attack patterns are found from scattered grid measurement data and information from alarm logs. The effectiveness and efficiency of the proposed method are verified by the testbed at Mississippi State University.

preprint2020arXiv

Multi-Frequency Multi-Scenario Millimeter Wave MIMO Channel Measurements and Modeling for B5G Wireless Communication Systems

Millimeter wave (mmWave) bands have been utilized for the fifth generation (5G) communication systems and will no doubt continue to be deployed for beyond 5G (B5G). However, the underlying channels are not fully investigated at multi-frequency bands and in multi-scenarios by using the same channel sounder, especially for the outdoor, multiple-input multiple-output (MIMO), and vehicle-to-vehicle (V2V) conditions. In this paper, we conduct multi-frequency multi-scenario mmWave MIMO channel measurements with 4×4 antennas at 28, 32, and 39 GHz bands for three cases, i.e., the human body and vehicle blockage measurements, outdoor path loss measurements, and V2V measurements. The channel characteristics, including blockage effect, path loss and coverage range, and non-stationarity and spatial consistency, are thoroughly studied. The blockage model, path loss model, and time-varying channel model are proposed for mmWave MIMO channels. The channel measurement and modeling results will be of great importance for further mmWave communication system deployments in indoor hotspot, outdoor, and vehicular network scenarios for B5G.

preprint2020arXiv

Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven

In this paper, time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives. Among existing approximate dynamic programming (ADP) and reinforcement learning (RL) algorithms, the direct heuristic dynamic programming (dHDP) has been shown to be an effective tool, as demonstrated in solving several complex learning control problems. It continuously updates the control policy and the critic as system states continuously evolve. It is therefore desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise. Toward this goal, we propose a new event-driven dHDP. By constructing a Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of the system states and the weights in the critic and the control policy networks. Consequently, we show the approximate control and cost-to-go function approaching Bellman optimality within a finite bound. We also illustrate how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
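The contrast between time-driven and event-driven updating can be sketched with a simple trigger rule. The abstract does not state the triggering condition used in the paper, so the threshold test below (update only when the state has moved more than `threshold` from the last sampled state) is a hypothetical illustration, and `event_triggered_updates` is a name introduced for this example.

```python
def event_triggered_updates(states, threshold):
    """Return the time steps at which an update would be triggered.

    Assumption: an event fires when the current state differs from the
    last sampled state by more than `threshold`. A time-driven scheme
    would instead update at every step.
    """
    last_sampled = states[0]
    update_steps = [0]  # the initial state always counts as a sample
    for step, state in enumerate(states[1:], start=1):
        if abs(state - last_sampled) > threshold:
            last_sampled = state
            update_steps.append(step)
    return update_steps


# Scalar state trajectory with small noise-like fluctuations and two real jumps.
states = [0.0, 0.01, -0.02, 0.5, 0.51, 0.49, 1.2]
update_steps = event_triggered_updates(states, threshold=0.1)
```

For this trajectory the rule fires at steps 0, 3 and 6, skipping the four steps where the state only fluctuates slightly, which is the kind of noise-driven update the abstract aims to avoid.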

preprint2020arXiv

Pressure Engineering of the Dirac Fermions in Quasi-One-Dimensional Tl$_2$Mo$_6$Se$_6$

Topological band dispersions other than the standard Dirac or Weyl fermions have garnered increasing interest in materials science. Among them, the cubic Dirac fermions were recently proposed in the family of quasi-one-dimensional conductors A$_2$Mo$_6$X$_6$ (A= Na, K, In, Tl; X= S, Se, Te), where the band crossing is characterized by a linear dispersion in one $k$-space direction but a cubic dispersion in the plane perpendicular to it. It is not yet clear, however, how external perturbations can alter these nontrivial carriers and ultimately induce a new distinct quantum phase. Here we study the evolution of Dirac fermions, in particular the cubic Dirac crossing, under external pressure in the representative quasi-one-dimensional Tl$_2$Mo$_6$Se$_6$ via first-principles calculations. Specifically, it is found that the topological properties, including the bulk Dirac crossings and the topological surface states, change progressively under pressure up to 50 GPa, where the system undergoes a structural transition from the hexagonal phase to a body-centered tetragonal phase. Above 50 GPa, the system is more likely to be topologically trivial. Further, we also investigate its phonon spectra, which reveal a gradual depletion of the negative phonon modes with pressure, consistent with the more three-dimensional Fermi surface in the high-pressure phase. Our work may provide a useful guideline for further experimental search and the band engineering of the topologically nontrivial fermions in this intriguing state of matter.

preprint2020arXiv

PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

In this work, we present a novel data-driven method for robust 6DoF object pose estimation from a single RGBD image. Unlike previous methods that directly regress pose parameters, we tackle this challenging task with a keypoint-based approach. Specifically, we propose a deep Hough voting network to detect 3D keypoints of objects and then estimate the 6D pose parameters in a least-squares fitting manner. Our method is a natural extension of 2D-keypoint approaches that successfully work on RGB based 6DoF estimation. It allows us to fully utilize the geometric constraint of rigid objects with the extra depth information and is easy for a network to learn and optimize. Extensive experiments were conducted to demonstrate the effectiveness of 3D-keypoint detection in the 6D pose estimation task. Experimental results also show our method outperforms the state-of-the-art methods by large margins on several benchmarks. Code and video are available at https://github.com/ethnhe/PVN3D.git.

preprint2020arXiv

Single Path One-Shot Neural Architecture Search with Uniform Sampling

We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot methods, however, are hard to train and not yet effective on large scale datasets like ImageNet. This work proposes a Single Path One-Shot model to address the challenge in the training. Our central idea is to construct a simplified supernet, where all architectures are single paths so that the weight co-adaption problem is alleviated. Training is performed by uniform path sampling. All architectures (and their weights) are trained fully and equally. Comprehensive experiments verify that our approach is flexible and effective. It is easy to train and fast to search. It effortlessly supports complex search spaces (e.g., building blocks, channel, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency). It is thus convenient to use for various needs. It achieves state-of-the-art performance on the large dataset ImageNet.
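Uniform path sampling, as described in this abstract, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' training code: the supernet is reduced to a list giving the number of candidate blocks per layer, `sample_path` is a hypothetical helper, and no weights are actually trained.

```python
import random


def sample_path(choices_per_layer, rng):
    """Pick one candidate block per layer, each with equal probability."""
    return [rng.randrange(num_choices) for num_choices in choices_per_layer]


rng = random.Random(0)           # fixed seed so the run is repeatable
choices_per_layer = [4, 4, 4]    # assumption: 3 layers with 4 candidates each

# Count how often each candidate is selected over many sampled paths.
counts = [[0] * num_choices for num_choices in choices_per_layer]
num_iterations = 3000
for _ in range(num_iterations):
    path = sample_path(choices_per_layer, rng)
    for layer, choice in enumerate(path):
        counts[layer][choice] += 1   # a training step would update this block here
```

Because each layer's choice is drawn uniformly and independently, every candidate block is selected about `num_iterations / 4` times, which is the sense in which all architectures receive equal training.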

preprint2020arXiv

Spin filtering in germanium/silicon core/shell nanowires with pseudo-helical gap

Semiconductors with strong spin-orbit interactions can exhibit a helical gap with spin-momentum locking opened by a magnetic field. Such a gap is highly spin selective as a result of a topologically protected spin-momentum locking, which can be used for spin filtering. We experimentally demonstrate such a spin filtering effect in a quasi-ballistic p-type germanium/silicon core/shell nanowire (NW), which possesses a pseudo-helical gap without the application of a magnetic field. Polarized hole spin injection to the NW is achieved using cobalt ferromagnetic contacts with controlled natural surface oxide on the NW as a tunnel barrier. Local and nonlocal spin valve effects are measured as the verification of polarized spin transport in the NW outside the helical gap. We electrically tune the NW into the helical gap by scanning its chemical potential with a gate. A hysteresis loop with three resistance states is observed in the local spin valve geometry, as evidence of spin filtering in the helical gap.

preprint2020arXiv

Structured and Localized Image Restoration

We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning. We optimize a penalized energy function regularized by a sum of terms measuring the distance between patches to be restored and clean patches from an external database gathered beforehand. The resulting estimator comes with strong statistical guarantees leveraging local dependency properties of overlapping patches. We derive the corresponding algorithms for energies based on the mean-squared and Euclidean norm errors. Finally, we demonstrate the practical effectiveness of our model on different image restoration problems using standard benchmarks.

preprint2020arXiv

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

Batch Normalization (BN) is one of the most widely used techniques in the deep learning field. But its performance can degrade badly with insufficient batch size. This weakness limits the usage of BN on many computer vision tasks like detection or segmentation, where batch size is usually small due to the constraint of memory consumption. Therefore many modified normalization techniques have been proposed, which either fail to restore the performance of BN completely, or have to introduce additional nonlinear operations in the inference procedure and greatly increase consumption. In this paper, we reveal that there are two extra batch statistics involved in the backward propagation of BN, which have never been well discussed before. The extra batch statistics associated with gradients can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small batch cases, without introducing any additional nonlinear operations in the inference procedure. We prove the benefits of MABN by both theoretical analysis and experiments. Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.

preprint2020arXiv

Type-II Ising superconductivity and anomalous metallic state in macro-size ambient-stable ultrathin crystalline films

The recent emergence of two-dimensional (2D) crystalline superconductors has provided a promising platform to investigate novel quantum physics and potential applications. To reveal essential quantum phenomena therein, ultralow temperature transport investigation on high quality ultrathin superconducting films is critically required, although it has been quite challenging experimentally. Here we report a systematic transport study on the ultrathin crystalline PdTe2 films grown by molecular beam epitaxy (MBE). Interestingly, a new type of Ising superconductivity in 2D centrosymmetric materials is revealed by the detection of a large in-plane critical field more than 7 times the Pauli limit. Remarkably, in perpendicular magnetic field, we provide solid evidence of an anomalous metallic state characterized by the resistance saturation at low temperatures with high quality filters. The robust superconductivity with intriguing quantum phenomena in the macro-size ambient-stable ultrathin PdTe2 films remains almost the same for 20 months, showing great potential in electronic and spintronic applications.

preprint2020arXiv

WeightNet: Revisiting the Design Space of Weight Networks

We present a conceptually simple, flexible and effective framework for weight generating networks. Our approach is general: it unifies two currently distinct and extremely effective methods, SENet and CondConv, into the same framework on weight space. The method, called WeightNet, generalizes the two methods by simply adding one more grouped fully-connected layer to the attention activation layer. We use the WeightNet, composed entirely of (grouped) fully-connected layers, to directly output the convolutional weight. WeightNet is easy and memory-conserving to train, on the kernel space instead of the feature space. Because of the flexibility, our method outperforms existing approaches on both ImageNet and COCO detection tasks, achieving better Accuracy-FLOPs and Accuracy-Parameter trade-offs. The framework on the flexible weight space has the potential to further improve the performance. Code is available at https://github.com/megvii-model/WeightNet.

preprint2019arXiv

DetNAS: Backbone Search for Object Detection

Object detectors are usually equipped with backbone networks designed for image classification. It might be sub-optimal because of the gap between the tasks of image classification and object detection. In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection. It is non-trivial because detection training typically needs ImageNet pre-training while NAS systems require accuracies on the target detection task as supervisory signals. Based on the technique of one-shot supernet, which contains all possible networks in the search space, we propose a framework for backbone search on object detection. We train the supernet under the typical detector training schedule: ImageNet pre-training and detection fine-tuning. Then, the architecture search is performed on the trained supernet, using the detection task as the guidance. This framework makes NAS on backbones very efficient. In experiments, we show the effectiveness of DetNAS on various detectors, for instance, one-stage RetinaNet and the two-stage FPN. We empirically find that networks searched on object detection show consistent superiority compared to those searched on ImageNet classification. The resulting architecture achieves better performance than hand-crafted networks on COCO with far fewer FLOPs.

preprint2019arXiv

Quantum correlations near the exceptional point

Recent advances in non-Hermitian physical systems have led to numerous novel optical phenomena and applications. However, most realizations are limited to classical systems, and quantum fluctuations of light are unexplored. For the first time, we report the observation of quantum correlations between light channels in an anti-symmetric optical system made of flying atoms. Two distant optical channels, coupled dissipatively, display gain, phase sensitivity and quantum correlations with each other, even under linear atom-light interaction within each channel. We found that quantum correlations emerge in the phase unbroken regime and disappear after crossing the exceptional point. Our microscopic model considering quantum noise evolution produces results in good qualitative agreement with experimental observations. This work opens up a new direction of experimental quantum nonlinear optics using non-Hermitian systems, and demonstrates the viability of nonlinear coupling with linear systems by using atomic motion as feedback.

preprint2016arXiv

Bed-Load Transport Rate Based on the Entrainment Probabilities of Sediment Grains by Rolling and Lifting

A function for the bed-load sediment transport rate is derived. This is achieved from the first principle by using the entrainment probabilities of the sediment grains by rolling and lifting, and by introducing two travel lengths, respectively, for the first time. The predictions from the new bed-load function agree well with the experimental results over the entire experimental range and show significant improvement over the commonly used formula for bed-load transport rate. The new function shows that, in terms of contributing to the bed-load transport rate, the total entrainment probability of the sediment grains is a weighted summation of those by the lifted and rolling grains, rather than a simple addition of the two. The function has also been used to predict the total entrainment probability, saltation length and the bed layer thickness at high bed-load transport rate. These predictions all agree well with the experimental results. It is found that, on average, the travel length for the rolling sand grains is about one order of magnitude less than that for the lifted ones.

preprint2016arXiv

Convergence of the Point Integral method for Poisson equation on point cloud

The Laplace-Beltrami operator (LBO) is a fundamental object associated to Riemannian manifolds, which encodes all intrinsic geometry of the manifolds and has many desirable properties. Recently, we proposed a novel numerical method, Point Integral method (PIM), to discretize the Laplace-Beltrami operator on point clouds \cite{LSS}. In this paper, we analyze the convergence of Point Integral method (PIM) for Poisson equation with Neumann boundary condition on submanifolds isometrically embedded in Euclidean spaces.

preprint2016arXiv

Convergence of the Point Integral method for the Poisson equation with Dirichlet boundary on point cloud

The Poisson equation on manifolds plays a fundamental role in many applications. Recently, we proposed a novel numerical method called the Point Integral method (PIM) to solve the Poisson equations on manifolds from point clouds. In this paper, we prove the convergence of the point integral method for solving the Poisson equation with the Dirichlet boundary condition.

preprint2016arXiv

Effective Clipart Image Vectorization Through Direct Optimization of Bezigons

Bezigons, i.e., closed paths composed of Bézier curves, have been widely employed to describe shapes in image vectorization results. However, most existing vectorization techniques infer the bezigons by simply approximating an intermediate vector representation (such as polygons). Consequently, the resultant bezigons are sometimes imperfect due to accumulated errors, fitting ambiguities, and a lack of curve priors, especially for low-resolution images. In this paper, we describe a novel method for vectorizing clipart images. In contrast to previous methods, we directly optimize the bezigons rather than using other intermediate representations; therefore, the resultant bezigons are not only of higher fidelity compared with the original raster image but also more reasonable because they were traced by a proficient expert. To enable such optimization, we have overcome several challenges and have devised a differentiable data energy as well as several curve-based prior terms. To improve the efficiency of the optimization, we also take advantage of the local control property of bezigons and adopt an overlapped piecewise optimization strategy. The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality.

preprint2016arXiv

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

preprint2016arXiv

Functional Brain Imaging: A Comprehensive Survey

Functional brain imaging allows measuring dynamic functionality in all brain regions. It is broadly used in clinical cognitive neuroscience as well as in research. It allows neural activities across the brain to be observed simultaneously. Ever since functional brain imaging began with the mapping of brain functions proposed by phrenologists, many scientists have asked why we need to image brain functionality when we already have structural information. Their question contains its own answer: functional information about the human brain complements structural information, helping us better understand what is happening in the brain. This paper, which could be useful to those with an interest in functional brain imaging, such as engineers, presents a quick review of the modalities used in functional brain imaging. We concentrate on the most widely used techniques in functional imaging, namely functional magnetic resonance imaging (fMRI) and functional optical imaging, the latter being one of the novelties in this area of study.

preprint2016arXiv

Identity Mappings in Deep Residual Networks

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers

preprint2016arXiv

Instance-sensitive Fully Convolutional Networks

Fully convolutional networks (FCNs) have been proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances. On top of these instance-sensitive score maps, a simple assembling module is able to output an instance candidate at each position. In contrast to the recent DeepMask method for segmenting instances, our method does not have any high-dimensional layer related to the mask resolution, but instead exploits image local coherence for estimating instances. We present competitive results of instance segment proposal on both PASCAL VOC and MS COCO.

preprint2016arXiv

Object Detection Networks on Convolutional Feature Maps

Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them "Networks on Convolutional feature maps" (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier. We show by experiments that despite the effective ResNets and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in the ImageNet and MS COCO challenges 2015.

preprint2016arXiv

Origin of the superconductivity of WTe2 under pressure

Tungsten ditelluride (WTe2) has attracted significant attention due to its interesting electronic properties, such as the unsaturated magnetoresistance and superconductivity. Recently, it has been proposed to be a new type of Weyl semimetal, which is distinguished from other transition metal dichalcogenides (TMDs) from a topological perspective. Here, we study the structure of WTe2 under pressure with crystal structure prediction and ab initio calculations combined with high-pressure synchrotron X-ray diffraction and Raman spectroscopy measurements. We find that the ambient orthorhombic structure (Td) transforms into a monoclinic structure (1T') at around 4-5 GPa. As the transition pressure is very close to the critical point in recent high-pressure electrical transport measurements, the emergence of superconductivity in WTe2 under pressure is attributed to the Td-1T' structure phase transition, which is associated with a sliding mechanism of the TMD layers and results in a shorter Te-Te interlayer distance compared to the intralayer ones. These results highlight the critical role of the interlayer stacking and chalcogen interactions on the electronic and superconducting properties of multilayered TMDs under hydrostatic strain environments.

preprint2016arXiv

Rich Image Captioning in the Wild

We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include achieving high caption quality with respect to human judgments, out-of-domain data handling, and the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both an in-domain dataset (i.e., MS COCO) and out-of-domain datasets.

preprint2016arXiv

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly propagates information from scribbles to unmarked pixels and learns network parameters. We present competitive object semantic segmentation results on the PASCAL VOC dataset by using scribbles as annotations. Scribbles are also favored for annotating stuff (e.g., water, sky, grass) that has no well-defined shape, and our method shows excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive scribble annotations. Our scribble annotations on PASCAL VOC are available at http://research.microsoft.com/en-us/um/people/jifdai/downloads/scribble_sup

preprint2016arXiv

Supervised Transformer Network for Efficient Face Detection

Large pose variations remain a challenge for real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed the Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions are then warped by mapping the detected facial landmarks to their canonical positions to better normalize the face patterns. The second stage, which is an RCNN, then verifies whether the warped candidate regions are valid faces. We conduct end-to-end learning of the cascaded network, including optimizing the canonical positions of the facial landmarks. This supervised learning of the transformations automatically selects the best scale to differentiate face/non-face patterns. By combining feature maps from both stages of the network, we achieve state-of-the-art detection accuracies on several public benchmarks. For real-time performance, we run the cascaded network only on regions of interest produced by a boosting cascade face detector. Our detector runs at 30 FPS on a single CPU core for a VGA-resolution image.

preprint2016arXiv

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) is becoming an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames into backgrounds with spatio-temporal correlations and foregrounds with spatio-temporal continuity in a tensor framework. In this approach, we use 3D total variation (TV) to enhance the spatio-temporal continuity of foregrounds, and Tucker decomposition to model the spatio-temporal correlations of the video background. Based on this idea, we design a basic tensor RPCA model over the video frames, dubbed the holistic TenRPCA model (H-TenRPCA). To characterize the correlations among the groups of similar 3D patches of the video background, we further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint tensor Tucker decompositions of 3D patch groups for modeling the video background. Efficient algorithms using the alternating direction method of multipliers (ADMM) are developed to solve the proposed models. Extensive experiments on simulated and real-world videos demonstrate the superiority of the proposed approaches over the existing state-of-the-art approaches.

preprint2015arXiv

A new topological semimetal with iso-energetic Weyl fermions in TaAs under high pressure

TaAs, one of the experimentally discovered topological Weyl semimetals, has attracted intense interest recently. Ambient TaAs has two types of Weyl nodes which are not on the same energy level. As an effective way to tune lattice parameters and electronic interactions, high pressure is becoming a significant tool to explore new materials as well as their exotic states. Therefore, it is highly interesting to investigate the behaviors of topological Weyl fermions and possible structural phase transitions in TaAs under pressure. Here, with a combination of ab initio calculations and crystal structure prediction techniques, a new hexagonal P-6m2 phase is predicted in TaAs at a pressure of around 14 GPa. Surprisingly, this new phase is a topological semimetal with only a single set of Weyl nodes exactly on the same energy level. The phase transition pressure from the experimental measurements, including electrical transport measurements and Raman spectroscopy, agrees reasonably well with our theoretical prediction. Moreover, the P-6m2 phase seems to be quench-recoverable to ambient pressure, which increases the possibilities for further study of the exotic behaviors of a single set of Weyl fermions, such as the interplay between surface states and other properties.

preprint2015arXiv

Accelerating Very Deep Convolutional Networks for Classification and Detection

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike previous methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an effective solution to the resulting nonlinear optimization problem without the need of stochastic gradient descent (SGD). More importantly, while previous methods mainly focus on optimizing one or two layers, our nonlinear method enables an asymmetric reconstruction that reduces the rapidly accumulated error when multiple (e.g., >=10) layers are approximated. For the widely used very deep VGG-16 model, our method achieves a whole-model speedup of 4x with merely a 0.3% increase of top-5 error in ImageNet classification. Our 4x accelerated VGG-16 model also shows a graceful accuracy degradation for object detection when plugged into the Fast R-CNN detector.

preprint2015arXiv

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vice versa. Our method, called BoxSup, produces competitive results supervised by boxes only, on par with strong baselines fully supervised by masks under the same setting. By leveraging a large amount of bounding boxes, BoxSup further unleashes the power of deep convolutional networks and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.

preprint2015arXiv

Convergence of Laplacian Spectra from Point Clouds

The spectral structure of the Laplace-Beltrami operator (LBO) on manifolds has been widely used in many applications, including spectral clustering, dimensionality reduction, mesh smoothing, compression and editing, shape segmentation, matching and parameterization, and so on. Typically, the underlying Riemannian manifold is unknown and often given by a set of sample points. The spectral structure of the LBO is estimated from some discrete Laplace operator constructed from this set of sample points. In our previous papers, we proposed the point integral method to discretize the LBO from point clouds, which is also capable of solving the eigenproblem. One fundamental issue is then the convergence of the eigensystem of the discrete Laplacian to that of the LBO. In this paper, for compact manifolds isometrically embedded in Euclidean spaces, possibly with boundary, we show that the eigenvalues and the eigenvectors obtained by the point integral method converge to the eigenvalues and the eigenfunctions of the LBO with the Neumann boundary condition, and in addition, we give an estimate of the convergence rate. This result provides a solid mathematical foundation for the point integral method in the computation of Laplacian spectra from point clouds.

preprint2015arXiv

Convolutional Feature Masking for Joint Object and Stuff Segmentation

The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs). The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations on the raw image domain require computing thousands of networks on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. We further propose a joint method to handle objects and "stuff" (e.g., grass, sky, water) in the same framework. State-of-the-art results are demonstrated on benchmarks of PASCAL VOC and new PASCAL-CONTEXT, with a compelling computational speed.

preprint2015arXiv

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

In this paper, we present a novel approach to automatic 3D Facial Expression Recognition (FER) based on deep representation of facial 3D geometric and 2D photometric attributes. A 3D face is first represented by its geometric and photometric attributes, including the geometry map, normal maps, normalized curvature map and texture map. These maps are then fed into a pre-trained deep convolutional neural network to generate the deep representation. The facial expression prediction is then simply achieved by training linear SVMs over the deep representation for different maps and fusing these SVM scores. The visualizations show that the deep representation provides a complete and highly discriminative coding scheme for 3D faces. Comprehensive experiments on the BU-3DFE database demonstrate that the proposed deep representation can outperform the widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG, Gabor) and the state-of-the-art approaches under the same experimental protocols.

preprint2015arXiv

Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
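
The residual reformulation described in this abstract can be illustrated with a minimal NumPy sketch. It assumes a toy two-layer residual function F(x) = W2 · relu(W1 · x) on a vector input; the names `residual_block`, `w1` and `w2` are hypothetical, and convolutions, normalization and the activation applied after the addition are omitted for brevity, so this is not the authors' architecture.

```python
import numpy as np


def relu(x):
    """Element-wise rectifier."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Return x + F(x), where F(x) = w2 @ relu(w1 @ x).

    The layers only have to learn the residual F with reference to the
    input x; the shortcut carries x through unchanged.
    """
    return x + w2 @ relu(w1 @ x)


if __name__ == "__main__":
    x = np.array([1.0, -2.0, 3.0])
    zeros = np.zeros((3, 3))
    # With an all-zero residual branch the block reduces to the identity map.
    print(residual_block(x, zeros, zeros))
```

When the residual branch outputs zero, the block passes its input through unchanged, which is one intuition for why stacking many such blocks does not have to make optimization harder.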

preprint2015arXiv

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
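
As a small illustration of the two quantitative points in this abstract, the sketch below implements a parametric rectifier, using the commonly stated form f(x) = x for x > 0 and f(x) = a·x otherwise, and recomputes the quoted relative improvement from the two error rates. The slope value 0.25 is only an illustrative choice, and the function names are hypothetical.

```python
import numpy as np


def prelu(x, a):
    """Parametric rectifier: identity for positive inputs, slope `a` otherwise.

    With a = 0 this reduces to the ordinary ReLU; in a network, `a` would be
    a learned parameter rather than a fixed constant.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, a * x)


def relative_improvement(baseline_error, new_error):
    """Fraction by which `new_error` reduces `baseline_error`."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    print(prelu([-2.0, 0.0, 3.0], a=0.25))
    # (6.66 - 4.94) / 6.66 is about 0.258, i.e. the quoted ~26%.
    print(round(relative_improvement(6.66, 4.94), 3))
```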

preprint2015arXiv

Fast Guided Filter

The guided filter is a technique for edge-aware image filtering. Because of its nice visual quality, fast speed, and ease of implementation, the guided filter has witnessed various applications in real products, such as image editing apps in phones and stereo reconstruction, and has been included in official MATLAB and OpenCV. In this note, we point out that the guided filter can be simply sped up from O(N) time to O(N/s^2) time for a subsampling ratio s. In a variety of applications, this leads to a speedup of >10x with almost no visible degradation. We hope this acceleration will improve the performance of current applications and further popularize this filter. Code is released.

preprint2015arXiv

Geometrically Enhanced Quantum Oscillatory Signal and Nonzero Berry's Phase in Indium Arsenide Surface

In a system accommodating both surface and bulk conduction channels, a long-standing challenge is to extract the weak Shubnikov-de Haas oscillation signal in the surface from a large background stemming from the bulk. Conventional methods to suppress the bulk conduction often involve doping, an intrusive approach, to reduce the bulk carrier density. Here we propose a geometric method, i.e., attaching a metal shunt to the indium arsenide epilayer, to redistribute current and thus enhance the oscillation-to-background ratio. This allows us, for the first time, to observe clear quantum oscillations and nonzero Berry's phase at the surface of indium arsenide. We also identify the existence of a Rashba-type spin-orbit interaction on the InAs surface with a large coupling constant of ~1 eVÅ. We anticipate wide applicability of this non-intrusive architecture in similar systems such as topological insulators.

preprint2015arXiv

Harmonic Extension

In this paper, we consider the harmonic extension problem, which is widely used in many applications of machine learning. We find that the traditional graph Laplacian method fails to produce a good approximation of the classical harmonic function. To tackle this problem, we propose a new method called the point integral method (PIM). We consider the harmonic extension problem from the point of view of solving PDEs on manifolds. The basic idea of the PIM is to approximate the harmonicity using an integral equation, which is easy to discretize from points. Based on the integral equation, we explain why the traditional graph Laplacian may fail to approximate the harmonicity in the classical sense and propose a different approach which we call the volume constraint method (VCM). Theoretically, both the PIM and the VCM compute a harmonic function with convergence guarantees, and practically, they are both simple, amounting to solving a linear system. One important application of the harmonic extension in machine learning is semi-supervised learning. We run a popular semi-supervised learning algorithm by Zhu et al. over a couple of well-known datasets and compare the performance of the aforementioned approaches. Our experiments show that the PIM performs the best.
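
For orientation, the graph Laplacian baseline that this abstract compares against can be sketched as a single linear solve. The sketch below is not the PIM or the VCM; it assumes a symmetric weight matrix and the standard formulation in which the unknown values satisfy L_uu f_u = -L_ul f_l, and the function and argument names are hypothetical.

```python
import numpy as np


def harmonic_extension(weights, labeled_idx, labeled_values):
    """Extend labels to all nodes so that L f = 0 holds at unlabeled nodes.

    weights: (n, n) symmetric, non-negative edge weights.
    labeled_idx: indices of the nodes whose values are prescribed.
    labeled_values: the prescribed values, in the same order.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    laplacian = np.diag(w.sum(axis=1)) - w  # unnormalized graph Laplacian

    labeled_idx = np.asarray(labeled_idx)
    labeled_values = np.asarray(labeled_values, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    l_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    l_ul = laplacian[np.ix_(unlabeled_idx, labeled_idx)]

    f = np.empty(n)
    f[labeled_idx] = labeled_values
    # Solve L_uu f_u = -L_ul f_l for the unlabeled values.
    f[unlabeled_idx] = np.linalg.solve(l_uu, -l_ul @ labeled_values)
    return f


if __name__ == "__main__":
    # Path graph with 5 nodes and unit weights; the two endpoints are labeled.
    w = np.zeros((5, 5))
    for i in range(4):
        w[i, i + 1] = w[i + 1, i] = 1.0
    print(harmonic_extension(w, [0, 4], [0.0, 1.0]))
```

On the path graph used here the harmonic values interpolate linearly between the two labeled endpoints, which makes the result easy to check by hand.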

preprint2015arXiv

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360ms testing an image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by-product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won the 1st place.

preprint2015arXiv

Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal

In this paper, we address the problem of estimating and removing non-uniform motion blur from a single blurry image. We propose a deep learning approach to predicting the probabilistic distribution of motion blur at the patch level using a convolutional neural network (CNN). We further extend the candidate set of motion kernels predicted by the CNN using carefully designed image rotations. A Markov random field model is then used to infer a dense non-uniform motion blur field enforcing motion smoothness. Finally, motion blur is removed by a non-uniform deblurring model using patch-level image prior. Experimental evaluations show that our approach can effectively estimate and remove complex non-uniform motion blur that is not handled well by previous approaches.

preprint2015arXiv

Localization and Orbital Selectivity in Iron-Based Superconductors with Cu Substitution

We study an inhomogeneous three-orbital Hubbard model for the Cu-substituted iron pnictides using an extended real-space Green's function method combined with density functional calculations. We find that the onsite interactions of the Cu ions are the principal determinant of whether an electron dopant or a hole dopant is caused by the Cu substitution. It is found that the Cu substitution could lead to a hole doping when its onsite interactions are smaller than a critical value, as opposed to an electron doping when the interactions of Cu ions are larger than the critical value, which may explain why the effects of Cu substitution on the carrier density are entirely different in NaFe$_{1-x}$Cu$_x$As and Ba(Fe$_{1-x}$Cu$_x$)$_2$As$_2$. We also find that the effect of a doping-induced disorder is considerable in the Cu-substituted iron pnictides, and its cooperative effect with electron correlations contributes to the orbital-selective insulating phases in NaFe$_{1-x}$Cu$_x$As and Ba(Fe$_{1-x}$Cu$_x$)$_2$As$_2$.

preprint2015arXiv

Observation of Electrically Tunable van der Waals Interaction in Graphene-Molecule Complex

van der Waals (vdW) interaction plays a fundamental role in surface-molecule-related phenomena. Tuning of the correlated charge fluctuation in the vdW complex is a plausible way to modulate molecular interactions at the atomic surface. We report vdW interaction tunability of the graphene-CO$_2$ complex by combining first-principles calculations using vdW exchange-correlation density functionals with time-evolution measurements of CO$_2$ molecule adsorption/desorption on graphene under an external electric field. The field-dependent charge transfer within the complex unveils the controllable tuning of CO$_2$ from acceptor to donor. Meanwhile, the configuration of the adsorbed molecule - the equilibrium distance from graphene and the O-C-O bonding angle - is modified accordingly. The range of electrical tunability is a unique feature for each type of molecule.

preprint2015arXiv

Point Integral Method for Solving Poisson-type Equations on Manifolds from Point Clouds with Convergence Guarantees

Partial differential equations (PDEs) on manifolds arise in many areas, including mathematics and many applied fields. Among all kinds of PDEs, the Poisson-type equations, including the standard Poisson equation and the related eigenproblem of the Laplace-Beltrami operator, are among the most important. Due to the complicated geometrical structure of the manifold, it is difficult to obtain efficient numerical methods for solving PDEs on manifolds. In this paper, we propose a method called the point integral method (PIM) to solve the Poisson-type equations from point clouds with convergence guarantees. In the PIM, the key idea is to derive integral equations which approximate the Poisson-type equations and contain no derivatives but only the values of the unknown function. This makes the integral equations easy to approximate from point clouds. In the paper, we explain the derivation of the integral equations, describe the point integral method and its implementation, and present numerical experiments to demonstrate the convergence of the PIM.

preprint2015arXiv

Pressure Induced Enhancement of Superconductivity in LaRu2P2

Exploring new superconductors beyond the copper-based and iron-based systems is very important. The element Ru is located just below Fe in the periodic table and behaves like Fe in many ways. One of the common threads for inducing high-temperature superconductivity is to introduce moderate correlation into the system. In this paper, we report a significant enhancement of the superconducting transition temperature from 3.84 K to 5.77 K under a pressure of only 1.74 GPa in LaRu2P2, which is isostructural with the iron-based 122 superconductors. The ab initio calculation shows that the superconductivity in LaRu2P2 at ambient pressure can be explained by McMillan's theory with strong electron-phonon coupling. However, it is difficult to interpret the significant enhancement of Tc versus pressure within this picture. Detailed analysis of the pressure-induced evolution of the resistivity and upper critical field Hc2(T) reveals that the increase of Tc with pressure may be accompanied by the involvement of an extra electronic correlation effect. This suggests that the Ru-based system has some commonality with the Fe-based superconductors.

preprint2015arXiv

Pressure-induced semimetal to superconductor transition in a three-dimensional topological material ZrTe5

As a new type of topological material, ZrTe5 shows many exotic properties under extreme conditions. Using resistance and ac magnetic susceptibility measurements under high pressure, we find that when the resistance anomaly near 128 K is completely suppressed at 6.2 GPa, a full superconducting transition surprisingly emerges. The superconducting transition temperature Tc increases with applied pressure and reaches a maximum of 4.0 K at 14.6 GPa, followed by a slight drop, after which it remains almost constant up to 68.5 GPa. At pressures above 21.2 GPa, a second superconducting phase with a maximum Tc of about 6.0 K appears and coexists with the original one up to the maximum pressure studied in this work. In situ high-pressure synchrotron X-ray diffraction and Raman spectroscopy combined with theoretical calculations indicate that the observed two-stage superconducting behavior is correlated with the structural phase transition from the ambient Cmcm phase to the high-pressure C2/m phase around 6 GPa, and to a mixture of two high-pressure phases, C2/m and P-1, above 20 GPa. The combination of structural and transport measurements with theoretical calculations enables a complete understanding of the exotic properties that emerge in three-dimensional topological materials under extreme environments.

preprint2015arXiv

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for images or sub-images of arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvements made for this competition.
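The fixed-length property described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the single-channel input and the bin-edge rule are all assumptions chosen for illustration.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2D feature map into a fixed-length vector.

    feature_map: non-empty list of equal-length rows of numbers (any size).
    levels: hypothetical pyramid grid sizes; level n contributes n * n bins.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor(i*h/n) .. ceil((i+1)*h/n), never empty.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two maps of different sizes yield vectors of the same length (1 + 4 + 16).
small = [[r * 5 + c for c in range(5)] for r in range(4)]    # 4 x 5
large = [[r * 9 + c for c in range(9)] for r in range(13)]   # 13 x 9
small_vec = spatial_pyramid_pool(small)
large_vec = spatial_pyramid_pool(large)
```

The output length depends only on `levels`, so a fully connected layer placed after this step would see the same input size for any image size.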

preprint2015arXiv

Sudden change of geometric quantum discord in finite temperature reservoirs

We investigate sudden change (SC) behaviors of the distance-based measures of geometric quantum discords (GQDs) for two non-interacting qubits subject to the two-sided and the one-sided thermal reservoirs. We found that the GQDs defined by different distances exhibit different SCs, and thus the SCs are the combined result of the chosen discord measure and the property of a state. We also found that the thermal reservoir may generate states having different orderings related to different GQDs. These inherent differences of the GQDs reveal that they are incompatible in characterizing quantum correlations both quantitatively and qualitatively.

preprint2015arXiv

The large unsaturated magnetoresistance of Weyl semimetals

The Weyl semimetal (WSM) is a novel topological gapless state that promises exotic transport due to the chiral anomaly. Recently, a family of nonmagnetic WSM candidates including TaAs, NbAs and NbP was confirmed by first-principles calculations and experiments. The TaAs family is reported to display a large unsaturated magnetoresistance (XMR), which has not yet been explained. Here, we give a theoretical calculation of the XMR based on the extended effective-medium approach to Weyl semimetals. We predict the power law of the XMR at high magnetic field and the "turn-on" magnetic field, both of which are in good agreement with experimental data. Furthermore, we investigate the $θ$-dependent magnetoresistance and find a transition between the positive XMR and the negative magnetoresistance induced by the chiral anomaly, which should be confirmed by further experiments.

preprint2014arXiv

A discrete uniformization theorem for polyhedral surfaces II

A discrete conformality for hyperbolic polyhedral surfaces is introduced in this paper. This discrete conformality is shown to be computable. It is proved that each hyperbolic polyhedral metric on a closed surface is discrete conformal to a unique hyperbolic polyhedral metric with a given discrete curvature satisfying the Gauss-Bonnet formula. Furthermore, the hyperbolic polyhedral metric with given curvature can be obtained using a discrete Yamabe flow with surgery. In particular, each hyperbolic polyhedral metric on a closed surface with negative Euler characteristic is discrete conformal to a unique hyperbolic metric.

preprint2014arXiv

Convolutional Neural Networks at Constrained Time Cost

Though recent advanced convolutional neural networks (CNNs) have been improving image recognition accuracy, the models are becoming more complex and time-consuming. For real-world applications in industrial and commercial scenarios, engineers and developers often face a constrained time budget. In this paper, we investigate the accuracy of CNNs under constrained time cost. Under this constraint, the design of a network architecture must trade off factors such as depth, number of filters and filter sizes. With a series of controlled comparisons, we progressively modify a baseline model while preserving its time complexity. This is also helpful for understanding the importance of these factors in network design. We present an architecture that achieves very competitive accuracy on the ImageNet dataset (11.8% top-5 error, 10-view test), yet is 20% faster than "AlexNet" (16.0% top-5 error, 10-view test).

preprint2014arXiv

Discrete Conformal Deformation: Algorithm and Experiments

In this paper, we introduce a definition of discrete conformality for triangulated surfaces with flat cone metrics and describe an algorithm for solving the problem of prescribing curvature, that is, deforming the metric discrete conformally so that the curvature of the resulting metric coincides with the prescribed curvature. We explicitly construct a discrete conformal map between the input triangulated surface and the deformed triangulated surface. Our algorithm can handle surfaces of any topology, with or without boundary, and can find a deformed metric for any prescribed curvature satisfying the Gauss-Bonnet formula. In addition, we present numerical examples to show the convergence of our discrete conformality and to demonstrate the efficiency and robustness of our algorithm.
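The Gauss-Bonnet condition on the prescribed curvature can be checked numerically with a small sketch. It assumes the usual definition of discrete curvature on a closed triangulated surface (2π minus the sum of corner angles at a vertex). The helper names and the `frozenset`-keyed edge-length dictionary are hypothetical, and the paper's own conformality definition and algorithm are not reproduced here.

```python
import math


def corner_angle(opposite, side_a, side_b):
    """Angle between side_a and side_b of a Euclidean triangle (law of cosines)."""
    cos_angle = (side_a ** 2 + side_b ** 2 - opposite ** 2) / (2 * side_a * side_b)
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def discrete_curvature(num_vertices, faces, length):
    """Return 2*pi minus the corner-angle sum at each vertex of a closed surface."""
    angle_sum = [0.0] * num_vertices
    for i, j, k in faces:
        l_ij = length[frozenset((i, j))]
        l_jk = length[frozenset((j, k))]
        l_ki = length[frozenset((k, i))]
        angle_sum[i] += corner_angle(l_jk, l_ij, l_ki)
        angle_sum[j] += corner_angle(l_ki, l_ij, l_jk)
        angle_sum[k] += corner_angle(l_ij, l_jk, l_ki)
    return [2 * math.pi - s for s in angle_sum]


# Boundary of a tetrahedron: a sphere (Euler characteristic 2), unit edges.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
length = {frozenset(e): 1.0 for f in faces
          for e in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0]))}
curvature = discrete_curvature(4, faces, length)
total_curvature = sum(curvature)  # Gauss-Bonnet: equals 2 * pi * 2
```

Any target curvature handed to a prescribing-curvature solver on this surface would have to sum to the same total, 4π.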

preprint2014arXiv

Efficient and Accurate Approximations of Nonlinear Convolutional Networks

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4x is demonstrated on a large network trained for ImageNet, while the top-5 error rate is only increased by 0.9%. Our accelerated model is comparably fast to "AlexNet", but is 4.7% more accurate.

preprint2014arXiv

Power Scheduling of Kalman Filtering in Wireless Sensor Networks with Data Packet Drops

For a wireless sensor network (WSN) with a large number of low-cost, battery-driven sensor nodes that have multiple transmission power levels and limited transmission bandwidth, conservation of transmission resources (power and bandwidth) is of paramount importance. To this end, this paper considers the problem of power scheduling of Kalman filtering for general linear stochastic systems subject to data packet drops (over a packet-dropping wireless network). The transmission of the acquired measurement from the sensor to the remote estimator is realized by sequentially transmitting every single component of the measurement to the remote estimator in one time period. The sensor node decides separately whether to use a high or low transmission power to communicate each component to the estimator across the packet-dropping wireless network, based on a rule that promotes the power scheduling with the least impact on the estimator mean squared error. Under the customary assumption that the predicted density is (approximately) Gaussian, and leveraging the statistical distribution of sensor data, the mechanism of power scheduling, the wireless network effect and the received data, the minimum mean squared error estimator is derived. By investigating the statistical convergence properties of the estimation error covariance, we establish, for general linear systems, both a sufficient condition and a necessary condition guaranteeing the stability of the estimator.
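The effect of a dropped packet on the estimator can be sketched with a textbook scalar Kalman filter under intermittent observations. This is a simplification and not the estimator derived in the paper: it ignores the high/low power decision and the component-wise transmission, and the function name and the parameters `a`, `c`, `q`, `r` are assumptions.

```python
def kalman_step(x_est, p_est, y, arrived, a=1.0, c=1.0, q=0.01, r=0.1):
    """One step of a scalar Kalman filter with a possibly missing measurement.

    Assumed model: x_next = a * x + w (variance q), y = c * x + v (variance r).
    arrived: True if the measurement packet reached the estimator.
    """
    # Time update (prediction) always runs.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    if not arrived:
        # Packet dropped: no correction, the error variance keeps growing.
        return x_pred, p_pred
    # Measurement update, applied only when the packet arrives.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


x_drop, p_drop = kalman_step(0.0, 1.0, y=0.5, arrived=False)
x_recv, p_recv = kalman_step(0.0, 1.0, y=0.5, arrived=True)
```

Comparing `p_drop` with `p_recv` shows why a scheduling rule cares about which transmissions are most likely to be lost: each drop leaves the error variance at its predicted, larger value.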

preprint2014arXiv

Recommendation Scheme Based on Converging Properties for Contents Broadcasting

Popular videos are often clicked by a large number of users in a short period. With content recommendation, popular content could be broadcast to potential users in a wireless network, saving substantial transmission resources. In this paper, the content propagation model is analyzed based on users' historical behavior, location, and the converging properties in wireless data transmission, using users' communication logs from a Chinese commercial cellular network. A recommendation scheme is then proposed to achieve high energy efficiency.

preprint2013arXiv

A discrete uniformization theorem for polyhedral surfaces

A discrete conformality for polyhedral metrics on surfaces is introduced in this paper which generalizes earlier work on the subject. It is shown that each polyhedral metric on a surface is discrete conformal to a constant curvature polyhedral metric which is unique up to scaling. Furthermore, the constant curvature metric can be found using a discrete Yamabe flow with surgery.

preprint2013arXiv

A Variational Principle for Improving 2D Triangle Meshes based on Hyperbolic Volume

In this paper, we consider the problem of improving 2D triangle meshes tessellating planar regions. We propose a new variational principle for improving 2D triangle meshes where the energy functional is a convex function over the angle structures whose maximizer is unique and consists only of equilateral triangles. This energy functional is related to the hyperbolic volume of an ideal 3-simplex. Even with extra constraints on the angles for embedding the mesh into the plane and preserving the boundary, the energy functional remains well-behaved. We devise an efficient algorithm for maximizing the energy functional over these extra constraints. We apply our algorithm to various datasets and compare its performance with that of CVT. The experimental results show that our algorithm produces meshes in which both the angles and the aspect ratios of triangles lie in tighter intervals.

preprint2013arXiv

Disorder-Driven Superconductor-Insulator Transition in d-Wave Superconducting Ultrathin Films

We study the superconductor-insulator transition (SIT) in $d$-wave superconducting ultrathin films. By means of the kernel polynomial method, the Bogoliubov-de Gennes equations are solved self-consistently for square lattices with up to $360\times 360$ unit cells, making it possible to fully observe the nanoscale spatial fluctuations of the superconducting order parameters and to accurately discriminate the localized quasiparticle states from the extended ones by the lattice-size scaling of the generalized inverse participation ratio. It is shown that Anderson localization cannot entirely inhibit the occurrence of local superconductivity in strongly disordered $d$-wave superconductors. Completely separated by an insulating 'sea', a few isolated superconducting 'islands' with significant enhancement of the local superconducting order parameters can survive across the SIT. The disorder-driven SIT, therefore, is a transition from a $d$-wave superconductor to a Bose insulator which consists of localized Cooper pairs. Unlike an $s$-wave superconductor, which presents a robust single-particle gap across the SIT, the optical conductivity of a $d$-wave superconductor reveals a gapless insulating phase, where the SIT can be detected by observing the disappearance of the Drude weight with increasing disorder.
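How an inverse participation ratio separates localized from extended states can be shown with a minimal sketch. It uses the standard single-particle definition, which is an assumption here: the abstract refers to a generalized version whose exact form is not given in this record, and the function name is hypothetical.

```python
def inverse_participation_ratio(amplitudes):
    """Standard IPR: sum |psi_i|^4 / (sum |psi_i|^2)^2 over lattice sites."""
    weights = [abs(a) ** 2 for a in amplitudes]
    norm = sum(weights)
    return sum(w * w for w in weights) / (norm * norm)


def uniform_state(num_sites):
    """Fully extended state with equal amplitude on every site."""
    return [1.0] * num_sites


def single_site_state(num_sites):
    """Fully localized state living on one site."""
    return [1.0] + [0.0] * (num_sites - 1)


# Lattice-size scaling: the extended state falls off as 1 / N,
# the localized state stays at 1 regardless of N.
sizes = [16, 64, 256]
extended = [inverse_participation_ratio(uniform_state(n)) for n in sizes]
localized = [inverse_participation_ratio(single_site_state(n)) for n in sizes]
```

Tracking the ratio as the lattice grows is the scaling test in miniature: a value that vanishes with size indicates an extended state, while a value that saturates indicates a localized one.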

preprint2013arXiv

Gromov-Hausdorff Approximation of Metric Spaces with Linear Structure

In many real-world applications, data come as discrete metric spaces sampled around 1-dimensional filamentary structures that can be seen as metric graphs. In this paper we address the metric reconstruction problem of such filamentary structures from data sampled around them. We prove that they can be approximated, with respect to the Gromov-Hausdorff distance, by well-chosen Reeb graphs (and some of their variants), and we provide an efficient and easy-to-implement algorithm to compute such approximations in almost linear time. We illustrate the performance of our algorithm on a few synthetic and real data sets.

preprint2013arXiv

Hierarchical Nystrom Methods for Constructing Markov State Models for Conformational Dynamics

Markov state models (MSMs) have become a popular approach for investigating the conformational dynamics of proteins and other biomolecules. MSMs are typically built from numerous molecular dynamics simulations by dividing the sampled configurations into a large number of microstates based on geometric criteria. The resulting microstate model can then be coarse-grained into a more understandable macrostate model by lumping together rapidly mixing microstates into larger, metastable aggregates. However, finite sampling often results in the creation of many poorly sampled microstates. During coarse-graining, these states are mistakenly identified as being kinetically important because transitions to/from them appear to be slow. In this paper we propose a formalism based on an algebraic principle for matrix approximation, i.e. the Nystrom method, to deal with such poorly sampled microstates. Our scheme builds a hierarchy of microstates from high to low populations and progressively applies spectral clustering on sets of microstates within each level of the hierarchy. It helps spectral clustering identify metastable aggregates with highly populated microstates rather than being distracted by sparsely populated states. We demonstrate the ability of this algorithm to discover the major metastable states on two model systems, the alanine dipeptide and TrpZip2.

preprint2013arXiv

Rigidity of Infinite Hexagonal Triangulation of the Plane

In this paper, we consider the rigidity problem of the infinite hexagonal triangulation of the plane under the piecewise linear conformal changes introduced by Luo in [5]. Our result shows that if a geometric hexagonal triangulation of the plane is PL conformal to the regular hexagonal triangulation and all inner angles are in $[δ, π/2 -δ]$ for any constant $δ> 0$, then it is the regular hexagonal triangulation. This partially solves a conjecture of Luo [4]. The proof uses the concept of quasi-harmonic functions to unfold the properties of the mesh.

preprint2013arXiv

Stable all nitrogen metallic salt at terapascal pressures

The phase diagram and equation of state of dense nitrogen are of interest in understanding the fundamental physics and chemistry under extreme conditions, including planetary processes, and in discovering new materials. We predict several stable phases of nitrogen at multi-TPa pressures, including a P4/nbm structure consisting of partially charged N2 pairs and N5 tetrahedra, which is stable in the range 2.5-6.8 TPa. This is followed by a modulated layered structure between 6.8 and 12.6 TPa, which also exhibits a significant charge transfer. The P4/nbm metallic nitrogen salt and the modulated structure are stable at high pressures and temperatures, and they exhibit strongly ionic features and charge density distortions, which is unexpected in an element under such extreme conditions and could represent a new class of nitrogen materials. The P-T phase diagram of nitrogen at TPa pressures is investigated using quasiharmonic phonon calculations and ab initio molecular dynamics simulations.

preprint2013arXiv

Variational Principles for Minkowski Type Problems, Discrete Optimal Transport, and Discrete Monge-Ampere Equations

In this paper, we develop several related finite-dimensional variational principles for discrete optimal transport (DOT), Minkowski type problems for convex polytopes and the discrete Monge-Ampere equation (DMAE). A link between discrete optimal transport, the discrete Monge-Ampere equation and the power diagram in computational geometry is established.

preprint2012arXiv

Dynamics of quantum entanglement in the reservoir with memory effects

The non-Markovian dynamics of quantum entanglement is studied with the Shabani-Lidar master equation when one of the entangled quantum systems is coupled to a local reservoir with memory effects. The completely positive reduced dynamical map can be constructed in the Kraus representation. Quantum entanglement decays more slowly in the non-Markovian environment. The decoherence time for quantum entanglement can be markedly increased by changing the memory kernel. It is found that the entanglement sudden death between the quantum systems and the entanglement sudden birth between the system and reservoir occur at different instants.

preprint2011arXiv

Persistence and eventual demise of oxygen molecules at terapascal pressures

Computational searches for structures of solid oxygen under pressures in the multi-TPa range have been carried out using density-functional-theory methods. We find that molecular oxygen persists to about 1.9 TPa, at which point it transforms into a semiconducting square spiral-like polymeric structure (I41/acd) with a band gap of about 3.0 eV. Solid oxygen forms a metallic zig-zag chain-like structure (Cmcm) at about 3.0 TPa, but the chains in each layer gradually merge as the pressure is increased, and a structure of Fmmm symmetry forms at about 9.5 TPa in which each atom has four nearest neighbors. The superconducting properties of molecular oxygen do not vary much with compression, although the structure becomes more symmetric. The electronic properties of oxygen have a complex evolution with pressure, swapping between insulating, semiconducting and metallic.

152 published item(s)

Are eHMIs always helpful? Investigating how eHMIs interfere with pedestrian behavior on multi-lane streets: An eye-tracking virtual reality experiment

MobileGeo: Exploring Hierarchical Knowledge Distillation for Resource-Efficient Cross-view Drone Geo-Localization

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos

Quasi-invariant theorem on the Gaussian path space

Dynamic Grained Encoder for Vision Transformers

Pressure-Induced Superconductivity in Topological Heterostructure (PbSe)5(Bi2Se3)6

XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks

A Survey on Neural Open Information Extraction: Current Status and Future Directions

A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

Anchor DETR: Query Design for Transformer-Based Object Detection

Boosting Black-Box Adversarial Attacks with Meta Learning

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

Data-driven Self-triggered Control via Trajectory Prediction

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

Differentiable Architecture Search with Random Features

Distributed Momentum-based Frank-Wolfe Algorithm for Stochastic Optimization

Distributed stochastic projection-free solver for constrained optimization

Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

Efficient reversible data hiding via two layers of double-peak embedding

Focal Sparse Convolutional Networks for 3D Object Detection

FS6D: Few-Shot 6D Pose Estimation of Novel Objects

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Ill-posed Surface Emissivity Retrieval from Multi-Geometry Hyperspectral Images using a Hybrid Deep Neural Network

Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation

Instance-Conditional Knowledge Distillation for Object Detection

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

LGD: Label-guided Self-distillation for Object Detection

Linking-Enhanced Pre-Training for Table Semantic Parsing

MMChat: Multi-Modal Chat Dataset on Social Media

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

Progressive End-to-End Object Detection in Crowded Scenes

Real-time Object Detection for Streaming Perception

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

Simple Baselines for Image Restoration

SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation

StreamYOLO: Real-time Object Detection for Streaming Perception

ThunderNet: Towards Real-time Generic Object Detection

Towards Self-Supervised Category-Level Object Pose and Size Estimation

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns

Voxel Field Fusion for 3D Object Detection

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

Distributed proximal gradient algorithm for non-smooth non-convex optimization over time-varying networks

Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing

Dynamic ordering transitions in charged solid

End-to-End Human Object Interaction Detection with HOI Transformer

Enhancing Crystal Structure Prediction by decomposition methods based on graph theory

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Negative linear compressibility and unusual dynamic behaviors of NaB3

Numerical analysis of a deep learning formulation of elastic full waveform inversion with high order total variation regularization in different parameterization

Partially Diffusive Helium-Silica Compound in the Deep Interiors of Giant Planets

Resilient Control under Quantization and Denial-of-Service: Co-designing a Deadbeat Controller and Transmission Protocol

Superionic silica-water and silica-hydrogen compounds under high pressure

Using Long Short-Term Memory (LSTM) and Internet of Things (IoT) for localized surface temperature forecasting in an urban environment

Van Hove Singularity Arising from Mexican-Hat-Shaped Inverted Bands in the Topological Insulator Sn-doped Bi$_{1.1}$Sb$_{0.9}$Te$_{2}$S

A Big Data Enabled Channel Model for 5G Wireless Communication Systems

A Non-Stationary VVLC MIMO Channel Model for Street Corner Scenarios

A real-time multi-constraints obstacle avoidance method using LiDAR

A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

Angle-based Search Space Shrinking for Neural Architecture Search

Approximation algorithms for general cluster routing problem

Attentive Normalization for Conditional Image Generation

Content-Aware Unsupervised Deep Homography Estimation

Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks

Designing transformation-induced plasticity and twinning-induced plasticity Cr-Co-Ni medium entropy alloys: theory and experiment

Detection in Crowded Scenes: One Proposal, Multiple Predictions

Dimensionalities and multiplicities determination of crystal nets