Source author record

Xiang Li

Xiang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, astro-ph.HE, Artificial Intelligence, Computation and Language, Social and Information Networks, physics.soc-ph, Information Retrieval, astro-ph.CO, cond-mat.mtrl-sci, gr-qc, eess.AS, hep-ph, Robotics, Sound, eess.SP, physics.data-an, Populations and Evolution, astro-ph.GA, cond-mat.str-el, Cryptography and Security, cs.CY, eess.IV, math.OC, physics.comp-ph, astro-ph.IM, Computational Geometry, cond-mat.dis-nn, cond-mat.mes-hall, cond-mat.supr-con, Databases, Information Theory, math.IT, Networking and Internet Architecture, physics.app-ph, physics.optics, quant-ph, Applications, astro-ph.SR, Biological Physics, Biomolecules, Cell Behavior, Computer Science and Game Theory, cond-mat.quant-gas, cond-mat.soft, cond-mat.stat-mech, Data Structures and Algorithms, eess.SY, hep-ex, hep-th, Human-Computer Interaction, math-ph, math.AP, math.DG, math.MP, Methodology, Multiagent Systems, Neural and Evolutionary Computing, nlin.AO, nucl-th, physics.acc-ph, physics.atm-clus, physics.atom-ph, physics.chem-ph, physics.gen-ph, physics.ins-det, physics.plasm-ph, Software Engineering, Systems and Control

Catalog footprint

What is connected

161 works

69 topics

4 close collaborators

preprint, 2026, arXiv

MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

Medical vision-language models (VLMs) are usually evaluated on intact image-question pairs, but trustworthy clinical use requires a stronger property: a model must recognise when the evidential basis for an answer has failed. We study this through silent failures under perturbed evidence, where a vision-required medical question is paired with a false premise, wording perturbation, knowledge-only rewrite, or ROI-corrupted image, yet the model returns a fluent non-refusal answer. We introduce MedVIGIL, a 300-case evaluation suite drawn from four public medical VQA sources, supervised end to end by four board-certified radiologists: every gold answer, refusal option, candidate-answer set, paraphrase, false-premise trap, ROI box, and clinical risk tier is clinician-authored. Two attending radiologists annotate every case in parallel, a senior radiologist consolidates the released manifest, and a separate fourth radiologist independent of construction answers every probe to provide the human reference baseline. The release contains 2,556 MCQ probes, 240 counterfactual triplets, physician-adjudicated risk-tier and answerability flags, ROI boxes, and a paired open-ended variant. We report seven correctness-conditioned audit metrics that summarise into the MedVIGIL Composite Score (MCS), and audit 16 vision-capable models plus two text-only baselines. The independent radiologist scores MCS 83.3 at silent-failure rate 5.8%, leaving a 14.1-point composite headroom above the strongest audited model (Claude Opus 4.7 at 69.2). The benchmark and evaluation harness are publicly released.

preprint, 2024, arXiv

Non-aligned supervision for Real Image Dehazing

Removing haze from real-world images is challenging due to unpredictable weather conditions, resulting in the misalignment of hazy and clear image pairs. In this paper, we propose an innovative dehazing framework that operates under non-aligned supervision. This framework is grounded in the atmospheric scattering model, and consists of three interconnected networks: dehazing, airlight, and transmission networks. In particular, we explore a non-alignment scenario in which a clear reference image, unaligned with the input hazy image, is utilized to supervise the dehazing network. To implement this, we present a multi-scale reference loss that compares the feature representations between the reference image and the dehazed output. Our scenario makes it easier to collect hazy/clear image pairs in real-world environments, even under conditions of misalignment and shifted views. To showcase the effectiveness of our scenario, we have collected a new hazy dataset, called "Phone-Hazy", including 415 image pairs captured by a mobile phone in both rural and urban areas. Furthermore, we introduce a self-attention network based on mean and variance for modeling real infinite airlight, using the dark channel prior as positional guidance. Additionally, a channel attention network is employed to estimate the three-channel transmission. Experimental results demonstrate the superior performance of our framework over existing state-of-the-art techniques in the real-world image dehazing task. Phone-Hazy and code will be available at https://fanjunkai1.github.io/projectpage/NSDNet/index.html.
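The atmospheric scattering model that the abstract above builds on is commonly written as I = J * t + A * (1 - t), where I is the hazy observation, J the clear scene, t the transmission, and A the airlight. The following is a minimal sketch of that standard model only, not the paper's networks: the function names, the scalar airlight, and the `t_min` clamp are illustrative assumptions, whereas the paper estimates a three-channel transmission and a learned airlight.

```python
def synthesize_haze(clear, transmission, airlight):
    """Apply I = J * t + A * (1 - t) to each pixel value (assumed scalar airlight)."""
    return [j * t + airlight * (1.0 - t) for j, t in zip(clear, transmission)]


def recover_clear(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    t_min is a hypothetical clamp that avoids dividing by a near-zero
    transmission in dense haze.
    """
    return [
        (i - airlight) / max(t, t_min) + airlight
        for i, t in zip(hazy, transmission)
    ]


clear = [0.2, 0.5, 0.9]          # toy pixel intensities in [0, 1]
transmission = [0.8, 0.5, 0.3]   # per-pixel transmission
airlight = 0.95

hazy = synthesize_haze(clear, transmission, airlight)
restored = recover_clear(hazy, transmission, airlight)
```

When the transmission and airlight are known exactly and t stays above `t_min`, the inversion recovers the clear values; in practice both quantities must be estimated, which is the role of the transmission and airlight networks described in the abstract.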

preprint, 2023, arXiv

Designing optimal linear detectors -- a bottom-up approach

This paper develops a systematic approach to realising linear detectors with an optimised sensitivity, allowing for the detection of extremely weak signals. First, general constraints are derived on a specific class of input-output transfer functions of a linear detector. Then a physical realization of transfer functions in that class is found using the quantum network synthesis technique, which allows for the inference of the physical setup directly from the input-output transfer function. By exploring a minimal realization, which has the minimum number of internal modes, it is shown that the optimal such detectors are internal squeezing schemes. Then, by investigating non-minimal realizations, motivated by parity-time symmetric systems, a quantum non-demolition measurement is systematically recovered.

preprint, 2023, arXiv

Differentiate ChatGPT-generated and Human-written Medical Texts

Background: Large language models such as ChatGPT are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the Internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to healthcare and the general public. Objective: This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and generated by ChatGPT, and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT. Results: Medical texts written by humans are more concrete, more diverse, and typically contain more useful information, while medical texts generated by ChatGPT pay more attention to fluency and logic, and usually express general terminologies rather than effective information specific to the context of the problem. A BERT-based model can effectively detect medical texts generated by ChatGPT, and the F1 exceeds 95%.
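The F1 figure quoted in the abstract above is the harmonic mean of precision and recall. As a small worked sketch of that metric, using invented toy labels rather than the paper's data (1 marks a ChatGPT-generated text, 0 a human-written one, and the function name is an assumption):

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * P * R / (P + R) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Toy labels, purely illustrative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Here two of three generated texts are detected and one human text is misflagged, so precision and recall are both 2/3 and the F1 is 2/3 as well.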

preprint, 2023, arXiv

DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution

We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution. Our method explores the properties of Transformers while having low computational costs. Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently. In addition, we note that existing Transformers usually explore all similarities of the tokens between the queries and keys for the feature aggregation. However, not all the tokens from the queries are relevant to those in keys, so using all the similarities does not effectively facilitate the high-resolution image reconstruction. To overcome this problem, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values so that the most useful global features can be better utilized for the high-resolution image reconstruction. We develop a hybrid dynamic-Transformer block (HDTB) that integrates the MHDLSA and SparseGSA for both local and global feature exploration. To ease the network training, we formulate the HDTBs into a residual hybrid dynamic-Transformer group (RHDTG). By embedding the RHDTGs into an end-to-end trainable network, we show that our proposed method has fewer network parameters and lower computational costs while achieving competitive performance against state-of-the-art ones in terms of accuracy. More information is available at https://neonleexiang.github.io/DLGSANet/

preprint, 2023, arXiv

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

We investigate whether self-supervised learning (SSL) can improve online reinforcement learning (RL) from pixels. We extend the contrastive reinforcement learning framework (e.g., CURL) that jointly optimizes SSL and RL losses and conduct extensive experiments with various self-supervised losses. Our observations suggest that the existing SSL framework for RL fails to bring meaningful improvement over the baselines only taking advantage of image augmentation when the same amount of data and augmentation is used. We further perform evolutionary searches to find the optimal combination of multiple self-supervised losses for RL, but find that even such a loss combination fails to meaningfully outperform the methods that only utilize carefully designed image augmentations. After evaluating these approaches together in multiple different environments including a real-world robot environment, we confirm that no single self-supervised loss or image augmentation method can dominate all environments and that the current framework for joint optimization of SSL and RL is limited. Finally, we conduct an ablation study on multiple factors and demonstrate the properties of representations learned with different approaches.

preprint, 2023, arXiv

LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving

Image instance segmentation is a fundamental research topic in autonomous driving, which is crucial for scene understanding and road safety. Advanced learning-based approaches often rely on the costly 2D mask annotations for training. In this paper, we present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), which leverages the off-the-shelf 3D data, i.e., Point Cloud, together with the 3D boxes, as natural weak supervisions for training the 2D image instance segmentation models. Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of the dense 2D masks. In detail, LWSIS consists of two crucial modules, Point Label Assignment (PLA) and Graph-based Consistency Regularization (GCR). The former module aims to automatically assign the 3D point cloud as 2D point-wise labels, while the latter further refines the predictions by enforcing geometry and appearance consistency of the multimodal data. Moreover, we conduct a secondary instance segmentation annotation on the nuScenes, named nuInsSeg, to encourage further research on multimodal perception tasks. Extensive experiments on the nuInsSeg, as well as the large-scale Waymo, show that LWSIS can substantially improve existing weakly supervised segmentation models by only involving 3D data during training. Additionally, LWSIS can also be incorporated into 3D object detectors like PointPainting to boost the 3D detection performance for free. The code and dataset are available at https://github.com/Serenos/LWSIS.

preprint, 2023, arXiv

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions. In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. Our approach first extracts StAR-representations by self-attending image state patches, action, and reward tokens within a short temporal window. These are then combined with pure image state representations -- extracted as convolutional features, to perform self-attention over the whole sequence. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings. StARformer is also more compliant with longer sequences of inputs. Our code is available at https://github.com/elicassion/StARformer.

preprint, 2023, arXiv

Towards Spatial Equilibrium Object Detection

Semantic objects are unevenly distributed over images. In this paper, we study the spatial disequilibrium problem of modern object detectors and propose to quantify this "spatial bias" by measuring the detection performance over zones. Our analysis surprisingly shows that the spatial imbalance of objects has a great impact on the detection performance, limiting the robustness of detection applications. This motivates us to design a more generalized measurement, termed Spatial equilibrium Precision (SP), to better characterize the detection performance of object detectors. Furthermore, we also present a spatial equilibrium label assignment (SELA) to alleviate the spatial disequilibrium problem by injecting the prior spatial weight into the optimization process of detectors. Extensive experiments on PASCAL VOC, MS COCO, and 3 application datasets on face mask/fruit/helmet images demonstrate the advantages of our method. Our findings challenge the conventional sense of object detectors and show the indispensability of spatial equilibrium. We hope these discoveries would stimulate the community to rethink how an excellent object detector should be. All the source code, evaluation protocols, and the tutorials are publicly available at https://github.com/Zzh-tju/ZoneEval

preprint, 2022, arXiv

2D Toda $τ$ Functions, Weighted Hurwitz Numbers and the Cayley Graph: Determinant Representation and Recursion Formula

We generalize the determinant representation of the KP $τ$ functions to the case of the 2D Toda $τ$ functions. The generating functions for the weighted Hurwitz numbers are a parametric family of 2D Toda $τ$ functions, for which we give a determinant representation of weighted Hurwitz numbers. We then obtain a finite-dimensional equation system for the weighted Hurwitz numbers $H^d_{G}(σ,ω)$ with the same dimension $|σ|=|ω|=n$. Using this equation system, we calculate the values of the weighted Hurwitz numbers with dimension $0,\,1,\,2$ and give a recursion formula for calculating the higher-dimensional weighted Hurwitz numbers. For any given weighted generating function $G(z)$, the weighted Hurwitz number degenerates into the Hurwitz numbers when $d=0$. We get a matrix representation for the Hurwitz numbers. The generating functions of weighted paths in the Cayley graph of the symmetric group are a parametric family of 2D Toda $τ$ functions, for which we obtain a determinant representation of weighted paths in the Cayley graph.

preprint, 2022, arXiv

A density peaks clustering algorithm with sparse search and K-d tree

Density peaks clustering (DPC) has become a rising star among clustering algorithms because of its simplicity and practicality. However, it has one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Firstly, a sparse distance matrix is calculated by using a K-d tree to replace the original full rank distance matrix, so as to accelerate the calculation of local density. Secondly, a sparse search strategy is proposed to accelerate the computation of relative-separation with the intersection between the set of $k$ nearest neighbors and the set consisting of the data points with larger local density for any data point. Furthermore, a second-order difference method for decision values is adopted to determine the cluster centers adaptively. Finally, experiments are carried out on datasets with different distribution characteristics, in comparison with six other state-of-the-art clustering algorithms. It is shown that the algorithm can effectively reduce the computational complexity of the original DPC from $O(n^2K)$ to $O(n(n^{1-1/K}+k))$. Especially for larger datasets, the efficiency gain is more remarkable. Moreover, the clustering accuracy is also improved to a certain extent. Therefore, it can be concluded that the overall performance of the newly proposed algorithm is excellent.
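For orientation, the two quantities the abstract above accelerates, local density and relative separation, can be sketched in their original brute-force form. This is the quadratic baseline with a full distance matrix, not the paper's K-d tree and sparse-search variant; the function name, the cutoff `d_c`, and the use of the product of the two quantities as a decision value follow the common DPC formulation and are assumptions here.

```python
import math


def density_peaks(points, d_c):
    """Brute-force DPC quantities for a list of coordinate tuples.

    rho[i]   : number of other points closer than the cutoff d_c
    delta[i] : distance to the nearest point with strictly higher density
               (for a highest-density point: its largest distance)
    gamma[i] : decision value rho[i] * delta[i]; large values suggest centers
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]  # full O(n^2) matrix
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


# Two small square clusters, each with a dense middle point (indices 4 and 9).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
rho, delta, gamma = density_peaks(points, d_c=1.0)
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

The full distance matrix is the cost the paper targets: replacing it with neighbor queries on a K-d tree avoids computing most pairwise distances.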

preprint, 2022, arXiv

A Neural Network Architecture for Program Understanding Inspired by Human Behaviors

Program understanding is a fundamental task in program language processing. Despite the success, existing works fail to take human behaviors as reference in understanding programs. In this paper, we consider human behaviors and propose the PGNN-EK model that consists of two main components. On the one hand, inspired by the "divide-and-conquer" reading behaviors of humans, we present a partitioning-based graph neural network model PGNN on the upgraded AST of codes. On the other hand, to characterize human behaviors of resorting to other resources to help code comprehension, we transform raw codes with external knowledge and apply pre-training techniques for information extraction. Finally, we combine the two embeddings generated from the two components to output code embeddings. We conduct extensive experiments to show the superior performance of PGNN-EK on the code summarization and code clone detection tasks. In particular, to show the generalization ability of our model, we release a new dataset that is more challenging for code clone detection and could advance the development of the community. Our codes and data are publicly available at https://github.com/RecklessRonan/PGNN-EK.

preprint, 2022, arXiv

Ab initio calculation of real solids via neural network ansatz

Neural networks have been applied to tackle many-body electron correlations for small molecules and physical models in recent years. Here we propose a new architecture that extends molecular neural networks with the inclusion of periodic boundary conditions to enable ab initio calculation of real solids. The accuracy of our approach is demonstrated in four different types of systems, namely the one-dimensional periodic hydrogen chain, the two-dimensional graphene, the three-dimensional lithium hydride crystal, and the homogeneous electron gas, where the obtained results, e.g. total energies, dissociation curves, and cohesive energies, outperform many traditional ab initio methods and reach the level of the most accurate approaches. Moreover, electron densities of typical systems are also calculated to provide physical intuition of various solids. Our method of extending a molecular neural network to periodic systems can be easily integrated into other neural network structures, highlighting a promising future of ab initio solution of more complex solid systems using neural network ansatz, and more generally endorsing the application of machine learning in materials simulation and condensed matter physics.

preprint, 2022, arXiv

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

Text normalization, defined as a procedure transforming non-standard words to spoken-form words, is crucial to the intelligibility of synthesized speech in text-to-speech systems. Rule-based methods that do not consider context cannot eliminate ambiguity, whereas sequence-to-sequence neural network based methods suffer from unexpected and uninterpretable errors. Recently proposed hybrid systems treat the rule-based model and the neural model as two cascaded sub-modules, where the limited interaction capability prevents the neural network model from fully utilizing the expert knowledge contained in the rules. Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network, both of which contribute to the superior performance of the proposed model on the text normalization task. We also release the first publicly accessible large-scale dataset for Chinese text normalization. Our proposed model has achieved excellent results on this dataset.

preprint, 2022, arXiv

Angular emission patterns of remnant black holes

The gravitational radiation from the ringdown of a binary black hole merger is described by the solution of the Teukolsky equation, which predicts both the temporal dependence and the angular distribution of the emission. Many studies have explored the temporal feature of the ringdown wave through black hole spectroscopy. In this work, we further study the spatial distribution, by introducing a global fitting procedure over both temporal and spatial dependences, to propose a more complete test of General Relativity. We show that spin-weighted spheroidal harmonics are the better representation of the ringdown angular emission patterns compared to spin-weighted spherical harmonics. The differences are distinguishable in numerical relativity waveforms. We also study the correlation between progenitor binary properties and the excitation of quasinormal modes, including higher-order angular modes, overtones, prograde and retrograde modes. Specifically, we show that the excitation of retrograde modes is dominant when the remnant spin is anti-aligned with the binary orbital angular momentum. This study seeks to provide an analytical strategy and inspire the future development of ringdown test using real gravitational wave events.

preprint, 2022, arXiv

ARROCH: Augmented Reality for Robots Collaborating with a Human

Human-robot collaboration frequently requires extensive communication, e.g., using natural language and gestures. Augmented reality (AR) has provided an alternative way of bridging the communication gap between robots and people. However, most current AR-based human-robot communication methods are unidirectional, focusing on how the human adapts to robot behaviors, and are limited to single-robot domains. In this paper, we develop AR for Robots Collaborating with a Human (ARROCH), a novel algorithm and system that supports bidirectional, multi-turn, human-multi-robot communication in indoor multi-room environments. The human can see through obstacles to observe the robots' current states and intentions, and provide feedback, while the robots' behaviors are then adjusted toward human-multi-robot teamwork. Experiments have been conducted with real robots and human participants using collaborative delivery tasks. Results show that ARROCH outperformed a standard non-AR approach in both user experience and teamwork efficiency. In addition, we have developed a novel simulation environment using Unity (for AR and human simulation) and Gazebo (for robot simulation). Results in simulation demonstrate ARROCH's superiority over AR-based baselines in human-robot collaboration.

preprint, 2022, arXiv

Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding

We propose an effective two-stage approach to tackle the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task. In the first stage, we propose an Augmented 2D Temporal Adjacent Network (Augmented 2D-TAN) to temporally ground the target moment corresponding to the given description. Primarily, we improve the original 2D-TAN in two respects: First, a temporal context-aware Bi-LSTM Aggregation Module is developed to aggregate clip-level representations, replacing the original max-pooling. Second, we employ a Random Concatenation Augmentation (RCA) mechanism during the training phase. In the second stage, we use a pretrained MDETR model to generate per-frame bounding boxes via language query, and design a set of hand-crafted rules to select the best matching bounding box output by MDETR for each frame within the grounded moment.

preprint, 2022, arXiv

AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System

Industrial search and recommendation systems mostly follow the classic multi-stage information retrieval paradigm: matching, pre-ranking, ranking, and re-ranking stages. To account for system efficiency, simple vector-product based models are commonly deployed in the pre-ranking stage. Recent works consider distilling the high knowledge of large ranking models to small pre-ranking models for better effectiveness. However, two major challenges in pre-ranking system still exist: (i) without explicitly modeling the performance gain versus computation cost, the predefined latency constraint in the pre-ranking stage inevitably leads to suboptimal solutions; (ii) transferring the ranking teacher's knowledge to a pre-ranking student with a predetermined handcrafted architecture still suffers from the loss of model performance. In this work, a novel framework AutoFAS is proposed which jointly optimizes the efficiency and effectiveness of the pre-ranking model: (i) AutoFAS for the first time simultaneously selects the most valuable features and network architectures using Neural Architecture Search (NAS) technique; (ii) equipped with ranking model guided reward during NAS procedure, AutoFAS can select the best pre-ranking architecture for a given ranking teacher without any computation overhead. Experimental results in our real world search system show AutoFAS consistently outperforms the previous state-of-the-art (SOTA) approaches at a lower computing cost. Notably, our model has been adopted in the pre-ranking module in the search system of Meituan, bringing significant improvements.

preprint, 2022, arXiv

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

Visual grounding is a task that aims to locate a target object according to a natural language expression. As a multi-modal task, feature interaction between textual and visual inputs is vital. However, previous solutions mainly handle each modality independently before fusing them together, which does not take full advantage of relevant textual information while extracting visual features. To better leverage the textual-visual relationship in visual grounding, we propose a Query-conditioned Convolution Module (QCM) that extracts query-aware visual features by incorporating query information into the generation of convolutional kernels. With our proposed QCM, the downstream fusion module receives visual features that are more discriminative and focused on the desired object described in the expression, leading to more accurate predictions. Extensive experiments on three popular visual grounding datasets demonstrate that our method achieves state-of-the-art performance. In addition, the query-aware visual features are informative enough to achieve comparable performance to the latest methods when directly used for prediction without further multi-modal fusion.

preprint, 2022, arXiv

Benchmarking of DL Libraries and Models on Mobile Devices

Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast inference of on-device DL, DL libraries play a critical role as algorithms and hardware do. Unfortunately, no prior work ever dives deep into the ecosystem of modern DL libs and provides quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libs and 15 diversified DL models. We then perform extensive experiments on 10 mobile devices, which help reveal a complete landscape of the current mobile DL libs ecosystem. For example, we find that the best-performing DL lib is severely fragmented across different models and hardware, and the gap between those DL libs can be rather huge. In fact, the impacts of DL libs can overwhelm the optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, atop the observations, we summarize practical implications to different roles in the DL lib ecosystem.

preprint, 2022, arXiv

CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning

Graph similarity learning refers to calculating the similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. As most of the existing graph neural networks yield effective graph representations of a single graph, little effort has been made for jointly learning two graph representations and calculating their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based, which ignores the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former is used to strengthen the consistency of node representations in two views. The latter is utilized to identify node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experiment results show that the proposed new approach is superior to the state-of-the-art methods in graph similarity learning downstream tasks.

preprint, 2022, arXiv

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

As deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, auto-encoder and graph neural networks. However, the majority of existing collaborative filtering systems are not well designed to handle missing data. Particularly, in order to inject the negative signals in the training phase, these solutions largely rely on negative sampling from unobserved user-item interactions and simply treat them as negative instances, which degrades recommendation performance. To address these issues, we develop a Collaborative Reflection-Augmented Autoencoder Network (CRANet) that is capable of exploring transferable knowledge from observed and unobserved user-item interactions. The network architecture of CRANet is formed of an integrative structure with a reflective receptor network and an information fusion autoencoder module, which endows our recommendation framework with the ability to encode users' implicit pairwise preferences on both interacted and non-interacted items. Additionally, a parametric regularization-based tied-weight scheme is designed to perform robust joint training of the two-stage CRANet model. Finally, we experimentally validate CRANet on four diverse benchmark datasets corresponding to two recommendation tasks, to show that debiasing the negative signals of user-item interactions improves the performance as compared to various state-of-the-art recommendation techniques. Our source code is available at https://github.com/akaxlh/CRANet.

preprint2022arXiv

Comment on "Comparison of six simulation codes for positive streamers in air" (Plasma Sources Sci. Technol. 27 (2018) 095002)

Recently, a comparison of six codes for streamer discharge simulations was performed in [1]. In this comment, we discuss the large differences between the results obtained by the different codes using the same deterministic model, and raise questions about the convergence of the codes and the minimum spatial resolution required for a converged result.

preprint2022arXiv

Distributed Online Optimization in Time-Varying Unbalanced Networks without Explicit Subgradients

This paper studies a distributed online constrained optimization problem over time-varying unbalanced digraphs without explicit subgradients. In sharp contrast to the existing algorithms, we design a novel consensus-based distributed online algorithm with a local randomized zeroth-order oracle and then rescale the oracle by constructing row-stochastic matrices, which aims to address the unbalancedness of time-varying digraphs. Under mild conditions, the average dynamic regret over a time horizon is shown to asymptotically converge at a sublinear rate provided that the accumulated variation grows sublinearly with a specific order. Moreover, the counterpart of the proposed algorithm when subgradients are available is also provided, along with its dynamic regret bound, which reflects that the convergence of our algorithm is essentially not affected by the zeroth-order oracle. Simulations on a distributed target tracking problem and a dynamic sparse signal recovery problem in sensor networks are employed to demonstrate the effectiveness of the proposed algorithm.
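To make the idea of a randomized zeroth-order oracle concrete, the following is a minimal sketch of one common construction: a two-point estimator that probes the objective along a random unit direction. This is an assumption for illustration only. The abstract does not specify the paper's oracle, and the sketch omits the consensus step and the row-stochastic rescaling entirely. All function names are hypothetical.

```python
import math
import random

def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from two function values only.

    Samples a direction u uniformly on the unit sphere and returns
    d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u.
    """
    d = len(x)
    raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(r * r for r in raw))
    u = [r / norm for r in raw]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]

def averaged_estimate(f, x, delta, num_samples, seed=0):
    """Average many single-direction estimates to reduce variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_gradient_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]

def quadratic(x):
    """Test objective with known gradient 2*x."""
    return sum(xi * xi for xi in x)

# The true gradient at (1, -2) is (2, -4); the average should land close to it.
estimate = averaged_estimate(quadratic, [1.0, -2.0], delta=1e-3, num_samples=20000)
```

A single estimate is noisy, so the example averages many of them; an online algorithm would typically use one estimate per round and rely on the step size to control the noise.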

preprint2022arXiv

DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection

The Mean-Teacher (MT) scheme is widely adopted in semi-supervised object detection (SSOD). In MT, the sparse pseudo labels, offered by the final predictions of the teacher (e.g., after Non Maximum Suppression (NMS) post-processing), are adopted for the dense supervision for the student via hand-crafted label assignment. However, the sparse-to-dense paradigm complicates the pipeline of SSOD, and simultaneously neglects the powerful direct, dense teacher supervision. In this paper, we attempt to directly leverage the dense guidance of teacher to supervise student training, i.e., the dense-to-dense paradigm. Specifically, we propose the Inverse NMS Clustering (INC) and Rank Matching (RM) to instantiate the dense supervision, without the widely used, conventional sparse pseudo labels. INC leads the student to group candidate boxes into clusters in NMS as the teacher does, which is implemented by learning grouping information revealed in NMS procedure of the teacher. After obtaining the same grouping scheme as the teacher via INC, the student further imitates the rank distribution of the teacher over clustered candidates through Rank Matching. With the proposed INC and RM, we integrate Dense Teacher Guidance into Semi-Supervised Object Detection (termed DTG-SSOD), successfully abandoning sparse pseudo labels and enabling more informative learning on unlabeled data. On COCO benchmark, our DTG-SSOD achieves state-of-the-art performance under various labelling ratios. For example, under 10% labelling ratio, DTG-SSOD improves the supervised baseline from 26.9 to 35.9 mAP, outperforming the previous best method Soft Teacher by 1.9 points.
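The "grouping information revealed in NMS" can be pictured with a small sketch of standard greedy NMS that also records which kept box absorbed each suppressed candidate. This illustrates only the grouping a teacher's NMS would produce; it is not the paper's INC or Rank Matching loss, and the function names and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_with_groups(boxes, scores, iou_threshold=0.5):
    """Greedy NMS that returns kept indices and a candidate -> kept-box map."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    group_of = {}  # candidate index -> index of the kept box that owns it
    kept = []
    for i in order:
        if i in group_of:
            continue  # already suppressed by a higher-scoring box
        kept.append(i)
        group_of[i] = i
        for j in order:
            if j not in group_of and iou(boxes[i], boxes[j]) > iou_threshold:
                group_of[j] = i  # j is suppressed and joins i's cluster
    return kept, group_of

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31), (0, 0, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.95, 0.6]
kept, group_of = nms_with_groups(boxes, scores)
```

Here boxes 0, 1 and 4 fall into one cluster and boxes 2 and 3 into another; the cluster assignment, rather than only the two surviving boxes, is the kind of dense signal the abstract refers to.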

preprint2022arXiv

Dual Power Spectrum Manifold and Toeplitz HPD Manifold: Enhancement and Analysis for Matrix CFAR Detection

Recently, an innovative matrix CFAR detection scheme based on information geometry, also referred to as the geometric detector, has been developed rapidly and exhibits distinct advantages in several practical applications. These advantages benefit from the geometry of the Toeplitz Hermitian positive definite (HPD) manifold $\mathcal{M}_{\mathcal{T}H_{++}}$, but the sophisticated geometry also results in some challenges for geometric detectors, such as the implementation of the enhanced detector to improve the SCR (signal-to-clutter ratio) and the analysis of the detection performance. To meet these challenges, this paper develops the dual power spectrum manifold $\mathcal{M}_{\text{P}}$ as the dual space of $\mathcal{M}_{\mathcal{T}H_{++}}$. For each affine invariant geometric measure on $\mathcal{M}_{\mathcal{T}H_{++}}$, we show that there exists an equivalent function, named the induced potential function, on $\mathcal{M}_{\text{P}}$. With the induced potential function, the dissimilarity between two matrices can be measured on $\mathcal{M}_{\text{P}}$, and the geometric detectors can be reformulated in a form related to the power spectrum. The induced potential function leads to two contributions: 1) The enhancement of the geometric detector, which is formulated as an optimization problem concerning $\mathcal{M}_{\mathcal{T}H_{++}}$, is transformed to an equivalent and simpler optimization on $\mathcal{M}_{\text{P}}$. In the presented example of the enhancement, the closed-form solution, instead of the gradient descent method, is provided through the equivalent optimization. 2) The detection performance is analyzed based on $\mathcal{M}_{\text{P}}$, and the advantageous characteristics, which benefit the detection performance, can be deduced by analyzing the power spectrum corresponding to the maximal point of the induced potential function.

preprint2022arXiv

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information

Fine-grained image classification is a challenging computer vision task where various species share similar visual appearances, resulting in misclassification if based merely on visual clues. Therefore, it is helpful to leverage additional information, e.g., the locations and dates at which the images were taken, which are easily accessible but rarely exploited. In this paper, we first demonstrate that existing multimodal methods fuse multiple features only on a single dimension, which provides insufficient help for feature discrimination. To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension. The dynamic MLP is an efficient structure parameterized by the learned embeddings of variable locations and dates. It can be regarded as an adaptive nonlinear projection for generating more discriminative image representations in visual tasks. To the best of our knowledge, it is the first attempt to explore the idea of dynamic networks to exploit multimodal information in fine-grained image classification tasks. Extensive experiments demonstrate the effectiveness of our method. The t-SNE algorithm visually indicates that our technique improves the recognizability of image representations that are visually similar but belong to different categories. Furthermore, among published works across multiple fine-grained datasets, dynamic MLP consistently achieves SOTA results (https://paperswithcode.com/dataset/inaturalist) and takes third place in the iNaturalist challenge at FGVC8 (https://www.kaggle.com/c/inaturalist-2021/leaderboard). Code is available at https://github.com/ylingfeng/DynamicMLP.git

preprint2022arXiv

EEG based Emotion Recognition: A Tutorial and Review

Emotion recognition technology through analyzing the EEG signal is currently an essential concept in Artificial Intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, etc. Though there have been several works devoted to reviewing EEG-based emotion recognition, the content of these reviews needs to be updated. In addition, those works are either fragmented in content or only focus on specific techniques adopted in this area but neglect the holistic perspective of the entire technical routes. Hence, in this paper, we review from the perspective of researchers who try to take the first step on this topic. We review the recent representative works in the EEG-based emotion recognition research and provide a tutorial to guide the researchers to start from the beginning. The scientific basis of EEG-based emotion recognition in the psychological and physiological levels is introduced. Further, we categorize these reviewed works into different technical routes and illustrate the theoretical basis and the research motivation, which will help the readers better understand why those techniques are studied and employed. At last, existing challenges and future investigations are also discussed in this paper, which guides the researchers to decide potential future research directions.

preprint2022arXiv

Effects of homophily and heterophily on preferred-degree networks: mean-field analysis and overwhelming transition

We investigate the long-time properties of a dynamic, out-of-equilibrium network of individuals holding one of two opinions in a population consisting of two communities of different sizes. Here, while the agents' opinions are fixed, they have a preferred degree which leads them to endlessly create and delete links. Our evolving network is shaped by homophily/heterophily, which is a form of social interaction by which individuals tend to establish links with others having similar/dissimilar opinions. Using Monte Carlo simulations and a detailed mean-field analysis, we study how the sizes of the communities and the degree of homophily/heterophily affect the network structure. In particular, we show that when the network is subject to enough heterophily, an "overwhelming transition" occurs: individuals of the smaller community are overwhelmed by links from agents of the larger group, and their mean degree greatly exceeds the preferred degree. This and related phenomena are characterized by obtaining the network's total and joint degree distributions, as well as the fraction of links across both communities and that of agents having fewer edges than the preferred degree. We use our mean-field theory to discuss the network's polarization when the group sizes and level of homophily vary.

preprint2022arXiv

Enhanced ultrafast X-ray diffraction by transient resonances

Diffraction-before-destruction imaging with single ultrashort X-ray pulses has the potential to visualise non-equilibrium processes, such as chemical reactions, at the nanoscale with sub-femtosecond resolution in the native environment without the need for crystallization. Here, a nanospecimen partially diffracts a single X-ray flash before sample damage occurs. The structural information of the sample can be reconstructed from the coherent X-ray interference image. State-of-the-art spatial resolution of such snapshots from individual heavy element nanoparticles is limited to a few nanometers. Further improvement of spatial resolution requires higher image brightness, which is ultimately limited by bleaching effects of the sample. We compared snapshots from individual 100 nm Xe nanoparticles as a function of the X-ray pulse duration and incoming X-ray intensity in the vicinity of the Xe M-shell resonance. Surprisingly, images recorded with few-femtosecond and sub-femtosecond pulses are up to 10 times brighter than the static linear model predicts. Our Monte-Carlo simulation and statistical analysis of the entire data set confirm these findings and attribute the effect to transient resonances. Our simulation suggests that ultrafast form factor changes during the exposure can increase the brightness of X-ray images by several orders of magnitude. Our study guides the way towards imaging with an unprecedented combination of spatial and temporal resolution at the nanoscale.

preprint2022arXiv

Entire solutions of the generalized Hessian inequality

In this paper, we discuss a more general Hessian inequality $\sigma_{k}^{\frac{1}{k}}(\lambda(D_i (A\left(|Du|\right) D_j u)))\geq f(u)$ that includes the Laplacian, p-Laplacian, mean curvature, Hessian, and k-mean curvature operators, and provide a necessary and sufficient condition for global solvability, which can be regarded as a generalized Keller-Osserman condition.
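As a reading aid, the special cases named in the abstract can be written out explicitly. The choices of $A$ below are the standard ones for these operators and are stated here as an illustration, not quoted from the paper; the last line recalls the classical Keller-Osserman condition for the Laplacian case, under the usual assumption that $f$ is positive, continuous and nondecreasing.

```latex
% k = 1: \sigma_1 is the trace, so the inequality is in divergence form
\operatorname{div}\bigl(A(|Du|)\,Du\bigr) \ge f(u).

% Standard choices of A (illustrative):
A(t) \equiv 1:            \quad \Delta u \ge f(u)                         % Laplacian
A(t) = t^{p-2}:           \quad \operatorname{div}\bigl(|Du|^{p-2}Du\bigr) \ge f(u)   % p-Laplacian
A(t) = (1+t^{2})^{-1/2}:  \quad \operatorname{div}\!\left(\frac{Du}{\sqrt{1+|Du|^{2}}}\right) \ge f(u)  % mean curvature

% A \equiv 1 with general k: the k-Hessian inequality
\sigma_k^{1/k}\bigl(\lambda(D^{2}u)\bigr) \ge f(u).

% Classical Keller-Osserman condition for entire solutions of \Delta u \ge f(u):
\int^{\infty} \left( \int_{0}^{t} f(s)\,ds \right)^{-1/2} dt = \infty.
```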

preprint2022arXiv

Finding Global Homophily in Graph Neural Networks When Meeting Heterophily

We investigate graph neural networks on graphs with heterophily. Some existing methods amplify a node's neighborhood with multi-hop neighbors to include more nodes with homophily. However, it is a significant challenge to set personalized neighborhood sizes for different nodes. Further, for other homophilous nodes excluded in the neighborhood, they are ignored for information aggregation. To address these problems, we propose two models GloGNN and GloGNN++, which generate a node's embedding by aggregating information from global nodes in the graph. In each layer, both models learn a coefficient matrix to capture the correlations between nodes, based on which neighborhood aggregation is performed. The coefficient matrix allows signed values and is derived from an optimization problem that has a closed-form solution. We further accelerate neighborhood aggregation and derive a linear time complexity. We theoretically explain the models' effectiveness by proving that both the coefficient matrix and the generated node embedding matrix have the desired grouping effect. We conduct extensive experiments to compare our models against 11 other competitors on 15 benchmark datasets in a wide range of domains, scales and graph heterophilies. Experimental results show that our methods achieve superior performance and are also very efficient.

preprint2022arXiv

Forgery Attack Detection in Surveillance Video Streams Using Wi-Fi Channel State Information

Cybersecurity breaches expose surveillance video streams to forgery attacks, under which authentic streams are falsified to hide unauthorized activities. Traditional video forensics approaches can localize forgery traces using spatial-temporal analysis on relatively long video clips, but fall short in real-time forgery detection. Recent work correlates time-series camera and wireless signals to detect looped videos but cannot realize fine-grained forgery localization. To overcome these limitations, we propose Secure-Pose, which exploits the pervasive coexistence of surveillance and Wi-Fi infrastructures to defend against video forgery attacks in a real-time and fine-grained manner. We observe that coexisting camera and Wi-Fi signals convey common human semantic information and that forgery attacks on video streams will decouple this information correspondence. In particular, retrievable human pose features are first extracted from concurrent video and Wi-Fi channel state information (CSI) streams. Then, a lightweight detection network is developed to accurately discover forgery attacks and an efficient localization algorithm is devised to seamlessly track forgery traces in video streams. We implement Secure-Pose using one Logitech camera and two Intel 5300 NICs and evaluate it in different environments. Secure-Pose achieves a high detection accuracy of 98.7% and localizes abnormal objects under playback and tampering attacks.

preprint2022arXiv

Global Model Learning for Large Deformation Control of Elastic Deformable Linear Objects: An Efficient and Adaptive Approach

Robotic manipulation of deformable linear objects (DLOs) has broad application prospects in many fields. However, a key issue is to obtain the exact deformation models (i.e., how robot motion affects DLO deformation), which are hard to theoretically calculate and vary among different DLOs. Thus, shape control of DLOs is challenging, especially for large deformation control which requires global and more accurate models. In this paper, we propose a coupled offline and online data-driven method for efficiently learning a global deformation model, allowing for both accurate modeling through offline learning and further updating for new DLOs via online adaptation. Specifically, the model approximated by a neural network is first trained offline on random data, then seamlessly migrated to the online phase, and further updated online during actual manipulation. Several strategies are introduced to improve the model's efficiency and generalization ability. We propose a convex-optimization-based controller, and analyze the system's stability using the Lyapunov method. Detailed simulations and real-world experiments demonstrate that our method can efficiently and precisely estimate the deformation model, and achieve large deformation control of untrained DLOs in 2D and 3D dual-arm manipulation tasks better than the existing methods. It accomplishes all 24 tasks with different desired shapes on different DLOs in the real world, using only simulation data for the offline learning.

preprint2022arXiv

High-pressure control of optical nonlinearity in the polar Weyl semimetal TaAs

The transition metal monopnictide family of Weyl semimetals recently has been shown to exhibit anomalously strong second-order optical nonlinearity, which is theoretically attributed to a highly asymmetric polarization distribution induced by their polar structure. We experimentally test this hypothesis by measuring optical second harmonic generation (SHG) from TaAs across a pressure-tuned polar-to-nonpolar structural phase transition. Despite the high-pressure structure remaining noncentrosymmetric, the SHG yield is reduced by more than 60% by 20 GPa as compared to the ambient pressure value. By examining the pressure dependence of distinct groups of SHG susceptibility tensor elements, we find that the yield is primarily controlled by a single element that governs the response along the polar axis. Our results confirm a connection between the polar axis and the giant optical nonlinearity of Weyl semimetals and demonstrate pressure as a means to tune this effect in situ.

preprint2022arXiv

Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation

Recently, transformer-based image segmentation methods have achieved notable success against previous solutions. For video domains, however, how to effectively model temporal context with the attention of object instances across frames remains an open problem. In this paper, we propose an online video instance segmentation framework with a novel instance-aware temporal fusion method. We first leverage a representation, i.e., a latent code in the global context (instance code) and CNN feature maps, to represent instance- and pixel-level features. Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames. Specifically, we encode global instance-specific information in the instance code and build up inter-frame contextual fusion with hybrid attentions between the instance codes and CNN feature maps. Inter-frame consistency between the instance codes is further enforced with order constraints. By leveraging the learned hybrid temporal consistency, we are able to directly retrieve and maintain instance identities across frames, eliminating the complicated frame-wise instance matching in prior methods. Extensive experiments have been conducted on popular VIS datasets, i.e., Youtube-VIS-19/21. Our model achieves the best performance among all online VIS methods. Notably, our model also eclipses all offline methods when using the ResNet-50 backbone.

preprint2022arXiv

Hydrodynamic Relaxation in a Strongly Interacting Fermi Gas

We measure the free decay of a spatially periodic density profile in a normal fluid strongly interacting Fermi gas, which is confined in a box potential. This spatial profile is initially created in thermal equilibrium by a perturbing potential. After the perturbation is abruptly extinguished, the dominant spatial Fourier component exhibits an exponentially decaying (thermally diffusive) mode and a decaying oscillatory (first sound) mode, enabling independent measurement of the thermal conductivity and the shear viscosity directly from the time-dependent evolution.
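The two-mode decay described above can be summarized schematically. The symbols below (amplitudes $A$, $B$, rates $\Gamma_{\mathrm{th}}$, $\Gamma_{\mathrm{s}}$, frequency $\omega_{\mathrm{s}}$, phase $\phi$) are notation introduced here for illustration and are not taken from the paper; the abstract does not give the exact expressions linking the rates to the transport coefficients, so they are left unspecified.

```latex
% Schematic time evolution of the dominant Fourier component at wavevector q
\delta n(q,t) \;\approx\;
  A\, e^{-\Gamma_{\mathrm{th}} t}                                   % thermally diffusive mode
  \;+\; B\, e^{-\Gamma_{\mathrm{s}} t} \cos(\omega_{\mathrm{s}} t + \phi)   % first-sound mode

% Fitting both decay rates yields two independent constraints, which is what
% allows the thermal conductivity and the shear viscosity to be extracted separately.
```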

preprint2022arXiv

Influence of impurities on electronic structure in cuprate superconductors

Impurities are inherently present in cuprate superconductors, as cation substitution or intercalation is necessary for the introduction of charge carriers, and their influence on the electronic state is at the heart of a great debate in physics. Here, based on the microscopic octet scattering model, the influence of impurity scattering on the electronic structure of cuprate superconductors is investigated in terms of the self-consistent T-matrix approach. The impurity scattering self-energy is first evaluated in the Fermi-arc-tip approximation of the quasiparticle excitations and scattering processes, and the obtained results show that the decisive role played by the impurity scattering self-energy in the particle-hole channel is the further renormalization of the quasiparticle band structure with a reduced quasiparticle lifetime, while the impurity scattering self-energy in the particle-particle channel induces a strong deviation from the d-wave behaviour of the superconducting gap, leading to the existence of a finite gap over the entire electron Fermi surface. Moreover, these impurity scattering self-energies are employed to study the exotic features of the line-shape in the quasiparticle excitation spectrum and the autocorrelation of the quasiparticle excitation spectra, and the obtained results are then compared with the corresponding experimental data. The theory therefore also indicates that the unconventional features of the electronic structure in cuprate superconductors are generated by both the strong electron correlation and impurity scattering.

preprint2022arXiv

Learning a General Clause-to-Clause Relationships for Enhancing Emotion-Cause Pair Extraction

Emotion-cause pair extraction (ECPE) is an emerging task aiming to extract potential pairs of emotions and corresponding causes from documents. Previous approaches have focused on modeling the pair-to-pair relationship and achieved promising results. However, the clause-to-clause relationship, which fundamentally symbolizes the underlying structure of a document, is still in its research infancy. In this paper, we define a novel clause-to-clause relationship. To learn it applicably, we propose a general clause-level encoding model named EA-GAT comprising E-GAT and Activation Sort. E-GAT is designed to aggregate information from different types of clauses; Activation Sort leverages the individual emotion/cause prediction and the sort-based mapping to propel the clause to a more favorable representation. Since EA-GAT is a clause-level encoding model, it can be broadly integrated with any previous approach. Experimental results show that our approach has a significant advantage over all current approaches on the Chinese and English benchmark corpora, with an average of 2.1% and 1.03%.

preprint2022arXiv

Lexical Knowledge Internalization for Neural Dialog Generation

We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models. Instead of further conditioning the knowledge-grounded dialog (KGD) models on externally retrieved knowledge, we seek to integrate knowledge about each input token internally into the model's parameters. To tackle the challenge due to the large scale of lexical knowledge, we adopt the contrastive learning approach and create an effective token-level lexical knowledge retriever that requires only weak supervision mined from Wikipedia. We demonstrate the effectiveness and general applicability of our approach on various datasets and diversified model structures.

preprint2022arXiv

Mamba: a systematic software solution for beamline experiments at HEPS

To cater for the diverse experiment requirements at the High Energy Photon Source (HEPS) with often limited human resources, Bluesky is chosen as the basis for our software framework, Mamba. In our attempt to address Bluesky's lack of integrated GUIs, command injection with feedback is chosen as the main way for the GUIs to cooperate with the CLI; an RPC service is provided, which also covers functionalities unsuitable for command injection, as well as pushing of status updates. In order to fully support high-frequency applications like fly scans, Bluesky's support for asynchronous control is being improved; to support high-throughput experiments, Mamba Data Worker (MDW) is being developed to cover the complexity in asynchronous online data processing for these experiments. To systematically simplify the specification of metadata, scan parameters and data-processing graphs for each type of experiment, an experiment parameter generator (EPG) will be developed; experiment-specific modules to automate preparation steps will also be made. The integration of off-the-shelf code in Mamba for domain-specific needs is under investigation, and Mamba GUI Studio (MGS) is being developed to simplify the implementation and integration of GUIs.

preprint2022arXiv

Modeling Users' Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce Search

Modeling users' historical feedback is essential for Click-Through Rate Prediction in personalized search and recommendation. Existing methods usually only model users' positive feedback information such as click sequences, which neglects the context information of the feedback. In this paper, we propose a new perspective for context-aware user behavior modeling by including all products exposed on each page and the corresponding feedback as a contextualized page-wise feedback sequence. The intra-page context information and inter-page interest evolution can be captured to learn more specific user preferences. We design a novel neural ranking model RACP (i.e., Recurrent Attention over Contextualized Page sequence), which utilizes page-context aware attention to model the intra-page context. A recurrent attention process is used to model the cross-page interest convergence evolution as denoising the interest in the previous pages. Experiments on public and real-world industrial datasets verify our model's effectiveness.

preprint2022arXiv

Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion

In this paper, we formulate a potentially valuable panoramic depth completion (PDC) task, as panoramic 3D cameras often produce 360° depth with missing data in complex scenes. Its goal is to recover dense panoramic depths from raw sparse ones and panoramic RGB images. To deal with the PDC task, we train a deep network that takes both depth and image as inputs for the dense panoramic depth recovery. However, it faces a challenging optimization problem over the network parameters due to its non-convex objective function. To address this problem, we propose a simple yet effective approach termed M$^{3}$PT: multi-modal masked pre-training. Specifically, during pre-training, we simultaneously cover up patches of the panoramic RGB image and sparse depth with a shared random mask, then reconstruct the sparse depth in the masked regions. To the best of our knowledge, it is the first time that the effectiveness of masked pre-training has been shown in a multi-modal vision task, instead of the single-modal task resolved by masked autoencoders (MAE). Different from MAE, where fine-tuning completely discards the decoder part of pre-training, there is no architectural difference between the pre-training and fine-tuning stages in our M$^{3}$PT as they only differ in the prediction density, which potentially makes the transfer learning more convenient and effective. Extensive experiments verify the effectiveness of M$^{3}$PT on three panoramic datasets. Notably, we improve the state-of-the-art baselines by an average of 26.2% in RMSE, 51.7% in MRE, 49.7% in MAE, and 37.5% in RMSElog on three benchmark datasets.

preprint2022arXiv

Online Video Instance Segmentation via Robust Context Fusion

Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences. Recent transformer-based neural networks have demonstrated their powerful capability of modeling spatio-temporal correlations for the VIS task. Relying on video- or clip-level input, they suffer from high latency and computational cost. We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames. To acquire the precise and temporal-consistent prediction for each frame efficiently, the key idea is to fuse effective and compact context from reference frames into the target frame. Considering the different effects of reference and target frames on the target prediction, we first summarize contextual features through importance-aware compression. A transformer encoder is adopted to fuse the compressed context. Then, we leverage an order-preserving instance embedding to convey the identity-aware information and correspond the identities to predicted instance masks. We demonstrate that our robust fusion network achieves the best performance among existing online VIS methods and is even better than previously published clip-level methods on the Youtube-VIS 2019 and 2021 benchmarks. In addition, visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. By leveraging the flexibility of our context fusion network on multi-modal data, we further investigate the influence of audios on the video-dense prediction task, which has never been discussed in existing works. We build up an Audio-Visual Instance Segmentation dataset, and demonstrate that acoustic signals in the wild scenarios could benefit the VIS task.

preprint2022arXiv

PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection

In this paper, we delve into two key techniques in Semi-Supervised Object Detection (SSOD), namely pseudo labeling and consistency training. We observe that these two techniques currently neglect some important properties of object detection, hindering efficient learning on unlabeled data. Specifically, for pseudo labeling, existing works only focus on the classification score yet fail to guarantee the localization precision of pseudo boxes; for consistency training, the widely adopted random-resize training only considers the label-level consistency but misses the feature-level one, which also plays an important role in ensuring the scale invariance. To address the problems incurred by noisy pseudo boxes, we design Noisy Pseudo box Learning (NPL) that includes Prediction-guided Label Assignment (PLA) and Positive-proposal Consistency Voting (PCV). PLA relies on model predictions to assign labels, which makes it robust to even coarse pseudo boxes, while PCV leverages the regression consistency of positive proposals to reflect the localization quality of pseudo boxes. Furthermore, in consistency training, we propose Multi-view Scale-invariant Learning (MSL) that includes mechanisms of both label- and feature-level consistency, where feature consistency is achieved by aligning shifted feature pyramids between two images with identical content but varied scales. On the COCO benchmark, our method, termed PSEudo labeling and COnsistency training (PseCo), outperforms the SOTA (Soft Teacher) by 2.0, 1.8, 2.0 points under 1%, 5%, and 10% labelling ratios, respectively. It also significantly improves the learning efficiency for SSOD, e.g., PseCo halves the training time of the SOTA approach but achieves even better performance. Code is available at https://github.com/ligang-cs/PseCo.

preprint2022arXiv

RAW-GNN: RAndom Walk Aggregation based Graph Neural Network

Graph-Convolution-based methods have been successfully applied to representation learning on homophily graphs where nodes with the same label or similar attributes tend to connect with one another. Due to the homophily assumption of Graph Convolutional Networks (GCNs) that these methods use, they are not suitable for heterophily graphs where nodes with different labels or dissimilar attributes tend to be adjacent. Several methods have attempted to address this heterophily problem, but they do not change the fundamental aggregation mechanism of GCNs because they rely on summation operators to aggregate information from neighboring nodes, which is implicitly subject to the homophily assumption. Here, we introduce a novel aggregation mechanism and develop a RAndom Walk Aggregation-based Graph Neural Network (called RAW-GNN) method. The proposed approach integrates the random walk strategy with graph neural networks. The new method utilizes breadth-first random walk search to capture homophily information and depth-first search to collect heterophily information. It replaces the conventional neighborhoods with path-based neighborhoods and introduces a new path-based aggregator based on Recurrent Neural Networks. These designs make RAW-GNN suitable for both homophily and heterophily graphs. Extensive experimental results showed that the new method achieved state-of-the-art performance on a variety of homophily and heterophily graphs.

preprint2022arXiv

RecursiveMix: Mixed Learning with History

Mix-based augmentation has been proven fundamental to the generalization of deep vision models. However, current augmentations only mix samples within the current data batch during training, which ignores the possible knowledge accumulated in the learning history. In this paper, we propose a recursive mixed-sample learning paradigm, termed "RecursiveMix" (RM), by exploring a novel training strategy that leverages the historical input-prediction-label triplets. More specifically, we iteratively resize the input image batch from the previous iteration and paste it into the current batch while their labels are fused proportionally to the area of the operated patches. Further, a consistency loss is introduced to align the identical image semantics across the iterations, which helps the learning of scale-invariant feature representations. Based on ResNet-50, RM largely improves classification accuracy by ~3.2% on CIFAR100 and ~2.8% on ImageNet with negligible extra computation/storage costs. In the downstream object detection task, the RM pretrained model outperforms the baseline by 2.1 AP points and surpasses CutMix by 1.4 AP points under the ATSS detector on COCO. In semantic segmentation, RM also surpasses the baseline and CutMix by 1.9 and 1.1 mIoU points under UperNet on ADE20K, respectively. Codes and pretrained models are available at https://github.com/megvii-research/RecursiveMix.
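
The resize-and-paste step described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the nearest-neighbour resize, the fixed paste position and the `scale` parameter are all assumptions made for the example, and the consistency loss is omitted.

```python
import numpy as np

def resize_nearest(batch, out_h, out_w):
    """Nearest-neighbour resize of an (N, H, W, C) batch (illustrative helper)."""
    _, h, w, _ = batch.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return batch[:, rows[:, None], cols[None, :], :]

def recursive_mix(cur_x, cur_y, prev_x, prev_y, scale, top, left):
    """Paste a shrunken copy of the previous batch into the current batch.

    cur_x, prev_x: (N, H, W, C) image batches; cur_y, prev_y: (N, K) soft labels.
    scale, top and left are hypothetical parameters chosen by the caller.
    """
    _, h, w, _ = cur_x.shape
    ph, pw = max(1, int(h * scale)), max(1, int(w * scale))
    if top + ph > h or left + pw > w:
        raise ValueError("patch does not fit inside the image")

    mixed_x = cur_x.copy()
    mixed_x[:, top:top + ph, left:left + pw, :] = resize_nearest(prev_x, ph, pw)

    # Labels are fused in proportion to the pasted area.
    lam = (ph * pw) / (h * w)
    mixed_y = lam * prev_y + (1.0 - lam) * cur_y
    return mixed_x, mixed_y
```

In a training loop, the returned `mixed_x` and `mixed_y` would be carried over as `prev_x` and `prev_y` for the next iteration, which is what makes the mixing recursive.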

preprint2022arXiv

RigNet: Repetitive Image Guided Network for Depth Completion

Depth completion deals with the problem of recovering dense depth maps from sparse ones, where color images are often used to facilitate this task. Recent approaches mainly focus on image guided learning frameworks to predict dense depth. However, blurry guidance in the image and unclear structure in the depth still impede the performance of the image guided frameworks. To tackle these problems, we explore a repetitive design in our image guided network to gradually and sufficiently recover depth values. Specifically, the repetition is embodied in both the image guidance branch and depth generation branch. In the former branch, we design a repetitive hourglass network to extract discriminative image features of complex environments, which can provide powerful contextual instruction for depth prediction. In the latter branch, we introduce a repetitive guidance module based on dynamic convolution, in which an efficient convolution factorization is proposed to simultaneously reduce its complexity and progressively model high-frequency structures. Extensive experiments show that our method achieves superior or competitive results on KITTI benchmark and NYUv2 dataset.

preprint2022arXiv

Robust Inverse Framework using Knowledge-guided Self-Supervised Learning: An application to Hydrology

Machine Learning is beginning to provide state-of-the-art performance in a range of environmental applications such as streamflow prediction in a hydrologic basin. However, building accurate broad-scale models for streamflow remains challenging in practice due to the variability in the dominant hydrologic processes, which are best captured by sets of process-related basin characteristics. Existing basin characteristics suffer from noise and uncertainty, among many other things, which adversely impact model performance. To tackle the above challenges, in this paper, we propose a novel Knowledge-guided Self-Supervised Learning (KGSSL) inverse framework to extract system characteristics from driver and response data. This first-of-its-kind framework achieves robust performance even when characteristics are corrupted. We show that KGSSL achieves state-of-the-art results for streamflow modeling for CAMELS (Catchment Attributes and MEteorology for Large-sample Studies), which is a widely used hydrology benchmark dataset. Specifically, KGSSL outperforms other methods by up to 16% in reconstructing characteristics. Furthermore, we show that KGSSL is relatively more robust to distortion than baseline methods, and outperforms the baseline model by 35% when plugging in KGSSL inferred characteristics.

preprint2022arXiv

Scalable Simulation of Quantum Measurement Process with Quantum Computers

Recent developments in quantum information sciences and technologies, especially the building of programmable quantum computers, provide new opportunities to study fundamental aspects of quantum mechanics. We propose qubit models to emulate the quantum measurement process, in which the quantum information of a qubit is mapped to a collection of qubits acting as the measurement device. One model is motivated by single-photon detection and the other by spin measurement. Both models can be scaled to generate Schrödinger cat-like states, and their corresponding quantum circuits are shown explicitly. Large-scale simulations could be realized in near-term quantum computers, while classical computers cannot perform the same task efficiently. Due to the scalability of the models, such simulations can help explore the quantum-to-classical boundary, if it exists, in the quantum measurement problem. Besides, our protocol to generate cat states may have important applications in quantum computing and metrology.

preprint2022arXiv

Seamless Interaction Design with Coexistence and Cooperation Modes for Robust Human-Robot Collaboration

A robot needs multiple interaction modes to robustly collaborate with a human in complicated industrial tasks. We develop a Coexistence-and-Cooperation (CoCo) human-robot collaboration system. Coexistence mode enables the robot to work with the human on different sub-tasks independently in a shared space. Cooperation mode enables the robot to follow human guidance and recover from failures. A human intention tracking algorithm takes both human and robot motion measurements as input and switches between the interaction modes. We demonstrate the effectiveness of the CoCo system in a use case analogous to a real-world multi-step assembly task.

preprint2022arXiv

Shape Control of Deformable Linear Objects with Offline and Online Learning of Local Linear Deformation Models

The shape control of deformable linear objects (DLOs) is challenging, since it is difficult to obtain the deformation models. Previous studies often approximate the models in purely offline or online ways. In this paper, we propose a scheme for the shape control of DLOs, where the unknown model is estimated with both offline and online learning. The model is formulated in a local linear format, and approximated by a neural network (NN). First, the NN is trained offline to provide a good initial estimation of the model, which can directly migrate to the online phase. Then, an adaptive controller is proposed to achieve the shape control tasks, in which the NN is further updated online to compensate for any errors in the offline model caused by insufficient training or changes of DLO properties. The simulation and real-world experiments show that the proposed method can precisely and efficiently accomplish the DLO shape control tasks, and adapt well to new and untrained DLOs.

preprint2022arXiv

Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

Speech Emotion Recognition (SER) is a fundamental task to predict the emotion label from speech data. Recent works mostly focus on using convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations by viewing time-varied spectral features as images. However, rich emotional features at different scales and important global information cannot be captured well due to the limits of existing CNNs for SER. In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network to learn multi-scale feature representations with a global-aware fusion module to attend to emotional information. Specifically, GLAM iteratively utilizes multiple convolutional kernels with different scales to learn multiple feature representations. Then, instead of using attention-based methods, a simple but effective global-aware fusion module is applied to capture the most important emotional information globally. Experiments on the benchmark corpus IEMOCAP over four emotions demonstrate the superiority of our proposed model, with 2.5% to 4.5% improvements on four common metrics compared to previous state-of-the-art approaches. The code is available at https://github.com/lixiangucas01/GLAM.

preprint2022arXiv

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Cross-speaker style transfer aims to extract the speech style of the given reference speech, which can be reproduced in the timbre of arbitrary target speakers. Existing methods on this topic have explored utilizing utterance-level style labels to perform style transfer via either global or local scale style representations. However, audiobook datasets are typically characterized by both the local prosody and global genre, and are rarely accompanied by utterance-level style labels. Thus, properly transferring the reading style across different speakers remains a challenging task. This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches. Moreover, by disentangling speaker timbre and style with the proposed switchable adversarial classifiers, the extracted reading style is made adaptable to the timbre of different speakers. Experiment results confirm that the model manages to transfer a given reading style to new target speakers. With the support of local prosody and global genre type predictor, the potentiality of the proposed method in multi-speaker audiobook generation is further revealed.

preprint2022arXiv

Trajectories and Radiation of Charged Particles in the Pulsar Magnetosphere

Trajectories and radiation of the accelerating electrons are studied in the pulsar magnetosphere approximated by the electromagnetic field of Deutsch's solutions. Because the electrons are accelerated rapidly to ultra-relativistic velocity near the neutron star surface, the electron velocity vector (and then its trajectory) is derived from the balance between the Lorentz force and the radiation reaction force, which makes the pitch angle between electron trajectories and magnetic field lines nonzero in most parts of the magnetosphere. In such a case, the spectral energy distributions (SEDs) of synchro-curvature radiation for the accelerating electrons with a mono-energetic form are calculated. Our results indicate that: (i) the pitch angle is a function of electron position ($r, θ, ϕ$) in the open field line regions, and increases with increasing $r$ and $θ$ as well as with increasing inclination angle; (ii) the radius of curvature becomes large along the particle trajectory; and (iii) the SED shows a double-peak structure depending on the emission position, where the synchrotron radiation plays an important role in the X-ray band and curvature radiation mainly works in the GeV band, which is only determined by parameters $α$ and $ζ$.

preprint2022arXiv

Tuning the competition between superconductivity and charge order in kagome superconductor Cs(V1-xNbx)3Sb5

The recently discovered coexistence of superconductivity and charge density wave order in the kagome systems AV3Sb5 (A = K, Rb, Cs) has stimulated enormous interest. According to theory, a vanadium-based kagome system may host a flat band, nontrivial linear dispersive Dirac surface states and electronic correlation. Despite intensive investigations, the origin of the charge density wave (CDW) order, how the superconductivity relates to the CDW, and whether the anomalous Hall effect (AHE) arises primarily from the kagome lattice or the CDW order all remain controversial. We report an extensive investigation on Cs(V1-xNbx)3Sb5 samples with systematic Nb doping. Our results show that the Nb doping induces apparent suppression of CDW order and promotes superconductivity; meanwhile, the AHE and magnetoresistance (MR) are significantly weakened together with the CDW order. Combining these results with our density functional calculations, we interpret these effects by an antiphase shift of the Fermi energy with respect to the saddle points near M and the Fermi surface centered around Γ. It is found that the former depletes the filled states for the CDW instability and worsens the nesting condition for CDW order, while the latter lifts the Fermi level upward and enlarges the Fermi surface surrounding the Γ point, and thus promotes superconductivity. Our results uncover a delicate but unusual competition between the CDW order and superconductivity.

preprint2022arXiv

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly optimizes both the pre-training efficiency and fine-tuning accuracy. Notably, the success of the asymmetric structure relies on the "global" property of Vanilla Vision Transformer (ViT), whose self-attention mechanism reasons over an arbitrary subset of discrete image patches. However, it is still unclear how the advanced Pyramid-based ViTs (e.g., PVT, Swin) can be adopted in MAE pre-training as they commonly introduce operators within "local" windows, making it difficult to handle the random sequence of partial vision tokens. In this paper, we propose Uniform Masking (UM), successfully enabling MAE pre-training for Pyramid-based ViTs with locality (termed "UM-MAE" for short). Specifically, UM includes a Uniform Sampling (US) that strictly samples $1$ random patch from each $2 \times 2$ grid, and a Secondary Masking (SM) which randomly masks a portion (usually $25\%$) of the already sampled regions as learnable tokens. US preserves equivalent elements across multiple non-overlapped local windows, resulting in smooth support for popular Pyramid-based ViTs, whilst SM is designed for better transferable visual representations since US reduces the difficulty of the pixel recovery pre-task, which hinders semantic learning. We demonstrate that UM-MAE significantly improves the pre-training efficiency (e.g., it speeds up and reduces the GPU memory by $\sim 2\times$) of Pyramid-based ViTs, while maintaining competitive fine-tuning performance across downstream tasks. For example, using the HTC++ detector, the pre-trained Swin-Large backbone self-supervised under UM-MAE only on ImageNet-1K can even outperform the one supervised on ImageNet-22K. The codes are available at https://github.com/implus/UM-MAE.
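
As a rough illustration of the two-stage masking described in this abstract, the following NumPy sketch builds a Uniform Sampling map (one patch per 2 × 2 cell) and then applies Secondary Masking to a fraction of the sampled patches. The function name, argument names and return format are assumptions made for the example; the released UM-MAE code may organise this differently.

```python
import numpy as np

def uniform_masking(grid_h, grid_w, secondary_ratio=0.25, rng=None):
    """Return (sampled, masked) boolean maps over a grid_h x grid_w patch grid.

    sampled: patches kept by Uniform Sampling (exactly one per 2x2 cell).
    masked:  subset of `sampled` hidden again by Secondary Masking.
    """
    if grid_h % 2 or grid_w % 2:
        raise ValueError("patch grid must have even height and width")
    rng = np.random.default_rng() if rng is None else rng
    cells_h, cells_w = grid_h // 2, grid_w // 2

    # Uniform Sampling: choose one of the four positions in every 2x2 cell.
    choice = rng.integers(0, 4, size=(cells_h, cells_w))
    rows = 2 * np.arange(cells_h)[:, None] + choice // 2
    cols = 2 * np.arange(cells_w)[None, :] + choice % 2
    sampled = np.zeros((grid_h, grid_w), dtype=bool)
    sampled[rows, cols] = True

    # Secondary Masking: hide a fraction of the already sampled patches.
    sampled_idx = np.flatnonzero(sampled)
    n_masked = int(round(secondary_ratio * sampled_idx.size))
    masked_idx = rng.choice(sampled_idx, size=n_masked, replace=False)
    masked = np.zeros(grid_h * grid_w, dtype=bool)
    masked[masked_idx] = True
    return sampled, masked.reshape(grid_h, grid_w)
```

For a 14 × 14 patch grid this keeps 49 of the 196 patches, so every non-overlapping 2 × 2 window contributes the same number of visible elements.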

preprint2022arXiv

Weakly Supervised Text Classification using Supervision Signals from a Language Model

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze style prompts to obtain supervision signals. We design a prompt which combines the document itself and "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token. The generated words which summarize the content of a document can be utilized as supervision signals. We propose a latent variable model to learn a word distribution learner which associates generated words to pre-defined categories and a document classifier simultaneously without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method can outperform baselines by 2%, 4%, and 3%.

preprint2022arXiv

WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models

Learning the user-item relevance hidden in implicit feedback data plays an important role in modern recommender systems. Neural sequential recommendation models, which formulate learning the user-item relevance as a sequential classification problem to distinguish items in future behaviors from others based on the user's historical behaviors, have attracted a lot of interest in both industry and academia due to their substantial practical value. Despite their many practical successes, we argue that the intrinsic incompleteness and inaccuracy of user behaviors in implicit feedback data are ignored, and we conduct preliminary experiments to support our claims. Motivated by the observation that model-free methods like behavioral retargeting (BR) and item-based collaborative filtering (ItemCF) hit different parts of the user-item relevance compared to neural sequential recommendation models, we propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning. WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, and resolves the inaccuracy problem by leveraging top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning. Experiments on two benchmark datasets and online A/B tests verify the rationality of our claims and demonstrate the effectiveness of WSLRec.

preprint2021arXiv

Bridging Unpaired Facial Photos And Sketches By Line-drawings

In this paper, we propose a novel method to learn face sketch synthesis models by using unpaired data. Our main idea is bridging the photo domain $\mathcal{X}$ and the sketch domain $\mathcal{Y}$ by using the line-drawing domain $\mathcal{Z}$. Specifically, we map both photos and sketches to line-drawings by using a neural style transfer method, i.e. $F: \mathcal{X}/\mathcal{Y} \mapsto \mathcal{Z}$. Consequently, we obtain pseudo paired data $(\mathcal{Z}, \mathcal{Y})$, and can learn the mapping $G:\mathcal{Z} \mapsto \mathcal{Y}$ in a supervised learning manner. In the inference stage, given a facial photo, we can first transfer it to a line-drawing and then to a sketch by $G \circ F$. Additionally, we propose a novel stroke loss for generating different types of strokes. Our method, termed sRender, accords well with human artists' rendering process. Experimental results demonstrate that sRender can generate multi-style sketches, and significantly outperforms existing unpaired image-to-image translation methods.

preprint2021arXiv

Contact Map based Crystal Structure Prediction using Global Optimization

Crystal structure prediction is now playing an increasingly important role in discovery of new materials. Global optimization methods such as genetic algorithms (GA) and particle swarm optimization (PSO) have been combined with first principle free energy calculations to predict crystal structures given composition or only a chemical system. While these approaches can exploit certain crystal patterns such as symmetry and periodicity in their search process, they usually do not exploit the large amount of implicit rules and constraints of atom configurations embodied in the large number of known crystal structures. They currently can only handle crystal structure prediction of relatively small systems. Inspired by the knowledge-rich protein structure prediction approach, herein we explore whether known geometric constraints such as the atomic contact map of a target crystal material can help predict its structure given its space group information. We propose a global optimization based algorithm, CMCrystal, for crystal structure reconstruction based on atomic contact maps. Based on extensive experiments using six global optimization algorithms, we show that it is viable to reconstruct the crystal structure given the atomic contact map for some crystal materials but more constraints are needed for other target materials to achieve successful reconstruction. This implies that atomic interaction information learned from existing materials can be used to improve crystal structure prediction.
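
To make the notion of an atomic contact map concrete, here is a small NumPy sketch that marks two atoms as in contact when their shortest periodic distance falls below a cutoff. The plain distance cutoff, the function name and the search over only the 27 nearest periodic images are assumptions made for illustration; the abstract does not specify how CMCrystal defines contacts.

```python
import itertools
import numpy as np

def contact_map(frac_coords, lattice, cutoff):
    """Boolean (n, n) contact map for atoms in a periodic cell.

    frac_coords: (n, 3) fractional coordinates.
    lattice:     (3, 3) matrix whose rows are the lattice vectors.
    cutoff:      contact distance, in the same length unit as `lattice`.
    Only the 27 neighbouring images are searched, which is adequate when the
    cutoff is small compared with the cell and the cell is not strongly skewed.
    """
    frac = np.asarray(frac_coords, dtype=float)
    lattice = np.asarray(lattice, dtype=float)
    shifts = np.array(list(itertools.product((-1, 0, 1), repeat=3)), dtype=float)

    # Pairwise fractional differences for every periodic image: (n, n, 27, 3).
    diff = frac[None, :, :] - frac[:, None, :]
    diff_images = diff[:, :, None, :] + shifts[None, None, :, :]

    # Convert to Cartesian and keep the shortest image distance per pair.
    cart = diff_images @ lattice
    dist = np.linalg.norm(cart, axis=-1).min(axis=-1)

    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)  # an atom is not its own contact
    return contacts
```

A reconstruction algorithm of the kind described would then search for coordinates whose contact map matches a target map, for example by minimising the number of mismatched entries.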

preprint2021arXiv

Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications

In this work, we study a novel class of projection-based algorithms for linearly constrained problems (LCPs), which have many applications in statistics, optimization, and machine learning. Conventional primal gradient-based methods for LCPs call a projection after each (stochastic) gradient step, so the required number of projections equals the number of gradient steps (or total iterations). Motivated by the recent progress in distributed optimization, we propose the delayed projection technique, which calls a projection only once every several iterations, lowering the projection frequency and improving the projection efficiency. Accordingly, we devise a series of stochastic methods for LCPs using the technique, including a variance reduced method and an accelerated one. We theoretically show that it is feasible to improve projection efficiency in both strongly convex and generally convex cases. Our analysis is simple and unified and can be easily extended to other methods using delayed projections. When applying our new algorithms to federated optimization, a newfangled and privacy-preserving subfield in distributed optimization, we obtain not only a variance reduced federated algorithm with convergence rates better than previous works, but also the first accelerated method able to handle data heterogeneity inherent in federated optimization.
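
The basic idea of projecting only occasionally can be sketched as below for the constraint set {x : Ax = b}. This is a naive illustration with hypothetical function names, not one of the paper's algorithms: it simply skips projections between every `project_every` gradient steps. For general objectives such a naive scheme need not converge to the constrained optimum, and the abstract does not describe how the proposed methods address that.

```python
import numpy as np

def project_affine(x, A, b):
    """Euclidean projection of x onto {z : A z = b} (A assumed full row rank)."""
    correction = np.linalg.solve(A @ A.T, A @ x - b)
    return x - A.T @ correction

def delayed_projected_gd(grad, x0, A, b, step, n_iters, project_every):
    """Gradient descent that projects only every `project_every` iterations.

    Returns the final feasible point and the number of projections performed.
    """
    x = project_affine(np.asarray(x0, dtype=float), A, b)
    n_projections = 1
    for t in range(1, n_iters + 1):
        x = x - step * grad(x)          # unconstrained gradient step
        if t % project_every == 0:      # delayed projection
            x = project_affine(x, A, b)
            n_projections += 1
    x = project_affine(x, A, b)         # final projection guarantees feasibility
    n_projections += 1
    return x, n_projections
```

With `project_every=1` this reduces to ordinary projected gradient descent, so the parameter directly trades projection cost against how far the iterates may drift from the feasible set.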

preprint2021arXiv

Disentangling User Interest and Conformity for Recommendation with Causal Embedding

Recommendation models are usually trained on observational interaction data. However, observational interaction data could result from users' conformity towards popular items, which entangles users' real interest. Existing methods treat this problem as one of eliminating popularity bias, e.g., by re-weighting training samples or leveraging a small fraction of unbiased data. However, the variety of user conformity is ignored by these approaches, and different causes of an interaction are bundled together as unified representations, hence robustness and interpretability are not guaranteed when underlying causes are changing. In this paper, we present DICE, a general framework that learns representations where interest and conformity are structurally disentangled, and various backbone recommendation models could be smoothly integrated. We assign users and items separate embeddings for interest and conformity, and make each embedding capture only one cause by training with cause-specific data, which is obtained according to the colliding effect of causal inference. Our proposed methodology outperforms state-of-the-art baselines with remarkable improvements on two real-world datasets on top of various backbone models. We further demonstrate that the learned embeddings successfully capture the desired causes, and show that DICE guarantees the robustness and interpretability of recommendation.

preprint2021arXiv

Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition

Neural text-to-speech (TTS) approaches generally require a large amount of high-quality speech data, which makes it difficult to obtain such a dataset with extra emotion labels. In this paper, we propose a novel approach for emotional TTS synthesis on a TTS dataset without emotion labels. Specifically, our proposed method consists of a cross-domain speech emotion recognition (SER) model and an emotional TTS model. First, we train the cross-domain SER model on both SER and TTS datasets. Then, we use emotion labels on the TTS dataset predicted by the trained SER model to build an auxiliary SER task and jointly train it with the TTS model. Experimental results show that our proposed method can generate speech with the specified emotional expressiveness and almost no degradation of speech quality.

preprint2021arXiv

Leveraging Meta-path Contexts for Classification in Heterogeneous Information Networks

A heterogeneous information network (HIN) has as vertices objects of different types and as edges the relations between objects, which are also of various types. We study the problem of classifying objects in HINs. Most existing methods perform poorly when given scarce labeled objects as training sets, and methods that improve classification accuracy under such scenarios are often computationally expensive. To address these problems, we propose ConCH, a graph neural network model. ConCH formulates the classification problem as a multi-task learning problem that combines semi-supervised learning with self-supervised learning to learn from both labeled and unlabeled data. ConCH employs meta-paths, which are sequences of object types that capture semantic relationships between objects. ConCH co-derives object embeddings and context embeddings via graph convolution. It also uses the attention mechanism to fuse such embeddings. We conduct extensive experiments to evaluate the performance of ConCH against 15 other classification methods. Our results show that ConCH is an effective and efficient method for HIN classification.

preprint2021arXiv

SceneRec: Scene-Based Graph Neural Networks for Recommender Systems

Collaborative filtering has been largely used to advance modern recommender systems to predict user preference. A key component in collaborative filtering is representation learning, which aims to project users and items into a low dimensional space to capture collaborative signals. However, the scene information, which has effectively guided many recommendation tasks, is rarely considered in existing collaborative filtering methods. To bridge this gap, we focus on scene-based collaborative recommendation and propose a novel representation model SceneRec. SceneRec formally defines a scene as a set of pre-defined item categories that occur simultaneously in real-life situations and creatively designs an item-category-scene hierarchical structure to build a scene-based graph. In the scene-based graph, we adopt graph neural networks to learn scene-specific representation on each item node, which is further aggregated with latent representation learned from collaborative interactions to make recommendations. We perform extensive experiments on real-world E-commerce datasets and the results demonstrate the effectiveness of the proposed method.

preprint2021arXiv

Shallow geothermal energy potential for heating and cooling of buildings with regeneration under climate change scenarios

Shallow ground-source heat pumps (GSHPs) are a promising technology for contributing to the decarbonisation of the energy sector. In heating-dominated climates, the combined use of GSHPs for both heating and cooling increases their technical potential, defined as the maximum energy that can be exchanged with the ground, as the re-injection of excess heat from space cooling leads to a seasonal regeneration of the ground. This paper proposes a new approach to quantify the technical potential of GSHPs, accounting for effects of seasonal regeneration, and to estimate the useful energy to supply building energy demands at regional scale. The useful energy is obtained for direct heat exchange and for district heating and cooling (DHC) under several scenarios for climate change and market penetration levels of cooling systems. The case study in western Switzerland suggests that seasonal regeneration allows for annual maximum heat extraction densities above 300 kWh/m$^2$ at heat injection densities above 330 kWh/m$^2$. Results also show that GSHPs may cover up to 55% of heating demand while covering 57% of service-sector cooling demand for individual GSHPs in 2050, which increases to around 85% with DHC. The regional-scale results may serve to inform decision making on strategic areas for installing GSHPs.

preprint2021arXiv

Towards Cross-Modal Forgery Detection and Localization on Live Surveillance Videos

Cybersecurity breaches render surveillance systems vulnerable to video forgery attacks, under which authentic live video streams are tampered with to conceal illegal human activities under surveillance cameras. Traditional video forensics approaches can detect and localize forgery traces in each video frame using computationally expensive spatial-temporal analysis, but fall short in real-time verification of live video feeds. Recent work correlates time-series camera and wireless signals to recognize replayed surveillance videos using event-level timing information, but it cannot realize fine-grained forgery detection and localization on each frame. To fill this gap, this paper proposes Secure-Pose, a novel cross-modal forgery detection and localization system for live surveillance videos using WiFi signals near the camera spot. We observe that coexisting camera and WiFi signals convey common human semantic information and the presence of forgery attacks on video frames will decouple such information correspondence. Secure-Pose extracts effective human pose features from synchronized multi-modal signals and detects and localizes forgery traces under both inter-frame and intra-frame attacks in each frame. We implement Secure-Pose using a commercial camera and two Intel 5300 NICs and evaluate it in real-world environments. Secure-Pose achieves a high detection accuracy of 95.1% and can effectively localize tampered objects under different forgery attacks.

preprint2021arXiv

vrCAPTCHA: Exploring CAPTCHA Designs in Virtual Reality

With the popularity of online access in virtual reality (VR) devices, it will become important to investigate exclusive and interactive CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) designs for VR devices. In this paper, we first present four traditional two-dimensional (2D) CAPTCHAs (i.e., text-based, image-rotated, image-puzzled, and image-selected CAPTCHAs) in VR. Then, based on the three-dimensional (3D) interaction characteristics of VR devices, we propose two vrCAPTCHA design prototypes (i.e., task-driven and bodily motion-based CAPTCHAs). We conducted a user study with six participants for exploring the feasibility of our two vrCAPTCHAs and traditional CAPTCHAs in VR. We believe that our two vrCAPTCHAs can be an inspiration for the further design of CAPTCHAs in VR.

preprint2020arXiv

3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction

In this paper, we deal with the problem of predicting the future 3D motions of 3D object scans from the previous two consecutive frames. Previous methods mostly focus on sparse motion prediction in the form of skeletons, while in this paper we focus on predicting dense 3D motions in the form of 3D point clouds. To approach this problem, we propose a self-supervised approach that leverages the power of the deep neural network to learn a continuous flow function of 3D point clouds that can predict temporally consistent future motions and naturally bring out the correspondences among consecutive point clouds at the same time. More specifically, in our approach, to eliminate the unsolved and challenging process of defining a discrete point convolution on 3D point cloud sequences to encode spatial and temporal information, we introduce a learnable latent code to represent the temporal-aware shape descriptor, which is optimized during model training. Moreover, a temporally consistent motion Morpher is proposed to learn a continuous flow field which deforms a 3D scan from the current frame to the next frame. We perform extensive experiments on D-FAUST, SCAPE and TOSCA benchmark data sets and the results demonstrate that our approach is capable of handling temporally inconsistent input and produces consistent future 3D motion while requiring no ground truth supervision.

preprint2020arXiv

A kilonova associated with GRB 070809

For on-axis typical short gamma-ray bursts (sGRBs), the forward shock emission is usually so bright that it renders the identification of kilonovae (also known as macronovae) in the early afterglow ($t<0.5$ d) phase rather challenging. This is why previously no thermal-like kilonova component has been identified at such an early time except in the off-axis dim GRB 170817A associated with GW170817. Here we report the identification of an unusual optical radiation component in GRB 070809 at $t\sim 0.47$ d, thanks plausibly to the very weak/subdominant forward shock emission. The optical emission with a very red spectrum is well in excess of the extrapolation of the X-ray emission that is distinguished by an unusually hard spectrum, which is at odds with the forward shock afterglow prediction but can be naturally interpreted as a kilonova. Our finding supports the speculation that kilonovae are ubiquitous, and demonstrates the possibility of revealing the neutron star merger origin with the early afterglow data of some typical sGRBs that take place well beyond the sensitive radius of the advanced gravitational wave detectors and hence the opportunity of organizing dedicated follow-up observations for events of interest.

preprint2020arXiv

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although many CTR prediction models have been proposed, learning good representations of items from multimodal features is still less investigated, even though an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the features of the multiple modalities, which is equivalent to giving a fixed importance weight to each modality, or learn dynamic weights of different modalities for different items through techniques such as attention mechanisms. However, there usually exists common redundant information across multiple modalities, and dynamic weights computed from the redundant information may not correctly reflect the different importance of each modality. To address this, we explore the complementarity and redundancy of modalities by treating modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations, where a double-discriminators strategy is introduced. Finally, we obtain the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements over the state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, and online A/B testing further demonstrates its effectiveness.

preprint2020arXiv

Airborne LiDAR Point Cloud Classification with Graph Attention Convolution Neural Network

Airborne light detection and ranging (LiDAR) plays an increasingly significant role in urban planning, topographic mapping, environmental monitoring, power line detection and other fields thanks to its capability to quickly acquire large-scale and high-precision ground information. To achieve point cloud classification, previous studies proposed point cloud deep learning models that can directly process raw point clouds based on PointNet-like architectures, and some recent works proposed graph convolution neural networks based on the inherent topology of point clouds. However, these point cloud deep learning models only pay attention to exploring local geometric structures and ignore global contextual relationships among all points. In this paper, we present a graph attention convolution neural network (GACNN) that can be directly applied to the classification of unstructured 3D point clouds obtained by airborne LiDAR. Specifically, we first introduce a graph attention convolution module that incorporates global contextual information and local structural features. Based on the proposed graph attention convolution module, we further design an end-to-end encoder-decoder network, named GACNN, to capture multiscale features of the point clouds and therefore enable more accurate airborne point cloud classification. Experiments on the ISPRS 3D labeling dataset show that the proposed model achieves a new state-of-the-art performance in terms of average F1 score (71.5%) and a satisfying overall accuracy (83.2%). Additionally, experiments on the 2019 Data Fusion Contest Dataset, comparing with other prevalent point cloud deep learning models, demonstrate the favorable generalization capability of the proposed model.

preprint2020arXiv

Antiferromagnetism of Double Molybdate LiFe(MoO$_4$)$_2$

The magnetic properties of the spin-5/2 double molybdate LiFe(MoO$_4$)$_2$ have been characterized by heat capacity, magnetic susceptibility, and neutron powder diffraction techniques. Unlike the multiferroic system LiFe(WO$_4$)$_2$, which exhibits two successive magnetic transitions, LiFe(MoO$_4$)$_2$ undergoes only one antiferromagnetic transition at $T_N$ ~ 23.8 K. Its antiferromagnetic structure with the commensurate propagation vector k = (0, 0.5, 0) has been determined. Density functional theory calculations confirm the antiferromagnetic ground state and provide a numerical estimate of the relevant exchange coupling constants.

preprint2020arXiv

Boresight Alignment of DArk Matter Particle Explorer

The DArk Matter Particle Explorer (DAMPE) can measure $γ$-rays in the energy range from a few GeV to about 10 TeV. The direction of each $γ$-ray is reconstructed with respect to the reference system of the DAMPE payload. In this paper, we adopt a maximum likelihood method and use the $γ$-ray data centered around several bright point-like sources to measure and correct the angular deviation from the real celestial coordinate system, the so-called "boresight alignment" of the DAMPE payload. As a check, we also estimate the boresight alignment for some sets of simulation data with artificial orientation and obtain consistent results. The time-dependent boresight alignment analysis does not show evidence for significant variation of the parameters.

preprint2020arXiv

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

preprint2020arXiv

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems

We present ConvLab-2, an open-source toolkit that enables researchers to build task-oriented dialogue systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. As the successor of ConvLab (Lee et al., 2019b), ConvLab-2 inherits ConvLab's framework but integrates more powerful dialogue models and supports more datasets. Besides, we have developed an analysis tool and an interactive tool to assist researchers in diagnosing dialogue systems. The analysis tool presents rich statistics and summarizes common mistakes from simulated dialogues, which facilitates error analysis and system improvement. The interactive tool provides a user interface that allows developers to diagnose an assembled dialogue system by interacting with the system and modifying the output of each system component.

preprint2020arXiv

Convolutional Neural Network for Behavioral Modeling and Predistortion of Wideband Power Amplifiers

In this paper, we propose a novel behavioral model for wideband power amplifiers (PAs) using a real-valued time-delay convolutional neural network (RVTDCNN). The input data of the model are sorted and arranged as a graph composed of the in-phase and quadrature (I/Q) components and envelope-dependent terms of current and past signals. We use the convolutional layer as a pre-designed filter to extract the basis functions required for PA forward or reverse modeling. The generated rich basis functions are modeled using a simple fully connected layer. Because of the weight-sharing characteristics of the convolutional structure, the strong memory effect does not lead to an obvious increase in the complexity of the model. Meanwhile, the extraction effect of the pre-designed filter also reduces the training complexity of the model. The experimental results show that the performance of the RVTDCNN model is almost the same as that of the NN models and the multilayer NN models.

preprint2020arXiv

DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous Flow

This paper deals with the problem of 3D tracking, i.e., finding dense correspondences in a sequence of time-varying 3D shapes. Although deep learning approaches have achieved promising performance for pairwise dense 3D shape matching, it is a great challenge to generalize those approaches to the tracking of time-varying 3D geometries. We propose a novel unsupervised 3D shape registration framework named DeepTracking-Net, which uses deep neural networks (DNNs) as auxiliary functions to produce spatially and temporally continuous displacement fields for 3D tracking of objects in temporal order. Our key novelty is a temporal-aware correspondence descriptor (TCD) that captures the spatio-temporal essence of consecutive 3D point cloud frames. Specifically, DeepTracking-Net starts by optimizing a randomly initialized latent TCD. The TCD is then decoded to regress a continuous flow (i.e., a displacement vector field) which assigns a motion vector to every point of the time-varying 3D shapes. DeepTracking-Net jointly optimizes the TCDs and the DNNs' weights towards the minimization of an unsupervised alignment loss. Experiments on both simulated and real data sets demonstrate that our unsupervised DeepTracking-Net outperforms the current supervised state-of-the-art method. In addition, we contribute a new synthetic 3D dataset, named SynMotions, to the 3D tracking and recognition community.

preprint2020arXiv

Detecting and Analyzing Mobility Hotspots using Surface Networks

Capabilities for collecting and storing data on mobile objects have increased dramatically over the past few decades. A persistent difficulty is summarizing large collections of mobile objects. This paper develops methods for extracting and analyzing hotspots or locations with relatively high levels of mobility activity. We use kernel density estimation (KDE) to convert a large collection of mobile objects into a smooth, continuous surface. We then develop a topological algorithm to extract critical geometric features of the surface; these include critical points (peaks, pits and passes) and critical lines (ridgelines and course-lines). We connect the peaks and corresponding ridgelines to produce a surface network that summarizes the topological structure of the surface. We apply graph theoretic indices to analytically characterize the surface and its changes over time. To illustrate our approach, we apply the techniques to taxi cab data collected in Shanghai, China. We find increases in the complexity of the hotspot spatial distribution during normal activity hours in the late morning, afternoon and evening and a spike in the connectivity of the hotspot spatial distribution in the morning as taxis concentrate on servicing travel to work. These results match with scientific and anecdotal knowledge about human activity patterns in the study area.
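
The first two steps described in this abstract (a KDE surface, then extraction of its peaks) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the integer grid, the Gaussian kernel, the bandwidth, the 8-neighbour peak test and the sample pick-up coordinates are not taken from the paper, and the pits, passes, ridgelines and the surface network itself are not covered.

```python
import math


def kde_surface(points, grid_size, bandwidth):
    """Evaluate a Gaussian kernel density estimate on an integer grid."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    surface = []
    for i in range(grid_size):
        row = []
        for j in range(grid_size):
            total = sum(
                math.exp(-((i - px) ** 2 + (j - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        surface.append(row)
    return surface


def find_peaks(surface):
    """Return grid cells that are strictly higher than all of their neighbours."""
    n_rows, n_cols = len(surface), len(surface[0])
    peaks = []
    for i in range(n_rows):
        for j in range(n_cols):
            neighbours = [
                surface[a][b]
                for a in range(max(0, i - 1), min(n_rows, i + 2))
                for b in range(max(0, j - 1), min(n_cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(surface[i][j] > value for value in neighbours):
                peaks.append((i, j))
    return peaks


# Hypothetical pick-up locations forming two clusters of activity.
pickups = [(5, 5), (4, 5), (6, 5), (5, 4), (5, 6), (14, 14), (13, 14), (15, 14)]
surface = kde_surface(pickups, grid_size=20, bandwidth=1.5)
hotspots = find_peaks(surface)
```

In this toy setting each cluster yields one peak at its centre. A larger bandwidth would eventually merge the two clusters into a single hotspot, which is why the bandwidth choice matters for any analysis built on the surface.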

preprint2020arXiv

Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data

In many real-world data mining and machine learning applications, data are provided by multiple providers, each of which maintains private records of different feature sets about common entities. It is challenging for traditional data mining and machine learning algorithms to train on these vertically partitioned data effectively and efficiently while preserving data privacy. In this paper, we focus on nonlinear learning with kernels, and propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data. Specifically, we use random features to approximate the kernel mapping function and use doubly stochastic gradients to update the solutions, all of which are computed in a federated manner without disclosing the data. Importantly, we prove that FDSKL has a sublinear convergence rate and can guarantee data security under the semi-honest assumption. Extensive experimental results on a variety of benchmark datasets show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels, while retaining similar generalization performance.
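
To make the random-feature idea concrete, here is a minimal sketch of random Fourier features for an RBF kernel in which the inner product is assembled from per-party partial sums, as vertically partitioned data would require. It is an illustration under stated assumptions, not the FDSKL algorithm: the RBF kernel, the function names, the two-party split and the plain (non-secure) summation of partial products are all hypothetical, and the doubly stochastic gradient updates and privacy mechanisms are omitted.

```python
import math
import random


def make_random_features(dim, num_features, gamma, seed=0):
    """Draw (omega, b) pairs for the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # omega ~ N(0, 2 * gamma * I)
    omegas = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def partial_products(x_part, omegas, columns):
    """What one party computes locally, using only the columns it owns."""
    return [sum(w[c] * v for c, v in zip(columns, x_part)) for w in omegas]


def feature_map(partials_per_party, phases):
    """Combine partial products into z(x) = sqrt(2 / D) * cos(omega . x + b)."""
    d = len(phases)
    totals = [sum(parts) for parts in zip(*partials_per_party)]
    return [math.sqrt(2.0 / d) * math.cos(t + b) for t, b in zip(totals, phases)]


def federated_features(x, omegas, phases):
    """Hypothetical split: party A owns columns 0-1, party B owns columns 2-3."""
    part_a = partial_products(x[:2], omegas, [0, 1])
    part_b = partial_products(x[2:], omegas, [2, 3])
    return feature_map([part_a, part_b], phases)


gamma = 0.5
omegas, phases = make_random_features(dim=4, num_features=5000, gamma=gamma)
x = [0.3, -0.2, 0.5, 0.1]
y = [0.1, 0.4, -0.3, 0.2]
approx = sum(a * b for a, b in zip(federated_features(x, omegas, phases),
                                   federated_features(y, omegas, phases)))
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

Here `approx` should be close to `exact`, with an error that shrinks roughly as one over the square root of the number of random features. In a real deployment the partial products would be aggregated with a secure protocol rather than summed in the clear.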

preprint2020arXiv

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

One-stage detectors basically formulate object detection as dense classification and localization. The classification is usually optimized by Focal Loss and the box location is commonly learned under a Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices: (1) the inconsistent usage of the quality estimation and classification between training and inference and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address the problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL), which generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using a ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model can achieve a single-model single-scale AP of 48.2% at 10 FPS on a single 2080Ti GPU. Code and models are available at https://github.com/implus/GFocal.
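
The abstract does not state the loss formulas, so the following is only a sketch of one way a focal-style loss can be extended from hard labels to continuous quality labels, together with a box offset read out as the mean of a discretized distribution. The function names, the exponent `beta`, and the exact functional form are assumptions made for illustration, not the released GFL implementation.

```python
import math


def continuous_focal_loss(sigma, y, beta=2.0):
    """Focal-style loss for a predicted score sigma in (0, 1) and a soft label y in [0, 1].

    Assumed form: cross-entropy against the soft label, scaled by |y - sigma|^beta.
    """
    cross_entropy = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    return abs(y - sigma) ** beta * cross_entropy


def focal_loss_positive(sigma, gamma=2.0):
    """Standard Focal Loss (alpha weighting omitted) for a positive with hard label 1."""
    return -((1.0 - sigma) ** gamma) * math.log(sigma)


def expected_offset(logits):
    """Softmax over per-bin logits, then the mean bin index under that distribution."""
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return sum(i * w / total for i, w in enumerate(weights))


# With a hard label y = 1 the continuous form reduces to the usual focal term.
same_as_focal = abs(continuous_focal_loss(0.7, 1.0) - focal_loss_positive(0.7)) < 1e-12
# The loss vanishes when the prediction matches the soft label exactly.
zero_at_match = continuous_focal_loss(0.6, 0.6)
# A flat distribution over bins 0..4 has mean offset 2.
flat_mean = expected_offset([0.0] * 5)
```

The point of the sketch is the limiting behaviour: a continuous loss of this kind agrees with the discrete focal term when labels are 0 or 1, and is minimized when the predicted score equals the continuous quality label.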

preprint2020arXiv

Geometry-Aware Segmentation of Remote Sensing Images via Implicit Height Estimation

Recent studies have shown the benefits of using additional elevation data (e.g., DSM) for enhancing the performance of the semantic segmentation of aerial images. However, previous methods mostly adopt 3D elevation information as additional inputs, whereas in many real-world applications the corresponding DSM information is not at hand and the spatial resolution of the acquired DSM images usually does not match that of the aerial images. To alleviate this data constraint while still taking advantage of 3D elevation information, we introduce a geometry-aware segmentation model that achieves accurate semantic labeling of aerial images via joint height estimation. Instead of using a single-stream encoder-decoder network for semantic labeling, we design a separate decoder branch to predict the height map and use the DSM images as side supervision to train this newly designed decoder branch. In this way, our model does not require DSM as model input and still benefits from the helpful 3D geometric information during training. Moreover, we develop a new geometry-aware convolution module that fuses the 3D geometric features from the height decoder branch and the 2D contextual features from the semantic segmentation branch. The fused feature embeddings can produce geometry-aware segmentation maps with enhanced performance. Our model is trained with DSM images as side supervision, while in the inference stage it does not require DSM data and directly predicts the semantic labels in an end-to-end fashion. Experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method for the semantic segmentation of aerial images. The proposed model achieves remarkable performance on both datasets without using any hand-crafted features or post-processing.

preprint2020arXiv

GP-Aligner: Unsupervised Non-rigid Groupwise Point Set Registration Based On Optimized Group Latent Descriptor

In this paper, we propose a novel method named GP-Aligner to deal with the problem of non-rigid groupwise point set registration. Compared to previous non-learning approaches, our proposed method gains competitive advantages by leveraging the power of deep neural networks to effectively and efficiently learn to align a large number of highly deformed 3D shapes with superior performance. Unlike most learning-based methods that use an explicit feature encoding network to extract the per-shape features and their correlations, our model leverages a model-free learnable latent descriptor to characterize the group relationship. More specifically, for a given group we first define an optimizable Group Latent Descriptor (GLD) to characterize the groupwise relationship among a group of point sets. Each GLD is randomly initialized from a Gaussian distribution and then concatenated with the coordinates of each point of the associated point sets in the group. A neural network-based decoder is further constructed to predict the coherent drifts as the desired transformation from input groups of shapes to aligned groups of shapes. During the optimization process, GP-Aligner jointly updates all GLDs and the weight parameters of the decoder network towards the minimization of an unsupervised groupwise alignment loss. After optimization, for each group our model coherently drives each point set towards a middle, common position (shape) without specifying one as the target. GP-Aligner does not require large-scale training data for network training and can directly align groups of point sets in a one-stage optimization process. GP-Aligner shows improvements in both accuracy and computational efficiency in comparison with state-of-the-art methods for groupwise point set registration. Moreover, GP-Aligner shows great efficiency in aligning a large number of groups of real-world 3D shapes.

preprint2020arXiv

GW190521 and the GWTC-1 Events: Implication on the Black Hole Mass Function of Coalescing Binary Black Hole Systems

With the black hole mass function (BHMF; assuming an exponential cutoff at a mass of $\sim 40\,M_\odot$) of coalescing binary black hole systems constructed with the events detected in the O1 run of the advanced LIGO/Virgo network, Liang et al. (2017) predicted that the birth of the lightest intermediate mass black holes (LIMBHs; with a final mass of $\gtrsim 100\,M_\odot$) is very likely to be caught by the advanced LIGO/Virgo detectors in their O3 run. The O1 and O2 observation run data, however, strongly favor a cutoff of the BHMF much sharper than the exponential one. In this work we show that a power-law function followed by a sudden drop at $\sim 40\,M_\odot$ by a factor of a few tens, and then a new power-law component extending to $\geq 100\,M_\odot$, is consistent with the O1 and O2 observation run data. With this new BHMF, quite a few LIMBH events can be detected in the O3 observation run of advanced LIGO/Virgo. The first LIMBH born in GW190521, an event detected in the early stage of the O3 run of the advanced LIGO/Virgo network, provides additional motivation for our hypothesis.

preprint2020arXiv

Local polynomial regression for pooled response data

We propose local polynomial estimators for the conditional mean of a continuous response when only pooled response data are collected under different pooling designs. Asymptotic properties of these estimators are investigated and compared. Extensive simulation studies are carried out to compare finite sample performance of the proposed estimators under various model settings and pooling strategies. We apply the proposed local polynomial regression methods to two real-life applications to illustrate practical implementation and performance of the estimators for the mean function.

preprint2020arXiv

Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks

In recent years, the Natural Language Inference (NLI) task has garnered significant attention, with new datasets and models achieving near human-level performance on it. However, the full promise of NLI -- particularly that it learns knowledge that should be generalizable to other downstream NLP tasks -- has not been realized. In this paper, we study this unfulfilled promise from the lens of two downstream tasks: question answering (QA), and text summarization. We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise; and that creating new long premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model. We validate our conjecture by showing competitive results on the task of QA and obtaining the best reported results on the task of Checking Factual Correctness of Summaries.

preprint2020arXiv

Materials Requirements of High-Speed and Low-Power Spin-Orbit-Torque Magnetic Random-Access Memory

As spin-orbit-torque magnetic random-access memory (SOT-MRAM) is gathering great interest for next-generation low-power and high-speed on-chip cache memory applications, it is critical to analyze the magnetic tunnel junction (MTJ) properties needed to achieve sub-ns and ~fJ write operation when integrated with CMOS access transistors. In this paper, a 2T-1MTJ cell-level modeling framework for in-plane type Y SOT-MRAM suggests that high spin Hall conductivity and moderate SOT material sheet resistance are preferred. We benchmark the write energy and speed performance of type Y SOT cells based on various SOT materials experimentally reported in the literature, including heavy metals, topological insulators and semimetals. We then carry out detailed benchmarking of the SOT materials Pt, beta-W, and BixSe(1-x) with different thicknesses and resistivities. We further discuss how our 2T-1MTJ model can be expanded to analyze other variations of SOT-MRAM, including perpendicular (type Z) and type X SOT-MRAM, two-terminal SOT-MRAM, as well as spin-transfer-torque (STT) and voltage-controlled magnetic anisotropy (VCMA)-assisted SOT-MRAM. This work will provide essential guidelines for SOT-MRAM materials, devices, and circuits research in the future.

preprint2020arXiv

Modeling Discourse Structure for Document-level Neural Machine Translation

Recently, document-level neural machine translation (NMT) has become a hot topic in the machine translation community. Despite its success, most existing studies ignore the discourse structure information of the input document to be translated, which has been shown to be effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model can significantly outperform both Transformer and Transformer+HAN.

preprint2020arXiv

Negotiation-based Human-Robot Collaboration via Augmented Reality

Effective human-robot collaboration (HRC) requires extensive communication between the human and robot teammates, because their actions can potentially produce conflicts, synergies, or both. We develop a novel augmented reality (AR) interface to bridge the communication gap between human and robot teammates. Building on our AR interface, we develop an AR-mediated, negotiation-based (ARN) framework for HRC. We have conducted experiments both in simulation and on real robots in an office environment, where multiple mobile robots work on delivery tasks. The robots could not complete the tasks on their own and sometimes needed help from their human teammate, rendering human-robot collaboration necessary. Results suggest that ARN significantly reduced the human-robot team's task completion time compared to a non-AR baseline approach.

preprint2020arXiv

Object Detection in the Context of Mobile Augmented Reality

In the past few years, numerous Deep Neural Network (DNN) models and frameworks have been developed to tackle the problem of real-time object detection from RGB images. Ordinary object detection approaches process information from the images only, and they are oblivious to the camera pose with regard to the environment and the scale of the environment. On the other hand, mobile Augmented Reality (AR) frameworks can continuously track a camera's pose within the scene and can estimate the correct scale of the environment by using Visual-Inertial Odometry (VIO). In this paper, we propose a novel approach that combines the geometric information from VIO with semantic information from object detectors to improve the performance of object detection on mobile devices. Our approach includes three components: (1) an image orientation correction method, (2) a scale-based filtering approach, and (3) an online semantic map. Each component takes advantage of the different characteristics of the VIO-based AR framework. We implemented the AR-enhanced features using ARCore and the SSD Mobilenet model on Android phones. To validate our approach, we manually labeled objects in image sequences taken from 12 room-scale AR sessions. The results show that our approach can improve on the accuracy of generic object detectors by 12% on our dataset.

preprint2020arXiv

Observation of the Topologically Originated Edge States in large-gap Quasi-One-Dimensional a-Bi$_4$Br$_4$

Two-dimensional topological insulators feature time-reversal-invariant, spin-momentum-locked one-dimensional (1D) edge states with a linear energy dispersion. However, experimental access to 1D edge states remains a great challenge and has been limited to only a few techniques to date. Here, by using infrared absorption spectroscopy, we observed robust topologically originated edge states in a-Bi$_4$Br$_4$ belts, with the definitive signature of strong infrared absorption at the belt sides and distinct anisotropy with respect to light polarization, which is further supported by first-principles calculations. Our work demonstrates for the first time that infrared spectroscopy can offer a power-efficient approach to experimentally probing 1D edge states of topological materials.

preprint2020arXiv

On the Convergence of FedAvg on Non-IID Data

Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD steps. Importantly, our bound demonstrates a trade-off between communication efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; a low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for FedAvg on non-iid data: the learning rate $η$ must decay, even if the full gradient is used; otherwise, the solution will be $Ω(η)$ away from the optimal.
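
The last claim, that a constant learning rate leaves FedAvg a distance of order $η$ from the optimum on non-iid data even with full gradients, can be seen on a toy problem. The sketch below is an illustration under assumptions and not the paper's experimental setup: each device holds a hypothetical one-dimensional quadratic $f_k(w) = \frac{1}{2} a_k (w - c_k)^2$, devices are weighted equally, local updates use the full gradient, and the optional device sampling without replacement is just one possible participation scheme.

```python
import random


def local_steps(w, a, c, lr, steps):
    """Run full-gradient descent on f(w) = 0.5 * a * (w - c)**2 for a few local steps."""
    for _ in range(steps):
        w -= lr * a * (w - c)
    return w


def fedavg(devices, rounds, steps, lr_schedule, sample_size=None, seed=0):
    """Toy FedAvg: broadcast w, run local steps on each chosen device, average."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(rounds):
        lr = lr_schedule(t)
        # Full participation by default; otherwise sample devices without replacement.
        chosen = devices if sample_size is None else rng.sample(devices, sample_size)
        updates = [local_steps(w, a, c, lr, steps) for a, c in chosen]
        w = sum(updates) / len(updates)
    return w


# Two heterogeneous devices given as (curvature a_k, local minimizer c_k).
devices = [(1.0, 0.0), (4.0, 1.0)]
# Minimizer of the average objective: sum(a_k * c_k) / sum(a_k).
w_star = sum(a * c for a, c in devices) / sum(a for a, _ in devices)

w_const = fedavg(devices, rounds=2000, steps=5, lr_schedule=lambda t: 0.1)
w_decay = fedavg(devices, rounds=2000, steps=5, lr_schedule=lambda t: 0.1 / (1.0 + 0.1 * t))

gap_const = abs(w_const - w_star)
gap_decay = abs(w_decay - w_star)
```

With five local steps per round, the constant-rate run settles at a biased fixed point, while the decaying schedule drives the gap towards zero. With a single local step per round the bias disappears in this toy problem, which illustrates the trade-off between communication frequency and accuracy.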

preprint2020arXiv

One-Shot Object Detection without Fine-Tuning

Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited. In this paper, we attempt to enrich such categories by addressing the one-shot object detection problem, where the number of annotated training examples for learning an unseen class is limited to one. We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module, the combination of which integrates metric learning with an anchor-free Faster R-CNN-style detection pipeline, eventually eliminating the need to fine-tune on the support images. We also propose novel training strategies that effectively improve detection performance. Extensive quantitative and qualitative evaluations were performed and our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.

preprint2020arXiv

Online Speaker Diarization with Relation Network

In this paper, we propose an online speaker diarization system based on Relation Network, named RenoSD. Unlike conventional diarization systems, which consist of several independently optimized modules, RenoSD implements voice-activity detection (VAD), embedding extraction, and speaker identity association using a single deep neural network. The most striking feature of RenoSD is that it adopts a meta-learning strategy for speaker identity association. In particular, the relation network learns a deep distance metric in a data-driven way and can determine through a simple forward pass whether two given segments belong to the same speaker. As such, RenoSD can be performed in an online manner with low latency. Experimental results on the AMI and CALLHOME datasets show that the proposed RenoSD system achieves consistent improvements over the state-of-the-art x-vector baseline. Compared with an existing online diarization system named UIS-RNN, RenoSD achieves better performance using much less training data and at a lower time complexity.

preprint2020arXiv

Robust Image Matching By Dynamic Feature Selection

Estimating dense correspondences between images is a long-standing image understanding task. Recent works introduce convolutional neural networks (CNNs) to extract high-level feature maps and find correspondences through feature matching. However, high-level feature maps have low spatial resolution and are therefore insufficient to provide accurate and fine-grained features to distinguish intra-class variations for correspondence matching. To address this problem, we generate robust features by dynamically selecting features at different scales. To resolve two critical issues in feature selection, i.e., how many and which scales of features to select, we frame the feature selection process as a sequential Markov decision-making process (MDP) and introduce an optimal selection strategy using reinforcement learning (RL). We define an RL environment for image matching in which each individual action either requires new features or terminates the selection episode by referring to a matching score. Deep neural networks are incorporated into our method and trained for decision making. Experimental results show that our method achieves performance comparable or superior to state-of-the-art methods on three benchmarks, demonstrating the effectiveness of our feature selection strategy.

preprint2020arXiv

Simulating multi-exit evacuation using deep reinforcement learning

Conventional simulations on multi-exit indoor evacuation focus primarily on how to determine a reasonable exit based on numerous factors in a changing environment. Results commonly include some congested and other under-utilized exits, especially with massive numbers of pedestrians. We propose a multi-exit evacuation simulation based on Deep Reinforcement Learning (DRL), referred to as the MultiExit-DRL, which involves a Deep Neural Network (DNN) framework to facilitate state-to-action mapping. The DNN framework applies Rainbow Deep Q-Network (DQN), a DRL algorithm that integrates several advanced DQN methods, to improve data utilization and algorithm stability, and further divides the action space into eight isometric directions for possible pedestrian choices. We compare MultiExit-DRL with two conventional multi-exit evacuation simulation models in three separate scenarios: 1) varying pedestrian distribution ratios, 2) varying exit width ratios, and 3) varying open schedules for an exit. The results show that MultiExit-DRL presents great learning efficiency while reducing the total number of evacuation frames in all designed experiments. In addition, the integration of DRL allows pedestrians to explore other potential exits and helps determine optimal directions, leading to the high efficiency of exit utilization.
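
As a rough illustration of the discrete action space mentioned above, the following Python sketch enumerates eight movement directions. It assumes the directions are evenly spaced at 45-degree intervals on a 2D floor plan; the names DIRECTIONS and step, and the speed parameter, are hypothetical and not taken from the paper.

```python
import math

# Assumption: the eight actions are unit vectors spaced 45 degrees apart.
DIRECTIONS = [
    (round(math.cos(math.radians(45 * k)), 6),
     round(math.sin(math.radians(45 * k)), 6))
    for k in range(8)
]

def step(position, action, speed=1.0):
    """Move a pedestrian one frame in the direction chosen by the agent."""
    dx, dy = DIRECTIONS[action]
    return (position[0] + speed * dx, position[1] + speed * dy)

# Action 2 points along the positive y axis under this layout.
new_position = step((0.0, 0.0), 2)
```

A learned Q-network would output one value per entry of DIRECTIONS, and the pedestrian would move along the highest-valued direction.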

preprint2020arXiv

Spectral-change enhancement with prior SNR for the hearing impaired

A previous signal processing algorithm that aimed to enhance spectral changes (SCE) over time showed benefit for hearing-impaired (HI) listeners to recognize speech in background noise. In this work, the previous SCE was manipulated to perform on target-dominant segments, rather than treating all frames equally. Instantaneous signal-to-noise ratios (SNRs) were calculated to determine whether the segments should be processed. Initially, the ideal SNR calculated by the knowledge of premixed signals was introduced to the previous SCE algorithm (SCE-iSNR). Speech intelligibility (SI) and clarity preference were measured for 12 HI listeners in steady speech-spectrum noise (SSN) and six-talk speech (STS) maskers, respectively. The results showed the SCE-iSNR algorithm improved SI significantly for both maskers at high signal-to-masker ratios (SMRs) and for STS masker at low SMRs, while processing effect on speech quality was small. Secondly, the estimated SNR obtained from real mixtures was used, resulting in another SCE-eSNR. SI and subjective rating on naturalness and speech quality were tested for 7 HI subjects. The SCE-eSNR algorithm showed improved SI for SSN masker at high SMRs and for STS masker at low SMRs, as well as better naturalness and speech quality for STS masker. The limitations of applying the algorithms are discussed.

preprint2020arXiv

Spin-Orbit-Torque Field-Effect Transistor (SOTFET): Proposal for a New Magnetoelectric Memory

Spin-based memories are attractive for their non-volatility and high durability but provide modest resistance changes, whereas semiconductor logic transistors are capable of large resistance changes, but lack memory function with high durability. The recent availability of multiferroic materials provides an opportunity to directly couple the change in spin states of a magnetic memory to a charge change in a semiconductor transistor. In this work, we propose and analyze the spin-orbit torque field-effect transistor (SOTFET), a device with the potential to significantly boost the energy efficiency of spin-based memories, and to simultaneously offer a palette of new functionalities.

preprint2020arXiv

Supervised Community Detection with Line Graph Neural Networks

Traditionally, community detection in graphs can be solved using spectral methods or posterior inference under probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational detection thresholds in terms of the signal-to-noise ratio. By recasting community detection as a node-wise classification problem on graphs, we can also study it from a learning perspective. We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. We show that, in a data-driven manner and without access to the underlying generative models, they can match or even surpass the performance of the belief propagation algorithm on binary and multi-class stochastic block models, which is believed to reach the computational threshold. In particular, we propose to augment GNNs with the non-backtracking operator defined on the line graph of edge adjacencies. Our models also achieve good performance on real-world datasets. In addition, we perform the first analysis of the optimization landscape of training linear GNNs for community detection problems, demonstrating that under certain simplifications and assumptions, the loss values at local and global minima are not far apart.
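
The non-backtracking operator mentioned above can be sketched in a few lines of Python. This is a minimal illustration under the common definition (an entry is 1 when directed edge i→j is followed by j→l with l ≠ i), not the authors' implementation; the function name non_backtracking_matrix is hypothetical.

```python
from itertools import product

def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected graph given as (u, v) pairs."""
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    index = {e: i for i, e in enumerate(directed)}
    n = len(directed)
    B = [[0] * n for _ in range(n)]
    for (i, j), (k, l) in product(directed, directed):
        # Edge (i -> j) can be followed by (k -> l) when j == k,
        # unless the walk would turn straight back to i.
        if j == k and l != i:
            B[index[(i, j)]][index[(k, l)]] = 1
    return directed, B

# Small example: a triangle has 6 directed edges.
triangle = [(0, 1), (1, 2), (2, 0)]
directed, B = non_backtracking_matrix(triangle)
```

In the triangle, every directed edge has exactly one admissible successor, so each row of B contains a single 1.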

preprint2020arXiv

TEDL: A Text Encryption Method Based on Deep Learning

Recent years have seen an increasing emphasis on information security, and various encryption methods have been proposed. However, for symmetric encryption methods, the well-known encryption techniques still rely on the key space to guarantee security and suffer from frequent key updating. Aiming to solve those problems, this paper proposes a novel text encryption method based on deep learning called TEDL, where the secret key includes hyperparameters of the deep learning model and the core step of encryption is transforming input data into weights trained under those hyperparameters. Firstly, both communication parties establish a word vector table by training a deep learning model according to specified hyperparameters. Then, a self-update codebook is constructed on the word vector table with the SHA-256 function and other tricks. When communication starts, encryption and decryption are equivalent to indexing and inverted indexing on the codebook, respectively, thus achieving the transformation between plaintext and ciphertext. Results of experiments and relevant analyses show that TEDL performs well in terms of security, efficiency, and generality, and has a lower demand for the frequency of key redistribution. In particular, as a supplement to current encryption methods, the time-consuming process of constructing a codebook increases the difficulty of brute-force attacks while not degrading the communication efficiency.

preprint2020arXiv

Time-constrained Adaptive Influence Maximization

The well-known influence maximization problem aims at maximizing the influence of one information cascade in a social network by selecting appropriate seed users prior to the diffusion process. In its adaptive version, additional seed users can be selected after observing certain diffusion results. On the other hand, social computing tasks are often time-critical, and therefore only the influence achieved in the early period is worthwhile, which can be naturally modeled by enforcing a time constraint. In this paper, we present an analysis of the time-constrained adaptive influence maximization problem. We show that the new problem is combinatorially different from the existing problems, and that current techniques such as submodular maximization and adaptive submodularity are unfortunately inapplicable. On the theory side, we provide hardness results for computing the optimal policy and a lower bound on the adaptive gap. For practical solutions, from basic to advanced, we design a series of seeding policies for achieving high efficacy and scalability. Finally, we investigate the proposed solutions through extensive simulations based on real-world datasets.

preprint2020arXiv

Ultralong carrier lifetime of topological edge states in a-Bi4Br4

The rise of quantum spin Hall insulators (QSHI) in two-dimensional (2D) systems has been attracting significant interest in current research, for which the 1D helical edge states, a hallmark of QSHI, are widely expected to be a promising platform for next-generation optoelectronics. However, the dynamics of the 1D edge states has not yet been experimentally addressed. Here, we report the observation of the optical response of the topological helical edge states in a-Bi4Br4, using infrared-pump infrared-probe microscopic spectroscopy. Remarkably, we observe that the carrier lifetime of the helical edge states reaches the nanosecond scale at room temperature, which is about 2-3 orders of magnitude longer than that of most 2D topological surface states and is even comparable with that of the well-developed optoelectronic semiconductors used in modern industry. The ultralong carrier lifetime of the topological edge states may be attributed to their helical and 1D nature. Our findings not only provide an ideal material for further investigations of the carrier dynamics of 1D helical edge states but also pave the way for its application in optoelectronics.

preprint2020arXiv

Unsupervised Learning of 3D Point Set Registration

Point cloud registration is the process of aligning a pair of point sets via searching for a geometric transformation. Recent works leverage the power of deep learning for registering a pair of point sets. However, deep learning models often require a large number of ground truth labels for training. Moreover, for a pair of source and target point sets, existing deep learning mechanisms require explicitly designed encoders to extract both deep spatial features from unstructured point clouds and their spatial correlation representation, which is further fed to a decoder to regress the desired geometric transformation for point set alignment. To further enhance deep learning models for point set registration, this paper proposes Deep-3DAligner, a novel unsupervised registration framework based on a newly introduced deep Spatial Correlation Representation (SCR) feature. The SCR feature describes the geometric essence of the spatial correlation between source and target point sets in an encoding-free manner. More specifically, our method starts with optimizing a randomly initialized latent SCR feature, which is then decoded to a geometric transformation (i.e., rotation and translation) to align source and target point sets. Our Deep-3DAligner jointly updates the SCR feature and weights of the transformation decoder towards the minimization of an unsupervised alignment loss. We conducted experiments on the ModelNet40 dataset to validate the performance of our unsupervised Deep-3DAligner for point set registration. The results demonstrated that, even without ground truth and any assumption of a direct correspondence between source and target point sets for training, our proposed approach achieved performance comparable to the most recent supervised state-of-the-art approaches.

preprint2020arXiv

Unsupervised Learning of Global Registration of Temporal Sequence of Point Clouds

Global registration of point clouds aims to find an optimal alignment of a sequence of 2D or 3D point sets. In this paper, we present a novel method that takes advantage of current deep learning techniques for unsupervised learning of global registration from a temporal sequence of point clouds. Our key novelty is that we introduce a deep Spatio-Temporal REPresentation (STREP) feature, which describes the geometric essence of both the temporal and spatial relationships of the sequence of point clouds acquired with sensors in an unknown environment. In contrast to the previous practice that treats each time step (pair-wise registration) individually, our unsupervised model starts with optimizing a sequence of latent STREP features, which is then decoded to a temporally and spatially continuous sequence of geometric transformations to globally align multiple point clouds. We have evaluated our proposed approach over both simulated 2D and real 3D datasets and the experimental results demonstrate that our method can beat other techniques by taking into account the temporal information in deep feature learning.

preprint2020arXiv

Unsupervised Partial Point Set Registration via Joint Shape Completion and Registration

We propose a self-supervised method for partial point set registration. While recently proposed learning-based methods have achieved impressive registration performance on full shape observations, these methods mostly suffer from performance degradation when dealing with partial shapes. To bridge the performance gap between partial and full point set registration, we propose to incorporate a shape completion network to benefit the registration process. To achieve this, we design a latent code for each pair of shapes, which can be regarded as a geometric encoding of the target shape. By doing so, our model does not need an explicit feature embedding network to learn the feature encodings. More importantly, both our shape completion network and the point set registration network take the shared latent codes as input, which are optimized along with the parameters of the two decoder networks in the training process. The point set registration process can thus benefit from the joint optimization of the latent codes, which are enforced to represent the information of the full shape instead of partial ones. In the inference stage, we fix the network parameters and optimize the latent codes to get the optimal shape completion and registration results. Our proposed method is purely unsupervised and does not need any ground truth supervision. Experiments on the ModelNet40 dataset demonstrate the effectiveness of our model for partial point set registration.

preprint2019arXiv

Directionally Constrained Fully Convolutional Neural Network For Airborne Lidar Point Cloud Classification

Point cloud classification plays an important role in a wide range of airborne light detection and ranging (LiDAR) applications, such as topographic mapping, forest monitoring, power line detection, and road detection. However, due to the sensor noise, high redundancy, incompleteness, and complexity of airborne LiDAR systems, point cloud classification is challenging. In this paper, we proposed a directionally constrained fully convolutional neural network (D-FCN) that can take the original 3D coordinates and LiDAR intensity as input; thus, it can directly apply to unstructured 3D point clouds for semantic labeling. Specifically, we first introduce a novel directionally constrained point convolution (D-Conv) module to extract locally representative features of 3D point sets from the projected 2D receptive fields. To make full use of the orientation information of neighborhood points, the proposed D-Conv module performs convolution in an orientation-aware manner by using a directionally constrained nearest neighborhood search. Then, we designed a multiscale fully convolutional neural network with downsampling and upsampling blocks to enable multiscale point feature learning. The proposed D-FCN model can therefore process input point cloud with arbitrary sizes and directly predict the semantic labels for all the input points in an end-to-end manner. Without involving additional geometry features as input, the proposed method has demonstrated superior performance on the International Society for Photogrammetry and Remote Sensing (ISPRS) 3D labeling benchmark dataset. The results show that our model has achieved a new state-of-the-art level of performance with an average F1 score of 70.7%, and it has improved the performance by a large margin on categories with a small number of points (such as powerline, car, and facade).

preprint2019arXiv

Generative adversarial networks (GAN) based efficient sampling of chemical space for inverse design of inorganic materials

A major challenge in materials design is how to efficiently search the vast chemical design space to find the materials with desired properties. One effective strategy is to develop sampling algorithms that can exploit both explicit chemical knowledge and implicit composition rules embodied in the large materials database. Here, we propose a generative machine learning model (MatGAN) based on a generative adversarial network (GAN) for efficient generation of new hypothetical inorganic materials. Trained with materials from the ICSD database, our GAN model can generate hypothetical materials not existing in the training dataset, reaching a novelty of 92.53% when generating 2 million samples. The percentage of chemically valid (charge neutral and electronegativity balanced) samples out of all generated ones reaches 84.5% by our GAN when trained with materials from ICSD even though no such chemical rules are explicitly enforced in our GAN model, indicating its capability to learn implicit chemical composition rules. Our algorithm could be used to speed up inverse design or computational screening of inorganic materials.

preprint2019arXiv

Image-free real-time classification of fast moving objects using 'learned' spatial light modulation and a single-pixel detector

Object classification generally relies on image acquisition and analysis. Real-time classification of high-speed moving objects is challenging, as both high temporal resolution in image acquisition and low computational complexity in object classification algorithms are required. Here we propose and experimentally demonstrate an approach for real-time moving-object classification without image acquisition. As object classification algorithms rely on the feature information of objects, we propose to use spatial light modulation to acquire the feature information directly rather than performing image acquisition followed by feature extraction. A convolutional neural network is designed and trained to learn the spatial features of the target objects. The trained network can generate structured patterns for spatial light modulation. Using the resulting structured patterns for spatial light modulation, the feature information of target objects can be compressively encoded into a short light intensity sequence. The resulting one-dimensional signal is collected by a single-pixel detector and fed to the convolutional neural network for object classification. As experimentally demonstrated, the proposed approach can achieve accurate and real-time classification of fast moving objects. The proposed method has potential applications in fields where real-time classification of fast moving objects over long durations is required.

preprint2019arXiv

New Insight about the Effective Restoration of U_A(1) Symmetry

The effective restoration of the U_{A}(1) symmetry is revisited by implementing the functional renormalization group approach combined with the 2+1 flavor Polyakov-loop quark-meson model. A temperature-dependent 't Hooft term is taken to imitate the restoration of the U_{A}(1) symmetry. Order parameters, meson spectrum and mixing angles, and especially the pressure and the entropy density of the system, are calculated to explore the effects of different U_{A}(1) symmetry restoration patterns. We then show that the temperature for the restoration of the U_{A}(1) symmetry is much higher than that for the chiral symmetry SU_{A}(3).

preprint2018arXiv

Wireless Communications with Programmable Metasurface: Transceiver Design and Experimental Results

Metasurfaces have drawn significant attention due to their superior capability in tailoring electromagnetic waves over a wide frequency range, from microwave to visible light. Recently, programmable metasurfaces have demonstrated the ability to manipulate the amplitude or phase of electromagnetic waves in a programmable manner in real time, which renders them especially appealing in the applications of wireless communications. To practically demonstrate the feasibility of programmable metasurfaces in future communication systems, in this paper, we design and realize a novel metasurface-based wireless communication system. By exploiting the dynamically controllable property of the programmable metasurface, we first introduce the fundamental principle of the metasurface-based wireless communication system design. We then present the design, implementation and experimental evaluation of the proposed metasurface-based wireless communication system with a prototype, which realizes single carrier quadrature phase shift keying (QPSK) transmission over the air. In the developed prototype, the phase of the reflected electromagnetic wave of the programmable metasurface is directly manipulated in real time according to the baseband control signal, which achieves a 2.048 Mbps data transfer rate with video streaming transmission over the air. Experimental results are provided to compare the performance of the proposed metasurface-based architecture against the conventional one. With a slight increase of the transmit power by 5 dB, the same bit error rate (BER) performance can be achieved as the conventional system in the absence of channel coding. Such a result is encouraging considering that the metasurface-based system has the advantages of low hardware cost and simple structure, thus leading to a promising new architecture for wireless communications.

preprint2016arXiv

A GeV source in the direction of Supernova Remnant CTB 37B

Supernova remnants (SNRs) are the most attractive candidates for the acceleration sites of Galactic cosmic rays. We report the detection of GeV $γ$-ray emission with the Pass 8 events recorded by Fermi Large Area Telescope (Fermi-LAT) in the vicinity of the shell type SNR CTB 37B that is likely associated with the TeV $γ$-ray source HESS J1713-381. The photon spectrum of CTB 37B is consistent with a power-law with an index of $1.89\pm0.08$ in the energy range of $0.5-500$ GeV, and the measured flux connects smoothly with that of HESS J1713-381 at a few hundred GeV. No significant spatial extension or time variation is detected. The multi-wavelength data can be well fitted with either a leptonic model or a hadronic one. However, parameters of both models suggest more efficient particle acceleration than typical SNRs. Meanwhile, the X-ray and $γ$-ray spectral properties of CTB 37B show that it is an interesting source bridging young SNRs dominated by non-thermal emission and old SNRs interacting with molecular clouds.

preprint2016arXiv

An Enhanced Deep Feature Representation for Person Re-identification

Feature representation and metric learning are two critical components in person re-identification models. In this paper, we focus on the feature representation and claim that hand-crafted histogram features can be complementary to Convolutional Neural Network (CNN) features. We propose a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation. In FFN, back propagation makes CNN features constrained by the handcrafted features. Utilizing color histogram features (RGB, HSV, YCbCr, Lab and YIQ) and texture features (multi-scale and multi-orientation Gabor features), we get a new deep feature representation that is more discriminative and compact. Experiments on three challenging datasets (VIPeR, CUHK01, PRID450s) validate the effectiveness of our proposal.

preprint2016arXiv

Explore Spatiotemporal and Demographic Characteristics of Human Mobility via Twitter: A Case Study of Chicago

Characterizing human mobility patterns is essential for understanding human behaviors and their interactions with the socioeconomic and natural environment. With the continuing advancement of location and Web 2.0 technologies, location-based social media (LBSM) have been gaining widespread popularity in the past few years. With access to the locations of users, profiles and the contents of social media posts, LBSM data provide a novel data source for human mobility study. By exploiting the explicit location footprints and mining the latent demographic information implied in the LBSM data, the purpose of this paper is to investigate the spatiotemporal characteristics of human mobility with a particular focus on the impact of demography. We first collect geo-tagged Twitter feeds posted in the conterminous United States area, and organize the collection of feeds using the concept of space-time trajectory corresponding to each Twitter user. Common human mobility measures, including detected home and activity centers, are derived for each user trajectory. We then select a subset of Twitter users that have detected home locations in the city of Chicago as a case study, and apply name analysis to the names provided in user profiles to learn the implicit demographic information of Twitter users, including race/ethnicity, gender and age. Finally we explore the spatiotemporal distribution and mobility characteristics of Chicago Twitter users, and investigate the demographic impact by comparing the differences across three demographic dimensions (race/ethnicity, gender and age). We found that, although the human mobility measures of different demographic groups generally follow the generic laws (e.g., power law distribution), the demographic information, particularly the race/ethnicity group, significantly affects the urban human mobility patterns.

preprint2016arXiv

Generalized uncertainty principles, effective Newton constant and regular black holes

In this paper, we explore the quantum spacetimes that are potentially connected with the generalized uncertainty principles. By analyzing the gravity-induced quantum interference pattern and the Gedanken experiment for weighing a photon, we find that the generalized uncertainty principles inspire the same effective Newton constant as in our previous proposal. A characteristic momentum associated with the tidal effect is suggested, which incorporates the quantum effect with the geometric nature of gravity. When the simplest generalized uncertainty principle is considered, the minimal model of the regular black holes is reproduced by the effective Newton constant. The black hole's tunneling probability, accurate to the second order correction, is carefully analyzed. We find that the tunneling probability is regularized by the size of the black hole remnant. Moreover, the black hole remnant is the final state of a tunneling process in which the probability is minimized. A theory of modified gravity is suggested, by substituting the effective Newton constant into the Hilbert-Einstein action.

preprint2016arXiv

GRB/GW association: Long-short GRB candidates, time-lag, measuring gravitational wave velocity and testing Einstein's equivalence principle

Short-duration gamma-ray bursts (SGRBs) are widely believed to be powered by the mergers of compact binaries, such as binary neutron stars or possibly neutron star-black hole binaries. Though the prospect of detecting SGRBs with gravitational wave (GW) signals by the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO)/VIRGO network is promising, no known SGRB has been found within the expected advanced LIGO/VIRGO sensitivity range for binary neutron star systems. We find, however, that the two long-short GRBs (GRB 060505 and GRB 060614) may be within the horizon of advanced GW detectors. In the upcoming era of GW astronomy, the merger origin of some long-short GRBs, as favored by the macronova signature displayed in GRB 060614, can be unambiguously tested. The model-dependent time lags between the merger and the onset of the prompt emission of the GRB are estimated. The comparison of such time lags between model predictions and the real data expected in the era of the GW astronomy would be helpful in revealing the physical processes taking place at the central engine (including the launch of the relativistic outflow, the emergence of the outflow from the dense material ejected during the merger, and the radiation of gamma rays). We also show that the speed of GWs, with or without a simultaneous test of Einstein's equivalence principle, can be directly measured to an accuracy of $\sim 3\times 10^{-8}~{\rm cm~s^{-1}}$ or even better in the advanced LIGO/VIRGO era. The Astrophysical Journal, Volume

preprint2016arXiv

Implications of the Tentative Association between GW150914 and a Fermi-GBM Transient

The merger-driven Gamma-ray Bursts (GRBs) and their associated gravitational wave (GW) radiation, if both successfully detected, have some far-reaching implications, including for instance: (i) The statistical comparison of the physical properties of the short/long-short GRBs with and without GW detection can test the general origin model; (ii) Revealing the physical processes taking place at the central engine; (iii) Measuring the velocity of the gravitational wave directly/accurately. In this work we discuss these implications in the case of a possible association of GW150914/GBM transient 150914. We compared GBM transient 150914 with other SGRBs and found that such an event may be a distinct outlier in some statistical diagrams, possibly due to its specific binary-black-hole merger origin. However, the presence of a "new" group of SGRBs with "unusual" physical parameters is also possible. If the outflow of GBM transient 150914 was launched by the accretion onto the nascent black hole, the magnetic activity rather than the neutrino process is likely responsible for the energy extraction and the accretion disk mass is estimated to be $\sim 10^{-5}~M_\odot$. The GW150914/GBM transient 150914 association, if confirmed, would provide the first opportunity to directly measure the GW velocity, and its departure from the speed of light should be within a factor of $\sim 10^{-17}$.

preprint2016arXiv

Interfacial Control of Dzyaloshinskii-Moriya Interaction in Heavy Metal/Ferromagnetic Metal Thin Film Heterostructures

The interfacial Dzyaloshinskii-Moriya Interaction (DMI) in ultrathin magnetic thin film heterostructures provides a new approach for controlling spin textures on mesoscopic length scales. Here we investigate the dependence of the interfacial DMI constant D on a Pt wedge insertion layer in Ta/CoFeB/Pt(wedge)/MgO thin films by observing the asymmetric spin wave dispersion using Brillouin light scattering. Continuous tuning of D by more than a factor of three is realized by inserting less than one monolayer of Pt. The observations provide new insights for designing magnetic thin film heterostructures with tailored D for controlling skyrmions and magnetic domain wall chirality and dynamics.

preprint2016arXiv

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, which is far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrifice of accuracy (it achieves similar, if not better, perplexity as compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
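
The vector-count argument above can be checked with a short Python sketch. It assumes a simple row-major placement of words into a square table, which is an illustrative choice rather than the allocation procedure used in the paper; the function name allocate and the toy vocabulary are hypothetical.

```python
import math

def allocate(vocab):
    """Place each word in a square table; return (side, word -> (row, col))."""
    # Smallest integer side with side * side >= len(vocab), for a non-empty vocab.
    side = math.isqrt(len(vocab) - 1) + 1
    # Assumption: row-major placement by vocabulary index.
    positions = {word: divmod(i, side) for i, word in enumerate(vocab)}
    return side, positions

vocab = [f"w{i}" for i in range(10000)]
side, positions = allocate(vocab)

# One shared vector per row plus one per column, versus one vector per word.
shared_vectors = 2 * side
full_vectors = len(vocab)
row, col = positions["w250"]
```

With 10,000 words the table is 100 by 100, so 200 shared vectors replace 10,000 per-word vectors, and each word is still identified by a unique (row, column) pair.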

preprint2016arXiv

Magnons and Phonons Optically Driven Out of Local Equilibrium in a Magnetic Insulator

Magnons are the energy quanta of fundamental spin excitations, namely spin waves, and they can make a considerable contribution to energy transport in some magnetic materials in a similar manner as lattice vibration waves or phonons. The coupling and possible non-equilibrium between magnons and other energy carriers have been used to explain several recently discovered thermally driven spin transport and energy conversion phenomena. Here, we report experiments in which local non-equilibrium between magnons and phonons in a single crystalline bulk magnetic insulator, Y3Fe5O12 (yttrium iron garnet, or YIG), has been created optically within a focused laser spot and probed directly with the use of micro-Brillouin light scattering (BLS). By analyzing the experimental results with a thermally induced magnon diffusion model, we obtain the magnon diffusion length of thermal magnons. By explicitly establishing non-equilibrium between magnons and phonons, our studies represent an important step toward a quantitative understanding of various spin-heat coupling phenomena.

preprint2016arXiv

Network Clustering via Maximizing Modularity: Approximation Algorithms and Theoretical Limits

Many social networks and complex systems are found to be naturally divided into clusters of densely connected nodes, known as community structure (CS). Finding CS is one of the fundamental yet challenging topics in network science. One of the most popular classes of methods for this problem is to maximize Newman's modularity. However, little is understood about how well we can approximate the maximum modularity, or about the implications of finding community structure with provable guarantees. In this paper, we definitively settle the approximability of modularity clustering, proving that approximating the problem within any (multiplicative) positive factor is intractable, unless P = NP. Yet we propose the first additive approximation algorithm for modularity clustering with a constant factor. Moreover, we provide a rigorous proof that a CS with modularity arbitrarily close to the maximum modularity QOPT might bear no similarity to the optimal CS of maximum modularity. Thus, even when CS with near-optimal modularity are found, other verification methods are needed to confirm the significance of the structure.
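
For readers unfamiliar with the objective being maximized, the following Python sketch evaluates Newman's modularity for a given partition using the standard definition, Q = sum over communities of (internal edges / m) minus (community degree sum / 2m) squared. It only scores a partition; it is not the approximation algorithm proposed in the paper, and the example graph is made up for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs without self-loops or duplicates.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Illustrative graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # 5/14, the two triangles as communities
q_single = modularity(edges, single)  # 0 when everything is one community
print(q_split, q_single)
```

The paper's warning applies to any use of such a score: a partition whose Q is close to the maximum is not guaranteed to resemble the optimal partition.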

preprint2016arXiv

Search for a gamma-ray line feature from a group of nearby Galaxy clusters with Fermi LAT Pass 8 data

Galaxy clusters are the largest gravitationally bound objects in the universe and may be suitable targets for indirect dark matter searches. With 85 months of Fermi-LAT Pass 8 publicly available data, we analyze the gamma-ray emission in the directions of 16 nearby Galaxy Clusters with an unbinned likelihood analysis. No globally statistically-significant $γ-$ray line feature is identified and a tentative line signal may be present at $\sim 43$ GeV. The 95\% confidence level upper limits on the velocity-averaged cross section of dark matter particles annihilating into double $γ-$rays (i.e., $\langle σv \rangle_{χχ\rightarrow γγ}$) are derived. Unless very optimistic boost factors of dark matter annihilation in these Galaxy Clusters have been assumed, such constraints are much weaker than the bounds set by the Galactic $γ-$ray data.

preprint2016arXiv

Search for gamma-ray emission from eight dwarf spheroidal galaxy candidates discovered in Year Two of Dark Energy Survey with Fermi-LAT data

Very recently the Dark Energy Survey (DES) Collaboration has released their second group of dwarf spheroidal (dSph) galaxy candidates. With the publicly available Pass 8 data of Fermi-LAT we search for $γ-$ray emission from the directions of these eight newly discovered dSph galaxy candidates. No statistically significant $γ-$ray signal has been found in the combined analysis of these sources. With the empirically estimated J-factors of these sources, the constraint on the annihilation channel of $χχ\rightarrow τ^{+}τ^{-}$ is comparable to the bound set by the joint analysis of fifteen previously known dSphs with kinematically constrained J-factors for the dark matter mass $m_χ>250$ GeV. In the direction of Tucana III (DES J2356-5935), one of the nearest dSph galaxy candidates that is $\sim 25$ kpc away, there is a weak $γ-$ray signal and its peak test statistic (TS) value for the dark matter annihilation channel $χχ\rightarrow τ^{+}τ^{-}$ is $\approx 6.7$ at $m_χ\sim 15$ GeV. The significance of the possible signal likely increases with time. More data are needed to pin down the physical origin of such a GeV excess.

preprint2016arXiv

Search for gamma-ray line features from Milky Way satellites with Fermi-LAT Pass 8 data

With 91 months of the publicly available Fermi-LAT Pass 8 data, we analyze the gamma-ray emission from the Milky Way satellites to search for potential line signals due to the annihilation of dark matter particles into double photons. The searched targets include a sample of dwarf spheroidal galaxies, the Large Magellanic Cloud (LMC) and the Small Magellanic Cloud (SMC). No significant line emission has been found either in the stacked dwarf galaxy sample or in the direction of the LMC/SMC. The corresponding upper limits on the cross section of DM annihilation into two photons are derived. Compared with results of previous gamma-ray line searches with the Pass 7 data, the current constraints on the line emission from dwarf spheroidal galaxies have been significantly improved in a wide energy range. With the rapid increase of the sample of dwarf spheroidal galaxies (candidates), we expect that the sensitivity of gamma-ray line searches will be significantly improved in the near future.

preprint2016arXiv

StalemateBreaker: A Proactive Content-Introducing Approach to Automatic Human-Computer Conversation

Existing open-domain human-computer conversation systems are typically passive: they either synthesize or retrieve a reply provided a human-issued utterance. It is generally presumed that humans should take the role to lead the conversation and introduce new content when a stalemate occurs, and that the computer only needs to "respond." In this paper, we propose StalemateBreaker, a conversation system that can proactively introduce new content when appropriate. We design a pipeline to determine when, what, and how to introduce new content during human-computer conversation. We further propose a novel reranking algorithm Bi-PageRank-HITS to enable rich interaction between conversation context and candidate replies. Experiments show that both the content-introducing approach and the reranking algorithm are effective. Our full StalemateBreaker model outperforms a state-of-the-practice conversation system by +14.4% p@1 when a stalemate occurs.

preprint2016arXiv

Statistical Properties of the Single Linkage Hierarchical Clustering Estimator

Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis but few authors take account of uncertainty in the distance data. We incorporate a statistical model of the uncertainty through corruption or noise in the pairwise distances and investigate the problem of estimating the HC as unknown parameters from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that under fairly reasonable conditions on the probability distribution governing measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) with some of the information contained in the data ignored. At the same time, we show that direct evaluation of SLHC on maximum likelihood estimation (MLE) of pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to perform better than SLHC in getting the correct HC results for the ground truth metric.

preprint2016arXiv

The Macronova in GRB 050709 and the GRB/macronova connection

GRB 050709 was the first short Gamma-ray Burst (sGRB) with an identified optical afterglow. In this work we report a re-analysis of the publicly available data of this event and the discovery of a Li-Paczynski macronova/kilonova that dominates the optical/IR signal at t $>$ 2.5 days. Such a signal would arise from ~0.05 M$_\odot$ r-process material launched by a compact binary merger. The implied mass ejection supports the suggestion that compact binary mergers are significant and possibly main sites of heavy r-process nucleosynthesis. We have re-analyzed all afterglow data from nearby short and hybrid GRBs. A statistical study of the sGRB/macronova connection reveals that macronovae may have taken place in all these GRBs, though a fraction as low as 0.18 cannot be ruled out. The identification of two of the three macronova candidates in the I-band implies a more promising detection prospect for ground-based surveys.

preprint2016arXiv

Top-push Video-based Person Re-identification

Most existing person re-identification (re-id) models focus on matching still person images across disjoint camera views. Since only limited information can be exploited from still images, it is hard (if not impossible) to overcome the occlusion, pose and camera-view change, and lighting variation problems. In comparison, video-based re-id methods can utilize extra space-time information, which contains much richer cues for matching to overcome the mentioned problems. However, we find that when using a video-based representation, some inter-class differences can be much more obscure than when using a still-image based representation, because different people could not only have similar appearance but also have similar motions and actions which are hard to align. To solve this problem, we propose a top-push distance learning model (TDL), in which we integrate a top-push constraint for matching video features of persons. The top-push constraint enforces the optimization on top-rank matching in re-id, so as to make the matching model more effective at selecting more discriminative features to distinguish different persons. Our experiments show that the proposed video-based re-id framework outperforms the state-of-the-art video-based re-id methods.

preprint2016arXiv

Two are Better than One: An Ensemble of Retrieval- and Generation-Based Dialog Systems

Open-domain human-computer conversation has attracted much attention in the field of NLP. Contrary to rule- or template-based domain-specific dialog systems, open-domain conversation usually requires data-driven approaches, which can be roughly divided into two categories: retrieval-based and generation-based systems. Retrieval systems search for a user-issued utterance (called a query) in a large database, and return a reply that best matches the query. Generative approaches, typically based on recurrent neural networks (RNNs), can synthesize new replies, but they suffer from the problem of generating short, meaningless utterances. In this paper, we propose a novel ensemble of retrieval-based and generation-based dialog systems in the open domain. In our approach, the retrieved candidate, in addition to the original query, is fed to an RNN-based reply generator, so that the neural model is aware of more information. The generated reply is then fed back as a new candidate for post-reranking. Experimental results show that such an ensemble outperforms each single part of it by a large margin.

preprint2015arXiv

19 low mass hyper-velocity star candidates from the first data release of LAMOST survey

Hyper-velocity stars are believed to be ejected from the Galactic center through dynamical interactions between (binary) stars and the central massive black hole(s). In this paper, we report 19 low mass F/G/K type hyper-velocity star candidates from over one million stars of the first data release of the LAMOST general survey. We determine the unbound probability for each candidate using a Monte-Carlo simulation by assuming a non-Gaussian proper-motion error distribution, and Gaussian heliocentric distance and radial velocity error distributions. The simulation results show that all the candidates have unbound probabilities over 50% as expected, and one of them may even exceed escape velocity with over 90% probability. In addition, we compare the metallicities of our candidates with the metallicity distribution functions of the Galactic bulge, disk, halo and globular cluster, and conclude that the Galactic bulge or disk is likely the birth place for our candidates.

preprint2015arXiv

A possible Macronova in the late afterglow of the `long-short' burst GRB 060614

Long-duration ($>2$ s) $γ$-ray bursts that are believed to originate from the death of massive stars are expected to be accompanied by supernovae. GRB 060614, which lasted 102 s, lacks supernova-like emission down to very stringent limits and its physical origin is still debated. Here we report the discovery of a near-infrared bump that is significantly above the regular decaying afterglow. This red bump is inconsistent with even the weakest known supernova. However, it can arise from a Li-Paczyński macronova $-$ the radioactive decay of debris following a compact binary merger. If this interpretation is correct, GRB 060614 arose from a compact binary merger rather than from the death of a massive star and it was a site of significant production of heavy r-process elements. The significant ejected mass favors a black hole-neutron star merger but a double neutron star merger cannot be ruled out.

preprint2015arXiv

Discovery of gamma-ray emission from a strongly lobe-dominated quasar 3C 275.1

We systematically analyze the 6-year {\it Fermi}/LAT data of the lobe-dominated quasars (LDQs) in the complete LDQ sample from 3CRR survey and report the discovery of high-energy $γ$-ray emission from 3C 275.1. The $γ$-ray emission of 3C 207 is confirmed and significant variability of the lightcurve is identified. We do not find statistically significant $γ$-ray emission from other LDQs. 3C 275.1 is the known $γ$-ray quasar with the lowest core dominance parameter (i.e., $R=0.11$). We also show that both the northern radio hotspot and parsec jet models can reasonably reproduce the $γ$-ray data. The parsec jet model, however, is favored by the potential $γ$-ray variability at the timescale of months. We suggest that some dimmer $γ$-ray LDQs will be detected in the future and LDQs could contribute non-negligibly to the extragalactic $γ$-ray background.

preprint2015arXiv

GeV excess in the Milky Way: The Role of Diffuse Galactic gamma ray Emission template

Several groups have analyzed the publicly available Fermi-LAT data and reported a spatially extended $γ-$ray excess of around $1-3$ GeV from the region surrounding the Galactic Center that might originate from annihilation of dark matter particles with a rest mass $m_χ\sim 30-40$ GeV. In this work we examine the role played by the diffuse Galactic gamma ray emission (DGE) templates in suppressing the GeV excess. For this purpose, we adopt in total 128 background templates that have been generated by Ackermann et al. \cite{FermiLAT:2012aa} in the study of the {Fermi-LAT} observations of the diffuse gamma ray emission considering the effects of cosmic rays and the interstellar medium. The possible GeV excess, assumed to follow the spatial distribution of the prompt gamma-rays produced in the annihilation of dark matter particles taking a generalized NFW profile with an inner slope $α=1.2$, has been analyzed in some regions of interest. The introduction of such an additional component centered at the Galactic center is found to have improved the goodness of fit to the data significantly in all background template models regardless of whether the excess spectrum is fixed or not. Our results thus suggest that the presence of a statistically significant GeV excess in the inner Galaxy is robust, though its spectrum depends on the DGE model adopted in the analysis. The possible physical origin of the GeV excess component is discussed and in the dark matter model the annihilation cross section of such particles is evaluated.

preprint2015arXiv

Identifying spatial invasion of pandemics on metapopulation networks via anatomizing arrival history

Spatial spread of infectious diseases among populations via the mobility of humans is highly stochastic and heterogeneous. Accurate forecast/mining of the spread process is often hard to achieve using statistical or mechanical models. Here we propose a new reverse problem, which aims to identify the stochastic spatial spread process itself from observable information regarding the arrival history of infectious cases in each subpopulation. We solve the problem by developing an efficient optimization algorithm based on dynamic programming, which comprises three procedures: i, anatomizing the whole spread process among all subpopulations into disjoint componential patches; ii, inferring the most probable invasion pathways underlying each patch via maximum likelihood estimation; iii, recovering the whole process by assembling the invasion pathways in each patch iteratively, without the burden of parameter calibration and computer simulation. Based on entropy theory, we introduce an identifiability measure to assess how difficult it is to identify an invasion pathway. Results on both artificial and empirical metapopulation networks show robust performance in identifying the actual invasion pathways driving pandemic spread.

preprint2015arXiv

Limitation of the Least Square Method in the Evaluation of Dimension of Fractal Brownian Motions

With the standard deviation for the logarithm of the re-scaled range $\langle |F(t+τ)-F(t)|\rangle$ of simulated fractal Brownian motions $F(t)$ given in a previous paper \cite{q14}, the method of least squares is adopted to determine the slope, $S$, and intercept, $I$, of the log$(\langle |F(t+τ)-F(t)|\rangle)$ vs $\rm{log}(τ)$ plot to investigate the limitation of this procedure. It is found that the reduced $χ^2$ of the fitting decreases with the increase of the Hurst index, $H$ (the expectation value of $S$), which may be attributed to the correlation among the re-scaled ranges. Similarly, it is found that the errors of the fitting parameters $S$ and $I$ are usually smaller than their corresponding standard deviations. These results show the limitation of using the simple least square method to determine the dimension of a fractal time series. Nevertheless, they may be used to reinterpret the fitting results of the least square method to determine the dimension of fractal Brownian motions more self-consistently. The currency exchange rate between Euro and Dollar is used as an example to demonstrate this procedure and a fractal dimension of 1.511 is obtained for spans greater than 30 transactions.
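
The fitting step discussed in this abstract can be sketched in a few lines of Python. The sketch below performs an ordinary, unweighted least-squares fit of log(range) against log(τ) on noise-free synthetic input; the function name, the synthetic data, and the conversion D = 2 − S (a commonly used relation for the graph of a fractional Brownian motion, not stated in the abstract) are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, noise-free input: <|F(t+tau) - F(t)|> = 2 * tau**H with H = 0.5.
hurst_true = 0.5
lags = [1, 2, 4, 8, 16, 32]
ranges = [2.0 * tau ** hurst_true for tau in lags]

S, I = fit_line([math.log10(tau) for tau in lags],
                [math.log10(r) for r in ranges])
D = 2.0 - S  # assumed relation between slope and fractal dimension
print(S, I, D)
```

This plain fit treats the points as independent with equal weight, which is precisely the simplification whose limits the abstract examines: correlations among the re-scaled ranges mean the reduced $χ^2$ and the formal parameter errors from such a fit should be interpreted with care.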

preprint2015arXiv

Maximizing the Sum Rate in Cellular Networks Using Multi-Convex Optimization

In this paper, we propose a novel algorithm to maximize the sum rate in interference-limited scenarios where each user decodes its own message in the presence of unknown interference and noise, considering the signal-to-interference-plus-noise ratio. It is known that the problem of adapting the transmit and receive filters of the users to maximize the sum rate with a sum transmit power constraint is non-convex. Our novel approach is to formulate the sum rate maximization problem as an equivalent multi-convex optimization problem by adding two sets of auxiliary variables. An iterative algorithm which alternately adjusts the system variables and the auxiliary variables is proposed to solve the multi-convex optimization problem. The proposed algorithm is applied to a downlink cellular scenario consisting of several cells, each of which contains a base station serving several mobile stations. We examine two cases, with and without several half-duplex amplify-and-forward relays assisting the transmission. A sum power constraint at the base stations and a sum power constraint at the relays are assumed. Finally, we show that the proposed multi-convex formulation of the sum rate maximization problem is applicable to many other wireless systems in which the estimated data symbols are multi-affine functions of the system variables.

preprint2015arXiv

Maximum Likelihood Estimation for Single Linkage Hierarchical Clustering

We derive a statistical model for estimation of a dendrogram from single linkage hierarchical clustering (SLHC) that takes account of uncertainty through noise or corruption in the measurements of separation of data. Our focus is on just the estimation of the hierarchy of partitions afforded by the dendrogram, rather than the heights in the latter. The concept of estimating this "dendrogram structure" is introduced, and an approximate maximum likelihood estimator (MLE) for the dendrogram structure is described. These ideas are illustrated by a simple Monte Carlo simulation that, at least for small data sets, suggests the method outperforms SLHC in the presence of noise.

preprint2015arXiv

The `excess' of primary cosmic ray electrons

With the accurate cosmic ray (CR) electron and positron spectra (denoted as $Φ_{\rm e^{-}}$ and $Φ_{\rm e^{+}}$, respectively) measured by AMS-02 collaboration, the difference between the electron and positron fluxes (i.e., $ΔΦ=Φ_{\rm e^{-}}-Φ_{\rm e^{+}}$), dominated by the propagated primary electrons, can be reliably inferred. In the standard model, the spectrum of propagated primary CR electrons at energies $\geq 30$ GeV softens with the increase of energy. The absence of any evidence for such a continuous spectral softening in $ΔΦ$ strongly suggests a significant `excess' of primary CR electrons and at energies of $100-400$ GeV the identified excess component has a flux comparable to that of the observed positron excess. Middle-age but `nearby' supernova remnants (e.g., Monogem and Geminga) are favored sources for such an excess.

preprint2015arXiv

The lightcurve of the macronova associated with the long-short burst GRB 060614

The {\it Swift}-detected GRB 060614 was a unique burst that straddles an imaginary divide between long- and short-duration gamma-ray bursts (GRBs), and its physical origin has been heavily debated over the years. Recently, a distinct very-soft F814W-band excess at $t\sim 13.6$ days after the burst was identified in a joint-analysis of VLT and HST optical afterglow data of GRB~060614, which has been interpreted as evidence for an accompanying Li-Paczynski macronova (also called a kilonova). Under the assumption that the afterglow data in the time interval of $1.7-3.0$ days after the burst are due to external forward shock emission, when this assumption is extrapolated to later times it is found that there is an excess of flux in several multi-band photometric observations. This component emerges at $\sim$4 days after the burst, and it may represent the first time that a multi-epoch/band lightcurve of a macronova has been obtained. The macronova associated with GRB 060614 peaked at $t\lesssim 4$ days after the burst, which is significantly earlier than that observed for a supernova associated with a long-duration GRB. Due to the limited data, no strong evidence for a temperature evolution is found. We derive a conservative estimate of the macronova rate of $\sim 16.3^{+16.3}_{-8.2}~{\rm Gpc^{-3}}{\rm yr^{-1}}$, implying a promising prospect for detecting the gravitational wave radiation from compact object mergers by upcoming Advanced LIGO/VIRGO/KAGRA detectors (i.e., the rate is ${\cal R}_{\rm GW} \sim 0.5^{+0.5}_{-0.25}(D/200~{\rm Mpc})^{3}~{\rm yr^{-1}}$).
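
The conversion from the volumetric macronova rate to the quoted detection rate can be checked with a short Python sketch. It assumes a simple Euclidean sphere of radius D and that every merger inside it is detected; these simplifications and the function name are illustrative assumptions, not details given in the abstract.

```python
import math


def events_per_year(rate_per_gpc3_yr: float, horizon_mpc: float) -> float:
    """Expected events per year inside a Euclidean sphere of the given radius."""
    radius_gpc = horizon_mpc / 1000.0
    volume_gpc3 = 4.0 / 3.0 * math.pi * radius_gpc ** 3
    return rate_per_gpc3_yr * volume_gpc3


central = events_per_year(16.3, 200.0)       # central value of the rate
upper = events_per_year(16.3 + 16.3, 200.0)  # upper bound of the rate
lower = events_per_year(16.3 - 8.2, 200.0)   # lower bound of the rate
print(central, lower, upper)
```

Under these assumptions the central value comes out at roughly 0.5 events per year for D = 200 Mpc, with the bounds near 0.27 and 1.1, in line with the quoted ${\cal R}_{\rm GW}$; the $(D/200~{\rm Mpc})^{3}$ factor is the cubic volume scaling, so doubling the horizon multiplies the rate by eight.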

preprint2014arXiv

A 2D based Partition Strategy for Solving Ranking under Team Context (RTP)

In this paper, we propose a 2D-based partition method for solving the problem of Ranking under Team Context (RTC) on datasets without a priori knowledge. We first map the data into 2D space using its minimum and maximum values among all dimensions. Then we construct window queries that take the current team context into consideration. In addition, during the query mapping procedure, we can pre-prune some tuples that are not top-ranked. This pre-classification step defers the processing of those tuples and can save cost while still providing solutions to the problem. Experiments show that our algorithm produces correct results and performs well, especially on large datasets.

preprint2014arXiv

A Search for Strongly Mg-enhanced Stars from the Sloan Digital Sky Survey

Strongly Mg-enhanced stars with [Mg/Fe] $>$ 1.0 show peculiar abundance patterns and hence are of great interest for our understanding of stellar formation and chemical evolution of the Galaxy. A systematic search for strongly Mg-enhanced stars based on the low-resolution $(R\simeq2000)$ spectra of the Sloan Digital Sky Survey (SDSS) is carried out by finding the best matched synthetic spectrum to the observed one in the region of Mg I b lines around $λ$5170Å$~$via a profile matching method. The advantage of our method is that fitting parameters are refined by reproducing the [Mg/Fe] ratios of 47 stars from very precise high-resolution spectroscopic (HRS) analysis by Nissen and Schuster (2010); these parameters are crucial to the precision and validity of the derived Mg abundances. As a further check of our method, Mg abundances are estimated for member stars in four Galactic globular clusters (M92, M13, M3, M71) which cover the same metallicity range as our sample, and the results are in good agreement with those of HRS analysis in the literature. The validity of our method is further supported by the agreement of [Mg/Fe] between our values and those of the HRS analysis by Aoki et al. (2013). Finally, 33 candidates of strongly Mg-enhanced stars with [Mg/Fe]$>$1.0 are selected from 14850 F and G stars. Follow-up observations will be carried out on these candidates with high-resolution spectroscopy by large telescopes in the near future, so as to check our selection procedure, to perform a precise and detailed abundance analysis, and to explore the origins of these stars.

preprint2014arXiv

Discovering and Predicting Temporal Patterns of WiFi-interactive Social Populations

Extensive efforts have been devoted to characterizing the rich connectivity patterns among the nodes (components) of complex networks (systems), and as research in this area has developed, a fundamental question has emerged: how do the fascinating yet complex topological features of a network affect or determine the collective behavior and performance of the networked system? While elegant attempts to address this core issue have been made, for example, from the viewpoints of synchronization, epidemics, evolutionary cooperation, and the control of complex networks, theoretically or empirically, this widely discussed key question still remains open in the newly emergent field of network science. Such fruitful advances also fuel the desire to understand (mobile) social networks and to characterize human social populations through their interdependent collective dynamics and behavioral patterns. Nowadays, a great many digital technologies are unobtrusively embedded into the physical world of daily human activities, offering unparalleled opportunities to digitize human physical interactions at scale, that is, who is in contact with whom at what time. Such powerful technologies include Bluetooth, active Radio Frequency Identification (RFID), wireless sensors and, closer to our interest in this paper, WiFi. As a snapshot of modern society, a university is covered by WiFi signals, and the WiFi system records the digital access logs of authorized WiFi users when they access the campus wireless services. Such WiFi access records serve as an effective indirect proxy for a large-scale population's social interactions.

preprint2014arXiv

Fast radio bursts as a cosmic probe?

We discuss the possibility of using fast radio bursts (FRBs), if cosmological, as a viable cosmic probe. We find that the contribution of the host galaxies to the detected dispersion measures can be negligible for FRBs that do not come from galaxy centers or star forming regions. The inhomogeneity of the intergalactic medium (IGM), however, causes significant deviation of the dispersion measure from that predicted in the simplified homogeneous IGM model for individual events. Fortunately, with sufficient FRBs along different sightlines but within a very narrow redshift interval (e.g., $Δz \sim 0.05$ or $Δz \sim 0.05(1+z)$), the mean from averaging observed dispersion measures does not suffer from this problem and hence may be used as a cosmic probe. We show that in the optimistic case (e.g., tens of FRBs in each $Δz$ have been measured; the most distant FRBs were at redshift $\geq 3$; the host galaxies and the FRB sources contribute little to the detected dispersion measures) and with all the uncertainties (i.e. the inhomogeneity of the IGM, the contribution and uncertainty of host galaxies as well as the evolution and error of $f_{\rm IGM}$) considered, FRBs could help constrain the equation of state of dark energy.

preprint2014arXiv

Model-dependent estimate on the connection between fast radio bursts and ultra-high energy cosmic rays

The existence of fast radio bursts (FRBs), a new type of extragalactic transient, has been established recently and quite a few models have been proposed. In this work we discuss the possible connection between the FRB sources and ultra-high energy ($>10^{18}$ eV) cosmic rays. We show that in the blitzar model and the model of merging binary neutron stars, given the huge energy release of each FRB central engine together with the rather high rate of FRBs, the accelerated EeV cosmic rays may contribute significantly to the observed ones. In other FRB models, including for example the merger of double white dwarfs and the energetic magnetar radio flares, no significant EeV cosmic ray contribution is expected. We also suggest that the mergers of double neutron stars, even if they are irrelevant to FRBs, may play a non-negligible role in producing EeV cosmic ray protons if supramassive neutron stars were formed in a good fraction of mergers and the merger rate is $\gtrsim 10^{3}~{\rm yr^{-1}~ Gpc^{-3}}$. Such a possibility will be unambiguously tested in the era of gravitational wave astronomy.

preprint2014arXiv

Object Selection under Team Context

Context-aware databases have recently drawn increasing attention from both industry and academia by taking users' current situation and environment into consideration. However, most of the literature focuses on individual context, overlooking team users. In this paper, we investigate how to integrate team context into the database query process to help users get top-ranked database tuples and make the team more competitive. We introduce naive and optimized query algorithms to select the suitable records and show that they output the same results while the latter is more computationally efficient. Extensive empirical studies are conducted to evaluate the query approaches and demonstrate their effectiveness and efficiency.

preprint2014arXiv

Power of individuals -- Controlling centrality of temporal networks

Temporal networks are networks in which nodes and interactions may appear and disappear at various time scales. Given the evidence of the ubiquity of temporal networks in our economy, nature and society, it is urgent and significant to focus on the structural controllability of temporal networks, which is still an untouched topic. We develop graphic tools to study the structural controllability of temporal networks, identifying the intrinsic mechanism of the ability of individuals to control a dynamic and large-scale temporal network. Classifying the temporal trees of a temporal network into different types, we give (both upper and lower) analytical bounds of the controlling centrality, which are verified by numerical simulations of both artificial and empirical temporal networks. We find that the scale-free distribution of nodes' controlling centrality is virtually independent of the time scale and types of datasets, indicating the inherent heterogeneity and robustness of the controlling centrality of temporal networks.

preprint2014arXiv

Task-group Relatedness and Generalization Bounds for Regularized Multi-task Learning

In this paper, we study the generalization performance of regularized multi-task learning (RMTL) in a vector-valued framework, where MTL is considered as a learning process for vector-valued functions. We are mainly concerned with two theoretical questions: 1) under what conditions does RMTL perform better with a smaller task sample size than STL? 2) under what conditions is RMTL generalizable and can guarantee the consistency of each task during simultaneous learning? In particular, we investigate two types of task-group relatedness: the observed discrepancy-dependence measure (ODDM) and the empirical discrepancy-dependence measure (EDDM), both of which detect the dependence between two groups of multiple related tasks (MRTs). We then introduce the Cartesian product-based uniform entropy number (CPUEN) to measure the complexities of vector-valued function classes. By applying the specific deviation and the symmetrization inequalities to the vector-valued framework, we obtain the generalization bound for RMTL, which is the upper bound of the joint probability of the event that there is at least one task with a large empirical discrepancy between the expected and empirical risks. Finally, we present a sufficient condition to guarantee the consistency of each task in the simultaneous learning process, and we discuss how task relatedness affects the generalization performance of RMTL. Our theoretical findings answer the aforementioned two questions.

preprint2014arXiv

When reputation enforces evolutionary cooperation in unreliable MANETs

In self-organized mobile ad hoc networks (MANETs), network functions rely on cooperation of self-interested nodes, where a challenge is to enforce their mutual cooperation. In this paper, we study cooperative packet forwarding in a one-hop unreliable channel which results from loss of packets and noisy observation of transmissions. We propose an indirect reciprocity framework based on evolutionary game theory, and enforce cooperation of packet forwarding strategies in both structured and unstructured MANETs. Furthermore, we analyze the evolutionary dynamics of cooperative strategies, and derive the threshold of benefit-to-cost ratio to guarantee the convergence of cooperation. The numerical simulations verify that the proposed evolutionary game theoretic solution enforces cooperation when the benefit-to-cost ratio of the altruistic exceeds the critical condition. In addition, the network throughput performance of our proposed strategy in structured MANETs is measured, which is in close agreement with that of the full cooperative strategy.

preprint2013arXiv

Constraining the distribution of dark matter in inner galaxy with indirect detection signal: The case of tentative 130 GeV γ-ray line

The dark matter distribution in the very inner region of our Galaxy is still under debate. In the N-body simulations a cuspy dark matter halo density profile is favored. Several dissipative baryonic processes however are found to be able to significantly flatten the dark matter distribution, and a cored dark matter halo density profile is possible. The baryons dominate the gravitational potential in the inner Galaxy, hence a direct constraint on the abundance of the dark matter particles is rather challenging. Recently, a few groups have identified a tentative 130 GeV line signal in the Galactic center, which could be interpreted as the signal of dark matter annihilation. With current 130 GeV line data and adopting the generalized Navarro-Frenk-White profile of the dark matter halo, for local dark matter density $ρ_0=0.4~{\rm GeV~cm^{-3}}$ and $r_s=20$ kpc we obtain a 95% confidence level lower (upper) limit on the inner slope of the dark matter density distribution $α=1.06$ (the cross section of dark matter annihilation into gamma-rays $\langleσv\rangle_{χχ\rightarrow γγ}=1.3\times 10^{-27}~{\rm cm^{3}~s^{-1}}$). Such a slope is consistent with the results of some N-body simulations, and if the signal is due to dark matter, suggests that baryonic processes may be unimportant.

preprint2013arXiv

Knowledge-Aided STAP Using Low Rank and Geometry Properties

This paper presents knowledge-aided space-time adaptive processing (KA-STAP) algorithms that exploit the low-rank dominant clutter and the array geometry properties (LRGP) for airborne radar applications. The core idea is to exploit the fact that the clutter subspace is only determined by the space-time steering vectors, where the Gram-Schmidt orthogonalization approach is employed to compute the clutter subspace. Specifically, for a side-looking uniformly spaced linear array, the algorithm first selects a group of linearly independent space-time steering vectors using LRGP that can represent the clutter subspace. By performing the Gram-Schmidt orthogonalization procedure, the orthogonal bases of the clutter subspace are obtained, followed by two approaches to compute the STAP filter weights. To overcome the performance degradation caused by the non-ideal effects, a KA-STAP algorithm that combines the covariance matrix taper (CMT) is proposed. For practical applications, a reduced-dimension version of the proposed KA-STAP algorithm is also developed. The simulation results illustrate the effectiveness of our proposed algorithms, and show that the proposed algorithms converge rapidly and provide a SINR improvement over existing methods when using a very small number of snapshots.

preprint2013arXiv

PAMELA/Fermi-LAT electron cosmic ray spectrum at $\sim$100 GeV: implication for dark matter annihilation signal in accordance with the 130 GeV $γ$-ray line

Recently, a tentative 130 GeV $γ$-ray line signal was identified by quite a few groups. If correct it would constitute a ``smoking gun'' for dark matter annihilations. Interestingly, the spectra of the cosmic ray electrons detected by PAMELA and Fermi-LAT both show tiny wiggle-like structure at $\sim 100$ GeV, which might indicate a weak signal of the annihilation of $\sim 130$ GeV dark matter particles into electrons/positrons with a velocity-weighted cross section $\langleσv\rangle_{\rm χχ\rightarrow e^{+}e^{-}} \sim 4\times10^{-26}~{\rm cm^{3}~s^{-1}}$. The prospect of identifying such a potential weak dark-matter-annihilation electron and/or positron component by AMS-02, a mission to measure the high energy cosmic ray spectra with unprecedented accuracy, is investigated.

preprint2012arXiv

Dual functions of anti-reflectance and surface passivation of the atomic layer deposited Al2O3 films on crystalline silicon substrates

Surface anti-reflectance and passivation properties of the Al2O3 films deposited on crystalline Si substrates by atomic layer deposition are investigated. Textured Si with 100 nm Al2O3 shows a very low average reflectance of ~2.8 %. Both p-type and n-type Si wafers are well passivated by Al2O3 films. The maximal minority carrier lifetimes are improved from ~10 μs before Al2O3 passivation to above 3 ms for both p-type and n-type Si after Al2O3 passivation layer deposition and annealing at an appropriate temperature. Our results indicate the dual functions of anti-reflectance and surface passivation in c-Si solar cell applications.

preprint2012arXiv

Estimating the value of containment strategies in delaying the arrival time of an influenza pandemic: A case study of travel restriction and patient isolation

With a simple phenomenological metapopulation model, which characterizes the invasion process of an influenza pandemic from a source to a subpopulation at risk, we compare the efficiency of inter- and intra-population interventions in delaying the arrival of an influenza pandemic. We take travel restriction and patient isolation as examples, since in reality they are typical control measures implemented at the inter- and intra-population levels, respectively. We find that the intra-population interventions, e.g., patient isolation, perform better than the inter-population strategies such as travel restriction if the response time is small. However, intra-population strategies are sensitive to the increase of the response time, which might be inevitable due to socioeconomic reasons in practice and will largely discount the efficiency.

preprint2012arXiv

The σ law of evolutionary dynamics in community-structured populations

Evolutionary game dynamics in finite populations provides a new framework to understand the selection of traits with frequency-dependent fitness. Recently, a simple but fundamental law of evolutionary dynamics, which we call the σ law, describes how to determine the selection between two competing strategies: in most evolutionary processes with two strategies, A and B, strategy A is favored over B in weak selection if and only if σR + S > T + σP. This relationship holds for a wide variety of structured populations with mutation rate and weak selection under certain assumptions. In this paper, we propose a model of games based on a community-structured population and revisit this law under the Moran process. By calculating the average payoffs of A and B individuals with the method of effective sojourn time, we find that σ features not only the structured population characteristics but also the reaction rate between individuals. That is to say, interactions between two individuals are not uniform, and we can take σ as a reaction rate between any two individuals with the same strategy. We verify this viewpoint by the modified replicator equation with non-uniform interaction rates in a simplified version of the prisoner's dilemma game (PDG).

preprint2012arXiv

Towards a temporal network analysis of interactive WiFi users

Complex networks are used to depict topological features of complex systems. The structure of a network characterizes the interactions among elements of the system, and facilitates the study of many dynamical processes taking place on it. In previous investigations, the topological infrastructure underlying dynamical systems is simplified as a static and invariable skeleton. However, this assumption cannot cover the temporal features of many time-evolving networks, whose components are evolving and mutating. In this letter, utilizing the log data of WiFi users on a Chinese university campus, we infuse the temporal dimension into the construction of a dynamical human contact network. By quantitative comparison with the traditional aggregation approach, we find that the temporal contact network differs in many features, e.g., the reachability and the path length distribution. We conclude that the correlation between temporal path length and duration is not only determined by their definitions, but also influenced by the microdynamical features of human activities under certain social circumstances. The time order of individuals' interaction events plays a critical role in understanding many dynamical processes via human close proximity interactions studied in this letter. Besides, our study also provides a promising measure to identify the potential superspreaders by distinguishing the nodes functioning as the relay hub.

preprint2011arXiv

Emission Patterns and Light Curves of Gamma-Rays in the Pulsar Magnetosphere with a Current-Induced Magnetic Field

We study the emission patterns and light curves of gamma-rays in the pulsar magnetosphere with a current-induced magnetic field perturbation. Based on the solution of a static dipole with the magnetic field induced by some currents (perturbation field), we derive the solutions of a static as well as a retarded dipole with the perturbation field in Cartesian coordinates. The static (retarded) magnetic field can be expressed as the sum of the pure static (retarded) dipolar magnetic field and the static (retarded) perturbation field. We use the solution of the retarded magnetic field to investigate the influence of the perturbation field on the emission patterns and light curves, and we apply the perturbed solutions to calculate the gamma-ray light curves for the case of the Vela pulsar. We find that the perturbation field induced by the currents will change the emission patterns and hence the light curves of gamma-rays, especially for a larger perturbation field. Our results indicate that the perturbation field created by the outward-flowing (inward-flowing) electrons (positrons) can decrease the rotation effect on the magnetosphere and make the emission pattern appear smoother relative to that of the pure retarded dipole, but the perturbation field created by the outward-flowing (inward-flowing) positrons (electrons) can make the emission pattern less smooth.

preprint2011arXiv

Evolution of scaling emergence in large-scale spatial epidemic spreading

Background: Zipf's law and Heaps' law are two representatives of the scaling concepts, which play a significant role in the study of complexity science. The coexistence of Zipf's law and Heaps' law motivates different understandings of the dependence between these two scalings, which has still hardly been clarified. Methodology/Principal Findings: In this article, we observe an evolution process of the scalings: Zipf's law and Heaps' law are naturally shaped to coexist at the initial time, while the crossover comes with the emergence of their inconsistency at the larger time before reaching a stable state, where Heaps' law still exists with the disappearance of strict Zipf's law. Such findings are illustrated with a scenario of large-scale spatial epidemic spreading, and the empirical results of pandemic disease support a universal analysis of the relation between the two laws regardless of the biological details of disease. Employing the United States (U.S.) domestic air transportation and demographic data to construct a metapopulation model for simulating the pandemic spread at the U.S. country level, we uncover that the broad heterogeneity of the infrastructure plays a key role in the evolution of scaling emergence. Conclusions/Significance: The analyses of large-scale spatial epidemic spreading help understand the temporal evolution of scalings, indicating that the coexistence of Zipf's law and Heaps' law depends on the collective dynamics of epidemic processes, and the heterogeneity of epidemic spread indicates the significance of performing targeted containment strategies at the early time of a pandemic disease.

preprint2011arXiv

Measuring the viscous and elastic properties of single cells using video particle tracking microrheology

We present a simple and non-invasive experimental procedure to measure the linear viscoelastic properties of cells by passive video particle tracking microrheology. In order to do this, a generalised Langevin equation is adopted to relate the time-dependent thermal fluctuations of a bead, chemically bound to the cell's exterior, to the frequency-dependent viscoelastic moduli of the cell. It is shown that these moduli are related to the cell's cytoskeletal structure, which in this work is changed by varying the solution osmolarity from iso- to hypo-osmotic conditions. At high frequencies, the viscoelastic moduli frequency dependence changes from $\propto ω^{3/4}$ found in iso-osmotic solutions to $\propto ω^{1/2}$ in hypo-osmotic solutions; the first situation is typical of bending modes in isotropic in vitro reconstituted F-actin networks, and the second could indicate that the restructured cytoskeleton behaves as a gel with "dangling branches". The insights gained from this form of rheological analysis could prove to be a valuable addition to studies that address cellular physiology and pathology.

preprint2006arXiv

Knowledge-based energy functions for computational studies of proteins

This chapter discusses the theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some detail the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, as well as geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Several issues of knowledge-based potential functions are finally discussed.

preprint2005arXiv

Uniform synchronous criticality of diversely random complex networks

We investigate collective synchronous behaviors in random complex networks of limit-cycle oscillators with the non-identical asymmetric coupling scheme, and find a uniform coupling criticality of collective synchronization which is independent of the complexity of network topologies. Numerical simulations on categories of random complex networks have verified this conclusion.

preprint2004arXiv

On the topology of the world exchange arrangements web

Exchange arrangements among different countries over the world are foundations of the world economy, which generally stand behind the daily economic evolution. As the first study of the world exchange arrangements web (WEAW), we built a bipartite network with countries as one type of nodes and currencies as the other, and found it to have a prominent scale-free feature with a power-law degree distribution. In a further empirical study of the currency section of the WEAW, we calculated the clustering coefficients, average nearest-neighbors degree, and average shortest distance. As an essential economic network, the WEAW is found to be a correlated disassortative network with a hierarchical structure, possessing a more prominent scale-free feature than the world trade web (WTW).

161 published item(s)

MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

Non-aligned supervision for Real Image Dehazing

Designing optimal linear detectors -- a bottom-up approach

Differentiate ChatGPT-generated and Human-written Medical Texts

DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

Towards Spatial Equilibrium Object Detection

2D Toda $τ$ Functions, Weighted Hurwitz Numbers and the Cayley Graph: Determinant Representation and Recursion Formula

A density peaks clustering algorithm with sparse search and K-d tree

A Neural Network Architecture for Program Understanding Inspired by Human Behaviors

Ab initio calculation of real solids via neural network ansatz

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

Angular emission patterns of remnant black holes

ARROCH: Augmented Reality for Robots Collaborating with a Human

Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding

AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution

Benchmarking of DL Libraries and Models on Mobile Devices

CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

Comment on "Comparison of six simulation codes for positive streamers in air" (Plasma Sources Sci. Technol. 27 (2018) 095002)

Distributed Online Optimization in Time-Varying Unbalanced Networks without Explicit Subgradients

DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection

Dual Power Spectrum Manifold and Toeplitz HPD Manifold: Enhancement and Analysis for Matrix CFAR Detection

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information

EEG based Emotion Recognition: A Tutorial and Review

Effects of homophily and heterophily on preferred-degree networks: mean-field analysis and overwhelming transition

Enhanced ultrafast X-ray diffraction by transient resonances

Entire solutions of the generalized Hessian inequality

Finding Global Homophily in Graph Neural Networks When Meeting Heterophily

Forgery Attack Detection in Surveillance Video Streams Using Wi-Fi Channel State Information

Global Model Learning for Large Deformation Control of Elastic Deformable Linear Objects: An Efficient and Adaptive Approach

High-pressure control of optical nonlinearity in the polar Weyl semimetal TaAs

Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation

Hydrodynamic Relaxation in a Strongly Interacting Fermi Gas

Influence of impurities on electronic structure in cuprate superconductors

Learning a General Clause-to-Clause Relationships for Enhancing Emotion-Cause Pair Extraction

Lexical Knowledge Internalization for Neural Dialog Generation

Mamba: a systematic software solution for beamline experiments at HEPS

Modeling Users' Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce Search

Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion

Online Video Instance Segmentation via Robust Context Fusion

PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection

RAW-GNN: RAndom Walk Aggregation based Graph Neural Network

RecursiveMix: Mixed Learning with History

RigNet: Repetitive Image Guided Network for Depth Completion

Robust Inverse Framework using Knowledge-guided Self-Supervised Learning: An application to Hydrology

Scalable Simulation of Quantum Measurement Process with Quantum Computers

Seamless Interaction Design with Coexistence and Cooperation Modes for Robust Human-Robot Collaboration

Shape Control of Deformable Linear Objects with Offline and Online Learning of Local Linear Deformation Models

Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Trajectories and Radiation of Charged Particles in the Pulsar Magnetosphere

Tuning the competition between superconductivity and charge order in kagome superconductor Cs(V1-xNbx)3Sb5

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

Weakly Supervised Text Classification using Supervision Signals from a Language Model

WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models

Bridging Unpaired Facial Photos And Sketches By Line-drawings

Contact Map based Crystal Structure Prediction using Global Optimization

Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications

Disentangling User Interest and Conformity for Recommendation with Causal Embedding

Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition

Leveraging Meta-path Contexts for Classification in Heterogeneous Information Networks

SceneRec: Scene-Based Graph Neural Networks for Recommender Systems

Shallow geothermal energy potential for heating and cooling of buildings with regeneration under climate change scenarios

Towards Cross-Modal Forgery Detection and Localization on Live Surveillance Videos

vrCAPTCHA: Exploring CAPTCHA Designs in Virtual Reality

3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction

A kilonova associated with GRB 070809

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Airborne LiDAR Point Cloud Classification with Graph Attention Convolution Neural Network

Antiferromagnetism of Double Molybdate LiFe(MoO$_4$)$_2$