Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
129 works
0 followers
59 topics
4 close collaborators

Published work

129 published item(s)

preprint | 2026 | arXiv

LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning

Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which in turn drives robust execution in interactive environments. Furthermore, an adaptive latent CoT mechanism is introduced, allowing the policy to dynamically modulate its reasoning horizon based on diverse environment states. Experiments show that LaST-R1 achieves a near-perfect 99.9% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art (SOTA) methods. In real-world deployments, LaST-R1 yields up to a 22.5% average improvement over SOTA supervised fine-tuning approaches across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.

preprint | 2026 | arXiv

Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultaneously transmitted waveforms, which typically requires synchronization, instantaneous channel-state information (CSI), phase compensation, and power control. Noncoherent energy detection removes the need for phase-coherent combining, but a single energy measurement is nonnegative and, therefore, cannot represent signed model updates. This paper introduces resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. The construction uses slow-timescale calibration of average channel powers, but does not require instantaneous transmitter- or receiver-side CSI or channel inversion. For independent Rayleigh fading, we derive exact first- and second-moment expressions for single-shot REED and for a chip-diverse extension that spreads each coordinate over multiple independently faded paired chips. The resulting variance laws separate fading-induced self-noise, signal-noise interaction, and receiver-noise fluctuation, giving an explicit diversity-resource tradeoff. (The rest of the abstract is in the paper.)
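The energy-difference idea in this abstract can be illustrated with a small simulation. The sketch below is not the paper's implementation: the function names, the unit average channel power (standing in for the slow-timescale calibration), and the plain averaging over independent shots are all assumptions made for illustration. Each client sends the square root of the positive part of its update on one resource element and the square root of the negative part on the paired one; the receiver subtracts the two received energies.

```python
import math
import random


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample with E|z|^2 = variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def reed_single_shot(updates, noise_var, rng):
    """One energy-difference estimate of sum(updates) (illustrative, hypothetical helper).

    Assumes independent Rayleigh fading with unit average power on every link,
    so no instantaneous channel knowledge is used anywhere.
    """
    y_pos = complex(0.0, 0.0)  # superposed signal on the "positive" resource element
    y_neg = complex(0.0, 0.0)  # superposed signal on the paired "negative" element
    for x in updates:
        # Transmit energy equals the positive (or negative) part of the update.
        y_pos += complex_gaussian(rng, 1.0) * math.sqrt(max(x, 0.0))
        y_neg += complex_gaussian(rng, 1.0) * math.sqrt(max(-x, 0.0))
    # Receiver noise has the same variance on both elements, so it cancels on average.
    y_pos += complex_gaussian(rng, noise_var)
    y_neg += complex_gaussian(rng, noise_var)
    return abs(y_pos) ** 2 - abs(y_neg) ** 2


def reed_average(updates, noise_var, shots, seed=0):
    """Average many independent single-shot estimates to expose the unbiased mean."""
    rng = random.Random(seed)
    total = sum(reed_single_shot(updates, noise_var, rng) for _ in range(shots))
    return total / shots


if __name__ == "__main__":
    client_updates = [0.5, -0.2, 0.3, -0.1]  # true signed sum is 0.5
    print(reed_average(client_updates, noise_var=0.1, shots=20000, seed=1))
```

A single shot is very noisy because the received energies fluctuate with the fading; averaging over independent draws only demonstrates that the estimate is unbiased under these assumptions, and is a loose stand-in for the diversity the abstract attributes to the chip-diverse extension.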

preprint | 2026 | arXiv

Solving Max-Cut to Global Optimality via Feasibility-Preserving Graph Neural Networks

Exact solution of hard combinatorial optimization problems often relies on strong convex relaxations, but solving these relaxations repeatedly inside a branch-and-bound algorithm can be prohibitively expensive. Hence, we consider this challenge for Max-Cut, where branch and bound commonly uses semidefinite programming (SDP) relaxations to bound subproblems. We propose a Max-Cut-specific graph neural network that serves as a principled, lightweight neural proxy for these SDP solvers and can be plugged directly into an exact branch-and-bound framework. The proposed architecture has update steps of complexity $\mathcal{O}(n^2 + ne)$, and predicts both primal- and dual-feasible SDP solutions. The primal SDP solutions yield feasible Max-Cut solutions via the Goemans--Williamson algorithm. In addition, it is trained in a self-supervised fashion without requiring solved SDP relaxations as labels. Empirically, we show that our architecture can substantially reduce the cost of bounding in exact Max-Cut solving by up to $10.6 \times$ compared with using the state-of-the-art SDP solver Mosek. Our work highlights the potential of learned, validity-preserving surrogates for accelerating exact optimization over structured convex relaxations.
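The Goemans-Williamson rounding step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the helper names are hypothetical, and the unit vectors are chosen by hand (three vectors 120 degrees apart, the optimal SDP embedding of a triangle) rather than produced by an SDP solver or the proposed graph neural network.

```python
import math
import random


def gw_round(vectors, rng):
    """Random-hyperplane rounding: assign each vertex the side of a random hyperplane."""
    dim = len(vectors[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random hyperplane normal
    return [
        1 if sum(a * b for a, b in zip(v, normal)) >= 0 else -1
        for v in vectors
    ]


def cut_value(edges, sides):
    """Total weight of edges whose endpoints land on different sides."""
    return sum(w for i, j, w in edges if sides[i] != sides[j])


if __name__ == "__main__":
    # Unit-weight triangle; edges are (vertex, vertex, weight).
    triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
    # Hand-picked unit vectors at 0, 120 and 240 degrees (assumed, not solver output).
    angles = (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
    vectors = [(math.cos(a), math.sin(a)) for a in angles]
    sides = gw_round(vectors, random.Random(0))
    print(sides, cut_value(triangle, sides))
```

Because the three vectors sum to zero, a hyperplane through the origin cannot put all of them on one side, so the rounding separates one vertex from the other two and cuts two of the three edges, which is the maximum cut of a triangle.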

preprint | 2024 | arXiv

DeepPhysiNet: Bridging Deep Learning and Atmospheric Physics for Accurate and Continuous Weather Modeling

Accurate weather forecasting holds significant importance to human activities. Currently, there are two paradigms for weather forecasting: Numerical Weather Prediction (NWP) and Deep Learning-based Prediction (DLP). NWP utilizes atmospheric physics for weather modeling but suffers from poor data utilization and high computational costs, while DLP can learn weather patterns from vast amounts of data directly but struggles to incorporate physical laws. Both paradigms possess their respective strengths and weaknesses, and are incompatible, because physical laws adopted in NWP describe the relationship between coordinates and meteorological variables, while DLP directly learns the relationships between meteorological variables without consideration of coordinates. To address these problems, we introduce the DeepPhysiNet framework, incorporating physical laws into deep learning models for accurate and continuous weather system modeling. First, we construct physics networks based on multilayer perceptrons (MLPs) for individual meteorological variables, such as temperature, pressure, and wind speed. Physics networks establish relationships between variables and coordinates by taking coordinates as input and producing variable values as output. The physical laws in the form of Partial Differential Equations (PDEs) can be incorporated as part of the loss function. Next, we construct hyper-networks based on deep learning methods to directly learn weather patterns from a large amount of meteorological data. The output of hyper-networks constitutes a part of the weights for the physics networks. Experimental results demonstrate that, upon successful integration of physical laws, DeepPhysiNet can accomplish multiple tasks simultaneously, not only enhancing forecast accuracy but also obtaining continuous spatiotemporal resolution results, which is unattainable by either NWP or DLP.

preprint | 2024 | arXiv

Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (where only one passenger can be assigned to a car at a time, as with UberX and Lyft). Although on-demand ride pooling services can bring many benefits, they need a well-defined matching strategy to maximize the benefits for all parties (passengers, drivers, aggregation companies and the environment), in which the regional dispatching of vehicles has a significant impact on the matching and revenue. Existing algorithms often only consider revenue maximization, which makes it difficult for requests with unusual distribution to get a ride. How to increase revenue while ensuring a reasonable assignment of requests is a challenge for ride pooling service companies (aggregation companies). In this paper, we propose a framework for vehicle dispatching for ride pooling tasks, which splits the city into discrete dispatching regions and uses a reinforcement learning (RL) algorithm to dispatch vehicles in these regions. We also consider the mutual information (MI) between vehicle and order distribution as the intrinsic reward of the RL algorithm to improve the correlation between their distributions, thus ensuring the possibility of getting a ride for unusually distributed requests. In experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly increase revenue up to an average of 3% over the existing best on-demand ride pooling method.

preprint | 2024 | arXiv

RHOBIN Challenge: Reconstruction of Human Object Interaction

Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at https://rhobin-challenge.github.io/.

preprint | 2024 | arXiv

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code is made available at https://github.com/baaivision/vid2vid-zero.

preprint | 2023 | arXiv

A Survey on Evaluation of Large Language Models

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.

preprint | 2023 | arXiv

Automatically Prepare Training Data for YOLO Using Robotic In-Hand Observation and Synthesis

Deep learning methods have recently exhibited impressive performance in object detection. However, such methods needed much training data to achieve high recognition accuracy, which was time-consuming and required considerable manual work like labeling images. In this paper, we automatically prepare training data using robots. Considering the low efficiency and high energy consumption in robot motion, we proposed combining robotic in-hand observation and data synthesis to enlarge the limited data set collected by the robot. We first used a robot with a depth sensor to collect images of objects held in the robot's hands and segment the object pictures. Then, we used a copy-paste method to synthesize the segmented objects with rack backgrounds. The collected and synthetic images are combined to train a deep detection neural network. We conducted experiments to compare YOLOv5x detectors trained with images collected using the proposed method and several other methods. The results showed that combined observation and synthetic images led to comparable performance to manual data preparation. They provided a good guide on optimizing data configurations and parameter settings for training detectors. The proposed method required only a single process and was a low-cost way to produce the combined data. Interested readers may find the data sets and trained models from the following GitHub repository: github.com/wrslab/tubedet

preprint | 2023 | arXiv

Distance Guided Generative Adversarial Network for Explainable Binary Classifications

Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two ways. The first way is vertical distance GAN (VerDisGAN) where the inter-domain generation is conditioned on the vertical distances. The second way is horizontal distance GAN (HorDisGAN) where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can apply to different classification architectures and has potential to extend to multi-class classification.

preprint | 2023 | arXiv

Learning Invariance from Generated Variance for Unsupervised Person Re-identification

This work focuses on unsupervised representation learning in person re-identification (ReID). Recent self-supervised contrastive learning methods learn invariance by maximizing the representation similarity between two augmented views of the same image. However, traditional data augmentation may introduce undesirable distortions of identity features, which is not always favorable in id-sensitive ReID tasks. In this paper, we propose to replace traditional data augmentation with a generative adversarial network (GAN) that is targeted to generate augmented views for contrastive learning. A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features. Deviating from previous GAN-based ReID methods that only work in id-unrelated space (pose and camera style), we conduct GAN-based augmentation on both id-unrelated and id-related features. We further propose specific contrastive losses to help our network learn invariance from id-unrelated and id-related augmentations. By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.

preprint | 2023 | arXiv

Power Corrections to Energy Flow Correlations from Large Spin Perturbation

Dynamics of high energy scattering in Quantum Chromodynamics (QCD) are primarily probed through detector energy flow correlations. One important example is the Energy-Energy Correlator (EEC), whose back-to-back limit probes correlations of QCD on the lightcone and can be described by a transverse-momentum dependent factorization formula in the leading power approximation. In this work, we develop a systematic method to go beyond this approximation. We identify the origin of logarithmically enhanced contributions in the back-to-back limit as the exchange of operators with low twists and large spins in the local operator product expansion. Using techniques from the conformal bootstrap, the large logarithms beyond leading power can be resummed to all orders in the perturbative coupling. As an illustration of this method, we perform an all-order resummation of the leading and next-to-leading logarithms beyond the leading power in ${\cal N} = 4$ Super Yang-Mills theory.

preprint | 2023 | arXiv

Practical and powerful kernel-based change-point detection

Change-point analysis plays a significant role in various fields to reveal discrepancies in distribution in a sequence of observations. While a number of algorithms have been proposed for high-dimensional data, kernel-based methods have not been well explored due to difficulties in controlling false discoveries and mediocre performance. In this paper, we propose a new kernel-based framework that makes use of an important pattern of data in high dimensions to boost power. Analytic approximations to the significance of the new statistics are derived and fast tests based on the asymptotic results are proposed, offering easy off-the-shelf tools for large datasets. The new tests show superior performance for a wide range of alternatives when compared with other state-of-the-art methods. We illustrate these new approaches through an analysis of phone-call network data. All proposed methods are implemented in an R package KerSeg.

preprint | 2023 | arXiv

Professional Network Matters: Connections Empower Person-Job Fit

Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions. While existing works leverage historical or contextual information, they often disregard a crucial aspect: job seekers' social relationships in professional networks. This paper emphasizes the importance of incorporating professional networks into the Person-Job Fit model. Our innovative approach consists of two stages: (1) defining a Workplace Heterogeneous Information Network (WHIN) to capture heterogeneous knowledge, including professional connections and pre-training representations of various entities using a heterogeneous graph neural network; (2) designing a Contextual Social Attention Graph Neural Network (CSAGNN) that supplements users' missing information with professional connections' contextual information. We introduce a job-specific attention mechanism in CSAGNN to handle noisy professional networks, leveraging pre-trained entity representations from WHIN. We demonstrate the effectiveness of our approach through experimental evaluations conducted across three real-world recruitment datasets from LinkedIn, showing superior performance compared to baseline models.

preprint | 2023 | arXiv

Study of the $ω$/$ω_3$, $ρ$/$ρ_3$ and the newly observed $ω$-like state $X(2220)$

We study the excited states of $ω$ and $ω_3$ by comparison with the $ρ$ and $ρ_3$ families, and discuss the possibility of $X(2220)$ as $ω$ excitation by analyzing the mass spectra and strong decay behaviors. In addition, we predict the masses and widths of $ω(2D)$ and $ω_3$ and $ρ_3(4D)$, $ρ_3(1G)$, $ω_3$ and $ρ_3(2G)$ and $ω_3(3G)$ and $ρ_3(3G)$. The abundant information of their two-body strong decays predicted in this work will be helpful to further study of these $ω$ and $ω_3$ and $ρ$ and $ρ_3$ states in experiment and theory.

preprint | 2023 | arXiv

Transformer in Transformer as Backbone for Deep Reinforcement Learning

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, it requires tedious optimization skills to train a network composed of mixed modules, making these methods inconvenient to use in practice. In this paper, we propose to design pure Transformer-based networks for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings consistently.

preprint | 2022 | arXiv

60-nm-span wavelength-tunable vortex fiber laser with intracavity plasmon metasurfaces

Wavelength-tunable vortex fiber lasers that could generate beams carrying orbital angular momentum (OAM) hold great interest in large-capacity optical communications. The wavelength tunability of conventional vortex fiber lasers is however limited to a range of 35 nm due to narrow bandwidth and/or insertion loss of mode conversion components. Optical metasurfaces, apart from being compact planar components, can flexibly manipulate light with high efficiency in a broad wavelength range. Here, we propose and demonstrate for the first time, to the best of our knowledge, a metasurface-assisted vortex fiber laser that can directly generate OAM beams with changeable topological charges. Due to the designed broadband gap-surface plasmon metasurface, combined with an intracavity tunable filter, the laser enables an OAM beam with center wavelength continuously tunable from 1015 nm to 1075 nm, nearly twice the range of other vortex fiber lasers ever reported. The metasurface can be designed at will to satisfy requirements for either low pump threshold or high slope efficiency of the laser. Furthermore, the cavity-metasurface configuration can be extended to generate higher-order OAM beams or more complex structured beams in different wavelength regions, which greatly broadens the possibilities for developing low-cost and high-quality structured-beam laser sources.

preprint | 2022 | arXiv

A Fast and Efficient Change-point Detection Framework based on Approximate $k$-Nearest Neighbor Graphs

Change-point analysis is thriving in this big data era to address problems arising in many fields where massive data sequences are collected to study complicated phenomena over time. It plays an important role in processing these data by segmenting a long sequence into homogeneous parts for follow-up studies. The task requires the method to be able to process large datasets quickly and deal with various types of changes for high-dimensional data. We propose a new approach making use of approximate $k$-nearest neighbor information from the observations, and derive an analytic formula to control the type I error. The time complexity of our proposed method is $O\left(dn(\log n+k \log d)+nk^2\right)$ for an $n$-length sequence of $d$-dimensional data. The test statistic we consider incorporates a useful pattern for moderate- to high- dimensional data so that the proposed method could detect various types of changes in the sequence. The new approach is also asymptotic distribution free, facilitating its usage for a broader community. We apply our method to fMRI datasets and Neuropixels datasets to illustrate its effectiveness.

preprint | 2022 | arXiv

A two-grid temporal second-order scheme for the two-dimensional nonlinear Volterra integro-differential equation with weakly singular kernel

In this paper, a two-grid temporal second-order scheme for the two-dimensional nonlinear Volterra integro-differential equation with weakly singular kernel is proposed to reduce the computation time and improve the accuracy of the scheme developed by Xu et al. (Applied Numerical Mathematics 152 (2020) 169-184). The proposed scheme consists of three steps: First, a small nonlinear system is solved on the coarse grid using fixed-point iteration. Second, Lagrange's linear interpolation formula is used to arrive at some auxiliary values for analysis of the fine grid. Finally, a linearized Crank-Nicolson finite difference system is solved on the fine grid. Moreover, the algorithm uses a central difference approximation for the spatial derivatives. In the time direction, the time derivative and integral term are approximated by the Crank-Nicolson technique and the product integral rule, respectively. With the help of the discrete energy method, the stability and space-time second-order convergence of the proposed approach are obtained in $L^2$-norm. Finally, the numerical results agree with the theoretical analysis and verify the effectiveness of the algorithm.

preprint | 2022 | arXiv

A Voxel Graph CNN for Object Classification with Event Cameras

Event cameras attract researchers' attention due to their low power consumption, high dynamic range, and extremely high temporal resolution. Learning models on event-based object classification have recently achieved massive success by accumulating sparse events into dense frames to apply traditional 2D learning methods. Yet, these approaches necessitate heavy-weight models and have high computational complexity due to the redundant information introduced by the sparse-to-dense conversion, limiting the potential of event cameras in real-life applications. This study aims to address the core problem of balancing accuracy and model complexity for event-based classification models. To this end, we introduce a novel graph representation for event data to exploit their sparsity better and customize a lightweight voxel graph convolutional neural network (EV-VGCNN) for event-based classification. Specifically, we (1) use voxel-wise vertices rather than previous point-wise inputs to explicitly exploit regional 2D semantics of event streams while keeping the sparsity; and (2) propose a multi-scale feature relational layer (MFRL) to extract spatial and motion cues from each vertex discriminatively concerning its distances to neighbors. Comprehensive experiments show that our model can advance state-of-the-art classification accuracy with extremely low model complexity (merely 0.84M parameters).

preprint | 2022 | arXiv

An Intermediate-level Attack Framework on The Basis of Linear Regression

This paper substantially extends our work published at ECCV, in which an intermediate-level attack was proposed to improve the transferability of some baseline adversarial examples. Specifically, we advocate a framework in which a direct linear mapping from the intermediate-level discrepancies (between adversarial features and benign features) to prediction loss of the adversarial example is established. By delving deep into the core components of such a framework, we show that 1) a variety of linear regression models can all be considered in order to establish the mapping, 2) the magnitude of the finally obtained intermediate-level adversarial discrepancy is correlated with the transferability, 3) further boost of the performance can be achieved by performing multiple runs of the baseline attack with random initialization. In addition, by leveraging these findings, we achieve new state-of-the-art results on transfer-based $\ell_\infty$ and $\ell_2$ attacks. Our code is publicly available at https://github.com/qizhangli/ila-plus-plus-lr.

preprint | 2022 | arXiv

Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data to a centralized location from distinct ones. This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices and the learning algorithm runs on-device, with the aim of relaxing the burden at a central entity/server. Although gossip-based approaches have been used for this purpose in different use cases, they suffer from high communication costs, especially when the number of devices is large. To mitigate this, incremental-based methods are proposed. We first introduce incremental block-coordinate descent (I-BCD) for decentralized ML, which can reduce communication costs at the expense of running time. To accelerate the convergence speed, an asynchronous parallel incremental BCD (API-BCD) method is proposed, where multiple devices/agents are active in an asynchronous fashion. We derive convergence properties for the proposed methods. Simulation results also show that our API-BCD method outperforms the state of the art in terms of running time and communication costs.

preprint | 2022 | arXiv

Benchmarking Domain Generalization on EEG-based Emotion Recognition

Electroencephalography (EEG) based emotion recognition has demonstrated tremendous improvement in recent years. Specifically, numerous domain adaptation (DA) algorithms have been exploited in the past five years to enhance the generalization of emotion recognition models across subjects. The DA methods assume that calibration data (although unlabeled) exists in the target domain (new user). However, this assumption conflicts with the application scenario that the model should be deployed without the time-consuming calibration experiments. We argue that domain generalization (DG) is more reasonable than DA in these applications. DG learns how to generalize to unseen target domains by leveraging knowledge from multiple source domains, which provides a new possibility to train general models. In this paper, we for the first time benchmark state-of-the-art DG algorithms on EEG-based emotion recognition. Since convolutional neural network (CNN), deep belief network (DBN) and multilayer perceptron (MLP) have been proved to be effective emotion recognition models, we use these three models as solid baselines. Experimental results show that DG achieves an accuracy of up to 79.41% on the SEED dataset for recognizing three emotions, indicating the potential of DG in zero-training emotion recognition when multiple sources are available.

preprint2022arXiv

BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval

The task of text-video retrieval, which aims to understand the correspondence between language and vision, has gained increasing attention in recent years. Previous studies either adopt off-the-shelf 2D/3D-CNNs and then use average/max pooling to directly capture spatial features with aggregated temporal information as global video embeddings, or introduce graph-based models and expert knowledge to learn local spatial-temporal relations. However, the existing methods have two limitations: 1) The global video representations learn video temporal information in a simple average/max pooling manner and do not fully explore the temporal information between every two frames. 2) The graph-based local video representations are handcrafted and depend heavily on expert knowledge and empirical feedback, so they may not be able to effectively mine the higher-level fine-grained visual relations. These limitations result in an inability to distinguish videos with the same visual components but with different relations. To solve this problem, we propose a novel cross-modal retrieval framework, Bi-Branch Complementary Network (BiC-Net), which modifies the transformer architecture to effectively bridge text-video modalities in a complementary manner by combining local spatial-temporal relations and global temporal information. Specifically, local video representations are encoded using multiple transformer blocks and additional residual blocks to learn spatio-temporal relation features; we call this module a Spatio-Temporal Residual transformer (SRT). Meanwhile, global video representations are encoded using a multi-layer transformer block to learn global temporal features. Finally, we align the spatio-temporal relation and global temporal features with the text feature on two embedding spaces for cross-modal text-video retrieval.

preprint2022arXiv

Category-Association Based Similarity Matching for Novel Object Pick-and-Place Task

Robotic pick-and-place has been researched for a long time to cope with the uncertainty of novel objects and changeable environments. Past works mainly focus on learning-based methods to achieve high precision. However, these methods are hard to generalize because they are limited to the specific models they were trained on. To overcome this drawback of learning-based approaches, we introduce a new perspective of similarity matching between novel objects and a known database based on category association to achieve pick-and-place tasks with high accuracy and stability. We calculate the category name similarity using word embedding to quantify the semantic similarity between the categories of known models and the target real-world objects. With a similar model identified by a similarity prediction function, we preplan a series of robust grasps and imitate them to plan new grasps on the real-world target object. We also propose a distance-based method to infer the in-hand posture of objects and adjust small rotations to achieve stable placements under uncertainty. In a real-world robotic pick-and-place experiment with a dozen in-category and out-of-category novel objects, our method achieved average success rates of 90.6% and 75.9% respectively, validating its capacity to generalize to diverse objects.
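The category-name matching step described above can be illustrated with a minimal sketch. It assumes cosine similarity over word embeddings; the abstract does not state which similarity measure or embedding model is used, and the three-dimensional vectors and category names below are made-up placeholders, not data from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings for categories in the known model database.
# Real word embeddings would have hundreds of dimensions.
KNOWN_CATEGORIES = {
    "mug": [0.9, 0.1, 0.2],
    "bottle": [0.7, 0.6, 0.1],
    "hammer": [0.1, 0.2, 0.9],
}

def most_similar_category(target_vec, known=KNOWN_CATEGORIES):
    """Return the known category whose embedding is closest to the target."""
    return max(known, key=lambda name: cosine_similarity(target_vec, known[name]))

# Hypothetical embedding for a novel object's category name, e.g. "cup".
cup_vec = [0.85, 0.2, 0.15]
print(most_similar_category(cup_vec))
```

In this toy setup the novel "cup" vector is matched to "mug", whose preplanned grasps would then serve as the template for the new object.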

preprint2022arXiv

Chiral Phonon Activated Spin Seebeck Effect

Efficient generation of spin polarization is the central focus of spintronics. In magnetic materials, spin currents can arise from heat currents by the conventional spin Seebeck effect. Recently, chiral phonons with definite handedness and angular momenta have also produced profound impacts on multiple research fields. In this paper, starting with the nonequilibrium distribution of chiral phonons under a temperature gradient, we find a new spin selectivity effect, the chiral phonon activated spin Seebeck (CPASS) effect, in chiral materials without magnetic order or spin-orbit coupling. With both phonon-drag and band transport contributions, the CPASS coefficients are computed based on the Boltzmann transport theory. The spin accumulations produced by the CPASS effect increase quadratically with the temperature gradient and vary with the chemical potential modulation, thus enabling highly efficient and tunable spin generation. The CPASS effect provides a promising explanation of the chiral-induced spin selectivity effect and opportunities for designing advanced spintronic devices based on nonmagnetic chiral materials.

preprint2022arXiv

CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation

Brain tumor segmentation (BTS) in magnetic resonance imaging (MRI) is crucial for brain tumor diagnosis, cancer management and research purposes. With the great success of the ten-year BraTS challenges as well as the advances of CNN and Transformer algorithms, many outstanding BTS models have been proposed to tackle the difficulties of BTS in different technical aspects. However, existing studies hardly consider how to fuse the multi-modality images in a reasonable manner. In this paper, we leverage the clinical knowledge of how radiologists diagnose brain tumors from multiple MRI modalities and propose a clinical knowledge-driven brain tumor segmentation model, called CKD-TransBTS. Instead of directly concatenating all the modalities, we re-organize the input modalities by separating them into two groups according to the imaging principle of MRI. A dual-branch hybrid encoder with the proposed modality-correlated cross-attention block (MCCA) is designed to extract the multi-modality image features. The proposed model inherits the strengths of both Transformer and CNN, with the local feature representation ability for precise lesion boundaries and long-range feature extraction for 3D volumetric images. To bridge the gap between Transformer and CNN features, we propose a Trans&CNN Feature Calibration block (TCFC) in the decoder. We compare the proposed model with five CNN-based models and six transformer-based models on the BraTS 2021 challenge dataset. Extensive experiments demonstrate that the proposed model achieves state-of-the-art brain tumor segmentation performance compared with all the competitors.

preprint2022arXiv

Cluster-based ensemble learning for wind power modeling with meteorological wind data

Optimal implementation and monitoring of wind energy generation hinge on reliable power modeling that is vital for understanding turbine control, farm operational optimization, and grid load balance. Based on the idea that similar wind conditions lead to similar wind power, this paper constructs a modeling scheme that integrates, in order, three types of ensemble learning algorithms (bagging, boosting, and stacking) with clustering approaches to achieve optimal power modeling. It also investigates applications of different clustering algorithms and methodology for determining cluster numbers in wind power modeling. The results reveal that all ensemble models with clustering exploit the intrinsic information of wind data and thus outperform models without it by approximately 15% on average. The best model, which uses farthest-first clustering, is computationally rapid and performs exceptionally well, with an improvement of around 30%. The modeling is further boosted by about 5% by introducing stacking that fuses ensembles with varying clusters. The proposed modeling framework thus demonstrates promise by delivering efficient and robust modeling performance.

preprint2022arXiv

Coordinate-ordering-free Upper Bounds for Linear Insertion-Deletion Codes

Insertion-deletion codes were motivated by the need to correct synchronization errors. In this paper we prove several coordinate-ordering-free upper bounds on the insdel distances of linear codes, which are based on the generalized Hamming weights and the formation of minimum Hamming weight codewords. Our bounds are stronger than some previously known bounds. We apply these upper bounds to some cyclic codes and one algebraic-geometric code with any rearrangement of coordinate positions. Some strong upper bounds on the insdel distances of Reed-Muller codes with special coordinate-ordering are also given.

preprint2022arXiv

Deep Multimodal Fusion for Generalizable Person Re-identification

Person re-identification plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. Recently, leveraging the supervised or semi-unsupervised learning paradigms, which benefits from the large-scale datasets and strong computing performance, has achieved a competitive performance on a specific target domain. However, when Re-ID models are directly deployed in a new domain without target samples, they always suffer from considerable performance degradation and poor domain generalization. To address this challenge, we propose a Deep Multimodal Fusion network to elaborate rich semantic knowledge for assisting in representation learning during the pre-training. Importantly, a multimodal fusion strategy is introduced to translate the features of different modalities into the common space, which can significantly boost generalization capability of Re-ID model. As for the fine-tuning stage, a realistic dataset is adopted to fine-tune the pre-trained model for better distribution alignment with real-world data. Comprehensive experiments on benchmarks demonstrate that our method can significantly outperform previous domain generalization or meta-learning methods with a clear margin. Our source code will also be publicly available at https://github.com/JeremyXSC/DMF.

preprint2022arXiv

Deep Semi-supervised Metric Learning with Dual Alignment for Cervical Cancer Cell Detection

Deep learning has achieved unprecedented success in various object detection tasks with huge amounts of labeled data. However, obtaining large-scale annotations for medical images is extremely challenging due to the high demand for labour and expertise. In this paper, we propose a novel deep semi-supervised metric learning method to effectively leverage both labeled and unlabeled data for cervical cancer cell detection. Specifically, our model learns a metric space and conducts dual alignment of semantic features on both the proposal and prototype levels. On the proposal level, we align the unlabeled data with class proxies derived from the labeled data. We further align the prototypes of the labeled and unlabeled data to alleviate the influence of possibly noisy pseudo labels generated at the proposal alignment stage. Moreover, we adopt a memory bank to store the labeled prototypes, which significantly enriches the metric learning information from larger batches. Extensive experiments show our proposed method outperforms other state-of-the-art semi-supervised approaches consistently, demonstrating the efficacy of our proposed deep semi-supervised metric learning with dual alignment.

preprint2022arXiv

Dual Decision Improves Open-Set Panoptic Segmentation

The open-set panoptic segmentation (OPS) problem is a new research direction aiming to perform segmentation for both known classes and unknown classes, i.e., the objects ("things") that are never annotated in the training set. The main challenges of OPS are twofold: (1) the infinite possibility of the unknown object appearances makes it difficult to model them from a limited number of training data; (2) at training time, we are only provided with the "void" category, which essentially mixes the "unknown thing" and "background" classes. We empirically find that directly using the "void" category to supervise known class or "background" classifiers without screening leads to an unsatisfactory OPS result. In this paper, we propose a divide-and-conquer scheme to develop a dual decision process for OPS. We show that by properly combining a known class discriminator with an additional class-agnostic object prediction head, the OPS performance can be significantly improved. Specifically, we first propose to create a classifier with only known categories and let the "void" class proposals achieve low prediction probability from those categories. Then we distinguish the "unknown things" from the background by using the additional object prediction head. To further boost performance, we introduce "unknown things" pseudo-labels generated from up-to-date models to enrich the training set. Our extensive experimental evaluation shows that our approach significantly improves unknown class panoptic quality, with more than 30% relative improvement over the best-performing existing method.

preprint2022arXiv

Dual-Distribution Discrepancy for Anomaly Detection in Chest X-Rays

Chest X-ray (CXR) is the most typical radiological exam for diagnosis of various diseases. Due to the expensive and time-consuming annotations, detecting anomalies in CXRs in an unsupervised fashion is very promising. However, almost all of the existing methods consider anomaly detection as a one-class classification (OCC) problem. They model the distribution of only known normal images during training and identify the samples not conforming to normal profile as anomalies in the testing phase. A large number of unlabeled images containing anomalies are thus ignored in the training phase, although they are easy to obtain in clinical practice. In this paper, we propose a novel strategy, Dual-distribution Discrepancy for Anomaly Detection (DDAD), utilizing both known normal images and unlabeled images. The proposed method consists of two modules. During training, one module takes both known normal and unlabeled images as inputs, capturing anomalous features from unlabeled images in some way, while the other one models the distribution of only known normal images. Subsequently, inter-discrepancy between the two modules, and intra-discrepancy inside the module that is trained on only normal images are designed as anomaly scores to indicate anomalies. Experiments on three CXR datasets demonstrate that the proposed DDAD achieves consistent, significant gains and outperforms state-of-the-art methods. Code is available at https://github.com/caiyu6666/DDAD.

preprint2022arXiv

Exploiting Correspondences with All-pairs Correlations for Multi-view Depth Estimation

Multi-view depth estimation plays a critical role in reconstructing and understanding the 3D world. Recent learning-based methods have made significant progress in it. However, multi-view depth estimation is fundamentally a correspondence-based optimization problem, but previous learning-based methods mainly rely on predefined depth hypotheses to build correspondence as the cost volume and implicitly regularize it to fit depth prediction, deviating from the essence of iterative optimization based on stereo correspondence. Thus, they suffer unsatisfactory precision and generalization capability. In this paper, we are the first to explore more general image correlations to establish correspondences dynamically for depth estimation. We design a novel iterative multi-view depth estimation framework mimicking the optimization process, which consists of 1) a correlation volume construction module that models the pixel similarity between a reference image and source images as all-to-all correlations; 2) a flow-based depth initialization module that estimates the depth from the 2D optical flow; 3) a novel correlation-guided depth refinement module that reprojects points in different views to effectively fetch relevant correlations for further fusion and integrate the fused correlation for iterative depth update. Without predefined depth hypotheses, the fused correlations establish multi-view correspondence in an efficient way and guide the depth refinement heuristically. We conduct sufficient experiments on ScanNet, DeMoN, ETH3D, and 7Scenes to demonstrate the superiority of our method on multi-view depth estimation and its best generalization ability.

preprint2022arXiv

Federated Spatial Reuse Optimization in Next-Generation Decentralized IEEE 802.11 WLANs

As wireless standards evolve, more complex functionalities are introduced to address the increasing requirements in terms of throughput, latency, security, and efficiency. To unleash the potential of such new features, artificial intelligence (AI) and machine learning (ML) are currently being exploited for deriving models and protocols from data, rather than by hand-programming. In this paper, we explore the feasibility of applying ML in next-generation wireless local area networks (WLANs). More specifically, we focus on the IEEE 802.11ax spatial reuse (SR) problem and predict its performance through federated learning (FL) models. The set of FL solutions overviewed in this work is part of the 2021 International Telecommunication Union (ITU) AI for 5G Challenge.

preprint2022arXiv

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification. In the task of few-shot learning, overfitting is a tough problem mainly due to the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method which can alleviate the overfitting problem. In the proposed method, the model of a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method exceeds state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.
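As a rough illustration of the prototypical loss mentioned above, the sketch below follows the standard prototypical-network formulation: each class prototype is the mean of its support embeddings, and the loss is the negative log-softmax over negative squared Euclidean distances. This is an assumption about the general technique; the abstract does not give the exact variant, distance, or embedding network used, and the two-dimensional embeddings are toy values.

```python
import math

def prototypes(support):
    """Map each label to the mean of its support embeddings."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def prototypical_loss(query_vec, query_label, protos):
    """Negative log-probability of the true class under a softmax
    over negative squared distances to the class prototypes."""
    logits = {label: -sq_dist(query_vec, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits.values()))
    return log_norm - logits[query_label]

# Toy speaker embeddings (hypothetical): two speakers, two support utterances each.
support = {
    "speaker_a": [[0.0, 0.0], [0.0, 2.0]],
    "speaker_b": [[4.0, 0.0], [4.0, 2.0]],
}
protos = prototypes(support)
print(prototypical_loss([0.0, 1.0], "speaker_a", protos))  # close to zero
print(prototypical_loss([0.0, 1.0], "speaker_b", protos))  # large
```

A query that sits on its own speaker's prototype yields a loss near zero, while labelling it as the other speaker yields a large loss, which is the signal the network is trained on.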

preprint2022arXiv

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

The electronic structure and the physical properties of quantum materials can be significantly altered by charge carrier doping and magnetic state transition. Here we report a discovery of a giant and reversible electronic structure evolution with doping in a magnetic topological material. By performing high-resolution angle-resolved photoemission measurements on EuCd2As2, we found that a huge amount of hole doping can be introduced into the sample surface due to surface absorption. The electronic structure exhibits a dramatic change with the hole doping which cannot be described by a rigid band shift. Prominent band splitting is observed at high doping which corresponds to a doping-induced magnetic transition at low temperature (below ~15 K) from an antiferromagnetic state to a ferromagnetic state. These results have established a detailed electronic phase diagram of EuCd2As2 where the electronic structure and the magnetic structure change systematically and dramatically with the doping level. They further suggest that the transport, magnetic and topological properties of EuCd2As2 can be greatly modified by doping. This work will stimulate further investigations to explore new phenomena and properties in doping this magnetic topological material.

preprint2022arXiv

Heteromoiré Engineering on Magnetic Bloch Transport in Twisted Graphene Superlattices

Localized electrons subject to applied magnetic fields can restart to propagate freely through the lattice in delocalized magnetic Bloch states (MBSs) when the lattice periodicity is commensurate with the magnetic length. Twisted graphene superlattices with moiré wavelength tunability enable experimental access to the unique delocalization in a controllable fashion. Here we report the observation and characterization of high-temperature Brown-Zak (BZ) oscillations which come in two types, 1/B and B periodicity, originating from the generation of integer and fractional MBSs, in the twisted bilayer and trilayer graphene superlattices, respectively. Coexisting periodic-in-1/B oscillations assigned to different moiré wavelengths, are dramatically observed in small-angle twisted bilayer graphene, which may arise from angle-disorder-induced in-plane heteromoiré superlattices. Moreover, the vertical stacking of heteromoiré supercells in double-twisted trilayer graphene results in a mega-sized superlattice. The exotic superlattice contributes to the periodic-in-B oscillation and dominates the magnetic Bloch transport.

preprint2022arXiv

HIPASS study of southern ultradiffuse galaxies and low surface brightness galaxies

We present results from an HI counterpart search using the HI Parkes All Sky Survey (HIPASS) for a sample of low surface brightness galaxies (LSBGs) and ultradiffuse galaxies (UDGs) identified from the Dark Energy Survey (DES). We aimed to establish the redshifts of the DES LSBGs to determine the UDG fraction and understand their properties. Out of 409 galaxies investigated, none were unambiguously detected in HI. Our study was significantly hampered by the high spectral rms of HIPASS and thus in this paper we do not make any strong conclusive claims but discuss the main trends and possible scenarios our results reflect. The overwhelming number of non-detections suggest that: (A) Either all the LSBGs in the groups, blue or red, have undergone environment aided pre-processing and are HI deficient or the majority of them are distant galaxies, beyond the HIPASS detection threshold. (B) The sample investigated is most likely dominated by galaxies with HI masses typical of dwarf galaxies. Had there been Milky Way (MW) size (R_e) galaxies in our sample, with proportionate HI content, they would have been detected, even with the limitations imposed by the HIPASS spectral quality. This leads us to infer that if some of the LSBGs have MW size optical diameters, their HI content is possibly in the dwarf range. More sensitive observations using the SKA precursors in future may resolve these questions.

preprint2022arXiv

InsMix: Towards Realistic Generative Data Augmentation for Nuclei Instance Segmentation

Nuclei segmentation from histology images is a fundamental task in digital pathology analysis. However, deep-learning-based nuclei segmentation methods often suffer from limited annotations. This paper proposes a realistic data augmentation method for nuclei segmentation, named InsMix, that follows a Copy-Paste-Smooth principle and performs morphology-constrained generative instance augmentation. Specifically, we propose morphology constraints that enable the augmented images to acquire rich information about nuclei while maintaining their morphology characteristics (e.g., geometry and location). To fully exploit the pixel redundancy of the background and improve the model's robustness, we further propose a background perturbation method, which randomly shuffles the background patches without disordering the original nuclei distribution. To achieve contextual consistency between original and template instances, a smooth-GAN is designed with a foreground similarity encoder (FSE) and a triplet loss. We validated the proposed method on two datasets, i.e., the Kumar and CPS datasets. Experimental results demonstrate the effectiveness of each component and the superior performance of our method over state-of-the-art methods.

preprint2022arXiv

Instance and Panoptic Segmentation Using Conditional Convolutions

We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instances with dynamic conditional convolutions. Instead of using instance-wise ROIs as inputs to the instance mask head of fixed weights, we design dynamic instance-aware mask heads, conditioned on the instances to be predicted. CondInst enjoys three advantages: 1.) Instance and panoptic segmentation are unified into a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2.) The elimination of the ROI cropping also significantly improves the output instance mask resolution. 3.) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference time per instance and making the overall inference time almost constant, irrelevant to the number of instances. We demonstrate a simpler method that can achieve improved accuracy and inference speed on both instance and panoptic segmentation tasks. On the COCO dataset, we outperform a few state-of-the-art methods. We hope that CondInst can be a strong baseline for instance and panoptic segmentation. Code is available at: https://git.io/AdelaiDet
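To give a sense of how compact such a dynamically generated mask head can be, the following back-of-the-envelope sketch counts its parameters. It assumes the three layers are 1x1 convolutions with biases, 10 input channels, hidden widths of 8, and a single-channel mask output; the input width, kernel size and output width are assumptions for illustration and are not stated in the abstract.

```python
def conv1x1_params(channels):
    """Weights plus biases for a stack of 1x1 convolutions.

    `channels` lists the channel widths from input to output, so
    consecutive pairs (c_in, c_out) describe one layer each.
    """
    return sum(c_in * c_out + c_out for c_in, c_out in zip(channels, channels[1:]))

# Assumed layout: 10 input channels -> 8 -> 8 -> 1 output mask channel.
mask_head_channels = [10, 8, 8, 1]
print(conv1x1_params(mask_head_channels))  # 169
```

Under these assumptions, a controller only has to predict a few hundred numbers per instance, which is consistent with the near-constant inference time claimed above.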

preprint2022arXiv

Knowledge distillation with error-correcting transfer learning for wind power prediction

Wind power prediction, especially for turbines, is vital for the operation, controllability, and economy of electricity companies. Hybrid methodologies combining advanced data science with weather forecasting have been incrementally applied to the predictions. Nevertheless, individually modeling massive numbers of turbines from scratch and downscaling weather forecasts to turbine size are neither easy nor economical. To address this, this paper proposes a novel framework with mathematical underpinnings for turbine power prediction. This framework is the first to incorporate knowledge distillation into energy forecasting, enabling accurate and economical construction of turbine models by learning knowledge from the well-established park model. In addition, park-scale weather forecasts are implicitly mapped to turbines by transfer learning of predicted power errors, achieving model correction for better performance. The proposed framework is deployed on five turbines featuring various terrains in an Arctic wind park, and the results are evaluated against competitors from an ablation investigation. The major findings reveal that the proposed framework, developed with favorable knowledge distillation and transfer learning parameter tuning, yields performance boosts from 3.3% to 23.9% over its competitors. This advantage also exists in terms of wind energy physics and computing efficiency, which are verified by the prediction quality rate and calculation time.

preprint2022arXiv

Likelihood Scores for Sparse Signal and Change-Point Detection

We consider here the identification of change-points on large-scale data streams. The objective is to find the most efficient way of combining information across data streams so that detection is possible under the smallest detectable change magnitude. The challenge comes from the sparsity of change-points when only a small fraction of data streams undergo change at any point in time. The most successful approach to the sparsity issue so far has been the application of hard thresholding such that only local scores from data streams exhibiting significant changes are considered and added. However, identifying an optimal threshold is difficult. In particular, it is unlikely that the same threshold is optimal for different levels of sparsity. We propose here a sparse likelihood score for identifying a sparse signal. The score is a likelihood ratio for testing the null hypothesis of no change against an alternative hypothesis in which the change-points or signals are barely detectable. By the Neyman-Pearson Lemma this score has maximum detection power at the given alternative. The outcome is that we have a scoring of data streams that is successful in detecting at the boundary of the detectable region of signals and change-points. The likelihood score can be seen as a soft thresholding approach to sparse signal and change-point detection in which local scores that indicate small changes are down-weighted much more than local scores indicating large changes. We are able to show second-order optimality of the sparsity likelihood score in the sense of achieving successful detection at the minimum detectable order of change magnitude as well as at the minimum detection asymptotic constant with respect to this order of change.
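The hard-thresholding baseline described above can be sketched in a few lines. This illustrates only the existing approach the abstract contrasts against, not the proposed sparse likelihood score, whose formula is not given here; the local scores and thresholds are made-up values.

```python
def hard_threshold_score(local_scores, threshold):
    """Sum only the local scores that exceed the threshold;
    streams below the threshold contribute nothing."""
    return sum(s for s in local_scores if s > threshold)

# Hypothetical local scores for five data streams; two show a change.
scores = [0.2, 0.1, 5.0, 0.3, 4.0]

print(hard_threshold_score(scores, 1.0))  # 9.0: both changed streams counted
print(hard_threshold_score(scores, 4.5))  # 5.0: one changed stream is lost
```

The aggregate jumps as the threshold moves past a stream's score, which is the sensitivity to threshold choice that motivates a soft, down-weighting alternative.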

preprint2022arXiv

Linear MSRD codes with Various Matrix Sizes and Unrestricted Lengths

A sum-rank-metric code attaining the Singleton bound is called maximum sum-rank distance (MSRD). MSRD codes have been constructed for some parameter cases. In this paper we construct a linear MSRD code over an arbitrary field ${\bf F}_q$ with various matrix sizes $n_1>n_2>\cdots>n_t$ satisfying $n_i \geq n_{i+1}^2+\cdots+n_t^2$ for $i=1, 2, \ldots, t-1$ for any given minimum sum-rank distance.
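The matrix-size condition above is easy to check mechanically. The sketch below tests whether a list of sizes $n_1, \ldots, n_t$ is strictly decreasing and satisfies $n_i \geq n_{i+1}^2+\cdots+n_t^2$ for $i=1,\ldots,t-1$; the function name and the example size lists are illustrative and do not come from the paper.

```python
def satisfies_msrd_size_condition(sizes):
    """Check n_1 > n_2 > ... > n_t and n_i >= n_{i+1}^2 + ... + n_t^2
    for every i from 1 to t-1 (0-based indices 0 to t-2)."""
    t = len(sizes)
    strictly_decreasing = all(sizes[i] > sizes[i + 1] for i in range(t - 1))
    tail_bound = all(
        sizes[i] >= sum(n * n for n in sizes[i + 1:]) for i in range(t - 1)
    )
    return strictly_decreasing and tail_bound

print(satisfies_msrd_size_condition([30, 5, 2, 1]))  # True: 30 >= 25+4+1, 5 >= 4+1, 2 >= 1
print(satisfies_msrd_size_condition([4, 2, 1]))      # False: 4 < 4+1
```

The condition forces the sizes to grow quickly toward the front of the list, since each size must dominate the sum of squares of all later ones.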

preprint2022arXiv

List-decodable Codes and Covering Codes

List-decodable codes have been an active topic in theoretical computer science. There are general results about list-decodability up to the Johnson radius and the list-decoding capacity theorem. In this paper we show that rates, list-decodable radius and list sizes are closely related to the classical topic of covering codes. We prove new, simple but strong general upper bounds for list-decodable codes in general finite metric spaces based on various covering codes. The general covering code upper bounds can be applied to the case where the volumes of the balls depend on the centers, not only on the radius. Then any good upper bound on the covering radius or the size of a covering code implies a good upper bound on the sizes of list-decodable codes. Our results give exponential improvements on the recent generalized Singleton upper bound in STOC 2020 for Hamming metric list-decodable codes, when the code lengths are large. A generalized Singleton upper bound for average-radius list-decodable codes is also given from our general covering code upper bound. We also suggest studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes. We apply our general covering code upper bounds to list-decodable rank-metric codes, list-decodable subspace codes, list-decodable insertion codes, list-decodable deletion codes, list-decodable sum-rank-metric codes and list-decodable permutation codes. Some new, better results about non-list-decodability of rank-metric codes, subspace codes, sum-rank-metric codes and permutation codes with various metrics are obtained.

preprint2022arXiv

Lower and Upper Bounds for Numbers of Linear Regions of Graph Convolutional Networks

Research on characterizing GNN expressiveness has attracted much attention as graph neural networks have risen to prominence over the last five years. The number of linear regions has been considered a good measure for the expressivity of neural networks with piecewise linear activation. In this paper, we present some estimates for the number of linear regions of the classic graph convolutional networks (GCNs) in one-layer and multiple-layer scenarios. In particular, we obtain an optimal upper bound for the maximum number of linear regions for one-layer GCNs, and upper and lower bounds for multi-layer GCNs. The simulated estimate shows that the true maximum number of linear regions is possibly closer to our estimated lower bound. These results imply that the number of linear regions of multi-layer GCNs is, in general, exponentially greater per parameter than that of one-layer GCNs. This suggests that deeper GCNs have more expressivity than shallow GCNs.

preprint2022arXiv

MDS Entanglement-Assisted Quantum Codes of Arbitrary Lengths and Arbitrary Distances

Quantum error correction is fundamentally important for quantum information processing and computation. Quantum error correction codes have been studied and constructed since the pioneering papers of Shor and Steane. Optimal (called MDS) $q$-qubit quantum codes attaining the quantum Singleton bound were constructed for very restricted lengths $n \leq q^2+1$. The entanglement-assisted quantum error correction (EAQEC) code was proposed to use the pre-shared maximally entangled state to enhance error correction capability. Recently there have been many constructions of MDS EAQEC codes attaining the quantum Singleton bound for very restricted lengths. In this paper we construct such MDS EAQEC $[[n, k, d, c]]_q$ codes for arbitrary $n$ satisfying $n \leq q^2+1$ and arbitrary distance $d\leq \frac{n+2}{2}$. It is proved that for any given length $n$ satisfying $O(q^2)=n \leq q^2+1$ and any given distance $d$ satisfying $O(q^2)=d \leq \frac{n+2}{2}$, there exist at least $O(q^2)$ MDS EAQEC $[[n, k, d, c]]_q$ codes with different $c$ parameters. Our results show that there are many more MDS entanglement-assisted quantum codes than MDS quantum codes without consumption of the maximally entangled state. This is natural from the physical point of view. Our method can also be applied to construct MDS entanglement-assisted quantum codes from the generalized MDS twisted Reed-Solomon codes.

preprint2022arXiv

MixNN: A design for protecting deep learning models

In this paper, we propose a novel design, called MixNN, for protecting deep learning model structure and parameters. The layers in a deep learning model of MixNN are fully decentralized. It hides communication addresses, layer parameters and operations, and forward as well as backward message flows among non-adjacent layers using ideas from mix networks. MixNN has the following advantages: 1) an adversary cannot fully control all layers of a model, including the structure and parameters; 2) even if some layers collude, they cannot tamper with other honest layers; 3) model privacy is preserved in the training phase. We provide detailed descriptions for deployment. In one classification experiment, we compared a neural network deployed in a virtual machine with the same one using the MixNN design on AWS EC2. The result shows that MixNN retains less than 0.001 difference in classification accuracy, while the whole running time of MixNN is about 7.5 times slower than that of the one running on a single virtual machine.

preprint2022arXiv

Monte Carlo study of duality and the Berezinskii-Kosterlitz-Thouless phase transitions of the two-dimensional $q$-state clock model in flow representations

The two-dimensional $q$-state clock model for $q \geq 5$ undergoes two Berezinskii-Kosterlitz-Thouless (BKT) phase transitions as temperature decreases. Here we report an extensive worm-type simulation of the square-lattice clock model for $q=$5--9 in a pair of flow representations, from the high- and low-temperature expansions, respectively. By finite-size scaling analysis of susceptibility-like quantities, we determine the critical points with a precision improving over the existing results. Due to the dual flow representations, each point in the critical region is observed to simultaneously exhibit a pair of anomalous dimensions, which are $η_1=1/4$ and $η_2 = 4/q^2$ at the two BKT transitions. Further, the approximate self-dual points $β_{\rm sd}(L)$, defined by the stringent condition that the susceptibility-like quantities in both flow representations are identical, are found to be nearly independent of system size $L$ and behave as $β_{\rm sd} \simeq q/2π$ asymptotically in the large-$q$ limit. The exponent $η$ at $β_{\rm sd}$ is consistent with $1/q$ within statistical error as long as $q \geq 5$. Based on this, we further conjecture that $η(β_{\rm sd}) = 1/q$ holds exactly and is universal for systems in the $q$-state clock universality class. Our work provides a vivid demonstration of rich phenomena associated with the duality and self-duality of the clock model in two dimensions.

preprint2022arXiv

Multi-sequence segmentation via score and higher-criticism tests

We propose local segmentation of multiple sequences sharing a common time- or location-index, building upon the single sequence local segmentation methods of Niu and Zhang (2012) and Fang, Li and Siegmund (2016). We also propose reverse segmentation of multiple sequences that is new even in the single sequence context. We show that local segmentation estimates change-points consistently for both single and multiple sequences, and that both methods proposed here detect signals well, with the reverse segmentation method outperforming a large number of known segmentation methods on a commonly used single sequence test scenario. We show that on a recent allele-specific copy number study involving multiple cancer patients, the simultaneous segmentations of the DNA sequences of all the patients provide information beyond that obtained by segmentation of the sequences one at a time.

preprint2022arXiv

Neighbor Enhanced Graph Convolutional Networks for Node Classification and Recommendation

The recently proposed Graph Convolutional Networks (GCNs) have achieved significantly superior performance on various graph-related tasks, such as node classification and recommendation. However, current research on GCN models usually recursively aggregates information from all neighbors or from randomly sampled neighbor subsets, without explicitly identifying whether the aggregated neighbors provide useful information during the graph convolution. In this paper, we theoretically analyze the effect of neighbor quality on the performance of GCN models and propose the Neighbor Enhanced Graph Convolutional Network (NEGCN) framework to boost the performance of existing GCN models. Our contribution is three-fold. First, we propose, for the first time, the concept of neighbor quality for both node classification and recommendation tasks in a general theoretical framework. Specifically, for node classification, we state three propositions to theoretically analyze how neighbor quality affects the node classification performance of GCN models. Second, based on the three propositions, we introduce a graph refinement process, including specially designed neighbor evaluation methods, to increase neighbor quality and thereby boost both the node classification and recommendation tasks. Third, we conduct extensive node classification and recommendation experiments on several benchmark datasets. The experimental results verify that our proposed NEGCN framework can significantly enhance the performance of various typical GCN models on both node classification and recommendation tasks.

preprint2022arXiv

New graph-based multi-sample tests for high-dimensional and non-Euclidean data

Testing the equality of distributions of multiple samples is a common task in many fields. However, this problem has not been well explored for high-dimensional or non-Euclidean data. In this paper, we propose new nonparametric tests based on a similarity graph constructed on the pooled observations from multiple samples, and make use of both within-sample edges and between-sample edges, a straightforward but as yet unexplored idea. The new tests exhibit substantial power improvements over existing tests for a wide range of alternatives. We also study the asymptotic distributions of the test statistics, offering easy off-the-shelf tools for large datasets. The new tests are illustrated through an analysis of the age image dataset.

preprint2022arXiv

Non-Gaussianities in Collider Energy Flux

The microscopic dynamics of particle collisions is imprinted into the statistical properties of asymptotic energy flux, much like the dynamics of inflation is imprinted into the cosmic microwave background. This energy flux is characterized by correlation functions $\langle \mathcal{E}(\vec n_1)\cdots \mathcal{E}(\vec n_k) \rangle$ of energy flow operators $ \mathcal{E}(\vec n)$. There has been significant recent progress in studying energy flux, including the calculation of multi-point correlation functions and their direct measurement inside high-energy jets at the Large Hadron Collider (LHC). In this paper, we build on these advances by defining a notion of "celestial non-gaussianity" as a ratio of the three-point function to a product of two-point functions. We show that this celestial non-gaussianity is under perturbative control within jets at the LHC, allowing us to cleanly access the non-gaussian interactions of quarks and gluons. We find good agreement between perturbative calculations of the non-gaussianity and a charged-particle-based analysis using CMS Open Data, and we observe a strong non-gaussianity peaked in the "flattened triangle" regime. The ability to robustly study three-point correlations is a significant step in advancing our understanding of jet substructure at the LHC. We anticipate that the celestial non-gaussianity, and its generalizations, will play an important role in the development of higher-order parton shower simulations and in the hunt for ever more subtle signals of potential new physics within jets.

preprint2022arXiv

On The Low Speed Limits of Lorentz's Transformation

This article contains a digest of the theory of electromagnetism and a review of the transformation between inertial frames, especially under low speed limits. The covariant nature of Maxwell's equations is explained using the conventional language. We show that even under low speed limits, the relativistic effects should not be neglected to get a self-consistent theory of the electromagnetic fields, unless the intrinsic dynamics of these fields has been omitted completely. The quasi-static limits, where the relativistic effects can be partly neglected, are also reviewed to clarify some common misunderstandings and imprecise uses of the theory in the presence of moving media and other related situations. The discussion presented in this paper provides a clear view of why classical electromagnetic theory is relativistic in its essence.

preprint2022arXiv

ORF-Net: Deep Omni-supervised Rib Fracture Detection from Chest CT Scans

Most existing object detection works are based on bounding box annotation: each object has a precisely annotated box. However, for rib fractures, bounding box annotation is very labor-intensive and time-consuming because radiologists need to investigate and annotate the rib fractures on a slice-by-slice basis. Although a few studies have proposed weakly-supervised or semi-supervised methods, they cannot handle different forms of supervision simultaneously. In this paper, we propose a novel omni-supervised object detection network, which can exploit multiple different forms of annotated data to further improve detection performance. Specifically, the proposed network contains an omni-supervised detection head, in which each form of annotation data corresponds to a unique classification branch. Furthermore, we propose a dynamic label assignment strategy for different annotated forms of data to facilitate better learning for each branch. Moreover, we design a confidence-aware classification loss to emphasize the samples with high confidence and further improve the model's performance. Extensive experiments conducted on the testing dataset show that our proposed method consistently outperforms other state-of-the-art approaches, demonstrating the efficacy of deep omni-supervised learning in improving rib fracture detection performance.

preprint2022arXiv

Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification

Deep learning models have frequently been reported to learn from shortcuts such as dataset biases. As deep learning plays an increasingly important role in the modern healthcare system, there is a great need to combat shortcut learning in medical data and to develop unbiased and trustworthy models. In this paper, we study the problem of developing debiased chest X-ray diagnosis models from biased training data without knowing the exact bias labels. We start with the observations that the imbalance of the bias distribution is one of the key causes of shortcut learning, and that dataset biases are preferred by the model if they are easier to learn than the intended features. Based on these observations, we propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels via a generalized cross entropy loss and then trains a debiased model using the pseudo bias labels and a bias-balanced softmax function. We constructed several chest X-ray datasets with various dataset bias situations and demonstrated with extensive experiments that our proposed method achieves consistent improvements over other state-of-the-art approaches.
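
The abstract names a generalized cross entropy (GCE) loss but does not give its form. The sketch below assumes the definition commonly used in the noisy-label literature, $L_q(p, y) = (1 - p_y^q)/q$ with $0 < q \leq 1$; the function names and the value of `q` are illustrative choices, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gce_loss(probs, label, q=0.7):
    """Assumed GCE form: (1 - p_y^q) / q, for 0 < q <= 1.

    q = 1 gives 1 - p_y (a bounded, MAE-like loss); as q -> 0 the
    value approaches the usual cross entropy -log(p_y).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - probs[label] ** q) / q

probs = softmax([0.0, 0.0])          # uniform over two classes
loss_mae_like = gce_loss(probs, 0, q=1.0)
loss_near_ce = gce_loss(probs, 0, q=1e-6)
```

Under this assumed form, the gradient of the loss is scaled by $p_y^q$, so samples the model already fits confidently are emphasized. That is one plausible reason such a loss suits a model meant to capture easily learned bias features.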

preprint2022arXiv

QuickSkill: Novice Skill Estimation in Online Multiplayer Games

Matchmaking systems are vital for creating fair matches in online multiplayer games, which directly affects players' satisfaction and game experience. Most matchmaking systems rely largely on precise estimation of players' game skills to construct equitable games. However, the skill rating of a novice is usually inaccurate, as current matchmaking rating algorithms require a considerable number of games to learn the true skill of a new player. Using these unreliable skill scores at early stages for matchmaking usually leads to disparities in team performance, which causes a negative game experience. This is known as the "cold-start" problem for matchmaking rating algorithms. To overcome this conundrum, this paper proposes QuickSkill, a deep learning based novice skill estimation framework to quickly probe the abilities of new players in online multiplayer games. QuickSkill extracts sequential performance features from the initial few games of a player to predict his/her future skill rating with a dedicated neural network, thus delivering accurate skill estimation at the player's early game stage. By employing QuickSkill for matchmaking, game fairness can be dramatically improved in the initial cold-start period. We conduct experiments in a popular mobile multiplayer game in both offline and online scenarios. Results obtained with two real-world anonymized gaming datasets demonstrate that the proposed QuickSkill delivers precise estimation of game skills for novices, leading to significantly lower team skill disparities and better player game experience. To the best of our knowledge, the proposed QuickSkill is the first framework that tackles the cold-start problem for traditional skill rating algorithms.

preprint2022arXiv

RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning

In recent years, reinforcement learning has faced several challenges in the multi-agent domain, such as the credit assignment issue. Value function factorization emerges as a promising way to handle the credit assignment issue under the centralized training with decentralized execution (CTDE) paradigm. However, existing value function factorization methods cannot deal with ad-hoc cooperation, that is, adapting to new configurations of teammates at test time. Specifically, these methods do not explicitly utilize the relationship between agents and cannot adapt to different sizes of inputs. To address these limitations, we propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents. Furthermore, RACA utilizes an attention-based observation abstraction mechanism that can generalize to an arbitrary number of teammates with a fixed number of parameters. Experiments demonstrate that our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.

preprint2022arXiv

RHONN Modelling-enabled Nonlinear Predictive Control for Lateral Dynamics Stabilization of An In-wheel Motor Driven Vehicle

Featuring fast response and flexibility in control allocation, an electric vehicle with in-wheel motors is a good platform for implementing advanced vehicle dynamics control. Among the many active safety functions of an in-wheel motor driven vehicle (IMDV), lateral stability control is a key technology, which can be realized through torque vectoring. To further advance the lateral stabilization performance of the IMDV, in this paper a novel data-driven nonlinear model predictive control (NMPC) is proposed based on the recurrent high-order neural network (RHONN) modelling method. First, the new RHONN model is developed to represent the vehicle's nonlinear dynamic behaviors. Different from the conventional physics-based modelling method, the RHONN model only needs data and forms high-order polynomials. Based on the RHONN model, the steady-state responses of the vehicle's yaw rate and sideslip angle are iteratively optimized and set as the control objectives for the low-level controller, aiming to improve the system robustness. Besides, a nonlinear model predictive controller is designed based on the RHONN, which is expected to improve the prediction accuracy during the receding horizon control. Further, a constrained optimization problem is formulated to derive the required yaw moment for vehicle lateral dynamics stabilization. Finally, the performance of the developed RHONN-based nonlinear MPC is validated on an IMDV in the CarSim/Simulink simulation environment. The validation results show that the developed approach outperforms the conventional method and further improves the stability margin of the system. It is able to enhance the lateral stabilization performance of the IMDV under various driving scenarios, demonstrating the feasibility and effectiveness of the proposed approach.

preprint2022arXiv

S2Looking: A Satellite Side-Looking Dataset for Building Change Detection

Building-change detection underpins many important applications, especially in the military and crisis-management domains. Recent methods used for change detection have shifted towards deep learning, which depends on the quality of its training data. The assembly of large-scale annotated satellite imagery datasets is therefore essential for global building-change surveillance. Existing datasets almost exclusively offer near-nadir viewing angles. This limits the range of changes that can be detected. By offering larger observation ranges, the scroll imaging mode of optical satellites presents an opportunity to overcome this restriction. This paper therefore introduces S2Looking, a building-change-detection dataset that contains large-scale side-looking satellite images captured at various off-nadir angles. The dataset consists of 5000 bitemporal image pairs of rural areas and more than 65,920 annotated instances of changes throughout the world. The dataset can be used to train deep-learning-based change-detection algorithms. It expands upon existing datasets by providing (1) larger viewing angles; (2) large illumination variances; and (3) the added complexity of rural images. To facilitate the use of the dataset, a benchmark task has been established, and preliminary tests suggest that deep-learning algorithms find the dataset significantly more challenging than the closest-competing near-nadir dataset, LEVIR-CD+. S2Looking may therefore promote important advances in existing building-change-detection algorithms. The dataset is available at https://github.com/S2Looking/.

preprint2022arXiv

Semantic decoupled representation learning for remote sensing image change detection

Contemporary transfer learning-based methods to alleviate data insufficiency in change detection (CD) are mainly based on ImageNet pre-training. Self-supervised learning (SSL) has recently been introduced to remote sensing (RS) for learning in-domain representations. Here, we propose semantic decoupled representation learning for RS image CD. Typically, the object of interest (e.g., building) is relatively small compared to the vast background. Unlike existing methods that express an image as one representation vector, which may be dominated by irrelevant land covers, we disentangle representations of different semantic regions by leveraging the semantic mask. We additionally force the model to distinguish different semantic representations, which benefits the recognition of objects of interest in the downstream CD task. We construct a dataset of bitemporal images with semantic masks in an effortless manner for pre-training. Experiments on two CD datasets show that our model outperforms ImageNet pre-training, in-domain supervised pre-training, and several recent SSL methods.

preprint2022arXiv

Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection

Supervised deep learning models depend on massive labeled data. Unfortunately, it is time-consuming and labor-intensive to collect and annotate bitemporal samples containing desired changes. Transfer learning from pre-trained models is effective in alleviating label insufficiency in remote sensing (RS) change detection (CD). We explore the use of semantic information during pre-training. Different from traditional supervised pre-training that learns the mapping from image to label, we incorporate semantic supervision into the self-supervised learning (SSL) framework. Typically, multiple objects of interest (e.g., buildings) are distributed in various locations in an uncurated RS image. Instead of manipulating image-level representations via global pooling, we introduce point-level supervision on per-pixel embeddings to learn spatially-sensitive features, thus benefiting downstream dense CD. To achieve this, we obtain multiple points via class-balanced sampling on the overlapped area between views using the semantic mask. We learn an embedding space where background and foreground points are pushed apart, and spatially aligned points across views are pulled together. Our intuition is that the resulting semantically discriminative representations, invariant to irrelevant changes (illumination and unconcerned land covers), may help change recognition. We collect large-scale image-mask pairs freely available in the RS community for pre-training. Extensive experiments on three CD datasets verify the effectiveness of our method. Ours significantly outperforms ImageNet pre-training, in-domain supervision, and several SSL methods. Empirical results indicate that our pre-training improves the generalization and data efficiency of the CD model. Notably, using 20% of the training data, we achieve results competitive with the baseline (random initialization) using 100% of the data. Our code is available.

preprint2022arXiv

Star Product PIR Schemes with Colluding Servers over Small Fields

Private Information Retrieval (PIR) was first proposed by B. Chor, O. Goldreich, E. Kushilevitz and M. Sudan in their 1995 FOCS paper. Private information retrieval was later proposed for MDS coded distributed storage systems, and the capacity of PIR schemes for MDS coded distributed storage was studied. Star product PIR schemes from general coded distributed storage systems with colluding servers were constructed over general finite fields. These star product schemes have no restriction on the sizes of fields and can be constructed for coded distributed storage across a large number of servers. In this paper we first propose and prove the Singleton type upper bound on the storage rate, ratio of colluding servers and the retrieval rate of the star product PIR schemes. Secondly, star product PIR schemes for coded distributed storage from algebraic geometry (AG) codes are analysed. We prove that when the number of servers goes to infinity, star product PIR schemes with colluding servers for AG-coded distributed storage have parameters approaching the Singleton type upper bound if the field is large. Compared with the star product PIR schemes for Reed-Solomon coded and Reed-Muller coded distributed storage, we show that PIR schemes with colluding servers for AG coded distributed storage have performance advantages. AG-code based star product PIR schemes with colluding, Byzantine and unresponsive servers are discussed. $q$-ary cyclic code based star product PIR schemes for replicated data storage are also studied. When the storage code is the Reed-Muller code, the best choice of the retrieval code is not always the Reed-Muller code.

preprint2022arXiv

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution of backpropagation can still effectively train the models with minimal accuracy loss, which is attributed to the high redundancy of video. SBP keeps all forward paths but randomly and independently removes the backward paths for each network layer in each training step. It reduces the GPU memory cost by eliminating the need to cache activation values corresponding to the dropped backward paths, whose amount can be controlled by an adjustable keep-ratio. Experiments show that SBP can be applied to a wide range of models for video tasks, leading to up to 80.0% GPU memory saving and 10% training speedup with less than 1% accuracy drop on action recognition and temporal action detection.

preprint2022arXiv

Strict Half-Singleton Bound, Strict Direct Upper Bound for Linear Insertion-Deletion Codes and Optimal Codes

Insertion-deletion codes (insdel codes for short) are used for correcting synchronization errors in communications and in many other interesting fields such as DNA storage, data analysis, race-track memory error correction and language processing, and have recently gained a lot of attention. Determining the insdel distances of linear codes is a very challenging problem. The half-Singleton bound on the insdel distances of linear codes due to Cheng-Guruswami-Haeupler-Li is a basic upper bound on the insertion-deletion error-correcting capabilities of linear codes. On the other hand, the natural direct upper bound $d_I(\mathcal C) \leq 2d_H(\mathcal C)$ is valid for any insdel code. In this paper, for a linear insdel code $\mathcal C$ we propose a strict half-Singleton upper bound $d_I(\mathcal C) \leq 2(n-2k+1)$ if $\mathcal C$ does not contain the codeword with all 1s, and a stronger direct upper bound $d_I(\mathcal C) \leq 2(d_H(\mathcal C)-t)$ under a weak condition, where $t\geq 1$ is a positive integer determined by the generator matrix. We also give optimal linear insdel codes attaining our strict half-Singleton bound and direct upper bound, and show that the code length of optimal binary linear insdel codes with respect to the (strict) half-Singleton bound is about twice the dimension. Interestingly, explicit optimal linear insdel codes attaining the (strict) half-Singleton bound, with the code length being independent of the finite field size, are given.
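
The two bounds quoted in this abstract can be checked by brute force on a toy example. The sketch below uses the standard identity that the insdel distance between two words equals the sum of their lengths minus twice the length of a longest common subsequence. The binary $[4,2]$ code spanned by 1100 and 0110 is a hypothetical example chosen here, not one taken from the paper.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of a longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def insdel_distance(a, b):
    """Minimum number of insertions and deletions turning a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)

def hamming_distance(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code, dist):
    """Minimum of dist over all pairs of distinct codewords."""
    return min(dist(u, v) for u, v in combinations(code, 2))

# Hypothetical toy code: all GF(2) combinations of 1100 and 0110.
# It does not contain the all-ones word 1111.
code = ["0000", "1100", "0110", "1010"]
n, k = 4, 2

d_H = min_distance(code, hamming_distance)
d_I = min_distance(code, insdel_distance)

direct_bound = 2 * d_H                    # d_I <= 2 * d_H
strict_half_singleton = 2 * (n - 2 * k + 1)  # d_I <= 2(n - 2k + 1)
```

For this toy code the insdel distance meets the strict half-Singleton value while staying below the direct bound. Pairwise brute force is only practical for very small codes, since the number of pairs grows quadratically in the number of codewords.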

preprint2022arXiv

TAG: Learning Circuit Spatial Embedding From Layouts

Analog and mixed-signal (AMS) circuit designs still rely on human design expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm of learning the circuit representation from layouts leveraging text, self-attention and graph. The embedding network model learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism to AMS circuit learning. Experimental results demonstrate the ability to predict layout distances between instances with industrial FinFET technology benchmarks. The effectiveness of the circuit representation is verified by showing the transferability to three other learning tasks with limited data in the case studies: layout matching prediction, wirelength estimation, and net parasitic capacitance prediction.

preprint2022arXiv

Tensor Network States with Low-Rank Tensors

Tensor networks are used to efficiently approximate states of strongly-correlated quantum many-body systems. More generally, tensor network approximations may allow one to reduce the cost of operating on an order-$N$ tensor from exponential to polynomial in $N$, and this has become a popular approach for machine learning. We introduce the idea of imposing low-rank constraints on the tensors that compose the tensor network. With this modification, the time and space complexities for the network optimization can be substantially reduced while maintaining high accuracy. We detail this idea for tree tensor network states (TTNS) and projected entangled-pair states. Simulations of spin models on Cayley trees with low-rank TTNS exemplify the effect of rank constraints on the expressive power. We find that choosing the tensor rank $r$ to be on the order of the bond dimension $m$ is sufficient to obtain high-accuracy ground-state approximations and to substantially outperform standard TTNS computations. Thus low-rank tensor networks are a promising route for the simulation of quantum matter and machine learning on large data sets.

preprint2022arXiv

The influence of Aharonov-Casher effect on the generalized Dirac oscillator in the cosmic string space-time

In this manuscript, we investigate the influence of the Aharonov-Casher effect on the generalized Dirac oscillator containing a Coulomb-type potential function, for a relativistic neutral particle with a permanent magnetic dipole moment interacting with external electromagnetic fields in (1+2)-dimensional cosmic string space-time. The eigenfunctions and energy eigenvalues of such a Dirac oscillator are derived by using the Nikiforov-Uvarov method. We show that this scenario yields modified relativistic exact analytical solutions. In this way, we can see that the degeneracy of the relevant relativistic energy eigenvalues is broken, depending on the Coulomb strength parameter, under the influence of the curvature effect and the Aharonov-Casher effect.

preprint2022arXiv

TubeR: Tubelet Transformer for Video Action Detection

We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set of tubelet-queries and utilizes a tubelet-attention module to model the dynamic spatio-temporal nature of a video clip, which effectively reinforces the model capacity compared to using actor-positional hypotheses in the spatio-temporal space. For videos containing transitional states or scene changes, we propose a context aware classification head to utilize short-term and long-term context to strengthen action classification, and an action switch regression head for detecting the precise temporal action extent. TubeR directly produces action tubelets with variable lengths and even maintains good results for long video clips. TubeR outperforms the previous state-of-the-art on commonly used action detection datasets AVA, UCF101-24 and JHMDB51-21.

preprint2022arXiv

Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

As RGB-D sensors become more affordable, using RGB-D images to obtain high-accuracy 6D pose estimation results becomes a better option. State-of-the-art approaches typically use different backbones to extract features for RGB and depth images. They use a 2D CNN for RGB images and a per-pixel point cloud network for depth data, as well as a fusion network for feature fusion. We find that the essential reason for using two independent backbones is the "projection breakdown" problem. In the depth image plane, the projected 3D structure of the physical world is preserved by the 1D depth value and its built-in 2D pixel coordinate (UV). Any spatial transformation that modifies UV, such as resize, flip, crop, or pooling operations in the CNN pipeline, breaks the binding between the pixel value and UV coordinate. As a consequence, the 3D structure is no longer preserved by a modified depth image or feature. To address this issue, we propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input. Our method has a Unified CNN framework for 6D pose estimation with a single CNN backbone. In particular, the architecture of our method is based on Mask R-CNN with two extra heads, one named RT head for directly predicting 6D pose and the other named abc head for guiding the network to map the visible points to their coordinates in the 3D model as an auxiliary module. This end-to-end approach balances simplicity and accuracy, achieving accuracy comparable to state-of-the-art methods and 7.2x faster inference speed on the YCB-Video dataset.
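
The "projection breakdown" described in this abstract can be illustrated with a single pixel. The sketch below assumes a standard pinhole camera model; the intrinsics, pixel position, depth and crop offset are hypothetical values chosen for illustration, and the code is not the Uni6D implementation.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with a depth value to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

# Hypothetical camera intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# One depth pixel in the original image: position (u, v) and depth in metres.
u, v, depth = 400, 300, 2.0
original = backproject(u, v, depth, FX, FY, CX, CY)

# Crop the image with its top-left corner at (350, 250).
# The same pixel now sits at index (50, 50) in the cropped array.
crop_u0, crop_v0 = 350, 250
u_crop, v_crop = u - crop_u0, v - crop_v0

# Naive: reuse the array index as UV. The depth value is unchanged,
# but the recovered 3D point is wrong.
naive = backproject(u_crop, v_crop, depth, FX, FY, CX, CY)

# UV carried as extra channels: the original (u, v) travels with the
# depth value through the crop, so the 3D point is recovered exactly.
pixel_with_uv = {"depth": depth, "u": u, "v": v}
carried = backproject(pixel_with_uv["u"], pixel_with_uv["v"],
                      pixel_with_uv["depth"], FX, FY, CX, CY)
```

Here `naive` differs from `original` while `carried` matches it, which is the binding between pixel value and UV coordinate that the abstract says is lost under crops, flips, resizes and pooling.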

preprint2022arXiv

Unitail: Detecting, Reading, and Matching in Retail Scene

To make full use of computer vision technology in stores, it is necessary to consider the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. With 1.8M quadrilateral-shaped instances annotated, Unitail offers a detection dataset to align product appearance better. Furthermore, it provides a gallery-style OCR dataset containing 1454 product categories, 30k text regions, and 21k transcriptions to enable robust reading on products and motivate enhanced product matching. Besides benchmarking the datasets using various state-of-the-art methods, we customize a new detector for product detection and provide a simple OCR-based matching solution that verifies its effectiveness.

preprint2022arXiv

Unsupervised Lifelong Person Re-identification via Contrastive Rehearsal

Existing unsupervised person re-identification (ReID) methods focus on adapting a model trained on a source domain to a fixed target domain. However, an adapted ReID model usually only works well on a certain target domain, but can hardly memorize the source domain knowledge and generalize to upcoming unseen data. In this paper, we propose unsupervised lifelong person ReID, which focuses on continuously conducting unsupervised domain adaptation on new domains without forgetting the knowledge learnt from old domains. To tackle unsupervised lifelong ReID, we conduct a contrastive rehearsal on a small number of stored old samples while sequentially adapting to new domains. We further set an image-to-image similarity constraint between old and new models to regularize the model updates in a way that suits old knowledge. We sequentially train our model on several large-scale datasets in an unsupervised manner and test it on all seen domains as well as several unseen domains to validate the generalizability of our method. Our proposed unsupervised lifelong method achieves strong generalizability, which significantly outperforms previous lifelong methods on both seen and unseen domains. Code will be made available at https://github.com/chenhao2345/UCR.

preprint2022arXiv

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

We propose a novel one-stage Transformer-based semantic and spatial refined transformer (SSRT) to solve the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules to help select the most relevant object-action pairs within an image and refine the queries' representation using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks: V-COCO and HICO-DET.

preprint2021arXiv

Algebraic Geometric Secret Sharing Schemes over Large Fields Are Asymptotically Threshold

In the Chen-Cramer Crypto 2006 paper \cite{cc}, algebraic geometric secret sharing schemes were proposed such that the "Fundamental Theorem in Information-Theoretically Secure Multiparty Computation" by Ben-Or, Goldwasser and Wigderson \cite{BGW88} and Chaum, Crépeau and Damgård \cite{CCD88} can be established over constant-size base finite fields. These algebraic geometric secret sharing schemes, defined by a curve of genus $g$ over a constant-size finite field ${\bf F}_q$, are quasi-threshold in the following sense: any subset of $u \leq T-1$ players (non-qualified) has no information about the secret, and any subset of $u \geq T+2g$ players (qualified) can reconstruct the secret. It is natural to ask how far from threshold these quasi-threshold secret sharing schemes are. How many subsets of $u \in [T, T+2g-1]$ players can recover the secret or have no information about the secret? In this paper it is proved that almost all subsets of $u \in [T,T+g-1]$ players have no information about the secret and almost all subsets of $u \in [T+g,T+2g-1]$ players can reconstruct the secret when the size $q$ goes to infinity and the genus satisfies $\lim \frac{g}{\sqrt{q}}=0$. Algebraic geometric secret sharing schemes over large finite fields are therefore asymptotically threshold in this case. We also analyze the case when the size $q$ of the base field is fixed and the genus goes to infinity.

preprint2021arXiv

Analysis of Emission Dynamics of a Long Lifetime in Single InAs/GaAs Quantum Dots

A very long lifetime emission with non-single exponential decay characteristic has been reported for single InAs/GaAs quantum dot (QD) samples, in which there exists a long-lived metastable state in the wetting layer (WL) [ACS Photonics 2020, 7, 3228-3235]. In this article we have proposed a new three-level model to simulate the emission decay curve. In this model, assuming that the excitons in metastable state will diffuse and be trapped by QDs, and then emit fluorescence in QDs, a stretched-like exponential decay formula is derived as $I(t)=At^{\beta-1}e^{-(rt)^{\beta}}$, which can well describe the long lifetime decay curve with an analytical expression of average lifetime $\langle\tau\rangle=\frac{1}{r}\Gamma\left(\frac{1}{\beta}+1\right)$, where $\Gamma$ is the Gamma function. Furthermore, based on the proposed three-level model, an expression of the second-order auto-correlation function $g^{2}(t)$ which can well fit the measured $g^{2}(t)$ curve is also obtained.
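The quoted average lifetime can be checked directly from the decay formula. The derivation below is a sketch under one assumption that the abstract does not state explicitly: that the average lifetime is the intensity-weighted mean time, $\langle\tau\rangle = \int_0^\infty t\,I(t)\,dt \,/ \int_0^\infty I(t)\,dt$. It uses the substitution $u=(rt)^{\beta}$.

```latex
% Assumption: <tau> is the intensity-weighted mean of t over I(t).
% Substitution: u = (rt)^beta, so du = beta r^beta t^{beta-1} dt and t = u^{1/beta} / r.
\begin{align}
\int_0^\infty I(t)\,dt
  &= A \int_0^\infty t^{\beta-1} e^{-(rt)^{\beta}}\,dt
   = \frac{A}{\beta r^{\beta}} \int_0^\infty e^{-u}\,du
   = \frac{A}{\beta r^{\beta}}, \\
\int_0^\infty t\,I(t)\,dt
  &= \frac{A}{\beta r^{\beta}} \int_0^\infty \frac{u^{1/\beta}}{r}\, e^{-u}\,du
   = \frac{A}{\beta r^{\beta}} \cdot \frac{1}{r}\,\Gamma\!\left(\frac{1}{\beta}+1\right), \\
\langle\tau\rangle
  &= \frac{\int_0^\infty t\,I(t)\,dt}{\int_0^\infty I(t)\,dt}
   = \frac{1}{r}\,\Gamma\!\left(\frac{1}{\beta}+1\right).
\end{align}
```

For $\beta=1$ the decay is a single exponential and the expression reduces to $\langle\tau\rangle = 1/r$, as expected.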

preprint2021arXiv

Analysis of the Dirac equation with the Killingbeck potential in non-commutative space

In this paper, we investigate the Dirac equation with the Killingbeck potential under an external magnetic field in non-commutative space. The corresponding expressions for the energy levels and wave functions in the spin symmetry limit and the pseudo-spin symmetry limit are derived by using the Bethe ansatz method. The parameter B associated with the external magnetic field and the non-commutative parameter θ modify the energy levels of the systems considered.

preprint2021arXiv

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

Multi-orientation scene text detection has recently gained significant research attention. Previous methods directly predict words or text lines, typically by using quadrilateral shapes. However, many of these methods neglect the significance of consistent labeling, which is important for maintaining a stable training process, especially when it comprises a large amount of data. Here we solve this problem by proposing a new method, Orderless Box Discretization (OBD), which first discretizes the quadrilateral box into several key edges containing all potential horizontal and vertical positions. To decode accurate vertex positions, a simple yet effective matching procedure is proposed for reconstructing the quadrilateral bounding boxes. Our method solves the ambiguity issue, which has a significant impact on the learning process. Extensive ablation studies are conducted to validate the effectiveness of our proposed method quantitatively. More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors. Combining both OBD and these useful refinements, we achieve state-of-the-art performance on various benchmarks, including ICDAR 2015 and MLT. Our method also won the first place in the text detection task at the recent ICDAR2019 Robust Reading Challenge for Reading Chinese Text on Signboards, further demonstrating its superior performance. The code is available at https://git.io/TextDet.

preprint2021arXiv

Geographical Knowledge-driven Representation Learning for Remote Sensing Images

The proliferation of remote sensing satellites has resulted in a massive amount of remote sensing images. However, due to human and material resource constraints, the vast majority of remote sensing images remain unlabeled. As a result, they cannot be used with currently available deep learning methods. To fully utilize the remaining unlabeled images, we propose a Geographical Knowledge-driven Representation learning method for remote sensing images (GeoKR), which improves network performance and reduces the demand for annotated data. The global land cover products and geographical location associated with each remote sensing image are regarded as geographical knowledge to provide supervision for representation learning and network pre-training. An efficient pre-training framework is proposed to eliminate the supervision noise caused by differences in imaging time and resolution between remote sensing images and geographical knowledge. A large-scale pre-training dataset, Levir-KR, is proposed to support network pre-training. It contains 1,431,950 remote sensing images from Gaofen series satellites with various resolutions. Experimental results demonstrate that our proposed method outperforms ImageNet pre-training and self-supervised representation learning methods and significantly reduces the burden of data annotation on downstream tasks such as scene classification, semantic segmentation, object detection, and cloud / snow detection. This demonstrates that our proposed method can be used as a novel paradigm for pre-training neural networks. Code will be available at https://github.com/flyakon/Geographical-Knowledge-driven-Representaion-Learning.

preprint2021arXiv

GTA: Global Temporal Attention for Video Action Understanding

Self-attention learns pairwise interactions to model long-range dependencies, yielding great improvements for video action recognition. In this paper, we seek a deeper understanding of self-attention for temporal modeling in videos. We first demonstrate that the entangled modeling of spatio-temporal information by flattening all pixels is sub-optimal, failing to capture temporal relationships among frames explicitly. To this end, we introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner. We apply GTA on both pixels and semantically similar regions to capture temporal relationships at different levels of spatial granularity. Unlike conventional self-attention that computes an instance-specific attention matrix, GTA directly learns a global attention matrix that is intended to encode temporal structures that generalize across different samples. We further augment GTA with a cross-channel multi-head fashion to exploit channel interactions for better temporal modeling. Extensive experiments on 2D and 3D networks demonstrate that our approach consistently enhances temporal modeling and provides state-of-the-art performance on three video action recognition datasets.
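The contrast between instance-specific attention and a single learned global matrix can be sketched as follows. This is an illustrative toy, not the GTA architecture: the function names are hypothetical, the row-wise softmax normalization is an assumption, and the decoupled spatial attention and cross-channel multi-head design mentioned above are omitted. The key point is that `attention_logits` is a parameter shared by every clip rather than something computed from each clip's features.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_temporal_attention(frame_features, attention_logits):
    """Mix per-frame feature vectors with a sample-independent T x T matrix.

    frame_features: list of T feature vectors (one per frame).
    attention_logits: T x T learnable logits shared across all samples
    (the softmax normalization is an assumption of this sketch).
    """
    attention = [softmax(row) for row in attention_logits]
    channels = len(frame_features[0])
    return [
        [sum(a * frame[c] for a, frame in zip(weights, frame_features))
         for c in range(channels)]
        for weights in attention
    ]


T = 3
shared_logits = [[0.0] * T for _ in range(T)]  # untrained: uniform temporal mixing

clip_a = [[1.0, 0.0], [2.0, 0.0], [3.0, 6.0]]
clip_b = [[0.0, 3.0], [0.0, 3.0], [9.0, 3.0]]

# The same matrix is applied to both clips; nothing is recomputed per sample.
out_a = global_temporal_attention(clip_a, shared_logits)
out_b = global_temporal_attention(clip_b, shared_logits)
```

With all-zero logits every output frame is simply the temporal mean of the clip; training would shape the shared matrix into a non-uniform temporal pattern.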

preprint2021arXiv

Interaction between optical pulse and tumor using finite element analysis

Photoacoustic imaging is an emerging technology based on the photoacoustic effect that has developed rapidly in recent years. It combines the high contrast of optical imaging with the high penetration and high resolution of acoustic imaging. As a non-destructive biological tissue imaging technology, photoacoustic imaging has important application value in the field of biomedicine. With its high-efficiency bioimaging capabilities and excellent biosafety performance, it has been favored by researchers. The visualization of photoacoustic imaging has great research significance in the early diagnosis of some diseases, especially tumors. In photoacoustic imaging, light transmission and thermal effects are important processes. This article is based on COMSOL software and uses finite element analysis to construct a physical model for simulation. By directing laser pulses into stomach tissue containing a tumor, the physical processes of light transmission and biological heat transfer were studied, a photothermal model composed of two physical fields was built, and finally a series of visualizations was obtained. This work offers theoretical guidance for further promoting the application of photoacoustic imaging in the field of biomedicine.

preprint2021arXiv

Momentum-Resolved Visualization of Electronic Evolution in Doping a Mott Insulator

High temperature superconductivity in cuprates arises from doping a parent Mott insulator by electrons or holes. A central issue is how the Mott gap evolves and the low-energy states emerge with doping. Here we report angle-resolved photoemission spectroscopy measurements on a cuprate parent compound by sequential in situ electron doping. The chemical potential jumps to the bottom of the upper Hubbard band upon a slight electron doping, making it possible to directly visualize the charge transfer band and the full Mott gap region. With increasing doping, the Mott gap rapidly collapses due to the spectral weight transfer from the charge transfer band to the gapped region and the induced low-energy states emerge in a wide energy range inside the Mott gap. These results provide key information on the electronic evolution in doping a Mott insulator and establish a basis for developing microscopic theories for cuprate superconductivity.

preprint2021arXiv

Spinning Gluons from the QCD Light-Ray OPE

We study the transverse spin structure of the squeezed limit of the three-point energy correlator, $\langle \mathcal{E}(\vec n_1) \mathcal{E}(\vec n_2) \mathcal{E}(\vec n_3) \rangle$. To describe its all orders perturbative behavior, we develop the light-ray operator product expansion (OPE) in QCD. At leading twist the iterated OPE of $\mathcal{E}(\vec n_i)$ operators closes onto light-ray operators $\mathbb{O}^{[J]}(\vec n)$ with spin $J$, and transverse spin $j=0,2$. We compute the $\mathcal{E}(\vec n_1) \mathcal{E}(\vec n_2)$, $\mathcal{E}(\vec n_1) \mathbb{O}^{[J]}(\vec n_2) $ and $\mathbb{O}^{[J_1]}(\vec n_1) \mathbb{O}^{[J_2]}(\vec n_2) $ OPEs as analytic functions of $J$, which allows for the description of arbitrary squeezed limits of $N$-point correlators in QCD. We use these results with $J=3$ to reproduce the perturbative expansion in the squeezed limit of the three-point correlator, as well as to resum the leading twist singular structure for both quark and gluon jets, including transverse spin contributions, as required for phenomenological applications. Finally, we briefly comment on the transverse spin structure at higher twists, and show that to all orders in the twist expansion the highest transverse spin contributions are universal between quark and gluon jets, and are descendants of the leading twist transverse spin-2 operator, allowing their resummation into a simple two-dimensional Euclidean conformal block. Due to the general applicability of our results to arbitrary correlation functions of energy flow operators, we anticipate that they can be widely applied to improving our understanding of jet substructure at the LHC.

preprint2021arXiv

Structure Parameter Optimized Kernel Based Online Prediction with a Generalized Optimization Strategy for Nonstationary Time Series

In this paper, online prediction algorithms in a reproducing kernel Hilbert space, aided by sparsification techniques, are studied for nonstationary time series. As usual, the online prediction algorithms consist of selecting the kernel structure parameters and updating the kernel weight vector. For the structure parameters, the kernel dictionary is selected by sparsification techniques with online selective modeling criteria, and the kernel covariance matrix is intermittently optimized following the covariance matrix adaptation evolution strategy (CMA-ES). Optimizing the real symmetric covariance matrix can not only improve the kernel structure's flexibility through the cross-relatedness of the input variables, but also partly alleviate the prediction uncertainty caused by the kernel dictionary selection for nonstationary time series. In order to sufficiently capture the underlying dynamic characteristics in prediction-error time series, a generalized optimization strategy is designed to construct the kernel dictionary sequentially in multiple kernel connection modes. The generalized optimization strategy provides a more self-contained way to construct the entire kernel connections, which enhances the ability to adaptively track the changing dynamic characteristics. Numerical simulations demonstrate that the proposed approach has superior prediction performance for nonstationary time series.

preprint2021arXiv

Symmetry-Enforced Nodal Chain Phonons

Topological phonons in crystalline materials have been attracting great interest. However, most cases studied so far are direct generalizations of the topological states from electronic systems. Here, we reveal a novel class of topological phonons -- the symmetry-enforced nodal-chain phonons, which manifest features unique for phononic systems. We show that with $D_{2d}$ little co-group at a non-time-reversal-invariant-momentum point, the phononic nodal chain is guaranteed to exist owing to the vector basis symmetry of phonons, which is a unique character distinct from electronic and other systems. Combined with the spinless character, this makes the proposed nodal-chain phonons enforced by symmorphic crystal symmetries. We further screen all 230 space groups, and find five candidate groups. Interestingly, the nodal chains in these five groups exhibit two different patterns: for tetragonal systems, they are one-dimensional along the fourfold axis; for cubic systems, they form a three-dimensional network structure. Based on first-principles calculations, we identify K$_{2}$O as a realistic material hosting almost ideal nodal-chain phonons. We show that the effect of LO-TO splitting, another unique feature for phonons, helps to expose the nodal-chain phonons in K$_{2}$O in a large energy window. In addition, all the five candidate groups have spacetime inversion symmetry, so the nodal chains also feature a quantized $π$ Berry phase. This leads to drumhead surface phonon modes that must exist on multiple surfaces of a sample.

preprint2020arXiv

3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning

3D human shape and pose estimation from monocular images has been an active area of research in computer vision, having a substantial impact on the development of new applications, from activity recognition to creating virtual avatars. Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images; however, high-resolution visual content is not always available in several practical scenarios such as video surveillance and sports broadcasting. Low-resolution images in real scenarios can vary in a wide range of sizes, and a model trained in one resolution does not typically degrade gracefully across resolutions. Two common approaches to solve the problem of low-resolution input are applying super-resolution techniques to the input images which may result in visual artifacts, or simply training one model for each resolution, which is impractical in many realistic applications. To address the above issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme. The proposed network is able to learn the 3D body shape and pose across different resolutions with a single model. The self-supervision loss encourages scale-consistency of the output, and the contrastive learning scheme enforces scale-consistency of the deep features. We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner. Extensive experiments demonstrate that the RSC-Net can achieve consistently better results than the state-of-the-art methods for challenging low-resolution images.

preprint2020arXiv

A new deformation family of Schwarz' D surface

We prove the existence of a new 2-parameter family o$Δ$ of embedded triply periodic minimal surfaces of genus 3. The new surfaces share many properties with classical orthorhombic deformations of Schwarz' D surface, but are also exotic in many ways. In particular, they do not belong to Meeks' five-dimensional family. Nevertheless, o$Δ$ meets classical deformations in a 1-parameter family on its boundary.

preprint2020arXiv

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, making our method superior in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy while significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://tinyurl.com/AdelaiDet
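A parameterized Bezier curve of the kind mentioned in contribution 1 can be evaluated with Bernstein polynomials. The sketch below is generic curve math, not ABCNet code: the curve degree (cubic, four control points), the control-point values, and the function names are assumptions made for illustration.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] via Bernstein polynomials."""
    n = len(control_points) - 1  # curve degree
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        basis = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += basis * px
        y += basis * py
    return (x, y)


def sample_curve(control_points, num_samples):
    """Sample num_samples evenly spaced parameter values along the curve."""
    return [bezier_point(control_points, k / (num_samples - 1))
            for k in range(num_samples)]


# Hypothetical control points for one long edge of a curved text region.
top_edge = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
samples = sample_curve(top_edge, 5)
```

The curve passes through the first and last control points, while the inner two only pull it toward them, so a handful of numbers per edge is enough to describe a smoothly curved text boundary.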

preprint2020arXiv

An Analytical Approach to Eddy Current in Electromagnetic Damping

An analytical method of calculating the eddy current in a metallic spinning gyroscope in an external magnetic field is presented. With reasonable assumptions, the problem is simplified from a time-dependent one governed by Maxwell's equations to a boundary value problem for the Poisson equation, which yields a closed-form expression for the eddy current. The rotation frequency as a function of time is calculated and compared with experiment, and the relative error is found to be 8.61%.

preprint2020arXiv

An orthorhombic deformation family of Schwarz' H surfaces

The classical H surfaces of H. A. Schwarz form a 1-parameter family of triply periodic minimal surfaces (TPMS) that are usually described as close relatives to his more famous P surface. However, a crucial distinction between these surfaces is that the P surface belongs to a 5-dimensional smooth family of embedded TPMS of genus three discovered by W. Meeks, while the H surfaces are among the few known examples outside this family. We construct a 2-parameter family of embedded TPMS of genus three that contains the H family and meets the Meeks family. In particular, we prove that H surfaces can be deformed continuously within the space of TPMS of genus three into Meeks surfaces.

preprint2020arXiv

Analytic Continuation and Reciprocity Relation for Collinear Splitting in QCD

It is well known that direct analytic continuation of the DGLAP evolution kernel (splitting functions) from space-like to time-like kinematics breaks down at three loops. We identify the origin of this breakdown: splitting functions are not analytic functions of the external momenta. However, splitting functions can be constructed from the square of (generalized) splitting amplitudes. We establish the rule of analytic continuation for splitting amplitudes, and use it to determine the analytic continuation of certain holomorphic and anti-holomorphic parts of splitting functions and transverse-momentum dependent distributions. In this way we derive the time-like splitting functions at three loops without ambiguity. We also propose a reciprocity relation for singlet splitting functions, and provide non-trivial evidence that it holds in QCD at least through three loops.

preprint2020arXiv

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Instance segmentation is one of the fundamental vision tasks. Recently, fully convolutional instance segmentation methods have drawn much attention as they are often simpler and more efficient than two-stage approaches like Mask R-CNN. To date, almost all such approaches fall behind the two-stage Mask R-CNN method in mask precision when models have similar computation complexity, leaving great room for improvement. In this work, we achieve improved mask prediction by effectively combining instance-level information with semantic information with lower-level fine-granularity. Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches. The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference. BlendMask can be easily incorporated with the state-of-the-art one-stage detection frameworks and outperforms Mask R-CNN under the same training schedule while being 20% faster. A light-weight version of BlendMask achieves 34.2% mAP at 25 FPS evaluated on a single 1080Ti GPU card. Because of its simplicity and efficacy, we hope that our BlendMask could serve as a simple yet strong baseline for a wide range of instance-wise prediction tasks. Code is available at https://git.io/AdelaiDet

preprint2020arXiv

Cardiovascular risk and work stress in biomedical researchers in China: An observational, big data study protocol

Introduction: Internet technologies could strengthen data collection and integration and have been used extensively in public health research. It is necessary to apply this technology to further investigate the behaviour and health of biomedical researchers. A browser-based extension was developed by researchers and clinicians to promote the collection and analysis of researchers' behavioural and psychological data. This protocol illustrates an observational study aimed at (1) characterising the health status of biomedical researchers in China and assessing work stress, job satisfaction, role conflict, role ambiguity, and family support; (2) identifying the association between work, behaviour, and health; and (3) investigating the association between behaviour and mental status. Our findings will contribute to the understanding of the influences of job, work environment, and family support on the mental and physical health of biomedical researchers. Methods and analysis: This is a prospective observational study; all candidates will be recruited from China. Participants will install an extension on their Internet browsers, which will collect data when they are accessing PubMed. A web-based survey will be sent to the user interfaces every 6 months that will involve sociodemographic variables, perceived stress scale, job satisfaction scale, role conflict and ambiguity scale, and family support scale. Machine-learning algorithms will analyse the data generated during daily access. Ethics and dissemination: This study received ethical approval from the ethics committee of the Shanghai Children's Medical Centre (reference number SCMCIRB-K2018082). Study results will be disseminated through peer-reviewed publications and conference presentations.

preprint2020arXiv

Conditional Convolutions for Instance Segmentation

We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). Top-performing instance segmentation methods such as Mask R-CNN rely on ROI operations (typically ROIPool or ROIAlign) to obtain the final instance masks. In contrast, we propose to solve instance segmentation from a new perspective. Instead of using instance-wise ROIs as inputs to a network of fixed weights, we employ dynamic instance-aware networks, conditioned on instances. CondInst enjoys two advantages: 1) Instance segmentation is solved by a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference. We demonstrate a simpler instance segmentation method that can achieve improved performance in both accuracy and inference speed. On the COCO dataset, we outperform a few recent methods including well-tuned Mask RCNN baselines, without longer training schedules needed. Code is available: https://github.com/aim-uofa/adet
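The idea of a compact, instance-conditioned mask head can be sketched in a few lines. This is a toy illustration, not the CondInst implementation: it assumes 1x1 convolutions (so each layer is a per-pixel matrix-vector product), an 8-channel input feature, a single output channel, and ReLU between layers; the helper names and the hand-written parameter vectors are hypothetical. In the real method a network would generate one such parameter vector per instance.

```python
def layer_widths(in_channels=8, hidden_channels=8, num_layers=3, out_channels=1):
    """Channel widths of the mask head, e.g. [8, 8, 8, 1] for three layers."""
    return [in_channels] + [hidden_channels] * (num_layers - 1) + [out_channels]


def head_param_count(widths):
    """Total weights plus biases across all 1x1 conv layers."""
    return sum(cin * cout + cout for cin, cout in zip(widths, widths[1:]))


def split_params(flat, widths):
    """Slice one flat per-instance parameter vector into (weights, biases) per layer."""
    layers, offset = [], 0
    for cin, cout in zip(widths, widths[1:]):
        weights = [flat[offset + o * cin: offset + (o + 1) * cin] for o in range(cout)]
        offset += cin * cout
        biases = flat[offset: offset + cout]
        offset += cout
        layers.append((weights, biases))
    return layers


def apply_head(pixel_features, layers):
    """Run the dynamic head on one pixel's feature vector and return a mask logit."""
    x = pixel_features
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x[0]


widths = layer_widths()
num_params = head_param_count(widths)

# Stand-ins for the per-instance parameters a generator network would predict.
params_a = [0.01 * k for k in range(num_params)]
params_b = [-0.01 * k for k in range(num_params)]

pixel = [1.0] * 8  # shared feature vector at one location
logit_a = apply_head(pixel, split_params(params_a, widths))
logit_b = apply_head(pixel, split_params(params_b, widths))
```

The same pixel feature yields different logits under the two parameter sets, which is how a shared feature map can produce a separate mask per instance without ROI cropping. Under these assumptions the head needs only `num_params` values per instance.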

preprint2020arXiv

Coordinated Path Following Control of Fixed-wing Unmanned Aerial Vehicles

In this paper, we investigate the problem of coordinated path following for fixed-wing UAVs with speed constraints in 2D plane. The objective is to steer a fleet of UAVs along the path(s) while achieving the desired sequenced inter-UAV arc distance. In contrast to the previous coordinated path following studies, we are able through our proposed hybrid control law to deal with the forward speed and the angular speed constraints of fixed-wing UAVs. More specifically, the hybrid control law makes all the UAVs work at two different levels: those UAVs whose path following errors are within an invariant set (i.e., the designed coordination set) work at the coordination level; and the other UAVs work at the single-agent level. At the coordination level, we prove that even with speed constraints, the proposed control law can make sure the path following errors reduce to zero, while the desired arc distances converge to the desired value. At the single-agent level, the convergence analysis for the path following error entering the coordination set is provided. We develop a hardware-in-the-loop simulation testbed of the multi-UAV system by using actual autopilots and the X-Plane simulator. The effectiveness of the proposed approach is corroborated with both MATLAB and the testbed.

preprint2020arXiv

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation. To address the limitations of the traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. After that, we introduce recent DL mechanisms suitable to solve such problems. Then, we present the state-of-the-art practices and discuss their challenges with some recommendations for practitioners and researchers as well.

preprint2020arXiv

Deep Mining External Imperfect Data for Chest X-ray Disease Screening

Deep learning approaches have demonstrated remarkable progress in automatic Chest X-ray analysis. The data-driven nature of deep models requires training data that covers a large distribution. Therefore, it is important to integrate knowledge from multiple datasets, especially for medical images. However, learning a disease classification model with extra Chest X-ray (CXR) data remains challenging. Recent studies have demonstrated that a performance bottleneck exists in joint training on different CXR datasets, and few efforts have been made to address this obstacle. In this paper, we argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges. Specifically, the imperfection is twofold: domain discrepancy, as the image appearances vary across datasets; and label discrepancy, as different datasets are partially labeled. To this end, we formulate the multi-label thoracic disease classification problem as weighted independent binary tasks according to the categories. For common categories shared across domains, we adopt task-specific adversarial training to alleviate the feature differences. For categories existing in a single dataset, we present uncertainty-aware temporal ensembling of model predictions to further mine the information from the missing labels. In this way, our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability. We conduct extensive experiments on three datasets with more than 360,000 Chest X-ray images. Our method outperforms other competing models and sets state-of-the-art performance on the official NIH test set with 0.8349 AUC, demonstrating its effectiveness in utilizing the external dataset to improve the internal classification.
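Treating multi-label classification as weighted independent binary tasks over partially labeled data can be sketched with a masked binary cross-entropy. This is a simplified illustration, not the paper's loss: the function name, the use of `None` to mark a category that a dataset does not annotate, and the normalization by total weight are all assumptions. The adversarial training and temporal ensembling described above are not shown; here missing labels are simply skipped.

```python
import math


def masked_weighted_bce(probabilities, labels, weights):
    """Weighted binary cross-entropy over independent categories.

    probabilities: predicted probability per category.
    labels: 1, 0, or None when the category is unlabeled in this sample's dataset.
    weights: per-category task weight (hypothetical values in the example).
    """
    total, weight_sum = 0.0, 0.0
    for p, y, w in zip(probabilities, labels, weights):
        if y is None:  # missing label: this category contributes no loss
            continue
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# One sample from a dataset that annotates only the first two of three categories.
predicted = [0.9, 0.2, 0.6]
partial_labels = [1, 0, None]
task_weights = [1.0, 2.0, 1.0]

loss = masked_weighted_bce(predicted, partial_labels, task_weights)
```

Because the unlabeled category is masked out, changing the prediction for it leaves the loss unchanged, so samples from differently annotated datasets can share one training loop.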

preprint2020arXiv

Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation

Deep learning methods show promising results for overlapping cervical cell instance segmentation. However, in order to train a model with good generalization ability, voluminous pixel-level annotations are required, which are expensive and time-consuming to acquire. In this paper, we propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation. We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining (MMT-PSM), which consists of a teacher and a student network during training. The two networks are encouraged to be consistent at both the feature and semantic levels under small perturbations. The teacher's self-ensemble predictions from $K$-time augmented samples are used to construct the reliable pseudo-labels for optimizing the student. We design a novel strategy to estimate the sensitivity to perturbations for each proposal and select informative samples from massive cases to facilitate fast and effective semantic distillation. In addition, to eliminate the unavoidable noise from the background region, we propose to use the predicted segmentation mask as guidance to enforce the feature distillation in the foreground region. Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only, and outperforms state-of-the-art semi-supervised methods.

preprint2020arXiv

DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing

Semantically interpreting the traffic scene is crucial for autonomous transportation and robotics systems. However, state-of-the-art semantic segmentation pipelines are predominantly designed to work with pinhole cameras and are trained with narrow Field-of-View (FoV) images. In this sense, the perception capacity is severely limited to offer higher-level confidence for upstream navigation tasks. In this paper, we propose a network adaptation framework to achieve Panoramic Annular Semantic Segmentation (PASS), which allows the re-use of conventional pinhole-view image datasets, enabling modern segmentation networks to comfortably adapt to panoramic images. Specifically, we adapt our proposed SwaftNet to enhance the sensitivity to details by implementing attention-based lateral connections between the detail-critical encoder layers and the context-critical decoder layers. We benchmark the performance of efficient segmenters on panoramic segmentation with our extended PASS dataset, demonstrating that the proposed real-time SwaftNet outperforms state-of-the-art efficient networks. Furthermore, we assess real-world performance when deploying the Detail-Sensitive PASS (DS-PASS) system on a mobile robot and an instrumented vehicle, as well as the benefit of panoramic semantics for visual odometry, showing the robustness and potential to support diverse navigational applications.

preprint2020arXiv

Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games

In Markov games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning (RL) agents, because the opponents can evolve their policies concurrently. This increases the complexity of the learning task and slows down the learning speed of the RL agents. This paper proposes efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm that can efficiently learn explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model with an eXtended classifier system (XCS) in zero-sum Markov games. A neural network is used to model the opponent from their behaviors and the corresponding policy is inferred for action selection and rule evolution. In cases of multiple heuristic policies, we introduce the concept of Pareto optimality for action selection. Besides, taking advantage of the condition representation and matching mechanism of XCS, the heuristic policies and the opponent model can provide guidance for situations with similar feature representation. Furthermore, we introduce an accuracy-based eligibility trace mechanism to speed up rule evolution, i.e., classifiers that can match the historical traces are reinforced according to their accuracy. We demonstrate the advantages of the proposed algorithm over several benchmark algorithms in soccer and thief-and-hunter scenarios.
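The Pareto-optimality step mentioned above can be illustrated with a minimal sketch. It assumes each heuristic policy assigns a numeric score to every action and that higher is better; the action names, the scores, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every heuristic and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_actions(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the actions whose score vectors are not dominated by any other action."""
    return [
        action
        for action, s in scores.items()
        if not any(dominates(other, s) for name, other in scores.items() if name != action)
    ]


# Hypothetical scores from two heuristic policies (one tuple entry per heuristic).
scores = {
    "pass": (0.8, 0.2),
    "shoot": (0.5, 0.9),
    "dribble": (0.4, 0.1),
    "hold": (0.8, 0.1),
}
pareto_actions = pareto_optimal_actions(scores)
print(pareto_actions)
```

Here "dribble" and "hold" are both dominated by "pass", so only the non-dominated actions remain as candidates; how the final action is chosen among them is left open in this sketch.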

preprint2020arXiv

Electronic Evolution from the Parent Mott Insulator to a Superconductor in Lightly Hole-Doped Bi2Sr2CaCu2O8+delta

High temperature superconductivity in cuprates is realized by doping the Mott insulator with charge carriers. A central issue is how such an insulating state can evolve into a conducting or superconducting state when charge carriers are introduced. Here, by in situ vacuum annealing and Rb deposition on the Bi2Sr2Ca0.6Dy0.4Cu2O8+delta (Bi2212) sample surface to push its doping level continuously from deeply underdoped (Tc=25 K, doping level p ~ 0.066) to the near zero doping parent Mott insulator, angle-resolved photoemission spectroscopy measurements are carried out to observe the detailed electronic structure evolution in the lightly hole-doped region for the first time. Our results indicate that the chemical potential lies at about 1 eV above the charge transfer band for the parent state at zero doping, which is quite close to the upper Hubbard band. With increasing hole doping, the chemical potential moves continuously towards the charge transfer band and the band structure evolution exhibits a rigid band shift-like behavior. When the chemical potential approaches the charge transfer band at a doping level of ~0.05, the nodal spectral weight near the Fermi level increases, followed by the emergence of the coherent quasiparticle peak and the insulator-superconductor transition. Our observations provide key insights for understanding the insulator-superconductor transition in doping the parent cuprate compound and for establishing related theories.

preprint2020arXiv

Fifth-degree elastic potential for predictive stress-strain relations and elastic instabilities under large strain and complex loading in Si

Materials under complex loading develop large strains and often transition via an elastic instability, as observed in both simple and complex systems. Here, we describe Si I under large strain, in terms of Lagrangian strain, by a $5^{th}$-order elastic potential found by minimizing error relative to density functional theory (DFT) results. The Cauchy stress-Lagrangian strain curves for arbitrary complex loadings are in excellent correspondence with DFT results, including elastic instability driving Si I$\rightarrow$II phase transformation (PT) and the shear instabilities. PT conditions for Si I$\rightarrow$II under the action of cubic axial stresses are linear in Cauchy stresses, in agreement with DFT predictions. Such an elastic potential permits study of elastic instabilities and orientational dependence leading to different PTs, slip, twinning, or fracture, providing a fundamental basis for continuum simulations of crystal behavior under extreme loading.

preprint2020arXiv

Group Ensemble: Learning an Ensemble of ConvNets in a single ConvNet

Ensemble learning is a general technique to improve accuracy in machine learning. However, the heavy computation of a ConvNets ensemble limits its usage in deep learning. In this paper, we present Group Ensemble Network (GENet), an architecture incorporating an ensemble of ConvNets in a single ConvNet. Through a shared-base and multi-head structure, GENet is divided into several groups to make explicit ensemble learning possible in a single ConvNet. Owing to group convolution and the shared-base, GENet can fully leverage the advantage of explicit ensemble learning while retaining the same computation as a single ConvNet. Additionally, we present Group Averaging, Group Wagging and Group Boosting as three different strategies to aggregate these ensemble members. Finally, GENet outperforms larger single networks, standard ensembles of smaller networks, and other recent state-of-the-art methods on CIFAR and ImageNet. Specifically, group ensemble reduces the top-1 error by 1.83% for ResNeXt-50 on ImageNet. We also demonstrate its effectiveness on action recognition and object detection tasks.

preprint2020arXiv

Label-Aware Graph Convolutional Networks

Recent advances in Graph Convolutional Networks (GCNs) have led to state-of-the-art performance on various graph-related tasks. However, most existing GCN models do not explicitly identify whether all the aggregated neighbors are valuable to the learning tasks, which may harm the learning performance. In this paper, we consider the problem of node classification and propose the Label-Aware Graph Convolutional Network (LAGCN) framework which can directly identify valuable neighbors to enhance the performance of existing GCN models. Our contribution is three-fold. First, we propose a label-aware edge classifier that can filter distracting neighbors and add valuable neighbors for each node to refine the original graph into a label-aware (LA) graph. Existing GCN models can directly learn from the LA graph to improve the performance without changing their model architectures. Second, we introduce the concept of positive ratio to evaluate the density of valuable neighbors in the LA graph. Theoretical analysis reveals that using the edge classifier to increase the positive ratio can improve the learning performance of existing GCN models. Third, we conduct extensive node classification experiments on benchmark datasets. The results verify that LAGCN can improve the performance of existing GCN models considerably, in terms of node classification.

preprint2020arXiv

Low-complexity Point Cloud Filtering for LiDAR by PCA-based Dimension Reduction

Signals emitted by LiDAR sensors are often degraded during transmission by rain, fog, dust, atmospheric particles, scattering of light and other factors, causing noise in point cloud images. To address this problem, this paper develops a new noise reduction method to filter LiDAR point clouds, i.e. an adaptive clustering method based on principal component analysis (PCA). Unlike traditional filtering methods that directly process three-dimensional (3D) point cloud data, the proposed method uses dimension reduction to generate two-dimensional (2D) data by extracting the first and second principal components of the original data with little information loss. In the 2D space spanned by the two principal components, the generated 2D data are clustered for noise reduction before being restored into 3D. Through dimension reduction and the clustering of the generated 2D data, this method achieves low computational complexity, effectively removing noise while retaining details of environmental features. Compared with traditional filtering algorithms, the proposed method has higher precision and recall. Experimental results show an F-score as high as 0.92 with complexity reduced by 50% compared with the traditional density-based clustering method.

preprint2020arXiv

Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

Recently, neural architecture search (NAS) methods have attracted much attention and outperformed manually designed architectures on a few high-level vision tasks. In this paper, we propose HiNAS (Hierarchical NAS), an effort towards employing NAS to automatically design effective neural network architectures for image denoising. HiNAS adopts gradient-based search strategies and employs operations with adaptive receptive fields to build a flexible hierarchical search space. During the search stage, HiNAS shares cells across different feature levels to save memory and employs an early stopping strategy to avoid the collapse issue in NAS and considerably accelerate the search. The proposed HiNAS is both memory and computation efficient, taking only about 4.5 hours to search using a single GPU. We evaluate the effectiveness of our proposed HiNAS on two different datasets, namely an additive white Gaussian noise dataset BSD500, and a realistic noise dataset SIM1800. Experimental results show that the architecture found by HiNAS has fewer parameters and enjoys a faster inference speed, while achieving highly competitive performance compared with state-of-the-art methods. We also present analysis on the architectures found by NAS. HiNAS also shows good performance on experiments for image de-raining.

preprint2020arXiv

Multi-Objective Vehicle Rebalancing for Ridehailing System using a Reinforcement Learning Approach

The problem of designing a rebalancing algorithm for a large-scale ridehailing system with asymmetric demand is considered here. We pose the rebalancing problem within a semi Markov decision problem (SMDP) framework with closed queues of vehicles serving stationary, but asymmetric demand, over a large city with multiple nodes (representing neighborhoods). We assume that the passengers queue up at every node until they are matched with a vehicle. The goal of the SMDP is to minimize a convex combination of the waiting time of the passengers and the total empty vehicle miles traveled. The resulting SMDP appears to be difficult to solve for closed-form expression for the rebalancing strategy. As a result, we use a deep reinforcement learning algorithm to determine the approximately optimal solution to the SMDP. The trained policy is compared with other well-known algorithms for rebalancing, which are designed to address other objectives (such as to minimize demand drop probability) for the ridehailing problem.
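As a rough illustration of the objective described above, the convex combination can be written as follows. The symbols are assumptions introduced here, not the paper's notation: $W$ denotes passenger waiting time, $M$ denotes total empty vehicle miles traveled, $\pi$ is the rebalancing policy, and $\alpha$ is the trade-off weight.

```latex
\min_{\pi} \; J(\pi) \;=\; \alpha \, \mathbb{E}_{\pi}[W] \;+\; (1-\alpha) \, \mathbb{E}_{\pi}[M],
\qquad \alpha \in [0, 1].
```

Under this reading, $\alpha = 1$ optimizes waiting time alone and $\alpha = 0$ optimizes empty miles alone; intermediate values trade one objective against the other.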

preprint2020arXiv

NAS-FCOS: Fast Neural Architecture Search for Object Detection

The success of deep neural networks relies on significant architecture engineering. Recently neural architecture search (NAS) has emerged as a promise to greatly reduce manual effort in network design by automatically searching for optimal architectures, although typically such algorithms need an excessive amount of computational resources, e.g., a few thousand GPU-days. To date, on challenging vision tasks such as object detection, NAS, especially fast versions of NAS, is less studied. Here we propose to search for the decoder structure of object detectors with search efficiency being taken into consideration. To be more specific, we aim to efficiently search for the feature pyramid network (FPN) as well as the prediction head of a simple anchor-free object detector, namely FCOS, using a tailored reinforcement learning paradigm. With carefully designed search space, search algorithms and strategies for evaluating network quality, we are able to efficiently search a top-performing detection architecture within 4 days using 8 V100 GPUs. The discovered architecture surpasses state-of-the-art object detection models (such as Faster R-CNN, RetinaNet and FCOS) by 1.5 to 3.5 points in AP on the COCO dataset, with comparable computation complexity and memory footprint, demonstrating the efficacy of the proposed NAS for object detection.

preprint2020arXiv

New Constructions of Subspace Codes Using Subsets of MRD codes in Several Blocks

A basic problem for the constant dimension subspace coding is to determine the maximal possible size $A_q(n,d,k)$ of a set of $k$-dimensional subspaces in $\mathbf{F}_q^n$ such that the subspace distance satisfies $d(U,V) \geq d$ for any two different subspaces $U$ and $V$ in this set. We present two new constructions of constant dimension subspace codes using subsets of maximal rank-distance (MRD) codes in several blocks. This method is firstly applied to the linkage construction and secondly to arbitrary number of blocks of lifting MRD codes. In these two constructions, subsets of MRD codes with bounded ranks play an essential role. The Delsarte theorem of the rank distribution of MRD codes is an important ingredient to count codewords in our constructed constant dimension subspace codes. We give many new lower bounds for $A_q(n,d,k)$. More than 110 new constant dimension subspace codes better than previously best known codes are constructed.

preprint2020arXiv

Parameter-controlled inserting constructions of constant dimension subspace codes

A basic problem in constant dimension subspace coding is to determine the maximal possible size ${\bf A}_q(n,d,k)$ of a set of $k$-dimensional subspaces in ${\bf F}_q^n$ such that the subspace distance satisfies $\operatorname{dis}(U,V)=2k-2\dim(U \cap V) \geq d$ for any two different $k$-dimensional subspaces $U$ and $V$ in this set. In this paper we propose new parameter-controlled inserting constructions of constant dimension subspace codes. These inserting constructions are flexible because they are controlled by parameters. Several new lower bounds that improve on all previous constructive lower bounds can be derived from our flexible inserting constructions. $141$ new constant dimension subspace codes of distances $4,6,8$ better than previously best known codes are constructed.
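The subspace distance defined above can be computed directly for small cases. The following is a minimal sketch restricted to $q = 2$, with each vector of ${\bf F}_2^n$ encoded as an integer bitmask; it uses the identity $\dim(U \cap V) = \dim U + \dim V - \dim(U + V)$. The function names and the example subspaces are illustrative assumptions, not part of the paper.

```python
from typing import Iterable, List


def gf2_rank(vectors: Iterable[int]) -> int:
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis: List[int] = []
    for v in vectors:
        # XOR with a basis vector only when that clears its leading bit.
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def subspace_distance(U: List[int], V: List[int]) -> int:
    """dis(U, V) = 2k - 2 dim(U ∩ V) for two k-dimensional subspaces given by spanning sets."""
    k = gf2_rank(U)
    if gf2_rank(V) != k:
        raise ValueError("both subspaces must have the same dimension k")
    dim_sum = gf2_rank(list(U) + list(V))   # dim(U + V)
    dim_intersection = 2 * k - dim_sum      # dim U + dim V - dim(U + V)
    return 2 * k - 2 * dim_intersection


# Three 2-dimensional subspaces of F_2^4, each given by a basis.
U = [0b1000, 0b0100]
V = [0b0100, 0b0010]
W = [0b0010, 0b0001]
print(subspace_distance(U, V), subspace_distance(U, W))
```

In this example $U$ and $V$ share a one-dimensional intersection, while $U$ and $W$ intersect trivially, so $\{U, W\}$ is a (tiny) constant dimension code with $n=4$, $k=2$ and minimum distance $4$.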

preprint2020arXiv

Planning to Build Soma Blocks Using a Dual-arm Robot

This paper presents a planner that can automatically find an optimal assembly sequence for a dual-arm robot to assemble the soma blocks. The planner uses the mesh model of objects and the final state of the assembly to generate all possible assembly sequences and evaluates them to find the optimal one by considering the stability, graspability, assemblability, as well as the need for a second arm. In particular, the need for a second arm is considered when supports from worktables and other workpieces are not enough to produce a stable assembly. The planner will refer to an assisting grasp to additionally hold and support the unstable components so that the robot can further assemble new workpieces and finally reach a stable state. The output of the planner is the optimal assembly order, candidate grasps, assembly directions, and the assisting grasps, if any. The output of the planner can be used to guide a dual-arm robot to perform the assembly task. The planner is verified in both simulations and real-world executions.

preprint2020arXiv

Plasmon-field-induced Metastable States in the Wetting Layer: Detected by the Fluorescence Decay Time of InAs/GaAs Single Quantum Dots

We report a new way to slow down the spontaneous emission rate of excitons in the wetting layer (WL) through radiative field coupling between the exciton emissions and the dipole field of metal islands. As a result, a long-lifetime decay process is detected in the emission of InAs/GaAs single quantum dots (QDs). It is found that when the separation distance from the WL (QD layer) to the metal islands is around 20 nm and the islands have an average size of approximately 50 nm, the QD lifetime may change from approximately 1 to 160 ns. The corresponding second-order autocorrelation function $g^{(2)}(\tau)$ changes from antibunching to combined bunching and antibunching characteristics due to the existence of long-lived metastable states in the WL. This phenomenon can be understood by treating the metal islands as many dipole oscillators in the dipole approximation, which may cause destructive interference between the exciton dipole field and the induced dipole field of metal islands.

preprint2020arXiv

Progressive Object Transfer Detection

Recent development of object detection mainly depends on deep learning with large-scale benchmarks. However, collecting such fully-annotated data is often difficult or expensive for real-world applications, which restricts the power of deep neural networks in practice. Alternatively, humans can detect new objects with little annotation burden, since humans often use prior knowledge to identify new objects with few elaborately-annotated examples, and subsequently generalize this capacity by exploiting objects from wild images. Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework. Specifically, we make three main contributions in this paper. First, POTD can leverage various object supervision of different domains effectively into a progressive detection procedure. Via such human-like learning, one can boost a target detection task with few annotations. Second, POTD consists of two delicate transfer stages, i.e., Low-Shot Transfer Detection (LSTD), and Weakly-Supervised Transfer Detection (WSTD). In LSTD, we distill the implicit object knowledge of the source detector to enhance the target detector with few annotations. It can effectively warm up WSTD later on. In WSTD, we design a recurrent object labelling mechanism for learning to annotate weakly-labeled images. More importantly, we exploit the reliable object supervision from LSTD, which can further enhance the robustness of the target detector in the WSTD stage. Finally, we perform extensive experiments on a number of challenging detection benchmarks with different settings. The results demonstrate that our POTD outperforms the recent state-of-the-art approaches.

preprint2020arXiv

Quantum Detection using Magnetic Avalanches in Single-Molecule Magnets

The detection of a single quantum of energy with high efficiency and low false positive rate is of considerable scientific interest, from serving as single quantum sensors of optical and infra-red photons to enabling the direct detection of low-mass dark matter. We report the first experimental demonstration of magnetic avalanches induced by scattering of quanta in single-molecule magnet (SMM) crystals made of Mn12-acetate, establishing the use of SMMs as particle detectors for the first time. While the current setup has an energy threshold in the MeV regime, our results motivate the exploration of a wide variety of SMMs whose properties could allow for detection of sub-eV energy depositions.

preprint2020arXiv

Research on the new form of higher-order generalized uncertainty principle in quantum system

This paper proposes a new high-order generalized uncertainty principle, which can modify the momentum operator and position operator simultaneously. Moreover, the new form of GUP is consistent with the viewpoint of the existence of the minimum length uncertainty and the maximum observable momentum proposed by the mainstream quantum gravity theory. By using the new GUP, the maximum localization state and position eigenfunction are discussed, and the corresponding conclusions are compared with the existing literature. The harmonic oscillator is further discussed at the end of this article as an example.

preprint2020arXiv

Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion

Accurate medical image segmentation commonly requires effective learning of the complementary information from multimodal data. However, in clinical practice, we often encounter the problem of missing imaging modalities. We tackle this challenge and propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities. Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code, which uniquely sticks to each modality, and the modality-invariant content code, which absorbs multimodal information for the segmentation task. With enhanced modality-invariance, the disentangled content code from each modality is fused into a shared representation which gains robustness to missing data. The fusion is achieved via a learning-based strategy to gate the contribution of different modalities at different locations. We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset. With competitive performance to the state-of-the-art approaches for full modality, our method achieves outstanding robustness under various missing-modality situations, significantly exceeding the state-of-the-art method by over 16% on average for Dice on whole tumor segmentation.

preprint2020arXiv

SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search

The $k$-Nearest Neighbor Search ($k$-NNS) is the backbone of several cloud-based services such as recommender systems, face recognition, and database search on text and images. In these services, the client sends the query to the cloud server and receives the response, in which case the query and response are revealed to the service provider. Such data disclosures are unacceptable in several scenarios due to the sensitivity of data and/or privacy laws. In this paper, we introduce SANNS, a system for secure $k$-NNS that keeps the client's query and the search result confidential. SANNS comprises two protocols: an optimized linear scan and a protocol based on a novel sublinear time clustering-based algorithm. We prove the security of both protocols in the standard semi-honest model. The protocols are built upon several state-of-the-art cryptographic primitives such as lattice-based additively homomorphic encryption, distributed oblivious RAM, and garbled circuits. We provide several contributions to each of these primitives which are applicable to other secure computation tasks. Both of our protocols rely on a new circuit for the approximate top-$k$ selection from $n$ numbers that is built from $O(n + k^2)$ comparators. We have implemented our proposed system and performed extensive experiments on four datasets in two different computation environments, demonstrating more than $18$-$31\times$ faster response time compared to optimally implemented protocols from the prior work. Moreover, SANNS is the first work that scales to the database of 10 million entries, pushing the limit by more than two orders of magnitude.

preprint2020arXiv

Single-Layer Graph Convolutional Networks For Recommendation

Graph Convolutional Networks (GCNs) and their variants have received significant attention and achieved state-of-the-art performance on various recommendation tasks. However, many existing GCN models tend to perform recursive aggregations among all related nodes, which incurs a severe computational burden. Moreover, they favor multi-layer architectures in conjunction with complicated modeling techniques. Though effective, the excessive number of model parameters largely hinders their application in real-world recommender systems. To this end, in this paper, we propose the single-layer GCN model which is able to achieve superior performance along with remarkably less complexity compared with existing models. Our main contribution is three-fold. First, we propose a principled similarity metric named distribution-aware similarity (DA similarity), which can guide the neighbor sampling process and evaluate the quality of the input graph explicitly. We also prove that DA similarity has a positive correlation with the final performance, through both theoretical analysis and empirical simulations. Second, we propose a simplified GCN architecture which employs a single GCN layer to aggregate information from the neighbors filtered by DA similarity and then generates the node representations. Moreover, the aggregation step is a parameter-free operation, such that it can be done in a pre-processing manner to further reduce the training and inference costs. Third, we conduct extensive experiments on four datasets. The results verify that the proposed model outperforms existing GCN models considerably in terms of recommendation performance and yields up to a few orders of magnitude speedup in training.

preprint2020arXiv

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory developed here.

preprint2020arXiv

Stacking disorder in periodic minimal surfaces

We construct 1-parameter families of non-periodic embedded minimal surfaces of infinite genus in $T \times \mathbb{R}$, where $T$ denotes a flat 2-torus. Each of our families converges to a foliation of $T \times \mathbb{R}$ by $T$. These surfaces then lift to minimal surfaces in $\mathbb{R}^3$ that are periodic in horizontal directions but not periodic in the vertical direction. In the language of crystallography, our construction can be interpreted as disordered stacking of layers of periodically arranged catenoid necks. Our work is motivated by experimental observations of twinning defects in periodic minimal surfaces, which we reproduce as special cases of stacking disorder.

preprint2020arXiv

The Lang-Trotter Conjecture for products of non-CM elliptic curves

Inspired by the work of Lang-Trotter on the densities of primes with fixed Frobenius traces for elliptic curves defined over $\mathbb{Q}$ and by the subsequent generalization of Cojocaru-Davis-Silverberg-Stange to generic abelian varieties, we study the analogous question for abelian surfaces isogenous to products of non-CM elliptic curves over $\mathbb{Q}$. We formulate the corresponding conjectural asymptotic, provide upper bounds, and explicitly compute (when the elliptic curves lie outside a thin set) the arithmetically significant constants appearing in the asymptotic. This allows us to provide computational evidence for the conjecture.

preprint2020arXiv

The ram pressure stripped radio tails of galaxies in the Coma cluster

Previous studies have revealed a population of galaxies in galaxy clusters with ram pressure stripped (RPS) tails of gas and embedded young stars. We observed 1.4 GHz continuum and HI emission with the Very Large Array in its B-configuration in two fields of the Coma cluster to study the radio properties of RPS galaxies. The best continuum sensitivities in the two fields are 6 and 8 $\mu$Jy per 4'' beam respectively, which are 4 and 3 times deeper than those previously published. Radio continuum tails are found in 10 (8 are new) out of 20 RPS galaxies, unambiguously revealing the presence of relativistic electrons and magnetic fields in the stripped tails. Our results also hint that the tail has a steeper spectrum than the galaxy. The 1.4 GHz continuum in the tails is enhanced relative to their H$\alpha$ emission by a factor of $\sim$7 compared to the main bodies of the RPS galaxies. The 1.4 GHz continuum of the RPS galaxies is also enhanced relative to their IR emission by a factor of $\sim$2 compared to star-forming galaxies. The enhancement is likely related to ram pressure and turbulence in the tail. We furthermore present HI detections in three RPS galaxies and upper limits for the other RPS galaxies. The cold gas in D100's stripped tail is dominated by molecular gas, which is likely a consequence of the high ambient pressure. No evidence of radio emission associated with ultra-diffuse galaxies is found in our data.

preprint2020arXiv

TLFW: A Three-layer Framework in Wireless Rechargeable Sensor Network with a Mobile Base Station

Wireless sensor networks, as the basic support for the Internet of Things, have been widely adopted and applied. In intelligent agriculture, for example, sensor networks are used to obtain environmental data on crop growth. However, the difficulty of supplying power to wireless nodes has seriously hindered the application and development of the Internet of Things. To address this problem, low-power designs, sleep scheduling, and other energy-saving methods are applied on the nodes. Although these methods can prolong the working time of nodes, the nodes eventually fail once their energy is exhausted. Harvesting energy from solar power, wind power, and ambient wireless signals is another way to address the nodes' energy problem. However, these methods are affected by weather, environment, and other factors, and are unstable, which causes the nodes to work intermittently. In recent years, the development of wireless power transfer (WPT) has brought another solution to this problem. In this paper, a three-layer framework named TLFW is proposed for mobile station data collection in rechargeable wireless sensor networks, with the aim of keeping the nodes running indefinitely. TLFW comprises the sensor layer, the cluster head layer, and the mobile station layer, and it is designed to minimize the total energy consumption of the system. The simulation results show that the scheme reduces the energy consumption of the entire system compared with a Mobile Station in a Rechargeable Sensor Network (MSiRSN).

preprint2020arXiv

Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is very expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by the weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. For the semi-supervised segmentation problem, we enhance the effect of regularization for pixel-level predictions by introducing a transformation-consistent scheme, covering rotation and flipping, in our self-ensembling model. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared to state-of-the-art methods, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation.
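As a rough illustration of the idea in this abstract, the sketch below compares the prediction for a rotated and flipped input with the rotated and flipped prediction for the original input, then combines that consistency term with a supervised loss using a weight. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `transform`, `consistency_loss`, and `total_loss` are hypothetical, the mean squared difference is an assumed choice of consistency measure, and `predict` stands in for any segmentation network that maps a square 2D image to a same-sized prediction map.

```python
import numpy as np


def transform(x, k, flip):
    """Rotate a 2D array by k * 90 degrees and optionally flip it horizontally."""
    y = np.rot90(x, k=k, axes=(0, 1))
    if flip:
        y = np.flip(y, axis=1)
    return y


def consistency_loss(predict, image, k, flip):
    """Mean squared difference between predict(T(x)) and T(predict(x)).

    `predict` is a hypothetical stand-in for the network-in-training.
    The loss is zero when the prediction commutes with the transformation.
    """
    pred_of_transformed = predict(transform(image, k, flip))
    transformed_pred = transform(predict(image), k, flip)
    return float(np.mean((pred_of_transformed - transformed_pred) ** 2))


def total_loss(supervised_loss, unsupervised_loss, weight):
    """Weighted combination of the supervised and regularization terms."""
    return supervised_loss + weight * unsupervised_loss
```

A purely pixel-wise predictor commutes with rotation and flipping and therefore gives zero consistency loss, while a position-dependent predictor is penalized. No labels are needed for this term, which is why it can be evaluated on unlabeled images.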

preprint2020arXiv

Transport of Finite Size Self-Propelled Particles Confined in a 2D Zigzag Channel with Gaussian Colored Noise

The directional transport of finite-size self-propelled Brownian particles confined in a 2D zigzag channel with colored noise is investigated. The noises (parallel to the x-axis and the y-axis), the asymmetry parameter Δk, the ratio f (the ratio of the particle radius to the bottleneck half-width), and the self-propelled speed v0 jointly affect the particles. The average velocity of self-propelled particles differs significantly from that of passive particles. The average velocity exhibits complicated behavior with increasing self-propelled speed v0.

preprint2020arXiv

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance the domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms state-of-the-art domain adaptation approaches by a large margin.

preprint2020arXiv

Weakly Inscribed Polyhedra

We study convex polyhedra in $\mathbb{R}\mathbb{P}^3$ with all their vertices on a sphere. We do not require, in particular, that the polyhedra lie in the interior of the sphere, hence the term "weakly inscribed". Such polyhedra can be interpreted as ideal polyhedra, if we regard $\mathbb{R}\mathbb{P}^3$ as a combination of the hyperbolic space and the de Sitter space, with the sphere as the common ideal boundary. We have three main results: (1) the $1$-skeleta of weakly inscribed polyhedra are characterized in a purely combinatorial way, (2) the exterior dihedral angles are characterized by linear programming, and (3) we also describe the hyperbolic-de Sitter structure induced on the boundary of weakly inscribed polyhedra.

preprint2020arXiv

Yet Another Intermediate-Level Attack

The transferability of adversarial examples across deep neural network (DNN) models is the crux of a spectrum of black-box attacks. In this paper, we propose a novel method to enhance the black-box transferability of baseline adversarial examples. By establishing a linear mapping of the intermediate-level discrepancies (between a set of adversarial inputs and their benign counterparts) for predicting the evoked adversarial loss, we aim to take full advantage of the optimization procedure of multi-step baseline attacks. We conducted extensive experiments to verify the effectiveness of our method on CIFAR-100 and ImageNet. Experimental results demonstrate that it considerably outperforms previous state-of-the-art methods. Our code is at https://github.com/qizhangli/ila-plus-plus.
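The linear mapping mentioned in this abstract can be pictured with a small sketch: given one intermediate-level discrepancy vector per step of a baseline attack and the adversarial loss observed at that step, fit a weight vector that predicts the loss from the discrepancy. The sketch below fits it with ridge regression, which is an assumption made here for illustration and is not claimed to be the procedure in the linked repository. The names `fit_linear_map` and `predicted_loss` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def fit_linear_map(discrepancies, losses, ridge=1e-3):
    """Fit w so that discrepancies @ w approximates losses (ridge regression).

    discrepancies: array of shape (steps, features), one flattened
        intermediate-level discrepancy per attack step (assumed layout).
    losses: array of shape (steps,), the adversarial loss at each step.
    ridge: assumed regularization strength that keeps the solve well-posed.
    """
    H = np.asarray(discrepancies, dtype=float)
    r = np.asarray(losses, dtype=float)
    d = H.shape[1]
    # Closed-form ridge solution: (H^T H + ridge * I) w = H^T r
    return np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ r)


def predicted_loss(w, discrepancy):
    """Predicted adversarial loss for a single discrepancy vector."""
    return float(np.dot(discrepancy, w))
```

Under this reading, a refined attack would seek a perturbation whose intermediate-level discrepancy gives a large `predicted_loss`, so that information from every step of the baseline attack is used rather than only its final result.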

preprint2019arXiv

Sending-or-Not-Sending with Independent Lasers: Secure Twin-Field Quantum Key Distribution Over 509 km

Twin-field quantum key distribution (TF-QKD) promises high key rates at long distance to beat the rate-distance limit. Here, applying the sending-or-not-sending TF-QKD protocol, we experimentally demonstrate a secure key distribution breaking the absolute key rate limit of repeaterless QKD over 509 km, 408 km ultra-low loss optical fibre and 350 km standard optical fibre. Two independent lasers are used as the source, with a remote frequency locking technique over 500 km fibre distance; practical optical fibres are used as the optical path with appropriate noise filtering; and finite key effects are considered in the key rate analysis. The secure key rates obtained at different distances are more than 5 times higher than the conditional limit of repeaterless QKD, a bound value assuming the same detection loss in the comparison. The achieved secure key rate is also higher than that of a traditional QKD protocol running with a perfect repeaterless QKD device, even with an infinite number of sent pulses. Our result shows that the protocol and technologies applied in this experiment enable TF-QKD to achieve a high secure key rate at long distribution distance, and hence are practically useful for field implementation of intercity QKD.

preprint2018arXiv

SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection

Vision-based vehicle detection approaches have achieved incredible success in recent years with the development of deep convolutional neural networks (CNNs). However, existing CNN-based algorithms suffer from the problem that the convolutional features are scale-sensitive in the object detection task, while traffic images and videos commonly contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity and reveal two key issues: 1) existing RoI pooling destroys the structure of small-scale objects, and 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small-scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but a prominent improvement in detection accuracy. The proposed techniques can be combined with any deep network architecture while keeping it trainable end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.