Trust snapshot

Quick read

Trust 21 · Emerging · Verification L1 · Unclaimed author
113 works
0 followers
31 topics
4 close collaborators

Published work

113 published item(s)

preprint · 2026 · arXiv

FollowTable: A Benchmark for Instruction-Following Table Retrieval

Table Retrieval (TR) has traditionally been formulated as an ad-hoc retrieval problem, where relevance is primarily determined by topical semantic similarity. With the growing adoption of LLM-based agentic systems, access to structured data is increasingly instruction-driven, where relevance is conditional on explicit content and schema constraints rather than topical similarity alone. We therefore formalize Instruction-Following Table Retrieval (IFTR), a new task that requires models to jointly satisfy topical relevance and fine-grained instruction constraints. We identify two core challenges in IFTR: (i) sensitivity to content scope, such as inclusion and exclusion constraints, and (ii) awareness of schema-grounded requirements, including column semantics and representation granularity. Both capabilities are largely absent in existing retrievers. To support systematic evaluation, we introduce FollowTable, the first large-scale benchmark for IFTR, constructed via a taxonomy-driven annotation pipeline. We further propose a new metric, termed the Instruction Responsiveness Score, to evaluate whether retrieval rankings consistently adapt to user instructions relative to a topic-only baseline. Our results indicate that existing retrieval models struggle to follow fine-grained instructions over tabular data. In particular, they exhibit systematic biases toward surface-level semantic cues and remain limited in handling schema-grounded constraints, highlighting substantial room for future improvements.

preprint · 2026 · arXiv

NEWTON: Agentic Planning for Physically Grounded Video Generation

Video generation models produce visually compelling results but systematically violate physical commonsense: on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are a lossy compression of the physical world that omits the parameters fully determining the dynamics, and no amount of model scaling can recover what was never specified. From this diagnosis we derive three properties that physics conditioning must satisfy (sufficiency, dynamism, and verifiability) and show that no existing approach satisfies all three. We present NEWTON, in which video generation is demoted from the system output to one action inside an agent's toolbox: a learned planner orchestrates physics-aware tools (keyframe generation, scientific computation, prompt refinement) to construct rich conditioning, and a verifier closes the loop for iterative re-planning. The planner is the sole trainable component, optimized on-policy via Flow-GRPO inside the live multi-turn loop. On VideoPhy-2, NEWTON improves joint accuracy from 21.4% to 29.7% on LTX-Video and from 30.7% to 37.4% on Veo-3.1, without modifying either generator. Project page: https://Newton026.github.io/newton

preprint · 2026 · arXiv

RAGR: Review-Augmented Generative Recommendation

Sequential recommendation (SR) is traditionally formulated as next-item prediction over a chronological sequence of interacted items. Although recent generative recommendation (GR) methods introduce new machinery, such as semantic IDs, autoregressive decoding, and unified token spaces, they largely inherit the same item-only modeling assumption. We argue that this design constitutes a structural bottleneck, because user decision-making is not purely behavioral: while item interactions reveal what users choose, review feedback often explains why they choose it by exposing latent evaluative factors. Motivated by this observation, we propose Review-Augmented Generative Recommendation (RAGR), a novel GR framework that incorporates review feedback directly into the generative user sequence rather than treating reviews as auxiliary side information. Specifically, RAGR introduces a Review-Augmented User Sequence Modeling mechanism that interleaves item semantic IDs and review semantic IDs in chronological order to construct a mixed behavioral-semantic sequence, enabling review signals to participate directly in autoregressive next-token generation. To preserve the recommendation objective, we further introduce an Item-Centric Task Generation Alignment strategy based on direct preference optimization (DPO), which encourages the model to favor item tokens over review tokens at prediction positions. Experiments on three real-world datasets show that RAGR yields consistent and significant gains over strong GR backbones across all metrics. Our code and data are available at https://github.com/Zhang-Yingyi/TKDE_RAGR.
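
The interleaving step described above can be illustrated with a small sketch. It assumes each interaction already carries an item semantic ID and an optional review semantic ID as tuples of codebook indices, and that review tokens are placed right after the item they describe; the `Interaction` class, the token spelling, and that ordering are hypothetical choices, not RAGR's released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interaction:
    timestamp: int
    item_sid: Tuple[int, ...]    # hypothetical semantic ID (codebook indices) of the item
    review_sid: Tuple[int, ...]  # hypothetical semantic ID of the user's review; empty if none


def build_mixed_sequence(history: List[Interaction]) -> List[str]:
    """Interleave item and review semantic-ID tokens in chronological order."""
    tokens: List[str] = []
    for event in sorted(history, key=lambda e: e.timestamp):
        # Item tokens first: these are the positions the model must predict.
        tokens.extend(f"<i{level}_{code}>" for level, code in enumerate(event.item_sid))
        # Review tokens follow the item they describe, when a review exists.
        tokens.extend(f"<r{level}_{code}>" for level, code in enumerate(event.review_sid))
    return tokens
```

With `history = [Interaction(2, (3, 1), (7,)), Interaction(1, (5, 2), ())]`, the function returns `['<i0_5>', '<i1_2>', '<i0_3>', '<i1_1>', '<r0_7>']`. Distinct `<i…>` and `<r…>` prefixes are one simple way to keep item and review tokens separable, which an item-versus-review preference objective needs.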

preprint · 2025 · arXiv

Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effective for simple actions, this System 1 approach faces a fundamental theoretical bottleneck we identify as the Semantic-Kinematic Impedance Mismatch: the inherent difficulty of grounding semantically dense, discrete linguistic intent into kinematically dense, high-frequency motion data in a single shot. In this paper, we argue that the solution lies in an architectural shift towards Latent System 2 Reasoning. Drawing inspiration from Hierarchical Motor Control in cognitive science, we propose Latent Motion Reasoning (LMR), which reformulates generation as a two-stage Think-then-Act decision process. Central to LMR is a novel Dual-Granularity Tokenizer that disentangles motion into two distinct manifolds: a compressed, semantically rich Reasoning Latent for planning global topology, and a high-frequency Execution Latent for preserving physical fidelity. By forcing the model to autoregressively reason (plan the coarse trajectory) before it moves (instantiates the frames), we effectively bridge the ineffability gap between language and physics. We demonstrate LMR's versatility by implementing it on two representative baselines: T2M-GPT (discrete) and MotionStreamer (continuous). Extensive experiments show that LMR yields non-trivial improvements in both semantic alignment and physical plausibility, validating that the optimal substrate for motion planning is not natural language but a learned, motion-aligned concept space. Code and demos are available at https://chenhaoqcdyq.github.io/LMR/.

preprint · 2024 · arXiv

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation

Recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to their superior performance. Most prior works adopt the unified DETR framework to generate segmentation masks in a query-to-instance manner. In this work, we integrate the strengths of leading RVOS models to build an effective paradigm. We first obtain binary mask sequences from the RVOS models. To improve the consistency and quality of the masks, we propose a Two-Stage Multi-Model Fusion strategy. Each stage ensembles RVOS models according to their framework design and training strategy, and leverages different video object segmentation (VOS) models to enhance mask coherence through an object propagation mechanism. Our method achieves 75.7% J&F on the Ref-Youtube-VOS validation set and 70% J&F on the test set, ranking 1st in track 3 of the 5th Large-scale Video Object Segmentation Challenge (ICCV 2023). Code is available at https://github.com/RobertLuo1/iccv2023_RVOS_Challenge.
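
J&F is the mean of region similarity J (mask intersection-over-union) and boundary accuracy F. The following is a minimal sketch of a generic pixel-wise voting ensemble over binary masks together with the J half of the metric, assuming every model outputs masks of shape (T, H, W). It is not the paper's Two-Stage Multi-Model Fusion, and the boundary F-measure is omitted.

```python
import numpy as np


def fuse_masks(masks: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pixel-wise vote over binary masks stacked as (num_models, T, H, W)."""
    # Fraction of models that mark each pixel as foreground.
    votes = masks.astype(np.float64).mean(axis=0)
    return votes > threshold


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J measure: intersection over union of two binary masks."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)
```

A strict majority vote keeps a pixel only when more than half of the models agree, which suppresses isolated errors from any single model at the cost of thin structures that only one model recovers.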

preprint · 2024 · arXiv

FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning

Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the number of communication rounds. However, communicating Hessian matrices is often infeasible due to their quadratic communication complexity. In this paper, we introduce a novel approach that tackles this issue while still achieving fast convergence rates. Our proposed method, Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian. To enhance communication efficiency, we reduce the sketch size to match the effective dimension of the Hessian matrix. We provide a convergence analysis for the federated Newton sketch approaches based on statistical learning theory. Specifically, our approaches achieve super-linear convergence rates with respect to the communication rounds for the first time. We validate the effectiveness of our algorithms through various experiments, whose results are consistent with our theoretical findings.
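
To make "communicating the sketched square-root Hessian" concrete, here is a minimal sketch for federated ridge regression, where the data Hessian is X^T X / n and a square root is X / sqrt(n). The Gaussian sketch, the fixed `sketch_size` (rather than one matched to the effective dimension), the exact gradient aggregation, and all function names are assumptions for illustration, not FedNS as published.

```python
import numpy as np


def client_message(X_k, y_k, w, n_total, sketch_size, rng):
    """One client's upload: a sketched square-root Hessian and its local gradient.

    Assumes the global loss (1 / 2n) * ||X w - y||^2 + (lam / 2) * ||w||^2,
    whose data Hessian is X^T X / n = A^T A with square root A = X / sqrt(n).
    """
    n_k = X_k.shape[0]
    # Gaussian sketch with E[S^T S] = I, so (S A)^T (S A) is an unbiased estimate of A^T A.
    S = rng.standard_normal((sketch_size, n_k)) / np.sqrt(sketch_size)
    # Shape (sketch_size, d): smaller than the (d, d) Hessian when sketch_size < d.
    sqrt_hessian = S @ X_k / np.sqrt(n_total)
    grad = X_k.T @ (X_k @ w - y_k) / n_total  # client's share of the data gradient
    return sqrt_hessian, grad


def fed_newton_sketch(clients, d, lam=1e-2, sketch_size=50, rounds=10, seed=0):
    """Server loop: sum sketched Hessians and exact gradients, then take a Newton step."""
    rng = np.random.default_rng(seed)
    n_total = sum(X.shape[0] for X, _ in clients)
    w = np.zeros(d)
    for _ in range(rounds):
        H = lam * np.eye(d)  # the regularizer's Hessian is known to the server
        g = lam * w          # the regularizer's gradient
        for X_k, y_k in clients:
            B, grad_k = client_message(X_k, y_k, w, n_total, sketch_size, rng)
            H += B.T @ B
            g += grad_k
        w = w - np.linalg.solve(H, g)
    return w
```

Because the gradient is exact and only the Hessian is approximated, the fixed point of the iteration is still the exact ridge solution; the sketch quality only affects how fast the iterates get there.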

preprint · 2024 · arXiv

Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images

This work addresses the joint segmentation of retinal edema lesions from OCT images, a task complicated by pathological features such as blurred boundaries, severe scale differences between symptoms, and background noise interference, and aims to make the segmentation results more reliable. We propose a novel reliable multi-scale wavelet-enhanced transformer network, which provides accurate segmentation results together with a reliability assessment. Specifically, to improve the model's ability to learn the complex pathological features of retinal edema lesions in OCT images, we develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network with a newly designed multi-scale transformer module. Meanwhile, to make the segmentation results more reliable, we introduce a novel uncertainty segmentation head based on subjective logic evidential theory, which produces the final segmentation results together with a corresponding overall uncertainty score map. We conduct comprehensive experiments on the public AI-Challenge 2018 database for retinal edema lesion segmentation, and the results show that our method achieves better segmentation accuracy with a high degree of reliability compared with other state-of-the-art segmentation approaches. The code will be released at https://github.com/LooKing9218/ReliableRESeg.
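
Subjective-logic evidential heads are commonly built on the Dirichlet formulation: non-negative evidence e_k per class gives alpha_k = e_k + 1, strength S = sum of alpha_k, beliefs b_k = e_k / S and uncertainty u = K / S, so that the beliefs and u sum to one. The sketch below computes a per-pixel uncertainty map that way; the softplus evidence activation and the function name are assumptions, and the paper's head may differ in detail.

```python
import numpy as np


def evidential_uncertainty(logits: np.ndarray):
    """Per-pixel beliefs (K, H, W) and uncertainty map (H, W) from logits of shape (K, H, W)."""
    num_classes = logits.shape[0]
    # Non-negative evidence via softplus, log(1 + exp(x)); one common choice (assumed here).
    evidence = np.logaddexp(0.0, logits)
    alpha = evidence + 1.0                 # Dirichlet parameters
    strength = alpha.sum(axis=0)           # Dirichlet strength S
    belief = evidence / strength           # b_k = e_k / S
    uncertainty = num_classes / strength   # u = K / S
    return belief, uncertainty
```

Pixels where the network produces little evidence for any class get u close to 1, which is what makes the map useful for flagging unreliable regions such as blurred lesion boundaries.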

preprint · 2024 · arXiv

Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, so their application to UAD is limited by the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) that uses a concise key-prompt-knowledge memory bank to guide task-invariant 'anomaly' model predictions with task-specific 'normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart to obtain general feature representations. We conduct comprehensive experiments and establish a benchmark for unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than existing anomaly detection methods, even when they use rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.
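
The key-prompt-knowledge memory bank can be pictured as a task lookup followed by nearest-neighbour scoring. In this minimal sketch, the task is chosen by cosine similarity between an image feature and stored task keys, and each patch is scored by its distance to the closest stored 'normal' feature. `TaskMemory`, `select_task`, and `anomaly_scores` are hypothetical names, the prompt is carried but unused, and nearest-neighbour scoring is a common UAD choice swapped in here rather than UCAD's exact procedure.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class TaskMemory:
    key: np.ndarray        # hypothetical task key, e.g. a summary feature of the task's images
    prompt: np.ndarray     # hypothetical learned prompt for the task (not used below)
    knowledge: np.ndarray  # hypothetical bank of 'normal' patch features, shape (N, D)


def select_task(query: np.ndarray, bank: List[TaskMemory]) -> int:
    """Pick the task whose key has the highest cosine similarity to the query feature."""
    keys = np.stack([m.key for m in bank])
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-12)
    return int(np.argmax(sims))


def anomaly_scores(patches: np.ndarray, memory: TaskMemory) -> np.ndarray:
    """Score each patch (P, D) by its distance to the nearest stored 'normal' feature."""
    dists = np.linalg.norm(patches[:, None, :] - memory.knowledge[None, :, :], axis=-1)
    return dists.min(axis=1)
```

Selecting the task at test time is what lets a single model serve every class seen so far without being told which class an image belongs to.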

preprint · 2023 · arXiv

A Generalist FaceX via Learning Unified Facial Representation

This work presents the FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously. To achieve this goal, we first formulate a unified facial representation for a broad spectrum of facial editing tasks, which macroscopically decomposes a face into fundamental identity, intra-personal variation, and environmental factors. Based on this, we introduce Facial Omni-Representation Decomposing (FORD) for seamless manipulation of various facial components, microscopically decomposing the core aspects of most facial editing tasks. Furthermore, by leveraging the prior of a pretrained StableDiffusion (SD) model to enhance generation quality and accelerate training, we design Facial Omni-Representation Steering (FORS) to first assemble unified facial representations and then effectively steer the SD-aware generation process through the efficient Facial Representation Controller (FRC). Without any additional features, our versatile FaceX achieves competitive performance compared with elaborate task-specific models on popular facial editing tasks. Full code and models will be available at https://github.com/diffusion-facex/FaceX.

preprint · 2023 · arXiv

A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel at achieving intricate spectral responses compared with regular-shaped counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that deep learning has great potential to accelerate and refine metasurface design. Here, we present XGAN, an extended generative adversarial network (GAN) with a surrogate for high-quality free-form metasurface design. The proposed surrogate provides a physical constraint to XGAN so that XGAN can accurately generate metasurfaces monolithically from input spectral responses. In comparative experiments involving 20000 free-form metasurface designs, XGAN achieves 0.9734 average accuracy and is 500 times faster than the conventional methodology. This method facilitates building metasurface libraries for specific spectral responses and can be extended to various inverse design problems, including optical metamaterials, nanophotonic devices, and drug discovery.

preprint · 2023 · arXiv

BSNet: Lane Detection via Draw B-spline Curves Nearby

Curve-based methods are one of the classic families of lane detection methods. They learn a holistic representation of lane lines, which is intuitive and concise. However, their performance lags behind recent state-of-the-art methods due to limitations in their lane representation and optimization. In this paper, we revisit curve-based lane detection from the perspectives of the globality and locality of lane representations. The globality of a lane representation is the ability to complete invisible parts of lanes from visible parts. The locality of a lane representation is the ability to modify lanes locally, which can simplify parameter optimization. Specifically, we first propose to fit lane lines with B-spline curves, since they satisfy both locality and globality. Second, we design a simple yet efficient network, BSNet, to ensure the acquisition of global and local features. Third, we propose a new curve distance that makes the lane detection optimization objective more reasonable and alleviates ill-conditioned problems. The proposed methods achieve state-of-the-art performance on the Tusimple, CULane, and LLAMAS datasets, dramatically improving the accuracy of curve-based methods in the lane detection task while running far beyond real time (197 FPS).
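
The locality property comes from the compact support of B-spline basis functions: a degree-k basis function is non-zero only over k + 1 knot spans, so moving one control point changes only a bounded stretch of the curve. The sketch below evaluates a curve with the Cox-de Boor recursion; the clamped knot vector and degree 3 in the usage note are assumptions, and BSNet's network and curve distance are not reproduced.

```python
import numpy as np


def bspline_basis(i: int, k: int, t: float, knots: np.ndarray) -> float:
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis function at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    # Guard against zero-width spans produced by repeated (clamped) knots.
    if knots[i + k] != knots[i]:
        left = (t - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, t, knots)
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def bspline_point(t: float, control_points: np.ndarray, knots: np.ndarray,
                  degree: int = 3) -> np.ndarray:
    """Curve point at parameter t: basis-weighted sum of the control points."""
    weights = np.array([bspline_basis(i, degree, t, knots) for i in range(len(control_points))])
    return weights @ control_points
```

With knots `[0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]` and seven control points, shifting the first control point changes the curve for t in [0, 1) but leaves the point at t = 3.5 untouched. Globality comes from the same construction: a single smooth curve spans visible and occluded segments alike.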

preprint · 2023 · arXiv

Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition

In recent years, great success has been achieved in many natural language processing (NLP) tasks, e.g., named entity recognition (NER), especially in the high-resource language English, thanks in part to the considerable amount of labeled resources. However, most low-resource languages lack such an abundance of labeled data, leading to poor NER performance in these languages. Inspired by knowledge transfer, we propose the Converse Attention Network (CAN) to improve NER performance in low-resource languages by leveraging the knowledge learned in pretrained high-resource English models. CAN first translates low-resource languages into high-resource English using an attention-based translation module. In the process of translation, CAN obtains the attention matrices that align the two languages. CAN then uses these attention matrices to align the high-resource semantic features from a pretrained English model with the low-resource semantic features. As a result, CAN obtains aligned high-resource semantic features that enrich the representations of low-resource languages. Experiments on four low-resource NER datasets show that CAN achieves consistent and significant performance improvements, which indicates its effectiveness.

preprint · 2022 · arXiv

A Deep Reinforcement Learning based Approach for NOMA-based Random Access Network with Truncated Channel Inversion Power Control

As a main use case of 5G-and-beyond wireless networks, the ever-increasing number of machine-type communication (MTC) devices has posed critical challenges to MTC networks in recent years. It is imperative to support massive numbers of MTC devices with limited resources. To this end, non-orthogonal multiple access (NOMA) based random access networks have been deemed a promising candidate for MTC networks. In this paper, we propose a deep reinforcement learning (RL) based approach for a NOMA-based random access network with truncated channel inversion power control. Specifically, each MTC device randomly selects a predefined power level with a certain probability for data transmission. Devices use channel inversion power control, subject to an upper bound on the transmission power. Due to the stochastic nature of channel fading and the limited transmission power, devices with different achievable power levels are categorized as different types. To achieve high throughput while considering fairness among all devices, two objective functions are formulated: one maximizes the minimum long-term expected throughput of all MTC devices, and the other maximizes the geometric mean of the long-term expected throughput of all MTC devices. A policy-based deep RL approach is then applied to tune the transmission probabilities of each device to solve the formulated optimization problems. Extensive simulations demonstrate the merits of the proposed approach.
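
Two pieces of the formulation above are easy to state in code: which predefined received-power levels a device can reach when channel inversion is capped at a maximum transmit power, and the two fairness objectives. This is a minimal sketch under the assumption that a level is achievable when level / channel_gain <= p_max; the function names are hypothetical, and the deep RL policy that tunes transmission probabilities is not shown.

```python
import math
from typing import List, Sequence


def achievable_levels(channel_gain: float, levels: Sequence[float], p_max: float) -> List[float]:
    """Received-power levels a device can hit with channel inversion under a power cap."""
    # Channel inversion: transmit power needed = target received power / channel gain.
    return [level for level in levels if level / channel_gain <= p_max]


def max_min_objective(throughputs: Sequence[float]) -> float:
    """Objective 1: the smallest long-term expected throughput across devices."""
    return min(throughputs)


def geometric_mean_objective(throughputs: Sequence[float]) -> float:
    """Objective 2: geometric mean of throughputs, computed in log space."""
    if any(x <= 0 for x in throughputs):
        return 0.0  # a starved device drives the geometric mean to zero
    return math.exp(sum(math.log(x) for x in throughputs) / len(throughputs))
```

For example, with channel gain 0.5, levels `[1.0, 2.0, 4.0]` and `p_max = 4.0`, only the first two levels are reachable, so this device falls into a different type than one with a stronger channel. The max-min objective protects only the worst device, while the geometric mean trades a little of that protection for higher overall throughput.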

preprint · 2022 · arXiv

A Survey of Visual Sensory Anomaly Detection

Visual sensory anomaly detection (AD) is an essential problem in computer vision that has recently gained momentum thanks to the development of AI for good. Compared with semantic anomaly detection, which detects anomalies at the label level (semantic shift), visual sensory AD detects the abnormal parts of a sample (covariate shift). However, no thorough review has summarized this area for the computer vision community. In this survey, we provide the first comprehensive review of visual sensory AD and categorize it into three levels according to the form of the anomalies. Furthermore, we classify each kind of anomaly according to the level of supervision. Finally, we summarize the challenges and provide open directions for this community. All resources are available at https://github.com/M-3LAB/awesome-visual-sensory-anomaly-detection.

preprint · 2022 · arXiv

Class-Aware Contrastive Semi-Supervised Learning

Pseudo-label-based semi-supervised learning (SSL) has achieved great success in raw data utilization. However, its training procedure suffers from confirmation bias due to the noise contained in self-generated artificial labels. Moreover, the model's judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), a drop-in helper that improves pseudo-label quality and enhances the model's robustness in real-world settings. Rather than treating real-world data as a single union set, our method separately handles reliable in-distribution data with class-wise clustering, for blending into downstream tasks, and noisy out-of-distribution data with image-wise contrastive learning, for better generalization. Furthermore, by applying target re-weighting, we emphasize clean-label learning while reducing noisy-label learning. Despite its simplicity, CCSSL yields significant performance improvements over state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10. On the real-world dataset Semi-iNat 2021, we improve FixMatch by 9.80% and CoMatch by 3.18%. Code is available at https://github.com/TencentYoutuResearch/Classification-SemiCLS.

preprint · 2022 · arXiv

Concurrent Adversarial Learning for Large-Batch Training

Large-batch training has become a common technique when training neural networks with a large number of GPU/TPU processors. As batch size increases, stochastic optimizers tend to converge to sharp local minima, leading to degraded test performance. Current methods usually use extensive data augmentation to increase the batch size, but we find that the performance gain from data augmentation decreases as the batch size increases, and data augmentation becomes insufficient after a certain point. In this paper, we propose to use adversarial learning to increase the batch size in large-batch training. Despite being a natural choice for smoothing the decision surface and biasing towards a flat region, adversarial learning has not been successfully applied to large-batch training, since it requires at least two sequential gradient computations at each step, which at least doubles the running time compared with vanilla training, even with a large number of processors. To overcome this issue, we propose a novel Concurrent Adversarial Learning (ConAdv) method that decouples the sequential gradient computations in adversarial learning by using stale parameters. Experimental results demonstrate that ConAdv can successfully increase the batch size for ResNet-50 training on ImageNet while maintaining high accuracy. In particular, we show that ConAdv alone can achieve 75.3% top-1 accuracy on ImageNet ResNet-50 training with a 96K batch size, and the accuracy can be further improved to 76.2% when combining ConAdv with data augmentation. This is the first work that successfully scales the ResNet-50 training batch size to 96K.

preprint · 2022 · arXiv

CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

Optical flow estimation aims to find the 2D motion field by identifying corresponding pixels between two images. Despite the tremendous progress of deep learning-based optical flow methods, it remains a challenge to accurately estimate large displacements with motion blur. This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images. The locality of convolutional features makes the computed correlations susceptible to various kinds of noise. On large displacements with motion blur, noisy correlations can cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture, the "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation. In CRAFT, a Semantic Smoothing Transformer layer transforms the features of one frame, making them more global and semantically stable. In addition, the dot-product correlations are replaced with transformer Cross-Frame Attention. This layer filters out feature noise through the Query and Key projections and computes more accurate correlations. On the Sintel (Final) and KITTI (foreground) benchmarks, CRAFT has achieved new state-of-the-art performance. Moreover, to test the robustness of different models on large motions, we designed an image shifting attack that shifts input images to generate large artificial motions. Under this attack, CRAFT performs much more robustly than two representative methods, RAFT and GMA. The code of CRAFT is available at https://github.com/askerlee/craft.
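
For reference, the dot-product correlation volume that CRAFT replaces can be computed for all pixel pairs in one `einsum`. The sketch below assumes feature maps of shape (C, H, W) and a 1/sqrt(C) scaling, a common normalization choice; it shows the baseline computation, not CRAFT's Cross-Frame Attention, and `argmax_displacement` is a hypothetical helper for reading off a hard match.

```python
import numpy as np


def correlation_volume(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """All-pairs correlation between two (C, H, W) feature maps.

    Entry [i, j, k, l] is the dot product of the feature at (i, j) in frame 1
    with the feature at (k, l) in frame 2, scaled by 1 / sqrt(C).
    """
    channels = feat1.shape[0]
    return np.einsum("cij,ckl->ijkl", feat1, feat2) / np.sqrt(channels)


def argmax_displacement(corr: np.ndarray, i: int, j: int):
    """Displacement (dy, dx) from pixel (i, j) to its most-correlated pixel in frame 2."""
    k, l = np.unravel_index(np.argmax(corr[i, j]), corr.shape[2:])
    return int(k - i), int(l - j)
```

Each entry depends only on two local feature vectors, which is why blur or noise that corrupts those vectors directly corrupts the match; CRAFT's attention-based alternative is designed to address exactly this.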

preprint · 2022 · arXiv

Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras

In this paper, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras. The continuous-time trajectory formulation naturally facilitates the fusion of asynchronous high-frequency IMU data and motion-distorted rolling shutter images. To avoid an intractable computational load, the proposed VIO is sliding-window and keyframe-based. We propose to probabilistically marginalize the control points to keep a constant number of keyframes in the sliding window. Furthermore, the line exposure time difference (line delay) of the rolling shutter camera can be calibrated online in our continuous-time VIO. To extensively examine the performance of our continuous-time VIO, experiments are conducted on the publicly available WHU-RSVI, TUM-RSVI, and SenseTime-RSVI rolling shutter datasets. The results demonstrate that the proposed continuous-time VIO significantly outperforms existing state-of-the-art VIO methods. The codebase of this paper will also be open-sourced at https://github.com/APRIL-ZJU/Ctrl-VIO.
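
The line delay matters because each row of a rolling-shutter image is exposed at a different time, so each row should be projected with the pose at its own timestamp. This minimal sketch computes per-row timestamps and queries a continuous-time position; per-axis linear interpolation stands in for the spline trajectory a continuous-time VIO would use, and the function names are hypothetical.

```python
import numpy as np


def row_timestamps(t_first_row: float, num_rows: int, line_delay: float) -> np.ndarray:
    """Exposure time of each image row: row r is captured line_delay * r after row 0."""
    return t_first_row + line_delay * np.arange(num_rows)


def position_at(t: float, times: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Position at time t by per-axis linear interpolation between trajectory samples.

    A simplification: a continuous-time VIO would evaluate a spline instead.
    """
    return np.array([np.interp(t, times, positions[:, axis])
                     for axis in range(positions.shape[1])])
```

Because every row's timestamp depends on `line_delay`, an error in that value shifts the pose used for every row, which is why calibrating it online alongside the trajectory is useful.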

preprint · 2022 · arXiv

DA² Dataset: Toward Dexterity-Aware Dual-Arm Grasping

In this paper, we introduce DA², the first large-scale dual-arm dexterity-aware dataset for generating optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about 9M pairs of parallel-jaw grasps, generated from more than 6000 objects, each labeled with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on the rendered scenes from this dataset. We use the evaluation model as our baseline to show the value of this novel and nontrivial dataset through both online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.

preprint · 2022 · arXiv

Dynamically Stable Poincaré Embeddings for Neural Manifolds

In a Riemannian manifold, the Ricci flow is a partial differential equation for evolving the metric to become more regular. We hope that topological structures from such metrics may be used to assist in machine learning tasks. However, this line of work is still missing. In this paper, we propose Ricci flow assisted Eucl2Hyp2Eucl neural networks that bridge this gap between the Ricci flow and deep neural networks by mapping neural manifolds from the Euclidean space to the dynamically stable Poincaré ball and then back to the Euclidean space. We prove that, if initial metrics have an $L^2$-norm perturbation that deviates from the hyperbolic metric on the Poincaré ball, the scaled Ricci-DeTurck flow of such metrics smoothly and exponentially converges to the hyperbolic metric. In other words, the Ricci flow naturally evolves such metrics toward the stable Poincaré ball. For such dynamically stable neural manifolds under the Ricci flow, the convergence of neural networks embedded with these manifolds is not susceptible to perturbations. We also show that Ricci flow assisted Eucl2Hyp2Eucl neural networks outperform their all-Euclidean counterparts on image classification tasks.
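
The Euclidean-to-Poincaré-to-Euclidean round trip can be sketched with the standard closed-form exponential and logarithmic maps at the origin of a Poincaré ball with curvature -c. This shows only the mapping between spaces under that assumption; the function names are illustrative, and the Ricci-DeTurck flow analysis is not represented.

```python
import numpy as np


def exp_map_origin(v: np.ndarray, c: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    """Map Euclidean (tangent) vectors onto the Poincaré ball of radius 1 / sqrt(c)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def log_map_origin(y: np.ndarray, c: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    """Inverse of exp_map_origin: pull points on the ball back to Euclidean space."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)
    # Keep the argument strictly inside (-1, 1) so arctanh stays finite near the boundary.
    scaled = np.minimum(sqrt_c * norm, 1.0 - 1e-7)
    return np.arctanh(scaled) * y / (sqrt_c * norm)
```

Applying `exp_map_origin` and then `log_map_origin` recovers the input vectors up to floating-point error, so a hyperbolic layer can sit between two Euclidean ones without losing information at the interfaces.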

preprint · 2022 · arXiv

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared with regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason for this phenomenon is the coupled formulation of NeRV, which outputs the spatial and temporal information of video frames directly from the frame index input. In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal contexts. Under the guidance of this new formulation, our model greatly reduces the redundant model parameters while retaining the representation ability. We experimentally find that our method substantially improves performance with fewer parameters and converges more than 8× faster. Code is available at https://github.com/kyleleey/E-NeRV.

preprint · 2022 · arXiv

Efficient Trajectory Planning and Control for USV with Vessel Dynamics and Differential Flatness

Unmanned surface vessels (USVs) are widely used in ocean exploration and environmental protection. To ensure that a USV can successfully perform its mission, trajectory planning and motion tracking are the two most critical technologies. In this paper, we propose a novel trajectory generation and tracking method for USVs based on optimization theory. Specifically, the USV dynamic model is described with differential flatness, so that the trajectory can be generated by dynamic RRT* in a linear invariant system form under the objective of an optimal boundary value. To reduce the number of samples and improve efficiency, we adjust the trajectory through local optimization. Dynamic constraints are considered in the optimization process so that the generated trajectory conforms to the kinematic characteristics of the under-actuated hull and is easier to track. Finally, motion tracking is performed with model predictive control formulated as a sequential quadratic programming problem. Experimental results show that the planned trajectory is more in line with the kinematic characteristics of the USV and that the tracking accuracy remains high.

preprint · 2022 · arXiv

Enhancing Sequential Recommendation with Graph Contrastive Learning

Sequential recommendation systems capture users' dynamic behavior patterns to predict their next interaction. Most existing sequential recommendation methods exploit only the local context information of an individual interaction sequence and learn model parameters solely from the item prediction loss. Thus, they usually fail to learn appropriate sequence representations. This paper proposes a novel recommendation framework, Graph Contrastive Learning for Sequential Recommendation (GCL4SR). Specifically, GCL4SR employs a Weighted Item Transition Graph (WITG), built from the interaction sequences of all users, to provide global context information for each interaction and weaken the noise in the sequence data. Moreover, GCL4SR uses subgraphs of WITG to augment the representation of each interaction sequence. Two auxiliary learning objectives are also proposed: one maximizes the consistency between augmented representations induced by the same interaction sequence on WITG, and the other minimizes the difference between the representations augmented by the global context on WITG and the local representation of the original sequence. Extensive experiments on real-world datasets demonstrate that GCL4SR consistently outperforms state-of-the-art sequential recommendation methods.
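
A weighted item transition graph can be accumulated from all users' sequences in a single pass. In this sketch, items within a window of each other are linked with weight 1/k for a gap of k positions and edges are undirected; the window size, the 1/k weighting, and the function name are assumptions for illustration rather than a confirmed description of GCL4SR's WITG.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_transition_graph(sequences: List[List[int]],
                           window: int = 3) -> Dict[Tuple[int, int], float]:
    """Undirected weighted item graph accumulated over all users' sequences."""
    weights: Dict[Tuple[int, int], float] = defaultdict(float)
    for seq in sequences:
        for pos, item in enumerate(seq):
            for k in range(1, window + 1):
                if pos + k >= len(seq):
                    break
                other = seq[pos + k]
                if other == item:
                    continue  # skip self-loops from repeated items
                edge = (min(item, other), max(item, other))
                weights[edge] += 1.0 / k  # closer transitions contribute more weight
    return dict(weights)
```

For the sequences `[[1, 2, 3], [2, 3]]` this yields `{(1, 2): 1.0, (1, 3): 0.5, (2, 3): 2.0}`: the transition 2 to 3 appears in both users' histories, so it carries the most global evidence.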

preprint2022arXiv

Exemplar-Based Image Colorization with A Learning Framework

Image learning and colorization are active research topics in the multimedia domain. Inspired by the learning capability of humans, in this paper we propose an automatic colorization method with a learning framework. The method can be viewed as a hybrid of exemplar-based and learning-based methods, and it decouples the colorization process from the learning process so that various color styles can be generated for the same grayscale image. The matching process in exemplar-based colorization can be regarded as a parameterized function, and we use a large number of color images as training samples to fit its parameters. During training, the color images serve as ground truth, and we learn the optimal parameters of the matching function by minimizing the colorization error with respect to those parameters. To handle images with various compositions, a global feature is introduced to classify images by composition, and the optimal matching parameters are then learned for each image category individually. Moreover, a spatial-consistency-based post-processing step is designed to smooth the color information extracted from the reference image and remove matching errors. Extensive experiments verify the effectiveness of the method, which achieves performance comparable to state-of-the-art colorization algorithms.

preprint2022arXiv

First-principles study the structural, magnetic, optical properties and doping effect in chromium arsenide

We systematically study pristine and doped chromium arsenide (CrAs) in six different crystal structures, using first-principles calculations to investigate their structural, magnetic, and optical properties for practical applications. First, we find that the ground-state structure is an orthorhombic MnP-type structure with antiferromagnetic spin order. The rocksalt structure is a low-energy metastable phase and a ferromagnetic metal with high spin polarization at the Fermi level. Second, the NiAs and MnP structures have higher absorption coefficients than the other structures in the infrared and ultraviolet regions, respectively. In the visible region, the wurtzite and zincblende structures are more transparent than the other structures. Finally, we find that Ti substitution for Cr and Te substitution for As can induce a phase transition in the ground-state structure and in the ground-state magnetic order, respectively. These results can help promote the application of the CrAs system in spintronics.

preprint2022arXiv

Guide Local Feature Matching by Overlap Estimation

Local image feature matching under large changes in appearance, viewpoint, and distance is challenging yet important. Conventional methods detect and match tentative local features across entire images and rely on heuristic consistency checks to ensure reliable matches. In this paper, we introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR, to constrain local feature matching to the commonly visible region. OETR performs overlap estimation in two steps: feature correlation followed by overlap regression. As a preprocessing module, OETR can be plugged into any existing local feature detection and matching pipeline to mitigate potential viewpoint or scale variance. Extensive experiments show that OETR substantially boosts state-of-the-art local feature matching performance, especially for image pairs with small shared regions. The code will be publicly available at https://github.com/AbyssGaze/OETR.
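
The abstract does not specify how the estimated overlap is consumed downstream. One simple way to constrain matching to the commonly visible region is to discard tentative matches whose keypoints fall outside the estimated overlap in either image. The sketch below illustrates only that idea. It assumes the overlap is an axis-aligned box `(x0, y0, x1, y1)` per image, and the function names are hypothetical.

```python
def inside(point, box):
    """True if an (x, y) point lies inside an axis-aligned box (x0, y0, x1, y1)."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def filter_matches_by_overlap(kpts0, kpts1, matches, box0, box1):
    """Keep only tentative matches whose keypoints lie in the overlap of both images.

    kpts0, kpts1: lists of (x, y) keypoint coordinates in image 0 and image 1.
    matches: list of (i, j) index pairs into kpts0 and kpts1.
    box0, box1: hypothetical overlap boxes, e.g. as regressed by an overlap estimator.
    """
    return [
        (i, j)
        for i, j in matches
        if inside(kpts0[i], box0) and inside(kpts1[j], box1)
    ]


kpts0 = [(10, 10), (90, 90), (40, 20)]
kpts1 = [(5, 5), (50, 50), (70, 10)]
matches = [(0, 0), (1, 1), (2, 2)]
kept = filter_matches_by_overlap(kpts0, kpts1, matches, (0, 0, 50, 50), (0, 0, 60, 60))
print(kept)  # [(0, 0)]
```

Filtering after matching is the simplest variant. An estimated overlap could also be used earlier, for example to crop or rescale both images before feature detection.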

preprint2022arXiv

Joint Learning Content and Degradation Aware Feature for Blind Super-Resolution

To achieve promising results in blind image super-resolution (SR), some methods leverage low-resolution (LR) images to predict the blur kernel and improve SR performance. However, these Supervised Kernel Prediction (SKP) methods are impractical because real-world blur kernels are unavailable. Although some Unsupervised Degradation Prediction (UDP) methods have been proposed to bypass this problem, the inconsistency between the degradation embedding and the SR features remains challenging. By exploring the correlations between the degradation embedding and the SR features, we observe that jointly learning content- and degradation-aware features is optimal. Based on this observation, we propose a Content and Degradation aware SR Network, dubbed CDSR. Specifically, CDSR contains three newly established modules: (1) a Lightweight Patch-based Encoder (LPE) jointly extracts content and degradation features; (2) a Domain Query Attention based module (DQA) adaptively reduces the inconsistency; and (3) a Codebook-based Space Compress module (CSC) suppresses redundant information. Extensive experiments on several benchmarks demonstrate that CDSR outperforms existing UDP models and achieves competitive PSNR and SSIM even compared with state-of-the-art SKP methods.

preprint2022arXiv

Kernel representation formula from complex to real Wiener-Ito integrals and vice versa

We characterize the relation between real and complex Wiener-Ito integrals. Given a complex multiple Wiener-Ito integral, we obtain explicit expressions for the two kernels of its real and imaginary parts. Conversely, given a two-dimensional real Wiener-Ito integral, we obtain a representation formula as a finite sum of complex Wiener-Ito integrals. The main tools are a recursion technique and Malliavin derivative operators. These results build a bridge between real and complex Wiener-Ito integrals.
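
The following standard computation makes the relation concrete in the simplest (first-order) case. It is not the paper's general formula. It assumes the normalization $Z_t = (B^1_t + i B^2_t)/\sqrt{2}$ with independent real Brownian motions $B^1, B^2$ (so that $\mathbb{E}|Z_t|^2 = t$) and a complex kernel $f = u + iv$ with real $u, v \in L^2(\mathbb{R}_+)$:

$$
\int_0^\infty f\,dZ
= \frac{1}{\sqrt{2}}\left(\int_0^\infty u\,dB^1 - \int_0^\infty v\,dB^2\right)
+ \frac{i}{\sqrt{2}}\left(\int_0^\infty v\,dB^1 + \int_0^\infty u\,dB^2\right).
$$

The real and imaginary parts are therefore first-order real Wiener-Ito integrals with respect to the two-dimensional Brownian motion $(B^1, B^2)$, with kernels $(u, -v)/\sqrt{2}$ and $(v, u)/\sqrt{2}$. For multiple integrals of higher order, the abstract states that the kernels are obtained with a recursion technique and Malliavin derivative operators. This first-order example does not reproduce that argument.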

preprint2022arXiv

Large deviations principle for stationary solutions of stochastic differential equations with multiplicative noise

We study the large deviations principle (LDP) for stationary solutions of a class of stochastic differential equations (SDEs) on infinite time intervals using the weak convergence approach, and then establish the LDP for the invariant measures of the SDE via the contraction principle. We further show the equivalence between the rate function of the LDP for invariant measures induced by the LDP for stationary solutions and the rate function defined by the quasi-potential. This fact offers another perspective on the quasi-potential introduced by Freidlin and Wentzell.
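
For reference, here are the two standard ingredients named above, stated in textbook form. The small-noise setting $dX^\varepsilon_t = b(X^\varepsilon_t)\,dt + \sqrt{\varepsilon}\,\sigma(X^\varepsilon_t)\,dW_t$ is our assumption about the class of SDEs. The contraction principle says the following: if $\{X^\varepsilon\}$ satisfies an LDP on a Polish space $\mathcal{X}$ with good rate function $I$, and $F:\mathcal{X}\to\mathcal{Y}$ is continuous, then $\{F(X^\varepsilon)\}$ satisfies an LDP on $\mathcal{Y}$ with good rate function

$$
J(y) = \inf\{\, I(x) : x \in \mathcal{X},\ F(x) = y \,\}.
$$

The Freidlin-Wentzell quasi-potential relative to a stable equilibrium $O$ is

$$
V(x) = \inf\{\, S_{0,T}(\varphi) : T > 0,\ \varphi(0) = O,\ \varphi(T) = x \,\},
\qquad
S_{0,T}(\varphi) = \inf\Big\{\, \tfrac12 \int_0^T |h(t)|^2\,dt : \dot\varphi = b(\varphi) + \sigma(\varphi)\,h \,\Big\}.
$$

In our reading, the map $F$ in the abstract's argument would be evaluation of a stationary solution at a fixed time, because the marginal law of a stationary solution is the invariant measure. This is an interpretation, not a statement taken from the paper.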

preprint2022arXiv

Large-scale full-programmable quantum walk and its applications

With photonics, quantum computational advantage has been demonstrated on the task of boson sampling. The next priority for photonic systems is to develop quantum-enhanced approaches for practical problems. Quantum walks are powerful kernels for developing new and useful quantum algorithms. Here we realize large-scale quantum walks using a fully programmable photonic quantum computing system. The system integrates a silicon quantum photonic chip. It can simulate quantum walk dynamics on graphs with up to 400 vertices and offers full programmability over the quantum walk parameters, including the particle property, initial state, graph structure, and evolution time. In the 400-dimensional Hilbert space, the average fidelity of random entangled quantum states after evolution through the whole on-chip circuit reaches 94.29 ± 1.28%. With the system, we demonstrate exponentially faster hitting and quadratically faster mixing of quantum walks compared with classical random walks. Experimentally, this yields more than two orders of magnitude enhancement in hitting efficiency and reduces the evolution time for mixing by almost half. We use the system to implement a series of quantum applications, including measuring the centrality of scale-free networks, searching for targets on Erdös-Rényi networks, distinguishing non-isomorphic graph pairs, and simulating the topological phase of higher-order topological insulators. Our work shows one feasible path for quantum photonics to address applications of practical interest in the near future.
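
The following numerical sketch illustrates what a continuous-time quantum walk computes. It is not a model of the photonic chip. It assumes the common convention of using the graph's adjacency matrix as the Hamiltonian, $U(t) = e^{-iAt}$, and compares the result with a classical continuous-time random walk generated by the graph Laplacian. Function names and the 64-vertex cycle are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm


def cycle_graph(n):
    """Adjacency matrix of an n-vertex cycle."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj


def quantum_walk(adj, start, t):
    """Vertex probabilities of a continuous-time quantum walk with H = adjacency."""
    psi = np.zeros(adj.shape[0], dtype=complex)
    psi[start] = 1.0
    psi_t = expm(-1j * t * adj) @ psi  # unitary evolution U(t) = exp(-iHt)
    return np.abs(psi_t) ** 2


def classical_walk(adj, start, t):
    """Vertex probabilities of a continuous-time classical random walk (generator -L)."""
    lap = np.diag(adj.sum(axis=1)) - adj  # graph Laplacian L = D - A
    p = np.zeros(adj.shape[0])
    p[start] = 1.0
    return expm(-t * lap) @ p


def spread(probs, start):
    """Mean squared cycle distance from the start vertex."""
    n = len(probs)
    offset = np.abs(np.arange(n) - start)
    dist = np.minimum(offset, n - offset)
    return float(np.sum(probs * dist ** 2))


adj = cycle_graph(64)
for t in (2.0, 4.0, 8.0):
    q, c = quantum_walk(adj, 0, t), classical_walk(adj, 0, t)
    print(t, round(spread(q, 0), 2), round(spread(c, 0), 2))
```

The quantum spread grows roughly quadratically in time, while the classical spread grows linearly. This ballistic-versus-diffusive behavior is a common intuition for the faster hitting and mixing reported above, though the paper's speedups are measured with its own experimental definitions.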

preprint2022arXiv

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory helps segment target objects in videos. However, these methods mainly focus on better matching between the current frame and the memory frames, without explicitly attending to the quality of the memory. As a result, frames with poor segmentation masks are prone to being memorized. This causes segmentation errors to accumulate and further degrades segmentation performance. In addition, because the memory grows linearly with the number of frames, these models are limited in their ability to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) that evaluates the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames and prevent error accumulation. We then combine segmentation quality with temporal consistency to dynamically update the memory bank, which improves the practicality of the models. Without any bells and whistles, QDMN achieves new state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks. Moreover, extensive experiments demonstrate that the proposed Quality Assessment Module (QAM) can be applied to memory-based methods as a generic plug-in and significantly improves performance. Our source code is available at https://github.com/workforai/QDMN.
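
The following sketch shows how a memory bank can combine per-frame segmentation quality with temporal information when deciding what to keep. It is not QDMN's update rule. The quality threshold, the `quality * exp(-age / tau)` eviction score, and the class name are hypothetical choices made for illustration. Frame quality scores are assumed to come from some quality estimator.

```python
import math


class QualityAwareMemory:
    """Bounded memory bank that stores (frame_index, quality) pairs.

    Hypothetical policy for illustration: frames whose estimated mask quality is
    below `threshold` are never stored. When the bank is full, the entry with the
    lowest quality * exp(-age / tau) score is evicted, so old or poor frames go first.
    """

    def __init__(self, capacity, threshold=0.5, tau=10.0):
        self.capacity = capacity
        self.threshold = threshold
        self.tau = tau
        self.entries = []  # list of (frame_index, quality)

    def score(self, entry, current_frame):
        frame_idx, quality = entry
        age = current_frame - frame_idx
        return quality * math.exp(-age / self.tau)  # quality weighted by recency

    def update(self, frame_idx, quality):
        """Try to memorize a frame; return True if it remains in the bank."""
        if quality < self.threshold:
            return False  # skip poorly segmented frames to limit error accumulation
        entry = (frame_idx, quality)
        self.entries.append(entry)
        if len(self.entries) > self.capacity:
            worst = min(self.entries, key=lambda e: self.score(e, frame_idx))
            self.entries.remove(worst)
            if worst is entry:
                return False  # the new frame itself was the weakest candidate
        return True


memory = QualityAwareMemory(capacity=2)
for idx, q in [(0, 0.9), (1, 0.4), (2, 0.8), (3, 0.95)]:
    memory.update(idx, q)
print([idx for idx, _ in memory.entries])  # [2, 3]
```

The fixed capacity is what keeps memory from growing linearly with video length. The quality gate addresses the error-accumulation problem described above.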

preprint2022arXiv

Lightweight Object-level Topological Semantic Mapping and Long-term Global Localization based on Graph Matching

Mapping and localization are two essential tasks for mobile robots in real-world applications. However, large-scale and dynamic scenes challenge the accuracy and robustness of most current mature solutions, and the situation becomes even worse when computational resources are limited. In this paper, we present a novel lightweight object-level mapping and localization method with high accuracy and robustness. Unlike previous methods, ours does not need a pre-constructed precise geometric map, which greatly reduces the storage burden, especially for large-scale navigation. We use object-level features with both semantic and geometric information to model landmarks in the environment. In particular, a learning topological primitive is first proposed to efficiently obtain and organize the object-level landmarks. On this basis, we use a robot-centric mapping framework to represent the environment as a semantic topology graph while relaxing the burden of maintaining global consistency. In addition, a hierarchical memory management mechanism is introduced to improve the efficiency of online mapping with limited computational resources. Based on the proposed map, robust localization is achieved by constructing a novel local semantic scene graph descriptor and performing multi-constraint graph matching to compare scene similarity. Finally, we test our method on a low-cost embedded platform to demonstrate its advantages. Experimental results in a large-scale, multi-session real-world environment show that the proposed method outperforms the state of the art in terms of lightweight operation and robustness.

preprint2022arXiv

Minimalist and High-performance Conversational Recommendation with Uncertainty Estimation for User Preference

Conversational recommendation systems (CRS) are emerging as a user-friendly way to capture users' dynamic preferences over candidate items and attributes. Multi-shot CRS is designed to make recommendations multiple times until the user either accepts a recommendation or runs out of patience and leaves. Existing works are trained with reinforcement learning (RL), which may suffer from unstable learning and prohibitively high computational demands. In this work, we propose a simple and efficient CRS, the MInimalist Non-reinforced Interactive COnversational Recommender Network (MINICORN). MINICORN models the epistemic uncertainty of the estimated user preference and queries the user about the attribute with the highest uncertainty. The system employs a simple network architecture and makes the query-vs-recommendation decision using a single rule. Somewhat surprisingly, this minimalist approach outperforms state-of-the-art RL methods on three real-world datasets by large margins. We hope that MINICORN will serve as a valuable baseline for future research.
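
The following sketch shows the kind of decision described above: ask about the most uncertain attribute, or recommend. It is a simplified stand-in, not MINICORN's rule. It assumes epistemic uncertainty is approximated by the variance across samples of the preference estimate (for example, from an ensemble). The "single rule" here, which recommends when the top item beats the runner-up by a margin, and its `margin` value are hypothetical.

```python
import numpy as np


def next_action(pref_samples, item_scores, margin=0.2):
    """Decide whether to ask about an attribute or recommend an item.

    pref_samples: (num_samples, num_attributes) samples of the estimated user
        preference for each attribute, e.g. from an ensemble (assumption).
    item_scores: (num_items,) estimated scores of the candidate items.
    Hypothetical single rule: recommend when the best item beats the runner-up by
    at least `margin`; otherwise ask about the most uncertain attribute.
    """
    uncertainty = pref_samples.var(axis=0)  # spread across samples ~ epistemic uncertainty
    top = np.sort(item_scores)[::-1]
    if len(top) == 1 or top[0] - top[1] >= margin:
        return "recommend", int(np.argmax(item_scores))
    return "ask", int(np.argmax(uncertainty))


samples = np.array([[0.1, 0.9, 0.5], [0.1, 0.1, 0.5], [0.1, 0.5, 0.5]])
print(next_action(samples, np.array([0.5, 0.45, 0.1])))  # ('ask', 1)
print(next_action(samples, np.array([0.9, 0.3, 0.1])))   # ('recommend', 0)
```

A single threshold rule like this needs no policy training, which is the contrast the abstract draws with RL-based systems.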

preprint2022arXiv

Observability-Aware Intrinsic and Extrinsic Calibration of LiDAR-IMU Systems

Accurate and reliable sensor calibration is essential for fusing LiDAR and inertial measurements, which are commonly available in robotic applications. In this paper, we propose a novel LiDAR-IMU calibration method within a continuous-time batch-optimization framework. The method calibrates the intrinsics of both sensors and the spatial-temporal extrinsics between them without calibration infrastructure such as fiducial tags. Compared with discrete-time approaches, the continuous-time formulation has natural advantages for fusing high-rate measurements from LiDAR and IMU sensors. To improve efficiency and address degenerate motions, we leverage two observability-aware modules. (i) An information-theoretic data selection policy selects only the most informative segments for calibration during data collection, which significantly improves calibration efficiency because only the selected segments are processed. (ii) An observability-aware state update mechanism in nonlinear least-squares optimization updates only the identifiable directions of the state space using truncated singular value decomposition (TSVD). This enables accurate calibration even in degenerate cases where informative data segments are not available. The proposed LiDAR-IMU calibration approach has been validated extensively in both simulated and real-world experiments with different robot platforms, demonstrating high accuracy and repeatability in common human-made environments. We also open-source our codebase to benefit the research community: https://github.com/APRIL-ZJU/OA-LICalib.
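
The following sketch illustrates the TSVD idea from point (ii): solve the linearized least-squares problem only along well-conditioned directions of the Jacobian, and leave unobservable directions untouched. It is a generic Gauss-Newton step, not the paper's solver. The relative threshold `rel_tol` and the toy Jacobian are hypothetical.

```python
import numpy as np


def tsvd_step(jacobian, residual, rel_tol=1e-3):
    """Gauss-Newton step for min ||J dx + r||^2 restricted to observable directions.

    Singular directions with s_i <= rel_tol * s_max are treated as unobservable
    and left unchanged (their component of dx is zero). rel_tol is hypothetical.
    """
    u, s, vt = np.linalg.svd(jacobian, full_matrices=False)
    keep = s > rel_tol * s[0]  # s is sorted in descending order
    coeffs = (u[:, keep].T @ residual) / s[keep]
    dx = -vt[keep].T @ coeffs
    return dx, keep


# The second state variable barely affects the residuals (a degenerate motion).
J = np.array([[1.0, 0.0], [0.0, 1e-6], [1.0, 0.0]])
r = np.array([-1.0, -1.0, -1.0])
dx, keep = tsvd_step(J, r)
print(dx, keep)  # approximately [1. 0.] and [ True False]
```

An untruncated solve such as `np.linalg.lstsq(J, -r)` would instead move the second variable by about 1e6, because it trusts a direction that the data barely constrain.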

preprint2022arXiv

Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs

Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain under-exploited: practitioners usually pick one PTM from the model hub by popularity and then fine-tune it to solve the target task. This naïve but common practice poses two obstacles to fully exploiting pre-trained model hubs. First, selecting a PTM by popularity has no optimality guarantee. Second, only one PTM is used while the remaining PTMs are ignored. An alternative would be to consider all possible combinations of PTMs and extensively fine-tune each one, but this would be computationally prohibitive and could also lead to statistical over-fitting. In this paper, we propose a new paradigm for exploiting model hubs that lies between these extremes. The paradigm has two aspects. (1) We use an evidence maximization procedure to estimate the maximum value of label evidence given the features extracted by pre-trained models. This procedure can rank all the PTMs in a model hub before fine-tuning, for various types of PTMs and tasks. (2) If we have no preference for the model's architecture, the top-ranked PTM can be fine-tuned and deployed. Otherwise, the target PTM can be tuned with the top K ranked PTMs via a Bayesian procedure that we propose. This procedure, which we call B-Tuning, not only improves upon specialized methods designed for tuning homogeneous PTMs, but also applies to the challenging problem of tuning heterogeneous PTMs, where it yields a new level of benchmark performance.
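
The following sketch illustrates step (1): scoring how well a model's extracted features explain the labels by maximizing the evidence, without any fine-tuning. It assumes a Bayesian linear model on the features with a Gaussian prior (precision alpha) and Gaussian noise (precision beta), both tuned by MacKay's standard fixed-point updates. This is the textbook evidence approximation and not necessarily the paper's exact procedure. The sketch handles a single real-valued target. For classification, one could average over one-hot label columns, which is an assumption on our part. The model names in the usage example are hypothetical.

```python
import numpy as np


def log_evidence(features, y, max_iter=100, tol=1e-6):
    """Per-sample log marginal likelihood of y under Bayesian linear regression
    on `features`, with alpha (prior precision) and beta (noise precision)
    tuned by MacKay's fixed-point updates."""
    n, d = features.shape
    ftf = features.T @ features
    fty = features.T @ y
    eig = np.clip(np.linalg.eigvalsh(ftf), 0.0, None)  # eigenvalues of F^T F
    alpha, beta = 1.0, 1.0

    def posterior(alpha, beta):
        a = alpha * np.eye(d) + beta * ftf  # posterior precision matrix
        return a, beta * np.linalg.solve(a, fty)  # and posterior mean

    for _ in range(max_iter):
        _, m = posterior(alpha, beta)
        gamma = np.sum(beta * eig / (alpha + beta * eig))  # effective number of parameters
        residual = np.sum((y - features @ m) ** 2)
        new_alpha = gamma / (m @ m)
        new_beta = (n - gamma) / residual
        done = abs(new_alpha - alpha) <= tol * alpha and abs(new_beta - beta) <= tol * beta
        alpha, beta = new_alpha, new_beta
        if done:
            break

    a, m = posterior(alpha, beta)
    residual = np.sum((y - features @ m) ** 2)
    _, logdet = np.linalg.slogdet(a)
    evidence = (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta)
                - 0.5 * beta * residual - 0.5 * alpha * (m @ m)
                - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi))
    return evidence / n


rng = np.random.default_rng(0)
x_good = rng.normal(size=(200, 5))  # features that explain the labels
y = x_good @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
x_bad = rng.normal(size=(200, 5))   # features unrelated to the labels
scores = {"model_a": log_evidence(x_good, y), "model_b": log_evidence(x_bad, y)}
print(max(scores, key=scores.get))  # model_a
```

Because only a closed-form linear model is fitted per candidate, ranking a whole hub costs one forward pass per model plus a small iterative solve. This is far cheaper than fine-tuning every candidate.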

preprint2022arXiv

Region-Aware Face Swapping

This paper presents a novel Region-Aware Face Swapping (RAFSwap) network that achieves identity-consistent, harmonious, high-resolution face generation in a local-global manner. 1) A Local Facial Region-Aware (FRA) branch augments local identity-relevant features by introducing a Transformer to effectively model misaligned cross-scale semantic interaction. 2) A Global Source Feature-Adaptive (SFA) branch further complements global identity-relevant cues for generating identity-consistent swapped faces. In addition, we propose a Face Mask Predictor (FMP) module, incorporated with StyleGAN2, that predicts identity-relevant soft facial masks in an unsupervised manner, which is more practical for generating harmonious high-resolution faces. Extensive qualitative and quantitative experiments demonstrate that our method generates more identity-consistent high-resolution swapped faces than SOTA methods. For example, it achieves an ID retrieval score of 96.70, outperforming the SOTA MegaFS by 5.87.

preprint2022arXiv

Revisiting Item Promotion in GNN-based Collaborative Filtering: A Masked Targeted Topological Attack Perspective

Graph neural network (GNN) based collaborative filtering (CF) has attracted increasing attention on e-commerce and social media platforms. However, little effort has been made to evaluate the robustness of such CF systems in deployment. Fundamentally different from existing attacks, this work revisits the item promotion task and, for the first time, reformulates it from a targeted topological attack perspective. Specifically, we first develop a targeted attack formulation that maximally increases a target item's popularity. We then leverage gradient-based optimization to find a solution. However, we observe that the gradient estimates are often noisy due to the discrete nature of graphs, which degrades attack ability. To resolve these noisy gradient effects, we propose a masked attack objective that remarkably enhances topological attack ability. Furthermore, we design a computationally efficient approach to the proposed attack, making it feasible to evaluate large-scale CF systems. Experiments on two real-world datasets show the effectiveness of our attack in analyzing the robustness of GNN-based CF more practically.

preprint2022arXiv

Robust photon-efficient imaging using a pixel-wise residual shrinkage network

Single-photon light detection and ranging (LiDAR) has been widely applied to 3D imaging in challenging scenarios. However, limited signal photon counts and high noise in the collected data make it very difficult to predict depth images precisely. In this paper, we propose a pixel-wise residual shrinkage network for photon-efficient imaging from high-noise data. The network adaptively generates optimal thresholds for each pixel and denoises intermediate features by soft thresholding. In addition, redefining the optimization target as pixel-wise classification gives a clear advantage over existing research in producing confident and accurate depth estimates. Comprehensive experiments on both simulated and real-world datasets demonstrate that the proposed model outperforms state-of-the-art methods and maintains robust imaging performance under different signal-to-noise ratios, including the extreme case of 1:100.
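
The soft-thresholding operator mentioned above is standard: $\operatorname{sign}(x)\max(|x| - \tau, 0)$. The following sketch shows it with one threshold per pixel. In the paper the thresholds are generated adaptively by the network. Here they are simply passed in as an array, which is an assumption. The depth-as-classification helper assumes the classes are discretized time-of-flight bins. That is also our assumption, and both function names are hypothetical.

```python
import numpy as np


def soft_threshold(features, thresholds):
    """Element-wise soft thresholding: sign(x) * max(|x| - tau, 0).

    `thresholds` broadcasts against `features`, so an (H, W) array gives every
    pixel its own tau across all channels of a (C, H, W) feature map.
    """
    return np.sign(features) * np.maximum(np.abs(features) - thresholds, 0.0)


def depth_from_logits(logits):
    """Treat depth as classification over time bins: pick the most likely bin."""
    return np.argmax(logits, axis=0)  # logits: (num_bins, H, W) -> (H, W) bin indices


print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))
feats = np.random.default_rng(0).normal(size=(4, 8, 8))
taus = np.full((8, 8), 0.5)  # one threshold per pixel (all equal here)
print(soft_threshold(feats, taus).shape)  # (4, 8, 8)
```

Soft thresholding zeroes small activations, which are likely to be noise, and shrinks larger ones by the same amount. Per-pixel thresholds let the network denoise more aggressively where a pixel's signal is weak.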

preprint2022arXiv

Scatter Points in Space: 3D Detection from Multi-view Monocular Images

3D object detection from monocular image(s) is a challenging and long-standing problem in computer vision. To combine information from different perspectives without troublesome 2D instance tracking, recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space, which is inefficient. In this paper, we attempt to improve multi-view feature aggregation with a learnable keypoint sampling method that scatters pseudo surface points in 3D space in order to preserve data sparsity. The scattered points, augmented with multi-view geometric constraints and visual features, are then used to infer object locations and shapes in the scene. To compensate for the limitations of a single frame and to model multi-view geometry explicitly, we further propose a surface filter module for noise suppression. Experimental results show that our method achieves significantly better 3D detection performance than previous works (more than 0.1 AP improvement on some categories of ScanNet). The code will be publicly available.

preprint2022arXiv

SCSNet: An Efficient Paradigm for Learning Simultaneously Image Colorization and Super-Resolution

In practical applications that restore low-resolution grayscale images, we generally need to run three separate processes: image colorization, super-resolution, and down-sampling for the target device. However, this pipeline is redundant and inefficient because the processes run independently, even though some inner features could be shared. Therefore, we present an efficient paradigm for Simultaneous Image Colorization and Super-resolution (SCS) and propose an end-to-end SCSNet to achieve this goal. The proposed method consists of two parts. A colorization branch learns color information and employs the proposed plug-and-play Pyramid Valve Cross Attention (PVCAttn) module to aggregate feature maps between source and reference images. A super-resolution branch integrates color and texture information to predict target images and uses the designed Continuous Pixel Mapping (CPM) module to predict high-resolution images at continuous magnification. Furthermore, SCSNet supports both automatic and referential modes, which makes it more flexible for practical applications. Extensive experiments demonstrate that our method generates more authentic images than state-of-the-art methods. For example, it decreases FID on average by 1.8 and 5.1 compared with the current best scores for the automatic and referential modes, respectively, while using more than 2× fewer parameters and running more than 3× faster.

preprint2022arXiv

Stability and Generalization of Differentially Private Minimax Problems

In machine learning, many problems can be formulated as minimax problems, including reinforcement learning and generative adversarial networks, to name just a few. As a result, minimax problems have attracted a great deal of attention from researchers in recent decades. However, relatively little work has studied privacy in the general minimax paradigm. In this paper, we focus on privacy in the general minimax setting by combining differential privacy with the minimax optimization paradigm. In addition, using algorithmic stability theory, we theoretically analyze the high-probability generalization performance of the differentially private minimax algorithm under the strongly-convex-strongly-concave condition. To the best of our knowledge, this is the first analysis of the generalization performance of the general minimax paradigm that takes differential privacy into account.
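
To make the setting concrete, the following sketch runs gradient descent-ascent on a toy strongly-convex-strongly-concave objective, $f(x, y) = x^2/2 + xy - y^2/2$, which has its saddle point at $(0, 0)$. Gaussian noise is added to each gradient, following the noisy-gradient pattern common in differentially private optimization. This is not the paper's algorithm. In particular, `noise_std` is not calibrated to any $(\varepsilon, \delta)$ budget, and the objective, step size, and function names are assumptions made for illustration.

```python
import numpy as np


def grad_x(x, y):
    return x + y  # d/dx of f(x, y) = x^2/2 + x*y - y^2/2


def grad_y(x, y):
    return x - y  # d/dy of the same f


def noisy_gda(x0, y0, lr=0.1, steps=500, noise_std=0.0, seed=0):
    """Simultaneous gradient descent (on x) / ascent (on y) with Gaussian gradient noise."""
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    for _ in range(steps):
        gx = grad_x(x, y) + noise_std * rng.normal()
        gy = grad_y(x, y) + noise_std * rng.normal()
        x, y = x - lr * gx, y + lr * gy
    return x, y


print(noisy_gda(3.0, -2.0))                 # converges to the saddle point (0, 0)
print(noisy_gda(3.0, -2.0, noise_std=1.0))  # hovers near (0, 0)
```

Strong convexity-concavity makes each GDA step a contraction (here by a factor of about 0.91), so the injected noise stays bounded in effect. This is the kind of stability that algorithmic-stability arguments build on.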

preprint2022arXiv

Stable Ferromagnetism and High Curie Temperature in VGe$_2$N$_4$

The discovery of the monolayer MA$_2$Z$_4$ family (M = transition metals; A = IVA elements; Z = VA elements; Science 369, 2020, 670-674) has led to another advance in facilitating and harnessing magnetism in low-dimensional materials. However, only Cr- and V-based MA$_2$N$_4$ compounds exhibit intrinsic magnetism, and their magnetic ordering temperatures are unsatisfactory. Herein, by means of first-principles calculations, we identify a stable ferromagnetic member of this family, the VGe$_2$N$_4$ monolayer. We find that the magnetic configuration is sustained under both compressive and tensile uniaxial in-plane strain, and that compressive strain acts as a positive modulator that enhances the magnetic ordering temperature (Tc). Electronic structure calculations reveal a large band gap in the spin-down channel but no gap in the spin-up channel. This impressive near-half-metallic character makes the monolayer a favorable candidate for spintronic devices.

preprint2022arXiv

Superconducting LaP2H2 with graphenelike phosphorus layers

Novel structural building blocks in compounds can induce interesting physical and chemical properties. Although phosphorus tends to form a wide variety of motifs, its lone-pair electrons have always prevented the formation of graphenelike structures. Here, first-principles swarm structural calculations allow us to predict the stability of pressure-induced hexagonal LaP2H2 containing graphenelike phosphorus, which derives from the trigonal bipyramidal configuration of P atoms regulated by symmetric hydrogen bonds. LaP2 in LaP2H2 has the same configuration as MgB2, and the P and H atoms form a three-dimensional framework like that of H3S. Interestingly, LaP2H2 shows superconductivity dominated by the graphenelike phosphorus layer and its coupling with La atoms. Moreover, LaP2H2 is not only superconducting at a lower pressure than the H-rich LaPH6, but also has a superconducting transition temperature three times higher. Our work provides an example that extends the landscape of conventional superconductors to lower pressures.

preprint2022arXiv

SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud

Poles and building edges are frequently observed objects on urban roads and convey reliable hints for various computer vision tasks. To repeatably extract them as features and associate them across discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point clouds. To train our model without a time-consuming and tedious data labeling process, we first generate synthetic primitives for the basic appearance of target lines. We then build an iterative line auto-labeling process that gradually refines line labels on real LiDAR scans. Our segmentation model can extract lines under arbitrary scale perturbations, and we use shared EdgeConv encoder layers to train the segmentation and descriptor heads jointly. Based on this model, we build a highly available global registration module for point cloud registration that works without initial transformation hints. Experiments demonstrate that our line-based registration method is highly competitive with state-of-the-art point-based approaches. Our code is available at https://github.com/zxrzju/SuperLine3D.git.

preprint2022arXiv

Thoughts on the Consistency between Ricci Flow and Neural Network Behavior

The Ricci flow is a partial differential equation that evolves the metric of a Riemannian manifold to make it more regular. Neural networks appear to exhibit similar geometric behavior on specific tasks. In this paper, we construct the linearly nearly Euclidean manifold as a background for observing both the evolution of Ricci flow and the training of neural networks. Under the Ricci-DeTurck flow, we prove the dynamical stability and convergence of the linearly nearly Euclidean metric under an $L^2$-norm perturbation. In practice, from the viewpoints of information geometry and mirror descent, we derive the steepest descent gradient flow for neural networks on the linearly nearly Euclidean manifold. During training, we observe that the network's metric also converges regularly to the linearly nearly Euclidean metric, which is consistent with the convergence of linearly nearly Euclidean metrics under the Ricci-DeTurck flow.
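
For reference, here are the standard forms of the two flows named above. These are textbook definitions, not the paper's derivations. Hamilton's Ricci flow evolves a Riemannian metric $g$ by

$$
\partial_t g_{ij} = -2 R_{ij}.
$$

The Ricci-DeTurck flow adds a Lie-derivative term built from a fixed background metric $\tilde g$, which makes the equation strictly parabolic:

$$
\partial_t g_{ij} = -2 R_{ij} + \nabla_i W_j + \nabla_j W_i,
\qquad
W_j = g_{jk}\, g^{pq}\left(\Gamma^k_{pq} - \tilde\Gamma^k_{pq}\right).
$$

Stability results of the kind stated in the abstract typically consider a perturbed metric $g = \tilde g + h$ with $h$ small in a chosen norm (here an $L^2$-type norm) and show that the flow drives it back toward the background.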

preprint2022arXiv

Towards Efficient and Scalable Sharpness-Aware Minimization

Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of the loss landscape with generalization, has demonstrated significant performance boosts when training large-scale models such as vision transformers. However, the SAM update rule requires two sequential (non-parallelizable) gradient computations at each step, which can double the computational overhead. In this paper, we propose a novel algorithm, LookSAM, that computes the inner gradient ascent only periodically, significantly reducing the additional training cost of SAM. Empirical results show that LookSAM achieves accuracy gains similar to SAM's while being much faster, with computational complexity comparable to first-order optimizers such as SGD or Adam. To further evaluate the performance and scalability of LookSAM, we incorporate a layer-wise modification and perform experiments in the large-batch training scenario, which is more prone to converging to sharp local minima. We are the first to successfully scale up the batch size when training Vision Transformers (ViTs). With a 64k batch size, we are able to train ViTs from scratch in minutes while maintaining competitive performance.
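
The following sketch illustrates the cost argument above. Plain SAM evaluates two gradients per step: one at $w$, and one at the perturbed point $w + \rho\, g/\lVert g\rVert$. The periodic variant computes the SAM gradient only every `k` steps and reuses the last correction `g_sam - g` in between. This reuse rule is a simplified stand-in for "only periodically computing the inner gradient ascent", not LookSAM's exact update. The quadratic loss, `rho`, `lr`, and `k` are illustrative assumptions.

```python
import numpy as np


def sam_gradient(grad_fn, w, rho=0.05):
    """Return (plain gradient, SAM gradient at the ascent-perturbed weights)."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # inner ascent step of radius rho
    return g, grad_fn(w + eps)


def train(grad_fn, w0, lr=0.1, steps=100, rho=0.05, k=1):
    """k = 1: plain SAM (two gradient evaluations every step).
    k > 1: compute the SAM gradient only every k steps and reuse its correction."""
    w = np.array(w0, dtype=float)
    correction = np.zeros_like(w)
    grad_evals = 0
    for t in range(steps):
        if t % k == 0:
            g, g_sam = sam_gradient(grad_fn, w, rho)
            grad_evals += 2
            correction = g_sam - g  # remembered sharpness-aware correction
            update = g_sam
        else:
            update = grad_fn(w) + correction  # one gradient evaluation only
            grad_evals += 1
        w -= lr * update
    return w, grad_evals


A = np.diag([1.0, 5.0])  # toy quadratic loss 0.5 * w^T A w


def quadratic_grad(w):
    return A @ w


for k in (1, 5):
    w, evals = train(quadratic_grad, [1.0, 1.0], k=k)
    print(k, evals, np.round(w, 3))
```

With `steps=100`, plain SAM uses 200 gradient evaluations while `k=5` uses 120. Larger `k` moves the cost toward that of plain gradient descent.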

preprint2022arXiv

Towards Practical Differential Privacy in Data Analysis: Understanding the Effect of Epsilon on Utility in Private ERM

In this paper, we focus on private Empirical Risk Minimization (ERM), one of the most commonly used data analysis methods. We take a first step toward practical differential privacy by theoretically exploring the effect of epsilon (the differential privacy parameter that determines the strength of the privacy guarantee) on the utility of the learning model. We trace how utility changes as epsilon varies and reveal an established relationship between epsilon and utility. We then formalize this relationship and propose a practical approach for estimating utility under an arbitrary value of epsilon. Both theoretical analysis and experimental results demonstrate high estimation accuracy and broad applicability of our approach in practical applications. Algorithms that provide strong utility guarantees while also preserving privacy where possible are becoming increasingly accepted. Our approach therefore has high practical value and is likely to appeal to companies and organizations that want to preserve privacy but are unwilling to compromise on utility.

preprint2022arXiv

Two-Dimensional Ferromagnetic Half-Metallic Janus V2AsP Monolayer

Two-dimensional (2D) ferromagnetic materials are promising candidates for spintronic devices, and half-metallic materials with 100% spin polarization at the Fermi level are highly desired for many spin-based devices. 2D Janus materials have attracted great attention in recent years due to the excellent properties induced by their broken symmetry. Here, using density functional theory, we report that the Janus V2AsP monolayer exhibits an attractive ferromagnetic half-metallic character. It is dynamically stable, as shown by the absence of imaginary phonon frequencies. The V2AsP monolayer has a half-metallic gap of about 0.38 eV and a spin splitting of about 1.34 eV. Interestingly, a tensile strain of 4.9% can induce a phase transition from the ferromagnetic to the antiferromagnetic state. Moreover, the Curie temperature (Tc) increases with increasing compressive strain. All these appealing properties make the half-metallic Janus V2AsP monolayer a promising material for 2D spintronic applications.

preprint2022arXiv

Understanding the Generalization Performance of Spectral Clustering Algorithms

The theoretical analysis of spectral clustering has mainly focused on consistency, while relatively little research has addressed its generalization performance. In this paper, we study the excess risk bounds of two popular spectral clustering algorithms: relaxed RatioCut and relaxed NCut. First, we show that the excess risk between the empirical continuous optimal solution and the population-level continuous optimal solution has an $\mathcal{O}(1/\sqrt{n})$ convergence rate, where $n$ is the sample size. Second, we identify the fundamental quantity influencing the excess risk between the empirical discrete optimal solution and the population-level discrete optimal solution. At the empirical level, algorithms can be designed to reduce this quantity. Based on our theoretical analysis, we propose two novel algorithms that not only penalize this quantity, but also cluster out-of-sample data without re-running the eigendecomposition on the overall sample. Experiments verify the effectiveness of the proposed algorithms.
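
For readers less familiar with the two relaxations, the following sketch shows the standard textbook versions for two clusters. Relaxed RatioCut uses the second eigenvector of the unnormalized Laplacian $L = D - W$. Relaxed NCut uses the second generalized eigenvector of $Lv = \lambda Dv$, computed here via the symmetric normalized Laplacian. Thresholding the eigenvector at zero is a simple discretization and an assumption on our part. The Gaussian affinity and `sigma` are illustrative, and the paper's penalized and out-of-sample algorithms are not reproduced.

```python
import numpy as np


def gaussian_affinity(points, sigma):
    """Dense Gaussian similarity matrix W with zero diagonal."""
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w


def relaxed_ratiocut(points, sigma=2.0):
    """k = 2 relaxed RatioCut: sign of the Fiedler vector of L = D - W."""
    w = gaussian_affinity(points, sigma)
    lap = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)


def relaxed_ncut(points, sigma=2.0):
    """k = 2 relaxed NCut: second eigenvector of L v = lambda D v, via L_sym."""
    w = gaussian_affinity(points, sigma)
    d = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    l_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(l_sym)
    v = d_inv_sqrt * vecs[:, 1]  # map back: v = D^{-1/2} u solves L v = lambda D v
    return (v > 0).astype(int)


rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), rng.normal(6.0, 0.3, size=(20, 2))])
print(relaxed_ratiocut(points))
print(relaxed_ncut(points))
```

Both functions need an eigendecomposition of an $n \times n$ matrix built from the whole sample. That cost is why clustering new points without re-running the eigendecomposition, as the abstract's algorithms do, is useful.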

preprint2021arXiv

A Quantitative Metric for Privacy Leakage in Federated Learning

In a federated learning system, parameter gradients are shared among participants and the central server, while the original data never leave their protected source domain. However, the gradients themselves might carry enough information for precise inference of the original data. By reporting their parameter gradients to the central server, clients expose their datasets to inference attacks from adversaries. In this paper, we propose a quantitative metric based on mutual information that clients can use to evaluate the potential risk of information leakage in their gradients. Mutual information has received increasing attention in the machine learning and data mining community over the past few years, but existing mutual information estimation methods cannot handle high-dimensional variables. We therefore propose a novel method to approximate the mutual information between high-dimensional gradients and batched input data. Experimental results show that the proposed metric reliably reflects the extent of information leakage in federated learning. In addition, using the proposed metric, we investigate the factors that influence the risk level. We find that the risk of information leakage is related to the state of the task model as well as the inherent data distribution.

preprint2021arXiv

Cocktail Edge Caching: Ride Dynamic Trends of Content Popularity with Ensemble Learning

Edge caching will play a critical role in facilitating emerging content-rich applications. However, it faces many new challenges, in particular highly dynamic content popularity and heterogeneous caching configurations. In this paper, we propose Cocktail Edge Caching (CEC), which tackles dynamic popularity and heterogeneity through ensemble learning. Instead of trying to find a single dominant caching policy for all caching scenarios, we employ an ensemble of constituent caching policies and adaptively select the best-performing policy to control the cache. Toward this goal, we first show through formal analysis and experiments that different variations of the LFU and LRU policies have complementary performance in different caching scenarios. We further develop a novel caching algorithm that enhances LFU/LRU with time-series analysis based on a deep recurrent neural network (LSTM). Finally, we develop a deep reinforcement learning agent that adaptively combines base caching policies according to their virtual hit ratios on parallel virtual caches. Through extensive experiments driven by real content requests from two large video streaming platforms, we demonstrate that CEC not only consistently outperforms all single policies but also improves their robustness. CEC generalizes well to different caching scenarios with low computational overhead for deployment.
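The sketch below illustrates only the idea of running constituent policies on parallel virtual caches and comparing their virtual hit ratios. It makes strong simplifying assumptions: plain LRU and LFU stand in for the paper's policy variants (no LSTM enhancement), and a greedy per-window choice of the best hit ratio replaces the deep reinforcement learning agent. All class and function names are hypothetical.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Virtual LRU cache: tracks which keys it would hold, not the content."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return True
        self.store[key] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key
        return False


class LFUCache:
    """Virtual LFU cache; request counts are kept for every key ever seen."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}  # dict used as an insertion-ordered set
        self.freq = Counter()

    def request(self, key):
        self.freq[key] += 1
        if key in self.store:
            return True
        if len(self.store) >= self.capacity:
            # Evict the least frequently requested key (ties: oldest insertion).
            victim = min(self.store, key=lambda k: self.freq[k])
            del self.store[victim]
        self.store[key] = True
        return False


def select_policy(requests, capacity, window):
    """Feed every request to all virtual caches and, at the end of each window,
    pick the policy with the highest virtual hit ratio in that window."""
    caches = {"LRU": LRUCache(capacity), "LFU": LFUCache(capacity)}
    hits = dict.fromkeys(caches, 0)
    choices = []
    for t, key in enumerate(requests, start=1):
        for name, cache in caches.items():
            hits[name] += int(cache.request(key))
        if t % window == 0:
            ratios = {name: h / window for name, h in hits.items()}
            choices.append(max(ratios, key=ratios.get))
            hits = dict.fromkeys(caches, 0)
    return choices


# Window 1: two stable popular items interrupted by a scan (favours LFU).
# Window 2: popularity shifts to new items (favours LRU).
stream = [0, 1, 0, 1, 2, 3, 0, 1] + [5, 6] * 4
print(select_policy(stream, capacity=2, window=8))
```

With this stream the selected policy switches from LFU in the first window to LRU in the second, which is the kind of complementary behaviour the ensemble is meant to exploit.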

preprint2021arXiv

Discovery of carbon-based strongest and hardest amorphous material

Carbon is likely the most fascinating element of the periodic table because of the diversity of its allotropes, which stems from its variable (sp, sp2, and sp3) bonding motifs. Exploration of new forms of carbon has long been a central theme of contemporary scientific research. Here we report novel amorphous carbon phases containing a high fraction of sp3-bonded atoms, recovered after compressing fullerene C60 to previously unexplored high pressures and temperatures. The synthesized carbons are the hardest and strongest amorphous materials known to date: they can scratch diamond crystal and approach its strength, as evidenced by complementary mechanical tests. Photoluminescence and absorption spectra show that the materials are semiconductors with tunable bandgaps in the range of 1.5-2.2 eV, comparable to that of amorphous silicon. This combination of outstanding mechanical and electronic properties makes this class of amorphous carbons an excellent candidate for photovoltaic applications demanding ultrahigh strength and wear resistance.

preprint2021arXiv

FlowMOT: 3D Multi-Object Tracking by Scene Flow Association

Most end-to-end Multi-Object Tracking (MOT) methods suffer from low accuracy and poor generalization ability. Although traditional filter-based methods can achieve better results, their hyperparameters are difficult to tune optimally, and they often fail in varying scenarios. To alleviate these drawbacks, we propose a LiDAR-based 3D MOT framework named FlowMOT, which integrates point-wise motion information with a traditional matching algorithm to make motion prediction more robust. We first use a scene flow estimation network to obtain implicit motion information between two adjacent frames and compute the predicted detection for each existing tracklet from the previous frame. We then use the Hungarian algorithm, together with an ID propagation strategy, to generate optimal matching relations and complete the tracking task. Experiments on the KITTI MOT dataset show that our approach outperforms recent end-to-end methods and achieves performance competitive with the state-of-the-art filter-based method. In addition, our method works steadily in scenarios with varying speeds, where filter-based methods may fail.
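The association step described above can be sketched as follows, under several assumptions not taken from the paper: each tracklet is reduced to a 3D box centre, the scene flow estimation network is replaced by a given mean flow vector per tracklet, the matching cost is the Euclidean distance between predicted and detected centres, and a fixed distance gate `max_dist` rejects implausible matches. The `hungarian` routine is the standard potential-based O(n^3) formulation; `associate` and its arguments are hypothetical names.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.
    Returns, for each row, the index of the column assigned to it."""
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def associate(tracks, flows, detections, max_dist):
    """tracks: {track_id: centre}; flows: {track_id: flow vector};
    detections: list of centres. Returns {track_id: centre} for this frame."""
    ids = list(tracks)
    predicted = [tuple(c + f for c, f in zip(tracks[t], flows[t])) for t in ids]
    result, matched = {}, set()
    if ids and detections:
        # Pad to a square matrix; padded cells exceed the gate and are never kept.
        size = max(len(ids), len(detections))
        cost = [[max_dist + 1.0] * size for _ in range(size)]
        for i, pred in enumerate(predicted):
            for j, det in enumerate(detections):
                cost[i][j] = math.dist(pred, det)
        for i, j in enumerate(hungarian(cost)):
            if i < len(ids) and j < len(detections) and cost[i][j] <= max_dist:
                result[ids[i]] = detections[j]  # ID propagation
                matched.add(j)
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in matched:  # an unmatched detection starts a new track
            result[next_id] = det
            next_id += 1
    return result
```

Matched detections inherit the tracklet ID, while detections without a gated match start new tracks; handling of tracklets that disappear is left out of this sketch.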

preprint2021arXiv

Generalized Adler-Moser Polynomials and Multiple vortex rings for the Gross-Pitaevskii equation

New finite-energy traveling wave solutions with small speed are constructed for the three-dimensional Gross-Pitaevskii equation \begin{equation*} i\Psi_t = \Delta\Psi + (1-|\Psi|^2)\Psi, \end{equation*} where $\Psi$ is a complex-valued function defined on ${\mathbb R}^3\times{\mathbb R}$. These solutions have the shape of $2n+1$ vortex rings, far away from each other. Among these vortex rings, $n+1$ have positive orientation and the other $n$ have negative orientation. The locations of these rings are described by the roots of a sequence of polynomials with rational coefficients. The polynomials found here can be regarded as a generalization of the classical Adler-Moser polynomials and can be expressed as the Wronskian of certain very special functions. The techniques used in the derivation of these polynomials may be of independent interest.
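For reference, the classical Adler-Moser polynomials that this abstract generalizes can be generated by the recursion below, starting from $\theta_0 = 1$ and $\theta_1 = x$; each step introduces one free integration constant (here called $\tau_2, \tau_3, \dots$). The generalized polynomials constructed in the paper are not reproduced here.

```latex
\theta_{n+1}'\,\theta_{n-1} - \theta_{n+1}\,\theta_{n-1}' = (2n+1)\,\theta_n^2,
\qquad \theta_0 = 1, \quad \theta_1 = x,

\theta_2 = x^3 + \tau_2, \qquad
\theta_3 = x^6 + 5\tau_2 x^3 + \tau_3 x - 5\tau_2^2.
```

Substituting $\theta_3$ back into the recursion with $n = 2$ gives $\theta_3'\,x - \theta_3 = 5(x^3 + \tau_2)^2 = 5\theta_2^2$, as required.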

preprint2021arXiv

Keyword-Guided Neural Conversational Model

We study the problem of imposing conversational goals/keywords on open-domain conversational agents, where the agent is required to lead the conversation to a target keyword smoothly and quickly. Solving this problem enables the application of conversational agents in many real-world scenarios, e.g., recommendation and psychotherapy. The dominant paradigm for tackling this problem is to 1) train a next-turn keyword classifier, and 2) train a keyword-augmented response retrieval model. However, existing approaches in this paradigm have two limitations: 1) the training and evaluation datasets for next-turn keyword classification are extracted directly from conversations without human annotations, so they are noisy and correlate poorly with human judgements, and 2) during keyword transition, the agents rely solely on the similarities between word embeddings to move closer to the target keyword, which may not reflect how humans converse. In this paper, we assume that human conversations are grounded in commonsense and propose a keyword-guided neural conversational model that can leverage external commonsense knowledge graphs (CKG) for both keyword transition and response retrieval. Automatic evaluations suggest that commonsense improves the performance of both next-turn keyword prediction and keyword-augmented response retrieval. In addition, both self-play and human evaluations show that our model produces responses with smoother keyword transitions and reaches the target keyword faster than competitive baselines.

preprint2021arXiv

Learning Hierarchical Review Graph Representations for Recommendation

User review data have been shown to be effective in solving various recommendation problems. Previous review-based recommendation methods usually employ sophisticated compositional models, such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), to learn semantic representations from review data. However, these methods mainly capture the local dependency between neighbouring words in a word window, and they treat each review equally. Therefore, they may not be effective in capturing the global dependency between words, and they are easily biased by noisy review information. In this paper, we propose a novel review-based recommendation model, named Review Graph Neural Network (RGNN). Specifically, RGNN builds a review graph for each individual user/item, which provides a global view of the user/item properties and helps weaken the biases caused by noisy review information. A type-aware graph attention mechanism is developed to learn semantic embeddings of words. Moreover, a personalized graph pooling operator is proposed to learn hierarchical representations of the review graph, which form the semantic representation of each user/item. We compared RGNN with state-of-the-art review-based recommendation approaches on two real-world datasets. The experimental results indicate that RGNN consistently outperforms the baseline methods in terms of Mean Square Error (MSE).

preprint2021arXiv

Pre-training Graph Transformer with Multimodal Side Information for Recommendation

Side information of items, e.g., images and text descriptions, has been shown to contribute to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and item relationships. We relate items by common user activities, e.g., co-purchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their associated multimodal side information. We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item. The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction. Experimental results on real datasets demonstrate that the proposed PMGT model effectively exploits multimodal side information to achieve better accuracy in downstream tasks including item recommendation, item classification, and click-through rate prediction. We also report a case study of testing the proposed PMGT model in an online setting with 600 thousand users.

preprint2021arXiv

Quantum spin Hall effect in two-dimensional transition-metal chalcogenides

Based on first-principles calculations, we find that a family of 2D transition-metal (TM) chalcogenides MX5 (M = Zr, Hf and X = S, Se, Te) can host the quantum spin Hall (QSH) effect. Molecular dynamics simulations indicate that they are all thermally stable at room temperature, and the largest band gap is 0.19 eV. The electronic properties of the MX5 family are very similar. Without spin-orbit coupling (SOC), single-layer ZrX5 are all gapless semimetals. Including SOC yields insulating phases with band gaps of 0.05 eV (direct), 0.18 eV (direct), and 0.13 eV (indirect) for ZrS5, ZrSe5, and ZrTe5, respectively. The evolution of Wannier charge centers and the edge states confirm that they are all QSH insulators. The QSH effect in ZrX5 originates from the special features of their nonsymmorphic space group. In addition, the QSH state of ZrS5 survives over a large range of strain, as long as the interchain coupling is not strong enough to reverse the band ordering. Under certain strains, single-layer ZrS5 undergoes a topological insulator (TI)-to-semimetal (metal) or metal-to-semimetal transition. Monolayer MX5 expands the family of TI materials based on TM chalcogenides and may open up a new way to fabricate novel low-power spintronic devices operating at room temperature.

preprint2021arXiv

Sparse Reconstruction for Radar Imaging based on Quantum Algorithms

Sparsity-driven radar imaging can obtain high-resolution images of the target scene from down-sampled data. However, the high computational complexity of classical sparse recovery methods in this setting seriously limits the practicality of sparse imaging technology. In this paper, quantum algorithms are applied for the first time to image recovery in radar sparse imaging. First, the radar sparse imaging problem is analyzed and the computational problem to be solved by quantum algorithms is identified. Then, the corresponding quantum circuit and its parameters are designed to ensure extremely low computational complexity, and a quantum-enhanced reconstruction algorithm for sparse imaging is proposed. Finally, the computational complexity of the proposed method is analyzed, and simulation experiments with raw radar data are presented to verify the validity of the proposed method.

preprint2021arXiv

Structure and magnetic properties of melilite-type compounds RE2Be2GeO7 (RE = Pr, Nd, Gd-Yb) with Rare-Earth ions on Shastry-Sutherland lattice

Rare-earth (RE) based frustrated magnets, as typical systems combining strong spin-orbit coupling, geometric frustration, and anisotropic exchange interactions, can give rise to diverse exotic magnetic ground states such as a quantum spin liquid (QSL). The discovery of new RE-based frustrated materials is crucial for exploring exotic magnetic phases. Herein, we report the synthesis, structure, and magnetic properties of a family of melilite-type RE2Be2GeO7 (RE = Pr, Nd, Gd-Yb) compounds that crystallize in a tetragonal structure, in which magnetic RE3+ ions are arranged on a Shastry-Sutherland lattice (SSL) within the ab-plane and are well separated by nonmagnetic GeBe2O7 polyhedra along the c-axis. Temperature-dependent susceptibility and isothermal magnetization M(H) measurements reveal that most RE2Be2GeO7 compounds, with the exception of RE = Tb, show no magnetic ordering down to 2 K despite dominant antiferromagnetic (AFM) interactions, whereas Tb2Be2GeO7 undergoes an AFM transition with Néel temperature TN ≈ 2.5 K and shows field-induced spin-flop behavior (T < TN). In addition, the magnetic entropy change calculated from the isothermal M(H) curves reveals a viable magnetocaloric effect (MCE) for RE2Be2GeO7 (RE = Gd, Dy) in the liquid-helium temperature regime: Gd2Be2GeO7 shows a maximum Sm of up to 54.8 J K-1 kg-1 at H = 7 T, and Dy2Be2GeO7 has the largest value in this family at H = 2 T, with Sm = 16.1 J K-1 kg-1. Moreover, the rich diversity of RE ions in this family provides an archetype for exploring exotic quantum magnetic phenomena with widely variable spins located on the SSL.

preprint2021arXiv

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

In this paper we tackle the problem of pose-guided person image generation, which aims to transfer a person image from a source pose to a novel target pose while maintaining the source appearance. Given the inefficiency of standard CNNs in handling large spatial transformations, we propose a structure-aware flow-based method for high-quality person image generation. Specifically, instead of learning the complex overall pose changes of the human body, we decompose the human body into different semantic parts (e.g., head, torso, and legs) and apply different networks to predict the flow fields for these parts separately. Moreover, we carefully design the network modules to effectively capture the local and global semantic correlations of features within and among the human parts, respectively. Extensive experimental results show that our method can generate high-quality results under large pose discrepancies and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.

preprint2021arXiv

Superconductivity in graphite-diamond hybrid

The search for new high-temperature superconductors and insight into their superconducting mechanisms are of fundamental importance in condensed matter physics. The discovery of near-room-temperature superconductivity at more than a million atmospheres ushers in a new era for superconductors. However, the critical task of identifying materials with comparable superconductivity at or near ambient pressure remains. Carbon materials often lead to intriguing surprises because of their structural diversity and tunable electronic properties. Insulating diamond has been driven into a superconducting state by doping or external stimuli, so there is still a good opportunity to find carbon superconductors with higher transition temperatures (Tc). Here, using first-principles calculations, we report intrinsic superconductivity in a graphite-diamond hybrid whose atomic-resolution structure has recently been determined experimentally. The predicted Tc is approximately 39 K at ambient pressure, and strain can further boost Tc to 42 K. Strong electron-phonon coupling associated with the out-of-plane vibration of carbon atoms at the junction plays a dominant role in the superconducting transition. Our work demonstrates the great potential of such carbon materials as high-Tc superconductors, which should attract extensive research.

preprint2021arXiv

Two-dimensional antiferromagnetic semiconductor T'-MoTeI from first principles

Two-dimensional intrinsic antiferromagnetic semiconductors are expected to stand out in the field of spintronics. Using first-principles calculations, the present work finds that monolayer T'-MoTeI is an intrinsic antiferromagnetic semiconductor. First, the dimerized distortion of the Mo atoms makes T'-MoTeI dynamically stable, in contrast to T-MoTeI, whose phonon spectrum shows small imaginary frequencies. Second, T'-MoTeI is an indirect-bandgap semiconductor with a bandgap of 1.35 eV. Finally, a systematic study of strain effects shows significant changes in the electronic structure and the bandgap, while the antiferromagnetic ground state is unaffected. Monte Carlo simulations predict a Néel temperature of 95 K for T'-MoTeI. These results suggest that monolayer T'-MoTeI is a potential candidate for spintronic applications.

preprint2021arXiv

Variational quantum process tomography

Quantum process tomography is an experimental technique for fully characterizing an unknown quantum process. Standard quantum process tomography suffers from an exponential scaling of the number of measurements with system size. In this work, we put forward a quantum machine learning algorithm that approximately encodes the unknown unitary quantum process into a parametric quantum circuit of relatively shallow depth. We demonstrate our method by reconstructing the unitary quantum processes resulting from quantum Hamiltonian evolution and from random quantum circuits of up to $8$ qubits. The results show that these quantum processes can be reconstructed with high fidelity, while the number of input states required is at least $2$ orders of magnitude smaller than that required by standard quantum process tomography.

preprint2020arXiv

A Learning Framework for n-bit Quantized Neural Networks toward FPGAs

The quantized neural network (QNN) is an efficient approach to network compression and can be widely used in FPGA implementations. This paper proposes a novel learning framework for n-bit QNNs whose weights are constrained to powers of two. To solve the vanishing gradient problem, we propose a reconstructed gradient function for QNNs in the back-propagation algorithm that directly obtains the real gradient rather than estimating an approximate gradient of the expected loss. We also propose a novel QNN structure named n-BQ-NN, which uses shift operations to replace multiplications and is more suitable for inference on FPGAs. Furthermore, we design a shift vector processing element (SVPE) array to replace all 16-bit multiplications with SHIFT operations in the convolution operation on FPGAs. We also carry out comparative experiments to evaluate our framework. The experimental results show that the quantized models of ResNet, DenseNet, and AlexNet trained with our learning framework achieve almost the same accuracy as the original full-precision models. Moreover, when our learning framework is used to train our n-BQ-NN from scratch, it achieves state-of-the-art results compared with typical low-precision QNNs. Experiments on the Xilinx ZCU102 platform show that our n-BQ-NN with our SVPE executes 2.9 times faster in inference than with the vector processing element (VPE). Because the SHIFT operations in our SVPE array do not consume Digital Signal Processing (DSP) resources on FPGAs, the experiments also show that the SVPE array reduces average energy consumption to 68.7% of that of the 16-bit VPE array.
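A minimal sketch of the power-of-two idea behind n-BQ-NN, assuming a simple sign-plus-exponent encoding (one sign bit, the remaining codes for magnitudes 1, 1/2, 1/4, ..., plus an exact zero) and integer activations. The paper's actual encoding, its reconstructed gradient, and the SVPE hardware are not modeled, and the function names are hypothetical. Once a weight equals sign * 2^(-k), multiplying an integer activation by it reduces to a right shift by k.

```python
def quantize_pow2(w, n_bits):
    """Map a real weight to (sign, shift) with w ~= sign * 2**(-shift).

    Assumed encoding: one bit for the sign and the remaining codes for the
    magnitudes 1, 1/2, 1/4, ... (2**(n_bits - 1) - 1 of them), plus exact
    zero, which is returned as sign 0.
    """
    n_levels = 2 ** (n_bits - 1) - 1
    candidates = [(0, 0)] + [(1, s) for s in range(n_levels)]
    mag = abs(w)
    # Nearest candidate magnitude in the linear domain (ties: first candidate).
    best = min(candidates, key=lambda c: abs(c[0] * 2.0 ** -c[1] - mag))
    sign = 1 if w >= 0 else -1
    return (sign * best[0], best[1])


def shift_multiply(x, q):
    """Multiply an integer activation x by a quantized weight using a shift only."""
    sign, shift = q
    if sign == 0:
        return 0
    y = x >> shift  # arithmetic right shift: floor(x / 2**shift)
    return y if sign > 0 else -y


weights = [0.3, -0.6, 0.05]
codes = [quantize_pow2(w, n_bits=3) for w in weights]
print(codes)                                   # (sign, shift) pairs
print([shift_multiply(12, q) for q in codes])  # shift-only products
```

With 3 bits the weights 0.3, -0.6 and 0.05 become 1/4, -1/2 and 0, so an activation of 12 yields 3, -6 and 0 using shifts alone.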

preprint2020arXiv

A new family of disorder-free Rare-Earth-based kagomé lattice magnets: structure and magnetic characterizations of RE3BWO9 (RE=Pr, Nd, Gd-Ho) Boratotungstates

Exploration of rare-earth (RE)-based kagome lattice magnets with spin-orbital entangled jeff = 1/2 moments provides a new platform for investigating exotic magnetic phases. Here, we report a new family of RE3BWO9 (RE = Pr, Nd, Gd-Ho) boratotungstates with magnetic RE3+ ions arranged on a kagome lattice, and perform structural and magnetic characterizations. These compounds crystallize in a hexagonal structure with space group P63 (No. 173), where the magnetic RE3+ ions form distorted kagome connections within the ab plane and are stacked in an AB-type fashion along the c axis. The interlayer RE-RE separation is comparable to the intralayer distance, forming a three-dimensional (3D) exchange-coupled magnetic framework of RE3+ ions. The magnetic susceptibility data of RE3BWO9 (RE = Pr, Nd, Gd-Ho) reveal dominant antiferromagnetic interactions between magnetic RE3+ ions, but no visible magnetic ordering down to 2 K. Magnetization analyses for different RE3+ ions show diverse anisotropic behaviors, making RE3BWO9 an appealing kagome-lattice antiferromagnet for exploring exotic magnetic phases.

preprint2020arXiv

Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

In multiagent systems (MASs), each agent makes individual decisions, but all of them contribute globally to the system's evolution. Learning in MASs is difficult because each agent's selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainty increase exponentially with the number of agents. Previous works incorporate various multiagent coordination mechanisms into deep learning architectures to facilitate multiagent coordination. However, none of them explicitly considers the action semantics between agents, i.e., the fact that different actions have different influences on other agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes the influence of different actions on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches compared with several network architectures.

preprint2020arXiv

Adaptive Resource Allocation for Improved DF Aided Downlink Multi-user OFDM Systems

In this letter, we propose a joint resource allocation algorithm for an OFDM-based multi-user system assisted by an improved Decode-and-Forward (DF) relay. We aim to maximize the sum rate of the system by jointly optimizing subcarrier pairing, subcarrier pair-user assignment, and power allocation in such a single-DF-relay system. When the relay does not transmit on some subcarriers in the second phase, we further allow the source to transmit new symbols on these inactive subcarriers. We effectively solve the formulated mixed integer programming problem using continuous relaxation and dual minimization methods. Numerical results verify the theoretical analysis and illustrate the remarkable gains resulting from the extra direct-link transmissions.

preprint2020arXiv

APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals

Audio-guided face reenactment aims to generate photorealistic faces from audio information while maintaining the same facial movements as a real person speaking. However, existing methods either cannot generate vivid face images or can only reenact low-resolution faces, which limits their application value. To solve these problems, we propose a novel deep neural network named APB2Face, which consists of GeometryPredictor and FaceReenactor modules. GeometryPredictor uses extra head pose and blink state signals, as well as audio, to predict latent landmark geometry information, while FaceReenactor takes the face landmark image as input to reenact the photorealistic face. A new dataset, AnnVI, collected from YouTube, is presented to support the approach, and experimental results indicate the superiority of our method over state-of-the-art methods in both authenticity and controllability.

preprint2020arXiv

AttentionAnatomy: A unified framework for whole-body organs at risk segmentation using multiple partially annotated datasets

Organs-at-risk (OAR) delineation in computed tomography (CT) is an important step in Radiation Therapy (RT) planning. Recently, deep learning based methods for OAR delineation have been proposed and applied in clinical practice for separate regions of the human body (head and neck, thorax, and abdomen). However, there is little research on end-to-end whole-body OAR delineation, because the existing datasets are mostly partially or incompletely annotated for such a task. In this paper, our proposed end-to-end convolutional neural network model, called AttentionAnatomy, can be jointly trained with three partially annotated datasets to segment OARs from the whole body. Our main contributions are: 1) an attention module implicitly guided by the body region label to modulate the segmentation branch output; 2) a prediction re-calibration operation, exploiting prior information of the input images, to handle the partial-annotation (HPA) problem; 3) a new hybrid loss function combining batch Dice loss and spatially balanced focal loss to alleviate the organ size imbalance problem. Experimental results show that our proposed framework yields significant improvements in both the Sørensen-Dice coefficient (DSC) and the 95% Hausdorff distance compared with the baseline model.
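The batch Dice term in contribution 3) can be sketched as below, assuming the common definition in which overlap and volume are summed over all samples in the batch before the Dice ratio is taken. The spatially balanced focal term is not reproduced because its weighting is not specified here. Inputs are flattened per-sample lists, and the function names are hypothetical.

```python
def batch_dice_loss(preds, targets, eps=1e-6):
    """Dice loss with intersections and sums pooled over the whole batch.

    preds: per-sample lists of foreground probabilities in [0, 1];
    targets: per-sample lists of 0/1 labels with the same shape.
    """
    inter = sum(p * t for ps, ts in zip(preds, targets) for p, t in zip(ps, ts))
    total = sum(p + t for ps, ts in zip(preds, targets) for p, t in zip(ps, ts))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def mean_sample_dice_loss(preds, targets, eps=1e-6):
    """Conventional variant for comparison: Dice loss per sample, then averaged."""
    losses = [batch_dice_loss([ps], [ts], eps) for ps, ts in zip(preds, targets)]
    return sum(losses) / len(losses)


# Sample 1: organ predicted perfectly. Sample 2: organ absent, one false positive.
preds = [[1, 1, 0, 0], [0, 0, 0, 1]]
targets = [[1, 1, 0, 0], [0, 0, 0, 0]]
print(batch_dice_loss(preds, targets))        # about 0.2
print(mean_sample_dice_loss(preds, targets))  # about 0.5
```

Per-sample averaging assigns a loss close to 1 to the sample in which the organ is absent, whereas batch pooling dilutes that single false-positive voxel; this smoother behaviour for small or missing organs is one common motivation for batch Dice.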

preprint2020arXiv

CL-MAPF: Multi-Agent Path Finding for Car-Like Robots with Kinematic and Spatiotemporal Constraints

Multi-Agent Path Finding has been widely studied in the past few years due to its broad applications in robotics and AI. However, previous solvers rely on several simplifying assumptions, which limit their applicability in numerous real-world domains that adopt nonholonomic car-like agents rather than holonomic ones. In this paper, we give a mathematical formalization of the Multi-Agent Path Finding for Car-Like robots (CL-MAPF) problem. For the first time, we propose a novel hierarchical search-based solver called Car-like Conflict-Based Search to address this problem. It applies a body conflict tree to address collisions, taking the shapes of the agents into account. We introduce a new algorithm called Spatiotemporal Hybrid-State A* as the single-agent path planner to generate paths satisfying both kinematic and spatiotemporal constraints. We also present a sequential planning version of our method for the sake of efficiency. We compare our method with two baseline algorithms on a dedicated benchmark containing 3000 instances and validate it in real-world scenarios. The experimental results give clear evidence that our algorithm scales well to a large number of agents and produces solutions that can be directly applied to car-like robots in the real world. The benchmark and source code are released at https://github.com/APRIL-ZJU/CL-CBS.

preprint2020arXiv

Contextualized Graph Attention Network for Recommendation with Item Knowledge Graph

Graph neural networks (GNN) have recently been applied to exploit knowledge graph (KG) for recommendation. Existing GNN-based methods explicitly model the dependency between an entity and its local graph context in KG (i.e., the set of its first-order neighbors), but may not be effective in capturing its non-local graph context (i.e., the set of most related high-order neighbors). In this paper, we propose a novel recommendation framework, named Contextualized Graph Attention Network (CGAT), which can explicitly exploit both local and non-local graph context information of an entity in KG. Specifically, CGAT captures the local context information by a user-specific graph attention mechanism, considering a user's personalized preferences on entities. Moreover, CGAT employs a biased random walk sampling process to extract the non-local context of an entity, and utilizes a Recurrent Neural Network (RNN) to model the dependency between the entity and its non-local contextual entities. To capture the user's personalized preferences on items, an item-specific attention mechanism is also developed to model the dependency between a target item and the contextual items extracted from the user's historical behaviors. Experimental results on real datasets demonstrate the effectiveness of CGAT, compared with state-of-the-art KG-based recommendation methods.
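The non-local context extraction can be sketched as below. The abstract does not specify the bias, so this sketch assumes a node2vec-style second-order bias (return parameter `p`, in-out parameter `q`) and treats the most frequently visited nodes outside the first-order neighborhood as the non-local context. The RNN that consumes this context is not shown, and all names and default values are hypothetical.

```python
import random
from collections import Counter


def biased_walk(graph, start, length, p, q, rng):
    """One second-order biased walk on an undirected graph {node: set(neighbors)}.
    Weights (assumed): 1/p to return to the previous node, 1 to stay near it,
    1/q to move further away."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nb in nbrs:
            if nb == prev:
                weights.append(1.0 / p)
            elif nb in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def non_local_context(graph, node, num_walks=20, length=6, k=3, p=4.0, q=0.5, seed=0):
    """Most frequently visited nodes that are neither `node` nor its first-order neighbors."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        counts.update(biased_walk(graph, node, length, p, q, rng)[1:])
    excluded = {node} | set(graph[node])
    ranked = [nb for nb, _ in counts.most_common() if nb not in excluded]
    return ranked[:k]


# A small chain 0-1-2-3-4: node 1 is local context for node 0; 2, 3, 4 are non-local.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(non_local_context(chain, 0, k=2))
```

Choosing `q < 1` pushes walks outward, which favours high-order neighbors; the exact ranking depends on the random seed.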

preprint2020arXiv

Controlling Magnetic Order, Magnetic Anisotropy, and Band Topology in Semimetals ${\rm Sr(Mn_{0.9}Cu_{0.1})Sb_2}$ and ${\rm Sr(Mn_{0.9}Zn_{0.1})Sb_2}$

Neutron diffraction and magnetic susceptibility studies show that orthorhombic single crystals of the topological semimetals ${\rm Sr(Mn_{0.9}Cu_{0.1})Sb_2}$ and ${\rm Sr(Mn_{0.9}Zn_{0.1})Sb_2}$ undergo three-dimensional C-type antiferromagnetic (AFM) ordering of the Mn$^{2+}$ moments at $T_N = 200\pm10$ and $210\pm12$ K, respectively, significantly lower than that of the parent SrMnSb$_2$ with $T_N=297 \pm 3$ K. Magnetization versus applied magnetic field (perpendicular to the MnSb planes) below $T_N$ exhibits slightly modified de Haas-van Alphen oscillations for the Zn-doped crystal compared with those of the parent compound. By contrast, the Cu-doped system does not show de Haas-van Alphen oscillations, suggesting either that Cu substitution for Mn substantially changes the electronic structure of the parent compound, or that the Cu sites are strong scatterers of carriers that significantly shorten their mean free path, thus suppressing the oscillations. Density functional theory (DFT) calculations including spin-orbit coupling predict the C-type AFM state for the parent, Cu-, and Zn-doped systems and identify the $a$-axis (i.e., perpendicular to the Mn layer) as the easy magnetization direction for the parent compound and for 12.5% Cu or Zn substitution. In contrast, 25% Cu content changes the easy magnetization direction to the $b$-axis (i.e., within the Mn layer). We find that incorporating Cu and Zn in SrMnSb$_2$ tunes the electronic bands near the Fermi level, resulting in different band topology and semi-metallicity. The parent and Zn-doped systems have coexisting electron and hole pockets with an opened Dirac cone around the Y-point, whereas the Cu-doped system has dominant hole pockets around the Fermi level with a distorted Dirac cone. The tunable electronic structure may help rationalize the experimentally observed de Haas-van Alphen oscillations.

preprint2020arXiv

Convolutional Spectral Kernel Learning

Recently, non-stationary spectral kernels have drawn much attention owing to their powerful feature representation ability in revealing long-range correlations and input-dependent characteristics. However, non-stationary spectral kernels are still shallow models, so they cannot learn hierarchical features or local interdependence. In this paper, to obtain hierarchical and local knowledge, we build an interpretable convolutional spectral kernel network (\texttt{CSKN}) based on the inverse Fourier transform, introducing deep architectures and convolutional filters into non-stationary spectral kernel representations. Moreover, based on Rademacher complexity, we derive generalization error bounds and introduce two regularizers to improve performance. Combining the regularizers with recent advances in random initialization, we complete the learning framework of \texttt{CSKN}. Extensive experimental results on real-world datasets validate the effectiveness of the learning framework and agree with our theoretical findings.

preprint2020arXiv

Dive Deeper Into Box for Object Detection

Anchor-free methods have defined the new frontier in state-of-the-art object detection research, where accurate bounding box estimation is the key to their success. However, even the bounding box with the highest confidence score is still far from perfect at localization. To this end, we propose a box reorganization method (DDBNet) that dives deeper into the box for more accurate localization. First, drifted boxes are filtered out because their contents are inconsistent with the target semantics. Next, the selected boxes are broken into boundaries, and well-aligned boundaries are searched and grouped into optimal boxes that tighten around instances more precisely. Experimental results show that our method is effective and achieves state-of-the-art object detection performance.

preprint2020arXiv

Dynamic Spatio-temporal Graph-based CNNs for Traffic Prediction

Forecasting future traffic flows from previous ones is challenging because of the complex and dynamic nature of their spatio-temporal structures. Most existing graph-based CNNs attempt to capture static relations while largely neglecting the dynamics underlying sequential data. In this paper, we present dynamic spatio-temporal graph-based CNNs (DST-GCNNs), which learn expressive features to represent spatio-temporal structures and predict future traffic flows from surveillance video data. In particular, DST-GCNN is a two-stream network. In the flow prediction stream, we present a novel graph-based spatio-temporal convolutional layer to extract features from a graph representation of traffic flows; several such layers are then stacked to predict future flows over time. Meanwhile, the relations between traffic flows in the graph are often time-variant because traffic conditions change over time. To capture these graph dynamics, the graph prediction stream predicts the dynamic graph structures, which are fed into the flow prediction stream. Experiments on real datasets demonstrate that the proposed model achieves competitive performance compared with other state-of-the-art methods.

preprint2020arXiv

Effect of controlled point-like disorder induced by 2.5 MeV electron irradiation on nematic resistivity anisotropy of hole-doped (Ba,K)Fe$_2$As$_2$

In-plane anisotropy of electrical resistivity was studied in samples of hole-doped Ba$_{1-x}$K$_x$Fe$_2$As$_2$ in the composition range $0.21 \leq x \leq 0.26$, where the anisotropy changes sign. Low-temperature ($\sim$20~K) irradiation with relativistic 2.5 MeV electrons was used to control the level of disorder and the residual resistivity of the samples. A modification of the stress-detwinning technique enabled measurements of the same samples before and after irradiation, leading to the conclusion that the anisotropic scattering processes are predominantly inelastic. Our main finding is that the resistivity anisotropy has the same sign irrespective of residual resistivity, and it remains the same in the orthorhombic $C_2$ phase above the re-entrant tetragonal transition. An unusual $T$-linear dependence of the anisotropy $Δρ\equiv ρ_a(T)-ρ_b(T)$ is found in pristine samples with $x=$0.213 and $x=$0.219, without similar signatures in either $ρ_a(T)$ or $ρ_b(T)$. We show that this feature can be reproduced by the phenomenological model of R.~M.~Fernandes {\it et al.}, Phys. Rev. Lett. {\bf 107}, 217002 (2011). We speculate that the onset of nematic-order fluctuations on approaching the instability towards the re-entrant tetragonal phase contributes to this unusual dependence.

preprint2020arXiv

Effect of dilute magnetism in a topological insulator

Three-dimensional (3D) topological insulators (TIs) have emerged as a unique state of quantum matter and have generated enormous interest in condensed matter physics. The surface of a 3D TI hosts a massless Dirac cone, which is characterized by the Z2 topological invariant. Introducing magnetism on the surface of a TI is essential for realizing the quantum anomalous Hall effect (QAHE) and other novel magneto-electric phenomena. Here, using a combination of first-principles calculations, magneto-transport, angle-resolved photoemission spectroscopy (ARPES), and time-resolved ARPES (tr-ARPES), we study the electronic properties of gadolinium (Gd)-doped Sb2Te3. Our study shows that Gd-doped Sb2Te3 is a spin-orbit-induced bulk band-gap material whose surface is characterized by a single topological surface state. We further demonstrate that introducing dilute 4f-electron magnetism into the Sb2Te3 topological insulator system by Gd doping creates surface magnetism in this system. Our results provide a new platform for investigating the interaction between dilute magnetism and topology in doped topological materials.

preprint2020arXiv

Extended Feature Pyramid Network for Small Object Detection

Small object detection remains an unsolved challenge because it is hard to extract information about small objects that occupy only a few pixels. While scale-level corresponding detection in feature pyramid networks alleviates this problem, we find that the coupling of features at various scales still impairs performance on small objects. In this paper, we propose an extended feature pyramid network (EFPN) with an extra high-resolution pyramid level specialized for small object detection. Specifically, we design a novel module, named feature texture transfer (FTT), which super-resolves features and extracts credible regional details simultaneously. Moreover, we design a foreground-background-balanced loss function to alleviate the area imbalance between foreground and background. In our experiments, the proposed EFPN is efficient in both computation and memory and yields state-of-the-art results on the small traffic-sign dataset Tsinghua-Tencent 100K and on the small-object category of the general object detection dataset MS COCO.

preprint2020arXiv

Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations

Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations, including rotations, scalings, and changes of lighting conditions. We observe that the features of a transformed image are drastically different from the ones of the original image. To make CNNs more invariant to transformations, we propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model (referred to as the "host model"). Each individual lens reconstructs the original features given the features of a transformed image under a particular transformation. These lenses jointly counteract feature distortions caused by various transformations, thus making the host model more robust without retraining. By only updating lenses, the host model is freed from iterative updating when facing new transformations absent in the training data; as feature semantics are preserved, downstream applications, such as classifiers and detectors, automatically gain robustness without retraining. Lenses are trained in a self-supervised fashion with no annotations, by minimizing a novel "Top-K Activation Contrast Loss" between lens-transformed features and original features. Evaluated on ImageNet, MNIST-rot, and CIFAR-10, Feature Lenses show clear advantages over baseline methods.

preprint2020arXiv

First-principles prediction into robust high-performance photovoltaic double perovskites A$_{2}$SiI$_{6}$ (A = K, Rb, Cs)

Despite photovoltaic efficiencies exceeding 23\% in organic-inorganic hybrid perovskite solar cells, stable materials with desirable band gaps remain rare and highly sought after. With the aid of first-principles calculations, we predict a promising new family of nontoxic inorganic double perovskites (DPs), namely silicon (Si)-based halides A$_{2}$SiX$_{6}$ (A = K, Rb, Cs; X = Cl, Br, I). This family, containing earth-abundant Si, could be applied in perovskite solar cells (PSCs). In particular, A$_{2}$SiI$_{6}$ exhibits superb physical traits, including suitable band gaps of 0.84-1.15 eV, dispersive lower conduction bands, small carrier effective masses, and broad photon absorption in the visible range. Importantly, their good stability at high temperature makes them promising optical absorbers for solar cells.

preprint2020arXiv

FReeNet: Multi-Identity Face Reenactment

This paper presents a novel multi-identity face reenactment framework, named FReeNet, that transfers facial expressions from an arbitrary source face to a target face with a shared model. The proposed FReeNet consists of two parts: a Unified Landmark Converter (ULC) and a Geometry-aware Generator (GAG). The ULC adopts an encoder-decoder architecture to efficiently convert expressions in a latent landmark space, which significantly narrows the gap in face contour between source and target identities. The GAG uses the converted landmarks to reenact a photorealistic image with a reference image of the target person. Moreover, a new triplet perceptual loss is proposed to force the GAG module to learn appearance and geometry information simultaneously, which also enriches the facial details of the reenacted images. Further experiments demonstrate the superiority of our approach in generating photorealistic faces that match the source expressions, as well as its flexibility in transferring facial expressions between identities.

preprint2020arXiv

Hierarchical and Efficient Learning for Person Re-Identification

Recent works on the person re-identification task mainly focus on model accuracy while ignoring efficiency-related factors, e.g., model size and latency, which are critical for practical applications. In this paper, we propose a novel Hierarchical and Efficient Network (HENet) that learns an ensemble of hierarchical global, partial, and recovery features under the supervision of multiple loss combinations. To further improve robustness against irregular occlusion, we propose a new data augmentation approach, dubbed Random Polygon Erasing (RPE), which randomly erases irregular areas of the input image to imitate missing body parts. We also propose an Efficiency Score (ES) metric to evaluate model efficiency. Extensive experiments on the Market1501, DukeMTMC-ReID, and CUHK03 datasets show the efficiency and superiority of our approach compared with state-of-the-art methods.

preprint2020arXiv

High-Temperature Ferromagnetic Semiconductors: Janus Monolayer Vanadium Trihalides

Two-dimensional (2D) intrinsic ferromagnetic semiconductors are expected to stand out in the spintronics field. Recently, monolayer VI$_{3}$ has been experimentally synthesized, but its weak ferromagnetism and low Curie temperature ($T_C$) limit its potential applications. Here we report that the Janus structure can elevate $T_C$ to 240 K. We attribute the high $T_C$ of the Janus structure to the lower virtual exchange gap between the $t_{2g}$ and $e_{g}$ states of nearest-neighbor V atoms. In addition, $T_C$ can be substantially enhanced further by tensile strain, owing to the increasing ferromagnetism driven by a rapidly quenched direct exchange interaction. Our work provides a feasible approach to enhancing the Curie temperature of monolayer VI$_{3}$ and unveils novel stable intrinsic FM semiconductors for realistic applications in spintronics.

preprint2020arXiv

Improved mathematical models of structured-light modulation analysis technique for contaminant and defect detection

Surface quality inspection of optical components is critical in the optical and electronic industries. The Structured-Light Modulation Analysis Technique (SMAT) is a novel method recently proposed for the contaminant and defect detection of specular surfaces and transparent objects, and this approach has been verified to be effective in eliminating ambient light. The mechanisms and mathematical models of SMAT had been analyzed and established based on the theory of photometry and the optical characteristics of contaminants and defects. However, some phenomena observed in the actual detection process still cannot be well explained. To better analyze these phenomena in practical circumstances, this paper constructs improved mathematical models of SMAT based on the surface topography of contaminants and defects. These mathematical models can be used as tools for analyzing various contaminants and defects in different systems and provide effective guidance for subsequent work. Simulations and experiments on the modulation and the luminous flux of fringe patterns have been carried out to verify the validity of these mathematical models. In addition, by using fringe patterns with mutually perpendicular sinusoidal directions, the two resulting modulation images can be merged to solve the incomplete information acquisition caused by the differentiated response of modulation.
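The abstract relies on the modulation of projected fringe patterns without stating how it is computed. The minimal sketch below assumes (this is not taken from the paper) that modulation is obtained with the standard N-step phase-shifting formula from samples $I_n = A + B\cos(\phi + 2\pi n/N)$; the function name `fringe_modulation` is hypothetical. Because a constant offset $A$, such as uniform ambient light, cancels in the sums, this standard formula illustrates why a modulation-based measurement can be insensitive to ambient light.

```python
import math

def fringe_modulation(intensities):
    """Modulation B from N equally phase-shifted samples I_n = A + B*cos(phi + 2*pi*n/N).

    Assumed standard N-step phase-shifting formula, not necessarily SMAT's exact model.
    """
    n_steps = len(intensities)
    if n_steps < 3:
        raise ValueError("need at least 3 phase-shifted samples")
    deltas = [2 * math.pi * n / n_steps for n in range(n_steps)]
    s = sum(i * math.sin(d) for i, d in zip(intensities, deltas))
    c = sum(i * math.cos(d) for i, d in zip(intensities, deltas))
    # The constant background A cancels because sin and cos of the shifts each sum to zero.
    return 2.0 / n_steps * math.hypot(s, c)
```

For example, four samples of a fringe with background 100 and amplitude 40 return a modulation of about 40, and adding a constant 55 to every sample (extra ambient light) leaves the result unchanged.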

preprint2020arXiv

Input Perturbation: A New Paradigm between Central and Local Differential Privacy

Traditionally, there are two models of differential privacy: the central model and the local model. The central model focuses on the machine learning model, and the local model focuses on the training data. In this paper, we study the \textit{input perturbation} method in differentially private empirical risk minimization (DP-ERM), preserving privacy in the central model. By adding noise to the original training data and training with the `perturbed data', we achieve ($ε$,$δ$)-differential privacy on the final model, along with some degree of privacy for the original data. We observe an interesting connection between the local model and the central model: the perturbation of the original data causes a perturbation of the gradient and, finally, of the model parameters. This observation means that our method builds a bridge between the local and central models, protecting the data, the gradient, and the model simultaneously, which is superior to previous central methods. Detailed theoretical analysis and experiments show that our method achieves almost the same (or even better) performance as some of the best previous central methods while providing more privacy protection, which is an attractive result. Moreover, we extend our method to a more general case in which the loss function satisfies the Polyak-Lojasiewicz condition, which is more general than strong convexity, the constraint on the loss function in most previous work.
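The mechanics of input perturbation (noise is added once to the training data, and the learner only ever sees the noisy copy) can be sketched as below. The noise scale uses the classical Gaussian mechanism, $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ for $0<\varepsilon<1$, which is an assumption standing in for the paper's own calibration; the per-record sensitivity $\Delta$ and the function names are hypothetical. The code does not by itself establish $(\varepsilon,\delta)$-differential privacy for the trained model; that guarantee is the subject of the paper's analysis.

```python
import math
import random

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classical Gaussian mechanism (assumed calibration, valid for 0 < epsilon < 1)."""
    if not (0 < epsilon < 1) or not (0 < delta < 1) or sensitivity <= 0:
        raise ValueError("classical bound assumes 0 < epsilon < 1, 0 < delta < 1, sensitivity > 0")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def perturb_inputs(rows, epsilon, delta, sensitivity, seed=None):
    """Return a noisy copy of each feature vector; the originals are left untouched.

    Training would then run on the returned 'perturbed data' only.
    """
    rng = random.Random(seed)
    sigma = gaussian_sigma(epsilon, delta, sensitivity)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]
```

Passing a fixed `seed` makes the perturbation reproducible for experiments; in a real deployment the noise would of course be drawn from a fresh, unpredictable source.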

preprint2020arXiv

LIC-Fusion 2.0: LiDAR-Inertial-Camera Odometry with Sliding-Window Plane-Feature Tracking

Multi-sensor fusion of multi-modal measurements from commodity inertial, visual, and LiDAR sensors to provide robust and accurate 6-DOF pose estimation holds great potential in robotics and beyond. In this paper, building upon our prior work (i.e., LIC-Fusion), we develop a sliding-window filter-based LiDAR-inertial-camera odometry with online spatiotemporal calibration (i.e., LIC-Fusion 2.0), which introduces a novel sliding-window plane-feature tracking for efficiently processing 3D LiDAR point clouds. In particular, after motion compensation of LiDAR points using IMU data, low-curvature planar points are extracted and tracked across the sliding window. A novel outlier rejection criterion is proposed in the plane-feature tracking for high-quality data association. Only tracked planar points belonging to the same plane are used for plane initialization, which makes the plane extraction efficient and robust. Moreover, we perform an observability analysis for the LiDAR-IMU subsystem and report the degenerate cases for spatiotemporal calibration using plane features. While the estimation consistency and the identified degenerate motions are validated in Monte-Carlo simulations, various real-world experiments are also conducted to show that the proposed LIC-Fusion 2.0 outperforms its predecessor and other state-of-the-art methods.

preprint2020arXiv

Limited Feedback based Adaptive Power Allocation and Subcarrier Pairing for OFDM DF Relay Networks with Diversity

A limited-feedback-based dynamic resource allocation algorithm is proposed for a relay cooperative network with Orthogonal Frequency Division Multiplexing (OFDM) modulation. We consider a communication model in which one source node communicates with one destination node assisted by one half-duplex Decode-and-Forward (DF) relay. We first consider the \emph{selective} DF scheme, in which some relay subcarriers remain idle if they are not advantageous for forwarding the received symbols. We then consider the \emph{enhanced} DF scheme, in which the idle subcarriers are used by the source to transmit new messages. We aim to maximize the instantaneous system rate by jointly optimizing power allocation and subcarrier pairing on each subcarrier based on the Lloyd algorithm. Both sum and individual power constraints are considered. The joint optimization turns out to be a mixed integer programming problem, which we transform into a convex optimization problem by continuous relaxation and solve in the dual domain.
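For a fixed subcarrier pairing and only the sum-power constraint, the relaxed power-allocation subproblem of maximizing $\sum_i \log_2(1+g_i p_i)$ subject to $\sum_i p_i = P$ has the classical water-filling solution $p_i=\max(0,\mu-1/g_i)$, where the water level is tied to the Lagrange multiplier $\lambda$ of the power constraint by $\mu = 1/(\lambda\ln 2)$. The sketch below finds $\mu$ by bisection. The effective gains $g_i$ and the function name are hypothetical, and the sketch ignores relaying, pairing, limited feedback, and individual power constraints, so it illustrates one ingredient of a dual-domain solution rather than the paper's algorithm.

```python
def water_filling(gains, total_power, iters=100):
    """Powers p_i = max(0, mu - 1/g_i) maximizing sum(log2(1 + g_i * p_i)) with sum(p_i) = total_power."""
    if total_power <= 0 or not gains or any(g <= 0 for g in gains):
        raise ValueError("need a positive power budget and positive channel gains")
    # Used power is nondecreasing in mu; at the upper bound the weakest channel alone
    # already receives total_power, so the optimal water level lies in [lo, hi].
    lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    return [max(0.0, lo - 1.0 / g) for g in gains]
```

With gains `[1.0, 0.5, 0.1]` and a budget of 2, the water level settles at 2.5, giving powers of about 1.5, 0.5, and 0: the weakest subcarrier is switched off, loosely analogous to the idle subcarriers of the selective DF scheme.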

preprint2020arXiv

Nearly Optimal Clustering Risk Bounds for Kernel K-Means

In this paper, we study the statistical properties of kernel $k$-means and obtain a nearly optimal excess clustering risk bound, substantially improving the state-of-the-art bounds in existing clustering risk analyses. We further analyze the statistical effect of the computational approximation in Nyström kernel $k$-means and prove that it achieves the same statistical accuracy as exact kernel $k$-means using only $Ω(\sqrt{nk})$ Nyström landmark points. To the best of our knowledge, such sharp excess clustering risk bounds for kernel (or approximate kernel) $k$-means have not been established before.

preprint2020arXiv

Neural Architecture Optimization with Graph VAE

Due to their high computational efficiency in a continuous space, gradient optimization methods have shown great potential in the neural architecture search (NAS) domain. The mapping of network representations from the discrete space to a latent space is the key to discovering novel architectures; however, existing gradient-based methods fail to fully characterize the networks. In this paper, we propose an efficient NAS approach that optimizes network architectures in a continuous space, where the latent space is built upon a variational autoencoder (VAE) and graph neural networks (GNN). The framework jointly learns four components in an end-to-end manner: the encoder, the performance predictor, the complexity predictor, and the decoder. The encoder and the decoder form a graph VAE, mapping between network architectures and their continuous representations. The predictors are two regression models fitting performance and computational complexity, respectively. These predictors ensure that the discovered architectures exhibit both excellent performance and high computational efficiency. Extensive experiments demonstrate that our framework not only generates appropriate continuous representations but also discovers powerful neural architectures.

preprint2020arXiv

Optimizing Non-Orthogonal Multiple Access in Random Access Networks

Non-orthogonal multiple access (NOMA) has been considered a promising solution for improving the spectrum efficiency of next-generation wireless networks. In this paper, the performance of a p-persistent slotted ALOHA system supporting NOMA transmissions is investigated. Specifically, wireless users can choose to use high or low power for data transmissions with certain probabilities. To achieve the maximum network throughput, an analytical framework is developed to analyze the successful transmission probability of NOMA and the long-term average throughput of users involved in non-orthogonal transmissions. The feasible region of the maximum numbers of concurrent users using high and low power that ensures successful NOMA transmissions is quantified. Based on this analysis, an algorithm is proposed to find the optimal probabilities with which users choose high and low power to achieve the maximum system throughput. In addition, the impact of power settings on network performance is further investigated. Simulations are conducted to validate the analysis.
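The throughput optimization can be illustrated with a deliberately simplified model, which is an assumption and not the paper's feasibility analysis: each of $n$ users transmits in a slot with probability $p$ and, when it does, picks high power with probability $q$. A slot delivers one packet if exactly one user transmits, and two packets if exactly one high-power and one low-power user transmit (successive interference cancellation assumed to decode both); every other collision is treated as lost. The expected throughput is then $np(1-p)^{n-1} + 2n(n-1)p^2q(1-q)(1-p)^{n-2}$, and a grid search stands in for the paper's optimization algorithm. Function names are hypothetical.

```python
def slot_throughput(n, p, q):
    """Expected packets decoded per slot under the simplified two-power-level success rule.

    n: number of users, p: per-slot transmit probability, q: probability of choosing high power.
    """
    if n < 1 or not (0.0 <= p <= 1.0) or not (0.0 <= q <= 1.0):
        raise ValueError("need n >= 1 and probabilities in [0, 1]")
    # Exactly one transmitter: decoded regardless of its power level.
    single = n * p * (1 - p) ** (n - 1)
    if n < 2:
        return single
    # Exactly one high-power and one low-power transmitter, all others silent:
    # assumed to be resolved by SIC, delivering two packets.
    pair = n * (n - 1) * (p * q) * (p * (1 - q)) * (1 - p) ** (n - 2)
    return single + 2 * pair

def best_probabilities(n, steps=100):
    """Grid search over (p, q) in [0, 1]^2 for the maximum simplified throughput."""
    grid = [i / steps for i in range(steps + 1)]
    return max(((p, q) for p in grid for q in grid), key=lambda pq: slot_throughput(n, *pq))
```

With `q = 0` the model collapses to classic slotted ALOHA, and for fixed `p` the pair term is largest at `q = 0.5`, so under this simplified rule the search splits users evenly between the two power levels.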

preprint2020arXiv

PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud

In this paper, we propose PASS3D to achieve point-wise semantic segmentation of 3D point clouds. Our framework combines the efficiency of traditional geometric methods with the robustness of deep learning methods and consists of two stages. At stage 1, our accelerated cluster proposal algorithm generates refined cluster proposals by segmenting point clouds with the ground removed, producing less redundant proposals with higher recall in an extremely short time. At stage 2, these proposals are amplified and further processed by a neural network that estimates a semantic label for each point; we also propose a novel data augmentation method to enhance the network's recognition capability for all categories, especially non-rigid objects. Evaluated on the KITTI raw dataset, PASS3D compares favorably with state-of-the-art methods on some results, making it suitable for 3D perception in autonomous driving systems. Our source code will be open-sourced. A video demonstration is available at https://www.youtube.com/watch?v=cukEqDuP_Qw.

preprint2020arXiv

Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks

Quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but they pose a challenge during training: how to propagate the gradient of the loss function through the computational graph when the quantizer has a derivative of 0 almost everywhere. In response to this non-differentiability, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation, the graph that relates inputs to outputs remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behavior of AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1-3-bit weights and activations. In the inference phase, XNOR or SHIFT operations can be used instead of convolution operations to accelerate the MINW-Net. Our experiments on CIFAR datasets demonstrate that AQE is well defined and that QNNs with AQE perform better than those with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with AQE achieves a prediction accuracy 1.5\% higher than the Binarized Neural Network (BNN) with STE. The MINW-Net, trained from scratch by AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the clear advantages of the proposed AQE, and our MINW-Net achieves results comparable to other state-of-the-art QNNs.
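Why XNOR can replace multiplication for 1-bit weights and activations follows from a standard identity, sketched below independently of AQE and MINW-Net: encode $+1$ as bit 1 and $-1$ as bit 0; then the dot product of two length-$n$ sign vectors equals $2m - n$, where $m$ is the number of positions where the signs agree, i.e. the popcount of their bitwise XNOR. The helper names are hypothetical.

```python
def pack(signs):
    """Pack a +1/-1 vector into an integer bit mask (bit i set when signs[i] == +1)."""
    bits = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("entries must be +1 or -1")
        if s == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed length-n sign vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: bit set where the two signs agree
    matches = bin(agree).count("1")     # popcount of agreeing positions
    # Each agreement contributes +1 and each disagreement -1: m - (n - m) = 2m - n.
    return 2 * matches - n
```

For `a = [1, -1, 1, 1]` and `b = [1, 1, -1, 1]`, two positions agree, so `binary_dot(pack(a), pack(b), 4)` gives 0, matching the ordinary dot product.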

preprint2020arXiv

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmarks or boundaries. To alleviate the demand for manual annotations, in this paper we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for the video-specific identity and the other for the varying pose. Inspired by this, we use a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator enhances fine details and overall realism. It leverages the disentangled features to generate photo-realistic face images with the intended pose. We evaluate our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the superior quality of the reenacted images and the flexibility of transferring facial movements between identities.

preprint2020arXiv

Sample caching Markov chain Monte Carlo approach to boson sampling simulation

Boson sampling is a promising candidate for demonstrating quantum supremacy. It requires sampling from a complicated distribution and is believed to be intractable on classical computers. Among the various classical sampling methods, the Markov chain Monte Carlo method is an important approach to the simulation and validation of boson sampling. However, this method suffers from severe sample loss caused by the autocorrelation of the sample sequence. To address this, we propose the sample caching Markov chain Monte Carlo method, which eliminates the correlations among samples while preventing sample loss, allowing more efficient simulation of boson sampling. Moreover, our method can serve as a general sampling framework that benefits a wide range of sampling tasks and is particularly suitable for applications in which a large number of samples are taken.

preprint2020arXiv

Selection of strain and fitting schemes for calculating higher-order elastic constants

Criteria for selecting strain and fitting schemes are proposed to calculate higher-order elastic constants more efficiently, robustly, and accurately. As demonstrated for the third-order elastic constants (TOECs) of diamond, the proposed method is 3-5 times faster than existing methods, and the range of strain that yields correct TOECs is expanded. In addition, our results provide evidence that some previous experiments were inaccurate because of higher-order effects, and the discrepancy among experiments and several different theoretical methods is resolved. Finally, we give recommended TOEC values for diamond.
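For a single homogeneous strain mode, the strain energy density is commonly expanded as $E(\eta)/V_0 = \frac{1}{2}A_2\eta^2 + \frac{1}{6}A_3\eta^3 + \frac{1}{24}A_4\eta^4 + \dots$, where $A_2$ and $A_3$ are mode-dependent combinations of second- and third-order elastic constants. The hedged sketch below covers only the fitting step: it least-squares fits energies sampled at several strains and returns the derivatives $A_k$. Keeping the quartic term is one way to absorb the higher-order effect mentioned above; the paper's actual criteria for choosing strains and fitting orders are not reproduced, and `fit_energy_strain` is a hypothetical helper that assumes $E(0)=0$ and a stress-free reference state.

```python
import math

def fit_energy_strain(strains, energies, max_order=4):
    """Least-squares fit of E(eta) = sum_{k=2}^{max_order} c_k * eta**k.

    Assumes E(0) = 0 and zero stress at eta = 0, so no constant or linear term.
    Returns [A_2, ..., A_max_order] with A_k = k! * c_k (the k-th derivative at eta = 0).
    """
    orders = list(range(2, max_order + 1))
    m = len(orders)
    if len(strains) != len(energies) or len(strains) < m:
        raise ValueError("need matching data with at least one point per fitted term")
    scale = max(abs(e) for e in strains)
    if scale == 0:
        raise ValueError("strains must not all be zero")
    u = [e / scale for e in strains]  # rescale to [-1, 1] to keep the normal equations well conditioned
    # Normal equations (X^T X) d = X^T y with design matrix X[i][j] = u_i ** orders[j].
    a = [[sum(x ** (p + q) for x in u) for q in orders] for p in orders]
    b = [sum(y * x ** p for x, y in zip(u, energies)) for p in orders]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    d = [0.0] * m
    for r in reversed(range(m)):
        d[r] = (b[r] - sum(a[r][c] * d[c] for c in range(r + 1, m))) / a[r][r]
    # Undo the rescaling (c_k = d_k / scale**k) and convert to derivatives A_k = k! * c_k.
    return [math.factorial(k) * dk / scale ** k for k, dk in zip(orders, d)]
```

Fitting with `max_order=3` instead of 4 on data that contain a quartic contribution biases the recovered $A_3$, which is the kind of truncation error a careful choice of strain range and fitting order is meant to control.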

preprint2020arXiv

Semantic Graph Based Place Recognition for 3D Point Clouds

Because of the difficulty of generating effective descriptors that are robust to occlusion and viewpoint changes, place recognition for 3D point clouds remains an open issue. Unlike most existing methods, which focus on extracting local, global, and statistical features of raw point clouds, our method operates at the semantic level, which can be more robust to environmental changes. Inspired by the way humans recognize scenes by identifying semantic objects and capturing their relations, this paper presents a novel semantic graph based approach for place recognition. First, we propose a novel semantic graph representation for point cloud scenes that preserves the semantic and topological information of the raw point cloud. Place recognition is thus modeled as a graph matching problem. We then design a fast and effective graph similarity network to compute the similarity. Exhaustive evaluations on the KITTI dataset show that our approach is robust to occlusion as well as viewpoint changes and outperforms state-of-the-art methods by a large margin. Our code is available at: \url{https://github.com/kxhit/SG_PR}.

preprint2020arXiv

Spin Dynamics in the Antiferromagnetic Oxypnictides and Fluoropnictides: LaMnAsO, LaMnSbO, and BaMnAsF

Inelastic neutron scattering (INS) spectra from polycrystalline antiferromagnetic LaMnAsO, LaMnSbO, and BaMnAsF are analyzed using a $J_1-J_2-J_c$ Heisenberg model in the framework of linear spin-wave theory. All three systems show clear evidence that the nearest- and next-nearest-neighbor interactions within the Mn square-lattice layer ($J_1$ and $J_2$) are both antiferromagnetic (AFM). However, for all compounds studied, the competing interactions have a ratio $2J_2/J_1 < 1$, which favors the checkerboard AFM structure of the square lattice over the stripe AFM structure. The inter-plane coupling $J_c$ in all three systems is on the order of $\sim 3\times10^{-4}J_1$, giving the magnetic properties of these systems a quasi-two-dimensional character. The substitution of Sb for As significantly lowers the in-plane exchange coupling, which is also reflected in the decrease of the Néel temperature, $T_{\rm N}$. Although BaMnAsF shares MnAs sheets with LaMnAsO, their $J_1$ and $J_2$ values are substantially different. Using density functional theory, we calculate the exchange parameters $J_{ij}$ to rationalize the differences among these systems.
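Why $2J_2/J_1 < 1$ favors the checkerboard over the stripe structure can be checked with classical energies per spin. This sketch assumes $H=\sum_{\langle ij\rangle} J_{ij}\,\mathbf{S}_i\cdot\mathbf{S}_j$ with $J>0$ antiferromagnetic, absorbs $S^2$ into the couplings, and neglects the tiny $J_c$. Counting each bond once, every spin has two nearest-neighbor and two next-nearest-neighbor (diagonal) bonds. In the checkerboard state both nearest-neighbor bonds are antiparallel and both diagonal bonds parallel; in the stripe state one nearest-neighbor bond is parallel, one antiparallel, and both diagonal bonds are antiparallel:

$$E_{\rm checkerboard} = 2J_1(-1) + 2J_2(+1) = -2J_1 + 2J_2, \qquad E_{\rm stripe} = J_1\,[(+1) + (-1)] + 2J_2(-1) = -2J_2,$$

$$E_{\rm checkerboard} < E_{\rm stripe} \iff J_1 > 2J_2 \iff \frac{2J_2}{J_1} < 1 \quad (J_1 > 0).$$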

preprint2020arXiv

Stability of the saddle solutions for the Allen-Cahn equation

We are concerned with the saddle solutions of the Allen-Cahn equation constructed by Cabré and Terra \cite{C,C2} in $\mathbb{R}^{2m}=\mathbb{R}^{m}\times\mathbb{R}^{m}$. These solutions vanish precisely on the Simons cone. The existence and uniqueness of the saddle solution are shown in \cite{C,C2,C1}. Regarding stability, Schatzman \cite{Sch} proved that the saddle solution is unstable for $m=1$, and Cabré \cite{C1} showed instability for $m=2,3$ and stability for $m\geq7$. This left open the cases $m=4,5,6$. In this paper, we show that the saddle solutions are stable when $m=4,5,6$, thereby confirming Cabré's conjecture in \cite{C1}. The conjecture that saddle solutions in dimensions $2m\geq8$ should be global minimizers of the energy functional remains open.

preprint2020arXiv

Sub-optimal convergence of discontinuous Galerkin methods with central fluxes for linear hyperbolic equations with even degree polynomial approximations

In this paper, we verify theoretically and numerically that discontinuous Galerkin (DG) methods with central fluxes for linear hyperbolic equations on non-uniform meshes have sub-optimal convergence in the $L^2$-norm for even-degree polynomial approximations. On uniform meshes, optimal error estimates are provided for an arbitrary number of cells in one and multiple dimensions, improving previous results. The theoretical findings are found to be sharp and consistent with the numerical results.
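Convergence behavior of this kind is usually measured from errors on successively refined meshes through the observed rate $r_i=\log(e_i/e_{i+1})/\log(h_i/h_{i+1})$. For a degree-$k$ approximation the optimal $L^2$ rate is $k+1$, so observed rates that stay below $k+1$ under refinement indicate sub-optimal convergence. A small helper (hypothetical name, standard library only) that computes these rates from a table of mesh sizes and errors:

```python
import math

def observed_orders(mesh_sizes, errors):
    """Observed convergence rates log(e_i / e_{i+1}) / log(h_i / h_{i+1}) between successive meshes."""
    if len(mesh_sizes) != len(errors) or len(errors) < 2:
        raise ValueError("need matching lists with at least two refinement levels")
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(errors) - 1)
    ]
```

On a non-uniform mesh, `mesh_sizes` would typically hold the largest cell size at each refinement level.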

preprint2020arXiv

Synthesis and temperature-dependent photoluminescence of high density GeSe triangular nanoplate arrays on Si substrates

We have grown germanium selenide (GeSe) triangular nanoplate arrays (TNAs) with a high density ($3.82\times10^{6}$ mm$^{-2}$) on Si (111) substrates using a simple thermal evaporation method. The thickness and the three side lengths of a single triangular nanoplate were statistically estimated by atomic force microscopy (AFM) as 44 nm, 365 nm, 458 nm, and 605 nm, respectively. Transmission electron microscopy (TEM) images and X-ray diffraction (XRD) patterns show that the TNAs are composed of a single-crystalline GeSe phase. Se-related defects in the lattice were also revealed by TEM images and Raman vibration modes. Unlike previously reported GeSe compounds, the GeSe TNAs exhibit temperature-dependent photoluminescence (PL). In addition, a previously unreported PL peak at 1.25 eV from the 44-nm-thick TNAs at 5 K lies between the PL peaks of GeSe monolayers (1.5 nm) and thin films (400 nm), revealing a close relationship between the PL peak energy and the thickness of GeSe. The high-density structure and temperature-dependent PL of the TNAs on the Si substrate may be useful for temperature-controllable semiconductor nanodevices.

preprint2020arXiv

Targetless Calibration of LiDAR-IMU System Based on Continuous-time Batch Estimation

Sensor calibration is a fundamental building block of a multi-sensor fusion system. This paper presents an accurate and repeatable LiDAR-IMU calibration method (termed LI-Calib) to calibrate the 6-DOF extrinsic transformation between the 3D LiDAR and the Inertial Measurement Unit (IMU). Given the high data capture rates of LiDAR and IMU sensors, LI-Calib adopts a continuous-time trajectory formulation based on B-splines, which is more suitable for fusing high-rate or asynchronous measurements than discrete-time approaches. Additionally, LI-Calib decomposes the space into cells and identifies planar segments for data association, which renders the calibration problem well-constrained in typical scenarios without any artificial targets. We validate the proposed calibration approach in both simulated and real-world experiments. The results demonstrate the high accuracy and good repeatability of the proposed method in common human-made scenarios. To benefit the research community, we open-source our code at \url{https://github.com/APRIL-ZJU/lidar_IMU_calib}
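
To illustrate why a continuous-time B-spline trajectory suits high-rate, asynchronous sensors, the sketch below evaluates a uniform cubic B-spline at an arbitrary timestamp from the four neighbouring control points. It is a generic example, not taken from the LI-Calib code: the function name, the knot spacing `dt`, and the restriction to 3D position are assumptions, and the rotational part of a 6-DOF trajectory (which needs a spline on SO(3)) is not shown.

```python
import numpy as np

def cubic_bspline_position(t, ctrl, t0=0.0, dt=0.1):
    """Evaluate a uniform cubic B-spline (position only) at time t.

    ctrl: (N, 3) array of control points; knots are uniform with spacing dt starting at t0.
    The segment containing t uses control points i, i+1, i+2, i+3.
    """
    x = (t - t0) / dt
    i = int(np.floor(x))
    if i < 0 or i + 3 >= len(ctrl):
        raise ValueError("t is outside the valid range of the spline")
    u = x - i  # normalized time within the segment, in [0, 1)
    basis = np.array([
        (1 - u) ** 3,
        3 * u**3 - 6 * u**2 + 4,
        -3 * u**3 + 3 * u**2 + 3 * u + 1,
        u**3,
    ]) / 6.0   # the four basis weights always sum to 1
    return basis @ ctrl[i:i + 4]
```

For example, with six control points `[k, 0, 0]` for `k = 0..5` and `dt = 0.1`, valid times are $t \in [0, 0.3)$, and `cubic_bspline_position(0.25, ctrl)` returns `[3.5, 0, 0]`. An IMU sample and a LiDAR point with different timestamps can each be matched to the same trajectory without interpolating between discrete poses.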

preprint2020arXiv

Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square Loss and RKHS

Theoretical analysis of divide-and-conquer distributed learning with the least squares loss in a reproducing kernel Hilbert space (RKHS) has recently been explored within the framework of learning theory. However, learning-theory studies for general loss functions and hypothesis spaces remain limited. To fill this gap, we study the risk performance of distributed empirical risk minimization (ERM) for general loss functions and hypothesis spaces. The main contributions are two-fold. First, we derive two tight risk bounds under certain basic assumptions on the hypothesis space, as well as the smoothness, Lipschitz continuity, and strong convexity of the loss function. Second, we further develop a more general risk bound for distributed ERM without the strong convexity assumption.
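
Divide-and-conquer ERM splits the sample into disjoint subsets, solves an ERM problem on each subset independently, and averages the local solutions. The sketch below shows that pipeline for a linear hypothesis class with a pluggable loss (logistic or squared), an optional $\ell_2$ term, and plain gradient descent as the local solver. The hypothesis class, regularization, solver, data, and all function names are illustrative assumptions, not the setting analyzed in the paper.

```python
import numpy as np

def local_erm(X, y, loss_grad, lam=1e-2, lr=0.1, steps=2000):
    """Approximately minimize (1/n) sum_i loss(x_i . w, y_i) + (lam/2) ||w||^2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ loss_grad(X @ w, y) / len(y) + lam * w
        w -= lr * grad
    return w

def divide_and_conquer_erm(X, y, m, loss_grad, **solver_kwargs):
    """Split the sample into m disjoint parts, run ERM on each, and average the local estimators."""
    parts = np.array_split(np.arange(len(y)), m)
    local = [local_erm(X[p], y[p], loss_grad, **solver_kwargs) for p in parts]
    return np.mean(local, axis=0)

def logistic_grad(z, y):
    """Derivative of log(1 + exp(-y z)) with respect to z, for labels y in {-1, +1}."""
    return -y / (1.0 + np.exp(y * z))

def squared_grad(z, y):
    """Derivative of 0.5 * (z - y)^2 with respect to z."""
    return z - y

# Hypothetical data: 1,000 samples with 3 features, split over m = 10 machines.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])

w_sq = divide_and_conquer_erm(X, X @ w_true, m=10, loss_grad=squared_grad, lam=0.0)
y_cls = np.sign(X @ w_true)
w_log = divide_and_conquer_erm(X, y_cls, m=10, loss_grad=logistic_grad)
accuracy = np.mean(np.sign(X @ w_log) == y_cls)
```

Each local solve touches only its own subset, so the m problems can run on separate machines; only the final averaging step needs communication.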

preprint2020arXiv

Variational Quantum Circuits for Quantum State Tomography

Quantum state tomography is a key process in most quantum experiments. In this work, we employ quantum machine learning for state tomography. Given an unknown quantum state, it can be learned by maximizing the fidelity between the output of a variational quantum circuit and this state. The number of parameters of the variational quantum circuit grows linearly with the number of qubits and the circuit depth, so that only a polynomial number of measurements is required, even for highly entangled states. A classical circuit simulator is then used to transform the information about the target quantum state from the variational quantum circuit into a familiar format. We demonstrate our method with numerical simulations of the tomography of the ground state of a one-dimensional quantum spin chain, using a variational quantum circuit simulator. Our method is suitable for near-term quantum computing platforms and could be used for relatively large-scale tomography of experimentally relevant quantum states.
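
The core idea, learning an unknown state by maximizing its fidelity with the output of a parameterized circuit, can be sketched with a small state-vector simulation. The ansatz here (layers of single-qubit $R_Y$ rotations followed by a chain of CZ gates) is an assumption chosen for brevity, not the circuit used in the paper; because it produces only real amplitudes, it can represent only real-amplitude states. It does show the parameter count growing as (layers) × (qubits), and it computes gradients with the parameter-shift rule, which holds for gates of the form $e^{-i\theta P/2}$ with $P$ a Pauli operator. In an experiment, the fidelity would be estimated from measurements rather than computed exactly.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation R_Y(theta) = exp(-i theta Y / 2), a real 2x2 matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, k, n):
    """Apply a 2x2 gate to qubit k of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [k])), 0, k)
    return psi.reshape(-1)

def apply_cz(state, k, l, n):
    """Controlled-Z between qubits k and l: flip the sign of amplitudes where both are |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[k], idx[l] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def ansatz_state(params, n):
    """Layered ansatz: R_Y on every qubit, then CZ on neighbouring pairs; params has shape (layers, n)."""
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0
    for layer in params:
        for k, theta in enumerate(layer):
            state = apply_1q(state, ry(theta), k, n)
        for k in range(n - 1):
            state = apply_cz(state, k, k + 1, n)
    return state

def fidelity(params, target, n):
    return abs(np.vdot(target, ansatz_state(params, n))) ** 2

def fidelity_grad(params, target, n):
    """Parameter-shift rule: dF/dtheta = [F(theta + pi/2) - F(theta - pi/2)] / 2."""
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        shift = np.zeros_like(params)
        shift[idx] = np.pi / 2
        grad[idx] = 0.5 * (fidelity(params + shift, target, n) - fidelity(params - shift, target, n))
    return grad

def learn_state(target, n, layers=3, lr=0.5, steps=300, seed=0):
    """Gradient ascent on the fidelity; returns the learned circuit parameters."""
    params = np.random.default_rng(seed).uniform(0, 2 * np.pi, size=(layers, n))
    for _ in range(steps):
        params = params + lr * fidelity_grad(params, target, n)
    return params
```

A call such as `params = learn_state(target, n=3)` followed by `fidelity(params, target, 3)` reports how well the state was learned; plain gradient ascent can stall in local optima, so several random initializations may be needed.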

preprint2019arXiv

Electronic, magnetic, and optical properties of Mn-doped GaSb: a first-principles study

Half-metallic ferromagnets can produce fully spin-polarized conduction electrons and can be used to fabricate spintronic devices. Thus, in this study, we calculate the electronic structure and the magnetic and optical properties of Mn-doped GaSb (Mn being a 3d transition metal), a system that has exhibited half-metallicity, using the generalized gradient approximation and the Heyd-Scuseria-Ernzerhof (HSE) functional. The Ga$_{1-x}$Mn$_x$Sb ($x = 0.25, 0.5, 0.75$) materials exhibit ferromagnetic half-metallic properties and a high Curie temperature, indicating that this series can be applied in spintronic devices. Moreover, they absorb strongly in the infrared, suggesting that Ga$_{1-x}$Mn$_{x}$Sb also has potential applications in infrared photoelectric devices.

preprint2019arXiv

Extreme Ultraviolet Time- and Angle-Resolved Photoemission Spectroscopy with 21.5 meV Resolution using High-Order Harmonic Generation from a Turn-Key Yb:KGW Amplifier

Characterizing and controlling the electronic properties of quantum materials requires direct measurements of non-equilibrium electronic band structures over large regions of momentum space. Here, we demonstrate an experimental apparatus for time- and angle-resolved photoemission spectroscopy using high-order harmonic probe pulses generated by a robust, moderately high-power (20 W) Yb:KGW amplifier with a repetition rate tunable between 50 and 150 kHz. By driving high-order harmonic generation (HHG) with the second harmonic of the fundamental 1025 nm laser pulses, we show that single-harmonic probe pulses at 21.8 eV photon energy can be effectively isolated without a monochromator. The on-target photon flux can reach $5\times10^{10}$ photons/second at 50 kHz, and the time resolution is measured to be 320 fs. The relatively long pulse duration of the Yb-driven HHG source allows us to reach an excellent energy resolution of 21.5 meV, achieved by suppressing space-charge broadening with a low photon flux of $1.5\times10^{8}$ photons/second at a higher repetition rate of 150 kHz. The capabilities of the setup are demonstrated through measurements on the topological semimetal ZrSiS and the topological insulator Sb$_{2-x}$Gd$_x$Te$_3$.
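
The 21.8 eV probe energy can be related to the driver wavelength with $E = hc/\lambda$ ($hc \approx 1239.84$ eV·nm). The harmonic order computed below is inferred from this arithmetic; the abstract does not state it.

```python
HC_EV_NM = 1239.841984  # h*c in eV*nm

fundamental_nm = 1025.0
driver_ev = 2 * HC_EV_NM / fundamental_nm  # second harmonic (512.5 nm) drives the HHG
order = 21.8 / driver_ev                   # harmonic order of the 21.8 eV probe

print(f"driver photon: {driver_ev:.3f} eV, harmonic order: {order:.2f}")
# prints: driver photon: 2.419 eV, harmonic order: 9.01
```

The result is consistent with the 9th harmonic of the 2.419 eV driver photons (9 × 2.419 eV ≈ 21.77 eV), an odd order as expected for HHG in a gas.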

preprint2019arXiv

Large thermoelectric power factor of high-mobility 1T'' phase of transition-metal dichalcogenides

Experimental studies of monolayer transition-metal dichalcogenides in recent years reveal that these compounds have many metastable phases with unique physical properties, not just the 1H phase. Here, we focus on the 1T'' phase and systematically investigate the electronic structures and transport properties of MX2 (M = Mo, W; X = S, Se, Te) using first-principles calculations combined with Boltzmann transport theory. Among them, only the three molybdenum compounds have a small direct bandgap at the K point, which derives from the distortion of the octahedrally coordinated [MoS6]. Of these three, MoSe2 has an estimated room-temperature hole mobility of 690 cm^2/Vs, far higher than those of the other two MoX2 compounds. This outstanding transport performance results from the combination of a modest carrier effective mass and weak electron-phonon coupling. The Seebeck coefficient of MoSe2 is also evaluated to be as high as 300 × 10^-6 V/K at room temperature. Owing to the T^-1.9 temperature dependence of the mobility and the higher Seebeck coefficient at low temperature, MoSe2 is found to have a large thermoelectric power factor of around 6 × 10^-3 W/(m K^2) in the low-to-intermediate temperature range. The present results suggest that 1T'' MoSe2 may be an excellent candidate thermoelectric material.
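
The power factor is $\mathrm{PF} = S^2\sigma$ with $\sigma = n e \mu$, so a mobility that scales as $T^{-1.9}$ raises $\sigma$ at fixed carrier density as the temperature drops. The sketch below only illustrates these relations: extrapolating the reported 690 cm²/Vs room-temperature mobility with a pure power law and the carrier density of 10^19 cm^-3 are hypothetical choices, not values from the paper.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C

def mobility(T, mu_300=690.0, exponent=-1.9):
    """Hole mobility in cm^2/(V s), assuming mu(T) = mu_300 * (T / 300 K)^exponent."""
    return mu_300 * (T / 300.0) ** exponent

def power_factor(seebeck_V_per_K, n_cm3, mu_cm2):
    """PF = S^2 * sigma with sigma = n e mu, returned in W/(m K^2)."""
    sigma = (n_cm3 * 1e6) * E_CHARGE * (mu_cm2 * 1e-4)  # m^-3 * C * m^2/(V s) -> S/m
    return seebeck_V_per_K**2 * sigma

mu_150 = mobility(150.0)                    # about 2.6e3 cm^2/(V s) under the assumed power law
pf_300 = power_factor(300e-6, 1e19, 690.0)  # hypothetical carrier density of 1e19 cm^-3
```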

preprint2019arXiv

Modulation of heat transport in two-dimensional group-III chalcogenides

We systematically investigated the modulation of heat transport in experimentally accessible two-dimensional (2D) group-III chalcogenides by first-principles calculations. We found that the intrinsic thermal conductivities ($\kappa$) of the chalcogenides MX (M = Ga, In; X = S, Se) are desirable for efficient heat dissipation. Meanwhile, we showed that long-range anharmonic interactions play an important role in heat transport in these chalcogenides. The differences in $\kappa$ among the 2D group-III chalcogenides are well described by the Slack model and can be mainly attributed to the phonon group velocity. On this basis, we proposed three methods, namely strain engineering, size effects, and the construction of Janus structures, to effectively modulate the $\kappa$ of 2D group-III chalcogenides through different underlying mechanisms. We found that tensile strain and rough-boundary scattering could continuously decrease $\kappa$, while compressive strain could increase it. On the other hand, the change in $\kappa$ produced by forming Janus structures is permanent and depends on the structural details. These results provide guidance for modulating the heat transport properties of 2D group-III chalcogenides for device applications.

preprint2019arXiv

The direct and indirect optical absorptions of cubic BAs and BSb

Recently, a high thermal conductivity has been measured experimentally in boron arsenide (BAs), which is very encouraging for low-power photoelectric devices. Therefore, in the present work, we have systematically investigated the direct and indirect optical absorption of BAs and BSb, as well as the effect of doping with congeners, using first-principles calculations. By considering phonon-assisted second-order optical absorption, we obtain an absorption onset corresponding to the indirect bandgap. The redshift of the absorption onset and the enhancement and smoothing of the optical absorption spectra are also captured in temperature-dependent calculations. To bring first-order absorption into the visible range, the effect of congener doping on the optical absorption is studied without phonon assistance. The decrease of the local direct bandgap after doping is found to derive either from the small bandgap of the prototypical III-V semiconductors or from the conduction band minimum (CBM) being located at the R$_c$ point. Thus, doping with congeners can improve direct optical absorption in the visible range.

preprint2019arXiv

Theoretical study of structure and magnetism of Ga$_{1-x}$V$_x$Sb compounds for spintronic applications

In this paper, the structural, electronic, and magnetic properties of zinc-blende Ga$_{1-x}$V$_x$Sb compounds, with x ranging from the dilute doping regime to the extreme doping limit, were systematically investigated by first-principles calculations. V atoms prefer to substitute for Ga atoms, and the formation energy is lower under Sb-rich than under Ga-rich growth conditions. Meanwhile, Sb$_{\mathrm{Ga}}$ antisite defects can effectively decrease the energy barrier of the substitution process, from 0.85 eV to 0.53 eV. V atoms diffuse in the GaSb lattice through metastable interstitial sites with an energy barrier of 0.6 eV. At a low V concentration of x = 0.0625, V atoms prefer a homogeneous distribution and antiferromagnetic coupling among them. However, starting from x = 0.5, the magnetic coupling among V atoms becomes ferromagnetic, owing to an enhanced superexchange interaction between the $e_g$ and $t_{2g}$ states of neighbouring V atoms. At the extreme limit of x = 1.00, we found that zinc-blende VSb, as well as its analogs VAs and VP, are intrinsic ferromagnetic semiconductors, with a large change in light absorption at the Curie temperature. These results indicate that Ga$_{1-x}$V$_x$Sb compounds can provide a platform for designing new electronic, spintronic, and optoelectronic devices.

preprint2019arXiv

Tunable anisotropic absorption in monolayer black phosphorus using critical coupling

We present a monolayer black phosphorus (BP)-based metamaterial structure for tunable anisotropic absorption in the mid-infrared. Based on the critical coupling mechanism of guided resonance, the structure achieves a high absorption efficiency of 99.65$\%$ for TM polarization, but only 2.61$\%$ at the same wavelength for TE polarization, owing to the intrinsic anisotropy of BP. The absorption characteristics can be flexibly controlled by changing the critical coupling conditions, including the electron doping of BP, the geometric parameters, and the angle of incidence. The results demonstrate the feasibility of designing high-performance BP-based optoelectronic devices with spectral tunability and polarization selectivity.
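
Critical coupling can be summarized with single-mode temporal coupled-mode theory for a one-port resonator (for example, a structure backed by a mirror, so that there is no transmission). With radiative decay rate $\gamma$ and intrinsic absorption rate $\delta$, the absorption is $A(\omega) = 4\gamma\delta / [(\omega-\omega_0)^2 + (\gamma+\delta)^2]$, which reaches unity at resonance only when $\gamma = \delta$. This generic model is an assumption for illustration; the abstract does not give the paper's model.

```python
def absorption(omega, omega0, gamma, delta):
    """One-port, single-mode coupled-mode-theory absorption.

    gamma: radiative (coupling) decay rate; delta: intrinsic (material-loss) decay rate.
    """
    return 4 * gamma * delta / ((omega - omega0) ** 2 + (gamma + delta) ** 2)

# At resonance, A = 4 gamma delta / (gamma + delta)^2, which equals 1 only when gamma == delta.
for delta in (0.1, 1.0, 10.0):
    print(delta, absorption(omega=1.0, omega0=1.0, gamma=1.0, delta=delta))
```

Under-coupling ($\delta < \gamma$) and over-coupling ($\delta > \gamma$) by the same factor give the same peak absorption. Tuning the BP doping, the geometry, or the incidence angle can be read as moving $\gamma$ and $\delta$ toward or away from the matched condition.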

preprint2019arXiv

Two Dimensional Ferromagnetic Semiconductor: Monolayer CrGeS$_3$

Recently, two-dimensional ferromagnetic semiconductors have become an important class of materials for many potential applications in spintronic devices. Based on density functional theory, we systematically explore the magnetic and electronic properties of monolayer CrGeS$_3$. A comparison of the total energies of different magnetic states confirms the ferromagnetic ground state of monolayer CrGeS$_3$. Monolayer CrGeS$_3$ is also shown to be both ferromagnetic and semiconducting, with a magnetic moment of 3 $μ_{B}$ per Cr atom, contributed mainly by strong $dp\sigma$ hybridization between Cr $e_g$ and S $p$ states. The monolayer has a bandgap of 0.70 eV in the spin-up channel and 0.77 eV in the spin-down channel. The global gap is 0.34 eV (2.21 eV with the HSE06 functional), which originates from the bonding $dp\sigma$ hybridized states of Cr $e_g$-S $p$ and the unoccupied Cr $t_{2g}$-Ge $p$ hybridized states. In addition, we estimate a Curie temperature of 161 K for monolayer CrGeS$_3$ using mean-field theory.
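
For orientation, the textbook mean-field (Weiss) estimate for a nearest-neighbour Heisenberg model $H = -J\sum_{\langle ij\rangle}\mathbf{S}_i\cdot\mathbf{S}_j$ (each pair counted once, $z$ nearest neighbours) is shown below. The abstract does not state the Hamiltonian convention or the exchange constants used in the paper, so this is only a sketch: $S = 3/2$ follows from the 3 $μ_{B}$ moment per Cr atom assuming $g = 2$, and $z = 3$ assumes a honeycomb Cr sublattice, as in the related compound CrGeTe$_3$.

```latex
$$
k_B T_C^{\mathrm{MF}} = \frac{z\,J\,S(S+1)}{3}
\qquad\Longrightarrow\qquad
k_B T_C^{\mathrm{MF}} = \frac{3 \cdot J \cdot \tfrac{3}{2} \cdot \tfrac{5}{2}}{3} = \frac{15}{4}\,J
\qquad \big(S = \tfrac32,\ z = 3\big).
$$
```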

preprint2018arXiv

Type-I and type-II Nodal Lines Coexistence in the Antiferromagnetic monolayer CrAs$_{2}$

Topological nodal line semimetals, which host symmetry-protected one-dimensional Fermi lines, have become a hot topic in topological quantum matter. Because time-reversal symmetry is broken in magnetic systems, nodal lines there require protection by additional symmetries. Here, based on systematic first-principles calculations, we report the coexistence of antiferromagnetic type-I and type-II nodal lines in monolayer CrAs$_{2}$. Remarkably, the type-I nodal line in CrAs$_{2}$, which forms a concentric loop centered at the $Γ$ point, is filling-enforced by a nonsymmorphic analogue symmetry and is robust against spin-orbit coupling. The type-II nodal lines, open nodal lines that appear around the Fermi level, are protected by mirror symmetry in the absence of spin-orbit coupling. The antiferromagnetic monolayer CrAs$_{2}$ proposed here may provide a platform for exploring the correlation between magnetism and exotic topological phases.
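
The type-I/type-II distinction can be stated in terms of the two bands that cross along a direction transverse to the nodal line. If their slopes (group velocities) at the crossing have opposite signs, the crossing is type-I; if they have the same sign, the cone is tilted over and the crossing is type-II. The helper below is a hypothetical, minimal classifier of that rule, not part of the paper's analysis.

```python
def classify_crossing(v1, v2):
    """Classify a linear band crossing from the two band slopes dE/dk at the node.

    Opposite-sign slopes: type-I (upright cone).
    Same-sign slopes: type-II (over-tilted cone).
    A zero slope marks the critical case between the two.
    """
    product = v1 * v2
    if product < 0:
        return "type-I"
    if product > 0:
        return "type-II"
    return "critical"
```

For a tilted cone $E_\pm(k) = (t \pm v)\,k$, the slopes are $t \pm v$, so the rule reduces to type-I when $|t| < |v|$ and type-II when $|t| > |v|$.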