Trust snapshot

Trust 21 - Emerging
Verification L1
Unclaimed author
56 works
0 followers
34 topics
4 close collaborators

Published work

56 published items

Preprint · 2026 · arXiv

A plug-and-play generative framework for multi-satellite precipitation estimation

Reliable precipitation monitoring is essential for disaster risk reduction, water resources management, and agricultural decision-making. Multi-source satellite observations, particularly the combination of geostationary infrared and passive microwave measurements, have become a primary means of precipitation detection. Traditional multi-source satellite precipitation estimation methods remain computationally inefficient, and many deep learning methods lack the flexibility to incorporate new sensors without retraining the full model. Here we introduce PRISMA (Precipitation Inference from Satellite Modalities via generAtive modeling), a plug-and-play latent generative framework for multi-sensor precipitation estimation. PRISMA learns an unconditional precipitation prior from IMERG Final fields and constrains it through independently trained, sensor-specific conditional branches, allowing new observation sources to be incorporated without retraining the generative backbone. Applied to FY-4B AGRI infrared and GPM GMI microwave observations, PRISMA improves Critical Success Index by up to 40.3% and reduces root-mean-square error by 22.6% relative to infrared-only estimation within microwave swaths, while also improving probabilistic skill and maintaining an average inference time of about 37 s. Independent rain-gauge validation across China confirms consistent gains, and typhoon case studies show that microwave conditioning restores eyewall and spiral rainband structures, reducing storm-core mean absolute error by up to 42.3%. PRISMA thus provides an extensible and efficient framework for multi-sensor precipitation estimation.
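The Critical Success Index and RMSE figures above are standard verification scores. As a minimal sketch of how such scores and relative improvements are computed, the Python below assumes a hypothetical rain/no-rain threshold of 0.1 mm/h; the threshold and the function names are illustrative, not taken from the paper.

```python
import numpy as np

def critical_success_index(pred, obs, threshold=0.1):
    """CSI = hits / (hits + misses + false alarms) for rain/no-rain at `threshold` (mm/h, assumed)."""
    pred_rain = pred >= threshold
    obs_rain = obs >= threshold
    hits = np.sum(pred_rain & obs_rain)
    misses = np.sum(~pred_rain & obs_rain)
    false_alarms = np.sum(pred_rain & ~obs_rain)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else float("nan")

def rmse(pred, obs):
    """Root-mean-square error over all grid cells."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def relative_improvement(new, base, higher_is_better=True):
    """Percent improvement of `new` over `base` (positive means better)."""
    change = (new - base) if higher_is_better else (base - new)
    return 100.0 * change / base

obs = np.array([0.0, 0.5, 2.0, 0.0])
pred = np.array([0.2, 0.0, 1.5, 0.0])
print(critical_success_index(pred, obs))  # 1 hit, 1 miss, 1 false alarm
print(rmse(pred, obs))
```

CSI ignores correct "no rain" cells, which is why it is preferred over plain accuracy for sparse precipitation fields.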

Preprint · 2026 · arXiv

ELMM: Efficient Lightweight Multimodal Large Language Models for Multimodal Knowledge Graph Completion

Multimodal Knowledge Graphs (MKGs) extend traditional knowledge graphs by incorporating visual and textual modalities, enabling richer and more expressive entity representations. However, existing MKGs are often incomplete, which hinders their effectiveness in downstream tasks. As a result, the multimodal knowledge graph completion (MKGC) task is receiving increasing attention. While large language models (LLMs) have shown promise for knowledge graph completion (KGC), their application to the multimodal setting remains underexplored. Moreover, applying Multimodal Large Language Models (MLLMs) to MKGC introduces significant challenges: (1) the large number of image tokens per entity leads to semantic noise and modality conflicts, and (2) processing such long token inputs incurs a high computational cost. To address these issues, we propose Efficient Lightweight Multimodal Large Language Models (ELMM) for MKGC. ELMM introduces a Multi-view Visual Token Compressor (MVTC) based on a multi-head attention mechanism, which adaptively compresses image tokens from both textual and visual views, reducing redundancy while retaining necessary information and avoiding modality conflicts. Additionally, we design an attention pruning strategy that removes redundant attention layers from MLLMs, significantly reducing inference cost. We further introduce a linear projection to compensate for the performance degradation caused by pruning. Extensive experiments on four benchmark datasets demonstrate that ELMM achieves state-of-the-art performance.
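The token compressor is described only at a high level. A minimal sketch of the general idea, multi-head cross-attention in which a small set of query vectors summarizes many image tokens, is shown below. It omits learned projections, and the two query sets standing in for the "textual" and "visual" views (and the 576-token input) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_tokens(image_tokens, queries, num_heads=4):
    """Cross-attention pooling: K query vectors summarize N image tokens.

    image_tokens: (N, d); queries: (K, d) with K << N; returns (K, d).
    """
    n, d = image_tokens.shape
    k = queries.shape[0]
    assert d % num_heads == 0
    dh = d // num_heads
    # Split the feature dimension into heads: (heads, tokens, dh).
    q = queries.reshape(k, num_heads, dh).transpose(1, 0, 2)
    kv = image_tokens.reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, K, N)
    weights = softmax(scores, axis=-1)                 # each query attends over all tokens
    pooled = weights @ kv                              # (heads, K, dh)
    return pooled.transpose(1, 0, 2).reshape(k, d)

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(576, 64))
text_view_queries = rng.normal(size=(8, 64))    # hypothetical: derived from entity text
visual_view_queries = rng.normal(size=(8, 64))  # hypothetical: learnable visual queries
compressed = np.concatenate([
    compress_tokens(image_tokens, text_view_queries),
    compress_tokens(image_tokens, visual_view_queries),
])
print(compressed.shape)
```

Here 576 image tokens shrink to 16, which is the kind of reduction that lowers the MLLM's input length and cost.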

Preprint · 2026 · arXiv

FocalOrder: Focal Preference Optimization for Reading Order Detection

Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout regions. In this work, we challenge this assumption by revealing a critical flaw: Positional Disparity, a phenomenon where models master the deterministic start and end regions but suffer a performance collapse in the complex intermediate sections. This degradation arises because standard training lets the massive volume of easy patterns drown out the learning signals from difficult layouts. To address this, we propose FocalOrder, a framework driven by Focal Preference Optimization (FPO). Specifically, FocalOrder employs adaptive difficulty discovery with an exponential moving average mechanism to dynamically pinpoint hard-to-learn transitions, while introducing a difficulty-calibrated pairwise ranking objective to enforce global logical consistency. Extensive experiments demonstrate that FocalOrder establishes new state-of-the-art results on OmniDocBench v1.0 and Comp-HRDoc. Our compact model not only outperforms competitive specialized baselines but also significantly surpasses large-scale general VLMs. These results demonstrate that aligning the optimization with the intrinsic structural ambiguity of documents is critical for mastering complex document structures.
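The combination of an exponential moving average of difficulty with a difficulty-weighted pairwise ranking objective can be sketched as follows. This is an illustrative reading, not FocalOrder's actual loss: tracking difficulty per reading-order position, averaging the two position weights for each pair, and using a logistic ranking loss are all assumptions.

```python
import numpy as np

class DifficultyTracker:
    """Hypothetical per-position difficulty estimate: an EMA of the training loss
    observed at each reading-order position."""

    def __init__(self, num_positions, momentum=0.9):
        self.momentum = momentum
        self.ema = np.zeros(num_positions)

    def update(self, losses):
        self.ema = self.momentum * self.ema + (1.0 - self.momentum) * np.asarray(losses)
        return self.ema

    def weights(self, gamma=1.0):
        # Harder positions get larger weights; normalized to mean 1 to keep the loss scale stable.
        w = (self.ema + 1e-8) ** gamma
        return w * len(w) / w.sum()

def weighted_pairwise_ranking_loss(scores, order, position_weights):
    """Logistic ranking loss over every pair (a, b) with a before b in the ground-truth
    `order`; a higher score means "read earlier"."""
    total, count = 0.0, 0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            margin = scores[order[a]] - scores[order[b]]
            pair_weight = 0.5 * (position_weights[a] + position_weights[b])
            total += pair_weight * np.log1p(np.exp(-margin))
            count += 1
    return total / count

tracker = DifficultyTracker(num_positions=4)
tracker.update([0.1, 0.9, 0.8, 0.1])  # middle positions are harder in this toy batch
loss = weighted_pairwise_ranking_loss(np.array([4.0, 3.0, 2.0, 1.0]), [0, 1, 2, 3], tracker.weights())
```

Pairs touching the harder middle positions receive larger weights, so their ranking errors dominate the gradient instead of the easy start and end regions.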

Preprint · 2026 · arXiv

GC-ART: Global Learnable Second-Order Rational Tone Curves for Illumination Robustness

We introduce GC-ART (Global Curve Adaptive Rational Tone-mapping), a lightweight differentiable pre-processing module for robust image classification. GC-ART predicts an endpoint-pinned rational tone curve from per-channel soft histograms using a 643-parameter MLP, then applies the curve pointwise before the classifier. The module is trained end-to-end with cross-entropy and a soft monotonicity penalty. On CIFAR-10 with a CIFAR-style ResNet-18, GC-ART matches the clean accuracy of the unenhanced baseline and other learned enhancers, improves over the baseline on multiplicative darkening, and achieves the best learned-method result on contrast corruption (48.45% vs. 46.27% for the baseline and 47.13% for Zero-DCE++). These results suggest that histogram-conditioned rational curves can learn useful global tone corrections, including contrast-expanding behavior, while preserving edge locations by construction through pointwise mapping. GC-ART also uses substantially fewer FLOPs than convolutional learned enhancers at 32 × 32 resolution. The current hyperparameters are untuned, leaving room for systematic improvement.
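The abstract does not give the exact curve formula. One endpoint-pinned second-order rational family, assumed here for illustration, is f(x) = (x + a·x(1 − x)) / (1 + b·x(1 − x)), which satisfies f(0) = 0 and f(1) = 1 for any a and any b > −4. The sketch also shows a Gaussian-kernel soft histogram (the kind of input an MLP could map to a and b) and a soft monotonicity penalty; the bin count, bandwidth and penalty form are assumptions.

```python
import numpy as np

def rational_tone_curve(x, a, b):
    """Assumed endpoint-pinned second-order rational curve with f(0) = 0 and f(1) = 1.

    f(x) = (x + a*x*(1-x)) / (1 + b*x*(1-x)); since x*(1-x) <= 1/4 on [0, 1],
    b > -4 keeps the denominator positive.
    """
    t = x * (1.0 - x)
    return (x + a * t) / (1.0 + b * t)

def soft_histogram(channel, num_bins=16, bandwidth=0.05):
    """Differentiable histogram: each pixel spreads unit mass over bin centers with a Gaussian kernel."""
    centers = (np.arange(num_bins) + 0.5) / num_bins
    diffs = channel.reshape(-1, 1) - centers.reshape(1, -1)
    mass = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    mass /= mass.sum(axis=1, keepdims=True)
    return mass.mean(axis=0)  # sums to 1

def monotonicity_penalty(a, b, num_samples=256):
    """Soft penalty: sum of squared negative finite differences of f on a grid over [0, 1]."""
    xs = np.linspace(0.0, 1.0, num_samples)
    slopes = np.diff(rational_tone_curve(xs, a, b))
    return float(np.sum(np.minimum(slopes, 0.0) ** 2))

# Usage: in the described module an MLP would predict (a, b) from the histograms;
# here (a, b) are fixed by hand.
rng = np.random.default_rng(0)
dark_channel = rng.uniform(0.0, 0.3, size=(32, 32))
hist = soft_histogram(dark_channel)
brightened = rational_tone_curve(dark_channel, a=1.0, b=0.0)  # applied pointwise
```

With a = 1 and b = 0 the curve is 2x − x², which lifts every intensity while staying monotone, so the penalty is zero.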

Preprint · 2026 · arXiv

GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

Conditional generative models, particularly diffusion-based methods, have recently been applied to graph prediction by modeling the target as a conditional distribution given the input graph, yielding competitive results compared to deterministic predictors. However, existing diffusion-based prediction methods typically require expensive iterative denoising at inference and often suffer from unstable sampling, which motivates recent efforts to reduce the number of inference denoising steps and enable stable sampling via techniques such as consistency training. Despite this progress, we find that existing consistency training methods for graph prediction may fall into a shortcut solution: the model may satisfy the self-consistency constraint by ignoring the noisy target (i.e., assigning it negligible weight), ultimately collapsing into a purely deterministic predictor. To mitigate this shortcut, we propose GCCM, a graph contrastive consistency model that goes beyond isolated pairwise matching between the same target at different noise levels by introducing negative pairs into a contrastive consistency objective. This adds a separation requirement, so the shortcut solution is no longer trivially sufficient to satisfy the objective. Moreover, we apply feature perturbation to the input node/edge features to break identical conditioning on the input graph, so that the shortcut no longer yields the same predictions across noise levels and becomes less attractive. Extensive experiments on benchmark datasets demonstrate that GCCM mitigates the shortcut solution and yields consistent performance improvements in graph prediction compared to deterministic predictors.
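A contrastive consistency objective of the kind described can be sketched as an InfoNCE loss: the prediction for a sample at one noise level should match the same sample at another noise level and be separated from other samples. Using in-batch negatives, cosine similarity, a temperature of 0.1 and additive Gaussian feature perturbation are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def contrastive_consistency_loss(pred_t1, pred_t2, temperature=0.1):
    """InfoNCE over a batch: row i of pred_t1 (one noise level) should match row i of
    pred_t2 (another noise level, same target) and differ from the other rows (negatives)."""
    a = pred_t1 / np.linalg.norm(pred_t1, axis=1, keepdims=True)
    b = pred_t2 / np.linalg.norm(pred_t2, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives lie on the diagonal

def perturb_features(features, sigma, rng):
    """Assumed form of feature perturbation: additive Gaussian noise on node/edge features."""
    return features + sigma * rng.normal(size=features.shape)

rng = np.random.default_rng(0)
preds = rng.normal(size=(8, 16))
print(contrastive_consistency_loss(preds, preds))        # matched pairs: low loss
print(contrastive_consistency_loss(preds, preds[::-1]))  # mismatched pairs: high loss
```

A fully collapsed output (every row identical) scores exactly log B for a batch of size B, so collapse alone cannot drive this loss toward its minimum; that is the separation requirement in miniature.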

Preprint · 2026 · arXiv

LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification

Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploration in a continuous latent space and then switches to explicit CoT for verification and answer generation. In a training-free instantiation, LaTER projects final-layer hidden states back to the input embedding space, preserves the latent KV cache, and uses entropy and model-native stop-token probes to decide when to switch. We find that strong reasoning models already exhibit structured latent trajectories under this interface. On Qwen3-14B, training-free LaTER reduces total token usage by 16%-32% on several benchmarks while matching or improving accuracy on most of them; for example, it improves AIME 2025 from 70.0% to 73.3% while reducing tokens from 15,730 to 10,661. We further construct Latent-Switch-69K, a supervised corpus that pairs condensed solution intuitions with shortened explicit derivations. Fine-tuning with latent rollout and halting supervision yields additional gains: trained LaTER reaches 80.0% accuracy on AIME 2025, 10.0 points above the standard CoT baseline, while using 33% fewer tokens. Our code, data, and model are available at https://github.com/TioeAre/LaTER.
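The switching rule is only summarized above. The sketch below illustrates one way entropy and a stop-token probability could gate the switch from latent to explicit reasoning, plus one possible way of mapping a latent step back to the input embedding space (a probability-weighted mix of input embeddings). The thresholds, the latent-step budget, the direction of the entropy test and the projection itself are all assumptions, not LaTER's released code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def token_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution."""
    p = softmax(logits)
    return float(-np.sum(p * np.log(p + 1e-12)))

def should_switch_to_explicit(logits, stop_token_id, step, max_latent_steps=64,
                              entropy_threshold=1.0, stop_prob_threshold=0.5):
    """Hypothetical switch rule: leave latent mode when the latent budget is spent,
    when a stop-style token becomes likely, or when the model is confident (low entropy)."""
    if step >= max_latent_steps:
        return True
    if softmax(logits)[stop_token_id] >= stop_prob_threshold:
        return True
    return token_entropy(logits) <= entropy_threshold

def latent_input_embedding(logits, embedding_matrix):
    """One possible projection back to input space (assumed): the
    probability-weighted average of the input embedding rows."""
    return softmax(logits) @ embedding_matrix
```

Keeping the probe cheap (one softmax per latent step) matters because the point of latent exploration is to save decoding cost.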

Preprint · 2026 · arXiv

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

We present LongLive-2.0, an NVFP4-based parallel infrastructure spanning the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel (SP) autoregressive (AR) training, instantiated as Balanced SP, which co-designs the efficient teacher-forcing layout with SP execution by pairing clean-history and noisy-target temporal chunks on each rank, enabling a natural teacher-forcing mask with SP-aware chunked VAE encoding. Combined with NVFP4 precision, it reduces GPU memory cost and accelerates GEMM computation during training, whose share of the total cost increases as video length grows. Moreover, we show that a high-quality infrastructure and dataset enable a remarkably clean training pipeline. Unlike existing Self-Forcing series methods that rely on ODE initialization and subsequent distribution matching distillation (DMD), LongLive-2.0 directly tunes a diffusion model into a long, multi-shot, interactive AR diffusion model. It can be further converted to real-time generation (from 4 to 2 denoising steps) with standalone LoRA weights. For inference on Blackwell GPUs, we enable W4A4 NVFP4 inference, quantize the KV cache into NVFP4 for memory savings, and boost end-to-end throughput with asynchronous streaming VAE decoding. On non-Blackwell GPU architectures, we deploy SP inference to match the speed on Blackwell GPUs, while the quantized KV cache can lower the inter-GPU communication of SP. Experiments show up to 2.15x speedup in training and 1.84x in inference. LongLive-2.0-5B achieves 45.7 FPS inference while attaining strong performance on benchmarks. To our knowledge, LongLive-2.0 is the first NVFP4 training and inference system for long video generation.
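NVFP4 stores values as 4-bit E2M1 floats (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with a shared scale for each block of 16 values. The fake-quantization sketch below shows that block-scaling idea in NumPy. It keeps the per-block scale in float32 rather than FP8 (E4M3), omits the per-tensor scale, and breaks rounding ties toward the smaller magnitude, so it approximates rather than reproduces the hardware format.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4_blocks(x, block_size=16):
    """Quantize-then-dequantize a 1-D array with one scale per block of `block_size` values.

    Simplifications: the per-block scale stays in float32 (NVFP4 stores it as FP8 E4M3
    plus a per-tensor FP32 scale), and ties round toward the smaller magnitude.
    """
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Map each block's largest magnitude onto the largest E2M1 value (6.0).
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scale
    # Round every scaled magnitude to the nearest representable E2M1 value.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    dequantized = np.sign(blocks) * E2M1_GRID[idx] * scale
    return dequantized.reshape(-1)[: len(x)]

rng = np.random.default_rng(0)
weights = rng.normal(size=40)
approx = fake_quantize_fp4_blocks(weights)
print(np.abs(approx - weights).max())
```

Small blocks keep one outlier from inflating the scale of the whole tensor, which is why block-scaled FP4 remains usable for GEMMs and KV caches.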

Preprint · 2026 · arXiv

PARL: Position-Aware Relation Learning Network for Document Layout Analysis

Document layout analysis aims to detect and categorize structural elements (e.g., titles, tables, figures) in scanned or digital documents. Popular methods often rely on high-quality Optical Character Recognition (OCR) to merge visual features with extracted text. This dependency introduces two major drawbacks: propagation of text recognition errors and substantial computational overhead, limiting the robustness and practical applicability of multimodal approaches. In contrast to the prevailing multimodal trend, we argue that effective layout analysis depends not on text-visual fusion, but on a deep understanding of documents' intrinsic visual structure. To this end, we propose PARL (Position-Aware Relation Learning Network), a novel OCR-free, vision-only framework that models layout through positional sensitivity and relational structure. Specifically, we first introduce a Bidirectional Spatial Position-Guided Deformable Attention module to embed explicit positional dependencies among layout elements directly into visual features. Second, we design a Graph Refinement Classifier (GRC) to refine predictions by modeling contextual relationships through a dynamically constructed layout graph. Extensive experiments show PARL achieves state-of-the-art results. It establishes a new benchmark for vision-only methods on DocLayNet and, notably, surpasses even strong multimodal models on M6Doc. Crucially, PARL (65M) is highly efficient, using roughly four times fewer parameters than large multimodal models (256M), demonstrating that sophisticated visual structure modeling can be both more efficient and robust than multimodal fusion.

Preprint · 2026 · arXiv

Pervasive Vulnerability Analysis and Defense for QKD-based Quantum Private Query

Quantum Private Query (QPQ) based on Quantum Key Distribution (QKD) is among the most practically viable quantum communication protocols, with application value second only to QKD itself. However, prevalent security vulnerabilities in the post-processing stages of most existing QKD-based QPQ protocols have been severely overlooked. This study focuses on hidden information extraction under undetermined signal bits, revealing that most such QPQ protocols face severe security threats even without complex quantum resources. Specifically, the direct observation attack causes incremental information leakage, while the minimum error discrimination attack efficiently steals additional database information. To address these critical flaws, we propose a multi-encryption defense scheme that is compatible with existing QPQ protocols. The study demonstrates the necessity of the multi-encryption strategy for database security in QPQ, providing key theoretical and technical support for constructing practical QPQ protocols resistant to real-world attacks.

Preprint · 2026 · arXiv

Post-Training as Reweighting: A Stochastic View of Reasoning Trajectories in Language Models

Foundation models encode rich structural knowledge but often rely on post-training procedures to adapt their reasoning behavior to specific tasks. Popular approaches such as reinforcement learning with verifiable rewards (RLVR) and inference-time reward aggregation are typically analyzed from a performance perspective, leaving their effects on the underlying reasoning distribution less understood. In this work, we study post-training reasoning from a stochastic trajectory viewpoint. Following Kim et al. (2025), we model reasoning steps of varying difficulty as Markov transitions with different probabilities, and formalize reasoning processes using tree-structured Markov chains. Within this framework, pretraining corresponds to discovering the reasoning structure, while post-training primarily reweights existing chains of thought. We show that both RLVR and inference-time reward aggregation concentrate probability mass on a small number of high-probability trajectories, leading to the suppression of rare but essential reasoning paths. As a consequence, solving hard instances often depends on low-probability trajectories already present in the base model. We further prove that exploration-oriented mechanisms, such as rejecting easy instances and applying KL regularization, help preserve these rare trajectories. Empirical simulations support our theoretical analysis.
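The tree-structured Markov view can be made concrete with a toy example: a trajectory's probability is the product of its transition probabilities, and a sharpening reweighting p^α / Z with α > 1 shows how concentrating mass on likely chains suppresses a rare path. The tree, its probabilities and the power-law reweighting are hypothetical stand-ins for RLVR or reward aggregation, not the paper's formal model.

```python
# Toy reasoning tree (hypothetical probabilities). Each key is a state; its value maps
# next states to transition probabilities. States without children are terminal.
TREE = {
    "root": {"easy_step": 0.9, "hard_step": 0.1},
    "easy_step": {"common_end": 0.95, "rare_end": 0.05},
    "hard_step": {"common_end": 0.5, "rare_end": 0.5},
}

def trajectory_probs(tree, node="root", path=(), prob=1.0):
    """Enumerate root-to-leaf paths; a path's probability is the product of its transitions."""
    children = tree.get(node)
    if not children:
        return {path: prob}
    out = {}
    for child, p in children.items():
        out.update(trajectory_probs(tree, child, path + (child,), prob * p))
    return out

def sharpen(probs, alpha):
    """Power reweighting p**alpha / Z; alpha > 1 concentrates mass on likely paths."""
    weights = {k: v ** alpha for k, v in probs.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

base = trajectory_probs(TREE)
tuned = sharpen(base, alpha=4.0)
for path in base:
    print(path, round(base[path], 4), round(tuned[path], 6))
```

The path through `hard_step` to `rare_end` starts at 0.05 and is nearly erased after sharpening, mirroring the claim that solving hard instances can depend on low-probability trajectories that post-training suppresses.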

Preprint · 2026 · arXiv

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12 mathematical benchmarks. CES consistently improves average accuracy while reducing response length relative to DAPO, and supplementary experiments show similar trends on a smaller 1.5B backbone and on out-of-domain benchmarks.
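The conditional bidirectional policy can be sketched as a token-level advantage adjustment. In this illustration, tokens whose entropy is in the top 20% of a response are treated as "forking points"; the quantile, the coefficient and the additive shaping form are assumptions, not CES's published formulation.

```python
import numpy as np

def token_entropies(logits):
    """Per-token entropy (nats) for a (T, V) array of logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return -np.sum(p * np.log(p + 1e-12), axis=1)

def shape_advantages(advantages, entropies, is_correct, quantile=0.8, coef=0.1):
    """Hypothetical conditional shaping: tokens in the top (1 - quantile) share of entropy are
    treated as forking points; on a correct response their advantage is lowered (favoring
    concise, confident paths), on an incorrect one it is raised (favoring exploration)."""
    threshold = np.quantile(entropies, quantile)
    forking = entropies >= threshold
    direction = -1.0 if is_correct else 1.0
    return advantages + direction * coef * entropies * forking
```

Only the forking tokens are touched, so the adjustment steers branching decisions without rescaling the rest of the response's advantages.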

Preprint · 2026 · arXiv

When Normality Shifts: Risk-Aware Test-Time Adaptation for Unsupervised Tabular Anomaly Detection

Unsupervised tabular anomaly detection methods typically learn feature patterns from normal samples during training and subsequently identify samples that deviate from these patterns as anomalies during testing. However, in practical scenarios, the limited scale and diversity of training data often lead to an incomplete characterization of normal patterns. While test-time adaptation offers a remedy, its isolated focus on test-time optimization ignores the critical synergy with training-phase learning. Furthermore, indiscriminate adaptation to unlabeled test data inevitably triggers anomaly contamination, preventing the model from fully realizing its discriminative capability between normal and anomalous samples. To address these issues, we propose RTTAD, a Risk-aware Test-time adaptation method for unsupervised Tabular Anomaly Detection. RTTAD holistically tackles normality shifts via a synergistic two-stage mechanism. During training, collaborative dual-task learning captures multi-level representations to establish a robust normal prior. During testing, a Test-Time Contrastive Learning (TTCL) module explicitly accounts for adaptation risk by selectively updating the model using high-confidence pseudo-normal samples while constraining anomalous ones. Additionally, TTCL incorporates a k-nearest neighbor-based contrastive objective to refine embedding distributions, thereby further enhancing the model's discriminative capacity. Extensive experiments on 15 tabular datasets demonstrate that RTTAD achieves state-of-the-art overall detection performance.
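Risk-aware selection of test samples for adaptation can be illustrated with a k-nearest-neighbor anomaly score: only the lowest-scoring test samples are trusted as pseudo-normal, and the highest-scoring ones are flagged as likely anomalies. The quantile cut-offs, k = 5 and Euclidean distance are assumptions of this sketch rather than RTTAD's settings.

```python
import numpy as np

def knn_anomaly_scores(train_emb, test_emb, k=5):
    """Mean Euclidean distance from each test embedding to its k nearest training embeddings."""
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

def split_pseudo_labels(scores, normal_quantile=0.3, anomaly_quantile=0.9):
    """Hypothetical risk-aware selection: the lowest-scoring test samples are trusted as
    pseudo-normal for adaptation, the highest-scoring ones are flagged as likely anomalies
    to be constrained, and everything in between is left out of the update."""
    lo, hi = np.quantile(scores, [normal_quantile, anomaly_quantile])
    return scores <= lo, scores >= hi

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 2))
test = np.vstack([rng.normal(size=(9, 2)), [[10.0, 10.0]]])  # last row is an obvious outlier
pseudo_normal, pseudo_anomalous = split_pseudo_labels(knn_anomaly_scores(train, test))
```

Leaving the uncertain middle band out of the update is what limits anomaly contamination during adaptation.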

Preprint · 2023 · arXiv

A New Explanation of the Mechanism of Hadley Circulation

The Hadley circulation (or Hadley cell) is traditionally described as a large-scale atmospheric circulation phenomenon driven by differential heating of the Earth's surface: warm, moist air rises near the equator, diverges poleward in the upper troposphere, and subsides in the subtropics. In this article, the mechanism of the Hadley circulation is revisited and a new model is proposed to explain it. The new model is based on a form of the atmospheric dynamic equation that substitutes pressure with temperature and density, thereby categorizing weather systems into thermal and dynamic systems. This classification is useful for explaining large-scale weather systems such as the Hadley cell. The proposed explanation argues that the subtropical highs, rather than the conventionally assumed ITCZ (Intertropical Convergence Zone), are the driving force of the Hadley cell. To support this theory, we analyze the atmospheric air density flux divergence using results from the Community Earth System Model (CESM) and derive a new continuity equation by adding source/sink terms, in which evaporation serves as the air-mass source and precipitation (condensation) as the air-mass sink. The results show that the equatorial easterlies could be linked to the solar diurnal cycle, demonstrating that the trade winds can be generated by the solar diurnal cycle, especially in spring and fall, as well as by the equatorial branch of the subtropical high.
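Two relations implied by the description can be written out under stated assumptions. Substituting pressure via the ideal gas law splits the pressure-gradient force into a temperature term and a density term, and adding evaporation and precipitation to the mass budget gives a continuity equation with a source and a sink. The symbols R_d (gas constant for dry air), S_E and S_P are notation introduced here, and the paper's exact equations may differ.

```latex
% Assumed ideal-gas substitution, p = \rho R_d T, so \nabla p = R_d (T \nabla\rho + \rho \nabla T):
-\frac{1}{\rho}\nabla p = -R_d\,\nabla T - \frac{R_d T}{\rho}\,\nabla\rho

% Mass continuity with evaporation source S_E and precipitation sink S_P (both >= 0):
\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{v}) = S_E - S_P
```

Setting S_E = S_P = 0 recovers the standard source-free continuity equation.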

Preprint · 2023 · arXiv

Communication under Mixed Gaussian-Impulsive Channel: An End-to-End Framework

In many communication scenarios, communication signals simultaneously suffer from white Gaussian noise (WGN) and non-Gaussian impulsive noise (IN), i.e., mixed Gaussian-impulsive noise (MGIN). Under the MGIN channel, classical communication signal schemes and their detection methods usually cannot achieve desirable performance, as they are optimized for WGN. Moreover, because the widely adopted IN model has no general closed-form expression for its probability density function (PDF), it is extremely hard to obtain optimal signal and detection schemes from classical stochastic signal processing theory. To circumvent these difficulties, this paper proposes a data-driven end-to-end framework for communication signal design and detection under the MGIN channel. In this framework, a channel noise simulator (CNS) is designed based on an improved generative adversarial net (GAN) to simulate MGIN without requiring any analytical PDF. Meanwhile, a preprocessing network based on a multi-level wavelet convolutional neural network (MWCNN) is used to mitigate the negative effect of outliers caused by IN. Extensive simulation results verify that, compared with conventional approaches and existing end-to-end systems, the proposed end-to-end communication system achieves better bit-error rate (BER) performance under MGIN environments.
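Impulsive noise is often modeled as symmetric alpha-stable (SαS), which in general has no closed-form PDF; assuming that model, it can still be sampled directly with the Chambers-Mallows-Stuck method. The sketch below generates mixed Gaussian-impulsive noise this way and estimates the BER of BPSK with a plain sign detector, the kind of WGN-oriented baseline that degrades under MGIN. It is not the paper's GAN-based channel noise simulator, and all parameter values are illustrative.

```python
import numpy as np

def symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for standard symmetric alpha-stable noise, 0 < alpha <= 2.
    alpha = 2 gives a Gaussian (variance 2); alpha = 1 gives a standard Cauchy."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

def mixed_noise(size, sigma, gamma, alpha, rng):
    """Mixed Gaussian-impulsive noise: white Gaussian (std sigma) plus SaS impulses scaled by gamma."""
    return rng.normal(0.0, sigma, size) + gamma * symmetric_alpha_stable(alpha, size, rng)

def bpsk_ber(num_bits, sigma, gamma, alpha, rng):
    """Monte Carlo BER of BPSK with a plain sign detector (the WGN-optimal rule)."""
    bits = rng.integers(0, 2, num_bits)
    symbols = 2.0 * bits - 1.0  # 0 -> -1, 1 -> +1
    received = symbols + mixed_noise(num_bits, sigma, gamma, alpha, rng)
    detected = (received > 0).astype(int)
    return float(np.mean(detected != bits))

rng = np.random.default_rng(0)
print(bpsk_ber(100_000, sigma=0.3, gamma=0.0, alpha=1.5, rng=rng))  # Gaussian only
print(bpsk_ber(100_000, sigma=0.3, gamma=0.3, alpha=1.5, rng=rng))  # with impulses
```

Because samples are generated without ever evaluating a PDF, a sampler like this can also provide the training data a learned noise simulator would need to imitate.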

Preprint · 2023 · arXiv

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address it is to first impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may be suboptimal, as the imputation algorithm may introduce bias when modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, when computing the posterior distributions of parameters in closed form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages density transformation to model the noise distributions with simpler, specific formulations by virtue of the ANMs, and uses a likelihood-based causal discovery algorithm with a directed acyclic graph constraint. We demonstrate the flexibility of MissDAG in incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.

Preprint · 2022 · arXiv

A Hierarchical HAZOP-Like Safety Analysis for Learning-Enabled Systems

Hazard and Operability Analysis (HAZOP) is a powerful safety analysis technique with a long history in the industrial process control domain. With the increasing use of Machine Learning (ML) components in cyber-physical systems, so-called Learning-Enabled Systems (LESs), there is a recent trend of applying HAZOP-like analysis to LESs. While this shows great potential for preserving the capability of sufficient and systematic safety analysis, the novel characteristics of ML raise new technical challenges that require retrofitting the conventional HAZOP technique. In this regard, we present a new Hierarchical HAZOP-Like method for LESs (HILLS). To deal with the complexity of LESs, HILLS first does "divide and conquer" by stratifying the whole system into three levels, and then performs HAZOP on each level to identify (latent) hazards, causes, security threats and mitigations (with new nodes and guide words). Finally, HILLS attempts to link and propagate the causal relationships among the identified elements within and across the three levels via both qualitative and quantitative methods. We examine and illustrate the utility of HILLS through a case study on Autonomous Underwater Vehicles, with discussions of assumptions and extensions to real-world applications. HILLS, as a first HAZOP-like attempt on LESs that explicitly considers ML internal behaviours and their interactions with other components, not only uncovers the inherent difficulties of safety analysis for LESs but also demonstrates good potential to tackle them.

Preprint · 2022 · arXiv

Advanced Deep Networks for 3D Mitochondria Instance Segmentation

Mitochondria instance segmentation from electron microscopy (EM) images has seen notable progress since the introduction of deep learning methods. In this paper, we propose two advanced deep networks, named Res-UNet-R and Res-UNet-H, for 3D mitochondria instance segmentation from rat and human samples. Specifically, we design a simple yet effective anisotropic convolution block and deploy a multi-scale training strategy, which together boost segmentation performance. Moreover, we enhance the generalizability of the trained models on the test set by adding a denoising operation as pre-processing. In the Large-scale 3D Mitochondria Instance Segmentation Challenge at ISBI 2021, our method ranked first. Code is available at https://github.com/Limingxing00/MitoEM2021-Challenge.

Preprint · 2022 · arXiv

An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

DeepFake-based digital facial forgery threatens public media security, and when lip manipulation is used in talking face generation, fake videos become even harder to detect. Because only the lip shape is changed to match the given speech, identity-related facial features are hard to discriminate in such fake talking face videos. Combined with the lack of attention paid to the audio stream as prior knowledge, this makes detection failures on fake talking face videos inevitable. Inspired by the decision-making mechanism of the human multisensory perception system, in which auditory information enhances post-sensory visual evidence to support informed decisions, this study proposes FTFDNet, a fake talking face detection framework that incorporates audio and visual representations to detect fake talking face videos more accurately. Furthermore, an audio-visual attention mechanism (AVAM) is proposed to discover more informative features; as a module, it can be seamlessly integrated into any audio-visual CNN architecture. With the additional AVAM, the proposed FTFDNet achieves better detection performance on the established dataset (FTFDD). The evaluation shows excellent performance on fake talking face video detection, with a detection rate above 97%.

Preprint · 2022 · arXiv

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

Talking face generation, which has great practical significance, has attracted increasing attention in recent audio-visual studies. Achieving accurate lip synchronization is a long-standing challenge that requires further investigation. Motivated by xxx, in this paper, an AttnWav2Lip model is proposed by incorporating a spatial attention module and a channel attention module into the lip-syncing strategy. Rather than focusing on unimportant regions of the face image, the proposed AttnWav2Lip model pays more attention to lip region reconstruction. To our limited knowledge, this is the first attempt to introduce an attention mechanism into the scheme of talking face generation. Extensive experiments have been conducted to evaluate the effectiveness of the proposed model. Compared with the baseline, as measured by the LSE-D and LSE-C metrics, superior performance is demonstrated on the benchmark lip synthesis datasets LRW, LRS2 and LRS3.

Preprint · 2022 · arXiv

Audio-visual speech separation based on joint feature representation with cross-modal attention

Multi-modal speech separation has exhibited a distinct advantage in isolating the target speaker in multi-talker noisy environments. Unfortunately, most current separation strategies prefer a straightforward fusion based on feature learning of each single modality, which falls far short of sufficiently considering the inter-relationships between modalities. Inspired by learning joint feature representations from audio and visual streams with an attention mechanism, this study proposes a novel cross-modal fusion strategy that benefits the whole framework with semantic correlations between different modalities. To further improve audio-visual speech separation, the dense optical flow of lip motion is incorporated to strengthen the robustness of the visual representation. The proposed work is evaluated on two public audio-visual speech separation benchmark datasets. The overall performance improvement demonstrates that the additional motion network effectively enhances the visual representation of the combined lip images and audio signal, and that the proposed cross-modal fusion outperforms the baseline on all metrics.

Preprint · 2022 · arXiv

Augmentation-Free Graph Contrastive Learning with Performance Guarantee

Graph contrastive learning (GCL) is the most representative and prevalent self-supervised learning approach for graph-structured data. Despite its remarkable success, existing GCL methods rely heavily on an augmentation scheme to learn representations invariant across different augmentation views. In this work, we revisit this convention in GCL by examining the effect of augmentation techniques on graph data through the lens of spectral theory. We found that graph augmentations preserve the low-frequency components and perturb the middle- and high-frequency components of the graph, which contributes to the success of GCL algorithms on homophilic graphs but hinders their application to heterophilic graphs, due to the high-frequency preference of heterophilic data. Motivated by this, we propose a novel, theoretically principled, augmentation-free GCL method, named AF-GCL, that (1) leverages the features aggregated by a Graph Neural Network to construct the self-supervision signal instead of augmentations and therefore (2) is less sensitive to the degree of graph homophily. Theoretically, we present the performance guarantee for AF-GCL as well as an analysis for understanding its efficacy. Extensive experiments on 14 benchmark datasets with varying degrees of heterophily show that AF-GCL achieves competitive or better performance on homophilic graphs and outperforms all existing state-of-the-art GCL methods on heterophilic graphs with significantly less computational overhead.
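The spectral argument can be illustrated by splitting a node signal into low- and high-frequency parts with the eigenvectors of the symmetric normalized Laplacian L = I − D^{-1/2} A D^{-1/2}, whose eigenvalues lie in [0, 2]. The cutoff of 1.0 is an arbitrary choice for this sketch. On a 4-cycle, a constant (homophilic) signal is purely low-frequency, while an alternating (heterophilic) signal is purely high-frequency.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard against isolated nodes
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def frequency_split(adj, signal, cutoff=1.0):
    """Split a node signal into low- and high-frequency parts via the graph Fourier basis."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    coeffs = eigvecs.T @ signal                      # graph Fourier transform
    low = eigvecs @ (coeffs * (eigvals < cutoff))    # smooth components
    high = eigvecs @ (coeffs * (eigvals >= cutoff))  # oscillating components
    return low, high

# 4-cycle: a constant signal is homophilic, an alternating one is heterophilic.
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
homophilic = np.ones(4)
heterophilic = np.array([1.0, -1.0, 1.0, -1.0])
```

An augmentation that preserves low frequencies and perturbs high ones leaves the first signal intact but damages the second, which is the intuition behind the homophily/heterophily gap described above.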

preprint2022arXiv

Auto-scaling Vision Transformers without Training

This work targets the automated design and scaling of Vision Transformers (ViTs). The motivation comes from two pain points: 1) the lack of efficient and principled methods for designing and scaling ViTs; 2) the tremendous computational cost of training ViTs, which is much heavier than that of their convolutional counterparts. To tackle these issues, we propose As-ViT, an auto-scaling framework for ViTs without training, which automatically discovers and scales up ViTs in an efficient and principled manner. Specifically, we first design a "seed" ViT topology through a training-free search process. This extremely fast search is made possible by a comprehensive study of ViT's network complexity, which yields a strong Kendall-tau correlation with ground-truth accuracies. Second, starting from the "seed" topology, we automate the scaling rule for ViTs by growing the widths/depths of different ViT layers. This yields a series of architectures with different numbers of parameters in a single run. Finally, based on the observation that ViTs can tolerate coarse tokenization in early training stages, we propose a progressive tokenization strategy to train ViTs faster and more cheaply. As a unified framework, As-ViT achieves strong performance on classification (83.5% top-1 on ImageNet-1k) and detection (52.7% mAP on COCO) without any manual crafting or scaling of ViT architectures: the end-to-end model design and scaling process costs only 12 hours on one V100 GPU. Our code is available at https://github.com/VITA-Group/AsViT.
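A Kendall-tau correlation, as used above to validate the training-free search, measures how consistently a cheap proxy ranks candidate architectures relative to their ground-truth accuracies. The following is a minimal from-scratch sketch (tau-a, without tie correction); the proxy scores and accuracies are made-up placeholders, not values from the paper, and As-ViT's actual complexity measure is not reproduced.

```python
from itertools import combinations

def kendall_tau(proxy_scores, accuracies):
    """Kendall tau-a rank correlation between two equally long score lists.

    Counts concordant minus discordant pairs over all n*(n-1)/2 pairs;
    tied pairs contribute zero (no tie correction).
    """
    if len(proxy_scores) != len(accuracies):
        raise ValueError("inputs must have the same length")
    n = len(proxy_scores)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (proxy_scores[i] - proxy_scores[j]) * (accuracies[i] - accuracies[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical proxy scores vs. accuracies of 5 candidate topologies
proxy = [0.12, 0.40, 0.33, 0.90, 0.75]
acc = [71.0, 74.5, 73.9, 78.2, 78.9]
print(kendall_tau(proxy, acc))  # 9 concordant, 1 discordant pair -> 0.8
```

A value near 1 means the proxy can replace training for ranking; in practice `scipy.stats.kendalltau` (which applies tie correction) would normally be used instead of a hand-written loop.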

preprint2022arXiv

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

PAC-Bayes is an analysis framework in which the training error can be expressed as the weighted average of the hypotheses in the posterior distribution while incorporating prior knowledge. In addition to being a pure generalization-bound analysis tool, a PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, making it a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we refer to training a probabilistic neural network with objectives derived from PAC-Bayesian bounds as {\it PAC-Bayesian learning}. Despite its empirical success, the theoretical analysis of PAC-Bayesian learning for neural networks has rarely been explored. This paper proposes a new class of convergence and generalization analysis for PAC-Bayes learning when it is used to train over-parameterized neural networks by gradient descent. For a wide probabilistic neural network, we show that when PAC-Bayes learning is applied, the convergence result corresponds to solving kernel ridge regression with the probabilistic neural tangent kernel (PNTK) as its kernel. Based on this finding, we further characterize a uniform PAC-Bayesian generalization bound that improves over the Rademacher complexity-based bound for non-probabilistic neural networks. Finally, drawing on insights from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is proven to be time-saving.
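Kernel ridge regression, to which the convergence result is reduced, has the closed form $\alpha = (K + \lambda I)^{-1} y$ with predictions $f(x) = k(x)^\top \alpha$. The sketch below is a generic illustration only: it assumes an RBF kernel as a stand-in for the PNTK (which is not implemented here) and a toy 1-D regression target.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Stand-in kernel (assumption); the paper's PNTK would replace this function.
    sq_dists = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def kernel_ridge_fit(x_train, y_train, ridge=1e-3, length_scale=1.0):
    k = rbf_kernel(x_train, x_train, length_scale)
    # Solve (K + ridge * I) alpha = y rather than forming the inverse explicitly.
    return np.linalg.solve(k + ridge * np.eye(len(x_train)), y_train)

def kernel_ridge_predict(x_new, x_train, alpha, length_scale=1.0):
    return rbf_kernel(x_new, x_train, length_scale) @ alpha

x = np.linspace(0.0, 3.0, 20)[:, None]
y = np.sin(x[:, 0])
alpha = kernel_ridge_fit(x, y, ridge=1e-6)
pred = kernel_ridge_predict(x, x, alpha)
print(np.max(np.abs(pred - y)))  # training residual; equals ridge * |alpha| elementwise
```

On the training inputs the residual is exactly $-\lambda\alpha$, so the ridge strength directly controls how closely the fit interpolates the data.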

preprint2022arXiv

Enhancing Adversarial Training with Second-Order Statistics of Weights

Adversarial training has been shown to be one of the most effective approaches to improving the robustness of deep neural networks. It is formalized as a min-max optimization over model weights and adversarial perturbations, where the weights can be optimized through gradient descent methods such as SGD. In this paper, we show that treating model weights as random variables allows adversarial training to be enhanced through \textbf{S}econd-Order \textbf{S}tatistics \textbf{O}ptimization (S$^2$O) with respect to the weights. By relaxing a common (but unrealistic) assumption of previous PAC-Bayesian frameworks that all weights are statistically independent, we derive an improved PAC-Bayesian adversarial generalization bound, which suggests that optimizing second-order statistics of weights can effectively tighten the bound. In addition to this theoretical insight, we conduct an extensive set of experiments showing that S$^2$O not only improves the robustness and generalization of the trained neural networks when used in isolation, but also integrates easily into state-of-the-art adversarial training techniques such as TRADES, AWP, MART, and AVMixup, yielding a measurable improvement for these techniques. The code is available at \url{https://github.com/Alexkael/S2O}.

preprint2022arXiv

Extreme Continuous Treatment Effects: Measures, Estimation and Inference

This paper concerns estimation and inference for treatment effects in the deep tails of the counterfactual distribution of unobservable potential outcomes corresponding to a continuously valued treatment. We consider two measures of the deep-tail characteristics: the extreme quantile function and the tail mean function, defined as the conditional mean beyond a quantile level. We then define the extreme quantile treatment effect (EQTE) and the extreme average treatment effect (EATE), which can be identified through the commonly adopted unconfoundedness condition and estimated with the aid of extreme value theory. Our limiting theory covers the EQTE and EATE processes indexed by a set of quantile levels and hence facilitates uniform inference. Simulations suggest that our method works well in finite samples, and an empirical application illustrates its practical merit.
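Extreme value theory typically enters through tail-index estimation and extrapolation beyond the sample range. As a generic building block, and not necessarily the estimator used in this paper, the sketch below combines the Hill estimator of the tail index $\gamma$ with Weissman's extreme-quantile extrapolation, and computes the tail mean $q/(1-\gamma)$ that holds for an exact Pareto tail with $\gamma < 1$. It ignores covariates and treatment weighting entirely; the deterministic Pareto sample, n and k are illustrative assumptions.

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    top = x[n - k:]              # k largest order statistics
    threshold = x[n - k - 1]     # X_(n-k), the (k+1)-th largest value
    return np.mean(np.log(top) - np.log(threshold))

def weissman_quantile(sample, k, tail_prob):
    """Extrapolated quantile at level 1 - tail_prob, possibly beyond the sample range."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    gamma = hill_estimator(x, k)
    return x[n - k - 1] * (k / (n * tail_prob)) ** gamma

def tail_mean(quantile, gamma):
    """E[X | X > q] for an exact Pareto tail with index gamma < 1."""
    if gamma >= 1:
        raise ValueError("tail mean is infinite for gamma >= 1")
    return quantile / (1.0 - gamma)

# Deterministic Pareto sample with true gamma = 0.5: x_i = (1 - u_i)^(-0.5)
n = 10_000
u = np.arange(1, n + 1) / (n + 1)
sample = (1.0 - u) ** -0.5
gamma_hat = hill_estimator(sample, k=500)
q_hat = weissman_quantile(sample, k=500, tail_prob=1e-5)   # true value is 1e-5 ** -0.5
print(gamma_hat, q_hat, tail_mean(q_hat, gamma_hat))
```

The target level $1 - 10^{-5}$ lies far beyond the largest of the 10,000 observations, which is exactly the regime where empirical quantiles fail and extrapolation is needed.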

preprint2022arXiv

Filtering electrons by mode coupling in finite semiconductor superlattices

Electron transmission through semiconductor superlattices is studied with the transfer matrix method and resonance theory. The formation of band-pass electron transmission is ascribed to the coupling of different modes in semiconductor superlattices with a symmetric unit cell. Under the Fabry-Pérot resonance condition, Bloch modes and two other resonant modes are identified as being related to the nature of the superlattice and of its unit cell, respectively. The bands related to the unit cell and to the superlattice overlap spontaneously in the tunneling region because of the shared wells, and the coupling of perfect resonances results in band-pass tunneling. Our findings provide a promising way to study electronic systems with more complicated superlattices, or even optical systems with photonic crystals.
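A minimal sketch of the transfer matrix method for a one-dimensional stack of piecewise-constant potential layers, assuming a single effective mass throughout (a simplification that real heterostructures may violate) and units with $\hbar^2/2m = 1$. Each layer propagates the pair $(\psi, \psi')$, and reflection and transmission follow from matching plane waves in the two leads. The barrier and well parameters are hypothetical, not the structures studied in the paper.

```python
import numpy as np

def layer_matrix(k, d):
    """Propagate (psi, dpsi/dx) across a layer of constant wave number k and width d."""
    kd = k * d
    return np.array([[np.cos(kd), np.sin(kd) / k],
                     [-k * np.sin(kd), np.cos(kd)]], dtype=complex)

def transmission(energy, layers, v_lead=0.0):
    """Transmission and reflection probabilities for a stack of (potential, width) layers.

    Units with hbar^2 / (2 m) = 1 and one effective mass everywhere, so psi and
    dpsi/dx are continuous at every interface. Both leads share the potential
    v_lead; energy must exceed v_lead and differ from every layer potential.
    """
    k_lead = np.sqrt(complex(energy - v_lead))
    m = np.eye(2, dtype=complex)
    for potential, width in layers:
        k = np.sqrt(complex(energy - potential))  # imaginary inside barriers when E < V
        m = layer_matrix(k, width) @ m            # later layers act on the left
    # Left lead: psi = e^{ikx} + r e^{-ikx}; right lead: psi = t e^{ikx}.
    a = m[0, 0] + 1j * k_lead * m[0, 1]
    b = m[0, 0] - 1j * k_lead * m[0, 1]
    c = m[1, 0] + 1j * k_lead * m[1, 1]
    d = m[1, 0] - 1j * k_lead * m[1, 1]
    r = (c - 1j * k_lead * a) / (1j * k_lead * b - d)
    t = a + b * r
    return abs(t) ** 2, abs(r) ** 2

barrier, well = (0.3, 1.0), (0.0, 2.0)          # hypothetical (potential, width) pairs
superlattice = [barrier, well] * 4 + [barrier]  # 5 barriers, 4 wells
for e in (0.05, 0.1, 0.2, 0.4):
    t_prob, r_prob = transmission(e, superlattice)
    print(f"E = {e:.2f}: T = {t_prob:.4f}, T + R = {t_prob + r_prob:.6f}")
```

Sweeping the energy on a fine grid would reveal the transmission bands; the T + R = 1 check is a cheap sanity test of the matrix product, and a single barrier reproduces the textbook rectangular-barrier formula.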

preprint2022arXiv

High-throughput decoder of quasi-cyclic LDPC codes with limited precision for continuous-variable quantum key distribution systems

Mbps-level secret key rates have been demonstrated for continuous-variable quantum key distribution (CV-QKD) systems, but real-time postprocessing has not been possible, being restricted by the throughput of error-correction decoding in postprocessing. In this paper, a high-throughput FPGA-based quasi-cyclic LDPC decoder is proposed and implemented to support, for the first time, Mbps-level real-time secret key generation for CV-QKD. A residual bit error correction algorithm is used to solve the problem of the high frame error rate (FER) caused by the limited precision of the decoder. Specifically, real-time high-speed decoding for CV-QKD systems with typical code rates of 0.2 and 0.1 is implemented on a commercial FPGA, achieving throughputs of 360.92 Mbps and 194.65 Mbps, respectively, which can support real-time secret key rates of 17.97 Mbps and 2.48 Mbps at typical transmission distances of 25 km and 50 km, correspondingly. The proposed method paves the way for high-rate real-time CV-QKD deployment in secure metropolitan area networks.
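Quasi-cyclic LDPC codes are typically specified by a small base (exponent) matrix whose entries expand into z x z circulant permutation blocks, with -1 denoting an all-zero block; this block structure is what lets hardware decoders process many rows in parallel. A minimal sketch of the expansion follows; the 2 x 4 base matrix and lifting size z = 4 are made up, not the codes used in the paper.

```python
import numpy as np

def expand_qc_matrix(base, z):
    """Expand an exponent matrix into a binary quasi-cyclic parity-check matrix.

    Entry -1 -> z x z zero block; entry s >= 0 -> identity cyclically shifted by s columns.
    """
    base = np.asarray(base)
    rows, cols = base.shape
    h = np.zeros((rows * z, cols * z), dtype=np.uint8)
    identity = np.eye(z, dtype=np.uint8)
    for i in range(rows):
        for j in range(cols):
            shift = base[i, j]
            if shift >= 0:
                h[i * z:(i + 1) * z, j * z:(j + 1) * z] = np.roll(identity, shift % z, axis=1)
    return h

base = [[0, 1, -1, 2],      # hypothetical exponent matrix
        [3, -1, 0, 1]]
h = expand_qc_matrix(base, z=4)
print(h.shape)              # (8, 16)
print(h.sum(axis=1))        # each row weight equals the number of non-negative base entries
```

Because every block is a permutation, a decoder only needs to store the shift values, and all z rows of a block row can be processed simultaneously.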

preprint2022arXiv

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

Recently, masked image modeling (MIM) has offered a new methodology for self-supervised pre-training of vision transformers. A key idea for an efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), although hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties for formulating vision inputs. In this paper, we offer a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT) that enjoys both high efficiency and good performance in MIM. The key is to remove the unnecessary "local inter-unit operations", deriving structurally simple hierarchical vision transformers in which mask units can be serialized as in plain vision transformers. For this purpose, we start with Swin Transformer and (i) set the masking unit size to the token size in the main stage of Swin Transformer, (ii) switch off inter-unit self-attention before the main stage, and (iii) eliminate all operations after the main stage. Empirical studies demonstrate the advantageous performance of HiViT in fully supervised, self-supervised, and transfer learning. In particular, when running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B, and the performance gain generalizes to the downstream tasks of detection and segmentation. Code will be made publicly available.

preprint2022arXiv

Isogeometric analysis of diffusion problems on random surfaces

In this article, we discuss the numerical solution of diffusion equations on random surfaces within the isogeometric framework. We describe in detail how diffusion problems on random surfaces can be modelled and how quantities of interest may be derived. In particular, we employ a low-rank approximation algorithm, based on an online singular value decomposition, for the high-dimensional space-time correlation of the random solution, cf. [7]. Extensive numerical studies are performed to validate the approach. In particular, we consider complex computational geometries originating from surface triangulations. These can be recast into the isogeometric context by transforming them into quadrangulations using the procedure from [41], followed by an approximation with NURBS surfaces.

preprint2022arXiv

Knowledge Graph Based Waveform Recommendation: A New Communication Waveform Design Paradigm

Traditionally, a communication waveform is designed case by case by experts based on communication theory and their experience, which is usually laborious and time-consuming. In this paper, we investigate waveform design from a novel perspective and propose a new waveform design paradigm built on a knowledge graph (KG)-based intelligent recommendation system. The proposed paradigm aims to improve design efficiency through structural characterization and representation of existing waveforms and intelligent use of the knowledge learned from them. To achieve this goal, we first build a communication waveform knowledge graph (CWKG) with a first-order neighbor node, in which both the structured semantic knowledge and the numerical parameters of a waveform are integrated by representation learning. Based on the developed CWKG, we further propose an intelligent communication waveform recommendation system (CWRS) to generate waveform candidates. In the CWRS, an improved involution1D operator, which is channel-agnostic and space-specific, is introduced for feature extraction according to the characteristics of the KG-based waveform representation, and multi-head self-attention is adopted to weigh the influence of the various components for feature fusion. Meanwhile, multilayer perceptron-based collaborative filtering is used to evaluate the degree of matching between the requirement and a waveform candidate. Simulation results show that the proposed CWKG-based CWRS can automatically recommend waveform candidates with high reliability.

preprint2022arXiv

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scene understanding. Owing to their respective characteristics, independently designed architectures have been widely used for each single task. This may make the learned representations task-specific and inevitably limits the generalization ability of features based on multi-modal modeling. Recent studies have shown that establishing cross-modal relationships between the auditory and visual streams is a promising solution to the challenge of audio-visual multi-task learning. Therefore, to bridge the multi-modal associations in audio-visual tasks, this study proposes a unified framework that achieves target speaker detection and speech enhancement through joint learning of audio-visual modeling.

preprint2022arXiv

On the Equivalence between Neural Network and Support Vector Machine

Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, an infinite-width NN trained by gradient descent with an infinitesimally small learning rate is equivalent to kernel regression with the NTK \citep{arora2019exact}. However, the equivalence is currently known only for ridge regression \citep{arora2019harnessing}, while the equivalence between NNs and other kernel machines (KMs), e.g., the support vector machine (SVM), remains unknown. Therefore, in this work, we propose to establish the equivalence between NN and SVM, specifically between the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent. Our main theoretical results include establishing the equivalences between NNs and a broad family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained by such regularized loss functions is approximately a KM. Furthermore, we demonstrate that our theory enables three practical applications, including (i) a \textit{non-vacuous} generalization bound of NN via the corresponding KM; (ii) a \textit{non-trivial} robustness certificate for the infinite-width NN (while existing robustness verification methods would provide vacuous bounds); (iii) intrinsically more robust infinite-width NNs than those from previous kernel regression. Our code for the experiments is available at \url{https://github.com/leslie-CH/equiv-nn-svm}.

preprint2022arXiv

Playing Tic-Tac-Toe Games with Intelligent Single-pixel Imaging

Single-pixel imaging (SPI) is a novel optical imaging technique that replaces a two-dimensional pixelated sensor with a single-pixel detector and patterned illumination. SPI has been used extensively for various tasks related to image acquisition and processing. In this work, a novel non-image-based task, playing Tic-Tac-Toe games interactively, is merged into the SPI framework. An optoelectronic artificial intelligence (AI) player with minimal digital computation can detect the game states, generate optimal moves and display output results, mainly through pattern illumination and single-pixel detection. Simulated and experimental results demonstrate the feasibility of the proposed scheme and its unbeatable performance against human players.
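The "optimal moves" such a player must produce can be computed digitally with plain minimax search. The sketch below is a conventional software reference, not the optoelectronic scheme of the paper; the board encoding (a 9-character string of "X", "O" and spaces, read row by row) is an assumption.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value from X's point of view (+1 win, 0 draw, -1 loss) with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)

def best_move(board, player):
    """Index of an optimal empty cell for `player` (first one found on ties)."""
    nxt = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    value = lambda i: minimax(board[:i] + player + board[i + 1:], nxt)
    return max(moves, key=value) if player == "X" else min(moves, key=value)

print(minimax(" " * 9, "X"))        # perfect play from the empty board is a draw: 0
print(best_move("XX OO    ", "X"))  # X completes the top row: 2
```

Memoizing on the board string keeps the search to a few thousand distinct positions, so this exhaustive reference is cheap enough to check any move the optical player outputs.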

preprint2022arXiv

Propagation Path Loss Models in Forest Scenario at 605 MHz

When signals propagate through forest areas, they are affected by environmental factors such as vegetation, and different types of environments influence signal attenuation differently. This paper analyzes the existing classical propagation path loss models and the model with excess loss caused by forest areas, and then proposes a new short-range wireless channel propagation model that can be applied to different types of forest environments. We conducted continuous-wave measurements at a center frequency of 605 MHz on predetermined routes in distinct types of forest areas and recorded the reference signal received power. We then fit various path loss models to the measured data for different vegetation types and distributions. Simulation results show that the proposed model has substantially smaller fitting errors, with reasonable computational complexity, compared with representative traditional counterparts.
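As a baseline for the fitting step, the classical log-distance model $PL(d) = PL(d_0) + 10 n \log_{10}(d/d_0)$ can be fitted to measured path loss by linear least squares. This is only the textbook model, not the new forest model proposed in the paper; the reference distance $d_0 = 1$ m and the synthetic data, including the 4 dB shadowing, are assumptions.

```python
import numpy as np

def fit_log_distance(distances_m, path_loss_db, d0=1.0):
    """Least-squares fit of PL(d) = PL0 + 10*n*log10(d/d0); returns (PL0, n, rmse)."""
    pl = np.asarray(path_loss_db, dtype=float)
    x = 10.0 * np.log10(np.asarray(distances_m, dtype=float) / d0)
    design = np.column_stack([np.ones_like(x), x])
    (pl0, n), *_ = np.linalg.lstsq(design, pl, rcond=None)
    residual = pl - design @ np.array([pl0, n])
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return pl0, n, rmse

rng = np.random.default_rng(0)
d = np.linspace(10, 500, 50)                        # hypothetical route distances (m)
pl_true = 28.1 + 10 * 3.2 * np.log10(d)             # hypothetical PL0 = 28.1 dB, n = 3.2
measured = pl_true + rng.normal(0.0, 4.0, d.size)   # assumed 4 dB log-normal shadowing
print(fit_log_distance(d, measured))
```

The RMSE returned here is the kind of fitting error on which competing models, such as forest excess-loss models, can be compared on the same measurement route.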

preprint2022arXiv

Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance

The increasing use of Machine Learning (ML) components embedded in autonomous systems, so-called Learning-Enabled Systems (LESs), has resulted in a pressing need to assure their functional safety. As for traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically, assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM, and propose solutions for practical use. Probabilistic safety argument templates at the lower ML component level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also scope our methods with case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.

preprint2022arXiv

Secure two-way fiber-optic time transfer against sub-ns asymmetric delay attack

Two-way fiber-optic time transfer is a promising precise time synchronization technique with sub-nanosecond accuracy. However, the asymmetric delay attack is a serious threat that cannot be prevented by any encryption method. In this paper, a dynamic-model-based scheme is proposed to defend against sub-nanosecond asymmetric delay attacks. A detection threshold is set according to the time difference estimated by a two-state clock model, in which the contribution of the fixed frequency difference is excluded from the time difference; this makes it possible to detect asymmetric delay attacks that are smaller than the time difference induced by the fixed frequency difference. Theoretical simulation and an experimental demonstration are carried out to prove the feasibility of the scheme. Experimentally, a two-way fiber-optic time transfer system achieves time stabilities of 24.5 ps, 3.98 ps, and 2.95 ps at averaging times of 1 s, 10 s, and 100 s under a sub-ns asymmetric delay attack. The proposed method provides a promising secure technique for sub-ns precise time synchronization against asymmetric delay attacks.
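Why encryption cannot help follows from the standard two-way time transfer equations. With timestamps t1 (A sends), t2 (B receives), t3 (B sends) and t4 (A receives), the offset estimate ((t2 - t1) - (t4 - t3))/2 assumes equal forward and backward delays, so an attacker who adds an extra delay δ to one direction shifts the estimate by δ/2 without touching the message contents. The sketch below uses hypothetical numbers and does not implement the paper's two-state clock model or detection threshold.

```python
def two_way_offset(t1, t2, t3, t4):
    """Clock offset of B relative to A, assuming symmetric path delays."""
    return ((t2 - t1) - (t4 - t3)) / 2.0

def simulate(offset_ps, delay_ps, attack_ps=0.0):
    """Timestamps (ps) for one exchange; the attack adds delay only on the A->B path.

    B's clock reads A's clock plus offset_ps; t1 and t4 are read on A's clock,
    t2 and t3 on B's clock.
    """
    t1 = 0.0
    t2 = t1 + delay_ps + attack_ps + offset_ps   # arrival at B, read on B's clock
    t3 = t2 + 1_000.0                            # B replies 1 ns later (B's clock)
    t4 = t3 - offset_ps + delay_ps               # arrival at A, read on A's clock
    return t1, t2, t3, t4

# Hypothetical: 250 ps true offset, 50 us one-way fiber delay, 400 ps attack
print(two_way_offset(*simulate(250.0, 5.0e7)))                   # 250.0
print(two_way_offset(*simulate(250.0, 5.0e7, attack_ps=400.0)))  # 450.0: biased by 400/2
```

Since the bias looks exactly like a genuine offset change, detection has to come from comparing the estimate against a model of the clocks' expected behaviour, which is the role of the clock model in the scheme above.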

preprint2022arXiv

Strong Neel ordering and luminescence correlation in a two-dimensional antiferromagnet

Magneto-optical effects have been widely used in light modulation, optical sensing and information storage. Recently discovered two-dimensional (2D) van der Waals layered magnets are considered promising platforms for investigating novel magneto-optical phenomena and devices, owing to their long-range magnetic ordering down to atomic thickness, rich variety of species and tunable properties. However, most 2D antiferromagnets suffer from low luminescence efficiency, which hinders their magneto-optical investigation and application. Here, we uncover strong light-magnetic ordering interactions in the 2D antiferromagnet MnPS3 using a newly emerged near-infrared photoluminescence (PL) mode far below its intrinsic bandgap. This in-gap PL mode shows a strong correlation with the Neel ordering and persists down to monolayer thickness. Combining DFT calculations, STEM and XPS, we illustrate the origin of the PL mode and its correlation with Neel ordering, which can be attributed to oxygen ion-mediated states. Moreover, the PL strength can be further tuned and enhanced by ultraviolet-ozone treatment. Our studies offer an effective approach to investigating light-magnetic ordering interactions in 2D antiferromagnetic semiconductors.

preprint2022arXiv

Terahertz Receiver based on Room-Temperature Rydberg-Atoms

Realizing practical terahertz (THz) wireless communications still faces many challenges, and a receiver with high sensitivity is important for THz wireless communications. Here we demonstrate a terahertz receiver based on cesium Rydberg atoms in a room-temperature vapor cell. The minimum detectable THz electric field is calibrated. With this receiver, phase-sensitive conversion of amplitude-modulated or frequency-modulated terahertz waves into optical signals is performed. The results show that the atomic receiver has many advantages owing to its quantum properties. In particular, long-distance THz wireless communication is achievable with this receiver. Furthermore, the atomic receiver can be used in a THz wireless-to-optical link.

preprint2022arXiv

Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective

Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to become indistinguishable as more layers are stacked. Theoretical research on deep GCNs to date has focused primarily on expressive power rather than on trainability, an optimization perspective. Compared with expressivity, trainability addresses a more fundamental question: given a sufficiently expressive space of models, can we find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory of wide GCNs under gradient descent. We formulate the asymptotic behavior of the GNTK at large depth, which reveals that the trainability of wide and deep GCNs drops at an exponential rate during optimization. Additionally, we extend our theoretical framework to analyze residual connection-based techniques, which are found to mitigate the exponential decay of trainability only mildly. Inspired by these theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms that the proposed method achieves better results than relevant counterparts in both the infinite-width and finite-width settings.

preprint2022arXiv

Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need a lower bound on the minimal expected reward of an arm as an input, and its performance does not scale inversely with the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments showing that the proposed algorithm also outperforms state-of-the-art algorithms in practice.

preprint2021arXiv

Detecting Operational Adversarial Examples for Reliable Deep Learning

The use of Deep Learning (DL) raises new challenges regarding its dependability in critical applications. Sound verification and validation methods are needed to assure the safe and reliable use of DL. However, state-of-the-art debug testing methods for DL that aim at detecting adversarial examples (AEs) ignore the operational profile, which statistically depicts the software's future operational use. This may lead to very modest effectiveness in improving the software's delivered reliability, as the testing budget is likely to be wasted on detecting AEs that are unrealistic or encountered very rarely in real-life operation. In this paper, we first present the novel notion of "operational AEs", which are AEs that have a relatively high chance of being seen in future operation. We then provide an initial design of a new DL testing method to efficiently detect "operational AEs", as well as some insights into our prospective research plan.

preprint2021arXiv

Electronic controllable broadband and robust terahertz surface plasmon-polaritons switch based on hybrid ITO waveguide coupler

The surface plasmon-polariton (SPP) switch is a key element of integrated devices for optical computation and terahertz (THz) communications. In this paper, we propose a novel design of a THz SPP switch based on quantum engineering. Owing to the robustness of the coherent quantum control technique, our switch is very robust against perturbations of geometrical parameters and performs well in both the on-state and the off-state from 0.5 THz to 0.7 THz. The on-state and off-state of our device can be controlled by an external voltage. We believe this finding will be a significant improvement for integrated optical computing and THz communications.

preprint2021arXiv

Machine learning topological invariants of non-Hermitian systems

The study of topological properties by machine learning approaches has attracted considerable interest recently. Here we propose machine learning of the topological invariants that are unique to non-Hermitian systems. Specifically, we train neural networks to predict the winding of eigenvalues of four prototypical non-Hermitian Hamiltonians on the complex energy plane with nearly $100\%$ accuracy. Our demonstrations in the non-Hermitian Hatano-Nelson model, the Su-Schrieffer-Heeger model and the generalized Aubry-André-Harper model in one dimension, and a two-dimensional Dirac fermion model with non-Hermitian terms, show the capability of the neural networks to explore topological invariants and the associated topological phase transitions and topological phase diagrams in non-Hermitian systems. Moreover, neural networks trained on a small data set in the phase diagram can successfully predict topological invariants in untouched phase regions. Thus, our work paves the way for revealing non-Hermitian topology with the machine learning toolbox.
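For the Hatano-Nelson model, the winding that the networks learn to predict can be computed directly as $W = \frac{1}{2\pi}\oint dk\, \partial_k \arg[E(k) - E_b]$ around a base energy $E_b$. A minimal numerical sketch, assuming the Bloch dispersion $E(k) = t_R e^{-ik} + t_L e^{ik}$ (the sign of W depends on this convention) and $E_b = 0$ by default; it generates labels of the kind a network could be trained on, not the network itself.

```python
import numpy as np

def hatano_nelson_energy(k, t_right, t_left):
    # Bloch energy of the Hatano-Nelson chain under the convention stated above.
    return t_right * np.exp(-1j * k) + t_left * np.exp(1j * k)

def spectral_winding(t_right, t_left, base_energy=0.0, n_k=2001):
    """Winding number of E(k) - base_energy as k runs once around the Brillouin zone."""
    k = np.linspace(0.0, 2.0 * np.pi, n_k)   # closed loop: first and last points coincide
    z = hatano_nelson_energy(k, t_right, t_left) - base_energy
    # Phase increments wrapped to (-pi, pi]; their sum is 2*pi times the winding.
    dphi = np.angle(z[1:] / z[:-1])
    return int(np.round(dphi.sum() / (2.0 * np.pi)))

print(spectral_winding(1.0, 0.5))   # right hopping dominates: -1 in this convention
print(spectral_winding(0.5, 1.0))   # left hopping dominates: +1
```

The spectrum is an ellipse in the complex plane; W changes sign when $|t_R| - |t_L|$ does, and the point $t_R = t_L$ (a Hermitian, line-like spectrum through $E_b = 0$) must be excluded because the winding is undefined there.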

preprint2021arXiv

Towards fully ab initio simulation of atmospheric aerosol nucleation

Atmospheric aerosol nucleation contributes more than half of the cloud condensation nuclei globally. The emissions, properties and concentrations of atmospheric aerosols or aerosol precursors could respond significantly to climate change. Despite its importance for climate, the detailed nucleation mechanisms are still poorly understood. The ultimate goal of theoretical studies of aerosol nucleation is to simulate nucleation under ambient conditions, which has been hindered by the lack of an accurate reactive force field. Here we propose a reactive force field based on a deep neural network for nucleation systems, with good size scalability. The huge computational cost of direct molecular dynamics under ambient conditions is overcome by bridging the simulation in a limited box with cluster kinetics, making the aerosol nucleation simulation fully ab initio. We find that acid-base formation rates previously based on hard-sphere collision rate constants tend to be underestimated, by up to several times. These findings show that the widely recognized acid-base nucleation observed in the CLOUD (Cosmics Leaving OUtdoor Droplets) chamber experiments and in pristine and polluted environments should be revisited to account for the contribution of collision enhancement. In addition, the framework here is transferable to other nucleation systems, potentially improving the accuracy of nucleation parameterizations in general and thereby the reliability of climate model predictions.

preprint2020arXiv

A highly sensitive piezoresistive sensor based on MXene and polyvinyl butyral with a wide detection limit and low power consumption

As a new class of two-dimensional transition-metal carbides and carbonitrides, MXenes have been widely used in energy storage, sensing, catalysis, electromagnetic interference shielding and other fields. It is challenging to realize a sensor that simultaneously offers extremely high sensitivity, wide detection limits, low power consumption and good mechanical stability. In this work, taking advantage of the high conductivity of MXene and the porous structure of polyvinyl butyral, a highly sensitive piezoresistive sensor was fabricated. The fabricated MXene/PVB-based sensor exhibits reliably high sensitivities of ~11.9 kPa^-1, ~1.15 kPa^-1 and ~0.20 kPa^-1 in the ranges of 31.2 Pa-312 Pa, 312 Pa-62.4 kPa and 62.4 kPa-1248.4 kPa, respectively. The sensor has a wide detection range (~31.2 Pa to ~2.205 MPa), a low detection limit (6.8 Pa), a low detection voltage (0.1 mV), low power consumption (~3.6 * 10^-10 W), a fast response time (~110 ms), and good mechanical stability (over 10,000 maximum-pressure cycles). Moreover, the sensor is shown to detect subtle bending and release activities of the human body, including arterial pulses and voice signals, indicating its potential as a wide-range, highly sensitive and low-power piezoresistive sensor. This work provides a new avenue for expanding the application of MXene-based flexible pressure sensors toward wide sensing ranges and ultra-low power consumption.

preprint2020arXiv

Achievable Rate Region of MISO Interference Channel Aided by Intelligent Reflecting Surface

This paper investigates the achievable rate region of the multiple-input single-output (MISO) interference channel aided by intelligent reflecting surfaces (IRSs). We exploit the additional design degrees of freedom provided by the coordinated IRSs to enhance the desired signal and suppress interference so as to enlarge the achievable rate region of the interference channel. To this end, we jointly optimize the active transmit beamforming at the transmitters and the passive reflective beamforming at the IRSs, subject to the constant-modulus constraints of the reflective beamforming vectors. To address the resulting non-convex optimization problem, we propose an iterative algorithm that optimizes the transmit beamforming via second-order cone programming (SOCP) and the reflective beamforming via semi-definite relaxation (SDR). Numerical results demonstrate that the IRS-aided interference channel with the proposed algorithm significantly outperforms the conventional interference channel without IRSs.

preprint2020arXiv

Enhanced Valley Zeeman Splitting in Fe-Doped Monolayer MoS2

The Zeeman effect offers unique opportunities for magnetic manipulation of the spin degree of freedom (DOF). Recently, valley Zeeman splitting, referring to the lifting of valley degeneracy, has been demonstrated in two-dimensional transition metal dichalcogenides (TMDs) at liquid helium temperature. However, to realize practical applications of valley pseudospins, the valley DOF must be controllable by a magnetic field at room temperature, which remains a significant challenge. Magnetic doping in TMDs can enhance the Zeeman splitting; however, achieving this experimentally is not easy. Here, we report unambiguous magnetic manipulation of valley Zeeman splitting at 300 K (g = -6.4) and 10 K (g = -11) in a CVD-grown Fe-doped MoS2 monolayer; the effective g factor can be tuned to -20.7 by increasing the Fe dopant concentration, which represents an approximately fivefold enhancement compared with undoped MoS2. Our measurements and calculations reveal that the enhanced splitting and g_eff factors are due to the Heisenberg exchange interaction of the localized magnetic moments (Fe 3d electrons) with MoS2 through d-orbital hybridization.

preprint2020arXiv

Finite-key analysis for twin-field quantum key distribution based on generalized operator dominance condition

Quantum key distribution (QKD) allows two distant parties to share secret key bits whose security is guaranteed by the laws of physics. In practice, the secret key rate of a QKD protocol decreases as the channel distance increases, which severely limits the applications of QKD. Recently, twin-field (TF) QKD has been proposed and intensively studied because it can beat the rate-distance limit and greatly extend the achievable distance of QKD. Notably, K. Maeda et al. proposed a simple finite-key analysis for TF-QKD based on an operator dominance condition. Although they showed that their method suffices to beat the rate-distance limit, their operator dominance condition is not general: it applies only to the three-decoy-state scenario, so the key rate cannot be increased by introducing more decoy states and cannot reach the asymptotic bound even with infinitely many decoy states and optical pulses. Here, to bridge this gap, we propose an improved finite-key analysis of TF-QKD by devising a new operator dominance condition. We show that by increasing the number of decoy states, the secret key rate can be further improved and approaches the asymptotic bound. Our results can be directly used in TF-QKD experiments to obtain higher secret key rates.

preprint2020arXiv

High-throughput GPU layered decoder of multi-edge type low density parity check codes in continuous-variable quantum key distribution systems

The decoding throughput of postprocessing is one of the bottlenecks of a continuous-variable quantum key distribution (CV-QKD) system. In this paper, we propose a layered decoder based on a graphics processing unit (GPU) for quasi-cyclic multi-edge type LDPC (QC-METLDPC) codes in CV-QKD systems. We optimize the storage method of the parity-check matrix, merge unrelated sub-matrices, and decode multiple codewords in parallel on the GPU. Simulation results demonstrate that the average decoding speeds of LDPC codes with three typical code rates, 0.1, 0.05 and 0.02, reach up to 64.11 Mbits/s, 48.65 Mbits/s and 39.51 Mbits/s, respectively, when decoding 128 codewords of length 10^6 simultaneously without early termination.
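The "layered" schedule means each row (layer) of the parity-check matrix updates the a-posteriori LLRs immediately, so later layers in the same iteration already see the new values. The sketch below shows that schedule on a CPU in pure Python; it is not the paper's GPU decoder. The min-sum check-node rule, the optional normalization factor `alpha`, the function name and the toy Hamming(7,4) matrix are all assumptions for illustration, and every row is assumed to have weight at least 2.

```python
def layered_min_sum(H, llr, max_iter=20, alpha=1.0):
    """Layered (row-by-row) min-sum decoding of a binary LDPC code.

    H        : parity-check matrix as a list of 0/1 rows (each row weight >= 2)
    llr      : channel LLRs; positive values favour bit 0
    max_iter : maximum number of full passes over all layers (>= 1)
    alpha    : normalization factor on check-node magnitudes (1.0 = plain min-sum)
    Returns (hard_decision_bits, iterations_used).
    """
    rows = [[j for j, h in enumerate(row) if h] for row in H]
    L = list(llr)                                   # a-posteriori LLRs, updated in place
    R = [{j: 0.0 for j in cols} for cols in rows]   # check-to-variable messages per layer
    bits = [0 if x >= 0 else 1 for x in L]
    for it in range(1, max_iter + 1):
        for r, cols in enumerate(rows):             # each row is one layer
            q = {j: L[j] - R[r][j] for j in cols}   # strip this layer's old message
            for j in cols:
                others = [q[k] for k in cols if k != j]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                R[r][j] = alpha * sign * min(abs(x) for x in others)
                L[j] = q[j] + R[r][j]               # immediate update: the layered schedule
        bits = [0 if x >= 0 else 1 for x in L]
        if all(sum(bits[j] for j in cols) % 2 == 0 for cols in rows):
            return bits, it                          # syndrome satisfied: stop
    return bits, max_iter


if __name__ == "__main__":
    H = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]
    # All-zero codeword sent; the first bit arrives with the wrong sign.
    print(layered_min_sum(H, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))
```

On a GPU, the per-layer loop over `cols` and the independent codewords are what get parallelized; the abstract's "without early termination" corresponds to always running `max_iter` passes instead of returning on a satisfied syndrome.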

preprint2020arXiv

In-plane terahertz surface plasmon-polaritons coupler based on adiabatic following

We propose a robust and broadband integrated terahertz (THz) coupler based on in-plane surface plasmon polariton (SPP) waveguides, designed using the quantum coherent control technique of stimulated Raman adiabatic passage (STIRAP). The coupler consists of two asymmetric, specifically curved corrugated metallic structures acting as the input and output SPP waveguides, and one straight corrugated metallic structure acting as the middle SPP waveguide. Theoretical and simulated results demonstrate that SPPs can be efficiently transferred from the input to the output waveguide. The device is robust against perturbations of the geometric parameters and shows broadband performance (from 0.3 THz to 0.8 THz) with a transmission above 70%. The in-plane THz coupler can greatly simplify the fabrication process, which should help in developing compact and robust integrated THz devices and promote future applications in all-optical networks and THz communications.
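The STIRAP analogy can be illustrated with a standard three-waveguide coupled-mode model, dA/dz = -i K(z) A, where the middle-to-output coupling κ23 peaks before the input-to-middle coupling κ12 (the counterintuitive ordering). The sketch below assumes lossless, phase-matched waveguides, Gaussian coupling profiles and dimensionless units; none of these parameters come from the paper's device, and the function names are hypothetical.

```python
import math


def coupling(z, center, width, peak=1.0):
    """Assumed Gaussian coupling profile between neighbouring waveguides."""
    return peak * math.exp(-((z - center) / width) ** 2)


def propagate(z_end=120.0, dz=0.02, width=15.0, delay=10.0):
    """RK4 integration of dA/dz = -i K(z) A for three identical, lossless waveguides.

    K = [[0, k12, 0], [k12, 0, k23], [0, k23, 0]]; k23 (middle-output) peaks
    before k12 (input-middle), the counterintuitive STIRAP ordering.
    Returns the powers [P_input, P_middle, P_output] at z_end.
    """
    z0 = z_end / 2
    a = [1 + 0j, 0j, 0j]  # all power launched into the input waveguide

    def deriv(z, amp):
        k12 = coupling(z, z0 + delay, width)
        k23 = coupling(z, z0 - delay, width)
        return [-1j * k12 * amp[1],
                -1j * (k12 * amp[0] + k23 * amp[2]),
                -1j * k23 * amp[1]]

    z = 0.0
    for _ in range(int(round(z_end / dz))):
        k1 = deriv(z, a)
        k2 = deriv(z + dz / 2, [x + dz / 2 * k for x, k in zip(a, k1)])
        k3 = deriv(z + dz / 2, [x + dz / 2 * k for x, k in zip(a, k2)])
        k4 = deriv(z + dz, [x + dz * k for x, k in zip(a, k3)])
        a = [x + dz / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1, k2, k3, k4)]
        z += dz
    return [abs(x) ** 2 for x in a]


if __name__ == "__main__":
    p_in, p_mid, p_out = propagate()
    print(f"input {p_in:.4f}  middle {p_mid:.4f}  output {p_out:.4f}")
```

Because the light follows the "dark" supermode (κ23, 0, -κ12), the middle waveguide stays nearly unexcited, and the transfer depends only weakly on the exact coupling values, which is the mechanism behind the robustness claimed above.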

preprint2020arXiv

Mean field theory for deep dropout networks: digging up gradient backpropagation deeply

In recent years, mean field theory has been applied to the study of neural networks with great success. The theory has been applied to various network structures, including CNNs, RNNs, residual networks, and batch normalization, and recent work has also covered dropout. Mean field theory shows the existence of depth scales that limit the maximum depth of signal propagation and gradient backpropagation. However, the gradient backpropagation results are derived under the gradient independence assumption, namely that the weights used during the feed-forward pass are drawn independently from those used in backpropagation. This is not how neural networks are trained in practice: the same weights used in a feed-forward step must be carried over to the corresponding backpropagation step. Under this realistic condition, we perform theoretical computations on linear dropout networks and a series of experiments on dropout networks. Our empirical results show an interesting phenomenon: the depths over which gradients can backpropagate for a single input and for a pair of inputs are governed by the same depth scale. In addition, we study the relationship between the variance and mean of statistical metrics of the gradient and show the emergence of universality. Finally, we investigate the maximum trainable length of deep dropout networks through a series of experiments on MNIST and CIFAR10 and provide an empirical formula that describes the trainable length more precisely than the original work.
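In mean field analyses, a depth scale ξ appears as an exponential law with depth, for example a per-layer gradient norm behaving like |g_l| ∝ exp(-l/ξ). A minimal way to estimate ξ from measured norms is a least-squares line through log|g_l|, as sketched below. This assumes the purely exponential regime and is not the paper's empirical formula; the function name is hypothetical.

```python
import math


def fit_depth_scale(depths, grad_norms):
    """Least-squares fit of log|g_l| = a - l / xi; returns the depth scale xi.

    A negative result means the norms grow with depth (exploding gradients).
    """
    ys = [math.log(g) for g in grad_norms]
    n = len(depths)
    mean_x = sum(depths) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths, ys))
    sxx = sum((x - mean_x) ** 2 for x in depths)
    return -1.0 / (sxy / sxx)


if __name__ == "__main__":
    # Synthetic, noise-free gradient norms with a known depth scale of 7 layers.
    layers = list(range(1, 21))
    norms = [3.0 * math.exp(-l / 7.0) for l in layers]
    print(fit_depth_scale(layers, norms))
```

With real training runs the measured norms are noisy, so the fit is usually restricted to the layers where the log-norm is visibly linear.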

preprint2020arXiv

Optimized protocol for twin-field quantum key distribution

Twin-field quantum key distribution (TF-QKD) and its variants are highly attractive because they overcome the rate-loss limit on the secret key rate of point-to-point QKD protocols. In all variants of TF-QKD, the key to ensuring security is switching randomly between a code mode and a test mode. The code modes of these protocols, however, differ considerably, e.g., modulating continuous phases, modulating only two opposite phases, or sending or not sending signal pulses. Here we show that, by discretizing the global phase in the code mode into a finite number of values, we can give a unified view of the first two types of TF-QKD protocols, and we demonstrate that increasing the number of discrete phases extends the achievable distance but, as a trade-off, lowers the secret key rate at short distances due to phase post-selection.

preprint2020arXiv

Population transfer via a dissipative structural continuum

We propose a model to study quantum population transfer via a structural continuum. The model consists of two spins, each coupled to its own bosonic mode by a control pulse, and the two bosonic modes are coupled to a common structural continuum. We show that efficient population transfer between the two spins can be achieved by using multi-level stimulated Raman adiabatic passage (STIRAP) across the continuum, which we refer to as straddle STIRAP via continuum. We also examine the stability of this model against variations of the control parameters and show that efficient population transfer can be achieved even in the presence of moderate dissipation.

preprint2020arXiv

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. Existing methods recover structures and textures step by step in the hole regions, typically using two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole, and this insufficient use of the encoder features limits performance in recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent the structures and textures of an input image, respectively. The deep-layer features are sent to a structure branch and the shallow-layer features are sent to a texture branch. In each branch, we fill holes at multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we first reweigh channel attentions and then use a proposed bilateral propagation activation function to enable spatial equalization. In this way, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized features to supplement decoder features for output image generation through skip connections. Experiments on benchmark datasets show that the proposed method is effective in recovering structures and textures and performs favorably against state-of-the-art approaches.

preprint2020arXiv

Sn4+ Precursor Enables 12.4% Efficient Kesterite Solar Cell from DMSO Solution with Open Circuit Voltage Deficit Below 0.30 V

The main factor limiting further improvement of kesterite (CZTSSe) thin-film solar cell performance is the large open-circuit voltage deficit (Voc,def), which is 0.345 V for the current world-record device with an efficiency of 12.6%. In this work, SnCl4 and SnCl2·2H2O are used as tin precursors to investigate the Voc,def issue of dimethyl sulfoxide (DMSO) solution-processed CZTSSe solar cells. Different complexations of the tin compounds with thiourea (Tu) and DMSO lead to different reaction pathways from solution to absorber material and thus to a dramatic difference in photovoltaic performance. The coordination of Sn2+ with Tu leads to the formation of SnS, ZnS and Cu2S in the precursor film, which are first converted to selenides and then fuse into CZTSSe, resulting in poor film quality and device performance. The highest efficiency obtained from this film is 8.84% with a Voc,def of 0.391 V. The coordination of Sn4+ with DMSO facilitates direct formation of the kesterite CZTS phase in the precursor film, which is directly converted to CZTSSe during selenization, resulting in a compositionally uniform absorber and high device performance. A device with an active-area efficiency of 12.2% and a Voc,def of 0.344 V was achieved from the Sn4+ solution-processed absorber. Furthermore, CZTSSe/CdS heterojunction heat treatment (JHT) significantly improved the Sn4+ device performance but had a slightly negative effect on the Sn2+ device. A champion CZTSSe solar cell with a total-area efficiency of 12.4% (active-area efficiency 13.6%) and a low Voc,def of 0.297 V was achieved from the Sn4+ solution. Our results demonstrate that the preformed uniform kesterite phase enabled by the Sn4+ precursor is the key to achieving highly efficient kesterite absorber material. The lowest Voc,def and high efficiency achieved here shed new light on the future of kesterite solar cells.

preprint2020arXiv

Spin-Valley Locking Effect in Defect States of Monolayer MoS2

Valley pseudospin in two-dimensional (2D) transition-metal dichalcogenides (TMDs) allows optical control of spin-valley polarization and intervalley quantum coherence. Defect states in TMDs give rise to new exciton features and are theoretically predicted to exhibit spin-valley polarization; however, experimental realization of this phenomenon remains challenging. Here, we report unambiguous valley pseudospin of defect-bound localized excitons in CVD-grown monolayer MoS2, with enhanced valley Zeeman splitting corresponding to an effective g-factor of -6.2. Our results reveal that all five d-orbitals and the increased effective electron mass contribute to the band shift of defect states, demonstrating new physics in the magnetic response of defect-bound localized excitons that differs strikingly from that of A excitons. Our work paves the way for manipulating the spin-valley degrees of freedom through defects toward valleytronic devices.