Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
66 works
0 followers
31 topics
4 close collaborators

Published work

66 published item(s)

preprint, 2026, arXiv

Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures--never completing the task, completing it but failing to stop, and reporting success without sufficient evidence--collapse into the same benchmark failure. We introduce VIGIL, an evaluation framework that makes terminal commitment independently measurable. Under VIGIL's default protocol, agents observe only egocentric RGB, receive no action-success signals, and must end each episode with a semantic report checked deterministically against hidden world state. This yields two separate scores: world-state completion (W) and benchmark success (B), where B additionally requires a correct terminal report. This decoupling makes four outcome categories distinguishable: missed execution, post-attainment drift, unsupported commitment, and verified success. Across 20 models on 1,000 frozen episodes, systems with comparable W differ by up to 19.7 pp in B: one model converts achieved states into correct reports, while another with near-identical execution drifts past the goal without closing. An action-feedback intervention further tests the separation: execution-oriented signals improve W broadly, yet commitment failures persist in models that do not already ground terminal reports in the achieved state. VIGIL provides a protocol that makes terminal commitment independently visible and scorable.
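
The separation between W and B described above can be sketched in a few lines. This is only an illustrative reading of the abstract, not VIGIL's actual protocol: the `Episode` fields and the exact rule mapping them to the four outcome categories are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    # All three fields are hypothetical names chosen for this sketch.
    world_complete: bool   # hidden world state satisfies the task (counts toward W)
    claimed_success: bool  # the agent's terminal report claims completion
    report_correct: bool   # the terminal report matches the hidden world state


def classify(ep: Episode) -> str:
    """Map one episode to one of the four outcome categories (assumed rule)."""
    if ep.world_complete and ep.report_correct:
        return "verified success"
    if ep.world_complete:
        # Goal state reached, but the episode was not closed with a correct report.
        return "post-attainment drift"
    if ep.claimed_success:
        # Success reported although the world state does not support it.
        return "unsupported commitment"
    return "missed execution"


def scores(episodes):
    """Return (W, B): B additionally requires a correct terminal report."""
    n = len(episodes)
    w = sum(ep.world_complete for ep in episodes) / n
    b = sum(ep.world_complete and ep.report_correct for ep in episodes) / n
    return w, b
```

Under this reading, two agents with the same W can differ in B whenever one of them accumulates episodes in the drift category.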

preprint, 2026, arXiv

SAFE Quantum Machine Learning with Variational Quantum Classifiers

We propose a variational quantum classifier operating on high-dimensional deep representations via amplitude encoding, stabilized by a learnable classical pre-encoding layer. By combining normalized amplitude embeddings with bounded quantum observables, the resulting model induces a structured and smooth hypothesis class with controlled sensitivity to input variations. Model reliability is assessed using SAFE-AI metrics derived from the Cramér-von Mises divergence, enabling consistent evaluation across accuracy, robustness, and explainability dimensions. Empirical results show that the proposed quantum model provides competitive predictive performance compared with strong classical baselines while exhibiting a more balanced SAFE reliability profile, with improved robustness to noise and stability under structured feature removal. These findings suggest that variational quantum circuits offer a principled mechanism for stability-oriented SAFE learning in safety-critical settings.

preprint, 2026, arXiv

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power are less explored, often relying on restrictive priors like a pretrained base matrix, ReLU activations or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without pretrained dense priors. We demonstrate a theoretical paradox: while purely rank-1 layers can exactly interpolate arbitrary scalar datasets, they collapse for function approximation. To overcome this bottleneck without surrendering parameter efficiency, we introduce a unified Structural Correspondence framework. We prove that augmenting low-rank layers with only a minimal sparse diagonal component, namely a Diagonal plus Low-Rank (DLoR) structure, is sufficient to reach Universal Approximation. We show that any full-rank transformation can be exactly reconstructed using these DLoR components by trading off network width (additive decomposition) or depth (multiplicative decomposition). By tracking asymptotic Taylor remainders, we prove that DLoR neural networks fully restore the Universal Approximation Theorem for general activation functions. Finally, we establish that multiplicative depth provides superior parameter-to-expressivity scaling compared to additive width. Our results show that dense matrices and specific activation functions are not topological prerequisites for universal expressivity.
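
To make the parameter-efficiency argument concrete, here is a minimal sketch of a diagonal plus low-rank linear map, assuming a square layer of the form W = diag(d) + U Vᵀ. The function names and the square-layer restriction are assumptions made for illustration; the paper's exact parameterization may differ.

```python
def dlor_matvec(d, U, V, x):
    """Compute (diag(d) + U V^T) x without forming the dense n-by-n matrix.

    d: list of n diagonal entries; U, V: n-by-r nested lists; x: list of n values.
    Cost is O(n * r) instead of O(n^2).
    """
    n, r = len(d), len(U[0])
    # Project x onto the r low-rank directions: t = V^T x.
    t = [sum(V[i][k] * x[i] for i in range(n)) for k in range(r)]
    # Combine the diagonal part with the low-rank correction U t.
    return [d[i] * x[i] + sum(U[i][k] * t[k] for k in range(r)) for i in range(n)]


def dlor_param_count(n, r):
    """Parameters stored by the structured layer: n (diagonal) + 2 * n * r."""
    return n + 2 * n * r


def dense_from_dlor(d, U, V):
    """Materialize the equivalent dense matrix (only useful for checking)."""
    n, r = len(d), len(U[0])
    return [
        [(d[i] if i == j else 0.0) + sum(U[i][k] * V[j][k] for k in range(r))
         for j in range(n)]
        for i in range(n)
    ]
```

For n = 1024 and r = 4, this structure stores 9,216 parameters where a dense layer would store 1,048,576.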

preprint, 2024, arXiv

Regularized lattice Boltzmann method based maximum principle and energy stability preserving finite-difference scheme for the Allen-Cahn equation

The Allen-Cahn equation (ACE) inherently possesses two crucial properties: the maximum principle and the energy dissipation law. Preserving these two properties at the discrete level is also necessary in numerical methods for the ACE. In this paper, unlike the traditional top-down macroscopic numerical schemes which discretize the ACE directly, we first propose a novel bottom-up macroscopic numerical scheme, based on the mesoscopic regularized lattice Boltzmann method, for the d (=1, 2, 3)-dimensional ACE, where the DdQ(2d+1) [(2d+1) discrete velocities in d-dimensional space] lattice structure is adopted. In particular, the proposed macroscopic numerical scheme has second-order accuracy in space, and can also be viewed as an implicit-explicit finite-difference scheme for the ACE, in which the nonlinear term is discretized semi-implicitly, while the temporal derivative and dissipation term of the ACE are discretized by using the explicit Euler method and second-order central difference method, respectively. Then we also demonstrate that the proposed scheme can preserve the maximum bound principle and the original energy dissipation law at the discrete level under some conditions. Finally, some numerical experiments are conducted to validate our theoretical analysis.
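
As a rough illustration of what an implicit-explicit finite-difference step of this kind can look like, the sketch below advances the 1D equation u_t = ε² u_xx − (u³ − u) on a periodic grid. It is not the scheme derived in the paper: the form of the equation, the particular semi-implicit splitting of the cubic term as (uⁿ)² uⁿ⁺¹, and the step-size condition are all assumptions chosen for this example.

```python
import math


def allen_cahn_step(u, dt, h, eps):
    """One step of an assumed implicit-explicit scheme on a periodic 1D grid.

    Diffusion uses the second-order central difference at the old time level;
    the cubic term u^3 is treated as (u_old)^2 * u_new, so each grid point can
    be solved for u_new by a single division.
    """
    n = len(u)
    r = dt * eps * eps / (h * h)
    new = []
    for i in range(n):
        # u[i - 1] wraps around for i == 0, which gives the periodic boundary.
        lap = u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]
        numer = u[i] * (1.0 + dt) + r * lap
        new.append(numer / (1.0 + dt * u[i] * u[i]))
    return new


def run(n=64, eps=0.05, dt=0.01, steps=200):
    """Evolve a smooth initial profile and return the final grid values."""
    h = 1.0 / n
    # For this particular splitting, dt + 2 * dt * eps^2 / h^2 <= 1 keeps
    # |u| <= 1 whenever the initial data satisfy |u| <= 1.
    assert dt + 2.0 * dt * eps * eps / (h * h) <= 1.0
    u = [0.9 * math.sin(2.0 * math.pi * i * h) for i in range(n)]
    for _ in range(steps):
        u = allen_cahn_step(u, dt, h, eps)
    return u
```

With the default parameters the step-size condition holds, so the values returned by `run()` stay within [-1, 1] up to rounding, which is the discrete analogue of the maximum principle for this toy scheme.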

preprint, 2024, arXiv

Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant 'anomaly' model predictions using task-specific 'normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.

preprint, 2023, arXiv

AdaptSLAM: Edge-Assisted Adaptive SLAM with Resource Constraints via Uncertainty Minimization

Edge computing is increasingly proposed as a solution for reducing resource consumption of mobile devices running simultaneous localization and mapping (SLAM) algorithms, with most edge-assisted SLAM systems assuming the communication resources between the mobile device and the edge server to be unlimited, or relying on heuristics to choose the information to be transmitted to the edge. This paper presents AdaptSLAM, an edge-assisted visual (V) and visual-inertial (VI) SLAM system that adapts to the available communication and computation resources, based on a theoretically grounded method we developed to select the subset of keyframes (the representative frames) for constructing the best local and global maps in the mobile device and the edge server under resource constraints. We implemented AdaptSLAM to work with the state-of-the-art open-source V- and VI-SLAM ORB-SLAM3 framework, and demonstrated that, under constrained network bandwidth, AdaptSLAM reduces the tracking error by 62% compared to the best baseline method.

preprint, 2022, arXiv

$T_{cc}^{+}(3875)$ relevant $DD^*$ scattering from $N_f=2$ lattice QCD

The $S$-wave $DD^*$ scattering in the isospin $I=0,1$ channels is studied in $N_f=2$ lattice QCD at $m_π\approx 350$ MeV. It is observed that the $DD^*$ interaction is repulsive in the $I=1$ channel when the $DD^*$ energy is near the $DD^*$ threshold. In contrast, the $DD^*$ interaction in the $I=0$ channel is definitely attractive in a wide range of the $DD^*$ energy. This is consistent with the isospin assignment $I=0$ for $T_{cc}^+(3875)$. By analyzing the components of the $DD^*$ correlation functions, it turns out that the quark diagram responsible for the different properties of $I=0,1$ $DD^*$ interactions can be understood as the charged $ρ$ meson exchange effect. This observation provides direct information on the internal dynamics of $T_{cc}^+(3875)$.

preprint, 2022, arXiv

AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image Enhancement

The 3D Lookup Table (3D LUT) is a highly-efficient tool for real-time image enhancement tasks, which models a non-linear 3D color transform by sparsely sampling it into a discretized 3D lattice. Previous works have made efforts to learn image-adaptive output color values of LUTs for flexible enhancement but neglect the importance of sampling strategy. They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform. Focusing on this problem, we present AdaInt (Adaptive Intervals Learning), a novel mechanism to achieve a more flexible sampling point allocation by adaptively learning the non-uniform sampling intervals in the 3D color space. In this way, a 3D LUT can increase its capability by conducting dense sampling in color ranges requiring highly non-linear transforms and sparse sampling for near-linear transforms. The proposed AdaInt could be implemented as a compact and efficient plug-and-play module for a 3D LUT-based method. To enable the end-to-end learning of AdaInt, we design a novel differentiable operator called AiLUT-Transform (Adaptive Interval LUT Transform) to locate input colors in the non-uniform 3D LUT and provide gradients to the sampling intervals. Experiments demonstrate that methods equipped with AdaInt can achieve state-of-the-art performance on two public benchmark datasets with a negligible overhead increase. Our source code is available at https://github.com/ImCharlesY/AdaInt.
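
The idea of locating an input in a non-uniformly sampled table can be shown in one dimension. The sketch below is a simplified 1D analogue, not the AiLUT-Transform operator itself: the function names are hypothetical, and the construction of vertices as a normalized cumulative sum of positive intervals is an assumption made for illustration.

```python
from bisect import bisect_right


def vertices_from_intervals(intervals):
    """Turn positive interval widths into sorted sampling coordinates in [0, 1]."""
    total = float(sum(intervals))
    coords, acc = [0.0], 0.0
    for w in intervals:
        acc += w / total
        coords.append(acc)
    return coords


def lookup_1d(vertices, values, x):
    """Linearly interpolate a 1D LUT whose vertices are non-uniformly spaced."""
    # Clamp the query to the table range.
    x = min(max(x, vertices[0]), vertices[-1])
    # Binary search for the cell [vertices[i], vertices[i + 1]] containing x.
    i = min(bisect_right(vertices, x) - 1, len(vertices) - 2)
    x0, x1 = vertices[i], vertices[i + 1]
    t = (x - x0) / (x1 - x0)
    return (1.0 - t) * values[i] + t * values[i + 1]
```

Placing more vertices where the transform bends sharply lets the same number of table entries follow the curve more closely than a uniform grid would; the 3D case replaces the linear blend with trilinear interpolation over the located cell.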

preprint, 2022, arXiv

An Interpretability Evaluation Benchmark for Pre-trained Language Models

While pre-trained language models (LMs) have brought great improvements in many NLP tasks, there is increasing interest in exploring the capabilities of LMs and interpreting their predictions. However, existing works usually focus only on a certain capability with some downstream tasks. There is a lack of datasets for directly evaluating the masked word prediction performance and the interpretability of pre-trained LMs. To fill the gap, we propose a novel evaluation benchmark providing both English and Chinese annotated data. It tests LMs' abilities in multiple dimensions, i.e., grammar, semantics, knowledge, reasoning and computation. In addition, it provides carefully annotated token-level rationales that satisfy sufficiency and compactness. It contains perturbed instances for each original instance, so as to use the rationale consistency under perturbations as the metric for faithfulness, a perspective of interpretability. We conduct experiments on several widely-used pre-trained LMs. The results show that they perform very poorly on the dimensions of knowledge and computation. Their plausibility in all dimensions is also far from satisfactory, especially when the rationale is short. In addition, the pre-trained LMs we evaluated are not robust on syntax-aware data. We will release this evaluation benchmark at http://xyz, and hope it can facilitate the research progress of pre-trained LMs.

preprint, 2022, arXiv

Annihilation diagram contribution to charmonium masses

In this work, we generate gauge configurations with $N_f=2$ dynamical charm quarks on anisotropic lattices. The mass shift of $1S$ and $1P$ charmonia owing to the charm quark annihilation effect can be investigated directly in the manner of a unitary theory. The distillation method is adopted to treat the charm quark annihilation diagrams at a very precise level. For $1S$ charmonia, the charm quark annihilation effect leaves the $J/ψ$ mass almost unchanged, but lifts the $η_c$ mass by approximately 3-4 MeV. For $1P$ charmonia, this effect results in positive mass shifts of approximately 1 MeV for $χ_{c1}$ and $h_c$, but decreases the $χ_{c2}$ mass by approximately 3 MeV. We have not obtained a reliable result for the mass shift of $χ_{c0}$. In addition, it is observed that the spin-averaged mass of the spin-triplet $1P$ charmonia is in good agreement with the $h_c$ mass, as expected from the non-relativistic quark model and measured by experiments.
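
For reference, the spin-averaged mass mentioned above is conventionally defined with $(2J+1)$ weights over the three spin-triplet $1P$ states. The abstract does not spell out the definition it uses, so the standard one is assumed here:

```latex
\overline{m}\left(1\,{}^{3}P\right)
  = \frac{m_{\chi_{c0}} + 3\, m_{\chi_{c1}} + 5\, m_{\chi_{c2}}}{9},
\qquad
\overline{m}\left(1\,{}^{3}P\right) \approx m_{h_c}
\quad \text{(non-relativistic quark model expectation)}.
```

The weights 1, 3 and 5 are the spin multiplicities of the $J=0,1,2$ states, and the near equality with the $h_c$ mass reflects a small spin-spin splitting in the $P$ wave.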

preprint, 2022, arXiv

Article's Scientific Prestige: measuring the impact of individual articles in the Web of Science

We performed a citation analysis on the Web of Science publications consisting of more than 63 million articles and 1.45 billion citations on 254 subjects from 1981 to 2020. We proposed the Article's Scientific Prestige (ASP) metric and compared this metric to the number of citations (#Cit) and journal grade in measuring the scientific impact of individual articles in the large-scale hierarchical and multi-disciplined citation network. In contrast to #Cit, ASP, which is computed based on the eigenvector centrality, considers both direct and indirect citations, and provides steady-state evaluation across different disciplines. We found that ASP and #Cit are not aligned for most articles, with a growing mismatch amongst the less cited articles. While both metrics are reliable for evaluating the prestige of articles such as Nobel Prize winning articles, ASP tends to provide more persuasive rankings than #Cit when the articles are not highly cited. The journal grade, which is ultimately determined by a few highly cited articles, is unable to properly reflect the scientific impact of individual articles. The number of references and the number of coauthors are less relevant to scientific impact, but subjects do make a difference.
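
To illustrate how an eigenvector-centrality score differs from a raw citation count, the sketch below runs plain power iteration on a tiny toy graph. It is not the ASP computation: the graph, the function name and the normalization are assumptions, and a real citation network is nearly acyclic, so it needs additional handling that the abstract does not describe.

```python
def eigenvector_centrality(citations, iterations=200):
    """Power iteration: an article's score is the sum of the scores citing it.

    citations: dict mapping each article to the list of articles it cites.
    Returns scores normalized to sum to 1.
    """
    nodes = sorted(citations)
    score = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        new = {v: 0.0 for v in nodes}
        for citing, cited_list in citations.items():
            for cited in cited_list:
                # Each citation passes the citing article's current score on.
                new[cited] += score[citing]
        total = sum(new.values())
        score = {v: s / total for v, s in new.items()}
    return score


# Toy graph (hypothetical): A, B and C cite one another, D only cites A.
toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}
toy_scores = eigenvector_centrality(toy)
toy_counts = {v: sum(v in cited for cited in toy.values()) for v in toy}
```

In this toy graph A receives three direct citations and C only two, yet the two end up with the same centrality score, because the extra citation to A comes from D, which is itself never cited and therefore carries no weight.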

preprint, 2022, arXiv

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

This paper presents DavarOCR, an open-source toolbox for OCR and document understanding tasks. DavarOCR currently implements 19 advanced algorithms, covering 9 different task forms. DavarOCR provides detailed usage instructions and the trained models for each algorithm. Compared with previous open-source OCR toolboxes, DavarOCR has more complete support for the sub-tasks of cutting-edge document understanding technology. In order to promote the development and application of OCR technology in academia and industry, we pay more attention to the use of modules that different sub-domains of technology can share. DavarOCR is publicly released at https://github.com/hikopensource/Davar-Lab-OCR.

preprint, 2022, arXiv

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

In this paper, we focus on studying robustness evaluation of Chinese question matching. Most previous work on analyzing robustness issues focuses on just one or a few types of artificial adversarial examples. Instead, we argue that it is necessary to formulate a comprehensive evaluation of the linguistic capabilities of models on natural texts. For this purpose, we create a Chinese dataset named DuQM, which contains natural questions with linguistic perturbations to evaluate the robustness of question matching models. DuQM contains 3 categories and 13 subcategories with 32 linguistic perturbations. Extensive experiments demonstrate that DuQM has a better ability to distinguish different models. Importantly, the detailed breakdown of evaluation by linguistic phenomenon in DuQM helps us easily diagnose the strengths and weaknesses of different models. Additionally, our experimental results show that the effect of artificial adversarial examples does not carry over to natural texts.

preprint, 2022, arXiv

Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting

End-to-end text spotting has attracted great attention recently due to its benefits in global optimization and high maintainability for real applications. However, the input scale has always been a tough trade-off, since recognizing a small text instance usually requires enlarging the whole image, which brings high computational costs. In this paper, to address this problem, we propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework, which aims to infer images at different small but recognizable resolutions and achieve a better balance between accuracy and efficiency. Concretely, we adopt a resolution selector to dynamically decide the input resolutions for different images, which is constrained by both inference accuracy and computational cost. In addition, a sequential knowledge distillation strategy is applied to the text recognition branch, so that the low-res input obtains performance comparable to a high-res image. The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicability. Extensive experiments on several text spotting benchmarks show that the proposed method vastly improves the usability of low-res models. The code is available at https://github.com/hikopensource/DAVAR-Lab-OCR/.

preprint, 2022, arXiv

Efficient semiquantum key distribution based on single photons in both polarization and spatial-mode degrees of freedom

In this paper, we propose an efficient semiquantum key distribution (SQKD) protocol which is based on single photons in both polarization and spatial-mode degrees of freedom. This protocol is feasible for a quantum communicant distributing a random private key to a classical communicant. This protocol does not require the classical communicant to use any quantum memory or unitary operation equipment. We validate the complete robustness of the transmissions of single photons between the two communicants. It turns out that during these transmissions, if Eve wants to avoid being detected by the two communicants, she will obtain nothing useful about the final shared key bits. Compared with Boyer et al.'s famous pioneering SQKD protocol (Phys Rev Lett, 2007, 99:140501), this protocol has double the quantum communication capacity, as one single photon with two degrees of freedom for generating the key bits can carry two private bits; and this protocol has higher quantum communication efficiency, as it consumes fewer qubits for establishing a private key of the same length. Compared with the only existing SQKD protocol with single photons in two degrees of freedom (Int J Theor Phys, 2020, 59: 2807), this protocol has higher quantum communication efficiency.

preprint, 2022, arXiv

Exploration of Artificial Intelligence-oriented Power System Dynamic Simulators

With the rapid development of artificial intelligence (AI), it is foreseeable that the accuracy and efficiency of dynamic analysis for future power systems will be greatly improved by the integration of dynamic simulators and AI. To explore the interaction mechanism of power system dynamic simulations and AI, a general design of an AI-oriented power system dynamic simulator is proposed, which consists of a high-performance simulator with neural network supportability and flexible external and internal application programming interfaces (APIs). With the support of APIs, simulation-assisted AI and AI-assisted simulation form a comprehensive interaction mechanism between power system dynamic simulations and AI. A prototype of this design is implemented and made public based on a highly efficient electromechanical simulator. Tests of this prototype are carried out under four scenarios including sample generation, AI-based stability prediction, data-driven dynamic component modeling, and AI-aided stability control, which prove the validity, flexibility, and efficiency of the design and implementation of the AI-oriented power system dynamic simulator.

preprint, 2022, arXiv

Feasibility Study of Neural ODE and DAE Modules for Power System Dynamic Component Modeling

In the context of high penetration of renewables, the need to build dynamic models of power system components based on accessible measurement data has become urgent. To address this challenge, firstly, a neural ordinary differential equations (ODE) module and a neural differential-algebraic equations (DAE) module are proposed to form a data-driven modeling framework that accurately captures components' dynamic characteristics and flexibly adapts to various interface settings. Secondly, analytical models and data-driven models learned by the neural ODE and DAE modules are integrated together and simulated simultaneously using unified transient stability simulation methods. Finally, the neural ODE and DAE modules are implemented with Python and made public on GitHub. Using the portal measurements, three simple but representative cases of excitation controller modeling, photovoltaic power plant modeling, and equivalent load modeling of a regional power network are carried out in the IEEE-39 system and 2383wp system. Neural dynamic model-integrated simulations are compared with the original model-based ones to verify the feasibility and potentiality of the proposed neural ODE and DAE modules.

preprint, 2022, arXiv

Guide Local Feature Matching by Overlap Estimation

Local image feature matching under large appearance, viewpoint, and distance changes is challenging yet important. Conventional methods detect and match tentative local features across the whole images, with heuristic consistency checks to guarantee reliable matches. In this paper, we introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR, to constrain local feature matching in the commonly visible region. OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression. As a preprocessing module, OETR can be plugged into any existing local feature detection and matching pipeline, to mitigate potential view angle or scale variance. Intensive experiments show that OETR can boost state-of-the-art local feature matching performance substantially, especially for image pairs with small shared regions. The code will be publicly available at https://github.com/AbyssGaze/OETR.

preprint, 2022, arXiv

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing the videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC compressed video. They require x2 and x4 super-resolution, respectively. In total, the three tracks attracted more than 600 registrations. In the test phase, 8 teams, 8 teams and 12 teams submitted the final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

preprint, 2022, arXiv

p-orbital disclination states in non-Euclidean geometries

Disclinations are ubiquitous lattice defects existing in almost all crystalline materials. In two-dimensional nanomaterials, disclinations lead to the warping and deformation of the hosting material, yielding non-Euclidean geometries. However, such geometries have never been investigated experimentally in the context of topological phenomena. Here, by creating the physical realization of disclinations in conical and saddle-shaped acoustic systems, we demonstrate that disclinations can lead to topologically protected bound modes in non-Euclidean surfaces. In the designed honeycomb sonic crystal for p-orbital acoustic waves, non-Euclidean geometry interplays with the p-orbital physics and the band topology, showing intriguing emergent features as confirmed by consistent experiments and simulations. Our study opens a pathway towards topological phenomena in non-Euclidean geometries that may inspire future studies on, e.g., electrons and phonons in nanomaterials with curved surfaces.

preprint, 2022, arXiv

Perceptual Quality Assessment for Fine-Grained Compressed Images

Recent years have witnessed the rapid development of image storage and transmission systems, in which image compression plays an important role. Generally speaking, image compression algorithms are developed to ensure good visual quality at limited bit rates. However, due to the different compression optimization methods, the compressed images may have different levels of quality, which needs to be evaluated quantitatively. Nowadays, the mainstream full-reference (FR) metrics are effective at predicting the quality of compressed images at coarse-grained levels (where the bit-rate differences between compressed images are obvious). However, they may perform poorly for fine-grained compressed images whose bit-rate differences are quite subtle. Therefore, to better improve the Quality of Experience (QoE) and provide useful guidance for compression algorithms, we propose a full-reference image quality assessment (FR-IQA) method for compressed images of fine-grained levels. Specifically, the reference images and compressed images are first converted to $YCbCr$ color space. The gradient features are extracted from regions that are sensitive to compression artifacts. Then we employ the Log-Gabor transformation to further analyze the texture difference. Finally, the obtained features are fused into a quality score. The proposed method is validated on the fine-grained compression image quality assessment (FGIQA) database, which is specially constructed for assessing the quality of compressed images with close bit rates. The experimental results show that our metric outperforms mainstream FR-IQA metrics on the FGIQA database. We also test our method on other commonly used compression IQA databases and the results show that our method obtains competitive performance on the coarse-grained compression IQA databases as well.

preprint, 2022, arXiv

Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation

Deep-learning-based approaches to depth estimation are rapidly advancing, offering superior performance over existing methods. To estimate depth in real-world scenarios, depth estimation models require robustness to various noise environments. In this work, a Pyramid Frequency Network (PFN) with a Spatial Attention Residual Refinement Module (SARRM) is proposed to deal with the weak robustness of existing deep-learning methods. To reconstruct depth maps with accurate details, the SARRM constructs a residual fusion method with an attention mechanism to refine the blurred depth. A frequency division strategy is designed, and a frequency pyramid network is developed to extract features from multiple frequency bands. With the frequency strategy, PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on the Make3D, KITTI depth, and NYUv2 datasets. Additional experiments on the noisy NYUv2 dataset demonstrate that PFN is more reliable than existing deep-learning methods in high-noise scenes.

preprint, 2022, arXiv

Quantum dialogue based on quantum encryption with single photons in both polarization and spatial-mode degrees of freedom

In this paper, a novel information leakage resistant quantum dialogue (QD) protocol with single photons in both polarization and spatial-mode degrees of freedom is proposed, which utilizes quantum encryption technology to overcome the information leakage problem. In the proposed QD protocol, during the transmission process, the single photons in both polarization and spatial-mode degrees of freedom used for encoding two communicants' private classical bits are protected by both quantum encryption technology and decoy photon technology. For avoiding the information leakage problem, the initial states of the single photons in both polarization and spatial-mode degrees of freedom used for encoding two communicants' private classical bits are shared between two communicants through quantum key encryption and decryption. The information-theoretical efficiency of the proposed QD protocol is as high as 40%.

preprint, 2022, arXiv

Scatter Points in Space: 3D Detection from Multi-view Monocular Images

3D object detection from monocular image(s) is a challenging and long-standing problem in computer vision. To combine information from different perspectives without troublesome 2D instance tracking, recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space, which is inefficient. In this paper, we attempt to improve multi-view feature aggregation by proposing a learnable keypoint sampling method, which scatters pseudo surface points in 3D space in order to keep data sparsity. The scattered points, augmented by multi-view geometric constraints and visual features, are then employed to infer object locations and shapes in the scene. To make up for the limitations of a single frame and model multi-view geometry explicitly, we further propose a surface filter module for noise suppression. Experimental results show that our method achieves significantly better performance than previous works in terms of 3D detection (more than 0.1 AP improvement on some categories of ScanNet). The code will be publicly available.

preprint, 2022, arXiv

Semiquantum Private Comparison of Size Relationship Based on d-level Single-Particle States

In this paper, we propose a novel semiquantum private comparison (SQPC) protocol of size relationship based on d-level single-particle states. The designed protocol can compare the size relationship of different private messages from two classical users with the help of a semi-honest third party (TP), who is permitted to misbehave on her own but cannot be in collusion with anyone else. The correctness analysis shows that this protocol yields correct comparison results. The security analysis shows that this protocol can resist well-known outside attacks and participant attacks. Moreover, this protocol can guarantee that TP does not know the accurate comparison results. Compared with the only existing SQPC protocol of size relationship (Quantum Inf. Process. 20:124 (2021)), this protocol has advantages in terms of initial quantum resource, TP's measurement operations and TP's knowledge about the comparison results.

preprint2022arXiv

SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement

Image-adaptive lookup tables (LUTs) have achieved great success in real-time image enhancement tasks due to their high efficiency for modeling color transforms. However, they embed the complete transform, including the color component-independent and the component-correlated parts, into only a single type of LUTs, either 1D or 3D, in a coupled manner. This scheme raises a dilemma of improving model expressiveness or efficiency due to two factors. On the one hand, the 1D LUTs provide high computational efficiency but lack the critical capability of color components interaction. On the other, the 3D LUTs present enhanced component-correlated transform capability but suffer from heavy memory footprint, high training difficulty, and limited cell utilization. Inspired by the conventional divide-and-conquer practice in the image signal processor, we present SepLUT (separable image-adaptive lookup table) to tackle the above limitations. Specifically, we separate a single color transform into a cascade of component-independent and component-correlated sub-transforms instantiated as 1D and 3D LUTs, respectively. In this way, the capabilities of two sub-transforms can facilitate each other, where the 3D LUT complements the ability to mix up color components, and the 1D LUT redistributes the input colors to increase the cell utilization of the 3D LUT and thus enable the use of a more lightweight 3D LUT. Experiments demonstrate that the proposed method presents enhanced performance on photo retouching benchmark datasets than the current state-of-the-art and achieves real-time processing on both GPUs and CPUs.
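The cascade described above (a per-channel 1D LUT followed by a 3D LUT) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the 3D lookup uses nearest-vertex indexing for brevity (a real implementation would more likely interpolate), and the tables are fixed here rather than predicted per image.

```python
def apply_1d_lut(value, lut):
    """Linearly interpolate a value in [0, 1] through a uniformly sampled 1D LUT."""
    pos = value * (len(lut) - 1)
    lo = min(int(pos), len(lut) - 2)   # clamp so that lo + 1 stays in range
    frac = pos - lo
    return lut[lo] * (1.0 - frac) + lut[lo + 1] * frac


def apply_3d_lut(rgb, lut3d):
    """Nearest-vertex lookup in an S x S x S table of output RGB triples (simplification)."""
    size = len(lut3d)
    r, g, b = (min(size - 1, max(0, round(c * (size - 1)))) for c in rgb)
    return lut3d[r][g][b]


def separable_transform(rgb, luts_1d, lut3d):
    """Component-independent stage (1D LUTs), then component-correlated stage (3D LUT)."""
    redistributed = [apply_1d_lut(c, lut) for c, lut in zip(rgb, luts_1d)]
    return apply_3d_lut(redistributed, lut3d)


# Hypothetical tables: identity 1D LUTs and a 2x2x2 identity 3D LUT.
identity_1d = [[0.0, 0.5, 1.0] for _ in range(3)]
identity_3d = [[[(float(r), float(g), float(b)) for b in range(2)]
                for g in range(2)] for r in range(2)]
result = separable_transform((1.0, 0.0, 1.0), identity_1d, identity_3d)
```

Because the 1D stage runs first, it can spread the input colors over more of the 3D table's cells, which is the stated reason a smaller 3D LUT suffices.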

preprint2022arXiv

Single-state multi-party semiquantum key agreement protocol based on multi-particle GHZ entangled states

In this paper, we first put forward a novel single-state three-party semiquantum key agreement (SQKA) protocol with three-particle GHZ entangled states. Unlike previous quantum key agreement (QKA) protocols, the proposed single-state three-party SQKA protocol lets a quantum party and two classical parties, who possess only limited quantum capabilities, contribute equally to the generation of a shared private key over quantum channels. Detailed security analysis shows that the proposed single-state three-party SQKA protocol is secure against several well-known attacks from an outside eavesdropper, such as the Trojan horse attack, the entangle-measure attack, the measure-resend attack and the intercept-resend attack. Moreover, it can resist the participant attack, which means that the shared private key cannot be determined fully by any nontrivial subset of the three parties. The proposed single-state three-party SQKA protocol has the following nice features: (1) it only employs one kind of three-particle GHZ entangled states as initial quantum resource; (2) it doesn't need pre-shared keys among different parties; (3) it doesn't need unitary operations or quantum entanglement swapping. Finally, we generalize the proposed single-state three-party SQKA protocol to the N-party case by only employing one kind of N-particle GHZ entangled states as initial quantum resource, which inherits the nice features of its three-party counterpart.

preprint2022arXiv

Snow Mask Guided Adaptive Residual Network for Image Snow Removal

Image restoration under severe weather is a challenging task. Most past works focused on removing rain and haze from images. However, snow is also an extremely common atmospheric phenomenon that seriously affects the performance of high-level computer vision tasks, such as object detection and semantic segmentation. Recently, some methods have been proposed for snow removal, and most of them treat snow images directly as the optimization object. However, the distribution of snow location and shape is complex. Therefore, failure to detect snowflakes and snow streaks effectively will affect snow removal and limit model performance. To solve these issues, we propose a Snow Mask Guided Adaptive Residual Network (SMGARN). Specifically, SMGARN consists of three parts: Mask-Net, Guidance-Fusion Network (GF-Net), and Reconstruct-Net. Firstly, we build a Mask-Net with Self-pixel Attention (SA) and Cross-pixel Attention (CA) to capture the features of snowflakes and accurately localize the snow, thus predicting an accurate snow mask. Secondly, the predicted snow mask is sent into the specially designed GF-Net to adaptively guide the model to remove snow. Finally, an efficient Reconstruct-Net is used to remove the veiling effect and correct the image to reconstruct the final snow-free image. Extensive experiments show that our SMGARN numerically outperforms all existing snow removal methods, and the reconstructed images are clearer in visual contrast. All codes will be available.

preprint2022arXiv

Spot-adaptive Knowledge Distillation

Knowledge distillation (KD) has become a well established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network to harness the knowledge at one or multiple spots (i.e., layers) in the teacher network. The distillation spots, once specified, will not change for all the training samples, throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We thus propose a new distillation strategy, termed spot-adaptive KD (SAKD), to adaptively determine the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. As SAKD actually focuses on "where to distill" instead of "what to distill" that is widely investigated by most existing works, it can be seamlessly integrated into existing distillation methods to further improve their performance. Extensive experiments with 10 state-of-the-art distillers are conducted to demonstrate the effectiveness of SAKD for improving their distillation performance, under both homogeneous and heterogeneous distillation settings. Code is available at https://github.com/zju-vipa/spot-adaptive-pytorch

preprint2022arXiv

The Astrometric Performance Test of 80-cm Telescope at Yaoan Station and Precise CCD Positions of Apophis

The 80-cm azimuthal telescope was newly mounted at Yaoan Station, Purple Mountain Observatory in 2018. The astrometric performance of the telescope is tested in the following three aspects. (a) The geometric distortion of its attached CCD. It is stable both within a single epoch and across multiple epochs. Eight distortion solutions are derived over about one year. The maximum values range from 0.75 to 0.79 pixel and the median values range from 0.14 to 0.16 pixel. (b) The limiting magnitude of stars. Stars of about 20.5 magnitude (Gaia-G) can be detected with the Johnson-V filter in a 300-second exposure. The astrometric error of stars of about 20.5 magnitude is estimated at 0.14 arcsec using the fitted sigmoidal function. (c) The astrometric accuracy and precision of stacked fast-moving faint objects. 24 stacked frames of the potentially hazardous asteroid (PHA) (99942) Apophis are derived on April 14 and 15, 2021 (fainter than 18 mag) based on the ephemeris shifts. During data reduction, the newest Gaia EDR3 Catalog and Jet Propulsion Laboratory Horizons ephemeris are referenced as theoretical positions of stars and Apophis, respectively. Our results show that the mean (O-C)s (observed minus computed) of Apophis are -0.018 and 0.020 arcsec in right ascension and declination, and the dispersions are estimated at 0.094 and 0.085 arcsec, respectively, which show the consistency of the stacked results by Astrometrica.

preprint2022arXiv

The Glueball content of $η_c$

We carry out the first lattice QCD derivation of the mixing energy and the mixing angle of the pseudoscalar charmonium and glueball on two gauge ensembles with $N_f=2$ degenerate dynamical charm quarks. The mixing energy is determined to be $49(6)$ MeV on the near physical charm ensemble, which seems insensitive to charm quark mass. Under the assumption that $X(2370)$ is predominantly a pseudoscalar glueball, the mixing angle is determined to be approximately $4.6(6)^\circ$, which results in a $+3.9(9)$ MeV mass shift of the ground state pseudoscalar charmonium. Meanwhile, the mixing can raise the total width of the pseudoscalar charmonium by 7.2(8) MeV, which explains to some extent the relatively large total width of the $η_c$ meson. As a result, the branching fraction of $η_c\to γγ$ can be understood in this $c\bar{c}$-glueball mixing framework. On the other hand, the possible discrepancy between the theoretical predictions and the experimental results for the partial width of $J/ψ\toγη_c$ cannot be alleviated by the $c\bar{c}$-glueball mixing picture yet, which demands future precise experimental measurements of this partial width.
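
As a rough consistency check on the quoted numbers, the textbook two-state mixing relations can be applied. This is a sketch rather than the paper's own derivation: the masses $m_{η_c}\approx 2984$ MeV and $m_G\approx 2370$ MeV (read off the name $X(2370)$) are assumptions not given in the abstract, while $x=49$ MeV is the quoted mixing energy.

$$\tan 2\theta = \frac{2x}{m_{η_c}-m_G} \approx \frac{98}{614} \approx 0.16 \;\Rightarrow\; \theta \approx 4.5^\circ, \qquad \Delta m \approx \frac{x^2}{m_{η_c}-m_G} \approx \frac{2401}{614}\ \mathrm{MeV} \approx 3.9\ \mathrm{MeV}.$$

Under these assumed inputs, both values fall within the quoted uncertainties of $4.6(6)^\circ$ and $+3.9(9)$ MeV.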

preprint2022arXiv

Two-party secure semiquantum summation against the collective-dephasing noise

In this paper, we propose a two-party semiquantum summation protocol, where two classical users can accomplish the summation of their private binary sequences with the assistance of a quantum semi-honest third party (TP). The term 'semi-honest' implies that TP cannot conspire with others but is able to implement all kinds of attacks. This protocol employs logical qubits as traveling particles to overcome the negative influence of collective-dephasing noise and does not require any two parties to pre-share a random secret key. The security analysis shows that this protocol can effectively prevent outside attacks from Eve and participant attacks from TP. Moreover, TP has no knowledge of the summation results.

preprint2022arXiv

VizBelle: A Design Space of Embellishments for Data Visualization

Visual embellishments, as a form of non-linguistic rhetorical figures, are used to help convey abstract concepts or attract readers' attention. Creating data visualizations with appropriate and visually pleasing embellishments is challenging since this process largely depends on the experience and the aesthetic taste of designers. To help facilitate designers in the ideation and creation process, we propose a design space, VizBelle, based on the analysis of 361 classified visualizations from online sources. VizBelle consists of four dimensions, namely, communication goal to fit user intention, object to select the target area, strategy and technique to offer potential approaches. We further provide a website to present detailed explanations and examples of various techniques. We conducted a within-subject study with 20 professional and amateur design enthusiasts to evaluate the effectiveness of our design space. Results show that our design space is illuminating and useful for designers to create data visualizations with embellishments.

preprint2022arXiv

VR Viewport Pose Model for Quantifying and Exploiting Frame Correlations

The importance of the dynamics of the viewport pose, i.e., the location and the orientation of users' points of view, for virtual reality (VR) experiences calls for the development of VR viewport pose models. In this paper, informed by our experimental measurements of viewport trajectories across 3 different types of VR interfaces, we first develop a statistical model of viewport poses in VR environments. Based on the developed model, we examine the correlations between pixels in VR frames that correspond to different viewport poses, and obtain an analytical expression for the visibility similarity (ViS) of the pixels across different VR frames. We then propose a lightweight ViS-based ALG-ViS algorithm that adaptively splits VR frames into the background and the foreground, reusing the background across different frames. Our implementation of ALG-ViS in two Oculus Quest 2 rendering systems demonstrates ALG-ViS running in real time, supporting the full VR frame rate, and outperforming baselines on measures of frame quality and bandwidth consumption.

preprint2021arXiv

Concentrated Document Topic Model

We propose a Concentrated Document Topic Model (CDTM) for unsupervised text classification, which is able to produce a concentrated and sparse document topic distribution. In particular, an exponential entropy penalty is imposed on the document topic distribution. Documents that have diverse topic distributions are penalized more, while those having concentrated topics are penalized less. We apply the model to the benchmark NIPS dataset and observe more coherent topics and more concentrated and sparse document-topic distributions than Latent Dirichlet Allocation (LDA).
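
The effect of an entropy-based penalty on a document-topic distribution can be illustrated with a few lines of Python. The abstract does not give the exact functional form, so this sketch assumes the penalty is simply exp(H(θ)), where H is the Shannon entropy; the function name and example vectors are hypothetical.

```python
import math


def exponential_entropy(theta):
    """Return exp(H(theta)) for a probability vector theta.

    The value is 1 for a one-hot distribution and len(theta) for a uniform one,
    so diverse topic mixtures receive a larger penalty (assumed form).
    """
    entropy = -sum(p * math.log(p) for p in theta if p > 0.0)
    return math.exp(entropy)


concentrated = [0.97, 0.01, 0.01, 0.01]   # document dominated by one topic
diverse = [0.25, 0.25, 0.25, 0.25]        # document spread evenly over topics

penalty_concentrated = exponential_entropy(concentrated)
penalty_diverse = exponential_entropy(diverse)
```

With this form the penalty can be read as an "effective number of topics" per document, which is one way to see why minimizing it encourages sparse document-topic distributions.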

preprint2021arXiv

Deep Stochastic Volatility Model

Volatility of financial asset returns can be used to gauge the risk in financial markets. We propose a deep stochastic volatility model (DSVM) based on the framework of deep latent variable models. It uses flexible deep learning models to automatically detect the dependence of the future volatility on past returns, past volatilities and the stochastic noise, and thus provides a flexible volatility model without the need to manually select features. We develop a scalable inference and learning algorithm based on variational inference. In real data analysis, the DSVM outperforms several popular alternative volatility models. In addition, the predicted volatility of the DSVM provides a more reliable risk measure that can better reflect the risk in the financial market, rising more quickly to a higher level when the market becomes more risky and falling more quickly to a lower level when the market is more stable, compared with the commonly used GARCH-type model on a huge data set from the U.S. stock market.

preprint2021arXiv

Dense U-net for super-resolution with shuffle pooling layer

Recent research has achieved great progress on single image super-resolution (SISR) due to the development of deep learning in the field of computer vision. In these methods, the high resolution input image is down-scaled to low resolution space using a single filter, commonly max-pooling, before feature extraction. This means that the feature extraction is performed in biased filtered feature space. We demonstrate that this is sub-optimal and causes information loss. In this work, we propose a state-of-the-art convolutional neural network method called Dense U-net with shuffle pooling. To achieve this, a modified U-net with dense blocks, called dense U-net, is proposed for SISR. Then, a new pooling strategy called shuffle pooling is designed, which is aimed to replace the dense U-Net for down-scale operation. By doing so, we effectively replace the handcrafted filter in the SISR pipeline with more lossy down-sampling filters specifically trained for each feature map, whilst also reducing the information loss of the overall SISR operation. In addition, a mixed loss function, which combines Mean Square Error (MSE), Structural Similarity Index (SSIM) and Mean Gradient Error (MGE), is introduced to reduce the perception loss and high-level information loss. Our proposed method achieves superior accuracy over previous state-of-the-art on the three benchmark datasets: SET14, BSD300, ICDAR2003. Code is available online.

preprint2021arXiv

Exclusive Topic Modeling

We propose Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify field-specific keywords even though they appear less frequently and 2) deliver well-structured topics with exclusive words. In particular, a weighted Lasso penalty is imposed to automatically reduce the dominance of frequently appearing yet less relevant words, and a pairwise Kullback-Leibler divergence penalty is used to implement topic separation. Simulation studies demonstrate that the ETM detects the field-specific keywords, while LDA fails. When applied to the benchmark NIPS dataset, the topic coherence score on average improves by 22% and 10% for the model with weighted Lasso penalty and pairwise Kullback-Leibler divergence penalty, respectively.
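
A pairwise Kullback-Leibler term of the kind mentioned above can be sketched as follows. How the term enters the ETM objective (sign, weighting, direction of the divergence) is not stated in the abstract, so the code only computes the summed divergence over ordered topic pairs; the function names, the `eps` smoothing constant and the toy topics are assumptions for illustration.

```python
import math
from itertools import permutations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def pairwise_kl(topics):
    """Sum of KL(topic_i || topic_j) over all ordered pairs i != j."""
    return sum(kl_divergence(topics[i], topics[j])
               for i, j in permutations(range(len(topics)), 2))


# Toy topic-word distributions over a 4-word vocabulary (hypothetical values).
overlapping = [[0.4, 0.3, 0.2, 0.1], [0.35, 0.35, 0.2, 0.1]]
separated = [[0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85]]

score_overlapping = pairwise_kl(overlapping)
score_separated = pairwise_kl(separated)
```

Topics that share most of their probability mass score close to zero, while topics with exclusive words score high, so rewarding a larger pairwise divergence pushes topics apart.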

preprint2021arXiv

Mask-based Data Augmentation for Semi-supervised Semantic Segmentation

Semantic segmentation using convolutional neural networks (CNN) is a crucial component in image analysis. Training a CNN to perform semantic segmentation requires a large amount of labeled data, where the production of such labeled data is both costly and labor intensive. Semi-supervised learning algorithms address this issue by utilizing unlabeled data and so reduce the amount of labeled data needed for training. In particular, data augmentation techniques such as CutMix and ClassMix generate additional training data from existing labeled data. In this paper we propose a new approach for data augmentation, termed ComplexMix, which incorporates aspects of CutMix and ClassMix with improved performance. The proposed approach has the ability to control the complexity of the augmented data while attempting to be semantically-correct and address the tradeoff between complexity and correctness. The proposed ComplexMix approach is evaluated on a standard dataset for semantic segmentation and compared to other state-of-the-art techniques. Experimental results show that our method yields improvement over state-of-the-art methods on standard datasets for semantic image segmentation.

preprint2021arXiv

Refining Language Models with Compositional Explanations

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms which compute the importance of input features. Further, the model is regularized to align the importance scores with human knowledge, so that the unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores towards a pre-defined list of features are adjusted, while more complex human knowledge such as feature interaction and pattern generalization can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations regarding observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments for both importance and interaction of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in target domain as well as improved model fairness after refinement.

preprint2021arXiv

The Variational Bayesian Inference for Network Autoregression Models

We develop a variational Bayesian (VB) approach for estimating large-scale dynamic network models in the network autoregression framework. The VB approach allows for the automatic identification of the dynamic structure of such a model and obtains a direct approximation of the posterior density. Compared to Markov Chain Monte Carlo (MCMC) based sampling approaches, the VB approach achieves enhanced computational efficiency without sacrificing estimation accuracy. In the simulation study conducted here, the proposed VB approach detects various types of proper active structures for dynamic network models. Compared to the alternative approach, the proposed method achieves similar or better accuracy, and its computational time is halved. In a real data analysis scenario of day-ahead natural gas flow prediction in the German gas transmission network with 51 nodes between October 2013 and September 2015, the VB approach delivers promising forecasting accuracy along with clearly detected structures in terms of dynamic dependence.

preprint2020arXiv

A General Initialization Scheme for Electromagnetic Transient Simulation: Towards Large-Scale Hybrid AC-DC Grids

With the large-scale hybrid AC-DC grids coming into being, electromagnetic transient (EMT) simulation is required to accurately describe the dynamics of systems. However, the EMT steady-state initialization for hybrid AC-DC system is difficult and time-consuming when the system scale is huge. In order to provide a stable snapshot for EMT simulation with nonlinear components and black-box components, this paper proposes a general initialization scheme for EMT simulation (EMT-GIS) which can be implemented in the electromagnetic transient program (EMTP)-type simulators. First, an integrated power flow (IPF) algorithm is introduced to provide the steady-state results. Then, an initialized snapshot calculation-and-splicing mechanism is designed for EMT-GIS. The proposed EMT-GIS is tested using a hybrid AC-DC system in China on the CloudPSS simulation platform. Test results verify the effectiveness of the proposed EMT-GIS.

preprint2020arXiv

A Lattice Study of the Two-photon Decay Widths for Scalar and Pseudo-scalar Charmonium

In this exploratory study, two-photon decay widths of pseudo-scalar ($η_c$) and scalar ($χ_{c0}$) charmonium are computed using two ensembles of $N_f=2$ twisted mass lattice QCD gauge configurations. The simulation is performed on two lattice ensembles with lattice spacings $a=0.067$ fm with size $32^3\times{64}$ and $a=0.085$ fm with size $24^3\times{48}$, respectively. The resulting decay widths for the two charmonia are in the right ballpark but smaller than the experimental ones. Possible reasons for these discrepancies are discussed.

preprint2020arXiv

Atomically Thin Boron Nitride as an Ideal Spacer for Metal-Enhanced Fluorescence

The metal-enhanced fluorescence (MEF) considerably enhances the luminescence for various applications, but its performance largely depends on the dielectric spacer between the fluorophore and plasmonic system. It is still challenging to produce a defect-free spacer having an optimized thickness with a subnanometer accuracy that enables reusability without affecting the enhancement. In this study, we demonstrate the use of atomically thin hexagonal boron nitride (BN) as an ideal MEF spacer owing to its multifold advantages over the traditional dielectric thin films. With rhodamine 6G as a representative fluorophore, it largely improves the enhancement factor (up to ~95+-5), sensitivity (10^-8 M), reproducibility, and reusability (~90% of the plasmonic activity is retained after 30 cycles of heating at 350 °C in air) of MEF. This can be attributed to its two-dimensional structure, thickness control at the atomic level, defect-free quality, high affinities to aromatic fluorophores, good thermal stability, and excellent impermeability. The atomically thin BN spacers could increase the use of MEF in different fields and industries.

preprint2020arXiv

Automatic Crack Detection on Road Pavements Using Encoder Decoder Architecture

Inspired by the development of deep learning in computer vision and object detection, the proposed algorithm considers an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN), to perform crack detection in an end-to-end manner. Crack characteristics with multiple context information are learned automatically, enabling end-to-end crack detection. Then, a multi-dilation module embedded in an encoder-decoder architecture is proposed. The crack features of multiple context sizes can be integrated into the multi-dilation module by dilated convolution with different dilation rates, which can obtain much more crack information. Finally, the hierarchical feature learning module is designed to obtain multi-scale features from the high- to low-level convolutional layers, which are integrated to predict pixel-wise crack detection. Some experiments on public crack databases using 118 images were performed and the results were compared with those obtained with other methods on the same images. The results show that the proposed U-HDN method achieves high performance because it extracts and fuses different context sizes and different levels of feature maps better than other algorithms.

preprint2020arXiv

Charmed and $ϕ$ meson decay constants from 2+1-flavor lattice QCD

On a lattice with 2+1-flavor dynamical domain-wall fermions at the physical pion mass, we calculate the decay constants of $D_{s}^{(*)}$, $D^{(*)}$ and $ϕ$. The lattice size is $48^3\times96$, which corresponds to a spatial extension of $\sim5.5$ fm with the lattice spacing $a\approx 0.114$ fm. For the valence light, strange and charm quarks, we use overlap fermions at several mass points close to their physical values. Our results at the physical point are $f_D=213(5)$ MeV, $f_{D_s}=249(7)$ MeV, $f_{D^*}=234(6)$ MeV, $f_{D_s^*}=274(7)$ MeV, and $f_ϕ=241(9)$ MeV. The couplings of $D^*$ and $D_s^*$ to the tensor current ($f_V^T$) can be derived, respectively, from the ratios $f_{D^*}^T/f_{D^*}=0.91(4)$ and $f_{D_s^*}^T/f_{D_s^*}=0.92(4)$, which are the first lattice QCD results. We also obtain the ratios $f_{D^*}/f_D=1.10(3)$ and $f_{D_s^*}/f_{D_s}=1.10(4)$, which reflect the size of heavy quark symmetry breaking in charmed mesons. The ratios $f_{D_s}/f_{D}=1.16(3)$ and $f_{D_s^*}/f_{D^*}=1.17(3)$ can be taken as a measure of SU(3) flavor symmetry breaking.

preprint2020arXiv

Correlators in the supereigenvalue model in the Ramond sector

We investigate the supereigenvalue model in the Ramond sector. We prove that its partition function can be obtained by acting on elementary functions with exponents of the given operators. The Virasoro constraints for this supereigenvalue model are presented. The remarkable property of these bosonic constraint operators is that they obey the Witt algebra and null 3-algebra. The compact expression of correlators can be derived from these Virasoro constraints.

preprint2020arXiv

Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data

Real-time accurate detection of three-dimensional (3D) objects is a fundamental necessity for self-driving vehicles. Most existing computer vision approaches are based on convolutional neural networks (CNNs). Although the CNN-based approaches can achieve high detection accuracy, their high energy consumption is a severe drawback. To resolve this problem, novel energy efficient approaches should be explored. The spiking neural network (SNN) is a promising candidate because it has orders-of-magnitude lower energy consumption than a CNN. Unfortunately, the study of SNNs has been limited to small networks. The application of SNNs to large 3D object detection networks remains largely open. In this paper, we integrate a spiking convolutional neural network (SCNN) with temporal coding into the YOLOv2 architecture for real-time object detection. To take advantage of spiking signals, we develop a novel data preprocessing layer that translates 3D point-cloud data into spike time data. We propose an analog circuit to implement the non-leaky integrate and fire neuron used in our SCNN, from which the energy consumption of each spike is estimated. Moreover, we present a method to calculate the network sparsity and the energy consumption of the overall network. Extensive experiments have been conducted based on the KITTI dataset, which show that the proposed network can reach detection accuracy competitive with existing approaches, yet with much lower average energy consumption. If implemented in dedicated hardware, our network could have a mean sparsity of 56.24% and extremely low total energy consumption of 0.247mJ only. Implemented in NVIDIA GTX 1080i GPU, we can achieve 35.7 fps frame rate, high enough for real-time object detection.

preprint2020arXiv

DO-Conv: Depthwise Over-parameterized Convolutional Layer

Convolutional layers are the core building blocks of Convolutional Neural Networks (CNNs). In this paper, we propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel. The composition of the two convolutions constitutes an over-parameterization, since it adds learnable parameters, while the resulting linear operation can be expressed by a single convolution layer. We refer to this depthwise over-parameterized convolutional layer as DO-Conv. We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs on many classical vision tasks, such as image classification, detection, and segmentation. Moreover, in the inference phase, the depthwise convolution is folded into the conventional convolution, reducing the computation to be exactly equivalent to that of a convolutional layer without over-parameterization. As DO-Conv introduces performance gains without incurring any computational complexity increase for inference, we advocate it as an alternative to the conventional convolutional layer. We open-source a reference implementation of DO-Conv in Tensorflow, PyTorch and GluonCV at https://github.com/yangyanli/DO-Conv.
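
The folding claim above rests on the fact that two stacked linear maps compose into one. The sketch below checks this numerically for a single spatial patch in plain Python. The tensor layout (`depthwise[c][d][k]`, `conv[o][c][d]`), the sizes and all names are assumptions made for illustration and are not taken from the reference implementation.

```python
import random

random.seed(0)
C_IN, C_OUT, K, D_MUL = 3, 2, 9, 9   # K = 9 entries of a flattened 3x3 patch


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


patch = [rand_vec(K) for _ in range(C_IN)]                               # patch[c][k]
depthwise = [[rand_vec(K) for _ in range(D_MUL)] for _ in range(C_IN)]   # depthwise[c][d][k]
conv = [[rand_vec(D_MUL) for _ in range(C_IN)] for _ in range(C_OUT)]    # conv[o][c][d]


def two_stage(p):
    """Training-time view: depthwise kernels per input channel, then the conventional kernel."""
    hidden = [[dot(depthwise[c][d], p[c]) for d in range(D_MUL)] for c in range(C_IN)]
    return [sum(dot(conv[o][c], hidden[c]) for c in range(C_IN)) for o in range(C_OUT)]


# Inference-time view: fold the depthwise kernels into one ordinary kernel folded[o][c][k].
folded = [[[sum(conv[o][c][d] * depthwise[c][d][k] for d in range(D_MUL))
            for k in range(K)] for c in range(C_IN)] for o in range(C_OUT)]


def single_stage(p):
    return [sum(dot(folded[o][c], p[c]) for c in range(C_IN)) for o in range(C_OUT)]


out_two_stage = two_stage(patch)
out_single_stage = single_stage(patch)
```

The folded kernel has the same shape as a conventional convolution kernel (`C_OUT x C_IN x K`), which is why the extra parameters add no inference cost in this simplified setting.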

preprint2020arXiv

Effects of intervalley scatterings in thermoelectric performance of band-convergent antimonene

The strategy of band convergence of multi-valley conduction bands or multi-peak valence bands has been widely used to search or improve thermoelectric materials. However, the phonon-assisted intervalley scatterings due to multiple band degeneracy are usually neglected in the thermoelectric community. In this work, we investigate the (thermo)electric properties of non-polar monolayer $β$- and $α$-antimonene considering full mode- and momentum-resolved electron-phonon interactions. We also analyze thoroughly the selection rules on electron-phonon matrix-elements using group-theory arguments. Our calculations reveal strong intervalley scattering between the nearly degenerate valley states in both $β$- and $α$-antimonene, and the commonly-used deformation potential approximation neglecting the dominant intervalley scattering gives inaccurate estimations of the electron-phonon scattering and thermoelectric transport properties. By considering full electron-phonon interactions based on the rigid-band approximation, we find that, the maximum value of the thermoelectric figure of merits $zT$ at room temperature reduces to 0.37 in $β$-antimonene, by a factor of 5.7 comparing to the value predicted based on the constant relaxation-time approximation method. Our work not only provides an accurate prediction of the thermoelectric performances of antimonenes that reveals the key role of intervalley scatterings in determining the electronic part of zT, but also showcases a computational framework for thermoelectric materials.

preprint2020arXiv

Ensemble of Deep Convolutional Neural Networks for Automatic Pavement Crack Detection and Measurement

Automated pavement crack detection and measurement are important road issues. Agencies have to guarantee the improvement of road safety. Conventional crack detection and measurement algorithms can be extremely time-consuming and inefficient. Therefore, innovative algorithms have recently received increased attention from researchers. In this paper, we propose an ensemble of convolutional neural networks (without a pooling layer) based on probability fusion for automated pavement crack detection and measurement. Specifically, an ensemble of convolutional neural networks was employed to identify the structure of small cracks from raw images. Secondly, the outputs of the individual convolutional neural network models in the ensemble were averaged to produce the final crack probability value of each pixel, yielding a predicted probability map. Finally, the predicted morphological features of the cracks were measured by using the skeleton extraction algorithm. To validate the proposed method, some experiments were performed on two public crack databases (CFD and AigleRN) and the results of different state-of-the-art methods were compared. The experimental results show that the proposed method outperforms the other methods. For crack measurement, the crack length and width can be measured based on different crack types (complex, common, thin, and intersecting cracks). The results show that the proposed algorithm can be effectively applied for crack measurement.
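
The probability-fusion step described above (averaging the per-pixel outputs of the individual networks) can be sketched in plain Python. The function names, the 0.5 threshold and the toy maps are assumptions for illustration; the abstract does not specify how the fused map is binarized.

```python
def fuse_probability_maps(maps):
    """Average per-pixel crack probabilities from several models (equal weights)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


def to_binary_mask(prob_map, threshold=0.5):
    """Binarize the fused map; the threshold value is an assumption."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# Toy 1x2 probability maps from three hypothetical ensemble members.
model_outputs = [
    [[0.9, 0.2]],
    [[0.7, 0.4]],
    [[0.8, 0.0]],
]
fused = fuse_probability_maps(model_outputs)
mask = to_binary_mask(fused)
```

Averaging damps the errors of any single model, so a pixel is marked as crack only when the ensemble agrees on average.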

preprint2020arXiv

GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling

Existing CNN-based methods for pixel labeling depend heavily on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling neural networks widely exploit conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and directly transfer all feature information across scales without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO) to properly transfer spatially filtered features to another scale. Specifically, GSTO can work either with or without extra supervision. Unsupervised GSTO is learned from the feature itself, while the supervised one is guided by the supervised probability matrix. Both forms of GSTO are lightweight and plug-and-play, and can be flexibly integrated into networks or modules for learning better multi-scale features. In particular, by plugging GSTO into HRNet, we get a more powerful backbone (namely GSTO-HRNet) for pixel labeling, and it achieves new state-of-the-art results on the COCO benchmark for human pose estimation and other benchmarks for semantic segmentation including Cityscapes, LIP and Pascal Context, with negligible extra computational cost. Moreover, experimental results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules like PPM and ASPP. Code will be made available at https://github.com/VDIGPKU/GSTO.

preprint2020arXiv

Mechanical Properties of Atomically Thin Boron Nitride and the Role of Interlayer Interactions

Atomically thin boron nitride (BN) nanosheets are important two-dimensional nanomaterials with many unique properties distinct from those of graphene, but investigation of their mechanical properties is still greatly lacking. Here we report that high-quality single-crystalline mono- and few-layer BN nanosheets are among the strongest electrically insulating materials. More intriguingly, few-layer BN shows mechanical behaviors quite different from those of few-layer graphene under indentation. In striking contrast to graphene, whose strength decreases by more than 30% when the number of layers increases from 1 to 8, the mechanical strength of BN nanosheets is not sensitive to increasing thickness. We attribute this difference to the distinct interlayer interactions and hence sliding tendencies in these two materials under indentation. The significantly better mechanical integrity of BN nanosheets makes them a more attractive candidate than graphene for several applications, e.g. as mechanical reinforcements.

preprint2020arXiv

MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?

Binary Neural Networks (BNNs) are neural networks which use binary weights and activations instead of the typical 32-bit floating point values. They have reduced model sizes and allow for efficient inference on mobile or embedded devices with limited power and computational resources. However, the binarization of weights and activations leads to feature maps of lower quality and lower capacity and thus a drop in accuracy compared to traditional networks. Previous work has increased the number of channels or used multiple binary bases to alleviate these problems. In this paper, we instead present an architectural approach: MeliusNet. It consists of alternating a DenseBlock, which increases the feature capacity, and our proposed ImprovementBlock, which increases the feature quality. Experiments on the ImageNet dataset demonstrate the superior performance of our MeliusNet over a variety of popular binary architectures with regards to both computation savings and accuracy. Furthermore, with our method we trained BNN models, which for the first time can match the accuracy of the popular compact network MobileNet-v1 in terms of model size, number of operations and accuracy. Our code is published online at https://github.com/hpi-xnor/BMXNet-v2

preprint2020arXiv

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition

To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art methods integrate 1D temporal convolution into a conventional 2D CNN backbone. However, they all use 1D temporal convolution with a fixed kernel size (i.e., 3) in the network building block, and thus have suboptimal temporal modeling capability for handling both long-term and short-term actions. To address this problem, we first investigate the impact of different kernel sizes for the 1D temporal convolutional filters. Then, we propose a simple yet efficient operation called Mixed Temporal Convolution (MixTConv), which consists of multiple depthwise 1D convolutional filters with different kernel sizes. By plugging MixTConv into the conventional 2D CNN backbone ResNet-50, we further propose an efficient and effective network architecture named MSTNet for action recognition, and achieve state-of-the-art results on multiple benchmarks.
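
The idea of mixing depthwise 1D temporal filters of different kernel sizes can be sketched in plain Python. This is only an illustration under stated assumptions: one filter per channel, zero "same" padding along time, and odd kernel lengths. How the paper assigns kernel sizes to channels is not stated in the abstract, and the function names and kernel values below are hypothetical.

```python
from typing import List

Channel = List[float]  # one feature channel sampled over T time steps


def depthwise_conv1d_same(signal: Channel, kernel: List[float]) -> Channel:
    """Filter one channel along time with zero padding so the length is kept."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[t + j] for j in range(len(kernel)))
        for t in range(len(signal))
    ]


def mixed_temporal_conv(features: List[Channel],
                        kernels: List[List[float]]) -> List[Channel]:
    """Apply one temporal filter per channel; filters may differ in length.

    Short kernels see a narrow temporal window, long kernels a wide one,
    so the stacked output mixes short-term and long-term context.
    """
    if len(features) != len(kernels):
        raise ValueError("depthwise filtering needs one kernel per channel")
    return [depthwise_conv1d_same(ch, k) for ch, k in zip(features, kernels)]


# Two toy channels over four time steps: kernel size 1 and kernel size 3.
features = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
kernels = [[1.0], [1.0, 1.0, 1.0]]
mixed = mixed_temporal_conv(features, kernels)
```

Because each channel is filtered independently, the cost grows with the sum of kernel lengths rather than with the product of input and output channels, which is the usual motivation for depthwise filtering.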

preprint2020arXiv

On the Capacity of Fractal D2D Social Networks with Hierarchical Communications

The maximum capacity of fractal D2D (device-to-device) social networks with both direct and hierarchical communications is studied in this paper. Specifically, the fractal networks are characterized by the direct social connection and the self-similarity. Firstly, for a fractal D2D social network with direct social communications, it is proved that the maximum capacity is $ Θ\left(\frac{1}{\sqrt{n\log n}}\right) $ if a user communicates with one of his/her direct contacts randomly, where $ n $ denotes the total number of users in the network, and it can reach up to $ Θ\left(\frac{1}{\log n}\right) $ if any pair of social contacts with distance $ d $ communicate with probability proportional to $ d^{-β} $. Secondly, since users might get in touch with others without direct social connections through the inter-connected multiple users, the fractal D2D social network with these hierarchical communications is studied as well, and the related capacity is further derived. Our results show that this capacity is mainly affected by the correlation exponent $ε$ of the fractal structure. The capacity is reduced in proportion to $ \frac{1}{\log n} $ if $ 2<ε<3 $, while the reduction coefficient is $ \frac{1}{n} $ if $ ε>3 $.

preprint2020arXiv

Raman Signature and Phonon Dispersion of Atomically Thin Boron Nitride

Raman spectroscopy has become an essential technique to characterize and investigate graphene and many other two-dimensional materials. However, there is still no consensus on the Raman signature and phonon dispersion of atomically thin boron nitride (BN), which has many unique properties distinct from graphene. Such a knowledge gap greatly affects the understanding of basic physical and chemical properties of atomically thin BN as well as the use of Raman spectroscopy to study these nanomaterials. Here, we use both experiment and simulation to reveal the intrinsic Raman signature of monolayer and few-layer BN. We find experimentally that atomically thin BN without interaction with the substrate has a G band frequency similar to that of bulk hexagonal BN, but strain induced by the substrate can cause pronounced Raman shifts. This is in excellent agreement with our first-principles density functional theory (DFT) calculations at two levels of theory, including van der Waals dispersion forces (opt-vdW) and a fraction of the exact exchange from Hartree-Fock (HF) theory through the hybrid HSE06 functional. Both calculations demonstrate that the intrinsic E2g mode of BN does not depend appreciably on the number of layers. Our simulations also suggest the importance of the exact exchange mixing parameter in calculating the vibrational modes in BN, as it determines the fraction of HF exchange included in the DFT calculations.

preprint2020arXiv

Regularised Text Logistic Regression: Key Word Detection and Sentiment Classification for Online Reviews

Online customer reviews have become important for managers and executives in the hospitality and catering industry who wish to obtain a comprehensive understanding of their customers' demands and expectations. We propose a Regularized Text Logistic (RTL) regression model to perform text analytics and sentiment classification on unstructured text data, which automatically identifies a set of statistically significant and operationally insightful word features, and achieves satisfactory predictive classification accuracy. We apply the RTL model to two online review datasets, Restaurant and Hotel, from TripAdvisor. Our results demonstrate satisfactory classification performance compared with alternative classifiers with a highest true positive rate of 94.9%. Moreover, RTL identifies a small set of word features, corresponding to 3% for Restaurant and 20% for Hotel, which boosts working efficiency by allowing managers to drill down into a much smaller set of important customer reviews. We also develop the consistency, sparsity and oracle property of the estimator.

preprint2020arXiv

Roper State from Overlap Fermions

The Roper state is extracted with valence overlap fermions on a $2+1$-flavor domain-wall fermion lattice (spacing $a = 0.114$ fm and $m_π = 330$ MeV) using both the Sequential Empirical Bayes (SEB) method and the variational method. The results are consistent, provided that a large smearing-size interpolation operator is included in the variational calculation to have better overlap with the lowest radial excitation. Similar calculations carried out for an anisotropic clover lattice with similar parameters find the Roper $\approx 280$ MeV higher than that of the overlap fermion. The fact that the prediction of the Roper state by overlap fermions is consistently lower than those of clover fermions, chirally improved fermions, and twisted-mass fermions over a wide range of pion masses has been dubbed a "Roper puzzle." To understand the origin of this difference, we study the hairpin $Z$-diagram in the isovector scalar meson ($a_0$) correlator in the quenched approximation. Comparing the $a_0$ correlators for clover and overlap fermions, at a pion mass of 290 MeV, we find that the spectral weight of the ghost state with clover fermions is smaller than that of the overlap at $a = 0.12$ fm and $0.09$ fm, whereas the whole $a_0$ correlators of clover and overlap at $a = 0.06$ fm coincide within errors. This suggests that chiral symmetry is restored for clover at $a \le 0.06$ fm and that the Roper should come down at and below this $a$. We conclude that this work supports a resolution of the "Roper puzzle" due to $Z$-graph type chiral dynamics. This entails coupling to higher components in the Fock space (e.g. $Nπ$, $Nππ$ states) to induce the effective flavor-spin interaction between quarks as prescribed in the chiral quark model, resulting in the parity-reversal pattern as observed in the experimental excited states of $N, Δ$ and $Λ$.

preprint2020arXiv

Stochastic Sparse Subspace Clustering

State-of-the-art subspace clustering methods are based on a self-expressive model, which represents each data point as a linear combination of other data points. By enforcing such a representation to be sparse, sparse subspace clustering is guaranteed to produce a subspace-preserving data affinity where two points are connected only if they are from the same subspace. However, data points from the same subspace may not be well-connected, leading to the issue of over-segmentation. To address over-segmentation, we introduce dropout, which randomly drops out data points in the self-expressive model. In particular, we show that dropout is equivalent to adding a squared $\ell_2$ norm regularization on the representation coefficients, and therefore induces denser solutions. Then, we reformulate the optimization problem as a consensus problem over a set of small-scale subproblems. This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering, which can effectively handle large-scale datasets. Extensive experiments on synthetic data and real-world datasets validate the efficiency and effectiveness of our proposal.
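
The stated equivalence between dropout and a squared $\ell_2$ penalty can be checked numerically on a toy problem. The sketch below assumes a particular dropout convention that the abstract does not spell out: each point is dropped independently with probability `delta`, and kept terms are rescaled by `1 / (1 - delta)` so that the expected combination is unchanged. Under that assumption the expected reconstruction loss equals the plain loss plus `delta / (1 - delta)` times the sum of squared coefficients weighted by the squared norms of the points, which reduces to an ordinary squared $\ell_2$ penalty for unit-norm data. All names and toy values are hypothetical.

```python
from itertools import product
from typing import List, Sequence

Vector = List[float]


def sq_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm."""
    return sum(a * a for a in v)


def residual(x: Vector, points: List[Vector], coeffs: Vector,
             scales: Sequence[float]) -> Vector:
    """x minus the (optionally rescaled) linear combination of the other points."""
    r = list(x)
    for p, c, s in zip(points, coeffs, scales):
        for k in range(len(r)):
            r[k] -= s * c * p[k]
    return r


def expected_dropout_loss(x: Vector, points: List[Vector], coeffs: Vector,
                          delta: float) -> float:
    """Exact expectation of the squared residual over all dropout masks."""
    keep = 1.0 - delta
    total = 0.0
    for mask in product([0, 1], repeat=len(points)):
        prob = 1.0
        scales = []
        for kept in mask:
            prob *= keep if kept else delta
            scales.append(1.0 / keep if kept else 0.0)  # assumed rescaling
        total += prob * sq_norm(residual(x, points, coeffs, scales))
    return total


def regularized_loss(x: Vector, points: List[Vector], coeffs: Vector,
                     delta: float) -> float:
    """Plain squared residual plus the norm-weighted squared-l2 penalty."""
    plain = sq_norm(residual(x, points, coeffs, [1.0] * len(points)))
    penalty = sum(c * c * sq_norm(p) for p, c in zip(points, coeffs))
    return plain + delta / (1.0 - delta) * penalty


# Toy data: one target point and three candidate points in the plane.
x = [1.0, 2.0]
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coeffs = [0.5, -0.25, 0.75]
lhs = expected_dropout_loss(x, points, coeffs, delta=0.3)
rhs = regularized_loss(x, points, coeffs, delta=0.3)
```

Enumerating all masks keeps the check deterministic. For realistic numbers of points the expectation would be estimated by sampling masks instead.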

preprint2020arXiv

Strangeonium-like hybrids on the lattice

The strangeonium-like $s\bar{s}g$ hybrids are investigated from lattice QCD in the quenched approximation. In the Coulomb gauge, spatially extended operators are constructed for $1^{--}$ and $(0,1,2)^{-+}$ states with the color octet $s\bar{s}$ component being separated from the chromomagnetic field strength by spatial distances $r$, whose matrix elements between the vacuum and the corresponding states are interpreted as Bethe-Salpeter (BS) wave functions. In each of the $(1,2)^{-+}$ channels, the masses and the BS wave functions are reliably derived. The $1^{-+}$ ground state mass is around 2.1-2.2 GeV, and that of $2^{-+}$ is around 2.3-2.4 GeV, while the masses of the first excited states are roughly 1.4 GeV higher. This mass splitting is much larger than the expectation of the phenomenological flux-tube model or constituent gluon model for hybrids, which is usually a few hundred MeV. The BS wave functions with respect to $r$ show the clear radial nodal structures of a non-relativistic two-body system, which imply that $r$ is a meaningful dynamical variable for these hybrids and motivate a color halo picture of hybrids in which the color octet $s\bar{s}$ is surrounded by gluonic degrees of freedom. In the $1^{--}$ channel, the properties of the lowest two states comply with those of $ϕ(1020)$ and $ϕ(1680)$. We have not yet obtained convincing information relevant to $ϕ(2170)$. However, we argue that whether $ϕ(2170)$ is a conventional $s\bar{s}$ meson or an $s\bar{s}g$ hybrid within the color halo scenario, the ratio of partial decay widths $Γ(ϕη)$ and $Γ(ϕη')$ observed by BESIII can be understood by the mechanism of hadronic transition of a strangeonium-like meson along with the $η-η'$ mixing.

preprint2020arXiv

The Luneburg-Lissajous lens

We design a new absolute optical instrument by combining a Luneburg lens and a Lissajous lens, and analyze its imaging mechanism from the perspective of simple harmonic oscillations. The imaging positions are determined by the periods of motion in the x and y directions. In addition, instruments composed of multiple parts are also devised, which can form images or self-images as long as the motion periods in the x and y directions satisfy similar conditions. Our work provides a new way to analyze the imaging of different lenses by simply decoupling the equations of motion, and reveals the internal mechanism of some absolute optical instruments.

preprint2020arXiv

Two-dimensional van der Waals Heterostructures for Synergistically Improved Surface Enhanced Raman Spectroscopy

Surface enhanced Raman spectroscopy (SERS) is a precise and non-invasive analytical technique that is widely used in chemical analysis, environmental protection, food processing, pharmaceutics, and diagnostic biology. However, it is still a challenge to produce highly sensitive and reusable SERS substrates with minimum fluorescence background. In this work, we propose the use of van der Waals heterostructures of two-dimensional materials (2D materials) to cover plasmonic metal nanoparticles to solve this challenge. The heterostructures of atomically thin boron nitride (BN) and graphene provide synergistic effects: (1) electrons can tunnel through the atomically thin BN, allowing charge transfer between graphene and probe molecules to suppress the fluorescence background; (2) the SERS sensitivity is enhanced by graphene via the chemical enhancement mechanism (CM) in addition to the electromagnetic field mechanism (EM); (3) the atomically thin BN protects the underlying graphene and Ag nanoparticles from oxidation during heating for regeneration at 360 °C in air, so that the SERS substrates can be reused. These advances will facilitate wider applications of SERS, especially for the detection of fluorescent molecules with higher sensitivity.

preprint2019arXiv

A coupled-channel lattice study on the resonance-like structure $Z_c(3900)$

In this exploratory study, near-threshold scattering of $D$ and $\bar{D}^*$ mesons is investigated using lattice QCD with $N_f=2+1+1$ twisted mass fermion configurations. The calculation is performed within the coupled-channel Lüscher's finite-size formalism. The study focuses on the channel with $I^G(J^{PC})=1^+(1^{+-})$ where the resonance-like structure $Z_c(3900)$ was discovered. We first identify the two most relevant channels of the problem, and the lattice study is performed within the two-channel scattering model. Combined with a two-channel Ross-Shaw theory, scattering parameters are extracted from the energy levels by solving the generalized eigenvalue problem. Our results on the scattering length parameters suggest that, at the particular lattice parameters that we studied, the best-fit parameters do not correspond to a peak behavior in the elastic scattering cross section near the threshold. Furthermore, within the zero-range Ross-Shaw theory, the scenario of a narrow resonance close to the threshold is disfavored beyond the $3σ$ level.

preprint2019arXiv

Plasmon-polaritonic quadrupole topological insulators

The quadrupole topological insulator is a symmetry-protected higher-order topological phase with an intriguing topology of Wannier bands, which, however, has not yet been realized in plasmonic metamaterials. Here, we propose a lattice of plasmon-polaritonic nanocavities which can realize quadrupole topological insulators by exploiting the geometry-dependent sign-reversal of the couplings between the daisy-like nanocavities. The designed system exhibits various topological and trivial phases, as characterized by the nested Wannier bands and the topological quadrupole moment, which can be controlled by the distances between the nanocavities. Our study opens a pathway toward plasmonic topological metamaterials with quadrupole topology.

preprint2019arXiv

Where does active travel fit within local community narratives of mobility space and place?

Encouraging sustainable mobility patterns is at the forefront of policymaking at all scales of governance as the collective consciousness surrounding climate change continues to expand. Not every community, however, possesses the necessary economic or socio-cultural capital to encourage modal shifts away from private motorized vehicles towards active modes. The current literature on 'soft' policy emphasizes the importance of tailoring behavior change campaigns to individual or geographic context. Yet, there is a lack of insight and appropriate tools to promote active mobility and overcome transport disadvantage from the local community perspective. The current study investigates the promotion of walking and cycling adoption using a series of focus groups with local residents in two geographic communities, namely Chicago's (1) Humboldt Park neighborhood and (2) suburb of Evanston. The research approach combines traditional qualitative discourse analysis with quantitative text-mining tools, namely topic modeling and sentiment analysis. The analysis uncovers the local mobility culture, embedded norms and values associated with acceptance of active travel modes in different communities. We observe that underserved populations within diverse communities view active mobility simultaneously as a necessity and as a symbol of privilege that is sometimes at odds with the local culture. The mixed methods approach to analyzing community member discourses is translated into policy findings that are either tailored to local context or broadly applicable to curbing automobile dominance. Overall, residents of both Humboldt Park and Evanston envision a society in which multimodalism replaces car-centrism, but differences in the local physical and social environments would and should influence the manner in which overarching policy objectives are met.