Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
60 works
0 followers
44 topics
4 close collaborators

Published work

60 published item(s)

preprint 2026 arXiv

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

Operating and maintaining (O&M) large-scale online engine systems (e.g., search, recommendation, and advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. Despite the inherent suitability of LLM-based agents for such operational scenarios, the critical bottleneck impeding their practical deployment lies not in reasoning but in orchestration capability: specifically, the precise selection of relevant data (metrics, logs, and change events) and applicable knowledge (handbook-defined rules and empirically derived practitioner experience) tailored to each individual operational event. Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. Here we present Bian Que, an agentic operating framework with three contributions: (i) a unified operational paradigm, which abstracts routine daily O&M actions into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) flexible Skill Arrangement, in which each predefined Skill explicitly defines the requisite data and operational knowledge for a specific context; such Skills can be automatically generated and updated by LLM agents, and can also be iteratively optimized by on-call engineers via natural language instructions; and (iii) a unified self-evolving mechanism, where each correction signal enables two parallel evolutionary pathways: distilling event memory into knowledge, and targeted refinement of Skills. Deployed on the e-commerce search engine of KuaiShou, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, cuts mean time to resolution by over 50%, and attains a 99.0% pass rate on offline evaluations. Code is available at https://github.com/benchen4395/BianQue_Assistant.

preprint 2026 arXiv

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

Multi-Agent Path Finding (MAPF) is a coordination problem that requires computing globally consistent, collision-free trajectories from individual start positions to assigned goal positions under combinatorial planning complexity. In dense environments, suboptimal initial plans induce compound conflicts that hinder feasible repair. For repair-based solvers like LNS2, initial plan quality critically affects downstream repair, yet this factor remains underexplored. We propose DiffLNS, a hybrid framework that integrates a discrete denoising diffusion probabilistic model (D3PM) with LNS2. The D3PM serves as an initializer with sparse social attention that learns a spatiotemporal prior over coordinated multi-agent action trajectories from expert demonstrations and samples multiple joint plans. Operating directly on the categorical action space, our discrete diffusion preserves the MAPF action structure and samples from a multimodal joint-plan distribution to produce diverse drafts well suited for neighborhood repair. These drafts act as warm starts for downstream repair, which completes unfinished trajectories and resolves remaining conflicts under hard MAPF constraints. Experimental results show that despite being trained only on instances with at most 96 agents, the initializer generalizes to scenarios with up to 312 agents at inference time. Across 20 complex and congested settings, DiffLNS achieves an average success rate of 95.8%, outperforming the strongest tested baseline by 9.6 percentage points and matching or exceeding all baselines in all 20 settings. To the best of our knowledge, this is the first work to leverage discrete diffusion for warm-starting an LNS-based MAPF solver.
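The "collision-free" requirement in the abstract above can be made concrete with a small checker. The sketch below is not taken from DiffLNS or LNS2; it uses the common MAPF convention of vertex conflicts (two agents in one cell at the same timestep) and swap conflicts (two agents exchanging cells in one step). The function name `count_conflicts` and the assumption that an agent waits at its last cell once its path ends are illustrative choices.

```python
from itertools import combinations


def count_conflicts(paths):
    """Count vertex and swap conflicts among agent paths.

    paths: list of per-agent paths, each a list of (row, col) cells indexed
    by timestep. Assumption: an agent stays at its final cell after its
    path ends.
    """
    if not paths:
        return 0
    horizon = max(len(p) for p in paths)

    def position(path, t):
        # Clamp the timestep so finished agents remain at their last cell.
        return path[min(t, len(path) - 1)]

    conflicts = 0
    for t in range(horizon):
        for a, b in combinations(paths, 2):
            if position(a, t) == position(b, t):
                conflicts += 1  # vertex conflict: same cell, same time
            elif (
                t > 0
                and position(a, t) == position(b, t - 1)
                and position(a, t - 1) == position(b, t)
            ):
                conflicts += 1  # swap conflict: agents exchange cells
    return conflicts
```

A draft plan with a nonzero count is the kind of input a repair stage would still have to fix; a count of zero means the plan satisfies these two constraints under the stated assumptions.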

preprint 2026 arXiv

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

The Diffusion Transformer (DiT) architecture is the state-of-the-art paradigm for high-fidelity image generation, underpinning models like Stable Diffusion-3 and FLUX.1. However, deploying these models on resource-constrained mobile devices entails prohibitive computational and memory overhead. While efficiency-driven approaches like Linear-DiT and static pruning alleviate bottlenecks, they often incur quality degradation. Unlike cloud environments, mobile constraints require a single-model paradigm that dynamically balances fidelity and latency. We introduce ElasticDiT, which achieves this dynamic trade-off by adjusting spatial compression ratios and DiT block depths. By integrating Shift Sparse Block Attention (SSBA) and a Tiny DWT-Distilled VAE (T-DVAE), ElasticDiT reduces inference latency and memory footprint while maintaining image quality. Experiments confirm that ElasticDiT effectively covers a wide range of fidelity-latency trade-offs within a single set of parameters. By jointly adjusting compression and depth, a single ElasticDiT model can be reconfigured on-the-fly to outperform task-specific baselines. Specifically, our flex lite variant achieves an HPS of 32.87, surpassing the Flux model, while maintaining competitive quality at 84.16% average sparsity through SSBA. Furthermore, the plug-and-play T-DVAE provides SD3-level reconstruction with only 1/8 the computational cost of standard VAEs, and Flow-GRPO boosts semantic alignment (GenEval: 66.93 to 73.62). These results demonstrate that ElasticDiT offers a versatile, hardware-adaptive solution that eliminates the need for multiple specialized models, providing a promising path for future high-resolution image generation on mobile devices.

preprint 2026 arXiv

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Currently, process reward models (PRMs) have exhibited remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and decision-making tasks, PRMs are required to possess capabilities for detecting process-level errors in real-world scenarios. However, existing benchmarks primarily focus on mathematical reasoning, thereby failing to comprehensively evaluate the error detection ability of PRMs across diverse reasoning scenarios. To mitigate this gap, we introduce GR-Ben, a process-level benchmark specifically designed for assessing PRMs' performance across two primary reasoning domains (science and logic) and nine subdomains. We conduct extensive experiments on a diverse set of 22 models, encompassing both PRMs and LLMs, and derive two key findings: (1) In domains beyond mathematical reasoning, the error-detection ability of existing PRMs and LLMs is found to be markedly weaker by comparison. (2) In general, PRMs are less adept at identifying knowledge-based errors, whereas LLMs exhibit poorer performance in detecting computational errors. We hope GR-Ben can foster future research on PRMs for general domains, thereby enhancing the reasoning capabilities of LLMs.

preprint 2026 arXiv

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, over BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.

preprint 2026 arXiv

MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory

Agentic memory evolves across tasks into durable derived artifacts: summaries, cached outputs, embeddings, learned skills, and executable tool procedures. When a source artifact is deleted, corrected, or invalidated by tool or API migration, descendants derived from that source can remain visible and steer future actions with stale support. We formalize this failure mode as the cascade update problem, where repair targets the visible derived state of the memory store. We present MemoRepair, a barrier-first cascade-repair contract for agentic memory. A repair event induces a controlled transition from invalidated descendant state to validated successor state: affected descendants are withdrawn before repair, successors are constructed from retained support and staged repaired predecessors under the current interface, and republication is restricted to validated predecessor-closed successors. This contract induces a scalarized repair-selection problem for a fixed repair-cost tradeoff. We show that the induced publication problem reduces to maximum-weight predecessor closure and can be solved exactly by a single s-t min-cut. Experiments on ToolBench and MemoryArena show that, with complete influence provenance, MemoRepair reduces invalidated-memory exposure from 69.8-94.3% under systems without cascade repair to 0%. Compared with exhaustive Repair all, it recovers 91.1-94.3% of validated successors while reducing normalized repair-operator cost from 1.00 to 0.57-0.76.
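The reduction named in the abstract above, from a predecessor-closed selection to a single s-t min-cut, follows the textbook maximum-weight closure construction. The sketch below illustrates that construction only and is not MemoRepair's implementation. The node weights (read here as value minus repair cost), the `requires` pairs, and the function name `max_weight_closure` are hypothetical, and a plain Edmonds-Karp max-flow stands in for whatever solver the authors use.

```python
from collections import deque


def max_weight_closure(weights, requires):
    """Pick a maximum-weight set closed under 'requires' via s-t min-cut.

    weights:  dict node -> number (positive = worth publishing, negative = costly).
    requires: list of (v, u) pairs meaning "selecting v forces selecting u".
    Returns (selected_nodes, total_weight).
    """
    inf = float("inf")
    source, sink = "_source", "_sink"  # assumed not to clash with node names
    cap = {source: {}, sink: {}}       # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)        # reverse residual edge

    for node, w in weights.items():
        if w > 0:
            add_edge(source, node, w)
        elif w < 0:
            add_edge(node, sink, -w)
    for v, u in requires:
        add_edge(v, u, inf)            # a closure constraint can never be cut

    def bfs_parents():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = bfs_parents()
        if sink not in parent:
            break
        bottleneck, v = inf, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source form the source side of a min cut.
    selected = set(bfs_parents()) - {source}
    return selected, sum(weights[n] for n in selected)
```

As a usage example under these assumptions, `max_weight_closure({"a": -2, "b": 5, "c": -4, "d": 3}, [("b", "a"), ("d", "c")])` selects `b` together with its required predecessor `a`, and skips `d` because the predecessor it needs costs more than `d` is worth.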

preprint 2026 arXiv

Orchestrating Spatial Semantics via a Zone-Graph Paradigm for Intricate Indoor Scene Generation

Autonomous 3D indoor scene synthesis breaks down in non-convex rooms with tightly coupled spatial constraints. Data-driven generators lack topological priors for long-horizon planning, while iterative agents fragment semantics and become geometrically brittle. We present ZoneMaestro, a unified framework that shifts the paradigm from object-centric synthesis to Zone-Graph Orchestration. By internalizing a novel zone-based logic, ZoneMaestro translates high-level semantic intent into functional zones and topological constraints, enabling robust adaptation to diverse architectural forms. To support this, we construct Zone-Scene-10K, a large-scale dataset enriched with explicit Zone-Graph annotations. We further introduce an Alternating Alignment Strategy that cycles between reasoning internalization and Zone-Aware Group Relative Policy Optimization (Z-GRPO), effectively reconciling the tension between semantic richness and geometric validity without relying on external physics engines. To rigorously evaluate spatial intelligence beyond convex primitives, we formally define the task of Intricate Spatial Orchestration and release SCALE, a stress-test benchmark for irregular indoor scenarios with complex, dense spatial relations. Extensive experiments demonstrate that ZoneMaestro resolves the density-safety dichotomy, significantly outperforming state-of-the-art baselines in both structural coherence and intent adherence.

preprint 2026 arXiv

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework for generating presentation videos from user queries. Given an open-ended user query and a selected presentation mode, PresentAgent-2 first summarizes the query into a focused topic and performs deep research over presentation-friendly sources to collect multimodal resources, including relevant text, images, GIFs, and videos. It then constructs presentation slides, generates mode-specific scripts, and composes slides, audio, and dynamic media into a complete presentation video. PresentAgent-2 supports three independent presentation modes within a unified framework: Single Presentation, which generates a single-speaker narrated presentation video; Discussion, which creates a multi-speaker presentation with structured speaker roles, such as for asking guiding questions, explaining concepts, clarifying details, and summarizing key points; and Interaction, which independently supports answering audience questions grounded in the generated slides, scripts, retrieved evidence, and presentation context. To evaluate these capabilities, we build a multimodal presentation benchmark covering single presentation, discussion, and interaction scenarios, with task-specific evaluation criteria for content quality, media relevance, dynamic media use, dialogue naturalness, and interaction grounding. Overall, PresentAgent-2 extends presentation generation from document-dependent slide creation to query-driven, research-grounded presentation video generation with multimodal media, dialogue, and interaction. Code: https://github.com/AIGeeksGroup/PresentAgent-2. Website: https://aigeeksgroup.github.io/PresentAgent-2.

preprint 2026 arXiv

VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, achieving state-of-the-art physical plausibility and unlocking shape diversity compared to existing layout planners.

preprint 2025 arXiv

Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning

Vision-Language Models (VLMs) often appear culturally competent but rely on superficial pattern matching rather than genuine cultural understanding. We introduce a diagnostic framework to probe VLM reasoning on fire-themed cultural imagery through both classification and explanation analysis. Testing multiple models on Western festivals, non-Western traditions, and emergency scenes reveals systematic biases: models correctly identify prominent Western festivals but struggle with underrepresented cultural events, frequently offering vague labels or dangerously misclassifying emergencies as celebrations. These failures expose the risks of symbolic shortcuts and highlight the need for cultural evaluation beyond accuracy metrics to ensure interpretable and fair multimodal systems.

preprint 2024 arXiv

Instruct-Imagen: Image Generation with Multi-modal Instruction

This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision. It uses natural language to amalgamate disparate modalities (e.g., text, edge, style, subject, etc.), such that abundant generation intents can be standardized in a uniform format. We then build instruct-imagen by fine-tuning a pre-trained text-to-image diffusion model with a two-stage framework. First, we adapt the model using retrieval-augmented training to enhance the model's ability to ground its generation on external multimodal context. Subsequently, we fine-tune the adapted model on diverse image generation tasks that require vision-language understanding (e.g., subject-driven generation, etc.), each paired with a multi-modal instruction encapsulating the task's essence. Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain and demonstrates promising generalization to unseen and more complex tasks.

preprint 2023 arXiv

Deformation measurement of a soil mixing retaining wall using terrestrial laser scanning

Retaining walls are often built to prevent excessive lateral movements of the ground surrounding an excavation site. During an excavation, failure of retaining walls could cause catastrophic accidents and hence their lateral deformations are monitored regularly. Laser scanning can rapidly acquire the spatial data of a relatively large area at fine spatial resolutions, which is ideal for monitoring retaining walls' deformations. This paper attempts to apply laser scanning to measurements of the lateral deformations of a soil mixing retaining wall at an ongoing excavation site. Reference measurements by total station and inclinometer were also conducted to verify those from the laser scanning. The deformations derived using laser scanning data were consistent with the reference measurements at the top part of the retaining wall (i.e., mainly the ring beam of the wall). This research also shows that the multi-scale-model-to-model method was the most accurate deformation estimation method on the research data.

preprint 2022 arXiv

AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism

Recently, streaming technology has greatly advanced the livestream field. Because livestream records are excessively long, it is essential to extract highlight segments for effective reproduction and redistribution. Although many approaches have proven effective for highlight detection in other modalities, the challenges of livestream processing, such as extreme durations, large topic shifts, and much irrelevant information, heavily hamper the adaptation and compatibility of these methods. In this paper, we formulate a new task, Livestream Highlight Detection, analyze the difficulties listed above, and propose a novel architecture, AntPivot, to solve this problem. Concretely, we first encode the original data into multiple views and model their temporal relations to capture clues in a hierarchical attention mechanism. Afterwards, we convert the detection of highlight clips into the search for optimal decision sequences and use the fully integrated representations to predict the final results in a dynamic-programming mechanism. Furthermore, we construct a fully-annotated dataset, AntHighlight, to instantiate this task and evaluate the performance of our model. Extensive experiments indicate the effectiveness and validity of our proposed method.

preprint 2022 arXiv

BRIGHT -- Graph Neural Networks in Real-Time Fraud Detection

Detecting fraudulent transactions is an essential component to control risk in e-commerce marketplaces. Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph. However, two challenges arise in the implementation of GNNs in production. First, future information in a dynamic graph should not be considered in message passing to predict the past. Second, the latency of graph query and GNN model inference is usually up to hundreds of milliseconds, which is costly for some critical online services. To tackle these challenges, we propose a Batch and Real-time Inception GrapH Topology (BRIGHT) framework to conduct an end-to-end GNN learning that allows efficient online real-time inference. The BRIGHT framework consists of a graph transformation module (Two-Stage Directed Graph) and a corresponding GNN architecture (Lambda Neural Network). The Two-Stage Directed Graph guarantees that the information passed through neighbors is only from the historical payment transactions. It consists of two subgraphs representing historical relationships and real-time links, respectively. The Lambda Neural Network decouples inference into two stages: batch inference of entity embeddings and real-time inference of transaction prediction. Our experiments show that BRIGHT outperforms the baseline models by >2% on average w.r.t. precision. Furthermore, BRIGHT is computationally efficient for real-time fraud detection. Regarding end-to-end performance (including neighbor query and inference), BRIGHT can reduce the P99 latency by >75%. For the inference stage, our speedup is on average 7.8× compared to the traditional GNN.
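The first challenge in the abstract above, keeping future information out of message passing, can be illustrated with a timestamp filter on neighbors. This is a minimal sketch of the general idea, not the Two-Stage Directed Graph or the Lambda Neural Network themselves. The function name `historical_mean`, the `(timestamp, features)` neighbor format, and the choice of mean aggregation are all assumptions made for illustration.

```python
def historical_mean(target_time, neighbors):
    """Average neighbor features using only strictly earlier neighbors.

    target_time: timestamp of the transaction being scored.
    neighbors:   list of (timestamp, feature_vector) pairs, where each
                 feature_vector is a list of floats of equal length.
    Returns the element-wise mean, or None if no historical neighbor exists.
    """
    # Drop anything at or after the target time so no future signal leaks in.
    past = [features for ts, features in neighbors if ts < target_time]
    if not past:
        return None
    dim = len(past[0])
    return [sum(vec[k] for vec in past) / len(past) for k in range(dim)]
```

For example, scoring a transaction at time 10 with neighbors at times 5, 8, and 12 averages only the first two; the neighbor at time 12 is ignored regardless of its features.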

preprint 2022 arXiv

Depth-Assisted ResiDualGAN for Cross-Domain Aerial Images Semantic Segmentation

Unsupervised domain adaptation (UDA) is an approach to minimizing the domain gap. Generative methods are common approaches to minimizing the domain gap of aerial images, which improves the performance of downstream tasks, e.g., cross-domain semantic segmentation. For aerial images, the digital surface model (DSM) is usually available in both the source domain and the target domain. Depth information in the DSM brings external information to generative models. However, little research utilizes it. In this paper, depth-assisted ResiDualGAN (DRDG) is proposed, where depth supervised loss (DSL) and depth cycle consistency loss (DCCL) are used to bring depth information into the generative model. Experimental results show that DRDG reaches state-of-the-art accuracy among generative methods in cross-domain semantic segmentation tasks.

preprint 2022 arXiv

e-G2C: A 0.14-to-8.31 μJ/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

This work presents the first silicon-validated dedicated EGM-to-ECG (G2C) processor, dubbed e-G2C, featuring continuous lightweight anomaly detection, event-driven coarse/precise conversion, and on-chip adaptation. e-G2C utilizes neural network (NN) based G2C conversion and integrates 1) an architecture supporting anomaly detection and coarse/precise conversion via time multiplexing to balance effectiveness and power, 2) an algorithm-hardware co-designed vector-wise sparsity resulting in a 1.6-1.7× speedup, 3) hybrid dataflows for achieving near-100% utilization for normal/depth-wise (DW)/point-wise (PW) convolutions (Convs), and 4) an on-chip detection threshold adaptation engine for continuous effectiveness. The achieved 0.14-8.31 μJ/inference energy efficiency outperforms prior art under similar complexity, promising real-time detection/conversion and possibly life-critical interventions.

preprint 2022 arXiv

Equivalence between algorithmic instability and transition to replica symmetry breaking in perceptron learning systems

The binary perceptron is a fundamental model of supervised learning with non-convex optimization, and a root of today's popular deep learning. The binary perceptron is able to classify random high-dimensional data by computing the marginal probabilities of binary synapses. The relationship between the algorithmic instability and the equilibrium analysis of the model remains elusive. Here, we establish the relationship by showing that the instability condition around the algorithmic fixed point is identical to the instability for breaking the replica symmetric saddle point solution of the free energy function. Therefore, our analysis would hopefully provide insights towards other learning systems in bridging the gap between non-convex learning dynamics and statistical mechanics properties of more complex neural networks.

preprint 2022 arXiv

Industrial Experience of Finding Cryptographic Vulnerabilities in Large-scale Codebases

Enterprise environments often screen large-scale (millions of lines of code) codebases with static analysis tools to find bugs and vulnerabilities. Parfait is a static code analysis tool used at Oracle to find security vulnerabilities in industrial codebases. Recently, many studies have shown that there are complicated cryptographic vulnerabilities caused by misusing cryptographic APIs in Java. In this paper, we describe how we realize a precise and scalable detection of these complicated cryptographic vulnerabilities based on the Parfait framework. The key challenge in the detection of cryptographic vulnerabilities is the high false alarm rate caused by pseudo-influences. Pseudo-influences happen if security-irrelevant constants are used in constructing security-critical values. Static analysis is usually unable to distinguish them from hard-coded constants that expose sensitive information. We tackle this problem by specializing the backward dataflow analysis used in Parfait with refinement insights, an idea from the tool CryptoGuard. We evaluate our analyzer on a comprehensive Java cryptographic vulnerability benchmark and eleven large real-world applications. The results show that the Parfait-based cryptographic vulnerability detector can find real-world cryptographic vulnerabilities in large-scale codebases with high true-positive rates and low runtime cost.

preprint 2022 arXiv

Joint Optimization of STAR-RIS Assisted UAV Communication Systems

In this letter, we study the simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted unmanned aerial vehicle (UAV) communications. Our goal is to maximize the sum rate of all users by jointly optimizing the STAR-RIS's beamforming vectors, the UAV's trajectory and power allocation. We decompose the formulated non-convex problem into three subproblems and solve them alternately to obtain the solution. Simulations show that: 1) the STAR-RIS achieves a higher sum rate than traditional RIS; 2) to exploit the benefits of STAR-RIS, the UAV's trajectory is closer to STAR-RIS than that of RIS; 3) the energy splitting for reflection and transmission highly depends on the real-time trajectory of UAV.

preprint 2022 arXiv

Learning an Efficient Multimodal Depth Completion Model

With the wide application of sparse ToF sensors in mobile devices, RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems. First, the fusion of multimodal information requires more network modules to process different modalities. But the application scenarios of sparse ToF measurements usually demand lightweight structure and low computational cost. Second, fusing sparse and noisy depth data with dense pixel-wise RGB data may introduce artifacts. In this paper, a light but efficient depth completion network is proposed, which consists of a two-branch global and local depth prediction module and a funnel convolutional spatial propagation network. The two-branch structure extracts and fuses cross-modal features with lightweight backbones. The improved spatial propagation module can refine the completed depth map gradually. Furthermore, corrected gradient loss is presented for the depth completion problem. Experimental results demonstrate the proposed method can outperform some state-of-the-art methods with a lightweight architecture. The proposed method also wins the championship in the MIPI2022 RGB+TOF depth completion challenge.

preprint 2022 arXiv

Modelling graph dynamics in fraud detection with "Attention"

At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph Neural Network) and its variants to capture both temporal and heterogeneous information. We first construct dynamic heterogeneous graphs from registration and transaction data from eBay. Then, we build models with diachronic entity embedding and heterogeneous graph transformer. We also use model explainability techniques to understand the behaviors of DyHGN-* models. Our findings reveal that modelling graph dynamics with heterogeneous inputs needs to be conducted with "attention" depending on the data structure, distribution, and computation cost.

preprint 2022 arXiv

Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks

Due to diverse architectures in deep neural networks (DNNs) with severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique, called Neighborhood Region Smoothing (NRS). NRS leverages the finding that models would benefit from converging to flat minima, and tries to regularize the neighborhood region in weight space to yield approximate outputs. Specifically, the gap between outputs of models in the neighborhood region is gauged by a defined metric based on Kullback-Leibler divergence. This metric provides similar insights to the minimum description length principle on interpreting flat minima. By minimizing both this divergence and empirical loss, NRS could explicitly drive the optimizer towards converging to flat minima. We confirm the effectiveness of NRS by performing image classification tasks across a wide range of model architectures on commonly-used datasets such as CIFAR and ImageNet, where generalization ability could be universally improved. Also, we empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method, which is considered as evidence of flat minima.

preprint 2022 arXiv

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for today's severely overparameterized networks. In this paper, we propose an effective method to improve model generalization by additionally penalizing the gradient norm of the loss function during optimization. We demonstrate that confining the gradient norm of the loss function could help lead the optimizers towards finding flat minima. We leverage the first-order approximation to efficiently implement the corresponding gradient to fit well in the gradient descent framework. In our experiments, we confirm that when using our methods, generalization performance of various models could be improved on different datasets. Also, we show that the recent sharpness-aware minimization method (Foret et al., 2021) is a special, but not the best, case of our method, where the best case of our method could give new state-of-the-art performance on these tasks. Code is available at https://github.com/zhaoyang-0204/gnp.
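One way to picture a gradient-norm penalty with a first-order approximation is sketched below. It is an illustration under stated assumptions, not the authors' code from the linked repository. It takes the objective to be L(θ) + λ‖∇L(θ)‖ and approximates the gradient of the norm term, which is a Hessian-vector product, with a finite difference along the normalized gradient direction. The names `penalized_gradient`, `lam`, and `r`, and the toy quadratic loss, are hypothetical.

```python
import math


def penalized_gradient(grad_fn, theta, lam, r=1e-3):
    """Approximate the gradient of L(theta) + lam * ||grad L(theta)||.

    grad_fn: function mapping a parameter list to the gradient list of L.
    lam:     penalty coefficient (assumed name).
    r:       small step used for the finite-difference approximation.
    """
    g = grad_fn(theta)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return g  # the norm term has no defined direction at a stationary point
    # Step a distance r along the normalized gradient direction.
    shifted = [t + r * x / norm for t, x in zip(theta, g)]
    g_shifted = grad_fn(shifted)
    # (g_shifted - g) / r approximates Hessian @ (g / ||g||), the gradient of ||g||.
    return [gi + lam * (gs - gi) / r for gi, gs in zip(g, g_shifted)]


def quadratic_grad(theta, curvature=(1.0, 10.0)):
    """Gradient of the toy loss 0.5 * sum(a_i * theta_i ** 2)."""
    return [a * t for a, t in zip(curvature, theta)]


step = penalized_gradient(quadratic_grad, [1.0, 1.0], lam=0.1)
```

In this sketch, choosing `lam` equal to `r` cancels the unperturbed gradient and leaves only the gradient at the shifted point, which has the same form as a sharpness-aware minimization step; that is consistent with the abstract's remark that SAM is a special case, though the exact parameterization in the paper may differ.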

preprint2022arXiv

Quantitative Performance Assessment of CNN Units via Topological Entropy Calculation

Identifying the status of individual network units is critical for understanding the mechanism of convolutional neural networks (CNNs). However, it is still challenging to reliably give a general indication of unit status, especially for units in different network models. To this end, we propose a novel method for quantitatively clarifying the status of single unit in CNN using algebraic topological tools. Unit status is indicated via the calculation of a defined topological-based entropy, called feature entropy, which measures the degree of chaos of the global spatial pattern hidden in the unit for a category. In this way, feature entropy could provide an accurate indication of status for units in different networks with diverse situations like weight-rescaling operation. Further, we show that feature entropy decreases as the layer goes deeper and shares almost simultaneous trend with loss during training. We show that by investigating the feature entropy of units on only training data, it could give discrimination between networks with different generalization ability from the view of the effectiveness of feature representations.

preprint2022arXiv

Scalar Diffraction Analysis Of Dispersion In Low-Index Thin Flat Lenses

We analyze the dispersion property of low-index thin lenses by using scalar diffraction and finite difference time domain (FDTD) methods. We compare the dispersion results obtained by using these methods with reported experimental results, and the well-known analytical formula for focal length (f) of diffractive lenses as a function of wavelength (λ), f(λ) = f_0 λ_0 / λ, where f_0 is the designed focal length for wavelength λ_0. We show that when the analytical formula is applied to thin flat lenses with low-refractive index, the results are accurate for small numerical aperture (NA) up to 0.2. For larger NA, the error between the analytical approximation and the FDTD analysis remains around 8% over a wide range of NA.
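The analytical formula quoted in this abstract, f(λ) = f_0 λ_0 / λ, can be evaluated directly. The short sketch below does so for a hypothetical lens; the design focal length of 1 mm and the design wavelength of 550 nm are assumed values chosen only for illustration and do not come from the paper.

```python
def diffractive_focal_length(wavelength, f0, wavelength0):
    """Focal length of an ideal diffractive lens: f(lambda) = f0 * lambda0 / lambda."""
    return f0 * wavelength0 / wavelength

# Hypothetical design values (not from the paper): 1 mm focal length at 550 nm.
f0 = 1.0e-3
lambda0 = 550e-9

for lam in (450e-9, 550e-9, 650e-9):
    f = diffractive_focal_length(lam, f0, lambda0)
    print(f"lambda = {lam * 1e9:.0f} nm -> f = {f * 1e3:.4f} mm")
```

The focal length shrinks as the wavelength grows. According to the abstract, this approximation holds well only for numerical apertures up to about 0.2.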

preprint2022arXiv

Spatio-temporal Gait Feature with Global Distance Alignment

Gait recognition is an important recognition technology, because gait is not easy to camouflage and does not need cooperation to recognize subjects. However, many existing methods are inadequate in preserving both temporal information and fine-grained information, which reduces their discrimination. This problem is more serious when subjects with similar walking postures are identified. In this paper, we try to enhance the discrimination of spatio-temporal gait features from two aspects: effective extraction of spatio-temporal gait features and reasonable refinement of extracted features. Our method therefore consists of Spatio-temporal Feature Extraction (SFE) and Global Distance Alignment (GDA). SFE uses Temporal Feature Fusion (TFF) and Fine-grained Feature Extraction (FFE) to effectively extract the spatio-temporal features from raw silhouettes. GDA uses a large number of unlabeled gait data in real life as a benchmark to refine the extracted spatio-temporal features. GDA can make the extracted features have low inter-class similarity and high intra-class similarity, thus enhancing their discrimination. Extensive experiments on mini-OUMVLP and CASIA-B show that our method achieves better results than some state-of-the-art methods.

preprint2022arXiv

The fractional Chern insulator with Rydberg-dressed neutral atoms

Topological nontrivial bands can be realized via Rydberg-dressed neutral atoms. We propose a two-dimensional hard-core boson model with a topological ground energy flat band on a honeycomb lattice, where the particle hopping is realized via van der Waals interaction that exchanges the Rydberg states of two interacting atoms, while nonzero phases associated with hopping are created by transferring the optical phase of laser fields to the atomic pair wave function. Using exact diagonalization and infinite density matrix renormalization group simulation, we find in the system a fractional Chern insulator phase with a Chern number C = 1/2, which can persist in the presence of weak many-body interactions. Our studies indicate that fractional Chern insulators can be studied with neutral-atom arrays.

preprint2022arXiv

Video-Guided Curriculum Learning for Spoken Video Grounding

In this paper, we introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions. Compared with using text, employing audio requires the model to directly exploit the useful phonemes and syllables related to the video from raw speech. Moreover, we randomly add environmental noises to this speech audio, further increasing the difficulty of this task and better simulating real applications. To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise. Since the model cannot obtain ground truth video segments during inference, we design a curriculum strategy that gradually shifts the input video from the ground truth to the entire video content during pre-training. Finally, the model can learn how to extract critical visual information from the entire video clip to help understand the spoken language. In addition, we collect the first large-scale spoken video grounding dataset based on ActivityNet, named the ActivityNet Speech dataset. Extensive experiments demonstrate our proposed video-guided curriculum learning can facilitate the pre-training process to obtain a mutual audio encoder, significantly promoting the performance of spoken video grounding tasks. Moreover, we prove that in the case of noisy sound, our model outperforms the method that grounds videos using ASR transcripts, further demonstrating the effectiveness of our curriculum strategy.

preprint2022arXiv

xFraud: Explainable Fraud Transaction Detection

At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly composed of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it utilizes a heterogeneous graph neural network to learn expressive representations from the informative heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful and human-understandable explanations from graphs to facilitate further processes in the business unit. In our experiments with xFraud on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud is able to outperform various baseline models in many evaluation metrics while remaining scalable in distributed settings. In addition, we show that xFraud explainer can generate reasonable explanations to significantly assist the business analysis via both quantitative and qualitative evaluations.

preprint2021arXiv

A Comprehensive Survey of 6G Wireless Communications

While fifth-generation (5G) communications are being rolled out worldwide, sixth-generation (6G) communications have attracted much attention from both the industry and the academia. Compared with 5G, 6G will have a wider frequency band, higher transmission rate, spectrum efficiency, greater connection capacity, shorter delay, broader coverage, and more robust anti-interference capability to satisfy various network requirements. This survey presents an insightful understanding of 6G wireless communications by introducing requirements, features, critical technologies, challenges, and applications. First, we give an overview of 6G from perspectives of technologies, security and privacy, and applications. Subsequently, we introduce various 6G technologies and their existing challenges in detail, e.g., artificial intelligence (AI), intelligent surfaces, THz, space-air-ground-sea integrated network, cell-free massive MIMO, etc. Because of these technologies, 6G is expected to outperform existing wireless communication systems regarding the transmission rate, latency, global coverage, etc. Next, we discuss security and privacy techniques that can be applied to protect data in 6G. Since edge devices are expected to gain popularity soon, the vast amount of generated data and frequent data exchange make data leakage easy. Finally, we predict real-world applications built on the technologies and features of 6G; for example, smart healthcare, smart city, and smart manufacturing will be implemented by taking advantage of AI.

preprint2021arXiv

FaultNet: A Deep Convolutional Neural Network for bearing fault classification

The increased presence of advanced sensors on the production floors has led to the collection of datasets that can provide significant insights into machine health. An important and reliable indicator of machine health, vibration signal data can provide us a greater understanding of different faults occurring in mechanical systems. In this work, we analyze vibration signal data of mechanical systems with bearings by combining different signal processing methods and coupling them with machine learning techniques to classify different types of bearing faults. We also highlight the importance of using different signal processing methods and analyze their effect on accuracy for bearing fault detection. Apart from the traditional machine learning algorithms, we also propose a convolutional neural network, FaultNet, which can effectively determine the type of bearing fault with a high degree of accuracy. The distinguishing factor of this work is the idea of channels proposed to extract more information from the signal: we stack the Mean and Median channels onto the raw signal to extract more useful features and classify the signals with greater accuracy.

preprint2021arXiv

Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck

Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. However, a common pitfall of VAEs in sequential text generation is that the model tends to ignore latent variables when paired with a strong auto-regressive decoder. In this paper, we propose a principled approach to alleviate this issue by applying a discretized bottleneck to enforce an implicit latent feature matching in a more compact latent space. We impose a shared discrete latent space where each input is learned to choose a combination of latent atoms as a regularized latent representation. Our model shows a promising capability to model underlying semantics of discrete sequences and thus provides more interpretable latent structures. Empirically, we demonstrate our model's efficiency and effectiveness on a broad range of tasks, including language modeling, unaligned text style transfer, dialog response generation, and neural machine translation.

preprint2021arXiv

Interaction between optical pulse and tumor using finite element analysis

Photoacoustic imaging is an emerging technology based on the photoacoustic effect that has developed rapidly in recent years. It combines the high contrast of optical imaging and the high penetration and high resolution of acoustic imaging. As a non-destructive biological tissue imaging technology, photoacoustic imaging has important application value in the field of biomedicine. With its high efficiency bioimaging capabilities and excellent biosafety performance, it has been favored by researchers. The visualization of photoacoustic imaging has great research significance in the early diagnosis of some diseases, especially tumors. In photoacoustic imaging, light transmission and thermal effects are important processes. This article is based on COMSOL software and uses finite element analysis to construct a physical model for simulation. Through laser pulses into the stomach tissue containing tumor, the physical process of light transmission and biological heat transfer was studied, and a photothermal model composed of two physical fields was built, and finally a series of visualization graphics were obtained. This work has certain theoretical guiding significance for further promoting the application of photoacoustic imaging in the field of biomedicine.

preprint2021arXiv

Performance of a focal plane detector for soft X-ray imaging spectroscopy based on back-illuminated sCMOS

Spectroscopy focusing array (SFA) and Polarimetry focusing array (PFA) are the two major payloads of enhanced X-ray Timing and Polarimetry mission (eXTP). Nested Wolter-I X-ray mirror module is implemented in SFA and PFA to achieve high effective area. When evaluating the properties of the mirror module, the alignment of the optical axis of the X-ray mirror module and a quasi-parallel X-ray beam is a prerequisite to ensure the accuracy of the results. Hence, to assist the alignment of the X-ray mirror module, an X-ray focal plane detector is designed based on the back-illuminated scientific Complementary Metal-Oxide-Semiconductor Transistor (sCMOS) sensor GSENSE6060BSI, which has one of the largest detection areas and is produced by Gpixel Inc. Then the characteristics of readout noise, dark current, and split-pixel event properties of the detector are studied with the self-developed multi-target fluorescence X-ray source in a 100 m long X-ray test facility. The energy calibration is carried out with the single-pixel event and the energy non-linearity of the detector is also obtained. Eventually, the simulation of the eXTP mirror module based on the optical model is conducted and the alignment test of the Wolter-I X-ray mirror module designed for EP/FXT (Einstein Probe/Follow-up X-ray Telescope) with "Burkert test" method is shown.

preprint2021arXiv

Photon-assisted Landau-Zener transitions in a periodically driven Rabi dimer coupled to a dissipative mode

We investigate multiple photon-assisted Landau-Zener (LZ) transitions in a hybrid circuit quantum electrodynamics device in which each of two interacting transmission-line resonators is coupled to a qubit, and the qubits are driven by periodic driving fields and also coupled to a common phonon mode. The quantum state of the entire composite system is modeled using the multi-$\rm D_2$ Ansatz in combination with the time-dependent Dirac-Frenkel variational principle. Applying a sinusoidal driving field to one of the qubits, this device is an ideal platform to study the photon-assisted LZ transitions by comparing the dynamics of the two qubits. A series of interfering photon-assisted LZ transitions take place if the photon frequency is much smaller than the driving amplitude. Once the two energy scales are comparable, independent LZ transitions arise and a transition pathway is revealed using an energy diagram. It is found that both adiabatic and nonadiabatic transitions are involved in the dynamics. Used to model environmental effects on the LZ transitions, the common phonon mode coupled to the qubits allows for more available states to facilitate the LZ transitions. An analytical formula is obtained to estimate the short-time phonon population and produces results in reasonable agreement with numerical calculations. Equipped with the knowledge of the photon-assisted LZ transitions in the system, we can precisely manipulate the qubit state and successfully generate the qubit dynamics with a square-wave pattern by applying driving fields to both qubits, opening up new avenues to manipulate the states of qubits and photons in quantum information devices and quantum computers.

preprint2021arXiv

Photovoltaic Self-Powered Gas Sensing: A Review

The self-powered sensing system could harness ambient energy to power the sensor without the need for external electrical energy. Recently, the concept of photovoltaic (PV) self-powered gas sensing has attracted wider attention due to room-temperature operation, low power consumption, small size and potential applications. The PV self-powered gas sensors integrate the photovoltaic effects and the gas sensing function into a single chip, which could truly achieve the goal of zero power consumption for an independent gas sensing device. As an emerging concept, the PV self-powered gas sensing has been achieved by using different strategies, including integrated gas sensor and solar cell, integrated light filter and solar cell, gas-sensitive heterojunction photovoltaics, and gas-sensitive lateral photovoltaics. The purpose of this review is to summarize recent advances of PV self-powered gas sensing and also remark on the directions for future research in this topic.

preprint2021arXiv

Privacy-Preserving Blockchain-Based Federated Learning for IoT Devices

Home appliance manufacturers strive to obtain feedback from users to improve their products and services to build a smart home system. To help manufacturers develop a smart home system, we design a federated learning (FL) system leveraging the reputation mechanism to assist home appliance manufacturers to train a machine learning model based on customers' data. Then, manufacturers can predict customers' requirements and consumption behaviors in the future. The working flow of the system includes two stages: in the first stage, customers train the initial model provided by the manufacturer using both the mobile phone and the mobile edge computing (MEC) server. Customers collect data from various home appliances using phones, and then they download and train the initial model with their local data. After deriving local models, customers sign on their models and send them to the blockchain. In case customers or manufacturers are malicious, we use the blockchain to replace the centralized aggregator in the traditional FL system. Since records on the blockchain are untampered, malicious customers or manufacturers' activities are traceable. In the second stage, manufacturers select customers or organizations as miners for calculating the averaged model using received models from customers. By the end of the crowdsourcing task, one of the miners, who is selected as the temporary leader, uploads the model to the blockchain. To protect customers' privacy and improve the test accuracy, we enforce differential privacy on the extracted features and propose a new normalization technique. We experimentally demonstrate that our normalization technique outperforms batch normalization when features are under differential privacy protection. In addition, to attract more customers to participate in the crowdsourcing FL task, we design an incentive mechanism to award participants.

preprint2021arXiv

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of natural languages. In this paper, we propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation. Unlike most existing sentence-level augmentation strategies, which are only applied to specific models, our method is more general and could be easily adapted to any MLE-based training procedure. In addition, our framework allows task-specific evaluation metrics to be designed to flexibly control the generated sentences, for example, in terms of controlling vocabulary usage and avoiding nontrivial repetitions. Extensive experimental results demonstrate the superiority of our method on two synthetic and several standard real datasets, significantly improving related baselines.

preprint2020arXiv

3D Face Reconstruction from A Single Image Assisted by 2D Face Images in the Wild

3D face reconstruction from a single 2D image is a challenging problem with broad applications. Recent methods typically aim to learn a CNN-based 3D face model that regresses coefficients of 3D Morphable Model (3DMM) from 2D images to render 3D face reconstruction or dense face alignment. However, the shortage of training data with 3D annotations considerably limits performance of those methods. To alleviate this issue, we propose a novel 2D-assisted self-supervised learning (2DASL) method that can effectively use "in-the-wild" 2D face images with noisy landmark information to substantially improve 3D face model learning. Specifically, taking the sparse 2D facial landmarks as additional information, 2DASL introduces four novel self-supervision schemes that view the 2D landmark and 3D landmark prediction as a self-mapping process, including the 2D and 3D landmark self-prediction consistency, cycle-consistency over the 2D landmark prediction and self-critic over the predicted 3DMM coefficients based on landmark predictions. Using these four self-supervision schemes, the 2DASL method significantly relieves demands on the conventional paired 2D-to-3D annotations and gives much higher-quality 3D face models without requiring any additional 3D annotations. Experiments on multiple challenging datasets show that our method outperforms state-of-the-arts for both 3D face reconstruction and dense face alignment by a large margin.

preprint2020arXiv

A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision

The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns. To reduce the dominant data movement cost of training, process in-memory (PIM) has emerged as a promising solution as it alleviates the need to access DNN weights. However, state-of-the-art PIM DNN training accelerators employ either analog/mixed signal computing which has limited precision or digital computing based on a memory technology that supports limited logic functions and thus requires complicated procedure to realize floating point computation. In this paper, we propose a spin orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating point precision. Specifically, this new accelerator features an innovative (1) SOT-MRAM cell, (2) full addition design, and (3) floating point computation. Experiment results show that the proposed SOT-MRAM PIM based DNN training accelerator can achieve 3.3$\times$, 1.8$\times$, and 2.5$\times$ improvement in terms of energy, latency, and area, respectively, compared with a state-of-the-art PIM based DNN training accelerator.

preprint2020arXiv

Attacks to Federated Learning: Responsive Web User Interface to Recover Training Data from User Gradients

Local differential privacy (LDP) is an emerging privacy standard to protect individual user data. One scenario where LDP can be applied is federated learning, where each user sends in his/her user gradients to an aggregator who uses these gradients to perform stochastic gradient descent. In a case where the aggregator is untrusted and LDP is not applied to each user gradient, the aggregator can recover sensitive user data from these gradients. In this paper, we present a new interactive web demo showcasing the power of local differential privacy by visualizing federated learning with local differential privacy. Moreover, the live demo shows how LDP can prevent untrusted aggregators from recovering sensitive training data. A measure called the exp-hamming recovery is also created to show the extent of how much data the aggregator can recover.

preprint2020arXiv

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a growing demand for DNN chips. However, designing DNN chips is non-trivial because: (1) mainstream DNNs have millions of parameters and operations; (2) the large design space due to the numerous design choices of dataflows, processing elements, memory hierarchy, etc.; and (3) an algorithm/hardware co-design is needed to allow the same DNN functionality to have a different decomposition, which would require different hardware IPs to meet the application specifications. Therefore, DNN chips take a long time to design and require cross-disciplinary experts. To enable fast and effective DNN chip design, we propose AutoDNNchip - a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementation given DNNs from machine learning frameworks (e.g., PyTorch) for a designated application and dataset. Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.), optimize chip design via the Chip Predictor, and then generate optimized synthesizable RTL to achieve the target design metrics. Experimental results show that our Chip Predictor's predicted performance differs from real-measured ones by < 10% when validated using 15 DNN models and 4 platforms (edge-FPGA/TPU/GPU and ASIC). Furthermore, accelerators generated by our AutoDNNchip can achieve better (up to 3.86X improvement) performance than that of expert-crafted state-of-the-art accelerators.

preprint2020arXiv

Blockchain-Based Differential Privacy Cost Management System

Privacy preservation is a big concern for various sectors. To protect individual user data, one emerging technology is differential privacy. However, it still has limitations for datasets with frequent queries, such as the fast accumulation of privacy cost. To tackle this limitation, this paper explores the integration of a secured decentralised ledger, blockchain. Blockchain will be able to keep track of all noisy responses generated with differential privacy algorithm and allow for certain queries to reuse old responses. In this paper, a demo of a proposed blockchain-based privacy management system is designed as an interactive decentralised web application (DApp). The demo created illustrates that leveraging on blockchain will allow the total privacy cost accumulated to decrease significantly.
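The response-reuse idea described in this abstract can be sketched with an in-memory ledger standing in for the blockchain. Everything here is an assumption made for illustration: the class name `PrivacyLedger`, the rule that only an identical `query_id` is reused, and the choice of the Laplace mechanism. The paper's actual reuse policy and DApp design may differ.

```python
import random

class PrivacyLedger:
    """Hypothetical stand-in for a blockchain ledger of noisy query responses."""

    def __init__(self, seed=0):
        self._responses = {}      # query_id -> previously released noisy answer
        self.total_epsilon = 0.0  # privacy cost accumulated so far
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        rate = 1.0 / scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def query(self, query_id, true_value, sensitivity, epsilon):
        # Reuse a recorded response when one exists: no new privacy cost is paid.
        if query_id in self._responses:
            return self._responses[query_id]
        # Otherwise release a fresh Laplace-noised answer and record its cost.
        noisy = true_value + self._laplace(sensitivity / epsilon)
        self._responses[query_id] = noisy
        self.total_epsilon += epsilon
        return noisy

ledger = PrivacyLedger()
first = ledger.query("daily_count", true_value=100, sensitivity=1.0, epsilon=0.5)
second = ledger.query("daily_count", true_value=100, sensitivity=1.0, epsilon=0.5)
print(first == second, ledger.total_epsilon)
```

Without reuse, the two queries would cost a total ε of 1.0 under sequential composition; with the ledger, the repeated query returns the stored answer and the total stays at 0.5.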

preprint2020arXiv

Correlating magnetic structure and magnetotransport in semimetal thin films of Eu$_{1-x}$Sm$_x$TiO$_3$

We report on the evolution of the average and depth-dependent magnetic order in thin film samples of biaxially stressed and electron-doped EuTiO$_3$ for samples across a doping range $<$0.1 to 7.8 $\times 10^{20}$ cm$^{-3}$. Under an applied in-plane magnetic field, the G-type antiferromagnetic ground state undergoes a continuous spin-flop phase transition into in-plane, field-polarized ferromagnetism. The critical field for ferromagnetism slightly decreases with an increasing number of free carriers, yet the field evolution of the spin-flop transition is qualitatively similar across the doping range. Unexpectedly, we observe interfacial ferromagnetism with saturated Eu$^{2+}$ moments at the substrate interface at low fields preceding ferromagnetic saturation throughout the bulk of the degenerate semiconductor film. We discuss the implications of these findings for the unusual magnetotransport properties of this compound.

preprint2020arXiv

Deep High-Resolution Representation Learning for Visual Recognition

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at https://github.com/HRNet.

preprint2020arXiv

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem. In this work, we first characterize this phenomenon from the information-theoretic perspective and show that under certain conditions, the mutual information between the output after $l$ layers and the input of GCN converges to 0 exponentially with respect to $l$. We also show that, on the other hand, graph decomposition can potentially weaken the condition of such convergence rate, which enabled our analysis for GraphCNN. While different graph structures can only benefit from the corresponding decomposition, in practice, we propose an automatic connectivity-aware graph decomposition algorithm, DeGNN, to improve the performance of general graph neural networks. Extensive experiments on widely adopted benchmark datasets demonstrate that DeGNN can not only significantly boost the performance of corresponding GNNs, but also achieves the state-of-the-art performances.

preprint2020arXiv

Dual-discriminator GAN: A GAN way of profile face recognition

A wealth of angle problems occur when facial recognition is performed: At present, the feature extraction network presents eigenvectors with large differences between the frontal face and profile face recognition of the same person in many cases. For this reason, the state-of-the-art facial recognition network will use multiple samples for the same target to ensure that eigenvector differences caused by angles are ignored during training. However, there is another solution available, which is to generate frontal face images with profile face images before recognition. In this paper, we proposed a method of generating frontal faces with image-to-image profile faces based on Generative Adversarial Network (GAN).

preprint2020arXiv

Feature Quantization Improves GAN Training

The instability in GAN training has been a long-standing problem despite remarkable research efforts. We identify that instability issues stem from difficulties of performing feature matching with mini-batch statistics, due to a fragile balance between the fixed target distribution and the progressively generated distribution. In this work, we propose Feature Quantization (FQ) for the discriminator, to embed both true and fake data samples into a shared discrete space. The quantized values of FQ are constructed as an evolving dictionary, which is consistent with feature statistics of the recent distribution history. Hence, FQ implicitly enables robust feature matching in a compact space. Our method can be easily plugged into existing GAN models, with little computational overhead in training. We apply FQ to 3 representative GAN models on 9 benchmarks: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation. Extensive experimental results show that the proposed FQ-GAN can improve the FID scores of baseline methods by a large margin on a variety of tasks, achieving new state-of-the-art performance.

preprint2020arXiv

IRS-Assisted Millimeter Wave Communications: Joint Power Allocation and Beamforming Design

Intelligent reflecting surface (IRS) technology offers more feasible propagation paths for millimeter-wave (mmWave) communication systems to overcome blockage than existing technologies. In this paper, we consider a downlink wireless system with the IRS and formulate a joint power allocation and beamforming design problem to maximize the weighted sum-rate, which is a multi-variable optimization problem. To solve the problem, we propose a novel alternating manifold optimization based beamforming algorithm. Simulation results show that our proposed optimization algorithm outperforms existing algorithms significantly in improving the weighted sum-rate of the wireless communication system.

preprint2020arXiv

Magnetic Order and Competition With Superconductivity in (Er-Ho)Ni$_{2}$B$_{2}$C

The rare earth magnetic order in pure and doped Er$_{(1-x)}$Ho$_{x}$Ni$_2$B$_2$C ($x = 0, 0.25, 0.50, 0.75, 1$) single crystal samples was investigated using magnetization and neutron diffraction measurements. The superconducting quaternary borocarbides $R$Ni$_2$B$_2$C, where $R$ = Er, Ho, are both magnetic intermetallic superconductors with transition temperatures of $\sim$10 K. These compounds also develop magnetic order in the vicinity of this temperature. Depending on the rare earth composition, the coupling between superconductivity and magnetism creates several phases, ranging from a reentrant superconductor with a mixture of commensurate and incommensurate antiferromagnetism to a total incommensurate antiferromagnetic spin modulation with a weak ferromagnetic state. All of these phases coexist with superconductivity. RKKY magnetic interactions are used to describe the magnetic orders in the pure compounds. However, the doping of Ho on Er (or Er on Ho) sites, which have two strong magnetic moments with two different easy directions, creates new and complicated magnetic modulations with possible local disorder effects. One fascinating effect is the development of an induced magnetic state resembling the pure and doped $R_2$CuO$_4$, $R$ = Nd and Pr.

preprint2020arXiv

Materializing Rival Ground States in the Barlowite Family of Kagome Magnets: Quantum Spin Liquid, Spin Ordered, and Valence Bond Crystal States

The spin-$\frac{1}{2}$ kagome antiferromagnet is considered an ideal host for a quantum spin liquid ground state. We find that when the bonds of the kagome lattice are modulated with a periodic pattern, new quantum ground states emerge. Newly synthesized crystalline barlowite (Cu$_4$(OH)$_6$FBr) and Zn-substituted barlowite demonstrate the delicate interplay between singlet states and spin order on the spin-$\frac{1}{2}$ kagome lattice. Comprehensive structural measurements demonstrate that our new variant of barlowite maintains hexagonal symmetry at low temperatures with an arrangement of distorted and undistorted kagome triangles, for which numerical simulations predict a pinwheel valence bond crystal (VBC) state instead of a quantum spin liquid (QSL). The presence of interlayer spins eventually leads to an interesting pinwheel $q=0$ magnetic order. Partially Zn-substituted barlowite (Cu$_{3.44}$Zn$_{0.56}$(OH)$_6$FBr) has an ideal kagome lattice and shows QSL behavior, indicating a surprising robustness of the QSL against interlayer impurities. The magnetic susceptibility is similar to that of herbertsmithite, even though the Cu$^{2+}$ impurities are above the percolation threshold for the interlayer lattice and they couple more strongly to the nearest kagome moment. This system is a unique playground displaying QSL, VBC, and spin order, furthering our understanding of these highly competitive quantum states.

preprint2020arXiv

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of 2. To the best of our knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas (sparsification or pruning, decomposition, and quantization) into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7$\times$ and the speedup by up to 19.2$\times$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.
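The weight structure described in this abstract (a large sparse coefficient matrix with power-of-2 non-zeros multiplied by a small basis matrix) can be illustrated with a toy sketch. It shows only the storage format, not the SmartExchange training algorithm: the helper names, the `threshold` used for sparsification, the log-domain rounding rule, and the example matrices are all assumptions made for illustration.

```python
import math


def to_power_of_two(x, threshold=0.05):
    """Map x to a signed power of 2 (rounding the exponent), or to 0 if |x| < threshold."""
    if abs(x) < threshold:
        return 0.0  # assumed sparsification rule: small coefficients are pruned
    exponent = round(math.log2(abs(x)))
    return math.copysign(2.0 ** exponent, x)


def matmul(a, b):
    """Plain nested-list matrix product, to keep the sketch dependency-free."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical dense coefficients (4 x 2) and a small basis matrix (2 x 2).
coeff = [[0.9, 0.02], [-0.26, 0.0], [0.01, 0.55], [0.0, -1.9]]
basis = [[1.0, 0.5], [-0.25, 2.0]]

# Sparse, power-of-2 coefficients: each non-zero could be stored as a sign plus an exponent.
coeff_q = [[to_power_of_two(c) for c in row] for row in coeff]

# The layer weights are rebuilt on the fly; multiplying by a power of 2 is a bit shift in hardware.
weights = matmul(coeff_q, basis)
print(coeff_q)
print(weights)
```

Only the small basis matrix and the sparse exponent pattern need to be stored, which is the trade of extra computation for less memory traffic that the abstract describes.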

preprint2020arXiv

Structure-Aware Human-Action Generation

Generating long-range skeleton-based human actions has been a challenging problem since small deviations in one frame can cause a malformed action sequence. Most existing methods borrow ideas from video generation, naively treating skeleton nodes/joints as pixels of images without considering the rich inter-frame and intra-frame structure information, which can lead to distorted actions. Graph convolutional networks (GCNs) are a promising way to leverage structure information to learn structure representations. However, directly adopting GCNs to tackle such continuous action sequences in both spatial and temporal spaces is challenging, as the action graph could be huge. To overcome this issue, we propose a variant of GCNs that leverages the powerful self-attention mechanism to adaptively sparsify a complete action graph in the temporal space. Our method can dynamically attend to important past frames and construct a sparse graph to apply in the GCN framework, capturing the structure information in action sequences well. Extensive experimental results demonstrate the superiority of our method on two standard human action datasets compared with existing methods.

preprint2020arXiv

The Algebraic Expressions of Huygens Principle and Holographic Principle of Light

Huygens principle (HP) is the cornerstone of wave optics; its mathematical model is a boundary value problem of the wave equation. The solutions of this mathematical model should be independent of the partial derivative $u_n$ and take the form of a retarded potential. Among the formulas in use, only the Rayleigh-Sommerfeld diffraction formula (RSDF) satisfies these two restrictions. Unfortunately, the HP requires a spherical boundary, while the boundary of the RSDF is an infinite plane. In addition, we find that the geometric constructions of the HP and the holographic principle of light (HPL) are complementary. Here we derive the complete expressions of the HP and HPL with a spherical boundary, based on the method of images. Furthermore, the HP, HPL and RSDF are combined into one new principle: if the boundary of a vacuum region is a spherical surface or an infinite plane, all the light in this vacuum region is determined by the light on the boundary.

preprint2020arXiv

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost. Specifically, R$^2$PIM accelerators enhance energy efficiency by eliminating the cost of weight movements and improving the computational density through ReRAM's high density. However, the energy efficiency is still limited by the dominant energy cost of input and partial sum (Psum) movements and the cost of digital-to-analog (D/A) and analog-to-digital (A/D) interfaces. In this work, we identify three energy-saving opportunities in R$^2$PIM accelerators: analog data locality, time-domain interfacing, and input access reduction, and propose an innovative R$^2$PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance the data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D) conversion and the total number of conversions by using time-domain interfaces (TDIs) and the employed ALBs, respectively; (3) we develop an only-once input read (O$^2$IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions. The evaluation with more than 10 CNN/DNN models and various chip configurations shows that TIMELY outperforms the baseline R$^2$PIM accelerator, PRIME, by one order of magnitude in energy efficiency while maintaining better computational density (up to 31.2$\times$) and throughput (up to 736.6$\times$). Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O$^2$IR innovations in terms of energy savings and area reduction.

preprint2020arXiv

Topological Singularity Induced Chiral Kohn Anomaly in a Weyl Semimetal

The electron-phonon interaction (EPI) is instrumental in a wide variety of phenomena in solid-state physics, such as electrical resistivity in metals, carrier mobility, optical transition and polaron effects in semiconductors, lifetime of hot carriers, transition temperature in BCS superconductors, and even spin relaxation in diamond nitrogen-vacancy centers for quantum information processing. However, due to the weak EPI strength, most studies have focused on electronic properties rather than on phonon properties. One prominent exception is the Kohn anomaly, where phonon softening can emerge when the phonon wavevector nests the Fermi surface of metals. Here we report a new class of Kohn anomaly in a topological Weyl semimetal (WSM), predicted by field-theoretical calculations and experimentally observed through inelastic x-ray and neutron scattering on WSM tantalum phosphide (TaP). Compared to the conventional Kohn anomaly, the Fermi surface in a WSM exhibits multiple topological singularities of Weyl nodes, leading to a distinct nesting condition with chiral selection, a power-law divergence, and non-negligible dynamical effects. Our work brings the concept of Kohn anomaly into WSMs and sheds light on elucidating the EPI mechanism in emergent topological materials.

preprint2020arXiv

Variational approach to time-dependent fluorescence of a driven qubit

We employ the Dirac-Frenkel variational principle and multiple Davydov ansatz to study time-dependent fluorescence spectra of a driven qubit in the weak- to strong qubit-reservoir coupling regimes, where both the Rabi frequency and spontaneous decay rate are comparable to the transition frequency of the qubit. Our method agrees well with the time-local master-equation approach in the weak-coupling regime, and offers a flexible way to compute the spectra from the bosonic dynamics instead of two-time correlation functions. While the perturbative master equation breaks down in the strong-coupling regime, our method actually becomes more accurate due to the use of bosonic coherent states under certain conditions. We show that the counter-rotating coupling between the qubit and the reservoir has considerable contributions to the photon number dynamics and the spectra under strong driving conditions even though the coupling is moderately weak. The time-dependent spectra are found to be generally asymmetric, a feature that is derived from photon number dynamics. In addition, it is shown that the spectral profiles can be dramatically different from the Mollow triplet due to strong dissipation and/or multiphoton processes associated with the strong driving. Our formalism provides a unique perspective to interpret time-dependent spectra.

preprint2020arXiv

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). Given an untrimmed video and a declarative/interrogative sentence depicting an object, STVG aims to localize the spatio-temporal tube of the queried object. STVG has two challenging settings: (1) We need to localize spatio-temporal object tubes from untrimmed videos, where the object may only exist in a very small segment of the video; (2) We deal with multi-form sentences, including declarative sentences with explicit objects and interrogative sentences with unknown objects. Existing methods cannot tackle the STVG task due to ineffective tube pre-generation and the lack of object relationship modeling. We therefore propose a novel Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build a spatio-temporal region graph to capture the region relationships with temporal object dynamics, which involves the implicit and explicit spatial subgraphs in each frame and the temporal dynamic subgraph across frames. We then incorporate textual clues into the graph and develop multi-step cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer with a dynamic selection method to directly retrieve the spatio-temporal tubes without tube pre-generation. Moreover, we contribute a large-scale video grounding dataset, VidSTG, based on the video relation dataset VidOR. Extensive experiments demonstrate the effectiveness of our method.

preprint2019arXiv

Signatures of many-body localization and metastability by weak perturbation

Nonequilibrium dynamics in isolated quantum many-body systems displays a number of intriguing features, such as many-body localization (MBL) and prethermalization. Here we investigate a simple ladder system with disorder, in which various distinct dynamical features coexist and interplay. By exact diagonalization, we demonstrate that the system exhibits the signatures of an MBL-ergodic-MBL reentrant transition, metastability, and disorder-free MBL. We give an account of these properties by introducing a quasi-particle picture and interpreting the quasi-vacuum energy fluctuation as an effective disorder on the quasi-particle dynamics. It is speculated that the weak perturbation behavior is a finite-size effect, but its relaxation time scale increases with the system size.

preprint2013arXiv

A Novel Carrier Waveform Inter-Displacement Modulation Method in Underwater Communication Channel

As the main means of underwater wireless communication, underwater acoustic communication is one of the focuses of ocean research. Compared with the free-space wireless communication channel, the underwater acoustic channel suffers from a more severe multipath effect, less available bandwidth, and more complex noise. The underwater acoustic channel is one of the most complicated wireless communication channels. To achieve reliable underwater acoustic communication, Phase Shift Keying (PSK) modulation combined with Passive Time Reversal Mirror (PTRM) equalization is considered a suitable scheme. However, due to the serious distortion of the received signal caused by the channel, this scheme suffers from a high Bit Error Rate (BER) at low Signal to Noise Ratio (SNR). To solve this problem, we propose a Carrier Waveform Inter-Displacement (CWID) modulation method based on the Linear Frequency Modulation (LFM) PSK and PTRM scheme. The new communication scheme reduces the BER by increasing the difference between the carrier waveforms of different symbols. Simulation results show the effectiveness and superiority of the proposed method.