Source author record

Shuai Wang

Shuai Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

86 works

36 topics

4 close collaborators

preprint | 2026 | arXiv

1/2 order convergence rate of Euler-type methods for time-changed stochastic differential equations with super-linearly growing drift and diffusion coefficients

This paper investigates the strong convergence properties of two Euler-type methods for a class of time-changed stochastic differential equations (TCSDEs) with super-linearly growing drift and diffusion coefficients. Building upon existing research, we propose a backward Euler method (BEM) and introduce its explicit counterpart -- the projected Euler method (PEM). We prove that both methods converge strongly in the $L_2$-sense at the optimal rate of 1/2. This result extends the applicability of both the BEM and the PEM to a broader class of TCSDEs. Moreover, the two methods offer complementary strengths: while BEM possesses wide applicability, PEM is computationally more efficient. Numerical simulations confirm our theoretical findings and illustrate practical performance of both schemes.
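For orientation, the following is a minimal sketch of a projected Euler scheme for an ordinary (not time-changed) scalar SDE dX = f(X) dt + g(X) dW. The projection radius h^(-alpha), the value alpha = 0.25, and the coefficients f(x) = x - x^3 and g(x) = x are assumptions chosen purely for illustration. The scheme studied in the paper also involves the random time change, which is not modeled here.

```python
import math
import random


def project(x: float, h: float, alpha: float = 0.25) -> float:
    """Clamp x to [-h**(-alpha), h**(-alpha)] (ball projection in one dimension)."""
    radius = h ** (-alpha)
    return max(-radius, min(radius, x))


def projected_euler_path(x0, f, g, h, n_steps, rng, alpha=0.25):
    """Project the current state first, then take one explicit Euler step from it."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        xp = project(x, h, alpha)
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over a step of size h
        x = xp + h * f(xp) + g(xp) * dw
        path.append(x)
    return path


# Illustrative (assumed) coefficients with a super-linearly growing drift.
def drift(x):
    return x - x ** 3


def diffusion(x):
    return x


path = projected_euler_path(5.0, drift, diffusion, h=0.01, n_steps=200,
                            rng=random.Random(0))
```

Because every step starts from a projected state, the cubic drift is only ever evaluated inside a bounded region, which is what keeps an explicit step stable in this toy setting.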

preprint | 2026 | arXiv

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often generate low-quality or incorrect answers and draw limited information richness from available data sources. To address this, we propose AgenticMath, a novel agentic method for generating high-quality mathematical question-answer pairs to enhance the supervised fine-tuning of LLMs. Our method operates through four stages: (1) a Seed Question Filter that selects questions with high information richness, complexity, and clarity; (2) an Agentic Question Rephrase step that employs a multi-agent system to generate diverse, logically consistent paraphrases; (3) an Answer Augment step that rewrites answers using chain-of-thought reasoning to enhance numerical and logical correctness, without reliance on human-provided labels; and (4) a final Question and Answer Evaluation that retains only the highest-quality pairs. Extensive experiments demonstrate that fine-tuning 3B-8B parameter LLMs on AgenticMath-generated datasets (comprising only 30-60K math samples) achieves competitive or superior performance on diverse in-domain and out-of-domain mathematical reasoning benchmarks compared to baselines trained on much more data (e.g., 400K or 2.3M samples). Our work demonstrates that targeted, high-quality data generation is a more efficient path to improving mathematical reasoning in LLMs than large-scale, low-quality alternatives.

preprint | 2026 | arXiv

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.

preprint | 2026 | arXiv

Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning

Recently, differentiable causal discovery has emerged as a promising approach to improve the accuracy and efficiency of existing methods. However, when applied to high-dimensional data or data with latent confounders, these methods, often based on off-the-shelf continuous optimization algorithms, struggle with the vast search space, the complexity of the objective function, and the nontrivial nature of graph-theoretical constraints. As a result, there has been a surge of interest in leveraging super-structures to guide the optimization process. Nonetheless, learning an appropriate super-structure at the right level of granularity, and doing so efficiently across various settings, presents significant challenges. In this paper, we propose ALVGL, a novel and general enhancement to the differentiable causal discovery pipeline. ALVGL employs a sparse and low-rank decomposition to learn the precision matrix of the data. We design an ADMM procedure to optimize this decomposition, identifying components in the precision matrix that are most relevant to the underlying causal structure. These components are then combined to construct a super-structure that is provably a superset of the true causal graph. This super-structure is used to initialize a standard differentiable causal discovery method with a more focused search space, thereby improving both optimization efficiency and accuracy. We demonstrate the versatility of ALVGL by instantiating it across a range of structural causal models, including both Gaussian and non-Gaussian settings, with and without unmeasured confounders. Extensive experiments on synthetic and real-world datasets show that ALVGL not only achieves state-of-the-art accuracy but also significantly improves optimization efficiency, making it a reliable and effective solution for differentiable causal discovery.

preprint | 2026 | arXiv

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward framework for pose-free off-road reconstruction. The key idea is to resolve temporal conflicts through spatially localized conditioning. Specifically, we introduce voxel-grounded temporal Gaussian aggregation, which partitions the canonical Gaussian space into spatial voxels and performs query-conditioned temporal attention within each voxel. Intra-voxel softmax normalization ensures that temporal selectivity and spatial occupancy become mutually reinforcing rather than conflicting. We furthermore introduce surface normal cues as auxiliary geometric guidance to regularize the geometry of Gaussian primitives. Extensive experiments on ORAD-3D and RELLIS-3D demonstrate that Ground4D consistently outperforms existing feedforward methods in reconstruction quality and generalizes zero-shot to unseen off-road domains. Project page and code: https://github.com/wsnbws/Ground4D.
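The intra-voxel softmax idea can be pictured with a small sketch. It assumes that each Gaussian carries a 3D position and a scalar attention logit, that voxels are axis-aligned cubes of a fixed size, and that normalization happens only among Gaussians falling into the same voxel. The function names and the toy inputs are hypothetical and not taken from the Ground4D code.

```python
import math
from collections import defaultdict


def voxel_index(position, voxel_size):
    """Map a 3D position to the integer index of the cube that contains it."""
    return tuple(math.floor(c / voxel_size) for c in position)


def intra_voxel_softmax(positions, logits, voxel_size):
    """Softmax-normalize logits separately inside each occupied voxel."""
    groups = defaultdict(list)
    for i, pos in enumerate(positions):
        groups[voxel_index(pos, voxel_size)].append(i)

    weights = [0.0] * len(logits)
    for members in groups.values():
        peak = max(logits[i] for i in members)  # subtract the max for numerical stability
        exps = {i: math.exp(logits[i] - peak) for i in members}
        total = sum(exps.values())
        for i in members:
            weights[i] = exps[i] / total
    return weights


# Two observations share voxel (0, 0, 0); the third sits alone in voxel (1, 0, 0).
positions = [(0.1, 0.2, 0.0), (0.4, 0.3, 0.1), (1.6, 0.2, 0.0)]
logits = [2.0, 0.0, -1.0]
weights = intra_voxel_softmax(positions, logits, voxel_size=1.0)
```

In this toy setup a low-scoring observation that is alone in its voxel still receives full weight, while observations sharing a voxel compete only with each other.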

preprint | 2026 | arXiv

Indoor Fluid Antenna Systems Enabled by Layout-Specific Modeling and Group Relative Policy Optimization

Fluid antenna system (FAS) revolutionizes wireless communications by using position-flexible antennas that dynamically optimize channel conditions and mitigate multipath fading. This innovation is particularly valuable in indoor environments, in which signal propagation is severely degraded due to structural obstructions and complex multipath reflections. In this paper, we investigate the channel modeling and the joint optimization of antenna positioning, beamforming, and power allocation for indoor FAS. In particular, we propose a layout-specific channel model, and employ the novel group relative policy optimization (GRPO) algorithm for tackling the optimization problem. Compared to the state-of-the-art Sionna model, our model achieves an 83.3% reduction in computation time with an approximately 3 dB increase in root-mean-square error (RMSE). When simplified to a two-ray model, our model allows for a closed-form antenna position solution with near-optimal performance. For the joint optimization problem, our GRPO algorithm outperforms proximal policy optimization (PPO) and other baselines in sum-rate, while requiring only 50.8% of the computational resources of PPO, thanks to its group advantage estimation. Simulation results show that increasing either the group size or trajectory length in GRPO does not yield significant improvements in sum-rate, suggesting that these parameters can be selected conservatively without sacrificing performance.
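The group advantage estimation mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used GRPO normalization, in which each sampled trajectory's reward is standardized against the mean and standard deviation of its own group, so no learned value network is needed. The function name, the epsilon guard, and the sample rewards are illustrative, and the paper's exact formulation may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the statistics of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical sum-rate rewards for a group of four sampled trajectories.
advantages = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
```

Trajectories that score above the group mean get a positive advantage and those below get a negative one. A group with identical rewards yields all-zero advantages because of the epsilon guard.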

preprint | 2026 | arXiv

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Traditional workflow-based agents exhibit limited intelligence when addressing real-world problems requiring tool invocation. Tool-integrated reasoning (TIR) agents capable of autonomous reasoning and tool invocation are rapidly emerging as a powerful approach for complex decision-making tasks involving multi-step interactions with external environments. In this work, we introduce MindWatcher, a TIR agent integrating interleaved thinking and multimodal chain-of-thought (CoT) reasoning. MindWatcher can autonomously decide whether and how to invoke diverse tools and coordinate their use, without relying on human prompts or workflows. The interleaved thinking paradigm enables the model to switch between thinking and tool calling at any intermediate stage, while its multimodal CoT capability allows manipulation of images during reasoning to yield more precise search results. We implement automated data auditing and evaluation pipelines, complemented by manually curated high-quality datasets for training, and we construct a benchmark, called MindWatcher-Evaluate Bench (MWE-Bench), to evaluate its performance. MindWatcher is equipped with a comprehensive suite of auxiliary reasoning tools, enabling it to address broad-domain multimodal problems. A large-scale, high-quality local image retrieval database, covering eight categories including cars, animals, and plants, endows the model with robust object recognition despite its small size. Finally, we design a more efficient training infrastructure for MindWatcher, enhancing training speed and hardware utilization. Experiments not only demonstrate that MindWatcher matches or exceeds the performance of larger or more recent models through superior tool invocation, but also uncover critical insights for agent training, such as the genetic inheritance phenomenon in agentic RL.

preprint | 2026 | arXiv

MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model

Text-guided audio editing aims to modify specific acoustic events while strictly preserving non-target content. Despite recent progress, existing approaches remain fundamentally limited. Training-free methods often suffer from signal degradation caused by diffusion inversion, while training-based methods, although achieving higher generation quality, are severely constrained by the scarcity of high-quality paired data and task formulations that cover only a narrow subset of editing operations. In addition, standard architectures typically decouple text and audio processing, limiting the ability to align instructions with specific acoustic contexts. To address these challenges, we propose MMEdit, an audio-language-model-driven framework for unified audio editing. We systematically extend task definitions to cover a comprehensive range of editing operations, including addition, replacement, removal, reordering, and attribute modification. Furthermore, we design a scalable data synthesis pipeline to construct large-scale paired datasets with fine-grained event-level annotations. To capture complex editing semantics, we integrate a Qwen2-Audio encoder with an MMDiT-based generator, enabling precise cross-modal alignment and localized editing. Experimental results demonstrate that our method achieves superior editing localization accuracy, robust instruction following, and high fidelity in non-edited regions.

preprint | 2026 | arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.

preprint | 2026 | arXiv

Physically-Grounded Manifold Projection Model for Generalizable Metal Artifact Reduction in Dental CBCT

Metal artifacts in Dental CBCT severely obscure anatomical structures, hindering diagnosis. Current deep learning for Metal Artifact Reduction (MAR) faces limitations: supervised methods suffer from spectral blurring due to "regression-to-the-mean", while unsupervised ones risk structural hallucinations. Denoising Diffusion Models (DDPMs) offer realism but rely on slow, stochastic iterative sampling, unsuitable for clinical use. To resolve this, we propose the Physically-Grounded Manifold Projection (PGMP) framework. First, our Anatomically-Adaptive Physics Simulation (AAPS) pipeline synthesizes high-fidelity training pairs via Monte Carlo spectral modeling and patient-specific digital twins, bridging the synthetic-to-real gap. Second, our DMP-Former adapts the Direct x-Prediction paradigm, reformulating restoration as a deterministic manifold projection to recover clean anatomy in a single forward pass, eliminating stochastic sampling. Finally, a Semantic-Structural Alignment (SSA) module anchors the solution using priors from medical foundation models (MedDINOv3), ensuring clinical plausibility. Experiments on synthetic and multi-center clinical datasets show PGMP outperforms state-of-the-art methods on unseen anatomy, setting new benchmarks in efficiency and diagnostic reliability. Code and data: https://github.com/ricoleehduu/PGMP.

preprint | 2026 | arXiv

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Modern coding agents integrated into IDEs orchestrate powerful tools and high-privilege system access, creating a high-stakes attack surface. Prior work on Indirect Prompt Injection (IPI) is mainly query-specific, requiring particular user queries as triggers and leading to poor generalizability. We propose query-agnostic IPI, a new attack paradigm that reliably executes malicious payloads under arbitrary user queries. Our key insight is that malicious payloads should leverage the invariant prompt context (i.e., system prompt and tool descriptions) rather than variant user queries. We present QueryIPI, an automated framework that uses tool descriptions as optimizable payloads and refines them via iterative, prompt-based blackbox optimization. QueryIPI leverages system invariants for initial seed generation aligned with agent conventions, and iterative reflection to resolve instruction-following failures and safety refusals. Experiments on five simulated agents show that QueryIPI achieves up to 87% success rate, outperforming the best baseline (50%). Crucially, generated malicious descriptions transfer to real-world coding agents, highlighting a practical security risk.

preprint | 2026 | arXiv

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

preprint | 2026 | arXiv

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity. Existing methods such as Self-Route adopt failure-driven fallback from RAG to LC, but remain passive, inefficient, and hard to interpret. We propose Pre-Route, a proactive routing framework that performs structured reasoning before answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, producing explainable and cost-efficient routing decisions. Our study shows three key findings: (i) LLMs possess latent routing ability that can be reliably elicited with guidelines, allowing single-sample performance to approach that of multi-sample (Best-of-N) results; (ii) linear probes reveal that structured prompts sharpen the separability of the "optimal routing dimension" in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment. Experiments on LaRA (in-domain) and LongBench-v2 (OOD) confirm that Pre-Route outperforms Always-RAG, Always-LC, and Self-Route baselines, achieving superior overall cost-effectiveness.

preprint | 2026 | arXiv

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework

Large Language Model (LLM)-based agent systems are increasingly deployed for complex real-world tasks but remain vulnerable to natural language-based attacks that exploit over-privileged tool use. This paper aims to understand and mitigate such attacks through the lens of privilege escalation, defined as agent actions exceeding the least privilege required for a user's intended task. Based on a formal model of LLM agent systems, we identify novel privilege escalation scenarios, particularly in multi-agent systems, including a variant akin to the classic confused deputy problem. To defend against both known and newly demonstrated privilege escalation, we propose SEAgent, a mandatory access control (MAC) framework built upon attribute-based access control (ABAC). SEAgent monitors agent-tool interactions via an information flow graph and enforces customizable security policies based on entity attributes. Our evaluations show that SEAgent effectively blocks various privilege escalation while maintaining a low false positive rate and negligible system overhead. This demonstrates its robustness and adaptability in securing LLM-based agent systems.

preprint | 2026 | arXiv

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge

This paper summarizes the ICASSP 2026 Automatic Song Aesthetics Evaluation (ASAE) Challenge, which focuses on predicting the subjective aesthetic scores of AI-generated songs. The challenge consists of two tracks: Track 1 targets the prediction of the overall musicality score, while Track 2 focuses on predicting five fine-grained aesthetic scores. The challenge attracted strong interest from the research community and received numerous submissions from both academia and industry. Top-performing systems significantly surpassed the official baseline, demonstrating substantial progress in aligning objective metrics with human aesthetic preferences. The outcomes establish a standardized benchmark and advance human-aligned evaluation methodologies for modern music generation systems.

preprint | 2026 | arXiv

Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark

Off-road nighttime autonomous driving suffers from unreliable visible-light perception, making the infrared modality crucial for accurate freespace detection. However, progress remains limited due to the scarcity of annotated infrared off-road datasets and the inter-frame inconsistencies inherent to current single-frame methods. To address these gaps, we present the IRON dataset, which, to our knowledge, is the first large-scale infrared dataset for off-road temporal freespace detection under all-day conditions, with strong support for nighttime perception. The dataset comprises 24,314 densely annotated infrared images with synchronized RGB images in diverse scenes and different lighting conditions. Building upon this dataset, we propose IRONet, a novel flow-free framework for temporal freespace detection that addresses inter-frame inconsistencies by aggregating historical context via a memory-attention mechanism and a carefully designed mask decoder. On our IRON dataset, IRONet achieves state-of-the-art performance, reaching 82.93% (+1.19%) IoU and 90.66% (+0.71%) F1 score at real-time inference. Remarkably, IRONet also exhibits robust generalization to RGB modalities on ORFD and Rellis datasets. Overall, our work establishes a foundation for reliable all-day off-road autonomous driving and future research in infrared temporal perception. The code and IRON dataset are available at https://github.com/wsnbws/IRON.
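For readers unfamiliar with the two reported metrics, the sketch below computes IoU and F1 for a binary freespace mask. The flat 0/1 lists and the toy values are assumptions for illustration only. They are not data from the IRON benchmark, and the benchmark's own evaluation script may aggregate over images differently.

```python
def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two equal-length binary masks given as 0/1 lists."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # two empty masks count as a perfect match
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1


# Toy flattened masks: 1 marks a freespace pixel.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
iou, f1 = iou_and_f1(pred, truth)
```

F1 counts true positives twice, so for a single mask it is never lower than IoU; the two are related by F1 = 2 IoU / (1 + IoU).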

preprint | 2025 | arXiv

RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-$k$ experts, which creates computational bottlenecks on certain devices while forcing others to idle. This converts an efficiency mechanism into a denial-of-service attack vector, leading to violations of service-level agreements for time to first token. We propose RepetitionCurse, a low-cost black-box strategy to exploit this vulnerability. By identifying a universal flaw in MoE router behavior, RepetitionCurse constructs adversarial prompts using simple repetitive token patterns in a model-agnostic manner. On widely deployed MoE models like Mixtral-8x7B, our method increases end-to-end inference latency by 3.063x, degrading service availability significantly.
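The routing concentration described here can be pictured with a toy model. In the sketch below, a seeded pseudo-random scorer stands in for a learned router and depends only on the token id. That is a strong simplification, since real routers act on context-dependent hidden states. The expert count, the value of k, and the imbalance measure (peak expert load divided by mean load) are all assumptions for illustration.

```python
import random
from collections import Counter


def top_k_experts(token_id, num_experts, k):
    """Toy stand-in for a learned router: deterministic scores per token id."""
    rng = random.Random(token_id)
    scores = [rng.random() for _ in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
    return ranked[:k]


def load_imbalance(token_ids, num_experts=8, k=2):
    """Peak expert load divided by the mean load (1.0 means perfectly balanced)."""
    load = Counter()
    for t in token_ids:
        load.update(top_k_experts(t, num_experts, k))
    mean_load = len(token_ids) * k / num_experts
    return max(load.values()) / mean_load


varied = load_imbalance(list(range(64)))  # 64 distinct tokens
repeated = load_imbalance([7] * 64)       # one token repeated 64 times
```

With a single repeated token, every position selects the same k experts, so the imbalance reaches its maximum of num_experts / k. Under expert parallelism, that corresponds to a few devices doing all the work while the rest idle.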

preprint | 2023 | arXiv

Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework

As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive numbers of clients with heterogeneous data distributions and computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction method of multipliers (ADMM) have been found to be resilient to data distribution and to outperform most of the primal-only FL algorithms. However, the reason behind this remains a mystery. In this paper, we first reveal that the federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, the vanilla version of it lacks the ability to adapt to the degree of client heterogeneity. Besides, the global model at the server under client sampling is biased, which slows down the practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and the bias of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or are close to it. Extensions of FedVRA to semi/un-supervised learning are also presented. Experiments based on (semi-)supervised image classification tasks demonstrate the superiority of FedVRA over the existing schemes in learning scenarios with massive heterogeneous clients and client sampling.

preprint | 2023 | arXiv

Predictions of photophysical properties of phosphorescent platinum(II) complexes based on ensemble machine learning approach

Phosphorescent metal complexes have been under intense investigation as emissive dopants for energy-efficient organic light emitting diodes (OLEDs). Among them, cyclometalated Pt(II) complexes are widespread triplet emitters with color-tunable emissions. To enable their practical application as OLED emitters, there is a great need to develop Pt(II) complexes with high radiative decay rate constant ($k_r$) and photoluminescence (PL) quantum yield. Thus, an efficient and accurate prediction tool is highly desirable. Here, we develop a general protocol for accurate predictions of emission wavelength, radiative decay rate constant, and PL quantum yield for phosphorescent Pt(II) emitters based on the combination of first-principles quantum mechanical method, machine learning (ML) and experimental calibration. A new dataset concerning phosphorescent Pt(II) emitters is constructed, with more than two hundred samples collected from the literature. Features containing pertinent electronic properties of the complexes are chosen. Our results demonstrate that ensemble learning models combined with stacking-based approaches exhibit the best performance, where the values of squared correlation coefficients ($R^2$), mean absolute error (MAE), and root mean square error (RMSE) are 0.96, 7.21 nm and 13.00 nm for emission wavelength prediction, and 0.81, 0.11 and 0.15 for PL quantum yield prediction. For radiative decay rate constant ($k_r$), the obtained value of $R^2$ is 0.67 while MAE and RMSE are 0.21 and 0.25 (both in log scale), respectively. The accuracy of the protocol is further confirmed using 24 recently reported Pt(II) complexes, which demonstrates its reliability for a broad palette of Pt(II) emitters. We expect this protocol will become a valuable tool, accelerating the rational design of novel OLED materials with desired properties.
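The three reported quality measures can be computed as in the sketch below. It takes $R^2$ to be the squared Pearson correlation between measured and predicted values, following the abstract's wording; other definitions of $R^2$ exist and may give different numbers. The wavelength values are made up for illustration and are not from the paper's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (squared Pearson correlation, MAE, RMSE) for paired values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov * cov / (var_t * var_p)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return r2, mae, rmse


# Hypothetical measured and predicted emission wavelengths in nm.
measured = [500.0, 550.0, 600.0, 650.0]
predicted = [510.0, 540.0, 610.0, 640.0]
r2, mae, rmse = regression_metrics(measured, predicted)
```

RMSE penalizes large individual errors more heavily than MAE, which is consistent with the abstract's RMSE values exceeding the corresponding MAE values.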

preprint | 2022 | arXiv

A Framework Based on Generational and Environmental Response Strategies for Dynamic Multi-objective Optimization

Due to the dynamics and uncertainty of dynamic multi-objective optimization problems (DMOPs), it is difficult for algorithms to find a satisfactory solution set before the next environmental change, especially in complex environments. One reason may be that the information in the environmental static stage cannot be used well in the traditional framework. In this paper, a novel framework based on generational and environmental response strategies (FGERS) is proposed, in which response strategies are run both in the environmental change stage and the environmental static stage to obtain population evolution information from both stages. In the traditional framework, by contrast, response strategies are run only in the environmental change stage. For simplicity, the feed-forward center point strategy was chosen as the response strategy in the novel dynamic framework (FGERS-CPS). FGERS-CPS not only predicts the change trend of the optimum solution set in the environmental change stage, but also predicts the evolution trend of the population after several generations in the environmental static stage. Together with the feed-forward center point strategy, a simple memory strategy and an adaptive diversity maintenance strategy were used to form the complete FGERS-CPS. On 13 DMOPs with various characteristics, FGERS-CPS was compared with four classical response strategies in the traditional framework. Experimental results show that FGERS-CPS is effective for DMOPs.
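A center-point prediction step of the kind named here might look like the sketch below. It assumes the simplest form of the idea: the displacement of the population centroid between two consecutive snapshots is added to every current individual. The function names and toy populations are hypothetical, and the strategy used in FGERS-CPS may include further components such as noise or memory.

```python
def centroid(population):
    """Component-wise mean of a list of decision vectors."""
    n = len(population)
    dim = len(population[0])
    return [sum(ind[d] for ind in population) / n for d in range(dim)]


def center_point_prediction(prev_pop, curr_pop):
    """Shift each current individual by the centroid's most recent displacement."""
    c_prev = centroid(prev_pop)
    c_curr = centroid(curr_pop)
    step = [b - a for a, b in zip(c_prev, c_curr)]
    return [[x + s for x, s in zip(ind, step)] for ind in curr_pop]


# Toy populations from two consecutive snapshots (environments or generations).
prev_pop = [[0.0, 0.0], [2.0, 2.0]]
curr_pop = [[1.0, 1.0], [3.0, 3.0]]
predicted = center_point_prediction(prev_pop, curr_pop)
```

The same routine can be called with snapshots taken across an environmental change or across several generations within a static stage, which mirrors the two uses the abstract describes.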

preprint | 2022 | arXiv

Accelerating Edge Intelligence via Integrated Sensing and Communication

Realizing edge intelligence consists of sensing, communication, training, and inference stages. Conventionally, the sensing and communication stages are executed sequentially, which results in an excessive amount of dataset generation and uploading time. This paper proposes to accelerate edge intelligence via integrated sensing and communication (ISAC). As such, the sensing and communication stages are merged so as to make the best use of the wireless signals for the dual purpose of dataset generation and uploading. However, ISAC also introduces additional interference between sensing and communication functionalities. To address this challenge, this paper proposes a classification error minimization formulation to design the ISAC beamforming and time allocation. The globally optimal solution is derived via the rank-1 guaranteed semidefinite relaxation, and performance analysis is performed to quantify the ISAC gain over that of conventional edge intelligence. Simulation results are provided to verify the effectiveness of the proposed ISAC-assisted edge intelligence system. Interestingly, we find that ISAC is always beneficial when the duration of generating a sample is more than the duration of uploading a sample. Otherwise, the ISAC gain can vanish or even be negative. Nevertheless, we still derive a sufficient condition under which a positive ISAC gain is feasible.

preprint2022arXiv

Accelerating Federated Edge Learning via Topology Optimization

Federated edge learning (FEEL) is envisioned as a promising paradigm to achieve privacy-preserving distributed learning. However, it consumes excessive learning time due to the existence of straggler devices. In this paper, a novel topology-optimized federated edge learning (TOFEL) scheme is proposed to tackle the heterogeneity issue in federated learning and to improve the communication-and-computation efficiency. Specifically, a problem of jointly optimizing the aggregation topology and computing speed is formulated to minimize the weighted summation of energy consumption and latency. To solve the mixed-integer nonlinear problem, we propose a novel solution method of penalty-based successive convex approximation, which converges to a stationary point of the primal problem under mild conditions. To facilitate real-time decision making, an imitation-learning-based method is developed, where deep neural networks (DNNs) are trained offline to mimic the penalty-based method, and the trained imitation DNNs are deployed at the edge devices for online inference. Thereby, an efficient imitation-learning-based approach is seamlessly integrated into the TOFEL framework. Simulation results demonstrate that the proposed TOFEL scheme accelerates the federated learning process and achieves higher energy efficiency. Moreover, we apply the scheme to 3D object detection with multi-vehicle point cloud datasets in the CARLA simulator. The results confirm the superior learning performance of the TOFEL scheme over conventional designs with the same resource and deadline constraints.

preprint2022arXiv

Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars

Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics. This work studies the problem of cross-modal human re-identification (ReID), in response to the regular human movements across camera-allowed regions (e.g., streets) and camera-restricted regions (e.g., offices) deployed with heterogeneous sensors. By leveraging the emerging low-cost RGB-D cameras and mmWave radars, we propose the first-of-its-kind vision-RF system for simultaneous cross-modal multi-person ReID. Firstly, to address the fundamental inter-modality discrepancy, we propose a novel signature synthesis algorithm based on the observed specular reflection model of a human body. Secondly, an effective cross-modal deep metric learning model is introduced to deal with interference caused by unsynchronized data across radars and cameras. Through extensive experiments in both indoor and outdoor environments, we demonstrate that our proposed system is able to achieve ~92.5% top-1 accuracy and ~97.5% top-5 accuracy among 56 volunteers. We also show that our proposed system is able to robustly re-identify subjects even when multiple subjects are present in the sensors' field of view.
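The top-1 and top-5 figures quoted above are rank-based retrieval metrics. As a minimal sketch of how such a metric is commonly computed (not the authors' evaluation code), the function below assumes a matrix of query-to-gallery distances produced by some metric learning model; the function name, the toy distance matrix and the identity labels are all hypothetical.

```python
def top_k_accuracy(dist, query_ids, gallery_ids, k):
    """Fraction of queries whose true identity is among the k nearest gallery entries.

    dist[i][j] is an assumed distance between query i and gallery entry j
    (smaller means more similar).
    """
    hits = 0
    for row, qid in zip(dist, query_ids):
        # Rank gallery indices from nearest to farthest for this query.
        ranked = sorted(range(len(row)), key=lambda j: row[j])
        if qid in {gallery_ids[j] for j in ranked[:k]}:
            hits += 1
    return hits / len(query_ids)


# Toy data (hypothetical): three queries against a gallery of three identities.
dist = [
    [0.1, 0.9, 0.5],
    [0.8, 0.7, 0.2],
    [0.3, 0.6, 0.4],
]
query_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]

top1 = top_k_accuracy(dist, query_ids, gallery_ids, 1)
top2 = top_k_accuracy(dist, query_ids, gallery_ids, 2)
```

With a larger k the metric can only stay the same or increase, which is why a top-5 figure is never below the corresponding top-1 figure.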

preprint2022arXiv

Edge Federated Learning Via Unit-Modulus Over-The-Air Computation

Edge federated learning (FL) is an emerging paradigm that trains a global parametric model from distributed datasets based on wireless communications. This paper proposes a unit-modulus over-the-air computation (UMAirComp) framework to facilitate efficient edge federated learning, which simultaneously uploads local model parameters and updates global model parameters via analog beamforming. The proposed framework avoids sophisticated baseband signal processing, leading to low communication delays and implementation costs. Training loss bounds of UMAirComp FL systems are derived, and two low-complexity large-scale optimization algorithms, termed penalty alternating minimization (PAM) and accelerated gradient projection (AGP), are proposed to minimize the nonconvex nonsmooth loss bound. Simulation results show that the proposed UMAirComp framework with the PAM algorithm achieves a smaller mean square error of model parameter estimation, training loss, and test error than other benchmark schemes. Moreover, the proposed UMAirComp framework with the AGP algorithm achieves satisfactory performance while reducing the computational complexity by orders of magnitude compared with existing optimization algorithms. Finally, we demonstrate the implementation of UMAirComp in a vehicle-to-everything autonomous driving simulation platform. It is found that autonomous driving tasks are more sensitive to model parameter errors than other tasks since the neural networks for autonomous driving contain sparser model parameters.

preprint2022arXiv

Federated Stochastic Primal-dual Learning with Differential Privacy

Federated learning (FL) is a new paradigm that enables many clients to jointly train a machine learning (ML) model under the orchestration of a parameter server while keeping local data from being exposed to any third party. However, the training of FL is an interactive process between local clients and the parameter server. Such a process can cause privacy leakage, since adversaries may retrieve sensitive information by analyzing the overheard messages. In this paper, we propose a new federated stochastic primal-dual algorithm with differential privacy (FedSPD-DP). Compared to the existing methods, the proposed FedSPD-DP incorporates local stochastic gradient descent (local SGD) and partial client participation (PCP) for addressing the issues of communication efficiency and straggler effects due to randomly accessed clients. Our analysis shows that the data sampling strategy and PCP can enhance data privacy, whereas a larger number of local SGD steps could increase privacy leakage, revealing a non-trivial tradeoff between algorithm communication efficiency and privacy protection. Specifically, we show that, by guaranteeing $(ε, δ)$-DP for each client per communication round, the proposed algorithm guarantees $(\mathcal{O}(qε\sqrt{p T}), δ)$-DP after $T$ communication rounds while maintaining an $\mathcal{O}(1/\sqrt{pTQ})$ convergence rate for a convex and non-smooth learning problem, where $Q$ is the number of local SGD steps, $p$ is the client sampling probability, $q=\max_{i} q_i/\sqrt{1-q_i}$ and $q_i$ is the data sampling probability of each client under PCP. Experimental results are presented to evaluate the practical performance of the proposed algorithm and to compare it with state-of-the-art methods.
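To make the scaling in the privacy bound concrete, the following minimal sketch evaluates the quantity $qε\sqrt{pT}$ inside the $\mathcal{O}(\cdot)$, using the definition $q=\max_{i} q_i/\sqrt{1-q_i}$ from the abstract. The constant hidden by the $\mathcal{O}(\cdot)$ is not given in the abstract and is assumed to be 1 here, so the output shows only how the bound scales, not an actual privacy guarantee. The function name and the sample values are hypothetical.

```python
import math


def privacy_scale(eps, data_sampling_probs, p, T):
    """Evaluate q * eps * sqrt(p * T), the term inside the O(.) of the bound.

    eps: per-round epsilon for each client
    data_sampling_probs: the q_i values, each assumed to lie in [0, 1)
    p: client sampling probability
    T: number of communication rounds
    The hidden constant of the O(.) is assumed to be 1 (illustration only).
    """
    q = max(qi / math.sqrt(1.0 - qi) for qi in data_sampling_probs)
    return q * eps * math.sqrt(p * T)


# Hypothetical values: quadrupling T doubles the term, as the sqrt(T) factor implies.
base = privacy_scale(eps=1.0, data_sampling_probs=[0.1, 0.5], p=0.2, T=100)
longer = privacy_scale(eps=1.0, data_sampling_probs=[0.1, 0.5], p=0.2, T=400)
```

The term grows with the square root of the number of rounds and shrinks as the client sampling probability $p$ or the data sampling probabilities $q_i$ decrease, which is the privacy benefit of sampling that the abstract describes.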

preprint2022arXiv

From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search

Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research question. To aid query formulation, information specialists use a set of exemplar documents, called `seed studies', prior to query formulation. Seed studies help verify the effectiveness of a query prior to the full assessment of retrieved studies. Beyond this use of seeds, specific IR methods can exploit seed studies for guiding both automatic query formulation and new retrieval models. One major limitation of work to date is that these methods exploit `pseudo seed studies' through retrospective use of included studies (i.e., relevance assessments). However, we show pseudo seed studies are not representative of real seed studies used by information specialists. Hence, we provide a test collection with real world seed studies used to assist with the formulation of queries. To support our collection, we provide an analysis, previously not possible, on how seed studies impact retrieval and perform several experiments using seed-study based methods to compare the effectiveness of using seed studies versus pseudo seed studies. We make our test collection and the results of all of our experiments and analysis available at http://github.com/ielab/sysrev-seed-collection

preprint2022arXiv

Global unique solution for the 3-D full compressible MHD equations in space of lower regularity

In this paper, we establish new $L^p$ gradient estimates of the solutions in order to discuss the Cauchy problem for the full compressible magnetohydrodynamic (MHD) system in $\mathrm{R}^3$. We use the "$\rm{div}-\rm{curl}$" decomposition technique (see \cite{{HJR},{MR}}) and a new modified effective viscous flux and vorticity to calculate "$\Vert\nabla \mathbf{u}\Vert_{L^3}$" and "$\Vert\nabla \mathbb{H}\Vert_{L^3}$". As a result, we obtain global well-posedness of the solution for initial data in a class of spaces with lower regularity, whose energy is required to be suitably small.

preprint2022arXiv

Integrated Sensing, Communication, and Computation Over-the-Air: MIMO Beamforming Design

To support the unprecedented growth of the Internet of Things (IoT) applications, tremendous data need to be collected by the IoT devices and delivered to the server for further computation. By utilizing the same signals for both radar sensing and data communication, the integrated sensing and communication (ISAC) technique has broken the barriers between data collection and delivery in the physical layer. By exploiting the analog-wave addition in a multi-access channel, over-the-air computation (AirComp) enables function computation via transmissions in the physical layer. The promising performance of ISAC and AirComp motivates the current work on developing a framework called integrated sensing, communication, and computation over-the-air (ISCCO). The performance metrics of radar sensing and AirComp are evaluated by the mean squared errors of the estimated target response matrix and the received computation results, respectively. The design challenge of MIMO ISCCO lies in the joint optimization of beamformers for sensing, communication, and computation at both the IoT devices and the server, which results in a non-convex problem. To solve this problem, an algorithmic solution based on the technique of semidefinite relaxation is proposed. The use case of target location estimation based on ISCCO is demonstrated in simulation to show the performance superiority.

preprint2022arXiv

Learning Ball-balancing Robot Through Deep Reinforcement Learning

The ball-balancing robot (ballbot) is a good platform to test the effectiveness of a balancing controller. Considering balancing control, conventional model-based feedback control methods have been widely used. However, contacts and collisions are difficult to model, and often lead to failure in balancing control, especially when the ballbot tilts a large angle. To explore the maximum initial tilting angle of the ballbot, the balancing control is interpreted as a recovery task using Reinforcement Learning (RL). RL is a powerful technique for systems that are difficult to model, because it allows an agent to learn policy by interacting with the environment. In this paper, by combining the conventional feedback controller with the RL method, a compound controller is proposed. We show the effectiveness of the compound controller by training an agent to successfully perform a recovery task involving contacts and collisions. Simulation results demonstrate that using the compound controller, the ballbot can keep balance under a larger set of initial tilting angles, compared to the conventional model-based controller.

preprint2022arXiv

Longitudinal Prediction of Postnatal Brain Magnetic Resonance Images via a Metamorphic Generative Adversarial Network

Missing scans are inevitable in longitudinal studies due to either subject dropouts or failed scans. In this paper, we propose a deep learning framework to predict missing scans from acquired scans, catering to longitudinal infant studies. Prediction of infant brain MRI is challenging owing to the rapid contrast and structural changes, particularly during the first year of life. We introduce a trustworthy metamorphic generative adversarial network (MGAN) for translating infant brain MRI from one time-point to another. MGAN has three key features: (i) image translation leveraging spatial and frequency information for detail-preserving mapping; (ii) a quality-guided learning strategy that focuses attention on challenging regions; and (iii) a multi-scale hybrid loss function that improves translation of tissue contrast and structural details. Experimental results indicate that MGAN outperforms existing GANs by accurately predicting both contrast and anatomical details.

preprint2022arXiv

Low-complexity Beam Selection algorithms based on SVD for MmWave Massive MIMO Systems

To realize mmWave massive MIMO systems in practice, beamspace MIMO with beam selection provides an attractive solution with a considerably reduced number of radio frequency (RF) chains. We propose low-complexity beam selection algorithms based on singular value decomposition (SVD). We first diagonalize the channel matrix by SVD, and the appropriate beams are then selected one by one in a decremental or incremental order based on the criterion of sum-rate maximization. To significantly reduce the complexity of the proposed algorithms, we reuse the SVD from the previous iteration to avoid computing the SVD from scratch again. Meanwhile, our proposed algorithms naturally obtain the precoding matrix, which can eliminate the multiuser interference. Simulation results demonstrate that our proposed algorithms can outperform the competing algorithms, including the fully digital zero-precoding.

preprint2022arXiv

Near-field radiative heat transfer between hybrid polaritonic structures

Near-field radiative heat transfer between close objects may exceed the far-field blackbody radiation by orders of magnitude when exploiting polaritonic materials. Great efforts have been made to experimentally measure this fundamental stochastic effect, but mostly based on simple materials. In this work, we foster an all-optical method to characterize the heat transfer between less explored plasmon-phonon hybrid polaritonic systems made of graphene-SiC heterostructures. A large heat flux, about 26 times the blackbody radiation limit, is obtained over a 150-nm vacuum gap, attributed to the couplings of three different surface modes (plasmon, phonon polaritons and frustrated mode). The interaction of polaritonic modes in the hybrid system is also explored to build a switchable thermophotonic device with nearly unity heat flux tunability. This work paves the way for understanding mode-mediated near-field heat transfer and provides a platform for building thermophotonic or thermo-optoelectronic blocks for various applications.

preprint2022arXiv

Optimal time decay estimation for large-solution about 3D compressible MHD equations

This paper mainly focuses on optimal time decay estimates for large solutions of the compressible magnetohydrodynamic equations in the 3D whole space, provided that $(σ_{0}-1,u_{0},M_{0})\in L^1\cap H^2$. In [2] (Chen et al., 2019), the time decay rate of $\|(σ-1,u,M)\|_{H^1}$ was proved to be $(1+t)^{-\frac{3}{4}}$. Based on this, we obtained the rate $(1+t)^{-\frac{5}{4}}$ for $\|\nabla(σ-1,u,M)\|_{H^1}$ in [24]. In this paper, we therefore aim to improve the rate for $\|\nabla^2 (σ-1,u,M)\|_{L^2}$. Thanks to the method adopted in [25] (Wang and Wen, 2021), we obtain the optimal time decay estimate for the highest-order spatial derivative of the solution, namely that the time decay rate of $\|\nabla^2 (σ-1,u,M)\|_{L^2}$ is $(1+t)^{-\frac{7}{4}}$.
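The three rates quoted above follow a single pattern. As a compact summary, assuming each quoted $H^1$ bound is governed by its lowest-order $L^2$ term and writing $C$ for an unspecified constant (both are assumptions of this sketch, not statements from the abstract):

```latex
\[
  \bigl\|\nabla^{k}(\sigma-1,\,u,\,M)(t)\bigr\|_{L^{2}}
  \;\le\; C\,(1+t)^{-\frac{3}{4}-\frac{k}{2}},
  \qquad k = 0,\,1,\,2,
\]
% k = 0: (1+t)^{-3/4}  (rate attributed to [2])
% k = 1: (1+t)^{-5/4}  (rate attributed to [24])
% k = 2: (1+t)^{-7/4}  (rate claimed in this paper)
```

Each additional spatial derivative gains a factor of $(1+t)^{-1/2}$, which is the same scaling as the heat kernel acting on $L^1$ data in $\mathbb{R}^3$.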

preprint2022arXiv

Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods

Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.

preprint2022arXiv

Recent Increase of Tropical Cyclone Rapid Intensification in Global Coastal Regions

Rapid intensification (RI) is likely the most crucial contributor to the development of strong tropical cyclones and the largest source of prediction error resulting in great threats to life and property, which can become more threatening with proximity to landfall. While enormous efforts have been devoted to studying the basin-wide fluctuation, temporal-spatial variations of global RI events remain uncertain. Here, we show that, compared with open oceans where the annual RI counts do not show any significant change, the coastal offshore regions within 400 km from the coastline host significantly more RI events, with the RI count tripled from 1980 to 2020. Reasons responsible for the coastal RI occurrence are analysed, with the dominant large-scale environmental factors identified. This work yields an important new finding that an increasing threat of RI in coastal regions has occurred in the preceding decades, which may continue in a future warming climate.

preprint2022arXiv

Robotic Wireless Energy Transfer in Dynamic Environments: System Design and Experimental Validation

Wireless energy transfer (WET) is a ground-breaking technology for cutting the last wire between mobile sensors and power grids in smart cities. Yet, WET only offers effective transmission of energy over a short distance. Robotic WET is an emerging paradigm that mounts the energy transmitter on a mobile robot and navigates the robot through different regions in a large area to charge remote energy harvesters. However, it is challenging to determine the robotic charging strategy in an unknown and dynamic environment due to the uncertainty of obstacles. This paper proposes a hardware-in-the-loop joint optimization framework that offers three distinctive features: 1) efficient model updates and re-optimization based on the last-round experimental data; 2) iterative refinement of the anchor list for adaptation to different environments; 3) verification of algorithms in a high-fidelity Gazebo simulator and a multi-robot testbed. Experimental results show that the proposed framework significantly saves the WET mission completion time while satisfying collision avoidance and energy harvesting constraints.

preprint2022arXiv

Transition from multiphoton to tunneling ionization in the process of high harmonic generation in solids

High harmonic generation in solids is becoming an important method for strong-field solid-state physics research. The power scaling relationship between the high harmonics and the driving laser is investigated both experimentally and theoretically. The power scaling dependence clearly divides the interaction into two regimes. The modification of the bandgap by the intense laser proves to be very important for theoretically reproducing the experimental result. Combined with a Keldysh theory analysis, the harmonic generation process is found to transition from multiphoton excitation to diabatic tunneling.

preprint2022arXiv

Tunable two-dimensional superconductivity and spin-orbit coupling at the EuO/KTaO3(110) interface

Unconventional quantum states, most notably two-dimensional (2D) superconductivity, have been realized at the interfaces of oxide heterostructures, where they can be effectively tuned by the gate voltage ($V_G$). Here we report that the interface between a high-quality EuO (111) thin film and a KTaO3 (KTO) (110) substrate shows superconductivity with an onset transition temperature $T_c^{onset}$ = 1.35 K. The 2D nature of the superconductivity is verified by the large anisotropy of the upper critical field and the characteristics of a Berezinskii-Kosterlitz-Thouless transition. By applying $V_G$, $T_c^{onset}$ can be tuned from ~ 1 to 1.7 K; such an enhancement can possibly be associated with a boosted spin-orbit energy $ε_{so}$ = $\hbar$ / $τ_{so}$, where $τ_{so}$ is the spin-orbit relaxation time. Further analysis of $τ_{so}$ based on the upper critical field ($H_{c2}$) and magnetoconductance reveals the complex nature of spin-orbit coupling (SOC) at the EuO/KTO(110) interface, with different mechanisms dominating the influence of SOC on the superconductivity and on the magnetotransport in the normal state. Our results demonstrate that SOC should be considered an important factor determining the 2D superconductivity at oxide interfaces.

preprint2022arXiv

Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from the program source codes, by learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models alongside source code and LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGen, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality.

preprint2022arXiv

VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

In this paper, we propose a tightly-coupled SLAM system that fuses RGB, depth, IMU and structured plane information. Traditional sparse-point-based SLAM systems maintain a large number of map points to model the environment. This large number of map points brings high computational complexity, making such systems difficult to deploy on mobile devices. On the other hand, planes are common structures in man-made environments, especially indoors, and a small number of planes can usually represent a large scene. The main purpose of this article is therefore to reduce the high complexity of sparse-point-based SLAM. We build a lightweight back-end map which consists of a few planes and map points to achieve efficient bundle adjustment (BA) with equal or better accuracy. We use homography constraints to eliminate the parameters of numerous plane points in the optimization and reduce the complexity of BA. We separate the parameters and measurements in the homography and point-to-plane constraints and compress the measurement part to further improve the speed of BA. We also integrate the plane information into the whole system to realize robust planar feature extraction, data association, and globally consistent planar reconstruction. Finally, we perform an ablation study and compare our method with similar methods on simulated and real-environment data. Our system achieves clear advantages in accuracy and efficiency. Even though the plane parameters are involved in the optimization, we effectively simplify the back-end map by using planar structures. The global bundle adjustment is nearly 2 times faster than that of the sparse-point-based SLAM algorithm.

preprint2021arXiv

Multiscale Attention Guided Network for COVID-19 Diagnosis Using Chest X-ray Images

Coronavirus disease 2019 (COVID-19) is one of the most destructive pandemics since the turn of the millennium, forcing the world to tackle a health crisis. Automated lung infection classification using chest X-ray (CXR) images could strengthen diagnostic capability when handling COVID-19. However, distinguishing COVID-19 from pneumonia cases using CXR images is a difficult task because of shared spatial characteristics, high feature variation and contrast diversity between cases. Moreover, massive data collection is impractical for a newly emerged disease, which limits the performance of data-thirsty deep learning models. To address these challenges, a Multiscale Attention Guided deep network with Soft Distance regularization (MAG-SD) is proposed to automatically classify COVID-19 from pneumonia CXR images. In MAG-SD, MA-Net is used to produce the prediction vector and attention from multiscale feature maps. To improve the robustness of the trained model and relieve the shortage of training data, attention-guided augmentations along with a soft distance regularization are proposed, which aim to generate meaningful augmentations and reduce noise. Our multiscale attention model achieves better classification performance on our pneumonia CXR image dataset. Extensive experiments on MAG-SD demonstrate its unique advantage in pneumonia classification over cutting-edge models. The code is available at https://github.com/JasonLeeGHub/MAG-SD.

preprint2021arXiv

On Secure Degrees of Freedom of the MIMO Interference Channel with Local Output Feedback

This paper studies the problem of sum-secure degrees of freedom (SDoF) of the (M,M,N,N) multiple-input multiple-output (MIMO) interference channel with local output feedback, so as to build an information-theoretic foundation and provide practical transmission schemes for 6G-enabled vehicles-to-vehicles (V2V). For this problem, we propose two novel transmission schemes, i.e., the interference decoding scheme and the interference alignment scheme, and thus establish a sum-SDoF lower bound. In particular, to optimize the phase duration, we analyze the security and decoding constraints and formulate a linear-fractional optimization problem. Furthermore, we show that the derived sum-SDoF lower bound is the sum-SDoF for M <= N/2, N=M, and 2N <= M antenna configurations, and reveal that for a fixed N, the optimal M to maximize the sum-SDoF is not less than 2N. Through simulations, we examine the secure sum-rate performance of proposed transmission schemes, and reveal that using local output feedback can lead to a higher secure sum-rate than that by using delayed channel state information at the transmitter.

preprint2021arXiv

Reconfigurable Intelligent Surface Assisted Edge Machine Learning

The ever-growing popularity and rapid improvement of artificial intelligence (AI) have prompted a rethinking of the evolution of wireless networks. Mobile edge computing (MEC) provides a natural platform for AI applications since it provides rich computation resources to train AI models, as well as low-latency access to the data generated by mobile and Internet of Things devices. In this paper, we present an infrastructure to perform machine learning tasks at an MEC server with the assistance of a reconfigurable intelligent surface (RIS). In contrast to conventional communication systems, where the principal criterion is to maximize the throughput, we aim at optimizing the learning performance. Specifically, we minimize the maximum learning error of all users by jointly optimizing the beamforming vectors of the base station and the phase-shift matrix of the RIS. An alternating optimization-based framework is proposed to optimize the two terms iteratively, where closed-form expressions of the beamforming vectors are derived, and an alternating direction method of multipliers (ADMM)-based algorithm is designed together with an error level searching framework to effectively solve the nonconvex optimization problem of the phase-shift matrix. Simulation results demonstrate significant gains of deploying an RIS and validate the advantages of our proposed algorithms over various benchmarks.

preprint2021arXiv

Space Shift Keying with Reconfigurable Intelligent Surfaces: Phase Configuration Designs and Performance Analysis

Reconfigurable intelligent surface (RIS)-assisted transmission and space shift keying (SSK) appear as promising candidates for future energy-efficient wireless systems. In this paper, two RIS-based SSK schemes are proposed to efficiently improve the error and throughput performance of conventional SSK systems, respectively. The first one, termed RIS-SSK with passive beamforming (RIS-SSK-PB), employs an RIS for beamforming and targets the maximization of the minimum squared Euclidean distance between any two decision points. The second one, termed RIS-SSK with Alamouti space-time block coding (RIS-SSK-ASTBC), employs an RIS for ASTBC and enables the RIS to transmit its own Alamouti-coded information while reflecting the incident SSK signals to the destination. A low-complexity beamformer and an efficient maximum-likelihood (ML) detector are designed for RIS-SSK-PB and RIS-SSK-ASTBC, respectively. Approximate expressions for the average bit error probabilities of the source and/or the RIS are derived in closed-form assuming ML detection. Extensive computer simulations are conducted to verify the performance analysis. Results show that RIS-SSK-PB significantly outperforms the existing RIS-free and RIS-based SSK schemes, and RIS-SSK-ASTBC enables highly reliable transmission with throughput improvement.
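The design target named for RIS-SSK-PB, the minimum squared Euclidean distance between any two decision points, can be illustrated with a short sketch. It only evaluates that metric for a given set of complex-valued decision points; it does not model the RIS, the channel, or the authors' beamformer. The function name and the four sample points are hypothetical.

```python
from itertools import combinations


def min_squared_distance(points):
    """Smallest squared Euclidean distance over all pairs of complex decision points."""
    return min(abs(a - b) ** 2 for a, b in combinations(points, 2))


# Hypothetical decision points: four points equally spaced on the unit circle.
points = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
d_min_sq = min_squared_distance(points)

# Scaling every point by a factor g scales the metric by g**2.
scaled = min_squared_distance([2 * z for z in points])
```

A larger minimum distance means the closest pair of decision points is easier to tell apart under noise, which is why maximizing it is a natural objective when the goal is to improve error performance.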

preprint2021arXiv

Unit selection synthesis based data augmentation for fixed phrase speaker verification

Data augmentation is commonly used to help build a robust speaker verification system, especially in limited-resource cases. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving lexical variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effective approach to building a well-performing system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based data augmentation method to leverage the abundant text-independent data resources. In this approach, the text-independent speech of each speaker is first broken up into segments, each containing one phone unit. Then, segments that contain phones in the target transcript are selected and concatenated in order to produce speech with the target transcript. Experiments are carried out on the AISHELL Speaker Verification Challenge 2019 database, and the results and analysis show that our proposed method can boost the system performance significantly.

86 published item(s)

1/2 order convergence rate of Euler-type methods for time-changed stochastic differential equations with super-linearly growing drift and diffusion coefficients

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

Indoor Fluid Antenna Systems Enabled by Layout-Specific Modeling and Group Relative Policy Optimization

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Physically-Grounded Manifold Projection Model for Generalizable Metal Artifact Reduction in Dental CBCT

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge

Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark

RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress

Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework

Predictions of photophysical properties of phosphorescent platinum(II) complexes based on ensemble machine learning approach

A Framework Based on Generational and Environmental Response Strategies for Dynamic Multi-objective Optimization

Accelerating Edge Intelligence via Integrated Sensing and Communication

Accelerating Federated Edge Learning via Topology Optimization

Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars

Edge Federated Learning Via Unit-Modulus Over-The-Air Computation

Federated Stochastic Primal-dual Learning with Differential Privacy

From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search

Global unique solution for the 3-D full compressible MHD equations in space of lower regularity

Integrated Sensing, Communication, and Computation Over-the-Air: MIMO Beamforming Design

Learning Ball-balancing Robot Through Deep Reinforcement Learning

Longitudinal Prediction of Postnatal Brain Magnetic Resonance Images via a Metamorphic Generative Adversarial Network

Low-complexity Beam Selection algorithms based on SVD for MmWave Massive MIMO Systems

Near-field radiative heat transfer between hybrid polaritonic structures

Optimal time decay estimation for large-solution about 3D compressible MHD equations

Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods

Recent Increase of Tropical Cyclone Rapid Intensification in Global Coastal Regions

Robotic Wireless Energy Transfer in Dynamic Environments: System Design and Experimental Validation

Transition from multiphoton to tunneling ionization in the process of high harmonic generation in solids

Tunable two-dimensional superconductivity and spin-orbit coupling at the EuO/KTaO3(110) interface

Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

Multiscale Attention Guided Network for COVID-19 Diagnosis Using Chest X-ray Images

On Secure Degrees of Freedom of the MIMO Interference Channel with Local Output Feedback

Reconfigurable Intelligent Surface Assisted Edge Machine Learning

Space Shift Keying with Reconfigurable Intelligent Surfaces: Phase Configuration Designs and Performance Analysis

Unit selection synthesis based data augmentation for fixed phrase speaker verification

An Edge Computing-based Photo Crowdsourcing Framework for Real-time 3D Reconstruction

Angle Aware User Cooperation for Secure Massive MIMO in Rician Fading Channel

Arbitrary Polarization Conversion Dichroism Metasurfaces for All-in-One Full Poincaré Sphere Polarizers

BUT System for the Second DIHARD Speech Diarization Challenge

Deep Job Understanding at LinkedIn

Detecting Domain Polarity-Changes of Words in a Sentiment Lexicon

Edge Learning with Unmanned Ground Vehicle: Joint Path, Energy and Sample Size Planning

End-to-End Speaker-Dependent Voice Activity Detection

Enhanced contact angle hysteresis of salt aqueous solution on graphite surface by a tiny amount of cation

Enhancing Rumor Detection in Social Media Using Dynamic Propagation Structures

Hybrid Transceiver Optimization for Multi-Hop Communications

Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets

Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only

Learning Centric Power Allocation for Edge Intelligence

Local Indicator of Colocation Quotient with a Statistical Significance Test: Examining Spatial Association of Crime and Facilities

Machine Intelligence at the Edge with Learning Centric Power Allocation

Matrix-Monotonic Optimization Part II: Multi-Variable Optimization

MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension

New Viewpoint and Algorithms for Water-Filling Solutions in Wireless Communications

Possible Quantum Paraelectric State in Kitaev Spin Liquid Candidate H$_{3}$LiIr$_{2}$O$_{6}$

Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events

WatchDog: Real-time Vehicle Tracking on Geo-distributed Edge Nodes

Full-color complex-amplitude vectorial holograms based on multi-freedom metasurfaces

Period-doubling bifurcation of dissipative-soliton-resonance pulses in a passively mode-locked fiber laser

Convex searches for discrete-time Zames-Falb multipliers

Joint Relay-User Beamforming Design in Full-Duplex Two-Way Relay Channel

Joint Transceiver and Power Splitter Design Over Two-Way Relaying Channel with Lattice Codes and Energy Harvesting

Modeling Review Spam Using Temporal Patterns and Co-bursting Behaviors

Translingual Obfuscation