Researcher profile

Wen Wu

Wen Wu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
14 topics
4 close collaborators

Published work

17 published item(s)

preprint, 2026, arXiv

AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training

Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning. However, the inherent complexity and structural diversity of PDE solution operators make multi-PDE pre-training fundamentally challenging. Existing methods mainly address this by increasing model capacity, while leaving the target solution operators unchanged. Inspired by classical numerical analysis, we instead propose to transform complex and diverse solution operators into simpler, better-aligned forms that are easier to model jointly. Since the optimal transformation varies across PDE types, it must be adaptive and input-dependent, allowing a single neural operator to approximate an entire family of operators. We instantiate this idea as AOT-POT (adaptive operator-transformation for pre-training operator transformer), which expands hidden representations into multiple parallel streams, adaptively aggregates and redistributes them before and after each sub-layer, and mixes streams through Sinkhorn-projected doubly stochastic matrices for stable training. These mechanisms together reshape diverse solution operators into a unified form that can be effectively modeled by a single architecture. Empirically, AOT-POT achieves state-of-the-art performance on 12 PDE benchmarks with only 3% additional parameters, reducing relative L2 error by up to 77.6% (40.9% on average). Fine-tuning AOT-POT further reduces L2 error by up to 92% on in-domain PDEs and 89% on out-of-domain PDEs (unseen types during pre-training), demonstrating that adaptive operator transformation is an effective and complementary direction for advancing PDE foundation models beyond simply scaling model capacity.
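
The stream-mixing step mentioned in this abstract relies on Sinkhorn projection onto doubly stochastic matrices. The sketch below is a minimal, pure-Python illustration of that general technique, not the AOT-POT implementation: the function names `sinkhorn` and `mix_streams`, the iteration count, and the toy inputs are all assumptions made for illustration.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Map a square matrix of real scores to an (approximately) doubly
    stochastic matrix by alternating row and column normalisation.
    Hypothetical helper, not taken from the paper."""
    n = len(scores)
    # Exponentiate so every entry is strictly positive; subtracting the
    # maximum only rescales the matrix and avoids overflow.
    peak = max(max(row) for row in scores)
    m = [[math.exp(x - peak) for x in row] for row in scores]
    for _ in range(n_iters):
        # Make every row sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(weights, streams):
    """Mix parallel streams (lists of floats) with a mixing matrix:
    output stream i is sum_j weights[i][j] * streams[j]."""
    n = len(streams)
    width = len(streams[0])
    return [
        [sum(weights[i][j] * streams[j][k] for j in range(n)) for k in range(width)]
        for i in range(n)
    ]


if __name__ == "__main__":
    P = sinkhorn([[0.0, 1.0, 2.0], [1.0, 0.5, 0.0], [2.0, 0.0, 1.0]])
    print([round(sum(row), 6) for row in P])  # row sums, each close to 1
    print(mix_streams(P, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Because the columns of the mixing matrix sum to one, the total across streams is preserved by the mix, which is one intuition for why such matrices can help keep training stable.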

preprint, 2026, arXiv

HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding

Speculative decoding (SD) has emerged as a promising approach to accelerate LLM inference without sacrificing output quality. Existing SD methods tailored for video-LLMs primarily focus on pruning redundant visual tokens to mitigate the computational burden of massive visual inputs. However, existing methods do not achieve inference acceleration comparable to text-only LLMs. We observe from extensive experiments that this phenomenon mainly stems from two limitations: (i) their pruning strategies inadequately preserve visual semantic tokens, degrading draft quality and acceptance rates; (ii) even with aggressive pruning (e.g., 90% visual tokens removed), the draft model's remaining inference cost limits overall speedup. To address these limitations, we propose HIPPO, a general holistic-aware parallel speculative decoding framework. Specifically, HIPPO proposes (i) a semantic-aware token preservation method, which fuses global attention scores with local visual semantics to retain semantic information at high pruning ratios; (ii) a video parallel SD algorithm that decouples and overlaps draft generation and target verification phases. Experiments on four video-LLMs across six benchmarks demonstrate HIPPO's effectiveness, yielding up to 3.51x speedup compared to vanilla auto-regressive decoding.

preprint, 2026, arXiv

MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Domain Risks in LLMs

Ensuring the safety of Large Language Models (LLMs) is critical for real-world deployment. However, current safety measures often fail to address implicit, domain-specific risks. To investigate this gap, we introduce a dataset of 3,000 annotated queries spanning education, finance, and management. Evaluations across 14 leading LLMs reveal a concerning vulnerability: an average jailbreak success rate of 57.8%. In response, we propose MENTOR, a metacognition-driven self-evolution framework. MENTOR first performs structured self-assessment through simulated critical thinking, such as perspective-taking and consequential reasoning, to uncover latent model misalignments. These reflections are formalized into dynamic rule-based knowledge graphs that evolve with emerging risk patterns. To enforce these rules at inference time, we introduce activation steering, a method that directly modulates the model's internal representations to ensure compliance. Experiments demonstrate that MENTOR substantially reduces attack success rates across all tested domains and achieves risk analysis performance comparable to human experts. Our work offers a scalable and adaptive pathway toward robust domain-specific alignment of LLMs.

preprint, 2026, arXiv

MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model

Text-guided audio editing aims to modify specific acoustic events while strictly preserving non-target content. Despite recent progress, existing approaches remain fundamentally limited. Training-free methods often suffer from signal degradation caused by diffusion inversion, while training-based methods, although achieving higher generation quality, are severely constrained by the scarcity of high-quality paired data and task formulations that cover only a narrow subset of editing operations. In addition, standard architectures typically decouple text and audio processing, limiting the ability to align instructions with specific acoustic contexts. To address these challenges, we propose MMEdit, an audio-language-model-driven framework for unified audio editing. We systematically extend task definitions to cover a comprehensive range of editing operations, including addition, replacement, removal, reordering, and attribute modification. Furthermore, we design a scalable data synthesis pipeline to construct large-scale paired datasets with fine-grained event-level annotations. To capture complex editing semantics, we integrate a Qwen2-Audio encoder with an MMDiT-based generator, enabling precise cross-modal alignment and localized editing. Experimental results demonstrate that our method achieves superior editing localization accuracy, robust instruction following, and high fidelity in non-edited regions.

preprint, 2023, arXiv

Beef up mmWave Dense Cellular Networks with D2D-Assisted Cooperative Edge Caching

Edge caching is emerging as the most promising solution to reduce the content retrieval delay and relieve the huge burden on the backhaul links in ultra-dense networks by proactively caching popular content at small base stations (SBSs). However, the constrained cache resources of individual SBSs significantly throttle the performance of edge caching. In this paper, we propose a device-to-device (D2D) assisted cooperative edge caching (DCEC) policy for millimeter-wave (mmWave) dense networks, which cooperatively utilizes the cache resources of users and SBSs in proximity. In the proposed DCEC policy, content can be cached in either users' devices or SBSs according to its popularity, and a user can retrieve the requested content from neighboring users via D2D links or from neighboring SBSs via cellular links to efficiently exploit the cache diversity. Unlike existing cooperative caching policies in the lower frequency bands that require complex interference management techniques to suppress interference, we take advantage of directional antennas in mmWave systems to ensure a high transmission rate while mitigating the interference footprint. Taking the practical directional antenna model and the network density into consideration, we derive closed-form expressions of the backhaul offloading performance and content retrieval delay based on the stochastic information of network topology. In addition, analytical results indicate that, with the increase of the network density, the content retrieval delay via D2D links increases significantly while that via cellular links increases slightly. Comprehensive simulations validate our theoretical analysis and demonstrate that the proposed policy can achieve higher performance in offloading the backhaul traffic and reducing the content retrieval delay compared with the state-of-the-art most popular caching (MPC) policy.

preprint, 2023, arXiv

Holistic Network Virtualization and Pervasive Network Intelligence for 6G

In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.

preprint, 2023, arXiv

Model-Driven Deep Learning for Non-Coherent Massive Machine-Type Communications

In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.

preprint, 2023, arXiv

Performance Analysis and Enhancement of Beamforming Training in 802.11ad

Beamforming (BF) training is crucial to establishing reliable millimeter-wave communication connections between stations (STAs) and an access point. In the IEEE 802.11ad BF training protocol, all STAs contend for limited BF training opportunities, i.e., associated BF training (A-BFT) slots, which results in severe collisions and significant BF training latency, especially in dense user scenarios. In this paper, we first develop an analytical model to evaluate the BF training protocol performance. Our analytical model accounts for various protocol components, including user density, the number of A-BFT slots, and protocol parameters, i.e., retry limit and contention window size. We then derive the average successful BF training probability, the BF training efficiency and latency. Since the derived BF training efficiency is an implicit function, to reveal the relationship between system parameters and BF training performance, we also derive an approximate expression of BF training efficiency. Theoretical analysis indicates that the BF training efficiency degrades drastically in dense user scenarios. To address this issue, we propose an enhancement scheme which adaptively adjusts the protocol parameters in tune with user density, to improve the BF training performance in dense user scenarios. Extensive simulations are carried out to validate the accuracy of the developed analytical model. In addition, simulation results show that the proposed enhancement scheme can improve the BF training efficiency by 35% in dense user scenarios.
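
To give a feel for why contention for A-BFT slots degrades with user density, the sketch below works through a deliberately simplified single-round model: every STA independently picks one slot uniformly at random, and a STA succeeds only if no other STA picks the same slot. This is an assumption-laden toy, not the paper's analytical model; it ignores the retry limit and contention window, and the function names and the value of 8 slots are illustrative choices.

```python
def success_probability(num_stas: int, num_slots: int) -> float:
    """Probability that a given STA is alone in its chosen slot when all
    STAs pick slots independently and uniformly (single round only)."""
    if num_stas < 1 or num_slots < 1:
        raise ValueError("num_stas and num_slots must be positive")
    return (1.0 - 1.0 / num_slots) ** (num_stas - 1)


def slot_efficiency(num_stas: int, num_slots: int) -> float:
    """Expected fraction of slots occupied by exactly one STA, used here
    as one possible (assumed) notion of training efficiency."""
    return num_stas * success_probability(num_stas, num_slots) / num_slots


if __name__ == "__main__":
    slots = 8  # illustrative value
    for stas in (1, 2, 8, 16, 64):
        print(stas, success_probability(stas, slots), slot_efficiency(stas, slots))
```

Even in this toy model the efficiency collapses once the number of STAs greatly exceeds the number of slots, which is consistent with the qualitative trend the abstract reports for dense user scenarios.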

preprint, 2022, arXiv

Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning

Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.

preprint, 2022, arXiv

Cost-Effective Two-Stage Network Slicing for Edge-Cloud Orchestrated Vehicular Networks

In this paper, we study a network slicing problem for edge-cloud orchestrated vehicular networks, in which the edge and cloud servers are orchestrated to process computation tasks for reducing network slicing cost while satisfying the quality of service requirements. We propose a two-stage network slicing framework, which consists of 1) a network planning stage in a large timescale to perform slice deployment, edge resource provisioning, and cloud resource provisioning, and 2) a network operation stage in a small timescale to perform resource allocation and task dispatching. Particularly, we formulate the network slicing problem as a two-timescale stochastic optimization problem to minimize the network slicing cost. Since the problem is NP-hard due to coupled network planning and network operation stages, we develop a Two timescAle netWork Slicing (TAWS) algorithm by collaboratively integrating reinforcement learning (RL) and optimization methods, which can jointly make network planning and operation decisions. Specifically, by leveraging the timescale separation property of decisions, we decouple the problem into a large-timescale network planning subproblem and a small-timescale network operation subproblem. The former is solved by an RL method, and the latter is solved by an optimization method. Simulation results based on real-world vehicle traffic traces show that TAWS can effectively reduce the network slicing cost compared with the benchmark scheme.

preprint, 2022, arXiv

Multi-channel Attentive Graph Convolutional Network With Sentiment Fusion For Multimodal Sentiment Analysis

With the explosive growth of multimodal reviews on social media platforms, multimodal sentiment analysis has gained popularity because of its high relevance to these posts. Although most previous studies design various fusion frameworks for learning an interactive representation of multiple modalities, they fail to incorporate sentimental knowledge into inter-modality learning. This paper proposes a Multi-channel Attentive Graph Convolutional Network (MAGCN), consisting of two main components: cross-modality interactive learning and sentimental feature fusion. For cross-modality interactive learning, we exploit the self-attention mechanism combined with densely connected graph convolutional networks to learn inter-modality dynamics. For sentimental feature fusion, we utilize multi-head self-attention to merge sentimental knowledge into inter-modality feature representations. Extensive experiments are conducted on three widely used datasets. The experimental results demonstrate that the proposed model achieves competitive performance on accuracy and F1 scores compared to several state-of-the-art approaches.

preprint, 2022, arXiv

Personalized QoE Enhancement for Adaptive Video Streaming: A Digital Twin-Assisted Scheme

In this paper, we present a digital twin (DT)-assisted adaptive video streaming scheme to enhance personalized quality-of-experience (PQoE). Since PQoE models are user-specific and time-varying, existing schemes based on universal and time-invariant PQoE models may suffer from performance degradation. To address this issue, we first propose a DT-assisted PQoE model construction method to obtain accurate user-specific PQoE models. Specifically, user DTs (UDTs) are constructed for individual users, which can acquire and utilize users' data to accurately tune PQoE model parameters in real time. Next, given the obtained PQoE models, we formulate a resource management problem to maximize the overall long-term PQoE by taking the dynamics of users' locations, video content requests, and buffer statuses into account. To solve this problem, a deep reinforcement learning algorithm is developed to jointly determine segment version selection and communication and computing resource allocation. Simulation results on a real-world dataset demonstrate that the proposed scheme can effectively enhance PQoE compared with benchmark schemes.

preprint, 2022, arXiv

Split Learning over Wireless Networks: Parallel Design and Resource Management

Split learning (SL) is a collaborative learning framework, which can train an artificial intelligence (AI) model between a device and an edge server by splitting the AI model into a device-side model and a server-side model at a cut layer. The existing SL approach conducts the training process sequentially across devices, which incurs significant training latency, especially when the number of devices is large. In this paper, we design a novel SL scheme to reduce the training latency, named Cluster-based Parallel SL (CPSL), which conducts model training in a "first-parallel-then-sequential" manner. Specifically, CPSL partitions devices into several clusters, trains device-side models in parallel within each cluster and aggregates them, and then sequentially trains the whole AI model across clusters, thereby parallelizing the training process and reducing training latency. Furthermore, we propose a resource management algorithm to minimize the training latency of CPSL considering device heterogeneity and network dynamics in wireless networks. This is achieved by stochastically optimizing the cut layer selection, real-time device clustering, and radio spectrum allocation. The proposed two-timescale algorithm can jointly make the cut layer selection decision in a large timescale and device clustering and radio spectrum allocation decisions in a small timescale. Extensive simulation results on non-independent and identically distributed data demonstrate that the proposed solutions can greatly reduce the training latency as compared with the existing SL benchmarks, while adapting to network dynamics.
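
The "first-parallel-then-sequential" idea can be illustrated with a toy latency comparison. The sketch assumes each device has a fixed per-round training time, that devices inside a cluster finish together when the slowest one does, and that clusters run one after another; aggregation, server-side computation and communication costs are ignored. The function names and numbers are hypothetical and are not taken from the paper.

```python
def sequential_latency(device_times):
    """Vanilla SL in this toy model: devices train one after another,
    so the per-round latency is the sum of all device times."""
    return sum(device_times)


def cluster_parallel_latency(clusters):
    """Cluster-based parallel scheme in this toy model: devices within a
    cluster train in parallel (the slowest device dominates), while the
    clusters themselves are processed sequentially."""
    return sum(max(cluster) for cluster in clusters)


if __name__ == "__main__":
    times = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]  # illustrative per-device times
    clusters = [times[:3], times[3:]]        # one arbitrary clustering
    print(sequential_latency(times))          # 23.0
    print(cluster_parallel_latency(clusters))  # 13.0
```

Under these assumptions the gain depends on how devices are grouped, since a single slow device delays its whole cluster; this hints at why the abstract treats device clustering as a decision to optimize.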

preprint, 2020, arXiv

Efficient Hybrid Beamforming with Anti-Blockage Design for High-Speed Railway Communications

Future railways are expected to accommodate both train operation services and passenger broadband services. Millimeter-wave (mmWave) communication is a promising technology for providing multi-gigabit data rates to onboard users. However, mmWave communications suffer from severe propagation attenuation and vulnerability to blockage, which can be very challenging in high-speed railway (HSR) scenarios. In this paper, we investigate efficient hybrid beamforming (HBF) design for train-to-ground communications. First, we develop a two-stage HBF algorithm in blockage-free scenarios. In the first stage, the minimum mean square error method is adopted for optimal hybrid beamformer design with low complexity and fast convergence; in the second stage, the orthogonal matching pursuit method is utilized to approximately recover the analog and digital beamformers. Second, in blocked scenarios, we design an anti-blockage scheme by adaptively invoking the proposed HBF algorithm, which can efficiently deal with random blockages. Extensive simulation results are presented to show the sum rate performance of the proposed algorithms under various configurations, including transmission power, velocity of the train, blockage probability, etc. It is demonstrated that the proposed anti-blockage algorithm can improve the effective rate by 20% in severely blocked scenarios while maintaining low outage probability.

preprint, 2020, arXiv

Hankel determinants of a Sturmian sequence

Let $\tau$ be the substitution $1\to 101$ and $0\to 1$ on the alphabet $\{0,1\}$. The fixed point of $\tau$ beginning with 1, denoted by $\mathbf{s}$, is a Sturmian sequence. We first give a characterization of $\mathbf{s}$ using $f$-representation. Then we show that the distribution of zeros in the determinants induces a partition of integer lattices in the first quadrant. Combining those properties, we give the explicit values of the Hankel determinants $H_{m,n}$ of $\mathbf{s}$ for all $m\ge 0$ and $n\ge 1$.
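
For readers who want to experiment with the objects in this abstract, the sketch below generates a prefix of the fixed point by iterating the substitution from the letter 1 and evaluates small Hankel determinants exactly. It assumes the common convention that $H_{m,n}$ is the determinant of the $n\times n$ matrix with entries $s_{m+i+j}$ for $0\le i,j<n$; the abstract does not state the indexing, so treat that and the helper names as assumptions.

```python
from fractions import Fraction


def sturmian_prefix(length):
    """First `length` letters of the fixed point of 1 -> 101, 0 -> 1
    that begins with 1, obtained by iterating the substitution."""
    rules = {"1": "101", "0": "1"}
    word = "1"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return [int(c) for c in word[:length]]


def determinant(matrix):
    """Exact determinant of an integer matrix via Gaussian elimination
    over the rationals (no floating-point rounding)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to pivot on.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return int(det)


def hankel_determinant(seq, m, n):
    """H_{m,n} = det(seq[m+i+j]) for 0 <= i, j < n (assumed convention).
    Requires len(seq) >= m + 2*n - 1."""
    return determinant([[seq[m + i + j] for j in range(n)] for i in range(n)])


if __name__ == "__main__":
    s = sturmian_prefix(40)
    print("".join(map(str, s[:17])))
    for m in range(4):
        print(m, [hankel_determinant(s, m, n) for n in range(1, 6)])
```

Printing a small table of values like this makes it easy to look for the zero patterns that the abstract relates to a partition of the integer lattice.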

preprint, 2020, arXiv

Pyramid Focusing Network for mutation prediction and classification in CT images

Predicting the mutation status of genes in tumors is of great clinical significance. Recent studies have suggested that certain mutations may be noninvasively predicted by studying image features of the tumors from Computed Tomography (CT) data. Currently, this kind of image feature identification method mainly relies on manual processing to extract generalized image features alone or machine processing without considering the morphological differences of the tumor itself, which makes it difficult to achieve further breakthroughs. In this paper, we propose a pyramid focusing network (PFNet) for mutation prediction and classification based on CT images. Firstly, we use Space Pyramid Pooling to collect semantic cues in feature maps from multiple scales, based on the observation that the shape and size of tumors vary. Secondly, we improve the loss function based on the consideration that the features required for proper mutation detection are often not obvious in cross-sections of tumor edges, which draws more of the network's attention to these hard examples. Finally, we devise a training scheme based on data augmentation to enhance the generalization ability of networks. Extensively verified on clinical gastric CT datasets of 20 testing volumes with 63648 CT images, our method achieves an accuracy of 94.90% in predicting the HER-2 gene mutation status at the CT image level.

preprint, 2020, arXiv

Stieltjes continued fractions related to the Paperfolding sequence and Rudin-Shapiro sequence

We investigate two Stieltjes continued fractions given by the paperfolding sequence and the Rudin-Shapiro sequence. By explicitly describing certain subsequences of the convergents $P_n(x)/Q_n(x)$ modulo $4$, we give the formal power series expansions (modulo $4$) of these two continued fractions and prove that they are congruent modulo $4$ to algebraic series in $\mathbb{Z}[[x]]$. Therefore, the coefficient sequences of the formal power series expansions are $2$-automatic. Write $Q_{n}(x)=\sum_{i\ge 0}a_{n,i}x^{i}$. Then $(Q_{n}(x))_{n\ge 0}$ defines a two-dimensional coefficient sequence $(a_{n,i})_{n,i\ge 0}$. We prove that the coefficient sequences $(a_{n,i} \bmod 4)_{n\ge 0}$ introduced by both $(Q_{n}(x))_{n\ge 0}$ and $(P_{n}(x))_{n\ge 0}$ are $2$-automatic for all $i\ge 0$. Moreover, the pictures of these two-dimensional coefficient sequences modulo $4$ present a kind of self-similar phenomenon.