Researcher profile

Pai Peng

Pai Peng contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint | 2026 | arXiv

EPD-Serve: A Flexible Multimodal EPD Disaggregation Inference Serving System On Ascend

With the widespread adoption of large multimodal models, efficient inference across text, image, audio, and video modalities has become critical. However, existing multimodal inference systems typically employ monolithic architectures that tightly couple the Encode, Prefill, and Decode stages on homogeneous hardware, neglecting the heterogeneous computational characteristics of each stage. This design leads to inefficient resource utilization and limited system throughput. To address these issues, we propose EPD-Serve, a stage-level disaggregated inference serving system for multimodal models. EPD-Serve decouples the inference pipeline into independent Encode, Prefill, and Decode stages, enabling logical isolation and flexible co-located deployment through dynamic orchestration. Leveraging the Ascend interconnect topology, EPD-Serve introduces asynchronous feature prefetching between Encode and Prefill stages and a hierarchical grouped KV cache transmission mechanism between Prefill and Decode stages to improve cross-node communication efficiency. In addition, EPD-Serve incorporates multi-route scheduling, instance-level load balancing, and multi-stage hardware co-location with spatial multiplexing to better support diverse multimodal workloads. Comprehensive experiments on multimodal understanding models demonstrate that, under high-concurrency scenarios, EPD-Serve improves end-to-end throughput by 57.37-69.48% compared to PD-disaggregated deployment, while satisfying strict SLO constraints, including TTFT below 2000 ms and TPOT below 50 ms. These results highlight the effectiveness of stage-level disaggregation for optimizing multimodal large model inference systems.

preprint | 2026 | arXiv

Slit-Induced Reflection Mode Conversion Between Fundamental Lamb Modes in Elastic Plates with Low-Frequency and Broadband Response

We present a compact slit-based interface that enables reflection mode conversion in an elastic plate by transforming an incident lowest-order antisymmetric Lamb mode into a reflected lowest-order symmetric Lamb mode. The interface is realized by introducing inclined vacuum slits along the lateral sides of the plate. This geometry intentionally breaks symmetry and produces a strongly localized, vortex-like deformation near the slit tips, which provides an efficient coupling pathway from the antisymmetric incident response to a symmetric reflected component. The conversion is quantified in a consistent, field-based manner by extracting the reflected antisymmetric level from the standing-wave envelope on a probe line placed sufficiently far from the slits, and then inferring the converted symmetric energy under a lossless assumption with only two reflected propagating channels. The proposed design delivers strong low-frequency conversion and sustains broadband half-power operation, while parametric studies confirm that the performance is tunable and robust over a wide range of slit angles and widths.
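The energy bookkeeping described in this abstract can be illustrated with a small Python sketch. It assumes, beyond what the abstract states, that the probed envelope contains only the incident and reflected antisymmetric components, so the reflection amplitude follows from the standard standing-wave ratio. The function name and its inputs are hypothetical, and this is not the authors' code.

```python
def converted_symmetric_fraction(envelope_max, envelope_min):
    """Estimate the energy fraction converted to the symmetric mode.

    Assumptions (not taken from the paper's implementation):
    - the envelope is that of incident + reflected antisymmetric waves only;
    - the plate is lossless and only two reflected channels propagate,
      so the reflected antisymmetric and symmetric fractions sum to 1.
    """
    if envelope_max <= 0 or envelope_min < 0 or envelope_min > envelope_max:
        raise ValueError("need 0 <= envelope_min <= envelope_max and envelope_max > 0")
    # Standing-wave ratio: max = |A|(1 + |r|), min = |A|(1 - |r|)
    reflection_amplitude = (envelope_max - envelope_min) / (envelope_max + envelope_min)
    # Same mode in and out, so the energy fraction is |r|^2
    reflected_antisymmetric = reflection_amplitude ** 2
    # Lossless, two-channel balance
    return 1.0 - reflected_antisymmetric


# Envelope swinging between 0.5 and 1.5 gives |r| = 0.5,
# so 25% stays antisymmetric and 75% is inferred as converted.
print(converted_symmetric_fraction(1.5, 0.5))  # 0.75
```

Under this reading, half-power operation corresponds to a returned value of at least 0.5.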

preprint | 2026 | arXiv

The DAWN of World-Action Interactive Models

A plausible scene evolution depends on the maneuver being considered, while a good maneuver depends on how the scene may evolve. Existing World Action Models (WAMs) largely miss this reciprocity, treating world prediction and action generation as either isolated parallel branches or rigid predict-then-plan pipelines. We formalize this perspective as World-Action Interactive Models (WAIMs), and instantiate it in autonomous driving with DAWN (Denoising Actions and World iNteractive model), a simple yet strong latent generative baseline. DAWN operates in a compact semantic latent space and couples a World Predictor with a World-Conditioned Action Denoiser: the predicted world hypothesis conditions action denoising, while the denoised action hypothesis is fed back to update the world prediction, so that both are recursively refined during inference. Rather than eliminating test-time world evolution altogether or rolling out the full future in pixel space, DAWN performs a short explicit latent rollout that is sufficient to support long-horizon trajectory generation in complex interactive scenes. Experiments show that DAWN achieves strong planning performance and favorable safety-related results across multiple autonomous driving benchmarks. More broadly, our results suggest that interactive world-action generation is a principled path toward truly actionable world models.

preprint | 2026 | arXiv

Ultra-low-frequency reflection mode conversion from compressional to shear waves enabled by periodic inclined slits

A periodically patterned free surface with inclined slits can convert an incident compressional wave into a reflected shear wave with nearly complete efficiency at very low frequency. The system is described by two-dimensional in-plane linear elasticity, and the slits are treated as voids. The conversion is quantified by the ratio between the reflected shear-wave energy and the incident compressional-wave energy, obtained from mode decomposition and energy-flux evaluation below the surface. The results indicate that the inclined geometry introduces strong coupling between normal and tangential motions at the boundary, enabling suppression of the ordinary compressional reflection while redirecting the reflected energy into the shear channel. This simple, geometry-controlled mechanism provides a compact route for low-frequency elastic-wave polarization control.

preprint | 2023 | arXiv

FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network

Few-shot semantic segmentation is the task of learning to locate each pixel of the novel class in the query image with only a few annotated support images. The current correlation-based methods construct pair-wise feature correlations to establish the many-to-many matching because the typical prototype-based approaches cannot learn fine-grained correspondence relations. However, the existing methods still suffer from the noise contained in naive correlations and the lack of context semantic information in correlations. To alleviate these problems, we propose a Feature-Enhanced Context-Aware Network (FECANet). Specifically, a feature enhancement module is proposed to suppress the matching noise caused by inter-class local similarity and enhance the intra-class relevance in the naive correlation. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features, significantly boosting the encoder to capture a reliable matching pattern. Experiments on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that our proposed FECANet leads to remarkable improvement compared to previous state-of-the-art methods, demonstrating its effectiveness.

preprint | 2022 | arXiv

Video Moment Retrieval from Text Queries via Single Frame Annotation

Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly since the annotator needs to watch the whole moment. Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor. In this paper, we look closer into the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only one single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue this is beneficial because, compared to weak supervision, it adds trivial cost yet offers more potential in performance. Under the glance annotation setting, we propose a method named Video moment retrieval via Glance Annotation (ViGA) based on contrastive learning. ViGA cuts the input video into clips and contrasts between clips and queries, in which glance-guided Gaussian-distributed weights are assigned to all clips. Our extensive experiments indicate that ViGA achieves better results than the state-of-the-art weakly supervised methods by a large margin, and is even comparable to fully supervised methods in some cases.
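The idea of glance-guided Gaussian weights over clips can be pictured with a minimal Python sketch. The abstract does not give the exact weighting formula, so the unnormalized Gaussian below, the clip indexing, and the `sigma` parameter are illustrative assumptions rather than the ViGA implementation.

```python
import math


def glance_gaussian_weights(num_clips, glance_index, sigma):
    """Return one weight per clip, peaking at the clip that holds the glance.

    Hypothetical helper: weights decay with distance from the glance clip
    following an unnormalized Gaussian; sigma (in clips) is an assumed knob.
    """
    if not 0 <= glance_index < num_clips:
        raise ValueError("glance_index must point at an existing clip")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [
        math.exp(-((i - glance_index) ** 2) / (2 * sigma ** 2))
        for i in range(num_clips)
    ]


# Five clips, glance falls in the middle clip.
weights = glance_gaussian_weights(num_clips=5, glance_index=2, sigma=1.0)
print([round(w, 3) for w in weights])  # [0.135, 0.607, 1.0, 0.607, 0.135]
```

Clips near the glance then count more strongly as positives for the query, while distant clips contribute little.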

preprint | 2021 | arXiv

Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search

Text-based person search aims at retrieving a target person in an image gallery using a descriptive sentence of that person. It is very challenging since the modal gap makes effectively extracting discriminative features more difficult. Moreover, the inter-class variance of both pedestrian images and descriptions is small, so comprehensive information is needed to align visual and textual clues across all scales. Most existing methods merely consider the local alignment between images and texts within a single scale (e.g., only global scale or only partial scale) and then simply construct alignment at each scale separately. To address this problem, we propose a method that is able to adaptively align image and textual features across all scales, called NAFS (i.e., Non-local Alignment over Full-Scale representations). Firstly, a novel staircase network structure is proposed to extract full-scale image features with better locality. Secondly, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales. Then, instead of separately aligning features at each scale, a novel contextual non-local attention mechanism is applied to simultaneously discover latent alignments across all scales. The experimental results show that our method outperforms the state-of-the-art methods by 5.53% in terms of top-1 and 5.35% in terms of top-5 on a text-based person search dataset. The code is available at https://github.com/TencentYoutuResearch/PersonReID-NAFS

preprint | 2020 | arXiv

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination

Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will potentially lead to a lower recall rate and a higher threshold introduces more false positives. This problem is more severe in pedestrian detection because the instance density varies more intensively. However, previous work on NMS either ignores or only vaguely considers the existence of nearby pedestrians. Thus, we propose Nearby Objects Hallucinator (NOH), which pinpoints the objects near each proposal with a Gaussian distribution, together with NOH-NMS, which dynamically eases the suppression for the space that might contain other objects with a high likelihood. Compared to Greedy-NMS, our method, as the state-of-the-art, improves by 3.9% AP, 5.1% Recall, and 0.8% $\text{MR}^{-2}$ on CrowdHuman, reaching 89.0% AP, 92.9% Recall, and 43.9% $\text{MR}^{-2}$, respectively.
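The threshold dilemma this abstract opens with can be reproduced with a plain Greedy-NMS baseline. The Python sketch below implements only that standard baseline, not NOH-NMS; the boxes and scores are made-up values chosen so that two overlapping pedestrians and one duplicate detection show the trade-off.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative data: boxes 0 and 1 are two distinct, overlapping pedestrians
# (IoU about 0.43); box 2 is a duplicate detection of pedestrian 0 (IoU about 0.90).
boxes = [(0, 0, 10, 20), (4, 0, 14, 20), (0.5, 0, 10.5, 20)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, 0.3))   # [0]        second pedestrian lost
print(greedy_nms(boxes, scores, 0.5))   # [0, 1]     correct for this scene
print(greedy_nms(boxes, scores, 0.95))  # [0, 1, 2]  duplicate survives
```

No single fixed threshold suits both sparse and crowded regions, which is the gap the abstract says a density-aware suppression rule is meant to close.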

preprint | 2020 | arXiv

Observation of Floquet prethermalization in dipolar spin chains

Periodically driven Floquet quantum systems provide a promising platform to investigate novel physics out of equilibrium. Unfortunately, the drive generically heats up the system to a featureless infinite temperature state. For large driving frequency, the heat absorption rate is predicted to be exponentially small, giving rise to a long-lived prethermal regime which exhibits all the intriguing properties of Floquet systems. Here we experimentally observe Floquet prethermalization using nuclear magnetic resonance techniques. We first show the relaxation of a far-from-equilibrium initial state to a long-lived prethermal state, well described by the time-independent "prethermal" Hamiltonian. By measuring the autocorrelation of this prethermal Hamiltonian we can further experimentally confirm the predicted exponentially slow heating rate. More strikingly, we find that in the timescale when the effective Hamiltonian picture breaks down, the Floquet system still possesses other quasi-conservation laws. Our results demonstrate that it is possible to realize robust Floquet engineering, thus enabling the experimental observation of non-trivial Floquet phases of matter.

preprint | 2020 | arXiv

Prethermal quasiconserved observables in Floquet quantum systems

Prethermalization, by introducing emergent quasiconserved observables, plays a crucial role in protecting Floquet many-body phases over exponentially long times, while the ultimate fate of such quasiconserved operators can signal thermalization to infinite temperature. To elucidate the properties of prethermal quasiconservation in many-body Floquet systems, here we systematically analyze infinite temperature correlations between observables. We numerically show that the late-time behavior of the autocorrelations unambiguously distinguishes quasiconserved observables from non-conserved ones, allowing us to single out a set of linearly independent quasiconserved observables. By investigating two Floquet spin models, we identify two different mechanisms underlying the quasi-conservation law. First, we numerically verify energy quasiconservation when the driving frequency is large, so that the system dynamics is approximately described by a static prethermal Hamiltonian. More interestingly, under moderate driving frequency, another quasiconserved observable can still persist if the Floquet driving contains a large global rotation. We show theoretically how to calculate this conserved observable and provide numerical verification. Having systematically identified all quasiconserved observables, we can finally investigate their behavior in the infinite-time limit and thermodynamic limit, using autocorrelations obtained from both numerical simulation and experiments in solid state nuclear magnetic resonance systems.

preprint | 2020 | arXiv

Reflected continuously tunable acoustic metasurface with rotatable space coiling-up structure

In this paper, we propose a continuously tunable acoustic metasurface composed of identical anisotropic resonant units, each of which contains a rigid pedestal and a rotatable inclusion with a space coiling-up structure. The metasurface can manipulate the reflected phase by adjusting the rotational angle of the inclusion. The theoretical analysis shows that the polarization-dependent phase change can be induced by the even-order standing wave modes inside the inclusion. By utilizing the rotatable inclusion, we design a tunable acoustic carpet cloaking device, which works over a wide range of incident angles. When incident waves come from different directions, the cloaking effect can be obtained by arranging the rotational angle of each inclusion.

preprint | 2019 | arXiv

Ultra-thin Underwater Acoustic Metasurface with Multiply Resonant Units

This paper describes a new kind of acoustic metasurface with multiply resonant units, which have previously been used to induce multiple resonances and effectively produce negative mass density and bulk/shear moduli. The proposed acoustic metasurface can be constructed using real materials and does not rely on an ideal rigid material. Therefore, it can work well in a water background. The thickness of the acoustic metasurface is about two orders of magnitude smaller than the acoustic wavelength in water. The design of a unit group is proposed to avoid the phase discretization becoming too fine in such a long-wavelength condition. We demonstrate that the proposed acoustic metasurface achieves good performance in anomalous reflection, focusing, and carpet cloaking.