Researcher profile

Peng Shi

Peng Shi contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
10 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

MobileDreamer: Generative Sketch World Model for GUI Agent

Mobile GUI agents have shown strong potential in real-world automation and practical applications. However, most existing agents remain reactive, making decisions mainly from the current screen, which limits their performance on long-horizon tasks. Building a world model from repeated interactions enables forecasting action outcomes and supports better decision making for mobile GUI agents. This is challenging because the model must predict post-action states with spatial awareness while remaining efficient enough for practical deployment. In this paper, we propose MobileDreamer, an efficient world-model-based lookahead framework that equips GUI agents with the future imagination provided by the world model. It consists of a textual sketch world model and a rollout imagination strategy for the GUI agent. The textual sketch world model forecasts post-action states by learning to transform digital images into key task-related sketches, and uses a novel order-invariant learning strategy to preserve the spatial information of GUI elements. The rollout imagination strategy optimizes the action-selection process by leveraging the prediction capability of the world model. Experiments on Android World show that MobileDreamer achieves state-of-the-art performance and improves task success by 5.25%. World model evaluations further verify that our textual sketch modeling accurately forecasts key GUI elements.

preprint, 2025, arXiv

Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR

Reading text from images or scanned documents via OCR models has been a longstanding focus of researchers. Intuitively, text reading is perceived as a straightforward perceptual task, and existing work primarily focuses on enriched data engineering to enhance SFT capabilities. In this work, we observe that even advanced OCR models exhibit significantly higher entropy on formatted text (e.g., formulas and tables) than on plain text, often by an order of magnitude. These statistical patterns reveal that advanced OCR models struggle with high output uncertainty when dealing with format-sensitive documents, suggesting that reasoning over diverse reading pathways may improve OCR performance. To address this, we propose format decoupled reinforcement learning (FD-RL), which leverages high-entropy patterns for targeted optimization. Our approach employs an entropy-based data filtration strategy to identify format-intensive instances, and adopts format decoupled rewards tailored to different format types, enabling format-level validation rather than token-level memorization. FD-RL achieves an average score of 90.41 on OmniDocBench, setting a new record for end-to-end models on this highly popular benchmark. More importantly, we conduct comprehensive ablation studies over data, training, filtering, and rewarding strategies, thoroughly validating their effectiveness.
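
As a rough illustration of the entropy-based filtering idea in this abstract, the sketch below computes the mean per-token Shannon entropy of a model's output distributions and keeps the samples that exceed a threshold. The function names, the threshold value, and the toy distributions are assumptions made for illustration; they are not taken from the paper and do not reproduce FD-RL.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(token_dists):
    """Average entropy over all token positions of one sample."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)

def filter_high_entropy(samples, threshold):
    """Return the ids of samples whose mean token entropy exceeds the threshold.

    `samples` maps a sample id to a list of per-token distributions.
    The threshold is a hypothetical tuning parameter.
    """
    return [sid for sid, dists in samples.items()
            if mean_entropy(dists) > threshold]

# Toy data (hypothetical): a confident "plain" sample and an uncertain "table" sample.
samples = {
    "plain": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],
    "table": [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]],
}
selected = filter_high_entropy(samples, threshold=0.5)
```

With these toy numbers only the "table" sample passes the filter, which mirrors the abstract's observation that formatted text carries much higher output entropy than plain text.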

preprint, 2024, arXiv

Optical skyrmions and other topological quasiparticles of light

Skyrmions are topologically stable quasiparticles that have been predicted and demonstrated in quantum fields, solid-state physics, and magnetic materials, but only recently observed in electromagnetic fields, triggering fast-expanding research across different spectral ranges and applications. Here we review the recent advances in optical skyrmions within a unified framework. Starting from fundamental theories, including classification of skyrmionic states, we describe generation and topological control of different kinds of optical skyrmions in structured and time-dependent optical fields. We further highlight generalized classes of optical topological quasiparticles beyond skyrmions and outline the emerging applications, future trends, and open challenges. A complex vectorial field structure of optical quasiparticles with versatile topological characteristics emerges as an important feature in modern spin-optics, imaging and metrology, optical forces, structured light, and topological and quantum technologies.

preprint, 2023, arXiv

General Solution to 2D Steady Navier-Stokes Equation for Incompressible Flow without vorticity diffusion

The study derives the general solution to the 2D steady Navier-Stokes equation for incompressible flow without vorticity diffusion, which is more general than Stokes flow. To obtain the general solution, two potential functions are introduced to express the velocity: a vector potential describing the rotational incompressible flow and a scalar potential describing the irrotational incompressible flow. The results show that the vorticity equation expressed with potential functions is a biharmonic function, which means that the potential functions describing the flow field are polynomials of no more than fourth degree. For a steady unidirectional shear flow, the velocity and pressure fields can be described with the vector potential expressed by a polynomial of third degree. For non-unidirectional two-dimensional steady shear flow, there may be four independent parameters in the two potential functions.

preprint, 2022, arXiv

Better Language Model with Hypernym Class Prediction

Class-based language models (LMs) have long been devised to address context sparsity in n-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and discuss future directions for the adoption of class-based LMs on a larger scale.

preprint, 2022, arXiv

Intrinsic spin-momentum dynamics of surface electromagnetic waves in complex dispersive system

Spin-momentum locking is an intrinsic property of surface electromagnetic fields and its study has led to the discovery of photonic spin lattices and diverse applications. Previously, dispersion was ignored in the spin-momentum locking, giving rise to abnormal phenomena contradictory to the physical realities. Here, we formulate four dispersive spin-momentum equations for surface waves, revealing universally that the transverse spin vector is locked with the momentum. The locking property obeys the right-hand rule in the dielectric but the left-hand rule in the dispersive metal/magnetic materials. In addition to the dispersion, the structural features can affect the spin-momentum locking significantly. Remarkably, an extraordinary longitudinal spin originating from the coupling polarization ellipticity is uncovered even for the purely polarized state. We further demonstrate the spin-momentum locking properties with diverse photonic topological lattices by engineering the rotating symmetry. The findings open up opportunities for designing robust nanodevices with practical importance in chiral quantum optics.

preprint, 2022, arXiv

Semi-global Periodic Event-triggered Output Regulation for Nonlinear Multi-agent Systems

This study focuses on the periodic event-triggered (PET) cooperative output regulation problem for a class of nonlinear multi-agent systems. The key feature of the PET mechanism is that event-triggered conditions need to be monitored only periodically. This approach is beneficial for excluding Zeno behavior and for saving the battery energy of onboard sensors. First, new PET distributed observers are proposed to estimate the leader information. We show that the estimation error converges to zero exponentially with a known convergence rate under asynchronous PET communication. Second, a novel PET output feedback controller is designed for the underlying strict feedback nonlinear multi-agent systems. Based on a state transformation technique and a local PET state observer, the cooperative semi-global output regulation problem can be solved by the proposed control design technique. Simulation results for multiple Lorenz systems illustrate that the developed control scheme is effective.

preprint, 2022, arXiv

Solution to Waves in Dissipative Media with Reciprocal Attenuation in Time and Space Domains

The study points out that the traditional solutions to the wave equation of a dissipative wave and to the motion equation of a block in a multi-degree-of-freedom mass-spring-damper system are only possible solutions, which are not necessarily objective and conflict with each other. The disturbance in a discrete system, such as crystal vibration, can be expressed in differential form. A new general solution to the dissipative wave equation is proposed using the general Fourier transform. The solution reveals that the attenuation of the disturbance can occur simultaneously in the time and space domains. The general solution is then used in case studies to analyze the properties of dissipative waves. It is concluded that the properties of waves formulated with the same equation can differ because of differences in the attenuation mechanism.

preprint, 2020, arXiv

Periodic event-triggered output regulation for linear multi-agent systems

This study considers the problem of periodic event-triggered (PET) cooperative output regulation for a class of linear multi-agent systems. The advantage of PET output regulation is that data transmission and the triggering condition need to be monitored only at discrete sampling instants. It is assumed that only a small number of agents have access to the system matrix and states of the leader. Meanwhile, the PET mechanism is considered not only in the communication between agents, but also in the sensor-to-controller and controller-to-actuator transmission channels of each agent. This problem set-up brings some challenges to the controller design and stability analysis. Based on a novel PET distributed observer, a PET dynamic output feedback control method is developed for each follower. Compared with existing works, our method naturally excludes Zeno behavior, and the inter-event times become multiples of the sampling period. Furthermore, for every follower, the minimum inter-event time can be determined a priori and computed directly without knowledge of the leader information. An example is given to verify and illustrate the effectiveness of the new design scheme.
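
To make the periodic event-triggered idea concrete, here is a minimal sketch for a single scalar system, not the multi-agent observer or controller of the paper. The plant, gain, sampling period `h`, and relative threshold `sigma` are hypothetical values chosen for illustration. The trigger condition is evaluated only at the sampling instants, so every inter-event time is a whole multiple of `h` by construction.

```python
def simulate_pet(a=1.0, k=3.0, h=0.05, sigma=0.3, x0=1.0,
                 steps=200, substeps=50):
    """Simulate x' = a*x + u with u = -k*x_hat (all values hypothetical).

    x_hat is the last transmitted state. The trigger
    |x_hat - x| > sigma*|x| is checked only at sampling instants t = j*h.
    Returns the sample indices at which events occurred and the final state.
    """
    x, x_hat = x0, x0
    event_steps = [0]            # the initial state is transmitted at t = 0
    dt = h / substeps            # Euler step used between sampling instants
    for j in range(1, steps + 1):
        for _ in range(substeps):
            x += dt * (a * x - k * x_hat)
        # Periodic check: the condition is looked at once per period h.
        if abs(x_hat - x) > sigma * abs(x):
            x_hat = x            # transmit the fresh sample
            event_steps.append(j)
    return event_steps, x

event_steps, x_final = simulate_pet()
# Inter-event gaps in units of the sampling period h (always integers >= 1).
gaps = [b - a for a, b in zip(event_steps, event_steps[1:])]
```

Because the gaps are integer counts of sampling periods, the minimum inter-event time is at least `h`, which is the sense in which a scheme of this kind rules out Zeno behavior.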

preprint, 2020, arXiv

What is Shear Wave

This study shows that the traditional definition of shear wave breaks the shear stress reciprocity. By analyzing the displacement field for a shear wave and the motion equation of a material element, it is found that the displacement field is related to local rigid body rotation, and that the local rigid body rotation cannot be balanced by the stress state assumed in classical continuum mechanics. It is also found that the displacement field caused by shear deformation, which is neither divergence nor rotation, is replaced with that caused by local rigid body rotation during the derivation of the wave equation. This indicates that the definition of shear wave is beyond the basic assumption of continuum mechanics. The study modifies the constitutive relation and elastic tensor on the premise that the traditionally defined shear wave is objective. The motion equation corresponding to the shear wave is also derived. It is concluded from the definition of the traditional shear wave that the local rigid body rotation should contribute stress and that shear stress reciprocity is not a prerequisite for a nonpolar continuum.

preprint, 2019, arXiv

A new perspective from a Dirichlet model for forecasting outstanding liabilities of nonlife insurers

Forecasting the outstanding claim liabilities to set adequate reserves is critical for a nonlife insurer's solvency. Chain-Ladder and Bornhuetter-Ferguson are two prominent actuarial approaches used for this task. The selection between the two approaches is often ad hoc due to different underlying assumptions. We introduce a Dirichlet model that provides a common statistical framework for the two approaches, with some appealing properties. Depending on the type of information available, the model inference naturally leads to either Chain-Ladder or Bornhuetter-Ferguson prediction. Using claims data on workers' compensation insurance from several US insurers, we discuss both frequentist and Bayesian inference.
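
For readers unfamiliar with the two approaches named in this abstract, the sketch below shows their classical textbook forms on a tiny cumulative run-off triangle. It does not implement the Dirichlet model; the triangle values and the prior ultimate of 200 are made-up numbers, and the function names are illustrative.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative run-off triangle.

    triangle[i][j] is the cumulative claim amount for accident year i at
    development period j; later accident years have fewer observed entries.
    """
    n = len(triangle[0])
    factors = []
    for j in range(n - 1):
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def factor_to_ultimate(row, factors):
    """Product of the development factors still to be applied to this row."""
    cdf = 1.0
    for f in factors[len(row) - 1:]:
        cdf *= f
    return cdf

def chain_ladder_ultimate(row, factors):
    """Chain-Ladder: scale the latest observed value to ultimate."""
    return row[-1] * factor_to_ultimate(row, factors)

def bf_ultimate(row, factors, prior_ultimate):
    """Bornhuetter-Ferguson: observed claims plus the expected
    unreported share of an externally supplied prior ultimate."""
    cdf = factor_to_ultimate(row, factors)
    return row[-1] + prior_ultimate * (1.0 - 1.0 / cdf)

# Hypothetical triangle: three accident years, three development periods.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
factors = development_factors(triangle)
cl = [chain_ladder_ultimate(r, factors) for r in triangle]
bf = [bf_ultimate(r, factors, prior_ultimate=200.0) for r in triangle]
```

Chain-Ladder relies entirely on the observed development pattern, while Bornhuetter-Ferguson blends observed claims with a prior expectation, which is why the choice between them depends on what information is available.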