Researcher profile

Pan Wang

Pan Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
11 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting

Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation, yet their clinical translation is hindered by architectural heterogeneity and the prevalence of factual hallucinations. Standard supervised fine-tuning often fails to strictly align linguistic outputs with visual evidence, while existing reinforcement learning approaches struggle with either prohibitive computational costs or limited exploration. To address these challenges, we propose a comprehensive framework for self-consistent radiology report generation. First, we conduct a systematic evaluation to identify optimal vision encoder and LLM backbone configurations for medical imaging. Building on this foundation, we introduce a novel "Reason-then-Summarize" architecture optimized via Group Relative Policy Optimization (GRPO). This framework restructures generation into two distinct components: a think block for detailed findings and an answer block for structured disease labels. By utilizing a multi-dimensional composite reward function, we explicitly penalize logical discrepancies between the generated narrative and the final diagnosis. Extensive experiments on the MIMIC-CXR benchmark demonstrate that our method achieves state-of-the-art performance in clinical efficacy metrics and significantly reduces hallucinations compared to strong supervised baselines.
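As a rough illustration of the idea in this abstract, the sketch below subtracts a penalty from a reward when the labels implied by the findings text disagree with the labels in the answer block, then normalises rewards within a sampled group in the style of GRPO group-relative advantages. The function names, the set-overlap penalty, and the weight are assumptions made for illustration; the abstract does not specify the individual reward terms.

```python
from statistics import mean, pstdev


def consistency_penalty(findings_labels, answer_labels):
    """Share of labels on which findings and answer disagree (0 = consistent, 1 = disjoint)."""
    findings, answer = set(findings_labels), set(answer_labels)
    union = findings | answer
    if not union:
        return 0.0  # both empty: nothing to disagree about
    return len(findings ^ answer) / len(union)


def composite_reward(accuracy, findings_labels, answer_labels, weight=1.0):
    """Hypothetical composite reward: task accuracy minus a weighted inconsistency penalty."""
    return accuracy - weight * consistency_penalty(findings_labels, answer_labels)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise each reward against the mean and std of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Normalising within the group means no separate value network is needed to estimate a baseline, which is the usual motivation for this family of methods.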

preprint, 2026, arXiv

AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatial decision making: geometric priors are compressed into lossy language, and sparse interaction is often supervised through delayed textual feedback rather than dense visually grounded signals. We argue that reusable experience for VLM agents should remain visually grounded. Based on this insight, we propose AtlasVA, a teacher-free visual skill memory framework that organizes memory into three complementary layers: spatial heatmaps, visual exemplars, and symbolic text skills. AtlasVA further evolves danger and affinity atlases directly from trajectory statistics and lightweight grid heuristics, and reuses these self-evolving atlases as potential-based shaping rewards for reinforcement learning. This unifies perception, memory, and optimization without external LLM supervision. Experiments on Sokoban, FrozenLake, 3D embodied navigation, and 3D robotic manipulation benchmarks show that AtlasVA consistently outperforms text-centric memory baselines and competitive VLM agents, with especially strong gains on spatially intensive tasks. Homepage: https://wangpan-ustc.github.io/AtlasvaWeb
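Potential-based shaping, mentioned in this abstract, adds the term gamma * Phi(s') - Phi(s) to the environment reward. The sketch below shows that general form on a grid, assuming (hypothetically) that the potential is the affinity value minus the danger value of a cell. The dictionary representation, the function names, and this choice of potential are illustrative assumptions, not details taken from the paper.

```python
def potential(cell, affinity, danger):
    """Hypothetical potential: affinity minus danger at a grid cell (0 if unseen)."""
    return affinity.get(cell, 0.0) - danger.get(cell, 0.0)


def shaped_reward(env_reward, state, next_state, affinity, danger, gamma=0.99):
    """Environment reward plus the potential-based shaping term gamma*Phi(s') - Phi(s)."""
    shaping = gamma * potential(next_state, affinity, danger) - potential(state, affinity, danger)
    return env_reward + shaping


# Toy atlases keyed by (row, col); the values are made up for the example.
affinity = {(0, 1): 1.0}
danger = {(1, 0): 1.0}
toward_goal = shaped_reward(0.0, (0, 0), (0, 1), affinity, danger, gamma=1.0)
toward_hazard = shaped_reward(0.0, (0, 0), (1, 0), affinity, danger, gamma=1.0)
```

With these toy values, moving toward the high-affinity cell earns a positive bonus and moving toward the dangerous cell earns a negative one, while the shaping form itself is known to leave the optimal policy unchanged.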

preprint, 2026, arXiv

Quantum tunnelling-integrated optoplasmonic nanotrap enables conductance visualisation of individual proteins

Biological electron transfer (ET) relies on quantum mechanical tunnelling through a dynamically folded protein. Yet, the spatiotemporal coupling between structural fluctuations and electron flux remains poorly understood, largely due to limitations in existing experimental techniques, such as ensemble averaging and non-physiological operating conditions. Here, we introduce a quantum tunnelling-integrated optoplasmonic nanotrap (QTOP-trap), an optoelectronic platform that combines plasmonic optical trapping with real-time quantum tunnelling measurements. This label-free approach enables single-molecule resolution of protein conductance in physiological electrolytes, achieving sub-3 nm spatial precision and 10-μs temporal resolution. By synchronising optoelectronic measurements, QTOP-trap resolves protein-specific conductance signatures and directly correlates tertiary structure dynamics with conductance using a "protein switch" strategy. This methodology establishes a universal framework for dissecting non-equilibrium ET mechanisms in individual conformational-active proteins, with broad implications for bioenergetics research and biomimetic quantum device design.

preprint, 2023, arXiv

Multi-scale multi-modal micro-expression recognition algorithm based on transformer

A micro-expression is a spontaneous, unconscious facial muscle movement that can reveal the true emotions people attempt to hide. Manual methods have made good progress and deep learning is gaining prominence. However, because micro-expressions are short in duration and are expressed at different scales across facial regions, existing algorithms cannot extract multi-modal, multi-scale facial region features while also taking contextual information into account to learn underlying features. To solve these problems, this paper proposes a multi-modal multi-scale algorithm based on a transformer network, which aims to fully learn local multi-grained features of micro-expressions from two modalities: motion features and texture features. To obtain local facial features at different scales, we learn patch features at different scales for both modalities, fuse multi-layer multi-head attention weights to obtain effective features by weighting the patch features, and combine this with cross-modal contrastive learning for model optimization. We conducted comprehensive experiments on three spontaneous datasets. The results show that the accuracy of the proposed algorithm in single measurement on the SMIC database is up to 78.73% and the F1 value on CASMEII of the combined database is up to 0.9071, which is at the leading level.

preprint, 2023, arXiv

Reconstruction of compressed spectral imaging based on global structure and spectral correlation

In this paper, a convolutional sparse coding method based on global structure characteristics and spectral correlation is proposed for the reconstruction of compressive spectral images. The spectral data is regarded as the sum of convolutions of the convolution kernels with their corresponding coefficients. Because the convolution kernels operate on the global image information, the structure information of the spectral image is preserved in the spatial dimension. To fully exploit the constraints between spectra, the coefficients corresponding to the convolution kernels are constrained by the L_{2,1} norm to improve spectral accuracy. In addition, to address the insensitivity of convolutional sparse coding to low frequencies, a global total-variation (TV) constraint is added to estimate the low-frequency components. This not only ensures effective estimation of the low-frequency components but also transforms the convolutional sparse coding into a de-noising process, which makes the reconstruction process simpler. Simulations show that, compared with current mainstream optimization methods, the proposed method can improve the reconstruction quality by up to 4 dB in PSNR and 10% in SSIM, and greatly improves the details of the reconstructed image.
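For readers unfamiliar with the L_{2,1} norm named in this abstract, the sketch below computes one common form of it: the sum of the Euclidean norms of the rows of a matrix. Whether the grouping runs over rows or columns, and how the coefficient maps are arranged into a matrix, are assumptions here; the abstract does not say.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (L2) norms of each row: an L1 norm over row-wise L2 norms."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Toy coefficient matrix; an all-zero row contributes nothing to the norm.
coefficients = [
    [3.0, 4.0],
    [0.0, 0.0],
    [6.0, 8.0],
]
value = l21_norm(coefficients)
```

Penalising this quantity tends to drive entire rows to zero together, which is why it is often used when entries within a group are expected to be jointly active or jointly inactive.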

preprint, 2022, arXiv

Electrical manipulation of plasmon-phonon polaritons in heterostructures of graphene on biaxial crystals

Phonon polaritons in natural anisotropic crystals hold great promise for infrared nano-optics. However, the direct electrical control of these polaritons is difficult, preventing the development of active polaritonic devices. Here we propose heterostructures of graphene on a biaxial crystal (α-phase molybdenum trioxide) slab and theoretically study how the hybridized plasmon-phonon polaritons depend on the Fermi level of graphene from three aspects: dispersion relationships, iso-frequency contours, and the quantum spin Hall effects. We demonstrate the distinct wavelength tunability of the plasmon-phonon polariton modes and the optical topological transitions from open (hyperbolic) to closed (bow-tie-like) iso-frequency contours as the Fermi level of graphene increases. Furthermore, we observe the tunable quantum spin Hall effects of the plasmon-phonon polaritons, which manifest as propagation direction switching when the Fermi level of the graphene is tuned. Our findings open opportunities for novel electrically tunable polaritonic devices and programmable quantum optical networks.

preprint, 2022, arXiv

Experimental Performance Evaluation of Cell-free Massive MIMO Systems Using COTS RRU with OTA Reciprocity Calibration and Phase Synchronization

Downlink coherent multiuser transmission is an essential technique for cell-free massive multiple-input multiple-output (MIMO) systems, and the availability of channel state information (CSI) at the transmitter is a basic requirement. To avoid CSI feedback in a time-division duplex system, the uplink channel parameters must be calibrated to obtain the downlink CSI, because of the radio frequency circuit mismatch of the transceiver. In this paper, a design of a reference signal for over-the-air reciprocity calibration is proposed. The reference signals, generated in the frequency domain, can make full use of the flexible frame structure of the fifth generation (5G) new radio and can be completely transparent to commercial off-the-shelf (COTS) remote radio units (RRUs) and commercial user equipments. To further calibrate multiple RRUs, an interleaved RRU grouping with a genetic algorithm is proposed, and an averaged Argos calibration algorithm is also presented. We develop a cell-free massive MIMO prototype system with COTS RRUs, demonstrate the statistical characteristics of the calibration error and the effectiveness of the calibration algorithm, and evaluate the impact of the calibration delay on the different cooperative transmission schemes.

preprint, 2022, arXiv

Radiation build-up and dissipation in random fiber laser

The random fiber laser (RFL) is a complex physical system that arises from distributed amplification and the intrinsic stochasticity of fiber scattering. There has been widespread interest in analyzing the underlying lightwave kinetics at steady state. However, the transient state, such as the RFL build-up and dissipation, is also particularly important for unfolding the lightwave interaction process. Here, we investigate for the first time the RFL dynamics at transient state, and track the RFL temporal and spectral evolution theoretically and experimentally. In particular, with the contribution of randomly distributed feedback, the build-up of the RFL shows continuous Verhulst logistic growth curves without cavity-related features, which is significantly different from the step-like growth curve of conventional fiber lasers. Furthermore, the radiation build-up duration is inversely related to the pump power, and the spectral evolution of the RFL undergoes two phases, from spectral density increase to spectral broadening. From steady state to the pump switch-off state, the RFL output power dissipates immediately, and the remaining Stokes lightwave from the Rayleigh scattering gradually disappears after one round-trip. This work provides new insights into the transient dynamic features of the RFL.
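The Verhulst logistic curve referred to in this abstract has the closed form P(t) = K / (1 + ((K - P0) / P0) * exp(-r * t)), which rises smoothly from an initial level P0 to a saturation level K. The sketch below evaluates that textbook form. The parameter names and the numerical values are placeholders chosen for illustration, not quantities fitted or reported in the paper.

```python
import math


def logistic_power(t, p0, k, r):
    """Verhulst logistic growth: starts at p0 (t = 0) and saturates at k with rate r."""
    return k / (1.0 + ((k - p0) / p0) * math.exp(-r * t))


# Placeholder parameters (arbitrary units), sampled at a few instants.
samples = [logistic_power(t, p0=1.0, k=10.0, r=2.0) for t in (0.0, 1.0, 2.0, 5.0)]
```

The sampled values increase monotonically with no steps, which is the qualitative contrast the abstract draws with the step-like build-up of conventional cavity-based fiber lasers.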

preprint, 2021, arXiv

A General Framework for Revealing Human Mind with auto-encoding GANs

Addressing the question of visualising the human mind could help us to find regions that are associated with observed cognition and responsible for expressing the elusive mental image, leading to a better understanding of cognitive function. The traditional approach treats brain decoding as a classification problem, reading the mind through statistical analysis of brain activity. However, human thought is rich and varied, and it is often influenced more by a combination of object features than by a specific category. For this reason, we propose an end-to-end brain decoding framework which translates brain activity into an image by latent space alignment. To find the correspondence from brain signal features to image features, we embed them into two latent spaces with modality-specific encoders and then align the two spaces by minimising the distance between paired latent representations. The proposed framework was trained on simultaneous electroencephalogram and functional MRI data, which were recorded while the subjects were viewing or imagining a set of image stimuli. In this paper, we focus on implementing the fMRI experiment. Our experimental results demonstrate the feasibility of translating brain activity to an image. The reconstructed images approximately match the image stimuli in both shape and colour. Our framework provides a promising direction for building a direct visualisation to reveal the human mind.
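The alignment step described in this abstract, minimising the distance between paired latent representations, can be sketched as a simple loss over paired vectors. The version below uses the mean squared Euclidean distance, which is an assumption: the abstract does not state which distance is used, and the function name and list-of-lists representation are hypothetical.

```python
def alignment_loss(brain_latents, image_latents):
    """Mean squared Euclidean distance between paired latent vectors."""
    if not brain_latents or len(brain_latents) != len(image_latents):
        raise ValueError("need the same non-zero number of paired latent vectors")
    total = 0.0
    for brain_vec, image_vec in zip(brain_latents, image_latents):
        # Squared distance between one brain latent and its paired image latent.
        total += sum((b - v) ** 2 for b, v in zip(brain_vec, image_vec))
    return total / len(brain_latents)


# Toy pair: a brain latent at the origin and an image latent at (3, 4).
loss = alignment_loss([[0.0, 0.0]], [[3.0, 4.0]])
```

In a training loop, a loss of this kind would be minimised with respect to the encoder parameters so that paired brain and image inputs land close together in the shared latent space.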

preprint, 2021, arXiv

ByteSGAN: A Semi-supervised Generative Adversarial Network for Encrypted Traffic Classification of SDN Edge Gateway in Green Communication Network

With the rapid development of the Green Communication Network, the types and quantity of network traffic data are increasing accordingly. Network traffic classification has become a non-trivial research task in the area of network management and security, which not only helps to improve fine-grained network resource allocation but also enables policy-driven network management. Meanwhile, the combination of SDN and Edge Computing can leverage both the network-wide global visibility of SDN and the low latency and good privacy preservation of Edge Computing. However, capturing large labeled datasets is cumbersome and time-consuming manual labor. Semi-supervised learning is an appropriate technique to overcome this problem. With that in mind, we propose a Generative Adversarial Network (GAN)-based semi-supervised learning method for encrypted traffic classification called ByteSGAN, embedded in the SDN Edge Gateway, to achieve fine-grained traffic classification and further improve network resource utilization. ByteSGAN can use only a small number of labeled traffic samples and a large number of unlabeled samples to achieve good traffic classification performance by modifying the structure and loss function of the regular GAN discriminator network in a semi-supervised learning way. Based on the public dataset 'ISCX2012 VPN-nonVPN', two experimental results show that ByteSGAN can efficiently improve the performance of the traffic classifier and outperform other supervised learning methods such as CNN.