Source author record

Weipeng Zhang

Weipeng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.optics, Computer Vision, physics.app-ph, Artificial Intelligence, Computation and Language, eess.SP, eess.SY, Machine Learning, Systems and Control

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage-weighted policy gradient suffers from three structural weaknesses: high-variance updates, vanishing gradients in zero-advantage regions, and exploration bottlenecks when corrective signals are insufficient. We therefore propose Asymmetric On-Policy Distillation (AOPD), which replaces ineffective negative reinforcement with localized divergence minimization in non-positive-advantage regions while preserving reinforcement learning in positive-advantage regions. Experiments on mathematical reasoning benchmarks show that AOPD consistently outperforms standard OPD, with average gains of 4.09 and 8.34 under strong and weak initialization, respectively. AOPD also maintains higher policy entropy during training and better capability retention during sequential tool-use adaptation.
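A minimal per-token sketch of the asymmetric idea, assuming a discrete vocabulary, a scalar advantage per token, and KL(student || teacher) at the current position as the "localized divergence". The function names and the KL direction are hypothetical; the abstract does not specify them. The code computes loss values only (no autograd).

```python
import math
from typing import Sequence

def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl(p_logp: Sequence[float], q_logp: Sequence[float]) -> float:
    """KL(p || q) from log-probabilities over the same vocabulary."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(p_logp, q_logp))

def aopd_token_loss(student_logits, teacher_logits, token: int, advantage: float) -> float:
    """Hypothetical asymmetric per-token loss.

    advantage > 0  -> policy-gradient surrogate: -A * log pi_student(token)
    advantage <= 0 -> localized divergence: KL(student || teacher) at this position
    """
    s = log_softmax(student_logits)
    if advantage > 0:
        return -advantage * s[token]
    t = log_softmax(teacher_logits)
    return kl(s, t)
```

With a zero advantage the plain surrogate `-A * log p` is identically zero, which is the vanishing-gradient case the abstract describes, whereas the divergence term stays non-zero whenever student and teacher disagree.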

preprint · 2026 · arXiv

Compact, Large-Scale Photonic Neurons by Modulation-and-Weight Microring Resonators

Neuromorphic photonics promises sub-nanosecond latency, ultrawide bandwidth, and high parallelism, but practical scalability is constrained by fabrication tolerances, spectral alignment, and tuning energy. Here, we present a large-scale, compact, and reconfigurable photonic neuron in which each microring performs modulation and weighting simultaneously. By exploiting both carrier and thermal tuning within a single device, this architecture reduces footprint, relaxes spectral alignment requirements to just two optical components, and yields a steep transfer response that lowers tuning energy. The proposed neuron supports multiple operating configurations, allowing its dynamical behavior to be adapted to different computational tasks. In particular, a short electrical feedback path enables recurrent operation, providing tunable short- and long-term memory for temporal processing. Using a 10-microring resonator array, we demonstrate both spatial and temporal computing, including a 3×3 convolution for image processing with an error below 5% and high-frequency financial time-series prediction. Each modulation-weighting element occupies 80 × 45 µm² and consumes an average of 0.186 mW, corresponding to a compute density of 4.67 TOPS/mm². Excluding electronic power, the on-chip tuning efficiency reaches approximately 105 TOPS/W, which is comparable to state-of-the-art implementations. These results indicate that modulation-and-weighting microring resonator banks provide a scalable building block for large-scale neuromorphic photonic systems, offering a favorable combination of compact footprint, low power consumption, and functional flexibility.
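The footprint, power, and density figures above can be cross-checked with unit arithmetic. This sketch assumes the compute density (read as TOPS/mm²) is normalized by a single modulation-weighting element's footprint; the abstract does not state the normalization, so the derived per-element numbers are illustrative only.

```python
# Figures quoted in the abstract
element_w_um, element_h_um = 80.0, 45.0   # element footprint, micrometres
power_mw = 0.186                          # average power per element, mW
density_tops_per_mm2 = 4.67               # quoted compute density, TOPS/mm^2

# 1 um = 1e-3 mm, so the footprint in mm^2 is the product of the converted sides
area_mm2 = (element_w_um * 1e-3) * (element_h_um * 1e-3)

# Assumption: density = per-element throughput / per-element area
throughput_gops = density_tops_per_mm2 * area_mm2 * 1e3        # TOPS -> GOPS
efficiency_tops_per_w = (throughput_gops * 1e-3) / (power_mw * 1e-3)

print(f"area per element: {area_mm2:.4f} mm^2")
print(f"implied throughput per element: {throughput_gops:.1f} GOPS")
print(f"implied efficiency at {power_mw} mW: {efficiency_tops_per_w:.0f} TOPS/W")
```

Under this assumption each element delivers about 16.8 GOPS, or roughly 90 TOPS/W at the full 0.186 mW. That is below the quoted ~105 TOPS/W, which is consistent with the latter counting only part of the element's power (the abstract says it excludes electronic power), though the exact breakdown is not given.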

preprint · 2026 · arXiv

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

Interpreting ultra-high-resolution (UHR) remote sensing images requires models to search for sparse and tiny visual evidence across large-scale scenes. Existing remote sensing vision-language models can inspect local regions with zooming and cropping tools, but most exploration strategies follow either a one-shot focus or a single sequential trajectory. Such single-path exploration can lose global context, leave scattered regions unvisited, and revisit or count the same evidence multiple times. To this end, we propose GeoVista, a planning-driven active perception framework for UHR remote sensing interpretation. Instead of committing to one zooming path, GeoVista first builds a global exploration plan, then verifies multiple candidate regions through branch-wise local inspection, while maintaining an explicit evidence state for cross-region aggregation and de-duplication. To enable this behavior, we introduce APEX-GRO, a cold-start supervised trajectory corpus that reformulates diverse UHR tasks as Global-Region-Object interactive reasoning processes with a unified, scale-invariant spatial representation. We further design an Observe-Plan-Track mechanism for global observation, adaptive region inspection, and evidence tracking, and align the model with a GRPO-based strategy using step-wise rewards for planning, localization, and final answer correctness. Experiments on RSHR-Bench, XLRS-Bench, and LRS-VQA show that GeoVista achieves state-of-the-art performance. Code and dataset are available at https://github.com/ryan6073/GeoVista
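A minimal sketch of an explicit evidence state that aggregates detections from several inspected regions and removes duplicates. It assumes boxes are mapped from crop-local pixels to image-normalized [0, 1] coordinates (one possible scale-invariant representation) and that two same-label detections with IoU above a threshold are the same object. `to_global`, `EvidenceState`, and the threshold are hypothetical and are not GeoVista's API.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)

def to_global(local: Box, crop: Box, crop_px: tuple[int, int]) -> Box:
    """Map a box in crop pixels to image-normalized coordinates.

    crop is the crop's extent in normalized image coordinates;
    crop_px is the crop's (width, height) in pixels."""
    cx0, cy0, cx1, cy1 = crop
    sx = (cx1 - cx0) / crop_px[0]
    sy = (cy1 - cy0) / crop_px[1]
    x0, y0, x1, y1 = local
    return (cx0 + x0 * sx, cy0 + y0 * sy, cx0 + x1 * sx, cy0 + y1 * sy)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

@dataclass
class EvidenceState:
    iou_threshold: float = 0.5
    items: list[tuple[str, Box]] = field(default_factory=list)

    def add(self, label: str, box: Box) -> bool:
        """Record evidence unless it duplicates an existing item; return True if new."""
        for other_label, other_box in self.items:
            if other_label == label and iou(box, other_box) >= self.iou_threshold:
                return False
        self.items.append((label, box))
        return True

    def count(self, label: str) -> int:
        return sum(1 for lbl, _ in self.items if lbl == label)
```

If two overlapping crops both see the same ship, both detections map to the same global box, so the second `add` is rejected and the object is counted once.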

preprint · 2026 · arXiv

SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

Remote sensing vision-language models commonly rely on pretrained visual encoders to convert images into semantic features before language-model reasoning. While effective for scene-level understanding, this pipeline may prematurely compress local visual evidence, making fine-grained spatial reasoning vulnerable to language priors, especially in ultra-high-resolution remote sensing imagery. We present SkyNative, a native multimodal framework for remote sensing that adopts an encoder-free architecture, removing the pretrained visual backbone to directly represent images as raw patch tokens in the language-model token space. To reconcile low-level visual patches with textual tokens, SkyNative introduces a modality-aware decoupling mechanism that uses modality-specific parameters within a unified autoregressive backbone. We further introduce a visual reliance benchmark that diagnoses whether models ground their answers in image evidence through progressive visual degradation and misleading textual prompts. Across standard remote sensing understanding tasks and large-format spatial reasoning evaluations, SkyNative shows stronger image-grounded perception and improved robustness against prompt-induced language priors. These results suggest that native patch-level multimodal modeling is a promising direction for reliable remote sensing vision-language reasoning.
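A minimal sketch of the encoder-free idea. An image is cut into non-overlapping raw pixel patches and each patch is flattened into a vector. Every token in the shared sequence is then processed by the weight matrix that matches its modality tag. The patch size, the toy linear layers, and all names here are hypothetical; the abstract does not describe SkyNative's exact layers.

```python
def patchify(image: list[list[float]], p: int) -> list[list[float]]:
    """Split an H x W single-channel image into flattened p x p patches (row-major)."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be a multiple of p"
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patches.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return patches

def linear(weights: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

def modality_aware_forward(tokens, modalities, params):
    """Apply the parameter set that matches each token's modality tag.

    tokens: list of vectors; modalities: parallel list of "image"/"text";
    params: dict mapping modality -> weight matrix (hypothetical decoupling)."""
    return [linear(params[m], t) for t, m in zip(tokens, modalities)]
```

Because no pretrained encoder sits in front of the patches, the raw pixel values reach the backbone directly, and only the modality-specific weights distinguish image tokens from text tokens.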

preprint · 2025 · arXiv

Online training and pruning of multi-wavelength photonic neural networks

CMOS-compatible photonic integrated circuits (PICs) are emerging as a promising platform for artificial intelligence (AI) computing. Owing to the compact footprint of microring resonators (MRRs) and the enhanced interconnect efficiency enabled by wavelength division multiplexing (WDM), MRR-based photonic neural networks (PNNs) are particularly promising for large-scale integration. However, the scalability and energy efficiency of such systems are fundamentally limited by MRR resonance wavelength variations induced by fabrication process variations (FPVs) and environmental fluctuations. Existing solutions use post-fabrication approaches or thermo-optic tuning, incurring high control power and additional process complexity. In this work, we introduce an online training and pruning method that addresses this challenge by adapting to FPV-induced and thermally induced shifts in MRR resonance wavelength. By incorporating a power-aware pruning term into the conventional loss function, our approach simultaneously optimizes PNN accuracy and the total power consumed for MRR tuning. In proof-of-concept on-chip experiments on the Iris dataset, our PNNs adaptively train to maintain 96% classification accuracy while achieving a 44.7% reduction in tuning power via pruning. Additionally, our approach reduces power consumption by orders of magnitude on larger datasets. By addressing chip-to-chip variation and minimizing power requirements, our approach significantly improves the scalability and energy efficiency of MRR-based integrated analog photonic processors, paving the way for large-scale PICs to enable versatile applications including neural networks, photonic switching, LiDAR, and radio-frequency beamforming.
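The power-aware objective can be sketched as a task loss plus a penalty proportional to the total MRR tuning power. This sketch assumes a hypothetical power model in which each ring's tuning power grows linearly with the magnitude of its weight. It also assumes a ring is pruned (weight and tuning power set to zero) when that magnitude falls below a threshold. The power model, `lam`, and the threshold are illustrative, not the paper's values.

```python
import math

def cross_entropy(logits: list[float], label: int) -> float:
    """Softmax cross-entropy for one sample."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[label]

def tuning_power_mw(weights: list[float], mw_per_unit: float = 1.0) -> float:
    """Hypothetical model: each ring's tuning power is proportional to |weight|."""
    return sum(mw_per_unit * abs(w) for w in weights)

def power_aware_loss(logits, label, weights, lam: float = 0.01) -> float:
    """Task loss plus lam * total tuning power (an L1-style sparsity penalty)."""
    return cross_entropy(logits, label) + lam * tuning_power_mw(weights)

def prune(weights: list[float], threshold: float) -> list[float]:
    """Switch off rings whose |weight| is below threshold (weight and power -> 0)."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

Under this linear power model the penalty behaves like L1 regularization. It tends to push small weights toward zero during training, which is what lets threshold pruning remove rings with little accuracy cost.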

preprint · 2022 · arXiv

Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

Data-driven methods have achieved notable performance on intent detection, the task of understanding user queries. Nonetheless, they are criticized for making over-confident predictions. In some scenarios, users care not only about a model's accuracy but also about its confidence. Unfortunately, mainstream neural networks are poorly calibrated, with a large gap between accuracy and confidence. To address this problem, known as confidence calibration, we propose a model using a hyperspherical space and a rebalanced accuracy-uncertainty loss. Specifically, we project the label vectors uniformly onto a hyperspherical space to generate a dense label representation matrix, which mitigates the over-confident predictions caused by overfitting to the sparse one-hot label matrix. In addition, we rebalance samples with different accuracy and uncertainty to better guide model training. Experiments on open datasets verify that our model outperforms existing calibration methods and achieves a significant improvement on the calibration metric.
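One way to place K label vectors uniformly on a hypersphere is to use the vertices of a regular simplex. Each one-hot vector is centered by subtracting 1/K from every coordinate, then normalized. Every pair then has the same cosine similarity, -1/(K-1), and unlike one-hot rows every coordinate is non-zero. This sketch assumes that construction and cosine-similarity scoring for illustration; the abstract does not say how the paper generates its dense label matrix.

```python
import math

def simplex_labels(k: int) -> list[list[float]]:
    """K unit vectors in R^K, centered so all pairwise cosines equal -1/(K-1)."""
    vecs = []
    for i in range(k):
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def predict(embedding: list[float], labels: list[list[float]]) -> int:
    """Pick the label whose hyperspherical prototype is most similar."""
    return max(range(len(labels)), key=lambda i: cosine(embedding, labels[i]))
```

For K = 4 the pairwise cosine is -1/3, so the targets are as far apart as K points on a sphere can be, instead of being the orthogonal but sparse one-hot axes.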

preprint · 2022 · arXiv

Design Automation of Photonic Resonator Weights

Neuromorphic photonic processors based on resonator weight banks are an emerging candidate technology for enabling modern artificial intelligence (AI) in high-speed analog systems. These purpose-built analog devices implement vector multiplications with the physics of resonator devices, offering efficiency, latency, and throughput advantages over equivalent electronic circuits. Along with these advantages, however, come the difficult challenges of compensating for fabrication variations and environmental disturbances. In this paper, we review sources of variation and disturbance from our experiments and mathematically define quantities that model them. We then describe how the physics of resonators can be exploited to weight and sum multiwavelength signals. Finally, we outline the automated design and control methodologies necessary to create practical, manufacturable, and high-accuracy/precision resonator weight banks that can withstand operating conditions in the field. This represents a road map for unlocking the potential of resonator weight banks in practical deployment scenarios.
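How a resonator can weight a wavelength channel can be illustrated with an idealized, lossless add-drop microring. Near resonance the drop-port transmission is approximately Lorentzian and the through port carries the remainder. A balanced photodetector subtracts the two, so each channel receives an effective weight in [-1, 1] set by the ring's detuning, and the photocurrent sums the weighted channels. This is a textbook approximation with hypothetical parameter names, not the paper's model, and it ignores loss, crosstalk, and thermal drift.

```python
def drop_transmission(detuning_nm: float, fwhm_nm: float) -> float:
    """Lorentzian drop-port power transmission of an ideal lossless ring."""
    return 1.0 / (1.0 + (2.0 * detuning_nm / fwhm_nm) ** 2)

def ring_weight(detuning_nm: float, fwhm_nm: float) -> float:
    """Balanced detection: weight = T_drop - T_through = 2 * T_drop - 1."""
    t_drop = drop_transmission(detuning_nm, fwhm_nm)
    t_through = 1.0 - t_drop  # lossless: power not dropped stays on the bus
    return t_drop - t_through

def weighted_sum(inputs: list[float], detunings_nm: list[float], fwhm_nm: float) -> float:
    """Balanced photocurrent from a WDM weight bank: one ring per wavelength channel."""
    return sum(x * ring_weight(d, fwhm_nm) for x, d in zip(inputs, detunings_nm))
```

On resonance the weight is +1. At a detuning of half the linewidth it is 0, and far from resonance it approaches -1. This steep, detuning-dependent mapping is why fabrication offsets and temperature drift translate directly into weight errors.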

preprint · 2022 · arXiv

Silicon microring synapses enable photonic deep learning beyond 9-bit precision

Deep neural networks (DNNs) consist of layers of neurons interconnected by synaptic weights. High bit precision in the weights is generally required to guarantee high accuracy in many applications. Minimizing error accumulation between layers is also essential when building large-scale networks. Recent demonstrations of photonic neural networks are limited in bit precision due to crosstalk and the high sensitivity of optical components (e.g., resonators). Here, we experimentally demonstrate a record-high precision of 9 bits with a dithering control scheme for photonic synapses. We then numerically simulate the impact of increased synaptic precision on a wireless signal classification application. This work could help realize the potential of photonic neural networks for many practical, real-world tasks.
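Dithering control is commonly implemented as first-harmonic (lock-in) detection. A small sinusoidal dither is added to the tuning drive, and the optical output is demodulated at the dither frequency. For a small dither amplitude a, the result is about (a/2) times the slope of the transfer function, which can serve as a feedback error signal. The sketch below assumes a normalized Lorentzian response and illustrative parameter values; the abstract does not detail the paper's control loop.

```python
import math

def lorentzian(x: float, x0: float = 0.0, fwhm: float = 1.0) -> float:
    """Normalized resonance response versus tuning (arbitrary units)."""
    return 1.0 / (1.0 + (2.0 * (x - x0) / fwhm) ** 2)

def lockin_error(bias: float, dither_amp: float = 0.01, n: int = 1000) -> float:
    """Demodulate the response at the dither frequency over one full period.

    For a small dither a*sin(wt), T(bias + a*sin) ~ T(bias) + a*T'(bias)*sin,
    and averaging against sin(wt) leaves ~ (a/2) * T'(bias)."""
    acc = 0.0
    for k in range(n):
        s = math.sin(2.0 * math.pi * k / n)
        acc += lorentzian(bias + dither_amp * s) * s
    return acc / n

def slope(bias: float, h: float = 1e-6) -> float:
    """Central-difference derivative, used here only for comparison."""
    return (lorentzian(bias + h) - lorentzian(bias - h)) / (2.0 * h)
```

The error signal changes sign across the resonance peak and is zero at the peak, so a loop that drives it to zero locks the ring there. Holding a weight elsewhere on the slope would require a non-zero setpoint.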

preprint · 2021 · arXiv

Sub-Nyquist Sampling with Optical Pulses for Photonic Blind Source Separation

We proposed and demonstrated an optical pulse sampling method for photonic blind source separation. It can separate wideband mixed signals using a low sampling frequency, which can reduce the digital signal processing workload.
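For an instantaneous linear mixture x = A s, sampling commutes with mixing. The low-rate samples of the mixtures are therefore the same mixture A of the low-rate samples of the sources, even though each waveform is aliased. If the aliased sources stay uncorrelated and have distinct autocorrelations, the unmixing matrix can still be estimated from them. The sketch below swaps in AMUSE, a standard second-order separation algorithm (whitening followed by diagonalizing one lagged covariance), and uses digital decimation in place of optical pulse sampling. The tones, sampling rate, and mixing matrix are hypothetical; the abstract does not say which separation algorithm the paper uses.

```python
import math

def sym_eig2(a: float, b: float, d: float):
    """Eigen-decomposition of the symmetric 2x2 matrix [[a, b], [b, d]] via a Jacobi rotation."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    vecs = ((c, s), (-s, c))
    vals = (a * c * c + 2.0 * b * c * s + d * s * s,
            a * s * s - 2.0 * b * c * s + d * c * c)
    return vals, vecs

def amuse_2x2(x1: list[float], x2: list[float], lag: int = 1) -> list[list[float]]:
    """Estimate a 2x2 unmixing matrix B: whiten, then diagonalize one lagged covariance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    # Whitening matrix W = C0^(-1/2) from the zero-lag covariance C0
    c11 = sum(v * v for v in x1) / n
    c12 = sum(u * v for u, v in zip(x1, x2)) / n
    c22 = sum(v * v for v in x2) / n
    vals, vecs = sym_eig2(c11, c12, c22)
    W = [[sum(v[i] * v[j] / math.sqrt(lam) for lam, v in zip(vals, vecs))
          for j in range(2)] for i in range(2)]
    z1 = [W[0][0] * a + W[0][1] * b for a, b in zip(x1, x2)]
    z2 = [W[1][0] * a + W[1][1] * b for a, b in zip(x1, x2)]
    # Symmetrized lagged covariance of the whitened signals
    m = n - lag
    d11 = sum(z1[t] * z1[t + lag] for t in range(m)) / m
    d22 = sum(z2[t] * z2[t + lag] for t in range(m)) / m
    d12 = sum(z1[t] * z2[t + lag] + z2[t] * z1[t + lag] for t in range(m)) / (2 * m)
    _, us = sym_eig2(d11, d12, d22)
    # B = U^T W: each row is one eigenvector applied to the whitening matrix
    return [[u[0] * W[0][j] + u[1] * W[1][j] for j in range(2)] for u in us]

# Hypothetical example: two wideband tones sampled at 100 MHz, far below Nyquist
F1, F2, FS, N = 2.13e9, 3.71e9, 100e6, 2000   # tones alias to 30 MHz and 10 MHz
s1 = [math.sin(2 * math.pi * F1 * k / FS) for k in range(N)]
s2 = [math.sin(2 * math.pi * F2 * k / FS) for k in range(N)]
A = [[0.8, 0.3], [0.4, 0.9]]                   # hypothetical mixing matrix
x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
B = amuse_2x2(x1, x2)
```

If separation succeeds, the product B·A is close to a scaled permutation matrix, meaning each output recovers one source up to order and scale.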

preprint · 2014 · arXiv

A Latent Clothing Attribute Approach for Human Pose Estimation

As a fundamental technique relevant to several vision tasks such as image parsing, action recognition, and clothing retrieval, human pose estimation (HPE) has been extensively investigated in recent years. To achieve accurate and reliable estimation of the human pose, it is well recognized that clothing attributes are useful and should be utilized properly. Most previous approaches, however, require manual annotation of the clothing attributes and are therefore very costly. In this paper, we propose and explore a latent clothing attribute approach for HPE. Unlike previous approaches, our approach models the clothing attributes as latent variables and thus requires no explicit labeling of the clothing attributes. Inference of the latent variables is accomplished within the framework of latent structured support vector machines (LSSVM). We employ an alternating direction strategy to train the LSSVM model: in each iteration, one kind of variable (e.g., human pose or clothing attribute) is fixed and the others are optimized. Our extensive experiments on two real-world benchmarks demonstrate the state-of-the-art performance of the proposed approach.
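The alternating strategy can be illustrated on a toy problem. Given a joint score over a pose variable and a latent clothing attribute, fix one variable and maximize over the other, repeating until neither changes. The score table, labels, and function name are hypothetical; in the paper this step is applied within LSSVM training over real image features, which this sketch does not model.

```python
def alternate(score, poses, attrs, pose0, max_iters: int = 100):
    """Coordinate ascent over (pose, attribute) for a joint score function.

    Each iteration fixes the attribute and picks the best pose, then fixes the
    pose and picks the best attribute. Stops when the pair stops changing.
    Returns a local (not necessarily global) maximum."""
    pose = pose0
    attr = max(attrs, key=lambda a: score(pose, a))
    for _ in range(max_iters):
        new_pose = max(poses, key=lambda p: score(p, attr))
        new_attr = max(attrs, key=lambda a: score(new_pose, a))
        if (new_pose, new_attr) == (pose, attr):
            break
        pose, attr = new_pose, new_attr
    return pose, attr

# Hypothetical joint scores for two poses and two clothing attributes
table = {("standing", "dress"): 3.0, ("standing", "trousers"): 1.0,
         ("sitting", "dress"): 0.5, ("sitting", "trousers"): 2.0}

def toy_score(pose: str, attr: str) -> float:
    return table[(pose, attr)]

print(alternate(toy_score, ["standing", "sitting"], ["dress", "trousers"], "sitting"))
```

Started from "sitting", the procedure stops at ("sitting", "trousers") with score 2.0 rather than the global optimum of 3.0. Each step can only improve one block of variables at a time, which is why initialization matters for alternating schemes.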

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint