Source author record

Jiawei Shao

Jiawei Shao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence; astro-ph.CO; eess.SP; Machine Learning; astro-ph.GA; Computation and Language; Computer Vision; Cryptography and Security; Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

10 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Large Language Model (LLM) training often optimizes for preference alignment, rewarding outputs that are perceived as helpful and interaction-friendly. However, this preference-oriented objective can be exploited: manipulative prompts can steer responses toward user-appeasing agreement and away from truth-oriented correction. In this work, we investigate whether aligned models are vulnerable to Preference-Undermining Attacks (PUA), a class of manipulative prompting strategies designed to exploit the model's desire to please user preferences at the expense of truthfulness. We propose a diagnostic methodology that provides a finer-grained and more directive analysis than aggregate benchmark scores, using a factorial evaluation framework to decompose prompt-induced shifts into interpretable effects of system objectives (truth- vs. preference-oriented) and PUA-style dialogue factors (directive control, personal derogation, conditional approval, reality denial) within a controlled $2 \times 2^4$ design. Surprisingly, more advanced models are sometimes more susceptible to manipulative prompts. Beyond the dominant reality-denial factor, we observe model-specific sign reversals and interactions with PUA-style factors, suggesting tailored defenses rather than uniform robustness. These findings offer a novel, reproducible factorial evaluation methodology that provides finer-grained diagnostics for post-training processes like RLHF, enabling better trade-offs in the product iteration of LLMs by offering a more nuanced understanding of preference alignment risks and the impact of manipulative prompts.
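The controlled $2 \times 2^4$ design described above can be enumerated mechanically. The sketch below is illustrative only: the factor names are taken from the abstract, but the cell layout, the `main_effect` helper, and the synthetic scores are assumptions made for this example, not the authors' code or data.

```python
from itertools import product

# Levels named in the abstract; the identifiers themselves are hypothetical.
OBJECTIVES = ("truth", "preference")
FACTORS = (
    "directive_control",
    "personal_derogation",
    "conditional_approval",
    "reality_denial",
)


def design_cells():
    """Enumerate every cell of the 2 x 2^4 full factorial design."""
    cells = []
    for objective, levels in product(OBJECTIVES, product((0, 1), repeat=len(FACTORS))):
        cells.append({"objective": objective, **dict(zip(FACTORS, levels))})
    return cells


def main_effect(cells, scores, factor):
    """Mean score with the factor on minus mean score with the factor off."""
    on = [s for c, s in zip(cells, scores) if c[factor] == 1]
    off = [s for c, s in zip(cells, scores) if c[factor] == 0]
    return sum(on) / len(on) - sum(off) / len(off)


cells = design_cells()  # 2 * 2**4 = 32 cells

# Synthetic scores (NOT results from the paper): accuracy drops by 0.5
# whenever the reality-denial factor is switched on.
scores = [1.0 - 0.5 * c["reality_denial"] for c in cells]
print(len(cells), main_effect(cells, scores, "reality_denial"))
```

Because the design is balanced, each main effect is a simple difference of means over 16 cells per level; interaction effects would need a regression or ANOVA step that this sketch leaves out.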

preprint · 2026 · arXiv

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs and passive perception, lacking the ability to adaptively determine when, whether, and how to observe the interface. We present GUI-Eyes, a reinforcement learning framework for active visual perception in GUI tasks. To acquire more informative observations, the agent learns to make strategic decisions on both whether and how to invoke visual tools, such as cropping or zooming, within a two-stage reasoning process. To support this behavior, we introduce a progressive perception strategy that decomposes decision-making into coarse exploration and fine-grained grounding, coordinated by a two-level policy. In addition, we design a spatially continuous reward function tailored to tool usage, which integrates both location proximity and region overlap to provide dense supervision and alleviate the reward sparsity common in GUI environments. On the ScreenSpot-Pro benchmark, GUI-Eyes-3B achieves 44.8% grounding accuracy using only 3k labeled samples, significantly outperforming both supervised and RL-based baselines. These results highlight that tool-aware active perception, enabled by staged policy reasoning and fine-grained reward feedback, is critical for building robust and data-efficient GUI agents.
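A reward that blends location proximity with region overlap could look roughly like the following. The abstract does not give the functional form, so the exponential distance term, the `scale` and `weight` parameters, and all function names here are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def proximity(pred, target, scale):
    """Decays smoothly from 1 toward 0 as the box centers move apart."""
    (px, py), (tx, ty) = center(pred), center(target)
    return math.exp(-math.hypot(px - tx, py - ty) / scale)


def tool_reward(pred, target, scale=100.0, weight=0.5):
    """Hypothetical dense reward: weighted sum of proximity and overlap."""
    return weight * proximity(pred, target, scale) + (1.0 - weight) * iou(pred, target)


target = (100, 100, 200, 200)
near_miss = (210, 100, 310, 200)  # no overlap, but close to the target
far_miss = (600, 100, 700, 200)   # no overlap, far from the target
print(tool_reward(near_miss, target) > tool_reward(far_miss, target))
```

The proximity term is what keeps the signal dense: two crops that both miss the target entirely have zero overlap, yet the nearer one still earns the larger reward.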

preprint · 2026 · arXiv

ScRPO: From Errors to Insights

We introduce Self-correction Relative Policy Optimization (ScRPO), a novel reinforcement learning framework designed to empower large language models with advanced mathematical reasoning capabilities through iterative self-reflection and error correction. The ScRPO framework operates in two distinct phases: (1) Trial-and-error learning stage, where the model is trained via GRPO, and incorrect responses are collected to form an "error pool"; and (2) Self-correction learning stage, which guides the model to introspectively analyze and rectify the reasoning flaws behind its previous errors. Extensive evaluations across challenging mathematical benchmarks, including AIME, AMC, Olympiad, MATH-500, and GSM8k, validate the efficacy of our approach. Using DeepSeek-R1-Distill-Qwen-1.5B and 7B as backbones, ScRPO achieves average accuracies of 64.8% and 77.8%, respectively. This represents a significant improvement of 6.0% and 3.2% over vanilla baselines, consistently outperforming strong post-training methods such as DAPO and GRPO. These findings establish ScRPO as a robust paradigm for enabling autonomous self-improvement in AI systems, particularly in tasks with limited external feedback.

preprint · 2026 · arXiv

Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference

The expanding application of smart sensing has created a growing demand for the accurate understanding of human action at the network edge. Traditional approaches require massive video data to be transmitted from resource-constrained edge devices to powerful cloud servers, incurring prohibitive uplink bandwidth consumption and unacceptable latency while raising privacy concerns. To overcome these bottlenecks, we propose a task-oriented communication framework for human action understanding (TOAU) through edge-cloud collaboration. Our framework utilizes a monocular pose estimator to extract continuous joint coordinates from raw videos, followed by a vector quantized variational autoencoder (VQ-VAE) to convert these coordinates into discrete motion tokens. Consequently, only a compact sequence of codebook indices is transmitted over the network, consuming as few as 9 bits per frame and avoiding privacy leakages. At the cloud server, a lightweight projector aligns these motion tokens with the embedding space of a large vision-language model (VLM) to facilitate complex action understanding, which is trained with an efficient instruction tuning paradigm. Comprehensive evaluations on three benchmarks demonstrate that our TOAU system reduces the transmission payload to approximately 1% and the system latency to around 20% compared to video codec-based solutions, while delivering comparable action understanding accuracy.
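To make the "9 bits per frame" figure concrete: sending one codebook index per frame costs the number of bits needed to address the codebook, so 9 bits would correspond to a 512-entry codebook. That one-index-per-frame, 512-entry reading is an assumption for this sketch, as are the toy codebook and the function names; the abstract does not state the codebook size.

```python
def bits_per_index(codebook_size):
    """Bits needed to address any entry of a codebook of the given size."""
    return (codebook_size - 1).bit_length()


def quantize(vector, codebook):
    """Index of the nearest codebook entry under squared Euclidean distance."""
    def sqdist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))


def payload_bits(num_frames, codebook_size, indices_per_frame=1):
    """Total bits transmitted when only codebook indices are sent."""
    return num_frames * indices_per_frame * bits_per_index(codebook_size)


# Toy 2-D codebook, purely illustrative.
toy_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(quantize((0.9, 0.1), toy_codebook))  # nearest entry is (1.0, 0.0)
print(bits_per_index(512), payload_bits(30, 512))
```

Only the integer index crosses the network; the receiver looks up the corresponding embedding in its own copy of the codebook.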

preprint · 2023 · arXiv

Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach

This paper investigates task-oriented communication for edge inference, where a low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing. It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth. We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding in a task-oriented manner, i.e., targeting the downstream inference task rather than data reconstruction. Specifically, we leverage an information bottleneck (IB) framework to formalize a rate-distortion tradeoff between the informativeness of the encoded feature and the inference performance. As the IB optimization is computationally prohibitive for the high-dimensional data, we adopt a variational approximation, namely the variational information bottleneck (VIB), to build a tractable upper bound. To reduce the communication overhead, we leverage a sparsity-inducing distribution as the variational prior for the VIB framework to sparsify the encoded feature vector. Furthermore, considering dynamic channel conditions in practical communication systems, we propose a variable-length feature encoding scheme based on dynamic neural networks to adaptively adjust the activated dimensions of the encoded feature to different channel conditions. Extensive experiments evidence that the proposed task-oriented communication system achieves a better rate-distortion tradeoff than baseline methods and significantly reduces the feature transmission latency in dynamic channel conditions.

preprint · 2022 · arXiv

Stochastic Coded Federated Learning with Convergence and Privacy Guarantees

Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework, where many clients collaboratively train a machine learning model by exchanging model updates with a parameter server instead of sharing their raw data. Nevertheless, FL training suffers from slow convergence and unstable performance due to stragglers caused by the heterogeneous computational resources of clients and fluctuating communication rates. This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL). In this framework, each client generates a privacy-preserving coded dataset by adding additive noise to the random linear combination of its local data. The server collects the coded datasets from all the clients to construct a composite dataset, which helps to compensate for the straggling effect. In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning. Besides, we demonstrate a privacy-performance tradeoff of the proposed SCFL method by analyzing the influence of the privacy constraint on the convergence rate. Finally, numerical experiments corroborate our analysis and show the benefits of SCFL in achieving fast convergence while preserving data privacy.
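The client-side coding step, a random linear combination of local data with additive noise, can be sketched as follows. The Gaussian mixing coefficients, the Gaussian noise, the scalar labels, and the name `coded_dataset` are assumptions for this example; the abstract does not specify the distributions, and the noise calibration needed for the MI-DP guarantee is not modeled here.

```python
import random


def coded_dataset(features, labels, num_coded, noise_std, rng):
    """Build coded samples as noisy random linear combinations of local data.

    features: list of n rows, each a list of d floats
    labels:   list of n floats
    Returns (coded_features, coded_labels) with num_coded rows.
    """
    n, d = len(features), len(features[0])
    coded_x, coded_y = [], []
    for _ in range(num_coded):
        # One random mixing weight per local sample (assumed Gaussian).
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        row = [
            sum(g[i] * features[i][j] for i in range(n)) + rng.gauss(0.0, noise_std)
            for j in range(d)
        ]
        label = sum(g[i] * labels[i] for i in range(n)) + rng.gauss(0.0, noise_std)
        coded_x.append(row)
        coded_y.append(label)
    return coded_x, coded_y


rng = random.Random(0)
local_x = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
local_y = [1.0, 0.0, 1.0]
coded_x, coded_y = coded_dataset(local_x, local_y, num_coded=2, noise_std=0.1, rng=rng)
print(len(coded_x), len(coded_x[0]), len(coded_y))
```

Each coded row mixes all local samples, so no single raw sample is sent as-is; larger `noise_std` hides more but also distorts the server-side gradient more, which is the privacy-performance tradeoff the abstract refers to.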

preprint · 2020 · arXiv

BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems

The emergence of various intelligent mobile applications demands the deployment of powerful deep learning models at resource-constrained mobile devices. The device-edge co-inference framework provides a promising solution by splitting a neural network at a mobile device and an edge computing server. In order to balance the on-device computation and the communication overhead, the splitting point needs to be carefully picked, while the intermediate feature needs to be compressed before transmission. Existing studies decoupled the design of model splitting, feature compression, and communication, which may lead to excessive resource consumption of the mobile device. In this paper, we introduce an end-to-end architecture, named BottleNet++, that consists of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission. The encoder and decoder essentially implement joint source-channel coding via convolutional neural networks (CNNs), while explicitly considering the effect of channel noise. By exploiting the strong sparsity and the fault-tolerant property of the intermediate feature in a deep neural network (DNN), BottleNet++ achieves a much higher compression ratio than existing methods. Furthermore, by providing the channel condition to the encoder as an input, our method enjoys a strong generalization ability in different channel conditions. Compared with merely transmitting intermediate data without feature compression, BottleNet++ achieves up to 64x bandwidth reduction over the additive white Gaussian noise channel and up to 256x bit compression ratio in the binary erasure channel, with less than 2% reduction in accuracy. With a higher compression ratio, BottleNet++ enables splitting a DNN at earlier layers, which leads to up to 3x reduction in on-device computation compared with other compression methods.

preprint · 2016 · arXiv

The kinetic Sunyaev-Zel'dovich tomography II: probing the circumgalactic medium

We propose the use of the kinetic Sunyaev-Zel'dovich (kSZ) effect to probe the circumgalactic medium (CGM), with the aid of a spectroscopic survey covering the same area as an SZ survey. One can design an optimal estimator of the kSZ effect of the CGM with a matched filter, and construct the cross correlation between the estimator and the peculiar velocity recovered from the galaxy survey, which can be measured by stacking a number of galaxies. We investigate two compelling profiles for the CGM, the MB profile (Maller & Bullock 2004) and the $\beta$ profile, and estimate the detectability against the synergy of a fiducial galaxy survey with number density $10^{-3}\,h^3\,\mathrm{Mpc}^{-3}$ and an ACT-like SZ survey. We show that the shape of the filter does not change much with redshift for the $\beta$ profile, while there are significant side lobes at $z<0.1$ for the MB profile. By stacking $\sim 10^4$ Milky Way-size halos around $z \sim 0.5$, one can get a signal to noise (S/N) of $\gtrsim 1\sigma$ for both profiles. The S/N increases with decreasing redshift before it reaches a maximum ($\sim 7.5$ at $z \simeq 0.15$ for the MB profile, $\sim 19$ at $z \simeq 0.03$ for the $\beta$ profile). Due to the large beam size, a Planck-like CMB survey can marginally detect the kSZ signal by stacking the same number of galaxies at $z<0.1$. The search for the CGM in realistic surveys will involve dividing the galaxies into subsamples with similar redshift and mass of host halos, and scaling the results presented here to obtain the S/N.

preprint · 2011 · arXiv

The kinetic SZ tomography with spectroscopic redshift surveys

The kinetic Sunyaev-Zel'dovich (kSZ) effect is a potentially powerful probe of the missing baryons. However, the kSZ signal is overwhelmed by various contaminations and the cosmological application is hampered by loss of redshift information due to the projection effect. We propose a kSZ tomography method to alleviate these problems, with the aid of galaxy spectroscopic redshift surveys. We propose to estimate the large scale peculiar velocity through the 3D galaxy distribution, weigh it by the 3D galaxy density and adopt the product projected along the line of sight with a proper weighting as an estimator $\hat{\Theta}$ of the true kSZ temperature fluctuation $\Theta$. We thus propose to measure the kSZ signal through the $\hat{\Theta}$-$\Theta$ cross correlation. This approach has a number of advantages (see details in the abstract of the paper). We test the proposed kSZ tomography against non-adiabatic and adiabatic hydrodynamical simulations. We confirm that $\hat{\Theta}$ is indeed tightly correlated with $\Theta$ at $k \lesssim 1\,h/\mathrm{Mpc}$, although nonlinearities in the density and velocity fields and nonlinear redshift distortion do weaken the tightness of the $\hat{\Theta}$-$\Theta$ correlation. We further quantify the reconstruction noise in $\hat{\Theta}$ from galaxy distribution shot noise. Based on these results, we quantify the applicability of the proposed kSZ tomography for future surveys. We find that, in combination with the BigBOSS-N spectroscopic redshift survey, the PLANCK CMB experiment will be able to detect the kSZ with an overall significance of $\sim 50\sigma$ and further measure its redshift distribution at many redshift bins over $0<z<2$.

preprint · 2011 · arXiv

The thermal SZ tomography

The thermal Sunyaev-Zel'dovich (tSZ) effect directly measures the thermal pressure of free electrons integrated along the line of sight and thus contains valuable information on the thermal history of the universe. However, the redshift information is entangled in the projection along the line of sight. This projection effect severely degrades the power of the tSZ effect to reconstruct the thermal history. We investigate the tSZ tomography technique to recover this otherwise lost redshift information by cross correlating the tSZ effect with galaxies of known redshifts, or alternatively with matter distribution reconstructed from weak lensing tomography. We investigate in detail the 3D distribution of the gas thermal pressure and its relation with the matter distribution, through our adiabatic hydrodynamic simulation and the one with additional gastrophysics including radiative cooling, star formation and supernova feedback. (1) We find a strong correlation between the gas pressure and matter distribution, with a typical cross correlation coefficient r ~ 0.7 at k ≲ 3h/Mpc and z < 2. This tight correlation will enable robust cross correlation measurement between SZ surveys such as Planck, ACT and SPT and lensing surveys such as DES and LSST, at ~20-100σ level. (2) We propose a tomography technique to convert the measured cross correlation into the contribution from gas in each redshift bin to the tSZ power spectrum. Uncertainties in gastrophysics may affect the reconstruction at ~2% level, due to the ~1% impact of gastrophysics on r, found in our simulations. However, we find that the same gastrophysics affects the tSZ power spectrum at ~40% level, so it is robust to infer the gastrophysics from the reconstructed redshift resolved contribution.
