Researcher profile

Sen Zhang

Sen Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
13topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

Stable Attention Response for Reliable Precipitation Nowcasting

Precipitation nowcasting remains challenging due to the highly localized, rapidly evolving, and heterogeneous nature of atmospheric dynamics. Although recent methods increasingly adopt attention-based architectures in both unimodal and multimodal settings, they mainly emphasize stronger representation learning and prediction capacity, while paying less attention to the stability of attention responses across samples. In this work, we show that cross-sample instability of attention-response energy is an important and previously underexplored source of forecasting unreliability. Empirically, inaccurate forecasts are associated with larger attention-response energy variance across heads and layers. Theoretically, we show that cross-sample variability can propagate through self-attention, and enlarge a lower bound on prediction error. Based on this insight, we propose HARECast, a Head-wise Attention Response Energy-regulated framework for precipitation nowcasting. HARECast explicitly models head-wise attention-response energy and stabilizes it through a group-wise regularization objective that reduces cross-sample fluctuations. The proposed formulation is generic and applicable to both unimodal and multimodal nowcasting architectures. We instantiate HARECast in a standard forecasting pipeline with reconstruction branches and a diffusion-based predictor, and evaluate it on commonly used benchmarks--SEVIR and MeteoNet. Experimental results demonstrate that HARECast achieves state-of-the-art performance.

preprint2026arXiv

United We Defend: Collaborative Membership Inference Defenses in Federated Learning

Membership inference attacks (MIAs), which determine whether a specific data point was included in the training set of a target model, have posed severe threats in federated learning (FL). Unfortunately, existing MIA defenses, typically applied independently to each client in FL, are ineffective against powerful trajectory-based MIAs that exploit temporal information throughout the training process to infer membership status. In this paper, we investigate a new FL defense scenario driven by heterogeneous privacy needs and privacy-utility trade-offs, where only a subset of clients are defended, as well as a collaborative defense mode where clients cooperate to mitigate membership privacy leakage. To this end, we introduce CoFedMID, a collaborative defense framework against MIAs in FL, which limits local model memorization of training samples and, through a defender coalition, enhances privacy protection and model utility. Specifically, CoFedMID consists of three modules: a class-guided partition module for selective local training samples, a utility-aware compensation module to recycle contributive samples and prevent their overconfidence, and an aggregation-neutral perturbation module that injects noise for cancellation at the coalition level into client updates. Extensive experiments on three datasets show that our defense framework significantly reduces the performance of seven MIAs while incurring only a small utility loss. These results are consistently verified across various defense settings.

preprint2026arXiv

When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for advanced reasoning in Large Language Models (LLMs), but rollout samples are expensive to obtain, making sample efficiency a critical bottleneck. A natural remedy is to reuse each rollout batch for multiple gradient updates, a standard practice in classical RL. Yet in RLVR, this amplifies policy shift, leading to severe performance degradation. Detecting the onset of degradation early enough to stop reuse remains an open and challenging problem. We close this gap by identifying the \textit{Disproportionate Weight Divergence (DWD)} phenomenon: performance degradation is synchronized with a sharp surge in the \texttt{lm\_head} weight change, while intermediate layers remain stable. Empirically, we verify that DWD emerges consistently across diverse LLMs and tasks. Theoretically, we prove that (i) harmful gradients concentrate at the \texttt{lm\_head} while intermediate layers are structurally attenuated, and (ii) the \texttt{lm\_head} gradient norm lower-bounds the policy divergence. These results establish the \texttt{lm\_head} gradient norm as a principled, real-time signal of catastrophic policy shift. Guided by this insight, we propose \textit{Dynamic Gradient Gating (DGG)}, a lightweight intervention that monitors the \texttt{lm\_head} gradient norm in real time and intercepts harmful gradients before they corrupt the optimizer. DGG consistently matches or exceeds the standard single-use baseline, achieving up to $2.93\times$ sample efficiency and $2.14\times$ wall-clock speedup across math, ALFWorld, WebShop, and search-augmented QA tasks.

preprint2022arXiv

Criteria Comparative Learning for Real-scene Image Super-Resolution

Real-scene image super-resolution aims to restore real-world low-resolution images into their high-quality versions. A typical RealSR framework usually includes the optimization of multiple criteria which are designed for different image properties, by making the implicit assumption that the ground-truth images can provide a good trade-off between different criteria. However, this assumption could be easily violated in practice due to the inherent contrastive relationship between different image properties. Contrastive learning (CL) provides a promising recipe to relieve this problem by learning discriminative features using the triplet contrastive losses. Though CL has achieved significant success in many computer vision tasks, it is non-trivial to introduce CL to RealSR due to the difficulty in defining valid positive image pairs in this case. Inspired by the observation that the contrastive relationship could also exist between the criteria, in this work, we propose a novel training paradigm for RealSR, named Criteria Comparative Learning (Cria-CL), by developing contrastive losses defined on criteria instead of image patches. In addition, a spatial projector is proposed to obtain a good view for Cria-CL in RealSR. Our experiments demonstrate that compared with the typical weighted regression strategy, our method achieves a significant improvement under similar parameter settings.

preprint2022arXiv

Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation

Unpaired image-to-image translation aims to find a mapping between the source domain and the target domain. To alleviate the problem of the lack of supervised labels for the source images, cycle-consistency based methods have been proposed for image structure preservation by assuming a reversible relationship between unpaired images. However, this assumption only uses limited correspondence between image pairs. Recently, contrastive learning (CL) has been used to further investigate the image correspondence in unpaired image translation by using patch-based positive/negative learning. Patch-based contrastive routines obtain the positives by self-similarity computation and recognize the rest patches as negatives. This flexible learning paradigm obtains auxiliary contextualized information at a low cost. As the negatives own an impressive sample number, with curiosity, we make an investigation based on a question: are all negatives necessary for feature contrastive learning? Unlike previous CL approaches that use negatives as much as possible, in this paper, we study the negatives from an information-theoretic perspective and introduce a new negative Pruning technology for Unpaired image-to-image Translation (PUT) by sparsifying and ranking the patches. The proposed algorithm is efficient, flexible and enables the model to learn essential information between corresponding patches stably. By putting quality over quantity, only a few negative patches are required to achieve better results. Lastly, we validate the superiority, stability, and versatility of our model through comparative experiments.

preprint2022arXiv

foREST: A Tree-based Approach for Fuzzing RESTful APIs

Representational state transfer (REST) is a widely employed architecture by web applications and cloud. Users can invoke such services according to the specification of their application interfaces, namely RESTful APIs. Existing approaches for fuzzing RESTful APIs are generally based on classic API-dependency graphs. However, such dependencies are inefficient for REST services due to the explosion of dependencies among APIs. In this paper, we propose a novel tree-based approach that can better capture the essential dependencies and largely improve the efficiency of RESTful API fuzzing. In particular, the hierarchical information of the endpoints across multiple APIs enables us to construct an API tree, and the relationships of tree nodes can indicate the priority of resource dependencies, \textit{e.g.,} it's more likely that a node depends on its parent node rather than its offspring or siblings. In the evaluation part, we first confirm that such a tree-based approach is more efficient than traditional graph-based approaches. We then apply our tool to fuzz two real-world RESTful services and compare the performance with two state-of-the-art tools, EvoMaster and RESTler. Our results show that foREST can improve the code coverage in all experiments, ranging from 11.5\% to 82.5\%. Besides, our tool finds 11 new bugs previously unknown.

preprint2022arXiv

Information-Theoretic Odometry Learning

In this paper, we propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation, a crucial component of many robotics and vision tasks such as navigation and virtual reality where relative camera poses are required in real time. We formulate this problem as optimizing a variational information bottleneck objective function, which eliminates pose-irrelevant information from the latent representation. The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language. Specifically, we bound the generalization errors of the deep information bottleneck framework and the predictability of the latent representation. These provide not only a performance guarantee but also practical guidance for model design, sample collection, and sensor selection. Furthermore, the stochastic latent representation provides a natural uncertainty measure without the needs for extra structures or computations. Experiments on two well-known odometry datasets demonstrate the effectiveness of our method.

preprint2022arXiv

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

Depth estimation, visual odometry (VO), and bird's-eye-view (BEV) scene layout estimation present three critical tasks for driving scene perception, which is fundamental for motion planning and navigation in autonomous driving. Though they are complementary to each other, prior works usually focus on each individual task and rarely deal with all three tasks together. A naive way is to accomplish them independently in a sequential or parallel manner, but there are many drawbacks, i.e., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts. In this paper, we address these issues by proposing a novel joint perception framework named JPerceiver, which can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence. It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO based on a carefully-designed scale loss. Meanwhile, a cross-view and cross-modal transfer (CCT) module is devised to leverage the depth clues for reasoning road and vehicle layout through an attention mechanism. JPerceiver can be trained in an end-to-end multi-task learning way, where the CGT scale loss and CCT module promote inter-task knowledge transfer to benefit feature learning of each task. Experiments on Argoverse, Nuscenes and KITTI show the superiority of JPerceiver over existing methods on all the above three tasks in terms of accuracy, model size, and inference speed. The code and models are available at~\href{https://github.com/sunnyHelen/JPerceiver}{https://github.com/sunnyHelen/JPerceiver}.

preprint2022arXiv

Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis

Multimodal sentiment analysis (MSA) is a fundamental complex research problem due to the heterogeneity gap between different modalities and the ambiguity of human emotional expression. Although there have been many successful attempts to construct multimodal representations for MSA, there are still two challenges to be addressed: 1) A more robust multimodal representation needs to be constructed to bridge the heterogeneity gap and cope with the complex multimodal interactions, and 2) the contextual dynamics must be modeled effectively throughout the information flow. In this work, we propose a multimodal representation model based on Mutual information Maximization and Minimization and Identity Embedding (MMMIE). We combine mutual information maximization between modal pairs, and mutual information minimization between input data and corresponding features to mine the modal-invariant and task-related information. Furthermore, Identity Embedding is proposed to prompt the downstream network to perceive the contextual information. Experimental results on two public datasets demonstrate the effectiveness of the proposed model.

preprint2022arXiv

Near-field radiative heat transfer between hybrid polaritonic structures

Near-field radiative heat transfer between close objects may exceed the far-field blackbody radiation in orders of magnitude when exploiting polaritonic materials. Great efforts have been made to experimentally measure this fundamental stochastic effect but mostly based on simple materials. In this work, we foster an all-optical method to characterize the heat transfer between less explored plasmon-phonon hybrid polaritonic systems made of graphene-SiC heterostructures. A large heat flux about 26 times of the blackbody radiation limit is obtained over a 150-nm vacuum gap, attributed to the couplings of three different surface modes (plasmon, phonon polaritons and frustrated mode). The interaction of polaritonic modes in the hybrid system is also explored to build a switchable thermophotonic device with nearly unity heat flux tunability. This work paves the way for understanding mode-mediated near-field heat transfer and provides a platform for building thermophotonic or thermo-optoelectronic blocks for various applications.

preprint2022arXiv

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

Monocular visual odometry (VO) has attracted extensive research attention by providing real-time vehicle motion from cost-effective camera images. However, state-of-the-art optimization-based monocular VO methods suffer from the scale inconsistency problem for long-term predictions. Deep learning has recently been introduced to address this issue by leveraging stereo sequences or ground-truth motions in the training dataset. However, it comes at an additional cost for data collection, and such training data may not be available in all datasets. In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases. Specifically, we first train a scale-aware disparity network using both monocular real images and stereo virtual data. The virtual-to-real domain gap is bridged by using an adversarial training strategy to map images from both domains into a shared feature space. The resulting scale-consistent disparities are then integrated with a direct VO system by constructing a virtual stereo objective that ensures the scale consistency over long trajectories. Additionally, to address the suboptimality issue caused by the separate optimization backend and the learning process, we further propose a mutual reinforcement pipeline that allows bidirectional information flow between learning and optimization, which boosts the robustness and accuracy of each other. We demonstrate the effectiveness of our framework on the KITTI and vKITTI2 datasets.

preprint2022arXiv

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years. Although current methods have reached a high up-to-scale accuracy, they usually fail to learn the true scale metric due to the inherent scale ambiguity from training with monocular sequences. In this work, we tackle this problem and propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics. Specifically, we first propose an IMU photometric loss and a cross-sensor photometric consistency loss to provide dense supervision and absolute scales. To fully exploit the complementary information from both sensors, we further drive a differentiable camera-centric extended Kalman filter (EKF) to update the IMU preintegrated motions when observing visual measurements. In addition, the EKF formulation enables learning an ego-motion uncertainty measure, which is non-trivial for unsupervised methods. By leveraging IMU during training, DynaDepth not only learns an absolute scale, but also provides a better generalization ability and robustness against vision degradation such as illumination change and moving objects. We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.

preprint2021arXiv

Community Preserved Social Graph Publishing with Node Differential Privacy

The goal of privacy-preserving social graph publishing is to protect individual privacy while preserving data utility. Community structure, which is an important global pattern of nodes, is a crucial data utility as it serves as fundamental operations for many graph analysis tasks. Yet, most existing methods with differential privacy (DP) commonly fall in edge-DP to sacrifice security in exchange for utility. Moreover, they reconstruct graphs from the local feature-extraction of nodes, resulting in poor community preservation. Motivated by this, we propose PrivCom, a strict node-DP graph publishing algorithm to maximize the utility on the community structure while maintaining a higher level of privacy. Specifically, to reduce the huge sensitivity, we devise a Katz index-based private graph feature extraction method, which can capture global graph structure features while greatly reducing the global sensitivity via a sensitivity regulation strategy. Yet, with a fixed sensitivity, the feature captured by Katz index, which is presented in matrix form, requires privacy budget splits. As a result, plenty of noise is injected, thereby mitigating global structural utility. To this end, we design a private Oja algorithm approximating eigen-decomposition, which yields the noisy Katz matrix via privately estimating eigenvectors and eigenvalues from extracted low-dimensional vectors. Experimental results confirm our theoretical findings and the efficacy of PrivCom.

preprint2021arXiv

Explicit pseudo-transient continuation and the trust-region updating strategy for unconstrained optimization

This paper considers an explicit continuation method and the trust-region updating strategy for the unconstrained optimization problem. Moreover, in order to improve its computational efficiency and robustness, the new method uses the switching preconditioning technique. In the well-conditioned phase, the new method uses the L-BFGS method as the preconditioning technique in order to improve its computational efficiency. Otherwise, the new method uses the inverse of the Hessian matrix as the pre-conditioner in order to improve its robustness. Numerical results aslo show that the new method is more robust and faster than the traditional optimization method such as the trust-region method and the line search method. The computational time of the new method is about one percent of that of the trust-region method (the subroutine fminunc.m of the MATLAB2019a environment, it is set by the trust-region method) or one fifth of that the line search method (fminunc.m is set by the quasi-Newton method) for the large-scale problem. Finally, the global convergence analysis of the new method is also given.

preprint2021arXiv

GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer

Non-parallel text style transfer has attracted increasing research interests in recent years. Despite successes in transferring the style based on the encoder-decoder framework, current approaches still lack the ability to preserve the content and even logic of original sentences, mainly due to the large unconstrained model space or too simplified assumptions on latent embedding space. Since language itself is an intelligent product of humans with certain grammars and has a limited rule-based model space by its nature, relieving this problem requires reconciling the model capacity of deep neural networks with the intrinsic model constraints from human linguistic rules. To this end, we propose a method called Graph Transformer based Auto Encoder (GTAE), which models a sentence as a linguistic graph and performs feature extraction and style transfer at the graph level, to maximally retain the content and the linguistic structure of original sentences. Quantitative experiment results on three non-parallel text style transfer tasks show that our model outperforms state-of-the-art methods in content preservation, while achieving comparable performance on transfer accuracy and sentence naturalness.

preprint2021arXiv

Passive temperature management based on near-field heat transfer

Thermal or temperature management in modern machines has drawn great attentions in the last decades. The waste heat caused during the machine operation is particularly pernicious for the temperature-dependent electronics and may reduce the apparatus performance and lifetime. To control the operation temperature while maintaining high input powers is often a dilemma. Enormous works have been done for the purpose. Here, a passive temperature management method based on near-field heat transfer is introduced, utilizing graphene-plasmon enhanced evanescence wave tunneling. Within a pump power tolerance range of 0.5-7 KW m-2, the device can automatically regulate its thermal emissivity to quickly acquire thermal homeostasis around a designed temperature. It is compact, fully passive and could be incorporated into chip design. The results pave a promising way for passive thermal management that could be used in modern instruments in particular for vacuum environmental applications.

preprint2020arXiv

Large magnetoelectric coupling in multiferroic oxide heterostructures assembled via epitaxial lift-off

The strain dependent functional properties of epitaxial transition metal oxide films can be significantly modified via substrate selection. However, large lattice mismatches preclude dislocation-free epitaxial growth on ferroelectric substrates, whose strain states are modified by applied electric fields. Here we overcome this mismatch problem by depositing an epitaxial film of ferromagnetic La0.7Sr0.3MnO3 on a single crystal substrate of well lattice matched SrTiO3 via a film of SrRuO3 that we subsequently dissolved, permitting the transfer of unstrained La0.7Sr0.3MnO3 to a ferroelectric substrate of 0.68Pb(Mg1/3Nb2/3)O3 0.32PbTiO3 in a different crystallographic orientation. Ferroelectric domain switching, and a concomitant ferroelectric phase transition, produced large non volatile changes of magnetization that were mediated by magnetic domain rotations at locations defined by the microstructure - as revealed via high resolution vector maps of magnetization constructed from photoemission electron microscopy data, with contrast from x-ray magnetic circular dichroism. In future, our method may be exploited to control functional properties in dislocation free epitaxial films of any composition.