Source author record

Kai Zhao

Kai Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence, Computer Vision, cond-mat.mtrl-sci, eess.SY, Machine Learning, Systems and Control

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2026, arXiv

A Multi-Scale Attention-Based Attack Diagnosis Mechanism for Parallel Cyber-Physical Attacks in Power Grids

Parallel cyber-physical attacks (PCPA) can simultaneously damage physical transmission lines and disrupt measurement data transmission in power grids, severely impairing system situational awareness and attack diagnosis. This paper investigates the attack diagnosis problem for linearized AC/DC power flow models under PCPA, where physical attacks include not only line disconnections but also admittance modifications, such as those caused by compromised distributed flexible AC transmission system (D-FACTS) devices. To address this challenge, we propose a learning-assisted attack diagnosis framework based on meta-mixed-integer programming (MMIP), which integrates a convolutional graph cross-attention attack localization (CGCA-AL) model. First, sufficient conditions for measurement reconstruction are derived, enabling the recovery of unknown measurements in attacked areas using available measurements and network topology information. Based on these conditions, the attack diagnosis problem is formulated as an MMIP model. The proposed CGCA-AL employs a multi-scale attention mechanism to predict a probability distribution over potential physical attack locations, which is incorporated into the MMIP as informative objective coefficients. By solving the resulting MMIP, both the locations and magnitudes of physical attacks are optimally estimated, and system states are subsequently reconstructed. Simulation results on IEEE 30-bus and IEEE 118-bus test systems demonstrate the effectiveness, robustness, and scalability of the proposed attack diagnosis framework under complex PCPA scenarios.

preprint, 2026, arXiv

Attention Debiasing for Token Pruning in Vision Language Models

Vision-language models (VLMs) typically encode substantially more visual tokens than text tokens, resulting in significant token redundancy. Pruning uninformative visual tokens is therefore crucial for improving computational efficiency, and language-to-vision attention has become a widely used importance criterion for this purpose. However, we find that attention in VLMs is systematically biased. It disproportionately favors tokens appearing later in the sequence, manifesting as over-attention to lower image regions, and assigns inflated scores to semantically empty padding tokens. These behaviors stem from intrinsic recency bias and attention sink effects inherited from large language models (LLMs), and they distort attention-based pruning by preserving irrelevant visual content. To derive a pruning criterion better aligned with semantic relevance, we introduce two lightweight yet effective debiasing techniques that restore the reliability of attention. The first compensates for positional distortions by removing recency-induced attention trends, producing a content-aware and position-agnostic importance measure. The second suppresses attention sink effects by eliminating spurious attention on padding tokens. Our method is model-agnostic, pruning-method-agnostic, and task-agnostic, enabling plug-and-play integration with existing VLM pruning models. Despite its simplicity, our approach consistently delivers strong performance gains. We evaluate our method on ten vision-language benchmarks spanning both image-based and video-based tasks, in comparison with seven state-of-the-art visual token pruning methods and across two representative VLM architectures. Our method achieves substantial performance gains, demonstrating strong effectiveness and generalizability. Our code is available at https://github.com/intcomp/attention-bias.
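The two debiasing steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see their repository for that): it assumes the recency trend is well approximated by a straight line over token position, fitted by least squares on non-padding tokens, and that padding tokens are known from a boolean mask. The function names `debias_scores` and `select_tokens` are hypothetical.

```python
from typing import Dict, List, Sequence


def debias_scores(scores: Sequence[float], is_padding: Sequence[bool]) -> Dict[int, float]:
    """Remove a linear position trend from attention scores (assumed trend model).

    Padding tokens are dropped entirely, which stands in for suppressing
    attention-sink scores on semantically empty tokens.
    """
    idx = [i for i, pad in enumerate(is_padding) if not pad]
    if len(idx) < 2:
        return {i: scores[i] for i in idx}
    mean_x = sum(idx) / len(idx)
    mean_y = sum(scores[i] for i in idx) / len(idx)
    var_x = sum((i - mean_x) ** 2 for i in idx)
    cov_xy = sum((i - mean_x) * (scores[i] - mean_y) for i in idx)
    slope = cov_xy / var_x  # least-squares slope of score versus position
    # Subtract only the position-dependent part so the mean score is preserved.
    return {i: scores[i] - slope * (i - mean_x) for i in idx}


def select_tokens(scores: Sequence[float], is_padding: Sequence[bool], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring tokens after debiasing."""
    debiased = debias_scores(scores, is_padding)
    ranked = sorted(debiased, key=lambda i: debiased[i], reverse=True)
    return sorted(ranked[:keep])


# Toy example: scores drift upward with position, token 0 carries real content,
# and the final token is padding with an inflated score.
raw = [0.22, 0.06, 0.10, 0.14, 0.18, 0.24, 0.80]
pad = [False, False, False, False, False, False, True]
kept = select_tokens(raw, pad, keep=2)
```

In this toy input, ranking by raw score would keep the padding token and the last image token, whereas the detrended ranking keeps the early token whose score stands out from the positional trend.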

preprint, 2026, arXiv

Batch-Fabricated PDMS Templates for the Robotic Transfer of 2D Materials

Robotic stacking of van der Waals heterostructures is on the verge of realization thanks to the convergence of artificial intelligence (AI) and two-dimensional (2D) materials research. Key ingredients for this pursuit often include algorithms to identify layered compounds on chips, hardware to realize sophisticated translation and/or rotation operations at the microscale, and, just as importantly, highly standardized and uniform transfer stamps used to pick up layered materials under a microscope. Here, we report a hot-cast-droplet batch fabrication method for polydimethylsiloxane (PDMS) templates tailored for dry transfer of 2D materials. Controlled precursor formulation, degassing, and motorized-syringe dispensing produce dome-shaped PDMS templates with ultra-smooth surfaces (root-mean-square roughness about 0.3 nm at relatively low curing temperatures). Tuning the curing temperature yields a reproducible and controllable apex curvature, which allows a precisely defined contact area between the organic adhesive film and the substrate via thermal expansion. Our results further reveal the thermomechanical behavior of such PDMS domes under different casting parameters. This scalable and parameterized fabrication protocol yields uniform transfer stamps with ultra-smooth surfaces, which may benefit future AI-driven robotic assembly of 2D material heterostructures.

preprint, 2026, arXiv

Sub-JEPA: Subspace Gaussian Regularization for Stable End-to-End World Models

Joint-Embedding Predictive Architectures (JEPAs) provide a simple framework for learning world models by predicting future latent representations. However, JEPA training is subject to a bias-variance tradeoff. Without sufficient structural constraints, excessive representational variance causes the model to collapse to trivial solutions. The recent LeWorldModel (LeWM) shows that this issue can be alleviated by simply constraining latent embeddings with an isotropic Gaussian prior. However, latent representations inherently lie on low-dimensional manifolds within a high-dimensional ambient space, and enforcing an isotropic Gaussian prior directly in this ambient space introduces an overly strong bias. In this work, we propose Sub-JEPA, which seeks a favorable operating point on the bias-variance frontier by applying Gaussian constraints in multiple random subspaces rather than in the original embedding space. This design relaxes the global constraint while preserving its anti-collapse effect, leading to a better balance between training stability and representation flexibility. Extensive experiments across four continuous-control environments demonstrate that Sub-JEPA consistently outperforms LeWM by clear margins. Our method is simple yet effective, and serves as a strong baseline for future JEPA-based world model research. The code is available at https://github.com/intcomp/Sub-JEPA.
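The idea of constraining embeddings in random subspaces rather than in the full embedding space can be sketched as follows. This is an illustrative stand-in, not the method from the paper or its repository: the Gaussian constraint is assumed here to be a simple moment-matching penalty (zero mean, identity covariance), and the function names, the number of subspaces, and the subspace dimension are all hypothetical choices.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def random_orthonormal_basis(ambient_dim: int, sub_dim: int, rng: random.Random) -> List[Vector]:
    """Draw Gaussian vectors and orthonormalize them with Gram-Schmidt."""
    if sub_dim > ambient_dim:
        raise ValueError("sub_dim must not exceed ambient_dim")
    basis: List[Vector] = []
    while len(basis) < sub_dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(ambient_dim)]
        for b in basis:
            c = dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:  # skip degenerate draws
            basis.append([x / norm for x in v])
    return basis


def gaussian_penalty(points: List[Vector]) -> float:
    """Assumed penalty: squared mean norm plus squared Frobenius gap to identity covariance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    penalty = dot(mean, mean)
    for j in range(d):
        for k in range(d):
            cov = sum((p[j] - mean[j]) * (p[k] - mean[k]) for p in points) / n
            target = 1.0 if j == k else 0.0
            penalty += (cov - target) ** 2
    return penalty


def subspace_gaussian_penalty(embeddings: List[Vector], sub_dim: int,
                              num_subspaces: int, seed: int = 0) -> float:
    """Average the Gaussian penalty over several random low-dimensional projections."""
    rng = random.Random(seed)
    ambient_dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_subspaces):
        basis = random_orthonormal_basis(ambient_dim, sub_dim, rng)
        projected = [[dot(z, b) for b in basis] for z in embeddings]
        total += gaussian_penalty(projected)
    return total / num_subspaces


# A fully collapsed batch (all embeddings identical) is penalized in every subspace.
collapsed = [[0.0] * 8 for _ in range(16)]
collapsed_penalty = subspace_gaussian_penalty(collapsed, sub_dim=2, num_subspaces=4)
```

Because each penalty only looks at a low-dimensional projection, the embeddings are never required to be isotropic in the full ambient space, yet a collapsed batch still incurs a nonzero penalty in every subspace. In a training loop this term would be added to the prediction loss with a weighting coefficient, which is likewise an assumption of this sketch.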

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint