Researcher profile

Ao Wang

Ao Wang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
9 works · 0 followers · 12 topics · 4 close collaborators

Published work

9 published items

preprint · 2026 · arXiv

FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts

Spatial-Temporal Graph (STG) forecasting on large-scale networks has garnered significant attention. However, existing models predominantly focus on short-horizon predictions and suffer from notorious computational costs and memory consumption when scaling to long-horizon predictions and large graphs. Targeting the above challenges, we present FaST, an effective and efficient framework based on heterogeneity-aware Mixture-of-Experts (MoEs) for long-horizon and large-scale STG forecasting, which unlocks one-week-ahead (672 steps at a 15-minute granularity) prediction with thousands of nodes. FaST is underpinned by two key innovations. First, an adaptive graph agent attention mechanism is proposed to alleviate the computational burden inherent in conventional graph convolution and self-attention modules when applied to large-scale graphs. Second, we propose a new parallel MoE module that replaces traditional feed-forward networks with Gated Linear Units (GLUs), enabling an efficient and scalable parallel structure. Extensive experiments on real-world datasets demonstrate that FaST not only delivers superior long-horizon predictive accuracy but also achieves remarkable computational efficiency compared to state-of-the-art baselines. Our source code is available at: https://github.com/yijizhao/FaST.
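The abstract mentions replacing feed-forward networks with Gated Linear Units. As a rough illustration only, the sketch below implements the generic GLU form, value(x) multiplied elementwise by sigmoid(gate(x)), in plain Python. It is not taken from the FaST code base; the function names (`linear`, `glu`) and the toy weights are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function used as the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, weights, bias):
    """Dense layer: one row of weights per output unit."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def glu(x, w_value, b_value, w_gate, b_gate):
    """Generic gated linear unit: value(x) * sigmoid(gate(x)), elementwise.

    Hypothetical helper for illustration, not the FaST implementation.
    """
    value = linear(x, w_value, b_value)
    gate = [sigmoid(g) for g in linear(x, w_gate, b_gate)]
    return [v * g for v, g in zip(value, gate)]


# Toy input: identity value projection and an all-zero gate projection,
# so every gate equals sigmoid(0) = 0.5 and the output is half the input.
x = [1.0, -2.0]
w_value = [[1.0, 0.0], [0.0, 1.0]]
b_value = [0.0, 0.0]
w_gate = [[0.0, 0.0], [0.0, 0.0]]
b_gate = [0.0, 0.0]

out = glu(x, w_value, b_value, w_gate, b_gate)
print(out)
```

The gate lets each output unit scale its own value between 0 and 1, which is the property that distinguishes a GLU from a plain linear layer followed by a fixed activation.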

preprint · 2026 · arXiv

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

Vision-Language Models (VLMs) have shown strong promise on Optical Character Recognition (OCR), yet the sheer number of visual tokens required to encode dense documents incurs prohibitive inference cost. Existing pruning methods rely on physical eviction, e.g., permanently discarding visual tokens during the prefill stage. While effective for natural images, this strategy fundamentally breaks down on OCR, where virtually every visual token may correspond to a character or structural element, and any irreversible loss leads to catastrophic accuracy degradation. We observe that, although document images appear globally dense and seemingly unprunable, the model's attention to them is in fact temporally sparse: at each decoding step it concentrates on a small region that shifts gradually across steps, much as a human reader fixates on successive words rather than perceiving an entire page at once. Motivated by this Dynamic Visual Fixation phenomenon, we recast the intractable global pruning problem as a tractable local, dynamic one and propose FastOCR, a training-free framework with two complementary modules. Specifically, Focal-Guided Pruning identifies a small set of focal layers and selects the most task-relevant visual tokens from them at each step, while Cross-Step Fixation Reuse exploits the gradual shift of fixation to warm-start each step from the previous one. By dynamically adjusting which tokens are attended rather than evicting any from the cache, FastOCR avoids permanent information loss. Extensive experiments show that FastOCR serves as a plug-and-play acceleration module, generalizing consistently across five VLMs of varying sizes and architectures. On Qwen2.5-VL, FastOCR retains 98% of the unpruned model's accuracy while attending to only 5% of the visual tokens per decoding step, reducing attention latency by 3.0×.

preprint · 2026 · arXiv

Physical Effects of Gravitational Waves at Second Order

There is currently no rigorous definition of gravitational wave strain at second order in cosmological perturbation theory. The usual association of gravitational waves with transverse and traceless fluctuations of the metric on spatial hypersurfaces becomes ambiguous at second order, as it inherently depends on the spacetime slicing. While this poses no practical issues in linearized gravity, it presents a fundamental problem for secondary gravitational waves, especially notorious for gravitational waves induced by primordial fluctuations. We compute, for the first time, the physical effects of gravitational waves at second order, as measured by geodesic observers that emit and receive electromagnetic signals, thereby settling the debate on gauge ambiguities. We find that the measured gravitational wave strain coincides with the transverse-traceless components in the Newton gauge.

preprint · 2026 · arXiv

Symmetry-engineered and electrically tunable in-plane anomalous Hall effect in oxide heterostructures

The family of Hall effects has long served as a premier probe of how symmetry, magnetic order, and topology intertwine in solids. Recently, the in-plane anomalous Hall effect (IP-AHE), a transverse Hall response driven by in-plane magnetization, has emerged as a distinct member of this family, offering innovative spintronic functionalities and illuminating intricate interplay between mirror-symmetry breaking and in-plane magnetic order. However, practical routes to deterministically and reversibly control IP-AHE remain limited. Here, we establish a symmetry-engineered IP-AHE platform, CaRuO3/La2/3Ca1/3MnO3/CaRuO3 heterostructure on NdGaO3(110), that turns strict mirror-symmetry breaking constraints into effective tuning knobs. IP-AHE in these epitaxial trilayers unambiguously couples to the CaRuO3-buffer-induced mirror-symmetry breaking and faithfully reproduces the ferromagnetic hysteresis. Ionic liquid gating further enables reversible reconfigurations of the symmetry breaking, thereby achieving electrical modulation and ON/OFF switching of IP-AHE. This highly tunable IP-AHE platform opens pathways for exploring nontrivial magnetic order and developing programmable Hall functionalities in planar geometries.

preprint · 2022 · arXiv

Arrhythmia Classifier using Binarized Convolutional Neural Network for Resource-Constrained Devices

Monitoring electrocardiogram signals is of great significance for the diagnosis of arrhythmias. In recent years, deep learning and convolutional neural networks have been widely used in the classification of cardiac arrhythmias. However, existing neural networks applied to ECG signal detection usually require a lot of computing resources, which is not friendly to resource-constrained equipment and makes real-time monitoring difficult. In this paper, a binarized convolutional neural network suitable for ECG monitoring is proposed, which is hardware-friendly and more suitable for use in resource-constrained wearable devices. Targeting the MIT-BIH arrhythmia database, the classifier based on this network reached an accuracy of 95.67% in the five-class test. Compared with the proposed baseline full-precision network with an accuracy of 96.45%, it is only 0.78 percentage points lower. Importantly, it achieves a 12.65-times computing speedup and a 24.8-times storage compression ratio, and requires only a quarter of the memory overhead.
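To give a sense of why binarized networks are cheap on constrained hardware, the sketch below shows the general bitwise trick behind them: when weights and activations are restricted to +1 and -1, a dot product reduces to an XOR followed by a bit count. This is a generic illustration, not the paper's network; the helper names (`binarize`, `pack_bits`, `xnor_popcount_dot`) and the sample values are assumptions made for this example.

```python
def binarize(values):
    """Map each real value to +1 or -1 by sign (zero is treated as +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an integer: bit i is set when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching positions contribute +1 and mismatching positions -1,
    so the result is n - 2 * (number of mismatches).
    """
    mask = (1 << n) - 1
    mismatches = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * mismatches


# Hypothetical full-precision values, chosen only for illustration.
weights = [0.7, -0.2, 0.1, -0.9]
activations = [0.3, 0.4, -0.5, -0.1]

w_signs = binarize(weights)
a_signs = binarize(activations)

# Reference: ordinary multiply-accumulate on the +1/-1 values.
reference = sum(w * a for w, a in zip(w_signs, a_signs))

# Bitwise version: one XOR and one bit count instead of n multiplications.
fast = xnor_popcount_dot(pack_bits(w_signs), pack_bits(a_signs), len(weights))
print(reference, fast)
```

Storing one bit per weight instead of a 32-bit float is also where the large storage compression of binarized models comes from, although the exact ratio depends on which layers are binarized.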

preprint · 2022 · arXiv

Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization

Cardiovascular diseases (CVDs) are among the most widespread deadly diseases, and detecting them at an early stage is a challenging task. Recently, deep learning and convolutional neural networks have been widely employed for classification, and it is promising that many such networks can be deployed on wearable devices. An increasing number of methods can be used to classify ECG signals for arrhythmia detection. However, the existing neural networks proposed for arrhythmia detection are not hardware-friendly enough because their large number of parameters drives up memory and power consumption. In this paper, we present a 1-D adaptive loss-aware quantization that achieves a high compression rate, reducing memory consumption by 23.36 times. To suit our compression method, we need a smaller and simpler network. We propose a 17-layer end-to-end neural network classifier that classifies 17 different rhythm classes and is trained on the MIT-BIH dataset, realizing a classification accuracy of 93.5%, which is higher than most existing methods. Because the adaptive bitwidth method gives important layers more attention and offers a chance to prune useless parameters, the proposed quantization method avoids accuracy degradation. It even improves the accuracy to 95.84%, which is 2.34 percentage points higher than before. Our study achieves a 1-D convolutional neural network with high performance and low resource consumption, which is hardware-friendly and illustrates the possibility of deployment on wearable devices for real-time arrhythmia diagnosis.

preprint · 2022 · arXiv

MPANet: Multi-Patch Attention For Infrared Small Target object Detection

Infrared small target detection (ISTD) has attracted widespread attention and been applied in various fields. Due to the small size of infrared targets and the noise interference from complex backgrounds, the performance of ISTD using convolutional neural networks (CNNs) is restricted. Moreover, the constraint that long-distance dependent features cannot be encoded by vanilla CNNs also impairs the robustness of capturing targets' shapes and locations in complex scenarios. To this end, a multi-patch attention network (MPANet) based on the axial-attention encoder and the multi-scale patch branch (MSPB) structure is proposed. Specifically, an axial-attention-improved encoder architecture is designed to highlight the effective features of small targets and suppress background noise. Furthermore, the developed MSPB structure fuses the coarse-grained and fine-grained features from different semantic scales. Extensive experiments on the SIRST dataset show the superior performance and effectiveness of the proposed MPANet compared to the state-of-the-art methods.

preprint · 2020 · arXiv

Anomalous thermal transport in metallic transition-metal nitrides originated from strong electron-phonon interactions

Metallic transition-metal nitrides (TMNs) are promising conductive ceramics for many applications, and their thermal transport is of great importance in device design. It is found that metallic TiN and HfN exhibit anomalous thermal transport behaviors compared to common metals and nonmetallic TMNs. They have extremely large intrinsic phonon thermal conductivity, mainly due to the large acoustic-optic phonon frequency gaps. The phonon thermal conductivity is reduced by two orders of magnitude when phonon-isotope and phonon-electron scattering are considered, which also induces the nontrivial temperature-independent behavior of the phonon thermal conductivity. Nesting Fermi surfaces exist in both TiN and HfN, which cause strong electron-phonon coupling and heavily impair the transport of phonons and electrons. The phonon component takes an abnormally large share of the total thermal conductivity: 29% for TiN and 26% for HfN at 300 K. The results for thin films are also presented, and it is shown that the phonon thermal conductivity can be efficiently limited by size. Our findings provide a deep understanding of thermal transport in metallic TMNs and expand the scope of heat conduction theory in metals.

preprint · 2020 · arXiv

InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache

Internet-scale web applications are becoming increasingly storage-intensive and rely heavily on in-memory object caching to attain required I/O performance. We argue that the emerging serverless computing paradigm provides a well-suited, cost-effective platform for object caching. We present InfiniCache, a first-of-its-kind in-memory object caching system that is completely built and deployed atop ephemeral serverless functions. InfiniCache exploits and orchestrates serverless functions' memory resources to enable elastic pay-per-use caching. InfiniCache's design combines erasure coding, intelligent billed duration control, and an efficient data backup mechanism to maximize data availability and cost-effectiveness while balancing the risk of losing cached state and performance. We implement InfiniCache on AWS Lambda and show that it: (1) achieves 31 to 96x tenant-side cost savings compared to AWS ElastiCache for a large-object-only production workload, (2) can effectively provide 95.4% data availability for each one-hour window, and (3) enables performance comparable to that of a typical in-memory cache.
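The abstract names erasure coding as one ingredient for keeping cached objects available when a serverless function is reclaimed. The sketch below illustrates the general idea with the simplest possible code, a single XOR parity shard that can rebuild any one lost shard. This is a deliberately simplified stand-in and not InfiniCache's actual coding scheme; the function names (`encode`, `recover`) and the sample payload are assumptions made for this example.

```python
def split_chunks(data: bytes, k: int):
    """Split data into k equal-size chunks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Return k data shards plus one XOR parity shard."""
    chunks = split_chunks(data, k)
    parity = chunks[0]
    for chunk in chunks[1:]:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def recover(shards):
    """Rebuild the single missing shard (marked None) by XOR-ing the rest."""
    missing = shards.index(None)
    present = [s for s in shards if s is not None]
    rebuilt = present[0]
    for shard in present[1:]:
        rebuilt = xor_bytes(rebuilt, shard)
    restored = list(shards)
    restored[missing] = rebuilt
    return restored


# Hypothetical object spread over 4 data shards and 1 parity shard,
# as if each shard lived in a separate function's memory.
obj = b"cached object payload"
shards = encode(obj, 4)

# Simulate losing one function (and the shard it held).
damaged = list(shards)
damaged[2] = None

restored = recover(damaged)
original = b"".join(restored[:4])[:len(obj)]
print(original)
```

A single parity shard tolerates only one loss; tolerating several simultaneous losses requires a stronger code such as Reed-Solomon, at the cost of more parity shards.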