Researcher profile

Minghao Guo

Minghao Guo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

Hierarchical Transformer Preconditioning for Interactive Physics Simulation

Neural preconditioners for real-time physics simulation offer promising data-driven priors, but they often fail to capture long-range couplings efficiently because they inherit local message passing or sparse-operator access patterns. We introduce the Hierarchical Transformer Preconditioner, a neural preconditioner anchored to a weak-admissibility H-matrix partition. The partition provides a multiscale structural prior (dense diagonal leaves plus coarsening off-diagonal tiles) that enables full-graph approximate-inverse computation with O(N) scaling at fixed block sizes. The network models the inverse through low-rank far-field factors and uses highway connections (axial buffers plus a global summary token) to propagate context across transformer depth. At each PCG iteration, preconditioner application reduces to batched dense GEMMs with regular memory access. The key training contribution is a cosine-Hutchinson probe objective that learns the action of MA on convergence-critical spectral subspaces, optimizing angular alignment of MAz with z rather than forcing eigenvalue clusters to a prescribed location. This removes unnecessary spectral-placement constraints from SAI-style objectives and improves conditioning on irregular spectra. Because both inference and apply are dense, dependency-free tensor programs, the full solve loop is captured as a single CUDA Graph. On stiff multiphase Poisson systems (up to 100:1 density contrast, N = 1,024-16,384), the solver runs from ~143 to ~21 fps. At N = 8,192, it reaches 17.9 ms/frame, with 2.2x speedup over GPU Jacobi, ~28x over GPU IC/DILU (AMGX multicolor_dilu), and 2.7x over neural SPAI retrained per scale on the same benchmark.

preprint2026arXiv

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers to be inferred without preserving the fine-grained visual evidence. Meanwhile, harder cases that require reasoning over changing visual states are largely absent. Therefore, we introduce MemEye, a framework that evaluates memory capabilities from two dimensions: one measures the granularity of decisive visual evidence (from scene-level to pixel-level evidence), and the other measures how retrieved evidence must be used (from single evidence to evolutionary synthesis). Under this framework, we construct a new benchmark across 8 life-scenario tasks, with ablation-driven validation gates for assessing answerability, shortcut resistance, visual necessity, and reasoning structure. By evaluating 13 memory methods across 4 VLM backbones, we show that current architectures still struggle to preserve fine-grained visual details and reason about state changes over time. Our findings show that long-term multimodal memory depends on evidence routing, temporal tracking, and detail extraction.

preprint2026arXiv

Neural Statistical Functions

Classical deep learning typically operates on individual cases. Despite its success, real-world usage often requires repeated inference to estimate statistical quantities for complex decision-making tasks involving uncertainty or extreme-value analysis, resulting in substantial latency. We introduce neural statistical functions, a new family of models learned from pre-trained single-sample predictors and scattered data samples, which can directly infer statistics over continuous operating condition ranges without explicit sampling. By introducing the notion of prefix statistics, we transform and unify diverse statistical functions (e.g., integrals, quantiles, and maxima) into an interval-conditional framework, in which a principled identity between the prefix statistics and the individual-case regression serves as the learning objective. Neural statistical functions achieve strong performance in estimating essential statistics of complex physical processes, including accumulated energy in dynamical systems, quantiles of aerodynamic responses, and maximum stress in crash processes, while achieving up to a 100$\times$ reduction in model evaluations.

preprint2026arXiv

RigidFormer: Learning Rigid Dynamics using Transformers

Learning-based simulation of multi-object rigid-body dynamics remains difficult because contact is discontinuous and errors compound over long horizons. Most existing methods remain tied to mesh connectivity and vertex-level message passing, which limits their applicability to mesh-free inputs such as point clouds and leads to high computational cost. Efficiently modeling high-fidelity rigid-body dynamics from mesh-free representations, therefore, remains challenging. We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. RigidFormer reasons at the object level and advances each object through compact anchors; Anchor-Vertex Pooling enriches these anchors with local vertex features, retaining contact-relevant geometry without dense vertex-level interaction. We propose Anchor-based RoPE to inject anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing while preserving shape extent. RigidFormer further enforces rigidity by projecting updates onto the rigid-body manifold using differentiable Kabsch alignment. On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs, runs faster, generalizes to unseen point resolutions and across datasets, and scales to 200+ objects; we also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.

preprint2026arXiv

Trust or Abstain? A Self-Aware RAG Approach

Retrieval-augmented generation (RAG) improves large language models (LLMs) by incorporating external evidence, but it also introduces knowledge conflicts when retrieved contextual knowledge (CK) and parametric knowledge (PK) disagree or are both unreliable. Existing approaches mainly coordinate which source to use, without explicitly asking whether each answer path is correct. We argue that faithful RAG requires LLM self-awareness, namely the ability to recognize the limits of its own knowledge and reasoning. To ground this problem, we construct a model-specific, ground-truth-aligned knowledge-conflict benchmark by evaluating LLM backbones on PK-only and CK-conditioned answer paths over approximately 69K query-context instances per backbone, drawn from five conflict-QA datasets. We then introduce SABER, a Self-Aware Belief Estimator for RAG that requires no LLM fine-tuning. SABER combines a self-prior with PK-side and CK-side conditional reasoning representations from multi-trace inference, then estimates reliability beliefs with two lightweight predictors to drive a 4-cell decision over trust PK, trust CK, trust either, or abstain. Across four LLM backbones, SABER improves end-to-end accuracy and conflict-specific faithfulness over ten inference-time and fine-tuning baselines, with the largest gains on conflict-heavy datasets. Under abstention, SABER's risk-coverage curve Pareto-dominates every prompt-based abstainer, providing a tunable balance between coverage and answer risk. Our code is available at https://github.com/xizhu1022/SABER.

preprint2026arXiv

WorldParticle: Unified Simulation of Lagrangian Particle Dynamics via Transformer

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based particle simulator built on a single transformer architecture to model cloth, elastic solds, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics. Our model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts the residual position and velocity updates through three stages: a particle tokenizer that encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder that hierarchically merges particle tokens into a compact set of super tokens via alternating self-attention and token merging; and a super-token decoder that lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost at successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.

preprint2022arXiv

Data-Efficient Graph Grammar Learning for Molecular Generation

The problem of molecular generation has received significant attention recently. Existing methods are typically based on deep neural networks and require training on large datasets with tens of thousands of samples. In practice, however, the size of class-specific chemical datasets is usually limited (e.g., dozens of samples) due to labor-intensive experimentation and data collection. This presents a considerable challenge for the deep learning generative models to comprehensively describe the molecular design space. Another major challenge is to generate only physically synthesizable molecules. This is a non-trivial task for neural network-based generative models since the relevant chemical knowledge can only be extracted and generalized from the limited training data. In this work, we propose a data-efficient generative model that can be learned from datasets with orders of magnitude smaller sizes than common benchmarks. At the heart of this method is a learnable graph grammar that generates molecules from a sequence of production rules. Without any human assistance, these production rules are automatically constructed from training data. Furthermore, additional chemical knowledge can be incorporated in the model by further grammar optimization. Our learned graph grammar yields state-of-the-art results on generating high-quality molecules for three monomer datasets that contain only ${\sim}20$ samples each. Our approach also achieves remarkable performance in a challenging polymer generation task with only $117$ training samples and is competitive against existing methods using $81$k data points. Code is available at https://github.com/gmh14/data_efficient_grammar.

preprint2020arXiv

When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks

Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks. Since then, extensive efforts have been devoted to enhancing the robustness of deep networks via specialized learning algorithms and loss functions. In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks. To obtain the large number of networks needed for this study, we adopt one-shot neural architecture search, training a large network for once and then finetuning the sub-networks sampled therefrom. The sampled architectures together with the accuracies they achieve provide a rich basis for our study. Our "robust architecture Odyssey" reveals several valuable observations: 1) densely connected patterns result in improved robustness; 2) under computational budget, adding convolution operations to direct connection edge is effective; 3) flow of solution procedure (FSP) matrix is a good indicator of network robustness. Based on these observations, we discover a family of robust architectures (RobNets). On various datasets, including CIFAR, SVHN, Tiny-ImageNet, and ImageNet, RobNets exhibit superior robustness performance to other widely used architectures. Notably, RobNets substantially improve the robust accuracy (~5% absolute gains) under both white-box and black-box attacks, even with fewer parameter numbers. Code is available at https://github.com/gmh14/RobNets.

preprint2019arXiv

A New Channel of Bulge Formation via The Destruction of Short Bars

Short (inner) bars of sub-kiloparsec radius have been hypothesized to be an important mechanism for driving gas inflows to small scales, thus feeding central black holes. Recent numerical simulations have shown that the growth of central black holes in galaxies can destroy short bars, when the black hole reaches a mass of $\sim 0.1\%$ of the total stellar mass of the galaxy. We study $N$-body simulations of galaxies with single and double bars to track the long-term evolution of the central stellar mass distribution. We find that the destruction of the short bar contributes significantly to the growth of the bulge. The final bulge mass is roughly equal to the sum of the masses of the initial pseudo bulge and short bar. The initially boxy/peanut-shaped bulge of Sérsic index $n\lesssim1$ is transformed into a more massive, compact structure that bears many similarities to a classical bulge, in terms of its morphology ($n \approx 2$), kinematics (dispersion-dominated, isotropic), and location on standard scaling relations (Kormendy relation, mass-size relation, and correlations between black hole mass and bulge stellar mass and velocity dispersion). Our proposed channel for forming classical bulges relies solely on the destruction of short bars without any reliance on mergers. We suggest that some of the less massive, less compact classical bulges were formed in this manner.

preprint2019arXiv

Generation of abdominal synthetic CTs from 0.35T MR images using generative adversarial networks for MR-only liver radiotherapy

Electron density maps must be accurately estimated to achieve valid dose calculation in MR-only radiotherapy. The goal of this study is to assess whether two deep learning models, the conditional generative adversarial network (cGAN) and the cycle-consistent generative adversarial network (cycleGAN), can generate accurate abdominal synthetic CT (sCT) images from 0.35T MR images for MR-only liver radiotherapy. A retrospective study was performed using CT images and 0.35T MR images of 12 patients with liver (n=8) and non-liver abdominal (n=4) cancer. CT images were deformably registered to the corresponding MR images to generate deformed CT (dCT) images for treatment planning. Both cGAN and cycleGAN were trained using MR and dCT transverse slices. Four-fold cross-validation testing was conducted to generate sCT images for all patients. The HU prediction accuracy was evaluated by voxel-wise similarity metric between each dCT and sCT image for all 12 patients. dCT-based and sCT-based dose distributions were compared using gamma and dose-volume histogram (DVH) metric analysis for 8 liver patients. sCTcycleGAN achieved the average mean absolute error (MAE) of 94.1 HU, while sCTcGAN achieved 89.8 HU. In both models, the average gamma passing rates within all volumes of interest were higher than 95% using a 2%, 2 mm criterion, and 99% using a 3%, 3 mm criterion. The average differences in the mean dose and DVH metrics were within +/-0.6% for the planning target volume and within +/-0.15% for evaluated organs in both models. Results demonstrated that abdominal sCT images generated by both cGAN and cycleGAN achieved accurate dose calculation for 8 liver radiotherapy plans. sCTcGAN images had smaller average MAE and achieved better dose calculation accuracy than sCTcyleGAN images. More abdominal patients will be enrolled in the future to further evaluate two models.