Source author record

Stan Z. Li

Stan Z. Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computation and Language Computational Engineering, Finance, and Science Quantitative Methods

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Discovering the Representation Bottleneck of Graph Neural Networks

Graph neural networks (GNNs) rely mainly on the message-passing paradigm to propagate node features and build interactions, and different graph learning problems require different ranges of node interactions. In this work, we explore the capacity of GNNs to capture node interactions under contexts of different complexities. We discover that GNNs usually fail to capture the most informative kinds of interaction styles for diverse graph learning tasks, and thus name this phenomenon GNNs' representation bottleneck. As a response, we demonstrate that the inductive bias introduced by existing graph construction mechanisms can result in this representation bottleneck, \emph{i.e.}, preventing GNNs from learning interactions of the most appropriate complexity. To address that limitation, we propose a novel graph rewiring approach based on interaction patterns learned by GNNs to dynamically adjust each node's receptive fields. Extensive experiments on both real-world and synthetic datasets prove the effectiveness of our algorithm in alleviating the representation bottleneck and its superiority in enhancing the performance of GNNs over state-of-the-art graph rewiring baselines.

preprint2026arXiv

Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling

Protein-protein interaction (PPI) represents a central challenge within the biology field, and accurately predicting the consequences of mutations in this context is crucial for drug design and protein engineering. Deep learning (DL) has shown promise in forecasting the effects of such mutations, but is hindered by two primary constraints. First, the structures of mutant proteins are often elusive to acquire. Secondly, PPI takes place dynamically, which is rarely integrated into the DL architecture design. To address these obstacles, we present a novel framework named Refine-PPI with two key enhancements. First, we introduce a structure refinement module trained by a mask mutation modeling (MMM) task on available wild-type structures, which is then transferred to produce the inaccessible mutant structures. Second, we employ a new kind of geometric network, called the probability density cloud network (PDC-Net), to capture 3D dynamic variations and encode the atomic uncertainty associated with PPI. Comprehensive experiments on SKEMPI.v2 substantiate the superiority of Refine-PPI over all existing tools for predicting free energy change. These findings underscore the effectiveness of our hallucination strategy and the PDC module in addressing the absence of mutant protein structure and modeling geometric uncertainty.

preprint2026arXiv

InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem?

The integration of sentences poses an intriguing challenge within the realm of NLP, but it has not garnered the attention it deserves. Existing methods that focus on sentence arrangement, textual consistency, and question answering are inadequate in addressing this issue. To bridge this gap, we introduce InsertGNN, which conceptualizes the problem as a graph and employs a hierarchical Graph Neural Network (GNN) to comprehend the interplay between sentences. Our approach was rigorously evaluated on a TOEFL dataset, and its efficacy was further validated on the expansive arXiv dataset using cross-domain learning. Thorough experimentation unequivocally establishes InsertGNN's superiority over all comparative benchmarks, achieving an impressive 70% accuracy, a performance on par with average human test scores.

preprint2026arXiv

Instructor-inspired Machine Learning for Robust Molecular Property Prediction

Machine learning catalyzes a revolution in chemical and biological science. However, its efficacy heavily depends on the availability of labeled data, and annotating biochemical data is extremely laborious. To surmount this data sparsity challenge, we present an instructive learning algorithm named InstructMol to measure pseudo-labels' reliability and help the target model leverage large-scale unlabeled data. InstructMol does not require transferring knowledge between multiple domains, which avoids the potential gap between the pretraining and fine-tuning stages. We demonstrated the high accuracy of InstructMol on several real-world molecular datasets and out-of-distribution (OOD) benchmarks. Code is available at~ https://github.com/smiles724/InstructMol.

preprint2026arXiv

scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement

A critical challenge in single-cell RNA sequencing (scRNA-seq) integration is resolving the tension between eliminating batch effects and maintaining biological fidelity. While recent evidence indicates that batch effects manifest heterogeneously across genes, most existing methods process the transcriptome uniformly, frequently resulting in over-correction and loss of subtle biological signals. To address this, we present scHelix, a dataset-adaptive framework that fundamentally changes how features are processed by explicitly partitioning genes into domain-invariant Anchors and domain-sensitive Variants at the input level. scHelix utilizes a dual-stream sparse diffusion encoder equipped with stop-gradient graph caching to efficiently learn multi-scale structural representations. The core of our approach is a novel asymmetric Align-Refine-Fuse protocol: the unstable Variant stream is first aligned to the robust topology of the Anchor stream, followed by a conservative refinement phase where the Anchor stream absorbs denoised details via bounded residual gating. This divide-and-conquer architecture prevents shortcut learning and ensures robust batch removal without compromising the integrity of biological clusters. Extensive benchmarking demonstrates that scHelix outperforms state-of-the-art methods.

Stan Z. Li

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Discovering the Representation Bottleneck of Graph Neural Networks

Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling

InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem?

Instructor-inspired Machine Learning for Robust Molecular Property Prediction

scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement