Researcher profile

Yangshuai Wang

Yangshuai Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

preprint | 2026 | arXiv

Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework

Protein-Text Question Answering (QA) is crucial for interpreting biological sequences through natural language. Integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), which efficiently leverages biological databases and facilitates reasoning, offers a potent approach to this task. However, constrained by the standard RAG pipeline, these models often rely on curated, static datasets instead of expert-proven biological workflows; they lack fine-grained information processing and struggle to generalize to novel out-of-distribution (OOD) proteins. To bridge this gap, we propose 2D-ProteinRAG, a novel framework that empowers LLMs to operate within the gold-standard biological research workflow (BLAST). To further extract high-quality information from noisy retrieval contexts, we introduce a dual-dimensional (2D) filtering strategy that follows expert analytical paradigms. Horizontal Fine-grained Attribute Alignment utilizes a lightweight, intent-aware discriminative filter to prune irrelevant metadata and align database entries with specific user queries. Vertical Homology-based Semantic Denoising resolves functional contradictions and redundancy across multiple homologs via hierarchical clustering. Extensive evaluations on both In-Distribution and diverse biological OOD benchmarks demonstrate that 2D-ProteinRAG consistently achieves state-of-the-art performance, outperforming fine-tuned baselines and other RAG methods. Our results validate the framework's robustness and scalability, providing a practical solution for interpreting protein functions in real-world scientific scenarios.

preprint | 2022 | arXiv

A framework for a generalisation analysis of machine-learned interatomic potentials

Machine-learned interatomic potentials (MLIPs) and force fields (i.e. interaction laws for atoms and molecules) are typically trained on limited data-sets that cover only a very small section of the full space of possible input structures. MLIPs are nevertheless capable of making accurate predictions of forces and energies in simulations involving (seemingly) much more complex structures. In this article we propose a framework within which this kind of generalisation can be rigorously understood. As a prototypical example, we apply the framework to the case of simulating point defects in a crystalline solid. Here, we demonstrate how the accuracy of the simulation depends explicitly on the size of the training structures, on the kind of observations (e.g., energies, forces, force constants, virials) to which the model has been fitted, and on the fit accuracy. The new theoretical insights we gain partially justify current best practices in the MLIP literature and in addition suggest a new approach to the collection of training data and the design of loss functions.

preprint | 2022 | arXiv

Adaptive Multigrid Strategy for Geometry Optimization of Large-Scale Three Dimensional Molecular Mechanics

In this paper, we present an efficient adaptive multigrid strategy for the geometry optimization of large-scale three-dimensional molecular mechanics. The resulting method can achieve significantly reduced complexity by exploiting the intrinsic low-rank property of the material configurations and by combining the state-of-the-art adaptive techniques with the hierarchical structure of multigrid algorithms. To be more precise, we develop a one-way multigrid method with adaptive atomistic/continuum (a/c) coupling, e.g., blended ghost force correction (BGFC) approximations with gradient-based a posteriori error estimators on the coarse levels. We utilize state-of-the-art 3D mesh generation techniques to effectively implement the method. For 3D crystalline defects, such as vacancies, micro-cracks and dislocations, compared with brute-force optimization, complexity with superior rates can be observed numerically, and the strategy has a five-fold acceleration in terms of CPU time for systems with $10^8$ atoms.

preprint | 2022 | arXiv

Theoretical Study of Elastic Far-Field Decay from Dislocations in Multilattices

We precisely and rigorously characterise the decay of elastic fields generated by dislocations in crystalline materials, focusing specifically on the role of multilattices. Concretely, we establish that the elastic field generated by a dislocation in a multilattice can be decomposed into a continuum field predicted by a linearised Cauchy-Born elasticity theory, and a discrete and nonlinear core corrector representing the defect core. We demonstrate both analytically and numerically the consequences of this result for cell size effects in numerical simulations.

preprint | 2020 | arXiv

A Posteriori Error Estimates for Adaptive QM/MM Coupling Methods

Hybrid quantum/molecular mechanics models (QM/MM methods) are widely used in material and molecular simulations when MM models do not provide sufficient accuracy but pure QM models are computationally prohibitive. Adaptive QM/MM coupling methods feature on-the-fly classification of atoms during the simulation, allowing the QM and MM subsystems to be updated as needed. In this work, we propose such an adaptive QM/MM method for material defect simulations based on a new residual-based a posteriori error estimator, which provides both lower and upper bounds for the true error. We validate the analysis and illustrate the effectiveness of the new scheme on numerical simulations for material defects.