Source author record

Shuai Guo

Shuai Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Biomolecules, eess.AS, Machine Learning, Sound, Applications, Artificial Intelligence, Cell Behavior, Computation, Computation and Language, math.AG, Methodology, Multimedia, physics.chem-ph

Catalog footprint

What is connected

11 works

13 topics

4 close collaborators

Preprint · 2026 · arXiv

MLB: A Scenario-Driven Benchmark for Evaluating Large Language Models in Clinical Applications

The proliferation of Large Language Models (LLMs) presents transformative potential for healthcare, yet practical deployment is hindered by the absence of frameworks that assess real-world clinical utility. Existing benchmarks test static knowledge, failing to capture the dynamic, application-oriented capabilities required in clinical practice. To bridge this gap, we introduce the Medical LLM Benchmark (MLB), a comprehensive benchmark evaluating LLMs on both foundational knowledge and scenario-based reasoning. MLB is structured around five core dimensions: Medical Knowledge (MedKQA), Safety and Ethics (MedSE), Medical Record Understanding (MedRU), Smart Services (SmartServ), and Smart Healthcare (SmartCare). The benchmark integrates 22 datasets (17 newly curated) from diverse Chinese clinical sources, covering 64 clinical specialties. Its design features a rigorous curation pipeline involving 300 licensed physicians. In addition, we provide a scalable evaluation methodology centered on a specialized judge model trained via Supervised Fine-Tuning (SFT) on expert annotations. Our comprehensive evaluation of 10 leading models reveals a critical translational gap: while the top-ranked model, Kimi-K2-Instruct (77.3% accuracy overall), excels in structured tasks like information extraction (87.8% accuracy in MedRU), its performance plummets in patient-facing scenarios (61.3% in SmartServ). Moreover, the exceptional safety score (90.6% in MedSE) of the much smaller Baichuan-M2-32B highlights that targeted training is equally critical. Our specialized judge model, trained via SFT on a 19k expert-annotated medical dataset, achieves 92.1% accuracy, an F1-score of 94.37%, and a Cohen's Kappa of 81.3% for human-AI consistency, validating a reproducible and expert-aligned evaluation protocol. MLB thus provides a rigorous framework to guide the development of clinically viable LLMs.
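The three agreement statistics quoted for the judge model (accuracy, F1-score, Cohen's Kappa) can be illustrated with a minimal sketch. The label lists below are invented for illustration and are not taken from the MLB data; the function names are hypothetical, and the abstract does not say how the authors computed these figures.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of items on which the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """F1-score for the class `positive` (harmonic mean of precision and recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(rater_a) | set(rater_b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative labels only (1 = "response acceptable", 0 = "not acceptable").
human = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
judge = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

acc = accuracy(human, judge)
f1 = f1_binary(human, judge)
kappa = cohen_kappa(human, judge)
print(f"accuracy={acc:.3f} f1={f1:.3f} kappa={kappa:.3f}")
```

Kappa is lower than raw accuracy here because it discounts the agreement two raters would reach by chance given their label frequencies, which is why it is commonly reported alongside accuracy for human-AI consistency.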

Preprint · 2022 · arXiv

Exponential canonical correlation analysis with orthogonal variation

Canonical correlation analysis (CCA) is a standard tool for studying associations between two data sources; however, it is not designed for data with count or proportion measurement types. In addition, while CCA uncovers common signals, it does not elucidate which signals are unique to each data source. To address these challenges, we propose a new framework for CCA based on exponential families with explicit modeling of both common and source-specific signals. Unlike previous methods based on exponential families, the common signals from our model coincide with canonical variables in Gaussian CCA, and the unique signals are exactly orthogonal. These modeling differences lead to a non-trivial estimation via optimization with orthogonality constraints, for which we develop an iterative algorithm based on a splitting method. Simulations show on-par or superior performance of the proposed method compared to the available alternatives. We apply the method to analyze associations between gene expression and lipid concentrations in a nutrigenomic study, and to analyze associations between two distinct cell-type deconvolution methods in a prostate cancer tumor heterogeneity study.

Preprint · 2022 · arXiv

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. In addition, we also demonstrate several advanced usages based on the toolkit functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.

Preprint · 2022 · arXiv

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce the cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve the performance on both objective and subjective evaluations.
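For readers unfamiliar with mix-up augmentation, the following is a minimal sketch of the generic technique: two training examples are blended with a weight drawn from a Beta distribution. It is not the SVS-customized variant the abstract refers to, whose details are not given here; the function name, the `alpha` default, and the toy feature vectors are all assumptions made for illustration.

```python
import random

def mixup(x1, x2, y1, y2, alpha=0.2, rng=None):
    """Generic mix-up: convex combination of two (input, target) pairs.

    x1, x2: input feature vectors of equal length (lists of floats)
    y1, y2: target vectors of equal length (lists of floats)
    alpha:  Beta(alpha, alpha) shape parameter (hypothetical default)
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Toy vectors standing in for input features and acoustic targets.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([0.0, 1.0, 2.0], [2.0, 1.0, 0.0],
                          [1.0, 0.0], [0.0, 1.0], rng=rng)
print(lam, x_mix, y_mix)
```

A small `alpha` concentrates the mixing weight near 0 or 1, so most blended examples stay close to one of the two originals.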

Preprint · 2022 · arXiv

The low-entropy hydration shell at the binding site of spike RBD determines the contagiousness of SARS-CoV-2 variants

The infectivity of SARS-CoV-2 depends on the binding affinity of the receptor-binding domain (RBD) of the spike protein with the angiotensin-converting enzyme 2 (ACE2) receptor. The calculated RBD-ACE2 binding energies indicate that the difference in transmission efficiency of SARS-CoV-2 variants cannot be fully explained by electrostatic interactions, hydrogen-bond interactions, van der Waals interactions, internal energy, and nonpolar solvation energies. Here, we demonstrate that low-entropy regions of hydration shells around proteins drive hydrophobic attraction between shape-matched low-entropy regions of the hydration shells, which essentially coordinates protein-protein binding in the rotational-configurational space of mutual orientations and determines the binding affinity. An innovative method was used to identify the low-entropy regions of the hydration shells of the RBDs of multiple SARS-CoV-2 variants and of ACE2. We observed integral low-entropy regions of hydration shells covering the binding sites of the RBDs and matching in shape the low-entropy region of the hydration shell at the binding site of ACE2. The RBD-ACE2 binding is thus found to be guided by hydrophobic collapse between the shape-matched low-entropy regions of the hydration shells. A measure of the low entropy of the hydration shells can be obtained by counting the number of hydrophilic groups expressing hydrophilicity within the binding sites. The low-entropy level of hydration shells at the binding site of a spike protein is found to be an important indicator of the contagiousness of the coronavirus.

Preprint · 2021 · arXiv

Hydrophobic interaction determines docking affinity of SARS CoV 2 variants with antibodies

Preliminary epidemiologic, phylogenetic and clinical findings suggest that several novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have increased transmissibility and decreased the efficacy of several existing vaccines. Four mutations in the receptor-binding domain (RBD) of the spike protein are reported to contribute to increased transmission. Understanding the physical mechanism responsible for the affinity enhancement between the SARS-CoV-2 variants and ACE2 is an "urgent challenge" for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Based on a hydrophobic-interaction-based protein docking mechanism, this study reveals that the mutation N501Y clearly increases the hydrophobic attraction and decreases the hydrophilic repulsion between the RBD and ACE2, which most likely caused the increased transmissibility of the variants. By analyzing the mutation-induced hydrophobic surface changes in the attraction and repulsion at the binding site of the complexes of the SARS-CoV-2 variants and antibodies, we found that the mutations N501Y, E484K, K417N and L452R can all selectively decrease or increase binding affinity with some antibodies.

Preprint · 2021 · arXiv

Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity. However, we often encounter data limitation problems in building SVS systems because of high data acquisition and annotation costs. In this work, we propose a Perceptual Entropy (PE) loss derived from a psycho-acoustic hearing model to regularize the network. With a one-hour open-source singing voice database, we explore the impact of the PE loss on various mainstream sequence-to-sequence models, including RNN-based, transformer-based, and conformer-based models. Our experiments show that the PE loss can mitigate the over-fitting problem and significantly improve the synthesized singing quality, as reflected in objective and subjective evaluations.

Preprint · 2020 · arXiv

A hydrophobic-interaction-based mechanism trigger docking between the SARS CoV 2 spike and angiotensin-converting enzyme 2

A recent experimental study found that the binding affinity between the cellular receptor human angiotensin-converting enzyme 2 (ACE2) and the receptor-binding domain (RBD) in the spike (S) protein of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is more than 10-fold higher than that of the original severe acute respiratory syndrome coronavirus (SARS-CoV). However, the main-chain structures of the SARS-CoV-2 RBD are almost the same as those of the SARS-CoV RBD. Understanding the physical mechanism responsible for the outstanding affinity between the SARS-CoV-2 S and ACE2 is an "urgent challenge" for developing blockers, vaccines and therapeutic antibodies against the coronavirus disease 2019 (COVID-19) pandemic. Considering the mechanisms of hydrophobic interaction, hydration shell, surface tension, and the shielding effect of water molecules, this study reveals a hydrophobic-interaction-based mechanism by means of which SARS-CoV-2 S and ACE2 bind together in an aqueous environment. The hydrophobic interaction between the SARS-CoV-2 S and ACE2 proteins is found to be significantly greater than that between SARS-CoV S and ACE2. At the docking site, the hydrophobic portions of the hydrophilic side chains of SARS-CoV-2 S are found to be involved in the hydrophobic interaction between SARS-CoV-2 S and ACE2. We propose a method to design live attenuated viruses by mutating several key amino acid residues of the spike protein to decrease the hydrophobic surface areas at the docking site. Mutation of a small number of residues can greatly reduce the hydrophobic binding of the coronavirus to the receptor, which may significantly reduce the infectivity and transmissibility of the virus.

Preprint · 2020 · arXiv

The role of hydrophobic interactions in folding of $β$-sheets

Exploring the protein-folding problem has been a long-standing challenge in molecular biology. Protein folding is highly dependent on the folding of secondary structures, which paves a native folding pathway. Here, we demonstrate that a large hydrophobic surface area covering most side-chains on one side or the other of adjacent $β$-strands of a $β$-sheet is prevalent in almost all experimentally determined $β$-sheets. This indicates that folding of $β$-sheets is most likely triggered by multistage hydrophobic interactions among neighboring side-chains of unfolded polypeptides, enabling $β$-sheets to fold reproducibly following explicit physical folding codes in aqueous environments. $β$-turns often contain five types of residues characterized by relatively small exposed hydrophobic proportions of their side-chains, which is explained by the ability of these residues to block the hydrophobic effect among neighboring side-chains in sequence. The temperature dependence of the folding of $β$-sheets is thus attributed to the temperature dependence of the strength of the hydrophobicity. The hydrophobic-effect-based mechanism responsible for $β$-sheet folding is verified by bioinformatics analyses of thousands of results available from experiments. The folding codes in the amino acid sequence that dictate formation of a $β$-hairpin can be deciphered by evaluating hydrophobic interactions among side-chains of an unfolded polypeptide from a $β$-strand-like thermodynamically metastable state.

Preprint · 2020 · arXiv

Theoretical evidence for new adsorption sites of CO$_2$ on the Ag electrode surface

Electrochemical reduction of CO$_2$ is considered an effective method to address the problem of global warming. The primary challenge in studying the mechanism is to determine the adsorption states of CO$_2$, since complicated metal surfaces often result in many different adsorption sites. Based on density functional theory (DFT) calculations, we performed a theoretical study on the adsorption of CO$_2$ on the Ag electrode surface. The results show that the adsorption populations of CO$_2$ are extremely sensitive to the adsorption sites. Importantly, we found that the preferable adsorption positions are the terrace sites, rather than the previously reported step sites. The adsorption populations were found to follow the order (211) > (110) > (111) > (100). Subsequently, the adsorption characteristics were correlated with the d-band theory and the charge transfers between Ag surfaces and CO$_2$.

Preprint · 2012 · arXiv

Gopakumar-Vafa BPS invariants, Hilbert schemes and quasimodular forms. I

We prove a closed formula for leading Gopakumar-Vafa BPS invariants of local Calabi-Yau geometries given by the canonical line bundles of toric Fano surfaces. It shares some features with the Goettsche-Yau-Zaslow formula: a connection with Hilbert schemes, a connection with quasimodular forms, and a quadratic property after a suitable transformation. In Part I of this paper we present the case of the projective plane; more general cases will be presented in Part II.
