Researcher profile

Binbin Shi

Binbin Shi contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
7 works
0 followers
4 topics
4 close collaborators

Published work

7 published item(s)

Preprint | 2026 | arXiv

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs (Gemini 2.5 Pro, GPT-5, o3, GLM-4.5V, Qwen 2.5 VL) on Medical VQA along two trust-relevant axes. Perception: all models localize anatomical and pathological targets poorly (the best model reaches only 0.23 mean IoU and 19.1% Acc@0.5) and exhibit clinically dangerous laterality confusion. Pipeline integration: a self-grounding pipeline, where the same model localizes then answers, degrades VQA accuracy for every model, driven by both inaccurate localization and format-compliance failures under the two-step prompt (parse failure rises to 70%-99% for Gemini and GPT-5 on VQA-RAD). Replacing predicted boxes with ground-truth annotations recovers and improves VQA accuracy, consistent with the failure residing in the perception module rather than in the decomposition itself. These observational findings identify grounding quality as a primary trustworthiness bottleneck in our SLAKE bounding-box setting. As a complementary fine-tuning follow-up, supervised fine-tuning of Qwen 2.5 VL on combined Med-VQA training data attains the highest reported SLAKE open-ended recall (85.5%) among comparable methods, suggesting that the VQA-level gap is tractable with domain adaptation; whether this also closes the perception/trustworthiness bottleneck is left to future work.
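
The two grounding metrics quoted in this abstract, mean IoU and Acc@0.5, can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and counts a prediction as correct when its IoU is at least 0.5; the function names and the exact thresholding convention are illustrative assumptions, not taken from the paper.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(preds, gts, threshold=0.5):
    """Return (mean IoU, fraction of predictions with IoU >= threshold)."""
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    mean_iou = sum(ious) / len(ious)
    acc = sum(iou >= threshold for iou in ious) / len(ious)
    return mean_iou, acc
```

Under this reading, a mean IoU of 0.23 means that predicted and annotated boxes overlap on average by less than a quarter of their combined area.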

Preprint | 2026 | arXiv

Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering

Medical retrieval-augmented generation (RAG) systems typically operate on text chunks extracted from biomedical literature, discarding the rich visual content (tables, figures, structured layouts) of original document pages. We propose MED-VRAG, an iterative multimodal RAG framework that retrieves and reasons over PMC document page images instead of OCR'd text. The system pairs ColQwen2.5 patch-level page embeddings with a sharded MapReduce LLM filter, scaling to ~350K pages while keeping Stage-1 retrieval under 30 ms via an offline coarse-to-fine index (C=8 centroids per page, ANN over centroids, exact two-way scoring on the top-R shortlist). A vision-language model (VLM) then iteratively refines its query and accumulates evidence in a memory bank across up to 3 reasoning rounds, with a single iteration costing ~15.9 s and the full three-round pipeline ~47.8 s on 4×A100. Across four medical QA benchmarks (MedQA, MedMCQA, PubMedQA, MMLU-Med), MED-VRAG reaches 78.6% average accuracy. Under controlled comparison with the same Qwen2.5-VL-32B backbone, retrieval contributes a +5.8 point gain over the no-retrieval baseline; we also note a +1.8 point edge over MedRAG + GPT-4 (76.8%), with the caveat that this is a cross-paper rather than head-to-head comparison. Ablations isolate +1.0 from page-image versus text-chunk retrieval, +1.5 from iteration, and +1.0 from the memory bank.
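
The coarse-to-fine index described in this abstract follows a shortlist-then-rescore pattern, which the sketch below illustrates under several assumptions. The brute-force centroid scan stands in for the ANN search, a one-directional MaxSim late-interaction score stands in for the "exact two-way scoring" (whose definition the abstract does not give), and the index layout and all names are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def maxsim(query_vecs, page_vecs):
    """Late-interaction score: each query vector keeps its best page match."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def coarse_to_fine_search(query_vecs, index, shortlist_size, top_k):
    """Rank pages by a cheap centroid score, then rescore a shortlist exactly.

    `index` maps page id -> {"centroids": [...], "patches": [...]}
    (a hypothetical layout chosen for this sketch).
    """
    # Stage A: score against the few centroids per page. A real system
    # would use an ANN index here instead of scanning every page.
    coarse = sorted(
        index,
        key=lambda pid: maxsim(query_vecs, index[pid]["centroids"]),
        reverse=True,
    )
    shortlist = coarse[:shortlist_size]
    # Stage B: exact scoring over the full patch embeddings, shortlist only.
    exact = sorted(
        shortlist,
        key=lambda pid: maxsim(query_vecs, index[pid]["patches"]),
        reverse=True,
    )
    return exact[:top_k]
```

The design trade-off is that the shortlist size bounds the cost of exact scoring, at the risk of dropping a page whose centroids represent it poorly.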

Preprint | 2026 | arXiv

Structured Personality Control and Adaptation for LLM Agents

Large Language Models (LLMs) are increasingly shaping human-computer interaction (HCI), from personalized assistants to social simulations. Beyond language competence, researchers are exploring whether LLMs can exhibit human-like characteristics that influence engagement, decision-making, and perceived realism. Personality, in particular, is critical, yet existing approaches often struggle to achieve both nuanced and adaptable expression. We present a framework that models LLM personality via Jungian psychological types, integrating three mechanisms: a dominant-auxiliary coordination mechanism for coherent core expression, a reinforcement-compensation mechanism for temporary adaptation to context, and a reflection mechanism that drives long-term personality evolution. This design allows the agent to maintain nuanced traits while dynamically adjusting to interaction demands and gradually updating its underlying structure. Personality alignment is evaluated using Myers-Briggs Type Indicator questionnaires and tested under diverse challenge scenarios as a preliminary structured assessment. Findings suggest that evolving, personality-aware LLMs can support coherent, context-sensitive interactions, enabling naturalistic agent design in HCI.

Preprint | 2023 | arXiv

Suppression of blow-up in 3-D Keller-Segel model via Couette flow in whole space

In this paper, we study the 3-D parabolic-parabolic and parabolic-elliptic Keller-Segel models with Couette flow in $\mathbb{R}^3$. We prove that blow-up of solutions can be suppressed by the enhanced dissipation of large Couette flows. We develop a Green's function method to describe the enhanced dissipation via a more precise space-time structure and obtain global existence together with pointwise estimates of the solutions. The result shows that, in the whole space, enhanced dissipation exists for all frequencies, which is the reason we obtain global existence for the 3-D Keller-Segel models here. This is totally different from the case with the periodic spatial variable $x$ in [2,10]. The paper provides a new methodology for capturing dissipation enhancement and a surprising result that reveals a totally new mechanism.
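
For orientation, a commonly studied form of the Keller-Segel system advected by a Couette flow $(Ay, 0, 0)$ is sketched below. The symbols $n$ (cell density), $c$ (chemoattractant concentration), and $A$ (flow amplitude), as well as the normalization and the lower-order terms in the equation for $c$, are assumptions made for illustration; the paper's own formulation may differ.

```latex
% Density equation: transport by the Couette flow, diffusion, chemotactic drift.
\partial_t n + A y\,\partial_x n = \Delta n - \nabla\cdot\left(n\,\nabla c\right),
\qquad (x, y, z) \in \mathbb{R}^3,\ t > 0,

% Parabolic-elliptic closure (one typical choice):
-\Delta c = n,

% Parabolic-parabolic closure (one typical choice):
\partial_t c + A y\,\partial_x c = \Delta c + n - c.
```

In this picture, "large Couette flow" corresponds to large $A$: the shear term $A y\,\partial_x n$ transfers energy to small scales, where diffusion acts faster, which is the enhanced dissipation the abstract refers to.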

Preprint | 2021 | arXiv

Suppression of blow up by mixing in generalized Keller-Segel system with fractional dissipation and strong singular kernel

In this paper, we consider the Cauchy problem for a generalized parabolic-elliptic Keller-Segel equation with fractional dissipation and advection by a weakly mixing flow (see Definition \ref{def:2.4}). Here the attractive kernel has a strong singularity; namely, a derivative appears in the nonlinear term through a singular integral. Without advection, the solution of the equation blows up in finite time. Under a suitable mixing condition on the advection, we show the global existence of classical solutions with large initial data in the case where the order of the derivative in the dissipative term is higher than that in the nonlinear term. Since the attractive kernel is strongly singular, the weak mixing has a destabilizing effect in addition to the enhanced dissipation effect, which makes the problem more complicated and difficult. We establish an $L^\infty$-criterion and obtain a global $L^\infty$ estimate of the solution through some new ideas and techniques. Combined with \cite{Shi.2019}, this covers all cases of the generalized Keller-Segel system with mixing effect, which was proposed by Kiselev and Xu (see \cite{Kiselev.2016}) and Hopf and Rodrigo (see \cite{Hopf.2018}). Based on a more precise estimate of the solution and a resolvent estimate of the semigroup operator, we introduce a new method to study the enhanced dissipation effect of mixing in the generalized parabolic-elliptic Keller-Segel equation with fractional dissipation. The RAGE theorem is no longer needed in our analysis.

Preprint | 2020 | arXiv

Deep Mask For X-ray Based Heart Disease Classification

We build a deep learning model to detect and classify heart disease from X-ray images. We collect data from several hospitals and public datasets. After preprocessing, we obtain 3026 images covering the disease types VSD, ASD, and TOF, together with normal controls. The main problem is to enable the network to learn the characteristics of the heart accurately, so that reliability is ensured while accuracy increases. Drawing on doctors' diagnostic experience, labeling the images, and using tools to extract masks of the heart region, we train a U-Net to generate a mask that directs the model's attention. This forces the model to focus on the characteristics of the heart region and yields more reliable results.
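
The abstract does not say how the generated mask is fed to the classifier. One common option, assumed here purely for illustration, is to weight each pixel by a soft mask value in [0, 1] before classification. The function name, the nested-list image layout, and the `floor` parameter are all hypothetical.

```python
def apply_soft_mask(image, mask, floor=0.0):
    """Scale each pixel by its mask weight.

    `image` and `mask` are equally sized 2-D lists; mask values lie in [0, 1].
    `floor` keeps a fraction of the background visible (0.0 removes it fully).
    This weighting scheme is an assumption, not the paper's stated method.
    """
    if len(image) != len(mask) or any(
        len(row) != len(mrow) for row, mrow in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [px * (floor + (1.0 - floor) * w) for px, w in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
```

With `floor=0.0`, pixels outside the heart region are zeroed out, so a downstream classifier can only draw on the masked region.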