Source author record

Yuhui Zhang

Yuhui Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: cond-mat.str-el, Computation and Language, Artificial Intelligence, Computer Vision, cs.CY, Machine Learning, physics.app-ph, eess.SP, Multimedia, physics.plasm-ph

Catalog footprint

What is connected

12 works

10 topics

4 close collaborators

preprint, 2026, arXiv

Anisotropic Modality Align

Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, this work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and provides a new representation alignment perspective for training multimodal models with unimodal data.

preprint, 2026, arXiv

RadDiff: Describing Differences in Radiology Image Sets with Natural Language

Understanding how two radiology image sets differ is critical for generating clinical insights and for interpreting medical AI systems. We introduce RadDiff, a multimodal agentic system that performs radiologist-style comparative reasoning to describe clinically meaningful differences between paired radiology studies. RadDiff builds on a proposer-ranker framework from VisDiff, and incorporates four innovations inspired by real diagnostic workflows: (1) medical knowledge injection through domain-adapted vision-language models; (2) multimodal reasoning that integrates images with their clinical reports; (3) iterative hypothesis refinement across multiple reasoning rounds; and (4) targeted visual search that localizes and zooms in on salient regions to capture subtle findings. To evaluate RadDiff, we construct RadDiffBench, a challenging benchmark comprising 57 expert-validated radiology study pairs with ground-truth difference descriptions. On RadDiffBench, RadDiff achieves 47% accuracy, and 50% accuracy when guided by ground-truth reports, significantly outperforming the general-domain VisDiff baseline. We further demonstrate RadDiff's versatility across diverse clinical tasks, including COVID-19 phenotype comparison, racial subgroup analysis, and discovery of survival-related imaging features. Together, RadDiff and RadDiffBench provide the first method-and-benchmark foundation for systematically uncovering meaningful differences in radiological data.

preprint, 2022, arXiv

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

preprint, 2021, arXiv

Collision dominated, ballistic, and viscous regimes of terahertz plasmonic detection by graphene

The terahertz detection performance and operating regimes of graphene plasmonic field-effect transistors (FETs) were investigated by a hydrodynamic model. Continuous wave detection simulations showed that the graphene response sensitivity is similar to that of other materials including Si, InGaAs, GaN, and diamond-based FETs. However, the pulse detection results indicated a very short response time, which favors rapid, high-sensitivity detection. The analysis of the mobility dependence of the response time revealed the same detection regimes as in traditional semiconductor materials, i.e. the non-resonant (collision dominated) regime, the resonant ballistic regime, and the viscous regime. When the kinematic viscosity (ν) is above a certain critical viscosity value, νNR, the plasmonic FETs always operate in the viscous non-resonant regime regardless of channel length (L). In this regime, the response time rises monotonically with the increase of L. When ν < νNR, the plasmonic resonance can be reached in a certain range of L (i.e. the resonant window). Within this window, the carrier transport is ballistic. For a sufficiently short channel, the graphene devices would always operate in the non-resonant regime regardless of the field-effect mobility, corresponding to another viscous regime. The above work mapped the operating regimes of graphene plasmonic FETs, and demonstrated the significance of the viscous effects for the graphene plasmonic detection. These results could be used for the extraction of the temperature dependences of viscosity in graphene.

preprint, 2020, arXiv

Biomedical and Clinical English Model Packages in the Stanza Python NLP Library

We introduce biomedical and clinical English model packages for the Stanza Python NLP library. These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text, by combining Stanza's fully neural architecture with a wide variety of open datasets as well as large-scale unsupervised biomedical and clinical text data. We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results. We further show that these models do not compromise speed compared to existing toolkits when GPU acceleration is available, and are made easy to download and use with Stanza's Python interface. A demonstration of our packages is available at: http://stanza.run/bio.

preprint, 2020, arXiv

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionality to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza.

preprint, 2020, arXiv

Ultrashort Pulse Detection and Response Time Analysis Using Plasma-wave Terahertz Field Effect Transistors

We report on the response characteristics of plasmonic terahertz field-effect transistors (TeraFETs) fed with femtosecond and picosecond pulses. Varying the pulse width (tpw) from $10^{-15}$ s to $10^{-10}$ s under a constant input power condition revealed two distinctive pulse detection modes. In the short pulse mode (tpw << L/s, where L is the gated channel length, s is the plasma velocity), the source-to-drain voltage response is a sharp pulse oscillatory decay preceded by a delay time on the order of L/s. The plasma wave travels along the channel like a shallow-water wave with a relatively narrow wave packet. In the long pulse mode (tpw > L/s), the response profile has two oscillatory decay processes and the propagation of the plasma wave is analogous to an oscillating rod with one side fixed. The ultimate response time at the long pulse mode is significantly higher than that under the short pulse conditions. The detection conditions under the long pulse mode are close to the step response condition, and the response time conforms well to the analytical theory for the step function response. The simulated waveform agrees well with the measured pulse response. Our results show that the measurements of the pulse response enable the material parameter extraction from the pulse response data (including the effective mass, kinematic viscosity and momentum relaxation time).
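As a rough illustration of the two detection modes named in this abstract, the sketch below compares a pulse width (tpw in the abstract) against the transit time L/s. The function names, the 0.1 cutoff standing in for "much shorter than", and the example channel length and plasma velocity are assumptions chosen for illustration; none of them come from the paper.

```python
def transit_time(channel_length_m: float, plasma_velocity_m_s: float) -> float:
    """Return L/s, the plasma-wave transit time across the gated channel, in seconds."""
    if channel_length_m <= 0 or plasma_velocity_m_s <= 0:
        raise ValueError("channel length and plasma velocity must be positive")
    return channel_length_m / plasma_velocity_m_s


def pulse_mode(
    t_pw_s: float,
    channel_length_m: float,
    plasma_velocity_m_s: float,
    short_factor: float = 0.1,  # assumed cutoff for "tpw << L/s"; not from the paper
) -> str:
    """Classify a pulse width as 'short', 'long', or 'intermediate'."""
    tau = transit_time(channel_length_m, plasma_velocity_m_s)
    if t_pw_s > tau:
        return "long"  # tpw > L/s
    if t_pw_s < short_factor * tau:
        return "short"  # tpw << L/s, under the assumed cutoff
    return "intermediate"  # between the two limits the abstract describes


if __name__ == "__main__":
    # Hypothetical device: L = 100 nm, s = 1e6 m/s, so L/s is about 1e-13 s.
    for t_pw in (1e-15, 5e-14, 1e-10):
        print(t_pw, pulse_mode(t_pw, 100e-9, 1e6))
```

For this hypothetical device, the two ends of the pulse-width range quoted in the abstract fall on opposite sides of L/s.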

preprint, 2015, arXiv

Entanglement Area Law in Disordered Free Fermion Anderson Model in One, Two, and Three Dimensions

We calculate numerically the entanglement entropy of free fermion ground states in one-, two- and three-dimensional Anderson models, and find that it obeys the area law as long as the linear size of the subsystem is sufficiently larger than the mean free path. This result holds in the metallic phase of the three-dimensional Anderson model, where the mean free path is finite although the localization length is infinite. The relation between the present results and earlier ones on area law violation in special one-dimensional models that support metallic phases is discussed.
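The abstract does not say how the entropy was computed. One standard route for free fermions is the correlation-matrix method, sketched here for a one-dimensional chain only. The function names, the hopping amplitude of -1, and the uniform on-site disorder in [-W/2, W/2] are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def anderson_chain(n_sites: int, disorder_strength: float, rng: np.random.Generator) -> np.ndarray:
    """Open 1D tight-binding chain: hopping -1, random on-site energies in [-W/2, W/2]."""
    h = np.zeros((n_sites, n_sites))
    idx = np.arange(n_sites - 1)
    h[idx, idx + 1] = -1.0
    h[idx + 1, idx] = -1.0
    half_width = disorder_strength / 2.0
    h[np.arange(n_sites), np.arange(n_sites)] = rng.uniform(-half_width, half_width, n_sites)
    return h


def entanglement_entropy(h: np.ndarray, n_particles: int, subsystem) -> float:
    """Entropy of a subsystem in the ground state that fills the lowest n_particles orbitals."""
    _, vecs = np.linalg.eigh(h)              # eigenvalues returned in ascending order
    occupied = vecs[:, :n_particles]         # columns are the filled single-particle orbitals
    corr = occupied @ occupied.conj().T      # C_ij = <c_i^dagger c_j>
    sub = np.asarray(subsystem)
    nu = np.linalg.eigvalsh(corr[np.ix_(sub, sub)])
    nu = np.clip(nu, 1e-12, 1.0 - 1e-12)     # avoid log(0) for fully filled or empty modes
    return float(-np.sum(nu * np.log(nu) + (1.0 - nu) * np.log(1.0 - nu)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = anderson_chain(n_sites=200, disorder_strength=2.0, rng=rng)
    for block in (10, 20, 40):
        # Block of `block` sites in the middle of the chain, at half filling.
        sites = np.arange(100 - block // 2, 100 + block // 2)
        print(block, entanglement_entropy(h, n_particles=100, subsystem=sites))
```

In one dimension the boundary of a block is two points, so an area law would show up as an entropy that stops growing with block size.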

preprint, 2014, arXiv

Existence of strong-pairing quantum Hall phase in bilayer cold atom systems with dipolar interactions

We study bilayer fermionic cold atom systems with dipolar interactions, as well as a two-component tunable pseudopotential (TCTP) model which keeps only the zeroth and first Haldane pseudopotentials, at total Landau level filling factor 1/2. Our numerical results on the TCTP model indicate that the Haldane-Rezayi state describes the critical point between strong and weak d-wave pairing quantum Hall phases. Further increasing the attractive zeroth pseudopotentials, the system transits from the strong-pairing phase to a stripe phase, and then to a cluster phase (or phase separation). The dipolar interaction can be mapped onto the TCTP model in the strong-pairing phase, if high order pseudopotentials are ignored. Our numerical results show that this is indeed the case, so the strong-pairing phase can be realized in the cold atom system.

preprint, 2013, arXiv

Edge reconstruction of fractional quantum Hall liquids with spin degrees of freedom

We study the interplay of confining potential, electron-electron interaction, and Zeeman splitting at the edges of fractional quantum Hall liquids, using numerical diagonalization of finite-size systems. The filling factors studied include 1/3, 5/2, 2/5, and 2/3. In the absence of Zeeman splitting and an edge, the first two have spin fully polarized ground states, while the latter two have singlet ground states. We find that with few exceptions, edge instabilities of these systems are triggered by softening of edge spin waves for Abelian fractional quantum Hall liquids (1/3, 2/5 and 2/3 liquids), and are triggered by softening of edge magnetoplasmon excitations for the non-Abelian 5/2 liquid at the smoother confinement side. Phase diagrams are obtained in the accessible parameter spaces.

preprint, 2013, arXiv

Edge spin excitations and reconstructions of integer quantum Hall liquids

We study the effect of electron-electron interaction on the charge and spin structures at the edge of integer quantum Hall liquids, under three different kinds of confining potentials. Our exact diagonalization calculation for small systems indicates that the low energy excitations of the ν=1 ferromagnetic state are bosonic edge spin waves. Instabilities of the ferromagnetic state with altering confinement strength result from the softening of these edge spin waves, and formation of edge spin textures. In the ν ≲ 2 regime, exact diagonalization on edge electron systems indicates that compact Hartree-Fock states with different total spin always become ground states in some regions of parameter space, and the ground states that appear between two compact states are their edge spin waves. The initial ν=2 instability is toward the compact state with total spin 1. Larger systems are studied using microscopic trial wave functions, and some quantitative predictions on the edge instabilities for a certain type of confining potential are reached in the thermodynamic limit.

preprint, 2012, arXiv

Coulomb impurity under magnetic field in graphene: a semiclassical approach

We address the problem of a Coulomb impurity in graphene in the presence of a perpendicular uniform magnetic field. We show that the problem can be solved below the supercritical impurity magnitude within the WKB approximation. Without the impurity, the semiclassical energies correctly reproduce the Landau level spectrum. For a given Landau level the WKB energy depends on the absolute value of angular momentum in a way which is consistent with the exact diagonalization result. Below the supercritical impurity magnitude, the WKB solution can be expanded as a convergent series in powers of the effective fine structure constant. The relevance of our results to the validity of the widely used Landau level projection approximation is discussed.
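For reference, the impurity-free Landau level spectrum of monolayer graphene that this abstract mentions has the well-known form E_n = sgn(n) v_F sqrt(2 e ħ B |n|). The sketch below evaluates it numerically. The Fermi velocity of 1e6 m/s is a commonly quoted approximate value assumed here, and the function name is hypothetical; neither comes from the paper.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def landau_level_energy_ev(n: int, b_tesla: float, fermi_velocity: float = 1.0e6) -> float:
    """Energy in eV of graphene Landau level n (any integer) in a perpendicular field B.

    fermi_velocity defaults to an assumed, approximate 1e6 m/s.
    """
    if b_tesla < 0:
        raise ValueError("field magnitude must be non-negative")
    magnitude_j = fermi_velocity * math.sqrt(2.0 * E_CHARGE * HBAR * b_tesla * abs(n))
    sign = (n > 0) - (n < 0)  # +1 for electron levels, -1 for hole levels, 0 for n = 0
    return sign * magnitude_j / E_CHARGE


if __name__ == "__main__":
    for n in (-2, -1, 0, 1, 2):
        print(n, landau_level_energy_ev(n, b_tesla=1.0))
```

The square-root dependence on both n and B, together with the zero-energy n = 0 level, is what distinguishes this spectrum from the evenly spaced levels of a conventional two-dimensional electron gas.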
