Researcher profile

Yanbing Zhang

Yanbing Zhang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 (Unverified) | Verification L1 | Unclaimed author
5 works
0 followers
7 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

5 published items

Preprint | 2026 | arXiv

Accelerated simulation of multiscale gas-radiation coupling flows via a general synthetic iterative scheme

Gas-radiation coupling critically influences hypersonic reentry flows, where extreme temperatures induce pronounced gas non-equilibrium and radiative heat transport. Accurate and efficient simulation of radiative gas dynamics is therefore indispensable for the reliable design of thermal protection systems for atmospheric entry vehicles. In this study, a Boltzmann-type kinetic model for radiative gas flows is solved across a broad spectrum of flow and radiation transport regimes using the general synthetic iterative scheme (GSIS). The approach couples an unstructured finite-volume discrete velocity method with a set of macroscopic synthetic equations. Within this framework, the kinetic model provides high-order closures for the constitutive relations in the synthetic equations, while the macroscopic synthetic equations drive the evolution of the mesoscopic kinetic system, significantly accelerating steady-state convergence in near-continuum regimes, as confirmed by linear Fourier stability analysis. Crucially, the algorithm is proven to be asymptotic-preserving: on coarse meshes independent of the mean free path, it correctly recovers the continuum and optically thick limits, which are described by the radiative Navier-Stokes-Fourier equations with distinct translational, rotational, vibrational, and radiative temperatures. Numerical simulations of challenging benchmarks, including three-dimensional hypersonic flow over an Apollo reentry capsule, demonstrate that GSIS achieves orders-of-magnitude speedup over conventional iterative schemes in multiscale simulations of radiative gas flows while accurately capturing non-equilibrium effects and radiative heat transfer in hypersonic environments.

Preprint | 2026 | arXiv

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

Preprint | 2026 | arXiv

TextLDM: Language Modeling with Continuous Latent Diffusion

Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous latents, enhanced by Representation Alignment (REPA) with a frozen pretrained language model to produce representations effective for conditional denoising. A standard DiT then performs flow matching in this latent space, identical in architecture to its visual counterpart. The central challenge we address is obtaining high-quality continuous text representations: we find that reconstruction fidelity alone is insufficient, and that aligning latent features with a pretrained language model via REPA is critical for downstream generation quality. Trained from scratch on OpenWebText2, TextLDM substantially outperforms prior diffusion language models and matches GPT-2 under the same settings. Our results establish that the visual DiT recipe transfers effectively to language, taking a concrete step toward unified diffusion architectures for multimodal generation and understanding.

Preprint | 2026 | arXiv

Thinking with Novel Views: A Systematic Analysis of Generative-Augmented Spatial Intelligence

Current Large Multimodal Models (LMMs) struggle with spatial reasoning tasks requiring viewpoint-dependent understanding, largely because they are confined to a single, static observation. We propose Thinking with Novel Views (TwNV), a paradigm that integrates generative novel-view synthesis into the reasoning loop: a Reasoner LMM identifies spatial ambiguity, instructs a Painter to synthesize an alternative viewpoint, and re-examines the scene with the additional evidence. Through systematic experiments we address three research questions. (1) Instruction format: numerical camera-pose specifications yield more reliable view control than free-form language. (2) Generation fidelity: synthesized view quality is tightly coupled with downstream spatial accuracy. (3) Inference-time visual scaling: iterative multi-turn view refinement further improves performance, echoing recent scaling trends in language reasoning. Across four spatial subtask categories and four LMM architectures (both closed- and open-source), TwNV consistently improves accuracy by +1.3 to +3.9 pp, with the largest gains on viewpoint-sensitive subtasks. These results establish novel-view generation as a practical lever for advancing spatial intelligence of LMMs.

Preprint | 2020 | arXiv

A note on a conjecture of star chromatic index for outerplanar graphs

A star edge coloring of a graph $G$ is a proper edge coloring of $G$ without bichromatic paths or cycles of length four. The star chromatic index $\chi'_{st}(G)$ of $G$ is the minimum number $k$ for which $G$ has a star edge coloring with $k$ colors. L. Bezegová et al. [LB] conjectured that $\chi'_{st}(G) \leq \lfloor 3\Delta/2 \rfloor + 1$ when $G$ is an outerplanar graph with maximum degree $\Delta \geq 3$. In this paper we prove that $\chi'_{st}(G) \leq \Delta + 6$ when $G$ is a 2-connected outerplanar graph with diameter 2 or 3. If $G$ is a 2-connected outerplanar graph with maximum degree 5, then $\chi'_{st}(G) \leq 9$.
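To make the definition of a star edge coloring concrete, here is a small brute-force checker. It is an illustrative sketch, not code from the paper: the function names `color_edges` and `is_star_edge_coloring` and the dict-of-frozensets edge representation are hypothetical choices. "Length four" is read as four edges. In a proper coloring, a four-edge walk $v_0v_1v_2v_3v_4$ uses only two colors exactly when its colors alternate, $c(v_0v_1)=c(v_2v_3)$ and $c(v_1v_2)=c(v_3v_4)$. The walk is a path when $v_4 \neq v_0$ and a 4-cycle when $v_4 = v_0$.

```python
from collections import defaultdict


def color_edges(triples):
    """Hypothetical helper: build {frozenset({u, v}): color} from (u, v, color) triples."""
    return {frozenset((u, v)): c for u, v, c in triples}


def is_star_edge_coloring(coloring):
    """Return True if `coloring` is a star edge coloring of the simple graph it defines.

    `coloring` maps each edge frozenset({u, v}) to its color.
    """
    adj = defaultdict(set)
    for edge in coloring:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)

    def col(u, v):
        return coloring[frozenset((u, v))]

    # 1) Proper: edges sharing an endpoint must receive distinct colors.
    for v, nbrs in adj.items():
        colors = [col(v, w) for w in nbrs]
        if len(colors) != len(set(colors)):
            return False

    # 2) No bichromatic path or cycle with four edges v0-v1-v2-v3-v4.
    #    Given properness, such a walk is bichromatic exactly when
    #    c(v0v1) == c(v2v3) and c(v1v2) == c(v3v4).
    for v1 in adj:
        for v2 in adj[v1]:
            for v3 in adj[v2]:
                if v3 == v1:
                    continue
                for v0 in adj[v1]:
                    if v0 in (v2, v3) or col(v0, v1) != col(v2, v3):
                        continue
                    for v4 in adj[v3]:
                        if v4 in (v1, v2):
                            continue
                        # v4 == v0 closes a 4-cycle; otherwise v0..v4 form a path.
                        if col(v1, v2) == col(v3, v4):
                            return False
    return True


if __name__ == "__main__":
    # Path on 5 vertices colored 1,2,1,2: bichromatic path of length four.
    print(is_star_edge_coloring(color_edges([(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2)])))  # False
    # Same path colored 1,2,1,3: only three-edge bichromatic paths remain, which are allowed.
    print(is_star_edge_coloring(color_edges([(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 3)])))  # True
```

The exhaustive walk enumeration is only practical for small graphs. It is meant as a definitional reference for checking hand-built colorings, not as a tool for computing $\chi'_{st}(G)$.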