Researcher profile

Hengyi Wang

Hengyi Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection

Large-scale text-to-image (T2I) diffusion models excel at open-domain synthesis but still struggle with precise text rendering, especially for multi-line layouts, dense typography, and long-tailed scripts such as Chinese. Prior solutions typically require costly retraining or rigid external layout constraints, which can degrade aesthetics and limit flexibility. We propose \textbf{FreeText}, a training-free, plug-and-play framework that improves text rendering by exploiting intrinsic mechanisms of \emph{Diffusion Transformer (DiT)} models. \textbf{FreeText} decomposes the problem into \emph{where to write} and \emph{what to write}. For \emph{where to write}, we localize writing regions by reading token-wise spatial attribution from endogenous image-to-text attention, using sink-like tokens as stable spatial anchors and topology-aware refinement to produce high-confidence masks. For \emph{what to write}, we introduce Spectral-Modulated Glyph Injection (SGMI), which injects a noise-aligned glyph prior with frequency-domain band-pass modulation to strengthen glyph structure and suppress semantic leakage (rendering the concept instead of the word). Extensive experiments on Qwen-Image, FLUX.1-dev, and SD3 variants across longText-Benchmark, CVTG, and our CLT-Bench show consistent gains in text readability while largely preserving semantic alignment and aesthetic quality, with modest inference overhead.

preprint2025arXiv

Harish-Chandra Theorem for Two-parameter Quantum Groups

This paper is devoted to investigating the centre of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ via establishing the Harish-Chandra homomorphism. Based on the Rosso form and the representation theory of weight modules, we prove that when rank $\mathfrak{g}$ is even, the Harish-Chandra homomorphism is an isomorphism, and in particular, the centre of the quantum group $\breve{U}_{r,s}(\mathfrak{g})$ of the weight lattice type is a polynomial algebra $\mathbb{K}[z_{\varpi_1},\cdots,z_{\varpi_n}]$, where canonical central elements $z_λ\; (λ\in Λ^+)$ are turned out to be uniformly expressed. For rank $\mathfrak{g}$ to be odd, we figure out a new invertible extra central generator $z_*$, which doesn't survive in $U_q(\mathfrak g)$, then the centre of $\breve{U}_{r,s}(\mathfrak{g})$ contains $\mathbb{K}[z_{\varpi_1},\cdots,z_{\varpi_n}]\otimes_\mathbb K\mathbb K[z_*^{\frac{1}{\ell}}, z_*^{-\frac{1}{\ell}}]$, where $\ell=2$, except $\ell=4$ for $D_{2k+1}$.

preprint2022arXiv

Improving Generalization of Deep Networks for Estimating Physical Properties of Containers and Fillings

We present methods to estimate the physical properties of household containers and their fillings manipulated by humans. We use a lightweight, pre-trained convolutional neural network with coordinate attention as a backbone model of the pipelines to accurately locate the object of interest and estimate the physical properties in the CORSMAL Containers Manipulation (CCM) dataset. We address the filling type classification with audio data and then combine this information from audio with video modalities to address the filling level classification. For the container capacity, dimension, and mass estimation, we present a data augmentation and consistency measurement to alleviate the over-fitting issue in the CCM dataset caused by the limited number of containers. We augment the training data using an object-of-interest-based re-scaling that increases the variety of physical values of the containers. We then perform the consistency measurement to choose a model with low prediction variance in the same containers under different scenes, which ensures the generalization ability of the model. Our method improves the generalization ability of the models to estimate the property of the containers that were not previously seen in the training.

preprint2021arXiv

Non-autoregressive electron flow generation for reaction prediction

Reaction prediction is a fundamental problem in computational chemistry. Existing approaches typically generate a chemical reaction by sampling tokens or graph edits sequentially, conditioning on previously generated outputs. These autoregressive generating methods impose an arbitrary ordering of outputs and prevent parallel decoding during inference. We devise a novel decoder that avoids such sequential generating and predicts the reaction in a Non-Autoregressive manner. Inspired by physical-chemistry insights, we represent edge edits in a molecule graph as electron flows, which can then be predicted in parallel. To capture the uncertainty of reactions, we introduce latent variables to generate multi-modal outputs. Following previous works, we evaluate our model on USPTO MIT dataset. Our model achieves both an order of magnitude lower inference latency, with state-of-the-art top-1 accuracy and comparable performance on Top-K sampling.