Source author record

Hengyi Wang

Hengyi Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Machine Learning, math.QA, physics.chem-ph

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection

Large-scale text-to-image (T2I) diffusion models excel at open-domain synthesis but still struggle with precise text rendering, especially for multi-line layouts, dense typography, and long-tailed scripts such as Chinese. Prior solutions typically require costly retraining or rigid external layout constraints, which can degrade aesthetics and limit flexibility. We propose FreeText, a training-free, plug-and-play framework that improves text rendering by exploiting intrinsic mechanisms of Diffusion Transformer (DiT) models. FreeText decomposes the problem into "where to write" and "what to write". For "where to write", we localize writing regions by reading token-wise spatial attribution from endogenous image-to-text attention, using sink-like tokens as stable spatial anchors and topology-aware refinement to produce high-confidence masks. For "what to write", we introduce Spectral-Modulated Glyph Injection (SGMI), which injects a noise-aligned glyph prior with frequency-domain band-pass modulation to strengthen glyph structure and suppress semantic leakage (rendering the concept instead of the word). Extensive experiments on Qwen-Image, FLUX.1-dev, and SD3 variants across longText-Benchmark, CVTG, and our CLT-Bench show consistent gains in text readability while largely preserving semantic alignment and aesthetic quality, with modest inference overhead.
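As a rough illustration of what "frequency-domain band-pass modulation" means in general, the sketch below applies a band-pass filter to a one-dimensional toy signal using a plain discrete Fourier transform. It is not the FreeText or SGMI implementation: the abstract does not specify the band, the modulation rule, or how the glyph prior is aligned with noise. The function names `dft`, `idft`, and `band_pass`, and the chosen band, are assumptions made only for this example.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def band_pass(signal, low, high):
    """Zero every frequency bin whose frequency index lies outside [low, high]."""
    n = len(signal)
    filtered = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency magnitude
        filtered.append(coeff if low <= freq <= high else 0)
    return idft(filtered)


# Toy signal (hypothetical): constant offset + mid-frequency tone + high-frequency tone.
n = 32
mid_tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
signal = [1.0 + mid_tone[t] + 0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]

# Keeping only bins 3..5 removes the offset and the high tone, leaving the mid tone.
recovered = band_pass(signal, low=3, high=5)
```

In an image setting the same idea would be applied to a two-dimensional spectrum, with the pass band presumably chosen to favor stroke-level structure; that choice is outside what the abstract states.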

Preprint · 2025 · arXiv

Harish-Chandra Theorem for Two-parameter Quantum Groups

This paper investigates the centre of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ by establishing the Harish-Chandra homomorphism. Based on the Rosso form and the representation theory of weight modules, we prove that when rank $\mathfrak{g}$ is even, the Harish-Chandra homomorphism is an isomorphism, and in particular, the centre of the quantum group $\breve{U}_{r,s}(\mathfrak{g})$ of the weight lattice type is a polynomial algebra $\mathbb{K}[z_{\varpi_1},\cdots,z_{\varpi_n}]$, where the canonical central elements $z_\lambda\; (\lambda\in \Lambda^+)$ turn out to be uniformly expressed. When rank $\mathfrak{g}$ is odd, we find a new invertible extra central generator $z_*$, which does not survive in $U_q(\mathfrak g)$; the centre of $\breve{U}_{r,s}(\mathfrak{g})$ then contains $\mathbb{K}[z_{\varpi_1},\cdots,z_{\varpi_n}]\otimes_\mathbb K\mathbb K[z_*^{\frac{1}{\ell}}, z_*^{-\frac{1}{\ell}}]$, where $\ell=2$, except $\ell=4$ for $D_{2k+1}$.

Preprint · 2022 · arXiv

Improving Generalization of Deep Networks for Estimating Physical Properties of Containers and Fillings

We present methods to estimate the physical properties of household containers and their fillings manipulated by humans. We use a lightweight, pre-trained convolutional neural network with coordinate attention as the backbone of our pipelines to accurately locate the object of interest and estimate the physical properties in the CORSMAL Containers Manipulation (CCM) dataset. We address filling type classification with audio data, then combine the audio information with the video modality to address filling level classification. For container capacity, dimension, and mass estimation, we present a data augmentation and a consistency measurement to alleviate the over-fitting in the CCM dataset caused by the limited number of containers. We augment the training data using an object-of-interest-based re-scaling that increases the variety of physical values of the containers. We then use the consistency measurement to choose a model with low prediction variance for the same containers under different scenes, which favors models that generalize. Our method improves the ability of the models to estimate the properties of containers not seen during training.
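The consistency-based model selection described above can be sketched as follows. This is a minimal illustration, assuming the consistency measure is the mean per-container variance of predictions across scenes; the abstract does not give the exact formula, and the model names, container names, and capacity values are hypothetical.

```python
from statistics import mean, pvariance


def consistency_score(predictions_by_container):
    """Mean variance of predictions for the same container across scenes.

    Lower values mean the model gives more stable estimates for one container.
    """
    return mean(pvariance(preds) for preds in predictions_by_container.values())


def select_most_consistent(models_predictions):
    """Return the name of the candidate model with the lowest consistency score."""
    return min(
        models_predictions,
        key=lambda name: consistency_score(models_predictions[name]),
    )


# Hypothetical capacity predictions (mL): two containers, three scenes each.
candidates = {
    "model_a": {"cup": [300.0, 340.0, 260.0], "box": [900.0, 1100.0, 1000.0]},
    "model_b": {"cup": [310.0, 305.0, 315.0], "box": [980.0, 1000.0, 1020.0]},
}

best = select_most_consistent(candidates)
```

Here `model_b` is selected because its estimates for each container barely change between scenes, even though both candidates have similar average predictions.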

Preprint · 2021 · arXiv

Non-autoregressive electron flow generation for reaction prediction

Reaction prediction is a fundamental problem in computational chemistry. Existing approaches typically generate a chemical reaction by sampling tokens or graph edits sequentially, conditioning on previously generated outputs. These autoregressive methods impose an arbitrary ordering of outputs and prevent parallel decoding during inference. We devise a novel decoder that avoids such sequential generation and predicts the reaction in a non-autoregressive manner. Inspired by physical-chemistry insights, we represent edge edits in a molecule graph as electron flows, which can then be predicted in parallel. To capture the uncertainty of reactions, we introduce latent variables to generate multi-modal outputs. Following previous work, we evaluate our model on the USPTO MIT dataset. Our model achieves an order of magnitude lower inference latency, together with state-of-the-art top-1 accuracy and comparable performance on top-K sampling.