Researcher profile

Yuhao Liu

Yuhao Liu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
6 topics
4 close collaborators

Published work

6 published items

preprint | 2026 | arXiv

CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization

Because Latent Visual Reasoning has potential for exploratory reasoning, recent works tend to enable MLLMs (Multimodal Large Language Models) to perform visual reasoning by propagating continuous hidden states instead of decoding intermediate steps into discrete tokens. However, existing works typically rely on hard alignment objectives that force latent representations to match predefined visual features, thereby severely limiting the exploratory capacity of the latent reasoning process. To address this problem, we propose CoLVR (Contrastive Optimization for Latent Visual Reasoning). To obtain more exploratory visual reasoning, CoLVR introduces a latent contrastive training framework. First, CoLVR learns diverse and exploratory representations with a latent contrastive objective guided by angle-based perturbation, which expands the semantic latent space and avoids over-constrained embeddings. Then, CoLVR employs a latent trajectory contrastive reward for RL (Reinforcement Learning) post-training to enable fine-grained optimization of the latent visual reasoning process, thus fostering diverse reasoning behaviors. Experiments demonstrate that CoLVR significantly enhances the exploratory capability of latent representations, achieving average improvements of 5.83% on VSP and 8.00% on Jigsaw, while also outperforming existing latent models on out-of-domain benchmarks, with a 3.40% gain on MMStar. The data, code, and models are released at https://github.com/Oscar-dzy/CoLVR.

preprint | 2026 | arXiv

GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation

Large-scale pretraining on Earth observation imagery has yielded powerful representations of the natural and built environment. However, most existing geospatial foundation models do not directly model the structured socioeconomic covariates typically stored in tabular form. This modality gap limits their ability to capture the complete total environment, which is critical for reasoning about complex environmental, social, and health-related outcomes. In this work, we propose GeoViSTA (Geospatial Vision-Tabular Transformer), a vision-tabular architecture that learns unified geospatial embeddings from co-registered gridded imagery and tabular data. GeoViSTA utilizes bilateral cross-attention to exchange spatial and semantic information across modalities, guided by a geography-aware attention mechanism that aligns continuous image patches with irregular census-tract tokens. We train GeoViSTA with a self-supervised joint masked-autoencoding objective, forcing it to recover missing image patches and tabular rows using local spatial context and cross-modal cues. Empirically, GeoViSTA's unified embeddings improve linear probing performance on high-impact downstream tasks, outperforming baselines in predicting disease-specific mortality and fire hazard frequency across held-out regions. These results demonstrate that jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference.

preprint | 2022 | arXiv

Sparse superposition codes under VAMP decoding with generic rotational invariant coding matrices

Sparse superposition codes were originally proposed as a capacity-achieving communication scheme over the Gaussian channel, whose coding matrices were made of i.i.d. Gaussian entries. We extend this coding scheme to more generic ensembles of rotational invariant coding matrices with arbitrary spectrum, which include the Gaussian ensemble as a special case. We further introduce and analyse a decoder based on vector approximate message-passing (VAMP). Our main findings, based on both a standard replica symmetric potential theory and state evolution analysis, are the superiority of certain structured ensembles of coding matrices (such as partial row-orthogonal) when compared to i.i.d. matrices, as well as a spectrum-independent upper bound on VAMP's threshold. Most importantly, we derive a simple "spectral criterion" for the scheme to be capacity-achieving while also having the best possible algorithmic threshold, in the "large section size" asymptotic limit. Our results therefore provide practical design principles for the coding matrices in this promising communication scheme.

preprint | 2022 | arXiv

Sparse superposition codes with rotational invariant coding matrices for memoryless channels

We recently showed in [1] the superiority of certain structured coding-matrix ensembles (such as partial row-orthogonal) for sparse superposition codes when compared with purely random matrices with i.i.d. entries, both information-theoretically and under practical vector approximate message-passing decoding. Here we generalize this result to binary input channels under generalized vector approximate message-passing decoding [2]. We focus on specific binary output channels for concreteness, but our analysis, based on the replica symmetric method from statistical physics, applies to any memoryless channel. We confirm that the "spectral criterion" introduced in [1], a coding-matrix design principle which allows the code to be capacity-achieving in the "large section size" asymptotic limit, extends to generic memoryless channels. Moreover, we also show that the vanishing error floor property [3] of this coding scheme is universal for an arbitrary spectrum of the coding matrix.

preprint | 2021 | arXiv

Multi-scale Information Assembly for Image Matting

Image matting is a long-standing problem in computer graphics and vision, mostly identified as the accurate estimation of the foreground in input images. We argue that foreground objects can be represented by information at different levels, including the central bodies, large-grained boundaries, refined details, etc. Based on this observation, we propose a multi-scale information assembly framework (MSIA-matte) to pull out high-quality alpha mattes from single RGB images. Technically speaking, given an input image, we extract advanced semantics as our subject content and retain initial CNN features to encode different-level foreground expression, then combine them with our information assembly strategy. Extensive experiments demonstrate the effectiveness of the proposed MSIA-matte, which achieves state-of-the-art performance compared to most existing matting networks.

preprint | 2020 | arXiv

Nematic Fluctuations in Iron-Oxychalcogenide Mott Insulators

Nematic fluctuations occur in a wide range of physical systems from liquid crystals to biological molecules to solids such as exotic magnets, cuprates and iron-based high-$T_c$ superconductors. Nematic fluctuations are thought to be closely linked to the formation of Cooper pairs in iron-based superconductors. It is unclear whether the anisotropy inherent in this nematicity arises from electronic spin or orbital degrees of freedom. We have studied the iron-based Mott insulators La$_{2}$O$_{2}$Fe$_{2}$O$M_{2}$ ($M$ = S, Se), which are structurally similar to the iron pnictide superconductors. They are also in close electronic phase diagram proximity to the iron pnictides. Nuclear magnetic resonance (NMR) revealed a critical slowing down of nematic fluctuations as observed by the spin-lattice relaxation rate ($1/T_1$). This is complemented by the observation of a change of electric field gradient over a similar temperature range using Mössbauer spectroscopy. The neutron pair distribution function technique applied to the nuclear structure reveals the presence of local nematic $C_2$ fluctuations over a wide temperature range, while neutron diffraction indicates that global $C_{4}$ symmetry is preserved. Theoretical modeling of a geometrically frustrated spin-$1$ Heisenberg model with biquadratic and single-ion anisotropic terms provides the interpretation of magnetic fluctuations in terms of hidden quadrupolar spin fluctuations. Nematicity is closely linked to geometrically frustrated magnetism, which emerges from orbital selectivity. The results highlight orbital order and spin fluctuations in the emergence of nematicity in Fe-based oxychalcogenides. The detection of nematic fluctuations within these Mott insulators expands the group of iron-based materials that show short-range symmetry-breaking.