Source author record

Yuhao Liu

Yuhao Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, cond-mat.dis-nn, Information Theory, math.IT, cond-mat.other, cond-mat.str-el, Machine Learning, math.ST, quant-ph, Statistics Theory

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization

Because Latent Visual Reasoning has potential for exploratory reasoning, recent works enable MLLMs (Multimodal Large Language Models) to perform visual reasoning by propagating continuous hidden states instead of decoding intermediate steps into discrete tokens. However, existing works typically rely on hard alignment objectives to force latent representations to match predefined visual features, thereby severely limiting the exploratory capacity of the latent reasoning process. To address this problem, we propose CoLVR (Contrastive Optimization for Latent Visual Reasoning). To obtain more exploratory visual reasoning, CoLVR introduces a latent contrastive training framework. First, CoLVR learns diverse and exploratory representations with a latent contrastive objective guided by angle-based perturbation, which expands the semantic latent space and avoids over-constrained embeddings. Then, CoLVR employs a latent trajectory contrastive reward for RL (Reinforcement Learning) post-training to enable fine-grained optimization of the latent visual reasoning process, thus fostering diverse reasoning behaviors. Experiments demonstrate that CoLVR significantly enhances the exploratory capability of latent representations, achieving average improvements of 5.83% on VSP and 8.00% on Jigsaw, while also outperforming existing latent models on out-of-domain benchmarks, with a 3.40% gain on MMStar. The data, code, and models are released at https://github.com/Oscar-dzy/CoLVR.

preprint · 2026 · arXiv

GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation

Large-scale pretraining on Earth observation imagery has yielded powerful representations of the natural and built environment. However, most existing geospatial foundation models do not directly model the structured socioeconomic covariates typically stored in tabular form. This modality gap limits their ability to capture the complete total environment, which is critical for reasoning about complex environmental, social, and health-related outcomes. In this work, we propose GeoViSTA (Geospatial Vision-Tabular Transformer), a vision-tabular architecture that learns unified geospatial embeddings from co-registered gridded imagery and tabular data. GeoViSTA utilizes bilateral cross-attention to exchange spatial and semantic information across modalities, guided by a geography-aware attention mechanism that aligns continuous image patches with irregular census-tract tokens. We train GeoViSTA with a self-supervised joint masked-autoencoding objective, forcing it to recover missing image patches and tabular rows using local spatial context and cross-modal cues. Empirically, GeoViSTA's unified embeddings improve linear probing performance on high-impact downstream tasks, outperforming baselines in predicting disease-specific mortality and fire hazard frequency across held-out regions. These results demonstrate that jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference.

preprint · 2022 · arXiv

Sparse superposition codes under VAMP decoding with generic rotational invariant coding matrices

Sparse superposition codes were originally proposed as a capacity-achieving communication scheme over the Gaussian channel, whose coding matrices were made of i.i.d. Gaussian entries. We extend this coding scheme to more generic ensembles of rotational invariant coding matrices with arbitrary spectrum, which include the Gaussian ensemble as a special case. We further introduce and analyse a decoder based on vector approximate message-passing (VAMP). Our main findings, based on both a standard replica symmetric potential theory and state evolution analysis, are the superiority of certain structured ensembles of coding matrices (such as partial row-orthogonal) when compared to i.i.d. matrices, as well as a spectrum-independent upper bound on VAMP's threshold. Most importantly, we derive a simple "spectral criterion" for the scheme to be capacity-achieving while also having the best possible algorithmic threshold, in the "large section size" asymptotic limit. Our results therefore provide practical design principles for the coding matrices in this promising communication scheme.

preprint · 2022 · arXiv

Sparse superposition codes with rotational invariant coding matrices for memoryless channels

We recently showed in [1] the superiority of certain structured coding-matrix ensembles (such as partial row-orthogonal) for sparse superposition codes when compared with purely random matrices with i.i.d. entries, both information-theoretically and under practical vector approximate message-passing decoding. Here we generalize this result to binary input channels under generalized vector approximate message-passing decoding [2]. We focus on specific binary output channels for concreteness, but our analysis, based on the replica symmetric method from statistical physics, applies to any memoryless channel. We confirm that the "spectral criterion" introduced in [1], a coding-matrix design principle which allows the code to be capacity-achieving in the "large section size" asymptotic limit, extends to generic memoryless channels. Moreover, we also show that the vanishing error floor property [3] of this coding scheme is universal for an arbitrary spectrum of the coding matrix.

preprint · 2021 · arXiv

Multi-scale Information Assembly for Image Matting

Image matting is a long-standing problem in computer graphics and vision, mostly identified as the accurate estimation of the foreground in input images. We argue that foreground objects can be represented by information at different levels, including the central bodies, large-grained boundaries, refined details, etc. Based on this observation, we propose a multi-scale information assembly framework (MSIA-matte) to pull out high-quality alpha mattes from single RGB images. Given an input image, we extract advanced semantics as our subject content and retain initial CNN features to encode different-level foreground expression, then combine them with our information assembly strategy. Extensive experiments demonstrate the effectiveness of the proposed MSIA-matte, which achieves state-of-the-art performance compared to most existing matting networks.

preprint · 2020 · arXiv

Nematic Fluctuations in Iron-Oxychalcogenide Mott Insulators

Nematic fluctuations occur in a wide range of physical systems from liquid crystals to biological molecules to solids such as exotic magnets, cuprates and iron-based high-$T_c$ superconductors. Nematic fluctuations are thought to be closely linked to the formation of Cooper pairs in iron-based superconductors. It is unclear whether the anisotropy inherent in this nematicity arises from electronic spin or orbital degrees of freedom. We have studied the iron-based Mott insulators La$_{2}$O$_{2}$Fe$_{2}$O$M_{2}$ ($M$ = S, Se), which are structurally similar to the iron pnictide superconductors. They are also in close electronic phase diagram proximity to the iron pnictides. Nuclear magnetic resonance (NMR) revealed a critical slowing down of nematic fluctuations as observed by the spin-lattice relaxation rate ($1/T_1$). This is complemented by the observation of a change of electric field gradient over a similar temperature range using Mössbauer spectroscopy. The neutron pair distribution function technique applied to the nuclear structure reveals the presence of local nematic $C_2$ fluctuations over a wide temperature range, while neutron diffraction indicates that global $C_{4}$ symmetry is preserved. Theoretical modeling of a geometrically frustrated spin-$1$ Heisenberg model with biquadratic and single-ion anisotropic terms provides the interpretation of magnetic fluctuations in terms of hidden quadrupolar spin fluctuations. Nematicity is closely linked to geometrically frustrated magnetism, which emerges from orbital selectivity. The results highlight orbital order and spin fluctuations in the emergence of nematicity in Fe-based oxychalcogenides. The detection of nematic fluctuations within these Mott insulators expands the group of iron-based materials that show short-range symmetry-breaking.

preprint · 2016 · arXiv

Simulating the Kibble-Zurek mechanism of the Ising model with a superconducting qubit system

The Kibble-Zurek mechanism (KZM) predicts the density of topological defects produced in the dynamical processes of phase transitions in systems ranging from cosmology to condensed matter and quantum materials. The similarity between KZM and the Landau-Zener transition (LZT), which is a standard tool for describing non-equilibrium dynamics in contemporary physics, is being extensively exploited. Here we demonstrate the equivalence between KZM in the Ising model and LZT in a superconducting qubit system. We develop a time-resolved approach to study the quantum dynamics of LZT with nanosecond resolution. Using this technique, we simulate the key features of KZM in the Ising model with LZT, e.g., the boundary between the adiabatic and impulse regions, the freeze-out phenomenon in the impulse region, and, especially, the scaling law of the excited state population as the square root of the quenching rate. Our results provide experimental evidence of the close connection between KZM and LZT, two textbook paradigms for studying the dynamics of non-equilibrium phenomena.

preprint · 2015 · arXiv

Deep Structured Models For Group Activity Recognition

This paper presents a deep neural-network-based hierarchical graphical model for individual and group activity recognition in surveillance scenes. Deep networks are used to recognize the actions of individual people in a scene. Next, a neural-network-based hierarchical graphical model refines the predicted labels for each class by considering dependencies between the classes. This refinement step mimics a message-passing step similar to inference in a probabilistic graphical model. We show that this approach can be effective in group activity recognition, with the deep graphical model improving recognition rates over baseline methods.