Source author record

Xiaohao Cai

Xiaohao Cai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision math.NA astro-ph.CO Graphics Machine Learning astro-ph.IM Computation and Language cs.CY Numerical Analysis Sound

Catalog footprint

What is connected

13works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

AnchorRoute: Human Motion Synthesis with Interval-Routed Sparse Contro

Sparse anchors provide a compact interface for human motion authoring: users specify a few root positions, planar trajectory samples, or body-point targets, while the system synthesizes the full-body motion that completes the under-specified intent. We present AnchorRoute, a sparse-anchor motion synthesis framework that uses anchors as a shared scaffold for both generation and refinement. Before generation, AnchorRoute converts sparse anchors into anchor-condition features and injects the resulting condition memory into a frozen Transition Masked Diffusion prior through AnchorKV and dual-context conditioning. This preserves the generation quality of the pretrained text-to-motion prior while learning sparse spatial control. After generation, the same anchors are evaluated as residuals: their timestamps define refinement intervals, and their residuals determine where correction should be concentrated. RouteSolver then refines the motion by projecting soft-token updates onto anchor-defined piecewise-affine interval bases. This couples generation-time anchor conditioning with residual-routed refinement under one anchor scaffold. AnchorRoute supports root-3D, planar-root, and body-point control within the same formulation. In benchmark evaluations, AnchorRoute outperforms prior sparse-control methods under the sparse keyjoint protocol and consistently improves anchor adherence across control families. The results show that the learned anchor-conditioned generator and RouteSolver refinement are complementary: the generator preserves text-motion quality, while RouteSolver provides a controllable path toward stronger anchor adherence.

preprint2026arXiv

CALM: Culturally Self-Aware Language Models

Cultural awareness in language models is the capacity to understand and adapt to diverse cultural contexts. However, most existing approaches treat culture as static background knowledge, overlooking its dynamic and evolving nature. This limitation reduces their reliability in downstream tasks that demand genuine cultural sensitivity. In this work, we introduce CALM, a novel framework designed to endow language models with cultural self-awareness. CALM disentangles task semantics from explicit cultural concepts and latent cultural signals, shaping them into structured cultural clusters through contrastive learning. These clusters are then aligned via cross-attention to establish fine-grained interactions among related cultural features and are adaptively integrated through a Mixture-of-Experts mechanism along culture-specific dimensions. The resulting unified representation is fused with the model's original knowledge to construct a culturally grounded internal identity state, which is further enhanced through self-prompted reflective learning, enabling continual adaptation and self-correction. Extensive experiments conducted on multiple cross-cultural benchmark datasets demonstrate that CALM consistently outperforms state-of-the-art methods.

preprint2026arXiv

CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators

Spectral token mixers based on Fourier transforms provide an efficient way to model global interactions in visual feature maps. Existing designs often either apply filter-wise spectral responses along fixed channel axes, or learn adaptive frequency-indexed channel mixing without explicitly aligning the channel directions used across frequencies. We propose CHASM, a Cross-frequency Harmonized Axis-Separable Mixer, as a structured middle ground. CHASM separates what should be shared from what should remain frequency-specific: all frequencies share a learned channel eigenbasis, while each frequency retains its own positive spectral gains. The shared basis makes channel directions comparable across the spectrum, whereas the positive gains preserve local spectral adaptivity. CHASM applies this structured operator separably along the height and width axes and is used as a drop-in replacement mixer inside existing backbones. We provide a structural characterization of the shared-basis operator family and evaluate CHASM through controlled same-backbone comparisons. Across accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image reconstruction, CHASM consistently improves over same-backbone spectral-mixer baselines. Ablations show that removing the shared-basis constraint weakens performance, and randomizing coherent sampling geometry substantially reduces the gain, supporting cross-frequency harmonization as a useful inductive bias for spectral token operators.

preprint2026arXiv

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio motion alignment, failing to fully utilize the potential of massive human motion data, or are constrained by the representation ability and throughput of multimodal models, which makes it difficult to achieve high-quality motion generation or real-time performance. We present UMo, a unified sparse motion modeling architecture for real-time co-speech avatars, which processes text, audio, and motion tokens within a unified formulation. Leveraging a spatially sparse Mixture-of-Experts framework and a temporally sparse, keyframe-centric design, UMo efficiently performs real-time dense reconstruction, enabling temporally coherent and high-fidelity animation generation for both facial expressions and gestures. Furthermore, we implement a multi-stage training strategy with targeted audio augmentation to enhance acoustic diversity and semantic consistency. Consequently, UMo preserves fine-grained speech-motion alignment even under strict latency constraints. Extensive quantitative and qualitative evaluations show that UMo achieves better output quality under low latency and real-time performance constraints, offering a practical solution for high-fidelity real-time co-speech avatars.

preprint2021arXiv

Sparse Bayesian mass-mapping with uncertainties: local credible intervals

Until recently mass-mapping techniques for weak gravitational lensing convergence reconstruction have lacked a principled statistical framework upon which to quantify reconstruction uncertainties, without making strong assumptions of Gaussianity. In previous work we presented a sparse hierarchical Bayesian formalism for convergence reconstruction that addresses this shortcoming. Here, we draw on the concept of local credible intervals (cf. Bayesian error bars) as an extension of the uncertainty quantification techniques previously detailed. These uncertainty quantification techniques are benchmarked against those recovered via Px-MALA - a state of the art proximal Markov Chain Monte Carlo (MCMC) algorithm. We find that typically our recovered uncertainties are everywhere conservative, of similar magnitude and highly correlated (Pearson correlation coefficient $\geq 0.85$) with those recovered via Px-MALA. Moreover, we demonstrate an increase in computational efficiency of $\mathcal{O}(10^6)$ when using our sparse Bayesian approach over MCMC techniques. This computational saving is critical for the application of Bayesian uncertainty quantification to large-scale stage IV surveys such as LSST and Euclid.

preprint2021arXiv

Sparse Bayesian mass-mapping with uncertainties: peak statistics and feature locations

Weak lensing convergence maps - upon which higher order statistics can be calculated - can be recovered from observations of the shear field by solving the lensing inverse problem. For typical surveys this inverse problem is ill-posed (often seriously) leading to substantial uncertainty on the recovered convergence maps. In this paper we propose novel methods for quantifying the Bayesian uncertainty in the location of recovered features and the uncertainty in the cumulative peak statistic - the peak count as a function of signal to noise ratio (SNR). We adopt the sparse hierarchical Bayesian mass-mapping framework developed in previous work, which provides robust reconstructions and principled statistical interpretation of reconstructed convergence maps without the need to assume or impose Gaussianity. We demonstrate our uncertainty quantification techniques on both Bolshoi N-body (cluster scale) and Buzzard V-1.6 (large scale structure) N-body simulations. For the first time, this methodology allows one to recover approximate Bayesian upper and lower limits on the cumulative peak statistic at well defined confidence levels.

preprint2020arXiv

Offline and online reconstruction for radio interferometric imaging

Radio astronomy is transitioning to a big-data era due to the emerging generation of radio interferometric (RI) telescopes, such as the Square Kilometre Array (SKA), which will acquire massive volumes of data. In this article we review methods proposed recently to resolve the ill-posed inverse problem of imaging the raw visibilities acquired by RI telescopes in the big-data scenario. We focus on the recently proposed online reconstruction method [4] and the considerable savings in data storage requirements and computational cost that it yields.

preprint2019arXiv

Linkage between piecewise constant Mumford-Shah model and ROF model and its virtue in image segmentation

The piecewise constant Mumford-Shah (PCMS) model and the Rudin-Osher-Fatemi (ROF) model are two important variational models in image segmentation and image restoration, respectively. In this paper, we explore a linkage between these models. We prove that for the two-phase segmentation problem a partial minimizer of the PCMS model can be obtained by thresholding the minimizer of the ROF model. A similar linkage is still valid for multiphase segmentation under specific assumptions. Thus it opens a new segmentation paradigm: image segmentation can be done via image restoration plus thresholding. This new paradigm, which circumvents the innate non-convex property of the PCMS model, therefore improves the segmentation performance in both efficiency (much faster than state-of-the-art methods based on PCMS model, particularly when the phase number is high) and effectiveness (producing segmentation results with better quality) due to the flexibility of the ROF model in tackling degraded images, such as noisy images, blurry images or images with information loss. As a by-product of the new paradigm, we derive a novel segmentation method, called thresholded-ROF (T-ROF) method, to illustrate the virtue of managing image segmentation through image restoration techniques. The convergence of the T-ROF method is proved, and elaborate experimental results and comparisons are presented.

preprint2015arXiv

A Three-stage Approach for Segmenting Degraded Color Images: Smoothing, Lifting and Thresholding (SLaT)

In this paper, we propose a SLaT (Smoothing, Lifting and Thresholding) method with three stages for multiphase segmentation of color images corrupted by different degradations: noise, information loss, and blur. At the first stage, a convex variant of the Mumford-Shah model is applied to each channel to obtain a smooth image. We show that the model has unique solution under the different degradations. In order to properly handle the color information, the second stage is dimension lifting where we consider a new vector-valued image composed of the restored image and its transform in the secondary color space with additional information. This ensures that even if the first color space has highly correlated channels, we can still have enough information to give good segmentation results. In the last stage, we apply multichannel thresholding to the combined vector-valued image to find the segmentation. The number of phases is only required in the last stage, so users can choose or change it all without the need of solving the previous stages again. Experiments demonstrate that our SLaT method gives excellent results in terms of segmentation quality and CPU time in comparison with other state-of-the-art segmentation methods.

preprint2014arXiv

Disparity and Optical Flow Partitioning Using Extended Potts Priors

This paper addresses the problems of disparity and optical flow partitioning based on the brightness invariance assumption. We investigate new variational approaches to these problems with Potts priors and possibly box constraints. For the optical flow partitioning, our model includes vector-valued data and an adapted Potts regularizer. Using the notation of asymptotically level stable functions we prove the existence of global minimizers of our functionals. We propose a modified alternating direction method of minimizers. This iterative algorithm requires the computation of global minimizers of classical univariate Potts problems which can be done efficiently by dynamic programming. We prove that the algorithm converges both for the constrained and unconstrained problems. Numerical examples demonstrate the very good performance of our partitioning method.

preprint2014arXiv

Non-parametric Image Registration of Airborne LiDAR, Hyperspectral and Photographic Imagery of Forests

There is much current interest in using multi-sensor airborne remote sensing to monitor the structure and biodiversity of forests. This paper addresses the application of non-parametric image registration techniques to precisely align images obtained from multimodal imaging, which is critical for the successful identification of individual trees using object recognition approaches. Non-parametric image registration, in particular the technique of optimizing one objective function containing data fidelity and regularization terms, provides flexible algorithms for image registration. Using a survey of woodlands in southern Spain as an example, we show that non-parametric image registration can be successful at fusing datasets when there is little prior knowledge about how the datasets are interrelated (i.e. in the absence of ground control points). The validity of non-parametric registration methods in airborne remote sensing is demonstrated by a series of experiments. Precise data fusion is a prerequisite to accurate recognition of objects within airborne imagery, so non-parametric image registration could make a valuable contribution to the analysis pipeline.

preprint2014arXiv

Variational Image Segmentation Model Coupled with Image Restoration Achievements

Image segmentation and image restoration are two important topics in image processing with great achievements. In this paper, we propose a new multiphase segmentation model by combining image restoration and image segmentation models. Utilizing image restoration aspects, the proposed segmentation model can effectively and robustly tackle high noisy images, blurry images, images with missing pixels, and vector-valued images. In particular, one of the most important segmentation models, the piecewise constant Mumford-Shah model, can be extended easily in this way to segment gray and vector-valued images corrupted for example by noise, blur or missing pixels after coupling a new data fidelity term which comes from image restoration topics. It can be solved efficiently using the alternating minimization algorithm, and we prove the convergence of this algorithm with three variables under mild condition. Experiments on many synthetic and real-world images demonstrate that our method gives better segmentation results in comparison to others state-of-the-art segmentation models especially for blurry images and images with missing pixels values.

preprint2011arXiv

Vessel Segmentation in Medical Imaging Using a Tight-Frame Based Algorithm

Tight-frame, a generalization of orthogonal wavelets, has been used successfully in various problems in image processing, including inpainting, impulse noise removal, super-resolution image restoration, etc. Segmentation is the process of identifying object outlines within images. There are quite a few efficient algorithms for segmentation that depend on the variational approach and the partial differential equation (PDE) modeling. In this paper, we propose to apply the tight-frame approach to automatically identify tube-like structures such as blood vessels in Magnetic Resonance Angiography (MRA) images. Our method iteratively refines a region that encloses the possible boundary or surface of the vessels. In each iteration, we apply the tight-frame algorithm to denoise and smooth the possible boundary and sharpen the region. We prove the convergence of our algorithm. Numerical experiments on real 2D/3D MRA images demonstrate that our method is very efficient with convergence usually within a few iterations, and it outperforms existing PDE and variational methods as it can extract more tubular objects and fine details in the images.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint