Source author record

Xiaoming Zhao

Xiaoming Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Vision, Artificial Intelligence, physics.optics

Catalog footprint: 6 works, 3 topics, 4 close collaborators

Preprint · 2026 · arXiv

Velox: Learning Representations of 4D Geometry and Appearance

We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution capturing the geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians, helping learn appearance. To demonstrate the utility of our representation, we evaluate it on three downstream tasks (video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation) and observe strong performance in all settings.

Preprint · 2024 · arXiv

Non-orthogonal cavity modes near exceptional points in the far field

Non-orthogonal eigenstates are a fundamental feature of non-Hermitian systems and are accompanied by the emergence of nontrivial features. However, platforms for exploring non-Hermitian mode coupling mainly measure near-field effects, and far-field behaviour remains mostly unexplored. Here, we study how a microcavity with non-Hermitian mode coupling exhibits eigenstate non-orthogonality by investigating the spatial field and the far-field polarization of cavity modes. The non-Hermiticity arises from asymmetric backscattering, which is controlled by integrating two scatterers of different sizes and locations into a microdisk. We observe that the spatial field overlap of the two modes increases abruptly to its maximum value, while the different far-field elliptical polarizations of the two modes coalesce as an exceptional point is approached. We demonstrate these features experimentally by measuring the far-field polarization of the fabricated microdisks. Our work reveals non-orthogonality in the far-field degree of freedom, and the integrability of the microdisks paves the way to incorporating more non-Hermitian optical properties into nanophotonic systems.
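The abstract attributes the non-orthogonality to asymmetric backscattering. A common textbook way to illustrate this, assumed here and not taken from the paper's own model or parameters, is a two-mode effective Hamiltonian H = [[Ω, A], [B, Ω]] in the clockwise/counterclockwise basis, where A and B are the two backscattering amplitudes. Its eigenvectors are (√A, ±√B), whose normalized overlap ||A| − |B|| / (|A| + |B|) is zero for symmetric coupling and approaches 1 as one amplitude goes to zero, i.e. at an exceptional point. The Python sketch below (helper names hypothetical) checks that closed form numerically.

```python
import numpy as np


def mode_overlap(omega: complex, a: complex, b: complex) -> float:
    """Normalized overlap |<v1|v2>| of the two eigenvectors of the assumed
    2x2 asymmetric-backscattering Hamiltonian [[omega, a], [b, omega]]."""
    h = np.array([[omega, a], [b, omega]], dtype=complex)
    _, vecs = np.linalg.eig(h)  # columns are the (unit-norm) eigenvectors
    v1, v2 = vecs[:, 0], vecs[:, 1]
    # np.vdot conjugates its first argument, i.e. the Hermitian inner product
    return float(abs(np.vdot(v1, v2)) / (np.linalg.norm(v1) * np.linalg.norm(v2)))


def analytic_overlap(a: complex, b: complex) -> float:
    """Closed form ||A| - |B|| / (|A| + |B|) for eigenvectors (sqrt(A), +-sqrt(B))."""
    return abs(abs(a) - abs(b)) / (abs(a) + abs(b))


if __name__ == "__main__":
    # Shrink the backscattering amplitude B toward zero (the exceptional point).
    # B = 0 exactly is avoided: H is then defective (a Jordan block).
    for b in (1.0, 0.5, 0.1, 0.01):
        print(f"B={b:<5} numeric={mode_overlap(1.0, 1.0, b):.4f} "
              f"analytic={analytic_overlap(1.0, b):.4f}")
```

As B shrinks, the overlap rises from 0 toward 1, mirroring the abrupt increase in mode overlap that the abstract reports near the exceptional point.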

Preprint · 2022 · arXiv

ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction

Existing methods detect keypoints in a non-differentiable way, so they cannot directly optimize keypoint positions through back-propagation. To address this issue, we present a partially differentiable keypoint detection module that outputs accurate sub-pixel keypoints. A reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and a dispersity peak loss is introduced for accurate keypoint regularization. We also extract descriptors at sub-pixel positions and train them with the stable neural reprojection error loss. Moreover, we design a lightweight network for keypoint detection and descriptor extraction that runs at 95 frames per second on 640x480 images on a commercial GPU. On homography estimation, camera pose estimation, and visual (re-)localization tasks, the proposed method achieves performance equivalent to state-of-the-art approaches while greatly reducing inference time.
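The abstract does not spell out how the detection module becomes differentiable. One widely used way to obtain differentiable sub-pixel positions is a soft-argmax over a small window around each detected peak. The NumPy sketch below shows that generic technique; the function name and the default `radius` and `temperature` values are hypothetical, and it should not be read as ALIKE's exact module. Because the refined coordinate is a softmax-weighted average of pixel coordinates, it is a smooth function of the scores, so in an autodiff framework a loss on keypoint positions (such as a reprojection loss) could back-propagate into the score map.

```python
import numpy as np


def soft_argmax_refine(scores: np.ndarray, y: int, x: int,
                       radius: int = 2, temperature: float = 0.1):
    """Refine an integer peak (y, x) of a 2D score map to sub-pixel precision
    with a softmax-weighted average of coordinates in a local window.

    Hypothetical helper illustrating generic soft-argmax refinement."""
    h, w = scores.shape
    # Clip the (2*radius+1)^2 window to the image bounds.
    y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
    x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
    patch = scores[y0:y1, x0:x1]
    # Softmax over the window; subtracting the max keeps exp() stable.
    weights = np.exp((patch - patch.max()) / temperature)
    weights /= weights.sum()
    ys, xs = np.mgrid[y0:y1, x0:x1]  # absolute pixel coordinates of the window
    return float((weights * ys).sum()), float((weights * xs).sum())


if __name__ == "__main__":
    score_map = np.zeros((9, 9))
    score_map[4, 4] = 1.0  # detected integer peak
    score_map[4, 5] = 0.9  # strong right neighbour pulls the estimate toward x = 5
    print(soft_argmax_refine(score_map, 4, 4))
```

A lower `temperature` makes the weights more peaked (closer to a hard argmax); a higher one spreads them over the window.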

Preprint · 2022 · arXiv

Generative Multiplane Images: Making a 2D GAN 3D-Aware

What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, namely StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a 'generative multiplane image' (GMPI) and emphasize that its renderings are not only high-quality but also guaranteed to be view-consistent, which makes GMPIs different from many prior works. Importantly, the number of alpha maps can be dynamically adjusted and can differ between training and inference, alleviating memory concerns and enabling fast training of GMPIs in less than half a day at a resolution of 1024². Our findings are consistent across three challenging and common high-resolution datasets: FFHQ, AFHQv2, and MetFaces.
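A multiplane image is rendered by alpha-compositing its fronto-parallel planes. The NumPy sketch below shows standard front-to-back "over" compositing for the reference view only; it assumes per-plane RGB colors, omits the per-plane homography warp needed to render novel views, and uses a hypothetical function name. Nothing in the compositing step fixes the number of planes `D`, which is consistent with the abstract's note that the number of alpha maps can change between training and inference.

```python
import numpy as np


def composite_mpi(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha compositing of a multiplane image.

    colors: (D, H, W, 3) per-plane RGB, plane 0 nearest the camera.
    alphas: (D, H, W) per-plane opacity in [0, 1].
    Returns the (H, W, 3) composited image. Hypothetical helper."""
    # Transmittance reaching plane i is prod_{j < i} (1 - alpha_j):
    # take the cumulative product and shift it by one plane.
    cum = np.cumprod(1.0 - alphas, axis=0)
    transmittance = np.concatenate([np.ones_like(alphas[:1]), cum[:-1]], axis=0)
    weights = alphas * transmittance  # contribution of each plane per pixel
    return (weights[..., None] * colors).sum(axis=0)


if __name__ == "__main__":
    # Two 1x1 planes: a half-transparent red plane in front of an opaque green one.
    colors = np.array([[[[1.0, 0.0, 0.0]]], [[[0.0, 1.0, 0.0]]]])
    alphas = np.array([[[0.5]], [[1.0]]])
    print(composite_mpi(colors, alphas))
```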

Preprint · 2022 · arXiv

Initialization and Alignment for Adversarial Texture Optimization

While recovery of geometry from image and video data has received a lot of attention in computer vision, methods to capture the texture for a given geometry are less mature. Specifically, classical methods for texture generation often assume clean geometry and reasonably well-aligned image data. While very recent methods, e.g., adversarial texture optimization, better handle lower-quality data obtained from hand-held devices, we find that they still frequently struggle. To improve robustness, particularly of recent adversarial texture optimization, we develop an explicit initialization and an alignment procedure. The procedure handles complex geometry through a robust mapping of the geometry to the texture map and a hard-assignment-based initialization, and it handles misalignment between geometry and images by integrating fast image alignment into the texture refinement optimization. We demonstrate the efficacy of our texture generation on a dataset of 11 scenes with a total of 2807 frames, observing relative improvements of 7.8% and 11.1% in perceptual and sharpness measurements, respectively.

Preprint · 2021 · arXiv

Sparse LiDAR Assisted Self-supervised Stereo Disparity Estimation

Deep stereo matching has made significant progress in recent years. However, state-of-the-art methods rely on expensive 4D cost volumes, which limits their use in real-world applications. To address this issue, 3D correlation maps and iterative disparity updates have been proposed. Since real-world platforms such as self-driving cars and robots are usually equipped with LiDAR, we further introduce sparse LiDAR points into the iterative updates, which relieves the network of having to update the disparity from a zero initial state. Furthermore, we propose training the network in a self-supervised way so that it can be trained on any captured data, improving its generalization. Experiments and comparisons show that the presented method is effective and achieves results comparable to related methods.
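For a rectified stereo pair, a LiDAR depth Z corresponds to a disparity d = f·B / Z, where f is the focal length in pixels and B the baseline. The abstract does not say exactly how the sparse points enter the iterative updates; the NumPy sketch below only shows this standard conversion, producing a sparse disparity map plus a validity mask that an update loop could start from instead of an all-zero state. The function name and the calibration values in the example are hypothetical, and the LiDAR points are assumed to be already projected into the rectified left image.

```python
import numpy as np


def lidar_depth_to_disparity(sparse_depth: np.ndarray, focal_px: float,
                             baseline_m: float):
    """Convert a sparse depth map (metres, 0 = no LiDAR return) into a
    sparse disparity map (pixels) via d = f * B / Z.

    Hypothetical helper; assumes a rectified stereo pair and LiDAR points
    already projected into the left image."""
    disparity = np.zeros(sparse_depth.shape, dtype=np.float64)
    valid = sparse_depth > 0  # pixels that received a LiDAR return
    disparity[valid] = focal_px * baseline_m / sparse_depth[valid]
    return disparity, valid


if __name__ == "__main__":
    depth = np.zeros((4, 6))
    depth[1, 2] = 10.0  # hypothetical LiDAR returns, in metres
    depth[3, 5] = 20.0
    # Hypothetical calibration: 720 px focal length, 0.54 m baseline.
    disp, mask = lidar_depth_to_disparity(depth, focal_px=720.0, baseline_m=0.54)
    print(disp[mask])
```

Keeping the mask separate lets an update loop distinguish measured disparities from pixels that still need to be estimated.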
