Researcher profile

Yixin Zhuang

Yixin Zhuang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Decoupling Features and Coordinates for Few-shot RGB Relocalization

Cross-scene model adaption is crucial for camera relocalization in real scenarios. It is often preferable that a pre-learned model can be fast adapted to a novel scene with as few training samples as possible. The existing state-of-the-art approaches, however, can hardly support such few-shot scene adaption due to the entangling of image feature extraction and scene coordinate regression. To address this issue, we approach camera relocalization with a decoupled solution where feature extraction, coordinate regression, and pose estimation are performed separately. Our key insight is that feature encoder used for coordinate regression should be learned by removing the distracting factor of coordinate systems, such that feature encoder is learned from multiple scenes for general feature representation and more important, view-insensitive capability. With this feature prior, and combined with a coordinate regressor, few-shot observations in a new scene are much easier to connect with the 3D world than the one with existing integrated solution. Experiments have shown the superiority of our approach compared to the state-of-the-art methods, producing higher accuracy on several scenes with diverse visual appearance and viewpoint distribution.

preprint2022arXiv

Neural Implicit 3D Shapes from Single Images with Spatial Patterns

Neural implicit functions have achieved impressive results for reconstructing 3D shapes from single images. However, the image features for describing 3D point samplings of implicit functions are less effective when significant variations of occlusions, views, and appearances exist from the image. To better encode image features, we study a geometry-aware convolutional kernel to leverage geometric relationships of point samplings by the proposed \emph{spatial pattern}, i.e., a structured point set. Specifically, the kernel operates at 2D projections of 3D points from the spatial pattern. Supported by the spatial pattern, the 2D kernel encodes geometric information that is crucial for 3D reconstruction tasks, while traditional ones mainly consider appearance information. Furthermore, to enable the network to discover more adaptive spatial patterns for further capturing non-local contextual information, the kernel is devised to be deformable manipulated by a spatial pattern generator. Experimental results on both synthetic and real datasets demonstrate the superiority of the proposed method. Pre-trained models, codes, and data are available at https://github.com/yixin26/SVR-SP.

preprint2022arXiv

Visual Localization via Few-Shot Scene Region Classification

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates with neural networks to build 2D-3D correspondences for camera pose optimization. However, such memorization requires training by amounts of posed images in each scene, which is heavy and inefficient. On the contrary, few-shot images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is leveraging a) pre-learned feature extractor, b) scene region classifier, and c) meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes. Code available at: \url{https://github.com/siyandong/SRC}

preprint2020arXiv

Multimodal Shape Completion via Conditional Generative Adversarial Networks

Several deep learning methods have been proposed for completing partial data from shape acquisition setups, i.e., filling the regions that were missing in the shape. These methods, however, only complete the partial shape with a single output, ignoring the ambiguity when reasoning the missing geometry. Hence, we pose a multi-modal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping. We develop the first multimodal shape completion method that completes the partial shape via conditional generative modeling, without requiring paired training data. Our approach distills the ambiguity by conditioning the completion on a learned multimodal distribution of possible results. We extensively evaluate the approach on several datasets that contain varying forms of shape incompleteness, and compare among several baseline methods and variants of our methods qualitatively and quantitatively, demonstrating the merit of our method in completing partial shapes with both diversity and quality.

preprint2020arXiv

PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes

We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation using a part autoencoder. The core component of PQ-NET is a sequence-to-sequence or Seq2Seq autoencoder which encodes a sequence of part features into a latent vector of fixed size, and the decoder reconstructs the 3D shape, one part at a time, resulting in a sequential assembly. The latent space formed by the Seq2Seq encoder encodes both part structure and fine part geometry. The decoder can be adapted to perform several generative tasks including shape autoencoding, interpolation, novel shape generation, and single-view 3D reconstruction, where the generated shapes are all composed of meaningful parts.