Researcher profile

Sai-Kit Yeung

Sai-Kit Yeung contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

360DVO: Deep Visual Odometry for Monocular 360-Degree Camera

Monocular omnidirectional visual odometry (OVO) systems leverage 360-degree cameras to overcome field-of-view limitations of perspective VO systems. However, existing methods, reliant on handcrafted features or photometric objectives, often lack robustness in challenging scenarios, such as aggressive motion and varying illumination. To address this, we present 360DVO, the first deep learning-based OVO framework. Our approach introduces a distortion-aware spherical feature extractor (DAS-Feat) that adaptively learns distortion-resistant features from 360-degree images. These sparse feature patches are then used to establish constraints for effective pose estimation within a novel omnidirectional differentiable bundle adjustment (ODBA) module. To facilitate evaluation in realistic settings, we also contribute a new real-world OVO benchmark. Extensive experiments on this benchmark and public synthetic datasets (TartanAir V2 and 360VO) demonstrate that 360DVO surpasses state-of-the-art baselines (including 360VO and OpenVSLAM), improving robustness by 50% and accuracy by 37.5%. Homepage: https://chris1004336379.github.io/360DVO-homepage

preprint2024arXiv

Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant. The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities. GPT-4V(ison) has demonstrated significant power in both academia and industry fields, as a focal point in a new artificial intelligence generation. Though significant success was achieved by GPT-4V, exploring MLLMs in domain-specific analysis (e.g., marine analysis) that required domain-specific knowledge and expertise has gained less attention. In this study, we carry out the preliminary and comprehensive case study of utilizing GPT-4V for marine analysis. This report conducts a systematic evaluation of existing GPT-4V, assessing the performance of GPT-4V on marine research and also setting a new standard for future developments in MLLMs. The experimental results of GPT-4V show that the responses generated by GPT-4V are still far away from satisfying the domain-specific requirements of the marine professions. All images and prompts used in this study will be available at https://github.com/hkust-vgd/Marine_GPT-4V_Eval

preprint2022arXiv

Neural Scene Decoration from a Single Photograph

Furnishing and rendering indoor scenes has been a long-standing task for interior design, where artists create a conceptual design for the space, build a 3D model of the space, decorate, and then perform rendering. Although the task is important, it is tedious and requires tremendous effort. In this paper, we introduce a new problem of domain-specific indoor scene image synthesis, namely neural scene decoration. Given a photograph of an empty indoor space and a list of decorations with layout determined by user, we aim to synthesize a new image of the same space with desired furnishing and decorations. Neural scene decoration can be applied to create conceptual interior designs in a simple yet effective manner. Our attempt to this research problem is a novel scene generation architecture that transforms an empty scene and an object layout into a realistic furnished scene photograph. We demonstrate the performance of our proposed method by comparing it with conditional image synthesis baselines built upon prevailing image translation approaches both qualitatively and quantitatively. We conduct extensive experiments to further validate the plausibility and aesthetics of our generated scenes. Our implementation is available at \url{https://github.com/hkust-vgd/neural_scene_decoration}.

preprint2022arXiv

RIConv++: Effective Rotation Invariant Convolutions for 3D Point Clouds Deep Learning

3D point clouds deep learning is a promising field of research that allows a neural network to learn features of point clouds directly, making it a robust tool for solving 3D scene understanding tasks. While recent works show that point cloud convolutions can be invariant to translation and point permutation, investigations of the rotation invariance property for point cloud convolution has been so far scarce. Some existing methods perform point cloud convolutions with rotation-invariant features, existing methods generally do not perform as well as translation-invariant only counterpart. In this work, we argue that a key reason is that compared to point coordinates, rotation-invariant features consumed by point cloud convolution are not as distinctive. To address this problem, we propose a simple yet effective convolution operator that enhances feature distinction by designing powerful rotation invariant features from the local regions. We consider the relationship between the point of interest and its neighbors as well as the internal relationship of the neighbors to largely improve the feature descriptiveness. Our network architecture can capture both local and global context by simply tuning the neighborhood size in each convolution layer. We conduct several experiments on synthetic and real-world point cloud classifications, part segmentation, and shape retrieval to evaluate our method, which achieves the state-of-the-art accuracy under challenging rotations.

preprint2020arXiv

Global Context Aware Convolutions for 3D Point Cloud Understanding

Recent advances in deep learning for 3D point clouds have shown great promises in scene understanding tasks thanks to the introduction of convolution operators to consume 3D point clouds directly in a neural network. Point cloud data, however, could have arbitrary rotations, especially those acquired from 3D scanning. Recent works show that it is possible to design point cloud convolutions with rotation invariance property, but such methods generally do not perform as well as translation-invariant only convolution. We found that a key reason is that compared to point coordinates, rotation-invariant features consumed by point cloud convolution are not as distinctive. To address this problem, we propose a novel convolution operator that enhances feature distinction by integrating global context information from the input point cloud to the convolution. To this end, a globally weighted local reference frame is constructed in each point neighborhood in which the local point set is decomposed into bins. Anchor points are generated in each bin to represent global shape features. A convolution can then be performed to transform the points and anchor features into final rotation-invariant features. We conduct several experiments on point cloud classification, part segmentation, shape retrieval, and normals estimation to evaluate our convolution, which achieves state-of-the-art accuracy under challenging rotations.

preprint2020arXiv

SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information

Fully-automatic execution is the ultimate goal for many Computer Vision applications. However, this objective is not always realistic in tasks associated with high failure costs, such as medical applications. For these tasks, semi-automatic methods allowing minimal effort from users to guide computer algorithms are often preferred due to desirable accuracy and performance. Inspired by the practicality and applicability of the semi-automatic approach, this paper proposes a novel deep neural network architecture, namely SideInfNet that effectively integrates features learnt from images with side information extracted from user annotations. To evaluate our method, we applied the proposed network to three semantic segmentation tasks and conducted extensive experiments on benchmark datasets. Experimental results and comparison with prior work have verified the superiority of our model, suggesting the generality and effectiveness of the model in semi-automatic semantic segmentation.