Researcher profile

Liangliang Nan

Liangliang Nan contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
3 topics
4 close collaborators

Published work

7 published items

Preprint, 2026, arXiv

ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings

Interactive segmentation allows efficient label generation by leveraging user-provided clicks to progressively refine predictions, which is critical when fully supervised labels are costly or generalization to unseen classes is needed. Existing 3D interactive methods are limited: most operate sequentially, predicting only one object per iteration with binary masks, while several recent approaches depend on 2D foundation models and camera alignment to bridge the 2D-3D gap. To address these limitations, we propose a novel interactive segmentation framework that operates directly on sparse, randomly downsampled 3D points and processes multiple object clicks in a single forward pass. Our framework consists of a point Transformer-based encoder and a hierarchical mask decoder, which integrates multi-level crop-and-merge operations conditioned on learnable semantic embeddings. Unlike prior interactive approaches that require repeated model updates after each manual corrective click, our method jointly reasons over all click queries, modeling inter-instance relationships and refining both spatial masks and semantic predictions through spatial and semantic embeddings. Extensive experiments demonstrate that our model improves the mIoU metric by over 20 percent compared to strong baselines and achieves 8-10 percent gains under cross-dataset evaluation in a one-click-per-instance setting, often requiring only a single click per object. Our approach provides a generalizable and efficient solution for interactive 3D instance segmentation, particularly suitable for real-time applications such as robotic manipulation, navigation, and rapid 3D semantic annotation.
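The mIoU figure quoted above is a mean of per-class intersection-over-union scores. The sketch below shows one common way to compute it over per-point labels. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not given on this page.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        # Points labelled c in both lists (intersection) or in either (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are ignored
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy per-point labels for six points and three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.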

Preprint, 2022, arXiv

3D Instance Segmentation of MVS Buildings

We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlapping, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotation for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation, with many more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies have shown the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.
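The RGB-to-RGBH step described above can be pictured as appending a height value to every pixel. The following is a minimal sketch under stated assumptions: the function name `to_rgbh`, the nested-list image layout, and the min-max normalization of heights are all illustrative choices, not details taken from the paper.

```python
def to_rgbh(rgb, height):
    """Append a normalized height channel to each RGB pixel.

    rgb:    rows of (r, g, b) tuples
    height: rows of floats with the same shape as rgb
    """
    flat = [h for row in height for h in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on perfectly flat input
    return [
        [(r, g, b, (h - lo) / span) for (r, g, b), h in zip(rgb_row, h_row)]
        for rgb_row, h_row in zip(rgb, height)
    ]


# A 1x2 toy image with heights in metres (hypothetical values).
rgb = [[(10, 20, 30), (40, 50, 60)]]
height = [[2.0, 6.0]]
rgbh = to_rgbh(rgb, height)
print(rgbh)
```

The resulting four-channel pixels are what a 2D instance segmentation network would then consume in place of plain RGB.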

Preprint, 2022, arXiv

3DLG-Detector: 3D Object Detection via Simultaneous Local-Global Feature Learning

Capturing both local and global features of irregular point clouds is essential to 3D object detection (3OD). However, mainstream 3D detectors, e.g., VoteNet and its variants, either abandon considerable local features during pooling operations or ignore many global features in the whole scene context. This paper explores new modules that simultaneously learn local-global features of scene point clouds to benefit 3OD. To this end, we propose an effective 3OD network via simultaneous local-global feature learning (dubbed 3DLG-Detector). 3DLG-Detector has two key contributions. First, it develops a Dynamic Points Interaction (DPI) module that preserves effective local features during pooling. DPI is also detachable and can be incorporated into existing 3OD networks to boost their performance. Second, it develops a Global Context Aggregation module to aggregate multi-scale features from different layers of the encoder to achieve scene context-awareness. Our method shows improvements over thirteen competitors in terms of detection accuracy and robustness on both the SUN RGB-D and ScanNet datasets. Source code will be available upon publication.

Preprint, 2022, arXiv

CSDN: Cross-modal Shape-transfer Dual-refinement Network for Point Cloud Completion

How would you repair a physical object with some missing parts? You may imagine its original shape from previously captured images, recover its overall (global) but coarse shape first, and then refine its local details. We are motivated to imitate the physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a coarse-to-fine paradigm with images of full-cycle participation, for high-quality point cloud completion. CSDN mainly consists of "shape fusion" and "dual-refinement" modules to tackle the cross-modal challenge. The first module transfers the intrinsic shape characteristics from single images to guide the geometry generation of the missing regions of point clouds, in which we propose IPAdaIN to embed the global features of both the image and the partial point cloud into completion. The second module refines the coarse output by adjusting the positions of the generated points, where the local refinement unit exploits the geometric relation between the novel and the input points by graph convolution, and the global constraint unit utilizes the input image to fine-tune the generated offset. Different from most existing approaches, CSDN not only explores the complementary information from images but also effectively exploits cross-modal data in the whole coarse-to-fine completion procedure. Experimental results indicate that CSDN performs favorably against ten competitors on the cross-modal benchmark.

Preprint, 2022, arXiv

HRBF-Fusion: Accurate 3D reconstruction from RGB-D data using on-the-fly implicits

Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level RGB-D cameras. However, due to the discrete nature and limited resolution of their surface representations (e.g., point- or voxel-based), existing approaches suffer from the accumulation of errors in camera tracking and distortion in the reconstruction, which leads to an unsatisfactory 3D reconstruction. In this paper, we present a method using on-the-fly implicits of Hermite Radial Basis Functions (HRBFs) as a continuous surface representation for camera tracking in an existing RGB-D fusion framework. Furthermore, curvature estimation and confidence evaluation are coherently derived from the inherent surface properties of the on-the-fly HRBF implicits, which contribute to higher-quality data fusion. We argue that our continuous but on-the-fly surface representation can effectively mitigate the impact of noise with its robustness and constrain the reconstruction with inherent surface smoothness, compared with discrete representations. Experimental results on various real-world and synthetic datasets demonstrate that our HRBF-Fusion outperforms the state-of-the-art approaches in terms of tracking robustness and reconstruction accuracy.
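For orientation, an HRBF implicit is usually written in the general form below. This is the commonly cited textbook formulation and not necessarily the exact variant used in the paper; the symbols (sample points c_i, normals n_i, kernel φ, scalar weights α_i, vector weights β_i) are assumptions introduced here. The reconstructed surface is the zero level set of f, and the weights are fitted so that f vanishes at each sample and its gradient matches the sample normal.

```latex
f(\mathbf{x}) = \sum_{i=1}^{N} \Big( \alpha_i \, \varphi\big(\lVert \mathbf{x} - \mathbf{c}_i \rVert\big)
  - \boldsymbol{\beta}_i^{\top} \nabla \varphi\big(\lVert \mathbf{x} - \mathbf{c}_i \rVert\big) \Big),
\qquad
f(\mathbf{c}_j) = 0, \quad \nabla f(\mathbf{c}_j) = \mathbf{n}_j, \quad j = 1, \dots, N
```

Because f is continuous and differentiable, quantities such as normals and curvature can be read off its derivatives, which is the property the abstract relies on for curvature estimation and confidence evaluation.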

Preprint, 2021, arXiv

SUM: A Benchmark Dataset of Semantic Urban Meshes

Recent developments in data acquisition technology allow us to collect 3D texture meshes quickly. Those can help us understand and analyse the urban environment, and as a consequence are useful for several applications like spatial analysis and urban planning. Semantic segmentation of texture meshes through deep learning methods can enhance this understanding, but it requires a lot of labelled data. The contributions of this work are threefold: (1) a new benchmark dataset of semantic urban meshes, (2) a novel semi-automatic annotation framework, and (3) an annotation tool for 3D meshes. In particular, our dataset covers about 4 km² in Helsinki (Finland), with six classes, and we estimate that we save about 600 hours of labelling work using our annotation framework, which includes initial segmentation and interactive refinement. We also compare the performance of several state-of-the-art 3D semantic segmentation methods on the new benchmark dataset. Other researchers can use our results to train their networks: the dataset is publicly available, and the annotation tool is released as open-source.

Preprint, 2020, arXiv

An End-to-End Geometric Deficiency Elimination Algorithm for 3D Meshes

The 3D mesh is an important representation of geometric data. In the generation of mesh data, geometric deficiencies (e.g., duplicate elements, degenerate faces, isolated vertices, self-intersection, and inner faces) are unavoidable and may violate the topology structure of an object. In this paper, we propose an effective and efficient geometric deficiency elimination algorithm for 3D meshes. Specifically, duplicate elements can be eliminated by counting the occurrences of vertices or faces; degenerate faces can be removed according to the cross product of two edges; since isolated vertices do not appear in any face, they can be deleted directly; self-intersecting faces are detected using an AABB tree and remeshed afterward; by simulating whether multiple random rays shot from a face can reach infinity, we can judge whether the face is an inner face and then decide whether to delete it. Experiments on the ModelNet40 dataset illustrate that our method can eliminate the deficiencies of the 3D mesh thoroughly.
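Three of the steps listed above (duplicate elements, degenerate faces, isolated vertices) are simple enough to sketch in a few lines. The code below is an illustrative approximation, not the authors' implementation. It assumes triangle faces, exact coordinate matching for duplicates, and the 3D cross product as the zero-area test for two edges. The names `clean_mesh` and `eps` are hypothetical, and the self-intersection and inner-face steps are omitted.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def clean_mesh(vertices, faces, eps=1e-12):
    """Remove duplicate vertices/faces, degenerate faces, and isolated vertices."""
    # 1. Merge vertices with identical coordinates and remap face indices.
    index_of, unique, remap = {}, [], []
    for v in vertices:
        key = tuple(v)
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(key)
        remap.append(index_of[key])
    faces = [tuple(remap[i] for i in f) for f in faces]

    # 2. Drop degenerate faces (edge cross product ~ 0) and repeated faces.
    seen, kept = set(), []
    for f in faces:
        a, b, c = (unique[i] for i in f)
        n = cross(sub(b, a), sub(c, a))
        if n[0] ** 2 + n[1] ** 2 + n[2] ** 2 <= eps:
            continue  # zero area: collinear or repeated corners
        key = frozenset(f)  # assumption: orientation is ignored for duplicates
        if key in seen:
            continue
        seen.add(key)
        kept.append(f)

    # 3. Keep only vertices referenced by a surviving face, then reindex.
    used = sorted({i for f in kept for i in f})
    new_index = {old: new for new, old in enumerate(used)}
    out_vertices = [unique[i] for i in used]
    out_faces = [tuple(new_index[i] for i in f) for f in kept]
    return out_vertices, out_faces


# Toy mesh: one duplicate vertex, one isolated vertex, one duplicate face,
# one collinear face, and one face with a repeated corner (hypothetical data).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0), (5, 5, 5), (2, 0, 0)]
faces = [(0, 1, 2), (3, 1, 2), (0, 1, 5), (0, 1, 1)]
clean_vertices, clean_faces = clean_mesh(vertices, faces)
print(clean_vertices, clean_faces)
```

On the toy input only the single valid triangle and its three vertices survive. A production pipeline would likely use a tolerance when merging vertices and a spatial index such as an AABB tree for the intersection test the abstract mentions.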