Source author record

Jiangbo Lu

Jiangbo Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Multimedia, cond-mat.mes-hall, eess.IV, Graphics

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators


preprint 2023 arXiv

Edge Preserving Implicit Surface Representation of Point Clouds

Learning an implicit surface directly from raw data has recently become a very attractive representation for 3D reconstruction tasks due to its excellent performance. However, as the raw data quality deteriorates, the implicit functions often lead to unsatisfactory reconstruction results. To this end, we propose a novel edge-preserving implicit surface reconstruction method, which mainly consists of a differentiable Laplacian regularizer and a dynamic edge sampling strategy. The differentiable Laplacian regularizer effectively alleviates the lack of smoothness in the implicit surface caused by deteriorating point cloud quality. Meanwhile, to reduce excessive smoothing in the edge regions of the implicit surface, we propose a dynamic edge extraction strategy for sampling near the sharp edges of the point cloud, which effectively prevents the Laplacian regularizer from smoothing all regions. Finally, we combine them with a simple regularization term for robust implicit surface reconstruction. Compared with state-of-the-art methods, experimental results show that our method significantly improves the quality of 3D reconstruction results. Moreover, we demonstrate through several experiments that our method can be conveniently and effectively applied to some point cloud analysis tasks, including point cloud edge feature extraction, normal estimation, etc.
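As a rough illustration of what a Laplacian smoothness penalty on an implicit function can look like, here is a minimal sketch. It is not the paper's regularizer: it uses central finite differences instead of automatic differentiation, and the names `laplacian_fd` and `laplacian_penalty` and the step size `h` are assumptions made for this example. The abstract also pairs the regularizer with edge sampling so that sharp regions are not smoothed; that part is not shown.

```python
def laplacian_fd(f, p, h=1e-3):
    """Approximate the Laplacian of a scalar field f(x, y, z) at point p
    with second-order central differences (illustrative, not the paper's method)."""
    x, y, z = p
    center = f(x, y, z)
    total = 0.0
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        total += f(x + dx, y + dy, z + dz) + f(x - dx, y - dy, z - dz) - 2.0 * center
    return total / (h * h)


def laplacian_penalty(f, points, h=1e-3):
    """Mean absolute Laplacian over sample points; larger means less smooth."""
    return sum(abs(laplacian_fd(f, p, h)) for p in points) / len(points)


# Hypothetical implicit functions for demonstration only.
def plane(x, y, z):
    return z  # flat surface: the Laplacian is zero everywhere


def quadric(x, y, z):
    return x * x + y * y + z * z - 1.0  # the Laplacian is 6 everywhere


samples = [(0.1, 0.2, 0.3), (-0.5, 0.4, 0.0)]
print(laplacian_penalty(plane, samples))
print(laplacian_penalty(quadric, samples))
```

In a learning setting the penalty would be added to the reconstruction loss and evaluated only at points away from detected edges, which is the role the abstract assigns to the dynamic edge sampling strategy.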

preprint 2022 arXiv

Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation

For video frame interpolation (VFI), existing deep-learning-based approaches strongly rely on the ground-truth (GT) intermediate frames, a reliance that sometimes ignores the non-unique nature of motion as judged from the given adjacent frames. As a result, these methods tend to produce averaged solutions that are not clear enough. To alleviate this issue, we propose to relax the requirement of reconstructing an intermediate frame as close to the GT as possible. Towards this end, we develop a texture consistency loss (TCL) upon the assumption that the interpolated content should maintain similar structures with their counterparts in the given frames. Predictions satisfying this constraint are encouraged, though they may differ from the pre-defined GT. Without bells and whistles, our plug-and-play TCL is capable of improving the performance of existing VFI frameworks. On the other hand, previous methods usually adopt the cost volume or correlation map to achieve more accurate image/feature warping. However, the O(N^2) computational complexity (where N is the pixel count) makes it infeasible for high-resolution cases. In this work, we design a simple, efficient (O(N)) yet powerful cross-scale pyramid alignment (CSPA) module, where multi-scale information is highly exploited. Extensive experiments justify the efficiency and effectiveness of the proposed strategy.
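To make the O(N^2) versus O(N) contrast concrete, the following sketch counts how many correlation entries an all-pairs cost volume needs compared with a fixed-size local search window. The local window is only a generic example of a linear-cost alternative and is not a description of how CSPA works; the function names and the `radius` parameter are assumptions for this illustration.

```python
def full_correlation_entries(width, height):
    """All-pairs cost volume: every pixel is compared with every pixel, N * N entries."""
    n = width * height
    return n * n


def local_correlation_entries(width, height, radius):
    """Fixed local window: every pixel is compared with (2r + 1)^2 candidates.
    The count grows linearly in N for a constant radius."""
    n = width * height
    window = (2 * radius + 1) ** 2
    return n * window


if __name__ == "__main__":
    w, h = 1920, 1080  # a 1080p frame, N = 2,073,600 pixels
    full = full_correlation_entries(w, h)
    local = local_correlation_entries(w, h, radius=4)
    # Assuming 4 bytes per entry (float32) to give a sense of memory scale.
    print(f"all-pairs entries: {full:,} (~{full * 4 / 1e12:.1f} TB as float32)")
    print(f"local-window entries: {local:,} (~{local * 4 / 1e9:.2f} GB as float32)")
```

The quadratic term is what makes dense correlation impractical at high resolution, which is the motivation the abstract gives for a linear-cost alignment module.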

preprint 2022 arXiv

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Pre-training has set numerous state-of-the-art results in high-level computer vision, while few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a whole set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information to higher layers in super-resolution (SR), yielding significant performance gains, while pre-training hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than other alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformer-based and CNN-based architectures. Based on the study, we successfully develop state-of-the-art models for multiple low-level tasks. Code is released at https://github.com/fenglinglwb/EDT.

preprint 2022 arXiv

Video Frame Interpolation with Transformer

Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with the development of deep convolutional networks over the past years. Existing methods built upon convolutional networks generally face challenges in handling large motion due to the locality of convolution operations. To overcome this limitation, we introduce a novel framework, which takes advantage of the Transformer to model long-range pixel correlation among video frames. Further, our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other. This design effectively enlarges the receptive field and aggregates multi-scale information. Extensive quantitative and qualitative experiments demonstrate that our method achieves new state-of-the-art results on various benchmarks.

preprint 2021 arXiv

HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade methods (e.g. $20\%$ on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images. Leveraging the strength of the HEMlets pose estimation, we further design and append a shallow yet effective network module to regress the SMPL parameters of the body pose and shape. We term the entire HEMlets-based human pose and shape recovery pipeline HEMlets PoSh. Extensive quantitative and qualitative experiments on the existing human body recovery benchmarks justify the state-of-the-art results obtained with our HEMlets PoSh approach.
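The "integral operation" mentioned above is commonly implemented as a soft-argmax: the joint location is the expected coordinate under a normalized heatmap, which stays differentiable where a hard argmax does not. The sketch below shows this on a 2D score map for brevity; the abstract works with volumetric heatmaps, which would add a third axis in the same way. The function name `soft_argmax_2d` and the softmax normalization are assumptions for this example, since the abstract does not specify the normalization.

```python
import math


def soft_argmax_2d(scores):
    """Expected (x, y) location under the softmax of a 2D score map.

    scores is a list of rows; x indexes columns and y indexes rows.
    """
    peak = max(v for row in scores for v in row)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in weights)
    x = sum(w * j for row in weights for j, w in enumerate(row)) / total
    y = sum(w * i for i, row in enumerate(weights) for w in row) / total
    return x, y


# Hypothetical heatmap with a single strong response at column 2, row 1.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(soft_argmax_2d(heatmap))
```

Because the output is a weighted average of coordinates, gradients flow back to every heatmap cell, which is what allows joint locations to be supervised directly during end-to-end training.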

preprint 2020 arXiv

Image Co-skeletonization via Co-segmentation

Recent advances in the joint processing of images have shown its advantages over individual processing. Different from the existing works geared towards co-segmentation or co-localization, in this paper, we explore a new joint processing topic: image co-skeletonization, which is defined as joint skeleton extraction of objects in an image collection. Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object. Therefore, we resort to the idea of object co-skeletonization, hoping that the commonness prior that exists across the images may help, just as it does for other joint processing problems such as co-segmentation. We observe that the skeleton can provide good scribbles for segmentation, and skeletonization, in turn, needs good segmentation. Therefore, we propose a coupled framework for co-skeletonization and co-segmentation tasks so that they are well informed by each other and benefit each other synergistically. Since it is a new problem, we also construct a benchmark dataset by annotating nearly 1.8k images spread across 38 categories. Extensive experiments demonstrate that the proposed method achieves promising results in all three possible scenarios of joint processing: weakly-supervised, supervised, and unsupervised.

preprint 2020 arXiv

MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution

Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame. In this process, inter- and intra-frames are the key sources for exploiting temporal and spatial information. However, there are a couple of limitations for existing VSR methods. First, optical flow is often used to establish temporal correspondence. But flow estimation itself is error-prone and affects recovery results. Second, similar patterns existing in natural images are rarely exploited for the VSR task. Motivated by these findings, we propose a temporal multi-correspondence aggregation strategy to leverage similar patches across frames, and a cross-scale nonlocal-correspondence aggregation scheme to explore self-similarity of images across scales. Based on these two new modules, we build an effective multi-correspondence aggregation network (MuCAN) for VSR. Our method achieves state-of-the-art results on multiple benchmark datasets. Extensive experiments justify the effectiveness of our method.

preprint 2015 arXiv

Weakly Supervised Fine-Grained Image Categorization

In this paper, we categorize fine-grained images without using any object / part annotation in either the training or the testing stage, a step towards making it suitable for deployments. Fine-grained image categorization aims to classify objects with subtle distinctions. Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using object or object part annotations inside training images. The need for expensive object annotations prevents the wide usage of these methods. Instead, we propose to select useful parts from multi-scale part proposals in objects, and use them to compute a global image representation for categorization. This is specially designed for the annotation-free fine-grained categorization task, because useful parts have been shown to play an important role in existing annotation-dependent works but accurate part detectors can hardly be acquired. With the proposed image representation, we can further detect and visualize the key (most discriminative) parts in objects of different classes. In the experiment, the proposed annotation-free method achieves better accuracy than that of state-of-the-art annotation-free and most existing annotation-dependent methods on two challenging datasets, which shows that it is not always necessary to use accurate object / part annotations in fine-grained image categorization.

preprint 2014 arXiv

ITEM: Immersive Telepresence for Entertainment and Meetings - A Practical Approach

This paper presents an Immersive Telepresence system for Entertainment and Meetings (ITEM). The system aims to provide a radically new video communication experience by seamlessly merging participants into the same virtual space to allow a natural interaction among them and shared collaborative contents. With the goal of making a system that is scalable and flexible for various business solutions as well as easily accessible to mass consumers, we address the challenges in the whole pipeline of media processing, communication, and displaying in our design and realization of such a system. Particularly, in this paper we focus on the system aspects that maximize the end-user experience, optimize the system and network resources, and enable various teleimmersive application scenarios. In addition, we also present a few key technologies, i.e. fast object-based video coding for real world data and spatialized audio capture and 3D sound localization for group teleconferencing. Our effort is to investigate and optimize the key system components and provide an efficient end-to-end optimization and integration by considering user needs and preferences. Extensive experiments show the developed system runs reliably and comfortably in real time with a minimal setup requirement (e.g. a webcam and/or a depth camera, an optional microphone array, a laptop/desktop connected to the public Internet) for teleimmersive communication. With such a minimal deployment requirement, we present a variety of interesting applications and user experiences created by ITEM.

preprint 2011 arXiv

Aharonov-Casher effect in Bi$_{\rm 2}$Se$_{\rm 3}$ square-ring interferometers

Electrical control of spin dynamics in Bi$_{\rm 2}$Se$_{\rm 3}$ was investigated in ring-type interferometers. Aharonov-Bohm and Altshuler-Aronov-Spivak resistance oscillations against magnetic field, and Aharonov-Casher resistance oscillations against gate voltage, were observed in the presence of a Berry phase of $\pi$. A very large tunability of the spin precession angle by gate voltage has been obtained, indicating that Bi$_{\rm 2}$Se$_{\rm 3}$-related materials with strong spin-orbit coupling are promising candidates for constructing novel spintronic devices.
