Researcher profile

Yu-Huan Wu

Yu-Huan Wu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

EDN: Salient Object Detection via Extremely-Downsampled Network

Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where the high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, which although have long proven effective for many other tasks, yet have been barely studied for SOD. In this paper, we tap into this gap and show that enhancing high- level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct the Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the above extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance with real-time speed. Our efficient EDN-Lite also achieves competitive performance with a speed of 316fps. Hence, this work is expected to spark some new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.

preprint2022arXiv

MobileSal: Extremely Efficient RGB-D Salient Object Detection

The high computational cost of neural networks has prevented recent successes in RGB-D salient object detection (SOD) from benefiting real-world applications. Hence, this paper introduces a novel network, MobileSal, which focuses on efficient RGB-D SOD using mobile networks for deep feature extraction. However, mobile networks are less powerful in feature representation than cumbersome networks. To this end, we observe that the depth information of color images can strengthen the feature representation related to SOD if leveraged properly. Therefore, we propose an implicit depth restoration (IDR) technique to strengthen the mobile networks' feature representation capability for RGB-D SOD. IDR is only adopted in the training phase and is omitted during testing, so it is computationally free. Besides, we propose compact pyramid refinement (CPR) for efficient multi-level feature aggregation to derive salient objects with clear boundaries. With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (450fps for the input size of 320 $\times$ 320) and fewer parameters (6.5M). The code is released at https://mmcheng.net/mobilesal.

preprint2022arXiv

P2T: Pyramid Pooling Transformer for Scene Understanding

Recently, the vision transformer has achieved great success by pushing the state-of-the-art of various vision tasks. One of the most challenging problems in the vision transformer is that the large sequence length of image tokens leads to high computational cost (quadratic complexity). A popular solution to this problem is to use a single pooling operation to reduce the sequence length. This paper considers how to improve existing vision transformers, where the pooled feature extracted by a single pooling operation seems less powerful. To this end, we note that pyramid pooling has been demonstrated to be effective in various vision tasks owing to its powerful ability in context abstraction. However, pyramid pooling has not been explored in backbone network design. To bridge this gap, we propose to adapt pyramid pooling to Multi-Head Self-Attention (MHSA) in the vision transformer, simultaneously reducing the sequence length and capturing powerful contextual features. Plugged with our pooling-based MHSA, we build a universal vision transformer backbone, dubbed Pyramid Pooling Transformer (P2T). Extensive experiments demonstrate that, when applied P2T as the backbone network, it shows substantial superiority in various vision tasks such as image classification, semantic segmentation, object detection, and instance segmentation, compared to previous CNN- and transformer-based networks. The code will be released at https://github.com/yuhuan-wu/P2T.

preprint2022arXiv

Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes

Current efficient LiDAR-based detection frameworks are lacking in exploiting object relations, which naturally present in both spatial and temporal manners. To this end, we introduce a simple, efficient, and effective two-stage detector, termed as Ret3D. At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules to capture the spatial and temporal relations accordingly. More Specifically, intra-frame relation module (IntraRM) encapsulates the intra-frame objects into a sparse graph and thus allows us to refine the object features through efficient message passing. On the other hand, inter-frame relation module (InterRM) densely connects each object in its corresponding tracked sequences dynamically, and leverages such temporal information to further enhance its representations efficiently through a lightweight transformer network. We instantiate our novel designs of IntraRM and InterRM with general center-based or anchor-based detectors and evaluate them on Waymo Open Dataset (WOD). With negligible extra overhead, Ret3D achieves the state-of-the-art performance, being 5.5% and 3.2% higher than the recent competitor in terms of the LEVEL 1 and LEVEL 2 mAPH metrics on vehicle detection, respectively.