Source author record

Jun Fu

Jun Fu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV cond-mat.mtrl-sci Machine Learning Multimedia Systems and Control

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

ImageSubject: A Large-scale Dataset for Subject Detection

Main subjects usually exist in the images or videos, as they are the objects that the photographer wants to highlight. Human viewers can easily identify them but algorithms often confuse them with other objects. Detecting the main subjects is an important technique to help machines understand the content of images and videos. We present a new dataset with the goal of training models to understand the layout of the objects and the context of the image then to find the main subjects among them. This is achieved in three aspects. By gathering images from movie shots created by directors with professional shooting skills, we collect the dataset with strong diversity, specifically, it contains 107\,700 images from 21\,540 movie shots. We labeled them with the bounding box labels for two classes: subject and non-subject foreground object. We present a detailed analysis of the dataset and compare the task with saliency detection and object detection. ImageSubject is the first dataset that tries to localize the subject in an image that the photographer wants to highlight. Moreover, we find the transformer-based detection model offers the best result among other popular model architectures. Finally, we discuss the potential applications and conclude with the importance of the dataset.

preprint2022arXiv

Mutual Attention-based Hybrid Dimensional Network for Multimodal Imaging Computer-aided Diagnosis

Recent works on Multimodal 3D Computer-aided diagnosis have demonstrated that obtaining a competitive automatic diagnosis model when a 3D convolution neural network (CNN) brings more parameters and medical images are scarce remains nontrivial and challenging. Considering both consistencies of regions of interest in multimodal images and diagnostic accuracy, we propose a novel mutual attention-based hybrid dimensional network for MultiModal 3D medical image classification (MMNet). The hybrid dimensional network integrates 2D CNN with 3D convolution modules to generate deeper and more informative feature maps, and reduce the training complexity of 3D fusion. Besides, the pre-trained model of ImageNet can be used in 2D CNN, which improves the performance of the model. The stereoscopic attention is focused on building rich contextual interdependencies of the region in 3D medical images. To improve the regional correlation of pathological tissues in multimodal medical images, we further design a mutual attention framework in the network to build the region-wise consistency in similar stereoscopic regions of different image modalities, providing an implicit manner to instruct the network to focus on pathological tissues. MMNet outperforms many previous solutions and achieves results competitive to the state-of-the-art on three multimodal imaging datasets, i.e., Parotid Gland Tumor (PGT) dataset, the MRNet dataset, and the PROSTATEx dataset, and its advantages are validated by extensive experiments.

preprint2022arXiv

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Coronary CT Angiography (CCTA) is susceptible to various distortions (e.g., artifacts and noise), which severely compromise the exact diagnosis of cardiovascular diseases. The appropriate CCTA Vessel-level Image Quality Assessment (CCTA VIQA) algorithm can be used to reduce the risk of error diagnosis. The primary challenges of CCTA VIQA are that the local part of coronary that determines final quality is hard to locate. To tackle the challenge, we formulate CCTA VIQA as a multiple-instance learning (MIL) problem, and exploit Transformer-based MIL backbone (termed as T-MIL) to aggregate the multiple instances along the coronary centerline into the final quality. However, not all instances are informative for final quality. There are some quality-irrelevant/negative instances intervening the exact quality assessment(e.g., instances covering only background or the coronary in instances is not identifiable). Therefore, we propose a Progressive Reinforcement learning based Instance Discarding module (termed as PRID) to progressively remove quality-irrelevant/negative instances for CCTA VIQA. Based on the above two modules, we propose a Reinforced Transformer Network (RTN) for automatic CCTA VIQA based on end-to-end optimization. Extensive experimental results demonstrate that our proposed method achieves the state-of-the-art performance on the real-world CCTA dataset, exceeding previous MIL methods by a large margin.

preprint2021arXiv

Ferroelectric gating of the PL in MoSe2

We demonstrate a 2D ferroelectric heterostructure with monolayer MoSe$_2$ and CuInP$_2$S$_6$ (CIPS). In the heterostructure, the electric polarization of CIPS results in c electronic modulation in monolayer MoSe$_2$.

preprint2020arXiv

Residual Squeeze-and-Excitation Network for Fast Image Deraining

Image deraining is an important image processing task as rain streaks not only severely degrade the visual quality of images but also significantly affect the performance of high-level vision tasks. Traditional methods progressively remove rain streaks via different recurrent neural networks. However, these methods fail to yield plausible rain-free images in an efficient manner. In this paper, we propose a residual squeeze-and-excitation network called RSEN for fast image deraining as well as superior deraining performance compared with state-of-the-art approaches. Specifically, RSEN adopts a lightweight encoder-decoder architecture to conduct rain removal in one stage. Besides, both encoder and decoder adopt a novel residual squeeze-and-excitation block as the core of feature extraction, which contains a residual block for producing hierarchical features, followed by a squeeze-and-excitation block for channel-wisely enhancing the resulted hierarchical features. Experimental results demonstrate that our method can not only considerably reduce the computational complexity but also significantly improve the deraining performance compared with state-of-the-art methods.

preprint2015arXiv

Direct Adaptive Controller for Uncertain MIMO Dynamic Systems with Time-varying Delay and Dead-zone Inputs

This paper presents an adaptive tracking control method for a class of nonlinearly parameterized MIMO dynamic systems with time-varying delay and unknown nonlinear dead-zone inputs. A new high dimensional integral Lyapunov-Krasovskii functional is introduced for the adaptive controller to guarantee global stability of the considered systems and also ensure convergence of the tracking errors to the origin. The proposed method provides an alternative to existing methods used for MIMO time-delay systems with dead-zone nonlinearities.

Jun Fu

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

ImageSubject: A Large-scale Dataset for Subject Detection

Mutual Attention-based Hybrid Dimensional Network for Multimodal Imaging Computer-aided Diagnosis

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Ferroelectric gating of the PL in MoSe2

Residual Squeeze-and-Excitation Network for Fast Image Deraining

Direct Adaptive Controller for Uncertain MIMO Dynamic Systems with Time-varying Delay and Dead-zone Inputs