Researcher profile

Jinjun Wang

Jinjun Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
3 topics
4 close collaborators

Published work

7 published items

preprint, 2022, arXiv

Auxiliary Loss Reweighting for Image Inpainting

Image inpainting aims to fill in missing regions of corrupted images with plausible content. Recent inpainting methods have introduced perceptual and style losses as auxiliary losses to guide the learning of inpainting generators. These losses improve the perceptual quality of inpainted results by supervising deep features of the generated regions. However, two challenges have emerged with their use: (i) a time-consuming grid search is required to find weights at which the perceptual and style losses perform properly, and (ii) loss terms with different auxiliary abilities are weighted equally within the perceptual and style losses. To meet these two challenges, we propose a novel framework that independently weights auxiliary loss terms and adaptively adjusts their weights within a single training process, without a time-consuming grid search. Specifically, to release the auxiliary potential of perceptual and style losses, we propose two auxiliary losses, Tunable Perceptual Loss (TPL) and Tunable Style Loss (TSL), which use different tunable weights to account for the contributions of different loss terms. TPL and TSL are supersets of the standard perceptual and style losses. We further propose the Auxiliary Weights Adaptation (AWA) algorithm, which efficiently reweights TPL and TSL in a single training process. AWA is based on the principle that the best auxiliary weights are those that lead to the greatest improvement in inpainting performance. We conduct experiments on publicly available datasets and find that our framework helps current SOTA methods achieve better results.
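As a rough illustration of independently weighting auxiliary loss terms, the sketch below treats a perceptual-style loss as a sum of per-layer feature distances and lets each term carry its own weight. The function names, the toy feature vectors, and the choice of an L1 distance are assumptions made for this example, not the authors' implementation, and the AWA weight-update rule is not shown.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two flattened feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def layer_terms(pred_feats, target_feats) -> List[float]:
    """One loss term per feature layer (hypothetical decomposition)."""
    return [l1_distance(p, t) for p, t in zip(pred_feats, target_feats)]


def standard_perceptual_loss(pred_feats, target_feats) -> float:
    """All layer terms weighted equally."""
    return sum(layer_terms(pred_feats, target_feats))


def tunable_perceptual_loss(pred_feats, target_feats, weights) -> float:
    """Each layer term gets its own tunable weight."""
    terms = layer_terms(pred_feats, target_feats)
    if len(weights) != len(terms):
        raise ValueError("need exactly one weight per loss term")
    return sum(w * t for w, t in zip(weights, terms))


# Toy features for two layers (made-up numbers).
pred = [[0.0, 1.0], [2.0, 2.0]]
target = [[1.0, 1.0], [0.0, 2.0]]

equal = standard_perceptual_loss(pred, target)
uniform = tunable_perceptual_loss(pred, target, [1.0, 1.0])
reweighted = tunable_perceptual_loss(pred, target, [2.0, 0.0])
```

With all weights set to 1, the tunable form reduces to the equally weighted loss, which is one way to read the statement that TPL and TSL are supersets of the standard losses.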

preprint, 2020, arXiv

Collaborative Attention Network for Person Re-identification

Jointly utilizing global and local features to improve model accuracy is becoming a popular approach to the person re-identification (ReID) problem, because previous works that use global features alone have very limited capacity to extract discriminative local patterns in the resulting feature representation. Existing works that attempt to collect local patterns either explicitly slice the global feature into several local pieces in a handcrafted way, or apply the attention mechanism to implicitly infer the importance of different local regions. In this paper, we show that by explicitly learning the importance of small local parts and part combinations, we can further improve the final feature representation for ReID. Specifically, we first separate the global feature into multiple local slices at different scales with a proposed multi-branch structure. Then we introduce the Collaborative Attention Network (CAN) to automatically learn the combination of features from adjacent slices. In this way, the combination keeps the intrinsic relation between adjacent features across local regions and scales, without losing information by partitioning the global features. Experimental results on several widely used public datasets, including Market-1501, DukeMTMC-ReID and CUHK03, show that the proposed method outperforms many existing state-of-the-art methods.

preprint, 2020, arXiv

End-to-End Multi-Object Tracking with Global Response Map

Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection paradigm and the data association framework, in which objects are first detected and then associated. Although deep-learning-based methods can noticeably improve object detection performance and also provide good appearance features for cross-frame association, the framework is not completely end-to-end, so the computational cost is high while the performance is limited. To address this problem, we present a completely end-to-end approach that takes an image sequence or video as input and directly outputs the located and tracked objects of learned types. Specifically, with our multi-object representation strategy, a global response map can be accurately generated over frames, from which the trajectory of each tracked object can be easily extracted, much as a detector takes an image as input and outputs the bounding box of each detected object. The proposed model is fast and accurate. Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.

preprint, 2020, arXiv

Meta Corrupted Pixels Mining for Medical Image Segmentation

Deep neural networks have achieved satisfactory performance in many medical image analysis tasks. However, training a deep neural network requires a large number of samples with high-quality annotations. In medical image segmentation, it is very laborious and expensive to acquire precise pixel-level annotations. To train deep segmentation models on datasets whose annotations may be corrupted, we propose a novel Meta Corrupted Pixels Mining (MCPM) method based on a simple meta mask network. Our method aims to automatically estimate a weighting map that evaluates the importance of every pixel in the learning of the segmentation network. The meta mask network, which takes the loss value map of the predicted segmentation results as input, is capable of identifying corrupted layers and allocating small weights to them. An alternating algorithm is adopted to train the segmentation network and the meta mask network simultaneously. Extensive experimental results on the LIDC-IDRI and LiTS datasets show that our method outperforms state-of-the-art approaches devised for coping with corrupted annotations.

preprint, 2020, arXiv

Multiple Object Tracking by Flowing and Fusing

Most Multiple Object Tracking (MOT) approaches compute individual target features for two subtasks: estimating target-wise motions and conducting pair-wise Re-Identification (Re-ID). Because the number of targets varies across video frames, both subtasks are very difficult to scale up efficiently in end-to-end Deep Neural Networks (DNNs). In this paper, we design an end-to-end DNN tracking approach, Flow-Fuse-Tracker (FFT), that addresses the above issues with two efficient techniques: target flowing and target fusing. Specifically, in target flowing, a FlowTracker DNN module learns the indefinite number of target-wise motions jointly from pixel-level optical flows. In target fusing, a FuseTracker DNN module refines and fuses targets proposed by FlowTracker and frame-wise object detection, instead of trusting either of the two inaccurate sources of target proposals. Because FlowTracker can explore complex target-wise motion patterns and FuseTracker can refine and fuse targets from FlowTracker and detectors, our approach achieves state-of-the-art results on several MOT benchmarks. As an online MOT approach, FFT produced the top MOTA of 46.3 on the 2DMOT15, 56.5 on the MOT16, and 56.5 on the MOT17 tracking benchmarks, surpassing all the online and offline methods in existing publications.
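For readers unfamiliar with the MOTA figures quoted above, the sketch below computes the metric from its usual definition: one minus the ratio of summed errors (false negatives, false positives, identity switches) to the total number of ground-truth objects. The counts in the example are made-up values for illustration only and are unrelated to the results reported in the abstract.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Multiple Object Tracking Accuracy from counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts over a whole sequence (not taken from any benchmark).
score = mota(false_negatives=250, false_positives=120,
             id_switches=30, num_ground_truth=1000)
score_percent = 100.0 * score  # benchmarks usually report MOTA as a percentage
```

Because the errors are summed before dividing, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.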

preprint, 2020, arXiv

STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition

Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from a trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid Convolution Network (denoted as "STH") which simultaneously encodes spatial and temporal video information with a small parameter cost. Unlike existing works that extract spatial and temporal information sequentially or in parallel with different convolutional layers, we divide the input channels into multiple groups and interleave the spatial and temporal operations in one convolutional layer, which deeply incorporates spatial and temporal cues. Such a design enables efficient spatio-temporal modeling and maintains a small model scale. STH-Conv is a general building block that can be plugged into existing 2D CNN architectures such as ResNet and MobileNet by replacing the conventional 2D-Conv blocks (2D convolutions). The STH network achieves competitive or even better performance than its competitors on benchmark datasets such as Something-Something (V1 & V2), Jester, and HMDB-51. Moreover, STH outperforms 3D CNNs while maintaining an even smaller parameter cost than 2D CNNs.
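The parameter-cost claim can be made concrete with a small count. The sketch below assumes, purely for illustration, that each output channel of a hybrid layer uses either a spatial kernel (1 x k x k) or a temporal kernel (k x 1 x 1); the abstract only states that channels are grouped and that spatial and temporal operations are interleaved in one layer, so the exact split and the `temporal_fraction` parameter are hypothetical. Bias terms are ignored.

```python
def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a standard 2D convolution with a k x k kernel."""
    return c_in * c_out * k * k


def conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a standard 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3


def hybrid_params(c_in: int, c_out: int, temporal_fraction: float, k: int = 3) -> int:
    """Weights when a fraction of output channels use temporal (k x 1 x 1)
    kernels and the remaining channels use spatial (1 x k x k) kernels."""
    n_temporal = round(c_out * temporal_fraction)
    n_spatial = c_out - n_temporal
    return c_in * (n_spatial * k * k + n_temporal * k)


# Illustrative layer size; the 25% temporal share is an assumption.
p2d = conv2d_params(64, 64)
p3d = conv3d_params(64, 64)
phy = hybrid_params(64, 64, temporal_fraction=0.25)
```

Under these assumptions, each temporal kernel has k weights per input channel instead of k squared, so any nonzero temporal share brings the layer below the 2D count, which is consistent with the claim of a smaller parameter cost than 2D CNNs.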

preprint, 2020, arXiv

Vortex-to-velocity reconstruction for wall-bounded turbulence via a data-driven model

Modelling the vortex structures and then translating them into the corresponding velocity fields are two essential aspects of vortex-based modelling in wall-bounded turbulence. This work develops a data-driven method that allows effective reconstruction of the velocity field from a given vortex field. The vortex field is defined as a vector field by combining the swirl strength and the real eigenvector of the velocity gradient tensor. The distinctive properties of the vortex field are investigated, with the relationship between the vortex magnitude and orientation revealed by differential geometry. The vortex-to-velocity reconstruction method incorporates the vortex-vortex and vortex-velocity correlation information and derives the inducing model functions within the framework of linear stochastic estimation. The fast Fourier transform is employed to improve computational efficiency in the implementation. The reconstruction accuracy is assessed and compared with the widely used Biot-Savart law. Results show that the method can effectively recover turbulent motions over a large range of scales, which is very promising for turbulence modelling. The method is also employed to investigate the inducing effects of vortices at different heights, and some revealing results are discussed and linked to current research topics in wall-bounded turbulence.
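To give a feel for linear stochastic estimation, the sketch below shows its simplest scalar, single-point form: the estimated velocity is a linear function of the vortex quantity, with the coefficient given by the ratio of the cross-correlation to the auto-correlation. This is a deliberately reduced illustration with hypothetical names and made-up samples; the method described in the abstract works with spatial vortex-vortex and vortex-velocity correlations and uses the fast Fourier transform, none of which is reproduced here.

```python
from typing import Sequence


def lse_coefficient(w_samples: Sequence[float], u_samples: Sequence[float]) -> float:
    """Least-squares coefficient L = <u w> / <w w> for the estimate u ~ L * w."""
    if len(w_samples) != len(u_samples):
        raise ValueError("sample lists must have the same length")
    cross = sum(w * u for w, u in zip(w_samples, u_samples))
    auto = sum(w * w for w in w_samples)
    if auto == 0:
        raise ValueError("vortex samples have zero energy")
    return cross / auto


def lse_estimate(coefficient: float, w: float) -> float:
    """Estimated velocity for a given vortex value."""
    return coefficient * w


# Made-up training samples in which velocity is exactly twice the vortex value.
w_train = [1.0, 2.0, 3.0]
u_train = [2.0, 4.0, 6.0]

coeff = lse_coefficient(w_train, u_train)
u_hat = lse_estimate(coeff, 5.0)
```

The coefficient is the one that minimises the mean squared error of the linear estimate, which is why it takes the form of a correlation ratio.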