Researcher profile

Xiankai Lu

Xiankai Lu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization

Sample selection is an effective strategy to mitigate the effect of label noise in robust learning. Typical strategies commonly apply the small-loss criterion to identify clean samples. However, those samples lying around the decision boundary with large losses usually entangle with noisy examples, which would be discarded with this criterion, leading to the heavy degeneration of the generalization performance. In this paper, we propose a novel selection strategy, \textbf{S}elf-\textbf{F}il\textbf{t}ering (SFT), that utilizes the fluctuation of noisy examples in historical predictions to filter them, which can avoid the selection bias of the small-loss criterion for the boundary examples. Specifically, we introduce a memory bank module that stores the historical predictions of each example and dynamically updates to support the selection for the subsequent learning iteration. Besides, to reduce the accumulated error of the sample selection bias of SFT, we devise a regularization term to penalize the confident output distribution. By increasing the weight of the misclassified categories with this term, the loss function is robust to label noise in mild conditions. We conduct extensive experiments on three benchmarks with variant noise types and achieve the new state-of-the-art. Ablation studies and further analysis verify the virtue of SFT for sample selection in robust learning.

preprint2020arXiv

Human-Aware Motion Deblurring

This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). The proposed model is based on a triple-branch encoder-decoder architecture. The first two branches are learned for sharpening FG humans and BG details, respectively; while the third one produces global, harmonious results by comprehensively fusing multi-scale deblurring information from the two domains. The proposed model is further endowed with a supervised, human-aware attention mechanism in an end-to-end fashion. It learns a soft mask that encodes FG human information and explicitly drives the FG/BG decoder-branches to focus on their specific domains. To further benefit the research towards Human-aware Image Deblurring, we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry and sharp image pairs with 65,784 densely annotated FG human bounding boxes. HIDE is specifically built to span a broad range of scenes, human object sizes, motion patterns, and background complexities. Extensive experiments on public benchmarks and our dataset demonstrate that our model performs favorably against the state-of-the-art motion deblurring methods, especially in capturing semantic details.

preprint2020arXiv

Learning Video Object Segmentation from Unlabeled Videos

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data. We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures intrinsic properties of VOS at multiple granularities. Our approach can help advance understanding of visual patterns in VOS and significantly reduce annotation burden. With a carefully-designed architecture and strong representation learning ability, our learned model can be applied to diverse VOS settings, including object-level zero-shot VOS, instance-level zero-shot VOS, and one-shot VOS. Experiments demonstrate promising performance in these settings, as well as the potential of MuG in leveraging unlabeled data to further improve the segmentation accuracy.

preprint2020arXiv

M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients

Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities and also preserve the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs, while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.

preprint2020arXiv

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism to improve further the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to infer the frequently reappearing and salient foreground objects better. We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments over three large benchmarks manifest that COSNet outperforms the current alternatives by a large margin.

preprint2020arXiv

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pair-wise relations are described by a differentiable attention mechanism. Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation. Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case. To further demonstrate the generalizability of our framework, we extend AGNN to an additional task: image object co-segmentation (IOCS). We perform experiments on two famous IOCS datasets and observe again the superiority of our AGNN model. The extensive experiments verify that AGNN is able to learn the underlying semantic/appearance relationships among video frames or related images, and discover the common objects.