Researcher profile

Xiaodong Chen

Xiaodong Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

A Baseline Framework for Part-level Action Parsing and Action Recognition

This technical report introduces our 2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Workshop 2021. Our entry is mainly based on YOLOF for instance and part detection, HRNet for human pose estimation, and CSN for video-level action recognition and frame-level part state parsing. We describe technical details for the Kinetics-TPS dataset, together with some experimental results. In the competition, we achieved 61.37% mAP on the test set of Kinetics-TPS.

preprint2022arXiv

MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition

Recognizing human actions from point cloud videos has attracted tremendous attention from both academia and industry due to its wide applications like automatic driving, robotics, and so on. However, current methods for point cloud action recognition usually require a huge amount of data with manual annotations and a complex backbone network with high computation costs, which makes it impractical for real-world applications. Therefore, this paper considers the task of semi-supervised point cloud action recognition. We propose a Masked Pseudo-Labeling autoEncoder (\textbf{MAPLE}) framework to learn effective representations with much fewer annotations for point cloud action recognition. In particular, we design a novel and efficient \textbf{De}coupled \textbf{s}patial-\textbf{t}emporal Trans\textbf{Former} (\textbf{DestFormer}) as the backbone of MAPLE. In DestFormer, the spatial and temporal dimensions of the 4D point cloud videos are decoupled to achieve efficient self-attention for learning both long-term and short-term features. Moreover, to learn discriminative features from fewer annotations, we design a masked pseudo-labeling autoencoder structure to guide the DestFormer to reconstruct features of masked frames from the available frames. More importantly, for unlabeled data, we exploit the pseudo-labels from the classification head as the supervision signal for the reconstruction of features from the masked frames. Finally, comprehensive experiments demonstrate that MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08\% accuracy on the MSR-Action3D dataset.

preprint2022arXiv

MLPerf Mobile Inference Benchmark

This paper presents the first industry-standard open-source machine learning (ML) benchmark to allow perfor mance and accuracy evaluation of mobile devices with different AI chips and software stacks. The benchmark draws from the expertise of leading mobile-SoC vendors, ML-framework providers, and model producers. It comprises a suite of models that operate with standard data sets, quality metrics and run rules. We describe the design and implementation of this domain-specific ML benchmark. The current benchmark version comes as a mobile app for different computer vision and natural language processing tasks. The benchmark also supports non-smartphone devices, such as laptops and mobile PCs. Benchmark results from the first two rounds reveal the overwhelming complexity of the underlying mobile ML system stack, emphasizing the need for transparency in mobile ML performance analysis. The results also show that the strides being made all through the ML stack improve performance. Within six months, offline throughput improved by 3x, while latency reduced by as much as 12x. ML is an evolving field with changing use cases, models, data sets and quality targets. MLPerf Mobile will evolve and serve as an open-source community framework to guide research and innovation for mobile AI.

preprint2022arXiv

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

Action recognition from videos, i.e., classifying a video into one of the pre-defined action types, has been a popular topic in the communities of artificial intelligence, multimedia, and signal processing. However, existing methods usually consider an input video as a whole and learn models, e.g., Convolutional Neural Networks (CNNs), with coarse video-level class labels. These methods can only output an action class for the video, but cannot provide fine-grained and explainable cues to answer why the video shows a specific action. Therefore, researchers start to focus on a new task, Part-level Action Parsing (PAP), which aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video. To this end, we propose a coarse-to-fine framework for this challenging task. In particular, our framework first predicts the video-level class of the input video, then localizes the body parts and predicts the part-level action. Moreover, to balance the accuracy and computation in part-level action parsing, we propose to recognize the part-level actions by segment-level features. Furthermore, to overcome the ambiguity of body parts, we propose a pose-guided positional embedding method to accurately localize body parts. Through comprehensive experiments on a large-scale dataset, i.e., Kinetics-TPS, our framework achieves state-of-the-art performance and outperforms existing methods over a 31.10% ROC score.

preprint2022arXiv

The crossing number of the complete 4-partite graph $K_{1,1,m,n}$

Let $\textrm{cr}(G)$ denote the crossing number of a graph $G$. The well-known Zarankiewicz's conjecture (ZC) asserted $\textrm{cr}(K_{m,n})$ in 1954. In 1971, Harborth gave a conjecture (HC) on $\textrm{cr}(K_{x_1,...,x_n})$. HC on $K_{1,m,n}$ is verified if ZC is true by Ho et al. in 2021. In this paper, we showed the following results: If both $m$ and $n$ are even, then \[\textrm{cr}(K_{1,1,m,n})\geq \frac{1}{2}(\textrm{cr}(K_{m+1,n+3})+\textrm{cr}(K_{m+3,n+1})-mn-\frac{1}{4}(m^2+n^2));\] If both $m$ and $n$ are odd, then \[\textrm{cr}(K_{1,1,m,n})\geq \frac{1}{2}(\textrm{cr}(K_{1,m+1,n+1})+\textrm{cr}(K_{2,m,n})-\frac{1}{4}(m+1)(n+1)+1);\] If $m$ is even and $n$ is odd, then \begin{equation}\nonumber \begin{split} \textrm{cr}(K_{1,1,m,n})&\geq \frac{1}{4}(\textrm{cr}(K_{m+1,n+2})+\textrm{cr}(K_{m+3,n+2})+2\textrm{cr}(K_{2,m,n}) \\&-m(n+1)-\frac{1}{4}(n+1)^2). \end{split} \end{equation} The lower bounds in our result imply that if both $m$ and $n$ are even and ZC is true, then HC on $K_{1,1,m,n}$ holds; if at least one of $m$ and $n$ is odd and both ZC and HC on $K_{2,m,n}$ are true, then HC on $K_{1,1,m,n}$ holds.