Source author record

Runjian Chen

Runjian Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Robotics Social and Information Networks Artificial Intelligence eess.SP

Catalog footprint

What is connected

7works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

CoSimGNN: Towards Large-scale Graph Similarity Computation

The ability to compute similarity scores between graphs based on metrics such as Graph Edit Distance (GED) is important in many real-world applications. Computing exact GED values is typically an NP-hard problem and traditional algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recently, Graph Neural Networks (GNNs) provide a data-driven solution for this task, which is more efficient while maintaining prediction accuracy in small graph (around 10 nodes per graph) similarity computation. Existing GNN-based methods, which either respectively embeds two graphs (lack of low-level cross-graph interactions) or deploy cross-graph interactions for whole graph pairs (redundant and time-consuming), are still not able to achieve competitive results when the number of nodes in graphs increases. In this paper, we focus on similarity computation for large-scale graphs and propose the "embedding-coarsening-matching" framework CoSimGNN, which first embeds and coarsens large graphs with adaptive pooling operation and then deploys fine-grained interactions on the coarsened graphs for final similarity scores. Furthermore, we create several synthetic datasets which provide new benchmarks for graph similarity computation. Detailed experiments on both synthetic and real-world datasets have been conducted and CoSimGNN achieves the best performance while the inference time is at most 1/3 of that of previous state-of-the-art.

preprint2022arXiv

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Transformer has achieved great successes in learning vision and language representation, which is general across various downstream tasks. In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. Firstly, CtrlFormer jointly learns self-attention mechanisms between visual tokens and policy tokens among different control tasks, where multitask representation can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve high sample efficiency, which is important in control problems. For example, in the DMControl benchmark, unlike recent advanced methods that failed by producing a zero score in the "Cartpole" task after transfer learning with 100k samples, CtrlFormer can achieve a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released in our project homepage.

preprint2022arXiv

CycleMLP: A MLP-like Architecture for Dense Prediction

This paper presents a simple MLP-like architecture, CycleMLP, which is a versatile backbone for visual recognition and dense predictions. As compared to modern MLP architectures, e.g., MLP-Mixer, ResMLP, and gMLP, whose architectures are correlated to image size and thus are infeasible in object detection and segmentation, CycleMLP has two advantages compared to modern approaches. (1) It can cope with various image sizes. (2) It achieves linear computational complexity to image size by using local windows. In contrast, previous MLPs have $O(N^2)$ computations due to fully spatial connections. We build a family of models which surpass existing MLPs and even state-of-the-art Transformer-based models, e.g., Swin Transformer, while using fewer parameters and FLOPs. We expand the MLP-like models' applicability, making them a versatile backbone for dense prediction tasks. CycleMLP achieves competitive results on object detection, instance segmentation, and semantic segmentation. In particular, CycleMLP-Tiny outperforms Swin-Tiny by 1.3% mIoU on ADE20K dataset with fewer FLOPs. Moreover, CycleMLP also shows excellent zero-shot robustness on ImageNet-C dataset. Code is available at https://github.com/ShoufaChen/CycleMLP.

preprint2022arXiv

RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs

Blind face restoration is to recover a high-quality face image from unknown degradations. As face image contains abundant contextual information, we propose a method, RestoreFormer, which explores fully-spatial attentions to model contextual information and surpasses existing works that use local operators. RestoreFormer has several benefits compared to prior arts. First, unlike the conventional multi-head self-attention in previous Vision Transformers (ViTs), RestoreFormer incorporates a multi-head cross-attention layer to learn fully-spatial interactions between corrupted queries and high-quality key-value pairs. Second, the key-value pairs in ResotreFormer are sampled from a reconstruction-oriented high-quality dictionary, whose elements are rich in high-quality facial features specifically aimed for face reconstruction, leading to superior restoration results. Third, RestoreFormer outperforms advanced state-of-the-art methods on one synthetic dataset and three real-world datasets, as well as produces images with better visual quality.

preprint2021arXiv

Deep Samplable Observation Model for Global Localization and Kidnapping

Global localization and kidnapping are two challenging problems in robot localization. The popular method, Monte Carlo Localization (MCL) addresses the problem by iteratively updating a set of particles with a "sampling-weighting" loop. Sampling is decisive to the performance of MCL [1]. However, traditional MCL can only sample from a uniform distribution over the state space. Although variants of MCL propose different sampling models, they fail to provide an accurate distribution or generalize across scenes. To better deal with these problems, we present a distribution proposal model, named Deep Samplable Observation Model (DSOM). DSOM takes a map and a 2D laser scan as inputs and outputs a conditional multimodal probability distribution of the pose, making the samples more focusing on the regions with higher likelihood. With such samples, the convergence is expected to be more effective and efficient. Considering that the learning-based sampling model may fail to capture the true pose sometimes, we furthermore propose the Adaptive Mixture MCL (AdaM MCL), which deploys a trusty mechanism to adaptively select updating mode for each particle to tolerate this situation. Equipped with DSOM, AdaM MCL can achieve more accurate estimation, faster convergence and better scalability compared to previous methods in both synthetic and real scenes. Even in real environments with long-term changing, AdaM MCL is able to localize the robot using DSOM trained only by simulation observations from a SLAM map or a blueprint map.

preprint2021arXiv

Graph Partitioning and Graph Neural Network based Hierarchical Graph Matching for Graph Similarity Computation

Graph similarity computation aims to predict a similarity score between one pair of graphs to facilitate downstream applications, such as finding the most similar chemical compounds similar to a query compound or Fewshot 3D Action Recognition. Recently, some graph similarity computation models based on neural networks have been proposed, which are either based on graph-level interaction or node-level comparison. However, when the number of nodes in the graph increases, it will inevitably bring about reduced representation ability or high computation cost. Motivated by this observation, we propose a graph partitioning and graph neural network-based model, called PSimGNN, to effectively resolve this issue. Specifically, each of the input graphs is partitioned into a set of subgraphs to extract the local structural features directly. Next, a novel graph neural network with an attention mechanism is designed to map each subgraph into an embedding vector. Some of these subgraph pairs are automatically selected for node-level comparison to supplement the subgraph-level embedding with fine-grained information. Finally, coarse-grained interaction information among subgraphs and fine-grained comparison information among nodes in different subgraphs are integrated to predict the final similarity score. Experimental results on graph datasets with different graph sizes demonstrate that PSimGNN outperforms state-of-the-art methods in graph similarity computation tasks using approximate Graph Edit Distance (GED) as the graph similarity metric.

preprint2021arXiv

RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model

Compared to the onboard camera and laser scanner, radar sensor provides lighting and weather invariant sensing, which is naturally suitable for long-term localization under adverse conditions. However, radar data is sparse and noisy, resulting in challenges for radar mapping. On the other hand, the most popular available map currently is built by lidar. In this paper, we propose an end-to-end deep learning framework for Radar Localization on Lidar Map (RaLL) to bridge the gap, which not only achieves the robust radar localization but also exploits the mature lidar mapping technique, thus reducing the cost of radar mapping. We first embed both sensor modals into a common feature space by a neural network. Then multiple offsets are added to the map modal for exhaustive similarity evaluation against the current radar modal, yielding the regression of the current pose. Finally, we apply this differentiable measurement model to a Kalman Filter (KF) to learn the whole sequential localization process in an end-to-end manner. \textit{The whole learning system is differentiable with the network based measurement model at the front-end and KF at the back-end.} To validate the feasibility and effectiveness, we employ multi-session multi-scene datasets collected from the real world, and the results demonstrate that our proposed system achieves superior performance over $90km$ driving, even in generalization scenarios where the model training is in UK, while testing in South Korea. We also release the source code publicly.

Runjian Chen

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

CoSimGNN: Towards Large-scale Graph Similarity Computation

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

CycleMLP: A MLP-like Architecture for Dense Prediction

RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs

Deep Samplable Observation Model for Global Localization and Kidnapping

Graph Partitioning and Graph Neural Network based Hierarchical Graph Matching for Graph Similarity Computation

RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model