Researcher profile

Yanan Sun

Yanan Sun contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2023arXiv

Differentiable Search of Accurate and Robust Architectures

Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.

preprint2023arXiv

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we bring up a new pre-training task termed as visual point cloud forecasting - predicting future point clouds from historical visual input. The key merit of this task captures the synergic learning of semantics, 3D structures, and temporal dynamics. Hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings by the encoder. These representations are then transformed to 3D geometric space via a novel Latent Rendering operator for future point cloud prediction. Experiments show significant gain in downstream tasks, e.g., 3.1% NDS on 3D detection, ~10% error reduction on motion forecasting, and ~15% less collision rate on planning.

preprint2022arXiv

A Survey on Evolutionary Neural Architecture Search

Deep Neural Networks (DNNs) have achieved great success in many applications. The architectures of DNNs play a crucial role in their performance, which is usually manually designed with rich expertise. However, such a design process is labour intensive because of the trial-and-error process, and also not easy to realize due to the rare expertise in practice. Neural Architecture Search (NAS) is a type of technology that can design the architectures automatically. Among different methods to realize NAS, Evolutionary Computation (EC) methods have recently gained much attention and success. Unfortunately, there has not yet been a comprehensive summary of the EC-based NAS algorithms. This paper reviews over 200 papers of most recent EC-based NAS methods in light of the core components, to systematically discuss their design principles as well as justifications on the design. Furthermore, current challenges and issues are also discussed to identify future research in this emerging field.

preprint2022arXiv

A Unified Query-based Paradigm for Point Cloud Understanding

3D point cloud understanding is an important component in autonomous driving and robotics. In this paper, we present a novel Embedding-Querying paradigm (EQ- Paradigm) for 3D understanding tasks including detection, segmentation, and classification. EQ-Paradigm is a unified paradigm that enables the combination of any existing 3D backbone architectures with different task heads. Under the EQ-Paradigm, the input is firstly encoded in the embedding stage with an arbitrary feature extraction architecture, which is independent of tasks and heads. Then, the querying stage enables the encoded features to be applicable for diverse task heads. This is achieved by introducing an intermediate representation, i.e., Q-representation, in the querying stage to serve as a bridge between the embedding stage and task heads. We design a novel Q- Net as the querying stage network. Extensive experimental results on various 3D tasks, including object detection, semantic segmentation and shape classification, show that EQ-Paradigm in tandem with Q-Net is a general and effective pipeline, which enables a flexible collaboration of backbones and heads, and further boosts the performance of the state-of-the-art methods. Codes and models are available at https://github.com/dvlab-research/DeepVision3D.

preprint2022arXiv

Architecture Augmentation for Performance Predictor Based on Graph Isomorphism

Neural Architecture Search (NAS) can automatically design architectures for deep neural networks (DNNs) and has become one of the hottest research topics in the current machine learning community. However, NAS is often computationally expensive because a large number of DNNs require to be trained for obtaining performance during the search process. Performance predictors can greatly alleviate the prohibitive cost of NAS by directly predicting the performance of DNNs. However, building satisfactory performance predictors highly depends on enough trained DNN architectures, which are difficult to obtain in most scenarios. To solve this critical issue, we propose an effective DNN architecture augmentation method named GIAug in this paper. Specifically, we first propose a mechanism based on graph isomorphism, which has the merit of efficiently generating a factorial of $\boldsymbol n$ (i.e., $\boldsymbol n!$) diverse annotated architectures upon a single architecture having $\boldsymbol n$ nodes. In addition, we also design a generic method to encode the architectures into the form suitable to most prediction models. As a result, GIAug can be flexibly utilized by various existing performance predictors-based NAS algorithms. We perform extensive experiments on CIFAR-10 and ImageNet benchmark datasets on small-, medium- and large-scale search space. The experiments show that GIAug can significantly enhance the performance of most state-of-the-art peer predictors. In addition, GIAug can save three magnitude order of computation cost at most on ImageNet yet with similar performance when compared with state-of-the-art NAS algorithms.

preprint2022arXiv

Automating Neural Architecture Design without Search

Neural structure search (NAS), as the mainstream approach to automate deep neural architecture design, has achieved much success in recent years. However, the performance estimation component adhering to NAS is often prohibitively costly, which leads to the enormous computational demand. Though a large number of efforts have been dedicated to alleviating this pain point, no consensus has been made yet on which is optimal. In this paper, we study the automated architecture design from a new perspective that eliminates the need to sequentially evaluate each neural architecture generated during algorithm execution. Specifically, the proposed approach is built by learning the knowledge of high-level experts in designing state-of-the-art architectures, and then the new architecture is directly generated upon the knowledge learned. We implemented the proposed approach by using a graph neural network for link prediction and acquired the knowledge from NAS-Bench-101. Compared to existing peer competitors, we found a competitive network with minimal cost. In addition, we also utilized the learned knowledge from NAS-Bench-101 to automate architecture design in the DARTS search space, and achieved 97.82% accuracy on CIFAR10, and 76.51% top-1 accuracy on ImageNet consuming only $2\times10^{-4}$ GPU days. This also demonstrates the high transferability of the proposed approach, and can potentially lead to a new, more computationally efficient paradigm in this research direction.

preprint2022arXiv

Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions

Recently, talking-face video generation has received considerable attention. So far most methods generate results with neutral expressions or expressions that are implicitly determined by neural networks in an uncontrollable way. In this paper, we propose a method to generate talking-face videos with continuously controllable expressions in real-time. Our method is based on an important observation: In contrast to facial geometry of moderate resolution, most expression information lies in textures. Then we make use of neural textures to generate high-quality talking face videos and design a novel neural network that can generate neural textures for image frames (which we called dynamic neural textures) based on the input expression and continuous intensity expression coding (CIEC). Our method uses 3DMM as a 3D model to sample the dynamic neural texture. The 3DMM does not cover the teeth area, so we propose a teeth submodule to complete the details in teeth. Results and an ablation study show the effectiveness of our method in generating high-quality talking-face videos with continuously controllable expressions. We also set up four baseline methods by combining existing representative methods and compare them with our method. Experimental results including a user study show that our method has the best performance.

preprint2022arXiv

Human Instance Matting via Mutual Guidance and Multi-Instance Refinement

This paper introduces a new matting task called human instance matting (HIM), which requires the pertinent model to automatically predict a precise alpha matte for each human instance. Straightforward combination of closely related techniques, namely, instance segmentation, soft segmentation and human/conventional matting, will easily fail in complex cases requiring disentangling mingled colors belonging to multiple instances along hairy and thin boundary structures. To tackle these technical challenges, we propose a human instance matting framework, called InstMatt, where a novel mutual guidance strategy working in tandem with a multi-instance refinement module is used, for delineating multi-instance relationship among humans with complex and overlapping boundaries if present. A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality. Finally, we construct a HIM benchmark for evaluation, which comprises of both synthetic and natural benchmark images. In addition to thorough experimental results on complex cases with multiple and overlapping human instances each has intricate boundaries, preliminary results are presented on general instance matting. Code and benchmark are available in https://github.com/nowsyn/InstMatt.

preprint2020arXiv

3DSSD: Point-based 3D Single Stage Object Detector

Currently, there have been many kinds of voxel-based 3D single stage detectors, while point-based single stage methods are still underexplored. In this paper, we first present a lightweight and effective point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We novelly propose a fusion sampling strategy in downsampling process to make detection on less representative points feasible. A delicate box prediction network including a candidate generation layer, an anchor-free regression head with a 3D center-ness assignment strategy is designed to meet with our demand of accuracy and speed. Our paradigm is an elegant single stage anchor-free framework, showing great superiority to other existing methods. We evaluate 3DSSD on widely used KITTI dataset and more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well, with inference speed more than 25 FPS, 2x faster than former state-of-the-art point-based methods.

preprint2020arXiv

A Novel Training Protocol for Performance Predictors of Evolutionary Neural Architecture Search Algorithms

Evolutionary Neural Architecture Search (ENAS) can automatically design the architectures of Deep Neural Networks (DNNs) using evolutionary computation algorithms. However, most ENAS algorithms require intensive computational resource, which is not necessarily available to the users interested. Performance predictors are a type of regression models which can assist to accomplish the search, while without exerting much computational resource. Despite various performance predictors have been designed, they employ the same training protocol to build the regression models: 1) sampling a set of DNNs with performance as the training dataset, 2) training the model with the mean square error criterion, and 3) predicting the performance of DNNs newly generated during the ENAS. In this paper, we point out that the three steps constituting the training protocol are not well though-out through intuitive and illustrative examples. Furthermore, we propose a new training protocol to address these issues, consisting of designing a pairwise ranking indicator to construct the training target, proposing to use the logistic regression to fit the training samples, and developing a differential method to building the training instances. To verify the effectiveness of the proposed training protocol, four widely used regression models in the field of machine learning have been chosen to perform the comparisons on two benchmark datasets. The experimental results of all the comparisons demonstrate that the proposed training protocol can significantly improve the performance prediction accuracy against the traditional training protocols.

preprint2020arXiv

ArcText: A Unified Text Approach to Describing Convolutional Neural Network Architectures

The superiority of Convolutional Neural Networks (CNNs) largely relies on their architectures that are often manually crafted with extensive human expertise. Unfortunately, such kind of domain knowledge is not necessarily owned by each of the users interested. Data mining on existing CNN can discover useful patterns and fundamental sub-comments from their architectures, providing researchers with strong prior knowledge to design proper CNN architectures when they have no expertise in CNNs. There have been various state-of-the-art data mining algorithms at hand, while there is only rare work that has been done for the mining. One of the main reasons is the gap between CNN architectures and data mining algorithms. Specifically, the current CNN architecture descriptions cannot be exactly vectorized to the input of data mining algorithms. In this paper, we propose a unified approach, named ArcText, to describing CNN architectures based on text. Particularly, four different units and an ordering method have been elaborately designed in ArcText, to uniquely describe the same architecture with sufficient information. Also, the resulted description can be exactly converted back to the corresponding CNN architecture. ArcText bridges the gap between CNN architectures and data mining researchers, and has the potentiality to be utilized to wider scenarios.

preprint2020arXiv

Automatically designing CNN architectures using genetic algorithm for image classification

Convolutional Neural Networks (CNNs) have gained a remarkable success on many image classification tasks in recent years. However, the performance of CNNs highly relies upon their architectures. For most state-of-the-art CNNs, their architectures are often manually-designed with expertise in both CNNs and the investigated problems. Therefore, it is difficult for users, who have no extended expertise in CNNs, to design optimal CNN architectures for their own image classification problems of interest. In this paper, we propose an automatic CNN architecture design method by using genetic algorithms, to effectively address the image classification tasks. The most merit of the proposed algorithm remains in its "automatic" characteristic that users do not need domain knowledge of CNNs when using the proposed algorithm, while they can still obtain a promising CNN architecture for the given images. The proposed algorithm is validated on widely used benchmark image classification datasets, by comparing to the state-of-the-art peer competitors covering eight manually-designed CNNs, seven automatic+manually tuning and five automatic CNN architecture design algorithms. The experimental results indicate the proposed algorithm outperforms the existing automatic CNN architecture design algorithms in terms of classification accuracy, parameter numbers and consumed computational resources. The proposed algorithm also shows the very comparable classification accuracy to the best one from manually-designed and automatic+manually tuning CNNs, while consumes much less of computational resource.

preprint2020arXiv

Evolving Deep Convolutional Neural Networks for Hyperspectral Image Denoising

Hyperspectral images (HSIs) are susceptible to various noise factors leading to the loss of information, and the noise restricts the subsequent HSIs object detection and classification tasks. In recent years, learning-based methods have demonstrated their superior strengths in denoising the HSIs. Unfortunately, most of the methods are manually designed based on the extensive expertise that is not necessarily available to the users interested. In this paper, we propose a novel algorithm to automatically build an optimal Convolutional Neural Network (CNN) to effectively denoise HSIs. Particularly, the proposed algorithm focuses on the architectures and the initialization of the connection weights of the CNN. The experiments of the proposed algorithm have been well-designed and compared against the state-of-the-art peer competitors, and the experimental results demonstrate the competitive performance of the proposed algorithm in terms of the different evaluation metrics, visual assessments, and the computational complexity.

preprint2020arXiv

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision

We present a novel end-to-end framework named as GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from single urban street view. GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass. Extensive experiments show that our diverse feature extraction and fusion scheme can greatly improve model performance. Based on a divide-and-conquer 3D shape representation strategy, GSNet reconstructs 3D vehicle shape with great detail (1352 vertices and 2700 faces). This dense mesh representation further leads us to consider geometrical consistency and scene context, and inspires a new multi-objective loss function to regularize network training, which in turn improves the accuracy of 6D pose estimation and validates the merit of jointly performing both tasks. We evaluate GSNet on the largest multi-task ApolloCar3D benchmark and achieve state-of-the-art performance both quantitatively and qualitatively. Project page is available at https://lkeab.github.io/gsnet/.

preprint2020arXiv

PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks

Parameter updating is an important stage in parallelism-based distributed deep learning. Synchronous methods are widely used in distributed training the Deep Neural Networks (DNNs). To reduce the communication and synchronization overhead of synchronous methods, decreasing the synchronization frequency (e.g., every $n$ mini-batches) is a straightforward approach. However, it often suffers from poor convergence. In this paper, we propose a new algorithm of integrating Particle Swarm Optimization (PSO) into the distributed training process of DNNs to automatically compute new parameters. In the proposed algorithm, a computing work is encoded by a particle, the weights of DNNs and the training loss are modeled by the particle attributes. At each synchronization stage, the weights are updated by PSO from the sub weights gathered from all workers, instead of averaging the weights or the gradients. To verify the performance of the proposed algorithm, the experiments are performed on two commonly used image classification benchmarks: MNIST and CIFAR10, and compared with the peer competitors at multiple different synchronization configurations. The experimental results demonstrate the competitiveness of the proposed algorithm.