Source author record

Zewei Chen

Zewei Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, cond-mat.str-el, Computer Vision, cond-mat.mtrl-sci, Multiagent Systems, Operating Systems

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators


Preprint · 2023 · arXiv

MANAS: Multi-Agent Neural Architecture Search

The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective. Due to the large architecture parameter space, efficiency is a key bottleneck preventing the practical use of NAS. In this paper, we address the issue by framing NAS as a multi-agent problem where agents control a subset of the network and coordinate to reach optimal architectures. We provide two distinct lightweight implementations, with reduced memory requirements (1/8th of state-of-the-art) and performance above that of much more computationally expensive methods. Theoretically, we demonstrate vanishing regrets of the form O(sqrt(T)), with T being the total number of rounds. Finally, aware that random search is an effective but often ignored baseline, we perform additional experiments on 3 alternative datasets and 2 network configurations, and achieve favourable results in comparison.

Preprint · 2021 · arXiv

A Practical Layer-Parallel Training Algorithm for Residual Networks

Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters, both of which are time-consuming for deep ResNets. To break the dependencies between modules in both the forward and backward modes, auxiliary-variable methods such as the penalty and augmented Lagrangian (AL) approaches have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we observe that large communication overhead and the lack of data augmentation are two key challenges of these methods, which may lead to a low speedup ratio and an accuracy drop across multiple compute devices. Inspired by the optimal control formulation of ResNets, we propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost. The proposed strategy first trains the network parameters by solving a succession of independent sub-problems in parallel and then corrects the network parameters through a full serial forward-backward propagation of data. Such a strategy can be applied to most of the existing layer-parallel training methods using auxiliary variables. As an example, we validate the proposed strategy using penalty and AL methods on ResNet and WideResNet across MNIST, CIFAR-10 and CIFAR-100 datasets, achieving significant speedup over the traditional layer-serial training methods while maintaining comparable accuracy.

Preprint · 2020 · arXiv

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

Neural Architecture Search (NAS) has achieved many breakthroughs in recent years. In spite of its remarkable progress, many algorithms are restricted to particular search spaces. They also lack efficient mechanisms to reuse knowledge when confronting multiple tasks. These challenges limit their applicability, and motivate our proposal of CATCH, a novel Context-bAsed meTa reinforcement learning (RL) algorithm for transferrable arChitecture searcH. The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces. CATCH utilizes a probabilistic encoder to encode task properties into latent context variables, which then guide CATCH's controller to quickly "catch" top-performing networks. The contexts also assist a network evaluator in filtering inferior candidates and speed up learning. Extensive experiments demonstrate CATCH's universality and search efficiency over many other widely-recognized algorithms. It is also capable of handling cross-domain architecture search, as competitive networks on ImageNet, COCO, and Cityscapes are identified. To our knowledge, this is the first work that proposes an efficient transferrable NAS solution while maintaining robustness across various settings.

Preprint · 2020 · arXiv

DPCP-p: A Distributed Locking Protocol for Parallel Real-Time Tasks

Real-time scheduling and locking protocols are fundamental facilities for constructing time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs need mutually exclusive access to shared resources. This paper studies, for the first time, the distributed synchronization framework of parallel real-time tasks, where both tasks and global resources are partitioned to designated processors, and requests to each global resource are conducted on the processor to which the resource is assigned. We extend the Distributed Priority Ceiling Protocol (DPCP) to parallel tasks under federated scheduling, and prove that under this extension a request can be blocked by at most one lower-priority request. We develop task and resource partitioning heuristics and propose analysis techniques to safely bound the task response times. Numerical evaluation (with heavy tasks on 8-, 16-, and 32-core processors) indicates that the proposed methods improve schedulability significantly compared to the state-of-the-art locking protocols under federated scheduling.

Preprint · 2020 · arXiv

Multi-objective Neural Architecture Search via Non-stationary Policy Gradient

Multi-objective Neural Architecture Search (NAS) aims to discover novel architectures in the presence of multiple conflicting objectives. Despite recent progress, the problem of approximating the full Pareto front accurately and efficiently remains challenging. In this work, we explore the novel reinforcement learning (RL) based paradigm of non-stationary policy gradient (NPG). NPG utilizes a non-stationary reward function, and encourages a continuous adaptation of the policy to capture the entire Pareto front efficiently. We introduce two novel reward functions with elements from the dominant paradigms of scalarization and evolution. To handle non-stationarity, we propose a new exploration scheme using cosine temperature decay with warm restarts. For fast and accurate architecture evaluation, we introduce a novel pre-trained shared model that we continuously fine-tune throughout training. Our extensive experimental study with various datasets shows that our framework can approximate the full Pareto front well and quickly. Moreover, our discovered cells can achieve superior predictive performance compared to other multi-objective NAS methods, and to single-objective NAS methods at similar network sizes. Our work demonstrates the potential of NPG as a simple, efficient, and effective paradigm for multi-objective NAS.
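The abstract names "cosine temperature decay with warm restarts" but does not give the schedule. The sketch below shows one common form of such a schedule, modelled on SGDR-style cosine annealing, and is not taken from the paper. The names `cosine_temperature` and `sample_probabilities`, the parameters `period`, `t_max` and `t_min`, and the use of the temperature inside a softmax over logits are all assumptions made for illustration.

```python
import math


def cosine_temperature(step, period, t_max, t_min):
    """Temperature for a cosine schedule that restarts every `period` steps.

    Assumed form (SGDR-style), not the paper's exact schedule.
    """
    position = step % period  # steps since the most recent restart
    cosine = math.cos(math.pi * position / period)
    return t_min + 0.5 * (t_max - t_min) * (1.0 + cosine)


def sample_probabilities(logits, temperature):
    """Softmax over logits divided by a temperature (hypothetical usage)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Two full periods of ten steps, plus the first step of a third period.
schedule = [cosine_temperature(s, period=10, t_max=5.0, t_min=1.0)
            for s in range(21)]

# A high temperature flattens the distribution (more exploration);
# a low temperature sharpens it (more exploitation).
explore = sample_probabilities([2.0, 1.0, 0.0], temperature=schedule[0])
exploit = sample_probabilities([2.0, 1.0, 0.0], temperature=1.0)
```

Under these assumptions, each restart returns the temperature to `t_max`, so the policy periodically resumes broad exploration, which is one plausible way to keep adapting when the reward function changes over time.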

Preprint · 2020 · arXiv

New Interpretations of Normalization Methods in Deep Learning

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), and group normalization (GN). However, mathematical tools to analyze all these normalization methods are lacking. In this paper, we first propose a lemma to define some necessary tools. Then, we use these tools to analyze popular normalization methods in depth and obtain the following conclusions: 1) Most of the normalization methods can be interpreted in a unified framework, namely normalizing pre-activations or weights onto a sphere; 2) Since most of the existing normalization methods are scaling invariant, we can conduct optimization on a sphere with scaling symmetry removed, which can help stabilize the training of the network; 3) We prove that training with these normalization methods can make the norm of weights increase, which could cause adversarial vulnerability as it amplifies the attack. Finally, a series of experiments are conducted to verify these claims.
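The first two conclusions can be illustrated with a small numerical check. This is a minimal sketch, not the paper's formulation: it assumes a layer-normalization-style operation without learned gain or bias, and fixes the sphere radius to 1 for simplicity. The function name `normalize_to_sphere` and the sample values are hypothetical.

```python
import math


def normalize_to_sphere(values):
    """Center a vector and rescale it to unit L2 norm.

    Assumed simplification: no learned gain or bias, radius fixed to 1.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


pre_activations = [0.5, -1.2, 3.0, 0.7]
scaled = [10.0 * v for v in pre_activations]  # same direction, larger scale

on_sphere = normalize_to_sphere(pre_activations)
on_sphere_scaled = normalize_to_sphere(scaled)

# The output has unit norm, so it lies on a sphere.
unit_norm = math.sqrt(sum(v * v for v in on_sphere))

# Rescaling the input leaves the output unchanged (scaling invariance).
max_gap = max(abs(a - b) for a, b in zip(on_sphere, on_sphere_scaled))
```

Here `unit_norm` equals 1 and `max_gap` equals 0, both up to floating-point rounding, which is the sense in which the scale of the input carries no information after normalization.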

Preprint · 2012 · arXiv

Spin transport in the Neel and collinear antiferromagnetic phase of the two dimensional spatial and spin anisotropic Heisenberg model on a square lattice

We analyze and compare the effect of spatial and spin anisotropy on spin conductivity in a two-dimensional S=1/2 Heisenberg quantum magnet on a square lattice. We explore the model in both the Neel antiferromagnetic (AF) phase and the collinear antiferromagnetic (CAF) phase. We find that, in contrast to the effects of spin anisotropy in the Heisenberg model, spatial anisotropy in the AF phase does not suppress the zero-temperature regular part of the spin conductivity in the zero-frequency limit; rather, it enhances it. We also explore the finite-temperature effects on the Drude weight in the AF phase for various spatial and spin anisotropy parameters. We find that the Drude weight goes to zero as the temperature approaches zero. At finite temperatures (within the collisionless approximation), enhancing spatial anisotropy increases the Drude weight value and increasing spin anisotropy decreases the Drude weight value. In the CAF phase (within the non-interacting approximation), the zero-frequency spin conductivity has a finite value for non-zero values of the spatial anisotropy parameter. In the CAF phase, increasing the spatial anisotropy parameter suppresses the regular part of the spin conductivity response at zero frequency. Furthermore, we find that the CAF phase displays a spike in the spin conductivity not seen in the AF phase. Inclusion of the smallest amount of spin anisotropy causes a gap to develop in the spin conductivity response of both the AF and CAF phases. Based on these studies, we conclude that materials with spatial anisotropy are better spin conductors than those with spin anisotropy, both at zero and finite temperatures. We utilize exchange parameter ratios for real material systems as inputs to the computation of spin conductivity.

Preprint · 2012 · arXiv

Thermodynamics of Ising Spins on the Star Lattice

A new class of two-dimensional magnetic materials, polymeric iron(III) acetate, in which Fe ions form a star lattice, has recently been fabricated. We study the thermodynamics of Ising spins on the star lattice with an exact analytic method and Monte Carlo simulations. Mapping the star lattice to the honeycomb lattice, we obtain the partition function for the system with asymmetric interactions. The free energy, internal energy, specific heat, entropy and susceptibility are presented, which can be used to determine the sign of the interactions in the real materials. Moreover, we find rich phase diagrams of the system as a function of interactions, temperature and external magnetic field. For frustrated interactions without an external field, the ground state is disordered (spin liquid) with residual entropy 1.522 per unit cell. When a weak field is applied, the system enters a ferrimagnetic phase with residual entropy ln 4 per unit cell.
