Researcher profile

Mohammadreza Soltani

Mohammadreza Soltani contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

Fisher Task Distance and Its Application in Neural Architecture Search

We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks instead of doing a full search and without using this side information. Experimental results for tasks in MNIST, CIFAR-10, CIFAR-100, ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, PC-DARTS.

preprint2022arXiv

Improved Automated Machine Learning from Transfer Learning

In this paper, we propose a neural architecture search framework based on a similarity measure between some baseline tasks and a target task. We first define the notion of the task similarity based on the log-determinant of the Fisher Information matrix. Next, we compute the task similarity from each of the baseline tasks to the target task. By utilizing the relation between a target and a set of learned baseline tasks, the search space of architectures for the target task can be significantly reduced, making the discovery of the best candidates in the set of possible architectures tractable and efficient, in terms of GPU days. This method eliminates the requirement for training the networks from scratch for a given target task as well as introducing the bias in the initialization of the search space from the human domain.

preprint2022arXiv

On The Energy Statistics of Feature Maps in Pruning of Neural Networks with Skip-Connections

We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure defined by the energy statistics of hidden layers serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence measure is subsequently used to prune a collection of redundant and uninformative layers. Model-freeness of our measure guarantees that no parametric assumptions on the feature map distribution are required, making it computationally appealing for very high dimensional feature space in DNNs. Extensive numerical experiments on various architectures show the efficacy of the proposed pruning approach with competitive performance to state-of-the-art methods.

preprint2022arXiv

One-Shot Neural Architecture Search via Compressive Sensing

Neural Architecture Search remains a very challenging meta-learning problem. Several recent techniques based on parameter-sharing idea have focused on reducing the NAS running time by leveraging proxy models, leading to architectures with competitive performance compared to those with hand-crafted designs. In this paper, we propose an iterative technique for NAS, inspired by algorithms for learning low-degree sparse Boolean functions. We validate our approach on the DARTs search space (Liu et al., 2018b) and NAS-Bench-201 (Yang et al., 2020). In addition, we provide theoretical analysis via upper bounds on the number of validation error measurements needed for reliable learning, and include ablation studies to further in-depth understanding of our technique.

preprint2022arXiv

Task Affinity with Maximum Bipartite Matching in Few-Shot Learning

We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.

preprint2021arXiv

GeoStat Representations of Time Series for Fast Classification

Recent advances in time series classification have largely focused on methods that either employ deep learning or utilize other machine learning models for feature extraction. Though successful, their power often comes at the requirement of computational complexity. In this paper, we introduce GeoStat representations for time series. GeoStat representations are based off of a generalization of recent methods for trajectory classification, and summarize the information of a time series in terms of comprehensive statistics of (possibly windowed) distributions of easy to compute differential geometric quantities, requiring no dynamic time warping. The features used are intuitive and require minimal parameter tuning. We perform an exhaustive evaluation of GeoStat on a number of real datasets, showing that simple KNN and SVM classifiers trained on these representations exhibit surprising performance relative to modern single model methods requiring significant computational power, achieving state of the art results in many cases. In particular, we show that this methodology achieves good performance on a challenging dataset involving the classification of fishing vessels, where our methods achieve good performance relative to the state of the art despite only having access to approximately two percent of the dataset used in training and evaluating this state of the art.

preprint2021arXiv

Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows

We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the high-dimensional conditional distributions learned by a normalizing flow. We prove that a Metropolis-Hastings implementation of PL-MCMC asymptotically samples from the exact conditional distributions associated with a normalizing flow. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows.

preprint2020arXiv

Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery

In this paper, we study two important problems in the automated design of neural networks -- Hyper-parameter Optimization (HPO), and Neural Architecture Search (NAS) -- through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In particular, we show that a special encoding of the hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy), leads to improvement over existing hyperparameter optimization methods. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach. In the second part of this paper, we establish a connection between NAS and structured sparse recovery. Building upon ``one-shot'' approaches in NAS, we propose a novel algorithm that we call CoNAS by merging ideas from one-shot approaches with a techniques for learning low-degree sparse Boolean polynomials. We provide theoretical analysis on the number of validation error measurements. Finally, we validate our approach on several datasets and discover novel architectures hitherto unreported, achieving competitive (or better) results in both performance and search time compared to the existing NAS approaches.