Researcher profile

Mehul Motani

Mehul Motani contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
15 works
0 followers
5 topics
4 close collaborators

Published work

15 published items

preprint, 2022, arXiv

Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments

Traditional reinforcement learning (RL) environments typically are the same for both the training and testing phases. Hence, current RL methods are largely not generalizable to a test environment which is conceptually similar but different from what the method has been trained on, which we term the novel test environment. As an effort to push RL research towards algorithms which can generalize to novel test environments, we introduce the Brick Tic-Tac-Toe (BTTT) test bed, where the brick position in the test environment is different from that in the training environment. Using a round-robin tournament on the BTTT environment, we show that traditional RL state-search approaches such as Monte Carlo Tree Search (MCTS) and Minimax are more generalizable to novel test environments than AlphaZero is. This is surprising because AlphaZero has been shown to achieve superhuman performance in environments such as Go, Chess and Shogi, which may lead one to think that it performs well in novel test environments. Our results show that BTTT, though simple, is rich enough to explore the generalizability of AlphaZero. We find that merely increasing MCTS lookahead iterations was insufficient for AlphaZero to generalize to some novel test environments. Rather, increasing the variety of training environments helps to progressively improve generalizability across all possible starting brick configurations.
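
As a rough illustration of the state-search baselines mentioned above, the sketch below runs exhaustive Minimax on a small board where one cell is blocked by a brick. The 3x3 board, the three-in-a-row win rule, and the names `minimax` and `start_board` are assumptions made for this example; the abstract does not specify the BTTT board or rules, so this is not the paper's test bed.

```python
from functools import lru_cache

# All three-in-a-row lines on a 3x3 board indexed 0..8 (assumed rules).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] in "XO" and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for X (+1 win, 0 draw, -1 loss) with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell == "."]
    if not moves:
        return 0  # no empty cells left: draw
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt) for i in moves]
    return max(values) if player == "X" else min(values)


def start_board(brick):
    """Empty board with a brick ('#') placed at index `brick`."""
    cells = ["."] * 9
    cells[brick] = "#"
    return "".join(cells)


if __name__ == "__main__":
    # Because the search is exhaustive, it needs no retraining when the
    # brick moves: the same function is evaluated for every brick position.
    for brick in range(9):
        print(brick, minimax(start_board(brick), "X"))
```

The point of the sketch is only that a pure search method takes the brick position as part of its input, whereas a learned policy has to have seen comparable positions during training.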

preprint, 2022, arXiv

DropNet: Reducing Neural Network Complexity via Iterative Pruning

Modern deep neural networks require a significant amount of computing time and power to train and deploy, which limits their usage on edge devices. Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis, we propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity. DropNet iteratively removes nodes/filters with the lowest average post-activation value across all training samples. Empirically, we show that DropNet is robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to 90% of the nodes/filters can be removed without any significant loss of accuracy. The final pruned network performs well even with reinitialization of the weights and biases. DropNet also has similar accuracy to an oracle which greedily removes nodes/filters one at a time to minimise training loss, highlighting its effectiveness.
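
The pruning criterion described above (drop the nodes with the lowest average post-activation value over the training samples) can be sketched in a few lines. This is a minimal sketch under stated assumptions: a single ReLU layer with random weights, no retraining between rounds, and hypothetical helper names (`relu_layer`, `mean_post_activation`, `prune_step`). It is not the authors' implementation.

```python
import random


def relu_layer(x, weights, biases):
    """Post-activation values of one dense ReLU layer (one weight vector per node)."""
    return [max(0.0, sum(xi * wi for xi, wi in zip(x, w)) + b)
            for w, b in zip(weights, biases)]


def mean_post_activation(samples, weights, biases):
    """Average post-activation value of each node across all samples."""
    totals = [0.0] * len(weights)
    for x in samples:
        for j, a in enumerate(relu_layer(x, weights, biases)):
            totals[j] += a
    return [t / len(samples) for t in totals]


def prune_step(scores, active, fraction):
    """Drop the `fraction` of active nodes with the lowest scores."""
    n_drop = int(len(active) * fraction)
    ranked = sorted(active, key=lambda j: scores[j])
    dropped = set(ranked[:n_drop])
    return [j for j in active if j not in dropped]


random.seed(0)
samples = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
biases = [0.0] * 16

active = list(range(16))
for _ in range(3):
    scores = mean_post_activation(samples, weights, biases)
    active = prune_step(scores, active, fraction=0.25)
    # A full pipeline would retrain the remaining nodes here before
    # scoring them again; this sketch skips training entirely.
```

With 16 nodes and a 25% drop rate, three rounds leave 7 nodes (16, 12, 9, 7), since each round removes the floor of a quarter of the nodes still active.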

preprint, 2022, arXiv

Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks

The importance of learning rate (LR) schedules on network pruning has been observed in a few recent works. As an example, Frankle and Carbin (2019) highlighted that winning tickets (i.e., accuracy preserving subnetworks) cannot be found without applying an LR warmup schedule, and Renda, Frankle and Carbin (2020) demonstrated that rewinding the LR to its initial state at the end of each pruning cycle improves performance. In this paper, we go one step further by first providing a theoretical justification for the surprising effect of LR schedules. Next, we propose an LR schedule for network pruning called SILO, which stands for S-shaped Improved Learning rate Optimization. The advantages of SILO over existing state-of-the-art (SOTA) LR schedules are two-fold: (i) SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization. Specifically, SILO increases the LR upper bound (max_lr) in an S-shape. This leads to an improvement of 2% - 4% in extensive experiments with various types of networks (e.g., Vision Transformers, ResNet) on popular datasets such as ImageNet, CIFAR-10/100. (ii) In addition to the strong theoretical motivation, SILO is empirically optimal in the sense of matching an Oracle, which exhaustively searches for the optimal value of max_lr via grid search. We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle optimized interval, resulting in performance competitive with the Oracle with significantly lower complexity.
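
To make the idea of raising max_lr along an S-shape concrete, here is a minimal sketch that uses a logistic curve over pruning cycles. The abstract does not give SILO's actual formula, so the logistic form, the function name `s_shaped_max_lr`, and the parameters `lr_low`, `lr_high` and `steepness` are all assumptions chosen for illustration.

```python
import math


def s_shaped_max_lr(cycle, n_cycles, lr_low, lr_high, steepness=8.0):
    """Hypothetical S-shaped max_lr for pruning cycle `cycle` (0-based).

    Uses a logistic curve centred on the middle cycle; this is a stand-in,
    not the schedule defined in the paper.
    """
    if n_cycles < 2:
        return lr_high
    t = cycle / (n_cycles - 1)                      # progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))
    return lr_low + (lr_high - lr_low) * s


if __name__ == "__main__":
    for c in range(11):
        print(c, round(s_shaped_max_lr(c, 11, 0.01, 0.1), 4))
```

In this stand-in, max_lr rises slowly in the first cycles, fastest around the middle cycle, and flattens again near `lr_high`. Each value would serve as the upper bound of whatever per-cycle schedule (for example, warmup followed by decay) is used during retraining.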

preprint, 2022, arXiv

Towards Better Long-range Time Series Forecasting using Generative Adversarial Networks

Long-range time series forecasting is usually based on one of two existing forecasting strategies: Direct Forecasting and Iterative Forecasting, where the former provides low bias, high variance forecasts and the latter leads to low variance, high bias forecasts. In this paper, we propose a new forecasting strategy called Generative Forecasting (GenF), which generates synthetic data for the next few time steps and then makes long-range forecasts based on generated and observed data. We theoretically prove that GenF is able to better balance the forecasting variance and bias, leading to a much smaller forecasting error. We implement GenF via three components: (i) a novel conditional Wasserstein Generative Adversarial Network (GAN) based generator for synthetic time series data generation, called CWGAN-TS; (ii) a transformer-based predictor, which makes long-range predictions using both generated and observed data; and (iii) an information-theoretic clustering algorithm to improve the training of both the CWGAN-TS and the transformer-based predictor. The experimental results on five public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. Specifically, we find a 5% - 11% improvement in predictive performance (mean absolute error) while having a 15% - 50% reduction in parameters compared to the benchmarks. Lastly, we conduct an ablation study to demonstrate the effectiveness of the components comprising GenF.

preprint, 2012, arXiv

Digital Network Coding Aided Two-way Relaying: Energy Minimization and Queue Analysis

In this paper, we consider a three node, two-way relay system with digital network coding over static channels where all link gains are assumed to be constant during transmission. The aim is to minimize total energy consumption while ensuring queue stability at all nodes, for a given pair of random packet arrival rates. Specifically, we allow for a set of transmission modes and solve for the optimal fraction of resources allocated to each mode, including multiaccess uplink transmission mode and network coding broadcasting mode. In addition, for the downlink, we find the condition to determine whether superposition coding with excess data over the better link and network coded data for both users is energy efficient and the corresponding optimization is formulated and solved. To tackle the queue evolution in this network, we present a detailed analysis of the queues at each node using a random scheduling method that closely approximates the theoretical design, through a two-dimensional Markov chain model.
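
The network coding broadcasting mode mentioned above rests on a simple idea: the relay combines the two users' packets into one coded packet, and each user removes its own packet to recover the other's. The sketch below shows this with a bitwise XOR over equal-length packets. The function names and the equal-length restriction are assumptions for this example; the energy minimization and queue analysis from the abstract are not modelled.

```python
def xor_bytes(p, q):
    """Bitwise XOR of two equal-length packets."""
    if len(p) != len(q):
        raise ValueError("packets must have equal length in this sketch")
    return bytes(x ^ y for x, y in zip(p, q))


def relay_encode(packet_a, packet_b):
    """Relay combines the packets received from both users into one broadcast."""
    return xor_bytes(packet_a, packet_b)


def user_decode(coded, own_packet):
    """A user cancels its own packet to recover the other user's packet."""
    return xor_bytes(coded, own_packet)


if __name__ == "__main__":
    a, b = b"hello", b"world"
    coded = relay_encode(a, b)      # one broadcast serves both users
    print(user_decode(coded, a))    # user A recovers B's packet
    print(user_decode(coded, b))    # user B recovers A's packet
```

Because one broadcast replaces two separate downlink transmissions, the relay spends one slot where plain forwarding would spend two.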

preprint, 2012, arXiv

On Capacity and Optimal Scheduling for the Half-Duplex Multiple-Relay Channel

We study the half-duplex multiple-relay channel (HD-MRC) where every node can either transmit or listen but cannot do both at the same time. We obtain a capacity upper bound based on a max-flow min-cut argument and achievable transmission rates based on the decode-forward (DF) coding strategy, for both the discrete memoryless HD-MRC and the phase-fading HD-MRC. We discover that both the upper bound and the achievable rates are functions of the transmit/listen state (a description of which nodes transmit and which receive). More precisely, they are functions of the time fraction of the different states, which we term a schedule. We formulate the optimal scheduling problem to find an optimal schedule that maximizes the DF rate. The optimal scheduling problem turns out to be a maximin optimization, for which we propose an algorithmic solution. We demonstrate our approach on a four-node multiple-relay channel, obtaining closed-form solutions in certain scenarios. Furthermore, we show that for the received signal-to-noise ratio degraded phase-fading HD-MRC, the optimal scheduling problem can be simplified to a max optimization.

preprint, 2008, arXiv

Myopic Coding in Multiterminal Networks

This paper investigates the interplay between cooperation and achievable rates in multi-terminal networks. Cooperation refers to the process of nodes working together to relay data toward the destination. There is an inherent tradeoff between achievable information transmission rates and the level of cooperation, which is determined by how many nodes are involved and how the nodes encode/decode the data. We illustrate this tradeoff by studying information-theoretic decode-forward based coding strategies for data transmission in multi-terminal networks. Decode-forward strategies are usually discussed in the context of omniscient coding, in which all nodes in the network fully cooperate with each other, both in encoding and decoding. In this paper, we investigate myopic coding, in which each node cooperates with only a few neighboring nodes. We show that achievable rates of myopic decode-forward can be as large as those of omniscient decode-forward in the low SNR regime. We also show that when each node has only a few cooperating neighbors, adding one node into the cooperation increases the transmission rate significantly. Furthermore, we show that myopic decode-forward can achieve non-zero rates as the network size grows without bound.

preprint, 2007, arXiv

On the Capacity of the Single Source Multiple Relay Single Destination Mesh Network

In this paper, we derive the information theoretic capacity of a special class of mesh networks. A mesh network is a heterogeneous wireless network in which the transmission among power limited nodes is assisted by powerful relays, which use the same wireless medium. We investigate the mesh network when there is one source, one destination, and multiple relays, which we call the single source multiple relay single destination (SSMRSD) mesh network. We derive the asymptotic capacity of the SSMRSD mesh network when the relay powers grow to infinity. Our approach is as follows. We first look at an upper bound on the information theoretic capacity of these networks in a Gaussian setting. We then show that this bound is achievable asymptotically using the compress-and-forward strategy for the multiple relay channel. We also perform numerical computations for the case when the relays have finite powers. We observe that even when the relay power is only a few times larger than the source power, the compress-and-forward rate gets close to the capacity. The results indicate the value of cooperation in wireless mesh networks. The capacity characterization quantifies how the relays can cooperate, using the compress-and-forward strategy, to either conserve node energy or to increase transmission rate.

preprint, 2007, arXiv

Optimal Routing for Decode-and-Forward based Cooperation in Wireless Networks

We investigate cooperative wireless relay networks in which the nodes can help each other in data transmission. We study different coding strategies in the single-source single-destination network with many relay nodes. Given the myriad of ways in which nodes can cooperate, there is a natural routing problem, i.e., determining an ordered set of nodes to relay the data from the source to the destination. We find that for a given route, the decode-and-forward strategy, which is an information theoretic cooperative coding strategy, achieves rates significantly higher than those achievable by the usual multi-hop coding strategy, which is a point-to-point non-cooperative coding strategy. We construct an algorithm to find an optimal route (in terms of rate maximization) for the decode-and-forward strategy. Since the algorithm runs in factorial time in the worst case, we propose a heuristic algorithm that runs in polynomial time. The heuristic algorithm outputs an optimal route when the nodes transmit independent codewords. We implement these coding strategies using practical low density parity check codes to compare the performance of the strategies on different routes.

preprint, 2007, arXiv

Optimal Routing for the Gaussian Multiple-Relay Channel with Decode-and-Forward

In this paper, we study a routing problem on the Gaussian multiple relay channel, in which nodes employ a decode-and-forward coding strategy. We are interested in routes for the information flow through the relays that achieve the highest DF rate. We first construct an algorithm that provably finds optimal DF routes. As the algorithm runs in factorial time in the worst case, we propose a polynomial time heuristic algorithm that finds an optimal route with high probability. We demonstrate that the optimal (and near optimal) DF routes are good in practice by simulating a distributed DF coding scheme using low density parity check codes with puncturing and incremental redundancy.
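
To show why an exhaustive route search takes factorial time, the sketch below enumerates every ordered subset of relays and scores each route. The rate expression is a simplified stand-in assumed for this example (equal transmit power, independent codewords, each node decoding from all upstream nodes on the route), and `route_rate` and `best_route` are hypothetical names. It is neither the paper's algorithm nor its heuristic.

```python
from itertools import permutations
from math import log2


def route_rate(route, gain, power, noise=1.0):
    """Assumed DF-style rate of a route: limited by the weakest decoding node.

    Node route[t] is assumed to collect power from every upstream node
    route[0..t-1]; this is an illustrative model, not the paper's formula.
    """
    rates = []
    for t in range(1, len(route)):
        snr = sum(power * gain[route[i]][route[t]] for i in range(t)) / noise
        rates.append(0.5 * log2(1.0 + snr))
    return min(rates)


def best_route(n_nodes, gain, power):
    """Brute-force search over all ordered relay subsets (factorial time)."""
    source, dest = 0, n_nodes - 1
    relays = range(1, n_nodes - 1)
    best = None
    for k in range(len(relays) + 1):
        for middle in permutations(relays, k):
            route = (source, *middle, dest)
            rate = route_rate(route, gain, power)
            if best is None or rate > best[0]:
                best = (rate, route)
    return best


if __name__ == "__main__":
    # Three nodes on a line at positions 0, 1, 2; gain = 1 / distance**2.
    gain = [[0.0, 1.0, 0.25],
            [1.0, 0.0, 1.0],
            [0.25, 1.0, 0.0]]
    print(best_route(3, gain, power=1.0))
```

With m relays the loop visits the sum over k of m!/(m-k)! routes, which is what motivates a polynomial-time heuristic for larger networks.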

preprint, 2006, arXiv

Myopic Coding in Wireless Networks

We investigate the achievable rate of data transmission from sources to sinks through a multiple-relay network. We study achievable rates for omniscient coding, in which all nodes are considered in the coding design at each node. We find that, when maximizing the achievable rate, not all nodes need to "cooperate" with all other nodes in terms of coding and decoding. This leads us to suggest a constrained network, whereby each node only considers a few neighboring nodes during encoding and decoding. We term this myopic coding and calculate achievable rates for myopic coding. We show by examples that, when nodes transmit at low SNR, these rates are close to those achievable by omniscient coding, when the network is unconstrained. This suggests that a myopic view of the network might be as good as a global view. In addition, myopic coding has the practical advantage of being more robust to topology changes. It also mitigates the high computational complexity and large buffer/memory requirements of omniscient coding schemes.

preprint, 2006, arXiv

The Capacity of the Single Source Multiple Relay Single Destination Mesh Network

In this paper, we derive the capacity of a special class of mesh networks. A mesh network is defined as a heterogeneous wireless network in which the transmission among power limited nodes is assisted by powerful relays, which use the same wireless medium. We find the capacity of the mesh network when there is one source, one destination, and multiple relays. We call this channel the single source multiple relay single destination (SSMRSD) mesh network. Our approach is as follows. We first look at an upper bound on the information theoretic capacity of these networks in the Gaussian setting. We then show that the bound is achievable asymptotically using the compress-forward strategy for the multiple relay channel. Theoretically, the results indicate the value of cooperation and the utility of carefully deployed relays in wireless ad-hoc and sensor networks. The capacity characterization quantifies how the relays can be used to either conserve node energy or to increase transmission rate.

preprint, 2006, arXiv

The Multiple Access Channel with Feedback and Correlated Sources

In this paper, we investigate communication strategies for the multiple access channel with feedback and correlated sources (MACFCS). The MACFCS models a wireless sensor network scenario in which sensors distributed throughout an arbitrary random field collect correlated measurements and transmit them to a common sink. We derive achievable rate regions for the three-node MACFCS. First, we study the strategy when source coding and channel coding are combined, which we term full decoding at sources. Second, we look at several strategies when source coding and channel coding are separated, which we term full decoding at destination. From numerical computations on Gaussian channels, we see that different strategies perform better under certain source correlations and channel setups.

preprint, 2005, arXiv

Myopic Coding in Multiple Relay Channels

In this paper, we investigate achievable rates for data transmission from sources to sinks through multiple relay networks. We consider myopic coding, a constrained communication strategy in which each node has only a local view of the network, meaning that nodes can only transmit to and decode from neighboring nodes. We compare this with omniscient coding, in which every node has a global view of the network and all nodes can cooperate. Using Gaussian channels as examples, we find that when the nodes transmit at low power, the rates achievable with two-hop myopic coding are as large as those under omniscient coding in a five-node multiple relay channel and close to those under omniscient coding in a six-node multiple relay channel. These results suggest that we may do local coding and cooperation without compromising much on the transmission rate. Practically, myopic coding schemes are more robust to topology changes because encoding and decoding at a node are not affected when there are changes at remote nodes. Furthermore, myopic coding mitigates the high computational complexity and large buffer/memory requirements of omniscient coding.