Source author record

Maheswaran Sathiamoorthy

Maheswaran Sathiamoorthy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning; math.OC; Networking and Internet Architecture; Artificial Intelligence; cs.CY; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; Social and Information Networks; Systems and Control

Catalog footprint

What is connected

5 works

11 topics

4 close collaborators

preprint | 2022 | arXiv

Nonlinear Initialization Methods for Low-Rank Neural Networks

We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initialization parameters in the Frobenius norm with a pair of low-rank initialization matrices via singular value decomposition. Our method is inspired by the insight that approximating the function corresponding to each layer is more important than approximating the parameter values. We provably demonstrate that there is a significant gap between these two approaches for ReLU networks, particularly as the desired rank of the approximating weights decreases, or as the dimension of the inputs to the layer increases (the latter point holds when the network width is super-linear in dimension). Along the way, we provide the first provably efficient algorithm for solving the ReLU low-rank approximation problem for fixed parameter rank $r$ -- previously, it was unknown that the problem was computationally tractable to solve even for rank $1$. We also provide a practical algorithm to solve this problem which is no more expensive than the existing spectral initialization approach, and validate our theory by training ResNet and EfficientNet models (He et al., 2016; Tan & Le, 2019) on ImageNet (Russakovsky et al., 2015).
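The spectral initialization baseline described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the baseline only, not the function-approximation method the paper proposes. The function name `spectral_init`, the He-style Gaussian scaling of the full-rank sample, and the even split of singular values between the two factors are all assumptions made for the example.

```python
import numpy as np

def spectral_init(d_out, d_in, rank, seed=0):
    """Hypothetical helper: low-rank factors from a full-rank init sample."""
    rng = np.random.default_rng(seed)
    # Sample a full-rank initialization (He-style scaling is an assumption).
    W = rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_out, d_in))
    # Truncated SVD gives the best rank-r approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])
    A = U[:, :rank] * root           # shape (d_out, rank)
    B = root[:, None] * Vt[:rank]    # shape (rank, d_in)
    return W, A, B

W, A, B = spectral_init(64, 32, rank=4)
# Frobenius error of the parameter approximation W ~ A @ B.
err = np.linalg.norm(W - A @ B)
```

The error `err` measures how well the parameters are matched. The abstract's point is that matching parameters this way is not the same as matching the layer's function once a ReLU is applied.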

preprint | 2021 | arXiv

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness can lead to convergence and statistical performance issues when training with gradient-based methods. In this paper, we develop DSelect-k: a continuously differentiable and sparse gate for MoE, based on a novel binary encoding formulation. The gate can be trained using first-order methods, such as stochastic gradient descent, and offers explicit control over the number of experts to select. We demonstrate the effectiveness of DSelect-k on both synthetic and real MTL datasets with up to $128$ tasks. Our experiments indicate that DSelect-k can achieve statistically significant improvements in prediction and expert selection over popular MoE gates. Notably, on a real-world, large-scale recommender system, DSelect-k achieves over $22\%$ improvement in predictive performance compared to Top-k. We provide an open-source implementation of DSelect-k.
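To make the non-smoothness of Top-k gating concrete, here is a minimal sketch of a generic Top-k gate: a softmax restricted to the k largest logits. It is not the DSelect-k gate, and the function name `top_k_gate` and the example logits are assumptions chosen for illustration.

```python
import math

def top_k_gate(logits, k):
    """Hypothetical Top-k gate: softmax over the k largest logits, zero elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

# Two nearly identical logit vectors select different experts.
a = top_k_gate([1.0, 0.5, 0.499], k=2)
b = top_k_gate([1.0, 0.499, 0.5], k=2)
```

A change of 0.001 in two logits moves the weight of the second expert from a substantial positive value in `a` to exactly zero in `b`. That jump is the kind of discontinuity the abstract says can cause trouble for gradient-based training.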

preprint | 2013 | arXiv

XORing Elephants: Novel Erasure Codes for Big Data

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2x in repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage compared to Reed-Solomon codes, an overhead shown to be information-theoretically optimal to obtain locality. Because the new codes repair failures faster, this provides higher reliability, which is orders of magnitude higher compared to replication.
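The idea of locality, repairing a lost block by reading only a small group of other blocks, can be illustrated with a single XOR parity over a group. This is a generic single-parity sketch, not the code construction from the paper. The helper name `xor_blocks`, the group size, and the block contents are assumptions for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Hypothetical helper: byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# A small local group of data blocks plus one XOR parity block.
group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(group)

# Lose one block, then repair it from the other group members and the parity.
lost = 1
survivors = [blk for i, blk in enumerate(group) if i != lost]
recovered = xor_blocks(survivors + [parity])
```

Repair here reads only the surviving members of the local group and the parity block, rather than a full stripe. The price is the extra parity block, which is the kind of storage overhead the abstract trades for cheaper repair.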

preprint | 2012 | arXiv

Analysis of Twitter Traffic based on Renewal Densities

In this paper we propose a novel approach for Twitter traffic analysis based on renewal theory. Even though Twitter datasets are of increasing interest to researchers, extracting information from message timing remains somewhat unexplored. Our approach, extending our prior work on anomaly detection, makes it possible to characterize levels of correlation within a message stream, thus assessing how much interaction there is between those posting messages. Moreover, our method enables us to detect the presence of periodic traffic, which is useful to determine whether there is spam in the message stream. Because our proposed techniques only make use of timing information and are amenable to downsampling, they can be used as low-complexity tools for data analysis.

preprint | 2011 | arXiv

Backpressure with Adaptive Redundancy (BWAR)

Backpressure scheduling and routing, in which packets are preferentially transmitted over links with high queue differentials, offers the promise of throughput-optimal operation for a wide range of communication networks. However, when the traffic load is low, due to the corresponding low queue occupancy, backpressure scheduling/routing experiences long delays. This is particularly of concern in intermittent encounter-based mobile networks which are already delay-limited due to the sparse and highly dynamic network connectivity. While state-of-the-art mechanisms for such networks have proposed the use of redundant transmissions to improve delay, they do not work well when the traffic load is high. We propose in this paper a novel hybrid approach that we refer to as backpressure with adaptive redundancy (BWAR), which provides the best of both worlds. This approach is highly robust and distributed and does not require any prior knowledge of network load conditions. We evaluate BWAR through both mathematical analysis and simulations based on a cell-partitioned model. We prove theoretically that BWAR does not perform worse than traditional backpressure in terms of the maximum throughput, while yielding a better delay bound. The simulations confirm that BWAR outperforms traditional backpressure at low load, while outperforming a state-of-the-art encounter-routing scheme (Spray and Wait) at high load.
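The basic backpressure rule, preferring the link with the largest queue differential, can be sketched as follows. This is a simplified single-commodity illustration that ignores link rates, and it does not implement BWAR or its redundancy mechanism. The function name `backpressure_next_hop` and the queue values are assumptions for the example.

```python
def backpressure_next_hop(queues, node, neighbors):
    """Hypothetical helper: pick the neighbor with the largest positive
    queue differential, or None if no differential is positive."""
    best, best_diff = None, 0
    for nb in neighbors:
        diff = queues[node] - queues[nb]
        if diff > best_diff:
            best, best_diff = nb, diff
    return best

# Moderate load: node "a" has a backlog, so it forwards toward the emptiest queue.
loaded = {"a": 5, "b": 3, "c": 1}
hop_loaded = backpressure_next_hop(loaded, "a", ["b", "c"])

# Low load: no positive differential exists, so nothing is forwarded.
light = {"a": 1, "b": 1, "c": 2}
hop_light = backpressure_next_hop(light, "a", ["b", "c"])
```

In the low-load case the rule returns `None` and the packet waits, which gives some intuition for the long delays at low queue occupancy that the abstract describes.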