Source author record

Ashok Vardhan Makkuva

Ashok Vardhan Makkuva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Information Theory, Machine Learning, math.IT

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators


preprint, 2021, arXiv

Reed-Muller Subcodes: Machine Learning-Aided Design of Efficient Soft Recursive Decoding

Reed-Muller (RM) codes are conjectured to achieve the capacity of any binary-input memoryless symmetric (BMS) channel, and are observed to have a comparable performance to that of random codes in terms of scaling laws. On the negative side, RM codes lack efficient decoders with performance close to that of a maximum likelihood decoder for general parameters. Also, they only admit certain discrete sets of rates. In this paper, we focus on subcodes of RM codes with flexible rates that can take any code dimension from 1 to n, where n is the blocklength. We first extend the recursive projection-aggregation (RPA) algorithm proposed recently by Ye and Abbe for decoding RM codes. To lower the complexity of our decoding algorithm, referred to as subRPA in this paper, we investigate different ways for pruning the projections. We then derive the soft-decision based version of our algorithm, called soft-subRPA, that is shown to improve upon the performance of subRPA. Furthermore, it enables training a machine learning (ML) model to search for "good" sets of projections in the sense of minimizing the decoding error rate. Training our ML model enables achieving very close to the performance of full-projection decoding with a significantly reduced number of projections. For instance, our simulation results on a (64,14) RM subcode show almost identical performance for full-projection decoding and pruned-projection decoding with 15 projections picked via training our ML model. This is equivalent to lowering the complexity by a factor of more than 4 without sacrificing the decoding performance.
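The rate restriction and the complexity figure in this abstract can be checked with a few lines of arithmetic. The sketch below uses the standard dimension formula for RM(r, m), the sum of C(m, i) for i from 0 to r. It also assumes that full-projection decoding uses all 2^m - 1 one-dimensional subspaces and that cost scales with the number of projections; both assumptions are made here for illustration and are not stated in the abstract. The names `rm_dimension`, `full_projections` and `pruned_projections` are hypothetical.

```python
from math import comb


def rm_dimension(r: int, m: int) -> int:
    """Dimension of the Reed-Muller code RM(r, m): sum of C(m, i) for i = 0..r."""
    return sum(comb(m, i) for i in range(r + 1))


m = 6
n = 2 ** m  # blocklength 64

# Dimensions reachable by plain RM codes of blocklength 64.
rm_dims = [rm_dimension(r, m) for r in range(m + 1)]
print(rm_dims)  # [1, 7, 22, 42, 57, 63, 64]

# Dimension 14 falls between RM(1, 6) and RM(2, 6), so a (64, 14) code
# has to be a subcode rather than a plain RM code.
print(14 in rm_dims)  # False

# Assumption: full-projection decoding uses every one-dimensional subspace
# of F_2^m, and complexity is proportional to the number of projections.
full_projections = 2 ** m - 1  # 63
pruned_projections = 15        # figure quoted in the abstract
reduction = full_projections / pruned_projections
print(reduction)  # about 4.2
```

Under those assumptions, 63 / 15 is about 4.2, which is consistent with the "factor of more than 4" quoted above.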

preprint, 2020, arXiv

Learning in Gated Neural Networks

Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh the decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little is understood about parameter recovery of mixture-of-experts since gradient descent and EM algorithms are known to be stuck in local optima in such models. In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two distinct loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We demonstrate the first sample complexity results for parameter recovery in this model for any algorithm and demonstrate significant performance gains over standard loss functions in numerical experiments.
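As a rough illustration of the mixture-of-experts layer described in this abstract, the sketch below computes a gated prediction from linear experts with softmax gating. The linear experts, the softmax gate, and the names `moe_predict`, `expert_weights` and `gate_weights` are assumptions chosen for illustration; the abstract does not specify the model form, and the paper's loss functions are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def moe_predict(x, expert_weights, gate_weights):
    """Gated prediction: sum_i gate_i(x) * expert_i(x).

    Assumed (hypothetical) model: expert i outputs a_i . x and the gate is
    softmax over w_i . x, so the weighting depends on the input x.
    """
    gate = softmax([dot(w, x) for w in gate_weights])
    expert_outputs = [dot(a, x) for a in expert_weights]
    prediction = sum(p * y for p, y in zip(gate, expert_outputs))
    return prediction, gate


x = [1.0, 2.0]
expert_weights = [[1.0, 0.0], [0.0, 1.0]]  # expert outputs: 1.0 and 2.0
gate_weights = [[0.0, 0.0], [0.0, 0.0]]    # zero gate weights give a uniform gate
prediction, gate = moe_predict(x, expert_weights, gate_weights)
print(prediction, gate)  # 1.5 [0.5, 0.5]
```

The two parameter groups in the sketch, `expert_weights` and `gate_weights`, correspond to the two recovery problems that the abstract says are handled by separate loss functions.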

preprint, 2020, arXiv

Optimal transport mapping via input convex neural networks

In this paper, we present a novel and principled approach to learn the optimal transport between two distributions, from samples. Guided by the optimal transport theory, we learn the optimal Kantorovich potential which induces the optimal transport map. This involves learning two convex functions, by solving a novel minimax optimization. Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping. Numerical experiments confirm that we learn the optimal transport mapping. This approach ensures that the transport mapping we find is optimal independent of how we initialize the neural networks. Further, target distributions from a discontinuous support can be easily captured, as gradient of a convex function naturally models a discontinuous transport mapping.
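The closing point of this abstract, that the gradient of a convex function can be a discontinuous map, can be seen in one dimension. The sketch below is a toy convex potential built from ReLU units with non-negative output weights plus a quadratic term. It is not the architecture or the minimax training from the paper; the parameters `A`, `B`, `W` and the function names are hypothetical.

```python
def relu(t):
    """Rectified linear unit: convex and non-decreasing."""
    return t if t > 0 else 0.0


# Hypothetical parameters. The output weights W must be non-negative so that
# the weighted sum of convex ReLU units stays convex in x.
A = [1.0, -1.0]
B = [0.0, 0.0]
W = [1.0, 1.0]


def potential(x):
    """Convex potential f(x) = x^2 / 2 + sum_k W_k * relu(A_k * x + B_k).

    With the parameters above this equals x^2 / 2 + |x|.
    """
    return 0.5 * x * x + sum(w * relu(a * x + b) for w, a, b in zip(W, A, B))


def transport_map(x):
    """A (sub)gradient of the potential, used here as the candidate map."""
    return x + sum(w * a for w, a, b in zip(W, A, B) if a * x + b > 0)


# The map jumps across x = 0: nearby inputs land on opposite sides of a gap.
print(transport_map(-0.1), transport_map(0.1))  # about -1.1 and 1.1
```

Every input to the left of 0 is sent below -1 and every input to the right of 0 is sent above 1, so a source distribution with connected support is pushed onto a target with a gap, even though the potential itself is continuous and convex.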

preprint, 2016, arXiv

Equivalence of additive-combinatorial linear inequalities for Shannon entropy and differential entropy

This paper addresses the correspondence between linear inequalities of Shannon entropy and differential entropy for sums of independent group-valued random variables. We show that any balanced (with the sum of coefficients being zero) linear inequality of Shannon entropy holds if and only if its differential entropy counterpart also holds; moreover, any linear inequality for differential entropy must be balanced. In particular, our result shows that recently proved differential entropy inequalities by Kontoyiannis and Madiman [KM14] can be deduced from their discrete counterparts due to Tao [Tao10] in a unified manner. Generalizations to certain abelian groups are also obtained. Our proof of extending inequalities of Shannon entropy to differential entropy relies on a result of Rényi [Renyi59] which relates the Shannon entropy of a finely discretized random variable to its differential entropy and also helps in establishing the entropy of the sum of quantized random variables is asymptotically equal to that of the quantized sum; the converse uses the asymptotics of the differential entropy of convolutions with weak additive noise.
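The relation between discretized and differential entropy mentioned in this abstract is commonly stated as H(floor(mX)/m) - log m converging to h(X) as m grows, under mild regularity conditions. The sketch below checks this numerically for X distributed as Exp(1), whose differential entropy is 1 nat. The choice of distribution, the truncation parameter `tail`, and the function name are assumptions made for illustration and do not come from the paper.

```python
import math


def quantized_entropy_exponential(m, tail=60.0):
    """Shannon entropy in nats of floor(m * X) / m for X ~ Exp(1).

    Cell k has probability exp(-k / m) * (1 - exp(-1 / m)). The sum is
    truncated at X = tail, where the remaining mass is below exp(-tail).
    """
    p_cell = 1.0 - math.exp(-1.0 / m)
    entropy = 0.0
    for k in range(int(tail * m)):
        p = math.exp(-k / m) * p_cell
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


# Differential entropy of Exp(1) is 1 nat; the gap should shrink as m grows.
for m in (10, 100, 1000):
    gap = quantized_entropy_exponential(m) - math.log(m)
    print(m, gap)
```

Because the log m term depends only on the quantization level, it cancels in any balanced linear combination of entropies, which gives some intuition for why balanced inequalities are the ones that transfer between the discrete and continuous settings.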