Source author record

Sekhar Tatikonda

Sekhar Tatikonda appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, Machine Learning, Artificial Intelligence, Discrete Mathematics, math.ST, Neurons and Cognition, Statistics Theory

Catalog footprint

What is connected

15 works

8 topics

4 close collaborators

preprint, 2022, arXiv

Surrogate Gap Minimization Improves Sharpness-Aware Training

The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a perturbed loss, defined as the maximum loss within a neighborhood in the parameter space. However, we show that both sharp and flat minima can have a low perturbed loss, implying that SAM does not always prefer flat minima. Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to derive the perturbed loss) is small. The surrogate gap is easy to compute and feasible for direct minimization during training. Based on the above observations, we propose Surrogate Gap Guided Sharpness-Aware Minimization (GSAM), a novel improvement over SAM with negligible computation overhead. Conceptually, GSAM consists of two steps: 1) a gradient descent like SAM to minimize the perturbed loss, and 2) an ascent step in the orthogonal direction (after gradient decomposition) to minimize the surrogate gap and yet not affect the perturbed loss. GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities. Theoretically, we show the convergence of GSAM and provably better generalization than SAM. Empirically, GSAM consistently improves generalization (e.g., +3.2% over SAM and +5.4% over AdamW on ImageNet top-1 accuracy for ViT-B/32). Code is released at https://sites.google.com/view/gsam-iclr22/home.
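The two conceptual steps in this abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' released code: the quadratic loss, the helper names (`gsam_direction`, `gsam_step`), and the values of `rho`, `alpha`, and `lr` are all assumptions chosen for illustration. It perturbs the weights along the normalized gradient, takes the gradient of the perturbed loss, splits the original gradient into parts parallel and orthogonal to it, and subtracts a small multiple of the orthogonal part from the update direction.

```python
import math

# Toy quadratic loss (an assumption for illustration): one steep and one flat axis.
def loss(w, a=10.0, b=1.0):
    return 0.5 * (a * w[0] ** 2 + b * w[1] ** 2)

def grad(w, a=10.0, b=1.0):
    return [a * w[0], b * w[1]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gsam_direction(w, rho=0.05, alpha=0.1, eps=1e-12):
    """Return (update direction, perturbed-loss gradient, orthogonal component)."""
    g = grad(w)
    g_norm = math.sqrt(dot(g, g)) + eps
    # Step 1 (SAM-like): move to the approximate worst point in a rho-ball
    # and take the gradient there (gradient of the perturbed loss).
    w_adv = [wi + rho * gi / g_norm for wi, gi in zip(w, g)]
    g_p = grad(w_adv)
    # Gradient decomposition: split g into parts parallel and orthogonal to g_p.
    coef = dot(g, g_p) / (dot(g_p, g_p) + eps)
    g_perp = [gi - coef * gpi for gi, gpi in zip(g, g_p)]
    # Step 2: descending along (g_p - alpha * g_perp) is an ascent on the
    # original loss along g_perp, which to first order leaves the perturbed
    # loss unchanged.
    direction = [gpi - alpha * gi for gpi, gi in zip(g_p, g_perp)]
    return direction, g_p, g_perp

def gsam_step(w, lr=0.05, **kwargs):
    direction, _, _ = gsam_direction(w, **kwargs)
    return [wi - lr * di for wi, di in zip(w, direction)]
```

Setting `alpha=0.0` in this sketch recovers a plain SAM-style update, which makes it easy to compare the two directions on the same starting point.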

preprint, 2021, arXiv

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, the numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in the reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost with respect to the number of solver steps in integration, similar to the adjoint method, and guarantees accuracy in the reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI in various tasks: on image recognition tasks, to our knowledge, MALI is the first to enable feasible training of a Neural ODE on ImageNet and outperform a well-tuned ResNet, while existing methods fail due to either heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance. We provide a PyPI package at https://jzkay12.github.io/TorchDiffEqPack/
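The reverse-accuracy claim rests on the solver step being algebraically invertible. The sketch below shows one common form of an asynchronous leapfrog step on an augmented state `(z, v)` together with its exact inverse. The update formulas, the function names, and the scalar test equation `dz/dt = -z` are assumptions made for illustration and are not taken from the TorchDiffEqPack package.

```python
import math

def alf_step(f, t, z, v, h):
    """One asynchronous-leapfrog-style step on the augmented state (z, v)."""
    z_half = z + 0.5 * h * v          # half step using the tracked velocity
    u = f(t + 0.5 * h, z_half)        # single derivative evaluation at the midpoint
    v_out = 2.0 * u - v               # reflect the velocity about u
    z_out = z_half + 0.5 * h * v_out  # second half step
    return t + h, z_out, v_out

def alf_inverse_step(f, t_out, z_out, v_out, h):
    """Undo alf_step: recover (t, z, v) from (t_out, z_out, v_out)."""
    z_half = z_out - 0.5 * h * v_out
    u = f(t_out - 0.5 * h, z_half)    # same midpoint, so the same u as in the forward step
    v = 2.0 * u - v_out
    z = z_half - 0.5 * h * v
    return t_out - h, z, v

def decay(t, z):
    # Hypothetical test dynamics dz/dt = -z with exact solution exp(-t).
    return -z

def integrate_and_reverse(steps=100, h=0.01):
    t, z = 0.0, 1.0
    v = decay(t, z)                   # initialise v consistently with the dynamics
    for _ in range(steps):
        t, z, v = alf_step(decay, t, z, v, h)
    z_end = z
    for _ in range(steps):            # walk back without any stored trajectory
        t, z, v = alf_inverse_step(decay, t, z, v, h)
    return z_end, z
```

Because the inverse step reconstructs the midpoint exactly, only the final `(z, v)` pair needs to be kept to retrace the trajectory, which is the property the constant-memory claim relies on.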

preprint, 2021, arXiv

Multiple-shooting adjoint method for whole-brain dynamic causal modeling

Dynamic causal modeling (DCM) is a Bayesian framework to infer directed connections between compartments, and has been used to describe the interactions between underlying neural populations based on functional neuroimaging data. DCM is typically analyzed with the expectation-maximization (EM) algorithm. However, because the inversion of a large-scale continuous system is difficult when noisy observations are present, DCM by EM is typically limited to a small number of compartments (fewer than 10). Another drawback of the current method is its complexity: when the forward model changes, the posterior mean changes, and we need to re-derive the algorithm for optimization. In this project, we propose the Multiple-Shooting Adjoint (MSA) method to address these limitations. MSA uses the multiple-shooting method for parameter estimation in ordinary differential equations (ODEs) under noisy observations, and is suitable for large-scale systems such as whole-brain analysis in functional MRI (fMRI). Furthermore, MSA uses the adjoint method for accurate gradient estimation in the ODE; since the adjoint method is generic, MSA is a generic method for both linear and non-linear systems, and does not require re-derivation of the algorithm as in EM. We validate MSA in extensive experiments: 1) in toy examples with both linear and non-linear models, we show that MSA achieves better accuracy in parameter value estimation than EM; furthermore, MSA can be successfully applied to large systems with up to 100 compartments; and 2) using real fMRI data, we apply MSA to the estimation of the whole-brain effective connectome and show improved classification of autism spectrum disorder (ASD) vs. control compared to using the functional connectome. The package is provided at https://jzkay12.github.io/TorchDiffEqPack

preprint, 2015, arXiv

Lossy Compression via Sparse Linear Regression: Performance under Minimum-distance Encoding

We study a new class of codes for lossy compression with the squared-error distortion criterion, designed using the statistical framework of high-dimensional linear regression. Codewords are linear combinations of subsets of columns of a design matrix. Called a Sparse Superposition or Sparse Regression codebook, this structure is motivated by an analogous construction proposed recently by Barron and Joseph for communication over an AWGN channel. For i.i.d. Gaussian sources and minimum-distance encoding, we show that such a code can attain the Shannon rate-distortion function with the optimal error exponent, for all distortions below a specified value. It is also shown that sparse regression codes are robust in the following sense: a codebook designed to compress an i.i.d. Gaussian source of variance $σ^2$ with (squared-error) distortion $D$ can compress any ergodic source of variance less than $σ^2$ to within distortion $D$. Thus the sparse regression ensemble retains many of the good covering properties of the i.i.d. random Gaussian ensemble, while having a compact representation in terms of a matrix whose size is a low-order polynomial in the block length.

preprint, 2014, arXiv

Lossy Compression via Sparse Linear Regression: Computationally Efficient Encoding and Decoding

We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. The codebook is defined by a design matrix and codewords are structured linear combinations of columns of this matrix. The proposed encoding algorithm sequentially chooses columns of the design matrix to successively approximate the source sequence. It is shown to achieve the optimal distortion-rate function for i.i.d. Gaussian sources under the squared-error distortion criterion. For a given rate, the parameters of the design matrix can be varied to trade off distortion performance with encoding complexity. An example of such a trade-off as a function of the block length $n$ is the following. With computational resource (space or time) per source sample of $O((n/\log n)^2)$, for a fixed distortion level above the Gaussian distortion-rate function, the probability of excess distortion decays exponentially in $n$. The Sparse Regression Code is robust in the following sense: for any ergodic source, the proposed encoder achieves the optimal distortion-rate function of an i.i.d. Gaussian source with the same variance. Simulations show that the encoder has good empirical performance, especially at low and moderate rates.

preprint, 2013, arXiv

Achievable Rates for Channels with Deletions and Insertions

This paper considers a binary channel with deletions and insertions, where each input bit is transformed in one of the following ways: it is deleted with probability d, or an extra bit is added after it with probability i, or it is transmitted unmodified with probability 1-d-i. A computable lower bound on the capacity of this channel is derived. The transformation of the input sequence by the channel may be viewed in terms of runs as follows: some runs of the input sequence get shorter/longer, some runs get deleted, and some new runs are added. It is difficult for the decoder to synchronize the channel output sequence to the transmitted codeword mainly due to deleted runs and new inserted runs. The main idea is a mutual information decomposition in terms of the rate achieved by a sub-optimal decoder that determines the positions of the deleted and inserted runs in addition to decoding the transmitted codeword. The mutual information between the channel input and output sequences is expressed as the sum of the rate achieved by this decoder and the rate loss due to its sub-optimality. Obtaining computable lower bounds on each of these quantities yields a lower bound on the capacity. The bounds proposed in this paper provide the first characterization of achievable rates for channels with general insertions, and for channels with both deletions and insertions. For the special case of the deletion channel, the proposed bound improves on the previous best lower bound for deletion probabilities up to 0.3.
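The channel model in the first sentence of this abstract is easy to simulate. The sketch below is an illustration under stated assumptions: the function name `indel_channel` is hypothetical, and the inserted bit is drawn uniformly at random, which the abstract does not specify.

```python
import random

def indel_channel(bits, d, i, rng):
    """Pass a bit sequence through a deletion/insertion channel.

    Each input bit is deleted with probability d, followed by one extra bit
    with probability i, or passed through unmodified with probability 1 - d - i.
    Assumption: the inserted bit is uniform on {0, 1}.
    """
    if d < 0 or i < 0 or d + i > 1:
        raise ValueError("need d >= 0, i >= 0 and d + i <= 1")
    out = []
    for b in bits:
        u = rng.random()                  # uniform in [0, 1)
        if u < d:
            continue                      # deletion: the bit is dropped
        out.append(b)                     # the bit itself gets through
        if u < d + i:
            out.append(rng.randrange(2))  # insertion: extra bit after it
    return out
```

For example, `indel_channel([0, 0, 1, 1, 1, 0], 0.1, 0.1, random.Random(0))` returns a sequence whose length generally differs from the input length, which is the synchronization difficulty the abstract describes.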

preprint, 2013, arXiv

Rewritable storage channels with hidden state

Many storage channels admit reading and rewriting of the content at a given cost. We consider rewritable channels with a hidden state which models the unknown characteristics of the memory cell. In addition to mitigating the effect of the write noise, rewrites can help the write controller obtain a better estimate of the hidden state. The paper has two contributions. The first is a lower bound on the capacity of a general rewritable channel with hidden state. The lower bound is obtained using a coding scheme that combines Gelfand-Pinsker coding with superposition coding. The rewritable AWGN channel is discussed as an example. The second contribution is a simple coding scheme for a rewritable channel where the write noise and hidden state are both uniformly distributed. It is shown that this scheme is asymptotically optimal as the number of rewrites gets large.

preprint, 2012, arXiv

Loopy Belief Propogation and Gibbs Measures

We address the question of convergence in the loopy belief propagation (LBP) algorithm. Specifically, we relate convergence of LBP to the existence of a weak limit for a sequence of Gibbs measures defined on the LBP's associated computation tree. Using tools from the theory of Gibbs measures, we develop easily testable sufficient conditions for convergence. The failure of convergence of LBP implies the existence of multiple phases for the associated Gibbs specification. These results give new insight into the mechanics of the algorithm.

preprint, 2012, arXiv

Message-Passing Algorithms for Quadratic Minimization

Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite quadratic function. As was observed in previous work, the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges.
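For readers unfamiliar with GaBP, the following is a minimal sketch of the standard (unparameterized) synchronous message updates for minimizing `0.5 * x^T A x - b^T x` with a symmetric matrix `A`. It is not the parameterized generalization proposed in the paper, and the function name `gabp` and the example matrix are assumptions for illustration. The example matrix is diagonally dominant, a case in which the standard updates are known to converge to the correct minimizer.

```python
def gabp(A, b, iters=100):
    """Standard Gaussian belief propagation for a symmetric matrix A.

    Returns the estimated minimizer of 0.5 * x^T A x - b^T x, i.e. the
    solution of A x = b when the iteration converges.
    """
    n = len(b)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0] for i in range(n)]
    # One precision message P and one potential message h per directed edge.
    P = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    h = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new_P, new_h = {}, {}
        for (i, j) in P:
            # Combine everything node i knows except what it heard from j.
            p_cav = A[i][i] + sum(P[(k, i)] for k in nbrs[i] if k != j)
            h_cav = b[i] + sum(h[(k, i)] for k in nbrs[i] if k != j)
            new_P[(i, j)] = -(A[i][j] ** 2) / p_cav
            new_h[(i, j)] = -A[i][j] * h_cav / p_cav
        P, h = new_P, new_h
    means = []
    for i in range(n):
        p_i = A[i][i] + sum(P[(k, i)] for k in nbrs[i])
        h_i = b[i] + sum(h[(k, i)] for k in nbrs[i])
        means.append(h_i / p_i)
    return means

# Hypothetical example: a diagonally dominant matrix on a 3-cycle.
A_example = [[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]
b_example = [1.0, 2.0, 3.0]
x_example = gabp(A_example, b_example)
```

The quantity `p_cav` is the cavity precision. If it becomes zero or negative the message update breaks down, which corresponds to the loss of positive definiteness in the computation trees that the abstract identifies as the failure mode.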

preprint, 2012, arXiv

Message-Passing Algorithms: Reparameterizations and Splittings

The max-product algorithm, a local message-passing scheme that attempts to compute the most probable assignment (MAP) of a given probability distribution, has been successfully employed as a method of approximate inference for applications arising in coding theory, computer vision, and machine learning. However, the max-product algorithm is not guaranteed to converge, and if it does, it is not guaranteed to recover the MAP assignment. Alternative convergent message-passing schemes have been proposed to overcome these difficulties. This work provides a systematic study of such message-passing algorithms that extends the known results by exhibiting new sufficient conditions for convergence to local and/or global optima, providing a combinatorial characterization of these optima based on graph covers, and describing a new convergent and correct message-passing algorithm whose derivation unifies many of the known convergent message-passing algorithms. While convergent and correct message-passing algorithms represent a step forward in the analysis of max-product style message-passing algorithms, the conditions needed to guarantee convergence to a global optimum can be too restrictive in both theory and practice. This limitation of convergent and correct message-passing schemes is characterized by graph covers and illustrated by example.
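As background for the convergence issues discussed here, the sketch below runs plain max-product on a chain, where the graph has no cycles and the algorithm is exact. It does not implement the reparameterized or splitting variants studied in the paper. The function names and the potentials are assumptions for illustration, and decoding from max-marginals as done here presumes a unique MAP assignment.

```python
from itertools import product

def max_product_chain(unary, pair):
    """MAP assignment on a chain via max-product messages.

    unary[i][x] is the potential of node i taking value x, and pair[x][y]
    is the potential shared by every pair of adjacent nodes.
    """
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]   # messages passed left to right
    for i in range(1, n):
        for x in range(k):
            fwd[i][x] = max(fwd[i - 1][y] * unary[i - 1][y] * pair[y][x]
                            for y in range(k))
    bwd = [[1.0] * k for _ in range(n)]   # messages passed right to left
    for i in range(n - 2, -1, -1):
        for x in range(k):
            bwd[i][x] = max(bwd[i + 1][y] * unary[i + 1][y] * pair[x][y]
                            for y in range(k))
    # Each node picks the value with the largest max-marginal.
    return [max(range(k), key=lambda x: fwd[i][x] * unary[i][x] * bwd[i][x])
            for i in range(n)]

def brute_force_map(unary, pair):
    """Exhaustive search over all assignments, for checking small cases."""
    n, k = len(unary), len(unary[0])
    def score(xs):
        s = 1.0
        for i, x in enumerate(xs):
            s *= unary[i][x]
        for i in range(n - 1):
            s *= pair[xs[i]][xs[i + 1]]
        return s
    return list(max(product(range(k), repeat=n), key=score))

# Hypothetical potentials on a 3-node binary chain.
unary_example = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
pair_example = [[2.0, 1.0], [1.0, 2.0]]
```

On graphs with cycles the same message updates can oscillate or settle on a non-MAP assignment, which is the gap the convergent schemes in this work aim to close.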

preprint, 2012, arXiv

Sparse Regression Codes for Multi-terminal Source and Channel Coding

We study a new class of codes for Gaussian multi-terminal source and channel coding. These codes are designed using the statistical framework of high-dimensional linear regression and are called Sparse Superposition or Sparse Regression codes. Codewords are linear combinations of subsets of columns of a design matrix. These codes were recently introduced by Barron and Joseph and shown to achieve the channel capacity of AWGN channels with computationally feasible decoding. They have also recently been shown to achieve the optimal rate-distortion function for Gaussian sources. In this paper, we demonstrate how to implement random binning and superposition coding using sparse regression codes. In particular, with minimum-distance encoding/decoding it is shown that sparse regression codes attain the optimal information-theoretic limits for a variety of multi-terminal source and channel coding problems.

preprint, 2011, arXiv

Conditioned Poisson distributions and the concentration of chromatic numbers

The paper provides a simpler method for proving a delicate inequality that was used by Achlioptas and Naor to establish asymptotic concentration for chromatic numbers of Erdős-Rényi random graphs. The simplifications come from two new ideas. The first involves a sharpened form of a piece of statistical folklore regarding goodness-of-fit tests for two-way tables of Poisson counts under linear conditioning constraints. The second idea takes the form of a new inequality that controls the extreme tails of the distribution of a quadratic form in independent Poisson random variables.

preprint, 2011, arXiv

Opportunistic capacity and error exponent regions for compound channel with feedback

Variable-length communication over a compound channel with feedback is considered. Traditionally, the capacity of a compound channel without feedback is defined as the maximum rate that is determined before the start of communication such that communication is reliable. This traditional definition is pessimistic. In the presence of feedback, an opportunistic definition is given. Capacity is defined as the maximum rate that is determined at the end of communication such that communication is reliable. Thus, the transmission rate can adapt to the channel chosen by nature. Under this definition, feedback communication over a compound channel is conceptually similar to multi-terminal communication. Transmission rate is a vector rather than a scalar; channel capacity is a region rather than a scalar; error exponent is a region rather than a scalar. In this paper, variable-length communication over a compound channel with feedback is formulated, its opportunistic capacity region is characterized, and lower bounds for its error exponent region are provided.

preprint, 2010, arXiv

Capacity-achieving Feedback Scheme for Gaussian Finite-State Markov Channels with Channel State Information

In this paper, we propose capacity-achieving communication schemes for Gaussian finite-state Markov channels (FSMCs) subject to an average channel input power constraint, under the assumption that the transmitters have access to delayed noiseless output feedback as well as instantaneous or delayed channel state information (CSI). We show that the proposed schemes reveal connections between feedback communication and feedback control.

preprint, 2007, arXiv

On the error exponent of variable-length block-coding schemes over finite-state Markov channels with feedback

The error exponent of Markov channels with feedback is studied in the variable-length block-coding setting. Burnashev's classic result is extended and a single-letter characterization for the reliability function of finite-state Markov channels is presented, under the assumption that the channel state is causally observed at both the transmitter and the receiver. Tools from stochastic control theory are used in order to treat channels with intersymbol interference. In particular, the convex analytical approach to Markov decision processes is adopted to handle problems with stopping time horizons arising from variable-length coding schemes.
