Source author record

Mihai Nica

Mihai Nica appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning math.PR Artificial Intelligence Computer Vision math-ph math.CO math.MP

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

Bounding generalization error with input compression: An empirical study with infinite-width networks

Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles that reduce reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

Preprint, 2022, arXiv

The Exponentially Tilted Gaussian Prior for Variational Autoencoders

An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models for real-world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of training data. To alleviate this issue, we propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE), which pulls points onto the surface of a hyper-sphere in latent space. This achieves state-of-the-art results on the area under the receiver operating characteristic curve (AUROC) metric using just the log-likelihood that the VAE naturally assigns. Because this prior is a simple modification of the traditional VAE prior, it is faster and easier to implement than competitive methods.
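The record does not state the prior's density. As a rough illustration only, the sketch below assumes the "tilt" multiplies a standard Gaussian by exp(tau * ||z||), where tau is a hypothetical tilt parameter. Under that assumption the unnormalized log-density peaks on the sphere of radius tau, which matches the abstract's description of points being pulled onto a hyper-sphere. The function name and the form of the tilt are assumptions, not taken from the paper.

```python
import math

def log_tilted_gaussian_unnormalized(z, tau):
    """Unnormalized log-density tau*||z|| - ||z||^2 / 2.

    ASSUMPTION: the tilt has the form exp(tau * ||z||) applied to a
    standard Gaussian; the catalog record does not give the formula.
    """
    r = math.sqrt(sum(c * c for c in z))
    return tau * r - 0.5 * r * r

# As a function of the radius r, tau*r - r^2/2 is maximized at r = tau,
# so the density is largest on the sphere ||z|| = tau.
tau = 3.0
on_sphere = log_tilted_gaussian_unnormalized([tau, 0.0], tau)
at_origin = log_tilted_gaussian_unnormalized([0.0, 0.0], tau)
far_away = log_tilted_gaussian_unnormalized([2 * tau, 0.0], tau)
```

With tau = 0 the expression reduces to the usual standard Gaussian log-density (up to a constant), which is consistent with the abstract calling the prior a simple modification of the traditional one.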

Preprint, 2021, arXiv

RSK in last passage percolation: a unified approach

We present a version of the RSK correspondence based on the Pitman transform and geometric considerations. This version unifies ordinary RSK, dual RSK and continuous RSK. We show that this version is both a bijection and an isometry, two crucial properties for taking limits of last passage percolation models. We use the bijective property to give a non-computational proof that dual RSK maps Bernoulli walks to nonintersecting Bernoulli walks.

Preprint, 2020, arXiv

A Derivative-Free Method for Solving Elliptic Partial Differential Equations with Deep Neural Networks

We introduce a deep neural network based method for solving a class of elliptic partial differential equations. We approximate the solution of the PDE with a deep neural network which is trained under the guidance of a probabilistic representation of the PDE in the spirit of the Feynman-Kac formula. The solution is given by an expectation of a martingale process driven by a Brownian motion. As Brownian walkers explore the domain, the deep neural network is iteratively trained using a form of reinforcement learning. Our method is a 'Derivative-Free Loss Method' since it does not require the explicit calculation of the derivatives of the neural network with respect to the input neurons in order to compute the training loss. The advantages of our method are showcased in a series of test problems: a corner singularity problem, an interface problem, and an application to a chemotaxis population model.
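The probabilistic representation behind this kind of method can be illustrated without any neural network. The sketch below is not the authors' algorithm: it only estimates the solution of Laplace's equation on the unit disk at one point by averaging the boundary data seen by discretized Brownian walkers when they exit. The domain, the step size dt, the walker count, and all function names are assumptions chosen for illustration.

```python
import math
import random

def exit_value(x, y, g, dt, rng):
    """Run one discretized Brownian walker from (x, y) until it leaves
    the unit disk, then evaluate the boundary data g at the exit point
    (projected radially back onto the circle)."""
    step = math.sqrt(dt)
    while x * x + y * y < 1.0:
        x += step * rng.gauss(0.0, 1.0)
        y += step * rng.gauss(0.0, 1.0)
    r = math.hypot(x, y)
    return g(x / r, y / r)

def estimate_u(x0, y0, g, n_walkers=2000, dt=0.01, seed=0):
    """Monte Carlo estimate of u(x0, y0), where u is harmonic in the
    unit disk with u = g on the boundary. No derivatives of u are used."""
    rng = random.Random(seed)
    total = sum(exit_value(x0, y0, g, dt, rng) for _ in range(n_walkers))
    return total / n_walkers

# Boundary data g(x, y) = x extends to the harmonic function u(x, y) = x,
# so the estimate at (0.3, 0.2) should land near 0.3 (up to Monte Carlo
# and time-discretization error).
approx = estimate_u(0.3, 0.2, lambda x, y: x)
```

In the method described in the abstract, a neural network would be trained against targets of this kind instead of evaluating the expectation point by point; that training loop is omitted here.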

Preprint, 2018, arXiv

Products of Many Large Random Matrices and Gradients in Deep Neural Networks

We study products of random matrices in the regime where the number of terms and the size of the matrices simultaneously tend to infinity. Our main theorem is that the logarithm of the $\ell_2$ norm of such a product applied to any fixed vector is asymptotically Gaussian. The fluctuations we find can be thought of as a finite temperature correction to the limit in which first the size and then the number of matrices tend to infinity. Depending on the scaling limit considered, the mean and variance of the limiting Gaussian depend only on either the first two or the first four moments of the measure from which matrix entries are drawn. We also obtain explicit error bounds on the moments of the norm and the Kolmogorov-Smirnov distance to a Gaussian. Finally, we apply our result to obtain precise information about the stability of gradients in randomly initialized deep neural networks with ReLU activations. This provides a quantitative measure of the extent to which the exploding and vanishing gradient problem occurs in a fully connected neural network with ReLU activations and a given architecture.
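A small numerical experiment can make the central quantity concrete. The sketch below assumes i.i.d. Gaussian entries with variance 1/n and no ReLU, which is only one special case of the setting in the abstract. It samples the logarithm of the norm of a product of random matrices applied to a fixed unit vector. The matrix size, depth, sample count, and function names are hypothetical choices.

```python
import math
import random
import statistics

def log_norm_of_product(n, depth, rng):
    """Apply `depth` independent n-by-n Gaussian matrices (entries with
    variance 1/n) to a fixed unit vector and return log of the l2 norm."""
    x = [1.0 / math.sqrt(n)] * n          # fixed unit vector
    scale = 1.0 / math.sqrt(n)            # standard deviation of entries
    for _ in range(depth):
        # Each output coordinate is a fresh random row dotted with x.
        x = [sum(rng.gauss(0.0, scale) * xj for xj in x) for _ in range(n)]
    return math.log(math.sqrt(sum(c * c for c in x)))

def sample_log_norms(n=12, depth=12, samples=200, seed=0):
    """Draw independent samples of the log-norm statistic."""
    rng = random.Random(seed)
    return [log_norm_of_product(n, depth, rng) for _ in range(samples)]

vals = sample_log_norms()
mean_log_norm = statistics.mean(vals)
var_log_norm = statistics.variance(vals)
```

Plotting a histogram of `vals` for growing n and depth is one way to eyeball the approximately Gaussian shape the theorem describes; the sketch itself does not verify the theorem.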

Preprint, 2013, arXiv

Stabilization Time for a Type of Evolution on Binary Strings

We consider a type of evolution on {0,1}^n which occurs in discrete steps whereby at each step, we replace every occurrence of the substring "01" by "10". After at most n-1 steps we will reach a string of the form 11...1100...00, which we will call a "stabilized" string, and we call the number of steps required the "stabilization time". If we choose each bit of the string independently to be a 1 with probability p and a 0 with probability 1-p, then the stabilization time of a string in {0,1}^n is a random variable with values in {0,1,...,n-1}. We study the asymptotic behavior of this random variable as n goes to infinity and we determine its limit distribution after suitable centering and scaling. When p is not 1/2, the limit distribution is Gaussian. When p = 1/2, the limit distribution is a χ_3 distribution. We also explicitly compute the limit distribution in a threshold setting where p = p_n varies with n, given by $p_n = 1/2 + \lambda/(2\sqrt{n})$ for λ > 0 a fixed parameter. This analysis gives rise to a one-parameter family of distributions that fit between a χ_3 and a Gaussian distribution. The tools used in our arguments are a natural interpretation of strings in {0,1}^n as Young diagrams, and a connection with the known distribution for the maximal height of a Brownian path on [0,1].
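The evolution rule is simple enough to simulate directly. The following is a minimal sketch with hypothetical function names: it applies the simultaneous "01" to "10" replacement and counts steps until no "01" remains. Because two occurrences of "01" can never overlap, Python's `str.replace` performs exactly the simultaneous update.

```python
import random

def step(s: str) -> str:
    """One step of the evolution: replace every occurrence of "01"
    by "10" simultaneously."""
    return s.replace("01", "10")

def stabilization_time(s: str) -> int:
    """Number of steps until the string has the form 1...10...0,
    i.e. until it contains no "01" substring."""
    steps = 0
    while "01" in s:
        s = step(s)
        steps += 1
    return steps

def random_string(n: int, p: float, rng: random.Random) -> str:
    """Each bit is independently 1 with probability p, else 0."""
    return "".join("1" if rng.random() < p else "0" for _ in range(n))

# "0011" -> "0101" -> "1010" -> "1100": three steps, the maximum n-1 for n = 4.
t = stabilization_time("0011")

# Sample stabilization times for random strings (n and p are illustrative).
rng = random.Random(0)
times = [stabilization_time(random_string(50, 0.5, rng)) for _ in range(100)]
```

Collecting `times` for large n and different values of p gives an empirical view of the centered and scaled distributions discussed in the abstract.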