Researcher profile

Konstantinos Spiliopoulos

Konstantinos Spiliopoulos contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, unclaimed author
14 works
0 followers
10 topics
4 close collaborators

Published work

14 published items

Preprint, 2026, arXiv

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.

Preprint, 2026, arXiv

Kernel Limit for a Class of Recurrent Neural Networks Trained on Ergodic Data Sequences

Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(1/N)$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
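The Euler-approximation remark in the abstract above can be illustrated with a toy recursion. The sketch below is not the paper's model: it assumes a scalar parameter and a hypothetical drift `f(theta) = -theta`, chosen only because the limiting ODE has a closed-form solution. Applying N updates of size 1/N is then the forward Euler scheme for d theta/dt = -theta on [0, 1].

```python
import math

def drift(theta):
    """Hypothetical drift f(theta) = -theta (an assumption for illustration)."""
    return -theta

def run_updates(theta0, n_updates):
    """Apply n_updates discrete updates, each of magnitude O(1/n_updates)."""
    theta = theta0
    for _ in range(n_updates):
        theta += drift(theta) / n_updates
    return theta

def ode_solution(theta0, t=1.0):
    """Exact solution of d theta/dt = -theta at time t."""
    return theta0 * math.exp(-t)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        gap = abs(run_updates(1.0, n) - ode_solution(1.0))
        print(f"N={n:5d}  |discrete - ODE| = {gap:.2e}")
```

The gap shrinks as N grows because each step is small. An update of size O(1) per step, as described for the RNN hidden layer, has no such vanishing step size, which is the obstacle the abstract addresses with a fixed point analysis.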

Preprint, 2022, arXiv

Disentangling positive and negative partisanship in social media interactions using a coevolving latent space network with attractors model

We develop a broadly applicable class of coevolving latent space network with attractors (CLSNA) models, where nodes represent individual social actors assumed to lie in an unknown latent space, edges represent the presence of a specified interaction between actors, and attractors are added in the latent level to capture the notion of attractive and repulsive forces. We apply the CLSNA models to understand the dynamics of partisan polarization on social media, where we expect Republicans and Democrats to increasingly interact with their own party and disengage with the opposing party. Using longitudinal social networks from the social media platforms Twitter and Reddit, we investigate the relative contributions of positive (attractive) and negative (repulsive) forces among political elites and the public, respectively. Our goals are to disentangle the positive and negative forces within and between parties and explore if and how they change over time. Our analysis confirms the existence of partisan polarization in social media interactions among both political elites and the public. Moreover, while positive partisanship is the driving force of interactions across the full periods of study for both the public and Democratic elites, negative partisanship has come to dominate Republican elites' interactions since the run-up to the 2016 presidential election.

Preprint, 2022, arXiv

Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics

We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.
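The statement that some perturbations preserve the invariant measure can be checked by hand in the simplest setting. The sketch below is not the geometry-informed perturbation of the paper. It assumes a two-dimensional Gaussian target with a hypothetical precision matrix `A` and the standard constant perturbation dX = -(I + delta*J) A X dt + sqrt(2) dW, where J is antisymmetric. For a linear SDE with drift matrix B, the stationary covariance S solves B S + S B^T + 2I = 0, so the code verifies that S = A^{-1} still solves this equation for any `delta`.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inverse(a):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def lyapunov_residual(precision, delta):
    """Return B S + S B^T + 2I with S = precision^{-1}, B = -(I + delta*J) precision.

    `precision` must be symmetric positive definite; J = [[0, 1], [-1, 0]].
    """
    perturbed = [[1.0, delta], [-delta, 1.0]]  # I + delta * J
    b = [[-v for v in row] for row in matmul(perturbed, precision)]
    s = inverse(precision)
    bs = matmul(b, s)
    sbt = matmul(s, transpose(b))
    two_i = [[2.0, 0.0], [0.0, 2.0]]
    return [[bs[i][j] + sbt[i][j] + two_i[i][j] for j in range(2)]
            for i in range(2)]

if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical precision matrix
    for delta in (0.0, 1.0, 5.0):
        print(delta, lyapunov_residual(A, delta))
```

The residual vanishes up to rounding for every `delta`, so the Gaussian target remains invariant in continuous time. This check says nothing about discretization error, which the abstract notes can behave differently.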

Preprint, 2022, arXiv

Moderate deviations for systems of slow-fast stochastic reaction-diffusion equations

The goal of this paper is to study the Moderate Deviation Principle (MDP) for a system of stochastic reaction-diffusion equations with a time-scale separation in slow and fast components and small noise in the slow component. Based on weak convergence methods in infinite dimensions and related stochastic control arguments, we obtain an exact form for the moderate deviations rate function in different regimes as the small noise and time-scale separation parameters vanish. Many issues that appear due to the infinite dimensionality of the problem are completely absent in their finite-dimensional counterpart. In comparison to corresponding Large Deviation Principles, the moderate deviation scaling necessitates a more delicate approach to establishing tightness and properly identifying the limiting behavior of the underlying controlled problem. The latter involves regularity properties of a solution of an associated elliptic Kolmogorov equation on Hilbert space along with a finite-dimensional approximation argument.

Preprint, 2022, arXiv

Normalization effects on deep neural networks

We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.

Preprint, 2022, arXiv

Normalization effects on shallow neural networks and related asymptotic expansions

We consider shallow (single hidden layer) neural networks and characterize their performance when trained with stochastic gradient descent as the number of hidden units $N$ and gradient descent steps grow to infinity. In particular, we investigate the effect of different scaling schemes, which lead to different normalizations of the neural network, on the network's statistical output, closing the gap between the $1/\sqrt{N}$ and the mean-field $1/N$ normalization. We develop an asymptotic expansion for the neural network's statistical output pointwise with respect to the scaling parameter as the number of hidden units grows to infinity. Based on this expansion, we demonstrate mathematically that to leading order in $N$, there is no bias-variance trade off, in that both bias and variance (both explicitly characterized) decrease as the number of hidden units increases and time grows. In addition, we show that to leading order in $N$, the variance of the neural network's statistical output decays as the implied normalization by the scaling parameter approaches the mean field normalization. Numerical studies on the MNIST and CIFAR10 datasets show that test and train accuracy monotonically improve as the neural network's normalization gets closer to the mean field normalization.
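The effect of the normalization exponent on output variance can be seen in a toy calculation at initialization. The sketch below covers only the untrained network, not the trained networks or accuracy results of the paper. It assumes a scalar input, a `tanh` activation, and hypothetical initial distributions (standard Gaussian inner weights, uniform outer coefficients on [-1, 1]). The name `gamma` for the exponent in the 1/N^gamma normalization is also an assumption.

```python
import math
import random
import statistics

def network_output(x, weights, coeffs, gamma):
    """Shallow network output: N^{-gamma} * sum_i c_i * tanh(w_i * x)."""
    n = len(weights)
    total = sum(c * math.tanh(w * x) for c, w in zip(coeffs, weights))
    return total / n ** gamma

def output_variance_at_init(n_hidden, gamma, x=1.0, n_trials=200, seed=0):
    """Empirical variance of the output over random initializations."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trials):
        # Hypothetical initialization, chosen only for illustration.
        weights = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        coeffs = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        outputs.append(network_output(x, weights, coeffs, gamma))
    return statistics.pvariance(outputs)

if __name__ == "__main__":
    for gamma in (0.5, 0.75, 1.0):
        var = output_variance_at_init(n_hidden=200, gamma=gamma)
        print(f"gamma={gamma:.2f}  variance at init = {var:.3e}")
```

Because the outer coefficients have mean zero, the variance at initialization scales like N^(1 - 2*gamma): it stays of order one for gamma = 1/2 and decays like 1/N for the mean-field choice gamma = 1. This matches the direction of the variance result stated in the abstract, although only in this simplified untrained setting.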

Preprint, 2022, arXiv

Online Adjoint Methods for Optimization of PDEs

We present and mathematically analyze an online adjoint algorithm for the optimization of partial differential equations (PDEs). Traditional adjoint algorithms would typically solve a new adjoint PDE at each optimization iteration, which can be computationally costly. In contrast, an online adjoint algorithm updates the design variables in continuous-time and thus constantly makes progress towards minimizing the objective function. The online adjoint algorithm we consider is similar in spirit to the pseudo-time-stepping, one-shot method which has been previously proposed. Motivated by the application of such methods to engineering problems, we mathematically study the convergence of the online adjoint algorithm. The online adjoint algorithm relies upon a time-relaxed adjoint PDE which provides an estimate of the direction of steepest descent. The algorithm updates this estimate continuously in time, and it asymptotically converges to the exact direction of steepest descent as $t \rightarrow \infty$. We rigorously prove that the online adjoint algorithm converges to a critical point of the objective function for optimizing the PDE. Under appropriate technical conditions, we also prove a convergence rate for the algorithm. A crucial step in the convergence proof is a multi-scale analysis of the coupled system for the forward PDE, adjoint PDE, and the gradient descent ODE for the design variables.

Preprint, 2022, arXiv

Rate of homogenization for fully-coupled McKean-Vlasov SDEs

We consider a fully-coupled slow-fast system of McKean-Vlasov SDEs with full dependence on the slow and fast components and on the law of the slow component, and derive convergence rates to its homogenized limit. We do not make periodicity assumptions, but we impose conditions on the fast motion to guarantee ergodicity. In the course of the proof we obtain related ergodic theorems, as well as results on the regularity of Poisson-type equations and of the associated Cauchy problem on the Wasserstein space that are of independent interest.

Preprint, 2022, arXiv

Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum

We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of large numbers of hidden units and large numbers of stochastic gradient descent training steps. The evolution of the neural network during training can be viewed as a stochastic system and, using techniques from stochastic analysis, we prove the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different than in the typical mean-field results for neural networks due to the $\frac{1}{\sqrt{N}}$ normalization factor in the Xavier initialization (versus the $\frac{1}{N}$ factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (and therefore the neural network may converge to a local minimum), the limit equation minimizes a (quadratic) convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite and thus the neural network (in the limit) will converge to a global minimum with zero loss on the training set.

Preprint, 2020, arXiv

Importance sampling for slow-fast diffusions based on moderate deviations

We consider systems of slow-fast diffusions with small noise in the slow component. We construct provably logarithmic asymptotically optimal importance schemes for the estimation of rare events based on the moderate deviations principle. Using the subsolution approach we construct schemes and identify conditions under which the schemes will be asymptotically optimal. Moderate deviations-based importance sampling offers a viable alternative to large deviations importance sampling when the events are not too rare. In particular, in many cases of interest one can indeed construct the required change of measure in closed form, a task which is more complicated using the large deviations-based importance sampling, especially when it comes to multiscale dynamically evolving processes. The presence of multiple scales and the fact that we do not make any periodicity assumptions for the coefficients driving the processes complicate the design and the analysis of efficient importance sampling schemes. Simulation studies illustrate the theory.

Preprint, 2020, arXiv

Network effects in default clustering for large systems

We consider a large collection of dynamically interacting components defined on a weighted directed graph determining the impact of default of one component on another. We prove a law of large numbers for the empirical measure capturing the evolution of the different components in the pool and from this we extract important information for quantities such as the loss rate in the overall pool as well as the mean impact on a given component from system-wide defaults. A singular value decomposition of the adjacency matrix of the graph allows one to coarse-grain the system by focusing on the highest eigenvalues, which also correspond to the components with the highest contagion impact on the pool. Numerical simulations demonstrate the theoretical findings.

Preprint, 2020, arXiv

Selection of quasi-stationary states in the stochastically forced Navier-Stokes equation on the torus

The stochastically forced vorticity equation associated with the two dimensional incompressible Navier-Stokes equation on $D_\delta:=[0,2\pi\delta]\times [0,2\pi]$ is considered for $\delta\approx 1$, periodic boundary conditions, and viscosity $0<\nu\ll 1$. An explicit family of quasi-stationary states of the deterministic vorticity equation is known to play an important role in the long-time evolution of solutions both in the presence of and without noise. Recent results show the parameter $\delta$ plays a central role in selecting which of the quasi-stationary states is most important. In this paper, we aim to develop a finite dimensional model that captures this selection mechanism for the stochastic vorticity equation. This is done by projecting the vorticity equation in Fourier space onto a center manifold corresponding to the lowest eight Fourier modes. Through Monte Carlo simulation, the vorticity equation and the model are shown to be in agreement regarding key aspects of the long-time dynamics. Following this comparison, perturbation analysis is performed on the model via averaging and homogenization techniques to determine the leading order dynamics for statistics of interest for $\delta\approx 1$.

Preprint, 2020, arXiv

Typical dynamics and fluctuation analysis of slow-fast systems driven by fractional Brownian motion

This article studies typical dynamics and fluctuations for a slow-fast dynamical system perturbed by a small fractional Brownian noise. Based on an ergodic theorem with explicit rates of convergence, which may be of independent interest, we characterize the asymptotic dynamics of the slow component to two orders (i.e., the typical dynamics and the fluctuations). The limiting distribution of the fluctuations turns out to depend upon the manner in which the small-noise parameter is taken to zero relative to the scale-separation parameter. We study also an extension of the original model in which the relationship between the two small parameters leads to a qualitative difference in limiting behavior. The results of this paper provide an approximation, to two orders, to dynamical systems perturbed by small fractional Brownian noise and subject to multiscale effects.