Researcher profile

Stefan Steinerberger

Stefan Steinerberger contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
39works
0followers
24topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

39 published item(s)

preprint2024arXiv

Nonlinear recursions on the reals and a problem of Graham

We study sequences $(x_n)_{n=1}^{\infty}$ of reals given by $x_{n+1} = f(x)$ where $$f(x) = x - \sum_{i=1}^{m} \frac{α_i}{x - β_i},$$ where $α_1, \dots, α_m \in \mathbb{R}_{>0}$ and $β_1, \dots, β_m \in \mathbb{R}$ are arbitrary. A special case is $x_{n+1} = x_n - 1/x_n$ due to Ronald Graham for which Chamberland \& Martelli showed that the dynamics is chaotic (topologically conjugate to the doubling map). We prove that the general nonlinear recursion, despite being potentially chaotic, is effective at ensuring that most iterates end up close to one of the poles $β_i$ relatively quickly. More precisely, for a positive proportion of initial values $x \in \mathbb{R}$, the sequence gets very close (distance $\lesssim |x|^{-1}$) to one of the poles $β_i$ within a relatively small ($\lesssim x^2$) number of iteration steps.

preprint2023arXiv

Local sign changes of polynomials

The trigonometric monomial $\cos(\left\langle k, x \right\rangle)$ on $\mathbb{T}^d$, a harmonic polynomial $p: \mathbb{S}^{d-1} \rightarrow \mathbb{R}$ of degree $k$ and a Laplacian eigenfunction $-Δf = k^2 f$ have root in each ball of radius $\sim \|k\|^{-1}$ or $\sim k^{-1}$, respectively. We extend this to linear combinations and show that for any trigonometric polynomials on $\mathbb{T}^d$, any polynomial $p \in \mathbb{R}[x_1, \dots, x_d]$ restricted to $\mathbb{S}^{d-1}$ and any linear combination of global Laplacian eigenfunctions on $ \mathbb{R}^d$ with $d \in \left\{2,3\right\}$ the same property holds for any ball whose radius is given by the sum of the inverse constituent frequencies. We also refine the fact that an eigenfunction $- Δϕ= λϕ$ in $Ω\subset \mathbb{R}^n$ has a root in each $B(x, α_n λ^{-1/2})$ ball: the positive and negative mass in each $B(x,β_n λ^{-1/2})$ ball cancel when integrated against $\|x-y\|^{2-n}$.

preprint2023arXiv

Some Remarks on the Erdős Distinct Subset Sums Problem

Let $\left\{a_1, \dots, a_n\right\} \subset \mathbb{N}$ be a set of positive integers, $a_n$ denoting the largest element, so that for any two of the $2^n$ subsets the sum of all elements is distinct. Erdős asked whether this implies $a_n \geq c \cdot 2^n$ for some universal $c>0$. We prove, slightly extending a result of Elkies, that for any $a_1, \dots, a_n \in \mathbb{R}_{>0}$ $$ \int_{\mathbb{R}} \left( \frac{\sin{ x}}{ x} \right)^2 \prod_{i=1}^{n} \cos{( a_i x)^2} dx \geq \fracπ{2^{n}}$$ with equality if and only if all subset sums are $1-$separated. This leads to a new proof of the currently best lower bound $a_n \geq \sqrt{2/πn} \cdot 2^n$. The main new insight is that having distinct subset sums and $a_n$ small requires the random variable $X = \pm a_1 \pm a_2 \pm \dots \pm a_n$ to be close to Gaussian in a precise sense.

preprint2022arXiv

An Agmon estimate for Schrödinger operators on Graphs

The Agmon estimate shows that eigenfunctions of Schrödinger operators, $ -Δϕ+ V ϕ= E ϕ$, decay exponentially in the `classically forbidden' region where the potential exceeds the energy level $\left\{x: V(x) > E \right\}$. Moreover, the size of $|ϕ(x)|$ is bounded in terms of a weighted (Agmon) distance between $x$ and the allowed region. We derive such a statement on graphs when $-Δ$ is replaced by the Graph Laplacian $L = D-A$: we identify an explicit Agmon metric and prove a pointwise decay estimate in terms of the Agmon distance.

preprint2022arXiv

Approximate Solutions of Linear Systems at a Universal Rate

Let $A \in \mathbb{R}^{n \times n}$ be invertible, $x \in \mathbb{R}^n$ unknown and $b =Ax $ given. We are interested in approximate solutions: vectors $y \in \mathbb{R}^n$ such that $\|Ay - b\|$ is small. We prove that for all $0< \varepsilon <1 $ there is a composition of $k$ orthogonal projections onto the $n$ hyperplanes generated by the rows of $A$, where $$k \leq 2 \log\left(\frac{1}{\varepsilon} \right) \frac{ n}{ \varepsilon^{2}}$$ which maps the origin to a vector $y\in \mathbb{R}^n$ satisfying $\| A y - Ax\| \leq \varepsilon \cdot \|A\| \cdot \| x\|$. We note that this upper bound on $k$ is independent of the matrix $A$. This procedure is stable in the sense that $\|y\| \leq 2\|x\|$. The existence proof is based on a probabilistically refined analysis of the Random Kaczmarz method which seems to achieve this rate when solving for $A x = b$ with high likelihood.

preprint2022arXiv

Curvature on Graphs via Equilibrium Measures

We introduce a notion of curvature on finite, combinatorial graphs. It can be easily computed by solving a linear system of equations. We show that graphs with curvature bounded below by $K>0$ have diameter bounded by $\mbox{diam}(G) \leq 2/K$ (a Bonnet-Myers theorem), that $\mbox{diam}(G) = 2/K$ implies that $G$ has constant curvature (a Cheng theorem) and that there is a spectral gap $λ_1 \geq K/(2n)$ (a Lichnerowicz theorem). It is computed for several families of graphs and often coincides with Ollivier curvature or Lin-Lu-Yau curvature. The von Neumann minimax theorem features prominently in the proofs.

preprint2022arXiv

Eigenvector Phase Retrieval: Recovering eigenvectors from the absolute value of their entries

We consider the eigenvalue problem $Ax = λx$ where $A \in \mathbb{R}^{n \times n}$ and the eigenvalue is also real $λ\in \mathbb{R}$. If we are given $A$, $λ$ and, additionally, the absolute value of the entries of $x$ (the vector $(|x_i|)_{i=1}^n$), is there a fast way to recover $x$? In particular, can this be done quicker than computing $x$ from scratch? This may be understood as a special case of the phase retrieval problem. We present a randomized algorithm which provably converges in expectation whenever $λ$ is a simple eigenvalue. The problem should become easier when $|λ|$ is large and we discuss another algorithm for that case as well.

preprint2022arXiv

Intrinsic Sparsity of Kantorovich Solutions

Let $X,Y$ be two finite sets of points having $\#X = m$ and $\#Y = n$ points with $μ= (1/m) \sum_{i=1}^{m} δ_{x_i}$ and $ν= (1/n) \sum_{j=1}^{n} δ_{y_j}$ being the associated uniform probability measures. A result of Birkhoff implies that if $m = n$, then the Kantorovich problem has a solution which also solves the Monge problem: optimal transport can be realized with a bijection $π: X \rightarrow Y$. This is impossible when $m \neq n$. We observe that when $m \neq n$, there exists a solution of the Kantorovich problem such that the mass of each point in $X$ is moved to at most $n/\gcd(m,n)$ different points in $Y$ and that, conversely, each point in $Y$ receives mass from at most $m/\gcd(m,n)$ points in $X$.

preprint2022arXiv

May the force be with you

Modern methods in dimensionality reduction are dominated by nonlinear attraction-repulsion force-based methods (this includes t-SNE, UMAP, ForceAtlas2, LargeVis, and many more). The purpose of this paper is to demonstrate that all such methods, by design, come with an additional feature that is being automatically computed along the way, namely the vector field associated with these forces. We show how this vector field gives additional high-quality information and propose a general refinement strategy based on ideas from Morse theory. The efficiency of these ideas is illustrated specifically using t-SNE on synthetic and real-life data sets.

preprint2022arXiv

On Combinatorial Properties of Greedy Wasserstein Minimization

We discuss a phenomenon where Optimal Transport leads to a remarkable amount of combinatorial regularity. Consider infinite sequences $(x_k)_{k=1}^{\infty}$ in $[0,1]$ constructed in a greedy manner: given $x_1, \dots, x_n$, the new point $x_{n+1}$ is chosen so as to minimize the Wasserstein distance $W_2$ between the empirical measure of the $n+1$ points and the Lebesgue measure, $$x_{n+1} = \arg\min_x ~W_2\left( \frac{1}{n+1} \sum_{k=1}^{n} δ_{x_k} + \frac{δ_{x}}{n+1}, dx\right).$$ This leads to fascinating sequences (for example: $x_{n+1} = (2k+1)/(2n+2)$ for some $k \in \mathbb{Z}$) which coincide with sequences recently introduced by Ralph Kritzinger in a different setting. Numerically, the regularity of these sequences rival the best known constructions from Combinatorics or Number Theory. We prove a regularity result below the square root barrier.

preprint2022arXiv

Sums of Distances on Graphs and Embeddings into Euclidean Space

Let $G=(V,E)$ be a finite, connected graph. We consider a greedy selection of vertices: given a list of vertices $x_1, \dots, x_k$, take $x_{k+1}$ to be any vertex maximizing the sum of distances to the existing vertices and iterate: we keep adding the `most remote&#39; vertex. The frequency with which the vertices of the graph appear in this sequence converges to a set of probability measures with nice properties. The support of these measures is, generically, given by a rather small number of vertices $m \ll |V|$. We prove that this suggests that the graph $G$ is at most &#39;$m$-dimensional&#39; by exhibiting an explicit $1-$Lipschitz embedding $ϕ: G \rightarrow \ell^1(\mathbb{R}^m)$ with good properties.

preprint2022arXiv

The Boundary of a Graph and its Isoperimetric Inequality

We define, for any graph $G=(V,E)$, a boundary $\partial G \subseteq V$. The definition coincides with what one would expected for the discretization of (sufficiently nice) Euclidean domains and contains all vertices from the Chartrand-Erwin-Johns-Zhang boundary. Moreover, it satisfies an isoperimetric principle stating that graphs with many vertices have a large boundary unless they contain long paths: we show that for graphs with maximal degree $Δ$ $$ | \partial G| \geq \frac{1}{2Δ} \frac{|V|}{\mbox{diam}(G)}.$$ For graphs discretizing Euclidean domains, one has $\mbox{diam}(G) \sim |V|^{1/d}$ and recovers the scaling of the classical Euclidean isoperimetric principle.

preprint2022arXiv

The product of two high-frequency Graph Laplacian eigenfunctions is smooth

In the continuous setting, we expect the product of two oscillating functions to oscillate even more (generically). On a graph $G=(V,E)$, there are only $|V|$ eigenvectors of the Laplacian $L=D-A$, so one oscillates `the most&#39;. The purpose of this short note is to point out an interesting phenomenon: if $ϕ_1, ϕ_2$ are delocalized eigenvectors of $L$ corresponding to large eigenvalues, then their (pointwise) product $ϕ_1 \cdot ϕ_2$ is smooth (in the sense of small Dirichlet energy): highly oscillatory functions have largely matching oscillation patterns.

preprint2021arXiv

A common variable minimax theorem for graphs

Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.

preprint2021arXiv

A Pointwise Inequality for Derivatives of Solutions of the Heat Equation in Bounded Domains

Let $u(t,x)$ be a solution of the heat equation in $\mathbb{R}^n$. Then, each $k-$th derivative also solves the heat equation and satisfies a maximum principle, the largest $k-$th derivative of $u(t,x)$ cannot be larger than the largest $k-$th derivative of $u(0,x)$. We prove an analogous statement for the solution of the heat equation on bounded domains $Ω\subset \mathbb{R}^n$ with Dirichlet boundary conditions. As an application, we give a new and fairly elementary proof of the sharp growth of the second derivatives of Laplacian eigenfunction $-Δϕ_k = λ_k ϕ_k$ with Dirichlet conditions on smooth domains $Ω\subset \mathbb{R}^n$.

preprint2021arXiv

Finding Structure in Sequences of Real Numbers via Graph Theory: a Problem List

We investigate a method of generating a graph $G=(V,E)$ out of an ordered list of $n$ distinct real numbers $a_1, \dots, a_n$. These graphs can be used to test for the presence of interesting structure in the sequence. We describe sequences exhibiting intricate hidden structure that was discovered this way. Our list includes sequences of Deutsch, Erdős, Freud & Hegyvari, Recaman, Quet, Zabolotskiy and Zizka. Since our observations are mostly empirical, each sequence in the list is an open problem.

preprint2021arXiv

Max-Cut via Kuramoto-type Oscillators

We consider the Max-Cut problem. Let $G = (V,E)$ be a graph with adjacency matrix $(a_{ij})_{i,j=1}^{n}$. Burer, Monteiro & Zhang proposed to find, for $n$ angles $\left\{θ_1, θ_2, \dots, θ_n\right\} \subset [0, 2π]$, minima of the energy $$ f(θ_1, \dots, θ_n) = \sum_{i,j=1}^{n} a_{ij} \cos{(θ_i - θ_j)}$$ because configurations achieving a global minimum leads to a partition of size 0.878 Max-Cut(G). This approach is known to be computationally viable and leads to very good results in practice. We prove that by replacing $\cos{(θ_i - θ_j)}$ with an explicit function $g_{\varepsilon}(θ_i - θ_j)$ global minima of this new functional lead to a $(1-\varepsilon)$Max-Cut(G). This suggests some interesting algorithms that perform well. It also shows that the problem of finding approximate global minima of energy functionals of this type is NP-hard in general.

preprint2021arXiv

Neural Collapse with Cross-Entropy Loss

We consider the variational problem of cross-entropy loss with $n$ feature vectors on a unit hypersphere in $\mathbb{R}^d$. We prove that when $d \geq n - 1$, the global minimum is given by the simplex equiangular tight frame, which justifies the neural collapse behavior. We also prove that as $n \rightarrow \infty$ with fixed $d$, the minimizing points will distribute uniformly on the hypersphere and show a connection with the frame potential of Benedetto & Fickus.

preprint2021arXiv

On Concavity of Solutions of the Nonlinear Poisson Equation

We consider the nonlinear Poisson equation $-Δu = f(u)$ in domains $Ω\subset \mathbb{R}^n$ with Dirichlet boundary conditions on $\partial Ω$. We show (for monotonically increasing concave $f$ with small Lipschitz constant) that if $D^2 u$ is negative semi-definite on the boundary, then $u$ is concave. A conjecture of Saint Venant from 1856 (proven by Polya in 1948) is that among all domains $Ω$ of fixed measure, the solution of $-Δu =1$ assumes its largest maximum when $Ω$ is a ball. We extend this to $-Δu =f(u)$ for monotonically increasing $f$ with small Lipschitz constant.

preprint2021arXiv

t-SNE, Forceful Colorings and Mean Field Limits

t-SNE is one of the most commonly used force-based nonlinear dimensionality reduction methods. This paper has two contributions: the first is forceful colorings, an idea that is also applicable to other force-based methods (UMAP, ForceAtlas2,...). In every equilibrium, the attractive and repulsive forces acting on a particle cancel out: however, both the size and the direction of the attractive (or repulsive) forces acting on a particle are related to its properties: the force vector can serve as an additional feature. Secondly, we analyze the case of t-SNE acting on a single homogeneous cluster (modeled by affinities coming from the adjacency matrix of a random k-regular graph); we derive a mean-field model that leads to interesting questions in classical calculus of variations. The model predicts that, in the limit, the t-SNE embedding of a single perfectly homogeneous cluster is not a point but a thin annulus of diameter $\sim k^{-1/4} n^{-1/4}$. This is supported by numerical results. The mean field ansatz extends to other force-based dimensionality reduction methods.

preprint2020arXiv

A Nonlocal Transport Equation Modeling Complex Roots of Polynomials under Differentiation

Let $p_n:\mathbb{C} \rightarrow \mathbb{C}$ be a random complex polynomial whose roots are sampled i.i.d. from a radial distribution $u(r) r dr$ in the complex plane. A natural question is how the distribution of roots evolves under repeated (say $n/2-$times) differentiation of the polynomial. We conjecture a mean-field expansion for the evolution of $ψ(s) = u(s) s$ $$ \frac{\partial ψ}{\partial t} = \frac{\partial}{\partial x} \left( \left( \frac{1}{x} \int_{0}^{x} ψ(s) ds \right)^{-1} ψ(x) \right).$$ The evolution of $ψ(s) \equiv 1$ corresponds to the evolution of random Taylor polynomials $$ p_n(z) = \sum_{k=0}^{n}{ γ_k \frac{z^k}{k!}} \quad \mbox{where} \quad γ_k \sim \mathcal{N}_{\mathbb{C}}(0,1).$$ We discuss some numerical examples suggesting that this particular solution may be stable. We prove that the solution is linearly stable. The linear stability analysis reduces to the classical Hardy integral inequality. Many open problems are discussed.

preprint2020arXiv

A Semicircle Law for Derivatives of Random Polynomials

Let $x_1, \dots, x_n$ be $n$ independent and identically distributed random variables with mean zero, unit variance, and finite moments of all remaining orders. We study the random polynomial $p_n$ having roots at $x_1, \dots, x_n$. We prove that for $\ell \in \mathbb{N}$ fixed as $n \rightarrow \infty$, the $(n-\ell)-$th derivative of $p_n^{}$ behaves like a Hermite polynomial: for $x$ in a compact interval,$${n^{\ell/2}} \frac{\ell!}{n!} \cdot p_n^{(n-\ell)}\left( \frac{x}{\sqrt{n}}\right) \rightarrow He_{\ell}(x + γ_n),$$ where $He_{\ell}$ is the $\ell-$th probabilists&#39; Hermite polynomial and $γ_n$ is a random variable converging to the standard $\mathcal{N}(0,1)$ Gaussian as $n \rightarrow \infty$. Thus, there is a universality phenomenon when differentiating a random polynomial many times: the remaining roots follow a Wigner semicircle distribution.

preprint2020arXiv

A Spectral Approach to the Shortest Path Problem

Let $G=(V,E)$ be a simple, connected graph. One is often interested in a short path between two vertices $u,v$. We propose a spectral algorithm: construct the function $ϕ:V \rightarrow \mathbb{R}_{\geq 0}$ $$ ϕ= \arg\min_{f:V \rightarrow \mathbb{R} \atop f(u) = 0, f \not\equiv 0} \frac{\sum_{(w_1, w_2) \in E}{(f(w_1)-f(w_2))^2}}{\sum_{w \in V}{f(w)^2}}.$$ $ϕ$ can also be understood as the smallest eigenvector of the Laplacian Matrix $L=D-A$ after the $u-$th row and column have been removed. We start in the point $v$ and construct a path from $v$ to $u$: at each step, we move to the neighbor for which $ϕ$ is the smallest. This algorithm provably terminates and results in a short path from $v$ to $u$, often the shortest. The efficiency of this method is due to a discrete analogue of a phenomenon in Partial Differential Equations that is not well understood. We prove optimality for trees and discuss a number of open questions.

preprint2020arXiv

Conservation Laws for the Density of Roots of Polynomials under Differentiation

Let $p_n(x)$ be a polynomial of degree $n$ having $n$ distinct, real roots distributed according to a nice probability distribution $u(0,x)dx$ on $\mathbb{R}$. One natural problem is to understand the density $u(t,x)$ of the roots of the $(t\cdot n)-$th derivative of $p_n$ where $0 < t < 1$ as $n \rightarrow \infty$. We derive an \textit{infinite} number of conversation laws for the evolution of $u(t,x)$. The first three are \begin{align*} \int_{\mathbb{R}}{ u(t,x) ~ dx} = 1-t, \qquad \qquad \int_{\mathbb{R}}{ u(t,x) x ~ dx} = \left(1-t\right)\int_{\mathbb{R}}{ u(0,x) x~ dx}, \qquad \int_{\mathbb{R}} \int_{\mathbb{R}} u(t,x) (x-y)^2 u(t,y) ~ dx dy = (1-t)^3 \int_{\mathbb{R}} \int_{\mathbb{R}} u(0,x) (x-y)^2 u(0,y) ~ dx dy. \end{align*} The author suggested that $u(t,x)$ might evolve according to a nonlocal evolution equation involving the Hilbert transform; this has been verified for two special closed form solutions -- these conservation laws thus point to interesting identities for the Hilbert transform. We discuss many open problems.

preprint2020arXiv

Fourier Uncertainty Principles, Scale Space Theory and the Smoothest Average

Let $f \in L^{2}(\mathbb{R}^n)$ and suppose we are interested in computing its average at a fixed scale. This is easy: we pick the density $u_{}$ of a probability distribution with mean 0 and some moment at the desired scale and compute the convolution $u_{} * f$. Is there a particularly natural choice for $u$? This question is studied in scale space theory and the Gaussian is a popular answer. We were interested whether a canonical choice for $u$ can arise from a new axiom: having fixed a scale, the average should oscillate as little as possible, i.e. $$ u_{} = \arg\min_{u_{}} \sup_{f \in L^2(\mathbb{R}^n)} \frac{\| \nabla (u_{} *f) \|_{L^2(\mathbb{R}^n)}}{\|f\|_{L^2(\mathbb{R}^n)}}.$$ This optimal function turns out to be a minimizer of an uncertainty principle: for $α> 0$ and $β> n/2$, there exists $c_{α, β,n} > 0$ such that for all $u \in L^1(\mathbb{R}^n)$ $$ \| |ξ|^β \cdot \widehat{u}\|^α_{L^{\infty}(\mathbb{R}^n)} \cdot \| |x|^α \cdot u \|^β_{L^1(\mathbb{R}^n)} \geq c_{α, β,n} \|u\|_{L^1(\mathbb{R}^n)}^{α+ β}.$$ For $β= 1$, any nonnegative extremizer of the inequality serves as the best averaging function in the sense above, $β\neq 1$ corresponds to other derivatives. For $(n, β)=(1,1)$ we use the Shannon-Whittaker formula to prove that the characteristic function $u(x) = χ_{[-1/2,1/2]}$ is a local minimizer among functions defined on $[-1/2,1/2]$ for $α\in \left\{2,3,4,5,6\right\}$. We provide a sufficient condition for general $α$ in terms of a sign pattern for the hypergeometric function $_1F_2$.

preprint2020arXiv

Non-Convex Planar Harmonic Maps

We formulate a novel characterization of a family of invertible maps between two-dimensional domains. Our work follows two classic results: The Radó-Kneser-Choquet (RKC) theorem, which establishes the invertibility of harmonic maps into a convex planer domain; and Tutte&#39;s embedding theorem for planar graphs - RKC&#39;s discrete counterpart - which proves the invertibility of piecewise linear maps of triangulated domains satisfying a discrete-harmonic principle, into a convex planar polygon. In both theorems, the convexity of the target domain is essential for ensuring invertibility. We extend these characterizations, in both the continuous and discrete cases, by replacing convexity with a less restrictive condition. In the continuous case, Alessandrini and Nesi provide a characterization of invertible harmonic maps into non-convex domains with a smooth boundary by adding additional conditions on orientation preservation along the boundary. We extend their results by defining a condition on the normal derivatives along the boundary, which we call the cone condition; this condition is tractable and geometrically intuitive, encoding a weak notion of local invertibility. The cone condition enables us to extend Alessandrini and Nesi to the case of harmonic maps into non-convex domains with a piecewise-smooth boundary. In the discrete case, we use an analog of the cone condition to characterize invertible discrete-harmonic piecewise-linear maps of triangulations. This gives an analog of our continuous results and characterizes invertible discrete-harmonic maps in terms of the orientation of triangles incident on the boundary.

preprint2020arXiv

On Eigenvectors of Random Band Matrices with Large Band

We study random, symmetric $N \times N$ band matrices with a band of size $W$ and Bernoulli random variables as entries. This interpolates between nearest neighbour interaction $W = 1$ and Wigner matrices $W = N$. Eigenvectors are known to be localized for $W \ll N^{1/8}$, delocalized for $W \gg N^{4/5}$ and it is conjectured that the transition for the bulk occurs at $W \sim N^{1/2}$. Eigenvalues in the spectral edge change their behavior at $W \sim N^{5/6}$ but nothing is known about the associated eigenvectors. We show that up to $W \ll N^{5/7}$ any random matrix has with large probability some eigenvectors in the spectral edge, which either exhibit mass concentration or interact strongly on a small scale.

preprint2020arXiv

On Matrix Rearrangement Inequalities

Given two symmetric and positive semidefinite square matrices $A, B$, is it true that any matrix given as the product of $m$ copies of $A$ and $n$ copies of $B$ in a particular sequence must be dominated in the spectral norm by the ordered matrix product $A^m B^n$? For example, is $$ \| AABAABABB \| \leq \| AAAAABBBB \|\ ? $$ Drury has characterized precisely which disordered words have the property that an inequality of this type holds for all matrices $A,B$. However, the $1$-parameter family of counterexamples Drury constructs for these characterizations is comprised of $3 \times 3$ matrices, and thus as stated the characterization applies only for $N \times N$ matrices with $N \geq 3$. In contrast, we prove that for $2 \times 2$ matrices, the general rearrangement inequality holds for all disordered words. We also show that for larger $N \times N$ matrices, the general rearrangement inequality holds for all disordered words, for most $A,B$ (in a sense of full measure) that are sufficiently small perturbations of the identity.

preprint2020arXiv

On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $u_k - u$ than the remaining terms: if $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.

preprint2020arXiv

On Vickrey&#39;s Income Averaging

We consider a small set of axioms for income averaging -- recursivity, continuity, and the boundary condition for the present. These properties yield a unique averaging function that is the density of the reflected Brownian motion with a drift started at the current income and moving over the past incomes. When averaging is done over the short past, the weighting function is asymptotically converging to a Gaussian. When averaging is done over the long horizon, the weighing function converges to the exponential distribution. For all intermediate averaging scales, we derive an explicit solution that interpolates between the two.

preprint2020arXiv

Positive-definite Functions, Exponential Sums and the Greedy Algorithm: a curious Phenomenon

We describe a curious dynamical system that results in sequences of real numbers in $[0,1]$ with seemingly remarkable properties. Let the function $f:\mathbb{T} \rightarrow \mathbb{R}$ satisfy $\hat{f}(k) \geq c|k|^{-2}$ and define a sequence via $$ x_n = \arg\min_x \sum_{k=1}^{n-1}{f(x-x_k)}.$$ Such sequences $(x_n)_{n=1}^{\infty}$ seem to be astonishingly regularly distributed in various ways (satisfying favorable exponential sum estimates; every interval $J \subset [0,1]$ contains $\sim |J|n$ elements). We prove $$ W_2\left( \frac{1}{n} \sum_{k=1}^{n}{δ_{x_k}}, dx\right) \leq \frac{c}{\sqrt{n}},$$ where $W_2$ is the 2-Wasserstein distance. Much stronger results seem to be true and it seems like an interesting problem to understand this dynamical system better. We obtain optimal results in dimension $d \geq 3$: using $G(x,y)$ to denote the Green&#39;s function of the Laplacian on a compact manifold, we show that $$ x_n = \arg\min_{x \in M} \sum_{k=1}^{n-1}{G(x,x_k)} \quad \mbox{satisfies} \quad W_2\left( \frac{1}{n} \sum_{k=1}^{n}{δ_{x_k}}, dx\right) \lesssim \frac{1}{n^{1/d}}.$$

preprint2020arXiv

Regularized Potentials of Schrödinger Operators and a Local Landscape Function

We study localization properties of low-lying eigenfunctions $$(-Δ+V) ϕ= λϕ\qquad \mbox{in}~Ω$$ for rapidly varying potentials $V$ in bounded domains $Ω\subset \mathbb{R}^d$. Filoche & Mayboroda introduced the landscape function $(-Δ+ V)u=1$ and showed that the function $u$ has remarkable properties: localized eigenfunctions prefer to localize in the local maxima of $u$. Arnold, David, Filoche, Jerison \& Mayboroda showed that $1/u$ arises naturally as the potential in a related equation. Motivated by these questions, we introduce a one-parameter family of regularized potentials $V_t$ that arise from convolving $V$ with the radial kernel $$ V_t(x) = V * \left( \frac{1}{t} \int_0^t \frac{ \exp\left( - \|\cdot\|^2/ (4s) \right)}{(4 πs )^{d/2}} ds \right).$$ We prove that for eigenfunctions $(-Δ+V) ϕ= λϕ$ this regularization $V_t$ is, in a precise sense, the canonical effective potential on small scales. The landscape function $u$ respects the same type of regularization. This allows allows us to derive landscape-type functions out of solutions of the equation $(-Δ+ V)u = f$ for a general right-hand side $f:Ω\rightarrow \mathbb{R}_{>0}$.

preprint2020arXiv

Spectral Clustering Revisited: Information Hidden in the Fiedler Vector

We are interested in the clustering problem on graphs: it is known that if there are two underlying clusters, then the signs of the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix can reliably reconstruct the two clusters. We argue that the vertices for which the eigenvector has the largest and the smallest entries, respectively, are unusually strongly connected to their own cluster and more reliably classified than the rest. This can be regarded as a discrete version of the Hot Spots conjecture and should be useful in applications. We give a rigorous proof for the stochastic block model and several examples.

preprint2020arXiv

The smoothest average: Dirichlet, Fejér and Chebyshev

We are interested in the ``smoothest&#39;&#39; averaging that can be achieved by convolving functions $f \in \ell^2(\mathbb{Z})$ with an averaging function $u$. More precisely, suppose $u:\{-n, \ldots, n\} \to \mathbb{R}$ is a symmetric function normalized to $\sum_{k=-n}^{n}u(k) = 1$. We show that every convolution operator is not-too-smooth, in the sense that $$\sup_{f \in \ell^2(\mathbb{Z})} \frac{\| \nabla (f*u)\|_{\ell^2(\mathbb{Z})}}{\|f\|_{\ell^2}}\geq \frac{2}{2n+1},$$ and we show that equality holds if and only if $u$ is constant on the interval $\{-n, \ldots, n\}$. In the setting where smoothness is measured by the $\ell^2$-norm of the discrete second derivative and we further restrict our attention to functions $u$ with nonnegative Fourier transform, we establish the inequality $$\sup_{f \in \ell^2(\mathbb{Z})} \frac{\| Δ(f*u)\|_{\ell^2(\mathbb{Z})}}{\|f\|_{\ell^2(\mathbb{Z})}} \geq \frac{4}{(n+1)^2},$$ with equality if and only if $u$ is the triangle function $u(k)=(n+1-|k|)/(n+1)^2$. We also discuss a continuous analogue and several open problems.

preprint2020arXiv

Three Convolution Inequalities on the Real Line with Connections to Additive Combinatorics

We discuss three convolution inequalities that are connected to additive combinatorics. Cloninger and the second author showed that for nonnegative $f \in L^1(-1/4, 1/4)$, $$ \max_{-1/2 \leq t \leq 1/2} \int_{\mathbb{R}}{f(t-x) f(x) dx} \geq 1.28 \left( \int_{-1/4}^{1/4}{f(x) dx}\right)^2$$ which is related to $g-$Sidon sets (1.28 cannot be replaced by 1.52). We prove a dual statement, related to difference bases, and show that for $f \in L^1(\mathbb{R})$, $$ \min_{0 \leq t \leq 1}\int_{\mathbb{R}}{f(x) f(x+t) dx} \leq 0.42 \|f\|_{L^1}^2,$$ where the constant 1/2 is trivial, 0.42 cannot be replaced by 0.37. This suggests a natural conjecture about the asymptotic structure of $g-$difference bases. Finally, we show for all functions $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, $$ \int_{-\frac{1}{2}}^{\frac{1}{2}}{ \int_{\mathbb{R}}{f(x) f(x+t) dx}dt} \leq 0.91 \|f\|_{L^1}\|f\|_{L^2}$$

preprint2020arXiv

Using Expander Graphs to test whether samples are i.i.d

The purpose of this note is to point out that the theory of expander graphs leads to an interesting test whether $n$ real numbers $x_1, \dots, x_n$ could be $n$ independent samples of a random variable. To any distinct, real numbers $x_1, \dots, x_n$, we associate a 4-regular graph $G$ as follows: using $π$ to denote the permutation ordering the elements, $x_{π(1)} < x_{π(2)} < \dots < x_{π(n)}$, we build a graph on $\left\{1, \dots, n\right\}$ by connecting $i$ and $i+1$ (cyclically) and $π(i)$ and $π(i+1)$ (cyclically). If the numbers are i.i.d. samples, then a result of Friedman implies that $G$ is close to Ramanujan. This suggests a test for whether these numbers are i.i.d: compute the second largest (in absolute value) eigenvalue of the adjacency matrix. The larger $λ- 2\sqrt{3}$, the less likely it is for the numbers to be i.i.d. We explain why this is a reasonable test and give many examples.

preprint2020arXiv

Wasserstein Distance, Fourier Series and Applications

We study the Wasserstein metric $W_p$, a notion of distance between two probability distributions, from the perspective of Fourier Analysis and discuss applications. In particular, we bound the Earth Mover Distance $W_1$ between the distribution of quadratic residues in a finite field $\mathbb{F}_p$ and uniform distribution by $\lesssim p^{-1/2}$ (the Polya-Vinogradov inequality implies $\lesssim p^{-1/2} \log{p}$). We also show for continuous $f:\mathbb{T} \rightarrow \mathbb{R}_{}$ with mean value 0 $$ (\mbox{number of roots of}~f) \cdot \left( \sum_{k=1}^{\infty}{ \frac{ |\hat{f}(k)|^2}{k^2}}\right)^{\frac{1}{2}} \gtrsim \frac{\|f\|^{2}_{L^1(\mathbb{T})}}{\|f\|_{L^{\infty}(\mathbb{T})}}.$$ Moreover, we show that for a Laplacian eigenfunction $-Δ_g ϕ_λ = λϕ_λ$ on a compact Riemannian manifold $W_p\left(\max\left\{ϕ_λ, 0\right\}dx, \max\left\{-ϕ_λ, 0\right\} dx\right) \lesssim_p \sqrt{\logλ/λ} \|ϕ_λ\|_{L^1}^{1/p}$ which is at most a factor $\sqrt{\logλ}$ away from sharp. Several other problems are discussed.

preprint2019arXiv

Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations

T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the &#34;crowding problem&#34; of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree of freedom $ν$, with $ν\to\infty$ corresponding to SNE and $ν=1$ corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that $ν<1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.

preprint2019arXiv

Leaky Roots and Stable Gauss-Lucas Theorems

Let $p:\mathbb{C} \rightarrow \mathbb{C}$ be a polynomial. The Gauss-Lucas theorem states that its critical points, $p&#39;(z) = 0$, are contained in the convex hull of its roots. A recent quantitative version Totik shows that if almost all roots are contained in a bounded convex domain $K \subset \mathbb{C}$, then almost all roots of the derivative $p&#39;$ are in a $\varepsilon-$neighborhood $K_{\varepsilon}$ (in a precise sense). We prove another quantitative version: if a polynomial $p$ has $n$ roots in $K$ and $\lesssim c_{K, \varepsilon} (n/\log{n})$ roots outside of $K$, then $p&#39;$ has at least $n-1$ roots in $K_{\varepsilon}$. This establishes, up to a logarithm, a conjecture of the first author: we also discuss an open problem whose solution would imply the full conjecture.