Source author record

Ben Krause

Ben Krause appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.CA, math.DS, math.NT, Machine Learning, math.CO, math.FA, math.MG, math.PR, Neural and Evolutionary Computing

Catalog footprint

What is connected

18 works

9 topics

4 close collaborators

preprint 2026 arXiv

The Wiener Wintner and Return Times Theorem Along the Primes

We prove the following Return Times Theorem along the sequence of prime times, the first extension of the Return Times Theorem to arithmetic sequences: For every probability space, $(Ω,ν)$, equipped with a measure-preserving transformation, $T \colon Ω\to Ω$, and every $f \in L^\infty(Ω)$, there exists a set of full probability, $Ω_f \subset Ω$ with $ν(Ω_f) =1$, so that for all $ω\in Ω_f$, for any other probability space $(X,μ)$, equipped with a measure-preserving transformation $S : X \to X$, for any $g \in L^{\infty}(X)$, \begin{align} \frac{1}{N} \sum_{n \leq N} f(T^{p_n} ω) g(S^{p_n} \cdot) \end{align} converges $μ$-almost surely; above, $\{ 2=p_1 < p_2 < \dots \}$ is an enumeration of the primes. The Wiener-Wintner theorem along the primes is an immediate corollary. Our proof lives at the interface of classical Fourier analysis, combinatorial number theory, higher order Fourier analysis, and pointwise ergodic theory, with $U^3$ theory playing an important role; our $U^3$-estimates for \emph{Heath-Brown} models of the von Mangoldt function may be of independent interest.

preprint 2023 arXiv

Pointwise Ergodic Theory: Examples and Entropy

We provide an exposition of the proofs of Bourgain's polynomial ergodic theorems. The focus is on the motivation and intuition behind his arguments.

preprint 2022 arXiv

Pointwise ergodic theorems for non-conventional bilinear polynomial averages

We establish convergence in norm and pointwise almost everywhere for the non-conventional (in the sense of Furstenberg) bilinear polynomial ergodic averages \[ A_N(f,g)(x) := \frac{1}{N} \sum_{n =1}^N f(T^nx) g(T^{P(n)}x)\] as $N \to \infty$, where $T \colon X \to X$ is a measure-preserving transformation of a $σ$-finite measure space $(X,μ)$, $P(\mathrm{n}) \in \mathbb Z[\mathrm{n}]$ is a polynomial of degree $d \geq 2$, and $f \in L^{p_1}(X), \ g \in L^{p_2}(X)$ for some $p_1,p_2 > 1$ with $\frac{1}{p_1} + \frac{1}{p_2} \leq 1$. We also establish an $r$-variational inequality for these averages (at lacunary scales) in the optimal range $r > 2$. We are also able to "break duality" by handling some ranges of exponents $p_1,p_2$ with $\frac{1}{p_1}+\frac{1}{p_2} > 1$, at the cost of increasing $r$ slightly. This gives an affirmative answer to Problem 11 from Frantzikinakis' open problems survey for the Furstenberg--Weiss averages (with $P(\mathrm{n})=\mathrm{n}^2$), which is a bilinear variant of Question 9 considered by Bergelson in his survey on Ergodic Ramsey Theory from 1996. This also gives a contribution to the Furstenberg-Bergelson-Leibman conjecture. Our methods combine techniques from harmonic analysis with the recent inverse theorems of Peluse and Prendiville in additive combinatorics. At large scales, the harmonic analysis of the adelic integers $\mathbb A_{\mathbb Z}$ also plays a role.
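As a rough numerical illustration of the averages $A_N(f,g)$ defined above, the sketch below evaluates them on a toy system. The system is an assumption of this example and is not taken from the paper: the cyclic group $\mathbb{Z}/M$ with the shift $Tx = x + 1 \bmod M$, which preserves the uniform measure. The name `bilinear_average` and its parameters are hypothetical.

```python
def bilinear_average(f, g, x, N, M, P=lambda n: n * n):
    """Evaluate A_N(f,g)(x) = (1/N) * sum_{n=1}^N f(T^n x) * g(T^{P(n)} x).

    Assumed toy model: T is the shift x -> x + 1 on the cyclic group Z/M,
    so T^k x = (x + k) mod M. P defaults to n^2, the Furstenberg-Weiss case.
    """
    total = 0.0
    for n in range(1, N + 1):
        total += f((x + n) % M) * g((x + P(n)) % M)
    return total / N


if __name__ == "__main__":
    M = 5
    indicator_zero = lambda y: 1.0 if y == 0 else 0.0
    one = lambda y: 1.0
    # Only n = 5 lands on 0 when starting from x = 0, so the average is 1/5.
    print(bilinear_average(indicator_zero, one, x=0, N=5, M=M))
```

A finite cyclic model says nothing about the pointwise convergence result itself; it only makes the shape of the average concrete.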

preprint 2020 arXiv

Averages Along the Primes: Improving and Sparse Bounds

Consider averages along the prime integers $ \mathbb P $ given by \begin{equation*} \mathcal{A}_N f (x) = N ^{-1} \sum_{ p \in \mathbb P \;:\; p\leq N} (\log p) f (x-p). \end{equation*} These averages satisfy a uniform scale-free $ \ell ^{p}$-improving estimate. For all $ 1< p < 2$, there is a constant $ C_p$ so that for all integers $ N$ and functions $ f$ supported on $ [0,N]$, there holds \begin{equation*} N ^{-1/p' }\lVert \mathcal{A}_N f\rVert_{\ell^{p'}} \leq C_p N ^{- 1/p} \lVert f\rVert_{\ell^p}. \end{equation*} The maximal function $ \mathcal{A}^{\ast} f =\sup_{N} \lvert \mathcal{A}_N f \rvert$ satisfies $ (p,p)$ sparse bounds for all $ 1< p < 2$. The latter are the natural variants of the scale-free bounds. As a corollary, $ \mathcal{A}^{\ast} $ is bounded on $ \ell ^{p} (w)$, for all weights $ w$ in the Muckenhoupt $A_p$ class. No prior weighted inequalities for $ \mathcal{A}^{\ast} $ were known.
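The logarithmically weighted average $\mathcal{A}_N f(x)$ above can be computed directly for small $N$. The following is a minimal sketch, assuming $f$ is given as a Python function on the integers; the helper names `primes_up_to` and `prime_average` are hypothetical and chosen only for this illustration.

```python
import math


def primes_up_to(N):
    """Return all primes p <= N using a basic sieve of Eratosthenes."""
    if N < 2:
        return []
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(N) + 1):
        if is_prime[p]:
            for q in range(p * p, N + 1, p):
                is_prime[q] = False
    return [p for p in range(2, N + 1) if is_prime[p]]


def prime_average(f, x, N):
    """Evaluate A_N f(x) = N^{-1} * sum_{p <= N, p prime} log(p) * f(x - p)."""
    return sum(math.log(p) * f(x - p) for p in primes_up_to(N)) / N


if __name__ == "__main__":
    # For f identically 1 the average is theta(N)/N, with theta the Chebyshev function.
    print(prime_average(lambda y: 1.0, x=0, N=10))
```

For the constant function $f \equiv 1$ the value is $θ(N)/N$, where $θ$ is the Chebyshev function, which explains the $\log p$ weights: they normalize the average to be of size about 1.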

preprint 2020 arXiv

On Maximal Functions With Curvature

We exhibit a class of "relatively curved" $\vecγ(t) := (γ_1(t),\dots,γ_n(t))$, so that the pertaining multi-linear maximal function satisfies the sharp range of Hölder exponents, \[ \left\| \sup_{r > 0} \ \frac{1}{r} \int_{0}^r \prod_{i=1}^n |f_i(x-γ_i(t))| \ dt \right\|_{L^p(\mathbb{R})} \leq C \cdot \prod_{j=1}^n \| f_j \|_{L^{p_j}(\mathbb{R})} \] whenever $\frac{1}{p} = \sum_{j=1}^n \frac{1}{p_j}$, where $p_j > 1$ and $p \geq p_{\vecγ}$, where $1 \geq p_{\vecγ} > 1/n$ for certain curves. For instance, $p_{\vecγ} = 1/n^+$ for the case of fractional monomials, \[ \vecγ(t) = (t^{α_1},\dots,t^{α_n}), \; \; \; α_1 < \dots < α_n.\] Two sample applications of our method are as follows: For any measurable $u_1,\dots,u_n : \mathbb{R}^{n} \to \mathbb{R}$, with $u_i$ independent of the $i$th coordinate vector, and any relatively curved $\vecγ$, \[ \lim_{r \to 0} \ \frac{1}{r} \int_0^r F\big(x_1 - u_1(x) \cdot γ_1(t),\dots,x_n - u_n(x) \cdot γ_n(t) \big) \ dt = F(x_1,\dots,x_n), \; \; \; a.e. \] for every $F \in L^p(\mathbb{R}^n), \ p > 1$. Every appropriately normalized set $A \subset [0,1]$ of sufficiently large Hausdorff dimension contains the progression, \[ \{ x, x-γ_1(t),\dots,x - γ_n(t) \} \subset A, \] for some $t \geq c_{\vecγ} > 0$ strictly bounded away from zero, depending on $\vecγ$.

preprint 2020 arXiv

Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width

We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the $k$-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training---a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase $k$, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially over $k$ in wide neural networks.

preprint 2016 arXiv

A Discrete Carleson Theorem Along the Primes with a Restricted Supremum

Consider the discrete maximal function acting on finitely supported functions on the integers, \[ \mathcal{C}_Λf(n) := \sup_{λ\in Λ} | \sum_{p \in \pm \mathbb{P}} f(n-p) \log |p| \frac{e^{2πi λp}}{p} |,\] where $\pm \mathbb{P} := \{ \pm p : p \text{ is a prime} \}$, and $Λ\subset [0,1]$. We give sufficient conditions on $Λ$, met by (finite unions of) lacunary sets, for this to be a bounded sublinear operator on $\ell^p(\mathbb{Z})$ for $\frac{3}{2} < p < 4$.

preprint 2016 arXiv

A Discrete Quadratic Carleson Theorem on $ \ell ^2 $ with a Restricted Supremum

Consider the discrete maximal function acting on $\ell^2(\mathbb Z)$ functions \[ \mathcal{C}_Λ f( n ) := \sup_{ λ\in Λ} \left| \sum_{m \neq 0} f(n-m) \frac{e^{2 πiλm^2}} {m} \right| \] where $Λ\subset [0,1]$. We give sufficient conditions on $Λ$, met by certain kinds of Cantor sets, for this to be a bounded sublinear operator. This result is a discrete analogue of E. M. Stein's integral result, that the maximal operator below is bounded on $L^2(\mathbb R)$. \[ \mathcal{C}_2 f(x):= \sup_{λ\in \mathbb R} \left| \int f(x-y) \frac{e^{2πi λy^2}}{y} \ dy \right|.\] The proof of our result relies heavily on Bourgain's work on arithmetic ergodic theorems, with novel complexity arising from the oscillatory nature of the question at hand, and difficulties arising from the parameter $λ$ above.

preprint 2016 arXiv

Sparse Bounds for Random Discrete Carleson Theorems

We study discrete random variants of the Carleson maximal operator. Intriguingly, these questions remain subtle and difficult, even in this setting. Let $\{X_m\}$ be an independent sequence of $\{0,1\}$ random variables with expectations \[ \mathbb E X_m = σ_m = m^{-α}, \ 0 < α< 1/2, \] and $ S_m = \sum_{k=1} ^{m} X_k$. Then the maximal operator below almost surely is bounded from $ \ell ^{p}$ to $ \ell ^{p}$, provided the Minkowski dimension of $ Λ\subset [-1/2, 1/2]$ is strictly less than $ 1- α$. \[ \sup_{λ\in Λ} \Bigl| \sum_{m\neq 0} X_{\lvert m\rvert } \frac{e( λm )}{ {\rm sgn} (m)S_{ |m| }} f(x- m) \Bigr|. \] This operator also satisfies a sparse type bound. The form of the sparse bound immediately implies weighted estimates in all $ \ell ^{2}$, which are novel in this setting. Variants and extensions are also considered.

preprint 2015 arXiv

Measures of polynomial growth and classical convolution inequalities

We study $L^p(μ) \to L^q(ν)$ mapping properties of the convolution operator $ T_λf(x)=λ*(fμ)(x)$ and of the corresponding maximal operator $ {\mathcal T}_λf(x)=\sup_{t>0} |λ_t*(fμ)(x)|$, where $λ$ is a tempered distribution, and $μ$ and $ν$ are compactly supported measures satisfying the polynomial growth bounds $μ(B(x,r)) \leq Cr^{s_μ}$ and $ν(B(x,r)) \leq Cr^{s_ν}$. As a result, we prove variants of the classical $L^p$-improving (Littman; Strichartz) and maximal (Stein) inequalities in a setting where the Plancherel formula is not available. Connections with the David-Semmes conjecture are also discussed.

preprint 2015 arXiv

On Higher-Dimensional Oscillation in Ergodic Theory

We extend the results of Jones, Rosenblatt, and Wierdl concerning higher-dimensional oscillation in ergodic theory in a variety of ways. We do so by transference to the integer lattice, where we employ techniques from (discrete) harmonic analysis.

preprint 2015 arXiv

On the Hardy--Littlewood majorant problem for arithmetic sets

The aim of this paper is to exhibit a wide class of sparse deterministic sets, $\mathbf B \subseteq \mathbb{N}$, so that \[ \limsup_{N \to \infty} N^{-1}|\mathbf B \cap [1,N]|= 0, \] for which the Hardy--Littlewood majorant property holds: \[ \sup_{|a_n|\le 1} \Big\| \sum_{n\in\mathbf B\cap[1, N]} a_n e^{2 πi n ξ}\Big \|_{L^p(\mathbb{T}, {\mathrm d} ξ)} \leq \mathbf{C}_p \Big\| \sum_{n\in\mathbf B\cap[1, N]} e^{2 πi n ξ} \Big\|_{L^p(\mathbb{T}, {\mathrm d} ξ)}, \] where $p \geq p_{\mathbf{B}}$ is sufficiently large, the implicit constant $\mathbf{C}_p$ is independent of $N$, and the supremum is taken over all complex sequences $ (a_n : n \in \mathbb{N})$ such that $|a_n| \leq 1$.

preprint 2015 arXiv

Optimizing and Contrasting Recurrent Neural Network Architectures

Recurrent Neural Networks (RNNs) have long been recognized for their potential to model complex time series. However, it remains to be determined what optimization techniques and recurrent architectures can be used to best realize this potential. The experiments presented take a deep look into Hessian free optimization, a powerful second order optimization method that has shown promising results, but still does not enjoy widespread use. This algorithm was used to train a number of RNN architectures including standard RNNs, long short-term memory, multiplicative RNNs, and stacked RNNs on the task of character prediction. The insights from these experiments led to the creation of a new multiplicative LSTM hybrid architecture that outperformed both LSTM and multiplicative RNNs. When tested on a larger scale, multiplicative LSTM achieved character level modelling results competitive with the state of the art for RNNs using very different methodology.

preprint 2015 arXiv

The Maximal Function and Square Function Control the Variation: An Elementary Proof

In this note we prove the following good-$λ$ inequality, for $r>2$, all $λ> 0$, $δ\in \big(0, \frac{1}{2} \big)$ \[ ν\big\{ V_r(f) > 3 λ; \mathcal{M}(f) \leq δλ\big\} \leq 4 ν\{s(f) > δλ\} + {δ^2 \left(1+\frac{16}{r-2}\right)^2} \cdot ν\big\{ V_r(f) > λ\big\}, \] where $\mathcal{M}(f)$ is the martingale maximal function, $s(f)$ is the conditional martingale square function. This immediately proves that $V_r(f)$ is bounded on $L^p$, $1 < p <\infty$ and moreover is integrable when the maximal function is.

preprint 2015 arXiv

Two-parameter version of Bourgain's inequality: Rational frequencies

Our aim is to establish the first two-parameter version of Bourgain's maximal logarithmic inequality on $L^2(\mathbb R^2)$ for the rational frequencies. We achieve this by introducing a variant of a two-parameter Rademacher--Menschov inequality. The method allows us to control an oscillation seminorm as well.

preprint 2014 arXiv

Dimension-Free $L^p$-Maximal Inequalities in $\mathbb{Z}_{m+1}^N$

For $m \geq 2$, let $(\mathbb{Z}_{m+1}^N, |\cdot|)$ denote the group equipped with the so-called $l^0$ metric, \[ |y| = \left| \big( y(1), \dots, y(N) \big) \right| := | \{1 \leq i \leq N : y(i) \neq 0 \} |,\] and define the $L^1$-normalized indicator of the $r$-sphere, \[ σ_r := \frac{1}{|\{|x| = r\}|} 1_{\{|x| =r\}}.\] We study the $L^p \to L^p$ mapping properties of the maximal operator \[ M^{N} f (x) := \sup_{r \leq N} | σ_r*f| \] acting on functions defined on $\mathbb{Z}_{m+1}^N$. Specifically, we prove that for all $p>1$, there exist absolute constants $C_{m,p}$ so that \[ \| M^{N} f \|_{L^p(\mathbb{Z}_{m+1}^N)} \leq C_{m,p} \| f \|_{L^p(\mathbb{Z}_{m+1}^N)} \] for all $N$.
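For very small $N$ the objects in this abstract can be enumerated by brute force. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, coordinates are taken modulo $m+1$, and the supremum is taken over $0 \leq r \leq N$ (the abstract writes $r \leq N$; including $r = 0$ is a choice made here). It also checks the elementary count $|\{|x| = r\}| = \binom{N}{r} m^r$.

```python
from itertools import product
from math import comb


def l0_norm(y):
    """The l^0 'norm': the number of nonzero coordinates of y."""
    return sum(1 for c in y if c != 0)


def sphere(N, m, r):
    """All points y in Z_{m+1}^N with exactly r nonzero coordinates."""
    return [y for y in product(range(m + 1), repeat=N) if l0_norm(y) == r]


def spherical_average(f, x, N, m, r):
    """sigma_r * f(x): the mean of f(x - y) over the sphere |y| = r."""
    pts = sphere(N, m, r)
    shifted = (tuple((a - b) % (m + 1) for a, b in zip(x, y)) for y in pts)
    return sum(f(z) for z in shifted) / len(pts)


def maximal_function(f, x, N, m):
    """M^N f(x) = sup over radii r of |sigma_r * f(x)| (here 0 <= r <= N)."""
    return max(abs(spherical_average(f, x, N, m, r)) for r in range(N + 1))


if __name__ == "__main__":
    N, m = 3, 2
    # Sphere sizes agree with the closed form comb(N, r) * m**r.
    print([len(sphere(N, m, r)) for r in range(N + 1)])
    print([comb(N, r) * m**r for r in range(N + 1)])
```

The enumeration costs $(m+1)^N$ operations per sphere, so this is usable only for tiny $N$, whereas the theorem concerns bounds that are uniform in $N$.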

preprint 2014 arXiv

Polynomial Ergodic Averages Converge Rapidly: Variations on a Theorem of Bourgain

Let $L^2(X,Σ,μ,τ)$ be a measure-preserving system, with $τ$ a $\mathbb{Z}$-action. In this note, we prove that the ergodic averages along integer-valued polynomials, $P(n)$, \[ M_N(f):= \frac{1}{N}\sum_{n \leq N} τ^{P(n)} f \] converge pointwise for $f \in L^2(X)$. We do so by proving that, for $r>2$, the $r$-variation, $\mathcal{V}^r(M_N(f))$, extends to a bounded operator on $L^2$. We also prove that our result is sharp, in that $\mathcal{V}^2(M_N(f))$ is an unbounded operator on $L^2$.
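The abstract does not spell out the $r$-variation. The sketch below assumes the standard definition for a finite sequence $(a_1,\dots,a_n)$, namely the supremum over increasing index sequences $n_1 < \dots < n_K$ of $\big(\sum_k |a_{n_{k+1}} - a_{n_k}|^r\big)^{1/r}$, and computes it with a simple quadratic-time dynamic program. The function name `r_variation` is hypothetical.

```python
def r_variation(seq, r):
    """r-variation of a finite sequence under the standard definition (assumed).

    best[j] holds the largest value of sum |a_{n_{k+1}} - a_{n_k}|^r over
    increasing index chains that end at index j (0 for the empty chain).
    """
    n = len(seq)
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(seq[j] - seq[i]) ** r for i in range(j))
    return max(best, default=0.0) ** (1.0 / r)


if __name__ == "__main__":
    # A monotone sequence: for r > 1 a single long jump beats many short ones.
    print(r_variation([0, 1, 2, 3], r=2))
    # An oscillating sequence: every jump contributes.
    print(r_variation([0, 1, 0, 1], r=2))
```

A finite $r$-variation of the sequence $N \mapsto M_N(f)(x)$ forces that sequence to converge, which is how a variational bound yields pointwise convergence.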

preprint 2014 arXiv

Some Optimizations for (Maximal) Multipliers in $L^p$

We use Oberlin, Nazarov, and Thiele's Multi-Frequency Calderón-Zygmund decomposition to lower estimates on maximal multipliers in $L^p$. We also improve on classical multiplier results of Coifman, Rubio de Francia, and Semmes.
