Source author record

Kyle Matoba

Kyle Matoba appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning Artificial Intelligence math.NA

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators

Preprint, 2023, arXiv

Efficiently Training Low-Curvature Neural Networks

The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial examples and have unstable gradients which hinders interpretability. However, existing methods to solve these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, which is a mathematical quantity which encodes the degree of non-linearity. Using this, we demonstrate low-curvature neural networks (LCNNs) that obtain drastically lower curvature than standard models while exhibiting similar predictive performance, which leads to improved robustness and stable gradients, with only a marginally increased training time. To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes overall curvature in terms of curvatures and slopes of its constituent layers. To efficiently minimize this bound, we introduce two novel architectural components: first, a non-linearity called centered-softplus that is a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained batch normalization layer. Our experiments show that LCNNs have lower curvature, more stable gradients and increased off-the-shelf adversarial robustness when compared to their standard high-curvature counterparts, all without affecting predictive performance. Our approach is easy to use and can be readily incorporated into existing neural network models.
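As a rough illustration of the centered-softplus non-linearity named in the abstract above, the following is a minimal sketch. It assumes that "centered" means subtracting the softplus value at zero, log(2)/beta, so that the function passes through the origin; the abstract itself does not give the formula. The function names and the `beta` parameter are hypothetical.

```python
import math


def softplus(x: float, beta: float = 1.0) -> float:
    """Softplus with sharpness beta: log(1 + exp(beta * x)) / beta."""
    z = beta * x
    # max(z, 0) + log1p(exp(-|z|)) equals log(1 + exp(z)) but avoids overflow.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def centered_softplus(x: float, beta: float = 1.0) -> float:
    """Assumed centering: shift softplus down by softplus(0) = log(2) / beta."""
    return softplus(x, beta) - math.log(2.0) / beta


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, centered_softplus(x, beta=1.0))
```

Under this definition the second derivative of softplus is at most beta / 4, so a smaller `beta` gives a flatter, lower-curvature non-linearity. This is one plausible reason to control `beta` during training, not a statement of the authors' exact procedure.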

Preprint, 2022, arXiv

The Theoretical Expressiveness of Maxpooling

Over the decade since deep neural networks became state of the art image classifiers there has been a tendency towards less use of max pooling: the function that takes the largest of nearby pixels in an image. Since max pooling featured prominently in earlier generations of image classifiers, we wish to understand this trend, and whether it is justified. We develop a theoretical framework analyzing ReLU based approximations to max pooling, and prove a sense in which max pooling cannot be efficiently replicated using ReLU activations. We analyze the error of a class of optimal approximations, and find that whilst the error can be made exponentially small in the kernel size, doing so requires an exponentially complex approximation. Our work gives a theoretical basis for understanding the trend away from max pooling in newer architectures. We conclude that the main cause of a difference between max pooling and an optimal approximation, a prevalent large difference between the max and other values within pools, can be overcome with other architectural decisions, or is not prevalent in natural images.
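For context on what "replicating max pooling with ReLU" can mean, here is a small sketch of the well-known identity max(a, b) = b + ReLU(a - b), applied pairwise in a tournament. This is not the class of approximations analyzed in the paper. It only shows that an exact ReLU construction exists when depth is allowed to grow with the pool size. The function names are hypothetical.

```python
from typing import Sequence


def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def relu_max2(a: float, b: float) -> float:
    """Maximum of two numbers using a single ReLU: b + relu(a - b)."""
    return b + relu(a - b)


def relu_max(values: Sequence[float]) -> float:
    """Maximum of a pool via a pairwise tournament of relu_max2 calls.

    Uses len(values) - 1 ReLUs and roughly log2(len(values)) rounds.
    Floating-point rounding in b + (a - b) can introduce tiny errors.
    """
    vals = list(values)
    if not vals:
        raise ValueError("pool must be non-empty")
    while len(vals) > 1:
        nxt = [relu_max2(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:
            nxt.append(vals[-1])  # odd element advances to the next round
        vals = nxt
    return vals[0]


if __name__ == "__main__":
    pool = [3.0, -1.0, 7.0, 2.0]  # e.g. a flattened 2x2 pooling window
    print(relu_max(pool))
```

The number of tournament rounds grows with the kernel size, which gives a feel for why matching max pooling with a fixed, shallow ReLU block is a non-trivial question.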

Preprint, 2021, arXiv

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects, and overall unsafe behavior. To overcome this problem, recent work proposed to augment the specified reward function with an impact regularizer that discourages behavior that has a big impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.

Preprint, 2012, arXiv

A Computable Figure of Merit for Quasi-Monte Carlo Point Sets

Let $\mathcal{P} \subset [0,1)^S$ be a finite point set of cardinality $N$ in an $S$-dimensional cube, and let $f:[0,1)^S \to \mathbb{R}$ be an integrable function. A QMC integration of $f$ by $\mathcal{P}$ is the average of the values of $f$ at the points of $\mathcal{P}$, which approximates the integral of $f$ over the cube. Assume that $\mathcal{P}$ is constructed from an $\mathbb{F}_2$-vector space $P \subset (\mathbb{F}_2^n)^S$ by means of a digital net with $n$-digit precision. As an $n$-digit discretized version of Josef Dick's method, we introduce the Walsh figure of merit (WAFOM) $\textnormal{WF}(P)$ of $P$, which satisfies a Koksma-Hlawka type inequality: the QMC integration error is bounded by $C_{S,n}||f||_n \textnormal{WF}(P)$ under $n$-smoothness of $f$, where $C_{S,n}$ is a constant depending only on $S$ and $n$. We show a Fourier inversion formula for $\textnormal{WF}(P)$ which is computable in $O(nSN)$ steps. This efficiency enables a random search for $P$ with a small value of $\textnormal{WF}(P)$, which would be difficult for other figures of merit such as discrepancy. From an analogy to coding theory, we expect that random search may find better point sets than mathematical constructions. In fact, a naïve search finds point sets $P$ with small $\textnormal{WF}(P)$. In experiments, we show better performance of these point sets in QMC integration than widely used QMC rules. We also show some experimental evidence on the effectiveness of our point sets even for non-smooth integrands appearing in finance.
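The definition of QMC integration in the abstract above (the average of $f$ over a point set) can be sketched as follows. The point set here is a classical two-dimensional Hammersley set built from the base-2 van der Corput sequence, used only as a stand-in. It is not a WAFOM-optimized point set from the paper, and the function names and the test integrand are hypothetical.

```python
from typing import Callable, List, Tuple


def van_der_corput(i: int) -> float:
    """Base-2 radical inverse: mirror the binary digits of i about the point."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= 2.0
        x += (i & 1) / denom
        i >>= 1
    return x


def hammersley_2d(n: int) -> List[Tuple[float, float]]:
    """n points (i / n, van_der_corput(i)) in the unit square [0, 1)^2."""
    return [(i / n, van_der_corput(i)) for i in range(n)]


def qmc_integrate(f: Callable[[float, float], float],
                  points: List[Tuple[float, float]]) -> float:
    """QMC estimate: plain average of f over the point set."""
    return sum(f(x, y) for x, y in points) / len(points)


if __name__ == "__main__":
    pts = hammersley_2d(1024)
    # Integrand x * y has exact integral 1/4 over the unit square.
    estimate = qmc_integrate(lambda x, y: x * y, pts)
    print(estimate, abs(estimate - 0.25))
```

A figure of merit such as WAFOM or discrepancy is a way to score a point set like `pts` in advance, without reference to a particular integrand.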