Researcher profile

Simon Lucey

Simon Lucey contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2022arXiv

Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs

Coordinate-MLPs are emerging as an effective tool for modeling multidimensional continuous signals, overcoming many drawbacks associated with discrete grid-based approximations. However, coordinate-MLPs with ReLU activations, in their rudimentary form, demonstrate poor performance in representing signals with high fidelity, promoting the need for positional embedding layers. Recently, Sitzmann et al. proposed a sinusoidal activation function that has the capacity to omit positional embedding from coordinate-MLPs while still preserving high signal fidelity. Despite its potential, ReLUs are still dominating the space of coordinate-MLPs; we speculate that this is due to the hyper-sensitivity of networks -- that employ such sinusoidal activations -- to the initialization schemes. In this paper, we attempt to broaden the current understanding of the effect of activations in coordinate-MLPs, and show that there exists a broader class of activations that are suitable for encoding signals. We affirm that sinusoidal activations are only a single example in this class, and propose several non-periodic functions that empirically demonstrate more robust performance against random initializations than sinusoids. Finally, we advocate for a shift towards coordinate-MLPs that employ these non-traditional activation functions due to their high performance and simplicity.

preprint2022arXiv

Enabling equivariance for arbitrary Lie groups

Although provably robust to translational perturbations, convolutional neural networks (CNNs) are known to suffer from extreme performance degradation when presented at test time with more general geometric transformations of inputs. Recently, this limitation has motivated a shift in focus from CNNs to Capsule Networks (CapsNets). However, CapsNets suffer from admitting relatively few theoretical guarantees of invariance. We introduce a rigourous mathematical framework to permit invariance to any Lie group of warps, exclusively using convolutions (over Lie groups), without the need for capsules. Previous work on group convolutions has been hampered by strong assumptions about the group, which precludes the application of such techniques to common warps in computer vision such as affine and homographic. Our framework enables the implementation of group convolutions over any finite-dimensional Lie group. We empirically validate our approach on the benchmark affine-invariant classification task, where we achieve 30% improvement in accuracy against conventional CNNs while outperforming most CapsNets. As further illustration of the generality of our framework, we train a homography-convolutional model which achieves superior robustness on a homography-perturbed dataset, where CapsNet results degrade.

preprint2022arXiv

Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method - the first to evade the simplicity bias - highlights the need for a better understanding and control of inductive biases in deep learning.

preprint2022arXiv

GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation

Despite Neural Radiance Fields (NeRF) showing compelling results in photorealistic novel views synthesis of real-world scenes, most existing approaches require accurate prior camera poses. Although approaches for jointly recovering the radiance field and camera pose exist (BARF), they rely on a cumbersome coarse-to-fine auxiliary positional embedding to ensure good performance. We present Gaussian Activated neural Radiance Fields (GARF), a new positional embedding-free neural radiance field architecture - employing Gaussian activations - that outperforms the current state-of-the-art in terms of high fidelity reconstruction and pose estimation.

preprint2022arXiv

How You Start Matters for Generalization

Characterizing the remarkable generalization properties of over-parameterized neural networks remains an open problem. In this paper, we promote a shift of focus towards initialization rather than neural architecture or (stochastic) gradient descent to explain this implicit regularization. Through a Fourier lens, we derive a general result for the spectral bias of neural networks and show that the generalization of neural networks is heavily tied to their initialization. Further, we empirically solidify the developed theoretical insights using practical, deep networks. Finally, we make a case against the controversial flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks.

preprint2022arXiv

Learning Positional Embeddings for Coordinate-MLPs

We propose a novel method to enhance the performance of coordinate-MLPs by learning instance-specific positional embeddings. End-to-end optimization of positional embedding parameters along with network weights leads to poor generalization performance. Instead, we develop a generic framework to learn the positional embedding based on the classic graph-Laplacian regularization, which can implicitly balance the trade-off between memorization and generalization. This framework is then used to propose a novel positional embedding scheme, where the hyperparameters are learned per coordinate (i.e, instance) to deliver optimal performance. We show that the proposed embedding achieves better performance with higher stability compared to the well-established random Fourier features (RFF). Further, we demonstrate that the proposed embedding scheme yields stable gradients, enabling seamless integration into deep architectures as intermediate layers.

preprint2022arXiv

Long-term Visual Map Sparsification with Heterogeneous GNN

We address the problem of map sparsification for long-term visual localization. For map sparsification, a commonly employed assumption is that the pre-build map and the later captured localization query are consistent. However, this assumption can be easily violated in the dynamic world. Additionally, the map size grows as new data accumulate through time, causing large data overhead in the long term. In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization. Inspired by the recent progress in Graph Neural Network(GNN), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN, which enables us to directly exploit the rich information in the SfM map graph. Two novel supervisions are proposed: 1) a data-fitting term for selecting valuable points to future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage. The experiments show that our method selected map points on stable and widely visible structures and outperformed baselines in localization performance.

preprint2022arXiv

On Regularizing Coordinate-MLPs

We show that typical implicit regularization assumptions for deep neural networks (for regression) do not hold for coordinate-MLPs, a family of MLPs that are now ubiquitous in computer vision for representing high-frequency signals. Lack of such implicit bias disrupts smooth interpolations between training samples, and hampers generalizing across signal regions with different spectra. We investigate this behavior through a Fourier lens and uncover that as the bandwidth of a coordinate-MLP is enhanced, lower frequencies tend to get suppressed unless a suitable prior is provided explicitly. Based on these insights, we propose a simple regularization technique that can mitigate the above problem, which can be incorporated into existing networks without any architectural modifications.

preprint2022arXiv

Reframing Neural Networks: Deep Structure in Overcomplete Representations

In comparison to classical shallow representation learning techniques, deep neural networks have achieved superior performance in nearly every application benchmark. But despite their clear empirical advantages, it is still not well understood what makes them so effective. To approach this question, we introduce deep frame approximation: a unifying framework for constrained representation learning with structured overcomplete frames. While exact inference requires iterative optimization, it may be approximated by the operations of a feed-forward deep neural network. We indirectly analyze how model capacity relates to frame structures induced by architectural hyperparameters such as depth, width, and skip connections. We quantify these structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability. As a criterion for model selection, we show correlation with generalization error on a variety of common deep network architectures and datasets. We also demonstrate how recurrent networks implementing iterative optimization algorithms can achieve performance comparable to their feed-forward approximations while improving adversarial robustness. This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design with less reliance on ad-hoc engineering.

preprint2020arXiv

Dataless Model Selection with the Deep Frame Potential

Choosing a deep neural network architecture is a fundamental problem in applications that require balancing performance and parameter efficiency. Standard approaches rely on ad-hoc engineering or computationally expensive validation on a specific dataset. We instead attempt to quantify networks by their intrinsic capacity for unique and robust representations, enabling efficient architecture comparisons without requiring any data. Building upon theoretical connections between deep learning and sparse approximation, we propose the deep frame potential: a measure of coherence that is approximately related to representation stability but has minimizers that depend only on network structure. This provides a framework for jointly quantifying the contributions of architectural hyper-parameters such as depth, width, and skip connections. We validate its use as a criterion for model selection and demonstrate correlation with generalization error on a variety of common residual and densely connected network architectures.

preprint2020arXiv

High Accuracy Face Geometry Capture using a Smartphone Video

What's the most accurate 3D model of your face you can obtain while sitting at your desk? We attempt to answer this question in our work. High fidelity face reconstructions have so far been limited to either studio settings or through expensive 3D scanners. On the other hand, unconstrained reconstruction methods are typically limited by low-capacity models. Our method reconstructs accurate face geometry of a subject using a video shot from a smartphone in an unconstrained environment. Our approach takes advantage of recent advances in visual SLAM, keypoint detection, and object detection to improve accuracy and robustness. By not being constrained to a model subspace, our reconstructed meshes capture important details while being robust to noise and being topologically consistent. Our evaluations show that our method outperforms current single and multi-view baselines by a significant margin, both in terms of geometric accuracy and in capturing person-specific details important for making realistic looking models.

preprint2020arXiv

Joint Pose and Shape Estimation of Vehicles from LiDAR Data

We address the problem of estimating the pose and shape of vehicles from LiDAR scans, a common problem faced by the autonomous vehicle community. Recent work has tended to address pose and shape estimation separately in isolation, despite the inherent connection between the two. We investigate a method of jointly estimating shape and pose where a single encoding is learned from which shape and pose may be decoded in an efficient yet effective manner. We additionally introduce a novel joint pose and shape loss, and show that this joint training method produces better results than independently-trained pose and shape estimators. We evaluate our method on both synthetic data and real-world data, and show superior performance against a state-of-the-art baseline.

preprint2020arXiv

When to Use Convolutional Neural Networks for Inverse Problems

Reconstruction tasks in computer vision aim fundamentally to recover an undetermined signal from a set of noisy measurements. Examples include super-resolution, image denoising, and non-rigid structure from motion, all of which have seen recent advancements through deep learning. However, earlier work made extensive use of sparse signal reconstruction frameworks (e.g convolutional sparse coding). While this work was ultimately surpassed by deep learning, it rested on a much more developed theoretical framework. Recent work by Papyan et. al provides a bridge between the two approaches by showing how a convolutional neural network (CNN) can be viewed as an approximate solution to a convolutional sparse coding (CSC) problem. In this work we argue that for some types of inverse problems the CNN approximation breaks down leading to poor performance. We argue that for these types of problems the CSC approach should be used instead and validate this argument with empirical evidence. Specifically we identify JPEG artifact reduction and non-rigid trajectory reconstruction as challenging inverse problems for CNNs and demonstrate state of the art performance on them using a CSC method. Furthermore, we offer some practical improvements to this model and its application, and also show how insights from the CSC model can be used to make CNNs effective in tasks where their naive application fails.