Source author record

Stéphane Deny

Stéphane Deny appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Neurons and Cognition Artificial Intelligence Computer Vision cond-mat.dis-nn cond-mat.stat-mech Machine Learning

Catalog footprint

What is connected

4works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Progress and limitations of deep networks to recognize objects in unusual poses

Deep networks should be robust to rare events if they are to be successfully deployed in high-stakes real-world applications (e.g., self-driving cars). Here we study the capability of deep networks to recognize objects in unusual poses. We create a synthetic dataset of images of objects in unusual orientations, and evaluate the robustness of a collection of 38 recent and competitive deep networks for image classification. We show that classifying these images is still a challenge for all networks tested, with an average accuracy drop of 29.5% compared to when the objects are presented upright. This brittleness is largely unaffected by various network design choices, such as training losses (e.g., supervised vs. self-supervised), architectures (e.g., convolutional networks vs. transformers), dataset modalities (e.g., images vs. image-text pairs), and data-augmentation schemes. However, networks trained on very large datasets substantially outperform others, with the best network tested$\unicode{x2014}$Noisy Student EfficentNet-L2 trained on JFT-300M$\unicode{x2014}$showing a relatively small accuracy drop of only 14.5% on unusual poses. Nevertheless, a visual inspection of the failures of Noisy Student reveals a remaining gap in robustness with the human visual system. Furthermore, combining multiple object transformations$\unicode{x2014}$3D-rotations and scaling$\unicode{x2014}$further degrades the performance of all networks. Altogether, our results provide another measurement of the robustness of deep networks that is important to consider when using them in the real world. Code and datasets are available at https://github.com/amro-kamal/ObjectPose.

preprint2021arXiv

Addressing the Topological Defects of Disentanglement via Distributed Operators

A core challenge in Machine Learning is to learn to disentangle natural factors of variation in data (e.g. object shape vs. pose). A popular approach to disentanglement consists in learning to map each of these factors to distinct subspaces of a model's latent representation. However, this approach has shown limited empirical success to date. Here, we show that, for a broad family of transformations acting on images--encompassing simple affine transformations such as rotations and translations--this approach to disentanglement introduces topological defects (i.e. discontinuities in the encoder). Motivated by classical results from group representation theory, we study an alternative, more flexible approach to disentanglement which relies on distributed latent operators, potentially acting on the entire latent space. We theoretically and empirically demonstrate the effectiveness of this approach to disentangle affine transformations. Our work lays a theoretical foundation for the recent success of a new generation of models using distributed operators for disentanglement.

preprint2016arXiv

Nonlinear decoding of a complex movie from the mammalian retina

Retinal circuitry transforms spatiotemporal patterns of light into spiking activity of ganglion cells, which provide the sole visual input to the brain. Recent advances have led to a detailed characterization of retinal activity and stimulus encoding by large neural populations. The inverse problem of decoding, where the stimulus is reconstructed from spikes, has received less attention, in particular for complex input movies that should be reconstructed "pixel-by-pixel". We recorded around a hundred neurons from a dense patch in a rat retina and decoded movies of multiple small discs executing mutually-avoiding random motions. We constructed nonlinear (kernelized) decoders that improved significantly over linear decoding results, mostly due to their ability to reliably separate between neural responses driven by locally fluctuating light signals, and responses at locally constant light driven by spontaneous or network activity. This improvement crucially depended on the precise, non-Poisson temporal structure of individual spike trains, which originated in the spike-history dependence of neural responses. Our results suggest a general paradigm in which downstream neural circuitry could discriminate between spontaneous and stimulus-driven activity on the basis of higher-order statistical structure intrinsic to the incoming spike trains.

preprint2015arXiv

Dynamical criticality in the collective activity of a population of retinal neurons

Recent experimental results based on multi-electrode and imaging techniques have reinvigorated the idea that large neural networks operate near a critical point, between order and disorder. However, evidence for criticality has relied on the definition of arbitrary order parameters, or on models that do not address the dynamical nature of network activity. Here we introduce a novel approach to assess criticality that overcomes these limitations, while encompassing and generalizing previous criteria. We find a simple model to describe the global activity of large populations of ganglion cells in the rat retina, and show that their statistics are poised near a critical point. Taking into account the temporal dynamics of the activity greatly enhances the evidence for criticality, revealing it where previous methods would not. The approach is general and could be used in other biological networks.