Source author record

Bruno Olshausen

Bruno Olshausen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence, Computer Vision, cond-mat.dis-nn, eess.IV, Machine Learning, Neurons and Cognition

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

An Interactive Annotation Tool for Perceptual Video Compression

Human perception is at the core of lossy video compression, yet it is challenging to collect data that is sufficiently dense to drive compression. In perceptual quality assessment, human feedback is typically collected as a single scalar quality score indicating preference of one distorted video over another. In reality, some videos may be better in some parts but not in others. We propose an approach to collecting finer-grained feedback by asking users to use an interactive tool to directly optimize for perceptual quality given a fixed bitrate. To this end, we built a novel web tool that allows users to paint spatio-temporal importance maps over videos. The tool allows for interactive successive refinement: we iteratively re-encode the original video according to the painted importance maps while maintaining the same bitrate, allowing the user to see the trade-off of assigning higher importance to one spatio-temporal part of the video at the cost of others. We use this tool to collect data in the wild (10 videos, 17 users) and use the obtained importance maps in the context of x264 coding. A subjective study shows that the tool can indeed be used to generate videos that, at the same bitrate, look perceptually better and are 1.9 times more likely to be preferred by viewers. The code for the tool and dataset can be found at https://github.com/jenyap/video-annotation-tool.git
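One way to picture how a painted importance map could steer an encoder under a fixed budget is to turn it into per-block quantizer offsets that average to zero. The sketch below is an illustration under stated assumptions, not the tool's actual pipeline: the function name `importance_to_qp_offsets`, the `strength` parameter, and the zero-mean heuristic are all hypothetical, and an encoder's own rate control would still be needed to hit the target bitrate exactly.

```python
def importance_to_qp_offsets(importance, strength=6.0):
    """Map a 2-D grid of block importances in [0, 1] to zero-mean QP offsets.

    Hypothetical helper: blocks painted as more important get a negative
    offset (finer quantization, more bits), and the rest get a positive
    offset, so the offsets sum to zero across the frame.
    """
    flat = [value for row in importance for value in row]
    mean = sum(flat) / len(flat)
    return [[-strength * (value - mean) for value in row] for row in importance]


# A 2x2 frame where only the top-left block was painted as important.
painted = [[1.0, 0.0],
           [0.0, 0.0]]
offsets = importance_to_qp_offsets(painted)
```

Centering the offsets is only a crude stand-in for "same bitrate": it keeps the average quantizer unchanged, which shifts bits toward painted regions without claiming to preserve the exact file size.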

Preprint, 2022, arXiv

RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior

Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key ideas of the renormalization group (RG) and a sparse prior distribution to design a hierarchical flow-based generative model, RG-Flow, which can separate information at different scales of images and extract disentangled representations at each scale. We demonstrate our method on synthetic multi-scale image datasets and the CelebA dataset, showing that the disentangled representations enable semantic manipulation and style mixing of the images at different scales. To visualize the latent representations, we introduce receptive fields for flow-based models and show that the receptive fields of RG-Flow are similar to those of convolutional neural networks. In addition, we replace the widely adopted isotropic Gaussian prior distribution with a sparse Laplacian distribution to further enhance the disentanglement of representations. From a theoretical perspective, our proposed method has $O(\log L)$ complexity for inpainting of an image with edge length $L$, compared to previous generative models with $O(L^2)$ complexity.
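The difference between the two priors mentioned in the abstract can be seen with a small numerical comparison. The sketch below is not taken from the RG-Flow code; the function names and the two example latent vectors are assumptions chosen for illustration. It evaluates a factorized Gaussian and a factorized Laplace log-density on two latent vectors that have the same Euclidean norm, one sparse and one dense.

```python
import math


def gaussian_log_prob(z, sigma=1.0):
    """Log-density of an isotropic Gaussian prior, summed over dimensions."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * (x / sigma) ** 2 - norm for x in z)


def laplace_log_prob(z, b=1.0):
    """Log-density of a factorized Laplace prior, summed over dimensions."""
    norm = math.log(2.0 * b)
    return sum(-abs(x) / b - norm for x in z)


# Two latent vectors with the same L2 norm (both have squared norm 4).
sparse = [2.0, 0.0, 0.0, 0.0]
dense = [1.0, 1.0, 1.0, 1.0]

gaussian_gap = gaussian_log_prob(sparse) - gaussian_log_prob(dense)
laplace_gap = laplace_log_prob(sparse) - laplace_log_prob(dense)
```

The isotropic Gaussian depends only on the norm of the latent vector, so it cannot tell the two apart, while the Laplace prior assigns higher density to the sparse one. This is the usual intuition for why a Laplace prior encourages codes in which few latent variables are active.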

Preprint, 2013, arXiv

Testing our conceptual understanding of V1 function

Here we test our conceptual understanding of V1 function by asking two experimental questions: 1) How do neurons respond to the spatiotemporal structure contained in dynamic, natural scenes? and 2) What is the true range of visual responsiveness and predictability of neural responses obtained in an unbiased sample of neurons across all layers of cortex? We address these questions by recording responses to natural movie stimuli with 32-channel silicon probes. By simultaneously recording from cells in all layers, and taking all recorded cells, we reduce the recording bias that results from "hunting" for neural responses evoked by drifting bars and gratings. A nonparametric model reveals that many cells that are visually responsive do not appear to be captured by standard receptive field models. Using nonlinear Radial Basis Function (RBF) kernels in a support vector machine, we can explain the responses of some of these cells better than standard linear and phase-invariant complex cell models. This suggests that V1 neurons exhibit more complex and diverse responses than standard models can capture, ranging from simple and complex cells strongly driven by their classical receptive fields, to cells with more nonlinear receptive fields inferred from the nonparametric and RBF models, and cells that are not visually responsive despite robust firing.
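For orientation, the three model classes compared in this abstract can be written in their common textbook forms. These are generic formulations, not necessarily the exact parameterizations fitted in the paper; the symbols $s$ (stimulus vector), $w$, $w_1$, $w_2$ (linear filters), $b$ (offset), $\alpha_i$ (support-vector weights), $s_i$ (training stimuli) and $\gamma$ (kernel width) are notation assumed here for illustration.

```latex
% Linear receptive-field model
r_{\text{lin}}(s) = w^\top s + b

% Phase-invariant complex-cell (energy) model with a quadrature pair of filters
r_{\text{cx}}(s) = (w_1^\top s)^2 + (w_2^\top s)^2

% Kernel machine with a Radial Basis Function kernel
r_{\text{rbf}}(s) = \sum_i \alpha_i \, k(s_i, s) + b,
\qquad k(s_i, s) = \exp\!\left(-\gamma \, \lVert s_i - s \rVert^2\right)
```

The linear and energy models each commit to one fixed functional form, whereas the RBF predictor is built from similarities to training stimuli and can therefore approximate a much broader class of stimulus-response mappings.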