Source author record

Gerald Friedland

Gerald Friedland appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Multimedia Machine Learning Computer Vision cs.CY Artificial Intelligence Computation and Language Performance Robotics Sound

Catalog footprint

What is connected

8works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

From Tinkering to Engineering: Measurements in Tensorflow Playground

In this article, we present an extension of the Tensorflow Playground, called Tensorflow Meter (short TFMeter). TFMeter is an interactive neural network architecting tool that allows the visual creation of different architectures of neural networks. In addition to its ancestor, the playground, our tool shows information-theoretic measurements while constructing, training, and testing the network. As a result, each change results in a change in at least one of the measurements, providing for a better engineering intuition of what different architectures are able to learn. The measurements are derived from various places in the literature. In this demo, we describe our web application that is available online at http://tfmeter.icsi.berkeley.edu/ and argue that in the same way that the original Playground is meant to build an intuition about neural networks, our extension educates users on available measurements, which we hope will ultimately improve experimental design and reproducibility in the field.

preprint2021arXiv

Multi-modal Ensemble Models for Predicting Video Memorability

Modeling media memorability has been a consistent challenge in the field of machine learning. The Predicting Media Memorability task in MediaEval2020 is the latest benchmark among similar challenges addressing this topic. Building upon techniques developed in previous iterations of the challenge, we developed ensemble methods with the use of extracted video, image, text, and audio features. Critically, in this work we introduce and demonstrate the efficacy and high generalizability of extracted audio embeddings as a feature for the task of predicting media memorability.

preprint2021arXiv

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Origami is becoming more and more relevant to research. However, there is no public dataset yet available and there hasn't been any research on this topic in machine learning. We constructed an origami dataset using images from the multimedia commons and other databases. It consists of two subsets: one for classification of origami images and the other for difficulty estimation. We obtained 16000 images for classification (half origami, half other objects) and 1509 for difficulty estimation with $3$ different categories (easy: 764, intermediate: 427, complex: 318). The data can be downloaded at: https://github.com/multimedia-berkeley/OriSet. Finally, we provide machine learning baselines.

preprint2020arXiv

Efficient Saliency Maps for Explainable AI

We describe an explainable AI saliency map method for use with deep convolutional neural networks (CNN) that is much more efficient than popular fine-resolution gradient methods. It is also quantitatively similar or better in accuracy. Our technique works by measuring information at the end of each network scale which is then combined into a single saliency map. We describe how saliency measures can be made more efficient by exploiting Saliency Map Order Equivalence. We visualize individual scale/layer contributions by using a Layer Ordered Visualization of Information. This provides an interesting comparison of scale information contributions within the network not provided by other saliency map methods. Using our method instead of Guided Backprop, coarse-resolution class activation methods such as Grad-CAM and Grad-CAM++ seem to yield demonstrably superior results without sacrificing speed. This will make fine-resolution saliency methods feasible on resource limited platforms such as robots, cell phones, low-cost industrial devices, astronomy and satellite imagery.

preprint2016arXiv

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively. Our experiments used the YLI-MED dataset (an open TRECVID-style video corpus based on YFCC100M), which includes ten events. The results show that the proposed DCAR representation consistently outperforms state-of-the-art audio representations. DCAR's advantage over i-vector, mv-vector, and GMM representations is significant for both easier and harder discrimination tasks. We discuss how these performance differences across easy and hard cases follow from how each type of model leverages (or doesn't leverage) the intrinsic structure of the data. Furthermore, DCAR shows a particularly notable accuracy advantage on events where humans have more difficulty classifying the videos, i.e., events with lower mean annotator confidence.

preprint2016arXiv

Where to be wary: The impact of widespread photo-taking and image enhancement practices on users' geo-privacy

Today's geo-location estimation approaches are able to infer the location of a target image using its visual content alone. These approaches exploit visual matching techniques, applied to a large collection of background images with known geo-locations. Users who are unaware that visual retrieval approaches can compromise their geo-privacy, unwittingly open themselves to risks of crime or other unintended consequences. Private photo sharing is not able to protect users effectively, since its inconvenience is a barrier to consistent use, and photos can still fall into the wrong hands if they are re-shared. This paper lays the groundwork for a new approach to geo-privacy of social images: Instead of requiring a complete change of user behavior, we investigate the protection potential latent in users existing practices. We carry out a series of retrieval experiments using a large collection of social images (8.5M) to systematically analyze where users should be wary, and how both photo taking and editing practices impact the performance of geo-location estimation. We find that practices that are currently widespread are already sufficient to protect single-handedly the geo-location ('geo-cloak') up to more than 50% of images whose location would otherwise be automatically predictable. Our conclusion is that protecting users against the unwanted effects of visual retrieval is a viable research field, and should take as its starting point existing user practices.

preprint2016arXiv

YFCC100M: The New Data in Multimedia Research

We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset.

preprint2015arXiv

The YLI-MED Corpus: Characteristics, Procedures, and Plans

The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI feature corpus, which is being developed by the International Computer Science Institute and Lawrence Livermore National Laboratory based on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. The videos in YLI-MED are categorized as depicting one of ten target events, or no target event, and are annotated for additional attributes like language spoken and whether the video has a musical score. The annotations also include degree of annotator agreement and average annotator confidence scores for the event categorization of each video. Version 1.0 of YLI-MED includes 1823 "positive" videos that depict the target events and 48,138 "negative" videos, as well as 177 supplementary videos that are similar to event videos but are not positive examples. Our goal in producing YLI-MED is to be as open about our data and procedures as possible. This report describes the procedures used to collect the corpus; gives detailed descriptive statistics about the corpus makeup (and how video attributes affected annotators' judgments); discusses possible biases in the corpus introduced by our procedural choices and compares it with the most similar existing dataset, TRECVID MED's HAVIC corpus; and gives an overview of our future plans for expanding the annotation effort.

Gerald Friedland

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

From Tinkering to Engineering: Measurements in Tensorflow Playground

Multi-modal Ensemble Models for Predicting Video Memorability

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Efficient Saliency Maps for Explainable AI

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

Where to be wary: The impact of widespread photo-taking and image enhancement practices on users' geo-privacy

YFCC100M: The New Data in Multimedia Research

The YLI-MED Corpus: Characteristics, Procedures, and Plans