Source author record

Tal Hassner

Tal Hassner appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Artificial Intelligence Graphics

Catalog footprint

What is connected

19works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Few-shot Learning with Noisy Labels

Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple, yet effective, feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on-par with leading FSL methods on clean support sets, yet outperforms them, by far, in the presence of label noise.

preprint2022arXiv

FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment

We present Face Swapping GAN (FSGAN) for face swapping and reenactment. Unlike previous work, we offer a subject agnostic swapping scheme that can be applied to pairs of faces without requiring training on those faces. We derive a novel iterative deep learning--based approach for face reenactment which adjusts significant pose and expression variations that can be applied to a single image or a video sequence. For video sequences, we introduce a continuous interpolation of the face views based on reenactment, Delaunay Triangulation, and barycentric coordinates. Occluded face regions are handled by a face completion network. Finally, we use a face blending network for seamless blending of the two faces while preserving the target skin color and lighting conditions. This network uses a novel Poisson blending loss combining Poisson optimization with a perceptual loss. We compare our approach to existing state-of-the-art systems and show our results to be both qualitatively and quantitatively superior. This work describes extensions of the FSGAN method, proposed in an earlier conference version of our work, as well as additional experiments and results.

preprint2022arXiv

Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy

We analyze new generalization bounds for deep learning models trained by transfer learning from a source to a target task. Our bounds utilize a quantity called the majority predictor accuracy, which can be computed efficiently from data. We show that our theory is useful in practice since it implies that the majority predictor accuracy can be used as a transferability measure, a fact that is also validated by our experiments.

preprint2022arXiv

Proactive Image Manipulation Detection

Image manipulation detection algorithms are often trained to discriminate between images manipulated with particular Generative Models (GMs) and genuine/real images, yet generalize poorly to images manipulated with GMs unseen in the training. Conventional detection algorithms receive an input image passively. By contrast, we propose a proactive scheme to image manipulation detection. Our key enabling technique is to estimate a set of templates which when added onto the real image would lead to more accurate manipulation detection. That is, a template protected real image, and its manipulated version, is better discriminated compared to the original real image vs. its manipulated one. These templates are estimated using certain constraints based on the desired properties of templates. For image manipulation detection, our proposed approach outperforms the prior work by an average precision of 16% for CycleGAN and 32% for GauGAN. Our approach is generalizable to a variety of GMs showing an improvement over prior work by an average precision of 10% averaged across 12 GMs. Our code is available at https://www.github.com/vishal3477/proactive_IMD.

preprint2020arXiv

DeepFake Detection Based on the Discrepancy Between the Face and its Context

We propose a method for detecting face swapping and other identity manipulations in single images. Face swapping methods, such as DeepFake, manipulate the face region, aiming to adjust the face to the appearance of its context, while leaving the context unchanged. We show that this modus operandi produces discrepancies between the two regions. These discrepancies offer exploitable telltale signs of manipulation. Our approach involves two networks: (i) a face identification network that considers the face region bounded by a tight semantic segmentation, and (ii) a context recognition network that considers the face context (e.g., hair, ears, neck). We describe a method which uses the recognition signals from our two networks to detect such discrepancies, providing a complementary detection signal that improves conventional real vs. fake classifiers commonly used for detecting fake images. Our method achieves state of the art results on the FaceForensics++, Celeb-DF-v2, and DFDC benchmarks for face manipulation detection, and even generalizes to detect fakes produced by unseen methods.

preprint2020arXiv

LEEP: A New Measure to Evaluate Transferability of Learned Representations

We introduce a new measure to evaluate the transferability of representations learned by classifiers. Our measure, the Log Expected Empirical Prediction (LEEP), is simple and easy to compute: when given a classifier trained on a source data set, it only requires running the target data set through this classifier once. We analyze the properties of LEEP theoretically and demonstrate its effectiveness empirically. Our analysis shows that LEEP can predict the performance and convergence speed of both transfer and meta-transfer learning methods, even for small or imbalanced data. Moreover, LEEP outperforms recently proposed transferability measures such as negative conditional entropy and H scores. Notably, when transferring from ImageNet to CIFAR100, LEEP can achieve up to 30% improvement compared to the best competing method in terms of the correlations with actual transfer accuracy.

preprint2020arXiv

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress. However, most of the current arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals. RPN relies heavily on manually designed anchors and its proposals are represented with axis-aligned rectangles. The former presents difficulties in handling text instances of extreme aspect ratios or irregular shapes, and the latter often includes multiple neighboring instances into a single proposal, in cases of densely oriented text. To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is anchor-free and gives accurate representations of arbitrary-shape proposals. It is therefore superior to RPN in detecting text instances of extreme aspect ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN allow masked RoI features to be used for decoupling neighboring text instances. As a result, our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise. Specifically, we outperform state-of-the-art methods by 21.9 percent on the Rotated ICDAR 2013 dataset (rotation robustness), 5.9 percent on the Total-Text dataset (shape robustness), and achieve state-of-the-art performance on the MSRA-TD500 dataset (aspect ratio robustness). Code is available at: https://github.com/MhLiao/MaskTextSpotterV3

preprint2016arXiv

Do We Really Need to Collect Millions of Faces for Effective Face Recognition?

Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes -- huge numbers of face images downloaded and labeled for identity -- it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training data sizes for face recognition systems. Rather than manually harvesting and labeling more faces, we simply synthesize them. We describe novel methods of enriching an existing dataset with important facial appearance variations by manipulating the faces it contains. We further apply this synthesis approach when matching query images represented using a standard convolutional neural network. The effect of training and testing with synthesized images is extensively tested on the LFW and IJB-A (verification and identification) benchmarks and Janus CS2. The performances obtained by our approach match state of the art results reported by systems trained on millions of downloaded images.

preprint2016arXiv

Face Recognition Using Deep Multi-Pose Representations

We introduce our method and system for face recognition using multiple pose-aware deep learning models. In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features. 3D rendering is used to generate multiple face poses from the input image. Sensitivity of the recognition system to pose variations is reduced since we use an ensemble of pose-specific CNN features. The paper presents extensive experimental results on the effect of landmark detection, CNN layer selection and pose model selection on the performance of the recognition pipeline. Our novel representation achieves better results than the state-of-the-art on IARPA's CS2 and NIST's IJB-A in both verification and identification (i.e. search) tasks.

preprint2016arXiv

Facial Landmark Detection with Tweaked Convolutional Neural Networks

We present a novel convolutional neural network (CNN) design for facial landmark coordinate regression. We examine the intermediate features of a standard CNN trained for landmark detection and show that features extracted from later, more specialized layers capture rough landmark locations. This provides a natural means of applying differential treatment midway through the network, tweaking processing based on facial alignment. The resulting Tweaked CNN model (TCNN) harnesses the robustness of CNNs for landmark detection, in an appearance-sensitive manner without training multi-part or multi-scale models. Our results on standard face landmark detection and face verification benchmarks show TCNN to surpasses previously published performances by wide margins.

preprint2016arXiv

Pooling Faces: Template based Face Recognition with Pooled Face Images

We propose a novel approach to template based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage on an approach which was proven effective in many other domains, but, to our knowledge, never fully explored for face images: average pooling of face photos. We show how (and why!) the space of a template's images can be partitioned and then pooled based on image quality and head pose and the effect this has on accuracy and template size. We perform extensive tests on the IJB-A and Janus CS2 template based face identification and verification benchmarks. These show that not only does our approach outperform published state of the art despite requiring far fewer cross template comparisons, but also, surprisingly, that image pooling performs on par with deep feature pooling.

preprint2016arXiv

Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable and change for different photos of the same subject or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state of the art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.

preprint2016arXiv

The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better

Accuracy, descriptor size, and the time required for extraction and matching are all important factors when selecting local image descriptors. To optimize over all these requirements, this paper presents a CUDA port for the recent Learned Arrangement of Three Patches (LATCH) binary descriptors to the GPU platform. The design of LATCH makes it well suited for GPU processing. Owing to its small size and binary nature, the GPU can further be used to efficiently match LATCH features. Taken together, this leads to breakneck descriptor extraction and matching speeds. We evaluate the trade off between these speeds and the quality of results in a feature matching intensive application. To this end, we use our proposed CUDA LATCH (CLATCH) to recover structure from motion (SfM), comparing 3D reconstructions and speed using different representations. Our results show that CLATCH provides high quality 3D reconstructions at fractions of the time required by other representations, with little, if any, loss of reconstruction quality.

preprint2015arXiv

GPU-Based Computation of 2D Least Median of Squares with Applications to Fast and Robust Line Detection

The 2D Least Median of Squares (LMS) is a popular tool in robust regression because of its high breakdown point: up to half of the input data can be contaminated with outliers without affecting the accuracy of the LMS estimator. The complexity of 2D LMS estimation has been shown to be $Ω(n^2)$ where $n$ is the total number of points. This high theoretical complexity along with the availability of graphics processing units (GPU) motivates the development of a fast, parallel, GPU-based algorithm for LMS computation. We present a CUDA based algorithm for LMS computation and show it to be much faster than the optimal state of the art single threaded CPU algorithm. We begin by describing the proposed method and analyzing its performance. We then demonstrate how it can be used to modify the well-known Hough Transform (HT) in order to efficiently detect image lines in noisy images. Our method is compared with standard HT-based line detection methods and shown to overcome their shortcomings in terms of both efficiency and accuracy.

preprint2015arXiv

LATCH: Learned Arrangements of Three Patch Codes

We present a novel means of describing local image appearances using binary strings. Binary descriptors have drawn increasing interest in recent years due to their speed and low memory footprint. A known shortcoming of these representations is their inferior performance compared to larger, histogram based descriptors such as the SIFT. Our goal is to close this performance gap while maintaining the benefits attributed to binary representations. To this end we propose the Learned Arrangements of Three Patch Codes descriptors, or LATCH. Our key observation is that existing binary descriptors are at an increased risk from noise and local appearance variations. This, as they compare the values of pixel pairs; changes to either of the pixels can easily lead to changes in descriptor values, hence damaging its performance. In order to provide more robustness, we instead propose a novel means of comparing pixel patches. This ostensibly small change, requires a substantial redesign of the descriptors themselves and how they are produced. Our resulting LATCH representation is rigorously compared to state-of-the-art binary descriptors and shown to provide far better performance for similar computation and space requirements.

preprint2015arXiv

Wide baseline stereo matching with convex bounded-distortion constraints

Finding correspondences in wide baseline setups is a challenging problem. Existing approaches have focused largely on developing better feature descriptors for correspondence and on accurate recovery of epipolar line constraints. This paper focuses on the challenging problem of finding correspondences once approximate epipolar constraints are given. We introduce a novel method that integrates a deformation model. Specifically, we formulate the problem as finding the largest number of corresponding points related by a bounded distortion map that obeys the given epipolar constraints. We show that, while the set of bounded distortion maps is not convex, the subset of maps that obey the epipolar line constraints is convex, allowing us to introduce an efficient algorithm for matching. We further utilize a robust cost function for matching and employ majorization-minimization for its optimization. Our experiments indicate that our method finds significantly more accurate maps than existing approaches.

preprint2014arXiv

Dense Correspondences Across Scenes and Scales

We seek a practical method for establishing dense correspondences between two images with similar content, but possibly different 3D scenes. One of the challenges in designing such a system is the local scale differences of objects appearing in the two images. Previous methods often considered only small subsets of image pixels; matching only pixels for which stable scales may be reliably estimated. More recently, others have considered dense correspondences, but with substantial costs associated with generating, storing and matching scale invariant descriptors. Our work here is motivated by the observation that pixels in the image have contexts -- the pixels around them -- which may be exploited in order to estimate local scales reliably and repeatably. Specifically, we make the following contributions. (i) We show that scales estimated in sparse interest points may be propagated to neighboring pixels where this information cannot be reliably determined. Doing so allows scale invariant descriptors to be extracted anywhere in the image, not just in detected interest points. (ii) We present three different means for propagating this information: using only the scales at detected interest points, using the underlying image information to guide the propagation of this information across each image, separately, and using both images simultaneously. Finally, (iii), we provide extensive results, both qualitative and quantitative, demonstrating that accurate dense correspondences can be obtained even between very different images, with little computational costs beyond those required by existing methods.

preprint2014arXiv

Effective Face Frontalization in Unconstrained Images

"Frontalization" is the process of synthesizing frontal facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems. This, by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints to the easier problem of recognizing faces in constrained, forward facing poses. Previous frontalization methods did this by attempting to approximate 3D facial shapes for each query image. We observe that 3D face shape estimation from unconstrained photos may be a harder problem than frontalization and can potentially introduce facial misalignments. Instead, we explore the simpler approach of using a single, unmodified, 3D surface as an approximation to the shape of all input faces. We show that this leads to a straightforward, efficient and easy to implement method for frontalization. More importantly, it produces aesthetic new frontal views and is surprisingly effective when used for face recognition and gender estimation.

preprint2013arXiv

Single View Depth Estimation from Examples

We describe a non-parametric, "example-based" method for estimating the depth of an object, viewed in a single photo. Our method consults a database of example 3D geometries, searching for those which look similar to the object in the photo. The known depths of the selected database objects act as shape priors which constrain the process of estimating the object's depth. We show how this process can be performed by optimizing a well defined target likelihood function, via a hard-EM procedure. We address the problem of representing the (possibly infinite) variability of viewing conditions with a finite (and often very small) example set, by proposing an on-the-fly example update scheme. We further demonstrate the importance of non-stationarity in avoiding misleading examples when estimating structured shapes. We evaluate our method and present both qualitative as well as quantitative results for challenging object classes. Finally, we show how this same technique may be readily applied to a number of related problems. These include the novel task of estimating the occluded depth of an object's backside and the task of tailoring custom fitting image-maps for input depths.

Tal Hassner

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

Few-shot Learning with Noisy Labels

FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment

Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy

Proactive Image Manipulation Detection

DeepFake Detection Based on the Discrepancy Between the Face and its Context

LEEP: A New Measure to Evaluate Transferability of Learned Representations

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

Do We Really Need to Collect Millions of Faces for Effective Face Recognition?

Face Recognition Using Deep Multi-Pose Representations

Facial Landmark Detection with Tweaked Convolutional Neural Networks

Pooling Faces: Template based Face Recognition with Pooled Face Images

Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better

GPU-Based Computation of 2D Least Median of Squares with Applications to Fast and Robust Line Detection

LATCH: Learned Arrangements of Three Patch Codes

Wide baseline stereo matching with convex bounded-distortion constraints

Dense Correspondences Across Scenes and Scales

Effective Face Frontalization in Unconstrained Images

Single View Depth Estimation from Examples