Source author record

Yedid Hoshen

Yedid Hoshen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language Cryptography and Security eess.AS eess.IV Graphics Sound

Catalog footprint

What is connected

16works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Alterbute: Editing Intrinsic Attributes of Objects in Images

We introduce Alterbute, a diffusion-based method for editing an object's intrinsic attributes in an image. We allow changing color, texture, material, and even the shape of an object, while preserving its perceived identity and scene context. Existing approaches either rely on unsupervised priors that often fail to preserve identity or use overly restrictive supervision that prevents meaningful intrinsic variations. Our method relies on: (i) a relaxed training objective that allows the model to change both intrinsic and extrinsic attributes conditioned on an identity reference image, a textual prompt describing the target intrinsic attributes, and a background image and object mask defining the extrinsic context. At inference, we restrict extrinsic changes by reusing the original background and object mask, thereby ensuring that only the desired intrinsic attributes are altered; (ii) Visual Named Entities (VNEs) - fine-grained visual identity categories (e.g., ''Porsche 911 Carrera'') that group objects sharing identity-defining features while allowing variation in intrinsic attributes. We use a vision-language model to automatically extract VNE labels and intrinsic attribute descriptions from a large public image dataset, enabling scalable, identity-preserving supervision. Alterbute outperforms existing methods on identity-preserving object intrinsic attribute editing.

preprint2026arXiv

Progressive Photorealistic Simplification

Existing image simplification techniques often rely on Non-Photorealistic Rendering (NPR), transforming photographs into stylized sketches, cartoons, or paintings. While effective at reducing visual complexity, such approaches typically sacrifice photographic realism. In this work, we explore a complementary direction: simplifying images while preserving their photorealistic appearance. We introduce progressive semantic image simplification, a framework that iteratively reduces scene complexity by removing and inpainting elements in a controlled manner. At each step, the resulting image remains a plausible natural photograph. Our method combines semantic understanding with generative editing, leveraging Vision-Language Models (VLMs) to identify and prioritize elements for removal, and a learned verifier to ensure photorealism and coherence throughout the process. This is implemented via an iterative Select-Remove-Verify pipeline that produces high-quality simplification trajectories. To improve efficiency, we further distill this process into an image-to-video generation model that directly predicts coherent simplification sequences from a single input image. Beyond generating cleaner and more focused compositions, our approach enables applications such as content-aware decluttering, semantic layer decomposition, and interactive editing. More broadly, our work suggests that simplification through structured content removal can serve as a practical mechanism for guiding visual interpretation within the photorealistic domain, complementing traditional abstraction methods.

preprint2022arXiv

A Contrastive Objective for Learning Disentangled Representations

Learning representations of images that are invariant to sensitive or unwanted attributes is important for many tasks including bias removal and cross domain retrieval. Here, our objective is to learn representations that are invariant to the domain (sensitive attribute) for which labels are provided, while being informative over all other image attributes, which are unlabeled. We present a new approach, proposing a new domain-wise contrastive objective for ensuring invariant representations. This objective crucially restricts negative image pairs to be drawn from the same domain, which enforces domain invariance whereas the standard contrastive objective does not. This domain-wise objective is insufficient on its own as it suffers from shortcut solutions resulting in feature suppression. We overcome this issue by a combination of a reconstruction constraint, image augmentations and initialization with pre-trained weights. Our analysis shows that the choice of augmentations is important, and that a misguided choice of augmentations can harm the invariance and informativeness objectives. In an extensive evaluation, our method convincingly outperforms the state-of-the-art in terms of representation invariance, representation informativeness, and training speed. Furthermore, we find that in some cases our method can achieve excellent results even without the reconstruction constraint, leading to a much faster and resource efficient training.

preprint2022arXiv

Red PANDA: Disambiguating Anomaly Detection by Removing Nuisance Factors

Anomaly detection methods strive to discover patterns that differ from the norm in a semantic way. This goal is ambiguous as a data point differing from the norm by an attribute e.g., age, race or gender, may be considered anomalous by some operators while others may consider this attribute irrelevant. Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute from being considered as relevant for anomaly detection. Our approach then learns representations which do not contain information over the nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our approach does not require specifying the attributes that are relevant for detecting anomalies, which is typically impossible in anomaly detection, but only attributes to ignore. An empirical investigation is presented verifying the effectiveness of our approach.

preprint2022arXiv

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example, than it can between text segments that appeared in different training examples. This intuitive result has a twofold role. First, it formalizes the motivation behind a broad line of recent successful NLM training heuristics, proposed for the pretraining and fine-tuning stages, which do not necessarily appear related at first glance. Second, our result clearly indicates further improvements to be made in NLM pretraining for the benefit of Natural Language Understanding tasks. As an example, we propose "kNN-Pretraining": we show that including semantically related non-neighboring sentences in the same pretraining example yields improved sentence representations and open domain question answering abilities. This theoretically motivated degree of freedom for pretraining example design indicates new training schemes for self-improving representations.

preprint2022arXiv

Time Series Anomaly Detection by Cumulative Radon Features

Detecting anomalous time series is key for scientific, medical and industrial tasks, but is challenging due to its inherent unsupervised nature. In recent years, progress has been made on this task by learning increasingly more complex features, often using deep neural networks. In this work, we argue that shallow features suffice when combined with distribution distance measures. Our approach models each time series as a high dimensional empirical distribution of features, where each time-point constitutes a single sample. Modeling the distance between a test time series and the normal training set therefore requires efficiently measuring the distance between multivariate probability distributions. We show that by parameterizing each time series using cumulative Radon features, we are able to efficiently and effectively model the distribution of normal time series. Our theoretically grounded but simple-to-implement approach is evaluated on multiple datasets and shown to achieve better results than established, classical methods as well as complex, state-of-the-art deep learning methods. Code is provided.

preprint2022arXiv

Unsupervised Word Segmentation using K Nearest Neighbors

In this paper, we propose an unsupervised kNN-based approach for word segmentation in speech utterances. Our method relies on self-supervised pre-trained speech representations, and compares each audio segment of a given utterance to its K nearest neighbors within the training set. Our main assumption is that a segment containing more than one word would occur less often than a segment containing a single word. Our method does not require phoneme discovery and is able to operate directly on pre-trained audio representations. This is in contrast to current methods that use a two-stage approach; first detecting the phonemes in the utterance and then detecting word-boundaries according to statistics calculated on phoneme patterns. Experiments on two datasets demonstrate improved results over previous single-stage methods and competitive results on state-of-the-art two-stage methods.

preprint2021arXiv

Crypto-Oriented Neural Architecture Design

As neural networks revolutionize many applications, significant privacy conflicts between model users and providers emerge. The cryptography community developed a variety of techniques for secure computation to address such privacy issues. As generic techniques for secure computation are typically prohibitively ineffective, many efforts focus on optimizing their underlying cryptographic tools. Differently, we propose to optimize the initial design of crypto-oriented neural architectures and provide a novel Partial Activation layer. The proposed layer is much faster for secure computation. Evaluating our method on three state-of-the-art architectures (SqueezeNet, ShuffleNetV2, and MobileNetV2) demonstrates significant improvement to the efficiency of secure inference on common evaluation metrics.

preprint2021arXiv

Sub-Image Anomaly Detection with Deep Pyramid Correspondences

Nearest neighbor (kNN) methods utilizing deep pre-trained features exhibit very strong anomaly detection performance when applied to entire images. A limitation of kNN methods is the lack of segmentation map describing where the anomaly lies inside the image. In this work we present a novel anomaly segmentation approach based on alignment between an anomalous image and a constant number of the similar normal images. Our method, Semantic Pyramid Anomaly Detection (SPADE) uses correspondences based on a multi-resolution feature pyramid. SPADE is shown to achieve state-of-the-art performance on unsupervised anomaly detection and localization while requiring virtually no training time.

preprint2020arXiv

Classification-Based Anomaly Detection for General Data

Anomaly detection, finding patterns that substantially deviate from those seen previously, is one of the fundamental problems of artificial intelligence. Recently, classification-based methods were shown to achieve superior results on this task. In this work, we present a unifying view and propose an open-set method, GOAD, to relax current generalization assumptions. Furthermore, we extend the applicability of transformation-based methods to non-image data using random affine transformations. Our method is shown to obtain state-of-the-art accuracy and is applicable to broad data types. The strong performance of our method is extensively validated on multiple datasets from different domains.

preprint2020arXiv

Deep Nearest Neighbor Anomaly Detection

Nearest neighbors is a successful and long-standing technique for anomaly detection. Significant progress has been recently achieved by self-supervised deep methods (e.g. RotNet). Self-supervised features however typically under-perform Imagenet pre-trained features. In this work, we investigate whether the recent progress can indeed outperform nearest-neighbor methods operating on an Imagenet pretrained feature space. The simple nearest-neighbor based-approach is experimentally shown to outperform self-supervised methods in: accuracy, few shot generalization, training time and noise robustness while making fewer assumptions on image distributions.

preprint2020arXiv

Demystifying Inter-Class Disentanglement

Learning to disentangle the hidden factors of variations within a set of observations is a key task for artificial intelligence. We present a unified formulation for class and content disentanglement and use it to illustrate the limitations of current methods. We therefore introduce LORD, a novel method based on Latent Optimization for Representation Disentanglement. We find that latent optimization, along with an asymmetric noise regularization, is superior to amortized inference for achieving disentangled representations. In extensive experiments, our method is shown to achieve better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision. We further introduce a clustering-based approach for extending our method for settings that exhibit in-class variation with promising results on the task of domain translation.

preprint2020arXiv

Improving Style-Content Disentanglement in Image-to-Image Translation

Unsupervised image-to-image translation methods have achieved tremendous success in recent years. However, it can be easily observed that their models contain significant entanglement which often hurts the translation performance. In this work, we propose a principled approach for improving style-content disentanglement in image-to-image translation. By considering the information flow into each of the representations, we introduce an additional loss term which serves as a content-bottleneck. We show that the results of our method are significantly more disentangled than those produced by current methods, while further improving the visual quality and translation diversity.

preprint2020arXiv

Training End-to-end Single Image Generators without GANs

We present AugurOne, a novel approach for training single image generative models. Our approach trains an upscaling neural network using non-affine augmentations of the (single) input image, particularly including non-rigid thin plate spline image warps. The extensive augmentations significantly increase the in-sample distribution for the upsampling network enabling the upscaling of highly variable inputs. A compact latent space is jointly learned allowing for controlled image synthesis. Differently from Single Image GAN, our approach does not require GAN training and takes place in an end-to-end fashion allowing fast and stable training. We experimentally evaluate our method and show that it obtains compelling novel animations of single-image, as well as, state-of-the-art performance on conditional generation tasks e.g. paint-to-image and edges-to-image.

preprint2015arXiv

An Egocentric Look at Video Photographer Identity

Egocentric cameras are being worn by an increasing number of users, among them many security forces worldwide. GoPro cameras already penetrated the mass market, reporting substantial increase in sales every year. As head-worn cameras do not capture the photographer, it may seem that the anonymity of the photographer is preserved even when the video is publicly distributed. We show that camera motion, as can be computed from the egocentric video, provides unique identity information. The photographer can be reliably recognized from a few seconds of video captured when walking. The proposed method achieves more than 90% recognition accuracy in cases where the random success rate is only 3%. Applications can include theft prevention by locking the camera when not worn by its rightful owner. Searching video sharing services (e.g. YouTube) for egocentric videos shot by a specific photographer may also become possible. An important message in this paper is that photographers should be aware that sharing egocentric video will compromise their anonymity, even when their face is not visible.

preprint2015arXiv

Live Video Synopsis for Multiple Cameras

Video surveillance cameras generate most of recorded video, and there is far more recorded video than operators can watch. Much progress has recently been made using summarization of recorded video, but such techniques do not have much impact on live video surveillance. We assume a camera hierarchy where a Master camera observes the decision-critical region, and one or more Slave cameras observe regions where past activity is important for making the current decision. We propose that when people appear in the live Master camera, the Slave cameras will display their past activities, and the operator could use past information for real-time decision making. The basic units of our method are action tubes, representing objects and their trajectories over time. Our object-based method has advantages over frame based methods, as it can handle multiple people, multiple activities for each person, and can address re-identification uncertainty.

Yedid Hoshen

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Alterbute: Editing Intrinsic Attributes of Objects in Images

Progressive Photorealistic Simplification

A Contrastive Objective for Learning Disentangled Representations

Red PANDA: Disambiguating Anomaly Detection by Removing Nuisance Factors

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

Time Series Anomaly Detection by Cumulative Radon Features

Unsupervised Word Segmentation using K Nearest Neighbors

Crypto-Oriented Neural Architecture Design

Sub-Image Anomaly Detection with Deep Pyramid Correspondences

Classification-Based Anomaly Detection for General Data

Deep Nearest Neighbor Anomaly Detection

Demystifying Inter-Class Disentanglement

Improving Style-Content Disentanglement in Image-to-Image Translation

Training End-to-end Single Image Generators without GANs

An Egocentric Look at Video Photographer Identity

Live Video Synopsis for Multiple Cameras