Source author record

Amir Bar

Amir Bar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

cond-mat.stat-mech, Computer Vision, Biological Physics, Artificial Intelligence, Biomolecules, Machine Learning, math.PR

Catalog footprint

What is connected

11 works

7 topics

4 close collaborators

preprint | 2026 | arXiv

Lifting Embodied World Models for Planning and Control

World models of embodied agents predict future observations conditioned on an action taken by the agent. For complex embodiments, action spaces are high-dimensional and difficult to specify: for example, precisely controlling a human agent requires specifying the motion of each joint. This makes the world model hard to control and expensive to plan with, as search-based methods like the cross-entropy method (CEM) scale poorly with action dimensionality. To address this issue, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model produces a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis, head, hands). Waypoints are low-dimensional, visually interpretable, and easy to specify manually or to search over. We show that the lifted world model substantially outperforms searching directly in low-level joint space ($3.8\times$ lower mean joint error to the goal pose), while remaining more compute-efficient and generalizing to environments unseen by the policy.
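The composition described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `low_level_policy`, `world_model`, and `cem_plan` are hypothetical stand-ins (a point mass in 2D, not a human embodiment), and the paper's learned policy and world model are not reproduced. The sketch only shows the shape of the idea: search with CEM over a 2-dimensional waypoint instead of over a long sequence of low-level actions.

```python
import numpy as np

def low_level_policy(waypoint, horizon=5):
    """Hypothetical policy: expand one 2D waypoint into `horizon` equal low-level steps."""
    return np.tile(np.asarray(waypoint, dtype=float) / horizon, (horizon, 1))

def world_model(state, action):
    """Toy frozen world model (assumption): the state moves by the action."""
    return state + action

def lifted_world_model(state, waypoint, horizon=5):
    """Compose policy and world model: one high-level action -> sequence of states."""
    states = []
    for action in low_level_policy(waypoint, horizon):
        state = world_model(state, action)
        states.append(state)
    return np.stack(states)

def cem_plan(state, goal, iters=10, pop=64, n_elite=8, seed=0):
    """Cross-entropy method over the 2D waypoint space."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(2), np.ones(2)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, 2))
        # Cost: distance between the final predicted state and the goal.
        costs = np.array([
            np.linalg.norm(lifted_world_model(state, w)[-1] - goal)
            for w in samples
        ])
        elite = samples[np.argsort(costs)[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

plan = cem_plan(np.zeros(2), np.array([1.0, -0.5]))
print(plan)  # a waypoint whose rollout ends near the goal
```

In this toy setting the search space stays 2-dimensional regardless of `horizon`, which is the property the abstract attributes to the lifted model.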

preprint | 2022 | arXiv

Object-Region Video Transformers

Recently, video transformers have shown great success in video understanding, exceeding CNN performance; yet existing video transformer models do not explicitly model objects, although objects can be essential for recognizing actions. In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations. The key idea is to fuse object-centric representations starting from early layers and propagate them into the transformer layers, thus affecting the spatio-temporal representations throughout the network. Our ORViT block consists of two object-level streams: appearance and dynamics. In the appearance stream, an "Object-Region Attention" module applies self-attention over the patches and object regions. In this way, visual object regions interact with uniform patch tokens and enrich them with contextualized object information. We further model object dynamics via a separate "Object-Dynamics Module", which captures trajectory interactions, and show how to integrate the two streams. We evaluate our model on four tasks and five datasets: compositional and few-shot action recognition on SomethingElse, spatio-temporal action detection on AVA, and standard action recognition on Something-Something V2, Diving48 and Epic-Kitchen100. We show strong performance improvement across all tasks and datasets considered, demonstrating the value of a model that incorporates object representations into a transformer architecture. For code and pretrained models, visit the project page at https://roeiherz.github.io/ORViT/.
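As a rough illustration of attention over patches and object regions together, the sketch below runs single-head self-attention on the concatenation of patch tokens and object tokens, then keeps the updated patch tokens. This is not the ORViT implementation (see the project page for that); the function name, the single-head form, and the random projection matrices are assumptions chosen to keep the example small.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def object_region_attention(patches, objects, w_q, w_k, w_v):
    """Single-head self-attention over patch tokens and object tokens jointly.

    patches: (P, d) patch tokens; objects: (O, d) object-region tokens.
    Returns the updated patch tokens (P, d) and the full attention map.
    """
    tokens = np.concatenate([patches, objects], axis=0)  # (P + O, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))        # (P + O, P + O)
    out = attn @ v
    # Patch rows have attended to object rows, so they now carry object context.
    return out[: len(patches)], attn

rng = np.random.default_rng(0)
d = 8
patches = rng.normal(size=(6, d))   # hypothetical: 6 patch tokens
objects = rng.normal(size=(2, d))   # hypothetical: 2 object-region tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
enriched, attn = object_region_attention(patches, objects, w_q, w_k, w_v)
print(enriched.shape, attn.shape)
```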

preprint | 2022 | arXiv

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

This technical report describes the SViT approach for the Ego4D Point of No Return (PNR) Temporal Localization Challenge. We propose a learning framework, StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos. Second, the scene representations of individual frames in video should "align" with those of still images. This is achieved via a "Frame-Clip Consistency" loss, which ensures the flow of structured information between images and videos. SViT obtains strong performance on the challenge test set with 0.656 absolute temporal localization error.

preprint | 2022 | arXiv

Visual Prompting via Image Inpainting

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting (literally just filling in a hole in a concatenated visual prompt image) turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated: 88k unlabeled figures sourced from academic papers on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, etc.
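The "hole in a concatenated visual prompt image" can be made concrete with a small sketch. The 2x2 grid layout, the function name `make_visual_prompt`, and the boolean mask convention are assumptions for illustration; the abstract only states that the example pair and the query are concatenated into one image with a region left to inpaint. The inpainting model itself is not included.

```python
import numpy as np

def make_visual_prompt(example_in, example_out, query_in):
    """Build a 2x2 prompt canvas (assumed layout) and the mask of the hole.

    Top row: example input | example output.
    Bottom row: query input | hole for the inpainting model to fill.
    All three images must share the same (H, W, C) shape.
    """
    h, w, c = query_in.shape
    canvas = np.zeros((2 * h, 2 * w, c), dtype=query_in.dtype)
    mask = np.zeros((2 * h, 2 * w), dtype=bool)
    canvas[:h, :w] = example_in
    canvas[:h, w:] = example_out
    canvas[h:, :w] = query_in
    mask[h:, w:] = True  # True marks pixels the model should inpaint
    return canvas, mask

# Hypothetical 4x4 RGB images filled with constant values.
ex_in = np.full((4, 4, 3), 1, dtype=np.uint8)
ex_out = np.full((4, 4, 3), 2, dtype=np.uint8)
query = np.full((4, 4, 3), 3, dtype=np.uint8)
canvas, mask = make_visual_prompt(ex_in, ex_out, query)
print(canvas.shape, int(mask.sum()))
```

The predicted output for the query would then be read back from the masked quadrant of the inpainted canvas.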

preprint | 2020 | arXiv

Learning Canonical Representations for Scene Graph to Image Generation

Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses these issues by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes. Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization on semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks: Visual Genome, COCO, and CLEVR.

preprint | 2016 | arXiv

Exact extreme value statistics at mixed order transitions

We study extreme value statistics (EVS) for spatially extended models exhibiting mixed order phase transitions (MOT). These are phase transitions which exhibit features common to both first order (discontinuity of the order parameter) and second order (diverging correlation length) transitions. We consider here the truncated inverse distance squared Ising (TIDSI) model which is a prototypical model exhibiting MOT, and study analytically the extreme value statistics of the domain lengths. The lengths of the domains are identically distributed random variables except for the global constraint that their sum equals the total system size $L$. In addition, the number of such domains is also a fluctuating variable, and not fixed. In the paramagnetic phase, we show that the distribution of the largest domain length $l_{\max}$ converges, in the large $L$ limit, to a Gumbel distribution. However, at the critical point (for a certain range of parameters) and in the ferromagnetic phase, we show that the fluctuations of $l_{\max}$ are governed by novel distributions which we compute exactly. Our main analytical results are verified by numerical simulations.
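For orientation, the Gumbel limit mentioned for the paramagnetic phase can be seen numerically in the simplest textbook setting: the maximum of $n$ independent unit-rate exponential variables. This is only an assumption-laden illustration, not the TIDSI model, since it ignores the global constraint on the sum of domain lengths and the fluctuating number of domains. For a standard Gumbel variable the mean is the Euler-Mascheroni constant and the variance is $\pi^2/6$, which the centered maxima should approach.

```python
import numpy as np

def max_of_iid_exponentials(n, trials, seed=0):
    """Sample the maximum of n i.i.d. Exp(1) variables, `trials` times."""
    rng = np.random.default_rng(seed)
    return rng.exponential(size=(trials, n)).max(axis=1)

n, trials = 1000, 2000
# Centering by log(n) is the standard shift for exponential tails.
centered = max_of_iid_exponentials(n, trials) - np.log(n)

print("sample mean:", centered.mean(), "Gumbel mean:", np.euler_gamma)
print("sample var: ", centered.var(), "Gumbel var: ", np.pi**2 / 6)
```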

preprint | 2014 | arXiv

Mixed order transition and condensation in exactly soluble one dimensional spin model

Mixed order phase transitions (MOT), which display discontinuous order parameter and diverging correlation length, appear in several seemingly unrelated settings ranging from equilibrium models with long-range interactions to models far from thermal equilibrium. In a recent paper [1] an exactly soluble spin model with long-range interactions that exhibits MOT was introduced and analyzed both by a grand canonical calculation and a renormalization group analysis. The model was shown to lay a bridge between two classes of one dimensional models exhibiting MOT, namely between spin models with inverse distance square interactions and surface depinning models. In this paper we elaborate on the calculations done in [1]. We also analyze the model in the canonical ensemble, which yields a better insight into the mechanism of MOT. In addition, we generalize the model to include Potts and general Ising spins, and also consider a broader class of interactions which decay with distance with a power law different from 2.

preprint | 2013 | arXiv

Mixed order phase transition in a one dimensional model

We introduce and analyze an exactly soluble one-dimensional Ising model with long-range interactions which exhibits a mixed order transition (MOT), namely a phase transition in which the order parameter is discontinuous as in first order transitions while the correlation length diverges as in second order transitions. Such transitions are known to appear in diverse classes of seemingly unrelated models. The model we present serves as a link between two classes of models which exhibit MOT in one dimension, namely, spin models with a coupling constant which decays as the inverse distance squared and models of depinning transitions, thus making a step towards a unifying framework.

preprint | 2012 | arXiv

Denaturation of Circular DNA: Supercoils and Overtwist

The denaturation transition of circular DNA is studied within a Poland-Scheraga type approach, generalized to account for the fact that the total linking number (LK), which measures the number of windings of one strand around the other, is conserved. In the model the LK conservation is maintained by invoking both overtwisting and writhing (supercoiling) mechanisms. This generalizes previous studies which considered each mechanism separately. The phase diagram of the model is analyzed as a function of the temperature and the elastic constant $κ$ associated with the overtwisting energy for any given loop entropy exponent, $c$. As is the case where the two mechanisms apply separately, the model exhibits no denaturation transition for $c \le 2$. For $c>2$ and $κ=0$ we find that the model exhibits a first order transition. The transition becomes of higher order for any $κ>0$. We also calculate the contribution of the two mechanisms separately in maintaining the conservation of the linking number and find that it is weakly dependent on the loop exponent $c$.

preprint | 2012 | arXiv

Macroscopic loop formation in circular DNA denaturation

The statistical mechanics of DNA denaturation under fixed linking number is qualitatively different from that of unconstrained DNA. Quantitatively different melting scenarios are reached from two alternative assumptions, namely, that the denatured loops are formed at the expense of 1) overtwist, 2) supercoils. Recent work has shown that the supercoiling mechanism results in a BEC-like picture where a macroscopic loop appears at Tc and grows steadily with temperature, while the nature of the denatured phase for the overtwisting case has not been studied. By extending an earlier result, we show here that a macroscopic loop appears in the overtwisting scenario as well. We calculate its size as a function of temperature and show that the fraction of the total sum of microscopic loops decreases above Tc, with a cusp at the critical point.

preprint | 2011 | arXiv

Denaturation of Circular DNA: Supercoil Mechanism

The denaturation transition which takes place in circular DNA is analyzed by extending the Poland-Scheraga model to include the winding degrees of freedom. We consider the case of a homopolymer whereby the winding number of the double stranded helix, released by a loop denaturation, is absorbed by supercoils. We find that, as in the case of linear DNA, the order of the transition is determined by the loop exponent $c$. However, the first order transition displayed by the PS model for $c>2$ in linear DNA is replaced by a continuous transition with arbitrarily high order as $c$ approaches 2, while the second-order transition found in the linear case in the regime $1<c\le2$ disappears. In addition, our analysis reveals that melting under fixed linking number is a condensation transition, where the condensate is a macroscopic loop which appears above the critical temperature.
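For readers unfamiliar with the loop exponent $c$, the standard Poland-Scheraga setup assigns a denatured loop of length $\ell$ an entropic weight of the form below. The symbols $A$ and $s$ are generic non-universal constants introduced here for illustration and do not come from the abstract; the comment lines restate the linear-DNA regimes that the abstract itself mentions.

```latex
% Textbook form of the Poland-Scheraga loop weight (A, s are assumed generic constants)
\Omega(\ell) \simeq A \, \frac{s^{\ell}}{\ell^{c}}, \qquad \ell \gg 1
% Linear DNA, as summarized in the abstract:
%   1 < c \le 2 : second-order (continuous) transition
%   c > 2       : first-order transition
```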
