Source author record

Ahmed Elgammal

Ahmed Elgammal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language Artificial Intelligence cs.CY Human-Computer Interaction Multimedia Information Retrieval Information Theory math.IT math.SP Neural and Evolutionary Computing physics.hist-ph Social and Information Networks

Catalog footprint

What is connected

31works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Formal Analysis of Art: Proxy Learning of Visual Concepts from Style Through Language Models

We present a machine learning system that can quantify fine art paintings with a set of visual elements and principles of art. This formal analysis is fundamental for understanding art, but developing such a system is challenging. Paintings have high visual complexities, but it is also difficult to collect enough training data with direct labels. To resolve these practical limitations, we introduce a novel mechanism, called proxy learning, which learns visual concepts in paintings though their general relation to styles. This framework does not require any visual annotation, but only uses style labels and a general relationship between visual concepts and style. In this paper, we propose a novel proxy model and reformulate four pre-existing methods in the context of proxy learning. Through quantitative and qualitative comparison, we evaluate these methods and compare their effectiveness in quantifying the artistic visual concepts, where the general relationship is estimated by language models; GloVe or BERT. The language modeling is a practical and scalable solution requiring no labeling, but it is inevitably imperfect. We demonstrate how the new proxy model is robust to the imperfection, while the other models are sensitively affected by it.

preprint2021arXiv

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU-clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality on 1024*1024 resolution. Notably, the model converges from scratch with just a few hours of training on a single RTX-2080 GPU, and has a consistent performance, even with less than 100 training samples. Two technique designs constitute our work, a skip-layer channel-wise excitation module and a self-supervised discriminator trained as a feature-encoder. With thirteen datasets covering a wide variety of image domains (The datasets and code are available at: https://github.com/odegeasslbc/FastGAN-pytorch), we show our model's superior performance compared to the state-of-the-art StyleGAN2, when data and computing budget are limited.

preprint2020arXiv

OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization

Exploring the potential of GANs for unsupervised disentanglement learning, this paper proposes a novel GAN-based disentanglement framework with One-Hot Sampling and Orthogonal Regularization (OOGAN). While previous works mostly attempt to tackle disentanglement learning through VAE and seek to implicitly minimize the Total Correlation (TC) objective with various sorts of approximation methods, we show that GANs have a natural advantage in disentangling with an alternating latent variable (noise) sampling method that is straightforward and robust. Furthermore, we provide a brand-new perspective on designing the structure of the generator and discriminator, demonstrating that a minor structural change and an orthogonal regularization on model weights entails an improved disentanglement. Instead of experimenting on simple toy datasets, we conduct experiments on higher-resolution images and show that OOGAN greatly pushes the boundary of unsupervised disentanglement.

preprint2016arXiv

Automatic Annotation of Structured Facts in Images

Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions. Example structured facts include attributed objects (e.g., <flower, red>), actions (e.g., <baby, smile>), interactions (e.g., <man, walking, dog>), and positional information (e.g., <vase, on, table>). The collected annotations are in the form of fact-image pairs (e.g.,<man, walking, dog> and an image region containing this fact). With a language approach, the proposed method is able to collect hundreds of thousands of visual fact annotations with accuracy of 83% according to human judgment. Our method automatically collected more than 380,000 visual fact annotations and more than 110,000 unique visual facts from images with captions and localized them in images in less than one day of processing time on standard CPU platforms.

preprint2016arXiv

Convolutional Models for Joint Object Categorization and Pose Estimation

In the task of Object Recognition, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose regression using these approaches has received relatively much less attention. In this paper we show how deep architectures, specifically Convolutional Neural Networks (CNN), can be adapted to the task of simultaneous categorization and pose estimation of objects. We investigate and analyze the layers of various CNN models and extensively compare between them with the goal of discovering how the layers of distributed representations of CNNs represent object pose information and how this contradicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets. Our models achieve better than state-of-the-art performance on both datasets.

preprint2016arXiv

Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance

This paper is focused on studying the view-manifold structure in the feature spaces implied by the different layers of Convolutional Neural Networks (CNN). There are several questions that this paper aims to answer: Does the learned CNN representation achieve viewpoint invariance? How does it achieve viewpoint invariance? Is it achieved by collapsing the view manifolds, or separating them while preserving them? At which layer is view invariance achieved? How can the structure of the view manifold at each layer of a deep convolutional neural network be quantified experimentally? How does fine-tuning of a pre-trained CNN on a multi-view dataset affect the representation at each layer of the network? In order to answer these questions we propose a methodology to quantify the deformation and degeneracy of view manifolds in CNN layers. We apply this methodology and report interesting results in this paper that answer the aforementioned questions.

preprint2016arXiv

Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that to obtain 'good' performance, learning a kernel function is the key challenge. In this work we focus on learning kernel representations for structured regression. We propose use of polynomials expansion of kernels, referred to as Schoenberg transforms and Gegenbaur transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be thought of as polynomial combination of input features in a high dimensional reproducing kernel Hilbert space (RKHS). We learn kernels over input and output for structured data, such that, dependency between kernel features is maximized. We use Hilbert-Schmidt Independence Criterion (HSIC) to measure this. We also give an efficient, matrix decomposition-based algorithm to learn these kernel transformations, and demonstrate state-of-the-art results on several real-world datasets.

preprint2016arXiv

Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition

Speech recognition is a challenging problem. Due to the acoustic limitations, using visual information is essential for improving the recognition accuracy in real-life unconstraint situations. One common approach is to model the visual recognition as nonlinear optimization problem. Measuring the distances between visual units is essential for solving this problem. Embedding the visual units on a manifold and using manifold kernels is one way to measure these distances. This work is intended to evaluate the performance of several manifold kernels for solving the problem of visual speech recognition. We show the theory behind each kernel. We apply manifold kernel partial least squares framework to OuluVs and AvLetters databases, and show empirical comparison between all kernels. This framework provides convenient way to explore different kernels.

preprint2016arXiv

Modelling depth for nonparametric foreground segmentation using RGBD devices

The problem of detecting changes in a scene and segmenting the foreground from background is still challenging, despite previous work. Moreover, new RGBD capturing devices include depth cues, which could be incorporated to improve foreground segmentation. In this work, we present a new nonparametric approach where a unified model mixes the device multiple information cues. In order to unify all the device channel cues, a new probabilistic depth data model is also proposed where we show how handle the inaccurate data to improve foreground segmentation. A new RGBD video dataset is presented in order to introduce a new standard for comparison purposes of this kind of algorithms. Results show that the proposed approach can handle several practical situations and obtain good results in all cases.

preprint2016arXiv

Sherlock: Scalable Fact Learning in Images

We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects (e.g., $<$boy$>$), (2) attributes (e.g., $<$boy, tall$>$), (3) actions (e.g., $<$boy, playing$>$), and (4) interactions (e.g., $<$boy, riding, a horse $>$). Each fact has a semantic language view (e.g., $<$ boy, playing$>$) and a visual view (an image with this fact). We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding. We propose and investigate recent and strong approaches from the multiview learning literature and also introduce two learning representation models as potential baselines. We applied the investigated methods on several datasets that we augmented with structured facts and a large scale dataset of more than 202,000 facts and 814,000 images. Our experiments show the advantage of relating facts by the structure by the proposed models compared to the designed baselines on bidirectional fact retrieval.

preprint2016arXiv

Supervised Dimensionality Reduction via Distance Correlation Maximization

In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation, Szekely et. al. (2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representation $\mathbf{z}$, which maximizes the squared sum of Distance Correlations between low dimensional features $\mathbf{z}$ and response $y$, and also between features $\mathbf{z}$ and covariates $\mathbf{x}$. We propose a novel algorithm to optimize our proposed objective using the Generalized Minimization Maximizaiton method of \Parizi et. al. (2015). We show superior empirical results on multiple datasets proving the effectiveness of our proposed approach over several relevant state-of-the-art supervised dimensionality reduction methods.

preprint2016arXiv

The Role of Typicality in Object Classification: Improving The Generalization Capacity of Convolutional Neural Networks

Deep artificial neural networks have made remarkable progress in different tasks in the field of computer vision. However, the empirical analysis of these models and investigation of their failure cases has received attention recently. In this work, we show that deep learning models cannot generalize to atypical images that are substantially different from training images. This is in contrast to the superior generalization ability of the visual system in the human brain. We focus on Convolutional Neural Networks (CNN) as the state-of-the-art models in object recognition and classification; investigate this problem in more detail, and hypothesize that training CNN models suffer from unstructured loss minimization. We propose computational models to improve the generalization capacity of CNNs by considering how typical a training image looks like. By conducting an extensive set of experiments we show that involving a typicality measure can improve the classification results on a new set of images by a large margin. More importantly, this significant improvement is achieved without fine-tuning the CNN model on the target image set.

preprint2016arXiv

Write a Classifier: Predicting Visual Classifiers from Unstructured Text

People typically learn through exposure to visual concepts associated with linguistic descriptions. For instance, teaching visual object categories to children is often accompanied by descriptions in text or speech. In a machine learning context, these observations motivates us to ask whether this learning process could be computationally modeled to learn visual classifiers. More specifically, the main question of this work is how to utilize purely textual description of visual classes with no training images, to learn explicit visual classifiers for them. We propose and investigate two baseline formulations, based on regression and domain transfer, that predict a linear classifier. Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the parameters of a linear classifier. We also propose a generic kernelized models where a kernel classifier is predicted in the form defined by the representer theorem. The kernelized models allow defining and utilizing any two RKHS (Reproducing Kernel Hilbert Space) kernel functions in the visual space and text space, respectively. We finally propose a kernel function between unstructured text descriptions that builds on distributional semantics, which shows an advantage in our setting and could be useful for other applications. We applied all the studied models to predict visual classifiers on two fine-grained and challenging categorization datasets (CU Birds and Flower Datasets), and the results indicate successful predictions of our final model over several baselines that we designed.

preprint2015arXiv

Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation

Due to large variations in shape, appearance, and viewing conditions, object recognition is a key precursory challenge in the fields of object manipulation and robotic/AI visual reasoning in general. Recognizing object categories, particular instances of objects and viewpoints/poses of objects are three critical subproblems robots must solve in order to accurately grasp/manipulate objects and reason about their environments. Multi-view images of the same object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g. visual/depth descriptor spaces). These object manifolds share the same topology despite being geometrically different. Each object manifold can be represented as a deformed version of a unified manifold. The object manifolds can thus be parameterized by its homeomorphic mapping/reconstruction from the unified manifold. In this work, we develop a novel framework to jointly solve the three challenging recognition sub-problems, by explicitly modeling the deformations of object manifolds and factorizing it in a view-invariant space for recognition. We perform extensive experiments on several challenging datasets and achieve state-of-the-art results.

preprint2015arXiv

Generalized Twin Gaussian Processes using Sharma-Mittal Divergence

There has been a growing interest in mutual information measures due to their wide range of applications in Machine Learning and Computer Vision. In this paper, we present a generalized structured regression framework based on Shama-Mittal divergence, a relative entropy measure, which is introduced to the Machine Learning community in this work. Sharma-Mittal (SM) divergence is a generalized mutual information measure for the widely used Rényi, Tsallis, Bhattacharyya, and Kullback-Leibler (KL) relative entropies. Specifically, we study Sharma-Mittal divergence as a cost function in the context of the Twin Gaussian Processes (TGP)~\citep{Bo:2010}, which generalizes over the KL-divergence without computational penalty. We show interesting properties of Sharma-Mittal TGP (SMTGP) through a theoretical analysis, which covers missing insights in the traditional TGP formulation. However, we generalize this theory based on SM-divergence instead of KL-divergence which is a special case. Experimentally, we evaluated the proposed SMTGP framework on several datasets. The results show that SMTGP reaches better predictions than KL-based TGP, since it offers a bigger class of models through its parameters that we learn from the data.

preprint2015arXiv

Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need to develop multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems, which can benefit more high-level multimedia tasks. In order to model this similarity between paintings, we should extract the appropriate visual features for paintings and find out the best approach to learn the similarity metric based on these features. We investigate a comprehensive list of visual features and metric learning approaches to learn an optimized similarity measure between paintings. We develop a machine that is able to make aesthetic-related semantic-level judgments, such as predicting a painting's style, genre, and artist, as well as providing similarity measures optimized based on the knowledge available in the domain of art historical interpretation. Our experiments show the value of using this similarity measure for the aforementioned prediction tasks.

preprint2015arXiv

Learning Hypergraph-regularized Attribute Predictors

We present a novel attribute learning framework named Hypergraph-based Attribute Predictor (HAP). In HAP, a hypergraph is leveraged to depict the attribute relations in the data. Then the attribute prediction problem is casted as a regularized hypergraph cut problem in which HAP jointly learns a collection of attribute projections from the feature space to a hypergraph embedding space aligned with the attribute space. The learned projections directly act as attribute classifiers (linear and kernelized). This formulation leads to a very efficient approach. By considering our model as a multi-graph cut task, our framework can flexibly incorporate other available information, in particular class label. We apply our approach to attribute prediction, Zero-shot and $N$-shot learning tasks. The results on AWA, USAA and CUB databases demonstrate the value of our methods in comparison with the state-of-the-art approaches.

preprint2015arXiv

Quantifying Creativity in Art Networks

Can we develop a computer algorithm that assesses the creativity of a painting given its context within art history? This paper proposes a novel computational framework for assessing the creativity of creative products, such as paintings, sculptures, poetry, etc. We use the most common definition of creativity, which emphasizes the originality of the product and its influential value. The proposed computational framework is based on constructing a network between creative products and using this network to infer about the originality and influence of its nodes. Through a series of transformations, we construct a Creativity Implication Network. We show that inference about creativity in this network reduces to a variant of network centrality problems which can be solved efficiently. We apply the proposed framework to the task of quantifying creativity of paintings (and sculptures). We experimented on two datasets with over 62K paintings to illustrate the behavior of the proposed framework. We also propose a methodology for quantitatively validating the results of the proposed algorithm, which we call the "time machine experiment".

preprint2015arXiv

Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

In this paper we propose a framework for predicting kernelized classifiers in the visual domain for categories with no training images where the knowledge comes from textual description about these categories. Through our optimization framework, the proposed approach is capable of embedding the class-level knowledge from the text domain as kernel classifiers in the visual domain. We also proposed a distributional semantic kernel between text descriptions which is shown to be effective in our setting. The proposed framework is not restricted to textual descriptions, and can also be applied to other forms knowledge representations. Our approach was applied for the challenging task of zero-shot learning of fine-grained categories from text descriptions of these categories.

preprint2015arXiv

Toward a Taxonomy and Computational Models of Abnormalities in Images

The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all different types of abnormality in images and outperform prior arts in abnormality recognition.

preprint2015arXiv

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in a free text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state-of-the-art that uses big descriptions from 12.6% to 13.5% with MAP metric and 0.73 to 0.83 with ROC-AUC metric. It is also an order of magnitude faster.

preprint2014arXiv

Abnormal Object Recognition: A Comprehensive Study

When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.

preprint2014arXiv

Collaborative Discriminant Locality Preserving Projections With its Application to Face Recognition

We present a novel Discriminant Locality Preserving Projections (DLPP) algorithm named Collaborative Discriminant Locality Preserving Projection (CDLPP). In our algorithm, the discriminating power of DLPP are further exploited from two aspects. On the one hand, the global optimum of class scattering is guaranteed via using the between-class scatter matrix to replace the original denominator of DLPP. On the other hand, motivated by collaborative representation, an $L_2$-norm constraint is imposed to the projections to discover the collaborations of dimensions in the sample space. We apply our algorithm to face recognition. Three popular face databases, namely AR, ORL and LFW-A, are employed for evaluating the performance of CDLPP. Extensive experimental results demonstrate that CDLPP significantly improves the discriminating power of DLPP and outperforms the state-of-the-arts.

preprint2014arXiv

Computational Beauty: Aesthetic Judgment at the Intersection of Art and Science

In part one of the Critique of Judgment, Immanuel Kant wrote that "the judgment of taste...is not a cognitive judgment, and so not logical, but is aesthetic."\cite{Kant} While the condition of aesthetic discernment has long been the subject of philosophical discourse, the role of the arbiters of that judgment has more often been assumed than questioned. The art historian, critic, connoisseur, and curator have long held the esteemed position of the aesthetic judge, their training, instinct, and eye part of the inimitable subjective processes that Kant described as occurring upon artistic evaluation. Although the concept of intangible knowledge in regard to aesthetic theory has been much explored, little discussion has arisen in response to the development of new types of artificial intelligence as a challenge to the seemingly ineffable abilities of the human observer. This paper examines the developments in the field of computer vision analysis of paintings from canonical movements with the history of Western art and the reaction of art historians to the application of this technology in the field. Through an investigation of the ethical consequences of this innovative technology, the unquestioned authority of the art expert is challenged and the subjective nature of aesthetic judgment is brought to philosophical scrutiny once again.

preprint2014arXiv

On The Effect of Hyperedge Weights On Hypergraph Learning

Hypergraph is a powerful representation in several computer vision, machine learning and pattern recognition problems. In the last decade, many researchers have been keen to develop different hypergraph models. In contrast, no much attention has been paid to the design of hyperedge weights. However, many studies on pairwise graphs show that the choice of edge weight can significantly influence the performances of such graph algorithms. We argue that this also applies to hypegraphs. In this paper, we empirically discuss the influence of hyperedge weight on hypegraph learning via proposing three novel hyperedge weights from the perspectives of geometry, multivariate statistical analysis and linear regression. Extensive experiments on ORL, COIL20, JAFFE, Sheffield, Scene15 and Caltech256 databases verify our hypothesis. Similar to graph learning, several representative hyperedge weighting schemes can be concluded by our experimental studies. Moreover, the experiments also demonstrate that the combinations of such weighting schemes and conventional hypergraph models can get very promising classification and clustering performances in comparison with some recent state-of-the-art algorithms.

preprint2014arXiv

Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual Abstraction of Natural Language Text

MindMapping is a well-known technique used in note taking, which encourages learning and studying. MindMapping has been manually adopted to help present knowledge and concepts in a visual form. Unfortunately, there is no reliable automated approach to generate MindMaps from Natural Language text. This work firstly introduces MindMap Multilevel Visualization concept which is to jointly visualize and summarize textual information. The visualization is achieved pictorially across multiple levels using semantic information (i.e. ontology), while the summarization is achieved by the information in the highest levels as they represent abstract information in the text. This work also presents the first automated approach that takes a text input and generates a MindMap visualization out of it. The approach could visualize text documents in multilevel MindMaps, in which a high-level MindMap node could be expanded into child MindMaps. \ignore{ As far as we know, this is the first work that view MindMapping as a new approach to jointly summarize and visualize textual information.} The proposed method involves understanding of the input text and converting it into intermediate Detailed Meaning Representation (DMR). The DMR is then visualized with two modes; Single level or Multiple levels, which is convenient for larger text. The generated MindMaps from both approaches were evaluated based on Human Subject experiments performed on Amazon Mechanical Turk with various parameter settings.

preprint2014arXiv

The Digital Humanities Unveiled: Perceptions Held by Art Historians and Computer Scientists about Computer Vision Technology

Although computer scientists are generally familiar with the achievements of computer vision technology in art history, these accomplishments are little known and often misunderstood by scholars in the humanities. To clarify the parameters of this seeming disjuncture, we have addressed the concerns that one example of the digitization of the humanities poses on social, philosophical, and practical levels. In support of our assessment of the perceptions held by computer scientists and art historians about the use of computer vision technology to examine art, we based our interpretations on two surveys that were distributed in August 2014. In this paper, the development of these surveys and their results are discussed in the context of the major philosophical conclusions of our research in this area to date.

preprint2014arXiv

Toward Automated Discovery of Artistic Influence

Considering the huge amount of art pieces that exist, there is valuable information to be discovered. Examining a painting, an expert can determine its style, genre, and the time period that the painting belongs. One important task for art historians is to find influences and connections between artists. Is influence a task that a computer can measure? The contribution of this paper is in exploring the problem of computer-automated suggestion of influences between artists, a problem that was not addressed before in a general setting. We first present a comparative study of different classification methodologies for the task of fine-art style classification. A two-level comparative study is performed for this classification problem. The first level reviews the performance of discriminative vs. generative models, while the second level touches the features aspect of the paintings and compares semantic-level features vs. low-level and intermediate-level features present in the painting. Then, we investigate the question "Who influenced this artist?" by looking at his masterpieces and comparing them to others. We pose this interesting question as a knowledge discovery problem. For this purpose, we investigated several painting-similarity and artist-similarity measures. As a result, we provide a visualization of artists (Map of Artists) based on the similarity between their works

preprint2013arXiv

Visual-Semantic Scene Understanding by Sharing Labels in a Context Network

We consider the problem of naming objects in complex, natural scenes containing widely varying object appearance and subtly different names. Informed by cognitive research, we propose an approach based on sharing context based object hypotheses between visual and lexical spaces. To this end, we present the Visual Semantic Integration Model (VSIM) that represents object labels as entities shared between semantic and visual contexts and infers a new image by updating labels through context switching. At the core of VSIM is a semantic Pachinko Allocation Model and a visual nearest neighbor Latent Dirichlet Allocation Model. For inference, we derive an iterative Data Augmentation algorithm that pools the label probabilities and maximizes the joint label posterior of an image. Our model surpasses the performance of state-of-art methods in several visual tasks on the challenging SUN09 dataset.

preprint2011arXiv

A Spectral Theory for Tensors

In this paper we propose a general spectral theory for tensors. Our proposed factorization decomposes a tensor into a product of orthogonal and scaling tensors. At the same time, our factorization yields an expansion of a tensor as a summation of outer products of lower order tensors . Our proposed factorization shows the relationship between the eigen-objects and the generalised characteristic polynomials. Our framework is based on a consistent multilinear algebra which explains how to generalise the notion of matrix hermicity, matrix transpose, and most importantly the notion of orthogonality. Our proposed factorization for a tensor in terms of lower order tensors can be recursively applied so as to naturally induces a spectral hierarchy for tensors.

preprint2011arXiv

Learning Hypergraph Labeling for Feature Matching

This study poses the feature correspondence problem as a hypergraph node labeling problem. Candidate feature matches and their subsets (usually of size larger than two) are considered to be the nodes and hyperedges of a hypergraph. A hypergraph labeling algorithm, which models the subset-wise interaction by an undirected graphical model, is applied to label the nodes (feature correspondences) as correct or incorrect. We describe a method to learn the cost function of this labeling algorithm from labeled examples using a graphical model training algorithm. The proposed feature matching algorithm is different from the most of the existing learning point matching methods in terms of the form of the objective function, the cost function to be learned and the optimization method applied to minimize it. The results on standard datasets demonstrate how learning over a hypergraph improves the matching performance over existing algorithms, notably one that also uses higher order information without learning.

Ahmed Elgammal

What is connected

Connect this record

See the researcher in context

Building this map preview

31 published item(s)

Formal Analysis of Art: Proxy Learning of Visual Concepts from Style Through Language Models

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization

Automatic Annotation of Structured Facts in Images

Convolutional Models for Joint Object Categorization and Pose Estimation

Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance

Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition

Modelling depth for nonparametric foreground segmentation using RGBD devices

Sherlock: Scalable Fact Learning in Images

Supervised Dimensionality Reduction via Distance Correlation Maximization

The Role of Typicality in Object Classification: Improving The Generalization Capacity of Convolutional Neural Networks

Write a Classifier: Predicting Visual Classifiers from Unstructured Text

Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation

Generalized Twin Gaussian Processes using Sharma-Mittal Divergence

Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

Learning Hypergraph-regularized Attribute Predictors

Quantifying Creativity in Art Networks

Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions

Toward a Taxonomy and Computational Models of Abnormalities in Images

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Abnormal Object Recognition: A Comprehensive Study

Collaborative Discriminant Locality Preserving Projections With its Application to Face Recognition

Computational Beauty: Aesthetic Judgment at the Intersection of Art and Science

On The Effect of Hyperedge Weights On Hypergraph Learning

Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual Abstraction of Natural Language Text

The Digital Humanities Unveiled: Perceptions Held by Art Historians and Computer Scientists about Computer Vision Technology

Toward Automated Discovery of Artistic Influence

Visual-Semantic Scene Understanding by Sharing Labels in a Context Network

A Spectral Theory for Tensors

Learning Hypergraph Labeling for Feature Matching