Source author record

Sathyanarayanan N. Aakur

Sathyanarayanan N. Aakur appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computer Vision Machine Learning Quantitative Methods

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process

Food is not only a basic human necessity but also a key factor driving a society's health and economic well-being. As a result, the cooking domain is a popular use-case to demonstrate decision-support (AI) capabilities in service of benefits like precision health with tools ranging from information retrieval interfaces to task-oriented chatbots. An AI here should understand concepts in the food domain (e.g., recipes, ingredients), be tolerant to failures encountered while cooking (e.g., browning of butter), handle allergy-based substitutions, and work with multiple data modalities (e.g. text and images). However, the recipes today are handled as textual documents which makes it difficult for machines to read, reason and handle ambiguity. This demands a need for better representation of the recipes, overcoming the ambiguity and sparseness that exists in the current textual documents. In this paper, we discuss the construction of a machine-understandable rich recipe representation (R3), in the form of plans, from the recipes available in natural language. R3 is infused with additional knowledge such as information about allergens and images of ingredients, possible failures and tips for each atomic cooking step. To show the benefits of R3, we also present TREAT, a tool for recipe retrieval which uses R3 to perform multi-modal reasoning on the recipe's content (plan objects - ingredients and cooking tools), food preparation process (plan actions and time), and media type (image, text). R3 leads to improved retrieval efficiency and new capabilities that were hither-to not possible in textual representation.

preprint2022arXiv

Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision

Advances in deep learning have enabled the development of models that have exhibited a remarkable tendency to recognize and even localize actions in videos. However, they tend to experience errors when faced with scenes or examples beyond their initial training environment. Hence, they fail to adapt to new domains without significant retraining with large amounts of annotated data. In this paper, we propose to overcome these limitations by moving to an open-world setting by decoupling the ideas of recognition and reasoning. Building upon the compositional representation offered by Grenander's Pattern Theory formalism, we show that attention and commonsense knowledge can be used to enable the self-supervised discovery of novel actions in egocentric videos in an open-world setting, where data from the observed environment (the target domain) is open i.e., the vocabulary is partially known and training examples (both labeled and unlabeled) are not available. We show that our approach can infer and learn novel classes for open vocabulary classification in egocentric videos and novel object detection with zero supervision. Extensive experiments show its competitive performance on two publicly available egocentric action recognition datasets (GTEA Gaze and GTEA Gaze+) under open-world conditions.

preprint2020arXiv

Action Localization through Continual Predictive Learning

The problem of action recognition involves locating the action in the video, both over time and spatially in the image. The dominant current approaches use supervised learning to solve this problem, and require large amounts of annotated training data, in the form of frame-level bounding box annotations around the region of interest. In this paper, we present a new approach based on continual learning that uses feature-level predictions for self-supervision. It does not require any training annotations in terms of frame-level bounding boxes. The approach is inspired by cognitive models of visual event perception that propose a prediction-based approach to event understanding. We use a stack of LSTMs coupled with CNN encoder, along with novel attention mechanisms, to model the events in the video and use this model to predict high-level features for the future frames. The prediction errors are used to continuously learn the parameters of the models. This self-supervised framework is not complicated as other approaches but is very effective in learning robust visual representations for both labeling and localization. It should be noted that the approach outputs in a streaming fashion, requiring only a single pass through the video, making it amenable for real-time processing. We demonstrate this on three datasets - UCF Sports, JHMDB, and THUMOS'13 and show that the proposed approach outperforms weakly-supervised and unsupervised baselines and obtains competitive performance compared to fully supervised baselines. Finally, we show that the proposed framework can generalize to egocentric videos and obtain state-of-the-art results in unsupervised gaze prediction.

preprint2020arXiv

Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks

Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral. It is estimated that mortality, morbidity, therapy, and quarantine resulting from BRDC account for significant losses in the cattle industry. Early detection and management of BRDC are crucial in mitigating economic losses. Current animal disease diagnostics is based on traditional tests such as bacterial culture, serolog, and Polymerase Chain Reaction (PCR) tests. Even though these tests are validated for several diseases, their main challenge is their limited ability to detect the presence of multiple pathogens simultaneously. Advancements of data analytics and machine learning and applications over metagenome sequencing are setting trends on several applications. In this work, we demonstrate a machine learning approach to identify pathogen signatures present in bovine metagenome sequences using k-mer-based network embedding followed by a deep learning-based classification task. With experiments conducted on two different simulated datasets, we show that networks-based machine learning approaches can detect pathogen signature with up to 89.7% accuracy. We will make the data available publicly upon request to tackle this important problem in a difficult domain.

Sathyanarayanan N. Aakur

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process

Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision

Action Localization through Continual Predictive Learning

Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks