Researcher profile

Aneesh Vartakavi

Aneesh Vartakavi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
2close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2022arXiv

An Experience-based Direct Generation approach to Automatic Image Cropping

Automatic Image Cropping is a challenging task with many practical downstream applications. The task is often divided into sub-problems - generating cropping candidates, finding the visually important regions, and determining aesthetics to select the most appealing candidate. Prior approaches model one or more of these sub-problems separately, and often combine them sequentially. We propose a novel convolutional neural network (CNN) based method to crop images directly, without explicitly modeling image aesthetics, evaluating multiple crop candidates, or detecting visually salient regions. Our model is trained on a large dataset of images cropped by experienced editors and can simultaneously predict bounding boxes for multiple fixed aspect ratios. We consider the aspect ratio of the cropped image to be a critical factor that influences aesthetics. Prior approaches for automatic image cropping, did not enforce the aspect ratio of the outputs, likely due to a lack of datasets for this task. We, therefore, benchmark our method on public datasets for two related tasks - first, aesthetic image cropping without regard to aspect ratio, and second, thumbnail generation that requires fixed aspect ratio outputs, but where aesthetics are not crucial. We show that our strategy is competitive with or performs better than existing methods in both these tasks. Furthermore, our one-stage model is easier to train and significantly faster than existing two-stage or end-to-end methods for inference. We present a qualitative evaluation study, and find that our model is able to generalize to diverse images from unseen datasets and often retains compositional properties of the original images after cropping. Our results demonstrate that explicitly modeling image aesthetics or visual attention regions is not necessarily required to build a competitive image cropping algorithm.

preprint2020arXiv

PodSumm -- Podcast Audio Summarization

The diverse nature, scale, and specificity of podcasts present a unique challenge to content discovery systems. Listeners often rely on text descriptions of episodes provided by the podcast creators to discover new content. Some factors like the presentation style of the narrator and production quality are significant indicators of subjective user preference but are difficult to quantify and not reflected in the text descriptions provided by the podcast creators. We propose the automated creation of podcast audio summaries to aid in content discovery and help listeners to quickly preview podcast content before investing time in listening to an entire episode. In this paper, we present a method to automatically construct a podcast summary via guidance from the text-domain. Our method performs two key steps, namely, audio to text transcription and text summary generation. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. We fine-tune a PreSumm[10] model with our augmented dataset and perform an ablation study. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset. We hope these results may inspire future research in this direction.