Source author record

Karthik Raman

Karthik Raman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Retrieval; Computation and Language; Machine Learning; Molecular Networks; Artificial Intelligence; Biomolecules; Computer Vision; Cryptography and Security

Catalog footprint

7 works, 8 topics, 4 close collaborators

Preprint, 2022, arXiv

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks; and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.
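The filtering step described in this abstract can be illustrated with a minimal sketch. It assumes each training example already carries a relevance confidence score in [0, 1]. The `Example` class, the `relevance_confidence` field, the sample data and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    query: str
    answer: str
    # Hypothetical score: confidence that the knowledge base can answer the pair.
    relevance_confidence: float


def filter_by_relevance(examples, threshold=0.5):
    """Keep only examples whose relevance confidence meets the threshold."""
    return [ex for ex in examples if ex.relevance_confidence >= threshold]


# Illustrative data only; the scores are made up.
examples = [
    Example("who wrote Hamlet?", "William Shakespeare", 0.92),
    Example("capital of Atlantis?", "unknown", 0.08),
    Example("boiling point of water in Celsius?", "100", 0.51),
]

kept = filter_by_relevance(examples, threshold=0.5)
print([ex.query for ex in kept])
# ['who wrote Hamlet?', 'boiling point of water in Celsius?']
```

In a multi-task setting, the threshold would presumably be tuned per task, since the abstract reports the largest gains on the two strongly imbalanced tasks.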

Preprint, 2021, arXiv

The art of molecular computing: whence and whither

An astonishingly diverse biomolecular circuitry orchestrates the functioning machinery underlying every living cell. These biomolecules and their circuits have been engineered not only for various industrial applications but also to perform other atypical functions that they were not evolved for - including computation. Various kinds of computational challenges, such as solving NP-complete problems with many variables, logical computation, neural network operations, and cryptography, have all been attempted through this unconventional computing paradigm. In this review, we highlight key experiments across three different eras of molecular computation, beginning with molecular solutions, transitioning to logic circuits and ultimately, more complex molecular networks. We also discuss a variety of applications of molecular computation, from solving NP-hard problems to self-assembled nanostructures for delivering molecules, and provide a glimpse into the exciting potential that molecular computing holds for the future.

Preprint, 2020, arXiv

Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (version 1.0)

This report describes the aggregation and anonymization process applied to the initial version of COVID-19 Search Trends symptoms dataset (published at https://goo.gle/covid19symptomdataset on September 2, 2020), a publicly available dataset that shows aggregated, anonymized trends in Google searches for symptoms (and some related topics). The anonymization process is designed to protect the daily symptom search activity of every user with $\varepsilon$-differential privacy for $\varepsilon$ = 1.68.
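To give a sense of what $\varepsilon$ = 1.68 means in practice, here is a minimal sketch of a noisy count. The abstract does not say which mechanism the report uses, so the Laplace mechanism and the sensitivity of 1 are assumptions chosen for illustration. The function names are hypothetical.

```python
import random

EPSILON = 1.68  # privacy parameter stated in the report


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    return sensitivity / epsilon


def noisy_count(true_count, sensitivity, epsilon, rng):
    """Return the count plus Laplace(0, b) noise (assumed mechanism)."""
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [noisy_count(100, 1.0, EPSILON, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)

print(round(laplace_scale(1.0, EPSILON), 4))  # 0.5952
```

A smaller $\varepsilon$ gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Floating-point sampling like this is fine for illustration but is generally not considered safe for production releases.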

Preprint, 2019, arXiv

Learning Multilingual Word Embeddings Using Image-Text Data

There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

Preprint, 2014, arXiv

Fast-SL: An efficient algorithm to identify synthetic lethal reaction sets in metabolic networks

Synthetic lethal reaction/gene-sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. In silico, synthetic lethal sets can be identified by simulating the effect of removal of gene sets from the reconstructed genome-scale metabolic network of an organism. Flux balance analysis (FBA), based on linear programming, has emerged as a powerful tool for the in silico analyses of metabolic networks. Identifying all possible synthetic lethal reaction combinations by exhaustively sampling every combination is computationally expensive. We surmount the computational complexity of exhaustive search by iteratively restricting the sample space of reaction combinations for search, resulting in a substantial reduction in the running time. We here propose an algorithm, Fast-SL, which provides an efficient way to analyse metabolic networks for higher order lethal reaction sets. Fast-SL offers a substantial speed-up through a massive reduction in the search space for synthetic lethals; in the case of E. coli, Fast-SL reduces the search space for synthetic lethal triplets by over 4000-fold. Fast-SL also compares favourably with SL Finder, an algorithm for identifying synthetic lethal sets, by Suthers et al. (2009), which involves the solution of a bi-level Mixed Integer Linear Programming problem. We have implemented the Fast-SL algorithm in MATLAB, building upon COBRA toolbox v2.0.
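The definition of a synthetic lethal set can be made concrete with a small sketch of the exhaustive search that the abstract describes as expensive. This is not the Fast-SL algorithm and involves no FBA: the toy network, the `PATHWAYS` table and the `can_grow` check are hypothetical stand-ins for a real growth simulation.

```python
from itertools import combinations

# Hypothetical toy network: the organism grows if at least one pathway
# keeps all of its reactions. This replaces an FBA growth check.
PATHWAYS = [
    {"R1", "R2"},
    {"R3", "R4"},
]
REACTIONS = sorted(set().union(*PATHWAYS))


def can_grow(removed):
    """Growth is possible if some pathway shares no reaction with `removed`."""
    return any(pathway.isdisjoint(removed) for pathway in PATHWAYS)


def lethal_sets(order):
    """Exhaustively list minimal lethal reaction sets of the given size."""
    found = []
    for combo in combinations(REACTIONS, order):
        removed = set(combo)
        if can_grow(removed):
            continue
        # Minimal only if dropping any one reaction from the set restores
        # growth. Checking these subsets is enough here because removing
        # fewer reactions never hurts growth in this toy model.
        if all(can_grow(removed - {r}) for r in combo):
            found.append(combo)
    return found


print(lethal_sets(1))  # []
print(lethal_sets(2))
# [('R1', 'R3'), ('R1', 'R4'), ('R2', 'R3'), ('R2', 'R4')]
```

The number of candidate sets grows combinatorially with the set size, which is why restricting the search space matters for triplets and higher orders in a genome-scale network.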

Preprint, 2014, arXiv

Methods for Ordinal Peer Grading

MOOCs have the potential to revolutionize higher education with their wide outreach and accessibility, but they require instructors to come up with scalable alternatives to traditional student evaluation. Peer grading -- having students assess each other -- is a promising approach to tackling the problem of evaluation at scale, since the number of "graders" naturally scales with the number of students. However, students are not trained in grading, which means that one cannot expect the same level of grading skills as in traditional settings. Drawing on broad evidence that ordinal feedback is easier to provide and more reliable than cardinal feedback, it is therefore desirable to allow peer graders to make ordinal statements (e.g. "project X is better than project Y") and not require them to make cardinal statements (e.g. "project X is a B-"). Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback. We formulate the ordinal peer grading problem as a type of rank aggregation problem, and explore several probabilistic models under which to estimate student grades and grader reliability. We study the applicability of these methods using peer grading data collected from a real class -- with instructor and TA grades as a baseline -- and demonstrate the efficacy of ordinal feedback techniques in comparison to existing cardinal peer grading methods. Finally, we compare these peer-grading techniques to traditional evaluation techniques.
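To illustrate what treating peer grading as rank aggregation looks like, here is a minimal sketch using a Borda-style average. This is a simple stand-in and not one of the probabilistic models the paper explores. It does not estimate grader reliability. The function name and the sample rankings are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine partial rankings (best first) into one ordering.

    Each item scores 1.0 for first place down to 0.0 for last place in a
    ranking. Scores are averaged over the rankings the item appears in,
    so items reviewed more often are not favoured.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        if n < 2:
            continue  # a single item carries no ordinal information
        for position, item in enumerate(ranking):
            totals[item] += (n - 1 - position) / (n - 1)
            counts[item] += 1
    mean = {item: totals[item] / counts[item] for item in totals}
    # Highest mean score first; ties are broken alphabetically.
    return sorted(mean, key=lambda item: (-mean[item], item))


# Each grader orders only the projects they reviewed (illustrative data).
rankings = [
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["A", "C", "D"],
]
print(borda_aggregate(rankings))  # ['A', 'B', 'C', 'D']
```

A model that also estimates grader reliability would weight each grader's ranking differently instead of averaging them equally.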

Preprint, 2011, arXiv

Structured Learning of Two-Level Dynamic Rankings

For ambiguous queries, conventional retrieval systems are bound by two conflicting goals. On the one hand, they should diversify and strive to present results for as many query intents as possible. On the other hand, they should provide depth for each intent by displaying more than a single result. Since both diversity and depth cannot be achieved simultaneously in the conventional static retrieval model, we propose a new dynamic ranking approach. Dynamic ranking models allow users to adapt the ranking through interaction, thus overcoming the constraints of presenting a one-size-fits-all static ranking. In particular, we propose a new two-level dynamic ranking model for presenting search results to the user. In this model, a user's interactions with the first-level ranking are used to infer this user's intent, so that second-level rankings can be inserted to provide more results relevant for this intent. Unlike for previous dynamic ranking models, we provide an algorithm to efficiently compute dynamic rankings with provable approximation guarantees for a large family of performance measures. We also propose the first principled algorithm for learning dynamic ranking functions from training data. In addition to the theoretical results, we provide empirical evidence demonstrating the gains in retrieval quality that our method achieves over conventional approaches.
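The presentation side of a two-level ranking can be sketched as a small data structure. The sketch assumes a click on a first-level result is the interaction that signals intent. It covers only how results are displayed, not the paper's approximation algorithm or learning method. The `render` function and the sample results are hypothetical.

```python
def render(two_level, clicked=frozenset()):
    """Flatten a two-level ranking into the list shown to the user.

    `two_level` is a list of (head, tail) pairs: `head` is a first-level
    result and `tail` holds the second-level results for the same intent.
    A tail is inserted only after a head the user has clicked.
    """
    shown = []
    for head, tail in two_level:
        shown.append(head)
        if head in clicked:
            shown.extend(tail)
    return shown


# Illustrative ranking for the ambiguous query "jaguar".
ranking = [
    ("jaguar car review", ["jaguar dealership", "jaguar f-type specs"]),
    ("jaguar animal facts", ["jaguar habitat", "jaguar vs leopard"]),
]

print(render(ranking))
# ['jaguar car review', 'jaguar animal facts']
print(render(ranking, clicked={"jaguar animal facts"}))
# ['jaguar car review', 'jaguar animal facts', 'jaguar habitat', 'jaguar vs leopard']
```

Without interaction the first level provides diversity across intents. After a click, the inserted second level provides depth for the chosen intent.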
