Researcher profile

Ramya Tekumalla

Ramya Tekumalla contributes to research discovery and scholarly infrastructure.


Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
4 topics
1 close collaborator

Published work

3 published items

Preprint, 2022, arXiv

TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

Social media is often used as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster, and the filtered tweets are sent for human annotation. Human annotation to create labeled sets for machine learning models is laborious, time-consuming, at times inaccurate, and, more importantly, not scalable in terms of size and real-time use. In this work, we curate a silver standard dataset using weak supervision. To validate its utility, we train machine learning models on the weakly supervised data to identify three types of natural disasters: earthquakes, hurricanes, and floods. Our results demonstrate that models trained on the silver standard dataset achieved performance greater than 90% when classifying a manually curated, gold-standard dataset. To enable reproducible research and additional downstream utility, we release the silver standard dataset for the scientific community.

Preprint, 2020, arXiv

A large-scale Twitter dataset for drug safety applications mined from publicly existing resources

With the increasing popularity of deep learning models for natural language processing (NLP) tasks in the field of pharmacovigilance, and more specifically for the identification of Adverse Drug Reactions (ADRs), there is an inherent need for large-scale social media datasets aimed at such tasks. Most researchers spend large amounts of time crawling Twitter or buy expensive pre-curated datasets, and then have the data annotated manually; these approaches do not scale well as more and more data keeps flowing into Twitter. In this work, we repurpose a publicly available archived dataset of more than 9.4 billion tweets with the objective of creating a very large dataset of drug usage-related tweets. Using existing manually curated datasets from the literature, we then validate our filtered tweets for relevance using machine learning methods, with the end result of a publicly available dataset of 1,181,993 tweets. We provide all code and a detailed procedure for extracting this dataset, along with the selected tweet IDs, for researchers to use.

Preprint, 2020, arXiv

Social Media Mining Toolkit (SMMT)

There has been a dramatic increase in the popularity of using social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minimal exceptions, the few that do place the burden on the researcher to figure out how to fetch the data, how best to format it, and how to create automatic and manual annotations on the acquired data. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools aimed at encapsulating the cumbersome details of acquiring, preprocessing, annotating, and standardizing social media data. The purpose of our toolkit is to let researchers focus on answering research questions rather than on the technical aspects of using social media data. By using a standard toolkit, researchers will be able to acquire, use, and release data in a consistent way that is transparent to everybody using the toolkit, thereby simplifying research reproducibility and accessibility in the social media domain.