Source author record

Vijay Mago

Vijay Mago appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Machine Learning, Artificial Intelligence, cs.CY, Social and Information Networks, Computer Vision, eess.SY, Information Retrieval, Neural and Evolutionary Computing, physics.soc-ph, Systems and Control

Catalog footprint

What is connected

9 works

11 topics

4 close collaborators

preprint, 2022, arXiv

A Survey on Automated Sarcasm Detection on Twitter

Automatic sarcasm detection is a growing field in computer science. Short text messages are increasingly used for communication, especially over social media platforms such as Twitter. Due to insufficient or missing context, unidentified sarcasm in these messages can invert the meaning of a statement, leading to confusion and communication failures. This paper covers a variety of current methods used for sarcasm detection, including detection by context, posting history and machine learning models. Additionally, a shift towards deep learning methods is observable, likely due to the benefit of using a model with induced instead of discrete features combined with the innovation of transformers.

preprint, 2022, arXiv

A Survey on Text Simplification

Text Simplification (TS) aims to reduce the linguistic complexity of content to make it easier to understand. Research in TS has been of keen interest, especially as approaches to TS have shifted from manual, hand-crafted rules to automated simplification. This survey seeks to provide a comprehensive overview of TS, including a brief description of earlier approaches used, a discussion of various aspects of simplification (lexical, semantic and syntactic), and the latest techniques being utilized in the field. We note that the research in the field has clearly shifted towards utilizing deep learning techniques to perform TS, with a specific focus on developing solutions to combat the lack of data available for simplification. We also include a discussion of datasets and evaluation metrics commonly used, along with a discussion of related fields within Natural Language Processing (NLP), like semantic similarity.

preprint, 2022, arXiv

Collision Detection: An Improved Deep Learning Approach Using SENet and ResNext

In recent years, with increased population and traffic on roadways, vehicle collision has become one of the leading causes of death worldwide. The automotive industry is motivated to develop techniques that use sensors and advances in computer vision to build collision detection and collision prevention systems to assist drivers. In this article, a deep-learning-based model comprising a ResNext architecture with SENet blocks is proposed. The performance of the model is compared to popular deep learning models like VGG16, VGG19, Resnet50, and stand-alone ResNext. The proposed model outperforms the existing baseline models, achieving a ROC-AUC of 0.91 while using a significantly smaller proportion of the GTACrash synthetic data for training, thus reducing the computational overhead.
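
For readers unfamiliar with the ROC-AUC metric reported above, the following is a minimal sketch of how it can be computed from classifier scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not taken from the paper or from the GTACrash data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration: 1 = collision frame, 0 = no collision.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")
```

The quadratic loop is chosen for readability; a sort-based implementation would be preferable for large evaluation sets.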

preprint, 2021, arXiv

Automating Transfer Credit Assessment in Student Mobility -- A Natural Language Processing-based Approach

Student mobility or academic mobility involves students moving between institutions during their post-secondary education, and one of the challenging tasks in this process is to assess the transfer credits to be offered to the incoming student. In general, this process involves domain experts comparing the learning outcomes of the courses to decide on offering transfer credits to the incoming students. This manual implementation is not only labor-intensive but also influenced by undue bias and administrative complexity. The proposed research article focuses on identifying a model that exploits the advancements in the field of Natural Language Processing (NLP) to effectively automate this process. Given the unique structure, domain specificity, and complexity of learning outcomes (LOs), a need for designing a tailor-made model arises. The proposed model uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of LOs and a transformer-based semantic similarity model to assess the semantic similarity of the LOs. The similarity between LOs is further aggregated to form course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset containing seven course-to-course similarity measures is proposed. Understanding the inherent need for flexibility in the decision-making process, the aggregation part of the model offers tunable parameters to accommodate different scenarios. While providing an efficient model to assess the similarity between courses with existing resources, this research work steers future research attempts to apply NLP in the field of articulation in an ideal direction by highlighting the persisting research gaps.
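
The step of aggregating LO-level similarity into a course-to-course score can be illustrated with a small sketch. This is not the paper's aggregation method: it assumes one simple rule (an LO of course A counts as matched when its best-scoring LO in course B reaches a threshold), and the function name `course_similarity`, the `threshold` parameter, and the score matrix are all hypothetical.

```python
def course_similarity(lo_sim, threshold=0.7):
    """Fraction of course A's learning outcomes that have a match in course B.

    lo_sim[i][j] is a similarity score in [0, 1] between LO i of course A and
    LO j of course B, produced by any semantic similarity model.
    threshold is a hypothetical tunable parameter, not a value from the paper.
    """
    if not lo_sim or not lo_sim[0]:
        raise ValueError("both courses need at least one learning outcome")
    matched = sum(1 for row in lo_sim if max(row) >= threshold)
    return matched / len(lo_sim)


# Invented LO-to-LO scores for two courses with three learning outcomes each.
lo_sim = [
    [0.91, 0.30, 0.12],
    [0.25, 0.74, 0.40],
    [0.10, 0.35, 0.55],
]
strict = course_similarity(lo_sim, threshold=0.7)   # two of three LOs match
lenient = course_similarity(lo_sim, threshold=0.5)  # all three LOs match
print(strict, lenient)
```

Lowering the threshold makes the assessment more lenient, which is one way a tunable parameter could accommodate different institutional policies.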

preprint, 2021, arXiv

Evolution of Semantic Similarity -- A Survey

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. In order to address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network-based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place, for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.

preprint, 2021, arXiv

Validating Optimal COVID-19 Vaccine Distribution Models

With the approval of vaccines for the coronavirus disease by many countries worldwide, most developed nations have begun the vaccination process, and developing nations are gearing up for it. This has created an urgent need to provide a solution to optimally distribute the available vaccines once they are received by the authorities. In this paper, we propose a clustering-based solution to select optimal distribution centers and a Constraint Satisfaction Problem framework to optimally distribute the vaccines, taking into consideration two factors, namely priority and distance. We demonstrate the efficiency of the proposed models using real-world data obtained from the district of Chennai, India. The model provides the decision-making authorities with optimal distribution centers across the district and the optimal allocation of individuals across these distribution centers, with the flexibility to accommodate a wide range of demographics.
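
To make the interplay of priority and distance concrete, here is a minimal sketch of an allocation step. It deliberately swaps in a simple greedy rule rather than a Constraint Satisfaction Problem solver, so it is not the paper's framework: people are served in priority order and each is sent to the nearest center with spare capacity. The function `allocate`, the data layout, and all coordinates and capacities are assumptions invented for illustration.

```python
import math


def allocate(people, centers):
    """Assign each person to the nearest center that still has capacity.

    people:  list of (person_id, priority, (x, y)); a lower priority number
             is served first.
    centers: dict of center_id -> {"pos": (x, y), "capacity": int}.
    Returns a dict person_id -> center_id, or None when every center is full.
    """
    remaining = {cid: c["capacity"] for cid, c in centers.items()}
    assignment = {}
    for pid, _priority, pos in sorted(people, key=lambda p: p[1]):
        open_centers = [cid for cid in centers if remaining[cid] > 0]
        if not open_centers:
            assignment[pid] = None
            continue
        best = min(open_centers, key=lambda cid: math.dist(pos, centers[cid]["pos"]))
        remaining[best] -= 1
        assignment[pid] = best
    return assignment


# Invented example: two centers and three people on a simple grid.
centers = {
    "north": {"pos": (0.0, 10.0), "capacity": 1},
    "south": {"pos": (0.0, 0.0), "capacity": 2},
}
people = [
    ("a", 2, (0.0, 9.0)),  # lower priority, closest to north
    ("b", 1, (0.0, 8.0)),  # higher priority, also closest to north
    ("c", 3, (0.0, 1.0)),
]
plan = allocate(people, centers)
print(plan)
```

In this toy run the higher-priority person takes the single slot at the nearer center, and the lower-priority person is redirected to the next open one. A greedy pass like this does not guarantee a globally optimal allocation, which is one reason a constraint-based formulation may be preferred.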

preprint, 2020, arXiv

A summary of the prevalence of Genetic Algorithms in Bioinformatics from 2015 onwards

In recent years, machine learning has seen an increasing presence in a large variety of fields, especially in health care and bioinformatics. More specifically, the field where machine learning algorithms have found most applications is Genetic Algorithms. The objective of this paper is to conduct a survey of articles published from 2015 onwards that deal with Genetic Algorithms (GA) and how they are used in bioinformatics. To achieve the objective, a scoping review was conducted that utilized Google Scholar alongside Publish or Perish and the Scimago Journal & Country Rank to search for respectable sources. Upon analyzing 31 articles from the field of bioinformatics, it became apparent that genetic algorithms rarely form a full application, instead they rely on other vital algorithms such as support vector machines. Indeed, support vector machines were the most prevalent algorithms used alongside genetic algorithms; however, while the usage of such algorithms contributes to the heavy focus on accuracy by GA programs, it often sidelines computation times in the process. In fact, most applications employing GAs for classification and feature selection are nearing or at 100% success rate, and the focus of future GA development should be directed elsewhere. Population-based searches, like GA, are often combined with other machine learning algorithms. In this scoping review, genetic algorithms combined with Support Vector Machines were found to perform best. The performance metric that was evaluated most often was accuracy. Measuring the accuracy avoids measuring the main weakness of GAs, which is computational time. The future of genetic algorithms could be open-ended evolutionary algorithms, which attempt to increase complexity and find diverse solutions, rather than optimize a fitness function and converge to a single best solution from the initial population of solutions.
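
As background for the abstract above, the following is a minimal sketch of a conventional genetic algorithm that optimizes a fitness function and converges on a single best solution, which is the behaviour the abstract contrasts with open-ended evolutionary algorithms. The problem (maximizing the number of 1-bits, often called OneMax), the function names, and every parameter value are assumptions chosen for illustration; none come from the reviewed articles.

```python
import random


def fitness(bits):
    """Fitness of a candidate: the number of 1-bits it contains."""
    return sum(bits)


def tournament(population, rng):
    """Tournament selection: return the fitter of two random individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a basic GA and return (best initial fitness, best final fitness)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)

    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite[:]]  # elitism: the current best always survives
        while len(children) < pop_size:
            mum = tournament(population, rng)
            dad = tournament(population, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children

    final_best = max(fitness(ind) for ind in population)
    return initial_best, final_best


initial_best, final_best = run_ga()
print(f"best fitness: {initial_best} -> {final_best} (maximum is 20)")
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease. Each generation also re-evaluates the whole population, which hints at the computational cost the abstract identifies as the main weakness of GAs.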

preprint, 2020, arXiv

The Homophily Principle in Social Network Analysis

In recent years, social media has become a ubiquitous and integral part of social networking. A major focus of social researchers is the tendency of like-minded people to interact with one another in social groups, a concept known as homophily. The study of homophily can provide valuable insights into the flow of information and behaviors within a society, and it has been extremely useful in analyzing the formation of online communities. In this paper, we review and survey the effect of homophily in social networks and summarize the state-of-the-art methods that have been proposed in past years to identify and measure the effect of homophily in multiple types of social networks. We conclude with a critical discussion of open challenges and directions for future research.
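
One simple way to quantify homophily in a network is the share of edges that connect nodes with the same attribute. The sketch below illustrates that idea only; it is not claimed to be one of the methods covered by the survey, and the function name `edge_homophily`, the node names, and the group labels are invented for illustration.

```python
def edge_homophily(edges, group):
    """Share of edges whose two endpoints belong to the same group."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if group[u] == group[v])
    return same / len(edges)


# Invented friendship network with two groups of users.
group = {"ana": "red", "ben": "red", "cat": "blue", "dan": "blue", "eve": "red"}
edges = [
    ("ana", "ben"),  # red-red
    ("ana", "eve"),  # red-red
    ("cat", "dan"),  # blue-blue
    ("ben", "cat"),  # red-blue
    ("eve", "dan"),  # red-blue
]
ratio = edge_homophily(edges, group)
print(f"same-group edges: {ratio:.0%}")
```

A ratio near 1 suggests that ties form mostly within groups. Interpreting the value properly requires comparing it with what random mixing would produce for the same group sizes, which this sketch does not attempt.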

preprint, 2020, arXiv

Utilizing Deep Learning to Identify Drug Use on Twitter Data

The collection and examination of social media has become a useful mechanism for studying the mental activity and behavior tendencies of users. Through the analysis of collected Twitter data, models were developed for classifying drug-related tweets. Using topic-pertaining keywords, such as slang and methods of drug consumption, a set of tweets was generated. Potential candidates were then preprocessed, resulting in a dataset of 3,696,150 rows. The classification power of multiple methods was compared, including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers. Rather than simple feature or attribute analysis, a deep learning approach was implemented to screen and analyze the tweets' semantic meaning. The two CNN-based classifiers presented the best results when compared against other methodologies. The first was trained with 2,661 manually labeled samples, while the other included synthetically generated tweets, culminating in 12,142 samples. The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. Additionally, association rule mining showed that commonly mentioned drugs had a level of correspondence with frequently used illicit substances, proving the practical usefulness of the system. Lastly, the synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
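
Association rule mining, mentioned above, rests on two basic measures: support (how often a set of items appears together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both for a single rule. The function name `rule_metrics` and the placeholder terms are assumptions for illustration; the data is invented and does not reproduce any result from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    has_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    has_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if total == 0 or has_antecedent == 0:
        raise ValueError("the antecedent never occurs in the transactions")
    support = has_both / total
    confidence = has_both / has_antecedent
    return support, confidence


# Each set stands for the placeholder terms mentioned in one tweet (invented).
tweets = [
    {"term_a", "term_b"},
    {"term_a", "term_b", "term_c"},
    {"term_a"},
    {"term_c"},
]
support, confidence = rule_metrics(tweets, {"term_a"}, {"term_b"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here the rule holds in two of four tweets (support 0.50) and in two of the three tweets that mention the antecedent (confidence of about 0.67).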
