Researcher profile

Mukesh Mohania

Mukesh Mohania contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
6 topics
4 close collaborators

Published work

8 published items

Preprint · 2022 · arXiv

'John ate 5 apples' != 'John ate some apples': Self-Supervised Paraphrase Quality Detection for Algebraic Word Problems

This paper introduces the novel task of scoring paraphrases for Algebraic Word Problems (AWP) and presents a self-supervised method for doing so. In the current online pedagogical setting, paraphrasing these problems is helpful for academicians to generate multiple syntactically diverse questions for assessments. It also helps induce variation to ensure that the student has understood the problem instead of just memorizing it or using unfair means to solve it. The current state-of-the-art paraphrase generation models often cannot effectively paraphrase word problems, losing a critical piece of information (such as numbers or units) which renders the question unsolvable. There is a need for paraphrase scoring methods in the context of AWP to enable the training of good paraphrasers. Thus, we propose ParaQD, a self-supervised paraphrase quality detection method using novel data augmentations that can learn latent representations to separate a high-quality paraphrase of an algebraic question from a poor one by a wide margin. Through extensive experimentation, we demonstrate that our method outperforms existing state-of-the-art self-supervised methods by up to 32% while also demonstrating impressive zero-shot performance.
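The failure mode described above, a paraphrase that drops a number and so makes the problem unsolvable, can be illustrated with a simple rule-based check. This is a minimal sketch and not ParaQD itself, which learns latent representations through self-supervision; the function names `extract_numbers` and `same_numbers` are hypothetical, and the check only covers digits, not units or spelled-out numbers.

```python
import re
from collections import Counter


def extract_numbers(text: str) -> Counter:
    """Return a multiset of the numeric tokens (integers or decimals) in text."""
    return Counter(re.findall(r"\d+(?:\.\d+)?", text))


def same_numbers(original: str, paraphrase: str) -> bool:
    """True if both texts contain exactly the same numeric tokens."""
    return extract_numbers(original) == extract_numbers(paraphrase)


if __name__ == "__main__":
    source = "John ate 5 apples"
    print(same_numbers(source, "5 apples were eaten by John"))
    print(same_numbers(source, "John ate some apples"))
```

A check like this can only flag one kind of information loss; a learned scorer is needed to judge semantic fidelity and fluency more broadly.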

Preprint · 2022 · arXiv

Auxiliary Task Guided Interactive Attention Model for Question Difficulty Prediction

Online learning platforms conduct exams to evaluate the learners in a monotonous way, where the questions in the database may be classified into Bloom's Taxonomy as varying levels in complexity from basic knowledge to advanced evaluation. The questions asked in these exams to all learners are very much static. It becomes important to ask new questions with different difficulty levels to each learner to provide a personalized learning experience. In this paper, we propose a multi-task method with an interactive attention mechanism, Qdiff, for jointly predicting Bloom's Taxonomy and difficulty levels of academic questions. We model the interaction between the predicted Bloom's Taxonomy representations and the input representations using an attention mechanism to aid in difficulty prediction. The proposed learning method would help learn representations that capture the relationship between Bloom's Taxonomy and difficulty labels. The proposed multi-task method learns a good input representation by leveraging the relationship between the related tasks and can be used in similar settings where the tasks are related. The results demonstrate that the proposed method performs better than training only on difficulty prediction. However, Bloom's labels may not always be given for some datasets. Hence, we soft label another dataset with a model fine-tuned to predict Bloom's labels to demonstrate the applicability of our method to datasets with only difficulty labels.

Preprint · 2022 · arXiv

BeautifAI -- A Personalised Occasion-oriented Makeup Recommendation System

With the global metamorphosis of the beauty industry and the rising demand for beauty products worldwide, the need for an efficacious makeup recommendation system has never been greater. Despite the significant advancements made towards personalised makeup recommendation, the current research still falls short of incorporating the context of occasion in makeup recommendation and integrating feedback for users. In this work, we propose BeautifAI, a novel makeup recommendation system, delivering personalised occasion-oriented makeup recommendations to users while providing real-time previews and continuous feedback. The proposed work's novel contributions, including the incorporation of occasion context, region-wise makeup recommendation, real-time makeup previews and continuous makeup feedback, set our system apart from the current work in makeup recommendation. We also demonstrate our proposed system's efficacy in providing personalised makeup recommendation by conducting a user study.

Preprint · 2022 · arXiv

K-12BERT: BERT for K-12 education

Online education platforms are powered by various NLP pipelines, which utilize models like BERT to aid in content curation. Since the inception of the pre-trained language models like BERT, there have also been many efforts toward adapting these pre-trained models to specific domains. However, there has not been a model specifically adapted for the education domain (particularly K-12) across subjects to the best of our knowledge. In this work, we propose to train a language model on a corpus of data curated by us across multiple subjects from various sources for K-12 education. We also evaluate our model, K12-BERT, on downstream tasks like hierarchical taxonomy tagging.

Preprint · 2022 · arXiv

Obj2Sub: Unsupervised Conversion of Objective to Subjective Questions

Exams are conducted to test the learner's understanding of the subject. To prevent the learners from guessing or exchanging solutions, the mode of tests administered must have sufficient subjective questions that can gauge whether the learner has understood the concept by mandating a detailed answer. Hence, in this paper, we propose a novel hybrid unsupervised approach leveraging rule-based methods and pre-trained dense retrievers for the novel task of automatically converting the objective questions to subjective questions. We observe that our approach outperforms the existing data-driven approaches by 36.45% as measured by Recall@k and Precision@k.
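For readers unfamiliar with the two metrics named above, the following sketch shows the standard definitions of Precision@k and Recall@k for a ranked list of retrieved items. It assumes the usual set-based formulation; the abstract does not say how the paper computes or averages them, and the function names are illustrative.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k ranked items."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]   # toy retrieval output, best first
    relevant = {"a", "c", "e"}      # toy ground truth
    print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))
```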

Preprint · 2022 · arXiv

TagRec++: Hierarchical Label Aware Attention Network for Question Categorization

Online learning systems have multiple data repositories in the form of transcripts, books and questions. To enable ease of access, such systems organize the content according to a well-defined taxonomy of hierarchical nature (subject-chapter-topic). The task of categorizing inputs to the hierarchical labels is usually cast as a flat multi-class classification problem. Such approaches ignore the semantic relatedness between the terms in the input and the tokens in the hierarchical labels. Alternative approaches also suffer from class imbalance when they only consider leaf-level nodes as labels. To tackle these issues, we formulate the task as a dense retrieval problem to retrieve the appropriate hierarchical labels for each content item. In this paper, we deal with categorizing questions. We model the hierarchical labels as a composition of their tokens and use an efficient cross-attention mechanism to fuse the information with the term representations of the content. We also propose an adaptive in-batch hard negative sampling approach which samples better negatives as the training progresses. We demonstrate that the proposed approach, TagRec++, outperforms existing state-of-the-art approaches on question datasets as measured by Recall@k. In addition, we demonstrate the zero-shot capabilities of TagRec++ and its ability to adapt to label changes.

Preprint · 2022 · arXiv

Topic Aware Contextualized Embeddings for High Quality Phrase Extraction

Keyphrase extraction from a given document is the task of automatically extracting salient phrases that best describe the document. This paper proposes a novel unsupervised graph-based ranking method to extract high-quality phrases from a given document. We obtain the contextualized embeddings from pre-trained language models enriched with topic vectors from Latent Dirichlet Allocation (LDA) to represent the candidate phrases and the document. We introduce a scoring mechanism for the phrases using the information obtained from contextualized embeddings and the topic vectors. The salient phrases are extracted using a ranking algorithm on an undirected graph constructed for the given document. In the undirected graph, the nodes represent the phrases, and the edges between the phrases represent the semantic relatedness between them, weighted by a score obtained from the scoring mechanism. To demonstrate the efficacy of our proposed method, we perform several experiments on open source datasets in the science domain and observe that our novel method outperforms existing unsupervised embedding based keyphrase extraction methods. For instance, on the SemEval2017 dataset, our method advances the F1 score from 0.2195 (EmbedRank) to 0.2819 at the top 10 extracted keyphrases. Several variants of the proposed algorithm are investigated to determine their effect on the quality of keyphrases. We further demonstrate the ability of our proposed method to collect additional high-quality keyphrases that are not present in the document from external knowledge bases like Wikipedia for enriching the document with newly discovered keyphrases. We evaluate this step on a collection of annotated documents. The F1-score at the top 10 expanded keyphrases is 0.60, indicating that our algorithm can also be used for 'concept' expansion using external knowledge.
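The graph-ranking step described above can be sketched as a weighted PageRank over an undirected phrase graph, in the style of TextRank. This is an assumption made for illustration: the abstract only says "a ranking algorithm" and does not name it. The edge weights below are made-up numbers standing in for the scores the paper derives from contextualized embeddings and LDA topic vectors, and `rank_phrases` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple


def rank_phrases(
    edges: Dict[Tuple[str, str], float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score nodes of an undirected weighted graph with weighted PageRank."""
    # Store each undirected edge in both directions.
    neighbours: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (u, v), weight in edges.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    nodes = list(neighbours)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    # Total edge weight attached to each node, used to normalise its outflow.
    strength = {node: sum(neighbours[node].values()) for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = sum(
                scores[nb] * weight / strength[nb]
                for nb, weight in neighbours[node].items()
                if strength[nb] > 0
            )
            updated[node] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    toy_edges = {
        ("hub phrase", "phrase a"): 1.0,
        ("hub phrase", "phrase b"): 1.0,
        ("hub phrase", "phrase c"): 1.0,
    }
    ranked = sorted(rank_phrases(toy_edges).items(), key=lambda kv: kv[1], reverse=True)
    for phrase, score in ranked:
        print(phrase, round(score, 3))
```

In this toy graph the phrase connected to all others receives the highest score; taking the top 10 phrases by score would correspond to the "top 10 extracted keyphrases" setting mentioned in the abstract.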

Preprint · 2022 · arXiv

Unsupervised Question Duplicate and Related Questions Detection in e-learning platforms

Online learning platforms provide diverse questions to gauge the learners' understanding of different concepts. The repository of questions has to be constantly updated to ensure a diverse pool of questions to conduct assessments for learners. However, it is impossible for the academician to manually skim through the large repository of questions to check for duplicates when onboarding new questions from external sources. Hence, we propose a tool QDup in this paper that can surface near-duplicate and semantically related questions without any supervised data. The proposed tool follows an unsupervised hybrid pipeline of statistical and neural approaches for incorporating different nuances in similarity for the task of question duplicate detection. We demonstrate that QDup can detect near-duplicate questions and also suggest related questions for practice with remarkable accuracy and speed from a large repository of questions. The demo video of the tool can be found at https://www.youtube.com/watch?v=loh0_-7XLW4.
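To make the idea of unsupervised near-duplicate detection concrete, the sketch below uses token-level Jaccard similarity as a stand-in for the statistical half of such a pipeline. This is an assumption: the abstract does not say which statistical or neural similarity measures QDup uses, and the function names and the 0.8 threshold are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two questions."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_near_duplicates(
    new_question: str, repository: List[str], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return repository questions scoring at or above threshold, best first."""
    scored = [(question, jaccard(new_question, question)) for question in repository]
    matches = [(question, score) for question, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    repo = ["What is the capital of France?", "Define photosynthesis."]
    print(find_near_duplicates("what is the capital of france", repo))
```

Lexical overlap alone misses rephrased questions, which is one reason a hybrid pipeline would pair it with a neural similarity model.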