Researcher profile

Dipankar Das

Dipankar Das contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint | 2022 | arXiv

Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?

Identifying argument components from unstructured texts and predicting the relationships expressed among them are two primary steps of argument mining. The intrinsic complexity of these tasks demands powerful learning models. While pretrained Transformer-based Language Models (LM) have been shown to provide state-of-the-art results over different NLP tasks, the scarcity of manually annotated data and the highly domain-dependent nature of argumentation restrict the capabilities of such models. In this work, we propose a novel transfer learning strategy to overcome these challenges. We utilize argumentation-rich social discussions from the ChangeMyView subreddit as a source of unsupervised, argumentative discourse-aware knowledge by finetuning pretrained LMs on a selectively masked language modeling task. Furthermore, we introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed finetuning method while leveraging the discourse context. Exhaustive experiments show the generalization capability of our method on these two tasks over within-domain as well as out-of-domain datasets, outperforming several existing and employed strong baselines.

preprint | 2022 | arXiv

Diluting quark flavor hierarchies using dihedral symmetry

We present a $D_4$ flavored extension of the SM which provides an intuitive reasoning for the masses and mixing patterns in the quark sector. In our model, the Cabibbo mixing angle stems purely from the scalar sector dynamics. In fact, the orders of magnitude of the CKM matrix elements are readily obtained from the hierarchical nature of the vacuum expectation values. Moreover, we also show that the smallness of the off-Cabibbo elements in the CKM matrix is strongly connected to the heaviness of the third generation of quarks.

preprint | 2022 | arXiv

JU_NLP at HinglishEval: Quality Evaluation of the Low-Resource Code-Mixed Hinglish Text

In this paper, we describe a system submitted to the INLG 2022 Generation Challenge (GenChal) on Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text. We implement a Bi-LSTM-based neural network model to predict the Average rating score and Disagreement score of the synthetic Hinglish dataset. In our models, we use word embeddings for English and Hindi data, and one-hot encodings for Hinglish data. We achieve an F1 score of 0.11 and a mean squared error of 6.0 in the Average rating score prediction task. In the Disagreement score prediction task, we achieve an F1 score of 0.18 and a mean squared error of 5.0.

preprint | 2022 | arXiv

Measuring frequency and period separations in red-giant stars using machine learning

Asteroseismology is used to infer the interior physics of stars. The Kepler and TESS space missions have provided a vast data set of red-giant light curves, which may be used for asteroseismic analysis. These data sets are expected to grow significantly with future missions such as PLATO, and efficient methods are therefore required to analyze these data rapidly. Here, we describe a machine learning algorithm that identifies red giants from the raw oscillation spectra and captures p and mixed mode parameters from the red-giant power spectra. We report algorithmic inferences for large frequency separation ($Δν$), frequency at maximum amplitude ($ν_{max}$), and period separation ($ΔΠ$) for an ensemble of stars. In addition, we have discovered $\sim$25 new probable red giants among 151,000 Kepler long-cadence stellar-oscillation spectra analyzed by the method, among which four are binary candidates which appear to possess red-giant counterparts. To validate the results of this method, we selected $\sim$3,000 Kepler stars, at various evolutionary stages ranging from subgiants to red clumps, and compared inferences of $Δν$, $ΔΠ$, and $ν_{max}$ with estimates obtained using other techniques. The power of the machine-learning algorithm lies in its speed: it is able to accurately extract seismic parameters from 1,000 spectra in $\sim$5 seconds on a modern computer (single core of the Intel Xeon Platinum 8280 CPU).

preprint | 2021 | arXiv

A three Higgs doublet model with symmetry-suppressed flavour changing neutral currents

We construct a three-Higgs doublet model with a flavour non-universal ${\rm U}(1)\times \mathbb{Z}_2$ symmetry. That symmetry induces suppressed flavour-changing interactions mediated by neutral scalars. New scalars with masses below the TeV scale can still successfully negotiate the constraints arising from flavour data. Such a model can thus encourage direct searches for extra Higgs bosons in the future collider experiments, and includes a non-trivial flavour structure.

preprint | 2020 | arXiv

Crossed two Higgs-doublet models: reduction of Yukawa parameters in the low-scale limit of left-right symmetry and other avatars

We present new variants of the Two Higgs-Doublet Model where all Yukawa couplings with physical Higgs bosons are controlled by the quark mixing matrices of both chiralities, as well as, in one case, the ratio between the two scalar doublets' vacuum expectation values. We obtain these by imposing approximate symmetries on the Lagrangian which, in one of the cases, clearly reveals the model to be the electroweak remnant of the Minimal Left-Right Symmetric Model. We also argue for the benefits of the bidoublet notation in the Two Higgs-Doublet Model context for uncovering new models.

preprint | 2020 | arXiv

Development of POS tagger for English-Bengali Code-Mixed data

Code-mixed texts are widespread nowadays due to the advent of social media. Since these texts combine two languages to formulate a sentence, they give rise to various research problems related to Natural Language Processing. In this paper, we explore one such problem, namely, Parts of Speech tagging of code-mixed texts. We have built a system that can POS tag English-Bengali code-mixed data where the Bengali words are written in Roman script. Our approach initially involves the collection and cleaning of English-Bengali code-mixed tweets. These tweets were used as a development dataset for building our system. The proposed system is a modular approach that starts by tagging individual tokens with their respective languages and then passes them to different POS taggers, designed for different languages (English and Bengali, in our case). Tags given by the two systems are later joined together and the final result is then mapped to a universal POS tag set. Our system was evaluated on 100 manually POS-tagged code-mixed sentences and returned an accuracy of 75.29%.
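
The modular pipeline described in this abstract (language identification per token, a language-specific POS tagger, then mapping to a universal tag set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy lexicons, the tag names, the mapping tables, and the function names `identify_language` and `tag_code_mixed` are all hypothetical stand-ins for the trained components a real system would use.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons standing in for trained, language-specific POS taggers.
ENGLISH_POS: Dict[str, str] = {"love": "VBP", "this": "DT", "movie": "NN"}
BENGALI_POS: Dict[str, str] = {"ami": "PR", "khub": "RB", "bhalo": "JJ"}  # Romanized Bengali

# Hypothetical mappings from each tagger's own tags to a shared universal tag set.
TO_UNIVERSAL: Dict[str, Dict[str, str]] = {
    "en": {"VBP": "VERB", "DT": "DET", "NN": "NOUN"},
    "bn": {"PR": "PRON", "RB": "ADV", "JJ": "ADJ"},
}


def identify_language(token: str) -> str:
    """Step 1: label a token as English ('en'), Bengali ('bn'), or unknown ('unk')."""
    word = token.lower()
    if word in ENGLISH_POS:
        return "en"
    if word in BENGALI_POS:
        return "bn"
    return "unk"


def tag_code_mixed(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Steps 2 and 3: route each token to its language's tagger, then map to universal tags."""
    taggers = {"en": ENGLISH_POS, "bn": BENGALI_POS}
    result = []
    for token in tokens:
        lang = identify_language(token)
        if lang == "unk":
            # Tokens that neither tagger recognizes get the catch-all tag "X".
            result.append((token, lang, "X"))
            continue
        native_tag = taggers[lang][token.lower()]
        result.append((token, lang, TO_UNIVERSAL[lang][native_tag]))
    return result


if __name__ == "__main__":
    for row in tag_code_mixed("ami love this movie".split()):
        print(row)
```

The per-token routing keeps the two taggers independent, so either one could be swapped for a stronger model without touching the other; only the mapping to the shared tag set would need to stay in sync.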

preprint | 2020 | arXiv

Double Higgs boson production as an exclusive probe for a sequential fourth generation with wrong-sign Yukawa couplings

It has been shown that the data from the Large Hadron Collider (LHC) does not rule out a chiral sequential fourth generation of fermions that obtain their masses through an identical mechanism as the other three generations do. However, this is possible only if the scalar sector of the Standard Model is suitably enhanced, like embedding it in a type-II two-Higgs doublet model. In this article, we try to show that double Higgs production (DHP) can unveil the existence of such a hidden fourth generation in a very efficient way. While the DHP cross-section in the SM is quite small, it is significantly enhanced with a fourth generation. We perform a detailed analysis of the dependence of the DHP cross-section on the model parameters, and show that either a positive signal of DHP is seen in the early next run of the LHC, or the model is ruled out.

preprint | 2020 | arXiv

Investigating Deep Learning Approaches for Hate Speech Detection in Social Media

The phenomenal growth of the internet has helped empower individuals' expression, but the misuse of freedom of expression has also led to an increase in various cyber crimes and anti-social activities. Hate speech is one such issue that needs to be addressed very seriously, as otherwise it could pose threats to the integrity of the social fabric. In this paper, we propose deep learning approaches utilizing various embeddings for detecting various types of hate speech in social media. Detecting hate speech from a large volume of text, especially tweets, which contain limited contextual information, also poses several practical challenges. Moreover, the variety in user-generated data and the presence of various forms of hate speech make it very challenging to identify the degree and intention of the message. Our experiments on three publicly available datasets of different domains show a significant improvement in accuracy and F1-score.

preprint | 2020 | arXiv

JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English code mixed data using Grid Search Cross Validation

Code-mixing is a phenomenon which arises mainly in multilingual societies. Multilingual people, who are well versed in their native languages and are also English speakers, tend to code-mix using English-based phonetic typing and the insertion of anglicisms in their main language. This linguistic phenomenon poses a great challenge to conventional NLP domains such as Sentiment Analysis, Machine Translation, and Text Summarization, to name a few. In this work, we focus on working out a plausible solution for Code-Mixed Sentiment Analysis. This work was done as participation in the SemEval-2020 Sentimix Task, where we focused on the sentiment analysis of English-Hindi code-mixed sentences. Our username for the submission was "sainik.mahata" and our team name was "JUNLP". We used feature extraction algorithms in conjunction with traditional machine learning algorithms such as SVR and Grid Search in an attempt to solve the task. Our approach garnered an F1-score of 66.2% when tested using metrics prepared by the organizers of the task.

preprint | 2020 | arXiv

K-TanH: Efficient TanH For Deep Learning

We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular activation function TanH for Deep Learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating point operation needed), where parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical formats, such as Float32 and BFloat16. High-quality approximations to other activation functions, e.g., Sigmoid, Swish and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates $>5\times$ speed up over Intel SVML, and it is consistently superior in efficiency over other approximations that use floating point arithmetic. Finally, we achieve state-of-the-art BLEU score and convergence results for training the language translation model GNMT on WMT16 data sets with approximate TanH obtained via K-TanH on BFloat16 inputs.

preprint | 2020 | arXiv

Preparation of Sentiment tagged Parallel Corpus and Testing its effect on Machine Translation

In the current work, we explore the enrichment in the machine translation output when the training parallel corpus is augmented with the introduction of sentiment analysis. The paper discusses the preparation of this sentiment-tagged English-Bengali parallel corpus. The preparation of the raw parallel corpus, the sentiment analysis of the sentences, and the training of a character-based Neural Machine Translation model on it are discussed extensively. The output of the translation model is compared with that of a baseline translation model, both manually and using automated metrics such as BLEU and TER.