Researcher profile

Diganta Saha

Diganta Saha contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2020arXiv

A Novel Approach to Enhance the Performance of Semantic Search in Bengali using Neural Net and other Classification Techniques

Search has for a long time been an important tool for users to retrieve information. Syntactic search is matching documents or objects containing specific keywords like user-history, location, preference etc. to improve the results. However, it is often possible that the query and the best answer have no term or very less number of terms in common and syntactic search can not perform properly in such cases. Semantic search, on the other hand, resolves these issues but suffers from lack of annotation, absence of WordNet in case of low resource languages. In this work, we have demonstrated an end to end procedure to improve the performance of semantic search using semi-supervised and unsupervised learning algorithms. An available Bengali repository was chosen to have seven types of semantic properties primarily to develop the system. Performance has been tested using Support Vector Machine, Naive Bayes, Decision Tree and Artificial Neural Network (ANN). Our system has achieved the efficiency to predict the correct semantics using knowledge base over the time of learning. A repository containing around a million sentences, a product of TDIL project of Govt. of India, was used to test our system at first instance. Then the testing has been done for other languages. Being a cognitive system it may be very useful for improving user satisfaction in e-Governance or m-Governance in the multilingual environment and also for other applications.

preprint2020arXiv

A Unified Framework for Nonmonotonic Reasoning with Vagueness and Uncertainty

An interval-valued fuzzy answer set programming paradigm is proposed for nonmonotonic reasoning with vague and uncertain information. The set of sub-intervals of $[0,1]$ is considered as truth-space. The intervals are ordered using preorder-based truth and knowledge ordering. The preorder based ordering is an enhanced version of bilattice-based ordering. The system can represent and reason with prioritized rules, rules with exceptions. An iterative method for answer set computation is proposed. The sufficient conditions for termination of iterations are identified for a class of logic programs using the notion of difference equations.

preprint2020arXiv

Automatic Extraction of Bengali Root Verbs using Paninian Grammar

In this research work, we have proposed an algorithm based on supervised learning methodology to extract the root forms of the Bengali verbs using the grammatical rules proposed by Panini [1] in Ashtadhyayi. This methodology can be applied for the languages which are derived from Sanskrit. The proposed system has been developed based on tense, person and morphological inflections of the verbs to find their root forms. The work has been executed in two phases: first, the surface level forms or inflected forms of the verbs have been classified into a certain number of groups of similar tense and person. For this task, a standard pattern, available in Bengali language has been used. Next, a set of rules have been applied to extract the root form from the surface level forms of a verb. The system has been tested on 10000 verbs collected from the Bengali text corpus developed in the TDIL project of the Govt. of India. The accuracy of the output has been achieved 98% which is verified by a linguistic expert. Root verb identification is a key step in semantic searching, multi-sentence search query processing, understanding the meaning of a language, disambiguation of word sense, classification of the sentences etc.

preprint2020arXiv

Improvement of electronic Governance and mobile Governance in Multilingual Countries with Digital Etymology using Sanskrit Grammar

With huge improvement of digital connectivity (Wifi,3G,4G) and digital devices access to internet has reached in the remotest corners now a days. Rural people can easily access web or apps from PDAs, laptops, smartphones etc. This is an opportunity of the Government to reach to the citizen in large number, get their feedback, associate them in policy decision with e governance without deploying huge man, material or resourses. But the Government of multilingual countries face a lot of problem in successful implementation of Government to Citizen (G2C) and Citizen to Government (C2G) governance as the rural people tend and prefer to interact in their native languages. Presenting equal experience over web or app to different language group of speakers is a real challenge. In this research we have sorted out the problems faced by Indo Aryan speaking netizens which is in general also applicable to any language family groups or subgroups. Then we have tried to give probable solutions using Etymology. Etymology is used to correlate the words using their ROOT forms. In 5th century BC Panini wrote Astadhyayi where he depicted sutras or rules -- how a word is changed according to person,tense,gender,number etc. Later this book was followed in Western countries also to derive their grammar of comparatively new languages. We have trained our system for automatic root extraction from the surface level or morphed form of words using Panian Gramatical rules. We have tested our system over 10000 bengali Verbs and extracted the root form with 98% accuracy. We are now working to extend the program to successfully lemmatize any words of any language and correlate them by applying those rule sets in Artificial Neural Network.

preprint2020arXiv

Modeling Uncertainty and Imprecision in Nonmonotonic Reasoning using Fuzzy Numbers

To deal with uncertainty in reasoning, interval-valued logic has been developed. But uniform intervals cannot capture the difference in degrees of belief for different values in the interval. To salvage the problem triangular and trapezoidal fuzzy numbers are used as the set of truth values along with traditional intervals. Preorder-based truth and knowledge ordering are defined over the set of fuzzy numbers defined over $[0,1]$. Based on this enhanced set of epistemic states, an answer set framework is developed, with properly defined logical connectives. This type of framework is efficient in knowledge representation and reasoning with vague and uncertain information under nonmonotonic environment where rules may posses exceptions.

preprint2015arXiv

Automatic classification of bengali sentences based on sense definitions present in bengali wordnet

Based on the sense definition of words available in the Bengali WordNet, an attempt is made to classify the Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of particular ambiguous lexical item is collected from Bengali WordNet. In an experimental basis we have used Naive Bayes probabilistic model as a useful classifier of sentences. We have applied the algorithm over 1747 sentences that contain a particular Bengali lexical item which, because of its ambiguous nature, is able to trigger different senses that render sentences in different meanings. In our experiment we have achieved around 84% accurate result on the sense classification over the total input sentences. We have analyzed those residual sentences that did not comply with our experiment and did affect the results to note that in many cases, wrong syntactic structures and less semantic information are the main hurdles in semantic classification of sentences. The applicational relevance of this study is attested in automatic text classification, machine learning, information extraction, and word sense disambiguation.

preprint2015arXiv

Word sense disambiguation: a survey

In this paper, we made a survey on Word Sense Disambiguation (WSD). Near about in all major languages around the world, research in WSD has been conducted upto different extents. In this paper, we have gone through a survey regarding the different approaches adopted in different research works, the State of the Art in the performance in this domain, recent works in different Indian languages and finally a survey in Bengali language. We have made a survey on different competitions in this field and the bench mark results, obtained from those competitions.

preprint2014arXiv

An Efficient Feature Selection in Classification of Audio Files

In this paper we have focused on an efficient feature selection method in classification of audio files. The main objective is feature selection and extraction. We have selected a set of features for further analysis, which represents the elements in feature vector. By extraction method we can compute a numerical representation that can be used to characterize the audio using the existing toolbox. In this study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute which will separate the tuples into different classes. The pulse clarity is considered as a subjective measure and it is used to calculate the gain of features of audio files. The splitting criterion is employed in the application to identify the class or the music genre of a specific audio file from testing database. Experimental results indicate that by using GR the application can produce a satisfactory result for music genre classification. After dimensionality reduction best three features have been selected out of various features of audio file and in this technique we will get more than 90% successful classification result.