Source author record

Chowdhury Rafeed Rahman

Chowdhury Rafeed Rahman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Computer Vision, Machine Learning, Genomics, Information Retrieval, Quantitative Methods

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators

preprint, 2024, arXiv

BSpell: A CNN-Blended BERT Based Bangla Spell Checker

Bangla typing is mostly performed using an English keyboard and can be highly erroneous due to the presence of compound and similarly pronounced letters. Correcting a misspelled word requires understanding both the word typing pattern and the context in which the word is used. This paper proposes a specialized BERT model named BSpell, targeted at word-for-word correction at the sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet along with a specialized auxiliary loss. This allows BSpell to specialize in highly inflected Bangla vocabulary in the presence of spelling errors. Furthermore, a hybrid pretraining scheme is proposed for BSpell that combines word-level and character-level masking. Comparison on two Bangla and one Hindi spelling correction datasets shows the superiority of our proposed approach. BSpell is available as a Bangla spell checking tool via GitHub: https://github.com/Hasiburshanto/Bangla-Spell-Checker
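The idea of mixing word-level and character-level masking can be illustrated with a minimal sketch. The function name `hybrid_mask`, the mask symbols and the probabilities are assumptions made for illustration; they are not taken from the BSpell paper or its repository.

```python
import random

WORD_MASK = "[MASK]"  # hypothetical word-level mask token
CHAR_MASK = "#"       # hypothetical character-level mask symbol


def hybrid_mask(words, word_prob=0.15, char_prob=0.15, seed=None):
    """Return (masked_words, targets) for one tokenized sentence.

    Each word is either replaced whole by WORD_MASK (word-level masking),
    has one randomly chosen character replaced by CHAR_MASK
    (character-level masking), or is left untouched.
    targets[i] is the original word at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for word in words:
        roll = rng.random()
        if roll < word_prob:
            # Word-level masking: hide the entire word.
            masked.append(WORD_MASK)
            targets.append(word)
        elif roll < word_prob + char_prob and word:
            # Character-level masking: hide a single character.
            pos = rng.randrange(len(word))
            masked.append(word[:pos] + CHAR_MASK + word[pos + 1:])
            targets.append(word)
        else:
            masked.append(word)
            targets.append(None)
    return masked, targets
```

A model pretrained on such pairs would be asked to recover the original word at every position where `targets` is not `None`.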

preprint, 2022, arXiv

Automatic Signboard Detection and Localization in Densely Populated Developing Cities

Most establishments in the cities of developing countries are digitally unlabeled because of the lack of automatic annotation systems. Hence location and trajectory services such as Google Maps, Uber, etc. remain underutilized in such cities. Accurate signboard detection in natural scene images is the foremost task for error-free information retrieval from such city streets. Yet developing an accurate signboard localization system is still an unresolved challenge because signboards have diverse appearances that include textual images and perplexing backgrounds. We present a novel object detection approach that can detect signboards automatically and is suitable for such cities. We use Faster R-CNN based localization, incorporating two specialized pretraining methods and a run-time-efficient hyperparameter value selection algorithm. We have taken an incremental approach in reaching our final proposed method through detailed evaluation and comparison with baselines using our constructed SVSO (Street View Signboard Objects) dataset, which contains natural scene images of signboards from six developing countries. We demonstrate state-of-the-art performance of our proposed method on both the SVSO dataset and the Open Image Dataset. Our proposed method can detect signboards accurately (even if the images contain multiple signboards with diverse shapes and colours in a noisy background), achieving a 0.90 mAP (mean average precision) score on the SVSO independent test set. Our implementation is available at: https://github.com/sadrultoaha/Signboard-Detection
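Detection scores such as mAP rest on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below shows that overlap measure only; the `(x1, y1, x2, y2)` box layout and the function name `iou` are assumptions for illustration and are not taken from the authors' implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    Returns a value in [0, 1]; 0.0 when the boxes do not overlap.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted signboard box is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold used for the reported score is not stated in the abstract.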

preprint, 2022, arXiv

Judge a Sentence by Its Content to Generate Grammatical Errors

Data sparsity is a well-known problem for grammatical error correction (GEC). Generating synthetic training data is one widely proposed solution to this problem, and has allowed models to achieve state-of-the-art (SOTA) performance in recent years. However, these methods often generate unrealistic errors, or aim to generate sentences with only one error. We propose a learning-based two-stage method for synthetic data generation for GEC that relaxes the constraint that a sentence contains only one error. Errors are generated in accordance with sentence merit. We show that a GEC model trained on our synthetically generated corpus outperforms models trained on synthetic data from prior work.

preprint, 2021, arXiv

Confronting the Constraints for Optical Character Segmentation from Printed Bangla Text Image

In a world of digitization, optical character recognition holds the key to automating written history. An optical character recognition system converts printed images into editable text for better storage and usability. To be fully functional, the system needs to go through crucial steps such as pre-processing and segmentation. Pre-processing removes noise and skew from the printed data, whereas segmentation splits the image precisely into lines, words and characters for better conversion. These steps open the door to better accuracy and consistent results when preparing a printed image for conversion. Our proposed algorithm is able to segment characters from both ideal and non-ideal cases of scanned or captured images, giving a sustainable outcome. The implementation of our work is provided here: https://cutt.ly/rgdfBIa

preprint, 2020, arXiv

A Hybrid Approach Towards Two Stage Bengali Question Classification Utilizing Smart Data Balancing Technique

Question classification (QC) is the primary step of a question answering (QA) system. A QC system assigns questions to particular classes so that the QA system can provide correct answers. Our system categorizes factoid questions asked in natural language after extracting features from the questions. We present a two-stage QC system for Bengali. In the first stage, it uses a one-dimensional convolutional neural network (1D CNN) to classify questions into coarse classes. Word2vec representations of the words in the question corpus have been constructed and used to assist the 1D CNN. A smart data balancing technique has been employed to give the data-hungry convolutional neural network a greater number of effective samples to learn from. For each coarse class, a separate Stochastic Gradient Descent (SGD) based classifier differentiates among the finer classes within that coarse class. The TF-IDF representation of each word is used as a feature for the SGD classifiers in the second stage. Experiments show the effectiveness of our proposed method for Bengali question classification.
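The two-stage design, one coarse classifier followed by a separate fine classifier per coarse class, amounts to a routing step. The sketch below shows only that routing. The rule-based toy classifiers and the label names (`PERSON`, `NUMERIC`, and so on) are hypothetical stand-ins for the paper's 1D CNN and SGD classifiers, which are not reproduced here.

```python
from typing import Callable, Dict, Tuple

Classifier = Callable[[str], str]


def make_two_stage(
    coarse: Classifier, fine_by_coarse: Dict[str, Classifier]
) -> Callable[[str], Tuple[str, str]]:
    """Build a classifier that predicts a coarse class first, then routes
    the question to the fine classifier registered for that coarse class."""

    def classify(question: str) -> Tuple[str, str]:
        coarse_label = coarse(question)
        fine_label = fine_by_coarse[coarse_label](question)
        return coarse_label, fine_label

    return classify


# Toy stand-ins (assumptions): simple rules instead of trained models.
def toy_coarse(question: str) -> str:
    return "PERSON" if question.lower().startswith("who") else "NUMERIC"


toy_fine = {
    "PERSON": lambda q: "INDIVIDUAL",
    "NUMERIC": lambda q: "DATE" if "when" in q.lower() else "COUNT",
}

classify = make_two_stage(toy_coarse, toy_fine)
```

Because each fine classifier only ever sees questions from its own coarse class, it has fewer labels to separate than a single flat classifier would.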

preprint, 2020, arXiv

i6mA-CNN: a convolution based computational approach towards identification of DNA N6-methyladenine sites in rice genome

DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification and is responsible for many biological functions. Experimental methods for genome-wide 6mA site detection are expensive and labour intensive. Automated and accurate computational methods can help to identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neural network based tool, i6mA-CNN, capable of identifying 6mA sites in the rice genome. Our model coordinates among multiple types of features such as a PseAAC-inspired customized feature vector, multiple one-hot representations and dinucleotide physicochemical properties. It achieves an area under the receiver operating characteristic curve of 0.98 with an overall accuracy of 0.94 using 5-fold cross-validation on a benchmark dataset. Finally, we evaluate our model on two other plant genome 6mA site identification datasets besides rice. Results suggest that our proposed tool is able to generalize its 6mA site identification ability to plant genomes irrespective of plant species. The web tool for this research can be found at: https://cutt.ly/Co6KuWG. Supplementary data (benchmark dataset, independent test dataset, comparison purpose dataset, trained model, physicochemical property values, attention mechanism details for motif finding) are available at https://cutt.ly/PpDdeDH.
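Two of the feature types named above, one-hot representations and dinucleotide-based properties, start from simple sequence encodings. The sketch below shows one common way to produce them. The function names and the A, C, G, T column order are assumptions, and the physicochemical property values themselves are left out because they are not given in the abstract.

```python
NUCLEOTIDES = "ACGT"  # assumed column order for the one-hot vectors


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # unknown symbols such as N stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def dinucleotides(seq):
    """Return the overlapping dinucleotides of a DNA string, in order.

    Each dinucleotide could then be looked up in a table of
    physicochemical property values supplied by the user.
    """
    s = seq.upper()
    return [s[i:i + 2] for i in range(len(s) - 1)]
```

A sequence of length n yields an n by 4 one-hot matrix and n - 1 dinucleotides.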

preprint, 2020, arXiv

Identification and Recognition of Rice Diseases and Pests Using Convolutional Neural Networks

Accurate and timely detection of diseases and pests in rice plants can help farmers apply timely treatment and thereby substantially reduce economic losses. Recent developments in deep learning based convolutional neural networks (CNNs) have greatly improved image classification accuracy. Motivated by the success of CNNs in image classification, this paper develops deep learning based approaches for detecting diseases and pests from rice plant images. The contribution of this paper is twofold: (i) State-of-the-art large-scale architectures such as VGG16 and InceptionV3 have been adopted and fine-tuned for detecting and recognizing rice diseases and pests. Experimental results show the effectiveness of these models with real datasets. (ii) Since large-scale architectures are not suitable for mobile devices, a two-stage small CNN architecture has been proposed and compared with state-of-the-art memory-efficient CNN architectures such as MobileNet, NasNet Mobile and SqueezeNet. Experimental results show that the proposed architecture can achieve the desired accuracy of 93.3% with a significantly reduced model size (e.g., 99% smaller than VGG16).

preprint, 2020, arXiv

iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters

A promoter is a short region of DNA responsible for initiating transcription of specific genes. Computational tools for automatic identification of promoters are in high demand. Promoters can be of different types according to their function. Promoters may show both intra-class and inter-class variation and similarity in terms of consensus sequences. Accurate classification of the various types of sigma promoters remains a challenge. We present iPromoter-BnCNN for identification and accurate classification of six types of promoters: sigma24, sigma28, sigma32, sigma38, sigma54 and sigma70. It is a Convolutional Neural Network (CNN) based classifier that combines local features related to monomer nucleotide sequence, trimer nucleotide sequence, dimer structural properties and trimer structural properties through parallel branching. We conducted experiments on a benchmark dataset and compared against two state-of-the-art tools using 5-fold cross-validation to show the superiority of our approach. Moreover, we tested our classifier on an independent test dataset. Our proposed tool, the iPromoter-BnCNN web server, is freely available at http://103.109.52.8/iPromoter-BnCNN. The runnable source code can be found at https://colab.research.google.com/drive/1yWWh7BXhsm8U4PODgPqlQRy23QGjF2DZ.
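The monomer and trimer sequence inputs that feed separate branches can be pictured as k-mer index sequences computed from the same DNA string. The sketch below prepares only those two inputs. The function names and the integer encoding are assumptions, and the structural-property branches and the network itself are not shown.

```python
from itertools import product


def kmer_vocab(k):
    """Map every DNA k-mer over A, C, G, T to an integer id (4**k ids)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_ids(seq, k):
    """Encode a DNA string as the ids of its overlapping k-mers.

    Raises KeyError for symbols outside A, C, G, T.
    """
    vocab = kmer_vocab(k)
    s = seq.upper()
    return [vocab[s[i:i + k]] for i in range(len(s) - k + 1)]


def branch_inputs(seq):
    """Prepare one input per sequence branch from the same promoter string."""
    return {"monomer": kmer_ids(seq, 1), "trimer": kmer_ids(seq, 3)}
```

Each branch can then apply its own convolutions before the branch outputs are merged for classification.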

preprint, 2020, arXiv

Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

While writing Bengali using an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module largely depends on the kind of error dataset it is based on. Manual generation of such an error dataset is a cumbersome process. In this research, we present an algorithm for automatically generating misspelled Bengali words from correct words by analyzing Bengali writing patterns on a QWERTY layout English keyboard. As part of our analysis, we have formed a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, frequently mispressed insertion-prone clusters and some rules for handling Juktakkhar (consonant letter clusters) while generating errors.
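Replacement within a cluster of confusable letters can be sketched as follows. The three clusters listed are small illustrative examples of similarly pronounced Bengali letters chosen for this sketch; they are not the cluster lists compiled in the paper, and the function name `misspellings` is likewise an assumption.

```python
# Illustrative clusters only (assumption): letters that sound alike.
SIMILAR_CLUSTERS = [
    {"শ", "ষ", "স"},
    {"ন", "ণ"},
    {"ি", "ী"},
]


def misspellings(word, clusters=SIMILAR_CLUSTERS):
    """Return variants of `word` with exactly one character replaced by
    another member of the cluster that character belongs to."""
    results = []
    for pos, ch in enumerate(word):
        for cluster in clusters:
            if ch in cluster:
                # sorted() keeps the output order deterministic.
                for alt in sorted(cluster - {ch}):
                    results.append(word[:pos] + alt + word[pos + 1:])
    return results
```

Mispressed-key and insertion-prone clusters could be handled by the same loop with different cluster tables; Juktakkhar rules would need extra logic that is not sketched here.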
