Source author record

M. Sohel Rahman

M. Sohel Rahman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms; Computational Engineering, Finance, and Science; Discrete Mathematics; Machine Learning; Artificial Intelligence; Computer Vision; Computation and Language; Computational Geometry; Cryptography and Security; eess.IV; eess.SP; Information Retrieval; math.CO; math.OC; Neural and Evolutionary Computing; q-fin.ST

Catalog footprint

What is connected

22 works

16 topics

4 close collaborators


preprint · 2022 · arXiv

BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla

In this work, we introduce BanglaBERT, a BERT-based Natural Language Understanding (NLU) model pretrained in Bangla, a widely spoken yet low-resource language in the NLP literature. To pretrain BanglaBERT, we collect 27.5 GB of Bangla pretraining data (dubbed 'Bangla2B+') by crawling 110 popular Bangla sites. We introduce two downstream task datasets on natural language inference and question answering and benchmark on four diverse NLU tasks covering text classification, sequence labeling, and span prediction. In the process, we bring them under the first-ever Bangla Language Understanding Benchmark (BLUB). BanglaBERT achieves state-of-the-art results outperforming multilingual and monolingual models. We are making the models, datasets, and a leaderboard publicly available at https://github.com/csebuetnlp/banglabert to advance Bangla NLP.

preprint · 2022 · arXiv

Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark

Lung cancer is one of the deadliest cancers, and in part its effective diagnosis and treatment depend on the accurate delineation of the tumor. Human-centered segmentation, which is currently the most common approach, is subject to inter-observer variability, and is also time-consuming, considering the fact that only experts are capable of providing annotations. Automatic and semi-automatic tumor segmentation methods have recently shown promising results. However, as different researchers have validated their algorithms using various datasets and performance metrics, reliably evaluating these methods is still an open challenge. The goal of the Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark, created through the 2018 IEEE Video and Image Processing (VIP) Cup competition, is to provide a unique dataset and pre-defined metrics, so that different researchers can develop and evaluate their methods in a unified fashion. The 2018 VIP Cup started with a global engagement from 42 countries to access the competition data. At the registration stage, there were 129 members clustered into 28 teams from 10 countries, out of which 9 teams made it to the final stage and 6 teams successfully completed all the required tasks. In a nutshell, all the algorithms proposed during the competition are based on deep learning models combined with a false positive reduction technique. Methods developed by the three finalists show promising results in tumor segmentation; however, more effort should be put into reducing the false positive rate. This competition manuscript presents an overview of the VIP-Cup challenge, along with the proposed algorithms and results.

preprint · 2021 · arXiv

A Shallow U-Net Architecture for Reliably Predicting Blood Pressure (BP) from Photoplethysmogram (PPG) and Electrocardiogram (ECG) Signals

Cardiovascular diseases are the most common causes of death around the world. To detect and treat heart-related diseases, continuous Blood Pressure (BP) monitoring, along with monitoring of many other parameters, is required. Several invasive and non-invasive methods have been developed for this purpose. Most existing methods used in hospitals for continuous monitoring of BP are invasive. On the contrary, cuff-based BP monitoring methods, which can predict Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), cannot be used for continuous monitoring. Several studies attempted to predict BP from non-invasively collectible signals such as Photoplethysmogram (PPG) and Electrocardiogram (ECG), which can be used for continuous monitoring. In this study, we explored the applicability of autoencoders in predicting BP from PPG and ECG signals. The investigation was carried out on 12,000 instances of 942 patients of the MIMIC-II dataset and it was found that a very shallow, one-dimensional autoencoder can extract the relevant features to predict the SBP and DBP with state-of-the-art performance on a very large dataset. An independent test set from a portion of the MIMIC-II dataset provides an MAE of 2.333 and 0.713 for SBP and DBP, respectively. On an external dataset of forty subjects, the model trained on the MIMIC-II dataset provides an MAE of 2.728 and 1.166 for SBP and DBP, respectively. In both cases, the results met British Hypertension Society (BHS) Grade A and surpassed the studies from the current literature.

preprint · 2021 · arXiv

EDITH: ECG biometrics aided by Deep learning for reliable Individual auTHentication

In recent years, physiological signal based authentication has shown great promise, for its inherent robustness against forgery. The Electrocardiogram (ECG) signal, being the most widely studied biosignal, has also received the highest level of attention in this regard. Numerous studies have shown that by analyzing ECG signals from different persons, it is possible to identify them with acceptable accuracy. In this work, we present EDITH, a deep learning-based framework for an ECG biometrics authentication system. Moreover, we hypothesize and demonstrate that Siamese architectures can be used over typical distance metrics for improved performance. We have evaluated EDITH using 4 commonly used datasets and outperformed the prior works using fewer beats. EDITH performs competitively using just a single heartbeat (96-99.75% accuracy) and can be further enhanced by fusing multiple beats (100% accuracy from 3 to 6 beats). Furthermore, the proposed Siamese architecture manages to reduce the identity verification Equal Error Rate (EER) to 1.29%. A limited case study of EDITH with real-world experimental data also suggests its potential as a practical authentication system.

preprint · 2019 · arXiv

Predicting and Forecasting the Price of Constituents and Index of Cryptocurrency Using Machine Learning

At present, cryptocurrencies have become a global phenomenon in financial sectors, as they are among the most traded financial instruments worldwide. Cryptocurrency is not only one of the most complicated and abstruse fields among financial instruments, but it is also deemed a perplexing problem in finance due to its high volatility. This paper makes an attempt to apply machine learning techniques on the index and constituents of cryptocurrency with a goal to predict and forecast prices thereof. In particular, the purpose of this paper is to predict and forecast the close (closing) price of the cryptocurrency index 30 and nine constituents of cryptocurrencies using machine learning algorithms and models so that it becomes easier for people to trade these currencies. We have used several machine learning techniques and algorithms and compared the models with each other to get the best output. We believe that our work will help reduce the challenges and difficulties faced by people who invest in cryptocurrencies. Moreover, the obtained results can play a major role in cryptocurrency portfolio management and in observing the fluctuations in the prices of constituents of the cryptocurrency market. We have also compared our approach with similar state-of-the-art works from the literature, where machine learning approaches are considered for predicting and forecasting the prices of these currencies. In the sequel, we have found that our best approach presents better and competitive results than the best works from the literature, thereby advancing the state of the art. Using such prediction and forecasting methods, people can easily understand the trend and it would be even easier for them to trade in a difficult and challenging financial instrument like cryptocurrency.

preprint · 2016 · arXiv

Algorithms to Compute the Lyndon Array

We first describe three algorithms for computing the Lyndon array that have been suggested in the literature, but for which no structured exposition has been given. Two of these algorithms execute in quadratic time in the worst case, the third achieves linear time, but at the expense of prior computation of both the suffix array and the inverse suffix array of x. We then go on to describe two variants of a new algorithm that avoids prior computation of global data structures and executes in worst-case n log n time. Experimental evidence suggests that all but one of these five algorithms require only linear execution time in practice, with the two new algorithms faster by a small factor. We conjecture that there exists a fast and worst-case linear-time algorithm to compute the Lyndon array that is also elementary (making no use of global data structures such as the suffix array).
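As a rough illustration of a quadratic-time approach, the sketch below assumes the usual definition of the Lyndon array (entry `i` is the length of the longest Lyndon word starting at position `i`) and relies on the standard fact that the longest Lyndon prefix of a string is the first factor of its Lyndon factorization. The scan is the first step of Duval's algorithm, restarted at every position. The function names are hypothetical, indices are 0-based, and this is not claimed to be any of the five algorithms from the paper.

```python
def longest_lyndon_prefix(s: str, start: int) -> int:
    """Length of the longest Lyndon word that is a prefix of s[start:]."""
    n = len(s)
    j = start + 1  # next character to examine
    k = start      # character that s[j] is compared against
    while j < n and s[k] <= s[j]:
        if s[k] < s[j]:
            k = start  # strictly larger symbol: the candidate Lyndon word grows
        else:
            k += 1     # equal symbol: possibly inside a repetition of the candidate
        j += 1
    # j - k is the length of the first factor in Duval's factorization.
    return j - k


def lyndon_array(s: str) -> list:
    """Quadratic worst-case sketch: restart the Duval scan at every position."""
    return [longest_lyndon_prefix(s, i) for i in range(len(s))]


if __name__ == "__main__":
    print(lyndon_array("banana"))
```

Restarting the scan at each position keeps the code elementary (no suffix array), at the cost of quadratic time on inputs such as long runs of a single letter.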

preprint · 2015 · arXiv

Algorithms for Longest Common Abelian Factors

In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple $O(σ n^2)$ time algorithm, where $n$ is the length of the strings and $σ$ is the alphabet size, and a sub-quadratic running time solution for the binary string case, both having linear space requirement. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster.
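One way to reach an $O(σ n^2)$ bound can be sketched as follows, assuming the usual definition that two factors are abelian-equivalent when they have the same Parikh vector (the same count of every letter). For each candidate length, a sliding window produces the Parikh vectors of all factors of one string, which are stored in a set and probed with the vectors of the other string. The function name is hypothetical, and this is an illustrative sketch rather than the paper's algorithm.

```python
def longest_common_abelian_factor(x: str, y: str) -> int:
    """Length of the longest pair of factors of x and y with equal Parikh vectors."""
    alphabet = sorted(set(x) | set(y))
    index = {c: i for i, c in enumerate(alphabet)}

    def parikh_vectors(s: str, length: int):
        """Yield the Parikh vector (as a tuple) of every window of the given length."""
        if length > len(s):
            return
        counts = [0] * len(alphabet)
        for c in s[:length]:
            counts[index[c]] += 1
        yield tuple(counts)
        for i in range(length, len(s)):
            counts[index[s[i]]] += 1           # letter entering the window
            counts[index[s[i - length]]] -= 1  # letter leaving the window
            yield tuple(counts)

    # Try lengths from longest to shortest and stop at the first hit.
    for length in range(min(len(x), len(y)), 0, -1):
        seen = set(parikh_vectors(x, length))
        if any(v in seen for v in parikh_vectors(y, length)):
            return length
    return 0


if __name__ == "__main__":
    print(longest_common_abelian_factor("abcab", "bacd"))
```

Each window costs $O(σ)$ to turn into a hashable tuple, and there are $O(n)$ windows for each of the $O(n)$ lengths. The set holds $O(n)$ vectors of size $σ$ at a time, so this sketch does not match the linear space bound stated in the abstract.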

preprint · 2015 · arXiv

Computing Covers Using Prefix Tables

An indeterminate string $x = x[1..n]$ on an alphabet $Σ$ is a sequence of nonempty subsets of $Σ$; $x$ is said to be regular if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a cover of $x$ iff for every $i \in 1..n$, an occurrence of $u$ in $x$ includes $x[i]$. The cover array $γ = γ[1..n]$ of $x$ is an integer array such that $γ[i]$ is the length of the longest cover of $x[1..i]$. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular $x$ based on prior computation of the border array of $x$. In this paper we first describe a linear-time algorithm to compute the cover array of regular string $x$ based on the prefix table of $x$. We then extend this result to indeterminate strings.
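The definitions of cover and cover array for regular strings can be made concrete with a brute-force sketch. It assumes that the cover array stores lengths, with 0 meaning "no proper cover", and uses 0-based Python indexing. The helper names are hypothetical, and the code checks the definition directly; it is far slower than the linear-time algorithms the abstract refers to.

```python
def is_cover(u: str, x: str) -> bool:
    """True if u is a proper, nonempty substring whose occurrences cover all of x."""
    if not u or len(u) >= len(x):
        return False
    covered_to = 0  # positions [0, covered_to) are covered so far
    pos = x.find(u)
    # Accept each occurrence that starts no later than the first uncovered position.
    while pos != -1 and pos <= covered_to:
        covered_to = pos + len(u)
        pos = x.find(u, pos + 1)
    return covered_to == len(x)


def cover_array(x: str) -> list:
    """gamma[i] = length of the longest cover of x[:i+1], or 0 if there is none."""
    gamma = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        # A cover must include position 0, so it is always a prefix of the prefix.
        best = next(
            (length for length in range(i - 1, 0, -1)
             if is_cover(prefix[:length], prefix)),
            0,
        )
        gamma.append(best)
    return gamma


if __name__ == "__main__":
    print(cover_array("ababa"))
```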

preprint · 2015 · arXiv

Enhanced Covers of Regular & Indeterminate Strings using Prefix Tables

A cover of a string $x = x[1..n]$ is a proper substring $u$ of $x$ such that $x$ can be constructed from possibly overlapping instances of $u$. A recent paper \cite{FIKPPST13} relaxes this definition: an enhanced cover $u$ of $x$ is a border of $x$ (that is, a proper prefix that is also a suffix) that covers a maximum number of positions in $x$ (not necessarily all). That paper also proposes efficient algorithms for the computation of enhanced covers. These algorithms depend on the prior computation of the border array $β[1..n]$, where $β[i]$ is the length of the longest border of $x[1..i]$, $1 \le i \le n$. In this paper, we first show how to compute enhanced covers using instead the prefix table: an array $π[1..n]$ such that $π[i]$ is the length of the longest substring of $x$ beginning at position $i$ that matches a prefix of $x$. Unlike the border array, the prefix table is robust: its properties hold also for indeterminate strings, that is, strings defined on subsets of the alphabet $Σ$ rather than individual elements of $Σ$. Thus, our algorithms, in addition to being faster in practice and more space-efficient than those of \cite{FIKPPST13}, allow us to easily extend the computation of enhanced covers to indeterminate strings. Both for regular and indeterminate strings, our algorithms execute in expected linear time. Along the way we establish an important theoretical result: that the expected maximum length of any border of any prefix of a regular string $x$ is approximately 1.64 for binary alphabets, less for larger ones.
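For regular strings, both arrays defined above can be computed in linear time with textbook methods. The sketch below uses the Z-algorithm for the prefix table and the Knuth-Morris-Pratt failure function for the border array. Indices are 0-based here, whereas the abstract uses 1-based positions, and the function names are hypothetical. It illustrates the two data structures only, not the enhanced-cover computation.

```python
def prefix_table(x: str) -> list:
    """pi[i] = length of the longest substring starting at i that matches a prefix of x."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    left = right = 0  # x[left:right] is the rightmost window known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position inside the window.
            pi[i] = min(right - i, pi[i - left])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def border_array(x: str) -> list:
    """beta[i] = length of the longest border of x[:i+1]."""
    n = len(x)
    beta = [0] * n
    k = 0  # length of the current longest border
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = beta[k - 1]  # fall back to the next shorter border
        if x[i] == x[k]:
            k += 1
        beta[i] = k
    return beta


if __name__ == "__main__":
    print(prefix_table("aabaab"))
    print(border_array("aabaab"))
```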

preprint · 2015 · arXiv

Inferring an Indeterminate String from a Prefix Graph

An indeterminate string (or, more simply, just a string) $x = x[1..n]$ on an alphabet $Σ$ is a sequence of nonempty subsets of $Σ$. We say that $x[i_1]$ and $x[i_2]$ match (written $x[i_1] \approx x[i_2]$) if and only if $x[i_1] \cap x[i_2] \ne \emptyset$. A feasible array is an array $y = y[1..n]$ of integers such that $y[1] = n$ and for every $i \in 2..n$, $y[i] \in 0..n - i + 1$. A prefix table of a string $x$ is an array $π = π[1..n]$ of integers such that, for every $i \in 1..n$, $π[i] = j$ if and only if $x[i..i + j - 1]$ is the longest substring at position $i$ of $x$ that matches a prefix of $x$. It is known from \cite{CRSW13} that every feasible array is a prefix table of some indeterminate string. A prefix graph $\mathcal{P} = \mathcal{P}_{y}$ is a labelled simple graph whose structure is determined by a feasible array $y$. In this paper we show, given a feasible array $y$, how to use $\mathcal{P}_{y}$ to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table $π = y$.
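The definitions of matching, prefix table and feasible array translate almost directly into code. The sketch below represents an indeterminate string as a Python list of sets, uses 0-based indices (so the bound $n - i + 1$ becomes `n - i`), and computes the prefix table by direct comparison in quadratic time. Because matching on sets is not transitive, the window-reuse shortcut of the regular-string Z-algorithm is deliberately not used. All names are hypothetical, and this does not implement the prefix-graph construction.

```python
def matches(a: set, b: set) -> bool:
    """Two positions match when their letter sets intersect."""
    return not a.isdisjoint(b)


def indeterminate_prefix_table(x: list) -> list:
    """pi[i] = length of the longest substring at i matching a prefix of x (brute force)."""
    n = len(x)
    pi = []
    for i in range(n):
        j = 0
        while i + j < n and matches(x[j], x[i + j]):
            j += 1
        pi.append(j)
    return pi


def is_feasible(y: list) -> bool:
    """Check the feasible-array conditions, shifted to 0-based indexing."""
    n = len(y)
    return n > 0 and y[0] == n and all(0 <= y[i] <= n - i for i in range(1, n))


if __name__ == "__main__":
    x = [{"a"}, {"a", "b"}, {"b"}]
    table = indeterminate_prefix_table(x)
    print(table, is_feasible(table))
```

In the example, position 1 matches both position 0 and position 2, yet positions 0 and 2 do not match each other, which is the non-transitivity that separates the indeterminate case from the regular one.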

preprint · 2015 · arXiv

Linear Algorithms for Computing the Lyndon Border Array and the Lyndon Suffix Array

We consider the problem of finding repetitive structures and inherent patterns in a given string $s$ of length $n$ over a finite totally ordered alphabet. A border $u$ of a string $s$ is both a prefix and a suffix of $s$ such that $u \ne s$. The computation of the border array of a string $s$, namely the borders of each prefix of $s$, is strongly related to the string matching problem: given a string $w$, find all of its occurrences in $s$. A Lyndon word is a primitive word (i.e., it is not a power of another word) which is minimal for the lexicographical order of its conjugacy class (i.e., the set of words obtained by cyclic rotations of the letters). In this paper we combine these concepts to introduce the Lyndon Border Array $\mathcal{L}β$ of $s$, whose $i$-th entry $\mathcal{L}β(s)[i]$ is the length of the longest border of $s[1..i]$ which is also a Lyndon word. We propose linear-time and linear-space algorithms for computing $\mathcal{L}β(s)$. Further, we introduce the Lyndon Suffix Array, and by modifying the efficient suffix array technique of Ko and Aluru \cite{KA03} outline a linear time and space algorithm for its construction.

preprint · 2015 · arXiv

String Comparison in $V$-Order: New Lexicographic Properties & On-line Applications

$V$-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. $V$-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient $V$-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on $V$-order in strings, then go on to explore the algorithmic consequences.

preprint · 2014 · arXiv

An Integer Programming Formulation of the Minimum Common String Partition problem

We consider the problem of finding a minimum common partition of two strings (MCSP). The problem has its application in genome comparison. The MCSP problem is known to be NP-hard. In this paper, we develop an Integer Programming (IP) formulation for the problem and implement it. The experimental results are compared with the previous state-of-the-art algorithms and are found to be promising.

preprint · 2014 · arXiv

CoMOGrad and PHOG: From Computer Vision to Fast and Accurate Protein Tertiary Structure Retrieval

Due to advancements in technology, the number of entries in the structural database of proteins is increasing day by day. Methods for retrieving protein tertiary structures from this large database are the key to comparative analysis of structures, which plays an important role in understanding proteins and their function. In this paper, we present fast and accurate methods for the retrieval of proteins from a large database with tertiary structures similar to a query protein. Our proposed methods borrow ideas from the field of computer vision. The speed and accuracy of our methods come from the two newly introduced features, the co-occurrence matrix of the oriented gradient and the pyramid histogram of oriented gradient, and from the use of Euclidean distance as the distance measure. Experimental results clearly indicate the superiority of our approach in both running time and accuracy. Our method is readily available for use from this website: http://research.buet.ac.bd:8080/Comograd/.

preprint · 2014 · arXiv

GreMuTRRR: A Novel Genetic Algorithm to Solve Distance Geometry Problem for Protein Structures

Nuclear Magnetic Resonance (NMR) Spectroscopy is a widely used technique to predict the native structure of proteins. However, NMR machines are only able to report approximate and partial distances between pairs of atoms. To build the protein structure one has to solve the Euclidean distance geometry problem given the incomplete interval distance data produced by NMR machines. In this paper, we propose a new genetic algorithm for solving the Euclidean distance geometry problem for protein structure prediction given sparse NMR data. Our genetic algorithm uses a greedy mutation operator to intensify the search, a twin removal technique for diversification in the population and a random restart method to recover from stagnation. On a standard set of benchmark datasets, our algorithm significantly outperforms standard genetic algorithms.

preprint · 2014 · arXiv

Protein Folding in the Hexagonal Prism Lattice with Diagonals

Predicting protein secondary structure using lattice models is one of the most studied computational problems in bioinformatics. Here the secondary structure or three-dimensional structure of a protein is predicted from its amino acid sequence. Secondary structure refers to local sub-structures of a protein. The most commonly found secondary structures are alpha helices and beta sheets. Since it is a problem of great potential complexity, many simplified energy models have been proposed in the literature on the basis of the interaction of amino acid residues in a protein. Here we use the well-researched Hydrophobic-Polar (HP) energy model. In this paper, we propose the hexagonal prism lattice with diagonals, which can overcome the problems of other lattice structures, e.g., the parity problem. We give two approximation algorithms for protein folding on this lattice. Our first algorithm leads to a structure similar to the helix structure commonly found in proteins. This motivated us to find the next algorithm, which improves the approximation ratio of 9/7.

preprint · 2014 · arXiv

Solving the Minimum Common String Partition Problem with the Help of Ants

In this paper, we consider the problem of finding a minimum common partition of two strings. The problem has its application in genome comparison. As it is an NP-hard, discrete combinatorial optimization problem, we employ a metaheuristic technique, namely, the MAX-MIN ant system, to solve this problem. To achieve better efficiency we first map the problem instance into a special kind of graph. Subsequently, we employ a MAX-MIN ant system to achieve high quality solutions for the problem. Experimental results show the superiority of our algorithm in comparison with the state-of-the-art algorithm in the literature. The improvement achieved is also justified by a standard statistical test.

preprint · 2013 · arXiv

The Swap Matching Problem Revisited

In this paper, we revisit the much studied problem of Pattern Matching with Swaps (Swap Matching problem, for short). We first present a graph-theoretic model, which opens a new and so far unexplored avenue to solve the problem. Then, using the model, we devise two efficient algorithms to solve the swap matching problem. The resulting algorithms are adaptations of the classic shift-and algorithm. For patterns having length similar to the word-size of the target machine, both the algorithms run in linear time considering a fixed alphabet.
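The problem itself can be illustrated with a naive checker, assuming the usual formulation: the pattern matches at a text position if the text window can be obtained from the pattern by swapping disjoint pairs of adjacent characters. The sketch runs a small dynamic program per window, in $O(nm)$ time overall. It is neither the graph-theoretic model nor the shift-and adaptation described in the abstract, and the function names are hypothetical.

```python
def swap_matches_at(pattern: str, window: str) -> bool:
    """True if window equals pattern after some set of disjoint adjacent swaps."""
    m = len(pattern)
    # ok[j] is True when the first j pattern characters can be aligned to the window.
    ok = [False] * (m + 1)
    ok[0] = True
    for j in range(m):
        if not ok[j]:
            continue
        if pattern[j] == window[j]:
            ok[j + 1] = True  # position j is left in place
        if (j + 1 < m and pattern[j] == window[j + 1]
                and pattern[j + 1] == window[j]):
            ok[j + 2] = True  # positions j and j+1 are swapped
    return ok[m]


def swap_match_positions(text: str, pattern: str) -> list:
    """All 0-based positions where pattern swap-matches text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if swap_matches_at(pattern, text[i:i + m])]


if __name__ == "__main__":
    print(swap_match_positions("bacacb", "abc"))
```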

preprint · 2011 · arXiv

Computing a Longest Common Palindromic Subsequence

The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science. A palindrome is a word which reads the same forward as it does backward. The longest common palindromic subsequence (LCPS) problem is an interesting variant of the classic LCS problem which finds the longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is the first attempt to study and solve this interesting problem.
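A direct way to compute the LCPS length is a dynamic program over a substring of each input, sketched below with memoized recursion. If the four end characters all agree, they can form the outer pair of the palindrome; otherwise one end is dropped. This takes $O(n^4)$ time and space for strings of length $n$, and it is offered as an illustration of the problem definition, not as the efficient algorithms mentioned in the abstract. The function name is hypothetical.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y that is also a palindrome."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int, k: int, l: int) -> int:
        # Considers x[i..j] and y[k..l], both ranges inclusive.
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            if i == j or k == l:
                return 1  # only a single shared character fits
            return 2 + solve(i + 1, j - 1, k + 1, l - 1)
        return max(
            solve(i + 1, j, k, l),
            solve(i, j - 1, k, l),
            solve(i, j, k + 1, l),
            solve(i, j, k, l - 1),
        )

    return solve(0, len(x) - 1, 0, len(y) - 1)


if __name__ == "__main__":
    print(lcps_length("abacaba", "bacab"))
```

The recursion is only suitable for short inputs; an iterative table would be needed to avoid deep recursion on longer strings.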

preprint · 2011 · arXiv

Linear Time Inference of Strings from Cover Arrays using a Binary Alphabet

Covers, being one of the most popular forms of regularity in strings, have drawn much attention over time. In this paper, we focus on the problem of linear time inference of strings from cover arrays using the least sized alphabet possible. We present an algorithm that can reconstruct a string $x$ over a two-letter alphabet whenever a valid cover array $C$ is given as an input. This algorithm uses several interesting combinatorial properties of cover arrays and an interesting relation between border array and cover array to achieve this. Our algorithm runs in linear time.

preprint · 2010 · arXiv

An $O(n^2)$ Algorithm for Computing Longest Common Cyclic Subsequence

The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science. LCS is a central problem in stringology and finds broad applications in text compression, error-detecting codes and biological sequence comparison. However, in numerous contexts, words represent cyclic sequences of symbols and LCS must be generalized to consider all circular shifts of the strings. This occurs especially in computational biology when genetic material is sequenced from circular DNA or RNA molecules. This initiates the problem of longest common cyclic subsequence (LCCS), which finds the longest subsequence between all circular shifts of two strings. In this paper, we give an $O(n^2)$ algorithm for solving the LCCS problem where $n$ is the number of symbols in the strings.
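The LCCS definition can be checked on small inputs with a brute-force baseline that runs the classic LCS dynamic program on every pair of circular shifts. This costs $O(n^4)$ time, so it is only a reference for the problem statement and is not the $O(n^2)$ algorithm from the paper. The function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Classic LCS dynamic program, keeping only one row of the table."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rotations(s: str) -> list:
    """All circular shifts of s (a single empty string for empty input)."""
    return [s[i:] + s[:i] for i in range(max(1, len(s)))]


def lccs_length_bruteforce(a: str, b: str) -> int:
    """Maximum LCS length over all pairs of circular shifts of a and b."""
    return max(lcs_length(ra, rb) for ra in rotations(a) for rb in rotations(b))


if __name__ == "__main__":
    print(lccs_length_bruteforce("abcd", "cdab"))
```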

preprint · 2010 · arXiv

Improved Algorithms for the Point-Set Embeddability problem for Plane 3-Trees

In the point set embeddability problem, we are given a plane graph $G$ with $n$ vertices and a point set $S$ with $n$ points. Now the goal is to answer the question whether there exists a straight-line drawing of $G$ such that each vertex is represented as a distinct point of $S$ as well as to provide an embedding if one does exist. Recently, in \cite{DBLP:conf/gd/NishatMR10}, a complete characterization for this problem on a special class of graphs known as the plane 3-trees was presented along with an efficient algorithm to solve the problem. In this paper, we use the same characterization to devise an improved algorithm for the same problem. Much of the efficiency we achieve comes from clever uses of the triangular range search technique. We also study a generalized version of the problem and present improved algorithms for this version of the problem as well.
