Source author record

Manar D. Samad

Manar D. Samad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Machine Learning, Applications, Artificial Intelligence, Biomolecules, Computation and Language, Information Retrieval, Neural and Evolutionary Computing

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators

Preprint, 2026, arXiv

Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning

Polymer-based long-acting injectables (LAIs) have transformed the treatment of chronic diseases by enabling controlled drug delivery, which reduces dosing frequency and extends therapeutic duration. Achieving controlled drug release from LAIs requires extensive optimization of their complex underlying physicochemical properties. Machine learning (ML) can accelerate LAI development by modeling the complex relationships between LAI properties and drug release. However, recent ML studies have provided limited information on the key properties that modulate drug release, because they lack modeling and analysis tailored to LAI data. This paper presents a novel data transformation and explainable ML approach that synthesizes actionable information from 321 LAI formulations through three experiments: predicting early drug release at 24, 48, and 72 hours, classifying release profile types, and predicting complete release profiles. These experiments investigate how LAI material characteristics contribute to, and control, early and complete drug release. A strong correlation (>0.65) is observed between true and predicted drug release at 72 hours, and release profile types are classified with an F1-score of 0.87. A time-independent ML framework predicts delayed biphasic and triphasic release curves better than current time-dependent approaches. Shapley additive explanations reveal the relative influence of material characteristics on early and complete release, filling several gaps left by previous in vitro and ML-based studies. The approach and findings can provide a quantitative strategy and recommendations for scientists optimizing the drug-release dynamics of LAIs. The source code for the model implementation is publicly available.
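The abstract relies on Shapley additive explanations (SHAP) to rank material characteristics. As a minimal sketch of the underlying idea, the following code computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all coalitions of the other features, with "absent" features set to a baseline. The model `toy_release`, its three features, and the zero baseline are hypothetical. This is not the paper's pipeline, and practical SHAP tooling approximates these values rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(present: frozenset) -> float:
        # Features in `present` come from x, all others from the baseline.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of a coalition S that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_release(z: List[float]) -> float:
    # Hypothetical "release" score from three made-up, pre-scaled formulation features.
    a, b, c = z
    return 2 * a + 3 * b + a * c


phi = shapley_values(toy_release, x=[1.0, 2.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.5, 6.0, 0.5]
```

The attributions sum to f(x) minus f(baseline), which is 9 here. This efficiency property is what lets SHAP values be read as additive contributions. The interaction term `a * c` is split equally between features 0 and 2.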

Preprint, 2023, arXiv

Effectiveness of Deep Image Embedding Clustering Methods on Tabular Data

Deep learning methods in the literature are commonly benchmarked on image data sets, which may not provide suitable or effective baselines for non-image tabular data. In this paper, we take a data-centric view to perform one of the first studies of deep embedding clustering on tabular data. Eight methods, comprising traditional clustering and state-of-the-art embedding clustering methods proposed for image data sets, are tested on seven tabular data sets. Our results reveal that a traditional clustering method ranks second of the eight methods and outperforms most deep embedding clustering baselines. This observation aligns with the literature showing that conventional machine learning remains a robust approach to tabular data compared with deep learning. Therefore, state-of-the-art embedding clustering methods should adopt data-centric customization of their learning architectures to become competitive baselines for tabular data.

Preprint, 2020, arXiv

A Probabilistic Approach to Identifying Run Scoring Advantage in the Order of Playing Cricket

In cricket, the result of the coin toss is assumed to be one of the determinants of match outcome. The decision to bat first after winning the toss is often made to take advantage of better pitch conditions and to set a big target for the opponent. However, the opponent may fail to show their natural batting performance in the second innings because of several factors, including deteriorated pitch conditions and the pressure of chasing a high target. The advantage of batting first has been highlighted in the literature and by expert opinion; however, the effect of batting and bowling order on match outcome has not been investigated well enough to recommend a solution to any potential bias. This study proposes a probability theory-based model to study venue-specific scoring and chasing characteristics of teams under different match outcomes. An analysis of 1117 one-day international matches held at ten popular venues shows a substantially higher scoring advantage and likelihood when the winning team bats in the first innings. Results suggest that the same "bat-first" winning team would be very unlikely to score, or chase, such a high total if it batted in the second innings. Therefore, the coin toss decision may favor one team over the other. A Bayesian model is proposed to revise the target score for each venue so that the winning and scoring likelihood is equal regardless of the toss decision. The data and source code have been shared publicly to support future research on making match outcomes more competitive by eliminating the run-scoring advantage of batting order.
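The abstract aims to make the likelihood of reaching a target the same regardless of batting order. That idea can be illustrated with a simple frequency-based stand-in. This sketch is not the authors' Bayesian model. It assumes only lists of hypothetical innings totals at one venue and estimates P(total ≥ t) empirically. It then picks the highest second-innings target that is reached at least as often as a given first-innings total. The function names and numbers are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def prob_at_least(scores: Sequence[int], target: int) -> float:
    """Empirical probability that an innings total is >= target."""
    ordered = sorted(scores)
    # bisect_left returns how many totals are strictly below target.
    return (len(ordered) - bisect_left(ordered, target)) / len(ordered)


def matched_target(
    first_innings: Sequence[int], second_innings: Sequence[int], first_total: int
) -> int:
    """Largest second-innings target reached at least as often as first_total
    is reached in the first innings (frequency-based quantile matching)."""
    p = prob_at_least(first_innings, first_total)
    best = 0
    for t in range(max(second_innings) + 1):
        # prob_at_least is non-increasing in t, so the last qualifying t is the largest.
        if prob_at_least(second_innings, t) >= p:
            best = t
    return best


# Hypothetical totals at one venue (not data from the study).
first = [250, 270, 290, 310, 330]
second = [200, 220, 240, 260, 280]
print(prob_at_least(first, 290))           # 0.6
print(matched_target(first, second, 290))  # 240
```

With these made-up totals, a first-innings total of 290 is reached in 60% of first innings. The highest total reached in at least 60% of second innings is 240, so 240 would be the matched target. A Bayesian treatment, as the study proposes, could additionally account for uncertainty in such small-sample venue frequencies.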

Preprint, 2020, arXiv

Effect of Text Processing Steps on Twitter Sentiment Classification using Word Embedding

Processing raw text is the crucial first step in text classification and sentiment analysis. However, text processing steps are often performed using off-the-shelf routines and pre-built word dictionaries without optimizing for domain, application, and context. This paper investigates the effect of seven text processing scenarios on a particular text domain (Twitter) and application (sentiment classification). Skip-gram-based word embeddings are developed to include Twitter colloquial words, emojis, and hashtag keywords, which are often removed because they are absent from conventional literature corpora. Our experiments show that two common text processing steps hurt sentiment classification: 1) stop word removal and 2) averaging word vectors to represent individual tweets. New, effective steps are proposed to optimize the sentiment classification pipeline: 1) including non-ASCII emoji characters, 2) measuring word importance from the word embedding, 3) aggregating word vectors into a tweet embedding, and 4) developing a linearly separable feature space. The best combination of text processing steps yields the highest average area under the curve (AUC) of 88.4 (±0.4) in classifying 14,640 tweets with three sentiment labels. Word selection from the context-driven word embedding reveals that only the ten most important words in tweets cumulatively yield over 98% of the maximum accuracy. The results demonstrate a data-driven way to select important words for tweet classification, as opposed to using pre-built word dictionaries. The proposed tweet embedding is robust to, and reduces the need for, several text processing steps.
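The abstract identifies averaging word vectors as a harmful way to represent a tweet. A minimal pure-Python sketch contrasts plain mean pooling with an importance-weighted average. The 2-D embeddings, the importance scores, and both function names are hypothetical. The paper's own word-importance measure and tweet-embedding method are not reproduced here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (the common baseline)."""
    vecs = [emb[t] for t in tokens if t in emb]  # out-of-vocabulary tokens are skipped
    if not vecs:
        raise ValueError("no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def weighted_pool(
    tokens: Sequence[str], emb: Dict[str, Vector], weight: Dict[str, float]
) -> Vector:
    """Importance-weighted average; tokens missing from `weight` get weight 0."""
    pairs = [(emb[t], weight.get(t, 0.0)) for t in tokens if t in emb]
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("all in-vocabulary tokens have zero weight")
    dim = len(pairs[0][0])
    return [sum(v[i] * w for v, w in pairs) / total for i in range(dim)]


# Hypothetical 2-D embeddings and importance scores, for illustration only.
emb = {"the": [0.5, 0.5], "flight": [0.0, 1.0], "great": [1.0, 0.0], "😊": [1.0, 0.2]}
weight = {"great": 3.0, "😊": 2.0, "flight": 1.0}
tweet = ["the", "flight", "was", "great", "😊"]  # "was" is out of vocabulary

print(mean_pool(tweet, emb))              # every known word counts equally, including "the"
print(weighted_pool(tweet, emb, weight))  # "the" gets weight 0; "great" and the emoji dominate
```

Mean pooling gives every in-vocabulary token equal say in the tweet vector, including a low-information word such as "the". Weighting instead lets a supplied importance score decide which words dominate. Where those scores come from is the substantive part this sketch leaves open. According to the abstract, the paper derives them from the word embedding itself.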