Source author record

Imdad Ullah Khan

Imdad Ullah Khan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Discrete Mathematics, Machine Learning, Quantitative Methods

Catalog footprint

What is connected

2 works

3 topics

4 close collaborators


Preprint, 2022, arXiv

Efficient Approximate Kernel Based Spike Sequence Classification

Machine learning (ML) models, such as SVM, for tasks like classification and clustering of sequences, require a definition of distance/similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as the exact approach that counts the number of matches between $k$-mers (sub-sequences of length $k$) and an approximate approach that estimates pairwise similarity scores. Although exact methods yield better classification performance, they pose high computational costs, limiting their applicability to a small number of sequences. The approximate algorithms are proven to be more scalable and perform comparably to (sometimes better than) the exact methods -- they are designed in a "general" way to deal with different types of sequences (e.g., music, protein, etc.). Although general applicability is a desired property of an algorithm, it is not the case in all scenarios. For example, in the current COVID-19 (coronavirus) pandemic, there is a need for an approach that can deal specifically with the coronavirus. To this end, we propose a series of ways to improve the performance of the approximate kernel (using minimizers and information gain) in order to enhance its predictive performance on coronavirus sequences. More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizers computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma). We report results using different classification and clustering algorithms and evaluate their performance using multiple evaluation metrics. Using two datasets, we show that our proposed method helps improve the kernel's performance compared to the baseline and state-of-the-art approaches in the healthcare domain.
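To make the two building blocks named in the abstract concrete, the sketch below shows $k$-mer extraction, an exact match-count similarity between two sequences, and a simple minimizer computation. It is an illustration only, not the authors' implementation: the function names are hypothetical, the minimizer is assumed to be the lexicographically smallest $m$-mer inside each $k$-mer (other definitions exist), and the approximate kernel and the information-gain step are not reproduced.

```python
from collections import Counter


def kmers(seq, k):
    """Return all contiguous substrings of length k (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, m):
    """For each k-mer, keep its lexicographically smallest m-mer (assumes m <= k)."""
    return [min(kmers(kmer, m)) for kmer in kmers(seq, k)]


def exact_match_kernel(a, b, k):
    """Exact similarity: dot product of the k-mer count vectors of a and b."""
    count_a, count_b = Counter(kmers(a, k)), Counter(kmers(b, k))
    # Only k-mers present in both sequences contribute to the dot product.
    return sum(count_a[x] * count_b[x] for x in count_a.keys() & count_b.keys())
```

Replacing each $k$-mer by its minimizer shrinks the alphabet of features that the kernel has to compare, which is the kind of preprocessing saving the abstract refers to.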

Preprint, 2022, arXiv

SsAG: Summarization and sparsification of Attributed Graphs

We present SsAG, an efficient and scalable lossy graph summarization method that retains the essential structure of the original graph. SsAG computes a sparse representation (summary) of the input graph and also caters to graphs with node attributes. The summary of a graph $G$ is stored as a graph on supernodes (subsets of vertices of $G$), and a weighted superedge connects two supernodes. The proposed method constructs a summary graph on $k$ supernodes that minimizes the reconstruction error (the difference between the original graph and the graph reconstructed from the summary) and maximizes homogeneity with respect to attributes. We construct the summary by iteratively merging a pair of nodes. We derive a closed-form expression to efficiently compute the reconstruction error after merging a pair and approximate this score in constant time. To reduce the search space for selecting the best pair for merging, we assign a weight to each supernode that closely quantifies the contribution of the node in the score of the pairs containing it. We choose the best pair for merging from a random sample of supernodes selected with probability proportional to their weights. With weighted sampling, a logarithmic-sized sample yields a comparable summary based on various quality measures. We propose a sparsification step for the constructed summary to reduce the storage cost to a given target size with a marginal increase in reconstruction error. Empirical evaluation on several real-world graphs and comparison with state-of-the-art methods shows that SsAG is up to $5\times$ faster and generates summaries of comparable quality.
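As an illustration of what "reconstruction error" can mean for a supernode summary, the sketch below computes it by brute force for a small undirected graph. It rests on several assumptions that the abstract does not state: the reconstructed graph assigns every vertex pair the edge density of its supernode pair, the error is the sum of squared differences over unordered vertex pairs, and the function name is hypothetical. It is not the closed-form or constant-time computation the abstract describes, and it ignores node attributes.

```python
from itertools import combinations


def reconstruction_error(n, edges, supernodes):
    """Sum of squared differences between the adjacency of the original graph
    and the expected adjacency implied by the summary, over unordered pairs.

    n: number of vertices, labelled 0..n-1
    edges: list of undirected edges (u, v)
    supernodes: list of vertex lists forming a partition of 0..n-1
    """
    edge_set = {frozenset(e) for e in edges}
    owner = {v: i for i, group in enumerate(supernodes) for v in group}

    # Superedge weight: number of original edges between (or inside) supernodes.
    weight = {}
    for u, v in edges:
        key = tuple(sorted((owner[u], owner[v])))
        weight[key] = weight.get(key, 0) + 1

    err = 0.0
    for u, v in combinations(range(n), 2):
        i, j = sorted((owner[u], owner[v]))
        ni, nj = len(supernodes[i]), len(supernodes[j])
        # Number of vertex pairs that the superedge (i, j) stands for.
        pairs = ni * (ni - 1) // 2 if i == j else ni * nj
        expected = weight.get((i, j), 0) / pairs
        actual = 1.0 if frozenset((u, v)) in edge_set else 0.0
        err += (actual - expected) ** 2
    return err
```

For the graph with edges (0, 1) and (2, 3), grouping the vertices as [[0, 1], [2, 3]] gives zero error under these assumptions, while [[0, 2], [1, 3]] spreads both edges over four cross pairs and gives a positive error, which is why the choice of pairs to merge matters.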
