Source author record

Izzat Alsmadi

Izzat Alsmadi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Cryptography and Security · Information Retrieval · Digital Libraries · Social and Information Networks

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support

Modern scientific research increasingly depends on High-Performance Computing (HPC) infrastructures, yet many researchers face significant operational barriers when interacting with cluster environments, job schedulers, GPU resources, and parallel computing frameworks. General-purpose large language models (LLMs) provide useful coding assistance but often lack the domain-specific operational knowledge required for reliable HPC support. This paper presents HPC-LLM, a retrieval-augmented and domain-adapted assistant designed to support common HPC workflows including Slurm scheduling, MPI execution, GPU utilization, filesystem management, and cluster troubleshooting. The proposed framework integrates automated documentation ingestion, dense retrieval, lightweight domain adaptation using QLoRA, and local inference within a modular orchestration pipeline. To support domain adaptation, we construct an HPC-oriented corpus from publicly available university HPC documentation, curated operational examples, and synthetic instruction-answer pairs generated from retrieved HPC content. The resulting dataset contains approximately 9,000 to 24,000 HPC-focused training examples spanning job scheduling, GPU computing, distributed training, storage systems, and cluster administration topics. We fine-tune Llama 3.1 8B using QLoRA and evaluate the resulting model against several open-weight baselines under retrieval-augmented settings on JetStream2 infrastructure. Experimental results indicate that the adapted 8B model achieves performance comparable to substantially larger general-purpose models while operating under significantly lower GPU memory requirements and inference latency. In particular, the adapted model approaches the performance of Qwen 2.5 14B while requiring substantially fewer computational resources.

preprint · 2026 · arXiv

Security Hardening Using FABRIC: Implementing a Unified Compliance Aggregator for Linux Servers

This paper presents a unified framework for evaluating Linux security hardening on the FABRIC testbed through aggregation of heterogeneous security auditing tools. We deploy three Ubuntu 22.04 nodes configured at baseline, partial, and full hardening levels, and evaluate them using Lynis, OpenSCAP, and AIDE across 108 audit runs. To address the lack of a consistent interpretation across tools, we implement a Unified Compliance Aggregator (UCA) that parses tool outputs, normalizes scores to a common 0-100 scale, and combines them into a weighted metric augmented by a customizable rule engine for organization-specific security policies. Experimental results show that full hardening increases OpenSCAP compliance from 39.7 to 71.8, while custom rule compliance improves from 39.3% to 83.6%. The results demonstrate that UCA provides a clearer and more reproducible assessment of security posture than individual tools alone, enabling systematic evaluation of hardening effectiveness in programmable testbed environments.
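The normalize-then-weight step described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `ToolScore` type, the min-max normalization, the raw score ranges, and the weights are all hypothetical choices made here for illustration, and the rule engine is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToolScore:
    """One audit tool's result (hypothetical structure, not from the paper)."""
    name: str
    raw: float       # score as reported by the tool
    raw_min: float   # lowest value the tool can report (assumed)
    raw_max: float   # highest value the tool can report (assumed)
    weight: float    # relative importance in the combined metric (assumed)


def normalize(score: ToolScore) -> float:
    """Map a raw tool score onto a common 0-100 scale via min-max scaling."""
    span = score.raw_max - score.raw_min
    if span <= 0:
        raise ValueError(f"invalid range for {score.name}")
    value = (score.raw - score.raw_min) / span * 100.0
    # Clamp so an out-of-range raw value cannot leave the 0-100 scale.
    return max(0.0, min(100.0, value))


def unified_score(scores: Iterable[ToolScore]) -> float:
    """Weighted average of the normalized per-tool scores."""
    scores = list(scores)
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(normalize(s) * s.weight for s in scores) / total_weight


# Illustrative values only; these are not measurements from the paper.
example = [
    ToolScore("tool_a", raw=5.0, raw_min=0.0, raw_max=10.0, weight=1.0),
    ToolScore("tool_b", raw=10.0, raw_min=0.0, raw_max=10.0, weight=3.0),
]
combined = unified_score(example)
```

Dividing by the total weight keeps the combined value on the same 0-100 scale regardless of how the weights are chosen, which is one plausible reason to prefer a weighted average over a weighted sum.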

preprint · 2022 · arXiv

Balanced Datasets for IoT IDS

As the Internet of Things (IoT) continues to grow, cyberattacks are becoming increasingly common. The security of IoT networks relies heavily on intrusion detection systems (IDSs). Developing an IDS that is both accurate and efficient is a challenging task, and the difficulty is compounded by the absence of balanced datasets for training and testing a proposed IDS. In this study, four commonly used datasets are visualized and analyzed. The study also proposes a sampling algorithm that generates a sample representative of the original dataset, and a second algorithm that generates a balanced dataset. Researchers can use this paper as a starting point when investigating cybersecurity and machine learning. The proposed sampling algorithms proved reliable in generating representative and balanced samples from the NSL-KDD, UNSW-NB15, BotNetIoT-01, and BoTIoT datasets.
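The abstract does not specify how its balancing algorithm works. As a point of reference only, the sketch below shows one common baseline, random undersampling to the size of the smallest class. The function name, the `label_of` callback, and the toy rows are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

Row = TypeVar("Row")


def balance_by_undersampling(
    rows: Sequence[Row],
    label_of: Callable[[Row], Hashable],
    seed: int = 0,
) -> list[Row]:
    """Return a class-balanced subset by downsampling every class
    to the size of the rarest class (random undersampling)."""
    groups: dict[Hashable, list[Row]] = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    if not groups:
        return []

    target = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    balanced: list[Row] = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)  # avoid leaving the output grouped by class
    return balanced


# Toy records standing in for labelled network flows (illustrative only).
toy_rows = [("flow", "normal")] * 6 + [("flow", "attack")] * 2
balanced_rows = balance_by_undersampling(toy_rows, label_of=lambda r: r[1])
label_counts = Counter(label for _, label in balanced_rows)
```

Undersampling discards majority-class records, so it trades data volume for balance; oversampling or synthetic generation are alternatives when the minority class is very small.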

preprint · 2022 · arXiv

Benchmark Assessment for DeepSpeed Optimization Library

Deep Learning (DL) models are widely used in machine learning due to their performance and their ability to handle large datasets while producing high accuracy and performance metrics. The size of such datasets and the complexity of DL models make training expensive, consuming large amounts of resources and time. Many libraries and applications have recently been introduced to address DL complexity and efficiency issues. In this paper, we evaluated one example, the Microsoft DeepSpeed library, through classification tasks. DeepSpeed public sources reported classification performance metrics on the LeNet architecture. We extended this by evaluating the library on several modern neural network architectures, including convolutional neural networks (CNNs) and the Vision Transformer (ViT). Results indicated that while DeepSpeed can yield improvements in some of these cases, it has no impact or a negative impact in others.

preprint · 2021 · arXiv

Adversarial Machine Learning in Text Analysis and Generation

The research field of adversarial machine learning has attracted significant interest in the last few years. A machine learner or model is secure if it can deliver its main objectives with acceptable accuracy, efficiency, etc., while at the same time resisting different types of adversarial attacks and attack attempts. This paper focuses on studying aspects and research trends in adversarial machine learning, specifically in text analysis and generation. The paper summarizes the main research trends in the field, such as GAN algorithms, models, types of attacks, and defenses against those attacks.

preprint · 2021 · arXiv

An ontological analysis of misinformation in online social networks

The internet, Online Social Networks (OSNs) and smartphones enable users to create a tremendous amount of information. Users who search for general or specific knowledge may no longer face a problem of information scarcity but one of misinformation. Misinformation nowadays can refer to a continuous spectrum ranging from what can be seen as "facts" or "truth", if humans agree on the existence of such, to false information that everyone agrees is false. In this paper, we look at this spectrum of information/misinformation and compare some of the major relevant concepts. While a few fact-checking websites exist to evaluate news articles or some of the popular claims people exchange, this can be seen as only a small effort in the mission to tag online information with its "proper" category or label.

preprint · 2012 · arXiv

Annotations, Collaborative Tagging, and Searching Mathematics in E-Learning

This paper presents a new framework for adding semantics to e-learning systems. The proposed approach relies on two principles. The first principle is the automatic addition of semantic information when creating the mathematical contents. The second principle is the collaborative tagging and annotation of the e-learning contents and the use of an ontology to categorize the e-learning contents. The proposed system encodes the mathematical contents using Presentation MathML with RDFa annotations. The system allows students to highlight and annotate specific parts of the e-learning contents. The objective is to add meaning to the e-learning contents, to add relationships between contents, and to create a framework that facilitates searching the contents. This semantic information can be used to answer semantic queries (e.g., SPARQL) that retrieve the information a user requests. This work is implemented as embedded code in the Moodle e-learning system.

preprint · 2012 · arXiv

Indexing of Arabic documents automatically based on lexical analysis

The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically, quickly and reliably. In this paper, we proposed and implemented a method to automatically create an index for books written in the Arabic language. The process depends largely on text summarization and abstraction to collect the main topics and statements in the book. The process is developed in terms of accuracy and performance, and results showed that it can effectively replace the effort of manually indexing books and documents, a process that can be very useful in all information processing and retrieval applications.
