Source author record

Ali Hassan

Ali Hassan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SY, Machine Learning, Systems and Control, Cryptography and Security, Artificial Intelligence, Computation and Language, cs.CY, Databases

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators


Preprint · 2026 · arXiv

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

Large language models (LLMs) are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and insufficient code-level grounding for identifying malicious and vulnerable code segments. To address these limitations, this research introduces LCC-LLM, a code-centric benchmark dataset and evidence-grounded framework for malware attribution and multi-task static malware analysis. The proposed LCCD dataset contains approximately 34K Portable Executable (PE) samples processed through a large-scale reverse-engineering pipeline and represented using decompiled C code, assembly code, control-flow graph and function-call graph (CFG/FCG) artifacts, hexadecimal data, PE metadata, suspicious API evidence, and structural features. Beyond dataset construction, LCC-LLM integrates LangGraph-orchestrated static analysis with multi-source cybersecurity knowledge to support evidence-grounded malware reasoning. The framework employs a seven-layer retrieval-augmented generation pipeline, Chain-of-Verification (CoVe) for indicator-of-compromise (IoC) validation, and a multi-dimensional quality gate to improve factual reliability and analyst-oriented decision support. Curriculum-ordered instruction data is used to fine-tune DeepSeek-R1-Distill-Qwen-14B and Qwen3-Coder-30B-A3B using QLoRA. Evaluation across 43 malware-analysis task types achieves an average semantic similarity of 0.634, with the highest task-level performance in structured report generation, IoC extraction, vulnerability assessment, malware configuration extraction, and malware class detection. In a real-world case study using MalwareBazaar samples, the grounded pipeline achieves a 10/10 structured analysis pass rate, producing CFG/FCG evidence, MITRE ATT&CK mappings, detection guidance, and analyst-ready reports. These results show that code-centric representations, retrieval grounding, and verification-guided reasoning improve the reliability and operational usefulness of LLM-assisted malware attribution.
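The abstract reports an average semantic similarity across 43 task types but does not say which embedding model is used or whether the average is taken per task type or per sample. The sketch below assumes cosine similarity between embeddings of a reference answer and a generated answer, and shows how a macro average (over task types) and a micro average (over samples) can differ. The task names and the 2-D toy vectors are hypothetical stand-ins for real embeddings.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(records):
    """records: (task_type, reference_embedding, generated_embedding) triples.
    Returns per-task means, the macro average over task types and the micro average."""
    by_task = defaultdict(list)
    for task, ref, gen in records:
        by_task[task].append(cosine_similarity(ref, gen))
    per_task = {task: sum(s) / len(s) for task, s in by_task.items()}
    macro = sum(per_task.values()) / len(per_task)
    all_scores = [x for s in by_task.values() for x in s]
    micro = sum(all_scores) / len(all_scores)
    return per_task, macro, micro


# Toy 2-D "embeddings" standing in for real sentence embeddings (hypothetical data).
records = [
    ("ioc_extraction", [1.0, 0.0], [1.0, 0.0]),
    ("ioc_extraction", [1.0, 0.0], [0.0, 1.0]),
    ("report_generation", [1.0, 1.0], [1.0, 1.0]),
]
per_task, macro, micro = summarize(records)
print(per_task, round(macro, 3), round(micro, 3))
```

Here the macro average weights each task type equally (0.75), while the micro average weights each sample equally (about 0.667); which one a reported figure uses matters when task types have very different sample counts.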

Preprint · 2025 · arXiv

Towards eco friendly cybersecurity: machine learning based anomaly detection with carbon and energy metrics

The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This study introduces an eco-aware anomaly detection framework that unifies machine-learning-based network monitoring with real-time carbon and energy tracking. Using the publicly available Carbon Aware Cybersecurity Traffic Dataset, comprising 2,300 flow-level observations, we benchmark Logistic Regression, Random Forest, Support Vector Machine, Isolation Forest, and XGBoost models across energy, carbon, and performance dimensions. Each experiment is executed in a controlled Colab environment instrumented with the CodeCarbon toolkit to quantify power draw and equivalent CO2 output during both training and inference. We construct an Eco Efficiency Index that expresses F1 score per kilowatt-hour to capture the trade-off between detection quality and environmental impact. Results reveal that optimized Random Forest and lightweight Logistic Regression models achieve the highest eco-efficiency, reducing energy consumption by more than forty percent compared to XGBoost while sustaining competitive detection accuracy. Principal Component Analysis further decreases computational load with negligible loss in recall. Collectively, these findings establish that integrating carbon and energy metrics into cybersecurity workflows enables environmentally responsible machine learning without compromising operational protection. The proposed framework offers a reproducible path toward sustainable, carbon-accountable cybersecurity aligned with emerging US green computing and federal energy-efficiency initiatives.
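The Eco Efficiency Index described above is simply F1 divided by energy in kWh. A minimal sketch of that arithmetic follows, assuming the energy figure has already been measured (for example by an energy tracker such as CodeCarbon) and F1 is computed from confusion counts. The model names, counts and energy values are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """One benchmarked model run; all numbers used below are hypothetical placeholders."""
    name: str
    tp: int
    fp: int
    fn: int
    energy_kwh: float  # measured energy for training + inference, in kWh


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def eco_efficiency_index(run: RunMeasurement) -> float:
    """Eco Efficiency Index as described in the abstract: F1 score per kWh."""
    return f1_score(run.tp, run.fp, run.fn) / run.energy_kwh


def energy_reduction_pct(candidate_kwh: float, baseline_kwh: float) -> float:
    """Relative energy saving of a candidate model against a baseline, in percent."""
    return 100.0 * (baseline_kwh - candidate_kwh) / baseline_kwh


runs = [
    RunMeasurement("model_a", tp=90, fp=10, fn=10, energy_kwh=0.002),
    RunMeasurement("model_b", tp=95, fp=5, fn=5, energy_kwh=0.005),
]
# Rank runs from most to least eco-efficient.
for run in sorted(runs, key=eco_efficiency_index, reverse=True):
    print(run.name, round(f1_score(run.tp, run.fp, run.fn), 3), round(eco_efficiency_index(run), 1))
print(round(energy_reduction_pct(0.002, 0.005), 1))
```

With these placeholder numbers the model with the slightly lower F1 ranks first because it uses far less energy, which is exactly the trade-off the index is meant to surface.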

Preprint · 2022 · arXiv

"hasSignification()": une nouvelle fonction de distance pour soutenir la détection de données personnelles (English: "hasSignification()": a new distance function to support the detection of personal data)

Today, with Big Data and data lakes, we face a mass of data that is very difficult to manage manually. Protecting personal data in this context requires automatic analysis for data discovery. Storing the names of attributes that have already been analyzed in a knowledge base could optimize this automatic discovery. To keep the knowledge base reliable, attributes whose names are meaningless should not be stored. In this article, to check whether an attribute name has a meaning, we propose a solution that calculates the distances between this name and the words in a dictionary. Our study of distance functions such as N-gram, Jaro-Winkler and Levenshtein reveals their limits for setting an acceptance threshold for admitting an attribute into the knowledge base. To overcome these limitations, our solution strengthens the score calculation with an exponential function based on the longest sequence. In addition, a double scan of the dictionary is proposed in order to process attributes that have compound names.
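The abstract does not give the exact scoring formula, so the following is only a hypothetical sketch of the idea: compare an attribute name with dictionary words, reward the longest shared run of characters exponentially, and, if the whole name fails, scan the dictionary a second time for a two-part split to handle compound names. The functions `exp_score` and `has_meaning`, the 0.5 threshold and the tiny dictionary are illustrative assumptions; Levenshtein similarity is included only as one of the baseline distances the abstract mentions.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Baseline score: 1 - edit distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def exp_score(name: str, word: str) -> float:
    """HYPOTHETICAL score in [0, 1]: grows exponentially with the longest shared run
    relative to the longer string (1.0 for identical strings)."""
    longest = max(len(name), len(word))
    if longest == 0:
        return 0.0
    ratio = longest_common_substring(name, word) / longest
    return (math.exp(ratio) - 1.0) / (math.e - 1.0)


def best_score(fragment: str, dictionary) -> float:
    return max((exp_score(fragment, w) for w in dictionary), default=0.0)


def has_meaning(name: str, dictionary, threshold: float = 0.5) -> bool:
    """First scan: the whole name. Second scan: every two-part split (compound names)."""
    name = name.lower()
    if best_score(name, dictionary) >= threshold:
        return True
    return any(best_score(name[:k], dictionary) >= threshold
               and best_score(name[k:], dictionary) >= threshold
               for k in range(1, len(name)))


DICTIONARY = ["birth", "date", "name", "address"]  # toy dictionary (assumption)
print(levenshtein("kitten", "sitting"), round(levenshtein_similarity("adress", "address"), 3))
print(has_meaning("birthdate", DICTIONARY), has_meaning("xqzv", DICTIONARY))
```

In this toy setup "birthdate" fails the first scan as a whole but passes the second scan as "birth" + "date", while a random string such as "xqzv" is rejected by both scans.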

Preprint · 2020 · arXiv

A Hierarchical Approach to Multi-Energy Demand Response: From Electricity to Multi-Energy Applications

Due to the proliferation of energy-efficiency measures and the availability of renewable energy resources, traditional energy infrastructure systems (electricity, heat, gas) can no longer be operated in a centralized manner under the assumption that consumer behavior is inflexible, i.e., that it cannot be adjusted in return for an adequate incentive. To allow for a less centralized operating paradigm, the consumer-end perspective and capabilities should be integrated into current dispatch practices and accounted for when switching between different energy sources, not only at the system level but also at the individual consumer level. Since consumers are confined within different built environments, this paper explores the opportunity to aggregate many residential, commercial and industrial consumers into an ensemble and control its energy consumption. This ensemble control adds a modern demand response contributor to the set of modeling tools for multi-energy infrastructure systems.

Preprint · 2020 · arXiv

Data-Driven Learning and Load Ensemble Control

Demand response (DR) programs aim to engage distributed small-scale flexible loads, such as thermostatically controlled loads (TCLs), to provide various grid support services. A Linearly Solvable Markov Decision Process (LS-MDP), a variant of the traditional MDP, is used to model aggregated TCLs. A model-free reinforcement learning technique called Z-learning is then applied to learn the value function and derive the optimal policy for the DR aggregator to control TCLs. The learning process is robust against the uncertainty that arises from estimating the passive dynamics of the aggregated TCLs. The efficiency of this data-driven learning is demonstrated through simulations of heating, ventilation and air-conditioning (HVAC) units in a testbed neighborhood of residential houses.
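The Z-learning idea can be sketched on a toy problem. In an LS-MDP, the desirability z(x) = exp(-v(x)) satisfies a linear Bellman equation, z(x) = exp(-q(x)) Σ p(x'|x) z(x'), so z can be estimated from sampled transitions of the passive dynamics p alone with the update z(x) ← (1-η) z(x) + η exp(-q(x)) z(x'); the optimal controlled transition is then u*(x'|x) ∝ p(x'|x) z(x'). The five-state random walk, its costs and the learning rate below are illustrative assumptions, not the paper's TCL model or HVAC testbed.

```python
import math
import random

# Toy first-exit LS-MDP (an illustrative assumption, not the paper's TCL model):
# states 0..4 on a line, 0 and 4 are absorbing exits, 1..3 are interior states.
N_STATES = 5
TERMINAL_COST = {0: 2.0, 4: 0.0}       # exiting at state 4 is cheap, at state 0 costly
STATE_COST = 0.1                       # running cost q(x) at every interior state
COST_FACTOR = math.exp(-STATE_COST)    # exp(-q(x)) in the linear Bellman equation


def boundary_init():
    """Desirability z = exp(-cost) is known exactly at the exits; interior starts at 1."""
    z = [1.0] * N_STATES
    for s, cost in TERMINAL_COST.items():
        z[s] = math.exp(-cost)
    return z


def passive_next(x, rng):
    """Passive (uncontrolled) dynamics p(x'|x): an unbiased random walk."""
    return x - 1 if rng.random() < 0.5 else x + 1


def exact_z(iters=2000):
    """Model-based reference: iterate z(x) = exp(-q) * sum_x' p(x'|x) z(x')."""
    z = boundary_init()
    for _ in range(iters):
        for x in range(1, N_STATES - 1):
            z[x] = COST_FACTOR * 0.5 * (z[x - 1] + z[x + 1])
    return z


def z_learning(episodes=30000, lr=0.002, seed=0):
    """Model-free Z-learning: uses only sampled transitions of the passive dynamics."""
    rng = random.Random(seed)
    z = boundary_init()
    for _ in range(episodes):
        x = rng.randint(1, N_STATES - 2)
        while x not in TERMINAL_COST:
            x_next = passive_next(x, rng)
            z[x] = (1 - lr) * z[x] + lr * COST_FACTOR * z[x_next]
            x = x_next
    return z


def optimal_policy(z, x):
    """u*(x'|x) = p(x'|x) z(x') / sum_y p(y|x) z(y) at an interior state x."""
    left, right = 0.5 * z[x - 1], 0.5 * z[x + 1]
    return {x - 1: left / (left + right), x + 1: right / (left + right)}


z_hat, z_star = z_learning(), exact_z()
print([round(v, 3) for v in z_hat], [round(v, 3) for v in z_star])
print(optimal_policy(z_hat, 2))
```

The learned z stays close to the model-based reference even though the learner never sees p explicitly, and the derived policy at the middle state shifts probability toward the cheap exit. This mirrors, at toy scale, why the approach is attractive when the passive dynamics of aggregated loads are only estimated.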

Preprint · 2020 · arXiv

Stochastic and Distributionally Robust Load Ensemble Control

Demand response (DR) programs aim to engage distributed demand-side resources in providing ancillary services for electric power systems. Previously, aggregated thermostatically controlled loads (TCLs) have been demonstrated as a technically viable and economically valuable provider of such services that can effectively compete with conventional generation resources in reducing load peaks and smoothing demand fluctuations. Yet, to provide these services at scale, a large number of TCLs must be accurately aggregated and operated in sync. This paper describes a Markov Decision Process (MDP) that aggregates and models an ensemble of TCLs. Using the MDP framework, we propose to internalize the exogenous uncertain dynamics of TCLs by means of stochastic and distributionally robust optimization. First, under mild assumptions on the underlying uncertainty, we derive analytical stochastic and distributionally robust control policies for dispatching a given TCL ensemble. Second, we further relax these mild assumptions to allow for a more delicate treatment of uncertainty, which leads to distributionally robust MDP formulations with moment- and Wasserstein-based ambiguity sets that can be efficiently solved numerically. The case study compares the analytical and numerical control policies using a simulated ensemble of 1,000 air conditioners.
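To make the aggregation idea concrete, here is a minimal sketch of how a Markov chain can describe a whole ensemble of identical loads: instead of tracking each unit, the model tracks the fraction ρ of units in each mode and propagates it as ρ_{t+1} = ρ_t P, from which aggregate demand follows directly. The two-mode on/off model, its switching probabilities and the 3 kW rating are hypothetical. The sketch covers only the uncontrolled ensemble dynamics; the paper's control actions and its stochastic and distributionally robust policies are not reproduced. A unit-level simulation of 1,000 units is included for comparison.

```python
import random

# Hypothetical two-mode model of one air conditioner (illustration only):
# mode 0 = OFF, mode 1 = ON; row i gives the switching probabilities out of mode i.
P = [[0.9, 0.1],
     [0.2, 0.8]]
RATED_KW = 3.0   # assumed draw of one unit while ON
N_UNITS = 1000   # ensemble size, matching the case study's 1,000 air conditioners


def propagate(rho, steps):
    """Ensemble-level model: rho_{t+1}[j] = sum_i rho_t[i] * P[i][j]."""
    for _ in range(steps):
        rho = [sum(rho[i] * P[i][j] for i in range(2)) for j in range(2)]
    return rho


def simulate(steps, seed=0):
    """Unit-level Monte Carlo of the same ensemble; returns the final ON fraction."""
    rng = random.Random(seed)
    modes = [0] * N_UNITS  # every unit starts OFF
    for _ in range(steps):
        modes = [1 if rng.random() < P[m][1] else 0 for m in modes]
    return sum(modes) / N_UNITS


rho = propagate([1.0, 0.0], steps=50)
print("model ON fraction:", round(rho[1], 4))
print("aggregate demand (kW):", round(rho[1] * N_UNITS * RATED_KW, 1))
print("simulated ON fraction:", simulate(steps=50))
```

The two-number distribution ρ reproduces the ON fraction of the 1,000-unit simulation up to sampling noise; an aggregator would act on the ensemble by shifting the transition probabilities, which is where the uncertainty treatment described in the abstract enters.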
