Researcher profile

Abusayeed Saifullah

Abusayeed Saifullah contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging - Verification L1 - Unclaimed author
9 works
0 followers
6 topics
4 close collaborators

Published work

9 published item(s)

preprint - 2026 - arXiv

Feature-Aware Task-to-Core Allocation in Embedded Multi-core Platforms via Statistical Learning

Optimizing task-to-core allocation can substantially reduce power consumption in multi-core platforms without degrading user experience. However, existing approaches overlook critical factors such as parallelism, compute intensity, and heterogeneous core types. In this paper, we introduce a statistical learning approach for feature selection that identifies the most influential features (such as core type, speed, temperature, and application-level parallelism or memory intensity) for accurate environment modeling and efficient energy minimization, a critical consideration for embedded systems. Our experiments, conducted with state-of-the-art Linux governors and thermal modeling techniques, show that correlation-aware task-to-core allocation lowers energy consumption by up to 10% and reduces core temperature by up to 5 °C compared to random core selection. Furthermore, our compressed, bootstrapped regression model improves thermal prediction accuracy by 6% while cutting model parameters by 16%, yielding an overall mean square error reduction of 61.6% relative to existing approaches. We report results on superscalar 12th Gen Intel Core i7 processors with 14 cores and validate our method across a diverse set of hardware platforms, where statistical feature evaluation effectively balances performance, power, and thermal demands.
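As a rough illustration of correlation-based feature ranking, the sketch below scores a few candidate features by the absolute Pearson correlation with measured energy. It is not the authors' model: the feature names, the sample values, and the `pearson` helper are all hypothetical, chosen only to show the idea.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical per-run measurements (not data from the paper).
samples = {
    "core_speed_ghz": [1.0, 1.5, 2.0, 2.5, 3.0],
    "temperature_c": [40, 45, 52, 58, 66],
    "memory_intensity": [0.3, 0.1, 0.4, 0.2, 0.3],
}
energy_j = [10, 14, 19, 25, 32]

# Rank features from most to least correlated with energy.
scores = {name: abs(pearson(values, energy_j)) for name, values in samples.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: |r| = {scores[name]:.3f}")
```

A real pipeline would also need to handle correlated features and nonlinear effects, which a simple pairwise score like this ignores.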

preprint - 2026 - arXiv

FlowRL: Flow-Augmented Few-Shot Reinforcement Learning for Semi-Structured Sensor Data

Reinforcement learning (RL) in few-shot scenarios with limited sensor data is challenging due to insufficient training samples, particularly in applications like Dynamic Voltage and Frequency Scaling (DVFS) where sensor readings are semi-structured with inherent correlations. We propose Flow-Augmented Reinforcement Learning (FlowRL), a novel method that leverages continuous normalizing flows to generate high-quality synthetic data for few-shot RL. By integrating latent space bootstrapping for diversity and feature-weighted flow matching to preserve critical data correlations, FlowRL enhances sample efficiency and policy robustness. Evaluated on a DVFS case study using the NVIDIA Jetson TX2, our approach achieves up to 35% higher frame rates and faster Q-value convergence compared to baselines, demonstrating its effectiveness in resource-constrained environments. FlowRL generalizes to other semi-structured domains, such as robotics and smart grids, offering a scalable solution for data-scarce RL settings.

preprint - 2026 - arXiv

HiDVFS: A Hierarchical Multi-Agent DVFS Scheduler for OpenMP DAG Workloads

With advancements in multicore embedded systems, leakage power, exponentially tied to chip temperature, has surpassed dynamic power consumption. Energy-aware solutions use dynamic voltage and frequency scaling (DVFS) to mitigate overheating in performance-intensive scenarios, while software approaches allocate high-utilization tasks across core configurations in parallel systems to reduce power. However, existing heuristics lack per-core frequency monitoring, failing to address overheating from uneven core activity, and task assignments without detailed profiling overlook irregular execution patterns. We target OpenMP DAG workloads. Because makespan, energy, and thermal goals often conflict within a single benchmark, this work prioritizes performance (makespan) while reporting energy and thermal as secondary outcomes. To overcome these issues, we propose HiDVFS (a hierarchical multi-agent, performance-aware DVFS scheduler) for parallel systems that optimizes task allocation based on profiling data, core temperatures, and makespan-first objectives. It employs three agents: one selects cores and frequencies using profiler data, another manages core combinations via temperature sensors, and a third sets task priorities during resource contention. A makespan-focused reward with energy and temperature regularizers estimates future states and enhances sample efficiency. Experiments on the NVIDIA Jetson TX2 using the BOTS suite (9 benchmarks) compare HiDVFS against state-of-the-art approaches. With multi-seed validation (seeds 42, 123, 456), HiDVFS achieves the best fine-tuned performance with 4.16 ± 0.58 s average makespan (L10), representing a 3.44x speedup over GearDVFS (14.32 ± 2.61 s) and 50.4% energy reduction (63.7 kJ vs 128.4 kJ). Across all BOTS benchmarks, HiDVFS achieves an average 3.95x speedup and 47.1% energy reduction.
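The headline ratios in this abstract follow directly from the reported means. The short check below recomputes them; only the four numeric values come from the abstract, and the variable names are illustrative.

```python
# Reported means from the abstract (multi-seed, fine-tuned configuration).
hidvfs_makespan_s = 4.16
geardvfs_makespan_s = 14.32
hidvfs_energy_kj = 63.7
geardvfs_energy_kj = 128.4

# Speedup is the baseline makespan divided by the HiDVFS makespan.
speedup = geardvfs_makespan_s / hidvfs_makespan_s

# Energy reduction is the fraction of baseline energy that is saved.
energy_reduction = 1 - hidvfs_energy_kj / geardvfs_energy_kj

print(f"speedup: {speedup:.2f}x")                  # speedup: 3.44x
print(f"energy reduction: {energy_reduction:.1%}")  # energy reduction: 50.4%
```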

preprint - 2022 - arXiv

Transparent and Tamper-Proof Event Ordering in the Internet of Things Platforms

Today, the audit and diagnosis of the causal relationships between the events in a trigger-action-based event chain (e.g., why is a light turned on in a smart home?) in Internet of Things (IoT) platforms are untrustworthy and unreliable. Current IoT platforms lack techniques for transparent and tamper-proof ordering of events due to their device-centric logging mechanism. In this paper, we develop a framework that facilitates tamper-proof transparency and event ordering in an IoT platform by proposing a Blockchain protocol and adopting the vector clock system, both tailored for resource-constrained heterogeneous IoT devices. To cope with the storage (e.g., ledger) and computing power (e.g., proof of work puzzle) requirements of the Blockchain, which are unsuited to commercial off-the-shelf IoT devices, we propose a partial consistent cut protocol and engineer a modular arithmetic-based lightweight proof of work puzzle, respectively. To the best of our knowledge, this is the first Blockchain designed for resource-constrained heterogeneous IoT platforms. Our event ordering protocol based on the vector clock system is also novel for IoT platforms. We implement our framework using an IoT gateway and 30 IoT devices. We experiment with 10 concurrent trigger-action-based event chains, where each chain involves 20 devices and each device participates in 5 different chains. The results show that our framework may order these events in 2.5 seconds while consuming only 140 mJ of energy per device. The results hence demonstrate the proposed platform as a practical choice for many IoT applications such as smart home, traffic monitoring, and crime investigation.
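For readers unfamiliar with vector clocks, the sketch below shows the textbook mechanism for deciding whether one event causally precedes another. It is a generic illustration, not the paper's protocol: the device names and helper functions are hypothetical, and the partial consistent cut and proof-of-work components are not modeled.

```python
from typing import Dict

Clock = Dict[str, int]

def tick(clock: Clock, device: str) -> Clock:
    """Return a copy of the clock with the device's own counter incremented."""
    updated = dict(clock)
    updated[device] = updated.get(device, 0) + 1
    return updated

def merge(local: Clock, received: Clock, device: str) -> Clock:
    """On receiving a message: element-wise max, then count the receive event."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, device)

def happened_before(a: Clock, b: Clock) -> bool:
    """True if a causally precedes b: a <= b everywhere and a < b somewhere."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less

# Hypothetical chain: a motion sensor fires, which triggers a light.
sensor_event = tick({}, "sensor")
light_event = merge({}, sensor_event, "light")
# An unrelated thermostat event with no message exchange.
thermostat_event = tick({}, "thermostat")

print(happened_before(sensor_event, light_event))      # True
print(happened_before(light_event, thermostat_event))  # False
print(happened_before(thermostat_event, light_event))  # False
```

When neither clock precedes the other, as with the light and thermostat events here, the events are concurrent and have no causal order.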

preprint - 2021 - arXiv

Handling Mobility in Low-Power Wide-Area Network

Despite the proliferation of mobile devices in various wide-area Internet of Things applications (e.g., smart city, smart farming), current Low-Power Wide-Area Networks (LPWANs) are not designed to effectively support mobile nodes. In this paper, we propose to handle mobility in SNOW (Sensor Network Over White spaces), an LPWAN that operates in the TV white spaces. SNOW supports massive concurrent communication between a base station (BS) and numerous low-power nodes through a distributed implementation of OFDM. In SNOW, inter-carrier interference (ICI) is more pronounced under mobility due to its OFDM-based design. Geospatial variation of white spaces also raises challenges in both intra- and inter-network mobility, as the low-power nodes are not equipped to determine white spaces. To handle mobility impacts on ICI, we propose a dynamic carrier frequency offset estimation and compensation technique that takes Doppler shifts into account without requiring knowledge of the nodes' speed. We also propose to circumvent the mobility impacts of geospatial variation of white spaces through a mobility-aware spectrum assignment to nodes. To enable mobility of the nodes across different SNOWs, we propose an efficient handoff management scheme with fast and energy-efficient BS discovery and quick association with the BS by combining time and frequency domain energy-sensing. Experiments through SNOW deployments in a large metropolitan city and indoors show that our proposed approaches enable mobility across multiple different SNOWs and provide robustness in terms of reliability, latency, and energy consumption under mobility.

preprint - 2021 - arXiv

LPWAN in the TV White Spaces: A Practical Implementation and Deployment Experiences

Low-Power Wide-Area Network (LPWAN) is an enabling Internet-of-Things (IoT) technology that supports long-range, low-power, and low-cost connectivity to numerous devices. To avoid the crowding in the limited ISM band (where most LPWANs operate) and the cost of licensed bands, the recently proposed SNOW (Sensor Network over White Spaces) is a promising LPWAN platform that operates over the TV white spaces. As it is a very recent technology and is still in its infancy, the current SNOW implementation uses USRP devices as LPWAN nodes, which have a high cost (~$750 USD per device) and a large form factor, hindering its applicability in practical deployment. In this paper, we implement SNOW using low-cost, small form-factor, low-power, and widely available commercial off-the-shelf (COTS) devices to enable its practical and large-scale deployment. Our choice of COTS device (TI CC13x0: CC1310 or CC1350) brings down the cost and form factor of a SNOW node by 25x and 10x, respectively. Such an implementation of SNOW on the CC13x0 devices, however, faces a number of challenges in achieving link reliability and communication range. Our implementation addresses these challenges by handling the peak-to-average power ratio problem, channel state information estimation, carrier frequency offset estimation, and the near-far power problem. Our deployment in the city of Detroit, Michigan demonstrates that CC13x0-based SNOW can achieve uplink and downlink throughputs of 11.2 kbps and 4.8 kbps per node, respectively, over a distance of 1 km. Also, the overall throughput in the uplink increases linearly with the number of SNOW nodes.

preprint - 2020 - arXiv

Bringing Inter-Thread Cache Benefits to Federated Scheduling -- Extended Results & Technical Report

Multiprocessor scheduling of hard real-time tasks modeled by directed acyclic graphs (DAGs) exploits the inherent parallelism presented by the model. For DAG tasks, a node represents a request to execute an object on one of the available processors. In one DAG task, there may be multiple execution requests for one object, each represented by a distinct node. These distinct execution requests offer an opportunity to reduce their combined cache overhead through coordinated scheduling of objects as threads within a parallel task. The goal of this work is to realize this opportunity by incorporating the cache-aware BUNDLE-scheduling algorithm into federated scheduling of sporadic DAG task sets. This is the first work to incorporate instruction cache sharing into federated scheduling. The result is a modification of the DAG model named the DAG with objects and threads (DAG-OT). Under the DAG-OT model, descriptions of nodes explicitly include their underlying executable object and number of threads. When possible, nodes assigned the same executable object are collapsed into a single node, and their threads are joined when BUNDLE-scheduled. Compared to the DAG model, the DAG-OT model with cache-aware scheduling reduces the number of cores allocated to individual tasks by approximately 20 percent in the synthetic evaluation and up to 50 percent on a novel parallel computing platform implementation. By reducing the number of allocated cores, the DAG-OT model is able to schedule a subset of previously infeasible task sets.
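To see why lower execution cost translates into fewer cores, recall the core-count rule commonly used in the federated scheduling literature for a high-utilization DAG task with total work C, critical-path length L, and deadline D: n = ceil((C - L) / (D - L)). The sketch below applies that rule; the abstract does not state it, so treat it as an assumption, and the task parameters are hypothetical. How BUNDLE scheduling actually reduces work is not modeled here.

```python
import math

def federated_cores(work: float, span: float, deadline: float) -> int:
    """Dedicated cores for a heavy DAG task: ceil((C - L) / (D - L)).

    work     -- total execution time of all nodes (C)
    span     -- critical-path length (L)
    deadline -- relative deadline (D); must exceed the span
    """
    if span >= deadline:
        raise ValueError("task is infeasible: critical path exceeds deadline")
    return math.ceil((work - span) / (deadline - span))

# Hypothetical task: 100 units of work, span 20, deadline 40.
baseline = federated_cores(work=100, span=20, deadline=40)

# Same task if cache sharing cut total work to 70 units (illustrative only).
with_cache_sharing = federated_cores(work=70, span=20, deadline=40)

print(baseline, with_cache_sharing)  # 4 3
```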

preprint - 2020 - arXiv

Integrating Low-Power Wide-Area Networks for Enhanced Scalability and Extended Coverage

Low-Power Wide-Area Networks (LPWANs) are evolving as an enabling technology for Internet-of-Things (IoT) due to their capability of communicating over long distances at very low transmission power. Existing LPWAN technologies, however, face limitations in meeting scalability and covering very wide areas, which makes their adoption challenging for future IoT applications, especially in infrastructure-limited rural areas. To address this limitation, we consider achieving scalability and extended coverage by integrating multiple LPWANs. SNOW (Sensor Network Over White Spaces), a recently proposed LPWAN architecture over the TV white spaces, has demonstrated its advantages over existing LPWANs in performance and energy efficiency. In this paper, we propose to scale up LPWANs through a seamless integration of multiple SNOWs which enables concurrent inter-SNOW and intra-SNOW communications. We then formulate the tradeoff between scalability and inter-SNOW interference as a constrained optimization problem whose objective is to maximize scalability by managing white space spectrum sharing across multiple SNOWs. We also prove the NP-hardness of this problem. To this end, we propose an intuitive polynomial-time heuristic algorithm for solving the scalability optimization problem which is highly efficient in practice. For the sake of a theoretical bound, we also propose a simple polynomial-time 1/2-approximation algorithm for the scalability optimization problem. Hardware experiments through deployment in an area of 25 x 15 sq. km as well as large-scale simulations demonstrate the effectiveness of our algorithms and the feasibility of achieving scalability through seamless integration of SNOWs with high reliability, low latency, and energy efficiency.

preprint - 2020 - arXiv

Long-Lived LoRa: Prolonging the Lifetime of a LoRa Network

Prolonging the network lifetime is a major consideration in many Internet of Things applications. In this paper, we study maximizing the network lifetime of an energy-harvesting LoRa network. Such a network is characterized by heterogeneous recharging capabilities across the nodes, which are not taken into account in existing work. We propose a link-layer protocol to achieve a long-lived LoRa network which dynamically enables the nodes with depleting batteries to exploit the superfluous energy of the neighboring nodes with affluent batteries by letting a depleting node offload its packets to an affluent node. By exploiting LoRa's capability of adjusting multiple transmission parameters, we enable low-cost offloading by depleting nodes instead of high-cost direct forwarding. Such offloading requires synchronization of wake-up times as well as transmission parameters between the two nodes, which also need to be selected dynamically. The proposed protocol addresses these challenges and prolongs the lifetime of a LoRa network through three novel techniques. (1) We propose a lightweight medium access control protocol for peer-to-peer communication to enable packet offloading which circumvents the synchronization overhead between the two nodes. (2) We propose an intuitive heuristic method for effective parameter selections for different modes (conventional vs. offloading). (3) We analyze the energy overhead of offloading and, based on it, the protocol dynamically selects affluent and depleting nodes while ensuring that an affluent node is not overwhelmed by the depleting ones. Simulations in NS-3 as well as real experiments show that our protocol can increase the network lifetime up to 4 times while maintaining the same throughput compared to a traditional LoRa network.