Source author record

Radu Marculescu

Radu Marculescu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Hardware Architecture Machine Learning Artificial Intelligence Computer Vision Distributed, Parallel, and Cluster Computing eess.SP Multiagent Systems physics.soc-ph Populations and Evolution Social and Information Networks

Catalog footprint

What is connected

9works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing

Long-form video editing poses unique challenges due to the exponential increase in the computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across extended sequences. To address these limitations, we propose PipeFlow, a scalable, pipelined video editing method that introduces three key innovations: First, based on a motion analysis using Structural Similarity Index Measure (SSIM) and Optical Flow, we identify and propose to skip editing of frames with low motion. Second, we propose a pipelined task scheduling algorithm that splits a video into multiple segments and performs DDIM inversion and joint editing in parallel based on available GPU memory. Lastly, we leverage a neural network-based interpolation technique to smooth out the border frames between segments and interpolate the previously skipped frames. Our method uniquely scales to longer videos by dividing them into smaller segments, allowing PipeFlow's editing time to increase linearly with video length. In principle, this enables editing of infinitely long videos without the growing per-frame computational overhead encountered by other methods. PipeFlow achieves up to a 9.6X speedup compared to TokenFlow and a 31.7X speedup over Diffusion Motion Transfer (DMT).

preprint2022arXiv

SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Graph Neural Networks (GNNs) have demonstrated a great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition. Nevertheless, resource-efficient GNN learning is a rarely explored topic despite its many benefits for edge computing and Internet of Things (IoT) applications. To improve this state of affairs, this work proposes efficient subgraph-level training via resource-aware graph partitioning (SUGAR). SUGAR first partitions the initial graph into a set of disjoint subgraphs and then performs local training at the subgraph-level. We provide a theoretical analysis and conduct extensive experiments on five graph benchmarks to verify its efficacy in practice. Our results show that SUGAR can achieve up to 33 times runtime speedup and 3.8 times memory reduction on large-scale graphs. We believe SUGAR opens a new research direction towards developing GNN methods that are resource-efficient, hence suitable for IoT deployment.

preprint2020arXiv

Centralized and decentralized isolation strategies and their impact on the COVID-19 pandemic dynamics

The infectious diseases are spreading due to human interactions enabled by various social networks. Therefore, when a new pathogen such as SARS-CoV-2 causes an outbreak, the non-pharmaceutical isolation strategies (e.g., social distancing) are the only possible response to disrupt its spreading. To this end, we introduce the new epidemic model (SICARS) and compare the centralized (C), decentralized (D), and combined (C+D) social distancing strategies, and analyze their efficiency to control the dynamics of COVID-19 on heterogeneous complex networks. Our analysis shows that the centralized social distancing is necessary to minimize the pandemic spreading. The decentralized strategy is insufficient when used alone, but offers the best results when combined with the centralized one. Indeed, the (C+D) is the most efficient isolation strategy at mitigating the network superspreaders and reducing the highest node degrees to less than 10% of their initial values. Our results also indicate that stronger social distancing, e.g., cutting 75% of social ties, can reduce the outbreak by 75% for the C isolation, by 33% for the D isolation, and by 87% for the (C+D) isolation strategy. Finally, we study the impact of proactive versus reactive isolation strategies, as well as their delayed enforcement. We find that the reactive response to the pandemic is less efficient, and delaying the adoption of isolation measures by over one month (since the outbreak onset in a region) can have alarming effects; thus, our study contributes to an understanding of the COVID-19 pandemic both in space and time. We believe our investigations have a high social relevance as they provide insights into understanding how different degrees of social distancing can reduce the peak infection ratio substantially; this can make the COVID-19 pandemic easier to understand and control over an extended period of time.

preprint2020arXiv

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design - including scheduling, power-thermal management algorithms and design space exploration studies - plays a crucial role. This paper presents a system-level domain-specific SoC simulation (DS3) framework to address this need. DS3 enables both design space exploration and dynamic resource management for power-performance optimization of domain applications. We showcase DS3 using six real-world applications from wireless communications and radar processing domain. DS3, as well as the reference applications, is shared as open-source software to stimulate research in this area.

preprint2020arXiv

New Directions in Distributed Deep Learning: Bringing the Network at Forefront of IoT Design

In this paper, we first highlight three major challenges to large-scale adoption of deep learning at the edge: (i) Hardware-constrained IoT devices, (ii) Data security and privacy in the IoT era, and (iii) Lack of network-aware deep learning algorithms for distributed inference across multiple IoT devices. We then provide a unified view targeting three research directions that naturally emerge from the above challenges: (1) Federated learning for training deep networks, (2) Data-independent deployment of learning algorithms, and (3) Communication-aware distributed inference. We believe that the above research directions need a network-centric approach to enable the edge intelligence and, therefore, fully exploit the true potential of IoT.

preprint2020arXiv

On Network Science and Mutual Information for Explaining Deep Neural Networks

In this paper, we present a new approach to interpret deep learning models. By coupling mutual information with network science, we explore how information flows through feedforward networks. We show that efficiently approximating mutual information allows us to create an information measure that quantifies how much information flows between any two neurons of a deep learning model. To that end, we propose NIF, Neural Information Flow, a technique for codifying information flow that exposes deep learning model internals and provides feature attributions.

preprint2020arXiv

Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems

Domain-specific systems-on-chip, a class of heterogeneous many-core systems, are recognized as a key approach to narrow down the performance and energy-efficiency gap between custom hardware accelerators and programmable processors. Reaching the full potential of these architectures depends critically on optimally scheduling the applications to available resources at runtime. Existing optimization-based techniques cannot achieve this objective at runtime due to the combinatorial nature of the task scheduling problem. As the main theoretical contribution, this paper poses scheduling as a classification problem and proposes a hierarchical imitation learning (IL)-based scheduler that learns from an Oracle to maximize the performance of multiple domain-specific applications. Extensive evaluations with six streaming applications from wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle policy with more than 99% accuracy for performance- and energy-based optimization objectives. Furthermore, it achieves almost identical performance to the Oracle with a low runtime overhead and successfully adapts to new applications, many-core system configurations, and runtime variations in application characteristics.

preprint2007arXiv

Energy- and Performance-Driven NoC Communication Architecture Synthesis Using a Decomposition Approach

In this paper, we present a methodology for customized communication architecture synthesis that matches the communication requirements of the target application. This is an important problem, particularly for network-based implementations of complex applications. Our approach is based on using frequently encountered generic communication primitives as an alphabet capable of characterizing any given communication pattern. The proposed algorithm searches through the entire design space for a solution that minimizes the system total energy consumption, while satisfying the other design constraints. Compared to the standard mesh architecture, the customized architecture generated by the newly proposed approach shows about 36% throughput increase and 51% reduction in the energy required to encrypt 128 bits of data with a standard encryption algorithm.

preprint2007arXiv

Energy-Aware Routing for E-Textile Applications

As the scale of electronic devices shrinks, "electronic textiles" (e-textiles) will make possible a wide variety of novel applications which are currently unfeasible. Due to the wearability concerns, low-power techniques are critical for e-textile applications. In this paper, we address the issue of the energy-aware routing for e-textile platforms and propose an efficient algorithm to solve it. The platform we consider consists of dedicated components for e-textiles, including computational modules, dedicated transmission lines and thin-film batteries on fiber substrates. Furthermore, we derive an analytical upper bound for the achievable number of jobs completed over all possible routing strategies. From a practical standpoint, for the Advanced Encryption Standard (AES) cipher, the routing technique we propose achieves about fifty percent of this analytical upper bound. Moreover, compared to the non-energy-aware counterpart, our routing technique increases the number of encryption jobs completed by one order of magnitude.

Radu Marculescu

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing

SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Centralized and decentralized isolation strategies and their impact on the COVID-19 pandemic dynamics

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

New Directions in Distributed Deep Learning: Bringing the Network at Forefront of IoT Design

On Network Science and Mutual Information for Explaining Deep Neural Networks

Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems

Energy- and Performance-Driven NoC Communication Architecture Synthesis Using a Decomposition Approach

Energy-Aware Routing for E-Textile Applications