Researcher profile

Radu Marculescu

Radu Marculescu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2025arXiv

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing

Long-form video editing poses unique challenges due to the exponential increase in the computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across extended sequences. To address these limitations, we propose PipeFlow, a scalable, pipelined video editing method that introduces three key innovations: First, based on a motion analysis using Structural Similarity Index Measure (SSIM) and Optical Flow, we identify and propose to skip editing of frames with low motion. Second, we propose a pipelined task scheduling algorithm that splits a video into multiple segments and performs DDIM inversion and joint editing in parallel based on available GPU memory. Lastly, we leverage a neural network-based interpolation technique to smooth out the border frames between segments and interpolate the previously skipped frames. Our method uniquely scales to longer videos by dividing them into smaller segments, allowing PipeFlow's editing time to increase linearly with video length. In principle, this enables editing of infinitely long videos without the growing per-frame computational overhead encountered by other methods. PipeFlow achieves up to a 9.6X speedup compared to TokenFlow and a 31.7X speedup over Diffusion Motion Transfer (DMT).

preprint2022arXiv

SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Graph Neural Networks (GNNs) have demonstrated a great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition. Nevertheless, resource-efficient GNN learning is a rarely explored topic despite its many benefits for edge computing and Internet of Things (IoT) applications. To improve this state of affairs, this work proposes efficient subgraph-level training via resource-aware graph partitioning (SUGAR). SUGAR first partitions the initial graph into a set of disjoint subgraphs and then performs local training at the subgraph-level. We provide a theoretical analysis and conduct extensive experiments on five graph benchmarks to verify its efficacy in practice. Our results show that SUGAR can achieve up to 33 times runtime speedup and 3.8 times memory reduction on large-scale graphs. We believe SUGAR opens a new research direction towards developing GNN methods that are resource-efficient, hence suitable for IoT deployment.

preprint2020arXiv

Centralized and decentralized isolation strategies and their impact on the COVID-19 pandemic dynamics

The infectious diseases are spreading due to human interactions enabled by various social networks. Therefore, when a new pathogen such as SARS-CoV-2 causes an outbreak, the non-pharmaceutical isolation strategies (e.g., social distancing) are the only possible response to disrupt its spreading. To this end, we introduce the new epidemic model (SICARS) and compare the centralized (C), decentralized (D), and combined (C+D) social distancing strategies, and analyze their efficiency to control the dynamics of COVID-19 on heterogeneous complex networks. Our analysis shows that the centralized social distancing is necessary to minimize the pandemic spreading. The decentralized strategy is insufficient when used alone, but offers the best results when combined with the centralized one. Indeed, the (C+D) is the most efficient isolation strategy at mitigating the network superspreaders and reducing the highest node degrees to less than 10% of their initial values. Our results also indicate that stronger social distancing, e.g., cutting 75% of social ties, can reduce the outbreak by 75% for the C isolation, by 33% for the D isolation, and by 87% for the (C+D) isolation strategy. Finally, we study the impact of proactive versus reactive isolation strategies, as well as their delayed enforcement. We find that the reactive response to the pandemic is less efficient, and delaying the adoption of isolation measures by over one month (since the outbreak onset in a region) can have alarming effects; thus, our study contributes to an understanding of the COVID-19 pandemic both in space and time. We believe our investigations have a high social relevance as they provide insights into understanding how different degrees of social distancing can reduce the peak infection ratio substantially; this can make the COVID-19 pandemic easier to understand and control over an extended period of time.

preprint2020arXiv

DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework

Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design - including scheduling, power-thermal management algorithms and design space exploration studies - plays a crucial role. This paper presents a system-level domain-specific SoC simulation (DS3) framework to address this need. DS3 enables both design space exploration and dynamic resource management for power-performance optimization of domain applications. We showcase DS3 using six real-world applications from wireless communications and radar processing domain. DS3, as well as the reference applications, is shared as open-source software to stimulate research in this area.

preprint2020arXiv

New Directions in Distributed Deep Learning: Bringing the Network at Forefront of IoT Design

In this paper, we first highlight three major challenges to large-scale adoption of deep learning at the edge: (i) Hardware-constrained IoT devices, (ii) Data security and privacy in the IoT era, and (iii) Lack of network-aware deep learning algorithms for distributed inference across multiple IoT devices. We then provide a unified view targeting three research directions that naturally emerge from the above challenges: (1) Federated learning for training deep networks, (2) Data-independent deployment of learning algorithms, and (3) Communication-aware distributed inference. We believe that the above research directions need a network-centric approach to enable the edge intelligence and, therefore, fully exploit the true potential of IoT.

preprint2020arXiv

On Network Science and Mutual Information for Explaining Deep Neural Networks

In this paper, we present a new approach to interpret deep learning models. By coupling mutual information with network science, we explore how information flows through feedforward networks. We show that efficiently approximating mutual information allows us to create an information measure that quantifies how much information flows between any two neurons of a deep learning model. To that end, we propose NIF, Neural Information Flow, a technique for codifying information flow that exposes deep learning model internals and provides feature attributions.

preprint2020arXiv

Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems

Domain-specific systems-on-chip, a class of heterogeneous many-core systems, are recognized as a key approach to narrow down the performance and energy-efficiency gap between custom hardware accelerators and programmable processors. Reaching the full potential of these architectures depends critically on optimally scheduling the applications to available resources at runtime. Existing optimization-based techniques cannot achieve this objective at runtime due to the combinatorial nature of the task scheduling problem. As the main theoretical contribution, this paper poses scheduling as a classification problem and proposes a hierarchical imitation learning (IL)-based scheduler that learns from an Oracle to maximize the performance of multiple domain-specific applications. Extensive evaluations with six streaming applications from wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle policy with more than 99% accuracy for performance- and energy-based optimization objectives. Furthermore, it achieves almost identical performance to the Oracle with a low runtime overhead and successfully adapts to new applications, many-core system configurations, and runtime variations in application characteristics.