Source author record

Pietro Michiardi

Pietro Michiardi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Machine Learning; Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms; Networking and Internet Architecture; Artificial Intelligence; Databases; Robotics

Catalog footprint

What is connected

22 works

7 topics

4 close collaborators

preprint, 2023, arXiv

"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning

While the promises of Multi-Task Learning (MTL) are attractive, characterizing the conditions of its success is still an open problem in Deep Learning. Some tasks may benefit from being learned together while others may be detrimental to one another. From a task perspective, grouping cooperative tasks while separating competing tasks is paramount to reap the benefits of MTL, i.e., reducing training and inference costs. Therefore, estimating task affinity for joint learning is a key endeavor. Recent work suggests that the training conditions themselves have a significant impact on the outcomes of MTL. Yet, the literature lacks a benchmark to assess the effectiveness of task affinity estimation techniques and their relation with actual MTL performance. In this paper, we take a first step in closing this gap by (i) defining a set of affinity scores by both revisiting contributions from previous literature as well as presenting new ones and (ii) benchmarking them on the Taskonomy dataset. Our empirical campaign reveals how, even in a small-scale scenario, task affinity scoring does not correlate well with actual MTL performance. Yet, some metrics can be more indicative than others.

preprint, 2022, arXiv

Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios

When learning to behave in a stochastic environment where safety is critical, such as driving a vehicle in traffic, it is natural for human drivers to plan fallback strategies as a backup to use if ever there is an unexpected change in the environment. Knowing to expect the unexpected, and planning for such outcomes, increases our capability for being robust to unseen scenarios and may help prevent catastrophic failures. Control of Autonomous Vehicles (AVs) has a particular interest in knowing when and how to use fallback strategies in the interest of safety. Due to imperfect information available to an AV about its environment, it is important to have alternate strategies at the ready which might not have been deduced from the original training data distribution. In this paper we present a principled approach for a model-free Reinforcement Learning (RL) agent to capture multiple modes of behaviour in an environment. We introduce an extra pseudo-reward term to the reward model, to encourage exploration to areas of state-space different from areas privileged by the optimal policy. We base this reward term on a distance metric between the trajectories of agents, in order to force policies to focus on different areas of state-space than the initial exploring agent. Throughout the paper, we refer to this particular training paradigm as learning fallback strategies. We apply this method to an autonomous driving scenario, and show that we are able to learn useful policies that would have otherwise been missed out on during training, and unavailable to use when executing the control algorithm.

preprint, 2022, arXiv

Do Deep Neural Networks Contribute to Multivariate Time Series Anomaly Detection?

Anomaly detection in time series is a complex task that has been widely studied. In recent years, the ability of unsupervised anomaly detection algorithms has received much attention. This trend has led researchers to compare only learning-based methods in their articles, abandoning some more conventional approaches. As a result, the community in this field has been encouraged to propose increasingly complex learning-based models mainly based on deep neural networks. To our knowledge, there are no comparative studies between conventional, machine learning-based, and deep neural network methods for the detection of anomalies in multivariate time series. In this work, we study the anomaly detection performance of sixteen conventional, machine learning-based, and deep neural network approaches on five real-world open datasets. By analyzing and comparing the performance of each of the sixteen methods, we show that no family of methods outperforms the others. Therefore, we encourage the community to reincorporate the three categories of methods in benchmarks for anomaly detection in multivariate time series.

preprint, 2022, arXiv

Improved optimization strategies for deep Multi-Task Networks

In Multi-Task Learning (MTL), it is a common practice to train multi-task networks by optimizing an objective function, which is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguably, its optimization may be more difficult than a separate optimization of the constituting task-specific objectives. In this work, we investigate the benefits of such an alternative, by alternating independent gradient descent steps on the different task-specific objective functions, and we formulate a novel way to combine this approach with state-of-the-art optimizers. As the separation of task-specific objectives comes at the cost of increased computational time, we propose a random task grouping as a trade-off between better optimization and computational efficiency. Experimental results over three well-known visual MTL datasets show better overall absolute performance on losses and standard metrics compared to an averaged objective function and other state-of-the-art MTL methods. In particular, our method shows the most benefits when dealing with tasks of different nature and it enables a wider exploration of the shared parameter space. We also show that our random grouping strategy makes it possible to trade off these benefits against computational efficiency.
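The contrast between the averaged objective and alternating task-specific steps can be illustrated on a toy problem. The sketch below is not the authors' method: it assumes a single shared scalar parameter `w`, two hypothetical quadratic task losses `(w - target)^2`, and plain gradient descent. It leaves out both the combination with state-of-the-art optimizers and the random task grouping mentioned in the abstract.

```python
def grad_task(w, target):
    """Gradient of the hypothetical task loss (w - target)**2."""
    return 2.0 * (w - target)


def train_averaged(targets, w0=0.0, lr=0.1, epochs=200):
    """One gradient step per epoch on the average of all task losses."""
    w = w0
    for _ in range(epochs):
        g = sum(grad_task(w, t) for t in targets) / len(targets)
        w -= lr * g
    return w


def train_alternating(targets, w0=0.0, lr=0.1, epochs=200):
    """One independent gradient step per task, visited in turn each epoch."""
    w = w0
    for _ in range(epochs):
        for t in targets:
            w -= lr * grad_task(w, t)
    return w


if __name__ == "__main__":
    tasks = [1.0, 3.0]
    print("averaged:   ", train_averaged(tasks))
    print("alternating:", train_alternating(tasks))
```

In this toy setting the alternating variant takes one step per task and so costs twice as many gradient steps per epoch, which is the kind of extra computation the random grouping in the abstract is meant to trade against.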

preprint, 2022, arXiv

Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning

When learning to act in a stochastic, partially observable environment, an intelligent agent should be prepared to anticipate a change in its belief of the environment state, and be capable of adapting its actions on-the-fly to changing conditions. As humans, we are able to form contingency plans when learning a task with the explicit aim of being able to correct errors in the initial control, and hence prove useful if ever there is a sudden change in our perception of the environment which requires immediate corrective action. This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount, and a strong ability to react to a changing belief about the environment is truly needed. In this paper we explore an end-to-end approach, from training to execution, for learning robust contingency plans and combining them with a hierarchical planner to obtain a robust agent policy in an autonomous navigation task where other vehicles' behaviours are unknown, and the agent's belief about these behaviours is subject to sudden, last-second change. We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environment dynamics not seen during training.

preprint, 2021, arXiv

An Identifiable Double VAE For Disentangled Representations

A large part of the literature on learning disentangled representations focuses on variational autoencoders (VAE). Recent developments demonstrate that disentanglement cannot be obtained in a fully unsupervised setting without inductive biases on models and data. However, Khemakhem et al., AISTATS, 2020 suggest that employing a particular form of factorized prior, conditionally dependent on auxiliary variables complementing input observations, can be one such bias, resulting in an identifiable model with guarantees on disentanglement. Working along this line, we propose a novel VAE-based generative model with theoretical guarantees on identifiability. We obtain our conditional prior over the latents by learning an optimal representation, which imposes an additional strength on their regularization. We also extend our method to semi-supervised settings. Experimental results indicate superior performance with respect to state-of-the-art approaches, according to several established metrics proposed in the literature on disentanglement.

preprint, 2021, arXiv

Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD

Large-scale machine learning is increasingly relying on distributed optimization, whereby several machines contribute to the training process of a statistical model. In this work we study the performance of asynchronous, distributed settings, when applying sparsification, a technique used to reduce communication overheads. In particular, for the first time in an asynchronous, non-convex setting, we theoretically prove that, in the presence of staleness, sparsification does not harm SGD performance: the ergodic convergence rate matches the known result of standard SGD, that is $\mathcal{O} \left( 1/\sqrt{T} \right)$. We also carry out an empirical study to complement our theory, and confirm that the effects of sparsification on the convergence rate are negligible, when compared to 'vanilla' SGD, even in the challenging scenario of an asynchronous, distributed system.
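As an illustration of what gradient sparsification means, the sketch below keeps only the `k` largest-magnitude entries of a gradient vector before it would be sent over the network. The abstract does not say which sparsification operator the paper analyzes, so the top-k rule and the function name `top_k_sparsify` are assumptions made for this example.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest.

    Top-k selection is an assumption here; it is one common sparsifier,
    not necessarily the operator studied in the paper.
    """
    if k <= 0:
        return [0.0] * len(grad)
    by_magnitude = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(by_magnitude[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


if __name__ == "__main__":
    gradient = [0.1, -2.0, 0.5, 1.5]
    print(top_k_sparsify(gradient, 2))
```

A worker would then transmit only the non-zero entries and their indices, which is where the communication saving comes from.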

preprint, 2020, arXiv

A Variational View on Bootstrap Ensembles as Bayesian Inference

In this paper, we employ variational arguments to establish a connection between ensemble methods for Neural Networks and Bayesian inference. We consider an ensemble-based scheme where each model/particle corresponds to a perturbation of the data by means of parametric bootstrap and a perturbation of the prior. We derive conditions under which any optimization step of the particles makes the associated distribution reduce its divergence to the posterior over model parameters. Such conditions do not require any particular form for the approximation and they are purely geometrical, giving insights on the behavior of the ensemble on a number of interesting models such as Neural Networks with ReLU activations. Experiments confirm that ensemble methods can be a valid alternative to approximate Bayesian inference; the theoretical developments in the paper seek to explain this behavior.

preprint, 2020, arXiv

Isotropic SGD: a Practical Approach to Bayesian Posterior Sampling

In this work we define a unified mathematical framework to deepen our understanding of the role of stochastic gradient (SG) noise on the behavior of Markov chain Monte Carlo sampling (SGMCMC) algorithms. Our formulation unlocks the design of a novel, practical approach to posterior sampling, which makes the SG noise isotropic using a fixed learning rate that we determine analytically, and that requires weaker assumptions than existing algorithms. In contrast, the common trait of existing SGMCMC algorithms is to approximate the isotropy condition either by drowning the gradients in additive noise (annealing the learning rate) or by making restrictive assumptions on the SG noise covariance and the geometry of the loss landscape. Extensive experimental validations indicate that our proposal is competitive with the state of the art on SGMCMC, while being much more practical to use.

preprint, 2016, arXiv

Bleach: A Distributed Stream Data Cleaning System

In this paper we address the problem of rule-based stream data cleaning, which sets stringent requirements on latency, rule dynamics and ability to cope with the unbounded nature of data streams. We design a system, called Bleach, which achieves real-time violation detection and data repair on a dirty data stream. Bleach relies on efficient, compact and distributed data structures to maintain the necessary state to repair data, using an incremental version of the equivalence class algorithm. Additionally, it supports rule dynamics and uses a "cumulative" sliding window operation to improve cleaning accuracy. We evaluate a prototype of Bleach using a TPC-DS derived dirty data stream and observe its high throughput, low latency and high cleaning accuracy, even with rule dynamics. Experimental results indicate superior performance of Bleach compared to a baseline system built on the micro-batch streaming paradigm.

preprint, 2016, arXiv

DiNoDB: an Interactive-speed Query Engine for Ad-hoc Queries on Temporary Data

As data sets grow in size, analytics applications struggle to get instant insight into large datasets. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data. Existing solutions, however, typically focus on one of these two aspects, largely ignoring the need for synergy between the two. Consequently, interactive queries need to re-iterate costly passes through the entire dataset (e.g., data loading) that may provide meaningful return on investment only when data is queried over a long period of time. In this paper, we propose DiNoDB, an interactive-speed query engine for ad-hoc queries on temporary data. DiNoDB avoids the expensive loading and transformation phase that characterizes both traditional RDBMSs and current interactive analytics solutions. It is tailored to modern workflows found in machine learning and data exploration use cases, which often involve iterations of cycles of batch and interactive analytics on data that is typically useful for a narrow processing window. The key innovation of DiNoDB is to piggyback on the batch processing phase the creation of metadata that DiNoDB exploits to expedite the interactive queries. Our experimental analysis demonstrates that DiNoDB achieves very good performance for a wide range of ad-hoc queries compared to alternatives such as Hive, Stado, SparkSQL and Impala.

preprint, 2016, arXiv

Fast Online k-nn Graph Building

In this paper we propose an online approximate k-nn graph building algorithm, which is able to quickly update a k-nn graph using a flow of data points. One very important step of the algorithm consists in using the current distributed graph to search for the neighbors of a new node. Hence we also propose a distributed partitioning method based on balanced k-medoids clustering, that we use to optimize the distributed search process. Finally, we present the improved sequential search procedure that is used inside each partition. We also perform an experimental evaluation of the different algorithms, where we study the influence of the parameters and compare the results of our algorithms to the existing state of the art. This experimental evaluation confirms that the fast online k-nn graph building algorithm produces a graph that is highly similar to the graph produced by an offline exhaustive algorithm, while requiring fewer similarity computations.

preprint, 2015, arXiv

On Fair Size-Based Scheduling

By executing jobs serially rather than in parallel, size-based scheduling policies can shorten the time needed to complete jobs; however, major obstacles to their applicability are fairness guarantees and the fact that job sizes are rarely known exactly a priori. Here, we introduce the Pri family of size-based scheduling policies; Pri simulates any reference scheduler and executes jobs in the order of their simulated completion: we show that these schedulers give strong fairness guarantees, since no job completes later in Pri than in the reference policy. In addition, we introduce PSBS, a practical implementation of such a scheduler: it works online (i.e., without needing knowledge of jobs submitted in the future), it has an efficient O(log n) implementation and it allows setting priorities to jobs. Most importantly, unlike earlier size-based policies, the performance of PSBS degrades gracefully with errors, leading to performance that is close to optimal in a variety of realistic use cases.
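The idea of simulating a reference scheduler and then running jobs serially in the order of their simulated completion can be sketched on a tiny batch. This is a simplified illustration, not the Pri or PSBS implementation: it assumes that all jobs arrive at time 0, that sizes are known exactly, that a single unit-speed server is used, and that processor sharing is the reference policy. The function names are hypothetical.

```python
def processor_sharing_completions(sizes):
    """Completion time of each job under processor sharing (all arrive at t=0)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    completions = [0.0] * len(sizes)
    now = 0.0
    served = 0.0  # work each still-active job has received so far
    active = len(sizes)
    for i in order:
        # Each active job progresses at rate 1/active, so the smallest
        # remaining job needs (sizes[i] - served) * active more time.
        now += (sizes[i] - served) * active
        served = sizes[i]
        completions[i] = now
        active -= 1
    return completions


def serial_by_simulated_completion(sizes):
    """Run jobs one at a time, ordered by their simulated reference completion."""
    reference = processor_sharing_completions(sizes)
    order = sorted(range(len(sizes)), key=lambda i: reference[i])
    completions = [0.0] * len(sizes)
    now = 0.0
    for i in order:
        now += sizes[i]
        completions[i] = now
    return completions


if __name__ == "__main__":
    jobs = [1, 2, 3]
    print("processor sharing:", processor_sharing_completions(jobs))
    print("serial execution: ", serial_by_simulated_completion(jobs))
```

For jobs of size 1, 2 and 3, processor sharing completes them at times 3, 5 and 6, while serial execution completes them at 1, 3 and 6: no job finishes later than under the reference policy, and the mean completion time drops from 14/3 to 10/3.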

preprint, 2015, arXiv

PSBS: Practical Size-Based Scheduling

Size-based schedulers have very desirable performance properties: optimal or near-optimal response time can be coupled with strong fairness guarantees. Despite this, such systems are very rarely implemented in practical settings, because they require knowing a priori the amount of work needed to complete jobs: this assumption is very difficult to satisfy in concrete systems. It is far more realistic to provide the system with an estimate of the job sizes, but existing studies point to somewhat pessimistic results if existing scheduler policies are used based on imprecise job size estimations. We set out to design scheduling policies that explicitly deal with inexact job sizes: first, we show that existing size-based schedulers can have bad performance with inexact job size information when job sizes are heavily skewed; we show that this issue, and the pessimistic results shown in the literature, are due to problematic behavior when large jobs are underestimated. Once the problem is identified, it is possible to amend existing size-based schedulers to solve the issue. We generalize FSP -- a fair and efficient size-based scheduling policy -- in order to solve the problem highlighted above; in addition, our solution deals with different job weights (that can be assigned to a job independently from its size). We provide an efficient implementation of the resulting protocol, which we call Practical Size-Based Scheduler (PSBS). Through simulations evaluated on synthetic and real workloads, we show that PSBS has near-optimal performance in a large variety of cases with inaccurate size information, that it performs fairly and that it correctly handles job weights. We believe that this work shows that PSBS is indeed practical, and we maintain that it could inspire the design of schedulers in a wide array of real-world use cases.

preprint, 2014, arXiv

Adaptive Redundancy Management for Durable P2P Backup

We design and analyze the performance of a redundancy management mechanism for Peer-to-Peer backup applications. Armed with the realization that a backup system has peculiar requirements -- namely, data is read over the network only during restore processes caused by data loss -- redundancy management targets data durability rather than attempting to make each piece of information available at any time. In our approach each peer determines, in an on-line manner, an amount of redundancy sufficient to counter the effects of peer deaths, while preserving acceptable data restore times. Our experiments, based on trace-driven simulations, indicate that our mechanism can reduce the redundancy by a factor between two and three with respect to redundancy policies aiming for data availability. These results imply a corresponding increase in storage capacity and decrease in time to complete backups, at the expense of longer times required to restore data. We believe this is a very reasonable price to pay, given the nature of the application. We complete our work with a discussion on practical issues, and their solutions, related to which encoding technique is more suitable to support our scheme.

preprint, 2014, arXiv

On User Availability Prediction and Network Applications

User connectivity patterns in network applications are known to be heterogeneous, and to follow periodic (daily and weekly) patterns. In many cases, the regularity and the correlation of those patterns are problematic: for network applications, many connected users create peaks of demand; in contrast, in peer-to-peer scenarios, having few users online results in a scarcity of available resources. On the other hand, since connectivity patterns exhibit a periodic behavior, they are to some extent predictable. This work shows how this can be exploited to anticipate future user connectivity and to have applications proactively respond to it. We evaluate the probability that any given user will be online at any given time, and assess the prediction on six-month availability traces from three different Internet applications. Building upon this, we show how our probabilistic approach makes it easy to evaluate the performance of a number of diverse network application models, and to use them to optimize systems. In particular, we show how this approach can be used in distributed hash tables, friend-to-friend storage, and cache pre-loading for social networks, resulting in substantial gains in data availability and system efficiency at negligible costs.
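A periodic availability estimate of the kind described above can be sketched with a simple frequency count. The abstract does not specify the predictors used in the paper, so this estimator is only an illustrative assumption: it takes a hypothetical 0/1 history sampled at fixed intervals and a known period (for example 24 hourly slots for a daily pattern), and returns the fraction of past periods in which the user was online in each slot.

```python
def periodic_availability(history, period):
    """Estimate P(online) for each slot of the period from a 0/1 history."""
    online = [0] * period
    seen = [0] * period
    for t, is_online in enumerate(history):
        slot = t % period
        seen[slot] += 1
        if is_online:
            online[slot] += 1
    # Slots never observed get probability 0.0 (an arbitrary choice for this sketch).
    return [online[s] / seen[s] if seen[s] else 0.0 for s in range(period)]


def predict_online(history, period, future_t):
    """Predicted probability that the user is online at time index future_t."""
    return periodic_availability(history, period)[future_t % period]


if __name__ == "__main__":
    # Two observed periods of four slots each.
    trace = [1, 0, 1, 0, 1, 0, 0, 0]
    print(periodic_availability(trace, 4))
    print(predict_online(trace, 4, 10))
```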

preprint, 2014, arXiv

OS-Assisted Task Preemption for Hadoop

This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting existing memory management mechanisms readily available in modern operating systems. Our technique fills the gap that exists between the two extreme cases of killing tasks (which wastes work) and waiting for their completion (which introduces latency): experimental results indicate superior performance and very small overheads when compared to existing alternatives.

preprint, 2014, arXiv

Revisiting Size-Based Scheduling with Estimated Job Sizes

We study size-based schedulers, and focus on the impact of inaccurate job size information on response time and fairness. Our intent is to revisit previous results, which allude to performance degradation for even small errors on job size estimates, thus limiting the applicability of size-based schedulers. We show that scheduling performance is tightly connected to workload characteristics: in the absence of large skew in the job size distribution, even extremely imprecise estimates suffice to outperform size-oblivious disciplines. Instead, when job sizes are heavily skewed, known size-based disciplines suffer. In this context, we show -- for the first time -- the dichotomy of over-estimation versus under-estimation. The former is, in general, less problematic than the latter, as its effects are localized to individual jobs. Instead, under-estimation leads to severe problems that may affect a large number of jobs. We present an approach to mitigate these problems: our technique requires no complex modifications to original scheduling policies and performs very well. To support our claim, we proceed with a simulation-based evaluation that covers an unprecedentedly large parameter space, which takes into account a variety of synthetic and real workloads. As a consequence, we show that size-based scheduling is practical and outperforms alternatives in a wide array of use cases, even in the presence of inaccurate size information.

preprint, 2013, arXiv

Practical Size-based Scheduling for MapReduce Workloads

We present the Hadoop Fair Sojourn Protocol (HFSP) scheduler, which implements a size-based scheduling discipline for Hadoop. The benefits of size-based scheduling disciplines are well recognized in a variety of contexts (computer networks, operating systems, etc.), yet their practical implementation for a system such as Hadoop raises a number of important challenges. With HFSP, which is available as an open-source project, we address issues related to job size estimation and resource management, and study the effects of a variety of preemption strategies. Although the architecture underlying HFSP is suitable for any size-based scheduling discipline, in this work we revisit and extend the Fair Sojourn Protocol, which solves problems related to job starvation that affect FIFO, Processor Sharing and a range of size-based disciplines. Our experiments, in which we compare HFSP to standard Hadoop schedulers, point to a significant decrease in average job sojourn times - a metric that accounts for the total time a job spends in the system, including waiting and serving times - for realistic workloads that we generate according to production traces available in the literature.

preprint, 2010, arXiv

Back To The Future: On Predicting User Uptime

Correlation in user connectivity patterns is generally considered a problem for system designers, since it results in peaks of demand and also in the scarcity of resources for peer-to-peer applications. The other side of the coin is that these connectivity patterns are often predictable and that, to some extent, they can be dealt with proactively. In this work, we build predictors aiming to determine the probability that any given user will be online at any given time in the future. We evaluate the quality of these predictors on various large traces from instant messaging and file sharing applications. We also illustrate how availability prediction can be applied to enhance the behavior of peer-to-peer applications: we show through simulation how data availability is substantially increased in a distributed hash table simply by adjusting data placement policies according to peer availability prediction and without requiring any additional storage from any peer.

preprint, 2010, arXiv

On Scheduling and Redundancy for P2P Backup

An online backup system should be quick and reliable in both saving and restoring users' data. To do so in a peer-to-peer implementation, data transfer scheduling and the amount of redundancy must be chosen wisely. We formalize the problem of exchanging multiple pieces of data with intermittently available peers, and we show that random scheduling completes transfers nearly optimally in terms of duration as long as the system is sufficiently large. Moreover, we propose an adaptive redundancy scheme that improves performance and decreases resource usage while keeping the risks of data loss low. Extensive simulations show that our techniques are effective in a realistic trace-driven scenario with heterogeneous bandwidth.

preprint, 2006, arXiv

Rarest First and Choke Algorithms Are Enough

The performance of peer-to-peer file replication comes from its piece and peer selection strategies. Two such strategies have been introduced by the BitTorrent protocol: the rarest first and choke algorithms. Whereas it is commonly accepted that BitTorrent performs well, recent studies have proposed the replacement of the rarest first and choke algorithms in order to improve efficiency and fairness. In this paper, we use results from real experiments to advocate that the replacement of the rarest first and choke algorithms cannot be justified in the context of peer-to-peer file replication in the Internet. We instrumented a BitTorrent client and ran experiments on real torrents with different characteristics. Our experimental evaluation is peer oriented, instead of tracker oriented, which allows us to get detailed information on all exchanged messages and protocol events. We go beyond the mere observation of the good efficiency of both algorithms. We show that the rarest first algorithm guarantees close to ideal diversity of the pieces among peers. In particular, in our experiments, replacing the rarest first algorithm with source or network coding solutions cannot be justified. We also show that the choke algorithm in its latest version fosters reciprocation and is robust to free riders. In particular, the choke algorithm is fair and its replacement with a bit level tit-for-tat solution is not appropriate. Finally, we identify new areas of improvements for efficient peer-to-peer file replication protocols.
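The core of rarest first piece selection can be sketched in a few lines: among the pieces the local peer is missing, request the one held by the fewest neighbours. This is a simplified illustration rather than the behaviour of the instrumented client in the paper. The function name and the set-based representation of peers are assumptions, ties are broken by lowest piece index to keep the example deterministic, and the other piece selection policies a real client applies are left out.

```python
from collections import Counter


def rarest_first(my_pieces, peer_piece_sets):
    """Return the index of a missing piece held by the fewest peers, or None.

    my_pieces: set of piece indices the local peer already has.
    peer_piece_sets: list of sets, one per neighbour, of pieces they hold.
    """
    counts = Counter()
    for pieces in peer_piece_sets:
        # Only count pieces we still need.
        counts.update(pieces - my_pieces)
    if not counts:
        return None  # no neighbour has anything we are missing
    fewest = min(counts.values())
    # Deterministic tie-break for the example: lowest piece index.
    return min(piece for piece, c in counts.items() if c == fewest)


if __name__ == "__main__":
    mine = {0}
    neighbours = [{0, 1, 2}, {1, 2}, {1, 3}]
    print(rarest_first(mine, neighbours))
```

Here piece 3 is held by a single neighbour, so it is requested before pieces 1 and 2, which pushes the swarm toward an even spread of pieces.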
