Researcher profile

Rodrigo Fonseca

Rodrigo Fonseca contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Published work

4 published items

Preprint · 2022 · arXiv

Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search

High-dimensional black-box optimization has broad applications but remains a challenging problem. Given a set of samples $\{\mathbf{x}_i, y_i\}$, building a global model (as in Bayesian Optimization (BO)) suffers from the curse of dimensionality in the high-dimensional search space, while a greedy search may lead to sub-optimality. By recursively splitting the search space into regions with high/low function values, recent works such as LaNAS show good performance in Neural Architecture Search (NAS), empirically reducing the sample complexity. In this paper, we introduce LA-MCTS, which extends LaNAS to other domains. Unlike previous approaches, LA-MCTS learns the partition of the search space online, using a few samples and their function values. While LaNAS uses a linear partition and performs uniform sampling in each region, LA-MCTS adopts a nonlinear decision boundary and learns a local model to pick good candidates. If the nonlinear partition function and the local model fit the ground-truth black-box function well, good partitions and candidates can be reached with far fewer samples. LA-MCTS serves as a meta-algorithm by using existing black-box optimizers (e.g., BO, TuRBO) as its local models, achieving strong performance on general black-box optimization and reinforcement learning benchmarks, in particular for high-dimensional problems.
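The partition idea above can be sketched in Python. This is a minimal, hypothetical illustration of a single LaNAS-style split, not LA-MCTS itself: it labels samples as high-value or low-value by comparing each one to the mean function value (an assumption), separates the two groups with a nearest-centroid rule (a linear boundary, whereas LA-MCTS learns a nonlinear one), and then samples uniformly inside the predicted high-value region. The names split_by_value, learn_partition and sample_region and the toy objective are illustrative only, and the tree search and local models are omitted entirely.

```python
import random
from statistics import mean

def split_by_value(samples):
    """Split (x, y) samples into 'good' (y >= mean) and 'bad' (y < mean) inputs."""
    threshold = mean(y for _, y in samples)
    good = [x for x, y in samples if y >= threshold]
    bad = [x for x, y in samples if y < threshold]
    return good, bad

def centroid(points):
    return [sum(coord) / len(points) for coord in zip(*points)]

def learn_partition(samples):
    """Return a predicate x -> True if x lies in the predicted high-value region.

    Hypothetical nearest-centroid rule: a linear (LaNAS-like) boundary,
    not the nonlinear boundary used by LA-MCTS.
    """
    good, bad = split_by_value(samples)
    if not bad:  # all values equal: nothing to separate
        return lambda x: True
    c_good, c_bad = centroid(good), centroid(bad)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: sq_dist(x, c_good) <= sq_dist(x, c_bad)

def sample_region(in_region, bounds, n, rng, max_tries=100_000):
    """Uniform rejection sampling in the box `bounds`, keeping points where in_region(x)."""
    out = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if in_region(x):
            out.append(x)
    return out

def f(x):
    """Toy objective to maximize; its optimum is at (0.8, 0.8)."""
    return -((x[0] - 0.8) ** 2 + (x[1] - 0.8) ** 2)

rng = random.Random(0)
bounds = [(0.0, 1.0), (0.0, 1.0)]
points = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(200)]
samples = [(x, f(x)) for x in points]
in_good = learn_partition(samples)
candidates = sample_region(in_good, bounds, 20, rng)
```

Recursing on the good group would give the nested regions the abstract describes; LA-MCTS additionally decides which region to explore via tree search and picks candidates with a local model such as BO or TuRBO rather than uniform sampling.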

Preprint · 2021 · arXiv

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

The performance and efficiency of distributed training of Deep Neural Networks depend heavily on the performance of gradient averaging among all participating nodes, which is bounded by inter-node communication. There are two major strategies to reduce communication overhead: hiding communication by overlapping it with computation, and reducing message sizes. The first works well for linear neural architectures, but recent networks such as ResNet and Inception offer limited opportunity for such overlapping. Researchers have therefore paid more attention to minimizing communication. In this paper, we present a novel gradient compression framework, derived from insights into real gradient distributions, that strikes a balance between compression ratio, accuracy, and computational overhead. Our framework has two major novel components: sparsification of gradients in the frequency domain, and a range-based floating-point representation to quantize and further compress the gradient frequencies. Both components are dynamic, with tunable parameters that achieve different compression ratios depending on the accuracy requirement and the system platform, and both achieve very high throughput on GPUs. We prove that our techniques guarantee convergence with a diminishing compression ratio. Our experiments show that, compared with a no-compression baseline, the proposed framework effectively improves the scalability of most popular neural networks on a 32-GPU cluster without compromising accuracy or convergence speed.
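The frequency-domain sparsification component can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's framework: it uses a plain recursive radix-2 FFT (so the gradient length must be a power of two), keeps the k largest-magnitude coefficients as a hypothetical selection rule, takes the real part of the inverse transform, and does not model the range-based floating-point quantization at all. The function names are illustrative.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [c.conjugate() / n for c in fft([c.conjugate() for c in spectrum])]

def sparsify_in_frequency(grad, k):
    """Keep the k largest-magnitude frequency coefficients of a gradient vector.

    Returns (reconstructed gradient, sparse spectrum). In this sketch, only the
    nonzero entries of the sparse spectrum would need to be sent to other nodes.
    """
    spectrum = fft(grad)
    ranked = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(ranked[:k])
    sparse = [spectrum[i] if i in keep else 0j for i in range(len(spectrum))]
    # The gradient is real-valued, so keep the real part of the inverse transform.
    reconstructed = [c.real for c in ifft(sparse)]
    return reconstructed, sparse

# Usage: a gradient dominated by one frequency survives aggressive sparsification.
grad = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
approx, sent = sparsify_in_frequency(grad, k=2)
```

Here 2 of 64 coefficients reproduce the signal almost exactly because its energy sits in a single conjugate pair of frequencies; real gradients are less concentrated, which is why the abstract pairs sparsification with tunable compression ratios.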

Preprint · 2020 · arXiv

FITing-Tree: A Data-aware Index Structure

Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a modern DBMS. This overhead consumes valuable and expensive main memory and limits the space available to store new data or process existing data. In this paper, we present FITing-Tree, a novel form of learned index that uses piecewise linear functions with a bounded error specified at construction time. This error knob provides a tunable parameter that allows a DBA to FIT an index to a dataset and workload by balancing lookup performance against space consumption. To navigate this tradeoff, we provide a cost model that helps determine an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500 ns) or (2) a storage budget (e.g., 100 MB). Using a variety of real-world datasets, we show that our index provides performance comparable to full index structures while reducing the storage footprint by orders of magnitude.
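The bounded-error idea can be sketched in Python. This hypothetical sketch assumes sorted, unique numeric keys and uses a greedy "shrinking cone" segmentation: each segment tracks a range of feasible slopes so that every covered key's predicted position stays within err of its true position, and a lookup then binary-searches only the window around the prediction. It stands in for the error knob described above and is not the paper's implementation; in particular, a plain sorted list of segment start keys replaces the tree, and the names build_segments and lookup are illustrative.

```python
import bisect
import math

def build_segments(keys, err):
    """Greedy shrinking-cone segmentation of sorted, unique keys.

    Each segment is (start_key, start_pos, slope); for every key it covers,
    start_pos + slope * (key - start_key) is within err of the key's true position.
    """
    segments = []
    i, n = 0, len(keys)
    while i < n:
        k0, p0 = keys[i], i
        lo, hi = -math.inf, math.inf  # feasible slope range (the "cone")
        j = i + 1
        while j < n:
            dk = keys[j] - k0
            if not lo <= (j - p0) / dk <= hi:
                break  # this key cannot join the current segment
            lo = max(lo, (j - err - p0) / dk)
            hi = min(hi, (j + err - p0) / dk)
            j += 1
        slope = 0.0 if hi == math.inf else (lo + hi) / 2
        segments.append((k0, p0, slope))
        i = j
    return segments

def lookup(keys, segments, starts, key, err):
    """Return the position of key, searching only within err of the predicted position."""
    idx = bisect.bisect_right(starts, key) - 1
    if idx < 0:
        return None
    k0, p0, slope = segments[idx]
    pred = p0 + slope * (key - k0)
    lo = max(0, math.floor(pred - err))
    hi = min(len(keys), math.ceil(pred + err) + 1)
    if hi <= lo:  # empty search window: key cannot be present
        return None
    pos = bisect.bisect_left(keys, key, lo, hi)
    return pos if pos < len(keys) and keys[pos] == key else None

keys = [i * i for i in range(1000)]
segments = build_segments(keys, err=4)
starts = [s[0] for s in segments]
```

A larger err yields fewer segments (less space) but a wider search window per lookup, which is the lookup-latency versus storage tradeoff the cost model in the abstract is meant to navigate.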

Preprint · 2020 · arXiv

Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider

Function as a Service (FaaS) has been gaining popularity as a way to deploy computations to serverless backends in the cloud. This paradigm shifts the complexity of allocating and provisioning resources to the cloud provider, which has to provide the illusion of always-available resources (i.e., fast function invocations without cold starts) at the lowest possible resource cost. Doing so requires the provider to deeply understand the characteristics of the FaaS workload. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we first characterize the entire production FaaS workload of Azure Functions. We show, for example, that most functions are invoked very infrequently, but that invocation frequencies span a range of 8 orders of magnitude. Using observations from our characterization, we then propose a practical resource management policy that significantly reduces the number of function cold starts while spending fewer resources than state-of-the-practice policies.
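The cold-start versus resource-cost trade-off described above can be made concrete with a toy Python simulation of a single function under a fixed keep-alive window. Treating a fixed keep-alive window as the baseline policy is an assumption made for illustration; the policy proposed in the paper is not shown, execution time is ignored, and the function name simulate_fixed_keepalive and the invocation trace are hypothetical.

```python
def simulate_fixed_keepalive(invocations, keep_alive):
    """Toy model of one function under a fixed keep-alive policy.

    invocations: sorted invocation times in minutes (execution time ignored).
    Returns (cold_starts, idle_minutes), where idle_minutes counts time an
    instance is kept warm in memory without running anything.
    """
    cold_starts, idle = 0, 0.0
    prev = None
    for t in invocations:
        if prev is None or t - prev > keep_alive:
            cold_starts += 1  # no warm instance available: pay a cold start
            if prev is not None:
                idle += keep_alive  # previous instance idled a full window, then was evicted
        else:
            idle += t - prev  # warm instance idled until this invocation
        prev = t
    if prev is not None:
        idle += keep_alive  # the last instance idles one more window before eviction
    return cold_starts, idle

trace = [0, 5, 30, 31, 100]
short_window = simulate_fixed_keepalive(trace, keep_alive=10)
long_window = simulate_fixed_keepalive(trace, keep_alive=60)
```

On this trace, a 10-minute window gives 3 cold starts and 36 idle minutes, while a 60-minute window gives 2 cold starts but 151 idle minutes: a single fixed window cannot minimize both, which is the gap a workload-aware policy targets.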