Source author record

José Bento

José Bento appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning; math.OC; Artificial Intelligence; Data Structures and Algorithms; Information Retrieval; Information Theory; math.IT; math.ST; Statistics Theory; Computer Science and Game Theory; cond-mat.stat-mech; Distributed, Parallel, and Cluster Computing; Mathematical Software; physics.comp-ph; physics.data-an; Populations and Evolution; q-fin.ST; Robotics; stat.OT

Catalog footprint

What is connected

12 works

19 topics

4 close collaborators


preprint, 2025, arXiv

Constraints on the perfect phylogeny mixture model and their effect on reducing degeneracy

The perfect phylogeny mixture (PPM) model is useful due to its simplicity and applicability in scenarios where mutations can be assumed to accumulate monotonically over time. It is the underlying model in many tools that have been used, for example, to infer phylogenetic trees for tumor evolution and reconstruction. Unfortunately, the PPM model gives rise to substantial ambiguity -- in that many different phylogenetic trees can explain the same observed data -- even in the idealized setting where data are observed perfectly, i.e. fully and without noise. This ambiguity has been studied in this perfect setting by Pradhan et al. 2018, which proposed a procedure to bound the number of solutions given a fixed instance of observation data. Beyond this, studies have been primarily empirical. Recent work (Myers et al. 2019) proposed adding extra constraints to the PPM model to tackle ambiguity. In this paper, we first show that the extra constraints of Myers et al. 2019, called longitudinal constraints (LC), often fail to reduce the number of distinct trees that explain the observations. We then propose novel alternative constraints to limit solution ambiguity and study their impact when the data are observed perfectly. Unlike the analysis in Pradhan et al. 2018, our theoretical results regarding both the inefficacy of the LC and the extent to which our new constraints reduce ambiguity are not tied to a single observation instance. Rather, our theorems hold over large ensembles of possible inference problems. To the best of our knowledge, we are the first to study degeneracy in the PPM model in this ensemble-based theoretical framework.

preprint, 2020, arXiv

Distributed Optimization, Averaging via ADMM, and Network Topology

There has been an increasing need for scalable optimization methods, especially due to the explosion in the size of datasets and model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms such as gradient descent, the dependence of the convergence time on the topology of this network is well known. However, for more involved algorithms such as the Alternating Direction Method of Multipliers (ADMM), much less is known. At the heart of many distributed optimization algorithms there is a gossip subroutine which averages local information over the network, and whose efficiency is crucial for the overall performance of the method. In this paper we review recent research in this area and, with the goal of isolating such communication exchange behaviour, we compare different algorithms when applied to a canonical distributed averaging consensus problem. We also show interesting connections between ADMM and lifted Markov chains, and provide an explicit characterization of its convergence and optimal parameter tuning in terms of spectral properties of the network. Finally, we empirically study the connection between network topology and convergence rates for different algorithms on a real-world problem of sensor localization.
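To make the distributed averaging consensus problem mentioned in this abstract concrete, here is a minimal sketch of a plain linear gossip iteration on a ring network. It is not the ADMM scheme analysed in the paper; the helper names `ring_gossip_matrix` and `gossip_average`, the ring topology, and the equal 1/3 weights are all assumptions chosen for illustration.

```python
def ring_gossip_matrix(n):
    """Averaging weights for a ring of n >= 3 nodes (hypothetical helper).

    Each node gives weight 1/3 to itself and to each of its two neighbours,
    so every row and every column of the matrix sums to 1.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 / 3.0
        W[i][(i - 1) % n] = 1.0 / 3.0
        W[i][(i + 1) % n] = 1.0 / 3.0
    return W


def gossip_average(values, W, num_rounds):
    """Repeatedly replace each node's value by a weighted average of its neighbourhood."""
    x = list(values)
    n = len(x)
    for _ in range(num_rounds):
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Five nodes, each starting with a private measurement; the true mean is 4.0.
values = [1.0, 5.0, 3.0, 7.0, 4.0]
estimates = gossip_average(values, ring_gossip_matrix(5), num_rounds=200)
```

Because the weight matrix is doubly stochastic, the sum of the node values is preserved at every round, and each node's estimate approaches the network-wide mean. How fast that happens depends on the spectrum of the matrix, which is the kind of topology dependence the abstract refers to.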

preprint, 2016, arXiv

Testing fine-grained parallelism for the ADMM on a factor-graph

There is an ongoing effort to develop tools that apply distributed computational resources to tackle large problems or reduce the time to solve them. In this context, the Alternating Direction Method of Multipliers (ADMM) arises as a method that can exploit distributed resources like the dual ascent method and has the robustness and improved convergence of the augmented Lagrangian method. Traditional approaches to accelerate the ADMM using multiple cores are problem-specific and often require multi-core programming. By contrast, we propose a problem-independent scheme of accelerating the ADMM that does not require the user to write any parallel code. We show that this scheme, an interpretation of the ADMM as a message-passing algorithm on a factor-graph, can automatically exploit fine-grained parallelism both in GPUs and shared-memory multi-core computers and achieves significant speedup in such diverse application domains as combinatorial optimization, machine learning, and optimal control. Specifically, we obtain 10-18x speedup using a GPU, and 5-9x using multiple CPU cores, over a serial, optimized C-version of the ADMM, which is similar to the typical speedup reported for existing GPU-accelerated libraries, including cuFFT (19x), cuBLAS (17x), and cuRAND (8x).

preprint, 2015, arXiv

Proximal operators for multi-agent path planning

We address the problem of planning collision-free paths for multiple agents using optimization methods known as proximal algorithms. Recently this approach was explored in Bento et al. 2013, which demonstrated its ease of parallelization and decentralization, the speed with which the algorithms generate good quality solutions, and its ability to incorporate different proximal operators, each ensuring that paths satisfy a desired property. Unfortunately, the operators derived only apply to paths in 2D and require that any intermediate waypoints we might want agents to follow be preassigned to specific agents, limiting their range of applicability. In this paper we resolve these limitations. We introduce new operators to deal with agents moving in arbitrary dimensions that are faster to compute than their 2D predecessors and we introduce landmarks, space-time positions that are automatically assigned to the set of agents under different optimality criteria. Finally, we report the performance of the new operators in several numerical experiments.

preprint, 2015, arXiv

The Boundary Forest Algorithm for Online Supervised and Unsupervised Learning

We describe a new instance-based learning algorithm called the Boundary Forest (BF) algorithm, that can be used for supervised and unsupervised learning. The algorithm builds a forest of trees whose nodes store previously seen examples. It can be shown data points one at a time and updates itself incrementally, hence it is naturally online. Few instance-based algorithms have this property while being simultaneously fast, which the BF is. This is crucial for applications where one needs to respond to input data in real time. The number of children of each node is not set beforehand but obtained from the training procedure, which makes the algorithm very flexible with regard to what data manifolds it can learn. We test its generalization performance and speed on a range of benchmark datasets and detail in which settings it outperforms the state of the art. Empirically we find that training time scales as O(DNlog(N)) and testing as O(Dlog(N)), where D is the dimensionality and N the amount of data.

preprint, 2014, arXiv

A Time and Space Efficient Algorithm for Contextual Linear Bandits

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve $O(\log T)$ regret after $T$ time steps. However, proposed methods either have a computational complexity per iteration that scales linearly with $T$ or achieve regrets that grow linearly with the number of contexts $|\mathcal{X}|$. We propose an $\varepsilon$-greedy type of algorithm that overcomes both limitations. In particular, when contexts are variables in $\mathbb{R}^d$, we prove that our algorithm has a constant computational complexity per iteration of $O(\mathrm{poly}(d))$ and can achieve a regret of $O(\mathrm{poly}(d) \log T)$ even when $|\mathcal{X}| = \Omega(2^d)$. In addition, unlike previous algorithms, its space complexity scales like $O(Kd^2)$ and does not grow with $T$.
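As a rough illustration of how an ε-greedy rule can keep per-round cost and memory independent of $T$, the sketch below stores one $d \times d$ matrix and one $d$-vector per arm and updates them with a rank-one formula. It is a generic textbook-style construction, not the algorithm or the analysis from the paper; the class name `EpsilonGreedyLinearBandit`, the ridge term, the fixed exploration rate, and the noise-free toy environment are all assumptions.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical epsilon-greedy learner with one ridge-regression model per arm."""

    def __init__(self, num_arms, dim, epsilon=0.1, ridge=1.0, seed=0):
        self.num_arms = num_arms
        self.dim = dim
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per arm: inverse of (ridge * I + sum of x x^T), and sum of reward * x.
        self.A_inv = [
            [[(1.0 / ridge if i == j else 0.0) for j in range(dim)] for i in range(dim)]
            for _ in range(num_arms)
        ]
        self.b = [[0.0] * dim for _ in range(num_arms)]

    def estimate(self, arm):
        """Current ridge estimate of the arm's parameter vector: A_inv @ b."""
        A_inv, b = self.A_inv[arm], self.b[arm]
        return [sum(A_inv[i][j] * b[j] for j in range(self.dim)) for i in range(self.dim)]

    def select_arm(self, context):
        """Explore uniformly with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_arms)
        scores = [
            sum(t * x for t, x in zip(self.estimate(a), context))
            for a in range(self.num_arms)
        ]
        return max(range(self.num_arms), key=scores.__getitem__)

    def update(self, arm, context, reward):
        """Sherman-Morrison rank-one update of A_inv, in O(d^2) time."""
        A_inv, d = self.A_inv[arm], self.dim
        Ax = [sum(A_inv[i][j] * context[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(context[i] * Ax[i] for i in range(d))
        for i in range(d):
            for j in range(d):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(d):
            self.b[arm][i] += reward * context[i]


# Toy environment (an assumption): two arms, noise-free linear rewards.
true_theta = [[1.0, 0.0], [0.0, 1.0]]
env_rng = random.Random(1)
bandit = EpsilonGreedyLinearBandit(num_arms=2, dim=2, epsilon=0.1, ridge=0.01)
for _ in range(2000):
    context = [env_rng.random(), env_rng.random()]
    arm = bandit.select_arm(context)
    reward = sum(t * x for t, x in zip(true_theta[arm], context))
    bandit.update(arm, context, reward)
```

In this sketch the stored state is $K$ matrices of size $d \times d$ plus $K$ vectors of length $d$, so memory is $O(Kd^2)$ regardless of how many rounds have been played. The sketch makes no regret claim; the guarantees quoted in the abstract belong to the paper's own algorithm.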

preprint, 2013, arXiv

An Improved Three-Weight Message-Passing Algorithm

We describe how the powerful "Divide and Concur" algorithm for constraint satisfaction can be derived as a special case of a message-passing version of the Alternating Direction Method of Multipliers (ADMM) algorithm for convex optimization, and introduce an improved message-passing algorithm based on ADMM/DC by introducing three distinct weights for messages, with "certain" and "no opinion" weights, as well as the standard weight used in ADMM/DC. The "certain" messages allow our improved algorithm to implement constraint propagation as a special case, while the "no opinion" messages speed convergence for some problems by making the algorithm focus only on active constraints. We describe how our three-weight version of ADMM/DC can give greatly improved performance for non-convex problems such as circle packing and solving large Sudoku puzzles, while retaining the exact performance of ADMM for convex problems. We also describe the advantages of our algorithm compared to other message-passing algorithms based upon belief propagation.

preprint, 2013, arXiv

Methods for Integrating Knowledge with the Three-Weight Optimization Algorithm for Hybrid Cognitive Processing

In this paper we consider optimization as an approach for quickly and flexibly developing hybrid cognitive capabilities that are efficient, scalable, and can exploit knowledge to improve solution speed and quality. In this context, we focus on the Three-Weight Algorithm, which aims to solve general optimization problems. We propose novel methods by which to integrate knowledge with this algorithm to improve expressiveness, efficiency, and scaling, and demonstrate these techniques on two example problems (Sudoku and circle packing).

preprint, 2012, arXiv

Identifying Users From Their Rating Patterns

This paper reports on our analysis of the 2011 CAMRa Challenge dataset (Track 2) for context-aware movie recommendation systems. The training dataset comprises 4,536,891 ratings provided by 171,670 users on 23,974 movies, as well as the household groupings of a subset of the users. The test dataset comprises 5,450 ratings for which the user label is missing, but the household label is provided. The challenge required identifying the user labels for the ratings in the test set. Our main finding is that temporal information (time labels of the ratings) is significantly more useful for achieving this objective than the user preferences (the actual ratings). Using a model that leverages this fact, we are able to identify users within a known household with an accuracy of approximately 96% (i.e. misclassification rate around 4%).

preprint, 2011, arXiv

Information Theoretic Limits on Learning Stochastic Differential Equations

Consider the problem of learning the drift coefficient of a stochastic differential equation from a sample path. In this paper, we assume that the drift is parametrized by a high dimensional vector. We address the question of how long the system needs to be observed in order to learn this vector of parameters. We prove a general lower bound on this time complexity by using a characterization of mutual information as time integral of conditional variance, due to Kadota, Zakai, and Ziv. This general lower bound is applied to specific classes of linear and non-linear stochastic differential equations. In the linear case, the problem under consideration is the one of learning a matrix of interaction coefficients. We evaluate our lower bound for ensembles of sparse and dense random matrices. The resulting estimates match the qualitative behavior of upper bounds achieved by computationally efficient procedures.

preprint, 2011, arXiv

On the trade-off between complexity and correlation decay in structural learning algorithms

We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms often fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it).

preprint, 2010, arXiv

Learning Networks of Stochastic Differential Equations

We consider linear models for stochastic dynamics. To any such model can be associated a network (namely a directed graph) describing which degrees of freedom interact under the dynamics. We tackle the problem of learning such a network from observation of the system trajectory over a time interval $T$. We analyze the $\ell_1$-regularized least squares algorithm and, in the setting in which the underlying network is sparse, we prove performance guarantees that are uniform in the sampling rate as long as this is sufficiently high. This result substantiates the notion of a well defined "time complexity" for the network inference problem.
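For orientation, one common way to write an $\ell_1$-regularized least squares estimator for this kind of problem is sketched below. The abstract does not give the formula, so the setup is an assumption: the trajectory $x(t) \in \mathbb{R}^p$ is sampled at times $t_k = k\eta$ for $k = 0, \dots, n$ with $n\eta = T$, increments are modelled by an Euler step, and each row $a_r$ of the interaction matrix is estimated separately with a regularization weight $\lambda$.

```latex
% Assumed discretized form; the symbols \eta, n, \lambda and a_r are not from the abstract.
\hat{a}_r \in \arg\min_{a \in \mathbb{R}^p}
  \frac{1}{2 n \eta^2} \sum_{k=0}^{n-1}
  \bigl( x_r(t_{k+1}) - x_r(t_k) - \eta\, a^{\top} x(t_k) \bigr)^2
  + \lambda \lVert a \rVert_1
```

Under this reading, the nonzero entries of $\hat{a}_r$ are taken as the estimated incoming edges of node $r$ in the directed interaction graph.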
