Researcher profile

Pedro M. Q. Aguiar

Pedro M. Q. Aguiar contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
10 works · 0 followers · 9 topics · 4 close collaborators

Published work

10 published items

preprint · 2022 · arXiv

Sparse Continuous Distributions and Fenchel-Young Losses

Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fusedmax) has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $Ω$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $Ω$ is a Tsallis negentropy with parameter $α$, we obtain "deformed exponential families," which include $α$-entmax and sparsemax ($α=2$) as particular cases. For quadratic energy functions, the resulting densities are $β$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $Ω$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $α\in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
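To make the contrast between fixed and varying support concrete, here is a minimal sketch of the finite-domain sparsemax (the $α=2$ case mentioned in the abstract), computed with the standard sort-and-threshold projection onto the probability simplex. It is an illustration only, not the paper's continuous-domain method; the function names and the example scores are assumptions chosen for this sketch.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, z.size + 1)
    in_support = 1.0 + ks * z_sorted > cumsum
    k = ks[in_support][-1]                 # size of the support
    tau = (cumsum[k - 1] - 1.0) / k        # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def softmax(z):
    """Softmax for comparison; every entry is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([1.0, 0.8, -1.0])        # hypothetical scores
p_sparse = sparsemax(scores)               # approximately [0.6, 0.4, 0.0]
p_soft = softmax(scores)                   # all entries positive
```

With these scores, sparsemax assigns exactly zero probability to the lowest-scoring entry, while softmax keeps full support.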

preprint · 2020 · arXiv

Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization

The majority of current object detectors lack context: class predictions are made independently from other detections. We propose to incorporate context in object detection by post-processing the output of an arbitrary detector to rescore the confidences of its detections. Rescoring is done by conditioning on contextual information from the entire set of detections: their confidences, predicted classes, and positions. We show that AP can be improved by simply reassigning the detection confidence values such that true positives that survive longer (i.e., those with the correct class and large IoU) are scored higher than false positives or detections with small IoU. In this setting, we use a bidirectional RNN with attention for contextual rescoring and introduce a training target that uses the IoU with ground truth to maximize AP for the given set of detections. The fact that our approach does not require access to visual features makes it computationally inexpensive and agnostic to the detection architecture. In spite of this simplicity, our model consistently improves AP over strong pre-trained baselines (Cascade R-CNN and Faster R-CNN with several backbones), particularly by reducing the confidence of duplicate detections (a learned form of non-maximum suppression) and removing out-of-context objects by conditioning on the confidences, classes, positions, and sizes of the co-occurrent detections. Code is available at https://github.com/LourencoVazPato/seeing-without-looking/

preprint · 2012 · arXiv

Alternating Directions Dual Decomposition

We propose AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs based on the alternating directions method of multipliers. Like dual decomposition algorithms, AD3 uses worker nodes to iteratively solve local subproblems and a controller node to combine these local solutions into a global update. The key characteristic of AD3 is that each local subproblem has a quadratic regularizer, leading to a faster consensus than subgradient-based dual decomposition, both theoretically and in practice. We provide closed-form solutions for these AD3 subproblems for binary pairwise factors and factors imposing first-order logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP configuration, making AD3 applicable to a wide range of problems. Experiments on synthetic and real-world problems show that AD3 compares favorably with the state-of-the-art.

preprint · 2012 · arXiv

Distributed Basis Pursuit

We propose a distributed algorithm for solving the optimization problem Basis Pursuit (BP). BP finds the least L1-norm solution of the underdetermined linear system Ax = b and is used, for example, in compressed sensing for reconstruction. Our algorithm solves BP on a distributed platform such as a sensor network, and is designed to minimize the communication between nodes. The algorithm only requires the network to be connected, has no notion of a central processing node, and no node has access to the entire matrix A at any time. We consider two scenarios in which either the columns or the rows of A are distributed among the compute nodes. Our algorithm, named D-ADMM, is a decentralized implementation of the alternating direction method of multipliers. We show through numerical simulation that our algorithm requires considerably less communication between the nodes than the state-of-the-art algorithms.
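For orientation, the Basis Pursuit problem described above (minimize the L1 norm of x subject to Ax = b) can be sketched with a plain, centralized ADMM iteration. This is not D-ADMM: a single process holds the whole matrix A, which is exactly what the paper avoids. The function names, the penalty `rho`, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise shrinkage, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def basis_pursuit_admm(A, b, rho=1.0, n_iter=500):
    """Centralized ADMM sketch for: minimize ||x||_1 subject to A x = b.

    Assumes A has full row rank so that A @ A.T is invertible.
    """
    m, n = A.shape
    AAt_inv = np.linalg.inv(A @ A.T)
    P = np.eye(n) - A.T @ AAt_inv @ A      # projector onto the null space of A
    q = A.T @ AAt_inv @ b                  # minimum L2-norm solution of A x = b
    z = np.zeros(n)
    u = np.zeros(n)
    x = q.copy()
    for _ in range(n_iter):
        x = P @ (z - u) + q                # project onto the set {x : A x = b}
        z = soft_threshold(x + u, 1.0 / rho)
        u = u + x - z                      # scaled dual update
    return x, z

# Hypothetical synthetic instance: 10 equations, 30 unknowns, 2 nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))
x_true = np.zeros(30)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true
x_hat, z_hat = basis_pursuit_admm(A, b)
```

Each `x` iterate satisfies the linear constraint by construction, while `z` carries the sparsity; the two agree in the limit.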

preprint · 2011 · arXiv

A Proof of Convergence For the Alternating Direction Method of Multipliers Applied to Polyhedral-Constrained Functions

We give a general proof of convergence for the Alternating Direction Method of Multipliers (ADMM). ADMM is an optimization algorithm that has recently become very popular due to its ability to solve large-scale and/or distributed problems. We prove that the sequence generated by ADMM converges to a primal-dual optimal solution. We assume the functions f and g, defining the cost f(x) + g(y), are real-valued, but constrained to lie on polyhedral sets X and Y. Our proof is an extension of the proofs from [Bertsekas97, Boyd11].

preprint · 2010 · arXiv

3-D Rigid Models from Partial Views - Global Factorization

The so-called factorization methods recover 3-D rigid structure from motion by factorizing an observation matrix that collects 2-D projections of features. These methods became popular due to their robustness - they use a large number of views, which constrains adequately the solution - and computational simplicity - the large number of unknowns is computed through an SVD, avoiding non-linear optimization. However, they require that all the entries of the observation matrix are known. This is unlikely to happen in practice, due to self-occlusion and limited field of view. Also, when processing long videos, regions that become occluded often appear again later. Current factorization methods process these as new regions, leading to less accurate estimates of 3-D structure. In this paper, we propose a global factorization method that infers complete 3-D models directly from the 2-D projections in the entire set of available video frames. Our method decides whether a region that has become visible is a region that was seen before, or a previously unseen region, in a global way, i.e., by seeking the simplest rigid object that describes well the entire set of observations. This global approach increases significantly the accuracy of the estimates of the 3-D shape of the scene and the 3-D motion of the camera. Experiments with artificial and real videos illustrate the good performance of our method.
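The SVD step that the abstract credits for the computational simplicity of factorization methods can be sketched as follows. This shows only the classic complete-data case, where every entry of the observation matrix is known; it is not the paper's global method for partial views. The variable names, the synthetic data, and the generic (affine) motion matrix are assumptions for illustration, and the factors are recovered only up to an invertible 3x3 ambiguity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_points = 5, 20

# Hypothetical ground truth: a 3-D point cloud and one 2x3 projection per frame.
shape_3d = rng.standard_normal((3, n_points))
motion = rng.standard_normal((2 * n_frames, 3))

# Observation matrix of 2-D projections (translation assumed already removed).
W = motion @ shape_3d                      # size (2 * n_frames) x n_points, rank 3

# Rank-3 factorization through the SVD, with no non-linear optimization.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
root = np.sqrt(s[:3])
motion_hat = U[:, :3] * root               # estimated motion factor
shape_hat = root[:, None] * Vt[:3]         # estimated shape factor
```

The product `motion_hat @ shape_hat` reproduces `W`, and the fourth and later singular values are numerically zero because the noiseless observation matrix has rank 3.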

preprint · 2010 · arXiv

Alternatives to speech in low bit rate communication systems

This paper describes a framework and a method with which speech communication can be analyzed. The framework consists of a set of low bit rate, short-range acoustic communication systems, such as speech, but that are quite different from speech. The method is to systematically compare these systems according to different objective functions such as data rate, computational overhead, psychoacoustic effects and semantics. One goal of this study is to better understand the nature of human communication. Another goal is to identify acoustic communication systems that are more efficient than human speech for some specific purposes.

preprint · 2010 · arXiv

ANSIG - An Analytic Signature for Arbitrary 2D Shapes (or Bags of Unlabeled Points)

In image analysis, many tasks require representing two-dimensional (2D) shape, often specified by a set of 2D points, for comparison purposes. The challenge of the representation is that it must not only capture the characteristics of the shape but also be invariant to relevant transformations. Invariance to geometric transformations, such as translation, rotation, and scale, has received attention in the past, usually under the assumption that the points are previously labeled, i.e., that the shape is characterized by an ordered set of landmarks. However, in many practical scenarios, the points describing the shape are obtained from automatic processes, e.g., edge or corner detection, thus without labels or natural ordering. Obviously, the combinatorial problem of computing the correspondences between the points of two shapes in the presence of the aforementioned geometrical distortions becomes a quagmire when the number of points is large. We circumvent this problem by representing shapes in a way that is invariant to the permutation of the landmarks, i.e., we represent bags of unlabeled 2D points. Within our framework, a shape is mapped to an analytic function on the complex plane, leading to what we call its analytic signature (ANSIG). To store an ANSIG, it suffices to sample it along a closed contour in the complex plane. We show that the ANSIG is a maximal invariant with respect to the permutation group, i.e., that different shapes have different ANSIGs and shapes that differ by a permutation (or re-labeling) of the landmarks have the same ANSIG. We further show how easy it is to factor out geometric transformations when comparing shapes using the ANSIG representation. Finally, we illustrate these capabilities with shape-based image classification experiments.

preprint · 2010 · arXiv

Maximum Likelihood Mosaics

The majority of the approaches to the automatic recovery of a panoramic image from a set of partial views are suboptimal in the sense that the input images are aligned, or registered, pair by pair, e.g., consecutive frames of a video clip. These approaches lead to propagation errors that may be very severe, particularly when dealing with videos that show the same region at disjoint time intervals. Although some authors have proposed a post-processing step to reduce the registration errors in these situations, there have not been attempts to compute the optimal solution, i.e., the registrations leading to the panorama that best matches the entire set of partial views. This is our goal. In this paper, we use a generative model for the partial views of the panorama and develop an algorithm to compute in an efficient way the Maximum Likelihood estimate of all the unknowns involved: the parameters describing the alignment of all the images and the panorama itself.

preprint · 2010 · arXiv

Online Multiple Kernel Learning for Structured Prediction

Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large-scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that drawback. We show regret, convergence, and generalization bounds for the proposed method. Experiments on handwriting recognition and dependency parsing attest to the success of the approach.