Researcher profile

Thiago Serra

Thiago Serra contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2023arXiv

Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions

One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. But when we cross a critical level of parameter sparsity, pruning any further leads to a sudden drop in accuracy. This drop plausibly reflects a loss in model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the linear regions defined by a neural network, and consequently reduces the expected maximum number of linear regions based on the architecture. We observe that pruning affects accuracy similarly to how sparsity affects the number of linear regions and our proposed bound for the maximum number. Conversely, we find out that selecting the sparsity across layers to maximize our bound very often improves accuracy in comparison to pruning as much with the same sparsity in all layers, thereby providing us guidance on where to prune.

preprint2022arXiv

Optimal Decision Diagrams for Classification

Decision diagrams for classification have some notable advantages over decision trees, as their internal connections can be determined at training time and their width is not bound to grow exponentially with their depth. Accordingly, decision diagrams are usually less prone to data fragmentation in internal nodes. However, the inherent complexity of training these classifiers acted as a long-standing barrier to their widespread adoption. In this context, we study the training of optimal decision diagrams (ODDs) from a mathematical programming perspective. We introduce a novel mixed-integer linear programming model for training and demonstrate its applicability for many datasets of practical importance. Further, we show how this model can be easily extended for fairness, parsimony, and stability notions. We present numerical analyses showing that our model allows training ODDs in short computational times, and that ODDs achieve better accuracy than optimal decision trees, while allowing for improved stability without significant accuracy losses.

preprint2022arXiv

Seamless Multimodal Transportation Scheduling

Ride-hailing services have expanded the role of shared mobility in passenger transportation systems, creating new markets and creative planning solutions for major urban centers. In this paper, we consider their use for the first-mile or last-mile passenger transportation in coordination with a mass transit service to provide a seamless multimodal transportation experience for the user. A system that provides passengers with predictable information on travel and waiting times in their commutes is immensely valuable. We envision that the passengers will inform the system of their desired travel and arrival windows so that the system can jointly optimize the schedules of passengers. The problem we study balances minimizing travel time and the number of trips taken by the last-mile vehicles, so that long-term planning, maintenance, and environmental impact are all taken into account. We focus on the case where the last-mile service aggregates passengers by destination. We show that this problem is NP-hard, and propose a decision diagram-based branch-and-price decomposition model that can solve instances of real-world size (10,000 passengers spread over an hour, 50 last-mile destinations, 600 last-mile vehicles) in computational time (~1 minute) that is orders-of-magnitude faster than other methods appearing in the literature. Our experiments also indicate that aggregating passengers by destination on the last-mile service provides high-quality solutions to more general settings.

preprint2022arXiv

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

Neural networks tend to achieve better accuracy with training if they are larger -- even if the resulting models are overparameterized. Nevertheless, carefully removing such excess parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as removing a percentage of the weights with the smallest absolute value -- even though magnitude is not a perfect proxy for weight relevance. With the premise that obtaining significantly better performance from pruning depends on accounting for the combined effect of removing multiple weights, we revisit one of the classic approaches for impact-based pruning: the Optimal Brain Surgeon(OBS). We propose a tractable heuristic for solving the combinatorial extension of OBS, in which we select weights for simultaneous removal, as well as a systematic update of the remaining weights. Our selection method outperforms other methods under high sparsity, and the weight update is advantageous even when combined with the other methods.

preprint2022arXiv

Training Thinner and Deeper Neural Networks: Jumpstart Regularization

Neural networks are more expressive when they have multiple layers. In turn, conventional training methods are only successful if the depth does not lead to numerical issues such as exploding or vanishing gradients, which occur less frequently when the layers are sufficiently wide. However, increasing width to attain greater depth entails the use of heavier computational resources and leads to overparameterized models. These subsequent issues have been partially addressed by model compression methods such as quantization and pruning, some of which relying on normalization-based regularization of the loss function to make the effect of most parameters negligible. In this work, we propose instead to use regularization for preventing neurons from dying or becoming linear, a technique which we denote as jumpstart regularization. In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.

preprint2021arXiv

Template-based Minor Embedding for Adiabatic Quantum Optimization

Quantum Annealing (QA) can be used to quickly obtain near-optimal solutions for Quadratic Unconstrained Binary Optimization (QUBO) problems. In QA hardware, each decision variable of a QUBO should be mapped to one or more adjacent qubits in such a way that pairs of variables defining a quadratic term in the objective function are mapped to some pair of adjacent qubits. However, qubits have limited connectivity in existing QA hardware. This has spurred work on preprocessing algorithms for embedding the graph representing problem variables with quadratic terms into the hardware graph representing qubits adjacencies, such as the Chimera graph in hardware produced by D-Wave Systems. In this paper, we use integer linear programming to search for an embedding of the problem graph into certain classes of minors of the Chimera graph, which we call template embeddings. One of these classes corresponds to complete bipartite graphs, for which we show the limitation of the existing approach based on minimum Odd Cycle Transversals (OCTs). One of the formulations presented is exact, and thus can be used to certify the absence of a minor embedding using that template. On an extensive test set consisting of random graphs from five different classes of varying size and sparsity, we can embed more graphs than a state-of-the-art OCT-based approach, our approach scales better with the hardware size, and the runtime is generally orders of magnitude smaller.

preprint2020arXiv

Lossless Compression of Deep Neural Networks

Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition, where large neural networks are often used to obtain good accuracy. Consequently, it is challenging to deploy these networks under limited computational resources, such as in mobile devices. In this work, we introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced, which thus implies a lossless compression. This algorithm, which we denote as LEO (Lossless Expressiveness Optimization), relies on Mixed-Integer Linear Programming (MILP) to identify Rectified Linear Units (ReLUs) with linear behavior over the input domain. By using L1 regularization to induce such behavior, we can benefit from training over a larger architecture than we would later use in the environment where the trained neural network is deployed.

preprint2020arXiv

When Lift-and-Project Cuts are Different

In this paper, we present a method to determine if a lift-and-project cut for a mixed-integer linear program is irregular, in which case the cut is not equivalent to any intersection cut from the bases of the linear relaxation. This is an important question due to the intense research activity for the past decade on cuts from multiple rows of simplex tableau as well as on lift-and-project cuts from non-split disjunctions. While it is known since Balas and Perregaard (2003) that lift-and-project cuts from split disjunctions are always equivalent to intersection cuts and consequently to such multi-row cuts, Balas and Kis (2016) have recently shown that there is a necessary and sufficient condition in the case of arbitrary disjunctions: a lift-and-project cut is regular if, and only if, it corresponds to a regular basic solution of the Cut Generating Linear Program (CGLP). This paper has four contributions. First, we state a result that simplifies the verification of regularity for basic CGLP solutions from Balas and Kis (2016). Second, we provide a mixed-integer formulation that checks whether there is a regular CGLP solution for a given cut that is regular in a broader sense, which also encompasses irregular cuts that are implied by the regular cut closure. Third, we describe a numerical procedure based on such formulation that identifies irregular lift-and-project cuts. Finally, we use this method to evaluate how often lift-and-project cuts from simple $t$-branch split disjunctions are irregular, and thus not equivalent to multi-row cuts, on 74 instances of the MIPLIB benchmarks.