Source author record

Pierre Baldi

Pierre Baldi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning hep-ex Artificial Intelligence hep-ph Computer Science and Game Theory Multiagent Systems Neural and Evolutionary Computing physics.data-an Computer Vision Cryptography and Security Computational Engineering, Finance, and Science Data Structures and Algorithms eess.IV math.RA physics.acc-ph physics.ao-ph physics.chem-ph physics.comp-ph Programming Languages

Catalog footprint

What is connected

30works

19topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

Finding approximate equilibria for large-scale imperfect-information competitive games such as StarCraft, Dota, and CounterStrike remains computationally infeasible due to sparse rewards and challenging exploration over long horizons. In this paper, we propose a multi-agent starting-state sampling strategy designed to substantially accelerate online exploration in regularized policy-gradient game methods for two-player zero-sum (2p0s) games. Motivated by an assumption that offline demonstrations from skilled humans can provide good coverage of high-level strategies relevant to equilibrium play, we propose the initialization of reinforcement learning data collection at intermediate states sampled from offline data to facilitate exploration of strategically relevant subgames. Referring to this method as Data-Augmented Game Starts (DAGS), we perform experiments using synthetic datasets and analytically tractable, long-horizon control variants of two-player Kuhn Poker, Goofspiel, and a counterexample game designed to penalize biased beliefs over hidden information. Under fixed computational budgets, DAGS enables regularized policy gradient methods to achieve lower exploitability in games with significantly more challenging exploration. We show that augmenting starting state distributions when solving imperfect information games can lead to biased equilibria, and we provide a straightforward mitigation to this in the form of multi-task observation flags. Finally, we release a new set of benchmark environments that drastically increase exploration challenges and state counts in existing OpenSpiel games while keeping exploitability measurements analytically tractable.

preprint2022arXiv

Anytime PSRO for Two-Player Zero-Sum Games

Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO), a tabular double oracle algorithm for 2-player zero-sum games that is guaranteed to converge to a Nash equilibrium while decreasing exploitability from one iteration to the next. Unlike DO, in which the restricted distribution is based on the restricted game formed by each player's strategy sets, ADO finds the restricted distribution for each player that minimizes its exploitability against any policy in the full, unrestricted game. We also propose a method of finding this restricted distribution via a no-regret algorithm updated against best responses, called RM-BR DO. Finally, we propose anytime PSRO (APSRO), a version of ADO that calculates best responses via reinforcement learning. In experiments on Leduc poker and random normal form games, we show that our methods achieve far lower exploitability than DO and PSRO and decrease exploitability monotonically.

preprint2022arXiv

Deep Learning From Four Vectors

An early example of the ability of deep networks to improve the statistical power of data collected in particle physics experiments was the demonstration that such networks operating on lists of particle momenta (four-vectors) could outperform shallow networks using features engineered with domain knowledge. A benchmark case is described, with extensions to parameterized networks. A discussion of data handling and architecture is presented, as well as a description of how to incorporate physics knowledge into the network architecture.

preprint2022arXiv

Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks

Top quarks, produced in large numbers at the Large Hadron Collider, have a complex detector signature and require special reconstruction techniques. The most common decay mode, the "all-jet" channel, results in a 6-jet final state which is particularly difficult to reconstruct in $pp$ collisions due to the large number of permutations possible. We present a novel approach to this class of problem, based on neural networks using a generalized attention mechanism, that we call Symmetry Preserving Attention Networks (SPA-Net). We train one such network to identify the decay products of each top quark unambiguously and without combinatorial explosion as an example of the power of this technique.This approach significantly outperforms existing state-of-the-art methods, correctly assigning all jets in $93.0%$ of $6$-jet, $87.8%$ of $7$-jet, and $82.6%$ of $\geq 8$-jet events respectively.

preprint2022arXiv

Resolving Extreme Jet Substructure

We study the effectiveness of theoretically-motivated high-level jet observables in the extreme context of jets with a large number of hard sub-jets (up to $N=8$). Previous studies indicate that high-level observables are powerful, interpretable tools to probe jet substructure for $N\le 3$ hard sub-jets, but that deep neural networks trained on low-level jet constituents match or slightly exceed their performance. We extend this work for up to $N=8$ hard sub-jets, using deep particle-flow networks (PFNs) and Transformer based networks to estimate a loose upper bound on the classification performance. A fully-connected neural network operating on a standard set of high-level jet observables, 135 $\textrm{N}$-subjetiness observables and jet mass, reach classification accuracy of 86.90\%, but fall short of the PFN and Transformer models, which reach classification accuracies of 89.19\% and 91.27\% respectively, suggesting that the constituent networks utilize information not captured by the set of high-level observables. We then identify additional high-level observables which are able to narrow this gap, and utilize LASSO regularization for feature selection to identify and rank the most relevant observables and provide further insights into the learning strategies used by the constituent-based neural networks. The final model contains only 31 high-level observables and is able to match the performance of the PFN and approximate the performance of the Transformer model to within 2\%.

preprint2022arXiv

Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation

It is fundamental for science and technology to be able to predict chemical reactions and their properties. To achieve such skills, it is important to develop good representations of chemical reactions, or good deep learning architectures that can learn such representations automatically from the data. There is currently no universal and widely adopted method for robustly representing chemical reactions. Most existing methods suffer from one or more drawbacks, such as: (1) lacking universality; (2) lacking robustness; (3) lacking interpretability; or (4) requiring excessive manual pre-processing. Here we exploit graph-based representations of molecular structures to develop and test a hypergraph attention neural network approach to solve at once the reaction representation and property-prediction problems, alleviating the aforementioned drawbacks. We evaluate this hypergraph representation in three experiments using three independent data sets of chemical reactions. In all experiments, the hypergraph-based approach matches or outperforms other representations and their corresponding models of chemical reactions while yielding interpretable multi-level representations.

preprint2022arXiv

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) methods based on the \emph{Double Oracle (DO)} algorithm, such as \emph{Policy Space Response Oracles (PSRO)} and \emph{Anytime PSRO (APSRO)}, iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce \emph{Self-Play PSRO (SP-PSRO)}, a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.

preprint2022arXiv

The Quarks of Attention

Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their functional properties and estimate the capacity of several attentional building blocks in the case of linear and polynomial threshold gates. Surprisingly, additive activation attention plays a central role in the proofs of the lower bounds. Attention mechanisms reduce the depth of certain basic circuits and leverage the power of quadratic activations without incurring their full cost.

preprint2022arXiv

XDO: A Double Oracle Algorithm for Extensive-Form Games

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo.

preprint2021arXiv

A theory of capacity and sparse neural encoding

Motivated by biological considerations, we study sparse neural maps from an input layer to a target layer with sparse activity, and specifically the problem of storing $K$ input-target associations $(x,y)$, or memories, when the target vectors $y$ are sparse. We mathematically prove that $K$ undergoes a phase transition and that in general, and somewhat paradoxically, sparsity in the target layers increases the storage capacity of the map. The target vectors can be chosen arbitrarily, including in random fashion, and the memories can be both encoded and decoded by networks trained using local learning rules, including the simple Hebb rule. These results are robust under a variety of statistical assumptions on the data. The proofs rely on elegant properties of random polytopes and sub-gaussian random vector variables. Open problems and connections to capacity theories and polynomial threshold maps are discussed.

preprint2021arXiv

Detecting Pulmonary Coccidioidomycosis (Valley fever) with Deep Convolutional Neural Networks

Coccidioidomycosis is the most common systemic mycosis in dogs in the southwestern United States. With warming climates, affected areas and number of cases are expected to increase in the coming years, escalating also the chances of transmission to humans. As a result, developing methods for automating the detection of the disease is important, as this will help doctors and veterinarians more easily identify and diagnose positive cases. We apply machine learning models to provide accurate and interpretable predictions of Coccidioidomycosis. We assemble a set of radiographic images and use it to train and test state-of-the-art convolutional neural networks to detect Coccidioidomycosis. These methods are relatively inexpensive to train and very fast at inference time. We demonstrate the successful application of this approach to detect the disease with an Area Under the Curve (AUC) above 0.99 using 10-fold cross validation. We also use the classification model to identify regions of interest and localize the disease in the radiographic images, as illustrated through visual heatmaps. This proof-of-concept study establishes the feasibility of very accurate and rapid automated detection of Valley Fever in radiographic images.

preprint2021arXiv

Enforcing Analytic Constraints in Neural-Networks Emulating Physical Systems

Neural networks can emulate nonlinear physical systems with high accuracy, yet they may produce physically-inconsistent results when violating fundamental constraints. Here, we introduce a systematic way of enforcing nonlinear analytic constraints in neural networks via constraints in the architecture or the loss function. Applied to convective processes for climate modeling, architectural constraints enforce conservation laws to within machine precision without degrading performance. Enforcing constraints also reduces errors in the subsets of the outputs most impacted by the constraints.

preprint2021arXiv

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available athttps://github.com/JBLanier/pipeline-psro.

preprint2021arXiv

SARM: Sparse Autoregressive Model for Scalable Generation of Sparse Images in Particle Physics

Generation of simulated data is essential for data analysis in particle physics, but current Monte Carlo methods are very computationally expensive. Deep-learning-based generative models have successfully generated simulated data at lower cost, but struggle when the data are very sparse. We introduce a novel deep sparse autoregressive model (SARM) that explicitly learns the sparseness of the data with a tractable likelihood, making it more stable and interpretable when compared to Generative Adversarial Networks (GANs) and other methods. In two case studies, we compare SARM to a GAN model and a non-sparse autoregressive model. As a quantitative measure of performance, we compute the Wasserstein distance ($W_p$) between the distributions of physical quantities calculated on the generated images and on the training images. In the first study, featuring images of jets in which 90% of the pixels are zero-valued, SARM produces images with $W_p$ scores that are 24-52% better than the scores obtained with other state-of-the-art generative models. In the second study, on calorimeter images in the vicinity of muons where 98% of the pixels are zero-valued, SARM produces images with $W_p$ scores that are 66-68% better. Similar observations made with other metrics confirm the usefulness of SARM for sparse data in particle physics. Original data and software will be made available upon acceptance of the manuscript from the UCI Machine Learning in Physics web portal at: http://mlphysics.ics.uci.edu/.

preprint2020arXiv

A Fortran-Keras Deep Learning Bridge for Scientific Computing

Implementing artificial neural networks is commonly achieved via high-level programming languages like Python and easy-to-use deep learning libraries like Keras. These software libraries come pre-loaded with a variety of network architectures, provide autodifferentiation, and support GPUs for fast and efficient computation. As a result, a deep learning practitioner will favor training a neural network model in Python, where these tools are readily available. However, many large-scale scientific computation projects are written in Fortran, making it difficult to integrate with modern deep learning methods. To alleviate this problem, we introduce a software library, the Fortran-Keras Bridge (FKB). This two-way bridge connects environments where deep learning resources are plentiful, with those where they are scarce. The paper describes several unique features offered by FKB, such as customizable layers, loss functions, and network ensembles. The paper concludes with a case study that applies FKB to address open questions about the robustness of an experimental approach to global climate simulation, in which subgrid physics are outsourced to deep neural network emulators. In this context, FKB enables a hyperparameter search of one hundred plus candidate models of subgrid cloud and radiation physics, initially implemented in Keras, to be transferred and used in Fortran. Such a process allows the model's emergent behavior to be assessed, i.e. when fit imperfections are coupled to explicit planetary-scale fluid dynamics. The results reveal a previously unrecognized strong relationship between offline validation error and online performance, in which the choice of optimizer proves unexpectedly critical. This reveals many neural network architectures that produce considerable improvements in stability including some with reduced error, for an especially challenging training dataset.

preprint2020arXiv

Continuous Representation of Molecules Using Graph Variational Autoencoder

In order to continuously represent molecules, we propose a generative model in the form of a VAE which is operating on the 2D-graph structure of molecules. A side predictor is employed to prune the latent space and help the decoder in generating meaningful adjacency tensor of molecules. Other than the potential applicability in drug design and property prediction, we show the superior performance of this technique in comparison to other similar methods based on the SMILES representation of the molecules with RNN based encoder and decoder.

preprint2020arXiv

Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds. In this paper we explore how this affects hyperparameter optimization when the goal is to find hyperparameter settings that perform well across random seeds. In particular, we benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions. For this we consider the Successive Halving, Random Search, and Bayesian Optimization algorithms, the latter two with and without repetitions. We apply these to tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. We demonstrate that pruning may negatively affect the optimization and that repeated sampling does not help in finding hyperparameter settings that perform better across random seeds. From our experiments we conclude that Bayesian optimization with a noise robust acquisition function is the best choice for hyperparameter optimization in reinforcement learning tasks.

preprint2020arXiv

Sherpa: Robust Hyperparameter Optimization for Machine Learning

Sherpa is a hyperparameter optimization library for machine learning models. It is specifically designed for problems with computationally expensive, iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Sherpa can be run on either a single machine or in parallel on a cluster. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning practitioners by automating the more tedious aspects of model tuning. Its source code and documentation are available at https://github.com/sherpa-ai/sherpa.

preprint2020arXiv

SPLASH: Learnable Activation Functions for Improving Accuracy and Adversarial Robustness

We introduce SPLASH units, a class of learnable activation functions shown to simultaneously improve the accuracy of deep neural networks while also improving their robustness to adversarial attacks. SPLASH units have both a simple parameterization and maintain the ability to approximate a wide range of non-linear functions. SPLASH units are: 1) continuous; 2) grounded (f(0) = 0); 3) use symmetric hinges; and 4) the locations of the hinges are derived directly from the data (i.e. no learning required). Compared to nine other learned and fixed activation functions, including ReLU and its variants, SPLASH units show superior performance across three datasets (MNIST, CIFAR-10, and CIFAR-100) and four architectures (LeNet5, All-CNN, ResNet-20, and Network-in-Network). Furthermore, we show that SPLASH units significantly increase the robustness of deep neural networks to adversarial attacks. Our experiments on both black-box and open-box adversarial attacks show that commonly-used architectures, namely LeNet5, All-CNN, ResNet-20, and Network-in-Network, can be up to 31% more robust to adversarial attacks by simply using SPLASH units instead of ReLUs.

preprint2019arXiv

Gaussian Process Accelerated Feldman-Cousins Approach for Physical Parameter Inference

The unified approach of Feldman and Cousins allows for exact statistical inference of small signals that commonly arise in high energy physics. It has gained widespread use, for instance, in measurements of neutrino oscillation parameters in long-baseline experiments. However, the approach relies on the Neyman construction of the classical confidence interval and is computationally intensive as it is typically done in a grid-based fashion over the entire parameter space. In this letter, we propose an efficient algorithm for the Feldman-Cousins approach using Gaussian processes to construct confidence intervals iteratively. We show that in the neutrino oscillation context, one can obtain confidence intervals 5 times faster in one dimension and 10 times faster in two dimensions, while maintaining an accuracy above 99.5%.

preprint2016arXiv

A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation

In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules is obtained by first specifying the nature of the local variables, and then the functional form that ties them together into each learning rule. Such a framework enables also the systematic discovery of new learning rules and exploration of relationships between learning rules and group symmetries. We study polynomial local learning rules stratified by their degree and analyze their behavior and capabilities in both linear and non-linear units and networks. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning where target information is communicated to the deep layers through a backward learning channel. The nature of the communicated information about the targets and the structure of the learning channel partition the space of learning algorithms. We estimate the learning channel capacity associated with several algorithms and show that backpropagation outperforms them by simultaneously maximizing the information rate and minimizing the computational cost, even in recurrent networks. The theory clarifies the concept of Hebbian learning, establishes the power and limitations of local learning rules, introduces the learning channel which enables a formal analysis of the optimality of backpropagation, and explains the sparsity of the space of learning rules discovered so far.

preprint2016arXiv

Jet Flavor Classification in High-Energy Physics with Deep Neural Networks

Classification of jets as originating from light-flavor or heavy-flavor quarks is an important task for inferring the nature of particles produced in high-energy collisions. The large and variable dimensionality of the data provided by the tracking detectors makes this task difficult. The current state-of-the-art tools require expert data-reduction to convert the data into a fixed low-dimensional form that can be effectively managed by shallow classifiers. We study the application of deep networks to this task, attempting classification at several levels of data, starting from a raw list of tracks. We find that the highest-level lowest-dimensionality expert information sacrifices information needed for classification, that the performance of current state-of-the-art taggers can be matched or slightly exceeded by deep-network-based taggers using only track and vertex information, that classification using only lowest-level highest-dimensionality tracking information remains a difficult task for deep networks, and that adding lower-level track and vertex information to the classifiers provides a significant boost in performance compared to the state-of-the-art.

preprint2016arXiv

Jet Substructure Classification in High-Energy Physics with Deep Neural Networks

At the extreme energies of the Large Hadron Collider, massive particles can be produced at such high velocities that their hadronic decays are collimated and the resulting jets overlap. Deducing whether the substructure of an observed jet is due to a low-mass single particle or due to multiple decay objects of a massive particle is an important problem in the analysis of collider data. Traditional approaches have relied on expert features designed to detect energy deposition patterns in the calorimeter, but the complexity of the data make this task an excellent candidate for the application of machine learning tools. The data collected by the detector can be treated as a two-dimensional image, lending itself to the natural application of image classification techniques. In this work, we apply deep neural networks with a mixture of locally-connected and fully-connected nodes. Our experiments demonstrate that without the aid of expert features, such networks match or modestly outperform the current state-of-the-art approach for discriminating between jets from single hadronic particles and overlapping jets from pairs of collimated hadronic particles, and that such performance gains persist in the presence of pileup interactions.

preprint2016arXiv

Parameterized Machine Learning for High-Energy Physics

We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a smoothly varying learning task, and the resulting parameterized classifier can smoothly interpolate between them and replace sets of classifiers trained at individual values. This simplifies the training process and gives improved performance at intermediate values, even for complex problems requiring deep learning. Applications include tools parameterized in terms of theoretical model parameters, such as the mass of a particle, which allow for a single network to provide improved discrimination across a range of masses. This concept is simple to implement and allows for optimized interpolatable results.

preprint2015arXiv

Learning Activation Functions to Improve Deep Neural Networks

Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.

preprint2014arXiv

Complex-Valued Autoencoders

Autoencoders are unsupervised machine learning circuits whose learning goal is to minimize a distortion measure between inputs and outputs. Linear autoencoders can be defined over any field and only real-valued linear autoencoder have been studied so far. Here we study complex-valued linear autoencoders where the components of the training vectors and adjustable matrices are defined over the complex field with the $L_2$ norm. We provide simpler and more general proofs that unify the real-valued and complex-valued cases, showing that in both cases the landscape of the error function is invariant under certain groups of transformations. The landscape has no local minima, a family of global minima associated with Principal Component Analysis, and many families of saddle points associated with orthogonal projections onto sub-space spanned by sub-optimal subsets of eigenvectors of the covariance matrix. The theory yields several iterative, convergent, learning algorithms, a clear understanding of the generalization properties of the trained autoencoders, and can equally be applied to the hetero-associative case when external targets are provided. Partial results on deep architecture as well as the differential geometry of autoencoders are also presented. The general framework described here is useful to classify autoencoders and identify general common properties that ought to be investigated for each class, illuminating some of the connections between information theory, unsupervised learning, clustering, Hebbian learning, and autoencoders.

preprint2014arXiv

Enhanced Higgs to $τ^+τ^-$ Searches with Deep Learning

The Higgs boson is thought to provide the interaction that imparts mass to the fundamental fermions, but while measurements at the Large Hadron Collider (LHC) are consistent with this hypothesis, current analysis techniques lack the statistical power to cross the traditional 5$σ$ significance barrier without more data. \emph{Deep learning} techniques have the potential to increase the statistical power of this analysis by \emph{automatically} learning complex, high-level data representations. In this work, deep neural networks are used to detect the decay of the Higgs to a pair of tau leptons. A Bayesian optimization algorithm is used to tune the network architecture and training algorithm hyperparameters, resulting in a deep network of eight non-linear processing layers that improves upon the performance of shallow classifiers even without the use of features specifically engineered by physicists for this application. The improvement in discovery significance is equivalent to an increase in the accumulated dataset of 25\%.

preprint2014arXiv

Searching for Exotic Particles in High-Energy Physics with Deep Learning

Collisions at high-energy particle colliders are a traditionally fruitful source of exotic particle discoveries. Finding these rare particles requires solving difficult signal-versus-background classification problems, hence machine learning approaches are often used. Standard approaches have relied on `shallow' machine learning models that have a limited capacity to learn complex non-linear functions of the inputs, and rely on a pain-staking search through manually constructed non-linear features. Progress on this problem has slowed, as a variety of techniques have shown equivalent performance. Recent advances in the field of deep learning make it possible to learn more complex functions and better discriminate between signal and background classes. Using benchmark datasets, we show that deep learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8\% over the best current approaches. This demonstrates that deep learning approaches can improve the power of collider searches for exotic particles.

preprint2011arXiv

Countering Gattaca: Efficient and Secure Testing of Fully-Sequenced Human Genomes (Full Version)

Recent advances in DNA sequencing technologies have put ubiquitous availability of fully sequenced human genomes within reach. It is no longer hard to imagine the day when everyone will have the means to obtain and store one's own DNA sequence. Widespread and affordable availability of fully sequenced genomes immediately opens up important opportunities in a number of health-related fields. In particular, common genomic applications and tests performed in vitro today will soon be conducted computationally, using digitized genomes. New applications will be developed as genome-enabled medicine becomes increasingly preventive and personalized. However, this progress also prompts significant privacy challenges associated with potential loss, theft, or misuse of genomic data. In this paper, we begin to address genomic privacy by focusing on three important applications: Paternity Tests, Personalized Medicine, and Genetic Compatibility Tests. After carefully analyzing these applications and their privacy requirements, we propose a set of efficient techniques based on private set operations. This allows us to implement in in silico some operations that are currently performed via in vitro methods, in a secure fashion. Experimental results demonstrate that proposed techniques are both feasible and practical today.

preprint2011arXiv

Privacy-Enhanced Methods for Comparing Compressed DNA Sequences

In this paper, we study methods for improving the efficiency and privacy of compressed DNA sequence comparison computations, under various querying scenarios. For instance, one scenario involves a querier, Bob, who wants to test if his DNA string, $Q$, is close to a DNA string, $Y$, owned by a data owner, Alice, but Bob does not want to reveal $Q$ to Alice and Alice is willing to reveal $Y$ to Bob \emph{only if} it is close to $Q$. We describe a privacy-enhanced method for comparing two compressed DNA sequences, which can be used to achieve the goals of such a scenario. Our method involves a reduction to set differencing, and we describe a privacy-enhanced protocol for set differencing that achieves absolute privacy for Bob (in the information theoretic sense), and a quantifiable degree of privacy protection for Alice. One of the important features of our protocols, which makes them ideally suited to privacy-enhanced DNA sequence comparison problems, is that the communication complexity of our solutions is proportional to a threshold that bounds the cardinality of the set differences that are of interest, rather than the cardinality of the sets involved (which correlates to the length of the DNA sequences). Moreover, in our protocols, the querier, Bob, can easily compute the set difference only if its cardinality is close to or below a specified threshold.

Pierre Baldi

What is connected

Connect this record

See the researcher in context

Building this map preview

30 published item(s)

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

Anytime PSRO for Two-Player Zero-Sum Games

Deep Learning From Four Vectors

Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks

Resolving Extreme Jet Substructure

Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

The Quarks of Attention

XDO: A Double Oracle Algorithm for Extensive-Form Games

A theory of capacity and sparse neural encoding

Detecting Pulmonary Coccidioidomycosis (Valley fever) with Deep Convolutional Neural Networks

Enforcing Analytic Constraints in Neural-Networks Emulating Physical Systems

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

SARM: Sparse Autoregressive Model for Scalable Generation of Sparse Images in Particle Physics

A Fortran-Keras Deep Learning Bridge for Scientific Computing

Continuous Representation of Molecules Using Graph Variational Autoencoder

Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Sherpa: Robust Hyperparameter Optimization for Machine Learning

SPLASH: Learnable Activation Functions for Improving Accuracy and Adversarial Robustness

Gaussian Process Accelerated Feldman-Cousins Approach for Physical Parameter Inference

A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation

Jet Flavor Classification in High-Energy Physics with Deep Neural Networks

Jet Substructure Classification in High-Energy Physics with Deep Neural Networks

Parameterized Machine Learning for High-Energy Physics

Learning Activation Functions to Improve Deep Neural Networks

Complex-Valued Autoencoders

Enhanced Higgs to $τ^+τ^-$ Searches with Deep Learning

Searching for Exotic Particles in High-Energy Physics with Deep Learning

Countering Gattaca: Efficient and Secure Testing of Fully-Sequenced Human Genomes (Full Version)

Privacy-Enhanced Methods for Comparing Compressed DNA Sequences