Source author record

Antonio Rodríguez-Sánchez

Antonio Rodríguez-Sánchez appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

hep-ph, Computer Vision, Machine Learning, Artificial Intelligence, hep-ex

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators


Preprint, 2022, arXiv

Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization

Training deep neural networks is a very demanding task; especially challenging is how to adapt architectures to improve the performance of trained models. Sometimes shallow networks generalize better than deep networks, and the addition of more layers results in higher training and test errors. The deep residual learning framework addresses this degradation problem by adding skip connections to several neural network layers. It would at first seem counter-intuitive that such skip connections are needed to train deep networks successfully, as the expressivity of a network would grow exponentially with depth. In this paper, we first analyze the flow of information through neural networks. We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network. We prove empirically and theoretically that a positive batch-entropy is required for gradient descent-based training approaches to optimize a given loss function successfully. Based on those insights, we introduce batch-entropy regularization to enable gradient descent-based training algorithms to optimize the flow of information through each hidden layer individually. With batch-entropy regularization, gradient descent optimizers can transform untrainable networks into trainable networks. We show empirically that we can therefore train a "vanilla" fully connected network and convolutional neural network (no skip connections, batch normalization, dropout, or any other architectural tweak) with 500 layers by simply adding the batch-entropy regularization term to the loss function. The effect of batch-entropy regularization is evaluated not only on vanilla neural networks, but also on residual networks, autoencoders, and transformer models over a wide range of computer vision as well as natural language processing tasks.

Preprint, 2022, arXiv

Momentum Capsule Networks

Capsule networks are a class of neural networks that achieved promising results on many computer vision tasks. However, baseline capsule networks have failed to reach state-of-the-art results on more complex datasets due to the high computation and memory requirements. We tackle this problem by proposing a new network architecture, called Momentum Capsule Network (MoCapsNet). MoCapsNets are inspired by Momentum ResNets, a type of network that applies reversible residual building blocks. Reversible networks allow for recalculating activations of the forward pass in the backpropagation algorithm, so those memory requirements can be drastically reduced. In this paper, we provide a framework on how invertible residual building blocks can be applied to capsule networks. We will show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN, CIFAR-10 and CIFAR-100 while using considerably less memory. The source code is available on https://github.com/moejoe95/MoCapsNet.

Preprint, 2022, arXiv

On the sensitivity of the D parameter to new physics

Measurements of angular correlations in nuclear beta decay are important tests of the Standard Model (SM). Among those, the so-called D correlation parameter occupies a particular place because it is odd under time reversal, and because the experimental sensitivity is at the $10^{-4}$ level, with plans of further improvement in the near future. Using effective field theory (EFT) techniques, we reassess its potential to discover or constrain new physics beyond the SM. We provide a comprehensive classification of CP-violating EFT scenarios which generate a shift of the D parameter away from the SM prediction. We show that, in each scenario, a shift larger than $10^{-5}$ is in serious tension with the existing experimental data, where bounds coming from electric dipole moments and LHC observables play a decisive role. The tension can only be avoided by fine tuning of the parameters in the UV completion of the EFT. We illustrate this using examples of leptoquark UV completions. Finally, we comment on the possibility to probe CP-conserving new physics via the D parameter.

Preprint, 2022, arXiv

Semileptonic tau decays beyond the Standard Model

Hadronic $τ$ decays are studied as a probe of new physics. We determine the dependence of several inclusive and exclusive $τ$ observables on the Wilson coefficients of the low-energy effective theory describing charged-current interactions between light quarks and leptons. The analysis includes both strange and non-strange decay channels. The main result is the likelihood function for the Wilson coefficients in the tau sector, based on the up-to-date experimental measurements and state-of-the-art theoretical techniques. The likelihood can be readily combined with inputs from other low-energy precision observables. We discuss a combination with nuclear beta, baryon, pion, and kaon decay data. In particular, we provide a comprehensive and model-independent description of the new physics hints in the combined dataset, which are known under the name of the Cabibbo anomaly.

Preprint, 2022, arXiv

Short-distance constraints on the hadronic light-by-light

The muon anomalous magnetic moment continues to attract interest due to the potential tension between experimental measurement [1,2] and the Standard Model prediction [3]. The hadronic light-by-light contribution to the magnetic moment is one of the two diagrammatic topologies currently saturating the theoretical uncertainty. With the aim of improving precision on the hadronic light-by-light in a data-driven approach founded on dispersion theory [4,5], we derive various short-distance constraints of the underlying correlation function of four electromagnetic currents. Here, we present our previous progress in the purely short-distance regime and current efforts in the so-called Melnikov-Vainshtein limit.

Preprint, 2022, arXiv

Violations of Quark-Hadron Duality in Low-Energy Determinations of $α_s$

Using the spectral functions measured in $τ$ decays, we investigate the actual numerical impact of duality violations on the extraction of the strong coupling. These effects are tiny in the standard $α_s(m_τ^2)$ determinations from integrated distributions of the hadronic spectrum with pinched weights, or from the total $τ$ hadronic width. The pinched-weight factors suppress very efficiently the violations of duality, making their numerical effects negligible in comparison with the larger perturbative uncertainties. However, combined fits of $α_s$ and duality-violation parameters, performed with non-protected weights, are subject to large systematic errors associated with the assumed modelling of duality-violation effects. These uncertainties have not been taken into account in the published analyses, based on specific models of quark-hadron duality.

Preprint, 2021, arXiv

Arguments for the Unsuitability of Convolutional Neural Networks for Non-Local Tasks

Convolutional neural networks have established themselves over the past years as the state-of-the-art method for image classification, and for many datasets, they even surpass humans in categorizing images. Unfortunately, the same architectures perform much worse when they have to compare parts of an image to each other to correctly classify this image. Until now, no well-formed theoretical argument has been presented to explain this deficiency. In this paper, we will argue that convolutional layers are of little use for such problems, since comparison tasks are global by nature, but convolutional layers are local by design. We will use this insight to reformulate a comparison task into a sorting task and use findings on sorting networks to propose a lower bound for the number of parameters a neural network needs to solve comparison tasks in a generalizable way. We will use this lower bound to argue that attention, as well as iterative/recurrent processing, is needed to prevent a combinatorial explosion.

Preprint, 2016, arXiv

25 years of CNNs: Can we compare to human abstraction capabilities?

We try to determine the progress made by convolutional neural networks over the past 25 years in classifying images into abstract classes. For this purpose we compare the performance of LeNet to that of GoogLeNet at classifying randomly generated images which are differentiated by an abstract property (e.g., one class contains two objects of the same size, the other class two objects of different sizes). Our results show that there is still work to do in order to solve vision problems humans are able to solve without much difficulty.

Preprint, 2016, arXiv

Determination of the QCD Coupling from ALEPH $τ$ Decay Data

We present a comprehensive study of the determination of the strong coupling from $τ$ decay, using the most recent release of the experimental ALEPH data. We critically review all theoretical strategies used in previous works and put forward various novel approaches which allow us to study complementary aspects of the problem. We investigate the advantages and disadvantages of the different methods, trying to uncover their potential hidden weaknesses and test the stability of the obtained results under slight variations of the assumed inputs. We perform several determinations, using different methodologies, and find a very consistent set of results. All determinations are in excellent agreement, and allow us to extract a very reliable value for $α_s(m_τ^2)$. The main uncertainty originates in the pure perturbative error from unknown higher orders. Taking into account the systematic differences between the results obtained with the CIPT and FOPT prescriptions, we find $α_{s}^{(n_f=3)}(m_τ^2) = 0.328 \pm 0.013$, which implies $α_{s}^{(n_f=5)}(M_Z^{2}) = 0.1197\pm 0.0015$.

Preprint, 2013, arXiv

Proceedings of the 37th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM/AAPR), 2013

This volume represents the proceedings of the 37th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM/AAPR), held May 23-24, 2013, in Innsbruck, Austria.
