Researcher profile

Antonio Rodríguez-Sánchez

Antonio Rodríguez-Sánchez contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 21 (Emerging) · Verification: L1 · Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published items

Preprint · 2022 · arXiv

Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization

Training deep neural networks is a demanding task; a particular challenge is adapting architectures to improve the performance of trained models. Shallow networks sometimes generalize better than deep networks, and adding more layers can result in higher training and test errors. The deep residual learning framework addresses this degradation problem by adding skip connections to several neural network layers. At first it seems counter-intuitive that such skip connections are needed to train deep networks successfully, since the expressivity of a network can grow exponentially with depth. In this paper, we first analyze the flow of information through neural networks. We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network. We show, both empirically and theoretically, that a positive batch-entropy is required for gradient descent-based training approaches to optimize a given loss function successfully. Based on these insights, we introduce batch-entropy regularization to enable gradient descent-based training algorithms to optimize the flow of information through each hidden layer individually. With batch-entropy regularization, gradient descent optimizers can transform untrainable networks into trainable networks. We show empirically that, as a result, we can train "vanilla" fully connected and convolutional neural networks with 500 layers, using no skip connections, batch normalization, dropout, or any other architectural tweak, simply by adding the batch-entropy regularization term to the loss function. The effect of batch-entropy regularization is evaluated not only on vanilla neural networks but also on residual networks, autoencoders, and transformer models over a wide range of computer vision and natural language processing tasks.
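
The abstract does not spell out how batch-entropy is estimated. As a minimal sketch, assuming each unit's activations across a mini-batch are approximately Gaussian (so each unit's differential entropy is $\tfrac{1}{2}\log(2\pi e\sigma^2)$), a per-layer value can be computed as below. The function name `batch_entropy` and the `eps` floor are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np

def batch_entropy(activations: np.ndarray, eps: float = 1e-8) -> float:
    """Gaussian-approximation batch-entropy of one layer (hypothetical helper).

    activations: array of shape (batch_size, ...); trailing dims are flattened
    into units. Each unit's values across the batch are treated as samples of
    a Gaussian with differential entropy 0.5 * log(2 * pi * e * variance).
    Returns the mean entropy over all units of the layer.
    """
    a = activations.reshape(activations.shape[0], -1)  # (batch, units)
    var = a.var(axis=0) + eps  # per-unit variance across the batch; eps avoids log(0)
    per_unit = 0.5 * np.log(2.0 * np.pi * np.e * var)
    return float(per_unit.mean())

rng = np.random.default_rng(0)
informative = rng.normal(size=(128, 64))  # units vary from sample to sample
collapsed = np.ones((128, 64))            # every sample yields the same output

print(batch_entropy(informative))  # close to 0.5*log(2*pi*e), i.e. about 1.4
print(batch_entropy(collapsed))    # about -7.8: only the eps floor remains
```

A layer that maps every sample in the batch to the same output has near-zero variance and therefore very low batch-entropy, which matches the abstract's intuition that such a layer blocks the flow of information.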

Preprint · 2022 · arXiv

Momentum Capsule Networks

Capsule networks are a class of neural networks that have achieved promising results on many computer vision tasks. However, baseline capsule networks have failed to reach state-of-the-art results on more complex datasets due to their high computation and memory requirements. We tackle this problem by proposing a new network architecture, called Momentum Capsule Network (MoCapsNet). MoCapsNets are inspired by Momentum ResNets, a type of network that applies reversible residual building blocks. Reversible networks allow the activations of the forward pass to be recalculated during backpropagation, so memory requirements can be drastically reduced. In this paper, we provide a framework for applying invertible residual building blocks to capsule networks. We show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN, CIFAR-10 and CIFAR-100 while using considerably less memory. The source code is available at https://github.com/moejoe95/MoCapsNet.
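
Reversibility is what lets activations be recomputed instead of stored. The sketch below uses the Momentum ResNet update ($v_{n+1} = \gamma v_n + (1-\gamma) f(x_n)$, $x_{n+1} = x_n + v_{n+1}$) and checks that it can be inverted exactly. The residual branch `f` is a hypothetical toy stand-in for a capsule layer and `GAMMA = 0.9` is an assumed value; whether MoCapsNet uses this exact form is not stated in the abstract.

```python
import numpy as np

GAMMA = 0.9  # momentum coefficient; an assumed value for illustration

def f(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy residual branch standing in for a capsule layer (hypothetical)."""
    return np.tanh(x @ w)

def momentum_forward(x, v, w, gamma=GAMMA):
    """One momentum residual step: v' = gamma*v + (1-gamma)*f(x), x' = x + v'."""
    v_next = gamma * v + (1.0 - gamma) * f(x, w)
    x_next = x + v_next
    return x_next, v_next

def momentum_inverse(x_next, v_next, w, gamma=GAMMA):
    """Recover (x, v) from (x', v') without having stored them."""
    x = x_next - v_next
    v = (v_next - (1.0 - gamma) * f(x, w)) / gamma
    return x, v

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(8, 8)) for _ in range(4)]
x0 = rng.normal(size=(2, 8))
v0 = np.zeros_like(x0)

# Forward through a stack of blocks, keeping only the final state.
x, v = x0, v0
for w in weights:
    x, v = momentum_forward(x, v, w)

# Walk the stack backwards, reconstructing every intermediate state.
for w in reversed(weights):
    x, v = momentum_inverse(x, v, w)

print(np.allclose(x, x0), np.allclose(v, v0))  # True True
```

Because the inverse needs only a block's outputs and its weights, a training loop can discard intermediate activations during the forward pass and rebuild them layer by layer during backpropagation, trading extra computation for memory.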

Preprint · 2022 · arXiv

On the sensitivity of the D parameter to new physics

Measurements of angular correlations in nuclear beta decay are important tests of the Standard Model (SM). Among these, the so-called D correlation parameter occupies a special place because it is odd under time reversal, and because the experimental sensitivity is at the $10^{-4}$ level, with plans for further improvement in the near future. Using effective field theory (EFT) techniques, we reassess its potential to discover or constrain new physics beyond the SM. We provide a comprehensive classification of CP-violating EFT scenarios that generate a shift of the D parameter away from the SM prediction. We show that, in each scenario, a shift larger than $10^{-5}$ is in serious tension with existing experimental data, where bounds from electric dipole moments and LHC observables play a decisive role. The tension can only be avoided by fine-tuning the parameters in the UV completion of the EFT. We illustrate this using examples of leptoquark UV completions. Finally, we comment on the possibility of probing CP-conserving new physics via the D parameter.
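
For orientation, the D parameter multiplies a triple-product correlation in the decay distribution of polarized nuclei. Schematically, keeping only the leading Jackson-Treiman-Wyld terms (with $\vec J$ the nuclear spin, $\vec p_e$ and $\vec p_\nu$ the lepton momenta, and $E_e$, $E_\nu$ their energies), the structure and its behaviour under time reversal are:

```latex
\frac{d\Gamma}{dE_e\,d\Omega_e\,d\Omega_\nu} \propto 1
  + a\,\frac{\vec p_e\cdot\vec p_\nu}{E_e E_\nu}
  + b\,\frac{m_e}{E_e}
  + \frac{\langle\vec J\rangle}{J}\cdot\left[
      A\,\frac{\vec p_e}{E_e}
    + B\,\frac{\vec p_\nu}{E_\nu}
    + D\,\frac{\vec p_e\times\vec p_\nu}{E_e E_\nu}\right]

% Under time reversal T, spins and momenta all flip sign:
\vec J \to -\vec J,\quad \vec p_e \to -\vec p_e,\quad \vec p_\nu \to -\vec p_\nu
\;\Longrightarrow\;
\vec J\cdot(\vec p_e\times\vec p_\nu) \to (-1)^3\,\vec J\cdot(\vec p_e\times\vec p_\nu)
  = -\,\vec J\cdot(\vec p_e\times\vec p_\nu)
```

The D term is therefore the only one in this list that changes sign under T, which is why a measured D beyond the small SM final-state contribution would signal time-reversal (and hence CP) violation.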

Preprint · 2022 · arXiv

Semileptonic tau decays beyond the Standard Model

Hadronic $\tau$ decays are studied as a probe of new physics. We determine the dependence of several inclusive and exclusive $\tau$ observables on the Wilson coefficients of the low-energy effective theory describing charged-current interactions between light quarks and leptons. The analysis includes both strange and non-strange decay channels. The main result is the likelihood function for the Wilson coefficients in the $\tau$ sector, based on up-to-date experimental measurements and state-of-the-art theoretical techniques. This likelihood can readily be combined with inputs from other low-energy precision observables. We discuss a combination with nuclear beta, baryon, pion, and kaon decay data. In particular, we provide a comprehensive and model-independent description of the new-physics hints in the combined dataset, which are known as the Cabibbo anomaly.
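
The abstract notes that the $\tau$ likelihood can be combined with other low-energy inputs. As a minimal sketch, assuming each dataset's likelihood is approximated by an independent Gaussian in the same Wilson coefficients, the combination is again Gaussian: precision matrices (inverse covariances) add, and the combined mean is the precision-weighted average. The function name and all numbers below are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def combine_gaussian_likelihoods(means, covs):
    """Combine independent Gaussian likelihoods for the same parameter vector.

    The product of Gaussians is Gaussian: its precision (inverse covariance)
    is the sum of the input precisions, and its mean is the precision-weighted
    average of the input means.
    """
    precisions = [np.linalg.inv(c) for c in covs]
    total_precision = sum(precisions)
    cov = np.linalg.inv(total_precision)
    mean = cov @ sum(p @ m for p, m in zip(precisions, means))
    return mean, cov

# Placeholder inputs for two Wilson coefficients (illustrative only, not data).
tau_mean, tau_cov = np.array([0.5, -0.2]), np.diag([1.0, 4.0])
beta_mean, beta_cov = np.array([-0.5, 0.6]), np.diag([1.0, 1.0])

mean, cov = combine_gaussian_likelihoods([tau_mean, beta_mean], [tau_cov, beta_cov])
print(mean)  # approximately [0.0, 0.44]
print(cov)   # approximately diag(0.5, 0.8)
```

The real likelihoods need not be Gaussian; in general one adds log-likelihoods directly, and the formula above is what that sum reduces to in the Gaussian case.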

Preprint · 2022 · arXiv

Short-distance constraints on the hadronic light-by-light

The muon anomalous magnetic moment continues to attract interest due to the potential tension between the experimental measurement [1,2] and the Standard Model prediction [3]. The hadronic light-by-light contribution to the magnetic moment is one of the two diagrammatic topologies that currently dominate the theoretical uncertainty. With the aim of improving the precision of the hadronic light-by-light contribution in a data-driven approach founded on dispersion theory [4,5], we derive various short-distance constraints on the underlying correlation function of four electromagnetic currents. Here, we present our previous progress in the purely short-distance regime and our current efforts in the so-called Melnikov-Vainshtein limit.

Preprint · 2022 · arXiv

Violations of Quark-Hadron Duality in Low-Energy Determinations of $\alpha_s$

Using the spectral functions measured in $\tau$ decays, we investigate the actual numerical impact of duality violations on the extraction of the strong coupling. These effects are tiny in the standard $\alpha_s(m_\tau^2)$ determinations from integrated distributions of the hadronic spectrum with pinched weights, or from the total $\tau$ hadronic width. The pinched-weight factors very efficiently suppress duality violations, making their numerical effects negligible compared with the larger perturbative uncertainties. However, combined fits of $\alpha_s$ and duality-violation parameters, performed with non-protected weights, are subject to large systematic errors associated with the assumed modelling of duality-violation effects. These uncertainties have not been taken into account in the published analyses, which are based on specific models of quark-hadron duality.
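
Why pinching helps can be seen in a toy model. Assume, for illustration only, a damped-oscillation duality-violating component of the spectral function, $\rho_{\rm DV}(s) = e^{-\delta-\gamma s}\sin(\alpha+\beta s)$, whose effect on a finite-energy sum rule at $s_0$ is, up to sign and normalization, the integral of $w(s/s_0)\,\rho_{\rm DV}(s)$ from $s_0$ upward. The sketch compares the unpinched weight $w(x)=1$ with the doubly pinched $w(x)=(1-x)^2$, which vanishes at $s=s_0$. The parameter values are arbitrary, and the sketch says nothing about the specific models the abstract criticizes.

```python
import numpy as np

# Toy duality-violation ansatz (assumed here, arbitrary illustrative values):
# rho_DV(s) = exp(-DELTA - GAMMA*s) * sin(alpha + BETA*s), with s in GeV^2.
DELTA, GAMMA, BETA = 0.0, 0.7, 4.0
S0 = 2.8  # sum-rule endpoint in GeV^2 (illustrative)

def trapezoid(y, x):
    """Plain trapezoidal rule, avoiding NumPy-version-specific helpers."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def dv_shift(weight, alpha, s0=S0, s_max=60.0, n=100_001):
    """Integral of weight(s/s0) * rho_DV(s) over [s0, s_max]. Sign and
    normalization conventions of the sum rule are dropped; they do not
    affect the comparison between weights. s_max truncates a negligible tail."""
    s = np.linspace(s0, s_max, n)
    rho = np.exp(-DELTA - GAMMA * s) * np.sin(alpha + BETA * s)
    return trapezoid(weight(s / s0) * rho, s)

def unpinched(x):
    return np.ones_like(x)   # w(x) = 1

def pinched(x):
    return (1.0 - x) ** 2    # w(x) = (1 - x)^2, double zero at s = s0

# The oscillation phase alpha is unknown, so scan it and keep the worst case.
alphas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
worst_unpinched = max(abs(dv_shift(unpinched, a)) for a in alphas)
worst_pinched = max(abs(dv_shift(pinched, a)) for a in alphas)
print(worst_unpinched, worst_pinched)
```

With these toy values the pinched weight reduces the worst-case shift by well over an order of magnitude, because the weight is smallest exactly where the damped oscillation is largest.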

Preprint · 2021 · arXiv

Arguments for the Unsuitability of Convolutional Neural Networks for Non-Local Tasks

Convolutional neural networks have established themselves over the past years as the state-of-the-art method for image classification, and on many datasets they even surpass humans at categorizing images. Unfortunately, the same architectures perform much worse when they have to compare parts of an image with each other to classify it correctly. Until now, no well-founded theoretical argument has been presented to explain this deficiency. In this paper, we argue that convolutional layers are of little use for such problems, since comparison tasks are global by nature, whereas convolutional layers are local by design. We use this insight to reformulate a comparison task as a sorting task and use findings on sorting networks to propose a lower bound on the number of parameters a neural network needs to solve comparison tasks in a generalizable way. We use this lower bound to argue that attention, as well as iterative/recurrent processing, is needed to prevent a combinatorial explosion.
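
The locality point can be made concrete with the standard receptive-field formula. For stacked stride-1, undilated convolutions with kernel size $k$, each layer widens the receptive field by $k-1$ pixels along an axis, so after $L$ layers it spans $1 + L(k-1)$ pixels, and a single unit can only relate two pixels $d$ apart once its receptive field covers both. The sketch below is a textbook calculation for illustration, not the parameter lower bound derived in the paper; the helper names are hypothetical.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive-field width (one axis) of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_compare(distance: int, kernel_size: int = 3) -> int:
    """Smallest number of such layers whose receptive field spans two pixels
    `distance` apart, i.e. has width at least distance + 1."""
    # ceil(distance / (kernel_size - 1)) using integer arithmetic only
    return -(-distance // (kernel_size - 1))

print(receptive_field(5))      # 11
print(layers_to_compare(100))  # 50
```

With 3x3 kernels and no pooling or striding, relating two pixels 100 apart therefore takes 50 layers, which illustrates why purely local operations scale poorly on global comparison tasks.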