Researcher profile

Manuel Szewc

Manuel Szewc contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
6 works
0 followers
9 topics
4 close collaborators

Published work

6 published items

Preprint · 2022 · arXiv

Bayesian Probabilistic Modelling for Four-Tops at the LHC

Monte Carlo (MC) generators are crucial for analyzing data in particle collider experiments. However, even a small mismatch between the MC simulations and the measurements can often undermine the interpretation of the results. This is particularly important in the context of LHC searches for rare physics processes within and beyond the standard model (SM). One of the rarest SM processes currently being explored at the LHC, $pp\to t\bar t t\bar t$, with its large multi-dimensional phase space, is an ideal testing ground for exploring new ways to reduce the impact of potential MC mismodelling on experimental results. We propose a novel statistical method capable of disentangling the 4-top signal from the dominant backgrounds in the same-sign dilepton channel, while simultaneously correcting for possible MC imperfections in the modelling of the most relevant discriminating observables, the jet multiplicity distributions. A Bayesian mixture of multinomials is used to model the light-jet and $b$-jet multiplicities under the assumption of their conditional independence. The signal and background distributions generated from a deliberately mistuned MC simulator are used as model priors. The posterior distributions, as well as the signal and background fractions, are then learned from the data using Bayesian inference. We demonstrate that our method can mitigate the effects of large MC mismodellings in the context of a realistic $t\bar t t\bar t$ search, leading to corrected posterior distributions that better approximate the underlying truth-level spectra.
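A minimal sketch of this kind of inference, assuming a two-component mixture (signal, background), Dirichlet pseudo-counts standing in for the mistuned MC histograms, and a plain Gibbs sampler. The paper's actual priors, binning and inference scheme may differ; the function names and the toy events below are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """One Dirichlet(alpha) draw via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_fractions(events, prior_j, prior_b, prior_pi, n_iter=200, seed=0):
    """Gibbs sampler for a mixture whose components model (n_light, n_b)
    as conditionally independent categoricals with Dirichlet priors.

    events: list of (n_light, n_b) pairs, used directly as bin indices.
    prior_j[k], prior_b[k]: Dirichlet pseudo-counts for component k
    (hypothetical stand-ins for mistuned MC histograms).
    prior_pi: pseudo-counts for the mixture fractions.
    Returns the posterior-mean mixture fractions.
    """
    rng = random.Random(seed)
    K = len(prior_pi)
    pi = sample_dirichlet(prior_pi, rng)
    th_j = [sample_dirichlet(a, rng) for a in prior_j]
    th_b = [sample_dirichlet(a, rng) for a in prior_b]
    burn_in = n_iter // 2
    pi_sum = [0.0] * K
    for it in range(n_iter):
        # Assign each event to a component given the current parameters.
        c_pi = [0] * K
        c_j = [[0] * len(a) for a in prior_j]
        c_b = [[0] * len(a) for a in prior_b]
        for nj, nb in events:
            w = [pi[k] * th_j[k][nj] * th_b[k][nb] for k in range(K)]
            z = rng.choices(range(K), weights=w)[0]
            c_pi[z] += 1
            c_j[z][nj] += 1
            c_b[z][nb] += 1
        # Conjugate updates: Dirichlet prior + counts -> Dirichlet posterior.
        pi = sample_dirichlet([a + c for a, c in zip(prior_pi, c_pi)], rng)
        th_j = [sample_dirichlet([a + c for a, c in zip(prior_j[k], c_j[k])], rng)
                for k in range(K)]
        th_b = [sample_dirichlet([a + c for a, c in zip(prior_b[k], c_b[k])], rng)
                for k in range(K)]
        if it >= burn_in:
            pi_sum = [s + p for s, p in zip(pi_sum, pi)]
    return [s / (n_iter - burn_in) for s in pi_sum]

# Hypothetical toy sample: 100 "signal-like" and 300 "background-like" events.
toy = random.Random(1)
events = ([(toy.randint(4, 7), toy.randint(2, 3)) for _ in range(100)]
          + [(toy.randint(0, 3), toy.randint(0, 1)) for _ in range(300)])
# Deliberately rough priors: component 0 favours high multiplicities.
prior_j = [[1, 1, 1, 1, 5, 5, 5, 5], [5, 5, 5, 5, 1, 1, 1, 1]]
prior_b = [[1, 1, 5, 5], [5, 5, 1, 1]]
fractions = gibbs_fractions(events, prior_j, prior_b, prior_pi=[1, 1])
print(fractions)  # posterior-mean (signal, background) fractions
```

Because the Dirichlet prior is conjugate to the categorical likelihood, each update simply adds the current counts to the prior pseudo-counts. The same loop could also average `th_j` and `th_b` after burn-in to obtain data-corrected multiplicity spectra for each component.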

Preprint · 2022 · arXiv

Unsupervised quark/gluon jet tagging with Poissonian Mixture Models

The classification of jets induced by quarks or gluons is important for New Physics searches at high-energy colliders. However, available taggers usually rely on modelling the data through Monte Carlo simulations, which can conceal intractable theoretical and systematic uncertainties. To significantly reduce biases, we propose an unsupervised learning algorithm that, given a sample of jets, can learn the SoftDrop Poissonian rates for quark- and gluon-initiated jets and their fractions. We extract the Maximum Likelihood Estimates for the mixture parameters and the posterior probability over them. We then construct a quark-gluon tagger and estimate its accuracy in actual data to be in the $0.65-0.7$ range, below supervised algorithms but nevertheless competitive. We also show that relevant unsupervised metrics perform well, allowing for unsupervised hyperparameter selection. Further, we find that this result is not affected by the angular smearing introduced to simulate detector effects for central jets. The presented unsupervised learning algorithm is simple; its result is interpretable and depends on very few assumptions.
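As an illustration of the unsupervised maximum-likelihood step, the following sketch runs expectation-maximization (EM) on a single count observable modelled as a two-component Poisson mixture. It is a toy under stated assumptions: one observable, synthetic counts with made-up rates (6 and 15) and fractions, and a maximum-likelihood fit only, without the posterior sampling or detector smearing mentioned in the abstract. All function names are hypothetical.

```python
import math
import random

def poisson_logpmf(n, lam):
    """log P(n | lam) for a Poisson distribution."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def em_poisson_mixture(counts, rates=(5.0, 20.0), frac=0.5, n_iter=200):
    """Maximum-likelihood fit of a two-component Poisson mixture by EM.
    Returns (fraction of component 0, rate 0, rate 1)."""
    lam0, lam1 = rates
    for _ in range(n_iter):
        # E-step: posterior probability that each jet belongs to component 0,
        # computed in log space for numerical stability.
        resp = []
        for n in counts:
            a = math.log(frac) + poisson_logpmf(n, lam0)
            b = math.log(1.0 - frac) + poisson_logpmf(n, lam1)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: closed-form updates of the fraction and the two rates.
        r_sum = sum(resp)
        frac = r_sum / len(counts)
        lam0 = sum(r * n for r, n in zip(resp, counts)) / r_sum
        lam1 = sum((1 - r) * n for r, n in zip(resp, counts)) / (len(counts) - r_sum)
    return frac, lam0, lam1

rng = random.Random(7)
# Hypothetical toy "jets": 40% with rate 6 (quark-like), 60% with rate 15 (gluon-like).
counts = ([sample_poisson(6.0, rng) for _ in range(800)]
          + [sample_poisson(15.0, rng) for _ in range(1200)])
frac, lam_q, lam_g = em_poisson_mixture(counts)
print(f"fraction={frac:.3f}, rates=({lam_q:.2f}, {lam_g:.2f})")
```

A tagger then follows by labelling a jet as component 0 when its E-step responsibility exceeds one half; starting the rates apart (5 and 20) keeps component 0 attached to the lower-rate population.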

Preprint · 2020 · arXiv

A Machine Learning alternative to placebo-controlled clinical trials upon new diseases: A primer

The appearance of a new dangerous and contagious disease requires the development of a drug therapy faster than usual mechanisms allow. Many drug therapy developments consist of investigating, through different clinical trials, the effects of specific drug combinations by administering them to a test group of ill patients, while a placebo treatment is given to the remaining ill patients, known as the control group. We compare this technique to a new one in which every patient receives a different, reasonable combination of drugs, and the outcomes are used to train a Neural Network. By averaging out fluctuations and recognizing different patient features, the Neural Network learns the pattern that connects the patients' initial state to the outcome of the treatments, and can therefore predict the best drug therapy better than the placebo-controlled method. In contrast to many available works, we do not study any details of drug composition or interaction, but instead pose and solve the problem from a phenomenological point of view, which allows us to compare both methods. Although the conclusion is reached through mathematical modeling and is stable under any reasonable model, this is a proof of concept that should be studied by experts in other fields before being applied to a real scenario. All calculations, tools and scripts have been made open source for the community to test, modify or extend. Finally, although the results presented here are in the context of a new disease in medical sciences, they are useful for any field that requires an experimental technique with a control group.

Preprint · 2020 · arXiv

Containing COVID-19 outbreaks using a Firewall

COVID-19 outbreaks have proven to be very difficult to isolate and extinguish before they spread. An important reason for this might be that epidemiological barriers based on stopping symptomatic people are likely to fail because of the contagious period before symptom onset, mild cases and/or asymptomatic carriers. Motivated by these special COVID-19 features, we study a scheme for containing an outbreak in a city that consists of adding an extra firewall block between the outbreak and the rest of the city. We implement a coupled compartment model with stochastic noise to simulate a localized outbreak that is partially isolated, and analyze its evolution with and without the firewall for different plausible model parameters. We explore how further improvements could be achieved if the evolution of the epidemic triggered policy changes to the flux between blocks and/or the lockdown in the different blocks. Our results show that a substantial improvement is obtained by merely adding an extra block between the outbreak and the bulk of the city.
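The firewall mechanism can be illustrated with a deterministic SIR model on coupled blocks. This sketch is not the study's model: it drops the stochastic noise and policy triggers, and assumes equal block populations, a made-up cross-block contact fraction `e`, illustrative rates (`beta = 0.3`, `gamma = 0.1` per day) and a 60-day horizon. The function name `cumulative_infected` is hypothetical.

```python
def cumulative_infected(C, beta=0.3, gamma=0.1, i0=1e-3, days=60, dt=0.1):
    """Deterministic SIR model on coupled blocks of equal population.

    C[i][j] is the (assumed) fraction of block i's contacts made with block j.
    The outbreak is seeded in block 0. Returns, per block, the fraction of
    residents ever infected (I + R) after `days`, integrated by forward Euler.
    """
    n = len(C)
    S, I, R = [1.0] * n, [0.0] * n, [0.0] * n
    S[0], I[0] = 1.0 - i0, i0
    for _ in range(round(days / dt)):
        # Force of infection on block i from infectious people in every block j.
        force = [beta * sum(C[i][j] * I[j] for j in range(n)) for i in range(n)]
        new_inf = [force[i] * S[i] * dt for i in range(n)]
        new_rec = [gamma * I[i] * dt for i in range(n)]
        for i in range(n):
            S[i] -= new_inf[i]
            I[i] += new_inf[i] - new_rec[i]
            R[i] += new_rec[i]
    return [I[i] + R[i] for i in range(n)]

e = 0.01  # assumed cross-block contact fraction
# Blocks: outbreak | city
no_firewall = [[1 - e, e], [e, 1 - e]]
# Blocks: outbreak | firewall | city (outbreak and city only touch the firewall)
firewall = [[1 - e, e, 0.0], [e, 1 - 2 * e, e], [0.0, e, 1 - e]]
city_without = cumulative_infected(no_firewall)[-1]
city_with = cumulative_infected(firewall)[-1]
print(f"city ever infected by day 60: {city_without:.3f} without, "
      f"{city_with:.3f} with firewall")
```

With the firewall, the city is reached only through the intermediate block, so its epidemic is seeded later and fewer residents have been infected by the end of the horizon. In this deterministic toy, with the same reproduction number everywhere, the city is eventually reached anyway; the gain shown here is a delay, which is where the policy triggers studied in the paper would come in.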

Preprint · 2020 · arXiv

Intelligent Arxiv: Sort daily papers by learning users topics preference

Daily paper releases are becoming increasingly large, and research areas are growing in diversity. This makes it harder for scientists to keep up to date with the current state of the art and to identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper as a combination of scientific knowledge from diverse topics applied to a new problem. In light of this, we apply the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) to the corpus of papers in a given field to: i) define and extract the underlying topics in the corpus; ii) obtain the topic-weight vector for each paper in the corpus; and iii) obtain the topic-weight vector for new papers. By registering the papers a user prefers, we build a user weight vector from the vectors of the selected papers. Then, by taking the inner product between the user vector and each paper in the daily Arxiv release, we can sort the papers according to the user's preference for the underlying topics. We have created the website IArxiv.org, where users can read sorted daily Arxiv releases (and more) while the algorithm learns each user's preference, yielding a more accurate sorting every day. The current IArxiv.org version runs on the Arxiv categories astro-ph, gr-qc, hep-ph and hep-th, and we plan to extend it to others. We propose several useful new features to be developed, as well as Machine Learning techniques beyond LDA, to further improve the accuracy of this tool.
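Once every paper has a topic-weight vector, the sorting step reduces to an inner product. The sketch below assumes those vectors are already available (for example from an LDA model trained elsewhere) and aggregates the user's preferred papers by a plain average; the aggregation actually used by IArxiv.org may differ, and the vectors, paper ids and function names are invented for illustration.

```python
def user_vector(preferred):
    """Average the topic-weight vectors of the papers a user marked as preferred
    (one simple aggregation choice, assumed here)."""
    n_topics = len(preferred[0])
    return [sum(v[t] for v in preferred) / len(preferred) for t in range(n_topics)]

def rank_daily(user, daily):
    """Sort paper ids by the inner product between the user and paper vectors."""
    score = {pid: sum(u * w for u, w in zip(user, vec)) for pid, vec in daily.items()}
    return sorted(score, key=score.get, reverse=True)

# Hypothetical 3-topic weight vectors, e.g. as produced by a trained LDA model.
liked = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
today = {"A": [0.7, 0.2, 0.1], "B": [0.1, 0.8, 0.1], "C": [0.2, 0.2, 0.6]}
order = rank_daily(user_vector(liked), today)
print(order)  # ['B', 'A', 'C']
```

Here the user vector is (0.15, 0.75, 0.1), so paper B, dominated by the second topic, scores 0.625 and is listed first, ahead of A (0.265) and C (0.24). Registering more preferred papers shifts the average and hence the daily ordering.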

Preprint · 2020 · arXiv

Topic Model for four-top at the LHC

We study the implementation of a Topic Model algorithm in four-top searches at the LHC as a test probe of a non-ideal system for applying this technique. We study how the Topic Model behaves as its hypotheses, such as mutual irreducibility and equal distribution in all samples, depart from being true. The four-top final state at the LHC is relevant not only because it does not fulfill these conditions, but also because it is difficult and inefficient to reconstruct, and current Monte Carlo modeling of signal and backgrounds suffers from non-negligible uncertainties. We implement this Topic Model algorithm in the Same-Sign lepton channel, where S/B is of order one and no background can have more than two b-jets at parton level. We define different mixtures according to the number of b-jets and use the total number of jets to demix. Since only the background has an anchor bin, we find that we can reconstruct the background in the signal region independently of Monte Carlo. We propose to use this information to tune the Monte Carlo in the signal region and then compare the signal prediction with data. We also explore Machine Learning techniques applied to this Topic Model algorithm and find slight improvements as well as potential avenues to investigate. Although our findings indicate that the implementation would remain challenging even with the full LHC Run 3 data, this work seeks ways to reduce the impact of Monte Carlo simulations in four-top searches at the LHC.
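The demixing step can be sketched with the reducibility-factor construction commonly used for two-sample topic models; this is an assumption about the general technique, not a reproduction of the paper's estimator, binning or Machine Learning refinements. The four-bin shapes and mixture fractions are invented so that mutual irreducibility holds; when a minimum is not reached in a bin populated by a single component, the corresponding topic stays contaminated, which is the non-ideal regime studied above. Function names are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to one."""
    total = sum(hist)
    return [h / total for h in hist]

def demix(m1, m2):
    """Two-sample topic demixing via reducibility factors.

    kappa12 = min_x p1(x)/p2(x) and kappa21 = min_x p2(x)/p1(x); each topic is
    one mixture with as much of the other subtracted as positivity allows.
    """
    p1, p2 = normalize(m1), normalize(m2)
    k12 = min(a / b for a, b in zip(p1, p2) if b > 0)
    k21 = min(b / a for a, b in zip(p1, p2) if a > 0)
    t1 = [(a - k12 * b) / (1 - k12) for a, b in zip(p1, p2)]
    t2 = [(b - k21 * a) / (1 - k21) for a, b in zip(p1, p2)]
    return t1, t2

# Hypothetical jet-multiplicity shapes (4 bins). The signal vanishes in the
# lowest bin and the background in the highest, so both topics are recoverable.
S = [0.0, 0.1, 0.3, 0.6]
B = [0.5, 0.3, 0.2, 0.0]
m1 = [0.6 * s + 0.4 * b for s, b in zip(S, B)]  # signal-richer mixture
m2 = [0.2 * s + 0.8 * b for s, b in zip(S, B)]  # background-richer mixture
t1, t2 = demix(m1, m2)
print([round(x, 6) for x in t1], [round(x, 6) for x in t2])
```

In this toy, the minimum of p1/p2 sits in the background-only bin and equals (1 - 0.6)/(1 - 0.2) = 0.5, which removes the background from the signal-richer mixture exactly; symmetrically, the minimum of p2/p1 sits in the signal-only bin and equals 0.2/0.6.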