Source author record

Pedro Silva

Pedro Silva appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

astro-ph.EP astro-ph.GA astro-ph.IM astro-ph.SR Computer Vision hep-ex hep-ph Machine Learning math.GR Software Engineering

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators

preprint · 2025 · arXiv

Deep Learning for School Dropout Detection: A Comparison of Tabular and Graph-Based Models for Predicting At-Risk Students

Student dropout is a significant challenge in educational systems worldwide, leading to substantial social and economic costs. Predicting students at risk of dropout allows for timely interventions. While traditional Machine Learning (ML) models operating on tabular data have shown promise, Graph Neural Networks (GNNs) offer a potential advantage by capturing complex relationships inherent in student data if structured as graphs. This paper investigates whether transforming tabular student data into graph structures, primarily using clustering techniques, enhances dropout prediction accuracy. We compare the performance of GNNs (a custom Graph Convolutional Network (GCN) and GraphSAGE) on these generated graphs against established tabular models (Random Forest (RF), XGBoost, and TabNet) using a real-world student dataset. Our experiments explore various graph construction strategies based on different clustering algorithms (K-Means, HDBSCAN) and dimensionality reduction techniques (Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP)). Our findings demonstrate that a specific GNN configuration, GraphSAGE on a graph derived from PCA-KMeans clustering, achieved superior performance, notably improving the macro F1-score by approximately 7 percentage points and accuracy by nearly 2 percentage points over the strongest tabular baseline (XGBoost). However, other GNN configurations and graph construction methods did not consistently surpass tabular models, emphasizing the critical role of the graph generation strategy and GNN architecture selection. This highlights both the potential of GNNs and the challenges in optimally transforming tabular data for graph-based learning in this domain.
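The abstract reports a much larger gain in macro F1-score than in accuracy. The following minimal sketch shows how the two metrics can diverge on imbalanced labels. It is not taken from the paper: the function names and the toy labels are illustrative assumptions, with class 1 standing for "dropout".

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 students who stay (0) and 2 who drop out (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_stay = [0] * 10                          # never flags a dropout
one_caught = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]     # one hit, one miss, one false alarm

for name, y_pred in [("always_stay", always_stay), ("one_caught", one_caught)]:
    print(name, accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))
```

Both toy predictors score the same accuracy, but only the second one identifies any at-risk student, and macro F1 reflects that difference because each class contributes equally to the average.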

preprint · 2022 · arXiv

A Decidability-Based Loss Function

Deep learning is now the standard approach for a wide range of problems, including biometric tasks such as face recognition and speech recognition. Biometric problems often use deep learning models to extract features from images, also known as embeddings. Moreover, the loss function used during training strongly influences the quality of the generated embeddings. In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine. Our proposal, the D-loss, avoids some disadvantages of Triplet-based losses, such as the use of hard samples and tricky parameter tuning, which can lead to slow convergence. The proposed approach is compared against the Softmax (cross-entropy), Triplets Soft-Hard, and Multi Similarity losses on four different benchmarks: MNIST, Fashion-MNIST, CIFAR10 and CASIA-IrisV4. The achieved results show the efficacy of the proposal when compared to other popular metrics in the literature. The D-loss computation, besides being simple, non-parametric and easy to implement, favors both the inter-class and intra-class scenarios.
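For orientation, the decidability index that the loss builds on can be sketched as follows. This is the standard biometric definition, d' = |mean_impostor - mean_genuine| / sqrt((var_genuine + var_impostor) / 2), computed over pairwise distances. It is not the paper's D-loss itself, whose exact form the abstract does not give. The function name, the choice of Euclidean distance and the toy points are assumptions.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance


def decidability_index(embeddings, labels):
    """Separation between genuine (same-label) and impostor (different-label)
    pairwise-distance distributions. Larger values mean better separation."""
    genuine, impostor = [], []
    for (xa, ya), (xb, yb) in combinations(zip(embeddings, labels), 2):
        d = math.dist(xa, xb)  # Euclidean distance (an assumption of this sketch)
        (genuine if ya == yb else impostor).append(d)
    # Pooled spread of the two distributions; the sketch assumes it is non-zero.
    spread = math.sqrt(0.5 * (pvariance(genuine) + pvariance(impostor)))
    return abs(fmean(impostor) - fmean(genuine)) / spread


# Hypothetical 2-D embeddings forming two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(decidability_index(points, [0, 0, 1, 1]))  # labels follow the groups
print(decidability_index(points, [0, 1, 0, 1]))  # labels cut across the groups
```

When the labels follow the two groups, genuine distances are small and impostor distances are large, so the index is high. When the labels cut across the groups, the two distributions move closer together and the index drops.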

preprint · 2020 · arXiv

ARCHI: pipeline for light curve extraction of CHEOPS background stars

High precision time series photometry from space is being used for a number of scientific cases. In this context, the recently launched CHEOPS (ESA) mission promises to bring 20 ppm precision over an exposure time of 6 hours, when targeting nearby bright stars, having in mind the detailed characterization of exoplanetary systems through transit measurements. However, the official CHEOPS (ESA) mission pipeline only provides photometry for the main target (the central star in the field). In order to explore the potential of CHEOPS photometry for all stars in the field, in this paper we present archi, an additional open-source pipeline module to analyse the background stars present in the image. As archi uses the official Data Reduction Pipeline data as input, it is not meant to be used as an independent tool to process raw CHEOPS data but, instead, to be used as an add-on to the official pipeline. We test archi using CHEOPS simulated images, and show that photometry of background stars in CHEOPS images is only slightly degraded (by a factor of 2 to 3) with respect to the main target. This opens the possibility of using CHEOPS to produce photometric time series of several close-by targets at once, as well as of using different stars in the image to calibrate systematic errors. We also show one clear scientific application where the study of the companion light curve can be important for the understanding of the contamination on the main target.

preprint · 2017 · arXiv

SpArcFiRe: morphological selection effects due to reduced visibility of tightly winding arms in distant spiral galaxies

The Galaxy Zoo has provided morphological data on many galaxies. Several biases have been identified in the Galaxy Zoo data. Here we report on a newly discovered selection effect: astronomers interested in studying spiral galaxies may select a set of spiral galaxies based upon a threshold in spirality (the fraction of Galaxy Zoo humans who report seeing spiral structure). SpArcFiRe is an automated tool that decomposes a spiral galaxy into its constituent spiral arms, providing objective, quantitative data on their structure. SpArcFiRe measures the pitch angle of spiral arms. We have observed that when selecting a set of spiral galaxies based on a threshold on spirality, the pitch angle of spiral arms appears to increase with redshift. We hypothesize that this is a selection effect: tightly-wound spiral arms become less visible as images degrade with increasing redshift, leading to fewer such galaxies being included in the sample at higher redshifts. We corroborate this hypothesis by artificially degrading images of nearby galaxies, then using a machine learning algorithm trained on Galaxy Zoo data to provide a spirality for each artificially degraded image. It correctly predicts that spirality decreases as image quality degrades. Thus, the mean pitch angle of those galaxies remaining above the spirality threshold is higher than that of those eliminated by the selection effect. This demonstrates that users who select samples of galaxies using a threshold of Galaxy Zoo votes must carefully consider the possibility of selection effects on morphological measures, even if the measure itself is believed to be objective and unbiased. Finally, we also perform an empirical sensitivity analysis to demonstrate that SpArcFiRe's output changes in a smooth and predictable fashion in response to changes in its internal algorithmic parameters.

preprint · 2014 · arXiv

Perspectives on top quark physics after Run I of the LHC: sqrt(s)=13 TeV and beyond

A summary of the on-going preparations from the ATLAS and CMS collaborations to perform top quark physics in Run II of the LHC and at the HL-LHC is given. To maintain the current level of precision and profit from the high-luminosity scenario expected in the next runs of the LHC, several new reconstruction techniques and detector upgrades are foreseen. The prospects for precise measurements and possible discovery stories for new physics with top quarks are summarized.

preprint · 2012 · arXiv

Modeling Languages: metrics and assessing tools

Any traditional engineering field has metrics to rigorously assess the quality of its products. Engineers know that the output must satisfy the requirements, must comply with the production and market rules, and must be competitive. Professionals in the new field of software engineering started a few years ago to define metrics to appraise their product: individual programs and software systems. This concern motivates the need to assess not only the outcome but also the process and tools employed in its development. In this context, assessing the quality of programming languages is a legitimate objective; in a similar way, it makes sense to be concerned with models and modeling approaches, as more and more people start the software development process with a modeling phase. In this paper we introduce and motivate the assessment of model quality in the Software Development cycle. After the general discussion of this topic, we focus the attention on the most popular modeling language -- the UML -- presenting metrics. Through a Case-Study, we present and explore two tools. To conclude we identify what is still lacking on the tools side.

preprint · 2011 · arXiv

Finite automata for Schreier graphs of virtually free groups

The Stallings construction for finitely generated subgroups of free groups is generalized by introducing the concept of Stallings section, which allows an efficient computation of the core of a Schreier graph based on edge folding. It is proved that those groups admitting Stallings sections are precisely finitely generated virtually free groups, through a constructive approach based on Bass-Serre theory. Complexity issues and applications are also discussed.

preprint · 2010 · arXiv

Probing the flavor of the top quark decay

The top quark sector is almost decoupled from lighter quark generations due to the fact that $V_{tb}\approx 1$. The current experimental measurements of $V_{tb}$ are compatible with the Standard Model expectations but are still dominated by experimental uncertainties. In this manuscript, a revision of the experimental methods used to measure $V_{tb}$ is given, and a simple method to probe heavy flavor content fraction of top quark events, $R=B(t\rightarrow Wb)/B(t\rightarrow Wq)$, is presented and discussed. Prospects for the measurements at the Large Hadron Collider based on generator level simulations are outlined.
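As a worked note on the ratio $R$, under assumptions the abstract does not state (that $q$ runs over $d$, $s$ and $b$, that mass differences among these quarks are neglected, and that the CKM matrix is a unitary $3\times 3$ matrix), $R$ reduces to $|V_{tb}|^2$:

$$
R = \frac{B(t\rightarrow Wb)}{B(t\rightarrow Wq)} = \frac{|V_{tb}|^2}{|V_{td}|^2 + |V_{ts}|^2 + |V_{tb}|^2} = |V_{tb}|^2 .
$$

If unitarity is not assumed, the denominator need not equal 1, and a measurement of $R$ alone constrains only the ratio of matrix elements rather than $|V_{tb}|$ itself.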