Source author record

Miguel Romero

Miguel Romero appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Logic in Computer Science Data Structures and Algorithms Computational Complexity Discrete Mathematics Artificial Intelligence Databases Distributed, Parallel, and Cluster Computing math.CO math.SP Performance Social and Information Networks

Catalog footprint

What is connected

11works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Pliability and Approximating Max-CSPs

We identify a sufficient condition, treewidth-pliability, that gives a polynomial-time algorithm for an arbitrarily good approximation of the optimal value in a large class of Max-2-CSPs parameterised by the class of allowed constraint graphs (with arbitrary constraints on an unbounded alphabet). Our result applies more generally to the maximum homomorphism problem between two rational-valued structures. The condition unifies the two main approaches for designing a polynomial-time approximation scheme. One is Baker's layering technique, which applies to sparse graphs such as planar or excluded-minor graphs. The other is based on Szemerédi's regularity lemma and applies to dense graphs. We extend the applicability of both techniques to new classes of Max-CSPs. On the other hand, we prove that the condition cannot be used to find solutions (as opposed to approximating the optimal value) in general. Treewidth-pliability turns out to be a robust notion that can be defined in several equivalent ways, including characterisations via size, treedepth, or the Hadwiger number. We show connections to the notions of fractional-treewidth-fragility from structural graph theory, hyperfiniteness from the area of property testing, and regularity partitions from the theory of dense graph limits. These may be of independent interest. In particular we show that a monotone class of graphs is hyperfinite if and only if it is fractionally-treewidth-fragile and has bounded degree.

preprint2022arXiv

A Top-down Supervised Learning Approach to Hierarchical Multi-label Classification in Networks

Node classification is the task of inferring or predicting missing node attributes from information available for other nodes in a network. This paper presents a general prediction model to hierarchical multi-label classification (HMC), where the attributes to be inferred can be specified as a strict poset. It is based on a top-down classification approach that addresses hierarchical multi-label classification with supervised learning by building a local classifier per class. The proposed model is showcased with a case study on the prediction of gene functions for Oryza sativa Japonica, a variety of rice. It is compared to the Hierarchical Binomial-Neighborhood, a probabilistic model, by evaluating both approaches in terms of prediction performance and computational cost. The results in this work support the working hypothesis that the proposed model can achieve good levels of prediction efficiency, while scaling up in relation to the state of the art.

preprint2022arXiv

Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification

Gene annotation addresses the problem of predicting unknown associations between gene and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach for to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that consider the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reduce the time and costs of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.

preprint2022arXiv

Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification

The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from the increasing genomic data, numerous studies have focused on the identification of associations between genes and functions. While these studies have shown great promise, the problem of annotating genes with functions remains an open challenge. In this work, we present a method to detect missing annotations in hierarchical multi-label classification datasets. We propose a method that exploits the class hierarchy by computing aggregated probabilities to the paths of classes from the leaves to the root for each instance. The proposed method is presented in the context of predicting missing gene function annotations, where these aggregated probabilities are further used to select a set of annotations to be verified through in vivo experiments. The experiments on Oriza sativa Japonica, a variety of rice, showcase that incorporating the hierarchy of classes into the method often improves the predictive performance and our proposed method yields superior results when compared to competitor methods from the literature.

preprint2022arXiv

Modeling GPU Dynamic Parallelism for Self Similar Density Workloads

Dynamic Parallelism (DP) is a runtime feature of the GPU programming model that allows GPU threads to execute additional GPU kernels, recursively. Apart from making the programming of parallel hierarchical patterns easier, DP can also speedup problems that exhibit a heterogeneous data layout by focusing, through a subdivision process, the finite GPU resources on the sub-regions that exhibit more parallelism. However, doing an optimal subdivision process is not trivial, as there are different parameters that play an important role in the final performance of DP. Moreover, the current programming abstraction for DP also introduces an overhead that can penalize the final performance. In this work we present a subdivision cost model for problems that exhibit self similar density (SSD) workloads (such as fractals), in order understand what parameters provide the fastest subdivision approach. Also, we introduce a new subdivision implementation, named \textit{Adaptive Serial Kernels} (ASK), as a smaller overhead alternative to CUDA's Dynamic Parallelism. Using the cost model on the Mandelbrot Set as a case study shows that the optimal scheme is to start with an initial subdivision between $g=[2,16]$, then keep subdividing in regions of $r=2,4$, and stop when regions reach a size of $B \sim 32$. The experimental results agree with the theoretical parameters, confirming the usability of the cost model. In terms of performance, the proposed ASK approach runs up to $\sim 60\%$ faster than Dynamic Parallelism in the Mandelbrot set, and up to $12\times$ faster than a basic exhaustive implementation, whereas DP is up to $7.5\times$.

preprint2022arXiv

On Computing Probabilistic Explanations for Decision Trees

Formal XAI (explainable AI) is a growing area that focuses on computing explanations with mathematical guarantees for the decisions made by ML models. Inside formal XAI, one of the most studied cases is that of explaining the choices taken by decision trees, as they are traditionally deemed as one of the most interpretable classes of models. Recent work has focused on studying the computation of "sufficient reasons", a kind of explanation in which given a decision tree $T$ and an instance $x$, one explains the decision $T(x)$ by providing a subset $y$ of the features of $x$ such that for any other instance $z$ compatible with $y$, it holds that $T(z) = T(x)$, intuitively meaning that the features in $y$ are already enough to fully justify the classification of $x$ by $T$. It has been argued, however, that sufficient reasons constitute a restrictive notion of explanation, and thus the community has started to study their probabilistic counterpart, in which one requires that the probability of $T(z) = T(x)$ must be at least some value $δ\in (0, 1]$, where $z$ is a random instance that is compatible with $y$. Our paper settles the computational complexity of $δ$-sufficient-reasons over decision trees, showing that both (1) finding $δ$-sufficient-reasons that are minimal in size, and (2) finding $δ$-sufficient-reasons that are minimal inclusion-wise, do not admit polynomial-time algorithms (unless P=NP). This is in stark contrast with the deterministic case ($δ= 1$) where inclusion-wise minimal sufficient-reasons are easy to compute. By doing this, we answer two open problems originally raised by Izza et al. On the positive side, we identify structural restrictions of decision trees that make the problem tractable, and show how SAT solvers might be able to tackle these problems in practical settings.

preprint2021arXiv

The complexity of general-valued CSPs seen from the other side

The constraint satisfaction problem (CSP) is concerned with homomorphisms between two structures. For CSPs with restricted left-hand side structures, the results of Dalmau, Kolaitis, and Vardi [CP'02], Grohe [FOCS'03/JACM'07], and Atserias, Bulatov, and Dalmau [ICALP'07] establish the precise borderline of polynomial-time solvability (subject to complexity-theoretic assumptions) and of solvability by bounded-consistency algorithms (unconditionally) as bounded treewidth modulo homomorphic equivalence. The general-valued constraint satisfaction problem (VCSP) is a generalisation of the CSP concerned with homomorphisms between two valued structures. For VCSPs with restricted left-hand side valued structures, we establish the precise borderline of polynomial-time solvability (subject to complexity-theoretic assumptions) and of solvability by the $k$-th level of the Sherali-Adams LP hierarchy (unconditionally). We also obtain results on related problems concerned with finding a solution and recognising the tractable cases; the latter has an application in database theory.

preprint2020arXiv

On monotonic determinacy and rewritability for recursive queries and views

A query Q is monotonically determined over a set of views if Q can be expressed as a monotonic function of the view image. In the case of relational algebra views and queries, monotonic determinacy coincides with rewritability as a union of conjunctive queries, and it is decidable in important special cases, such as for CQ views and queries. We investigate the situation for views and queries in the recursive query language Datalog. We give both positive and negative results about the ability to decide monotonic determinacy, and also about the co-incidence of monotonic determinacy with Datalog rewritability.

preprint2020arXiv

Point-width and Max-CSPs

The complexity of (unbounded-arity) Max-CSPs under structural restrictions is poorly understood. The two most general hypergraph properties known to ensure tractability of Max-CSPs, $β$-acyclicity and bounded (incidence) MIM-width, are incomparable and lead to very different algorithms. We introduce the framework of point decompositions for hypergraphs and use it to derive a new sufficient condition for the tractability of (structurally restricted) Max-CSPs, which generalises both bounded MIM-width and \b{eta}-acyclicity. On the way, we give a new characterisation of bounded MIM-width and discuss other hypergraph properties which are relevant to the complexity of Max-CSPs, such as $β$-hypertreewidth.

preprint2020arXiv

Spectral Evolution with Approximated Eigenvalue Trajectories for Link Prediction

The spectral evolution model aims to characterize the growth of large networks (i.e., how they evolve as new edges are established) in terms of the eigenvalue decomposition of the adjacency matrices. It assumes that, while eigenvectors remain constant, eigenvalues evolve in a predictable manner over time. This paper extends the original formulation of the model twofold. First, it presents a method to compute an approximation of the spectral evolution of eigenvalues based on the Rayleigh quotient. Second, it proposes an algorithm to estimate the evolution of eigenvalues by extrapolating only a fraction of their approximated values. The proposed model is used to characterize mention networks of users who posted tweets that include the most popular political hashtags in Colombia from August 2017 to August 2018 (the period which concludes the disarmament of the Revolutionary Armed Forces of Colombia). To evaluate the extent to which the spectral evolution model resembles these networks, link prediction methods based on learning algorithms (i.e., extrapolation and regression) and graph kernels are implemented. Experimental results show that the learning algorithms deployed on the approximated trajectories outperform the usual kernel and extrapolation methods at predicting the formation of new edges.

preprint2016arXiv

The complexity of reverse engineering problems for conjunctive queries

Reverse engineering problems for conjunctive queries (CQs), such as query by example (QBE) or definability, take a set of user examples and convert them into an explanatory CQ. Despite their importance, the complexity of these problems is prohibitively high (coNEXPTIME-complete). We isolate their two main sources of complexity and propose relaxations of them that reduce the complexity while having meaningful theoretical interpretations. The first relaxation is based on the idea of using existential pebble games for approximating homomorphism tests. We show that this characterizes QBE/definability for CQs up to treewidth $k$, while reducing the complexity to EXPTIME. As a side result, we obtain that the complexity of the QBE/definability problems for CQs of treewidth $k$ is EXPTIME-complete for each $k \geq 1$. The second relaxation is based on the idea of "desynchronizing" direct products, which characterizes QBE/definability for unions of CQs and reduces the complexity to coNP. The combination of these two relaxations yields tractability for QBE and characterizes it in terms of unions of CQs of treewidth at most $k$. We also study the complexity of these problems for conjunctive regular path queries over graph databases, showing them to be no more difficult than for CQs.

Miguel Romero

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

Pliability and Approximating Max-CSPs

A Top-down Supervised Learning Approach to Hierarchical Multi-label Classification in Networks

Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification

Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification

Modeling GPU Dynamic Parallelism for Self Similar Density Workloads

On Computing Probabilistic Explanations for Decision Trees

The complexity of general-valued CSPs seen from the other side

On monotonic determinacy and rewritability for recursive queries and views

Point-width and Max-CSPs

Spectral Evolution with Approximated Eigenvalue Trajectories for Link Prediction

The complexity of reverse engineering problems for conjunctive queries