Source author record

Nicolas Fraiman

Nicolas Fraiman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.CO, math.PR, Machine Learning, Discrete Mathematics, Networking and Internet Architecture

Catalog footprint

What is connected

10 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Beam Search for Feature Selection

In this paper, we present and prove some consistency results about the performance of classification models using a subset of features. In addition, we propose to use beam search to perform feature selection, which can be viewed as a generalization of forward selection. We apply beam search to both simulated and real-world data, by evaluating and comparing the performance of different classification models using different sets of features. The results demonstrate that beam search could outperform forward selection, especially when the features are correlated so that they have more discriminative power when considered jointly than individually. Moreover, in some cases classification models could obtain comparable performance using only ten features selected by beam search instead of hundreds of original features.
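The idea of beam search as a generalization of forward selection can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `beam_search_select`, the `score` callback (higher is better, for example a cross-validated accuracy) and the toy scoring rule are all assumptions made for illustration. With `beam_width=1` the procedure reduces to greedy forward selection.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def beam_search_select(
    features: Iterable[int],
    score: Callable[[FrozenSet[int]], float],
    beam_width: int,
    max_features: int,
) -> Tuple[FrozenSet[int], float]:
    """Return the best-scoring feature subset found (hypothetical helper)."""
    features = list(features)
    beam: List[FrozenSet[int]] = [frozenset()]
    best_subset, best_score = frozenset(), float("-inf")
    for _ in range(max_features):
        # Grow every subset in the beam by one feature it does not contain yet.
        candidates = {s | {f} for s in beam for f in features if f not in s}
        if not candidates:
            break
        # Keep the beam_width best candidates; ties are broken by feature index.
        ranked = sorted(candidates, key=lambda s: (-score(s), sorted(s)))
        beam = ranked[:beam_width]
        top_score = score(beam[0])
        if top_score > best_score:
            best_subset, best_score = beam[0], top_score
    return best_subset, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Toy rule (assumed): features 0 and 1 are only strong when used jointly."""
    if {0, 1} <= subset:
        return 1.0
    if 2 in subset:
        return 0.6
    return 0.5 if subset else 0.0


# Forward selection (beam width 1) commits to feature 2 and misses the pair.
print(beam_search_select(range(4), toy_score, beam_width=1, max_features=2))
# A beam of width 2 keeps feature 0 alive and then finds the pair {0, 1}.
print(beam_search_select(range(4), toy_score, beam_width=2, max_features=2))
```

The toy score mimics the situation described in the abstract, where correlated features have more discriminative power jointly than individually.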

preprint, 2022, arXiv

Biclustering with Alternating K-Means

Biclustering is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing block-diagonal biclusters. We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk. We develop and prove a consistency result with respect to the empirical clustering risk. Since the optimization problem is combinatorial in nature, finding the global minimum is computationally intractable. In light of this fact, we propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows. We evaluate and compare the performance of our algorithm to other related biclustering methods on both simulated data and real-world gene expression data sets. The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.

preprint, 2022, arXiv

Classification with Nearest Disjoint Centroids

In this paper, we develop a new classification method based on nearest centroid, and it is called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in the following two aspects: (1) the centroids are defined based on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide a few theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that can find the disjoint subsets of features used in our method, and extend the algorithm to perform feature selection. We evaluate and compare the performance of our method to other classification methods on both simulated data and real-world gene expression datasets. The results demonstrate that our method is able to outperform other competing classifiers by having smaller misclassification rates and/or using fewer features in various settings and situations.

preprint, 2020, arXiv

Community modulated recursive trees and population dependent branching processes

We consider random recursive trees that are grown via community modulated schemes that involve random attachment or degree based attachment. The aim of this paper is to derive general techniques based on continuous time embedding to study such models. The associated continuous time embeddings are not branching processes: individual reproductive rates at each time depend on the composition of the entire population at that time. Using stochastic analytic techniques we show that various key macroscopic statistics of the continuous time embedding stabilize, allowing asymptotics for a host of functionals of the original models to be derived.

preprint, 2020, arXiv

Recursive functions on conditional Galton--Watson trees

A recursive function on a tree is a function in which each leaf has a given value, and each internal node has a value equal to a function of the number of children, the values of the children, and possibly an explicitly specified random element $U$. The value of the root is the key quantity of interest in general. In this first study, all node values and function values are in a finite set $S$. In this note, we describe the limit behavior when the leaf values are drawn independently from a fixed distribution on $S$, and the tree $T_n$ is a random Galton--Watson tree of size $n$.
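A recursive function in the sense described above can be evaluated bottom-up. The sketch below is only an illustration: the tree encoding (a node is the list of its children, so a leaf is an empty list), the name `evaluate` and the parity rule are assumptions, and the tree is given by hand rather than sampled as a conditional Galton--Watson tree.

```python
import random
from typing import Callable, List


def evaluate(
    tree: List,
    leaf_value: Callable[[random.Random], int],
    combine: Callable[[int, List[int], float], int],
    rng: random.Random,
) -> int:
    """Value of the root of `tree` under the recursive function `combine`."""
    if not tree:
        # A leaf draws its value independently from the leaf distribution.
        return leaf_value(rng)
    child_values = [evaluate(child, leaf_value, combine, rng) for child in tree]
    # An internal node sees its number of children, their values and a fresh U.
    return combine(len(tree), child_values, rng.random())


# Root with two children: a leaf and an internal node that has two leaves.
tree = [[], [[], []]]


def parity(k: int, values: List[int], u: float) -> int:
    """Example rule on S = {0, 1}: parity of the children (ignores k and u)."""
    return sum(values) % 2


print(evaluate(tree, lambda r: 1, parity, random.Random(0)))  # all leaves equal to 1
```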

preprint, 2015, arXiv

The diameter of Inhomogeneous random graphs

In this paper we study the diameter of Inhomogeneous random graphs $G(n,κ,p)$ that are induced by irreducible kernels $κ$. The kernels we consider act on separable metric spaces and are almost everywhere continuous. We generalize results known for the Erdős-Rényi model $G(n,p)$ for several ranges of $p$. We find upper and lower bounds for the diameter of $G(n,κ,p)$ in terms of the expansion factor and two explicit constants that depend on the behavior of the kernel over partitions of the metric space.

preprint, 2012, arXiv

Connectivity of inhomogeneous random graphs

We find conditions for the connectivity of inhomogeneous random graphs with intermediate density. Our results generalize the classical result for $G(n,p)$ when $p = c \log n / n$. We draw $n$ independent points $X_i$ from a general distribution on a separable metric space, and let their indices form the vertex set of a graph. An edge $(i,j)$ is added with probability $\min(1, \kappa(X_i,X_j) \log n / n)$, where $\kappa \ge 0$ is a fixed kernel. We show that, under reasonably weak assumptions, the connectivity threshold of the model can be determined.
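The model in this abstract is straightforward to simulate. The sketch below samples one graph, adding each edge (i, j) with probability min(1, kernel(X_i, X_j) log n / n), and tests connectivity with a union-find structure. The function name and the `sample_point` and `kernel` callbacks are assumptions for illustration, and nothing here reproduces the paper's threshold result.

```python
import math
import random
from typing import Any, Callable


def sample_graph_is_connected(
    n: int,
    sample_point: Callable[[random.Random], Any],
    kernel: Callable[[Any, Any], float],
    rng: random.Random,
) -> bool:
    """Sample one inhomogeneous random graph and report whether it is connected."""
    points = [sample_point(rng) for _ in range(n)]
    parent = list(range(n))

    def find(a: int) -> int:
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n
    scale = math.log(n) / n
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, kernel(points[i], points[j]) * scale)
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components == 1


# A constant kernel c recovers G(n, p) with p = c log n / n.
rng = random.Random(1)
hits = sum(
    sample_graph_is_connected(200, lambda r: r.random(), lambda x, y: 2.0, rng)
    for _ in range(20)
)
print(f"connected in {hits} of 20 samples")
```

Repeating the experiment for several constants gives a rough empirical picture of where connectivity sets in for a given kernel.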

preprint, 2012, arXiv

Depth properties of scaled attachment random recursive trees

We study depth properties of a general class of random recursive trees where each node $i$ attaches to the random node $\lfloor i X_i \rfloor$ and $X_0, \ldots, X_n$ is a sequence of i.i.d. random variables taking values in $[0,1)$. We call such trees scaled attachment random recursive trees (SARRT). We prove that the typical depth $D_n$, the maximum depth (or height) $H_n$ and the minimum depth $M_n$ of a SARRT are asymptotically given by $D_n \sim \mu^{-1} \log n$, $H_n \sim \alpha_{\max} \log n$ and $M_n \sim \alpha_{\min} \log n$, where $\mu$, $\alpha_{\max}$ and $\alpha_{\min}$ are constants depending only on the distribution of $X_0$, whenever $X_0$ has a density. In particular, this gives a new elementary proof for the height of uniform random recursive trees, $H_n \sim e \log n$, that does not use branching random walks.
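A SARRT can be grown in a few lines, which makes the depth quantities easy to explore numerically. The sketch below assumes that the parent of node i is the integer part of i times X_i, and takes the attachment distribution as a callback; the name `sarrt_depths` is hypothetical. With uniform X_i the tree is a uniform random recursive tree.

```python
import math
import random
from typing import Callable, List


def sarrt_depths(
    n: int,
    sample_x: Callable[[random.Random], float],
    rng: random.Random,
) -> List[int]:
    """Depths of nodes 0..n in one sampled tree; node 0 is the root."""
    depths = [0]
    for i in range(1, n + 1):
        # Parent index is the integer part of i * X_i, which lies in {0, ..., i - 1}
        # because X_i is in [0, 1); min() only guards against float rounding.
        parent = min(int(i * sample_x(rng)), i - 1)
        depths.append(depths[parent] + 1)
    return depths


n = 10_000
depths = sarrt_depths(n, lambda r: r.random(), random.Random(0))
# Empirical ratios to compare with the constants in the abstract.
# Convergence in n is slow, so a single run is only indicative.
print("depth of last node / log n:", depths[-1] / math.log(n))
print("height / log n:", max(depths) / math.log(n))
```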

preprint, 2011, arXiv

Connectivity threshold for Bluetooth graphs

We study the connectivity properties of random Bluetooth graphs that model certain "ad hoc" wireless networks. The graphs are obtained as "irrigation subgraphs" of the well-known random geometric graph model. There are two parameters that control the model: the radius $r$ that determines the "visible neighbors" of each node and the number of edges $c$ that each node is allowed to send to these. The randomness comes from the underlying distribution of data points in space and from the choices of each vertex. We prove that no connectivity can take place with high probability for a range of parameters $r, c$ and completely characterize the connectivity threshold (in $c$) for values of $r$ close to the critical value for connectivity in the underlying random geometric graph.
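The two-parameter construction can be sketched as follows. Several details are assumptions for illustration and not taken from the paper: points are uniform in the unit square with plain Euclidean distance, each node picks its c neighbors uniformly without replacement (all of them if fewer than c are visible), and the resulting edges are treated as undirected.

```python
import math
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def bluetooth_graph(n: int, r: float, c: int, rng: random.Random) -> Set[Edge]:
    """Sample the edge set of one Bluetooth-style graph on n random points."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges: Set[Edge] = set()
    for i in range(n):
        # Visible neighbors: all other points at distance less than r.
        visible = [j for j in range(n) if j != i and math.dist(pts[i], pts[j]) < r]
        # Node i sends edges to at most c of its visible neighbors.
        for j in rng.sample(visible, min(c, len(visible))):
            edges.add((min(i, j), max(i, j)))
    return edges


def is_connected(n: int, edges: Set[Edge]) -> bool:
    """Depth-first search from node 0 over the undirected edge set."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


rng = random.Random(0)
for c in (1, 2, 3, 5):
    hits = sum(is_connected(300, bluetooth_graph(300, 0.15, c, rng)) for _ in range(10))
    print(f"c={c}: connected in {hits} of 10 samples")
```

Sweeping c for a fixed r in this way gives an empirical view of the kind of threshold in c that the abstract describes.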

preprint, 2011, arXiv

Lines in hypergraphs

One of the De Bruijn-Erdős theorems deals with finite hypergraphs where every two vertices belong to precisely one hyperedge. It asserts that, except in the perverse case where a single hyperedge equals the whole vertex set, the number of hyperedges is at least the number of vertices and the two numbers are equal if and only if the hypergraph belongs to one of two simply described families, near-pencils and finite projective planes. Chen and Chvátal proposed to define the line uv in a 3-uniform hypergraph as the set of vertices that consists of u, v, and all w such that {u,v,w} is a hyperedge. With this definition, the De Bruijn-Erdős theorem is easily seen to be equivalent to the following statement: If no four vertices in a 3-uniform hypergraph carry two or three hyperedges, then, except in the perverse case where one of the lines equals the whole vertex set, the number of lines is at least the number of vertices and the two numbers are equal if and only if the hypergraph belongs to one of two simply described families. Our main result generalizes this statement by allowing any four vertices to carry three hyperedges (but keeping two forbidden): the conclusion remains the same except that a third simply described family, complements of Steiner triple systems, appears in the extremal case.
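The Chen and Chvátal definition of a line is easy to compute directly. The sketch below uses a hypothetical helper `lines` and takes the Fano plane as the example, the smallest finite projective plane, viewed as a 3-uniform hypergraph on 7 vertices. Every pair of vertices lies in exactly one hyperedge there, so each line is a hyperedge and the number of lines equals the number of vertices, which is the extremal case.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def lines(vertices: Iterable[int], hyperedges: Iterable[Iterable[int]]) -> Set[FrozenSet[int]]:
    """All distinct lines uv = {u, v} plus every w with {u, v, w} a hyperedge."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in hyperedges}
    result: Set[FrozenSet[int]] = set()
    for u, v in combinations(vertices, 2):
        extra = {
            w
            for w in vertices
            if w not in (u, v) and frozenset((u, v, w)) in edge_set
        }
        result.add(frozenset({u, v} | extra))
    return result


# The seven lines of the Fano plane, used as the hyperedges.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
fano_lines = lines(range(7), fano)
print(len(fano_lines))  # 7 lines on 7 vertices
```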
