Source author record

Mark Kozdoba

Mark Kozdoba appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.FA, Social and Information Networks, Information Retrieval, math.ST, physics.soc-ph, Statistics Theory

Catalog footprint

What is connected

6 works

7 topics

2 close collaborators

preprint · 2021 · arXiv

Dimension Free Generalization Bounds for Non Linear Metric Learning

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural-network-type embedding of the data. Specifically, we provide uniform generalization bounds for two regimes: the sparse regime, and a non-sparse regime which we term "bounded amplification". The sparse regime bounds correspond to situations where $\ell_1$-type norms of the parameters are small. As in classification, solutions satisfying such bounds can be obtained by an appropriate regularization of the problem. On the other hand, unregularized SGD optimization of a metric learning loss typically does not produce sparse solutions. We show that despite this lack of sparsity, by relying on a different, new property of the solutions, it is still possible to provide dimension-free generalization guarantees. Consequently, these bounds can explain generalization in non-sparse real experimental situations. We illustrate the studied phenomena on the MNIST and 20newsgroups datasets.

preprint · 2020 · arXiv

Topic Modeling via Full Dependence Mixtures

In this paper we introduce a new approach to topic modelling that scales to large datasets by using a compact representation of the data and by leveraging the GPU architecture. In this approach, topics are learned directly from the co-occurrence data of the corpus. In particular, we introduce a novel mixture model which we term the Full Dependence Mixture (FDM) model. FDMs model second moments under general generative assumptions on the data. While there is previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback-Leibler objective. Moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus. Our approach allows us to leverage standard optimizers and GPUs for the problem of topic modeling. In particular, we evaluate the approach on two large datasets, NeurIPS papers and a Twitter corpus, with a large number of topics, and show that the approach performs comparably to or better than the standard benchmarks.
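
As a rough illustration of fitting topics to co-occurrence data with a Kullback-Leibler objective, the sketch below builds a normalized word-pair co-occurrence matrix and scores a candidate mixture against it. The mixture form sum_t w_t mu_t mu_t^T and the names `cooccurrence`, `mixture_second_moment` and `kl_objective` are assumptions made for this example; the abstract does not give the FDM parameterization or the optimization procedure, so this is not the paper's method.

```python
import numpy as np

def cooccurrence(docs, vocab_size):
    """Normalized co-occurrence matrix over ordered pairs of distinct
    word positions within each document (docs hold integer word ids)."""
    C = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        counts = np.bincount(doc, minlength=vocab_size).astype(float)
        # c_u * c_v pairs for u != v, and c_u * (c_u - 1) pairs for u == v.
        C += np.outer(counts, counts) - np.diag(counts)
    return C / C.sum()

def mixture_second_moment(weights, topics):
    """Assumed model form: sum_t weights[t] * outer(topics[t], topics[t])."""
    return np.einsum("t,tu,tv->uv", weights, topics, topics)

def kl_objective(C, M, eps=1e-12):
    """KL(C || M), summed over the entries where C is positive."""
    mask = C > 0
    return float(np.sum(C[mask] * np.log(C[mask] / (M[mask] + eps))))

# Hypothetical toy setup: two topics over a four-word vocabulary.
weights = np.array([0.5, 0.5])
topics = np.array([[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]])
M = mixture_second_moment(weights, topics)
uniform = np.full((4, 4), 1.0 / 16)
print(kl_objective(M, M), kl_objective(M, uniform))
```

The objective depends on the corpus only through the fixed-size matrix `C`, which is one way to read the abstract's remark that an iteration need not scale with the size of the corpus.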

preprint · 2015 · arXiv

Overlapping Communities Detection via Measure Space Embedding

We present a new algorithm for community detection. The algorithm uses random walks to embed the graph in a space of measures, after which a modification of $k$-means in that space is applied. The algorithm is therefore fast and easily parallelizable. We evaluate the algorithm on standard random graph benchmarks, including some overlapping community benchmarks, and find its performance to be better than or at least as good as that of previously known algorithms. We also prove a linear time (in number of edges) guarantee for the algorithm on a $p,q$-stochastic block model with $p \geq c\cdot N^{-\frac{1}{2} + \varepsilon}$ and $p-q \geq c' \sqrt{p N^{-\frac{1}{2} + \varepsilon} \log N}$.
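
The two-stage idea (embed each node as a random-walk measure, then cluster the measures) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it uses plain Lloyd k-means with Euclidean distance in place of the modified k-means the abstract mentions, it produces disjoint rather than overlapping clusters, and the walk length, the initialisation and the function names `walk_measures`, `kmeans_measures` and `two_cliques` are assumptions.

```python
import numpy as np

def walk_measures(adj, steps=3):
    """Row i is the distribution of a `steps`-step random walk from node i."""
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    return np.linalg.matrix_power(P, steps)

def kmeans_measures(M, k, iters=20):
    """Plain Lloyd k-means on the rows of M (squared Euclidean distance)."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [M[0]]
    for _ in range(1, k):
        d = np.min([((M - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(M[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((M[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                # The mean of probability vectors is again a probability vector.
                centers[j] = M[labels == j].mean(axis=0)
    return labels

def two_cliques(n=5):
    """Toy graph: two n-cliques joined by a single bridge edge."""
    adj = np.zeros((2 * n, 2 * n))
    adj[:n, :n] = 1
    adj[n:, n:] = 1
    np.fill_diagonal(adj, 0)
    adj[n - 1, n] = adj[n, n - 1] = 1
    return adj

measures = walk_measures(two_cliques())
labels = kmeans_measures(measures, k=2)
print(labels)
```

On this toy graph each clique's nodes keep most of their walk mass inside their own clique, so the two groups of measures are well separated and the clustering splits the graph at the bridge edge.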

preprint · 2015 · arXiv

Overlapping Community Detection by Online Cluster Aggregation

We present a new online algorithm for detecting overlapping communities. The main ingredients are a modification of an online k-means algorithm and a new approach to modelling overlap in communities. An evaluation on large benchmark graphs shows that the quality of discovered communities compares favorably to several methods in the recent literature, while the running time is significantly improved.

preprint · 2011 · arXiv

On Projections of Metric Spaces

Let $X$ be a metric space and let $\mu$ be a probability measure on it. Consider a Lipschitz map $T: X \rightarrow \mathbb{R}^n$, with Lipschitz constant $\leq 1$. Then one can ask whether the image $TX$ can have large projections in many directions. For a large class of spaces $X$, we show that there are directions $\phi \in S^{n-1}$ on which the projection of the image $TX$ is small on average, with bounds depending on the dimension $n$ and the eigenvalues of the Laplacian on $X$.