Source author record

Vladimir Pestov

Vladimir Pestov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Data Structures and Algorithms, Information Retrieval, Databases, funct-an, math.DS, math.FA, math.GR, math.OA

Catalog footprint

What is connected

17 works

9 topics

3 close collaborators


preprint · 2020 · arXiv

A note on the groups of finite type and the Hartman-Mycielski construction

Ando, Matsuzawa, Thom, and Törnquist have resolved a problem by Sorin Popa by constructing an example of a Polish group of unitary operators with the strong operator topology, whose left and right uniform structures coincide, but which does not embed into the unitary group of a finite von Neumann algebra. The question remained whether such a group can be connected. Here we observe that a connected (in fact, homeomorphic to the Hilbert space) example is obtained from the example of the above authors via the Hartman--Mycielski construction.

preprint · 2020 · arXiv

An amenability-like property of finite energy path and loop groups

We show that the groups of finite energy loops and paths (that is, those of Sobolev class $H^1$) with values in a compact connected Lie group, as well as their central extensions, satisfy an amenability-like property: they admit a left-invariant mean on the space of bounded functions uniformly continuous with regard to a left-invariant metric. Every strongly continuous unitary representation $π$ of such a group (which we call skew-amenable) has a conjugation-invariant state on $B({\mathcal H}_π)$.

preprint · 2013 · arXiv

Text Categorization via Similarity Search: An Efficient and Effective Novel Algorithm

We present a supervised learning algorithm for text categorization which earned the team of authors 2nd place in the text categorization division of the 2012 Cybersecurity Data Mining Competition (CDMC'2012) and a 3rd prize overall. The algorithm is quite different from existing approaches in that it is based on similarity search in the metric space of measure distributions on the dictionary. At the preprocessing stage, given a labeled learning sample of texts, we associate to every class label (document category) a point in the space in question. Unlike what is usual in clustering, this point is not a centroid of the category but rather an outlier, a uniform measure distribution on a selection of domain-specific words. At the execution stage, an unlabeled text is assigned a text category as defined by the closest labeled neighbour to the point representing the frequency distribution of the words in the text. The algorithm is both effective and efficient, as further confirmed by experiments on the Reuters 21578 dataset.
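The two stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the metric on distributions or the rule for selecting domain-specific words, so the L1 distance and the hand-picked word lists below are assumptions made purely for illustration, and all function names are hypothetical.

```python
from collections import Counter


def word_distribution(text):
    """Empirical frequency distribution of the words in a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def uniform_distribution(words):
    """Uniform measure on a selection of words (the class representative)."""
    words = set(words)
    return {word: 1.0 / len(words) for word in words}


def l1_distance(p, q):
    """L1 distance between two distributions stored as dicts (assumed metric)."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def classify(text, class_points):
    """Assign the label whose representative point is closest to the text."""
    p = word_distribution(text)
    return min(class_points, key=lambda label: l1_distance(p, class_points[label]))


# Hypothetical word selections; the paper derives them from a labeled sample.
class_points = {
    "finance": uniform_distribution(["bank", "loan", "rate", "stock"]),
    "sports": uniform_distribution(["match", "goal", "team", "score"]),
}

print(classify("the team scored a late goal to win the match", class_points))
```

Each category is represented by a single point, so classifying a text costs one distance computation per category rather than one per training document.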

preprint · 2012 · arXiv

Is the k-NN classifier in high dimensions affected by the curse of dimensionality?

There is an increasing body of evidence suggesting that exact nearest neighbour search in high-dimensional spaces is affected by the curse of dimensionality at a fundamental level. Does it necessarily mean that the same is true for learning algorithms based on $k$ nearest neighbours, such as the $k$-NN classifier? We analyse this question at a number of levels and show that the answer is different at each of them. As our first main observation, we show the consistency of a $k$ approximate nearest neighbour classifier. However, the performance of the classifier in very high dimensions is provably unstable. As our second main observation, we point out that the existing model for statistical learning is oblivious to the dimension of the domain and so every learning problem admits a universally consistent deterministic reduction to the one-dimensional case by means of a Borel isomorphism.

preprint · 2012 · arXiv

Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $Ω$, equipped with a distance, $ρ$, and an underlying probability distribution, $μ$. While performing an asymptotic analysis, we send the intrinsic dimension $d$ of $Ω$ to infinity, and assume that the size of a dataset, $n$, grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $ω\in Ω$, where the query points are subject to the same probability distribution $μ$ as datapoints. Let $\mathscr F$ denote a class of all 1-Lipschitz functions on $Ω$ that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets $\{ω\colon f(ω)\geq a\}$, $a\in\mathbb{R}$, is $o(n^{1/4}/\log^2 n)$. (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption $d^{O(1)}$ is reasonable.) We deduce the $Ω(n^{1/4})$ lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in $(Ω,X)$. In particular, this bound is superpolynomial in $d$.

preprint · 2012 · arXiv

PAC learnability under non-atomic measures: a problem by Vidyasagar

In response to a 1997 problem of M. Vidyasagar, we state a criterion for PAC learnability of a concept class $\mathscr C$ under the family of all non-atomic (diffuse) measures on the domain $Ω$. The uniform Glivenko--Cantelli property with respect to non-atomic measures is no longer a necessary condition, and consistent learnability cannot in general be expected. Our criterion is stated in terms of a combinatorial parameter $\mathrm{VC}({\mathscr C}\,{\mathrm{mod}}\,ω_1)$ which we call the VC dimension of $\mathscr C$ modulo countable sets. The new parameter is obtained by "thickening up" single points in the definition of VC dimension to uncountable "clusters". Equivalently, $\mathrm{VC}({\mathscr C}\,{\mathrm{mod}}\,ω_1)\leq d$ if and only if every countable subclass of $\mathscr C$ has VC dimension $\leq d$ outside a countable subset of $Ω$. The new parameter can also be expressed as the classical VC dimension of $\mathscr C$ calculated on a suitable subset of a compactification of $Ω$. We do not make any measurability assumptions on $\mathscr C$, assuming instead the validity of Martin's Axiom (MA). Similar results are obtained for function learning in terms of fat-shattering dimension modulo countable sets, but, just like in the classical distribution-free case, the finiteness of this parameter is sufficient but not necessary for PAC learnability under non-atomic measures.

preprint · 2011 · arXiv

Indexability, concentration, and VC theory

The degrading performance of indexing schemes for exact similarity search in high dimensions has long been linked to the concentration of histograms of distributions of distances and other 1-Lipschitz functions. We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning.

preprint · 2011 · arXiv

PAC learnability versus VC dimension: a footnote to a basic result of statistical learning

A fundamental result of statistical learning theory states that a concept class is PAC learnable if and only if it is a uniform Glivenko-Cantelli class if and only if the VC dimension of the class is finite. However, the theorem is only valid under special assumptions of measurability of the class, in which case the PAC learnability even becomes consistent. Otherwise, there is a classical example, constructed under the Continuum Hypothesis by Dudley and Durst and further adapted by Blumer, Ehrenfeucht, Haussler, and Warmuth, of a concept class of VC dimension one which is neither uniform Glivenko-Cantelli nor consistently PAC learnable. We show that, rather surprisingly, under an additional set-theoretic hypothesis which is much milder than the Continuum Hypothesis (Martin's Axiom), PAC learnability is equivalent to finite VC dimension for every concept class.
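For readers less familiar with the combinatorial parameter at the centre of this abstract, the classical VC dimension can be computed by brute force when both the domain and the concept class are finite. The sketch below illustrates only that textbook definition (the largest size of a subset shattered by the class); it says nothing about the measurability or set-theoretic questions the paper addresses, and the function names are hypothetical.

```python
from itertools import combinations


def is_shattered(points, concepts):
    """True if the concepts cut out every one of the 2^k subsets of `points`."""
    patterns = {tuple(x in concept for x in points) for concept in concepts}
    return len(patterns) == 2 ** len(points)


def vc_dimension(domain, concepts):
    """Largest k such that some k-element subset of the domain is shattered."""
    dimension = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(subset, concepts) for subset in combinations(domain, k)):
            dimension = k
        else:
            # Subsets of shattered sets are shattered, so no larger set can be.
            break
    return dimension


domain = range(5)
# Threshold concepts {x : x >= t} and interval concepts {x : a <= x < b}.
thresholds = [frozenset(x for x in domain if x >= t) for t in range(6)]
intervals = [frozenset(range(a, b)) for a in range(6) for b in range(a, 6)]

print(vc_dimension(domain, thresholds), vc_dimension(domain, intervals))
```

Thresholds cannot realise the pattern "smaller point in, larger point out", and intervals cannot pick the two outer points of a triple without the middle one, which is what limits the two classes.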

preprint · 2010 · arXiv

A note on sample complexity of learning binary output neural networks under fixed input distributions

We show that the learning sample complexity of a sigmoidal neural network constructed by Sontag (1992) required to achieve a given misclassification error under a fixed purely atomic distribution can grow arbitrarily fast: for any prescribed rate of growth there is an input distribution having this rate as the sample complexity, and the bound is asymptotically tight. The rate can be superexponential, a non-recursive function, etc. We further observe that Sontag's ANN is not Glivenko-Cantelli under any input distribution having a non-atomic part.

preprint · 2010 · arXiv

Intrinsic Dimensionality

This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching in Metric Spaces discusses the notion of intrinsic dimensionality of data in the context of similarity search.

preprint · 2010 · arXiv

PAC learnability of a concept class under non-atomic measures: a problem by Vidyasagar

In response to a 1997 problem of M. Vidyasagar, we state a necessary and sufficient condition for distribution-free PAC learnability of a concept class $\mathscr C$ under the family of all non-atomic (diffuse) measures on the domain $Ω$. Clearly, finiteness of the classical Vapnik-Chervonenkis dimension of $\mathscr C$ is a sufficient, but no longer necessary, condition. Besides, learnability of $\mathscr C$ under non-atomic measures does not imply the uniform Glivenko-Cantelli property with regard to non-atomic measures. Our learnability criterion is stated in terms of a combinatorial parameter $\mathrm{VC}({\mathscr C}\,{\mathrm{mod}}\,ω_1)$ which we call the VC dimension of $\mathscr C$ modulo countable sets. The new parameter is obtained by "thickening up" single points in the definition of VC dimension to uncountable "clusters". Equivalently, $\mathrm{VC}({\mathscr C}\,{\mathrm{mod}}\,ω_1)\leq d$ if and only if every countable subclass of $\mathscr C$ has VC dimension $\leq d$ outside a countable subset of $Ω$. The new parameter can also be expressed as the classical VC dimension of $\mathscr C$ calculated on a suitable subset of a compactification of $Ω$. We do not make any measurability assumptions on $\mathscr C$, assuming instead the validity of Martin's Axiom (MA).

preprint · 2010 · arXiv

Predictive PAC learnability: a paradigm for learning from exchangeable input data

Exchangeable random variables form an important and well-studied generalization of i.i.d. variables; however, simple examples show that no nontrivial concept or function classes are PAC learnable under general exchangeable data inputs $X_1,X_2,\ldots$. Inspired by the work of Berti and Rigo on a Glivenko--Cantelli theorem for exchangeable inputs, we propose a new paradigm, adequate for learning from exchangeable data: predictive PAC learnability. A learning rule $\mathcal L$ for a function class $\mathscr F$ is predictive PAC if for every $ε,δ>0$ and each function $f\in {\mathscr F}$, whenever $|σ|\geq s(δ,ε)$, we have with confidence $1-δ$ that the expected difference between $f(X_{n+1})$ and the image of $f\vert σ$ under $\mathcal L$ does not exceed $ε$ conditionally on $X_1,X_2,\ldots,X_n$. Thus, instead of learning the function $f$ as such, we are learning to a given accuracy $ε$ the predictive behaviour of $f$ at the future points $X_i(ω)$, $i>n$, of the sample path. Using de Finetti's theorem, we show that if a universally separable function class $\mathscr F$ is distribution-free PAC learnable under i.i.d. inputs, then it is distribution-free predictive PAC learnable under exchangeable inputs, with a slightly worse sample complexity.

preprint · 2009 · arXiv

Concentration of measure and whirly actions of Polish groups

A weakly continuous near-action of a Polish group $G$ on a standard Lebesgue measure space $(X,μ)$ is whirly if for every $A\subseteq X$ of strictly positive measure and every neighbourhood $V$ of identity in $G$ the set $VA$ has full measure. This is a strong version of ergodicity, and locally compact groups never admit whirly actions. On the contrary, every ergodic near-action by a Polish Lévy group in the sense of Gromov and Milman, such as $U(\ell^2)$, is whirly (Glasner--Tsirelson--Weiss). We give examples of closed subgroups of the group $\operatorname{Aut}(X,μ)$ of measure preserving automorphisms of a standard Lebesgue measure space (with the weak topology) whose tautological action on $(X,μ)$ is whirly, and which are not Lévy groups, thus answering a question of Glasner and Weiss.

preprint · 2009 · arXiv

Curse of Dimensionality in Pivot-based Indexes

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_d$ picked in i.i.d. fashion from metric spaces $Ω_d$. We allow the size of the dataset $n=n_d$ to be such that $d$, the ``dimension'', is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $Ω_d$ in the sense of concentration of measure phenomenon is $O(d)$, then the performance of similarity search pivot-based indexes is asymptotically linear in $n$.
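To make the object of study concrete, here is a minimal sketch of a pivot-based index for exact nearest neighbour search, using the cost model mentioned in the abstract (only distance computations are counted). It is a generic pivot-table scheme, not the construction analysed in the paper; the class name `PivotIndex`, the choice of the first four data points as pivots, and the two-dimensional random data are assumptions for illustration.

```python
import random


class PivotIndex:
    """Pivot table: stores d(x, p) for every data point x and pivot p."""

    def __init__(self, data, pivots, dist):
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        self.table = [[dist(x, p) for p in self.pivots] for x in self.data]

    def nearest(self, query):
        """Exact nearest neighbour: (point, distance, distance computations)."""
        query_to_pivots = [self.dist(query, p) for p in self.pivots]
        computations = len(self.pivots)
        best, best_dist = None, float("inf")
        for x, row in zip(self.data, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for each pivot p.
            lower_bound = max(
                (abs(a - b) for a, b in zip(query_to_pivots, row)), default=0.0
            )
            if lower_bound >= best_dist:
                continue  # x cannot beat the current best; skip the real distance
            d = self.dist(query, x)
            computations += 1
            if d < best_dist:
                best, best_dist = x, d
        return best, best_dist, computations


def euclidean(a, b):
    return sum((s - t) ** 2 for s, t in zip(a, b)) ** 0.5


rng = random.Random(0)
data = [tuple(rng.random() for _ in range(2)) for _ in range(500)]
index = PivotIndex(data, data[:4], euclidean)
query = (0.5, 0.5)
point, d, cost = index.nearest(query)
print(f"nearest neighbour at distance {d:.4f} using {cost} distance computations")
```

In low dimensions the pivot lower bounds discard many candidates. The result stated in the abstract is that, as the intrinsic dimension grows, this pruning stops helping and the cost becomes asymptotically linear in $n$.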

preprint · 2007 · arXiv

Intrinsic dimension of a dataset: what properties does one expect?

We propose an axiomatic approach to the concept of an intrinsic dimension of a dataset, based on a viewpoint of geometry of high-dimensional structures. Our first axiom postulates that high values of dimension be indicative of the presence of the curse of dimensionality (in a certain precise mathematical sense). The second axiom requires the dimension to depend smoothly on a distance between datasets (so that the dimension of a dataset and that of an approximating principal manifold would be close to each other). The third axiom is a normalization condition: the dimension of the Euclidean $n$-sphere $\mathbb{S}^n$ is $Θ(n)$. We give an example of a dimension function satisfying our axioms, even though it is in general computationally infeasible, and discuss a computationally cheap function satisfying most but not all of our axioms (the "intrinsic dimensionality" of Chávez et al.).
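The computationally cheap function mentioned at the end of this abstract is usually stated as $ρ = μ^2 / (2σ^2)$, where $μ$ and $σ^2$ are the mean and variance of the histogram of pairwise distances. The sketch below assumes that standard definition; the Gaussian test data, the sample sizes, and the function names are illustrative choices that do not come from the paper.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def intrinsic_dimensionality(distances):
    """rho = mu^2 / (2 * sigma^2) for the mean and variance of the distances."""
    distances = list(distances)
    mu = mean(distances)
    variance = pvariance(distances, mu)
    if variance == 0:
        raise ValueError("all distances are equal; the ratio is undefined")
    return mu * mu / (2 * variance)


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [
        sum((s - t) ** 2 for s, t in zip(a, b)) ** 0.5
        for a, b in combinations(points, 2)
    ]


def gaussian_sample(n, dim, rng):
    """n points with independent standard normal coordinates (illustrative data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
low = intrinsic_dimensionality(pairwise_distances(gaussian_sample(100, 2, rng)))
high = intrinsic_dimensionality(pairwise_distances(gaussian_sample(100, 20, rng)))
print(f"2 coordinates: {low:.1f}, 20 coordinates: {high:.1f}")
```

A tightly concentrated distance histogram (small variance relative to the mean) gives a large value, which is the link to the curse of dimensionality that the first axiom asks for.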

preprint · 1999 · arXiv

A geometric framework for modelling similarity search

The aim of this paper is to propose a geometric framework for modelling similarity search in large and multidimensional data spaces of general nature, which seems to be flexible enough to address such issues as analysis of complexity, indexability, and the "curse of dimensionality." Such a framework is provided by the concept of the so-called similarity workload, which is a probability metric space $Ω$ (query domain) with a distinguished finite subspace $X$ (dataset), together with an assembly of concepts, techniques, and results from metric geometry. They include such notions as metric transform, $ε$-entropy, and the phenomenon of concentration of measure on high-dimensional structures. In particular, we discuss the relevance of the latter to understanding the curse of dimensionality. As some of those concepts and techniques are being currently reinvented by the database community, it seems desirable to try and bridge the gap between database research and the relevant work already done in geometry and analysis.

preprint · 1997 · arXiv

Two 1935 questions of Mazur about polynomials in Banach spaces: a counter-example

We construct a continuous scalar-valued 2-polynomial, $W$, on the separable Hilbert space $l_2$ and an unbounded set $R\subset l_2$ such that (i) $W$ is bounded on an $ε$-neighbourhood of $R$; (ii) $W$ is unbounded on $\frac{1}{2}R$; (iii) consequently, $W$ does not factor through any bounded 1-polynomial on $l_2$ sending $R$ to a bounded set. This answers in the negative two 1935 questions asked by Mazur (problems 55 and 75 in the Scottish Book). The construction is valid both over $\mathbb{R}$ and $\mathbb{C}$. (In finite dimensions the questions were answered in the positive by Auerbach soon after being asked.)
