Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
16 topics
4 close collaborators

Published work

20 published items

preprint | 2022 | arXiv

An Informational Space Based Semantic Analysis for Scientific Texts

One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous, and deeper understanding of semantics and the creation of human-to-machine interaction have required effort in creating schemes for the act of communication and in building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. Computational methods for extracting semantic features are used to analyse the relations between texts of messages and 'representations of situations' for a newly created large collection of scientific texts, the Leicester Scientific Corpus. The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties, with vectors of some attributes: a list of scientific subject categories that the text belongs to. First, this paper introduces the 'Meaning Space', in which the informational representation of meaning is extracted from the occurrence of a word in texts across the scientific categories, i.e., the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for the Leicester Scientific Dictionary-Core, and we investigate 'Principal Components of the Meaning' to describe the adequate dimensions of the meaning. The research in this paper lays the foundation for the geometric representation of the meaning of texts.

preprint | 2022 | arXiv

Learning from few examples with nonlinear feature maps

In this work we consider the problem of data classification in post-classical settings where the number of training examples consists of merely a few data points. We explore the phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our present analysis is on the influence of nonlinear feature transformations, mapping original data into higher- and possibly infinite-dimensional spaces, on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between intrinsic dimensions of the transformed data and the probabilities to learn successfully from few presentations.

preprint | 2022 | arXiv

Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation

Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may make it possible to search tens of thousands of neural architectures without training. Mellor et al. used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work we ask whether there exist other, and perhaps more principled, measures which could be used as determinants of success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of a neural network's feature space could be correlated with the network's performance after training. We showed, using the same setup as Mellor et al., that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.
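The notion of quasi-orthogonality mentioned in this abstract can be illustrated with a small sketch. It does not reproduce the measure used in the paper; it only shows the general effect that independent random vectors become nearly orthogonal as the dimension grows. The function name `mean_abs_cosine`, the Gaussian sampling, and the chosen sizes are assumptions made for illustration.

```python
import math
import random


def mean_abs_cosine(n_vectors, dim, seed=0):
    """Average |cos(angle)| over all pairs of random Gaussian vectors.

    Illustrative assumption: vectors have i.i.d. standard normal
    coordinates. Values near 0 mean the vectors are quasi-orthogonal.
    """
    rng = random.Random(seed)
    vecs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vectors)]
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vecs]
    total, pairs = 0.0, 0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            total += abs(dot) / (norms[i] * norms[j])
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    for dim in (3, 30, 300):
        print(dim, round(mean_abs_cosine(30, dim), 3))
```

Running the script prints the average absolute cosine for three dimensions; under the stated assumptions the value shrinks as the dimension increases.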

preprint | 2021 | arXiv

General stochastic separation theorems with optimal bounds

The phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities. In high-dimensional datasets, under broad assumptions, each point can be separated from the rest of the set by a simple and robust Fisher's discriminant (i.e., it is Fisher separable). Errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the keys to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, the stochastic separation theorems should evaluate the probability that the dataset will be Fisher separable in a given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in the present work. The general stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distributions, their convex combinations, and product distributions. The standard i.i.d. assumption was significantly relaxed. These theorems and estimates can be used both for the correction of high-dimensional data-driven AI systems and for the analysis of their vulnerabilities. The third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother's cells and sparse coding in the brain, and the explanation of the unexpected effectiveness of small neural ensembles in the high-dimensional brain.
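A minimal sketch of what Fisher separability looks like in practice is given below. It assumes the common formulation in which a point x is Fisher separable from a point y with threshold alpha when (x, y) <= alpha (x, x), and it assumes data drawn uniformly from the cube [-1, 1]^d, which is already zero-mean and isotropic, so no empirical whitening step is applied. The function name, the value alpha = 0.8, and the sample sizes are illustrative choices, not taken from the paper.

```python
import random


def fisher_separable_fraction(n_points, dim, alpha=0.8, seed=0):
    """Fraction of points that are Fisher separable from all other points.

    Assumed criterion: x is separable from y when <x, y> <= alpha * <x, x>.
    Assumed data model: i.i.d. uniform samples from the cube [-1, 1]^dim.
    """
    rng = random.Random(seed)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    separable = 0
    for i, x in enumerate(pts):
        threshold = alpha * dot(x, x)
        if all(dot(x, y) <= threshold for j, y in enumerate(pts) if j != i):
            separable += 1
    return separable / n_points


if __name__ == "__main__":
    for dim in (2, 10, 100):
        print(dim, fisher_separable_fraction(100, dim))
```

The printed fractions show how the share of separable points changes with dimension for this toy distribution; the theorems in the paper give probability estimates for much broader classes of distributions.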

preprint | 2021 | arXiv

Transition states and entangled mass action law

The classical approaches to the derivation of the (generalized) Mass Action Law (MAL) assume that the intermediate transition state (i) has a short lifetime and (ii) is in partial equilibrium with the initial reagents of the elementary reaction. The partial equilibrium assumption (ii) means that the reverse decomposition of the intermediates is much faster than their transition through other channels to the products. In this work we demonstrate how avoiding this partial equilibrium assumption modifies the reaction rates. This kinetic revision of transition state theory results in an effective 'entanglement' of reaction rates, which become linear combinations of different MAL expressions.

preprint | 2020 | arXiv

Blessing of dimensionality at the edge

In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by a few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables fast on-line optimisation using improved training samples. The approach is based on concentration of measure effects and stochastic separation theorems, and is illustrated with an example on the identification of faulty processes in Computer Numerical Control (CNC) milling and with a case study on adaptive removal of false positives in an industrial video surveillance and analytics system.

preprint | 2020 | arXiv

Formation of working memory in a spiking neuron network accompanied by astrocytes

We propose a biologically plausible computational model of working memory (WM) implemented by a spiking neuron network (SNN) interacting with a network of astrocytes. The SNN is modelled by synaptically coupled Izhikevich neurons with a non-specific connection topology. Astrocytes generating calcium signals are connected by local gap junction diffusive couplings and interact with neurons through chemicals diffused in the extracellular space. Calcium elevations occur in response to an increase in the concentration of a neurotransmitter released by spiking neurons when a group of them fire coherently. In turn, gliotransmitters are released by activated astrocytes, modulating the strengths of synaptic connections in the corresponding neuronal group. Input information is encoded as two-dimensional patterns of short applied current pulses stimulating neurons. The output is taken from the frequencies of transient discharges of the corresponding neurons. We show how a set of information patterns with quite significant overlapping areas can be uploaded into the neuron-astrocyte network and stored for several seconds. Information retrieval is organised by the application of a cue pattern representing one from the memory set distorted by noise. We found that successful retrieval, with a correlation between the recalled pattern and the ideal pattern of more than 90%, is possible for a multi-item WM task. Having analysed the dynamical mechanism of WM formation, we discovered that astrocytes operating at a time scale of about a dozen seconds can successfully store traces of neuronal activations corresponding to information patterns. In the retrieval stage, the astrocytic network selectively modulates synaptic connections in the SNN, leading to successful recall. The information and dynamical characteristics of the proposed WM model agree with classical concepts and other WM models.

preprint | 2020 | arXiv

Fractional norms and quasinorms do not help to overcome the curse of dimensionality

The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms, and the difference between them decays as the dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not mean better classifier performance, and the worst performance for different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant.
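The distance concentration effect described in this abstract can be explored with a short sketch. It assumes the usual definition of relative contrast, (D_max - D_min) / D_min, with distances measured from the origin to points drawn uniformly from [0, 1]^d. The choice of the origin as the query point, the function names, and the sample sizes are assumptions for illustration and do not reproduce the paper's experiments.

```python
import random


def lp_length(x, p):
    """l_p norm for p >= 1, or l_p quasinorm for 0 < p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_contrast(n_points, dim, p, seed=0):
    """(D_max - D_min) / D_min for distances from the origin.

    Assumed data model: i.i.d. uniform samples from the cube [0, 1]^dim.
    """
    rng = random.Random(seed)
    dists = [
        lp_length([rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        row = [round(relative_contrast(200, dim, p), 3) for dim in (2, 20, 200)]
        print(p, row)
```

Each printed row lists the relative contrast for one value of p at increasing dimension, which makes it easy to compare how quickly the contrast decays for norms and quasinorms on this toy data.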

preprint | 2020 | arXiv

High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality

High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the "curse of dimensionality" states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the "blessing of dimensionality", has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.

preprint | 2020 | arXiv

Informational Space of Meaning for Scientific Texts

In Natural Language Processing, automatically extracting the meaning of texts constitutes an important problem. Our focus is the computational analysis of the meaning of short scientific texts (abstracts or brief reports). In this paper, a vector space model is developed for quantifying the meaning of words and texts. We introduce the Meaning Space, in which the meaning of a word is represented by a vector of Relative Information Gain (RIG) about the subject categories that the text belongs to, which can be obtained from observing the word in the text. This new approach is applied to construct the Meaning Space based on the Leicester Scientific Corpus (LSC) and the Leicester Scientific Dictionary-Core (LScDC). The LSC is a scientific corpus of 1,673,350 abstracts and the LScDC is a scientific dictionary whose words are extracted from the LSC. Each text in the LSC belongs to at least one of 252 subject categories of Web of Science (WoS). These categories are used in the construction of vectors of information gains. The Meaning Space is described and statistically analysed for the LSC with the LScDC. The usefulness of the proposed representation model is evaluated through top-ranked words in each category. The most informative n words are ordered. We demonstrated that RIG-based word ranking is much more useful than ranking based on raw word frequency in determining the science-specific meaning and importance of a word. The proposed model based on RIG is shown to be able to highlight topic-specific words in categories. The most informative words are presented for 252 categories. The new scientific dictionary and the 103,998 x 252 Word-Category RIG Matrix are available online. Analysis of the Meaning Space provides us with a tool to further explore quantifying the meaning of a text using more complex and context-dependent meaning models that use co-occurrence of words and their combinations.
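The idea of representing a word by its Relative Information Gain about a category can be sketched as follows. The sketch assumes the standard definition RIG = (H(C) - H(C | W)) / H(C), where C indicates whether a text belongs to the category and W indicates whether the word occurs in the text. The toy corpus, the data layout, and the function names are hypothetical, and the paper's exact estimator may differ.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def relative_information_gain(texts, category, word):
    """RIG of `word` about membership in `category`.

    `texts` is a list of (set_of_words, set_of_categories) pairs
    (a hypothetical layout chosen for this sketch).
    """
    n = len(texts)
    in_cat = [category in cats for _, cats in texts]
    has_word = [word in words for words, _ in texts]
    p_cat = sum(in_cat) / n
    h_cat = entropy([p_cat, 1 - p_cat])
    if h_cat == 0:
        return 0.0  # the category carries no uncertainty to reduce
    h_cond = 0.0
    for flag in (True, False):
        subset = [c for c, w in zip(in_cat, has_word) if w == flag]
        if subset:
            p = sum(subset) / len(subset)
            h_cond += len(subset) / n * entropy([p, 1 - p])
    return (h_cat - h_cond) / h_cat


if __name__ == "__main__":
    corpus = [
        ({"neuron", "model"}, {"Neuroscience"}),
        ({"neuron", "data"}, {"Neuroscience"}),
        ({"graph", "model"}, {"Mathematics"}),
        ({"graph", "data"}, {"Mathematics"}),
    ]
    for w in ("neuron", "model"):
        print(w, relative_information_gain(corpus, "Neuroscience", w))
```

In this toy corpus "neuron" occurs only in Neuroscience texts while "model" is spread evenly, so the two words sit at opposite ends of the RIG scale. Collecting the RIG of one word over all categories gives the word's vector in a Meaning Space of this kind.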

preprint | 2020 | arXiv

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.

preprint | 2020 | arXiv

Personality Traits and Drug Consumption. A Story Told by Data

This is a preprint version of the first book from the series: "Stories told by data". In this book a story is told about the psychological traits associated with drug consumption. The book includes:

- A review of published works on the psychological profiles of drug users.
- Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (Database is available online.)
- An introductory description of the data mining and machine learning methods used for the analysis of this dataset.
- The demonstration that the personality traits (five factor model, impulsivity, and sensation seeking), together with simple demographic data, give the possibility of predicting the risk of consumption of individual drugs with sensitivity and specificity above 70% for most drugs.
- The analysis of correlations of use of different substances and the description of the groups of drugs with correlated use (correlation pleiades).
- Proof of significant differences of personality profiles for users of different drugs. This is explicitly proved for benzodiazepines, ecstasy, and heroin.
- Tables of personality profiles for users and non-users of 18 substances.

The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of machine learning, advanced data mining concepts or modern psychology of personality is assumed. For more detailed introduction into statistical methods we recommend several undergraduate textbooks. Familiarity with basic statistics and some experience in the use of probabilities would be helpful as well as some basic technical understanding of psychology.

preprint | 2020 | arXiv

Singularities of transient processes in dynamics and beyond

This note is a brief review of the analysis of long transients in dynamical systems. The problem of long transients arose in many disciplines, from physical and chemical kinetics to biology and even social sciences. Detailed analysis of singularities of various 'relaxation times' associated long transients with bifurcations of $ω$-limit sets, homoclinic structures (intersections of $α$- and $ω$-limit sets) and other peculiarities of dynamics. This review was stimulated by the analysis of anomalously long transients in ecology published recently by A. Morozov and S. Petrovskii with co-authors.

preprint | 2020 | arXiv

Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data

Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by 'points of no return' and 'final states' (such as lethal or recovery states). Extracting this information directly from the data remains challenging, especially in the case of synchronic (with a short-term follow-up) observations. Here we suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values, through modeling the geometrical data structure as a bouquet of bifurcating clinical trajectories. The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations. The methodology allows positioning a patient on a particular clinical trajectory (pathological scenario) and characterizing the degree of progression along it with a qualitative estimate of the uncertainty of the prognosis. Overall, our pseudotime quantification-based approach makes it possible to apply the methods developed for dynamical disease phenotyping and illness trajectory analysis (diachronic data analysis) to synchronic observational data. We developed a tool, ClinTrajan, for clinical trajectory analysis, implemented in the Python programming language. We test the methodology on two large publicly available datasets: myocardial infarction complications and readmission of diabetic patients data.

preprint | 2019 | arXiv

Basic, simple and extendable kinetic model of protein synthesis

Protein synthesis is one of the most fundamental biological processes, and it consumes a significant amount of cellular resources. Despite the existence of multiple mathematical models of translation, varying in the level of mechanistic detail, surprisingly, there is no basic and simple chemical kinetic model of this process derived directly from the detailed kinetic model. One of the reasons for this is that the translation process is characterized by an indefinite number of states, owing to the existence of polysomes. We bypass this difficulty by lumping multiple states of translated mRNA into a few dynamical variables and by introducing a variable describing the pool of translating ribosomes. The simplest model can be solved analytically under some assumptions. The basic and simple model can be extended, if necessary, to take into account various phenomena such as the interaction between translating ribosomes, a limited amount of ribosomal units, or regulation of translation by microRNA. The model can be used as a building block (translation module) for more complex models of cellular processes. We demonstrate the utility of the model in two examples. First, we determine the critical parameters of single protein synthesis for the case when the ribosomal units are abundant. Second, we demonstrate intrinsic bi-stability in the dynamics of ribosomal protein turnover and predict that a minimal number of ribosomes should pre-exist in a living cell to sustain its protein synthesis machinery, even in the absence of proliferation.

preprint | 2019 | arXiv

Multivariate Gaussian and Student-$t$ Process Regression for Multi-output Prediction

The Gaussian process model for vector-valued functions has been shown to be useful for multi-output prediction. The existing method for this model is to re-formulate the matrix-variate Gaussian distribution as a multivariate normal distribution. Although it is effective in many cases, re-formulation is not always workable and is difficult to apply to other distributions because not all matrix-variate distributions can be transformed to respective multivariate distributions, as is the case for the matrix-variate Student-$t$ distribution. In this paper, we propose a unified framework which is used not only to introduce a novel multivariate Student-$t$ process regression model (MV-TPR) for multi-output prediction, but also to reformulate the multivariate Gaussian process regression (MV-GPR) in a way that overcomes some limitations of the existing methods. Both MV-GPR and MV-TPR have closed-form expressions for the marginal likelihoods and predictive distributions under this unified framework and thus can adopt the same optimization approaches as used in the conventional GPR. The usefulness of the proposed methods is illustrated through several simulated and real data examples. In particular, we verify empirically that MV-TPR performs better on the datasets considered, including air quality prediction and bike rental prediction. Finally, the proposed methods are shown to produce profitable investment strategies in the stock markets.

preprint | 2018 | arXiv

High-dimensional brain. A tool for encoding and rapid learning of memories by single neurons

Codifying memories is one of the fundamental problems of modern Neuroscience. The functional mechanisms behind this phenomenon remain largely unknown. Experimental evidence suggests that some of the memory functions are performed by stratified brain structures such as, e.g., the hippocampus. In this particular case, single neurons in the CA1 region receive a highly multidimensional input from the CA3 area, which is a hub for information processing. We thus assess the implication of the abundance of neuronal signalling routes converging onto single cells on the information processing. We show that single neurons can selectively detect and learn arbitrary information items, given that they operate in high dimensions. The argument is based on Stochastic Separation Theorems and the concentration of measure phenomena. We demonstrate that a simple enough functional neuronal model is capable of explaining: i) the extreme selectivity of single neurons to the information content, ii) simultaneous separation of several uncorrelated stimuli or informational items from a large set, and iii) dynamic learning of new items by associating them with already "known" ones. These results constitute a basis for organization of complex memories in ensembles of single neurons. Moreover, they show that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

preprint | 2018 | arXiv

Robust and Scalable Learning of Complex Dataset Topologies via ElPiGraph

Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.

preprint | 2017 | arXiv

Knowledge Transfer Between Artificial Intelligence Systems

We consider the fundamental question: how could a legacy "student" Artificial Intelligence (AI) system learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources? Here "learning" is understood as an ability of one system to mimic responses of the other and vice versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligence system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.

preprint | 2009 | arXiv

Dynamical modeling of microRNA action on the protein translation process

Protein translation is a multistep process which can be represented as a cascade of biochemical reactions (initiation, ribosome assembly, elongation, etc.), the rate of which can be regulated by small non-coding microRNAs through multiple mechanisms. It remains unclear which mechanisms of microRNA action are most dominant; moreover, many experimental reports deliver controversial messages on which concrete mechanism is actually observed in the experiment. Parker and Nissan (Parker and Nissan, RNA, 2008) demonstrated that it is impossible to distinguish alternative biological hypotheses using steady-state data on the rate of protein synthesis. For their analysis they used two simple kinetic models of protein translation. In contrast, we show that dynamical data allow one to discriminate some of the mechanisms of microRNA action. We demonstrate this using the same models as in (Parker and Nissan, RNA, 2008) for the sake of comparison, but the methods developed (asymptotology of biochemical networks) can be used for other models. As one of the results of our analysis, we formulate a hypothesis that the effect of microRNA action is measurable and observable only if it affects the dominant system (a generalization of the limiting step notion for complex networks) of the protein translation machinery. The dominant system can vary in different experimental conditions, which can partially explain the existing controversy in some of the experimental data.