Source author record

Giuseppe Jurman

Giuseppe Jurman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Molecular Networks, Machine Learning, Quantitative Methods, physics.soc-ph, math.CO, Social and Information Networks, math.DS, math.RA, Applications, Computer Vision, Cryptography and Security, math.GR, Mathematical Software, Neural and Evolutionary Computing, Neurons and Cognition

Catalog footprint

What is connected

22 works

15 topics

4 close collaborators


preprint, 2016, arXiv

Convolutional Neural Network for Stereotypical Motor Movement Detection in Autism

Autism Spectrum Disorders (ASDs) are often associated with specific atypical postural or motor behaviors, of which Stereotypical Motor Movements (SMMs) have a specific visibility. While the identification and the quantification of SMM patterns remain complex, their automation would support accurate tuning of the intervention in the therapy of autism. Therefore, it is essential to develop automatic SMM detection systems in a real-world setting, taking care of strong inter-subject and intra-subject variability. Wireless accelerometer sensing technology can provide a valid infrastructure for real-time SMM detection; however, such variability remains a problem for machine learning methods too, in particular whenever handcrafted features extracted from the accelerometer signal are considered. Here, we propose to employ the deep learning paradigm in order to learn discriminating features from multi-sensor accelerometer signals. Our results provide preliminary evidence that feature learning and transfer learning embedded in the deep architecture yield more accurate SMM detectors in longitudinal scenarios.

preprint, 2016, arXiv

Differential network analysis and graph classification: a glocal approach

Based on the glocal HIM metric and its induced graph kernel, we propose a novel solution in differential network analysis that integrates network comparison and classification tasks. The HIM distance is defined as the one-parameter family of product metrics linearly combining the normalised Hamming distance H and the normalised Ipsen-Mikhailov spectral distance IM. The combination of the two components within a single metric allows overcoming their drawbacks and obtaining a measure that is simultaneously global and local. Furthermore, plugging the HIM kernel into a Support Vector Machine gives us a classification algorithm based on the HIM distance. First, we outline the theory underlying the metric construction. We then introduce two diverse applications of the HIM distance and the HIM kernel to biological datasets. This versatility supports the adoption of the HIM family as a general tool for information extraction, quantifying difference among diverse instances of a complex system. An Open Source implementation of the HIM metrics is provided by the R package nettools and in its web interface ReNette.
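To make the construction concrete, here is a minimal Python sketch of the local component and of the combination step. The abstract does not spell out the formula; the form used here, HIM = sqrt(H^2 + xi * IM^2) / sqrt(1 + xi), is the one usually quoted for the HIM family and should be read as an assumption. The function names are hypothetical, and the IM value is taken as an input rather than computed.

```python
import numpy as np

def hamming_distance(a, b):
    """Normalised Hamming distance between two simple graphs on the same n nodes.

    a, b: symmetric 0/1 adjacency matrices with zero diagonal.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    # Entry-wise disagreement over all ordered node pairs, scaled into [0, 1].
    return float(np.abs(a - b).sum() / (n * (n - 1)))

def him_distance(h, im, xi=1.0):
    """Combine a Hamming value h and an Ipsen-Mikhailov value im (both in [0, 1]).

    Assumed form: sqrt(h^2 + xi * im^2) / sqrt(1 + xi), with xi >= 0.
    """
    return float(np.sqrt(h ** 2 + xi * im ** 2) / np.sqrt(1.0 + xi))

# Toy check: the empty and the complete graph on 4 nodes are maximally far apart.
empty = np.zeros((4, 4))
full = np.ones((4, 4)) - np.eye(4)
h_max = hamming_distance(empty, full)
```

With xi = 0 the combination reduces to the Hamming component alone, and larger xi gives more weight to the spectral component.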

preprint, 2016, arXiv

Metric projection for dynamic multiplex networks

Evolving multiplex networks are a powerful model for representing the dynamics over time of different phenomena, such as social networks, power grids and biological pathways. However, exploring the structure of the multiplex network time series is still an open problem. Here we propose a two-step strategy to tackle this problem based on the concept of distance (metric) between networks. Given a multiplex graph, first a network of networks is built for each time step, and then a real-valued time series is obtained from the sequence of (simple) networks by evaluating the distance from the first element of the series. The effectiveness of this approach in detecting the changes occurring along the original time series is shown on a synthetic example first, and then on the Gulf dataset of political events.
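The second step, turning a sequence of simple networks into a real-valued series, can be sketched as follows. This is only an illustration: the normalised Hamming distance is a stand-in for whatever network distance the paper actually uses, the step that builds a network of networks from the multiplex layers is omitted, and the three snapshots are made-up toy graphs.

```python
import numpy as np

def hamming(a, b):
    """Normalised Hamming distance between two adjacency matrices (stand-in metric)."""
    n = a.shape[0]
    return float(np.abs(a - b).sum() / (n * (n - 1)))

def project_to_series(snapshots, distance=hamming):
    """Map a sequence of networks to the series of distances from the first one."""
    first = snapshots[0]
    return [distance(first, g) for g in snapshots]

def graph(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy sequence on 3 nodes: one edge, then two edges, then the full triangle.
snapshots = [
    graph(3, [(0, 1)]),
    graph(3, [(0, 1), (1, 2)]),
    graph(3, [(0, 1), (1, 2), (0, 2)]),
]
series = project_to_series(snapshots)
```

Jumps in such a series are what would then be inspected as candidate change points.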

preprint, 2015, arXiv

Community dynamics in connected time-dependent multilayer networks

Different strategies have been considered to extract information from social media about how similarly people react to the same news or event. In this context, a powerful method is offered by the application of graph techniques to the contents produced by social network users. In particular, large events typically attract enough content traffic over time to enable an analysis that explicitly models the dependence on the time dimension. Here we demonstrate how it is possible to extend the application of community detection strategies in complex networks to the case of time-dependent multilayer networks, whenever the connection between consecutive time layers is non-trivial. We apply the method to 400K Twitter posts related to the Expo event held in Milan (Italy) between May and October 2015.

preprint, 2015, arXiv

Seasonal Linear Predictivity in National Football Championships

Predicting the results of sport matches and competitions is an emerging research field, benefiting from the growing amount of available data and novel data analytics techniques. Excellent forecasts can be achieved by advanced machine learning methods applied to detailed historical data, especially in very popular sports such as football (soccer). Here we show that, despite the large number of confounding factors, the results of a football team in longer competitions (e.g., a national league) follow a basically linear trend that is also useful for predictive purposes. In support of this claim, we present a set of linear regression experiments on a database collecting the yearly results of 707 teams playing in 22 divisions from 11 countries, in 20 football seasons.

preprint, 2014, arXiv

DTW-MIC Coexpression Networks from Time-Course Data

When modeling coexpression networks from high-throughput time course data, Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. However, its reliability is limited since it cannot capture non-linear interactions and time shifts. Here we propose to overcome these two issues by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), combining a measure taking care of functional interactions of signals (MIC) and a measure identifying horizontal displacements (DTW). By using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on both synthetic and transcriptomic datasets.
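The time-shift component can be illustrated with a textbook dynamic time warping routine. This is a generic DTW sketch in plain Python, not the paper's DTW-MIC function: the MIC component and the way the two measures are combined are not reproduced, and the two signals are made-up toy data.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping cost between two numeric sequences.

    Uses absolute difference as the local cost and allows the three
    standard moves (match, insertion, deletion) with no window constraint.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # x advances alone
                                     cost[i][j - 1],      # y advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A toy pulse and the same pulse delayed by one time step.
signal = [0, 1, 2, 1, 0]
shifted = [0, 0, 1, 2, 1, 0]
warped_cost = dtw_distance(signal, shifted)
# Naive point-by-point comparison over the common length, for contrast.
pointwise_cost = sum(abs(a - b) for a, b in zip(signal, shifted))
```

The warping path absorbs the one-step delay, so the DTW cost between the two pulses is zero, while the point-by-point comparison penalises the shift.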

preprint, 2014, arXiv

Entropy Dynamics of Community Alignment in the Italian Parliament Time-Dependent Network

Complex institutions are typically characterized by meso-scale structures which are fundamental for the successful coordination of multiple agents. Here we introduce a framework to study the temporal dynamics of the node-community relationship based on the concept of community alignment, a measure derived from the modularity matrix that defines the alignment of a node with respect to the core of its community. The framework is applied to the 16th legislature of the Italian Parliament to study the dynamic relationship in voting behavior between Members of the Parliament (MPs) and their political parties. As a novel contribution, we introduce two entropy-based measures that capture politically interesting dynamics: the group alignment entropy (over a single snapshot), and the node alignment entropy (over multiple snapshots). We show that significant meso-scale changes in the time-dependent network structures can be detected by a combination of the two measures. We observe a steady growth of the group alignment entropy after a major internal conflict in the ruling majority and a different distribution of node alignment entropy after the government transition.

preprint, 2013, arXiv

A combinatorial model of malware diffusion via Bluetooth connections

We outline here the mathematical expression of a diffusion model for cellphone malware transmitted through Bluetooth channels. In particular, we provide the deterministic formula underlying the proposed infection model, in its equivalent recursive (simple but computationally heavy) and closed-form (more complex but efficiently computable) expressions.

preprint, 2013, arXiv

Sparse Predictive Structure of Deconvolved Functional Brain Networks

The functional and structural representation of the brain as a complex network is marked by the fact that the comparison of noisy and intrinsically correlated high-dimensional structures between experimental conditions or groups shuns typical mass univariate methods. Furthermore, most network estimation methods cannot distinguish between real and spurious correlation arising from the convolution due to nodes' interaction, which thus introduces additional noise in the data. We propose a machine learning pipeline aimed at identifying multivariate differences between brain networks associated with different experimental conditions. The pipeline (1) leverages the deconvolved individual contribution of each edge and (2) maps the task into a sparse classification problem in order to construct the associated "sparse deconvolved predictive network", i.e., a graph with the same nodes as those compared but whose edge weights are defined by their relevance for out-of-sample predictions in classification. We present an application of the proposed method by decoding the covert attention direction (left or right) based on the single-trial functional connectivity matrix extracted from high-frequency magnetoencephalography (MEG) data. Our results demonstrate how network deconvolution matched with sparse classification methods outperforms typical approaches for MEG decoding.

preprint, 2013, arXiv

The HIM glocal metric and kernel for network comparison and classification

Due to the ever-rising importance of the network paradigm across several areas of science, comparing and classifying graphs represent essential steps in the network analysis of complex systems. Both tasks have been recently tackled via quite different strategies, even tailored ad hoc for the investigated problem. Here we deal with both operations by introducing the Hamming-Ipsen-Mikhailov (HIM) distance, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices. The new measure combines the local Hamming distance and the global spectral Ipsen-Mikhailov distance so as to overcome the drawbacks affecting the two components separately. By then building the HIM kernel function derived from the HIM distance, it is possible to move from network comparison to network classification via the Support Vector Machine (SVM) algorithm. Applications of the HIM distance and HIM kernel in computational biology and social network science demonstrate the effectiveness of the proposed functions as a general-purpose solution.

preprint, 2012, arXiv

Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers

We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties, and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n=1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. Availability and Implementation: Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).

preprint, 2012, arXiv

mlpy: Machine Learning Python

mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Python 2 and 3 and it is distributed under GPL3 at the website http://mlpy.fbk.eu.

preprint, 2012, arXiv

Stability Indicators in Network Reconstruction

The number of algorithms available to reconstruct a biological network from a dataset of high-throughput measurements is nowadays overwhelming, but evaluating their performance when the gold standard is unknown is a difficult task. Here we propose to use a few reconstruction stability tools as a quantitative solution to this problem. We introduce four indicators to quantitatively assess the stability of a reconstructed network in terms of variability with respect to data subsampling. In particular, we give a measure of the mutual distances among the set of networks generated by a collection of data subsets (and from the network generated on the whole dataset) and we rank nodes and edges according to their decreasing variability within the same set of networks. As a key ingredient, we employ a global/local network distance combined with a bootstrap procedure. We demonstrate the use of the indicators in a controlled situation on a toy dataset, and we show their application on a miRNA microarray dataset with paired tumoral and non-tumoral tissues extracted from a cohort of 241 hepatocellular carcinoma patients.

preprint, 2011, arXiv

A machine learning pipeline for discriminant pathways identification

Motivation: Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. Results: In this work we propose a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. The proposal is independent of the classification algorithm used. Three applications on genome-wide data are presented regarding children's susceptibility to air pollution and two neurodegenerative diseases: Parkinson's and Alzheimer's. Availability: Details about the software used for the experiments discussed in this paper are provided in the Appendix.

preprint, 2011, arXiv

Biological network comparison via Ipsen-Mikhailov distance

Highlighting similarities and differences between networks is an informative task in investigating many biological processes. Typical examples are detecting differences between an inferred network and the corresponding gold standard, or evaluating changes in a dynamic network along time. Although fruitful insights can be drawn by qualitative or feature-based methods, a distance must be used whenever a quantitative assessment is required. Here we introduce the Ipsen-Mikhailov metric for biological network comparison, based on the difference of the distributions of the Laplacian eigenvalues of the compared graphs. Being a spectral measure, its focus is on the general structure of the net so it can overcome the issues affecting local metrics such as the edit distances. Relation with the classical Matthews Correlation Coefficient (MCC) is discussed, showing the finer discriminant resolution achieved by the Ipsen-Mikhailov metric. We conclude with three examples of application in functional genomic tasks, including stability of network reconstruction as robustness to data subsampling, variability in dynamical networks and differences in networks associated to a classification task.
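A rough numerical sketch of a spectral distance of this kind is given below. It follows the commonly described Ipsen-Mikhailov recipe (Lorentzian-smoothed density of the square roots of the Laplacian eigenvalues, then the L2 distance between densities), but several choices are assumptions rather than details taken from the paper: the bandwidth gamma = 0.08, dropping only the single smallest eigenvalue, the finite integration range, and the absence of any normalisation to [0, 1]. All function names are hypothetical.

```python
import numpy as np

def laplacian_frequencies(adj):
    """Square roots of the Laplacian eigenvalues, without the smallest (zero) one."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.clip(np.linalg.eigvalsh(lap), 0.0, None)  # guard against tiny negatives
    return np.sqrt(eig[1:])

def spectral_density(freqs, grid, gamma):
    """Sum of Lorentzians centred on freqs, normalised to unit mass on [0, inf)."""
    lorentz = gamma / ((grid[:, None] - freqs[None, :]) ** 2 + gamma ** 2)
    norm = np.sum(np.pi / 2.0 + np.arctan(freqs / gamma))  # exact integral of the sum
    return lorentz.sum(axis=1) / norm

def ipsen_mikhailov(a, b, gamma=0.08, steps=20001):
    """Approximate L2 distance between the smoothed spectral densities of a and b."""
    fa, fb = laplacian_frequencies(a), laplacian_frequencies(b)
    upper = max(fa.max(), fb.max()) + 20.0  # truncation point: an assumption
    grid = np.linspace(0.0, upper, steps)
    diff2 = (spectral_density(fa, grid, gamma) - spectral_density(fb, grid, gamma)) ** 2
    # Trapezoidal rule written out by hand.
    integral = np.sum((diff2[1:] + diff2[:-1]) / 2.0 * np.diff(grid))
    return float(np.sqrt(integral))

def graph(n, edges):
    """Symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

path = graph(4, [(0, 1), (1, 2), (2, 3)])
relabelled_path = graph(4, [(2, 0), (0, 3), (3, 1)])  # same path, nodes renamed
star = graph(4, [(0, 1), (0, 2), (0, 3)])

d_same_shape = ipsen_mikhailov(path, relabelled_path)
d_path_star = ipsen_mikhailov(path, star)
```

Because only the spectrum enters, the relabelled path is at (numerically) zero distance from the original even though their edge sets differ, while the path and the star are separated. This is the sense in which a spectral measure looks at overall structure rather than at individual edges.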

preprint, 2011, arXiv

Single-base mismatch profiles for NGS samples

Within the preprocessing pipeline of a Next Generation Sequencing sample, its set of Single-Base Mismatches is one of the first outcomes, together with the number of correctly aligned reads. The union of these two sets provides a 4x4 matrix (called Single Base Indicator, SBI in what follows) representing a blueprint of the sample and its preprocessing ingredients such as the sequencer, the alignment software, the pipeline parameters. In this note we show that, under the same technological conditions, there is a strong relation between the SBI and the biological nature of the sample. To reach this goal we need to introduce a similarity measure between SBIs: we also show how two measures commonly used in machine learning can be of help in this context.

preprint, 2010, arXiv

(Finite) presentations of Bi-Zassenhaus loop algebras

We prove that Bi-Zassenhaus loop algebras are finitely presented up to central and second central elements. In particular, we show an explicit finite presentation for a Lie algebra whose quotient over its second centre is isomorphic to a Bi-Zassenhaus loop algebra.

preprint, 2010, arXiv

A unifying view for performance measures in multi-class prediction

In the last few years, many different performance measures have been introduced to overcome the weakness of the most natural metric, the Accuracy. Among them, the Matthews Correlation Coefficient has recently gained popularity among researchers not only in machine learning but also in several application fields such as bioinformatics. Nonetheless, further novel functions are being proposed in the literature. We show that Confusion Entropy, a recently introduced classifier performance measure for multi-class problems, has a strong (monotone) relation with the multi-class generalization of a classical metric, the Matthews Correlation Coefficient. Computational evidence in support of the claim is provided, together with an outline of the theoretical explanation.
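For reference, the multi-class Matthews Correlation Coefficient can be computed directly from a confusion matrix. The sketch below uses the standard generalisation based on row and column totals; the abstract does not state the formula, so treating it as the one meant here is an assumption. Confusion Entropy is not implemented, and the example matrices are made up.

```python
from math import sqrt

def multiclass_mcc(confusion):
    """Multi-class MCC from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    Returns 0.0 when the denominator vanishes (e.g. a single predicted class).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # column sums
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt((total ** 2 - sum(p * p for p in pred_counts))
                       * (total ** 2 - sum(t * t for t in true_counts)))
    return numerator / denominator if denominator else 0.0

# Toy three-class classifier with no errors.
perfect = [[3, 0, 0], [0, 4, 0], [0, 0, 5]]
mcc_perfect = multiclass_mcc(perfect)

# Toy binary case: rows are true classes, columns are predictions.
binary = [[5, 1], [2, 4]]
mcc_binary = multiclass_mcc(binary)
```

In the two-class case this expression reduces to the familiar (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).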

preprint, 2010, arXiv

Algebraic Comparison of Partial Lists in Bioinformatics

The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.

preprint, 2010, arXiv

An introduction to spectral distances in networks (extended version)

Many functions have been recently defined to assess the similarity among networks as tools for quantitative comparison. They stem from very different frameworks - and they are tuned for dealing with different situations. Here we show an overview of the spectral distances, highlighting their behavior in some basic cases of static and dynamic synthetic and real networks.

preprint, 2010, arXiv

Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms

Motivation: Reconstructing the topology of a gene regulatory network is one of the key tasks in systems biology. Despite the wide variety of proposed methods, very little work has been dedicated to the assessment of their stability properties. Here we present a methodical comparison of the performance of a novel method (RegnANN) for gene network inference based on multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER), focusing our analysis on the prediction variability induced by both the network intrinsic structure and the available data. Results: The extensive evaluation on both synthetic data and a selection of gene modules of "Escherichia coli" indicates that all the algorithms suffer from instability and variability issues with regard to the reconstruction of the topology of the network. This instability makes it objectively very hard to establish which method performs best. Nevertheless, RegnANN shows MCC scores that compare very favorably with all the other inference methods tested. Availability: The software for the RegnANN inference algorithm is distributed under GPL3 and is available at the corresponding author's home page (http://mpba.fbk.eu/grimaldi/regnann-supmat).

preprint, 2008, arXiv

The structure of thin Lie algebras up to the second diamond

Thin Lie algebras are Lie algebras L, graded over the positive integers, with all homogeneous components of dimension at most two, and satisfying a more stringent but natural narrowness condition modeled on an analogous one for pro-p groups. The two-dimensional homogeneous components of L, which include that of degree one, are named diamonds. Infinite-dimensional thin Lie algebras with various diamond patterns have been produced, over fields of positive characteristic, as loop algebras of suitable finite-dimensional simple Lie algebras, of classical or of Cartan type depending on the location of the second diamond. The goal of this paper is a description of the initial structure of a thin Lie algebra, up to the second diamond. Specifically, if L_k is the second diamond of L, then the quotient L/L^k is a graded Lie algebra of maximal class. In characteristic not two, L/L^k is known to be metabelian, and hence uniquely determined up to isomorphism by its dimension k, which ranges in an explicitly known set of possible values. The quotient L/L^k need not be metabelian in characteristic two. We describe here all the possibilities for L/L^k up to isomorphism. In particular, we prove that k+1 equals a power of two.
