Source author record

Tijana Milenkovic

Tijana Milenkovic appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Molecular Networks Social and Information Networks Machine Learning physics.soc-ph Computational Engineering, Finance, and Science Discrete Mathematics

Catalog footprint

What is connected

14works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Data-driven biological network alignment that uses topological, sequence, and functional information

Many proteins remain functionally unannotated. Sequence alignment (SA) uncovers missing annotations by transferring functional knowledge between species' sequence-conserved regions. Because SA is imperfect, network alignment (NA) complements SA by transferring functional knowledge between conserved biological network, rather than just sequence, regions of different species. Existing NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are almost as topologically similar as functionally related proteins. So, we redefined NA as a data-driven framework, TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to the proteins' functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, its alignments yielded higher protein functional prediction accuracy than alignments of existing NA methods, even those that used both topological and sequence information. Here, we propose TARA++ that is also data-driven, like TARA and unlike other existing methods, but that uses across-network sequence information on top of within-network topological information, unlike TARA. To deal with the within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms protein functional prediction accuracy of existing methods.

preprint2020arXiv

Data-driven network alignment

Biological network alignment (NA) aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for transfer of functional knowledge between the aligned nodes. However, current NA methods do not end up aligning functionally related nodes. A likely reason is that they assume it is topologically similar nodes that are functionally related. However, we show that this assumption does not hold well. So, a paradigm shift is needed with how the NA problem is approached. We redefine NA as a data-driven framework, TARA (daTA-dRiven network Alignment), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity, like traditional NA methods do. TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns. We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. Clearly, TARA as currently implemented uses topological but not protein sequence information for this task. We find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance.

preprint2020arXiv

Heterogeneous network approach to predict individuals' mental health

Depression and anxiety are critical public health issues affecting millions of people around the world. To identify individuals who are vulnerable to depression and anxiety, predictive models have been built that typically utilize data from one source. Unlike these traditional models, in this study, we leverage a rich heterogeneous data set from the University of Notre Dame's NetHealth study that collected individuals' (student participants') social interaction data via smartphones, health-related behavioral data via wearables (Fitbit), and trait data from surveys. To integrate the different types of information, we model the NetHealth data as a heterogeneous information network (HIN). Then, we redefine the problem of predicting individuals' mental health conditions (depression or anxiety) in a novel manner, as applying to our HIN a popular paradigm of a recommender system (RS), which is typically used to predict the preference that a person would give to an item (e.g., a movie or book). In our case, the items are the individuals' different mental health states. We evaluate four state-of-the-art RS approaches. Also, we model the prediction of individuals' mental health as another problem type - that of node classification (NC) in our HIN, evaluating in the process four node features under logistic regression as a proof-of-concept classifier. We find that our RS and NC network methods produce more accurate predictions than a logistic regression model using the same NetHealth data in the traditional non-network fashion as well as a random-approach. Also, we find that the best of the considered RS approaches outperforms all considered NC approaches. This is the first study to integrate smartphone, wearable sensor, and survey data in an HIN manner and use RS or NC on the HIN to predict individuals' mental health conditions.

preprint2020arXiv

Inference of the Dynamic Aging-related Biological Subnetwork via Network Propagation

Gene expression (GE) data capture valuable condition-specific information ("condition" can mean a biological process, disease stage, age, patient, etc.) However, GE analyses ignore physical interactions between gene products, i.e., proteins. Since proteins function by interacting with each other, and since biological networks (BNs) capture these interactions, BN analyses are promising. However, current BN data fail to capture condition-specific information. Recently, GE and BN data have been integrated using network propagation (NP) to infer condition-specific BNs. However, existing NP-based studies result in a static condition-specific subnetwork, even though cellular processes are dynamic. A dynamic process of our interest is human aging. We use prominent existing NP methods in a new task of inferring a dynamic rather than static condition-specific (aging-related) subnetwork. Then, we study evolution of network structure with age - we identify proteins whose network positions significantly change with age and predict them as new aging-related candidates. We validate the predictions via e.g., functional enrichment analyses and literature search. Dynamic network inference via NP yields higher prediction quality than the only existing method for inferring a dynamic aging-related BN, which does not use NP.

preprint2020arXiv

Network-based protein structural classification

Experimental determination of protein function is resource-consuming. As an alternative, computational prediction of protein function has received attention. In this context, protein structural classification (PSC) can help, by allowing for determining structural classes of currently unclassified proteins based on their features, and then relying on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct 3-dimensional (3D) structure-based protein features. In contrast, we first model 3D structures of proteins as protein structure networks (PSNs). Then, we use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs. When evaluated on a large set of ~9,400 CATH and ~12,800 SCOP protein domains (spanning 36 PSN sets), our proposed approaches are superior to existing PSC approaches in terms of accuracy, with comparable running time.

preprint2020arXiv

Pairwise versus multiple network alignment

Biological network alignment (NA) aims to identify similar regions between molecular networks of different species. NA can be local or global. Just as the recent trend in the NA field, we also focus on global NA, which can be pairwise (PNA) and multiple (MNA). PNA produces aligned node pairs between two networks. MNA produces aligned node clusters between more than two networks. Recently, the focus has shifted from PNA to MNA, because MNA captures conserved regions between more networks than PNA (and MNA is thus considered to be more insightful), though at higher computational complexity. The issue is that, due to the different outputs of PNA and MNA, a PNA method is only compared to other PNA methods, and an MNA method is only compared to other MNA methods. Comparison of PNA against MNA must be done to evaluate whether MNA's higher complexity is justified by its higher accuracy. We introduce a framework that allows for this. We evaluate eight prominent PNA and MNA methods, on synthetic and real-world biological networks, using topological and functional alignment quality measures. We compare PNA against MNA in both a pairwise (native to PNA) and multiple (native to MNA) manner. PNA is expected to perform better under the pairwise evaluation framework. Indeed this is what we find. MNA is expected to perform better under the multiple evaluation framework. Shockingly, we find this not to always hold; PNA is often better than MNA in this framework, depending on the choice of evaluation test.

preprint2016arXiv

IGLOO: Integrating global and local biological network alignment

Analogous to genomic sequence alignment, biological network alignment (NA) aims to find regions of similarities between molecular networks (rather than sequences) of different species. NA can be either local (LNA) or global (GNA). LNA aims to identify highly conserved common subnetworks, which are typically small, while GNA aims to identify large common subnetworks, which are typically suboptimally conserved. We recently showed that LNA and GNA yield complementary results: LNA has high functional but low topological alignment quality, while GNA has high topological but low functional alignment quality. Thus, we propose IGLOO, a new approach that integrates GNA and LNA in hope to reconcile the two. We evaluate IGLOO against state-of-the-art LNA (NetworkBLAST, NetAligner, AlignNemo, and AlignMCL) and GNA (GHOST, NETAL, GEDEVO, MAGNA++, WAVE, and L-GRAAL) methods. We show that IGLOO allows for a trade-off between topological and functional alignment quality better than the existing LNA and GNA methods considered in our study.

preprint2016arXiv

Multiple network alignment via multiMAGNA++

Network alignment (NA) aims to find a node mapping between molecular networks of different species that identifies topologically or functionally similar network regions. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction in addition to node conservation may result in superior alignments. Thus, we present a novel MNA approach called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ generally outperforms or is on par with the existing MNA methods, while often completing faster than the existing methods. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for more complete alignment characterization as well as more fair MNA method comparison compared to using only the existing alignment quality measures.

preprint2016arXiv

SCOUT: simultaneous time segmentation and community detection in dynamic networks

Many evolving complex systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which identifies groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share one community organization. In reality, the truth likely lies between these two extremes, since some time periods can have community organization that evolves while others can have community organization that stays the same. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which typically consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity.

preprint2015arXiv

Local versus Global Biological Network Alignment

Network alignment (NA) aims to find regions of similarities between molecular networks of different species. There exist two NA categories: local (LNA) or global (GNA). LNA finds small highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Given the different outputs of LNA and GNA, when a new NA method is proposed, it is compared against existing methods from the same category. However, both NA categories have the same goal: to allow for transferring functional knowledge from well- to poorly-studied species between conserved network regions. So, which one to choose, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. We introduce new measures of alignment quality that allow for fair comparison of the different LNA and GNA outputs, as such measures do not exist. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study the effect on alignment quality of using different interaction types and confidence levels. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation.

preprint2014arXiv

Exploring the structure and function of temporal networks with dynamic graphlets

With the growing amount of available temporal real-world network data, an important question is how to efficiently study these data. One can simply model a temporal network as either a single aggregate static network, or as a series of time-specific snapshots, each of which is an aggregate static network over the corresponding time window. The advantage of modeling the temporal data in these two ways is that one can use existing well established methods for static network analysis to study the resulting aggregate network(s). Here, we develop a novel approach for studying temporal network data more explicitly. We base our methodology on the well established notion of graphlets (subgraphs), which have been successfully used in numerous contexts in static network research. Here, we take the notion of static graphlets to the next level and develop new theory needed to allow for graphlet-based analysis of temporal networks. Our new notion of dynamic graphlets is quite different than existing approaches for dynamic network analysis that are based on temporal motifs (statistically significant subgraphs). Namely, these approaches suffer from many limitations. For example, they can only deal with subgraph structures of limited complexity. Also, their major drawback is that their results heavily depend on the choice of a null network model that is required to evaluate the significance of a subgraph. However, choosing an appropriate null network model is a non-trivial task. Our dynamic graphlet approach overcomes the limitations of the existing temporal motif-based approaches. At the same time, when we thoroughly evaluate the ability of our new approach to characterize the structure and function of an entire temporal network or of individual nodes, we find that the dynamic graphlet approach outperforms the static graphlet approach, which indicates that accounting for temporal information helps.

preprint2013arXiv

Dynamic networks reveal key players in aging

Motivation: Since susceptibility to diseases increases with age, studying aging gains importance. Analyses of gene expression or sequence data, which have been indispensable for investigating aging, have been limited to studying genes and their protein products in isolation, ignoring their connectivities. However, proteins function by interacting with other proteins, and this is exactly what biological networks (BNs) model. Thus, analyzing the proteins' BN topologies could contribute to understanding of aging. Current methods for analyzing systems-level BNs deal with their static representations, even though cells are dynamic. For this reason, and because different data types can give complementary biological insights, we integrate current static BNs with aging-related gene expression data to construct dynamic, age-specific BNs. Then, we apply sensitive measures of topology to the dynamic BNs to study cellular changes with age. Results: While global BN topologies do not significantly change with age, local topologies of a number of genes do. We predict such genes as aging-related. We demonstrate credibility of our predictions by: 1) observing significant overlap between our predicted aging-related genes and "ground truth" aging-related genes; 2) showing that our aging-related predictions group by functions and diseases that are different than functions and diseases of genes that are not predicted as aging-related; 3) observing significant overlap between functions and diseases that are enriched in our aging-related predictions and those that are enriched in "ground truth" aging-related data; 4) providing evidence that diseases which are enriched in our aging-related predictions are linked to human aging; and 5) validating all of our high-scoring novel predictions via manual literature search.

preprint2012arXiv

Identifying edge clusters in networks via edge graphlet degree vectors (edge-GDVs) and edge-GDV-similarities

Inference of new biological knowledge, e.g., prediction of protein function, from protein-protein interaction (PPI) networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and predict protein function from the clusters. Traditionally, network research has focused on clustering of nodes. However, why favor nodes over edges, when clustering of edges may be preferred? For example, nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic "signatures" throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent. Hence, we design a sensitive measure of the "topological similarity" of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker's yeast PPI networks, outperforming existing node and edge clustering approaches.

preprint2009arXiv

An integrative approach to modeling biological networks

Since proteins carry out biological processes by interacting with other proteins, analyzing the structure of protein-protein interaction (PPI) networks could explain complex biological mechanisms, evolution, and disease. Similarly, studying protein structure networks, residue interaction graphs (RIGs), might provide insights into protein folding, stability, and function. The first step towards understanding these networks is finding an adequate network model that closely replicates their structure. Evaluating the fit of a model to the data requires comparing the model with real-world networks. Since network comparisons are computationally infeasible, they rely on heuristics, or "network properties." We show that it is difficult to assess the reliability of the fit of a model with any individual network property. Thus, our approach integrates a variety of network properties and further combines these with a series of probabilistic methods to predict an appropriate network model for biological networks. We find geometric random graphs, that model spatial relationships between objects, to be the best-fitting model for RIGs. This validates the correctness of our method, since RIGs have previously been shown to be geometric. We apply our approach to noisy PPI networks and demonstrate that their structure is also consistent with geometric random graphs.

Tijana Milenkovic

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Data-driven biological network alignment that uses topological, sequence, and functional information

Data-driven network alignment

Heterogeneous network approach to predict individuals' mental health

Inference of the Dynamic Aging-related Biological Subnetwork via Network Propagation

Network-based protein structural classification

Pairwise versus multiple network alignment

IGLOO: Integrating global and local biological network alignment

Multiple network alignment via multiMAGNA++

SCOUT: simultaneous time segmentation and community detection in dynamic networks

Local versus Global Biological Network Alignment

Exploring the structure and function of temporal networks with dynamic graphlets

Dynamic networks reveal key players in aging

Identifying edge clusters in networks via edge graphlet degree vectors (edge-GDVs) and edge-GDV-similarities

An integrative approach to modeling biological networks