Source author record

Ji Zhu

Ji Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology, Machine Learning, Applications, physics.soc-ph, Social and Information Networks, Information Theory, math.IT, math.ST, Performance, Statistics Theory, Computation, Cryptography and Security, Data Structures and Algorithms, math.PR, Networking and Internet Architecture, physics.data-an, quant-ph

Catalog footprint

What is connected

26 works

17 topics

4 close collaborators


preprint · 2026 · arXiv

Locally sparse varying coefficient mixed model with application to longitudinal microbiome differential abundance

Differential abundance (DA) analysis in microbiome studies has recently been used to uncover a plethora of associations between microbial composition and various health conditions. While current approaches to DA typically apply only to cross-sectional data, many studies feature a longitudinal design to better understand the underlying microbial dynamics. To study DA in longitudinal microbial studies, we introduce a novel varying coefficient mixed-effects model with local sparsity. The proposed method can identify time intervals of significant group differences while accounting for temporal dependence. Specifically, we exploit a penalized kernel smoothing approach for parameter estimation and include a random effect to account for serial correlation. In particular, our method operates effectively regardless of whether sampling times are shared across subjects, accommodating irregular sampling and missing observations. Simulation studies demonstrate the necessity of modeling dependence for precise estimation and support recovery. The application of our method to a longitudinal study of mice oral microbiome during cancer development revealed significant scientific insights that were otherwise not discernible through cross-sectional analyses. An R implementation is available at https://github.com/fontaine618/LSVCMM.

preprint · 2022 · arXiv

Adaptive Algorithm for Quantum Amplitude Estimation

Quantum amplitude estimation is a key sub-routine of a number of quantum algorithms with various applications. We propose an adaptive algorithm for interval estimation of amplitudes. The quantum part of the algorithm is based only on Grover's algorithm. The key ingredient is the introduction of an adjustment factor, which adjusts the amplitude of good states such that the amplitude after the adjustment, and the original amplitude, can be estimated without ambiguity in the subsequent step. We show with numerical studies that the proposed algorithm uses a similar number of quantum queries to achieve the same level of precision $ε$ compared to state-of-the-art algorithms, but the classical part, i.e., the non-quantum part, has substantially lower computational complexity. We rigorously prove that the number of oracle queries achieves $O(1/ε)$, i.e., a quadratic speedup over classical Monte Carlo sampling, and the computational complexity of the classical part achieves $O(\log(1/ε))$, both up to a double-logarithmic factor.
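
As a rough illustration of the quadratic speedup quoted in this abstract, the Python sketch below compares how the two query counts scale with the target precision ε. Constants and the double-logarithmic factor are dropped, so the numbers indicate orders of magnitude only. The function names are illustrative and are not taken from the paper.

```python
def classical_samples(eps: float) -> int:
    """Monte Carlo sample count scaling as 1/eps^2 (constants omitted)."""
    return round(1.0 / eps ** 2)


def quantum_queries(eps: float) -> int:
    """Oracle query count scaling as 1/eps (constants and log-log factor omitted)."""
    return round(1.0 / eps)


# Print both scalings side by side for a few target precisions.
for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:g}: classical ~{classical_samples(eps)}, "
          f"quantum ~{quantum_queries(eps)}")
```

Under these simplifications, tightening ε by a factor of 10 multiplies the classical count by 100 but the quantum count by only 10.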

preprint · 2020 · arXiv

Detection of the number of principal components by extended AIC-type method

Estimating the number of principal components is one of the fundamental problems in many scientific fields such as signal processing (or the spiked covariance model). In this paper, we first demonstrate that, for fixed $p$, any penalty term of the form $k'(p-k'/2+1/2)C_n$ may lead to an asymptotically consistent estimator under the condition that $C_n\to\infty$ and $C_n/n\to0$. We also extend our results to the case $n,p\to\infty$, with $p/n\to c>0$. In this case, for $k=o(n^{\frac{1}{3}})$, we first investigate the limiting laws for the leading eigenvalues of the sample covariance matrix $S_n$ under the condition that $λ_k>1+\sqrt{c}$. At low SNR, since the AIC tends to underestimate the number of signals $k$, the AIC should be re-defined in this case. As a natural extension of the AIC for fixed $p$, we propose the extended AIC (EAIC), i.e., the AIC-type method with tuning parameter $γ=φ(c)=1/2+\sqrt{1/c}-\log(1+\sqrt{c})/c$, and demonstrate that the EAIC-type method, i.e., the AIC-type method with tuning parameter $γ>φ(c)$, can select the number of signals $k$ consistently. In the following two cases, (1) $p$ fixed, $n\to\infty$, (2) $n,p\to\infty$ with $p/n\to 0$, if the AIC is defined as the degeneration of the EAIC in the case $n,p\to\infty$ with $p/n\to c>0$, i.e., $γ=\lim_{c\rightarrow 0+0}φ(c)=1$, then we have essentially demonstrated that, to achieve the consistency of the AIC-type method in the above two cases, $γ>1$ is required. Moreover, we show that the EAIC-type method is essentially tuning-free and outperforms the well-known KN estimator proposed in Kritchman and Nadler (2008) and the BCF estimator proposed in Bai, Choi and Fujikoshi (2018). Numerical studies indicate that the proposed method works well.
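
The threshold $φ(c)$ stated in this abstract can be evaluated directly. The short Python sketch below computes it and checks numerically that it approaches 1 as $c\to 0$, which matches the limiting value $γ=1$ quoted above. Only the formula comes from the abstract; the function name and the chosen values of $c$ are illustrative.

```python
import math


def phi(c: float) -> float:
    """phi(c) = 1/2 + sqrt(1/c) - log(1 + sqrt(c)) / c, defined for c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    # log1p keeps the logarithm accurate when sqrt(c) is very small.
    return 0.5 + math.sqrt(1.0 / c) - math.log1p(math.sqrt(c)) / c


# Evaluate the threshold for a decreasing sequence of ratios c = p/n.
for c in (1.0, 0.1, 1e-4, 1e-8):
    print(f"c={c:g}: phi(c)={phi(c):.6f}")
```

A Taylor expansion of the logarithm gives $φ(c)=1-\sqrt{c}/3+O(c)$, so the threshold stays below 1 for small positive $c$ and tends to 1 in the limit.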

preprint · 2020 · arXiv

High-dimensional Gaussian graphical model for network-linked data

Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network "cohesion", which refers to the empirically observed phenomenon of network neighbors sharing similar traits.

preprint · 2020 · arXiv

Network cross-validation by edge sampling

While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly applicable to networks since splitting network nodes into groups requires deleting edges and destroys some of the network structure. Here we propose a new network resampling strategy based on splitting node pairs rather than nodes applicable to cross-validation for a wide range of network model selection tasks. We provide a theoretical justification for our method in a general setting and examples of how our method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a citation network of statisticians show that this cross-validation approach works well for model selection.
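
The idea of splitting node pairs rather than nodes can be sketched in a few lines of Python. This is a minimal illustration that assumes an undirected network without self-loops; it covers only the split itself, not the model fitting or evaluation described in the paper. The function name, the hold-out fraction and the seed are hypothetical.

```python
import itertools
import random


def split_node_pairs(n_nodes, holdout_frac, seed=0):
    """Randomly split all unordered node pairs into training and held-out sets."""
    rng = random.Random(seed)
    # Every unordered pair (i, j) with i < j is one unit of resampling,
    # whether or not an edge is present between i and j.
    pairs = list(itertools.combinations(range(n_nodes), 2))
    rng.shuffle(pairs)
    n_test = int(round(holdout_frac * len(pairs)))
    return pairs[n_test:], pairs[:n_test]


train_pairs, test_pairs = split_node_pairs(n_nodes=6, holdout_frac=0.2)
print(len(train_pairs), "training pairs,", len(test_pairs), "held-out pairs")
```

Because no node is removed, every node can still appear in the training pairs, which is the contrast with node-level splitting drawn in the abstract. A candidate model would then be fitted on the training pairs and scored on the held-out pairs.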

preprint · 2020 · arXiv

Network-Assisted Estimation for Large-dimensional Factor Model with Guaranteed Convergence Rate Improvement

Network structure is increasingly popular for capturing the intrinsic relationships among large-scale variables. In this paper we propose to improve the estimation accuracy of large-dimensional factor models when a network structure between individuals is observed. To fully exploit the prior network information, we construct two different penalties to regularize the factor loadings and shrink the idiosyncratic errors. Closed-form solutions are provided for the penalized optimization problems. Theoretical results demonstrate that the modified estimators achieve faster convergence rates and lower asymptotic mean squared errors when the underlying network structure among individuals is correct. An interesting finding is that even if the prior network is totally misleading, the proposed estimators perform no worse than conventional state-of-the-art methods. Furthermore, to facilitate practical application, we propose a data-driven approach to select the tuning parameters, which is computationally efficient. We also provide an empirical criterion to determine the number of common factors. Simulation studies and an application to the S&P100 weekly return dataset convincingly illustrate the superiority and adaptivity of the new approach.

preprint · 2019 · arXiv

Minorization-Maximization-based Steepest Ascent for Large-scale Survival Analysis with Time-Varying Effects: Application to the National Kidney Transplant Dataset

The time-varying effects model is a flexible and powerful tool for modeling the dynamic changes of covariate effects. However, in survival analysis, its computational burden increases quickly as the sample size or the number of predictors grows. Traditional methods that perform well for moderate sample sizes and low-dimensional data do not scale to massive data. Analysis of national kidney transplant data, with a massive sample size and a large number of predictors, defies existing statistical methods and software. In view of these difficulties, we propose a Minorization-Maximization-based steepest ascent procedure for estimating the time-varying effects. Leveraging the block structure formed by the basis expansions, the proposed procedure iteratively updates the optimal block-wise direction along which the approximate increase in the log-partial likelihood is maximized. The resulting estimates ensure the ascent property and serve as refinements of the previous step. The performance of the proposed method is examined by simulations and applications to the analysis of national kidney transplant data.

preprint · 2016 · arXiv

Asymptotics in directed exponential random graph models with an increasing bi-degree sequence

Although asymptotic analyses of undirected network models based on degree sequences have started to appear in recent literature, it remains an open problem to study statistical properties of directed network models. In this paper, we provide for the first time a rigorous analysis of directed exponential random graph models using the in-degrees and out-degrees as sufficient statistics with binary as well as continuous weighted edges. We establish the uniform consistency and the asymptotic normality for the maximum likelihood estimate, when the number of parameters grows and only one realized observation of the graph is available. One key technique in the proofs is to approximate the inverse of the Fisher information matrix using a simple matrix with high accuracy. Numerical studies confirm our theoretical findings.

preprint · 2016 · arXiv

Classification with Ultrahigh-Dimensional Features

Although much progress has been made in classification with high-dimensional features \citep{Fan_Fan:2008, JGuo:2010, CaiSun:2014, PRXu:2014}, classification with ultrahigh-dimensional features, wherein the features much outnumber the sample size, defies most existing work. This paper introduces a novel and computationally feasible multivariate screening and classification method for ultrahigh-dimensional data. Leveraging inter-feature correlations, the proposed method enables detection of marginally weak and sparse signals and recovery of the true informative feature set, and achieves asymptotic optimal misclassification rates. We also show that the proposed procedure provides more powerful discovery boundaries compared to those in \citet{CaiSun:2014} and \citet{JJin:2009}. The performance of the proposed procedure is evaluated using simulation studies and demonstrated via classification of patients with different post-transplantation renal functional types.

preprint · 2016 · arXiv

High-dimensional Mixed Graphical Models

While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models linking both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted $\ell_1$ penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation data set (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to categorical variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables, we also show that the proposed methodology can be easily extended to general discrete variables.

preprint · 2015 · arXiv

Community Detection in Networks with Node Features

Many methods have been proposed for community detection in networks, but most of them do not take into account additional information on the nodes that is often available in practice. In this paper, we propose a new joint community detection criterion that uses both the network edge information and the node features to detect community structures. One advantage our method has over existing joint detection approaches is the flexibility of learning the impact of different features which may differ across communities. Another advantage is the flexibility of choosing the amount of influence the feature information has on communities. The method is asymptotically consistent under the block model with additional assumptions on the feature distributions, and performs well on simulated and real networks.

preprint · 2015 · arXiv

Consistency of community detection in networks under degree-corrected stochastic block models

Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

preprint · 2015 · arXiv

Detecting Overlapping Communities in Networks Using Spectral Methods

Community detection is a fundamental problem in network analysis which is made more challenging by overlaps between communities which often occur in practice. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be thought of as a generalization of the degree-corrected stochastic block model. We develop an efficient spectral algorithm for estimating the community memberships, which deals with the overlaps by employing the K-medians algorithm rather than the usual K-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when networks are not too sparse and the overlaps between communities not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs very well compared to a number of benchmark methods for overlapping community detection.

preprint · 2015 · arXiv

Estimating heterogeneous graphical models for discrete data with an application to roll call voting

We consider the problem of jointly estimating a collection of graphical models for discrete data, corresponding to several categories that share some common structure. An example for such a setting is voting records of legislators on different issues, such as defense, energy, and healthcare. We develop a Markov graphical model to characterize the heterogeneous dependence structures arising from such data. The model is fitted via a joint estimation method that preserves the underlying common graph structure, but also allows for differences between the networks. The method employs a group penalty that targets the common zero interaction effects across all the networks. We apply the method to describe the internal networks of the U.S. Senate on several important issues. Our analysis reveals individual structure for each issue, distinct from the underlying well-known bipartisan structure common to all categories which we are able to extract separately. We also establish consistency of the proposed method both for parameter estimation and model selection, and evaluate its numerical performance on a number of simulated examples.

preprint · 2014 · arXiv

Regularized 3D functional regression for brain image data via Haar wavelets

The primary motivation and application in this article come from brain imaging studies on cognitive impairment in elderly subjects with brain disorders. We propose a regularized Haar wavelet-based approach for the analysis of three-dimensional brain image data in the framework of functional data analysis, which automatically takes into account the spatial information among neighboring voxels. We conduct extensive simulation studies to evaluate the prediction performance of the proposed approach and its ability to identify related regions to the outcome of interest, with the underlying assumption that only few relatively small subregions are truly predictive of the outcome of interest. We then apply the proposed approach to searching for brain subregions that are associated with cognition using PET images of patients with Alzheimer's disease, patients with mild cognitive impairment and normal controls.

preprint · 2013 · arXiv

Link prediction for partially observed networks

Link prediction is one of the fundamental problems in network analysis. In many applications, notably in genetics, a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many existing supervised learning approaches. We develop a new method which treats the observed network as a sample of the true network with different sampling rates for positive and negative examples. We obtain a relative ranking of potential links by their probabilities, utilizing information on node covariates as well as on network topology. Empirically, the method performs well under many settings, including when the observed network is sparse. We apply the method to a protein-protein interaction network and a school friendship network.

preprint · 2013 · arXiv

On The Degrees of Freedom of Reduced-rank Estimators in Multivariate Regression

In this paper we study the effective degrees of freedom of a general class of reduced rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation (SURE). We derive a finite-sample exact unbiased estimator that admits a closed-form expression in terms of the singular values or thresholded singular values of the least squares solution and hence readily computable. The results continue to hold in the high-dimensional scenario when both the predictor and response dimensions are allowed to be larger than the sample size. The derived analytical form facilitates the investigation of its theoretical properties and provides new insights into the empirical behaviors of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator, i.e., the number of free parameters. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.

preprint · 2013 · arXiv

Tree dynamics for peer-to-peer streaming

This paper presents an asynchronous distributed algorithm to manage multiple trees for peer-to-peer streaming in a flow level model. It is assumed that videos are cut into substreams, with or without source coding, to be distributed to all nodes. The algorithm guarantees that each node receives sufficiently many substreams within delay logarithmic in the number of peers. The algorithm works by constantly updating the topology so that each substream is distributed through trees to as many nodes as possible without interference. Competition among trees for limited upload capacity is managed so that both coverage and balance are achieved. The algorithm is robust in that it efficiently eliminates cycles and maintains tree structures in a distributed way. The algorithm favors nodes with higher degree, so it not only works for live streaming and video on demand, but also in the case a few nodes with large degree act as servers and other nodes act as clients. A proof of convergence of the algorithm is given assuming instantaneous update of depth information, and for the case of a single tree it is shown that the convergence time is stochastically tightly bounded by a small constant times the log of the number of nodes. These theoretical results are complemented by simulations showing that the algorithm works well even when most assumptions for the theoretical tractability do not hold.

preprint · 2012 · arXiv

Sparse Ising Models with Covariates

There has been a lot of work fitting Ising models to multivariate binary data in order to understand the conditional dependency relationships between the variables. However, additional covariates are frequently recorded together with the binary data, and may influence the dependence relationships. Motivated by such a dataset on genomic instability collected from tumor samples of several types, we propose a sparse covariate dependent Ising model to study both the conditional dependency within the binary data and its relationship with the additional covariates. This results in subject-specific Ising models, where the subject's covariates influence the strength of association between the genes. As in all exploratory data analysis, interpretability of results is important, and we use L1 penalties to induce sparsity in the fitted graphs and in the number of selected covariates. Two algorithms to fit the model are proposed and compared on a set of simulated data, and asymptotic results are established. The results on the tumor dataset and their biological significance are discussed in detail.

preprint · 2012 · arXiv

Stability of a Peer-to-Peer Communication System

This paper focuses on the stationary portion of file download in an unstructured peer-to-peer network, which typically follows for many hours after a flash crowd initiation. The model includes the case that peers can have some pieces at the time of arrival. The contribution of the paper is to identify how much help is needed from the seeds, either fixed seeds or peer seeds (which are peers remaining in the system after obtaining a complete collection) to stabilize the system. The dominant cause for instability is the missing piece syndrome, whereby one piece becomes very rare in the network. It is shown that stability can be achieved with only a small amount of help from peer seeds--even with very little help from a fixed seed, peers need dwell as peer seeds on average only long enough to upload one additional piece. The region of stability is insensitive to the piece selection policy. Network coding can substantially increase the region of stability in case a portion of the new peers arrive with randomly coded pieces.

preprint · 2011 · arXiv

Random lasso

We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of randomly selected covariates. A measure of importance is yielded from this step for each covariate. In step 2, a similar procedure to the first step is implemented with the exception that for each bootstrap sample, a subset of covariates is randomly selected with unequal selection probabilities determined by the covariates' importance. Adaptive lasso may be used in the second step with weights determined by the importance measures. The final set of covariates and their coefficients are determined by averaging bootstrap results obtained from step 2. The proposed method alleviates some of the limitations of lasso, elastic-net and related methods noted especially in the context of microarray data analysis: it tends to remove highly correlated variables altogether or select them all, and maintains maximal flexibility in estimating their coefficients, particularly with different signs; the number of selected variables is no longer limited by the sample size; and the resulting prediction accuracy is competitive or superior compared to the alternatives. We illustrate the proposed method by extensive simulation studies. The proposed method is also applied to a Glioblastoma microarray data analysis.
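
The two steps described in this abstract can be sketched in plain Python. This is a simplified illustration under several assumptions that are not taken from the abstract: the lasso is solved by a basic coordinate descent without intercept, the penalty `lam` and subset size `q` are fixed rather than tuned, the importance measure is taken to be the absolute value of the averaged bootstrap coefficient, and the toy data are synthetic. All function names are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return max(z - t, 0.0) if z > 0 else min(z + t, 0.0)


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta


def draw_columns(weights, q, rng):
    """Draw q distinct column indices with probability proportional to weights."""
    pool, w, chosen = list(range(len(weights))), list(weights), []
    for _ in range(q):
        if sum(w) > 0:
            k = rng.choices(range(len(pool)), weights=w, k=1)[0]
        else:  # all remaining weights are zero: fall back to a uniform draw
            k = rng.randrange(len(pool))
        chosen.append(pool.pop(k))
        w.pop(k)
    return chosen


def bootstrap_average(X, y, lam, n_boot, q, weights, rng):
    """Average lasso coefficients over bootstrap samples with random column subsets."""
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap rows
        cols = draw_columns(weights, q, rng)          # random covariate subset
        Xb = [[X[i][j] for j in cols] for i in rows]
        yb = [y[i] for i in rows]
        for j, b in zip(cols, lasso_cd(Xb, yb, lam)):
            total[j] += b  # covariates not drawn contribute a coefficient of zero
    return [t / n_boot for t in total]


def random_lasso(X, y, lam=0.1, n_boot=30, q=3, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    # Step 1: equal selection probabilities, yielding an importance measure.
    importance = [abs(b) for b in bootstrap_average(X, y, lam, n_boot, q, [1.0] * p, rng)]
    # Step 2: selection probabilities proportional to importance; average the fits.
    coef = bootstrap_average(X, y, lam, n_boot, q, importance, rng)
    return importance, coef


# Synthetic toy data: only the first of six covariates carries signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]
importance, coef = random_lasso(X, y)
print([round(c, 2) for c in coef])
```

In practice the penalty and the subset sizes would be tuned, and the adaptive lasso could replace the plain lasso in the second step, as the abstract notes.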

preprint · 2011 · arXiv

The Missing Piece Syndrome in Peer-to-Peer Communication

Typical protocols for peer-to-peer file sharing over the Internet divide files to be shared into pieces. New peers strive to obtain a complete collection of pieces from other peers and from a seed. In this paper we investigate a problem that can occur if the seeding rate is not large enough. The problem is that, even if the statistics of the system are symmetric in the pieces, there can be symmetry breaking, with one piece becoming very rare. If peers depart after obtaining a complete collection, they can tend to leave before helping other peers receive the rare piece. Assuming that peers arrive with no pieces, there is a single seed, random peer contacts are made, random useful pieces are downloaded, and peers depart upon receiving the complete file, the system is stable if the seeding rate (in pieces per time unit) is greater than the arrival rate, and is unstable if the seeding rate is less than the arrival rate. The result persists for any piece selection policy that selects from among useful pieces, such as rarest first, and it persists with the use of network coding.

preprint · 2010 · arXiv

Community extraction for social networks

Analysis of networks and in particular discovering communities within networks has been a focus of recent work in several fields, with applications ranging from citation and friendship networks to food webs and gene regulatory networks. Most of the existing community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between. However, many networks contain nodes that do not fit in with any of the communities, and forcing every node into a community can distort results. Here we propose a new framework that focuses on community extraction instead of partition, extracting one community at a time. The main idea behind extraction is that the strength of a community should not depend on ties between members of other communities, but only on ties within that community and its ties to the outside world. We show that the new extraction criterion performs well on simulated and real networks, and establish asymptotic consistency of our method under the block model assumption.

preprint · 2010 · arXiv

Group Variable Selection via a Hierarchical Lasso and Its Oracle Property

In many engineering and scientific applications, prediction variables are grouped, for example, in biological applications where assayed genes or proteins can be grouped by biological roles or biological pathways. Common statistical analysis methods such as ANOVA, factor analysis, and functional modeling with basis sets also exhibit natural variable groupings. Existing successful group variable selection methods such as Antoniadis and Fan (2001), Yuan and Lin (2006) and Zhao, Rocha and Yu (2009) have the limitation of selecting variables in an "all-in-all-out" fashion, i.e., when one variable in a group is selected, all other variables in the same group are also selected. In many real problems, however, we may want to keep the flexibility of selecting variables within a group, such as in gene-set selection. In this paper, we develop a new group variable selection method that not only removes unimportant groups effectively, but also keeps the flexibility of selecting variables within a group. We also show that the new method offers the potential for achieving the theoretical "oracle" property as in Fan and Li (2001) and Fan and Peng (2004).

preprint · 2010 · arXiv

Quantifying Information Leakage in Finite Order Deterministic Programs

Information flow analysis is a powerful technique for reasoning about the sensitive information exposed by a program during its execution. While past work has proposed information theoretic metrics (e.g., Shannon entropy, min-entropy, guessing entropy, etc.) to quantify such information leakage, we argue that some of these measures not only result in counter-intuitive measures of leakage, but also are inherently prone to conflicts when comparing two programs P1 and P2 -- say Shannon entropy predicts higher leakage for program P1, while guessing entropy predicts higher leakage for program P2. This paper presents the first attempt towards addressing such conflicts and derives solutions for conflict-free comparison of finite order deterministic programs.
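
The kind of conflict described in this abstract can be reproduced on a small scale. The Python sketch below computes Shannon entropy, min-entropy and guessing entropy for two hypothetical probability distributions over a secret and shows that the measures can order them differently. The distributions are invented for illustration and do not come from the paper, and the sketch compares uncertainty of distributions rather than leakage of actual programs.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def min_entropy(p):
    """Min-entropy in bits, determined by the single most likely value."""
    return -math.log2(max(p))


def guessing_entropy(p):
    """Expected number of guesses when trying values from most to least likely."""
    return sum(i * x for i, x in enumerate(sorted(p, reverse=True), start=1))


# Two hypothetical distributions over the secret.
dist_a = [0.5, 0.125, 0.125, 0.125, 0.125]
dist_b = [0.3, 0.3, 0.2, 0.2]

for name, dist in (("A", dist_a), ("B", dist_b)):
    print(name,
          "shannon:", round(shannon_entropy(dist), 3),
          "min:", round(min_entropy(dist), 3),
          "guessing:", round(guessing_entropy(dist), 3))
```

Here distribution A has the higher Shannon entropy, while distribution B has the higher guessing entropy and min-entropy, so the measures disagree on which distribution carries more uncertainty.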

preprint · 2010 · arXiv

Sparse regulatory networks

In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses $L_1$ penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
