Source author record

Charanpal Dhanjal

Charanpal Dhanjal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Social and Information Networks, Applications, physics.soc-ph, Data Structures and Algorithms, Information Retrieval, Populations and Evolution

Catalog footprint

10 works

7 topics

4 close collaborators

preprint · 2016 · arXiv

A k-core Decomposition Framework for Graph Clustering

Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional techniques for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided by its core expansion sequence, an ordered partition of the graph into different levels according to the k-core decomposition. Such a partition provides an efficient way to process the graph in an incremental manner that preserves its clustering structure, while making the execution of the chosen clustering algorithm much faster due to the smaller size of the graph's partitions on which the algorithm operates. An experimental analysis on a multitude of real and synthetic data demonstrates that our approach can be applied to any clustering algorithm, accelerating the clustering process, while the quality of the clustering structure is preserved or even improved.
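To make the k-core decomposition mentioned in this abstract concrete, here is a minimal sketch that computes core numbers by repeatedly peeling the vertex of smallest remaining degree, then groups vertices into levels. It is not the CoreCluster implementation: the function names, the toy graph and the choice to list the densest level first are all assumptions made for illustration, and the simple `min` scan is quadratic rather than the linear-time bucket version.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def core_levels(core):
    """Group vertices by core number, densest level first (an assumption)."""
    levels = defaultdict(set)
    for v, k in core.items():
        levels[k].add(v)
    return [levels[k] for k in sorted(levels, reverse=True)]


# Hypothetical toy graph: a 4-clique (a, b, c, d) with a pendant path d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
core = core_numbers(edges)
levels = core_levels(core)
```

On this toy graph the clique vertices form the top level and the path vertices the bottom one, so a clustering algorithm could be run on the small dense level first and the remaining vertices attached afterwards.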

preprint · 2015 · arXiv

AUC Optimisation and Collaborative Filtering

In recommendation systems, one is interested in the ranking of the predicted items as opposed to other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC) as it is widely used and has a strong theoretical underpinning. In practical recommendation, only items at the top of the ranked list are presented to the users. With this in mind, we propose a class of objective functions over matrix factorisations which primarily represent a smooth surrogate for the real AUC, and in a special case we show how to prioritise the top of the list. The objectives are differentiable and optimised through a carefully designed stochastic gradient-descent-based algorithm which scales linearly with the size of the data. In the special case of square loss we show how to improve computational complexity by leveraging previously computed measures. To understand theoretically the underlying matrix factorisation approaches we study both the consistency of the loss functions with respect to AUC, and generalisation using Rademacher theory. The resulting generalisation analysis gives strong motivation for the optimisation under study. Finally, we provide computational results as to the efficacy of the proposed method using synthetic and real data.
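The idea of a smooth surrogate for the AUC can be illustrated with a small sketch. The exact AUC counts the fraction of (relevant, irrelevant) item pairs that are ordered correctly, which is a step function of the scores; replacing the step with a sigmoid gives a differentiable approximation. The sigmoid, the `beta` sharpness parameter and the toy scores are assumptions for illustration only; the abstract does not state which surrogates the paper uses beyond mentioning a square-loss special case.

```python
import math


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def smooth_auc(scores_pos, scores_neg, beta=1.0):
    """Sigmoid surrogate: the 0/1 step becomes 1 / (1 + exp(-beta * (p - n)))."""
    total = sum(1.0 / (1.0 + math.exp(-beta * (p - n)))
                for p in scores_pos for n in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted scores for one user's relevant and irrelevant items.
pos = [0.9, 0.6, 0.4]
neg = [0.5, 0.2]
exact = auc(pos, neg)                      # 5 of 6 pairs are ordered correctly
approx = smooth_auc(pos, neg, beta=50.0)   # a sharp sigmoid tracks the exact AUC
```

A larger `beta` brings the surrogate closer to the exact AUC but makes its gradients less informative, which is the usual trade-off when choosing such a surrogate.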

preprint · 2014 · arXiv

An SIR Graph Growth Model for the Epidemics of Communicable Diseases

It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease. The major novelty of the SIR model we promote lies in the fact that the social network on which the epidemic is taking place is not specified in advance but evolves through time, accounting for the temporal evolution of the interactions involving infective individuals. Without assuming the existence of a fixed underlying network model, the stochastic process introduced describes, in a flexible and realistic manner, epidemic spread in non-uniformly mixing and possibly heterogeneous populations. It is shown how to fit such a (parametrised) model by means of Approximate Bayesian Computation methods based on graph-valued statistics. The concepts and statistical methods described in this paper are finally applied to a real epidemic dataset, related to the spread of HIV in Cuba in the presence of a contact tracing system, which permits one to partly reconstruct the evolution of the graph of sexual partners diagnosed HIV positive between 1986 and 2006.

preprint · 2014 · arXiv

Efficient Eigen-updating for Spectral Graph Clustering

Partitioning a graph into groups of vertices such that those within each group are more densely connected than vertices assigned to different groups, known as graph clustering, is often used to gain insight into the organisation of large-scale networks and for visualisation purposes. Whereas a large number of dedicated techniques have been recently proposed for static graphs, the design of on-line graph clustering methods tailored for evolving networks is a challenging problem, and much less documented in the literature. Motivated by the broad variety of applications concerned, ranging from the study of biological networks to the analysis of networks of scientific references through the exploration of communications networks such as the World Wide Web, it is the main purpose of this paper to introduce a novel, computationally efficient, approach to graph clustering in the evolutionary context. Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng et al. (2001). The incremental eigenvalue solution is a general technique for finding the approximate eigenvectors of a symmetric matrix given a change to the matrix. As well as outlining the approach in detail, we present a theoretical bound on the quality of the approximate eigenvectors using perturbation theory. We then derive a novel spectral clustering algorithm called Incremental Approximate Spectral Clustering (IASC). The IASC algorithm is simple to implement and its efficacy is demonstrated on both synthetic and real datasets modelling the evolution of an HIV epidemic, a citation network and the purchase history graph of an e-commerce website.

preprint · 2014 · arXiv

On Recent Advances in Supervised Ranking for Metabolite Profiling

This paper focuses on data arising from the field of metabolomics, a rapidly developing area concerned with the analysis of chemical fingerprints (i.e. the metabolite profile). The metabolite profile is left by specific chemical processes occurring in biological cells, tissues or organs. It is the main purpose of this article to develop and implement scoring techniques so as to rank all possible metabolic profiles by increasing order of magnitude of the conditional probability that a given metabolite is present at high levels in a certain biological fluid. After a detailed description of the (functional) data from which decision rules must be learnt, several approaches to this predictive problem, based on recent advances in K-partite ranking, are described at length. Their performance on several real datasets is then thoroughly investigated.

preprint · 2014 · arXiv

Online Matrix Completion Through Nuclear Norm Regularisation

It is the main goal of this paper to propose a novel method to perform matrix completion on-line. Motivated by a wide variety of applications, ranging from the design of recommender systems to sensor network localization through seismic data reconstruction, we consider the matrix completion problem when entries of the matrix of interest are observed gradually. Precisely, we place ourselves in the situation where the predictive rule should be refined incrementally, rather than recomputed from scratch each time the sample of observed entries increases. The extension of existing matrix completion methods to the sequential prediction context is indeed a major issue in the Big Data era, and yet little addressed in the literature. The algorithm promoted in this article builds upon the Soft Impute approach introduced in Mazumder et al. (2010). The major novelty essentially arises from the use of a randomised technique for both computing and updating the Singular Value Decomposition (SVD) involved in the algorithm. Though of disarming simplicity, the method proposed turns out to be very efficient, while requiring reduced computations. Several numerical experiments based on real datasets illustrating its performance are displayed, together with preliminary results giving it a theoretical basis.

preprint · 2013 · arXiv

Learning Reputation in an Authorship Network

The problem of searching for experts in a given academic field is hugely important in both industry and academia. We study exactly this issue with respect to a database of authors and their publications. The idea is to use Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform topic modelling in order to find authors who have worked in a query field. We then construct a coauthorship graph and motivate the use of influence maximisation and a variety of graph centrality measures to obtain a ranked list of experts. The ranked lists are further improved using a Markov Chain-based rank aggregation approach. The complete method is readily scalable to large datasets. To demonstrate the efficacy of the approach we report on an extensive set of computational simulations using the Arnetminer dataset. An improvement in mean average precision is demonstrated over the baseline case of simply using the order of authors found by the topic models.
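As an illustration of Markov chain-based rank aggregation, the sketch below combines several ranked lists of authors into one. It follows the style of chain often called MC4 in the rank aggregation literature: from the current item, pick another item uniformly at random and move to it if a majority of the input lists rank it higher. The abstract does not say which chain the authors use, so this choice, the uniform teleportation step (added here so that every item receives a non-zero weight), the function name and the toy lists are all assumptions. The sketch also assumes every list ranks the same set of items.

```python
def aggregate_rankings(rankings, damping=0.85, iterations=200):
    """Aggregate full ranked lists via the stationary distribution of a Markov chain."""
    items = sorted(set().union(*rankings))
    n = len(items)
    positions = [{item: k for k, item in enumerate(r)} for r in rankings]

    def majority_prefers(j, i):
        # True if more than half of the lists place j above i.
        votes = sum(1 for pos in positions if pos[j] < pos[i])
        return votes > len(positions) / 2

    # Row-stochastic transition matrix: from i, propose j uniformly at random
    # and accept the move only if the majority ranks j above i.
    P = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(items):
        for b, j in enumerate(items):
            if a != b and majority_prefers(j, i):
                P[a][b] = 1.0 / n
        P[a][a] = 1.0 - sum(P[a])

    # Power iteration with uniform teleportation (an added assumption).
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [(1.0 - damping) / n
              + damping * sum(pi[a] * P[a][b] for a in range(n))
              for b in range(n)]

    weights = dict(zip(items, pi))
    order = sorted(items, key=lambda item: -weights[item])
    return order, weights


# Hypothetical expert lists produced by three different centrality measures.
rankings = [["ann", "bob", "cat"],
            ["ann", "cat", "bob"],
            ["bob", "ann", "cat"]]
order, weights = aggregate_rankings(rankings)
```

Items that win most pairwise majority contests accumulate the most stationary mass, so the aggregated order here places `ann` first, then `bob`, then `cat`.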

preprint · 2012 · arXiv

An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

Model selection is a crucial issue in machine learning and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. It is the goal of this paper to investigate to what extent such recent techniques can be successfully used for the tuning of both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation, respectively, provide poor estimates of the risk and introduce a modified penalisation technique to reduce the estimation error.
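For readers unfamiliar with V-fold cross-validation, the baseline procedure in this abstract can be sketched as follows: split the data into V folds, fit on V-1 of them, measure the squared error on the held-out fold, and average over folds. The sketch covers VFCV only, not V-fold penalisation. The two candidate models (a constant predictor and a least-squares line) stand in for the SVR and CART models of the paper, and all names and data are hypothetical.

```python
import random


def v_fold_indices(n, v, seed=0):
    """Shuffle 0..n-1 and deal the indices into v disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]


def v_fold_risk(xs, ys, fit, v=5, seed=0):
    """Average held-out squared error of `fit` over v folds."""
    fold_risks = []
    for held_out in v_fold_indices(len(xs), v, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_risks.append(sum(errors) / len(errors))
    return sum(fold_risks) / v


def fit_mean(train_x, train_y):
    """Candidate 1: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def fit_line(train_x, train_y):
    """Candidate 2: ordinary least-squares line."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    return lambda x: my + slope * (x - mx)


# Hypothetical noiseless data on a straight line.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
risk_mean = v_fold_risk(xs, ys, fit_mean)
risk_line = v_fold_risk(xs, ys, fit_line)
best = min([("mean", risk_mean), ("line", risk_line)], key=lambda t: t[1])[0]
```

Model selection then amounts to keeping the candidate with the smallest estimated risk, here the line.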

preprint · 2012 · arXiv

Dissemination of Health Information within Social Networks

In this paper, we investigate how information about a common food-borne health hazard, known as Campylobacter, spreads once it has been delivered to a random sample of individuals in France. The central question addressed here is how individual characteristics and the various aspects of the social network influence the spread of information. A key claim of our paper is that information diffusion processes occur in a patterned network of social ties of heterogeneous actors. Our percolation models show that the characteristics of the recipients of the information matter as much as, if not more than, the characteristics of the sender of the information in deciding whether the information will be transmitted through a particular tie. We also found that, at least for this particular advisory, it is not the perceived need of the recipients for the information that matters but their general interest in the topic.

preprint · 2011 · arXiv

The Evolution of the Cuban HIV/AIDS Network

An individual detected as HIV positive in Cuba is asked to provide a list of his/her sexual contacts for the previous 2 years. This allows one to gather detailed information on the spread of the HIV epidemic. Here we study the evolution of the sexual contact graph of detected individuals and also the directed graph of HIV infections. The study covers the Cuban HIV epidemic between the years 1986 and 2004 inclusive and is motivated by an earlier study on the static properties of the network at the end of 2004. We use a variety of advanced graph algorithms to paint a picture of the growth of the epidemic, including an examination of diameters, geodesic distances, community structure and centrality, amongst other characteristics. The analysis contrasts the HIV network with other real networks, and graphs generated using the configuration model. We find that the early epidemic starts in the heterosexual population and then grows mainly through MSM (Men having Sex with Men) contact. The epidemic exhibits a giant component which is shown to have degenerate chains of vertices and, after 1989, diameters larger than those expected from the equivalent configuration model graphs. In 1997 there is a significant increase in the detection rate, from 73 to 256 detections/year, covering mainly MSM, which results in a rapid increase of distances and diameters in the giant component.
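The configuration model used as a baseline in this abstract generates a random graph with a prescribed degree sequence. A minimal sketch of the standard stub-matching construction follows: each vertex gets as many "stubs" as its degree, the stubs are shuffled, and consecutive stubs are paired into edges. The function name and the toy degree sequence are hypothetical, and this plain variant may produce self-loops and repeated edges, which the paper may or may not have handled differently.

```python
import random
from collections import Counter


def configuration_model(degrees, seed=0):
    """Return a random multigraph edge list whose vertex degrees match `degrees`."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    # One stub per unit of degree, e.g. degrees [2, 1, 1] -> stubs [0, 0, 1, 2].
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    random.Random(seed).shuffle(stubs)
    # Pair consecutive stubs; self-loops and multi-edges are allowed.
    return list(zip(stubs[0::2], stubs[1::2]))


# Hypothetical degree sequence, e.g. taken from an observed contact graph.
degrees = [3, 2, 2, 2, 1]
edges = configuration_model(degrees)

# Check that the generated graph realises the requested degrees
# (a self-loop contributes 2 to its vertex).
realised = Counter()
for u, v in edges:
    realised[u] += 1
    realised[v] += 1
```

Comparing a statistic such as the diameter of the observed graph with its distribution over many such random graphs shows whether the statistic is explained by the degree sequence alone.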
