Source author record

P. S. Sastry

P. S. Sastry appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Neural and Evolutionary Computing Artificial Intelligence Databases

Catalog footprint

What is connected

10works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

On the Robustness of Decision Tree Learning under Label Noise

In most practical problems of classifier learning, the training data suffers from the label noise. Hence, it is important to understand how robust is a learning algorithm to such label noise. This paper presents some theoretical analysis to show that many popular decision tree algorithms are robust to symmetric label noise under large sample size. We also present some sample complexity results which provide some bounds on the sample size for the robustness to hold with a high probability. Through extensive simulations we illustrate this robustness.

preprint2015arXiv

Empirical Analysis of Sampling Based Estimators for Evaluating RBMs

The Restricted Boltzmann Machines (RBM) can be used either as classifiers or as generative models. The quality of the generative RBM is measured through the average log-likelihood on test data. Due to the high computational complexity of evaluating the partition function, exact calculation of test log-likelihood is very difficult. In recent years some estimation methods are suggested for approximate computation of test log-likelihood. In this paper we present an empirical comparison of the main estimation methods, namely, the AIS algorithm for estimating the partition function, the CSL method for directly estimating the log-likelihood, and the RAISE algorithm that combines these two ideas. We use the MNIST data set to learn the RBM and then compare these methods for estimating the test log-likelihood.

preprint2015arXiv

Making Risk Minimization Tolerant to Label Noise

In many applications, the training data, from which one needs to learn a classifier, is corrupted with label noise. Many standard algorithms such as SVM perform poorly in presence of label noise. In this paper we investigate the robustness of risk minimization to label noise. We prove a sufficient condition on a loss function for the risk minimization under that loss to be tolerant to uniform label noise. We show that the $0-1$ loss, sigmoid loss, ramp loss and probit loss satisfy this condition though none of the standard convex loss functions satisfy it. We also prove that, by choosing a sufficiently large value of a parameter in the loss function, the sigmoid loss, ramp loss and probit loss can be made tolerant to non-uniform label noise also if we can assume the classes to be separable under noise-free data distribution. Through extensive empirical studies, we show that risk minimization under the $0-1$ loss, the sigmoid loss and the ramp loss has much better robustness to label noise when compared to the SVM algorithm.

preprint2014arXiv

Discovering Compressing Serial Episodes from Event Sequences

Most pattern mining methods output a very large number of frequent patterns and isolating a small but relevant subset is a challenging problem of current interest in frequent pattern mining. In this paper we consider discovery of a small set of relevant frequent episodes from data sequences. We make use of the Minimum Description Length principle to formulate the problem of selecting a subset of episodes. Using an interesting class of serial episodes with inter-event constraints and a novel encoding scheme for data using such episodes, we present algorithms for discovering small set of episodes that achieve good data compression. Using an example of the data streams obtained from distributed sensors in a composable coupled conveyor system, we show that our method is very effective in unearthing highly relevant episodes and that our scheme also achieves good data compression.

preprint2014arXiv

Polyceptron: A Polyhedral Learning Algorithm

In this paper we propose a new algorithm for learning polyhedral classifiers which we call as Polyceptron. It is a Perception like algorithm which updates the parameters only when the current classifier misclassifies any training data. We give both batch and online version of Polyceptron algorithm. Finally we give experimental results to show the effectiveness of our approach.

preprint2013arXiv

K-Plane Regression

In this paper, we present a novel algorithm for piecewise linear regression which can learn continuous as well as discontinuous piecewise linear functions. The main idea is to repeatedly partition the data and learn a liner model in in each partition. While a simple algorithm incorporating this idea does not work well, an interesting modification results in a good algorithm. The proposed algorithm is similar in spirit to $k$-means clustering algorithm. We show that our algorithm can also be viewed as an EM algorithm for maximum likelihood estimation of parameters under a reasonable probability model. We empirically demonstrate the effectiveness of our approach by comparing its performance with the state of art regression learning algorithms on some real world datasets.

preprint2012arXiv

Geometric Decision Tree

In this paper we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node, are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.

preprint2012arXiv

Noise Tolerance under Risk Minimization

In this paper we explore noise tolerant learning of classifiers. We formulate the problem as follows. We assume that there is an ${\bf unobservable}$ training set which is noise-free. The actual training set given to the learning algorithm is obtained from this ideal data set by corrupting the class label of each example. The probability that the class label of an example is corrupted is a function of the feature vector of the example. This would account for most kinds of noisy data one encounters in practice. We say that a learning method is noise tolerant if the classifiers learnt with the ideal noise-free data and with noisy data, both have the same classification accuracy on the noise-free data. In this paper we analyze the noise tolerance properties of risk minimization (under different loss functions), which is a generic method for learning classifiers. We show that risk minimization under 0-1 loss function has impressive noise tolerance properties and that under squared error loss is tolerant only to uniform noise; risk minimization under other loss functions is not noise tolerant. We conclude the paper with some discussion on implications of these theoretical results.

preprint2010arXiv

A unified view of Automata-based algorithms for Frequent Episode Discovery

Frequent Episode Discovery framework is a popular framework in Temporal Data Mining with many applications. Over the years many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper we present a unified view of all such frequency counting algorithms. We present a generic algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various algorithms as we show here. We also point out how this unified view helps us to consider generalization of the algorithm so that they can discover episodes with general partial orders.

preprint2010arXiv

Efficient Discovery of Large Synchronous Events in Neural Spike Streams

We address the problem of finding patterns from multi-neuronal spike trains that give us insights into the multi-neuronal codes used in the brain and help us design better brain computer interfaces. We focus on the synchronous firings of groups of neurons as these have been shown to play a major role in coding and communication. With large electrode arrays, it is now possible to simultaneously record the spiking activity of hundreds of neurons over large periods of time. Recently, techniques have been developed to efficiently count the frequency of synchronous firing patterns. However, when the number of neurons being observed grows they suffer from the combinatorial explosion in the number of possible patterns and do not scale well. In this paper, we present a temporal data mining scheme that overcomes many of these problems. It generates a set of candidate patterns from frequent patterns of smaller size; all possible patterns are not counted. Also we count only a certain well defined subset of occurrences and this makes the process more efficient. We highlight the computational advantage that this approach offers over the existing methods through simulations. We also propose methods for assessing the statistical significance of the discovered patterns. We detect only those patterns that repeat often enough to be significant and thus be able to automatically fix the threshold for the data-mining application. Finally we discuss the usefulness of these methods for brain computer interfaces.

P. S. Sastry

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

On the Robustness of Decision Tree Learning under Label Noise

Empirical Analysis of Sampling Based Estimators for Evaluating RBMs

Making Risk Minimization Tolerant to Label Noise

Discovering Compressing Serial Episodes from Event Sequences

Polyceptron: A Polyhedral Learning Algorithm

K-Plane Regression

Geometric Decision Tree

Noise Tolerance under Risk Minimization

A unified view of Automata-based algorithms for Frequent Episode Discovery

Efficient Discovery of Large Synchronous Events in Neural Spike Streams