Source author record

Subhajit Dutta

Subhajit Dutta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, math.ST, Statistics Theory, Machine Learning

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

On Generalizations of Some Distance Based Classifiers for HDLSS Data

In high dimension, low sample size (HDLSS) settings, classifiers based on Euclidean distances like the nearest neighbor classifier and the average distance classifier perform quite poorly if differences between locations of the underlying populations get masked by scale differences. To rectify this problem, several modifications of these classifiers have been proposed in the literature. However, existing methods are confined to location and scale differences only, and often fail to discriminate among populations differing outside of the first two moments. In this article, we propose some simple transformations of these classifiers resulting in improved performance even when the underlying populations have the same location and scale. We further propose a generalization of these classifiers based on the idea of grouping of variables. The high-dimensional behavior of the proposed classifiers is studied theoretically. Numerical experiments with a variety of simulated examples as well as an extensive analysis of real data sets exhibit advantages of the proposed methods.

Preprint, 2022, arXiv

On Perfect Classification and Clustering for Gaussian Processes

In this paper, we propose a data based transformation for infinite-dimensional Gaussian processes and derive its limit theorem. For a classification problem, this transformation induces complete separation among the associated Gaussian processes. The misclassification probability of any simple classifier when applied on the transformed data asymptotically converges to zero. In a clustering problem using mixture models, an appropriate modification of this transformation asymptotically leads to perfect separation of the populations. Theoretical properties are studied for the usual $k$-means clustering method when used on this transformed data. Good empirical performance of the proposed methodology is demonstrated using simulated as well as benchmark data sets, when compared with some popular parametric and nonparametric methods for such functional data.

Preprint, 2022, arXiv

Sub-dimensional Mardia measures of multivariate skewness and kurtosis

The Mardia measures of multivariate skewness and kurtosis summarize the respective characteristics of a multivariate distribution with two numbers. However, these measures do not reflect the sub-dimensional features of the distribution. Consequently, testing procedures based on these measures may fail to detect skewness or kurtosis present in a sub-dimension of the multivariate distribution. We introduce sub-dimensional Mardia measures of multivariate skewness and kurtosis, and investigate the information they convey about all sub-dimensional distributions of some symmetric and skewed families of multivariate distributions. The maxima of the sub-dimensional Mardia measures of multivariate skewness and kurtosis are considered, as these reflect the maximum skewness and kurtosis present in the distribution, and also allow us to identify the sub-dimension bearing the highest skewness and kurtosis. Asymptotic distributions of the vectors of sub-dimensional Mardia measures of multivariate skewness and kurtosis are derived, based on which testing procedures for the presence of skewness and of deviation from Gaussian kurtosis are developed. The performances of these tests are compared with some existing tests in the literature on simulated and real datasets.
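For reference, the classical (full-dimensional) Mardia sample measures that this abstract builds on are usually written as follows. The notation is an assumption introduced here, not taken from the record: $x_1, \dots, x_n$ are $d$-dimensional observations, $\bar{x}$ is the sample mean, and $S$ is the sample covariance matrix with divisor $n$. The sub-dimensional versions proposed in the paper are not reproduced here.

```latex
\[
  S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top},
\]
\[
  b_{1,d} = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n}
            \left[ (x_i - \bar{x})^{\top} S^{-1} (x_j - \bar{x}) \right]^{3},
  \qquad
  b_{2,d} = \frac{1}{n} \sum_{i=1}^{n}
            \left[ (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}) \right]^{2}.
\]
```

For a $d$-variate Gaussian distribution the corresponding population values are $0$ for skewness and $d(d+2)$ for kurtosis, which is the baseline that tests for "deviation from Gaussian kurtosis" compare against.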

Preprint, 2020, arXiv

On a Generalization of the Average Distance Classifier

In high dimension, low sample size (HDLSS) settings, the simple average distance classifier based on the Euclidean distance performs poorly if differences between the locations get masked by the scale differences. To rectify this issue, modifications to the average distance classifier were proposed by Chan and Hall (2009). However, the existing classifiers cannot discriminate when the populations differ in aspects other than locations and scales. In this article, we propose some simple transformations of the average distance classifier to tackle this issue. The resulting classifiers perform quite well even when the underlying populations have the same location and scale. The high-dimensional behaviour of the proposed classifiers is studied theoretically. Numerical experiments with a variety of simulated as well as real data sets exhibit the usefulness of the proposed methodology.
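As a rough illustration of the baseline this abstract starts from, the sketch below implements a plain Euclidean average distance classifier: a point is assigned to the class whose training observations are closest to it on average. It is not the Chan and Hall (2009) modification, nor the transformations proposed in the paper. The function names, the dictionary layout of the training data, and the toy samples are all assumptions made for this example.

```python
import math


def average_distance(x, sample):
    """Mean Euclidean distance from point x to every point in sample."""
    return sum(math.dist(x, y) for y in sample) / len(sample)


def average_distance_classify(x, classes):
    """Return the label whose training sample is closest to x on average.

    `classes` maps a label to a list of training points (hypothetical layout).
    """
    return min(classes, key=lambda label: average_distance(x, classes[label]))


# Toy data with a clear location difference and similar scales.
train = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
label_near_a = average_distance_classify((0.5, 0.5), train)
label_near_b = average_distance_classify((5.5, 5.2), train)

# Toy data where class "wide" has a much larger scale than class "tight".
# The point 1.0 is the mean of "wide", yet its average distance to "wide"
# is 11.0 versus 1.0 to "tight", so the scale difference dominates.
scaled = {
    "tight": [(-1.0,), (1.0,)],
    "wide": [(-10.0,), (12.0,)],
}
label_at_wide_center = average_distance_classify((1.0,), scaled)
```

In the second toy sample the point at the center of the wide class is still assigned to the tight class. This is only a one-dimensional cartoon of how scale differences can mask location differences; the abstract's claim concerns the HDLSS regime.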

Preprint, 2020, arXiv

On Construction of Higher Order Kernels Using Fourier Transforms and Covariance Functions

In this paper, we show that a suitably chosen covariance function of a continuous time, second order stationary stochastic process can be viewed as a symmetric higher order kernel. This leads to the construction of a higher order kernel by choosing an appropriate covariance function. An optimal choice of the constructed higher order kernel that partially minimizes the mean integrated square error of the kernel density estimator is also discussed.

Preprint, 2016, arXiv

On Affine Invariant $L_p$ Depth Classifiers based on an Adaptive Choice of $p$

In this article, we use L$_p$ depth for classification of multivariate data, where the value of $p$ is chosen adaptively using observations from the training sample. While many depth based classifiers are constructed assuming elliptic symmetry of the underlying distributions, our proposed L$_p$ depth classifiers cater to a larger class of distributions. We establish Bayes risk consistency of these proposed classifiers under appropriate regularity conditions. Several simulated and benchmark data sets are analyzed to compare their finite sample performance with some existing parametric and nonparametric classifiers including those based on other notions of data depth.

Preprint, 2015, arXiv

Multi-scale Classification using Localized Spatial Depth

In this article, we develop and investigate a new classifier based on features extracted using spatial depth. Our construction is based on fitting a generalized additive model to the posterior probabilities of the different competing classes. To cope with possible multi-modal as well as non-elliptic population distributions, we develop a localized version of spatial depth and use that with varying degrees of localization to build the classifier. Final classification is done by aggregating several posterior probability estimates each of which is obtained using localized spatial depth with a fixed scale of localization. The proposed classifier can be conveniently used even when the dimension is larger than the sample size, and its good discriminatory power for such data has been established using theoretical as well as numerical results.

Preprint, 2012, arXiv

Some intriguing properties of Tukey's half-space depth

For multivariate data, Tukey's half-space depth is one of the most popular depth functions available in the literature. It is conceptually simple and satisfies several desirable properties of depth functions. The Tukey median, the multivariate median associated with the half-space depth, is also a well-known measure of center for multivariate data with several interesting properties. In this article, we derive and investigate some interesting properties of half-space depth and its associated multivariate median. These properties, some of which are counterintuitive, have important statistical consequences in multivariate analysis. We also investigate a natural extension of Tukey's half-space depth and the related median for probability distributions on any Banach space (which may be finite- or infinite-dimensional) and prove some results that demonstrate anomalous behavior of half-space depth in infinite-dimensional spaces.
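To make the notion of half-space depth concrete, here is a minimal sketch restricted to the univariate case, where the only closed half-spaces through a point are the two half-lines on either side of it. The empirical depth of x is then the smaller of the fractions of observations at or below x and at or above x. The function name and sample values are assumptions for this example; the multivariate and Banach-space settings studied in the paper need a minimization over all directions and are not covered.

```python
def halfspace_depth_1d(x, sample):
    """Empirical Tukey half-space depth of x with respect to a 1-D sample."""
    n = len(sample)
    at_or_below = sum(1 for v in sample if v <= x)
    at_or_above = sum(1 for v in sample if v >= x)
    return min(at_or_below, at_or_above) / n


sample = [1.0, 2.0, 3.0, 4.0, 5.0]

# Depth of each observation; the deepest point is a Tukey median.
depths = {v: halfspace_depth_1d(v, sample) for v in sample}
deepest = max(sample, key=lambda v: halfspace_depth_1d(v, sample))

# A point outside the range of the data has depth zero.
depth_outside = halfspace_depth_1d(10.0, sample)
```

In one dimension the deepest point coincides with the ordinary median, which is why the Tukey median is regarded as a multivariate generalization of it.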
