Source author record

Kunal Banerjee

Kunal Banerjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; cs.CY; Neural and Evolutionary Computing; Social and Information Networks

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System

Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets. Many imputation methods have been proposed to address this problem of data incompleteness, or sparsity. However, the accuracy of a data imputation method for a given feature or set of features in a dataset depends heavily on the distribution of the feature values and their correlation with other features. Another problem that plagues industry deployments of machine learning (ML) solutions is concept drift detection, which becomes more challenging in the presence of missing values. Although data imputation and concept drift detection have each been studied extensively, little work has attempted a combined study of the two phenomena, i.e., concept drift detection in the presence of sparsity. In this work, we carry out a systematic study of the following: (i) different patterns of missing values, (ii) various statistical and ML-based data imputation methods for different kinds of sparsity, (iii) several concept drift detection methods, (iv) practical analysis of the various drift detection metrics, and (v) selecting the best concept drift detector for a dataset with missing values based on the different metrics. We first carry out this analysis on synthetic data and publicly available datasets, and then extend the findings to our deployed automated change risk assessment system. One of the major findings of our empirical study is that no single concept drift detection method is superior across all the relevant metrics. Therefore, we adopt a majority-voting ensemble of concept drift detectors for abrupt and gradual concept drifts. Our experiments show that this ensemble method can achieve optimal or near-optimal performance across all the metrics.
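The majority-voting idea in this abstract can be illustrated with a minimal sketch. The three detectors below (mean shift, median shift, spread change), their thresholds, and the choice to simply skip missing values instead of imputing them are all hypothetical stand-ins chosen for illustration; the abstract does not say which detectors or imputation methods make up the authors' ensemble.

```python
from statistics import mean, median, pstdev
from typing import Callable, List, Optional, Sequence

# A window may contain None to mark a missing (sparse) value.
Window = Sequence[Optional[float]]
Detector = Callable[[Window, Window], bool]


def _observed(window: Window) -> List[float]:
    """Drop missing values; assumes at least one observed value remains."""
    return [v for v in window if v is not None]


def mean_shift(reference: Window, current: Window, threshold: float = 0.5) -> bool:
    """Hypothetical detector: flags drift when the means differ by more than threshold."""
    return abs(mean(_observed(current)) - mean(_observed(reference))) > threshold


def median_shift(reference: Window, current: Window, threshold: float = 0.5) -> bool:
    """Hypothetical detector: flags drift when the medians differ by more than threshold."""
    return abs(median(_observed(current)) - median(_observed(reference))) > threshold


def spread_change(reference: Window, current: Window, ratio: float = 2.0) -> bool:
    """Hypothetical detector: flags drift when one spread exceeds the other by ratio."""
    ref_sd = pstdev(_observed(reference))
    cur_sd = pstdev(_observed(current))
    return max(ref_sd, cur_sd) > ratio * min(ref_sd, cur_sd)


def majority_vote(detectors: Sequence[Detector], reference: Window, current: Window) -> bool:
    """Report drift only when more than half of the detectors report it."""
    votes = sum(1 for detect in detectors if detect(reference, current))
    return votes * 2 > len(detectors)


DETECTORS: List[Detector] = [mean_shift, median_shift, spread_change]
reference_window = [0.0, 0.1, None, -0.1, 0.0, 0.1]
shifted_window = [2.0, 2.1, None, 1.9, 2.0]
stable_window = [0.0, None, 0.1, -0.1, 0.05]

drift_on_shifted = majority_vote(DETECTORS, reference_window, shifted_window)
drift_on_stable = majority_vote(DETECTORS, reference_window, stable_window)
```

With an odd number of detectors, a strict majority avoids ties; for the shifted window two of the three sketch detectors fire, so the ensemble reports drift even though the spread-based detector does not.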

Preprint, 2022, arXiv

These Deals Won't Last! Longevity, Uniformity and Bias in Product Badge Assignment in E-Commerce Platforms

Product badges are ubiquitous on e-commerce platforms, acting as effective psychological triggers that nudge customers to buy specific products and thereby boost revenues. However, to the best of our knowledge, there has been no attempt to systematically study these badges and their idiosyncrasies; we intend to close this gap in our current work. Specifically, we try to answer questions such as: How long does a product retain a badge on a given platform? If a product is sold on different platforms, does it receive similar badges? How do the products that receive badges differ from those that do not, in terms of price, customer rating, and similar attributes? We collect longitudinal data from several e-commerce platforms over 45 days and find that although most badges are short-lived, there are several permanent badge assignments, including for badges meant to denote urgency or scarcity. Furthermore, it is unclear how the badge assignments are made, and we find evidence that highly rated products are missing out on badges compared to lower-quality ones. Our work calls for greater transparency in the badge assignment process to inform customers, as well as to reduce dissatisfaction among the sellers who depend on the platforms for their revenues.
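As a rough illustration of the longevity question, the sketch below computes the longest run of consecutive days on which a product carried a given badge. The observation layout (product identifier, badge name, integer day index), the function name, and the sample badge names are all assumptions made for this example; the abstract does not describe the authors' data schema or their longevity measure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (product_id, badge_name, day_index).
Observation = Tuple[str, str, int]


def longest_badge_streaks(
    observations: Iterable[Observation],
) -> Dict[Tuple[str, str], int]:
    """Return the longest consecutive-day streak per (product, badge) pair."""
    days_by_key: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for product_id, badge, day in observations:
        days_by_key[(product_id, badge)].add(day)

    streaks: Dict[Tuple[str, str], int] = {}
    for key, days in days_by_key.items():
        best = run = 0
        previous = None
        for day in sorted(days):
            # Extend the run only if this day directly follows the previous one.
            run = run + 1 if previous is not None and day == previous + 1 else 1
            best = max(best, run)
            previous = day
        streaks[key] = best
    return streaks


# Illustrative data only; product and badge names are made up.
sample = [
    ("p1", "Limited time deal", 0),
    ("p1", "Limited time deal", 1),
    ("p1", "Limited time deal", 2),
    ("p1", "Limited time deal", 5),
    ("p2", "Best seller", 3),
]
streaks = longest_badge_streaks(sample)
```

Under this measure, a badge whose streak equals the full observation period (45 days in the study) would count as permanent for that period.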

Preprint, 2020, arXiv

K-TanH: Efficient TanH For Deep Learning

We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular activation function TanH for deep learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating-point operation needed), where the parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical formats, such as Float32 and BFloat16. High-quality approximations to other activation functions, e.g., Sigmoid, Swish and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates a more than 5× speedup over Intel SVML, and it is consistently more efficient than other approximations that use floating-point arithmetic. Finally, we achieve state-of-the-art BLEU score and convergence results for training the language translation model GNMT on WMT16 datasets with approximate TanH obtained via K-TanH on BFloat16 inputs.
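One way to see how other activations can be built on top of any TanH routine is through standard identities: Sigmoid(x) = (1 + tanh(x/2)) / 2, Swish(x) = x · Sigmoid(x), and the widely used tanh-based GELU approximation. The sketch below takes the TanH implementation as a parameter and uses math.tanh as a placeholder; it does not implement K-TanH itself, and the abstract does not state whether the authors use these exact derivations.

```python
import math
from typing import Callable

TanhFn = Callable[[float], float]


def sigmoid_from_tanh(x: float, tanh_fn: TanhFn = math.tanh) -> float:
    """Sigmoid via the identity sigmoid(x) = (1 + tanh(x / 2)) / 2."""
    return 0.5 * (1.0 + tanh_fn(0.5 * x))


def swish_from_tanh(x: float, tanh_fn: TanhFn = math.tanh) -> float:
    """Swish (with beta = 1) defined as x * sigmoid(x)."""
    return x * sigmoid_from_tanh(x, tanh_fn)


def gelu_from_tanh(x: float, tanh_fn: TanhFn = math.tanh) -> float:
    """Common tanh-based GELU approximation (not the exact erf form)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + tanh_fn(inner))


# math.tanh is a placeholder; an approximate TanH could be passed as tanh_fn.
sigmoid_at_one = sigmoid_from_tanh(1.0)
swish_at_one = swish_from_tanh(1.0)
gelu_at_one = gelu_from_tanh(1.0)
```

Because each function only calls tanh_fn, any approximation error in the supplied TanH routine carries through to the derived activation, so the quality of the derived functions depends on the quality of the underlying TanH.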