Source author record

Niladri Chatterjee

Niladri Chatterjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computation and Language; Information Retrieval; Machine Learning; Artificial Intelligence; Computational Engineering, Finance, and Science; q-fin.ST; Digital Libraries

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

FL Games: A federated learning framework for distribution shifts

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, whereby predictive models with strong in-distribution generalization can fail catastrophically on unseen domains. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to only learn correlations that are stable and invariant across domains. We propose FL Games, a game-theoretic framework for federated learning for learning causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL Games effectively resolves this challenge and exhibits smooth performance curves. Further, FL Games scales well in the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL Games achieves high out-of-distribution performance on various benchmarks.

Preprint, 2022, arXiv

Interpretation of Black Box NLP Models: A Survey

An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performances, many models are black boxes in nature which are hard to explain. There are growing efforts for researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real world data sets are provided to demonstrate the effectiveness of our method.

Preprint, 2022, arXiv

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations

The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multilabel classification datasets in biomedical scientific literature. 19 teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
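The three scores reported above average the F1-score in different ways. The following is a minimal sketch of how they differ, not the track's official evaluation script. The function names, the toy labels and documents, and the convention of scoring an empty denominator as 0.0 are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when undefined (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float, float]:
    """Return (macro, micro, instance-based) F1 for multi-label predictions."""
    # Per-label [true positives, false positives, false negatives].
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1

    # Macro: compute F1 per label, then average over labels.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)

    # Instance-based: compute F1 per article, then average over articles.
    instance = sum(
        f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)
    ) / len(gold)
    return macro, micro, instance


# Toy data (hypothetical): three articles, three topic labels.
toy_labels = ["Treatment", "Diagnosis", "Prevention"]
toy_gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
toy_pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]
print(multilabel_f1(toy_gold, toy_pred, toy_labels))
```

Macro averaging weights every topic equally, so rare topics influence it as much as common ones. Micro and instance-based averaging are dominated by frequent topics and by typical articles, respectively.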

Preprint, 2020, arXiv

Examining Lead-Lag Relationships In-Depth, With Focus On FX Market As Covid-19 Crises Unfolds

The lead-lag relationship plays a vital role in financial markets. It is the phenomenon where a certain price series lags behind and partially replicates the movement of a leading time series. The present research proposes a new technique that helps better identify the lead-lag relationship empirically. Apart from better identifying the lead-lag path, the technique also gives a measure for judging closeness between financial time series. The proposed measure is closely related to correlation, and it uses a dynamic programming technique to find the optimal lead-lag path. Further, it retains most of the properties of a metric, so much so that it is termed a loose metric. Tests are performed on Synthetic Time Series (STS) with a known lead-lag relationship, and comparisons are made with other state-of-the-art models on the basis of significance and forecastability. The proposed technique gives the best results in both tests. It finds paths which are all statistically significant, and its forecasts are closest to the target values. Then, we use the measure to study the topology evolution of the Foreign Exchange market as the COVID-19 pandemic unfolds. Here, we study the FX currency prices of 29 prominent countries of the world. It is observed that as the crisis unfolds, all the currencies become strongly interlinked. Also, the US Dollar starts playing an even more central role in the FX market. Finally, we mention several other application areas of the proposed technique for designing intelligent systems.
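To make the notion of a lead-lag relationship concrete, here is a minimal sketch that searches for the single fixed lag maximizing the Pearson correlation between two series. This is a deliberately simpler stand-in and not the paper's technique, which finds a time-varying lead-lag path with dynamic programming. The function names and the toy series are assumptions made for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lagged_correlation(leader: List[float], follower: List[float], lag: int) -> float:
    """Correlate leader[t] with follower[t + lag] for a fixed lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(leader: List[float], follower: List[float], max_lag: int) -> int:
    """Return the lag in 0..max_lag with the highest lagged correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(leader, follower, k),
    )


# Toy series (hypothetical): the follower repeats the leader two steps later.
toy_leader = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
toy_follower = [0.0, 0.0] + toy_leader[:-2]
print(best_lag(toy_leader, toy_follower, max_lag=3))
```

A fixed lag cannot capture a relationship whose delay drifts over time, which is the case a path-based method is designed to handle.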

Preprint, 2020, arXiv

Rough Set based Aggregate Rank Measure & its Application to Supervised Multi Document Summarization

Most problems in Machine Learning cater to classification, and the objects of the universe are classified to a relevant class. Ranking the classified objects of the universe per decision class is a challenging problem. In this paper we propose a novel Rough Set based membership called Rank Measure to solve this problem. It shall be utilized for ranking the elements to a particular class. It differs from the Pawlak Rough Set based membership function, which gives an equivalent characterization of the Rough Set based approximations. It becomes paramount to look beyond the traditional approach of computing memberships while handling the inconsistent, erroneous and missing data that is typically present in real-world problems. This led us to propose the aggregate Rank Measure. The contribution of the paper is threefold. Firstly, it proposes a Rough Set based measure to be utilized for numerical characterization of within-class ranking of objects. Secondly, it proposes and establishes the properties of Rank Measure and aggregate Rank Measure based membership. Thirdly, we apply the concept of membership and aggregate ranking to the problem of supervised Multi Document Summarization, wherein the important class of sentences is first determined using various supervised learning techniques and then post-processed using the proposed ranking measure. The results show a significant improvement in accuracy.
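For reference, the Pawlak rough membership function that the abstract contrasts against can be sketched as follows. For an object x with equivalence class [x] and a target set X, it is |[x] ∩ X| / |[x]|. This illustrates only that baseline, not the paper's Rank Measure, whose definition is not given in the abstract. The function name, attribute values and sentence identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def rough_membership(
    objects: Dict[str, Tuple[Hashable, ...]],
    decisions: Dict[str, str],
    target: str,
) -> Dict[str, float]:
    """Pawlak rough membership of each object in the decision class `target`.

    Objects with identical condition-attribute tuples are indiscernible and
    form one equivalence class. Membership is the fraction of that class
    whose decision equals `target`.
    """
    classes: Dict[Tuple[Hashable, ...], List[str]] = defaultdict(list)
    for name, attrs in objects.items():
        classes[attrs].append(name)

    membership: Dict[str, float] = {}
    for members in classes.values():
        hits = sum(1 for m in members if decisions[m] == target)
        for m in members:
            membership[m] = hits / len(members)
    return membership


# Toy data (hypothetical): sentences described by two condition attributes.
toy_objects = {
    "s1": ("long", "early"),
    "s2": ("long", "early"),
    "s3": ("short", "late"),
}
toy_decisions = {"s1": "summary", "s2": "other", "s3": "other"}
print(rough_membership(toy_objects, toy_decisions, "summary"))
```

Every object in the same equivalence class receives the same value, so this function cannot order objects within a class. That limitation is the kind of gap a within-class ranking measure is meant to address.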

Preprint, 2019, arXiv

Selecting stock pairs for pairs trading while incorporating lead-lag relationship

Pairs Trading is carried out in the financial market to earn huge profits from a known equilibrium relation between pairs of stocks. In financial markets, seldom it is seen that stock pairs are correlated at a particular lead or lag. This lead-lag relationship has been empirically studied in various financial markets. Earlier research works have suggested various measures for identifying the best pairs for pairs trading, but they do not consider this lead-lag effect. The present study proposes a new distance measure which incorporates the lead-lag relationship between the stocks while selecting the best pairs for pairs trading. Further, the lead-lag value between the stocks is allowed to vary continuously over time. The proposed measure's importance has been showcased through experimentation on two different datasets, one corresponding to Indian companies and another corresponding to American companies. When the proposed measure is combined with the SSD measure, i.e., when pairs are identified by optimising both measures, the selected pairs consistently generate the best profit compared to all other measures. Finally, possible generalisations and extensions of the proposed distance measure are discussed.
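The abstract does not expand "SSD". The sketch below assumes it refers to the sum of squared differences between normalized price series, the usual distance criterion in pairs trading, and ranks candidate pairs by it. It does not implement the paper's lead-lag distance measure. The function names, tickers and prices are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(prices: List[float]) -> List[float]:
    """Rescale a price series so that it starts at 1.0."""
    first = prices[0]
    return [p / first for p in prices]


def ssd(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_pairs(series: Dict[str, List[float]]) -> List[Tuple[float, str, str]]:
    """Return all stock pairs sorted by ascending SSD of normalized prices."""
    normed = {name: normalize(prices) for name, prices in series.items()}
    scored = [
        (ssd(normed[x], normed[y]), x, y)
        for x, y in combinations(sorted(normed), 2)
    ]
    return sorted(scored)


# Toy prices (hypothetical): A and B move proportionally, C stays flat.
toy_series = {
    "A": [10.0, 11.0, 12.0],
    "B": [20.0, 22.0, 24.0],
    "C": [5.0, 5.0, 5.0],
}
for score, x, y in rank_pairs(toy_series):
    print(x, y, round(score, 4))
```

Because this comparison aligns the two series at the same time index, a pair that tracks closely only after a delay scores poorly, which is the effect a lead-lag aware distance is intended to correct.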
