Source author record

Rajesh Sharma

Rajesh Sharma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

physics.soc-ph; Social and Information Networks; Computation and Language; Distributed, Parallel, and Cluster Computing; Machine Learning; cs.CY; math.FA; Artificial Intelligence; Computer Vision; Cryptography and Security; econ.GN; Information Retrieval; math.CO; q-fin.EC

Catalog footprint

What is connected

14 works

14 topics

4 close collaborators


preprint, 2026, arXiv

The Confidence Trap: Gender Bias and Predictive Certainty in LLMs

The increased use of Large Language Models (LLMs) in sensitive domains has led to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate whether calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration according to the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE, designed to measure gender disparities in resolution tasks.
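
For context, a calibration metric of the kind this abstract refers to can be sketched as the standard binned expected calibration error (ECE). This is not the paper's Gender-ECE, whose definition is not given on this page; the function names, the 10-bin default and the per-group breakdown are illustrative assumptions.

```python
from typing import Dict, Hashable, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Standard binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hits[idx] += int(ok)
    ece = 0.0
    for count, conf_sum, hit in zip(counts, conf_sums, hits):
        if count:
            ece += (count / total) * abs(hit / count - conf_sum / count)
    return ece


def ece_by_group(
    confidences: Sequence[float],
    correct: Sequence[bool],
    groups: Sequence[Hashable],
    n_bins: int = 10,
) -> Dict[Hashable, float]:
    """Hypothetical helper: ECE computed separately for each group label."""
    by_group: Dict[Hashable, list] = {}
    for conf, ok, group in zip(confidences, correct, groups):
        by_group.setdefault(group, []).append((conf, ok))
    return {
        group: expected_calibration_error(
            [c for c, _ in rows], [o for _, o in rows], n_bins
        )
        for group, rows in by_group.items()
    }
```

Comparing the per-group values returned by `ece_by_group` is one simple way to look for calibration disparities between groups; how the paper aggregates such disparities is not specified here.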

preprint, 2025, arXiv

That is Unacceptable: the Moral Foundations of Canceling

Canceling is a morally driven phenomenon that hinders the development of safe social media platforms and contributes to ideological polarization. To address this issue, we present the Canceling Attitudes Detection (CADE) dataset, an annotated corpus of canceling incidents aimed at exploring the factors behind disagreements in evaluating people's canceling attitudes on social media. Specifically, we study the impact of annotators' morality on their perception of canceling, showing that morality is an independent axis for explaining disagreement on this phenomenon. Annotators' judgments depend heavily on the type of controversial event and the celebrities involved. This shows the need to develop more event-centric datasets to better understand how harms are perpetrated on social media and to develop more aware technologies for their detection.

preprint, 2022, arXiv

Eigenvalues and Diagonal Elements

In this paper we discuss some relations between the eigenvalues and the diagonal entries of Hermitian matrices.

preprint, 2022, arXiv

Extensions of the Schur majorisation inequalities

We obtain some inequalities which are stronger than the Schur majorization inequalities.
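
As background, the classical Schur majorization inequalities state that for a Hermitian matrix with diagonal entries d_1 >= ... >= d_n and eigenvalues λ_1 >= ... >= λ_n, each partial sum d_1 + ... + d_k is at most λ_1 + ... + λ_k, with equality at k = n. The sketch below checks this classical statement only, not the stronger inequalities of the paper, and uses the closed-form eigenvalues of a 2x2 Hermitian matrix; the function names are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def hermitian_2x2_eigenvalues(a: float, c: float, b: complex) -> Tuple[float, float]:
    """Eigenvalues (largest first) of the Hermitian matrix [[a, b], [conj(b), c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, abs(b))
    return mean + radius, mean - radius


def schur_majorization_holds(
    diagonal: Sequence[float],
    eigenvalues: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check that the eigenvalues majorize the diagonal entries."""
    d = sorted(diagonal, reverse=True)
    lam = sorted(eigenvalues, reverse=True)
    if len(d) != len(lam):
        raise ValueError("diagonal and eigenvalues must have the same length")
    d_sum = 0.0
    lam_sum = 0.0
    for k, (d_k, lam_k) in enumerate(zip(d, lam), start=1):
        d_sum += d_k
        lam_sum += lam_k
        # Partial sums of the diagonal must not exceed those of the eigenvalues.
        if k < len(d) and d_sum > lam_sum + tol:
            return False
    # The full sums must agree, since both equal the trace.
    return abs(d_sum - lam_sum) <= tol
```

For example, `schur_majorization_holds([2.0, 0.0], hermitian_2x2_eigenvalues(2.0, 0.0, 1 + 1j))` checks the matrix with diagonal (2, 0) and off-diagonal entry 1 + i.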

preprint, 2022, arXiv

Pre-Trained Language Transformers are Universal Image Classifiers

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During fine-tuning with images, most of the layers of GPT-2 are frozen during backpropagation, and the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of the FPT to facial images. We also use our FPT to classify encrypted images. Our FPT shows high accuracy on both raw facial images and encrypted images. We hypothesize, with theory and experiments, that the FPT gains its meta-learning capacity from its large size and large-scale training. GPT-2 is trained to generate a single word token at a time through the autoregressive process, which forces it toward a heavy-tailed distribution. The FPT then uses the heavy-tail property as its meta-learning capacity for classifying images. Our work shows one way to avoid bias during the machine classification of images. The FPT encodes worldly knowledge because of its pretraining on text, which it uses during classification. The statistical error of classification is reduced because of the added context gained from the text. Our paper shows the ethical dimension of using encrypted data for classification. Criminal images are sensitive to share across boundaries, but encrypted images largely evade ethical concern. The good classification accuracy of the FPT on encrypted images shows promise for further research on privacy-preserving machine learning.

preprint, 2020, arXiv

Edge Computing Enabled by Unmanned Autonomous Vehicles

Pervasive applications are revolutionizing the perception that users have of the environment. Indeed, pervasive applications perform resource-intensive computations over large amounts of stream sensor data collected from multiple sources. This allows applications to provide richer and deeper insights into the natural characteristics that govern everything that surrounds us. A key limitation of these applications is that they have high energy footprints, which in turn hampers the quality of experience of users. While cloud and edge computing solutions can be applied to alleviate the problem, these solutions are hard to adopt in existing architectures and far from becoming ubiquitous. Fortunately, cloudlets are becoming portable enough that they can be transported and integrated into any environment easily and dynamically. In this article, we investigate how cloudlets can be transported by unmanned autonomous vehicles (UAVs) to provide computation support on the edge. Based on our study, we develop GEESE, a novel UAV-based system that enables the dynamic deployment of an edge computing infrastructure through the cooperation of multiple UAVs carrying cloudlets. By using GEESE, we conduct rigorous experiments to analyze the effort to deliver cloudlets using aerial, ground, and underwater UAVs. Our results indicate that UAVs can work in a cooperative manner to enable edge computing in the wild.

preprint, 2020, arXiv

Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study

Identifying semantically identical questions on question-and-answer social media platforms like Quora is important for ensuring the quality and quantity of the content presented to users, based on the intent of the question, and thus for enriching the overall user experience. Detecting duplicate questions is a challenging problem because natural language is very expressive, and a unique intent can be conveyed using different words, phrases, and sentence structures. Machine learning and deep learning methods are known to achieve better results than traditional natural language processing techniques in identifying similar texts. In this paper, taking Quora as our case study, we explored and applied different machine learning and deep learning techniques to the task of identifying duplicate questions in Quora's dataset. By using feature engineering, feature importance techniques, and experiments with seven selected machine learning classifiers, we demonstrated that our models outperformed previous studies on this task. An XGBoost model with character-level term frequency and inverse term frequency is our best machine learning model, and it also outperformed a few of the deep learning baseline models. We applied deep learning techniques to build four different multi-layer deep neural networks consisting of GloVe embeddings, Long Short-Term Memory, convolution, max pooling, dense, batch normalization, and activation layers, and model merging. Our deep learning models achieved better accuracy than the machine learning models. Three out of four proposed architectures outperformed the accuracy of previous machine learning and deep learning research, two out of four models outperformed the accuracy of a previous deep learning study on Quora's question pair dataset, and our best model achieved an accuracy of 85.82%, which is close to the state-of-the-art accuracy on the Quora dataset.

preprint, 2020, arXiv

Mobility Based SIR Model For Pandemics -- With Case Study Of COVID-19

In the last decade, humanity has faced many different pandemics such as SARS, H1N1, and presently the novel coronavirus (COVID-19). On one side, scientists are focusing on vaccinations, and on the other side, there is a need to propose models that can help us understand the spread of these pandemics, as this can help governmental and other concerned agencies to be well prepared, especially for fast-spreading pandemics like COVID-19. The main reason some epidemics turn into pandemics is the connectivity among different regions of the world, which makes it easier to affect a wider geographical area, often worldwide. In addition, the population distribution and social coherence in the different regions of the world are non-uniform. Thus, once the epidemic enters a region, the local population distribution plays an important role. Inspired by these ideas, we proposed a mobility-based SIR model for epidemics, which especially takes into account pandemic situations. To the best of our knowledge, this model is the first of its kind to take into account the population distribution and connectivity of different geographic locations across the globe. In addition to presenting the mathematical proof of our model, we have performed extensive simulations using synthetic data to demonstrate our model's generalizability. To demonstrate the wider scope of our model, we used it to forecast the COVID-19 cases for Estonia.
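
To illustrate the general idea of coupling SIR dynamics with movement between regions, here is a minimal sketch of a generic discrete-time metapopulation SIR model. It is not the model from the paper, whose equations are not given on this page; the function names, the `mobility[j][k]` convention (fraction of region j moving to region k per unit time) and all parameter values are assumptions made for illustration.

```python
from typing import List, Tuple

State = Tuple[List[float], List[float], List[float]]


def _move(values: List[float], mobility: List[List[float]], dt: float) -> List[float]:
    """Redistribute one compartment between regions according to mobility rates."""
    n = len(values)
    out = list(values)
    for j in range(n):
        for k in range(n):
            if j != k:
                flow = mobility[j][k] * values[j] * dt
                out[j] -= flow
                out[k] += flow
    return out


def step_mobility_sir(
    S: List[float], I: List[float], R: List[float],
    beta: float, gamma: float,
    mobility: List[List[float]], dt: float = 1.0,
) -> State:
    """One Euler step: local SIR update in each region, then movement between regions."""
    new_S, new_I, new_R = [], [], []
    for s, i, r in zip(S, I, R):
        pop = s + i + r
        infections = beta * s * i / pop * dt if pop > 0 else 0.0
        recoveries = gamma * i * dt
        new_S.append(s - infections)
        new_I.append(i + infections - recoveries)
        new_R.append(r + recoveries)
    return (
        _move(new_S, mobility, dt),
        _move(new_I, mobility, dt),
        _move(new_R, mobility, dt),
    )


def simulate(
    S: List[float], I: List[float], R: List[float],
    beta: float, gamma: float,
    mobility: List[List[float]], steps: int, dt: float = 1.0,
) -> State:
    """Run the step function repeatedly and return the final state."""
    for _ in range(steps):
        S, I, R = step_mobility_sir(S, I, R, beta, gamma, mobility, dt)
    return S, I, R
```

In this sketch an infection seeded in one region reaches another only through nonzero mobility rates, which is the qualitative effect of regional connectivity that the abstract describes.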

preprint, 2020, arXiv

Which bills are lobbied? Predicting and interpreting lobbying activity in the US

Using lobbying data from OpenSecrets.org, we offer several experiments applying machine learning techniques to predict whether a piece of legislation (US bill) has been subjected to lobbying activities. We also investigate the influence of the intensity of the lobbying activity on how discernible a lobbied bill is from one that was not subject to lobbying. We compare the performance of a number of different models (logistic regression, random forest, CNN and LSTM) and text embedding representations (BOW, TF-IDF, GloVe, Law2Vec). We report ROC AUC scores above 0.85 and 78% accuracy. Model performance improves significantly (0.95 ROC AUC and 88% accuracy) when bills with higher lobbying intensity are considered. We also propose a method that could be used for unlabelled data. Through this we show that there is a considerably large number of previously unlabelled US bills where our predictions suggest that some lobbying activity took place. We believe our method could potentially contribute to the enforcement of the US Lobbying Disclosure Act (LDA) by indicating the bills that were likely to have been affected by lobbying but were not filed as such.

preprint, 2014, arXiv

DAPriv: Decentralized architecture for preserving the privacy of medical data

The digitization of medical data has been a sensitive topic. Laws such as HIPAA provide some guidelines for electronic transactions in medical data to prevent attacks and fraudulent usage of private information. In our paper, we explore an architecture that uses hybrid computing with decentralized key management and show how it is suitable for preventing a special form of re-identification attack that we call the re-assembly attack. This architecture would be able to use current infrastructure, from mobile phones to server certificates and cloud-based decentralized storage models, in an efficient way to provide a reliable model for communication of medical data. We encompass entities including patients, doctors, insurance agents, emergency contacts, researchers, medical test laboratories and technicians. This is a complete architecture that provides patients with a good level of privacy, secure communication and more direct control.

preprint, 2014, arXiv

Missing data in multiplex networks: a preliminary study

A basic problem in the analysis of social networks is missing data. When a network model does not accurately capture all the actors or relationships in the social system under study, measures computed on the network and ultimately the final outcomes of the analysis can be severely distorted. For this reason, researchers in social network analysis have characterised the impact of different types of missing data on existing network measures. Recently a lot of attention has been devoted to the study of multiple-network systems, e.g., multiplex networks. In these systems missing data has an even more significant impact on the outcomes of the analyses. However, to the best of our knowledge, no study has focused on this problem yet. This work is a first step in the direction of understanding the impact of missing data in multiple networks. We first discuss the main reasons for missingness in these systems, then we explore the relation between various types of missing information and their effect on network properties. We provide initial experimental evidence based on both real and synthetic data.

preprint, 2014, arXiv

Spreading processes in Multilayer Networks

Several systems can be modeled as sets of interconnected networks or networks with multiple types of connections, here generally called multilayer networks. Spreading processes such as information propagation among users of online social networks, or the diffusion of pathogens among individuals through their contact network, are fundamental phenomena occurring in these networks. However, while information diffusion in single networks has received considerable attention from various disciplines for over a decade, spreading processes in multilayer networks are still a young research area presenting many challenging research issues. In this paper we review the main models, results and applications of multilayer spreading processes and discuss some promising research directions.

preprint, 2013, arXiv

PriSM: A Private Social Mesh for Leveraging Social Networking at Workplace

In this work we describe the PriSM framework for decentralized deployment of a federation of autonomous social networks (ASNs). The individual ASNs are centrally managed by organizations according to their institutional needs, while cross-ASN interactions are facilitated subject to security and confidentiality requirements specified by administrators and users of the ASNs. Such decentralized deployment, on either private or public clouds, gives individual organizations control and ownership of information and its flow. The lack of such complete control (if third-party online social networking services were to be used) has so far been a great barrier to taking full advantage in the workplace of the novel communication mechanisms that have nonetheless become commonplace for personal usage with the advent of Web 2.0 platforms and online social networks. PriSM provides a practical solution for organizations to harness the advantages of online social networking in both intra- and inter-organizational settings without sacrificing autonomy, security and confidentiality needs.

preprint, 2011, arXiv

SuperNova: Super-peers Based Architecture for Decentralized Online Social Networks

Recent years have seen several earnest initiatives from both academic researchers and open source communities to implement and deploy decentralized online social networks (DOSNs). The primary motivations for DOSNs are privacy and autonomy from big brotherly service providers. The promise of decentralization is complete freedom for end-users from any service providers, both in terms of keeping content and communication private and in terms of freedom from any form of censorship. However, decentralization introduces many challenges. One of the principal problems is to guarantee availability of data even when the data owner is not online, so that others can access the data even when a node is offline or down. In this paper, we argue that a pragmatic design needs to explicitly allow for and leverage system heterogeneity, and provide incentives for the resource-rich participants in the system to contribute such resources. To that end we introduce SuperNova, a super-peer based DOSN architecture. While proposing the SuperNova architecture, we envision a dynamic system driven by incentives and reputation. However, the investigation of such incentives and reputation, and their effect on peer behaviors, is a subject for our future study. In this paper we instead investigate the efficacy of a super-peer based system at any time point (a snapshot of the envisioned dynamic system); that is to say, we try to quantify the performance of the SuperNova system given any (fixed) mix of peer population and strategies.
