Researcher profile

Sourav Das

Sourav Das contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2022arXiv

Challenges and approaches to privacy preserving post-click conversion prediction

Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e.\ the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but, increasingly, regulatory and technical constraints are requiring privacy-preserving approaches. For example, major platforms are moving to restrict tracking individual user events across multiple applications, and governments around the world have shown steadily more interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. We show using offline experiments on real world data that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.

preprint2022arXiv

Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction

Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and advanced video-surveillance applications. This challenging task typically requires knowledge about past motion, the environment and likely destination areas. In this context, multi-modality is a fundamental aspect and its effective modeling can be beneficial to any architecture. Inferring accurate trajectories is nevertheless challenging, due to the inherently uncertain nature of the future. To overcome these difficulties, recent models use different inputs and propose to model human intentions using complex fusion mechanisms. In this respect, we propose a lightweight attention-based recurrent backbone that acts solely on past observed positions. Although this backbone already provides promising results, we demonstrate that its prediction accuracy can be improved considerably when combined with a scene-aware goal-estimation module. To this end, we employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations. We conduct extensive experiments on publicly-available datasets (i.e. SDD, inD, ETH/UCY) and show that our approach performs on par with state-of-the-art techniques while reducing model complexity.

preprint2022arXiv

PAC Mode Estimation using PPR Martingale Confidence Sequences

We consider the problem of correctly identifying the \textit{mode} of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is tackled very well by prior-posterior-ratio (PPR) martingale confidence sequences \citep{waudby-ramdas-ppr}, we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to $0$). PPR-1v1 is parameter-free and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains.

preprint2022arXiv

Probabilistic Impact Score Generation using Ktrain-BERT to Identify Hate Words from Twitter Discussions

Social media has seen a worrying rise in hate speech in recent times. Branching to several distinct categories of cyberbullying, gender discrimination, or racism, the combined label for such derogatory content can be classified as toxic content in general. This paper presents experimentation with a Keras wrapped lightweight BERT model to successfully identify hate speech and predict probabilistic impact score for the same to extract the hateful words within sentences. The dataset used for this task is the Hate Speech and Offensive Content Detection (HASOC 2021) data from FIRE 2021 in English. Our system obtained a validation accuracy of 82.60%, with a maximum F1-Score of 82.68%. Subsequently, our predictive cases performed significantly well in generating impact scores for successful identification of the hate tweets as well as the hateful words from tweet pools.

preprint2022arXiv

Relative Log-Symplectic structure on a semi-stable degeneration of moduli of Higgs bundles

In a recent paper \cite{3}, a semi-stable degeneration of moduli space of Higgs bundles on a curve has been constructed. In this paper, we show that there is a relative log-symplectic form on this degeneration, whose restriction to the generic fibre is the classical symplectic form discovered by Hitchin. We compute the Poisson ranks at every point and describe the symplectic foliation on the closed fibre. We also show that the closed fibre, which is a variety with normal crossing singularities, acquires a structure of an algebraically completely integrable system.

preprint2022arXiv

State-of-the-Art Review of Design of Experiments for Physics-Informed Deep Learning

This paper presents a comprehensive review of the design of experiments used in the surrogate models. In particular, this study demonstrates the necessity of the design of experiment schemes for the Physics-Informed Neural Network (PINN), which belongs to the supervised learning class. Many complex partial differential equations (PDEs) do not have any analytical solution; only numerical methods are used to solve the equations, which is computationally expensive. In recent decades, PINN has gained popularity as a replacement for numerical methods to reduce the computational budget. PINN uses physical information in the form of differential equations to enhance the performance of the neural networks. Though it works efficiently, the choice of the design of experiment scheme is important as the accuracy of the predicted responses using PINN depends on the training data. In this study, five different PDEs are used for numerical purposes, i.e., viscous Burger's equation, Shrödinger equation, heat equation, Allen-Cahn equation, and Korteweg-de Vries equation. A comparative study is performed to establish the necessity of the selection of a DoE scheme. It is seen that the Hammersley sampling-based PINN performs better than other DoE sample strategies.

preprint2021arXiv

Efficient Cross-Shard Transaction Execution in Sharded Blockchains

Sharding is a promising blockchain scaling solution. But it currently suffers from high latency and low throughput when it comes to cross-shard transactions, i.e., transactions that require coordination from multiple shards. The root cause of these limitations arise from the use of the classic two-phase commit protocol, which involves locking assets for extended periods of time. This paper presents Rivet, a new paradigm for blockchain sharding that achieves lower latency and higher throughput for cross-shard transactions. Rivet has a single reference shard running consensus, and multiple worker shards maintaining disjoint states and processing a subset of transactions in the system. Rivet obviates the need for consensus within each worker shard, and as a result, tolerates more failures within a shard and lowers communication overhead. We prove the correctness and security of Rivet. We also propose a more realistic framework for evaluating sharded blockchains by creating a benchmark based on real Ethereum transactions. An evaluation of our prototype implementation of Rivet and the baseline two-phase commit, atop 50+ AWS EC2 instances, using our evaluation framework demonstrates the latency and throughput improvements for cross-shard transactions.

preprint2020arXiv

Airmed: Efficient Self-Healing Network of Low-End Devices

The proliferation of application specific cyber-physical systems coupled with the emergence of a variety of attacks on such systems (malware such as Mirai and Hajime) underlines the need to secure such networks. Most existing security efforts have focused on only detection of the presence of malware. However given the ability of most attacks to spread through the network once they infect a few devices, it is important to contain the spread of a virus and at the same time systematically cleanse the impacted nodes using the communication capabilities of the network. Toward this end, we present Airmed - a method and system to not just detect corruption of the application software on a IoT node, but to self correct itself using its neighbors. Airmed's decentralized mechanisms prevent the spread of self-propagating malware and can also be used as a technique for updating application code on such IoT devices. Among the novelties of Airmed are a novel bloom-filter technique along with hardware support to identify position of the malware program from the benign application code, an adaptive self-check for computational efficiency, and a uniform random-backoff and stream signatures for secure and bandwidth efficient code exchange to correct corrupted devices. We assess the performance of Airmed, using the embedded systems security architecture of TrustLite in the OMNeT++ simulator. The results show that Airmed scales up to thousands of devices, ensures guaranteed update of the entire network, and can recover 95% of the nodes in 10 minutes in both internal and external propagation models. Moreover, we evaluate memory and communication costs and show that Airmed is efficient and incurs very low overhead.

preprint2020arXiv

Detecting Generic Music Features with Single Layer Feedforward Network using Unsupervised Hebbian Computation

With the ever-increasing number of digital music and vast music track features through popular online music streaming software and apps, feature recognition using the neural network is being used for experimentation to produce a wide range of results across a variety of experiments recently. Through this work, the authors extract information on such features from a popular open-source music corpus and explored new recognition techniques, by applying unsupervised Hebbian learning techniques on their single-layer neural network using the same dataset. The authors show the detailed empirical findings to simulate how such an algorithm can help a single layer feedforward network in training for music feature learning as patterns. The unsupervised training algorithm enhances their proposed neural network to achieve an accuracy of 90.36% for successful music feature detection. For comparative analysis against similar tasks, authors put their results with the likes of several previous benchmark works. They further discuss the limitations and thorough error analysis of their work. The authors hope to discover and gather new information about this particular classification technique and its performance, and further understand future potential directions and prospects that could improve the art of computational music feature recognition.

preprint2020arXiv

Further results on weighted core inverse in a ring

The notion of the weighted core inverse in a ring with involution was introduced, recently [Mosic et al. Comm. Algebra, 2018; 46(6); 2332-2345]. In this paper, we explore new representation and characterization of the weighted core inverse of sum and difference of two weighted core invertible elements in an unital ring with involution under different conditions. Further, we discuss reverse order laws and mixed-type reverse order laws for the weighted core invertible elements in a ring.