Source author record

Amitabha Bagchi

Amitabha Bagchi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Social and Information Networks; Data Structures and Algorithms; cs.CY; Databases; Distributed, Parallel, and Cluster Computing; Machine Learning; math.PR; Networking and Internet Architecture; physics.soc-ph; Artificial Intelligence; Discrete Mathematics; Information Retrieval; Multiagent Systems

Catalog footprint

What is connected

14 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

GRAPHGINI: Fostering Individual and Group Fairness in Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated impressive performance across various tasks, leading to their increased adoption in high-stakes decision-making systems. However, concerns have arisen about GNNs potentially generating unfair decisions for underprivileged groups or individuals when lacking fairness constraints. This work addresses this issue by introducing GraphGini, a novel approach that incorporates the Gini coefficient to enhance both individual and group fairness within the GNN framework. We rigorously establish that the Gini coefficient offers greater robustness and promotes equal opportunity among GNN outcomes, advantages not afforded by the prevailing Lipschitz constant methodology. Additionally, we employ the Nash social welfare program to ensure our solution yields a Pareto optimal distribution of group fairness. Extensive experimentation on real-world datasets demonstrates GraphGini's efficacy in significantly improving individual fairness compared to state-of-the-art methods while maintaining utility and group fairness.
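The Gini coefficient named in the abstract is a standard inequality measure. As a minimal sketch of that measure only (not GraphGini's training objective, which the abstract does not spell out), the following computes it for a list of non-negative values. The function name `gini` is an illustrative assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means all values are equal; values near 1.0 mean one item
    holds almost everything. `gini` is a hypothetical helper name.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i runs from 1 to n over the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal = gini([1, 1, 1, 1])      # every item equal
skewed = gini([0, 0, 0, 4])     # one item holds everything
```

Here `equal` evaluates to 0.0 and `skewed` to 0.75, the maximum for four items under this formula.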

preprint · 2022 · arXiv

CHEX: Multiversion Replay with Ordered Checkpoints

In scientific computing and data science disciplines, it is often necessary to share application workflows and repeat results. Current tools containerize application workflows, and share the resulting container for repeating results. These tools, due to containerization, do improve sharing of results. However, they do not improve the efficiency of replay. In this paper, we present the multiversion replay problem which arises when multiple versions of an application are containerized, and each version must be replayed to repeat results. To avoid executing each version separately, we develop CHEX, which checkpoints program state and determines when it is permissible to reuse program state across versions. It does so using system call-based execution lineage. Our capability to identify common computations across versions enables us to consider optimizing replay using an in-memory cache, based on a checkpoint-restore-switch system. We show the multiversion replay problem is NP-hard, and propose efficient heuristics for it. CHEX reduces overall replay time by sharing common computations but avoids storing a large number of checkpoints. We demonstrate that CHEX maintains lightweight package sharing, and improves the total time of multiversion replay by 50% on average.

preprint · 2022 · arXiv

FairFoody: Bringing in Fairness in Food Delivery

Along with the rapid growth and rise to prominence of food delivery platforms, concerns have also risen about the terms of employment of the gig workers underpinning this growth. Our analysis of data derived from a real-world food delivery platform across three large cities in India shows that there is significant inequality in the money delivery agents earn. In this paper, we formulate the problem of fair income distribution among agents while also ensuring timely food delivery. We establish that the problem is not only NP-hard but also inapproximable in polynomial time. We overcome this computational bottleneck through a novel matching algorithm called FairFoody. Extensive experiments over real-world food delivery datasets show FairFoody imparts up to 10 times improvement in equitable income distribution when compared to baseline strategies, while also ensuring minimal impact on customer experience.

preprint · 2022 · arXiv

Gigs with Guarantees: Achieving Fair Wage for Food Delivery Workers

With the increasing popularity of food delivery platforms, it has become pertinent to look into the working conditions of the 'gig' workers in these platforms, especially providing them fair wages, reasonable working hours, and transparency on work availability. However, any solution to these problems must not degrade customer experience and be cost-effective to ensure that platforms are willing to adopt them. We propose WORK4FOOD, which provides income guarantees to delivery agents, while minimizing platform costs and ensuring customer satisfaction. WORK4FOOD ensures that the income guarantees are met in such a way that it does not lead to increased working hours or degrade environmental impact. To incorporate these objectives, WORK4FOOD balances supply and demand by controlling the number of agents in the system and providing dynamic payment guarantees to agents based on factors such as agent location, ratings, etc. We evaluate WORK4FOOD on a real-world dataset from a leading food delivery platform and establish its advantages over the state of the art in terms of the multi-dimensional objectives at hand.

preprint · 2020 · arXiv

A Distributed Laplacian Solver and its Applications to Electrical Flow and Random Spanning Tree Computation

We use queueing networks to present a new approach to solving Laplacian systems. This marks a significant departure from the existing techniques, mostly based on graph-theoretic constructions and sampling. Our distributed solver works for a large and important class of Laplacian systems that we call "one-sink" Laplacian systems. Specifically, our solver can produce solutions for systems of the form $Lx = b$ where exactly one of the coordinates of $b$ is negative. Our solver is a distributed algorithm that takes $\widetilde{O}(t_{hit} d_{\max})$ time (where $\widetilde{O}$ hides $\text{poly}\log n$ factors) to produce an approximate solution where $t_{hit}$ is the worst-case hitting time of the random walk on the graph, which is $\Theta(n)$ for a large set of important graphs, and $d_{\max}$ is the generalized maximum degree of the graph. The class of one-sink Laplacians includes the important voltage computation problem and allows us to compute the effective resistance between nodes in a distributed setting. As a result, our Laplacian solver can be used to adapt the approach by Kelner and Mądry (2009) to give the first distributed algorithm to compute approximate random spanning trees efficiently.

preprint · 2020 · arXiv

Batching and Matching for Food Delivery in Dynamic Road Networks

Given a stream of food orders and available delivery vehicles, how should orders be assigned to vehicles so that the delivery time is minimized? Several decisions have to be made: (1) assignment of orders to vehicles, (2) grouping orders into batches to cope with limited vehicle availability, and (3) adapting to dynamic positions of delivery vehicles. We show that the minimization problem is not only NP-hard but inapproximable in polynomial time. To mitigate this computational bottleneck, we develop an algorithm called FoodMatch, which maps the vehicle assignment problem to that of minimum weight perfect matching on a bipartite graph. To further reduce the quadratic construction cost of the bipartite graph, we deploy best-first search to only compute a subgraph that is highly likely to contain the minimum matching. The solution quality is further enhanced by reducing batching to a graph clustering problem and anticipating dynamic positions of vehicles through angular distance. Extensive experiments on food-delivery data from large metropolitan cities establish that FoodMatch is substantially better than baseline strategies on a number of metrics, while being efficient enough to handle real-world workloads.
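The abstract maps vehicle assignment to minimum weight perfect matching on a bipartite graph. As a rough illustration of that target problem only, the sketch below solves a tiny instance by brute force over all assignments. This is not FoodMatch: the abstract does not say which matching solver is used, and brute force is only workable for a handful of orders. The function name and the cost matrix are hypothetical.

```python
from itertools import permutations


def min_weight_perfect_matching(cost):
    """Brute-force minimum weight perfect matching.

    cost[i][j] is the (hypothetical) cost of assigning order i to
    vehicle j. Returns (total_cost, assignment), where assignment[i]
    is the vehicle chosen for order i. Runs in O(n!) time, so it is
    only meant for very small illustrative inputs.
    """
    n = len(cost)
    if n == 0 or any(len(row) != n for row in cost):
        raise ValueError("cost must be a non-empty square matrix")
    best_total, best_assignment = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_assignment = total, perm
    return best_total, list(best_assignment)


# Made-up delivery costs for 3 orders and 3 vehicles.
costs = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = min_weight_perfect_matching(costs)
```

For this matrix the cheapest assignment sends order 0 to vehicle 1, order 1 to vehicle 0 and order 2 to vehicle 2, for a total cost of 5. A polynomial-time method such as the Hungarian algorithm would be the usual choice at realistic sizes.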

preprint · 2020 · arXiv

Lecture notes: Efficient approximation of kernel functions

These lecture notes endeavour to collect in one place the mathematical background required to understand the properties of kernels in general and the Random Fourier Features approximation of Rahimi and Recht (NIPS 2007) in particular. We briefly motivate the use of kernels in Machine Learning with the example of the support vector machine. We discuss positive definite and conditionally negative definite kernels in some detail. After a brief discussion of Hilbert spaces, including the Reproducing Kernel Hilbert Space construction, we present Mercer's theorem. We discuss the Random Fourier Features technique and then present, with proofs, scalar and matrix concentration results that help us estimate the error incurred by the technique. These notes are the transcription of 10 lectures given at IIT Delhi between January and April 2020.
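To make the Random Fourier Features idea concrete, here is a minimal pure-Python sketch for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. It assumes the common construction with frequencies drawn from $N(0, I/\sigma^2)$ and phases drawn uniformly from $[0, 2\pi]$. The function names are illustrative and are not taken from the notes.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel value for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Return a feature map z with z(x) . z(y) approximating the kernel.

    Frequencies w ~ N(0, I / sigma^2) and phases b ~ Uniform[0, 2*pi],
    with z(x)_k = sqrt(2 / D) * cos(w_k . x + b_k).
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features


def approx_kernel(zx, zy):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(zx, zy))


features = make_rff(dim=3, num_features=4000)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
exact = gaussian_kernel(x, y)
estimate = approx_kernel(features(x), features(y))
```

The estimate is random, so it only matches the exact value up to an error that shrinks roughly like $1/\sqrt{D}$ in the number of features $D$. Concentration results of the kind the notes present are what quantify that error.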

preprint · 2015 · arXiv

On the Role of Conductance, Geography and Topology in Predicting Hashtag Virality

We focus on three aspects of the early spread of a hashtag in order to predict whether it will go viral: the network properties of the subset of users tweeting the hashtag, its geographical properties, and, most importantly, its conductance-related properties. One of our significant contributions is to discover the critical role played by the conductance-based features for the successful prediction of virality. More specifically, we show that the first derivative of the conductance gives an early indication of whether the hashtag is going to go viral or not. We present a detailed experimental evaluation of the effect of our various categories of features on the virality prediction task. When compared to the baselines and the state-of-the-art techniques proposed in the literature, our feature set is able to achieve significantly better accuracy on a large dataset of 7.7 million users and all their tweets over a period of a month, as well as on existing datasets.
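For readers unfamiliar with conductance, the sketch below computes the textbook conductance of a vertex set in an undirected graph: the number of edges leaving the set divided by the smaller of the two volumes (degree sums). This is an assumption about the general notion only. The exact conductance-based features used in the paper are not given in the abstract, and the function name and toy graph are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph.

    adj maps each node to a list of its neighbours (each edge is
    listed in both directions). Returns cut(S, rest) divided by
    min(vol(S), vol(rest)), where vol is the sum of degrees.
    """
    s = set(subset)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Toy path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
half = conductance(path, {0, 1})   # one cut edge, both volumes equal 3
tip = conductance(path, {0})       # one cut edge, volume of {0} is 1
```

On this path, `half` is 1/3 and `tip` is 1.0. Tracking how such a value changes between successive time steps gives a discrete analogue of the first derivative mentioned in the abstract.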

preprint · 2014 · arXiv

Optimal Radius for Connectivity in Duty-Cycled Wireless Sensor Networks

We investigate the condition on transmission radius needed to achieve connectivity in duty-cycled wireless sensor networks (briefly, DC-WSN). First, we settle a conjecture of Das et al. (2012) and prove that the connectivity condition on Random Geometric Graphs (RGG), given by Gupta and Kumar (1989), can be used to derive a weak sufficient condition to achieve connectivity in DC-WSN. To find a stronger result, we define a new vertex-based random connection model which is of independent interest. Following a proof technique of Penrose (1991), we prove that as the density of the nodes approaches infinity, a finite component of size greater than 1 exists with probability 0 in this model. We use this result to obtain a condition on node transmission radius that is both necessary and sufficient to achieve connectivity and is hence optimal. The optimality of such a radius is also tested via simulation for two specific duty-cycle schemes, called the contiguous and the random selection duty-cycle scheme. Finally, we design a minimum-radius duty-cycling scheme that achieves connectivity with a transmission radius arbitrarily close to the one required in Random Geometric Graphs. The overhead in this case is that we have to spend some time computing the schedule.

preprint · 2012 · arXiv

Topic Diffusion and Emergence of Virality in Social Networks

We propose a stochastic model for the diffusion of topics entering a social network modeled by a Watts-Strogatz graph. Our model sets into play an implicit competition between these topics as they vie for the attention of users in the network. The dynamics of our model are based on notions taken from real-world OSNs like Twitter where users either adopt an exogenous topic or copy topics from their neighbors leading to endogenous propagation. When instantiated correctly, the model achieves a viral regime where a few topics garner unusually good response from the network, closely mimicking the behavior of real-world OSNs. Our main contribution is our description of how clusters of proximate users that have spoken on the topic merge to form a large giant component making a topic go viral. This demonstrates that it is not weak ties but actually strong ties that play a major part in virality. We further validate our model and our hypotheses about its behavior by comparing our simulation results with the results of a measurement study conducted on real data taken from Twitter.

preprint · 2011 · arXiv

Spatio-Temporal Analysis of Topic Popularity in Twitter

We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 4000 topics that include both popular and less popular topics. On a data set containing approximately 10 million users and a comprehensive scraping of all the tweets posted by these users between June 2009 and August 2009 (approximately 200 million tweets), we perform a rigorous temporal and spatial analysis, investigating the time-evolving properties of the subgraphs formed by the users discussing each topic. We focus on two different notions of the spatial: the network topology formed by follower-following links on Twitter, and the geospatial location of the users. We investigate the effect of initiators on the popularity of topics and find that users with a high number of followers have a strong impact on popularity. We deduce that topics become popular when disjoint clusters of users discussing them begin to merge and form one giant component that grows to cover a significant fraction of the network. Our geospatial analysis shows that highly popular topics are those that cross regional boundaries aggressively.

preprint · 2010 · arXiv

Relating Web pages to enable information-gathering tasks

We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages which are related to the page the user is currently viewing. We define three kinds of intentional relationships that correspond to whether the user is a) seeking sources of information, b) reading pages which provide information, or c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be productively mined using a combination of textual and link information and provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. These scoring mechanisms incorporate both textual and link information. We build a set of capacitated subnetworks - each corresponding to a particular keyword - that mirror the interconnection structure of the World Wide Web. The scores are computed by computing flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanism by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare those to the top results returned by Google's Similar Pages feature, and the Companion algorithm proposed by Dean and Henzinger (1999).
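The hub and authority values mentioned in the abstract come from Kleinberg's iterative scheme. The sketch below is a plain-Python version of that iteration on a small directed graph, under the usual formulation: a page's authority is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, and both vectors are renormalised each round. How the paper turns these values into link capacities is not stated in the abstract, so that step is not shown. The function names and the toy graph are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def hits(edges, num_nodes, iterations=50):
    """Hub and authority scores by repeated mutual reinforcement.

    edges is a list of (source, target) pairs over nodes 0..num_nodes-1.
    Returns (hub, auth) as two lists of floats.
    """
    hub = [1.0] * num_nodes
    auth = [1.0] * num_nodes
    for _ in range(iterations):
        # Authority update: sum hub scores of pages that link in.
        new_auth = [0.0] * num_nodes
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = _normalize(new_auth)
        # Hub update: sum authority scores of pages linked to.
        new_hub = [0.0] * num_nodes
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = _normalize(new_hub)
    return hub, auth


# Toy graph: pages 0 and 1 both link to page 2.
hub, auth = hits([(0, 2), (1, 2)], num_nodes=3)
```

In this toy graph page 2 ends up with all of the authority weight, while pages 0 and 1 share the hub weight equally.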

preprint · 2009 · arXiv

Hierarchical neighbor graphs: A low stretch connected structure for points in Euclidean space

We introduce hierarchical neighbor graphs, a new architecture for connecting ad hoc wireless nodes distributed in a plane. The structure has the flavor of hierarchical clustering and requires only local knowledge and minimal computation at each node to be formed and repaired. Hence, it is a suitable interconnection model for an ad hoc wireless sensor network. The structure is able to use energy efficiently by reorganizing dynamically when the battery power of heavily utilized nodes degrades and is able to achieve throughput, energy efficiency and network lifetimes that compare favorably with the leading proposals for data collation in sensor networks such as LEACH (Heinzelman et al., 2002). Additionally, hierarchical neighbor graphs have low power stretch, i.e., the power required to connect nodes through the network is a small factor higher than the power required to connect them directly. Our structure also compares favorably to mathematical structures proposed for connecting points in a plane, e.g., nearest-neighbor graphs (Ballister et al., 2005) and $\theta$-graphs (Ruppert and Seidel, 1991), in that it has expected constant degree and does not require any significant computation or global information to be formed.

preprint · 2004 · arXiv

The Effect of Faults on Network Expansion

In this paper we study the problem of how resilient networks are to node faults. Specifically, we investigate the question of how many faults a network can sustain so that it still contains a large (i.e. linear-sized) connected component that still has approximately the same expansion as the original fault-free network. For this we apply a pruning technique which culls away parts of the faulty network which have poor expansion. This technique can be applied to both adversarial faults and to random faults. For adversarial faults we prove that for every network with expansion $\alpha$, a large connected component with basically the same expansion as the original network exists for up to a constant times $\alpha n$ faults. This result is tight in the sense that every graph $G$ of size $n$ and uniform expansion $\alpha(\cdot)$, i.e. $G$ has an expansion of $\alpha(n)$ and every subgraph $G'$ of size $m$ of $G$ has an expansion of $O(\alpha(m))$, can be broken into sublinear components with $\omega(\alpha(n) n)$ faults. For random faults we observe that the situation is significantly different, because in this case the expansion of a graph only gives a very weak bound on its resilience to random faults. More specifically, there are networks of uniform expansion $O(\sqrt{n})$ that are resilient against a constant fault probability but there are also networks of uniform expansion $\Omega(1/\log n)$ that are not resilient against an $O(1/\log n)$ fault probability. Thus, a different parameter is needed. For this we introduce the span of a graph which allows us to determine the maximum fault probability in a much better way than the expansion can. We use the span to show the first known results for the effect of random faults on the expansion of $d$-dimensional meshes.
