Source author record

Arijit Khan

Arijit Khan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Databases; Data Structures and Algorithms; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Machine Learning; Social and Information Networks

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators

preprint, 2024, arXiv

View-based Explanations for Graph Neural Networks

Generating explanations for graph neural networks (GNNs) has been studied to understand their behavior in analytical tasks such as graph classification. Existing approaches aim to understand the overall results of GNNs rather than providing explanations for specific class labels of interest, and may return explanation structures that are hard to access and not directly queryable. We propose GVEX, a novel paradigm that generates Graph Views for EXplanation. (1) We design a two-tier explanation structure called explanation views. An explanation view consists of a set of graph patterns and a set of induced explanation subgraphs. Given a database G of multiple graphs and a specific class label l assigned by a GNN-based classifier M, it concisely describes the fraction of G that best explains why l is assigned by M. (2) We propose quality measures and formulate an optimization problem to compute optimal explanation views for GNN explanation. We show that the problem is $\Sigma^2_P$-hard. (3) We present two algorithms. The first one follows an explain-and-summarize strategy that first generates high-quality explanation subgraphs which best explain GNNs in terms of feature influence maximization, and then performs a summarization step to generate patterns. We show that this strategy provides an approximation ratio of 1/2. Our second algorithm makes a single pass over an input node stream in batches to incrementally maintain explanation views, with an anytime approximation guarantee of 1/4. Using real-world benchmark data, we experimentally demonstrate the effectiveness, efficiency, and scalability of GVEX. Through case studies, we showcase the practical applications of GVEX.

preprint, 2022, arXiv

Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling

A knowledge graph (KG) manages large-scale and real-world facts as a big graph in a schema-flexible manner. Aggregate queries are a fundamental query type over KGs, e.g., "what is the average price of cars produced in Germany?". Despite their importance, answering aggregate queries on KGs has received little attention in the literature. Aggregate queries can be supported based on factoid queries, e.g., "find all cars produced in Germany", by applying an additional aggregate operation on factoid queries' answers. However, this straightforward method is challenging because both the accuracy and efficiency of factoid query processing will seriously impact the performance of aggregate queries. In this paper, we propose a "sampling-estimation" model to answer aggregate queries over KGs, which is the first work to provide an approximate aggregate result with an effective accuracy guarantee, and without relying on factoid queries. Specifically, we first present a semantic-aware sampling method to collect a high-quality random sample through a random walk based on knowledge graph embedding. Then, we propose unbiased estimators for COUNT and SUM, and a consistent estimator for AVG, to compute the approximate aggregate results based on the random sample, with an accuracy guarantee in the form of a confidence interval. We extend our approach to support iterative improvement of accuracy, and more complex queries with filter, GROUP-BY, and different graph shapes, e.g., chain, cycle, star, flower. Extensive experiments over real-world KGs demonstrate the effectiveness and efficiency of our approach.
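
The sampling-estimation idea in this abstract can be illustrated with a generic Hansen-Hurwitz style estimator, shown below as a minimal sketch. It is not the paper's estimator: the function name `estimate_aggregates`, the input format (pairs of a value and its per-draw selection probability), and the normal-approximation confidence interval are all assumptions made for illustration. Under those assumptions, COUNT and SUM are unbiased, and AVG is a ratio of the two and is therefore only consistent.

```python
import math
import random
from statistics import mean, stdev


def estimate_aggregates(sample, z=1.96):
    """Estimate COUNT, SUM and AVG from a weighted sample drawn with replacement.

    sample: list of (value, prob) pairs, where prob is the probability that
            this answer is picked in a single draw (assumed known).
    z:      normal quantile for the confidence interval (1.96 ~ 95%).
    """
    n = len(sample)
    count_terms = [1.0 / p for _, p in sample]   # each draw "represents" 1/p answers
    sum_terms = [v / p for v, p in sample]       # value scaled by the same factor

    count_est = mean(count_terms)                # unbiased for the number of answers
    sum_est = mean(sum_terms)                    # unbiased for the total value
    avg_est = sum_est / count_est                # ratio estimator: consistent only

    # Normal-approximation confidence interval for SUM (an assumption of this sketch).
    half = z * stdev(sum_terms) / math.sqrt(n) if n > 1 else float("inf")
    return {
        "count": count_est,
        "sum": sum_est,
        "avg": avg_est,
        "sum_ci": (sum_est - half, sum_est + half),
    }


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical car prices and hypothetical per-draw sampling probabilities.
    prices = [10.0, 20.0, 30.0, 40.0]
    probs = [0.1, 0.2, 0.3, 0.4]
    draws = random.choices(list(zip(prices, probs)), weights=probs, k=500)
    print(estimate_aggregates(draws))
```

Drawing more samples narrows the interval, which is one simple way to picture the iterative accuracy improvement the abstract mentions.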

preprint, 2022, arXiv

Efficiently Embedding Dynamic Knowledge Graphs

Knowledge graph (KG) embedding encodes the entities and relations from a KG into low-dimensional vector spaces to support various applications such as KG completion, question answering, and recommender systems. In the real world, knowledge graphs (KGs) are dynamic and evolve over time with the addition or deletion of triples. However, most existing models focus on embedding static KGs while neglecting dynamics. To adapt to the changes in a KG, these models need to be retrained on the whole KG at a high time cost. In this paper, to tackle the aforementioned problem, we propose a new context-aware Dynamic Knowledge Graph Embedding (DKGE) method which supports embedding learning in an online fashion. DKGE introduces two different representations (i.e., knowledge embedding and contextual element embedding) for each entity and each relation, in the joint modeling of entities and relations as well as their contexts, by employing two attentive graph convolutional networks, a gate strategy, and translation operations. This effectively helps limit the impact of a KG update to certain regions, rather than the entire graph, so that DKGE can rapidly acquire the updated KG embedding by a proposed online learning algorithm. Furthermore, DKGE can also learn KG embedding from scratch. Experiments on the tasks of link prediction and question answering in a dynamic environment demonstrate the effectiveness and efficiency of DKGE.

preprint, 2022, arXiv

Voting-based Opinion Maximization

We investigate the novel problem of voting-based opinion maximization in a social network: Find a given number of seed nodes for a target campaigner, in the presence of other competing campaigns, so as to maximize a voting-based score for the target campaigner at a given time horizon. The bulk of the influence maximization literature assumes that social network users can switch between only two discrete states, inactive and active, and the choice to switch is frozen upon one-time activation. In reality, even when having a preferred opinion, a user may not completely despise the other opinions, and the preference level may vary over time due to social influence. To this end, we employ models rooted in opinion formation and diffusion, and use several voting-based scores to determine a user's vote for each of the multiple campaigners at a given time horizon. Our problem is NP-hard and non-submodular for various scores. We design greedy seed selection algorithms with quality guarantees for our scoring functions via sandwich approximation. To improve the efficiency, we develop random walk and sketch-based opinion computation, with quality guarantees. Empirical results validate our effectiveness, efficiency, and scalability.

preprint, 2021, arXiv

Multi-relation Graph Summarization

Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this paper, we study the novel problem of producing summaries of multi-relation networks, i.e., graphs where multiple edges of different types may exist between any pair of nodes. Multi-relation graphs are an expressive model of real-world activities, in which a relation can be a topic in social networks, an interaction type in genetic networks, or a snapshot in temporal graphs. The first approach that we consider for multi-relation graph summarization is a two-step method based on summarizing each relation in isolation, and then aggregating the resulting summaries in some clever way to produce a final unique summary. In doing this, as a side contribution, we provide the first polynomial-time approximation algorithm based on the k-Median clustering for the classic problem of lossless single-relation graph summarization. Then, we demonstrate the shortcomings of these two-step methods, and propose holistic approaches, both approximate and heuristic algorithms, to compute a summary directly for multi-relation graphs. In particular, we prove that the approximation bound of k-Median clustering for the single relation solution can be maintained in a multi-relation graph with proper aggregation operation over adjacency matrices corresponding to its multiple relations. Experimental results and case studies (on co-authorship networks and brain networks) validate the effectiveness and efficiency of the proposed algorithms.

preprint, 2020, arXiv

Reliability Maximization in Uncertain Graphs

Network reliability measures the probability that a target node is reachable from a source node in an uncertain graph, i.e., a graph where every edge is associated with a probability of existence. In this paper, we investigate the novel and fundamental problem of adding a small number of edges in the uncertain network for maximizing the reliability between a given pair of nodes. We study the NP-hardness and the approximation hardness of our problem, and design effective, scalable solutions. Furthermore, we consider extended versions of our problem (e.g., multiple source and target nodes can be provided as input) to support and demonstrate a wider family of queries and applications, including sensor network reliability maximization and social influence maximization. Experimental results validate the effectiveness and efficiency of the proposed algorithms.
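
To make the reliability definition above concrete, here is a minimal sketch that computes source-to-target reliability exactly by enumerating every possible world of a tiny uncertain graph, and then brute-forces the budget-one version of the edge-addition problem. The function names, the `(u, v, p)` edge format, and the candidate list are assumptions for illustration, not the paper's algorithms. Enumeration is exponential in the number of edges, so this only works on toy inputs.

```python
from itertools import product


def reachable(edges, source, target):
    """Depth-first search over a list of directed (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def exact_reliability(prob_edges, source, target):
    """Sum the probabilities of all possible worlds in which target is reachable.

    prob_edges: list of (u, v, p) directed edges, each existing independently
                with probability p.
    """
    total = 0.0
    for mask in product([False, True], repeat=len(prob_edges)):
        weight = 1.0
        present = []
        for keep, (u, v, p) in zip(mask, prob_edges):
            weight *= p if keep else (1.0 - p)
            if keep:
                present.append((u, v))
        if weight > 0 and reachable(present, source, target):
            total += weight
    return total


def best_single_edge(prob_edges, candidates, source, target):
    """Brute force for a budget of one: try each candidate edge, keep the best."""
    return max(
        candidates,
        key=lambda e: exact_reliability(prob_edges + [e], source, target),
    )


if __name__ == "__main__":
    graph = [("s", "a", 0.5), ("a", "t", 0.5)]
    print(exact_reliability(graph, "s", "t"))
    print(best_single_edge(graph, [("s", "t", 0.4), ("s", "a", 0.9)], "s", "t"))
```

Replacing the exhaustive enumeration with sampled possible worlds would give a Monte Carlo estimate that scales further, at the cost of estimation error.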

preprint, 2020, arXiv

Semantic Guided and Response Times Bounded Top-k Similarity Search over Knowledge Graphs

Graph queries have recently been widely adopted for querying knowledge graphs. Given a query graph $G_Q$, a graph query finds subgraphs in a knowledge graph $G$ that exactly or approximately match $G_Q$. Graph querying faces two challenges: (1) the structural gap between $G_Q$ and the predefined schema in $G$ causes mismatches with the query graph, and (2) users cannot view the answers until the graph query terminates, leading to a longer system response time (SRT). In this paper, we propose a semantic-guided and response-time-bounded graph query to return the top-k answers effectively and efficiently. We leverage a knowledge graph embedding model to build the semantic graph $SG_Q$, and we define the path semantic similarity ($pss$) over $SG_Q$ as the metric to evaluate the answer's quality. Then, we propose an A* semantic search on $SG_Q$ to find the top-k answers with the greatest $pss$ via a heuristic $pss$ estimation. Furthermore, we make an approximate optimization on A* semantic search to allow users to trade off effectiveness for SRT within a user-specific time bound. Extensive experiments over real datasets confirm the effectiveness and efficiency of our solution.

preprint, 2016, arXiv

Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly

We study distributed graph algorithms that adopt an iterative vertex-centric framework for graph processing, popularized by Google's Pregel system. Since then, there have been several attempts to implement many graph algorithms in a vertex-centric framework, as well as efforts to design optimization techniques for improving the efficiency. However, to the best of our knowledge, there has not been any systematic study to compare these vertex-centric implementations with their sequential counterparts. Our paper addresses this gap in two ways. (1) We analyze the computational complexity of such implementations with the notion of time-processor product, and benchmark several vertex-centric graph algorithms to check whether they perform more work than their best-known sequential solutions. (2) Employing the concept of balanced practical Pregel algorithms, we study whether these implementations suffer from imbalanced workload and a large number of iterations. Our findings illustrate that, with the exception of the Euler tour tree algorithm, all other algorithms either perform more work than their best-known sequential approach, or suffer from imbalanced workload or a large number of iterations, or even both. We also highlight graph algorithms that are fundamentally difficult to express in vertex-centric frameworks, and conclude by discussing the road ahead for distributed graph processing.
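
As a rough illustration of the vertex-centric model discussed in this abstract, the sketch below simulates Pregel-style supersteps on a single machine for connected components, using the commonly described "hash-min" idea in which each vertex repeatedly adopts the smallest label it has heard of. The function name, the adjacency-dictionary input, and the choice of algorithm are assumptions made here; the source does not say which algorithms were benchmarked. The returned superstep and message counts serve as a crude proxy for the iteration count and total work the abstract refers to.

```python
def hash_min_components(adjacency):
    """Simulate vertex-centric connected components on an undirected graph.

    adjacency: dict mapping each vertex to a list of neighbors, with every
               undirected edge listed in both directions.
    Returns (labels, supersteps, messages_sent).
    """
    labels = {v: v for v in adjacency}      # each vertex starts as its own label
    inbox = {v: [] for v in adjacency}
    messages = 0

    # Superstep 0: every vertex sends its own id to all of its neighbors.
    for v, neighbors in adjacency.items():
        for u in neighbors:
            inbox[u].append(labels[v])
            messages += 1
    supersteps = 1

    # Later supersteps: only vertices that receive a smaller label act.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for v, received in inbox.items():
            if received and min(received) < labels[v]:
                labels[v] = min(received)
                for u in adjacency[v]:      # propagate the improved label
                    outbox[u].append(labels[v])
                    messages += 1
        inbox = outbox                      # synchronous barrier between supersteps
        supersteps += 1

    return labels, supersteps, messages


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(hash_min_components(path))
```

On a long path the smallest label has to travel one hop per superstep, so the number of supersteps grows with the graph diameter, whereas a sequential traversal visits each edge a constant number of times.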

preprint, 2013, arXiv

Querying Knowledge Graphs by Example Entity Tuples

We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over simple tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, GQBE (Graph Query By Example), automatically derives a weighted hidden maximal query graph based on input query tuples, to capture a user's query intent. It efficiently finds and ranks the top approximate answer tuples. For fast query processing, GQBE only partially evaluates query graphs. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.
