Source author record

Debarati Das

Debarati Das appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · physics.soc-ph · Social and Information Networks

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators


Preprint · 2022 · arXiv

Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

We consider the numerical taxonomy problem of fitting a positive distance function ${D:{S\choose 2}\rightarrow \mathbb R_{>0}}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match those in $D$. A nice application is in evolutionary biology, where the tree $T$ aims to approximate the branching process leading to the observed distances in $D$ [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in $S$. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was $O((\log n)(\log \log n))$ by Ailon and Charikar [2005] who wrote "Determining whether an $O(1)$ approximation can be obtained is a fascinating question".
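
To make the objective concrete, the following Python sketch only evaluates the total error of a given candidate tree; it is not the paper's approximation algorithm. It assumes (hypothetically) that D is a dict keyed by frozenset pairs of points and that T is a list of weighted (u, v, w) edges; the names tree_distances and total_error are illustrative.

```python
def tree_distances(edges, source):
    """Distances from `source` to every vertex of a weighted tree given as (u, v, w) edges."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:  # in a tree, any traversal order yields the unique path lengths
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def total_error(D, edges):
    """Sum of |D(i, j) - dist_T(i, j)| over all unordered pairs of points in S."""
    points = sorted({p for pair in D for p in pair})
    err = 0.0
    for idx, i in enumerate(points):
        dist = tree_distances(edges, i)  # one traversal per source point
        for j in points[idx + 1:]:
            err += abs(D[frozenset((i, j))] - dist[j])
    return err


# Star tree with a Steiner vertex "x"; only the pair (b, c) is off by 1.
D = {frozenset(("a", "b")): 3.0, frozenset(("a", "c")): 4.0, frozenset(("b", "c")): 6.0}
edges = [("a", "x", 1.0), ("b", "x", 2.0), ("c", "x", 3.0)]
print(total_error(D, edges))  # 1.0
```

Note that the tree may contain vertices outside S (here "x"), matching the requirement that S appear "among the vertices" of T.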

Preprint · 2021 · arXiv

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within an approximation factor $\text{poly}(\log n)$. In this paper, we provide an algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.
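
The exact quadratic-time baseline mentioned above is the classic dynamic program. A minimal Python sketch (unit costs for insertion, deletion and substitution, keeping only two rows of the table) looks like this; it is the textbook baseline, not the paper's sub-quadratic approximation algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact edit distance in O(len(a) * len(b)) time and O(len(b)) space."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)  # distance from a[:i] to "" is i deletions
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j] + 1,  # delete a[i-1]
                curr[j - 1] + 1,  # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute (free if equal)
            )
        prev = curr
    return prev[len(b)]


print(edit_distance("kitten", "sitting"))  # 3
```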

Preprint · 2020 · arXiv

No Repetition: Fast Streaming with Highly Concentrated Hashing

To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of this approach are small space algorithms for estimating the number of distinct elements in a stream, or estimating the set similarity between large sets. Using standard strongly universal hashing to process each element, we get a sketch based estimator where the probability of a too large error is, say, 1/4. By performing $r$ independent repetitions and taking the median of the estimators, the error probability falls exponentially in $r$. However, running $r$ independent experiments increases the processing time by a factor $r$. Here we make the point that if we have a hash function with strong concentration bounds, then we get the same high probability bounds without any need for repetitions. Instead of $r$ independent sketches, we have a single sketch that is $r$ times bigger, so the total space is the same. However, we only apply a single hash function, so we save a factor $r$ in time, and the overall algorithms just get simpler. Fast practical hash functions with strong concentration bounds were recently proposed by Aamand et al. (to appear in STOC 2020). Using their hashing schemes, the algorithms thus become very fast and practical, suitable for online processing of high volume data streams.
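
The trade-off described above can be sketched in Python with a standard bottom-k (k-minimum-values) distinct-elements estimator. This is an illustration under stated assumptions: the hash is a keyed BLAKE2b stand-in, not the concentrated hashing scheme of Aamand et al., and the function names are hypothetical. median_of_repetitions runs r independent sketches of size k and takes the median, while single_big_sketch uses one hash function and one sketch of size r·k. Both store r·k hash values, but the latter evaluates only one hash per stream element.

```python
import hashlib
import heapq
import statistics


def unit_hash(x, seed: int) -> float:
    """Stand-in hash of x into [0, 1); NOT the concentrated hashing from the paper."""
    h = hashlib.blake2b(str(x).encode(), digest_size=8, key=seed.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big") / 2**64


def bottom_k_estimate(stream, k: int, seed: int) -> float:
    """Estimate the number of distinct elements from the k smallest hash values."""
    heap = []  # max-heap (negated values) holding the k smallest distinct hashes
    members = set()  # hash values currently in the heap, to ignore repeats
    for x in stream:
        v = unit_hash(x, seed)
        if v in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            members.add(v)
        elif v < -heap[0]:
            removed = -heapq.heappushpop(heap, -v)  # evicts the current k-th smallest
            members.discard(removed)
            members.add(v)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct elements: count is exact
    return (k - 1) / (-heap[0])  # standard bottom-k estimator (k - 1) / v_k


def median_of_repetitions(stream, k: int, r: int) -> float:
    """r independent sketches of size k (r hash evaluations per element)."""
    stream = list(stream)
    return statistics.median(bottom_k_estimate(stream, k, seed) for seed in range(r))


def single_big_sketch(stream, k: int, r: int) -> float:
    """One sketch of size r * k (one hash evaluation per element)."""
    return bottom_k_estimate(stream, k * r, seed=0)
```

For example, single_big_sketch(range(10000), k=64, r=8) and median_of_repetitions(range(10000), k=64, r=8) should both land near 10000. The high-probability guarantee for the single sketch depends on the hash having strong concentration bounds, which this stand-in is not claimed to provide.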

Preprint · 2015 · arXiv

Modeling Memetics using Edge Diversity

The study of meme propagation and the prediction of meme trajectory are emerging areas of interest in the field of complex networks research. In addition to the properties of the meme itself, the structural properties of the underlying network decide the speed and the trajectory of the propagating meme. In this paper, we provide an artificial framework for studying meme propagation patterns. Firstly, the framework includes a synthetic network which simulates a real-world network and acts as a testbed for meme simulation. Secondly, we propose a meme spreading model based on the diversity of edges in the network. Through the experiments conducted, we show that the generated synthetic network combined with the proposed spreading model is able to simulate a real-world meme spread. Our proposed model is validated on the propagation of the Higgs boson meme on Twitter as well as on many real-world social networks.

Preprint · 2015 · arXiv

Pseudo-Cores: The Terminus of an Intelligent Viral Meme's Trajectory

Comprehending the virality of a meme can help us in addressing problems pertaining to disciplines like epidemiology and digital marketing. It is therefore not surprising that memetics has remained a heavily analyzed research topic since the mid-1990s. Some scientists choose to investigate the intrinsic contagiousness of a meme, while others study the problem from a network theory perspective. In this paper, we revisit the idea of a core-periphery structure and apply it to understand the trajectory of a viral meme in a social network. We propose shell-based hill climbing algorithms to determine the path from a periphery shell (where the meme originates) to the core of the network. Further simulations and analysis of the network's behavioral characteristics helped us unearth specialized shells which we term Pseudo-Cores. These shells emulate the behavior of the core in terms of the size of the cascade triggered. In our experiments, we consider two choices of target nodes: the core, and any of the pseudo-cores. We compare our algorithms against existing path-finding algorithms and experimentally validate their better performance.
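
Assuming the shells here are the k-shells of a k-core decomposition, the Python sketch below computes each node's core number (shell index) by repeatedly peeling a minimum-degree node, then runs a hypothetical greedy climb from a periphery node toward the innermost shell. The climbing rule (move to an unvisited neighbor in the same or a higher shell, preferring higher shells, then higher degree) is an assumption for illustration, not the authors' shell-based hill climbing algorithm, and it does not detect pseudo-cores.

```python
import heapq


def core_numbers(adj):
    """k-core decomposition of an undirected graph given as {node: [neighbors]}."""
    degree = {u: len(vs) for u, vs in adj.items()}
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != degree[u]:
            continue  # stale heap entry
        k = max(k, d)  # core number never decreases along the peeling order
        core[u] = k
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
    return core


def greedy_climb(adj, core, start):
    """Hypothetical hill climb from `start` toward the shell with maximum core number."""
    target = max(core.values())
    path, seen, u = [start], {start}, start
    while core[u] < target:
        # Candidates: unvisited neighbors in the same or a higher shell.
        cands = [v for v in adj[u] if v not in seen and core[v] >= core[u]]
        if not cands:
            break  # stuck: no way up from here
        u = max(cands, key=lambda v: (core[v], len(adj[v])))
        seen.add(u)
        path.append(u)
    return path


# Triangle 1-2-3 (the 2-core) with a periphery chain 1-4-5.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5], 5: [4]}
core = core_numbers(adj)
print(core)                        # {5: 1, 4: 1, 1: 2, 2: 2, 3: 2}
print(greedy_climb(adj, core, 5))  # [5, 4, 1]
```

Node labels must be mutually comparable, since ties in the heap fall back to comparing them.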
