Source author record

Mayank Goswami

Mayank Goswami appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms; Computational Geometry; Machine Learning; math.CO; physics.ins-det; Computer Vision; eess.IV; physics.app-ph; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; math.CV; math.DG; math.OC; physics.data-an

Catalog footprint

What is connected

20 works

14 topics

4 close collaborators


Preprint, 2026, arXiv

How many users have been here for a long time? Efficient solutions for counting long aggregated visits

This paper addresses the Counting Long Aggregated Visits problem, which is defined as follows. We are given $n$ users and $m$ regions, where each user spends some time visiting some regions. For a parameter $k$ and a query consisting of a subset of $r$ regions, the task is to count the number of distinct users whose aggregate time spent visiting the query regions is at least $k$. This problem is motivated by queries arising in the analysis of large-scale mobility datasets. We present several exact and approximate data structures for counting long aggregated visits, as well as conditional and unconditional lower bounds. First, we describe an exact data structure that exhibits a space-time tradeoff, as well as efficient approximate solutions based on sampling and sketching techniques. We then study the problem in geometric settings where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles, and derive exact data structures that achieve improved performance in these structured spaces.
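To make the query semantics concrete, here is a minimal brute-force baseline, not any of the paper's data structures: it scans every visit record and sums each user's time inside the query regions. The record format `(user, region, duration)` and the function name are hypothetical, and $k > 0$ is assumed so that users with no time in the query regions never count.

```python
from collections import defaultdict


def count_long_visits(visits, query_regions, k):
    """Count distinct users whose total time in query_regions is at least k.

    visits: iterable of (user, region, duration) records (hypothetical format).
    Users with no time in the query regions are never counted, so k > 0 is assumed.
    """
    query = set(query_regions)
    total = defaultdict(float)  # user -> aggregate time inside the query regions
    for user, region, duration in visits:
        if region in query:
            total[user] += duration
    return sum(1 for t in total.values() if t >= k)


# Hypothetical toy data: (user, region, minutes).
visits = [
    ("u1", "A", 3), ("u1", "B", 4),
    ("u2", "A", 5), ("u2", "C", 10),
    ("u3", "B", 2),
]
print(count_long_visits(visits, {"A", "B"}, 5))  # 2 (u1 spends 7, u2 spends 5)
```

This scan costs time linear in the number of visit records per query; the exact and approximate structures described above exist to avoid exactly that cost.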

Preprint, 2022, arXiv

A Manifold View of Adversarial Risk

The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We take a new angle and bring the manifold assumption into consideration. Assuming the data lies on a manifold, we investigate two new types of adversarial risk: the normal adversarial risk, due to perturbation along the normal direction, and the in-manifold adversarial risk, due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show, via a surprisingly pessimistic case, that the standard adversarial risk can be nonzero even when both the normal and in-manifold risks are zero. We conclude the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by focusing only on the normal adversarial risk.
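As a toy illustration of the two perturbation types, the sketch below takes the unit circle as the data manifold (an assumption chosen for simplicity) and splits a perturbation at a point into its normal and tangent components. The tangent part is only a first-order stand-in for "within the manifold"; the function name is hypothetical.

```python
import math


def decompose_on_circle(x, delta):
    """Split a perturbation delta at a point x of the unit circle into its
    normal (radial) and tangent components. To first order, the tangent part
    moves x along the manifold and the normal part moves it off the manifold."""
    r = math.hypot(x[0], x[1])
    n = (x[0] / r, x[1] / r)      # unit normal at x
    t = (-n[1], n[0])             # unit tangent at x
    dn = delta[0] * n[0] + delta[1] * n[1]
    dt = delta[0] * t[0] + delta[1] * t[1]
    normal = (dn * n[0], dn * n[1])
    tangent = (dt * t[0], dt * t[1])
    return normal, tangent


s = math.sqrt(0.5)
normal, tangent = decompose_on_circle((s, s), (0.2, 0.0))
print(normal, tangent)
```

The two components always sum back to the original perturbation, which is why a bound on the full adversarial risk can be phrased in terms of the normal and in-manifold parts.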

Preprint, 2022, arXiv

A Topological Filter for Learning with Label Noise

Noisy labels can impair the performance of deep neural networks. To tackle this problem, we propose a new method for filtering label noise. Unlike most existing methods, which rely on the posterior probability of a noisy classifier, we focus on the much richer spatial behavior of data in the latent representational space. By leveraging the high-order topological information of data, we are able to collect most of the clean data and train a high-quality model. Theoretically, we prove that this topological approach is guaranteed to collect the clean data with high probability. Empirical results show that our method outperforms state-of-the-art methods and is robust to a broad spectrum of noise types and levels.

Preprint, 2022, arXiv

Characterization of 3D Printers and X-Ray Computerized Tomography

The 3D printing process flow requires several inputs for the best printing quality. These settings may vary from sample to sample and from printer to printer, and depend on the user's previous experience. The operational parameters of 3D printing are varied to find the optimal settings. Thirty-eight samples are printed using four commercially available 3D printers: (a) Ultimaker 2 Extended+, (b) Delta Wasp, (c) Raise E2, and (d) ProJet MJP. The sample profiles contain uniform and non-uniform distributions of cubes and spheres of assorted sizes with a known amount of porosity. These samples are scanned using an X-ray computed tomography system. Functional imaging analysis is performed using AI-based segmentation codes to (a) characterize these 3D printers and (b) find the three-dimensional surface roughness. The surface roughness of three teeth and one sandstone pebble (from a riverbed) with naturally deposited layers is also compared with the printed sample values; the teeth have the best quality. It is found that the ProJet MJP gives the best quality of printed samples, with the least surface roughness and a porosity closest to the actual value. As expected, 100% infill density, the best spatial printing resolution (layer height), and the minimum nozzle speed give the best 3D printing quality.
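Once a CT volume has been segmented, porosity reduces to a voxel count that can be compared with the designed value. The sketch below assumes a hypothetical segmentation output of nested lists holding 1 for material and 0 for pore; it does not reproduce the AI-based segmentation itself.

```python
def porosity(volume):
    """Fraction of void voxels in a segmented volume.

    volume: nested lists indexed [z][y][x] holding 1 for material, 0 for pore
    (a hypothetical output format for the segmentation step).
    """
    total = void = 0
    for plane in volume:
        for row in plane:
            for voxel in row:
                total += 1
                void += voxel == 0
    return void / total


# A 2 x 2 x 2 toy volume with two pore voxels.
volume = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
print(porosity(volume))  # 0.25
```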

Preprint, 2022, arXiv

Do we really need to recalibrate the CT system Configuration for every experiment?

Different technical and physical factors may affect the quality of images reconstructed with a computed tomography (CT) system. We designed and developed a 2D gamma computed tomography setup to study the effect of some of these physical parameters. One must decide the number of detectors and set the CT geometry parameters, or configuration, such as the fan-beam angle and the number of rotations. Usually, geometry parameters are determined based on the object's size. This study shows the influence of the density distribution of same-sized phantoms or objects on the CT geometry parameters. Due to limited space, industrial applications may not allow a CT system to move around the object under investigation, and on-the-spot customization of the CT system configuration may be required in such situations. The same problem is experienced in medical science, materials science, and many other fields in which CT is widely used for non-destructive imaging. The number of detectors in the scanning array is one major factor in optimizing a CT system: more detectors are desired for better resolution, but changing the number of detectors requires recalibrating the CT geometry. A simulation study is presented on the influence of the density distribution of same-sized objects and of the number of detectors on the CT system configuration, and the results are also verified experimentally. A comparison between simulated and experimental results shows good agreement.

Preprint, 2022, arXiv

Effect of scattering and electronic noise upon selection of detectors for Gamma Computerized Tomography

Computed tomography (CT) has become a vital tool in a variety of fields as a result of technological developments and continual improvement. High-quality CT images are desirable for image interpretation and for extracting information from CT images. A variety of factors influence CT image quality. Various research groups have investigated and attempted to improve image quality by examining the noise and error associated with CT geometry. This study aims to select CT detectors that yield the least amount of noise in projection data. Three distinct gamma-ray detectors that are routinely used in CT are compared in terms of scattering noise and electronic noise. The sensitivity of Kanpur Theorem-1 to scattering noise is demonstrated in this work and used to quantify the relative level of scattering noise. The detector measures the signal multiple times, and the standard deviation of the signal is used to calculate the electronic noise. It is observed that the IC CsI(Tl) scintillation detector produces lower electronic noise and lower relative scattering noise than the conventional detectors NaI(Tl) and HPGe.
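The electronic-noise estimate described above (repeat the measurement, take the standard deviation) can be sketched with the standard library. The readings are hypothetical, and the relative-noise helper, which normalizes by the mean signal so that detectors with different count rates can be compared, is an added assumption rather than the paper's stated metric.

```python
import statistics


def electronic_noise(readings):
    """Sample standard deviation of repeated readings of the same signal."""
    return statistics.stdev(readings)


def relative_noise(readings):
    """Noise relative to the mean signal (hypothetical comparison metric)."""
    return statistics.stdev(readings) / statistics.mean(readings)


# Hypothetical repeated counts from one detector viewing a fixed source.
readings = [1000, 1010, 990, 1005, 995]
print(round(electronic_noise(readings), 3))  # 7.906
print(round(relative_noise(readings), 5))
```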

Preprint, 2022, arXiv

Noise analysis, error estimates, and Gamma Radiation Measurement for limited detector computerized tomography application

Computed tomography is an efficient and vital modality among non-destructive techniques (NDT). Various factors influence the CT reconstruction result, including limited projection data, detector electronics optimization, background noise, detection noise, the discretized nature of projection data, and many more. Radiation hardening and other aging factors that affect the operational settings may require recalibration of electronics parameters. Two well-known exercises are used to improve reliability and accuracy in inverse recovery. The first exercise uses brute force to select, from a set of calibration methods, the candidate that gives the minimum error in inverse recovery. The second exercise, Kanpur Theorem-1 (KT-1), examines whether the optimal calibration sets the electronics to impart minimum noise. The mutual conformity between the statistics-derived central limit theorem (CLT) and the Riemann-integral-transform-based KT-1 is shown for the first time using gamma radiation measurements. The analysis shows that normally distributed measurement data inflict the least noise in inverse recovery.

Preprint, 2021, arXiv

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, the data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kinds of loss functions and data distributions can the stability analysis be improved? In this paper, we first settle open questions regarding the tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.
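Algorithmic stability asks how far SGD's output moves when one training example is replaced. The sketch below measures that divergence empirically for a 1-D least-squares loss, running both datasets with the same seed so that they see the same sequence of sampled indices. The loss, step size, and data are hypothetical choices for illustration; this is an empirical probe, not the paper's analysis.

```python
import random


def sgd(data, steps, lr, seed):
    """Plain SGD on the 1-D least-squares loss 0.5 * (w * x - y) ** 2."""
    rng = random.Random(seed)  # same seed -> same sequence of sampled indices
    w = 0.0
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        w -= lr * (w * x - y) * x  # gradient step
    return w


def parameter_divergence(data, i, replacement, steps=1000, lr=0.05, seed=0):
    """|w_S - w_S'| where S' replaces the i-th example of S."""
    neighbour = list(data)
    neighbour[i] = replacement
    return abs(sgd(data, steps, lr, seed) - sgd(neighbour, steps, lr, seed))


data = [(x / 10, 2 * x / 10) for x in range(1, 11)]  # hypothetical, y = 2x
print(parameter_divergence(data, 0, (0.1, 5.0)))
```

Averaging this divergence over seeds and replaced positions gives a rough empirical counterpart to a uniform stability bound.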

Preprint, 2020, arXiv

Batched Predecessor and Sorting with Size-Priced Information in External Memory

In the unit-cost comparison model, a black box takes as input two items and outputs the result of the comparison. Problems like sorting and searching have been studied in this model, and it has been generalized to include the concept of priced information, where different pairs of items (say, database records) have different comparison costs. These comparison costs can be arbitrary (in which case no algorithm can be close to optimal (Charikar et al. STOC 2000)), structured (for example, the comparison cost may depend on the length of the databases (Gupta et al. FOCS 2001)), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the cost depends on the sizes of the items, we consider the problems of sorting and batched predecessor where two non-uniform sets of items $A$ and $B$ are given as input. (1) In the RAM setting, we consider the scenario where both sets have $n$ keys each. The cost to compare two items in $A$ is $a$, to compare an item of $A$ to an item of $B$ is $b$, and to compare two items in $B$ is $c$. We give upper and lower bounds for the case $a \le b \le c$. Notice that the case $b=1, a=c=\infty$ is the famous "nuts and bolts" problem. (2) In the Disk-Access Model (DAM), where transferring elements between disk and internal memory is the main bottleneck, we consider the scenario where elements in $B$ are larger than elements in $A$. The larger items take more I/Os to be brought into memory, consume more space in internal memory, and are required in their entirety for comparisons. We first give output-sensitive lower and upper bounds on the batched predecessor problem, and use these to derive bounds on the complexity of sorting in the two models. Our bounds are tight in most cases, and require novel generalizations of the classical lower bound techniques in external memory to accommodate the non-uniformity of keys.
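The case $b=1, a=c=\infty$ means only cross comparisons between $A$ (nuts) and $B$ (bolts) are allowed. The classic randomized quicksort-style solution below is a sketch of that special case only, not the paper's algorithm for general costs: a random bolt partitions the nuts, its matching nut then partitions the bolts, and both halves recurse. Items are modeled as distinct numbers with a perfect matching, and no list comprehension compares two nuts or two bolts with each other.

```python
import random


def match_nuts_and_bolts(nuts, bolts, rng=None):
    """Return matched (nut, bolt) pairs in sorted order, using only
    nut-versus-bolt comparisons (assumes distinct sizes and a perfect matching)."""
    rng = rng or random.Random(0)
    if not nuts:
        return []
    pivot_bolt = bolts[rng.randrange(len(bolts))]
    # Partition the nuts by comparing each one with the pivot bolt.
    small_nuts = [n for n in nuts if n < pivot_bolt]
    big_nuts = [n for n in nuts if n > pivot_bolt]
    pivot_nut = next(n for n in nuts if n == pivot_bolt)
    # Partition the bolts by comparing each one with the matching nut.
    small_bolts = [b for b in bolts if b < pivot_nut]
    big_bolts = [b for b in bolts if b > pivot_nut]
    return (match_nuts_and_bolts(small_nuts, small_bolts, rng)
            + [(pivot_nut, pivot_bolt)]
            + match_nuts_and_bolts(big_nuts, big_bolts, rng))


print(match_nuts_and_bolts([5, 1, 4, 2, 3], [3, 5, 2, 1, 4]))
```

As with quicksort, the expected number of cross comparisons is $O(n \log n)$.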

Preprint, 2020, arXiv

Cutting Polygons into Small Pieces with Chords: Laser-Based Localization

Motivated by indoor localization by tripwire lasers, we study the problem of cutting a polygon into small-size pieces, using the chords of the polygon. Several versions are considered, depending on the definition of the "size" of a piece. In particular, we consider the area, the diameter, and the radius of the largest inscribed circle as a measure of the size of a piece. We also consider different objectives, either minimizing the maximum size of a piece for a given number of chords, or minimizing the number of chords that achieve a given size threshold for the pieces. We give hardness results for polygons with holes and approximation algorithms for multiple variants of the problem.
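For the area variant of the size measure, the basic operation is cutting a polygon along a chord and measuring the pieces. The sketch below makes the simplifying assumption that the chord joins two vertices and lies inside the polygon, which always holds for a convex polygon. The paper's chords need not end at vertices, and both helper names are hypothetical.

```python
def shoelace_area(poly):
    """Area of a simple polygon given as a list of (x, y) vertices in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def split_by_vertex_chord(poly, i, j):
    """Split poly along the chord joining vertices i < j (assumed interior)."""
    return poly[i:j + 1], poly[j:] + poly[:i + 1]


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
left, right = split_by_vertex_chord(square, 0, 2)
sizes = [shoelace_area(p) for p in (left, right)]
print(sizes, max(sizes))  # [2.0, 2.0] 2.0
```

The min-max objective then asks for the set of chords that makes the largest piece, `max(sizes)` here, as small as possible.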

Preprint, 2020, arXiv

On the I/O complexity of the k-nearest neighbor problem

We consider static, external-memory indexes for exact and approximate versions of the $k$-nearest neighbor ($k$-NN) problem, and show new lower bounds under a standard indivisibility assumption:

- Polynomial-space indexing schemes for high-dimensional $k$-NN in Hamming space cannot take advantage of block transfers: $\Omega(k)$ block reads are needed to answer a query.
- For the $\ell_\infty$ metric, the lower bound holds even if we allow $c$-approximate nearest neighbors to be returned, for $c \in (1, 3)$.
- The restriction to $c < 3$ is necessary: for every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space $O(kn)$, where $n$ is the number of points, that can retrieve $k$ 3-approximate nearest neighbors using $\lceil k/B\rceil$ I/Os, which is optimal.
- For specific metrics, data structures with better approximation factors are possible. For $k$-NN in Hamming space and every approximation factor $c>1$, there exists a polynomial-space data structure that returns $k$ $c$-approximate nearest neighbors in $\lceil k/B\rceil$ I/Os.

To show these lower bounds we develop two new techniques. First, to handle the fact that approximation algorithms have more freedom in deciding which result set to return, we develop a relaxed version of the $\lambda$-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in $d\geq n$ dimensions. Second, to extend the lower bounds down to $d = O(k \log(n/k))$ dimensions, we develop a new deterministic dimension-reduction technique that may be of independent interest.

Preprint, 2016, arXiv

Distance Sensitive Bloom Filters Without False Negatives

A Bloom filter is a widely used data structure for representing a set $S$ and answering queries of the form "Is $x$ in $S$?". By allowing some false positive answers (saying "yes" when the answer is in fact "no"), Bloom filters use space significantly below what is required for storing $S$. In the distance sensitive setting, we work with a set $S$ of (Hamming) vectors and seek a data structure that offers a similar trade-off but answers queries of the form "Is $x$ close to an element of $S$?" (in Hamming distance). Previous work on distance sensitive Bloom filters has accepted both false positive and false negative answers. Absence of false negatives is of critical importance in many applications of Bloom filters, so it is natural to ask whether this can also be achieved in the distance sensitive setting. Our main contributions are upper and lower bounds (tight in several cases) on space usage in the distance sensitive setting where false negatives are not allowed.
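For reference, here is a minimal classic Bloom filter for exact-membership queries, which is the baseline that the distance sensitive setting generalizes, not the paper's construction. Deriving the $k$ bit positions from SHA-256 with a per-hash prefix is an illustrative choice. The property at stake is visible in the code: bits are only ever set, so an inserted item always passes the membership test and false negatives are impossible.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by hashing the item with k different prefixes.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # Bits set on insertion stay set, so inserted items always pass.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["0101", "1100", "1111"]:
    bf.add(word)
print("0101" in bf)  # True: no false negatives
```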

Preprint, 2016, arXiv

The landscape of bounds for binary search trees

Binary search trees (BSTs) with rotations can adapt to various kinds of structure in search sequences, achieving amortized access times substantially better than the Theta(log n) worst-case guarantee. Classical examples of structural properties include static optimality, sequential access, working set, key-independent optimality, and dynamic finger, all of which are now known to be achieved by the two famous online BST algorithms (Splay and Greedy). (...) In this paper, we introduce novel properties that explain the efficiency of sequences not captured by any of the previously known properties, and which provide new barriers to the dynamic optimality conjecture. We also establish connections between various properties, old and new. For instance, we show the following. (i) A tight bound of O(n log d) on the cost of Greedy for d-decomposable sequences. The result builds on the recent lazy finger result of Iacono and Langerman (SODA 2016). On the other hand, we show that lazy finger alone cannot explain the efficiency of pattern avoiding sequences even in some of the simplest cases. (ii) A hierarchy of bounds using multiple lazy fingers, addressing a recent question of Iacono and Langerman. (iii) The optimality of the Move-to-root heuristic in the key-independent setting introduced by Iacono (Algorithmica 2005). (iv) A new tool that allows combining any finite number of sound structural properties. As an application, we show an upper bound on the cost of a class of sequences that all known properties fail to capture. (v) The equivalence between two families of BST properties. The observation on which this connection is based was known before; we make it explicit, and apply it to classical BST properties. (...)
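The Move-to-root heuristic named in (iii) rotates each accessed key to the root using single rotations. The recursive sketch below applies those rotations bottom-up on the way back from the search; its cost model (number of nodes on the search path) and the balanced starting tree are assumptions for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def build_balanced(keys):
    """Balanced BST on sorted keys (a hypothetical starting tree)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build_balanced(keys[:mid]), build_balanced(keys[mid + 1:]))


def move_to_root(node, key):
    """Access key (assumed present) and rotate it to the root with single
    rotations. Returns (new_root, cost), cost = number of nodes on the search path."""
    if node.key == key:
        return node, 1
    if key < node.key:
        child, cost = move_to_root(node.left, key)
        node.left, child.right = child.right, node    # rotate right at node
    else:
        child, cost = move_to_root(node.right, key)
        node.right, child.left = child.left, node     # rotate left at node
    return child, cost + 1


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


root = build_balanced(list(range(1, 8)))
total = 0
for key in [1, 1, 7, 4]:
    root, cost = move_to_root(root, key)
    total += cost
print(total, root.key, inorder(root))
```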

Preprint, 2015, arXiv

Greedy Is an Almost Optimal Deque

In this paper, we extend the geometric binary search tree (BST) model of Demaine, Harmon, Iacono, Kane, and Patrascu (DHIKP) to accommodate insertions and deletions. Within this extended model, we study the online Greedy BST algorithm introduced by DHIKP. Greedy BST is known to be equivalent to a maximally greedy (but inherently offline) algorithm introduced independently by Lucas in 1988 and Munro in 2000, conjectured to be dynamically optimal. With the application of forbidden-submatrix theory, we prove a quasilinear upper bound on the performance of Greedy BST on deque sequences. It has been conjectured (Tarjan, 1985) that splay trees (Sleator and Tarjan, 1983) can serve such sequences in linear time. Currently, neither splay trees nor any other general-purpose BST algorithm is known to fulfill this requirement. As a special case, we show that Greedy BST can serve output-restricted deque sequences in linear time. A similar result is known for splay trees (Tarjan, 1985; Elmasry, 2004). As a further application of the insert-delete model, we give a simple proof that, given a set U of permutations of [n], the access cost of any BST algorithm is Omega(log |U| + n) on "most" of the permutations from U. In particular, this implies that the access cost for a random permutation of [n] is Omega(n log n) with high probability. Besides the splay tree, Greedy BST has recently emerged as a plausible candidate for dynamic optimality. Compared to splay trees, much less effort has gone into analyzing Greedy BST. Our work is intended as a step towards a full understanding of Greedy BST, and we remark that forbidden-submatrix arguments seem particularly well suited for carrying out this program.
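In the geometric view of DHIKP, each access is a point (key, time), and Greedy adds, in the current row, exactly the points needed so that every rectangle spanned by the new point and an earlier point contains another point. The sketch below computes Greedy's cost for access-only sequences, which leaves out the insert/delete extension studied here. It assumes that keys start with no points and that the cost counts touched points. A key is touched when its last-touch time is strictly later than that of the accessed key and of every key lying between them.

```python
def greedy_cost(accesses):
    """Cost (total touched points) of Greedy in the geometric view,
    for an access-only sequence of keys."""
    last = {}   # key -> last row (time) in which the key was touched
    cost = 0
    for t, x in enumerate(accesses):
        keys = sorted(last)
        touched = [x]
        # Walk away from x on each side; touch the keys forming the "staircase".
        for side in ([k for k in keys if k > x], [k for k in reversed(keys) if k < x]):
            best = last.get(x, -1)  # an earlier point in column x already satisfies rectangles
            for y in side:
                if last[y] > best:
                    touched.append(y)
                    best = last[y]
        for y in touched:
            last[y] = t
        cost += len(touched)
    return cost


print(greedy_cost(range(1, 9)))  # sequential access: 1 + 2 * 7 = 15
```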

Preprint, 2015, arXiv

Pattern-avoiding access in binary search trees

The dynamic optimality conjecture is perhaps the most fundamental open question about binary search trees (BST). It postulates the existence of an asymptotically optimal online BST, i.e. one that is constant factor competitive with any BST on any input access sequence. The two main candidates for dynamic optimality in the literature are splay trees [Sleator and Tarjan, 1985], and Greedy [Lucas, 1988; Munro, 2000; Demaine et al. 2009] [..] Dynamic optimality is trivial for almost all sequences: the optimum access cost of most length-n sequences is Theta(n log n), achievable by any balanced BST. Thus, the obvious missing step towards the conjecture is an understanding of the "easy" access sequences. [..] The difficulty of proving dynamic optimality is witnessed by highly restricted special cases that remain unresolved; one prominent example is the traversal conjecture [Sleator and Tarjan, 1985], which states that preorder sequences (whose optimum is linear) are linear-time accessed by splay trees; no online BST is known to satisfy this conjecture. In this paper, we prove two different relaxations of the traversal conjecture for Greedy: (i) Greedy is almost linear for preorder traversal, (ii) if a linear-time preprocessing is allowed, Greedy is in fact linear. These statements are corollaries of our more general results that express the complexity of access sequences in terms of a pattern avoidance parameter k. [..] To our knowledge, these are the first upper bounds for Greedy that are not known to hold for any other online BST. To obtain these results we identify an input-revealing property of Greedy. Informally, this means that the execution log partially reveals the structure of the access sequence. This property facilitates the use of rich technical tools from forbidden submatrix theory. [Abridged]
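The traversal conjecture concerns preorder sequences. A standard fact connects these sequences to pattern avoidance: a permutation of distinct keys is the preorder of some BST exactly when it contains no occurrence of the pattern 2-3-1. The stack-based sketch below checks this in linear time. It illustrates the pattern-avoidance viewpoint only and does not compute the paper's parameter k.

```python
def is_bst_preorder(seq):
    """True iff seq (distinct keys) is the preorder of some BST,
    i.e. iff seq contains no 2-3-1 pattern."""
    stack = []
    lower = float("-inf")  # every later key must exceed this bound
    for v in seq:
        if v < lower:
            return False   # found a 2-3-1 occurrence
        while stack and stack[-1] < v:
            lower = stack.pop()  # v lies in the right subtree of the popped key
        stack.append(v)
    return True


print(is_bst_preorder([4, 2, 1, 3, 6, 5, 7]), is_bst_preorder([2, 3, 1]))
```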

Preprint, 2015, arXiv

Self-Adjusting Binary Search Trees: What Makes Them Tick?

Splay trees (Sleator and Tarjan) satisfy the so-called access lemma. Many of the nice properties of splay trees follow from it. What makes self-adjusting binary search trees (BSTs) satisfy the access lemma? After each access, self-adjusting BSTs replace the search path by a tree on the same set of nodes (the after-tree). We identify two simple combinatorial properties of the search path and the after-tree that imply the access lemma. Our main result (i) implies the access lemma for all minimally self-adjusting BST algorithms for which it was known to hold: splay trees and their generalization to the class of local algorithms (Subramanian, Georgakopoulos and McClurkin), as well as Greedy BST, introduced by Demaine et al. and shown to satisfy the access lemma by Fox, (ii) implies that BST algorithms based on "strict" depth-halving satisfy the access lemma, addressing an open question that was raised several times since 1985, and (iii) yields an extremely short proof for the O(log n log log n) amortized access cost for the path-balance heuristic (proposed by Sleator), matching the best known bound (Balasubramanian and Raman) to a lower-order factor. One of our combinatorial properties is locality. We show that any BST algorithm that satisfies the access lemma via the sum-of-log (SOL) potential is necessarily local. The other property states that the sum of the number of leaves of the after-tree plus the number of side alternations in the search path must be at least a constant fraction of the length of the search path. We show that a weak form of this property is necessary for sequential access to be linear.
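The sum-of-log (SOL) potential assigns a tree the value $\Phi(T) = \sum_{v} \log_2 |T_v|$, where $|T_v|$ is the size of the subtree rooted at $v$. The sketch below computes it with unit weights, which is a simplifying assumption since the access lemma is usually stated with general weights. It also shows that a path has a higher potential than a balanced tree on the same keys.

```python
import math


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain (unbalanced) BST insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def sol_potential(node):
    """Return (subtree size, sum over subtree nodes of log2(subtree size))."""
    if node is None:
        return 0, 0.0
    left_size, left_phi = sol_potential(node.left)
    right_size, right_phi = sol_potential(node.right)
    size = left_size + right_size + 1
    return size, left_phi + right_phi + math.log2(size)


path = build([1, 2, 3])      # a path: subtree sizes 3, 2, 1
balanced = build([2, 1, 3])  # root with two leaves: sizes 3, 1, 1
print(sol_potential(path)[1], sol_potential(balanced)[1])
```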

Preprint, 2015, arXiv

Space Filling Curves for 3D Sensor Networks with Complex Topology

Several aspects of managing a sensor network (e.g., motion planning for data mules, serial data fusion and inference) benefit once the network is linearized to a path. The linearization is often achieved by constructing a space filling curve in the domain. However, existing methods cannot handle networks distributed on surfaces of complex topology. This paper presents a novel method for generating space filling curves for 3D sensor networks that are distributed densely on some two-dimensional geometric surface. Our algorithm is completely distributed and constructs a path which gets uniformly, progressively denser as it becomes longer. We analyze the algorithm mathematically and prove that the curve we obtain is dense. Our method is based on the Hodge decomposition theorem and uses holomorphic differentials on Riemann surfaces. The underlying high genus surface is conformally mapped to a union of flat tori and then a proportionally-dense space filling curve on this union is constructed. The pullback of this curve to the original network gives us the desired curve.
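To show what linearizing a network by a space filling curve means in the simplest setting, the sketch below uses the classic Hilbert curve on an $n \times n$ grid, with $n$ a power of 2. This is a swapped-in planar technique. It is not the paper's construction for surfaces of complex topology, which uses holomorphic differentials and flat tori. Consecutive indices map to grid cells that share an edge, so sensors visited in index order form a path.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to cell (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


print([hilbert_d2xy(4, d) for d in range(16)])
```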

Preprint, 2015, arXiv

Uniformity of point samples in metric spaces using gap ratio

Teramoto et al. defined a new measure called the gap ratio that measures the uniformity of a finite point set sampled from $\cal S$, a bounded subset of $\mathbb{R}^2$. We generalize this measure to all metric spaces by appealing to the covering and packing radii. Unlike discrepancy, a widely used uniformity measure that depends on the notion of a range space and its volume, the definition of gap ratio needs only a metric. We also show some interesting connections of gap ratio to Delaunay triangulation and discrepancy in the Euclidean plane. The major focus of this work is on solving optimization questions about selecting uniform point samples from metric spaces, with uniformity measured using gap ratio. We consider discrete spaces, such as graphs and sets of points in Euclidean space, and continuous spaces, such as the unit square and path-connected spaces. We deduce lower bounds and prove hardness and approximation-hardness results. We show that a general approximation algorithm framework gives different approximation ratios for different metric spaces based on the lower bound we deduce. Apart from the above, we show the existence of coresets for sampling uniform points from the Euclidean space, for both the static and the streaming case. This leads to a $(1+\varepsilon)$-approximation algorithm for uniform sampling from the Euclidean space.
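For a finite metric space, the gap ratio can be computed directly from the covering and packing radii. The sketch below assumes the following definitions: the covering radius is the largest distance from a point of the space to its nearest sample, and the packing radius is half the smallest distance between two samples. At least two samples are required. The space, samples, and metric in the example are hypothetical.

```python
from itertools import combinations


def gap_ratio(space, sample, dist):
    """Covering radius of sample within space divided by its packing radius."""
    covering = max(min(dist(s, p) for p in sample) for s in space)
    packing = min(dist(p, q) for p, q in combinations(sample, 2)) / 2
    return covering / packing


def line(a, b):
    return abs(a - b)


space = range(9)  # the points 0..8 on a line
print(gap_ratio(space, [0, 4, 8], line), gap_ratio(space, [0, 1, 8], line))
```

Evenly spread samples reach ratio 1.0, while clustering two samples at one end drives the ratio up to 6.0, which matches the intuition that smaller gap ratios mean more uniform samples.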

Preprint, 2014, arXiv

Approximate Range Emptiness in Constant Time and Optimal Space

This paper studies the \emph{$\varepsilon$-approximate range emptiness} problem, where the task is to represent a set $S$ of $n$ points from $\{0,\ldots,U-1\}$ and answer emptiness queries of the form "$[a ; b]\cap S \neq \emptyset$ ?" with a probability of \emph{false positives} allowed. This generalizes the functionality of \emph{Bloom filters} from single point queries to any interval length $L$. Setting the false positive rate to $\varepsilon/L$ and performing $L$ queries, Bloom filters yield a solution to this problem with space $O(n \lg(L/\varepsilon))$ bits, false positive probability bounded by $\varepsilon$ for intervals of length up to $L$, using query time $O(L \lg(L/\varepsilon))$. Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: Any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\varepsilon$, must use space $Ω(n \lg(L/\varepsilon)) - O(n)$ bits. On the positive side we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest.
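For contrast with the approximate structure, here is the plain exact baseline for 1-D range emptiness: a sorted array queried by binary search. It uses about $n \lg U$ bits and $O(\lg n)$ query time and produces no false positives. It is not the succinct constant-time structure from the paper, and the class name is hypothetical.

```python
import bisect


class RangeEmptiness:
    """Exact 1-D range emptiness over a static set of integers."""

    def __init__(self, points):
        self.keys = sorted(set(points))

    def nonempty(self, a, b):
        """True iff [a, b] contains at least one stored point."""
        i = bisect.bisect_left(self.keys, a)  # first stored key >= a
        return i < len(self.keys) and self.keys[i] <= b


s = RangeEmptiness([42, 3, 10])
print(s.nonempty(4, 9), s.nonempty(4, 10))  # False True
```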

Preprint, 2014, arXiv

Computing Teichmüller Maps between Polygons

By the Riemann mapping theorem, one can bijectively map the interior of an $n$-gon $P$ to that of another $n$-gon $Q$ conformally. However, (the boundary extension of) this mapping need not map the vertices of $P$ to those of $Q$. In this case, one wants to find the "best" mapping between these polygons, i.e., one that minimizes the maximum angle distortion (the dilatation) over \textit{all} points in $P$. From complex analysis, such maps are known to exist and to be unique. They are called extremal quasiconformal maps, or Teichmüller maps. Although there are many efficient ways to compute or approximate conformal maps, there is currently no such algorithm for extremal quasiconformal maps. This paper studies the problem of computing extremal quasiconformal maps in both the continuous and discrete settings. We provide the first constructive method to obtain the extremal quasiconformal map in the continuous setting. Our construction is via an iterative procedure that is proven to converge quickly to the unique extremal map. To get to within $ε$ of the dilatation of the extremal map, our method uses $O(1/ε^{4})$ iterations. Every step of the iteration involves convex optimization and solving differential equations, and guarantees a decrease in the dilatation. Our method uses a reduction of the polygon mapping problem to that of the punctured sphere problem, thus solving a more general problem. We also discretize our procedure. We provide evidence that the discrete procedure closely follows the continuous construction and is therefore expected to converge quickly to a good approximation of the extremal quasiconformal map.
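The dilatation has a simple closed form for an affine map $f(z) = az + b\bar{z}$ with $|b| < |a|$. The complex dilatation is $\mu = b/a$, and the dilatation is $K = (1+|\mu|)/(1-|\mu|)$, which is the same at every point. Conformal maps have $b = 0$ and $K = 1$. The sketch below evaluates $K$ for a horizontal stretch by a factor of 2. This is an illustrative example chosen for its simplicity, and it is not part of the paper's iterative procedure.

```python
def affine_dilatation(a, b):
    """Dilatation K of the affine map f(z) = a*z + b*conj(z), assuming |b| < |a|."""
    k = abs(b) / abs(a)  # |mu|, where mu = b / a is the complex dilatation
    return (1 + k) / (1 - k)


def stretch(z, a=1.5, b=0.5):
    """f(x + iy) = 2x + iy, written as a*z + b*conj(z) with a = 1.5, b = 0.5."""
    return a * z + b * z.conjugate()


print(stretch(3 + 4j), affine_dilatation(1.5, 0.5))  # (6+4j) and about 2.0
```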
