Researcher profile

Mayank Goswami

Mayank Goswami contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

How many users have been here for a long time? Efficient solutions for counting long aggregated visits

This paper addresses the Counting Long Aggregated Visits problem, which is defined as follows. We are given $n$ users and $m$ regions, where each user spends some time visiting some regions. For a parameter $k$ and a query consisting of a subset of $r$ regions, the task is to count the number of distinct users whose aggregate time spent visiting the query regions is at least $k$. This problem is motivated by queries arising in the analysis of large-scale mobility datasets. We present several exact and approximate data structures for supporting counting long aggregated visits, as well as conditional and unconditional lower bounds. First, we describe an exact data structure that exhibits a space-time tradeoff, as well as efficient approximate solutions based on sampling and sketching techniques. We then study the problem in geometric settings where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles, and derive exact data structures that achieve improved performance in these structured spaces.

preprint2022arXiv

A Manifold View of Adversarial Risk

The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We finalize the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk.

preprint2022arXiv

A Topological Filter for Learning with Label Noise

Noisy labels can impair the performance of deep neural networks. To tackle this problem, in this paper, we propose a new method for filtering label noise. Unlike most existing methods relying on the posterior probability of a noisy classifier, we focus on the much richer spatial behavior of data in the latent representational space. By leveraging the high-order topological information of data, we are able to collect most of the clean data and train a high-quality model. Theoretically we prove that this topological approach is guaranteed to collect the clean data with high probability. Empirical results show that our method outperforms the state-of-the-arts and is robust to a broad spectrum of noise types and levels.

preprint2022arXiv

Characterization of 3D Printers and X-Ray Computerized Tomography

The 3D printing process flow requires several inputs for the best printing quality. These settings may vary from sample to sample, printer to printer, and depend upon users' previous experience. The involved operational parameters for 3D Printing are varied to test the optimality. Thirty-eight samples are printed using four commercially available 3D printers, namely: (a) Ultimaker 2 Extended+, (b) Delta Wasp, (c) Raise E2, and (d) ProJet MJP. The sample profiles contain uniform and non-uniform distribution of the assorted size of cubes and spheres with a known amount of porosity. These samples are scanned using X-Ray Computed Tomography system. Functional Imaging analysis is performed using AI-based segmentation codes to (a) characterize these 3D printers and (b) find Three-dimensional surface roughness of three teeth and one sandstone pebble (from riverbed) with naturally deposited layers is also compared with printed sample values. Teeth has best quality. It is found that ProJet MJP gives the best quality of printed samples with the least amount of surface roughness and almost near to the actual porosity value. As expected, 100% infill density value, best spatial resolution for printing or Layer height, and minimum nozzle speed give the best quality of 3D printing.

preprint2022arXiv

Do we really need to recalibrate the CT system Configuration for every experiment?

Different technical and physical factors may affect the image quality reconstructed using a Computed Tomography system. We have developed and designed a 2D Gamma Computed Tomography set up to study the effect of some physical parameters. One must decide the number of detectors and set CT geometry parameters or configuration like the fan-beam angle and number of rotations. Usually, geometry parameters are determined based on the objects size. This study shows the influence of the density distribution of same-sized phantom or objects on CT geometry parameters. Due to limited space, industrial applications may not allow a CT system to move around the object (under investigation). On-spot customization of CT system configuration may be required according to similar situations. The same problem is experienced in medical science, material science, and many other fields in which the CT system is widely used for non-destructive imaging. The number of detectors in the scanning array is one major factor that optimizes a CT system. Of course, more detectors are desired for better resolution. Changing the number of detectors requires recalibration of CT geometry. A simulated work is presented to study the influence of the density distribution of same-sized objects and the number of detectors on CT system configuration. The same is verified experimentally also. A comparison between simulated and experimental results shows a good agreement.

preprint2022arXiv

Effect of scattering and electronic noise upon selection of detectors for Gamma Computerized Tomography

Computed tomography (CT) has become a vital tool in a variety of fields as a result of technological developments and continual improvement. High-quality CT images are desirable for image interpretation and obtaining information from CT images. A variety of things influence the CT image quality. Various research groups have investigated and attempted to improve image quality by examining noise/error associated with CT geometry. This study aims to select detectors for CT, which yield the least amount of noise in projection data. Three distinct gamma-ray detectors that are routinely used in CT have been compared in terms of scattering and electrical noise. The sensitivity of Kanpur Theorem-1 to scattering noise is demonstrated in this work and used to quantify the relative level of scattering noise. The detector measures the signal multiple times, and the standard deviation of the signal is used to calculate the electronic noise. It is observed that IC CsI(Tl) scintillation detector produces low electronic noise and relative scattering noise as compared to conventional electronic detectors; NaI(Tl) and HPGe.

preprint2022arXiv

Noise analysis, error estimates, and Gamma Radiation Measurement for limited detector computerized tomography application

Computed Tomography is one of the efficient and vital modalities of non-destructive techniques (NDT). Various factors influence the CT reconstruction result, including limited projection data, detector electronics optimization, background noise, detection noise, discretized nature of projection data, and many more. Radiation hardening and other aging factors that affect the operational settings may require recalibration of electronics parameters. Two well-known exercises are utilized with the motivation to improve reliability and accuracy in inverse recovery. The first exercise brute-forces an optimal candidate from the set of calibration methods for minimum error in inverse recovery. The second exercise, Kanpur Theorem-1 (KT-1) examines if optimal calibration sets electronics to impart minimum noise. The mutual conformity between statistics-derived CLT and Riemann integral transform-based KT-1 is shown first time using gamma radiation measurement. The analysis shows that measurement data with normal distribution inflicts the least noise in inverse recovery.

preprint2021arXiv

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kind of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly-convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give a novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.

preprint2020arXiv

Batched Predecessor and Sorting with Size-Priced Information in External Memory

In the unit-cost comparison model, a black box takes an input two items and outputs the result of the comparison. Problems like sorting and searching have been studied in this model, and it has been generalized to include the concept of priced information, where different pairs of items (say database records) have different comparison costs. These comparison costs can be arbitrary (in which case no algorithm can be close to optimal (Charikar et al. STOC 2000)), structured (for example, the comparison cost may depend on the length of the databases (Gupta et al. FOCS 2001)), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the cost depends on the sizes of the items, we consider the problems of sorting and batched predecessor where two non-uniform sets of items $A$ and $B$ are given as input. (1) In the RAM setting, we consider the scenario where both sets have $n$ keys each. The cost to compare two items in $A$ is $a$, to compare an item of $A$ to an item of $B$ is $b$, and to compare two items in $B$ is $c$. We give upper and lower bounds for the case $a \le b \le c$. Notice that the case $b=1, a=c=\infty$ is the famous ``nuts and bolts'' problem. (2) In the Disk-Access Model (DAM), where transferring elements between disk and internal memory is the main bottleneck, we consider the scenario where elements in $B$ are larger than elements in $A$. The larger items take more I/Os to be brought into memory, consume more space in internal memory, and are required in their entirety for comparisons. We first give output-sensitive lower and upper bounds on the batched predecessor problem, and use these to derive bounds on the complexity of sorting in the two models. Our bounds are tight in most cases, and require novel generalizations of the classical lower bound techniques in external memory to accommodate the non-uniformity of keys.

preprint2020arXiv

Cutting Polygons into Small Pieces with Chords: Laser-Based Localization

Motivated by indoor localization by tripwire lasers, we study the problem of cutting a polygon into small-size pieces, using the chords of the polygon. Several versions are considered, depending on the definition of the "size" of a piece. In particular, we consider the area, the diameter, and the radius of the largest inscribed circle as a measure of the size of a piece. We also consider different objectives, either minimizing the maximum size of a piece for a given number of chords, or minimizing the number of chords that achieve a given size threshold for the pieces. We give hardness results for polygons with holes and approximation algorithms for multiple variants of the problem.

preprint2020arXiv

On the I/O complexity of the k-nearest neighbor problem

We consider static, external memory indexes for exact and approximate versions of the $k$-nearest neighbor ($k$-NN) problem, and show new lower bounds under a standard indivisibility assumption: - Polynomial space indexing schemes for high-dimensional $k$-NN in Hamming space cannot take advantage of block transfers: $Ω(k)$ block reads are needed to to answer a query. - For the $\ell_\infty$ metric the lower bound holds even if we allow $c$-appoximate nearest neighbors to be returned, for $c \in (1, 3)$. - The restriction to $c < 3$ is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al.~using space $O(kn)$, where $n$ is the number of points, that can retrieve $k$ 3-approximate nearest neighbors using $\lceil k/B\rceil$ I/Os, which is optimal. - For specific metrics, data structures with better approximation factors are possible. For $k$-NN in Hamming space and every approximation factor $c>1$ there exists a polynomial space data structure that returns $k$ $c$-approximate nearest neighbors in $\lceil k/B\rceil$ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the $λ$-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in $d\geq n$ dimensions. To extend the lower bounds down to $d = O(k \log(n/k))$ dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.