Source author record

Arnab Bhattacharya

Arnab Bhattacharya appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

36 works

20 topics

4 close collaborators

preprint · 2026 · arXiv

INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects

Recent NLP advances focus primarily on standardized languages, leaving most low-resource dialects underserved, especially in India. The issue is particularly important there: although Hindi is the third most spoken language globally (over 600 million speakers), its numerous dialects remain underrepresented. The situation is similar for Odia, which has around 45 million speakers. While some datasets cover standard Hindi and Odia, their regional dialects have almost no web presence. We introduce INDIC-DIALECT, a human-curated parallel corpus of 13k sentence pairs spanning 11 dialects and 2 languages: Hindi and Odia. Using this corpus, we construct a multi-task benchmark with three tasks: dialect classification, multiple-choice question (MCQ) answering, and machine translation (MT). Our experiments show that LLMs such as GPT-4o and Gemini 2.5 perform poorly on the classification task, while fine-tuned transformer-based models pretrained on Indian languages substantially improve performance, e.g., raising F1 on dialect classification from 19.6% to 89.8%. For dialect-to-language translation, a hybrid AI model achieves the highest BLEU score, 61.32, compared with a baseline score of 23.36. Interestingly, because dialect sentences are complex to generate, the "rule-based followed by AI" approach achieves the best BLEU score for language-to-dialect translation, 48.44, compared with a baseline score of 27.59. INDIC-DIALECT thus provides a new benchmark for dialect-aware Indic NLP, and we plan to release it as open source to support further work on low-resource Indian dialects.

preprint · 2025 · arXiv

Lexical and Statistical Analysis of Bangla Newspaper and Literature: A Corpus-Driven Study on Diversity, Readability, and NLP Adaptation

In this paper, we present a comprehensive corpus-driven analysis of Bangla literary and newspaper texts to investigate their lexical diversity, structural complexity and readability. We use Vacaspati and IndicCorp, the most extensive literature-only and newspaper-only corpora for Bangla, respectively. We examine key linguistic properties, including the type-token ratio (TTR), hapax legomena ratio (HLR), bigram diversity, average syllable and word lengths, and adherence to Zipf's law, for both the newspaper (IndicCorp) and literary (Vacaspati) corpora. Despite its smaller size, the literary corpus exhibits significantly higher lexical richness and structural variation on all features, such as bigram diversity and HLR. We also probe corpus diversity by building n-gram models and measuring perplexity. Our findings reveal that literary corpora have higher perplexity than newspaper corpora, even for similar sentence sizes. The same trend holds for English newspaper and literary corpora, suggesting that it generalizes. We also examine how including literary data alongside newspaper data influences model performance on downstream tasks, and find that integrating literary data with newspaper data improves performance on various downstream tasks. We further show that a literary corpus adheres more closely to global word-distribution properties, such as Zipf's law, than a newspaper corpus or a merged corpus of literary and newspaper texts. Literary corpora also have higher entropy and lower redundancy than a newspaper corpus. Finally, we assess readability using the Flesch and Coleman-Liau indices, showing that literary texts are more complex.
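
The first three metrics can be computed in a few lines. This is a minimal sketch, assuming whitespace-tokenized input and the following common definitions, which may differ from the paper's exact normalizations: TTR = distinct types / tokens, HLR = words occurring exactly once / tokens (some authors divide by types instead), and bigram diversity = distinct bigrams / total bigrams.

```python
from collections import Counter


def lexical_stats(tokens):
    """Compute TTR, HLR and bigram diversity for a list of tokens.

    Assumed definitions: TTR = types / tokens, HLR = hapax legomena / tokens,
    bigram diversity = distinct bigrams / total bigrams.
    """
    n = len(tokens)
    if n == 0:
        return {"ttr": 0.0, "hlr": 0.0, "bigram_diversity": 0.0}
    counts = Counter(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)  # words seen exactly once
    bigrams = list(zip(tokens, tokens[1:]))            # adjacent token pairs
    return {
        "ttr": len(counts) / n,
        "hlr": hapax / n,
        "bigram_diversity": len(set(bigrams)) / len(bigrams) if bigrams else 0.0,
    }


print(lexical_stats("a b a c".split()))
# {'ttr': 0.75, 'hlr': 0.5, 'bigram_diversity': 1.0}
```

On real corpora these ratios depend strongly on corpus size, which is why comparisons such as the one above are usually made on equal-sized samples.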

preprint · 2022 · arXiv

Koopman-based Differentiable Predictive Control for the Dynamics-Aware Economic Dispatch Problem

The dynamics-aware economic dispatch (DED) problem embeds low-level generator dynamics and operational constraints to enable near real-time scheduling of generation units in a power network. DED produces a more dynamic supervisory control policy than traditional economic dispatch (T-ED), leading to lower overall generation costs. However, incorporating the differential equations that govern the system dynamics makes DED a computationally prohibitive optimization problem. In this work, we present a new data-driven approach based on differentiable programming to efficiently obtain parametric solutions to the underlying DED problem. In particular, we employ the recently proposed differentiable predictive control (DPC) for offline learning of explicit neural control policies using an identified Koopman operator (KO) model of the power system dynamics. On a 9-bus test power network, we demonstrate that the DPC method achieves high solution quality and saves five orders of magnitude in computational time over the original online optimization-based DED approach.

preprint · 2022 · arXiv

Predictions of Reynolds and Nusselt numbers in turbulent convection using machine-learning models

In this paper, we develop a multivariate regression model and a neural network model to predict the Reynolds number (Re) and Nusselt number (Nu) in turbulent thermal convection. We compare their predictions with those of earlier convection models: Grossmann-Lohse [Phys. Rev. Lett. 86, 3316 (2001)], revised Grossmann-Lohse [Phys. Fluids 33, 015113 (2021)], and Pandey-Verma [Phys. Rev. E 94, 053106 (2016)]. Although the predictions of all the models are close to one another, the machine-learning models developed in this work provide the best match with experimental and numerical results.

preprint · 2022 · arXiv

Recommendation of Compatible Outfits Conditioned on Style

Recommendation in the fashion domain has recently seen a surge of research in areas such as shop-the-look, context-aware outfit creation and personalized outfit creation. Most state-of-the-art approaches to outfit recommendation aim to improve compatibility among items so as to produce high-quality outfits. Some recent works recognize that style is an important factor in fashion and incorporate it into compatibility learning and outfit generation. These methods often depend on the availability of fine-grained product categories or rich item attributes (e.g., long skirt, mini skirt). In this work, we aim to generate outfits conditioned on styles or themes, such as outdoor or formal, as one would dress in real life, under the practical assumption that each item has only an image and a high-level category driven by the taxonomy of an online portal. We use a novel style encoder network that represents outfit styles in a smooth latent space. We present an extensive analysis of different aspects of our method and demonstrate its superiority over existing state-of-the-art baselines through rigorous experiments.

preprint · 2022 · arXiv

Terahertz Optical Properties and Birefringence in Single Crystal Vanadium doped [100] β-Ga2O3

We report the terahertz optical properties of vanadium-doped [100] β-Ga2O3 measured by terahertz time-domain spectroscopy (THz-TDS). The V-doped β-Ga2O3 crystal shows strong birefringence in the 0.2-2.4 THz range. Further, the phase retardation of the V-doped β-Ga2O3 crystal has been measured over the whole THz range by terahertz time-domain polarimetry (THz-TDP). The crystal behaves as a quarter-wave plate (QWP) at 0.38, 1.08, 1.71 and 2.28 THz, and as a half-wave plate (HWP) at 0.74 and 1.94 THz.
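
The waveplate behaviour follows from the retardation accumulated between the two polarization eigenmodes, δ(f) = 2π f Δn d / c, for refractive-index contrast Δn and plate thickness d. A plate acts as a quarter-wave plate when δ is an odd multiple of π/2 and as a half-wave plate when δ is an odd multiple of π. The sketch below assumes a frequency-independent Δn and a hypothetical matching tolerance `tol`; real crystals can show dispersion, which it ignores.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def retardation(freq_hz, delta_n, thickness_m):
    """Phase retardation (radians) between the two eigenpolarizations.

    Assumes a frequency-independent refractive-index contrast delta_n.
    """
    return 2 * math.pi * freq_hz * delta_n * thickness_m / C


def waveplate_type(delta, tol=0.05):
    """Return 'HWP', 'QWP' or None, matching delta modulo 2*pi within tol radians."""
    r = delta % (2 * math.pi)
    if abs(r - math.pi) < tol:
        return "HWP"  # odd multiple of pi
    if abs(r - math.pi / 2) < tol or abs(r - 3 * math.pi / 2) < tol:
        return "QWP"  # odd multiple of pi/2
    return None
```

For example, with Δn = 0.3 (the value reported at 1 THz for the V-doped crystals in the entry below), quarter-wave retardation at 1 THz would require d = c/(4fΔn) ≈ 0.25 mm under this constant-Δn assumption.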

preprint · 2022 · arXiv

Vanadium doped beta-Ga2O3 single crystals: Growth, Optical and Terahertz characterization

We report the growth of electrically resistive vanadium-doped β-Ga2O3 single crystals by the optical floating zone technique. By carefully controlling the growth parameters, V-doped crystals can be synthesized with much higher electrical resistivity than the usual n-type V-doped β-Ga2O3 (n_e ~ 10^18 cm^-3). The optical properties of such highly resistive V-doped β-Ga2O3 differ significantly from those of undoped and n-doped crystals. We study polarization-dependent Raman spectra, polarization-dependent transmission and temperature-dependent photoluminescence in the optical wavelength range, as well as THz transmission in the 0.2-2.6 THz range. The insulating V-doped Ga2O3 crystals show strong birefringence, with a refractive-index contrast Δn of 0.3 ± 0.02 at 1 THz, suggesting that they are ideal for optical applications in the THz region.

preprint · 2020 · arXiv

C-MI-GAN : Estimation of Conditional Mutual Information using MinMax formulation

Estimating information-theoretic quantities such as mutual information and its conditional variant has drawn recent interest owing to their many applications. Newly proposed neural estimators for these quantities overcome severe drawbacks of classical kNN-based estimators in high dimensions. In this work, we focus on conditional mutual information (CMI) estimation by using its formulation as a minmax optimization problem. This formulation leads to a joint training procedure similar to that of generative adversarial networks. We find that our proposed estimator provides better estimates than existing approaches on a variety of simulated data sets with linear and nonlinear relations between variables. As an application of CMI estimation, we deploy our estimator for conditional independence (CI) testing on real data and obtain better results than state-of-the-art CI testers.
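
For reference, the quantity being estimated satisfies I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). The sketch below is a classical plug-in estimator for discrete samples, shown only to make that definition concrete; it is not the C-MI-GAN estimator, which replaces this counting with a neural minmax objective suited to continuous, high-dimensional data.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (nats) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples."""
    xz = list(zip(xs, zs))
    yz = list(zip(ys, zs))
    xyz = list(zip(xs, ys, zs))
    return entropy(xz) + entropy(yz) - entropy(xyz) - entropy(list(zs))
```

A value near zero is what a CI tester would look for under the null hypothesis that X and Y are independent given Z.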

preprint · 2020 · arXiv

How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs

Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are constructed automatically by an ensemble of extraction techniques applied over diverse data sources. Therefore, it is important to establish the provenance of query results to determine how they were computed. Provenance has been shown to be useful for assigning confidence scores to results, for debugging KG generation itself, and for explaining answers. In many such applications, certain queries are registered as standing queries because their answers are needed often. However, KGs change continuously due to changes in the source data, improvements to the extraction techniques, refinement or enrichment of information, and so on. This raises the problem of efficiently maintaining the provenance polynomials of complex graph pattern queries over large, dynamic KGs instead of recomputing them from scratch each time the KG is updated. To address these issues, we present HUKA, which uses provenance polynomials to track the derivation of query results over knowledge graphs by encoding the edges involved in generating each answer. More importantly, HUKA also maintains these provenance polynomials under updates (insertions as well as deletions of facts) to the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveal that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.
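
The idea of maintaining provenance polynomials under updates, rather than recomputing them, can be illustrated with a toy. This sketch assumes a single hypothetical two-hop path query `?x -> ?y -> ?z` and stores each polynomial as a set of monomials (edge pairs); it does not reproduce HUKA's encoding, SPARQL support or indexing.

```python
from collections import defaultdict


class TwoHopProvenance:
    """Toy provenance maintenance for the path query ?x -> ?y -> ?z.

    Each edge acts as a variable. The provenance polynomial of an answer (x, z)
    is the sum, over derivations, of the product of the two edges used; here it
    is stored as a set of (first_edge, second_edge) monomials.
    """

    def __init__(self):
        self.out_edges = defaultdict(set)  # node -> edges leaving it
        self.in_edges = defaultdict(set)   # node -> edges entering it
        self.prov = defaultdict(set)       # (x, z) -> set of monomials

    def insert(self, src, dst):
        """Add edge src->dst and only the derivations that use it."""
        e = (src, dst)
        self.out_edges[src].add(e)
        self.in_edges[dst].add(e)
        for e2 in self.out_edges[dst]:  # e used as the first hop
            self.prov[(src, e2[1])].add((e, e2))
        for e1 in self.in_edges[src]:   # e used as the second hop
            self.prov[(e1[0], dst)].add((e1, e))

    def delete(self, src, dst):
        """Remove edge src->dst by dropping the monomials that mention it."""
        e = (src, dst)
        self.out_edges[src].discard(e)
        self.in_edges[dst].discard(e)
        # Scans every answer for brevity; an edge->answers index would avoid this.
        for answer in list(self.prov):
            self.prov[answer] = {m for m in self.prov[answer] if e not in m}
            if not self.prov[answer]:
                del self.prov[answer]

    def polynomial(self, x, z):
        """Render the provenance polynomial of answer (x, z) as text."""
        def name(edge):
            return f"[{edge[0]}->{edge[1]}]"
        terms = sorted(name(a) + "*" + name(b) for a, b in self.prov.get((x, z), ()))
        return " + ".join(terms)
```

After inserting edges a->b, b->c, a->d and d->c, the answer (a, c) has polynomial `[a->b]*[b->c] + [a->d]*[d->c]`; deleting b->c leaves only `[a->d]*[d->c]`, so the answer survives with one remaining derivation.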

preprint · 2020 · arXiv

Learning Koopman Representations for Hybrid Systems

The Koopman operator lifts nonlinear dynamical systems into a functional space of observables, where the dynamics are linear. In this paper, we provide three different Koopman representations for hybrid systems. The first is specific to switched systems, and the second and third preserve the original hybrid dynamics while eliminating the discrete state variables; the second approach is straightforward, and we provide conditions under which the transformation associated with the third holds. Eliminating discrete state variables provides computational benefits when using data-driven methods to learn the Koopman operator and its observables. Following this, we use deep learning to implement each representation on two test cases, discuss the challenges associated with those implementations, and propose areas of future work.
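
The lifting idea in the first sentence can be checked on a standard textbook example (smooth, not hybrid, and not taken from the paper): the nonlinear map (x1, x2) -> (a·x1, b·x2 + c·x1²) becomes exactly linear in the observables g(x) = [x1, x2, x1²]. The coefficients a, b, c are arbitrary illustrative values.

```python
def step(x, a=0.9, b=0.5, c=1.0):
    """One step of the nonlinear map (x1, x2) -> (a*x1, b*x2 + c*x1**2)."""
    x1, x2 = x
    return (a * x1, b * x2 + c * x1 ** 2)


def lift(x):
    """Observables g(x) = [x1, x2, x1^2]; they span a Koopman-invariant subspace."""
    x1, x2 = x
    return [x1, x2, x1 ** 2]


def koopman_matrix(a=0.9, b=0.5, c=1.0):
    """Finite Koopman matrix K with g(step(x)) = K @ g(x) for this map."""
    return [[a, 0.0, 0.0],
            [0.0, b, c],
            [0.0, 0.0, a ** 2]]


def matvec(m, v):
    """Plain-Python matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
```

Here three observables close exactly under the dynamics. In general no such finite set is known in advance, which is why data-driven methods, such as the deep learning used in the paper, are needed to learn the observables and the operator.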

preprint · 2020 · arXiv

Model Predictive Control of Discrete-Continuous Energy Systems via Generalized Disjunctive Programming

Generalized Disjunctive Programming (GDP) provides an alternative framework for modeling optimization problems with both discrete and continuous variables. The key idea behind GDP is to use logical disjunctions to represent discrete decisions in the continuous space, and logical propositions to denote algebraic constraints in the discrete space. Compared with traditional mixed-integer programming (MIP), the inherent logic structure in GDP yields tighter relaxations that global branch-and-bound algorithms exploit to improve solution quality. In this paper, we present a general GDP model for optimal control of hybrid systems that exhibit both discrete and continuous dynamics. Specifically, we use GDP to formulate a model predictive control (MPC) model for piecewise-affine systems with implicit switching logic. As an example, the GDP-based MPC approach is used as a supervisory controller to improve energy efficiency in residential buildings with binary on/off, relay-based thermostats. A simulation study demonstrates the validity of the proposed approach and its improved solution quality compared with existing MIP-based control approaches.
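
To illustrate how a disjunction encodes implicit switching, here is a hypothetical one-zone relay thermostat with hysteresis written in GDP form. The symbols T_k (indoor temperature), u_k (relay state), T_low, T_high, a, b, e and the disturbance w_k are assumptions for this sketch, not the paper's model. Exactly one Boolean Y holds at each step k, and the relay state is implied by the temperature rather than chosen freely.

```latex
\begin{bmatrix} Y^{\mathrm{on}}_k \\ T_k \le T_{\mathrm{low}} \\ u_k = 1 \end{bmatrix}
\;\underline{\vee}\;
\begin{bmatrix} Y^{\mathrm{off}}_k \\ T_k \ge T_{\mathrm{high}} \\ u_k = 0 \end{bmatrix}
\;\underline{\vee}\;
\begin{bmatrix} Y^{\mathrm{hold}}_k \\ T_{\mathrm{low}} \le T_k \le T_{\mathrm{high}} \\ u_k = u_{k-1} \end{bmatrix},
\qquad
T_{k+1} = a\,T_k + b\,u_k + e\,w_k
```

A solver would typically reformulate each disjunction (e.g., via big-M or hull relaxations) rather than enumerate the modes; the hull form is the source of the tighter relaxations mentioned above.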

preprint · 2020 · arXiv

Recharging and rejuvenation of decontaminated N95 masks

N95 respirators are a critical part of the personal protective equipment used by frontline health-care workers and are typically meant for one-time use. However, the recent COVID-19 pandemic has caused a serious shortage of these masks, leading to a worldwide effort to develop decontamination and reuse procedures. A major factor in the filtration efficiency of N95 masks is an intermediate layer of charged polypropylene electret fibers that trap particles through electrostatic or electrophoretic effects. This charge can degrade when the mask is used. Moreover, simple decontamination procedures (e.g., use of alcohol) can further degrade the remaining charge on the polypropylene, severely impacting the filtration efficiency after decontamination. In this report, we summarize our results on developing a simple laboratory setup for measuring charge and filtration efficiency in N95 masks. In particular, we propose and show that it is possible to recharge the masks after decontamination and recover their filtration efficiency.

preprint · 2016 · arXiv

Comparison of GaN nanowires grown on c-, r- and m-plane sapphire substrates

Gallium nitride nanowires were grown on c-plane, r-plane and m-plane sapphire substrates in a showerhead metalorganic chemical vapor deposition system using a nickel catalyst, with trimethylgallium and ammonia as precursors. We studied the influence of carrier gas, growth temperature, reactor pressure, reactant flow rates and substrate orientation in order to obtain thin nanowires. The nanowires grew along the <10-11> and <10-10> axes, depending on the substrate orientation. They were further characterized using x-ray diffraction, electron microscopy, photoluminescence and Raman spectroscopy.

preprint · 2016 · arXiv

Wide bandwidth nanowire electromechanics on insulating substrates at room temperature

We study InAs nanowire resonators fabricated on a sapphire substrate with a local gate configuration. The key advantage of an insulating sapphire substrate is reduced parasitic capacitance, which allows wide-bandwidth actuation and detection using a network analyzer as well as signal detection at room temperature. Both in-plane and out-of-plane vibrational modes of the nanowire can be driven, and the nonlinear response of the resonators can be studied. In addition, this technique enables the study of variations in thermal strain caused by heating in nanostructures.

preprint · 2015 · arXiv

Layered transition metal dichalcogenides: promising near-lattice-matched substrates for GaN growth

Most III-nitride semiconductors are grown on non-lattice-matched substrates such as sapphire or silicon because native GaN substrates are extremely difficult to obtain. We show that several layered transition-metal dichalcogenides are closely lattice matched to GaN and report the growth of GaN on a range of such layered materials. We report detailed studies of GaN growth on mechanically exfoliated WS2 and MoS2 flakes by metalorganic vapour phase epitaxy. Structural and optical characterization shows that strain-free, single-crystal GaN islands are obtained on the underlying chalcogenide flakes. We obtain strong near-band-edge emission from these layers and analyse their temperature-dependent photoluminescence. We also report a proof-of-concept demonstration of large-area epitaxial growth of GaN on CVD-grown MoS2. Our results show that transition-metal dichalcogenides can serve as novel near-lattice-matched substrates for nitride growth.

preprint · 2015 · arXiv

Trajectory Aware Macro-cell Planning for Mobile Users

We design and evaluate algorithms for efficient, user-mobility-driven macro-cell planning in cellular networks. As cellular networks embrace heterogeneous technologies (including long-range 3G/4G and short-range WiFi, femtocells, etc.), most traffic generated by static users is absorbed by the short-range technologies, increasingly leaving mobile-user traffic to macro-cells. We therefore consider a novel approach that factors in the trajectories of mobile users, as well as the impact of city geographies and their associated road networks, for macro-cell planning. Given a budget k of base stations that can be upgraded, our approach selects the deployment that impacts the largest number of user trajectories. The generic formulation incorporates the quality of service of a user trajectory as a parameter, accommodating different application-specific requirements and operator choices. We show that the resulting trajectory utility maximization problem is NP-hard and design multiple heuristics. We evaluate our algorithms on real and synthetic data sets emulating different city geographies to demonstrate their efficacy. For instance, with an upgrade budget k of 20%, our algorithms are 3-8 times better at improving user quality of service on trajectories in different city geographies than greedy location-based base-station upgrades.
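
The paper's heuristics are not described here, so the following is only a hypothetical trajectory-aware greedy baseline. It assumes each trajectory is a list of base-station IDs and counts a trajectory as served when at least a fraction `q` of its stations are upgraded (`q` stands in for the QoS parameter). Each round upgrades the station that most increases the number of served trajectories, breaking ties by partial coverage.

```python
def served(trajectories, upgraded, q):
    """Count trajectories whose fraction of upgraded stations is at least q."""
    return sum(
        1 for traj in trajectories
        if sum(s in upgraded for s in traj) / len(traj) >= q
    )


def coverage(trajectories, upgraded):
    """Tie-breaker: total fraction of trajectory points on upgraded stations."""
    return sum(sum(s in upgraded for s in t) / len(t) for t in trajectories)


def greedy_upgrade(trajectories, k, q=1.0):
    """Greedily pick up to k stations to upgrade (hypothetical heuristic)."""
    stations = sorted({s for t in trajectories for s in t})
    upgraded = set()
    for _ in range(min(k, len(stations))):
        candidates = [s for s in stations if s not in upgraded]
        best = max(
            candidates,
            key=lambda s: (served(trajectories, upgraded | {s}, q),
                           coverage(trajectories, upgraded | {s})),
        )
        upgraded.add(best)
    return upgraded
```

Because a trajectory with q = 1 only counts once all of its stations are upgraded, single-step gains are often zero, which is one reason this kind of objective is hard to optimize greedily.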

preprint · 2013 · arXiv

A facile process for soak-and-peel delamination of CVD graphene from substrates using water

We demonstrate a simple technique to transfer CVD-grown graphene from copper and platinum substrates by soak-and-peel delamination using only hot deionized water. The absence of chemical etchants yields cleaner CVD graphene films with minimal unintentional doping, as confirmed by Raman and electrical measurements. The process allows substrates to be reused, which can enable the use of oriented substrates for growing higher-quality graphene, and it is inherently inexpensive and scalable for large-area production.

preprint · 2013 · arXiv

Constraint Satisfaction over Generalized Staircase Constraints

One of the key research interests in the area of Constraint Satisfaction Problems (CSPs) is to identify tractable classes of constraints and develop efficient solutions for them. In this paper, we introduce generalized staircase (GS) constraints, an important generalization of one such tractable class found in the literature, namely staircase constraints. GS constraints are of two kinds, down staircase (DS) and up staircase (US). We first examine several properties of GS constraints and then show that arc consistency is sufficient to determine a solution to a CSP over DS constraints. Further, we propose an optimal O(cd) time and space algorithm to compute arc consistency for GS constraints, where c is the number of constraints and d is the size of the largest domain. Next, observing that arc consistency is not necessary for solving a DSCSP (a CSP over DS constraints), we propose a more efficient algorithm for solving it. For US constraints, arc consistency is not known to be sufficient to determine a solution, and therefore methods such as path consistency or variable elimination are required. Since arc consistency acts as a subroutine in these existing methods, replacing it with our optimal O(cd) arc consistency algorithm yields a more efficient method for solving a USCSP (a CSP over US constraints).
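
For context, arc consistency removes every domain value that has no supporting value in a neighbouring variable's domain. The sketch below is the generic AC-3 algorithm (Mackworth), which runs in O(c d^3) time for c binary constraints; it is the kind of general-purpose baseline the paper's O(cd) algorithm improves on by exploiting staircase structure, and it does not reproduce that algorithm.

```python
from collections import deque


def ac3(domains, constraints):
    """Enforce arc consistency in place with generic AC-3.

    domains: var -> set of values.
    constraints: (x, y) -> predicate(vx, vy); include both directions of each arc.
    Returns False if some domain becomes empty (no solution), True otherwise.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # Values of x with no compatible value left in y's domain.
        removed = {vx for vx in domains[x]
                   if not any(pred(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False
            # Arcs pointing at x may have lost support; revisit them.
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True
```

For x < y over domains {1, 2, 3}, AC-3 prunes x to {1, 2} and y to {2, 3}; with both domains equal to {1}, it empties x's domain and reports inconsistency.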

preprint · 2013 · arXiv

Evolution of the Modern Phase of Written Bangla: A Statistical Study

Active languages such as Bangla (or Bengali) evolve over time due to a variety of social, cultural, economic, and political factors. In this paper, we quantitatively analyze the change in the written form of the modern phase of Bangla in terms of character-level, syllable-level, morpheme-level and word-level features. We collect three different types of corpora (classical, newspapers and blogs) and test whether the differences in their features are statistically significant. The results suggest that word length, measured in characters, has changed significantly, but there is little difference in the usage of different characters, syllables and morphemes within a word, or of different words within a sentence. To the best of our knowledge, this is the first study of its kind on Bangla.

preprint · 2012 · arXiv

Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations

Multi-criteria decision making, made possible by the advent of skyline queries, has been applied in many areas. Though most existing research considers only a single relation, several real-world applications require finding the skyline set of records over multiple relations. Consequently, the join operation over skylines, where the preferences are local to each relation, has been proposed. In many of those cases, however, the join also involves aggregate operations over some of the attributes from the different relations. In this paper, we introduce such queries as "aggregate skyline join queries". Since the naive algorithm is impractical, we propose three algorithms to process such queries efficiently. The algorithms exploit certain properties of skyline sets and process the skylines locally as much as possible before computing the join. Experiments with real and synthetic datasets demonstrate the practicality and scalability of the algorithms with respect to the cardinality and dimensionality of the relations.
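
To make the query semantics concrete, here is the naive evaluation strategy the abstract calls impractical: join, aggregate, then compute the skyline of the joined tuples. It assumes every attribute is to be minimized, the aggregate is a sum, and each relation is a list of hypothetical `(join_key, local_attrs, agg_attrs)` tuples; the paper's three efficient algorithms are not reproduced.

```python
def dominates(p, q):
    """p dominates q: no worse in every attribute and strictly better in one (minimize)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def naive_aggregate_skyline_join(r, s):
    """Join r and s on the key, sum aggregate attributes, then take the skyline.

    Output tuples are r_local + s_local + summed_aggregates.
    """
    joined = []
    for kr, lr, ar in r:
        for ks, ls, as_ in s:
            if kr == ks:
                summed = tuple(x + y for x, y in zip(ar, as_))
                joined.append(lr + ls + summed)
    return skyline(joined)
```

The cost is dominated by materializing the full join before any pruning, which is what processing skylines locally first avoids.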

preprint · 2012 · arXiv

INSTRUCT: Space-Efficient Structure for Indexing and Complete Query Management of String Databases

The tremendous growth of search engines, dictionary and thesaurus storage, and other text mining applications, combined with the popularity of readily available scanning devices and optical character recognition tools, has necessitated efficient storage, retrieval and management of massive text databases for various modern applications. For such applications, we propose a novel data structure, INSTRUCT, for efficient storage and management of sequence databases. Our structure uses bit vectors to reuse the storage space for common triplets and hence has a very low memory requirement. INSTRUCT efficiently handles prefix and suffix search queries, in addition to exact string search, by iteratively checking the presence of triplets. We also propose an extension of the structure to handle substring search efficiently, albeit with increased space requirements. This extension is important because trie-based solutions are unable to handle such queries efficiently. Our experiments show that INSTRUCT outperforms existing structures by nearly a factor of two in space requirements while also achieving better query times. The ability to handle insertion and deletion of strings, in addition to supporting all kinds of queries including exact search, prefix/suffix search and substring search, makes INSTRUCT a complete data structure.

preprint · 2012 · arXiv

Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees

Itemset mining has been an active area of research due to its successful application in various data mining scenarios, including finding association rules. Though most past work has focused on finding frequent itemsets, infrequent itemset mining has demonstrated its utility in web mining, bioinformatics and other fields. In this paper, we propose a new algorithm based on the pattern-growth paradigm to find minimally infrequent itemsets. A minimally infrequent itemset is an infrequent itemset none of whose proper subsets is infrequent. We also introduce the novel concept of residual trees, and further use them to mine itemsets under multiple-level minimum support, where different support thresholds are used for itemsets of different lengths. Finally, we analyze the behavior of our algorithm with respect to different parameters and show through experiments that it outperforms competing algorithms.
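
The definition can be pinned down with a brute-force enumerator. This is exponential in the number of items and serves only as a reference check; it is not the pattern-growth or residual-tree algorithm. It assumes transactions are Python sets and a single absolute support threshold `min_sup`.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def minimally_infrequent(transactions, min_sup):
    """Itemsets with support < min_sup whose every proper subset is frequent."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            if support(s, transactions) >= min_sup:
                continue  # frequent, so not infrequent at all
            # Support is anti-monotone, so checking the (size-1)-subsets suffices.
            if all(support(s - {i}, transactions) >= min_sup for i in s):
                result.append(s)
    return result
```

With transactions {a, b}, {a, c}, {a, b, c}, {b}, {b, d} and `min_sup = 2`, the result is {d} and {b, c}; {a, b, c} is infrequent but not minimal because its subset {b, c} is already infrequent.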

preprint · 2012 · arXiv

Mining Statistically Significant Substrings using the Chi-Square Statistic

The problem of identifying statistically significant patterns in a sequence of data arises in many domains, such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed statistically significant if it is unlikely to have occurred due to randomness or chance alone. We use the chi-square statistic as a quantitative measure of statistical significance. Given a string of characters generated from a memoryless Bernoulli model, the problem is to identify the substring for which the empirical distribution of single letters deviates the most from the distribution expected under the generative Bernoulli model. This deviation is captured by the chi-square measure. The most significant substring (MSS) of a string is thus defined as the substring with the highest chi-square value. To the best of our knowledge, no existing algorithm finds the MSS in better than O(n^2) time, where n denotes the length of the string. In this paper, we propose an algorithm to find the most significant substring whose running time is O(n^{3/2}) with high probability. We also study some variants of this problem, such as finding the top-t set, finding all substrings with a chi-square value greater than a fixed threshold, and finding the MSS among substrings longer than a given length. We experimentally demonstrate the asymptotic behavior of the MSS as the string size and alphabet size vary. We also describe some applications of our algorithm to cryptology and to real-world data from finance and sports. Finally, we compare our technique with existing heuristics for finding the MSS.
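
For a substring of length L with letter counts Y_a and model probabilities p_a, the statistic is X² = Σ_a (Y_a − L·p_a)² / (L·p_a). The sketch below is the straightforward baseline that scores every substring with incrementally updated counts, i.e., O(n²) substrings times the alphabet size; the paper's O(n^{3/2}) algorithm is not reproduced. It assumes every letter of the string appears in `probs` with a positive probability.

```python
def chi_square(counts, length, probs):
    """Pearson chi-square of observed letter counts against length * p_a."""
    return sum((counts.get(a, 0) - length * p) ** 2 / (length * p)
               for a, p in probs.items())


def most_significant_substring(s, probs):
    """Brute-force MSS: score every substring, extending counts one letter at a time."""
    best = (float("-inf"), 0, 0)
    for i in range(len(s)):
        counts = {}
        for j in range(i, len(s)):
            counts[s[j]] = counts.get(s[j], 0) + 1
            score = chi_square(counts, j - i + 1, probs)
            if score > best[0]:
                best = (score, i, j + 1)
    score, i, j = best
    return s[i:j], score
```

With p_a = p_b = 0.5 and the string "abababaaaa", the run "aaaa" wins with X² = 4.0: an all-'a' substring of length L scores exactly L, and no substring here has a larger excess of one letter relative to its length.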

preprint · 2011 · arXiv

Anisotropic structural and optical properties of a-plane (11-20) AlInN nearly-lattice-matched to GaN

We report epitaxial growth of a-plane (11-20) AlInN layers nearly lattice-matched to GaN. Unlike c-plane oriented epilayers, a-plane Al1-xInxN cannot be lattice-matched to GaN simultaneously in both in-plane directions. We study the influence of temperature on indium incorporation and obtain nearly lattice-matched Al0.81In0.19N at a growth temperature of 760 °C. We outline a procedure to check the in-plane lattice mismatch using high-resolution x-ray diffraction, and evaluate the strain and critical thickness. Polarization-resolved optical transmission measurements of the Al0.81In0.19N epilayer reveal a bandgap difference of ~140 meV between electric field polarizations parallel (E ∥ c) and perpendicular (E ⊥ c) to the [0001] c-axis, with room-temperature photoluminescence peaked at 3.38 eV and strongly polarized with E ∥ c, in good agreement with strain-dependent band-structure calculations.

preprint · 2011 · arXiv

Caching Stars in the Sky: A Semantic Caching Approach to Accelerate Skyline Queries

Multi-criteria decision making has been made possible with the advent of skyline queries. However, processing such queries for high-dimensional datasets remains time-consuming. Real-time applications are thus infeasible, especially for non-indexed skyline techniques where the datasets arrive online. In this paper, we propose a caching mechanism that uses the semantics of previous skyline queries to improve the processing time of a new query. Beyond exact repeats of cached queries, exploiting these semantics also accelerates related queries. We achieve this by generating partial result sets that are guaranteed to be in the skyline sets. We also propose an index structure for efficiently organizing the cached queries. Experiments on synthetic and real datasets show the effectiveness and scalability of our proposed methods.

preprint · 2011 · arXiv

Facile fabrication of lateral nanowire wrap-gate devices with improved performance

We present a simple fabrication technique for lateral nanowire wrap-gate devices with high capacitive coupling and field-effect mobility. Our process uses e-beam lithography with a single resist-spinning step and does not require chemical etching. In the temperature range 1.5-250 K, we measure a subthreshold slope of 5-54 mV/decade and a mobility of 2800-2500 cm²/Vs, significantly larger than in previously reported lateral wrap-gate devices. At depletion, the barrier height due to the gated region is proportional to the applied wrap-gate voltage.

preprint · 2011 · arXiv

High Q electromechanics with InAs nanowire quantum dots

In this report, we study the electromechanical properties of a suspended InAs nanowire (NW) resonator. At low temperatures, the NW acts as the island of a single-electron transistor (SET), and we observe strong coupling between electrons and mechanical modes at resonance; the electron tunneling rate is approximately 10 times the resonant frequency. The magnitude of the Coulomb peaks differs above and below the mechanical resonance, and we observe a Fano resonance in the conductance due to interference between two contributions to the potential of the SET. The quality factor (Q) of these devices is observed to be ~10^5 at 100 mK.

preprint · 2011 · arXiv

Polarization sensitive solar-blind detector based on a-plane AlGaN

We report polarization-sensitive solar-blind metal-semiconductor-metal UV photodetectors based on (11-20) a-plane AlGaN. The epilayer shows anisotropic optical properties confirmed by polarization-resolved transmission and photocurrent measurements, in good agreement with band structure calculations.

preprint · 2011 · arXiv

Tunable thermal conductivity in defect engineered nanowires at low temperatures

We measure the thermal conductivity (κ) of individual InAs nanowires (NWs) and find that it is three orders of magnitude smaller than the bulk value in the temperature range of 10 to 50 K. We argue that the low κ arises from strong localization of phonons in the random superlattice of twin defects oriented perpendicular to the NW axis. We observe a significant electronic contribution arising from the surface accumulation layer, which makes κ tunable by an electrostatic gate and a magnetic field. Our devices, and our measurements of κ at different carrier concentrations and magnetic fields without introducing structural defects, offer a means to study new aspects of nanoscale thermal transport.

preprint2010arXiv

Distorted wurtzite unit cells: Determination of lattice parameters of non-polar a-plane AlGaN and estimation of solid phase Al content

Unlike c-plane nitrides, "non-polar" nitrides grown in, e.g., the a-plane or m-plane orientation experience anisotropic in-plane strain due to the anisotropy in the lattice and thermal mismatch with the substrate or buffer layer. Such anisotropic strain distorts the wurtzite unit cell and makes it difficult to determine accurately the lattice parameters and the solid phase group-III content ($x_{solid}$) in ternary alloys. In this paper we show that the lattice distortion is orthorhombic, and outline a relatively simple procedure for measuring the lattice parameters of non-polar group-III-nitride epilayers from high-resolution x-ray diffraction measurements. We derive an approximate expression for $x_{solid}$ that takes the anisotropic strain into account. We illustrate this using data for a-plane AlGaN, where we measure the lattice parameters and estimate the solid phase Al content, and also show that the method is applicable to m-plane structures.
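For orientation, the textbook relations that any such x-ray procedure builds on are collected here. For an orthorhombic cell, measuring the Bragg angle of three independent reflections gives three equations that are linear in $1/a^2$, $1/b^2$ and $1/c^2$, so the three cell edges can be solved for directly. The last line is Vegard's law for a fully relaxed alloy; it ignores strain and is only the baseline that a strain-corrected estimate of $x_{solid}$ refines. The paper's own approximate expression is not reproduced here, and the Miller indices $(hkl)$ are assumed to refer to the orthorhombic cell.

```latex
% Bragg's law for a reflection (hkl) observed at Bragg angle \theta
\lambda = 2\, d_{hkl} \sin\theta

% Interplanar spacing of an orthorhombic cell with edges a, b, c
\frac{1}{d_{hkl}^{2}} = \frac{h^{2}}{a^{2}} + \frac{k^{2}}{b^{2}} + \frac{l^{2}}{c^{2}}

% Vegard's law for a fully relaxed Al_xGa_{1-x}N alloy (no strain correction)
a(x) = x\, a_{\mathrm{AlN}} + (1 - x)\, a_{\mathrm{GaN}}
\quad\Longrightarrow\quad
x = \frac{a - a_{\mathrm{GaN}}}{a_{\mathrm{AlN}} - a_{\mathrm{GaN}}}
```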

preprint2010arXiv

Finding top-k similar pairs of objects annotated with terms from an ontology

With the growing focus on semantic searches and interpretations, an increasing number of standardized vocabularies and ontologies are being designed and used to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first one defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second one defines the distance as the average pairwise term distance. The third and most useful distance measure, earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be aggregated progressively and utilize them to speed up the search for top-k object pairs when the earth mover's distance is used. For the minimum pairwise distance, we devise an algorithm that runs in O(D + Tk log k) time, where D is the total information size and T is the total number of terms in the ontology. We also develop a novel best-first search strategy for the average pairwise distance that utilizes lower bounds generated in an ordered manner. Experiments on real and synthetic datasets demonstrate the practicality and scalability of our algorithms.
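To make the first two distance measures concrete, the following is a brute-force sketch, assuming the ontology is a rooted tree given as a parent map and that the distance between two terms is the number of edges on the tree path between them. It scores every object pair and keeps the k closest; it is a baseline for illustration only, not the paper's $O(D + Tk \log k)$ algorithm or its progressive lower bounds, and the earth mover's distance is omitted. All function names are hypothetical.

```python
import heapq
from itertools import combinations

def depths(parent):
    """Depth of every term; parent maps term -> parent term (root -> None)."""
    memo = {}
    def d(t):
        if t not in memo:
            p = parent[t]
            memo[t] = 0 if p is None else d(p) + 1
        return memo[t]
    for t in parent:
        d(t)
    return memo

def term_distance(u, v, parent, depth):
    """Number of edges on the tree path between terms u and v (via their LCA)."""
    a, b = u, v
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:  # climb in lockstep until the lowest common ancestor
        a, b = parent[a], parent[b]
    return depth[u] + depth[v] - 2 * depth[a]

def min_pair_distance(s, t, parent, depth):
    """Object distance as the minimum pairwise term distance."""
    return min(term_distance(u, v, parent, depth) for u in s for v in t)

def avg_pair_distance(s, t, parent, depth):
    """Object distance as the average pairwise term distance."""
    total = sum(term_distance(u, v, parent, depth) for u in s for v in t)
    return total / (len(s) * len(t))

def top_k_pairs(objects, k, parent, measure):
    """Exhaustively score all object pairs and return the k closest."""
    depth = depths(parent)
    scored = ((measure(objects[x], objects[y], parent, depth), x, y)
              for x, y in combinations(sorted(objects), 2))
    return heapq.nsmallest(k, scored)

parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
objects = {"o1": {"A1"}, "o2": {"A2", "B1"}, "o3": {"B1"}}
print(top_k_pairs(objects, 2, parent, min_pair_distance))
# [(0, 'o2', 'o3'), (2, 'o1', 'o2')]
```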

preprint2010arXiv

Minimum Spanning Tree on Spatio-Temporal Networks

Given a spatio-temporal network (ST network) whose edge properties vary with time, a time-sub-interval minimum spanning tree (TSMST) is a collection of minimum spanning trees of the ST network, each associated with a time interval during which its total cost is the least among all spanning trees. The TSMST problem aims to identify a collection of distinct minimum spanning trees and their respective time sub-intervals under the constraint that the edge weight functions are piecewise linear. This is an important problem in ST network application domains such as wireless sensor networks (e.g., energy-efficient routing). Computing a TSMST is challenging because the ranking of candidate spanning trees is non-stationary over a given time interval. Existing methods such as dynamic graph algorithms and kinetic data structures assume separable edge weight functions. In contrast, we propose novel algorithms to find the TSMST for large ST networks that account for both separable and non-separable piecewise linear edge weight functions. The algorithms are based on the ordering of edges in edge-order-intervals and on the intersection points of edge weight functions.
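The key observation behind edge-order-intervals is that a minimum spanning tree depends only on the relative order of edge weights, which for piecewise linear weights can change only at breakpoints or where two weight functions cross. A naive sketch of that idea, assuming every edge weight is given as sorted `(time, weight)` breakpoints covering the whole query interval, collects those candidate times, runs Kruskal's algorithm once per sub-interval, and merges neighbouring sub-intervals that yield the same tree. It checks all edge pairs in every sub-interval, so it is a quadratic baseline rather than the paper's algorithms; all names are hypothetical.

```python
from itertools import combinations

def weight_at(fn, t):
    """Linear interpolation of a piecewise linear weight given as (time, weight) breakpoints."""
    for (t0, w0), (t1, w1) in zip(fn, fn[1:]):
        if t0 <= t <= t1:
            return w0 + (w1 - w0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside the function's domain")

def kruskal(nodes, edges, t):
    """MST (as a frozenset of edge keys) when edge weights are frozen at time t."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = set()
    for (u, v) in sorted(edges, key=lambda e: (weight_at(edges[e], t), e)):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((u, v))
    return frozenset(tree)

def tsmst(nodes, edges, start, end):
    """List of (t_begin, t_end, tree) with maximal sub-intervals of constant MST."""
    times = {start, end}
    for fn in edges.values():
        times.update(t for t, _ in fn if start <= t <= end)
    base = sorted(times)  # between these times every weight is linear
    for a, b in zip(base, base[1:]):
        for e, f in combinations(edges, 2):
            da = weight_at(edges[e], a) - weight_at(edges[f], a)
            db = weight_at(edges[e], b) - weight_at(edges[f], b)
            if da * db < 0:  # the difference changes sign, so the weights cross in (a, b)
                times.add(a + (b - a) * da / (da - db))
    cuts = sorted(times)
    result = []
    for a, b in zip(cuts, cuts[1:]):
        tree = kruskal(nodes, edges, (a + b) / 2)  # edge order is fixed inside (a, b)
        if result and result[-1][2] == tree:
            result[-1] = (result[-1][0], b, tree)  # same tree: extend previous interval
        else:
            result.append((a, b, tree))
    return result

nodes = ["a", "b", "c"]
edges = {("a", "b"): [(0, 1.0), (10, 1.0)],
         ("b", "c"): [(0, 2.0), (10, 2.0)],
         ("a", "c"): [(0, 3.0), (10, 0.0)]}
for t0, t1, tree in tsmst(nodes, edges, 0, 10):
    print(round(t0, 3), round(t1, 3), sorted(tree))
```

In this triangle the tree switches from {ab, bc} to {ab, ac} once the falling weight of edge ac drops below that of bc at t = 10/3; its later crossing with ab reorders the edges but does not change the tree, so those sub-intervals are merged.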

preprint2010arXiv

Mining Statistically Significant Substrings Based on the Chi-Square Measure

Given the vast reservoirs of data stored worldwide, efficiently mining data from a large information store has emerged as a great challenge. Many databases, such as those of intrusion detection systems, web-click records, player statistics, texts and proteins, store strings or sequences. Searching for unusual patterns within such long strings of data has become a requirement for diverse applications. Given a string, the problem is to identify the substrings that differ the most from the expected or normal behavior, i.e., the substrings that are statistically significant. In other words, these substrings are less likely to occur by chance alone and may point to some interesting information or phenomenon that warrants further exploration. To this end, we use the chi-square measure. We propose two heuristics for retrieving the top-k substrings with the largest chi-square measure. We show that the algorithms outperform competing algorithms in runtime while maintaining a high approximation ratio of more than 0.96.
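For a substring of length $\ell$ with observed symbol counts $o_s$, under a model in which each symbol $s$ occurs independently with probability $p_s$, the Pearson chi-square measure is $X^2 = \sum_s (o_s - \ell p_s)^2 / (\ell p_s)$. The sketch below, assuming such a multinomial background model, scores every substring exhaustively by extending counts one symbol at a time. This $O(n^2 |\Sigma|)$ baseline is what the paper's two heuristics are designed to beat; it is not those heuristics, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def chi_square(counts, length, probs):
    """Pearson chi-square of observed symbol counts against a multinomial model."""
    return sum((counts.get(s, 0) - length * p) ** 2 / (length * p)
               for s, p in probs.items())

def top_k_substrings(text, k, probs):
    """Exhaustive baseline: return the k (score, start, end) triples with the largest X^2."""
    scored = []
    for i in range(len(text)):
        counts = Counter()
        for j in range(i, len(text)):
            counts[text[j]] += 1  # extend substring text[i:j+1] by one symbol
            scored.append((chi_square(counts, j - i + 1, probs), i, j + 1))
    return heapq.nlargest(k, scored)

probs = {"a": 0.5, "b": 0.5}
print(top_k_substrings("abbbb", 2, probs))  # [(4.0, 1, 5), (3.0, 2, 5)]
```

The run "bbbb" scores highest because it deviates most from the expected half-and-half mix, relative to its length.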

preprint2010arXiv

Tuning mechanical modes and influence of charge screening in nanowire resonators

We probe the electromechanical properties of InAs nanowire (diameter ~ 100 nm) resonators in which the suspended nanowire (NW) is also the active channel of a field-effect transistor (FET). We observe and explain the non-monotonic dispersion of the resonant frequency with DC gate voltage (VgDC). The effect of electronic screening on the resonator can be seen in the amplitude. We observe the mixing of mechanical modes with VgDC. We also experimentally probe, and quantitatively explain using the Duffing equation, the hysteretic non-linear properties of the resonator as a function of VgDC.
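As a reference point, the standard driven, damped Duffing equation and its first-order amplitude-dependent resonance (the backbone curve) are shown here, for a mode displacement $x(t)$ with natural frequency $\omega_0$, damping rate $\gamma$, cubic stiffness coefficient $\alpha$, effective mass $m$ and drive amplitude $F$. These symbols are assumptions for illustration; how VgDC enters $\omega_0$ and $\alpha$ in the paper's model is not reproduced here.

```latex
% Driven, damped Duffing oscillator for the mode displacement x(t)
\ddot{x} + \gamma\,\dot{x} + \omega_0^{2}\,x + \alpha\,x^{3} = \frac{F}{m}\cos(\omega t)

% Amplitude-dependent resonance frequency to first order in \alpha
\omega_{\mathrm{res}}(A) \approx \omega_0 + \frac{3\,\alpha\,A^{2}}{8\,\omega_0}
```

For $\alpha > 0$ the peak bends toward higher frequency (hardening), and for $\alpha < 0$ toward lower frequency (softening). Beyond a critical drive the response curve becomes multivalued, so upward and downward frequency sweeps follow different branches, which is the origin of the hysteresis.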

preprint2009arXiv

Finding Significant Subregions in Large Image Databases

Images have become an important data source in many scientific and commercial domains. Analysis and exploration of image collections often requires the retrieval of the best subregions matching a given query. The support of such content-based retrieval requires not only the formulation of an appropriate scoring function for defining relevant subregions but also the design of new access methods that can scale to large databases. In this paper, we propose a solution to this problem of querying significant image subregions. We design a scoring scheme to measure the similarity of subregions. Our similarity measure extends to any image descriptor. All the images are tiled and each alignment of the query and a database image produces a tile score matrix. We show that the problem of finding the best connected subregion from this matrix is NP-hard and develop a dynamic programming heuristic. With this heuristic, we develop two index based scalable search strategies, TARS and SPARS, to query patterns in a large image repository. These strategies are general enough to work with other scoring schemes and heuristics. Experimental results on real image datasets show that TARS saves more than 87% query time on small queries, and SPARS saves up to 52% query time on large queries as compared to linear search. Qualitative tests on synthetic and real datasets achieve precision of more than 80%.
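The abstract notes that finding the best connected subregion of a tile score matrix is NP-hard. If the subregion is instead restricted to an axis-aligned rectangle, the problem becomes polynomial: the classic maximum-sum subrectangle search (2D Kadane) runs in $O(R^2 C)$ time for an $R \times C$ matrix. The sketch solves that simpler, swapped-in problem to illustrate what scoring a subregion of the tile matrix means; it is not the paper's dynamic-programming heuristic or the TARS/SPARS indexes, and the function name and return layout are hypothetical.

```python
def best_rectangle(scores):
    """Maximum-sum axis-aligned rectangle in a tile score matrix (2D Kadane).

    Returns (total, top, left, bottom, right) with inclusive tile indices.
    """
    rows, cols = len(scores), len(scores[0])
    best = (float("-inf"), 0, 0, 0, 0)
    for top in range(rows):
        col_sums = [0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += scores[bottom][c]  # column sums over rows top..bottom
            # 1D Kadane: best contiguous run of columns for this row band
            run, start = 0, 0
            for c in range(cols):
                if run <= 0:
                    run, start = col_sums[c], c
                else:
                    run += col_sums[c]
                if run > best[0]:
                    best = (run, top, start, bottom, c)
    return best

scores = [
    [-1, -1, -1, -1],
    [-1,  3,  2, -1],
    [-1,  2,  4, -1],
    [-1, -1, -1, -1],
]
print(best_rectangle(scores))  # (11, 1, 1, 2, 2)
```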

preprint2009arXiv

On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces

Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point of view of discovering low distortion embeddings into low-dimensional spaces. More specifically, we consider the Mahalanobis distance measure, the Bhattacharyya class of divergences and the Kullback-Leibler divergence. We present a dimensionality reduction method based on the Johnson-Lindenstrauss Lemma for the Mahalanobis measure that achieves arbitrarily low distortion. By using the Johnson-Lindenstrauss Lemma again, we further demonstrate that the Bhattacharyya distance admits dimensionality reduction with arbitrarily low additive error. We also examine the question of embeddability into metric spaces for these distance measures due to the availability of efficient indexing schemes on metric spaces. We provide explicit constructions of point sets under the Bhattacharyya and the Kullback-Leibler divergences whose embeddings into any metric space incur arbitrarily large distortions. We show that the lower bound presented for Bhattacharyya distance is nearly tight by providing an embedding that approaches the lower bound for relatively small dimensional datasets.
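One natural way to combine the Mahalanobis distance with the Johnson-Lindenstrauss Lemma, assuming the covariance matrix $\Sigma$ is known and symmetric positive definite, is to whiten first and project second. If $\Sigma = LL^\top$ (Cholesky), then $d_M(x, y) = \lVert L^{-1}(x - y) \rVert_2$, so mapping every point to $L^{-1}x$ turns Mahalanobis distances into Euclidean ones, and a scaled Gaussian random projection to $k = O(\varepsilon^{-2} \log n)$ dimensions then preserves them within a factor $1 \pm \varepsilon$ with high probability. This is a sketch under those assumptions, not necessarily the paper's construction; all function names are hypothetical.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def whiten(L, x):
    """Solve L z = x by forward substitution, i.e. z = L^{-1} x."""
    z = []
    for i in range(len(x)):
        z.append((x[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z

def mahalanobis(x, y, S):
    """Exact Mahalanobis distance sqrt((x - y)^T S^{-1} (x - y))."""
    z = whiten(cholesky(S), [a - b for a, b in zip(x, y)])
    return math.sqrt(sum(v * v for v in z))

def embed(points, S, k, seed=0):
    """Whiten, then apply a k x d Gaussian projection scaled by 1/sqrt(k)."""
    L = cholesky(S)
    rng = random.Random(seed)
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(len(S))] for _ in range(k)]
    out = []
    for x in points:
        z = whiten(L, x)
        out.append([sum(r[j] * z[j] for j in range(len(z))) for r in R])
    return out

S = [[2.0, 1.0], [1.0, 2.0]]
x, y = [1.0, 0.0], [0.0, 0.0]
ex, ey = embed([x, y], S, k=2000)
print(mahalanobis(x, y, S))                                   # exact: sqrt(2/3)
print(math.sqrt(sum((a - b) ** 2 for a, b in zip(ex, ey))))   # close to the exact value
```

With only two input dimensions the projection here increases the dimension; the point is the mechanics, since the same code reduces dimension whenever $k$ is smaller than the original $d$.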

36 published item(s)

INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects

Lexical and Statistical Analysis of Bangla Newspaper and Literature: A Corpus-Driven Study on Diversity, Readability, and NLP Adaptation

Koopman-based Differentiable Predictive Control for the Dynamics-Aware Economic Dispatch Problem

Predictions of Reynolds and Nusselt numbers in turbulent convection using machine-learning models

Recommendation of Compatible Outfits Conditioned on Style

Terahertz Optical Properties and Birefringence in Single Crystal Vanadium doped [100] β-Ga2O3

Vanadium doped beta-Ga2O3 single crystals: Growth, Optical and Terahertz characterization

C-MI-GAN : Estimation of Conditional Mutual Information using MinMax formulation

How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs

Learning Koopman Representations for Hybrid Systems

Model Predictive Control of Discrete-Continuous Energy Systems via Generalized Disjunctive Programming

Recharging and rejuvenation of decontaminated N95 masks

Comparison of GaN nanowires grown on c-, r- and m-plane sapphire substrates

Wide bandwidth nanowire electromechanics on insulating substrates at room temperature

Layered transition metal dichalcogenides: promising near-lattice-matched substrates for GaN growth

Trajectory Aware Macro-cell Planning for Mobile Users

A facile process for soak-and-peel delamination of CVD graphene from substrates using water

Constraint Satisfaction over Generalized Staircase Constraints

Evolution of the Modern Phase of Written Bangla: A Statistical Study

Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations

INSTRUCT: Space-Efficient Structure for Indexing and Complete Query Management of String Databases

Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees

Mining Statistically Significant Substrings using the Chi-Square Statistic

Anisotropic structural and optical properties of a-plane (11-20) AlInN nearly-lattice-matched to GaN

Caching Stars in the Sky: A Semantic Caching Approach to Accelerate Skyline Queries

Facile fabrication of lateral nanowire wrap-gate devices with improved performance

High Q electromechanics with InAs nanowire quantum dots

Polarization sensitive solar-blind detector based on a-plane AlGaN

Tunable thermal conductivity in defect engineered nanowires at low temperatures

Distorted wurtzite unit cells: Determination of lattice parameters of non-polar a-plane AlGaN and estimation of solid phase Al content

Finding top-k similar pairs of objects annotated with terms from an ontology

Minimum Spanning Tree on Spatio-Temporal Networks

Mining Statistically Significant Substrings Based on the Chi-Square Measure

Tuning mechanical modes and influence of charge screening in nanowire resonators

Finding Significant Subregions in Large Image Databases

On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces