Source author record

Arka Bhattacharya

Arka Bhattacharya appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning, Data Structures and Algorithms, math.PR, Quantitative Methods

Catalog footprint

What is connected

4 works

4 topics

3 close collaborators

Preprint, 2015, arXiv

Sensor-Type Classification in Buildings

Many sensors/meters are deployed in commercial buildings to monitor and optimize their performance. However, because sensor metadata is inconsistent across buildings, software-based solutions are tightly coupled to the sensor metadata conventions (i.e. schemas and naming) of each building. Running the same software across buildings requires significant integration effort. Metadata normalization is critical for scaling the deployment process and allows us to decouple building-specific conventions from the code written for building applications. It also allows us to deal with missing metadata. One important aspect of normalization is to differentiate sensors by the type of phenomena being observed. In this paper, we propose a general, simple, yet effective classification scheme to differentiate sensors in buildings by type. We perform ensemble learning on data collected from over 2000 sensor streams in two buildings. Our approach achieves more than 92% accuracy for classification within buildings and more than 82% accuracy across buildings. We also introduce a method for identifying potentially misclassified streams. This is important because it allows us to identify opportunities to obtain more input from experts, input that could help improve classification accuracy when ground truth is unavailable. We show that by adjusting a threshold value we are able to identify at least 30% of the misclassified instances.
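The abstract does not say how the threshold is applied, so the following is only a minimal sketch of one plausible reading: each base classifier in an ensemble votes on a stream's type, and streams whose vote agreement falls below a threshold are flagged for expert review. The function names, the stream ids, the votes and the agreement measure are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def vote(predictions):
    """Return (majority_label, agreement) for one stream.

    `predictions` holds one predicted sensor type per base classifier.
    Agreement is the fraction of classifiers that chose the majority label.
    """
    counts = Counter(predictions)
    label, n = counts.most_common(1)[0]
    return label, n / len(predictions)


def flag_uncertain(stream_votes, threshold):
    """Return ids of streams whose vote agreement is below `threshold`.

    These are candidates for misclassification (an assumption of this
    sketch, not the paper's stated criterion).
    """
    flagged = []
    for stream_id, predictions in stream_votes.items():
        _, agreement = vote(predictions)
        if agreement < threshold:
            flagged.append(stream_id)
    return flagged


# Hypothetical votes from five base classifiers on three sensor streams.
stream_votes = {
    "s1": ["temperature"] * 5,
    "s2": ["co2", "co2", "humidity", "co2", "co2"],
    "s3": ["humidity", "temperature", "co2", "humidity", "temperature"],
}

# Raising the threshold flags more streams for review.
print(flag_uncertain(stream_votes, 0.6))
print(flag_uncertain(stream_votes, 0.9))
```

Raising the threshold sends more streams to experts at the cost of more review effort, which is the trade-off implied by "adjusting a threshold value".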

Preprint, 2014, arXiv

Approximation Algorithms for the Asymmetric Traveling Salesman Problem: Describing two recent methods

The paper describes two recent approximation algorithms for the Asymmetric Traveling Salesman Problem (ATSP), giving an intuitive account of the work of Feige and Singh [1] and of Asadpour et al. [2].

[1] improves the previous $O(\log n)$ approximation algorithm by reducing the constant from 0.84 to 0.66 and modifying the work of Kaplan et al. [3], and also shows an efficient reduction from ATSPP to ATSP. Combining both results, they establish an approximation ratio of $\left(\frac{4}{3}+\varepsilon\right)\log n$ for ATSPP, for a small $\varepsilon>0$, improving on the work of Chekuri and Pal [4].

Asadpour et al., in their seminal work [2], give an $O\left(\frac{\log n}{\log \log n}\right)$ randomized algorithm for the ATSP. They symmetrize and modify the solution of the Held-Karp relaxation, establish an exponential-family distribution for probabilistically constructing a maximum-entropy spanning tree from the spanning tree polytope, define the thinness property, and finally transform a thin spanning tree into an Eulerian walk. The optimization methods used in [2] are quite elegant, and the approximation ratio could be improved further by manipulating the thinness of the cuts.

Preprint, 2014, arXiv

Quadri-allele frequency spectrum in a coalescent topology for mutations in non-constant population size

The sample frequency spectrum of a segregating site is the probability distribution of a sample of alleles from a genetic locus, conditional on observing more than one clearly distinct phenotype in the sample. We present a model for analyzing the quadri-allele frequency spectrum, in which the ancestral population diverged into three populations at a certain divergence time and the resulting mutations on the branches of the coalescent tree gave rise to three different derived alleles, which can be observed in the present generation along with the ancestral allele. The model is analyzed for non-constant population size, assuming a given number of extant lineages at the divergence time and no migration between the populations.

Preprint, 2013, arXiv

Evolution and Computational Learning Theory: A survey on Valiant's paper

Darwin's theory of evolution is considered one of the greatest scientific gems of modern science. It not only describes how living things evolve, but also shows how a population evolves through time and why only the fittest individuals carry the generation forward. The paper gives a high-level analysis of the work of Valiant [1]. Although we know the mechanisms of evolution, there does not seem to exist any strong quantitative and mathematical theory of the evolution of certain mechanisms. What exactly is defined as the fitness of an individual, why it is that only certain individuals in a population tend to mutate, how computation is done in finite time when there are exponentially many examples: many such questions remain to be answered. [1] treats Darwinian theory as a form of computational learning theory, which calculates the net fitness of the hypotheses and thus distinguishes functions and their classes that could be evolvable using a polynomial amount of resources. Evolution is considered as a function of the environment and the previous evolutionary stages that chooses the best hypothesis using learning techniques that make mutation possible, and hence gives a quantitative idea of why only the fittest individuals tend to survive and have the power to mutate.