Source author record

Hang J. Kim

Hang J. Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Topics: Applications; Cryptography and Security; Genomics; Methodology

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Accuracy Gains from Privacy Amplification Through Sampling for Differential Privacy

Recent research in differential privacy has demonstrated that (sub)sampling can amplify the level of protection. For example, for $ε$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $rε$ if a value of $ε$ is used to protect the output from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information for the full population, we ask under which circumstances accuracy gains could be expected if the privatized estimate were computed on a random sample instead of the full population. We find that accuracy gains can be achieved in certain regimes. However, gains can typically be expected only if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We focus only on algorithms that achieve differential privacy by adding noise to the final output, and we illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice, and we hope that these findings will stimulate further research broadening the scope of differential privacy algorithms and outputs considered.
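The amplification arithmetic in this abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the commonly cited subsampling bound $ε' = \ln(1 + r(e^{ε} - 1))$, which reduces to roughly $rε$ for small $ε$, and a Laplace mechanism for the mean of values bounded in an interval of width 1, so that sensitivity is 1/n. The function names and the numbers are hypothetical, and sampling error is ignored.

```python
import math

def amplified_epsilon(eps_sample, rate):
    """Effective epsilon when an eps_sample-DP mechanism runs on a random sample.

    Assumes the standard subsampling bound ln(1 + r * (exp(eps) - 1)).
    """
    return math.log1p(rate * math.expm1(eps_sample))

def sample_epsilon_for_target(eps_target, rate):
    """Epsilon to spend on the sample so the amplified guarantee equals eps_target."""
    return math.log1p(math.expm1(eps_target) / rate)

def laplace_scale_mean(n, eps, value_range=1.0):
    """Laplace noise scale for a mean of n bounded values (sensitivity = range / n)."""
    return value_range / (n * eps)

# Hypothetical setting: population of 10,000 records, 10% sample, target epsilon 0.5.
n, sample_size, eps_target = 10_000, 1_000, 0.5
rate = sample_size / n

eps_s = sample_epsilon_for_target(eps_target, rate)
scale_full = laplace_scale_mean(n, eps_target)       # noise scale on the full data
scale_sample = laplace_scale_mean(sample_size, eps_s)  # noise scale on the sample

print(f"epsilon usable on the sample: {eps_s:.3f}")
print(f"Laplace scale, full population: {scale_full:.2e}")
print(f"Laplace scale, sample:          {scale_sample:.2e}")
```

Under these assumptions the sample allows a larger $ε$, but the sensitivity of the mean grows by a factor of $1/r$, so the noise scale on the sample ends up larger. That matches the abstract's remark that gains are plausible mainly when sensitivity depends only weakly on the database size.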

Preprint, 2022, arXiv

graph-GPA 2.0: A Graphical Model for Multi-disease Analysis of GWAS Results with Integration of Functional Annotation Data

Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with traits and diseases. However, it remains challenging to fully understand the functional mechanisms underlying many associated variants. This is especially the case when we are interested in variants shared across multiple phenotypes. To address this challenge, we propose graph-GPA 2.0 (GGPA 2.0), a novel statistical framework that integrates GWAS datasets for multiple phenotypes and incorporates functional annotations within a unified framework. We conducted simulation studies to evaluate GGPA 2.0. The results indicate that incorporating functional annotation data using GGPA 2.0 not only improves detection of disease-associated variants but also allows more accurate identification of relationships among diseases. We analyzed five autoimmune diseases and five psychiatric disorders with the functional annotations derived from GenoSkyline and GenoSkyline-Plus and the prior disease graph generated by biomedical literature mining. For autoimmune diseases, GGPA 2.0 identified enrichment for blood, especially B cells and regulatory T cells, across multiple diseases. Psychiatric disorders were enriched for brain, especially prefrontal cortex and inferior temporal lobe for bipolar disorder (BIP) and schizophrenia (SCZ), respectively. Finally, GGPA 2.0 successfully identified the pleiotropy between BIP and SCZ. These results demonstrate that GGPA 2.0 can be a powerful tool to identify variants associated with each phenotype or shared across multiple phenotypes, while also promoting understanding of the functional mechanisms underlying the associated variants.

Preprint, 2016, arXiv

Bandwidth Selection for Kernel Density Estimation with a Markov Chain Monte Carlo Sample

Markov chain Monte Carlo samplers produce dependent streams of variates drawn from the limiting distribution of the Markov chain. With this as motivation, we introduce novel univariate kernel density estimators which are appropriate for the stationary sequences of dependent variates. We modify the asymptotic mean integrated squared error criterion to account for dependence and find that the modified criterion suggests data-driven adjustments to standard bandwidth selection methods. Simulation studies show that our proposed methods find bandwidths close to the optimal value while standard methods lead to smaller bandwidths and hence to undersmoothed density estimates. Empirically, the proposed methods have considerably smaller integrated mean squared error than do standard methods.
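The direction of the effect described in this abstract can be illustrated with a crude stand-in. The sketch below is not the modified criterion from the paper, which the abstract does not spell out. It assumes the textbook normal reference rule, $h = 1.06\,\hat{σ}\,n^{-1/5}$, and replaces $n$ with an effective sample size from an AR(1) approximation. The simulated chain, the function names, and the effective sample size heuristic are all hypothetical choices for illustration.

```python
import math
import random
import statistics

def normal_reference_bandwidth(values, n_eff=None):
    """Normal reference rule h = 1.06 * sd * n^(-1/5); n_eff optionally replaces n."""
    n = len(values) if n_eff is None else n_eff
    return 1.06 * statistics.stdev(values) * n ** (-0.2)

def ar1_effective_sample_size(values):
    """Heuristic ESS = n * (1 - rho) / (1 + rho), with rho the lag-1 autocorrelation."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    num = sum(a * b for a, b in zip(centered, centered[1:]))
    den = sum(c * c for c in centered)
    rho = min(max(num / den, 0.0), 0.99)  # clip to keep the ratio well behaved
    return len(values) * (1.0 - rho) / (1.0 + rho)

# Simulate a strongly autocorrelated stationary chain (AR(1) with N(0, 1) marginals)
# as a stand-in for MCMC output.
random.seed(0)
phi, n = 0.9, 5000
chain = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    innovation = math.sqrt(1.0 - phi**2) * random.gauss(0.0, 1.0)
    chain.append(phi * chain[-1] + innovation)

h_iid = normal_reference_bandwidth(chain)                # treats draws as independent
n_eff = ar1_effective_sample_size(chain)
h_dep = normal_reference_bandwidth(chain, n_eff=n_eff)   # dependence-adjusted heuristic

print(f"n = {n}, ESS = {n_eff:.0f}, h_iid = {h_iid:.3f}, h_dep = {h_dep:.3f}")
```

Because the effective sample size is smaller than the raw count for a positively correlated chain, the adjusted bandwidth comes out larger. This is the same direction as the abstract's finding that standard selectors choose bandwidths that are too small for dependent samples.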