Source author record

Mark Gerstein

Mark Gerstein appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Cryptography and Security Machine Learning

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Higher-Order Generalization Bounds: Learning Deep Probabilistic Programs via PAC-Bayes Objectives

Deep Probabilistic Programming (DPP) allows powerful models based on recursive computation to be learned using efficient deep-learning optimization techniques. Additionally, DPP offers a unified perspective, where inference and learning algorithms are treated on a par with models as stochastic programs. Here, we offer a framework for representing and learning flexible PAC-Bayes bounds as stochastic programs using DPP-based methods. In particular, we show that DPP techniques may be leveraged to derive generalization bounds that draw on the compositionality of DPP representations. In turn, the bounds we introduce offer principled training objectives for higher-order probabilistic programs. We offer a definition of a higher-order generalization bound, which naturally encompasses single- and multi-task generalization perspectives (including transfer- and meta-learning) and a novel class of bound based on a learned measure of model complexity. Further, we show how modified forms of all higher-order bounds can be efficiently optimized as objectives for DPP training, using variational techniques. We test our framework using single- and multi-task generalization settings on synthetic and biological data, showing improved performance and generalization prediction using flexible DPP model representations and learned complexity measures.

preprint2022arXiv

Scalable privacy-preserving cancer type prediction with homomorphic encryption

Machine Learning (ML) alleviates the challenges of high-dimensional data analysis and improves decision making in critical applications like healthcare. Effective cancer type from high-dimensional genetic mutation data can be useful for cancer diagnosis and treatment, if the distinguishable patterns between cancer types are identified. At the same time, analysis of high-dimensional data is computationally expensive and is often outsourced to cloud services. Privacy concerns in outsourced ML, especially in the field of genetics, motivate the use of encrypted computation, like Homomorphic Encryption (HE). But restrictive overheads of encrypted computation deter its usage. In this work, we explore the challenges of privacy preserving cancer detection using a real-world dataset consisting of more than 2 million genetic information for several cancer types. Since the data is inherently high-dimensional, we explore smaller ML models for cancer prediction to enable fast inference in the privacy preserving domain. We develop a solution for privacy preserving cancer inference which first leverages the domain knowledge on somatic mutations to efficiently encode genetic mutations and then uses statistical tests for feature selection. Our logistic regression model, built using our novel encoding scheme, achieves 0.98 micro-average area under curve with 13% higher test accuracy than similar studies. We exhaustively test our model's predictive capabilities by analyzing the genes used by the model. Furthermore, we propose a fast matrix multiplication algorithm that can efficiently handle high-dimensional data. Experimental results show that, even with 40,000 features, our proposed matrix multiplication algorithm can speed up concurrent inference of multiple individuals by approximately 10x and inference of a single individual by approximately 550x, in comparison to standard matrix multiplication.

Mark Gerstein

What is connected

Connect this record

See the researcher in context

Building this map preview

2 published item(s)

Higher-Order Generalization Bounds: Learning Deep Probabilistic Programs via PAC-Bayes Objectives

Scalable privacy-preserving cancer type prediction with homomorphic encryption