Source author record

Praneeth Vepakomma

Praneeth Vepakomma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security cs.CY Computer Vision Distributed, Parallel, and Cluster Computing Networking and Internet Architecture

Catalog footprint

What is connected

12works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Modulated learning for private and distributed regression with just a single sample per client device

This work focuses on the question of learning from a large number of devices with each device holding only a single sample of data. Several real-world applications exist to this one sample per client setup up including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors to name a few. When a client has only one sample, the standard federated learning paradigm breaks down as a local update based on that single point is far from being useful, especially in the earlier rounds for estimation of the model coefficients. This utility is further weakened by the privacy-inducing noise applied at every round. This work caters to this problem to enable such clients to collaboratively contribute to effectively learn a global model without leaking the privacy of their data. The proposed approach injects a single, carefully calibrated noisy perturbation to transform the sample at each client, followed by a post-processed representation which is shared with the server. These representations aggregated at the server are processed to obtain an unbiased gradient update that in expectation matches the non-private centralized gradient while preserving data privacy. This approach is different than traditional private federated learning, where the communication payloads involve model coefficients as opposed to privately transformed data samples. This method enables devices with extremely limited data to collaborate and learn accurate, privacy-preserving models without requiring large local datasets or sacrificing individual privacy.

preprint2022arXiv

Decouple-and-Sample: Protecting sensitive information in task agnostic data release

We propose sanitizer, a framework for secure and task-agnostic data release. While releasing datasets continues to make a big impact in various applications of computer vision, its impact is mostly realized when data sharing is not inhibited by privacy concerns. We alleviate these concerns by sanitizing datasets in a two-stage process. First, we introduce a global decoupling stage for decomposing raw data into sensitive and non-sensitive latent representations. Secondly, we design a local sampling stage to synthetically generate sensitive information with differential privacy and merge it with non-sensitive latent features to create a useful representation while preserving the privacy. This newly formed latent information is a task-agnostic representation of the original dataset with anonymized sensitive information. While most algorithms sanitize data in a task-dependent manner, a few task-agnostic sanitization techniques sanitize data by censoring sensitive information. In this work, we show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately. We validate the effectiveness of the sanitizer by outperforming state-of-the-art baselines on the existing benchmark tasks and demonstrating tasks that are not possible using existing techniques.

preprint2022arXiv

Splintering with distributions: A stochastic decoy scheme for private computation

Performing computations while maintaining privacy is an important problem in todays distributed machine learning solutions. Consider the following two set ups between a client and a server, where in setup i) the client has a public data vector $\mathbf{x}$, the server has a large private database of data vectors $\mathcal{B}$ and the client wants to find the inner products $\langle \mathbf{x,y_k} \rangle, \forall \mathbf{y_k} \in \mathcal{B}$. The client does not want the server to learn $\mathbf{x}$ while the server does not want the client to learn the records in its database. This is in contrast to another setup ii) where the client would like to perform an operation solely on its data, such as computation of a matrix inverse on its data matrix $\mathbf{M}$, but would like to use the superior computing ability of the server to do so without having to leak $\mathbf{M}$ to the server. \par We present a stochastic scheme for splitting the client data into privatized shares that are transmitted to the server in such settings. The server performs the requested operations on these shares instead of on the raw client data at the server. The obtained intermediate results are sent back to the client where they are assembled by the client to obtain the final result.

preprint2022arXiv

Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning

This article seeks for a distributed learning solution for the visual transformer (ViT) architectures. Compared to convolutional neural network (CNN) architectures, ViTs often have larger model sizes, and are computationally expensive, making federated learning (FL) ill-suited. Split learning (SL) can detour this problem by splitting a model and communicating the hidden representations at the split-layer, also known as smashed data. Notwithstanding, the smashed data of ViT are as large as and as similar as the input data, negating the communication efficiency of SL while violating data privacy. To resolve these issues, we propose a new form of CutSmashed data by randomly punching and compressing the original smashed data. Leveraging this, we develop a novel SL framework for ViT, coined CutMixSL, communicating CutSmashed data. CutMixSL not only reduces communication costs and privacy leakage, but also inherently involves the CutMix data augmentation, improving accuracy and scalability. Simulations corroborate that CutMixSL outperforms baselines such as parallelized SL and SplitFed that integrates FL with SL.

preprint2021arXiv

Advances and Open Problems in Federated Learning

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

preprint2020arXiv

Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy

Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority. There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.

preprint2020arXiv

COVID-19 Contact-Tracing Mobile Apps: Evaluation and Assessment for Decision Makers

A number of groups, from governments to non-profits, have quickly acted to innovate the contact-tracing process: they are designing, building, and launching contact-tracing apps in response to the COVID-19 crisis. A diverse range of approaches exist, creating challenging choices for officials looking to implement contact-tracing technology in their community and raising concerns about these choices among citizens asked to participate in contact tracing. We are frequently asked how to evaluate and differentiate between the options for contact-tracing applications. Here, we share the questions we ask about app features and plans when reviewing the many contact-tracing apps appearing on the global stage.

preprint2020arXiv

NoPeek: Information leakage reduction to share activations in distributed deep learning

For distributed machine learning with sensitive data, we demonstrate how minimizing distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy. Leakage (measured using distance correlation between input and intermediate representations) is the risk associated with the invertibility of raw data from intermediary representations. This can prevent client entities that hold sensitive data from using distributed deep learning services. We demonstrate that our method is resilient to such reconstruction attacks and is based on reduction of distance correlation between raw data and learned representations during training and inference with image datasets. We prevent such reconstruction of raw data while maintaining information required to sustain good classification accuracies.

preprint2020arXiv

PPContactTracing: A Privacy-Preserving Contact Tracing Protocol for COVID-19 Pandemic

Several contact tracing solutions have been proposed and implemented all around the globe to combat the spread of COVID-19 pandemic. But, most of these solutions endanger the privacy rights of the individuals and hinder their widespread adoption. We propose a privacy-preserving contact tracing protocol for the efficient tracing of the spread of the global pandemic. It is based on the private set intersection (PSI) protocol and utilizes the homomorphic properties to preserve the privacy at the individual level. A hierarchical model for the representation of landscapes and rate-limiting factor on the number of queries have been adopted to maintain the efficiency of the protocol.

preprint2020arXiv

SplitNN-driven Vertical Partitioning

In this work, we introduce SplitNN-driven Vertical Partitioning, a configuration of a distributed deep learning method called SplitNN to facilitate learning from vertically distributed features. SplitNN does not share raw data or model details with collaborating institutions. The proposed configuration allows training among institutions holding diverse sources of data without the need of complex encryption algorithms or secure computation protocols. We evaluate several configurations to merge the outputs of the split models, and compare performance and resource efficiency. The method is flexible and allows many different configurations to tackle the specific challenges posed by vertically split datasets.

preprint2016arXiv

Optimal bandwidth estimation for a fast manifold learning algorithm to detect circular structure in high-dimensional data

We provide a way to infer about existence of topological circularity in high-dimensional data sets in $\mathbb{R}^d$ from its projection in $\mathbb{R}^2$ obtained through a fast manifold learning map as a function of the high-dimensional dataset $\mathbb{X}$ and a particular choice of a positive real $σ$ known as bandwidth parameter. At the same time we also provide a way to estimate the optimal bandwidth for fast manifold learning in this setting through minimization of these functions of bandwidth. We also provide limit theorems to characterize the behavior of our proposed functions of bandwidth.

preprint2016arXiv

Supervised Dimensionality Reduction via Distance Correlation Maximization

In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation, Szekely et. al. (2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representation $\mathbf{z}$, which maximizes the squared sum of Distance Correlations between low dimensional features $\mathbf{z}$ and response $y$, and also between features $\mathbf{z}$ and covariates $\mathbf{x}$. We propose a novel algorithm to optimize our proposed objective using the Generalized Minimization Maximizaiton method of \Parizi et. al. (2015). We show superior empirical results on multiple datasets proving the effectiveness of our proposed approach over several relevant state-of-the-art supervised dimensionality reduction methods.

Praneeth Vepakomma

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Modulated learning for private and distributed regression with just a single sample per client device

Decouple-and-Sample: Protecting sensitive information in task agnostic data release

Splintering with distributions: A stochastic decoy scheme for private computation

Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning

Advances and Open Problems in Federated Learning

Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy

COVID-19 Contact-Tracing Mobile Apps: Evaluation and Assessment for Decision Makers

NoPeek: Information leakage reduction to share activations in distributed deep learning

PPContactTracing: A Privacy-Preserving Contact Tracing Protocol for COVID-19 Pandemic

SplitNN-driven Vertical Partitioning

Optimal bandwidth estimation for a fast manifold learning algorithm to detect circular structure in high-dimensional data

Supervised Dimensionality Reduction via Distance Correlation Maximization