Researcher profile

Daniel Kifer

Daniel Kifer contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2022arXiv

An Empirical Analysis of Recurrent Learning Algorithms In Neural Lossy Image Compression Systems

Recent advances in deep learning have resulted in image compression algorithms that outperform JPEG and JPEG 2000 on the standard Kodak benchmark. However, they are slow to train (due to backprop-through-time) and, to the best of our knowledge, have not been systematically evaluated on a large variety of datasets. In this paper, we perform the first large-scale comparison of recent state-of-the-art hybrid neural compression algorithms, while exploring the effects of alternative training strategies (when applicable). The hybrid recurrent neural decoder is a former state-of-the-art model (recently overtaken by a Google model) that can be trained using backprop-through-time (BPTT) or with alternative algorithms like sparse attentive backtracking (SAB), unbiased online recurrent optimization (UORO), and real-time recurrent learning (RTRL). We compare these training alternatives along with the Google models (GOOG and E2E) on 6 benchmark datasets. Surprisingly, we found that the model trained with SAB performs better (outperforming even BPTT), resulting in faster convergence and a better peak signal-to-noise ratio.

preprint2022arXiv

Bayesian and Frequentist Semantics for Common Variations of Differential Privacy: Applications to the 2020 Census

The purpose of this paper is to guide interpretation of the semantic privacy guarantees for some of the major variations of differential privacy, which include pure, approximate, Rényi, zero-concentrated, and $f$ differential privacy. We interpret privacy-loss accounting parameters, frequentist semantics, and Bayesian semantics (including new results). The driving application is the interpretation of the confidentiality protections for the 2020 Census Public Law 94-171 Redistricting Data Summary File released August 12, 2021, which, for the first time, were produced with formal privacy guarantees.

preprint2022arXiv

Exact Privacy Analysis of the Gaussian Sparse Histogram Mechanism

Sparse histogram methods can be useful for returning differentially private counts of items in large or infinite histograms, large group-by queries, and more generally, releasing a set of statistics with sufficient item counts. We consider the Gaussian version of the sparse histogram mechanism and study the exact $ε,δ$ differential privacy guarantees satisfied by this mechanism. We compare these exact $ε,δ$ parameters to the simpler overestimates used in prior work to quantify the impact of their looser privacy bounds.

preprint2022arXiv

Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting

In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike networks of today, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive processing, our model adapts synapses in a biologically-plausible fashion while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (SplitMNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms in real neuronal systems, e.g., local learning, lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.

preprint2022arXiv

Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder

Recent advances in deep learning have led to superhuman performance across a variety of applications. Recently, these methods have been successfully employed to improve the rate-distortion performance in the task of image compression. However, current methods either use additional post-processing blocks on the decoder end to improve compression or propose an end-to-end compression scheme based on heuristics. For the majority of these, the trained deep neural networks (DNNs) are not compatible with standard encoders and would be difficult to deply on personal computers and cellphones. In light of this, we propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends, an approach we call Neural JPEG. We propose frequency domain pre-editing and post-editing methods to optimize the distribution of the DCT coefficients at both encoder and decoder ends in order to improve the standard compression (JPEG) method. Moreover, we design and integrate a scheme for jointly learning quantization tables within this hybrid neural compression framework.Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics, such as PSNR and MS-SSIM, and generates visually appealing images with better color retention quality.

preprint2022arXiv

Reconstruction Attacks on Aggressive Relaxations of Differential Privacy

Differential privacy is a widely accepted formal privacy definition that allows aggregate information about a dataset to be released while controlling privacy leakage for individuals whose records appear in the data. Due to the unavoidable tension between privacy and utility, there have been many works trying to relax the requirements of differential privacy to achieve greater utility. One class of relaxation, which is starting to gain support outside the privacy community is embodied by the definitions of individual differential privacy (IDP) and bootstrap differential privacy (BDP). The original version of differential privacy defines a set of neighboring database pairs and achieves its privacy guarantees by requiring that each pair of neighbors should be nearly indistinguishable to an attacker. The privacy definitions we study, however, aggressively reduce the set of neighboring pairs that are protected. Both IDP and BDP define a measure of "privacy loss" that satisfies formal privacy properties such as postprocessing invariance and composition, and achieve dramatically better utility than the traditional variants of differential privacy. However, there is a significant downside - we show that they allow a significant portion of the dataset to be reconstructed using algorithms that have arbitrarily low privacy loss under their privacy accounting rules. We demonstrate these attacks using the preferred mechanisms of these privacy definitions. In particular, we design a set of queries that, when protected by these mechanisms with high noise settings (i.e., with claims of very low privacy loss), yield more precise information about the dataset than if they were not protected at all.

preprint2022arXiv

The 2020 Census Disclosure Avoidance System TopDown Algorithm

The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using zero-Concentrated Differential Privacy. Another key aspect of the TDA are invariants, statistics that the Census Bureau has determined, as matter of policy, to exclude from the privacy-loss accounting. The TDA post-processes the measurements together with the invariants to produce a Microdata Detail File (MDF) that contains one record for each person and one record for each housing unit enumerated in the 2020 Census. The MDF is passed to the 2020 Census tabulation system to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File. This paper describes the mathematics and testing of the TDA for this purpose.

preprint2022arXiv

The Neural Coding Framework for Learning Generative Models

Neural generative models can be used to learn complex probability distributions from data, to sample from them, and to produce probability density estimates. We propose a computational framework for developing neural generative models inspired by the theory of predictive processing in the brain. According to predictive processing theory, the neurons in the brain form a hierarchy in which neurons in one level form expectations about sensory inputs from another level. These neurons update their local models based on differences between their expectations and the observed signals. In a similar way, artificial neurons in our generative models predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality. In this work, we show that the neural generative models learned within our framework perform well in practice across several benchmark datasets and metrics and either remain competitive with or significantly outperform other generative models with similar functionality (such as the variational auto-encoder).

preprint2020arXiv

CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples

We propose CheckDP, the first automated and integrated approach for proving or disproving claims that a mechanism is differentially private. CheckDP can find counterexamples for mechanisms with subtle bugs for which prior counterexample generators have failed. Furthermore, it was able to \emph{automatically} generate proofs for correct mechanisms for which no formal verification was reported before. CheckDP is built on static program analysis, allowing it to be more efficient and more precise in catching infrequent events than existing counterexample generators (which run mechanisms hundreds of thousands of times to estimate their output distribution). Moreover, its sound approach also allows automatic verification of correct mechanisms. When evaluated on standard benchmarks and newer privacy mechanisms, CheckDP generates proofs (for correct mechanisms) and counterexamples (for incorrect mechanisms) within 70 seconds without any false positives or false negatives.

preprint2020arXiv

Guidelines for Implementing and Auditing Differentially Private Systems

Differential privacy is an information theoretic constraint on algorithms and code. It provides quantification of privacy leakage and formal privacy guarantees that are currently considered the gold standard in privacy protections. In this paper we provide an initial set of "best practices" for developing differentially private platforms, techniques for unit testing that are specific to differential privacy, guidelines for checking if differential privacy is being applied correctly in an application, and recommendations for parameter settings. The genesis of this paper was an initiative by Facebook and Social Science One to provide social science researchers with programmatic access to a URL-shares dataset. In order to maximize the utility of the data for research while protecting privacy, researchers should access the data through an interactive platform that supports differential privacy. The intention of this paper is to provide guidelines and recommendations that can generally be re-used in a wide variety of systems. For this reason, no specific platforms will be named, except for systems whose details and theory appear in academic papers.

preprint2020arXiv

Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. Furthermore, it requires researchers to continually develop various tricks, such as specialized weight initializations and activation functions, in order to ensure a stable parameter optimization. Our goal is to seek an effective, neuro-biologically-plausible alternative to backprop that can be used to train deep networks. In this paper, we propose a gradient-free learning procedure, recursive local representation alignment, for training large-scale neural architectures. Experiments with residual networks on CIFAR-10 and the large benchmark, ImageNet, show that our algorithm generalizes as well as backprop while converging sooner due to weight updates that are parallelizable and computationally less demanding. This is empirical evidence that a backprop-free algorithm can scale up to larger datasets.

preprint2020arXiv

Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack

Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems. For example, RNNs struggle in recognizing complex context free languages (CFLs), never reaching 100% accuracy on training. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on CFLs nor tested on sequences longer than those seen in training. The few efforts that have studied them have shown that continuous differentiable memory structures yield poor generalization for complex CFLs, making the RNN less interpretable. In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms that ensure that the model learns to properly balance the use of its latent states with external memory. Our improved RNN models exhibit better generalization performance and are able to classify long strings generated by complex hierarchical context free grammars (CFGs). We evaluate our models on CGGs, including the Dyck languages, as well as on the Penn Treebank language modelling task, and achieve stable, robust performance across these benchmarks. Furthermore, we show that only our memory-augmented networks are capable of retaining memory for a longer duration up to strings of length 160.

preprint2020arXiv

Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

Recent work on Renyi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks. Despite their promise, however, differentially private deep networks often lag far behind their non-private counterparts in accuracy, showing the need for more research in model architectures, optimizers, etc. One of the barriers to this expanded research is the training time -- often orders of magnitude larger than training non-private networks. The reason for this slowdown is a crucial privacy-related step called "per-example gradient clipping" whose naive implementation undoes the benefits of batch training with GPUs. By analyzing the back-propagation equations we derive new methods for per-example gradient clipping that are compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and provide better GPU utilization. Our implementation in PyTorch showed significant training speed-ups (by factors of 54x - 94x for training various models with batch sizes of 128). These techniques work for a variety of architectural choices including convolutional layers, recurrent networks, attention, residual blocks, etc.