Source author record

Ken R. Duffy

Ken R. Duffy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory; math.IT; Cryptography and Security; math.PR; Cell Behavior; Machine Learning; Networking and Internet Architecture; Computer Vision; math.ST; Populations and Evolution; Quantitative Methods; Statistics Theory

Catalog footprint

What is connected

20 works

12 topics

4 close collaborators


preprint, 2022, arXiv

AES as Error Correction: Cryptosystems for Reliable Communication

In this paper, we show that the Advanced Encryption Standard (AES) cryptosystem can be used as an error-correcting code to obtain reliability over noisy communication and data systems. Moreover, we characterize a family of computational cryptosystems that can potentially be used as well-performing error-correcting codes. In particular, we show that simple padding followed by a cryptosystem with uniform or pseudo-uniform outputs can approach the error-correcting performance of random codes. We empirically contrast the performance of the proposed approach using AES as error correction with that of Random Linear Codes and CA-Polar codes and show that in practical scenarios, they achieve almost the same performance. Finally, we present a modified counter mode of operation, named input plaintext counter mode, in order to utilize AES for multiple blocks while retaining its error-correcting capabilities.

preprint, 2022, arXiv

Block turbo decoding with ORBGRAND

Guessing Random Additive Noise Decoding (GRAND) is a family of universal decoding algorithms suitable for decoding any moderate redundancy code of any length. We establish that, through the use of list decoding, soft-input variants of GRAND can replace the Chase algorithm as the component decoder in the turbo decoding of product codes. In addition to being able to decode arbitrary product codes, rather than just those with dedicated hard-input component code decoders, results show that ORBGRAND achieves a coding gain of up to 0.7 dB over the Chase algorithm with the same list size.

preprint, 2022, arXiv

Partial Encryption after Encoding for Security and Reliability in Data Systems

We consider the problem of secure and reliable communication over a noisy multipath network. Previous work considering a noiseless version of our problem proposed a hybrid universal network coding cryptosystem (HUNCC). By combining an information-theoretically secure encoder together with partial encryption, HUNCC is able to obtain security guarantees, even in the presence of an all-observing eavesdropper. In this paper, we propose a version of HUNCC for noisy channels (N-HUNCC). This modification requires four main novelties. First, we present a network coding construction which is jointly, individually secure and error-correcting. Second, we introduce a new security definition which is a computational analogue of individual security, which we call individual indistinguishability under chosen ciphertext attack (individual IND-CCA1), and show that N-HUNCC satisfies it. Third, we present a noise-based decoder for N-HUNCC, which permits the decoding of the encoded-then-encrypted data. Finally, we discuss how to select parameters for N-HUNCC and its error-correcting capabilities.

preprint, 2022, arXiv

Syfer: Neural Obfuscation for Private Data Release

Balancing privacy and predictive utility remains a central challenge for machine learning in healthcare. In this paper, we develop Syfer, a neural obfuscation method to protect against re-identification attacks. Syfer composes trained layers with random neural networks to encode the original data (e.g. X-rays) while maintaining the ability to predict diagnoses from the encoded data. The randomness in the encoder acts as the private key for the data owner. We quantify privacy as the number of attacker guesses required to re-identify a single image (guesswork). We propose a contrastive learning algorithm to estimate guesswork. We show empirically that differentially private methods, such as DP-Image, obtain privacy at a significant loss of utility. In contrast, Syfer achieves strong privacy while preserving utility. For example, X-ray classifiers built with DP-Image, Syfer, and original data achieve average AUCs of 0.53, 0.78, and 0.86, respectively.

preprint, 2020, arXiv

Discrete convolution statistic for hypothesis testing

The question of testing for equality in distribution between two linear models, each consisting of sums of distinct discrete independent random variables with unequal numbers of observations, has emerged from biological research. In this case, the computation of classical $χ^2$ statistics, which would not include all observations, results in a loss of power, especially when sample sizes are small. Here, as an alternative that uses all data, the nonparametric maximum likelihood estimator for the distribution of the sum of discrete and independent random variables, which we call the convolution statistic, is proposed and its limiting normal covariance matrix determined. To challenge null hypotheses about the distribution of this sum, the generalized Wald's method is applied to define a testing statistic whose distribution is asymptotic to a $χ^2$ with as many degrees of freedom as the rank of this covariance matrix. Rank analysis also reveals a connection with the roots of the probability generating functions associated with the addend variables of the linear models. A simulation study is performed to compare the convolution test with Pearson's $χ^2$, and to provide usage guidelines.

preprint, 2020, arXiv

Inferring differentiation order in adaptive immune responses from population level data

A hallmark of the adaptive immune response is the proliferation of pathogen-specific lymphocytes that leave in their wake a long-lived population of cells that provide lasting immunity. A subject of ongoing investigation is when during an adaptive immune response those memory cells are produced. In two ground-breaking studies, Buchholz et al. (Science, 2013) and Gerlach et al. (Science, 2013) employed experimental methods that allowed identification of offspring from individual lymphocytes in vivo, which we call clonal data, at a single time point. Through the development, application and fitting of a mathematical model, Buchholz et al. (Science, 2013) concluded that, if memory is produced during the expansion phase, memory cell precursors are made before the effector cells that clear the original pathogen. We sought to determine the general validity and power of the modeling approach introduced in Buchholz et al. (Science, 2013) for quickly evaluating differentiation networks by adapting it to make it suitable for drawing inferences from more readily available non-clonal phenotypic proportion time-courses. We first established that the method drew consistent deductions when fit to the non-clonal data in Buchholz et al. (Science, 2013) itself. We fit a variant of the model to data reported in Badovinac et al. (J. Immun., 2007), Schlub et al. (Immun. & Cell Bio., 2010), and Kinjo et al. (Nature Commun., 2015) with necessary simplifications to match different reported data in these papers. The deduction from the model was consistent with that in Buchholz et al. (Science, 2013), albeit with questionable parameterizations. An alternative possibility, supported by the data in Kinjo et al. (Nature Commun., 2015), is that memory precursors are created after the expansion phase, which is a deduction not possible from the mathematical methods provided in Buchholz et al. (Science, 2013).

preprint, 2020, arXiv

Noise Recycling

We introduce Noise Recycling, a method that substantially enhances decoding performance of orthogonal channels subject to correlated noise without the need for joint encoding or decoding. The method can be used with any combination of codes, code-rates and decoding techniques. In the approach, a continuous realization of noise is estimated from a lead channel by subtracting its decoded output from its received signal. The estimate is recycled to improve the effective Signal to Noise Ratio (SNR) of an orthogonal channel that is experiencing correlated noise and so improve the accuracy of its decoding. In this design, channels aid each other only through the provision of noise estimates post-decoding. For a system with arbitrary noise correlation between orthogonal channels experiencing potentially distinct conditions, we introduce an algorithm that determines a static decoding order that maximizes total effective SNR. We prove that this solution results in higher effective SNR than independent decoding, which in turn leads to a larger rate region. We derive upper and lower bounds on the capacity of any sequential decoding of orthogonal channels with correlated noise where the encoders are independent and show that those bounds are almost tight. We numerically compare the upper bound with the capacity of a jointly Gaussian noise channel with joint encoding and decoding, showing that they match. Simulation results illustrate that Noise Recycling can be employed with any combination of codes and decoders, and that it gives significant Block Error Rate (BLER) benefits when applying the static predetermined order used to enhance the rate region. We further establish that an additional BLER improvement is possible through Dynamic Noise Recycling, where the lead channel is not pre-determined but is chosen on-the-fly based on which decoder provides the most confident decoding.
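The noise-estimate step described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes uncoded BPSK symbols (+1 or -1), zero-mean jointly Gaussian noise with known correlation `rho`, and a linear predictor of the second channel's noise from the lead channel's estimate. All function names and numbers are hypothetical.

```python
def hard_decision(value):
    """Decode an uncoded BPSK symbol (assumption: symbols are +1.0 or -1.0)."""
    return 1.0 if value >= 0 else -1.0


def estimate_noise(received, decoded):
    """Noise estimate from the lead channel: received signal minus decoded output."""
    return received - decoded


def recycle(received, lead_noise_estimate, rho, sigma_lead, sigma_other):
    """Subtract the predicted correlated noise from another channel's signal.

    Assumption: for zero-mean jointly Gaussian noise, the best linear
    prediction of the other channel's noise is rho * (sigma_other / sigma_lead)
    times the lead channel's noise.
    """
    predicted = rho * (sigma_other / sigma_lead) * lead_noise_estimate
    return received - predicted


def residual_variance(sigma_other, rho):
    """Noise variance left after recycling, if the lead decoding was correct."""
    return sigma_other ** 2 * (1.0 - rho ** 2)


# Hypothetical numbers: the lead channel sent +1 and saw noise 0.5;
# the other channel sent -1 and saw noise 1.1, so it received 0.1.
lead_received = 1.5
lead_noise = estimate_noise(lead_received, hard_decision(lead_received))
other_received = 0.1

independent = hard_decision(other_received)
recycled = hard_decision(
    recycle(other_received, lead_noise, rho=0.9, sigma_lead=1.0, sigma_other=1.0)
)
print(independent, recycled, residual_variance(1.0, 0.9))
```

In this example independent decoding of the second channel picks the wrong symbol, while the recycled signal decodes to -1. If the lead channel is decoded incorrectly, its noise estimate is wrong too, which is one reason the choice of lead channel matters.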

preprint, 2020, arXiv

Privacy with Estimation Guarantees

We study the central problem in data privacy: how to share data with an analyst while providing both privacy and utility guarantees to the user that owns the data. In this setting, we present an estimation-theoretic analysis of the privacy-utility trade-off (PUT). Here, an analyst is allowed to reconstruct (in a mean-squared error sense) certain functions of the data (utility), while other private functions should not be reconstructed with distortion below a certain threshold (privacy). We demonstrate how chi-square information captures the fundamental PUT in this case and provide bounds for the best PUT. We propose a convex program to compute privacy-assuring mappings when the functions to be disclosed and hidden are known a priori and the data distribution is known. We derive lower bounds on the minimum mean-squared error of estimating a target function from the disclosed data and evaluate the robustness of our approach when an empirical distribution is used to compute the privacy-assuring mappings instead of the true data distribution. We illustrate the proposed approach through two numerical experiments.

preprint, 2020, arXiv

Soft Maximum Likelihood Decoding using GRAND

Maximum Likelihood (ML) decoding of forward error correction codes is known to be optimally accurate, but is not used in practice as it proves too challenging to efficiently implement. Here we introduce an ML decoder called SGRAND, which is a development of a previously described hard detection ML decoder called GRAND, that fully avails of soft detection information and is suitable for use with any arbitrary high-rate, short-length block code. We assess SGRAND's performance on CRC-aided Polar (CA-Polar) codes, which will be used for all control channel communication in 5G NR, comparing its accuracy with CRC-Aided Successive Cancellation List decoding (CA-SCL), a state-of-the-art soft-information decoder specific to CA-Polar codes.
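For orientation, the hard-detection GRAND idea that SGRAND builds on can be sketched as follows. This is a minimal illustration, not SGRAND and not the authors' code. It assumes a binary symmetric channel with bit-flip probability below 1/2, so that noise patterns of lower Hamming weight are more likely, and it uses the (7,4) Hamming code purely as a hypothetical example codebook. The tie order within a weight class is arbitrary.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code (example codebook only).
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def is_codeword(word):
    """A word is in the codebook when every parity check sums to 0 mod 2."""
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def noise_patterns(n, max_weight):
    """Yield binary noise patterns in order of increasing Hamming weight."""
    for weight in range(max_weight + 1):
        for positions in combinations(range(n), weight):
            pattern = [0] * n
            for pos in positions:
                pattern[pos] = 1
            yield pattern


def grand_decode(received, max_weight=3):
    """Guess noise patterns, most likely first, until one yields a codeword.

    Returns (codeword, noise_pattern, number_of_guesses), or
    (None, None, None) if guessing is abandoned after max_weight.
    """
    patterns = noise_patterns(len(received), max_weight)
    for guesses, pattern in enumerate(patterns, start=1):
        candidate = [r ^ e for r, e in zip(received, pattern)]
        if is_codeword(candidate):
            return candidate, pattern, guesses
    return None, None, None


# Codeword 1110000 with a single bit flip at index 4.
decoded, noise, guesses = grand_decode([1, 1, 1, 0, 1, 0, 0])
print(decoded, noise, guesses)
```

The decoder only needs a codebook membership test, which is why the approach applies to any code. A soft-detection variant would reorder the guesses using per-bit reliability information.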

preprint, 2016, arXiv

Site-specific recombinatorics: in situ cellular barcoding with the Cre Lox system

Cellular barcoding is a significant, recently developed, biotechnology tool that enables the familial identification of progeny of individual cells in vivo. Most existing approaches rely on ex vivo viral transduction of cells with barcodes, followed by adoptive transfer into an animal, which works well for some systems, but precludes barcoding cells in their native environment, such as those inside solid tissues. With a view to overcoming this limitation, we propose a new design for a genetic barcoding construct based on the Cre Lox system that induces randomly created stable barcodes in cells in situ by exploiting inherent sequence distance constraints during site-specific recombination. Leveraging this previously unused feature, we identify the cassette with maximal code diversity. This proves to be orders of magnitude higher than what is attainable with previously considered Cre Lox barcoding approaches and is well suited for its intended applications as it exceeds the number of lymphocytes or hematopoietic progenitor cells in mice. Moreover, it can be built using established technology.

preprint, 2015, arXiv

Hiding Symbols and Functions: New Metrics and Constructions for Information-Theoretic Security

We present information-theoretic definitions and results for analyzing symmetric-key encryption schemes beyond the perfect secrecy regime, i.e. when perfect secrecy is not attained. We adopt two lines of analysis, one based on lossless source coding, and another akin to rate-distortion theory. We start by presenting a new information-theoretic metric for security, called symbol secrecy, and derive associated fundamental bounds. We then introduce list-source codes (LSCs), which are a general framework for mapping a key length (entropy) to a list size that an eavesdropper has to resolve in order to recover a secret message. We provide explicit constructions of LSCs, and demonstrate that, when the source is uniformly distributed, the highest level of symbol secrecy for a fixed key length can be achieved through a construction based on minimum-distance separable (MDS) codes. Using an analysis related to rate-distortion theory, we then show how symbol secrecy can be used to determine the probability that an eavesdropper correctly reconstructs functions of the original plaintext. We illustrate how these bounds can be applied to characterize security properties of symmetric-key encryption schemes, and, in particular, extend security claims based on symbol secrecy to a functional setting.

preprint, 2013, arXiv

Bounds on inference

Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano's inequality in particular, play a central role in information theory. In this paper, we present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y. Furthermore, we discuss an information measure based on the sum of the largest principal inertias, called k-correlation, which generalizes maximal correlation. We show that k-correlation satisfies the Data Processing Inequality and is convex in the conditional distribution of Y given X. Finally, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on rate-distortion theory.

preprint, 2013, arXiv

Brute force searching, the typical set and Guesswork

Consider the situation where a word is chosen probabilistically from a finite list. If an attacker knows the list and can inquire about each word in turn, then selecting the word via the uniform distribution maximizes the attacker's difficulty, its Guesswork, in identifying the chosen word. It is tempting to use this property in cryptanalysis of computationally secure ciphers by assuming coded words are drawn from a source's typical set and so, for all intents and purposes, uniformly distributed within it. By applying recent results on Guesswork, for i.i.d. sources it is this equipartition ansatz that we investigate here. In particular, we demonstrate that the expected Guesswork for a source conditioned to create words in the typical set grows, with word length, at a lower exponential rate than that of the uniform approximation, suggesting use of the approximation is ill-advised.
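The claim that a uniform choice maximizes the attacker's expected Guesswork can be checked on a small example. The sketch below assumes the attacker guesses words in decreasing order of probability. The distributions are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
import math


def expected_guesswork(probs):
    """Expected number of guesses when guessing most likely words first."""
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


def shannon_entropy_bits(probs):
    """Shannon entropy of the word distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, expected_guesswork(dist), shannon_entropy_bits(dist))
```

On four words the uniform distribution needs 2.5 guesses on average, while the skewed one needs about 1.6, so any departure from uniformity helps the attacker in this example.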

preprint, 2013, arXiv

Guessing a password over a wireless channel (on the effect of noise non-uniformity)

A string is sent over a noisy channel that erases some of its characters. Knowing the statistical properties of the string's source and which characters were erased, a listener that is equipped with an ability to test the veracity of a string, one string at a time, wishes to fill in the missing pieces. Here we characterize the influence of the stochastic properties of both the string's source and the noise on the channel on the distribution of the number of attempts required to identify the string, its guesswork. In particular, we establish that the average noise on the channel is not a determining factor for the average guesswork and illustrate simple settings where one recipient with, on average, a better channel than another recipient, has higher average guesswork. These results stand in contrast to those for the capacity of wiretap channels and suggest the use of techniques such as friendly jamming with pseudo-random sequences to exploit this guesswork behavior.

preprint, 2012, arXiv

Guesswork, large deviations and Shannon entropy

How hard is it to guess a password? Massey showed that the Shannon entropy of the distribution from which the password is selected is a lower bound on the expected number of guesses, but one which is not tight in general. In a series of subsequent papers under ever less restrictive stochastic assumptions, an asymptotic relationship as password length grows between scaled moments of the guesswork and specific Rényi entropy was identified. Here we show that, when appropriately scaled, as the password length grows the logarithm of the guesswork satisfies a Large Deviation Principle (LDP), providing direct estimates of the guesswork distribution when passwords are long. The rate function governing the LDP possesses a specific, restrictive form that encapsulates underlying structure in the nature of guesswork. Returning to Massey's original observation, a corollary to the LDP shows that the expectation of the logarithm of the guesswork is the specific Shannon entropy of the password selection process.

preprint, 2012, arXiv

Tail asymptotics for busy periods

The busy period for a queue is cast as the area swept under the random walk until it first returns to zero, $B$. Encompassing non-i.i.d. increments, the large-deviations asymptotics of $B$ is addressed, under the assumption that the increments satisfy standard conditions, including a negative drift. The main conclusions provide insight on the probability of a large busy period, and the manner in which this occurs: I) The scaled probability of a large busy period has the asymptote, for any $b>0$, $\lim_{n\to\infty} \frac{1}{\sqrt{n}} \log P(B\geq bn) = -K\sqrt{b}$, where $K = 2 \sqrt{-\int_0^{λ^*} Λ(θ)\, dθ}$, with $λ^*=\sup\{θ:Λ(θ)\leq 0\}$, and with $Λ$ denoting the scaled cumulant generating function of the increments process. II) The most likely path to a large swept area is found to be a simple rescaling of the path on $[0,1]$ given by $ψ^*(t) = -Λ(λ^*(1-t))/λ^*$. In contrast to the piecewise linear most likely path leading the random walk to hit a high level, this is strictly concave in general. While these two most likely paths have very different forms, their derivatives coincide at the start of their trajectories, and at their first return to zero. These results partially answer an open problem of Kulick and Palmowski regarding the tail of the work done during a busy period at a single server queue. The paper concludes with applications of these results to the estimation of the busy period statistics $(λ^*, K)$ based on observations of the increments, offering the possibility of estimating the likelihood of a large busy period in advance of observing one.
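As a worked illustration of the two formulas in this abstract, consider the special case of i.i.d. Gaussian increments with mean $-\mu<0$ and variance $\sigma^2$. This case is an assumption chosen here for tractability and is not taken from the paper. The cumulant generating function is then quadratic and every quantity has a closed form:

```latex
\begin{align*}
\Lambda(\theta) &= -\mu\theta + \tfrac{1}{2}\sigma^2\theta^2,
  \qquad \lambda^* = \sup\{\theta : \Lambda(\theta)\le 0\} = \frac{2\mu}{\sigma^2},\\
\int_0^{\lambda^*} \Lambda(\theta)\,d\theta
  &= -\frac{\mu(\lambda^*)^2}{2} + \frac{\sigma^2(\lambda^*)^3}{6}
   = -\frac{2\mu^3}{\sigma^4} + \frac{4\mu^3}{3\sigma^4}
   = -\frac{2\mu^3}{3\sigma^4},\\
K &= 2\sqrt{\frac{2\mu^3}{3\sigma^4}}
   = \frac{2}{\sigma^2}\sqrt{\frac{2\mu^3}{3}},\\
\psi^*(t) &= -\frac{\Lambda(\lambda^*(1-t))}{\lambda^*} = \mu\,t\,(1-t).
\end{align*}
```

In this case the most likely path is a parabola that vanishes at $t=0$ and $t=1$, which agrees with the strict concavity noted in the abstract.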

preprint, 2011, arXiv

Decentralised Learning MACs for Collision-free Access in WLANs

By combining the features of CSMA and TDMA, fully decentralised WLAN MAC schemes have recently been proposed that converge to collision-free schedules. In this paper we describe a MAC with optimal long-run throughput that is almost decentralised. We then design two schemes that are practically realisable, decentralised approximations of this optimal scheme and operate with different amounts of sensing information. We achieve this by (1) introducing learning algorithms that can substantially speed up convergence to collision-free operation; (2) developing a decentralised schedule length adaptation scheme that provides long-run fair (uniform) access to the medium while maintaining collision-free access for arbitrary numbers of stations.

preprint, 2011, arXiv

Log-Convexity of Rate Region in 802.11e WLANs

In this paper we establish the log-convexity of the rate region in 802.11 WLANs. This generalises previous results for Aloha networks and has immediate implications for optimisation based approaches to the analysis and design of 802.11 wireless networks.

preprint, 2010, arXiv

Most likely paths to error when estimating the mean of a reflected random walk

It is known that simulation of the mean position of a Reflected Random Walk (RRW) $\{W_n\}$ exhibits non-standard behavior, even for light-tailed increment distributions with negative drift. The Large Deviation Principle (LDP) holds for deviations below the mean, but for deviations at the usual speed above the mean the rate function is null. This paper takes a deeper look at this phenomenon. Conditional on a large sample mean, a complete sample path LDP analysis is obtained. Let $I$ denote the rate function for the one dimensional increment process. If $I$ is coercive, then given a large simulated mean position, under general conditions our results imply that the most likely asymptotic behavior, $ψ^*$, of the paths $n^{-1} W_{\lfloor tn\rfloor}$ is to be zero apart from on an interval $[T_0,T_1]\subset[0,1]$ and to satisfy the functional equation $\nabla I\left(\frac{d}{dt}ψ^*(t)\right)=λ^*(T_1-t)$ whenever $ψ^*(t)\neq 0$. If $I$ is non-coercive, a similar, but slightly more involved, result holds. These results prove, in broad generality, that Monte Carlo estimates of the steady-state mean position of a RRW have a high likelihood of over-estimation. This has serious implications for the performance evaluation of queueing systems by simulation techniques where steady state expected queue-length and waiting time are key performance metrics. The results show that naïve estimates of these quantities from simulation are highly likely to be conservative.

preprint, 2009, arXiv

Estimating Loynes' exponent

Loynes' distribution, which characterizes the one dimensional marginal of the stationary solution to Lindley's recursion, possesses an ultimately exponential tail for a large class of increment processes. If one can observe increments but does not know their probabilistic properties, what are the statistical limits of estimating the tail exponent of Loynes' distribution? We conjecture that in broad generality a consistent sequence of non-parametric estimators can be constructed that satisfies a large deviation principle. We present rigorous support for this conjecture under restrictive assumptions and simulation evidence indicating why we believe it to be true in greater generality.
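To make the objects in this abstract concrete, the sketch below iterates Lindley's recursion, $W_{n+1} = \max(W_n + X_n, 0)$, and computes a plug-in estimate of the tail exponent. The estimator is an assumption for illustration and is not confirmed to be the paper's construction. It relies on the classical fact that, for i.i.d. light-tailed increments with negative mean, the exponent is the positive root of the cumulant generating function, and it substitutes the empirical cumulant generating function. The function names and sample data are hypothetical.

```python
import math


def lindley(increments, start=0.0):
    """Iterate Lindley's recursion W_{n+1} = max(W_n + X_n, 0)."""
    w = start
    path = []
    for x in increments:
        w = max(w + x, 0.0)
        path.append(w)
    return path


def empirical_cgf(theta, increments):
    """Log of the sample mean of exp(theta * X)."""
    total = sum(math.exp(theta * x) for x in increments)
    return math.log(total / len(increments))


def estimate_exponent(increments, tol=1e-10):
    """Positive root of the empirical CGF, found by bisection.

    Assumes i.i.d. increments with negative sample mean and at least one
    positive observation, so that a unique positive root exists.
    """
    if sum(increments) >= 0 or max(increments) <= 0:
        raise ValueError("need negative sample mean and a positive increment")
    hi = 1.0
    while empirical_cgf(hi, increments) <= 0:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if empirical_cgf(mid, increments) <= 0:
            lo = mid
        else:
            hi = mid
    return lo


print(lindley([1, -2, 3, -1]))
# Increments of -1 three times in four and +1 otherwise: the root solves
# (3 * exp(-theta) + exp(theta)) / 4 = 1, which gives theta = log(3).
print(estimate_exponent([-1, -1, -1, 1]), math.log(3))
```

With few observations the empirical root can be far from the true exponent, which is the kind of statistical limit the abstract asks about.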
