Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
24works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

24 published item(s)

preprint2026arXiv

NOVA: Fundamental Limits of Knowledge Discovery Through AI

Can AI systems discover genuinely new knowledge through iterative self improvement, and if so, at what cost? We introduce the NOVA framework, which models the common ``generate, verify, accumulate, retrain'' loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good--Turing estimation is a local batch-diversity diagnostic, not an estimator of the historically undiscovered valid mass that governs long-term discovery. Under a separate tail-equivalence assumption relating the model's effective discovery distribution to a Zipf law with exponent $α>1$, we prove that the cumulative generation cost required to obtain $D$ distinct genuine discoveries satisfies $R_{\mathrm{cum}}(D)=Θ(c_{\mathrm{gen}}D^α)$, where $c_{\mathrm{gen}}$ is the per-candidate generation cost. This scaling law quantifies asymptotic diminishing returns as the discovery frontier advances. Finally, we formalize human amplification through guidance, generation, and verification, explaining why expert input is most valuable near autonomous exploration barriers.

preprint2026arXiv

Revisiting the Interface between Error and Erasure Correction in Wireless Standards

Modern 5G communication systems implement a combination of error correction and feedback-based erasure correction (HARQ/ARQ) as reliability mechanisms, which can introduce substantial delay and resource inefficiency. We propose forward erasure correction using network coding as a more delay-efficient alternative. We present a mathematical characterization of network delay for existing reliability mechanisms and network coding. Through simulations in a network slicing environment, we demonstrate that network coding not only improves the in-order delivery delay and goodput for the applications utilizing the slice, but also benefits other applications sharing the network by reducing resource utilization for the coded slice. Our analysis and characterization point towards ideas that require attention in the 6G standardization process. These findings highlight the need for greater modularity in protocol stack design that enables the integration of novel technologies in future wireless networks.

preprint2022arXiv

A Bivariate Invariance Principle

A notable result from analysis of Boolean functions is the Basic Invariance Principle (BIP), a quantitative nonlinear generalization of the Central Limit Theorem for multilinear polynomials. We present a generalization of the BIP for bivariate multilinear polynomials, i.e., polynomials over two n-length sequences of random variables. This bivariate invariance principle arises from an iterative application of the BIP to bound the error in replacing each of the two input sequences. In order to prove this invariance principle, we first derive a version of the BIP for random multilinear polynomials, i.e., polynomials whose coefficients are random variables. As a benchmark, we also state a naive bivariate invariance principle which treats the two input sequences as one and directly applies the BIP. Neither principle is universally stronger than the other, but we do show that for a notable class of bivariate functions, which we term separable functions, our subtler principle is exponentially tighter than the naive benchmark.

preprint2022arXiv

Absolute Security in High-Frequency Wireless Links

Security against eavesdropping is one of the key concerns in the design of any communication system. Many common considerations of the security of a wireless communication channel rely on comparing the signal level measured by Bob (the intended receiver) to that accessible to Eve (an eavesdropper). Frameworks such as Wyner's wiretap model ensure the security of a link, in an average sense, when Bob's signal-to-noise ratio exceeds Eve's. Unfortunately, because these guarantees rely on statistical assumptions about noise, Eve can still occasionally succeed in decoding information. The goal of achieving exactly zero probability of intercept over an engineered region of the broadcast sector, which we term absolute security, remains elusive. Here, we describe the first architecture for a wireless link which provides absolute security. Our approach relies on the inherent properties of broadband and high-gain antennas, and is therefore ideally suited for implementation in millimeter-wave and terahertz wireless systems, where such antennas will generally be employed. We exploit spatial minima of the antenna pattern at different frequencies, the union of which defines a wide region where Eve is guaranteed to fail regardless of her computational capabilities, and regardless of the noise in the channels. Unlike conventional zero-forcing beam forming methods, we show that, for realistic assumptions about the antenna configuration and power budget, this absolute security guarantee can be achieved over most possible eavesdropper locations. Since we use relatively simple frequency-multiplexed coding, together with the underlying physics of a diffracting aperture, this idea is broadly applicable in many contexts.

preprint2022arXiv

AES as Error Correction: Cryptosystems for Reliable Communication

In this paper, we show that the Advanced Encryption Standard (AES) cryptosystem can be used as an error-correcting code to obtain reliability over noisy communication and data systems. Moreover, we characterize a family of computational cryptosystems that can potentially be used as well performing error correcting codes. In particular, we show that simple padding followed by a cryptosystem with uniform or pseudo-uniform outputs can approach the error-correcting performance of random codes. We empirically contrast the performance of the proposed approach using AES as error correction with that of Random Linear Codes and CA-Polar codes and show that in practical scenarios, they achieve almost the same performance. Finally, we present a modified counter mode of operation, named input plaintext counter mode, in order to utilize AES for multiple blocks while retaining its error correcting capabilities.

preprint2022arXiv

Block turbo decoding with ORBGRAND

Guessing Random Additive Noise Decoding (GRAND) is a family of universal decoding algorithms suitable for decoding any moderate redundancy code of any length. We establish that, through the use of list decoding, soft-input variants of GRAND can replace the Chase algorithm as the component decoder in the turbo decoding of product codes. In addition to being able to decode arbitrary product codes, rather than just those with dedicated hard-input component code decoders, results show that ORBGRAND achieves a coding gain of up to 0.7dB over the Chase algorithm with same list size.

preprint2022arXiv

Broadcast Approach Meets Network Coding for Data Streaming

For data streaming applications, existing solutions are not yet able to close the gap between high data rates and low delay. This work considers the problem of data streaming under mixed delay constraints over a single communication channel with delayed feedback. We propose a novel layered adaptive causal random linear network coding (LAC-RLNC) approach with forward error correction. LAC-RLNC is a variable-to-variable coding scheme, i.e., variable recovered information data at the receiver over variable short block length and rate is proposed. Specifically, for data streaming with base and enhancement layers of content, we characterize a high dimensional throughput-delay trade-off managed by the adaptive causal layering coding scheme. The base layer is designed to satisfy the strict delay constraints, as it contains the data needed to allow the streaming service. Then, the sender can manage the throughput-delay trade-off of the second layer by adjusting the retransmission rate a priori and posterior as the enhancement layer, that contains the remaining data to augment the streaming service's quality, is with the relax delay constraints. We numerically show that the layered network coding approach can dramatically increase performance. We demonstrate that LAC-RLNC compared with the non-layered approach gains a factor of three in mean and maximum delay for the base layer, close to the lower bound, and factor two for the enhancement layer.

preprint2022arXiv

Distributed Computations with Layered Resolution

Modern computationally-heavy applications are often time-sensitive, demanding distributed strategies to accelerate them. On the other hand, distributed computing suffers from the bottleneck of slow workers in practice. Distributed coded computing is an attractive solution that adds redundancy such that a subset of distributed computations suffices to obtain the final result. However, the final result is still either obtained within a desired time or not, and for the latter, the resources that are spent are wasted. In this paper, we introduce the novel concept of layered-resolution distributed coded computations such that lower resolutions of the final result are obtained from collective results of the workers -- at an earlier stage than the final result. This innovation makes it possible to have more effective deadline-based systems, since even if a computational job is terminated because of timing, an approximated version of the final result can be released. Based on our theoretical and empirical results, the average execution delay for the first resolution is notably smaller than the one for the final resolution. Moreover, the probability of meeting a deadline is one for the first resolution in a setting where the final resolution exceeds the deadline almost all the time, reducing the success rate of the systems with no layering.

preprint2022arXiv

FlEC: Enhancing QUIC with application-tailored reliability mechanisms

Packet losses are common events in today's networks. They usually result in longer delivery times for application data since retransmissions are the de facto technique to recover from such losses. Retransmissions is a good strategy for many applications but it may lead to poor performance with latency-sensitive applications compared to network coding. Although different types of network coding techniques have been proposed to reduce the impact of losses by transmitting redundant information, they are not widely used. Some niche applications include their own variant of Forward Erasure Correction (FEC) techniques, but there is no generic protocol that enables many applications to easily use them. We close this gap by designing, implementing and evaluating a new Flexible Erasure Correction (FlEC) framework inside the newly standardized QUIC protocol. With FlEC, an application can easily select the reliability mechanism that meets its requirements, from pure retransmissions to various forms of FEC. We consider three different use cases: $(i)$ bulk data transfer, $(ii)$ file transfers with restricted buffers and $(iii)$ delay-constrained messages. We demonstrate that modern transport protocols such as QUIC may benefit from application knowledge by leveraging this knowledge in FlEC to provide better loss recovery and stream scheduling. Our evaluation over a wide range of scenarios shows that the FlEC framework outperforms the standard QUIC reliability mechanisms from a latency viewpoint.

preprint2022arXiv

Multi-Level Group Testing with Application to One-Shot Pooled COVID-19 Tests

A key requirement in containing contagious diseases, such as the Coronavirus disease 2019 (COVID-19) pandemic, is the ability to efficiently carry out mass diagnosis over large populations. Some of the leading testing procedures, such as those utilizing qualitative polymerase chain reaction, involve using dedicated machinery which can simultaneously process a limited amount of samples. A candidate method to increase the test throughput is to examine pooled samples comprised of a mixture of samples from different patients. In this work we study pooling based tests which operate in a one-shot fashion, while providing an indication not solely on the presence of infection, but also on its level, without additional pool tests, as often required in COVID-19 testing. As these requirements limit the application of traditional group-testing (GT) methods, we propose a multi-level GT scheme, which builds upon GT principles to enable accurate recovery using much fewer tests than patients, while operating in a one-shot manner and providing multi-level indications. We provide a theoretical analysis of the proposed scheme and characterize conditions under which the algorithm operates reliably and at affordable computational complexity. Our numerical results demonstrate that multi level GT accurately and efficiently detects infection levels, while achieving improved performance over previously proposed one-shot COVID-19 pooled-testing methods.

preprint2022arXiv

Partial Encryption after Encoding for Security and Reliability in Data Systems

We consider the problem of secure and reliable communication over a noisy multipath network. Previous work considering a noiseless version of our problem proposed a hybrid universal network coding cryptosystem (HUNCC). By combining an information-theoretically secure encoder together with partial encryption, HUNCC is able to obtain security guarantees, even in the presence of an all-observing eavesdropper. In this paper, we propose a version of HUNCC for noisy channels (N-HUNCC). This modification requires four main novelties. First, we present a network coding construction which is jointly, individually secure and error-correcting. Second, we introduce a new security definition which is a computational analogue of individual security, which we call individual indistinguishability under chosen ciphertext attack (individual IND-CCA1), and show that NHUNCC satisfies it. Third, we present a noise based decoder for N-HUNCC, which permits the decoding of the encoded-thenencrypted data. Finally, we discuss how to select parameters for N-HUNCC and its error-correcting capabilities.

preprint2022arXiv

Rainbow Differential Privacy

We extend a previous framework for designing differentially private (DP) mechanisms via randomized graph colorings that was restricted to binary functions, corresponding to colorings in a graph, to multi-valued functions. As before, datasets are nodes in the graph and any two neighboring datasets are connected by an edge. In our setting, we assume that each dataset has a preferential ordering for the possible outputs of the mechanism, each of which we refer to as a rainbow. Different rainbows partition the graph of datasets into different regions. We show that if the DP mechanism is pre-specified at the boundary of such regions and behaves identically for all same-rainbow boundary datasets, at most one optimal such mechanism can exist and the problem can be solved by means of a morphism to a line graph. We then show closed form expressions for the line graph in the case of ternary functions. Treatment of ternary queries in this paper displays enough richness to be extended to higher-dimensional query spaces with preferential query ordering, but the optimality proof does not seem to follow directly from the ternary proof.

preprint2022arXiv

Syfer: Neural Obfuscation for Private Data Release

Balancing privacy and predictive utility remains a central challenge for machine learning in healthcare. In this paper, we develop Syfer, a neural obfuscation method to protect against re-identification attacks. Syfer composes trained layers with random neural networks to encode the original data (e.g. X-rays) while maintaining the ability to predict diagnoses from the encoded data. The randomness in the encoder acts as the private key for the data owner. We quantify privacy as the number of attacker guesses required to re-identify a single image (guesswork). We propose a contrastive learning algorithm to estimate guesswork. We show empirically that differentially private methods, such as DP-Image, obtain privacy at a significant loss of utility. In contrast, Syfer achieves strong privacy while preserving utility. For example, X-ray classifiers built with DP-image, Syfer, and original data achieve average AUCs of 0.53, 0.78, and 0.86, respectively.

preprint2022arXiv

Ultra-Reliable Low-Latency Millimeter-Wave Communications with Sliding Window Network Coding

Ultra-reliability and low-latency are pivotal requirements of the new 6th generation of communication systems (xURLLC). Over the past years, to increase throughput, adaptive active antennas were introduced in advanced wireless communications, specifically in the domain of millimeter-wave (mmWave). Consequently, new lower-layer techniques were proposed to cope with practical challenges of high dimensional and electronically-steerable beams. The transition from omni-directional to highly directional antennas presents a new type of wireless systems that deliver high bandwidth, but that are susceptible to high losses and high latency variation. Classical approaches cannot close the rising gap between high throughput and low delay in those advanced systems. In this work, we incorporate effective sliding window network coding solutions in mmWave communications. While legacy systems such as rateless codes improve delay, cross-layer results show that they do not provide low latency communications (LLC - below 10 ms), due to the lossy behaviour of mmWave channel and the lower-layers' retransmission mechanisms. On the other hand, fixed sliding window random linear network coding (RLNC) is able to achieve LLC, and even better, adaptive sliding window RLNC obtains ultra-reliable LLC (Ultra-Reliable and Low-Latency Communications (URLLC) - LLC with maximum delay below 10 ms with more than 99% success rate).

preprint2021arXiv

Low Influence, Utility, and Independence in Differential Privacy: A Curious Case of $3 \choose 2$

We study the relationship between randomized low influence functions and differentially private mechanisms. Our main aim is to formally determine whether differentially private mechanisms are low influence and whether low influence randomized functions can be differentially private. We show that differential privacy does not necessarily imply low influence in a formal sense. However, low influence implies approximate differential privacy. These results hold for both independent and non-independent randomized mechanisms, where an important instance of the former is the widely-used additive noise techniques in the differential privacy literature. Our study also reveals the interesting dynamics between utility, low influence, and independence of a differentially private mechanism. As the name of this paper suggests, we show that any two such features are simultaneously possible. However, in order to have a differentially private mechanism that has both utility and low influence, even under a very mild utility condition, one has to employ non-independent mechanisms.

preprint2021arXiv

Robust Improvement of the Age of Information by Adaptive Packet Coding

We consider a wireless communication network with an adaptive scheme to select the number of packets to be admitted and encoded for each transmission, and characterize the information timeliness. For a network of erasure channels and discrete time, we provide closed form expressions for the Average and Peak Age of Information (AoI) as functions of admission control and adaptive coding parameters, the feedback delay, and the maximum feasible end-to-end rate that depends on channel conditions and network topology. These new results guide the system design for robust improvements of the AoI when transmitting time sensitive information in the presence of topology and channel changes. We illustrate the benefits of using adaptive packet coding to improve information timeliness by characterizing the network performance with respect to the AoI along with its relationship to throughput (rate of successfully decoded packets at the destination) and per-packet delay. We show that significant AoI performance gains can be obtained in comparison to the uncoded case, and that these gains are robust to network variations as channel conditions and network topology change.

preprint2021arXiv

Spatial Concentration of Caching in Wireless Heterogeneous Networks

We propose a decentralized caching policy for wireless heterogeneous networks that makes content placement decisions based on pairwise interactions between cache nodes. We call our proposed scheme γ-exclusion cache placement (GEC), where a parameter γ controls an exclusion radius that discourages nearby caches from storing redundant content. GEC takes into account item popularity and the nodes' caching priorities and leverages negative dependence to relax the classic 0-1 knapsack problem to yield spatially balanced sampling across caches. We show that GEC guarantees a better concentration (reduced variance) of the required cache storage size than the state of the art, and that the cache size constraints can be satisfied with high probability. Given a cache hit probability target, we compare the 95\% confidence intervals of the required cache sizes for three caching schemes: (i) independent placement, (ii) hard exclusion caching (HEC), and (iii) the proposed GEC approach. For uniform spatial traffic, we demonstrate that GEC provides approximately a 3x and 2x reduction in required cache size over (i) and (ii), respectively. For non-uniform spatial traffic based on realistic peak-hour variations in urban scenarios, the gains are even greater.

preprint2021arXiv

Stream Distributed Coded Computing

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the stragglers. To address this challenge, introducing efficient amount of redundant computations via distributed coded computation has received significant attention. Recent approaches in this area have mainly focused on introducing minimum computational redundancies to tolerate certain number of stragglers. To the best of our knowledge, the current literature lacks a unified end-to-end design in a heterogeneous setting where the workers can vary in their computation and communication capabilities. The contribution of this paper is to devise a novel framework for joint scheduling-coding, in a setting where the workers and the arrival of stream computational jobs are based on stochastic models. In our initial joint scheme, we propose a systematic framework that illustrates how to select a set of workers and how to split the computational load among the selected workers based on their differences in order to minimize the average in-order job execution delay. Through simulations, we demonstrate that the performance of our framework is dramatically better than the performance of naive method that splits the computational load uniformly among the workers, and it is close to the ideal performance.

preprint2020arXiv

Capacities and Optimal Input Distributions for Particle-Intensity Channels

This work introduces the particle-intensity channel (PIC) as a model for molecular communication systems and characterizes the capacity limits as well as properties of the optimal (capacity-achieving) input distributions for such channels. In the PIC, the transmitter encodes information, in symbols of a given duration, based on the probability of particle release, and the receiver detects and decodes the message based on the number of particles detected during the symbol interval. In this channel, the transmitter may be unable to control precisely the probability of particle release, and the receiver may not detect all the particles that arrive. We model this channel using a generalization of the binomial channel and show that the capacity-achieving input distribution for this channel always has mass points at probabilities of particle release of zero and one. To find the capacity-achieving input distributions, we develop an efficient algorithm we call dynamic assignment Blahut-Arimoto (DAB). For diffusive particle transport, we also derive the conditions under which the input with two mass points is capacity-achieving.

preprint2020arXiv

Centralized vs Decentralized Targeted Brute-Force Attacks: Guessing with Side-Information

According to recent empirical studies, a majority of users have the same, or very similar, passwords across multiple password-secured online services. This practice can have disastrous consequences, as one password being compromised puts all the other accounts at much higher risk. Generally, an adversary may use any side-information he/she possesses about the user, be it demographic information, password reuse on a previously compromised account, or any other relevant information to devise a better brute-force strategy (so called targeted attack). In this work, we consider a distributed brute-force attack scenario in which $m$ adversaries, each observing some side information, attempt breaching a password secured system. We compare two strategies: an uncoordinated attack in which the adversaries query the system based on their own side-information until they find the correct password, and a fully coordinated attack in which the adversaries pool their side-information and query the system together. For passwords $\mathbf{X}$ of length $n$, generated independently and identically from a distribution $P_X$, we establish an asymptotic closed-form expression for the uncoordinated and coordinated strategies when the side-information $\mathbf{Y}_{(m)}$ are generated independently from passing $\mathbf{X}$ through a memoryless channel $P_{Y|X}$, as the length of the password $n$ goes to infinity. We illustrate our results for binary symmetric channels and binary erasure channels, two families of side-information channels which model password reuse. We demonstrate that two coordinated agents perform asymptotically better than any finite number of uncoordinated agents for these channels, meaning that sharing side-information is very valuable in distributed attacks.

preprint2020arXiv

Noise Recycling

We introduce Noise Recycling, a method that substantially enhances decoding performance of orthogonal channels subject to correlated noise without the need for joint encoding or decoding. The method can be used with any combination of codes, code-rates and decoding techniques. In the approach, a continuous realization of noise is estimated from a lead channel by subtracting its decoded output from its received signal. The estimate is recycled to reduce the Signal to Noise Ratio (SNR) of an orthogonal channel that is experiencing correlated noise and so improve the accuracy of its decoding. In this design, channels only aid each other only through the provision of noise estimates post-decoding. For a system with arbitrary noise correlation between orthogonal channels experiencing potentially distinct conditions, we introduce an algorithm that determines a static decoding order that maximizes total effective SNR. We prove that this solution results in higher effective SNR than independent decoding, which in turn leads to a larger rate region. We derive upper and lower bounds on the capacity of any sequential decoding of orthogonal channels with correlated noise where the encoders are independent and show that those bounds are almost tight. We numerically compare the upper bound with the capacity of jointly Gaussian noise channel with joint encoding and decoding, showing that they match. Simulation results illustrate that Noise Recycling can be employed with any combination of codes and decoders, and that it gives significant Block Error Rate (BLER) benefits when applying the static predetermined order used to enhance the rate region. We further establish that an additional BLER improvement is possible through Dynamic Noise Recycling, where the lead channel is not pre-determined but is chosen on-the-fly based on which decoder provides the most confident decoding.

preprint2020arXiv

Privacy with Estimation Guarantees

We study the central problem in data privacy: how to share data with an analyst while providing both privacy and utility guarantees to the user that owns the data. In this setting, we present an estimation-theoretic analysis of the privacy-utility trade-off (PUT). Here, an analyst is allowed to reconstruct (in a mean-squared error sense) certain functions of the data (utility), while other private functions should not be reconstructed with distortion below a certain threshold (privacy). We demonstrate how chi-square information captures the fundamental PUT in this case and provide bounds for the best PUT. We propose a convex program to compute privacy-assuring mappings when the functions to be disclosed and hidden are known a priori and the data distribution is known. We derive lower bounds on the minimum mean-squared error of estimating a target function from the disclosed data and evaluate the robustness of our approach when an empirical distribution is used to compute the privacy-assuring mappings instead of the true data distribution. We illustrate the proposed approach through two numerical experiments.

preprint2020arXiv

Soft Maximum Likelihood Decoding using GRAND

Maximum Likelihood (ML) decoding of forward error correction codes is known to be optimally accurate, but is not used in practice as it proves too challenging to efficiently implement. Here we introduce a ML decoder called SGRAND, which is a development of a previously described hard detection ML decoder called GRAND, that fully avails of soft detection information and is suitable for use with any arbitrary high-rate, short-length block code. We assess SGRAND's performance on CRC-aided Polar (CA-Polar) codes, which will be used for all control channel communication in 5G NR, comparing its accuracy with CRC-Aided Successive Cancellation List decoding (CA-SCL), a state-of-the-art soft-information decoder specific to CA-Polar codes.

preprint2019arXiv

Babel Storage: Uncoordinated Content Delivery from Multiple Coded Storage Systems

In future content-centric networks, content is identified independently of its location. From an end-user's perspective, individual storage systems dissolve into a seemingly omnipresent structureless `storage fog'. Content should be delivered oblivious of the network topology, using multiple storage systems simultaneously, and at minimal coordination overhead. Prior works have addressed the advantages of error correction coding for distributed storage and content delivery separately. This work takes a comprehensive approach to highlighting the tradeoff between storage overhead and transmission overhead in uncoordinated content delivery from multiple coded storage systems. Our contribution is twofold. First, we characterize the tradeoff between storage and transmission overhead when all participating storage systems employ the same code. Second, we show that the resulting stark inefficiencies can be avoided when storage systems use diverse codes. What is more, such code diversity is not just technically desirable, but presumably will be the reality in the increasingly heterogeneous networks of the future. To this end, we show that a mix of Reed-Solomon, low-density parity-check and random linear network codes achieves close-to-optimal performance at minimal coordination and operational overhead.