Source author record

Ken Duffy

Ken Duffy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Theory math.IT Artificial Intelligence eess.SP physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

7works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

NOVA: Fundamental Limits of Knowledge Discovery Through AI

Can AI systems discover genuinely new knowledge through iterative self improvement, and if so, at what cost? We introduce the NOVA framework, which models the common ``generate, verify, accumulate, retrain'' loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good--Turing estimation is a local batch-diversity diagnostic, not an estimator of the historically undiscovered valid mass that governs long-term discovery. Under a separate tail-equivalence assumption relating the model's effective discovery distribution to a Zipf law with exponent $α>1$, we prove that the cumulative generation cost required to obtain $D$ distinct genuine discoveries satisfies $R_{\mathrm{cum}}(D)=Θ(c_{\mathrm{gen}}D^α)$, where $c_{\mathrm{gen}}$ is the per-candidate generation cost. This scaling law quantifies asymptotic diminishing returns as the discovery frontier advances. Finally, we formalize human amplification through guidance, generation, and verification, explaining why expert input is most valuable near autonomous exploration barriers.

preprint2021arXiv

5G NR CA-Polar Maximum Likelihood Decoding by GRAND

CA-Polar codes have been selected for all control channel communications in 5G NR, but accurate, computationally feasible decoders are still subject to development. Here we report the performance of a recently proposed class of optimally precise Maximum Likelihood (ML) decoders, GRAND, that can be used with any block-code. As published theoretical results indicate that GRAND is computationally efficient for short-length, high-rate codes and 5G CA-Polar codes are in that class, here we consider GRAND's utility for decoding them. Simulation results indicate that decoding of 5G CA-Polar codes by GRAND, and a simple soft detection variant, is a practical possibility.

preprint2021arXiv

A Coding Theory Perspective on Multiplexed Molecular Profiling of Biological Tissues

High-throughput and quantitative experimental technologies are experiencing rapid advances in the biological sciences. One important recent technique is multiplexed fluorescence in situ hybridization (mFISH), which enables the identification and localization of large numbers of individual strands of RNA within single cells. Core to that technology is a coding problem: with each RNA sequence of interest being a codeword, how to design a codebook of probes, and how to decode the resulting noisy measurements? Published work has relied on assumptions of uniformly distributed codewords and binary symmetric channels for decoding and to a lesser degree for code construction. Here we establish that both of these assumptions are inappropriate in the context of mFISH experiments and substantial decoding performance gains can be obtained by using more appropriate, less classical, assumptions. We propose a more appropriate asymmetric channel model that can be readily parameterized from data and use it to develop a maximum a posteriori (MAP) decoders. We show that false discovery rate for rare RNAs, which is the key experimental metric, is vastly improved with MAP decoders even when employed with the existing sub-optimal codebook. Using an evolutionary optimization methodology, we further show that by permuting the codebook to better align with the prior, which is an experimentally straightforward procedure, significant further improvements are possible.

preprint2016arXiv

A Linear Network Code Construction for General Integer Connections Based on the Constraint Satisfaction Problem

The problem of finding network codes for general connections is inherently difficult in capacity constrained networks. Resource minimization for general connections with network coding is further complicated. Existing methods for identifying solutions mainly rely on highly restricted classes of network codes, and are almost all centralized. In this paper, we introduce linear network mixing coefficients for code constructions of general connections that generalize random linear network coding (RLNC) for multicast connections. For such code constructions, we pose the problem of cost minimization for the subgraph involved in the coding solution and relate this minimization to a path-based Constraint Satisfaction Problem (CSP) and an edge-based CSP. While CSPs are NP-complete in general, we present a path-based probabilistic distributed algorithm and an edge-based probabilistic distributed algorithm with almost sure convergence in finite time by applying Communication Free Learning (CFL). Our approach allows fairly general coding across flows, guarantees no greater cost than routing, and shows a possible distributed implementation. Numerical results illustrate the performance improvement of our approach over existing methods.

preprint2016arXiv

Network Infusion to Infer Information Sources in Networks

Several significant models have been developed that enable the study of diffusion of signals across biological, social and engineered networks. Within these established frameworks, the inverse problem of identifying the source of the propagated signal is challenging, owing to the numerous alternative possibilities for signal progression through the network. In real world networks, the challenge of determining sources is compounded as the true propagation dynamics are typically unknown, and when they have been directly measured, they rarely conform to the assumptions of any of the well-studied models. In this paper we introduce a method called Network Infusion (NI) that has been designed to circumvent these issues, making source inference practical for large, complex real world networks. The key idea is that to infer the source node in the network, full characterization of diffusion dynamics, in many cases, may not be necessary. This objective is achieved by creating a diffusion kernel that well-approximates standard diffusion models, but lends itself to inversion, by design, via likelihood maximization or error minimization. We apply NI for both single-source and multi-source diffusion, for both single-snapshot and multi-snapshot observations, and for both homogeneous and heterogeneous diffusion setups. We prove the mean-field optimality of NI for different scenarios, and demonstrate its effectiveness over several synthetic networks. Moreover, we apply NI to a real-data application, identifying news sources in the Digg social network, and demonstrate the effectiveness of NI compared to existing methods. Finally, we propose an integrative source inference framework that combines NI with a distance centrality-based method, which leads to a robust performance in cases where the underlying dynamics are unknown.

preprint2015arXiv

A Perspective on Future Research Directions in Information Theory

Information theory is rapidly approaching its 70th birthday. What are promising future directions for research in information theory? Where will information theory be having the most impact in 10-20 years? What new and emerging areas are ripe for the most impact, of the sort that information theory has had on the telecommunications industry over the last 60 years? How should the IEEE Information Theory Society promote high-risk new research directions and broaden the reach of information theory, while continuing to be true to its ideals and insisting on the intellectual rigor that makes its breakthroughs so powerful? These are some of the questions that an ad hoc committee (composed of the present authors) explored over the past two years. We have discussed and debated these questions, and solicited detailed inputs from experts in fields including genomics, biology, economics, and neuroscience. This report is the result of these discussions.

preprint2015arXiv

Optimization-Based Linear Network Coding for General Connections of Continuous Flows

For general connections, the problem of finding network codes and optimizing resources for those codes is intrinsically difficult and little is known about its complexity. Most of the existing solutions rely on very restricted classes of network codes in terms of the number of flows allowed to be coded together, and are not entirely distributed. In this paper, we consider a new method for constructing linear network codes for general connections of continuous flows to minimize the total cost of edge use based on mixing. We first formulate the minimumcost network coding design problem. To solve the optimization problem, we propose two equivalent alternative formulations with discrete mixing and continuous mixing, respectively, and develop distributed algorithms to solve them. Our approach allows fairly general coding across flows and guarantees no greater cost than any solution without network coding.

Ken Duffy

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

NOVA: Fundamental Limits of Knowledge Discovery Through AI

5G NR CA-Polar Maximum Likelihood Decoding by GRAND

A Coding Theory Perspective on Multiplexed Molecular Profiling of Biological Tissues

A Linear Network Code Construction for General Integer Connections Based on the Constraint Satisfaction Problem

Network Infusion to Infer Information Sources in Networks

A Perspective on Future Research Directions in Information Theory

Optimization-Based Linear Network Coding for General Connections of Continuous Flows