Researcher profile

Mary Wootters

Mary Wootters contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
6 topics
4 close collaborators

Published work

8 published items

Preprint · 2022 · arXiv

Efficient Near-Optimal Codes for General Repeat Channels

Given a probability distribution $\mathcal{D}$ over the non-negative integers, a $\mathcal{D}$-repeat channel acts on an input symbol by repeating it a number of times distributed as $\mathcal{D}$. For example, the binary deletion channel ($\mathcal{D} = \mathrm{Bernoulli}$) and the Poisson repeat channel ($\mathcal{D} = \mathrm{Poisson}$) are special cases. We say a $\mathcal{D}$-repeat channel is square-integrable if $\mathcal{D}$ has finite first and second moments. In this paper, we construct explicit codes for all square-integrable $\mathcal{D}$-repeat channels with rate arbitrarily close to the capacity, which are encodable in linear time and decodable in quasi-linear time. We also consider possible extensions to the repeat channel model, and illustrate how our construction can be extended to an even broader class of channels capturing insertions, deletions, and substitutions. Our work offers an alternative, simplified, and more general construction to the recent work of Rubinstein (arXiv:2111.00261), who attains similar results to ours in the cases of the deletion channel and the Poisson repeat channel. It also slightly improves the runtime and decoding failure probability of the polar code constructions of Tal et al. (ISIT 2019) and of Pfister and Tal (arXiv:2102.02155) for the deletion channel and certain insertion/deletion/substitution channels. Our techniques closely follow the approaches of Guruswami and Li (IEEE Trans. Inf. Theory, 2019) and Con and Shpilka (IEEE Trans. Inf. Theory, 2020); what sets our work apart is that we show that a capacity-achieving code can be assumed to have an "approximate balance" in the frequency of zeros and ones of all sufficiently long substrings of all codewords. This allows us to attain near-capacity-achieving codes in a general setting. We consider this "approximate balance" result to be of independent interest, as it can be cast in much greater generality than repeat channels.
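
As a rough illustration of the channel model only (not of the paper's code construction), the Python sketch below simulates a $\mathcal{D}$-repeat channel with $\mathcal{D}$ passed in as a sampler. The helper names `bernoulli`, `poisson`, `repeat_channel` and `is_subsequence` are hypothetical, and the Bernoulli parameter `p` is assumed to be the probability that a symbol is kept, so the deletion probability is `1 - p`.

```python
import math
import random
from typing import Callable, List

# A distribution D over the non-negative integers, modeled as a sampler (assumption).
Sampler = Callable[[random.Random], int]


def bernoulli(p: float) -> Sampler:
    """D = Bernoulli(p): emit the symbol once with probability p, else delete it."""
    return lambda rng: 1 if rng.random() < p else 0


def poisson(lam: float) -> Sampler:
    """D = Poisson(lam), sampled with Knuth's multiplication method."""
    def sample(rng: random.Random) -> int:
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    return sample


def repeat_channel(x: List[int], d: Sampler, rng: random.Random) -> List[int]:
    """Replace each input symbol by k copies of itself, k drawn i.i.d. from D."""
    out: List[int] = []
    for symbol in x:
        out.extend([symbol] * d(rng))
    return out


def is_subsequence(y: List[int], x: List[int]) -> bool:
    """True if y can be obtained from x by deleting symbols."""
    it = iter(x)
    return all(s in it for s in y)


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20)]
y_deletion = repeat_channel(x, bernoulli(0.9), rng)  # binary deletion channel
y_poisson = repeat_channel(x, poisson(1.0), rng)     # Poisson repeat channel
print(x, y_deletion, y_poisson, sep="\n")
```

Both samplers have finite first and second moments, so both simulated channels are square-integrable in the sense used above.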

Preprint · 2022 · arXiv

Low-bandwidth recovery of linear functions of Reed-Solomon-encoded data

We study the problem of efficiently computing on encoded data. More specifically, we study the question of low-bandwidth computation of functions $F:\mathbb{F}^k \to \mathbb{F}$ of some data $x \in \mathbb{F}^k$, given access to an encoding $c \in \mathbb{F}^n$ of $x$ under an error correcting code. In our model (relevant in distributed storage, distributed computation and secret sharing), each symbol of $c$ is held by a different party, and we aim to minimize the total amount of information downloaded from each party in order to compute $F(x)$. Special cases of this problem have arisen in several domains, and we believe that it is fruitful to study this problem in generality. Our main result is a low-bandwidth scheme to compute linear functions for Reed-Solomon codes, even in the presence of erasures. More precisely, let $\varepsilon > 0$ and let $\mathcal{C}: \mathbb{F}^k \to \mathbb{F}^n$ be a full-length Reed-Solomon code of rate $1 - \varepsilon$ over a field $\mathbb{F}$ with constant characteristic. For any $\gamma \in [0, \varepsilon)$, our scheme can compute any linear function $F(x)$ given access to any $(1 - \gamma)$-fraction of the symbols of $\mathcal{C}(x)$, with download bandwidth $O(n/(\varepsilon - \gamma))$ bits. In contrast, the naive scheme that involves reconstructing the data $x$ and then computing $F(x)$ uses $\Theta(n \log n)$ bits. Our scheme has applications in distributed storage, coded computation, and homomorphic secret sharing.
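
For contrast with the low-bandwidth scheme, the sketch below implements only the naive baseline mentioned above: download $k$ full symbols, interpolate to recover $x$, then compute $F(x) = \langle a, x \rangle$. It is a minimal sketch assuming a prime field GF(257), evaluation points $1, \dots, 16$ (so the code is not full-length), and hypothetical helper names; it does not implement the paper's scheme.

```python
from typing import List, Tuple

P = 257  # illustrative prime field GF(257) (assumption; not the paper's field)


def rs_encode(x: List[int], alphas: List[int]) -> List[int]:
    """Reed-Solomon encoding: evaluate f(z) = sum_j x_j z^j at each point alpha."""
    return [sum(xj * pow(a, j, P) for j, xj in enumerate(x)) % P for a in alphas]


def mul_by_linear(poly: List[int], root: int) -> List[int]:
    """Multiply poly (coefficients, lowest degree first) by (z - root) over GF(P)."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - c * root) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def interpolate(points: List[Tuple[int, int]]) -> List[int]:
    """Lagrange interpolation: coefficients of the unique poly of degree < len(points)."""
    k = len(points)
    coeffs = [0] * k
    for i, (ai, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (aj, _) in enumerate(points):
            if j != i:
                basis = mul_by_linear(basis, aj)
                denom = denom * (ai - aj) % P
        scale = yi * pow(denom, P - 2, P) % P
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs


def naive_linear_function(received, a, k):
    """Naive scheme: download k full symbols, recover x, then compute F(x) = <a, x>."""
    used = received[:k]
    x = interpolate(used)
    bits = len(used) * (P - 1).bit_length()  # each symbol costs ceil(log2 P) bits
    return sum(ai * xi for ai, xi in zip(a, x)) % P, bits


n, k = 16, 12
alphas = list(range(1, n + 1))
x = [(7 * j + 3) % P for j in range(k)]
a = [(5 * j + 1) % P for j in range(k)]
c = rs_encode(x, alphas)
erased = {0, 5, 9}  # these parties are unavailable
received = [(alphas[i], c[i]) for i in range(n) if i not in erased]
value, bits = naive_linear_function(received, a, k)
print(value, bits)
```

Each downloaded symbol costs $\lceil \log_2 257 \rceil = 9$ bits here, so the naive download is $9k$ bits; when $n$ is close to the field size, this grows like the $\Theta(n \log n)$ cost the abstract compares against.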

Preprint · 2022 · arXiv

On the Download Rate of Homomorphic Secret Sharing

A homomorphic secret sharing (HSS) scheme is a secret sharing scheme that supports evaluating functions on shared secrets by means of a local mapping from input shares to output shares. We initiate the study of the download rate of HSS, namely, the achievable ratio between the length of the output shares and the output length when amortized over $\ell$ function evaluations. We obtain the following results.

* In the case of linear information-theoretic HSS schemes for degree-$d$ multivariate polynomials, we characterize the optimal download rate in terms of the optimal minimal distance of a linear code with related parameters. We further show that for sufficiently large $\ell$ (polynomial in all problem parameters), the optimal rate can be realized using Shamir's scheme, even with secrets over $\mathbb{F}_2$.
* We present a general rate-amplification technique for HSS that improves the download rate at the cost of requiring more shares. As a corollary, we get high-rate variants of computationally secure HSS schemes and efficient private information retrieval protocols from the literature.
* We show that, in some cases, one can beat the best download rate of linear HSS by allowing nonlinear output reconstruction and $2^{-\Omega(\ell)}$ error probability.
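
To make "a local mapping from input shares to output shares" concrete, the sketch below uses Shamir's scheme over a prime field (an illustrative choice; it does not handle the $\mathbb{F}_2$ secrets mentioned above). Each party adds or multiplies its own shares of two secrets, and the resulting output shares are interpolated at zero. This is the unamortized baseline, not the paper's rate-amplification technique; the names `shamir_share` and `reconstruct` are hypothetical.

```python
import random
from typing import List

P = 2**61 - 1  # a Mersenne prime; illustrative field size (assumption)


def shamir_share(secret: int, t: int, n: int, rng: random.Random) -> List[int]:
    """Share secret as f(1..n) for a random f of degree t with f(0) = secret."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)]


def reconstruct(shares: List[int]) -> int:
    """Lagrange-interpolate the points (i, shares[i-1]) and evaluate at 0."""
    n = len(shares)
    total = 0
    for i in range(1, n + 1):
        num, den = 1, 1
        for j in range(1, n + 1):
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + shares[i - 1] * num * pow(den, P - 2, P)) % P
    return total


rng = random.Random(1)
t, n = 2, 5  # n = 2t + 1 parties suffice for one degree-2 evaluation
a_shares = shamir_share(12, t, n, rng)
b_shares = shamir_share(34, t, n, rng)

# Local mappings: each party combines only its own input shares.
sum_out = [(u + v) % P for u, v in zip(a_shares, b_shares)]   # degree-t polynomial
prod_out = [(u * v) % P for u, v in zip(a_shares, b_shares)]  # degree-2t polynomial

print(reconstruct(sum_out), reconstruct(prod_out))  # 46 408
download_rate = 1 / n  # one output field element per n downloaded field elements
```

Here the client downloads $n = 5$ field elements to learn a single field element, a download rate of $1/5$; the results above concern how much higher this rate can be pushed when amortizing over $\ell$ evaluations.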

Preprint · 2021 · arXiv

On Coding for an Abstracted Nanopore Channel for DNA Storage

In the emerging field of DNA storage, data is encoded as DNA sequences and stored. The data is read out again by sequencing the stored DNA. Nanopore sequencing is a new sequencing technology that has many advantages over other methods; in particular, it is cheap, portable, and can support longer reads. While several practical coding schemes have been developed for DNA storage with nanopore sequencing, the theory is not well understood. Towards that end, we study a highly abstracted (deterministic) version of the nanopore sequencer, which highlights key features that make its analysis difficult. We develop methods and theory to understand the capacity of our abstracted model, and we propose efficient coding schemes and algorithms.

Preprint · 2021 · arXiv

On Greedy Approaches to Hierarchical Aggregation

We analyze greedy algorithms for the Hierarchical Aggregation (HAG) problem, a strategy introduced in [Jia et al., KDD 2020] for speeding up learning on Graph Neural Networks (GNNs). The idea of HAG is to identify and remove redundancies in computations performed when training GNNs. The associated optimization problem is to identify and remove the most redundancies. Previous work introduced a greedy approach for the HAG problem and claimed a $1-1/e$ approximation factor. We show by example that this is not correct, and that one cannot hope for better than a $1/2$ approximation factor. We prove that this greedy algorithm does satisfy some (weaker) approximation guarantee, by showing a new connection between the HAG problem and maximum matching problems in hypergraphs. We also introduce a second greedy algorithm which can outperform the first one, and we show how to implement it efficiently in some parameter regimes. Finally, we introduce some greedy heuristics that are much faster than the above greedy algorithms, and we demonstrate that they perform well on real-world graphs.
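
The sketch below illustrates the kind of greedy step under discussion: repeatedly find the pair of nodes that co-occurs in the most neighbor sets, aggregate it once into a new intermediate node, and substitute that node wherever the pair appears. It assumes sum-style aggregations costed as one binary operation per merge and a fixed capacity of intermediate nodes; it is a simplified illustration, not the exact procedure of Jia et al. or of this paper, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def aggregation_cost(sets: List[Set[int]], num_intermediate: int) -> int:
    """Binary aggregation ops: |S| - 1 per neighbor set, plus one per intermediate node."""
    return sum(max(len(s) - 1, 0) for s in sets) + num_intermediate


def greedy_pair_merge(neighbor_sets: List[Set[int]], capacity: int):
    """Up to `capacity` times, aggregate the most frequently shared pair into a new node."""
    sets = [set(s) for s in neighbor_sets]  # copy so the caller's sets are untouched
    next_id = max((v for s in sets for v in s), default=-1) + 1
    created: Dict[int, Tuple[int, int]] = {}
    for _ in range(capacity):
        counts: Counter = Counter()
        for s in sets:
            for pair in combinations(sorted(s), 2):
                counts[pair] += 1
        if not counts:
            break
        (u, w), freq = counts.most_common(1)[0]
        if freq < 2:  # a pair used only once saves no aggregation work
            break
        created[next_id] = (u, w)
        for s in sets:
            if u in s and w in s:
                s -= {u, w}
                s.add(next_id)
        next_id += 1
    return sets, created


toy = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
costs = {}
for cap in (0, 1, 3):
    merged, created = greedy_pair_merge(toy, cap)
    costs[cap] = aggregation_cost(merged, len(created))
print(costs)
```

On this toy instance, one merge (of nodes 1 and 2) lowers the cost from 6 to 4 binary aggregations, a second merge lowers it to 3, and then no pair is shared by two sets, so a larger capacity does not help.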

Preprint · 2020 · arXiv

Lifted multiplicity codes and the disjoint repair group property

Lifted Reed-Solomon codes (Guo, Kopparty, Sudan 2013) were introduced in the context of locally correctable and testable codes. Their codewords are multivariate polynomials whose restriction to any line is a codeword of a Reed-Solomon code. We consider a generalization of their construction, which we call lifted multiplicity codes. These are multivariate polynomial codes whose restriction to any line is a codeword of a multiplicity code (Kopparty, Saraf, Yekhanin 2014). We show that lifted multiplicity codes have a better trade-off between redundancy and a notion of locality called the $t$-disjoint-repair-group property than previously known constructions. More precisely, we show that lifted multiplicity codes with length $N$ and redundancy $O(t^{0.585} \sqrt{N})$ have the property that any symbol of a codeword can be reconstructed in $t$ different ways, each using a disjoint subset of the other coordinates. This gives the best known trade-off for this problem for any super-constant $t < \sqrt{N}$. We also give an alternative analysis of lifted Reed-Solomon codes using dual codes, which may be of independent interest.
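
The disjoint-repair-group idea can be seen in the simplest line-based code: evaluations of a low-degree bivariate polynomial over $\mathbb{F}_p^2$. Restricted to any line through a point, the codeword is a univariate polynomial of degree at most $D$, so the other $p - 1$ points on that line determine the value at the point whenever $D \le p - 2$. The $p + 1$ lines through a point meet only at that point, which yields $p + 1$ disjoint repair groups. This is a plain Reed-Muller-style sketch over GF(7), assumed for illustration; it is not the lifted multiplicity code from the paper, and the helper names are hypothetical.

```python
P = 7  # prime field GF(7); illustrative (assumption)
D = 3  # total-degree bound; line-based repair needs D <= P - 2


def evaluate(poly, x, y):
    """poly maps (i, j) -> coefficient of x^i y^j over GF(P)."""
    return sum(c * pow(x, i, P) * pow(y, j, P) for (i, j), c in poly.items()) % P


def interpolate_at_zero(samples):
    """Value at s = 0 of the unique poly of degree < len(samples) through (s, v)."""
    total = 0
    for s, v in samples:
        num, den = 1, 1
        for t, _ in samples:
            if t != s:
                num = num * (-t) % P
                den = den * (s - t) % P
        total = (total + v * num * pow(den, P - 2, P)) % P
    return total


def repair_groups(point):
    """One group per line through point: the P - 1 other points on that line."""
    x0, y0 = point
    directions = [(1, m) for m in range(P)] + [(0, 1)]  # the P + 1 lines in GF(P)^2
    for dx, dy in directions:
        yield [(s, ((x0 + s * dx) % P, (y0 + s * dy) % P)) for s in range(1, P)]


def repair(codeword, point):
    """Recover codeword[point] independently from each repair group."""
    results = []
    for group in repair_groups(point):
        value = interpolate_at_zero([(s, codeword[q]) for s, q in group])
        results.append((value, {q for _, q in group}))
    return results


f = {(0, 0): 3, (1, 0): 2, (1, 2): 5, (0, 3): 1}  # total degree 3 <= D
codeword = {(x, y): evaluate(f, x, y) for x in range(P) for y in range(P)}
target = (2, 5)
results = repair(codeword, target)
print(len(results), {v for v, _ in results}, codeword[target])
```

In this toy code the target symbol has $t = p + 1 = 8$ disjoint repair groups, each of size $p - 1 = 6$.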

Preprint · 2020 · arXiv

Linear-time Erasure List-decoding of Expander Codes

We give a linear-time erasure list-decoding algorithm for expander codes. More precisely, let $r > 0$ be any integer. Given an inner code $C_0$ of length $d$, and a $d$-regular bipartite expander graph $G$ with $n$ vertices on each side, we give an algorithm to list-decode the expander code $C = C(G, C_0)$ of length $nd$ from approximately $\delta \delta_r nd$ erasures in time $n \cdot \mathrm{poly}(d 2^r / \delta)$, where $\delta$ and $\delta_r$ are the relative distance and the $r$th generalized relative distance of $C_0$, respectively. To the best of our knowledge, this is the first linear-time algorithm that can list-decode expander codes from erasures beyond their (designed) distance of approximately $\delta^2 nd$. To obtain our results, we show that an approach similar to that of (Hemenway and Wootters, Information and Computation, 2018) can be used to obtain such an erasure-list-decoding algorithm with an exponentially worse dependence of the running time on $r$ and $\delta$; then we show how to improve the dependence of the running time on these parameters.
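
The $r$th generalized (Hamming) distance $d_r(C_0)$ is the smallest support size of an $r$-dimensional subcode of $C_0$, and for an inner code of length $d$ the relative version is $\delta_r = d_r(C_0)/d$. The brute-force sketch below computes this weight hierarchy for a small binary code and then forms $\delta$ and $\delta_2$. It assumes the $[7,4]$ Hamming code as $C_0$ purely for illustration, is exponential in the dimension (toy codes only), and uses hypothetical function names.

```python
from itertools import combinations


def span(vectors):
    """All GF(2) linear combinations of the given bitmask vectors."""
    words = {0}
    for v in vectors:
        words |= {w ^ v for w in words}
    return words


def generalized_weight(code, r):
    """d_r: smallest support of an r-dimensional subcode (brute force, small codes only)."""
    best = None
    for basis in combinations(sorted(w for w in code if w), r):
        if len(span(basis)) != 2 ** r:
            continue  # skip linearly dependent tuples
        support = 0
        for w in basis:
            support |= w  # support of a subcode = union of supports of any basis
        size = bin(support).count("1")
        if best is None or size < best:
            best = size
    return best


# [7, 4] Hamming code with generator [I | P], rows as 7-bit masks (illustrative C_0).
G = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
length = 7  # the inner-code length, called d in the abstract
C0 = span(G)
hierarchy = [generalized_weight(C0, r) for r in range(1, 5)]
delta = hierarchy[0] / length
delta_2 = hierarchy[1] / length
print(hierarchy, delta * delta_2, delta ** 2)
```

For this code the hierarchy is $(3, 5, 6, 7)$, so $\delta = 3/7$ and $\delta_2 = 5/7$; the quantity $\delta\delta_2 = 15/49$ then exceeds $\delta^2 = 9/49$, which is the gap between the erasure radius and the designed distance described above.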