Source author record

Wentu Song

Wentu Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Theory, math.IT, math.CO, Discrete Mathematics, Networking and Internet Architecture

Catalog footprint

What is connected

22 works

5 topics

4 close collaborators


preprint 2026 arXiv

On the Sequence Reconstruction Problem for the Single-Deletion Two-Substitution Channel

The Levenshtein sequence reconstruction problem studies the reconstruction of a transmitted sequence from multiple erroneous copies of it. A fundamental question in this field is to determine the minimum number of erroneous copies required to guarantee correct reconstruction of the original sequence. This problem is equivalent to determining the maximum possible intersection size of two error balls associated with the underlying channel. Existing research on the sequence reconstruction problem has largely focused on channels with a single type of error, such as insertions, deletions, or substitutions alone. However, relatively little is known about channels that involve a mixture of error types, for instance, channels allowing both deletions and substitutions. In this work, we study the sequence reconstruction problem for the single-deletion two-substitution channel, which allows one deletion and at most two substitutions applied to the transmitted sequence. Specifically, we prove that if two $q$-ary length-$n$ sequences have Hamming distance $d\geq 2$, where $q\geq 2$ is any fixed integer, then the intersection size of their error balls under the single-deletion two-substitution channel is upper bounded by $(q^2-1)n^2-(3q^2+5q-5)n+O_q(1)$, where $O_q(1)$ is a constant independent of $n$ but dependent on $q$. Moreover, we show that this upper bound is tight up to an additive constant.
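For small parameters, the error balls described in this abstract can be enumerated directly. The following is a brute-force sketch, not the paper's method. The function names are hypothetical, sequences are tuples over {0, ..., q-1}, and the ball is assumed to contain every sequence obtained by exactly one deletion followed by at most two substitutions.

```python
from itertools import combinations, product


def deletion_ball(x):
    """All sequences obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def substitution_ball(y, q, s):
    """All q-ary sequences within Hamming distance s of y."""
    out = set()
    for k in range(s + 1):
        for positions in combinations(range(len(y)), k):
            for values in product(range(q), repeat=k):
                z = list(y)
                for p, v in zip(positions, values):
                    z[p] = v
                out.add(tuple(z))
    return out


def error_ball(x, q, s=2):
    """Single-deletion, at-most-s-substitution ball of x (assumed model)."""
    ball = set()
    for y in deletion_ball(x):
        ball |= substitution_ball(y, q, s)
    return ball


def intersection_size(x1, x2, q, s=2):
    """Size of the intersection of the two error balls."""
    return len(error_ball(x1, q, s) & error_ball(x2, q, s))
```

Under the equivalence stated in the abstract, the number of distinct copies needed for guaranteed reconstruction is one more than the largest value of `intersection_size` over the sequence pairs under consideration. The enumeration is exponential in the sequence length, so it is only useful as a sanity check on very short sequences.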

preprint 2022 arXiv

List-decodable Codes for Single-deletion Single-substitution with List-size Two

In this paper, we present an explicit construction of list-decodable codes for single-deletion and single-substitution with list size two and redundancy 3 log n + 4, where n is the block length of the code. Our construction has lower redundancy than the best known explicit construction by Gabrys et al. (arXiv 2021), whose redundancy is 4 log n + O(1).

preprint 2021 arXiv

Dynamic Programming for Sequential Deterministic Quantization of Discrete Memoryless Channels

In this paper, under a general cost function $C$, we present a dynamic programming (DP) method to obtain an optimal sequential deterministic quantizer (SDQ) for a $q$-ary input discrete memoryless channel (DMC). The DP method has complexity $O(q (N-M)^2 M)$, where $N$ and $M$ are the alphabet sizes of the DMC output and quantizer output, respectively. Then, starting from the quadrangle inequality, two techniques are applied to reduce the DP method's complexity. One technique makes use of the Shor-Moran-Aggarwal-Wilber-Klawe (SMAWK) algorithm and achieves complexity $O(q (N-M) M)$. The other technique is much easier to implement and achieves complexity $O(q (N^2 - M^2))$. We further derive a sufficient condition under which the optimal SDQ is optimal among all quantizers and the two techniques are applicable. This generalizes the results in the literature for binary-input DMCs. Next, we show that the cost function of the $α$-mutual information ($α$-MI)-maximizing quantizer belongs to the category of $C$. We further prove that under a weaker condition than the sufficient condition we derived, the aforementioned two techniques are applicable to the design of the $α$-MI-maximizing quantizer. Finally, we illustrate the particular application of our design method to practical pulse-amplitude modulation systems.
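The basic DP described in this abstract can be sketched as a generic interval-partition recursion. This is an illustration under assumptions, not the paper's algorithm. The names `optimal_contiguous_quantizer` and `make_sse_cost` are hypothetical, the total cost is assumed to be a sum of per-group costs, and the squared-error cost stands in for the abstract's general cost function $C$. Neither the SMAWK speed-up nor the second technique is implemented here.

```python
def optimal_contiguous_quantizer(N, M, cost):
    """Split outputs 0..N-1 into M contiguous groups with minimum total cost.

    cost(j, n) is the (assumed additive) cost of merging outputs j..n-1
    into a single quantizer output. Returns (total_cost, boundaries).
    """
    if not 1 <= M <= N:
        raise ValueError("need 1 <= M <= N")
    INF = float("inf")
    # dp[m][n]: best cost of quantizing the first n outputs into m groups
    dp = [[INF] * (N + 1) for _ in range(M + 1)]
    arg = [[0] * (N + 1) for _ in range(M + 1)]
    dp[0][0] = 0.0
    for m in range(1, M + 1):
        # leave at least one output for each of the remaining M - m groups
        for n in range(m, N - (M - m) + 1):
            for j in range(m - 1, n):
                c = dp[m - 1][j] + cost(j, n)
                if c < dp[m][n]:
                    dp[m][n] = c
                    arg[m][n] = j
    # backtrack the group boundaries
    bounds = [N]
    n = N
    for m in range(M, 0, -1):
        n = arg[m][n]
        bounds.append(n)
    return dp[M][N], bounds[::-1]


def make_sse_cost(values):
    """Hypothetical stand-in cost: squared deviation from the group mean."""
    def cost(j, n):
        group = values[j:n]
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)
    return cost
```

The triple loop performs on the order of $(N-M)^2 M$ cost evaluations, which matches the shape of the complexity quoted in the abstract apart from the factor $q$.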

preprint 2020 arXiv

Coded Caching with Polynomial Subpacketization

Consider a centralized caching network with a single server and $K$ users. The server has a database of $N$ files with each file being divided into $F$ packets ($F$ is known as subpacketization), and each user owns a local cache that can store $\frac{M}{N}$ fraction of the $N$ files. We construct a family of centralized coded caching schemes with polynomial subpacketization. Specifically, given $M$, $N$ and an integer $n\geq 0$, we construct a family of coded caching schemes for any $(K,M,N)$ caching system with $F=O(K^{n+1})$. More generally, for any $t\in\{1,2,\cdots,K-2\}$ and any integer $n$ such that $0\leq n\leq t$, we construct a coded caching scheme with $\frac{M}{N}=\frac{t}{K}$ and $F\leq K\binom{\left(1-\frac{M}{N}\right)K+n}{n}$.
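The subpacketization bound in this abstract is easy to evaluate numerically. The sketch below uses hypothetical function names and substitutes $\frac{M}{N}=\frac{t}{K}$, so that $\left(1-\frac{M}{N}\right)K = K-t$. The comparison value $\binom{K}{t}$ is the subpacketization of the classical Maddah-Ali and Niesen centralized scheme, which is not mentioned in the abstract and is included only as a familiar baseline.

```python
from math import comb


def subpacketization_bound(K, t, n):
    """Upper bound K * C(K - t + n, n) on F, for M/N = t/K."""
    if not (1 <= t <= K - 2 and 0 <= n <= t):
        raise ValueError("need 1 <= t <= K-2 and 0 <= n <= t")
    return K * comb(K - t + n, n)


def mn_subpacketization(K, t):
    """Baseline for comparison (not from the abstract): C(K, t)."""
    return comb(K, t)


if __name__ == "__main__":
    K, t = 10, 5
    for n in range(t + 1):
        print(n, subpacketization_bound(K, t, n), mn_subpacketization(K, t))
```

For fixed $n$ the bound is a polynomial of degree $n+1$ in $K$, in line with the $F=O(K^{n+1})$ claim.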

preprint 2020 arXiv

Sequence-Subset Distance and Coding for Error Control in DNA-based Data Storage

The process of DNA-based data storage (DNA storage for short) can be mathematically modelled as a communication channel, termed the DNA storage channel, whose inputs and outputs are sets of unordered sequences. To design error correcting codes for the DNA storage channel, a new metric, termed the sequence-subset distance, is introduced, which generalizes the Hamming distance to a distance function defined between any two sets of unordered vectors and helps to establish a uniform framework for designing error correcting codes for the DNA storage channel. We further introduce a family of error correcting codes, referred to as sequence-subset codes, for DNA storage and show that the error-correcting ability of such codes is completely determined by their minimum distance. We derive some upper bounds on the size of the sequence-subset codes, including a tight bound for a special case, a Singleton-like bound and a Plotkin-like bound. We also propose some constructions, including an optimal construction for that special case, which imply lower bounds on the size of such codes.

preprint 2016 arXiv

On Sequential Locally Repairable Codes

We consider locally repairable codes (LRCs) that aim at sequentially recovering multiple erasures. We define the (n,k,r,t)-SLRC (Sequential Locally Repairable Code) as an [n,k] linear code where any t'(<=t) erasures can be sequentially recovered, each one by r (2<=r<k) other code symbols. Sequential recovery means that the erased symbols are recovered one by one, and an already recovered symbol can be used to recover the remaining erased symbols. This important recovery method, in contrast with the widely studied parallel recovery, is currently far from understood: codes constructed for arbitrary t>=3 erasures are lacking, as are bounds to evaluate the performance of such codes. We first derive a tight upper bound on the code rate of (n, k, r, t)-SLRCs for t=3 and r>=2. We then propose two constructions of binary (n, k, r, t)-SLRCs for general r,t>=2 (existing constructions deal with t<=7 erasures). The first construction generalizes the method of direct product construction. The second construction is based on resolvable configurations and yields SLRCs for any r>=2 and odd t>=3. For both constructions, the rates are optimal for t in {2,3} and are higher than most of the existing LRC families for arbitrary t>=4.

preprint 2015 arXiv

Binary Locally Repairable Codes ---Sequential Repair for Multiple Erasures

Locally repairable codes (LRCs) for distributed storage allow two approaches to locally repair multiple failed nodes: 1) the parallel approach, by which each newcomer accesses a set of $r$ live nodes ($r$ is the repair locality) to download data and recover the lost packet; and 2) the sequential approach, by which the newcomers are properly ordered and each newcomer accesses a set of $r$ other nodes, each of which can be either a live node or a newcomer ordered before it. An $[n,k]$ linear code that has locality $r$ and allows local repair of up to $t$ failed nodes by the sequential approach is called an $(n,k,r,t)$-exact locally repairable code (ELRC). In this paper, we present a family of binary codes which is equivalent to the direct product of $m$ copies of the $[r+1,r]$ single-parity-check code. We prove that such codes are $(n,k,r,t)$-ELRCs with $n=(r+1)^m$, $k=r^m$ and $t=2^m-1$, which implies that they permit local repair of up to $2^m-1$ erasures by the sequential approach. Our result shows that the sequential approach has a much bigger advantage than the parallel approach.
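The sequential approach on the direct product code can be simulated for small parameters. The sketch below is an assumption-laden illustration rather than the paper's proof technique: code symbols are indexed by points of an $m$-dimensional grid with side $r+1$, every axis-parallel line is assumed to carry one single-parity-check constraint, and an erased symbol is treated as repairable when it is the only erased symbol on some line through it (so it is recovered from the $r$ others). The function name is hypothetical.

```python
from itertools import combinations, product


def sequentially_repairable(erased, r, m):
    """Peel erasures one at a time using axis-parallel parity lines."""
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        for p in list(erased):
            for axis in range(m):
                # the r+1 grid points that differ from p only in this axis
                line = [p[:axis] + (v,) + p[axis + 1:] for v in range(r + 1)]
                if sum(c in erased for c in line) == 1:
                    erased.discard(p)  # p is the only erasure on the line
                    progress = True
                    break
    return not erased


def tolerates_all(r, m, t):
    """Check every erasure pattern of size at most t (brute force)."""
    points = list(product(range(r + 1), repeat=m))
    return all(
        sequentially_repairable(pattern, r, m)
        for size in range(1, t + 1)
        for pattern in combinations(points, size)
    )
```

For $m=2$ this is the familiar row and column parity array: any three erasures peel away, while four erasures at the corners of a rectangle block each other. The exhaustive check grows quickly with $(r+1)^m$, so it is only practical for small grids.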

preprint 2015 arXiv

Erasure codes with symbol locality and group decodability for distributed storage

We introduce a new family of erasure codes, called group decodable codes (GDC), for distributed storage systems. Given a set of design parameters {α; β; k; t}, where k is the number of information symbols, each codeword of an (α; β; k; t)-group decodable code is a t-tuple of strings, called buckets, such that each bucket is a string of β symbols that is a codeword of a [β; α] MDS code (which is encoded from α information symbols). Such codes have the following two properties: (P1) Locally repairable: each code symbol has locality (α; β-α+1). (P2) Group decodable: from each bucket we can decode α information symbols. We establish an upper bound on the minimum distance of an (α; β; k; t)-group decodable code for any given set of {α; β; k; t}. We also prove that the bound is achievable when the coding field F has size $|F| > \binom{n-1}{k-1}$.

preprint 2015 arXiv

Local Codes with Addition Based Repair

We consider the complexities of repair algorithms for locally repairable codes and propose a class of codes that repair single node failures using addition operations only, or codes with addition based repair. We construct two families of codes with addition based repair. The first family attains distance one less than the Singleton-like upper bound, while the second family attains the Singleton-like upper bound.

preprint 2015 arXiv

Locally Encodable and Decodable Codes for Distributed Storage Systems

We consider the locality of encoding and decoding operations in distributed storage systems (DSS), and propose a new class of codes, called locally encodable and decodable codes (LEDC), that provides a higher degree of operational locality compared to currently known codes. For a given locality structure, we derive an upper bound on the global distance and demonstrate the existence of an optimal LEDC for sufficiently large field size. In addition, we also construct two families of optimal LEDC for fields with size linear in code length.

preprint 2015 arXiv

Locally Repairable Codes with Functional Repair and Multiple Erasure Tolerance

We consider the problem of designing [n; k] linear codes for distributed storage systems (DSS) that satisfy the (r, t)-Local Repair Property, where any t'(<=t) simultaneously failed nodes can be locally repaired, each with locality r. The parameters n, k, r, t are positive integers such that r<k<n and t <= n-k. We consider the functional repair model and the sequential approach for repairing multiple failed nodes. By functional repair, we mean that the packet stored in each newcomer is not necessarily an exact copy of the lost data but a symbol that keeps the (r, t)-local repair property. By the sequential approach, we mean that the t' newcomers are ordered in a proper sequence such that each newcomer can be repaired from the live nodes and the newcomers that are ordered before it. Such codes, which we refer to as (n, k, r, t)-functional locally repairable codes (FLRC), are the most general class of LRCs and contain several subclasses of LRCs reported in the literature. In this paper, we aim to optimize the storage overhead (equivalently, the code rate) of FLRCs. We derive a lower bound on the code length n for t in {2,3} and any possible k, r. For t=2, our bound generalizes the rate bound proved in [14]. For t=3, our bound improves the rate bound proved in [10]. We also give some constructions of exact LRCs for t in {2,3} whose length n achieves the bound of (n, k, r, t)-FLRC, which proves the tightness of our bounds and also implies that there is no gap between the optimal code length of functional LRCs and exact LRCs for certain sets of parameters. Moreover, our constructions are over the binary field, and hence are of interest in practice.

preprint 2015 arXiv

On the Solvability of 3s/nt Sum-Network---A Region Decomposition and Weak Decentralized Code Method

We study the network coding problem of sum-networks with 3 sources and n terminals (3s/nt sum-network), for an arbitrary positive integer n, and derive a necessary and sufficient condition for the solvability of a family of so-called terminal-separable sum-networks. Both terminal-separability and the solvability of a terminal-separable sum-network can be decided in polynomial time. Consequently, we give another necessary and sufficient condition, which yields a faster (O(|E|) time) algorithm than that of Shenvi and Dey ([18], O(|E|^3) time) to determine the solvability of the 3s/3t sum-network. To obtain the results, we further develop the region decomposition method in [22], [23] and generalize the decentralized coding method in [21]. Our methods provide new efficient tools for multiple-source multiple-sink network coding problems.

preprint 2015 arXiv

Weakly Secure MDS Codes for Simple Multiple Access Networks

We consider a simple multiple access network (SMAN), where $k$ sources of unit rates transmit their data to a common sink via $n$ relays. Each relay is connected to the sink and to certain sources. A coding scheme (for the relays) is weakly secure if a passive adversary who eavesdrops on fewer than $k$ relay-sink links cannot reconstruct the data from each source. We show that there exists a weakly secure maximum distance separable (MDS) coding scheme for the relays if and only if every subset of $\ell$ relays is collectively connected to at least $\ell+1$ sources, for all $0 < \ell < k$. Moreover, we prove that this condition can be verified in polynomial time in $n$ and $k$. Finally, given a SMAN satisfying the aforementioned condition, we provide another polynomial time algorithm to trim the network until it has a sparsest set of source-relay links that still supports a weakly secure MDS coding scheme.
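The connectivity condition in this abstract can be checked literally on small networks. The sketch below is a brute-force check over all relay subsets, so it takes exponential time and is not the polynomial-time verification the abstract refers to. The function name and the input encoding (one set of source indices per relay) are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_weak_security_condition(relay_sources, k):
    """Check that every l relays reach at least l+1 sources, for 0 < l < k.

    relay_sources: list of sets; relay_sources[i] holds the sources
    connected to relay i. k: number of sources.
    """
    n = len(relay_sources)
    for ell in range(1, min(k - 1, n) + 1):
        for subset in combinations(relay_sources, ell):
            covered = set().union(*subset)  # sources reached by the subset
            if len(covered) < ell + 1:
                return False
    return True
```

For example, with $k=3$ sources and relays connected to the source pairs {0,1}, {1,2} and {0,2}, every single relay reaches two sources and every pair of relays reaches all three, so the condition holds.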

preprint 2014 arXiv

Network Coding for $3$s$/n$t Sum-Networks

A sum-network is a directed acyclic network where each source independently generates one symbol from a given field $\mathbb F$ and each terminal wants to receive the sum (over $\mathbb F$) of the source symbols. For sum-networks with two sources or two terminals, the solvability is characterized by the connection condition of each source-terminal pair [3]. A necessary and sufficient condition for the solvability of the $3$-source $3$-terminal ($3$s$/3$t) sum-networks was given by Shenvi and Dey [6]. However, the general case of arbitrary sources/sinks is still open. In this paper, we investigate the sum-network with three sources and $n$ sinks using a region decomposition method. A necessary and sufficient condition is established for a class of $3$s$/n$t sum-networks. As a direct application of this result, a necessary and sufficient condition of solvability is obtained for the special case of $3$s$/3$t sum-networks.

preprint 2014 arXiv

On Block Security of Regenerating Codes at the MBR Point for Distributed Storage Systems

A passive adversary can eavesdrop stored content or downloaded content of some storage nodes, in order to learn illegally about the file stored across a distributed storage system (DSS). Previous work in the literature focuses on code constructions that trade storage capacity for perfect security. In other words, by decreasing the amount of original data that it can store, the system can guarantee that the adversary, which eavesdrops up to a certain number of storage nodes, obtains no information (in Shannon's sense) about the original data. In this work we introduce the concept of block security for DSS and investigate minimum bandwidth regenerating (MBR) codes that are block secure against adversaries of varied eavesdropping strengths. Such MBR codes guarantee that no information about any group of original data units up to a certain size is revealed, without sacrificing the storage capacity of the system. The size of such secure groups varies according to the number of nodes that the adversary can eavesdrop. We show that code constructions based on Cauchy matrices provide block security. The opposite conclusion is drawn for codes based on Vandermonde matrices.

preprint 2014 arXiv

On the Existence of MDS Codes Over Small Fields With Constrained Generator Matrices

We study the existence over small fields of Maximum Distance Separable (MDS) codes with generator matrices having specified supports (i.e. having specified locations of zero entries). This problem unifies and simplifies the problems posed in recent works of Yan and Sprintson (NetCod'13) on weakly secure cooperative data exchange, of Halbawi et al. (arxiv'13) on distributed Reed-Solomon codes for simple multiple access networks, and of Dau et al. (ISIT'13) on MDS codes with balanced and sparse generator matrices. We conjecture that there exist such $[n,k]_q$ MDS codes as long as $q \geq n + k - 1$, if the specified supports of the generator matrices satisfy the so-called MDS condition, which can be verified in polynomial time. We propose a combinatorial approach to tackle the conjecture, and prove that the conjecture holds for a special case when the sets of zero coordinates of rows of the generator matrix share with each other (pairwise) at most one common element. Based on our numerical result, the conjecture is also verified for all $k \leq 7$. Our approach is based on a novel generalization of the well-known Hall's marriage theorem, which allows (overlapping) multiple representatives instead of a single representative for each subset.

preprint 2014 arXiv

Secure Erasure Codes With Partial Decodability

The MDS property (aka the $k$-out-of-$n$ property) requires that if a file is split into several symbols and subsequently encoded into $n$ coded symbols, each being stored in one storage node of a distributed storage system (DSS), then a user can recover the file by accessing any $k$ nodes. We study the so-called $p$-decodable $μ$-secure erasure coding scheme $(1 \leq p \leq k - μ, 0 \leq μ < k, p | (k-μ))$, which satisfies the MDS property and the following additional properties: (P1) strongly secure up to a threshold: an adversary which eavesdrops on at most $μ$ storage nodes gains no information (in Shannon's sense) about the stored file, (P2) partially decodable: a legitimate user can recover a subset of $p$ file symbols by accessing some $μ + p$ storage nodes. The scheme is perfectly $p$-decodable $μ$-secure if it satisfies the following additional property: (P3) weakly secure up to a threshold: an adversary which eavesdrops on more than $μ$ but fewer than $μ+p$ storage nodes cannot reconstruct any part of the file. Most of the related work in the literature only focused on the case $p = k - μ$. In other words, no partial decodability is provided: a user cannot retrieve any part of the file by accessing fewer than $k$ nodes. We provide an explicit construction of $p$-decodable $μ$-secure coding schemes over small fields for all $μ$ and $p$. That construction also produces perfectly $p$-decodable $μ$-secure schemes over small fields when $p = 1$ (for every $μ$), and when $μ = 0, 1$ (for every $p$). We establish that perfect schemes exist over sufficiently large fields for almost all $μ$ and $p$.

preprint 2013 arXiv

Balanced Sparsest Generator Matrices for MDS Codes

We show that given $n$ and $k$, for $q$ sufficiently large, there always exists an $[n, k]_q$ MDS code that has a generator matrix $G$ satisfying the following two conditions: (C1) Sparsest: each row of $G$ has Hamming weight $n - k + 1$; (C2) Balanced: Hamming weights of the columns of $G$ differ from each other by at most one.
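Conditions (C1) and (C2) concern only the support of $G$, so they can be checked mechanically. The sketch below uses a hypothetical function name and takes $G$ as a list of rows whose nonzero entries mark the support. It does not test the MDS property itself, which depends on the field and the actual entries.

```python
def is_sparsest_and_balanced(G, n, k):
    """Check (C1) every row weight is n-k+1 and (C2) column weights differ by <= 1."""
    if len(G) != k or any(len(row) != n for row in G):
        return False
    row_weights = [sum(1 for x in row if x != 0) for row in G]
    col_weights = [sum(1 for row in G if row[j] != 0) for j in range(n)]
    sparsest = all(w == n - k + 1 for w in row_weights)
    balanced = max(col_weights) - min(col_weights) <= 1
    return sparsest and balanced


if __name__ == "__main__":
    # Generator matrix of the binary [3, 2] single-parity-check code,
    # which is MDS: rows have weight 2 and columns have weights 1, 1, 2.
    G = [[1, 0, 1],
         [0, 1, 1]]
    print(is_sparsest_and_balanced(G, n=3, k=2))
```

The row weight $n-k+1$ is the smallest possible for an MDS code, since every nonzero codeword has weight at least the minimum distance $n-k+1$.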

preprint 2013 arXiv

Encoding Complexity of Network Coding with Two Simple Multicast Sessions

The encoding complexity of network coding for single multicast networks has been intensively studied from several aspects: e.g., the time complexity, the required number of encoding links, and the required field size for a linear code solution. However, these issues as well as the solvability are less understood for networks with multiple multicast sessions. Recently, Wang and Shroff showed that the solvability of networks with two unit-rate multicast sessions (2-URMS) can be decided in polynomial time. In this paper, we prove that for the 2-URMS networks: 1) the solvability can be determined in time $O(|E|)$; 2) a solution can be constructed in time $O(|E|)$; 3) an optimal solution can be obtained in polynomial time; 4) the number of encoding links required to achieve a solution is upper-bounded by $\max\{3,2N-2\}$; and 5) the field size required to achieve a linear solution is upper-bounded by $\max\{2,\lfloor\sqrt{2N-7/4}+1/2\rfloor\}$, where $|E|$ is the number of links and $N$ is the number of sinks of the underlying network. Both bounds are shown to be tight.
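The two closed-form bounds in this abstract can be tabulated for small $N$. The function names below are hypothetical. To avoid floating-point rounding, the sketch uses the identity $\lfloor\sqrt{2N-7/4}+1/2\rfloor = \lfloor(\sqrt{8N-7}+1)/2\rfloor$, which for integer $N\geq 1$ equals `(isqrt(8*N - 7) + 1) // 2`.

```python
from math import isqrt


def encoding_links_bound(N):
    """Upper bound max{3, 2N - 2} on the number of encoding links."""
    return max(3, 2 * N - 2)


def field_size_bound(N):
    """Upper bound max{2, floor(sqrt(2N - 7/4) + 1/2)} on the field size."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    return max(2, (isqrt(8 * N - 7) + 1) // 2)


if __name__ == "__main__":
    for N in range(1, 12):
        print(N, encoding_links_bound(N), field_size_bound(N))
```

The link bound grows linearly in the number of sinks, while the field size bound grows only like $\sqrt{2N}$.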

preprint 2013 arXiv

Optimal Locally Repairable Linear Codes

Linear erasure codes with local repairability are desirable for distributed data storage systems. An [n, k, d] code having all-symbol (r, δ)-locality, denoted as (r, δ)_a, is considered optimal if it also meets the minimum Hamming distance bound. The existing results on the existence and the construction of optimal (r, δ)_a codes are limited to only the special case of δ = 2, and to only two small regions within this special case, namely, m = 0 or m >= (v+δ-1) > (δ-1), where m = n mod (r+δ-1) and v = k mod r. This paper investigates the existence conditions and presents deterministic constructive algorithms for optimal (r, δ)_a codes with general r and δ. First, a structure theorem is derived for general optimal (r, δ)_a codes which helps illuminate some of their structural properties. Next, the entire problem space with arbitrary n, k, r and δ is divided into eight different cases (regions) with regard to the specific relations of these parameters. For two cases, it is rigorously proved that no optimal (r, δ)_a code can exist. For four other cases, the optimal (r, δ)_a codes are shown to exist, deterministic constructions are proposed, and a lower bound on the field size required for these algorithms to work is provided. Our new constructive algorithms not only cover more cases, but for the same cases where previous algorithms exist, the new constructions require a considerably smaller field, which translates to potentially lower computational complexity. Our findings substantially enrich the knowledge on (r, δ)_a codes, leaving only two cases in which the existence of optimal codes is yet to be determined.

preprint 2012 arXiv

Error Correction for Cooperative Data Exchange

This paper considers the problem of error correction for a cooperative data exchange (CDE) system, where some clients are compromised or failed and send false messages. Assuming each client possesses a subset of the total messages, we analyze the error correction capability when every client is allowed to broadcast only one linearly-coded message. Our error correction capability bound determines the maximum number of clients that can be compromised or failed without jeopardizing the final decoding solution at each client. We show that deterministic, feasible linear codes exist that can achieve the derived bound. We also evaluate random linear codes, where the coding coefficients are drawn randomly, and then derive the probability for a client to withstand a certain number of compromised or failed peers and successfully deduce the complete message for any network size and any initial message distribution.

preprint 2012 arXiv

Exchanging Third-Party Information with Minimum Transmission Cost

In this paper, we consider the problem of minimizing the total transmission cost for exchanging channel state information. We propose a network coded cooperative data exchange scheme, such that the total transmission cost is minimized while each client can decode all the channel information held by all other clients. We first derive a necessary and sufficient condition for a feasible transmission. Based on the derived condition, there exists a feasible code design to guarantee that each client can decode the complete information. We further formulate the problem of minimizing the total transmission cost as an integer linear program. Finally, we discuss the probability that each client can decode the complete information with distributed random linear network coding.
