Source author record

Han Mao Kiah

Han Mao Kiah appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · math.CO · Discrete Mathematics · Distributed, Parallel, and Cluster Computing · Emerging Technologies

Catalog footprint

What is connected

31 works

6 topics

4 close collaborators


Preprint · 2026 · arXiv

Bounds on Codes Correcting Transpositions of Consecutive Symbols

The problem of correcting transpositions (or swaps) of consecutive symbols in $q$-ary strings is studied. Lower bounds on asymptotically achievable rates of codes correcting $t = \tau n$ transpositions are derived. The first bound is obtained by analyzing the average cardinality of "transposition balls" and evaluating the appropriate version of the generalized Gilbert-Varshamov bound, while the second bound follows from a construction of codes correcting an arbitrary number of transpositions (i.e., zero-error codes). Asymptotic bounds on the cardinality of optimal codes correcting $t = \textrm{const}$ transpositions are also derived.
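For small parameters, the "transposition balls" mentioned above can be explored by brute force. The Python sketch below is an illustration only, not the authors' analysis: it collects every string reachable from a word by at most `t` swaps of adjacent symbols and averages the ball size over all $q$-ary strings of length $n$. The function names are hypothetical, and encoding symbols as the digit characters 0 to 9 (so $q \le 10$) is an assumption made for brevity.

```python
from itertools import product


def adjacent_swaps(word):
    """Yield each string obtained from `word` by one transposition of consecutive symbols."""
    for i in range(len(word) - 1):
        if word[i] != word[i + 1]:  # swapping two equal symbols changes nothing
            yield word[:i] + word[i + 1] + word[i] + word[i + 2:]


def transposition_ball(word, t):
    """All strings reachable from `word` with at most t adjacent transpositions (layered BFS)."""
    ball = {word}
    frontier = {word}
    for _ in range(t):
        frontier = {w for u in frontier for w in adjacent_swaps(u)} - ball
        ball |= frontier
    return ball


def average_ball_size(q, n, t):
    """Average ball cardinality over all q-ary strings of length n (exhaustive, tiny n only)."""
    alphabet = "0123456789"[:q]  # assumption: symbols are the digits 0..q-1, so q <= 10
    words = ["".join(p) for p in product(alphabet, repeat=n)]
    return sum(len(transposition_ball(w, t)) for w in words) / len(words)


print(sorted(transposition_ball("0011", 2)))  # ['0011', '0101', '0110', '1001']
```

Ball sizes depend strongly on the word (a constant word has a ball of size one), which is why an average over all words, rather than a single worst-case ball, is the natural quantity to feed into a Gilbert-Varshamov-type argument. Exhaustive enumeration grows as $q^n$, so this is only a sanity check for tiny lengths.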

Preprint · 2026 · arXiv

Reconstructing Reed-Solomon Codes from Multiple Noisy Channel Outputs

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication setting in which a sender transmits a codeword and the receiver observes $K$ independent noisy versions of this codeword. In this work, we study the problem of efficient reconstruction when each of the $K$ outputs is corrupted by a $q$-ary discrete memoryless symmetric (DMS) substitution channel with substitution probability $p$. Focusing on Reed-Solomon (RS) codes, we adapt the Koetter-Vardy soft-decision decoding algorithm to obtain an efficient reconstruction algorithm. For sufficiently large blocklength and alphabet size, we derive an explicit rate threshold, depending only on $(p, K)$, such that the transmitted codeword can be reconstructed with arbitrarily small probability of error whenever the code rate $R$ lies below this threshold.

Preprint · 2026 · arXiv

Trace Repair Never Loses to Classical Repair: Exact and Explicit Helper Nodes Selection

Repairing Reed-Solomon codes with low bandwidth is a central challenge in distributed storage. Following the trace-repair framework of Guruswami and Wootters (2017), recent works by Lin (2023) and Liu-Wan-Xing (2024) provided significant improvements in bandwidth using two distinct ideas. Lin constructed a trace-repair scheme that requires no contribution from a set of predetermined nodes $\mathscr{S}$, while Liu-Wan-Xing identified linear dependencies among the downloaded traces, relating the number of dependent traces to the dimension of a subspace $\mathscr{W}_k$. In this work, we fully utilize and unify these ideas. We compute the exact dimension of $\mathscr{W}_{k,\mathscr{S}}$ (a generalization of $\mathscr{W}_k$). We identify the trade-off between the set size $|\mathscr{S}|$ and the dimension $\dim(\mathscr{W}_{k,\mathscr{S}})$. We provide an algorithm to find the combination that results in the lowest bandwidth. Furthermore, we provide an explicit choice of the helper nodes for the repair. Finally, we prove that our optimized scheme never loses to the classical repair scheme, establishing a bandwidth guarantee of at most $k\log|\mathbb{F}|$ bits for all dimensions $k$ and fields $\mathbb{F}$, whenever trace repair is applicable.

Preprint · 2022 · arXiv

Two dimensional RC/Subarray Constrained Codes: Bounded Weight and Almost Balanced Weight

In this work, we study two types of constraints on two-dimensional binary arrays. In particular, given $p, \epsilon > 0$, we study (i) the $p$-bounded constraint: a binary vector of size $m$ is said to be $p$-bounded if its weight is at most $pm$, and (ii) the $\epsilon$-balanced constraint: a binary vector of size $m$ is said to be $\epsilon$-balanced if its weight is within $[(0.5-\epsilon)m, (0.5+\epsilon)m]$. Such constraints are crucial in several data storage systems that regard the information data as two-dimensional (2D) rather than one-dimensional (1D), such as crossbar resistive memory arrays and holographic data storage. In this work, efficient encoding/decoding algorithms are presented for binary arrays so that the weight constraint (either the $p$-bounded or the $\epsilon$-balanced constraint) is enforced over every row and every column, referred to as 2D row-column (RC) constrained codes, or over every subarray, referred to as 2D subarray constrained codes. While low-complexity designs have been proposed in the literature, mostly focusing on 2D RC constrained codes where $p = 1/2$ and $\epsilon = 0$, this work provides efficient coding methods that work for both 2D RC constrained codes and 2D subarray constrained codes, and, more importantly, the methods are applicable for arbitrary values of $p$ and $\epsilon$. Furthermore, for certain values of $p$ and $\epsilon$, we show that, for sufficiently large array size, there exist linear-time encoding/decoding algorithms that incur at most one redundant bit.
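The two weight constraints and the row-column (RC) condition can be written as simple predicates. The following sketch, with hypothetical helper names, only checks whether a given binary array satisfies them; it is not the paper's encoder, and the subarray variant is not shown.

```python
def is_p_bounded(vec, p):
    """A binary vector of size m is p-bounded if its weight is at most p*m."""
    return sum(vec) <= p * len(vec)


def is_eps_balanced(vec, eps):
    """A binary vector of size m is eps-balanced if its weight lies in [(0.5-eps)m, (0.5+eps)m]."""
    m = len(vec)
    return (0.5 - eps) * m <= sum(vec) <= (0.5 + eps) * m


def satisfies_rc(array, check):
    """True if the 1D predicate `check` holds for every row and every column of `array`."""
    columns = [list(col) for col in zip(*array)]
    return all(check(row) for row in array) and all(check(col) for col in columns)


A = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
print(satisfies_rc(A, lambda v: is_eps_balanced(v, 0.0)))  # True
print(satisfies_rc(A, lambda v: is_p_bounded(v, 0.25)))    # False
```

Passing the 1D constraint as a function keeps the RC check independent of which weight constraint is enforced, mirroring how the abstract treats the $p$-bounded and $\epsilon$-balanced cases in parallel.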

Preprint · 2020 · arXiv

Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

We propose coding techniques that limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, \epsilon > 0$, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most $\ell$, (ii) GC-content constraint: the GC-content of each codeword is within $[0.5-\epsilon, 0.5+\epsilon]$, (iii) Error correction: each codeword is capable of correcting a single deletion, single insertion, or single substitution error. For practical values of $\ell$ and $\epsilon$, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
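Properties (i) and (ii) can be checked directly on a candidate strand. The sketch below, with hypothetical function names, is a validator only: it does not implement the paper's encoders/decoders or the single-edit error correction of property (iii).

```python
from itertools import groupby


def max_homopolymer_run(strand):
    """Length of the longest run of identical consecutive bases (0 for an empty strand)."""
    return max((len(list(run)) for _, run in groupby(strand)), default=0)


def gc_content(strand):
    """Fraction of bases in the strand that are G or C."""
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand, ell, eps):
    """Check (i) max homopolymer run <= ell and (ii) GC-content in [0.5 - eps, 0.5 + eps]."""
    return (max_homopolymer_run(strand) <= ell
            and 0.5 - eps <= gc_content(strand) <= 0.5 + eps)


print(satisfies_constraints("AACGTTGC", 2, 0.1))  # True
print(satisfies_constraints("AACGTTGC", 1, 0.1))  # False: "AA" and "TT" are runs of length 2
```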

Preprint · 2020 · arXiv

Coding for Sequence Reconstruction for Single Edits

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. The common setup assumes the codebook to be the entire space, and the problem is to determine the minimum number of distinct reads that are required to reconstruct the transmitted codeword. Motivated by modern storage devices, we study a variant of the problem where the number of noisy reads $N$ is fixed. Specifically, we design reconstruction codes that reconstruct a codeword from $N$ distinct noisy reads. We focus on channels that introduce a single edit error (i.e., a single substitution, insertion, or deletion) and their variants, and design reconstruction codes for all values of $N$. In particular, for the case of a single edit, we show that as the number of noisy reads increases, the number of redundant bits required can be gracefully reduced from $\log n+O(1)$ to $\log \log n+O(1)$, and then to $O(1)$, where $n$ denotes the length of a codeword. We also show that the redundancy of certain reconstruction codes is within one bit of optimality.
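To make the reconstruction setting concrete, the sketch below brute-forces the single-deletion case with the entire binary space as codebook: given a set of distinct reads, it lists every length-$n$ word that could have produced all of them. It is an illustrative exhaustive decoder with hypothetical names, not the reconstruction codes of the paper; it only shows how each additional distinct read shrinks the candidate set, which is what allows less redundancy as $N$ grows.

```python
from itertools import product


def deletion_ball(word):
    """All distinct strings obtained by deleting exactly one symbol of `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def candidates(reads, n, codebook=None):
    """Words of length n whose single-deletion ball contains every read.

    By default the codebook is the whole binary space (exhaustive search, small n only).
    """
    if codebook is None:
        codebook = ("".join(bits) for bits in product("01", repeat=n))
    reads = set(reads)
    return sorted(c for c in codebook if reads <= deletion_ball(c))


for reads in [{"110"}, {"110", "010"}, {"110", "010", "011"}]:
    print(len(reads), candidates(reads, 4))
```

With $n = 4$, the single read 110 leaves five candidates, adding 010 leaves two (0110 and 1010), and adding 011 pins down 0110. Restricting the codebook (the optional `codebook` argument) is the lever a reconstruction code uses to make fewer reads sufficient.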

Preprint · 2020 · arXiv

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications

The de Bruijn graph, its sequences, and their various generalizations have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize sequences in the de Bruijn graph are defined. These sequences can also be defined and viewed as constrained sequences. Hence, they will be called constrained de Bruijn sequences, and a set of such sequences will be called a constrained de Bruijn code. Several properties and alternative definitions for such codes are examined, and they are analyzed as generalized sequences in the de Bruijn graph (and its generalization) and as constrained sequences. Various enumeration techniques are used to compute the total number of sequences for any given set of parameters. A construction method of such codes from the theory of shift-register sequences is proposed. Finally, we show how these constrained de Bruijn sequences and codes can be applied in constructions of codes for correcting synchronization errors in the $\ell$-symbol read channel and in the racetrack memory channel. For this purpose, these codes are larger than previously known codes.

Preprint · 2020 · arXiv

Explicit Baranyai Partitions for Quadruples, Part I: Quadrupling Constructions

It is well known that, whenever $k$ divides $n$, the complete $k$-uniform hypergraph on $n$ vertices can be partitioned into disjoint perfect matchings. Equivalently, the set of $k$-subsets of an $n$-set can be partitioned into parallel classes so that each parallel class is a partition of the $n$-set. This result is known as Baranyai's theorem, which guarantees the existence of Baranyai partitions. Unfortunately, the proof of Baranyai's theorem uses network flow arguments, making this result non-explicit. In particular, there is no known method to produce Baranyai partitions in time and space that scale linearly with the number of hyperedges in the hypergraph. It is desirable for certain applications to have an explicit construction that generates Baranyai partitions in linear time. Such an efficient construction is known for $k=2$ and $k=3$. In this paper, we present an explicit recursive quadrupling construction for $k=4$ and $n=4t$, where $t \equiv 0,3,4,6,8,9 ~(\text{mod}~12)$. In a follow-up paper (Part II), the other values of $t$, namely $t \equiv 1,2,5,7,10,11 ~(\text{mod}~12)$, will be considered.
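For comparison, the $k=2$ case mentioned above has a classical explicit solution: the circle (round-robin) method, which partitions the edges of the complete graph $K_n$, $n$ even, into $n-1$ perfect matchings in time linear in the number of edges. The sketch below implements that textbook construction under a hypothetical function name; it is not the quadrupling construction for $k=4$ presented in the paper.

```python
def round_robin_partition(n):
    """Partition all 2-subsets of {0, ..., n-1} into n-1 perfect matchings (n even).

    Circle method: vertex n-1 stays fixed; in round r it is matched with r, and the
    remaining vertices 0..n-2 (an odd number m of them, placed on a circle) are paired
    as {r+i, r-i} mod m.
    """
    if n % 2 != 0 or n < 2:
        raise ValueError("n must be a positive even integer")
    m = n - 1
    classes = []
    for r in range(m):
        matching = [frozenset({r, n - 1})]
        for i in range(1, n // 2):
            matching.append(frozenset({(r + i) % m, (r - i) % m}))
        classes.append(matching)
    return classes


for parallel_class in round_robin_partition(4):
    print(sorted(sorted(edge) for edge in parallel_class))
```

For $n = 4$ this prints the three parallel classes $\{0,3\},\{1,2\}$; $\{0,2\},\{1,3\}$; and $\{0,1\},\{2,3\}$. Because $m = n-1$ is odd, each pair $\{a, b\}$ of circle vertices lands in exactly one round (the one with $a + b \equiv 2r \pmod m$), so every edge is used once.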

Preprint · 2020 · arXiv

Optimal Reconstruction Codes for Deletion Channels

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads $N$ is fixed (Kiah et al. 2020). Of significance, for the single-deletion channel, using $\log_2\log_2 n +O(1)$ redundant bits, we designed a reconstruction code of length $n$ that reconstructs codewords from two distinct noisy reads. In this work, we show that $\log_2\log_2 n -O(1)$ redundant bits are necessary for such reconstruction codes, thereby demonstrating the optimality of our previous construction. Furthermore, we show that these reconstruction codes can be used in $t$-deletion channels (with $t\ge 2$) to uniquely reconstruct codewords from $n^{t-1}+O\left(n^{t-2}\right)$ distinct noisy reads.

Preprint · 2020 · arXiv

Repairing Reed-Solomon Codes via Subspace Polynomials

We propose new repair schemes for Reed-Solomon codes that use subspace polynomials and hence generalize previous works in the literature that employ trace polynomials. The Reed-Solomon codes are over $\mathbb{F}_{q^\ell}$ and have redundancy $r = n-k \geq q^m$, $1\leq m\leq \ell$, where $n$ and $k$ are the code length and dimension, respectively. In particular, for one erasure, we show that our schemes can achieve optimal repair bandwidths whenever $n=q^\ell$ and $r = q^m$, for all $1 \leq m \leq \ell$. For two erasures, our schemes use the same bandwidth per erasure as the single-erasure schemes when $\ell/m$ is a power of $q$, when $\ell=q^a$ and $m=q^b-1>1$ ($a \geq b \geq 1$), and when $m\geq \ell/2$ with $\ell$ even and $q$ a power of two.

Preprint · 2020 · arXiv

Repairing Reed-Solomon Codes With Multiple Erasures

Despite their exceptional error-correcting properties, Reed-Solomon codes have been overlooked in distributed storage applications due to the common belief that they have poor repair bandwidth: A naive repair approach would require the whole file to be reconstructed in order to recover a single erased codeword symbol. In a recent work, Guruswami and Wootters (STOC'16) proposed a single-erasure repair method for Reed-Solomon codes that achieves the optimal repair bandwidth amongst all linear encoding schemes. Their key idea is to recover the erased symbol by collecting a sufficiently large number of its traces, each of which can be constructed from a number of traces of other symbols. We extend the trace collection technique to cope with two and three erasures.

Preprint · 2019 · arXiv

Robust Positioning Patterns with Low Redundancy

A robust positioning pattern is a large array that allows a mobile device to locate its position by reading a possibly corrupted small window around it. In this paper, we provide constructions of binary positioning patterns, equipped with efficient locating algorithms, that are robust to a constant number of errors and have redundancy within a constant factor of optimality. Furthermore, we modify our constructions to correct rank errors and obtain binary positioning patterns robust to any errors of rank less than a constant number. Additionally, we construct $q$-ary positioning sequences robust to a large number of errors, some of which have lengths attaining the upper bound. Our construction of binary positioning sequences that are robust to a constant number of errors has the least known redundancy amongst explicit constructions with efficient locating algorithms. On the other hand, for binary robust positioning arrays, our construction is the first explicit construction whose redundancy is within a constant factor of optimality. The locating algorithms accompanying both constructions run in time cubic in sequence length or array dimension.

Preprint · 2016 · arXiv

Asymmetric Lee Distance Codes for DNA-Based Storage

We consider a new family of codes, termed asymmetric Lee distance codes, that arise in the design and implementation of DNA-based storage systems and systems with parallel string transmission protocols. The codewords are defined over a quaternary alphabet, although the results carry over to other alphabet sizes; furthermore, symbol confusability is dictated by their underlying binary representation. Our contributions are two-fold. First, we demonstrate that the new distance represents a linear combination of the Lee and Hamming distance and derive upper bounds on the size of the codes under this metric based on linear programming techniques. Second, we propose a number of code constructions which imply lower bounds.

Preprint · 2016 · arXiv

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
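The number of profile vectors can be checked by brute force for very small parameters. The sketch below assumes that the profile vector of a $q$-ary word counts, for each of the $q^\ell$ possible $\ell$-grams, its overlapping occurrences in the (non-cyclic) word; this reading of the definition and the function names are assumptions for illustration, and the code is neither the paper's enumeration method nor its encoders.

```python
from collections import Counter
from itertools import product


def profile_vector(word, ell, q):
    """Occurrence counts of every q-ary ell-gram in `word`, in lexicographic ell-gram order.

    Assumption: overlapping occurrences in a linear (non-cyclic) word are counted.
    """
    counts = Counter(tuple(word[i:i + ell]) for i in range(len(word) - ell + 1))
    return tuple(counts[gram] for gram in product(range(q), repeat=ell))


def count_profile_vectors(q, ell, n):
    """Number of distinct profile vectors over all q-ary words of length n (brute force)."""
    return len({profile_vector(w, ell, q) for w in product(range(q), repeat=n)})


print(count_profile_vectors(2, 2, 3))  # 7: the words 010 and 101 share a profile
```

Since distinct words can share a profile, the count is at most $q^n$; exponents such as $\kappa$ in $q^{\kappa n}$ measure how close to $q^n$ it stays.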

Preprint · 2016 · arXiv

Weakly Mutually Uncorrelated Codes

We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based storage systems and synchronization protocols. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. In addition, WMU sequences used in DNA-based storage systems are required to have balanced compositions of symbols and to be at large mutual Hamming distance from each other. We present a number of constructions for balanced, error-correcting WMU codes using Dyck paths, Knuth's balancing principle, prefix synchronized and cyclic codes.

Preprint · 2015 · arXiv

Codes for DNA Sequence Profiles

We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of "Markov type" analysis. Here, we explain the connection between the reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.

Preprint · 2015 · arXiv

Codes for DNA Storage Channels

We consider the problem of assembling a sequence based on a collection of its substrings observed through a noisy channel. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on a collection of their substrings observed through a noisy channel. We explain the connection between the sequence reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.

Preprint · 2015 · arXiv

DNA-Based Storage: Trends and Methods

We provide an overview of current approaches to DNA-based storage system design and accompanying synthesis, sequencing and editing methods. We also introduce and analyze a suite of new constrained coding schemes for both archival and random access DNA storage channels. The mathematical basis of our work is the construction and design of sequences over discrete alphabets that avoid pre-specified address patterns, have balanced base content, and exhibit other relevant substring constraints. These schemes adapt the stored signals to the DNA medium and thereby reduce the inherent error-rate of the system.

Preprint · 2015 · arXiv

Local Codes with Addition Based Repair

We consider the complexities of repair algorithms for locally repairable codes and propose a class of codes that repair single node failures using addition operations only, or codes with addition based repair. We construct two families of codes with addition based repair. The first family attains distance one less than the Singleton-like upper bound, while the second family attains the Singleton-like upper bound.

Preprint · 2015 · arXiv

Locally Encodable and Decodable Codes for Distributed Storage Systems

We consider the locality of encoding and decoding operations in distributed storage systems (DSS), and propose a new class of codes, called locally encodable and decodable codes (LEDC), that provides a higher degree of operational locality compared to currently known codes. For a given locality structure, we derive an upper bound on the global distance and demonstrate the existence of an optimal LEDC for sufficiently large field size. In addition, we also construct two families of optimal LEDC for fields with size linear in code length.

Preprint · 2015 · arXiv

Product Construction of Affine Codes

Binary matrix codes with restricted row and column weights are a desirable method of coded modulation for power line communication. In this work, we construct such matrix codes that are obtained as products of affine codes, that is, cosets of binary linear codes. Additionally, the constructions have the property that they are systematic. Subsequently, we generalize our construction to irregular products of affine codes, where the component codes are affine codes of different rates.

Preprint · 2014 · arXiv

Constructions of Optimal and Near-Optimal Multiply Constant-Weight Codes

Multiply constant-weight codes (MCWCs) have recently been studied to improve the reliability of certain physically unclonable function responses. In this paper, we give combinatorial constructions for MCWCs which yield several new infinite families of optimal MCWCs. Furthermore, we demonstrate that the Johnson-type upper bounds of MCWCs are asymptotically tight for fixed weights and distances. Finally, we provide bounds and constructions of two-dimensional MCWCs.

Preprint · 2014 · arXiv

Decompositions of Edge-Colored Digraphs: A New Technique in the Construction of Constant-Weight Codes and Related Families

We demonstrate that certain Johnson-type bounds are asymptotically exact for a variety of classes of codes, namely, constant-composition codes, nonbinary constant-weight codes and multiply constant-weight codes. This was achieved via an interesting application of the theory of decomposition of edge-colored digraphs.

Preprint · 2014 · arXiv

Generalized Balanced Tournament Packings and Optimal Equitable Symbol Weight Codes for Power Line Communications

Generalized balanced tournament packings (GBTPs) extend the concept of generalized balanced tournament designs introduced by Lamken and Vanstone (1989). In this paper, we establish the connection between GBTPs and a class of codes called equitable symbol weight codes. The latter were recently demonstrated to optimize the performance against narrowband noise in a general coded modulation scheme for power line communications. By constructing classes of GBTPs, we establish infinite families of optimal equitable symbol weight codes with code lengths greater than the alphabet size and whose ratio of narrowband-noise error-correcting capability to code length does not diminish to zero as the length grows.

Preprint · 2014 · arXiv

Multiply Constant-Weight Codes and the Reliability of Loop Physically Unclonable Functions

We introduce the class of multiply constant-weight codes to improve the reliability of certain physically unclonable function (PUF) responses. We extend classical coding methods to construct multiply constant-weight codes from known $q$-ary and constant-weight codes. Analogues of Johnson bounds are derived and are shown to be asymptotically tight to a constant factor under certain conditions. We also examine the rates of the multiply constant-weight codes and, interestingly, demonstrate that these rates are the same as those of constant-weight codes of suitable parameters. Asymptotic analysis of our code constructions is provided.

Preprint · 2014 · arXiv

Synchronizing Edits in Distributed Storage Networks

We consider the problem of synchronizing data in distributed storage networks under an edit model that includes deletions and insertions. We present two modifications of MDS, regenerating and locally repairable codes that allow updates in the parity-check values to be performed with one round of communication at low bit rates and using small storage overhead. Our main contributions are novel protocols for synchronizing both hot and semi-static data and protocols for data deduplication applications, based on intermediary permutation, Vandermonde and Cauchy matrix coding.

Preprint · 2013 · arXiv

Cross-Bifix-Free Codes Within a Constant Factor of Optimality

A cross-bifix-free code is a set of words in which no proper prefix of any word is a suffix of any word in the set. Cross-bifix-free codes arise in the study of distributed sequences for frame synchronization. We provide a new construction of cross-bifix-free codes which generalizes the construction in Bajic (2007) to longer code lengths and to any alphabet size. The codes are shown to be nearly optimal in size. We also establish new results on Fibonacci sequences, which are used in estimating the size of the cross-bifix-free codes.
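The defining property can be verified directly. The sketch below reads "prefix" as a proper prefix (lengths $1$ to $n-1$) of equal-length words and compares prefixes against suffixes of all words in the set, including the word itself. The function name is hypothetical, and the code is a verifier only, not the construction from the paper.

```python
def is_cross_bifix_free(code):
    """True if no proper prefix of any word equals a suffix of any word in the code.

    Assumes a non-empty code whose words all have the same length n; the word itself
    is included among the "any word" comparisons.
    """
    words = list(code)
    n = len(words[0])
    for length in range(1, n):
        prefixes = {w[:length] for w in words}
        suffixes = {w[-length:] for w in words}
        if prefixes & suffixes:
            return False
    return True


print(is_cross_bifix_free({"11010", "11000"}))  # True
print(is_cross_bifix_free({"110", "100"}))      # False: "10" is a prefix of 100 and a suffix of 110
```

Comparing whole sets of prefixes and suffixes per length costs $O(n \cdot |C|)$ set operations, which is adequate for checking small examples.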

Preprint · 2013 · arXiv

Importance of Symbol Equity in Coded Modulation for Power Line Communications

The use of multiple frequency shift keying modulation with permutation codes addresses the problem of permanent narrowband noise disturbance in a power line communications system. In this paper, we extend this coded modulation scheme based on permutation codes to general codes and introduce an additional new parameter that more precisely captures a code's performance against permanent narrowband noise. As a result, we define a new class of codes, namely, equitable symbol weight codes, which are optimal with respect to this measure.

Preprint · 2013 · arXiv

Maximum Distance Separable Codes for Symbol-Pair Read Channels

We study (symbol-pair) codes for symbol-pair read channels introduced recently by Cassuto and Blaum (2010). A Singleton-type bound on symbol-pair codes is established and infinite families of optimal symbol-pair codes are constructed. These codes are maximum distance separable (MDS) in the sense that they meet the Singleton-type bound. In contrast to classical codes, where all known $q$-ary MDS codes have length $O(q)$, we show that $q$-ary MDS symbol-pair codes can have length $\Omega(q^2)$. In addition, we completely determine the existence of MDS symbol-pair codes for certain parameters.
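A sketch of the pair distance, assuming the cyclic symbol-pair read vector of Cassuto and Blaum, in which a word $x = (x_0, \ldots, x_{n-1})$ is read as $((x_0,x_1), (x_1,x_2), \ldots, (x_{n-1},x_0))$ and the pair distance is the Hamming distance between these pair vectors. The function names are hypothetical; the code computes distances only and does not construct the MDS symbol-pair codes.

```python
def pair_read_vector(x):
    """Cyclic symbol-pair read vector ((x0, x1), (x1, x2), ..., (x_{n-1}, x0))."""
    n = len(x)
    return [(x[i], x[(i + 1) % n]) for i in range(n)]


def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def pair_distance(x, y):
    """Pair distance: Hamming distance between the pair read vectors of x and y."""
    return hamming(pair_read_vector(x), pair_read_vector(y))


def pair_minimum_distance(code):
    """Minimum pair distance over all pairs of distinct codewords (at least two words)."""
    words = list(code)
    return min(pair_distance(a, b) for i, a in enumerate(words) for b in words[i + 1:])


x, y = (0, 0, 0, 0), (1, 1, 0, 0)
print(hamming(x, y), pair_distance(x, y))  # 2 3
```

Because each symbol appears in two adjacent pairs, one symbol error can corrupt up to two pair reads, so the pair distance of two words is generally larger than their Hamming distance; this is the extra room that lets symbol-pair MDS codes be longer than classical ones.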

Preprint · 2013 · arXiv

Pure Asymmetric Quantum MDS Codes from CSS Construction: A Complete Characterization

Using the Calderbank-Shor-Steane (CSS) construction, pure $q$-ary asymmetric quantum error-correcting codes attaining the quantum Singleton bound are constructed. Such codes are called pure CSS asymmetric quantum maximum distance separable (AQMDS) codes. Assuming the validity of the classical MDS Conjecture, pure CSS AQMDS codes of all possible parameters are accounted for.

Preprint · 2012 · arXiv

Estimates on the Size of Symbol Weight Codes

The study of codes for power line communication has garnered much interest over the past decade. Various types of codes, such as permutation codes, frequency permutation arrays, and constant composition codes, have been proposed over the years. In this work, we study a type of code called bounded symbol weight codes, first introduced by Versfeld et al. in 2005, and a related family of codes that we term constant symbol weight codes. We provide new upper and lower bounds on the size of bounded symbol weight and constant symbol weight codes. We also give direct and recursive constructions of codes for certain parameters.
