Researcher profile

Han Mao Kiah

Han Mao Kiah contributes to research discovery and scholarly infrastructure.

Published work

12 published items

preprint, 2026, arXiv

Bounds on Codes Correcting Transpositions of Consecutive Symbols

The problem of correcting transpositions (or swaps) of consecutive symbols in $q$-ary strings is studied. Lower bounds on asymptotically achievable rates of codes correcting $t = \tau n$ transpositions are derived. The first bound is obtained by analyzing the average cardinality of "transposition balls" and evaluating the appropriate version of the generalized Gilbert-Varshamov bound, while the second bound follows from a construction of codes correcting an arbitrary number of transpositions (i.e., zero-error codes). Asymptotic bounds on the cardinality of optimal codes correcting $t = \textrm{const}$ transpositions are also derived.
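
As a rough illustration of the "transposition ball" notion mentioned in this abstract, the following minimal sketch enumerates, by brute force, the strings reachable from a given string by at most $t$ swaps of consecutive symbols, and averages the ball size over all $q$-ary strings of length $n$. The function names are hypothetical, and the exact error model (swaps applied one after another, possibly overlapping) is an assumption rather than the paper's definition.

```python
from itertools import product

def adjacent_swaps(s):
    """Strings obtained from s by swapping one pair of consecutive, distinct symbols."""
    out = set()
    for i in range(len(s) - 1):
        if s[i] != s[i + 1]:
            out.add(s[:i] + s[i + 1] + s[i] + s[i + 2:])
    return out

def transposition_ball(s, t):
    """All strings reachable from s by at most t adjacent transpositions (assumed model)."""
    ball = {s}
    frontier = {s}
    for _ in range(t):
        # Expand by one more swap and keep only strings not seen before.
        frontier = {y for x in frontier for y in adjacent_swaps(x)} - ball
        ball |= frontier
    return ball

def average_ball_size(q, n, t):
    """Average ball cardinality over all q-ary strings of length n (q <= 10 here)."""
    alphabet = "0123456789"[:q]
    sizes = [len(transposition_ball("".join(w), t)) for w in product(alphabet, repeat=n)]
    return sum(sizes) / len(sizes)
```

The enumeration is exponential in $n$ and is meant only for small experiments; it says nothing about how the asymptotic bounds are derived.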

preprint, 2026, arXiv

Reconstructing Reed-Solomon Codes from Multiple Noisy Channel Outputs

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication setting in which a sender transmits a codeword and the receiver observes $K$ independent noisy versions of this codeword. In this work, we study the problem of efficient reconstruction when each of the $K$ outputs is corrupted by a $q$-ary discrete memoryless symmetric (DMS) substitution channel with substitution probability $p$. Focusing on Reed-Solomon (RS) codes, we adapt the Koetter-Vardy soft-decision decoding algorithm to obtain an efficient reconstruction algorithm. For sufficiently large blocklength and alphabet size, we derive an explicit rate threshold, depending only on $(p, K)$, such that the transmitted codeword can be reconstructed with arbitrarily small probability of error whenever the code rate $R$ lies below this threshold.

preprint, 2026, arXiv

Trace Repair Never Loses to Classical Repair: Exact and Explicit Helper Nodes Selection

Repairing Reed-Solomon codes with low bandwidth is a central challenge in distributed storage. Following the trace-repair framework of Guruswami and Wootters (2017), recent works by Lin (2023) and Liu-Wan-Xing (2024) provided significant improvements in bandwidth using two distinct ideas. Lin constructed a trace-repair scheme that requires no contribution from a set of predetermined nodes $\mathscr{S}$, while Liu-Wan-Xing identified linear dependencies among the downloaded traces, relating the number of dependent traces to the dimension of a subspace $\mathscr{W}_k$. In this work, we fully utilize and unify these ideas. We compute the exact dimension of $\mathscr{W}_{k,\mathscr{S}}$ (a generalization of $\mathscr{W}_k$). We identify the trade-off between the set size $|\mathscr{S}|$ and the dimension $\dim(\mathscr{W}_{k,\mathscr{S}})$. We provide an algorithm to find the combination that results in the lowest bandwidth. Furthermore, we provide an explicit choice of the helper nodes for the repair. Finally, we prove that our optimized scheme never loses to the classical repair scheme, establishing a bandwidth guarantee of at most $k\log|\mathbb{F}|$ bits for every dimension $k$ and every field $\mathbb{F}$, whenever trace repair is applicable.

preprint, 2022, arXiv

Two dimensional RC/Subarray Constrained Codes: Bounded Weight and Almost Balanced Weight

In this work, we study two types of constraints on two-dimensional binary arrays. In particular, given $p, \epsilon > 0$, we study (i) the $p$-bounded constraint: a binary vector of size $m$ is said to be $p$-bounded if its weight is at most $pm$, and (ii) the $\epsilon$-balanced constraint: a binary vector of size $m$ is said to be $\epsilon$-balanced if its weight is within $[(0.5-\epsilon)m, (0.5+\epsilon)m]$. Such constraints are crucial in data storage systems that treat data as two-dimensional (2D) rather than one-dimensional (1D), such as crossbar resistive memory arrays and holographic data storage. In this work, efficient encoding/decoding algorithms are presented for binary arrays so that the weight constraint (either the $p$-bounded constraint or the $\epsilon$-balanced constraint) is enforced over every row and every column, yielding 2D row-column (RC) constrained codes, or over every subarray, yielding 2D subarray constrained codes. While low-complexity designs have been proposed in the literature, mostly focusing on 2D RC constrained codes where $p = 1/2$ and $\epsilon = 0$, this work provides efficient coding methods that work for both 2D RC constrained codes and 2D subarray constrained codes and, more importantly, are applicable for arbitrary values of $p$ and $\epsilon$. Furthermore, for certain values of $p$ and $\epsilon$, we show that, for sufficiently large array size, there exist linear-time encoding/decoding algorithms that incur at most one redundant bit.
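
The two weight constraints defined in this abstract can be checked directly. The sketch below is only a verifier for the row-column (RC) case, not the encoders or decoders of the paper; the function names are hypothetical, and arrays are assumed to be lists of equal-length lists of 0/1 values.

```python
def weight(bits):
    """Hamming weight of a binary vector."""
    return sum(bits)

def is_p_bounded(bits, p):
    """True if the weight is at most p * m, where m is the vector length."""
    return weight(bits) <= p * len(bits)

def is_eps_balanced(bits, eps):
    """True if the weight lies in [(0.5 - eps) * m, (0.5 + eps) * m]."""
    m = len(bits)
    return (0.5 - eps) * m <= weight(bits) <= (0.5 + eps) * m

def satisfies_rc(array, predicate):
    """True if the predicate holds for every row and every column of the array."""
    rows = [list(r) for r in array]
    cols = [list(c) for c in zip(*array)]
    return all(predicate(v) for v in rows + cols)
```

For example, `satisfies_rc(array, lambda v: is_eps_balanced(v, 0.1))` tests the $\epsilon$-balanced RC constraint with $\epsilon = 0.1$. The comparisons use floating-point arithmetic, so boundary cases may need exact rational arithmetic in practice.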

preprint, 2020, arXiv

Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

We propose coding techniques that limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, \epsilon > 0$, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most $\ell$, (ii) GC-content constraint: the GC-content of each codeword is within $[0.5-\epsilon, 0.5+\epsilon]$, (iii) Error-correction: each codeword is capable of correcting a single deletion, a single insertion, or a single substitution error. For practical values of $\ell$ and $\epsilon$, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
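
Properties (i) and (ii) above are easy to test on a given strand. The following sketch is a constraint checker only, with hypothetical function names; it does not implement the encoders, decoders, or error correction described in the abstract, and it assumes a non-empty strand over the symbols A, T, C and G.

```python
from itertools import groupby

def max_homopolymer_run(strand):
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)

def gc_content(strand):
    """Fraction of bases that are G or C (strand assumed non-empty)."""
    return sum(base in "GC" for base in strand) / len(strand)

def satisfies_constraints(strand, ell, eps):
    """Check the runlength constraint (at most ell) and the GC-content window."""
    return (max_homopolymer_run(strand) <= ell
            and 0.5 - eps <= gc_content(strand) <= 0.5 + eps)
```

For instance, `satisfies_constraints("ACGTACGT", 3, 0.1)` holds, while a strand such as `"AAAACGCG"` fails the runlength constraint for $\ell = 3$.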

preprint, 2020, arXiv

Coding for Sequence Reconstruction for Single Edits

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. The common setup assumes the codebook to be the entire space and the problem is to determine the minimum number of distinct reads that is required to reconstruct the transmitted codeword. Motivated by modern storage devices, we study a variant of the problem where the number of noisy reads $N$ is fixed. Specifically, we design reconstruction codes that reconstruct a codeword from $N$ distinct noisy reads. We focus on channels that introduce a single edit error (i.e., a single substitution, insertion, or deletion) and their variants, and design reconstruction codes for all values of $N$. In particular, for the case of a single edit, we show that as the number of noisy reads increases, the number of redundant bits required can be gracefully reduced from $\log n+O(1)$ to $\log \log n+O(1)$, and then to $O(1)$, where $n$ denotes the length of a codeword. We also show that the redundancy of certain reconstruction codes is within one bit of optimality.
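
To make the role of the codebook concrete, the sketch below brute-forces the single-deletion case over binary strings: it lists every candidate of length $n$ whose single-deletion ball contains all of the given reads. The function names are hypothetical and the search is exponential in $n$, so this is an illustration of the problem setup under those assumptions, not the code constructions of the paper.

```python
from itertools import product

def deletion_ball(x):
    """All distinct strings obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def candidates(reads, n, codebook=None):
    """Strings of length n (from the codebook, default: all binary strings)
    that are consistent with every read under a single deletion."""
    if codebook is None:
        codebook = ("".join(w) for w in product("01", repeat=n))
    reads = set(reads)
    return [c for c in codebook if reads <= deletion_ball(c)]
```

With the whole space as the codebook, the two reads `110` and `010` are consistent with both `0110` and `1010`, so two reads do not suffice; restricting the codebook so that such pairs never both appear is what lets a fixed number of reads determine the codeword.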

preprint, 2020, arXiv

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications

The de Bruijn graph, its sequences, and their various generalizations have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences that generalize sequences in the de Bruijn graph is defined. These sequences can also be defined and viewed as constrained sequences. Hence, they will be called constrained de Bruijn sequences, and a set of such sequences will be called a constrained de Bruijn code. Several properties and alternative definitions for such codes are examined, and they are analyzed as generalized sequences in the de Bruijn graph (and its generalization) and as constrained sequences. Various enumeration techniques are used to compute the total number of sequences for any given set of parameters. A construction method of such codes from the theory of shift-register sequences is proposed. Finally, we show how these constrained de Bruijn sequences and codes can be applied in constructions of codes for correcting synchronization errors in the $\ell$-symbol read channel and in the racetrack memory channel. For this purpose, these codes are larger than previously known codes.

preprint, 2020, arXiv

Explicit Baranyai Partitions for Quadruples, Part I: Quadrupling Constructions

It is well known that, whenever $k$ divides $n$, the complete $k$-uniform hypergraph on $n$ vertices can be partitioned into disjoint perfect matchings. Equivalently, the set of $k$-subsets of an $n$-set can be partitioned into parallel classes so that each parallel class is a partition of the $n$-set. This result is known as Baranyai's theorem, which guarantees the existence of Baranyai partitions. Unfortunately, the proof of Baranyai's theorem uses network flow arguments, making this result non-explicit. In particular, there is no known method to produce Baranyai partitions in time and space that scale linearly with the number of hyperedges in the hypergraph. It is desirable for certain applications to have an explicit construction that generates Baranyai partitions in linear time. Such an efficient construction is known for $k=2$ and $k=3$. In this paper, we present an explicit recursive quadrupling construction for $k=4$ and $n=4t$, where $t \equiv 0,3,4,6,8,9 \pmod{12}$. In a follow-up paper (Part II), the other values of $t$, namely $t \equiv 1,2,5,7,10,11 \pmod{12}$, will be considered.
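
For the simplest case $k=2$, a Baranyai partition is a partition of the edges of the complete graph on $n$ vertices ($n$ even) into $n-1$ perfect matchings. The sketch below uses the classical circle (round-robin) method, which is one standard explicit construction for $k=2$; it is offered as an illustration and is not claimed to be the construction the abstract refers to, nor is it related to the quadrupling construction for $k=4$. The function name is hypothetical.

```python
def one_factorization(n):
    """Partition the edges of K_n (n even) into n - 1 perfect matchings
    using the circle method: vertices 0..n-2 sit on a circle, vertex n-1 is fixed."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    m = n - 1
    classes = []
    for r in range(m):
        # Pair the fixed vertex with r, then pair vertices symmetric around r.
        matching = [(r, m)]
        for i in range(1, n // 2):
            matching.append(((r + i) % m, (r - i) % m))
        classes.append(matching)
    return classes
```

Each of the $n-1$ parallel classes is produced in time linear in $n$, so the whole partition is generated in time linear in the number of edges, $n(n-1)/2$.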

preprint, 2020, arXiv

Optimal Reconstruction Codes for Deletion Channels

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads $N$ is fixed (Kiah et al. 2020). Notably, for the single-deletion channel, using $\log_2\log_2 n +O(1)$ redundant bits, we designed a reconstruction code of length $n$ that reconstructs codewords from two distinct noisy reads. In this work, we show that $\log_2\log_2 n -O(1)$ redundant bits are necessary for such reconstruction codes, thereby demonstrating the optimality of our previous construction. Furthermore, we show that these reconstruction codes can be used in $t$-deletion channels (with $t\ge 2$) to uniquely reconstruct codewords from $n^{t-1}+O\left(n^{t-2}\right)$ distinct noisy reads.

preprint, 2020, arXiv

Repairing Reed-Solomon Codes via Subspace Polynomials

We propose new repair schemes for Reed-Solomon codes that use subspace polynomials and hence generalize previous works in the literature that employ trace polynomials. The Reed-Solomon codes are over $\mathbb{F}_{q^\ell}$ and have redundancy $r = n-k \geq q^m$, $1\leq m\leq \ell$, where $n$ and $k$ are the code length and dimension, respectively. In particular, for one erasure, we show that our schemes can achieve optimal repair bandwidths whenever $n=q^\ell$ and $r = q^m$, for all $1 \leq m \leq \ell$. For two erasures, our schemes use the same bandwidth per erasure as the single-erasure schemes in the following cases: when $\ell/m$ is a power of $q$; when $\ell=q^a$ and $m=q^b-1>1$ ($a \geq b \geq 1$); and when $m\geq \ell/2$, $\ell$ is even, and $q$ is a power of two.

preprint, 2020, arXiv

Repairing Reed-Solomon Codes With Multiple Erasures

Despite their exceptional error-correcting properties, Reed-Solomon codes have been overlooked in distributed storage applications due to the common belief that they have poor repair bandwidth: A naive repair approach would require the whole file to be reconstructed in order to recover a single erased codeword symbol. In a recent work, Guruswami and Wootters (STOC'16) proposed a single-erasure repair method for Reed-Solomon codes that achieves the optimal repair bandwidth amongst all linear encoding schemes. Their key idea is to recover the erased symbol by collecting a sufficiently large number of its traces, each of which can be constructed from a number of traces of other symbols. We extend the trace collection technique to cope with two and three erasures.

preprint, 2019, arXiv

Robust Positioning Patterns with Low Redundancy

A robust positioning pattern is a large array that allows a mobile device to locate its position by reading a possibly corrupted small window around it. In this paper, we provide constructions of binary positioning patterns, equipped with efficient locating algorithms, that are robust to a constant number of errors and have redundancy within a constant factor of optimality. Furthermore, we modify our constructions to correct rank errors and obtain binary positioning patterns robust to any errors of rank less than a constant number. Additionally, we construct $q$-ary robust positioning sequences robust to a large number of errors, some of which have length attaining the upper bound. Our construction of binary positioning sequences that are robust to a constant number of errors has the least known redundancy amongst those explicit constructions with efficient locating algorithms. On the other hand, for binary robust positioning arrays, our construction is the first explicit construction whose redundancy is within a constant factor of optimality. The locating algorithms accompanying both constructions run in time cubic in sequence length or array dimension.
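
The one-dimensional version of the idea can be sketched as follows: a sequence works as a positioning sequence for window length $k$ if all of its length-$k$ windows are distinct, and it tolerates errors better the larger the minimum Hamming distance between windows. The code below measures that minimum distance and locates a read by nearest-window search. The function names are hypothetical, the distance-based notion of robustness is an assumption about the formal definition, and the exhaustive search is not the efficient locating algorithm of the paper.

```python
from itertools import combinations

def windows(seq, k):
    """All length-k windows of seq, in order of starting position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_window_distance(seq, k):
    """Minimum Hamming distance between any two windows (0 means a repeat)."""
    return min(hamming(a, b) for a, b in combinations(windows(seq, k), 2))

def locate(seq, read):
    """Starting position of the window closest to the (possibly corrupted) read."""
    k = len(read)
    return min(range(len(seq) - k + 1), key=lambda i: hamming(seq[i:i + k], read))
```

Under this assumed model, nearest-window search recovers the correct position whenever the number of errors in the read is less than half of `min_window_distance(seq, k)`.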