Researcher profile

Paul H. Siegel

Paul H. Siegel contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2023arXiv

Polar Codes with Local-Global Decoding

In this paper, we investigate a coupled polar code architecture that supports both local and global decoding. This local-global construction is motivated by practical applications in data storage and transmission where reduced-latency recovery of sub-blocks of the coded information is required. Local decoding allows random access to sub-blocks of the full code block. When local decoding performance is insufficient, global decoding provides improved data reliability. The coupling scheme incorporates a systematic outer polar code and a partitioned mapping of the outer codeword to semipolarized bit-channels of the inner polar codes. Error rate simulation results are presented for 2 and 4 sub-blocks. Design issues affecting the trade-off between local and global decoding performance are also discussed.

preprint2022arXiv

Adaptive Read Thresholds for NAND Flash

A primary source of increased read time on NAND flash comes from the fact that in the presence of noise, the flash medium must be read several times using different read threshold voltages for the decoder to succeed. This paper proposes an algorithm that uses a limited number of re-reads to characterize the noise distribution and recover the stored information. Both hard and soft decoding are considered. For hard decoding, the paper attempts to find a read threshold minimizing bit-error-rate (BER) and derives an expression for the resulting codeword-error-rate. For soft decoding, it shows that minimizing BER and minimizing codeword-error-rate are competing objectives in the presence of a limited number of allowed re-reads, and proposes a trade-off between the two. The proposed method does not require any prior knowledge about the noise distribution, but can take advantage of such information when it is available. Each read threshold is chosen based on the results of previous reads, following an optimal policy derived through a dynamic programming backward recursion. The method and results are studied from the perspective of an SLC Flash memory with Gaussian noise for each level but the paper explains how the method could be extended to other scenarios.

preprint2022arXiv

Rate-Constrained Shaping Codes for Finite-State Channels With Cost

Shaping codes are used to generate code sequences in which the symbols obey a prescribed probability distribution. They arise naturally in the context of source coding for noiseless channels with unequal symbol costs. Recently, shaping codes have been proposed to extend the lifetime of flash memory and reduce DNA synthesis time. In this paper, we study a general class of shaping codes for noiseless finite-state channels with cost and i.i.d. sources. We establish a relationship between the code rate and minimum average symbol cost. We then determine the rate that minimizes the average cost per source symbol (total cost). An equivalence is established between codes minimizing average symbol cost and codes minimizing total cost, and a separation theorem is proved, showing that optimal shaping can be achieved by a concatenation of optimal compression and optimal shaping for a uniform i.i.d. source.

preprint2022arXiv

Spatio-Temporal Modeling for Flash Memory Channels Using Conditional Generative Nets

We propose a data-driven approach to modeling the spatio-temporal characteristics of NAND flash memory read voltages using conditional generative networks. The learned model reconstructs read voltages from an individual memory cell based on the program levels of the cell and its surrounding cells, as well as the specified program/erase (P/E) cycling time stamp. We evaluate the model over a range of time stamps using the cell read voltage distributions, the cell level error rates, and the relative frequency of errors for patterns most susceptible to inter-cell interference (ICI) effects. We conclude that the model accurately captures the spatial and temporal features of the flash memory channel.

preprint2020arXiv

Coding over Sets for DNA Storage

In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions than can correct errors in such a storage system that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.

preprint2020arXiv

Covering Codes using Insertions or Deletions

A covering code is a set of codewords with the property that the union of balls, suitably defined, around these codewords covers an entire space. Generally, the goal is to find the covering code with the minimum size codebook. While most prior work on covering codes has focused on the Hamming metric, we consider the problem of designing covering codes defined in terms of either insertions or deletions. First, we provide new sphere-covering lower bounds on the minimum possible size of such codes. Then, we provide new existential upper bounds on the size of optimal covering codes for a single insertion or a single deletion that are tight up to a constant factor. Finally, we derive improved upper bounds for covering codes using $R\geq 2$ insertions or deletions. We prove that codes exist with density that is only a factor $O(R \log R)$ larger than the lower bounds for all fixed~$R$. In particular, our upper bounds have an optimal dependence on the word length, and we achieve asymptotic density matching the best known bounds for Hamming distance covering codes.

preprint2020arXiv

On Bi-Modal Constrained Coding

Bi-modal (respectively, multi-modal) constrained coding refers to an encoding model whereby a user input block can be mapped to two (respectively, multiple) codewords. In current storage applications, such as optical disks, multi-modal coding allows to achieve DC control, in addition to satisfying the runlength limited (RLL) constraint specified by the recording channel. In this work, a study is initiated on bi-modal fixed-length constrained encoders. Necessary and sufficient conditions are presented for the existence of such encoders for a given constraint. It is also shown that under somewhat stronger conditions, one can guarantee a bi-modal encoder with finite decoding delay.

preprint2020arXiv

On the Performance of Direct Shaping Codes

In this work, we study a recently proposed direct shaping code for flash memory. This rate-1 code is designed to reduce the wear for SLC (one bit per cell) flash by minimizing the average fraction of programmed cells when storing structured data. Then we describe an adaptation of this algorithm that provides data shaping for MLC (two bits per cell) flash memory. It makes use of a page-dependent cost model and is designed to be compatible with the standard procedure of row-by-row, page-based, wordline programming. We also give experimental results demonstrating the performance of MLC data shaping codes when applied to English and Chinese language text. We then study the potential error propagation properties of direct shaping codes when used in a noisy flash device. In particular, we model the error propagation as a biased random walk in a multidimensional space. We prove an upper bound on the error propagation probability and propose an algorithm that can numerically approach a lower bound. Finally, we study the asymptotic performance of direct shaping codes. We prove that the SLC direct shaping code is suboptimal in the sense that it can only achieve the minimum average cost for a rate-1 code under certain conditions on the source distribution.

preprint2020arXiv

PR-NN: RNN-based Detection for Coded Partial-Response Channels

In this paper, we investigate the use of recurrent neural network (RNN)-based detection of magnetic recording channels with inter-symbol interference (ISI). We refer to the proposed detection method, which is intended for recording channels with partial-response equalization, as Partial-Response Neural Network (PR-NN). We train bi-directional gated recurrent units (bi-GRUs) to recover the ISI channel inputs from noisy channel output sequences and evaluate the network performance when applied to continuous, streaming data. The computational complexity of PR-NN during the evaluation process is comparable to that of a Viterbi detector. The recording system on which the experiments were conducted uses a rate-2/3, (1,7) runlength-limited (RLL) code with an E2PR4 partial-response channel target. Experimental results with ideal PR signals show that the performance of PR-NN detection approaches that of Viterbi detection in additive white gaussian noise (AWGN). Moreover, the PR-NN detector outperforms Viterbi detection and achieves the performance of Noise-Predictive Maximum Likelihood (NPML) detection in additive colored noise (ACN) at different channel densities. A PR-NN detector trained with both AWGN and ACN maintains the performance observed under separate training. Similarly, when trained with ACN corresponding to two different channel densities, PR-NN maintains its performance at both densities. Experiments confirm that this robustness is consistent over a wide range of signal-to-noise ratios (SNRs). Finally, PR-NN displays robust performance when applied to a more realistic magnetic recording channel with MMSE-equalized Lorentzian signals.

preprint2020arXiv

Rate-Constrained Shaping Codes for Structured Sources

Shaping codes are used to encode information for use on channels with cost constraints. Applications include data transmission with a power constraint and, more recently, data storage on flash memories with a constraint on memory cell wear. In the latter application, system requirements often impose a rate constraint. In this paper, we study rate-constrained fixed-to-variable length shaping codes for noiseless, memoryless costly channels and general i.i.d. sources. The analysis relies on the theory of word-valued sources. We establish a relationship between the code expansion factor and minimum average symbol cost. We then determine the expansion factor that minimizes the average cost per source symbol (total cost), corresponding to a conventional optimal source code with cost. An equivalence is established between codes minimizing average symbol cost and codes minimizing total cost, and a separation theorem is proved, showing that optimal shaping can be achieved by a concatenation of optimal compression and optimal shaping for a uniform i.i.d. source. Shaping codes often incorporate, either explicitly or implicitly, some form of non-equiprobable signaling. We use our results to further explore the connections between shaping codes and codes that map a sequence of i.i.d. source symbols into an output sequence of symbols that are approximately independent and distributed according to a specified target distribution, such as distribution matching (DM) codes. Optimal DM codes are characterized in terms of a new performance measure - generalized expansion factor (GEF) - motivated by the costly channel perspective. The GEF is used to study DM codes that minimize informational divergence and normalized informational divergence.