Source author record

Yongjune Kim

Yongjune Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Theory math.IT Hardware Architecture Machine Learning Artificial Intelligence eess.SP Emerging Technologies

Catalog footprint

What is connected

9works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

Long-context inference is increasingly a memory-traffic problem. The culprit is the key--value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding step. Rotation-based scalar codecs meet this systems constraint by storing a norm, applying a shared random rotation, and quantizing one coordinate at a time. They are universal and random-access, but they discard the geometry created by the normalization step. After a Haar rotation, a block of $k$ consecutive coordinates is not a product source; it is a spherical-Beta source on the unit ball. We introduce \textsc{FibQuant}, a universal fixed-rate vector quantizer that keeps the same normalize--rotate--store interface while replacing scalar tables by a shared radial--angular codebook matched to this canonical source. The codebook combines Beta-quantile radii, Fibonacci\,/\,Roberts--Kronecker quasi-uniform directions, and multi-restart Lloyd--Max refinement. We prove that the resulting vector code strictly improves on its scalar product specialization at matched rate, with a high-rate gain that separates into a cell-shaping factor and a density-matching factor. The same construction gives a dense rate axis, including fractional-bit and sub-one-bit operating points, without calibration or variable-length addresses. On GPT-2 small KV caches, \textsc{FibQuant} traces a memory--fidelity frontier from $5\times$ compression at $0.99$ attention cosine similarity to $34\times$ at $0.95$. End-to-end on TinyLlama-1.1B, it is within $0.10$ perplexity of fp16 at $4\times$ compression and has $3.6\times$ lower perplexity than scalar \textsc{TurboQuant} at $b = 2$ ($8\times$ compression), where scalar random-access quantization begins to fail.

preprint2020arXiv

Optimizing the Write Fidelity of MRAMs

Magnetic random-access memory (MRAM) is a promising memory technology due to its high density, non-volatility, and high endurance. However, achieving high memory fidelity incurs significant write-energy costs, which should be reduced for large-scale deployment of MRAMs. In this paper, we formulate an optimization problem for maximizing the memory fidelity given energy constraints, and propose a biconvex optimization approach to solve it. The basic idea is to allocate non-uniform write pulses depending on the importance of each bit position. The fidelity measure we consider is minimum mean squared error (MSE), for which we propose an iterative water-filling algorithm. Although the iterative algorithm does not guarantee global optimality, we can choose a proper starting point that decreases the MSE exponentially and guarantees fast convergence. For an 8-bit accessed word, the proposed algorithm reduces the MSE by a factor of 21.

preprint2019arXiv

On the Optimal Refresh Power Allocation for Energy-Efficient Memories

Refresh is an important operation to prevent loss of data in dynamic random-access memory (DRAM). However, frequent refresh operations incur considerable power consumption and degrade system performance. Refresh power cost is especially significant in high-capacity memory devices and battery-powered edge/mobile applications. In this paper, we propose a principled approach to optimizing the refresh power allocation. Given a model for the bit error rate dependence on power, we formulate a convex optimization problem to minimize the word mean squared error for a refresh power constraint; hence we can guarantee the optimality of the obtained refresh power allocations. In addition, we provide an integer programming problem to optimize the discrete refresh interval assignments. For an 8-bit accessed word, numerical results show that the optimized nonuniform refresh intervals reduce the refresh power by 29% at a peak signal-to-noise ratio of 50dB compared to the uniform assignment.

preprint2016arXiv

Understanding the Energy and Precision Requirements for Online Learning

It is well-known that the precision of data, hyperparameters, and internal representations employed in learning systems directly impacts its energy, throughput, and latency. The precision requirements for the training algorithm are also important for systems that learn on-the-fly. Prior work has shown that the data and hyperparameters can be quantized heavily without incurring much penalty in classification accuracy when compared to floating point implementations. These works suffer from two key limitations. First, they assume uniform precision for the classifier and for the training algorithm and thus miss out on the opportunity to further reduce precision. Second, prior works are empirical studies. In this article, we overcome both these limitations by deriving analytical lower bounds on the precision requirements of the commonly employed stochastic gradient descent (SGD) on-line learning algorithm in the specific context of a support vector machine (SVM). Lower bounds on the data precision are derived in terms of the the desired classification accuracy and precision of the hyperparameters used in the classifier. Additionally, lower bounds on the hyperparameter precision in the SGD training algorithm are obtained. These bounds are validated using both synthetic and the UCI breast cancer dataset. Additionally, the impact of these precisions on the energy consumption of a fixed-point SVM with on-line training is studied.

preprint2015arXiv

Coding scheme for 3D vertical flash memory

Recently introduced 3D vertical flash memory is expected to be a disruptive technology since it overcomes scaling challenges of conventional 2D planar flash memory by stacking up cells in the vertical direction. However, 3D vertical flash memory suffers from a new problem known as fast detrapping, which is a rapid charge loss problem. In this paper, we propose a scheme to compensate the effect of fast detrapping by intentional inter-cell interference (ICI). In order to properly control the intentional ICI, our scheme relies on a coding technique that incorporates the side information of fast detrapping during the encoding stage. This technique is closely connected to the well-known problem of coding in a memory with defective cells. Numerical results show that the proposed scheme can effectively address the problem of fast detrapping.

preprint2014arXiv

Writing on dirty flash memory

The most important challenge in the scaling down of flash memory is its increased inter-cell interference (ICI). If side information about ICI is known to the encoder, the flash memory channel can be viewed as similar to Costa's "writing on dirty paper (dirty paper coding)." We first explain why flash memories are dirty due to ICI. We then show that "dirty flash memory" can be changed into "memory with defective cells" model by using only one pre-read operation. The asymmetry between write and erase operations in flash memory plays an important role in this change. Based on the "memory with defective cells" model, we show that additive encoding can significantly improve the probability of decoding failure by using the side information.

preprint2013arXiv

Coding for Memory with Stuck-at Defects

In this paper, we propose an encoding scheme for partitioned linear block codes (PLBC) which mask the stuck-at defects in memories. In addition, we derive an upper bound and the estimate of the probability that masking fails. Numerical results show that PLBC can efficiently mask the defects with the proposed encoding scheme. Also, we show that our upper bound is very tight by using numerical results.

preprint2013arXiv

Modulation Coding for Flash Memories

The aggressive scaling down of flash memories has threatened data reliability since the scaling down of cell sizes gives rise to more serious degradation mechanisms such as cell-to-cell interference and lateral charge spreading. The effect of these mechanisms has pattern dependency and some data patterns are more vulnerable than other ones. In this paper, we will categorize data patterns taking into account degradation mechanisms and pattern dependency. In addition, we propose several modulation coding schemes to improve the data reliability by transforming original vulnerable data patterns into more robust ones.

preprint2013arXiv

Redundancy Allocation of Partitioned Linear Block Codes

Most memories suffer from both permanent defects and intermittent random errors. The partitioned linear block codes (PLBC) were proposed by Heegard to efficiently mask stuck-at defects and correct random errors. The PLBC have two separate redundancy parts for defects and random errors. In this paper, we investigate the allocation of redundancy between these two parts. The optimal redundancy allocation will be investigated using simulations and the simulation results show that the PLBC can significantly reduce the probability of decoding failure in memory with defects. In addition, we will derive the upper bound on the probability of decoding failure of PLBC and estimate the optimal redundancy allocation using this upper bound. The estimated redundancy allocation matches the optimal redundancy allocation well.

Yongjune Kim

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression

Optimizing the Write Fidelity of MRAMs

On the Optimal Refresh Power Allocation for Energy-Efficient Memories

Understanding the Energy and Precision Requirements for Online Learning

Coding scheme for 3D vertical flash memory

Writing on dirty flash memory

Coding for Memory with Stuck-at Defects

Modulation Coding for Flash Memories

Redundancy Allocation of Partitioned Linear Block Codes