Source author record

Eun Kyung Lee

Eun Kyung Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · astro-ph.IM · Distributed, Parallel, and Cluster Computing · Hardware Architecture · hep-ex · Machine Learning · nucl-ex · Performance · physics.ins-det

Catalog footprint

What is connected

3 works

9 topics

4 close collaborators


Preprint · 2026 · arXiv

EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet existing approaches either require production-level code and expensive profiling or fail to accurately capture multi-GPU energy behavior. As a result, practitioners lack tools for deciding which optimizations to prioritize and for selecting among existing deployment configurations when exhaustive profiling is impractical. EnergyLens addresses this gap with an intuitive einsum-based interface that captures LLM specifications, including fusion, parallelism, and compute-communication overlap, combined with load-imbalance-aware MoE modeling and an empirically driven communication energy model for multi-GPU settings. We validate EnergyLens on Llama3 and Qwen3-MoE across tensor-parallel and expert-parallel configurations, achieving mean absolute percentage errors (MAPEs) between 9.25% and 13.19% for multi-GPU prefill and decode energy, and 12.97% across SM allocations for Megatron-style overlap. Our energy-driven exploration reveals up to 1.47× and 52.9× variation in prefill and decode energy efficiency, respectively, across configurations, and motivates distributed serving. We further show that compute-communication overlap is difficult to optimize with intuition alone, whereas EnergyLens correctly identifies Pareto-optimal overlap configurations.
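The abstract reports prediction accuracy as mean absolute percentage error (MAPE). As a minimal sketch of that metric under its standard definition (the mean of |measured − predicted| / |measured|, expressed in percent), the Python below compares measured and predicted energies. The function name `mape` and the sample values are hypothetical illustrations, not code or data from the paper.

```python
from typing import Sequence


def mape(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Each term is |measured - predicted| / |measured|, so every measured
    value must be non-zero.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("measured values must be non-zero")
        total += abs(m - p) / abs(m)
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical per-configuration energies in joules (not from the paper).
    measured_j = [120.0, 80.0, 200.0]
    predicted_j = [132.0, 72.0, 210.0]
    print(f"MAPE = {mape(measured_j, predicted_j):.2f}%")
```

Because each error is normalized by the measured value, MAPE weights a 10 J miss on a small configuration more heavily than the same miss on a large one.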

Preprint · 2025 · arXiv

Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications

Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workloads. Once the prefill GPU completes its task, the KV cache must be transferred to the decode GPU. While existing works have proposed various KV cache transfer paths across different memory and storage tiers, there remains a lack of systematic benchmarking that compares their performance and energy efficiency. Meanwhile, although optimization techniques such as KV cache reuse and frequency scaling have been utilized for disaggregated serving, their performance and energy implications have not been rigorously benchmarked. In this paper, we fill this research gap by re-evaluating prefill-decode disaggregation under different KV transfer mediums and optimization strategies. Specifically, we include a new colocated serving baseline and evaluate disaggregated setups under different KV cache transfer paths. Through GPU profiling using dynamic voltage and frequency scaling (DVFS), we identify and compare the performance-energy Pareto frontiers across all setups to evaluate the potential energy savings enabled by disaggregation. Our results show that performance benefits from prefill-decode disaggregation are not guaranteed and depend on the request load and KV transfer mediums. In addition, stage-wise independent frequency scaling enabled by disaggregation does not lead to energy savings due to the inherently higher energy consumption of disaggregated serving.
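The performance-energy Pareto frontiers described above can be illustrated with a small sketch. Assuming each DVFS setting is summarized by a single latency and a single energy value, both to be minimized, a setting lies on the frontier when no other setting is at least as good on both axes and strictly better on one. The frequency labels, numbers, and the `pareto_frontier` helper are hypothetical; this is not the authors' profiling pipeline.

```python
from typing import List, Tuple

# (label, latency_s, energy_j); smaller is better for both objectives.
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    Point q dominates p when q is no worse than p in both latency and
    energy and strictly better in at least one of them.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1] < p[1] or q[2] < p[2])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    # Sort by latency so the frontier reads from fastest to most frugal.
    return sorted(frontier, key=lambda pt: pt[1])


if __name__ == "__main__":
    # Hypothetical GPU clock settings with made-up latency/energy values.
    settings = [
        ("1980MHz", 1.0, 300.0),
        ("1590MHz", 1.2, 250.0),
        ("1200MHz", 1.6, 240.0),
        ("900MHz", 2.4, 260.0),  # slower and costlier than 1200MHz
    ]
    for label, latency, energy in pareto_frontier(settings):
        print(label, latency, energy)
```

The quadratic all-pairs check keeps the definition explicit; for a handful of DVFS settings per setup, its cost is negligible.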

Preprint · 2020 · arXiv

Measurement of the Background Activities of a ¹⁰⁰Mo-enriched powder sample for AMoRE crystal material using a single high purity germanium detector

The Advanced Molybdenum-based Rare process Experiment (AMoRE) searches for neutrinoless double-beta (0νββ) decay of ¹⁰⁰Mo in enriched molybdate crystals. The AMoRE crystals must have low levels of radioactive contamination to achieve low background signals with energies near the Q-value of the ¹⁰⁰Mo 0νββ decay. To produce low-activity crystals, radioactive contaminants in the raw materials used to form the crystals must be controlled and quantified. ¹⁰⁰EnrMoO₃ powder, which is enriched in the ¹⁰⁰Mo isotope, is of particular interest as it is the source of ¹⁰⁰Mo in the crystals. A high-purity germanium detector with 100% relative efficiency, named CC1, is being operated in the Yangyang underground laboratory. Using CC1, we collected a gamma spectrum from a 1.6-kg ¹⁰⁰EnrMoO₃ powder sample enriched to 96.4% in ¹⁰⁰Mo. Activities were analyzed for the isotopes ²²⁸Ac, ²²⁸Th, ²²⁶Ra, and ⁴⁰K. These are long-lived, naturally occurring isotopes that can produce background signals in the region of interest for AMoRE. The activities of both ²²⁸Ac and ²²⁸Th were < 1.0 mBq/kg at 90% confidence level (C.L.). The activity of ²²⁶Ra was measured to be 5.1 ± 0.4 (stat) ± 2.2 (syst) mBq/kg. The ⁴⁰K activity was found to be < 16.4 mBq/kg at 90% C.L.
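Activities quoted in mBq/kg come from the standard gamma-spectrometry counting relation A = N_net / (t · ε · I_γ · m), where N_net is the net count in a full-energy peak, t the live time, ε the peak detection efficiency for the sample geometry, I_γ the gamma emission probability per decay, and m the sample mass. The Python below is a minimal sketch of that relation only. The function name and every input value are hypothetical, and it omits background subtraction, coincidence-summing corrections, and the 90% C.L. limit setting that an analysis like the one above requires.

```python
def specific_activity_mbq_per_kg(net_counts: float, live_time_s: float,
                                 efficiency: float, gamma_intensity: float,
                                 mass_kg: float) -> float:
    """Specific activity in mBq/kg from the net counts in one gamma line.

    Computes A = N_net / (t * eff * I_gamma * m) in Bq/kg and converts
    to mBq/kg. `efficiency` is the full-energy-peak detection efficiency
    for this sample geometry; `gamma_intensity` is the emission
    probability of the line per decay.
    """
    for name, value in [("live_time_s", live_time_s),
                        ("efficiency", efficiency),
                        ("gamma_intensity", gamma_intensity),
                        ("mass_kg", mass_kg)]:
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    bq_per_kg = net_counts / (live_time_s * efficiency * gamma_intensity * mass_kg)
    return 1000.0 * bq_per_kg


if __name__ == "__main__":
    # Hypothetical inputs: 100 net counts over 30 days of live time from a
    # 1.6-kg sample, with made-up efficiency and emission probability.
    activity = specific_activity_mbq_per_kg(
        net_counts=100.0,
        live_time_s=30 * 24 * 3600,
        efficiency=0.05,
        gamma_intensity=0.5,
        mass_kg=1.6,
    )
    print(f"{activity:.3f} mBq/kg")
```

Long live times and large sample masses both enter the denominator, which is why sub-mBq/kg sensitivity calls for kilogram-scale samples measured underground for weeks.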