Source author record

Jemin Lee

Jemin Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory; math.IT; Machine Learning; Networking and Internet Architecture; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Performance

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention

FlashAttention improves efficiency through tiling, but its online softmax still relies on floating-point arithmetic for numerical stability, making full quantization difficult. We identify three main obstacles to integer-only FlashAttention: (1) scale explosion during tile-wise accumulation, (2) inefficient shift-based exponential operations on GPUs, and (3) quantization granularity constraints requiring uniform scales for integer comparison. To address these challenges, we propose QFlash, an end-to-end integer FlashAttention design that performs softmax entirely in the integer domain and runs as a single Triton kernel. On seven attention workloads from ViT, DeiT, and Swin models, QFlash achieves up to 6.73× speedup over I-ViT and up to 8.69× speedup on Swin, while reducing energy consumption by 18.8% compared to FP16 FlashAttention, without sacrificing Top-1 accuracy on ViT/DeiT and remaining competitive on Swin under per-tensor quantization. Our code is publicly available at https://github.com/EfficientCompLab/qflash.
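For orientation, the following is a minimal Python sketch of the standard floating-point, tile-wise online softmax that FlashAttention-style kernels rely on, shown for one query row with scalar values. It is not QFlash's integer method; the function names `attention_row_tiled` and `attention_row_direct`, the scalar values, and the tile sizes are illustrative assumptions.

```python
import math


def attention_row_tiled(scores, values, tile_size):
    """Softmax-weighted sum of `values`, processed tile by tile (non-empty input assumed)."""
    running_max = -math.inf   # largest score seen so far
    denom = 0.0               # running softmax denominator, relative to running_max
    acc = 0.0                 # running weighted sum of values, relative to running_max
    for start in range(0, len(scores), tile_size):
        s_tile = scores[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        new_max = max(running_max, max(s_tile))
        # Rescale earlier partial results to the new maximum (exp(-inf) is 0.0 on the first tile).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in s_tile]
        denom = denom * correction + sum(weights)
        acc = acc * correction + sum(w * v for w, v in zip(weights, v_tile))
        running_max = new_max
    return acc / denom


def attention_row_direct(scores, values):
    """Reference: numerically stable softmax over the whole row at once."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


scores = [0.5, -1.0, 3.0, 2.0, 100.0, -4.0, 1.5]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(attention_row_tiled(scores, values, tile_size=3))
print(attention_row_direct(scores, values))
```

In this sketch the `correction` factor and the `exp` calls are the floating-point steps; the tiled and direct results agree up to rounding regardless of tile size.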

preprint · 2023 · arXiv

Joint Service Caching and Computing Resource Allocation for Edge Computing-Enabled Networks

In this paper, we consider service caching and computing resource allocation in edge computing (EC)-enabled networks. We introduce a random service caching design that considers multiple types of latency-sensitive services and the service caching storage of the base stations (BSs). We then derive the successful service probability (SSP). We also formulate an SSP maximization problem subject to the service caching distribution and the computing resource allocation. Then, we show that the optimization problem is nonconvex and develop a novel algorithm that obtains a stationary point of the SSP maximization problem by adopting parallel successive convex approximation (SCA). Moreover, to further reduce the computational complexity, we also provide a low-complexity algorithm that obtains a near-optimal solution of the SSP maximization problem in the high computing capability region. Finally, through numerical simulations, we show that the proposed solutions achieve higher SSP than baseline schemes. Moreover, we show that the near-optimal solution achieves reliable performance in the high computing capability region. We also explore the impacts of the target delays, the BSs' service cache size, and the EC servers' computing capability on the SSP.

preprint · 2022 · arXiv

CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution

Mobile devices run deep learning models for various purposes, such as image classification and speech recognition. Due to the resource constraints of mobile devices, researchers have focused on either making a lightweight deep neural network (DNN) model using model pruning or generating efficient code using compiler optimization. Surprisingly, we found that the straightforward integration of model compression and compiler auto-tuning often does not produce the most efficient model for a target device. We propose CPrune, a compiler-informed model pruning scheme for efficient target-aware DNN execution that supports an application with a required target accuracy. CPrune makes a lightweight DNN model through informed pruning based on the structural information of subgraphs built during the compiler tuning process. Our experimental results show that CPrune increases the DNN execution speed by up to 2.73x compared to the state-of-the-art TVM auto-tune while satisfying the accuracy requirement.

preprint · 2022 · arXiv

Facing to Latency of Hyperledger Fabric for Blockchain-enabled IoT: Modeling and Analysis

Hyperledger Fabric (HLF), one of the most popular private blockchains, has recently received attention for blockchain-enabled Internet of Things (IoT). However, for IoT applications to handle time-sensitive data, the processing latency in HLF has emerged as a new challenge. In this article, therefore, we establish a practical HLF latency model for HLF-enabled IoT. We first discuss the structure and transaction flow of HLF-enabled IoT. After implementing real HLF, we capture the latencies that each transaction experiences and show that the total latency of HLF can be modeled as a Gamma distribution, which is validated by conducting a goodness-of-fit test (i.e., the Kolmogorov-Smirnov (KS) test). We also provide the parameter values of the modeled latency distribution for various HLF environments. Furthermore, we explore the impacts of three important HLF parameters including transaction generation rate, block size, and block-generation timeout on the HLF latency. As a result, this article provides design insights on minimizing the latency for HLF-enabled IoT.
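The modeling step described above (fit a Gamma distribution to measured latencies, then check it with a KS test) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the latencies are synthetic stand-ins drawn from a Gamma distribution, the fit uses the method of moments, and the helper names `gamma_cdf`, `fit_gamma_moments`, and `ks_statistic` are hypothetical.

```python
import math
import random


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma function.

    Intended for moderate x / scale; the series needs more terms as x / scale grows.
    """
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def fit_gamma_moments(samples):
    """Method-of-moments estimates (shape, scale) from the sample mean and variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean * mean / var, var / mean


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic: largest gap between empirical and model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic "latencies" (assumption: shape 2.0, scale 0.5 seconds), not measured HLF data.
rng = random.Random(0)
latencies = [rng.gammavariate(2.0, 0.5) for _ in range(2000)]

shape, scale = fit_gamma_moments(latencies)
d = ks_statistic(latencies, lambda x: gamma_cdf(x, shape, scale))
print(f"shape={shape:.3f} scale={scale:.3f} KS statistic={d:.4f}")
```

A small KS statistic indicates that the fitted Gamma CDF tracks the empirical CDF closely. Because the parameters here are estimated from the same samples, the usual large-sample 5% critical value of about 1.36/sqrt(n) is only an approximate yardstick.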

preprint · 2020 · arXiv

Mobile Edge Computing-Enabled Heterogeneous Networks

Mobile edge computing (MEC) has been introduced to provide computing capabilities at the edge of networks and improve the latency performance of wireless networks. In this paper, we provide a novel framework for MEC-enabled heterogeneous networks (HetNets), composed of multi-tier networks with access points (APs) (i.e., MEC servers) that have different transmission powers and different computing capabilities. In this framework, we also consider multiple types of mobile users with different sizes of computation tasks, who offload their tasks to an MEC server and receive the computation results from the server. We derive the successful edge computing probability (SECP), defined as the probability that a user offloads and finishes its computation task at the MEC server within the target latency. We provide a closed-form expression of the approximated SECP for the general case, and closed-form expressions of the exact SECP for special cases. This paper then provides design insights for the optimal configuration of MEC-enabled HetNets by analyzing the effects of network parameters and bias factors, used in MEC server association, on the SECP. Specifically, it shows how the optimal bias factors in terms of SECP change with the numbers of user types and tiers of MEC servers, and how they differ from the conventional ones that do not consider the computing capabilities and task sizes.

preprint · 2016 · arXiv

Cooperative Caching and Transmission Design in Cluster-Centric Small Cell Networks

Wireless content caching in small cell networks (SCNs) has recently been considered as an efficient way to reduce the traffic and the energy consumption of the backhaul in emerging heterogeneous cellular networks (HetNets). In this paper, we consider a cluster-centric SCN with a combined design of cooperative caching and transmission policy. Small base stations (SBSs) are grouped into disjoint clusters, in which in-cluster cache space is utilized as an entity. We propose a combined caching scheme where part of the available cache space is reserved for caching the most popular content in every SBS, while the remainder is used for cooperatively caching different partitions of the less popular content in different SBSs, as a means to increase local content diversity. Depending on the availability and placement of the requested content, the coordinated multipoint (CoMP) technique with either joint transmission (JT) or parallel transmission (PT) is used to deliver content to the served user. Using a Poisson point process (PPP) for the SBS location distribution and a hexagonal grid model for the clusters, we provide analytical results on the successful content delivery probability of both transmission schemes for a user located at the cluster center. Our analysis shows an inherent tradeoff between transmission diversity and content diversity in our combined caching-transmission design. We also study optimal cache space assignment for two objective functions: maximization of the cache service performance and the energy efficiency. Simulation results show that the proposed scheme achieves performance gain by leveraging cache-level and signal-level cooperation and adapting to the network environment and user QoS requirements.
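The combined caching idea in this abstract can be illustrated with a small Python sketch. It rests on assumptions that are not stated in the abstract: files have equal size and are ranked by popularity (index 0 is the most popular), and each less popular file is split into equal partitions, one per SBS in the cluster. The function name `combined_cache_placement` and its parameters are hypothetical.

```python
def combined_cache_placement(num_files, cache_size, cluster_size, popular_slots):
    """Return one cache list per SBS, holding (file_index, partition) pairs.

    partition is None for a full copy, otherwise the index of the partition
    (0 .. cluster_size - 1) that this SBS stores.
    """
    if not 0 <= popular_slots <= cache_size:
        raise ValueError("popular_slots must lie between 0 and cache_size")

    # Most popular files: a full copy in every SBS.
    popular = list(range(min(popular_slots, num_files)))

    # Remaining slots: each slot holds cluster_size partitions, each 1/cluster_size of a file,
    # so the cluster as a whole covers shared_slots * cluster_size additional files.
    shared_slots = cache_size - popular_slots
    first = len(popular)
    last = min(num_files, first + shared_slots * cluster_size)
    shared = list(range(first, last))

    caches = []
    for sbs in range(cluster_size):
        entries = [(f, None) for f in popular]
        entries += [(f, sbs) for f in shared]  # SBS number `sbs` keeps partition `sbs` of each file
        caches.append(entries)
    return caches


caches = combined_cache_placement(num_files=20, cache_size=4, cluster_size=3, popular_slots=2)
distinct_files = {f for cache in caches for f, _ in cache}
print(caches[0])
print(len(distinct_files))
```

Under these assumptions, raising `popular_slots` places more full copies in every SBS, while lowering it increases the number of distinct files available in the cluster, which mirrors the tradeoff between transmission diversity and content diversity that the abstract mentions.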

preprint · 2016 · arXiv

On the Secrecy Capacity Region of the 2-user Z Interference Channel with Unidirectional Transmitter Cooperation

In this work, the role of unidirectional limited rate transmitter cooperation is studied in the case of the 2-user Z interference channel (Z-IC) with secrecy constraints at the receivers, on achieving two conflicting goals simultaneously: mitigating interference and ensuring secrecy. First, the problem is studied under the linear deterministic model. The achievable schemes for the deterministic model use a fusion of cooperative precoding and transmission of a jamming signal. The optimality of the proposed scheme is established for the deterministic model for all possible parameter settings using the outer bounds derived by the authors in a previous work. Using the insights obtained from the deterministic model, a lower bound on the secrecy capacity region of the 2-user Gaussian Z-IC is obtained. The achievable scheme in this case uses stochastic encoding in addition to cooperative precoding and transmission of a jamming signal. The secure sum generalized degrees of freedom (GDOF) is characterized and shown to be optimal for the weak/moderate interference regime. It is also shown that the secure sum capacity lies within 2 bits/s/Hz of the outer bound for the weak/moderate interference regime for all values of the capacity of the cooperative link. Interestingly, in the case of the deterministic model, it is found that there is no penalty on the capacity region of the Z-IC due to the secrecy constraints at the receivers in the weak/moderate interference regimes. Similarly, it is found that there is no loss in the secure sum GDOF for the Gaussian case due to the secrecy constraint at the receiver, in the weak/moderate interference regimes. The results highlight the importance of cooperation in facilitating secure communication over the Z-IC.

preprint · 2016 · arXiv

Outer Bounds on the Secrecy Capacity Region of the 2-user Z Interference Channel With Unidirectional Transmitter Cooperation

This paper derives outer bounds on the secrecy capacity region of the 2-user Z interference channel (Z-IC) with rate-limited unidirectional cooperation between the transmitters. First, the model is studied under the linear deterministic setting. The derivation of the outer bounds on the secrecy capacity region involves careful selection of the side information to be provided to the receivers and using the secrecy constraints at the receivers in a judicious manner. To this end, a novel partitioning of the encoded messages and outputs is proposed for the deterministic model based on the strength of interference and signal. The obtained outer bounds are shown to be tight using the achievable scheme derived by the authors in a previous work. Using the insight obtained from the deterministic model, outer bounds on the secrecy capacity region of the 2-user Gaussian Z-IC are obtained. The equivalence between the outer bounds for both models is also established. It is also shown that the secrecy constraint at the receiver does not hurt the capacity region of the 2-user Z-IC for the deterministic model in the weak/moderate interference regime. On the other hand, the outer bounds developed for the Gaussian case show that the secrecy constraint at the receiver can reduce the capacity region for the weak/moderate interference regime. The study of the relative performance of these bounds reveals insight into the fundamental limits of the 2-user Z-IC with limited rate transmitter cooperation.

preprint · 2015 · arXiv

Hybrid Full-/Half-Duplex System Analysis in Heterogeneous Wireless Networks

Full-duplex (FD) radio has been introduced for bidirectional communications on the same temporal and spectral resources so as to maximize spectral efficiency. In this paper, motivated by the recent advances in FD radios, we provide a foundation for hybrid-duplex heterogeneous networks (HDHNs), composed of multi-tier networks with a mixture of access points (APs), operating either in bidirectional FD mode or downlink half-duplex (HD) mode. Specifically, we characterize the network interference from FD-mode cells, and derive the HDHN throughput by accounting for AP spatial density, self-interference cancellation (IC) capability, and transmission power of APs and users. By quantifying the HDHN throughput, we present the effect of network parameters and the self-IC capability on the HDHN throughput, and show the superiority of FD mode for larger AP densities (i.e., larger network interference and shorter communication distance) or higher self-IC capability. Furthermore, our results show that operating all APs in FD or HD mode achieves higher throughput than a mixture of APs in the two modes in each tier network, and that introducing hybrid-duplex for different tier networks improves the heterogeneous network throughput.

preprint · 2015 · arXiv

Jamming-Aided Secure Communication in Massive MIMO Rician Channels

In this paper, we investigate the artificial noise-aided jamming design for a transmitter equipped with a large antenna array in Rician fading channels. We find that when the number of transmit antennas tends to infinity, whether the secrecy outage happens in a Rician channel depends on the geometric locations of eavesdroppers. In this light, we first define and analytically describe the secrecy outage region (SOR), indicating all possible locations of an eavesdropper that can cause secrecy outage. After that, the secrecy outage probability (SOP) is derived, and a jamming-beneficial range, i.e., the distance range of eavesdroppers which enables uniform jamming to reduce the SOP, is determined. Then, the optimal power allocation between messages and artificial noise is investigated for different scenarios. Furthermore, to use the jamming power more efficiently and further reduce the SOP, we propose directional jamming that generates jamming signals at selected beams (mapped to physical angles) only, and power allocation algorithms are proposed for the cases with and without the information of the suspicious area, i.e., possible locations of eavesdroppers. We further extend the discussions to multiuser and multi-cell scenarios. Finally, numerical results validate our conclusions and show the effectiveness of our proposed jamming power allocation schemes.
