Source author record

Jiro Katto

Jiro Katto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: eess.IV, Networking and Internet Architecture, Computer Vision, Machine Learning, eess.SP, Multimedia

Catalog footprint: 14 works, 6 topics, 4 close collaborators


Preprint, 2026, arXiv

Transformer-Based MCS Prediction for 5G Multicast-Broadcast Services (MBS)

The deployment of 5G Multicast-Broadcast Services (MBS) is emerging as a critical technology for spectrally efficient UHD content delivery and as a promising way to modernize CATV deployments. However, unlike unicast networks that rely on RLC-AM with HARQ retransmissions, MBS broadcast operates in RLC Unacknowledged Mode (RLC-UM), where the absence of a feedback loop means that packet loss is permanent and immediately impacts user QoE. Conventional link adaptation algorithms, designed for unicast, typically maximize throughput aggressively and fail in this risk-intolerant environment, resulting in severe video stalls and rebuffering. To address this, we propose a lightweight Transformer-based framework that predicts the success probability of all 28 MCS indices over an upcoming video segment horizon. Using a unique commercial network dataset with 0.5 ms slot-level granularity, we train our model with a custom Asymmetric Safety Loss function that penalizes channel overestimation to prioritize link stability. Experimental results show that our approach achieves a reliability score of 86.89%, significantly outperforming standard AI baselines optimized for raw throughput (31.65%) while maintaining a safe, conservative bias. Furthermore, the model is optimized for real-time applications, with an inference time of less than 0.07 ms on COTS 5G-era smartphones.
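The idea of a loss that "penalizes channel overestimation" can be illustrated with a weighted squared error in which predictions above the target cost more than predictions below it. The sketch below is hypothetical. The function names, the 4:1 weighting, the 0.9 threshold and the "highest index that clears the threshold" selection rule are assumptions, not the paper's Asymmetric Safety Loss or its decision logic.

```python
# Hypothetical sketch: weights, threshold and selection rule are illustrative
# assumptions, not values or definitions taken from the paper.
NUM_MCS = 28  # MCS indices 0..27, as stated in the abstract


def asymmetric_safety_loss(pred, target, over_weight=4.0, under_weight=1.0):
    """Mean weighted squared error over per-MCS success probabilities.

    Errors where the prediction exceeds the target (channel overestimation)
    are scaled by over_weight; underestimation is scaled by under_weight.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    total = 0.0
    for p, t in zip(pred, target):
        err = p - t
        weight = over_weight if err > 0 else under_weight
        total += weight * err * err
    return total / len(pred)


def select_mcs(success_probs, threshold=0.9):
    """Return the highest MCS index whose predicted success probability meets
    the threshold, falling back to index 0 (the most robust MCS)."""
    best = 0
    for idx, p in enumerate(success_probs):
        if p >= threshold:
            best = idx
    return best
```

With this weighting, overshooting the true success probability by 0.1 costs four times as much as undershooting it by 0.1. That asymmetry is what biases the model toward conservative MCS choices.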

Preprint, 2022, arXiv

Learned Lossless Image Compression With Combined Autoregressive Models And Attention Modules

Lossless image compression is an essential research field within image compression. Recently, learning-based image compression methods have achieved impressive performance compared with traditional lossless methods such as WebP, JPEG2000, and FLIF. However, many effective lossy compression methods could also be applied to lossless compression. In this paper, we therefore explore methods widely used in lossy compression and apply them to lossless compression. Inspired by the strong performance of the Gaussian mixture model (GMM) in lossy compression, we build a lossless network architecture with a GMM. Noting also the success of attention modules and autoregressive models, we incorporate attention modules and add an extra autoregressive model for raw images to our network architecture to boost performance. Experimental results show that our approach outperforms most classical lossless compression methods and existing learning-based methods.

Preprint, 2022, arXiv

Memory-Efficient Learned Image Compression with Pruned Hyperprior Module

Learned Image Compression (LIC) has become increasingly popular in recent years. Hyperprior-module-based LIC models have achieved remarkable rate-distortion performance. However, the memory cost of these models is too large for them to be deployed on many devices, especially portable or edge devices. Memory cost is directly linked to the number of parameters. In our research, we found that the hyperprior module is not only highly over-parameterized, but its latent representation also contains redundant information. We therefore propose a novel pruning method, named ERHP, that efficiently reduces the memory cost of the hyperprior module while improving network performance. Experiments show that our method is effective, reducing the parameters of the whole model by at least 22.6% while achieving better rate-distortion performance.
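The link between pruning and parameter count can be illustrated with generic L1-norm channel pruning, a common baseline. To be explicit, this is not ERHP; the abstract does not describe its criterion. The weight layout (nested lists of shape out_channels × in_features), the keep ratio and the function names are hypothetical.

```python
# Generic L1-norm output-channel pruning, shown only as an illustration.
# This is NOT the ERHP method from the paper; names and layout are hypothetical.


def l1_channel_scores(weights):
    """L1 norm of each output channel (one row per channel)."""
    return [sum(abs(w) for w in row) for row in weights]


def prune_channels(weights, keep_ratio):
    """Keep the highest-scoring output channels, preserving their original order.

    Returns the pruned weights and the indices of the kept channels.
    """
    scores = l1_channel_scores(weights)
    n_keep = max(1, round(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weights[i] for i in kept], kept


def param_reduction(before, after):
    """Fraction of parameters removed, e.g. 0.226 for a 22.6% reduction."""
    return 1 - after / before
```

In a real network, removing an output channel also removes the matching input column of the following layer. That knock-on removal is why pruning a hyperprior can shrink the whole model.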

Preprint, 2022, arXiv

Pensieve 5G: Implementation of RL-based ABR Algorithm for UHD 4K/8K Content Delivery on Commercial 5G SA/NR-DC Network

While the fifth-generation mobile network (5G) is being rolled out across the globe with the intention of delivering 4K/8K UHD video, Augmented Reality (AR), and Virtual Reality (VR) content to large numbers of users, coverage and throughput remain among the most significant issues. This is especially true in rural areas, where only low-frequency-band 5G is being deployed. This calls for a high-performance adaptive bitrate (ABR) algorithm that can maximize the user's quality of experience (QoE) given 5G network characteristics and the data rates of UHD content. Many recently proposed ABR techniques are machine-learning based. Among these, Pensieve is a state-of-the-art technique that uses reinforcement learning to generate an ABR algorithm based on observations of past decision performance. By incorporating the context of the 5G network and UHD content, Pensieve has been optimized into Pensieve 5G. New QoE metrics that more accurately represent the QoE of UHD video streaming on different types of devices were proposed and used to evaluate Pensieve 5G against other ABR techniques, including the original Pensieve. Simulation results based on real 5G Standalone (SA) network throughput show that Pensieve 5G outperforms both conventional algorithms and Pensieve, with average QoE improvements of 8.8% and 14.2%, respectively. Pensieve 5G also performed well on the commercial 5G NR-NR Dual Connectivity (NR-DC) network, even though training used only data from the 5G SA network.
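For context on what an ABR agent like Pensieve optimizes, the sketch below computes the linear QoE commonly used with the original Pensieve. It sums chunk bitrates, subtracts a rebuffering penalty and subtracts bitrate switches. The new UHD-specific QoE metrics proposed for Pensieve 5G are not described in the abstract and are not reproduced here. The default penalty of 4.3 per second of rebuffering (bitrates in Mbps) should be treated as an assumption.

```python
# Linear QoE in the style used with the original Pensieve:
#   QoE = sum(R_n) - mu * sum(T_n) - sum(|R_{n+1} - R_n|)
# R_n: chunk bitrate in Mbps, T_n: rebuffering time in seconds.
# Default penalties are assumptions; Pensieve 5G's metrics are not shown here.


def linear_qoe(bitrates_mbps, rebuffer_s, rebuffer_penalty=4.3, smooth_penalty=1.0):
    if len(bitrates_mbps) != len(rebuffer_s):
        raise ValueError("one rebuffering value is expected per chunk")
    quality = sum(bitrates_mbps)
    rebuffering = rebuffer_penalty * sum(rebuffer_s)
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffering - smooth_penalty * switches
```

For example, three chunks at 10, 20 and 20 Mbps with one second of rebuffering on the second chunk score 50 - 4.3 - 10 = 35.7.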

Preprint, 2022, arXiv

Performance Evaluations of C-Band 5G NR FR1 (Sub-6 GHz) Uplink MIMO on Urban Train

Driven by the rapid development of social media platforms, UHD 4K/8K video, and VR/AR content, the recent demand for high uplink throughput on mobile networks has led to the deployment of Uplink MIMO (UL-MIMO) on commercial 5G networks, with reasonable availability of supported User Equipment (UE) for consumers. By using up to 2 Tx antenna ports, UL-MIMO-capable UE promises up to twice the uplink throughput in ideal conditions, while providing improved uplink performance over 1Tx UE in challenging conditions. In Japan, SoftBank, one of the carriers, introduced 5G Standalone (SA) services for Fixed Wireless Access (FWA) in October 2021. Mobile services, which provide UL-MIMO for supported UE on C-Band (Band n77, 3.7 GHz), commenced in May 2022. In this paper, the uplink performance of UL-MIMO-capable UE is compared against conventional UL-1Tx UE on trains, the most popular mode of transportation in Japan. The results show that UL-MIMO-capable UE delivers on average 19.8% better throughput on moving trains, and up to 33.5% better in more favorable signal conditions. A moderate relationship between downlink 5G NR SS-RSRP and uplink throughput was also observed.
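The two kinds of numbers reported here are a relative throughput gain and a "moderate relationship" between SS-RSRP and uplink throughput. Both can be computed as shown below. Using the Pearson correlation coefficient for the relationship is an assumption, since the abstract does not name the statistic. The helper names are hypothetical, and any inputs you pass in are your own measurements, not the paper's data.

```python
import math


def relative_gain(new, baseline):
    """Relative improvement, e.g. 0.198 for a 19.8% throughput gain."""
    return (new - baseline) / baseline


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples,
    e.g. per-sample SS-RSRP (dBm) and uplink throughput (Mbps)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient in roughly the 0.4 to 0.7 range is conventionally described as moderate, though conventions for that wording vary.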

Preprint, 2022, arXiv

Q-LIC: Quantizing Learned Image Compression with Channel Splitting

Learned image compression (LIC) has reached coding gains comparable to traditional hand-crafted methods such as VVC intra. However, its large network complexity prohibits the use of LIC on resource-limited embedded systems. Network quantization is an efficient way to reduce this burden. This paper presents a quantized LIC (QLIC) using channel splitting. First, we find that quantization error affects the reconstruction error differently across channels. Second, we split the channels whose quantization has a larger influence on the reconstruction error. After splitting, the dynamic range of those channels is reduced, so their quantization error is reduced as well. Finally, we prune several channels to keep the overall number of channels the same as in the original network. With this proposal, for 8-bit quantization of the weights and activations of both the main and hyper paths, we reduce the BD-rate by 0.61%-4.74% compared with the previous QLIC. In addition, we achieve better coding gain than the state-of-the-art network quantization method when quantizing MS-SSIM models. Moreover, our proposal can be combined with other network quantization methods to further improve coding gain. The moderate coding loss caused by quantization validates the feasibility of a future hardware implementation of QLIC.
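The reason splitting reduces dynamic range can be seen on two chained linear layers. Replace one output channel with two copies at half magnitude, then duplicate that channel's column in the next layer. The network output is unchanged, but the largest weight (and activation) in that channel is halved, so a fixed-bit quantizer gets a finer step. This is a generic sketch with hypothetical helper names. Q-LIC's criterion for choosing which channels to split, and its follow-up pruning, are not reproduced.

```python
# Generic channel-splitting illustration on two chained linear layers.
# w1: first layer (out x in), w2: next layer (out2 x out), as nested lists.


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def split_output_channel(w1, w2, c):
    """Split output channel c of w1 into two half-magnitude channels and
    duplicate column c of w2, so w2 @ (w1 @ x) is unchanged while the
    dynamic range of channel c is halved."""
    half = [v / 2 for v in w1[c]]
    new_w1 = w1[:c] + [half, half] + w1[c + 1:]
    new_w2 = [row[:c] + [row[c], row[c]] + row[c + 1:] for row in w2]
    return new_w1, new_w2
```

Each split adds one channel. This is consistent with the abstract's final step of pruning other channels to restore the original channel count.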

Preprint, 2022, arXiv

RSSI-CSI Measurement and Variation Mitigation with Commodity WiFi Device

Owing to the plentiful information exposed by commodity devices, WiFi signals have been widely studied for various wireless sensing applications. In many works, both the received signal strength indicator (RSSI) and channel state information (CSI) are used as key factors for precise sensing. However, how RSSI and CSI are calculated, and how they relate to each other, has not been explained in detail. Furthermore, few works focus on the measurement variation of the WiFi signal, which affects sensing results. In this paper, the relationship between RSSI and CSI is studied in detail, and the measurement variation of amplitude and phase information is investigated through extensive experiments. In the experiments, the transmitter and receiver are directly connected by a power divider and RF cables, and signal transmission is quantitatively controlled by RF attenuators. By changing the attenuation, RSSI and CSI are measured under different conditions. The results show that, to obtain a reliable measurement of signal amplitude and phase with commodity WiFi, the attenuation of the channels should not exceed 60 dB, and the difference between two channels should be lower than 10 dB. An active control mechanism is suggested to ensure measurement stability. The findings and criteria of this work are promising for enabling more precise WiFi-based sensing technologies.
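The reliability criteria stated above (attenuation no more than 60 dB, inter-channel difference below 10 dB) can be written as a simple check. The sketch pairs that check with the usual way of extracting amplitude and phase from complex CSI and a total-power figure in dB. Summing |H_k|^2 over subcarriers is an assumption made for illustration. Real commodity cards apply vendor-specific scaling, so this is not the paper's exact RSSI-CSI relationship. Function names are hypothetical.

```python
import cmath
import math


def csi_amplitude_phase(csi):
    """Per-subcarrier (amplitude, phase in radians) from complex CSI values."""
    return [(abs(h), cmath.phase(h)) for h in csi]


def csi_power_db(csi):
    """Total power across subcarriers in dB (relative units): 10*log10(sum |H_k|^2).
    Assumes raw, unscaled CSI; device-specific scaling is ignored."""
    power = sum(abs(h) ** 2 for h in csi)
    return 10 * math.log10(power)


def measurement_reliable(att_a_db, att_b_db, max_att_db=60.0, max_diff_db=10.0):
    """Criteria from the abstract: neither channel's attenuation exceeds 60 dB,
    and the two channels differ by less than 10 dB."""
    return (max(att_a_db, att_b_db) <= max_att_db
            and abs(att_a_db - att_b_db) < max_diff_db)
```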

Preprint, 2022, arXiv

Streaming-capable High-performance Architecture of Learned Image Compression Codecs

Learned image compression achieves state-of-the-art accuracy and compression ratios, but its relatively slow runtime performance limits its usage. While previous attempts at optimizing learned image codecs focused on the neural model and entropy coding, we present an alternative way to improve the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model that enable asynchronous execution of GPU and CPU workloads, taking full advantage of the available computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared with the baseline. We demonstrate their performance by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.
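The multi-threaded pipelining idea can be sketched with one worker thread per stage, connected by bounded queues. This lets stage i start on frame n + 1 while stage i + 1 is still handling frame n. The stage functions are hypothetical stand-ins for, say, a GPU neural transform followed by CPU entropy coding. The queue depth and names are assumptions, not the paper's architecture. In CPython, real overlap requires stages that release the GIL (GPU calls, native code); pure-Python stages will not run in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items, stages, depth=4):
    """Process items through `stages` with one worker thread per stage.

    Bounded queues between stages let adjacent stages overlap. Output order
    matches input order because each stage has a single FIFO worker.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # propagate end-of-stream downstream
                return
            q_out.put(fn(item))

    def feeder():
        # Feeding from its own thread keeps the bounded queues from deadlocking
        # while the caller drains the last queue.
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feeder, daemon=True)]
    threads += [
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(frames, [analysis_transform, entropy_encode])` would overlap the two stages across frames. Both function names are hypothetical.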

Preprint, 2020, arXiv

A Convolutional Neural Network-Based Low Complexity Filter

Convolutional Neural Network (CNN)-based filters have achieved significant performance in video artifact reduction. However, the high complexity of existing methods makes them difficult to apply in practice. In this paper, a CNN-based low complexity filter is proposed. We use depthwise separable convolution (DSC) merged with batch normalization (BN) as the backbone of our proposed CNN-based network. In addition, a weight initialization method is proposed to enhance training performance. To solve the well-known over-smoothing problem for inter frames, a frame-level residual mapping (RM) is presented. We quantitatively analyze mainstream approaches such as frame-level and block-level filters. We then build our CNN-based filter with frame-level control to avoid the extra complexity and artificial boundaries caused by block-level control. The RM module is designed to restore distortion from the learned residuals. As a result, we can effectively improve the generalization ability of the learning-based filter and achieve an adaptive filtering effect. Moreover, this module is flexible and can be combined with other learning-based filters. Experimental results show that our proposed method achieves a significant BD-rate reduction compared with H.265/HEVC. It also achieves about 1.2% BD-rate reduction and a 79.1% decrease in FLOPs compared with VR-CNN. Finally, measurements on H.266/VVC and ablation studies are conducted to confirm the effectiveness of the proposed method.
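The complexity saving from depthwise separable convolution follows from counting multiply-accumulates (MACs). A K×K convolution from C_in to C_out channels costs H·W·K²·C_in·C_out MACs. The depthwise-plus-pointwise version costs H·W·(K²·C_in + C_in·C_out), a ratio of 1/C_out + 1/K². The sketch counts a single layer with bias and BN ignored. It does not reproduce the paper's whole-network FLOPs figure.

```python
# MAC counts for one layer (bias and batch normalization ignored).


def conv_macs(h, w, k, c_in, c_out):
    """Standard k x k convolution producing an h x w x c_out output."""
    return h * w * k * k * c_in * c_out


def dsc_macs(h, w, k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise
```

For a 3×3 layer with 64 input and 64 output channels, the per-pixel cost drops from 36864 to 4672 MACs. That is about 12.7% of the original, matching 1/64 + 1/9.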

Preprint, 2020, arXiv

End-to-end Learned Image Compression with Fixed Point Weight Quantization

Learned image compression (LIC) has matched traditional hand-crafted methods such as JPEG2000 and BPG in terms of coding gain. However, the large model size of the network prohibits the use of LIC on resource-limited embedded systems. This paper presents an LIC with 8-bit fixed-point weights. First, we quantize the weights in groups and propose a non-linear, memory-free codebook. Second, we explore the optimal grouping and quantization scheme. Finally, we develop a novel weight-clipping fine-tuning scheme. Experimental results show that the coding loss caused by quantization is small, while model size is reduced by around 75% compared with the 32-bit floating-point anchor. As far as we know, this is the first work to explore and evaluate LIC fully with fixed-point weights, and our proposed quantized LIC outperforms BPG in terms of MS-SSIM.
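Group-wise weight quantization and the "around 75%" size figure can both be illustrated in a few lines. The sketch substitutes plain symmetric uniform quantization, where each group shares one scale, for the paper's non-linear, memory-free codebook. The group size and helper names are assumptions. The 75% figure itself is simple arithmetic: storing 8 bits instead of 32 per weight saves 1 - 8/32 of the storage, before accounting for per-group scales.

```python
# Symmetric uniform per-group quantization, used here as a stand-in for the
# paper's non-linear codebook. Group size and names are illustrative assumptions.


def quantize_group(weights, bits=8):
    """Quantize one group to signed integer codes sharing a single scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit weights
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def quantize_in_groups(weights, group_size, bits=8):
    """Split a flat weight list into fixed-size groups, each with its own scale."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


def dequantize(codes, scale):
    return [c * scale for c in codes]


def size_reduction(float_bits=32, fixed_bits=8):
    """Fraction of weight storage saved, ignoring per-group scale overhead."""
    return 1 - fixed_bits / float_bits
```

Rounding to the nearest code keeps each reconstructed weight within half a quantization step of the original. Small groups keep that step small when weight magnitudes vary across the layer.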

Preprint, 2020, arXiv

Enhanced Intra Prediction for Video Coding by Using Multiple Neural Networks

This paper enhances intra prediction by using multiple neural network modes (NMs). Each NM serves as an end-to-end mapping from the neighboring reference blocks to the current coding block. For the provided NMs, we present two schemes (appending and substitution) to integrate the NMs with the traditional modes (TMs) defined in High Efficiency Video Coding (HEVC). In the appending scheme, each NM corresponds to a certain range of TMs, and the categorization of TMs is based on the expected prediction errors. After determining the relevant TMs for each NM, we present a probability-aware mode signaling scheme in which NMs with a higher probability of being the best mode are signaled with fewer bits. In the substitution scheme, we propose to replace the most and least probable TMs, and a new most probable mode (MPM) generation method is employed when substituting the least probable TMs. Experimental results demonstrate that using multiple NMs noticeably improves coding efficiency compared with a single NM. Specifically, the proposed appending scheme with seven NMs saves 2.6%, 3.8%, and 3.1% BD-rate for the Y, U, and V components, respectively, compared with using a single NM as in state-of-the-art works.
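Probability-aware signaling, where more likely modes get fewer bits, can be illustrated with truncated unary codewords assigned in descending probability order. Truncated unary is a standard binarization in HEVC. Whether the paper uses it for NM signaling is an assumption, as are the function names and the probabilities in the usage note.

```python
# Truncated-unary codes assigned by descending mode probability.
# The binarization choice and names are assumptions for illustration.


def truncated_unary_codes(mode_probs):
    """Map each mode to a codeword; more probable modes get shorter codes.

    mode_probs maps a mode name to its estimated probability of being the
    best mode. Ties keep insertion order because sorted() is stable.
    """
    ranked = sorted(mode_probs, key=mode_probs.get, reverse=True)
    n = len(ranked)
    codes = {}
    for rank, mode in enumerate(ranked):
        # Rank r -> r ones plus a terminating zero; the last rank needs no zero.
        codes[mode] = "1" * rank + ("0" if rank < n - 1 else "")
    return codes


def expected_bits(mode_probs, codes):
    """Average signaling cost in bits under the given probabilities."""
    return sum(p * len(codes[m]) for m, p in mode_probs.items())
```

With illustrative probabilities of 0.5, 0.3 and 0.2 for three modes, the codewords are "0", "10" and "11". The expected cost is 1.5 bits, against 2 bits for a fixed-length code.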

Preprint, 2020, arXiv

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Image compression is a fundamental research field, and many well-known compression standards have been developed over many decades. Recently, learned compression methods have developed rapidly with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We found that accurate entropy models for rate estimation largely affect the optimization of network parameters, and thus the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which yields a more accurate and flexible entropy model. In addition, we take advantage of recent attention modules and incorporate them into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared with existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasing results when optimized for MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention
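A discretized Gaussian mixture assigns an integer latent value ŷ the probability mass that the mixture places on [ŷ - 0.5, ŷ + 0.5]:

P(ŷ) = Σ_k w_k [Φ((ŷ + 0.5 - μ_k)/σ_k) - Φ((ŷ - 0.5 - μ_k)/σ_k)]

The estimated rate is then -log2 P(ŷ). Here Φ is the standard normal CDF and w_k, μ_k, σ_k are the mixture weights, means and scales predicted by the network. The sketch evaluates this for a single latent with hand-picked parameters. The probability floor and function names are assumptions, and it is not the authors' released code.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gmm_prob(y, weights, means, scales):
    """Probability of integer y: the mixture's mass on [y - 0.5, y + 0.5]."""
    return sum(
        w * (normal_cdf(y + 0.5, mu, s) - normal_cdf(y - 0.5, mu, s))
        for w, mu, s in zip(weights, means, scales)
    )


def rate_bits(y, weights, means, scales, floor=1e-9):
    """Estimated code length in bits; the floor avoids log2(0) in far tails."""
    return -math.log2(max(discretized_gmm_prob(y, weights, means, scales), floor))
```

Because the per-integer intervals tile the real line, the probabilities over all integers sum to 1 when the weights do. That property lets an arithmetic coder use the same model the network was trained against.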

Preprint, 2020, arXiv

Learned Lossless Image Compression with a HyperPrior and Discretized Gaussian Mixture Likelihoods

Lossless image compression is an important task in multimedia communication. Traditional image codecs such as WebP, JPEG2000, and FLIF typically support a lossless mode. Recently, deep-learning-based approaches have started to show potential for this task. HyperPrior is an effective technique proposed for lossy image compression. This paper generalizes the hyperprior from lossy models to lossless compression and adds an L2-norm term to the loss function to speed up training. In addition, this paper investigates different parameterized models for latent codes and proposes using Gaussian mixture likelihoods to achieve adaptive and flexible context models. Experimental results validate that our method outperforms existing deep-learning-based lossless compression methods, and outperforms JPEG2000 and WebP on JPG images.

Preprint, 2020, arXiv

Low Bitrate Image Compression with Discretized Gaussian Mixture Likelihoods

In this paper, we provide a detailed description of our method, Kattolab, submitted to the Workshop and Challenge on Learned Image Compression (CLIC) 2020. Our method mainly incorporates discretized Gaussian Mixture Likelihoods into previous state-of-the-art learned compression algorithms. We also describe acceleration strategies and bit optimization under the low-rate constraint. Experimental results demonstrate that Kattolab achieves MS-SSIM scores of 0.9761 and 0.9802 at the rate constraint of 0.15 bpp during the validation and test phases, respectively. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention
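A rate constraint such as 0.15 bpp caps total compressed bits relative to total pixels. The sketch checks a set of compressed files against such a cap. It assumes the limit applies to the aggregate over the whole image set rather than to each image separately, which is how the usage note below treats it. The input format and function names are hypothetical.

```python
# Bits-per-pixel budget check. Assumes the cap applies to the whole image set;
# the (bytes, width, height) input format is a hypothetical choice.


def bits_per_pixel(total_bits, total_pixels):
    return total_bits / total_pixels


def within_rate_budget(images, max_bpp=0.15):
    """images: iterable of (compressed_bytes, width, height) tuples."""
    images = list(images)
    total_bits = sum(8 * size for size, _, _ in images)
    total_pixels = sum(w * h for _, w, h in images)
    return bits_per_pixel(total_bits, total_pixels) <= max_bpp
```

Under an aggregate cap, an encoder can spend more than 0.15 bpp on hard images and less on easy ones. This freedom is what makes bit optimization under the constraint worthwhile. For a 1920×1080 image, 0.15 bpp corresponds to 311040 bits (38880 bytes).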
