Source author record

Guo Lu

Guo Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV Computation and Language Computational Geometry cond-mat.stat-mech Machine Learning physics.plasm-ph

Catalog footprint

What is connected

10works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that can jointly learn a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder, our open-source project aims to promote the future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve the transmission efficiency, thereby indeed redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

preprint2026arXiv

Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling

Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, label-driven learning, lacking alignment with moderation rules and producing opaque decisions that hinder human review. Therefore, we propose Hierarchical Guard (Hi-Guard), a multimodal moderation framework that introduces a new policy-aligned decision paradigm. The term "Hierarchical" reflects two key aspects of our system design: (1) a hierarchical moderation pipeline, where a lightweight binary model first filters safe content and a stronger model handles fine-grained risk classification; and (2) a hierarchical taxonomy in the second stage, where the model performs path-based classification over a hierarchical taxonomy ranging from coarse to fine-grained levels. To ensure alignment with evolving moderation policies, Hi-Guard directly incorporates rule definitions into the model prompt. To further enhance structured prediction and reasoning, we introduce a multi-level soft-margin reward and optimize with Group Relative Policy Optimization (GRPO), penalizing semantically adjacent misclassifications and improving explanation quality. Extensive experiments and real-world deployment demonstrate that Hi-Guard achieves superior classification accuracy, generalization, and interpretability, paving the way toward scalable, transparent, and trustworthy content safety systems. Code is available at: https://github.com/lianqi1008/Hi-Guard.

preprint2025arXiv

Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression

Recent advances in 2D Gaussian Splatting (2DGS) have demonstrated its potential as a compact image representation with millisecond-level decoding. However, existing 2DGS-based pipelines allocate representation capacity and parameter precision largely oblivious to image structure, limiting their rate-distortion (RD) efficiency at low bitrates. To address this, we propose a structure-guided allocation principle for 2DGS, which explicitly couples image structure with both representation capacity and quantization precision, while preserving native decoding speed. First, we introduce a structure-guided initialization that assigns 2D Gaussians according to spatial structural priors inherent in natural images, yielding a localized and semantically meaningful distribution. Second, during quantization-aware fine-tuning, we propose adaptive bitwidth quantization of covariance parameters, which grants higher precision to small-scale Gaussians in complex regions and lower precision elsewhere, enabling RD-aware optimization, thereby reducing redundancy without degrading edge quality. Third, we impose a geometry-consistent regularization that aligns Gaussian orientations with local gradient directions to better preserve structural details. Extensive experiments demonstrate that our approach substantially improves both the representational power and the RD performance of 2DGS while maintaining over 1000 FPS decoding. Compared with the baseline GSImage, we reduce BD-rate by 43.44% on Kodak and 29.91% on DIV2K.

preprint2022arXiv

Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction

The previous deep video compression approaches only use the single scale motion compensation strategy and rarely adopt the mode prediction technique from the traditional standards like H.264/H.265 for both motion and residual compression. In this work, we first propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation, in which we perform motion estimation, compression and compensation twice in a coarse to fine manner. Our C2F framework can achieve better motion compensation results without significantly increasing bit costs. Observing hyperprior information (i.e., the mean and variance values) from the hyperprior networks contains discriminant statistical information of different patches, we also propose two efficient hyperprior-guided mode prediction methods. Specifically, using hyperprior information as the input, we propose two mode prediction networks to respectively predict the optimal block resolutions for better motion coding and decide whether to skip residual information from each block for better residual coding without introducing additional bit cost while bringing negligible extra computation cost. Comprehensive experimental results demonstrate our proposed C2F video compression framework equipped with the new hyperprior-guided mode prediction methods achieves the state-of-the-art performance on HEVC, UVG and MCL-JCV datasets.

preprint2020arXiv

A Unified End-to-End Framework for Efficient Deep Image Compression

Image compression is a widely used technique to reduce the spatial redundancy in images. Recently, learning based image compression has achieved significant progress by using the powerful representation ability from neural networks. However, the current state-of-the-art learning based image compression methods suffer from the huge computational cost, which limits their capacity for practical applications. In this paper, we propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies, including a channel attention module, a Gaussian mixture model and a decoder-side enhancement module. Specifically, we design an auto-encoder style network for learning based image compression. To improve the coding efficiency, we exploit the channel relationship between latent representations by using the channel attention module. Besides, the Gaussian mixture model is introduced for the entropy model and improves the accuracy for bitrate estimation. Furthermore, we introduce the decoder-side enhancement module to further improve image compression performance. Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance. Simultaneously, our EDIC method boosts the coding performance significantly while bringing slightly increased computational cost. More importantly, experimental results demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods and is up to more than 150 times faster in terms of decoding speed when compared with Minnen's method. The proposed framework also successfully improves the performance of the recent deep video compression system DVC. Our code will be released at https://github.com/liujiaheng/compression.

preprint2020arXiv

Content Adaptive and Error Propagation Aware Deep Video Compression

Recently, learning based video compression methods attract increasing attention. However, the previous works suffer from error propagation due to the accumulation of reconstructed error in inter predictive coding. Meanwhile, the previous learning based video codecs are also not adaptive to different video contents. To address these two problems, we propose a content adaptive and error propagation aware video compression system. Specifically, our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame. Based on the learned long-term temporal information, our approach effectively alleviates error propagation in reconstructed frames. More importantly, instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system. The proposed approach updates the parameters for encoder according to the rate-distortion criterion but keeps the decoder unchanged in the inference stage. Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets. Our method is simple yet effective and outperforms the state-of-the-art learning based video codecs on benchmark datasets without increasing the model size or decreasing the decoding speed.

preprint2020arXiv

Improving Deep Video Compression by Resolution-adaptive Flow Coding

In the learning based video compression approaches, it is an essential issue to compress pixel-level optical flow maps by developing new motion vector (MV) encoders. In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder. To handle complex or simple motion patterns globally, our frame-level scheme RaFC-frame automatically decides the optimal flow map resolution for each video frame. To cope different types of motion patterns locally, our block-level scheme called RaFC-block can also select the optimal resolution for each local block of motion features. In addition, the rate-distortion criterion is applied to both RaFC-frame and RaFC-block and select the optimal motion coding mode for effective flow coding. Comprehensive experiments on four benchmark datasets HEVC, VTL, UVG and MCL-JCV clearly demonstrate the effectiveness of our overall RaFC framework after combing RaFC-frame and RaFC-block for video compression.

preprint2020arXiv

Scope Head for Accurate Localization in Object Detection

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance. However, they still encounter the design difficulty in hand-crafted 2D anchor definition and the learning complexity in 1D direct location regression. To tackle these issues, in this paper, we propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship. This approach quantizes the prediction space and employs a coarse-to-fine strategy for localization. It achieves superior flexibility as in the regression based anchor-free methods, while produces more precise prediction. Besides, an inherit anchor selection score is learned to indicate the localization quality of the detection result, and we propose to better represent the confidence of a detection box by combining the category-classification score and the anchor-selection score. With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO

preprint2016arXiv

The Thermodynamical Instability Induced by Pressure Ionization in Fluid Helium

A systematic study of pressure ionization is carried out in the chemical picture by the example of fluid helium. By comparing the variants of the chemical model, it is demonstrated that the behavior of pressure ionization depends on the construction of the free energy function. In the chemical model with the Coulomb free energy described by the Padé interpolation formula, thermodynamical instability induced by pressure ionization is found to be manifested by a discontinuous drop or a continuous fall and rise along the pressure-density curve as well as the pressure-temperature curve, which is very much like the first order liquid-liquid phase transition of fluid hydrogen from the first principles simulations. In contrast, in the variant chemical model with the Coulomb free energy term empirically weakened, no thermodynamical instability is induced when pressure ionization occurs, and the resulting equation of state achieves good agreement with the first principles simulations of fluid helium.

preprint2010arXiv

Cluster Identification and Characterization of Physical Fields

The description of complex configuration is a difficult issue. We present a powerful technique for cluster identification and characterization. The scheme is designed to treat with and analyze the experimental and/or simulation data from various methods. Main steps are as follows. We first divide the space using face or volume elements from discrete points. Then, combine the elements with the same and/or similar properties to construct clusters with special physical characterizations. In the algorithm, we adopt administrative structure of hierarchy-tree for spatial bodies such as points, lines, faces, blocks, and clusters. Two fast search algorithms with the complexity are realized. The establishing of the hierarchy-tree and the fast searching of spatial bodies are general, which are independent of spatial dimensions. Therefore, it is easy to extend the skill to other fields. As a verification and validation, we treated with and analyzed some two-dimensional and three-dimensional random data.

Guo Lu

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling

Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression

Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction

A Unified End-to-End Framework for Efficient Deep Image Compression

Content Adaptive and Error Propagation Aware Deep Video Compression

Improving Deep Video Compression by Resolution-adaptive Flow Coding

Scope Head for Accurate Localization in Object Detection

The Thermodynamical Instability Induced by Pressure Ionization in Fluid Helium

Cluster Identification and Characterization of Physical Fields