Source author record

Jinjia Zhou

Jinjia Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV Multimedia Graphics Machine Learning

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Coarse Semantic Injection for LLM-Conditioned Structured Indoor Prediction

Large language models (LLMs) have recently been used as structured decoders for indoor understanding from 3D point-token inputs. However, point cloud encoders often under-represent thin structural elements such as doors and windows after voxelization and sparse pooling, and may miss individual furniture instances in cluttered scenes. We propose an interface-preserving semantic augmentation for LLM-conditioned structured decoding. The key idea is to associate semantic evidence with the point-cloud representation, reduce it to a coarse four-group code (furniture, walls, openings, and others), and encode it as an RGBB point interface: red for furniture, green for walls, blue for openings, and black for others, where RGBB denotes four semantic color states represented in three RGB channels rather than an additional fourth channel. This semantic color code is appended to the original raw point attributes before tokenization, so geometry and semantics share the same sparse tokenization path while the downstream language model decoder and output serialization remain unchanged. We further introduce a lightweight routed semantic shift module, with an auxiliary head used only for training-time ratio/budget regularization and analysis, to strengthen semantic cues after sparse pooling. The overall pipeline can use RGB-derived semantic evidence. Under these controlled semantic-source settings, the reported metrics improve across Structured3D, the SpatialLM dataset, and ARKitScenes, especially for opening localization and per-instance furniture detection in cluttered scenes. Ablations clarify the roles of semantic source, color coding, token fusion, and shift injection, while also showing that color/entropy effects remain nontrivial.

preprint2021arXiv

B-DRRN: A Block Information Constrained Deep Recursive Residual Network for Video Compression Artifacts Reduction

Although the video compression ratio nowadays becomes higher, the video coders such as H.264/AVC, H.265/HEVC, H.266/VVC always suffer from the video artifacts. In this paper, we design a neural network to enhance the quality of the compressed frame by leveraging the block information, called B-DRRN (Deep Recursive Residual Network with Block information). Firstly, an extra network branch is designed for leveraging the block information of the coding unit (CU). Moreover, to avoid a great increase in the network size, Recursive Residual structure and sharing weight techniques are applied. We also conduct a new large-scale dataset with 209,152 training samples. Experimental results show that the proposed B-DRRN can reduce 6.16% BD-rate compared to the HEVC standard. After efficiently adding an extra network branch, this work can improve the performance of the main network without increasing any memory for storing.

preprint2021arXiv

Deep Preset: Blending and Retouching Photos with Color Style Transfer

End-users, without knowledge in photography, desire to beautify their photos to have a similar color style as a well-retouched reference. However, the definition of style in recent image style transfer works is inappropriate. They usually synthesize undesirable results due to transferring exact colors to the wrong destination. It becomes even worse in sensitive cases such as portraits. In this work, we concentrate on learning low-level image transformation, especially color-shifting methods, rather than mixing contextual features, then present a novel scheme to train color style transfer with ground-truth. Furthermore, we propose a color style transfer named Deep Preset. It is designed to 1) generalize the features representing the color transformation from content with natural colors to retouched reference, then blend it into the contextual features of content, 2) predict hyper-parameters (settings or preset) of the applied low-level color transformation methods, 3) stylize content to have a similar color style as reference. We script Lightroom, a powerful tool in editing photos, to generate 600,000 training samples using 1,200 images from the Flick2K dataset and 500 user-generated presets with 69 settings. Experimental results show that our Deep Preset outperforms the previous works in color style transfer quantitatively and qualitatively.

preprint2021arXiv

Image Compression with Encoder-Decoder Matched Semantic Segmentation

In recent years, layered image compression is demonstrated to be a promising direction, which encodes a compact representation of the input image and apply an up-sampling network to reconstruct the image. To further improve the quality of the reconstructed image, some works transmit the semantic segment together with the compressed image data. Consequently, the compression ratio is also decreased because extra bits are required for transmitting the semantic segment. To solve this problem, we propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS). And then, followed by the semantic segmentation, a special convolution neural network is used to enhance the inaccurate semantic segment. As a result, the accurate semantic segment can be obtained in the decoder without requiring extra bits. The experimental results show that the proposed EDMS framework can get up to 35.31% BD-rate reduction over the HEVC-based (BPG) codec, 5% bitrate, and 24% encoding time saving compare to the state-of-the-art semantic-based image codec.

preprint2021arXiv

RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling Based Video Coding

Integrating deep learning techniques into the video coding framework gains significant improvement compared to the standard compression techniques, especially applying super-resolution (up-sampling) to down-sampling based video coding as post-processing. However, besides up-sampling degradation, the various artifacts brought from compression make super-resolution problem more difficult to solve. The straightforward solution is to integrate the artifact removal techniques before super-resolution. However, some helpful features may be removed together, degrading the super-resolution performance. To address this problem, we proposed an end-to-end restoration-reconstruction deep neural network (RR-DnCNN) using the degradation-aware technique, which entirely solves degradation from compression and sub-sampling. Besides, we proved that the compression degradation produced by Random Access configuration is rich enough to cover other degradation types, such as Low Delay P and All Intra, for training. Since the straightforward network RR-DnCNN with many layers as a chain has poor learning capability suffering from the gradient vanishing problem, we redesign the network architecture to let reconstruction leverages the captured features from restoration using up-sampling skip connections. Our novel architecture is called restoration-reconstruction u-shaped deep neural network (RR-DnCNN v2.0). As a result, our RR-DnCNN v2.0 outperforms the previous works and can attain 17.02% BD-rate reduction on UHD resolution for all-intra anchored by the standard H.265/HEVC. The source code is available at https://minhmanho.github.io/rrdncnn/.

Jinjia Zhou

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Coarse Semantic Injection for LLM-Conditioned Structured Indoor Prediction

B-DRRN: A Block Information Constrained Deep Recursive Residual Network for Video Compression Artifacts Reduction

Deep Preset: Blending and Retouching Photos with Color Style Transfer

Image Compression with Encoder-Decoder Matched Semantic Segmentation

RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling Based Video Coding