Source author record

Zhizheng Zhang

Zhizheng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, eess.IV, Artificial Intelligence, eess.SP, Machine Learning, Neurons and Cognition

Catalog footprint

What is connected

12 works

6 topics

4 close collaborators


preprint, 2022, arXiv

Dual Aspect Self-Attention based on Transformer for Remaining Useful Life Prediction

Remaining useful life (RUL) prediction is one of the key technologies of condition-based maintenance, which is important for maintaining the reliability and safety of industrial equipment. Massive industrial measurement data has effectively improved the performance of data-driven RUL prediction methods. While deep learning has achieved great success in RUL prediction, existing methods have difficulty processing long sequences and extracting information from both the sensor and time step aspects. In this paper, we propose Dual Aspect Self-attention based on Transformer (DAST), a novel deep RUL prediction method with an encoder-decoder structure based purely on self-attention, without any RNN/CNN module. DAST consists of two encoders, which work in parallel to simultaneously extract features of different sensors and time steps. Because they rely solely on self-attention, the DAST encoders are more effective in processing long data sequences and are capable of adaptively learning to focus on the more important parts of the input. Moreover, the parallel feature extraction design avoids mutual influence between information from the two aspects. Experiments on two widely used turbofan engine datasets show that our method significantly outperforms the state-of-the-art RUL prediction methods.
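
The parallel two-encoder idea in this abstract can be illustrated with a minimal sketch. It is not the authors' DAST implementation: it assumes plain scaled dot-product self-attention with identity query/key/value projections, no decoder, and hypothetical function names (`self_attention`, `dual_aspect_features`). It only shows how the same sensor window can be attended over once along the time-step axis and once along the sensor axis.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys and values are the tokens themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        output.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        )
    return output


def transpose(matrix):
    """Swap the two axes of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def dual_aspect_features(window):
    """window: T time steps, each a list of S sensor readings.

    Returns (time_features, sensor_features), computed independently so
    that neither aspect influences the other.
    """
    time_features = self_attention(window)                # tokens are time steps
    sensor_features = self_attention(transpose(window))   # tokens are sensors
    return time_features, sensor_features
```

For a window of T time steps and S sensors, the first result has T rows of length S and the second has S rows of length T; a real model would add learned projections and fuse the two before decoding.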

preprint, 2022, arXiv

Image Coding for Machines with Omnipotent Feature Learning

Image Coding for Machines (ICM) aims to compress images for AI task analysis rather than for human perception. Learning a kind of feature that is both general (for AI tasks) and compact (for compression) is pivotal for its success. In this paper, we attempt to develop an ICM framework by learning universal features while also considering compression. We name such features omnipotent features and the corresponding framework Omni-ICM. Since self-supervised learning (SSL) improves feature generalization, we integrate it with the compression task in the Omni-ICM framework to learn omnipotent features. However, it is non-trivial to coordinate semantics modeling in SSL with redundancy removal in compression, so we design a novel information filtering (IF) module between them that co-optimizes instance distinguishment and entropy minimization to adaptively drop information that is weakly related to AI tasks (e.g., some texture redundancy). Unlike previous task-specific solutions, Omni-ICM can directly support AI task analysis based on the learned omnipotent features without joint training or extra transformation. Albeit simple and intuitive, Omni-ICM significantly outperforms existing traditional and learning-based codecs on multiple fundamental vision tasks.

preprint, 2022, arXiv

Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation

Unsupervised domain adaptive person re-identification (ReID) has been extensively investigated to mitigate the adverse effects of domain gaps. These works assume that the target domain data are accessible all at once. However, for real-world streaming data, this assumption hinders timely adaptation to changing data statistics and sufficient exploitation of increasing samples. In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID. This is challenging because it requires the model to continuously adapt to unlabeled data in the target environments while alleviating catastrophic forgetting for such a fine-grained person retrieval task. We design an effective scheme for this task, dubbed CLUDA-ReID, where anti-forgetting is harmoniously coordinated with adaptation. Specifically, a meta-based Coordinated Data Replay strategy is proposed to replay old data and update the network with a coordinated optimization direction for both adaptation and memorization. Moreover, we propose Relational Consistency Learning for old knowledge distillation/inheritance in line with the objective of retrieval-based tasks. We set up two evaluation settings to simulate practical application scenarios. Extensive experiments demonstrate the effectiveness of our CLUDA-ReID both for scenarios with stationary target streams and for scenarios with dynamic target streams.

preprint, 2022, arXiv

Versatile Learned Video Compression

Learned video compression methods have demonstrated great promise in catching up with traditional video codecs in their rate-distortion (R-D) performance. However, existing learned video compression schemes are limited by the binding of the prediction mode to a fixed network framework. They are unable to support various inter prediction modes and are thus inapplicable to various scenarios. In this paper, to break this limitation, we propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes. Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal space. The voxel flows convey the temporal reference position, which helps to decouple inter prediction modes from framework design. Second, for multiple-reference-frame prediction, we apply a flow prediction module to predict accurate motion trajectories with unified polynomial functions. We show that the flow prediction module can largely reduce the transmission cost of voxel flows. Experimental results demonstrate that our proposed VLVC not only supports versatile compression in various settings, but is also the first end-to-end learned video compression method that outperforms the latest VVC/H.266 standard reference software in terms of MS-SSIM.

preprint, 2021, arXiv

Learning Omni-frequency Region-adaptive Representations for Real Image Super-Resolution

Traditional single image super-resolution (SISR) methods that focus on a single, uniform degradation (i.e., bicubic down-sampling) typically perform poorly when applied to real-world low-resolution (LR) images because of complicated realistic degradations. The key to solving this more challenging real image super-resolution (RealSR) problem lies in learning feature representations that are both informative and content-aware. In this paper, we propose an Omni-frequency Region-adaptive Network (ORNet) to address both challenges; here we call the features of all low, middle and high frequencies omni-frequency features. Specifically, we start from the frequency perspective and design a Frequency Decomposition (FD) module to separate different frequency components and comprehensively compensate for the information lost in real LR images. Then, considering that different regions of a real LR image lose different frequency information, we further design a Region-adaptive Frequency Aggregation (RFA) module that leverages dynamic convolution and spatial attention to adaptively restore frequency components for different regions. Extensive experiments endorse the effective and scenario-agnostic nature of our ORNet for RealSR.

preprint, 2020, arXiv

3-D Context Entropy Model for Improved Practical Image Compression

In this paper, we present our image compression framework designed for the CLIC 2020 competition. Our method is based on a Variational AutoEncoder (VAE) architecture strengthened with residual structures. In short, we make three noteworthy improvements. First, we propose a 3-D context entropy model that takes advantage of the known latent representation at the current spatial locations for better entropy estimation. Second, a lightweight residual structure is adopted for feature learning during entropy estimation. Finally, an effective training strategy is introduced for practical adaptation to different resolutions. Experimental results indicate that our image compression method achieves 0.9775 MS-SSIM on the CLIC validation set and 0.9809 MS-SSIM on the test set.

preprint, 2020, arXiv

Learned Video Compression with Feature-level Residuals

In this paper, we present an end-to-end video compression network for the P-frame challenge at CLIC. We focus on deep neural network (DNN) based video compression and improve current frameworks in three respects. First, we notice that pixel-space residuals are sensitive to the prediction errors of optical-flow-based motion compensation. To suppress this influence, we propose to compress the residuals of image features rather than the residuals of image pixels. Furthermore, we combine the advantages of both pixel-level and feature-level residual compression methods by model ensembling. Finally, we propose a step-by-step training strategy to improve the training efficiency of the whole framework. Experimental results indicate that our proposed method achieves 0.9968 MS-SSIM on the CLIC validation set and 0.9967 MS-SSIM on the test set.

preprint, 2020, arXiv

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification

Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to redundancy among frames, newly revealed appearance, occlusion, and motion blur. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. In order to determine the contribution/importance of a spatial-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves state-of-the-art performance on three benchmark datasets.

preprint, 2020, arXiv

Multi-scale Grouped Dense Network for VVC Intra Coding

The Versatile Video Coding (H.266/VVC) standard achieves better image quality at the same bit rate than other conventional image codecs, such as BPG and JPEG. However, it remains attractive and challenging to improve image quality at a high compression ratio on the basis of traditional coding techniques. In this paper, we design the multi-scale grouped dense network (MSGDN) to further reduce compression artifacts by combining multi-scale and grouped dense blocks, which are integrated as the post-processing network of VVC intra coding. In addition, to improve the subjective quality of compressed images, we also present a generative adversarial network (MSGDN-GAN) that uses our MSGDN as the generator. In extensive experiments on the validation set, our MSGDN trained with MSE losses yields an average PSNR of 32.622 as team IMC at a bit rate of 0.15 in the Lowrate track. Moreover, our MSGDN-GAN achieves better subjective performance.

preprint, 2020, arXiv

Relation-Aware Global Attention for Person Re-identification

For person re-identification (re-id), attention mechanisms have become attractive because they aim at strengthening discriminative features and suppressing irrelevant ones, which matches the key to re-id well, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images, where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module that captures global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and the local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.
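
The "stack the relations with the feature itself" step can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked repository: affinities are plain dot products, the shallow convolutional model is replaced by a single hypothetical linear layer with a sigmoid (`weights`, `bias`), and the function name `relation_aware_attention` is invented for this example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relation_aware_attention(features, weights, bias=0.0):
    """Compute one attention score per feature position.

    features: N positions (e.g. in raster scan order), each a length-D vector.
    weights:  length D + N vector of a hypothetical linear scoring layer,
              standing in for the shallow convolutional model.
    Returns (scores, attended), where attended[i] = scores[i] * features[i].
    """
    # Pairwise affinities between every pair of positions (N x N).
    affinity = [[dot(fi, fj) for fj in features] for fi in features]

    scores = []
    attended = []
    for i, feature in enumerate(features):
        # Stack the local feature with its relations to all positions.
        stacked = list(feature) + affinity[i]
        logit = dot(weights, stacked) + bias
        score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid gate in (0, 1)
        scores.append(score)
        attended.append([score * x for x in feature])
    return scores, attended
```

With all-zero weights every position receives a score of 0.5, which makes the gating easy to check by hand; learned weights would let positions with distinctive relation patterns receive higher scores.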

preprint, 2016, arXiv

Bi-phase age-related brain gray matter magnetic resonance T1rho relaxation time change

Objectives: To investigate normative values and age-related change of brain magnetic resonance T1rho relaxation at 1.5 T. Methods: 20 males (age: 40.7+/-15.5 years, range: 22-68 years) and 22 females (age: 38.5+/-14.8 years, range: 21-62 years) were scanned at 1.5 Tesla using a 3D fluid-suppressed turbo spin echo sequence. Regions of interest (ROIs) were obtained by atlas-based tissue segmentation, and T1rho was calculated by fitting the mean value to a monoexponential model. The correlation between T1rho relaxation of brain gray matter regions and age was investigated. Results: A regional difference among individual gray matter areas was noted, with the hippocampus (98.37+/-5.37 msec) and amygdala (94.95+/-4.34 msec) having the highest values, and the pallidum (83.81+/-5.49 msec) and putamen (83.93+/-4.76 msec) the lowest. T1rho values decreased slowly (mean slope: -0.256) and significantly (p<0.05) with age in gray matter for subjects younger than 40 years old, while for subjects older than 40 years old there was no significant correlation between T1rho relaxation and age. Conclusion: T1rho relaxation demonstrates a bi-phase change with age in adults of 22-68 years.
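
The monoexponential fit mentioned in the Methods can be sketched as below. The abstract does not give the fitting procedure, so this assumes the commonly used signal model S(TSL) = S0 * exp(-TSL / T1rho) and swaps in a log-linear least-squares fit for simplicity; the function name `fit_t1rho` and the spin-lock times in the usage note are hypothetical.

```python
import math


def fit_t1rho(spin_lock_times_ms, signals):
    """Fit S(TSL) = S0 * exp(-TSL / T1rho) by linear regression on log(S).

    spin_lock_times_ms: spin-lock durations (TSL) in milliseconds.
    signals:            mean ROI signal at each TSL (must be positive).
    Returns (t1rho_ms, s0).
    """
    xs = list(spin_lock_times_ms)
    ys = [math.log(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least squares for log(S) = log(S0) - TSL / T1rho.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return -1.0 / slope, math.exp(intercept)
```

As a usage example, noise-free signals generated with T1rho = 90 ms at assumed spin-lock times of 1, 10, 20, 30, 40 and 50 ms are recovered to within floating-point error. With noisy data a nonlinear fit on the raw signal is often preferred, because the log transform reweights low-signal points.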

preprint, 2014, arXiv

ESmodels: An Epistemic Specification Solver

(To appear in Theory and Practice of Logic Programming (TPLP)) ESmodels is designed and implemented as an experimental platform to investigate the semantics, language, related reasoning algorithms, and possible applications of epistemic specifications. We first give the epistemic specification language of ESmodels and its semantics. The language employs only one modal operator, K, but we prove that it is able to represent luxuriant modal operators by presenting transformation rules. Then, we describe the basic algorithms and optimization approaches used in ESmodels. After that, we discuss possible applications of ESmodels in conformant planning and constraint satisfaction. Finally, we conclude with perspectives.
