Source author record

Yiwen Xu

Yiwen Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Computer Vision, eess.IV, Multimedia, Artificial Intelligence, Multiagent Systems

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


preprint, 2026, arXiv

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time decision making attuned to environmental factors through the close integration of computational resources and data sources. Here we provide a comprehensive review of the current state of the art in edge deep learning, focusing on computer vision applications, in particular medical diagnostics. An overview of the foundational principles and technical advantages of edge deep learning is presented, emphasising the capacity of this technology to revolutionise a wide range of domains. Furthermore, we present a novel categorisation of edge hardware platforms based on performance and usage scenarios, facilitating platform selection and operational effectiveness. Following this, we dive into approaches to effectively implement deep neural networks on edge devices, encompassing methods such as lightweight design and model compression. Reviewing practical applications in the fields of computer vision in general and medical diagnostics in particular, we demonstrate the profound impact edge-deployed deep learning models can have in real-life situations. Finally, we provide an analysis of potential future directions and obstacles to the adoption of edge deep learning, with the intention to stimulate further investigations and advancements of intelligent edge deep learning solutions. This survey provides researchers and practitioners with a comprehensive reference shedding light on the critical role deep learning plays in the advancement of edge computing applications.

preprint, 2022, arXiv

Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

In the video coding process, the perceived quality of a compressed video is evaluated by full-reference quality evaluation metrics. However, it is difficult to obtain reference videos with perfect quality. To solve this problem, it is critical to design no-reference compressed video quality assessment algorithms, which assist in measuring the quality of experience on the server side and in resource allocation on the network side. Convolutional Neural Networks (CNNs) have shown their advantage in Video Quality Assessment (VQA) with promising successes in recent years. A large-scale quality database is very important for learning accurate and powerful compressed video quality metrics. In this work, a semi-automatic labeling method is adopted to build a large-scale compressed video quality database, which allows us to label a large number of compressed videos with a manageable human workload. The resulting Compressed Video quality database with Semi-Automatic Ratings (CVSAR) is, so far, the largest compressed video quality database. We train a no-reference compressed video quality assessment model with a 3D CNN for SpatioTemporal Feature Extraction and Evaluation (STFEE). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics and achieves promising generalization performance in cross-database tests. The CVSAR database and STFEE model will be made publicly available to facilitate reproducible research.

preprint, 2022, arXiv

Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation

Ever-growing multimedia traffic has underscored the importance of effective multimedia codecs. Among them, the latest lossy video coding standard, Versatile Video Coding (VVC), has been attracting the attention of the video coding community. However, the gain of VVC is achieved at the cost of significant encoding complexity, which creates the need for a fast encoder with comparable Rate Distortion (RD) performance. In this paper, we propose to optimize the VVC complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation. At the first stage, we employ a deep convolutional network to extract the spatial-temporal neighboring coding features. Then we fuse all reference features obtained by different convolutional kernels to determine an optimal intra coding depth. At the second stage, we employ a probability-based model and the spatial-temporal coherence to select the candidate partition modes within the optimal coding depth. Finally, these selected depths and partitions are executed whilst unnecessary computations are excluded. Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.

preprint, 2022, arXiv

SSIM-Variation-Based Complexity Optimization for Versatile Video Coding

To date, Versatile Video Coding (VVC) has better overall performance than High Efficiency Video Coding (HEVC). The Quadtree with Nested Multi-Type Tree (QTMT) coding block structure can substantially enhance video coding quality in VVC. However, the coding gain also leads to greater coding complexity. Therefore, this letter proposes a Fast Decision Scheme Based on Structural Similarity Index Metric Variation (FDS-SSIMV) to solve this problem. First, the Structural Similarity Index Metric Variation (SSIMV) characteristic among the sub coding units of the split mode is illustrated. Next, to evaluate the SSIMV value, SSIMV measure strategies are designed for different split modes. Then, the desired split modes are selected by the SSIMV values. Experimental results show that the proposed method achieves 64.74% average encoding Time Saving (TS) with a 2.79% Bjøntegaard Delta Bit Rate (BDBR), outperforming the benchmarks.
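To make the idea of comparing sub coding units by structural similarity concrete, here is a minimal sketch. It is not the FDS-SSIMV scheme from the letter: the abstract does not give the SSIMV formula or the decision thresholds, so the function names (`sub_units`, `ssim_variation`), the mode labels, and the definition "one minus the mean pairwise SSIM" are all assumptions made for illustration. Only the single-window SSIM formula itself is standard.

```python
from itertools import combinations
from statistics import fmean

# Standard SSIM stabilizing constants for 8-bit samples.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def ssim(x, y):
    """Single-window SSIM between two equally sized flat lists of samples."""
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    )


def sub_units(block, mode):
    """Split a 2D block (list of rows) into equally sized sub-units.

    Hypothetical mode labels: "BT_H" (horizontal binary split),
    "BT_V" (vertical binary split), "QT" (quad split). Ternary splits
    are left out because their sub-units differ in size.
    """
    h, w = len(block), len(block[0])

    def crop(r0, r1, c0, c1):
        return [v for row in block[r0:r1] for v in row[c0:c1]]

    if mode == "BT_H":
        return [crop(0, h // 2, 0, w), crop(h // 2, h, 0, w)]
    if mode == "BT_V":
        return [crop(0, h, 0, w // 2), crop(0, h, w // 2, w)]
    if mode == "QT":
        return [
            crop(0, h // 2, 0, w // 2), crop(0, h // 2, w // 2, w),
            crop(h // 2, h, 0, w // 2), crop(h // 2, h, w // 2, w),
        ]
    raise ValueError(f"unknown split mode: {mode}")


def ssim_variation(block, mode):
    """Assumed measure: 1 - mean pairwise SSIM among the sub-units."""
    parts = sub_units(block, mode)
    return 1.0 - fmean(ssim(a, b) for a, b in combinations(parts, 2))


# An 8x8 block whose top half is dark and bottom half is bright.
block = [[50] * 8 for _ in range(4)] + [[200] * 8 for _ in range(4)]
for mode in ("BT_H", "BT_V", "QT"):
    print(mode, round(ssim_variation(block, mode), 4))
```

For this block the vertical split yields two identical halves, so its variation is zero, while the horizontal split separates two dissimilar regions and scores highest. How such values map to keeping or skipping a split mode is exactly what the letter specifies and this sketch does not.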

preprint, 2022, arXiv

Timestamp-independent Haptic-Visual Synchronization

The rapid growth of haptic data significantly improves users' immersion during multimedia interaction. As a result, the study of the Haptic, Audio-Visual Environment (HAVE) has attracted the attention of the multimedia community. To realize such a system, a challenging task is the synchronization of multiple sensorial signals, which is critical to user experience. Despite audio-visual synchronization efforts, there is still a lack of haptic-aware multimedia synchronization models. In this work, we propose a timestamp-independent synchronization method for haptic-visual signal transmission. First, we exploit the sequential correlations during delivery and playback of a haptic-visual communication system. Second, we develop a key sample extraction of haptic signals based on the force feedback characteristics, and a key frame extraction of visual signals based on deep object detection. Third, we combine the key samples and frames to synchronize the corresponding haptic-visual signals. Without timestamps in the signal flow, the proposed method is still effective and more robust to complicated network conditions. Subjective evaluation also shows a significant improvement in user experience with the proposed method.
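The three steps can be pictured with a small sketch that aligns two streams from their content rather than from timestamps. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how key samples are chosen or matched, so the rising-edge threshold rule, the in-order pairing, the median estimator, and the names `key_samples` and `estimate_offset_ms` are all hypothetical. Key frame indices are passed in directly, since the object detector is out of scope here.

```python
from statistics import median


def key_samples(force, threshold):
    """Indices where |force| first rises to or above the threshold.

    Assumed stand-in for force-feedback-based key sample extraction:
    a key sample is a rising edge, such as the onset of a contact.
    """
    return [
        i for i in range(1, len(force))
        if abs(force[i]) >= threshold > abs(force[i - 1])
    ]


def estimate_offset_ms(haptic_keys, haptic_rate_hz, frame_keys, frame_rate_hz):
    """Median lag in milliseconds of visual key events behind haptic ones.

    Assumes both lists describe the same events in the same order and
    that positions are counted from the start of each stream.
    """
    haptic_ms = [1000 * i / haptic_rate_hz for i in haptic_keys]
    visual_ms = [1000 * k / frame_rate_hz for k in frame_keys]
    return median(v - h for h, v in zip(haptic_ms, visual_ms))


# Toy haptic trace with two contact onsets.
force = [0, 0, 0, 5, 5, 0, 0, 0, 6, 0]
print(key_samples(force, threshold=1))

# Key samples at 1 kHz against key frames at 25 fps (made-up indices).
lag = estimate_offset_ms([100, 600], 1000, [5, 18], 25)
print(f"delay the haptic stream by about {lag} ms")
```

The median is used so that a single mismatched pair does not dominate the estimate. A real system would also need to handle missing or spurious key events.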
