Source author record

Wei Zhou

Wei Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

75 works

38 topics

4 close collaborators


preprint · 2026 · arXiv

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

AI applications are increasingly being introduced into digital health. While technical performance has advanced rapidly, successful deployment mainly depends on consumer attitudes, especially to patient-facing applications. However, most existing research examines consumer attitudes towards healthcare AI at an abstract level rather than in response to concrete artefacts. We report a mixed-methods survey study in Australia (N=275) examining consumer readiness, acceptance, trust, and risk perceptions of healthcare AI, combined with a scenario-based evaluation of an AI-generated versus clinician-written consultation summary. Participants expressed moderate optimism and strong perceived usefulness and ease of use, but also substantial concerns about accuracy, safety, and data use. In the scenario task, the AI-generated summary was strongly preferred for quality, empathy, and overall usefulness, yet identification of the AI summary was near chance. Findings show that consumers judge AI through concrete communication quality and visible human governance, underscoring the need for clinically supervised deployment frameworks beyond technical performance alone.

preprint · 2025 · arXiv

Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM

Remote sensing change understanding (RSCU) is essential for analyzing remote sensing images and understanding how human activities affect the environment. However, existing datasets lack deep understanding and interaction across the diverse change captioning, counting, and localization tasks. To tackle these gaps, we construct ChangeIMTI, a new large-scale interactive multi-task instruction dataset that encompasses four complementary tasks: change captioning, binary change classification, change counting, and change localization. Building upon this new dataset, we further design a novel vision-guided vision-language model (ChangeVG) with dual-granularity awareness for bi-temporal remote sensing images (i.e., two remote sensing images of the same area taken at different times). The introduced vision-guided module is a dual-branch architecture that synergistically combines fine-grained spatial feature extraction with high-level semantic summarization. These enriched representations further serve as auxiliary prompts to guide large vision-language models (VLMs) (e.g., Qwen2.5-VL-7B) during instruction tuning, thereby facilitating hierarchical cross-modal learning. We conduct extensive experiments across the four tasks to demonstrate the superiority of our approach. Remarkably, on the change captioning task, our method outperforms the strongest method, Semantic-CC, by 1.39 points on the comprehensive S*m metric, which integrates semantic similarity and descriptive accuracy to provide an overall evaluation of change captions. Moreover, we perform a series of ablation studies to examine the critical components of our method. The source code and associated data for this work are publicly available on GitHub.

preprint · 2024 · arXiv

Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion, among others. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision-making in AVs, with encouraging results. However, the majority of those studies focus on a single-vehicle setting, and lane-changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision-making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety and driver comfort.

preprint · 2022 · arXiv

A Brief Survey on Adaptive Video Streaming Quality Assessment

Quality of experience (QoE) assessment for adaptive video streaming plays a significant role in advanced network management systems. It is especially challenging in the case of dynamic adaptive streaming over HTTP (DASH), which has increasingly complex characteristics, including additional playback issues. In this paper, we provide a brief overview of adaptive video streaming quality assessment. Based on our review of related works, we analyze and compare different variations of objective QoE assessment models, with or without machine learning techniques, for adaptive video streaming. Through the performance analysis, we observe that hybrid models perform better than both quality-of-service (QoS) driven QoE approaches and signal fidelity measurement. Moreover, the machine learning-based model slightly outperforms the model without machine learning in the same setting. In addition, we find that existing video streaming QoE assessment models still have limited performance, which makes them difficult to apply in practical communication systems. Therefore, based on the success of deep learned feature representations for traditional video quality prediction, we also apply an off-the-shelf deep convolutional neural network (DCNN) to evaluate the perceptual quality of streaming videos, taking the spatio-temporal properties of streaming videos into consideration. Experiments demonstrate its superiority, which sheds light on the future development of specifically designed deep learning frameworks for adaptive video streaming quality assessment. We believe this survey can serve as a guideline for QoE assessment of adaptive video streaming.

preprint · 2022 · arXiv

Blind Quality Assessment of 3D Dense Point Clouds with Structure Guided Resampling

Objective quality assessment of 3D point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-reference metrics are still scarce for 3D point clouds with large-scale irregularly distributed 3D points. Therefore, in this paper, we propose an objective point cloud quality index with Structure Guided Resampling (SGR) to automatically evaluate the perceptual visual quality of 3D dense point clouds. The proposed SGR is a general-purpose blind quality assessment method that requires no reference information. Specifically, considering that the human visual system (HVS) is highly sensitive to structure information, we first exploit the unique normal vectors of point clouds to perform regional pre-processing, which consists of keypoint resampling and local region construction. Then, we extract three groups of quality-related features: 1) geometry density features; 2) color naturalness features; 3) angular consistency features. Both the cognitive peculiarities of the human brain and naturalness regularity are involved in the designed quality-aware features, which can capture the most vital aspects of distorted 3D point clouds. Extensive experiments on several publicly available subjective point cloud quality databases validate that our proposed SGR can compete with state-of-the-art full-reference, reduced-reference, and no-reference quality assessment algorithms.

preprint · 2022 · arXiv

Deep Decomposition and Bilinear Pooling Network for Blind Night-Time Image Quality Evaluation

Blind image quality assessment (BIQA), which aims to accurately predict image quality without any pristine reference information, has been extensively studied over the past decades. In particular, great progress has been achieved with the help of deep neural networks. However, BIQA for night-time images (NTIs), which usually suffer from complicated authentic distortions such as reduced visibility, low contrast, additive noise, and color distortions, remains less investigated. These diverse authentic degradations particularly challenge the design of effective deep neural networks for blind NTI quality evaluation (NTIQE). In this paper, we propose a novel deep decomposition and bilinear pooling network (DDB-Net) to better address this issue. The DDB-Net contains three modules, i.e., an image decomposition module, a feature encoding module, and a bilinear pooling module. The image decomposition module is inspired by the Retinex theory and decouples the input NTI into an illumination layer component responsible for illumination information and a reflection layer component responsible for content information. Then, the feature encoding module learns feature representations of the degradations rooted in each of the two decoupled components separately. Finally, by modeling illumination-related and content-related degradations as two-factor variations, the two feature sets are bilinearly pooled together to form a unified representation for quality prediction. The superiority of the proposed DDB-Net has been validated by extensive experiments on several benchmark datasets. The source code will be made available soon.
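The bilinear pooling step described above can be illustrated with the generic bilinear-pooling recipe: average the outer product of two per-location feature vectors, flatten it, then apply a signed square root and L2 normalization (common post-processing in bilinear CNNs). This is a minimal sketch, not DDB-Net's implementation; the function name `bilinear_pool` and the toy feature values are hypothetical, and the two inputs merely stand in for the illumination and content feature sets.

```python
import math
from typing import List

Vector = List[float]


def bilinear_pool(feats_a: List[Vector], feats_b: List[Vector]) -> Vector:
    """Average the outer products a_i b_i^T over locations, flatten, then apply
    a signed square root and L2 normalization."""
    if not feats_a or len(feats_a) != len(feats_b):
        raise ValueError("need the same non-zero number of locations in both inputs")
    c_a, c_b = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (c_a * c_b)
    for a, b in zip(feats_a, feats_b):
        for i in range(c_a):
            for j in range(c_b):
                pooled[i * c_b + j] += a[i] * b[j]
    n = len(feats_a)
    pooled = [v / n for v in pooled]
    # Signed square root dampens large activations while keeping their sign.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Hypothetical toy inputs: one spatial location, 2 illumination channels, 3 content channels.
illumination = [[1.0, 0.0]]
content = [[1.0, 2.0, 0.0]]
print(len(bilinear_pool(illumination, content)))  # 6 = 2 * 3 pooled dimensions
```

The pooled vector has one entry per pair of channels, which is how the two factors (illumination and content) interact multiplicatively before quality regression.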

preprint · 2022 · arXiv

FasterX: Real-Time Object Detection Based on Edge GPUs for UAV Applications

Real-time object detection on Unmanned Aerial Vehicles (UAVs) is a challenging problem due to the limited computing resources of edge GPU devices acting as Internet of Things (IoT) nodes. To solve this problem, in this paper, we propose a novel lightweight deep learning architecture named FasterX, based on the YOLOX model, for real-time object detection on edge GPUs. First, we design an effective and lightweight PixSF head to replace the original head of YOLOX to better detect small objects, which can be further embedded in depthwise separable convolution (DS Conv) to achieve a lighter head. Then, a slimmer structure in the Neck layer, termed SlimFPN, is developed to reduce the parameters of the network, trading off accuracy against speed. Furthermore, we embed an attention module in the Head layer to improve the feature extraction of the prediction head. Meanwhile, we also improve the label assignment strategy and loss function to alleviate the category imbalance and box optimization problems of the UAV dataset. Finally, auxiliary heads are presented for online distillation to improve the ability of position embedding and feature extraction in the PixSF head. The performance of our lightweight models is validated experimentally on the NVIDIA Jetson NX and Jetson Nano GPU embedded platforms. Extensive experiments show that FasterX models achieve a better trade-off between accuracy and latency on the VisDrone2021 dataset compared to state-of-the-art models.
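To see why embedding the head in depthwise separable convolution makes it lighter, compare weight counts: a standard k×k convolution needs k·k·C_in·C_out weights, while a depthwise k×k convolution followed by a 1×1 pointwise convolution needs k·k·C_in + C_in·C_out. The sketch below assumes bias-free layers and uses a hypothetical layer size (3×3 kernel, 256 input and 256 output channels); it is not taken from FasterX.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a bias-free k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out


def ds_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a bias-free depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer size (not from FasterX): 3 x 3 kernel, 256 -> 256 channels.
std = standard_conv_params(3, 256, 256)   # 589824
ds = ds_conv_params(3, 256, 256)          # 67840
print(std, ds, round(std / ds, 2))
```

With these assumed sizes the separable version uses roughly 8.7 times fewer weights; the actual saving in the PixSF head depends on channel widths the abstract does not give.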

preprint · 2022 · arXiv

Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information. Recent works have been devoted to leveraging text summarization and have achieved promising results. However, these summarization-based methods do not take full advantage of the summary, for instance by ignoring the inherent interactions between the summary and the document. As a result, they limit the representation's ability to express the major points of the document, which are highly indicative of the key sentiment. In this paper, we study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA. A Hierarchical Interaction Network (HIN) is proposed to explore bidirectional interactions between the summary and the document at multiple granularities and to learn subject-oriented document representations for sentiment classification. Furthermore, we design a Sentiment-based Rethinking mechanism (SR) that refines the HIN with sentiment label information to learn a more sentiment-aware document representation. We extensively evaluate our proposed models on three public datasets. The experimental results consistently demonstrate the effectiveness of our proposed models and show that HIN-SR outperforms various state-of-the-art methods.

preprint · 2022 · arXiv

Interfacial Mixing Effect in a Promising Skyrmionic Material: Ferrimagnetic Mn$_4$N

Interfacial mixing of elements is a well-known phenomenon found in thin film deposition. For thin-film magnetic heterostructures, interfacial compositional inhomogeneities can have drastic effects on the resulting functionalities. As such, care must be taken to characterize the compositional and magnetic properties of thin films intended for device use. Recently, ferrimagnetic Mn$_4$N thin films have drawn considerable interest due to exhibiting perpendicular magnetic anisotropy, high domain-wall mobility, and good thermal stability. In this study, we employed X-ray photoelectron spectroscopy (XPS) and polarized neutron reflectometry (PNR) measurements to investigate the interfaces of an epitaxially-grown MgO/Mn$_4$N/Pt trilayer deposited at 450 $^{\circ}$C. XPS revealed the thickness of elemental mixing regions of near 5 nm at both interfaces. Using PNR, we found that these interfaces exhibit essentially zero net magnetization at room temperature. Despite the high-temperature deposition at 450 $^{\circ}$C, the thickness of mixing regions is comparable to those observed in magnetic films deposited at room temperature. Micromagnetic simulations show that this interfacial mixing should not deter the robust formation of small skyrmions, consistent with a recent experiment. The results obtained are encouraging in terms of the potential of integrating thermally stable Mn$_4$N into future spintronic devices.

preprint · 2022 · arXiv

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, existing works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning that takes phonemes as input. Pre-training only with phonemes as input can alleviate the input mismatch but lacks the ability to model rich representations and semantic information due to the limited phoneme vocabulary. In this paper, we propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance the learning capability. Specifically, we merge adjacent phonemes into sup-phonemes and combine the phoneme sequence and the merged sup-phoneme sequence as the model input, which can enhance the model's capacity to learn rich contextual representations. Experimental results demonstrate that our proposed Mixed-Phoneme BERT significantly improves TTS performance, with a 0.30 CMOS gain compared with the FastSpeech 2 baseline. Mixed-Phoneme BERT achieves a 3x inference speedup and similar voice quality to the previous TTS pre-trained model PnG BERT.

preprint · 2022 · arXiv

Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion-Cause Pair Extraction

The Emotion-Cause Pair Extraction (ECPE) task aims to extract emotions and causes as pairs from documents. We observe that the relative distance distribution of emotions and causes is extremely imbalanced in the typical ECPE dataset. Existing methods use a fixed-size window to capture relations between neighboring clauses. However, they neglect the effective semantic connections between distant clauses, leading to poor generalization to position-insensitive data. To alleviate the problem, we propose a novel Multi-Granularity Semantic Aware Graph model (MGSAG) that incorporates fine-grained and coarse-grained semantic features jointly, without regard to distance limitation. In particular, we first explore semantic dependencies between clauses and keywords extracted from the document that convey fine-grained semantic features, obtaining keyword-enhanced clause representations. In addition, a clause graph is established to model coarse-grained semantic relations between clauses. Experimental results indicate that MGSAG surpasses the existing state-of-the-art ECPE models. In particular, MGSAG significantly outperforms other models on position-insensitive data.

preprint · 2022 · arXiv

No-Reference Light Field Image Quality Assessment Based on Spatial-Angular Measurement

Light field image quality assessment (LFI-QA) is a significant and challenging research problem. It helps to better guide light field acquisition, processing and applications. However, only a few objective models have been proposed, and none of them fully considers the intrinsic factors affecting LFI quality. In this paper, we propose a No-Reference Light Field image Quality Assessment (NR-LFQA) scheme, where the main idea is to quantify the LFI quality degradation by evaluating the spatial quality and angular consistency. We first measure the spatial quality deterioration by capturing the naturalness distribution of the light field cyclopean image array, which is formed when a human observes the LFI. Then, as a transformed representation of the LFI, the Epipolar Plane Image (EPI) contains the slopes of lines and carries the angular information. Therefore, the EPI is utilized to extract global and local features from the LFI to measure angular consistency degradation. Specifically, the distribution of the gradient direction map of the EPI is proposed to measure the global angular consistency distortion in the LFI. We further propose the weighted local binary pattern to capture the characteristics of local angular consistency degradation. Extensive experimental results on four publicly available LFI quality datasets demonstrate that the proposed method outperforms state-of-the-art 2D, 3D, multi-view, and LFI quality assessment algorithms.
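The weighted local binary pattern mentioned above builds on the plain 8-neighbour LBP operator, sketched here for a small grayscale image stored as nested lists. This is a minimal illustration under stated assumptions: the paper's weighting scheme and its application to EPIs are not reproduced, and the helpers `lbp_codes` and `lbp_histogram` are hypothetical names.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise from top-left; bit i is set when neighbour i >= centre.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: List[List[float]]) -> List[List[int]]:
    """Plain (unweighted) 8-neighbour LBP code for every interior pixel."""
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img: List[List[float]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    codes = [code for row in lbp_codes(img) for code in row]
    if not codes:
        raise ValueError("image must be at least 3 x 3")
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1
    return [v / len(codes) for v in hist]


# Hypothetical 3 x 3 patch: neighbours at top-left, top-right and right are >= the centre.
patch = [[9, 1, 9],
         [1, 5, 9],
         [1, 1, 1]]
print(lbp_codes(patch))
```

A weighted variant would change how each pixel contributes to the histogram rather than how the code itself is formed; how the paper chooses those weights is not stated in the abstract.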

preprint · 2022 · arXiv

On Language Model Integration for RNN Transducer based Speech Recognition

The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of the RNN-Transducer (RNN-T) can limit the performance of LM integration methods such as simple shallow fusion. A Bayesian interpretation suggests removing this sequence prior as ILM correction. In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction, which is further verified experimentally with detailed analysis. We also propose an exact-ILM training framework by extending the proof given for the hybrid autoregressive transducer, which enables a theoretical justification for other ILM approaches. A systematic comparison is conducted for in-domain and cross-domain evaluation on the Librispeech and TED-LIUM Release 2 corpora, respectively. Our proposed exact-ILM training can further improve the best ILM method.
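As a minimal sketch, assuming the commonly used log-linear form of shallow fusion with ILM correction, a hypothesis $y$ for input $x$ is scored as $\log p_{\text{RNN-T}}(y \mid x) + \lambda_{\text{LM}} \log p_{\text{LM}}(y) - \lambda_{\text{ILM}} \log p_{\text{ILM}}(y)$. The sketch rescores an n-best list at sequence level; the `Hypothesis` fields, scale values and log-probabilities are hypothetical, and it does not implement any of the paper's specific ILM estimation methods.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    log_p_rnnt: float  # log p(y | x) from the RNN-T
    log_p_lm: float    # log p(y) from the external LM
    log_p_ilm: float   # estimated internal-LM log-probability of y


def fused_score(h: Hypothesis, lm_scale: float, ilm_scale: float) -> float:
    """Shallow fusion plus ILM correction: add the external LM, subtract the internal-LM prior."""
    return h.log_p_rnnt + lm_scale * h.log_p_lm - ilm_scale * h.log_p_ilm


def rescore(hyps: List[Hypothesis], lm_scale: float, ilm_scale: float) -> List[Hypothesis]:
    """Return hypotheses sorted from best to worst fused score."""
    return sorted(hyps, key=lambda h: fused_score(h, lm_scale, ilm_scale), reverse=True)


# Hypothetical n-best list with made-up log-probabilities.
hyps = [
    Hypothesis("hypothesis a", log_p_rnnt=-1.0, log_p_lm=-6.0, log_p_ilm=-1.0),
    Hypothesis("hypothesis b", log_p_rnnt=-2.0, log_p_lm=-5.0, log_p_ilm=-4.0),
]
print(rescore(hyps, lm_scale=0.5, ilm_scale=0.0)[0].text)  # plain shallow fusion
print(rescore(hyps, lm_scale=0.5, ilm_scale=0.4)[0].text)  # with ILM correction
```

With these made-up numbers, setting `ilm_scale` to 0.4 flips the ranking in favour of the second hypothesis, which the internal LM prior had been penalising. In a real decoder the same terms are often applied per output label during beam search rather than only to finished hypotheses.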

preprint · 2022 · arXiv

Quality Assessment of Image Super-Resolution: Balancing Deterministic and Statistical Fidelity

There has been a growing interest in developing image super-resolution (SR) algorithms that convert low-resolution (LR) to higher resolution images, but automatically evaluating the visual quality of super-resolved images remains a challenging problem. Here we look at the problem of SR image quality assessment (SR IQA) in a two-dimensional (2D) space of deterministic fidelity (DF) versus statistical fidelity (SF). This allows us to better understand the advantages and disadvantages of existing SR algorithms, which produce images at different clusters in the 2D space of (DF, SF). Specifically, we observe an interesting trend from more traditional SR algorithms that are typically inclined to optimize for DF while losing SF, to more recent generative adversarial network (GAN) based approaches that by contrast exhibit strong advantages in achieving high SF but sometimes appear weak at maintaining DF. Furthermore, we propose an uncertainty weighting scheme based on content-dependent sharpness and texture assessment that merges the two fidelity measures into an overall quality prediction named the Super Resolution Image Fidelity (SRIF) index, which demonstrates superior performance against state-of-the-art IQA models when tested on subject-rated datasets.

preprint · 2022 · arXiv

Rank-Sensitive Computation of the Rank Profile of a Polynomial Matrix

Consider a matrix $\mathbf{F} \in \mathbb{K}[x]^{m \times n}$ of univariate polynomials over a field $\mathbb{K}$. We study the problem of computing the column rank profile of $\mathbf{F}$. To this end we first give an algorithm which improves the minimal kernel basis algorithm of Zhou, Labahn, and Storjohann (Proceedings ISSAC 2012). We then provide a second algorithm which computes the column rank profile of $\mathbf{F}$ with a rank-sensitive complexity of $\tilde{O}(r^{\omega-2} n (m+D))$ operations in $\mathbb{K}$. Here, $r$ is the rank of $\mathbf{F}$, $D$ is the sum of row degrees of $\mathbf{F}$, $\omega$ is the exponent of matrix multiplication, and $\tilde{O}(\cdot)$ hides logarithmic factors.
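For intuition about the object being computed: the column rank profile is the lexicographically smallest list of column indices whose columns are linearly independent and span the column space, which is exactly the set of pivot columns found by Gaussian elimination. The sketch below computes it for a constant matrix over $\mathbb{Q}$ (the degree-zero case) using exact `Fraction` arithmetic; it is a textbook illustration, not the rank-sensitive polynomial-matrix algorithm of the paper, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def column_rank_profile(matrix: Sequence[Sequence[int]]) -> List[int]:
    """Return the 0-based pivot-column indices found by Gaussian elimination over Q.

    Column j is a pivot exactly when it is not a linear combination of
    columns 0..j-1, so the result is the column rank profile."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    m = len(rows)
    n = len(rows[0]) if m else 0
    profile: List[int] = []
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break  # rank already equals the number of rows
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pr = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if pr is None:
            continue  # column depends on the earlier pivot columns
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        # Eliminate the entries below the pivot.
        for r in range(pivot_row + 1, m):
            factor = rows[r][col] / rows[pivot_row][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        profile.append(col)
        pivot_row += 1
    return profile


# Column 1 is twice column 0, so it is skipped; column 2 is independent.
print(column_rank_profile([[1, 2, 3],
                           [2, 4, 7]]))
```

The length of the returned list is the rank $r$; the paper's contribution is obtaining the same profile for polynomial entries with cost governed by $r$ rather than by $m$ and $n$ alone.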

preprint · 2022 · arXiv

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Coronary CT Angiography (CCTA) is susceptible to various distortions (e.g., artifacts and noise), which severely compromise the accurate diagnosis of cardiovascular diseases. An appropriate CCTA Vessel-level Image Quality Assessment (CCTA VIQA) algorithm can be used to reduce the risk of misdiagnosis. The primary challenge of CCTA VIQA is that the local part of the coronary artery that determines the final quality is hard to locate. To tackle this challenge, we formulate CCTA VIQA as a multiple-instance learning (MIL) problem and exploit a Transformer-based MIL backbone (termed T-MIL) to aggregate the multiple instances along the coronary centerline into the final quality. However, not all instances are informative for the final quality. Some quality-irrelevant/negative instances interfere with accurate quality assessment (e.g., instances covering only background, or instances in which the coronary artery is not identifiable). Therefore, we propose a Progressive Reinforcement learning based Instance Discarding module (termed PRID) to progressively remove quality-irrelevant/negative instances for CCTA VIQA. Based on these two modules, we propose a Reinforced Transformer Network (RTN) for automatic CCTA VIQA based on end-to-end optimization. Extensive experimental results demonstrate that our proposed method achieves state-of-the-art performance on the real-world CCTA dataset, exceeding previous MIL methods by a large margin.

preprint · 2022 · arXiv

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, consisting of intra- and inter-speaker dependencies, modeling speaker-specific information plays a vital role in ERC. Although existing studies have proposed various methods of speaker interaction modeling, they cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. In addition, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED.

preprint · 2022 · arXiv

Tensor Oriented No-Reference Light Field Image Quality Assessment

Light field image (LFI) quality assessment is becoming more and more important, which helps to better guide the acquisition, processing and application of immersive media. However, due to the inherent high dimensional characteristics of LFI, the LFI quality assessment turns into a multi-dimensional problem that requires consideration of the quality degradation in both spatial and angular dimensions. Therefore, we propose a novel Tensor oriented No-reference Light Field image Quality evaluator (Tensor-NLFQ) based on tensor theory. Specifically, since the LFI is regarded as a low-rank 4D tensor, the principal components of four oriented sub-aperture view stacks are obtained via Tucker decomposition. Then, the Principal Component Spatial Characteristic (PCSC) is designed to measure the spatial-dimensional quality of LFI considering its global naturalness and local frequency properties. Finally, the Tensor Angular Variation Index (TAVI) is proposed to measure angular consistency quality by analyzing the structural similarity distribution between the first principal component and each view in the view stack. Extensive experimental results on four publicly available LFI quality databases demonstrate that the proposed Tensor-NLFQ model outperforms state-of-the-art 2D, 3D, multi-view, and LFI quality assessment algorithms.

preprint · 2022 · arXiv

Topmetal-M: a novel pixel sensor for compact tracking applications

The Topmetal-M is a large-area pixel sensor (18 mm × 23 mm) prototype fabricated in a new 130 nm high-resistivity CMOS process in 2019. It contains 400 rows × 512 columns of square pixels with a pitch of 40 μm. In Topmetal-M, a novel charge collection method combining the Monolithic Active Pixel Sensor (MAPS) and the Topmetal sensor has been proposed for the first time. Both the ionized charge deposited by the particle in the sensor and that deposited along the track over the sensor can be collected. The in-pixel circuit mainly consists of a low-noise charge-sensitive amplifier to establish the signal for energy reconstruction, and a discriminator with a Time-to-Amplitude Converter (TAC) for the Time of Arrival (TOA) measurement. With this mechanism, the trajectory, hit position, energy and arrival time of the particle can be measured. The analog signal from each pixel is accessible through time-shared multiplexing over the entire pixel array. This paper discusses the design and preliminary test results of the Topmetal-M sensor.
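As a quick consistency check on the figures above, the pixel array's extent follows from the pixel count and pitch. This small calculation assumes the 400 rows run along the 18 mm side; the abstract does not say how the remaining die area is used.

```python
# Figures taken from the abstract; the row/side orientation is an assumption.
PITCH_UM = 40          # square pixel pitch in micrometres
ROWS, COLS = 400, 512  # pixel array size
DIE_MM = (18.0, 23.0)  # stated sensor size


def extent_mm(n_pixels: int, pitch_um: float) -> float:
    """Length covered by n_pixels at the given pitch, in millimetres."""
    return n_pixels * pitch_um / 1000.0


height_mm = extent_mm(ROWS, PITCH_UM)   # 16.0
width_mm = extent_mm(COLS, PITCH_UM)    # 20.48
pixel_count = ROWS * COLS               # 204800
fits = height_mm <= DIE_MM[0] and width_mm <= DIE_MM[1]
print(height_mm, width_mm, pixel_count, fits)
```

The 16 mm × 20.48 mm active array (204,800 pixels) fits inside the stated 18 mm × 23 mm sensor.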

preprint · 2022 · arXiv

Unsupervised Segmentation for Terracotta Warrior Point Cloud (SRG-Net)

The repair work on the terracotta warriors at the Emperor Qinshihuang Mausoleum Site Museum is done by hand by experts, and the increasing number of unearthed fragments makes it very challenging for archaeologists to restore the terracotta warriors efficiently. We hope to segment the 3D point cloud data of the terracotta warriors automatically and store the fragment data in a database to assist archaeologists in matching actual fragments with those in the database, which could result in higher repair efficiency. Moreover, existing 3D neural network research mainly focuses on supervised classification, clustering, unsupervised representation, and reconstruction; little research concentrates on unsupervised point cloud part segmentation. In this paper, we present SRG-Net for 3D point clouds of terracotta warriors to address these problems. First, we adopt a customized seed-region-growing (SRG) algorithm to segment the point cloud coarsely. Then we present supervised segmentation and unsupervised reconstruction networks to learn the characteristics of 3D point clouds. Finally, we combine the SRG algorithm with our improved CNN (convolutional neural network) using a refinement method. This pipeline, called SRG-Net, aims at conducting segmentation tasks on the terracotta warriors. Our proposed SRG-Net is evaluated on the terracotta warrior data and the ShapeNet dataset by measuring accuracy and latency. The experimental results show that our SRG-Net outperforms the state-of-the-art methods. Our code is available at https://github.com/hyoau/SRG-Net.

preprint · 2021 · arXiv

Beyond Statistical Relations: Integrating Knowledge Relations into Style Correlations for Multi-Label Music Style Classification

Automatically labeling multiple styles for every song is a widely needed application on all kinds of music websites. Recently, some studies have explored review-driven multi-label music style classification and exploited style correlations for this task. However, their methods focus on mining the statistical relations between different music styles and only consider shallow style relations. Moreover, these statistical relations suffer from underfitting because some music styles have little training data. To tackle these problems, we propose a novel knowledge relations integrated framework (KRF) to capture complete style correlations, which jointly exploits the inherent relations between music styles according to external knowledge and their statistical relations. Based on the two types of relations, we use a graph convolutional network to learn the deep correlations between styles automatically. Experimental results show that our framework significantly outperforms state-of-the-art methods. Further studies demonstrate that our framework can effectively alleviate the underfitting problem and learn meaningful style correlations. The source code is available at https://github.com/Makwen1995/MusicGenre.
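The graph convolutional step can be sketched with the widely used propagation rule of Kipf and Welling, $\mathrm{ReLU}(\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2} H W)$, where nodes are music styles, $A$ is a style-relation adjacency matrix and $\hat{D}$ is the degree matrix of $A+I$. This is a minimal sketch under assumptions: how KRF builds $A$ from knowledge and statistical relations is not described here, so the adjacency, features and weights below are hypothetical toy values.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (a is n x k, b is k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each style keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


# Toy example: two linked styles, one input feature, one output feature (hypothetical values).
adj = [[0.0, 1.0], [1.0, 0.0]]
h = [[1.0], [3.0]]
w = [[1.0]]
print(gcn_layer(adj, h, w))  # [[2.0], [2.0]]
```

Each style's output mixes its own features with those of related styles, which is how a style with little training data can borrow evidence from its neighbours in the relation graph.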

preprint · 2021 · arXiv

Lateral contact yields longitudinal cohesion in active undulatory systems

Many animals and robots move using undulatory motion of their bodies. When in close proximity, undulatory motion can lead to novel collective behaviors such as gait synchronization, spatial reconfiguration, and clustering. Here we study the role of contact interactions between model undulatory swimmers: three-link robots in experiment and multi-link robots in simulation. The undulatory gait of each swimmer is generated through a time-dependent sinusoidal-like waveform which has a fixed phase offset, $\phi$. By varying the phase relationship between neighboring swimmers, we study how contact forces and spatial configurations are governed by the phase difference between neighboring swimmers. We find that undulatory actuation in close proximity drives neighboring swimmers into spatial equilibrium configurations that depend on the actuation phase difference. We propose a model for the spatial equilibrium of nearest-neighbor undulatory swimmers, which we call the gait compatibility condition: the set of spatial and gait configurations in which no collisions occur. Robotic experiments with two, three, and four swimmers exhibit good agreement with the compatibility model. To probe the interaction potential between undulatory swimmers, we perturb each swimmer longitudinally from its equilibrium configuration and measure its steady-state displacement. These studies reveal that undulatory swimmers in close proximity exhibit cohesive longitudinal interaction forces that drive the swimmers from incompatible to compatible configurations. This system of undulatory swimmers provides new insight into active-matter systems that move through body undulation. In addition to the importance of velocity and orientation coherence in active-matter swarms, we demonstrate that undulatory phase coherence is also important for generating stable, cohesive group configurations.

preprint2021arXiv

No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

360-degree/omnidirectional images (OIs) have attracted considerable attention due to the growing number of virtual reality (VR) applications. Compared to conventional 2D images, OIs provide a more immersive experience to consumers, benefiting from their higher resolution and wider fields of view (FoVs). Moreover, OIs are usually viewed in a head-mounted display (HMD) without a reference image. Therefore, an efficient blind quality assessment method designed specifically for 360-degree images is urgently needed. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of VR visual content, we propose a novel and effective no-reference omnidirectional image quality assessment (NR OIQA) algorithm based on Multi-Frequency Information and Local-Global Naturalness (MFILGN). Specifically, inspired by the frequency-dependent property of the visual cortex, we first decompose the equirectangular projection (ERP) maps into wavelet subbands. Then, the entropy intensities of the low- and high-frequency subbands are used to measure the multi-frequency information of OIs. In addition to the global naturalness of the ERP maps, and because users browse OIs through individual FoVs, we extract natural scene statistics features from each viewport image as a measure of local naturalness. With the proposed multi-frequency information and local-global naturalness measurements, we use support vector regression as the final quality regressor to train a model that maps the visual quality-related features to human ratings. To our knowledge, the proposed model is the first no-reference quality assessment method for 360-degree images that combines multi-frequency information and image naturalness. Experimental results on two publicly available OIQA databases demonstrate that the proposed MFILGN outperforms state-of-the-art approaches.
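The multi-frequency information step can be sketched as a wavelet decomposition followed by a per-subband entropy. The code below assumes a single-level orthonormal Haar transform and a 16-bin histogram Shannon entropy; MFILGN's actual wavelet, number of decomposition levels, and entropy definition are not given in the abstract, so the names and parameters here are illustrative only.

```python
import math
from collections import Counter


def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform on an even-sized grayscale image."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    bands = {"LL": [], "H": [], "V": [], "D": []}
    for i in range(0, h, 2):
        rows = {name: [] for name in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)  # low-frequency approximation
            rows["H"].append((a + b - c - d) / 2)   # top minus bottom: horizontal edges
            rows["V"].append((a - b + c - d) / 2)   # left minus right: vertical edges
            rows["D"].append((a - b - c + d) / 2)   # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands


def subband_entropy(band, bins=16):
    """Shannon entropy (bits) of a histogram of the subband coefficients."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant subband carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def multi_frequency_features(img):
    """Entropy of each subband: one low-frequency value and three high-frequency values."""
    bands = haar_dwt2(img)
    return {name: subband_entropy(band) for name, band in bands.items()}
```

Because the Haar basis is orthonormal, the subbands preserve the image's total energy, so the entropy features describe how that energy and detail are distributed across frequencies. In a full pipeline these values would be concatenated with the naturalness features before regression.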

preprint2021arXiv

Unsupervised Segmentation for Terracotta Warrior with Seed-Region-Growing CNN (SRG-Net)

The repair of terracotta warriors at the Emperor Qinshihuang Mausoleum Site Museum is done by hand by experts, and the growing number of unearthed fragments makes it difficult for archaeologists to restore the warriors efficiently. We aim to segment 3D point cloud data of the terracotta warriors automatically and store the fragment data in a database, helping archaeologists match actual fragments with those in the database and thereby improving repair efficiency. Moreover, existing 3D neural network research mainly focuses on supervised classification, clustering, unsupervised representation, and reconstruction; few studies address unsupervised point cloud part segmentation. In this paper, we present SRG-Net for 3D point clouds of terracotta warriors to address these problems. First, we adopt a customized seed-region-growing algorithm to segment the point cloud coarsely. Then we present supervised segmentation and unsupervised reconstruction networks to learn the characteristics of 3D point clouds. Finally, we combine the SRG algorithm with our improved CNN using a refinement method. This pipeline, called SRG-Net, aims to perform segmentation on the terracotta warriors. We evaluate SRG-Net on the terracotta warriors data and the ShapeNet dataset, measuring accuracy and latency. The experimental results show that SRG-Net outperforms state-of-the-art methods. Our code is provided in Code File 1 \cite{Srgnet_2021}.
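A basic seed-region-growing pass on a point cloud can be sketched as a breadth-first search that merges nearby points with similar surface normals. This assumes precomputed normals, a fixed neighborhood `radius`, and an angle threshold, all hypothetical; it is a generic textbook variant, not the customized SRG algorithm used in SRG-Net, and its brute-force neighbor search would normally be replaced by a k-d tree on real scans.

```python
import math
from collections import deque


def normal_angle_deg(n1, n2):
    """Angle between two normals in degrees, ignoring their sign (orientation)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1) * sum(b * b for b in n2))
    cos = max(-1.0, min(1.0, abs(dot) / norms))
    return math.degrees(math.acos(cos))


def region_grow(points, normals, radius, max_angle_deg):
    """Label each point with a region id by breadth-first seed region growing.

    A neighbor joins the current region if it lies within `radius` of a region
    point and its normal deviates by at most `max_angle_deg` from that point's.
    """
    labels = [-1] * len(points)
    region = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue  # already absorbed by an earlier region
        labels[seed] = region
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):  # brute-force O(n^2) neighbor search
                if (labels[j] == -1
                        and math.dist(points[i], points[j]) <= radius
                        and normal_angle_deg(normals[i], normals[j]) <= max_angle_deg):
                    labels[j] = region
                    queue.append(j)
        region += 1
    return labels
```

For instance, three points on a floor with normal `(0, 0, 1)` and two points on an adjoining wall with normal `(1, 0, 0)` fall into two regions with a 30° threshold, even though the closest floor and wall points are within `radius` of each other. Such coarse regions would then be refined by the learned networks.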

Wei Zhou

75 published item(s)

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM

Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

A Brief Survey on Adaptive Video Streaming Quality Assessment

Blind Quality Assessment of 3D Dense Point Clouds with Structure Guided Resampling

Deep Decomposition and Bilinear Pooling Network for Blind Night-Time Image Quality Evaluation

FasterX: Real-Time Object Detection Based on Edge GPUs for UAV Applications

Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Interfacial Mixing Effect in a Promising Skyrmionic Material: Ferrimagnetic Mn$_4$N

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion-Cause Pair Extraction

No-Reference Light Field Image Quality Assessment Based on Spatial-Angular Measurement

On Language Model Integration for RNN Transducer based Speech Recognition

Quality Assessment of Image Super-Resolution: Balancing Deterministic and Statistical Fidelity

Rank-Sensitive Computation of the Rank Profile of a Polynomial Matrix

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

Tensor Oriented No-Reference Light Field Image Quality Assessment

Topmetal-M: a novel pixel sensor for compact tracking applications

Unsupervised Segmentation for Terracotta Warrior Point Cloud (SRG-Net)

Beyond Statistical Relations: Integrating Knowledge Relations into Style Correlations for Multi-Label Music Style Classification

Lateral contact yields longitudinal cohesion in active undulatory systems

No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

Unsupervised Segmentation for Terracotta Warrior with Seed-Region-Growing CNN (SRG-Net)

Adaptive support driven Bayesian reweighted algorithm for sparse signal recovery

AutoSUM: Automating Feature Extraction and Multi-user Preference Simulation for Entity Summarization

Blind Omnidirectional Image Quality Assessment with Viewport Oriented Graph Convolutional Networks

Blind Quality Assessment for Image Superresolution Using Deep Two-Stream Convolutional Networks

Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment

Dense Residual Network for Retinal Vessel Segmentation

Development of readout electronics a novel beam monitoring system for ion research facility accelerator

DyHGCN: A Dynamic Heterogeneous Graph Convolutional Network to Learn Users' Dynamic Preferences for Information Diffusion Prediction

ESA: Entity Summarization with Attention

Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

LIRA: Lifelong Image Restoration from Unknown Blended Distortions

LOCx2-130, a low-power, low-latency, 2 x 4.8-Gbps serializer ASIC for detector front-end readout

Logic Bugs in IoT Platforms and Systems: A Review

Residual Spatial Attention Network for Retinal Vessel Segmentation

Rethinking Distributional Matching Based Domain Adaptation

The Latency Validation of the Optical Link for the ATLAS Liquid Argon Calorimeter Phase-I Trigger Upgrade

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Topological Dirac states in a layered telluride TaPdTe$_5$ with quasi-one-dimensional PdTe$_2$ chains

Two low-power optical data transmission ASICs for the ATLAS Liquid Argon Calorimeter readout upgrade

Spontaneous photo-generated carrier separation of SnO/BiOX (X=Cl, Br, I) bilayer under visible light irradiation for water splitting

A fast, deterministic algorithm for computing a Hermite Normal Form of a polynomial matrix

Anisotropic Ginzburg-Landau scaling of Hc2 and transport properties of 112-type Ca0.8La0.2Fe0.98Co0.02As2 single crystal

Asymptotic behaviors of bivariate Gaussian powered extremes

Finite groups with small number of cyclic subgroups

Higher-order expansions of powered extremes of normal samples

On the number of cyclic subgroup in finite groups

Quadboost: A Scalable Concurrent Quadtree

Test of Topmetal-II$^-$ In Liquid Nitrogen For Cryogenic Temperature TPCs

High Resolution Image Reconstruction Method for a Double-plane PET System with Changeable Spacing

Integrated Digital Inverters Based on Two-dimensional Anisotropic ReS2 Field-effect Transistors

Raman vibrational spectra of bulk to monolayer ReS2 with lower symmetry

The positive piezoconductive effect in graphene

Anisotropic Superconductivity of Ca1-xLaxFeAs2 (x ~ 0.18) Single Crystal

Bulk Superconductivity in Fe1+yTe0.6Se0.4 Induced by Removal of Excess Fe

Fast and deterministic computation of the determinant of a polynomial matrix

Highly tunable ultra-narrow-resonances with optical nano-antenna phased arrays in the infrared

A probabilistic approach to interior regularity of fully nonlinear degenerate elliptic equations in smooth domains

Interior regularity of fully nonlinear degenerate elliptic equations, I: Bellman equations with constant coefficients

Interior regularity of fully nonlinear degenerate elliptic equations, II: real and complex Monge-Ampère equations

LogMaster: Mining Event Correlations in Logs of Large scale Cluster Systems

On representation and regularity of viscosity solutions to degenerate Isaacs equations and certain nonconvex Hessian equations

Quasiderivative method for derivative estimates of solutions to degenerate elliptic equations

Representation and regularity for the Dirichlet problem for real and complex degenerate Hessian equations

Development of Knife-Edge Ridges on Ion-Bombarded Surfaces

Phoenix Cloud: Consolidating Different Computing Loads on Shared Cluster System for Large Organization

Scalable Group Management in Large-Scale Virtualized Clusters

Surface magnetic states of Ni nanochains modified by using different organic surfactants

The longitudinal polarization of hyperons in the forward region in polarized $pp$ collisions

Terahertz wave emission from mesoscopic crystals of BSCCO