Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
45 works
0 followers
24 topics
4 close collaborators

Published work

45 published item(s)

preprint, 2026, arXiv

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

AI applications are increasingly being introduced into digital health. While technical performance has advanced rapidly, successful deployment mainly depends on consumer attitudes, especially to patient-facing applications. However, most existing research examines consumer attitudes towards healthcare AI at an abstract level rather than in response to concrete artefacts. We report a mixed-methods survey study in Australia (N=275) examining consumer readiness, acceptance, trust, and risk perceptions of healthcare AI, combined with a scenario-based evaluation of an AI-generated versus clinician-written consultation summary. Participants expressed moderate optimism and strong perceived usefulness and ease of use, but also substantial concerns about accuracy, safety, and data use. In the scenario task, the AI-generated summary was strongly preferred for quality, empathy, and overall usefulness, yet identification of the AI summary was near chance. Findings show that consumers judge AI through concrete communication quality and visible human governance, underscoring the need for clinically supervised deployment frameworks beyond technical performance alone.

preprint, 2025, arXiv

Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM

Remote sensing change understanding (RSCU) is essential for analyzing remote sensing images and understanding how human activities affect the environment. However, existing datasets lack deep understanding and interactions in the diverse change captioning, counting, and localization tasks. To tackle these gaps, we construct ChangeIMTI, a new large-scale interactive multi-task instruction dataset that encompasses four complementary tasks: change captioning, binary change classification, change counting, and change localization. Building upon this new dataset, we further design a novel vision-guided vision-language model (ChangeVG) with dual-granularity awareness for bi-temporal remote sensing images (i.e., two remote sensing images of the same area at different times). The introduced vision-guided module is a dual-branch architecture that synergistically combines fine-grained spatial feature extraction with high-level semantic summarization. These enriched representations further serve as auxiliary prompts to guide large vision-language models (VLMs) (e.g., Qwen2.5-VL-7B) during instruction tuning, thereby facilitating hierarchical cross-modal learning. We conduct extensive experiments across the four tasks to demonstrate the superiority of our approach. Remarkably, on the change captioning task, our method outperforms the strongest method, Semantic-CC, by 1.39 points on the comprehensive S*m metric, which integrates semantic similarity and descriptive accuracy to provide an overall evaluation of change captions. Moreover, we also perform a series of ablation studies to examine the critical components of our method. The source code and associated data for this work are publicly available on GitHub.

preprint, 2024, arXiv

Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs, with encouraging results. However, the majority of those studies focus on a single-vehicle setting, and lane-changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety and driver comfort.

preprint, 2022, arXiv

A Brief Survey on Adaptive Video Streaming Quality Assessment

Quality of experience (QoE) assessment for adaptive video streaming plays a significant role in advanced network management systems. It is especially challenging in the case of dynamic adaptive streaming over HTTP (DASH), which has increasingly complex characteristics, including additional playback issues. In this paper, we provide a brief overview of adaptive video streaming quality assessment. Based on our review of related work, we analyze and compare different variations of objective QoE assessment models, with or without machine learning techniques, for adaptive video streaming. Through the performance analysis, we observe that hybrid models perform better than both quality-of-service (QoS) driven QoE approaches and signal fidelity measurement. Moreover, the machine learning-based model slightly outperforms the model without machine learning in the same setting. In addition, we find that existing video streaming QoE assessment models still have limited performance, which makes them difficult to apply in practical communication systems. Therefore, based on the success of deep learned feature representations for traditional video quality prediction, we also apply an off-the-shelf deep convolutional neural network (DCNN) to evaluate the perceptual quality of streaming videos, where the spatio-temporal properties of streaming videos are taken into consideration. Experiments demonstrate its superiority, which sheds light on the future development of specifically designed deep learning frameworks for adaptive video streaming quality assessment. We believe this survey can serve as a guideline for QoE assessment of adaptive video streaming.

preprint, 2022, arXiv

Blind Quality Assessment of 3D Dense Point Clouds with Structure Guided Resampling

Objective quality assessment of 3D point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-reference metrics are still scarce for 3D point clouds with large-scale irregularly distributed 3D points. Therefore, in this paper, we propose an objective point cloud quality index with Structure Guided Resampling (SGR) to automatically evaluate the perceptual quality of 3D dense point clouds. The proposed SGR is a general-purpose blind quality assessment method that requires no reference information. Specifically, considering that the human visual system (HVS) is highly sensitive to structure information, we first exploit the unique normal vectors of point clouds to execute regional pre-processing, which consists of keypoint resampling and local region construction. Then, we extract three groups of quality-related features: 1) geometry density features; 2) color naturalness features; 3) angular consistency features. Both the cognitive peculiarities of the human brain and naturalness regularity are involved in the designed quality-aware features, which can capture the most vital aspects of distorted 3D point clouds. Extensive experiments on several publicly available subjective point cloud quality databases validate that our proposed SGR can compete with state-of-the-art full-reference, reduced-reference, and no-reference quality assessment algorithms.

preprint, 2022, arXiv

Deep Decomposition and Bilinear Pooling Network for Blind Night-Time Image Quality Evaluation

Blind image quality assessment (BIQA), which aims to accurately predict image quality without any pristine reference information, has received extensive attention in the past decades. With the help of deep neural networks in particular, great progress has been achieved. However, BIQA for night-time images (NTIs), which usually suffer from complicated authentic distortions such as reduced visibility, low contrast, additive noise, and color distortions, remains less investigated. These diverse authentic degradations particularly challenge the design of effective deep neural networks for blind NTI quality evaluation (NTIQE). In this paper, we propose a novel deep decomposition and bilinear pooling network (DDB-Net) to better address this issue. The DDB-Net contains three modules, i.e., an image decomposition module, a feature encoding module, and a bilinear pooling module. The image decomposition module is inspired by the Retinex theory and involves decoupling the input NTI into an illumination layer component responsible for illumination information and a reflection layer component responsible for content information. Then, the feature encoding module involves learning feature representations of degradations that are rooted in the two decoupled components separately. Finally, by modeling illumination-related and content-related degradations as two-factor variations, the two feature sets are bilinearly pooled together to form a unified representation for quality prediction. The superiority of the proposed DDB-Net has been well validated by extensive experiments on several benchmark datasets. The source code will be made available soon.
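
The bilinear pooling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the generic form of bilinear pooling (an outer product of two feature vectors, followed by the signed square root and L2 normalisation that are commonly applied afterwards); the function name `bilinear_pool` and the toy feature values are hypothetical, and the actual DDB-Net module may differ.

```python
import math


def bilinear_pool(illum_feats, content_feats):
    """Generic bilinear pooling sketch (not the DDB-Net code).

    Takes two 1-D feature vectors, forms their outer product, flattens it,
    then applies a signed square root and L2 normalisation.
    """
    # Outer product: every illumination feature times every content feature.
    outer = [a * b for a in illum_feats for b in content_feats]
    # Signed square root dampens large activations while keeping the sign.
    signed_sqrt = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    # L2 normalisation; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(v * v for v in signed_sqrt)) or 1.0
    return [v / norm for v in signed_sqrt]


# Hypothetical toy features: 2 illumination-related, 3 content-related.
pooled = bilinear_pool([1.0, 4.0], [1.0, 0.0, -1.0])
print(len(pooled))
```

The pooled vector has one entry per pair of input features, so its length is the product of the two input lengths; in a real network this unified representation would feed a regression head that predicts the quality score.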

preprint, 2022, arXiv

FasterX: Real-Time Object Detection Based on Edge GPUs for UAV Applications

Real-time object detection on Unmanned Aerial Vehicles (UAVs) is a challenging issue due to the limited computing resources of edge GPU devices as Internet of Things (IoT) nodes. To solve this problem, in this paper, we propose a novel lightweight deep learning architecture named FasterX, based on the YOLOX model, for real-time object detection on edge GPUs. First, we design an effective and lightweight PixSF head to replace the original head of YOLOX to better detect small objects, which can be further embedded in the depthwise separable convolution (DS Conv) to achieve a lighter head. Then, a slimmer structure in the Neck layer, termed SlimFPN, is developed to reduce the parameters of the network, which is a trade-off between accuracy and speed. Furthermore, we embed an attention module in the Head layer to improve the feature extraction effect of the prediction head. Meanwhile, we also improve the label assignment strategy and loss function to alleviate category imbalance and box optimization problems of the UAV dataset. Finally, auxiliary heads are presented for online distillation to improve the ability of position embedding and feature extraction in the PixSF head. The performance of our lightweight models is validated experimentally on the NVIDIA Jetson NX and Jetson Nano GPU embedded platforms. Extensive experiments show that FasterX models achieve a better trade-off between accuracy and latency on the VisDrone2021 dataset compared to state-of-the-art models.

preprint, 2022, arXiv

Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information. Recent works have been devoted to leveraging text summarization and have achieved promising results. However, these summarization-based methods do not take full advantage of the summary; in particular, they ignore the inherent interactions between the summary and the document. As a result, they limit the ability of the representation to express the major points in the document, which are highly indicative of the key sentiment. In this paper, we study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA. A Hierarchical Interaction Network (HIN) is proposed to explore bidirectional interactions between the summary and the document at multiple granularities and learn subject-oriented document representations for sentiment classification. Furthermore, we design a Sentiment-based Rethinking mechanism (SR) by refining the HIN with sentiment label information to learn a more sentiment-aware document representation. We extensively evaluate our proposed models on three public datasets. The experimental results consistently demonstrate the effectiveness of our proposed models and show that HIN-SR outperforms various state-of-the-art methods.

preprint, 2022, arXiv

Interfacial Mixing Effect in a Promising Skyrmionic Material: Ferrimagnetic Mn$_4$N

Interfacial mixing of elements is a well-known phenomenon found in thin film deposition. For thin-film magnetic heterostructures, interfacial compositional inhomogeneities can have drastic effects on the resulting functionalities. As such, care must be taken to characterize the compositional and magnetic properties of thin films intended for device use. Recently, ferrimagnetic Mn$_4$N thin films have drawn considerable interest due to exhibiting perpendicular magnetic anisotropy, high domain-wall mobility, and good thermal stability. In this study, we employed X-ray photoelectron spectroscopy (XPS) and polarized neutron reflectometry (PNR) measurements to investigate the interfaces of an epitaxially-grown MgO/Mn$_4$N/Pt trilayer deposited at 450 $^{\circ}$C. XPS revealed the thickness of elemental mixing regions of near 5 nm at both interfaces. Using PNR, we found that these interfaces exhibit essentially zero net magnetization at room temperature. Despite the high-temperature deposition at 450 $^{\circ}$C, the thickness of mixing regions is comparable to those observed in magnetic films deposited at room temperature. Micromagnetic simulations show that this interfacial mixing should not deter the robust formation of small skyrmions, consistent with a recent experiment. The results obtained are encouraging in terms of the potential of integrating thermally stable Mn$_4$N into future spintronic devices.

preprint, 2022, arXiv

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, existing works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input. Pre-training only with phonemes as input can alleviate the input mismatch but lacks the ability to model rich representations and semantic information due to the limited phoneme vocabulary. In this paper, we propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance the learning capability. Specifically, we merge the adjacent phonemes into sup-phonemes and combine the phoneme sequence and the merged sup-phoneme sequence as the model input, which can enhance the model capacity to learn rich contextual representations. Experimental results demonstrate that our proposed Mixed-Phoneme BERT significantly improves the TTS performance, with a 0.30 CMOS gain compared with the FastSpeech 2 baseline. The Mixed-Phoneme BERT achieves a 3x inference speedup and similar voice quality compared with the previous TTS pre-trained model PnG BERT.

preprint, 2022, arXiv

Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion-Cause Pair Extraction

The Emotion-Cause Pair Extraction (ECPE) task aims to extract emotions and causes as pairs from documents. We observe that the relative distance distribution of emotions and causes is extremely imbalanced in the typical ECPE dataset. Existing methods set a fixed-size window to capture relations between neighboring clauses. However, they neglect the effective semantic connections between distant clauses, leading to poor generalization ability towards position-insensitive data. To alleviate the problem, we propose a novel Multi-Granularity Semantic Aware Graph model (MGSAG) to incorporate fine-grained and coarse-grained semantic features jointly, without regard to distance limitation. In particular, we first explore semantic dependencies between clauses and keywords extracted from the document that convey fine-grained semantic features, obtaining keyword-enhanced clause representations. In addition, a clause graph is established to model coarse-grained semantic relations between clauses. Experimental results indicate that MGSAG surpasses the existing state-of-the-art ECPE models. In particular, MGSAG significantly outperforms other models on position-insensitive data.

preprint, 2022, arXiv

No-Reference Light Field Image Quality Assessment Based on Spatial-Angular Measurement

Light field image quality assessment (LFI-QA) is a significant and challenging research problem. It helps to better guide light field acquisition, processing and applications. However, only a few objective models have been proposed and none of them completely considers the intrinsic factors affecting the LFI quality. In this paper, we propose a No-Reference Light Field image Quality Assessment (NR-LFQA) scheme, where the main idea is to quantify the LFI quality degradation by evaluating the spatial quality and angular consistency. We first measure the spatial quality deterioration by capturing the naturalness distribution of the light field cyclopean image array, which is formed when a human observes the LFI. Then, as a transformed representation of the LFI, the Epipolar Plane Image (EPI) contains the slopes of lines and involves the angular information. Therefore, the EPI is utilized to extract global and local features from the LFI to measure angular consistency degradation. Specifically, the distribution of the gradient direction map of the EPI is proposed to measure the global angular consistency distortion in the LFI. We further propose the weighted local binary pattern to capture the characteristics of local angular consistency degradation. Extensive experimental results on four publicly available LFI quality datasets demonstrate that the proposed method outperforms state-of-the-art 2D, 3D, multi-view, and LFI quality assessment algorithms.

preprint, 2022, arXiv

On Language Model Integration for RNN Transducer based Speech Recognition

The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of RNN-Transducer (RNN-T) can limit the performance of LM integration such as simple shallow fusion. A Bayesian interpretation suggests removing this sequence prior as ILM correction. In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. We provide a decoding interpretation of two major reasons for performance improvement with ILM correction, which is further experimentally verified with detailed analysis. We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer, which enables a theoretical justification for other ILM approaches. Systematic comparison is conducted for both in-domain and cross-domain evaluation on the Librispeech and TED-LIUM Release 2 corpora, respectively. Our proposed exact-ILM training can further improve the best ILM method.
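
As a rough illustration of the scoring idea behind shallow fusion with ILM correction, the sketch below combines three log-probabilities per hypothesis in the usual log-linear way: the transducer score plus a scaled external LM score, minus a scaled internal LM estimate. The function name, scale values, and hypothesis scores are hypothetical and chosen only to show how the correction can change the ranking; they are not taken from the paper.

```python
def fused_score(log_p_rnnt, log_p_ext_lm, log_p_ilm, lm_scale=0.5, ilm_scale=0.0):
    """Log-linear combination used for hypothesis ranking.

    With ilm_scale == 0.0 this reduces to plain shallow fusion; a positive
    ilm_scale subtracts the (estimated) internal LM score as a correction.
    """
    return log_p_rnnt + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm


# Hypothetical log-probabilities for two competing hypotheses:
# (transducer score, external LM score, internal LM estimate).
hyps = {
    "A": (-4.0, -6.0, -2.0),
    "B": (-4.6, -5.0, -5.0),
}

# Plain shallow fusion versus shallow fusion with ILM correction.
best_shallow = max(hyps, key=lambda h: fused_score(*hyps[h], lm_scale=0.5))
best_ilm = max(hyps, key=lambda h: fused_score(*hyps[h], lm_scale=0.5, ilm_scale=0.3))
print(best_shallow, best_ilm)
```

In this toy case hypothesis A is favoured partly because the internal LM already likes it; subtracting the internal LM term removes that prior, so the external LM's preference for B decides the ranking.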

preprint, 2022, arXiv

Quality Assessment of Image Super-Resolution: Balancing Deterministic and Statistical Fidelity

There has been a growing interest in developing image super-resolution (SR) algorithms that convert low-resolution (LR) to higher resolution images, but automatically evaluating the visual quality of super-resolved images remains a challenging problem. Here we look at the problem of SR image quality assessment (SR IQA) in a two-dimensional (2D) space of deterministic fidelity (DF) versus statistical fidelity (SF). This allows us to better understand the advantages and disadvantages of existing SR algorithms, which produce images at different clusters in the 2D space of (DF, SF). Specifically, we observe an interesting trend from more traditional SR algorithms that are typically inclined to optimize for DF while losing SF, to more recent generative adversarial network (GAN) based approaches that by contrast exhibit strong advantages in achieving high SF but sometimes appear weak at maintaining DF. Furthermore, we propose an uncertainty weighting scheme based on content-dependent sharpness and texture assessment that merges the two fidelity measures into an overall quality prediction named the Super Resolution Image Fidelity (SRIF) index, which demonstrates superior performance against state-of-the-art IQA models when tested on subject-rated datasets.

preprint, 2022, arXiv

Rank-Sensitive Computation of the Rank Profile of a Polynomial Matrix

Consider a matrix $\mathbf{F} \in \mathbb{K}[x]^{m \times n}$ of univariate polynomials over a field $\mathbb{K}$. We study the problem of computing the column rank profile of $\mathbf{F}$. To this end we first give an algorithm which improves the minimal kernel basis algorithm of Zhou, Labahn, and Storjohann (Proceedings ISSAC 2012). We then provide a second algorithm which computes the column rank profile of $\mathbf{F}$ with a rank-sensitive complexity of $\tilde{O}(r^{\omega-2} n (m+D))$ operations in $\mathbb{K}$. Here, $D$ is the sum of row degrees of $\mathbf{F}$, $\omega$ is the exponent of matrix multiplication, and $\tilde{O}(\cdot)$ hides logarithmic factors.
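
To make the term "column rank profile" concrete, here is a small sketch for the simplest case only: a constant matrix over the rationals (all polynomial entries of degree 0). It uses plain Gaussian elimination, not the rank-sensitive algorithm of the paper, and the function name is hypothetical. The profile is the list of pivot column indices, i.e. the lexicographically first maximal set of linearly independent columns.

```python
from fractions import Fraction


def column_rank_profile(rows):
    """Pivot column indices (0-based) of a rational matrix.

    Illustrative Gaussian elimination for constant matrices; this is
    NOT the polynomial-matrix algorithm described in the abstract.
    """
    mat = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(mat)
    n_cols = len(mat[0]) if mat else 0
    profile = []
    pivot_row = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, n_rows) if mat[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the earlier pivot columns
        mat[pivot_row], mat[pivot] = mat[pivot], mat[pivot_row]
        # Eliminate the entries below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = mat[r][col] / mat[pivot_row][col]
            if factor:
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[pivot_row])]
        profile.append(col)
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return profile


# The second column is twice the first, so it is skipped.
example = [[1, 2, 3], [2, 4, 7], [3, 6, 10]]
print(column_rank_profile(example))  # [0, 2]
```

The length of the returned list is the rank $r$ of the matrix.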

preprint, 2022, arXiv

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Coronary CT Angiography (CCTA) is susceptible to various distortions (e.g., artifacts and noise), which severely compromise the exact diagnosis of cardiovascular diseases. An appropriate CCTA Vessel-level Image Quality Assessment (CCTA VIQA) algorithm can be used to reduce the risk of misdiagnosis. The primary challenge of CCTA VIQA is that the local part of the coronary artery that determines the final quality is hard to locate. To tackle this challenge, we formulate CCTA VIQA as a multiple-instance learning (MIL) problem, and exploit a Transformer-based MIL backbone (termed T-MIL) to aggregate the multiple instances along the coronary centerline into the final quality. However, not all instances are informative for the final quality. Some quality-irrelevant/negative instances interfere with the exact quality assessment (e.g., instances covering only background, or instances in which the coronary artery is not identifiable). Therefore, we propose a Progressive Reinforcement learning based Instance Discarding module (termed PRID) to progressively remove quality-irrelevant/negative instances for CCTA VIQA. Based on the above two modules, we propose a Reinforced Transformer Network (RTN) for automatic CCTA VIQA based on end-to-end optimization. Extensive experimental results demonstrate that our proposed method achieves state-of-the-art performance on the real-world CCTA dataset, exceeding previous MIL methods by a large margin.

preprint, 2022, arXiv

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers, which consist of intra- and inter-speaker dependencies, are complex and dynamic, the modeling of speaker-specific information plays a vital role in ERC. Although researchers have proposed various methods of speaker interaction modeling, these methods cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. In addition, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED.

preprint, 2022, arXiv

Tensor Oriented No-Reference Light Field Image Quality Assessment

Light field image (LFI) quality assessment is becoming more and more important, which helps to better guide the acquisition, processing and application of immersive media. However, due to the inherent high dimensional characteristics of LFI, the LFI quality assessment turns into a multi-dimensional problem that requires consideration of the quality degradation in both spatial and angular dimensions. Therefore, we propose a novel Tensor oriented No-reference Light Field image Quality evaluator (Tensor-NLFQ) based on tensor theory. Specifically, since the LFI is regarded as a low-rank 4D tensor, the principal components of four oriented sub-aperture view stacks are obtained via Tucker decomposition. Then, the Principal Component Spatial Characteristic (PCSC) is designed to measure the spatial-dimensional quality of LFI considering its global naturalness and local frequency properties. Finally, the Tensor Angular Variation Index (TAVI) is proposed to measure angular consistency quality by analyzing the structural similarity distribution between the first principal component and each view in the view stack. Extensive experimental results on four publicly available LFI quality databases demonstrate that the proposed Tensor-NLFQ model outperforms state-of-the-art 2D, 3D, multi-view, and LFI quality assessment algorithms.

preprint, 2022, arXiv

Topmetal-M: a novel pixel sensor for compact tracking applications

The Topmetal-M is a large-area pixel sensor (18 mm × 23 mm) prototype fabricated in a new 130 nm high-resistivity CMOS process in 2019. It contains 400 rows × 512 columns of square pixels with a pitch of 40 μm. In Topmetal-M, a novel charge collection method combining the Monolithic Active Pixel Sensor (MAPS) and the Topmetal sensor is proposed for the first time. Both the ionized charge deposited by the particle in the sensor and that along the track over the sensor can be collected. The in-pixel circuit mainly consists of a low-noise charge sensitive amplifier to establish the signal for the energy reconstruction, and a discriminator with a Time-to-Amplitude Converter (TAC) for the Time of Arrival (TOA) measurement. With this mechanism, the trajectory, particle hit position, energy and arrival time of the particle can be measured. The analog signal from each pixel is accessible through time-shared multiplexing over the entire pixel array. This paper discusses the design and preliminary test results of the Topmetal-M sensor.

preprint, 2022, arXiv

Unsupervised Segmentation for Terracotta Warrior Point Cloud (SRG-Net)

The repair of terracotta warriors at the Emperor Qinshihuang Mausoleum Site Museum is done by hand by experts, and the increasing number of unearthed terracotta warrior pieces makes it too challenging for archaeologists to conduct the restoration efficiently. We hope to segment the 3D point cloud data of the terracotta warriors automatically and store the fragment data in a database to assist the archaeologists in matching the actual fragments with the ones in the database, which could result in higher repair efficiency. Moreover, existing 3D neural network research mainly focuses on supervised classification, clustering, unsupervised representation, and reconstruction. Few studies concentrate on unsupervised point cloud part segmentation. In this paper, we present SRG-Net for 3D point clouds of terracotta warriors to address these problems. Firstly, we adopt a customized seed-region-growing algorithm to segment the point cloud coarsely. Then we present supervised segmentation and unsupervised reconstruction networks to learn the characteristics of 3D point clouds. Finally, we combine the SRG algorithm with our improved CNN (convolutional neural network) using a refinement method. This pipeline is called SRG-Net, which aims at conducting segmentation tasks on the terracotta warriors. Our proposed SRG-Net is evaluated on the terracotta warrior data and the ShapeNet dataset by measuring accuracy and latency. The experimental results show that our SRG-Net outperforms the state-of-the-art methods. Our code is available at https://github.com/hyoau/SRG-Net.

preprint, 2021, arXiv

Beyond Statistical Relations: Integrating Knowledge Relations into Style Correlations for Multi-Label Music Style Classification

Automatically labeling multiple styles for every song is a comprehensive application in all kinds of music websites. Recently, some studies have explored review-driven multi-label music style classification and exploited style correlations for this task. However, their methods focus on mining the statistical relations between different music styles and only consider shallow style relations. Moreover, these statistical relations suffer from the underfitting problem because some music styles have little training data. To tackle these problems, we propose a novel knowledge relations integrated framework (KRF) to capture the complete style correlations, which jointly exploits the inherent relations between music styles according to external knowledge and their statistical relations. Based on the two types of relations, we use a graph convolutional network to learn the deep correlations between styles automatically. Experimental results show that our framework significantly outperforms state-of-the-art methods. Further studies demonstrate that our framework can effectively alleviate the underfitting problem and learn meaningful style correlations. The source code is available at https://github.com/Makwen1995/MusicGenre.

preprint, 2021, arXiv

Lateral contact yields longitudinal cohesion in active undulatory systems

Many animals and robots move using undulatory motion of their bodies. When in close proximity, undulatory motion can lead to novel collective behaviors such as gait synchronization, spatial reconfiguration, and clustering. Here we study the role of contact interactions between model undulatory swimmers: three-link robots in experiment and multi-link robots in simulation. The undulatory gait of each swimmer is generated through a time-dependent sinusoidal-like waveform which has a fixed phase offset, $\phi$. By varying the phase relationship between neighboring swimmers, we seek to study how contact forces and spatial configurations are governed by the phase difference between neighboring swimmers. We find that undulatory actuation in close proximity drives neighboring swimmers into spatial equilibrium configurations that depend on the actuation phase difference. We propose a model for the spatial equilibrium of nearest-neighbor undulatory swimmers, which we call the gait compatibility condition: the set of spatial and gait configurations in which no collisions occur. Robotic experiments with two, three, and four swimmers exhibit good agreement with the compatibility model. To probe the interaction potential between undulatory swimmers, we perturb each longitudinally from its equilibrium configuration and measure the steady-state displacement. These studies reveal that undulatory swimmers in close proximity exhibit cohesive longitudinal interaction forces that drive the swimmers from incompatible to compatible configurations. This system of undulatory swimmers provides new insight into active-matter systems which move through body undulation. In addition to the importance of velocity and orientation coherence in active-matter swarms, we demonstrate that undulatory phase coherence is also important for generating stable, cohesive group configurations.

preprint2021arXiv

No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

360-degree/omnidirectional images (OIs) have attracted remarkable attention due to the increasing applications of virtual reality (VR). Compared to conventional 2D images, OIs can provide a more immersive experience to consumers, benefitting from the higher resolution and plentiful fields of view (FoVs). Moreover, OIs are usually observed in a head-mounted display (HMD) without references. Therefore, an efficient blind quality assessment method, which is specifically designed for 360-degree images, is urgently desired. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of VR visual contents, we propose a novel and effective no-reference omnidirectional image quality assessment (NR OIQA) algorithm by Multi-Frequency Information and Local-Global Naturalness (MFILGN). Specifically, inspired by the frequency-dependent property of the visual cortex, we first decompose the projected equirectangular projection (ERP) maps into wavelet subbands. Then, the entropy intensities of low and high frequency subbands are exploited to measure the multi-frequency information of OIs. In addition to considering the global naturalness of ERP maps, owing to the browsed FoVs, we extract the natural scene statistics features from each viewport image as the measure of local naturalness. With the proposed multi-frequency information measurement and local-global naturalness measurement, we utilize support vector regression as the final image quality regressor to train the quality evaluation model from visual quality-related features to human ratings. To our knowledge, the proposed model is the first no-reference quality assessment method for 360-degree images that combines multi-frequency information and image naturalness. Experimental results on two publicly available OIQA databases demonstrate that our proposed MFILGN outperforms state-of-the-art approaches.

preprint2021arXiv

Unsupervised Segmentation for Terracotta Warrior with Seed-Region-Growing CNN (SRG-Net)

The repair of terracotta warriors at the Emperor Qinshihuang Mausoleum Site Museum is carried out by hand by experts, and the increasing number of unearthed terracotta warrior fragments makes it too challenging for archaeologists to conduct the restoration efficiently. We hope to segment the 3D point cloud data of the terracotta warriors automatically and store the fragment data in the database to assist the archaeologists in matching the actual fragments with the ones in the database, which could result in higher repair efficiency. Moreover, existing 3D neural network research mainly focuses on supervised classification, clustering, unsupervised representation, and reconstruction. Few studies concentrate on unsupervised point cloud part segmentation. In this paper, we present SRG-Net for 3D point clouds of terracotta warriors to address these problems. Firstly, we adopt a customized seed-region-growing algorithm to segment the point cloud coarsely. Then we present supervised segmentation and unsupervised reconstruction networks to learn the characteristics of 3D point clouds. Finally, we combine the SRG algorithm with our improved CNN using a refinement method. This pipeline is called SRG-Net, which aims at conducting segmentation tasks on the terracotta warriors. Our proposed SRG-Net is evaluated on the terracotta warriors data and the ShapeNet dataset by measuring the accuracy and the latency. The experimental results show that our SRG-Net outperforms the state-of-the-art methods. Our code is shown in Code File 1~\cite{Srgnet_2021}.

preprint2020arXiv

Adaptive support driven Bayesian reweighted algorithm for sparse signal recovery

Sparse learning has been widely studied to capture critical information from enormous data sources in the field of system identification. Often, it is essential to understand internal working mechanisms of unknown systems (e.g. biological networks) in addition to input-output relationships. For this purpose, various feature selection techniques have been developed. For example, sparse Bayesian learning (SBL) was proposed to learn major features from a dictionary of basis functions, which makes identified models interpretable. Reweighted L1-regularization algorithms are often applied in SBL to solve optimization problems. However, they are expensive in both computation and memory, and thus not suitable for large-scale problems. This paper proposes an adaptive support driven Bayesian reweighted (ASDBR) algorithm for sparse signal recovery. A restart strategy based on shrinkage-thresholding is developed to perform adaptive support estimation, which can effectively reduce the computational burden and memory demands. Moreover, ASDBR accurately extracts major features and excludes redundant information from large datasets. Numerical experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods.

preprint2020arXiv

AutoSUM: Automating Feature Extraction and Multi-user Preference Simulation for Entity Summarization

With the growth of knowledge graphs, entity descriptions are becoming extremely lengthy. The entity summarization task, which aims to generate diverse, comprehensive, and representative summaries for entities, has received increasing interest recently. In most previous methods, features are usually extracted by handcrafted templates. Then the feature selection and multi-user preference simulation take place, depending too much on human expertise. In this paper, a novel integration method called AutoSUM is proposed for automatic feature extraction and multi-user preference simulation to overcome the drawbacks of previous methods. There are two modules in AutoSUM: extractor and simulator. The extractor module performs automatic feature extraction based on a BiLSTM with a combined input representation including word embeddings and graph embeddings. Meanwhile, the simulator module automates multi-user preference simulation based on a well-designed two-phase attention mechanism (i.e., entity-phase attention and user-phase attention). Experimental results demonstrate that AutoSUM produces state-of-the-art performance on two widely used datasets (i.e., DBpedia and LinkedMDB) in both F-measure and MAP.

preprint2020arXiv

Blind Omnidirectional Image Quality Assessment with Viewport Oriented Graph Convolutional Networks

Quality assessment of omnidirectional images has become increasingly urgent due to the rapid growth of virtual reality applications. Different from traditional 2D images and videos, omnidirectional contents can provide consumers with freely changeable viewports and a larger field of view covering the $360^{\circ}\times180^{\circ}$ spherical surface, which makes the objective quality assessment of omnidirectional images more challenging. In this paper, motivated by the characteristics of the human vision system (HVS) and the viewing process of omnidirectional contents, we propose a novel Viewport oriented Graph Convolution Network (VGCN) for blind omnidirectional image quality assessment (IQA). Generally, observers tend to give the subjective rating of a 360-degree image after passing through and aggregating information from different viewports when browsing the spherical scenery. Therefore, in order to model the mutual dependency of viewports in the omnidirectional image, we build a spatial viewport graph. Specifically, the graph nodes are first defined with selected viewports with higher probabilities to be seen, which is inspired by the HVS property that human beings are more sensitive to structural information. Then, these nodes are connected by spatial relations to capture interactions among them. Finally, reasoning on the proposed graph is performed via graph convolutional networks. Moreover, we simultaneously obtain global quality using the entire omnidirectional image without viewport sampling to boost the performance according to the viewing experience. Experimental results demonstrate that our proposed model outperforms state-of-the-art full-reference and no-reference IQA metrics on two public omnidirectional IQA databases.

preprint2020arXiv

Blind Quality Assessment for Image Superresolution Using Deep Two-Stream Convolutional Networks

Numerous image superresolution (SR) algorithms have been proposed for reconstructing high-resolution (HR) images from input images with lower spatial resolutions. However, effectively evaluating the perceptual quality of SR images remains a challenging research problem. In this paper, we propose a no-reference/blind deep neural network-based SR image quality assessor (DeepSRQ). To learn more discriminative feature representations of various distorted SR images, the proposed DeepSRQ is a two-stream convolutional network including two subcomponents for distorted structure and texture SR images. Different from traditional image distortions, the artifacts of SR images cause both image structure and texture quality degradation. Therefore, we choose the two-stream scheme that captures different properties of SR inputs instead of directly learning features from one image stream. Considering the human visual system (HVS) characteristics, the structure stream focuses on extracting features in structural degradations, while the texture stream focuses on the change in textural distributions. In addition, to augment the training data and ensure the category balance, we propose a stride-based adaptive cropping approach for further improvement. Experimental results on three publicly available SR image quality databases demonstrate the effectiveness and generalization ability of our proposed DeepSRQ method compared with state-of-the-art image quality assessment algorithms.

preprint2020arXiv

Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment

In recent years, deep learning has achieved promising success for multimedia quality assessment, especially for image quality assessment (IQA). However, since there exist more complex temporal characteristics in videos, very little work has been done on video quality assessment (VQA) by exploiting powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features by pre-trained deep learning models without fine-tuning or training from scratch. The composited features consider distorted video frames as well as frame difference maps from both global and local views. Then, the feature aggregation is conducted by the regression model to predict the perceptual video quality. Finally, experimental results demonstrate that our proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.

preprint2020arXiv

Dense Residual Network for Retinal Vessel Segmentation

Retinal vessel segmentation plays an important role in the field of retinal image analysis because changes in retinal vascular structure can aid in the diagnosis of diseases such as hypertension and diabetes. In recent research, numerous successful segmentation methods for fundus images have been proposed. For other retinal imaging modalities, however, more research is needed to explore vascular extraction. In this work, we propose an efficient method to segment blood vessels in Scanning Laser Ophthalmoscopy (SLO) retinal images. Inspired by U-Net, "feature map reuse" and residual learning, we propose a deep dense residual network structure called DRNet. In DRNet, feature maps of previous blocks are adaptively aggregated into subsequent layers as input, which not only facilitates spatial reconstruction, but also learns more efficiently due to more stable gradients. Furthermore, we introduce DropBlock to alleviate the overfitting problem of the network. We train and test this model on the recent SLO public dataset. The results show that our method achieves state-of-the-art performance even without data augmentation.

preprint2020arXiv

Development of readout electronics a novel beam monitoring system for ion research facility accelerator

This article presents the readout electronics of a novel beam monitoring system for an ion research facility accelerator. The readout electronics are divided into a Front-end Card (FEC) and a Readout Control Unit (RCU). The FEC uses Topmetal II minus to process the energy of the hitting particles and convert it into a voltage signal. The main function of the RCU is to digitize the analog output signal of the FEC and format the raw data. The RCU also processes the control commands from the host and distributes the commands according to the mapping. The readout electronics have been characterized and calibrated in the laboratory and have been installed with the detector. Implementation and testing of the readout electronics are discussed.

preprint2020arXiv

DyHGCN: A Dynamic Heterogeneous Graph Convolutional Network to Learn Users' Dynamic Preferences for Information Diffusion Prediction

Information diffusion prediction is a fundamental task for understanding the information propagation process. It has wide applications in areas such as misinformation spreading prediction and malicious account detection. Previous works concentrate either on utilizing the context of a single diffusion sequence or on using the social network among users for information diffusion prediction. However, the diffusion paths of different messages naturally constitute a dynamic diffusion graph. For one thing, previous works cannot jointly utilize both the social network and diffusion graph for prediction, which is insufficient to model the complexity of the diffusion process and results in unsatisfactory prediction performance. For another, they cannot learn users' dynamic preferences. Intuitively, users' preferences change over time, and a user's personal preference determines whether the user will repost the information. Thus, it is beneficial to consider users' dynamic preferences in information diffusion prediction. In this paper, we propose a novel dynamic heterogeneous graph convolutional network (DyHGCN) to jointly learn the structural characteristics of the social graph and dynamic diffusion graph. Then, we encode the temporal information into the heterogeneous graph to learn the users' dynamic preferences. Finally, we apply multi-head attention to capture the context-dependency of the current diffusion path to facilitate the information diffusion prediction task. Experimental results show that DyHGCN significantly outperforms the state-of-the-art models on three public datasets, which shows the effectiveness of the proposed model.

preprint2020arXiv

ESA: Entity Summarization with Attention

Entity summarization aims at creating brief but informative descriptions of entities from knowledge graphs. While previous work mostly focused on traditional techniques such as clustering algorithms and graph models, we ask how to apply deep learning methods to this task. In this paper we propose ESA, a neural network with supervised attention mechanisms for entity summarization. Specifically, we calculate attention weights for facts in each entity, and rank facts to generate reliable summaries. We explore techniques to solve difficult learning problems presented by ESA, and demonstrate the effectiveness of our model in comparison with the state-of-the-art methods. Experimental results show that our model improves the quality of the entity summaries in both F-measure and MAP.

preprint2020arXiv

Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

In hybrid HMM based speech recognition, LSTM language models have been widely applied and achieved large improvements. The theoretical capability of modeling unlimited context suggests that no recombination should be applied in decoding. This motivates reconsidering full summation over the HMM-state sequences instead of the Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply the full-sum decoding with a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both Switchboard and Librispeech corpora. Different models using CE and sMBR training criteria are used. Additionally, both MAP and confusion network decoding as approximated variants of the general Bayes decision rule are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra cost. We also discuss tuning effort, efficiency and some limitations of full-sum decoding.

preprint2020arXiv

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

Hybrid-distorted image restoration (HD-IR) is dedicated to restoring real distorted images that are degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions, which compromises the restoration performance. To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions. Specifically, we propose the feature disentanglement module (FDM) to distribute feature representations of different distortions into different channels by revising gain-control-based normalization. We also propose a feature aggregation module (FAM) with channel-wise attention to adaptively filter out the distortion representations and aggregate useful content information from different channels for the construction of the raw image. The effectiveness of the proposed scheme is verified by visualizing the correlation matrix of features and channel responses of different distortions. Extensive experimental results also demonstrate the superior performance of our approach compared with the latest HD-IR schemes.

preprint2020arXiv

LIRA: Lifelong Image Restoration from Unknown Blended Distortions

Most existing image restoration networks are designed in a disposable way and catastrophically forget previously learned distortions when trained on a new distortion removal task. To alleviate this problem, we raise the novel lifelong image restoration problem for blended distortions. We first design a base fork-join model in which multiple pre-trained expert models specializing in individual distortion removal tasks work cooperatively and adaptively to handle blended distortions. When the input is degraded by a new distortion, inspired by adult neurogenesis in the human memory system, we develop a neural growing strategy where the previously trained model can incorporate a new expert branch and continually accumulate new knowledge without interfering with learned knowledge. Experimental results show that the proposed approach can not only achieve state-of-the-art performance on blended distortion removal tasks in both PSNR and SSIM metrics, but also maintain old expertise while learning new restoration tasks.

preprint2020arXiv

LOCx2-130, a low-power, low-latency, 2 x 4.8-Gbps serializer ASIC for detector front-end readout

In this paper, we present the design and test results of LOCx2-130, a low-power, low-latency, dual-channel transmitter ASIC for detector front-end readout. LOCx2-130 has two channels of encoders and serializers, and each channel operates at 4.8 Gbps. LOCx2-130 can interface with three types of ADCs, an ASIC ADC and two COTS ADCs. LOCx2-130 is fabricated in a commercial 130-nm CMOS technology and is packaged in a 100-pin QFN package. LOCx2-130 consumes 440 mW and achieves a latency of less than 40.7 ns.

preprint2020arXiv

Logic Bugs in IoT Platforms and Systems: A Review

In recent years, IoT platforms and systems have been rapidly emerging. Although IoT is a new technology, new does not mean simpler (than existing networked systems). Contrarily, the complexity (of IoT platforms and systems) is actually being increased in terms of the interactions between the physical world and cyberspace. The increased complexity indeed results in new vulnerabilities. This paper seeks to provide a review of the recently discovered logic bugs that are specific to IoT platforms and systems. In particular, 17 logic bugs and one weakness falling into seven categories of vulnerabilities are reviewed in this survey.

preprint2020arXiv

Residual Spatial Attention Network for Retinal Vessel Segmentation

Reliable segmentation of retinal vessels can be employed as a way of monitoring and diagnosing certain diseases, such as diabetes and hypertension, as they affect the retinal vascular structure. In this work, we propose the Residual Spatial Attention Network (RSAN) for retinal vessel segmentation. RSAN employs a modified residual block structure that integrates DropBlock, which can not only be utilized to construct deep networks to extract more complex vascular features, but can also effectively alleviate overfitting. Moreover, in order to further improve the representation capability of the network, based on this modified residual block, we introduce spatial attention (SA) and propose the Residual Spatial Attention Block (RSAB) to build RSAN. We adopt the public DRIVE and CHASE DB1 color fundus image datasets to evaluate the proposed RSAN. Experiments show that the modified residual structure and the spatial attention are effective in this work, and our proposed RSAN achieves state-of-the-art performance.

preprint2020arXiv

Rethinking Distributional Matching Based Domain Adaptation

Domain adaptation (DA) is a technique that transfers predictive models trained on a labeled source domain to an unlabeled target domain, with the core difficulty of resolving distributional shift between domains. Currently, most popular DA algorithms are based on distributional matching (DM). However, in practice, realistic domain shifts (RDS) may violate their basic assumptions, and as a result these methods will fail. In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods. We further propose InstaPBM, a novel Instance-based Predictive Behavior Matching method for robust DA. Extensive experiments on both conventional and RDS benchmarks demonstrate both the limitations of DM methods and the efficacy of InstaPBM: compared with the best baselines, InstaPBM improves the classification accuracy by $4.5\%$ and $3.9\%$ on Digits5 and VisDA2017, and by $2.2\%$, $2.9\%$, and $3.6\%$ on DomainNet-LDS, DomainNet-ILDS, and ID-TwO, respectively. We hope our intuitive yet effective method will serve as a useful new direction and increase the robustness of DA in real scenarios. Code will be available at anonymous link: https://github.com/pikachusocute/InstaPBM-RobustDA.

preprint2020arXiv

The Latency Validation of the Optical Link for the ATLAS Liquid Argon Calorimeter Phase-I Trigger Upgrade

Two optical data link data transmission Application Specific Integrated Circuits (ASICs), the baseline and its backup, have been designed for the ATLAS Liquid Argon (LAr) Calorimeter Phase-I trigger upgrade. The latency of each ASIC and that of its corresponding receiver implemented in a back-end Field-Programmable Gate Array (FPGA) are critical specifications. In this paper, we present the latency measurements and simulation of two ASICs. The measurement results indicate that both ASICs achieve their design goals and meet the latency specifications. The consistency between the simulation and measurements validates the ASIC latency characterization.

preprint2020arXiv

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size and training time. A subsequent sMBR training is applied to fine-tune the final acoustic model, and both LSTM and Transformer language models are trained and evaluated. Our best system achieves a 5.6% WER on the test set, which outperforms the previous state-of-the-art by 27% relative.

preprint2020arXiv

Topological Dirac states in a layered telluride TaPdTe$_5$ with quasi-one-dimensional PdTe$_2$ chains

We report the synthesis and systematic studies of a new layered ternary telluride TaPdTe$_5$ with quasi-one-dimensional PdTe$_2$ chains. This compound crystallizes in a layered orthorhombic structure with space group Cmcm. Analysis of its curved field-dependent Hall resistivity, using the two-band model, indicates hole-dominated transport with a high mobility $μ_h$ = 2.38 $\times$ 10$^3$ cm$^2$ V$^{-1}$ s$^{-1}$ at low temperatures. The in-plane magnetoresistance (MR) displays significant anisotropy with field applied along the crystallographic $b$ axis. The MR with the current applied along the $c$-axis is also measured in high magnetic fields up to 51.7 T. Remarkably, it follows a power-law dependence and reaches (9.5 $\times$ 10$^3$)% at 2.1 K without any signature of saturation. The de Haas-van Alphen oscillations show a small Fermi-surface pocket with a nontrivial Berry phase. The Shubnikov-de Haas (SdH) oscillations are detected at low temperatures and under magnetic fields above 28.5 T. Two effective masses $m^*$ (0.26$m_e$ and 0.41$m_e$) are extracted from the oscillatory SdH data. Our first-principles calculations unveil a topological Dirac cone in its surface states, and, in particular, the topological index indicates that TaPdTe$_5$ is a topologically nontrivial material.

preprint2020arXiv

Two low-power optical data transmission ASICs for the ATLAS Liquid Argon Calorimeter readout upgrade

A serializer ASIC and a VCSEL driver ASIC are needed for the front-end optical data transmission in the ATLAS liquid argon calorimeter readout phase-I upgrade. The baseline ASICs are the serializer LOCx2 and the VCSEL driver LOCld, designed in a 0.25-μm Silicon-on-Sapphire (SoS) CMOS technology and consuming 843 mW and 320 mW, respectively. Based on a 130-nm CMOS technology, we design two pin-to-pin-compatible backup ASICs, LOCx2-130 and LOCld-130. Their power consumptions are much lower than those of their counterparts, whereas other performance metrics, such as the latency, data rate, and radiation tolerance, meet the phase-I upgrade requirements. We present the design of LOCx2-130 and LOCld-130. The test results of LOCx2-130 are also presented.

preprint2019arXiv

Spontaneous photo-generated carrier separation of SnO/BiOX (X=Cl, Br, I) bilayer under visible light irradiation for water splitting

Alloying in 2D materials plays an increasingly important role due to wide-range bandgap tunability and the integration of the advantages of HER and OER. Here, novel SnO/BiOX (X = Cl, Br, and I) bilayers are constructed to integrate the advantages of a narrow bandgap and the separation of photo-generated carriers. The bandgap of the bilayers can be tuned from 1.09 to 1.84 eV, remarkably improving the utilization of solar energy. The large difference in effective masses and the built-in electric field effectively hamper the fast recombination of photo-generated carriers, which greatly enhances the photocatalytic efficiency. In addition, the type-II band alignment guarantees that the two half reactions can occur at different surfaces. Moreover, the optical absorption (the strong transition between band edges and high joint density of states) and band-edge levels further confirm that the SnO/BiOX (X = Cl and Br) bilayer is a promising candidate for overall water splitting.