Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
46 works
0 followers
27 topics
4 close collaborators

Published work

46 published item(s)

preprint, 2026, arXiv

Co-Training Vision Language Models for Remote Sensing Multi-task Learning

With Transformers achieving outstanding performance on individual remote sensing (RS) tasks, we are now approaching the realization of a unified model that excels across multiple tasks through multi-task learning (MTL). Compared to single-task approaches, MTL methods offer improved generalization, enhanced scalability, and greater practical applicability. Recently, vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning. Moreover, the unified text-based interface demonstrates significant potential for MTL. Hence, in this work, we present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. Firstly, we create the data curation engine, including data acquisition, offline processing and integrating, as well as online loading and weighting. This data engine effectively addresses the complex RS data environment and generates flexible vision-language conversations. Furthermore, we propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery. For UHR images, we introduce the Zoom-in Chain mechanism together with its corresponding dataset, LRS-VQA-Zoom. The strategies are flexible and effectively mitigate the computational burdens. Additionally, we significantly enhance the model's object detection capability and propose a novel evaluation protocol that ensures fair comparison between VLMs and conventional detection models. Extensive experiments demonstrate that RSCoVLM achieves state-of-the-art performance across diverse tasks, outperforming existing RS VLMs and even rivaling specialized expert models. All the training and evaluation tools, model weights, and datasets have been fully open-sourced to support reproducibility. We expect that this baseline will promote further progress toward general-purpose RS models.

preprint, 2026, arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint, 2026, arXiv

FairGU: Fairness-aware Graph Unlearning in Social Networks

Graph unlearning has emerged as a critical mechanism for supporting sustainable and privacy-preserving social networks, enabling models to remove the influence of deleted nodes and thereby better safeguard user information. However, we observe that existing graph unlearning techniques insufficiently protect sensitive attributes, often leading to degraded algorithmic fairness compared with traditional graph learning methods. To address this gap, we introduce FairGU, a fairness-aware graph unlearning framework designed to preserve both utility and fairness during the unlearning process. FairGU integrates a dedicated fairness-aware module with effective data protection strategies, ensuring that sensitive attributes are neither inadvertently amplified nor structurally exposed when nodes are removed. Through extensive experiments on multiple real-world datasets, we demonstrate that FairGU consistently outperforms state-of-the-art graph unlearning methods and fairness-enhanced graph learning baselines in terms of both accuracy and fairness metrics. Our findings highlight a previously overlooked risk in current unlearning practices and establish FairGU as a robust and equitable solution for the next generation of socially sustainable networked systems. The codes are available at https://github.com/LuoRenqiang/FairGU.

preprint, 2026, arXiv

Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation. Uncertainty integrates three complementary principles -- distribution plausibility, sampling stability, and cross-field consistency -- to triage human review. Experiments show that CMR-EXTR achieves 99.65% variable-level accuracy, demonstrating both reliable extraction and informative confidence scores. To our knowledge, this is the first CMR-specific extraction system with integrated confidence estimation. The code is available at https://github.com/yuyi1005/CMR-EXTR.

preprint, 2022, arXiv

Active noise control techniques for nonlinear systems

Most of the literature focuses on the development of the linear active noise control (ANC) techniques. However, ANC systems might have to deal with some nonlinear components and the performance of linear ANC techniques may degrade in this scenario. To overcome this limitation, nonlinear ANC (NLANC) algorithms were developed. In Part II, we review the development of NLANC algorithms during the last decade. The contributions of heuristic ANC algorithms are outlined. Moreover, we emphasize recent advances of NLANC algorithms, such as spline ANC algorithms, kernel adaptive filters, and nonlinear distributed ANC algorithms. Then, we present recent applications of ANC technique including linear and nonlinear perspectives. Future research challenges regarding ANC techniques are also discussed.

preprint, 2022, arXiv

Change point localization in dependent dynamic nonparametric random dot product graphs

In this paper, we study the offline change point localization problem in a sequence of dependent nonparametric random dot product graphs. To be specific, assume that at every time point, a network is generated from a nonparametric random dot product graph model (see e.g. Athreya et al., 2017), where the latent positions are generated from unknown underlying distributions. The underlying distributions are piecewise constant in time and change at unknown locations, called change points. Most importantly, we allow for dependence among networks generated between two consecutive change points. This setting incorporates edge-dependence within networks and temporal dependence between networks, which is the most flexible setting in the published literature. To accomplish the task of consistently localizing change points, we propose a novel change point detection algorithm, consisting of two steps. First, we estimate the latent positions of the random dot product model, our theoretical result being a refined version of the state-of-the-art results, allowing the dimension of the latent positions to diverge. Subsequently, we construct a nonparametric version of the CUSUM statistic (e.g. Page, 1954; Padilla et al., 2019) that allows for temporal dependence. Consistent localization is proved theoretically and supported by extensive numerical experiments, which illustrate state-of-the-art performance. We also provide an in-depth discussion of possible extensions to give more understanding and insights.
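
For orientation, the classical CUSUM statistic that the abstract builds on can be sketched as follows. This is a minimal illustration for a univariate sequence with a single change in mean, not the nonparametric, dependence-aware version proposed in the paper; the function names `cusum_statistics` and `locate_change_point` are hypothetical.

```python
from math import sqrt


def cusum_statistics(x):
    """Classical CUSUM statistic at every candidate split t = 1, ..., n-1.

    For a split after the first t observations, the statistic is
    sqrt(t * (n - t) / n) * |mean(x[:t]) - mean(x[t:])|.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    left_sum = 0.0
    stats = []
    for t in range(1, n):
        left_sum += x[t - 1]            # running sum of the first t values
        right_sum = total - left_sum    # sum of the remaining n - t values
        gap = abs(left_sum / t - right_sum / (n - t))
        stats.append(sqrt(t * (n - t) / n) * gap)
    return stats


def locate_change_point(x):
    """Return the split t (number of observations before the change) that maximises CUSUM."""
    stats = cusum_statistics(x)
    best_index = max(range(len(stats)), key=stats.__getitem__)
    return best_index + 1               # stats[0] corresponds to t = 1


if __name__ == "__main__":
    series = [0.0] * 5 + [3.0] * 5
    print(locate_change_point(series))
```

The scaling factor sqrt(t(n - t)/n) balances the two segment lengths, so splits near the edges are not favoured merely because one of the two means is estimated from very few points.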

preprint, 2022, arXiv

Change-point Detection for Sparse and Dense Functional Data in General Dimensions

We study the problem of change-point detection and localisation for functional data sequentially observed on a general d-dimensional space, where we allow the functional curves to be either sparsely or densely sampled. Data of this form naturally arise in a wide range of applications such as biology, neuroscience, climatology, and finance. To achieve such a task, we propose a kernel-based algorithm named functional seeded binary segmentation (FSBS). FSBS is computationally efficient, can handle discretely observed functional data, and is theoretically sound for heavy-tailed and temporally-dependent observations. Moreover, FSBS works for a general d-dimensional domain, which is the first in the literature of change-point estimation for functional data. We show the consistency of FSBS for multiple change-point estimations and further provide a sharp localisation error rate, which reveals an interesting phase transition phenomenon depending on the number of functional curves observed and the sampling frequency for each curve. Extensive numerical experiments illustrate the effectiveness of FSBS and its advantage over existing methods in the literature under various settings. A real data application is further conducted, where FSBS localises change-points of sea surface temperature patterns in the South Pacific attributed to El Niño.

preprint, 2022, arXiv

Chemical transformer compression for accelerating both training and inference of molecular modeling

Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) and virtual screening (VS). Compared with other types of models, however, they are large, which results in a high hardware requirement to shorten the time for both training and inference processes. In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the sizes of transformers in molecular science. Both methods not only have competitive QSAR predictive performance as compared to the original BERT model, but also are more parameter efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domain as well as task-specific knowledge, which leads to a 4x faster rate of both training and inference due to a 10- and 3-times reduction of the number of parameters and layers, respectively. Meanwhile, it achieves comparable performance in QSAR and VS modeling. Moreover, we anticipate that the model compression strategy provides a pathway to the creation of effective generative transformer models for organic drug and material design.

preprint, 2022, arXiv

Conjugate Gradient Adaptive Learning with Tukey's Biweight M-Estimate

We propose a novel M-estimate conjugate gradient (CG) algorithm, termed Tukey's biweight M-estimate CG (TbMCG), for system identification in impulsive noise environments. In particular, the TbMCG algorithm can achieve a faster convergence while retaining a reduced computational complexity as compared to the recursive least-squares (RLS) algorithm. Specifically, the Tukey's biweight M-estimate incorporates a constraint into the CG filter to tackle impulsive noise environments. Moreover, the convergence behavior of the TbMCG algorithm is analyzed. Simulation results confirm the excellent performance of the proposed TbMCG algorithm for system identification and active noise control applications.
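
As background for the abstract above, the standard Tukey's biweight weighting of a residual can be sketched as below. This is only the textbook M-estimate weight, not the TbMCG algorithm itself; the function names are hypothetical, and the default threshold `c = 4.685` is the conventional tuning constant for residuals already scaled to unit variance (an assumption here, since the abstract does not state the paper's choice).

```python
def tukey_biweight_weight(error: float, c: float = 4.685) -> float:
    """Weight given to a residual under Tukey's biweight M-estimate.

    Residuals with |error| <= c get weight (1 - (error / c)^2)^2;
    larger residuals are treated as impulsive outliers and get weight 0.
    """
    if c <= 0:
        raise ValueError("threshold c must be positive")
    if abs(error) > c:
        return 0.0
    ratio = error / c
    return (1.0 - ratio * ratio) ** 2


def tukey_biweight_psi(error: float, c: float = 4.685) -> float:
    """Score (influence) function: the residual multiplied by its weight."""
    return error * tukey_biweight_weight(error, c)


if __name__ == "__main__":
    for e in (0.0, 1.0, 4.0, 10.0):
        print(e, tukey_biweight_weight(e), tukey_biweight_psi(e))
```

Because the influence of a residual drops to exactly zero beyond `c`, a single impulsive sample cannot drag an adaptive filter's update, which is the property that makes this estimate attractive in impulsive noise.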

preprint, 2022, arXiv

Design and Analysis of Robust Resilient Diffusion over Multi-Task Networks Against Byzantine Attacks

This paper studies distributed diffusion adaptation over clustered multi-task networks in the presence of impulsive interferences and Byzantine attacks. We develop a robust resilient diffusion least mean Geman-McClure-estimation (RDLMG) algorithm based on the cost function used by the Geman-McClure estimator, which can reduce the sensitivity to large outliers and make the algorithm robust under impulsive interferences. Moreover, the mean sub-sequence reduced method, in which each node discards the extreme value information of cost contributions received from its neighbors, can make the network resilient against Byzantine attacks. In this regard, the proposed RDLMG algorithm ensures that all normal nodes converge to their ideal states with cooperation among nodes. A statistical analysis of the RDLMG algorithm is also carried out in terms of mean and mean-square performances. Numerical results evaluate the proposed RDLMG algorithm in applications to multi-target localization and multi-task spectrum sensing.

preprint, 2022, arXiv

Dynamic and heterogeneous treatment effects with abrupt changes

From personalised medicine to targeted advertising, it is an inherent task to provide a sequence of decisions with historical covariates and outcome data. This requires understanding of both the dynamics and heterogeneity of treatment effects. In this paper, we are concerned with detecting abrupt changes in the treatment effects in terms of the conditional average treatment effect (CATE) in a sequential fashion. To be more specific, at each time point, we consider a nonparametric model to allow for maximal flexibility and robustness. Along the time, we allow for temporal dependence on historical covariates and noise functions. We provide a kernel-based change point estimator, which is shown to be consistent in terms of its detection delay, under an average run length control. Numerical results are provided to support our theoretical findings.

preprint, 2022, arXiv

Electrically-driven Photonic Crystal Lasers with Ultra-low Threshold

Light sources with ultra-low energy consumption and high performance are required to realize optical interconnects for on-chip communication. Photonic crystal (PhC) nanocavity lasers are one of the most promising candidates to fill this role. In this work, we demonstrate an electrically-driven PhC nanolaser with an ultra-low threshold current of 10.2 μA emitting at 1540 nm and operated at room temperature. The lasers are InP-based bonded on Si and comprise a buried heterostructure active region and lateral p-i-n junction. The static characteristics and the thermal properties of the lasers have been characterized. The effect of disorder and p-doping absorption on the Q-factor of passive cavities was studied. We also investigate the leakage current due to the lateral p-i-n geometry by comparing the optical and electrical pumping schemes.

preprint, 2022, arXiv

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

Real-time semantic segmentation, which can be visually understood as the pixel-level classification task on the input image, currently has broad application prospects, especially in the fast-developing fields of autonomous driving and drone navigation. However, the huge burden of calculation together with redundant parameters are still the obstacles to its technological development. In this paper, we propose a Fast Bilateral Symmetrical Network (FBSNet) to alleviate the above challenges. Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches, semantic information branch and spatial detail branch. The Semantic Information Branch (SIB) is the main branch with semantic architecture to acquire the contextual information of the input image and meanwhile acquire sufficient receptive field. The Spatial Detail Branch (SDB), in contrast, is a shallow and simple network used to establish local dependencies of each pixel for preserving details, which is essential for restoring the original resolution during the decoding phase. Meanwhile, a Feature Aggregation Module (FAM) is designed to effectively combine the output of these two branches. Experimental results on Cityscapes and CamVid show that the proposed FBSNet can strike a good balance between accuracy and efficiency. Specifically, it obtains 70.9% and 68.9% mIoU along with the inference speed of 90 fps and 120 fps on these two test datasets, respectively, with only 0.62 million parameters on a single RTX 2080Ti GPU. The code is available at https://github.com/IVIPLab/FBSNet.

preprint, 2022, arXiv

Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution

Convolutional neural network-based single-image super-resolution (SISR) has made great progress in recent years. However, it is difficult to apply these methods to real-world scenarios due to the computational and memory cost. Meanwhile, how to take full advantage of the intermediate features under the constraints of limited parameters and calculations is also a huge challenge. To alleviate these issues, we propose a lightweight yet efficient Feature Distillation Interaction Weighted Network (FDIWN). Specifically, FDIWN utilizes a series of specially designed Feature Shuffle Weighted Groups (FSWG) as the backbone, and several novel mutual Wide-residual Distillation Interaction Blocks (WDIB) form an FSWG. In addition, Wide Identical Residual Weighting (WIRW) units and Wide Convolutional Residual Weighting (WCRW) units are introduced into WDIB for better feature distillation. Moreover, a Wide-Residual Distillation Connection (WRDC) framework and a Self-Calibration Fusion (SCF) unit are proposed to interact features with different scales more flexibly and efficiently. Extensive experiments show that our FDIWN is superior to other models in striking a good balance between model performance and efficiency. The code is available at https://github.com/IVIPLab/FDIWN.

preprint, 2022, arXiv

Functional Linear Regression with Mixed Predictors

We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon that the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.

preprint, 2022, arXiv

HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation

Sounds, especially music, contain various harmonic components scattered in the frequency dimension. It is difficult for normal convolutional neural networks to observe these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic scale spectrograms efficiently. The harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate the MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and simultaneously reduces the number of parameters by more than 90%. We also find that it has stronger noise resistance and fewer octave errors. The code and pre-trained model are available at https://github.com/WX-Wei/HarmoF0.

preprint, 2022, arXiv

HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription

While neural network models are making significant progress in piano transcription, they are becoming more resource-consuming due to requiring larger model size and more computing power. In this paper, we attempt to apply more prior knowledge about the piano to reduce model size and improve the transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of such latent information, we propose HPPNet, which uses the Harmonic Dilated Convolution to capture the harmonic structures and the Frequency Grouped Recurrent Neural Network to model the pitch-invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance both in frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model size is much smaller than the previous state-of-the-art deep learning models.

preprint, 2022, arXiv

Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training

Generating melody from lyrics is an interesting yet challenging task in the area of artificial intelligence and music. However, the difficulty of keeping the consistency between input lyrics and generated melody limits the generation quality of previous works. We demonstrate an interpretable lyrics-to-melody generation system which can interact with users to understand the generation process and recreate the desired songs. To improve the reliability of melody generation that matches lyrics, mutual information is exploited to strengthen the consistency between lyrics and generated melodies. Gumbel-Softmax is exploited to solve the non-differentiability problem of generating discrete music attributes by Generative Adversarial Networks (GANs). Moreover, the predicted probabilities output by the generator are utilized to recommend music attributes. Interacting with our lyrics-to-melody generation system, users can listen to the generated AI song as well as recreate a new song by selecting from recommended music attributes.
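
The Gumbel-Softmax relaxation mentioned in the abstract can be illustrated with a small standard-library sketch. It shows only the sampling formula (add Gumbel noise to the logits, then apply a temperature-scaled softmax); it is not the paper's generator, and the function name, the example logits and the `rng` parameter are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits over three discrete music attributes.
    sample = gumbel_softmax_sample([1.0, 2.0, 3.0], temperature=0.5,
                                   rng=random.Random(0))
    print(sample)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother. In an automatic-differentiation framework the same expression is differentiable with respect to the logits, which is what lets a GAN generator emit near-discrete attributes while still receiving gradients.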

preprint, 2022, arXiv

Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music

Singing voice detection (SVD), which recognizes the vocal parts in a song, is an essential task in music information retrieval (MIR). The task remains challenging since the singing voice varies and intertwines with the accompaniment music, especially for some complicated polyphonic music such as choral music recordings. To address this problem, we investigate singing voice detection while discarding the interference from the accompaniment. The proposed SVD has two steps: (i) the singing voice separation (SVS) technique is first utilized to coarsely filter out the potential singing voice part; (ii) based on the continuity of vocals in the time domain, Long-term Recurrent Convolutional Networks (LRCN) are used to learn compositional features. Moreover, to eliminate the outliers, we choose to use a median filter for time-domain smoothing. Experimental results show that the proposed method outperforms the existing state-of-the-art works on two public datasets, the Jamendo Corpus and the RWC pop dataset.
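
The median-filter smoothing step mentioned at the end of the abstract can be sketched as follows. This is a generic sliding-window median over frame-level decisions, assuming binary vocal/non-vocal labels and edge replication at the boundaries; the window length and the function name `median_smooth` are illustrative choices, not values from the paper.

```python
from statistics import median


def median_smooth(labels, window=5):
    """Smooth a sequence of frame-level labels with a sliding median.

    The sequence is padded by repeating its first and last values so that
    every window has the same odd length and the output length is unchanged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = list(labels)
    if not values:
        return []
    half = window // 2
    padded = [values[0]] * half + values + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]


if __name__ == "__main__":
    # 1 = vocal frame, 0 = non-vocal frame; isolated flips are treated as outliers.
    frames = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
    print(median_smooth(frames, window=3))
```

With a window of 3, isolated single-frame flips such as the lone `1` at index 2 and the lone `0` at index 7 are removed, while longer runs are preserved.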

preprint, 2022, arXiv

Leaning Compact and Representative Features for Cross-Modality Person Re-Identification

This paper pays close attention to the cross-modality visible-infrared person re-identification (VI Re-ID) task, which aims to match pedestrian samples between visible and infrared modes. In order to reduce the modality-discrepancy between samples from different cameras, most existing works usually use constraints based on the Euclidean metric. Because the Euclidean-based distance metric strategy cannot effectively measure the internal angles between the embedded vectors, the existing solutions cannot learn an angularly discriminative feature embedding. Since the most important factor affecting the classification task based on embedding vectors is whether there is an angularly discriminative feature space, in this paper, we present a new loss function called Enumerate Angular Triplet (EAT) loss. Also, motivated by knowledge distillation, to narrow down the features between different modalities before feature embedding, we further present a novel Cross-Modality Knowledge Distillation (CMKD) loss. Benefiting from the above two considerations, the embedded features are discriminative enough to tackle the modality-discrepancy problem. The experimental results on the RegDB and SYSU-MM01 datasets demonstrate that the proposed method is superior to other state-of-the-art methods in terms of performance. Code is available at https://github.com/IVIPLab/LCCRF.

preprint, 2022, arXiv

Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer

Single-image super-resolution (SISR) has achieved significant breakthroughs with the development of deep learning. However, these methods are difficult to apply in real-world scenarios since they are inevitably accompanied by the problems of computational and memory costs caused by the complex operations. To solve this issue, we propose a Lightweight Bimodal Network (LBNet) for SISR. Specifically, an effective Symmetric CNN is designed for local feature extraction and coarse image reconstruction. Meanwhile, we propose a Recursive Transformer to fully learn the long-term dependence of images, so that the global information can be fully used to further refine texture details. Studies show that the hybrid of CNN and Transformer can build a more efficient model. Extensive experiments have proved that our LBNet achieves more prominent performance than other state-of-the-art methods with a relatively low computational cost and memory consumption. The code is available at https://github.com/IVIPLab/LBNet.

preprint, 2022, arXiv

Localising change points in piecewise polynomials of general degrees

In this paper we are concerned with a sequence of univariate random variables with piecewise polynomial means and independent sub-Gaussian noise. The underlying polynomials are allowed to be of arbitrary but fixed degrees. All the other model parameters are allowed to vary depending on the sample size. We propose a two-step estimation procedure based on the $\ell_0$-penalisation and provide upper bounds on the localisation error. We complement these results by deriving global information-theoretic lower bounds, which show that our two-step estimators are nearly minimax rate-optimal. We also show that our estimator enjoys near optimally adaptive performance by attaining individual localisation errors depending on the level of smoothness at individual change points of the underlying signal. In addition, under a special smoothness constraint, we provide a minimax lower bound on the localisation errors. This lower bound is independent of the polynomial orders and is sharper than the global minimax lower bound.

preprint, 2022, arXiv

On robustness and local differential privacy

There is soaring demand for statistical analysis tools that are robust against contamination and that preserve individual data owners' privacy. In spite of the fact that both topics host a rich body of literature, to the best of our knowledge, we are the first to systematically study the connections between the optimality under Huber's contamination model and the local differential privacy (LDP) constraints. In this paper, we start with a general minimax lower bound result, which disentangles the costs of being robust against Huber's contamination and preserving LDP. We further study four concrete examples: a two-point testing problem, a potentially-diverging mean estimation problem, a nonparametric density estimation problem and a univariate median estimation problem. For each problem, we demonstrate procedures that are optimal in the presence of both contamination and LDP constraints, comment on the connections with the state-of-the-art methods that are only studied under either contamination or privacy constraints, and unveil the connections between robustness and LDP via partially answering whether LDP procedures are robust and whether robust procedures can be efficiently privatised. Overall, our work showcases a promising prospect of joint study for robustness and local differential privacy.

preprint, 2022, arXiv

Optimal partition recovery in general graphs

We consider a graph-structured change point problem in which we observe a random vector with piecewise constant but unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $σ^2$, the size $Δ$ of the smaller element of the partition, the magnitude $κ$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut -- a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim ζ_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $ζ_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error -- measured by the symmetric difference between the true and estimated partition -- of order $ κ^{-2} σ^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes.

preprint, 2022, arXiv

Sparsity-Aware Robust Normalized Subband Adaptive Filtering algorithms based on Alternating Optimization

This paper proposes a unified sparsity-aware robust normalized subband adaptive filtering (SA-RNSAF) algorithm for identification of sparse systems under impulsive noise. The proposed SA-RNSAF algorithm generalizes different algorithms by defining the robust criterion and sparsity-aware penalty. Furthermore, by alternating optimization of the parameters (AOP) of the algorithm, including the step-size and the sparsity penalty weight, we develop the AOP-SA-RNSAF algorithm, which not only exhibits fast convergence but also obtains low steady-state misadjustment for sparse systems. Simulations in various noise scenarios have verified that the proposed AOP-SA-RNSAF algorithm outperforms existing techniques.

preprint2022arXiv

Study of General Robust Subband Adaptive Filtering

In this paper, we propose a general robust subband adaptive filtering (GR-SAF) scheme against impulsive noise by minimizing the mean square deviation under the random-walk model with individual weight uncertainty. Specifically, by choosing different scaling factors in the GR-SAF scheme, such as those from the M-estimate and maximum correntropy robust criteria, we can easily obtain different GR-SAF algorithms. Importantly, the proposed GR-SAF algorithm can be reduced to a variable regularization robust normalized SAF algorithm, thus having a fast convergence rate and low steady-state error. Simulations in the contexts of system identification with impulsive noise and echo cancellation with double-talk have verified that the proposed GR-SAF algorithms outperform their counterparts.

preprint2022arXiv

Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond

Rain removal aims to remove rain streaks from images/videos and reduce the disruptive effects caused by rain. It not only enhances image/video visibility but also allows many computer vision algorithms to function properly. This paper makes the first attempt to conduct a comprehensive study on the robustness of deep learning-based rain removal methods against adversarial attacks. Our study shows that, when the image/video is highly degraded, rain removal methods are more vulnerable to adversarial attacks, as small distortions/perturbations become less noticeable or detectable. In this paper, we first present a comprehensive empirical evaluation of various methods at different levels of attacks and with various losses/targets to generate the perturbations from the perspective of human perception and machine analysis tasks. A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules. Finally, we examine various types of adversarial attacks that are specific to deraining problems and their effects on both human and machine vision tasks, including 1) rain region attacks, which add perturbations only in the rain regions to make the perturbations in the attacked rain images less visible; and 2) object-sensitive attacks, which add perturbations only in regions near the given objects. Code is available at https://github.com/yuyi-sd/Robust_Rain_Removal.

preprint2021arXiv

C3VQG: Category Consistent Cyclic Visual Question Generation

Visual Question Generation (VQG) is the task of generating natural questions based on an image. Popular methods in the past have explored image-to-sequence architectures trained with maximum likelihood, which have been shown to generate meaningful questions given an image and its associated ground-truth answer. VQG becomes more challenging if the image contains rich contextual information describing its different semantic categories. In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers. Our approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision and (ii) it replaces generic questions with category-relevant generations. Most importantly, by eliminating expensive answer annotations, the required supervision is weakened. Using different categories enables us to exploit different concepts, as the inference requires only the image and the category. Mutual information is maximized between the image, question, and answer category in the latent space of our VAE. A novel category consistent cyclic loss is proposed to enable the model to generate consistent predictions with respect to the answer category, reducing redundancies and irregularities. Additionally, we impose supplementary constraints on the latent space of our generative model to provide structure based on categories and enhance generalization by encapsulating decorrelated features within each dimension. Through extensive experiments, the proposed model, C3VQG, outperforms state-of-the-art VQG methods with weak supervision.

preprint2021arXiv

Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing

In Intelligent Tutoring Systems (ITS), tracing the student's knowledge state during learning has been studied for several decades in order to provide more supportive learning instructions. In this paper, we propose a novel model for knowledge tracing that i) captures students' learning ability and dynamically assigns students to distinct groups with similar ability at regular time intervals, and ii) combines this information with a Recurrent Neural Network architecture known as Deep Knowledge Tracing. Experimental results confirm that the proposed model is significantly better at predicting student performance than well-known state-of-the-art techniques for student modelling.

preprint2021arXiv

Generalized non-stationary bandits

In this paper, we study a non-stationary stochastic bandit problem, which generalizes the switching bandit problem. On top of the switching bandit problem (case (a)), we are interested in three concrete examples: (b) the means of the arms are local polynomials, (c) the means of the arms are locally smooth, and (d) the gaps of the arms have a bounded number of inflexion points and the highest arm mean cannot vary too much in a short range. These three settings are very different, but have the following in common: (i) the number of similarly-sized level sets of the logarithm of the gaps can be controlled, and (ii) the highest mean has a limited number of abrupt changes, and otherwise has limited variations. We propose a single algorithm for this general setting that, in particular, solves the four problems (a)-(d) in an efficient and unified way.

preprint2021arXiv

Optimal network online change point localisation

We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is collected sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. In this paper, we establish a minimax lower bound on the detection delay and two upper bounds based on NP-hard algorithms and polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/α) \frac{\max\{r^2/n, \, 1\}}{κ_0^2 n ρ},\\ \lesssim \log(Δ/α) \frac{\max\{r^2/n, \, \log(r)\}}{κ_0^2 n ρ}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(Δ/α) \frac{r}{κ_0^2 n ρ}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $κ_0, n, ρ, r$ and $α$ are, respectively, the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $Δ$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.

preprint2021arXiv

Singer Identification Using Deep Timbre Feature Learning with KNN-Net

In this paper, we study the issue of automatic singer identification (SID) in popular music recordings, which aims to recognize who sang a given song. The main challenge for this investigation lies in the fact that a singer's singing voice changes and intertwines with the signal of the background accompaniment in the time domain. To handle this challenge, we propose the KNN-Net for SID, which is a deep neural network model with the goal of learning local timbre feature representations from the mixture of singer voice and background music. Unlike other deep neural networks that use the softmax layer as the output layer, we instead utilize the KNN as a more interpretable layer to output target singer labels. Moreover, an attention mechanism is first introduced to highlight crucial timbre features for SID. Experiments on the existing artist20 dataset show that the proposed approach outperforms the state-of-the-art method by 4%. We also create the singer32 and singer60 datasets, consisting of Chinese pop music, to evaluate the reliability of the proposed method. These more extensive experiments additionally indicate that our proposed model achieves a significant performance improvement compared to the state-of-the-art methods.

preprint2020arXiv

Conditional Hybrid GAN for Sequence Generation

Conditional sequence generation aims to instruct the generation procedure by conditioning the model with additional context information, which is a self-supervised learning issue (a form of unsupervised learning with supervision information from the data itself). Unfortunately, the current state-of-the-art generative models have limitations in sequence generation with multiple attributes. In this paper, we propose a novel conditional hybrid GAN (C-Hybrid-GAN) to solve this issue. Discrete sequences with triplet attributes are separately generated when conditioned on the same context. Most importantly, a relational reasoning technique is exploited to model not only the dependency inside each sequence of the attribute during the training of the generator but also the consistency among the sequences of attributes during the training of the discriminator. To avoid the non-differentiability problem in GANs encountered during discrete data generation, we exploit the Gumbel-Softmax technique to approximate the distribution of discrete-valued sequences. By evaluating on the task of generating melody (associated with note, duration, and rest) from lyrics, we demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in context-conditioned discrete-valued sequence generation.

preprint2020arXiv

Designing Artificial Two-Dimensional Landscapes via Room-Temperature Atomic-Layer Substitution

Manipulating materials with atomic-scale precision is essential for the development of a next-generation material design toolbox. Tremendous efforts have been made to advance the compositional, structural, and spatial accuracy of material deposition and patterning. The family of 2D materials provides an ideal platform to realize atomic-level material architectures. The wide and rich physics of these materials has led to the fabrication of heterostructures, superlattices, and twisted structures with breakthrough discoveries and applications. Here, we report a novel atomic-scale material design tool that selectively breaks and forms chemical bonds of 2D materials at room temperature, called atomic-layer substitution (ALS), through which we can substitute the top layer chalcogen atoms within the 3-atom-thick transition-metal dichalcogenides using arbitrary patterns. Flipping the layer via transfer allows us to perform the same procedure on the other side, yielding programmable in-plane multi-heterostructures with different out-of-plane crystal symmetry and electric polarization. First-principles calculations elucidate how the ALS process is overall exothermic and has only a small reaction barrier, allowing the reaction to occur at room temperature. Optical characterizations confirm the fidelity of this design approach, while TEM provides direct evidence of the Janus structure and suggests an atomic transition at the interface of the designed heterostructure. Finally, transport and Kelvin probe measurements on MoXY (X,Y=S,Se; X and Y corresponding to the bottom and top layers) lateral multi-heterostructures reveal the surface potential and dipole orientation of each region, and the barrier height between them. Our approach for designing artificial 2D landscapes down to a single layer of atoms can lead to unique electronic, photonic and mechanical properties previously not found in nature.

preprint2020arXiv

Detecting Abrupt Changes in High-Dimensional Self-Exciting Poisson Processes

High-dimensional self-exciting point processes have been widely used in many application areas to model discrete event data in which past and current events affect the likelihood of future events. In this paper, we are concerned with detecting abrupt changes of the coefficient matrices in discrete-time high-dimensional self-exciting Poisson processes, which have yet to be studied in the existing literature due to both theoretical and computational challenges rooted in the non-stationary and high-dimensional nature of the underlying process. We propose a penalized dynamic programming approach which is supported by a theoretical rate analysis and numerical evidence.

preprint2020arXiv

Effects of Initial State on Opinion Formation in Complex Social Networks with Noises

Opinion formation in complex social networks may exhibit complex system dynamics even when based on the simplest system evolution models. An interesting and important issue is the effect of the initial state on the final steady-state opinion distribution. We show that, while different initial opinion distributions certainly make a difference to opinion evolution in social systems without noise, in systems with noise, given enough time, different initial states make no significant difference to the final steady state. Instead, it is the basal distribution of the preferred opinions that determines the final state of the system. Such an observation contradicts a long-held belief about the role of the system's initial state in opinion formation. We briefly discuss the reasons supporting our statement, and the implications of such an observation in real-life applications.

preprint2020arXiv

End-to-end Named Entity Recognition from English Speech

Named entity recognition (NER) from text has been a widely studied problem and usually extracts semantic information from text. Until now, NER from speech has mostly been studied as a two-step pipeline process that first applies an automatic speech recognition (ASR) system to an audio sample and then passes the predicted transcript to a NER tagger. In such cases, the error does not propagate from one step to another as both the tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm that integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g., phoneme-based ASR). In this paper, we introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components. Experimental results show that the proposed E2E approach outperforms the classical two-step approach. We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.

preprint2020arXiv

Graph matching beyond perfectly-overlapping Erdős--Rényi random graphs

Graph matching is a fruitful area in terms of both algorithms and theories. In this paper, we exploit the degree information, which was previously used only in matching noiseless graphs and perfectly-overlapping Erdős--Rényi random graphs. We are concerned with graph matching of partially-overlapping graphs and stochastic block models, which are more useful in tackling real-life problems. We propose the edge exploited degree profile graph matching method and two refined variations. We conduct a thorough analysis of our proposed methods' performance in a range of challenging scenarios, including a zebrafish neuron activity data set and a coauthorship data set. Our methods are shown to be numerically superior to the state-of-the-art methods.

preprint2020arXiv

Localizing Changes in High-Dimensional Vector Autoregressive Processes

Autoregressive models capture stochastic processes in which past realizations determine the generative distribution of new data; they arise naturally in a variety of industrial, biomedical, and financial settings. A key challenge when working with such data is to determine when the underlying generative model has changed, as this can offer insights into distinct operating regimes of the underlying system. This paper describes a novel dynamic programming approach to localizing changes in high-dimensional autoregressive processes and associated error rates that improve upon the prior state of the art. When the model parameters are piecewise constant over time and the corresponding process is piecewise stable, the proposed dynamic programming algorithm consistently localizes change points even as the dimensionality, the sparsity of the coefficient matrices, the temporal spacing between two consecutive change points, and the magnitude of the difference of two consecutive coefficient matrices are allowed to vary with the sample size. Furthermore, the accuracy of initial, coarse change point localization estimates can be boosted via a computationally-efficient refinement algorithm that provably improves the localization error rate. Finally, comprehensive simulation experiments and a real data analysis are provided to show the numerical superiority of our proposed methods.

preprint2020arXiv

Optimal nonparametric multivariate change point detection and localization

We study the multivariate nonparametric change point detection problem, where the data are a sequence of independent $p$-dimensional random vectors whose distributions are piecewise-constant with Lipschitz densities changing at unknown times, called change points. We quantify the size of the distributional change at any change point with the supremum norm of the difference between the corresponding densities. We are concerned with the localization task of estimating the positions of the change points. In our analysis, we allow for the model parameters to vary with the total number of time points, including the minimal spacing between consecutive change points and the magnitude of the smallest distributional change. We provide information-theoretic lower bounds on both the localization rate and the minimal signal-to-noise ratio required to guarantee consistent localization. We formulate a novel algorithm based on kernel density estimation that nearly achieves the minimax lower bound, save possibly for logarithmic factors. We provide extensive numerical evidence to support our theoretical findings.

preprint2020arXiv

PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability

Fashion is an important part of human experience. Events such as interviews, meetings, marriages, etc. are often based on clothing styles. The rise of the fashion industry and its effect on social influencing have made outfit compatibility a need. Thus, an outfit compatibility model is needed to aid people in clothing recommendation. However, due to the highly subjective nature of compatibility, it is necessary to account for personalization. Our paper devises an attribute-wise interpretable compatibility scheme with personal preference modelling which captures user-item interaction along with general item-item interaction. Our work addresses the problem of interpretability in clothing matching by locating the discordant and harmonious attributes between fashion items. Extensive experimental results on IQON3000, a publicly available real-world dataset, verify the effectiveness of the proposed model.

preprint2020arXiv

Sparsity-Aware SSAF Algorithm with Individual Weighting Factors for Acoustic Echo Cancellation

In this paper, we propose and analyze the sparsity-aware sign subband adaptive filtering with individual weighting factors (S-IWF-SSAF) algorithm, and consider its application in acoustic echo cancellation (AEC). Furthermore, we design a joint optimization scheme of the step-size and the sparsity penalty parameter to enhance the S-IWF-SSAF performance in terms of convergence rate and steady-state error. A theoretical analysis shows that the S-IWF-SSAF algorithm outperforms the previous sign subband adaptive filtering with individual weighting factors (IWF-SSAF) algorithm in sparse scenarios. In particular, compared with the existing analysis of the IWF-SSAF algorithm, the proposed analysis does not require the assumptions of a large number of subbands, a long adaptive filter, and a paraunitary analysis filter bank, and it matches the simulated results well. Simulations in both system identification and AEC situations have validated our theoretical analysis and the effectiveness of the proposed algorithms.

preprint2020arXiv

Superfluidity and pairing phenomena in ultracold atomic Fermi gases in one-dimensional optical lattices, Part I: Balanced case

The superfluidity and pairing phenomena in ultracold atomic Fermi gases have been of great interest in recent years, with multiple tunable parameters. Here we study the BCS-BEC crossover behavior of balanced two-component Fermi gases in a one-dimensional optical lattice, which is distinct from the simple three-dimensional (3D) continuum and a fully 3D lattice often found in a condensed matter system. We use a pairing fluctuation theory which includes self-consistent feedback effects at finite temperatures, and find widespread pseudogap phenomena beyond the BCS regime. As a consequence of the lattice periodicity, the superfluid transition temperature $T_c$ decreases with pairing strength in the BEC regime, where it approaches asymptotically $T_c = πan/2m$, with $a$ being the $s$-wave scattering length, and $n$ ($m$) the fermion density (mass). In addition, the quasi-two-dimensionality leads to a fast-growing (absolute value of the) fermionic chemical potential $μ$ and pairing gap $Δ$, which depend exponentially on the ratio $d/a$. Importantly, $T_c$ at unitarity increases with the lattice constant $d$ and hopping integral $t$. The effect of the van Hove singularity on $T_c$ is identified. The superfluid density exhibits $T^{3/2}$ power laws at low $T$, away from the extreme BCS limit. These predictions can be tested in future experiments.

preprint2020arXiv

Superfluidity and pairing phenomena in ultracold atomic Fermi gases in one-dimensional optical lattices, Part II: Effects of population imbalance

In this paper, we study the effect of population imbalance and its interplay with pairing strength and lattice effect in atomic Fermi gases in a one-dimensional optical lattice. We compute various phase diagrams as the system undergoes BCS-BEC crossover, using the same pairing fluctuation theory as in Part I. We find widespread pseudogap phenomena beyond the BCS regime and intermediate temperature superfluid states for relatively low population imbalances. The Fermi surface topology plays an important role in the behavior of $T_\text{c}$. For large $d$ and/or small $t$, which yield an open Fermi surface, superfluidity can be readily destroyed by a small amount of population imbalance $p$. The superfluid phase, especially in the BEC regime, can exist only in a highly restricted volume of the parameter space. Due to the continuum-lattice mixing, population imbalance gives rise to a new mechanism for pair hopping, assisted by excessive majority fermions, which may lead to significant enhancement of $T_\text{c}$ on the BEC side of the Feshbach resonance, and also cause $T_\text{c}$ to approach a constant asymptote in the BEC limit, when it exists. Furthermore, we find that not all minority fermions will be paired up in the BEC limit, unlike the 3D continuum case. These predictions can be tested in future experiments.

preprint2020arXiv

Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics

Sheet music, audio, and lyrics are the three main modalities involved in writing a song. In this paper, we propose an unsupervised generative adversarial alignment representation (UGAAR) model to learn deep discriminative representations shared across three major musical modalities: sheet music, lyrics, and audio, where a three-branch deep neural network architecture is jointly trained. In particular, the proposed model can transfer the strong relationship between audio and sheet music to audio-lyrics and sheet-lyrics pairs by learning the correlation in the latent shared subspace. We apply CCA components of audio and sheet music to establish new ground truth. The generative (G) model learns the correlation of two couples of transferred pairs to generate new audio-sheet pairs for fixed lyrics to challenge the discriminative (D) model. The discriminative model aims to distinguish whether the input comes from the generative model or the ground truth. The two models are trained simultaneously in an adversarial way to enhance the ability of deep alignment representation learning. Our experimental results demonstrate the feasibility of our proposed UGAAR for alignment representation learning among sheet music, audio, and lyrics.

preprint2020arXiv

Unusual destruction and enhancement of superfluidity of atomic Fermi gases by population imbalance in a one-dimensional optical lattice

We study the superfluid behavior of population-imbalanced ultracold atomic Fermi gases with a short range attractive interaction in a one-dimensional (1D) optical lattice, using a pairing fluctuation theory. We show that, besides widespread pseudogap phenomena and intermediate temperature superfluidity, the superfluid phase is readily destroyed except in a limited region of the parameter space. We find a new mechanism for pair hopping, assisted by the excessive majority fermions, in the presence of continuum-lattice mixing, which leads to an unusual constant BEC asymptote for $T_c$ that is independent of pairing strength. As a result, on the BEC side of unitarity, superfluidity, when it exists, may be strongly enhanced by population imbalance.