Source author record

Yang Xiao

Yang Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

38 works

23 topics

4 close collaborators

preprint 2026 arXiv

Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation

We propose Compressed Video Aggregator (CVA), a lightweight micro-video recommendation module that decouples video information from preference learning. It aggregates frozen VFM embeddings and uses latent reasoning without cross-attention projection, producing compact video embeddings for recommenders. Because the original benchmark has a redundant frame count and overly coarse sampling, we use titles to re-select key frames with CLIP. Experiments on MicroLens and Short-Video show consistent gains with orders-of-magnitude reductions in training time and GPU memory, and the re-selected frames can further enhance the performance of all methods, including CVA. We also discuss how several scenarios involving erroneous titles affect our method. Code will be released soon.

preprint 2026 arXiv

MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation

Recent advances in generative models have enabled modern Text-to-Audio (TTA) systems to synthesize audio with high perceptual quality. However, TTA systems often struggle to maintain semantic consistency with the input text, leading to mismatches in sound events, temporal structures, or contextual relationships. Evaluating semantic fidelity in TTA remains a significant challenge. Traditional methods primarily rely on subjective human listening tests, which are time-consuming. To solve this, we propose an objective evaluator based on a Mixture of Experts (MoE) architecture with Sequential Cross-Attention (SeqCoAttn). Our model achieves the first rank in the XACLE Challenge, with an SRCC of 0.6402 (an improvement of 30.6% over the challenge baseline) on the test dataset. Code is available at: https://github.com/S-Orion/MOESCORE.
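
The SRCC quoted in this abstract is Spearman's rank correlation coefficient between predicted and human relevance scores. The sketch below shows one common way to compute it, as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names and sample scores are illustrative assumptions and are not taken from the paper or the XACLE Challenge tooling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def srcc(predicted, human):
    """Spearman rank correlation between two equally long score lists."""
    return pearson(average_ranks(predicted), average_ranks(human))


# Hypothetical scores: the model orders the four clips exactly as listeners do.
predicted_scores = [0.1, 0.4, 0.35, 0.8]
human_scores = [1, 3, 2, 4]
print(srcc(predicted_scores, human_scores))
```

Because only the ordering matters, a predictor can score well on SRCC even when its raw outputs sit on a different scale from the human ratings.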

preprint 2025 arXiv

Environmental Sound Deepfake Detection Challenge: An Overview

Recent progress in audio generation models has made it possible to create highly realistic and immersive soundscapes, which are now widely used in film and virtual-reality-related applications. However, these audio generators also raise concerns about potential misuse, such as producing deceptive audio for fabricated videos or spreading misleading information. Therefore, it is essential to develop effective methods for detecting fake environmental sounds. Existing datasets for environmental sound deepfake detection (ESDD) remain limited in both scale and the diversity of sound categories they cover. To address this gap, we introduced EnvSDD, the first large-scale curated dataset designed for ESDD. Based on EnvSDD, we launched the ESDD Challenge, recognized as one of the ICASSP 2026 Grand Challenges. This paper presents an overview of the ESDD Challenge, including a detailed analysis of the challenge results.

preprint 2023 arXiv

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end manner, which has not yet been well studied. The main advantage of MCGaze is that the localization of the head, face, and eye clues is solved jointly with gaze estimation in a single step, with joint optimization to seek optimal performance. During this process, spatial-temporal context is exchanged among the head, face, and eye clues. Accordingly, the final gazes, obtained by fusing features from various queries, are aware of global clues from heads and faces and local clues from eyes simultaneously, which improves performance. Meanwhile, the one-step design also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our method. The source code will be released at https://github.com/zgchen33/MCGaze.

preprint 2023 arXiv

Implications of Nano-Hertz Gravitational Waves on Electroweak Phase Transition in the Singlet Dark Matter Model

Inspired by the recent evidence for nano-Hertz stochastic gravitational waves observed by the pulsar timing array collaborations, we explore their implied supercooled electroweak phase transition in the singlet extension of the Standard Model. Our findings reveal that by adjusting the model parameter at the per-mille level, the corresponding percolation temperature can be continuously lowered to 1 GeV. With such a low percolation temperature, the singlet dark matter may freeze out before the electroweak phase transition, and, consequently, the entropy generated during the transition can significantly affect the dark matter relic density. It alleviates the tension between the requirement of a strong electroweak phase transition and the constraints imposed by dark matter direct detection, and can be tested in future experiments.

preprint 2022 arXiv

Accurate relativistic chiral nucleon-nucleon interaction up to NNLO

We construct a relativistic chiral nucleon-nucleon interaction up to the next-to-next-to-leading order in covariant baryon chiral perturbation theory. We show that a good description of the $np$ phase shifts up to $T_\mathrm{lab}=200$ MeV and even higher can be achieved with a $\tilde{\chi}^2/\mathrm{d.o.f.}$ less than 1. Both the next-to-leading order results and the next-to-next-to-leading order results describe the phase shifts equally well up to $T_\mathrm{lab}=200$ MeV, but for higher energies, the latter behaves better, showing satisfactory convergence. The relativistic chiral potential provides the most essential inputs for relativistic ab initio studies of nuclear structure and reactions, which has been in need for almost two decades.

preprint 2022 arXiv

An accurate relativistic chiral nucleon-nucleon interaction up to the next-to-next-to-leading order

We report on the construction of an accurate relativistic chiral nucleon-nucleon interaction up to the next-to-next-to-leading order (NNLO). We compare the so-obtained neutron-proton phase shifts with the next-to-next-to-next-to-leading order (N$^3$LO) nonrelativistic ones and we show that up to $T_\mathrm{lab.}=200$ MeV, the relativistic chiral nuclear force can describe the PWA93 phase shifts and inelasticities as well as its N$^3$LO nonrelativistic counterparts. As a result, the relativistic chiral nuclear force can be readily used for relativistic ab initio nuclear structure, reaction, as well as astrophysical studies.

preprint 2022 arXiv

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of datasets when comparing different systems. Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power. We further, taking the text classification task as a case study, investigate the possibility of predicting dataset discrimination based on its properties (e.g., average sentence length). Our preliminary experiments promisingly show that given a sufficient number of training experimental records, a meaningful predictor can be learned to estimate dataset discrimination over unseen datasets. We released all datasets with features explored in this work on DataLab: https://datalab.nlpedia.ai.

preprint 2022 arXiv

Can electron and muon $g-2$ anomalies be jointly explained in SUSY?

The FNAL+BNL measurement of muon $g-2$ is $4.2σ$ above the SM prediction, and the Berkeley $^{133}$Cs measurement of the fine-structure constant $α_{\rm em}$ leads to an SM prediction for electron $g-2$ that is $2.4σ$ above the experimental value. Hence, a joint explanation of both anomalies requires a positive contribution to muon $g-2$ and a negative contribution to electron $g-2$, which is rather challenging. In this work we explore the possibility of such a joint explanation in the minimal supersymmetric standard model (MSSM). Assuming no universality between smuon and selectron soft masses, we find a region of parameter space that allows a joint explanation at the $2σ$ level, i.e., $μM_1,μM_2<0$, $m_{L1}, m_{E2}<200$ GeV, $m_{L2}$ being much larger than the soft masses of other sleptons, $|M_1|<125$ GeV and $μ<400$ GeV. This region can survive LHC and LEP constraints, but gives an over-abundance of dark matter if the bino-like lightest neutralino is assumed to be the dark matter candidate. With the assumption that the dark matter candidate is a superWIMP (say a pseudo-goldstino in multi-sector SUSY breaking scenarios, whose mass can be as light as GeV and which is produced from the late decay of the thermally frozen-out lightest neutralino), the dark matter problem can be avoided. So, we conclude that the MSSM may give a joint explanation for the muon and electron $g-2$ anomalies at the $2σ$ level (the muon $g-2$ anomaly can even be ameliorated to $1σ$).

preprint 2022 arXiv

Continual Learning For On-Device Environmental Sound Classification

Continuously learning new classes without catastrophic forgetting is a challenging problem for on-device environmental sound classification given the restrictions on computation resources (e.g., model size, running memory). To address this issue, we propose a simple and efficient continual learning method. Our method selects the historical data for the training by measuring the per-sample classification uncertainty. Specifically, we measure the uncertainty by observing how the classification probability of data fluctuates against the parallel perturbations added to the classifier embedding. In this way, the computation cost can be significantly reduced compared with adding perturbation to the raw data. Experimental results on the DCASE 2019 Task 1 and ESC-50 dataset show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating our method can efficiently and incrementally learn new classes without the catastrophic forgetting problem for on-device environmental sound classification.
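
The uncertainty measure described in this abstract can be illustrated with a small sketch: perturb a sample's embedding several times, pass each copy through the classifier head, and record how much the predicted-class probability fluctuates. The Gaussian noise, the linear head, the variance-based score, and every name and parameter below are assumptions made for illustration. The paper's exact perturbation scheme and scoring rule may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(embedding, weights, bias):
    """Logits of a linear classifier with one weight row per class."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def embedding_uncertainty(embedding, weights, bias,
                          n_perturb=32, sigma=0.1, seed=0):
    """Variance of the predicted-class probability under embedding noise.

    Assumption: isotropic Gaussian noise with standard deviation sigma.
    Only the small classifier head is re-evaluated, never the full encoder.
    """
    rng = random.Random(seed)
    base = softmax(linear_head(embedding, weights, bias))
    label = max(range(len(base)), key=base.__getitem__)
    probs = []
    for _ in range(n_perturb):
        noisy = [x + rng.gauss(0.0, sigma) for x in embedding]
        probs.append(softmax(linear_head(noisy, weights, bias))[label])
    mean = sum(probs) / n_perturb
    return sum((p - mean) ** 2 for p in probs) / n_perturb


# Hypothetical two-class head whose decision boundary is at x[0] == 0.
weights = [[1.0, 0.0], [-1.0, 0.0]]
bias = [0.0, 0.0]
near_boundary = embedding_uncertainty([0.0, 0.0], weights, bias)
far_from_boundary = embedding_uncertainty([5.0, 0.0], weights, bias)
print(near_boundary > far_from_boundary)
```

Perturbing the embedding rather than the raw audio means each extra forward pass touches only the final layer, which is where the reduction in computation cost mentioned in the abstract would come from.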

preprint 2022 arXiv

DataLab: A Platform for Data Analysis and Intervention

Despite data's crucial role in machine learning, most existing tools and research tend to focus on systems on top of existing data rather than how to interpret and manipulate data. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data, but also provides a standardized interface for different data processing operations. Additionally, in view of the ongoing proliferation of datasets, DataLab has features for dataset recommendation and global vision analysis that help researchers form a better view of the data ecosystem. So far, DataLab covers 1,715 datasets and 3,583 transformed versions of them (e.g., hyponym replacement), where 728 datasets support various analyses (e.g., with respect to gender bias) with the help of 140M samples annotated by 318 feature functions. DataLab is under active development and will be supported going forward. We have released a web platform, web API, Python SDK, PyPI published package and online documentation, which we hope can meet the diverse needs of researchers.

preprint 2022 arXiv

Finite-time quantum Otto engine with a squeezed thermal bath: Role of quantum coherence and squeezing in the performance and fluctuations

We consider a finite-time quantum Otto heat engine that consists of two isochoric (thermal-contact) processes, where the system is alternately coupled to a hot squeezed and a cold thermal reservoir, and two unitary driven strokes, where the system is isolated from these two baths and its von Neumann entropy keeps constant. Both quantum inner friction and coherence are generated along the driven stroke and coherence cannot be fully erased after the finite-time hot isochore. Using full counting statistics, we present the probability distribution functions of heat injection and total work per cycle, which are dependent on the time duration along each process. With these, we derive the analytical expressions for the thermodynamic quantities of the two-level heat engine, such as total work, thermodynamic efficiency, entropy production, and work fluctuations, in which effects of coherence, squeezing, inner friction and finite-time heat exchange are included. We then numerically determine the thermodynamic quantities and the fluctuations using the parameters employed in the experimental implementation. Our results clarify the role of coherence and squeezing in the performance and fluctuations of quantum Otto engines.

preprint 2022 arXiv

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling

Training deep neural networks (DNNs) with noisy labels is practically challenging since inaccurate labels severely degrade the generalization ability of DNNs. Previous efforts tend to handle part or all of the data in a unified denoising flow, identifying noisy data with a coarse small-loss criterion to mitigate the interference from noisy labels. This ignores the fact that noisy samples differ in difficulty, so a rigid and unified data selection pipeline cannot tackle the problem well. In this paper, we propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner. At the coarse level, clean and noisy sets are first separated in terms of credibility in a statistical sense. Since it is practically impossible to categorize all noisy samples correctly, we further process them in a fine-grained manner by modeling the credibility of each sample. Specifically, for the clean set, we deliberately design a memory-based modulation scheme to dynamically adjust the contribution of each sample in terms of its historical credibility sequence during training, thus alleviating the effect of noisy samples incorrectly grouped into the clean set. Meanwhile, for samples categorized into the noisy set, a selective label update strategy is proposed to correct noisy labels while mitigating the problem of correction error. Extensive experiments are conducted on benchmarks of different modalities, including image classification (CIFAR, Clothing1M, etc.) and text recognition (IMDB), with either synthetic or natural semantic noises, demonstrating the superiority and generality of CREMA.

preprint 2022 arXiv

Nonperturbative two-pion exchange contributions to the nucleon-nucleon interaction in covariant baryon chiral perturbation theory

We calculate the nonperturbative two-pion exchange (TPE) contributions to the $NN$ interaction in covariant baryon chiral perturbation theory. We study how the nonperturbative resummation affects the $NN$ phase shifts for partial waves with $J \geq 3$ and $L \leq 6$. No significant differences are observed between the nonperturbative phase shifts and perturbative ones for most partial waves except for $^3D_3$, for which the nonperturbative resummation greatly improves the description of the phase shifts. However, a significant cutoff dependence is found for this partial wave and a reasonable description of the phase shifts can only be obtained with a particular cutoff. Furthermore, we compare the so-obtained nonperturbative phase shifts with those obtained in the heavy baryon chiral perturbation theory. We show that the contributions from relativistic nonperturbative TPE are more moderate than those from the nonrelativistic TPE obtained in the dimensional regularization scheme. A proper convergence pattern is observed for most of the partial waves studied except for $^3F_3$, $^3F_4$, and $^3H_6$, for which the subleading TPE contributions are a bit strong. We find that for $H$ and $I$ partial waves, the OPE alone can already describe the phase shifts reasonably well.

preprint 2022 arXiv

Nucleon-nucleon interaction in the $^3S_1$-$^3D_1$ coupled channel for a pion mass of 469 MeV

In this work, we apply the relativistic chiral nuclear force to describe the state-of-the-art lattice simulations of the nucleon-nucleon scattering amplitude. In particular, we focus on the $^3S_1$-$^3D_1$ coupled channel for a pion mass of 469 MeV. We show that at leading order the relativistic chiral nuclear force can only describe $δ_{3S1}$ and $\varepsilon_1$ up to $T_\mathrm{lab.}\approx10$ MeV, while at the next-to-leading order it can do much better up to $T_\mathrm{lab}=200$ MeV. However, at the next-to-next-to-leading order, the description deteriorates, which can be attributed to the fact that the pion-mass dependence of the pion-nucleon couplings $c_{1,2,3,4}$ may not be negligible. Furthermore, all the studies consistently yield negative $δ_{3D1}$, contrary to the lattice QCD results which are positive but consistent with zero. The present study is relevant to a better understanding of the lattice QCD nucleon-nucleon force and more general baryon-baryon interactions.

preprint 2022 arXiv

On the Robustness of Reading Comprehension Models to Entity Renaming

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed? Such failures imply that models overly rely on entity information to answer questions, and thus may generalize poorly when facts about the world change or questions are asked about novel entities. To systematically audit this issue, we present a pipeline to automatically generate test examples at scale, by replacing entity names in the original test sample with names from a variety of sources, ranging from names in the same test set, to common names in life, to arbitrary strings. Across five datasets and three pretrained model architectures, MRC models consistently perform worse when entities are renamed, with particularly large accuracy drops on datasets constructed via distant supervision. We also find large differences between models: SpanBERT, which is pretrained with span-level masking, is more robust than RoBERTa, despite having similar accuracy on unperturbed test data. We further experiment with different masking strategies as the continual pretraining objective and find that entity-based masking can improve the robustness of MRC models.
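
The perturbation described in this abstract, swapping an entity's name consistently across a test example, can be sketched in a few lines. The field names (`passage`, `question`, `answer`), the helper `rename_entity`, and the sample text are hypothetical and are not taken from the paper's pipeline. The sketch also assumes names begin and end with word characters and does not handle partial mentions such as a surname used alone.

```python
import re


def rename_entity(example, old_name, new_name):
    """Return a copy of the example with every whole-word mention renamed.

    The same substitution is applied to all text fields so that the
    question and gold answer stay consistent with the passage.
    """
    pattern = re.compile(r"\b" + re.escape(old_name) + r"\b")
    # A function replacement avoids backslash escapes in new_name
    # being interpreted as group references.
    return {field: pattern.sub(lambda _match: new_name, text)
            for field, text in example.items()}


# Hypothetical reading-comprehension example.
original = {
    "passage": "Marie Curie won the Nobel Prize in 1903.",
    "question": "Who won the Nobel Prize in 1903?",
    "answer": "Marie Curie",
}
renamed = rename_entity(original, "Marie Curie", "Zorblax Quint")
print(renamed["passage"])
print(renamed["answer"])
```

A model that answers the original example correctly but fails on the renamed copy is relying on what it knows about the name rather than on the passage, which is the failure mode the study audits.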

preprint 2022 arXiv

Performance of quantum heat engines via adiabatic deformation of potential

We present a quantum Otto engine model consisting of two isochoric and two adiabatic strokes, where the adiabatic expansion or compression is realized by adiabatically changing the shape of the potential. Here we show that such an adiabatic deformation may alter operation mode and enhance machine performance by increasing output work and efficiency, even with the advantage of decreasing work fluctuations. If the heat engine operates under maximal power by optimizing the control parameter, the efficiency shows certain universal behavior.

preprint 2022 arXiv

Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting

Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment. This problem will be more challenging if KWS models are further required for edge devices due to their limited memory. To alleviate such an issue, we propose a novel diversity-aware incremental learning method named Rainbow Keywords (RK). Specifically, the proposed RK approach introduces a diversity-aware sampler to select a diverse set from historical and incoming keywords by calculating classification uncertainty. As a result, the RK approach can incrementally learn new tasks without forgetting prior knowledge. Besides, the RK approach also proposes data augmentation and knowledge distillation loss function for efficient memory management on the edge device. Experimental results show that the proposed RK approach achieves 4.2% absolute improvement in terms of average accuracy over the best baseline on Google Speech Command dataset with less required memory. The scripts are available on GitHub.

preprint 2022 arXiv

Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness

It is critical for a keyword spotting model to have a small footprint as it typically runs on-device with low computational resources. However, maintaining the previous SOTA performance with reduced model size is challenging. In addition, a far-field and noisy environment with interference from multiple signals aggravates the problem, causing the accuracy to degrade significantly. In this paper, we present a multi-channel ConvMixer for speech command recognition. The novel architecture introduces an additional audio channel mixing for channel audio interaction in a multi-channel audio setting to achieve better noise-robust features with more efficient computation. Besides, we propose a centroid based awareness component to enhance the system by equipping it with additional spatial geometry information in the latent feature projection space. We evaluate our model using the new MISP challenge 2021 dataset. Our model achieves significant improvement against the official baseline with a 55% gain in the competition score (0.152) on raw microphone array input and a 63% (0.126) boost upon front-end speech enhancement.

preprint 2022 arXiv

Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of "templates", rendered images of the CAD models for the new objects. In contrast with the state-of-the-art methods, the new objects on which our method is applied can be very different from the training objects. As a result, we are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets. Our analysis of the failure modes of previous template-based approaches further confirms the benefits of local features for template matching. We outperform the state-of-the-art template matching methods on the LINEMOD, Occlusion-LINEMOD and T-LESS datasets. Our source code and data are publicly available at https://github.com/nv-nguyen/template-pose

preprint 2021 arXiv

Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources

Sentiment analysis of user-generated reviews or comments on products and services in social networks can help enterprises to analyze the feedback from customers and take corresponding actions for improvement. To mitigate large-scale annotations on the target domain, domain adaptation (DA) provides an alternate solution by learning a transferable model from other labeled source domains. Existing multi-source domain adaptation (MDA) methods either fail to extract some discriminative features in the target domain that are related to sentiment, neglect the correlations of different sources and the distribution difference among different sub-domains even in the same source, or cannot reflect the varying optimal weighting during different training stages. In this paper, we propose a novel instance-level MDA framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN), to address the above issues. Specifically, C-CycleGAN consists of three components: (1) pre-trained text encoder which encodes textual input from different domains into a continuous representation space, (2) intermediate domain generator with curriculum instance-level adaptation which bridges the gap across source and target domains, and (3) task classifier trained on the intermediate domain for final sentiment classification. C-CycleGAN transfers source samples at instance-level to an intermediate domain that is closer to the target domain with sentiment semantics preserved and without losing discriminative features. Further, our dynamic instance-level weighting mechanisms can assign the optimal weights to different source samples in each training stage. We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art DA approaches. Our source code is released at: https://github.com/WArushrush/Curriculum-CycleGAN.

preprint 2021 arXiv

Decentralized Spectrum Access System: Vision, Challenges, and a Blockchain Solution

Spectrum access system (SAS) is widely considered the de facto solution to coordinating dynamic spectrum sharing (DSS) and protecting incumbent users. The current SAS paradigm prescribed by the FCC for the CBRS band and standardized by the WInnForum follows a centralized service model in that a spectrum user subscribes to a SAS server for spectrum allocation service. This model, however, neither tolerates SAS server failures (crash or Byzantine) nor resists dishonest SAS administrators, leading to serious concerns on SAS system reliability and trustworthiness. This is especially concerning for the evolving DSS landscape where an increasing number of SAS service providers and heterogeneous user requirements are coming up. To address these challenges, we propose a novel blockchain-based decentralized SAS architecture called BD-SAS that provides SAS services securely and efficiently, without relying on the trust of each individual SAS server for the overall system trustworthiness. In BD-SAS, a global blockchain (G-Chain) is used for spectrum regulatory compliance while smart contract-enabled local blockchains (L-Chains) are instantiated in individual spectrum zones for automating spectrum access assignment per user request. We hope our vision of a decentralized SAS, the BD-SAS architecture, and discussion on future challenges can open up a new direction towards reliable spectrum management in a decentralized manner.

preprint 2021 arXiv

Partial FC: Training 10 Million Identities on a Single Machine

Face recognition has long been an active and vital topic in the computer vision community. Previous research mainly focuses on loss functions for the facial feature extraction network, among which improvements to softmax-based loss functions have greatly promoted the performance of face recognition. However, the contradiction between the drastically increasing number of face identities and the shortage of GPU memory is gradually becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training massive identities. We find that the importance of negative classes in the softmax function in face representation learning is not as high as we previously thought. The experiment demonstrates no loss of accuracy when training with only 10% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks. We also implement a very efficient distributed sampling algorithm, taking into account model accuracy and training efficiency, which uses only eight NVIDIA RTX2080Ti to complete classification tasks with tens of millions of identities. The code of this paper has been made available at https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc.
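
The idea of training with only a sampled fraction of classes can be sketched as follows. The sketch assumes that every class present in the current batch is kept and that the remaining slots are filled with randomly chosen negative classes, and that logits are then computed only for that subset. It is a single-machine toy in plain Python, not the distributed implementation in the linked repository, and all function names and parameters are illustrative.

```python
import math
import random


def sample_class_subset(num_classes, batch_labels, sample_rate=0.1, seed=0):
    """Keep all classes seen in the batch, then fill with random negatives."""
    rng = random.Random(seed)
    positives = sorted(set(batch_labels))
    target_size = max(int(num_classes * sample_rate), len(positives))
    positive_set = set(positives)
    negatives = [c for c in range(num_classes) if c not in positive_set]
    sampled = positives + rng.sample(negatives, target_size - len(positives))
    return sorted(sampled)


def partial_softmax_loss(feature, class_centers, label, sampled_classes):
    """Cross-entropy over the sampled classes only.

    Dot products are computed just for the sampled class centers, so cost
    and memory scale with the subset rather than with all identities.
    The label must be one of the sampled classes.
    """
    logits = {c: sum(f * w for f, w in zip(feature, class_centers[c]))
              for c in sampled_classes}
    peak = max(logits.values())
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return log_sum - logits[label]


# Hypothetical setup: 100 identities, 2-D features, a batch with labels 3 and 7.
centers = [[float(c % 5), 1.0] for c in range(100)]
subset = sample_class_subset(100, [3, 7, 7], sample_rate=0.1)
print(len(subset), 3 in subset, 7 in subset)
print(partial_softmax_loss([0.5, -0.2], centers, 3, subset))
```

Keeping the positive classes guarantees that each sample's true label is always in the denominator, so only the set of competing negatives is approximated.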

preprint 2020 arXiv

$Λ_c N$ interaction in leading order covariant chiral effective field theory

We study the $Λ_c N$ interaction in the covariant chiral effective field theory (ChEFT) at leading order. All the relevant low-energy constants are determined by fitting to the lattice QCD simulations from the HAL QCD Collaboration. Extrapolating the results to the physical point, we show that the $Λ_c N$ interaction is weakly attractive in the $^1S_0$ channel, but in the $^3S_1$ channel, it is only attractive at extremely low energies and soon turns repulsive for larger laboratory energy. Furthermore, we show that the neglect of the $^3S_1-{}^3D_1$ coupling provided by the leading order covariant ChEFT would result in an attractive interaction in the $^3S_1$ channel at the physical point, which coincides with the previous non-relativistic ChEFT study. As a byproduct, we predict the $^3D_1$ phase shifts and the mixing angle $\varepsilon_1$, which can be checked by future lattice QCD simulations. In addition, we compare the $Λ_c N$ interaction with the $ΛN$ and $NN$ interactions to study how the baryon-nucleon ($BN$) interactions evolve as a function of the baryon mass with the replacement of a light quark by a strange or charm quark in the baryon ($B$).

preprint 2020 arXiv

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

To facilitate depth-based 3D action recognition, 3D dynamic voxel (3DV) is proposed as a novel 3D motion representation. With 3D space voxelization, the key idea of 3DV is to encode 3D motion information within depth video into a regular voxel set (i.e., 3DV) compactly, via temporal rank pooling. Each available 3DV voxel intrinsically involves 3D spatial and motion features jointly. 3DV is then abstracted as a point set and input into PointNet++ for 3D action recognition in an end-to-end manner. The intuition for transferring 3DV into the point set form is that PointNet++ is lightweight and effective for deep feature learning on point sets. Since 3DV may lose appearance clues, a multi-stream 3D action recognition approach is also proposed to learn motion and appearance features jointly. To extract richer temporal order information of actions, we also divide the depth video into temporal splits and encode this procedure in 3DV integrally. Extensive experiments on 4 well-established benchmark datasets demonstrate the superiority of our method. Impressively, we achieve accuracies of 82.4% and 93.5% on NTU RGB+D 120 [13] with the cross-subject and cross-setup test settings respectively. 3DV's code is available at https://github.com/3huo/3DV-Action.

preprint 2020 arXiv

A Survey of Distributed Consensus Protocols for Blockchain Networks

Since the inception of Bitcoin, cryptocurrencies and the underlying blockchain technology have attracted an increasing interest from both academia and industry. Among various core components, consensus protocol is the defining technology behind the security and performance of blockchain. From incremental modifications of Nakamoto consensus protocol to innovative alternative consensus mechanisms, many consensus protocols have been proposed to improve the performance of the blockchain network itself or to accommodate other specific application needs. In this survey, we present a comprehensive review and analysis on the state-of-the-art blockchain consensus protocols. To facilitate the discussion of our analysis, we first introduce the key definitions and relevant results in the classic theory of fault tolerance which help to lay the foundation for further discussion. We identify five core components of a blockchain consensus protocol, namely, block proposal, block validation, information propagation, block finalization, and incentive mechanism. A wide spectrum of blockchain consensus protocols are then carefully reviewed accompanied by algorithmic abstractions and vulnerability analyses. The surveyed consensus protocols are analyzed using the five-component framework and compared with respect to different performance metrics. These analyses and comparisons provide us new insights in the fundamental differences of various proposals in terms of their suitable application scenarios, key assumptions, expected fault tolerance, scalability, drawbacks and trade-offs. We believe this survey will provide blockchain developers and researchers a comprehensive view on the state-of-the-art consensus protocols and facilitate the process of designing future protocols.

preprint2020arXiv

Characterizing covers via simple closed curves

Given two finite covers $p: X \to S$ and $q: Y \to S$ of a connected, oriented, closed surface $S$ of genus at least $2$, we attempt to characterize the equivalence of $p$ and $q$ in terms of which curves lift to simple curves. Using Teichmüller theory and the complex of curves, we show that two regular covers $p$ and $q$ are equivalent if for any closed curve $\gamma \subset S$, $\gamma$ lifts to a simple closed curve on $X$ if and only if it does so on $Y$. When the covers are abelian, we also give a characterization of equivalence in terms of which powers of simple closed curves lift to closed curves.

preprint2020arXiv

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CAD) methods on the automatic diagnosis of lung cancer. ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from $0.7354 \pm 0.1149$ to $0.8372 \pm 0.0858$. The DC of the best method was close to the inter-observer agreement ($0.8398 \pm 0.0890$). All methods were based on deep learning and fell into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better ($p < 0.01$) than single-model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
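For readers unfamiliar with the DICE coefficient named above, the following is a minimal sketch of the standard definition, DC = 2|A ∩ B| / (|A| + |B|), applied to two binary masks. It is not the challenge's evaluation code: the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient of two equally shaped binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not the ACDC@LungHP evaluation code.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    overlap = 0      # pixels marked positive in both masks
    pred_total = 0   # pixels marked positive in the prediction
    truth_total = 0  # pixels marked positive in the reference annotation
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            pred_total += 1 if p else 0
            truth_total += 1 if t else 0
            overlap += 1 if (p and t) else 0

    denominator = pred_total + truth_total
    if denominator == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / denominator


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(dice_coefficient(predicted, reference))
```

In this toy case the masks share 2 positive pixels and each contains 3, so the score is 4/6. A score of 1 means the two masks coincide, and 0 means they do not overlap at all.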

preprint2020arXiv

ECML: An Ensemble Cascade Metric Learning Mechanism towards Face Verification

Face verification can be regarded as a 2-class fine-grained visual recognition problem. Enhancing the discriminative power of the features is one of the keys to improving its performance. Metric learning is often applied to address this need, and achieving a good tradeoff between underfitting and overfitting plays a vital role in metric learning. Hence, we propose a novel ensemble cascade metric learning (ECML) mechanism. In particular, hierarchical metric learning is executed in a cascade to alleviate underfitting. Meanwhile, at each learning level, the features are split into non-overlapping groups. Metric learning is then executed among the feature groups in an ensemble manner to resist overfitting. Considering the feature distribution characteristics of faces, a robust Mahalanobis metric learning method (RMML) with a closed-form solution is additionally proposed. It can avoid the matrix inversion failure faced by some well-known metric learning approaches (e.g., KISSME). By embedding RMML into the proposed ECML mechanism, our metric learning paradigm (EC-RMML) can run in a one-pass learning manner. Experimental results demonstrate that EC-RMML is superior to state-of-the-art metric learning methods for face verification. The proposed ensemble cascade metric learning mechanism is also applicable to other metric learning approaches.

preprint2020arXiv

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, even though we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. In addition, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient.

preprint2020arXiv

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

We study how well different types of approaches generalise in the task of 3D hand pose estimation under single-hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More precisely, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, in the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.

preprint2020arXiv

Modeling the Impact of Network Connectivity on Consensus Security of Proof-of-Work Blockchain

Blockchain, the technology behind the popular Bitcoin, is considered a "security by design" system, as it is meant to create security among a group of distrustful parties without a central trusted authority. The security of blockchain relies on the premise of honest-majority, namely, the blockchain system is assumed to be secure as long as the majority of consensus voting power is honest. In the case of proof-of-work (PoW) blockchain, this means adversaries are assumed not to control more than 50% of the network's gross computing power. However, this 50% threshold is based on the analysis of computing power only, with implicit and idealistic assumptions on the network and node behavior. Recent research has suggested that factors such as network connectivity, presence of blockchain forks, and mining strategy could undermine the consensus security assured by the honest-majority, but neither concrete analysis nor quantitative evaluation has been provided. In this paper we fill the gap by proposing an analytical model to assess the impact of network connectivity on the consensus security of PoW blockchain under different adversary models. We apply our analytical model to two adversarial scenarios: 1) honest-but-potentially-colluding, 2) selfish mining. For each scenario, we quantify the communication capability of nodes involved in a fork race and estimate the adversary's mining revenue and its impact on security properties of the consensus protocol. Simulation results validate our analysis. Our modeling and analysis provide a paradigm for assessing the security impact of various factors in a distributed consensus system.

preprint2020arXiv

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Towards 3D object tracking in point clouds, a novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in a 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in the template and search area respectively. Then, we execute permutation-invariant feature augmentation to embed target clues from the template into search area seeds and represent them with target-specific features. Consequently, the augmented search area seeds regress the potential target centers via Hough voting. The centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage the ensemble power for joint 3D target proposal and verification. We apply PointNet++ as our backbone, and experiments on the KITTI tracking dataset demonstrate P2B's superiority (~10% improvement over the state of the art). Note that P2B can run at 40 FPS on a single NVIDIA 1080Ti GPU. Our code and model are available at https://github.com/HaozheQi/P2B.

preprint2020arXiv

Pixel-Pair Occlusion Relationship Map(P2ORM): Formulation, Inference & Application

We formalize concepts around geometric occlusion in 2D images (i.e., ignoring semantics), and propose a novel unified formulation of both occlusion boundaries and occlusion orientations via a pixel-pair occlusion relation. The former provides a way to generate large-scale accurate occlusion datasets while, based on the latter, we propose a novel method for task-independent pixel-level occlusion relationship estimation from single images. Experiments on a variety of datasets demonstrate that our method outperforms existing ones on this task. To further illustrate the value of our formulation, we also propose a new depth map refinement method that consistently improves the performance of state-of-the-art monocular depth estimation methods. Our code and data are available at http://imagine.enpc.fr/~qiux/P2ORM/.

preprint2020arXiv

PrivacyGuard: Enforcing Private Data Usage Control with Blockchain and Attested Off-chain Contract Execution

The abundance and rich variety of data are enabling many transformative applications of big data analytics that have profound societal impacts. However, there are also increasing concerns regarding the improper use of individual data owners' private data. In this paper, we propose PrivacyGuard, a system that leverages blockchain smart contracts and a trusted execution environment (TEE) to enable individuals' control over the access and usage of their private data. Smart contracts are used to specify the data usage policy, i.e., who can use what data under which conditions and what analytics to perform, while the distributed blockchain ledger is used to keep an irreversible and non-repudiable data usage record. To address the efficiency problem of on-chain contract execution and to prevent exposing private data on the publicly viewable blockchain, PrivacyGuard incorporates a novel TEE-based off-chain contract execution engine along with a protocol to securely commit the execution result onto the blockchain. We have built and deployed a prototype of PrivacyGuard with Ethereum and Intel SGX. Our experimental results demonstrate that PrivacyGuard fulfills the promised privacy goal and supports analytics on data from a considerable number of data owners.

preprint2020arXiv

Towards Cognitive Routing based on Deep Reinforcement Learning

Routing is one of the key functions for stable operation of network infrastructure. Nowadays, the rapid growth of network traffic volume and changing service requirements call for more intelligent routing methods than before. Towards this end, we propose a definition of cognitive routing and an implementation approach based on Deep Reinforcement Learning (DRL). To facilitate research on DRL-based cognitive routing, we introduce a simulator named RL4Net for DRL-based routing algorithm development and simulation. Then, we design and implement a DDPG-based routing algorithm. The simulation results on an example network topology show that the DDPG-based routing algorithm achieves better performance than OSPF and random weight algorithms. This demonstrates the preliminary feasibility and potential advantage of cognitive routing for future networks.

preprint2018arXiv

Combinatorics of $k$-Farey graphs

With an eye towards studying curve systems on low-complexity surfaces, we introduce and analyze the $k$-Farey graphs $\mathcal{F}_k$ and $\mathcal{F}_{\leqslant k}$, two natural variants of the Farey graph in which we relax the edge condition to indicate intersection number $=k$ or $\le k$, respectively. The former, $\mathcal{F}_k$, is disconnected when $k>1$. In fact, we find that the number of connected components is infinite if and only if $k$ is not a prime power. Moreover, we find that each component of $\mathcal{F}_k$ is an infinite-valence tree whenever $k$ is even, and $\mathrm{Aut}(\mathcal{F}_k)$ is uncountable for $k>1$. As for $\mathcal{F}_{\leqslant k}$, Agol obtained an upper bound of $1+\min\{p:p\text{ is a prime}>k\}$ for both chromatic and clique numbers, and observed that this is an equality when $k$ is either one or two less than a prime. We add to this list the values of $k$ that are three less than a prime equivalent to $11\ (\mathrm{mod}\ 12)$, and we show computer-assisted computations of many values of $k$ for which equality fails.
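The edge conditions described above can be sketched concretely under one common model, stated here as an assumption rather than taken from the abstract: vertices are reduced fractions $p/q$ (with $1/0$ standing for infinity), and the intersection number of $p/q$ and $r/s$ is $|ps - qr|$, which recovers the usual Farey graph when $k = 1$. The function names below are hypothetical, and the requirement that an edge of $\mathcal{F}_{\leqslant k}$ joins distinct vertices (intersection number at least 1) is likewise an assumption.

```python
from math import gcd


def intersection_number(a, b):
    """|ps - qr| for fractions a = (p, q) and b = (r, s); assumed model."""
    (p, q), (r, s) = a, b
    return abs(p * s - q * r)


def is_reduced(v):
    """True when (p, q) is in lowest terms; (1, 0) represents infinity."""
    p, q = v
    return gcd(p, q) == 1


def adjacent_in_F_k(a, b, k):
    """Edge condition of F_k: intersection number exactly k."""
    return is_reduced(a) and is_reduced(b) and intersection_number(a, b) == k


def adjacent_in_F_le_k(a, b, k):
    """Edge condition of F_{<=k}: intersection number between 1 and k.

    The lower bound 1 excludes a vertex being adjacent to itself (assumption).
    """
    return is_reduced(a) and is_reduced(b) and 1 <= intersection_number(a, b) <= k


if __name__ == "__main__":
    zero, infinity = (0, 1), (1, 0)
    print(adjacent_in_F_k(zero, infinity, 1))   # classical Farey edge
    print(adjacent_in_F_k(zero, (2, 1), 2))     # edge of F_2
    print(adjacent_in_F_le_k(zero, (2, 1), 1))  # too far apart for k = 1
```

With this model, the neighbors of $0/1$ in $\mathcal{F}_2$ are exactly the reduced fractions with numerator $\pm 2$, such as $2/1$ and $2/3$.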

preprint2015arXiv

Non-archimedean connected Julia sets with branching

We construct the first examples of rational functions defined over a non-archimedean field with certain dynamical properties. In particular, we find such functions whose Julia sets, in the Berkovich projective line, are connected but not contained in a line segment. We also show how to compute the measure-theoretic and topological entropy of such maps. In particular, we show for some of our examples that the measure-theoretic entropy is strictly smaller than the topological entropy, thus answering a question of Favre and Rivera-Letelier.

38 published item(s)

Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation

MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation

Environmental Sound Deepfake Detection Challenge: An Overview

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

Implications of Nano-Hertz Gravitational Waves on Electroweak Phase Transition in the Singlet Dark Matter Model

Accurate relativistic chiral nucleon-nucleon interaction up to NNLO

An accurate relativistic chiral nucleon-nucleon interaction up to the next-to-next-to-leading order

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

Can electron and muon $g-2$ anomalies be jointly explained in SUSY?

Continual Learning For On-Device Environmental Sound Classification

DataLab: A Platform for Data Analysis and Intervention

Finite-time quantum Otto engine with a squeezed thermal bath: Role of quantum coherence and squeezing in the performance and fluctuations

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling

Nonperturbative two-pion exchange contributions to the nucleon-nucleon interaction in covariant baryon chiral perturbation theory

Nucleon-nucleon interaction in the $^3S_1$-$^3D_1$ coupled channel for a pion mass of 469 MeV

On the Robustness of Reading Comprehension Models to Entity Renaming

Performance of quantum heat engines via adiabatic deformation of potential

Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting

Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness

Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources

Decentralized Spectrum Access System: Vision, Challenges, and a Blockchain Solution

Partial FC: Training 10 Million Identities on a Single Machine

$Λ_c N$ interaction in leading order covariant chiral effective field theory

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

A Survey of Distributed Consensus Protocols for Blockchain Networks

Characterizing covers via simple closed curves

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

ECML: An Ensemble Cascade Metric Learning Mechanism towards Face Verification

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Modeling the Impact of Network Connectivity on Consensus Security of Proof-of-Work Blockchain

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Pixel-Pair Occlusion Relationship Map(P2ORM): Formulation, Inference & Application

PrivacyGuard: Enforcing Private Data Usage Control with Blockchain and Attested Off-chain Contract Execution

Towards Cognitive Routing based on Deep Reinforcement Learning

Combinatorics of $k$-Farey graphs

Non-archimedean connected Julia sets with branching