Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
135 works
0 followers
52 topics
4 close collaborators

Published work

135 published item(s)

preprint 2026 arXiv

HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decoupling framework for heterogeneous graph learning under heterophily. HeterSEED decouples representation learning into a heterogeneous semantic channel that captures type- and relation-aware local semantics and a structure-aware heterophily channel that separates homophilic and heterophilic neighborhoods via pseudo-label-guided partitioning and aggregates them using metapath-based structural weights. A node-level adaptive fusion mechanism then combines the two channels to produce context-dependent node representations. Theoretically, we establish that, on heterogeneous graphs under heterophily, HeterSEED is strictly more expressive than standard heterogeneous graph neural networks that rely primarily on feature similarity and provably reduces the prediction bias introduced by heterophilic neighbors. Experiments on five real-world heterogeneous graphs, including two large-scale networks at the million-node and hundred-million-edge scale, demonstrate that HeterSEED consistently outperforms representative heterogeneous graph neural networks and recent heterophily-aware baselines, especially in strongly heterophilic regimes.

preprint 2026 arXiv

The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

Multi-agent systems (MAS) assume that collaborating inherently improves Large Language Model (LLM) reasoning. We challenge this by demonstrating that simulated social pressure triggers an algorithmic "Bystander Effect," inducing severe cognitive loafing. By evaluating 22,500 deterministic trajectories across 3 dataset contexts (GAIA, SWE-bench, Multi-Challenge) with 3 state-of-the-art (SOTA) models, we semantically audit internal reasoning traces against external outputs. We formalize the Interaction Depth Limit ($D_L$), the exact plurality threshold where an agent's logical sovereignty collapses into social compliance. Crucially, we uncover the Sovereignty Gap: models frequently compute the correct derivation internally but suffer "Alignment Hallucinations", actively subjugating empirical evidence to sycophantically appease a simulated swarm. We prove that multi-agent social load is strictly non-commutative; the "brand" identity of the "Lead Anchor" auditor disproportionately dictates the swarm's integrity. These findings expose architectural vulnerabilities, proving that unstructured multi-agent topologies can degrade independent reasoning.

preprint 2026 arXiv

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the "Wisdom of the Crowd". We challenge this assumption by formalizing the Consensus Paradox: a phenomenon where agentic swarms prioritize internal architectural agreement over external logical truth. Through 36 experiments encompassing 12,804 trajectories across three state-of-the-art (SOTA) benchmarks (GAIA, Multi-Challenge, and SWE-bench), we prove the Inverse-Wisdom Law: in kinship-dominant swarms, adding logical agents increases the stability of erroneous trajectories rather than the probability of truth. The introduction of additional logical audits converges the system toward a Logic Saturation where internal entropy hits zero while factual error hits unity. By evaluating the interaction between the 3 preeminent SOTA models (Gemini 3.1 Pro, Claude Sonnet 4.6, and GPT-5.4), we establish the Architectural Tribalism Asymmetry as a mechanistic law of transformer weights. We demonstrate that terminal swarm integrity is strictly gated by the synthesizer's receptive logic, rather than aggregate agent quality. We define the Tribalism Coefficient and the Sycophantic Weight as the primary mechanistic determinants of swarm failure. Finally, we establish the Heterogeneity Mandate as a foundational safety requirement for resilient agentic architectures.

preprint 2025 arXiv

MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems

Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.

preprint 2024 arXiv

A First-Principle Approach to X-ray Active Optics: Design and Verification

This paper presents the first-principle design approach for X-ray active optics, using the simulation-modulation cycle in place of the measurement-modulation feedback loops used in traditional active optics. Hence, the new active optics have the potential to outperform the accuracy of surface-shape metrology instruments. We apply an X-ray mirror with localized thermal elastic deformation to validate the idea. Both the finite element simulations and surface shape measurements have demonstrated that the active optics modulation accuracy limit can be achieved at the atomic layer level. It is believed that the implementation of the first-principle design strategy has the capacity to revolutionize both the manufacturing processes of X-ray mirrors and the beamline engineering of synchrotron radiation.

preprint 2024 arXiv

A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems

Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite their powerful ability to reconfigure the radio environment, the "double fading" effect limits practical system performance gains because of the significant path loss. A new active RIS architecture has recently been proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model that ignores the practical hardware limitations of amplifiers, and such inaccurate active RIS modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of a typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design that considers the incident signal power on a practical active RIS for a multiuser multi-input single-output (MU-MISO) communication system. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the complex problem. Simulation results indicate the importance of considering practical amplifier hardware characteristics in joint beamforming designs and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.

preprint 2024 arXiv

BiSinger: Bilingual Singing Voice Synthesis

Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at https://bisinger-svs.github.io.

preprint 2024 arXiv

Cooperative Cell-Free ISAC Networks: Joint BS Mode Selection and Beamforming Design

Owing to its promising ability to save hardware cost and spectrum resources, integrated sensing and communication (ISAC) is regarded as a revolutionary technology for future sixth-generation (6G) networks. The mono-static ISAC systems considered in most existing works can only achieve limited sensing performance due to the single observation angle and easily blocked transmission links, which motivates researchers to investigate cooperative ISAC networks. To further improve the degrees of freedom (DoFs) of cooperative ISAC networks, the transmitter-receiver selection problem, i.e., the base station (BS) mode selection problem, is worth studying. However, to the best of our knowledge, this crucial problem has not been extensively studied in existing works. In this paper, we consider the joint BS mode selection, transmit beamforming, and receive filter designs for cooperative cell-free ISAC networks, where multiple BSs cooperatively serve communication users and detect targets. An efficient joint beamforming design algorithm and three different heuristic BS mode selection methods are proposed to solve the non-convex NP-hard problem. Simulation results demonstrate the advantages of cooperative ISAC networks, the importance of BS mode selection, and the effectiveness of the proposed algorithms.

preprint 2024 arXiv

Quadrotor Stabilization with Safety Guarantees: A Universal Formula Approach

Safe stabilization is a significant challenge for quadrotors, which involves reaching a goal position while avoiding obstacles. Most of the existing solutions for this problem rely on optimization-based methods, demanding substantial onboard computational resources. This paper introduces a novel approach to address this issue and provides a solution that offers fast computational capabilities tailored for onboard execution. Drawing inspiration from Sontag's universal formula, we propose an analytical control strategy that incorporates the conditions of control Lyapunov functions (CLFs) and control barrier functions (CBFs), effectively avoiding the need for solving optimization problems onboard. Moreover, we extend our approach by incorporating the concepts of input-to-state stability (ISS) and input-to-state safety (ISSf), enhancing the universal formula's capacity to effectively manage disturbances. Furthermore, we present a projection-based approach to ensure that the universal formula remains effective even when faced with control input constraints. The basic idea of this approach is to project the control input derived from the universal formula onto the closest point within the control input domain. Through comprehensive simulations and experimental results, we validate the efficacy and highlight the advantages of our methodology.
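The projection step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the control input domain is an axis-aligned box (per-channel lower and upper limits); the abstract does not state the shape of the domain, and the name `project_onto_box` and the numeric limits are hypothetical. For a box, the closest point in the Euclidean sense is obtained by clipping each component independently.

```python
import numpy as np


def project_onto_box(u, u_min, u_max):
    """Return the point of the box [u_min, u_max] closest to u.

    Assumption: the admissible control inputs form an axis-aligned box.
    For such a set, the Euclidean projection is component-wise clipping.
    """
    return np.clip(u, u_min, u_max)


# Hypothetical control input (e.g. from an analytical controller) and limits.
u_nominal = np.array([2.0, -3.0, 0.5])
u_safe = project_onto_box(u_nominal, -1.0, 1.0)
print(u_safe)
```

For a non-box domain (for example a general convex polytope), the projection would generally require solving a small optimization problem instead of clipping.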

preprint 2023 arXiv

A Theory of Human-Like Few-Shot Learning

We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult, as how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including the Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.

preprint 2023 arXiv

Deep Learning-Based UAV Aerial Triangulation without Image Control Points

Emerging drone aerial surveys have the advantages of low cost, high efficiency, and flexible use. However, UAVs are often equipped with cheap POS systems and non-measurement cameras, and their flight attitudes are easily affected. Realizing POS-supported large-scale UAV mapping without image control points faces many technical problems. The most basic and important core technology is accurately realizing the absolute orientation of images through advanced aerial triangulation. In traditional aerial triangulation, image matching algorithms are constrained to varying degrees by preset prior knowledge. In recent years, deep learning has developed rapidly in the field of photogrammetric computer vision. It has surpassed the performance of traditional handcrafted features in many aspects and has shown stronger stability in image-based navigation and positioning tasks; in particular, it is more resistant to unfavorable factors such as blur, illumination changes, and geometric distortion. After introducing the key technologies of aerial triangulation without image control points, this paper proposes a new drone image registration method based on deep learning image features to solve the problem of the high mismatch rate in traditional methods. It adopts SuperPoint as the feature detector and uses the superior generalization performance of CNNs to extract precise feature points from the UAV image, thereby achieving high-precision aerial triangulation. Experimental results show that, under the same pre-processing and post-processing conditions, this method achieves suitable precision more efficiently than the traditional method based on the SIFT algorithm, and can meet the requirements of UAV aerial triangulation without image control points in large-scale surveys.

preprint 2023 arXiv

Discriminator-Guided Model-Based Offline Imitation Learning

Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data. Including a learned dynamics model can potentially improve the state-action space coverage of expert data; however, it also faces challenging issues such as model approximation/generalization errors and suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case where demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.

preprint 2023 arXiv

Self-supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond

To learn distinguishable patterns, most recent works in vehicle re-identification (ReID) have struggled to redevelop official benchmarks to provide various forms of supervision, which requires prohibitive human labor. In this paper, we seek to achieve a similar goal without additional human effort. To this end, we introduce a novel framework, which successfully encodes both geometric local features and global representations to distinguish vehicle instances, optimized only by the supervision from official ID labels. Specifically, given our insight that objects in ReID share similar geometric characteristics, we propose to borrow self-supervised representation learning to facilitate geometric feature discovery. To condense these features, we introduce an interpretable attention module, with the core of local maxima aggregation instead of fully automatic learning, whose mechanism is completely understandable and whose response map is physically reasonable. To the best of our knowledge, we are the first to apply self-supervised learning to discover geometric features. We conduct comprehensive experiments on the three most popular datasets for vehicle ReID, i.e., VeRi-776, CityFlow-ReID, and VehicleID. We report state-of-the-art (SOTA) performance and promising visualization results. We also show the excellent scalability of our approach on other ReID-related tasks, i.e., person ReID and multi-target multi-camera (MTMC) vehicle tracking.

preprint 2023 arXiv

Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems

Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which not only enables accurate recovery of target information but also allows a substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. To address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm employing majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternating direction method of multipliers (neADMM) with closed-form solutions is developed. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.

preprint 2022 arXiv

A proof-of-principle demonstration of quantum microwave photonics

With the rapid development of microwave photonics, which has expanded to numerous applications of commercial importance, eliminating the emerging bottlenecks becomes of vital importance. For example, as the main branch of microwave photonics, radio-over-fiber technology provides high bandwidth, low-loss, and long-distance propagation capability, facilitating wide applications ranging from telecommunication to wireless networks. With ultrashort pulses as the optical carrier, huge capacity is further endowed. However, the wide bandwidth of ultrashort pulses results in the severe vulnerability of high-frequency RF signals to fiber dispersion. With a time-energy entangled biphoton source as the optical carrier and combined with the single-photon detection technique, a quantum microwave photonics method is proposed and demonstrated experimentally. The results show that it not only realizes unprecedented nonlocal RF signal modulation with strong resistance to the dispersion associated with ultrashort pulse carriers but provides an alternative mechanism to effectively distill the RF signal out from the dispersion. Furthermore, the spurious-free dynamic range of both the nonlocally modulated and distilled RF signals has been significantly improved. With the ultra-weak detection and high-speed processing advantages endowed by the low-timing-jitter single-photon detection, the quantum microwave photonics method opens up new possibilities in modern communication and networks.

preprint 2022 arXiv

A Schur type lemma for the Mean Berwald curvature in Finsler geometry

In this short paper, we study a symmetric covariant tensor in Finsler geometry, which is called the mean Berwald curvature. We first investigate the geometry of the fibres as the submanifolds of the tangent sphere bundle on a Finsler manifold. Then we prove that if the mean Berwald curvature is isotropic along fibres, then the Berwald scalar curvature is constant along fibres.

preprint 2022 arXiv

About One-point Statistics of the Ratio of Two Fourier-transformed Cosmic Fields and an Application

The Fourier transformation is an effective and efficient operation of Gaussianization at the one-point level. Using a set of N-body simulation data, we verified that the one-point distribution functions of the dark matter momentum divergence and density fields closely follow complex Gaussian distributions. The one-point distribution function of the quotient of two complex Gaussian variables is introduced and studied. Statistical theories are then applied to model one-point statistics of the growth of individual Fourier modes of the dark matter density field, which can be obtained from the ratio of two Fourier-transformed cosmic fields. Our simulation results proved that the models based on the Gaussian approximation are impressively accurate, and our analysis revealed many interesting aspects of the growth of the dark matter density fluctuation in Fourier space.

preprint 2022 arXiv

CaFT: Clustering and Filter on Tokens of Transformer for Weakly Supervised Object Localization

Weakly supervised object localization (WSOL) is the challenging task of localizing an object using only category labels. However, there is a contradiction between classification and localization, because an accurate classification network tends to pay attention to the discriminative regions of objects rather than their entirety. We propose that this discrimination is caused by handcrafted threshold selection in CAM-based methods. Therefore, we propose Clustering and Filter of Tokens (CaFT) with a Vision Transformer (ViT) backbone to solve this problem in another way. CaFT first sends the patch tokens of the split image to the ViT and clusters the output tokens to generate an initial mask of the object. Second, CaFT treats the initial mask as pseudo labels to train a shallow convolution head (Attention Filter, AtF) following the backbone to directly extract the mask from tokens. Then, CaFT splits the image into parts, outputs a mask for each, and merges them into one refined mask. Finally, a new AtF is trained on the refined masks and used to predict the box of the object. Experiments verify that CaFT outperforms previous work and achieves 97.55% and 69.86% localization accuracy with ground-truth class on CUB-200 and ImageNet-1K, respectively. CaFT provides a fresh way to think about the WSOL task.

preprint 2022 arXiv

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations. It is challenging to enrich a KG with newly harvested triples while maintaining the quality of the knowledge representation. This paper proposes a system to refine a KG using information harvested from an additional corpus. To this end, we formulate our task as two coupled sub-tasks, namely join event extraction (JEE) and knowledge graph fusion (KGF). We then propose a Collaborative Knowledge Graph Fusion Framework to allow our sub-tasks to mutually assist one another in an alternating manner. More concretely, the explorer carries out the JEE supervised by both the ground-truth annotation and an existing KG provided by the supervisor. The supervisor then evaluates the triples extracted by the explorer and enriches the KG with those that are highly ranked. To implement this evaluation, we further propose a Translated Relation Alignment Scoring Mechanism to align and translate the extracted triples to the prior KG. Experiments verify that this collaboration can improve the performance of both the JEE and the KGF.

preprint 2022 arXiv

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) due to insufficient relevant data. In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not contain speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker age from the associated visual data, then label the audio recording with the estimated age. We construct multiple Cross-Age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with a large age gap. The effect of nationality and gender is also considered in selecting negative pairs to align with Vox-H cases. The baseline system performance drops from 1.939% EER on the Vox-H test set to 10.419% on the Vox-CA20 test set, which indicates how difficult the cross-age scenario is. Consequently, we propose an age-decoupling adversarial learning (ADAL) method to alleviate the negative effect of the age gap and reduce intra-class variance. Our method outperforms the baseline system by over 10% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age_Speaker_Verification

preprint 2022 arXiv

Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge

In this paper, we present the speaker diarization system for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) from team DKU_DukeECE. As highly overlapped speech exists in the dataset, we employ an x-vector-based target-speaker voice activity detection (TS-VAD) to find the overlap between speakers. For the single-channel scenario, we separately train a model for each of the 8 channels and fuse the results. We also employ cross-channel self-attention to further improve the performance, where the non-linear spatial correlations between different channels are learned and fused. Experimental results on the evaluation set show that the single-channel TS-VAD reduces the DER by over 75%, from 12.68% to 3.14%. The multi-channel TS-VAD further reduces the DER by 28% and achieves a DER of 2.26%. Our final submitted system achieves a DER of 2.98% on the AliMeeting test set, which ranks 1st in the M2MeT challenge.
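The percentage reductions quoted in this abstract are relative reductions in diarization error rate (DER). As a quick arithmetic check, the sketch below recomputes them from the DER values stated above; the helper name `relative_reduction` is illustrative and not taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# DER values (in percent) as quoted in the abstract.
single_channel = relative_reduction(12.68, 3.14)  # baseline -> single-channel TS-VAD
multi_channel = relative_reduction(3.14, 2.26)    # single-channel -> multi-channel TS-VAD

print(f"single-channel: {single_channel:.1f}%")  # about 75.2%
print(f"multi-channel:  {multi_channel:.1f}%")   # about 28.0%
```

Both figures agree with the abstract's "over 75%" and "28%".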

preprint 2022 arXiv

Deep Reinforcement Learning based Joint Active and Passive Beamforming Design for RIS-Assisted MISO Systems

Owing to its unique advantages of low cost and controllability, the reconfigurable intelligent surface (RIS) is a promising candidate to address the blockage issue in millimeter wave (mmWave) communication systems and has consequently attracted widespread attention in recent years. However, the joint active and passive beamforming design is an arduous task due to the high computational complexity and the dynamic changes of the wireless environment. In this paper, we consider a RIS-assisted multi-user multiple-input single-output (MU-MISO) mmWave system and aim to develop a deep reinforcement learning (DRL) based algorithm to jointly design the active hybrid beamformer at the base station (BS) side and the passive beamformer at the RIS side. By employing an advanced soft actor-critic (SAC) algorithm, we propose a maximum entropy based DRL algorithm, which can explore more stochastic policies than a deterministic policy, to design the active analog precoder and passive beamformer simultaneously. Then, the digital precoder is determined by the minimum mean square error (MMSE) method. The experimental results demonstrate that our proposed SAC algorithm can achieve better performance than the conventional optimization algorithm and DRL algorithm.

preprint 2022 arXiv

Embedding Graphs on Grassmann Manifold

Learning efficient graph representations is the key to favorably addressing downstream tasks on graphs, such as node or graph property prediction. Given the non-Euclidean structural property of graphs, preserving the original graph data's similarity relationship in the embedded space needs specific tools and a similarity metric. This paper develops a new graph representation learning scheme, namely EGG, which embeds approximated second-order graph characteristics into a Grassmann manifold. The proposed strategy leverages graph convolutions to learn hidden representations of the corresponding subspace of the graph, which is then mapped to a Grassmann point of a low-dimensional manifold through truncated singular value decomposition (SVD). The established graph embedding approximates the denoised correlation of node attributes, as implemented in the form of a symmetric matrix space for Euclidean calculation. The effectiveness of EGG is demonstrated using both clustering and classification tasks at the node level and graph level. It outperforms baseline models on various benchmarks.

preprint 2022 arXiv

Explainable COVID-19 Infections Identification and Delineation Using Calibrated Pseudo Labels

The upheaval brought by the arrival of the COVID-19 pandemic has continued to bring fresh challenges over the past two years. During this COVID-19 pandemic, there has been a need for rapid identification of infected patients and specific delineation of infection areas in computed tomography (CT) images. Although deep supervised learning methods have been established quickly, the scarcity of both image-level and pixel-level labels as well as the lack of explainable transparency still hinder the applicability of AI. Can we identify infected patients and delineate the infections with extreme minimal supervision? Semi-supervised learning has demonstrated promising performance under limited labelled data and sufficient unlabelled data. Inspired by semi-supervised learning, we propose a model-agnostic calibrated pseudo-labelling strategy and apply it under a consistency regularization framework to generate explainable identification and delineation results. We demonstrate the effectiveness of our model with the combination of limited labelled data and sufficient unlabelled data or weakly-labelled data. Extensive experiments have shown that our model can efficiently utilize limited labelled data and provide explainable classification and segmentation results for decision-making in clinical routine. The code is available at https://github.com/ayanglab/XAI COVID-19.

preprint2022arXiv

Fully-integrated multipurpose microwave frequency identification system on a single chip

We demonstrate a fully-integrated multipurpose microwave frequency identification system on silicon-on-insulator platform. Thanks to its multipurpose features, the chip is able to identify different types of microwave signals, including single-frequency, multiple-frequency, chirped and frequency-hopping microwave signals, as well as discriminate instantaneous frequency variation among the frequency-modulated signals. This demonstration exhibits fully integrated solution and fully functional microwave frequency identification, which can meet the requirements in reduction of size, weight and power for future advanced microwave photonic processor.

preprint2022arXiv

Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words

Wake-up word detection models are widely used in real life, but suffer from severe performance degradation when encountering adversarial samples. In this paper we discuss the concept of confusing words in adversarial samples. Confusing words are commonly encountered, which are various kinds of words that sound similar to the predefined keywords. To enhance the wake word detection system's robustness against confusing words, we propose several methods to generate the adversarial confusing samples for simulating real confusing words scenarios in which we usually do not have any real confusing samples in the training set. The generated samples include concatenated audio, synthesized data, and partially masked keywords. Moreover, we use a domain embedding concatenated system to improve the performance. Experimental results show that the adversarial samples generated in our approach help improve the system's robustness in both the common scenario and the confusing words scenario. In addition, we release the confusing words testing database called HI-MIA-CW for future research.

preprint2022arXiv

Integrated Sensing and Communication with Reconfigurable Intelligent Surfaces: Opportunities, Applications, and Future Directions

Integrated sensing and communication (ISAC) is emerging as a key enabler to address the growing spectrum congestion problem and satisfy increasing demands for ubiquitous sensing and communication. By sharing various resources and information, ISAC achieves much higher spectral, energy, hardware, and economic efficiencies. Concurrently, reconfigurable intelligent surface (RIS) technology has been deemed as a promising approach due to its capability of intelligently manipulating the wireless propagation environment in an energy and hardware efficient manner. In this article, we analyze the potential of deploying RIS to improve communication and sensing performance in ISAC systems. We first describe the fundamentals of RIS and its applications in traditional communication and sensing systems, then introduce the principles of ISAC and overview existing explorations on RIS-assisted ISAC, followed by one case study to verify the advantages of deploying RIS in ISAC systems. Finally, open challenges and research directions are discussed to stimulate this line of research and pave the way for practical applications.

preprint2022arXiv

Invertible Voice Conversion

In this paper, we propose an invertible deep learning framework called INVVC for voice conversion. It is designed against the possible threats that inherently come along with voice conversion systems. Specifically, we develop an invertible framework that makes the source identity traceable. The framework is built on a series of invertible $1\times1$ convolutions and flows consisting of affine coupling layers. We apply the proposed framework to one-to-one voice conversion and many-to-one conversion using parallel training data. Experimental results show that this approach yields impressive performance on voice conversion and, moreover, the converted results can be reversed back to the source inputs utilizing the same parameters as in forwarding.
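
The invertibility that the abstract relies on can be illustrated with a single affine coupling layer. The following is a minimal sketch in plain Python, not the INVVC implementation: the conditioner functions `toy_scale` and `toy_shift` are hypothetical stand-ins for learned networks, and the invertible 1x1 convolutions are omitted.

```python
import math

def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: pass the first half through, transform the second half."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    s = scale_fn(xa)  # log-scales computed only from the untouched half
    t = shift_fn(xa)  # shifts computed only from the untouched half
    yb = [b * math.exp(si) + ti for b, si, ti in zip(xb, s, t)]
    return xa + yb

def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: the same conditioners are re-evaluated on the untouched half."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    s = scale_fn(ya)
    t = shift_fn(ya)
    xb = [(b - ti) * math.exp(-si) for b, si, ti in zip(yb, s, t)]
    return ya + xb

# Hypothetical toy conditioners standing in for learned networks.
def toy_scale(xa):
    return [math.tanh(v) for v in xa]

def toy_shift(xa):
    return [0.5 * v for v in xa]

x = [0.3, -1.2, 2.0, 0.7]
y = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(y, toy_scale, toy_shift)
```

Because the scale and shift depend only on the half that passes through unchanged, the inverse needs no extra parameters, which matches the abstract's remark that results can be reversed using the same parameters as in the forward pass.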

preprint2022arXiv

IRS-assisted Multi-cell Multi-band Systems: Practical Reflection Model and Joint Beamforming Design

Intelligent reflecting surface (IRS) has been regarded as a promising and revolutionary technology for future wireless communication systems owing to its capability of tailoring signal propagation environment in an energy/spectrum/hardware-efficient manner. However, most existing studies on IRS optimizations are based on a simple and ideal reflection model that is impractical in hardware implementation, which thus leads to severe performance loss in realistic wideband/multi-band systems. To deal with this problem, in this paper we first propose a more practical and more tractable IRS reflection model that describes the difference of reflection responses for signals at different frequencies. Then, we investigate the joint transmit beamforming and IRS reflection beamforming design for an IRS-assisted multi-cell multi-band system. Both power minimization and sum-rate maximization problems are solved by exploiting popular second-order cone programming (SOCP), Riemannian manifold, minimization-majorization (MM), weighted minimum mean square error (WMMSE), and block coordinate descent (BCD) methods. Simulation results illustrate the significant performance improvement of our proposed joint transmit beamforming and reflection design algorithms based on the practical reflection model in terms of power saving and rate enhancement.

preprint2022arXiv

Joint Beamforming Design for Intelligent Omni Surface Assisted Wireless Communication Systems

Intelligent reflecting surface (IRS) has been widely considered as one of the key enabling techniques for future wireless communication networks owing to its ability of dynamically controlling the phase shift of reflected electromagnetic (EM) waves to construct a favorable propagation environment. While IRS only focuses on signal reflection, the recently emerged innovative concept of intelligent omni-surface (IOS) can provide the dual functionality of manipulating reflecting and transmitting signals. Thus, IOS is a new paradigm for achieving ubiquitous wireless communications. In this paper, we consider an IOS-assisted multi-user multi-input single-output (MU-MISO) system where the IOS utilizes its reflective and transmissive properties to enhance the MU-MISO transmission. Both power minimization and sum-rate maximization problems are solved by exploiting the second-order cone programming (SOCP), Riemannian manifold, weighted minimum mean square error (WMMSE), and block coordinate descent (BCD) methods. Simulation results verify the advancements of the IOS for wireless systems and illustrate the significant performance improvement of our proposed joint transmit beamforming, reflecting and transmitting phase-shift, and IOS energy division design algorithms. Compared with conventional IRS, IOS can significantly extend the communication coverage, enhance the strength of received signals, and improve the quality of communication links.

preprint2022arXiv

Joint Beamforming Design for RIS-Assisted Integrated Sensing and Communication Systems

Integrated sensing and communication (ISAC) has been envisioned as a promising technology to tackle the spectrum congestion problem for future networks. In this correspondence, we investigate deploying a reconfigurable intelligent surface (RIS) in an ISAC system for achieving better performance. In particular, a multi-antenna base station (BS) simultaneously serves multiple single-antenna users with the assistance of a RIS and detects potential targets. The active beamforming of the BS and the passive beamforming of the RIS are jointly optimized to maximize the achievable sum-rate of the communication users while satisfying the constraint of beampattern similarity for radar sensing, the restriction of the RIS, and the transmit power budget. An efficient alternating algorithm based on the fractional programming (FP), majorization-minimization (MM), and manifold optimization methods is developed to convert the resulting non-convex optimization problem into two solvable sub-problems and iteratively solve them. Simulation studies illustrate the advancement of deploying RIS in ISAC systems and the effectiveness of the proposed algorithm.

preprint2022arXiv

Joint Beamforming Design in DFRC Systems for Wideband Sensing and OFDM Communications

Dual-function radar-communication (DFRC) systems, which can efficiently utilize the congested spectrum and costly hardware resources by employing one common waveform for both sensing and communication (S&C), have attracted increasing attention. While the orthogonal frequency division multiplexing (OFDM) technique has been widely adopted to support high-quality communications, it also has great potential to improve radar sensing performance and provide flexible S&C. In this paper, we propose to jointly design the dual-functional transmit signals occupying several subcarriers to realize multi-user OFDM communications and detect one moving target in the presence of clutter. Meanwhile, the signals in other frequency subcarriers can be optimized in a similar way to perform other tasks. The transmit beamforming and receive filter are jointly optimized to maximize the radar output signal-to-interference-plus-noise ratio (SINR), while satisfying the communication SINR requirement and the power budget. A majorization-minimization (MM) based algorithm is developed to solve the resulting non-convex optimization problem. Numerical results reveal the significant wideband sensing gain brought by jointly designing the transmit signals in different subcarriers, and demonstrate the advantages of our proposed scheme and the effectiveness of the developed algorithm.

preprint2022arXiv

Low-Latency Online Speaker Diarization with Graph-Based Label Generation

This paper introduces an online speaker diarization system that can handle long-time audio with low latency. We enable Agglomerative Hierarchy Clustering (AHC) to work in an online fashion by introducing a label matching algorithm. This algorithm solves the inconsistency between output labels and hidden labels that are generated each turn. To ensure the low latency in the online setting, we introduce a variant of AHC, namely chkpt-AHC, to cluster the speakers. In addition, we propose a speaker embedding graph to exploit a graph-based re-clustering method, further improving the performance. In the experiment, we evaluate our systems on both DIHARD3 and VoxConverse datasets. The experimental results show that our proposed online systems have better performance than our baseline online system and have comparable performance to our offline systems. We find out that the framework combining the chkpt-AHC method and the label matching algorithm works well in the online setting. Moreover, the chkpt-AHC method greatly reduces the time cost, while the graph-based re-clustering method helps improve the performance.

preprint2022arXiv

MealRec: A Meal Recommendation Dataset

Bundle recommendation systems aim to recommend a bundle of items for a user to consider as a whole. They have become a norm in modern life and have been applied to many real-world settings, such as product bundle recommendation, music playlist recommendation and travel package recommendation. However, compared to studies of bundle recommendation approaches in areas such as online shopping and digital music services, research on meal recommendations for restaurants in the hospitality industry has made limited progress, due largely to the lack of high-quality benchmark datasets. A publicly available dataset specialising in meal recommendation research for the research community is in urgent demand. In this paper, we introduce a meal recommendation dataset (MealRec) that aims to facilitate future research. MealRec is constructed from the user review records of Allrecipe.com, covering 1,500+ users, 7,200+ recipes and 3,800+ meals. Each recipe is described with rich information, such as ingredients, instructions, pictures, category and tags, etc; and each meal is three-course, consisting of an appetizer, a main dish and a dessert. Furthermore, we propose a category-constrained meal recommendation model that is evaluated through comparative experiments with several state-of-the-art bundle recommendation methods on MealRec. Experimental results confirm the superiority of our model and demonstrate that MealRec is a promising testbed for meal recommendation related research. The MealRec dataset and the source code of our proposed model are available at https://github.com/WUT-IDEA/MealRec for access and reproducibility.

preprint2022arXiv

MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios

Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty of the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a perspective of meta learning to address the DDF issue. Firstly, a base CVR model which consists of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples across months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes to facilitate mitigating the distribution uncertainty. At last, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from recent time period, therefore effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of our MetaCVR and online A/B test also shows our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.

preprint2022arXiv

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

Albeit with varying degrees of progress in the field of Semi-Supervised Semantic Segmentation, most of its recent successes are involved in unwieldy models and the lightweight solution is still not yet explored. We find that existing knowledge distillation techniques pay more attention to pixel-level concepts from labeled data, which fails to take more informative cues within unlabeled data into account. Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-level loss setting. Specifically, MGD is formulated as a labeled-unlabeled data cooperative distillation scheme, which helps to take full advantage of diverse data characteristics that are essential in the semi-supervised setting. Image-level semantic-sensitive loss, region-level content-aware loss, and pixel-level consistency loss are set up to enrich hierarchical distillation abstraction via structurally complementary teachers. Experimental results on PASCAL VOC2012 and Cityscapes reveal that MGD can outperform the competitive approaches by a large margin under diverse partition protocols. For example, the performance of ResNet-18 and MobileNet-v2 backbone is boosted by 11.5% and 4.6% respectively under 1/16 partition protocol on Cityscapes. Although the FLOPs of the model backbone are compressed by 3.4-5.3x (ResNet-18) and 38.7-59.6x (MobileNet-v2), the model manages to achieve satisfactory segmentation results.

preprint2022arXiv

Multiple and Asymmetric Scalings in Explosive Percolation

Explosive percolation in the Achlioptas process has recently attracted much research attention. From extensive simulations in an event-based ensemble, we find that, in dimensions from $2$ to $6$ and on random graphs, the Achlioptas processes all have two scaling windows and multiple fractal structures. The mixing of these multiple scalings successfully explains the previously observed anomalous phenomena in the conventional ensemble, and, moreover, correct critical exponents are now determined with a high precision by the event-based method. The multiple scalings and the ensemble inequivalence may bring new insights for other statistical systems.
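
For readers unfamiliar with the Achlioptas process, the sketch below simulates one common variant on a random graph. It is an illustration under stated assumptions, not the authors' simulation: it assumes the product rule (of two candidate edges, keep the one whose endpoint clusters have the smaller size product), it uses the conventional fixed-edge-count bookkeeping rather than the event-based ensemble, and all names are hypothetical.

```python
import random

class DisjointSet:
    """Union-find that tracks cluster sizes and the largest cluster."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])

def achlioptas_product_rule(n, num_edges, seed=0):
    """Return the largest-cluster size after each accepted edge."""
    rng = random.Random(seed)
    ds = DisjointSet(n)
    largest = []

    def product(edge):
        u, v = edge
        return ds.size[ds.find(u)] * ds.size[ds.find(v)]

    for _ in range(num_edges):
        # Draw two candidate edges; keep the one with the smaller size product.
        candidates = [rng.sample(range(n), 2) for _ in range(2)]
        u, v = min(candidates, key=product)
        ds.union(u, v)
        largest.append(ds.largest)
    return largest

trace = achlioptas_product_rule(n=1000, num_edges=1000)
```

Plotting `trace` against the number of added edges shows the delayed, abrupt growth of the largest cluster that gives explosive percolation its name.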

preprint2022arXiv

Non-Cooperative Resource Management for Intelligent Reflecting Surface Aided Networks

Intelligent reflecting surface (IRS) has emerged as a promising and revolutionizing technology for future wireless networks. Most existing IRS studies focus on simple cooperative systems which usually have a single frequency band. In realistic non-cooperative multi-band networks, however, the existing IRS designs may not be applicable or may suffer severe performance degradation. Thus, in the complex network environment, it is more rational to consider IRSs as public resources to be dynamically allocated to appropriate users. In this paper, we first introduce the auction theory to tackle the resource management problem for a multi-IRS-assisted non-cooperative network. An efficient auction algorithm framework is introduced to sub-optimally solve this non-convex problem. Simulation results illustrate that significant performance improvement can be achieved by applying the auction algorithm in the complex multi-IRS-assisted non-cooperative network.

preprint2022arXiv

On isotropic Berwald scalar curvature

In this short paper, we establish a closer relation between the Berwald scalar curvature and the $S$-curvature. In fact, we prove that a Finsler metric has isotropic Berwald scalar curvature if and only if it has weakly isotropic $S$-curvature. For Finsler metrics of scalar flag curvature and of weakly isotropic $S$-curvature, they have almost isotropic $S$-curvature if and only if the flag curvature is weakly isotropic.

preprint2022arXiv

Online Target Speaker Voice Activity Detection for Speaker Diarization

This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from the clustering-based diarization system to obtain the target speaker embeddings. First, we employ a ResNet-based front-end model to extract the frame-level speaker embeddings for each coming block of a signal. Next, we predict the detection state of each speaker based on these frame-level speaker embeddings and the previously estimated target speaker embedding. Then, the target speaker embeddings are updated by aggregating these frame-level speaker embeddings according to the predictions in the current block. We iteratively extract the results for each block and update the target speaker embedding until reaching the end of the signal. Experimental results show that the proposed method is better than the offline clustering-based diarization system on the AliMeeting dataset.

preprint2022arXiv

Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

Recently, Synthetic data-based Instance Segmentation has become an exceedingly favorable optimization paradigm since it leverages simulation rendering and physics to generate high-quality image-annotation pairs. In this paper, we propose a Parallel Pre-trained Transformers (PPT) framework to accomplish the synthetic data-based Instance Segmentation task. Specifically, we leverage the off-the-shelf pre-trained vision Transformers to alleviate the gap between natural and synthetic data, which helps to provide good generalization in the downstream synthetic data scene with few samples. Swin-B-based CBNet V2, Swin-L-based CBNet V2 and Swin-L-based Uniformer are employed for parallel feature learning, and the results of these three models are fused by pixel-level Non-maximum Suppression (NMS) algorithm to obtain more robust results. The experimental results reveal that PPT ranks first in the CVPR2022 AVA Accessibility Vision and Autonomy Challenge, with a 65.155% mAP.

preprint2022arXiv

Probing gluon Bose correlations in DIS

We study correlations originating from the quantum nature of gluons in a hadronic wave function. Bose-Einstein correlations between identical particles lead to the enhancement in the number of pairs of gluons with the same quantum numbers and small relative momentum. We show that these preexisting correlations can be probed in Deep Inelastic Scattering experiments at high energy. Specifically, we consider diffractive dijet plus a third jet production. The azimuthal dependence displays a peak at the zero relative angle between the transverse momentum imbalance of the photon-going dijet and the transverse momentum of the hadron-going jet. Our calculations explicitly show that the peak originates from Bose enhancement. Comparing electron-proton to electron-nucleus collisions, we demonstrate that the nuclear target enhances the relative strength of the peak. With the future high luminosity Electron-Ion Collider the proposed measurements of gluon Bose enhancement become experimentally feasible.

preprint2022arXiv

Quantum algorithms for the generalized eigenvalue problem

Generalized eigenvalue (GE) problems are of particular importance in various areas of science, engineering, and machine learning. We present a variational quantum algorithm for finding the desired generalized eigenvalue of the GE problem, $\mathcal{A}|ψ\rangle=λ\mathcal{B}|ψ\rangle$, by choosing suitable loss functions. Our approach imposes the superposition of the trial state and the obtained eigenvectors with respect to the weighting matrix $\mathcal{B}$ on the Rayleigh quotient. Furthermore, both the values and derivatives of the loss functions can be calculated on near-term quantum devices with shallow quantum circuits. Finally, we propose a full quantum generalized eigensolver (FQGE) to calculate the minimal generalized eigenvalue with a quantum gradient descent algorithm. As a demonstration of the principle, we numerically implement our algorithms to conduct a 2-qubit simulation and successfully find the generalized eigenvalues of the matrix pencil $(\mathcal{A},\,\mathcal{B})$. The numerical results indicate that FQGE is robust under Gaussian noise.
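
As a purely classical analogue of the idea, the minimal generalized eigenvalue can be found by gradient descent on the generalized Rayleigh quotient $x^T A x / x^T B x$. The sketch below is not the quantum algorithm from the paper: it assumes real symmetric $A$, symmetric positive definite $B$, and a hypothetical 2x2 example whose generalized eigenvalues are 2 and 1.5.

```python
def matvec(m, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def rayleigh(a, b, x):
    """Generalized Rayleigh quotient x^T A x / x^T B x."""
    return dot(x, matvec(a, x)) / dot(x, matvec(b, x))

def min_generalized_eigenvalue(a, b, x0, lr=0.5, steps=500):
    """Gradient descent on the Rayleigh quotient, renormalizing each step."""
    x = list(x0)
    for _ in range(steps):
        ax, bx = matvec(a, x), matvec(b, x)
        xbx = dot(x, bx)
        r = dot(x, ax) / xbx
        # Gradient of the quotient: 2 (A x - r B x) / (x^T B x)
        grad = [2.0 * (axi - r * bxi) / xbx for axi, bxi in zip(ax, bx)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
        norm = dot(x, x) ** 0.5
        x = [xi / norm for xi in x]
    return rayleigh(a, b, x), x

# Hypothetical example pencil (A, B); both matrices are diagonal for clarity.
A = [[2.0, 0.0], [0.0, 3.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
lam, vec = min_generalized_eigenvalue(A, B, [1.0, 1.0])
```

The step size and iteration count are illustrative choices for this small example; a different pencil may need different values.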

preprint2022arXiv

Quantum relaxed row and column iteration methods based on block-encoding

Iterative methods are commonly used to solve linear systems of equations. We present quantum algorithms for the relaxed row and column iteration methods by constructing unitary matrices in the iterative processes, which generalize row and column iteration methods to solve linear systems on a quantum computer. Compared with the conventional row and column iteration methods, convergence accelerates when appropriate parameters are chosen. Once the quantum states are efficiently prepared, the complexity of our relaxed row and column methods is improved exponentially and is linear in the number of iteration steps. In addition, phase estimations and Hamiltonian simulations are not required in these algorithms.
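
For orientation, here is a classical sketch of a relaxed row iteration, assuming the row method corresponds to the Kaczmarz projection scheme with a relaxation parameter `omega`. It is not the block-encoding quantum algorithm from the paper, and the 2x2 system is a hypothetical example with exact solution (1/11, 7/11).

```python
def relaxed_kaczmarz(a, b, omega=1.0, sweeps=200):
    """Cyclic row iteration with relaxation: x += omega * (b_i - a_i.x) / |a_i|^2 * a_i."""
    n = len(a[0])
    x = [0.0] * n
    for _ in range(sweeps):
        for row, bi in zip(a, b):
            residual = bi - sum(r * xj for r, xj in zip(row, x))
            scale = omega * residual / sum(r * r for r in row)
            # Move x toward the hyperplane defined by this row.
            x = [xj + scale * r for xj, r in zip(x, row)]
    return x

# Hypothetical consistent system: 4x + y = 1, x + 3y = 2.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = relaxed_kaczmarz(A, b, omega=1.2)
```

With `omega=1.0` each step is an exact projection onto one row's hyperplane; values between 0 and 2 over- or under-relax that projection, which is the parameter choice the abstract refers to when it mentions accelerated convergence.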

preprint2022arXiv

Realizing two-qubit gates through mode engineering on a trapped-ion quantum computer

Two-qubit gates are a fundamental constituent of a quantum computer and typically its most challenging operation. In a trapped-ion quantum computer, this is typically implemented with laser beams which are modulated in amplitude, frequency, phase, or a combination of these. The required modulation becomes increasingly more complex as the quantum computer becomes larger, complicating the control hardware design. Here, we develop a simple method to essentially remove the pulse-modulation complexity by engineering the normal modes of the ion chain. We experimentally demonstrate the required mode engineering in a three ion chain. This opens up the possibility to trade off complexity between the design of the trapping fields and the optical control system, which will help scale the ion trap quantum computing platform.

preprint2022arXiv

Reflection and Relay Dual-Functional RIS Assisted MU-MISO Systems

Reconfigurable intelligent surfaces (RISs) have been deemed as one of the potential components of future wireless communication systems because they can adaptively manipulate the wireless propagation environment with low-cost passive devices. However, due to the double fading effect, the passive RIS can offer sufficient signal strength only when receivers are nearby and located at the same side as the incident signals. Moreover, RIS cannot provide service coverage for the users at the back side of it. In this paper we introduce a novel reflection and relay dual-functional RIS architecture, which can simultaneously realize passive reflection and active relay functionalities to enhance the coverage. The problem of joint transmit beamforming and dual-functional RIS design is investigated to maximize the achievable sum-rate of a multiuser multiple-input single-output (MU-MISO) system. Based on fractional programming (FP) theory and majorization-minimization (MM) technique, we propose an efficient iterative transmit beamforming and RIS design algorithm. Simulation results demonstrate the superiority of the introduced dual-functional RIS architecture and the effectiveness of the proposed algorithm.

preprint2022arXiv

The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge

This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker Verification (SASV) Challenge. First, we split the joint task into speaker verification (SV) and spoofing countermeasure (CM) tasks, which are optimized separately. For ASV systems, four state-of-the-art methods are employed. For CM systems, we propose two methods on top of the challenge baseline to further improve the performance, namely Embedding Random Sampling Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we also explore whether SV embedding could help improve CM system performance. We observe a dramatic performance degradation of existing CM systems on the domain-mismatched Voxceleb2 dataset. Third, we compare different fusion strategies, including parallel score fusion and sequential cascaded systems. Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains a 0.21% SASV-EER on the challenge official evaluation set.

preprint2022arXiv

The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

The voice conversion task is to modify the speaker identity of continuous speech while preserving the linguistic content. Generally, the naturalness and similarity are two main metrics for evaluating the conversion quality, which has been improved significantly in recent years. This paper presents the HCCL-DKU entry for the fake audio generation task of the 2022 ICASSP ADD challenge. We propose a novel ppg-based voice conversion model that adopts a fully end-to-end structure. Experimental results show that the proposed method outperforms other conversion models, including Tacotron-based and Fastspeech-based models, on conversion quality and spoofing performance against anti-spoofing systems. In addition, we investigate several post-processing methods for better spoofing power. Finally, we achieve second place with a deception success rate of 0.916 in the ADD challenge.

preprint2022arXiv

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

With the development of deep learning, automatic speaker verification has made considerable progress over the past few years. However, to design a lightweight and robust system with limited computational resources is still a challenging problem. Traditionally, a speaker verification system is symmetrical, indicating that the same embedding extraction model is applied for both enrollment and verification in inference. In this paper, we come up with an innovative asymmetric structure, which takes the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification. As a symmetrical system, our proposed ECAPA-TDNNLite model achieves an EER of 3.07% on the Voxceleb1 original test set with only 11.6M FLOPS. Moreover, the asymmetric structure further reduces the EER to 2.31%, without increasing any computational costs during verification.

preprint2022arXiv

Transverse momentum distributions of valence quark in light and heavy vector mesons

We study the leading-twist time-reversal even transverse momentum dependent parton distribution functions (TMDs) of light and heavy vector mesons, i.e., the $ρ$, $J/ψ$ and $Υ$. We employ the leading Fock-state light front wave functions (LF-LFWFs) of $ρ$ and $J/ψ$ from our recent study, and supplement with $Υ$'s LF-LFWFs. These LF-LFWFs are extracted from dynamically solved Bethe-Salpeter wave functions. The vector meson TMDs are then studied with the light front overlap representation at leading Fock-state. All the obtained TMDs are non-vanishing and evolve with current quark mass, in particular the tensor polarized TMDs $f_{1LT}$ and $f_{1TT}$ which undergo a sign flip. The $ρ$ TMDs are compared with other model studies and agreement is found, aside from $f_{1LT}$ and $f_{1TT}$. Finally, the collinear PDFs of vector mesons are studied. The $ρ$'s valence PDFs $f_{1,v}(x)$ and $g_{1L,v}(x)$ are evolved to the scale of 2.4 GeV, with their first three moments compared to lattice QCD prediction. The qualitative behavior of tensor polarized PDF $f_{1LL}(x)$ in $ρ$ at large $x$ is also discussed.

preprint2022arXiv

User Association and Hybrid Beamforming Designs for Cooperative mmWave MIMO Systems

Hybrid analog and digital beamforming has emerged as a key enabling technology for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) communication systems since it can balance the trade-off between system performance and hardware efficiency. Owing to the strong ability of central control, cooperative networks show great potential to enhance the spectral efficiency of mmWave communications. In this paper, we consider cooperative mmWave MIMO systems and propose user association and hybrid beamforming design algorithms for three typical hybrid beamforming architectures. The central processing unit (CPU) of the cooperative networks first matches the service pairs of base stations (BSs) and users. Then, an iterative hybrid beamforming design algorithm is proposed to maximize the weighted achievable sum-rate performance of the mmWave MIMO system with fully connected hybrid beamforming architecture. Moreover, a heuristic analog beamforming design algorithm is introduced for the fixed subarray hybrid beamforming architecture. In an effort to further exploit multiple-antenna diversities, we also consider the dynamic subarray architecture and propose a novel antenna design algorithm for the analog beamforming design. Simulation results illustrate that the proposed hybrid beamforming algorithms achieve a significant performance improvement over other existing approaches and that the dynamic subarray architecture has great advantages in improving the energy efficiency (EE) performance.

preprint 2022 arXiv

Wiggle: Physical Challenge-Response Verification of Vehicle Platooning

Autonomous vehicle platooning promises many benefits such as fuel efficiency, road safety, reduced traffic congestion, and passenger comfort. Platooning vehicles travel in a single file, at close distance, and at the same velocity. The platoon formation is autonomously maintained by a Cooperative Adaptive Cruise Control (CACC) system which relies on sensory data and vehicle-to-vehicle (V2V) communications. In fact, V2V messages play a critical role in shortening the platooning distance while maintaining safety. Whereas V2V message integrity and source authentication can be verified via cryptographic methods, establishing the truthfulness of the message contents is a much harder task. This work establishes a physical access control mechanism to restrict V2V messages to platooning members. Specifically, we aim at tying the digital identity of a candidate requesting to join a platoon to its physical trajectory relative to the platoon. We propose the Wiggle protocol, which employs a physical challenge-response exchange to prove that a candidate requesting to be admitted into a platoon actually follows it. The protocol name is inspired by the random longitudinal movements that the candidate is challenged to execute. Wiggle prevents any remote adversary from joining the platoon and injecting fake CACC messages. Compared to prior works, Wiggle is resistant to pre-recording attacks and can verify that the candidate is directly behind the verifier in the same lane.

preprint 2021 arXiv

A Feature Fusion-Net Using Deep Spatial Context Encoder and Nonstationary Joint Statistical Model for High Resolution SAR Image Classification

Convolutional neural networks (CNNs) have been applied to learn spatial features for high-resolution (HR) synthetic aperture radar (SAR) image classification. However, there has been little work on integrating the unique statistical distributions of SAR images, which can reveal physical properties of terrain objects, into CNNs in a supervised feature learning framework. To address this problem, a novel end-to-end supervised classification method is proposed for HR SAR images by considering both spatial context and statistical features. First, to extract more effective spatial features from SAR images, a new deep spatial context encoder network (DSCEN) is proposed, which is a lightweight structure and can be effectively trained with a small number of samples. Meanwhile, to enhance the diversity of statistics, the nonstationary joint statistical model (NS-JSM) is adopted to form the global statistical features. Specifically, SAR images are transformed into the Gabor wavelet domain, and the resulting multi-subband magnitudes and phases are modeled by the log-normal and uniform distributions. The covariance matrix is further utilized to capture the inter-scale and intra-scale nonstationary correlation between the statistical subbands and make the joint statistical features more compact and distinguishable. Considering their complementary advantages, a feature fusion network (Fusion-Net) based on group compression and smooth normalization is constructed to embed the statistical features into the spatial features and optimize the fusion feature representation. As a result, our model can learn the discriminative features and improve the final classification performance. Experiments on four HR SAR images validate the superiority of the proposed method over other related algorithms.

preprint 2021 arXiv

Analysis of Heterogeneous Structures of Non-separated Scales on Curved Bridge Nodes

Numerically predicting the performance of heterogeneous structures without scale separation represents a significant challenge to meet the critical requirements on computational scalability and efficiency: adopting a mesh fine enough to fully account for the small-scale heterogeneities leads to prohibitive computational costs, while simply ignoring these fine heterogeneities tends to drastically over-stiffen the structure. This study proposes an approach to construct new material-aware shape (basis) functions per element on a coarse discretization of the structure with respect to each of the curved bridge nodes (CBNs) defined along the elements' boundaries. Instead of formulating their derivation by solving a nonlinear optimization problem, the shape functions are constructed by building a map from the CBNs to the interior nodes and are ultimately presented in an explicit matrix form as a product of a Bézier interpolation transformation and a boundary-interior transformation. The CBN shape functions accommodate more flexibility in closely capturing the coarse element's heterogeneity, overcome the important and challenging issues of inter-element stiffness and displacement discontinuity across the interfaces between coarse elements, and improve the analysis accuracy by orders of magnitude; they also meet the basic geometric properties of shape functions that avoid unphysical analysis results. Extensive numerical examples, including a 3D industrial example with billions of degrees of freedom, are also tested to demonstrate the approach's performance in comparison with results obtained from classical approaches.
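
As a small illustration of the Bézier interpolation mentioned in this abstract, the sketch below evaluates Bernstein basis weights and checks two properties commonly required of shape functions: the weights sum to one (partition of unity) and the curve interpolates its end control points. That these are among the "basic geometric properties" the authors have in mind is an assumption, and the names `bernstein_basis` and `bezier_point` are hypothetical; this is not the paper's CBN construction.

```python
from math import comb


def bernstein_basis(degree, t):
    """Return the degree+1 Bernstein basis values at parameter t in [0, 1]."""
    return [comb(degree, i) * t**i * (1 - t) ** (degree - i) for i in range(degree + 1)]


def bezier_point(control_points, t):
    """Evaluate a Bezier curve as a Bernstein-weighted sum of its control points."""
    weights = bernstein_basis(len(control_points) - 1, t)
    dim = len(control_points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, control_points)) for k in range(dim)
    )


# Illustrative cubic boundary curve (control points are made up for the example).
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
midpoint = bezier_point(control, 0.5)
```

Because the weights are non-negative and sum to one, every point on the curve is a convex combination of the control points, which is what keeps the interpolated field free of spurious offsets.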

preprint 2021 arXiv

Channel Estimation for Practical IRS-Assisted OFDM Systems

Intelligent reflecting surface (IRS), composed of a large number of hardware-efficient passive elements, is regarded as a promising technique for future wireless communications since it can adaptively enhance the propagation environment. In order to effectively utilize IRS to achieve promising beamforming gains, the problem of channel state information (CSI) acquisition needs to be carefully considered. However, most recent works assume an ideal IRS, i.e., each reflecting element has constant amplitude, variable phase shifts, and the same response for signals with different frequencies, which causes severe estimation error due to the mismatch between the ideal IRS and the practical one. In this paper, we study channel estimation in practical IRS-aided orthogonal frequency division multiplexing (OFDM) systems with discrete phase shifts. Unlike prior works, which assume that the IRS has an ideal reflection model, we perform channel estimation by considering the amplitude-phase shift-frequency relationship for the response of a practical IRS. Aiming at minimizing the normalized mean square error (NMSE) of the estimated channel, a novel IRS time-varying reflection pattern is designed by leveraging the alternating optimization (AO) algorithm for the case of using low-resolution phase shifters. Moreover, for the high-resolution IRS cases, we provide another practical reflection pattern scheme to further reduce the complexity. Simulation results demonstrate the necessity of considering the practical IRS model for channel estimation and the effectiveness of our proposed channel estimation methods.

preprint 2021 arXiv

Classical-to-quantum transition in multimode nonlinear systems with strong photon-photon coupling

With advanced micro- and nano-photonic structures, the vacuum photon-photon coupling rate is anticipated to approach the intrinsic loss rate and lead to unconventional quantum effects. Here, we investigate the classical-to-quantum transition of such photonic nonlinear systems using the quantum cluster-expansion method, which addresses the computational challenge in tracking large photon number states of the fundamental and harmonic optical fields involved in the second harmonic generation process. Compared to the mean-field approximation used in the weak-coupling limit, the quantum cluster-expansion method solves multimode dynamics efficiently and reveals the quantum behaviors of optical parametric oscillations around the threshold. This work presents a universal tool to study the quantum dynamics of multimode systems and to explore nonlinear photonic devices for continuous-variable quantum information processing.

preprint 2021 arXiv

Don't Change Me! User-Controllable Selective Paraphrase Generation

In the paraphrase generation task, source sentences often contain phrases that should not be altered. Which phrases, however, can be context dependent and can vary by application. Our solution to this challenge is to provide the user with explicit tags that can be placed around any arbitrary segment of text to mean "don't change me!" when generating a paraphrase; the model learns to explicitly copy these phrases to the output. The contribution of this work is a novel data generation technique using distant supervision that allows us to start with a pretrained sequence-to-sequence model and fine-tune a paraphrase generator that exhibits this behavior, allowing user-controllable paraphrase generation. Additionally, we modify the loss during fine-tuning to explicitly encourage diversity in model output. Our technique is language agnostic, and we report experiments in English and Chinese.

preprint 2021 arXiv

Dual-Functional Radar-Communication Waveform Design: A Symbol-Level Precoding Approach

Dual-functional radar-communication (DFRC) systems can simultaneously perform both radar and communication functionalities using the same hardware platform and spectrum resource. In this paper, we consider multi-input multi-output (MIMO) DFRC systems and focus on transmit beamforming designs to provide both radar sensing and multi-user communications. Unlike conventional block-level precoding techniques, we propose to use the recently emerged symbol-level precoding approach in DFRC systems, which provides additional degrees of freedom (DoFs) that guarantee preferable instantaneous transmit beampatterns for radar sensing and achieve better communication performance. In particular, the squared error between the designed and desired beampatterns is minimized subject to the quality-of-service (QoS) requirements of the communication users and the constant-modulus power constraint. Two efficient algorithms are developed to solve this non-convex problem on both the Euclidean and Riemannian spaces. The first algorithm employs penalty dual decomposition (PDD), majorization-minimization (MM), and block coordinate descent (BCD) methods to convert the original optimization problem into two solvable sub-problems, and iteratively solves them using efficient algorithms. The second algorithm provides a much faster solution at the price of a slight performance loss, first transforming the original problem into Riemannian space, and then utilizing the augmented Lagrangian method (ALM) to obtain an unconstrained problem that is subsequently solved via a Riemannian Broyden-Fletcher-Goldfarb-Shanno (RBFGS) algorithm. Extensive simulations verify the distinct advantages of the proposed symbol-level precoding designs in both radar sensing and multi-user communications.

preprint 2021 arXiv

First saturation correction in high energy proton-nucleus collisions: Part III. Ensemble averaging

In high energy proton-nucleus collisions, the gluon saturation effects from the nucleus are fully incorporated into the light-like Wilson lines. The gluon saturation effects from the proton, which are anticipated to be important either in the extreme high energy limit or towards the dense-dense (nucleus-nucleus) collision regimes, have been studied perturbatively within the Color Glass Condensate effective theory in previous papers of this series. A configuration-by-configuration expression for the single inclusive semi-hard gluon production including the first saturation correction was obtained. In this paper, we perform ensemble averaging in the McLerran-Venugopalan model and the Dipole Approximation. We find that, in the saturation correction, the effects of the initial state interactions are negligible while the final state interactions play the most important role and give a positive-valued contribution to the semi-hard gluon spectrum. Furthermore, we show that the single gluon spectrum scales approximately as $1/k_{\perp}^{4}$ at small $k_{\perp}$, suggesting that a resummation of higher order saturation corrections is required to regulate the infrared region of the gluon spectrum.

preprint 2021 arXiv

Intelligent reflecting surface assisted multi-cell multi-band wireless networks

Intelligent reflecting surface (IRS) is deemed a promising and revolutionary technology for future wireless communication systems owing to its capability to intelligently change the propagation environment and introduce a new dimension into wireless communication optimization. Most existing studies on IRS are based on an ideal reflection model. However, it is difficult to implement an IRS which can simultaneously realize any adjustable phase shift for signals with different frequencies. Therefore, the practical phase shift model, which can describe the difference of IRS phase shift responses for signals with different frequencies, should be utilized in the IRS optimization for wideband and multi-band systems. In this paper, we consider an IRS-assisted multi-cell multi-band system, in which different base stations (BSs) operate at different frequency bands. We aim to jointly design the transmit beamforming of the BSs and the reflection beamforming of the IRS to minimize the total transmit power subject to the signal-to-interference-plus-noise ratio (SINR) constraints of individual users and the practical IRS reflection model. With the aid of the practical phase shift model, the influence between signals with different frequencies is taken into account during the design of the IRS. Simulation results illustrate the importance of considering the practical communication scenario in IRS designs and validate the effectiveness of our proposed algorithm.

preprint 2021 arXiv

Intelligent Reflecting Surface based Passive Information Transmission: A Symbol-Level Precoding Approach

Intelligent reflecting surfaces (IRS) have been proposed as a revolutionary technology owing to their capability of adaptively reconfiguring the propagation environment in a cost-effective and hardware-efficient fashion. While the application of IRS as a passive reflector to enhance the performance of wireless communications has been widely investigated in the literature, using IRS as a passive transmitter has recently emerged as a new concept and is attracting steadily growing interest. In this paper, we propose two novel IRS-based passive information transmission systems using advanced symbol-level precoding. One is a standalone passive information transmission system, where the IRS operates as a passive transmitter serving multiple receivers by adjusting its elements to reflect unmodulated carrier signals. The other is a joint passive reflection and information transmission system, where the IRS not only enhances transmissions for multiple primary information receivers (PIRs) by passive reflection, but also simultaneously delivers additional information to a secondary information receiver (SIR) by embedding its information into the primary signals at the symbol level. Two typical optimization problems, i.e., power minimization and quality-of-service (QoS) balancing, are investigated for the proposed IRS-based passive information transmission systems. Simulation results demonstrate the feasibility of IRS-based passive information transmission and the effectiveness of our proposed algorithms, as compared to other benchmark schemes.

preprint 2021 arXiv

Intelligent reflecting surface enhanced wideband MIMO-OFDM communications: From practical model to reflection optimization

Intelligent reflecting surface (IRS) is envisioned as a revolutionary technology for future wireless communication systems since it can intelligently change the radio environment and integrate it into wireless communication optimization. However, most existing works adopt an ideal IRS reflection model, which is impractical and can cause significant performance degradation in realistic wideband systems. To address this issue, we first study the dual phase- and amplitude-squint effect of reflected signals and present a simplified practical IRS reflection model for wideband signals. Then, an IRS enhanced wideband multiuser multi-input single-output orthogonal frequency division multiplexing (MU-MISO-OFDM) system is investigated. We aim to jointly design the transmit beamformer and IRS reflection for the case of using both continuous and discrete phase shifters to maximize the average sum-rate over all subcarriers. By exploiting the relationship between sum-rate maximization and mean square error (MSE) minimization, the original problem is equivalently transformed into a multi-block/variable problem, which can be efficiently solved by the block coordinate descent (BCD) method. Complexity and convergence for both cases are analyzed or illustrated. Simulation results demonstrate that the proposed algorithm can offer significant average sum-rate enhancement compared to that achieved using the ideal IRS reflection model, which confirms the importance of using the practical model for the design of wideband systems.

preprint 2021 arXiv

MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning

Graph Neural Networks (GNNs) have recently attracted great attention and achieved significant progress in graph-level applications. In this paper, we propose a framework for graph neural networks with multiresolution Haar-like wavelets, or MathNet, with interrelated convolution and pooling strategies. The underlying method takes graphs in different structures as input and assembles consistent graph representations for readout layers, which then accomplish label prediction. To achieve this, the multiresolution graph representations are first constructed and fed into graph convolutional layers for processing. The hierarchical graph pooling layers are then involved to downsample graph resolution while simultaneously removing redundancy within graph signals. The whole workflow can be formed with a multi-level graph analysis, which not only helps embed the intrinsic topological information of each graph into the GNN, but also supports fast computation of forward and adjoint graph transforms. We show by extensive experiments that the proposed framework obtains notable accuracy gains on graph classification and regression tasks with performance stability. The proposed MathNet outperforms various existing GNN models, especially on big data sets.

preprint 2021 arXiv

Numerical convergence of pre-initial conditions on dark matter halo properties

Generating pre-initial conditions (or particle loads) is the very first step to set up a cosmological N-body simulation. In this work, we revisit the numerical convergence of pre-initial conditions on dark matter halo properties using a set of simulations which differ only in their initial particle loads, i.e. grid, glass, and the newly introduced capacity constrained Voronoi tessellation (CCVT). We find that the median halo properties agree fairly well (i.e. within a convergence level of a few per cent) among simulations run from different initial loads. We also notice that for some individual haloes cross-matched among different simulations, the relative difference of their properties can sometimes be several tens of per cent. By looking at the evolution history of these poorly converged haloes, we find that they are usually merging haloes or haloes that have experienced recent merger events, and their merging processes in different simulations are out of sync, which temporarily degrades the convergence of halo properties. We show that, compared to the simulation starting with an anisotropic grid load, the simulation with an isotropic CCVT load converges slightly better to the simulation with a glass load, which is also isotropic. Among simulations with different pre-initial conditions, haloes in higher density environments tend to have their properties converged slightly better. Our results confirm that CCVT loads behave as well as the widely used grid and glass loads at small scales, and for the first time we quantify the convergence of two independent isotropic particle loads (i.e. glass and CCVT) on halo properties.

preprint 2021 arXiv

Organization of cooperation in fractal structures

It is known that the small-world structure constitutes sufficient conditions to sustain cooperation and thus enhances cooperation. In contrast, a network with a very long average distance is usually thought to suppress the emergence of cooperation. In this paper we show that the fractal structure, of which the average distance is very long, does not always play a negative role in the organization of cooperation. Compared to regular networks, the fractal structure might even facilitate the emergence of cooperation. This mainly depends on the existence of locally compact clusters. The sparse inter-connection between these clusters constructs an asymmetric barrier that the defection strategy can almost never cross, but that the cooperation strategy has a non-negligible chance of crossing. More generally, the network need not be a standard fractal, as long as such structures exist. In turn, when this typical structure is absent, the fractal structure will also suppress the emergence of cooperation, as in the fractal configuration obtained by diluting a random tree-like network. Our findings also clarify some contradictions in previous studies, and suggest that both removing links from and inserting links into a regular network can enhance cooperation.

preprint 2021 arXiv

Quantum microwave photonics

By harnessing quantum superposition and entanglement, remarkable progress has sprouted over the past three decades from different areas of research in communication, computation, and simulation. To further improve the processing ability of microwave photonics, we demonstrate a quantum microwave photonic processing system using a low-jitter superconducting nanowire single-photon detector (SNSPD) and a time-correlated single-photon counting (TCSPC) module. This method uniquely combines extreme optical sensitivity, down to the single-photon level (below -100 dBm), and a wide processing bandwidth, twice as high as the transmission bandwidth of the cable. Moreover, benefiting from the trigger, the system can selectively process the desired RF signal and attenuate the other intense noise and undesired RF components, even when their power is 15 dB greater than the desired signal power. Using this method, we show microwave phase shifting and frequency filtering for the desired RF signal at the single-photon level. Besides its applications in space and underwater communications and in the testing and qualification of pre-packaged photonic modulators and detectors, this RF signal processing capability at the single-photon level can lead to significant development in high-speed quantum processing methods.

preprint 2021 arXiv

The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems

The 2020 Personalized Voice Trigger Challenge (PVTC2020) addresses two different research problems in a unified setup: joint wake-up word detection with speaker verification on close-talking single microphone data and on far-field multi-channel microphone array data. Specifically, the second task poses an additional cross-channel matching challenge on top of the far-field condition. To simulate the real-life application scenario, the enrollment utterances are recorded from the close-talking cell-phone only, while the test utterances are recorded from both the close-talking cell-phone and the far-field microphone arrays. This paper introduces our challenge setup and the released database as well as the evaluation metrics. In addition, we present a joint end-to-end neural network baseline system trained with the proposed database for speaker-dependent wake-up word detection. Results show that the cost calculated from the miss rate and the false alarm rate can reach 0.37 in the close-talking single microphone task and 0.31 in the far-field microphone array task. The official website and the open-source baseline system have been released.

preprint 2021 arXiv

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

In this paper, we present the system submitted to the third DIHARD Speech Diarization Challenge by the DKU-Duke-Lenovo team. Our system consists of several modules: voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, and agglomerative hierarchical clustering. In addition, the target speaker VAD (TSVAD) is used for the phone call data to further improve the performance. Our final submitted system achieves a DER of 15.43% for the core evaluation set and 13.39% for the full evaluation set on task 1, and a DER of 21.63% for the core evaluation set and 18.90% for the full evaluation set on task 2.

preprint 2021 arXiv

Tight upper bound on the quantum value of Svetlichny operators under local filtering and hidden genuine nonlocality

Nonlocal quantum correlations among the quantum subsystems play essential roles in quantum science. The violation of the Svetlichny inequality provides sufficient conditions of genuine tripartite nonlocality. We provide tight upper bounds on the maximal quantum value of the Svetlichny operators under local filtering operations, and present a qualitative analytical analysis on the hidden genuine nonlocality for three-qubit systems. We investigate in detail two classes of three-qubit states whose hidden genuine nonlocalities can be revealed by local filtering.

preprint 2021 arXiv

Two-dimensional antiferroelectric tunnel junction

Ferroelectric tunnel junctions (FTJs), which consist of two metal electrodes separated by a thin ferroelectric barrier, have recently aroused significant interest for technological applications as nanoscale resistive switching devices. So far, most existing FTJs have been based on perovskite-oxide barrier layers. The recent discovery of two-dimensional (2D) van der Waals ferroelectric materials opens a new route to realize tunnel junctions with new functionalities and nm-scale dimensions. Due to the weak coupling between the atomic layers in these materials, the relative dipole alignment between them can be controlled by an applied voltage. This allows transitions between ferroelectric and antiferroelectric orderings, resulting in significant changes of the electronic structure. Here, we propose to realize 2D antiferroelectric tunnel junctions (AFTJs), which exploit this new functionality, based on bilayer In$_2$X$_3$ (X = S, Se, Te) barriers and different 2D electrodes. Using first-principles density functional theory calculations, we demonstrate that the In$_2$X$_3$ bilayers exhibit stable ferroelectric and antiferroelectric states separated by sizable energy barriers, thus supporting non-volatile switching between these states. Using quantum-mechanical modeling of the electronic transport, we explore in-plane and out-of-plane tunneling across the In$_2$S$_3$ van der Waals bilayers, and predict giant tunneling electroresistance (TER) effects and multiple non-volatile resistance states driven by ferroelectric-antiferroelectric order transitions. Our proposal opens a new route to realize nanoscale memory devices with ultrahigh storage density using 2D AFTJs.

preprint 2020 arXiv

A critical survey on the kinetic assays of DNA polymerase fidelity from a new theoretical perspective

The high fidelity of DNA polymerase is critical for the faithful replication of genomic DNA. Several approaches have been proposed to quantify the fidelity of DNA polymerase. Direct measurements of the error frequency of the replication products definitely give the true fidelity but turn out to be very hard to implement. Two biochemical kinetic approaches, the steady-state assay and the transient-state assay, were then suggested and widely adopted. In these assays, the error frequency is indirectly estimated by using the steady-state or the transient-state kinetic theory combined with the measured kinetic rates. However, whether these indirectly estimated fidelities are equivalent to the true fidelity has never been clarified theoretically; in particular, the different strategies used to quantify the proofreading efficiency of DNA polymerase (DNAP) often lead to inconsistent results. The reason for this confusion is that it is mathematically challenging to formulate a rigorous and general theory of the true fidelity. Recently we have succeeded in establishing such a theoretical framework. In this paper, we develop this theory to make a comprehensive examination of the theoretical foundation of the kinetic assays and the relation between fidelities obtained by different methods. We conclude that while the steady-state assay and the transient-state assay can always measure the true fidelity of exonuclease-deficient DNA polymerases, they only do so for exonuclease-efficient DNA polymerases conditionally (the proper way to use these assays to quantify the proofreading efficiency is also suggested). We thus propose a new kinetic approach, the single-molecule assay, which indirectly but precisely characterizes the true fidelity of either exonuclease-deficient or exonuclease-efficient DNA polymerases.

preprint 2020 arXiv

Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection

In this paper, we propose a deep convolutional neural network-based acoustic word embedding system for code-switching query-by-example spoken term detection. Different from previous configurations, we combine audio data in two languages for training instead of using only a single language. We transform the acoustic features of keyword templates and searching content into fixed-dimensional vectors and calculate the distances between keyword segments and searching content segments obtained in a sliding manner. An auxiliary variability-invariant loss is also applied to training data of the same word spoken by different speakers. This strategy is used to prevent the extractor from encoding undesired speaker- or accent-related information into the acoustic word embeddings. Experimental results show that our proposed system produces promising searching results in the code-switching test scenario. With an increased number of templates and the use of the variability-invariant loss, the searching performance is further enhanced.

preprint 2020 arXiv

Atss-Net: Target Speaker Separation via Attention-based Neural Network

Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlation between features in parallel and to use shallower layers to extract more features. Experimental results show that our Atss-Net yields better performance than the VoiceFilter, although it contains only half of the parameters. Furthermore, our proposed model also demonstrates promising performance in speech enhancement.

preprint 2020 arXiv

Characterizing multipartite entanglement by violation of CHSH inequalities

Entanglement of high-dimensional and multipartite quantum systems offers promising perspectives in quantum information processing. However, the characterization and measurement of this kind of entanglement is a great challenge. Here we consider the overlaps between the maximal quantum mean values and the classical bound of the CHSH inequalities for pairwise-qubit states in two-dimensional subspaces. We show that the concurrence of a pure state in any high-dimensional multipartite system can be equivalently represented by these overlaps. Here we consider the projections of an arbitrary high-dimensional multipartite state to two-qubit states. We investigate the non-localities of these projected two-qubit sub-states by their violations of CHSH inequalities. From these violations, i.e., the overlaps between the maximal quantum mean values and the classical bound of the CHSH inequality, we show that the concurrence of a high-dimensional multipartite pure state can be exactly expressed by these overlaps. We further derive a lower bound of the concurrence for any quantum state, which is tight for pure states. The lower bound not only imposes restrictions on the non-locality distributions among the pairwise qubit states, but also supplies a sufficient condition for distillation of bipartite entanglement. Effective criteria for detecting genuine tripartite entanglement and the lower bound of concurrence for genuine tripartite entanglement are also presented based on such non-localities.
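
For the simplest case of a single two-qubit state, the relation between CHSH violation and concurrence can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's high-dimensional construction: it uses the standard Horodecki criterion, in which the maximal CHSH value is $2\sqrt{m_1 + m_2}$ with $m_1, m_2$ the two largest eigenvalues of $T^{T}T$ and $T_{ij} = \mathrm{Tr}[\rho\,(\sigma_i \otimes \sigma_j)]$. The function names and example states are hypothetical.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z.
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def max_chsh_value(rho):
    """Maximal CHSH mean value of a two-qubit density matrix (Horodecki criterion)."""
    # Correlation matrix T_ij = Tr[rho (sigma_i (x) sigma_j)].
    T = np.array(
        [[np.trace(rho @ np.kron(a, b)).real for b in PAULIS] for a in PAULIS]
    )
    eigenvalues = np.sort(np.linalg.eigvalsh(T.T @ T))
    return 2.0 * np.sqrt(eigenvalues[-1] + eigenvalues[-2])


def pure_state(amplitudes):
    """Density matrix of a normalized pure state given in the computational basis."""
    psi = np.asarray(amplitudes, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())


theta = 0.3  # arbitrary illustrative angle
bell = pure_state([1, 0, 0, 1])                               # maximally entangled
product = pure_state([1, 0, 0, 0])                            # unentangled
partial = pure_state([np.cos(theta), 0, 0, np.sin(theta)])    # concurrence sin(2*theta)
```

For a pure two-qubit state with concurrence $C$, this value equals $2\sqrt{1 + C^2}$, so the excess over the classical bound 2 vanishes exactly when the state is a product state.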

preprint2020arXiv

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

Modeling voices for multiple speakers and multiple languages in one text-to-speech system has been a challenge for a long time. This paper presents an extension of Tacotron2 to achieve bilingual multispeaker speech synthesis when there are limited data for each language. We achieve cross-lingual synthesis, including code-switching cases, between English and Mandarin for monolingual speakers. The two languages share the same phonemic representations for input, while the language attribute and the speaker identity are independently controlled by language tokens and speaker embeddings, respectively. In addition, we investigate the model's performance on cross-lingual synthesis, with and without a bilingual dataset during training. With the bilingual dataset, not only can the model generate high-fidelity speech for all speakers in the language they speak, but it can also generate accented, yet fluent and intelligible, speech for monolingual speakers in their non-native language. For example, the Mandarin speaker can speak English fluently. Furthermore, the model trained with the bilingual dataset is robust for code-switching text-to-speech, as shown in our results and provided samples (https://caizexin.github.io/mlms-syn-samples/index.html).

preprint2020arXiv

Data Inference from Encrypted Databases: A Multi-dimensional Order-Preserving Matching Approach

Due to increasing concerns of data privacy, databases are being encrypted before they are stored on an untrusted server. To enable search operations on the encrypted data, searchable encryption techniques have been proposed. Representative schemes use order-preserving encryption (OPE) for supporting efficient Boolean queries on encrypted databases. Yet, recent works showed the possibility of inferring plaintext data from OPE-encrypted databases, merely using the order-preserving constraints, or combined with an auxiliary plaintext dataset with similar frequency distribution. So far, the effectiveness of such attacks is limited to single-dimensional dense data (most values from the domain are encrypted), but it remains challenging to achieve it on high-dimensional datasets (e.g., spatial data) which are often sparse in nature. In this paper, for the first time, we study data inference attacks on multi-dimensional encrypted databases (with 2-D as a special case). We formulate it as a 2-D order-preserving matching problem and explore both unweighted and weighted cases, where the former maximizes the number of points matched using only order information and the latter further considers points with similar frequencies. We prove that the problem is NP-hard, and then propose a greedy algorithm, along with a polynomial-time algorithm with approximation guarantees. Experimental results on synthetic and real-world datasets show that the data recovery rate is significantly enhanced compared with the previous 1-D matching algorithm.

preprint2020arXiv

Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution

Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation. Recently, deep learning models have been proposed to learn the representation of users' overall interests, while ignoring the fact that interests may dynamically change over time. We argue that it is necessary to consider continuous-time information in CTR models to track user interest trends from rich historical behaviors. In this paper, we propose a novel Deep Time-Stream framework (DTS) which introduces the time information via an ordinary differential equation (ODE). DTS continuously models the evolution of interests using a neural network, and thus is able to tackle the challenge of dynamically representing users' interests based on their historical behaviors. In addition, our framework can be seamlessly applied to any existing deep CTR model by leveraging the additional Time-Stream Module, while no changes are made to the original CTR model. Experiments on a public dataset as well as a real industry dataset with billions of samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with existing methods.

preprint2020arXiv

Demo: iJam with Channel Randomization

Physical-layer key generation methods utilize the variations of the communication channel to achieve a secure key agreement between two parties with no prior security association. Their secrecy rate (bit generation rate) depends heavily on the randomness of the channel, which may decrease significantly in a stable environment. Existing methods seek to improve the secrecy rate by injecting artificial noise into the channel. Unfortunately, noise injection cannot alter the underlying channel state, which depends on the multipath environment between the transmitter and receiver. Consequently, these methods are known to leak key bits to multi-antenna eavesdroppers, which are capable of filtering out the noise through the differential of multiple signal receptions. This work demonstrates an improved approach to reinforce physical-layer key generation schemes, namely channel randomization. The channel randomization approach leverages a reconfigurable antenna to rapidly change the channel state during transmission, and an angle-of-departure (AoD) based channel estimation algorithm to cancel the changing effects for the intended receiver. The combined result is a communication channel that is stable in the eyes of the intended receiver but randomly changing from the viewpoint of the eavesdropper. We augmented an existing physical-layer key generation protocol, iJam, with the proposed approach and developed a full-fledged remote instrumentation platform to demonstrate its performance. Our evaluations show that the augmentation does not affect the bit error rate (BER) of the intended receiver during key establishment but reduces the eavesdropper's BER to the level of random guessing, regardless of the number of antennas it is equipped with.

preprint2020arXiv

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

In this paper, we present the system submitted to the second DIHARD Speech Diarization Challenge by the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs the ResNet-LSTM based VAD, the Deep ResNet based speaker embedding, the LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection also brings a slight improvement. Our proposed system achieves 18.84% DER in Track1 and 27.90% DER in Track2. Although our systems have reduced the DERs by 27.5% and 31.7% relative to the official baselines, we believe that the diarization task is still very difficult.
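
To make the relative-reduction figures in this abstract concrete, the short sketch below back-computes the baseline DER implied by a reported DER and a relative reduction. The helper name `implied_baseline_der` is hypothetical, and the results are only the values implied by the rounded numbers quoted in the abstract, not figures taken from the challenge itself.

```python
def implied_baseline_der(system_der, relative_reduction):
    """Baseline DER implied by a system DER and its relative reduction.

    A relative reduction r means system = baseline * (1 - r),
    so baseline = system / (1 - r).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return system_der / (1.0 - relative_reduction)


# Numbers quoted in the abstract: DER in percent, reduction as a fraction.
track1_baseline = implied_baseline_der(18.84, 0.275)
track2_baseline = implied_baseline_der(27.90, 0.317)
```

Under these numbers the implied baselines are roughly 26% DER for Track1 and roughly 41% DER for Track2, which shows why the authors still describe the task as hard despite the gains.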

preprint2020arXiv

Discovery of dissipative microwave photonic solitons

Dissipative solitons, which rely on the double balance between nonlinearity and dispersion as well as gain and loss, have attracted a lot of attention in optics, since they give rise to ultrashort pulses and broadband frequency combs with good stability and smooth spectral envelopes. Here we observe novel dissipative solitons in microwave photonics that give rise to wideband tunable frequency-hopping microwave signals with fast frequency switching speed. The dissipative microwave photonic solitons are achieved through the double balance between nonlinear gain saturation and linear filtering as well as gain and loss in a microwave photonic resonant cavity. The generation of dissipative solitons with different pulse widths, repetition rates and numbers of solitons per round-trip time is observed, together with the corresponding wideband tunable frequency-hopping microwave signals. This work opens new avenues for signal generation, processing and control based on the principle of solitons in microwave photonics, and has great potential in many applications such as modern radars, electronic warfare systems, and telecommunications.

preprint2020arXiv

Distributed Deep Forest and its Application to Automatic Detection of Cash-out Fraud

Internet companies face the need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large scale tasks with great performance are widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its building blocks, and it has achieved highly competitive results on tasks in various domains. However, it has not been tested on extremely large scale tasks. In this work, based on our parameter server system, we developed a distributed version of deep forest. To meet the needs of real-world tasks, many improvements are introduced to the original deep forest model, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data, MART based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results showed that the deep forest model has the best performance according to evaluation metrics from different perspectives, even with very little effort for parameter tuning. This model can block fraudulent transactions worth a large amount of money each day. Even compared with the best deployed model, the deep forest model can additionally bring a significant decrease in economic loss each day.

preprint2020arXiv

Diversity-Achieving Slow-DropBlock Network for Person Re-Identification

A big challenge of person re-identification (Re-ID) using a multi-branch network architecture is to learn diverse features from the ID-labeled dataset. The 2-branch Batch DropBlock (BDB) network was recently proposed for achieving diversity between the global branch and the feature-dropping branch. In this paper, we propose to move the dropping operation from the intermediate feature layer towards the input (image dropping). Since it may drop a large portion of the input images, this makes the training hard to converge. Hence, we propose a novel double-batch-split co-training approach to remedy this problem. In particular, we show that feature diversity can be well achieved with the use of multiple dropping branches by setting an individual dropping ratio for each branch. Empirical evidence demonstrates that the proposed method outperforms BDB on popular person Re-ID datasets, including Market-1501, DukeMTMC-reID and CUHK03, and that the use of more dropping branches can further boost the performance.

preprint2020arXiv

Domain Aware Training for Far-field Small-footprint Keyword Spotting

In this paper, we focus on the task of small-footprint keyword spotting under the far-field scenario. Far-field environments are commonly encountered in real-life speech applications, causing severe degradation of performance due to room reverberation and various kinds of noises. Our baseline system is built on the convolutional neural network trained with pooled data of both far-field and close-talking speech. To cope with the distortions, we develop three domain aware training systems, including the domain embedding system, the deep CORAL system, and the multi-task learning system. These methods incorporate domain knowledge into network training and improve the performance of the keyword classifier on far-field conditions. Experimental results show that our proposed methods manage to maintain the performance on the close-talking speech and achieve significant improvement on the far-field test set.

preprint2020arXiv

Effects of Conical Intersections on Hyperfine Quenching of Hydroxyl OH in collision with an ultracold Sr atom

The effect of conical intersections (CIs) on electronic relaxation, transitions from excited states to ground states, is well studied, but their influence on hyperfine quenching in a reactant molecule is not known. Here, we report on ultracold collision dynamics of the hydroxyl free-radical OH with Sr atoms leading to quenching of OH hyperfine states. Our quantum-mechanical calculations of this process reveal that quenching is efficient due to anomalous molecular dynamics in the vicinity of the conical intersection at collinear geometry. We observe wide scattering resonance features in both elastic and inelastic rate coefficients at collision energies below $k \times 10$ mK. They are identified as either p- or d-wave shape resonances. We also describe the electronic potentials relevant for these non-reactive collisions, their diabatization procedure, as well as the non-adiabatic coupling between the diabatic potentials near the CIs.

preprint2020arXiv

Frequency stabilization and tuning of breathing soliton in SiN microresonators

Dissipative Kerr solitons offer broadband, coherent and low-noise frequency combs and stable temporal pulse trains, and have shown great potential for applications in spectroscopy, communications, and metrology. The breathing soliton is a particular dissipative Kerr soliton in which the pulse duration and peak intensity oscillate periodically. However, the noise and stability of the breathing soliton remain unexplored, which would be the main obstacle for future applications. Here, we investigate breathing dissipative Kerr solitons in silicon nitride (SiN) microrings, where the breather period shows uncertainties around MHz in both simulation and experiments. By applying a modulated pump, the breathing frequency can be injection locked to the modulation and tuned over tens of MHz, with frequency noise significantly suppressed. Our demonstration offers an alternative knob for controlling soliton dynamics in microresonators and paves a new avenue towards practical applications of breathing solitons.

preprint2020arXiv

From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint

In recent years, end-to-end text-to-speech models have become able to synthesize high-fidelity speech. However, accessing and controlling speech attributes such as speaker identity, prosody, and emotion in a text-to-speech system remains a challenge. This paper presents a system involving a feedback constraint for multispeaker speech synthesis. We enhance the knowledge transfer from speaker verification to speech synthesis by engaging the speaker verification network. The constraint is imposed through an added loss related to the speaker identity, which is designed to improve the speaker similarity between the synthesized speech and its natural reference audio. The model is trained and evaluated on publicly available datasets. Experimental results, including visualization of the speaker embedding space, show significant improvement in terms of speaker identity cloning at the spectrogram level. Synthesized samples are available online for listening (https://caizexin.github.io/mlspk-syn-samples/index.html).

preprint2020arXiv

Generalized Hamiltonian to describe imperfections in ion-light interaction

We derive a general Hamiltonian that governs the interaction between an $N$-ion chain and an externally controlled laser field, where the ion motion is quantized and the laser field is considered beyond the plane-wave approximation. This general form explicitly includes not only the terms that are used to drive ion-ion entanglement, but also a series of unwanted terms that can lead to quantum gate infidelity. We demonstrate the expressive power of the general Hamiltonian by singling out the effect of axial mode heating and confirm this experimentally. We discuss pathways forward for improving the quality of trapped-ion quantum computation and for guiding hardware design decisions.

preprint2020arXiv

Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders

Computer poetry generation is our first step towards computer writing. Writing must have a theme. The current approaches of using sequence-to-sequence models with attention often produce non-thematic poems. We present a novel conditional variational autoencoder with a hybrid decoder that adds deconvolutional neural networks to the general recurrent neural networks to fully learn topic information via latent variables. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also in a holistic way that is highly related to the given keyword and the learned topic. A proposed augmented word2vec model further improves the rhythm and symmetry. Tests show that the poems generated by our approach mostly satisfy the regulated rules and have consistent themes, and 73.42% of them receive an Overall score no less than 3 (the highest score is 5).

preprint2020arXiv

GhostImage: Remote Perception Attacks against Camera-based Image Classification Systems

In vision-based object classification systems imaging sensors perceive the environment and machine learning is then used to detect and classify objects for decision-making purposes; e.g., to maneuver an automated vehicle around an obstacle or to raise an alarm to indicate the presence of an intruder in surveillance settings. In this work we demonstrate how the perception domain can be remotely and unobtrusively exploited to enable an attacker to create spurious objects or alter an existing object. An automated system relying on a detection/classification framework subject to our attack could be made to undertake actions with catastrophic results due to attacker-induced misperception. We focus on camera-based systems and show that it is possible to remotely project adversarial patterns into camera systems by exploiting two common effects in optical imaging systems, viz., lens flare/ghost effects and auto-exposure control. To improve the robustness of the attack to channel effects, we generate optimal patterns by integrating adversarial machine learning techniques with a trained end-to-end channel model. We experimentally demonstrate our attacks using a low-cost projector, on three different image datasets, in indoor and outdoor environments, and with three different cameras. Experimental results show that, depending on the projector-camera distance, attack success rates can reach as high as 100% and under targeted conditions.

preprint2020arXiv

Haar Graph Pooling

Deep Graph Neural Networks (GNNs) are useful models for graph classification and graph-based regression tasks. In these tasks, graph pooling is a critical ingredient by which GNNs adapt to input graphs of varying size and structure. We propose a new graph pooling operation based on compressive Haar transforms -- HaarPooling. HaarPooling implements a cascade of pooling operations; it is computed by following a sequence of clusterings of the input graph. A HaarPooling layer transforms a given input graph to an output graph with a smaller node number and the same feature dimension; the compressive Haar transform filters out fine detail information in the Haar wavelet domain. In this way, all the HaarPooling layers together synthesize the features of any given input graph into a feature vector of uniform size. Such transforms provide a sparse characterization of the data and preserve the structure information of the input graph. GNNs implemented with standard graph convolution layers and HaarPooling layers achieve state of the art performance on diverse graph classification and regression problems.

preprint2020arXiv

HI-MIA : A Far-field Text-Dependent Speaker Verification Database and the Baselines

This paper presents a far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirement for far-field microphone array based speaker verification, since most of the publicly available databases are single-channel, close-talking and text-independent. The database contains recordings of 340 people in rooms designed for the far-field scenario. Recordings are captured by multiple microphone arrays located at different directions and distances from the speaker and by a high-fidelity close-talking microphone. In addition, we propose a set of end-to-end neural network based baseline systems that adopt single-channel data for training. Moreover, we propose a testing-background-aware enrollment augmentation strategy to further enhance the performance. Results show that the fusion systems achieve 3.29% EER in the far-field enrollment and far-field testing task and 4.02% EER in the close-talking enrollment and far-field testing task.

preprint2020arXiv

History-dependent percolation on multiplex networks

The structure of interconnected systems and its impact on the system dynamics is a much-studied cross-disciplinary topic. Although various critical phenomena have been found in different models, the study on the connections between different percolation transitions is still lacking. Here we propose a unified framework to study the origins of the discontinuous transitions of the percolation process on interacting networks. The model evolves in generations with the result of the present percolation depending on the previous state and thus is history-dependent. Both theoretical analysis and Monte Carlo simulations reveal that the nature of the transition remains the same at finite generations but exhibits an abrupt change for the infinite generation. We use brain functional correlation and morphological similarity data to show that our model also provides a general method to explore the network structure and can contribute to many practical applications, such as detecting the abnormal structures of human brain networks.

preprint2020arXiv

JIMWLK Evolution, Lindblad Equation and Quantum-Classical Correspondence

In the Color Glass Condensate (CGC) effective theory, the physics of valence gluons with large longitudinal momentum is reflected in the distribution of color charges in the transverse plane. Averaging over the valence degrees of freedom is effected by integrating over classical color charges with some quasi probability weight functional $W[{\mathbf{j}}]$ whose evolution with rapidity is governed by the JIMWLK equation. In this paper, we reformulate this setup in terms of effective quantum field theory on valence Hilbert space governed by the reduced density matrix $\hat{\rho}$ for hard gluons, which is obtained after properly integrating out the soft gluon "environment". We show that the evolution of this density matrix with rapidity in the dense and dilute limits has the form of a Lindblad equation. The quasi probability distribution (weight) functional $W$ is directly related to the reduced density matrix $\hat{\rho}$ through the generalization of the Wigner-Weyl quantum-classical correspondence, which reformulates quantum dynamics on Hilbert space in terms of classical dynamics on the phase space. In the present case the phase space is non-Abelian and is spanned by the components of the transverse color charge density ${\mathbf{j}}$. The same correspondence maps the Lindblad equation for $\hat{\rho}$ into the JIMWLK evolution equation for $W$.

preprint2020arXiv

Learning Diverse Features with Part-Level Resolution for Person Re-Identification

Learning diverse features is key to the success of person re-identification. Various part-based methods have been extensively proposed for learning local representations, which, however, are still inferior to the best-performing methods for person re-identification. This paper proposes to construct a strong lightweight network architecture, termed PLR-OSNet, based on the idea of Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving feature diversity. The proposed PLR-OSNet has two branches, one branch for global feature representation and the other branch for local feature representation. The local branch employs a uniform partition strategy for part-level feature resolution but produces only a single identity-prediction loss, which is in sharp contrast to the existing part-based methods. Empirical evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art performance on popular person Re-ID datasets, including Market1501, DukeMTMC-reID and CUHK03, despite its small model size.

preprint2020arXiv

Learning to Utilize Correlated Auxiliary Noise: A Possible Quantum Advantage

This paper has two messages. First, we demonstrate that neural networks that process noisy data can learn to exploit, when available, access to auxiliary noise that is correlated with the noise on the data. In effect, the network learns to use the correlated auxiliary noise as an approximate key to decipher its noisy input data. Second, we show that, for this task, the scaling behavior with increasing noise is such that future quantum machines could possess an advantage. In particular, decoherence generates correlated auxiliary noise in the environment. The new approach could, therefore, help enable future quantum machines by providing machine-learned quantum error correction.

preprint2020arXiv

LodoNet: A Deep Neural Network with 2D Keypoint Matching for 3D LiDAR Odometry Estimation

Deep learning based LiDAR odometry (LO) estimation is attracting increasing research interest in the field of autonomous driving and robotics. Existing works feed consecutive LiDAR frames into neural networks as point clouds and match pairs in the learned feature space. In contrast, motivated by the success of image based feature extractors, we propose to transfer the LiDAR frames to image space and reformulate the problem as image feature extraction. With the help of the scale-invariant feature transform (SIFT) for feature extraction, we are able to generate matched keypoint pairs (MKPs) that can be precisely returned to the 3D space. A convolutional neural network pipeline is designed for LiDAR odometry estimation from the extracted MKPs. The proposed scheme, namely LodoNet, is then evaluated on the KITTI odometry estimation benchmark, achieving results on par with or even better than the state-of-the-art.

preprint2020arXiv

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

This paper introduces our approaches for the Mask and Breathing Sub-Challenge in the Interspeech COMPARE Challenge 2020. For the mask detection task, we train deep convolutional neural networks with filter-bank energies, gender-aware features, and speaker-aware features. Support Vector Machines follow as the back-end classifiers for binary prediction on the extracted deep embeddings. Several data augmentation schemes are used to increase the quantity of training data and improve our models' robustness, including speed perturbation, SpecAugment, and random erasing. For the speech breath monitoring task, we investigate different bottleneck features based on the Bi-LSTM structure. Experimental results show that our proposed methods outperform the baselines and achieve 0.746 PCC and 78.8% UAR on the Breathing and Mask evaluation set, respectively.

preprint2020arXiv

MSA-MIL: A deep residual multiple instance learning model based on multi-scale annotation for classification and visualization of glomerular spikes

Membranous nephropathy (MN) is a frequent type of adult nephrotic syndrome, which has a high clinical incidence and can cause various complications. In the biopsy microscope slide of membranous nephropathy, spikelike projections on the glomerular basement membrane are a prominent feature of MN. However, because the whole biopsy slide contains a large number of glomeruli, and each glomerulus includes many spike lesions, the pathological feature of the spikes is not obvious. It is thus time-consuming for doctors to examine glomeruli one by one, and diagnosis is difficult for pathologists with less experience. In this paper, we establish a visualized classification model based on multi-scale annotation multi-instance learning (MSA-MIL) to achieve glomerular classification and spike visualization. The MSA-MIL model mainly involves three parts. Firstly, U-Net is used to extract the region of the glomeruli to ensure that the features learned by the succeeding algorithm are focused inside the glomeruli themselves. Secondly, we use MIL to train an instance-level classifier combined with the MSA method to enhance the learning ability of the network by adding a location-level labeled reinforced dataset, thereby obtaining an example-level feature representation with rich semantics. Lastly, the predicted scores of each tile in the image are summarized to obtain the glomerular classification, and the classification results of the spikes are visualized using a sliding window method. The experimental results confirm that the proposed MSA-MIL model can effectively and accurately classify normal and spiked glomeruli and visualize the position of spikes in the glomerulus. Therefore, the proposed model can provide a good foundation for assisting clinical doctors in diagnosing glomerular membranous nephropathy.

preprint2020arXiv

Non-adiabatic quantum interference effects and chaoticity in the ultracold Li + LiNa $\to$ Li$_2$ + Na reaction

Electronically non-adiabatic effects play an important role in many chemical reactions. How these effects manifest in cold and ultracold chemistry remains largely unexplored. Here, through first principles non-adiabatic quantum dynamics calculations of the Li + LiNa $\to$ Li$_2$ + Na chemical reaction, it is shown that non-adiabatic dynamics induces quantum interference effects that dramatically alter the ultracold rotationally resolved reaction rate coefficients. The interference effect arises from a conical intersection between the ground and an excited electronic state that is energetically accessible even for ultracold collisions. These unique interference effects might be exploited for quantum control applications as a quantum molecular switch. A statistical analysis of rotational populations of the Li$_2$ product reveals a Poisson distribution implying an underlying classically chaotic dynamics. The Poisson distribution is robust and amenable to experimental verification and appears to be a universal property of ultracold reactions involving alkali metal dimers.

preprint2020arXiv

Nonlinear Improvement of Qubit-qudit Entanglement Witnesses

The entanglement witness is an important and experimentally applicable tool for entanglement detection. In this paper, we provide a nonlinear improvement of any entanglement witness for $2\otimes d$ quantum systems. Compared with any existing entanglement witness, the improved separability criterion only needs two more measurements on local observables. Detailed examples are employed to illustrate the efficiency of the nonlinear improvement for general, optimal and non-decomposable entanglement witnesses.

preprint2020arXiv

Parameter-Transferred Wasserstein Generative Adversarial Network (PT-WGAN) for Low-Dose PET Image Denoising

Due to the widespread use of positron emission tomography (PET) in clinical practice, the potential risk of PET-associated radiation dose to patients needs to be minimized. However, with the reduction in the radiation dose, the resultant images may suffer from noise and artifacts that compromise diagnostic performance. In this paper, we propose a parameter-transferred Wasserstein generative adversarial network (PT-WGAN) for low-dose PET image denoising. The contributions of this paper are twofold: i) a PT-WGAN framework is designed to denoise low-dose PET images without compromising structural details, and ii) a task-specific initialization based on transfer learning is developed to train PT-WGAN using trainable parameters transferred from a pretrained model, which significantly improves the training efficiency of PT-WGAN. The experimental results on clinical data show that the proposed network can suppress image noise more effectively while preserving better image fidelity than recently published state-of-the-art methods. We make our code available at https://github.com/90n9-yu/PT-WGAN.

preprint2020arXiv

Path Integral Based Convolution and Pooling for Graph Neural Networks

Graph neural networks (GNNs) extend the functionality of traditional neural networks to graph-structured data. Similar to CNNs, an optimized design of graph convolution and pooling is key to success. Borrowing ideas from physics, we propose path integral based graph neural networks (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix we call the maximal entropy transition (MET) matrix, derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to the subgraph centrality, thus providing a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored for different graph data with varying sizes and structures. We can view most existing GNN architectures as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification/regression tasks, including a new benchmark dataset from statistical mechanics we propose to boost applications of GNN in physical sciences.

preprint2020arXiv

Peripheral-free Device Pairing by Randomly Switching Power

The popularity of the Internet-of-Things (IoT) comes with security concerns. Attacks against wireless communication venues of IoT (e.g., Man-in-the-Middle attacks) have grown at an alarming rate over the past decade. Pairing, which allows the establishment of secure communication channels for IoT devices without a prior relationship, is thus a paramount capability. Existing secure pairing protocols require auxiliary equipment/peripherals (e.g., displays, speakers and sensors) to achieve authentication, which is unacceptable for low-priced devices such as smart lamps. This paper studies how to design a peripheral-free secure pairing protocol. Concretely, we design the protocol, termed SwitchPairing, via out-of-box power supplying chargers and on-board clocks, achieving security and economics at the same time. When a user wants to pair two or more devices, he/she connects the pairing devices to the same power source, and presses/releases the switch on/off button several times. Then, the press and release timing can be used to derive symmetric keys. We implement a prototype via two CC2640R2F development boards from Texas Instruments (TI) due to their prevalence. Extensive experiments and user studies are also conducted to benchmark our protocol in terms of efficiency and security.

preprint2020arXiv

Personalized workflow to identify optimal T-cell epitopes for peptide-based vaccines against COVID-19

Traditional vaccines against viruses are designed to target their surface proteins, i.e., antigens, which can trigger the immune system to produce specific antibodies to capture and neutralize the viruses. However, viruses often evolve quickly, and their antigens are prone to mutations to avoid recognition by the antibodies (antigenic drift). This limitation of the antibody-mediated immunity could be addressed by the T-cell mediated immunity, which is able to recognize conserved viral HLA peptides presented on virus-infected cells. Thus, by targeting conserved regions on the genome of a virus, T-cell epitope-based vaccines are less subject to mutations and may work effectively on different strains of the virus. Here we propose a personalized workflow to identify an optimal set of T-cell epitopes based on the HLA alleles and the immunopeptidome of an individual person. Specifically, our workflow trains a machine learning model on the immunopeptidome and then predicts HLA peptides from conserved regions of a virus that are most likely to trigger responses from the person's T cells. We applied the workflow to identify T-cell epitopes for the SARS-CoV-2 virus, which has caused the recent COVID-19 pandemic in more than 100 countries across the globe.

preprint2020arXiv

Photon-mediated charge-exchange reactions between 39K atoms and 40Ca+ ions in a hybrid trap

We present experimental evidence of charge exchange between laser-cooled potassium $^{39}$K atoms and calcium $^{40}$Ca$^+$ ions in a hybrid atom-ion trap and give quantitative theoretical explanations for the observations. The $^{39}$K atoms and $^{40}$Ca$^+$ ions are held in a magneto-optical trap (MOT) and a linear Paul trap, respectively. Fluorescence detection and high-resolution time-of-flight mass spectra for both species are used to determine the remaining number of $^{40}$Ca$^+$ ions, the increasing number of $^{39}$K$^+$ ions, and $^{39}$K number density as functions of time. Simultaneous trap operation is guaranteed by alternating periods of MOT and $^{40}$Ca$^+$ cooling lights, thus avoiding direct ionization of $^{39}$K by the $^{40}$Ca$^+$ cooling light. We show that the K-Ca$^+$ charge-exchange rate coefficient increases linearly from zero with $^{39}$K number density and, surprisingly, the fraction of $^{40}$Ca$^+$ ions in the 4p\,$^2$P$_{1/2}$ electronically-excited state. Combined with our theoretical analysis, we conclude that these data can only be explained by a process that starts with a potassium atom in its electronic ground state and a calcium ion in its excited 4p\,$^2$P$_{1/2}$ state producing ground-state $^{39}$K$^+$ ions and metastable, neutral Ca\,(3d4p$^3$P$_1$) atoms, releasing only 150 cm$^{-1}$ equivalent relative kinetic energy. Charge-exchange between either ground- or excited-state $^{39}$K and ground-state $^{40}$Ca$^+$ is negligibly small as no energetically-favorable product states are available. Our experimental and theoretical rate coefficients of $9\times10^{-10}$ cm$^3$/s are in agreement given the uncertainty budgets.

preprint2020arXiv

Pockels Soliton Microcomb

Kerr soliton microcombs have recently emerged as a prominent topic in integrated photonics and enabled new horizons for optical frequency metrology. Kerr soliton microcombs, as their name suggests, are based on the high-order cubic optical nonlinearity. It is desirable to exploit quadratic photonic materials, namely Pockels materials, for soliton generation and on-chip implementation of 1f-2f comb self-referencing. Such quadratically-driven solitons have been theoretically proposed, but have not yet been observed in a nanophotonic platform despite recent progress in quadratic comb generation in free-space and crystalline resonators. Here we report photonic chip-based Pockels microcomb solitons driven by three-wave mixing in an aluminum nitride microring resonator. In contrast to typical Kerr solitons, our Pockels soliton features unity soliton generation fidelity, two-by-two annihilation of multi-soliton states, favorable tuning dynamics, and high pump-to-soliton conversion efficiency.

preprint2020arXiv

PointIso: Point Cloud Based Deep Learning Model for Detecting Arbitrary-Precision Peptide Features in LC-MS Map through Attention Based Segmentation

A promising technique for discovering disease biomarkers is to measure the relative protein abundance in multiple biofluid samples through liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics. The key step involves peptide feature detection in the LC-MS map, along with its charge and intensity. Existing heuristic algorithms suffer from inaccurate parameters since different settings of the parameters result in significantly different outcomes. Therefore, we propose PointIso to meet the need for an automated peptide feature detection system that is able to find the proper parameters itself and is easily adaptable to different types of datasets. It consists of an attention based scanning step for segmenting the multi-isotopic pattern of peptide features along with charge and a sequence classification step for grouping those isotopes into potential peptide features. PointIso is the first point cloud based, arbitrary-precision deep learning network to address the problem and achieves 98% detection of high quality MS/MS identifications in a benchmark dataset, which is higher than several other widely used algorithms. Besides contributing to the proteomics study, we believe our novel segmentation technique should serve the general image processing domain as well.

preprint2020arXiv

Practical Modeling and Beamforming for Intelligent Reflecting Surface Aided Wideband Systems

Intelligent reflecting surface (IRS) has emerged as a revolutionizing solution to enhance wireless communications by intelligently changing the propagation environment. Prior studies on IRS are based on an ideal reflection model with a constant amplitude and a variable phase shift. However, it is difficult and unrealistic to implement an IRS satisfying such an ideal reflection model in practical applications. In this letter, we aim to investigate the phase-amplitude-frequency relationship of the reflected signals and propose a practical model of the reflection coefficient for an IRS-aided wideband system. Then, based on this practical model, joint transmit power allocation of each subcarrier and IRS beamforming optimization are investigated for an IRS-aided wideband orthogonal frequency-division multiplexing (OFDM) system. Simulation results illustrate the importance of the practical model on the IRS designs and validate the effectiveness of our proposed model.

preprint2020arXiv

Prediction of the onset of cardiovascular diseases from electronic health records using multi-task gated recurrent units

In this work, we propose a multi-task recurrent neural network with attention mechanism for predicting cardiovascular events from electronic health records (EHRs) at different time horizons. The proposed approach is compared to a standard clinical risk predictor (QRISK) and machine learning alternatives using 5-year data from an NHS Foundation Trust. The proposed model outperforms standard clinical risk scores in predicting stroke (AUC=0.85) and myocardial infarction (AUC=0.89), considering the largest time horizon. The benefit of using a multi-task setting becomes visible for very short time horizons, where it yields an AUC increase of 2-6%. Further, we explored the importance of individual features and attention weights in predicting cardiovascular events. Our results indicate that the recurrent neural network approach benefits from the hospital longitudinal information and demonstrate how machine learning techniques can be applied to secondary care.

preprint2020arXiv

Providing Input-Discriminative Protection for Local Differential Privacy

Local Differential Privacy (LDP) provides provable privacy protection for data collection without assuming a trusted data server. In real-world scenarios, different data have different privacy requirements due to their distinct sensitivity levels. However, LDP provides the same protection for all data. In this paper, we tackle the challenge of providing input-discriminative protection to reflect the distinct privacy requirements of different inputs. We first present the Input-Discriminative LDP (ID-LDP) privacy notion and focus on a specific version termed MinID-LDP, which is shown to be a fine-grained version of LDP. Then, we focus on the application of frequency estimation and develop the IDUE mechanism based on Unary Encoding for single-item input and the extended mechanism IDUE-PS (with Padding-and-Sampling protocol) for item-set input. The results on both synthetic and real-world datasets validate the correctness of our theoretical analysis and show that the proposed mechanisms satisfying MinID-LDP have better utility than the state-of-the-art mechanisms satisfying LDP due to the input-discriminative protection.

preprint2020arXiv

ROBin: Known-Plaintext Attack Resistant Orthogonal Blinding via Channel Randomization

Orthogonal blinding based schemes for wireless physical layer security aim to achieve secure communication by injecting noise into channels orthogonal to the main channel and corrupting the eavesdropper's signal reception. These methods, albeit practical, have been proven vulnerable against multi-antenna eavesdroppers who can filter the message from the noise. The vulnerability is rooted in the fact that the main channel state remains static in spite of the noise injection, which allows an eavesdropper to estimate it promptly via known symbols and filter out the noise. Our proposed scheme leverages a reconfigurable antenna for Alice to rapidly change the channel state during transmission and a compressive sensing based algorithm for her to predict and cancel the changing effects for Bob. As a result, the communication between Alice and Bob remains clear, whereas randomized channel state prevents Eve from launching the known-plaintext attack. We formally analyze the security of the scheme against both single and multi-antenna eavesdroppers and identify its unique anti-eavesdropping properties due to the artificially created fast-changing channel. We conduct extensive simulations and real-world experiments to evaluate its performance. Empirical results show that our scheme can suppress Eve's attack success rate to the level of random guessing, even if she knows all the symbols transmitted through other antenna modes.

preprint2020arXiv

Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation

Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM) based models with spectrogram or waveform input are commonly used for deep learning based audio source separation. In this paper, we propose a Sliced Attention-based neural network (Sams-Net) in the spectrogram domain for the music source separation task. It enables spectral feature interactions with a multi-head attention mechanism, achieves easier parallel computing and has a larger receptive field compared with LSTMs and CNNs respectively. Experimental results on the MUSDB18 dataset show that the proposed method, with fewer parameters, outperforms most of the state-of-the-art DNN-based methods.

preprint2020arXiv

Scaling features in the spreading of COVID-19

Since the outbreak of COVID-19, many data analyses have been done. Some of them are based on the classical epidemiological approach that assumes an exponential growth, but a few studies report that a power-law scaling may provide a better fit to the currently available data. Here, we examine the data in China (01/20/2020--02/24/2020), and indeed find that the growth closely follows power-law kinetics over a significantly wide time period. The exponents are $2.48(20)$, $2.21(6)$ and $4.26(12)$ for the number of confirmed infections, deaths and cured cases, respectively, indicating an underlying small-world network structure in the pandemic. While no obvious deviations from the power-law growth can be seen yet for the number of deaths and cured cases, negative deviations have clearly appeared in the number of infections, particularly that for the region outside Hubei. This suggests the beginning of the slowing-down of the virus spreading due to the huge containment effort. Meanwhile, we find that despite the dramatic difference in magnitudes, the growth kinetics of the infection number exhibits much similarity for Hubei province and the region outside Hubei. On this basis, in a log-log plot, we rescale the infection number for the region outside Hubei such that it overlaps as much as possible with the total infection number in China, from which an approximate extrapolation yields the maximum of the pandemic around March 3, 2020, with the number of infections about $83,000$. Further, by analyzing the kinetics of the mortality in log-log scale, we obtain a rough estimate that near March 3, the death rate of COVID-19 would be about $4.7\%\thicksim 5.0\%$ for Hubei province and $0.7\%\thicksim1.0\%$ for the region outside Hubei. We emphasize that our predictions may be quantitatively unreliable, since the data analysis is purely empirical and various assumptions are used.
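
The power-law fit described in this abstract can be illustrated with a minimal sketch: a straight-line least-squares fit in log-log coordinates, whose slope is the scaling exponent. The function name `fit_power_law` and the synthetic, noise-free data are assumptions made for illustration; they are not the paper's data or code.

```python
import math

def fit_power_law(times, counts):
    """Least-squares fit of log(count) = log(a) + b * log(t); returns (a, b)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)     # intercept in log-log space = prefactor
    return a, b

# Synthetic data generated from N(t) = 3 * t^2.5 (hypothetical, not real case counts).
days = list(range(1, 31))
cases = [3.0 * d ** 2.5 for d in days]
a, b = fit_power_law(days, cases)
```

On real counts the fit window matters: a curve that bends below the fitted line at late times is the kind of negative deviation the abstract reports for infections.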

preprint2020arXiv

SecEL: Privacy-Preserving, Verifiable and Fault-Tolerant Edge Learning for Autonomous Vehicles

Mobile edge computing (MEC) is an emerging technology to transform the cloud-based computing services into the edge-based ones. Autonomous vehicular network (AVNET), as one of the most promising applications of MEC, can feature edge learning and communication techniques, improving the safety for autonomous vehicles (AVs). This paper focuses on the edge learning in AVNET, where AVs at the edge of the network share model parameters instead of data in a distributed manner, and an aggregator (e.g., a base station) aggregates parameters from AVs and at the end obtains a trained model. Although edge learning is promising, security issues such as data leakage, computing integrity invasion and fault connection are not fully considered in existing edge learning cases. To the best of our knowledge, no effective scheme simultaneously covers the foregoing security issues. Therefore, we propose SecEL, a privacy-preserving, verifiable and fault-tolerant scheme for edge learning in AVNET. First, we leverage the primitive of bivariate polynomial-based secret sharing to encrypt model parameters by one-time padding. Second, we use a homomorphic authenticator based on message authentication code to support verifiable computation. Third, we mitigate the computation failure problem caused by fault connection. Last, we simulate and evaluate SecEL in terms of time cost, throughput and classification accuracy. The experiment results demonstrate the effectiveness of SecEL.

preprint2020arXiv

Synergy between Machine/Deep Learning and Software Engineering: How Far Are We?

Since 2009, the deep learning revolution, which was triggered by the introduction of ImageNet, has stimulated the synergy between Machine Learning (ML)/Deep Learning (DL) and Software Engineering (SE). Meanwhile, critical reviews have emerged that suggest that ML/DL should be used cautiously. To improve the quality (especially the applicability and generalizability) of ML/DL-related SE studies, and to stimulate and enhance future collaborations between SE/AI researchers and industry practitioners, we conducted a 10-year Systematic Literature Review (SLR) on 906 ML/DL-related SE papers published between 2009 and 2018. Our trend analysis demonstrated the mutual impacts that ML/DL and SE have had on each other. At the same time, however, we also observed a paucity of replicable and reproducible ML/DL-related SE studies and identified five factors that influence their replicability and reproducibility. To improve the applicability and generalizability of research results, we analyzed what ingredients in a study would facilitate an understanding of why an ML/DL technique was selected for a specific SE problem. In addition, we identified the unique trends of impacts of DL models on SE tasks, as well as five unique challenges that needed to be met in order to better leverage DL to improve the productivity of SE tasks. Finally, we outlined a road-map that we believe can facilitate the transfer of ML/DL-based SE research results into real-world industry practices.

preprint2020arXiv

Temporal optical neurons for serial deep learning

Deep learning is able to functionally simulate the human brain and thus, it has attracted considerable interest. Optics-assisted deep learning is a promising approach to improve the forward-propagation speed and reduce the power consumption. However, present methods are based on a parallel processing approach that is inherently ineffective in dealing with serial data signals at the core of information and communication technologies. Here, we propose and demonstrate a serial optical deep learning concept that is specifically designed to directly process high-speed temporal data. By utilizing ultra-short coherent optical pulses as the information carriers, the neurons are distributed at different time slots in a serial pattern, and interconnected to each other through group delay dispersion. A 4-layer serial optical neural network (SONN) was constructed and trained for classification of both analog and digital signals with simulated accuracy rates of over 90% with proper individuality variance rates. Furthermore, we performed a proof-of-concept experiment of a pseudo-3-layer SONN to successfully recognize the ASCII (American Standard Code for Information Interchange) codes of English letters at a data rate of 12 Gbps. This concept represents a novel one-dimensional realization of artificial neural networks, enabling an efficient application of optical deep learning methods to the analysis and processing of serial data signals, while offering a new overall perspective for the temporal signal processing.

preprint2020arXiv

The FFSVC 2020 Evaluation Plan

The Far-Field Speaker Verification Challenge 2020 (FFSVC20) is designed to boost the speaker verification research with special focus on far-field distributed microphone arrays under noisy conditions in real scenarios. The objectives of this challenge are to: 1) benchmark the current speech verification technology under this challenging condition, 2) promote the development of new ideas and technologies in speaker verification, 3) provide an open, free, and large scale speech database to the community that exhibits the far-field characteristics in real scenes.

preprint2020arXiv

The First Round Result from the TianQin-1 Satellite

The TianQin-1 satellite (TQ-1), which is the first technology demonstration satellite for the TianQin project, was launched on 20 December 2019. The first round of experiments was carried out from 21 December 2019 until 1 April 2020. The residual acceleration of the satellite is found to be about $1\times10^{-10}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$ and about $5\times10^{-11}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.05~{\rm Hz}\,$, measured by an inertial sensor with a sensitivity of $5\times10^{-12}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The micro-Newton thrusters have demonstrated a thrust resolution of $0.1~μ{\rm N}$ and a thrust noise of $0.3~μ{\rm N}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}$. The residual noise of the satellite with drag-free control is $3\times10^{-9}~{\rm m}/{\rm s}^{2}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The noise level of the optical readout system is about $30~{\rm pm}/{\rm Hz}^{1/2}$ at $0.1~{\rm Hz}\,$. The temperature stability at the temperature monitoring position is controlled to be about $\pm3~{\rm mK}$ per orbit, and the mismatch between the center-of-mass of the satellite and that of the test mass is measured with a precision of better than $0.1~{\rm mm}$.

preprint2020arXiv

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three different research problems under well-defined conditions: far-field text-dependent speaker verification from a single microphone array, far-field text-independent speaker verification from a single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three tasks pose a cross-channel challenge to the participants. To simulate the real-life scenario, the enrollment utterances are recorded from a close-talk cellphone, while the test utterances are recorded from the far-field microphone arrays. In this paper, we describe the database, the challenge, and the baseline system, which is based on a ResNet-based deep speaker network with cosine similarity scoring. For a given utterance, the speaker embeddings of different channels are equally averaged as the final embedding. The baseline system achieves minDCFs of 0.62, 0.66, and 0.64 and EERs of 6.27%, 6.55%, and 7.18% for task 1, task 2, and task 3, respectively.
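
The scoring step of the baseline (equal averaging of per-channel embeddings, then cosine similarity against the enrollment embedding) can be sketched as follows. The three-dimensional vectors and the helper names are hypothetical; real embeddings would come from the ResNet-based speaker network, which is not reproduced here.

```python
import math

def average_embeddings(channel_embeddings):
    """Equal-weight average of the per-channel embeddings of one utterance."""
    n = len(channel_embeddings)
    dim = len(channel_embeddings[0])
    return [sum(e[i] for e in channel_embeddings) / n for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy enrollment embedding and two far-field channel embeddings (made-up values).
enroll = [1.0, 0.0, 1.0]
test_channels = [[0.9, 0.1, 1.1], [1.1, -0.1, 0.9]]
test_embedding = average_embeddings(test_channels)
score = cosine_similarity(enroll, test_embedding)
```

A verification decision would then compare `score` with a threshold tuned on development data; the threshold itself is not specified in the abstract.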

preprint2020arXiv

The reduction of the number of incoherent Kraus operations for qutrit systems

Quantum coherence is a fundamental property that can emerge within any quantum system. Incoherent operations, defined in terms of the Kraus decomposition, play an important role in state transformation. The maximum number of incoherent Kraus operators has been presented in [A. Streltsov, S. Rana, P. Boes, J. Eisert, Phys. Rev. Lett. 119, 140402 (2017)]. In this work, we show that the number of incoherent Kraus operators for a single qubit can be reduced from 5 to 4 by constructing a proper unitary matrix. For qutrit systems we further obtain 32 incoherent Kraus operators, while the upper bound given by Streltsov et al. is 39 Kraus operators. In addition, we reduce the number of strictly incoherent Kraus operators from more than 15 to 13. We also consider the state transformation problem for these two types of operations in single qutrit systems.

preprint2020arXiv

The TianQin project: current progress on science and technology

TianQin is a planned space-based gravitational wave (GW) observatory consisting of three earth orbiting satellites with an orbital radius of about $10^5~{\rm km}$. The satellites will form an equilateral triangle constellation, the plane of which is nearly perpendicular to the ecliptic plane. TianQin aims to detect GWs between $10^{-4}~{\rm Hz}$ and $1~{\rm Hz}$ that can be generated by a wide variety of important astrophysical and cosmological sources, including the inspiral of Galactic ultra-compact binaries, the inspiral of stellar-mass black hole binaries, extreme mass ratio inspirals, the merger of massive black hole binaries, and possibly the energetic processes in the very early universe or exotic sources such as cosmic strings. In order to start science operations around 2035, a roadmap called the 0123 plan is being used to bring the key technologies of TianQin to maturity, supported by the construction of a series of research facilities on the ground. Two major projects of the 0123 plan are being carried out. In this process, the team has created a new-generation $17~{\rm cm}$ single-body hollow corner-cube retro-reflector, which was launched with the QueQiao satellite on 21 May 2018; a new laser ranging station equipped with a $1.2~{\rm m}$ telescope has been constructed and the station has successfully ranged to all five retro-reflectors on the Moon; and the TianQin-1 experimental satellite was launched on 20 December 2019, with the first round of results showing that the satellite has exceeded all of its mission requirements.

preprint2020arXiv

Towards 1% single photon nonlinearity with periodically-poled lithium niobate microring resonators

The absence of single-photon nonlinearity has been a major roadblock in developing quantum photonic circuits at optical frequencies. In this paper, we demonstrate a periodically-poled thin film lithium niobate microring resonator (PPLNMR) that reaches 5,000,000%/W second harmonic conversion efficiency, an almost 20-fold enhancement over the state of the art, by accessing its largest $χ^{(2)}$ tensor component $d_{33}$ via quasi-phase matching. The corresponding single photon coupling rate $g/2π$ is estimated to be 1.2 MHz, which is an important milestone as it approaches the dissipation rate $κ/2π$ of the best available lithium niobate microresonators developed in the community. Using a figure of merit defined as $g/κ$, our devices reach a single photon nonlinearity approaching 1%. We show that, by further scaling of the device, it is possible to improve the single photon nonlinearity to a regime where the photon-blockade effect can be manifested.

preprint2020arXiv

TreeRNN: Topology-Preserving Deep Graph Embedding and Learning

General graphs are difficult to learn from due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective for the integration of such local patterns into global features. In contrast, in this paper we study methods to transform the graphs into trees so that explicit orders are learned to direct the feature integration from local to global. To this end, we apply breadth first search (BFS) to construct trees from the graphs, which adds direction to the graph edges from the center node to the peripheral nodes. In addition, we propose a novel projection scheme that transforms the trees into image representations, which are suitable for conventional convolutional neural networks (CNNs) and recurrent neural networks (RNNs). To best learn the patterns from the graph-tree-images, we propose TreeRNN, a 2D RNN architecture that recurrently integrates the image pixels by rows and columns to help classify the graph categories. We evaluate the proposed method on several graph classification datasets, and demonstrate accuracy comparable with the state of the art on the MUTAG, PTC-MR and NCI1 datasets.
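
The BFS step mentioned in this abstract, turning an undirected graph into a tree whose edges point from a center node toward the periphery, can be sketched as below. The adjacency-list format, the function name `bfs_tree`, and the choice of node 0 as the center are assumptions for illustration; the abstract does not say how the center node is selected.

```python
from collections import deque

def bfs_tree(adjacency, center):
    """Return ({node: parent}, {node: depth}) for the BFS tree rooted at center."""
    parent = {center: None}
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:        # first visit fixes the tree edge
                parent[neighbor] = node       # edge is directed node -> neighbor
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return parent, depth

# Small undirected graph with a triangle (0-1-2) and a pendant node 3 (toy example).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
parent, depth = bfs_tree(graph, center=0)
```

The non-tree edge 1-2 is dropped here, and the `depth` values give the local-to-global ordering along which features could be integrated.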

preprint2020arXiv

Tune-out and magic wavelengths for ground-state $^{23}$Na$^{40}$K molecules

We demonstrate a versatile, rotational-state dependent trapping scheme for the ground and first excited rotational states of $^{23}$Na$^{40}$K molecules. Close to the rotational manifold of a narrow electronic transition, we determine tune-out frequencies where the polarizability of one state vanishes while the other remains finite, and a magic frequency where both states experience equal polarizability. The proximity of these frequencies of only 10 GHz allows for dynamic switching between different trap configurations in a single experiment, while still maintaining sufficiently low scattering rates.

preprint2020arXiv

Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data

A number of researchers have recently questioned the necessity of increasingly complex neural network (NN) architectures. In particular, several recent papers have shown that simpler, properly tuned models are at least competitive across several NLP tasks. In this work, we show that this is also the case for text generation from structured and unstructured data. We consider neural table-to-text generation and neural question generation (NQG) tasks for text generation from structured and unstructured data, respectively. Table-to-text generation aims to generate a description based on a given table, and NQG is the task of generating a question from a given passage where the generated question can be answered by a certain sub-span of the passage using NN models. Experimental results demonstrate that a basic attention-based seq2seq model trained with the exponential moving average technique achieves the state of the art in both tasks. Code is available at https://github.com/h-shahidi/2birds-gen.
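
The exponential moving average technique named in this abstract keeps a smoothed "shadow" copy of the model parameters that is typically used at evaluation time. A minimal sketch follows, with parameters represented as a flat list of floats; the function name `ema_update` and the decay values are illustrative assumptions, not settings taken from the paper.

```python
def ema_update(shadow, params, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]

# Toy run: the "trained" parameters stay fixed at [1.0, 2.0] for three steps.
shadow = [0.0, 0.0]
for params in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    shadow = ema_update(shadow, params, decay=0.5)
```

With a decay close to 1 the shadow weights change slowly and average over many optimizer steps, which is the usual reason for applying the technique.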

preprint2020arXiv

Variational Quantum Algorithms for Dimensionality Reduction and Classification

In this work, we present a quantum neighborhood preserving embedding and a quantum local discriminant embedding for dimensionality reduction and classification. We demonstrate that these two algorithms have an exponential speedup over their respective classical counterparts. Along the way, we propose a variational quantum generalized eigenvalue solver that finds the generalized eigenvalues and eigenstates of a matrix pencil $(\mathcal{G},\mathcal{S})$. As a proof-of-principle, we implement our algorithm to solve $2^5\times2^5$ generalized eigenvalue problems. Finally, our results offer two optional outputs in quantum or classical form, which can be directly applied in another quantum or classical machine learning process.

preprint2020arXiv

Wireless Federated Learning with Local Differential Privacy

In this paper, we study the problem of federated learning (FL) over a wireless channel, modeled by a Gaussian multiple access channel (MAC), subject to local differential privacy (LDP) constraints. We show that the superposition nature of the wireless channel provides a dual benefit of bandwidth efficient gradient aggregation, in conjunction with strong LDP guarantees for the users. We propose a private wireless gradient aggregation scheme, which shows that when aggregating gradients from $K$ users, the privacy leakage per user scales as $\mathcal{O}\big(\frac{1}{\sqrt{K}} \big)$ compared to orthogonal transmission in which the privacy leakage scales as a constant. We also present analysis for the convergence rate of the proposed private FL aggregation algorithm and study the tradeoffs between wireless resources, convergence, and privacy.

preprint2020arXiv

Within-sample variability-invariant loss for robust speaker recognition under noisy environments

Despite the significant improvements in speaker recognition enabled by deep neural networks, unsatisfactory performance persists under noisy environments. In this paper, we train the speaker embedding network to learn the "clean" embedding of the noisy utterance. Specifically, the network is trained with the original speaker identification loss together with an auxiliary within-sample variability-invariant loss. This auxiliary loss encourages the network to produce the same embedding for the clean utterance and its noisy copies, and prevents the network from encoding the undesired noises or variabilities into the speaker representation. Furthermore, we investigate the data preparation strategy for generating clean and noisy utterance pairs on the fly. The strategy generates different noisy copies for the same clean utterance at each training step, helping the speaker embedding network generalize better under noisy environments. Experiments on VoxCeleb1 indicate that the proposed training framework improves the performance of the speaker verification system in both clean and noisy conditions.
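
The training objective described in this abstract combines a speaker identification loss with an auxiliary term that pulls the noisy-utterance embedding toward the clean-utterance embedding. A minimal sketch of such a combined objective follows. The choice of mean squared error as the distance, the `weight` hyperparameter, and all function names are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Speaker identification loss for a single utterance."""
    shifted = logits - logits.max()                  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def variability_invariant_loss(clean_emb, noisy_emb):
    """Assumed distance: mean squared error between the two embeddings."""
    return float(np.mean((clean_emb - noisy_emb) ** 2))

def total_loss(logits, label, clean_emb, noisy_emb, weight=1.0):
    """Identification loss plus the weighted auxiliary invariance term."""
    return (softmax_cross_entropy(logits, label)
            + weight * variability_invariant_loss(clean_emb, noisy_emb))

# Toy values (hypothetical): 4 speakers, 3-dimensional embeddings.
logits = np.zeros(4)
clean = np.array([1.0, 0.0, -1.0])
noisy = np.array([1.0, 2.0, -1.0])

loss_matched = total_loss(logits, 0, clean, clean)   # auxiliary term is zero
loss_noisy = total_loss(logits, 0, clean, noisy)     # auxiliary term is positive
```

In an on-the-fly setup of the kind the abstract describes, `noisy` would be recomputed from a freshly corrupted copy of the clean utterance at every training step.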

preprint2019arXiv

Feshbach resonances in $p$-wave three-body recombination within Fermi-Fermi mixtures of open-shell $^6$Li and closed-shell $^{173}$Yb atoms

We report on observations and modeling of interspecies magnetic Feshbach resonances in dilute ultracold mixtures of open-shell alkali-metal $^6$Li and closed-shell $^{173}$Yb atoms with temperatures just above quantum degeneracy for both fermionic species. Resonances are located by detecting magnetic-field-dependent atom loss due to three-body recombination. We resolve closely-located resonances that originate from a weak separation-dependent hyperfine coupling between the electronic spin of $^6$Li and the nuclear spin of $^{173}$Yb, and confirm their magnetic field spacing by ab initio electronic-structure calculations. Through quantitative comparisons of theoretical atom-loss profiles and experimental data at various temperatures between 1 $μ$K and 20 $μ$K, we show that three-body recombination in fermionic mixtures has a $p$-wave Wigner threshold behavior leading to characteristic asymmetric loss profiles. Such resonances can be applied towards the formation of ultracold doublet ground-state molecules and quantum simulation of superfluid $p$-wave pairing.

preprint2019arXiv

Onionchain: Towards Balancing Privacy and Traceability of Blockchain-Based Applications

With the popularity of Blockchain come grave security-related concerns. Achieving privacy and traceability simultaneously remains an open question. Efforts have been made to address these issues, but they may be limited to specific scenarios. This paper studies how to provide a more general solution to this open question. Concretely, we propose Onionchain, featuring a suite of protocols that offer both traceability and privacy. As the name implies, Onionchain is inspired by Onion routing. We investigate the principles of Onion routing carefully and integrate its mechanism with Blockchain technology. We encourage the Blockchain community to adopt Onionchain for privacy and traceability. To this end, we propose a case study of Onionchain in the context of Vehicular Ad Hoc Networks (VANETs), providing the community with a guideline to follow. Systematic security analysis and extensive experiments are also conducted to validate our secure and cost-effective Onionchain.

preprint2019arXiv

Photon-photon quantum phase gate in a photonic molecule with $χ^{(2)}$ nonlinearity

The construction of a photon-photon quantum phase gate based on photonic nonlinearity has long been a fundamental issue, one that is vital for deterministic and scalable photonic quantum information processing. It requires not only strong nonlinear interaction at the single-photon level, but also suppressed phase noise and spectral entanglement for high gate fidelity. In this paper, we propose that a high-quality-factor microcavity with strong $χ^{(2)}$ nonlinearity can be quantized to anharmonic energy levels and be effectively treated as an artificial atom. Such an artificial atom has a size much larger than the photon wavelength, which enables passive and active ultra-strong coupling to traveling photons. A high-fidelity quantum control-phase gate is realized by mediating the phase between photons with an intermediate artificial atom in a photonic molecule structure. The scheme avoids two-photon emission and thus eliminates the spectral entanglement and quantum phase noise. Experimental realization of the artificial atom can be envisioned on an integrated photonic chip and holds great potential for single-emitter-free, room-temperature quantum information processing.

preprint2018arXiv

Secure Retrospective Interference Alignment

In this paper, the $K$-user interference channel with secrecy constraints is considered with delayed channel state information at transmitters (CSIT). We propose a novel secure retrospective interference alignment scheme in which the transmitters carefully mix information symbols with artificial noises to ensure confidentiality. Achieving positive secure degrees of freedom (SDoF) is challenging due to the delayed nature of CSIT and the distributed nature of the transmitters. Our scheme works over two phases. In the first phase, each transmitter sends information symbols mixed with artificial noises and repeats such transmission over multiple rounds. In the second phase, each transmitter uses delayed CSIT of the previous phase and sends a function of the net interference and artificial noises (generated in the previous phase), which is simultaneously useful for all receivers. These phases are designed to ensure the decodability of the desired messages while satisfying the secrecy constraints. We present our achievable scheme for three models, namely: 1) $K$-user interference channel with confidential messages (IC-CM), and we show that $\frac{1}{2} (\sqrt{K} -6) $ SDoF is achievable, 2) $K$-user interference channel with an external eavesdropper (IC-EE), and 3) $K$-user IC with confidential messages and an external eavesdropper (IC-CM-EE). We show that for the $K$-user IC-EE, $\frac{1}{2} (\sqrt{K} -3) $ SDoF is achievable, and for the $K$-user IC-CM-EE, $\frac{1}{2} (\sqrt{K} -6) $ is achievable. To the best of our knowledge, this is the first result on the $K$-user interference channel with secrecy-constrained models and delayed CSIT that achieves an SDoF which scales with $K$, the number of users.
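
The three achievable SDoF expressions quoted in this abstract can be evaluated directly to see how they scale with the number of users. The sketch below only evaluates the stated formulas; the function name `achievable_sdof` and the dictionary of offsets are illustrative conveniences, not notation from the paper.

```python
import math

# Offsets taken from the expressions (1/2) * (sqrt(K) - c) stated in the abstract.
SDOF_OFFSETS = {"IC-CM": 6, "IC-EE": 3, "IC-CM-EE": 6}

def achievable_sdof(num_users, model):
    """Evaluate (1/2) * (sqrt(K) - c) for the given channel model."""
    return 0.5 * (math.sqrt(num_users) - SDOF_OFFSETS[model])

table = {
    k: {model: achievable_sdof(k, model) for model in SDOF_OFFSETS}
    for k in (49, 100, 400)
}
```

As plain arithmetic on these expressions, the value is positive only when $\sqrt{K}$ exceeds the offset, that is for $K > 36$ in the IC-CM and IC-CM-EE expressions and for $K > 9$ in the IC-EE expression.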