Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
43 works
0 followers
21 topics
4 close collaborators

Published work

43 published items

preprint, 2024, arXiv

Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. We unveil several key and uncommon findings based on the designed probing tasks: (1) Horizontally, enlarging model sizes almost could not automatically impart additional knowledge or computational prowess. Instead, it can enhance reasoning abilities, especially in math problem solving, and helps reduce hallucinations, but only beyond certain size thresholds; (2) In vertical analysis, the lower layers of LLaMA lack substantial arithmetic and factual knowledge, showcasing logical thinking, multilingual and recognitive abilities, with top layers housing most computational power and real-world knowledge.

preprint, 2024, arXiv

Quark masses and low energy constants in the continuum from the tadpole improved clover ensembles

We present the light-flavor quark masses and low energy constants using the 2+1 flavor full-QCD ensembles with stout smeared clover fermion action and Symanzik gauge actions. Both the fermion and gauge actions are tadpole improved self-consistently. The simulations are performed on 11 ensembles at 3 lattice spacings $a\in[0.05,0.11]$ fm, 4 spatial sizes $L\in[2.5, 5.1]$ fm, 7 pion masses $m_π\in[135,350]$ MeV, and several values of the strange quark mass. The quark mass is defined through the partially conserved axial current (PCAC) relation and renormalized to $\overline{\mathrm{MS}}$ 2 GeV through the intermediate regularization independent momentum subtraction (RI/MOM) scheme. The systematic uncertainty of using the symmetric momentum subtraction (SMOM) scheme is also included. Eventually, we predict $m_u=2.45(22)(20)$ MeV, $m_d=4.74(11)(09)$ MeV, and $m_s=98.8(2.9)(4.7)$ MeV with the systematic uncertainties from lattice spacing determination, continuum extrapolation and renormalization constant included. We also obtain the chiral condensate $Σ^{1/3}=268.6(3.6)(0.7)$ MeV and the pion decay constant $F=86.6(7)(1.4) $ MeV in the $N_f=2$ chiral limit, and the next-to-leading order low energy constants $\ell_3=2.43(54)(05)$ and $\ell_4=4.322(75)(96)$.

preprint, 2024, arXiv

Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResNN) is proposed to automatically identify musical shapes. To assess the proposed approach in the context of piano musical shape evaluation, we have generated a new dataset, containing 4116 music pieces derived by 147 piano preparatory exercises and performed in 28 categories of musical shapes. The experimental results show that the S-ResNN significantly outperforms a number of benchmark methods in terms of the precision, recall and F1 score.

preprint, 2022, arXiv

$T_{cc}^{+}(3875)$ relevant $DD^*$ scattering from $N_f=2$ lattice QCD

The $S$-wave $DD^*$ scattering in the isospin $I=0,1$ channels is studied in $N_f=2$ lattice QCD at $m_π\approx 350$ MeV. It is observed that the $DD^*$ interaction is repulsive in the $I=1$ channel when the $DD^*$ energy is near the $DD^*$ threshold. In contrast, the $DD^*$ interaction in the $I=0$ channel is definitely attractive in a wide range of the $DD^*$ energy. This is consistent with the isospin assignment $I=0$ for $T_{cc}^+(3875)$. By analyzing the components of the $DD^*$ correlation functions, it turns out that the quark diagram responsible for the different properties of $I=0,1$ $DD^*$ interactions can be understood as the charged $ρ$ meson exchange effect. This observation provides direct information on the internal dynamics of $T_{cc}^+(3875)$.

preprint, 2022, arXiv

Annihilation diagram contribution to charmonium masses

In this work, we generate gauge configurations with $N_f=2$ dynamical charm quarks on anisotropic lattices. The mass shift of $1S$ and $1P$ charmonia owing to the charm quark annihilation effect can be investigated directly in a manner of unitary theory. The distillation method is adopted to treat the charm quark annihilation diagrams at a very precise level. For $1S$ charmonia, the charm quark annihilation effect barely changes the $J/ψ$ mass, but lifts the $η_c$ mass by approximately 3-4 MeV. For $1P$ charmonia, this effect results in positive mass shifts of approximately 1 MeV for $χ_{c1}$ and $h_c$, but decreases the $χ_{c2}$ mass by approximately 3 MeV. We have not obtained a reliable result for the mass shift of $χ_{c0}$. In addition, it is observed that the spin-averaged mass of the spin-triplet $1P$ charmonia is in good agreement with that of the $h_c$, as expected by the non-relativistic quark model and measured by experiments.
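The spin-averaged mass referred to at the end of this abstract is conventionally the $(2J+1)$-weighted mean of the $χ_{cJ}$ masses. The sketch below illustrates that weighting in Python. The input masses are approximate experimental values in MeV, supplied here only as illustrative assumptions; they are not results from the paper, and the function name is hypothetical.

```python
# Spin-averaged mass of the spin-triplet 1P charmonia, weighted by (2J + 1).
# The masses below are approximate experimental values in MeV, used only as
# illustrative inputs; they are assumptions, not numbers from the paper.
CHI_C_MASSES_MEV = {0: 3414.7, 1: 3510.7, 2: 3556.2}  # J -> mass of chi_cJ
H_C_MASS_MEV = 3525.4  # approximate h_c mass, also an illustrative input


def spin_averaged_mass(masses_by_j):
    """Return sum((2J + 1) * m_J) / sum(2J + 1) over the given states."""
    total_weight = sum(2 * j + 1 for j in masses_by_j)
    weighted_sum = sum((2 * j + 1) * mass for j, mass in masses_by_j.items())
    return weighted_sum / total_weight


triplet_average = spin_averaged_mass(CHI_C_MASSES_MEV)
splitting = triplet_average - H_C_MASS_MEV
print(f"spin-averaged 1P triplet mass: {triplet_average:.1f} MeV")
print(f"difference from h_c: {splitting:+.1f} MeV")
```

With these rounded inputs the weighted mean and the $h_c$ mass differ by well under 1 MeV, which is the kind of agreement the abstract describes.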

preprint, 2022, arXiv

Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling

Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks (xSL), such as cross-lingual machine reading comprehension (xMRC) by transferring knowledge from a high-resource language to low-resource languages. Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages: e.g., mask language modeling objective requires local understanding of the masked token and the span-extraction objective requires global understanding and reasoning of the input passage/paragraph and question, leading to the discrepancy between pre-training and xMRC. In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap in a self-supervised manner. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences via unsupervised cross-lingual instance-wise training signals during pre-training. By these means, our methods not only bridge the gap between pretrain-finetune, but also enhance PLMs to better capture the alignment between different languages. Extensive experiments prove that our method achieves clearly superior results on multiple xSL benchmarks with limited pre-training data. Our methods also surpass the previous state-of-the-art methods by a large margin in few-shot data settings, where only a few hundred training examples are available.

preprint, 2022, arXiv

Multi-View Document Representation Learning for Open-Domain Dense Retrieval

Dense retrieval has achieved impressive advances in first-stage retrieval from a large-scale document collection, which is built on bi-encoder architecture to produce single vector representation of query and document. However, a document can usually answer multiple potential queries from different views. So the single vector representation of a document is hard to match with multi-view queries, and faces a semantic mismatch problem. This paper proposes a multi-view document representation learning framework, aiming to produce multi-view embeddings to represent documents and enforce them to align with different queries. First, we propose a simple yet effective method of generating multiple embeddings through viewers. Second, to prevent multi-view embeddings from collapsing to the same one, we further propose a global-local loss with annealed temperature to encourage the multiple viewers to better align with different potential queries. Experiments show our method outperforms recent works and achieves state-of-the-art results.

preprint, 2022, arXiv

Negative Sampling for Contrastive Representation Learning: A Review

The learn-to-compare paradigm of contrastive representation learning (CRL), which compares positive samples with negative ones for representation learning, has achieved great success in a wide range of domains, including natural language processing, computer vision, information retrieval and graph learning. While many research works focus on data augmentations, nonlinear transformations or other certain parts of CRL, the importance of negative sample selection is usually overlooked in literature. In this paper, we provide a systematic review of negative sampling (NS) techniques and discuss how they contribute to the success of CRL. As the core part of this paper, we summarize the existing NS methods into four categories with pros and cons in each genre, and further conclude with several open research questions as future directions. By generalizing and aligning the fundamental NS ideas across multiple domains, we hope this survey can accelerate cross-domain knowledge sharing and motivate future researches for better CRL.

preprint, 2022, arXiv

Realization of fast all-microwave CZ gates with a tunable coupler

The development of high-fidelity two-qubit quantum gates is essential for digital quantum computing. Here, we propose and realize all-microwave parametric Controlled-Z (CZ) gates by coupling strength modulation in a superconducting Transmon qubit system with tunable couplers. After optimizing the design of the tunable coupler together with the control pulse numerically, we experimentally realized a 100 ns CZ gate with high fidelity of $99.38\% \pm 0.34\%$ and the control error being 0.1%. We note that our CZ gates are not affected by pulse distortion and do not need pulse correction, providing a solution for the real-time pulse generation in a dynamic quantum feedback circuit. With the expectation of utilizing our all-microwave control scheme to reduce the number of control lines through frequency multiplexing in the future, our scheme draws a blueprint for highly integrable quantum hardware design.

preprint, 2022, arXiv

Ruling out real-valued standard formalism of quantum theory

Standard quantum theory was formulated with complex-valued Schrodinger equations, wave functions, operators, and Hilbert spaces. Previous work attempted to simulate quantum systems using only real numbers by exploiting an enlarged Hilbert space. A fundamental question arises: are complex numbers really necessary in the standard formalism of quantum theory? To answer this question, a quantum game has been developed to distinguish standard quantum theory from its real-number analog by revealing a contradiction in the maximum game scores between a high-fidelity multi-qubit quantum experiment and players using only real-number quantum theory. Here, using superconducting qubits, we faithfully experimentally implement the quantum game based on entanglement swapping with a state-of-the-art fidelity of 0.952(1), which beats the real-number bound of 7.66 by 43 standard deviations. Our results disprove the real-number formulation and establish the indispensable role of complex numbers in the standard quantum theory.

preprint, 2022, arXiv

Scaling of finite size effect of $α$-Rényi entropy in disjointed intervals under dilation

The $α$-Rényi entropy in gapless models has been obtained from conformal field theory, which is exact in the thermodynamic limit. However, the calculation of its finite size effect (FSE) is challenging. So far only the FSE in a single interval in the XX model has been understood, and the FSE in other models and under other conditions is totally unknown. Here we report the FSE of this entropy in disjointed intervals $A = \cup_i A_i$ under a uniform dilation $λA$ in the XY model, showing a universal scaling law \begin{equation*} Δ_{λA}^{α} = Δ_A^{α} λ^{-η} \mathcal{B}(A, λ), \end{equation*} where $|\mathcal{B}(A, λ)| \le 1$ is a bounded function and $η= \text{min}(2, 2/α)$ when $α< 10$. We verify this relation in the phase boundaries of the XY model, in which the different central charges correspond to the physics of free Fermion and free Boson models. We find that in the disjointed intervals, two FSEs, termed extrinsic FSE and intrinsic FSE, are required to fully account for the FSE of the entropy. Physically, we find that only the edge modes of the correlation matrix localized at the open ends $\partial A$ contribute to the total entropy and its FSE. Our results provide some incisive insight into the entanglement entropy in many-body systems.
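The scaling law quoted in this abstract can be read as an upper bound: since $|\mathcal{B}(A, λ)| \le 1$, the magnitude of the dilated FSE is at most $|Δ_A^{α}| λ^{-η}$. The Python sketch below evaluates the exponent $η = \min(2, 2/α)$ and that bound. The function names and sample inputs are hypothetical, and the restriction to positive $α$ is an assumption added here; the abstract itself only states $α < 10$.

```python
def fse_exponent(alpha):
    """Exponent eta = min(2, 2 / alpha), quoted in the abstract for alpha < 10.

    The lower limit alpha > 0 is an assumption of this sketch.
    """
    if not 0 < alpha < 10:
        raise ValueError("the scaling law is only quoted for alpha < 10")
    return min(2.0, 2.0 / alpha)


def fse_upper_bound(delta_a, dilation, alpha):
    """Bound |Delta_A| * dilation**(-eta) on |Delta_{lambda A}|, using |B| <= 1."""
    return abs(delta_a) * dilation ** (-fse_exponent(alpha))


# Illustrative inputs only: a reference FSE of 1e-3 and a dilation factor of 2.
for alpha in (0.5, 1.0, 2.0, 4.0):
    eta = fse_exponent(alpha)
    bound = fse_upper_bound(1e-3, 2.0, alpha)
    print(f"alpha={alpha}: eta={eta}, bound on dilated FSE={bound:.2e}")
```

For $α \le 1$ the exponent saturates at 2, while for larger $α$ it falls as $2/α$, so the bound on the FSE shrinks more slowly under dilation.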

preprint, 2022, arXiv

The Glueball content of $η_c$

We carry out the first lattice QCD derivation of the mixing energy and the mixing angle of the pseudoscalar charmonium and glueball on two gauge ensembles with $N_f=2$ degenerate dynamical charm quarks. The mixing energy is determined to be $49(6)$ MeV on the near physical charm ensemble, which seems insensitive to charm quark mass. By the assumption that $X(2370)$ is predominantly a pseudoscalar glueball, the mixing angle is determined to be approximately $4.6(6)^\circ$, which results in a $+3.9(9)$ MeV mass shift of the ground state pseudoscalar charmonium. In the mean time, the mixing can raise the total width of the pseudoscalar charmonium by 7.2(8) MeV, which explains to some extent the relative large total width of the $η_c$ meson. As a result, the branching fraction of $η_c\to γγ$ can be understood in this $c\bar{c}$-glueball mixing framework. On the other hand, the possible discrepancy of the theoretical predictions and the experimental results of the partial width of $J/ψ\toγη_c$ cannot be alleviated by the $c\bar{c}$-glueball mixing picture yet, which demands future precise experimental measurements of this partial width.

preprint, 2022, arXiv

Transformer-Empowered Content-Aware Collaborative Filtering

Knowledge graph (KG) based Collaborative Filtering is an effective approach to personalizing recommendation systems for relatively static domains such as movies and books, by leveraging structured information from KG to enrich both item and user representations. Motivated by the use of Transformers for understanding rich text in content-based filtering recommender systems, we propose Content-aware KG-enhanced Meta-preference Networks as a way to enhance collaborative filtering recommendation based on both structured information from KG as well as unstructured content features based on Transformer-empowered content-based filtering. To achieve this, we employ a novel training scheme, Cross-System Contrastive Learning, to address the inconsistency of the two very different systems and propose a powerful collaborative filtering model and a variant of the well-known NRMS system within this modeling framework. We also contribute to public domain resources through the creation of a large-scale movie-knowledge-graph dataset and an extension of the already public Amazon-Book dataset through incorporation of text descriptions crawled from external sources. We present experimental results showing that enhancing collaborative filtering with Transformer-based features derived from content-based filtering outperforms strong baseline systems, improving the ability of knowledge-graph-based collaborative filtering systems to exploit item content information.

preprint, 2022, arXiv

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

Recent research demonstrates the effectiveness of using pretrained language models (PLM) to improve dense retrieval and multilingual dense retrieval. In this work, we present a simple but effective monolingual pretraining task called contrastive context prediction (CCP) to learn sentence representation by modeling sentence level contextual relation. By pushing the embedding of sentences in a local context closer and pushing random negative samples away, different languages could form isomorphic structure, then sentence pairs in two different languages will be automatically aligned. Our experiments show that model collapse and information leakage are very easy to happen during contrastive training of language model, but language-specific memory bank and asymmetric batch normalization operation play an essential role in preventing collapsing and information leakage, respectively. Besides, a post-processing for sentence embedding is also very effective to achieve better retrieval performance. On the multilingual sentence retrieval task Tatoeba, our model achieves new SOTA results among methods without using bilingual data. Our model also shows larger gain on Tatoeba when transferring between non-English pairs. On two multi-lingual query-passage retrieval tasks, XOR Retrieve and Mr.TYDI, our model even achieves two SOTA results in both zero-shot and supervised setting among all pretraining models using bilingual data.

preprint, 2021, arXiv

Experimental exploration of five-qubit quantum error correcting code with superconducting qubits

Quantum error correction is an essential ingredient for universal quantum computing. Despite tremendous experimental efforts in the study of quantum error correction, to date, there has been no demonstration in the realisation of universal quantum error correcting code, with the subsequent verification of all key features including the identification of an arbitrary physical error, the capability for transversal manipulation of the logical state, and state decoding. To address this challenge, we experimentally realise the $[\![5,1,3]\!]$ code, the so-called smallest perfect code that permits corrections of generic single-qubit errors. In the experiment, having optimised the encoding circuit, we employ an array of superconducting qubits to realise the $[\![5,1,3]\!]$ code for several typical logical states including the magic state, an indispensable resource for realising non-Clifford gates. The encoded states are prepared with an average fidelity of $57.1(3)\%$ while with a high fidelity of $98.6(1)\%$ in the code space. Then, the arbitrary single-qubit errors introduced manually are identified by measuring the stabilizers. We further implement logical Pauli operations with a fidelity of $97.2(2)\%$ within the code space. Finally, we realise the decoding circuit and recover the input state with an overall fidelity of $74.5(6)\%$, in total with $92$ gates. Our work demonstrates each key aspect of the $[\![5,1,3]\!]$ code and verifies the viability of experimental realization of quantum error correcting codes with superconducting qubits.

preprint, 2021, arXiv

Floquet Prethermal Phase Protected by U(1) Symmetry on a Superconducting Quantum Processor

Periodically driven systems, or Floquet systems, exhibit many novel dynamics and interesting out-of-equilibrium phases of matter. Those phases arising with the quantum systems' symmetries, such as global $U(1)$ symmetry, can even show dynamical stability with symmetry-protection. Here we experimentally demonstrate a $U(1)$ symmetry-protected prethermal phase, via performing a digital-analog quantum simulation on a superconducting quantum processor. The dynamical stability of this phase is revealed by its robustness against external perturbations. We also find that the spin glass order parameter in this phase is stabilized by the interaction between the spins. Our work reveals a promising prospect in discovering emergent quantum dynamical phases with digital-analog quantum simulators.

preprint, 2021, arXiv

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filter words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin of 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU metrics. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.

preprint, 2021, arXiv

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Recently, universal neural machine translation (NMT) with shared encoder-decoder gained good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which is for a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in their size. However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing parameters and applying cross-attentions, we explore maximizing the representation universality and realizing the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves a competitive or better results than universal NMT and strong pivot baseline. Moreover, we experiment incrementally adding new language to the trained model by only updating the new model parameters. With this little effort, the zero-shot translation between this newly added language and existing languages achieves a comparable result with the model trained jointly from scratch on all languages.

preprint, 2021, arXiv

Observation of thermalization and information scrambling in a superconducting quantum processor

Understanding various phenomena in non-equilibrium dynamics of closed quantum many-body systems, such as quantum thermalization, information scrambling, and nonergodic dynamics, is crucial for modern physics. Using a ladder-type superconducting quantum processor, we perform analog quantum simulations of both the $XX$ ladder and one-dimensional (1D) $XX$ model. By measuring the dynamics of local observables, entanglement entropy and tripartite mutual information, we signal quantum thermalization and information scrambling in the $XX$ ladder. In contrast, we show that the $XX$ chain, as free fermions on a 1D lattice, fails to thermalize, and local information does not scramble in the integrable channel. Our experiments reveal ergodicity and scrambling in the controllable qubit ladder, and open the door to further investigations on the thermodynamics and chaos in quantum many-body systems.

preprint, 2021, arXiv

Quantum deleting and cloning in a pseudo-unitary system

In conventional quantum mechanics, quantum no-deleting and no-cloning theorems indicate that two different and nonorthogonal states cannot be perfectly and deterministically deleted and cloned, respectively. Here, we investigate the quantum deleting and cloning in a pseudo-unitary system. We first present a pseudo-Hermitian Hamiltonian with real eigenvalues in a two-qubit system. By using the pseudo-unitary operators generated from this pseudo-Hermitian Hamiltonian, we show that it is possible to delete and clone a class of two different and nonorthogonal states, and it can be generalized to arbitrary two different and nonorthogonal pure qubit states. Furthermore, state discrimination, which is strongly related to quantum no-cloning theorem, is also discussed. Last but not least, we simulate the pseudo-unitary operators in conventional quantum mechanics with post-selection, and obtain the success probability of simulations. Pseudo-unitary operators are implemented with a limited efficiency due to the post-selections. Thus, the success probabilities of deleting and cloning in the simulation by conventional quantum mechanics are less than unity, which maintain the quantum no-deleting and no-cloning theorems.

preprint, 2021, arXiv

Unveiling non-Abelian statistics of vortex Majorana bound states in iron-based superconductors using fermionic modes

Motivated by the recent experiments that reported the discovery of vortex Majorana bound states (vMBSs) in iron-based superconductors, we establish a portable scheme to unveil the non-Abelian statistics of vMBSs using normal fermionic modes. The unique non-Abelian statistics of vMBSs is characterized by the charge flip signal of the fermions that can be easily read out through the charge sensing measurement. In particular, the charge flip signal will be significantly suppressed for strong hybridized vMBSs or trivial vortex modes, which efficiently identifies genuine vMBSs. To eliminate the error induced by the unnecessary dynamical evolution of the fermionic modes, we further propose a correction strategy by continually reversing the energy of the fermions, reminiscent of the quantum Zeno effect. Finally, we establish a feasible protocol to perform non-Abelian braiding operations on vMBSs.

preprint, 2020, arXiv

A Lattice Study of the Two-photon Decay Widths for Scalar and Pseudo-scalar Charmonium

In this exploratory study, two-photon decay widths of pseudo-scalar ($η_c$) and scalar ($χ_{c0}$) charmonium are computed using two ensembles of $N_f=2$ twisted mass lattice QCD gauge configurations. The simulation is performed on two lattice ensembles with lattice spacings $a=0.067$ fm with size $32^3\times{64}$ and $a=0.085$ fm with size $24^3\times{48}$, respectively. The decay widths obtained for the two charmonia are in the right ballpark but smaller than the experimental ones. Possible reasons for these discrepancies are discussed.

preprint, 2020, arXiv

Charmed and $ϕ$ meson decay constants from 2+1-flavor lattice QCD

On a lattice with 2+1-flavor dynamical domain-wall fermions at the physical pion mass, we calculate the decay constants of $D_{s}^{(*)}$, $D^{(*)}$ and $ϕ$. The lattice size is $48^3\times96$, which corresponds to a spatial extension of $\sim5.5$ fm with the lattice spacing $a\approx 0.114$ fm. For the valence light, strange and charm quarks, we use overlap fermions at several mass points close to their physical values. Our results at the physical point are $f_D=213(5)$ MeV, $f_{D_s}=249(7)$ MeV, $f_{D^*}=234(6)$ MeV, $f_{D_s^*}=274(7)$ MeV, and $f_ϕ=241(9)$ MeV. The couplings of $D^*$ and $D_s^*$ to the tensor current ($f_V^T$) can be derived, respectively, from the ratios $f_{D^*}^T/f_{D^*}=0.91(4)$ and $f_{D_s^*}^T/f_{D_s^*}=0.92(4)$, which are the first lattice QCD results. We also obtain the ratios $f_{D^*}/f_D=1.10(3)$ and $f_{D_s^*}/f_{D_s}=1.10(4)$, which reflect the size of heavy quark symmetry breaking in charmed mesons. The ratios $f_{D_s}/f_{D}=1.16(3)$ and $f_{D_s^*}/f_{D^*}=1.17(3)$ can be taken as a measure of SU(3) flavor symmetry breaking.
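As a quick consistency check on the numbers in this abstract, the quoted ratios can be compared with naive ratios of the quoted central values. The Python sketch below does only that. It ignores correlations and error propagation, which the paper's own determination of the ratios presumably includes, so it is an arithmetic illustration and not a reproduction of the analysis; the dictionary and function names are hypothetical.

```python
# Central values of the decay constants (MeV) as quoted in the abstract.
DECAY_CONSTANTS_MEV = {"D": 213.0, "Ds": 249.0, "D*": 234.0, "Ds*": 274.0}

# Ratios quoted in the abstract: (numerator, denominator) -> (value, uncertainty).
QUOTED_RATIOS = {
    ("D*", "D"): (1.10, 0.03),
    ("Ds*", "Ds"): (1.10, 0.04),
    ("Ds", "D"): (1.16, 0.03),
    ("Ds*", "D*"): (1.17, 0.03),
}


def naive_ratio(numerator, denominator):
    """Ratio of central values only; no correlations or error propagation."""
    return DECAY_CONSTANTS_MEV[numerator] / DECAY_CONSTANTS_MEV[denominator]


for (num, den), (quoted, err) in QUOTED_RATIOS.items():
    ratio = naive_ratio(num, den)
    status = "within" if abs(ratio - quoted) <= err else "outside"
    print(f"f_{num}/f_{den}: naive {ratio:.3f}, quoted {quoted:.2f}({err:.2f}), {status} quoted uncertainty")
```

The first two ratios probe heavy quark symmetry breaking (vector versus pseudoscalar), and the last two probe SU(3) flavor symmetry breaking (strange versus light), matching the interpretation given in the abstract.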

preprint, 2020, arXiv

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. We develop CodeBERT with Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.

preprint, 2020, arXiv

Emulating quantum teleportation of a Majorana zero mode qubit

Topological quantum computation based on anyons is a promising approach to achieve fault-tolerant quantum computing. The Majorana zero modes in the Kitaev chain are an example of non-Abelian anyons where braiding operations can be used to perform quantum gates. Here we perform a quantum simulation of topological quantum computing, by teleporting a qubit encoded in the Majorana zero modes of a Kitaev chain. The quantum simulation is performed by mapping the Kitaev chain to its equivalent spin version, and realizing the ground states in a superconducting quantum processor. The teleportation transfers the quantum state encoded in the spin-mapped version of the Majorana zero mode states between two Kitaev chains. The teleportation circuit is realized using only braiding operations, and can be achieved despite being restricted to Clifford gates for the Ising anyons. The Majorana encoding is a quantum error detecting code for phase flip errors, which is used to improve the average fidelity of the teleportation for six distinct states from $70.76 \pm 0.35 \% $ to $84.60 \pm 0.11 \%$, well beyond the classical bound in either case.

preprint, 2020, arXiv

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages. However, the transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than sentence classification tasks mainly due to the requirement of MRC to detect the word level answer boundary. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) A mixed MRC task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs; (2) A language-agnostic knowledge masking task by leveraging knowledge phrases mined from web. Besides, extensive experiments on two cross-lingual MRC datasets show the effectiveness of our proposed approach.

preprint2020arXiv

Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering

Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent works either learn to generate evidence from human-annotated evidence, which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases, which fails to take advantage of both sources. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both a structured knowledge base (i.e., ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts a graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with a graph attention mechanism for predicting the final answer. Experimental results on the CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA leaderboard.

preprint2020arXiv

Improving Readability for Automatic Speech Recognition Transcription

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC) followed by text-to-speech (TTS) and ASR. Furthermore, we propose metrics borrowed from similar tasks to evaluate performance on the APR task. We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method. Our results suggest that finetuned models improve the performance on the APR task significantly, hinting at the potential benefits of using APR systems. We hope that the read, understand, and rewrite approach of our work can serve as a basis that many NLP tasks and human readers can benefit from.

preprint2020arXiv

Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning

We study the problem of generating inferential texts of events for a variety of commonsense relations such as \textit{if-else}. Existing approaches typically use limited evidence from training examples and learn each relation individually. In this work, we use multiple knowledge sources as fuel for the model. Existing commonsense knowledge bases like ConceptNet are dominated by taxonomic knowledge (e.g., \textit{isA} and \textit{relatedTo} relations) and contain only a limited amount of inferential knowledge. We use not only structured commonsense knowledge bases, but also natural language snippets from search-engine results. These sources are incorporated into a generative base model via a key-value memory network. In addition, we introduce a meta-learning based multi-task learning algorithm. For each targeted commonsense relation, we regard the learning of examples from other relations as the meta-training process, and the evaluation on examples from the targeted relation as the meta-test process. We conduct experiments on the Event2Mind and ATOMIC datasets. Results show that both the integration of multiple knowledge sources and the use of the meta-learning algorithm improve the performance.
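The meta-training and meta-test split described in this abstract can be pictured as a leave-one-relation-out partition. The sketch below is only an illustration of that data split, not the authors' training algorithm; the function name `meta_split`, the `(event, inference)` example format, and the toy examples are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical example format: (event text, inferred text).
Example = Tuple[str, str]
Labeled = Tuple[str, Example]  # (relation name, example)


def meta_split(
    data: Dict[str, List[Example]], target: str
) -> Tuple[List[Labeled], List[Labeled]]:
    """Split examples into meta-train (other relations) and meta-test (target)."""
    if target not in data:
        raise KeyError(f"unknown relation: {target}")
    meta_train = [
        (rel, ex) for rel, exs in data.items() if rel != target for ex in exs
    ]
    meta_test = [(target, ex) for ex in data[target]]
    return meta_train, meta_test


# Toy data, invented for illustration only.
toy = {
    "xIntent": [("X gives a gift", "to be kind")],
    "xReact": [("X wins the race", "happy")],
    "oReact": [("X helps Y", "grateful")],
}
train, test = meta_split(toy, "xReact")
print(len(train), len(test))
```

Repeating the call once per relation gives one meta-train and meta-test pair for each targeted relation.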

preprint2020arXiv

Lattice QCD package GWU-code and QUDA with HIP

The open source HIP platform for GPU computing provides a uniform framework that supports both NVIDIA and AMD GPUs, and also makes it possible to port CUDA code to HIP-compatible code. We present the porting progress on the Overlap fermion inverter (GWU-code) and on the general Lattice QCD inverter package QUDA. A manual for using QUDA on HIP and tips for porting general CUDA code to the HIP framework are also provided.

preprint2020arXiv

LogicalFactChecker: Leveraging Logical Operations for Fact Checking with Graph Module Network

Verifying the correctness of a textual statement requires not only semantic reasoning about the meaning of words, but also symbolic reasoning about logical operations like count, superlative, aggregation, etc. In this work, we propose LogicalFactChecker, a neural network approach capable of leveraging logical operations for fact checking. It achieves the state-of-the-art performance on TABFACT, a large-scale, benchmark dataset built for verifying a textual statement with semi-structured tables. This is achieved by a graph module network built upon the Transformer-based architecture. With a textual statement and a table as the input, LogicalFactChecker automatically derives a program (a.k.a. logical form) of the statement in a semantic parsing manner. A heterogeneous graph is then constructed to capture not only the structures of the table and the program, but also the connections between inputs with different modalities. Such a graph reveals the related contexts of each word in the statement, the table and the program. The graph is used to obtain graph-enhanced contextual representations of words in Transformer-based architecture. After that, a program-driven module network is further introduced to exploit the hierarchical structure of the program, where semantic compositionality is dynamically modeled along the program structure with a set of function-specific modules. Ablation experiments suggest that both the heterogeneous graph and the module network are important to obtain strong results.

preprint2020arXiv

Mining Implicit Relevance Feedback from User Behavior for Web Question Answering

Training and refreshing a web-scale Question Answering (QA) system for a multi-lingual commercial search engine often requires a huge amount of training examples. One principled idea is to mine implicit relevance feedback from user behavior recorded in search engine logs. All previous works on mining implicit relevance feedback target the relevance of web documents rather than passages. Due to several unique characteristics of QA tasks, the existing user behavior models for web documents cannot be applied to infer passage relevance. In this paper, we present the first study to explore the correlation between user behavior and passage relevance, and propose a novel approach for mining training data for Web QA. We conduct extensive experiments on four test datasets and the results show our approach significantly improves the accuracy of passage ranking without extra human-labeled data. In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine, especially for languages with low resources. Our techniques have been deployed in multi-language services.

preprint2020arXiv

Non-Abelian Aharonov-Bohm Caging in Photonic Lattices

Aharonov-Bohm (AB) caging is the localization effect in translationally invariant lattices due to destructive interference induced by penetrating magnetic fields. While current research focuses mainly on the case of Abelian AB caging, here we go beyond and develop the non-Abelian AB caging concept by considering the particle localization in a 1D multi-component rhombic lattice with a non-Abelian background gauge field. In contrast to its Abelian counterpart, the non-Abelian AB cage depends on both the form of the nilpotent interference matrix and the initial state of the lattice. This phenomenon is the consequence of the non-Abelian nature of the gauge potential and thus has no Abelian analog. We further propose a circuit quantum electrodynamics realization of the proposed physics, in which the required non-Abelian gauge field can be synthesized by the parametric conversion method, and the non-Abelian AB caging can be unambiguously demonstrated through the pumping and the steady-state measurements of only a few sites on the lattice. Requiring only currently available techniques, our proposal can be readily tested in experiment and may pave a new route towards the investigation of exotic photonic quantum fluids.

preprint2020arXiv

Pre-training Text Representations as Meta Learning

Pre-training text representations has recently been shown to significantly improve the state-of-the-art in many natural language processing tasks. The central goal of pre-training is to learn text representations that are useful for subsequent tasks. However, existing approaches are optimized by minimizing a proxy objective, such as the negative log likelihood of language modeling. In this work, we introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps. The standard multi-task learning objective adopted in BERT is a special case of our learning algorithm where the depth of meta-train is zero. We study the problem in two settings: unsupervised pre-training and supervised pre-training with different pre-training objects to verify the generality of our approach. Experimental results show that our algorithm brings improvements and learns better initializations for a variety of downstream tasks.

preprint2020arXiv

Roper State from Overlap Fermions

The Roper state is extracted with valence overlap fermions on a $2+1$-flavor domain-wall fermion lattice (spacing $a = 0.114$ fm and $m_π = 330$ MeV) using both the Sequential Empirical Bayes (SEB) method and the variational method. The results are consistent, provided that a large smearing-size interpolation operator is included in the variational calculation to have better overlap with the lowest radial excitation. Similar calculations carried out for an anisotropic clover lattice with similar parameters find the Roper $\approx 280$ MeV higher than that of the overlap fermion. The fact that the prediction of the Roper state by overlap fermions is consistently lower than those of clover fermions, chirally improved fermions, and twisted-mass fermions over a wide range of pion masses has been dubbed a "Roper puzzle." To understand the origin of this difference, we study the hairpin $Z$-diagram in the isovector scalar meson ($a_0$) correlator in the quenched approximation. Comparing the $a_0$ correlators for clover and overlap fermions, at a pion mass of 290 MeV, we find that the spectral weight of the ghost state with clover fermions is smaller than that of the overlap at $a = 0.12$ fm and $0.09$ fm, whereas the whole $a_0$ correlators of clover and overlap at $a = 0.06$ fm coincide within errors. This suggests that chiral symmetry is restored for clover at $a \le 0.06$ fm and that the Roper should come down at and below this $a$. We conclude that this work supports a resolution of the "Roper puzzle" due to $Z$-graph type chiral dynamics. This entails coupling to higher components in the Fock space (e.g. $Nπ$, $Nππ$ states) to induce the effective flavor-spin interaction between quarks as prescribed in the chiral quark model, resulting in the parity-reversal pattern as observed in the experimental excited states of $N, Δ$ and $Λ$.

preprint2020arXiv

Strangeonium-like hybrids on the lattice

The strangeonium-like $s\bar{s}g$ hybrids are investigated from lattice QCD in the quenched approximation. In the Coulomb gauge, spatially extended operators are constructed for $1^{--}$ and $(0,1,2)^{-+}$ states with the color octet $s\bar{s}$ component being separated from the chromomagnetic field strength by spatial distances $r$, whose matrix elements between the vacuum and the corresponding states are interpreted as Bethe-Salpeter (BS) wave functions. In each of the $(1,2)^{-+}$ channels, the masses and the BS wave functions are reliably derived. The $1^{-+}$ ground state mass is around 2.1-2.2 GeV, and that of $2^{-+}$ is around 2.3-2.4 GeV, while the masses of the first excited states are roughly 1.4 GeV higher. This mass splitting is much larger than the expectation of the phenomenological flux-tube model or constituent gluon model for hybrids, which is usually a few hundred MeV. The BS wave functions with respect to $r$ show clear radial nodal structures of a non-relativistic two-body system, which imply that $r$ is a meaningful dynamical variable for these hybrids and motivate a color halo picture of hybrids in which the color octet $s\bar{s}$ is surrounded by gluonic degrees of freedom. In the $1^{--}$ channel, the properties of the lowest two states comply with those of $ϕ(1020)$ and $ϕ(1680)$. We have not yet obtained convincing information relevant to $ϕ(2170)$; however, we argue that whether $ϕ(2170)$ is a conventional $s\bar{s}$ meson or a $s\bar{s}g$ hybrid within the color halo scenario, the ratio of partial decay widths $Γ(ϕη)$ and $Γ(ϕη')$ observed by BESIII can be understood by the mechanism of hadronic transition of a strangeonium-like meson along with the $η-η'$ mixing.

preprint2020arXiv

Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding

Question Aware Open Information Extraction (Question aware Open IE) takes a question and a passage as inputs and outputs an answer tuple which contains a subject, a predicate, and one or more arguments. Each field of the answer is a natural language word sequence extracted from the passage. The semi-structured answer has two advantages over a span answer: it is more readable and more falsifiable. There are two approaches to solving this problem. One is an extractive method which extracts candidate answers from the passage with an Open IE model and ranks them by matching with the question. It fully uses the passage information at the extraction step, but the extraction is independent of the question. The other is a generative method which uses a sequence-to-sequence model to generate answers directly. It takes the question and passage as input at the same time, but it generates the answer from scratch and does not exploit the fact that most of the answer words come from the passage. To guide the generation by the passage, we present a two-stage decoding model which contains a tagging decoder and a correction decoder. At the first stage, the tagging decoder tags keywords from the passage. At the second stage, the correction decoder generates answers based on the tagged keywords. Our model can be trained end-to-end although it has two stages. Compared to previous generative models, we generate better answers by generating from coarse to fine. We evaluate our model on WebAssertions (Yan et al., 2018), which is a Question aware Open IE dataset. Our model achieves a BLEU score of 59.32, which is better than previous generative methods.

preprint2020arXiv

TGGLines: A Robust Topological Graph Guided Line Segment Detector for Low Quality Binary Images

Line segment detection is an essential task in computer vision and image analysis, as it is the critical foundation for advanced tasks such as shape modeling and road lane line detection for autonomous driving. We present a robust topological graph guided approach for line segment detection in low quality binary images (hence, we call it TGGLines). Due to the graph-guided approach, TGGLines not only detects line segments, but also organizes the segments with a line segment connectivity graph, which means the topological relationships (e.g., intersection, an isolated line segment) of the detected line segments are captured and stored; whereas other line detectors only retain a collection of loose line segments. Our empirical results show that the TGGLines detector visually and quantitatively outperforms state-of-the-art line segment detection methods. In addition, our TGGLines approach has the following two competitive advantages: (1) our method only requires one parameter and it is adaptive, whereas almost all other line segment detection methods require multiple (non-adaptive) parameters, and (2) the line segments detected by TGGLines are organized by a line segment connectivity graph.
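To make the idea of a line segment connectivity graph concrete, here is a minimal sketch of one possible data structure. It is not the TGGLines algorithm: it assumes segments are already detected and given as pairs of integer endpoints, and it treats two segments as connected only when they share an endpoint. The function name and the toy segments are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]


def connectivity_graph(segments: List[Segment]) -> Dict[int, Set[int]]:
    """Map each segment index to the indices of segments sharing an endpoint."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(segments))}
    by_endpoint: Dict[Point, List[int]] = defaultdict(list)
    for i, (a, b) in enumerate(segments):
        by_endpoint[a].append(i)
        by_endpoint[b].append(i)
    for ids in by_endpoint.values():
        for i, j in combinations(ids, 2):
            if i != j:  # skip degenerate segments whose endpoints coincide
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Toy input: two segments meeting at (2, 0) and one isolated segment.
segs = [((0, 0), (2, 0)), ((2, 0), (2, 3)), ((5, 5), (6, 6))]
graph = connectivity_graph(segs)
print(graph)
```

An isolated segment shows up as a node with an empty neighbor set, which is the kind of topological relationship a loose collection of segments would not record.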

preprint2020arXiv

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (2) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model, Unicoder (Huang et al., 2019), to cover both understanding and generation tasks, and evaluate it on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.

preprint2019arXiv

A coupled-channel lattice study on the resonance-like structure $Z_c(3900)$

In this exploratory study, near-threshold scattering of $D$ and $\bar{D}^*$ mesons is investigated using lattice QCD with $N_f=2+1+1$ twisted mass fermion configurations. The calculation is performed within the coupled-channel Lüscher's finite-size formalism. The study focuses on the channel with $I^G(J^{PC})=1^+(1^{+-})$ where the resonance-like structure $Z_c(3900)$ was discovered. We first identify the two most relevant channels of the problem and the lattice study is performed within the two-channel scattering model. Combined with a two-channel Ross-Shaw theory, scattering parameters are extracted from the energy levels by solving the generalized eigenvalue problem. Our results on the scattering length parameters suggest that, at the particular lattice parameters that we studied, the best fitted parameters do not correspond to a peak behavior in the elastic scattering cross section near the threshold. Furthermore, within the zero-range Ross-Shaw theory, the scenario of a narrow resonance close to the threshold is disfavored beyond the $3σ$ level.

preprint2019arXiv

Anomalous relaxation and multiply time scales in the quantum $XY$ model with boundary dissipation

The relaxation of many-body systems is still a challenging problem that is not well understood. In this work we exactly calculate the dynamics of the quantum $XY$ model with boundary dissipation, in which the density matrix in terms of Majorana operators can be decoupled into independent subspaces represented by different numbers of Majorana fermions. The relaxation is characterized by multiple time scales, and in the long-time limit it is determined by the single-particle relaxation process with a typical time scale $T^*$. For the bulk bands, we find $T^* \propto N^3/(γ n^2)$ in the weak dissipation limit and $T^* \propto γ N^3/n^2$ in the strong dissipation limit, where $N$ is the chain length, $γ$ is the dissipation rate and $n$ is the band index. For the edge modes, $T^* \propto 1/γ$, indicating that they are the most vulnerable to dissipation in the long chain limit. These results are counter-intuitive because they mean that any weak dissipation can induce relaxation, while strong dissipation can induce weak relaxation. We find that these two limits correspond to different physics, which are explained based on first- and second-order perturbation theory in an equivalent non-Hermitian model. Furthermore, we show that even in the long chain limit the relaxation may exhibit a strong odd-even effect. These results shed new light on the dynamics of topological qubits in an environment.
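The scaling laws quoted for the bulk bands can be explored numerically. The sketch below reads the weak-dissipation law as $T^* \propto N^3/(γ n^2)$ and sets every proportionality constant to 1, which is an assumption: the abstract gives only the scaling, so the returned numbers are meaningful as ratios, not as absolute times. The function name and the regime labels are hypothetical.

```python
def relaxation_time_scaling(N: int, gamma: float, n: int, regime: str) -> float:
    """Bulk-band T* up to an unknown prefactor (set to 1 here by assumption)."""
    if regime == "weak":
        return N**3 / (gamma * n**2)
    if regime == "strong":
        return gamma * N**3 / n**2
    raise ValueError("regime must be 'weak' or 'strong'")


# Doubling the chain length scales T* by 2**3 in either regime.
length_ratio = (
    relaxation_time_scaling(20, 0.5, 1, "weak")
    / relaxation_time_scaling(10, 0.5, 1, "weak")
)

# Doubling the dissipation rate shortens T* in the weak limit
# and lengthens it in the strong limit.
weak_ratio = (
    relaxation_time_scaling(10, 1.0, 1, "weak")
    / relaxation_time_scaling(10, 0.5, 1, "weak")
)
strong_ratio = (
    relaxation_time_scaling(10, 1.0, 1, "strong")
    / relaxation_time_scaling(10, 0.5, 1, "strong")
)
print(length_ratio, weak_ratio, strong_ratio)
```

The opposite response to γ in the two regimes is the behavior the abstract calls counter-intuitive: stronger dissipation slows the relaxation once the strong limit is reached.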

preprint2018arXiv

Anomalous isothermal compressibility in spin-orbit coupled degenerate Fermi gases

The spin-orbit coupling (SOC) in degenerate Fermi gases can fundamentally change the fate of $s$-wave superfluids with a strong Zeeman field and give rise to topological superfluids and associated Majorana zero modes. It also dramatically changes the thermodynamic properties of the superfluids. Here we report the anomalous isothermal compressibility $κ_T$ in these superfluids with both SOC and Zeeman field. We formulate this quantity from the Gibbs-Duhem equation and show that the contribution of $κ_T$ comes from the explicit contribution of the chemical potential and the implicit contribution of the order parameter. In the Bardeen-Cooper-Schrieffer (BCS) limit, this compressibility is determined by the density of states near the Fermi surface, while in the Bose-Einstein condensate (BEC) regime it is determined by the scattering length. Between these two limits, we find that the anomalous peaks can only be found in the gapless Weyl phase regime. This anomalous behavior can be regarded as a remnant effect of phase separation. Similar physics can also be found in the lattice model away from half filling. These predictions can be measured from the anomalous response of the sound velocity and the fluctuation of the carrier density.