Source author record

Yuan Cao

Yuan Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

56 works

26 topics

4 close collaborators

preprint, 2026, arXiv

Forbidden second harmonics in centrosymmetric bilayer crystals

Optical spectroscopy based on second-order nonlinearity is a critical technique for characterizing two-dimensional (2D) crystals as well as bioimaging and quantum optics. It is generally believed that second-harmonic generation (SHG) in centrosymmetric crystals, such as graphene and other bilayer 2D crystals, is negligible without externally breaking the inversion symmetry. Here, we show that with a new homodyne detection technique, we can apparently circumvent this symmetry-imposed constraint and observe robust SHG in pristine centrosymmetric crystals, without any symmetry-breaking field. With its exceptional sensitivity, we resolve polarization-resolved SHG in bilayer hexagonal boron nitride (h-BN), bilayer 2H-WSe$_2$, and remarkably, Bernal-stacked bilayer graphene, allowing us to unambiguously identify the crystallographic orientation in these crystals via SHG for the first time. We also demonstrate that the new technique can be used to non-invasively detect uniaxial strain and optical geometric phase in these crystals. The observed SHG in our experiments is attributed to second-order nonlinearity in the quadrupole channel, which is controlled by the presence of the $C_2$ symmetry instead of the inversion symmetry. Our new technique expands the capability of nonlinear optical spectroscopy to encompass a large class of centrosymmetric materials that could never be measured before, and can be used for quantum sensing of moiré materials and twisted epitaxial films.

preprint, 2026, arXiv

Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety

Ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests remains a significant challenge. While OpenAI introduces deliberative alignment (DA) to enhance the safety of its o-series models through reasoning over detailed "code-like" safety rules, the effectiveness of this approach in open-source LLMs, which typically lack advanced reasoning capabilities, is understudied. In this work, we systematically evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases. We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness, whereas training on case-augmented simple codes yields more robust and generalized safety behaviors. By guiding LLMs with case-augmented reasoning instead of extensive code-like safety rules, we avoid rigid adherence to narrowly enumerated rules and enable broader adaptability. Building on these insights, we propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains. CADA effectively enhances harmlessness, improves robustness against attacks, and reduces over-refusal while preserving utility across diverse benchmarks, offering a practical alternative to rule-only DA for improving safety while maintaining helpfulness.

preprint, 2026, arXiv

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer exactly performing one step of normalized gradient descent on an in-context loss. Then, we show that our constructed transformer can be obtained through (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees for the self-attention layer and out-of-distribution generalization guarantees for the looped model are provided. Our results advance the theoretical understanding of the ICL mechanism by showcasing how softmax transformers can effectively act as in-context learners.
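As a rough illustration of the update that each constructed layer is said to perform, the sketch below runs normalized gradient descent on a mean logistic loss in plain Python. It is not the paper's transformer construction: the function names, the step size `lr`, the step count and the toy data are all assumptions made for this example.

```python
import math


def logistic_loss_grad(w, xs, ys):
    """Gradient of the mean logistic loss (1/n) * sum log(1 + exp(-y * <w, x>))."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
        coeff = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            grad[j] += coeff * xj / n
    return grad


def normalized_gd(xs, ys, steps=20, lr=0.5):
    """Gradient descent where every step moves a fixed distance lr along -grad."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = logistic_loss_grad(w, xs, ys)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # stationary point: no direction to move in
        w = [wi - lr * gi / norm for wi, gi in zip(w, g)]
    return w


def predict(w, x):
    """Sign of the linear score, with ties mapped to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical toy data: labels in {+1, -1}, linearly separable.
xs = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]
w = normalized_gd(xs, ys)
```

Dividing by the gradient norm keeps the step length fixed even as the logistic gradient shrinks on well-separated data, which is the property that distinguishes this update from plain gradient descent.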

preprint, 2022, arXiv

Benign Overfitting in Two-layer Convolutional Neural Networks

Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving a good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged that studies "benign overfitting" from a theoretical perspective. However, these works are limited to linear models or kernel/random feature models, and there is still a lack of theoretical understanding about when and how benign overfitting occurs in neural networks. In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve a constant-level test loss. Together, these results demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.

preprint, 2022, arXiv

Building Machine Translation Systems for the Next Thousand Languages

In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.

preprint, 2022, arXiv

Description-Driven Task-Oriented Dialog Modeling

Task-oriented dialogue (TOD) systems are required to identify key information from conversations for the completion of given tasks. Such information is conventionally specified in terms of intents and slots contained in task-specific ontology or schemata. Since these schemata are designed by system developers, the naming convention for slots and intents is not uniform across tasks, and may not convey their semantics effectively. This can lead to models memorizing arbitrary patterns in data, resulting in suboptimal performance and generalization. In this paper, we propose that schemata should be modified by replacing names or notations entirely with natural language descriptions. We show that a language description-driven system exhibits better understanding of task specifications, higher performance on state tracking, improved data efficiency, and effective zero-shot transfer to unseen tasks. Following this paradigm, we present a simple yet effective Description-Driven Dialog State Tracking (D3ST) model, which relies purely on schema descriptions and an "index-picking" mechanism. We demonstrate the superiority of our approach in quality, data efficiency and robustness, as measured on the MultiWOZ (Budzianowski et al., 2018), SGD (Rastogi et al., 2020), and the recent SGD-X (Lee et al., 2021) benchmarks.

preprint, 2022, arXiv

Micius quantum experiments in space

Quantum theory has been successfully validated in numerous laboratory experiments. But would such a theory, which excellently describes the behavior of microscopic physical systems, and its predicted phenomena such as quantum entanglement, still be applicable on very large length scales? From a practical perspective, how can quantum key distribution -- where the security of establishing secret keys between distant parties is ensured by the laws of quantum mechanics -- be made technologically useful on a global scale? Due to photon loss in optical fibers and terrestrial free space, the achievable distance using direct transmission of single photons has been limited to a few hundred kilometers. A promising route to testing quantum physics over long distances and in the relativistic regimes, and thus realizing flexible global-scale quantum networks, is via the use of satellites and space-based technologies, where a significant advantage is that the photon loss and turbulence predominantly occur in the lower ~10 km of the atmosphere, and most of the photons' transmission path in space is virtually in vacuum with almost zero absorption and decoherence. In this Article, we review the progress in free-space quantum experiments, with a focus on the fast-developing Micius satellite-based quantum communications. The prospects for space-ground integrated quantum networks and for fundamental quantum optics experiments in space conceivable with satellites are discussed.

preprint, 2022, arXiv

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

Multilingual neural machine translation models are trained to maximize the likelihood of a mix of examples drawn from multiple language pairs. The dominant inductive bias applied to these models is a shared vocabulary and a shared set of parameters across languages; the inputs and labels corresponding to examples drawn from different language pairs might still reside in distinct sub-spaces. In this paper, we introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level. Our approach interpolates instances from different language pairs into joint "crossover examples" in order to encourage sharing input and output spaces across languages. To ensure better fusion of examples in multilingual settings, we propose several techniques to improve example interpolation across dissimilar languages under heavy data imbalance. Experiments on a large-scale WMT multilingual dataset demonstrate that our approach significantly improves quality on English-to-Many, Many-to-English and zero-shot translation tasks (from +0.5 BLEU up to +5.5 BLEU points). Results on code-switching sets demonstrate the capability of our approach to improve model generalization to out-of-distribution multilingual examples. We also conduct qualitative and quantitative representation comparisons to analyze the advantages of our approach at the representation level.

preprint, 2022, arXiv

Portable ground stations for space-to-ground quantum key distribution

Quantum key distribution (QKD) uses the fundamental principles of quantum mechanics to share unconditionally secure keys between distant users. Previous works based on the quantum science satellite "Micius" have initially demonstrated the feasibility of a global QKD network. However, the practical applications of space-based QKD still face many technical problems, such as the huge size and weight of ground stations required to receive quantum signals. Here, we report space-to-ground QKD demonstrations based on portable receiving ground stations. The portable ground station weighs less than 100 kg, occupies less than 1 m$^{3}$, and can be installed in no more than 12 hours; the weight, required space and deployment time are all about two orders of magnitude lower than those of previous systems. Moreover, the equipment is easy to handle and can be placed on the roof of buildings in a metropolis. Secure keys have been successfully generated from the "Micius" satellite to these portable ground stations at six different places in China, and an average final secure key length of around 50 kb can be obtained during one passage. Our results pave the way for, and greatly accelerate the practical application of, space-based QKD.

preprint, 2022, arXiv

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mixtures, and provide a tight risk bound for the maximum margin linear classifier in the over-parameterized setting. Our results precisely characterize the condition under which benign overfitting can occur in linear classification problems, and improve on previous work. They also have direct implications for over-parameterized logistic regression.

preprint, 2022, arXiv

SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language. We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X - a benchmark extending SGD with semantically similar yet stylistically diverse variants for every schema. We observe that two top state tracking models fail to generalize well across schema variants, measured by joint goal accuracy and a novel metric for measuring schema sensitivity. Additionally, we present a simple model-agnostic data augmentation method to improve schema robustness.
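Joint goal accuracy is commonly computed as the fraction of dialogue turns whose predicted state matches the reference on every slot. The sketch below assumes that definition; the dict-based state representation and the function name are hypothetical, and the paper's schema sensitivity metric is not reproduced here.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose predicted slot-value dict equals the gold dict exactly."""
    if len(predicted_states) != len(gold_states):
        raise ValueError("predicted and gold state lists must have the same length")
    if not gold_states:
        return 0.0
    # A turn counts only if every slot and every value agrees.
    correct = sum(1 for pred, gold in zip(predicted_states, gold_states) if pred == gold)
    return correct / len(gold_states)


# Hypothetical three-turn dialogue; the second turn gets one slot value wrong.
gold = [
    {"city": "Paris"},
    {"city": "Paris", "date": "Friday"},
    {"city": "Paris", "date": "Friday", "guests": "2"},
]
pred = [
    {"city": "Paris"},
    {"city": "Paris", "date": "Saturday"},
    {"city": "Paris", "date": "Friday", "guests": "2"},
]
jga = joint_goal_accuracy(pred, gold)
```

Because a single wrong slot fails the whole turn, this metric is strict, which makes it a natural place to observe drops when schema wording changes.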

preprint, 2022, arXiv

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. However, the requirement for expensive annotations including clean image captions and regional labels limits the scalability of existing approaches, and complicates the pretraining procedure with the introduction of multiple dataset-specific objectives. In this work, we relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM). Unlike prior work, SimVLM reduces the training complexity by exploiting large-scale weak supervision, and is trained end-to-end with a single prefix language modeling objective. Without utilizing extra data or task-specific customization, the resulting model significantly outperforms previous pretraining methods and achieves new state-of-the-art results on a wide range of discriminative and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score). Furthermore, we demonstrate that SimVLM acquires strong generalization and transfer ability, enabling zero-shot behavior including open-ended visual question answering and cross-modality transfer.

preprint, 2022, arXiv

Spin Manipulation by Giant Valley-Zeeman Spin-Orbit Field in Atom-Thick WSe2

The phenomenon originating from spin-orbit coupling (SOC) provides energy-efficient strategies for spin manipulation and device applications. The broken inversion symmetry interface and resulting electric field induce a Rashba-type spin-orbit field (SOF), which has been demonstrated to generate spin-orbit torque for data storage applications. In this study, we found that spin flipping can be achieved by the valley-Zeeman SOF in monolayer WSe2 at room temperature, which manifests as a negative magnetoresistance in the vertical spin valve. Quantum transmission calculations based on an effective model near the K valley of WSe2 confirm the precessional spin transport of carriers under the giant SOF, which is estimated to be 650 T. In particular, the valley-Zeeman SOF-induced spin dynamics was demonstrated to be tunable with the layer number and stacking phase of WSe2 as well as the gate voltage, which provides a novel strategy for spin manipulation and can benefit the development of ultralow-power spintronic devices.

preprint, 2022, arXiv

The geometry of integration in text classification RNNs

Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing task: text classification. Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks. We find the dynamics of these trained RNNs to be both interpretable and low-dimensional. Specifically, across architectures and datasets, RNNs accumulate evidence for each class as they process the text, using a low-dimensional attractor manifold as the underlying mechanism. Moreover, the dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset; in particular, we describe how simple word-count statistics computed on the training dataset can be used to predict these properties. Our observations span multiple architectures and datasets, reflecting a common mechanism RNNs employ to perform text classification. To the degree that integration of evidence towards a decision is a common computational primitive, this work lays the foundation for using dynamical systems techniques to study the inner workings of RNNs.

preprint, 2022, arXiv

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Achieving universal translation between all human language pairs is the holy-grail of machine translation (MT) research. While recent progress in massively multilingual MT is one step closer to reaching this goal, it is becoming evident that extending a multilingual MT system simply by training on more parallel data is unscalable, since the availability of labeled data for low-resource and non-English-centric language pairs is forbiddingly limited. To this end, we present a pragmatic approach towards building a multilingual MT model that covers hundreds of languages, using a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs. We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting, even surpassing supervised translation quality for low- and mid-resource languages. We conduct a wide array of experiments to understand the effect of the degree of multilingual supervision, domain mismatches and amounts of parallel and monolingual data on the quality of our self-supervised multilingual models. To demonstrate the scalability of the approach, we train models with over 200 languages and demonstrate high performance on zero-resource translation on several previously under-studied languages. We hope our findings will serve as a stepping stone towards enabling translation for the next thousand languages.

preprint, 2022, arXiv

Unsupervised Slot Schema Induction for Task-oriented Dialog

Carefully-designed schemas describing how to collect and annotate dialog corpora are a prerequisite towards building task-oriented dialog systems. In practical applications, manually designing schemas can be error-prone, laborious, iterative, and slow, especially when the schema is complicated. To alleviate this expensive and time consuming process, we propose an unsupervised approach for slot schema induction from unlabeled dialog corpora. Leveraging in-domain language models and unsupervised parsing structures, our data-driven approach extracts candidate slots without constraints, followed by coarse-to-fine clustering to induce slot types. We compare our method against several strong supervised baselines, and show significant performance improvement in slot schema induction on MultiWoz and SGD datasets. We also demonstrate the effectiveness of induced schemas on downstream applications including dialog state tracking and response generation.

preprint, 2021, arXiv

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.

preprint, 2021, arXiv

Benign Overfitting in Adversarially Robust Linear Classification

"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community. To explain this surprising phenomenon, a series of works have provided theoretical justification in over-parameterized linear regression, classification, and kernel methods. However, it is not clear if benign overfitting still occurs in the presence of adversarial examples, i.e., examples with tiny and intentional perturbations to fool the classifiers. In this paper, we show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples. In detail, we prove the risk bounds of the adversarially trained linear classifier on the mixture of sub-Gaussian data under $\ell_p$ adversarial perturbations. Our result suggests that under moderate perturbations, adversarially trained linear classifiers can achieve the near-optimal standard and adversarial risks, despite overfitting the noisy training data. Numerical experiments validate our theoretical findings.

preprint, 2021, arXiv

Echo State Speech Recognition

We propose automatic speech recognition (ASR) models inspired by the echo state network (ESN), in which a subset of recurrent neural network (RNN) layers in the models are randomly initialized and untrained. Our study focuses on RNN-T and Conformer models, and we show that model quality does not drop even when the decoder is fully randomized. Furthermore, such models can be trained more efficiently as the decoders do not need to be updated. By contrast, randomizing encoders hurts model quality, indicating that optimizing encoders to learn proper representations for acoustic inputs is more vital for speech recognition. Overall, we challenge the common practice of training all components of ASR models, and demonstrate that ESN-based models can perform equally well while enabling more efficient training and storage than fully-trainable counterparts.

preprint, 2021, arXiv

Fractional Chern insulators in magic-angle twisted bilayer graphene

Fractional Chern insulators (FCIs) are lattice analogues of fractional quantum Hall states that may provide a new avenue toward manipulating non-abelian excitations. Early theoretical studies have predicted their existence in systems with energetically flat Chern bands and highlighted the critical role of a particular quantum band geometry. Thus far, however, FCI states have only been observed in Bernal-stacked bilayer graphene aligned with hexagonal boron nitride (BLG/hBN), in which a very large magnetic field is responsible for the existence of the Chern bands, precluding the realization of FCIs at zero field and limiting its potential for applications. By contrast, magic angle twisted bilayer graphene (MATBG) supports flat Chern bands at zero magnetic field, and therefore offers a promising route toward stabilizing zero-field FCIs. Here we report the observation of eight FCI states at low magnetic field in MATBG enabled by high-resolution local compressibility measurements. The first of these states emerge at 5 T, and their appearance is accompanied by the simultaneous disappearance of nearby topologically-trivial charge density wave states. Unlike the BLG/hBN platform, we demonstrate that the principal role of the weak magnetic field here is merely to redistribute the Berry curvature of the native Chern bands and thereby realize a quantum band geometry favorable for the emergence of FCIs. Our findings strongly suggest that FCIs may be realized at zero magnetic field and pave the way for the exploration and manipulation of anyonic excitations in moiré systems with native flat Chern bands.

preprint, 2021, arXiv

High-Temperature Structure Detection in Ferromagnets

This paper studies structure detection problems in high temperature ferromagnetic (positive interaction only) Ising models. The goal is to distinguish whether the underlying graph is empty, i.e., the model consists of independent Rademacher variables, versus the alternative that the underlying graph contains a subgraph of a certain structure. We give matching upper and lower minimax bounds under which testing this problem is possible/impossible respectively. Our results reveal that a key quantity called graph arboricity drives the testability of the problem. On the computational front, under a conjecture of the computational hardness of sparse principal component analysis, we prove that, unless the signal is strong enough, there are no polynomial time tests which are capable of testing this problem. In order to prove this result we exhibit a way to give sharp inequalities for the even moments of sums of i.i.d. Rademacher random variables which may be of independent interest.
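The even moments mentioned at the end of the abstract can be computed exactly for small cases, which gives a concrete reference point for any bound on them. The sketch below is only that: an exact enumeration, not the paper's inequalities. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def rademacher_sum_moment(n, order):
    """Exact E[(e_1 + ... + e_n) ** order] for i.i.d. Rademacher signs e_i in {-1, +1}."""
    # If k of the n signs equal -1, the sum is n - 2k, and k is Binomial(n, 1/2).
    total = sum(comb(n, k) * (n - 2 * k) ** order for k in range(n + 1))
    return Fraction(total, 2 ** n)


second = rademacher_sum_moment(4, 2)  # equals n
fourth = rademacher_sum_moment(4, 4)  # equals 3 * n**2 - 2 * n
```

For any n the second moment is n and the fourth moment is 3n^2 - 2n, so the enumeration can be checked against these closed forms before it is used for higher orders.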

preprint, 2021, arXiv

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high-degree polynomial of the training sample size $n$ and the inverse of the target error $ε^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees. Very recently, it was shown that under certain margin assumptions on the training data, a polylogarithmic width condition suffices for two-layer ReLU networks to converge and generalize (Ji and Telgarsky, 2019). However, whether deep neural networks can be learned with such a mild over-parameterization is still an open question. In this work, we answer this question affirmatively and establish sharper learning guarantees for deep ReLU networks trained by (stochastic) gradient descent. Specifically, under certain assumptions made in previous work, our optimization and generalization guarantees hold with network width polylogarithmic in $n$ and $ε^{-1}$. Our results push the study of over-parameterized deep neural networks towards more practical settings.

preprint, 2021, arXiv

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.

preprint, 2021, arXiv

Quantum Random Number Generation with Uncharacterized Laser and Sunlight

The entropy or randomness source is an essential ingredient in random number generation. Quantum random number generators generally require well-modeled and calibrated light sources, such as a laser, to generate randomness. With uncharacterized light sources, such as sunlight or an uncharacterized laser, genuine randomness is hard in practice to quantify or extract owing to its unknown or complicated structure. We take a recently proposed source-independent randomness generation protocol, modify it theoretically to account for practical issues, and experimentally realize the modified scheme with an uncharacterized laser and a sunlight source. The extracted randomness is guaranteed to be secure independent of its source, and the randomness generation speed reaches 1 Mbps, three orders of magnitude higher than the original realization. Our result signifies the power of quantum technology in randomness generation and paves the way to high-speed semi-self-testing quantum random number generators with practical light sources.

preprint, 2020, arXiv

Agnostic Learning of a Single Neuron with Gradient Descent

We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(σ(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function $σ$ is an arbitrary Lipschitz and non-decreasing function, making the optimization problem nonconvex and nonsmooth in general, and covers typical neural network activation functions and inverse link functions in the generalized linear model setting. In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT})+ε$ in polynomial time and sample complexity when $σ$ is strictly increasing. For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+ε$. When labels take the form $y = σ(v^\top x) + ξ$ for zero-mean sub-Gaussian noise $ξ$, we show that the population risk guarantees for gradient descent improve to $\mathsf{OPT} + ε$. Our sample complexity and runtime guarantees are (almost) dimension independent, and when $σ$ is strictly increasing, require no distributional assumptions beyond boundedness. For ReLU, we show the same results under a nondegeneracy assumption for the marginal distribution of the input.
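The setup in this abstract, gradient descent on the empirical square loss of a single neuron, can be sketched in a few lines. The example below assumes a sigmoid activation (one strictly increasing, Lipschitz choice of $σ$) and noiseless labels $y = σ(v^\top x)$; the data, step size, step count and function names are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def empirical_risk(w, xs, ys):
    """Mean square loss (1/n) * sum (sigmoid(<w, x>) - y)^2."""
    return sum((sigmoid(dot(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.5, steps=500):
    """Plain gradient descent on the empirical risk, started from w = 0."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            s = sigmoid(dot(w, x))
            # Chain rule: d/dw (s - y)^2 = 2 * (s - y) * s * (1 - s) * x
            coeff = 2.0 * (s - y) * s * (1.0 - s) / n
            for j in range(d):
                grad[j] += coeff * x[j]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical toy data generated by a "teacher" neuron v, with no label noise.
v = [1.0, -1.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
ys = [sigmoid(dot(v, x)) for x in xs]

initial_risk = empirical_risk([0.0, 0.0], xs, ys)
w = gradient_descent(xs, ys)
final_risk = empirical_risk(w, xs, ys)
```

The objective is nonconvex in `w` even in this small case, which is why guarantees of the kind stated in the abstract are not immediate from standard convex analysis.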

preprint2020arXiv

An explicit expression for Euclidean self-dual cyclic codes of length $2^k$ over Galois ring ${\rm GR}(4,m)$

For any positive integers $m$ and $k$, existing literature only determines the number of all Euclidean self-dual cyclic codes of length $2^k$ over the Galois ring ${\rm GR}(4,m)$, such as in [Des. Codes Cryptogr. (2012) 63:105--112]. Using properties for Kronecker products of matrices of a specific type and column vectors of these matrices, we give a simple and efficient method to construct all these self-dual cyclic codes precisely. On this basis, we provide an explicit expression to accurately represent all distinct Euclidean self-dual cyclic codes of length $2^k$ over ${\rm GR}(4,m)$, using combination numbers. As an application, we list all distinct Euclidean self-dual cyclic codes over ${\rm GR}(4,m)$ of length $2^k$ explicitly, for $k=4,5,6$.

preprint2020arXiv

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD) with momentum in training deep neural networks. This leaves how to close the generalization gap of adaptive gradient methods an open problem. In this work, we show that adaptive gradient methods such as Adam and Amsgrad are sometimes "over adapted". We design a new algorithm, called Partially adaptive momentum estimation method, which unifies Adam/Amsgrad with SGD by introducing a partial adaptive parameter $p$, to achieve the best of both worlds. We also prove the convergence rate of our proposed algorithm to a stationary point in the stochastic nonconvex optimization setting. Experiments on standard benchmarks show that our proposed algorithm can maintain a convergence rate as fast as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks. These results would suggest practitioners pick up adaptive gradient methods once again for faster training of deep neural networks.
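
One way a partial adaptive parameter $p$ can interpolate between the two families is sketched below. It assumes the update divides the momentum term by the running second-moment estimate raised to the power $p$, so that $p = 1/2$ gives an Amsgrad-style step and $p = 0$ gives SGD with momentum. The abstract does not state the update rule, so the exact form, the absence of bias correction, the `eps` term and all hyperparameter values are assumptions; `padam_step` is a hypothetical helper name.

```python
def padam_step(theta, g, state, lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One partially adaptive step on a list of parameters (illustrative form).

    p = 0.5 corresponds to an Amsgrad-style update, p = 0 to SGD with momentum.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(state["m"], g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(state["v"], g)]
    # Amsgrad-style running maximum of the second-moment estimate.
    v_hat = [max(a, b) for a, b in zip(state["v_hat"], v)]
    new_theta = [t - lr * mi / ((vh + eps) ** p)
                 for t, mi, vh in zip(theta, m, v_hat)]
    state.update(m=m, v=v, v_hat=v_hat)
    return new_theta

# Toy problem (hypothetical): minimize f(x) = x0^2 + 10 * x1^2.
def loss(theta):
    return theta[0] ** 2 + 10.0 * theta[1] ** 2

def grad_fn(theta):
    return [2.0 * theta[0], 20.0 * theta[1]]

theta = [1.0, 1.0]
state = {"m": [0.0, 0.0], "v": [0.0, 0.0], "v_hat": [0.0, 0.0]}
initial_loss = loss(theta)
for _ in range(300):
    theta = padam_step(theta, grad_fn(theta), state)
final_loss = loss(theta)
```

Varying `p` between 0 and 0.5 on the same toy problem shows how the per-coordinate scaling changes from none to fully adaptive.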

preprint2020arXiv

Deep-Learning-Enabled Fast Optical Identification and Characterization of Two-Dimensional Materials

Advanced microscopy and/or spectroscopy tools play an indispensable role in nanoscience and nanotechnology research, as they provide rich information about the growth mechanism, chemical compositions, crystallography, and other important physical and chemical properties. However, the interpretation of imaging data heavily relies on the "intuition" of experienced researchers. As a result, many of the deep graphical features obtained through these tools are often unused because of difficulties in processing the data and finding the correlations. Such challenges can be well addressed by deep learning. In this work, we use the optical characterization of two-dimensional (2D) materials as a case study, and demonstrate a neural-network-based algorithm for the material and thickness identification of exfoliated 2D materials with high prediction accuracy and real-time processing capability. Further analysis shows that the trained network can extract deep graphical features such as contrast, color, edges, shapes, segment sizes and their distributions, based on which we develop an ensemble approach to predict the most relevant physical properties of 2D materials. Finally, a transfer learning technique is applied to adapt the pretrained network to other applications such as identifying layer numbers of a new 2D material, or materials produced by a different synthetic approach. Our artificial-intelligence-based material characterization approach is a powerful tool that would speed up the preparation and initial characterization of 2D materials and other nanomaterials, and potentially accelerate new material discoveries.

preprint2020arXiv

Echo State Neural Machine Translation

We present neural machine translation (NMT) models inspired by the echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated and then fixed throughout training. We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% of the quality of fully trainable baselines. We examine how the spectral radius of the reservoir, a key quantity that characterizes the model, determines the model behavior. Our findings indicate that randomized networks can work well even for complicated sequence-to-sequence prediction NLP tasks.
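
The role of the spectral radius can be made concrete with a generic echo state recurrence: draw a random recurrent matrix, rescale it so its largest eigenvalue magnitude equals a target value, and keep it fixed while the state is driven by the inputs. This is a textbook-style sketch, not the ESNMT architecture; the reservoir size, the tanh nonlinearity, the input scaling and the function names are assumptions.

```python
import numpy as np

def make_reservoir(size, spectral_radius, seed=0):
    """Random recurrent matrix rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(size, size))
    rho = np.max(np.abs(np.linalg.eigvals(W)))  # largest |eigenvalue| of W
    return W * (spectral_radius / rho)

def run_reservoir(W, W_in, inputs):
    """Drive the fixed (untrained) reservoir with a sequence of input vectors."""
    h = np.zeros(W.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W @ h + W_in @ x)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
W = make_reservoir(size=64, spectral_radius=0.9)
W_in = rng.normal(scale=0.1, size=(64, 8))   # fixed random input projection
inputs = rng.normal(size=(20, 8))            # 20 time steps of 8-dim inputs
states = run_reservoir(W, W_in, inputs)
measured_radius = np.max(np.abs(np.linalg.eigvals(W)))
```

In an echo state model only a readout on top of `states` would be trained; `W` and `W_in` stay as generated.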

preprint2020arXiv

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution modeling of prosody by conditioning finer level representations on coarser level ones. Additionally, it imposes hierarchical conditioning across all latent dimensions using a conditional variational auto-encoder (VAE) with an auto-regressive structure. Evaluation of reconstruction performance illustrates that the new structure does not degrade the model while allowing better interpretability. Interpretations of prosody attributes are provided together with the comparison between word-level and phone-level prosody representations. Moreover, both qualitative and quantitative evaluations are used to demonstrate the improvement in the disentanglement of the latent dimensions.

preprint2020arXiv

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features at each input token (e.g., phonemes). However, generating samples with the standard VAE prior often results in unnatural and discontinuous speech, with dramatic prosodic variation between tokens. This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples. This is accomplished by discretizing the latent features using vector quantization (VQ), and separately training an autoregressive (AR) prior model over the result. We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes. Experimental results show that the proposed model significantly improves the naturalness in random sample generation. Furthermore, initial experiments demonstrate that randomly sampling from the proposed model can be used as data augmentation to improve the ASR performance.
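
The discretization step mentioned above, vector quantization of per-token latent features, can be sketched as a nearest-codebook lookup. The codebook values and latent vectors below are made up for illustration, and the autoregressive prior that the paper trains over the resulting indices is not shown.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2).

    latents: array of shape (T, D); codebook: array of shape (K, D).
    Returns the chosen indices (T,) and the quantized vectors (T, D).
    """
    diffs = latents[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)        # (T, K) squared distances
    idx = dist2.argmin(axis=1)
    return idx, codebook[idx]

# Hypothetical 2-dimensional codebook with three entries.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]])
indices, quantized = quantize(latents, codebook)
```

The sequence `indices` is the discrete representation over which a sequential prior could then be fitted.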

preprint2020arXiv

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Over the last few years two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, getting up to 33 BLEU on ro-en translation without any parallel data or back-translation.

preprint2020arXiv

Managing Recurrent Virtual Network Updates in Multi-Tenant Datacenters: A System Perspective

With the advent of software-defined networking, network configuration through programmable interfaces becomes practical, leading to various on-demand opportunities for network routing update in multi-tenant datacenters, where tenants have diverse requirements on network routings such as short latency, low path inflation, large bandwidth, high reliability, etc. Conventional solutions that rely on topology search coupled with an objective function to find desired routings have at least two shortcomings: (i) they run into scalability issues when handling consistent and frequent routing updates and (ii) they restrict the flexibility and capability to satisfy various routing requirements. To address these issues, this paper proposes a novel search and optimization decoupled design, which not only saves considerable topology search costs via search result reuse, but also avoids possible sub-optimality in greedy routing search algorithms by making decisions based on the global view of all possible routings. We implement a prototype of our proposed system, OpReduce, and perform extensive evaluations to validate its design goals.

preprint2020arXiv

Super-resolution single-photon imaging at 8.2 kilometers

Single-photon light detection and ranging (LiDAR), offering single-photon sensitivity and picosecond time resolution, has been widely adopted for active imaging applications. Long-range active imaging is a great challenge, because the spatial resolution degrades significantly with the imaging range due to the diffraction limit of the optics, and only weak echo signal photons return, mixed with strong background noise. Here we propose and demonstrate a photon-efficient LiDAR approach that can achieve sub-Rayleigh resolution imaging over long ranges. This approach exploits fine sub-pixel scanning and a deconvolution algorithm tailored to this long-range application. Using this approach, we experimentally demonstrated active three-dimensional (3D) single-photon imaging by recognizing different postures of a mannequin model at a stand-off distance of 8.2 km in both daylight and at night. The observed spatial (transversal) resolution is about 5.5 cm at 8.2 km, which is about twice as fine as the system's resolution. This also beats the optical system's Rayleigh criterion. The results are valuable for geosciences and target recognition over long ranges.

preprint2020arXiv

Tunable Phase Boundaries and Ultra-Strong Coupling Superconductivity in Mirror Symmetric Magic-Angle Trilayer Graphene

Moiré superlattices have recently emerged as a novel platform where correlated physics and superconductivity can be studied with unprecedented tunability. Although correlated effects have been observed in several other moiré systems, magic-angle twisted bilayer graphene (MATBG) remains the only one where robust superconductivity has been reproducibly measured. Here we realize a new moiré superconductor, mirror symmetric magic-angle twisted trilayer graphene (MATTG) with dramatically richer tunability in electronic structure and superconducting properties. Hall effect and quantum oscillations measurements as a function of density and electric field allow us to determine the system's tunable phase boundaries in the normal state. Zero magnetic field resistivity measurements then reveal that the existence of superconductivity is intimately connected to the broken symmetry phase emerging from two carriers per moiré unit cell. Strikingly, we find that the superconducting phase gets suppressed and bounded at the van Hove singularities (vHs) partially surrounding the broken-symmetry phase, which is difficult to reconcile with weak-coupling BCS theory. Moreover, the extensive in situ tunability of our system allows us to achieve the ultra-strong coupling regime, characterized by a Ginzburg-Landau coherence length reaching the average inter-particle distance and very large $T_\mathrm{BKT}/T_{F}$ ratios in excess of 0.1, where $T_\mathrm{BKT}$ and $T_F$ are the Berezinskii-Kosterlitz-Thouless transition and Fermi temperatures, respectively. These observations suggest that MATTG can be electrically tuned close to the two-dimensional BCS-BEC crossover. Our results establish a new generation of tunable moiré superconductors with the potential to revolutionize our fundamental understanding and the applications of strong coupling superconductivity.

preprint2019arXiv

Cascade of Phase Transitions and Dirac Revivals in Magic Angle Graphene

Twisted bilayer graphene near the magic angle exhibits remarkably rich electron correlation physics, displaying insulating, magnetic, and superconducting phases. Here, using measurements of the local electronic compressibility, we reveal that these phases originate from a high-energy state with an unusual sequence of band populations. As carriers are added to the system, rather than filling all the four spin and valley flavors equally, we find that the population occurs through a sequence of sharp phase transitions, which appear as strong asymmetric jumps of the electronic compressibility near integer fillings of the moiré lattice. At each transition, a single spin/valley flavor takes all the carriers from its partially filled peers, "resetting" them back to the vicinity of the charge neutrality point. As a result, the Dirac-like character observed near the charge neutrality reappears after each integer filling. Measurement of the in-plane magnetic field dependence of the chemical potential near filling factor one reveals a large spontaneous magnetization, further substantiating this picture of a cascade of symmetry breakings. The sequence of phase transitions and Dirac revivals is observed at temperatures well above the onset of the superconducting and correlated insulating states. This indicates that the state we reveal here, with its strongly broken electronic flavor symmetry and revived Dirac-like electronic character, is a key player in the physics of magic angle graphene, forming the parent state out of which the more fragile superconducting and correlated insulating ground states emerge.

preprint2019arXiv

Electric Field Tunable Correlated States and Magnetic Phase Transitions in Twisted Bilayer-Bilayer Graphene

The recent discovery of correlated insulator states and superconductivity in magic-angle twisted bilayer graphene has paved the way to the experimental investigation of electronic correlations in tunable flat band systems realized in twisted van der Waals heterostructures. This novel twist angle degree of freedom and control should be generalizable to other 2D systems, which may exhibit similar correlated physics behavior while at the same time enabling new techniques to tune and control the strength of electron-electron interactions. Here, we report on a new highly tunable correlated system based on small-angle twisted bilayer-bilayer graphene (TBBG), consisting of two rotated sheets of Bernal-stacked bilayer graphene. We find that TBBG exhibits a rich phase diagram, with tunable correlated insulator states that are highly sensitive both to twist angle and to the application of an electric displacement field, the latter reflecting the inherent polarizability of Bernal-stacked bilayer graphene. We find correlated insulator states that can be switched on and off by the displacement field at all integer electron fillings of the moiré unit cell. The response of these correlated states to magnetic fields points towards evidence of electrically switchable magnetism. Moreover, the strong dependence of the resistance at low temperature and near the correlated insulator states indicates possible proximity to a superconducting phase. Furthermore, in the regime of lower twist angles, TBBG shows multiple sets of flat bands near charge neutrality, resulting in numerous correlated states corresponding to half-filling of each of these flat bands. Our results pave the way to the exploration of novel twist-angle and electric-field controlled correlated phases of matter in novel multi-flat band twisted superlattices.

preprint2019arXiv

Mapping the twist angle and unconventional Landau levels in magic angle graphene

The emergence of flat electronic bands and of the recently discovered strongly correlated and superconducting phases in twisted bilayer graphene crucially depends on the interlayer twist angle upon approaching the magic angle $θ_M \approx 1.1°$. Although advanced fabrication methods allow alignment of graphene layers with global twist angle control of about 0.1$°$, little information is currently available on the distribution of the local twist angles in actual magic angle twisted bilayer graphene (MATBG) transport devices. Here we map the local $θ$ variations in hBN encapsulated devices with relative precision better than 0.002$°$ and spatial resolution of a few moiré periods. Utilizing a scanning nanoSQUID-on-tip, we attain tomographic imaging of the Landau levels in the quantum Hall state in MATBG, which provides a highly sensitive probe of the charge disorder and of the local band structure determined by the local $θ$. We find that even state-of-the-art devices, exhibiting high-quality global MATBG features including superconductivity, display significant variations in the local $θ$ with a span close to 0.1$°$. Devices may even have substantial areas where no local MATBG behavior is detected, yet still display global MATBG characteristics in transport, highlighting the importance of percolation physics. The derived $θ$ maps reveal substantial gradients and a network of jumps. We show that the twist angle gradients generate large unscreened electric fields that drastically change the quantum Hall state by forming edge states in the bulk of the sample, and may also significantly affect the phase diagram of correlated and superconducting states. The findings call for exploration of band structure engineering utilizing twist-angle gradients and gate-tunable built-in planar electric fields for novel correlated phenomena and applications.

preprint2019arXiv

Single-photon computational 3D imaging at 45 km

Long-range active imaging has a variety of applications in remote sensing and target recognition. Single-photon LiDAR (light detection and ranging) offers single-photon sensitivity and picosecond timing resolution, which is desirable for high-precision three-dimensional (3D) imaging over long distances. Despite important progress, further extending the imaging range presents enormous challenges because only weak echo photons return and are mixed with strong noise. Herein, we tackled these challenges by constructing a high-efficiency, low-noise confocal single-photon LiDAR system, and developing a long-range-tailored computational algorithm that provides high photon efficiency and super-resolution in the transverse domain. Using this technique, we experimentally demonstrated active single-photon 3D-imaging at a distance of up to 45 km in an urban environment, with a low return-signal level of $\sim$1 photon per pixel. Our system is feasible for imaging at a few hundred kilometers by refining the setup, and thus represents a significant milestone towards rapid, low-power, and high-resolution LiDAR over extra-long ranges.

preprint2019arXiv

Spaceborne low-noise single-photon detection for satellite-based quantum communications

Single-photon detectors (SPDs) play important roles in highly sensitive detection applications, such as fluorescence spectroscopy, remote sensing and ranging, deep space optical communications, elementary particle detection, and quantum communications. However, the adverse conditions in space, such as the increased radiation flux and thermal vacuum, severely limit their noise performance, reliability, and lifetime. Herein, we present the first example of spaceborne, low-noise, high reliability SPDs, based on commercial off-the-shelf (COTS) silicon avalanche photodiodes (APD). Based on the high noise-radiation sensitivity of silicon APD, we have developed special shielding structures, multistage cooling technologies, and configurable driver electronics that significantly improved the COTS APD reliability and mitigated the SPD noise-radiation sensitivity. This led to a reduction of the expected in-orbit radiation-induced dark count rate (DCR) from ~219 counts per second (cps) per day to ~0.76 cps/day. During continuous in-orbit operation spanning 1029 days, the SPD DCR was maintained below 1000 cps; that is, the actual in-orbit radiation-induced DCR increment rate was ~0.54 cps/day, two orders of magnitude lower than that produced by previous technologies, while the photon detection efficiency was > 45%. Our spaceborne, low-noise SPDs established a feasible satellite-based up-link quantum communication that was validated on the quantum experiment science satellite platform. Moreover, our SPDs open new windows of opportunities for space research and applications in deep-space optical communications, single-photon laser ranging, as well as for testing the fundamental principles of physics in space.

preprint2019arXiv

Strange metal in magic-angle graphene with near Planckian dissipation

Recent experiments on magic-angle twisted bilayer graphene have discovered correlated insulating behavior and superconductivity at a fractional filling of an isolated narrow band. In this paper we show that magic-angle bilayer graphene exhibits another hallmark of strongly correlated systems, a broad regime of $T$-linear resistivity above a small, density dependent, crossover temperature, for a range of fillings near the correlated insulator. We also extract a transport "scattering rate", which satisfies a near Planckian form that is universally related to the ratio $k_BT/\hbar$. Our results establish magic-angle bilayer graphene as a highly tunable platform to investigate strange metal behavior, which could shed light on this mysterious ubiquitous phase of correlated matter.

preprint2019arXiv

Universal transfer and stacking technique of van der Waals heterostructures for spintronics

The key to achieving high-quality van der Waals heterostructure devices made from various two-dimensional (2D) materials lies in the control over clean and flexible interfaces. However, existing transfer methods based on different mediators possess insufficiencies including the presence of residues, the unavailability of flexible interface engineering, and the selectivity towards materials and substrates since their adhesions differ considerably with the various preparation conditions, from chemical vapor deposition (CVD) growth to mechanical exfoliation. In this paper, we introduce a more universal method using a prefabricated polyvinyl alcohol (PVA) film to transfer and stack 2D materials, whether they are prepared by CVD or exfoliation. This peel-off and drop-off technique promises an ideal interface of the materials without introducing contamination. In addition, the method exhibits a micron-scale spatial transfer accuracy and meets special experimental conditions such as the preparation of twisted graphene and the 2D/metal heterostructure construction. We illustrate the superiority of this method with a WSe2 vertical spin valve device, whose performance verifies the applicability and advantages of such a method for spintronics. Our PVA-assisted transfer process will promote the development of high-performance 2D-material-based devices.

preprint2018arXiv

Full Implementation of four-intensity Protocol for Measurement-Device-Independent Quantum Key Distribution over Asymmetric Channel

We study the optimization of the full implementation of the four-intensity decoy-state Measurement-Device-Independent Quantum Key Distribution (MDIQKD) protocol over an asymmetric and unstable quantum channel.

preprint2016arXiv

Complete classification of $(δ+αu^2)$-constacyclic codes over $\mathbb{F}_{2^m}[u]/\langle u^4\rangle$ of oddly even length

Let $\mathbb{F}_{2^m}$ be a finite field of cardinality $2^m$, $R=\mathbb{F}_{2^m}[u]/\langle u^4\rangle$ and $n$ be an odd positive integer. For any $δ,α\in \mathbb{F}_{2^m}^{\times}$, ideals of the ring $R[x]/\langle x^{2n}-(δ+αu^2)\rangle$ are identified as $(δ+αu^2)$-constacyclic codes of length $2n$ over $R$. In this paper, an explicit representation and enumeration for all distinct $(δ+αu^2)$-constacyclic codes of length $2n$ over $R$ are presented.

preprint2016arXiv

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
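
The beam-search scoring idea described above, dividing the log-probability by a length-dependent factor and adding a penalty for source words that received little attention, can be sketched as follows. The abstract does not give the formulas, so the functional forms, the constants `alpha` and `beta`, the numerical floor and the function names are all assumptions chosen for illustration.

```python
import math

def length_penalty(length, alpha=0.6):
    """Grows with hypothesis length so longer outputs are not unfairly penalized."""
    return ((5.0 + length) / 6.0) ** alpha

def coverage_penalty(attention, beta=0.2):
    """attention[j][i]: attention weight of target step j on source word i.

    Source words whose total attention is below 1 contribute a negative term.
    """
    n_src = len(attention[0])
    total = 0.0
    for i in range(n_src):
        covered = sum(row[i] for row in attention)
        # Floor avoids log(0) for a completely ignored source word (assumption).
        total += math.log(min(max(covered, 1e-9), 1.0))
    return beta * total

def hypothesis_score(log_prob, attention, alpha=0.6, beta=0.2):
    """Length-normalized log-probability plus coverage penalty."""
    length = len(attention)  # one attention row per generated target token
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attention, beta)

# Two hypothetical 2-token hypotheses over a 2-word source sentence.
full_coverage = [[1.0, 0.0], [0.0, 1.0]]
partial_coverage = [[1.0, 0.0], [0.5, 0.5]]
score_full = hypothesis_score(-2.0, full_coverage)
score_partial = hypothesis_score(-2.0, partial_coverage)
```

With equal log-probabilities, the hypothesis that attends to every source word scores higher than the one that leaves a word partly uncovered.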

preprint2016arXiv

Left dihedral codes over Galois rings ${\rm GR}(p^2,m)$

Let $D_{2n}=\langle x,y\mid x^n=1, y^2=1, yxy=x^{-1}\rangle$ be a dihedral group, and $R={\rm GR}(p^2,m)$ be a Galois ring of characteristic $p^2$ and cardinality $p^{2m}$ where $p$ is a prime. Left ideals of the group ring $R[D_{2n}]$ are called left dihedral codes over $R$ of length $2n$, and abbreviated as left $D_{2n}$-codes over $R$. Let ${\rm gcd}(n,p)=1$ in this paper. Then any left $D_{2n}$-code over $R$ is uniquely decomposed into a direct sum of concatenated codes with inner codes ${\cal A}_i$ and outer codes $C_i$, where ${\cal A}_i$ is a cyclic code over $R$ of length $n$ and $C_i$ is a skew cyclic code of length $2$ over an extension Galois ring or principal ideal ring of $R$, and a generator matrix and basic parameters for each outer code $C_i$ are given. Moreover, a formula to count the number of these codes is obtained, the dual code for each left $D_{2n}$-code is determined and all self-dual left $D_{2n}$-codes and self-orthogonal left $D_{2n}$-codes over $R$ are presented, respectively.

preprint2015arXiv

Cyclic codes over $\mathbb{F}_{2^m}[u]/\langle u^k\rangle$ of oddly even length

Let $\mathbb{F}_{2^m}$ be a finite field of characteristic $2$ and $R=\mathbb{F}_{2^m}[u]/\langle u^k\rangle=\mathbb{F}_{2^m} +u\mathbb{F}_{2^m}+\ldots+u^{k-1}\mathbb{F}_{2^m}$ ($u^k=0$) where $k\in \mathbb{Z}^{+}$ satisfies $k\geq 2$. For any odd positive integer $n$, it is known that cyclic codes over $R$ of length $2n$ are identified with ideals of the ring $R[x]/\langle x^{2n}-1\rangle$. In this paper, an explicit representation for each cyclic code over $R$ of length $2n$ is provided and a formula to count the number of codewords in each code is given. Then a formula to calculate the number of cyclic codes over $R$ of length $2n$ is obtained. Moreover, the dual code of each cyclic code and self-dual cyclic codes over $R$ of length $2n$ are investigated. (AAECC-1522)

preprint2015arXiv

Experimental round-robin differential phase-shift quantum key distribution

In conventional quantum key distribution (QKD) protocols, security is guaranteed by estimating the amount of leaked information through monitoring signal disturbance, which, in practice, is generally caused by environmental noise and device imperfections rather than eavesdropping. Such estimation therefore tends to overrate the amount of leaked information in practice, leading to a fundamental threshold on the bit error rate. The threshold becomes a bottleneck in the development of practical QKD systems. In classical communication, according to Shannon's communication theory, information can be transmitted through a noisy channel even if the background noise is very strong compared to the signal, and hence the threshold of the bit error rate tends to 50%. One might wonder whether a QKD scheme can also tolerate an error rate as high as 50%. The question is answered affirmatively by the recent work on the round-robin differential phase-shift (RRDPS) protocol, which breaks through the fundamental threshold of the bit error rate and indicates another potential direction in the field of quantum cryptography. The key challenge in realizing the RRDPS scheme lies in the measurement device, which requires a variable-delay interferometer. The delay needs to be chosen from a set of predetermined values randomly. Such measurement can be realized by switching between many interferometers with different delays at a high speed in accordance with the system repetition rate. The more delay values there are to choose from, the higher the error rate that can be tolerated. By designing an optical system with multiple switches and employing an active phase stabilization technology, we successfully construct a variable-delay interferometer with 128 actively selectable delays. With this measurement, we experimentally demonstrate the RRDPS QKD protocol and obtain a final key rate of 15.54 bps over a total loss of 18 dB with an 8.9% error rate.

preprint2015arXiv

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

This paper proposes a unified framework to quantify local and global inferential uncertainty for high dimensional nonparanormal graphical models. In particular, we consider the problems of testing the presence of a single edge and constructing a uniform confidence subgraph. Due to the presence of unknown marginal transformations, we propose a pseudo likelihood based inferential approach. In sharp contrast to the existing high dimensional score test method, our method is free of tuning parameters given an initial estimator, and extends the scope of the existing likelihood based inferential framework. Furthermore, we propose a U-statistic multiplier bootstrap method to construct the confidence subgraph. We show that the constructed subgraph is contained in the true graph with probability greater than a given nominal level. Compared with existing methods for constructing confidence subgraphs, our method does not rely on Gaussian or sub-Gaussian assumptions. The theoretical properties of the proposed inferential methods are verified by thorough numerical experiments and real data analysis.

preprint2015arXiv

On $(α+uβ)$-constacyclic codes of length $p^sn$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$

Let $\mathbb{F}_{p^m}$ be a finite field of cardinality $p^m$ and $R=\mathbb{F}_{p^m}[u]/\langle u^2\rangle=\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ $(u^2=0)$, where $p$ is an odd prime and $m$ is a positive integer. For any $α,β\in \mathbb{F}_{p^m}^{\times}$, the aim of this paper is to represent all distinct $(α+uβ)$-constacyclic codes over $R$ of length $p^sn$ and their dual codes, where $s$ is a nonnegative integer and $n$ is a positive integer satisfying ${\rm gcd}(p,n)=1$. Especially, all distinct $(2+u)$-constacyclic codes of length $6\cdot 5^t$ over $\mathbb{F}_{3}+u\mathbb{F}_3$ and their dual codes are listed, where $t$ is a positive integer.

preprint 2015 arXiv

On a class of $(δ+αu^2)$-constacyclic codes over $\mathbb{F}_{q}[u]/\langle u^4\rangle$

Let $\mathbb{F}_{q}$ be a finite field of cardinality $q$, $R=\mathbb{F}_{q}[u]/\langle u^4\rangle=\mathbb{F}_{q}+u\mathbb{F}_{q}+u^2\mathbb{F}_{q}+u^3\mathbb{F}_{q}$ $(u^4=0)$ which is a finite chain ring, and $n$ be a positive integer satisfying ${\rm gcd}(q,n)=1$. For any $δ,α\in \mathbb{F}_{q}^{\times}$, an explicit representation for all distinct $(δ+αu^2)$-constacyclic codes over $R$ of length $n$ is given, and the dual code for each of these codes is determined. For the case of $q=2^m$ and $δ=1$, all self-dual $(1+αu^2)$-constacyclic codes over $R$ of odd length $n$ are provided.

preprint 2015 arXiv

Training Conditional Random Fields with Natural Gradient Descent

We propose a novel parameter estimation procedure that works efficiently for conditional random fields (CRF). This algorithm is an extension of maximum likelihood estimation (MLE), using loss functions defined by Bregman divergences which measure the proximity between the model expectation and the empirical mean of the feature vectors. This leads to a flexible training framework from which multiple update strategies can be derived using natural gradient descent (NGD). We carefully choose the convex function inducing the Bregman divergence so that the types of updates are reduced, while making the optimization procedure more effective by transforming the gradients of the log-likelihood loss function. The derived algorithms are very simple and can be easily implemented on top of the existing stochastic gradient descent (SGD) optimization procedure, yet they are very effective, as illustrated by experimental results.
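As background for the loss functions mentioned above, the Bregman divergence induced by a differentiable convex function $F$ is $D_F(p,q)=F(p)-F(q)-\langle\nabla F(q),p-q\rangle$. The sketch below evaluates this generic definition for two textbook choices of $F$. These choices and all function names are illustrative assumptions; the abstract does not say which convex function the authors select.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(F: Callable[[Vector], float],
            grad_F: Callable[[Vector], List[float]],
            p: Vector, q: Vector) -> float:
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


def half_sq_norm(x: Vector) -> float:
    return 0.5 * sum(v * v for v in x)


def half_sq_norm_grad(x: Vector) -> List[float]:
    return list(x)


def neg_entropy(x: Vector) -> float:
    return sum(v * math.log(v) for v in x)  # requires strictly positive entries


def neg_entropy_grad(x: Vector) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


if __name__ == "__main__":
    p, q = [0.2, 0.8], [0.5, 0.5]
    # Half squared norm induces half the squared Euclidean distance.
    print(bregman(half_sq_norm, half_sq_norm_grad, p, q))
    # Negative entropy induces the KL divergence on probability vectors.
    print(bregman(neg_entropy, neg_entropy_grad, p, q))
```

In the setting of the abstract, `p` and `q` would play the roles of the empirical feature mean and the model expectation, so different choices of $F$ give different notions of proximity between the two.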

preprint 2013 arXiv

Entanglement-based quantum key distribution with biased basis choice via free space

We report a free-space entanglement-based quantum key distribution experiment, implementing the biased basis protocol between two sites which are 15.3 km apart. Photon pairs from a polarization-entangled source are distributed through two 7.8-km free-space optical links. An optimal bias of 20:80 between the X and Z bases is used. A post-processing scheme with finite-key analysis is applied to extract the final secure key. After three hours of continuous operation at night, a 4293-bit secure key is obtained, with a final key rate of 0.124 bit per raw key bit, which increases the final key rate by 14.8% compared to the standard BB84 case. Our results experimentally demonstrate that the efficient BB84 protocol, which increases key generation efficiency by biasing Alice and Bob's basis choices, is potentially useful for ground-satellite quantum communication.
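The efficiency gain from biasing the basis choice can be seen from the sifting step alone. The sketch below assumes that both parties pick their bases independently with the same bias, which the abstract does not state explicitly, and the function name is illustrative. It does not attempt to reproduce the reported 14.8% gain, which is a final key rate after finite-key analysis.

```python
def sifting_fraction(p_z: float) -> float:
    """Probability that two independent parties with the same bias pick the same basis."""
    if not 0.0 <= p_z <= 1.0:
        raise ValueError("p_z must be a probability")
    p_x = 1.0 - p_z
    return p_x * p_x + p_z * p_z


if __name__ == "__main__":
    print(sifting_fraction(0.5))  # unbiased BB84: half of the detections survive sifting
    print(sifting_fraction(0.8))  # 20:80 bias: roughly 68% survive sifting
```

A stronger bias keeps more detections after sifting but leaves fewer events in the minority basis for parameter estimation, which is why a finite-key analysis is needed to find the optimal bias.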

preprint 2013 arXiv

Experimental Single-Photon Transmission from Satellite to Earth

Free-space quantum communication with satellites opens a promising avenue for a global secure quantum network and large-scale tests of quantum foundations. Recently, numerous experimental efforts have been carried out towards this ambitious goal. However, one essential step, transmitting single photons from the satellite to the ground with a high signal-to-noise ratio (SNR) in realistic environments, remains experimentally challenging. Here, we report a direct experimental demonstration of the satellite-ground transmission of a quasi-single-photon source. In the experiment, single photons (~0.85 photon per pulse) are generated by reflecting weak laser pulses back to earth with a cube-corner retro-reflector on the satellite Champ, collected by a 600-mm diameter telescope at the ground station, and finally detected by single-photon counting modules (SPCMs) after 400-km free-space link transmission. With the help of high-accuracy time synchronization, a narrow receiver field-of-view (FOV) and high-repetition-rate pulses (76 MHz), an SNR of better than 16:1 is obtained, which is sufficient for secure quantum key distribution. Our experimental results represent an important step towards satellite-ground quantum communication.
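To give a sense of what "~0.85 photon per pulse" means for a quasi-single-photon source, the sketch below computes photon-number probabilities under the assumption that the reflected weak laser pulses follow Poisson statistics with mean 0.85. That statistical model and the function name are assumptions for illustration and are not stated in the abstract.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k photons in a pulse with the given mean photon number."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    mean_photons = 0.85  # value quoted in the abstract
    p_vacuum = poisson_pmf(0, mean_photons)
    p_single = poisson_pmf(1, mean_photons)
    p_multi = 1.0 - p_vacuum - p_single
    print(f"vacuum: {p_vacuum:.3f}, single: {p_single:.3f}, multi: {p_multi:.3f}")
```

Under this model a sizeable share of pulses are empty or contain more than one photon, which is why such a source is described as quasi-single-photon.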

preprint 2013 arXiv

Experimental unconditionally secure bit commitment

Bit commitment is a fundamental cryptographic task that guarantees a secure commitment between two mutually mistrustful parties and is a building block for many cryptographic primitives, including coin tossing, zero-knowledge proofs, oblivious transfer and secure two-party computation. Unconditionally secure bit commitment was thought to be impossible until recent theoretical protocols that combine quantum mechanics and relativity were shown to elude previous impossibility proofs. Here we implement such a bit commitment protocol. In the experiment, the committer performs quantum measurements using two quantum key distribution systems and the results are transmitted via free-space optical communication to two agents separated by more than 20 km. The security of the protocol relies on the properties of quantum information and relativity theory. We show that, in each run of the experiment, a bit is successfully committed with a cheating probability of less than $5.68\times10^{-2}$. Our result demonstrates unconditionally secure bit commitment and the experimental feasibility of relativistic quantum communication.

preprint 2013 arXiv

Quantum teleportation and entanglement distribution over 100-kilometre free-space channels

A long-standing goal for quantum communication is to transfer a quantum state over arbitrary distances. Free-space quantum communication provides a promising solution towards this challenging goal. Here, through a 97-km free-space channel, we demonstrate long-distance quantum teleportation over a 35-53 dB loss one-link channel, and entanglement distribution over a 66-85 dB high-loss two-link channel. We achieve an average fidelity of 80.4(9)% for teleporting six distinct initial states and observe the violation of the Clauser-Horne-Shimony-Holt inequality after distributing entanglement. Besides being of fundamental interest, our result represents a significant step towards a global quantum network. Moreover, the high-frequency and high-accuracy acquiring, pointing and tracking technique developed in our experiment provides an essential tool for future satellite-based quantum communication.
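For context on the Clauser-Horne-Shimony-Holt (CHSH) test mentioned above, the sketch below compares the bound of 2 obeyed by deterministic local assignments with the value $2\sqrt{2}$ predicted for an ideal singlet state. It assumes the textbook correlation $E(a,b)=-\cos(a-b)$ for measurement directions given as Bloch-sphere angles. The angles and function names are illustrative and are not the settings or data of the experiment.

```python
import math
from itertools import product


def chsh(e_ab: float, e_ab2: float, e_a2b: float, e_a2b2: float) -> float:
    """CHSH combination S = |E(a,b) - E(a,b') + E(a',b) + E(a',b')|."""
    return abs(e_ab - e_ab2 + e_a2b + e_a2b2)


def singlet_correlation(a: float, b: float) -> float:
    """ASSUMED ideal singlet correlation for Bloch-sphere angles a and b."""
    return -math.cos(a - b)


def local_bound() -> float:
    """Maximum S over all deterministic +/-1 outcome assignments."""
    best = 0.0
    for a1, a2, b1, b2 in product((-1, 1), repeat=4):
        best = max(best, chsh(a1 * b1, a1 * b2, a2 * b1, a2 * b2))
    return best


if __name__ == "__main__":
    a, a2 = 0.0, math.pi / 2
    b, b2 = math.pi / 4, 3 * math.pi / 4
    s_quantum = chsh(singlet_correlation(a, b), singlet_correlation(a, b2),
                     singlet_correlation(a2, b), singlet_correlation(a2, b2))
    print("local bound:", local_bound())
    print("ideal singlet value:", s_quantum)
```

Any measured value of S above 2 therefore indicates that the distributed photon pairs remained entangled after the high-loss links.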

56 published item(s)

Forbidden second harmonics in centrosymmetric bilayer crystals

Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Benign Overfitting in Two-layer Convolutional Neural Networks

Building Machine Translation Systems for the Next Thousand Languages

Description-Driven Task-Oriented Dialog Modeling

Micius quantum experiments in space

Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation

Portable ground stations for space-to-ground quantum key distribution

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

Spin Manipulation by Giant Valley-Zeeman Spin-Orbit Field in Atom-Thick WSe2

The geometry of integration in text classification RNNs

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Unsupervised Slot Schema Induction for Task-oriented Dialog

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

Benign Overfitting in Adversarially Robust Linear Classification

Echo State Speech Recognition

Fractional Chern insulators in magic-angle twisted bilayer graphene

High-Temperature Structure Detection in Ferromagnets

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

Quantum Random Number Generation with Uncharacterized Laser and Sunlight

Agnostic Learning of a Single Neuron with Gradient Descent

An explicit expression for Euclidean self-dual cyclic codes of length $2^k$ over Galois ring ${\rm GR}(4,m)$

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Deep-Learning-Enabled Fast Optical Identification and Characterization of Two-Dimensional Materials

Echo State Neural Machine Translation

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Managing Recurrent Virtual Network Updates in Multi-Tenant Datacenters: A System Perspective

Super-resolution single-photon imaging at 8.2 kilometers

Tunable Phase Boundaries and Ultra-Strong Coupling Superconductivity in Mirror Symmetric Magic-Angle Trilayer Graphene

Cascade of Phase Transitions and Dirac Revivals in Magic Angle Graphene

Electric Field Tunable Correlated States and Magnetic Phase Transitions in Twisted Bilayer-Bilayer Graphene

Mapping the twist angle and unconventional Landau levels in magic angle graphene

Single-photon computational 3D imaging at 45 km

Spaceborne low-noise single-photon detection for satellite-based quantum communications

Strange metal in magic-angle graphene with near Planckian dissipation

Universal transfer and stacking technique of van der Waals heterostructures for spintronics

Full Implementation of four-intensity Protocol for Measurement-Device-Independent Quantum Key Distribution over Asymmetric Channel

Complete classification of $(δ+αu^2)$-constacyclic codes over $\mathbb{F}_{2^m}[u]/\langle u^4\rangle$ of oddly even length

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Left dihedral codes over Galois rings ${\rm GR}(p^2,m)$

Cyclic codes over $\mathbb{F}_{2^m}[u]/\langle u^k\rangle$ of oddly even length

Experimental round-robin differential phase-shift quantum key distribution

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

On $(α+uβ)$-constacyclic codes of length $p^sn$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$

On a class of $(δ+αu^2)$-constacyclic codes over $\mathbb{F}_{q}[u]/\langle u^4\rangle$

Training Conditional Random Fields with Natural Gradient Descent

Entanglement-based quantum key distribution with biased basis choice via free space

Experimental Single-Photon Transmission from Satellite to Earth

Experimental unconditionally secure bit commitment

Quantum teleportation and entanglement distribution over 100-kilometre free-space channels