Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
192 works
0 followers
51 topics
4 close collaborators

Published work

192 published item(s)

Preprint · 2026 · arXiv

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To address this limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with general-purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that, as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus, and that DCI opens a broader interface-design space for agentic search.
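As a rough illustration of the kind of tool call DCI relies on, the sketch below scans raw text files for a conjunction of exact regex clues and returns a local context window around the first hit, roughly what an agent does with `grep` followed by a file read. The function name `grep_conjunction`, the `*.txt` file layout and the window size are hypothetical choices for illustration, not the paper's tooling.

```python
import re
from pathlib import Path


def grep_conjunction(root, patterns, window=80):
    """Return (file name, snippet) pairs for files that match ALL regex patterns.

    Hypothetical helper: each snippet is the local context around the first
    match of the first pattern, mimicking `grep` followed by a quick file read.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Sparse clue conjunction: every clue must appear somewhere in the file.
        if all(c.search(text) for c in compiled):
            m = compiled[0].search(text)
            start, end = max(0, m.start() - window), m.end() + window
            hits.append((path.name, text[start:end]))  # local context check
    return hits
```

Because nothing is indexed ahead of time, a file added or edited a moment ago is searchable on the very next call, which is the property the abstract highlights for evolving local corpora.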

Preprint · 2026 · arXiv

MAIC-UI: Making Interactive Courseware with Generative UI

Creating interactive STEM courseware traditionally requires HTML/CSS/JavaScript expertise, which creates barriers for educators. While generative AI can produce HTML code, existing tools generate static presentations rather than interactive simulations, struggle with long documents, and lack mechanisms for ensuring pedagogical accuracy. Furthermore, fully regenerating the courseware after each modification takes 200-600 seconds, disrupting creative flow. We present MAIC-UI, a zero-code authoring system that enables educators to create and rapidly edit interactive courseware from textbooks, PPTs, and PDFs. MAIC-UI employs: (1) structured knowledge analysis with multi-modal understanding to ensure pedagogical rigor; (2) a two-stage generate-verify-optimize pipeline that separates content alignment from visual refinement; and (3) Click-to-Locate editing with Unified Diff-based incremental generation, achieving sub-10-second iteration cycles. A controlled lab study with 40 participants shows that MAIC-UI reduces editing iterations (4.9 vs. 7.0) and significantly improves learnability and controllability compared with direct Text-to-HTML generation. A three-month classroom deployment with 53 high school students demonstrates that MAIC-UI fosters learning agency and reduces outcome disparities: the pilot class achieved a 9.21-point gain in STEM subjects, compared with a change of -2.32 points in the control classes. Our code is available at https://github.com/THU-MAIC/MAIC-UI.
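The speed-up from diff-based editing comes from emitting only the changed lines instead of the whole page. A minimal sketch, assuming the generator emits a standard unified diff (the format Python's `difflib.unified_diff` produces); the `apply_unified_diff` helper is hypothetical and not taken from the MAIC-UI code base.

```python
import difflib
import re

# Matches hunk headers such as "@@ -3,4 +3,5 @@" or "@@ -1,0 +2 @@".
HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(old_lines, diff_lines):
    """Apply a unified diff (e.g. from difflib.unified_diff) to old_lines.

    Hypothetical helper: copies untouched lines, keeps context lines,
    drops '-' lines and inserts '+' lines, validating against the original.
    """
    out, pos = [], 0  # pos: index of the next unconsumed line of old_lines
    in_hunk = False
    for line in diff_lines:
        m = HUNK.match(line)
        if m:
            in_hunk = True
            start, length = int(m.group(1)), int(m.group(2) or 1)
            # Convert to 0-based; an empty range names the line *before* the change.
            start = start - 1 if length > 0 else start
            out.extend(old_lines[pos:start])  # untouched lines before the hunk
            pos = start
        elif not in_hunk:
            continue  # skip the "---" / "+++" file headers
        elif line.startswith((" ", "-")):
            if old_lines[pos] != line[1:]:
                raise ValueError(f"diff does not match original at line {pos + 1}")
            if line.startswith(" "):
                out.append(old_lines[pos])  # context line: keep it
            pos += 1  # deleted line: consume it without copying
        elif line.startswith("+"):
            out.append(line[1:])  # inserted line
    out.extend(old_lines[pos:])  # tail after the last hunk
    return out
```

For a one-line title change, `difflib.unified_diff(old, new)` yields a patch of a few lines regardless of page length, which is what keeps each edit cycle short.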

Preprint · 2025 · arXiv

Antarctic TianMu Staring Observation Project II: Data reduction and preliminary results

The Antarctic TianMu Staring Observation Program is a time-domain optical sky survey carried out in Antarctica that offers large sky coverage, high-cadence sampling, and long-period staring. It exploits the exceptional observing conditions in Antarctica to conduct high-cadence time-domain sky surveys. We have developed an 18-cm aperture Antarctic TianMu prototype, which has been deployed at Zhongshan Station in Antarctica and has completed two consecutive years of trouble-free observations, obtaining more than 300,000 raw images. This paper systematically describes the prototype telescope's 2023 commissioning data, the primary data processing pipeline, and the preliminary data products. The core pipeline comprises four key stages: data preprocessing, instrumental effect correction, astrometric solution, and full-field stellar photometry. Here, we release the 2023 data products, which include reduced image data and a photometric catalog; preliminary analyses demonstrate robust performance. Using Gaia Data Release 3 as the reference catalog, the astrometric precision, quantified by the root mean square of positional errors, is better than approximately 2 arcseconds, validating the observational capabilities of the system. For a 30-second exposure, the G-band detection limit reaches 15.00 mag at a detection threshold of 1.5σ. Photometric errors are below 0.1 mag for the majority of stars brighter than 14.00 mag. The precision improves further, to better than 0.01 mag, for most stars brighter than 11.00 mag with adaptive aperture photometry and for most stars brighter than 12.00 mag with point spread function photometry.
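The astrometric figure of merit above is an RMS of positional offsets against Gaia DR3. A minimal sketch of that computation, assuming the detections are already cross-matched to Gaia and that "positional error" means the total angular offset (the abstract does not say whether the RMS is per axis or total); `astrometric_rms_arcsec` is a hypothetical name.

```python
import math


def astrometric_rms_arcsec(measured, reference):
    """RMS angular offset (arcsec) between matched (ra, dec) pairs given in degrees.

    Small-angle approximation: the RA offset is scaled by cos(dec) of the reference.
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equally long, non-empty lists of matched positions")
    total = 0.0
    for (ra, dec), (ra0, dec0) in zip(measured, reference):
        dra = (ra - ra0 + 180.0) % 360.0 - 180.0  # wrap RA difference into [-180, 180)
        dra *= math.cos(math.radians(dec0))  # convert to a great-circle offset
        ddec = dec - dec0
        total += (dra * dra + ddec * ddec) * 3600.0 ** 2  # deg^2 -> arcsec^2
    return math.sqrt(total / len(measured))
```

The `cos(dec)` factor matters at Zhongshan Station's high southern declinations, where a given RA difference in degrees corresponds to a much smaller angle on the sky.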

Preprint · 2025 · arXiv

Chiral dual spin currents field-free perpendicular switching by altermagnet RuO2

Conventional spintronic mechanisms, such as spin-transfer and spin-orbit torques based on the spin current, rely on breaking time-reversal symmetry to manipulate magnetic moments. In contrast, for spatially separated dual spin currents, the time-reversal-invariant vector chirality emerges as a critical factor governing magnetization dynamics. Here, we investigate field-free perpendicular magnetization switching in an altermagnet RuO2/ferromagnet/heavy metal Pt trilayer, driven by chiral dual spin currents (CDSC). We demonstrate that the chirality of these dual spin currents plays the deterministic role in breaking out-of-plane symmetry. The intrinsic spin-splitting effect of the d-wave altermagnet generates an x-polarized spin component, and the interplay of non-collinear spin currents from the two adjacent layers induces a helical magnetic texture within the intermediate layer. The resulting intralayer exchange coupling manifests as an effective in-plane magnetic field, facilitating deterministic switching. This distinct physical picture, validated by switching measurements and micromagnetic simulations, reveals that the switching polarity is dictated by chirality rather than by charge current polarity. With its novel symmetry and low power consumption, CDSC offers a promising paradigm for next-generation high-performance spintronic architectures.

Preprint · 2024 · arXiv

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

In large-scale software development, the demand for dynamic and multifaceted static code analysis exceeds the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, supporting daily scans of over 10 billion lines of code and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types specifically for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, using a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a purpose-built language, CodeFuse-Query enables complex tasks to be formulated as logical expressions, harnessing Datalog's declarative power. This paper provides empirical evidence for CodeFuse-Query's approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, and its potential to reshape static code analysis for large-scale software development. Furthermore, in the spirit of collaboration and advancing the field, the project is open source and its repository is publicly available.
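To make "code analysis as a data computation task" concrete, here is a tiny Datalog-style rule evaluated over extracted facts: transitive call reachability, computed with semi-naive evaluation (each round joins only the newly derived facts). This is a generic illustration; the `calls` fact shape and the `reachable` function are hypothetical and do not reflect the COREF schema or Godel syntax.

```python
def reachable(calls):
    """Evaluate the Datalog program
        reach(X, Z) :- calls(X, Z).
        reach(X, Z) :- reach(X, Y), calls(Y, Z).
    over a set of (caller, callee) facts, using semi-naive evaluation.
    """
    by_caller = {}
    for x, y in calls:
        by_caller.setdefault(x, set()).add(y)  # index calls(Y, Z) on Y for the join
    reach = set(calls)
    delta = set(calls)  # facts derived in the previous round
    while delta:
        # Join only the new facts with calls/2; drop anything already known.
        new = {(x, z) for (x, y) in delta for z in by_caller.get(y, ())} - reach
        reach |= new
        delta = new
    return reach
```

The fixpoint terminates even on recursive call graphs because each round only keeps facts not seen before.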

Preprint · 2024 · arXiv

LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery

Lake extraction from remote sensing images is challenging due to complex lake shapes and inherent data noise. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. This paper proposes a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains three main modules: a CNN encoder, a Transformer encoder, and cross-encoder fusion. The CNN encoder recovers local spatial information and improves fine-scale details. Simultaneously, the Transformer encoder captures long-range dependencies between sequences of any length, allowing it to obtain global features and context information. The cross-encoder fusion module integrates the local and global features to improve mask prediction. Experimental results show that LEFormer consistently achieves state-of-the-art performance and efficiency on the Surface Water and Qinghai-Tibet Plateau Lake datasets. Specifically, LEFormer achieves 90.86% and 97.42% mIoU on the two datasets, respectively, with a parameter count of 3.61M, about 20× smaller than the previous best lake extraction method. The source code is available at https://github.com/BastianChen/LEFormer.
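mIoU is the metric behind the 90.86% and 97.42% figures. A minimal sketch, assuming a two-class (background = 0, lake = 1) labeling averaged over classes; the abstract does not state the exact convention, and `mean_iou` is a hypothetical helper operating on flattened label sequences.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union over classes for flat label sequences."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, the background IoU is 1/2 and the lake IoU is 2/3, giving an mIoU of 7/12.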

Preprint · 2024 · arXiv

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.

Preprint · 2023 · arXiv

Adaptive Rank-based Tests for High Dimensional Mean Problems

The Wilcoxon signed-rank test and the Wilcoxon-Mann-Whitney test are commonly employed as one-sample and two-sample mean tests in one-dimensional hypothesis problems. For high-dimensional mean test problems, we derive the asymptotic distribution of the maximum of the per-variable rank statistics and propose a max-type test. This max-type test is then combined with a sum-type test, exploiting the asymptotic independence of the two statistics under stationarity and strong-mixing assumptions. Our numerical studies show that the combined test is robust and outperforms other methods, especially for heavy-tailed distributions.
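The max/sum/combine structure can be sketched for the one-sample case as follows. This is an illustration under simplifying assumptions, not the paper's adaptive test: columns are treated as independent (the Gumbel limit for the maximum of squared standardized statistics and the normal limit for their sum then apply directly), ties and zeros are ignored, and the two p-values are merged with the min-p rule that asymptotic independence permits. Function names are hypothetical.

```python
import math


def signed_rank_stats(X):
    """Standardized Wilcoxon signed-rank statistic for each column of X (n rows, p columns).

    Assumes continuous data (no zeros or tied |x| values) for brevity.
    """
    n, p = len(X), len(X[0])
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 6.0)  # null s.d. of the signed-rank sum
    stats = []
    for j in range(p):
        col = [row[j] for row in X]
        order = sorted(range(n), key=lambda i: abs(col[i]))
        w = 0.0
        for rank, i in enumerate(order, start=1):
            w += rank if col[i] > 0 else -rank
        stats.append(w / sd)
    return stats


def combined_rank_test(X):
    """Min-p combination of a max-type and a sum-type test (needs p >= 3 columns)."""
    t2 = [t * t for t in signed_rank_stats(X)]
    p = len(t2)
    # Max-type: max T_j^2 - 2 log p + log log p -> Gumbel with CDF exp(-exp(-x/2)/sqrt(pi)).
    x = max(t2) - 2 * math.log(p) + math.log(math.log(p))
    p_max = 1.0 - math.exp(-math.exp(-x / 2) / math.sqrt(math.pi))
    # Sum-type: (sum T_j^2 - p) / sqrt(2p) -> N(0, 1) when the columns are independent.
    z = (sum(t2) - p) / math.sqrt(2 * p)
    p_sum = 0.5 * math.erfc(z / math.sqrt(2))
    # Asymptotic independence of the two statistics justifies the min-p rule.
    return 1.0 - (1.0 - min(p_max, p_sum)) ** 2
```

The max-type part is powerful against a few large shifts, and the sum-type part is powerful against many small ones. Combining them is what makes the test adaptive to the unknown sparsity of the alternative.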

Preprint · 2023 · arXiv

Deep Learning of Near Field Beam Focusing in Terahertz Wideband Massive MIMO Systems

Employing large antenna arrays and large bandwidths has the potential to bring very high data rates to future wireless communication systems. However, this pushes the system into the near-field regime and also makes conventional transceiver architectures suffer from wideband effects. To address these problems, we propose a low-complexity frequency-aware beamforming solution designed for hybrid time-delay and phase-shifter based RF architectures. To reduce complexity, the joint design problem of the time delays and phase shifts is decomposed into two subproblems: a signal-model-inspired online learning framework learns the shifts of the quantized analog phase shifters, and a low-complexity geometry-assisted method configures the delay settings of the time-delay units. Simulation results highlight the efficacy of the proposed solution in achieving robust performance across a wide frequency range for large antenna array systems.
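The geometry-assisted delay step can be illustrated with exact spherical-wave path lengths. This is a minimal sketch, assuming a uniform linear array with one true-time-delay unit per element. Real hybrid designs usually share one delay unit per subarray, and the phase-shifter learning stage is omitted. `focusing_delays` is a hypothetical name.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def focusing_delays(num_elements, spacing, focus_r, focus_theta):
    """True-time delays (s) that align a ULA's element signals at a near-field focus.

    Elements sit on the x-axis centred at the origin; the focus lies at distance
    focus_r (m) and angle focus_theta (radians from broadside).
    """
    fx = focus_r * math.sin(focus_theta)
    fy = focus_r * math.cos(focus_theta)
    dists = []
    for n in range(num_elements):
        x = (n - (num_elements - 1) / 2) * spacing
        dists.append(math.hypot(fx - x, fy))  # exact spherical-wave path length
    d_max = max(dists)
    # The element with the longest path gets zero delay; the others wait to catch up.
    return [(d_max - d) / C for d in dists]
```

Because a pure time delay shifts every frequency by the same amount of time, these settings stay valid across the band. Phase shifters alone cannot do this, which is why they suffer from the wideband effects mentioned above.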

Preprint · 2023 · arXiv

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
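The abstract does not spell out SeedTopicMine's ensemble ranking rule. Summed reciprocal rank is a common way to fuse several ranked lists, and it is assumed here only to show the idea: a term ranked moderately high by all three context types can overtake a term ranked first by only one. The `ensemble_rank` helper and the example term lists are hypothetical.

```python
def ensemble_rank(rankings, top_k=None):
    """Fuse several ranked term lists by summed reciprocal rank.

    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for r, term in enumerate(ranking, start=1):
            scores[term] = scores.get(term, 0.0) + 1.0 / r
    fused = sorted(scores, key=lambda t: (-scores[t], t))
    return fused[:top_k] if top_k else fused


# One ranked list per context type (hypothetical example for a "soccer" seed).
embedding_ctx = ["goal", "match", "striker"]
plm_ctx = ["match", "goal", "league"]
sentence_ctx = ["striker", "match", "goal"]
print(ensemble_rank([embedding_ctx, plm_ctx, sentence_ctx]))
```

In an iterative scheme, the top fused terms would be added to the seed set and the three rankings recomputed.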

Preprint · 2023 · arXiv

MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model

Diffusion probabilistic models (DPMs) have recently become one of the hottest topics in computer vision. Their image generation applications, such as Imagen, Latent Diffusion Models and Stable Diffusion, have shown impressive generation capabilities and aroused extensive discussion in the community. Many recent studies have also found them useful in other vision tasks, such as image deblurring, super-resolution and anomaly detection. Inspired by the success of DPMs, we propose the first DPM-based model for general medical image segmentation tasks, which we name MedSegDiff. To enhance step-wise regional attention in the DPM for medical image segmentation, we propose dynamic conditional encoding, which establishes state-adaptive conditions for each sampling step. We further propose a Feature Frequency Parser (FF-Parser) to eliminate the negative effect of high-frequency noise components in this process. We verify MedSegDiff on three medical segmentation tasks with different image modalities: optic cup segmentation on fundus images, brain tumor segmentation on MRI images, and thyroid nodule segmentation on ultrasound images. The experimental results show that MedSegDiff outperforms state-of-the-art (SOTA) methods by a considerable margin, indicating the generalization and effectiveness of the proposed model. Our code is released at https://github.com/WuJunde/MedSegDiff.
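The idea behind a frequency-domain parser can be sketched as: transform a feature map with an FFT, reweight each frequency, and transform back. This is a minimal NumPy illustration assuming a single-channel feature map and a fixed weight map. In the model the weights would be learned, and the exact FF-Parser design is in the paper and repository. `ff_parser` is a hypothetical name.

```python
import numpy as np


def ff_parser(feature, weight):
    """Reweight a 2D feature map in the frequency domain.

    feature: real array of shape (H, W).
    weight:  real array broadcastable to (H, W // 2 + 1), the rfft2 half-spectrum
             (learnable in a real model, fixed here).
    """
    spectrum = np.fft.rfft2(feature)  # complex half-spectrum
    filtered = spectrum * weight  # element-wise attenuation or amplification
    return np.fft.irfft2(filtered, s=feature.shape)
```

With all-ones weights the map passes through unchanged. Zeroing the bin at index `[4, 4]` of an 8×8 input removes a pixel-level checkerboard, the highest-frequency pattern, while leaving a constant background intact.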

Preprint · 2023 · arXiv

PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry

In this paper, we introduce PCR-CG: a novel 3D point cloud registration module that explicitly embeds color signals into the geometry representation. Unlike previous methods that use only a geometry representation, our module is specifically designed to correlate color with geometry for the point cloud registration task. Our key contribution is a 2D-3D cross-modality learning algorithm that embeds the deep features learned from color signals into the geometry representation. With our 2D-3D projection module, the pixel features in a square region centered at correspondences perceived from images are effectively correlated with point clouds. In this way, the overlapping regions can be inferred not only from the point cloud but also from texture appearance. Adding color is non-trivial. We compare against a variety of baselines designed for adding color to 3D, such as exhaustively adding per-pixel features or RGB values in an implicit manner. We use Predator [25] as the baseline method and incorporate our proposed module into it. To validate the effectiveness of 2D features, we ablate different 2D pre-trained networks and show a positive correlation between the pre-trained weights and the task performance. Our experimental results indicate a significant improvement of 6.5% registration recall over the baseline method on the 3DLoMatch benchmark. We additionally apply our approach to other SOTA methods and observe consistent improvements, such as 2.4% registration recall over GeoTransformer and 3.5% over CoFiNet. Our study reveals a significant advantage of correlating explicit deep color features with the point cloud in the registration task.
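The 2D-3D link rests on projecting 3D points into the image and pooling pixel features around each projection. A minimal sketch, assuming points are already in the camera frame, a pinhole camera with intrinsics `fx, fy, cx, cy`, and one scalar feature per pixel (a real network would use deep feature vectors and camera extrinsics). Both helpers are hypothetical.

```python
def project_points(points, fx, fy, cx, cy):
    """Pinhole projection of camera-frame 3D points (x, y, z) to pixel coordinates (u, v)."""
    pix = []
    for x, y, z in points:
        if z <= 0:
            pix.append(None)  # behind the camera: no valid pixel
        else:
            pix.append((fx * x / z + cx, fy * y / z + cy))
    return pix


def gather_patch_feature(feature_map, u, v, half=1):
    """Average per-pixel features in a (2*half+1)^2 square centred at (u, v)."""
    h, w = len(feature_map), len(feature_map[0])
    cu, cv = int(round(u)), int(round(v))
    vals = [feature_map[r][c]
            for r in range(cv - half, cv + half + 1)
            for c in range(cu - half, cu + half + 1)
            if 0 <= r < h and 0 <= c < w]  # clip the square at the image border
    return sum(vals) / len(vals) if vals else None
```

Pooling over a square rather than reading a single pixel makes the lifted feature less sensitive to small projection or depth errors.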

Preprint · 2023 · arXiv

Quantum simulation of molecular response properties

Accurate modeling of the response of molecular systems to an external electromagnetic field is challenging on classical computers, especially in the regime of strong electronic correlation. In this paper, we develop a quantum linear response (qLR) theory to calculate molecular response properties on near-term quantum computers. Inspired by recently developed variants of the quantum counterpart of equation-of-motion (qEOM) theory, the qLR formalism employs excitation operator manifolds that satisfy the "killer condition", which offer a number of theoretical advantages along with reduced quantum resource requirements. We also use the qEOM framework in this work to calculate state-specific response properties. Further, through noiseless quantum simulations, we show that response properties calculated using the qLR approach are more accurate than those obtained from classical coupled-cluster-based linear response models, owing to the improved quality of the ground-state wavefunction obtained with the ADAPT-VQE algorithm.

Preprint · 2023 · arXiv

Super-Resolution Harmonic Retrieval of Non-Circular Signals

This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance exhibit Toeplitz and Hankel structures, respectively. Accordingly, the augmented covariance matrix constructed from the covariance and pseudo-covariance matrices is not only low rank but also jointly Toeplitz-Hankel structured. To efficiently exploit this structure for high estimation accuracy, we develop a low-rank Toeplitz-Hankel covariance reconstruction (LRTHCR) solution applied to the augmented covariance matrix. Further, we design a fitting error constraint so that the LRTHCR algorithm can be implemented flexibly without knowledge of the noise statistics. In addition, we provide a performance analysis of the proposed LRTHCR in practical settings. Simulation results reveal that LRTHCR achieves lower estimation errors than the benchmark methods.
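Why the structure appears can be checked numerically. For $x_t = \sum_k s_k e^{j\omega_k t} + n_t$ with uncorrelated, strictly non-circular $s_k$ (so $E[s_k^2] = p_k e^{j\phi_k}$) and circular noise, $R_{ab} = E[x_a x_b^*]$ depends only on $a-b$ (Toeplitz), and $P_{ab} = E[x_a x_b]$ depends only on $a+b$ (Hankel). The sketch below builds these model matrices and the augmented matrix $\begin{bmatrix} R & P \\ P^* & R^* \end{bmatrix}$. It does not implement the LRTHCR solver, and both function names are hypothetical.

```python
import cmath


def model_covariances(freqs, powers, phases, m, noise_var=0.0):
    """Model covariance R and pseudo-covariance P of x_t, t = 0..m-1.

    R[a][b] = E[x_a x_b^*] (Toeplitz), P[a][b] = E[x_a x_b] (Hankel);
    circular noise contributes only to the diagonal of R.
    """
    R = [[sum(p * cmath.exp(1j * w * (a - b)) for w, p in zip(freqs, powers))
          + (noise_var if a == b else 0.0) for b in range(m)] for a in range(m)]
    P = [[sum(p * cmath.exp(1j * (w * (a + b) + phi))
              for w, p, phi in zip(freqs, powers, phases)) for b in range(m)]
         for a in range(m)]
    return R, P


def augment(R, P):
    """Augmented covariance [[R, P], [conj(P), conj(R)]] of the stacked vector [x; x*]."""
    m = len(R)
    top = [R[a] + P[a] for a in range(m)]
    bottom = [[P[a][b].conjugate() for b in range(m)]
              + [R[a][b].conjugate() for b in range(m)] for a in range(m)]
    return top + bottom
```

The pseudo-covariance block is what a circular-signal method discards. Keeping it adds Hankel-structured equations that carry extra information about the same frequencies.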

Preprint · 2023 · arXiv

Tac2Structure: Object Surface Reconstruction Only through Multi Times Touch

Humans can perceive the surface texture of unfamiliar objects without relying on vision. Likewise, the sense of touch can play a crucial role for robots exploring the environment, particularly in scenes where vision is difficult to apply or occlusion is inevitable. Existing tactile surface reconstruction methods rely on external sensors or strong prior assumptions, which makes operation complex and limits their application scenarios. This paper presents Tac2Structure, a framework for low-drift surface reconstruction through multiple tactile measurements. Compared with existing algorithms, the proposed method uses only a new vision-based tactile sensor, without relying on external devices. Because reconstruction accuracy is easily affected by the contact pressure, we propose a correction algorithm to compensate for it. The proposed method also reduces the accumulated errors that readily arise during global object surface reconstruction. Multi-frame tactile measurements can accurately reconstruct object surfaces by jointly using a point cloud registration algorithm, a deep-learning-based loop-closure detection algorithm, and a pose graph optimization algorithm. Experiments verify that Tac2Structure achieves millimeter-level accuracy in reconstructing object surfaces, providing accurate tactile information for the robot to perceive its surroundings.
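The abstract does not name its point cloud registration algorithm. A standard building block of such pipelines (for example, the inner step of ICP) is the Kabsch/SVD solution for the rigid transform between paired points, sketched here with NumPy as an illustration rather than as Tac2Structure's method. `kabsch` is a hypothetical name.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ src @ R.T + t for paired 3D points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)  # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection solution
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

Chaining such pairwise transforms frame by frame accumulates drift. This is the error that loop-closure detection and pose graph optimization are meant to correct.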

Preprint · 2022 · arXiv

$J/ψ$ associated production with a bottom quark pair from the Higgs boson decay in next-to-leading order QCD

In this work, we investigate the next-to-leading order (NLO) QCD correction to $J/ψ$ associated production with a bottom quark pair from Higgs boson decay within the nonrelativistic QCD framework. From the numerical results, we find that the decay width of the process $H \rightarrow b+ J/ψ+\bar{b}$ at leading order (LO) comes mainly from the Fock state $^3S^{(8)}_1$, and that the NLO QCD corrections significantly enhance the LO decay width, by a factor of about 2. At NLO accuracy, the Fock-state channels $^3S^{(8)}_1$ and $^3P^{(8)}_J$ give the main contributions, accounting for about $68\%$ and $29\%$, respectively, of the total NLO decay width for $J/ψ$ associated production with a bottom quark pair from Higgs boson decay. Given the dominant contribution of the color-octet (CO) channels at NLO accuracy, the inclusive decay process $H\to b+J/ψ+\bar b + X$ could potentially be observed at future high-energy/high-luminosity colliders. The study of $J/ψ$ associated production with a bottom quark pair from Higgs boson decay is useful not only for studying the color-octet mechanism but also for investigating the coupling of the Higgs boson to the bottom quark.

Preprint · 2022 · arXiv

A Low-speed Intruder Star in Hyades: A Temporary Residence

We report a low-speed (about 21 km$\cdot$s$^{-1}$ with respect to the Sun) intruder member of the Hyades cluster based on data in the literature. The results show that the star is not a native member of the Hyades: its radial velocity is smaller than that of the cluster, and the difference exceeds the standard deviation of the cluster's radial velocity by a factor of 9. Furthermore, analyzing and comparing the orbits of this star and its host cluster suggests that the star may have intruded into the cluster within the past 2 Myr. If the star's current orbit remains unchanged, it may leave the cluster within the next 2 Myr. This implies that the intruder star may be only temporarily residing in the cluster. This study presents the first observational evidence of a star intruding into a cluster, suggesting that more such cases may be found.

Preprint · 2022 · arXiv

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

In human speech, the attitude of a speaker cannot be fully expressed only by the textual content. It has to come along with the intonation. Declarative questions are commonly used in daily Cantonese conversations, and they are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences due to the loss of semantic information. Though it has become more common to complement the systems with extra language models, their performance in modeling rising intonation is not well studied. In this paper, we propose to complement the Cantonese TTS model with a BERT-based statement/question classifier. We design different training strategies and compare their performance. We conduct our experiments on a Cantonese corpus named CanTTS. Empirical results show that the separate training approach obtains the best generalization performance and feasibility.

Preprint · 2022 · arXiv

A van der Waals Interface Hosting Two Groups of Magnetic Skyrmions

Multiple magnetic skyrmion phases provide an additional degree of freedom for skyrmion-based ultrahigh-density spin memory devices. Extending the field to two-dimensional van der Waals magnets is a rewarding challenge, where the realizable degrees of freedom (e.g., thickness, twisting angle and electrical gating) and high skyrmion density result in intriguing new properties and enhanced functionality. We report a van der Waals interface, formed by two 2D ferromagnets, Cr2Ge2Te6 and Fe3GeTe2, with Curie temperatures of ~65 K and ~205 K, respectively, that hosts two groups of magnetic skyrmions. Two sets of topological Hall effect signals are observed below 60 K, when Cr2Ge2Te6 is magnetically ordered. These two groups of skyrmions are directly imaged using magnetic force microscopy. Interestingly, the magnetic skyrmions persist in the heterostructure in the remanent state at zero applied magnetic field. Our results are promising for the realization of skyrmionic devices based on van der Waals heterostructures hosting multiple skyrmion phases.

Preprint · 2022 · arXiv

Accurate quantum simulation of molecular ground and excited states with a transcorrelated Hamiltonian

NISQ-era devices suffer from a number of challenges, such as limited qubit connectivity, short coherence times and sizable gate error rates. Quantum algorithms that require shallow circuit depths and low qubit counts are therefore desirable for taking advantage of these devices. We attempt to realize this with the help of the classical quantum chemical theories of canonical transformation and explicit correlation. In this work, compact ab initio Hamiltonians are generated classically through an approximate similarity transformation of the Hamiltonian with (a) an explicitly correlated two-body unitary operator, with generalized pair excitations, that removes the Coulombic electron-electron singularities from the Hamiltonian and (b) a unitary one-body operator that efficiently captures the orbital relaxation effects required for an accurate description of the excited states. The resulting transcorrelated Hamiltonians describe both ground and excited states of molecular systems in a balanced manner. Using the fermionic-ADAPT-VQE method based on the unitary coupled cluster with singles and doubles (UCCSD) ansatz and only a minimal basis set (ANO-RCC-MB), we demonstrate that the transcorrelated Hamiltonians can produce ground state energies comparable to those obtained with the much larger cc-pVTZ basis. This leads to a potential reduction in the number of required CNOT gates by more than three orders of magnitude for the chemical species studied in this work. Furthermore, using the qEOM formalism in conjunction with the transcorrelated Hamiltonian, we reduce the errors in excitation energies by an order of magnitude. The transcorrelated Hamiltonians developed here are Hermitian and contain only one- and two-body interaction terms, and thus can easily be combined with any quantum algorithm for accurate electronic structure simulations.

Preprint · 2022 · arXiv

Adversarial Filtering Modeling on Long-term User Behavior Sequences for Click-Through Rate Prediction

Rich user behavior information is of great importance for capturing and understanding user interest in click-through rate (CTR) prediction. To increase this richness, collecting long-term behaviors has become a typical approach in academia and industry, but at the cost of increased online storage and latency. Recently, researchers have proposed several approaches that shorten long-term behavior sequences before modeling user interests. These approaches reduce online cost efficiently but do not handle the noisy information in long-term user behavior well, which may significantly degrade CTR prediction performance. To obtain a better cost/performance trade-off, we propose a novel Adversarial Filtering Model (ADFM) to model long-term user behavior. ADFM uses a hierarchical aggregation representation to compress the raw behavior sequence and then learns to remove useless behavior information with an adversarial filtering mechanism. The selected user behaviors are fed into an interest extraction module for CTR prediction. Experimental results on public datasets and an industrial dataset demonstrate that our method achieves significant improvements over state-of-the-art models.

Preprint · 2022 · arXiv

AnoDFDNet: A Deep Feature Difference Network for Anomaly Detection

This paper proposes a novel anomaly detection (AD) approach for high-speed train images based on convolutional neural networks and the Vision Transformer. Unlike previous AD works, in which anomalies are identified from a single image using classification, segmentation, or object detection methods, the proposed method detects abnormal differences between two images of the same region taken at different times. In other words, we recast anomaly detection on a single image as difference detection on two images. The core idea of the proposed method is that an 'anomaly' usually represents an abnormal state rather than a specific object, and this state should be identified from a pair of images. In addition, we introduce a deep feature difference AD network (AnoDFDNet) that fully exploits the potential of the Vision Transformer and convolutional neural networks. To verify the effectiveness of the proposed AnoDFDNet, we collected three datasets: a difference dataset (Diff Dataset), a foreign body dataset (FB Dataset), and an oil leakage dataset (OL Dataset). Experimental results on these datasets demonstrate the superiority of the proposed method. Source code is available at https://github.com/wangle53/AnoDFDNet.

Preprint · 2022 · arXiv

Ask2Mask: Guided Data Selection for Masked Speech Modeling

Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames that are randomly masked within an utterance. While these methods improve the performance of Automatic Speech Recognition (ASR) systems, they have one major limitation: they treat all unsupervised speech samples with equal weight, which hinders learning because not all samples contain relevant information for learning meaningful representations. In this work, we address this limitation. We propose ask2mask (ATM), a novel approach to focus on specific samples during MSM pre-training. ATM employs an external ASR model, or scorer, to weight unsupervised input samples in two different ways: 1) fine-grained data selection is performed by masking over the highly confident input frames chosen by the scorer, which allows the model to learn meaningful representations; 2) ATM is further extended to focus at the utterance level by weighting the final MSM loss with the utterance-level confidence score. We conduct fine-tuning experiments on well-benchmarked corpora in two settings: LibriSpeech (matching the pre-training data) and Commonvoice, TED-LIUM, AMI and CHiME-6 (not matching the pre-training data). The results substantiate the efficacy of ATM in significantly improving recognition performance under mismatched conditions (up to 11.6% relative over published results and up to 4.46% relative over our internal baseline), while still yielding modest improvements under matched conditions.
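The two weighting modes can be sketched in a few lines. This is an illustration under stated assumptions, not the ATM implementation: per-frame and per-utterance confidences are taken as given from some scorer, individual frames are selected (real MSM typically masks spans of consecutive frames), and the weighted loss is averaged over the batch. Both function names are hypothetical.

```python
def confidence_mask(frame_conf, mask_ratio):
    """Indices (sorted) of the most confident frames, which are the ones to mask."""
    k = max(1, int(round(mask_ratio * len(frame_conf))))
    order = sorted(range(len(frame_conf)), key=lambda i: frame_conf[i], reverse=True)
    return sorted(order[:k])


def weighted_msm_loss(utt_losses, utt_conf):
    """Scale each utterance's MSM loss by its utterance-level confidence (batch mean)."""
    total = sum(loss * c for loss, c in zip(utt_losses, utt_conf))
    return total / len(utt_losses)
```

Masking the confident frames forces the model to reconstruct the parts the scorer considers informative, rather than spending capacity on noise or silence.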

preprint2022arXiv

Atomic-Scale Visualization of Chiral Charge Density Wave States and Their Reversible Transition

Chirality is essential to a variety of remarkable phenomena in life and matter. However, chirality and its switching in electronic superlattices, such as charge density wave (CDW) arrays, remain elusive. In this study, we use scanning tunneling microscopy to characterize the chirality transition with atomic-resolution imaging in a single-layer NbSe2 CDW pattern. The atomic lattice of the CDW array is found to remain continuous and intact even though its chirality is switched. Several intermediate states are tracked by time-resolved imaging, revealing a fast and dynamic chirality transition. Importantly, the switching is realized reversibly with an external electric field. Our findings unveil the delicate transition process of a chiral CDW array in a 2D crystal down to the atomic scale and may be applicable to future nanoscale devices.

preprint2022arXiv

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained on large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training, and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, fine-tuning an 8-billion-parameter pre-trained Conformer model lets us match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve on SoTA with the full training set. We also report on the universal benefits of using big pre-trained and self-trained models for a large set of downstream tasks. These tasks cover a wide range of speech domains and span multiple orders of magnitude in dataset size, and the gains include SoTA performance on many public benchmarks. In addition, we use the learned representations of pre-trained networks to achieve SoTA results on non-ASR tasks.

preprint2022arXiv

Characterization and manipulation of intervalley scattering induced by an individual monovacancy in graphene

Intervalley scattering involves microscopic processes in which electrons are scattered by atomic-scale defects on nanometer length scales. Although central to our understanding of the electronic properties of materials, direct characterization and manipulation of the range and strength of intervalley scattering induced by an individual atomic defect have so far been elusive. Using a scanning tunneling microscope, we visualized and controlled intervalley scattering from an individual monovacancy in graphene. By directly imaging the range affected by intervalley scattering from the monovacancy, we demonstrated that this range is inversely proportional to the energy, i.e., proportional to the wavelength of the massless Dirac fermions. A giant electron-hole asymmetry of the intervalley scattering is observed because the monovacancy is charged. Further charging the monovacancy bends the electronic potential around it, which softens the scattering potential and consequently suppresses the intervalley scattering from the monovacancy.

preprint2022arXiv

Chiral SO(4) spin-valley density wave and degenerate topological superconductivity in magic-angle-twisted bilayer-graphene

Starting from a realistic extended Hubbard model for a $p_{x,y}$-orbital tight-binding model on the honeycomb lattice, we perform a thorough investigation of the possible electronic instabilities in magic-angle twisted bilayer graphene (MA-TBG) near the van Hove (VH) dopings. Here we focus on the interplay between the approximate SU(2)$\times$SU(2) symmetry and the $D_3$ symmetry. As revealed by our systematic RPA-based calculations followed by mean-field minimization of the ground-state energy, this interplay leads to intriguing quantum states relevant to recent experiments. At the SU(2)$\times$SU(2) symmetric point, the degenerate inter-valley SDW and VDW are mixed into a new state of matter dubbed the chiral SO(4) spin-valley DW. This state simultaneously hosts three 4-component vectorial spin-valley DW orders, each adopting one wave vector, and the polarization directions of the three DW orders are mutually perpendicular. In the presence of a tiny inter-valley exchange interaction with coefficient $J_H\to 0^{-}$, which breaks the SU(2)$\times$SU(2) symmetry, a pure chiral SDW state is obtained. In the case of $J_H\to 0^{+}$, a nematic VDW+SDW state emerges with a stripy distribution of the charge density, consistent with recent STM observations. Regarding superconductivity (SC), the triplet $p+ip$ and singlet $d+id$ topological SCs are degenerate at $J_H=0$ near the VH dopings, while the former (latter) is favored for $J_H\to 0^{-}$ ($J_H\to 0^{+}$). In addition, the two asymmetric doping-dependent behaviors of the obtained pairing phase diagram are consistent with experiments.

preprint2022arXiv

Co-optimization of Battery Routing and Load Restoration for Microgrids with Mobile Energy Storage Systems

Mobile energy storage systems (MESS) offer great operational flexibility to enhance the resiliency of distribution systems in emergencies. The optimal placement and sizing of these units are pivotal for quickly restoring curtailed loads. In this paper, we propose a model for load restoration in a microgrid that concurrently optimizes the MESS routes required for it. The model is formulated as a mixed-integer second-order cone program that accounts for the state of charge and the evolution of the lower and upper bounds of battery capacities. Simulation results on the IEEE 123-bus benchmark system demonstrate the efficacy of the proposed model.

preprint2022arXiv

Construction of a qudit using Schrodinger cat states and generation of hybrid entanglement between a discrete-variable qudit and a continuous-variable qudit

We show that a continuous-variable (CV) qudit can be constructed using quasi-orthogonal cat states of a bosonic mode when the phase encoded in each cat state is chosen appropriately. With the constructed CV qudit and a discrete-variable (DV) qudit encoded with Fock states, we propose an approach to generate the hybrid maximally entangled state of a CV qudit and a DV qudit by using two microwave cavities coupled to a superconducting flux qutrit. This proposal relies on initially preparing one cavity in a superposition of Fock states and the other cavity in a cat state. After the initial state of each cavity is prepared, the proposal requires only two basic operations: the first employs the dispersive coupling of both cavities with the qutrit, while the second uses the dispersive coupling of only one cavity with the qutrit. The entangled state production is deterministic, and the operation time decreases as the dimension of each qudit increases. In addition, the coupler qutrit remains in the ground state during the entire operation, so decoherence from the qutrit is significantly reduced. As an example, we further discuss the experimental feasibility of generating the hybrid maximally entangled state of a DV qutrit and a CV qutrit based on circuit QED. This proposal is universal and can be extended to accomplish the same task by using two microwave or optical cavities coupled to a natural or artificial three-level atom.

preprint2022arXiv

Contrastive Graph Learning for Population-based fMRI Classification

Contrastive self-supervised learning has recently benefited fMRI classification with inductive biases. Its weak reliance on labels prevents overfitting on small medical datasets and helps cope with high intraclass variance. Nonetheless, existing contrastive methods generate similar pairs only from pixel-level features of 3D medical images, leaving under-explored the functional connectivity that reveals critical cognitive information. Additionally, existing methods predict labels from individual contrastive representations without taking into account neighbouring information in the patient group. Interpatient contrast, however, can act as a similarity measure suitable for population-based classification. We hereby propose contrastive functional connectivity graph learning for population-based fMRI classification. Representations on the functional connectivity graphs are "repelled" for heterogeneous patient pairs, while homogeneous pairs "attract" each other. A dynamic population graph that strengthens the connections between similar patients is then updated for classification. Experiments on the multi-site ADHD200 dataset validate the superiority of the proposed method on various metrics. We also take an initial step toward visualizing the population relationships and exploring potential subtypes.
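
The "repel/attract" behaviour can be illustrated with the classic margin-based pairwise contrastive loss. This textbook loss is swapped in for illustration and is not necessarily the objective used in the paper. Embeddings are plain Python lists, and the default margin of 1.0 is an assumption.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_contrastive_loss(z1: Sequence[float], z2: Sequence[float],
                          same_class: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of graph embeddings."""
    d = euclidean(z1, z2)
    if same_class:
        # Homogeneous pair: any distance is penalised, pulling the pair together.
        return d ** 2
    # Heterogeneous pair: penalised only while closer than the margin (pushed apart).
    return max(0.0, margin - d) ** 2


print(pair_contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True))   # 25.0
print(pair_contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False))  # 0.0
```

Dissimilar patients that are already farther apart than the margin contribute nothing. Similar patients are penalised until their embeddings coincide.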

preprint2022arXiv

Creation of a Modular Soft Robotic Fish Testing Platform

Research on the co-optimization of soft robotic design and control requires rapid means of real-world validation. Existing creation pipelines do not allow swift prototyping of soft robots to quickly test various design configurations and control policies. This work proposes a pipeline for the rapid iterative design and fabrication of a miniaturized, modular, silicone-elastomer-based robotic fish. The modular design allows simple and rapid iterations of robotic fish with varying configurations, supporting current research on design optimization methods. The proposed robotic fish can serve as a standardized test platform on which performance metrics such as thrust and range of motion can be evaluated. We further present the design of an underwater evaluation setup capable of measuring input pressure, tail deformation, and thrust. Multiple robotic fish prototypes with varying stiffness and internal pneumatic chamber configurations are fabricated and experimentally evaluated. The presented flexible modular design principle for the robot and its evaluation platform opens the way to more efficient soft robotic fish and will benefit future research on design optimization and underwater exploration.

preprint2022arXiv

CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning

Safe reinforcement learning (RL) remains very challenging because it requires the agent to consider both return maximization and safe exploration. In this paper, we propose CUP, a Conservative Update Policy algorithm with a theoretical safety guarantee. We derive CUP from newly proposed performance bounds and surrogate functions. Using bounds as surrogate functions to design safe RL algorithms has appeared in some existing works, but we develop this idea in at least three respects. (i) We provide a rigorous theoretical analysis that extends the surrogate functions to the generalized advantage estimator (GAE). GAE significantly reduces variance empirically while maintaining a tolerable level of bias, which is a key step in designing CUP. (ii) The proposed bounds are tighter than those in existing works, i.e., when used as surrogate functions, they give better local approximations to the objective and the safety constraints. (iii) CUP provides a non-convex implementation via first-order optimizers, which does not depend on any convex approximation. Finally, extensive experiments show the effectiveness of CUP, with the agent satisfying the safety constraints. The source code of CUP is available at https://github.com/RL-boxes/Safe-RL.
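
Because CUP builds on the generalized advantage estimator, a minimal GAE sketch may help as a reference point. It covers a single trajectory with no episode terminations. The argument names and the default values of γ and λ are assumptions, and this is the standard estimator, not CUP's safety-aware surrogate objective.

```python
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one non-terminating trajectory.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`,
    the last one bootstrapping the state reached after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards using A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting λ = 0 reduces each estimate to the one-step TD residual, which has low variance but more bias. λ = 1 gives the discounted return minus the value baseline, which has higher variance but less bias. This is the trade-off the abstract refers to.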

preprint2022arXiv

Deformer: Towards Displacement Field Learning for Unsupervised Medical Image Registration

Recently, deep-learning-based approaches have been widely studied for the deformable image registration task. However, most efforts directly map the composite image representation to a spatial transformation through a convolutional neural network, ignoring its limited ability to capture spatial correspondence. Transformers, on the other hand, can better characterize spatial relationships through the attention mechanism. Their long-range dependency, however, may be harmful to registration, where voxels that are too far apart are unlikely to be corresponding pairs. In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task. The Deformer module is designed to facilitate the mapping from image representation to spatial transformation by formulating displacement vector prediction as the weighted summation of several bases. With the multi-scale framework predicting the displacement fields in a coarse-to-fine manner, the approach achieves superior performance compared with traditional and learning-based approaches. Comprehensive experiments on two public datasets demonstrate the effectiveness of the proposed Deformer module as well as the multi-scale framework.
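
Predicting a displacement vector as a weighted summation of bases can be sketched for a single voxel. The sketch assumes the network has already produced K three-dimensional basis vectors and K weights for that voxel. How Deformer actually generates and normalises them is not shown here, and the helper name is hypothetical.

```python
from typing import List, Sequence


def displacement_from_bases(weights: Sequence[float],
                            bases: Sequence[Sequence[float]]) -> List[float]:
    """Displacement of one voxel as sum_k w_k * b_k over 3-D basis vectors."""
    if len(weights) != len(bases):
        raise ValueError("need exactly one weight per basis vector")
    disp = [0.0, 0.0, 0.0]
    for w, b in zip(weights, bases):
        for axis in range(3):  # x, y, z components
            disp[axis] += w * b[axis]
    return disp


# Two hypothetical bases for one voxel; the second is weighted more heavily.
print(displacement_from_bases([0.5, 2.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]))
# [0.5, 2.0, 2.0]
```

Repeating this per voxel yields a dense displacement field. In a coarse-to-fine setup, such a field would be estimated at each scale.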

preprint2022arXiv

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Research into Few-shot Semantic Segmentation (FSS) has attracted great attention, with the goal of segmenting target objects in a query image given only a few annotated support images of the target class. A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images. However, most existing approaches either compress the support information into a few class-wise prototypes or use partial support information (e.g., only the foreground) at the pixel level, causing non-negligible information loss. In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), which fully exploits both foreground and background support information via multi-level pixel-wise correlations between paired query and support features. DCAMA is implemented with the scaled dot-product attention of the Transformer architecture. It treats every query pixel as a token, computes its similarities with all support pixels, and predicts its segmentation label as an additive aggregation of all the support pixels' labels, weighted by those similarities. Based on this unique formulation, we further propose efficient and effective one-pass inference for n-shot segmentation, in which the pixels of all support images are collected for mask aggregation at once. Experiments show that DCAMA significantly advances the state of the art on the standard FSS benchmarks PASCAL-5i, COCO-20i, and FSS-1000. For example, it achieves 3.1%, 9.7%, and 3.6% absolute improvements in 1-shot mIoU over the previous best records. Ablation studies also verify the design of DCAMA.
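
The core aggregation step can be sketched with toy feature vectors. Each query pixel attends to every support pixel with scaled dot-product scores, and its foreground score is the softmax-weighted sum of the support labels, both foreground (1) and background (0). The sketch omits DCAMA's learned projections and multi-level features, and all names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_masks(query_feats: Sequence[Sequence[float]],
                    support_feats: Sequence[Sequence[float]],
                    support_labels: Sequence[float]) -> List[float]:
    """Foreground score per query pixel as an attention-weighted sum of support labels."""
    scale = 1.0 / math.sqrt(len(support_feats[0]))  # 1 / sqrt(d) as in scaled dot-product attention
    preds = []
    for q in query_feats:
        scores = [scale * sum(qi * si for qi, si in zip(q, s)) for s in support_feats]
        weights = softmax(scores)
        preds.append(sum(w * y for w, y in zip(weights, support_labels)))
    return preds


# Toy 2-D features: one foreground (label 1) and one background (label 0) support pixel.
support = [[10.0, 0.0], [0.0, 10.0]]
labels = [1.0, 0.0]
print(aggregate_masks([[10.0, 0.0], [0.0, 0.0]], support, labels))
# first score is very close to 1.0; the second is exactly 0.5
```

For n-shot inference, concatenating the pixels and labels of all support images into `support_feats` and `support_labels` aggregates over every shot in one pass. The abstract describes this one-pass inference.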

preprint2022arXiv

Differentially Private Load Restoration for Microgrids with Distributed Energy Storage

Distributed energy storage systems (ESSs) can be efficiently leveraged for load restoration (LR) in a microgrid (MG) operating in island mode. When the ESSs are owned by third parties rather than the MG operator (MGO), the ESS operating setpoints may be considered private information of their respective owners. Therefore, effort must be made to prevent their disclosure through adversarial analysis of load setpoints. In this paper, we consider a scenario where LR takes place in an MG by determining load and ESS power injections through the solution of an AC optimal power flow (AC-OPF) problem. Since the charge/discharge mode at any given time is assumed to be private, we develop a differentially private mechanism that restores load while preserving the privacy of the ESS mode data. The performance of the proposed mechanism is demonstrated on a 33-bus MG.

preprint2022arXiv

Dissecting Service Mesh Overheads

Service meshes play a central role in the modern application ecosystem by providing an easy and flexible way to connect the different services that form a distributed application. However, because of the way they interpose on application traffic, they can substantially increase application latency and resource consumption. We develop a decompositional approach and a tool, called MeshInsight, to systematically characterize the overhead of service meshes and to help developers quantify overhead in deployment scenarios of interest. Using MeshInsight, we confirm that service meshes can have high overhead (up to 185% higher latency and up to 92% more virtual CPU cores for our benchmark applications), but the severity is intimately tied to how they are configured and to the application workload. The primary contributors to overhead also vary with the configuration: IPC (inter-process communication) and socket writes dominate when the service mesh operates as a TCP proxy, whereas protocol parsing dominates when it operates as an HTTP proxy. MeshInsight also enables us to study the end-to-end impact of optimizations to service meshes. We show that not all seemingly promising optimizations lead to a notable overhead reduction in realistic settings.

preprint2022arXiv

Durable and Recoverable Hydrophilicity of Polyethylene Terephthalate Fabric Prepared with Plasma Selective Etching

A durable superhydrophilic surface has been obtained on delustered PET (PET-TiO2) fabric by plasma selective etching. The aging of its hydrophilicity after plasma treatment was investigated as a function of storage time. After Ar/O2 radio frequency (RF) plasma treatment for only 7 min, the PET-TiO2 fabric showed a water contact angle of 0°. After 10 months of storage, its water contact angle remained below 75.7°. Furthermore, we find for the first time that 10 min of xenon light irradiation recovers its water contact angle to 5°. In contrast, the contact angle of plain PET fabric treated for 7 min returns to 123.0°: its hydrophilicity almost completely disappears and shows no response to xenon light irradiation. The water absorption rate of the PET-TiO2 fabric plasma-treated for 7 min increased by 57.54%. Field emission scanning electron microscopy (FE-SEM), X-ray photoelectron spectroscopy (XPS), and X-ray diffraction (XRD) measurements revealed a wavy structure of humps and ridges with irregular particles or pits on the plasma-treated PET-TiO2 fabric surface, together with an increased Ti atomic percentage. This verifies that plasma selective etching of the organic component exposes the TiO2 particles inside the PET-TiO2 fibers at the surface. The exposed TiO2 suppresses the aging effect and yields durable and recoverable hydrophilicity. This one-step, quick, green, and cost-effective manufacturing method has practical applications for durable superhydrophilic surfaces.

preprint2022arXiv

Efficient scheme for realizing a multiplex-controlled phase gate with photonic qubits in circuit quantum electrodynamics

We propose an efficient scheme, based on circuit quantum electrodynamics (QED), to implement a multiplex-controlled phase gate in which multiple photonic qubits simultaneously control one target photonic qubit. For convenience, we denote this multiqubit gate as the MCP gate. The gate is realized by using a two-level coupler, here a superconducting qubit, to couple multiple cavities. This scheme is simple because the gate implementation requires only one step of operation. In addition, this scheme is quite general because the two logic states of each photonic qubit can be encoded with a vacuum state and an arbitrary non-vacuum state (e.g., a Fock state, a superposition of Fock states, a cat state, or a coherent state) that is orthogonal or quasi-orthogonal to the vacuum state. The scheme has some additional advantages. Because only two levels of the coupler are used, i.e., no auxiliary levels are utilized, decoherence from higher energy levels of the coupler is avoided. The gate operation time does not depend on the number of qubits, and the gate is implemented deterministically because no measurement is applied. As an example, we numerically analyze the experimental feasibility of implementing a three-qubit MCP gate in circuit QED with photonic qubits, each encoded via a vacuum state and a cat state. The scheme can be applied to accomplish the same task in a wide range of physical systems consisting of multiple microwave or optical cavities coupled to a two-level coupler such as a natural or artificial atom.

preprint2022arXiv

Enhance Accuracy: Sensitivity and Uncertainty Theory in LiDAR Odometry and Mapping

Improving the accuracy of LiDAR pose estimation is currently an urgent need for mobile robots. Research indicates that different LiDAR points influence pose estimation accuracy differently. This study aimed to select a good point set to enhance accuracy. Accordingly, the sensitivity and uncertainty of LiDAR point residuals were formulated as a fundamental basis for derivation and analysis. High-sensitivity, low-uncertainty point residual terms are preferred to achieve higher pose estimation accuracy. The proposed selection method has been theoretically proven to achieve a global statistical optimum. It was tested on artificial data and evaluated on the KITTI benchmark. It was also implemented in LiDAR odometry (LO) and LiDAR inertial odometry (LIO), both indoors and outdoors. The experiments revealed that using the selected LiDAR point residuals simultaneously enhances optimization accuracy, reduces the number of residual terms, and guarantees real-time performance.

preprint2022arXiv

Entanglement Dynamics in Anti-$\mathcal{PT}$-Symmetric Systems

In recent years, many efforts have been made to study various noteworthy phenomena in both parity-time ($\mathcal{PT}$) and anti-parity-time ($\mathcal{APT}$) symmetric systems. However, entanglement dynamics in $\mathcal{APT}$-symmetric systems has not previously been investigated, either theoretically or experimentally. Here, we investigate the entanglement evolution of two qubits in an $\mathcal{APT}$-symmetric system. In the $\mathcal{APT}$-symmetric unbroken regime, our theoretical simulations demonstrate periodic oscillations of entanglement when the two qubits evolve identically and nonperiodic oscillations when they evolve differently. In particular, when each qubit evolves near the exceptional point in the $\mathcal{APT}$-symmetric unbroken regime, entanglement exhibits sudden vanishing and revival. Moreover, our simulations demonstrate rapid decay and delayed death of entanglement when one qubit evolves in the $\mathcal{APT}$-symmetric broken regime. We also perform an experiment with a linear optical setup, and the experimental results agree well with our theoretical simulations. Our findings reveal novel phenomena of entanglement evolution in the $\mathcal{APT}$-symmetric system and open a new direction for future studies on the dynamics of quantum entanglement in multiqubit $\mathcal{APT}$-symmetric systems or other non-Hermitian quantum systems.

preprint2022arXiv

Estimating Cluster Masses from SDSS Multi-band Images with Transfer Learning

The total masses of galaxy clusters characterize many aspects of astrophysics and the underlying cosmology. It is crucial to obtain reliable and accurate mass estimates for numerous galaxy clusters over a wide range of redshifts and mass scales. We present a transfer-learning approach to estimate cluster masses using the ugriz-band images in SDSS Data Release 12. The target masses are derived from X-ray or SZ measurements that are only available for a small subset of the clusters. We designed a semi-supervised deep learning model consisting of two convolutional neural networks. In the first network, a feature extractor is trained to classify the SDSS photometric bands. The second network takes the previously trained features as inputs to estimate the clusters' total masses. The training and testing processes in this work rely purely on real observational data. Our algorithm reaches a mean absolute error (MAE) of 0.232 dex on average and 0.214 dex for the best fold, comparable to the 0.192 dex given by redMaPPer. We further applied a joint integrated-gradient and class-activation-mapping method to interpret this two-step neural network. The performance of our algorithm is likely to improve as the training dataset grows. This proof-of-concept experiment demonstrates the potential of deep learning to maximize the scientific return of current and future large cluster surveys.
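
An error quoted in dex is an error in base-10 logarithms. An MAE of 0.232 dex therefore corresponds to a typical multiplicative error of about 10^0.232 ≈ 1.7. The metric can be sketched as follows, using made-up masses. The helper name is illustrative, and the masses only need a common unit, since that unit cancels in the log difference.

```python
import math
from typing import Sequence


def mae_dex(pred_masses: Sequence[float], true_masses: Sequence[float]) -> float:
    """Mean absolute error between log10 masses, i.e. in dex."""
    if not pred_masses or len(pred_masses) != len(true_masses):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(math.log10(p) - math.log10(t)) for p, t in zip(pred_masses, true_masses)]
    return sum(errors) / len(errors)


# Made-up masses in solar masses: one prediction is exact, one is off by a factor of 10.
print(mae_dex([1e14, 1e15], [1e14, 1e14]))  # about 0.5
```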

preprint2022arXiv

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.

preprint2022arXiv

Higher-order Proton Cumulants in Au+Au Collisions at $\sqrt{s_{\rm NN}}$ = 3 GeV from RHIC-STAR

In these proceedings, we present the higher-order cumulants of proton multiplicity distributions of the fixed-target (FXT) run in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 3.0 GeV. The cumulant ratios are presented as a function of centrality and collision energy. The proton cumulant ratio C4/C2 is consistent with fluctuations driven by baryon number conservation and indicates an energy regime dominated by hadronic interactions. These data imply that the QCD critical point could exist at energies higher than 3 GeV if created in heavy-ion collisions.
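
The cumulants behind C4/C2 follow from central moments of the event-by-event proton number $N$, with $\delta N = N - \langle N \rangle$. They are $C_2 = \langle(\delta N)^2\rangle$, $C_3 = \langle(\delta N)^3\rangle$, and $C_4 = \langle(\delta N)^4\rangle - 3\langle(\delta N)^2\rangle^2$. The bare-bones sketch below uses made-up counts. It assumes no detector-efficiency, centrality bin-width, or finite-sample bias correction, all of which a real analysis needs. For a Poisson distribution all cumulants are equal, so C4/C2 = 1 is a common reference value.

```python
from typing import Dict, Sequence


def proton_cumulants(counts: Sequence[int]) -> Dict[str, float]:
    """Cumulants C1..C4 of per-event proton counts from raw sample moments."""
    if not counts:
        raise ValueError("need at least one event")
    n = len(counts)
    mean = sum(counts) / n
    # Central moments <(dN)^k> with dN = N - <N>
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    m4 = sum((c - mean) ** 4 for c in counts) / n
    return {"C1": mean, "C2": m2, "C3": m3, "C4": m4 - 3 * m2 ** 2}


cumulants = proton_cumulants([0, 2])  # toy sample of two events
print(cumulants["C4"] / cumulants["C2"])  # -2.0
```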

preprint2022arXiv

Higher-order topological states in photonic Thue-Morse quasicrystals: quadrupole insulator and a new origin of corner states

Corner states (CSs) in higher-order topological insulators (HOTIs) have recently attracted great interest in both crystals and quasicrystals. In contrast to electronic systems, however, HOTIs have not been found in photonic quasicrystals (PQCs). Here, we systematically study higher-order topology in two-dimensional Thue-Morse photonic quasicrystals (TM-PQCs). We find the topological phase transition and nontrivial CSs with fractional charge induced by multipole moments. We also find a new type of CS arising from the complex structure of TM-PQCs near corners. The different origins of these CSs are analyzed based on a tight-binding model. Our work opens the door to exploring richer higher-order topological (HOT) physics beyond photonic crystals, and the robustness of CSs in PQCs shows their potential for applications.

preprint2022arXiv

Hybrid controlled-SUM gate with one superconducting qutrit and one cat-state qutrit and application in hybrid entangled state preparation

Compared with a qubit, a qudit (i.e., a $d$-level or $d$-state quantum system) provides a larger Hilbert space to store and process information. On the other hand, qudit-based hybrid quantum computing usually requires performing hybrid quantum gates with qudits that differ in their nature or in their encoding format. In this work, we consider the qutrit case, i.e., a qudit with $d=3$. We propose a simple method to realize a hybrid quantum controlled-SUM gate with one superconducting (SC) qutrit and one cat-state qutrit. This gate, together with single-qutrit gates, forms a universal set of ternary logic gates for quantum computing with qutrits. Our proposal is based on circuit QED and essentially employs an SC ququart (a four-level quantum system) dispersively coupled to a microwave cavity. The gate implementation is quite simple because it requires only a single basic operation. Neither classical pulses nor measurements are needed. The auxiliary higher energy level of the SC ququart is only virtually excited during the gate operation, so decoherence from this level is greatly suppressed. As an application of this gate, we discuss the generation of a hybrid maximally entangled state of one SC qutrit and one cat-state qutrit. We further analyze the experimental feasibility of creating such a hybrid entangled state in circuit QED. This proposal is quite general and can be extended to accomplish the same task in a wide range of physical systems, such as a four-level natural or artificial atom coupled to an optical or microwave cavity.
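
The controlled-SUM gate has a compact definition for qutrits: $\mathrm{CSUM}|i\rangle|j\rangle = |i\rangle|(i+j) \bmod 3\rangle$. The sketch below builds this logical gate as a 9-by-9 permutation matrix. It lists the control qutrit first, an ordering convention chosen here rather than taken from the paper, and it says nothing about the circuit-QED implementation.

```python
from typing import List

Matrix = List[List[int]]


def csum_matrix(d: int = 3) -> Matrix:
    """Controlled-SUM on two qudits: |i>|j> -> |i>|(i + j) mod d>.

    Basis state |i>|j> maps to vector index i * d + j (control first).
    """
    size = d * d
    m = [[0] * size for _ in range(size)]
    for i in range(d):
        for j in range(d):
            m[i * d + (i + j) % d][i * d + j] = 1  # column = input, row = output
    return m


def apply(matrix: Matrix, state: List[int]) -> List[int]:
    """Matrix-vector product for a state given as basis amplitudes."""
    return [sum(row[k] * state[k] for k in range(len(state))) for row in matrix]


def basis(i: int, j: int, d: int = 3) -> List[int]:
    """Computational basis vector |i>|j>."""
    v = [0] * (d * d)
    v[i * d + j] = 1
    return v


g = csum_matrix()
print(apply(g, basis(2, 2)) == basis(2, 1))  # True, since (2 + 2) mod 3 = 1
```

Applying the gate three times returns every basis state to itself, because adding $i$ three times modulo 3 is the identity.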

preprint2022arXiv

Image Steganography based on Style Transfer

Image steganography is the art and science of using images as cover for covert communication. With the development of neural networks, traditional image steganography is increasingly likely to be detected by deep-learning-based steganalysis. To address this, we propose an image steganography network based on style transfer, in which the embedding of secret messages is disguised as image stylization. We embed secret information while transforming the style of the content image. In latent space, the secret information is integrated into the latent representation of the cover image to generate stego images that are indistinguishable from normally stylized images. The model is end-to-end and unsupervised, requiring no pre-training. Extensive experiments on the benchmark dataset demonstrate the reliability, quality, and security of the stego images generated by our steganographic network.

preprint2022arXiv

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection.

preprint2022arXiv

Interpretable Graph Convolutional Network of Multi-Modality Brain Imaging for Alzheimer's Disease Diagnosis

Identifying brain regions related to specific neurological disorders is of great importance for biomarker and diagnostic studies. In this paper, we propose an interpretable Graph Convolutional Network (GCN) framework for the identification and classification of Alzheimer's disease (AD) using multi-modality brain imaging data. Specifically, we extended the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to quantify the most discriminative features identified by the GCN from brain connectivity patterns. We then used these features to find signature regions of interest (ROIs) by detecting feature differences between regions in the healthy control (HC), mild cognitive impairment (MCI), and AD groups. We conducted experiments on the ADNI database with imaging data from three modalities (VBM-MRI, FDG-PET, and AV45-PET). The ROI features learned by our method proved effective for improving both clinical score prediction and disease status identification. The method also successfully identified biomarkers associated with AD and MCI.

preprint2022arXiv

JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering

Existing KG-augmented models for commonsense question answering primarily focus on designing elaborate Graph Neural Networks (GNNs) to model knowledge graphs (KGs). However, they overlook (i) how to effectively fuse and reason over question context representations and KG representations, and (ii) how to automatically select relevant nodes from noisy KGs during reasoning. In this paper, we propose a novel model, JointLK, which addresses these limitations through joint reasoning of the LM and GNN and a dynamic KG pruning mechanism. Specifically, JointLK performs joint reasoning between the LM and GNN through a novel dense bidirectional attention module, in which each question token attends to KG nodes and each KG node attends to question tokens, and the two modal representations are fused and updated mutually through multi-step interactions. The dynamic pruning module then uses the attention weights generated by joint reasoning to recursively prune irrelevant KG nodes. We evaluate JointLK on the CommonsenseQA and OpenBookQA datasets and demonstrate its improvements over existing LM and LM+KG models, as well as its capability to perform interpretable reasoning.

preprint2022arXiv

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo-labeled data with a combination of popular techniques whose application to S2ST is non-trivial. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.

preprint2022arXiv

Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation

End-to-end speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging frontier of research. Recent works have demonstrated that the performance of such direct S2ST systems is approaching that of conventional cascade S2ST when trained on comparable datasets. However, in practice, the performance of direct S2ST is bounded by the availability of paired S2ST training data. In this work, we explore multiple approaches for leveraging much more widely available unsupervised and weakly-supervised speech and text data to improve the performance of direct S2ST based on Translatotron 2. With our most effective approaches, the average translation quality of direct S2ST on 21 language pairs of the CVSS-C corpus is improved by +13.6 BLEU (+113% relative) compared with the previous state of the art trained without additional data. The improvements on low-resource languages are even more significant (+398% relative on average). Our comparative studies suggest future research directions for S2ST and speech representation learning.

preprint2022arXiv

LibMTL: A Python Library for Multi-Task Learning

This paper presents LibMTL, an open-source Python library built on PyTorch that provides a unified, comprehensive, reproducible, and extensible implementation framework for Multi-Task Learning (MTL). LibMTL covers different settings and approaches in MTL and supports a large number of state-of-the-art MTL methods, including 12 loss weighting strategies, 7 architectures, and 84 combinations of architectures and loss weighting methods. Moreover, its modular design makes LibMTL easy to use and extend, so users can quickly develop new MTL methods, compare fairly with existing MTL methods, or apply MTL algorithms to real-world applications. The source code and detailed documentation of LibMTL are available at https://github.com/median-research-group/LibMTL and https://libmtl.readthedocs.io, respectively.

preprint2022arXiv

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, pre-trained models such as HuBERT are storage-intensive Transformers, which limits their scope of application in low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pruning structured parameters. More precisely, we create a Transformer-based supernet that is nested with thousands of weight-sharing subnets and design a two-stage distillation strategy to leverage the contextualized latent representations from HuBERT. Experiments on automatic speech recognition (ASR) and the SUPERB benchmark show that the proposed LightHuBERT enables over $10^9$ architectures spanning the embedding dimension, attention dimension, head number, feed-forward network ratio, and network depth. LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks at the HuBERT size, achieves performance comparable to the teacher model in most tasks with 29% fewer parameters, and obtains a $3.5\times$ compression ratio in three SUPERB tasks, namely automatic speaker verification, keyword spotting, and intent classification, with a slight accuracy loss. The code and pre-trained models are available at https://github.com/mechanicalsea/lighthubert.

preprint2022arXiv

Localization, multifractality, and many-body localization in periodically kicked quasiperiodic lattices

We study the combined effect of quasiperiodic disorder, driving, and interaction in the periodically kicked Aubry-André model. In the non-interacting limit, by analyzing the quasienergy spectrum statistics, we verify the existence of a dynamical localization transition in the high-frequency region. In the low-frequency region, however, the spectrum statistics become intricate due to the emergence of extended/localized-to-multifractal edges in the quasienergy spectrum, which separate the multifractal states from the extended (localized) states. When interactions are introduced, we find that the periodically kicked incommensurate potential can induce a transition from the ergodic phase to the many-body localization phase in the high-frequency region. However, the many-body localization phase vanishes in the low-frequency region even for strong quasiperiodic disorder. Our studies demonstrate that the periodically kicked Aubry-André model displays rich dynamical phenomena and that, in addition to the disorder strength, the driving frequency plays an important role in the formation of many-body localization.
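The abstract does not name the spectral statistic used. One common choice for separating ergodic from localized regimes is the mean adjacent-gap ratio ⟨r⟩, where r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) and s_n are consecutive level spacings. It is roughly 0.39 for Poisson (localized) statistics and roughly 0.53 for Wigner-Dyson (ergodic) statistics. The minimal sketch below treats the levels as a plain list of real numbers. As a simplifying assumption, it ignores the wrap-around gap that quasienergies on a circle would have.

```python
from typing import List, Sequence


def mean_gap_ratio(levels: Sequence[float]) -> float:
    """Mean adjacent-gap ratio <r> over a spectrum (levels are sorted first)."""
    e = sorted(levels)
    spacings: List[float] = [b - a for a, b in zip(e, e[1:])]
    ratios: List[float] = []
    for s1, s2 in zip(spacings, spacings[1:]):
        hi = max(s1, s2)
        if hi > 0:  # skip pairs where both gaps are exactly degenerate
            ratios.append(min(s1, s2) / hi)
    if not ratios:
        raise ValueError("need at least three non-degenerate levels")
    return sum(ratios) / len(ratios)
```

A perfectly equally spaced spectrum gives ⟨r⟩ = 1, which is neither the Poisson nor the Wigner-Dyson value. For this reason the statistic is normally computed within a single symmetry sector and averaged over disorder phases.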

preprint2022arXiv

MAESTRO: Matched Speech Text Representations through Modality Matching

We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information. Learning aligned representations from unpaired speech and text sequences is a challenging task. Previous work either implicitly enforced the representations learnt from these two modalities to be aligned in the latent space through multitasking and parameter sharing, or did so explicitly through conversion of modalities via speech synthesis. While the former suffers from interference between the two modalities, the latter introduces additional complexity. In this paper, we propose Maestro, a novel algorithm to learn unified representations from both these modalities simultaneously that can transfer to diverse downstream tasks such as Automated Speech Recognition (ASR) and Speech Translation (ST). Maestro learns unified representations through sequence alignment, duration prediction, and matching embeddings in the learned space through an aligned masked-language model loss. We establish a new state of the art (SOTA) on VoxPopuli multilingual ASR with an 8% relative reduction in Word Error Rate (WER), on multidomain SpeechStew ASR (3.7% relative), and on 21-languages-to-English multilingual ST on CoVoST 2, with an improvement of 2.8 BLEU averaged over 21 languages.

preprint2022arXiv

Masked Spatial-Spectral Autoencoders Are Excellent Hyperspectral Defenders

Deep learning has contributed substantially to the development of hyperspectral image (HSI) analysis. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose a masked spatial-spectral autoencoder (MSSA), based on self-supervised learning theory, to enhance the robustness of HSI analysis systems. First, a masked sequence attention learning module is introduced to improve the inherent robustness of HSI analysis systems along the spectral channel. Then, we develop a graph convolutional network with a learnable graph structure to establish global pixel-wise combinations. In this way, the attack effect is dispersed across all the related pixels in each combination, and better defense performance is achieved in the spatial aspect. Finally, to improve defense transferability and address the problem of limited labeled samples, MSSA employs spectral reconstruction as a pretext task and fits the datasets in a self-supervised manner. Comprehensive experiments on three benchmarks verify the effectiveness of MSSA in comparison with state-of-the-art hyperspectral classification methods and representative adversarial defense strategies.

preprint2022arXiv

Mechanical control of physical properties in the van der Waals ferromagnet Cr2Ge2Te6 via application of electric current

Cr2Ge2Te6 is a van der Waals ferromagnet with a Curie temperature of 66 K. Here we report three phenomena in bulk single-crystal Cr2Ge2Te6: a swift change in the magnetic ground state upon application of a small DC electric current, a giant yet anisotropic magnetoelectric effect, and a sharp, lattice-driven quantum switching manifested in the I-V characteristic. At the heart of these phenomena is a newly uncovered, strongly anisotropic magnetoelastic coupling. This coupling enables strongly anisotropic responses of the lattice to electric current and/or magnetic field, and thus gives rise to the exotic phenomena in Cr2Ge2Te6. Such rare mechanical tunability in magnetic semiconductors holds tantalizing prospects for unique functional materials and devices.

preprint2022arXiv

Modeling and Predicting Citation Count via Recurrent Neural Network with Long Short-Term Memory

The rapid evolution of scientific research has been generating a huge volume of publications every year. Among the many quantitative measures of scientific impact, citation count stands out for its frequent use in the research community. Although peer review is the main reliable way of predicting a paper's future impact, the ability to foresee lasting impact on the basis of citation records is increasingly important for scientific impact analysis in the era of big data. This paper focuses on long-term citation count prediction for individual publications, which has become an emerging and challenging applied research topic. Previous studies of long-term scientific impact quantification have independently confirmed four key phenomena: the intrinsic quality of publications, the aging effect, the Matthew effect, and the recency effect. Based on these, we unify the formulations of all four observations. Building on these formulations, we propose a long-term citation count prediction model for individual papers based on a recurrent neural network with long short-term memory (LSTM) units. Extensive experiments on a large real-world citation dataset demonstrate that the proposed model consistently outperforms existing methods and achieves a significant performance improvement.

preprint2022arXiv

Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism

The ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in a wide range of domains. Here we propose a deep learning attention mechanism to model the process through which individual items gain popularity. Previous studies of long-term popularity dynamics quantification have independently confirmed four key phenomena: the intrinsic quality, the aging effect, the recency effect, and the Matthew effect. We analyze the interpretability of the model with respect to these four phenomena. We also analyze the effectiveness of introducing an attention model into popularity dynamics prediction. Extensive experiments on a large real-world citation dataset demonstrate that the designed deep learning attention mechanism is highly effective at predicting long-term popularity dynamics. It consistently outperforms existing methods and achieves a significant performance improvement.

preprint2022arXiv

Modern Views of Machine Learning for Precision Psychiatry

In light of the NIMH's Research Domain Criteria (RDoC) and the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for a new role of ML/AI in digital phenotyping for mobile mental health. In this review, we comprehensively survey ML methodologies and applications that combine neuroimaging, neuromodulation, and advanced mobile technologies in psychiatric practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed human-in-the-loop manner, and highlight the potential of ML in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities for future research.

preprint2022arXiv

mSLAM: Massively multilingual joint pre-training for speech and text

We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages. mSLAM combines w2v-BERT pre-training on speech with SpanBERT pre-training on character-level text, along with Connectionist Temporal Classification (CTC) losses on paired speech and transcript data, to learn a single model capable of learning from and representing both speech and text signals in a shared representation space. We evaluate mSLAM on several downstream speech understanding tasks and find that joint pre-training with text improves quality on speech translation, speech intent classification and speech language-ID while being competitive on multilingual ASR, when compared against speech-only pre-training. Our speech translation model demonstrates zero-shot text translation without seeing any text translation data, providing evidence for cross-modal alignment of representations. mSLAM also benefits from multi-modal fine-tuning, further improving the quality of speech translation by directly leveraging text translation data during the fine-tuning process. Our empirical analysis highlights several opportunities and challenges arising from large-scale multimodal pre-training, suggesting directions for future research.

preprint2022arXiv

Multi-View Self-Attention Based Transformer for Speaker Recognition

Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, owing to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences without considering the characteristics of speech and speaker modeling. Besides, different Transformer variants for speaker recognition have not been well studied. In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants, with or without the proposed attention mechanism, for speaker recognition. Specifically, to balance the capabilities of capturing global dependencies and modeling locality, we propose a multi-view self-attention mechanism for the speaker Transformer, in which different attention heads can attend to different ranges of the receptive field. Furthermore, we introduce and compare five Transformer variants with different network architectures, embedding locations, and pooling methods to learn speaker embeddings. Experimental results on the VoxCeleb1 and VoxCeleb2 datasets show that the proposed multi-view self-attention mechanism improves speaker recognition performance, and the proposed speaker Transformer network attains excellent results compared with state-of-the-art models.
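One simple way to let different heads attend to different ranges is to give each head a band mask. In this scheme, head h may attend from frame i to frame j only if |i − j| ≤ w_h, and w_h = None means unrestricted (global) attention. This scheme is assumed here for illustration and is not taken from the paper. The per-head window sizes below are hypothetical.

```python
from typing import List, Optional

Mask = List[List[bool]]


def band_mask(seq_len: int, window: Optional[int]) -> Mask:
    """True where attention from frame i to frame j is allowed; window=None means global."""
    return [[window is None or abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def multi_view_masks(seq_len: int, windows: List[Optional[int]]) -> List[Mask]:
    """One band mask per attention head, using hypothetical per-head window sizes."""
    return [band_mask(seq_len, w) for w in windows]


# Three heads over 5 frames: a narrow local view, a wider view, and a global view.
masks = multi_view_masks(5, [1, 2, None])
```

In an actual attention layer, the disallowed positions would be set to negative infinity in the score matrix before the softmax. Each head then mixes information only within its own receptive-field range.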

preprint2022arXiv

Neutrino dipole portal at electron colliders

We propose to search for a heavy neutral lepton (HNL), also known as a sterile neutrino, at electron colliders running at center-of-mass energies of a few GeV, including BESIII, Belle II, and the proposed Super Tau Charm Factory (STCF). We consider an HNL interacting with the Standard Model neutrino and the photon via a transition magnetic moment, the so-called dipole portal. We use the monophoton signature at electron colliders to constrain the active-sterile neutrino transition magnetic moment $d$ as a function of the HNL mass $m_N$. We find that BESIII, Belle II, and STCF can probe upper limits on $d$ down to $1.3\times 10^{-5}\ {\rm GeV}^{-1}$, $8\times 10^{-6}\ {\rm GeV}^{-1}$, and $1.3\times 10^{-6}\ {\rm GeV}^{-1}$, respectively, for $m_N$ around the GeV scale. They are also sensitive to previously unexplored parameter space for the electron-neutrino ($d_e$) and tau-neutrino ($d_\tau$) dipole portals with $m_N$ from tens to thousands of MeV. For $d_\mu$, associated with HNL mixing with the muon neutrino, Belle II and STCF can also provide leading constraints.

preprint2022arXiv

OneLabeler: A Flexible System for Building Data Labeling Tools

Labeled datasets are essential for supervised machine learning. Various data labeling tools have been built to collect labels in different usage scenarios. However, developing labeling tools is time-consuming and costly, and it demands software development expertise. In this paper, we propose a conceptual framework for data labeling and, based on this framework, OneLabeler, a system that supports easy building of labeling tools for diverse usage scenarios. The framework consists of common modules and states in labeling tools, summarized through coding of existing tools. OneLabeler supports the configuration and composition of common software modules through visual programming to build data labeling tools. A module can be a human, machine, or mixed computation procedure in data labeling. We demonstrate the expressiveness and utility of the system through ten example labeling tools built with OneLabeler. A user study with developers provides evidence that OneLabeler supports efficient building of diverse data labeling tools.

preprint2022arXiv

Online Beam Learning with Interference Nulling for Millimeter Wave MIMO Systems

Employing large antenna arrays is a key characteristic of millimeter wave (mmWave) and terahertz communication systems. Due to hardware constraints and the lack of channel knowledge, codebook-based beamforming/combining is normally adopted to achieve the desired array gain. However, most existing codebooks focus only on improving the gain of their target user, without taking interference into account. This can cause critical performance degradation in dense networks. In this paper, we propose a sample-efficient, online reinforcement learning-based beam pattern design algorithm that learns how to shape the beam pattern to null the interfering directions. The proposed approach does not require any explicit channel knowledge or any coordination with the interferers. Simulation results show that the developed solution can learn well-shaped beam patterns that significantly suppress the interference while sacrificing a tolerable amount of beamforming/combining gain toward the desired user. Furthermore, a hardware proof-of-concept prototype based on mmWave phased arrays is built and used to implement and evaluate the developed online beam learning solutions in realistic scenarios. The learned beam patterns, measured in an anechoic chamber, show the performance gains of the developed framework and highlight a promising machine learning-based beam/codebook optimization direction for mmWave and terahertz systems.

preprint2022arXiv

Optical Observations of the Nearby Type Ia Supernova 2021hpr

We present optical photometric and spectroscopic observations of the nearby Type Ia supernova (SN) 2021hpr. The observations cover phases from $-14.37$ to $+63.68$ days relative to its maximum luminosity in the $B$ band. The evolution of the multiband light and color curves of SN 2021hpr is similar to that of normal Type Ia supernovae (SNe Ia), with the exception of some phases. In particular, a plateau phase appears in the $V-R$ color curve before peak luminosity, resembling that of SN 2017cbv. The first spectrum we observed, at $t \sim -14.4$ days, shows a higher velocity for the Si II $\lambda$6355 feature ($\sim$ 21,000 km s$^{-1}$) than other normal-velocity (NV) SNe Ia at the same phase. Based on the Si II $\lambda$6355 velocity of $\sim$ 12,420 km s$^{-1}$ around maximum light, we deduce that SN 2021hpr is a transitional object between high-velocity (HV) and NV SNe Ia. Meanwhile, the Si II $\lambda$6355 feature shows a high velocity gradient (HVG) of about 800 km s$^{-1}$ day$^{-1}$ from roughly $-14.37$ to $-4.31$ days relative to the $B$-band maximum, which indicates that SN 2021hpr can also be classified as an HVG SN Ia. The evolution of SN 2021hpr is similar to that of SN 2011fe. Including SN 2021hpr, six supernovae have been observed in the host galaxy NGC 3147. Over the last 50 yr, the supernova explosion rate is slightly higher than the rate expected from the radio data for SNe Ia, but lower for SNe Ibc and SNe II. Inspecting the spectra, we find that SN 2021hpr has a metal-rich (12 + log(O/H) $\approx$ 8.648) circumstellar environment, where HV SNe tend to reside. Based on the decline rate of SN 2021hpr in the $B$ band, we use the Phillips relation to determine the distance modulus of the host galaxy NGC 3147 to be 33.46 $\pm$ 0.21 mag, which is close to values found in previous works.
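For orientation, the distance modulus of 33.46 ± 0.21 mag quoted above can be converted to a distance with the standard relation μ = 5 log₁₀(d / 10 pc). This assumes that 33.46 mag is the true (extinction-corrected) modulus:

```latex
d = 10^{(\mu + 5)/5}\,\mathrm{pc}
  = 10^{(33.46 + 5)/5}\,\mathrm{pc}
  = 10^{7.692}\,\mathrm{pc}
  \approx 49\ \mathrm{Mpc}
```

Propagating the ±0.21 mag uncertainty through the same relation gives a range of roughly 45 to 54 Mpc.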

preprint2022arXiv

Path-Aware Graph Attention for HD Maps in Motion Prediction

The success of motion prediction for autonomous driving relies on integrating information from HD maps. As maps are naturally graph-structured, research on graph neural networks (GNNs) for encoding HD maps has burgeoned in recent years. However, unlike many other applications where GNNs have been straightforwardly deployed, HD maps are heterogeneous graphs in which vertices (lanes) are connected by edges (lane-lane interaction relationships) of various types. Most graph-based models are not designed to understand this variety of edge types, which provides crucial cues for predicting how agents will travel along the lanes. To overcome this challenge, we propose Path-Aware Graph Attention, a novel attention architecture that infers the attention between two vertices by parsing the sequence of edges forming the paths that connect them. Our analysis illustrates how the proposed attention mechanism can facilitate learning in a didactic problem where existing graph networks such as GCN struggle. By improving map encoding, the proposed model surpasses the previous state of the art on the Argoverse Motion Forecasting dataset and won first place in the 2021 Argoverse Motion Forecasting Competition.

preprint2022arXiv

PFilter: Building Persistent Maps through Feature Filtering for Fast and Accurate LiDAR-based SLAM

Simultaneous localization and mapping (SLAM) based on laser sensors has been widely adopted by mobile robots and autonomous vehicles. These SLAM systems are required to support accurate localization with limited computational resources. In particular, point cloud registration has been deemed the bottleneck step in SLAM. Registration is the process of matching and aligning multiple LiDAR scans collected at multiple locations in a global coordinate frame. In this paper, we propose a feature filtering algorithm, PFilter, that filters out invalid features and can thus greatly alleviate this bottleneck. Meanwhile, the overall registration accuracy is also improved thanks to the carefully curated feature points. We integrate PFilter into the well-established scan-to-map LiDAR odometry framework F-LOAM and evaluate its performance on the KITTI dataset. The experimental results show that PFilter removes about 48.4% of the points in the local feature map and reduces the feature points in each scan by 19.3% on average, which saves 20.9% of the processing time per frame. At the same time, accuracy improves by 9.4%.

preprint2022arXiv

PHEE: A phased hybrid evaluation-enhanced approach for identifying influential users in social networks

To maximize the spread of influence triggered by a small number k of nodes in a social network, one must find a k-subset of nodes (i.e., a seed set) with the best capacity to influence the nodes outside it. This influence maximization (IM) problem has wide applications, belongs to the class of subset-selection problems, and is NP-hard. Solving it exactly would in principle require examining all seed sets and evaluating their influence spreads, which is time-consuming. Therefore, metaheuristic strategies are generally employed to obtain a good seed set within a reasonable time. We observe that many IM algorithms adopt only a uniform mechanism throughout the whole solution search process, which lacks a response measure for when the algorithm becomes trapped in a local optimum. To address this issue, we propose a phased hybrid evaluation-enhanced (PHEE) approach for IM. It uses two distinct search strategies to enhance the search for optimal solutions: a randomized range division evolutionary (RandRDE) algorithm to improve solution quality, and a fast convergence strategy. Our approach is evaluated on 10 real-world social networks of different sizes and types. Experimental results demonstrate that our algorithm is efficient and obtains the best influence spread on all datasets compared with three state-of-the-art algorithms. Compared with the time-consuming CELF algorithm, it performs better on four datasets and worse on only two networks.
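CELF, the baseline mentioned above, is the lazy-forward variant of greedy seed selection for IM. The minimal sketch below rests on several assumptions:

- Diffusion follows the independent cascade (IC) model with a single uniform propagation probability p.
- The expected spread is estimated by Monte Carlo simulation.
- The graph format, p, and simulation count are illustrative.

This is the classic CELF idea, not the PHEE method.

```python
import heapq
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # directed adjacency list: node -> out-neighbors


def ic_spread(g: Graph, seeds: Set[int], p: float, sims: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected spread under the independent cascade model."""
    total = 0
    for _ in range(sims):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in g.get(u, []):
                    # Each newly activated node gets one chance to activate each neighbor.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / sims


def celf(g: Graph, k: int, p: float = 0.1, sims: int = 200,
         seed: int = 0) -> Tuple[List[int], float]:
    """Lazy greedy: recompute a node's marginal gain only when it reaches the heap top."""
    rng = random.Random(seed)
    nodes = set(g) | {v for vs in g.values() for v in vs}
    # Max-heap via negated gains; entries are (-gain, node, seed-set size when computed).
    heap = [(-ic_spread(g, {v}, p, sims, rng), v, 0) for v in sorted(nodes)]
    heapq.heapify(heap)
    chosen: List[int] = []
    spread = 0.0
    while len(chosen) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # Gain is current for this seed set, so by submodularity v is the best pick.
            chosen.append(v)
            spread += -neg_gain
        else:
            # Stale gain: re-evaluate against the current seed set and push back.
            gain = ic_spread(g, set(chosen) | {v}, p, sims, rng) - spread
            heapq.heappush(heap, (-gain, v, len(chosen)))
    return chosen, spread
```

With p = 1.0 the cascade is deterministic. For example, `celf({0: [1, 2, 3], 4: [5]}, k=2, p=1.0)` returns `([0, 4], 6.0)`. With p < 1, the gains are noisy Monte Carlo estimates, and the fixed `seed` only makes a run reproducible. The repeated spread evaluations are also why the abstract calls CELF time-consuming.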

preprint2022arXiv

Photometric properties and stellar parameters of the rapidly rotating magnetic early-B star HD 345439

We present, for the first time, multicolor photometry of the rapidly rotating magnetic star HD 345439, obtained with the Nanshan One-meter Wide-field Telescope. From the photometric observations, we derive a rotational period of $0.7699\pm0.0014$ day. The light curves of HD 345439 are dominated by the double asymmetric S-wave feature that arises from the magnetic clouds. No pulsational behavior is observed in Sector 41 of the Transiting Exoplanet Survey Satellite. No evidence of centrifugal breakout events is found, either in the residual flux or in the systematic variations at the extrema of the light curve. Assuming the Rigidly Rotating Magnetosphere model, we constrain the magnetic obliquity angle $\beta$ and the rotational inclination angle $i$ to satisfy the approximate relation $\beta + i \approx 105^{\circ}$. The colour excess, extinction, and luminosity are determined to be $E(B-V)=0.745\pm0.016$ mag, $A_{V}=2.31\pm0.05$ mag, and $\log(L/L_{\odot})=3.82\pm0.1$ dex, respectively. Furthermore, we derive an effective temperature of $T_{\rm eff}=22\pm1$ kK and a surface gravity of $\log g=4.00\pm0.22$. The mass $M=7.24_{-1.24}^{+1.75}\,M_{\odot}$, radius $R=4.44_{-1.93}^{+2.68}\,R_{\odot}$, and age $\tau_{\rm age}=23.62_{-21.97}^{+4.24}$ Myr are estimated from the Hertzsprung-Russell diagram.
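As a consistency check, the quoted extinction follows from the colour excess through the standard relation A_V = R_V E(B−V). This assumes the commonly used Galactic value R_V = 3.1; the abstract does not state which value was adopted:

```latex
A_V = R_V\,E(B-V) = 3.1 \times 0.745\ \mathrm{mag} \approx 2.31\ \mathrm{mag}
```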

preprint2022arXiv

Pileup Correction on Higher-order Cumulants with Unfolding Approach

Higher-order cumulants of conserved-charge distributions are sensitive observables for probing critical fluctuations near the QCD critical point in heavy-ion collisions. Owing to the high interaction rate, pileup events can be one of the major sources of background in measurements of higher-order cumulants. In this paper, we study the effects of pileup events on higher-order cumulants of proton multiplicity distributions using the UrQMD model. We find that the proposed pileup correction fails if the correction parameters are determined by Glauber fitting of charged-particle multiplicities, as is usually done in real heavy-ion experiments. To address this, we propose a model-independent unfolding approach to determine the parameters in the pileup correction. This approach can be applied in pileup corrections for future measurements of higher-order cumulants in heavy-ion collision experiments.
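The cumulants in question are built from central moments of the event-by-event multiplicity distribution. The minimal sketch below uses the plain (biased) moment estimators for C1 to C4 and the commonly quoted ratios C3/C2 and C4/C2. It assumes a list of per-event proton counts and applies no efficiency, centrality-bin-width, or pileup correction. It is only the building block that such corrections act on, not the unfolding approach itself.

```python
from typing import Dict, Sequence


def cumulants(counts: Sequence[int]) -> Dict[str, float]:
    """C1..C4 from central moments: C2 = mu2, C3 = mu3, C4 = mu4 - 3 * mu2**2."""
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one event")
    mean = sum(counts) / n
    mu2 = sum((x - mean) ** 2 for x in counts) / n
    mu3 = sum((x - mean) ** 3 for x in counts) / n
    mu4 = sum((x - mean) ** 4 for x in counts) / n
    c = {"C1": mean, "C2": mu2, "C3": mu3, "C4": mu4 - 3 * mu2 ** 2}
    if mu2 > 0:
        # Volume-independent ratios usually compared with theory.
        c["C3/C2"] = c["C3"] / mu2
        c["C4/C2"] = c["C4"] / mu2
    return c
```

A Poisson distribution is a useful reference point: all its cumulants equal the mean, so C4/C2 = 1. Pileup, which mixes in events with roughly doubled multiplicity, distorts these ratios away from their single-collision values.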

preprint2022arXiv

Policy Optimization with Stochastic Mirror Descent

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the $\mathtt{VRMPO}$ algorithm, a sample-efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms state-of-the-art policy gradient methods in various settings.
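Mirror descent replaces the Euclidean gradient step with a step in a geometry defined by a mirror map. A classic instance uses the negative-entropy mirror map on the probability simplex. The update then becomes the exponentiated-gradient rule x_{t+1,i} ∝ x_{t,i} exp(−η g_{t,i}). The deterministic sketch below illustrates this instance only. It does not reproduce VRMPO's variance-reduced estimator or its choice of mirror map.

```python
import math
from typing import Callable, List, Sequence


def entropic_mirror_descent(grad: Callable[[List[float]], Sequence[float]],
                            x0: Sequence[float], lr: float, steps: int) -> List[float]:
    """Mirror descent on the probability simplex with the negative-entropy mirror map."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Multiplicative update, then renormalize (the Bregman projection onto the simplex).
        w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x


# Minimize the linear cost <c, x> over the simplex; the optimum puts all mass on argmin(c).
cost = [3.0, 1.0, 2.0]
x = entropic_mirror_descent(lambda _x: cost, [1 / 3, 1 / 3, 1 / 3], lr=0.5, steps=50)
```

The stochastic variant replaces `grad` with a noisy estimate, such as a sampled policy gradient. Variance reduction aims to make that estimate cheaper to obtain at a given accuracy.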

preprint2022arXiv

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Objectives: To develop and validate a deep learning (DL)-based primary tumor biopsy signature for predicting axillary lymph node (ALN) metastasis preoperatively in early breast cancer (EBC) patients with clinically negative ALN. Methods: A total of 1,058 EBC patients with pathologically confirmed ALN status were enrolled from May 2010 to August 2020. A DL core-needle biopsy (DL-CNB) model was built on the attention-based multiple instance learning (AMIL) framework to predict ALN status. The model used DL features extracted from the cancer areas of digitized whole-slide images (WSIs) of breast CNB specimens annotated by two pathologists. Accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and areas under the ROC curve (AUCs) were analyzed to evaluate the model. Results: The best-performing DL-CNB model, with VGG16_BN as the feature extractor, achieved an AUC of 0.816 (95% confidence interval (CI): 0.758, 0.865) in predicting positive ALN metastasis in the independent test cohort. Furthermore, our model incorporating clinical data, called DL-CNB+C, yielded the best accuracy of 0.831 (95% CI: 0.775, 0.878), especially for patients younger than 50 years (AUC: 0.918, 95% CI: 0.825, 0.971). Interpretation of the DL-CNB model showed that the signatures most predictive of ALN metastasis were characterized by nuclear features including density ($p$ = 0.015), circumference ($p$ = 0.009), circularity ($p$ = 0.010), and orientation ($p$ = 0.012). Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides to predict the metastatic status of ALN preoperatively for patients with EBC. The code and dataset are available at https://github.com/bupt-ai-cz/BALNMP.
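The standard attention-based MIL pooling of Ilse et al. (2018) works as follows:

- Each patch embedding h_k in a slide's bag receives a weight a_k = softmax_k(wᵀ tanh(V h_k)).
- The slide-level embedding is the weighted sum Σ_k a_k h_k.

The minimal pure-Python sketch below implements this pooling with hypothetical, untrained parameters V and w. The exact AMIL variant used in the paper may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def attention_mil_pool(bag: Sequence[Vec], V: Sequence[Vec], w: Vec) -> Tuple[Vec, Vec]:
    """Return (attention weights, pooled bag embedding) for one bag of instance embeddings."""
    if not bag:
        raise ValueError("bag must contain at least one instance")
    scores = []
    for h in bag:
        # Hidden projection tanh(V h), then a scalar attention score w^T tanh(V h).
        hidden = [math.tanh(sum(vij * hj for vij, hj in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a[k] * bag[k][d] for k in range(len(bag))) for d in range(dim)]
    return a, pooled
```

Because the weights sum to one, the slide embedding is a convex combination of its patch embeddings. The weights a_k also indicate which patches drive the prediction, which is the kind of signal used for model interpretation.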

preprint2022arXiv

Predicting Future CSI Feedback For Highly-Mobile Massive MIMO Systems

Massive multiple-input multiple-output (MIMO) systems promise unprecedentedly high data rates. To achieve their full potential, the transceiver needs complete channel state information (CSI) to perform transmit/receive precoding/combining. This requirement, however, is difficult to meet in practical systems because of unavoidable processing and feedback delays, which often degrade performance considerably, especially in high-mobility scenarios. In this paper, we develop a deep learning based channel prediction framework that proactively predicts the downlink channel state information based on the past observed channel sequence. At its core, the model adopts a 3-D convolutional neural network (CNN) based architecture to efficiently learn the temporal, spatial and frequency correlations of downlink channel samples, from which accurate channel prediction can be performed. Simulation results highlight the potential of the developed model for extracting information and predicting future downlink channels directly from the observed past channel sequence, significantly improving performance compared with the sample-and-hold approach and mitigating the impact of the dynamic communication environment.
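
The building block of the 3-D CNN described above is a convolution that slides jointly over time, antenna and subcarrier axes. The sketch below shows a single-channel "valid" 3-D convolution (cross-correlation, as in most deep-learning libraries) in pure Python. It works on real values only. Splitting complex CSI into real and imaginary input channels, and the toy tensor sizes, are assumptions rather than details from the paper.

```python
def conv3d_valid(x, k):
    """Single-channel 3-D 'valid' cross-correlation over x[time][antenna][subcarrier]."""
    T, A, S = len(x), len(x[0]), len(x[0][0])
    kt, ka, ks = len(k), len(k[0]), len(k[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for a in range(A - ka + 1):
            row = []
            for s in range(S - ks + 1):
                # Sum over a local time x antenna x subcarrier window, so one kernel
                # captures temporal, spatial and frequency correlations jointly.
                acc = 0.0
                for dt in range(kt):
                    for da in range(ka):
                        for ds in range(ks):
                            acc += x[t + dt][a + da][s + ds] * k[dt][da][ds]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Hypothetical toy input: 4 time steps x 3 antennas x 5 subcarriers, all ones.
x = [[[1.0] * 5 for _ in range(3)] for _ in range(4)]
k = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]  # 2 x 2 x 3 kernel
y = conv3d_valid(x, k)
```

The output has shape 3 x 2 x 3, and every entry equals the kernel volume, 12.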

preprint2022arXiv

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained with wav2vec 2.0. In this way, we achieve word error rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets, compared with the previous state-of-the-art WERs of 1.7%/3.3%.
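
The results above are reported as word error rates. As a reminder of how that metric is defined, the sketch below computes WER as a word-level Levenshtein distance divided by the number of reference words. The example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # edit distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

By the same arithmetic, moving from 1.7%/3.3% to 1.4%/2.6% is a relative WER reduction of about 18% on test and 21% on test-other.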

preprint2022arXiv

QCD Critical Point and Net-Proton Number Fluctuations at RHIC-STAR

In the search for the QCD phase boundary and critical point, higher-order cumulants of conserved quantities have been proposed as promising observables and have been studied extensively, both experimentally and theoretically. In this paper we present cumulant ratios up to the $6^{th}$ order of net-proton number distributions in Au+Au collisions at $\sqrt{\mathrm{s_{NN}}}$ = 7.7 - 200 GeV from phase I of the STAR Beam Energy Scan program, and in $p+p$ collisions at $\sqrt{\mathrm{s}}$ = 200 GeV. The results are compared with various models and Lattice QCD calculations.
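
The cumulant ratios mentioned above, such as $C_4/C_2$ and $C_6/C_2$, follow from the central moments $\mu_k$ of the event-by-event net-proton distribution. The sketch below applies the standard moment-to-cumulant relations to a hypothetical toy sample. It deliberately leaves out the efficiency and centrality-bin-width corrections that a real analysis requires.

```python
def cumulants_up_to_6(samples):
    """Cumulants C1..C6 of an event-by-event distribution, from central moments."""
    n = len(samples)
    mean = sum(samples) / n
    mu = {k: sum((x - mean) ** k for x in samples) / n for k in range(2, 7)}
    return {
        1: mean,
        2: mu[2],
        3: mu[3],
        4: mu[4] - 3 * mu[2] ** 2,
        5: mu[5] - 10 * mu[3] * mu[2],
        6: mu[6] - 15 * mu[4] * mu[2] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3,
    }

# Hypothetical net-proton numbers (N_p - N_pbar) for four events.
c = cumulants_up_to_6([-1, 1, -1, 1])
ratio42 = c[4] / c[2]  # C4/C2 (kappa * sigma^2)
ratio62 = c[6] / c[2]
```

For this symmetric two-point toy distribution, C4/C2 = -2 and C6/C2 = 16.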

preprint2022arXiv

QUBO-based density matrix electronic structure method

Density matrix electronic structure theory is used in many quantum chemistry methods to "alleviate" the computational cost that arises from working directly with wave functions. Although density-matrix-based methods are computationally more efficient than wave-function-based methods, significant computational effort is still involved. Since the Schrödinger equation must be solved as an eigenvalue problem, the time-to-solution scales cubically with system size, and the problem must be solved repeatedly to reach charge or field self-consistency. Here we propose and study a method to compute the density matrix using a quadratic unconstrained binary optimization (QUBO) solver. This method could be useful for solving the problem on quantum computers, and more specifically on quantum annealers. The proposed method is based on a direct construction of the density matrix using a QUBO eigensolver. We explore the main parameters of the algorithm, focusing on precision and efficiency. We show that, while direct construction of the density matrix using a QUBO formulation is possible, its efficiency and precision leave room for improvement. Moreover, calculations using quantum annealing on D-Wave's new Advantage quantum processing units are compared with classical simulated annealing, further highlighting some problems of the proposed method. We also discuss alternative methods that could lead to better performance of the density matrix construction.
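
A QUBO problem asks for the binary vector $x$ that minimizes $E(x) = \sum_{ij} Q_{ij} x_i x_j$. The sketch below solves a hypothetical 3-variable instance by exhaustive search. This exact classical reference costs $2^n$ evaluations, which is why quantum or simulated annealers serve as heuristic replacements for larger $n$. The paper's mapping of the density-matrix problem onto $Q$ is not reproduced here.

```python
from itertools import product

def qubo_energy(x, Q):
    """E(x) = sum_ij Q[i][j] * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_qubo_exhaustive(Q):
    """Exact minimizer by enumerating all 2^n binary vectors (feasible only for small n)."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))
    return list(best), qubo_energy(best, Q)

# Hypothetical instance: negative diagonal rewards setting bits;
# positive couplings penalize setting neighboring bits together.
Q = [[-1.0, 2.0, 0.0],
     [0.0, -1.0, 2.0],
     [0.0, 0.0, -1.0]]
x, e = solve_qubo_exhaustive(Q)
```

The unique minimizer here is x = [1, 0, 1], with energy -2.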

preprint2022arXiv

Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning

Multi-Task Learning (MTL) has achieved success in various fields. However, balancing different tasks to achieve good performance remains a key problem. To balance tasks, many works carefully design dynamic loss/gradient weighting strategies, but basic random experiments that would examine the effectiveness of these strategies have been overlooked. In this paper, we propose Random Weighting (RW) methods, including Random Loss Weighting (RLW) and Random Gradient Weighting (RGW), in which an MTL model is trained with random loss/gradient weights sampled from a distribution. To show the effectiveness and necessity of RW methods, we theoretically analyze the convergence of RW and show that RW has a higher probability of escaping local minima, resulting in better generalization ability. Empirically, we extensively compare the proposed RW methods with twelve state-of-the-art methods on five image datasets and two multilingual problems from the XTREME benchmark, showing that RW methods achieve performance comparable to state-of-the-art baselines. We therefore argue that RW methods are important baselines for MTL and deserve more attention.
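
RLW draws fresh task weights at every training step and optimizes the weighted sum of task losses. In the sketch below, the weights come from standard-normal draws passed through a softmax. That is one plausible choice of distribution and an assumption here, since the abstract only says the weights are "sampled from a distribution". The task losses are hypothetical numbers.

```python
import math
import random

def random_loss_weights(num_tasks, rng):
    """Sample task weights: z ~ N(0, 1) per task, then softmax so the weights sum to 1.

    Assumption: normal + softmax is an illustrative choice of sampling distribution.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(num_tasks)]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def rlw_step_loss(task_losses, rng):
    """Weighted multi-task loss for one training step, with freshly sampled weights."""
    weights = random_loss_weights(len(task_losses), rng)
    return sum(w * l for w, l in zip(weights, task_losses)), weights

rng = random.Random(0)
losses = [0.9, 0.4, 1.3]  # hypothetical per-task losses at one step
total, weights = rlw_step_loss(losses, rng)
```

Because the weights form a convex combination, the step loss always lies between the smallest and largest task loss. Only the mix changes from step to step.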

preprint2022arXiv

Recent advances of defect-induced spin and valley polarized states in graphene

Electrons in graphene have fourfold spin and valley degeneracies owing to the unique bipartite honeycomb lattice and extremely weak spin-orbit coupling, which can support a series of broken-symmetry states. Atomic-scale defects in graphene are expected to lift these degenerate degrees of freedom at the nanoscale and hence lead to rich quantum states, highlighting promising directions for spintronics and valleytronics. In this article, we review recent scanning tunneling microscopy (STM) advances on the spin and/or valley polarized states induced by an individual atomic-scale defect in graphene, including a single-carbon vacancy, a nitrogen-atom dopant, and a chemisorbed hydrogen atom. Finally, we offer a perspective on this field.

preprint2022arXiv

Searching for Variable Stars in the Open Cluster NGC 2355 and Its Surrounding Region

We have investigated the variable stars in the field surrounding NGC 2355 based on time-series photometric observations. More than 3000 CCD frames were obtained in the V band over 13 nights with the Nanshan One-meter Wide-field Telescope. We detected 88 variable stars, including 72 new and 16 known variable stars. By analyzing their light curves, we classified the variable stars as follows: 26 eclipsing binaries, 52 pulsating stars, 4 rotating variables, and 6 variables of unclear type whose periods are much longer than the time baseline of the observations. Using Gaia DR2 parallaxes, kinematics, and photometry, we also analyzed the cluster membership of these variable stars. In addition to the 11 variable members reported by Cantat-Gaudin et al. (2018), we identify 4 more variable member candidates located in the outer region of NGC 2355 whose spatial positions and kinematic properties are homogeneous with those of the cluster members. The main physical parameters of NGC 2355 estimated from the two-color and color-magnitude diagrams are log(age/yr) = 8.9, E(B - V) = 0.24 mag, and [Fe/H] = - 0.07 dex.
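
Classifying variables from unevenly sampled light curves usually begins with a period search. The abstract does not say which method the authors used. The sketch below shows one standard choice, the classical Lomb-Scargle periodogram (Scargle 1982), applied to a hypothetical synthetic light curve with a 2.5-day period. The sampling pattern and frequency grid are illustrative assumptions.

```python
import math

def lomb_scargle(t, y, freqs):
    """Classical (unnormalized) Lomb-Scargle periodogram for unevenly sampled data."""
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]
    power = []
    for f in freqs:
        w = 2 * math.pi * f
        # Offset tau makes the sine and cosine terms orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        cos_terms = [math.cos(w * (ti - tau)) for ti in t]
        sin_terms = [math.sin(w * (ti - tau)) for ti in t]
        yc_cos = sum(a * b for a, b in zip(yc, cos_terms))
        yc_sin = sum(a * b for a, b in zip(yc, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append(0.5 * (yc_cos ** 2 / cc + yc_sin ** 2 / ss))
    return power

# Hypothetical, slightly irregular sampling (days) of a 2.5-day sinusoidal variable.
t = [0.37 * i + 0.1 * math.sin(i) for i in range(200)]
y = [math.sin(2 * math.pi * ti / 2.5) for ti in t]
freqs = [0.1 + 0.001 * k for k in range(1401)]  # 0.1 to 1.5 cycles per day
power = lomb_scargle(t, y, freqs)
best_period = 1 / freqs[power.index(max(power))]
```

A period much longer than the observing baseline cannot be recovered this way. That limit matches the abstract's "unclear type" category.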

preprint2022arXiv

See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity

Recent studies use a two-stage supervised framework, referred to as EEG-visual reconstruction, to generate images from EEG that depict human perception of visual stimuli. They are, however, unable to reproduce the exact visual stimulus, since it is the human-specified annotation of images, not their data, that determines what the synthesized images are. Moreover, synthesized images often suffer from noisy EEG encodings and unstable training of generative models, making them hard to recognize. Instead, we present a single-stage EEG-visual retrieval paradigm in which the data of the two modalities, rather than their annotations, are correlated, allowing us to recover the exact visual stimulus for an EEG clip. We maximize the mutual information between the EEG encoding and the associated visual stimulus by optimizing a contrastive self-supervised objective, which brings two additional benefits. First, it enables EEG encodings to handle visual classes beyond those seen during training, since learning is not directed at class annotations. Second, the model is no longer required to generate every detail of the visual stimulus, but instead focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model output. Empirical studies are conducted on the largest single-subject EEG dataset that measures brain activity evoked by image stimuli. We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task that existing methods cannot. We also examine the implications of a range of EEG and visual encoder structures. Furthermore, on the widely studied semantic-level EEG-visual classification task, despite not using class annotations, the proposed method outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly in open-class recognition.
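
One standard contrastive objective with a mutual-information interpretation is InfoNCE. For each EEG clip, the paired image is the positive and every other image in the batch is a negative, and the loss lower-bounds the mutual information (I >= log N - loss). The sketch below assumes this InfoNCE form, which may differ from the paper's exact loss, and uses hypothetical 2-D embeddings in place of the EEG and visual encoders. It also shows instance-level retrieval by cosine similarity.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(eeg_batch, img_batch, temperature=0.1):
    """EEG-to-image InfoNCE: clip i's positive is image i; other images are negatives."""
    eeg = [_normalize(v) for v in eeg_batch]
    img = [_normalize(v) for v in img_batch]
    loss = 0.0
    for i, e in enumerate(eeg):
        logits = [sum(a * b for a, b in zip(e, m)) / temperature for m in img]
        mx = max(logits)
        log_denom = mx + math.log(sum(math.exp(z - mx) for z in logits))
        loss += log_denom - logits[i]  # cross-entropy with target index i
    return loss / len(eeg)

def retrieve(eeg_vec, img_batch):
    """Instance-level retrieval: index of the image with the highest cosine similarity."""
    e = _normalize(eeg_vec)
    sims = [sum(a * b for a, b in zip(e, _normalize(m))) for m in img_batch]
    return sims.index(max(sims))

eeg = [[1.0, 0.1], [0.1, 1.0]]   # hypothetical EEG encodings
imgs = [[0.9, 0.0], [0.0, 0.8]]  # hypothetical image encodings, paired by index
aligned = info_nce(eeg, imgs)
swapped = info_nce(eeg, imgs[::-1])
```

Correctly paired batches give a much lower loss than mismatched ones. At test time, retrieval needs no class labels, only nearest-neighbor search in the shared space.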

preprint2022arXiv

Self-supervised Learning with Random-projection Quantizer for Speech Recognition

We present a simple and effective self-supervised learning approach for speech recognition. The approach trains a model to predict masked speech signals in the form of discrete labels generated by a random-projection quantizer. In particular, the quantizer projects speech inputs with a randomly initialized matrix and performs a nearest-neighbor lookup in a randomly initialized codebook. Neither the matrix nor the codebook is updated during self-supervised learning. Since the random-projection quantizer is not trained and is separate from the speech recognition model, the design makes the approach flexible and compatible with universal speech recognition architectures. On LibriSpeech, our approach achieves word error rates similar to previous work using self-supervised learning with non-streaming models, and provides lower word error rates and latency than wav2vec 2.0 and w2v-BERT with streaming models. On multilingual tasks, the approach also provides significant improvement over wav2vec 2.0 and w2v-BERT.
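
The quantizer described above is simple enough to sketch directly: a frozen random projection followed by a nearest-neighbor lookup in a frozen random codebook. L2-normalizing the projected vector and the codebook entries is an assumption made here so that the lookup is scale-invariant. The dimensions, seed and frames are hypothetical.

```python
import math
import random

def _l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]

class RandomProjectionQuantizer:
    """Maps feature frames to discrete labels; projection and codebook are never trained."""

    def __init__(self, input_dim, code_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        # Randomly initialized projection matrix (code_dim x input_dim), frozen.
        self.proj = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
                     for _ in range(code_dim)]
        # Randomly initialized codebook, frozen (normalization is an assumption).
        self.codebook = [_l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                         for _ in range(codebook_size)]

    def label(self, frame):
        projected = _l2_normalize([sum(w * x for w, x in zip(row, frame))
                                   for row in self.proj])
        dists = [sum((p - c) ** 2 for p, c in zip(projected, code))
                 for code in self.codebook]
        return dists.index(min(dists))  # index of the nearest codebook entry

quantizer = RandomProjectionQuantizer(input_dim=4, code_dim=3, codebook_size=8, seed=0)
frames = [[0.1, -0.3, 0.5, 0.2], [1.0, 0.0, -1.0, 0.5], [0.1, -0.3, 0.5, 0.2]]
labels = [quantizer.label(f) for f in frames]
```

Because nothing in the quantizer is learned, the labels are fixed targets that the speech model learns to predict at masked positions. This is why the quantizer can be decoupled from the recognizer's architecture.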

preprint2022arXiv

Side-aware Meta-Learning for Cross-Dataset Listener Diagnosis with Subjective Tinnitus

With the development of digital technology, machine learning has paved the way for the next generation of tinnitus diagnosis. Although machine learning has been widely applied in EEG-based tinnitus analysis, most current models are dataset-specific. Each dataset may be limited to a specific range of symptoms, overall disease severity, and demographic attributes; further, dataset formats may differ, affecting model performance. This paper proposes a side-aware meta-learning method for cross-dataset tinnitus diagnosis, which can effectively classify tinnitus in subjects of different ages and genders across different data collection processes. Owing to the advantages of meta-learning, our method does not rely on large-scale datasets as conventional deep learning models do. Moreover, we design a subject-specific training process to help the model fit the data patterns of different patients or healthy people. Our method achieves an accuracy of 73.8% in cross-dataset classification. We conduct an extensive analysis to show the effectiveness of side information of the ears in enhancing model performance, and of side-aware meta-learning in improving the quality of the learned features.

preprint2022arXiv

Simultaneous Detection of Optical Flares of the Magnetically Active M Dwarf Wolf 359

We present detections of stellar flares of Wolf 359, an M6.5 dwarf in the solar neighborhood (2.41 pc) known to be prone to flares due to surface magnetic activity. The observations were carried out from 2020 April 23 to 29 with a 1-m and a 0.5-m telescope separated by nearly 300 km in Xinjiang, China. In 27 hr of photometric monitoring, a total of 13 optical flares were detected, each with a total energy of $\gtrsim 5 \times 10^{29}$ erg. The measured event rate of about once every two hours is consistent with rates reported previously for this star in radio, X-ray and optical wavelengths. One of these flares, detected by both telescopes on 26 April, was an energetic event with a released energy of nearly $10^{33}$ erg. The two-telescope light curves of this major event, sampled at different cadences and exposure timings, enabled us to better estimate the intrinsic flare profile, which reached a peak of up to 1.6 times the quiescent stellar brightness; with either telescope alone, this peak would have been underestimated, since the observed flare amplitudes were about $0.4$ and $0.8$, respectively. The trade-off between fast sampling to resolve a flare profile and longer integration times for higher photometric signal-to-noise provides useful guidance for the experimental design of future flare observations.

preprint2022arXiv

Single-step implementation of a hybrid controlled-NOT gate with one superconducting qubit simultaneously controlling multiple target cat-state qubits

Hybrid quantum gates have recently drawn considerable attention. They play significant roles in connecting quantum information processors with qubits of different encodings and have important applications in transferring quantum states between a quantum processor and a quantum memory. In this work, we propose a single-step implementation of a multi-target-qubit controlled-NOT gate with one superconducting (SC) qubit simultaneously controlling $n$ target cat-state qubits. In this proposal, the gate is implemented with $n$ microwave cavities coupled to a three-level SC qutrit. The two logic states of the control SC qubit are represented by the two lowest levels of the qutrit, while the two logic states of each target cat-state qubit are represented by two quasi-orthogonal cat states of a microwave cavity. The proposal relies essentially on the dispersive coupling of each cavity with the qutrit. The gate realization is quite simple because it requires only a single-step operation; there is no need to apply a classical pulse or perform a measurement. The gate operation time is independent of the number of target qubits and thus does not increase as the number of target qubits grows. Moreover, because the third (higher) energy level of the qutrit is not occupied during the gate operation, decoherence from the qutrit is greatly suppressed. As an application of this hybrid multi-target-qubit gate, we further discuss the generation of a hybrid Greenberger-Horne-Zeilinger (GHZ) entangled state of SC qubits and cat-state qubits. As an example, we numerically analyze the experimental feasibility of generating a hybrid GHZ state of one SC qubit and three cat-state qubits with present circuit QED technology.

preprint2022arXiv

Soft Retargeting Network for Click Through Rate Prediction

User interest modeling has recently received a great deal of attention in click-through rate (CTR) prediction. These models aim to capture user interest from different perspectives, including user interest evolution, session interest, and multiple interests. In this paper, we focus on a new type of user interest, namely user retargeting interest, defined as a user's interest in clicking target items that are the same as or similar to previously clicked items. We propose a novel soft retargeting network (SRN) to model this specific interest. Specifically, we first calculate the similarity between the target item and each historical item with the help of graph embedding. We then learn to aggregate the similarity weights to measure the extent of the user's click interest in the target item. Furthermore, we model the evolution of user retargeting interest. Experimental results on public and industrial datasets demonstrate that our model achieves significant improvements over state-of-the-art models.
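
The first step of SRN compares the target item with each historically clicked item in an embedding space. The sketch below computes those similarities with cosine similarity. It then applies max and mean as hypothetical stand-ins for SRN's learned aggregation, which the abstract does not specify. The embedding vectors are also hypothetical rather than real graph embeddings.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retargeting_features(target_emb, history_embs):
    """Target-vs-history similarities plus simple aggregates.

    Assumption: max/mean are placeholders for SRN's learned aggregator.
    """
    sims = [cosine(target_emb, h) for h in history_embs]
    return {"similarities": sims, "max": max(sims), "mean": sum(sims) / len(sims)}

# Hypothetical item embeddings; the last historical item matches the target exactly.
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
feats = retargeting_features([0.6, 0.8, 0.0], history)
```

A maximum similarity near 1 signals a re-click of the same item, while the mean reflects broader affinity to similar items. A learned aggregator can weigh such signals instead of using fixed statistics.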

preprint2022arXiv

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modality-specific (speech/text) pre/post-nets. After the pre-nets preprocess the input speech/text, the shared encoder-decoder network models the sequence-to-sequence transformation, and the post-nets then generate the output in the speech/text modality based on the output of the decoder. Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, aiming to improve the modeling capability for both speech and text. To align textual and speech information in this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification. We release our code and model at https://github.com/microsoft/SpeechT5.

preprint2022arXiv

Structured Light with Redundancy Codes

Structured light (SL) systems acquire high-fidelity 3D geometry with active illumination projection. Conventional systems struggle in environments with strong ambient illumination, global illumination and cross-device interference. This paper proposes a general-purpose technique to improve the robustness of SL by projecting redundant optical signals in addition to the native SL patterns. In this way, projected signals become more distinguishable from errors. Thus, geometry information can be recovered more easily using simple signal processing, and a "coding gain" in performance is obtained. We propose three applications of our redundancy codes: (1) self error-correction for SL imaging under strong ambient light, (2) error detection for adaptive reconstruction under global illumination, and (3) interference filtering with device-specific projection sequence encoding, especially for event camera-based SL and light curtain devices. We systematically analyze the design rules and signal processing algorithms in these applications. Corresponding hardware prototypes are built for evaluation on real-world complex scenes. Experimental results on synthetic and real data demonstrate significant performance improvements in SL systems with our redundancy codes.

preprint2022arXiv

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations, including the inability to model word order in documents, the difficulty of incorporating external linguistic knowledge, and the lack of inference methods that are both accurate and efficient for approximating the intractable posterior. Recently, pretrained language models (PLMs) have brought astonishing performance improvements to a wide variety of tasks due to their superior representations of text. Interestingly, there is no standard approach for deploying PLMs for topic discovery as a better alternative to topic models. In this paper, we begin by analyzing the challenges of using PLM representations for topic discovery, and then propose a joint latent space learning and clustering framework built upon PLM embeddings. In the latent space, topic-word and document-topic distributions are jointly modeled so that the discovered topics can be interpreted by coherent and distinctive terms while also serving as meaningful summaries of the documents. Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery, and is conceptually simpler than topic models. On two benchmark datasets in different domains, our model generates significantly more coherent and diverse topics than strong topic models, and offers better topic-wise document representations, based on both automatic and human evaluations.

preprint2022arXiv

Topological properties of two-dimensional photonic square lattice without $C_4$ and $M_{x(y)}$ symmetries

Rich topological phenomena, namely edge states and two types of corner states, are revealed in a two-dimensional square-lattice dielectric photonic crystal lacking both $C_4$ and $M_{x(y)}$ symmetries. Specifically, non-trivial type-I corner states, which do not exist in systems with $C_4$ and $M_{x(y)}$ symmetries because of the degeneracy, are protected by a non-zero quadrupole moment that is no longer quantized to $0.5$ but is less than it. Favorable properties, such as sub-wavelength localization and air-concentrated field distribution, are presented. Type-II corner states, induced by long-range interactions, are more easily realized owing to the asymmetry. This work extends topological physics to symmetry-broken systems and suggests potential applications.

preprint2022arXiv

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Transfer tasks in text-to-speech (TTS) synthesis, where one or more aspects of the speech of one set of speakers are transferred to another set of speakers who do not originally exhibit these aspects, remain challenging. One challenge is that models with high-quality transfer capabilities can be unstable, making them impractical for critical user-facing tasks. This paper demonstrates that transfer can be obtained by training a robust TTS system on data generated by a less robust TTS system designed for a high-quality transfer task; in particular, a CHiVE-BERT monolingual TTS system is trained on the output of a Tacotron model designed for accent transfer. While some quality loss is inevitable with this approach, experimental results show that models trained on synthetic data in this way can produce high-quality audio displaying accent transfer while preserving speaker characteristics such as speaking style.

preprint2022arXiv

Transferable Physical Attack against Object Detection with Separable Attention

Transferable adversarial attacks have remained in the spotlight since deep learning models were shown to be vulnerable to adversarial examples. However, existing physical attack methods pay insufficient attention to transferability to unseen models, leading to poor black-box attack performance. In this paper, we propose a novel method for generating physically realizable adversarial camouflage to achieve transferable attacks against detection models. More specifically, we first introduce multi-scale attention maps based on detection models to capture features of objects at various resolutions. Meanwhile, we apply a sequence of composite transformations to obtain averaged attention maps, which can curb model-specific noise in the attention and thus further boost transferability. Unlike general visualization interpretation methods, where model attention should be placed on the foreground object as much as possible, we attack separable attention from the opposite perspective, i.e., suppressing the attention of the foreground and enhancing that of the background. Consequently, transferable adversarial camouflage can be generated efficiently with our novel attention-based loss function. Extensive comparison experiments verify the superiority of our method over state-of-the-art methods.

preprint2022arXiv

Unsupervised Data Selection via Discrete Speech Representation for ASR

Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method that selects speech acoustically similar to a target domain. It takes as input the discrete speech representations available in common self-supervised learning frameworks, and applies a contrastive data selection method to the discrete tokens. Through extensive empirical studies, we show that our proposed method reduces the amount of required pre-training data and improves downstream ASR performance. Pre-training on a selected subset of 6% of the general data pool yields an 11.8% relative improvement on LibriSpeech test-other compared with pre-training on the full set. On the Multilingual LibriSpeech French, German, and Spanish test sets, selecting 6% of the data for pre-training reduces word error rate by more than 15% relative to the full set, and achieves results competitive with the current state of the art.
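
Contrastive data selection typically ranks each utterance by how much more likely it is under a target-domain language model than under a general-domain one, then keeps the top-scoring fraction. The sketch below applies this idea to discrete speech tokens. It uses add-one-smoothed unigram models for brevity, which is a simplification (the paper's models may use longer contexts), and the token corpora are hypothetical.

```python
import math
from collections import Counter

def unigram_logprob(token_sequences, vocab_size):
    """Return an add-one-smoothed unigram log-probability function over discrete tokens."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    total = sum(counts.values())
    return lambda tok: math.log((counts[tok] + 1) / (total + vocab_size))

def contrastive_scores(candidates, target_corpus, general_corpus, vocab_size):
    """Per-token average of log p_target(tok) - log p_general(tok); higher = more target-like."""
    lp_target = unigram_logprob(target_corpus, vocab_size)
    lp_general = unigram_logprob(general_corpus, vocab_size)
    return [sum(lp_target(t) - lp_general(t) for t in seq) / len(seq) for seq in candidates]

def select_top_fraction(candidates, scores, fraction):
    """Keep the highest-scoring fraction of the pool (at least one utterance)."""
    k = max(1, int(len(candidates) * fraction))
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]

# Hypothetical token IDs (vocabulary of 4) standing in for quantized speech frames.
target_corpus = [[0, 1, 0, 1], [0, 0, 1]]
general_corpus = [[2, 3, 2, 3], [0, 1, 2, 3]]
candidates = [[0, 1, 0], [2, 3, 2]]
scores = contrastive_scores(candidates, target_corpus, general_corpus, vocab_size=4)
selected = select_top_fraction(candidates, scores, fraction=0.5)
```

The same scoring scales to a large pool, where `fraction=0.06` would correspond to the 6% subsets mentioned above.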

preprint2022arXiv

Upper Field Strength Limit of Fast Radio Bursts

Fast radio bursts (FRBs) are cosmological radio transients with an unclear generation mechanism. Known characteristics such as their luminosity, duration, spectrum and repetition rate suggest that FRBs are powerful coherent radio signals at GHz frequencies, but their state near the source remains unknown. As extreme astronomical events, FRBs should be accompanied by X-ray/γ-ray counterparts of comparable or even greater energy. Here, particle-in-cell simulations of the interaction of ultra-strong GHz radio pulses with GeV photons show that at field strengths of $3\times10^{12}$ V/cm, a quantum cascade can generate dense pair plasmas that greatly damp the radio pulse. Thus, in the presence of GeV photons in the source region, GHz radio pulses stronger than $3\times10^{12}$ V/cm cannot escape. This result implies an upper limit on the field strength of FRBs at the source.

preprint2022arXiv

XTREME-S: Evaluating Cross-lingual Speech Representations

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.

preprint2021arXiv

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay its predictions toward the end and thus has much higher partial latency than a conventional ASR model. To address this issue, we encourage the E2E model to emit words early through an algorithm called FastEmit [3]. Naturally, improving latency degrades quality. To address this, we first explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR. Second, we explore running a 2nd-pass beam search to improve quality. To ensure that the 2nd pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality-latency tradeoff for streaming ASR.

preprint2021arXiv

A Survey on Neural Network Interpretability

Along with the great success of deep neural networks, there is growing concern about their black-box nature. The interpretability issue affects people's trust in deep learning systems. It is also related to many ethical problems, e.g., algorithmic discrimination. Moreover, interpretability is a desired property for deep networks to become powerful tools in other research fields, e.g., drug discovery and genomics. In this survey, we conduct a comprehensive review of neural network interpretability research. We first clarify the definition of interpretability, as it has been used in many different contexts. Then we elaborate on the importance of interpretability and propose a novel taxonomy organized along three dimensions: type of engagement (passive vs. active interpretation approaches), type of explanation, and focus (from local to global interpretability). This taxonomy provides a meaningful 3D view of the distribution of papers in the relevant literature, as two of the dimensions are not simply categorical but allow ordinal subcategories. Finally, we summarize existing interpretability evaluation methods and suggest possible research directions inspired by our new taxonomy.

preprint2021arXiv

Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging

A key challenge in training neural networks for a given medical imaging task is often the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports, which are often readily available in medical records, contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. This approach can be applied to any task for which text-image pairs are readily available. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67%-98%.

preprint2021arXiv

Echo State Speech Recognition

We propose automatic speech recognition (ASR) models inspired by echo state networks (ESNs), in which a subset of the recurrent neural network (RNN) layers in the models are randomly initialized and left untrained. Our study focuses on RNN-T and Conformer models, and we show that model quality does not drop even when the decoder is fully randomized. Furthermore, such models can be trained more efficiently because the decoders do not need to be updated. By contrast, randomizing encoders hurts model quality, indicating that optimizing encoders and learning proper representations of acoustic inputs are more vital for speech recognition. Overall, we challenge the common practice of training all components of ASR models, and demonstrate that ESN-based models can perform equally well while enabling more efficient training and storage than fully trainable counterparts.

preprint2021arXiv

Effects of $N(2000){5/2}^+$ on $γp \to K^+ Λ(1405)$

The photoproduction reaction of $γp \to K^+Λ(1405)$ is investigated based on an effective Lagrangian approach at the tree-level approximation with the purpose of understanding the reaction mechanism and extracting the resonance contents and the associated resonance parameters in this reaction. Apart from the $t$-channel $K$ and $K^\ast$ exchanges, $s$-channel nucleon ($N$) exchange, $u$-channel $Σ$, $Λ$, and $Λ(1405)$ exchanges, and generalized contact term, the exchanges of a minimum number of $N$ resonances in the $s$ channel are taken into account in constructing the reaction amplitudes to describe the experimental data. It is found that by introducing the $N(2000){5/2}^+$ resonance exchange in the $s$ channel, one can reproduce the most recent differential cross-section data from the CLAS Collaboration quite well. Further analysis shows that the cross sections of $γp \to K^+Λ(1405)$ at high energies are dominated by the $t$-channel $K$ exchange, while the contributions from the $s$-channel $N$ and $N(2000){5/2}^+$ exchanges are rather significant to the cross sections in the near-threshold energy region. Predictions for the beam and target asymmetries for $γp \to K^+Λ(1405)$ are given.

preprint2021arXiv

Evidence for $Z_{c}^{\pm}$ decays into the $ρ^{\pm} η_{c}$ final state

We study $e^{+}e^{-}$ collisions with a $π^{+}π^{-}π^{0}η_{c}$ final state using data samples collected with the BESIII detector at center-of-mass energies $\sqrt{s}=4.226$, $4.258$, $4.358$, $4.416$, and $4.600$ GeV. Evidence for the decay $Z_c^{\pm}\to ρ^{\pm}η_c$ is reported at $\sqrt{s} = 4.226$ GeV with a statistical significance of $3.9σ$, with various systematic uncertainties taken into account, and the Born cross section times branching fraction $σ^{B}(e^{+}e^{-}\to π^{\mp}Z_c^{\pm})\times \mathcal{B}(Z_c^{\pm}\to ρ^{\pm}η_c)$ is measured to be $(48 \pm 11 \pm 11)\,\mathrm{pb}$. The $Z_c^{\pm}\to ρ^{\pm}η_c$ signal is not significant at the other center-of-mass energies, and the corresponding upper limits are determined. In addition, no significant signal is observed in a search for $Z_c'^{\pm}\to ρ^{\pm}η_c$ with the same data samples. The ratios $R_{Z_c}=\mathcal{B}(Z_c^{\pm}\to ρ^{\pm}η_c)/\mathcal{B}(Z_c^{\pm}\to π^{\pm}J/ψ)$ and $R_{Z_c'}=\mathcal{B}(Z_c'^{\pm}\to ρ^{\pm}η_c)/\mathcal{B}(Z_c'^{\pm}\to π^{\pm}h_c)$ are obtained and used to discriminate between different theoretical interpretations of the $Z_c^{\pm}$ and $Z_c'^{\pm}$.

preprint2021arXiv

Generation of Intense Phase-Stable Femtosecond Hard X-ray Pulse Pairs

Coherent nonlinear spectroscopies and imaging in the X-ray domain provide direct insight into the coupled motions of electrons and nuclei with resolution on the electronic length and time scales. The experimental realization of such techniques will strongly benefit from access to intense, coherent pairs of femtosecond X-ray pulses. We have observed phase-stable X-ray pulse pairs containing more than $3\times10^{7}$ photons at 5.9 keV (2.1 Å) with about 1 fs duration and 2-5 fs separation. The highly directional pulse pairs are manifested by interference fringes in the superfluorescent and seeded stimulated manganese K-alpha emission induced by an X-ray free-electron laser. The fringes constitute the time-frequency X-ray analogue of Young's double-slit interference, allowing frequency-domain X-ray measurements with attosecond time resolution.

preprint2021arXiv

Generative Adversarial U-Net for Domain-free Medical Image Augmentation

The shortage of annotated medical images is one of the biggest challenges in medical image computing. Without a sufficient number of training samples, deep-learning-based models are very likely to suffer from over-fitting. The common solution is image manipulation such as rotation, cropping, or resizing. These methods can help relieve over-fitting as more training samples are introduced. However, they do not truly introduce new images with additional information, and they may lead to data leakage, as the test set may contain samples similar to those in the training set. To address this challenge, we propose to generate diverse images with a generative adversarial network. In this paper, we develop a novel generative method named generative adversarial U-Net, which combines a generative adversarial network with a U-Net. Unlike existing approaches, our newly designed model is domain-free and generalizable to various medical images. Extensive experiments are conducted on eight diverse datasets, including computed tomography (CT), pathology, and X-ray images. The visualization and quantitative results demonstrate the efficacy and good generalization of the proposed method in generating a wide array of high-quality medical images.

preprint2021arXiv

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and in on-device applications. Since these models are expected to transcribe speech with minimal latency, they are constrained, unlike their non-streaming counterparts, to be causal with no future context. Consequently, streaming models usually perform worse than non-streaming models. We propose a novel and effective learning method that leverages a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which are then used to distill knowledge into streaming ASR models. This way, we scale the training of streaming models to up to 3 million hours of YouTube audio. Experiments show that our approach can significantly reduce the word error rate (WER) of RNN-T models not only on LibriSpeech but also on YouTube data in four languages. For example, in French, we reduce the WER by 16.4% relative to a baseline streaming model by leveraging a non-streaming teacher model trained on the same amount of labeled data as the baseline.
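The 16.4% figure is a relative reduction, not an absolute drop in WER points. A small sketch of the arithmetic, using hypothetical WER values (a 10.0% baseline and an 8.36% distilled model) chosen only to reproduce a 16.4% relative reduction; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction in WER percentage points."""
    return baseline_wer - new_wer

# Hypothetical numbers: a 10.0% baseline WER improved to 8.36%.
rel = relative_wer_reduction(10.0, 8.36)      # about 16.4 (% relative)
abs_pts = absolute_wer_reduction(10.0, 8.36)  # about 1.64 WER points
```

The same relative figure corresponds to a larger absolute gain on a harder test set with a higher baseline WER.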

preprint2021arXiv

Learning from Home: A Mixed-Methods Analysis of Live Streaming Based Remote Education Experience in Chinese Colleges During the COVID-19 Pandemic

The COVID-19 global pandemic and the resulting lockdown policies have forced education in nearly every country to switch from a traditional co-located paradigm to a purely online 'distance learning from home' paradigm. At the center of this paradigm shift is the emergence and wide adoption of distance communication tools and live streaming platforms for education. Here, we present a mixed-methods study of the live-streaming-based education experience during the COVID-19 pandemic. We focus our analysis on Chinese higher education, conducting semi-structured interviews with 30 students and 7 instructors from diverse colleges and disciplines, together with a large-scale survey covering 6291 students and 1160 instructors at one leading Chinese university. Our study not only reveals important design guidelines and insights to better support the current remote learning experience during the pandemic, but also provides valuable implications for constructing future collaborative education support systems and experiences after the pandemic.

preprint2021arXiv

Local Measurements of Shubnikov-de Haas Oscillations in Graphene Systems

Shubnikov-de Haas (SdH) oscillations, the best-known magneto-oscillations, are caused by the quantization of electron energy levels in two-dimensional (2D) electron systems in the presence of magnetic fields, and can be used to determine Fermi-surface properties and directly measure the Berry phase of 2D systems. It is usually thought that transport measurements are required to measure SdH oscillations. Contradicting this belief, we demonstrate that SdH oscillations can be measured in graphene systems by scanning tunneling spectroscopy (STS). The energy-momentum dispersions and Berry phases of monolayer, Bernal-stacked bilayer, and ABC-stacked trilayer graphene are obtained from the SdH oscillations measured in the STS spectra. SdH oscillations can be obtained when the size of the 2D system is larger than the magnetic length and, importantly, no gate electrode is required in the STS measurement; the reported method is therefore applicable to a wide range of materials.

preprint2021arXiv

Measuring optical vortices by means of dual shearing-type Sagnac interferometers

Measuring the positions of optical vortices is an essential part of research on speckles and adaptive optics. The measurement accuracy is restricted by the performance of optical devices and by the properties of optical vortices, such as their density and size. To achieve high accuracy and a wide range of application, a dual shearing-type Sagnac interferometer is proposed that uses two shearing plates to adjust the precision of optical vortex measurement. The shearing displacements can balance the measuring precision against the value of the intensity ratio point, providing optimum measurement performance. This method is useful for observing optical vortices of different sizes and densities, especially under high-density conditions.

preprint2021arXiv

Model-based cellular kinetic analysis of SARS-CoV-2 infection: different immune response modes and treatment strategies

The increasing number of COVID-19 cases worldwide calls for mathematical models that analyze the interaction between the virus dynamics and the responses of innate and adaptive immunity. Here, assuming a weak and delayed response of innate and adaptive immunity in SARS-CoV-2 infection, we constructed a mathematical model describing the dynamic processes of the immune system. Integrating theoretical results with clinical data from COVID-19 patients, we classified COVID-19 development into three typical modes of immune response, correlated with the clinical classification of mild and moderate, severe, and critical patients. We found that the immune efficacy (the ability of the host to clear the virus and kill infected cells) and the lymphocyte supply (the abundance and pool of naïve T and B cells) play important roles in the dynamic process and determine the clinical outcome, especially for severe and critical patients. Furthermore, we put forward possible treatment strategies for the three typical modes of immune response. We hope our results can help in understanding the dynamical mechanism of the immune response against SARS-CoV-2 infection and be useful for treatment strategies and vaccine design.

preprint2021arXiv

Multi-Objective Meta Learning

Meta learning with multiple objectives can be formulated as a Multi-Objective Bi-Level optimization Problem (MOBLP), where the upper-level subproblem is to solve several possibly conflicting targets for the meta learner. However, existing studies either apply an inefficient evolutionary algorithm or linearly combine multiple objectives into a single-objective problem, which requires tuning the combination weights. In this paper, we propose a unified gradient-based Multi-Objective Meta Learning (MOML) framework and devise the first gradient-based optimization algorithm to solve the MOBLP by alternately solving the lower-level and upper-level subproblems via gradient descent and a gradient-based multi-objective optimization method, respectively. Theoretically, we prove the convergence properties of the proposed gradient-based optimization algorithm. Empirically, we show the effectiveness of the proposed MOML framework in several meta learning problems, including few-shot learning, neural architecture search, domain adaptation, and multi-task learning.
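The abstract names a gradient-based multi-objective method for the upper-level subproblem without specifying it. One common choice is the multiple-gradient descent algorithm (MGDA), which for two objectives has a closed-form min-norm combination of the gradients. The sketch below assumes that choice and two objectives; it is not necessarily the update used in MOML.

```python
import numpy as np

def min_norm_direction(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    """Minimum-norm point in the convex hull of two gradients (two-objective MGDA)."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return g1.copy()  # identical gradients: nothing to trade off
    # Minimise ||a*g1 + (1 - a)*g2||^2 over a in [0, 1]; the unconstrained optimum is clipped.
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

# Orthogonal gradients: the common direction weights both objectives equally.
d = min_norm_direction(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# Directly opposing gradients: the min-norm point is zero (Pareto-stationary).
d_conflict = min_norm_direction(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```

When the returned vector is non-zero, its inner product with each gradient is at least its squared norm, so a small step along its negative decreases both objectives to first order without hand-tuned combination weights.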

preprint2021arXiv

Oceanic non-Kolmogorov optical turbulence and spherical wave propagation

Light propagation in turbulent media is conventionally studied with the help of the spatio-temporal power spectra of the refractive-index fluctuations. In particular, for natural water turbulence, several models of the spatial power spectrum have been developed based on the classic Kolmogorov postulates. However, as suggested by recent developments in atmospheric optics, it is now widely accepted that non-Kolmogorov turbulent regimes are also common in stratified flow fields. Until now, all models developed for non-Kolmogorov optical turbulence have pertained to atmospheric research and hence involved only one advected scalar, e.g., temperature. We generalize the oceanic spatial power spectrum, based on two advected scalars, temperature and salinity concentration, to the non-Kolmogorov turbulence regime with the help of the so-called "Upper-Bound Limitation" and by adopting the concept of spectral correlation of two advected scalars. The proposed power spectrum can handle general non-Kolmogorov, anisotropic turbulence but reduces to the Kolmogorov, isotropic case if the power-law exponents of temperature and salinity are set to 11/3 and the anisotropy coefficient is set to unity. To show the application of the new spectrum, we derive the expression for the second-order mutual coherence function of a spherical wave and examine its coherence radius (in both scalar and vector forms) to characterize the turbulent disturbance. Our numerical calculations show that the statistics of the spherical wave vary substantially with the non-Kolmogorov power-law exponents of temperature and salinity and with the temperature-salinity spectral correlation coefficient. The introduced spectrum is expected to be significant for theoretical analysis and experimental measurements of non-classic double-diffusion turbulent regimes in natural water.

preprint2021arXiv

Oscillations of van Hove singularities spacing induced by sub-Angstrom fluctuations of interlayer spacing in graphene superlattices

Physical properties of two-dimensional van der Waals (vdW) structures depend sensitively on both stacking order and interlayer interactions. Yet, in most cases studied to date, the interlayer interaction is considered a static property of vdW structures. Here we demonstrate that applying a scanning tunneling microscopy (STM) tip pulse to twisted bilayer graphene (TBG) can induce sub-Angstrom fluctuations of the interlayer separation, equivalent to a dynamic vertical external pressure of about 10 GPa on the TBG. These sub-Angstrom fluctuations result in large oscillations of the energy separation between two van Hove singularities (VHSs) in the TBG. The period of the oscillations of the VHS spacing is extremely long, about 500-1000 seconds, which is attributed to tip-induced local stress in the atomically thin TBG. Our result provides an efficient method to tune and measure the physical properties of vdW structures dynamically.

preprint2021arXiv

Phase discontinuities induced scintillation enhancement: coherent vortex beams propagating through weak oceanic turbulence

Under the impact of an infinitely extended edge phase dislocation, optical vortices (screw phase dislocations) induce scintillation enhancement. The scintillation index of a beam consisting of two Gaussian vortex beams with $\pm 1$ topological charges propagating through weak oceanic turbulence is studied both analytically and by phase-screen simulation. Different combinations of the two types of phase discontinuities can be obtained by changing the degree of overlap and the phase difference of the two coherent Gaussian vortex beams. The scintillation indices of these combinations verify that the phenomenon requires the coexistence of both types of phase discontinuities. The enhanced scintillation index can be several orders of magnitude larger than that of a plane wave under weak perturbation (Rytov variance). This phenomenon could be useful for both optical vortex detection and perturbation measurement.

preprint2021arXiv

Photoproduction $γp \to K^+Λ(1520)$ in an effective Lagrangian approach

The data on differential cross sections and photon-beam asymmetries for the $γp \to K^+Λ(1520)$ reaction have been analyzed within a tree-level effective Lagrangian approach. In addition to the $t$-channel $K$ and $K^\ast$ exchanges, the $u$-channel $Λ$ exchange, the $s$-channel nucleon exchange, and the interaction current, a minimal number of nucleon resonances in the $s$ channel are introduced in constructing the reaction amplitudes to describe the data. The results show that the experimental data can be well reproduced by including either the $N(2060)5/2^-$ or the $N(2120)3/2^-$ resonance. In both cases, the contact term and the $K$ exchange are found to make significant contributions, while the contributions from the $K^\ast$ and $Λ$ exchanges are negligible in the former case and considerable in the latter case. Measurements of target asymmetries are called for to further pin down the resonance contents and to clarify the roles of the $K^\ast$ and $Λ$ exchanges in this reaction.

preprint2021arXiv

Reinforcement Learning for Beam Pattern Design in Millimeter Wave and Massive MIMO Systems

Employing large antenna arrays is a key characteristic of millimeter wave (mmWave) and terahertz communication systems. However, due to the adoption of fully analog or hybrid analog/digital architectures, as well as non-ideal hardware or arbitrary/unknown array geometries, accurate channel state information is hard to acquire. This impedes the design of beamforming/combining vectors that are crucial to fully exploit the potential of large-scale antenna arrays in providing sufficient receive signal power. In this paper, we develop a novel framework that leverages deep reinforcement learning (DRL) and a Wolpertinger-variant architecture and learns how to iteratively optimize the beam pattern (shape) for serving one or a small set of users relying only on the receive power measurements and without requiring any explicit channel knowledge. The proposed model accounts for key hardware constraints such as the phase-only, constant-modulus, and quantized-angle constraints. Further, the proposed framework can efficiently optimize the beam patterns for systems with non-ideal hardware and for arrays with unknown or arbitrary array geometries. Simulation results show that the developed solution is capable of finding near-optimal beam patterns based only on the receive power measurements.
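The phase-only, constant-modulus, and quantized-angle constraints define the set of beams the learning agent may choose from, and the receive power is the only feedback it observes. A minimal NumPy sketch of that constraint set and of the power measurement; the function names, the uniform b-bit phase grid, and the 1/sqrt(N) normalization are illustrative assumptions, and the DRL agent itself is not shown.

```python
import numpy as np

def quantized_beam(phases: np.ndarray, bits: int) -> np.ndarray:
    """Project arbitrary phases onto a uniform b-bit phase-shifter grid.

    Every entry has the same modulus 1/sqrt(N) (constant-modulus, phase-only),
    so the beamforming vector has unit norm.
    """
    step = 2 * np.pi / (2 ** bits)
    quantized = np.round(np.asarray(phases) / step) * step
    return np.exp(1j * quantized) / np.sqrt(len(quantized))

def receive_power(h: np.ndarray, w: np.ndarray) -> float:
    """Receive power |w^H h|^2 for channel h and beamforming vector w."""
    return float(np.abs(np.vdot(w, h)) ** 2)  # np.vdot conjugates its first argument

# Hypothetical 8-element channel; a 3-bit beam matched to its phases
# captures most of the maximum achievable power ||h||^2 = 8.
h = np.exp(1j * np.linspace(0.0, 3.0, 8))
w = quantized_beam(np.angle(h), bits=3)
p = receive_power(h, w)
```

In the learning setting, the agent would not know `h`; it would only observe values like `p` for the beams it tries.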

preprint2021arXiv

Reinforcement Learning of Beam Codebooks in Millimeter Wave and Terahertz MIMO Systems

Millimeter wave (mmWave) and terahertz MIMO systems rely on pre-defined beamforming codebooks for both initial access and data transmission. Being pre-defined, however, these codebooks are commonly not optimized for specific environments, user distributions, and/or possible hardware impairments. This leads to large codebook sizes with high beam training overhead, which increases the initial access/tracking latency and makes it hard for these systems to support highly mobile applications. To overcome these limitations, this paper develops a deep reinforcement learning framework that learns how to iteratively optimize the codebook beam patterns (shapes) relying only on the receive power measurements and without requiring any explicit channel knowledge. The developed model learns how to autonomously adapt the beam patterns to best match the surrounding environment, user distribution, hardware impairments, and array geometry. Further, this approach does not require any knowledge about the channel, array geometry, RF hardware, or user positions. To reduce the learning time, the proposed model adopts a novel Wolpertinger-variant architecture that is capable of efficiently searching for an optimal policy in a large discrete action space, which is important for large antenna arrays with quantized phase shifters. This complex-valued neural network architecture design respects practical RF hardware constraints such as the constant-modulus and quantized phase shifter constraints. Simulation results based on the publicly available DeepMIMO dataset confirm the ability of the developed framework to learn near-optimal beam patterns for both line-of-sight (LOS) and non-LOS scenarios and for arrays with hardware impairments without requiring any channel knowledge.

preprint2021arXiv

Self-supervised Low Light Image Enhancement and Denoising

This paper proposes a self-supervised low-light image enhancement method based on deep learning that improves image contrast and reduces noise at the same time, avoiding the blur caused by pre- or post-denoising. The method contains two deep sub-networks, an Image Contrast Enhancement Network (ICE-Net) and a Re-Enhancement and Denoising Network (RED-Net). The ICE-Net takes the low-light image as input and produces a contrast-enhanced image. The RED-Net takes the result of ICE-Net and the low-light image as input, and re-enhances and denoises the low-light image simultaneously. Both networks can be trained with low-light images only, which is achieved by a Maximum Entropy based Retinex (ME-Retinex) model and the assumption that noise is independently distributed. In the ME-Retinex model, a new constraint is introduced on the reflectance image: its maximum channel should conform to the maximum channel of the low-light image and have the largest entropy. This converts the decomposition of reflectance and illumination in the Retinex model into a problem that is no longer ill-conditioned and allows the ICE-Net to be trained in a self-supervised way. The loss functions of RED-Net are carefully formulated to separate noise from detail during training, based on the idea that, if noise is independently distributed, then after processing by smoothing filters (e.g., a mean filter) the gradient of the noise part should be smaller than the gradient of the detail part. Qualitative and quantitative experiments show that the proposed method is efficient.
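The RED-Net losses rest on the assumption that, after smoothing, independently distributed noise has a smaller gradient than genuine detail. A one-dimensional NumPy illustration of that assumption, using a hypothetical step edge as the "detail" and i.i.d. Gaussian noise with standard deviation 0.1; the helper names are illustrative and this is not the paper's loss function.

```python
import numpy as np

def mean_filter(x: np.ndarray, k: int = 5) -> np.ndarray:
    """Moving-average (mean) filter; 'valid' mode avoids zero-padding artifacts at the ends."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

def max_abs_gradient(x: np.ndarray) -> float:
    """Largest absolute first difference, a simple discrete gradient magnitude."""
    return float(np.max(np.abs(np.diff(x))))

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, size=200)  # independent noise
detail = np.zeros(200)
detail[100:] = 1.0                      # a step edge standing in for image detail

noise_grad = max_abs_gradient(mean_filter(noise))
detail_grad = max_abs_gradient(mean_filter(detail))  # 1/k = 0.2 for k = 5
```

Averaging shrinks independent fluctuations, while a coherent edge survives as a ramp of height 1/k per sample, which is the separation such a loss can exploit.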

preprint2020arXiv

2D Convolutional Neural Networks for 3D Digital Breast Tomosynthesis Classification

Automated methods for breast cancer detection have focused on 2D mammography and have largely ignored 3D digital breast tomosynthesis (DBT), which is frequently used in clinical practice. The two key challenges in developing automated methods for DBT classification are handling the variable number of slices and retaining slice-to-slice changes. We propose a novel deep 2D convolutional neural network (CNN) architecture for DBT classification that simultaneously overcomes both challenges. Our approach operates on the full volume, regardless of the number of slices, and allows the use of pre-trained 2D CNNs for feature extraction, which is important given the limited amount of annotated training data. In an extensive evaluation on a real-world clinical dataset, our approach achieves 0.854 auROC, which is 28.80% higher than approaches based on 3D CNNs. We also find that these improvements are stable across a range of model configurations.

preprint2020arXiv

A singularity at the criticality for the free energy in percolation

Consider percolation on the triangular lattice. Let $κ(p)$ be the free energy at the zero field. We show that $$|κ'''(p)| \leq |p-p_c|^{-1/3+o(1)} \mbox{ if } p \neq p_c.$$ Furthermore, we show that there exists a sequence $ε_n\downarrow 0$ such that $$|κ'''(p_c\pm ε_n)|\geq ε_n^{-1/3+o(1)}.$$ This affirmatively answers a question posed by Sykes and Essam half a century ago: whether $κ(p)$ has a singularity at criticality.

preprint2020arXiv

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking. In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. On the quality side, we incorporate a large number of utterances across varied domains to increase acoustic diversity and the vocabulary seen by the model. We also train with accented English speech to make the model more robust to different pronunciations. In addition, given the increased amount of training data, we explore a varied learning rate schedule. On the latency front, we explore using the end-of-sentence decision emitted by the RNN-T model to close the microphone, and also introduce various optimizations to improve the speed of LAS rescoring. Overall, we find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model. For example, for the same latency, RNN-T+LAS obtains an 8% relative improvement in WER, while being more than 400 times smaller in model size.

preprint2020arXiv

A Study on Evaluation Standard for Automatic Crack Detection Regard the Random Fractal

A reasonable evaluation standard underlies the construction of effective deep learning models. However, our experiments show that deep-learning-based automatic crack detectors are clearly underestimated by the widely used mean Average Precision (mAP) standard. This paper presents a study on the evaluation standard. We show that the random fractal nature of cracks invalidates the mAP standard, because the strict box matching in the mAP calculation is unreasonable for fractal features. As a solution, a fractal-aware evaluation standard named CovEval is proposed to correct the underestimation in crack detection. CovEval adopts a different matching process based on the idea of covering box matching. Specifically, the Cover Area rate (CAr) is designed as a covering overlap, and a multi-match strategy is employed to relax the one-to-one matching restriction in mAP. Extended Recall (XR), Extended Precision (XP), and Extended F-score (Fext) are defined for scoring the crack detectors. In experiments using several common object detection frameworks, models obtain much higher scores in crack detection under CovEval, which agrees better with their visual performance. Moreover, based on the Faster R-CNN framework, we present a case study that optimizes a crack detector using the CovEval standard. The extended recall (XR) of our best model reaches an industrial level of 95.8, which implies that, with a reasonable evaluation standard, object detection methods have great potential for automatic industrial inspection.
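The abstract does not give the formula for the Cover Area rate. To illustrate why strict box matching penalizes detectors on elongated, fractal cracks, the sketch below compares IoU with a simple coverage ratio, defined here hypothetically as the fraction of a predicted box that lies inside the ground-truth box; this definition and the helper names are assumptions, not necessarily the paper's CAr.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union used in mAP box matching."""
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)

def coverage(pred: Box, gt: Box) -> float:
    """Hypothetical coverage ratio: share of the predicted box lying inside the ground truth."""
    return intersection(pred, gt) / area(pred)

# A long crack annotated as one box, detected as a shorter segment of it.
gt = (0.0, 0.0, 100.0, 10.0)
pred = (0.0, 0.0, 30.0, 10.0)
```

Here `iou(pred, gt)` is 0.3, so a typical 0.5 IoU threshold counts the detection as a miss, while `coverage(pred, gt)` is 1.0 because the box lies entirely on the annotated crack.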

preprint2020arXiv

Achieving Multi-Tasking Robots in Multi-Robot Tasks

One simplifying assumption made in distributed robot systems is that the robots are single-tasking: each robot operates on a single task at any time. While such a sanguine assumption is harmless in situations with sufficient resources for the robots to operate independently, it becomes impractical when they must share their capabilities. In this paper, we consider multi-tasking robots with multi-robot tasks. Given a set of tasks, each achievable by a coalition of robots, our approach allows the coalitions to overlap and task synergies to be exploited by reasoning about the physical constraints that can be synergistically satisfied for achieving the tasks. The key contribution of this work is a general and flexible framework that gives multi-robot systems this ability, extending their capabilities in resource-constrained situations. The proposed approach is built on information invariant theory, which specifies the interactions between information requirements. In our work, we map physical constraints to information requirements, thereby allowing task synergies to be identified via the information invariant framework. We show that our algorithm is sound and complete under a problem setting with multi-tasking robots. Simulation results show its effectiveness in resource-constrained situations and in handling challenging situations in a multi-UAV simulator.

preprint2020arXiv

Allocation of Multi-Robot Tasks with Task Variants

Task allocation is a well-studied problem. In most prior problem formulations, each task is assumed to be associated with a unique set of resource requirements. In the scope of the multi-robot task allocation problem, these requirements can be satisfied by a coalition of robots. In this paper, we introduce a more general formulation of the multi-robot task allocation problem that allows more than one option for specifying the set of task requirements: satisfying any one of the options satisfies the task. We refer to this new problem as the multi-robot task allocation problem with task variants. First, we show theoretically that this extension fortunately does not change the complexity class, which is still NP-complete. For solution methods, we adapt two previous greedy methods for the task allocation problem without task variants to solve this new problem and analyze their effectiveness. In particular, we "flatten" the new problem into the problem without task variants, modify the previous methods to solve the flattened problem, and prove that the bounds still hold. Finally, we thoroughly evaluate these two methods along with a random baseline to demonstrate their efficacy for the new problem.
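The task-variant formulation can be made concrete with a small sketch: a task lists alternative requirement sets, a coalition satisfies the task if its pooled capabilities meet any one of them, and "flattening" turns each (task, variant) pair into an ordinary variant-free task entry. The capability names, data layout, and helper functions are hypothetical, and the paper's greedy allocation methods are not reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple

Requirement = Dict[str, int]  # capability name -> number of units required

def meets(coalition: List[Requirement], requirement: Requirement) -> bool:
    """True if the robots' pooled capabilities cover every required unit."""
    pooled: Counter = Counter()
    for robot in coalition:
        pooled.update(robot)  # adds this robot's capability counts
    return all(pooled[cap] >= n for cap, n in requirement.items())

def task_satisfied(coalition: List[Requirement], variants: List[Requirement]) -> bool:
    """A task with variants is satisfied if any single variant is met."""
    return any(meets(coalition, v) for v in variants)

def flatten(tasks: Dict[str, List[Requirement]]) -> List[Tuple[str, int, Requirement]]:
    """Turn each (task, variant) pair into a variant-free task entry."""
    return [(name, i, req) for name, variants in tasks.items() for i, req in enumerate(variants)]

# Hypothetical example: "survey" needs either one flying camera robot or two lidar units.
tasks = {"survey": [{"camera": 1, "flight": 1}, {"lidar": 2}]}
ground_team = [{"lidar": 1}, {"lidar": 1}]
```

An allocator working on the flattened list would still need to avoid serving more than one variant of the same task; that bookkeeping is omitted here.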

preprint2020arXiv

An End-to-End Attack on Text-based CAPTCHAs Based on Cycle-Consistent Generative Adversarial Network

As a widely deployed security scheme, text-based CAPTCHAs have become more and more difficult to resist machine learning-based attacks. So far, many researchers have conducted attack research on text-based CAPTCHAs deployed by different companies (such as Microsoft, Amazon, and Apple) and achieved some success. However, most of these attacks have shortcomings, such as poor portability of attack methods, requiring a series of data preprocessing steps, and relying on large amounts of labeled CAPTCHAs. In this paper, we propose an efficient and simple end-to-end attack method based on cycle-consistent generative adversarial networks. Compared with previous studies, our method greatly reduces the cost of data labeling. In addition, this method has high portability: it can attack common text-based CAPTCHA schemes only by modifying a few configuration parameters, which makes the attack easier. First, we train CAPTCHA synthesizers based on the cycle-GAN to generate fake samples. Basic recognizers based on the convolutional recurrent neural network are trained with the fake data. Subsequently, an active transfer learning method is employed to optimize the basic recognizer using tiny amounts of labeled real-world CAPTCHA samples. Our approach efficiently cracked the CAPTCHA schemes deployed by 10 popular websites, indicating that our attack is likely very general. Additionally, we analyzed the currently most popular anti-recognition mechanisms. The results show that combining more anti-recognition mechanisms can improve the security of CAPTCHAs, but the improvement is limited. Conversely, generating more complex CAPTCHAs may cost more resources and reduce the availability of CAPTCHAs.

preprint2020arXiv

Analysis of the decay $D^0\rightarrow K_{S}^{0} K^{+} K^{-}$

Using a data sample of $2.93~\mathrm{fb}^{-1}$ of $e^+e^-$ collisions collected at $\sqrt{s}=3.773$ GeV in the BESIII experiment, we perform an analysis of the decay $D^0\rightarrow K_{S}^{0} K^{+} K^{-}$. The Dalitz plot is analyzed using $1856\pm 45$ flavor-tagged signal decays. We find that the Dalitz plot is well described by a set of six resonances: $a_0(980)^0$, $a_0(980)^+$, $ϕ(1020)$, $a_2(1320)^+$, $a_2(1320)^-$ and $a_0(1450)^-$. Their magnitudes, phases and fit fractions are determined, as well as the coupling of $a_0(980)$ to $K\bar{K}$, $g_{K\bar{K}}=3.77\pm 0.24\text{(stat.)}\pm0.35\text{(sys.)}$ GeV. The branching fraction of the decay $D^0\rightarrow K_{S}^{0} K^{+} K^{-}$ is measured using $11660\pm 118$ untagged signal decays to be $(4.51\pm 0.05\text{(stat.)}\pm 0.16\text{(sys.)})\times 10^{-3}$. Both measurements are limited by their systematic uncertainties.

preprint2020arXiv

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. Although they originate from the same source database and the same underlying protocol, they are explored in two specific use-case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than was previously possible. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment of spoofed data in the logical access scenario. The assessment demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona fide utterances even by human subjects.

preprint2020arXiv

Attention: to Better Stand on the Shoulders of Giants

Science of science (SciSci) is an emerging discipline in which science is used to study the structure and evolution of science itself using large data sets. The increasing availability of digital data on scholarly outcomes offers unprecedented opportunities to explore SciSci. As science progresses, previously discovered knowledge is the principal inspiration for new scientific ideas, and citation is a reasonably good reflection of this cumulative nature of scientific research. Research that chooses potentially influential references will have a lead over other emerging publications. Although peer review is the main reliable way of predicting a paper's future impact, the ability to foresee lasting impact from citation records is increasingly essential for scientific impact analysis in the era of big data. This paper develops an attention mechanism for long-term scientific impact prediction and validates the method on a real large-scale citation data set. The results challenge conventional thinking: instead of accurately simulating the original power-law distribution, emphasizing limited attention is a better way to stand on the shoulders of giants.

preprint2020arXiv

Better Than Reference In Low Light Image Enhancement: Conditional Re-Enhancement Networks

Low-light images suffer from severe noise, low brightness, low contrast, and other problems. Many image enhancement methods have been proposed in previous research, but few can deal with these problems simultaneously. In this paper, to solve these problems simultaneously, we propose a low-light image enhancement method that combines supervised learning with previous HSV (Hue, Saturation, Value) or Retinex-model-based image enhancement methods. First, we analyze the relationship between the HSV color space and Retinex theory, and show that the V channel of the enhanced image (the V channel of the HSV color space, which equals the maximum of the R, G, and B channels) represents the contrast and brightness enhancement process well. Then, a data-driven conditional re-enhancement network (denoted CRENet) is proposed. The network takes low-light images as input and the enhanced V channel as a condition; it then re-enhances the contrast and brightness of the low-light image while reducing noise and color distortion. Note that any paired images with different exposure times can be used for training, so there is no need to carefully select supervision images, which saves considerable effort. In addition, processing a 400×600 color image takes less than 20 ms on a 2080Ti GPU. Finally, comparative experiments demonstrate the effectiveness of the method. The results show that the proposed method can significantly improve the quality of the enhanced image, and by combining it with other image contrast enhancement methods, the final enhancement result can even be better than the reference image in contrast and brightness. (Code will be available at https://github.com/hitzhangyu/image-enhancement-with-denoise)
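The statement that the HSV V channel equals the per-pixel maximum of the RGB channels can be checked in a few lines of NumPy. This is a minimal sketch, not part of CRENet; the function name `v_channel` and the toy image are illustrative assumptions.

```python
import numpy as np

def v_channel(rgb: np.ndarray) -> np.ndarray:
    """Return the HSV value (V) channel of an RGB image of shape (H, W, 3).

    V is the per-pixel maximum over the R, G and B channels.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    return rgb.max(axis=-1)

# Hypothetical 1x2 image with float values in [0, 1].
img = np.array([[[0.1, 0.4, 0.2], [0.9, 0.3, 0.5]]])
print(v_channel(img))  # [[0.4 0.9]]
```

In the described method, this channel (after enhancement) serves as the condition fed to the network alongside the low-light input.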

preprint2020arXiv

Boosting Retailer Revenue by Generated Optimized Combined Multiple Digital Marketing Campaigns

Campaigns are a frequently employed instrument for lifting a retailer's GMV (Gross Merchandise Volume) in traditional marketing. As their online counterpart, digital marketing campaigns (DMCs) have been trending in recent years with the rapid development of e-commerce. However, how to give the massive number of sellers on an online retail platform the capacity to apply combined multiple digital marketing campaigns to boost their shops' revenue is still a novel topic. In this work, a comprehensive solution for generating optimized combined multiple DMCs is presented. First, a potential personalized DMC pool is generated for every retailer by a newly proposed neural network model, DMCNet (Digital-Marketing-Campaign Net). Second, based on submodular optimization theory and the DMC pool from DMCNet, the generated combined multiple DMCs are ranked by their revenue generation strength, and the top three ranked campaigns are returned to the sellers' back-end management system, so that retailers can set combined multiple DMCs for their online shops in one shot. A real online A/B test shows that, with the integrated solution, sellers on the online retail platform increase their shops' GMVs by approximately 6%.
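The ranking step relies on submodular optimization. A standard way to exploit submodularity is greedy selection by marginal gain, sketched below for picking three campaigns. The weighted-coverage utility, the campaign names, and the segment weights are hypothetical stand-ins; DMCNet's revenue model is not reproduced here.

```python
from typing import Callable, Dict, List, Set

def greedy_select(candidates: List[str],
                  utility: Callable[[Set[str]], float],
                  k: int = 3) -> List[str]:
    """Greedily add the candidate with the largest marginal utility gain.

    For a monotone submodular utility, this rule is guaranteed to reach at
    least (1 - 1/e) of the best achievable value for a size-k set.
    """
    chosen: List[str] = []
    for _ in range(min(k, len(candidates))):
        base = utility(set(chosen))
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            gain = utility(set(chosen) | {c}) - base
            if gain > best_gain:  # strict comparison: ties keep the earlier candidate
                best, best_gain = c, gain
        chosen.append(best)
    return chosen

# Hypothetical campaigns, each reaching a set of customer segments.
reach: Dict[str, Set[str]] = {
    "discount": {"a", "b", "c"},
    "coupon": {"a", "b"},
    "bundle": {"d"},
    "flash_sale": {"c", "e"},
}
# Hypothetical revenue weight of each customer segment.
weights = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 1.0, "e": 4.0}

def coverage(campaigns: Set[str]) -> float:
    """Weighted coverage: monotone and submodular in the set of campaigns."""
    covered = set().union(*(reach[c] for c in campaigns))
    return sum(weights[s] for s in covered)

print(greedy_select(list(reach), coverage))  # ['discount', 'flash_sale', 'bundle']
```

Note that "coupon" is never picked: once "discount" is chosen, it adds no new segments, which is exactly the diminishing-returns behavior submodularity captures.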

preprint2020arXiv

Brain2Object: Printing Your Mind from Brain Signals with Spatial Correlation Embedding

Electroencephalography (EEG) signals are known to manifest differential patterns when individuals visually concentrate on different objects. In this work, we present an end-to-end digital fabrication system, Brain2Object, to print the 3D object that an individual is observing by decoding visually evoked brain signals. We propose a unified training framework that combines multi-class Common Spatial Pattern and Convolutional Neural Networks to support the backend computation. We learn dynamical graph representations of brain signals to accurately capture the structural information among EEG channels. A user-friendly interface is developed as the system front end. Brain2Object presents a streamlined end-to-end workflow that can serve as a template for deeper integration of BCI technologies to assist with our routine activities. The proposed system is evaluated extensively using offline experiments and through an online demonstrator. The experimental results show that our approach can achieve recognition accuracies of 92.58% on a benchmark dataset and 75.23% on a locally collected dataset. Moreover, our method consistently outperforms a wide range of baseline and state-of-the-art approaches. The proof of concept corroborates the practicality of our approach and illustrates the ease with which such a system could be deployed.
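Common Spatial Pattern (CSP) finds spatial filters whose output variance is large for one class and small for the other. The sketch below is the textbook two-class CSP, obtained by whitening the composite covariance; the paper uses a multi-class variant combined with CNNs, which this does not reproduce. The synthetic 4-channel "EEG" trials are an illustrative assumption.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_filters=2):
    """Two-class Common Spatial Pattern (CSP) filters.

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns (W, eigvals): W has shape (n_filters, n_channels), with rows taken
    from both ends of the eigenvalue spectrum; eigvals are in ascending order.
    """
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance ca + cb.
    evals, evecs = np.linalg.eigh(ca + cb)
    P = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # In the whitened space the eigenvalues of P ca P lie in [0, 1]:
    # near 1 means high variance for class A, near 0 high variance for class B.
    eigvals, U = np.linalg.eigh(P @ ca @ P)
    W_all = U.T @ P  # each row is a spatial filter
    n_low = n_filters // 2
    n_ch = len(eigvals)
    rows = list(range(n_low)) + list(range(n_ch - (n_filters - n_low), n_ch))
    return W_all[rows], eigvals

def csp_features(W, trial):
    """Normalized log-variance of the spatially filtered signals."""
    var = (W @ trial).var(axis=1)
    return np.log(var / var.sum())

rng = np.random.default_rng(0)
a = rng.normal(size=(20, 4, 100))
a[:, 0] *= 3.0  # hypothetical class A: channel 0 is more active
b = rng.normal(size=(20, 4, 100))
b[:, 1] *= 3.0  # hypothetical class B: channel 1 is more active
W, eigvals = csp_filters(a, b)
print(csp_features(W, a[0]), csp_features(W, b[0]))
```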

preprint2020arXiv

Centrality selection effect on higher-order cumulants of net-proton multiplicity distributions in relativistic heavy-ion collisions

We studied the centrality selection effect on cumulants (up to fourth order) and cumulant ratios of net-proton multiplicity distributions in Au+Au collisions at $\sqrt{s_{\mathrm{NN}}}$ = 7.7, 19.6, and 200 GeV using the UrQMD model. The net-proton cumulants are calculated with collision centralities defined by the charged-particle multiplicity in different pseudorapidity ($\eta$) regions. By comparing the results from various collision centrality definitions, we found that autocorrelation effects are not significant for the centrality definitions "refmult-3" and "refmult-2", which use mid-rapidity charged particles but exclude (anti-)protons and the analysis region, respectively. Furthermore, due to the contributions of spectator protons, we observed poor centrality resolution when using charged particles in the forward $\eta$ region at low energies. This work can serve as a baseline for centrality selection in future fluctuation analyses in relativistic heavy-ion collisions.
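The cumulants here are the standard ones built from central moments of the event-by-event net-proton distribution. A minimal sketch, assuming a plain sample without the centrality-bin-width or efficiency corrections used in experimental analyses; the Poisson toy input is hypothetical and is not UrQMD output.

```python
import numpy as np

def cumulants(x):
    """First four cumulants C1..C4 of event-by-event net-proton numbers."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    d = x - mean
    m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
    # C1 = mean, C2 = variance, C3 = third central moment,
    # C4 = fourth central moment minus 3 * variance**2.
    return mean, m2, m3, m4 - 3.0 * m2 ** 2

def cumulant_ratios(x):
    """Common ratios: C2/C1, C3/C2 (= S*sigma) and C4/C2 (= kappa*sigma^2)."""
    c1, c2, c3, c4 = cumulants(x)
    return c2 / c1, c3 / c2, c4 / c2

# Hypothetical toy events: independent Poisson protons and antiprotons,
# for which net-proton cumulants are C1 = C3 = 10 - 4 and C2 = C4 = 10 + 4.
rng = np.random.default_rng(1)
net = rng.poisson(10.0, 100_000) - rng.poisson(4.0, 100_000)
print(cumulants(net))
print(cumulant_ratios(net))
```

The Poisson case is a common reference baseline, since deviations of measured ratios such as C4/C2 from it are what fluctuation analyses look for.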

preprint2020arXiv

CF2-Net: Coarse-to-Fine Fusion Convolutional Network for Breast Ultrasound Image Segmentation

Breast ultrasound (BUS) image segmentation plays a crucial role in computer-aided diagnosis systems, which are regarded as useful tools to help increase the accuracy of breast cancer diagnosis. Recently, many deep learning methods have been developed for BUS image segmentation and show some advantages over conventional region-, model-, and traditional learning-based methods. However, previous deep learning methods typically use skip connections to concatenate the encoder and decoder, which might not fully fuse coarse-to-fine features from the encoder and decoder. Since lesion structures and edges in BUS images are commonly blurred, it is difficult to learn discriminative structure and edge information, which reduces performance. To this end, we propose and evaluate a coarse-to-fine fusion convolutional network (CF2-Net) based on a novel feature integration strategy (forming an 'E'-like shape) for BUS image segmentation. To enhance contours and provide structural information, we concatenate a super-pixel image and the original image as the input of CF2-Net. Meanwhile, to highlight the differences in lesion regions of variable sizes and relieve the imbalance issue, we further design a weighted-balanced loss function to train CF2-Net effectively. The proposed CF2-Net was evaluated on an open dataset using four-fold cross-validation. The experimental results demonstrate that CF2-Net obtains state-of-the-art performance compared with other deep learning-based methods.

preprint2020arXiv

Channel Estimation and Hybrid Precoding for Distributed Phased Arrays Based MIMO Wireless Communications

Distributed phased arrays based multiple-input multiple-output (DPA-MIMO) is a newly introduced architecture that enables both spatial multiplexing and beamforming while facilitating highly reconfigurable hardware implementation in millimeter-wave (mmWave) frequency bands. For a DPA-MIMO system, we focus on channel state information (CSI) acquisition and hybrid precoding. Benefiting from a coordinated, open-loop pilot beam pattern design, all sub-arrays can perform channel sounding with less training overhead than the traditional orthogonal operation of each sub-array. Furthermore, two sparse channel recovery algorithms, joint orthogonal matching pursuit (JOMP) and joint sparse Bayesian learning with $\ell_2$ reweighting (JSBL-$\ell_2$), are proposed to exploit the hidden structured sparsity in the beam-domain channel vector. Finally, successive interference cancellation (SIC) based hybrid precoding through sub-array grouping is presented for the DPA-MIMO system, which decomposes the joint sub-array RF beamformer design into an interactive per-sub-array-group procedure. Simulation results show that the two proposed channel estimators take full advantage of the partial coupling characteristic of DPA-MIMO channels to perform channel recovery, and that the proposed hybrid precoding algorithm is suitable for such an array-of-sub-arrays architecture, with satisfactory performance and low complexity.
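Both proposed estimators build on sparse recovery in the beam domain. For orientation, here is plain single-vector orthogonal matching pursuit (OMP) in NumPy. It is not the paper's joint variant (JOMP), which exploits structure shared across sub-arrays, and the random dictionary and three-path channel are hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: find x with at most `sparsity` nonzeros, y ≈ A @ x."""
    support = []
    coef = np.zeros(0, dtype=np.result_type(A, y))
    residual = y.copy()
    for _ in range(sparsity):
        # Select the dictionary column most correlated with the residual.
        k = int(np.argmax(np.abs(A.conj().T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    x[support] = coef
    return x

rng = np.random.default_rng(2)
m, n = 64, 80  # hypothetical number of measurements and beam-domain grid size
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * m)
x_true = np.zeros(n, dtype=complex)
x_true[[5, 33, 61]] = [1.0 + 0.5j, -0.8j, 1.2]  # three dominant paths
y = A @ x_true
x_hat = omp(A, y, sparsity=3)
print(np.flatnonzero(np.abs(x_hat) > 1e-6))
```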

preprint2020arXiv

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and Transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented Transformer for speech recognition, named Conformer. Conformer significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test/test-other. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.

preprint2020arXiv

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNNs) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond it with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into convolution layers by adding squeeze-and-excitation modules. In addition, we propose a simple scaling method that scales the widths of ContextNet to achieve a good trade-off between computation and accuracy. We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with an LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets. This compares to the previous best published system, with 2.0%/4.6% with LM and 3.9%/11.3% with 20M parameters. The superiority of the proposed ContextNet model is also verified on a much larger internal dataset.
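Squeeze-and-excitation injects global context into a convolutional encoder by rescaling each channel with a gate computed from a summary of the entire sequence. A minimal NumPy sketch, assuming the original SE design (mean pooling over time, ReLU bottleneck, sigmoid gate); ContextNet's exact activation and placement may differ, and the weights below are random placeholders.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation over a feature sequence x of shape (T, C).

    w1: (C, C_r), b1: (C_r,), w2: (C_r, C), b2: (C,), where C_r < C is a
    bottleneck width. Returns x with every channel rescaled by a gate in (0, 1)
    computed from the whole sequence, i.e. from global context.
    """
    s = x.mean(axis=0)                           # squeeze: average over time -> (C,)
    h = np.maximum(0.0, s @ w1 + b1)             # bottleneck + ReLU -> (C_r,)
    gate = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # sigmoid gate -> (C,)
    return x * gate                              # broadcast over all T frames

rng = np.random.default_rng(3)
T, C, C_r = 50, 8, 2  # hypothetical sequence length, channels, bottleneck width
x = rng.normal(size=(T, C))
w1, b1 = rng.normal(size=(C, C_r)), np.zeros(C_r)
w2, b2 = rng.normal(size=(C_r, C)), np.zeros(C)
y = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape)  # (50, 8)
```

Because the gate depends on the mean over all frames, every output frame is influenced by the whole utterance, which a stack of small local convolutions alone cannot provide.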

preprint2020arXiv

Customized data-driven RANS closures for bi-fidelity LES-RANS optimization

Multi-fidelity optimization methods promise a high-fidelity optimum at a cost only slightly greater than that of a low-fidelity optimization. This promise is seldom achieved in practice, because it requires the low- and high-fidelity models to correlate well. In this article, we propose an efficient bi-fidelity shape optimization method for turbulent fluid-flow applications with Large-Eddy Simulation (LES) and Reynolds-averaged Navier-Stokes (RANS) as the high- and low-fidelity models within a hierarchical-Kriging surrogate modelling framework. Since the LES-RANS correlation is often poor, we use the full LES flow field at a single point in the design space to derive a custom-tailored RANS closure model that reproduces the LES at that point. This is achieved with machine-learning techniques, specifically sparse regression to obtain corrections to the turbulence anisotropy tensor and to the production of turbulence kinetic energy as functions of the RANS mean flow. The LES-RANS correlation is dramatically improved throughout the design space. We demonstrate the effectiveness and efficiency of our method in a proof-of-concept shape optimization of the well-known periodic-hill case. Standard RANS models perform poorly in this case, whereas our method converges to the LES optimum with only two LES samples.
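Sparse regression here means selecting a few terms from a library of candidate functions of the RANS mean flow. One common algorithm for this is sequentially thresholded least squares (STLSQ), sketched below; the abstract does not say which sparse solver the paper uses, and the one-variable library and target are hypothetical.

```python
import numpy as np

def stlsq(Theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares for sparse regression.

    Theta: (n_samples, n_candidates) library of candidate features.
    Returns coefficients in which entries below `threshold` are exactly zero.
    """
    xi, *_ = np.linalg.lstsq(Theta, y, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Re-fit only the surviving candidate terms.
        coef, *_ = np.linalg.lstsq(Theta[:, keep], y, rcond=None)
        xi[keep] = coef
    return xi

# Hypothetical 1-D example: q stands in for a mean-flow invariant, and the
# target correction happens to be 2 q - 0.5 q^3.
q = np.linspace(-1.0, 1.0, 50)
Theta = np.column_stack([np.ones_like(q), q, q ** 2, q ** 3])
target = 2.0 * q - 0.5 * q ** 3
print(stlsq(Theta, target, threshold=0.1))
```

The result is an explicit, compact algebraic expression, which is what makes such corrections easy to insert into a RANS solver as a closure.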

preprint2020arXiv

Deep Image Clustering with Category-Style Representation

Deep clustering, which adopts deep neural networks to obtain optimal representations for clustering, has been widely studied recently. In this paper, we propose a novel deep image clustering framework that learns a category-style latent representation in which the category information is disentangled from image style and can be directly used as the cluster assignment. To achieve this goal, mutual information maximization is applied to embed relevant information in the latent representation. Moreover, an augmentation-invariant loss is employed to disentangle the representation into a category part and a style part. Last but not least, a prior distribution is imposed on the latent representation to ensure that the elements of the category vector can be used as probabilities over clusters. Comprehensive experiments demonstrate that the proposed approach significantly outperforms state-of-the-art methods on five public datasets.

preprint2020arXiv

Deep Learning for Massive MIMO with 1-Bit ADCs: When More Antennas Need Fewer Pilots

This paper considers uplink massive MIMO systems with 1-bit analog-to-digital converters (ADCs) and develops a deep-learning-based channel estimation framework. In this framework, prior channel estimation observations and deep neural network models are leveraged to learn the non-trivial mapping from quantized received measurements to channels. To that end, we derive the sufficient length and structure of the pilot sequence that guarantee the existence of this mapping function. This leads to the interesting, and counter-intuitive, observation that when more antennas are employed by the massive MIMO base station, our proposed deep learning approach achieves better channel estimation performance for the same pilot sequence length. Equivalently, for the same channel estimation performance, fewer pilots are required when more antennas are employed. This observation is also analytically proved for some special channel models. Simulation results confirm our observations and show that more antennas lead to better channel estimation in terms of both the normalized mean squared error and the achievable signal-to-noise ratio per antenna.

preprint2020arXiv

Deep Reinforcement Learning for Intelligent Reflecting Surfaces: Towards Standalone Operation

The promising coverage and spectral efficiency gains of intelligent reflecting surfaces (IRSs) are attracting increasing interest. To realize these surfaces in practice, however, several challenges need to be addressed. One of the main challenges is how to configure the reflection coefficients on these passive surfaces without requiring massive channel estimation or beam training overhead. Earlier work suggested leveraging supervised learning tools to design the IRS reflection matrices. While this approach has the potential to reduce the beam training overhead, it requires collecting large datasets to train the neural network models. In this paper, we propose a novel deep reinforcement learning framework for predicting the IRS reflection matrices with minimal training overhead. Simulation results show that the proposed online learning framework can converge to the optimal rate that assumes perfect channel knowledge. This represents an important step towards realizing standalone IRS operation, where the surface configures itself without any control from the infrastructure.

preprint2020arXiv

Defense-PointNet: Protecting PointNet Against Adversarial Attacks

Despite remarkable performance across a broad range of tasks, neural networks have been shown to be vulnerable to adversarial attacks. Many works focus on adversarial attacks and defenses for 2D images, but few focus on 3D point clouds. In this paper, our goal is to enhance the adversarial robustness of PointNet, one of the most widely used models for 3D point clouds. We apply the fast gradient sign method (FGSM) to 3D point clouds and find that FGSM can be used to generate not only adversarial images but also adversarial point clouds. To minimize the vulnerability of PointNet to adversarial attacks, we propose Defense-PointNet. We compare our model with two baseline approaches and show that Defense-PointNet significantly improves the robustness of the network against adversarial samples.
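FGSM perturbs the input by a small step epsilon in the sign direction of the loss gradient; nothing about the update is specific to images, which is why it carries over to point coordinates. The sketch below uses a toy linear classifier on mean-pooled points as a stand-in for PointNet so that the gradient has a closed form. The classifier, epsilon, and data are hypothetical, and Defense-PointNet itself is not shown.

```python
import numpy as np

def loss_and_grad(points, w, b, label):
    """Logistic loss of a toy linear classifier on mean-pooled points.

    points: (N, 3) point cloud; label in {-1, +1}.
    Returns the loss and its gradient with respect to every point, shape (N, 3).
    """
    n = points.shape[0]
    s = w @ points.mean(axis=0) + b
    loss = np.log1p(np.exp(-label * s))
    dloss_ds = -label / (1.0 + np.exp(label * s))
    grad = np.tile(dloss_ds * w / n, (n, 1))
    return loss, grad

def fgsm_point_cloud(points, grad, epsilon):
    """FGSM: shift every coordinate by epsilon in the direction of sign(grad)."""
    return points + epsilon * np.sign(grad)

rng = np.random.default_rng(4)
cloud = rng.uniform(-1.0, 1.0, size=(1024, 3))  # hypothetical point cloud
w, b, label = np.array([0.5, -1.0, 2.0]), 0.1, 1  # hypothetical classifier
loss, grad = loss_and_grad(cloud, w, b, label)
adv = fgsm_point_cloud(cloud, grad, epsilon=0.01)
adv_loss, _ = loss_and_grad(adv, w, b, label)
print(loss, adv_loss)  # the adversarial loss is larger
```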

preprint2020arXiv

Determination of strong-phase parameters in $D\rightarrow K^0_{S,L}π^+π^-$

We report the most precise measurements to date of the strong-phase parameters between $D^0$ and $\bar{D}^0$ decays to $K^0_{S,L}π^+π^-$ using a sample of 2.93 fb$^{-1}$ of $e^+e^-$ annihilation data collected at a center-of-mass energy of 3.773 GeV with the BESIII detector at the BEPCII collider. Our results provide the key inputs for a binned model-independent determination of the Cabibbo-Kobayashi-Maskawa angle $γ/ϕ_3$ with $B$ decays. Using our results, the decay model sensitivity to the $γ/ϕ_3$ measurement is expected to be between 0.7$^{\circ}$ and 1.2$^{\circ}$, approximately a factor of three smaller than that achievable with previous measurements. The improved precision of this work ensures that measurements of $γ/ϕ_3$ will not be limited by knowledge of strong phases for the next decade. Furthermore, our results provide critical input for other flavor-physics investigations, including charm mixing, other measurements of $CP$ violation, and the measurement of strong-phase parameters for other $D$-decay modes.

preprint2020arXiv

Distant Transfer Learning via Deep Random Walk

Transfer learning, which improves learning performance in the target domain by leveraging useful knowledge from the source domain, often requires the two domains to be very close, which limits its application scope. Recently, distant transfer learning has been studied to transfer knowledge between two distant or even totally unrelated domains via auxiliary domains, usually unlabeled, that act as a bridge. This follows the spirit of human transitive inference, in which two completely unrelated concepts can be connected through gradual knowledge transfer. In this paper, we study distant transfer learning by proposing a DeEp Random Walk basEd distaNt Transfer (DERWENT) method. Unlike existing distant transfer learning models that implicitly identify the path of knowledge transfer between the source and target instances through auxiliary instances, the proposed DERWENT model explicitly learns such paths via the deep random walk technique. Specifically, based on sequences identified by the random walk technique on a data graph where source and target data have no direct edges, the proposed DERWENT model enforces adjacent data points in a sequence to be similar, requires the ending data point to be represented by the other data points in the same sequence, and considers weighted training losses of source data. Empirical studies on several benchmark datasets demonstrate that the proposed DERWENT algorithm yields state-of-the-art performance.
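DERWENT starts from random-walk sequences on a data graph in which source and target points are connected only through auxiliary points. Generating such sequences can be sketched as below. The graph, node names, and walk parameters are hypothetical, and the model's losses over the sequences (adjacent-point similarity, end-point representation, weighted source loss) are not shown.

```python
import random
from typing import Dict, List

def random_walks(adj: Dict[str, List[str]], walk_length: int,
                 walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """Uniform random walks on an undirected graph given as adjacency lists."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            # Stop early only if the current node has no neighbors.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical data graph: source (s*) and target (t*) nodes share no edge
# and are linked only through auxiliary (a*) nodes.
adj = {
    "s1": ["a1"], "s2": ["a1"],
    "a1": ["s1", "s2", "a2"],
    "a2": ["a1", "t1"],
    "t1": ["a2"],
}
walks = random_walks(adj, walk_length=6, walks_per_node=3)
bridging = [w for w in walks if w[0].startswith("s") and "t1" in w]
print(len(walks), bridging)
```

Any walk that starts at a source node and reaches "t1" is an explicit knowledge-transfer path through auxiliary data, which is the kind of sequence the method then trains on.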

preprint2020arXiv

Diversifying Seeds and Audience in Social Influence Maximization

Influence maximization (IM) has been extensively studied for better viral marketing. However, previous works put less emphasis on how evenly the audience is affected across different communities and how diversely the seed nodes are selected. In this paper, we incorporate audience diversity and seed diversity into the IM task. From the model perspective, to characterize both influence spread and diversity in our objective function, we adopt three commonly used utilities in economics (i.e., Perfect Substitutes, Perfect Complements, and Cobb-Douglas). We validate our choice of these three functions by showing their desirable properties. From the algorithmic perspective, we present various approximation strategies to maximize the utilities. For audience diversification, we propose a solution-dependent approximation algorithm to circumvent the hardness results. For seed diversification, we prove a $(1/e-\epsilon)$ approximation ratio based on non-monotone submodular maximization. Experimental results show that our framework outperforms other natural heuristics in both utility maximization and result diversification.
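The three economic utilities combine influence spread and diversity in different ways. Their standard textbook forms are sketched below. The weights `a`, `b` and exponent `alpha` are hypothetical, and the paper's exact parameterization and normalization of the two quantities may differ.

```python
def perfect_substitutes(spread: float, diversity: float,
                        a: float = 1.0, b: float = 1.0) -> float:
    """Spread and diversity trade off linearly."""
    return a * spread + b * diversity

def perfect_complements(spread: float, diversity: float,
                        a: float = 1.0, b: float = 1.0) -> float:
    """Only the scarcer (weighted) quantity counts."""
    return min(a * spread, b * diversity)

def cobb_douglas(spread: float, diversity: float, alpha: float = 0.5) -> float:
    """Diminishing returns in each factor; zero if either factor is zero."""
    return spread ** alpha * diversity ** (1.0 - alpha)

print(perfect_substitutes(100.0, 4.0))  # 104.0
print(perfect_complements(100.0, 4.0))  # 4.0
print(cobb_douglas(100.0, 4.0))         # 20.0
```

With unscaled inputs, the complements form is dominated by whichever quantity is numerically smaller, which illustrates why the relative scaling of spread and diversity matters when choosing among these objectives.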

preprint2020arXiv

Efficient Second-Order TreeCRF for Neural Dependency Parsing

In the deep learning (DL) era, parsing models have been extremely simplified with little loss in performance, thanks to the remarkable capability of multi-layer BiLSTMs in context representation. As the most popular graph-based dependency parser due to its high efficiency and performance, the biaffine parser directly scores single dependencies under the arc-factorization assumption and adopts a very simple local token-wise cross-entropy training loss. This paper presents, for the first time, a second-order TreeCRF extension to the biaffine parser. For a long time, the complexity and inefficiency of the inside-outside algorithm have hindered the popularity of TreeCRF. To address this issue, we propose an effective way to batchify the inside and Viterbi algorithms for direct large matrix operations on GPUs, and to avoid the complex outside algorithm via efficient back-propagation. Experiments and analysis on 27 datasets from 13 languages clearly show that techniques developed before the DL era, such as structural learning (global TreeCRF loss) and high-order modeling, are still useful and can further boost parsing performance over the state-of-the-art biaffine parser, especially for partially annotated training data. We release our code at https://github.com/yzhangcs/crfpar.

preprint2020arXiv

Electron interactions in strain-induced zero-energy flat band in twisted bilayer graphene near the magic angle

In the vicinity of the magic angle in twisted bilayer graphene (TBG), the two low-energy van Hove singularities (VHSs) become exceedingly narrow [1-10] and many exotic correlated states, such as superconductivity, ferromagnetism, and topological phases, are observed [11-16]. Heterostrain, which is almost unavoidable in TBG, can modify its single-particle band structure and lead to novel properties of TBG that have not been considered so far. Here, we show that heterostrain in TBG near the magic angle generates a new zero-energy flat band between the two VHSs. Doping the TBG to partially fill the zero-energy flat band, we observe a correlation-induced gap of about 10 meV that splits the flat band. Under perpendicular magnetic fields, we observe a large, linear response of the gap to the field, which we attribute to the emergence of large orbital magnetic moments in TBG when the valley degeneracy of the flat band is lifted by electron-electron interactions. The orbital magnetic moment per moiré supercell is measured to be about $15\,\mu_B$.

preprint2020arXiv

Fast and Accurate Neural CRF Constituency Parsing

Estimating probability distributions is one of the core issues in the NLP field. However, in both the deep learning (DL) and pre-DL eras, unlike the wide application of linear-chain CRFs in sequence labeling tasks, very few works have applied tree-structured CRFs to constituency parsing, mainly due to the complexity and inefficiency of the inside-outside algorithm. This work presents a fast and accurate neural CRF constituency parser. The key idea is to batchify the inside algorithm for loss computation by direct large tensor operations on GPU, while avoiding the outside algorithm for gradient computation via efficient back-propagation. We also propose a simple two-stage bracketing-then-labeling parsing approach to further improve efficiency. To improve parsing performance, inspired by recent progress in dependency parsing, we introduce a new scoring architecture based on boundary representation and biaffine attention, and a beneficial dropout strategy. Experiments on PTB, CTB5.1, and CTB7 show that our two-stage CRF parser achieves new state-of-the-art performance both without and with BERT, and can parse over 1,000 sentences per second. We release our code at https://github.com/yzhangcs/crfpar.
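The inside algorithm computes the CRF partition function by dynamic programming over spans. The scalar loop below is a minimal sketch for unlabeled binary trees with (hypothetical) span scores. The paper's contribution is batchifying these loops as GPU tensor operations and obtaining span marginals by back-propagating through log Z instead of running the outside algorithm; neither is done here.

```python
import math
from typing import Callable, Dict, List, Tuple

def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def inside_log_partition(span_score: Callable[[int, int], float], n: int) -> float:
    """log Z over all binary bracketings of an n-word sentence.

    span_score(i, j) scores the span covering words i..j-1; a tree's score is
    the sum of its span scores, and Z sums exp(score) over all trees.
    """
    beta: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        beta[i, i + 1] = span_score(i, i + 1)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            splits = [beta[i, k] + beta[k, j] for k in range(i + 1, j)]
            beta[i, j] = span_score(i, j) + logsumexp(splits)
    return beta[0, n]

# With all-zero scores every tree scores 0, so Z counts binary trees:
# 5 trees for 4 words (the Catalan number C_3).
print(round(math.exp(inside_log_partition(lambda i, j: 0.0, 4))))  # 5
```

The CRF training loss is then log Z minus the score of the gold tree, so this quantity is the only part that requires summing over all trees.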

preprint2020arXiv

Fisher Deep Domain Adaptation

Deep domain adaptation models learn a neural network in an unlabeled target domain by leveraging knowledge from a labeled source domain. This can be achieved by learning a domain-invariant feature space. Although the learned representations are separable in the source domain, in the target domain they usually have large variance, and samples with different class labels tend to overlap, which yields suboptimal adaptation performance. To fill this gap, a Fisher loss is proposed to learn discriminative representations that are within-class compact and between-class separable. Experimental results on two benchmark datasets show that the Fisher loss is a general and effective loss for deep domain adaptation. Noticeable improvements are obtained when it is used together with widely adopted transfer criteria, including MMD, CORAL, and domain adversarial loss. For example, an absolute improvement of 6.67% in mean accuracy is attained when the Fisher loss is used together with the domain adversarial loss on the Office-Home dataset.
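A loss that makes representations within-class compact and between-class separable can be written with the traces of Fisher's within-class and between-class scatter matrices. The sketch below uses that difference form with a hypothetical trade-off weight `lam`; the paper's exact formulation, and how it is combined with MMD, CORAL, or adversarial losses, may differ.

```python
import numpy as np

def fisher_loss(features, labels, lam=1.0):
    """Fisher-style loss: within-class scatter minus lam * between-class scatter.

    features: (N, D) array; labels: (N,) integer class labels. Both scatters
    are traces of the usual scatter matrices, averaged over the N samples.
    Lower values mean more compact classes that sit farther apart.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    mu = features.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mc = fc.mean(axis=0)
        within += ((fc - mc) ** 2).sum()
        between += len(fc) * ((mc - mu) ** 2).sum()
    return (within - lam * between) / len(features)

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(fisher_loss(feats, [0, 0, 1, 1]))  # compact, well-separated classes
print(fisher_loss(feats, [0, 1, 0, 1]))  # overlapping classes: much larger
```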

preprint2020arXiv

Future Physics Programme of BESIII

There has recently been a dramatic renewal of interest in the subjects of hadron spectroscopy and charm physics. This renaissance has been driven in part by the discovery of a plethora of charmonium-like $XYZ$ states at BESIII and $B$ factories, and the observation of an intriguing proton-antiproton threshold enhancement and the possibly related $X(1835)$ meson state at BESIII, as well as the threshold measurements of charm mesons and charm baryons. We present a detailed survey of the important topics in tau-charm physics and hadron physics that can be further explored at BESIII over the remaining lifetime of BEPCII operation. This survey will help in the optimization of the data-taking plan over the coming years, and provides physics motivation for the possible upgrade of BEPCII to higher luminosity.

preprint2020arXiv

Generation of quantum entangled states of multiple groups of qubits distributed in multiple cavities

Provided that the cavities are initially in a Greenberger-Horne-Zeilinger (GHZ) entangled state, we show that GHZ states of N groups of qubits distributed in N cavities can be created via a 3-step operation. The GHZ states of the N groups of qubits are generated using N groups of qutrits placed in the N cavities. Here, "qutrit" refers to a three-level quantum system in which the two lowest levels represent a qubit while the third level acts as an intermediate state necessary for GHZ state creation. This proposal does not depend on the architecture of the cavity-based quantum network or on the way the cavities are coupled. The operation time is independent of the number of qubits. The GHZ states are prepared deterministically because no measurement of the states of the qutrits or cavities is needed. In addition, the third energy level of the qutrits is only virtually excited during the entire operation, so decoherence from higher energy levels is greatly suppressed. This proposal is quite general and can in principle be applied to create GHZ states of many qubits using different types of physical qutrits (e.g., atoms, quantum dots, NV centers, various superconducting qutrits, etc.) distributed in multiple cavities. As a specific example, we further discuss the experimental feasibility of preparing a GHZ state of four groups of transmon qubits (each group consisting of three qubits) distributed in four one-dimensional transmission line resonators arranged in an array.

preprint2020arXiv

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only and aims to mine a set of representative terms for each category from a text corpus, to help a user comprehend the topics he/she is interested in. We develop a novel joint tree and text embedding method along with a principled optimization procedure that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.

preprint2020arXiv

Improved Trainable Calibration Method for Neural Networks on Medical Imaging Classification

Recent works have shown that deep neural networks can achieve super-human performance in a wide range of image classification tasks in the medical imaging domain. However, these works have primarily focused on classification accuracy, ignoring the important role of uncertainty quantification. Empirically, neural networks are often miscalibrated and overconfident in their predictions. This miscalibration could be problematic in any automatic decision-making system, but we focus on the medical field in which neural network miscalibration has the potential to lead to significant treatment errors. We propose a novel calibration approach that maintains the overall classification accuracy while significantly improving model calibration. The proposed approach is based on expected calibration error, which is a common metric for quantifying miscalibration. Our approach can be easily integrated into any classification task as an auxiliary loss term, thus not requiring an explicit training round for calibration. We show that our approach reduces calibration error significantly across various architectures and datasets.
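Since the proposed auxiliary loss is built on expected calibration error (ECE), a minimal Python sketch of the standard binned ECE metric may help: predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. This is only the evaluation metric, not the authors' differentiable loss; the function name and the default of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Four predictions at 90% confidence, but only three are right: ECE = |0.75 - 0.9| = 0.15.
print(round(expected_calibration_error([0.9] * 4, [True, True, True, False]), 4))  # 0.15
```

An overconfident model, as described in the abstract, shows up as bins whose mean confidence exceeds their accuracy; a well-calibrated model drives every per-bin gap, and hence ECE, toward zero.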

preprint2020arXiv

Inclusive charged and neutral particle multiplicity distributions in $χ_{cJ}$ and $J/ψ$ decays

Using a sample of 106 million $ψ(3686)$ decays, $ψ(3686) \to γχ_{cJ} (J = 0, 1, 2)$ and $ψ(3686) \to γχ_{cJ}, χ_{cJ} \to γJ/ψ$ $(J = 1, 2)$ events are utilized to study inclusive $χ_{cJ} \to$ anything, $χ_{cJ} \to$ hadrons, and $J/ψ\to$ anything distributions, including distributions of the number of charged tracks, electromagnetic calorimeter showers, and $π^0$s, and to compare them with distributions obtained from the BESIII Monte Carlo simulation. Information from each Monte Carlo simulated decay event is used to construct matrices connecting the detected distributions to the input predetection "produced" distributions. Assuming these matrices also apply to data, they are used to predict the analogous produced distributions of the decay events. Using these, the charged particle multiplicities are compared with results from MARK I. Further, comparison of the distributions of the number of photons in data with those in Monte Carlo simulation indicates that G-parity conservation should be taken into consideration in the simulation.

preprint2020arXiv

Intervalley quantum interference and measurement of Berry phase in bilayer graphene

Chiral quasiparticles in Bernal-stacked bilayer graphene have valley-contrasting Berry phases of 2π. This nontrivial topological structure, associated with the pseudospin winding along a closed Fermi surface, is responsible for various novel electronic properties, such as anti-Klein tunneling, the unconventional quantum Hall effect, and the valley Hall effect [1-6]. Here we show that the quantum interference due to intervalley scattering induced by atomic defects/impurities provides further insights into the topological nature of bilayer graphene. Chiral quasiparticles scattered between distinct valleys with opposite chirality undergo a rotation of pseudospin that results in Friedel oscillations with wavefront dislocations. The number of dislocations reflects the pseudospin texture and hence can be used to measure the Berry phase [7]. As demonstrated both experimentally and theoretically, the Friedel oscillation, depending on which sublattice the atomic defect/impurity occupies, can exhibit N = 4, 2, or 0 additional wavefronts, characterizing the 2π Berry phase of bilayer graphene. Our results not only provide a comprehensive study of intervalley quantum interference in bilayer graphene, but also shed light on pseudospin physics.

preprint2020arXiv

Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?

In the pre-deep-learning era, part-of-speech (POS) tags were considered indispensable ingredients for feature engineering in dependency parsing, and quite a few works focused on joint tagging and parsing models to avoid error propagation. In contrast, recent studies suggest that POS tagging becomes much less important or even useless for neural parsing, especially when character-based word representations are used. Yet this issue has not been investigated sufficiently, either empirically or linguistically. To answer this question, we design and compare three typical multi-task learning frameworks, i.e., Share-Loose, Share-Tight, and Stack, for joint tagging and parsing based on the state-of-the-art biaffine parser. Considering that it is much cheaper to annotate POS tags than parse trees, we also investigate the utilization of large-scale heterogeneous POS tag data. We conduct experiments on both English and Chinese datasets, and the results clearly show that POS tagging (both homogeneous and heterogeneous) can still significantly improve parsing performance when using the Stack joint framework. We conduct a detailed analysis and gain more insights from the linguistic perspective.

preprint2020arXiv

Joint 2D-3D Breast Cancer Classification

Breast cancer is the malignant tumor that causes the highest number of cancer deaths in females. Digital mammograms (DM or 2D mammogram) and digital breast tomosynthesis (DBT or 3D mammogram) are the two types of mammography imagery that are used in clinical practice for breast cancer detection and diagnosis. Radiologists usually read both imaging modalities in combination; however, existing computer-aided diagnosis tools are designed using only one imaging modality. Inspired by clinical practice, we propose an innovative convolutional neural network (CNN) architecture for breast cancer classification, which uses both 2D and 3D mammograms simultaneously. Our experiments show that the proposed method significantly improves the performance of breast cancer classification. By ensembling three CNN classifiers, the proposed model achieves an AUC of 0.97, which is 34.72% higher than that of methods using only one imaging modality.

preprint2020arXiv

Learning an Adaptive Model for Extreme Low-light Raw Image Processing

Low-light images suffer from severe noise and low illumination. Current deep learning models that are trained with real-world images have excellent noise reduction, but a ratio parameter must be chosen manually to complete the enhancement pipeline. In this work, we propose an adaptive low-light raw image enhancement network to avoid parameter-handcrafting and to improve image quality. The proposed method can be divided into two sub-models: Brightness Prediction (BP) and Exposure Shifting (ES). The former is designed to control the brightness of the resulting image by estimating a guideline exposure time $t_1$. The latter learns to approximate an exposure-shifting operator $ES$, converting a low-light image with real exposure time $t_0$ to a noise-free image with guideline exposure time $t_1$. Additionally, structural similarity (SSIM) loss and Image Enhancement Vector (IEV) are introduced to promote image quality, and a new Campus Image Dataset (CID) is proposed to overcome the limitations of the existing datasets and to supervise the training of the proposed model. Using the proposed model, we can achieve high-quality low-light image enhancement from a single raw image. In quantitative tests, it is shown that the proposed method has the lowest Noise Level Estimation (NLE) score compared with the state-of-the-art low-light algorithms, suggesting a superior denoising performance. Furthermore, those tests illustrate that the proposed method is able to adaptively control the global image brightness according to the content of the image scene. Lastly, the potential application in video processing is briefly discussed.

preprint2020arXiv

Learning Beam Codebooks with Neural Networks: Towards Environment-Aware mmWave MIMO

Scaling the number of antennas up is a key characteristic of current and future wireless communication systems. The hardware cost and power consumption, however, motivate large-scale MIMO systems, especially at millimeter wave (mmWave) bands, to rely on analog-only or hybrid analog/digital transceiver architectures. With these architectures, mmWave base stations normally use pre-defined beamforming codebooks for both initial access and data transmissions. Current beam codebooks, however, generally adopt single-lobe narrow beams and scan the entire angular space. This leads to high beam training overhead and loss in the achievable beamforming gains. In this paper, we propose a new machine learning framework for learning beamforming codebooks in hardware-constrained large-scale MIMO systems. More specifically, we develop a neural network architecture that accounts for the hardware constraints and learns beam codebooks that adapt to the surrounding environment and the user locations. Simulation results highlight the capability of the proposed solution in learning multi-lobe beams and reducing the codebook size, which leads to noticeable gains compared to classical codebook design approaches.

preprint2020arXiv

Learning Event-Based Motion Deblurring

Recovering a sharp video sequence from a motion-blurred image is highly ill-posed due to the significant loss of motion information in the blurring process. For event-based cameras, however, fast motion can be captured as events at a high temporal rate, raising new opportunities for exploring effective solutions. In this paper, we start from a sequential formulation of event-based motion deblurring, then show how its optimization can be unfolded with a novel end-to-end deep architecture. The proposed architecture is a convolutional recurrent neural network that integrates visual and temporal knowledge at both global and local scales in a principled manner. To further improve the reconstruction, we propose a differentiable directional event filtering module to effectively extract rich boundary priors from the stream of events. We conduct extensive experiments on the synthetic GoPro dataset and a large newly introduced dataset captured by a DAVIS240C camera. The proposed approach achieves state-of-the-art reconstruction quality and generalizes better to real-world motion blur.

preprint2020arXiv

Location Information Aided Multiple Intelligent Reflecting Surface Systems

This paper proposes a novel location-information-aided multiple intelligent reflecting surface (IRS) system. Assuming imperfect user location information, the effective angles from the IRS to the users are estimated and then used to design the transmit beam and the IRS beam. Furthermore, closed-form expressions for the achievable rate are derived. The analytical findings indicate that the achievable rate can be improved by increasing the number of base station (BS) antennas or reflecting elements. Specifically, a power gain of order $N M^2$ is achieved, where $N$ is the number of BS antennas and $M$ is the number of reflecting elements. Moreover, with a large number of reflecting elements, the individual signal-to-interference-plus-noise ratio (SINR) is proportional to $M$, but becomes proportional to $M^2$ as the non-line-of-sight (NLOS) paths vanish. It is also shown that high location uncertainty significantly degrades the achievable rate. Besides, the IRSs should be deployed in distinct directions (relative to the BS) and far away from each other to reduce the interference from multiple IRSs. Finally, an optimal power allocation scheme is proposed to improve the system performance.

preprint2020arXiv

M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients

Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and have obtained promising results, several issues remain. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scans (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities while also preserving the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.
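The abstract names bilinear pooling as the fusion step but gives no details, so the following is a minimal Python sketch of generic bilinear pooling, not M2Net's exact layer: two feature vectors (for example, from two modality-specific branches) are fused through their flattened outer product, so every pairwise feature interaction appears in the output. The signed square root and L2 normalization are a common post-processing choice assumed here, and `bilinear_pool` is a hypothetical name.

```python
import math


def bilinear_pool(x, y, normalize=True):
    """Fuse two feature vectors via their flattened outer product x y^T."""
    fused = [xi * yj for xi in x for yj in y]  # every pairwise product x_i * y_j
    if normalize:
        # Signed square root followed by L2 normalisation (an assumed, common choice).
        fused = [math.copysign(math.sqrt(abs(v)), v) for v in fused]
        norm = math.sqrt(sum(v * v for v in fused))
        if norm > 0:
            fused = [v / norm for v in fused]
    return fused


print(bilinear_pool([1.0, 2.0], [3.0, 4.0], normalize=False))  # [3.0, 4.0, 6.0, 8.0]
```

The output length is the product of the two input lengths, which is what lets the fused vector capture cross-modal correlations, and also why practical models often compress it.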

preprint2020arXiv

MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction

Click-through rate (CTR) prediction is a critical task in online advertising systems. Existing works mainly address the single-domain CTR prediction problem and model aspects such as feature interaction, user behavior history and contextual information. Nevertheless, ads are usually displayed with natural content, which offers an opportunity for cross-domain CTR prediction. In this paper, we address this problem and leverage auxiliary data from a source domain to improve the CTR prediction performance of a target domain. Our study is based on UC Toutiao (a news feed service integrated with the UC Browser App, serving hundreds of millions of users daily), where the source domain is the news and the target domain is the ad. In order to effectively leverage news data for predicting CTRs of ads, we propose the Mixed Interest Network (MiNet), which jointly models three types of user interest: 1) long-term interest across domains, 2) short-term interest from the source domain and 3) short-term interest in the target domain. MiNet contains two levels of attention, where the item-level attention can adaptively distill useful information from clicked news / ads and the interest-level attention can adaptively fuse different interest representations. Offline experiments show that MiNet outperforms several state-of-the-art methods for CTR prediction. We have deployed MiNet in UC Toutiao and the A/B test results show that the online CTR is also improved substantially. MiNet now serves the main ad traffic in UC Toutiao.

preprint2020arXiv

Model-guided Multi-path Knowledge Aggregation for Aerial Saliency Prediction

As an emerging vision platform, a drone can look from many unusual viewpoints, which brings many new challenges to the classic vision task of video saliency prediction. To investigate these challenges, this paper proposes a large-scale video dataset for aerial saliency prediction, which consists of ground-truth salient object regions of 1,000 aerial videos, annotated by 24 subjects. To the best of our knowledge, it is the first large-scale video dataset that focuses on visual saliency prediction on drones. Based on this dataset, we propose a Model-guided Multi-path Network (MM-Net) that serves as a baseline model for aerial video saliency prediction. Inspired by the annotation process in eye-tracking experiments, MM-Net adopts multiple information paths, each of which is initialized under the guidance of a classic saliency model. After that, the visual saliency knowledge encoded in the most representative paths is selected and aggregated to improve the capability of MM-Net in predicting spatial saliency in aerial scenarios. Finally, these spatial predictions are adaptively combined with the temporal saliency predictions via a spatiotemporal optimization algorithm. Experimental results show that MM-Net outperforms ten state-of-the-art models in predicting aerial video saliency.

preprint2020arXiv

Model-independent determination of the relative strong-phase difference between $D^0$ and $\bar{D}^0\rightarrow K^0_{S,L}π^+π^-$ and its impact on the measurement of the CKM angle $γ/ϕ_3$

Crucial inputs for a variety of $CP$-violation studies can be determined through the analysis of pairs of quantum-entangled neutral $D$ mesons, which are produced in the decay of the $ψ(3770)$ resonance. The relative strong-phase parameters between $D^0$ and $\bar{D}^0$ in the decays $D^0\rightarrow K^0_{S,L}π^+π^-$ are studied using 2.93 $\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data delivered by the BEPCII collider and collected by the BESIII detector at a center-of-mass energy of 3.773 GeV. Results are presented in regions of the phase space of the decay. These are the most precise measurements to date of the strong-phase parameters in $D \to K^0_{S,L}π^+π^-$ decays. Using these parameters, the associated uncertainty on the Cabibbo-Kobayashi-Maskawa angle $γ/ϕ_3$ is expected to be between $0.7^\circ$ and $1.2^\circ$, for an analysis using the decay $B^{\pm}\rightarrow DK^{\pm}$, $D\rightarrow K^0_Sπ^+π^-$, where $D$ represents a superposition of $D^0$ and $\bar{D}^0$ states. This is a factor of three smaller than that achievable with previous measurements. Furthermore, these results provide valuable input for charm-mixing studies, other measurements of $CP$ violation, and the measurement of strong-phase parameters for other $D$-decay modes.

preprint2020arXiv

Momentum Contrastive Learning for Few-Shot COVID-19 Diagnosis from Chest CT Images

The current pandemic, caused by the outbreak of a novel coronavirus (COVID-19) in December 2019, has led to a global emergency that has significantly impacted economies, healthcare systems and personal wellbeing all around the world. Controlling the rapidly evolving disease requires highly sensitive and specific diagnostics. While real-time RT-PCR is the most commonly used test, it can take up to 8 hours and requires significant effort from healthcare professionals. As such, there is a critical need for a quick and automatic diagnostic system. Diagnosis from chest CT images is a promising direction. However, current studies are limited by the lack of sufficient training samples, as acquiring annotated CT images is time-consuming. To this end, we propose a new deep learning algorithm for the automated diagnosis of COVID-19, which only requires a few samples for training. Specifically, we use contrastive learning to train an encoder that can capture expressive feature representations on large, publicly available lung datasets, and adopt a prototypical network for classification. We validate the efficacy of the proposed model in comparison with other competing methods on two publicly available and annotated COVID-19 CT datasets. Our results demonstrate the superior performance of our model for the accurate diagnosis of COVID-19 based on chest CT images.
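The classification step relies on a prototypical network. A minimal Python sketch of its inference rule: each class prototype is the mean of that class's support embeddings, and a query is assigned to the class with the nearest prototype in squared Euclidean distance. The 2-D vectors below are hypothetical stand-ins for outputs of the contrastively trained encoder, and the function names are illustrative only.

```python
from collections import defaultdict


def class_prototypes(support):
    """support: list of (embedding, label) pairs -> {label: mean embedding}."""
    sums, counts = {}, defaultdict(int)
    for emb, label in support:
        acc = sums.setdefault(label, [0.0] * len(emb))
        sums[label] = [a + e for a, e in zip(acc, emb)]
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}


def classify(query, prototypes):
    """Assign the label whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


# Toy 2-D "embeddings" standing in for encoder outputs (hypothetical values).
support = [([1.0, 0.0], "covid"), ([0.8, 0.2], "covid"),
           ([0.0, 1.0], "normal"), ([0.1, 0.9], "normal")]
protos = class_prototypes(support)
print(classify([0.7, 0.3], protos))  # covid
```

Because a prototype is just an average, only a handful of labeled samples per class are needed once the encoder is trained, which is what makes the approach suitable for the few-shot setting.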

preprint2020arXiv

Multiple Structural Priors Guided Self Attention Network for Language Understanding

Self-attention networks (SANs) have been widely utilized in recent NLP studies. Unlike CNNs or RNNs, standard SANs are usually position-independent, and thus are incapable of capturing the structural priors between sequences of words. Existing studies commonly apply one single mask strategy on SANs for incorporating structural priors while failing at modeling more abundant structural information of texts. In this paper, we aim at introducing multiple types of structural priors into SAN models, proposing the Multiple Structural Priors Guided Self Attention Network (MS-SAN) that transforms different structural priors into different attention heads by using a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors, including the sequential order and the relative position of words. For the purpose of capturing the latent hierarchical structure of the texts, we extract this information not only from the word contexts but also from the dependency syntax trees. Experimental results on two tasks show that MS-SAN achieves significant improvements over other strong baselines.

preprint2020arXiv

Neural Networks Based Beam Codebooks: Learning mmWave Massive MIMO Beams that Adapt to Deployment and Hardware

Millimeter wave (mmWave) and massive MIMO systems are intrinsic components of 5G and beyond. These systems rely on using beamforming codebooks for both initial access and data transmission. Current beam codebooks, however, generally consist of a large number of narrow beams that scan all possible directions, even if these directions are never used. This leads to very large training overhead. Further, these codebooks do not normally account for the hardware impairments or the possible non-uniform array geometries, and their calibration is an expensive process. To overcome these limitations, this paper develops an efficient online machine learning framework that learns how to adapt the codebook beam patterns to the specific deployment, surrounding environment, user distribution, and hardware characteristics. This is done by designing a novel complex-valued neural network architecture in which the neuron weights directly model the beamforming weights of the analog phase shifters, accounting for key hardware constraints such as constant modulus and quantized angles. This model learns the codebook beams through online and self-supervised training, avoiding the need for explicit channel state information. This respects the practical situations where the channel is either unavailable, imperfect, or hard to obtain, especially in the presence of hardware impairments. Simulation results highlight the capability of the proposed solution in learning environment- and hardware-aware beam codebooks, which can significantly reduce the training overhead, enhance the achievable data rates, and improve the robustness against possible hardware impairments.
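To illustrate the two hardware constraints named above (constant modulus and quantized angles), here is a minimal Python sketch of a single analog beam, not the paper's learning framework: every weight has magnitude 1/√N, each phase is rounded to one of 2^B phase-shifter settings, and the gain toward an angle θ is |a(θ)^H w|². The uniform linear array with half-wavelength spacing, the 3-bit resolution, and all function names are assumptions.

```python
import cmath
import math


def array_response(n_antennas, theta):
    """ULA steering vector, assuming half-wavelength element spacing."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_antennas)]


def quantize_phase(phase, bits):
    """Round a phase to the nearest of 2**bits uniformly spaced phase-shifter settings."""
    step = 2 * math.pi / (2 ** bits)
    return round(phase / step) * step


def steering_beam(n_antennas, theta, bits):
    """Constant-modulus beam: every weight has magnitude 1/sqrt(N); only phases vary."""
    return [cmath.exp(1j * quantize_phase(math.pi * n * math.sin(theta), bits))
            / math.sqrt(n_antennas) for n in range(n_antennas)]


def beam_gain(weights, response):
    """Beamforming gain |a^H w|^2."""
    return abs(sum(a.conjugate() * w for a, w in zip(response, weights))) ** 2


n, theta = 16, 0.3
gain = beam_gain(steering_beam(n, theta, bits=3), array_response(n, theta))
print(f"{gain:.2f} of an ideal {n}")  # at most 16; 3-bit phase quantization costs some gain
```

With B-bit phases the per-element phase error is at most π/2^B, so the gain toward the steered direction stays at or above N·cos²(π/2^B) (about 0.85N for 3 bits). A learned codebook, as in the abstract, instead chooses these constrained weights to match the actual deployment and hardware.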

preprint2020arXiv

Nucleons pair shell model in M-scheme

The nucleon pair shell model (NPSM) is cast into the so-called M-scheme for cases both with and without isospin symmetry. Odd and even systems are treated on the same footing. The uncoupled commutators for nucleon pairs, which are suitable for the M-scheme, are given. Explicit formulas for the matrix elements of the overlap, one-body operators, and two-body operators in the M-scheme are obtained. It is found that the CPU time used in calculating the matrix elements in the M-scheme is much shorter than that in the J-scheme of the NPSM.

preprint2020arXiv

Observation of a cross-section enhancement near mass threshold in $e^{+}e^{-}\rightarrowΛ\barΛ$

The process $e^{+}e^{-}\rightarrowΛ\barΛ$ is studied using data samples at $\sqrt{s}=2.2324$, 2.400, 2.800 and 3.080 GeV collected with the BESIII detector operating at the BEPCII collider. The Born cross section is measured at $\sqrt{s}$=2.2324 GeV, which is 1.0 MeV above the $Λ\barΛ$ mass threshold, to be $305\pm45^{+66}_{-36}$ pb, where the first uncertainty is statistical and the second systematic. The substantial cross section near threshold is significantly larger than that expected from theory, which predicts the cross section to vanish at threshold. The Born cross sections at $\sqrt{s}$=2.400, 2.800 and 3.080 GeV are measured and found to be consistent with previous experimental results, but with improved precision. Finally, the corresponding effective electromagnetic form factors of $Λ$ are deduced.

preprint2020arXiv

Observation of a structure in $e^+e^- \to ϕη^{\prime}$ at $\sqrt{s}$ from 2.05 to 3.08 GeV

The process $e^{+}e^{-} \to ϕη^{\prime}$ has been studied in detail for the first time using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies from 2.05 to 3.08 GeV. A resonance with quantum numbers $J^{PC}=1^{--}$ is observed with mass $M$ = (2177.5 $\pm$ 4.8 (stat) $\pm$ 19.5 (syst)) MeV/$c^2$ and width $Γ$ = (149.0 $\pm$ 15.6 (stat) $\pm$ 8.9 (syst)) MeV with a statistical significance larger than 10$σ$. If the observed structure is identified with the $ϕ(2170)$, the ratio of the partial widths to $ϕη$ (measured by BABAR) and to $ϕη^{\prime}$ (measured by BESIII) is $(\mathcal{B}^{R}_{ϕη}Γ^{R}_{ee})/(\mathcal{B}^{R}_{ϕη^{\prime}}Γ^{R}_{ee})$ = 0.23 $\pm$ 0.10 (stat) $\pm$ 0.18 (syst), which is smaller than the prediction of the $s\bar{s}g$ hybrid models by several orders of magnitude.

preprint2020arXiv

Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks

Deep learning based methods have been dominating text recognition tasks in various and multilingual scenarios. Offline handwritten Chinese text recognition (HCTR) is one of the most challenging of these tasks because it involves thousands of characters, varied writing styles, and a complex data collection process. Recently, recurrence-free architectures for text recognition have become competitive owing to their high parallelism and comparable results. In this paper, we build models using only convolutional neural networks and use CTC as the loss function. To reduce overfitting, we apply dropout after each max-pooling layer, with an extremely high rate on the last one before the linear layer. The CASIA-HWDB database is selected to tune and evaluate the proposed models. Using the existing text samples as templates, we randomly choose isolated character samples to synthesize more text samples for training. We finally achieve a 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
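Two pieces of the pipeline above can be sketched in a few lines of Python: greedy (best-path) CTC decoding, which collapses repeated frame labels and then drops blanks, and CER computed as edit distance normalized by reference length. The abstract does not state which CTC decoder or exact CER protocol was used, so both are assumptions about the usual definitions; the blank index 0 and the function names are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated frame labels, then drop blanks (best-path CTC decoding)."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def edit_distance(hyp, ref):
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev_row = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        row = [i]
        for j, r in enumerate(ref, start=1):
            row.append(min(prev_row[j] + 1,              # delete h
                           row[j - 1] + 1,               # insert r
                           prev_row[j - 1] + (h != r)))  # substitute (or match)
        prev_row = row
    return prev_row[-1]


def character_error_rate(hyp, ref):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(hyp, ref) / len(ref)


print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
print(character_error_rate("汉子识别", "汉字识别"))   # 0.25
```

The blank between the two runs of label 3 is what lets CTC emit the same character twice in a row; without it, the repeats would be merged into one.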

preprint2020arXiv

Online Explanation Generation for Human-Robot Teaming

As AI becomes an integral part of our lives, the development of explainable AI, embodied in the decision-making process of an AI or robotic agent, becomes imperative. For a robotic teammate, the ability to generate explanations to justify its behavior is one of the key requirements of explainable agency. Prior work on explanation generation has focused on supporting the rationale behind the robot's decision or behavior. These approaches, however, fail to consider the mental demand of understanding the received explanation. In other words, the human teammate is expected to understand an explanation no matter how much information is presented. In this work, we argue that explanations, especially those of a complex nature, should be made in an online fashion during execution, which helps spread out the information to be explained and thus reduces the mental workload of humans in highly cognitively demanding tasks. However, a challenge here is that the different parts of an explanation may depend on each other, which must be taken into account when generating online explanations. To this end, a general formulation of online explanation generation is presented with three variations satisfying different "online" properties. The new explanation generation methods are based on a model reconciliation setting introduced in our prior work. We evaluated our methods both with human subjects in a simulated rover domain, using the NASA Task Load Index (TLX), and synthetically with ten different problems across two standard IPC domains. Results strongly suggest that our methods generate explanations that are perceived as less cognitively demanding and are much preferred over the baselines, and that the methods are computationally efficient.

preprint2020arXiv

Partial wave analysis of $ψ(3686)\rightarrow K^{+}K^{-}η$

Using a sample of $(448.1\pm2.9)\times10^6$ $ψ(3686)$ events collected with the BESIII detector, we perform the first partial wave analysis of $ψ(3686)\rightarrow K^+K^-η$. In addition to the well-established states $ϕ(1020)$, $ϕ(1680)$, and $K_3^*(1780)$, contributions from $X(1750)$, $ρ(2150)$, $ρ_3(2250)$, and $K^*_2(1980)$ are also observed. The $X(1750)$ state is determined to be a $1^{--}$ resonance. The simultaneous observation of the $ϕ(1680)$ and $X(1750)$ indicates that the $X(1750)$, previously observed in photoproduction, is distinct from the $ϕ(1680)$. The masses, widths, and branching fractions of $ψ(3686)\rightarrow K^+K^-η$ and the intermediate resonances are also measured.

preprint2020arXiv

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may only cover a subset of them. Instead of relying on fully-typed NER datasets, many efforts have been made to leverage multiple partially-typed ones for training and allow the resulting model to cover a full type set. However, there is neither a guarantee on the quality of the integrated datasets nor guidance on the design of training algorithms. Here, we conduct a systematic analysis and comparison between partially-typed and fully-typed NER datasets, both theoretically and empirically. First, we derive a bound establishing that models trained with partially-typed annotations can reach performance similar to those trained with fully-typed annotations, which also provides guidance on algorithm design. Moreover, we conduct controlled experiments, which show that partially-typed datasets lead to performance similar to that of a model trained with the same amount of fully-typed annotations.

preprint2020arXiv

Probing the $L_μ-L_τ$ gauge boson at electron colliders

We investigate the minimal $U(1)_{L_μ-L_τ}$ model with extra heavy vector-like leptons or charged scalars. We study the kinetic mixing between the $U(1)_{L_μ-L_τ}$ gauge boson $Z^\prime$ and the standard model photon, which is absent at tree level and arises at one loop due to $μ$, $τ$, and the new heavy charged leptons or scalars, and show that it exhibits interesting behavior. This mixing can provide visible signatures of the new heavy particles. We propose to search for $Z^\prime$ at electron collider experiments, such as Belle II, BESIII, and the future Super Tau Charm Factory (STCF), using the monophoton final state. The parameter space of $Z^\prime$ is probed and scanned in terms of its gauge coupling constant $g_{Z^\prime}$ and mass $m_{Z^\prime}$. We find that electron colliders are sensitive to previously unexplored parameter space for a $Z^\prime$ with MeV-GeV mass. Future STCF experiments with $\sqrt s=2-7$ GeV and a luminosity of 30 ab$^{-1}$ can exclude the region favored by the anomalous muon magnetic moment when $m_{Z^\prime}<5$ GeV. For $m_{Z^\prime} < 2m_μ$, $g_{Z^\prime}$ can be probed down to $4.2\times 10^{-5}$ at a 2 GeV STCF.

preprint2020arXiv

Quantum liquid from strange frustration in the trimer magnet Ba4Ir3O10

Quantum spin systems such as magnetic insulators usually show classical magnetic order, but such classical states can give way to quantum liquids with exotic entanglement through two known mechanisms of frustration: geometric frustration in lattices with triangle motifs, and spin-orbit-coupling frustration in the exactly solvable quantum liquid of Kitaev's honeycomb lattice. Here we present the experimental observation of a new kind of frustrated quantum liquid arising in an unlikely place: the magnetic insulator Ba4Ir3O10, where Ir3O12 trimers form an unfrustrated square lattice. Experimentally, we find a quantum liquid state persisting down to 0.2 K that is stabilized by strong antiferromagnetic interaction with a Curie-Weiss temperature of −766 K. The astonishing frustration parameter of 3800 exceeds that of any iridate known thus far. Heat capacity and thermal conductivity are both linear in temperature at low temperatures, a familiar feature in metals, but here, in an insulator, it points to an exotic quantum liquid state. A mere 2% Sr substitution for Ba produces long-range order at 130 K and destroys the linear-T features. Although the Ir⁴⁺ (5d⁵) ions in Ba4Ir3O10 appear to form Ir3O12 trimers of face-sharing IrO6 octahedra, we propose that intra-trimer exchange is reduced and the lattice recombines into an array of coupled 1D chains with additional spins. An extreme limit of decoupled 1D chains can explain most, but not all, of the striking experimental observations, indicating that the inter-chain coupling plays an important role in the novel frustration mechanism leading to this quantum liquid.
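The frustration parameter is conventionally defined as $f = |θ_{\mathrm{CW}}|/T_N$. Since no ordering is seen down to 0.2 K, the quoted value is presumably obtained by using that lowest measured temperature in place of $T_N$ (an assumption about how the number was computed), which reproduces it:

$$f = \frac{|θ_{\mathrm{CW}}|}{T_{\min}} = \frac{766\ \mathrm{K}}{0.2\ \mathrm{K}} = 3830 \approx 3.8 \times 10^{3}$$

Because any ordering temperature would have to lie below 0.2 K, this value is a lower bound on $f$.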

preprint2020arXiv

Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images

The novel coronavirus disease 2019 (COVID-19) has spread rapidly around the world and has had a significant impact on public health and the economy. However, there is still a lack of studies on effectively quantifying the lung infection caused by COVID-19. As a basic but challenging task in the diagnostic framework, segmentation plays a crucial role in the accurate quantification of COVID-19 infection from computed tomography (CT) images. To this end, we propose a novel deep learning algorithm for the automated segmentation of multiple COVID-19 infection regions. Specifically, we use aggregated residual transformations to learn a robust and expressive feature representation and apply a soft attention mechanism to improve the model's ability to distinguish the various symptoms of COVID-19. Using a public CT image dataset, we validate the efficacy of the proposed algorithm in comparison with other competing methods. Experimental results demonstrate the outstanding performance of our algorithm for the automated segmentation of COVID-19 chest CT images. Our study provides a promising deep learning-based segmentation tool that lays a foundation for the quantitative diagnosis of COVID-19 lung infection in CT images.

preprint2020arXiv

RGB-D SLAM in Dynamic Environments Using Point Correlations

In this paper, a simultaneous localization and mapping (SLAM) method that eliminates the influence of moving objects in dynamic environments is proposed. This method utilizes the correlation between map points to separate points that are part of the static scene and points that are part of different moving objects into different groups. A sparse graph is first created using Delaunay triangulation from all map points. In this graph, the vertices represent map points, and each edge represents the correlation between adjacent points. If the relative position between two points remains consistent over time, there is correlation between them, and they are considered to be moving together rigidly. If not, they are considered to have no correlation and to be in separate groups. After the edges between the uncorrelated points are removed during point-correlation optimization, the remaining graph separates the map points of the moving objects from the map points of the static scene. The largest group is assumed to be the group of reliable static map points. Finally, motion estimation is performed using only these points. The proposed method was implemented for RGB-D sensors, evaluated with a public RGB-D benchmark, and tested in several additional challenging environments. The experimental results demonstrate that robust and accurate performance can be achieved by the proposed SLAM method in both slightly and highly dynamic environments. Compared with other state-of-the-art methods, the proposed method can provide competitive accuracy with good real-time performance.
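
The grouping step described above can be sketched as follows. This is a simplified illustration, not the authors' point-correlation optimization: it assumes the edge list is already given (for example, from a Delaunay triangulation), treats an edge as "correlated" when the distance between its two points changes by at most a hypothetical tolerance `tol` across frames, and uses union-find to extract connected groups. The function name `static_point_group` is hypothetical.

```python
import math


def static_point_group(positions, edges, tol=0.05):
    """Return the largest group of map points that move together rigidly.

    positions: list of frames; each frame lists one (x, y, z) tuple per map point.
    edges: (i, j) index pairs, e.g. from a Delaunay triangulation (assumed given).
    tol: hypothetical tolerance on the change in edge length across frames.
    """
    n = len(positions[0])
    parent = list(range(n))  # union-find forest over map points

    def find(i):
        # Follow parents to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        lengths = [math.dist(frame[i], frame[j]) for frame in positions]
        # Keep the correlation only if the pair's distance stays consistent;
        # otherwise the edge is dropped and the points may end up in separate groups.
        if max(lengths) - min(lengths) <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for p in range(n):
        groups.setdefault(find(p), []).append(p)
    # As in the abstract, assume the largest group is the static scene.
    return max(groups.values(), key=len)
```

Distance consistency is a rotation- and translation-invariant proxy for "relative position remains consistent"; motion estimation would then use only the returned point indices.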

preprint2020arXiv

Robust Design for Intelligent Reflecting Surfaces Assisted MISO Systems

In this work, we study statistically robust beamforming design for an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) wireless system under imperfect channel state information (CSI), where the channel estimation errors are assumed to be additive Gaussian. We jointly optimize the transmit/receive beamformers and the IRS phase shifts to minimize the average mean squared error (MSE) at the user. To tackle the resulting non-convex optimization problem, we develop an efficient algorithm based on alternating optimization and majorization-minimization techniques. Simulation results show that the proposed scheme achieves robust MSE performance in the presence of CSI errors and substantially outperforms conventional non-robust methods.

preprint2020arXiv

Robust two-dimensional ice on graphene built from finite-length water molecular chains

Interfacial ice on graphene has attracted much attention because it is a model system for studying two-dimensional (2D) ice structures on chemically inert substrates. Because the water-graphene interaction is usually assumed to be negligible, the structure of 2D ice is believed not to be appreciably perturbed by the graphene substrate. Here we report atomically resolved characterizations of an exotic 2D ice structure on graphene built from water molecular chains of finite length. Our experiments demonstrate that the water molecular chains are oriented exactly along the zigzag directions of the graphene substrate, which evidences an anomalously strong interlayer interaction between the 2D ice and the graphene substrate. Moreover, the length of the water molecular chains is closely linked to the number of graphene layers, indicating layer-number-dependent water-graphene interfacial interactions. Our work highlights the close connection between 2D ice structures and water-graphene interfacial interactions.

preprint2020arXiv

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

The research community has shown increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real-world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high-quality, diverse dataset. It consists of 1150 scenes, each spanning 20 seconds, with well-synchronized and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies. Based on our proposed diversity metric, it is 15x more diverse than the largest camera+LiDAR dataset available. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D and 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.

preprint2020arXiv

Self-supervised Image Enhancement Network: Training with Low Light Images Only

This paper proposes a self-supervised low-light image enhancement method based on deep learning. Inspired by information entropy theory and the Retinex model, we propose a maximum-entropy-based Retinex model. With this model, a very simple network can separate illumination and reflectance, and the network can be trained with low-light images only. To achieve self-supervised learning, our model introduces a constraint that the maximum channel of the reflectance conforms to the maximum channel of the low-light image and that its entropy should be maximized. Our model is very simple and does not rely on any carefully designed dataset (even a single low-light image suffices for training). The network needs only minutes of training to achieve image enhancement. Experiments show that the proposed method reaches the state of the art in terms of processing speed and enhancement quality.

preprint2020arXiv

Spectroscopic evidence for a spin and valley polarized metallic state in a non-magic-angle twisted bilayer graphene

In magic-angle twisted bilayer graphene (MA-TBG), strong electron-electron (e-e) correlations caused by band flattening lead to many exotic quantum phases, such as superconductivity, correlated insulators, ferromagnetism, and the quantum anomalous Hall effect, when its low-energy van Hove singularities (VHSs) are partially filled. Here, our high-resolution scanning tunneling microscopy and spectroscopy measurements demonstrate that e-e correlation still plays an important role in determining the electronic properties of a non-magic-angle TBG with a twist angle θ = 1.49°. Our most interesting observation in this sample is that when one of its VHSs is partially filled, the associated peak in the spectrum splits into four peaks. Our analysis based on the continuum model suggests that this one-to-four splitting of the VHS originates from the formation of an interaction-driven spin-valley-polarized metallic state near the VHS, which lifts both the spin and valley degeneracies. Our results for this non-magic-angle TBG reveal a new symmetry-breaking phase that has not been identified in MA-TBG or in other systems.

preprint2020arXiv

Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models

In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a downstream task. We show that end-to-end ASR features, which integrate both acoustic and text information from speech, achieve promising results. We use an RNN with self-attention as the sentiment classifier, which also provides easy visualization through attention weights to help interpret model predictions. For evaluation, we use the well-benchmarked IEMOCAP dataset and SWBD-sentiment, a new large-scale speech sentiment dataset. Our approach improves the state-of-the-art accuracy on IEMOCAP from 66.6% to 71.7% and achieves an accuracy of 70.10% on SWBD-sentiment, which contains more than 49,500 utterances.

preprint2020arXiv

Spontaneous Surface Collapse and Reconstruction in Antiferromagnetic Topological Insulator MnBi$_2$Te$_4$

MnBi$_2$Te$_4$ is an antiferromagnetic topological insulator that has attracted intense interest because of its exotic quantum phenomena and promising device applications. The surface structure is key to understanding the novel magnetic and topological behavior of MnBi$_2$Te$_4$, yet its precise atomic structure remains elusive. Here, we discover a spontaneous surface collapse and reconstruction in few-layer MnBi$_2$Te$_4$ exfoliated under careful protection. Instead of the ideal septuple-layer structure of the bulk, the collapsed surface is shown to reconstruct into a Mn-doped Bi$_2$Te$_3$ quintuple layer and a Mn$_x$Bi$_y$Te double layer with a clear van der Waals gap in between. Combined with first-principles calculations, our results attribute this spontaneous surface collapse to the abundant intrinsic Mn-Bi antisite defects and tellurium vacancies in the exfoliated surface, an assignment further supported by in-situ annealing and electron irradiation experiments. Our results shed light on the intricate surface-bulk correspondence of MnBi$_2$Te$_4$ and provide insight into surface-related quantum measurements in few-layer MnBi$_2$Te$_4$ devices.

preprint2020arXiv

Study of $e^{+}e^{-} \to D^{+} D^{-} π^{+} π^{-} $ at center-of-mass energies from 4.36 to 4.60 GeV

We report a study of the $e^{+}e^{-} \to D^{+} D^{-} π^{+} π^{-}$ process using $e^{+}e^{-}$ collision data samples with an integrated luminosity of $2.5\,\rm{fb}^{-1}$ at center-of-mass energies from 4.36 to $4.60 \rm{GeV}$, collected with the BESIII detector at the BEPCII storage ring. The $D_{1}(2420)^+$ is observed in the $D^{+} π^{+} π^{-}$ mass spectrum. The mass and width of the $D_{1}(2420)^+$ are measured to be $(2427.2\pm 1.0_{\rm stat.}\pm 1.2_{\rm syst.}) \rm{MeV}/c^2$ and $(23.2\pm 2.3_{\rm stat.} \pm2.3_{\rm syst.}) \rm{MeV}$, respectively. The first errors are statistical and the second ones are systematic. In addition, the Born cross sections of the $e^{+}e^{-} \to D_{1}(2420)^+D^- + c.c. \to D^{+} D^{-} π^{+} π^{-}$ and $e^{+}e^{-} \to ψ(3770) π^{+} π^{-} \to D^{+} D^{-} π^{+} π^{-}$ processes are measured as a function of the center-of-mass energy.

preprint2020arXiv

Sum Rate Optimization for Two Way Communications with Intelligent Reflecting Surface

In this letter, we study an intelligent reflecting surface (IRS)-enhanced full-duplex MIMO two-way communication system. The system sum rate is maximized by jointly optimizing the source precoders and the IRS phase shift matrix. Adopting the idea of the Arimoto-Blahut algorithm, we decouple the non-convex optimization problem into three sub-problems, which are solved alternately. All the sub-problems can be solved efficiently with closed-form solutions. In addition, practical IRS assumptions, e.g., discrete phase shift levels, are also considered. Numerical results verify the convergence and performance of the proposed scheme.

preprint2020arXiv

Top-K Influential Nodes in Social Networks: A Game Perspective

Influence maximization, the fundamental problem of viral marketing, aims to find the top-$K$ seed nodes that maximize influence spread under a given spreading model. In this paper, we study influence maximization from a game perspective. To study information propagation, we propose a Coordination Game model in which every individual makes its decision based on the benefit of coordinating with its network neighbors. Our model generalizes several existing models, such as the Majority Vote model and the Linear Threshold model. Under the generalized model, we study the hardness of influence maximization and the approximation guarantee of the greedy algorithm. We also combine several strategies to accelerate the algorithm. Experimental results show that, after acceleration, our algorithm significantly outperforms other heuristics and is three orders of magnitude faster than the original greedy method.
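
The plain (unaccelerated) greedy baseline mentioned above can be sketched as follows. As a stand-in for the paper's Coordination Game, this sketch assumes a deterministic threshold cascade in the spirit of the Majority Vote / Linear Threshold rules: a node adopts once at least a fraction `theta` (a hypothetical parameter) of its neighbors have adopted. The function names `spread` and `greedy_top_k` are hypothetical, and none of the paper's acceleration strategies are included.

```python
def spread(graph, seeds, theta=0.5):
    """Final set of adopters under a deterministic threshold cascade.

    graph: dict mapping each node to a list of its neighbors (undirected).
    A non-seed node adopts once at least a fraction `theta` of its neighbors
    have adopted (an assumed stand-in for the coordination-game rule).
    """
    active = set(seeds)
    changed = True
    while changed:  # iterate until no further node switches
        changed = False
        for v, nbrs in graph.items():
            if v in active or not nbrs:
                continue
            if sum(u in active for u in nbrs) / len(nbrs) >= theta:
                active.add(v)
                changed = True
    return active


def greedy_top_k(graph, k, theta=0.5):
    """Pick k seeds one at a time, each maximizing the resulting final spread."""
    seeds = []
    for _ in range(k):
        best = max(
            (v for v in graph if v not in seeds),
            key=lambda v: len(spread(graph, seeds + [v], theta)),
        )
        seeds.append(best)
    return seeds
```

Each greedy step re-simulates the cascade for every candidate node, which is exactly the cost that acceleration strategies aim to cut.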

preprint2020arXiv

Transferring entangled states of photonic cat-state qubits in circuit QED

We propose a method for transferring the entangled states of two photonic cat-state qubits (cqubits) from two microwave cavities to two other microwave cavities. The proposal is realized using four microwave cavities coupled to a superconducting flux qutrit. Because the four cavities have different frequencies, inter-cavity crosstalk is significantly reduced. Since only one coupler qutrit is used, circuit resources are minimized. The entanglement transfer requires only a single-step operation, so the proposal is quite simple. The third energy level of the coupler qutrit is not populated during the state transfer, so decoherence from this higher energy level is greatly suppressed. Our numerical simulations show that high-fidelity transfer of two-cqubit entangled states from two transmission line resonators to two other transmission line resonators is feasible with current circuit QED technology. The proposal is universal and can be applied to the same task in a wide range of physical systems, such as four microwave or optical cavities coupled to a natural or artificial three-level atom.

preprint2020arXiv

Tunable lattice reconstruction and bandwidth of flat bands in magic-angle twisted bilayer graphene

The interplay between interlayer van der Waals interaction and intralayer lattice distortion can lead to structural reconstruction in slightly twisted bilayer graphene (TBG) when the twist angle is smaller than a characteristic angle θc. Experimentally, θc has been shown to be very close to the magic angle (θ ~ 1.05°). In this work, we address the transition between reconstructed and unreconstructed TBG structures across the magic angle using scanning tunnelling microscopy (STM). Our experiment demonstrates that both structures are stable in TBG around the magic angle. By applying an STM tip pulse, we show that the two structures can be switched into each other and that the bandwidth of the flat bands, which plays a vital role in the emergent strongly correlated states of magic-angle TBG, can be tuned. The observed tunable lattice reconstruction and flat-band bandwidth provide an extra control knob for manipulating the exotic electronic states of TBG near the magic angle.

preprint2020arXiv

Unsupervised Domain Adaptation for Mammogram Image Classification: A Promising Tool for Model Generalization

Generalization is one of the key challenges in the clinical validation and application of deep learning models to medical images. Studies have shown that such models trained on publicly available datasets often do not work well on real-world clinical data due to the differences in patient population and image device configurations. Also, manually annotating clinical images is expensive. In this work, we propose an unsupervised domain adaptation (UDA) method using Cycle-GAN to improve the generalization ability of the model without using any additional manual annotations.

preprint2020arXiv

Where are the Dangerous Intersections for Pedestrians and Cyclists: A Colocation-Based Approach

Pedestrians and cyclists are vulnerable road users. They are at greater risk of being killed in a crash than other road users. The percentage of fatal crashes that involve a pedestrian or cyclist is higher than the share of all trips made by these two modes. Because of this risk, finding ways to minimize problematic street environments is critical. Understanding the spatial patterns of traffic safety and identifying dangerous locations with significantly high crash risks for pedestrians and cyclists is essential for designing countermeasures that improve road safety. This research develops two indicators for examining spatial correlation patterns between elements of the built environment (intersections) and crashes (pedestrian- or cyclist-involved). The global colocation quotient detects the overall association in an area, while the local colocation quotient identifies the locations of high-risk intersections. To illustrate our approach, we applied the methods to examine the colocation patterns between pedestrian- or cyclist-vehicle crashes and intersections in Houston, Texas, and identified, among many intersections, those that significantly attract crashes. We also scrutinized those intersections, discussed possible attributes leading to high colocation of crashes, and proposed corresponding countermeasures.
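
A minimal sketch of the global colocation quotient follows, assuming the common nearest-neighbor formulation $\mathrm{CLQ}_{A \to B} = (C_{A \to B}/N_A)\,/\,(N_B/(N-1))$, where $C_{A \to B}$ counts $A$ points whose nearest neighbor is a $B$ point and tied nearest neighbors share credit equally. Which category plays $A$ and which plays $B$ in the study is not specified here, so treat the roles (e.g. $A$ = crash locations, $B$ = intersections) as an assumption; the function name `global_clq` is hypothetical, and the local (kernel-weighted) variant is not shown.

```python
import math


def global_clq(points_a, points_b):
    """Global colocation quotient CLQ(A -> B) for two distinct point categories.

    points_a, points_b: lists of (x, y) coordinates.
    Values above 1 suggest A points have B points as nearest neighbors more
    often than expected under random labeling; values below 1 suggest avoidance.
    """
    pts = [(p, "A") for p in points_a] + [(p, "B") for p in points_b]
    n, n_a, n_b = len(pts), len(points_a), len(points_b)
    c_ab = 0.0
    for idx, (p, label) in enumerate(pts):
        if label != "A":
            continue
        # Distances from this A point to every other point, with labels.
        dists = [(math.dist(p, q), lab) for j, (q, lab) in enumerate(pts) if j != idx]
        d_min = min(d for d, _ in dists)
        tied = [lab for d, lab in dists if d == d_min]
        # Split credit evenly among tied nearest neighbors.
        c_ab += tied.count("B") / len(tied)
    return (c_ab / n_a) / (n_b / (n - 1))
```

For example, two crashes each sitting 1 unit from an intersection, with one extra distant intersection, give $C_{A \to B} = 2$ and $\mathrm{CLQ} = (2/2)/(3/4) \approx 1.33$.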

preprint2015arXiv

Measurement of the $\mathrm{e}^+\mathrm{e}^-\rightarrow\pi^+\pi^-$ Cross Section between 600 and 900 MeV Using Initial State Radiation

We extract the $e^+e^-\rightarrow π^+π^-$ cross section in the energy range between 600 and 900 MeV using the method of initial state radiation. We use a data set with an integrated luminosity of 2.93 fb$^{-1}$ taken at a center-of-mass energy of 3.773 GeV with the BESIII detector at the BEPCII collider. The cross section is measured with a systematic uncertainty of 0.9%. We extract the pion form factor $|F_π|^2$ and determine the contribution of the measured cross section to the leading-order hadronic vacuum polarization term of $(g-2)_μ$. We find this value to be $a_μ^{ππ,\rm LO}(600-900\;\rm MeV) = (368.2 \pm 2.5_{\rm stat} \pm 3.3_{\rm sys})\cdot 10^{-10}$.
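
For context, the standard lowest-order relations linking the measured quantities are shown below: the form factor enters the cross section through the point-like-pion phase-space factor $\beta_\pi^3$, and the hadronic vacuum polarization contribution follows from a dispersion integral with the QED kernel $K(s)$, here restricted to the measured window $\sqrt{s} = 0.6$ to $0.9$ GeV. These are textbook relations rather than the full analysis chain, which also involves radiative and vacuum-polarization corrections not shown here.

```latex
\sigma(e^+e^- \to \pi^+\pi^-) = \frac{\pi\alpha^2}{3s}\,\beta_\pi^3(s)\,|F_\pi(s)|^2,
\qquad \beta_\pi(s) = \sqrt{1 - \frac{4m_\pi^2}{s}}

a_\mu^{\pi\pi,\mathrm{LO}} = \frac{1}{4\pi^3}\int_{(0.6\,\mathrm{GeV})^2}^{(0.9\,\mathrm{GeV})^2} \mathrm{d}s\, K(s)\,\sigma_{\pi\pi}(s),
\qquad K(s) = \int_0^1 \mathrm{d}x\, \frac{x^2(1-x)}{x^2 + (1-x)\,s/m_\mu^2}
```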