Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
191 works
0 followers
58 topics
4 close collaborators

Published work

191 published items

preprint · 2026 · arXiv

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and propose DUDE (Deceptive UI Detector & Evaluator), a two-stage framework combining hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance. We introduce RUC (Real UI Clickboxes), a benchmark of 1,407 scenarios spanning four domains and deception categories. Experiments show DUDE reduces deception susceptibility by 53.8% while maintaining task performance, establishing an effective foundation for robust web agent deployment.

preprint · 2025 · arXiv

Continuous Angular Power Spectrum Recovery From Channel Covariance via Chebyshev Polynomials

This paper proposes a Chebyshev polynomial expansion framework for the recovery of a continuous angular power spectrum (APS) from channel covariance. By exploiting the orthogonality of Chebyshev polynomials in a transformed domain, we derive an exact series representation of the covariance and reformulate the inherently ill-posed APS inversion as a finite-dimensional linear regression problem via truncation. The associated approximation error is directly controlled by the tail of the APS's Chebyshev series and decays rapidly with increasing angular smoothness. Building on this representation, we derive an exact semidefinite characterization of nonnegative APS and introduce a derivative-based regularizer that promotes smoothly varying APS profiles while preserving transitions of clusters. Simulation results show that the proposed Chebyshev-based framework yields accurate APS reconstruction, and enables reliable downlink (DL) covariance prediction from uplink (UL) measurements in a frequency division duplex (FDD) setting. These findings indicate that jointly exploiting smoothness and nonnegativity in a Chebyshev domain provides an effective tool for covariance-domain processing in multi-antenna systems.
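
The truncation idea can be illustrated on a generic smooth function. The Python sketch below is not the paper's covariance-domain method: it assumes a function defined on [-1, 1], computes its Chebyshev coefficients by Chebyshev-Gauss quadrature, and evaluates the truncated series with the three-term recurrence. Because |T_k(x)| ≤ 1 on [-1, 1], the truncation error is bounded by the sum of the magnitudes of the discarded coefficients, which is the tail behaviour the abstract refers to. The function names are hypothetical.

```python
import math


def chebyshev_coeffs(f, n_terms, n_nodes=None):
    """Coefficients c_k such that f(x) ~ sum_k c_k T_k(x) on [-1, 1].

    Uses Chebyshev-Gauss quadrature at the nodes x_j = cos(pi (j + 0.5) / N).
    """
    n_nodes = n_nodes or 2 * n_terms
    nodes = [math.cos(math.pi * (j + 0.5) / n_nodes) for j in range(n_nodes)]
    values = [f(x) for x in nodes]
    coeffs = []
    for k in range(n_terms):
        # T_k(cos t) = cos(k t), so T_k at node j equals cos(k pi (j + 0.5) / N).
        s = sum(v * math.cos(k * math.pi * (j + 0.5) / n_nodes)
                for j, v in enumerate(values))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n_nodes)
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate sum_k c_k T_k(x) using T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x
    total = coeffs[0] * t_prev
    if len(coeffs) > 1:
        total += coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
        total += c * t_curr
    return total
```

For example, `chebyshev_coeffs(lambda x: x ** 3, 5)` recovers x^3 = (3 T_1 + T_3) / 4, and twelve terms already reproduce `math.exp` on [-1, 1] to near machine precision because its coefficients decay very quickly.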

preprint · 2024 · arXiv

Collaborative Watermarking for Adversarial Speech Synthesis

Advances in neural speech synthesis have brought us technology that is not only close to human naturalness, but is also capable of instant voice cloning with little data, and is highly accessible with pre-trained models available. Naturally, the potential flood of generated content raises the need for synthetic speech detection and watermarking. Recently, considerable research effort in synthetic speech detection has been related to the Automatic Speaker Verification and Spoofing Countermeasure Challenge (ASVspoof), which focuses on passive countermeasures. This paper takes a complementary view to generated speech detection: a synthesis system should make an active effort to watermark the generated speech in a way that aids detection by another machine, but remains transparent to a human listener. We propose a collaborative training scheme for synthetic speech watermarking and show that a HiFi-GAN neural vocoder collaborating with the ASVspoof 2021 baseline countermeasure models consistently improves detection performance over conventional classifier training. Furthermore, we demonstrate how collaborative training can be paired with augmentation strategies for added robustness against noise and time-stretching. Finally, listening tests demonstrate that collaborative training has little adverse effect on perceptual quality of vocoded speech.

preprint · 2024 · arXiv

Early Results from GLASS-JWST XXIII: The transmission of Lyman-alpha from UV-faint z ~ 3-6 galaxies

Lyman-alpha (Ly$\alpha$) emission from galaxies can be used to trace neutral hydrogen in the epoch of reionization; however, there is a degeneracy between the attenuation of Ly$\alpha$ in the intergalactic medium (IGM) and the line profile emitted from the galaxy. Large shifts of Ly$\alpha$ redward of systemic due to scattering in the interstellar medium can boost Ly$\alpha$ transmission in the IGM during reionization. The relationship between Ly$\alpha$ velocity offset from systemic and other galaxy properties is not well established at high redshift or low luminosity, due to the difficulty of observing emission lines that trace systemic redshift. Rest-frame optical spectroscopy with JWST/NIRSpec has opened a new window into understanding Ly$\alpha$ at $z>3$. We present a sample of 12 UV-faint galaxies ($-20 \lesssim M_{\rm UV} \lesssim -16$) at $3 \lesssim z \lesssim 6$, with Ly$\alpha$ velocity offsets, $\Delta v_{\mathrm{Ly}\alpha}$, measured from VLT/MUSE and JWST/NIRSpec data from the GLASS-JWST Early Release Program. We find a median $\Delta v_{\mathrm{Ly}\alpha}$ of 205 km s$^{-1}$ and a standard deviation of 75 km s$^{-1}$, compared to 320 and 170 km s$^{-1}$ for $M_{\rm UV} < -20$ galaxies in the literature. Our new sample demonstrates that the previously observed trend of decreasing Ly$\alpha$ velocity offset with decreasing UV luminosity and optical line velocity dispersion extends to $M_{\rm UV} \gtrsim -20$, consistent with a picture where the Ly$\alpha$ profile is shaped by gas close to the systemic redshift. Our results imply that during reionization Ly$\alpha$ from UV-faint galaxies will be preferentially attenuated, but that detecting Ly$\alpha$ with low $\Delta v_{\mathrm{Ly}\alpha}$ can be an indicator of large ionized bubbles.

preprint · 2024 · arXiv

High-Efficiency Resonant Beam Charging and Communication

With the development of the Internet of Things (IoT), the power and data demands of IoT devices are increasing drastically. To resolve this supply-demand contradiction, simultaneous wireless information and power transfer (SWIPT) has been envisioned as an enabling technology that provides high-power energy transfer and high-rate data delivery concurrently. In this paper, we introduce a high-efficiency resonant beam (RB) charging and communication scheme. The scheme uses a semiconductor gain medium, which has better energy absorption capacity than the traditional solid-state one. Moreover, a telescope internal modulator (TIM) is adopted in the scheme to concentrate the beams to match the gain size, reducing the transmission loss. To evaluate the SWIPT performance of the scheme, we establish an analytical model and study the factors influencing its beam transmission, energy conversion, output power, and spectral efficiency. Numerical results show that the proposed RB system can realize 16 W electric power output with 11% end-to-end conversion efficiency, and support 18 bit/s/Hz spectral efficiency for communication.
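
The headline figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below makes two assumptions that the abstract does not state: that end-to-end conversion efficiency means electrical output divided by electrical input, and that spectral efficiency is read through the Shannon bound log2(1 + SNR). Under those assumptions, 16 W at 11% implies roughly 145 W of input power, and 18 bit/s/Hz requires an SNR of at least about 54 dB. The paper's analytical model may define these quantities differently, and the function names are hypothetical.

```python
import math


def required_input_power(output_w, efficiency):
    """Input power implied by an end-to-end efficiency (assumed to be output / input)."""
    return output_w / efficiency


def min_snr_db(spectral_efficiency_bps_hz):
    """Smallest SNR (dB) whose Shannon capacity log2(1 + SNR) reaches the target."""
    snr_linear = 2.0 ** spectral_efficiency_bps_hz - 1.0
    return 10.0 * math.log10(snr_linear)


print(round(required_input_power(16.0, 0.11), 1))  # 145.5
print(round(min_snr_db(18.0), 1))                   # 54.2
```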

preprint · 2024 · arXiv

Lyman Continuum Emission from AGN at $2.3 \lesssim z \lesssim 3.7$ in the UVCANDELS Fields

We present the results of our search for Lyman continuum (LyC) emitting AGN at redshifts $2.3 \lesssim z \lesssim 4.9$ from HST WFC3 F275W observations in the UVCANDELS fields. We also include LyC emission from AGN using HST WFC3 F225W, F275W, and F336W found in the ERS and HDUV data. We performed exhaustive queries of the VizieR database to locate AGN with high-quality spectroscopic redshifts. In total, we found 51 AGN that met our criteria within the UVCANDELS and ERS footprints. Of these 51, 12 AGN have LyC flux detected at $\geq 4\sigma$ in the WFC3/UVIS images. Using space- and ground-based data from X-ray to radio, we fit the multi-wavelength photometric data of each AGN with a CIGALE SED and correlate various SED parameters with the LyC flux. KS tests of the SED parameter distributions for the LyC-detected and non-detected AGN show that they are likely not distinct samples. However, we find that X-ray luminosity, star-formation onset age, and disk luminosity correlate strongly with the emitted LyC flux. We also find strong correlations of the LyC flux with several dust parameters, i.e., polar and toroidal dust emission and 6 $\mu$m luminosity, and anti-correlations with metallicity and $A_{\rm FUV}$. We simulate the LyC escape fraction ($f_{\rm esc}$) using the CIGALE and IGM transmission models for the LyC-detected AGN and find an average $f_{\rm esc} \simeq 18\%$, weighted by uncertainties. We stack the LyC flux of subsamples of AGN according to the wavelength continuum region in which they are detected and find no significant distinctions in their LyC emission, although our sub-mm-detected F336W sample shows the brightest stacked LyC flux. These findings indicate that LyC production and escape in AGN are more complicated than the simple assumption of thermal emission and a 100% escape fraction. Further testing of AGN models with larger samples than presented here is needed.
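
The abstract reports an average escape fraction "weighted by uncertainties" without giving the weighting scheme. A common choice is inverse-variance weighting, sketched below as an assumption rather than the authors' procedure; the input values are hypothetical and are not taken from the paper.

```python
import math


def inverse_variance_mean(values, sigmas):
    """Weighted mean with weights 1 / sigma^2, plus its 1-sigma uncertainty."""
    if len(values) != len(sigmas) or not values:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical escape fractions and uncertainties, for illustration only.
mean, err = inverse_variance_mean([0.10, 0.30], [0.05, 0.10])
```

The more precise measurement dominates: with these inputs the weighted mean is 0.14 rather than the unweighted 0.20.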

preprint · 2024 · arXiv

Spoofing attack augmentation: can differently-trained attack models improve generalisation?

A reliable deepfake detector or spoofing countermeasure (CM) should be robust in the face of unpredictable spoofing attacks. To encourage the learning of more generalisable artefacts, rather than those specific only to known attacks, CMs are usually exposed to a broad variety of different attacks during training. Even so, the performance of deep-learning-based CM solutions is known to vary, sometimes substantially, when they are retrained with different initialisations, hyper-parameters or training data partitions. We show in this paper that the potency of spoofing attacks, also deep-learning-based, can similarly vary according to training conditions, sometimes resulting in substantial degradations to detection performance. Nevertheless, while a RawNet2 CM model is vulnerable when only modest adjustments are made to the attack algorithm, those based upon graph attention networks and self-supervised learning are reassuringly robust. The focus upon training data generated with different attack algorithms might not be sufficient on its own to ensure generalisability; some form of spoofing attack augmentation at the algorithm level can be complementary.

preprint · 2023 · arXiv

Learning-based Intelligent Surface Configuration, User Selection, Channel Allocation, and Modulation Adaptation for Jamming-resisting Multiuser OFDMA Systems

Reconfigurable intelligent surfaces (RISs) can potentially combat jamming attacks by diffusing jamming signals. This paper jointly optimizes user selection, channel allocation, modulation-coding, and RIS configuration in a multiuser OFDMA system under a jamming attack. This problem is non-trivial and has not been addressed before, because of its mixed-integer programming nature and the difficulty of acquiring channel state information (CSI) involving the RIS and jammer. We propose a new deep reinforcement learning (DRL)-based approach, which learns only through changes in the received data rates of the users to reject the jamming signals and maximize the sum rate of the system. The key idea is to decouple the discrete selection of users, channels, and modulation-coding from the continuous RIS configuration, which allows the RIS configuration to be learned with the recent twin delayed deep deterministic policy gradient (TD3) model. Another important aspect is that we show a winner-takes-all strategy is almost surely optimal for selecting the users, channels, and modulation-coding, given a learned RIS configuration. Simulations show that the new approach converges quickly and realizes the benefit of the RIS, owing to its substantially smaller state and action spaces. Because it does not require CSI, the approach is promising and of practical value.

preprint · 2023 · arXiv

Suppression of laser beam's polarization and intensity fluctuation via a Mach-Zehnder interferometer with proper feedback

A long ground-Rydberg coherence lifetime is important for implementing high-fidelity quantum logic gates, many-body physics, and other quantum information protocols. However, the potential well formed by a conventional far-off-resonance red-detuned optical-dipole trap, which is attractive for ground-state cold atoms, is usually repulsive for Rydberg atoms, resulting in rapid loss of atoms and a low repetition rate of the experimental sequence. Moreover, the coherence time is sharply shortened by the residual thermal motion of cold atoms. These issues can be addressed by a one-dimensional magic lattice trap, which can form a deeper potential well than a traveling-wave optical dipole trap when the output power is limited. In addition, these common techniques for atomic confinement generally place requirements on the polarization and intensity stability of the laser. Here, we demonstrate a method to suppress both polarization drift and power fluctuation, based solely on phase management of a Mach-Zehnder interferometer, for a one-dimensional magic lattice trap. Combining three wave plates with the interferometer, we collected data in the time domain, analyzed the fluctuation of laser intensity, and calculated the noise power spectral density. We found that the total intensity fluctuation, comprising laser power fluctuation and polarization drift, was significantly suppressed, and that the noise power spectral density after closed-loop locking over a typical bandwidth of 1-3000 Hz was significantly lower than under free running of the laser system. Typically, at 1000 Hz, the noise power spectral density after locking was about 10 dB lower than under free running of a master oscillator power amplifier system. The intensity-polarization control technique thus has potential for broader applications.
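
The noise power spectral density comparison can be illustrated with a basic one-sided periodogram of a sampled intensity trace. The sketch below uses a direct DFT in pure Python for readability, which is quadratic in the record length; a real analysis would typically use an FFT with windowing and averaging (for example Welch's method), and the abstract does not say which estimator the authors used. A 10 dB reduction corresponds to a tenfold drop in noise power, which `db_reduction` expresses. Function names and the test signal are hypothetical.

```python
import cmath
import math


def periodogram(samples, sample_rate_hz):
    """One-sided periodogram PSD estimate (units^2 / Hz) via a direct DFT.

    O(N^2) and unwindowed: fine for illustration, not for long records.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove DC so only the fluctuation remains
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(acc) ** 2 / (sample_rate_hz * n)
        if 0 < k < n / 2:
            p *= 2.0  # fold negative frequencies into the one-sided estimate
        freqs.append(k * sample_rate_hz / n)
        psd.append(p)
    return freqs, psd


def db_reduction(psd_free, psd_locked):
    """Suppression in dB; 10 dB means the locked noise power is 10x lower."""
    return 10.0 * math.log10(psd_free / psd_locked)
```

Summing the returned PSD and multiplying by the bin width (sample rate divided by record length) recovers the variance of the fluctuation, which is a quick consistency check on the normalization.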

preprint · 2022 · arXiv

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, allowing several simple baselines, even without training, to achieve SOTA performance. In this paper, we take a closer look at existing evaluation protocols and find that both the prevailing datasets and the evaluation metrics lead to untrustworthy benchmarking. Therefore, we propose to re-organize the two widely used datasets so that the ground-truth moment distributions differ between the training and test splits, i.e., an out-of-distribution (OOD) test. Meanwhile, we introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to counter the inflated evaluation caused by biased datasets. New benchmarking results indicate that our proposed evaluation protocols can better monitor research progress. Furthermore, we propose a novel causality-based Multi-branch Deconfounding Debiasing (MDD) framework for unbiased moment prediction. Specifically, we design a multi-branch deconfounder to eliminate the effects caused by multiple confounders with causal intervention. To help the model better align the semantics of sentence queries and video moments, we enhance the representations during feature encoding. For textual information, the query is parsed into several verb-centered phrases to obtain more fine-grained textual features. For visual information, the positional information is separated from moment features to enhance representations of moments at diverse locations. Extensive experiments demonstrate that our proposed approach achieves competitive results among existing SOTA approaches and outperforms the base model by large margins.
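
The discounting idea behind dR@n,IoU@m can be sketched for the top-1 case. As we understand the metric, a hit at IoU threshold m is scaled by how close the predicted start and end boundaries are to the ground truth, with boundary errors normalized by video duration; treat the exact discount form below as an assumption and consult the paper for the top-n definition. Function names are hypothetical.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) intervals on the same time axis."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def discounted_recall_at_1(preds, gts, durations, iou_threshold):
    """dR@1,IoU@m (assumed form): each IoU hit is scaled by (1 - |ds|) * (1 - |de|),
    where ds and de are start/end boundary errors divided by the video duration."""
    total = 0.0
    for (ps, pe), (gs, ge), dur in zip(preds, gts, durations):
        if temporal_iou((ps, pe), (gs, ge)) >= iou_threshold:
            alpha_s = 1.0 - abs(ps - gs) / dur
            alpha_e = 1.0 - abs(pe - ge) / dur
            total += alpha_s * alpha_e
    return total / len(preds)
```

A prediction that clears the IoU threshold but is off by 2 s in a 100 s video contributes 0.98 instead of 1, so a model that only learns the biased prior location of moments earns less credit than under plain R@1.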

preprint · 2022 · arXiv

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods assume that data are represented as features in vectorized form through various representation learning techniques. As data become increasingly complex, shallow (traditional) clustering methods can no longer handle high-dimensional data. With the huge success of deep learning, especially deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of Deep Clustering, i.e., jointly optimizing representation learning and clustering, has been proposed and has attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, we conduct a comprehensive survey on deep clustering and propose a new taxonomy of state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering. Moreover, this survey also covers popular benchmark datasets, evaluation metrics, and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics deserving further investigation as future directions.
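
As one concrete example of jointly optimizing representation learning and clustering, Deep Embedded Clustering (DEC) soft-assigns embeddings to centroids with a Student's t kernel and then trains against a sharpened target distribution by minimizing KL(P || Q). The sketch below shows only those two assignment steps on precomputed embeddings; it is one method from the family the survey covers, not the survey's own contribution, and the encoder and optimizer are omitted. Function names are illustrative.

```python
def soft_assignment(embeddings, centroids, alpha=1.0):
    """Student's t kernel q_ij between embedding z_i and centroid mu_j (rows sum to 1)."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([v / s for v in row])
    return q


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        raw = [v * v / fj for v, fj in zip(row, f)]
        s = sum(raw)
        p.append([v / s for v in raw])
    return p
```

Squaring q emphasizes confident assignments, and dividing by the cluster frequency f_j keeps large clusters from absorbing everything; in DEC the gradient of KL(P || Q) then updates both the encoder and the centroids.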

preprint · 2022 · arXiv

A Practical Guide to Logical Access Voice Presentation Attack Detection

Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are widely used in commercial products. However, the threat from presentation attacks is also growing, since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including advances in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent methods on a benchmark dataset. Code for the experiments is open-sourced.

preprint · 2022 · arXiv

A Theoretical View on Sparsely Activated Networks

Deep and wide neural networks successfully fit very complex functions today, but dense models are starting to be prohibitively expensive for inference. To mitigate this, one promising direction is networks that activate a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping of inputs to subnetworks (e.g., the Mixture of Experts (MoE) paradigm in Switch Transformers). However, prior work is largely empirical, and while existing routing functions work well in practice, they do not lead to theoretical guarantees on approximation ability. We aim to provide a theoretical explanation for the power of sparse networks. As our first contribution, we present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures. We then introduce a routing function based on locality sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions. After representing LSH-based sparse networks with our model, we prove that sparse networks can match the approximation power of dense networks on Lipschitz functions. Applying LSH on the input vectors means that the experts interpolate the target function in different subregions of the input space. To support our theory, we define various datasets based on Lipschitz target functions, and we show that sparse networks give a favorable trade-off between number of active units and approximation quality.
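
A data-dependent LSH routing function can be sketched with random-hyperplane (SimHash) hashing, one standard LSH family. The abstract does not say which family the paper uses, so this choice, the class name, and the bit-to-expert mapping are assumptions. Each sign bit records which side of a random hyperplane the input falls on, so inputs in the same region of the space share an expert.

```python
import random


class LSHRouter:
    """Routes an input vector to one of 2**n_bits experts using random-hyperplane
    (SimHash) LSH: nearby inputs tend to share sign patterns and hence experts."""

    def __init__(self, dim, n_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def route(self, x):
        index = 0
        for plane in self.planes:
            side = sum(p * xi for p, xi in zip(plane, x)) >= 0.0
            index = (index << 1) | int(side)  # append one sign bit per hyperplane
        return index
```

Because these hyperplanes pass through the origin, the regions are cones, so scaling an input by a positive factor does not change its expert; adding random offsets to the hyperplanes would produce bounded cells instead, closer to the "subregions of the input space" in which each expert interpolates the target function.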

preprint · 2022 · arXiv

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Specifically, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) use separate decoders for each sub-model while sharing the encoders; 2) use funnel-pooling to improve encoder efficiency; 3) balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model is 30% smaller and reduces power consumption by 33% compared to the baseline cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves a 37% total size reduction with minimal quality loss, while substantially reducing the engineering effort of maintaining separate models.

preprint · 2022 · arXiv

Accidental symmetries in the scalar potential of the Standard Model extended with two Higgs triplets

The extension of the Standard Model (SM) with two Higgs triplets offers an appealing way to account for both tiny Majorana neutrino masses via the type-II seesaw mechanism and the cosmological matter-antimatter asymmetry via triplet leptogenesis. In this paper, we classify all possible accidental symmetries in the scalar potential of the two-Higgs-triplet model (2HTM). Based on the bilinear-field formalism, we show that the maximal symmetry group of the 2HTM potential is ${\rm SO(4)}$ and that a total of eight types of accidental symmetries can be identified. Furthermore, we examine the impact of the couplings between the SM Higgs doublet and the Higgs triplets on the accidental symmetries. The bounded-from-below conditions on the scalar potential with specific accidental symmetries are also derived. Taking the ${\rm SO(4)}$-invariant scalar potential as an example, we investigate the vacuum structures and the scalar mass spectra of the 2HTM.

preprint · 2022 · arXiv

Adaptive Worker Grouping For Communication-Efficient and Straggler-Tolerant Distributed SGD

Wall-clock convergence time and communication load are key performance metrics for the distributed implementation of stochastic gradient descent (SGD) in parameter server settings. Communication-adaptive distributed Adam (CADA) has been recently proposed as a way to reduce communication load via the adaptive selection of workers. CADA is subject to performance degradation in terms of wall-clock convergence time in the presence of stragglers. This paper proposes a novel scheme named grouping-based CADA (G-CADA) that retains the advantages of CADA in reducing the communication load, while increasing the robustness to stragglers at the cost of additional storage at the workers. G-CADA partitions the workers into groups, and all workers in a group are assigned the same data shards. Groups are scheduled adaptively at each iteration, and the server only waits for the fastest worker in each selected group. We provide analysis and experimental results demonstrating the significant gains of G-CADA over other benchmark schemes in wall-clock time, as well as in communication and computation load.
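
The straggler-tolerance mechanism can be seen in a toy timing model: because every worker in a group holds the same shards, the server needs only the fastest worker per selected group, so one iteration takes the maximum over selected groups of the minimum time within each group. The sketch below is an illustrative simplification under stated assumptions: the adaptive (CADA-style) group selection is not modeled and is passed in as an input, communication time is ignored, and the compute times and function names are hypothetical.

```python
def iteration_time(group_times, selected_groups):
    """Wall-clock time of one iteration under grouping: workers in a group hold the
    same data shards, so the server needs only the fastest worker of each selected
    group and then waits for the slowest of those group winners."""
    return max(min(group_times[g]) for g in selected_groups)


def iteration_time_no_grouping(group_times, selected_groups):
    """Naive baseline for comparison: wait for every worker in the selected groups."""
    return max(max(group_times[g]) for g in selected_groups)


# Hypothetical per-worker compute times in seconds; worker 1 of group 0 straggles.
times = {0: [1.2, 9.0], 1: [0.8, 1.1], 2: [2.5, 0.9]}
print(iteration_time(times, [0, 1, 2]))              # 1.2
print(iteration_time_no_grouping(times, [0, 1, 2]))  # 9.0
```

The redundancy is what buys the speed-up: the 9.0 s straggler is masked by its group partner, at the cost of storing the same shards on several workers, which matches the storage trade-off stated in the abstract.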

preprint · 2022 · arXiv

Adversarial Attack Framework on Graph Embedding Models with Limited Knowledge

With the success of graph embedding models in both academia and industry, the robustness of graph embedding against adversarial attacks has inevitably become a crucial problem in graph learning. Existing works usually perform the attack in a white-box fashion: they need access to the predictions/labels to construct their adversarial loss. However, the inaccessibility of predictions/labels makes white-box attacks impractical for real graph learning systems. This paper extends current frameworks in a more general and flexible direction: we aim to attack various kinds of graph embedding models in a black-box setting. We investigate the theoretical connections between graph signal processing and graph embedding models and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. On this basis, we design a generalized adversarial attacker: GF-Attack. Without accessing any labels or model predictions, GF-Attack can perform the attack directly on the graph filter in a black-box fashion. We further prove that GF-Attack can perform an effective attack without knowing the number of layers of graph embedding models. To validate the generalization of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experiments validate the effectiveness of GF-Attack on several benchmark datasets.

preprint · 2022 · arXiv

An Edge-Cloud Integrated Framework for Flexible and Dynamic Stream Analytics

With the growing popularity of the Internet of Things (IoT), edge computing, and cloud computing, more and more stream analytics applications are being developed, including real-time trend prediction and object detection on top of IoT sensing data. One popular type of stream analytics is time-series or sequence prediction and forecasting based on recurrent neural network (RNN) deep learning models. Unlike traditional analytics, which assumes that data are available ahead of time and will not change, stream analytics deals with data that are generated continuously and whose trend/distribution can change (a.k.a. concept drift), causing prediction/forecasting accuracy to drop over time. Another challenge is finding the best resource provisioning for stream analytics to achieve good overall latency. In this paper, we study how best to leverage edge and cloud resources to achieve better accuracy and latency for stream analytics using a type of RNN model called long short-term memory (LSTM). We propose a novel edge-cloud integrated framework for hybrid stream analytics that supports low-latency inference on the edge and high-capacity training on the cloud. To achieve flexible deployment, we study different approaches to deploying our hybrid learning framework, including edge-centric, cloud-centric, and edge-cloud integrated. Further, our hybrid learning framework can dynamically combine inference results from an LSTM model pre-trained on historical data and another LSTM model re-trained periodically on the most recent data. Using real-world and simulated stream datasets, our experiments show that the proposed edge-cloud deployment is the best of the three deployment types in terms of latency. For accuracy, the experiments show that our dynamic learning approach performs the best among all learning approaches for all three concept drift scenarios.

preprint · 2022 · arXiv

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data of any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech is from a domain unseen in the training data. This study analyzed the bottleneck of the anonymization system under unseen conditions. It was found that the domain (e.g., language and channel) mismatch between the training and test data affected the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data was found to help reduce its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy was found to be highly effective in alleviating the mismatch on the anonymized speaker vectors. Audio samples and source code are available online.

preprint · 2022 · arXiv

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve reliability in the face of uncontrolled, unpredictable attacks are hence needed. We report in this paper our efforts to use self-supervised learning in the form of a wav2vec 2.0 front-end with fine-tuning. Despite initial base representations being learned using only bona fide data and no spoofed data, we obtain the lowest equal error rates reported in the literature for both the ASVspoof 2021 Logical Access and Deepfake databases. When combined with data augmentation, these results correspond to an improvement of almost 90% relative to our baseline system.
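
The equal error rate (EER) quoted above is the operating point where the false acceptance rate of spoofed trials equals the false rejection rate of bona fide trials. The sketch below estimates it with a brute-force threshold sweep; it assumes that higher scores mean "more likely bona fide", is quadratic in the number of trials, and is not the official ASVspoof scoring tool, which may compute the crossing point differently. The function name is hypothetical.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    Convention (assumed): higher score means 'more likely bona fide', and a
    trial is accepted when score >= threshold.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(bona_fide_scores) | set(spoof_scores)):
        frr = sum(s < thr for s in bona_fide_scores) / len(bona_fide_scores)
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

Perfectly separated score distributions give an EER of 0; overlapping ones give the error rate at the threshold where the two error types balance.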

preprint · 2022 · arXiv

Chiral Quantum Network with Giant Atoms

In superconducting quantum circuits (SQCs), chiral routing of quantum information is often realized with ferrite circulators, which are usually bulky, lossy, and require strong magnetic fields. To overcome those problems, we propose a novel method to realize chiral quantum networks by exploiting giant-atom effects in SQC platforms. By assuming that each coupling point is modulated in time, the interaction becomes momentum-dependent, and giant atoms chirally emit photons due to interference effects. The chiral factor can approach 1, and both the emission direction and rate can be freely tuned by the modulating signals. We demonstrate that high-fidelity state transfer between remote giant atoms can be realized. Our proposal can be easily integrated on a superconducting chip and has the potential to work as a tunable toolbox for quantum information processing in future chiral quantum networks.

preprint · 2022 · arXiv

CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training

Recent years have witnessed increasing interest in code representation learning, which aims to represent the semantics of source code as distributed vectors. Various works have been proposed to represent the complex semantics of source code from different views, including plain text, the Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graph). However, most of them consider only a single view of source code, ignoring the correspondences among different views. In this paper, we propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training, and name our model CODE-MVP. Specifically, we first extract multiple code views using compiler tools and learn the complementary information among them under a contrastive learning framework. Inspired by type checking in compilation, we also design a fine-grained type inference objective for pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP compared with several state-of-the-art baselines. For example, we achieve gains of 2.4/2.3/1.1 in terms of MRR/MAP/Accuracy on natural language code retrieval, code similarity, and code defect detection, respectively.
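
Contrastive pre-training across views is commonly implemented with an InfoNCE-style objective, in which two views of the same program form a positive pair and the other programs in the batch act as negatives. The abstract does not specify CODE-MVP's loss, similarity function, or temperature, so the sketch below, with cosine similarity and a temperature of 0.1, is an assumption; the embeddings would come from view-specific encoders that are omitted here.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] against all other
    positives in the batch (in-batch negatives), using cosine similarity."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cos(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_sum - logits[i]
    return loss / len(anchors)
```

When the two views of each program already agree, the loss approaches zero; when the pairing is scrambled, it grows large, which is the signal that pulls corresponding views together during pre-training.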

preprint2022arXiv

Communication-Efficient Local SGD with Age-Based Worker Selection

A major bottleneck of distributed learning under the parameter-server (PS) framework is the communication cost due to frequent bidirectional transmissions between the PS and workers. To address this issue, local stochastic gradient descent (SGD) and worker selection have been exploited to reduce the communication frequency and the number of participating workers at each round, respectively. However, partial participation can be detrimental to the convergence rate, especially for heterogeneous local datasets. In this paper, to improve communication efficiency and speed up the training process, we develop a novel worker selection strategy named AgeSel. The key enabler of AgeSel is the use of the ages of workers to balance their participation frequencies. The convergence of local SGD with the proposed age-based partial worker participation is rigorously established. Simulation results demonstrate that the proposed AgeSel strategy can significantly reduce the number of training rounds needed to achieve a target accuracy, as well as the communication cost. The influence of the algorithm's hyper-parameter is also explored to illustrate the benefit of age-based worker selection.
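A minimal sketch of age-based selection follows. It assumes (our interpretation, not the paper's definition) that a worker's age is the number of rounds since it last participated, and that the PS simply picks the S oldest workers each round, breaking ties by worker index. Local SGD and aggregation are omitted, and all names are illustrative.

```python
from typing import List


def select_by_age(ages: List[int], num_selected: int) -> List[int]:
    """Return the indices of the `num_selected` workers with the largest age.

    Ties are broken by the smaller worker index, so the rule is deterministic.
    """
    order = sorted(range(len(ages)), key=lambda i: (-ages[i], i))
    return sorted(order[:num_selected])


def update_ages(ages: List[int], selected: List[int]) -> List[int]:
    """Reset the age of participating workers and age all other workers by one round."""
    chosen = set(selected)
    return [0 if i in chosen else a + 1 for i, a in enumerate(ages)]


def simulate(num_workers: int, num_selected: int, num_rounds: int) -> List[List[int]]:
    """Run the selection rule for several rounds and record who participates."""
    ages = [0] * num_workers
    history: List[List[int]] = []
    for _ in range(num_rounds):
        selected = select_by_age(ages, num_selected)
        history.append(selected)
        # Local SGD on the selected workers and aggregation at the PS would go here.
        ages = update_ages(ages, selected)
    return history


if __name__ == "__main__":
    for rnd, workers in enumerate(simulate(num_workers=6, num_selected=2, num_rounds=6)):
        print(f"round {rnd}: workers {workers}")
```

With 6 workers and 2 selected per round, this deterministic rule cycles through the workers in round-robin order, so no worker is left out for long. The actual AgeSel rule may weigh ages differently.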

preprint2022arXiv

Compilable Neural Code Generation with Compiler Feedback

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in the decoder or driven by language models pre-trained on large-scale code corpora (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for the compilability of the generated programs. To improve compilability, this paper proposes COMPCODER, a three-stage pipeline that uses compiler feedback for compilable code generation: language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of the proposed approach, improving the compilation success rate from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, compared with the state-of-the-art CodeGPT.

preprint2022arXiv

Context-Aware Streaming Perception in Dynamic Environments

Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, performance improvement using our method comes in addition to, and not instead of, advances in offline accuracy.

preprint2022arXiv

Cosmological constraints from the density gradient weighted correlation function

The mark weighted correlation function (MCF) $W(s,μ)$ is a computationally efficient statistical measure that can probe clustering information beyond that of conventional 2-point statistics. In this work, we extend traditional mark weighted statistics by using powers of the density field gradient, $|\nabla ρ/ρ|^α$, as the weight, and use the angular dependence of the scale-averaged MCFs to constrain cosmological parameters. The analysis shows that the gradient-based weighting scheme is statistically more powerful than the density-based weighting scheme, and that combining the two schemes is more powerful than using either of them separately. Using the density-weighted or the gradient-weighted MCFs with $α=0.5,\ 1$, we can strengthen the constraint on $Ω_m$ by factors of 2 or 4, respectively, compared with the standard 2-point correlation function, while using the MCFs of the two weighting schemes together can be $1.25$ times more statistically powerful than using the gradient weighting scheme alone. Mark weighted statistics may play an important role in the cosmological analysis of future large-scale surveys. Many issues, including the possibility of using other types of weights, the influence of bias on these statistics, and the use of MCFs in the tomographic Alcock-Paczynski method, merit further investigation.

preprint2022arXiv

Coupling two charge qubits via a superconducting resonator operating in the resonant and dispersive regimes

A key challenge for semiconductor quantum-dot charge qubits is realizing long-range qubit coupling and performing high-fidelity gates based on it. Here, we describe a new type of charge qubit formed by an electron confined in a triple-quantum-dot system, enabling single- and two-qubit gates that operate at the dipolar and quadrupolar detuning sweet spots. We further present the form of the long-range dipolar coupling between the charge qubit and the superconducting resonator. Based on the hybrid system composed of the qubits and the resonator, we present two types of entangling gates, the dynamical iSWAP gate and the holonomic entangling gate, which operate in the dispersive and resonant regimes, respectively. We find that the iSWAP gate fidelity can exceed 99% for noise levels typical in experiments, while the holonomic gate fidelity can surpass 98% if the anharmonicity of the resonator is large enough. Our proposal offers an alternative route to high-fidelity quantum computation with charge qubits in semiconductor quantum dots.

preprint2022arXiv

Covering Grassmannian Codes: Bounds and Constructions

The Grassmannian $\mathcal{G}_q(n,k)$ is the set of all $k$-dimensional subspaces of the vector space $\mathbb{F}_q^n$. Recently, Etzion and Zhang introduced the notion of a covering Grassmannian code, which can be used in network coding solutions for generalized combination networks. An $α$-$(n,k,δ)_q^c$ covering Grassmannian code $\mathcal{C}$ is a subset of $\mathcal{G}_q(n,k)$ such that every set of $α$ codewords of $\mathcal{C}$ spans a subspace of dimension at least $δ+k$ in $\mathbb{F}_q^n$. In this paper, we derive new upper and lower bounds on the size of covering Grassmannian codes. These bounds improve known bounds and extend the range of parameters they cover.
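To make the definition concrete, the brute-force check below is a sketch restricted to q = 2 (an assumption made for simplicity). It represents each k-dimensional subspace of F_2^n by k basis vectors encoded as integer bitmasks and tests whether every α codewords span a subspace of dimension at least δ + k. It is exponential in the code size and only suitable for toy parameters; the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def gf2_rank(vectors: Sequence[int]) -> int:
    """Rank over GF(2) of vectors encoded as integer bitmasks (XOR elimination)."""
    basis: List[int] = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def is_covering_code(code: Sequence[Sequence[int]], k: int, alpha: int, delta: int) -> bool:
    """Brute-force check of the alpha-(n, k, delta)_2^c covering property.

    Each codeword is a list of k linearly independent bitmask vectors spanning
    a k-dimensional subspace of F_2^n.
    """
    for cw in code:
        if gf2_rank(cw) != k:
            raise ValueError("each codeword must be spanned by k independent vectors")
    for subset in combinations(code, alpha):
        spanning_vectors = [v for cw in subset for v in cw]
        if gf2_rank(spanning_vectors) < delta + k:
            return False
    return True


if __name__ == "__main__":
    print(is_covering_code([[0b01], [0b10], [0b11]], k=1, alpha=3, delta=2))   # lines in one plane
    print(is_covering_code([[0b001], [0b010], [0b100]], k=1, alpha=3, delta=2))
```

For k = 1 and q = 2, codewords are distinct nonzero vectors, so `[[1], [2], [3]]` fails the α = 3, δ = 2 test because 0b01 ⊕ 0b10 = 0b11, and the three lines span only a plane.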

preprint2022arXiv

Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

In decentralized learning, a network of nodes cooperates to minimize an overall objective function that is usually the finite sum of their local objectives and incorporates a non-smooth regularization term for better generalization. The decentralized stochastic proximal gradient (DSPG) method is commonly used to train this type of learning model, but its convergence rate is slowed by the variance of stochastic gradients. In this paper, we propose a novel algorithm, DPSVRG, which accelerates decentralized training by leveraging variance reduction. The basic idea is to introduce an estimator at each node that periodically tracks the local full gradient and corrects the stochastic gradient at each iteration. By transforming our decentralized algorithm into a centralized inexact proximal gradient algorithm with variance reduction, and controlling the bounds of the error sequences, we prove that DPSVRG converges at the rate of $O(1/T)$ for general convex objectives plus a non-smooth term, with $T$ the number of iterations, whereas DSPG converges at the rate $O(\frac{1}{\sqrt{T}})$. Our experiments on different applications, network topologies, and learning models demonstrate that DPSVRG converges much faster than DSPG and that its loss decreases smoothly over the training epochs.
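The snippet below sketches the idea described above: each node keeps a periodically refreshed full local gradient at a snapshot point and uses it to correct every stochastic gradient before a gossip-averaging step and an ℓ1 proximal step. It is an illustration on an ℓ1-regularized least-squares problem that assumes a fixed doubly stochastic mixing matrix rather than a time-varying network. The function names, step size, and snapshot schedule are illustrative choices, not the paper's DPSVRG update.

```python
import numpy as np


def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def local_grad(A: np.ndarray, b: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Gradient of 0.5 * mean((A x - b)^2) over the rows of A."""
    return A.T @ (A @ x - b) / len(b)


def objective(As, bs, x: np.ndarray, lam: float) -> float:
    """Global objective: average local least-squares loss plus lam * ||x||_1."""
    losses = [0.5 * np.mean((A @ x - b) ** 2) for A, b in zip(As, bs)]
    return float(np.mean(losses) + lam * np.abs(x).sum())


def dpsvrg_sketch(As, bs, W, lam, step, iters, snapshot_every, seed=0):
    """Decentralized proximal SGD with an SVRG-style local gradient correction.

    As, bs: per-node data; W: doubly stochastic (n x n) mixing matrix.
    Returns the (n x d) matrix whose rows are the node iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = len(As), As[0].shape[1]
    X = np.zeros((n, d))
    X_snap = X.copy()
    full_grads = np.zeros((n, d))
    for t in range(iters):
        if t % snapshot_every == 0:
            # Each node refreshes its snapshot and its full local gradient there.
            X_snap = X.copy()
            full_grads = np.stack([local_grad(As[i], bs[i], X_snap[i]) for i in range(n)])
        V = np.empty((n, d))
        for i in range(n):
            j = rng.integers(len(bs[i]))
            a, y = As[i][j:j + 1], bs[i][j:j + 1]
            # Variance-reduced estimator: stochastic gradient corrected by the snapshot.
            V[i] = local_grad(a, y, X[i]) - local_grad(a, y, X_snap[i]) + full_grads[i]
        # Average with neighbours, take a local step, then apply the l1 proximal map.
        X = soft_threshold(W @ X - step * V, step * lam)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
    As = [rng.normal(size=(20, 5)) for _ in range(4)]
    bs = [A @ x_true for A in As]
    # Ring of 4 nodes: each node averages itself and its two neighbours.
    W = np.array([[1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1]]) / 3.0
    X = dpsvrg_sketch(As, bs, W, lam=0.01, step=0.01, iters=3000, snapshot_every=20)
    print("objective at start:", objective(As, bs, np.zeros(5), 0.01))
    print("objective at end:", objective(As, bs, X.mean(axis=0), 0.01))
```

Setting the snapshot equal to the current iterate makes the correction term cancel exactly, which is why the estimator's variance shrinks as the iterates settle.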

preprint2022arXiv

Deep Learning-based Massive MIMO CSI Acquisition for 5G Evolution and 6G

Recently, inspired by successful applications in many fields, deep learning (DL) technologies for channel state information (CSI) acquisition have received considerable research interest from both academia and industry. Considering the practical feedback mechanism of 5th generation (5G) New Radio (NR) networks, we propose two implementation schemes for artificial intelligence for CSI (AI4CSI): a DL-based receiver and an end-to-end design. The proposed AI4CSI schemes were evaluated in 5G NR networks in terms of spectrum efficiency (SE), feedback overhead, and computational complexity, and compared with legacy schemes. To demonstrate whether these schemes can be used in real-life scenarios, both model-based channel data and practically measured channels were used in our investigations. When DL-based CSI acquisition is applied only at the receiver, which has little air-interface impact, it provides an SE gain of approximately 25% at a moderate feedback overhead level, making it feasible to deploy in current 5G networks during 5G evolution. For end-to-end DL-based CSI enhancement, the evaluations also demonstrated an additional SE gain of 6% to 26% compared with DL-based receivers and 33% to 58% compared with legacy CSI schemes. Given its large impact on air-interface design, it is a candidate technology for 6th generation (6G) networks, in which an air interface designed by artificial intelligence can be used.

preprint2022arXiv

Dichotomic Pattern Mining with Applications to Intent Prediction from Semi-Structured Clickstream Datasets

We introduce a pattern mining framework that operates on semi-structured datasets and exploits the dichotomy between outcomes. Our approach uses constraint reasoning to find sequential patterns that occur frequently and exhibit desired properties. This enables novel pattern embeddings that are useful for knowledge extraction and predictive modeling. Finally, we present an application to customer intent prediction from digital clickstream data. Overall, we show that pattern embeddings act as an integrator between semi-structured data and machine learning models, improve the performance of the downstream task, and retain interpretability.

preprint2022arXiv

Domain Shift-oriented Machine Anomalous Sound Detection Model Based on Self-Supervised Learning

Thanks to the development of deep learning, research on machine anomalous sound detection based on self-supervised learning has made remarkable progress. However, the acoustic characteristics of the test set and the training set differ under different operating conditions of the same machine (domain shifts). It is challenging for existing detection methods to learn domain-shift features stably with low computational overhead. To address these problems, in this paper we propose a domain shift-oriented machine anomalous sound detection model based on self-supervised learning (TranSelf-DyGCN). First, we design a time-frequency domain feature modeling network to capture global and local spatial and time-domain features, thus improving the stability of machine anomalous sound detection under domain shifts. Then, we adopt a Dynamic Graph Convolutional Network (DyGCN) to model the inter-dependence between domain-shift features, enabling the model to perceive them efficiently. Finally, we use a Domain Adaptive Network (DAN) to compensate for the performance decrease caused by domain shifts, helping the model adapt better to anomalous sounds in the self-supervised setting. The performance of the proposed model is validated on DCASE 2020 Task 2 and DCASE 2022 Task 2.

preprint2022arXiv

Early results from GLASS-JWST. IX: First spectroscopic confirmation of low-mass quiescent galaxies at $z>2$ with NIRISS

How passive galaxies form, and the physical mechanisms which prevent star formation over long timescales, are some of the most outstanding questions in understanding galaxy evolution. The properties of quiescent galaxies over cosmic time provide crucial information to identify the quenching mechanisms. Passive galaxies have been confirmed and studied out to $z\sim4$, but all of these studies have been limited to massive systems (mostly with $\log{(M_{\rm star}/M_{\odot})}>10.8$). Using James Webb Space Telescope (JWST) NIRISS grism slitless spectroscopic data from the GLASS JWST ERS program, we present spectroscopic confirmation of two quiescent galaxies at $z_{\rm spec}=2.650^{+0.004}_{-0.006}$ and $z_{\rm spec}=2.433^{+0.032}_{-0.016}$ (3$σ$ errors) with stellar masses of $\log{(M_{\rm star}/M_{\odot})}=10.53^{+0.18}_{-0.06}$ and $\log{(M_{\rm star}/M_{\odot})}=9.93^{+0.06}_{-0.07}$ (corrected for magnification factors of $μ=2.0$ and $μ=2.1$, respectively). The latter represents the first spectroscopic confirmation of the existence of low-mass quiescent galaxies at cosmic noon, showcasing the power of JWST to identify and characterize this enigmatic population.

preprint2022arXiv

Early results from GLASS-JWST. XI: Stellar masses and mass-to-light ratio of z>7 galaxies

We exploit James Webb Space Telescope (JWST) NIRCam observations from the GLASS-JWST Early Release Science program to investigate galaxy stellar masses at z>7. We first show that JWST observations reduce the uncertainties on the stellar mass by a factor of at least 5-10 compared with the highest-quality data sets available to date. We then study the UV mass-to-light ratio and find that galaxies exhibit a two-orders-of-magnitude range of M/L_UV values at a given luminosity, indicative of a broad variety of physical conditions and star formation histories. As a consequence, previous estimates of the cosmic stellar mass density, based on an average correlation between UV luminosity and stellar mass, can be biased by as much as a factor of ~6. This first exploration demonstrates that JWST opens a new era in our understanding of stellar masses at z>7, and therefore of the growth of galaxies prior to cosmic reionization.

preprint2022arXiv

Energy-Efficient UAV-Mounted RIS Assisted Mobile Edge Computing

Unmanned aerial vehicles (UAVs) and reconfigurable intelligent surfaces (RISs) have recently been applied in mobile edge computing (MEC) to improve the data exchange environment by proactively changing the wireless channels through maneuverable location deployment and intelligent signal reflection, respectively. Nevertheless, each may suffer from inherent limitations in practical scenarios. UAV-mounted RIS (U-RIS), as a promising integrated approach, can combine the advantages of UAV and RIS to overcome these limitations. Inspired by this, we consider a novel U-RIS assisted MEC system, in which a U-RIS is deployed to assist the communication between ground users and an MEC server. A joint design of the UAV trajectory, RIS passive beamforming, and MEC resource allocation is developed to maximize the energy efficiency (EE) of the system. To tackle the intractable non-convex problem, we divide it into two subproblems and solve them iteratively based on successive convex approximation (SCA) and the Dinkelbach method, finally obtaining a high-performance suboptimal solution. Simulation results show that the proposed algorithm significantly improves the energy efficiency of the MEC system.
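The Dinkelbach method mentioned above turns a ratio maximization max f(x)/g(x), with g > 0, into a sequence of parametric problems max f(x) − λg(x). After each solve, λ is updated to the current ratio, and the loop stops once the optimal value of the parametric problem reaches zero. The sketch below applies this to a hypothetical single-link toy problem, rate log2(1 + h·p) divided by total power p + P_c, whose inner problem has a closed-form solution. It illustrates only this step, not the joint trajectory, beamforming, and resource-allocation design of the paper.

```python
import math
from typing import Tuple


def best_power(lam: float, h: float, p_max: float) -> float:
    """Maximize log2(1 + h p) - lam * p over 0 <= p <= p_max (closed form).

    Setting h / ((1 + h p) ln 2) - lam = 0 gives p = 1 / (lam ln 2) - 1 / h,
    which is clipped to the feasible interval (the objective is concave in p).
    """
    if lam <= 0.0:
        return p_max  # the rate term alone is increasing in p
    p = 1.0 / (lam * math.log(2)) - 1.0 / h
    return min(max(p, 0.0), p_max)


def dinkelbach_ee(h: float, p_c: float, p_max: float,
                  tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, float]:
    """Return (power, energy efficiency) maximizing log2(1 + h p) / (p + p_c)."""
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        p = best_power(lam, h, p_max)
        rate, power = math.log2(1.0 + h * p), p + p_c
        if rate - lam * power < tol:  # optimal value of the parametric problem is ~0
            break
        lam = rate / power  # Dinkelbach update: ratio at the current solution
    return p, lam


if __name__ == "__main__":
    print(dinkelbach_ee(h=4.0, p_c=0.5, p_max=2.0))
```

The update converges in a handful of iterations because the parametric optimum decreases monotonically in λ and crosses zero exactly at the optimal ratio.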

preprint2022arXiv

Enhanced brain structure-function tethering in transmodal cortex revealed by high-frequency eigenmodes

The brain's structural connectome supports signal propagation between neuronal elements, shaping diverse coactivation patterns that can be captured as functional connectivity. While the link between structure and function remains an open challenge, the prevailing hypothesis is that the structure-function relationship may itself be gradually decoupled along a macroscale functional gradient spanning unimodal to transmodal regions. However, this hypothesis is strongly constrained by the underlying models, which may neglect requisite signaling mechanisms. Here, we transform the structural connectome into a set of orthogonal eigenmodes governing frequency-specific diffusion patterns and show that regional structure-function relationships vary markedly under different signaling mechanisms. Specifically, low-frequency eigenmodes, which are considered sufficient to capture the essence of the functional network, contribute little to functional connectivity reconstruction in transmodal regions, resulting in structure-function decoupling along the unimodal-transmodal gradient. In contrast, high-frequency eigenmodes, which usually receive little attention because of their association with noisy and random dynamical patterns, contribute significantly to functional connectivity prediction in transmodal regions, inducing gradually convergent structure-function relationships from unimodal to transmodal regions. Although the information in high-frequency eigenmodes is weak and scattered, it enhances the structure-function correspondence by 35% in unimodal regions and 56% in transmodal regions. Altogether, our findings suggest that the structure-function divergence in transmodal areas may not be an intrinsic property of brain organization but can be narrowed through multiplexed and regionally specialized signaling mechanisms.
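As a rough illustration of frequency-specific eigenmodes, the sketch below uses the combinatorial graph Laplacian of a structural connectivity matrix. This is an assumption: the paper's exact operator and its functional-connectivity prediction model are not reproduced. It sorts eigenmodes from low to high frequency (ascending eigenvalue) and measures how much of a regional profile, for instance one row of a functional connectivity matrix, falls in the low- versus the high-frequency band. The function names are hypothetical.

```python
import numpy as np


def laplacian_eigenmodes(sc: np.ndarray):
    """Return ascending eigenvalues and orthonormal eigenmodes of L = D - A.

    `sc` is a (regions x regions) nonnegative structural connectivity matrix.
    """
    adj = 0.5 * (sc + sc.T)             # enforce symmetry
    np.fill_diagonal(adj, 0.0)          # ignore self-connections
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)  # eigh returns eigenvalues in ascending order
    return evals, evecs


def band_energy_fractions(profile: np.ndarray, evecs: np.ndarray, num_low: int):
    """Split a regional profile's energy between low- and high-frequency eigenmodes."""
    coeffs = evecs.T @ profile          # graph Fourier coefficients
    energy = coeffs ** 2
    total = energy.sum()
    return energy[:num_low].sum() / total, energy[num_low:].sum() / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sc = rng.uniform(0.1, 1.0, size=(10, 10))
    evals, evecs = laplacian_eigenmodes(sc)
    print(band_energy_fractions(rng.normal(size=10), evecs, num_low=3))
```

Because the eigenmodes are orthonormal, the two fractions always sum to one. A spatially smooth profile concentrates in the first modes, while a patchy, region-specific profile needs the high-frequency ones.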

preprint2022arXiv

Estimating the confidence of speech spoofing countermeasure

Conventional speech spoofing countermeasures (CMs) are designed to make a binary decision on an input trial. However, a CM trained on a closed-set database is theoretically not guaranteed to perform well on unknown spoofing attacks. In some scenarios, an alternative strategy is to let the CM defer a decision when it is not confident. The question is then how to estimate a CM's confidence regarding an input trial. We investigated a few confidence estimators that can be easily plugged into a CM. On the ASVspoof2019 logical access database, the results demonstrate that an energy-based estimator and a neural-network-based one achieved acceptable performance in identifying unknown attacks in the test set. On a test set with additional unknown attacks and bona fide trials from other databases, the confidence estimators performed moderately well, and the CMs better discriminated bona fide and spoofed trials that had high confidence scores. Additional results also revealed the difficulty of enhancing a confidence estimator by adding unknown attacks to the training set.
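One common form of energy-based score, borrowed from out-of-distribution detection, is E(x) = −T · log Σ_k exp(z_k / T), computed from a classifier's logits z, with lower energy meaning higher confidence. The sketch below assumes a two-class CM (index 0 bona fide, index 1 spoofed) that outputs logits and defers when the negative energy falls below a threshold. The temperature, threshold, and function names are illustrative assumptions, not the estimator configuration used in the paper.

```python
import numpy as np


def energy_confidence(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Negative energy T * logsumexp(z / T) per trial; larger means more confident.

    `logits` has shape (num_trials, num_classes).
    """
    z = logits / temperature
    z_max = z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    lse = z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1))
    return temperature * lse


def decide_or_defer(logits: np.ndarray, threshold: float, temperature: float = 1.0):
    """Return 'bonafide' / 'spoof' per trial, or 'defer' when confidence is low."""
    conf = energy_confidence(logits, temperature)
    labels = np.where(logits.argmax(axis=1) == 0, "bonafide", "spoof")
    return np.where(conf >= threshold, labels, "defer")


if __name__ == "__main__":
    trial_logits = np.array([[0.0, 0.0], [8.0, -2.0], [-3.0, 6.0]])
    print(energy_confidence(trial_logits))
    print(decide_or_defer(trial_logits, threshold=2.0))
```

The ambiguous first trial (equal logits) has confidence log 2 and is deferred, while the two trials with a clearly dominant logit are decided.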

preprint2022arXiv

Eyes Tell All: Irregular Pupil Shapes Reveal GAN-generated Faces

Highly realistic human faces generated by generative adversarial networks (GANs) have been used as profile images for fake social media accounts and are visually difficult to distinguish from real ones. In this work, we show that GAN-generated faces can be exposed via irregular pupil shapes. This phenomenon is caused by the lack of physiological constraints in GAN models. We demonstrate that such artifacts exist widely in high-quality GAN-generated faces and further describe an automatic method to extract the pupils from both eyes and analyze their shapes to expose GAN-generated faces. Qualitative and quantitative evaluations suggest that our method is simple and effective in distinguishing GAN-generated faces.
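One simple way to quantify how irregular a pupil is, sketched below under the assumption that a binary pupil mask has already been segmented, is to fit an ellipse to the mask and measure the intersection-over-union (IoU) between the mask and the fitted ellipse. Round or elliptical pupils score close to 1, while distorted shapes score lower. The moment-based ellipse fit and the plain region IoU used here are stand-ins chosen for brevity; they are not necessarily the fitting procedure or IoU variant used in the paper.

```python
import numpy as np


def moment_ellipse_mask(mask: np.ndarray) -> np.ndarray:
    """Rasterize the ellipse whose centroid and second moments match `mask`.

    For a uniformly filled ellipse with semi-axes a, b the covariance of pixel
    coordinates is R diag(a^2/4, b^2/4) R^T, so its interior is d^T C^{-1} d <= 4.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(pts, rowvar=False, bias=True))
    yy, xx = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    d = np.stack([xx - center[0], yy - center[1]], axis=-1)
    dist = np.einsum("...i,ij,...j->...", d, inv_cov, d)  # Mahalanobis distance squared
    return dist <= 4.0


def pupil_regularity(mask: np.ndarray) -> float:
    """IoU between a pupil mask and its moment-matched ellipse (1.0 = perfect ellipse)."""
    mask = mask.astype(bool)
    ellipse = moment_ellipse_mask(mask)
    inter = np.logical_and(mask, ellipse).sum()
    union = np.logical_or(mask, ellipse).sum()
    return float(inter / union)


if __name__ == "__main__":
    yy, xx = np.mgrid[0:64, 0:64]
    round_pupil = (xx - 32) ** 2 + (yy - 32) ** 2 <= 20 ** 2
    print("round pupil:", pupil_regularity(round_pupil))
```

A decision rule would then flag a face when the score of either pupil falls below a threshold chosen on validation data.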

preprint2022arXiv

Facility Location with Congestion and Priority in Drone-Based Emergency Delivery

Thanks to their fast delivery, reduced traffic restrictions, and low manpower requirements, drones have been increasingly deployed to deliver time-critical materials, such as medication, blood, and exam kits, in emergency situations. This paper considers a facility location model that uses drones as mobile servers in emergency delivery. The model jointly optimizes the location of facilities, the capacity of drones deployed at opened facilities, and the allocation of demands, with the objective of equitable response times among all demand sites. To this end, we employ queues to model the congestion of drone requests and consider three queuing disciplines: non-priority, static priority, and dynamic priority. For each discipline, we approximate the model as a mixed-integer second-order conic program (MISOCP), which can readily be solved by commercial solvers. We conduct extensive computational experiments to demonstrate the effectiveness and accuracy of our approach. Additionally, we compare the system performance under the three queuing disciplines and various problem parameters, from which we derive operational recommendations for decision makers in emergency delivery.
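To see how a static priority discipline redistributes congestion delay, the textbook sketch below assumes a single drone server modeled as an M/M/1 queue (a simplifying assumption; the paper models drone capacity at each facility and uses its own MISOCP approximations). It compares the first-come-first-served waiting time with Cobham's formula for non-preemptive static priority. Dynamic priority is not covered, and the function names are hypothetical.

```python
from typing import List


def fcfs_wait_mm1(arrival_rates: List[float], mu: float) -> float:
    """Mean waiting time in queue for an M/M/1 queue served first come, first served."""
    lam = sum(arrival_rates)
    if lam >= mu:
        raise ValueError("unstable queue: total arrival rate must be below mu")
    return lam / (mu * (mu - lam))


def priority_waits_mm1(arrival_rates: List[float], mu: float) -> List[float]:
    """Mean waiting time per class under non-preemptive static priority (Cobham's formula).

    Class 0 has the highest priority; all classes share exponential service with rate mu.
    """
    if sum(arrival_rates) >= mu:
        raise ValueError("unstable queue: total arrival rate must be below mu")
    # Mean residual service seen by an arrival: sum_i lambda_i * E[S^2] / 2, E[S^2] = 2 / mu^2.
    w0 = sum(lam * (2.0 / mu ** 2) / 2.0 for lam in arrival_rates)
    waits: List[float] = []
    sigma_prev = 0.0
    for lam in arrival_rates:
        sigma = sigma_prev + lam / mu  # cumulative load of this and higher-priority classes
        waits.append(w0 / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


if __name__ == "__main__":
    rates, mu = [0.3, 0.4], 1.0
    print("FCFS wait:", fcfs_wait_mm1(rates, mu))
    print("priority waits:", priority_waits_mm1(rates, mu))
```

For arrival rates (0.3, 0.4) and μ = 1, the high-priority class waits 1.0 time unit on average and the low-priority class about 3.33. The arrival-weighted average equals the FCFS value of about 2.33, as the conservation law for work-conserving disciplines predicts.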

preprint2022arXiv

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

Generating accurate descriptions for online fashion items is important not only for enhancing customers' shopping experiences but also for increasing online sales. Besides correctly presenting the attributes of items, descriptions written in an enchanting style can better attract customers' interest. The goal of this work is to develop a novel learning framework for accurate and expressive fashion captioning. Unlike general image captioning, fashion captioning requires identifying and describing the rich attributes of fashion items, which is difficult. We seed the description of an item by first identifying its attributes, and introduce an attribute-level semantic (ALS) reward and a sentence-level semantic (SLS) reward as metrics to improve the quality of text descriptions. We further integrate the training of our model with maximum likelihood estimation (MLE), attribute embedding, and reinforcement learning (RL). To facilitate learning, we build a new FAshion CAptioning Dataset (FACAD), which contains 993K images and 130K corresponding enchanting and diverse descriptions. Experiments on FACAD demonstrate the effectiveness of our model.

preprint2022arXiv

First Census of Gas-phase Metallicity Gradients of Star-forming Galaxies in Overdense Environments at Cosmic Noon

We report the first spatially resolved measurements of gas-phase metallicity radial gradients in star-forming galaxies in overdense environments at $z\gtrsim2$. The spectroscopic data are acquired by the MAMMOTH-Grism survey, a Hubble Space Telescope (HST) cycle-28 medium program. This program is obtaining 45 orbits of WFC3/IR grism spectroscopy in the density peak regions of three massive galaxy protoclusters (BOSS 1244, BOSS 1542, and BOSS 1441) at $z=2-3$. Our sample in the BOSS 1244 field consists of 20 galaxies with stellar masses ranging from $10^{9.0}$ to $10^{10.3}$ $M_\odot$, star formation rates (SFR) from 10 to 240 $M_\odot\,{\rm yr}^{-1}$, and global gas-phase metallicities ($12+\log({\rm O/H})$) from 8.2 to 8.6. At the $1σ$ confidence level, 2/20 galaxies in our sample show positive (inverted) gradients, i.e., the relative abundance of oxygen increases with galactocentric radius, opposite to the usual trend. Furthermore, 1/20 shows a negative gradient and 17/20 are consistent with flat gradients. This high fraction of flat or inverted gradients is uncommon in simulations and in previous observations conducted in blank fields at similar redshifts. To understand this, we investigate the correlations among various observed properties of our sample galaxies. We find an anticorrelation between metallicity gradient and global metallicity for our galaxies residing in extreme overdensities, and a marked metallicity deficiency in our massive galaxies compared with their coeval field counterparts. We conclude that cold-mode gas accretion plays an active role in shaping the chemical evolution of galaxies in protocluster environments, diluting their central chemical abundance and flattening or inverting their metallicity gradients.

preprint2022arXiv

Fluorination Increases Hydrophobicity at the Macroscopic Level but not at the Microscopic Level

Hydrophobic interactions have been studied in detail in the past using hydrophobic polymers such as polystyrene (PS). Because fluorinated materials have relatively low surface energy, they often show both oleophobicity and hydrophobicity at the macroscopic level. However, it remains unknown how fluorination of hydrophobic polymers influences hydrophobicity at the microscopic level. In this work, we synthesized PS and fluorine-substituted PS (FPS) by reversible addition-fragmentation chain-transfer (RAFT) polymerization. Contact angle measurements confirmed that FPS is more hydrophobic than PS at the macroscopic level owing to the introduction of fluorine. However, single-molecule force spectroscopy experiments showed that the forces required to unfold PS and FPS nanoparticles in water are indistinguishable, indicating that the strength of the hydrophobic effect that drives the self-assembly of PS and FPS nanoparticles is the same at the microscopic level. This divergence of the hydrophobic effect between the macroscopic and microscopic levels may hint at different underlying mechanisms: hydrophobicity is dominated by solvent hydration at the microscopic level and by surface-associated interactions at the macroscopic level.

preprint2022arXiv

Fundamental limitations on optimization in variational quantum algorithms

Exploring quantum applications of near-term quantum devices is a rapidly growing field of quantum information science of both theoretical and practical interest. A leading paradigm for establishing such near-term quantum applications is variational quantum algorithms (VQAs). These algorithms use a classical optimizer to train a parameterized quantum circuit to accomplish certain tasks, where the circuits are usually randomly initialized. In this work, we prove that for a broad class of such random circuits, the range over which the cost function varies when any local quantum gate within the circuit is adjusted vanishes exponentially in the number of qubits with high probability. This result unifies the restrictions on gradient-based and gradient-free optimization in a natural manner and reveals additional, harsher constraints on the training landscapes of VQAs. Hence a fundamental limitation on the trainability of VQAs is unraveled, indicating that the optimization hardness essentially stems from the exponentially large dimension of the Hilbert space. We further showcase the validity of our results with numerical simulations of representative VQAs. We believe that these results will deepen our understanding of the scalability of VQAs and shed light on the search for near-term quantum applications with advantages.

preprint2022arXiv

Hermite-Gaussian-mode coherently composed states and deep learning based free-space optical communication link

In laser-based free-space optical communication, besides orbital angular momentum (OAM) beams, Hermite-Gaussian (HG) modes or HG-mode coherently composed states (HG-MCCS) can also be adopted as information carriers to extend the channel capacity with a spatial-pattern-based encoding and decoding link. The light field of an HG-MCCS is mainly determined by three independent parameters: the indices of the HG modes, the relative initial phase between the two eigenmodes, and the scale coefficients of the eigenmodes, which together yield a large number of effective coding modes at low mode order. The intensity distributions of HG-MCCSs have clearly distinguishable spatial characteristics and remain invariant on propagation, so they can be conveniently decoded by convolutional neural network (CNN)-based image recognition. We experimentally use HG-MCCS to realize a communication link including encoding, transmission under atmospheric turbulence (AT), and CNN-based decoding. With eigenmode index orders up to six, 125 HG-MCCSs are generated and used for information encoding, and the average recognition accuracy reaches 99.5% without AT. For the transmission of 125-level color images, the error rate of the system is below 1.8% even under weak AT. Our work provides a useful basis for the future combination of dense data communication and artificial intelligence technology.

preprint2022arXiv

Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Document-level Sentiment Analysis (DSA) is particularly challenging due to vague semantic links and complicated sentiment information. Recent works have leveraged text summarization and achieved promising results. However, these summarization-based methods do not take full advantage of the summary; in particular, they ignore the inherent interactions between the summary and the document. As a result, their representations are limited in expressing the major points of the document, which are highly indicative of the key sentiment. In this paper, we study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA. We propose Hierarchical Interaction Networks (HIN) to explore bidirectional interactions between the summary and the document at multiple granularities and learn subject-oriented document representations for sentiment classification. Furthermore, we design a Sentiment-based Rethinking mechanism (SR) that refines HIN with sentiment label information to learn a more sentiment-aware document representation. We extensively evaluate the proposed models on three public datasets. The experimental results consistently demonstrate their effectiveness and show that HIN-SR outperforms various state-of-the-art methods.

preprint2022arXiv

Hybrid subconvexity bounds for twists of $\rm GL(3)$ $L$-functions

Let $π$ be an $SL(3,\mathbb Z)$ Hecke-Maass cusp form and $χ$ a primitive Dirichlet character of prime power conductor $\mathfrak{q}=p^k$ with $p$ prime. In this paper we prove the following subconvexity bound: $$ L\left(\frac{1}{2}+it,π\times χ\right)\ll_{π,\varepsilon} p^{3/4}\big(\mathfrak{q}(1+|t|)\big)^{3/4-3/40+\varepsilon}, $$ for any $\varepsilon >0$ and $t \in \mathbb{R}$.

preprint2022arXiv

Hydrodynamic Relaxation in a Strongly Interacting Fermi Gas

We measure the free decay of a spatially periodic density profile in a normal-fluid, strongly interacting Fermi gas confined in a box potential. This spatial profile is initially created in thermal equilibrium by a perturbing potential. After the perturbation is abruptly extinguished, the dominant spatial Fourier component exhibits an exponentially decaying (thermally diffusive) mode and a decaying oscillatory (first sound) mode, enabling independent measurement of the thermal conductivity and the shear viscosity directly from the time-dependent evolution.

preprint2022arXiv

Incremental Graph Computation: Anchored Vertex Tracking in Dynamic Social Networks

User engagement has recently received significant attention for understanding the decay and expansion of communities in many online social networking platforms. When a user leaves a social networking platform, it may cause a cascade of departures among her friends. In many scenarios, it is worthwhile to persuade critical users to stay active and prevent such a cascade, because critical users can have significant influence on the engagement of the whole network. Many user engagement studies have sought a set of critical (anchored) users in a static social network. However, social networks are highly dynamic and their structures evolve continuously. To fully utilize the power of anchored users in evolving networks, existing studies have to mine multiple sets of anchored users at different times, which incurs an expensive computational cost. To better understand user engagement in evolving networks, we target a new research problem called Anchored Vertex Tracking (AVT), which aims to track the anchored users at each timestamp of an evolving network. The AVT problem is nontrivial: we prove it to be NP-hard. To address this challenge, we develop a greedy algorithm inspired by previous anchored k-core studies on static networks. Furthermore, we design an incremental algorithm that solves the AVT problem efficiently by exploiting the smoothness of the network structure's evolution. Extensive experiments on real and synthetic datasets demonstrate the efficiency of the proposed algorithms and their effectiveness in solving the AVT problem.
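For readers unfamiliar with anchored k-cores: the k-core of a graph is what remains after repeatedly deleting vertices of degree below k, and an anchored vertex is one that is never deleted, so anchoring a few users can keep their followers in the core. The sketch below computes an anchored k-core by peeling and picks anchors greedily by how much each one enlarges the core. It is a static, brute-force illustration of the greedy idea only, not the paper's AVT tracking or incremental algorithm, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency map


def anchored_k_core(graph: Graph, k: int, anchors: Iterable[int] = ()) -> Set[int]:
    """Vertices that survive peeling when anchored vertices are never removed."""
    anchor_set = set(anchors)
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    removed: Set[int] = set()
    queue = deque(v for v in graph if v not in anchor_set and degree[v] < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex can be queued more than once
        removed.add(v)
        for u in graph[v]:
            if u not in removed:
                degree[u] -= 1
                if u not in anchor_set and degree[u] < k:
                    queue.append(u)
    return set(graph) - removed


def greedy_anchors(graph: Graph, k: int, budget: int) -> Set[int]:
    """Anchor, one at a time, the vertex that enlarges the anchored k-core the most."""
    anchors: Set[int] = set()
    for _ in range(budget):
        core = anchored_k_core(graph, k, anchors)
        best: Optional[int] = None
        best_size = len(core)
        for v in graph:
            if v in core:
                continue  # anchoring a vertex already in the core changes nothing
            size = len(anchored_k_core(graph, k, anchors | {v}))
            if size > best_size:
                best, best_size = v, size
        if best is None:
            break
        anchors.add(best)
    return anchors


if __name__ == "__main__":
    g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
    print(anchored_k_core(g, 2), greedy_anchors(g, 2, budget=1))
```

On a triangle {1, 2, 3} with a tail 3-4-5-6 and k = 2, the plain 2-core is the triangle; anchoring the tail end 6 keeps all six vertices, so the greedy step picks vertex 6. Rerunning this from scratch at every timestamp is the expensive baseline that an incremental method aims to avoid.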

preprint2022arXiv

Investigating self-supervised front ends for speech spoofing countermeasures

Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks. For speech anti-spoofing, most countermeasures (CMs) use signal processing algorithms to extract acoustic features for classification. In this study, we use pre-trained self-supervised speech models as the front end of spoofing CMs. We investigated different back-end architectures to be combined with the self-supervised front end, the effectiveness of fine-tuning the front end, and the performance of different pre-trained self-supervised models. Our findings showed that, when a good pre-trained front end was fine-tuned with either a shallow or a deep neural network-based back end on the ASVspoof 2019 logical access (LA) training set, the resulting CM not only achieved a low equal error rate (EER) on the 2019 LA test set but also significantly outperformed the baseline on the ASVspoof 2015, 2021 LA, and 2021 deepfake test sets. A sub-band analysis further demonstrated that the CM mainly used information in a specific frequency band to discriminate between bona fide and spoofed trials across the test sets.
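The equal error rate (EER) is the operating point at which the false-acceptance and false-rejection rates coincide. A minimal sketch for computing it from countermeasure scores, assuming higher scores mean "more likely bona fide" and using a simple sweep over the observed scores (official ASVspoof evaluation tooling may interpolate differently):

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate EER, assuming higher scores mean 'more likely bona fide'.

    Every observed score is tried as a threshold; the function returns the
    mean of the false-acceptance and false-rejection rates at the threshold
    where the two rates are closest.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # bona fide trials scored below t are wrongly rejected
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # spoofed trials scored at or above t are wrongly accepted
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfectly separated scores
```

The sweep is quadratic in the number of trials, which is fine for illustration but not for full evaluation sets.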

preprint2022arXiv

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

Conventional automatic speaker verification systems can usually be decomposed into a front-end model, such as a time delay neural network (TDNN), for extracting speaker embeddings and a back-end model, such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA), for similarity scoring. However, optimizing the front-end and back-end models sequentially may lead to a local minimum, which in theory prevents the whole system from reaching the optimum. Although some methods have been proposed for jointly optimizing the two models, such as the generalized end-to-end (GE2E) model and the NPLDA E2E model, all of them are designed for a single enrollment utterance. In this paper, we propose a new E2E joint method for speaker verification designed especially for the practical case of multiple enrollment utterances. To leverage the relationships among multiple enrollment utterances, our model is equipped with frame-level and utterance-level attention mechanisms. We also use several data augmentation techniques, including conventional noise augmentation with the MUSAN and RIR datasets and a unique speaker embedding-level mixup strategy, for better optimization.
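To illustrate the utterance-level attention idea, here is a toy sketch that pools several enrollment embeddings into one speaker model with softmax attention weights. The single dot-product attention vector `w`, the 2-D embeddings, and cosine scoring (in place of the paper's neural back end) are hypothetical simplifications; frame-level attention and training are not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_enrollment(embeddings, w):
    """Pool N enrollment embeddings into one speaker model.

    Embedding e_i receives weight softmax_i(w . e_i), where `w` is a
    hypothetical learned attention vector; the result is sum_i a_i * e_i.
    """
    weights = softmax([sum(a * b for a, b in zip(w, e)) for e in embeddings])
    dim = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(weights, embeddings)) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity, standing in here for a learned back-end scorer."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


enrollment = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # three toy enrollment embeddings
w = [2.0, 0.0]                                      # hypothetical attention parameters
speaker_model = attentive_enrollment(enrollment, w)
print(cosine(speaker_model, [1.0, 0.0]))            # score for a toy test embedding
```

Compared with plain averaging, the attention weights let enrollment utterances that align with `w` dominate the pooled speaker model.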

preprint2022arXiv

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

Speaker anonymization aims to protect the privacy of speakers while preserving the spoken linguistic information in speech. Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, a speaker encoder, an automatic speech recognition acoustic model (ASR AM), a speech synthesis acoustic model, and a speech waveform generation model. Moreover, because the ASR AM is language-dependent and trained on English data, it is hard to adapt to other languages. In this paper, we propose a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, which can easily be used for other languages. Extensive experiments on the VoicePrivacy Challenge 2020 datasets in English and the AISHELL-3 dataset in Mandarin demonstrate the effectiveness of our proposed SSL-based language-independent speaker anonymization method.

preprint2022arXiv

Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum

Various neural network models have been proposed to tackle combinatorial optimization problems such as the travelling salesman problem (TSP). Existing learning-based TSP methods adopt the simple setting in which training and testing data are independent and identically distributed. However, the existing literature fails to solve TSP instances when training and testing data have different distributions. Concretely, we find that a mismatch between training and testing distributions produces more difficult TSP instances, i.e., the solution obtained by the model has a large gap from the optimal solution. To tackle this problem, we study learning-based TSP methods when training and testing data have different distributions using adaptive hardness, i.e., how difficult a TSP instance is for a solver. This problem is challenging because it is non-trivial to (1) define hardness quantitatively; (2) efficiently and continuously generate sufficiently hard TSP instances during model training; and (3) fully utilize instances with different levels of hardness to learn a more powerful TSP solver. To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances. Then, we propose a hardness-adaptive generator to generate instances with different hardness. We further propose a curriculum learner that fully utilizes these instances to train the TSP solver. Experiments show that our hardness-adaptive generator can generate instances ten times harder than existing methods, and our proposed method achieves significant improvement over state-of-the-art models in terms of the optimality gap.
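One natural way to quantify instance hardness is the relative optimality gap of a solver's tour. The sketch below assumes that definition (the paper's principled measure may be defined differently), uses a nearest-neighbour heuristic as a stand-in for the learned solver, and computes the optimum by brute force, so it only works for a handful of cities (at least three).

```python
import itertools
import math
import random


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def optimal_length(points):
    """Brute-force optimum (city 0 fixed, the rest permuted); tiny instances only."""
    rest = range(1, len(points))
    return min(tour_length(points, (0,) + perm) for perm in itertools.permutations(rest))


def nearest_neighbour_tour(points):
    """Greedy stand-in for a learned solver: always move to the closest unvisited city."""
    tour, unvisited = [0], set(range(1, len(points)))
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def hardness(points, solver=nearest_neighbour_tour):
    """Relative optimality gap of `solver` on this instance (0 means solved optimally)."""
    opt = optimal_length(points)
    return (tour_length(points, solver(points)) - opt) / opt


rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(7)]
print(round(hardness(cities), 3))
```

A hardness-adaptive generator would then perturb instances to increase this gap; that part is not sketched here.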

preprint2022arXiv

Lessons learned from the NeurIPS 2021 MetaDL challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification

Although deep neural networks are capable of achieving performance superior to humans on various tasks, they are notorious for requiring large amounts of data and computing resources, restricting their success to domains where such resources are available. Meta-learning methods can address this problem by transferring knowledge from related tasks, thus reducing the amount of data and computing resources needed to learn new tasks. We organize the MetaDL competition series, which provides opportunities for research groups all over the world to create and experimentally assess new meta-(deep)learning solutions for real problems. In this paper, authored collaboratively by the competition organizers and the top-ranked participants, we describe the design of the competition, the datasets, the best experimental results, and the top-ranked methods in the NeurIPS 2021 challenge, which attracted 15 active teams who made it to the final phase (by outperforming the baseline), making over 100 code submissions during the feedback phase. The solutions of the top participants have been open-sourced. The lessons learned include that learning good representations is essential for effective transfer learning.

preprint2022arXiv

Lyman Continuum Galaxy Candidates in COSMOS

Star-forming galaxies are the sources likely to have reionized the universe. As we cannot observe them directly due to the opacity of the intergalactic medium at $z\gtrsim5$, we study $z\sim3\text{--}5$ galaxies as proxies to place observational constraints on cosmic reionization. Using new deep \textit{Hubble Space Telescope} rest-frame UV F336W and F435W imaging (30-orbit, $\sim40$~arcmin$^2$, $\sim29\text{--}30$~mag depth at $5\sigma$), we attempt to identify a sample of Lyman continuum galaxies (LCGs). These are individual sources that emit ionizing flux below the Lyman break ($<912~\text{\AA}$). This population would allow us to constrain cosmic reionization parameters such as the number density and escape fraction ($f_{\rm esc}$) of ionizing sources. We compile a comprehensive parent sample that does not rely on the Lyman-break technique for redshifts. We present three new spectroscopic candidates at $z\sim3.7\text{--}4.4$ and 32 new photometric candidates. The high-resolution multi-band HST imaging and new Keck/Low Resolution Imaging Spectrometer (LRIS) redshifts make these promising spectroscopic LCG candidates. Using both a traditional and a probabilistic approach, we find that the most likely $f_{\rm esc}$ values for the three spectroscopic LCG candidates are $>100\%$ and therefore unphysical. We are unable to confirm the true nature of these sources with the best available imaging and direct blue Keck/LRIS spectroscopy. More spectra, especially from the new class of 30 m telescopes, will be required to build a statistical sample of LCGs to place firm observational constraints on cosmic reionization.

preprint2022arXiv

Mask Wearing Status Estimation with Smartwatches

We present MaskReminder, an automatic mask-wearing status estimation system based on smartwatches that reminds users who may be exposed to COVID-19 transmission scenarios to wear a mask. Using the powerful MLP-Mixer deep learning model, MaskReminder can effectively learn long- and short-range information from inertial measurement unit readings and recognize mask-related hand movements such as putting on a mask, lowering the metal strap of the mask, and removing the strap from behind one ear. Extensive experiments with 20 volunteers and more than 8,000 data samples show that the average recognition accuracy is 89%. Moreover, MaskReminder can remind a user to wear a mask with a success rate of 90%, even in the user-independent setting.

preprint2022arXiv

Medical Matting: A New Perspective on Medical Segmentation with Uncertainty

It is difficult to accurately label ambiguous, complex-shaped targets manually with binary masks. This limited expressiveness of binary masks is especially pronounced in medical image segmentation, where blurring is prevalent. With multiple annotations, reaching a consensus among clinicians through binary masks is even more challenging. Moreover, these uncertain areas are related to the lesions' structure and may contain anatomical information beneficial to diagnosis. However, current studies on uncertainty mainly focus on the uncertainty in model training and data labels; none of them investigates the influence of the ambiguous nature of the lesion itself. Inspired by image matting, this paper introduces the alpha matte as a soft mask to represent uncertain areas in medical scenes and accordingly puts forward a new uncertainty quantification method to fill the gap in uncertainty research on lesion structure. In this work, we introduce a new architecture that generates binary masks and alpha mattes in a multitask framework and outperforms all compared state-of-the-art matting algorithms. The proposed uncertainty map is able to highlight ambiguous regions, and a novel multitask loss weighting strategy we present further improves performance; we demonstrate the concrete benefits of both. To fully evaluate the effectiveness of our proposed method, we first labelled three medical datasets with alpha mattes to address the shortage of matting datasets in medical scenes, and we show, both qualitatively and quantitatively, that the alpha matte is a more efficient labeling method than a binary mask.

preprint2022arXiv

Microscopic theory on magnetic-field-tuned sweet spot of exchange interactions in multielectron quantum-dot systems

The exchange interaction in a singlet-triplet qubit defined by two-electron states in a double-quantum-dot system ("two-electron singlet-triplet qubit") typically varies monotonically with detuning and thus has no sweet spot. Here we study a singlet-triplet qubit defined by four-electron states in a double-quantum-dot system ("four-electron singlet-triplet qubit"). Using configuration-interaction calculations, we demonstrate that in the four-electron singlet-triplet qubit the exchange energy as a function of detuning can be non-monotonic, suggesting the existence of sweet spots. We further show that the tuning of the sweet spot and the corresponding exchange energy by a perpendicular magnetic field can be related to the variation of the orbital splitting. Our results suggest that a singlet-triplet qubit with more than two electrons can have advantages for realizing quantum computing.

preprint2022arXiv

Mitigating barren plateaus of variational quantum eigensolvers

Variational quantum algorithms (VQAs) are expected to enable valuable applications on near-term quantum computers. However, recent works have pointed out that the performance of VQAs relies heavily on the expressibility of the ansatzes and is seriously limited by optimization issues such as barren plateaus (i.e., vanishing gradients). This work proposes the state efficient ansatz (SEA) for accurate ground state preparation with improved trainability. We show that the SEA can generate an arbitrary pure state with far fewer parameters than a universal ansatz, making it efficient for tasks like ground state estimation. Then, we prove that barren plateaus can be efficiently mitigated by the SEA and that the trainability can be further improved at most quadratically by flexibly adjusting the entangling capability of the SEA. Finally, we investigate a plethora of ground state estimation examples, where we obtain significant improvements in the magnitude of the cost gradient and in convergence speed.

preprint2022arXiv

Muffin: Testing Deep Learning Libraries via Neural Architecture Fuzzing

Deep learning (DL) techniques have proven effective in many challenging tasks and have become widely adopted in practice. However, previous work has shown that DL libraries, the basis for building and executing DL models, contain bugs that can cause severe consequences. Unfortunately, existing testing approaches still cannot comprehensively exercise DL libraries: they use existing trained models and only detect bugs in the model inference phase. In this work, we propose Muffin to address these issues. To this end, Muffin applies a specifically designed model fuzzing approach, which allows it to generate diverse DL models to explore the target library instead of relying only on existing trained models. Muffin makes differential testing feasible in the model training phase by tailoring a set of metrics to measure the inconsistencies between different DL libraries. In this way, Muffin can exercise the library code more thoroughly and detect more bugs. To evaluate the effectiveness of Muffin, we conduct experiments on three widely used DL libraries. The results demonstrate that Muffin can detect 39 new bugs in the latest release versions of popular DL libraries, including TensorFlow, CNTK, and Theano.
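The core of differential testing is comparing what several libraries produce for the same model and input. A hypothetical sketch of such an inconsistency check follows; the library names, the flat-list output format, and the tolerance are assumptions, and Muffin's own training-phase metrics are not reproduced here.

```python
import math


def max_inconsistency(outputs):
    """Largest element-wise |difference| between any two libraries' outputs.

    outputs: dict mapping a (hypothetical) library name to a flat list of
    floats produced by the same model on the same input. A NaN on only one
    side, or a length mismatch, counts as an infinite inconsistency.
    """
    names = sorted(outputs)
    worst = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xs, ys = outputs[a], outputs[b]
            if len(xs) != len(ys):
                return math.inf
            for x, y in zip(xs, ys):
                if math.isnan(x) or math.isnan(y):
                    if math.isnan(x) != math.isnan(y):
                        return math.inf
                    continue  # both NaN: treated as consistent
                worst = max(worst, abs(x - y))
    return worst


def is_inconsistent(outputs, tol=1e-3):
    """Flag a potential library bug when outputs disagree by more than `tol`."""
    return max_inconsistency(outputs) > tol


print(is_inconsistent({"lib_a": [0.12, 0.88], "lib_b": [0.12, 0.87]}))
```

Using a tolerance rather than exact equality avoids flagging ordinary floating-point differences between backends as bugs.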

preprint2022arXiv

Muon $(g-2)$ and Flavor Puzzles in the $U(1)^{}_{X}$-gauged Leptoquark Model

We present an economical model where an $S^{}_1$ leptoquark and an anomaly-free $U(1)^{}_X$ gauge symmetry with $X = B^{}_3-2L^{}_μ/3-L^{}_τ/3$ are introduced, to account for the muon anomalous magnetic moment $a^{}_μ\equiv (g^{}_μ-2)$ and flavor puzzles including $R^{}_{K^{(\ast)_{}}}$ and $R^{}_{D^{(\ast)_{}}}$ anomalies together with quark and lepton flavor mixing. The $Z^\prime_{}$ gauge boson associated with the $U(1)^{}_X$ symmetry is responsible for the $R^{}_{K^{(\ast)_{}}}$ anomaly. Meanwhile, the specific flavor mixing patterns of quarks and leptons can be generated after the spontaneous breakdown of the $U(1)^{}_X$ gauge symmetry via the Froggatt-Nielsen mechanism. The $S^{}_1$ leptoquark which is also charged under the $U(1)^{}_X$ gauge symmetry can simultaneously explain the latest muon $(g-2)$ result and the $R^{}_{D^{(\ast)_{}}}$ anomaly. In addition, we also discuss several other experimental constraints on our model.

preprint2022arXiv

NeurIPS'22 Cross-Domain MetaDL competition: Design and baseline results

We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on "cross-domain" meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of efficiently learning N-way k-shot tasks (i.e., N-class classification problems with k training examples per class), this competition challenges the participants to solve "any-way" and "any-shot" problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of "ways" (within the range 2-20) and any number of "shots" (within the range 1-20). The competition requires code submission and is fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
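A sketch of how an "any-way, any-shot" episode could be carved out of a pool of labelled examples, using the ranges given above (2-20 ways, 1-20 shots). The dataset layout, the one-query-per-class default, and the uniform sampling of N and k are assumptions, not the competition's actual sampler.

```python
import random


def sample_episode(dataset, rng, ways=(2, 20), shots=(1, 20), queries_per_class=1):
    """Sample one any-way any-shot episode.

    dataset: dict mapping a class label to a list of example ids; at least
    ways[0] classes must have shots[0] + queries_per_class examples.
    Returns (n_way, k_shot, support, query), where support and query are
    lists of (example_id, label) pairs.
    """
    eligible = [c for c, xs in dataset.items() if len(xs) >= shots[0] + queries_per_class]
    n_way = rng.randint(ways[0], min(ways[1], len(eligible)))
    classes = rng.sample(eligible, n_way)
    # cap k so every chosen class still has enough examples left for the query set
    max_shot = min(shots[1], min(len(dataset[c]) for c in classes) - queries_per_class)
    k_shot = rng.randint(shots[0], max_shot)
    support, query = [], []
    for label in classes:
        picked = rng.sample(dataset[label], k_shot + queries_per_class)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return n_way, k_shot, support, query


toy = {f"class_{c}": list(range(100 * c, 100 * c + 30)) for c in range(25)}
n_way, k_shot, support, query = sample_episode(toy, random.Random(0))
print(n_way, k_shot, len(support), len(query))
```

Support and query examples are drawn without replacement from the same per-class sample, so they never overlap within an episode.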

preprint2022arXiv

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

Recent privacy leakage incidents and stricter policy regulations demand a much higher standard of compliance from companies and mobile apps. However, such obligations also impose significant challenges on app developers, since these regulations cover various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or have limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which can generate policies from the developer's natural language descriptions while also ensuring that the app's functionalities comply with the General Data Protection Regulation (GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA (Open Information Annotation), developed by Baidu Cognitive Computing Lab. At its core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study to understand the challenges in extracting privacy-centric information and generating privacy policies, while exploiting optimizations for this specific task. With NL2GDPR, we achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, processing, and sharing, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR-compliant policies, requiring only a natural language description of the app features. Note that other non-GDPR-related features might be integrated with the generated features to build a complex app.

preprint2022arXiv

Nonadiabatic geometric quantum computation with cat qubits via invariant-based reverse engineering

We propose a protocol to realize nonadiabatic geometric quantum computation of small-amplitude Schrödinger cat qubits via invariant-based reverse engineering. We consider a system with a two-photon driven Kerr nonlinearity, which provides a pair of dressed even and odd coherent states, i.e., Schrödinger cat states for fault-tolerant quantum computations. An additional coherent field is applied to linearly drive a cavity mode, to induce oscillations between dressed cat states. By designing this linear drive with invariant-based reverse engineering, nonadiabatic geometric quantum computation with cat qubits can be implemented. The performance of the protocol is estimated by taking into account the influence of systematic errors, additive white Gaussian noise, and decoherence including photon loss and dephasing. Numerical results demonstrate that our protocol is robust against these negative factors. Therefore, this protocol may provide a feasible method for nonadiabatic geometric quantum computation in bosonic systems.

preprint2022arXiv

Nonreciprocal waveguide-QED for spinning cavities with multiple coupling points

We investigate chiral emission and single-photon scattering of spinning cavities coupled to a meandering waveguide at multiple coupling points. We show that nonreciprocal photon transmission occurs in the cavities-waveguide system, stemming from interference effects among different coupling points and from frequency shifts induced by the Sagnac effect. The nonlocal interference is akin to the mechanism in giant atoms. In the single-cavity setup, by optimizing the spinning velocity and the number of coupling points, the chiral factor can approach 1, and the chiral direction can be freely switched. Moreover, destructive interference gives rise to complete photon transmission in one direction over the whole optical frequency band, which has no analogue in other quantum setups. We also investigate photon transport properties in the multiple-cavity system. The results indicate a directional information flow between different nodes. Our proposal provides a novel way to achieve quantum nonreciprocal devices, which can be applied in large-scale quantum chiral networks with optical waveguides.

preprint2022arXiv

One dimensional reduced model for ITER relevant energetic particle transport

We set up a mapping procedure able to translate the evolution of the radial profile of fast ions interacting with Toroidal Alfvén Eigenmodes into the dynamics of an equivalent one-dimensional bump-on-tail system. We apply this mapping technique to reproduce ITER-relevant simulations, which clearly outlined deviations from the diffusive quasi-linear (QL) model. Our analysis demonstrates the capability of one-dimensional beam-plasma dynamics to predict the relevant features of the non-linear hybrid LIGKA/HAGIS simulations. In particular, we clearly identify that the deviation from the QL evolutive profiles is due to the presence of avalanche processes. We also analyze the reduced dimensionality in detail by means of phase-space slicing based on constants of motion. In the conclusions, we outline the main criticalities and outcomes of the procedure, which must be satisfactorily addressed to make quantitative predictions of the observed outgoing fluxes in a tokamak device.

preprint2022arXiv

Open-Eye: An Open Platform to Study Human Performance on Identifying AI-Synthesized Faces

AI-synthesized faces are visually difficult to distinguish from real ones. They have been used as profile images for fake social media accounts, with significant negative social impact. Although progress has been made in developing automatic methods to detect AI-synthesized faces, there is no open platform to study human performance in detecting AI-synthesized faces. In this work, we develop an online platform called Open-eye to study human performance in AI-synthesized face detection. We describe the design and workflow of Open-eye in this paper.

preprint2022arXiv

Out-Of-Distribution Generalization on Graphs: A Survey

Graph machine learning has been extensively studied in both academia and industry. Although booming with a vast number of emerging methods and techniques, most of the literature is built on the in-distribution hypothesis, i.e., testing and training graph data are identically distributed. However, this in-distribution hypothesis can hardly be satisfied in many real-world graph scenarios where the model performance substantially degrades when there exist distribution shifts between testing and training graph data. To solve this critical problem, out-of-distribution (OOD) generalization on graphs, which goes beyond the in-distribution hypothesis, has made great progress and attracted ever-increasing attention from the research community. In this paper, we comprehensively survey OOD generalization on graphs and present a detailed review of recent advances in this area. First, we provide a formal problem definition of OOD generalization on graphs. Second, we categorize existing methods into three classes from conceptually different perspectives, i.e., data, model, and learning strategy, based on their positions in the graph machine learning pipeline, followed by detailed discussions for each category. We also review the theories related to OOD generalization on graphs and introduce the commonly used graph datasets for thorough evaluations. Finally, we share our insights on future research directions. This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.

preprint2022arXiv

Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation

Real human conversation data are complicated, heterogeneous, and noisy, which makes building open-domain dialogue systems from them a challenging task. Nevertheless, such dialogue data still contain a wealth of information and knowledge that has not been fully explored. In this paper, we show that existing open-domain dialogue generation methods, which memorize context-response paired data with autoregressive or encoder-decoder language models, underutilize the training data. Unlike current approaches that use external knowledge, we explore a retrieval-generation training framework that takes advantage of the heterogeneous and noisy training data by treating them as "evidence". In particular, we use BERTScore for retrieval, which yields higher-quality evidence and generation. Experiments on publicly available datasets demonstrate that our method helps models generate better responses, even though such training data are usually regarded as low quality. The resulting performance gain is comparable to, or even better than, that obtained by enlarging the training set. We also found that model performance is positively correlated with the relevance of the retrieved evidence. Moreover, our method performed well in zero-shot experiments, which indicates that it can be more robust to real-world data.
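BERTScore compares two texts by greedily matching each token embedding to its most similar counterpart on the other side. Below is a minimal sketch of that matching, plus picking the best-scoring entry from an evidence pool. It assumes token embeddings are already computed (in practice by a pretrained contextual encoder) and omits the idf weighting and baseline rescaling the full metric can apply; `retrieve_evidence` is a hypothetical helper, and how retrieved evidence feeds the generator is not shown.

```python
import math


def _unit(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def bertscore_f1(candidate, reference):
    """Greedy-matching BERTScore F1 from precomputed token embeddings."""
    cand = [_unit(v) for v in candidate]
    ref = [_unit(v) for v in reference]
    # sim[i][j] = cosine similarity of reference token i and candidate token j
    sim = [[sum(a * b for a, b in zip(r, c)) for c in cand] for r in ref]
    recall = sum(max(row) for row in sim) / len(ref)               # best match per reference token
    precision = sum(max(col) for col in zip(*sim)) / len(cand)     # best match per candidate token
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def retrieve_evidence(query, pool):
    """Index of the pool entry whose token embeddings score highest against `query`."""
    return max(range(len(pool)), key=lambda i: bertscore_f1(pool[i], query))


pool = [[[0.0, 1.0]], [[1.0, 0.1]], [[-1.0, 0.0]]]  # toy one-token "utterances"
print(retrieve_evidence([[1.0, 0.0]], pool))
```

Because each token only needs its best match, the score tolerates paraphrases and word-order changes better than exact n-gram overlap.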

preprint2022arXiv

Receiver Design for MIMO Unsourced Random Access with SKP Coding

In this letter, we extend the sparse Kronecker-product (SKP) coding scheme, originally designed for the additive white Gaussian noise (AWGN) channel, to multiple-input multiple-output (MIMO) unsourced random access (URA). With SKP coding adopted for MIMO transmission, we develop an efficient Bayesian iterative receiver to solve the resulting challenging trilinear factorization problem. Numerical results show that the proposed design outperforms existing counterparts and performs well in all simulated settings with various antenna sizes and numbers of active users.

preprint2022arXiv

ReFormer: The Relational Transformer for Image Captioning

Image captioning has been shown to achieve better performance when scene graphs are used to represent the relations of objects in the image. Current captioning encoders generally use a Graph Convolutional Net (GCN) to represent the relation information and merge it with the object region features via concatenation or convolution to obtain the final input for sentence decoding. However, the GCN-based encoders in existing methods are less effective for captioning for two reasons. First, using image captioning as the objective (i.e., Maximum Likelihood Estimation) rather than a relation-centric loss cannot fully explore the potential of the encoder. Second, using a pre-trained model instead of the encoder itself to extract the relationships is not flexible and cannot contribute to the explainability of the model. To improve the quality of image captioning, we propose a novel architecture, ReFormer (a RElational transFORMER), to generate features with relation information embedded and to explicitly express the pair-wise relationships between objects in the image. ReFormer incorporates the objective of scene graph generation with that of image captioning using one modified Transformer model. This design allows ReFormer to generate not only better image captions, with the benefit of extracting strong relational image features, but also scene graphs that explicitly describe the pair-wise relationships. Experiments on publicly available datasets show that our model significantly outperforms state-of-the-art methods on image captioning and scene graph generation.

preprint2022arXiv

Robust Attentive Deep Neural Network for Exposing GAN-generated Faces

GAN-based techniques that generate and synthesize realistic faces have caused severe social concerns and security problems. Existing methods for detecting GAN-generated faces perform well on limited public datasets. However, images from existing public datasets do not represent real-world scenarios well enough in terms of view variations and data distributions (where real faces largely outnumber synthetic faces). State-of-the-art methods do not generalize well to real-world problems and lack interpretability of their detection results. The performance of existing GAN-face detection models degrades significantly when facing imbalanced data distributions. To address these shortcomings, we propose a robust, attentive, end-to-end network that can spot GAN-generated faces by analyzing their eye inconsistencies. Specifically, our model learns to identify inconsistent eye components by automatically localizing and comparing the iris artifacts between the two eyes. Our deep network addresses the imbalanced learning issue by jointly considering the AUC loss and the traditional cross-entropy loss. Comprehensive evaluations on the FFHQ dataset in both balanced and imbalanced scenarios demonstrate the superiority of the proposed method.
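A sketch of combining a cross-entropy term with a pairwise AUC surrogate. The squared-hinge surrogate, the margin of 1, the weight 0.5, and the labelling convention (1 = GAN-generated, 0 = real) are assumptions; the paper's AUC loss may be formulated differently.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(logits, labels, eps=1e-12):
    """Mean binary cross-entropy on raw logits (probabilities clamped away from 0)."""
    probs = [sigmoid(z) for z in logits]
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in zip(probs, labels)) / len(labels)


def pairwise_auc_loss(logits, labels, margin=1.0):
    """Squared-hinge surrogate for 1 - AUC, averaged over all (fake, real) pairs."""
    pos = [z for z, y in zip(logits, labels) if y == 1]
    neg = [z for z, y in zip(logits, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def joint_loss(logits, labels, auc_weight=0.5):
    """Cross-entropy plus a weighted AUC term (the weight is an assumed value)."""
    return cross_entropy(logits, labels) + auc_weight * pairwise_auc_loss(logits, labels)


print(joint_loss([2.0, -1.0, -3.0, 0.5], [1, 0, 0, 0]))
```

Because the AUC term averages over (fake, real) pairs, duplicating every real example in a batch leaves it unchanged, which is why such a term is attractive when real faces heavily outnumber synthetic ones.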

preprint2022arXiv

Robust Contrastive Learning against Noisy Views

Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justifications by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound for mutual information maximization based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements over the state-of-the-art on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.
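For reference, for one anchor with positive similarity $s^+$ and negative similarities $s_i^-$, InfoNCE is $-\log\frac{e^{s^+}}{e^{s^+}+\sum_i e^{s_i^-}}$. The sketch below sets it next to a robust variant of the form $-e^{q s^+}/q + \big(\lambda(e^{s^+}+\sum_i e^{s_i^-})\big)^q/q$. We believe this matches the RINCE loss associated with this work, but treat the exact formula and the illustrative defaults `q=0.5`, `lam=0.01` as assumptions to check against the paper. Similarities are plain floats here (e.g., scaled cosine similarities).

```python
import math


def info_nce(s_pos, s_negs):
    """-log(e^{s+} / (e^{s+} + sum_i e^{s-_i})), computed via log-sum-exp for stability."""
    m = max([s_pos, *s_negs])
    log_z = m + math.log(math.exp(s_pos - m) + sum(math.exp(s - m) for s in s_negs))
    return log_z - s_pos


def rince(s_pos, s_negs, q=0.5, lam=0.01):
    """Assumed robust variant: -e^{q s+}/q + (lam * (e^{s+} + sum_i e^{s-_i}))^q / q."""
    z = math.exp(s_pos) + sum(math.exp(s) for s in s_negs)
    return -math.exp(q * s_pos) / q + (lam * z) ** q / q


print(info_nce(0.8, [0.1, 0.3]), rince(0.8, [0.1, 0.3]))
```

With `lam=1`, letting `q` approach 0 recovers InfoNCE, while `q=1` gives the simpler form `lam * (e^{s+} + sum e^{s-}) - e^{s+}`; both limits are checked numerically below.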

preprint2022arXiv

Robust entangling gate for capacitively coupled few-electron singlet-triplet qubits

The search for a sweet spot, a locus in qubit parameter space where quantum control is first-order insensitive to noise, is key to achieving high-fidelity quantum gates. Efforts to find such a sweet spot in conventional double-quantum-dot singlet-triplet qubits where each dot hosts one electron ("two-electron singlet-triplet qubit"), especially for two-qubit operations, have been unsuccessful. Here we consider singlet-triplet qubits that allow each dot to host more than one electron, with a total of four electrons in the double quantum dots ("four-electron singlet-triplet qubit"). Using configuration-interaction calculations, we theoretically demonstrate that sweet spots appear in this coupled qubit system. We further demonstrate that, under realistic charge noise and hyperfine noise, two-qubit operations at the proposed sweet spot could offer gate fidelities ($\sim99\%$) higher than those of the conventional two-electron singlet-triplet qubit system ($\sim90\%$). Our results should facilitate the realization of high-fidelity two-qubit gates in singlet-triplet qubit systems.

preprint2022arXiv

Scene Recognition with Objectness, Attribute and Category Learning

Scene classification has established itself as a challenging research problem. Compared with images of individual objects, scene images can be much more semantically complex and abstract; the two differ mainly in the level of granularity of recognition. Yet image recognition serves as a key pillar for good scene recognition performance, as the knowledge attained from object images can be used to recognize scenes accurately. Existing scene recognition methods take only the category label of the scene into consideration. However, we find that contextual information containing detailed local descriptions also helps make the scene recognition model more discriminative. In this paper, we aim to improve scene recognition using attribute and category label information encoded in objects. Based on the complementarity of attribute and category labels, we propose a Multi-task Attribute-Scene Recognition (MASR) network, which learns a category embedding and at the same time predicts scene attributes. Attribute acquisition and object annotation are tedious and time-consuming tasks. We tackle this problem by proposing a partially supervised annotation strategy in which human intervention is significantly reduced. The strategy provides a much more cost-effective solution for real-world scenarios and requires considerably less annotation effort. Moreover, we re-weight the attribute predictions according to the level of importance indicated by the object detection scores. Using the proposed method, we efficiently annotate attribute labels for four large-scale datasets and systematically investigate how scene and attribute recognition benefit from each other. The experimental results demonstrate that MASR learns a more discriminative representation and achieves competitive recognition performance compared with state-of-the-art methods.

preprint2022arXiv

Self-directed Machine Learning

Conventional machine learning (ML) relies heavily on manual design by machine learning experts to decide learning tasks, data, models, optimization algorithms, and evaluation metrics; this is labor-intensive and time-consuming, and the resulting systems cannot learn autonomously like humans. In education science, self-directed learning, where human learners select learning tasks and materials on their own without requiring hands-on guidance, has been shown to be more effective than passive teacher-guided learning. Inspired by the concept of self-directed human learning, we introduce the principal concept of Self-directed Machine Learning (SDML) and propose a framework for SDML. Specifically, we design SDML as a self-directed learning process guided by self-awareness, including internal awareness and external awareness. Our proposed SDML process benefits from self task selection, self data selection, self model selection, self optimization strategy selection, and self evaluation metric selection through self-awareness without human guidance. Meanwhile, the learning performance of the SDML process serves as feedback to further improve self-awareness. We propose a mathematical formulation for SDML based on multi-level optimization. Furthermore, we present case studies together with potential applications of SDML, followed by a discussion of future research directions. We expect that SDML could enable machines to conduct human-like self-directed learning and provide a new perspective towards artificial general intelligence.

preprint2022arXiv

Sensitivity tests of cosmic velocity fields to massive neutrinos

We investigate the impact of massive neutrinos on cosmic velocity fields, employing high-resolution cosmological $N$-body simulations run with the information-optimized CUBE code, in which cosmic neutrinos are evolved using collisionless hydrodynamics and their perturbations can be accurately resolved. In this study we focus, for the first time, on the massive-neutrino-induced suppression of various cosmic velocity field components: velocity magnitude, divergence, vorticity, and dispersion. Varying the neutrino mass sum $M_ν$ from 0 to 0.4 eV, the simulations show that the vorticity power spectrum, which is sourced exclusively by non-linear structure formation that massive neutrinos affect significantly, is very sensitive to the mass sum and could provide novel signatures for detecting massive neutrinos. Furthermore, using the chi-square statistic, we quantitatively test the sensitivity of the density and velocity power spectra to the neutrino mass sum. We find that the vorticity spectrum has the highest sensitivity, and that the null hypothesis of massless neutrinos is incompatible with both the vorticity and divergence spectra from $M_ν=0.1$ eV at high significance ($p$-value $= 0.03$ and $0.07$, respectively). These results clearly demonstrate the importance of peculiar velocity field measurements, in particular of the vorticity and divergence components, for determining the neutrino mass and mass hierarchy.
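
A minimal sketch of a chi-square comparison between two binned power spectra (for example, a massive-neutrino run against the massless reference), assuming independent Gaussian errors in each k-bin. The diagonal-covariance assumption and the function name are illustrative; the abstract does not specify the covariance used.

```python
def chi_square(model_spectrum, reference_spectrum, sigma):
    """Chi-square between two binned power spectra.

    Assumes independent Gaussian errors sigma[k] per k-bin (diagonal
    covariance): chi2 = sum_k ((P_model(k) - P_ref(k)) / sigma(k))**2.
    """
    if not (len(model_spectrum) == len(reference_spectrum) == len(sigma)):
        raise ValueError("all inputs must have the same number of bins")
    return sum(((m - r) / s) ** 2
               for m, r, s in zip(model_spectrum, reference_spectrum, sigma))
```

Converting the statistic into a p-value additionally requires the chi-square survival function with the appropriate number of degrees of freedom, e.g. `scipy.stats.chi2.sf(chi2_value, dof)`.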

preprint2022arXiv

Sign-switching of superexchange mediated by few electrons under non-uniform magnetic field

Long-range interaction between distant spins is an important building block for realizing large quantum-dot networks in which couplings between pairs of spins can be selectively addressed. Recent experiments on coherent oscillation of logical states between remote spins, facilitated by intermediate electron states, have taken the first step toward large-scale quantum information processing. Reaching this ultimate goal requires extensive studies of the superexchange interaction for different quantum-dot spatial arrangements and electron configurations. Here, we consider a linear triple quantum dot in which two anti-parallel spins in the outer dots form the logical states, while a varying number of electrons in the middle dot forms a mediator that facilitates the superexchange interaction. We show that the superexchange is enhanced as the number of mediating electrons increases. In addition, we show that forming a four-electron triplet in the mediator dot further enhances the superexchange strength. Our work can serve as a guide for scaling up quantum-dot arrays with controllable and dense connectivity.

preprint2022arXiv

Social Distancing Alert with Smartwatches

Social distancing is an effective public health practice during the COVID-19 pandemic. However, people may unconsciously violate social distancing when they engage in social activities such as handshaking, hugging, or kissing on the face or forehead. In this paper, we present SoDA, a smartwatch-based alert system for social distancing violations, aimed at preventing COVID-19 virus transmission. SoDA uses accelerometer and gyroscope recordings to recognize activities that may violate social distancing, using simple yet effective Vision Transformer models. Extensive experiments with 10 volunteers and more than 1,800 samples demonstrate that SoDA recognizes social activities with 94.7% accuracy, a 1.8% negative-alert rate, and a 2.2% missed-alert rate.

preprint2022arXiv

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods iteratively optimize an objective function via a specific deformation model, with meticulous parameter tuning, but have limited capability to register images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation fields, they are regression-based and prone to getting stuck in local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key idea is to warp a moving image successively at each time step until it is finally aligned with a fixed image. Because it is challenging to handle high-dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept, the 'Plan', into the standard Actor-Critic model; the plan is low-dimensional and helps the actor generate a tractable high-dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results show that our method achieves consistent, significant gains and outperforms state-of-the-art methods.

preprint2022arXiv

Submillimetre galaxies in two massive protoclusters at z = 2.24: witnessing the enrichment of extreme starbursts in the outskirts of HAE density peaks

Submillimetre galaxies (SMGs) represent a rapid growth phase of both star formation and massive galaxies. Mapping SMGs in galaxy protoclusters provides key insights into where and how these extreme starbursts take place in connection with the assembly of large-scale structure in the early Universe. We search for SMGs at 850$\,\mu$m using JCMT/SCUBA-2 in two massive protoclusters at $z=2.24$, BOSS1244 and BOSS1542, and detect 43 and 54 sources, respectively, with $S_{850}>4\,$mJy at the $4σ$ level within an effective area of 264$\,$arcmin$^2$. We construct the intrinsic number counts and find that the abundance of SMGs is $2.0\pm0.3$ and $2.1\pm0.2$ times that of the general fields, confirming that BOSS1244 and BOSS1542 contain a higher fraction of dusty galaxies with strongly enhanced star formation. The volume densities of the SMGs are estimated to be $\sim15$--30 times the average, significantly higher than the overdensity factor ($\sim 6$) traced by H$α$ emission-line galaxies (HAEs). More importantly, we discover a prominent offset between the spatial distributions of the two populations in these two protoclusters: SMGs are mostly located around the high-density regions of HAEs, and few are seen inside these regions. This finding may reveal, for the first time, violent star formation enhancement in the outskirts of the HAE density peaks, likely driven by boosted gas supplies and/or starburst-triggering events. Meanwhile, the lack of SMGs inside the most overdense regions at $z\sim2$ implies a transition to an environment that disfavours extreme starbursts.

preprint2022arXiv

Sum of Ranked Range Loss for Supervised Learning

In forming learning objectives, one often needs to aggregate a set of individual values into a single output. Such cases occur in the aggregate loss, which combines the individual losses of a learning model over the training samples, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to forming learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two machine learning applications of SoRR minimization, namely the AoRR aggregate loss for binary/multi-class classification at the sample level and the TKML individual loss for multi-label/multi-class classification at the label level. A combination of the AoRR and TKML losses is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in samples and labels alike. Our empirical results highlight the effectiveness of the proposed optimization frameworks and demonstrate the applicability of the proposed losses on synthetic and real data sets.
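
The ranked-range definition translates directly into code. The sketch below (function names are hypothetical) computes the sum of the values ranked k+1 through m in descending order as the difference of two top-k sums. Each top-k sum is convex in the values, so SoRR is a difference of convex functions, which is the structure DCA exploits. Averaging over the range gives an AoRR-style aggregate loss that ignores the k largest losses (potential outliers) as well as everything ranked below m.

```python
def top_k_sum(values, k):
    """Sum of the k largest values (a convex function of the values)."""
    return sum(sorted(values, reverse=True)[:k])


def sum_of_ranked_range(values, k, m):
    """Sum of the values ranked k+1 .. m in descending order (0 <= k < m <= n).

    Written as top-m sum minus top-k sum, i.e. a difference of convex functions.
    """
    if not 0 <= k < m <= len(values):
        raise ValueError("need 0 <= k < m <= len(values)")
    return top_k_sum(values, m) - top_k_sum(values, k)


def average_of_ranked_range(losses, k, m):
    """AoRR-style aggregate: mean of the individual losses ranked k+1 .. m."""
    return sum_of_ranked_range(losses, k, m) / (m - k)
```

With losses `[5.0, 0.1, 3.0, 2.0, 0.5]`, k = 1 and m = 3, the outlier 5.0 is excluded and the range covers 3.0 and 2.0, so the sum is 5.0 and the average is 2.5.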

preprint2022arXiv

SWIPENET: Object detection in noisy underwater images

In recent years, deep learning-based object detection methods have achieved promising performance in controlled environments. However, these methods lack sufficient capability to handle underwater object detection because of two challenges: (1) images in underwater datasets and real applications are blurry and accompanied by severe noise that confuses the detectors, and (2) objects in real applications are usually small. In this paper, we propose a novel Sample-WeIghted hyPEr Network (SWIPENET) and a robust training paradigm named Curriculum Multi-Class Adaboost (CMA) to address both problems at the same time. First, the backbone of SWIPENET produces multiple high-resolution and semantically rich Hyper Feature Maps, which significantly improve small object detection. Second, a novel sample-weighted detection loss function is designed for SWIPENET, which focuses on learning high-weight samples and ignores low-weight samples. Moreover, inspired by the human education process, which progresses from easy to hard concepts, we propose the CMA training paradigm, which first trains a clean detector that is free from the influence of noisy data. Then, based on the clean detector, multiple detectors focusing on learning diverse noisy data are trained and incorporated into a unified deep ensemble with strong noise immunity. Experiments on two underwater robot picking contest datasets (URPC2017 and URPC2018) show that the proposed SWIPENET+CMA framework achieves better object detection accuracy than several state-of-the-art approaches.

preprint2022arXiv

Synergistic Network Learning and Label Correction for Noise-robust Image Classification

Large training datasets almost always contain examples with inaccurate or incorrect labels. Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice. To address this problem, we propose a robust label correction framework combining the ideas of small-loss selection and noise correction, which learns network parameters and reassigns ground-truth labels iteratively. Exploiting the tendency of DNNs to learn meaningful patterns before fitting noise, our framework first trains two networks over the current dataset with small-loss selection. Based on the classification loss and agreement loss of the two networks, we can measure the confidence of the training data. More and more confident samples are selected for label correction during the learning process. We demonstrate our method on both synthetic and real-world datasets with different noise types and rates, including CIFAR-10, CIFAR-100 and Clothing1M, where our method outperforms the baseline approaches.
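
Small-loss selection, which the framework uses when training its two networks, can be sketched as keeping a fixed fraction of the samples with the lowest loss. This is a generic illustration: the keep ratio, the rounding rule, and the function name are assumptions, and the paper's confidence measure additionally uses the agreement loss between the two networks, which is not modeled here.

```python
def small_loss_selection(losses, keep_ratio):
    """Return the (sorted) indices of the keep_ratio fraction of lowest-loss samples.

    Low-loss samples are treated as likely clean, since networks tend to fit
    clean patterns before memorizing noisy labels.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])
```

For example, with per-sample losses `[0.9, 0.1, 0.5, 2.3, 0.2]` and a keep ratio of 0.6, samples 1, 2 and 4 are kept and the high-loss samples 0 and 3 are set aside as possibly mislabeled.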

preprint2022arXiv

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis

Speech synthesis and music audio generation from symbolic input differ in many aspects but share some similarities. In this study, we investigate how text-to-speech (TTS) synthesis techniques can be used for piano MIDI-to-audio synthesis tasks. Our investigation includes Tacotron and neural source-filter waveform models as the basic components, with which we build MIDI-to-audio synthesis systems in a way similar to TTS frameworks. We also include reference systems using conventional sound modeling techniques such as sample-based and physical-modeling-based methods. The subjective experimental results demonstrate that the investigated TTS components can be applied to piano MIDI-to-audio synthesis with minor modifications. The results also reveal the performance bottleneck: while the waveform model can synthesize high-quality piano sound given natural acoustic features, the conversion from MIDI to acoustic features is challenging. The full MIDI-to-audio synthesis system is still inferior to the sample-based or physical-modeling-based approaches, but we encourage TTS researchers to test their TTS models on this new task and improve its performance.

preprint2022arXiv

The VoicePrivacy 2020 Challenge Evaluation Plan

The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.

preprint2022arXiv

Theory on electron-phonon spin dephasing in GaAs multi-electron double quantum dots

Recent studies reveal that a double-quantum-dot system hosting more than two electrons may be superior in certain aspects to the traditional case in which only two electrons are confined (a singlet-triplet qubit). We study electron-phonon dephasing in a GaAs multi-electron double-quantum-dot system, both in a biased case, in which the singlet state is hybridized, and in an unbiased case, in which the hybridization is absent. We find that while the electron-phonon dephasing rate increases with the number of confined electrons in the unbiased case, this does not hold in the biased case. We define a figure of merit as the ratio between the exchange energy and the dephasing rate, and show that, in the experimentally relevant range of exchange energy, the figure of merit actually increases with the number of electrons in the biased case. Our results show that the multi-electron quantum-dot system has another advantage, mitigating the effect of electron-phonon dephasing, which has previously been under-appreciated in the literature.

preprint2022arXiv

Topological strings and Wilson loops

We propose the refined topological string correspondence to the expectation values of half-BPS Wilson loop operators in the 5d $\mathcal{N}=1$ gauge theory partition function on the Omega-deformed background $\mathbb{R}^4_{ε_{1,2}}\times S^1$. We provide the refined topological vertex method and the refined holomorphic anomaly equation method in topological string theory, from which we obtain exact computations of the 5d Wilson loop partition functions in both the A- and B-models. Finally, using the exact results in the B-model, we recover the quantum periods of the local $\mathbb{P}^1\times\mathbb{P}^1$ model and the local $\mathbb{P}^2$ model in the study of quantum geometry, and we further give a refined generalization of the A-period.

preprint2022arXiv

Trajectory Planning of Cellular-Connected UAV for Communication-assisted Radar Sensing

As a key technology for beyond-fifth-generation wireless systems, joint communication and radar sensing (JCAS) utilizes the reflections of communication signals to detect foreign objects and deliver situational awareness. A cellular-connected unmanned aerial vehicle (UAV) is uniquely suited to forming a mobile bistatic synthetic aperture radar (SAR) with its serving base station (BS), sensing over large areas with superb resolution without requiring additional spectrum. This paper designs this novel BS-UAV bistatic SAR platform and optimizes the flight path of the UAV to minimize its propulsion energy while guaranteeing the required sensing resolutions on a series of landmarks of interest. A new trajectory planning algorithm is developed to convexify the propulsion energy and resolution requirements using successive convex approximation and block coordinate descent. Effective trajectories are obtained with polynomial complexity. Extensive simulations reveal that the proposed trajectory planning algorithm significantly outperforms an alternative that minimizes the flight distance of cellular-aided sensing missions, in terms of both energy efficiency and effective consumption fluctuation. The energy saving offered by the proposed algorithm can be as large as 55%.

preprint2022arXiv

Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild

Building reliable object detectors that can detect out-of-distribution (OOD) objects is critical yet underexplored. One of the key challenges is that models lack supervision signals from unknown data, producing overconfident predictions on OOD objects. We propose a new unknown-aware object detection framework through Spatial-Temporal Unknown Distillation (STUD), which distills unknown objects from videos in the wild and meaningfully regularizes the model's decision boundary. STUD first identifies unknown candidate object proposals in the spatial dimension and then aggregates the candidates across multiple video frames to form a diverse set of unknown objects near the decision boundary. Alongside this, we employ an energy-based uncertainty regularization loss, which contrastively shapes the uncertainty space between the in-distribution and distilled unknown objects. STUD establishes state-of-the-art performance on OOD detection tasks for object detection, reducing the FPR95 score by over 10% compared with the previous best method. Code is available at https://github.com/deeplearning-wisc/stud.
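
FPR95, the metric quoted above, is the false-positive rate on OOD samples when the detection threshold keeps 95% of in-distribution samples. The sketch below computes it, together with a negative free-energy score (T times the log-sum-exp of the logits), a common energy-based OOD score. The convention that higher scores mean in-distribution, the use of this particular energy score, and the function names are assumptions; this is not the STUD training loss.

```python
import math


def negative_energy(logits, temperature=1.0):
    """Negative free energy T * logsumexp(logits / T); higher means more in-distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """False-positive rate on OOD samples at the threshold that keeps `tpr` of ID samples.

    Assumed convention: a sample is accepted as in-distribution when score >= threshold.
    """
    s = sorted(id_scores)
    n = len(s)
    # Choose index j so that at least tpr * n ID scores are >= s[j];
    # the small epsilon guards against floating-point round-down.
    j = math.floor((1.0 - tpr) * n + 1e-9)
    threshold = s[j]
    return sum(1 for x in ood_scores if x >= threshold) / len(ood_scores)
```

For example, with ID scores 1 through 20 the threshold is 2 (19 of 20 ID samples accepted), and OOD scores `[0, 1, 2.5, 3, 10]` give an FPR95 of 0.6.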

preprint2022arXiv

Unsupervised Domain Adaptive Fundus Image Segmentation with Category-level Regularization

Existing unsupervised domain adaptation methods based on adversarial learning have achieved good performance in several medical imaging tasks. However, these methods focus only on global distribution adaptation and ignore distribution constraints at the category level, which can lead to sub-optimal adaptation performance. This paper presents an unsupervised domain adaptation framework based on category-level regularization that regularizes the category distribution from three perspectives. Specifically, for inter-domain category regularization, an adaptive prototype alignment module is proposed to align feature prototypes of the same category in the source and target domains. In addition, for intra-domain category regularization, we tailor a separate regularization technique to each of the source and target domains. In the source domain, a prototype-guided discriminative loss is proposed to learn more discriminative feature representations by enforcing intra-class compactness and inter-class separability, complementing the traditional supervised loss. In the target domain, an augmented consistency category regularization loss is proposed to force the model to produce consistent predictions for augmented and unaugmented target images, which encourages semantically similar regions to be given the same label. Extensive experiments on two public fundus datasets show that the proposed approach significantly outperforms other state-of-the-art comparison algorithms.
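
A minimal sketch of category-level prototype alignment: class prototypes are taken as mean feature vectors, and the alignment loss is the mean squared Euclidean distance between source and target prototypes of the same class, with target labels assumed to come from pseudo-labels. This is a generic, non-adaptive illustration with hypothetical function names, not the paper's adaptive prototype alignment module.

```python
def class_prototypes(features, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def prototype_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo_labels):
    """Mean squared distance between same-class source and target prototypes.

    Only classes present in both domains contribute; returns 0.0 if none are shared.
    """
    ps = class_prototypes(src_feats, src_labels)
    pt = class_prototypes(tgt_feats, tgt_pseudo_labels)
    shared = sorted(set(ps) & set(pt))
    if not shared:
        return 0.0
    total = 0.0
    for y in shared:
        total += sum((a - b) ** 2 for a, b in zip(ps[y], pt[y]))
    return total / len(shared)
```

Minimizing such a term pulls the target features of each class toward where the source features of that class lie, which is the intuition behind aligning prototypes of the same category across domains.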

preprint2022arXiv

VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images

Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis of spine and bone health. However, designing automated algorithms for spine processing is challenging, predominantly due to considerable variations in anatomy and acquisition protocols and to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms for labelling and segmenting vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared, and 4505 vertebrae were individually annotated at voxel level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the results of this evaluation and further investigate the performance variation at the vertebra level, at the scan level, and across different fields of view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top-performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.

preprint2021arXiv

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

Deep learning has made substantial breakthroughs in many fields due to its powerful automatic representation capabilities. It has been proven that neural architecture design is crucial to the feature representation of data and to the final performance. However, the design of neural architectures relies heavily on researchers' prior knowledge and experience, and due to the limitations of humans' inherent knowledge, it is difficult for people to break out of their original thinking paradigm and design an optimal model. An intuitive idea is therefore to reduce human intervention as much as possible and let an algorithm design the neural architecture automatically. Neural Architecture Search (NAS) is just such a revolutionary algorithm, and the related research is rich and complex. A comprehensive and systematic survey of NAS is therefore essential. Previous surveys have begun to classify existing work mainly according to the key components of NAS: search space, search strategy, and evaluation strategy. While this classification is intuitive, it makes it difficult for readers to grasp the challenges and the landmark work involved. In this survey, we therefore provide a new perspective: we begin with an overview of the characteristics of the earliest NAS algorithms, summarize the problems in these early algorithms, and then present the solutions offered by subsequent research. Besides, we conduct a detailed and comprehensive analysis, comparison, and summary of these works. Finally, we provide some possible future research directions.

preprint2021arXiv

A Marching Cube Algorithm Based on Edge Growth

The Marching Cube algorithm is currently one of the most popular surface rendering algorithms for 3D reconstruction. It forms cube voxels from the input image and then uses 15 basic topological configurations to extract the iso-surfaces in the voxels. It processes each cube voxel in a traversal manner but does not consider the relationship between iso-surfaces in adjacent cubes; because of ambiguity, the final reconstructed model may have holes. We propose a Marching Cube algorithm based on edge growth. The algorithm first extracts seed triangles, then grows the seed triangles to reconstruct the entire 3D model. According to the position of the growth edge, we propose 17 topological configurations with iso-surfaces. The reconstruction results show that the algorithm reconstructs 3D models well, and it performs particularly well when only the main contour of a 3D model needs to be organized. In addition, when the data contain multiple scattered parts, the algorithm can extract only the 3D contours of the parts connected to the seed by setting the region selected by the seed.
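
For context, the per-cube step of the classic Marching Cubes algorithm can be sketched as follows: the eight corner values are thresholded against the iso-value to form an 8-bit case index (256 cases, which symmetry reduces to the 15 basic configurations), and each cube edge whose endpoints fall on opposite sides of the iso-value receives a linearly interpolated crossing point. The corner numbering and function names are assumptions, and this is the conventional per-cube traversal, not the proposed edge-growth algorithm.

```python
# Hypothetical corner numbering: corner i sits at (x, y, z) with i = x + 2*y + 4*z.
CORNERS = [(x, y, z) for z in (0, 1) for y in (0, 1) for x in (0, 1)]

# The 12 cube edges: corner pairs that differ in exactly one coordinate.
EDGES = [
    (a, b)
    for a in range(8)
    for b in range(a + 1, 8)
    if sum(abs(p - q) for p, q in zip(CORNERS[a], CORNERS[b])) == 1
]


def cube_index(values, iso):
    """8-bit case index (0..255): bit i is set when corner i lies below `iso`."""
    idx = 0
    for i, v in enumerate(values):
        if v < iso:
            idx |= 1 << i
    return idx


def crossed_edges(values, iso):
    """Map each edge straddling the iso-surface to its interpolated crossing point."""
    points = {}
    for a, b in EDGES:
        va, vb = values[a], values[b]
        if (va < iso) != (vb < iso):
            # va != vb here, so the division is safe.
            t = (iso - va) / (vb - va)
            pa, pb = CORNERS[a], CORNERS[b]
            points[(a, b)] = tuple(p + t * (q - p) for p, q in zip(pa, pb))
    return points
```

For instance, with corner values `[0.0] + [1.0] * 7` and iso-value 0.5, only corner 0 lies below the iso-value, so the case index is 1 and the three edges incident to corner 0 are crossed at their midpoints.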

preprint2021arXiv

ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech

The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV). This paper describes the third in a series of biennial challenges: ASVspoof 2019. With the challenge database and protocols being described elsewhere, the focus of this paper is on the results and the top-performing single and ensemble system submissions from 62 teams, all of which outperform the two baseline systems, often by a substantial margin. Deeper analyses show that performance is dominated by specific conditions involving either specific spoofing attacks or specific acoustic environments. While fusion is shown to be particularly effective for the logical access scenario involving speech synthesis and voice conversion attacks, participants largely struggled to apply fusion successfully to the physical access scenario involving simulated replay attacks. This is likely the result of a lack of system complementarity, while oracle fusion experiments show clear potential to improve performance. Furthermore, while results for simulated data are promising, experiments with real replay data show a substantial gap, most likely due to the presence of additive noise in the latter. This finding, among others, leads to a number of ideas for further research and directions for future editions of the ASVspoof challenge.

preprint2021arXiv

Deep Learning to Quantify Pulmonary Edema in Chest Radiographs

Purpose: To develop a machine learning model to classify the severity grades of pulmonary edema on chest radiographs. Materials and Methods: In this retrospective study, 369,071 chest radiographs and associated radiology reports from 64,581 patients (mean age, 51.71 years; 54.51% women) from the MIMIC-CXR chest radiograph dataset were included. This dataset was split into patients with and without congestive heart failure (CHF). Pulmonary edema severity labels were extracted from the associated radiology reports of patients with CHF as four ordinal levels: 0, no edema; 1, vascular congestion; 2, interstitial edema; and 3, alveolar edema. Deep learning models were developed using two approaches: a semi-supervised model using a variational autoencoder and a pre-trained supervised learning model using a dense neural network. Receiver operating characteristic curve analysis was performed on both models. Results: The area under the receiver operating characteristic curve (AUC) for differentiating alveolar edema from no edema was 0.99 for the semi-supervised model and 0.87 for the pre-trained model. Performance of the algorithm was inversely related to the difficulty in categorizing milder states of pulmonary edema (shown as AUCs for the semi-supervised model and pre-trained model, respectively): 2 versus 0, 0.88 and 0.81; 1 versus 0, 0.79 and 0.66; 3 versus 1, 0.93 and 0.82; 2 versus 1, 0.69 and 0.73; and 3 versus 2, 0.88 and 0.63. Conclusion: Deep learning models were trained on a large chest radiograph dataset and could grade the severity of pulmonary edema on chest radiographs with high performance.
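
The pairwise AUC values reported above can be read through the Mann-Whitney formulation: the AUC equals the probability that a randomly chosen positive case (e.g. alveolar edema) receives a higher model score than a randomly chosen negative case (e.g. no edema), with ties counting one half. The sketch below computes it by brute force over all pairs; the function name and the example scores are illustrative.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney statistic.

    Fraction of (positive, negative) pairs in which the positive scores higher,
    counting ties as one half. O(P * N), fine for small illustrative sets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, positive scores `[0.9, 0.8, 0.4]` against negative scores `[0.1, 0.4, 0.3]` give 8.5 correctly ordered pairs out of 9, an AUC of about 0.944.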

preprint2021arXiv

Elliptic Quantum Curves of 6d SO(N) theories

We discuss supersymmetric defects in 6d $\mathcal{N}=(1,0)$ SCFTs with $\mathrm{SO}(N_c)$ gauge group and $N_c-8$ fundamental flavors. The codimension 2 and 4 defects are engineered by coupling the 6d gauge fields to charged free fields in four and two dimensions, respectively. We find that the partition function in the presence of the codimension 2 defect on $\mathbb{R}^4\times \mathbb{T}^2$ in the Nekrasov-Shatashvili limit satisfies an elliptic difference equation which quantizes the Seiberg-Witten curve of the 6d theory. The expectation value of the codimension 4 defect appearing in the difference equation is an even (under reflection) degree $N_c$ section over the elliptic curve when $N_c$ is even, and an odd section when $N_c$ is odd. We also find that RG-flows of the defects and the associated difference equations in the 6d $\mathrm{SO}(2N+1)$ gauge theories triggered by Higgs VEVs of KK-momentum states provide quantum Seiberg-Witten curves for $\mathbb{Z}_2$ twisted compactifications of the 6d $\mathrm{SO}(2N)$ gauge theories.

preprint2021arXiv

Experimental demonstration of adversarial examples in learning topological phases

Classification and identification of different phases and the transitions between them is a central task in condensed matter physics. Machine learning, which has achieved dramatic success in a wide range of applications, holds the promise of bringing unprecedented perspectives to this challenging task. However, despite the exciting progress made in this direction, the reliability of machine-learning approaches likewise demands further investigation. Here, using the nitrogen-vacancy center platform, we report the first proof-of-principle experimental demonstration of adversarial examples in learning topological phases. We show that, after the addition of a tiny amount of carefully designed perturbations, the experimentally observed adversarial examples can successfully deceive a high-performing phase classifier, whose prediction accuracy exceeds $99.2\%$ on legitimate samples, with notably high confidence. Our results explicitly showcase a crucial vulnerability in applying machine learning techniques to classifying phases of matter, and provide an indispensable guide for future studies in this interdisciplinary field.
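A standard way to build "carefully designed perturbations" is the fast gradient sign method (FGSM): move each input feature by a small amount in the direction that increases the classifier's loss. The sketch below applies FGSM to a toy logistic classifier; the weights, input, and `eps` are hypothetical, and the abstract does not state which attack the authors used on their phase classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """Fast-gradient-sign perturbation of x for a logistic classifier.

    With binary cross-entropy loss, d(loss)/dx = (p - y) * w, where
    p = sigmoid(w.x + b). Each feature moves by eps along the sign of
    that gradient, i.e. in the direction that increases the loss.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy classifier and a correctly classified sample with label 1.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.1, 0.2])
x_adv = fgsm_logistic(x, 1.0, w, b, eps=0.25)
print(sigmoid(w @ x + b) > 0.5, sigmoid(w @ x_adv + b) > 0.5)  # True False
```

Every feature changes by at most `eps`, yet the predicted label flips, which is the kind of vulnerability the experiment demonstrates.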

preprint2021arXiv

Fast generation of Cat states in Kerr nonlinear resonators via optimal adiabatic control

Macroscopic cat states have been widely studied to illustrate fundamental principles of quantum physics as well as for their applications in quantum information processing. In this paper, we propose a quantum speedup method for the adiabatic creation of cat states in a Kerr nonlinear resonator via gradient-descent optimal adiabatic control. By adiabatically tuning the cavity detuning and the driving-field strength simultaneously, the minimum energy gap between the target trajectory and the non-adiabatic trajectory can be widened, which allows us to speed up the evolution along the adiabatic path. Compared with the previous proposal of preparing the cat state by controlling only the two-photon pumping strength in a Kerr nonlinear resonator, our method can prepare the target state in a much shorter time, with high fidelity and a large non-classical volume. Notably, the cat state prepared by our method is also highly robust against single-photon loss. Moreover, with a large initial detuning, our protocol successfully creates a large-size cat state. This proposal can be implemented in superconducting quantum circuits, providing a quantum state resource for quantum information encoding and fault-tolerant quantum computing.
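The target cat states can be illustrated with a two-photon-driven Kerr Hamiltonian. As a sketch, assuming the rotating-frame form $H = K a^{\dagger 2} a^2 - P(a^{\dagger 2} + a^2)$ at zero detuning (the sign convention and parameter values are our assumptions, not taken from the abstract), choosing $P = K\alpha^2$ gives $H + K\alpha^4 = K(a^{\dagger 2} - \alpha^2)(a^2 - \alpha^2)$, so the even cat state $|\alpha\rangle + |-\alpha\rangle$ is a ground state with energy $-K\alpha^4$. The check below verifies this numerically in a truncated Fock space; it does not implement the paper's optimal adiabatic control.

```python
import numpy as np
from math import factorial

def annihilation(dim):
    """Truncated bosonic annihilation operator a in the Fock basis."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1)

def coherent(alpha, dim):
    """Truncated coherent state |alpha> (real alpha) as a Fock-basis vector."""
    coeffs = np.array([alpha**k / np.sqrt(float(factorial(k))) for k in range(dim)])
    return np.exp(-abs(alpha) ** 2 / 2) * coeffs

dim, K, alpha = 60, 1.0, 2.0
P = K * alpha**2                                   # two-photon drive with P = K * alpha^2
a = annihilation(dim)
ad = a.T                                           # a is real, so a^dagger = a^T
H = K * ad @ ad @ a @ a - P * (ad @ ad + a @ a)    # zero detuning assumed

cat = coherent(alpha, dim) + coherent(-alpha, dim) # even cat state
cat /= np.linalg.norm(cat)
energy = cat @ H @ cat
residual = np.linalg.norm(H @ cat - energy * cat)  # ~0 if cat is an eigenstate
print(round(energy, 6), residual < 1e-8)           # -16.0 True
```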

preprint2021arXiv

Floquet Spin Amplification

Detection of weak electromagnetic waves and hypothetical particles aided by quantum amplification is important for fundamental physics and applications. However, demonstrations of quantum amplification are still limited; in particular, the physics of quantum amplification is not fully explored in periodically driven (Floquet) systems, which are generally defined by time-periodic Hamiltonians and enable observation of many exotic quantum phenomena such as time crystals. Here we investigate magnetic-field signal amplification by periodically driven $^{129}$Xe spins and observe signal amplification at the frequencies of transitions between Floquet spin states. This "Floquet amplification" allows one to simultaneously enhance and measure multiple magnetic fields with at least one order of magnitude improvement, offering the capability of femtotesla-level measurements. Our findings extend the physics of quantum amplification to Floquet systems and can be generalized to a wide variety of existing amplifiers, enabling a previously unexplored class of "Floquet amplifiers".

preprint2021arXiv

Instance-Aware Predictive Navigation in Multi-Agent Environments

In this work, we aim to achieve efficient end-to-end learning of driving policies in dynamic multi-agent environments. Predicting and anticipating future events at the object level is critical for making informed driving decisions. We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures. We adopt a novel multi-instance event prediction module to estimate possible interactions among agents in the ego-centric view, conditioned on the selected action sequence of the ego vehicle. To decide the action at each step, we repeatedly sample likely action sequences and seek the one that leads to safe future states according to the outputs of the prediction module. We design a sequential action sampling strategy to better leverage predicted states at both the scene level and the instance level. Our method establishes a new state of the art in challenging CARLA multi-agent driving simulation environments without expert demonstrations, offering better explainability and sample efficiency.

preprint2021arXiv

Leveraging Regular Fundus Images for Training UWF Fundus Diagnosis Models via Adversarial Learning and Pseudo-Labeling

Recently, ultra-widefield (UWF) 200° fundus imaging by Optos cameras has gradually been introduced because it captures more information on the fundus than regular 30° to 60° fundus cameras. Compared with UWF fundus images, however, regular fundus images are available in large amounts of high-quality, well-annotated data. Because of the domain gap, models trained on regular fundus images perform poorly at recognizing UWF fundus images. Hence, given that annotating medical data is labor intensive and time consuming, in this paper we explore how to leverage regular fundus images to compensate for limited UWF fundus data and annotations for more efficient training. We propose using a modified cycle generative adversarial network (CycleGAN) model to bridge the gap between regular and UWF fundus images and to generate additional UWF fundus images for training. A consistency regularization term is added to the GAN loss to improve and regulate the quality of the generated data. Our method requires neither that images from the two domains be paired nor even that their semantic labels be the same, which greatly simplifies data collection. Furthermore, we show that, with the pseudo-labeling technique, our method is robust to noise and errors introduced by the generated unlabeled data. We evaluated the effectiveness of our methods on several common fundus diseases and tasks, such as diabetic retinopathy (DR) classification, lesion detection, and tessellated fundus segmentation. The experimental results demonstrate that our proposed method simultaneously achieves superior generalizability of the learned representations and performance improvements in multiple tasks.

preprint2021arXiv

MetaDelta: A Meta-Learning System for Few-shot Image Classification

Meta-learning aims to learn quickly on novel tasks with limited data by transferring generic experience learned from previous tasks. Naturally, few-shot learning has been one of the most popular applications of meta-learning. However, existing meta-learning algorithms rarely consider time and resource efficiency or generalization to unknown datasets, which limits their applicability in real-world scenarios. In this paper, we propose MetaDelta, a novel practical meta-learning system for few-shot image classification. MetaDelta consists of two core components: i) multiple meta-learners supervised by a central controller to ensure efficiency, and ii) a meta-ensemble module in charge of integrated inference and better generalization. In particular, each meta-learner in MetaDelta is composed of a unique pretrained encoder fine-tuned by batch training and a parameter-free decoder used for prediction. MetaDelta ranked first in the final phase of the AAAI 2021 MetaDL Challenge (https://competitions.codalab.org/competitions/26638), demonstrating the advantages of our proposed system. The code is publicly available at https://github.com/Frozenmad/MetaDelta.
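The abstract does not say which parameter-free decoder MetaDelta uses. One common parameter-free choice in few-shot classification is a nearest-prototype rule, sketched below under that assumption: each class prototype is the mean encoder embedding of its support images, and each query takes the label of the closest prototype. The function name and the 2-D embeddings are hypothetical.

```python
import numpy as np

def prototype_predict(support_x, support_y, query_x):
    """Nearest-prototype decoder: no learned weights.

    Prototypes are per-class means of support embeddings; each query is
    assigned to the prototype with the smallest squared Euclidean distance.
    """
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    d2 = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# A 2-way 2-shot episode with hypothetical encoder embeddings.
support_x = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[0.1, 0.0], [1.0, 0.9]])
pred = prototype_predict(support_x, support_y, query_x)
print(pred)  # [0 1]
```

Because such a decoder has nothing to train, all adaptation cost sits in the encoder, which fits the efficiency goal stated in the abstract.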

preprint2021arXiv

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

Existing objective evaluation metrics for voice conversion (VC) are not always correlated with human perception. Therefore, training VC models with such criteria may not effectively improve the naturalness and similarity of converted speech. In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech. We adopt convolutional and recurrent neural network models to build a mean opinion score (MOS) predictor, termed MOSNet. The proposed models are tested on large-scale listening test results of the Voice Conversion Challenge (VCC) 2018. Experimental results show that the predicted scores of the proposed MOSNet are highly correlated with human MOS ratings at the system level and fairly correlated with human MOS ratings at the utterance level. Meanwhile, we have modified MOSNet to predict similarity scores, and the preliminary results show that the predicted scores are also fairly correlated with human ratings. These results confirm that the proposed models could be used as a computational evaluator to measure the MOS of VC systems and reduce the need for expensive human rating.
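The distinction between system-level and utterance-level correlation can be made concrete: utterance-level correlation compares individual predictions with individual ratings, while system-level correlation first averages both per VC system. The sketch below uses the Pearson coefficient on hypothetical (system, predicted MOS, human MOS) triples; the abstract does not name the exact coefficient reported.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def system_means(records):
    """Average predicted and human MOS per system: {system: (pred, human)}."""
    groups = {}
    for system, pred, human in records:
        groups.setdefault(system, []).append((pred, human))
    return {s: (mean(p for p, _ in v), mean(h for _, h in v)) for s, v in groups.items()}

# Hypothetical (system, predicted MOS, human MOS) triples, one per utterance.
records = [("A", 2.0, 1.5), ("A", 2.4, 2.5), ("B", 3.0, 3.5),
           ("B", 3.4, 2.5), ("C", 4.0, 4.5), ("C", 4.2, 3.5)]

utt_r = pearson([r[1] for r in records], [r[2] for r in records])
sys_vals = list(system_means(records).values())
sys_r = pearson([p for p, _ in sys_vals], [h for _, h in sys_vals])
print(f"utterance-level r = {utt_r:.3f}, system-level r = {sys_r:.3f}")
```

Averaging per system smooths out utterance-level noise in both predictions and ratings, which is one reason system-level correlations tend to be higher.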

preprint2021arXiv

Multimodal Gait Recognition for Neurodegenerative Diseases

In recent years, single-modality gait recognition has been extensively explored in the analysis of medical images and other sensory data, and it is recognised that each established approach has different strengths and weaknesses. As an important motor symptom, gait disturbance is commonly used for the diagnosis and evaluation of diseases; moreover, multi-modality analysis of a patient's walking pattern compensates for the one-sidedness of single-modality gait recognition methods, which learn gait changes in only a single measurement dimension. The fusion of multiple measurement sources has demonstrated promising performance in identifying gait patterns associated with individual diseases. In this paper, we propose a novel hybrid model that learns the gait differences between three neurodegenerative diseases, between patients with different severity levels of Parkinson's disease, and between healthy individuals and patients, by fusing and aggregating data from multiple sensors. A spatial feature extractor (SFE) is applied to generate representative features of images or signals. To capture temporal information from the data of the two modalities, a new correlative memory neural network (CorrMNN) architecture is designed for extracting temporal features. Afterwards, we embed a multi-switch discriminator to associate the observations with individual state estimations. Compared with several state-of-the-art techniques, our proposed framework yields more accurate classification results.

preprint2021arXiv

New bounds and constructions for constant weighted $X$-codes

As a crucial technique for integrated circuit (IC) test response compaction, $X$-compact employs a special kind of code, called $X$-codes, for reliable compression of the test response in the presence of unknown logic values ($X$s). From a combinatorial viewpoint, Fujiwara and Colbourn [FC2010] introduced an equivalent definition of $X$-codes and studied $X$-codes of small weight that have good detectability and $X$-tolerance. An $(m,n,d,x)$ $X$-code is an $m\times n$ binary matrix whose column vectors are its codewords. The parameters $d$ and $x$ correspond to the test quality of the code. In this paper, bounds and constructions for constant-weight $X$-codes are investigated. First, we obtain a general result on the maximum number of codewords $n$ for an $(m,n,d,x)$ $X$-code of weight $w$, and we further improve this lower bound for the case $x=2$ and $w=3$ through the probabilistic method. Then, using tools from additive combinatorics and finite fields, we present explicit constructions of constant-weight $X$-codes with $d=3,7$ and $x=2$, which are optimal for the case $d=3, w=4$ and nearly optimal for the case $d=3, w=3$. We also consider a special class of $X$-codes introduced in [FC2010] and improve the best known lower bound on the maximum number of codewords for this kind of $X$-code.
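To make the parameters $d$ and $x$ concrete, the brute-force checker below tests the $X$-code property on tiny examples. It assumes the Fujiwara and Colbourn style definition (paraphrased here, not quoted from the paper): for every set $X$ of $x$ codewords and every nonempty set $D$ of at most $d$ other codewords, the bitwise XOR of $D$ must not be hidden by the positions covered by the OR of $X$. The function name and example matrices are hypothetical, and the search is exponential, so it is only for illustration.

```python
from itertools import combinations

def is_x_code(columns, d, x):
    """Brute-force check of the (m, n, d, x) X-code property (assumed definition).

    Codewords are Python ints used as m-bit masks. For each set X of x columns
    and each nonempty set D (size <= d) of other columns, OR(X) | XOR(D) must
    differ from OR(X); otherwise errors in D are masked by unknown values in X.
    """
    n = len(columns)
    for xs in combinations(range(n), x):
        masked = 0
        for i in xs:
            masked |= columns[i]
        rest = [i for i in range(n) if i not in xs]
        for size in range(1, d + 1):
            for ds in combinations(rest, size):
                acc = 0
                for j in ds:
                    acc ^= columns[j]
                if masked | acc == masked:   # XOR of D fully covered by X
                    return False
    return True

print(is_x_code([1, 2, 4, 8, 16], d=2, x=1))  # True: weight-1 identity columns
print(is_x_code([3, 3, 4], d=1, x=1))         # False: a repeated codeword is masked
```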

preprint2021arXiv

Quasi one-dimensional diffuse laser cooling of atoms

We experimentally demonstrate the generation of one-dimensional cold gases of $^{87}$Rb atoms by diffuse laser cooling (DLC). A horizontal, slender vacuum glass tube with a length of 105 cm and a diameter of 2 cm is used in our experiment. The diffuse laser light inside the tube, which is generated by multiple reflections of the injected lasers, cools the background vapor atoms. With 250 mW of cooling light and 50 mW of repumping light, an evenly distributed, meter-long atom cloud is obtained. We observe a fourfold improvement in the atomic OD for a typical cooling duration of 170 ms, and a sub-Doppler atomic temperature of 25 μK. The maximum number of detected cold atoms remains constant for a free-fall duration of 30 ms. Such samples are ideal for many quantum optical experiments involving electromagnetically induced transparency, electronically highly excited (Rydberg) atoms, and quantum precision measurements.
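"Sub-Doppler" means colder than the Doppler cooling limit $T_D = \hbar\Gamma/(2k_B)$. The worked check below assumes a natural linewidth of about 6.07 MHz for the $^{87}$Rb D2 cooling transition (a literature value, not stated in the abstract) and shows that the reported 25 μK is well below $T_D$.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K

def doppler_limit(linewidth_hz):
    """Doppler cooling limit T_D = hbar * Gamma / (2 k_B), with Gamma = 2*pi*linewidth."""
    gamma = 2 * math.pi * linewidth_hz
    return HBAR * gamma / (2 * K_B)

t_d = doppler_limit(6.07e6)          # assumed Rb-87 D2 natural linewidth, Hz
print(f"{t_d * 1e6:.0f} uK")         # 146 uK
print(25e-6 < t_d)                   # True: 25 uK is sub-Doppler
```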

preprint2021arXiv

Shortcuts to Adiabaticity for the Quantum Rabi Model: Efficient Generation of Giant Entangled cat States via Parametric Amplification

We propose a method for the fast generation of nonclassical ground states of the Rabi model in the ultrastrong and deep-strong coupling regimes via shortcuts-to-adiabaticity (STA) dynamics. The time-dependent quantum Rabi model is simulated by applying parametric amplification to the Jaynes-Cummings model. Using an experimentally feasible parametric drive, this STA protocol can generate large-size Schrödinger cat states through a process that is 10 times faster than adiabatic protocols. Such fast evolution increases the robustness of our protocol against dissipation. Our method enables one to freely design the parametric drive, so that the target state can be generated in the lab frame. A largely detuned light-matter coupling makes the protocol robust against imperfections in the operation times in experiments.

preprint2021arXiv

Spontaneous imbibition in porous media: from pore scale to Darcy scale

Spontaneous imbibition has been receiving much attention because of its significance in many subsurface and industrial applications. Pore-scale wetting dynamics, and particularly their upscaling to the Darcy scale, remain unresolved. In this work, we conduct image-based pore-network modeling of cocurrent spontaneous imbibition and the corresponding quasi-static imbibition in homogeneous sintered glass beads as well as heterogeneous Estaillades. A wide range of viscosity ratios and wettability conditions is taken into account. Based on our pore-scale results, we show the influence of pore-scale heterogeneity on imbibition dynamics and nonwetting-phase entrapment. We elucidate the different pore-filling mechanisms in imbibition, which helps explain wetting dynamics. Most importantly, we develop a non-equilibrium model for the relative permeability of the wetting phase that adequately incorporates wetting dynamics. This is crucial to the ultimate goal of developing a two-phase imbibition model with measurable material properties such as capillary pressure and relative permeability. Finally, we propose future work on both numerical and experimental verification of the developed non-equilibrium permeability model.

preprint2021arXiv

Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-task Learning

The need for comprehensive and automated screening methods for retinal image classification has long been recognized. Images annotated by well-qualified doctors are very expensive, and only a limited amount of data is available for various retinal diseases such as age-related macular degeneration (AMD) and diabetic retinopathy (DR). Some studies show that AMD and DR share common features, such as hemorrhagic points and exudation, but most classification algorithms train models for these diseases independently. Inspired by knowledge distillation, in which additional supervisory signals from various sources help train a robust model with much less data, we propose a method called synergic adversarial label learning (SALL), which leverages relevant retinal disease labels in both semantic and feature space as additional signals and trains the model in a collaborative manner. Our experiments on DR and AMD fundus image classification tasks demonstrate that the proposed method can significantly improve the accuracy of the model for grading diseases. In addition, we conduct further experiments to show the effectiveness of SALL in terms of reliability and interpretability in the context of medical imaging applications.
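The abstract does not give SALL's losses, but the knowledge-distillation idea it builds on can be sketched with the standard soft-target loss: the student is trained to match a teacher's temperature-softened class distribution. Below, the teacher stands in for a model of a related retinal disease; the logits, temperature `T`, and function names are hypothetical and only illustrate the generic technique.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stabilised)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target loss T^2 * KL(teacher || student) at temperature T, batch-averaged."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1)
    return T**2 * kl.mean()

# Hypothetical logits over three grades from a student and a related-task teacher.
student = np.array([[2.0, 0.5, -1.0]])
teacher = np.array([[1.5, 1.0, -0.5]])
print(distillation_loss(student, teacher))   # positive: distributions differ
print(distillation_loss(teacher, teacher))   # 0.0 (up to rounding): identical outputs
```

In practice such a term is added to the usual cross-entropy on the (scarce) ground-truth labels, giving the model an extra supervisory signal.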

preprint2021arXiv

The dynamic energy balance in earthquakes expressed by fault surface morphology

The dynamic energy balance is essential for earthquake studies. The energy balance approach is one of the most famous developments in fracture mechanics. To interpret seismological data, crack models and models of sliding on a frictional surface (fault) are widely used. The macroscopically observable energy budget and the microscopic processes can be related through the fracture energy $G_c$. Fault surface morphology is the direct result of microscopic processes near the crack tip or on the frictional interface. Here we show that the dynamic energy balance in earthquakes can be expressed by fault surface morphology, and that the two are quantitatively linked. Direct shear experiments confirm the predictions of the theoretical analysis and show that the strain rate has a crucial influence on the dynamic energy balance.

preprint2021arXiv

The mass-metallicity relation at cosmic noon in overdense environments: first results from the MAMMOTH-Grism HST slitless spectroscopic survey

The MAMMOTH-Grism slitless spectroscopic survey is a Hubble Space Telescope (HST) cycle-28 medium program, which is obtaining 45 orbits of WFC3/IR grism spectroscopy in the density peak regions of three massive galaxy protoclusters at $z=2-3$ discovered using the MAMMOTH technique. We introduce this survey by presenting the first measurement of the mass-metallicity relation (MZR) at high redshift in overdense environments via grism spectroscopy. From the completed MAMMOTH-Grism observations in the field of the BOSS1244 protocluster at $z=2.24\pm0.02$, we secure a sample of 36 protocluster member galaxies at $z\sim2.24$ showing strong nebular emission lines ([O III], H$β$ and [O II]) in their G141 spectra. Using multi-wavelength broad-band deep imaging from HST and ground-based telescopes, we measure their stellar masses in the range $[10^{9},10^{10.4}]M_\odot$, instantaneous star formation rates (SFRs) from 10 to 240 $M_\odot yr^{-1}$, and global gas-phase metallicities from $\frac{1}{3}$ to 1 solar. Compared with a similarly selected field galaxy sample at the same redshift, our galaxies show SFRs increased on average by $\sim$0.06 dex and $\sim$0.18 dex at $\sim$10$^{10.1}M_\odot$ and $\sim$10$^{9.8}M_\odot$, respectively. Using the stacked spectra of our sample galaxies, we derive the MZR in the BOSS1244 protocluster core as $12+\log({\rm O/H})=(0.136\pm0.018)\times\log(M_\ast/M_\odot)+(7.082\pm0.175)$, showing a significantly shallower slope than that in the field. This shallow MZR slope is likely caused by the combined effects of efficient recycling of feedback-driven winds and cold-mode gas accretion in protocluster environments. The former effect helps low-mass galaxies residing in overdensities retain their metal production, whereas the latter dilutes the metal content of high-mass galaxies, making them more metal poor than their coeval field counterparts.
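Evaluating the best-fit protocluster MZR across the measured mass range shows what the shallow slope means numerically. The sketch below uses only the central values of the fit quoted above and ignores their uncertainties; the function name is hypothetical.

```python
def protocluster_mzr(log_mstar, slope=0.136, intercept=7.082):
    """12 + log(O/H) from the best-fit BOSS1244 MZR (uncertainties ignored)."""
    return slope * log_mstar + intercept

# Evaluate across the measured stellar-mass range, log(M*/Msun) = 9.0 to 10.4.
for log_m in (9.0, 10.0, 10.4):
    print(log_m, round(protocluster_mzr(log_m), 3))
# 9.0 8.306
# 10.0 8.442
# 10.4 8.496
```

Across the 1.4 dex mass range, the predicted metallicity changes by only about 0.19 dex (0.136 × 1.4).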

preprint2021arXiv

Tunable Chiral Bound States with Giant Atoms

We propose tunable chiral bound states in a system composed of superconducting giant atoms and a Josephson photonic-crystal waveguide (PCW), with no analog in other quantum setups. The chiral bound states arise from interference in the nonlocal coupling of a giant atom to multiple points of the waveguide. The chirality can be tuned by changing either the atom-waveguide coupling or the external bias of the PCW. Furthermore, the chiral bound states can induce directional dipole-dipole interactions between multiple giant atoms coupled to the same waveguide. Our proposal is ready to be implemented in experiments with superconducting circuits, where it can be used as a tunable toolbox for realizing topological phase transitions and quantum simulations.

preprint2020arXiv

A Census of Sub-kiloparsec Resolution Metallicity Gradients in Star-forming Galaxies at Cosmic Noon from HST Slitless Spectroscopy

We present the largest sample to date of gas-phase metallicity radial gradients measured at sub-kiloparsec resolution in star-forming galaxies in the redshift range $z\in[1.2, 2.3]$. These measurements are enabled by the synergy of slitless spectroscopy from the Hubble Space Telescope near-infrared channels and lensing magnification from foreground galaxy clusters. Our sample consists of 76 galaxies with stellar masses ranging from 10$^7$ to 10$^{10}$ $M_\odot$, instantaneous star-formation rates in the range [1, 100] $M_\odot$/yr, and global metallicities from $\frac{1}{12}$ to 2 solar. At the 2$σ$ confidence level, 15/76 galaxies in our sample show negative radial gradients, whereas 7/76 show inverted gradients. Combining our measurements with all other metallicity gradients obtained at similar resolution currently available in the literature, we measure a negative mass dependence of $Δ\log({\rm O/H})/Δr~ [\mathrm{dex~kpc^{-1}}] = \left(-0.020\pm0.007\right) + \left(-0.016\pm0.008\right) \log(M_\ast/10^{9.4} M_\odot)$, with an intrinsic scatter of $σ=0.060\pm0.006$, over four orders of magnitude in stellar mass. Our result is consistent with strong feedback, not secular processes, being the primary governor of the chemo-structural evolution of star-forming galaxies during disk mass assembly at cosmic noon. We also find that the intrinsic scatter of metallicity gradients increases with decreasing stellar mass and increasing specific star-formation rate. This increase in the intrinsic scatter is likely caused by the combined effect of cold-mode gas accretion and merger-induced starbursts, with the latter being more important in the dwarf mass regime of $M_\ast\lesssim10^9 M_\odot$.
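The fitted mass dependence can be evaluated directly to see how much steeper gradients become with stellar mass. The sketch uses only the central values of the relation quoted above and ignores the uncertainties and intrinsic scatter; the function name is hypothetical.

```python
def gradient_dex_per_kpc(log_mstar):
    """Best-fit metallicity gradient (dex/kpc) vs. log(M*/Msun), central values only."""
    return -0.020 + (-0.016) * (log_mstar - 9.4)

for log_m in (8.4, 9.4, 10.4):
    print(log_m, round(gradient_dex_per_kpc(log_m), 3))
# 8.4 -0.004
# 9.4 -0.02
# 10.4 -0.036
```

Each additional dex in stellar mass makes the predicted gradient about 0.016 dex/kpc more negative, while the fitted intrinsic scatter of 0.060 dex/kpc is larger than that change.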

preprint2020arXiv

A SLAM Map Restoration Algorithm Based on Submaps and an Undirected Connected Graph

Many visual simultaneous localization and mapping (SLAM) systems have been shown to be accurate and robust, with real-time performance on both indoor and ground datasets. However, these methods can be problematic when dealing with aerial frames captured by a camera mounted on an unmanned aerial vehicle (UAV), because the flight height of the UAV can be difficult to control and is easily affected by the environment. To cope with lost tracking, many visual SLAM systems employ a relocalization strategy, in which the tracking thread continues online operation by inspecting the connections between subsequent new frames and the map generated before tracking was lost. To solve the missing-map problem that arises in many applications after tracking is lost, we present a method, based on monocular visual SLAM, for reconstructing a complete global map of UAV datasets by sequentially merging submaps via the corresponding undirected connected graph. Specifically, submaps are repeatedly generated, from the initialization process to the point where tracking is lost, and a corresponding undirected connected graph is built by treating these submaps as nodes and the common map points between two submaps as edges. The common map points are determined by the bag-of-words (BoW) method, and submaps are merged if they are found to be connected with the online map in the undirected connected graph. To demonstrate the performance of the proposed method, we first evaluated it on a UAV dataset, and the experimental results showed that, in the case of several tracking failures, the integrity of the map was significantly better than that of the current mainstream SLAM method.
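The graph step described above can be sketched as a breadth-first search: submaps are nodes, an edge exists when two submaps share enough common map points, and only submaps reachable from the online map are merged. The threshold `min_shared`, the submap ids, and the counts are hypothetical; the actual alignment and fusion of map points is omitted.

```python
from collections import defaultdict, deque

def merge_order(shared_points, online, min_shared=1):
    """Submaps connected to the online map, in breadth-first (merge) order.

    shared_points maps a pair of submap ids to the number of common map
    points found between them (e.g. by bag-of-words matching). An undirected
    edge is added when that count reaches min_shared.
    """
    graph = defaultdict(set)
    for (i, j), count in shared_points.items():
        if count >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    order, seen, queue = [], {online}, deque([online])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order

# Hypothetical common-map-point counts between submaps s0..s3 and the online map.
shared = {("s0", "s1"): 40, ("s1", "live"): 25, ("s2", "s3"): 30, ("s0", "s2"): 0}
print(merge_order(shared, "live"))  # ['s1', 's0']
```

Submaps `s2` and `s3` stay unmerged until a later submap links them to the online map's component.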

preprint2020arXiv

An Efficient Index Method for the Optimal Route Query over Multi-Cost Networks

Smart cities have been considered the wave of the future, and route recommendation in networks is a fundamental problem within them. Most existing approaches to the shortest route problem assume that there is only one kind of cost in a network. However, networks usually have several kinds of cost, and users prefer to select an optimal route that takes all of these costs into account. In this paper, we study the problem of finding the optimal route in multi-cost networks. We prove that this problem is NP-hard and that existing index techniques cannot be applied to it. We propose a novel partition-based index with contour skyline techniques to find the optimal route, together with a vertex-filtering algorithm to facilitate query processing. We conduct extensive experiments on six real-life networks, and the results show that our method improves efficiency by an order of magnitude compared with previous heuristic algorithms.
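Skyline techniques rest on Pareto dominance between cost vectors. The sketch below shows only that basic building block, filtering candidate routes whose multi-cost vectors are dominated by another route; it is not the paper's partition-based index or contour skyline, and the route names and costs are hypothetical.

```python
def skyline(routes):
    """Keep routes whose cost vectors are not dominated by any other route.

    Route a dominates route b if a is no worse in every cost dimension and
    strictly better in at least one.
    """
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return {name: cost for name, cost in routes.items()
            if not any(dominates(other, cost) for o, other in routes.items() if o != name)}

# Hypothetical candidate routes: (travel time in min, toll, distance in km).
routes = {"r1": (30, 5, 12), "r2": (25, 8, 14), "r3": (35, 6, 13), "r4": (25, 8, 15)}
print(sorted(skyline(routes)))  # ['r1', 'r2']
```

Any route that is optimal under a monotone combination of the costs must lie on this skyline, which is why pruning dominated routes early saves work.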

preprint2020arXiv

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. While they originate from the same source database and same underlying protocol, they are explored in two specific use case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than possible previously. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment on spoofed data in logical access. It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona-fide utterances even by human subjects.

preprint2020arXiv

Automated Pavement Crack Segmentation Using U-Net-based Convolutional Neural Network

Automated pavement crack image segmentation is challenging because of inherent irregular patterns, lighting conditions, and noise in images. Conventional approaches require a substantial amount of feature engineering to differentiate crack regions from non-affected regions. In this paper, we propose a deep learning technique based on a convolutional neural network to perform segmentation tasks on pavement crack images. Our approach requires minimal feature engineering compared to other machine learning techniques. We propose a U-Net-based network architecture in which we replace the encoder with a pretrained ResNet-34 neural network. We use a "one-cycle" training schedule based on cyclical learning rates to speed up the convergence. Our method achieves an F1 score of 96% on the CFD dataset and 73% on the Crack500 dataset, outperforming other algorithms tested on these datasets. We perform ablation studies on various techniques that helped us get marginal performance boosts, i.e., the addition of spatial and channel squeeze and excitation (SCSE) modules, training with gradually increasing image sizes, and training various neural network layers with different learning rates.
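A one-cycle schedule raises the learning rate from a small value to a peak and then anneals it far below the starting value, all within a single pass over the training budget. The sketch below uses cosine interpolation for both phases; the defaults (`pct_start=0.3`, `div_factor=25`, `final_div=1e4`) are illustrative, loosely modeled on common implementations such as PyTorch's `OneCycleLR`, and are not the paper's settings.

```python
import math

def _cos_interp(start, end, t):
    """Cosine interpolation from start (t=0) to end (t=1)."""
    return end + (start - end) * (1 + math.cos(math.pi * t)) / 2

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div_factor=25.0, final_div=1e4):
    """Learning rate at a given step of a one-cycle schedule (illustrative defaults)."""
    start_lr = max_lr / div_factor          # learning rate at step 0
    end_lr = start_lr / final_div           # learning rate at the final step
    warm = max(1, int(total_steps * pct_start))
    if step <= warm:                        # phase 1: ramp up to the peak
        return _cos_interp(start_lr, max_lr, step / warm)
    # phase 2: anneal from the peak down to end_lr (requires pct_start < 1)
    return _cos_interp(max_lr, end_lr, (step - warm) / (total_steps - warm))

for s in (0, 15, 30, 65, 100):
    print(s, f"{one_cycle_lr(s, 100, 1e-2):.2e}")
```

The high-learning-rate middle phase acts as a regularizer and speeds up early progress, while the long annealing tail lets the model settle, which is the intuition behind the faster convergence mentioned in the abstract.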

preprint2020arXiv

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in this important area.

preprint2020arXiv

Bridge the Domain Gap Between Ultra-wide-field and Traditional Fundus Images via Adversarial Domain Adaptation

For decades, advances in retinal imaging technology have enabled effective diagnosis and management of retinal disease using fundus cameras. Recently, ultra-wide-field (UWF) fundus imaging by Optos cameras has gradually been put into use because it offers broader insight into the fundus, revealing some lesions that are not typically seen in traditional fundus images. Research on traditional fundus images is an active topic, but studies on UWF fundus images are few. One of the most important reasons is that UWF fundus images are hard to obtain. In this paper, for the first time, we explore domain adaptation from traditional fundus images to UWF fundus images. We propose a flexible framework to bridge the gap between the two domains and co-train a UWF fundus diagnosis model by pseudo-labelling and adversarial learning. We design a regularisation technique to regulate the domain adaptation. We also apply MixUp to overcome over-fitting caused by incorrectly generated pseudo-labels. Our experimental results on either a single domain or both domains demonstrate that the proposed method can effectively adapt and transfer knowledge from traditional fundus images to UWF fundus images and improve the performance of retinal disease recognition.
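MixUp trains on convex combinations of pairs of examples and of their labels, which softens the penalty for any single wrong (pseudo-)label. A minimal sketch of the standard recipe follows; the mixing parameter `alpha`, the toy arrays, and how it would be combined with the authors' pseudo-labels are assumptions, since the abstract does not give these details.

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.4, rng=None):
    """Standard MixUp: blend each example (and its label) with a shuffled partner.

    lam ~ Beta(alpha, alpha); returns mixed inputs, mixed soft labels, and lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Toy batch: 4 "images" of 3 features each, with (pseudo-)labels over 2 classes.
rng = np.random.default_rng(0)
x = np.arange(12, dtype=float).reshape(4, 3)
y = np.eye(2)[[0, 1, 1, 0]]
x_mix, y_mix, lam = mixup(x, y, alpha=0.4, rng=rng)
print(round(float(lam), 3))
print(y_mix)
```

Each mixed label still sums to 1, so it can be used directly as a soft target in a cross-entropy loss.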

preprint2020arXiv

Category-wise Attack: Transferable Adversarial Examples for Anchor Free Object Detection

Deep neural networks have been demonstrated to be vulnerable to adversarial attacks: subtle perturbations can completely change classification results. This vulnerability has led to a surge of research in this direction. However, most works have been dedicated to attacking anchor-based object detection models. In this work, we present an effective and efficient algorithm for generating adversarial examples to attack anchor-free object detection models, based on two approaches. First, we conduct category-wise instead of instance-wise attacks on the object detectors. Second, we leverage high-level semantic information to generate the adversarial examples. Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors, even anchor-based detectors such as Faster R-CNN.

preprint2020arXiv

Channel-Dependent Scheduling in Wireless Energy Transfer for Mobile Devices

Resonant Beam Charging (RBC) is a Wireless Power Transfer (WPT) technology that can provide high-power, long-distance, mobile, and safe wireless charging for Internet of Things (IoT) devices. Charging multiple IoT devices simultaneously is a significant feature of the RBC system. To optimize multi-user charging performance, the transmitting power should be scheduled so that all IoT devices are charged simultaneously. To keep all IoT devices working as long as possible while ensuring fairness, we propose the First Access First Charge (FAFC) scheduling algorithm. We then formulate the scheduling parameters quantitatively for algorithm implementation. Finally, we analyze the performance of the FAFC scheduling algorithm, considering the impacts of the number of receivers, the transmitting power, and the charging time. Based on the analysis, we summarize methods for improving WPT performance for multiple IoT devices: limiting the number of receivers, increasing the transmitting power, prolonging the charging time, and improving single-user charging efficiency. The FAFC scheduling algorithm design and analysis provide a fair WPT solution for the multi-user RBC system.

preprint2020arXiv

Community detection based on first passage probabilities

Community detection is of fundamental significance for understanding the topological characteristics and spreading dynamics of complex networks. While random walks are widely used and have proven effective in many community detection algorithms, two major defects remain: (i) if the average step over all possible random walks is used, the maximal walk length is too large to distinguish clustering information; (ii) if a pre-assigned maximal length is used, useful community information at all other step lengths is missed. In this paper, we propose a novel community detection method based on first passage probabilities (FPPM), equipped with a new similarity measure that incorporates the complete structural information within the maximal step length. The diameter of the network is chosen as the boundary of the random walks, which makes this bound adaptive to different networks. We then use hierarchical clustering to group the vertices into communities and select the best division according to its modularity value. Finally, a post-processing strategy merges unreasonably small communities, which significantly improves the accuracy of the community division. Surprisingly, numerical simulations show that FPPM performs best compared with several classic algorithms on both synthetic benchmarks and real-world networks, which reveals the universality and effectiveness of our method.
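
The abstract bounds random walks by the network diameter but does not spell out how FPPM turns first-passage probabilities into its similarity measure. The sketch below therefore covers only the first ingredient, under the assumption of an unweighted, connected graph and a simple random walk: it computes the diameter by BFS and the first-passage probabilities up to that step length with the standard recursion F(1) = P, F(t)[i][j] = sum over k != j of P[i][k] * F(t-1)[k][j].

```python
from collections import deque

def diameter(adj):
    """Longest shortest-path length in a connected, unweighted graph (BFS from every node)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

def first_passage_probs(adj, max_steps):
    """F[t][i][j]: probability that a simple random walk from i reaches j for the first time at step t.

    For i == j this is the first-return probability. F[0] is unused.
    """
    nodes = sorted(adj)
    # One-step transition probabilities of a simple random walk.
    P = {i: {j: (1.0 / len(adj[i]) if j in adj[i] else 0.0) for j in nodes} for i in nodes}
    F = [None, P]
    for _ in range(2, max_steps + 1):
        prev = F[-1]
        # Step to some k != j first, then hit j for the first time from k in the remaining steps.
        F.append({i: {j: sum(P[i][k] * prev[k][j] for k in nodes if k != j) for j in nodes}
                  for i in nodes})
    return F
```

Usage: `adj = {0: {1}, 1: {0, 2}, 2: {1}}` gives `diameter(adj) == 2`, and `first_passage_probs(adj, 2)[2][0][2]` is 0.5, since the walk must go 0 to 1 and then pick 2 out of two neighbours.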

preprint2020arXiv

Cooling-Aware Resource Allocation and Load Management for Mobile Edge Computing Systems

Driven by the explosive computation demands of the Internet of Things (IoT), mobile edge computing (MEC) is a promising technique for enhancing the computation capability of mobile users. In this paper, we propose a joint resource allocation and load management mechanism for an MEC system with wireless power transfer (WPT), jointly optimizing the transmit power for WPT, the local/edge computing load, the offloading time, and the frequencies of the central processing units (CPUs) at the access point (AP) and the users. To achieve an energy-efficient and sustainable WPT-MEC system, we minimize the total energy consumption of the AP while meeting computation latency requirements. Cooling energy, which is non-negligible, is taken into account in minimizing the energy consumption of the MEC system. By rigorously orchestrating state-of-the-art optimization techniques, we design an iterative algorithm and obtain the optimal solution in semi-closed form. Based on this solution, we summarize several interesting properties and insights. Extensive numerical tests show that the proposed algorithm can save up to 90.4% of the energy consumed by existing benchmarks.

preprint2020arXiv

Cosmological constraints from the redshift dependence of the Alcock-Paczynski effect: Possibility of estimating the non-linear systematics using fast simulations

The tomographic AP method is so far the best method for separating the Alcock-Paczynski (AP) signal from redshift space distortion (RSD) effects and deriving powerful constraints on cosmological parameters using the $\lesssim 40\,h^{-1}\,\mathrm{Mpc}$ clustering region. To ensure that the method can be easily applied to future large-scale structure (LSS) surveys, we study the possibility of estimating its systematics using fast simulations. The major contribution to the systematics comes from the non-zero redshift evolution of the RSD effects, which is quantified by $\hat{\xi}_{\Delta s}(\mu, z)$ in our analysis and estimated using the BigMultidark exact N-body simulation and approximate COLA simulation samples. We find about 5%/10% evolution when comparing $\hat{\xi}_{\Delta s}(\mu, z)$ measured at $z=0.5$/$z=1$ with the measurements at $z=0$. We check the inaccuracy of the two-point correlation functions (2pCFs) computed using COLA and find it to be 5-10 times smaller than the intrinsic systematics of the tomographic AP method, indicating that COLA is accurate enough for estimating the systematics. Finally, we test the effect of halo bias and find a $\lesssim 1.5\%$ change in $\hat{\xi}_{\Delta s}$ when varying the halo mass within the range $2\times 10^{12}$ to $10^{14}\,M_{\odot}$. We will perform further studies to achieve an accurate and efficient estimation of the systematics over the redshift range $z = 0$ to $1.5$.

preprint2020arXiv

Cost of quantum entanglement simplified

Quantum entanglement is a key physical resource in quantum information processing that allows for performing basic quantum tasks such as teleportation and quantum key distribution, which are impossible in the classical world. Ever since the rise of quantum information theory, it has been an open problem to quantify entanglement in an information-theoretically meaningful way. In particular, every previously defined entanglement measure bearing a precise information-theoretic meaning is not known to be efficiently computable, or if it is efficiently computable, then it is not known to have a precise information-theoretic meaning. In this Letter, we meet this challenge by introducing an entanglement measure that has a precise information-theoretic meaning as the exact cost required to prepare an entangled state when two distant parties are allowed to perform quantum operations that completely preserve the positivity of the partial transpose. Additionally, this entanglement measure is efficiently computable by means of a semidefinite program, and it bears a number of useful properties such as additivity and faithfulness. Our results bring key insights into the fundamental entanglement structure of arbitrary quantum states, and they can be used directly to assess and quantify the entanglement produced in quantum-physical experiments.

preprint2020arXiv

Cross-Channel Intragroup Sparsity Neural Network

Modern deep neural networks rely on overparameterization to achieve state-of-the-art generalization, but overparameterized models are computationally expensive. Network pruning is often employed to obtain less demanding models for deployment. Fine-grained pruning removes individual weights in parameter tensors and can achieve a high model compression ratio with little accuracy degradation. However, it introduces irregularity into the computing dataflow and often does not improve model inference efficiency in practice. Coarse-grained model pruning, while realizing satisfactory inference speedup by removing network weights in groups (e.g., an entire filter), often leads to significant accuracy degradation. This work introduces the cross-channel intragroup (CCI) sparsity structure, which avoids the inference inefficiency of fine-grained pruning while maintaining outstanding model performance. We then present a novel training algorithm designed to perform well under the constraint imposed by CCI-Sparsity. Through a series of comparative experiments, we show that our proposed CCI-Sparsity structure and the corresponding pruning algorithm outperform prior art in inference efficiency by a substantial margin, given suitable hardware acceleration in the future.

preprint2020arXiv

Crossing-Domain Generative Adversarial Networks for Unsupervised Multi-Domain Image-to-Image Translation

State-of-the-art techniques in Generative Adversarial Networks (GANs) have shown remarkable success in image-to-image translation from a peer domain X to a domain Y using paired image data. However, obtaining abundant paired data is a non-trivial and expensive process in most applications. When images must be translated across n domains and training is performed between every pair of domains, the training complexity increases quadratically. Moreover, training with data from only two domains at a time cannot benefit from data in other domains, which prevents the extraction of more useful features and hinders progress in this research area. In this work, we propose a general framework for unsupervised image-to-image translation across multiple domains, which can translate images from domain X to any other domain without requiring direct training between the two domains involved. A byproduct of the framework is reduced computing time and resources, since it needs less time than training the domains in pairs, as is done in state-of-the-art works. Our proposed framework consists of a pair of encoders along with a pair of GANs that learn high-level features across different domains to generate diverse and realistic samples. Our framework shows competitive results on many image-to-image tasks compared with state-of-the-art techniques.

preprint2020arXiv

Deep Learning for Learning Graph Representations

Mining graph data has become a popular research topic in computer science and has been widely studied in both academia and industry, given the increasing amount of network data in recent years. However, the huge amount of network data poses great challenges for efficient analysis. This motivates graph representation, which maps a graph into a low-dimensional vector space while keeping the original graph structure and supporting graph inference. Because efficient graph representation has profound theoretical significance and practical importance, we introduce some basic ideas in graph representation/network embedding, as well as some representative models, in this chapter.

preprint2020arXiv

Deep Visual Odometry with Adaptive Memory

We propose a novel deep visual odometry (VO) method that considers global information by selecting memory and refining poses. Existing learning-based methods treat the VO task as a pure tracking problem, recovering camera poses from image snippets, which leads to severe error accumulation. Global information is crucial for alleviating accumulated errors. However, it is challenging to effectively preserve such information in end-to-end systems. To deal with this challenge, we design an adaptive memory module, which progressively and adaptively saves information from local to global in a neural analogue of memory, enabling our system to process long-term dependencies. Benefiting from the global information in memory, previous results are further refined by an additional refining module. Guided by previous outputs, we adopt spatial-temporal attention to select features for each view based on co-visibility in the feature domain. Our architecture, consisting of Tracking, Remembering and Refining modules, thus works beyond tracking. Experiments on the KITTI and TUM-RGBD datasets demonstrate that our approach outperforms state-of-the-art methods by large margins and produces competitive results against classic approaches in regular scenes. Moreover, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic algorithms tend to fail.

preprint2020arXiv

Design Choices for X-vector Based Speaker Anonymization

The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection. To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their knowledge of the anonymization scheme. The Equal Error Rate (EER) achieved by the attackers and the decoding Word Error Rate (WER) over anonymized data are reported as the measures of privacy and utility. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
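
Privacy here is reported as the attacker's Equal Error Rate (EER): the higher the EER, the harder it is for the verification system to link anonymized speech to the true speaker. As a minimal sketch of how an EER can be read off verification scores, the function below sweeps a decision threshold over all observed scores (accepting a trial when its score is at least the threshold) and averages the false acceptance and false rejection rates where they are closest. This threshold-sweep approximation is our illustrative choice; the challenge's official scoring tools may interpolate the ROC curve differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    A trial is accepted when score >= threshold. Returns the mean of the false
    acceptance rate (FAR) and false rejection rate (FRR) at the threshold where
    the two rates are closest.
    """
    best_gap, best_eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

For instance, perfectly separated target and non-target scores give an EER of 0, i.e., no privacy against that attacker.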

preprint2020arXiv

Domain Embedded Multi-model Generative Adversarial Networks for Image-based Face Inpainting

Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the resolution of the generated missing portion without explicitly considering the particular characteristics of the human face, and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting face images with large cropped regions. We first represent only the face regions using a latent variable as domain knowledge and combine it with the textures of non-face parts to generate high-quality face images with plausible contents. Finally, two adversarial discriminators are used to judge whether the generated distribution is close to the real distribution. The model can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions that are consistent in structure and appearance. Experiments on both the CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieves state-of-the-art performance and generates higher-quality inpainting results than existing methods.

preprint2020arXiv

Duplication of Windows Services

OS-level virtualization techniques, which virtualize system resources at the system call interface, have the distinct advantage of smaller run-time resource requirements compared with HAL-level virtualization techniques, and thus form an important building block for virtualizing parallel and distributed applications such as HPC clusters. Because the Windows operating system places certain critical functionalities in privileged user-level system service processes, a complete OS-level virtualization solution for the Windows platform requires duplicating Windows services such as the Remote Procedure Call Server Service (RPCSS). As many implementation details of Windows system services are proprietary, duplicating them becomes the key technical challenge for virtualizing the Windows platform at the OS level. Moreover, as a core component of cloud computing, IIS web server-related services need to be duplicated in containers (i.e., OS-level virtual machines), but so far no such scheme exists. In this paper, we thoroughly identify all issues that affect service duplication, and then propose the first known methodology to systematically duplicate both system and ordinary Windows services. Our experiments show that the methodology can duplicate a set of system and ordinary services on different versions of Windows OS.

preprint2020arXiv

Eco-evolutionary dynamics with environmental feedback: cooperation in a changing world

Eco-evolutionary game dynamics, which characterize the mutual interactions and coupled evolution of strategies and environments, have attracted growing interest in recent years. Since such feedback loops widely exist in a range of coevolutionary systems, such as microbial, social-ecological, and psychological-economic systems, recent modeling frameworks that unveil the oscillating dynamics of social dilemmas have great potential for practical applications. In this perspective article, we review the latest progress of evolutionary game theory in this direction. We describe both mathematical methods and interdisciplinary applications across different fields. Ideas worthy of further consideration are discussed in the prospects, with a central focus on promoting cooperation in a changing world.

preprint2020arXiv

Efficiently computable bounds for magic state distillation

Magic-state distillation (or non-stabilizer state manipulation) is a crucial component in the leading approaches to realizing scalable, fault-tolerant, and universal quantum computation. Related to non-stabilizer state manipulation is the resource theory of non-stabilizer states, one of whose goals is to characterize and quantify the non-stabilizerness of a quantum state. In this paper, we introduce the family of thauma measures to quantify the amount of non-stabilizerness in a quantum state, and we exploit this family of measures to address several open questions in the resource theory of non-stabilizer states. As a first application, we establish the hypothesis testing thauma as an efficiently computable benchmark for the one-shot distillable non-stabilizerness, which in turn leads to a variety of bounds on the rate at which non-stabilizerness can be distilled, as well as on the overhead of magic-state distillation. We then prove that the max-thauma can be used as an efficiently computable tool for benchmarking the efficiency of magic-state distillation and that it can outperform previous approaches based on mana. Finally, we use the min-thauma to bound a quantity known in the literature as the "regularized relative entropy of magic." As a consequence of this bound, we find that two classes of states with maximal mana, a previously established non-stabilizerness measure, cannot be interconverted in the asymptotic regime at a rate equal to one. This result resolves a basic question in the resource theory of non-stabilizer states and reveals a difference between the resource theory of non-stabilizer states and other resource theories such as entanglement and coherence.

preprint2020arXiv

Eigen-GNN: A Graph Structure Preserving Plug-in for GNNs

Graph Neural Networks (GNNs) are emerging machine learning models on graphs. Although sufficiently deep GNNs have been shown theoretically to be capable of fully preserving graph structures, most existing GNN models in practice are shallow and essentially feature-centric. We show empirically and analytically that existing shallow GNNs cannot preserve graph structures well. To overcome this fundamental challenge, we propose Eigen-GNN, a simple yet effective and general plug-in module that boosts GNNs' ability to preserve graph structures. Specifically, we integrate the eigenspace of graph structures with GNNs by treating GNNs as a type of dimensionality reduction and expanding the initial dimensionality reduction bases. Without needing to increase depth, Eigen-GNN offers more flexibility in handling both feature-driven and structure-driven tasks, since the initial bases contain both node features and graph structures. We present extensive experimental results demonstrating the effectiveness of Eigen-GNN on tasks including node classification, link prediction, and graph isomorphism tests.

preprint2020arXiv

Energy Management and Trajectory Optimization for UAV-Enabled Legitimate Monitoring Systems

Thanks to their quick deployment and high flexibility, unmanned aerial vehicles (UAVs) can be very useful in current and future wireless communication systems. With a growing number of smart devices and infrastructure-free communication networks, it is necessary to legitimately monitor these networks to prevent crimes. In this paper, a novel framework is proposed to exploit the flexibility of the UAV for legitimate monitoring via joint trajectory design and energy management. The system includes a suspicious transmission link with a terrestrial transmitter and a terrestrial receiver, and a UAV that monitors the suspicious link. The UAV can adjust its position and send jamming signals to the suspicious receiver to ensure successful eavesdropping. Based on this model, we first develop an approach to minimize the overall jamming energy consumption of the UAV. Building on a judicious (re-)formulation, an alternating optimization approach is developed to compute a locally optimal solution in polynomial time. Furthermore, we model and include the propulsion power to minimize the overall energy consumption of the UAV. Leveraging the successive convex approximation method, an effective iterative approach is developed to find a feasible solution fulfilling the Karush-Kuhn-Tucker (KKT) conditions. Extensive numerical results are provided to verify the merits of the proposed schemes.

preprint2020arXiv

Entangling Nuclear Spins by Dissipation in a Solid-state System

Entanglement is a fascinating feature of quantum mechanics and a key ingredient in most quantum information processing tasks. Yet the generation of entanglement is usually hampered by undesired dissipation owing to the inevitable coupling of a system with its environment. Here, we report an experiment on how to entangle two $^{13}$C nuclear spins via engineered dissipation in a nitrogen-vacancy system. We utilize the electron spin as an ancilla, and combine unitary processes together with optical pumping of the ancilla to implement the engineered dissipation and deterministically produce an entangled state of the two nuclear spins, independent of their initial states. Our experiment demonstrates the power of engineered dissipation as a tool for generation of multi-qubit entanglement in solid-state systems.

preprint2020arXiv

Evolution of Ethereum: A Temporal Graph Perspective

Ethereum is one of the most popular blockchain systems; it supports more than half a million transactions every day and fosters miscellaneous decentralized applications with its Turing-complete smart contract machine. However, it remains unclear what Ethereum's transaction patterns are and how they evolve over time. In this paper, we study the evolutionary behavior of Ethereum transactions from a temporal graph perspective. We first develop a data analytics platform to collect external transactions associated with users as well as internal transactions initiated by smart contracts. Three types of temporal graphs (user-to-user, contract-to-contract, and user-contract graphs) are constructed according to trading relationships and segmented with an appropriate time window. We observe a strong correlation between the size of the user-to-user transaction graph and the average Ether price in a time window, while no evidence of such a link is found for the average degree, average edge weight, or average triplet closure duration. We validate the macroscopic and microscopic burstiness of Ethereum transactions. We analyze the Gini indexes of the transaction graphs and of user wealth, and find Ethereum to have been very unfair since the very beginning; in a sense, "the rich is already very rich".
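
The inequality finding rests on the Gini index. The abstract does not say which estimator was used, so the sketch below assumes the common mean-absolute-difference definition, G = sum over i, j of |x_i - x_j| / (2 n^2 mean), applied to a plain list of non-negative values such as per-account balances or node degrees.

```python
def gini(values):
    """Gini index via the mean absolute difference of all ordered pairs.

    Returns 0 for perfect equality and (n - 1) / n when a single holder owns everything.
    This direct form is O(n^2); sorting-based formulas scale better on large ledgers.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)
```

For example, four equal balances give 0.0, while four accounts where one holds everything give 0.75.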

preprint2020arXiv

Ferroelastic-switching-driven colossal shear strain and piezoelectricity in a hybrid ferroelectric

Materials that can produce large controllable strains are widely used in shape memory devices, actuators and sensors. Great efforts have been made to improve the strain outputs of various material systems. Among them, ferroelastic transitions underpin giant reversible strains in electrically driven ferro/piezoelectrics and thermally or magnetically driven shape memory alloys. However, large-strain ferroelastic switching in conventional ferroelectrics is very challenging, while magnetic and thermal controls are not desirable for applications. Here, we demonstrate an unprecedentedly large shear strain of up to 21.5% in a hybrid ferroelectric, $\mathrm{C_6H_5N(CH_3)_3CdCl_3}$. The strain response is about two orders of magnitude higher than those of top-performing conventional ferroelectric polymers and oxides. It is achieved via inorganic bond switching and facilitated by the structural confinement of the large organic moieties, which prevents the undesired 180-degree polarization switching. Furthermore, Br substitution can effectively soften the bonds and result in a giant shear piezoelectric coefficient ($d_{35} \approx 4800$ pm/V) at the Br-rich end of the solid solution, $\mathrm{C_6H_5N(CH_3)_3CdBr_{3x}Cl_{3(1-x)}}$. The superior electromechanical properties of these compounds promise potential in lightweight and high-energy-density devices, and the strategy described here should inspire the development of next-generation piezoelectrics and electroactive materials based on hybrid ferroelectrics.

preprint2020arXiv

Forecast for FAST: from Galaxies Survey to Intensity Mapping

The Five-Hundred-Meter Aperture Spherical Radio Telescope (FAST) is the largest single-dish radio telescope in the world. In this paper, we make forecasts for the FAST HI large-scale structure survey using mock observations. We consider a drift-scan survey with the L-band 19-beam receiver, which may be commensal with the pulsar search and the Galactic HI survey. We also consider surveys at lower frequencies, using either the current single-feed wide-band receiver or a future multi-beam phased array feed (PAF) in the UHF band. We estimate the number density of detected HI galaxies and the measurement error in their positions, and evaluate the precision of the surveys using both the Fisher matrix and simulated observations. We estimate the measurement error in the HI galaxy power spectrum and find that it is relatively large even at moderate redshifts, as the number of positively detected galaxies drops drastically with increasing redshift. However, good cosmological measurements could be obtained with the intensity mapping technique, in which the large-scale HI distribution is measured without resolving individual galaxies. We estimate the figure of merit (FoM) for the dark energy equation of state for different observation times and find that, with the existing L-band multi-beam receiver, a good measurement of low-redshift large-scale structure can be obtained, complementing existing optical surveys. With a PAF in the UHF band, the constraint can be much stronger, reaching the level of a Dark Energy Task Force (DETF) Stage IV experiment.

preprint2020arXiv

Frustratingly Simple Few-Shot Object Detection

Detecting rare objects from a few examples is an emerging problem. Prior works show that meta-learning is a promising approach, whereas fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. This simple approach outperforms meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of prior methods. However, the high variance of the few samples often makes existing benchmarks unreliable. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons, and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code and pretrained models are available at https://github.com/ucbdrive/few-shot-object-detection.
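
The core idea, updating only the final layer while everything before it stays frozen, can be illustrated with a toy classifier. The sketch below is not the authors' detector code: a fixed, hypothetical `frozen_features` function stands in for the pre-trained backbone, and only a linear softmax head is trained by plain gradient descent on a handful of examples. The learning rate and epoch count are arbitrary illustrative values.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for a frozen pre-trained backbone; it is never updated."""
    return [x[0], x[1], 1.0]  # two raw features plus a constant bias feature

def finetune_last_layer(samples, num_classes, lr=0.5, epochs=200):
    """Train only a linear softmax head on frozen features with cross-entropy gradient descent."""
    dim = len(frozen_features(samples[0][0]))
    W = [[0.0] * dim for _ in range(num_classes)]  # the only trainable parameters
    for _ in range(epochs):
        for x, y in samples:
            f = frozen_features(x)
            logits = [sum(w * v for w, v in zip(row, f)) for row in W]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is softmax_c - onehot_c.
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                for d in range(dim):
                    W[c][d] -= lr * grad * f[d]
    return W

def predict(W, x):
    """Return the class whose head row gives the highest score on the frozen features."""
    f = frozen_features(x)
    scores = [sum(w * v for w, v in zip(row, f)) for row in W]
    return scores.index(max(scores))
```

Because the backbone is untouched, a few labelled examples per rare class only have to fit a small number of head parameters, which is one plausible reason such a simple scheme is competitive in the few-shot regime.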

preprint2020arXiv

Generating Synthetic Magnetism via Floquet Engineering Auxiliary Qubits in Phonon-Cavity-Based Lattice

Gauge magnetic fields are closely related to the breaking of time-reversal symmetry in condensed matter. In the presence of gauge fields, nonreciprocal and topological transport may be observed. Inspired by this, there is a growing effort to realize exotic transport phenomena in optical and acoustic systems. However, because phonons are charge-neutral, realizing an analog magnetic flux for phonons in nanoscale systems remains challenging in both theoretical and experimental studies. Here we propose a novel mechanism to generate a synthetic magnetic field for a phonon lattice by Floquet engineering auxiliary qubits. We find that a longitudinal Floquet drive on the qubit produces a resonant coupling between two detuned acoustic cavities. Specifically, the phase encoded in the longitudinal drive can be exactly transformed into the phonon-phonon hopping. Our proposal is general and can be realized in various types of artificial hybrid quantum systems. Moreover, taking surface-acoustic-wave (SAW) cavities as an example, we show how to generate synthetic magnetic flux for phonon transport. In the presence of synthetic magnetic flux, time-reversal symmetry is broken, which allows the realization of circulator transport and analog Aharonov-Bohm effects for acoustic waves. Finally, we demonstrate that our proposal can be scaled to simulate topological states of matter in quantum acoustodynamics systems.

preprint2020arXiv

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

Large-scale knowledge graphs (KGs) have become increasingly important in current information systems. To expand the coverage of KGs, previous studies on knowledge graph completion needed to collect adequate training instances for newly added relations. In this paper, we consider a novel formulation, zero-shot learning, to avoid this cumbersome curation. For newly added relations, we attempt to learn their semantic features from their text descriptions and thereby recognize facts of unseen relations without seeing any examples. For this purpose, we leverage Generative Adversarial Networks (GANs) to connect the text and knowledge graph domains: the generator learns to generate reasonable relation embeddings from noisy text descriptions alone. Under this setting, zero-shot learning is naturally converted into a traditional supervised classification task. Our method is model-agnostic and could potentially be applied to any version of KG embeddings; empirically, it consistently yields performance improvements on the NELL and Wiki datasets.

preprint2020arXiv

High-fidelity geometric gate for silicon-based spin qubits

High-fidelity manipulation is key to the physical realization of fault-tolerant quantum computation. Here, we present a protocol to realize universal nonadiabatic geometric gates for silicon-based spin qubits. We find that the advantage of geometric gates over dynamical gates depends crucially on the evolution loop used to construct the geometric phase. Under appropriate evolution loops, both the geometric single-qubit gates and the CNOT gate can outperform their dynamical counterparts for both systematic and detuning noises. We also perform randomized benchmarking using noise amplitudes consistent with experiments in silicon. For the static noise model, the averaged fidelities of geometric gates are around 99.90% or above, while for time-dependent $1/f$-type noise, the fidelities are around 99.98% when only detuning noise is present. We also show that the improvement in fidelity of the geometric gates over dynamical ones typically increases with the exponent $\alpha$ of the $1/f$ noise, and the ratio can be as high as 4 when $\alpha \approx 3$. Our results suggest that geometric gates with judiciously chosen evolution loops can be a powerful way to realize high-fidelity quantum gates.

preprint2020arXiv

How fine can fine-tuning be? Learning efficient language models

State-of-the-art performance on language understanding tasks is now achieved with increasingly large networks; the current record holder has billions of parameters. Given a language model pre-trained on massive unlabeled text corpora, only very light supervised fine-tuning is needed to learn a task: the number of fine-tuning steps is typically five orders of magnitude lower than the total parameter count. Does this mean that fine-tuning only introduces small differences from the pre-trained model in the parameter space? If so, can one avoid storing and computing an entire model for each task? In this work, we address these questions by using Bidirectional Encoder Representations from Transformers (BERT) as an example. As expected, we find that the fine-tuned models are close in parameter space to the pre-trained one, with the closeness varying from layer to layer. We show that it suffices to fine-tune only the most critical layers. Further, we find that there are surprisingly many good solutions in the set of sparsified versions of the pre-trained model. As a result, fine-tuning of huge language models can be achieved by simply setting a certain number of entries in certain layers of the pre-trained parameters to zero, saving both task-specific parameter storage and computational cost.
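The storage argument can be illustrated with a minimal Python sketch, assuming the task-specific model is simply the pre-trained model with a binary mask applied to a few chosen layers. How the mask is chosen (which layers, which entries) is not specified here; the layer names, the helper functions and the one-bit-per-entry storage estimate are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List


def apply_sparsity_mask(
    pretrained: Dict[str, List[float]],
    masks: Dict[str, List[bool]],
) -> Dict[str, List[float]]:
    """Build task-specific weights by zeroing masked entries of selected layers.

    Layers without a mask are copied unchanged from the pre-trained model.
    A True entry in a mask means "set this parameter to zero".
    """
    task_weights: Dict[str, List[float]] = {}
    for name, weights in pretrained.items():
        mask = masks.get(name)
        if mask is None:
            task_weights[name] = list(weights)  # shared, untouched layer
            continue
        if len(mask) != len(weights):
            raise ValueError(f"mask size does not match layer {name!r}")
        task_weights[name] = [0.0 if drop else w for w, drop in zip(weights, mask)]
    return task_weights


def task_storage_bits(masks: Dict[str, List[bool]]) -> int:
    """Per-task storage if only the masks are kept: one bit per masked-layer entry."""
    return sum(len(mask) for mask in masks.values())


if __name__ == "__main__":
    pretrained = {"layer1": [0.5, -1.2, 0.3], "layer2": [2.0, 0.1]}
    masks = {"layer2": [False, True]}  # hypothetical: only layer2 is sparsified
    print(apply_sparsity_mask(pretrained, masks))
    print(task_storage_bits(masks))
```

Because every unmasked layer is shared, each additional task costs only its masks, one bit per entry in the sparsified layers, instead of a full floating-point copy of the weights.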

preprint2020arXiv

Influence of Laser Intensity Fluctuation on Single-Cesium Atom Trapping Lifetime in a 1064-nm Microscopic Optical Tweezer

An optical tweezer formed by a strongly focused, single-spatial-mode Gaussian beam from a red-detuned 1064-nm laser can confine a single cesium (Cs) atom at the point of maximum light intensity. Such traps can be used for coherent manipulation of single qubits and for single-photon sources. The trapping lifetime of atoms in optical tweezers is very short because of collisions with background atoms, intensity fluctuations of the tweezer laser, and the residual thermal motion of the atoms. In this paper, we analyzed the influence of the background pressure, the trap frequency of the optical tweezer, and parametric heating on the atomic trapping lifetime. Using an external feedback loop based on an acousto-optical modulator (AOM), we suppressed the time-domain intensity fluctuation of the 1064-nm laser from $\pm 3.360\%$ to $\pm 0.064\%$, with a suppression bandwidth of approximately 33 kHz. The trapping lifetime of a single Cs atom in the microscopic optical tweezer was extended from 4.04 s to 6.34 s.
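The reported figures translate into relative improvements with simple arithmetic, computed here only from the numbers quoted in the abstract:

$$
\frac{3.360\%}{0.064\%} = 52.5, \qquad \frac{6.34\ \mathrm{s}}{4.04\ \mathrm{s}} \approx 1.57
$$

That is, the feedback loop reduces the intensity-fluctuation amplitude about 52-fold, and the single-atom trapping lifetime increases by roughly 57%.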

preprint2020arXiv

Interpretable CNNs for Object Classification

This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. Our method automatically assigns each interpretable filter in a high conv-layer to an object part of a certain category during the learning process. Such explicit knowledge representations in the conv-layers of a CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method on different benchmark CNNs with various structures to demonstrate its broad applicability. Experiments show that our interpretable filters are much more semantically meaningful than traditional filters.

preprint2020arXiv

Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment

We propose and demonstrate a novel machine learning algorithm that assesses pulmonary edema severity from chest radiographs. While large publicly available datasets of chest radiographs and free-text radiology reports exist, only limited numerical edema severity labels can be extracted from radiology reports. This is a significant challenge in learning such models for image classification. To take advantage of the rich information present in the radiology reports, we develop a neural network model that is trained on both images and free-text to assess pulmonary edema severity from chest radiographs at inference time. Our experimental results suggest that the joint image-text representation learning improves the performance of pulmonary edema assessment compared to a supervised model trained on images only. We also show the use of the text for explaining the image classification by the joint model. To the best of our knowledge, our approach is the first to leverage free-text radiology reports for improving the image model performance in this application. Our code is available at https://github.com/RayRuizhiLiao/joint_chestxray.

preprint2020arXiv

Joint User Identification, Channel Estimation, and Signal Detection for Grant-Free NOMA

For massive machine-type communications, centralized control may incur prohibitively high overhead. Grant-free non-orthogonal multiple access (NOMA) provides possible solutions, yet poses new challenges for efficient receiver design. In this paper, we develop a joint user identification, channel estimation, and signal detection (JUICESD) algorithm. We divide the whole detection scheme into two modules: slot-wise multi-user detection (SMD) and combined signal and channel estimation (CSCE). SMD is designed to decouple the transmissions of different users by leveraging approximate message passing (AMP) algorithms, and CSCE is designed to handle the nonlinear coupling of the activity state, channel coefficient, and transmit signal of each user separately. Because exact calculation of the messages exchanged within CSCE and between the two modules is complicated by phase ambiguity, this paper proposes a rotationally invariant Gaussian mixture (RIGM) model and develops an efficient JUICESD-RIGM algorithm. JUICESD-RIGM achieves performance close to that of JUICESD with much lower complexity. Capitalizing on the features of RIGM, we further analyze the performance of JUICESD-RIGM with state evolution techniques. Numerical results demonstrate that the proposed algorithms achieve a significant performance improvement over existing alternatives, and that the derived state evolution method accurately predicts system performance.

preprint2020arXiv

Learning Tuple Compatibility for Conditional Outfit Recommendation

Outfit recommendation requires answering challenging outfit compatibility questions such as 'Which pair of boots and school bag go well with my jeans and sweater?' This is more complicated than conventional similarity search and must consider not only visual aesthetics but also the intrinsic fine-grained and multi-category nature of fashion items. Some existing approaches solve the problem through sequential models or by learning pairwise distances between items. However, most of them consider only coarse category information in defining fashion compatibility, neglecting the fine-grained category information often desired in practical applications. To better define fashion compatibility and more flexibly meet different needs, we propose a novel problem of learning compatibility among multiple tuples (each consisting of an item and category pair) and recommending fashion items that follow the category choices of customers. Our contributions include: 1) designing a Mixed Category Attention Net (MCAN), which integrates both fine-grained and coarse category information into recommendation and learns the compatibility among fashion tuples; MCAN can explicitly and effectively generate diverse and controllable recommendations based on need; and 2) contributing a new dataset, IQON, which follows eastern culture and can be used to test the generalization of recommendation systems. Our extensive experiments on a reference dataset, Polyvore, and on our dataset IQON demonstrate that our method significantly outperforms state-of-the-art recommendation methods.

preprint2020arXiv

Lepton Flavor Mixing and CP Violation in the Minimal Type-(I+II) Seesaw Model with a Modular $A_4$ Symmetry

In this paper, we study the implications of the modular $A_4$ flavor symmetry in constructing a supersymmetric minimal type-(I+II) seesaw model, in which only one right-handed neutrino and two Higgs triplets are introduced to account for the tiny neutrino masses, flavor mixing and CP violation. The right-handed neutrino and the Higgs triplets in this model are assigned to the trivial one-dimensional irreducible representation of the modular group $A_4$. We show that the individual contributions to the neutrino masses from the right-handed neutrino and the Higgs triplet are comparable. We also find that the neutrino mass matrix can possess an approximate $\mu$-$\tau$ reflection symmetry for some specific values of the free model parameters. Moreover, our model predicts relatively large masses for the three light neutrinos and can thus be easily tested in future neutrino experiments.

preprint2020arXiv

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences

We have been investigating rakugo speech synthesis as a challenging example of speech synthesis that entertains audiences. Rakugo is a traditional Japanese form of verbal entertainment, similar to a combination of one-person stand-up comedy and comic storytelling, and is still popular today. In rakugo, a performer plays multiple characters, and conversations or dialogues between the characters drive the story forward. To investigate how closely the quality of synthesized rakugo speech can approach that of professionals' speech, we modeled rakugo speech using Tacotron 2, a state-of-the-art speech synthesis system that can produce speech that sounds as natural as human speech, albeit under limited conditions, and an enhanced version of it with self-attention to better capture long-term dependencies. We also used global style tokens and manually labeled context features to enrich speaking styles. Through a listening test, we measured not only naturalness but also distinguishability of characters, understandability of the content, and the degree of entertainment. Although we found that the speech synthesis models could not yet reach the professional level, the results of the listening test provided interesting insights: 1) to further entertain audiences, we should focus not only on the naturalness of synthesized speech but also on the distinguishability of characters and the understandability of the content; 2) the fundamental frequency (fo) expressions of synthesized speech are poorer than those of human speech, and more entertaining speech should have richer fo expression. Although there is room for improvement, we believe this is an important stepping stone toward achieving entertaining speech synthesis at the professional level.

preprint2020arXiv

More Practical and Adaptive Algorithms for Online Quantum State Learning

Online quantum state learning is a problem recently proposed by Aaronson et al. (2018), in which the learner sequentially predicts $n$-qubit quantum states based on given measurements on states and noisy outcomes. In previous work, the algorithms are worst-case optimal in general but fail to achieve tighter bounds in certain simpler or more practical cases. In this paper, we develop algorithms to advance the online learning of quantum states. First, we show that the Regularized Follow-the-Leader (RFTL) method with Tsallis-2 entropy can achieve an $O(\sqrt{MT})$ total loss with perfect hindsight on the first $T$ measurements with maximum rank $M$. This regret bound depends only on the maximum rank $M$ of the measurements rather than on the number of qubits, and thus takes advantage of low-rank measurements. Second, we propose a parameter-free algorithm based on a classical adjusting learning-rate schedule that achieves a regret depending on the loss of the best states in hindsight, which takes advantage of low-noise outcomes. Besides these more adaptive bounds, we also show that our RFTL algorithm with Tsallis-2 entropy can be implemented efficiently on near-term quantum computing devices, which was not achievable in previous works.

preprint2020arXiv

Multi-modal Deep Analysis for Multimedia

With the rapid development of the Internet and multimedia services in the past decade, a huge amount of user-generated and service-provider-generated multimedia data has become available. These data are heterogeneous and multi-modal in nature, posing great challenges for processing and analysis. Multi-modal data consist of a mixture of data types from different modalities, such as text, images, video, and audio. In this article, we present a deep and comprehensive overview of multi-modal analysis in multimedia. We introduce two scientific research problems: data-driven correlational representation and knowledge-guided fusion for multimedia analysis. We investigate these two problems from the following aspects: 1) multi-modal correlational representation: multi-modal fusion of data across different modalities; and 2) multi-modal data and knowledge fusion: multi-modal fusion of data with domain knowledge. More specifically, for data-driven correlational representation, we highlight three important categories of methods: multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. For knowledge-guided fusion, we discuss approaches for fusing knowledge with data, as well as four exemplar applications that require various kinds of domain knowledge: multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining, and multi-modal recommendation. Finally, we present our insights and future research directions.

preprint2020arXiv

Nearly nondestructive thermometry of labeled cold atoms and application to isotropic laser cooling

We have designed and implemented a straightforward method to deterministically measure the temperature of a selected segment of a cold atom ensemble, and we have also developed an upgraded, nondestructive version of this thermometry. The key idea is to monitor the thermal expansion of the targeted cold atoms after labeling them by manipulating their internal states; the nondestructive property relies on nearly lossless detection via driving a cycling transition. For cold atoms subject to isotropic laser cooling, this method has the unique capability of addressing only the atoms on the optical detection axis within the enclosure, which is precisely the part of interest in major applications such as atomic clocks and quantum sensing. Furthermore, our results confirm the sub-Doppler cooling features of isotropic laser cooling, and we have investigated the relevant cooling properties. We have also applied a recently developed optical configuration in which the cooling laser is injected in the form of hollow beams, which helps enhance the cooling performance and accumulate more cold atoms in the central region.

preprint2020arXiv

New Constructions of Optimal Locally Repairable Codes with Super-Linear Length

As an important coding scheme in modern distributed storage systems, locally repairable codes (LRCs) have attracted considerable attention from the perspectives of both practical applications and theoretical research. As a major topic in LRC research, bounds on and constructions of the corresponding optimal codes are of particular concern. In this work, we consider codes with $(r,\delta)$-locality whose minimum distance is optimal with respect to the bound given by Prakash et al. \cite{Prakash2012Optimal}. Using a parity-check matrix approach, we provide constructions of both optimal $(r,\delta)$-LRCs with all-symbol locality ($(r,\delta)_a$-LRCs) and optimal $(r,\delta)$-LRCs with information locality ($(r,\delta)_i$-LRCs). Generalizing a work of Xing and Yuan \cite{XY19}, these constructions are built on a connection between sparse hypergraphs and optimal $(r,\delta)$-LRCs. With the help of constructions of large sparse hypergraphs, the length of the constructed codes can be super-linear in the alphabet size. This improves upon previous constructions when the minimum distance of the code is at least $3\delta+1$. As two applications, optimal H-LRCs with super-linear length and GSD codes with unbounded length are also constructed.
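For reference, the bound of Prakash et al. for an $[n,k,d]$ code with $(r,\delta)$-locality is usually stated as $d \le n - k + 1 - (\lceil k/r \rceil - 1)(\delta - 1)$, and codes meeting it with equality are called optimal. The short Python helper below (a hypothetical name, not from the paper) simply evaluates this bound; with $\delta = 2$ it reduces to the familiar bound $d \le n - k - \lceil k/r \rceil + 2$ for ordinary $r$-locality.

```python
def prakash_bound(n: int, k: int, r: int, delta: int) -> int:
    """Largest minimum distance allowed by the Singleton-like bound for
    (r, delta)-locality: d <= n - k + 1 - (ceil(k / r) - 1) * (delta - 1)."""
    if not (1 <= r <= k <= n):
        raise ValueError("require 1 <= r <= k <= n")
    if delta < 2:
        raise ValueError("require delta >= 2")
    ceil_k_over_r = (k + r - 1) // r  # exact integer ceiling, no float rounding
    return n - k + 1 - (ceil_k_over_r - 1) * (delta - 1)


if __name__ == "__main__":
    print(prakash_bound(10, 4, 2, 2))  # delta = 2 case: n - k - ceil(k/r) + 2
    print(prakash_bound(10, 4, 4, 3))  # r = k: the plain Singleton bound n - k + 1
```

A construction such as those described above is "optimal" exactly when its minimum distance equals this value for its parameters.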

preprint2020arXiv

On Lattice Packings and Coverings of Asymmetric Limited-Magnitude Balls

We construct integer error-correcting codes and covering codes for the limited-magnitude error channel with more than one error. The codes are lattices that pack or cover the space with the appropriate error ball. Some of the constructions attain an asymptotic packing/covering density that is constant. The results are obtained via various methods, including the use of codes in the Hamming metric, modular $B_t$-sequences, $2$-fold Sidon sets, and sets avoiding arithmetic progressions.
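To make the error ball concrete, here is a minimal Python sketch of one common special case, assumed here for illustration: the one-sided (fully asymmetric) channel in which at most $t$ of the $n$ coordinates are hit, each increased by an integer in $\{1,\dots,k\}$. The setting of the paper may be more general, and the helper names are hypothetical. This ball has $\sum_{i=0}^{t} \binom{n}{i} k^i$ points, and since disjoint lattice translates of the ball fall into distinct cosets of $\Lambda$, a packing lattice $\Lambda \subseteq \mathbb{Z}^n$ needs $\det \Lambda \ge |B|$, with packing density $|B| / \det \Lambda$.

```python
from itertools import combinations, product
from math import comb
from typing import Set, Tuple


def ball_size(n: int, t: int, k: int) -> int:
    """Closed-form size: choose i error positions, each with one of k magnitudes."""
    return sum(comb(n, i) * k**i for i in range(t + 1))


def enumerate_ball(n: int, t: int, k: int) -> Set[Tuple[int, ...]]:
    """All error vectors with at most t nonzero entries, each in {1, ..., k}."""
    ball = set()
    for i in range(t + 1):
        for positions in combinations(range(n), i):
            for magnitudes in product(range(1, k + 1), repeat=i):
                error = [0] * n
                for pos, mag in zip(positions, magnitudes):
                    error[pos] = mag
                ball.add(tuple(error))
    return ball


if __name__ == "__main__":
    n, t, k = 4, 2, 3
    # The brute-force enumeration should agree with the closed form.
    print(ball_size(n, t, k), len(enumerate_ball(n, t, k)))
```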

preprint2020arXiv

Post-Heat Treatment Design of High-Strength Low-Alloy Steels Processed by Laser Powder Bed Fusion

In this study, a post-heat treatment is designed for additively manufactured copper-bearing high-strength low-alloy (HSLA)-100 steel based on an understanding of its process-structure-property relationships. Hot isostatic pressing (HIP) is designed to reduce the porosity of HSLA-100 steel processed by laser powder bed fusion (LPBF) from 3% to less than 1%. Quenching dilatometry is employed to design HIP parameters with a cooling rate optimized to maximize the amount of martensite transformed after HIP. Afterward, a post-heat treatment step with cyclic re-austenitization is introduced for effective grain refinement, compensating for the microstructure coarsened during HIP. Finally, tempering is optimized through microstructure characterization and microhardness measurements. Compared with the as-built HSLA, the post-heat-treated HSLA with tailored microstructure achieves a two-fold increase in yield strength.

preprint2020arXiv

Public discourse and social network echo chambers driven by socio-cognitive biases

In recent years, social media has increasingly become an important platform for political campaigns, especially elections. It remains elusive how exactly public discourse is driven by the intricate interplay between individual socio-cognitive biases, dueling campaign efforts, and social media platforms. We examine this complex socio-political process by integrating observed retweet networks from the 2016 political networks with an agent-based model of political opinion formation and network structure. Here we show that the range of political viewpoints individuals are willing to consider is a key determinant in the formation of polarized networks and the emergence of echo chambers. We also find that winning majority support in public discourse is determined by both the effort exerted by campaigns and the relative ideological positioning of opposing campaigns. Our results demonstrate how public discourse and political polarization can be modeled as an interactive process of shifting individual opinions, evolving social networks, and political campaigns.

preprint2020arXiv

Quantum algorithms for hedging and the learning of Ising models

A paradigmatic algorithm for online learning is the Hedge algorithm by Freund and Schapire. An allocation across different strategies is chosen in each of multiple rounds, and each round incurs a loss for each strategy. The algorithm obtains a favorable guarantee on the total loss even in an adversarial setting. This work presents quantum algorithms for such online learning in an oracular setting. For $T$ time steps and $N$ strategies, we exhibit run times of about $O({\rm poly}(T) \sqrt{N})$ for estimating the losses and for betting on individual strategies by sampling. In addition, we discuss a quantum analogue of the Sparsitron, a machine learning algorithm based on the Hedge algorithm. The quantum algorithm inherits the provable learning guarantees of the classical algorithm and exhibits polynomial speedups. The speedups may find relevance in finance, for example for hedging risks, and in machine learning, for example for learning generalized linear models or Ising models.
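The classical Hedge update that the quantum algorithms build on fits in a few lines of Python. This is a minimal sketch of the standard multiplicative-weights form, written with $e^{-\eta \ell}$ in place of Freund and Schapire's $\beta^{\ell}$ (the two coincide for $\beta = e^{-\eta}$); it is not the quantum algorithm, and the learning rate `eta` and the function names are illustrative choices.

```python
import math
from typing import List, Sequence


def hedge(losses: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Run Hedge over T rounds and return the allocation used in each round.

    losses[t][i] is the loss (assumed to lie in [0, 1]) of strategy i in round t.
    """
    n = len(losses[0])
    weights = [1.0] * n  # start from the uniform allocation
    allocations = []
    for round_losses in losses:
        total = sum(weights)
        allocations.append([w / total for w in weights])
        # Multiplicative update: strategies that lost more are down-weighted more.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, round_losses)]
    return allocations


def total_expected_loss(
    losses: Sequence[Sequence[float]], allocations: Sequence[Sequence[float]]
) -> float:
    """Sum over rounds of the allocation-weighted loss."""
    return sum(
        sum(p * loss for p, loss in zip(alloc, round_losses))
        for alloc, round_losses in zip(allocations, losses)
    )


if __name__ == "__main__":
    losses = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # strategy 1 is always better
    allocs = hedge(losses, eta=1.0)
    print([round(a[1], 3) for a in allocs])
    print(total_expected_loss(losses, allocs))
```

In this toy run the weight on the better strategy grows every round, so the cumulative expected loss stays well below that of always splitting evenly.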

preprint2020arXiv

Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier

Generative classifiers have shown promise in detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax (SDIM) is a scalable end-to-end framework for learning generative classifiers. In this paper, we propose a modification of SDIM termed SDIM-logit. Instead of training a generative classifier from scratch, SDIM-logit first takes as input the logits produced by any given discriminative classifier and generates logit representations; a generative classifier is then derived by imposing statistical constraints on these logit representations. SDIM-logit can inherit the performance of the discriminative classifier without loss, incurs a negligible number of additional parameters, and can be trained efficiently with the base classifier fixed. We perform classification with rejection, in which test samples whose class-conditional likelihoods fall below pre-chosen thresholds are rejected without a prediction. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution (OOD) samples, show that when allowed to reject a portion of test samples, SDIM-logit significantly improves performance on the remaining test samples.
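Classification with rejection can be sketched as a thresholding step on top of any model that outputs class-conditional scores. This minimal Python sketch assumes per-class log-likelihood scores, one threshold per class applied to the predicted class, and accuracy measured on the accepted samples only; the function names and the exact rejection rule are assumptions for illustration, not SDIM-logit itself.

```python
from typing import List, Optional, Sequence


def classify_with_rejection(
    log_likelihoods: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> List[Optional[int]]:
    """Predict the class with the highest class-conditional log-likelihood,
    or None (reject) if that score is below the predicted class's threshold."""
    predictions: List[Optional[int]] = []
    for scores in log_likelihoods:
        pred = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(pred if scores[pred] >= thresholds[pred] else None)
    return predictions


def accuracy_on_accepted(
    predictions: Sequence[Optional[int]], labels: Sequence[int]
) -> Optional[float]:
    """Accuracy over non-rejected samples; None if every sample was rejected."""
    kept = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    if not kept:
        return None
    return sum(p == y for p, y in kept) / len(kept)


if __name__ == "__main__":
    scores = [[-1.0, -5.0], [-9.0, -8.5], [-2.0, -0.5]]
    preds = classify_with_rejection(scores, thresholds=[-3.0, -3.0])
    print(preds, accuracy_on_accepted(preds, labels=[0, 0, 1]))
```

Here the second sample is rejected because even its best class-conditional score is low, which is the behavior the rejection scheme relies on for adversarial and out-of-distribution inputs.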

preprint2020arXiv

Resonant Beam Communications with Photovoltaic Receiver for Optical Data and Power Transfer

Sixth-generation (6G) mobile communication systems are expected to adopt free-space optical (FSO) communication and wireless power transfer (WPT). Laser-based WPT or wireless information transfer (WIT) usually faces challenges of mobility and safety. We present a mobile and safe resonant beam communication (RBCom) system that can realize high-rate simultaneous wireless information and power transfer (SWIPT). We propose an analytical model to describe its carrier-beam and information-transfer procedures. Numerical results show that RBCom can achieve more than 40 mW of charging power and 1.6 Gbit/s of channel capacity with an orthogonal frequency division multiplexing (OFDM) scheme, and can be applied in future scenarios where power and high-rate data are both required.

preprint2020arXiv

Reverberation Modeling for Source-Filter-based Neural Vocoder

This paper presents a reverberation module for source-filter-based neural vocoders that improves the performance of reverberant effect modeling. This module uses the output waveform of neural vocoders as an input and produces a reverberant waveform by convolving the input with a room impulse response (RIR). We propose two approaches to parameterizing and estimating the RIR. The first approach assumes a global time-invariant (GTI) RIR and directly learns the values of the RIR on a training dataset. The second approach assumes an utterance-level time-variant (UTV) RIR, which is invariant within one utterance but varies across utterances, and uses another neural network to predict the RIR values. We add the proposed reverberation module to the phase spectrum predictor (PSP) of a HiNet vocoder and jointly train the model. Experimental results demonstrate that the proposed module was helpful for modeling the reverberation effect and improving the perceived quality of generated reverberant speech. The UTV-RIR was shown to be more robust than the GTI-RIR to unknown reverberation conditions and achieved a perceptually better reverberation effect.
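The core operation of the module, convolving the vocoder output with an RIR, can be sketched in plain Python. This is a minimal illustration assuming a fixed, already-known RIR (closest to the GTI variant, where one RIR is shared by all utterances; in the UTV variant a separate network would predict `rir` per utterance). The function name is hypothetical, and a practical implementation would typically use FFT-based convolution on arrays.

```python
from typing import List, Sequence


def apply_rir(dry: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution y[n] = sum_m dry[m] * rir[n - m].

    The output is len(dry) + len(rir) - 1 samples long, so the reverberant
    tail extends past the end of the dry signal.
    """
    out = [0.0] * (len(dry) + len(rir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(rir):
            out[i + j] += x * h
    return out


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0]
    rir = [1.0, 0.5, 0.25]  # hypothetical decaying RIR: direct path plus two echoes
    print(apply_rir(impulse, rir))  # an impulse input reproduces the RIR (zero-padded)
```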

preprint2020arXiv

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language. Overcoming this challenge requires the ability to perform a wide variety of complex tasks in response to multifarious instructions from humans. In the hope that it might drive progress towards more flexible and powerful human interactions with robots, we propose a dataset of varied and complex robot tasks, described in natural language, in terms of objects visible in a large set of real images. Given an instruction, success requires navigating through a previously unseen environment to identify an object. This represents a practical challenge, but one that closely reflects one of the core visual problems in robotics. Several state-of-the-art vision-and-language navigation and referring-expression models are tested to verify the difficulty of this new task, but none of them shows promising results because of many fundamental differences between our task and previous ones. A novel Interactive Navigator-Pointer model is also proposed that provides a strong baseline on the task. In particular, the proposed model achieves the best performance on the unseen test split, but still leaves substantial room for improvement compared with human performance.

preprint2020arXiv

Rydberg level shift due to the electric field generated by Rydberg atom collision induced ionization in cesium atomic ensemble

We experimentally studied the Rydberg level shift caused by the electric field generated by collision-induced ionization of Rydberg atoms in a cesium atomic ensemble. The density of charged particles produced by collisions between Rydberg atoms is varied by controlling the ground-state atomic density and the optical excitation process. We measured the Rydberg level shift using Rydberg electromagnetically induced transparency (EIT) spectroscopy and interpreted its physical origin using a semi-classical model. The experimental results are in good agreement with the numerical simulation. These energy shifts are important for self-calibrated sensing of microwave fields using Rydberg EIT. Moreover, in contrast to the resonant excitation case, narrow-linewidth spectroscopy with high signal-to-noise ratio would be useful for high-precision measurements.

preprint2020arXiv

Scattering medium: randomly packed pinhole cameras

When light travels through scattering media, speckles (spatially random distributions of fluctuating intensity) form because of the interference of light travelling along different optical paths, preventing perception of the structure, absolute location, and dimensions of a target within or on the other side of the medium. Currently, prevailing techniques such as wavefront shaping, optical phase conjugation, scattering matrix measurement, and speckle autocorrelation imaging can only picture the target structure in the absence of prior information. Here we show that a scattering medium can be conceptualized as an assembly of randomly packed pinhole cameras, and the corresponding speckle pattern is a superposition of randomly shifted pinhole images. This provides a new perspective that bridges target, scattering medium, and speckle pattern, allowing one to localize and profile a target quantitatively from speckle patterns perceived from the other side of the scattering medium, which is impossible with all existing methods. The method also allows us to interpret some phenomena of diffusive light that are otherwise challenging to understand: for example, why the morphological appearance of speckle patterns changes with the target, why information is difficult to extract from thick scattering media, and what determines the capability of seeing through scattering media. In summary, the concept, whilst in its infancy, opens a new door to unveiling scattering media and extracting information from them in real time.

preprint2020arXiv

Self-Supervised Deep Visual Odometry with Online Adaptation

Self-supervised visual odometry (VO) methods have shown great success in jointly estimating camera pose and depth from videos. However, like most data-driven methods, existing VO networks suffer from a notable decrease in performance when confronted with scenes different from the training data, which makes them unsuitable for practical applications. In this paper, we propose an online meta-learning algorithm that enables VO networks to continuously adapt to new environments in a self-supervised manner. The proposed method uses convolutional long short-term memory (convLSTM) to aggregate rich spatial-temporal information from the past. The network is able to memorize and learn from its past experience for better estimation and fast adaptation to the current frame. When running VO in the open world, to deal with the changing environment, we propose an online feature alignment method that aligns feature distributions at different times. Our VO network is able to seamlessly adapt to different environments. Extensive experiments on unseen outdoor scenes, virtual-to-real-world, and outdoor-to-indoor environments demonstrate that our method consistently and considerably outperforms state-of-the-art self-supervised VO baselines.

preprint2020arXiv

Spin-orbit coupling and spin-triplet pairing symmetry in $\mathrm{Sr_2 Ru O_4}$

Spin-orbit coupling (SOC) plays a crucial role in determining the spin structure of an odd-parity pseudospin-triplet Cooper pairing state. Here, we present a thorough study of how SOC lifts the degeneracy among different p-wave pseudospin-triplet pairing states in a widely used microscopic model for $\mathrm{Sr_2 Ru O_4}$, combining a Ginzburg-Landau (GL) free energy expansion, a symmetry analysis of the model, and numerical weak-coupling renormalization group (RG) and random phase approximation (RPA) calculations. These analyses are then used to critically re-examine previous numerical results on the stability of chiral p-wave pairing. The symmetry analysis can serve as a guide for future studies, especially numerical calculations, on the pairing instability in $\mathrm{Sr_2 Ru O_4}$ and can be useful for studying other multi-band spin-triplet superconductors where SOC plays an important role.

preprint2020arXiv

Stacking fault energy prediction for austenitic steels: thermodynamic modeling vs. machine learning

Stacking fault energy (SFE) is one of the most critical microstructural attributes for controlling the deformation mechanism and optimizing the mechanical properties of austenitic steels, yet there are no accurate and straightforward computational tools for modeling it. In this work, we applied both thermodynamic modeling and machine learning to predict the SFE of more than 300 austenitic steels. The comparison indicates a strong need to improve low-temperature CALPHAD (CALculation of PHAse Diagrams) databases and interfacial energy prediction to enhance the reliability of thermodynamic models. Ensemble machine learning algorithms provide more reliable predictions than thermodynamic and empirical models. Based on statistical analysis of the experimental results, only Ni and Fe have a moderate monotonic influence on SFE, while many other elements exhibit complex effects, in that their influence on SFE may change with alloy composition.

preprint2020arXiv

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs. The reliability of spoofing CMs is typically gauged using the equal error rate (EER) metric. The basic EER fails to reflect application requirements and the impact of spoofing and CMs upon ASV; its use as a primary metric in traditional ASV research has long been abandoned in favour of risk-based approaches to assessment. This paper presents several new extensions to the tandem detection cost function (t-DCF), a recent risk-based approach to assessing the reliability of spoofing CMs deployed in tandem with an ASV system. Extensions include a simplified version of the t-DCF with fewer parameters, an analysis of a special case for a fixed ASV system, simulations which give original insights into its interpretation, and new analyses using the ASVspoof 2019 database. It is hoped that adoption of the t-DCF for CM assessment will help to foster closer collaboration between the anti-spoofing and ASV research communities.
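Since the t-DCF is motivated as a replacement for the EER, it helps to recall how the EER itself is obtained from CM scores. The following minimal Python sketch sweeps a threshold over the observed scores and returns the operating point where the miss rate (bona fide rejected) and false-alarm rate (spoof accepted) are closest; it assumes higher scores mean "more likely bona fide", and the function name is illustrative. It is not an implementation of the t-DCF.

```python
from typing import Sequence


def equal_error_rate(bona_fide: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    A sample is accepted as bona fide when its score >= threshold.
    """
    best_gap = float("inf")
    eer = 1.0
    for th in sorted(set(bona_fide) | set(spoof)):
        miss = sum(s < th for s in bona_fide) / len(bona_fide)  # bona fide rejected
        false_alarm = sum(s >= th for s in spoof) / len(spoof)  # spoof accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


if __name__ == "__main__":
    bona_fide_scores = [0.9, 0.8, 0.7, 0.3]
    spoof_scores = [0.1, 0.2, 0.4, 0.85]
    print(equal_error_rate(bona_fide_scores, spoof_scores))
```

The EER summarizes the CM alone at a single operating point, which is exactly the limitation the t-DCF addresses by accounting for the ASV system, priors, and costs.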

preprint2020arXiv

Task-Aware Feature Generation for Zero-Shot Compositional Learning

Visual concepts (e.g., red apple, big elephant) are often semantically compositional, and each element of the compositions can be reused to construct novel concepts (e.g., red elephant). Compositional feature synthesis, which generates image feature distributions exploiting the semantic compositionality, is a promising approach to sample-efficient model generalization. In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts. These synthetic features are then used to train a classifier to recognize novel concepts in a zero-shot manner. Our novel TFG design injects task-conditioned noise layer by layer, producing task-relevant variation at each level. We find that the proposed generator design improves classification accuracy and sample efficiency. Our model establishes a new state of the art on three zero-shot compositional learning (ZSCL) benchmarks, outperforming the previous discriminative models by a large margin. Our model improves on the performance of prior art by over 2x in the generalized ZSCL setting.

preprint2020arXiv

The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain

In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression. Just like perceptual and cognitive neurophysiology has inspired effective deep neural network architectures which in turn make a useful model for understanding the brain, here we explore how biological neural development might inspire efficient and robust optimization procedures which in turn serve as a useful model for the maturation and aging of the brain.

preprint2020arXiv

The three-level coupled Maxwell-Bloch equations: rogue waves, semirational rogue waves and W-shaped solitons

In this paper, the coupled Maxwell-Bloch equations, which describe the propagation of two optical pulses in an optical medium with coherent three-level atoms, are studied by Darboux transformation. The general nth-order rogue wave solution, involving two different choices of multiple roots for the spectral characteristic equation, and the multiparametric nth-order semirational solution are both obtained in terms of Schur polynomials. Explicit rogue wave and semirational solutions from first to second order are provided. In contrast to the known Peregrine soliton, dark and four-petaled structures, some unusual patterns such as triple-hole, twisted-pair, composite four-petaled and composite dark rogue waves are put forward. Moreover, the interaction between a dark-bright soliton and a dark rogue wave and the interaction between a breather and a dark rogue wave are shown. Further, higher-order nonlinear superposition modes featuring triple and quadruple temporal-spatial distributions are presented. Finally, a state transition between rogue waves and W-shaped solitons is found in the regime where the modulation instability growth rate tends to zero at low perturbation frequency. In particular, dark and double-peak W-shaped solitons are examined.

preprint2020arXiv

Time-dependent Hamiltonian simulation with $L^1$-norm scaling

The difficulty of simulating quantum dynamics depends on the norm of the Hamiltonian. When the Hamiltonian varies with time, the simulation complexity should only depend on this quantity instantaneously. We develop quantum simulation algorithms that exploit this intuition. For sparse Hamiltonian simulation, the gate complexity scales with the $L^1$ norm $\int_{0}^{t}\mathrm{d}\tau\left\lVert H(\tau)\right\rVert_{\max}$, whereas the best previous results scale with $t\max_{\tau\in[0,t]}\left\lVert H(\tau)\right\rVert_{\max}$. We also show analogous results for Hamiltonians that are linear combinations of unitaries. Our approaches thus provide an improvement over previous simulation algorithms that can be substantial when the Hamiltonian varies significantly. We introduce two new techniques: a classical sampler of time-dependent Hamiltonians and a rescaling principle for the Schrödinger equation. The rescaled Dyson-series algorithm is nearly optimal with respect to all parameters of interest, whereas the sampling-based approach is easier to realize for near-term simulation. These algorithms could potentially be applied to semi-classical simulations of scattering processes in quantum chemistry.
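To make the gap between the two scalings concrete, here is a small numerical sketch comparing $\int_0^t \lVert H(\tau)\rVert_{\max}\,\mathrm{d}\tau$ with $t\max_\tau \lVert H(\tau)\rVert_{\max}$, where $\lVert\cdot\rVert_{\max}$ is the largest absolute matrix entry. The 2×2 pulsed Hamiltonian, the grid size, and the use of the trapezoidal rule are illustrative assumptions; this only evaluates the two norms and is not a simulation algorithm.

```python
import math


def max_norm(matrix):
    """||H||_max: the largest absolute value among the matrix entries."""
    return max(abs(x) for row in matrix for x in row)


def l1_vs_worst_case(hamiltonian, t, steps=10_000):
    """Compare the L^1 norm of ||H(tau)||_max over [0, t] with t * max ||H(tau)||_max.

    Both quantities are evaluated on a uniform grid of steps + 1 points; the
    integral uses the trapezoidal rule, so it is an approximation.
    """
    dt = t / steps
    norms = [max_norm(hamiltonian(k * dt)) for k in range(steps + 1)]
    l1_norm = dt * (sum(norms) - 0.5 * (norms[0] + norms[-1]))
    worst_case = t * max(norms)
    return l1_norm, worst_case


def pulsed_hamiltonian(tau):
    """Hypothetical 2x2 Hamiltonian with a strong coupling pulse centred at tau = 5."""
    coupling = 100.0 * math.exp(-((tau - 5.0) ** 2) / 0.02)
    return [[1.0, coupling], [coupling, -1.0]]


l1_norm, worst_case = l1_vs_worst_case(pulsed_hamiltonian, t=10.0)
print(f"L1 norm ~ {l1_norm:.1f}, t * max norm ~ {worst_case:.1f}")
```

For this pulse the $L^1$ norm comes out more than an order of magnitude below the worst-case product, while for a constant Hamiltonian the two coincide, which matches the intuition that the improvement matters most when the Hamiltonian varies significantly.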

preprint2020arXiv

Tunable optomechanically induced transparency by controlling the dark-mode effect

We study tunable optomechanically induced transparency by controlling the dark-mode effect induced by two mechanical modes coupled to a common cavity field. This is realized by introducing a phase-dependent phonon-exchange interaction, which is used to form a loop-coupled configuration. Combining this phase-dependent coupling with the optomechanical interactions, the dark-mode effect can be controlled by the quantum interference effect. In particular, the dark-mode effect in this two-mechanical-mode optomechanical system can lead to a double-amplified optomechanically induced transparency (OMIT) window and a higher efficiency of the second-order sideband in comparison with the standard optomechanical system. This is because the effective mechanical decay rate, which is related to the linewidth of the OMIT window, increases twofold in the weak-coupling limit. When the dark-mode effect is broken, controllable double transparency windows appear and the second-order sideband, as well as the light delay or advance, is significantly enhanced. For an N-mechanical-mode optomechanical system, we find that in the presence of the dark-mode effect, the amplification factor of the OMIT window linewidth is nearly proportional to the number of mechanical modes, and that breaking the dark-mode effect turns the single-window OMIT into one with N tunable windows. The study will be useful for optical information storage within a large frequency bandwidth and for multichannel optical communication based on optomechanical systems.

preprint2020arXiv

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

Visual navigation is the task of training an embodied agent to navigate intelligently to a target object (e.g., a television) using only visual observations. A key challenge for current deep reinforcement learning models lies in their requirement for a large amount of training data. It is exceedingly expensive to construct sufficient 3D synthetic environments annotated with target object information. In this paper, we focus on visual navigation in the low-resource setting, where we have only a few training environments annotated with object information. We propose a novel unsupervised reinforcement learning approach to learn transferable meta-skills (e.g., bypassing obstacles, going straight) from unannotated environments without any supervisory signals. The agent can then quickly adapt to visual navigation by learning a high-level master policy that combines these meta-skills once the reward specific to visual navigation is provided. Evaluation in AI2-THOR environments shows that our method significantly outperforms the baseline, with a relative improvement of 53.34% on SPL, and further qualitative analysis demonstrates that our method learns transferable motor primitives for visual navigation.
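SPL (Success weighted by Path Length) is commonly defined as $\frac{1}{N}\sum_i S_i\,\frac{l_i}{\max(p_i, l_i)}$, where $S_i$ marks success, $l_i$ is the shortest-path length, and $p_i$ is the length of the path the agent actually took. The sketch below assumes that standard definition and shows what a "relative" gain on SPL means; the episode data are hypothetical and unrelated to the reported results.

```python
def spl(episodes):
    """Success weighted by Path Length over N episodes.

    Each episode is (success, shortest_path_length, agent_path_length), and
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i).
    """
    episodes = list(episodes)
    total = sum(
        shortest / max(taken, shortest)
        for success, shortest, taken in episodes
        if success
    )
    return total / len(episodes)


def relative_improvement(new_score, baseline_score):
    """Relative gain, e.g. 0.5 means the new score is 50% higher than the baseline."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical episodes: (reached target?, shortest path, path actually taken).
episodes = [(True, 5.0, 10.0), (False, 4.0, 3.0), (True, 2.0, 2.0)]
print(spl(episodes))  # 0.5
```

Failed episodes contribute zero regardless of path length, and a successful but roundabout episode is discounted in proportion to its detour.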

preprint2020arXiv

Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Neural source-filter (NSF) waveform models generate speech waveforms by morphing sine-based source signals through dilated convolution in the time domain. Although the sine-based source signals help the NSF models to produce voiced sounds with specified pitch, the sine shape may constrain the generated waveform when the target voiced sounds are less periodic. In this paper, we propose a more flexible source signal called cyclic noise, a quasi-periodic noise sequence given by the convolution of a pulse train and a static random noise with a trainable decaying rate that controls the signal shape. We further propose a masked spectral loss to guide the NSF models to produce periodic voiced sounds from the cyclic noise-based source signal. Results from a large-scale listening test demonstrated the effectiveness of the cyclic noise and the masked spectral loss on speaker-independent NSF models in copy-synthesis experiments on the CMU ARCTIC database.
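The cyclic-noise construction described above (a pulse train convolved with a decaying static noise sequence) can be sketched in a few lines. This is an illustrative approximation, not the authors' implementation: it assumes a constant F0, a fixed rather than trainable decay time constant `decay_s`, and Gaussian white noise; the function name and parameters are hypothetical.

```python
import math
import random


def cyclic_noise(f0_hz, sample_rate, num_samples, decay_s=0.005, seed=0):
    """Quasi-periodic 'cyclic noise': a pulse train at f0 convolved with a decaying noise burst.

    Simplifying assumptions: constant f0, a fixed (not trainable) decay time
    constant decay_s, and Gaussian white noise as the static random sequence.
    """
    rng = random.Random(seed)
    # Static noise burst shaped by an exponential envelope exp(-k / (decay_s * fs)),
    # truncated after roughly five time constants.
    kernel_len = max(1, int(5 * decay_s * sample_rate))
    kernel = [rng.gauss(0.0, 1.0) * math.exp(-k / (decay_s * sample_rate))
              for k in range(kernel_len)]

    # Convolving with a pulse train is the same as adding a shifted copy of the
    # kernel at every pulse position (one pulse per pitch period).
    period = sample_rate / f0_hz
    signal = [0.0] * num_samples
    pulse = 0
    while True:
        start = int(round(pulse * period))
        if start >= num_samples:
            break
        for k, value in enumerate(kernel):
            if start + k >= num_samples:
                break
            signal[start + k] += value
        pulse += 1
    return signal


# 100 Hz cyclic noise at 16 kHz: one decaying burst every 160 samples.
excitation = cyclic_noise(f0_hz=100.0, sample_rate=16000, num_samples=1600, decay_s=0.001)
```

In this sketch, a very short decay makes each burst close to a single impulse (a strongly periodic source), while a longer decay lets bursts spread across the pitch period and makes the source noisier, which is the kind of shape control the decaying rate is meant to provide.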

preprint2020arXiv

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset, VATEX is multilingual, larger, more linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context. Extensive experiments on the VATEX dataset show, first, that the unified multilingual model can not only produce both English and Chinese descriptions for a video more efficiently, but also offer improved performance over the monolingual models. Furthermore, we demonstrate that the spatiotemporal video context can be effectively utilized to align source and target languages and thus assist machine translation. Finally, we discuss the potential of using VATEX for other video-and-language research.

preprint2020arXiv

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers. We investigate multi-speaker modeling for end-to-end text-to-speech synthesis and study the effects of different types of state-of-the-art neural speaker embeddings on speaker similarity for unseen speakers. Learnable dictionary encoding-based speaker embeddings with angular softmax loss can improve equal error rates over x-vectors in a speaker verification task; these embeddings also improve speaker similarity and naturalness for unseen speakers when used for zero-shot adaptation to new speakers in end-to-end speech synthesis.

preprint2019arXiv

Elliptic Blowup Equations for 6d SCFTs. II: Exceptional Cases

The building blocks of 6d $(1,0)$ SCFTs include certain rank one theories with gauge group $G=SU(3),SO(8),F_4,E_{6,7,8}$. In this paper, we propose a universal recursion formula for the elliptic genera of all such theories. This formula is solved from the elliptic blowup equations introduced in our previous paper. We explicitly compute the elliptic genera and refined BPS invariants, which recover all previous results from topological string theory, modular bootstrap, Hilbert series, 2d quiver gauge theories and 4d $\mathcal{N}=2$ superconformal $H_{G}$ theories. We also observe an intriguing relation between the $k$-string elliptic genus and the Schur indices of rank $k$ $H_{G}$ SCFTs, as a generalization of Lockhart-Zotto's conjecture in the rank one cases. In a subsequent paper, we deal with all other non-Higgsable clusters with matter.

preprint2019arXiv

Elliptic Blowup Equations for 6d SCFTs. III: E-strings, M-strings and Chains

We establish the elliptic blowup equations for E-strings and M-strings and solve elliptic genera and refined BPS invariants from them. Such elliptic blowup equations can be derived from a path integral interpretation. We provide toric hypersurface construction for the Calabi-Yau geometries of M-strings and those of E-strings with up to three mass parameters turned on, as well as an approach to derive the perturbative prepotential directly from the local description of the Calabi-Yau threefolds. We also demonstrate how to systematically obtain blowup equations for all rank one 5d SCFTs from E-string by blow-down operations. Finally, we present blowup equations for E-M and M string chains.

preprint2019arXiv

Experimental Test of Leggett's Inequalities with Solid-State Spins

Bell's theorem states that no local hidden variable model is compatible with quantum mechanics. Surprisingly, even if we relax the locality constraint, certain nonlocal hidden variable models, such as the one proposed by Leggett, may still be at variance with the predictions of quantum physics. Here, we report an experimental test of Leggett's nonlocal model with solid-state spins in a diamond nitrogen-vacancy center. We entangle an electron spin with a surrounding weakly coupled $^{13}$C nuclear spin and observe that the entangled states violate Leggett-type inequalities by more than four and seven standard deviations for six and eight measurement settings, respectively. Our experimental results are in full agreement with quantum predictions and violate Leggett's nonlocal hidden variable inequality with a high level of confidence.

preprint2019arXiv

Homophily on social networks changes evolutionary advantage in competitive information diffusion

Competitive information diffusion on large-scale social networks reveals fundamental characteristics of rumor contagion and has a profound influence on public opinion formation. There has recently been growing interest in exploring the dynamical mechanisms of such competing evolutions. Nevertheless, the impact of population homophily, which determines powerful collective human behaviors, remains unclear. In this paper, we incorporate homophily effects into a modified competitive ignorant-spreader-ignorant (SIS) rumor diffusion model with generalized population preference. Using a microscopic Markov chain approach, we first derive the phase diagram of competing diffusion outcomes and examine how competitive information spreads and evolves on social networks. We then explore the detailed effects of homophily, which is modeled by a rewiring mechanism. Results show that homophily promotes the formation of divided "echo chambers" and protects the disadvantaged information from extinction, which further changes or even reverses the evolutionary advantage, i.e., the difference between the final proportions of the competing pieces of information. We highlight the conclusion that such reversals may happen only when the initially disadvantaged information has stronger transmission ability, giving it a diffusion advantage over the other. Our framework provides insight into competing dynamics with population homophily, which may pave the way for further controlling misinformation and guiding public belief systems. Moreover, the reversal condition sheds light on designing effective competing strategies in many real scenarios.

preprint2019arXiv

Modeling and Analysis of Energy Harvesting and Smart Grid-Powered Wireless Communication Networks: A Contemporary Survey

The advancements in smart power grids and the advocacy of "green communications" have inspired wireless communication networks to harness energy from ambient environments and operate in an energy-efficient manner for economic and ecological benefits. This article presents a contemporary review of recent breakthroughs on the utilization, redistribution, trading and planning of energy harvested in future wireless networks interoperating with smart grids. This article starts with classical models of renewable energy harvesting technologies. We then turn to the constrained operation and optimization of different energy harvesting wireless systems, such as point-to-point, multipoint-to-point, multipoint-to-multipoint, multi-hop, and multi-cell systems. We also review wireless power and information transfer technologies, which provide a special implementation of energy harvesting wireless communications. A significant part of the article is devoted to the redistribution of redundant (unused) energy harvested within cellular networks, energy planning under dynamic pricing when smart grids are in place, and two-way energy trading between cellular networks and smart grids. Applications of different optimization tools, such as convex optimization, Lagrangian dual-based methods, subgradient methods, and Lyapunov-based online optimization, are compared. This article also collates the potential applications of energy harvesting techniques in emerging (or upcoming) 5G/B5G communication systems. It is revealed that effective redistribution and two-way trading of energy can significantly reduce the electricity bills of wireless service providers and decrease the consumption of brown energy. A list of interesting research directions requiring further investigation is also provided.

preprint2019arXiv

On the Properties of the Effective Jarlskog Invariant for Three-flavor Neutrino Oscillations in Matter

In this paper, we show that the ratio of the effective Jarlskog invariant $\widetilde{\cal J}$ for leptonic CP violation in three-flavor neutrino oscillations in matter to its counterpart ${\cal J}$ in vacuum $\widetilde{\cal J}/{\cal J} \approx 1/(\hat{C}^{}_{12} \hat{C}^{}_{13})$ holds as an excellent approximation, where $\hat{C}^{}_{12} \equiv \sqrt{1 - 2 \hat{A}^{}_* \cos 2θ^{}_{12} + \hat{A}^2_*}$ with $\hat{A}^{}_* \equiv a\cos^2 θ^{}_{13}/Δ^{}_{21}$ and $\hat{C}^{}_{13} \equiv \sqrt{1 - 2 A^{}_{\rm c} \cos 2θ^{}_{13} + A^2_{\rm c}}$ with $A^{}_{\rm c} \equiv a/Δ^{}_{\rm c}$. Here $Δ^{}_{ij} \equiv m^2_i - m^2_j$ (for $ij = 21, 31, 32$) stand for the neutrino mass-squared differences in vacuum and $θ^{}_{ij}$ (for $ij = 12, 13, 23$) are the neutrino mixing angles in vacuum, while $Δ^{}_{\rm c} \equiv Δ^{}_{31}\cos^2θ^{}_{12} + Δ^{}_{32} \sin^2 θ^{}_{12}$ and the matter parameter $a \equiv 2\sqrt{2}G^{}_{\rm F} N^{}_e E$ are defined. This result has been explicitly derived by improving the previous analytical solutions to the renormalization-group equations of effective neutrino masses and mixing parameters in matter. Furthermore, as a practical application, such a simple analytical formula has been implemented to understand the existence and location of the extrema of $\widetilde{\cal J}$.
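The approximate formula above can be evaluated directly from the definitions given for $\hat{C}_{12}$, $\hat{A}_*$, $\hat{C}_{13}$, $A_{\rm c}$ and $\Delta_{\rm c}$, using $\Delta_{32} = \Delta_{31} - \Delta_{21}$. The sketch below does exactly that and nothing more; the numerical oscillation parameters in the usage loop are illustrative assumptions (normal mass ordering, eV² units), and the matter parameter $a$ must be supplied in the same units as the mass-squared differences.

```python
import math


def jarlskog_ratio_approx(a, dm21, dm31, theta12, theta13):
    """Approximate ratio J_matter / J_vacuum ~= 1 / (C12_hat * C13_hat).

    a            matter parameter 2*sqrt(2)*G_F*N_e*E, in the same units as dm21, dm31
    dm21, dm31   vacuum mass-squared differences Delta_21 and Delta_31
    theta12/13   vacuum mixing angles in radians
    """
    dm32 = dm31 - dm21  # Delta_32 = Delta_31 - Delta_21
    a_star = a * math.cos(theta13) ** 2 / dm21
    c12_hat = math.sqrt(1 - 2 * a_star * math.cos(2 * theta12) + a_star ** 2)
    delta_c = dm31 * math.cos(theta12) ** 2 + dm32 * math.sin(theta12) ** 2
    a_c = a / delta_c
    c13_hat = math.sqrt(1 - 2 * a_c * math.cos(2 * theta13) + a_c ** 2)
    return 1.0 / (c12_hat * c13_hat)


# Illustrative (assumed) inputs in eV^2, normal mass ordering.
theta12, theta13 = math.radians(33.4), math.radians(8.5)
for a in (0.0, 5e-5, 2e-4, 1e-3):
    print(a, jarlskog_ratio_approx(a, 7.4e-5, 2.5e-3, theta12, theta13))
```

The ratio equals 1 in vacuum ($a = 0$). Near $\hat{A}_* = \cos 2\theta_{12}$ the factor $\hat{C}_{12}$ drops to $\sin 2\theta_{12}$ and the ratio exceeds 1, while for large $a$ it falls well below 1, so scanning $a$ numerically with a function like this is one simple way to locate the extremum of the approximate expression.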

preprint2019arXiv

Overcome Competitive Exclusion in Ecosystems

Explaining biodiversity in nature is a fundamental problem in ecology. An outstanding challenge is embodied in the so-called Competitive Exclusion Principle: two species competing for one limiting resource cannot coexist at constant population densities, or more generally, the number of consumer species in steady coexistence cannot exceed that of resources. The fact that competitive exclusion is rarely observed in natural ecosystems has not been fully understood. Here we show that by forming chasing triplets among the consumers and resources in the consumption process, the Competitive Exclusion Principle can be naturally violated. The modeling framework developed here is broadly applicable and can be used to explain the biodiversity of many consumer-resource ecosystems and hence deepens our understanding of biodiversity in nature.

preprint2019arXiv

Privacy-preserving Distributed Machine Learning via Local Randomization and ADMM Perturbation

With the proliferation of training data, distributed machine learning (DML) is becoming more capable of handling large-scale learning tasks. However, privacy concerns must be given priority in DML, since training data may contain sensitive user information. In this paper, we propose a privacy-preserving ADMM-based DML framework with two novel features: first, we remove the assumption, commonly made in the literature, that users trust the server collecting their data. Second, the framework provides heterogeneous privacy for users depending on the sensitivity levels of their data and the trust degrees of the servers. The challenge is to keep the accumulation of privacy losses over ADMM iterations minimal. In the proposed framework, a local randomization approach, which is differentially private, is adopted to provide users with a self-controlled privacy guarantee for the most sensitive information. Further, the ADMM algorithm is perturbed through a combined noise-adding method, which simultaneously preserves privacy for users' less sensitive information and strengthens the privacy protection of the most sensitive information. We provide detailed analyses of the performance of the trained model in terms of its generalization error. Finally, we conduct extensive experiments using real-world datasets to validate the theoretical results and evaluate the classification performance of the proposed framework.
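The abstract does not specify which differentially private randomizer the local step uses. As a generic stand-in, and explicitly not the paper's mechanism, the sketch below applies the standard Laplace mechanism on the user's side, with a per-attribute privacy budget ε so that more sensitive attributes receive more noise. The function names, the sensitivity of 1 (e.g. attributes normalized to [0, 1]), and the example record are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def randomize_locally(record, epsilons, sensitivity=1.0, seed=None):
    """Perturb each attribute of a user's record before it leaves the device.

    Each attribute i gets Laplace noise with scale sensitivity / epsilons[i], so a
    smaller epsilon (used here for more sensitive attributes) means more noise.
    """
    rng = random.Random(seed)
    return [value + laplace_noise(sensitivity / eps, rng)
            for value, eps in zip(record, epsilons)]


# Hypothetical record: the first attribute is treated as highly sensitive.
noisy = randomize_locally([0.7, 0.2, 0.5], epsilons=[0.1, 1.0, 1.0], seed=42)
```

Because the noise is added before any data reaches the server, the guarantee does not depend on trusting the server, which is the property the framework's first feature relies on; how the paper then combines this with perturbed ADMM updates is not reproduced here.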

preprint2019arXiv

Quantum Channel Simulation and the Channel's Smooth Max-Information

We study the general framework of quantum channel simulation, that is, the ability of a quantum channel to simulate another one using different classes of codes. First, we show that the minimum error of simulation and the one-shot quantum simulation cost under no-signalling assisted codes are given by semidefinite programs. Second, we introduce the channel's smooth max-information, which can be seen as a one-shot generalization of the mutual information of a quantum channel. We provide an exact operational interpretation of the channel's smooth max-information as the one-shot quantum simulation cost under no-signalling assisted codes, which significantly simplifies the study of channel simulation and provides insights and bounds for the case under entanglement-assisted codes. Third, we derive the asymptotic equipartition property of the channel's smooth max-information; i.e., it converges to the quantum mutual information of the channel in the independent and identically distributed asymptotic limit. This implies the quantum reverse Shannon theorem in the presence of no-signalling correlations. Finally, we explore the simulation cost of various quantum channels.

preprint2019arXiv

Steering Eco-Evolutionary Games Dynamics with Manifold Control

Feedback loops between the population dynamics of individuals and their ecological environment are ubiquitous in nature and have profound effects on the resulting eco-evolutionary dynamics. By incorporating a linear environmental feedback law into the replicator dynamics of two-player games, recent theoretical studies have shed light on the oscillating dynamics of social dilemmas. However, the detailed effects of more general nonlinear feedback loops in multi-player games, which are more common, especially in microbial systems, remain unclear. Here, we focus on ecological public goods games with environmental feedbacks driven by a nonlinear selection gradient. Unlike in previous models, multiple segments of stable and unstable equilibrium manifolds can emerge from the population dynamical systems. We find that a larger relative asymmetrical feedback speed for group interactions centered on cooperators not only accelerates the convergence of stable manifolds, but also increases the attraction basin of these stable manifolds. Furthermore, our work offers an innovative manifold control approach: by designing appropriate switching control laws, we are able to steer the eco-evolutionary dynamics to any desired population state. Our mathematical framework is an important generalization of and complement to coevolutionary game dynamics, and also fills the theoretical gap in guiding the widespread problem of population state control in microbial experiments.

preprint2016arXiv

Direct Meissner Effect Observation of Superconductivity in Compressed H2S

Recently, an extremely high superconducting temperature (Tc) of ~200 K has been reported in the sulfur hydride system above 100 GPa. This result is supported by theoretical predictions and verified experimentally. The crystal structure of the superconducting phase was also identified experimentally, confirming the theoretically predicted structure as well as a decomposition mechanism from H2S to H3S+S. Even though nuclear resonant scattering has been successfully used to provide magnetic evidence for a superconducting state, a direct measurement of the important Meissner effect is still lacking. Here we report in situ alternating-current magnetic susceptibility measurements on compressed H2S under high pressures. It is shown that superconductivity suddenly appears at 117 GPa and that Tc reaches 183 K at 149 GPa before decreasing monotonically with a further increase in pressure. This evolution agrees with both theoretical calculations and earlier experimental measurements. The idea of conventional high temperature superconductivity in hydrogen-dominant compounds has thus been realized in the sulfur hydride system under hydrostatic pressure, opening further exciting perspectives for possibly realizing room temperature superconductivity in hydrogen-based compounds.