Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
54 works
0 followers
31 topics
4 close collaborators

Published work

54 published items

Preprint · 2026 · arXiv

Probabilistic data quality assessment for structural monitoring data via outlier-resistant conditional diffusion model

Data quality assessment is an essential step that ensures the reliability of subsequent structural health monitoring (SHM) tasks. This study proposes a prediction-deviation-based SHM data quality assessment method built on a univariate implicit autoregressive model, enabling outlier diagnosis and data cleaning. The proposed conditional diffusion model (CDM) augments the standard diffusion model with a conditional embedding module to incorporate temporal context, quartile normalization to mitigate distribution skew, and a Huber loss to enhance robustness against outliers. Within this univariate implicit autoregressive framework, each data point is assigned an outlier probability that quantifies its degree of "outlier-ness", and a global quality evaluation score is computed to characterize the overall quality of the dataset. Extensive case studies using operational data from real-world structures demonstrate that the proposed framework significantly improves the accuracy of data quality assessment, outperforming strong baselines representative of clustering, isolation-based, and deep reconstruction methods. Ablation experiments and hyperparameter analysis further confirm the effectiveness and robustness of the proposed framework.
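The abstract names two robustness ingredients: quartile normalization and a Huber loss. The sketch below shows one common form of each in NumPy. Reading "quartile normalization" as centering on the median and scaling by the interquartile range is an assumption, and `quartile_normalize` and `huber_loss` are illustrative names, not the authors' code.

```python
import numpy as np

def quartile_normalize(x):
    """Robust scaling: center on the median and divide by the interquartile range.

    Assumption: one common reading of "quartile normalization"; the paper's
    exact definition may differ.
    """
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero on constant segments
    return (x - median) / iqr

def huber_loss(residual, delta=1.0):
    """Quadratic for |r| <= delta and linear beyond it, so large outliers
    contribute a bounded gradient instead of a quadratic one."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# A single spike (100) barely moves the median/IQR scaling of the other points.
print(quartile_normalize([1, 2, 3, 4, 100]))
print(huber_loss([0.5, 3.0]))
```

With `delta=1.0`, a residual of 3.0 costs 2.5 rather than the 4.5 a half-squared-error loss would assign, which is what limits the pull of outliers during training.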

Preprint · 2026 · arXiv

ViMU: Benchmarking Video Metaphorical Understanding

Any new medium, once it emerges, is used for more than the transmission of overt content alone. The information it carries typically operates on two levels: the content directly presented, and the subtext beneath it, that is, the implicit ideas and intentions the creator seeks to convey through the medium. Likewise, since video technologies became widely adopted, video has served not only as a powerful tool for recording and communicating visual information but also as a vehicle for emotions, attitudes, and social meanings that are often difficult to articulate explicitly. Thus, the true meaning of many videos does not reside solely in what is shown on screen; it is often embedded in context, style of expression, and the viewer's social experience. Some forms of video subtext are humorous, while others carry irony, mockery, or criticism. These implicit meanings can also be interpreted very differently across cultural backgrounds and social groups. However, most existing video understanding models still focus primarily on literal visual comprehension, such as recognizing objects, actions, or temporal relations, and lack a systematic ability to understand the metaphorical, ironic, and social meanings embedded in videos. To bridge this gap, we introduce ViMU, the first benchmark designed to systematically evaluate how well frontier models understand subtext in videos. ViMU assesses whether video understanding models can go beyond literal perception to infer implicit meaning while grounding their interpretations in multimodal evidence, answering both open-ended and multiple-choice questions. Importantly, all questions are designed to be hint-free, ensuring that no key evidence is disclosed to models before they answer.

Preprint · 2025 · arXiv

SyncGait: Robust Long-Distance Authentication for Drone Delivery via Implicit Gait Behaviors

In recent years, drone delivery, which uses unmanned aerial vehicles (UAVs) for package delivery and pickup, has gradually emerged as a crucial method in logistics. Because delivery drones are expensive and may carry valuable packages, they must maintain a safe distance from individuals until user-drone mutual authentication is confirmed. Although numerous authentication schemes have been developed, existing solutions are limited in authentication distance and lack resilience against sophisticated attacks. To this end, we introduce SyncGait, an implicit gait-based mutual authentication system for drone delivery. SyncGait leverages the user's unique arm swing as they walk toward the drone to achieve mutual authentication without requiring additional hardware or specific authentication actions. We conducted extensive experiments on 14 datasets collected from 31 subjects. The results demonstrate that SyncGait achieves an average accuracy of 99.84% at long distances (> 18 m) and exhibits strong resilience against various spoofing attacks, making it a robust, secure, and user-friendly solution for real-world scenarios.

Preprint · 2023 · arXiv

Heterogeneous Graph Contrastive Multi-view Learning

Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design a contrastive objective that fully captures the rich semantics. Moreover, early investigations show that CL suffers from sampling bias, whereas conventional debiasing techniques have been empirically shown to be inadequate for GCL. How to mitigate sampling bias in heterogeneous GCL is therefore another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective that maximizes the mutual information between any pair of metapath-induced views. To alleviate sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
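A contrastive objective between metapath-induced views is often written as an InfoNCE-style loss. The sketch below is a generic multi-positive variant in NumPy, assuming each view is an `(N, d)` node-embedding matrix and that `pos_mask` comes from some positive-sampling step. The function names and the temperature `tau` are hypothetical, and HGCML's exact objective may differ.

```python
import numpy as np

def l2_normalize(z):
    """Scale each row to unit length so dot products become cosine similarities."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def multi_positive_infonce(z1, z2, pos_mask, tau=0.5):
    """Cross-view contrastive loss from view z1 to view z2.

    pos_mask[i, j] is True when node j in view 2 is treated as a positive
    for node i in view 1 (hypothetically produced by a positive sampler).
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    sim = np.exp(z1 @ z2.T / tau)        # (N, N) exponentiated similarities
    pos = (sim * pos_mask).sum(axis=1)   # mass on the selected positives
    denom = sim.sum(axis=1)              # mass on all nodes of the other view
    return float(-np.mean(np.log(pos / denom)))

def multi_view_loss(views, pos_mask, tau=0.5):
    """Average the loss over every ordered pair of distinct views."""
    total, pairs = 0.0, 0
    for a in range(len(views)):
        for b in range(len(views)):
            if a != b:
                total += multi_positive_infonce(views[a], views[b], pos_mask, tau)
                pairs += 1
    return total / pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
identity_positives = np.eye(8, dtype=bool)
aligned = multi_view_loss([z, z + 0.01 * rng.normal(size=z.shape)], identity_positives)
shuffled = multi_view_loss([z, z[::-1]], identity_positives)
print(aligned, shuffled)  # aligned views should give the smaller loss
```

Averaging over ordered pairs treats every view as both anchor and target, which is one simple way to cover "any pair of metapath-induced views".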

Preprint · 2023 · arXiv

The time-evolving impact of tree size on nighttime street canyon microclimate: Wind tunnel modeling of aerodynamic effects and heat removal

Urban trees play a crucial role in urban climate in many respects. However, existing research has not adequately explored their impact from a time-evolving perspective, that is, tree growth over time. To bridge this research gap, this study uses a wind tunnel to investigate how tree-to-canyon foliage cover and relative tree height (0.32 to 1.1 times the canyon height), mimicking tree growth, affect conditions in street canyons during moderate and extreme heat. The results reveal that trees may affect canyon-wide ventilation and heat removal in two different scenarios. First, under isothermal conditions, medium and large trees, which fill half the canyon height or reach slightly above the canyon, decelerate the shear layer and weaken the vortical flow, thereby reducing canyon-wide ventilation. Second, under extreme heat, medium and large trees trap heat at the pedestrian level by blocking air entrainment and suppressing the upward buoyancy-driven flow from the ground surface. The measurements show an air temperature rise corresponding to 1.5 °C in a full-scale urban setting. These observations suggest that the foliage cover of urban trees must be managed to achieve optimal canyon ventilation and heat removal at night.

Preprint · 2022 · arXiv

A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome

Adversarial examples pose many security threats to convolutional neural networks (CNNs). Most defense algorithms counter these threats by finding differences between the original images and adversarial examples. However, these differences do not contain class features, so such defense algorithms can only detect adversarial examples without recovering the correct labels. To address this, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and class features. The method is inspired by an observed phenomenon, Adversarial Feature Separability (AFS), in which the difference between the feature maps of the original images and adversarial examples grows in deeper layers. Building on this, we develop an adversarial example recognition framework that detects adversarial examples and recovers the correct labels. In the experiments, detection and classification of adversarial examples with AFGs achieve an accuracy above 90.01% across various attack scenarios. To the best of our knowledge, our method is the first to address both attack detection and label recovery. AFG offers a new data-driven perspective on improving the robustness of CNNs. The source code is available at https://github.com/GeoX-Lab/Adv_Fea_Genome.

Preprint · 2022 · arXiv

A kinetic model for rarefied flows of molecular gas with vibrational modes

A kinetic model is proposed for rarefied flows of molecular gas with rotational and temperature-dependent vibrational degrees of freedom. The model reduces to the Boltzmann equation for monatomic gas when energy exchange between the translational and internal modes is absent, so the influence of the intermolecular potential can be captured. Moreover, not only the transport coefficients but also their fundamental relaxation processes are recovered. The accuracy of the kinetic model is validated against the direct simulation Monte Carlo method in several rarefied gas flows, including the shock wave, Fourier flow, Couette flow, and the creep flow driven by Maxwell's demon. The kinetic model is then used to investigate thermally induced flows. By adjusting the viscosity index in the Boltzmann collision operator, we find that the intermolecular potential significantly influences the velocity and the Knudsen force. Interestingly, in the transition flow regime, the Knudsen force exerted on a heated beam can reverse direction when the viscosity index changes from 0.5 (hard-sphere gas) to 1 (Maxwell gas). This finding is useful for designing micro-electromechanical systems for microstructure actuation and gas sensing.

Preprint · 2022 · arXiv

A Systematic Study of Android Non-SDK (Hidden) Service API Security

Android allows apps to communicate with its system services via system service helpers so that these apps can use various functions provided by the system services. Meanwhile, the system services rely on their service helpers to enforce security checks for protection. Unfortunately, the security checks in the service helpers may be bypassed by directly exploiting non-SDK (hidden) APIs, degrading system stability and posing severe security threats such as privilege escalation, automatic function execution without user interaction, crashes, and DoS attacks. Google has proposed various approaches to address this problem, e.g., fixing bugs case by case or even introducing a blacklist to block all non-SDK APIs. However, developers can still find new ways of exploiting these hidden APIs to evade the non-SDK restrictions. In this paper, we systematically study the vulnerabilities caused by hidden API exploitation and analyze the effectiveness of Google's countermeasures. We aim to determine whether exploitable vulnerable hidden APIs still exist in the newest release, Android 12. We develop a static analysis tool called ServiceAudit to automatically mine inconsistent security enforcement between service helper classes and the hidden service APIs. We apply ServiceAudit to Android 6 through 12. Our tool discovers 112 vulnerabilities in Android 6 with higher precision than existing approaches. Moreover, in Android 11 and 12, we identify more than 25 hidden APIs with inconsistent protections; however, only one of the vulnerable APIs can lead to severe security problems in Android 11, and none of them work on Android 12.
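The inconsistency that ServiceAudit mines can be pictured as a set difference: checks enforced in a service helper that are missing from the hidden service API the helper wraps. This sketch assumes the checks have already been extracted by static analysis into sets of permission names; `find_inconsistent_apis`, the API names, and the permission names are all hypothetical placeholders, not ServiceAudit's real interface.

```python
from typing import Dict, Set

def find_inconsistent_apis(helper_checks: Dict[str, Set[str]],
                           service_checks: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each API, return the checks enforced in the helper but absent on the
    service side; calling the hidden API directly would bypass exactly these."""
    gaps: Dict[str, Set[str]] = {}
    for api, helper_perms in helper_checks.items():
        missing = helper_perms - service_checks.get(api, set())
        if missing:
            gaps[api] = missing
    return gaps

# Hypothetical extraction results for two APIs.
helper = {"setFoo": {"PERM_A", "PERM_B"}, "getBar": {"PERM_C"}}
service = {"setFoo": {"PERM_A"}, "getBar": {"PERM_C"}}
print(find_inconsistent_apis(helper, service))  # {'setFoo': {'PERM_B'}}
```

An empty result for an API does not prove it is safe; it only means the two sides enforce the same extracted checks.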

Preprint · 2022 · arXiv

Adaptive Domain Interest Network for Multi-domain Recommendation

Industrial recommender systems usually hold data from multiple business scenarios and are expected to provide recommendation services for these scenarios simultaneously. In the retrieval step, the top-K high-quality items selected from a large corpus usually need to differ across scenarios. Take the Alibaba display advertising system as an example: not only are the behavior patterns of Taobao users diverse, but the bid prices that advertisers assign also vary significantly across scenarios. Traditional methods either train a separate model for each scenario, ignoring the cross-domain overlap of user groups and items, or simply mix all samples and maintain a shared model, which makes it difficult to capture significant differences between scenarios. In this paper, we present the Adaptive Domain Interest (ADI) network, which adaptively handles the commonalities and differences across scenarios and makes full use of multi-scenario data during training. The proposed method can then improve the performance of each business domain by providing different top-K candidates for different scenarios during online inference. Specifically, ADI models the commonalities and differences among domains with shared networks and domain-specific networks, respectively. In addition, we apply domain-specific batch normalization and design a domain interest adaptation layer for feature-level domain adaptation. A self-training strategy is also incorporated to capture label-level connections across domains. ADI has been deployed in Alibaba's display advertising system and yields a 1.8% improvement in advertising revenue.
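Domain-specific batch normalization keeps separate normalization statistics (and affine parameters) for each domain while the rest of the network can be shared. The NumPy class below is a minimal sketch of that idea under our own assumptions (forward pass only, momentum-style running averages); the class name and defaults are hypothetical, and ADI's actual layer may differ.

```python
import numpy as np

class DomainSpecificBatchNorm:
    """Batch normalization with one set of statistics and affine parameters per domain."""

    def __init__(self, num_features, num_domains, momentum=0.1, eps=1e-5):
        self.momentum, self.eps = momentum, eps
        self.running_mean = np.zeros((num_domains, num_features))
        self.running_var = np.ones((num_domains, num_features))
        self.gamma = np.ones((num_domains, num_features))   # per-domain scale
        self.beta = np.zeros((num_domains, num_features))   # per-domain shift

    def __call__(self, x, domain, training=True):
        if training:
            # Normalize with this batch's statistics and update only this domain's averages.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean[domain] = (1 - m) * self.running_mean[domain] + m * mean
            self.running_var[domain] = (1 - m) * self.running_var[domain] + m * var
        else:
            mean, var = self.running_mean[domain], self.running_var[domain]
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma[domain] * x_hat + self.beta[domain]

rng = np.random.default_rng(0)
bn = DomainSpecificBatchNorm(num_features=4, num_domains=2)
batch_from_domain_0 = rng.normal(5.0, 2.0, size=(256, 4))
out = bn(batch_from_domain_0, domain=0)
print(out.mean(axis=0), bn.running_mean)  # domain 1 statistics stay untouched
```

Because each domain's batches only update that domain's statistics, a scenario with a shifted feature distribution does not distort the normalization of the others.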

Preprint · 2022 · arXiv

AnyFace: Free-style Text-to-Face Synthesis and Manipulation

Existing text-to-image synthesis methods are generally applicable only to words in the training dataset. However, human faces are too variable to be described with a limited vocabulary. This paper therefore proposes AnyFace, the first free-style text-to-face method, enabling much wider open-world applications such as the metaverse, social media, cosmetics, and forensics. AnyFace has a novel two-stream framework for face image synthesis and manipulation given arbitrary descriptions of the human face. Specifically, one stream performs text-to-face generation and the other performs face image reconstruction. Facial text and image features are extracted using CLIP (Contrastive Language-Image Pre-training) encoders, and a collaborative Cross Modal Distillation (CMD) module is designed to align the linguistic and visual features across the two streams. Furthermore, a Diverse Triplet Loss (DT loss) is developed to model fine-grained features and improve facial diversity. Extensive experiments on Multi-modal CelebA-HQ and CelebAText-HQ demonstrate significant advantages of AnyFace over state-of-the-art methods. AnyFace achieves high-quality, high-resolution, and high-diversity face synthesis and manipulation without any constraints on the number or content of input captions.
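The abstract does not define the Diverse Triplet Loss. For orientation only, the sketch below shows the standard triplet margin loss that triplet-style objectives build on; it is not the DT loss, and the margin value of 0.2 is an arbitrary assumption.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: push d(anchor, negative) to exceed
    d(anchor, positive) by at least `margin`; rows are separate triplets."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

anchor = np.array([[0.0, 0.0]])
print(triplet_margin_loss(anchor, np.array([[0.0, 1.0]]), np.array([[0.0, 3.0]])))  # 0.0
```

Once the negative is farther than the positive by the margin, the triplet contributes nothing; only "hard" triplets near the boundary produce gradients.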

Preprint · 2022 · arXiv

Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation

Model-based methods for recommender systems have been studied extensively for years. Modern recommender systems usually rely on 1) representation learning models that define user-item preference as the distance between their embedding representations, and 2) embedding-based Approximate Nearest Neighbor (ANN) search to tackle the efficiency problem posed by large-scale corpora. While it provides efficient retrieval, this embedding-based retrieval pattern also limits model capacity, because the user-item preference measure is restricted to the distance between embedding representations. Other, more precise user-item preference measures, e.g., preference scores derived directly from a deep neural network, are computationally intractable because no efficient retrieval method exists for them, and an exhaustive search over all user-item pairs is impractical. In this paper, we propose a novel method that extends ANN search to arbitrary matching functions, e.g., a deep neural network. Our main idea is to perform a greedy walk, guided by the matching function, on a similarity graph constructed from all items. Because the similarity measure used for graph construction and the user-item matching function are heterogeneous, we propose a pluggable adversarial training task to ensure that graph search with an arbitrary matching function can achieve fairly high precision. Experimental results on both open-source and industrial datasets demonstrate the effectiveness of our method. The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable increase in advertising revenue. We also summarize our detailed deployment experience in this paper.
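The greedy walk described above can be sketched in a few lines: start at an entry item, move to whichever neighbour the matching function scores highest, and stop when no neighbour improves the score. In this minimal sketch the graph, the entry point, and the toy `match` function (a stand-in for a neural network scoring items for one fixed user) are hypothetical, and the adversarial training component is not shown.

```python
from typing import Callable, Dict, List

def greedy_graph_search(graph: Dict[int, List[int]],
                        match: Callable[[int], float],
                        entry: int,
                        max_steps: int = 1000) -> int:
    """Greedy walk on an item similarity graph guided by an arbitrary matching function.

    graph: adjacency list of a similarity graph built over item ids.
    match: user-item score for a fixed user; any callable works, not only a distance.
    """
    current, best = entry, match(entry)
    for _ in range(max_steps):
        neighbours = graph.get(current, [])
        if not neighbours:
            break
        score, item = max((match(n), n) for n in neighbours)
        if score <= best:
            break  # no neighbour improves the score: a local optimum
        current, best = item, score
    return current

# Toy example: items 0..4 on a path graph; the hypothetical user prefers item 3.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_graph_search(path_graph, lambda i: -(i - 3) ** 2, entry=0))  # 3
```

A single walker can stop at a local optimum when the graph's similarity structure disagrees with the matching function; this mismatch is the heterogeneity the abstract's adversarial training task targets. Practical systems often keep a beam of candidates rather than one walker.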

Preprint · 2022 · arXiv

Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning

In this paper, we study the named entity recognition (NER) problem under distant supervision. Because external dictionaries and/or knowledge bases are incomplete, such distantly annotated training data usually suffer from a high false-negative rate. To this end, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, for each token, a confidence score is estimated for its being an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled by various external knowledge sources demonstrate the superiority of Conf-MPU over existing DS-NER methods.

Preprint · 2022 · arXiv

GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing

Recently, research communities have highlighted the necessity of formulating a scalability continuum for large-scale graph processing, which gains scale-out benefits from distributed graph systems and scale-up benefits from high-performance accelerators. To this end, we propose a middleware, called GX-Plug, that makes it easy to integrate the merits of both. As a middleware, GX-Plug is versatile in supporting different runtime environments, computation models, and programming models. Furthermore, to improve middleware performance, we study a series of techniques, including pipeline shuffle, synchronization caching and skipping, and workload balancing, for intra-, inter-, and beyond-iteration optimizations, respectively. Experiments show that our middleware efficiently plugs accelerators into representative distributed graph systems, e.g., GraphX and PowerGraph, with an acceleration ratio of up to 20x.

Preprint · 2022 · arXiv

HDMapNet: An Online HD Map Construction and Evaluation Framework

Constructing HD semantic maps is a central component of autonomous driving. However, traditional pipelines require a vast amount of human effort and resources to annotate and maintain the semantics in the map, which limits their scalability. In this paper, we introduce the problem of HD semantic map learning, which dynamically constructs the local semantics from onboard sensor observations. We also introduce a semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that it performs better than baseline methods in all settings. Notably, our camera-LiDAR fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate map learning performance. Finally, we show that our method can predict a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem.

Preprint · 2022 · arXiv

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

Building embodied intelligent agents that can interact with 3D indoor environments has received increasing research attention in recent years. While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model: inter-object functional relationships (e.g., a switch on the wall turns the light on or off, a remote control operates the TV). Humans often spend little or no effort inferring these relationships, even when entering a new room, by using strong prior knowledge (e.g., we know that buttons control electrical devices) or only a few exploratory interactions in cases of uncertainty (e.g., multiple switches and lights in the same room). In this paper, we take the first step toward building an AI system that learns inter-object functional relationships in 3D indoor environments, with key technical contributions in modeling prior knowledge by training over large-scale scenes and in designing interactive policies for effectively exploring the training scenes and quickly adapting to novel test scenes. We create a new benchmark based on the AI2Thor and PartNet datasets and perform extensive experiments that demonstrate the effectiveness of our proposed method. Results show that our model successfully learns priors and fast-interactive-adaptation strategies for exploring inter-object functional relationships in complex 3D scenes. Several ablation studies further validate the usefulness of each proposed module.

Preprint · 2022 · arXiv

Improving Distantly Supervised Relation Extraction by Natural Language Inference

To reduce human annotation for relation extraction (RE) tasks, distantly supervised approaches have been proposed, but they struggle with low performance. In this work, we propose a novel DSRE-NLI framework, which considers both distant supervision from existing knowledge bases and indirect supervision from language models pretrained on other tasks. DSRE-NLI equips an off-the-shelf natural language inference (NLI) engine with a semi-automatic relation verbalization (SARV) mechanism to provide indirect supervision, and further consolidates the distant annotations to benefit multi-class RE models. The NLI-based indirect supervision requires only one relation verbalization template from humans as a semantically general template for each relationship; the template set is then enriched with high-quality textual patterns mined automatically from the distantly annotated corpus. With two simple and effective data consolidation strategies, the quality of the training data is substantially improved. Extensive experiments demonstrate that the proposed framework significantly improves on SOTA performance (by up to 7.73% F1) on distantly supervised RE benchmark datasets.
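NLI-based indirect supervision turns each relation into a hypothesis sentence and asks an NLI model whether the input sentence entails it. The sketch below shows that scoring loop under stated assumptions: `entail_prob` stands in for an off-the-shelf NLI model returning an entailment probability, and the templates, the 0.5 threshold, and the toy scorer are hypothetical. The SARV pattern mining and data consolidation steps are not shown.

```python
from typing import Callable, Dict, Optional

def classify_relation(premise: str, head: str, tail: str,
                      templates: Dict[str, str],
                      entail_prob: Callable[[str, str], float],
                      threshold: float = 0.5) -> Optional[str]:
    """Pick the relation whose verbalized hypothesis is most entailed by the sentence.

    Returns None ("no relation") when no hypothesis clears the threshold.
    """
    best_relation, best_prob = None, threshold
    for relation, template in templates.items():
        hypothesis = template.format(head=head, tail=tail)
        prob = entail_prob(premise, hypothesis)
        if prob > best_prob:
            best_relation, best_prob = relation, prob
    return best_relation

# Hypothetical human-written templates, one per relation.
templates = {
    "capital_of": "{head} is the capital of {tail}.",
    "born_in": "{head} was born in {tail}.",
}

def toy_nli(premise: str, hypothesis: str) -> float:
    # Stand-in for a real NLI model: high score only for one exact hypothesis.
    return 0.9 if hypothesis == "Paris is the capital of France." else 0.1

sentence = "Paris has been the French capital for centuries."
print(classify_relation(sentence, "Paris", "France", templates, toy_nli))  # capital_of
```

Enriching `templates` with several mined patterns per relation (for example, taking the maximum over a relation's templates) is one way the automatically mined patterns could plug into this loop.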

Preprint · 2022 · arXiv

Magnetic phase transition induced ferroelectric polarization in BaFeF4 with room temperature weak ferromagnetism

The BaMF4 (M = Fe, Co, Ni, and Mn) family comprises typical multiferroic materials that are antiferromagnetic at around liquid-nitrogen temperature. In this work, polycrystalline BaFeF4 has been prepared by solid-state reaction. A slight deficiency of Fe leads to the coexistence of the +2 and +3 valence states, allowing electrons to hop between neighboring Fe²⁺ and Fe³⁺ ions through the intermediate F⁻ ion; this produces a strong double-exchange interaction with weak ferromagnetism above room temperature. A bifurcation at about 170 K between the zero-field-cooled and field-cooled temperature-dependent magnetization curves indicates the onset of two-dimensional antiferromagnetism, which is completed at about 125 K with a sudden drop in magnetization. Although BaFeF4 is a type-I multiferroic, its magnetoelectricity is evidenced by the pyroelectric current, which shows a peak starting at about 170 K and ending at about 125 K. A saturated ferroelectric polarization change of around 34 μC/m² is observed, which is switchable by reversing the poling electric field and decreases to about 30 μC/m² under a magnetic field of 90 kOe. This magnetoelectricity can be qualitatively reproduced by first-principles calculations. Our results represent substantial progress in the search for high-temperature multiferroics among ferroelectric fluorides.

Preprint · 2022 · arXiv

Mass spectra and strong decays of charmed and charmed-strange mesons

A semi-relativistic potential model is adopted to calculate the mass spectra of charmed and charmed-strange meson states up to the $2D$ excitations. The strong decay properties are further analyzed with a chiral quark model using the numerical wave functions obtained from the potential model. Using the strong decay amplitudes extracted from the chiral quark model, we also systematically study the coupled-channel effects on the bare masses of the $1P$-wave states, since the masses of $D^*_{s0}(2317)$ and $D_{s1}(2460)$ cannot be explained by bare $1P$-wave states within the potential model. Based on our good description of the masses and decay properties of the low-lying, well-established states, we give a quark model classification of the high-mass resonances observed in recent years. In the $D$-meson family, $D_0(2550)$ can be classified as the radially excited state $D(2^1S_0)$; $D_3^*(2750)$ and $D_2(2740)$ can be classified as the second orbital excitations $D(1^3D_3)$ and $D(1D'_2)$, respectively; $D_J^*(3000)$ may be a candidate for $D(1^3F_4)$ or $D(2^3P_2)$, while $D_J(3000)$ may favor the high-mass mixed state $D(2P'_1)$; however, puzzles remain in understanding the nature of $D_1^*(2600)$ and $D_1^*(2760)$, whose decay properties cannot be well explained by either pure $D(2^3S_1)$ and $D(1^3D_1)$ states or their mixing. In the $D_s$-meson family, $D_{s3}^*(2860)$ favors the $D_s(1^3D_3)$ assignment; $D_{s1}^*(2700)$ and $D_{s1}^*(2860)$ may favor the mixed states $|(SD)_1\rangle_L$ and $|(SD)_1\rangle_H$ via the $2^3S_1$-$1^3D_1$ mixing, respectively; $D_{sJ}(3040)$ may favor $D_s(2P_1)$ or $D_s(2P_1')$, or may correspond to a structure contributed by both $D_s(2P_1)$ and $D_s(2P_1')$.

Preprint · 2022 · arXiv

MOST-Net: A Memory Oriented Style Transfer Network for Face Sketch Synthesis

Face sketch synthesis has been widely used in multimedia entertainment and law enforcement. Despite recent developments in deep neural networks, accurate and realistic face sketch synthesis is still a challenging task due to the diversity and complexity of human faces. Current face sketch synthesis methods based on image-to-image translation frequently overfit when trained on small-scale datasets. To tackle this problem, we present an end-to-end Memory Oriented Style Transfer Network (MOST-Net) for face sketch synthesis that can produce high-fidelity sketches with limited data. Specifically, an external self-supervised dynamic memory module is introduced to capture domain alignment knowledge over the long term. In this way, our model acquires domain-transfer ability by establishing a durable relationship between faces and the corresponding sketches at the feature level. Furthermore, we design a novel Memory Refinement Loss (MR Loss) for feature alignment in the memory module, which improves the accuracy of memory slots in an unsupervised manner. Extensive experiments on the CUFS and CUFSF datasets show that MOST-Net achieves state-of-the-art performance, especially in terms of the Structural Similarity Index (SSIM).

Preprint · 2022 · arXiv

Multiple-access relay stations for long-haul fiber-optic radio frequency transfer

We report the realization of a long-haul radio frequency (RF) transfer scheme using multiple-access relay stations (MARSs). The proposed scheme, with independent link noise compensation for each fiber sub-link, effectively overcomes the compensation bandwidth limitation of long-haul transfer. A MARS can share the same modulated optical signal between the front and rear fiber sub-links, simplifying the configuration at the repeater station and giving the transfer system multiple-access capability. In addition, we theoretically model, for the first time, the effect of the MARS position on the fractional frequency instability of fiber-optic RF transfer, showing that the MARS position has little effect on system performance when the length ratio of the front and rear fiber sub-links is around $1:1$. We experimentally demonstrate a 1 GHz signal transfer using one MARS connecting 260 km and 280 km fiber links, with fractional frequency instabilities of less than $5.9\times10^{-14}$ at 1 s and $8.5\times10^{-17}$ at 10,000 s at the remote site, and of $5.6\times10^{-14}$ and $6.6\times10^{-17}$ at integration times of 1 s and 10,000 s at the MARS. The proposed scalable technique allows identical MARSs to be added arbitrarily along the fiber link, giving it great potential for realizing ultra-long-haul RF transfer.
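Fractional frequency instability at a given integration time, such as the 1 s and 10,000 s figures above, is commonly quantified with the Allan deviation. The abstract does not name the estimator, so the sketch below assumes the overlapping Allan deviation computed from fractional-frequency samples `y` taken every `tau0` seconds; the paper may instead use a different variant (e.g., the modified Allan deviation).

```python
import numpy as np

def overlapping_adev(y, tau0, m):
    """Overlapping Allan deviation at tau = m * tau0.

    y: fractional-frequency samples spaced tau0 seconds apart.
    Uses the phase form: sigma^2 = mean((x[i+2m] - 2x[i+m] + x[i])^2) / (2 tau^2).
    """
    y = np.asarray(y, dtype=float)
    x = np.concatenate(([0.0], np.cumsum(y) * tau0))  # integrate frequency into phase (time error)
    n = len(x)
    if n < 2 * m + 1:
        raise ValueError("not enough samples for this averaging factor")
    second_diff = x[2 * m:] - 2 * x[m:n - m] + x[:n - 2 * m]
    tau = m * tau0
    return float(np.sqrt(np.mean(second_diff ** 2) / (2 * tau ** 2)))

# Alternating +/-1 frequency samples: large instability at tau0, none once averaged over 2 samples.
print(overlapping_adev([1, -1, 1, -1], tau0=1.0, m=1))  # about 1.414
print(overlapping_adev([1, -1, 1, -1], tau0=1.0, m=2))  # 0.0
```

The overlapping form uses every possible starting point rather than disjoint blocks, which gives better confidence at long integration times such as 10,000 s, where few disjoint blocks exist.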

Preprint · 2022 · arXiv

One Shot Face Swapping on Megapixels

Face swapping has both positive applications, such as entertainment and human-computer interaction, and negative applications, such as DeepFake threats to politics and economics. Nevertheless, it is necessary to understand the scheme of advanced methods for high-quality face swapping and to generate sufficient, representative face swapping images to train DeepFake detection algorithms. This paper proposes the first megapixel-level method for one-shot face swapping (MegaFS for short). First, MegaFS organizes the face representation hierarchically with the proposed Hierarchical Representation Face Encoder (HieRFE) in an extended latent space to preserve more facial details, rather than using the compressed representations of previous face swapping methods. Second, a carefully designed Face Transfer Module (FTM) transfers the identity from a source image to the target along a non-linear trajectory without explicit feature disentanglement. Finally, the swapped faces are synthesized by StyleGAN2, benefiting from its training stability and powerful generative capability. Each part of MegaFS can be trained separately, so the GPU memory requirement of megapixel face swapping can be met. In summary, complete face representation, stable training, and limited memory usage are the three novel contributions to the success of our method. Extensive experiments demonstrate the superiority of MegaFS, and the first megapixel-level face swapping database is released in the public domain for research on DeepFake detection and face image editing.

Preprint · 2022 · arXiv

Quantum Deep Learning for Mutant COVID-19 Strain Prediction

New COVID-19 epidemic strains such as Delta and Omicron, with increased transmissibility and pathogenicity, emerge and spread rapidly across the world while causing high mortality during the pandemic period. Early prediction of possible variants (especially of the spike protein) of COVID-19 epidemic strains based on available mutated SARS-CoV-2 RNA sequences may enable early prevention and treatment. Here, combining the advantages of quantum and quantum-inspired algorithms with the wide applicability of deep learning, we propose a development tool named DeepQuantum and use this software to predict the spike protein variation structure of COVID-19 epidemic strains. In addition, this hybrid quantum-classical model achieves, for the first time, quantum-inspired blur convolution similar to classical depthwise convolution and successfully applies quantum progressive training with quantum circuits, both of which ensure that our model is the quantum counterpart of the well-known style-based GAN. The results show that the fidelities of randomly generated spike protein variation structures are consistently above 96% for Delta and 94% for Omicron. Compared with the corresponding classical algorithm, the training loss curve is more stable and converges better with multiple loss functions. Finally, there is strong evidence that quantum-inspired algorithms effectively promote classical deep learning and that hybrid models effectively predict the mutant strains.

preprint2022arXiv

Towards future directions in data-integrative supervised prediction of human aging-related genes

Identification of human genes involved in the aging process is critical because the incidence of many diseases increases with age. A state-of-the-art approach for this purpose infers a weighted dynamic aging-specific subnetwork by mapping gene expression (GE) levels at different ages onto the protein-protein interaction network (PPIN). It then analyzes this subnetwork in a supervised manner, training a predictive model to learn how the network topologies of known aging- vs. non-aging-related genes change across ages. Finally, it uses the trained model to predict novel aging-related genes. However, the best current subnetwork resulting from this approach still yields suboptimal prediction accuracy, possibly because it was inferred from outdated GE and PPIN data. Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy over analyzing the best current subnetwork inferred from outdated data. Unexpectedly, we find that it does not. To understand why, we perform aging-related pathway and Gene Ontology (GO) term enrichment analyses. We find that the suboptimal prediction accuracy, regardless of which GE or PPIN data are used, may be caused either by incomplete current knowledge of which genes are aging-related or by the inability of current methods for inferring or analyzing an aging-specific subnetwork to capture all of the aging-related knowledge. These findings can potentially guide future work on improving supervised prediction of aging-related genes via -omics data integration.

preprint2022arXiv

Voice-Face Homogeneity Tells Deepfake

Detecting forged videos is highly desirable given the abuse of deepfakes. Existing detection approaches focus on exploring specific artifacts in deepfake videos and fit certain data well. However, as forgery techniques evolve to address these artifacts, they keep challenging the robustness of traditional deepfake detectors. As a result, progress on the generalizability of these approaches has stalled. To address this issue, given the empirical observations that the identities behind voices and faces are often mismatched in deepfake videos and that voices and faces are homogeneous to some extent, we propose to perform deepfake detection from an unexplored voice-face matching perspective. To this end, we devise a voice-face matching method that measures how well the two match. However, training on specific deepfake datasets makes the model overfit to certain traits of deepfake algorithms. Instead, we advocate a method that quickly adapts to unseen forgeries, using a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely used deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains over other state-of-the-art competitors. Notably, it already achieves competitive results when fine-tuned on limited deepfake data.

preprint2021arXiv

A Directed Spanning Tree Adaptive Control Framework for Time-Varying Formations

In this paper, the time-varying formation and time-varying formation tracking problems are solved for linear multi-agent systems over digraphs without knowledge of the eigenvalues of the Laplacian matrix associated with the digraph. The solution to these problems relies on a framework that generalizes the directed spanning tree adaptive method, which was originally limited to consensus problems. Necessary and sufficient conditions for the existence of solutions to the formation problems are derived. Asymptotic convergence of the formation errors is proved via graph theory and Lyapunov analysis.

preprint2021arXiv

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge by aggregating and estimating high-quality data (e.g., sentiment labels for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the aggregated data quality from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks.

preprint2021arXiv

Data Poisoning Attacks to Deep Learning Based Recommender Systems

Recommender systems play a crucial role in helping users find information of interest in various web services such as Amazon, YouTube, and Google News. Various recommender systems, ranging from neighborhood-based, association-rule-based, and matrix-factorization-based to deep learning based ones, have been developed and deployed in industry. Among them, deep learning based recommender systems have become increasingly popular due to their superior performance. In this work, we conduct the first systematic study on data poisoning attacks to deep learning based recommender systems. An attacker's goal is to manipulate a recommender system such that the attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings into a recommender system. Specifically, we formulate our attack as an optimization problem such that the injected ratings maximize the number of normal users to whom the target items are recommended. However, this optimization problem is challenging to solve because it is a non-convex integer programming problem. To address the challenge, we develop multiple techniques to approximately solve the optimization problem. Our experimental results on three real-world datasets, including small and large datasets, show that our attack is effective and outperforms existing attacks. Moreover, we attempt to detect fake users via statistical analysis of the rating patterns of normal and fake users. Our results show that our attack remains effective and outperforms existing attacks even if such a detector is deployed.

preprint2021arXiv

Detecting Localized Adversarial Examples: A Generic Approach using Critical Region Analysis

Deep neural networks (DNNs) have been applied in a wide range of applications, e.g., face recognition and image classification; however, they are vulnerable to adversarial examples. By adding small, imperceptible perturbations, an attacker can easily manipulate the outputs of a DNN. In particular, localized adversarial examples perturb only a small, contiguous region of the target object, which makes them robust and effective in both the digital and physical worlds. Although localized adversarial examples have more severe real-world impacts than traditional pixel attacks, they have not been well addressed in the literature. In this paper, we propose a generic defense system called TaintRadar that accurately detects localized adversarial examples by analyzing the critical regions that have been manipulated by attackers. The main idea is that when critical regions are removed from input images, the ranking changes of adversarial labels are larger than those of benign labels. Compared with existing defense solutions, TaintRadar can effectively capture sophisticated localized partial attacks, e.g., the eye-glasses attack, without requiring additional training or fine-tuning of the original model's structure. Comprehensive experiments in both the digital and physical worlds verify the effectiveness and robustness of our defense.

preprint2021arXiv

Low-light Image Restoration with Short- and Long-exposure Raw Pairs

Low-light imaging with handheld mobile devices is challenging. Limited by existing models and training data, most existing methods cannot be applied effectively in real scenarios. In this paper, we propose a new low-light image restoration method that uses the complementary information of short- and long-exposure images. We first propose a novel data generation method that synthesizes realistic short- and long-exposure raw images by simulating the imaging pipeline in low-light environments. Then, we design a new long-short-exposure fusion network (LSFNet) to handle the problems of low-light image fusion, including high noise, motion blur, color distortion, and misalignment. The proposed LSFNet takes pairs of short- and long-exposure raw images as input and outputs a clear RGB image. Using our data generation method and the proposed LSFNet, we can recover the details and color of the original scene and effectively improve low-light image quality. Experiments demonstrate that our method can outperform state-of-the-art methods.

preprint2021arXiv

Reproducing sub-millimetre galaxy number counts with cosmological hydrodynamic simulations

Matching the number counts of high-$z$ sub-millimetre-selected galaxies (SMGs) has been a long-standing problem for galaxy formation models. In this paper, we use 3D dust radiative transfer to model the sub-mm emission from galaxies in the SIMBA cosmological hydrodynamic simulations, and compare predictions to the latest single-dish observational constraints on the abundance of 850\,$\mu$m-selected sources. We find good agreement with the shape of the integrated 850\,$\mu$m luminosity function, and the normalisation is within 0.25 dex at $> 3 \; \mathrm{mJy}$, unprecedented for a fully cosmological hydrodynamic simulation, along with good agreement in the redshift distribution of bright SMGs. The agreement is driven primarily by SIMBA's good match to infrared measures of the star formation rate (SFR) function between $z = 2$ and $z = 4$ at high SFRs. Also important is the self-consistent on-the-fly dust model in SIMBA, which predicts, on average, higher dust masses (by up to a factor of 2.5) compared to using a fixed dust-to-metals ratio of 0.3. We construct a lightcone to investigate the effect of far-field blending, and find that 52% of sources are blends of multiple components, which makes a small contribution to the normalisation of the bright end of the number counts. We provide new fits to the 850\,$\mu$m luminosity as a function of SFR and dust mass. Our results demonstrate that exotic solutions to the discrepancy between sub-mm counts in simulations and observations, such as a top-heavy IMF, are unnecessary, and that sub-millimetre-bright phases are a natural consequence of massive galaxy evolution.
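For readers less used to logarithmic units, the quoted 0.25 dex tolerance converts to a multiplicative factor as follows (a plain unit conversion, not an additional result from the paper):

$$0.25\ \mathrm{dex} \;\Longleftrightarrow\; 10^{0.25} \approx 1.78,$$

so the simulated normalisation at $> 3$ mJy lies within a factor of roughly 1.8 of the observed counts.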

preprint2021arXiv

Truncation-Free Matching System for Display Advertising at Alibaba

The matching module plays a critical role in display advertising systems. Without a query from the user, it is challenging for the system to match user traffic with suitable ads. The system groups users with common properties, such as the same gender or similar shopping interests, into a crowd; here, the term crowd can be viewed as a tag over users. Advertisers then bid for different crowds and deliver their ads to the targeted users. The matching module in most industrial display advertising systems follows a two-stage paradigm. When receiving a user request, the matching system (i) finds the crowds that the user belongs to and (ii) retrieves all ads that have targeted those crowds. However, in applications such as display advertising at Alibaba, with very large volumes of crowds and ads, both matching stages have to truncate the long-tail parts for online serving under limited latency. That is, not all ads have the chance to participate in online matching, which leads to suboptimal advertising performance and platform revenue. In this paper, we study the truncation problem and propose a Truncation-Free Matching System (TFMS). The basic idea is to decouple the matching computation from the online pipeline. Instead of executing the two-stage matching when a user visits, TFMS uses near-line truncation-free matching to pre-calculate and store the most valuable ads for each user. The online pipeline then only needs to fetch the pre-stored ads as matching results. In this way, we escape the latency and computation cost limitations of the online system and can leverage flexible computation resources to complete the user-ad matching. TFMS has been deployed in our production system since 2019, bringing (i) an improvement of more than 50% in impressions for advertisers who previously encountered truncation and (ii) a 9.4% gain in Revenue Per Mille (RPM), which is significant for the business.
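The decoupling idea can be illustrated with a minimal sketch, assuming a generic per-(user, ad) scoring function: a near-line job ranks every ad for every user without truncation and keeps the top k, and the online path reduces to a lookup. The scoring function, the value of k, and the names `nearline_match` and `online_fetch` are hypothetical; the abstract does not say how TFMS values ads or stores its results.

```python
import heapq
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer: any callable (user, ad) -> estimated value of the pair.
Scorer = Callable[[str, str], float]


def nearline_match(users: Sequence[str], ads: Sequence[str],
                   score: Scorer, k: int) -> Dict[str, List[str]]:
    """Near-line job: score every ad for every user (no truncation of crowds
    or ads) and store only the top-k ads per user."""
    store: Dict[str, List[str]] = {}
    for user in users:
        # heapq.nlargest returns the k best items in descending key order.
        store[user] = heapq.nlargest(k, ads, key=lambda ad: score(user, ad))
    return store


def online_fetch(store: Dict[str, List[str]], user: str) -> List[str]:
    """Online path: a single lookup replaces the two-stage matching."""
    return store.get(user, [])


if __name__ == "__main__":
    # Toy values: "a3" is most valuable for u1, "a1" for u2.
    values = {("u1", "a1"): 0.2, ("u1", "a2"): 0.5, ("u1", "a3"): 0.9,
              ("u2", "a1"): 0.8, ("u2", "a2"): 0.1, ("u2", "a3"): 0.4}
    store = nearline_match(["u1", "u2"], ["a1", "a2", "a3"],
                           lambda u, a: values[(u, a)], k=2)
    print(online_fetch(store, "u1"))  # ['a3', 'a2']
```

Because the expensive scoring runs outside the request path, its cost is bounded by near-line compute rather than by online latency, which is the trade-off the abstract describes.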

preprint2020arXiv

A Fast Radio Burst discovered in FAST drift scan survey

We report the discovery of a highly dispersed fast radio burst, FRB~181123, from an analysis of $\sim$1500~hr of drift-scan survey data taken using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The pulse has three distinct emission components, which vary with frequency across our 1.0--1.5~GHz observing band. We measure the peak flux density to be $>0.065$~Jy and the corresponding fluence $>0.2$~Jy~ms. Based on the observed dispersion measure of 1812~cm$^{-3}$~pc, we infer a redshift of $\sim 1.9$. From this, we estimate the peak luminosity and isotropic energy to be $\lesssim 2\times10^{43}$~erg~s$^{-1}$ and $\lesssim 2\times10^{40}$~erg, respectively. With only one FRB from the survey detected so far, our constraints on the event rate are limited. We derive a 95\% confidence lower limit of 900 FRBs per day on the event rate of FRBs with fluences $>0.025$~Jy~ms. We performed four hours of follow-up observations of the source with FAST and did not detect any repeat bursts. We discuss the implications of this discovery for our understanding of the physical mechanisms of FRBs.

preprint2020arXiv

A novel combination of theoretical analysis and data-driven method for reconstruction of structural defects

Ultrasonic guided wave technology plays a significant role in non-destructive testing because the acoustic waves it employs propagate efficiently and consume little energy during inspection. However, theoretical solutions to guided wave scattering problems that rely on assumptions such as the Born approximation yield poor-quality reconstructions. To address this issue, this paper proposes a novel approach to the quantitative reconstruction of defects that integrates a data-driven method with guided wave scattering analysis. Based on the geometrical information of defects and the initial results of theoretical defect reconstruction, a deep neural network model is built to reveal the physical relationship between defects and the received signals. This data-driven model is then applied to quantitatively assess and characterize defect profiles in structures, reducing the inaccuracy of the theoretical modelling and eliminating the impact of noise during inspection. To demonstrate the advantages of the approach for reconstructing defects with complex profiles, numerical examples including basic defect profiles and a defect with a noisy fringe are examined. The results show that this approach reconstructs defects in structures more accurately than the analytical method and provides valuable insight into the development of artificial intelligence-assisted inspection systems with high accuracy and efficiency in non-destructive testing.

preprint2020arXiv

Adversarial Example in Remote Sensing Image Recognition

With the wide application of remote sensing technology in various fields, the accuracy and security requirements for remote sensing image (RSI) recognition are also increasing. In recent years, owing to the rapid development of deep learning in image recognition, RSI recognition models based on deep convolutional neural networks (CNNs) have outperformed traditional hand-crafted feature techniques. However, alongside their accurate classification, CNNs also raise security issues. Adding a very small adversarial perturbation to the input image can cause a CNN model to produce erroneous results with extremely high confidence, while the modification remains imperceptible to the human eye. An image with such an added perturbation is called an adversarial example, and it poses a serious security problem for systems that rely on CNN recognition results. This paper is the first to analyze the adversarial example problem in RSI recognition under CNN models. In the experiments, we used different attack algorithms to fool multiple high-accuracy RSI recognition models trained on multiple RSI datasets. The results show that RSI recognition models are also vulnerable to adversarial examples, and that models with different structures trained on the same RSI dataset have different vulnerabilities. For each RSI dataset, the number of features also affects the vulnerability of the model: a larger number of features helps defend against adversarial examples. Further, we find that the attacked classes of RSIs exhibit an attack selectivity property: the misclassification of adversarial examples of RSIs is related to the similarity of the original classes in the CNN feature space. In addition, adversarial examples in RSI recognition are of great significance for the security of remote sensing applications, showing huge potential for future research.
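The abstract does not name the attack algorithms it used. As a representative illustration, the Fast Gradient Sign Method (FGSM) is sketched below on a toy logistic-regression "classifier" rather than a CNN, so the input gradient can be written in closed form. The weights, inputs, and perturbation budget `eps` are made-up values chosen only to show a label flip under a small, bounded perturbation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float],
         y: int, eps: float) -> List[float]:
    """Fast Gradient Sign Method: move every input value by eps in the
    direction that increases the cross-entropy loss for the true label y."""
    p = predict(w, b, x)
    # For logistic regression, d(cross-entropy)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    # Clip so the adversarial example stays a valid image in [0, 1].
    return [min(1.0, max(0.0, xi + eps * _sign(g))) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5, 1.5], -1.5
    x = [0.6, 0.4, 0.5, 0.5]            # toy 4-"pixel" input, true label 1
    x_adv = fgsm(w, b, x, y=1, eps=0.1)
    print(predict(w, b, x) > 0.5, predict(w, b, x_adv) > 0.5)  # True False
```

Every input value changes by at most 0.1, yet the predicted class flips. This is the same qualitative effect the abstract reports for CNN-based RSI recognition models, where the gradient comes from backpropagation instead of a closed form.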

preprint2020arXiv

Automatic Historical Feature Generation through Tree-based Method in Ads Prediction

Historical features are important in ads click-through rate (CTR) prediction because they account for past engagements between users and ads. In this paper, we study how to efficiently construct historical features through counting features. The key challenge of this problem lies in automatically identifying counting keys. We propose a tree-based method for counting key selection. The intuition is that a decision tree naturally provides various combinations of features, which can be used as counting key candidates. To select personalized counting features, we train one decision tree model per user, and the counting keys are selected across different users with a frequency-based importance measure. To validate the effectiveness of the proposed solution, we conduct large-scale experiments on Twitter video advertising data. In both online learning and offline training settings, the automatically identified counting features outperform manually curated counting features.
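The cross-user, frequency-based selection step might look like the following minimal sketch. It assumes that each user's decision tree has already been trained and is summarized by the feature names tested along its root-to-leaf paths. The abstract does not specify how candidates are extracted from a tree or how the importance measure is computed, so treating each path's feature set as one candidate and using "fraction of users whose tree yields the candidate" as importance are both assumptions. The feature names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Path = Sequence[str]  # features tested along one root-to-leaf path of a tree


def candidate_keys(paths: Sequence[Path]) -> Set[FrozenSet[str]]:
    """Assumption: each path's distinct feature set is one counting-key candidate."""
    return {frozenset(p) for p in paths if p}


def select_counting_keys(user_trees: Dict[str, Sequence[Path]],
                         top_k: int) -> List[Tuple[FrozenSet[str], float]]:
    """Rank candidates by the fraction of users whose tree yields them."""
    freq: Counter = Counter()
    for paths in user_trees.values():
        # One vote per user per candidate, since candidate_keys returns a set.
        freq.update(candidate_keys(paths))
    n_users = len(user_trees)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(key, count / n_users) for key, count in ranked[:top_k]]


if __name__ == "__main__":
    trees = {
        "u1": [["ad_id", "hour"], ["ad_id", "device"]],
        "u2": [["ad_id", "hour"], ["category"]],
        "u3": [["hour", "ad_id"], ["category", "device"]],
    }
    print(select_counting_keys(trees, top_k=1))
```

Here the combination of `ad_id` and `hour` appears in every user's tree, so it would be chosen as a counting key (for example, "number of past clicks on this ad in this hour").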

preprint2020arXiv

Discovery and timing of pulsars in the globular cluster M13 with FAST

We report the discovery of a binary millisecond pulsar (PSR J1641+3627F, or M13F) in the globular cluster M13 (NGC 6205) and timing solutions for M13A to F, using observations made with the Five-hundred-metre Aperture Spherical radio Telescope (FAST). PSR J1641+3627F has a spin period of 3.00 ms and an orbital period of 1.4 days. The most likely companion mass is 0.16 M$_{\odot}$. M13A to E all have short spin periods and small period derivatives. We also confirm that the binary millisecond pulsar PSR J1641+3627E (M13E) is a black widow with a companion mass of around 0.02 M$_{\odot}$. We find that all the binary systems have low eccentricities compared with those typical of globular cluster pulsars, and that the eccentricities decrease with distance from the cluster core. This is consistent with expectations, as this cluster has a very low encounter rate per binary.

preprint2020arXiv

Enhanced Valley Zeeman Splitting in Fe-Doped Monolayer MoS2

The Zeeman effect offers unique opportunities for magnetic manipulation of the spin degree of freedom (DOF). Recently, valley Zeeman splitting, i.e., the lifting of valley degeneracy, has been demonstrated in two-dimensional transition metal dichalcogenides (TMDs) at liquid-helium temperature. However, practical applications of valley pseudospins require the valley DOF to be controllable by a magnetic field at room temperature, which remains a significant challenge. Magnetic doping of TMDs can enhance the Zeeman splitting, but this is difficult to achieve experimentally. Here, we report unambiguous magnetic manipulation of valley Zeeman splitting at 300 K (g = -6.4) and 10 K (g = -11) in a CVD-grown Fe-doped MoS2 monolayer; the effective g factor can be tuned to -20.7 by increasing the Fe dopant concentration, an approximately fivefold enhancement compared with undoped MoS2. Our measurements and calculations reveal that the enhanced splitting and effective g factors arise from the Heisenberg exchange interaction between the localized magnetic moments (Fe 3d electrons) and MoS2 through d-orbital hybridization.
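To put the reported g factors on an energy scale, one can use the standard linear valley Zeeman relation. The 1 T field below is an illustrative choice, not a value from the abstract, and sign conventions for g differ between papers, so only magnitudes are shown:

$$\Delta E_{\mathrm{Z}} = g\,\mu_{\mathrm{B}}\,B, \qquad \mu_{\mathrm{B}} \approx 57.88\ \mu\mathrm{eV\,T^{-1}},$$

$$|\Delta E_{\mathrm{Z}}|\big|_{B = 1\,\mathrm{T}} \approx
\begin{cases}
6.4 \times 57.88 \approx 370\ \mu\mathrm{eV} & (300\ \mathrm{K},\ g = -6.4)\\
11 \times 57.88 \approx 637\ \mu\mathrm{eV} & (10\ \mathrm{K},\ g = -11)\\
20.7 \times 57.88 \approx 1.20\ \mathrm{meV} & (g = -20.7)
\end{cases}$$

The quoted fivefold enhancement at $g = -20.7$ implies an undoped $|g|$ of roughly $20.7/5 \approx 4$.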

preprint2020arXiv

Improving Relation Extraction with Knowledge-attention

While attention mechanisms have been proven effective in many NLP tasks, most of them are data-driven. We propose a novel knowledge-attention encoder that incorporates prior knowledge from external lexical resources into deep neural networks for the relation extraction task. Furthermore, we present three effective ways of integrating knowledge-attention with self-attention to maximize the utilization of both knowledge and data. The proposed relation extraction system is end-to-end and fully attention-based. Experimental results show that the proposed knowledge-attention mechanism has strengths complementary to self-attention, and our integrated models outperform existing CNN, RNN, and self-attention based models. State-of-the-art performance is achieved on TACRED, a complex and large-scale relation extraction dataset.

preprint2020arXiv

Lifespan of solutions to a damped fourth-order wave equation with logarithmic nonlinearity

This paper is devoted to the lifespan of solutions to a damped fourth-order wave equation with logarithmic nonlinearity $$u_{tt}+\Delta^2u-\Delta u-\omega\Delta u_t+\alpha(t)u_t=|u|^{p-2}u\ln|u|.$$ Finite-time blow-up criteria for solutions at both low and high initial energy levels are established, and an upper bound for the blow-up time is given for each case. Moreover, by constructing a new auxiliary functional and making full use of the strong damping term, a lower bound for the blow-up time is also derived.

preprint2020arXiv

Negative Margin Matters: Understanding Margin in Few-shot Classification

This paper introduces a negative margin loss to metric learning based few-shot learning methods. The negative margin loss significantly outperforms the regular softmax loss and achieves state-of-the-art accuracy on three standard few-shot classification benchmarks with few bells and whistles. These results run contrary to the common practice in metric learning of using a zero or positive margin. To understand why the negative margin loss performs well for few-shot classification, we analyze the discriminability of learned features with respect to different margins for training and novel classes, both empirically and theoretically. We find that although a negative margin reduces the feature discriminability for training classes, it may also avoid falsely mapping samples of the same novel class to multiple peaks or clusters, and thus benefit the discrimination of novel classes. Code is available at https://github.com/bl0/negative-margin.few-shot.
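The effect of the margin's sign can be seen in a minimal sketch. It assumes a cosine-similarity classifier with scale `scale = 10` and an additive margin subtracted from the true-class cosine (CosFace-style). The paper's exact formulation and hyperparameters may differ, so consult the linked repository for the authors' implementation. The class prototypes and feature vector are toy values.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def margin_softmax_loss(feature: Sequence[float],
                        class_weights: Sequence[Sequence[float]],
                        label: int, margin: float, scale: float = 10.0) -> float:
    """Cross-entropy over scaled cosine logits; the margin is subtracted from
    the true-class cosine only, so margin < 0 *raises* the true-class logit."""
    logits: List[float] = [scale * cosine(feature, w) for w in class_weights]
    logits[label] -= scale * margin
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # three class prototypes
    feat = [1.0, 0.2]                                 # sample of class 0
    for m in (0.3, 0.0, -0.3):
        print(m, round(margin_softmax_loss(feat, weights, 0, m), 4))
```

A positive margin makes the training objective stricter for the true class, while a negative margin relaxes it and lowers the loss for the same feature. The abstract links this relaxed objective to less fragmentation of novel classes into multiple clusters.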

preprint2020arXiv

Off-Path TCP Exploits of the Mixed IPID Assignment

In this paper, we uncover a new off-path TCP hijacking attack that can be used to terminate victim TCP connections or inject forged data into victim TCP connections by manipulating the new mixed IPID assignment method, which is widely used in Linux kernel versions 4.18 and later to help defend against TCP hijacking attacks. The attack has three steps. First, an off-path attacker can downgrade the IPID assignment for TCP packets from the more secure per-socket-based policy to the less secure hash-based policy, building a shared IPID counter that forms a side channel on the victim. Second, the attacker detects the presence of TCP connections by observing the shared IPID counter on the victim. Third, the attacker infers the sequence number and the acknowledgment number of the detected connection by observing the side channel of the shared IPID counter. Consequently, the attacker can completely hijack the connection, i.e., reset the connection or poison the data stream. We evaluate the impacts of this off-path TCP attack in the real world. Our case studies of SSH DoS, web traffic manipulation, and BGP routing table poisoning demonstrate its threat to a wide range of applications. Our experimental results show that our off-path TCP attack can be constructed within 215 seconds with a success rate of over 88%. Finally, we analyze the root cause of the exploit and develop a new IPID assignment method to defeat this attack. We prototype our defense in Linux 4.18 and confirm its effectiveness through extensive evaluation over real applications on the Internet.
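Why a shared counter leaks information (the second step above) can be shown with a toy model: if the attacker's probe replies and the victim connection draw IPIDs from the same counter, any gap between two consecutive probe replies counts packets the host sent on the hidden connection. The bucket count, the use of CRC32 as the flow hash, the addresses, and all names here are assumptions for illustration. This is not the Linux implementation, and the sketch does not cover the downgrade step or sequence/acknowledgment number inference.

```python
import zlib
from typing import List, Tuple

FlowKey = Tuple[str, str, str]  # (source IP, destination IP, protocol)


class HashedIpidHost:
    """Toy model of a hash-based IPID policy: each outgoing packet takes its
    IPID from one of n_buckets counters selected by hashing the flow key, so
    unrelated flows that hash to the same bucket share a counter.
    This is an illustration only, not the Linux kernel's implementation."""

    def __init__(self, n_buckets: int = 8) -> None:
        self.counters: List[int] = [0] * n_buckets

    def bucket(self, flow: FlowKey) -> int:
        # CRC32 is an arbitrary stand-in for the kernel's keyed hash.
        return zlib.crc32("|".join(flow).encode()) % len(self.counters)

    def send(self, flow: FlowKey) -> int:
        """Emit one packet on `flow` and return its 16-bit IPID."""
        b = self.bucket(flow)
        self.counters[b] = (self.counters[b] + 1) % 65536
        return self.counters[b]


def hidden_packets(first_ipid: int, second_ipid: int) -> int:
    """Packets sent on *other* flows in the shared bucket between two probe
    replies (consecutive replies would differ by exactly 1)."""
    return (second_ipid - first_ipid - 1) % 65536


if __name__ == "__main__":
    host = HashedIpidHost()
    victim: FlowKey = ("10.0.0.2", "10.0.0.9", "tcp")
    # The attacker picks a probing address whose flow shares the victim's bucket.
    probe = next(f for f in (("10.0.0.2", f"198.51.100.{i}", "tcp")
                             for i in range(256))
                 if host.bucket(f) == host.bucket(victim))
    first = host.send(probe)
    for _ in range(3):          # victim-connection traffic the attacker cannot see
        host.send(victim)
    second = host.send(probe)
    print(hidden_packets(first, second))  # 3
```

With a per-socket counter, the probe flow and the victim connection would not share state, and the gap would always be zero. This is why the attack's first step downgrades the policy to the hash-based one.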

preprint2020arXiv

On Carbon Nanotubes in the Interstellar Medium

Since their discovery in 1991, carbon nanotubes (CNTs), a novel one-dimensional carbon allotrope, have attracted considerable interest worldwide because of their potential technological applications such as electronic and optical devices. In the astrophysical context, CNTs may be present in interstellar space, since many of the other allotropes of carbon (e.g., amorphous carbon, fullerenes, nanodiamonds, graphite, polycyclic aromatic hydrocarbons, and possibly graphene as well) are known to be widespread in the Universe, as revealed by presolar grains in carbonaceous primitive meteorites and/or by their fingerprint spectral features in astronomical spectra. In addition, there are experimental and theoretical pathways to the formation of CNTs in the interstellar medium (ISM). In this work, we examine their possible presence in the ISM by comparing the observed interstellar extinction curve with the ultraviolet/optical absorption spectra experimentally obtained for single-walled CNTs of a wide range of diameters and chiralities. Based on the absence from the interstellar extinction curve of the ~4.5 and 5.25 eV $\pi$-plasmon absorption bands, which are pronounced in the experimental spectra of CNTs, we place an upper limit of ~10 ppm of C/H (i.e., ~4% of the total interstellar C) on the interstellar CNT abundance.
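As a quick consistency check on the two quoted numbers (approximate, since both are rounded in the abstract), the upper limit and its share of interstellar carbon together imply a total interstellar carbon abundance of roughly

$$\frac{10\ \mathrm{ppm}}{0.04} \approx 250\ \mathrm{ppm\ (C/H)}.$$

The exact reference abundance used in the paper is not stated in the abstract.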

preprint2020arXiv

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a subset of them. Instead of relying on fully-typed NER datasets, many efforts have been made to leverage multiple partially-typed ones for training so that the resulting model covers the full type set. However, there is neither a guarantee on the quality of the integrated datasets nor guidance on the design of training algorithms. Here, we conduct a systematic analysis and comparison of partially-typed and fully-typed NER datasets, both theoretically and empirically. First, we derive a bound establishing that models trained with partially-typed annotations can reach performance similar to that of models trained with fully-typed annotations, which also provides guidance on algorithm design. Moreover, we conduct controlled experiments, which show that partially-typed datasets lead to performance similar to that of models trained with the same amount of fully-typed annotations.

preprint2020arXiv

Powderday: Dust Radiative Transfer for Galaxy Simulations

We present Powderday, a flexible, fast, open-source dust radiative transfer package designed to interface with galaxy formation simulations. Powderday builds on FSPS population synthesis models and Hyperion dust radiative transfer, and employs yt to interface between different software packages. We perform stellar population synthesis modeling on the fly, which allows for significant run-time flexibility in the assumed stellar physics. We include a model for nebular line emission that can employ either precomputed Cloudy lookup tables (for efficiency) or direct photoionization calculations for all young stars (for flexibility). The dust content follows either observationally-motivated prescriptions, direct modeling from galaxy formation simulations, or a novel approach that includes the dust content via learning-based algorithms from the SIMBA cosmological galaxy formation simulation. AGN can additionally be included via a range of prescriptions. The outputs of these models are broadband SEDs, as well as filter-convolved images. Powderday is designed to eliminate last-mile efforts by researchers who employ different hydrodynamic galaxy formation models, and seamlessly interfaces with GIZMO, AREPO, GASOLINE, CHANGA, and ENZO. We demonstrate the capabilities of the code via three applications: a model for the star formation rate (SFR) - infrared luminosity relation in galaxies (including the impact of AGN); the impact of circumstellar dust around AGB stars on the mid-infrared emission from galaxy SEDs; and the impact of galaxy inclination angle on dust attenuation laws.

preprint2020arXiv

Reference-guided Face Component Editing

Face portrait editing has made great progress in recent years. However, previous methods either 1) operate on pre-defined face attributes, lacking the flexibility to control the shapes of high-level semantic facial components (e.g., eyes, nose, mouth), or 2) take manually edited masks or sketches as intermediate representations for observable changes, but such additional inputs usually require extra effort to obtain. To break these limitations (e.g., shape, mask, or sketch) of existing methods, we propose a novel framework termed r-FACE (Reference-guided FAce Component Editing) for diverse and controllable face component editing with geometric changes. Specifically, r-FACE takes an image inpainting model as the backbone and uses reference images as conditions for controlling the shape of face components. To encourage the framework to concentrate on the target face components, an example-guided attention module is designed to fuse attention features and the target face component features extracted from the reference image. Extensive experimental validation and comparisons verify the effectiveness of the proposed framework.

preprint2020arXiv

Removing Backdoor-Based Watermarks in Neural Networks with Limited Data

Deep neural networks have been widely applied and have achieved great success in various fields. Because training deep models usually consumes massive data and computational resources, trading trained deep models is in high demand and lucrative. Unfortunately, naive trading schemes typically involve potential risks related to copyright and trustworthiness, e.g., a sold model can be illegally resold to others without authorization to reap huge profits. To tackle this problem, various watermarking techniques have been proposed to protect model intellectual property, among which backdoor-based watermarking is the most commonly used. However, the robustness of these watermarking approaches has not been well evaluated under realistic settings, such as limited availability of in-distribution data and no knowledge of the watermarking patterns. In this paper, we benchmark the robustness of watermarking and propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD. WILD removes the watermarks of deep models with only a small portion of the training data, and the output model can perform as well as models trained from scratch without injected watermarks. In particular, a novel data augmentation method is used to mimic the behavior of watermark triggers. Combined with distribution alignment between normal and perturbed (e.g., occluded) data in the feature space, our approach generalizes well to all typical types of trigger content. The experimental results demonstrate that our approach can effectively remove watermarks without compromising the deep model's performance on the original task, even with limited access to training data.

preprint2020arXiv

Spatial-Adaptive Network for Single Image Denoising

Previous works have shown that convolutional neural networks can achieve good performance in image denoising tasks. However, limited by the local, rigid convolution operation, these methods tend to produce oversmoothing artifacts. A deeper network structure could alleviate these problems, but at a higher computational cost. In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single-image blind noise removal. To adapt to changes in spatial textures and edges, we design a residual spatial-adaptive block. Deformable convolution is introduced to sample spatially correlated features for weighting. An encoder-decoder structure with a context block is introduced to capture multiscale information. By removing noise in a coarse-to-fine manner, a high-quality noise-free image can be obtained. We apply our method to both synthetic and real noisy image datasets. The experimental results demonstrate that our method surpasses state-of-the-art denoising methods both quantitatively and visually.

preprint2020arXiv

Spatio-Temporal Dual Affine Differential Invariant for Skeleton-based Action Recognition

The dynamics of human skeletons carry significant information for the task of action recognition. The similarity between the trajectories of corresponding joints is an indicative feature of the same action, but this similarity may be subject to distortions that can be modeled as a combination of spatial and temporal affine transformations. In this work, we propose a novel feature called the spatio-temporal dual affine differential invariant (STDADI). Furthermore, to improve the generalization ability of neural networks, a channel augmentation method is proposed. On the large-scale action recognition dataset NTU-RGB+D and its extended version NTU-RGB+D 120, the proposed approach achieves remarkable improvements over previous state-of-the-art methods.

preprint2020arXiv

Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork

Background. Human aging is linked to many prevalent diseases. The aging process is highly influenced by genetic factors. Hence, it is important to identify human aging-related genes. We focus on supervised prediction of such genes. Gene expression-based methods for this purpose study genes in isolation from each other. Protein-protein interaction (PPI) network-based methods for this purpose do account for interactions between genes' protein products, but current PPI network data are context-unspecific, spanning different biological conditions. Instead, here we focus on an aging-specific subnetwork of the entire PPI network, obtained by integrating aging-specific gene expression data with PPI network data. The potential of such data integration has been recognized, but mostly in the context of cancer. We are the first to propose a supervised learning framework for predicting aging-related genes from an aging-specific PPI subnetwork. Results. In a systematic and comprehensive evaluation, we find that in many of the evaluation tests: (i) using an aging-specific subnetwork indeed yields more accurate aging-related gene predictions than using the entire network, and (ii) predictive methods from our framework that have not previously been used for supervised prediction of aging-related genes outperform existing prominent methods for the same purpose. Conclusion. These results justify the need for our framework.
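How the aging-specific subnetwork is built is not detailed here. One simple form of such integration, assumed purely for illustration, keeps the genes that pass an expression threshold in the context of interest and retains only PPI edges whose two endpoints both survive (an induced subgraph). The helpers `active_genes` and `context_specific_subnetwork`, and the threshold itself, are hypothetical.

```python
def active_genes(expression, threshold):
    """Hypothetical context filter: genes whose expression level in the
    context of interest (e.g., aging samples) is at least `threshold`."""
    return {gene for gene, level in expression.items() if level >= threshold}

def context_specific_subnetwork(ppi_edges, context_genes):
    """Induced subgraph: keep an interaction only if both of its endpoints
    belong to the context-specific gene set."""
    keep = set(context_genes)
    return {(u, v) for (u, v) in ppi_edges if u in keep and v in keep}
```

Network features for a supervised classifier (e.g., node degree or neighborhood statistics) would then be computed on the retained edges rather than on the full, context-unspecific network.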

preprint2020arXiv

When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

Automatic speech recognition (ASR) systems have been widely deployed in modern smart devices to provide convenient and diverse voice-controlled services. Since ASR systems are vulnerable to audio replay attacks that can spoof and mislead them, a number of defense systems have been proposed to identify replayed audio signals based on the speakers' unique acoustic features in the frequency domain. In this paper, we uncover a new type of replay attack, called the modulated replay attack, which can bypass existing frequency-domain defense systems. The basic idea is to compensate for the frequency distortion of a given electronic speaker using an inverse filter customized to the speaker's transfer characteristics. Our experiments on real smart devices confirm that modulated replay attacks can successfully evade existing detection mechanisms that rely on identifying suspicious features in the frequency domain. To defeat modulated replay attacks, we design and implement a countermeasure named DualGuard. We discover and formally prove that, no matter how replayed audio signals are modulated, a replay attack will either leave ringing artifacts in the time domain or cause spectrum distortion in the frequency domain. Therefore, by jointly checking for suspicious features in both the frequency and time domains, DualGuard can successfully detect various replay attacks, including modulated replay attacks. We implement a prototype of DualGuard on a popular voice interactive platform, ReSpeaker Core v2. The experimental results show that DualGuard achieves 98% accuracy in detecting modulated replay attacks.
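To make the inverse-filter idea concrete, the sketch below assumes an idealized linear, time-invariant speaker whose complex gain per FFT bin (`speaker_response`) is known, and pre-distorts the signal with a regularized inverse H*/(|H|^2 + eps). Frequency-domain multiplication implements circular convolution, so this is a block-wise idealization; the function names and `eps` are hypothetical and not taken from the paper.

```python
import numpy as np

def inverse_filter(signal, speaker_response, eps=1e-3):
    """Pre-distort `signal` so that, after playback through a speaker with
    per-bin complex gain `speaker_response` (length len(signal)//2 + 1),
    the output spectrum approximates the original one."""
    spectrum = np.fft.rfft(signal)
    # Regularized inversion avoids huge gains where |H| is close to zero.
    inv = np.conj(speaker_response) / (np.abs(speaker_response) ** 2 + eps)
    return np.fft.irfft(spectrum * inv, n=len(signal))

def simulate_speaker(signal, speaker_response):
    """Idealized speaker: multiply the spectrum by its frequency response."""
    return np.fft.irfft(np.fft.rfft(signal) * speaker_response, n=len(signal))
```

Playing the pre-distorted signal through `simulate_speaker` approximately recovers the original, which is why a purely frequency-domain detector sees no speaker fingerprint; according to the abstract, DualGuard closes this gap by also checking for ringing artifacts in the time domain.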

preprint2019arXiv

Canonical interpretation of $Y(10750)$ and $\Upsilon(10860)$ in the $\Upsilon$ family

Inspired by the new resonance $Y(10750)$, we calculate the masses and two-body OZI-allowed strong decays of the higher vector bottomonium states within both screened and linear potential models. We discuss the possibility that $\Upsilon(10860)$ and $Y(10750)$ are mixed states via $S$-$D$ mixing. Our results suggest that $Y(10750)$ and $\Upsilon(10860)$ might be explained as mixed states of the $5S$- and $4D$-wave vector $b\bar{b}$ states. The $Y(10750)$ and $\Upsilon(10860)$ resonances may correspond to the mixed states dominated by the $4D$- and $5S$-wave components, respectively. The mass and strong decay behavior of the $\Upsilon(11020)$ resonance are consistent with its assignment as the $\Upsilon(6S)$ state in the potential models.
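Two-state $S$-$D$ mixing is conventionally written with a single mixing angle $\theta$. The parameterization below is a generic illustration; its sign convention and the value of $\theta$ are assumptions, not results quoted from the paper:

$$
\begin{pmatrix} \Upsilon(10860) \\ Y(10750) \end{pmatrix}
=
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} |5S\rangle \\ |4D\rangle \end{pmatrix}
$$

A small $|\theta|$ leaves $\Upsilon(10860)$ dominated by the $5S$ component and $Y(10750)$ by the $4D$ component, which is the pattern suggested in the abstract.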

preprint2019arXiv

Entanglement spectrum edge reconstruction and correlation hole of the FQH liquids

The edge of an electronic fractional quantum Hall (FQH) system is described by chiral Luttinger liquid theory, owing to its intrinsic topological properties and bulk-edge correspondence. However, in realistic experimental systems, such as the usual Hall bar setup, softening of the background confinement potential can induce a reconstruction of the edge spectrum that breaks the chirality and universality of the FQH edge. The entanglement spectrum (ES) of the FQH ground state has the same counting structure as the energy spectrum, which reflects the topological character of the quantum state. In this work, we report that the ES can also undergo edge reconstruction as the area of the subsystem in a real-space cut is swept. Moreover, we find that the critical subsystem area accurately matches the intrinsic building block of FQH liquids, namely the correlation hole. These results appear to be universal across the series of typical FQH states we studied, such as the two Laughlin states at $\nu = 1/3$ and $\nu = 1/5$ and the Moore-Read state at $\nu = 5/2$.

preprint2019arXiv

Neutral excitation and bulk gap of the Fractional Quantum Hall liquids in disk geometry

For numerical simulations of the fractional quantum Hall effect on a finite disk, rotational symmetry has been the only symmetry used in diagonalizing the Hamiltonian. In this work, we propose a method that exploits the weak translational symmetry of the center of mass of the many-body system. With this approach, bulk properties such as the energy gap and the magneto-roton excitation are exactly consistent with those obtained on closed manifolds such as the sphere and torus. As an application, we consider the FQH phase and its phase transition in rapidly rotating dipolar fermions. This gives the disk geometry the versatility to analyze bulk properties in addition to the usual edge physics.

preprint2019arXiv

Non-Majorana Origin of the Half-Quantized Conductance Plateau in Quantum Anomalous Hall Insulator and Superconductor Hybrid Structures

A quantum anomalous Hall (QAH) insulator coupled to an s-wave superconductor is predicted to harbor a topological superconducting phase, whose elementary excitations (i.e., Majorana fermions) can form topological qubits through non-Abelian braiding operations. A recent transport experiment interprets the half-quantized two-terminal conductance plateau as evidence for chiral Majorana fermions in a millimeter-size QAH-Nb hybrid structure. However, there are concerns about this interpretation because non-Majorana mechanisms can also generate similar signatures, especially in a disordered QAH system. Here, we fabricated QAH-Nb hybrid structures and studied the QAH-Nb contact transparency and its effect on the corresponding two-terminal conductance. When the QAH film is tuned to the metallic regime by electric gating, we observed a sharp zero-bias enhancement in the differential conductance, of up to 80% at zero magnetic field. This large enhancement suggests a high probability of Andreev reflection and a transparent interface between the magnetic topological insulator (TI) and Nb layers. When the magnetic TI film is in the QAH state with well-aligned magnetization, we found that the two-terminal conductance is always half-quantized. Our experiment provides a comprehensive understanding of the superconducting proximity effect observed in QAH-superconductor hybrid structures and shows that the half-quantized conductance plateau is unlikely to be induced by chiral Majorana fermions.