Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
36works
0followers
20topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

36 published item(s)

preprint2026arXiv

DRNet: All-in-One Image Restoration via Prior-Guided Dynamic Reparameterization

All-in-one image restoration aims to handle diverse degradations within a single model. However, existing methods often suffer from three key limitations: 1) per-input computational overhead from dynamic degradation estimation; 2) optimization challenges due to task heterogeneity; and 3) inefficient, frequency-agnostic encoder designs. To overcome these, we introduce the Dynamic Reparameterization Network (DRNet), a novel framework operating on an initialization-stage reconfiguration paradigm that fundamentally eliminates per-input overhead. At its core, a Dynamic Reparameterization MLP (DRMLP) guided by a Task-Specific Modulator (TSM), which effectively mitigates task heterogeneity by orchestrating both specific restoration goals and a versatile general-purpose mode within a unified architecture. Furthermore, we incorporate a Continuous Wavelet Transform Encoder (CWTE) that explicitly leverages frequency characteristics via wavelet decomposition for a lightweight yet powerful design. Extensive experiments demonstrate that DRNet achieves state-of-the-art performance across five restoration tasks with superior parameter efficiency. Crucially, it showcases unique flexibility, excelling as both a highly competitive foundation model for blind restoration and a top-performing user-guided specialist.

preprint2025arXiv

Observation of robust one-dimensional edge channels in a three-dimensional quantum spin Hall insulator

Topologically protected edge channels show prospects for quantum devices. They have been found experimentally in two-dimensional (2D) quantum spin Hall insulators (QSHIs), weak topological insulators and higher-order topological insulators (HOTIs), but the number of materials realizing these topologies is still quite limited. Here, we provide evidence for topological edge states within a novel topology named three-dimensional (3D) QSHIs. Its topology originates solely from a nonzero $S_z$ spin Chern number for each $k_z$ plane of the crystal and is realized in bulk $α$-Bi$_4$I$_4$ with trivial symmetry indicators, as we show by density functional theory calculations. We experimentally observe the related edge states at each type of monolayer and bilayer step of this material by scanning tunneling microscopy. Consistently, the edge states are neither interrupted, nor backscattered by defects at the step edges corroborating their helical character as expected from the nontrivial topology. Furthermore, two individual edge channels are directly observed at bilayer steps without visible interaction gap opening, demonstrating the robustness of these edge modes against vertical stacking. Our results establish $α$-Bi$_4$I$_4$ as the first material realization of a 3D QSHI whose definition goes beyond the scope of topological symmetry indicators, and provide a pathway for realizing nearly-quantized spin Hall conductivity per unit cell in a bulk crystal.

preprint2024arXiv

From Covert Hiding to Visual Editing: Robust Generative Video Steganography

Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret message within semantic feature for steganography during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying semantic feature of videos to embed secret message. We employ face-swapping scenario to showcase the visual editing effects. We first design a secret message embedding module to adaptively hide secret message into the semantic feature of videos. Extensive experiments display that the proposed RoGVS method applied to facial video datasets demonstrate its superiority over existing video and image steganography techniques in terms of both robustness and capacity.

preprint2024arXiv

Object-oriented backdoor attack against image captioning

Backdoor attack against image classification task has been widely studied and proven to be successful, while there exist little research on the backdoor attack against vision-language models. In this paper, we explore backdoor attack towards image captioning models by poisoning training data. Assuming the attacker has total access to the training dataset, and cannot intervene in model construction or training process. Specifically, a portion of benign training samples is randomly selected to be poisoned. Afterwards, considering that the captions are usually unfolded around objects in an image, we design an object-oriented method to craft poisons, which aims to modify pixel values by a slight range with the modification number proportional to the scale of the current detected object region. After training with the poisoned data, the attacked model behaves normally on benign images, but for poisoned images, the model will generate some sentences irrelevant to the given image. The attack controls the model behavior on specific test images without sacrificing the generation performance on benign test images. Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.

preprint2024arXiv

PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning

Deceptive images can be shared in seconds with social networking services, posing substantial risks. Tampering traces, such as boundary artifacts and high-frequency information, have been significantly emphasized by massive networks in the Image Manipulation Localization (IML) field. However, they are prone to image post-processing operations, which limit the generalization and robustness of existing methods. We present a novel Prompt-IML framework. We observe that humans tend to discern the authenticity of an image based on both semantic and high-frequency information, inspired by which, the proposed framework leverages rich semantic knowledge from pre-trained visual foundation models to assist IML. We are the first to design a framework that utilizes visual foundation models specially for the IML task. Moreover, we design a Feature Alignment and Fusion module to align and fuse features of semantic features with high-frequency features, which aims at locating tampered regions from multiple perspectives. Experimental results demonstrate that our model can achieve better performance on eight typical fake image datasets and outstanding robustness.

preprint2023arXiv

Trojaning semi-supervised learning model via poisoning wild images on the web

Wild images on the web are vulnerable to backdoor (also called trojan) poisoning, causing machine learning models learned on these images to be injected with backdoors. Most previous attacks assumed that the wild images are labeled. In reality, however, most images on the web are unlabeled. Specifically, we study the effects of unlabeled backdoor images under semi-supervised learning (SSL) on widely studied deep neural networks. To be realistic, we assume that the adversary is zero-knowledge and that the semi-supervised learning model is trained from scratch. Firstly, we find the fact that backdoor poisoning always fails when poisoned unlabeled images come from different classes, which is different from poisoning the labeled images. The reason is that the SSL algorithms always strive to correct them during training. Therefore, for unlabeled images, we implement backdoor poisoning on images from the target class. Then, we propose a gradient matching strategy to craft poisoned images such that their gradients match the gradients of target images on the SSL model, which can fit poisoned images to the target class and realize backdoor injection. To the best of our knowledge, this may be the first approach to backdoor poisoning on unlabeled images of trained-from-scratch SSL models. Experiments show that our poisoning achieves state-of-the-art attack success rates on most SSL algorithms while bypassing modern backdoor defenses.

preprint2022arXiv

A DTCWT-SVD Based Video Watermarking resistant to frame rate conversion

Videos can be easily tampered, copied and redistributed by attackers for illegal and monetary usage. Such behaviors severely jeopardize the interest of content owners. Despite huge efforts made in digital video watermarking for copyright protection, typical distortions in video transmission including signal attacks, geometric attacks and temporal synchronization attacks can still easily erase the embedded signal. Among them, temporal synchronization attacks which include frame dropping, frame insertion and frame rate conversion is one of the most prevalent attacks. To address this issue, we present a new video watermarking based on joint Dual-Tree Cosine Wavelet Transformation (DTCWT) and Singular Value Decomposition (SVD), which is resistant to frame rate conversion. We first extract a set of candidate coefficient by applying SVD decomposition after DTCWT transform. Then, we simulate the watermark embedding by adjusting the shape of candidate coefficient. Finally, we perform group-level watermarking that includes moderate temporal redundancy to resist temporal desynchronization attacks. Extensive experimental results show that the proposed scheme is more resilient to temporal desynchronization attacks and performs better than the existing blind video watermarking schemes.

preprint2022arXiv

Coupling Visual Semantics of Artificial Neural Networks and Human Brain Function via Synchronized Activations

Artificial neural networks (ANNs), originally inspired by biological neural networks (BNNs), have achieved remarkable successes in many tasks such as visual representation learning. However, whether there exists semantic correlations/connections between the visual representations in ANNs and those in BNNs remains largely unexplored due to both the lack of an effective tool to link and couple two different domains, and the lack of a general and effective framework of representing the visual semantics in BNNs such as human functional brain networks (FBNs). To answer this question, we propose a novel computational framework, Synchronized Activations (Sync-ACT), to couple the visual representation spaces and semantics between ANNs and BNNs in human brain based on naturalistic functional magnetic resonance imaging (nfMRI) data. With this approach, we are able to semantically annotate the neurons in ANNs with biologically meaningful description derived from human brain imaging for the first time. We evaluated the Sync-ACT framework on two publicly available movie-watching nfMRI datasets. The experiments demonstrate a) the significant correlation and similarity of the semantics between the visual representations in FBNs and those in a variety of convolutional neural networks (CNNs) models; b) the close relationship between CNN's visual representation similarity to BNNs and its performance in image classification tasks. Overall, our study introduces a general and effective paradigm to couple the ANNs and BNNs and provides novel insights for future studies such as brain-inspired artificial intelligence.

preprint2022arXiv

Emergence of insulating ferrimagnetism and perpendicular magnetic anisotropy in 3d-5d perovskite oxide composite films for insulator spintronic

Magnetic insulators with strong perpendicular magnetic anisotropy (PMA) play a key role in exploring pure spin current phenomena and developing ultralow-dissipation spintronic devices, thereby it is highly desirable to develop new material platforms. Here we report epitaxial growth of La2/3Sr1/3MnO3 (LSMO)-SrIrO3 (SIO) composite oxide films (LSMIO) with different crystalline orientations fabricated by sequential two-target ablation process using pulsed laser deposition. The LSMIO films exhibit high crystalline quality with homogeneous mixture of LSMO and SIO at atomic level. Ferrimagnetic and insulating transport characteristics are observed, with the temperature-dependent electric resistivity well fitted by Mott variable-range-hopping model. Moreover, the LSMIO films show strong PMA. Through further constructing all perovskite oxide heterostructures of the ferrimagnetic insulator LSMIO and a strong spin-orbital coupled SIO layer, pronounced spin Hall magnetoresistance (SMR) and spin Hall-like anomalous Hall effect (SH-AHE) were observed. These results illustrate the potential application of the ferrimagnetic insulator LSMIO in developing all-oxide ultralow-dissipation spintronic devices.

preprint2022arXiv

Exploring Depth Information for Face Manipulation Detection

Face manipulation detection has been receiving a lot of attention for the reliability and security of the face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, which are shown to be promising. As one of the important face features, the face depth map, which has shown to be effective in other areas such as the face recognition or face detection, is unfortunately paid little attention to in literature for detecting the manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from a RGB face image, which is able to capture the local depth anomaly created due to manipulation. The estimated face depth map is then considered as auxiliary information to be integrated with the backbone features using a Multi-head Depth Attention (MDA) mechanism that is newly designed. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.

preprint2022arXiv

Fusion of Self-supervised Learned Models for MOS Prediction

We participated in the mean opinion score (MOS) prediction challenge, 2022. This challenge aims to predict MOS scores of synthetic speech on two tracks, the main track and a more challenging sub-track: out-of-domain (OOD). To improve the accuracy of the predicted scores, we have explored several model fusion-related strategies and proposed a fused framework in which seven pretrained self-supervised learned (SSL) models have been engaged. These pretrained SSL models are derived from three ASR frameworks, including Wav2Vec, Hubert, and WavLM. For the OOD track, we followed the 7 SSL models selected on the main track and adopted a semi-supervised learning method to exploit the unlabeled data. According to the official analysis results, our system has achieved 1st rank in 6 out of 16 metrics and is one of the top 3 systems for 13 out of 16 metrics. Specifically, we have achieved the highest LCC, SRCC, and KTAU scores at the system level on main track, as well as the best performance on the LCC, SRCC, and KTAU evaluation metrics at the utterance level on OOD track. Compared with the basic SSL models, the prediction accuracy of the fused system has been largely improved, especially on OOD sub-track.

preprint2022arXiv

Generative Steganography Network

Steganography usually modifies cover media to embed secret data. A new steganographic approach called generative steganography (GS) has emerged recently, in which stego images (images containing secret data) are generated from secret data directly without cover media. However, existing GS schemes are often criticized for their poor performances. In this paper, we propose an advanced generative steganography network (GSN) that can generate realistic stego images without using cover images. We firstly introduce the mutual information mechanism in GS, which helps to achieve high secret extraction accuracy. Our model contains four sub-networks, i.e., an image generator ($G$), a discriminator ($D$), a steganalyzer ($S$), and a data extractor ($E$). $D$ and $S$ act as two adversarial discriminators to ensure the visual quality and security of generated stego images. $E$ is to extract the hidden secret from generated stego images. The generator $G$ is flexibly constructed to synthesize either cover or stego images with different inputs. It facilitates covert communication by concealing the function of generating stego images in a normal generator. A module named secret block is designed to hide secret data in the feature maps during image generation, with which high hiding capacity and image fidelity are achieved. In addition, a novel hierarchical gradient decay (HGD) skill is developed to resist steganalysis detection. Experiments demonstrate the superiority of our work over existing methods.

preprint2022arXiv

Hierarchical Capsule Prediction Network for Marketing Campaigns Effect

Marketing campaigns are a set of strategic activities that can promote a business's goal. The effect prediction for marketing campaigns in a real industrial scenario is very complex and challenging due to the fact that prior knowledge is often learned from observation data, without any intervention for the marketing campaign. Furthermore, each subject is always under the interference of several marketing campaigns simultaneously. Therefore, we cannot easily parse and evaluate the effect of a single marketing campaign. To the best of our knowledge, there are currently no effective methodologies to solve such a problem, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the underlying parse tree-like structure involved in the effect prediction task and we further establish a Hierarchical Capsule Prediction Network (HapNet) for predicting the effects of marketing campaigns. Extensive results based on both the synthetic data and real data demonstrate the superiority of our model over the state-of-the-art methods and show remarkable practicability in real industrial applications.

preprint2022arXiv

High-Capacity Framework for Reversible Data Hiding in Encrypted Image Using Pixel Predictions and Entropy Encoding

While the existing vacating room before encryption (VRBE) based schemes can achieve decent embedding rate, the payloads of the existing vacating room after encryption (VRAE) based schemes are relatively low. To address this issue, this paper proposes a generalized framework for high-capacity RDHEI for both VRBE and VRAE cases. First, an efficient embedding room generation algorithm (ERGA) is designed to produce large embedding room by using pixel prediction and entropy encoding. Then, we propose two RDHEI schemes, one for VRBE, another for VRAE. In the VRBE scenario, the image owner generates the embedding room with ERGA and encrypts the preprocessed image by using the stream cipher with two encryption keys. Then, the data hider locates the embedding room and embeds the encrypted additional data. In the VRAE scenario, the cover image is encrypted by an improved block modulation and permutation encryption algorithm, where the spatial redundancy in the plain-text image is largely preserved. Then, the data hider applies ERGA on the encrypted image to generate the embedding room and conducts data embedding. For both schemes, the receivers with different authentication keys can respectively conduct error-free data extraction and/or error-free image recovery. The experimental results show that the two proposed schemes outperform many state-of-the-art RDHEI arts. Besides, the schemes can ensure high security level, where the original image can be hardly discovered from the encrypted version before and after data hiding by the unauthorized user.

preprint2022arXiv

Image Generation Network for Covert Transmission in Online Social Network

Online social networks have stimulated communications over the Internet more than ever, making it possible for secret message transmission over such noisy channels. In this paper, we propose a Coverless Image Steganography Network, called CIS-Net, that synthesizes a high-quality image directly conditioned on the secret message to transfer. CIS-Net is composed of four modules, namely, the Generation, Adversarial, Extraction, and Noise Module. The receiver can extract the hidden message without any loss even the images have been distorted by JPEG compression attacks. To disguise the behaviour of steganography, we collected images in the context of profile photos and stickers and train our network accordingly. As such, the generated images are more inclined to escape from malicious detection and attack. The distinctions from previous image steganography methods are majorly the robustness and losslessness against diverse attacks. Experiments over diverse public datasets have manifested the superior ability of anti-steganalysis.

preprint2022arXiv

Learning Infomax and Domain-Independent Representations for Causal Effect Inference with Real-World Data

The foremost challenge to causal inference with real-world data is to handle the imbalance in the covariates with respect to different treatment options, caused by treatment selection bias. To address this issue, recent literature has explored domain-invariant representation learning based on different domain divergence metrics (e.g., Wasserstein distance, maximum mean discrepancy, position-dependent metric, and domain overlap). In this paper, we reveal the weaknesses of these strategies, i.e., they lead to the loss of predictive information when enforcing the domain invariance; and the treatment effect estimation performance is unstable, which heavily relies on the characteristics of the domain distributions and the choice of domain divergence metrics. Motivated by information theory, we propose to learn the Infomax and Domain-Independent Representations to solve the above puzzles. Our method utilizes the mutual information between the global feature representations and individual feature representations, and the mutual information between feature representations and treatment assignment predictions, in order to maximally capture the common predictive information for both treatment and control groups. Moreover, our method filters out the influence of instrumental and irrelevant variables, and thus it effectively increases the predictive ability of potential outcomes. Experimental results on both the synthetic and real-world datasets show that our method achieves state-of-the-art performance on causal effect inference. Moreover, our method exhibits reliable prediction performances when facing data with different characteristics of data distributions, complicated variable types, and severe covariate imbalance.

preprint2022arXiv

Multi-Task Adversarial Learning for Treatment Effect Estimation in Basket Trials

Estimating treatment effects from observational data provides insights about causality guiding many real-world applications such as different clinical study designs, which are the formulations of trials, experiments, and observational studies in medical, clinical, and other types of research. In this paper, we describe causal inference for application in a novel clinical design called basket trial that tests how well a new drug works in patients who have different types of cancer that all have the same mutation. We propose a multi-task adversarial learning (MTAL) method, which incorporates feature selection multi-task representation learning and adversarial learning to estimate potential outcomes across different tumor types for patients sharing the same genetic mutation but having different tumor types. In our paper, the basket trial is employed as an intuitive example to present this new causal inference setting. This new causal inference setting includes, but is not limited to basket trials. This setting has the same challenges as the traditional causal inference problem, i.e., missing counterfactual outcomes under different subgroups and treatment selection bias due to confounders. We present the practical advantages of our MTAL method for the analysis of synthetic basket trial data and evaluate the proposed estimator on two benchmarks, IHDP and News. The results demonstrate the superiority of our MTAL method over the competing state-of-the-art methods.

preprint2022arXiv

Multimodal Fake News Detection via CLIP-Guided Learning

Multimodal fake news detection has attracted many research interests in social forensics. Many existing approaches introduce tailored attention mechanisms to guide the fusion of unimodal features. However, how the similarity of these features is calculated and how it will affect the decision-making process in FND are still open questions. Besides, the potential of pretrained multi-modal feature learning models in fake news detection has not been well exploited. This paper proposes a FND-CLIP framework, i.e., a multimodal Fake News Detection network based on Contrastive Language-Image Pretraining (CLIP). Given a targeted multimodal news, we extract the deep representations from the image and text using a ResNet-based encoder, a BERT-based encoder and two pair-wise CLIP encoders. The multimodal feature is a concatenation of the CLIP-generated features weighted by the standardized cross-modal similarity of the two modalities. The extracted features are further processed for redundancy reduction before feeding them into the final classifier. We introduce a modality-wise attention module to adaptively reweight and aggregate the features. We have conducted extensive experiments on typical fake news datasets. The results indicate that the proposed framework has a better capability in mining crucial features for fake news detection. The proposed FND-CLIP can achieve better performances than previous works, i.e., 0.7\%, 6.8\% and 1.3\% improvements in overall accuracy on Weibo, Politifact and Gossipcop, respectively. Besides, we justify that CLIP-based learning can allow better flexibility on multimodal feature selection.

preprint2022arXiv

NeuralSound: Learning-based Modal Sound Synthesis With Acoustic Transfer

We present a novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and an end-to-end sound radiation network for acoustic transfer. Our mixed vibration solver consists of a 3D sparse convolution network and a Locally Optimal Block Preconditioned Conjugate Gradient module (LOBPCG) for iterative optimization. Moreover, we highlight the correlation between a standard modal vibration solver and our network architecture. Our radiation network predicts the Far-Field Acoustic Transfer maps (FFAT Maps) from the surface vibration of the object. The overall running time of our learning method for any new object is less than one second on a GTX 3080 Ti GPU while maintaining a high sound quality close to the ground truth that is computed using standard numerical methods. We also evaluate the numerical accuracy and perceptual accuracy of our sound synthesis approach on different objects corresponding to various materials.

preprint2022arXiv

Robust Watermarking for Video Forgery Detection with Improved Imperceptibility and Robustness

Videos are prone to tampering attacks that alter the meaning and deceive the audience. Previous video forgery detection schemes find tiny clues to locate the tampered areas. However, attackers can successfully evade supervision by destroying such clues using video compression or blurring. This paper proposes a video watermarking network for tampering localization. We jointly train a 3D-UNet-based watermark embedding network and a decoder that predicts the tampering mask. The perturbation made by watermark embedding is close to imperceptible. Considering that there is no off-the-shelf differentiable video codec simulator, we propose to mimic video compression by ensembling simulation results of other typical attacks, e.g., JPEG compression and blurring, as an approximation. Experimental results demonstrate that our method generates watermarked videos with good imperceptibility and robustly and accurately locates tampered areas within the attacked version.

preprint2021arXiv

Cooperative control of perpendicular magnetic anisotropy via crystal structure and orientation in single-crystal flexible SrRuO3 membranes

Flexible magnetic materials with robust and controllable perpendicular magnetic anisotropy (PMA) are highly desirable for developing flexible high-performance spintronic devices. However, it is still challenge to fabricate PMA films through current techniques of direct deposition on polymers. Here, we report a facile method for synthesizing single-crystal freestanding SrRuO3 (SRO) membranes with controlled crystal structure and orientation using water-soluble Ca3-xSrxAl2O6 sacrificial layers. Through cooperative effect of crystal structure and orientation engineering, flexible SrRuO3 membranes reveal highly tunable magnetic anisotropy from in-plane to our-of-plane with a remarkable PMA energy of 7.34*106 erg/cm3. Based on the first-principles calculations, it reveals that the underlying mechanism of PMA modulation is intimately correlated with structure-controlled Ru 4d-orbital occupation, as well as the spin-orbital matrix element differences, dependent on the crystal orientation. In addition, there are no obvious changes of the magnetism after 10,000 bending cycles, indicating an excellent magnetism reliability in the prepared films. This work provides a feasible approach to prepare the flexible oxide films with strong and controllable PMA.

preprint2021arXiv

Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination graph based formalization allows reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure which is then used by a graph neural network based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predatory-prey tasks as well as outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.

preprint2021arXiv

Enhanced Superconductivity in the Se-substituted 1T-PdTe$_2$

Two-dimensional transition metal dichalcogenide PdTe$_2$ recently attracts much attention due to its phase coexistence of type-II Dirac semimetal and type-I superconductivity. Here we report a 67 % enhancement of superconducting transition temperature in the 1T-PdSeTe in comparison to that of PdTe2 through partial substitution of Te atoms by Se. The superconductivity has been unambiguously confirmed by the magnetization, resistivity and specific heat measurements. 1T-PdSeTe shows type-II superconductivity with large anisotropy and non-bulk superconductivity nature with volume fraction ~ 20 % estimated from magnetic and heat capacity measurements. 1T-PdSeTe expands the family of superconducting transition metal dichalcogenides and thus provides additional insights for understanding superconductivity and topological physics in the 1T-PdTe$_2$ system

preprint2021arXiv

Lateral modulation of magnetic anisotropy in tricolor 3d-5d oxide superlattices

Manipulating magnetic anisotropy (MA) purposefully in transition metal oxides (TMOs) enables the development of oxide-based spintronic devices with practical applications. Here, we report a pathway to reversibly switch the lateral magnetic easy-axis via interfacial oxygen octahedral coupling (OOC) effects in 3d-5d tricolor superlattices, i.e. [SrIrO3,mRTiO3,SrIrO3,2La0.67Sr0.33MnO3]10 (RTiO3: SrTiO3 and CaTiO3). In the heterostructures, the anisotropy energy (MAE) is enhanced over one magnitude to ~106 erg/cm3 compared to La0.67Sr0.33MnO3 films. Moreover, the magnetic easy-axis is reversibly reoriented between (100)- and (110)-directions by changing the RTiO3. Using first-principles density functional theory calculations, we find that the SrIrO3 owns a large single-ion anisotropy due to its strong spin-orbit interaction. This anisotropy can be reversibly controlled by the OOC, then reorient the easy-axis of the superlattices. Additionally, it enlarges the MAE of the films via the cooperation with a robust orbital hybridization between the Ir and Mn atoms. Our results indicate that the tricolor superlattices consisting of 3d and 5d oxides provide a powerful platform to study the MA and develop oxide-based spintronic devices.

preprint2021arXiv

Learning Emergent Discrete Message Communication for Cooperative Reinforcement Learning

Communication is a important factor that enables agents work cooperatively in multi-agent reinforcement learning (MARL). Most previous work uses continuous message communication whose high representational capacity comes at the expense of interpretability. Allowing agents to learn their own discrete message communication protocol emerged from a variety of domains can increase the interpretability for human designers and other agents.This paper proposes a method to generate discrete messages analogous to human languages, and achieve communication by a broadcast-and-listen mechanism based on self-attention. We show that discrete message communication has performance comparable to continuous message communication but with much a much smaller vocabulary size.Furthermore, we propose an approach that allows humans to interactively send discrete messages to agents.

preprint2021arXiv

Searching for Fast Model Families on Datacenter Accelerators

Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high accuracy and fast convolutional architecture families. However, as neither NAS nor model scaling considers sufficient hardware architecture details, they do not take full advantage of the emerging datacenter (DC) accelerators. In this paper, we search for fast and accurate CNN model families for efficient inference on DC accelerators. We first analyze DC accelerators and find that existing CNNs suffer from insufficient operational intensity, parallelism, and execution efficiency. These insights let us create a DC-accelerator-optimized search space, with space-to-depth, space-to-batch, hybrid fused convolution structures with vanilla and depthwise convolutions, and block-wise activation functions. On top of our DC accelerator optimized neural architecture search space, we further propose a latency-aware compound scaling (LACS), the first multi-objective compound scaling method optimizing both accuracy and latency. Our LACS discovers that network depth should grow much faster than image size and network width, which is quite different from previous compound scaling results. With the new search space and LACS, our search and scaling on datacenter accelerators results in a new model series named EfficientNet-X. EfficientNet-X is up to more than 2X faster than EfficientNet (a model series with state-of-the-art trade-off on FLOPs and accuracy) on TPUv3 and GPUv100, with comparable accuracy. EfficientNet-X is also up to 7X faster than recent RegNet and ResNeSt on TPUv3 and GPUv100.

preprint2020arXiv

A Survey on Causal Inference

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.

preprint2020arXiv

Analysis of Fleet Management and Network Design for On-Demand Urban Air Mobility Operations

A significant challenge in estimating operational feasibility of Urban Air Mobility (UAM) missions lies in understanding how choices in design impact the performance of a complex system-of-systems. This work examines the ability of the UAM ecosystem and the operations within it to meet a variety of demand profiles that may emerge in the coming years. We perform a set of simulation driven feasibility and scalability analyses based on UAM operational models with the goal of estimating capacity and throughput for a given set of parameters that represent an operational UAM ecosystem. UAM ecosystem design guidelines, vehicle constraints, and effective operational policies can be drawn from our analysis. Results show that, while critical for enabling UAM, the performance of the UAM ecosystem is robust to variations in ground infrastructure and fleet design decisions, while being sensitive to decisions for fleet and traffic management policies. We show that so long as the ecosystem design parameters for ground infrastructure and fleet design fall within a sensible range, the performance of the UAM ecosystem is affected by the policies used to manage the UAM traffic.

preprint2020arXiv

Cross-scale Attention Model for Acoustic Event Classification

A major advantage of a deep convolutional neural network (CNN) is that the focused receptive field size is increased by stacking multiple convolutional layers. Accordingly, the model can explore the long-range dependency of features from the top layers. However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation. This limitation is especially evident in the acoustic event classification (AEC) task, where both short- and long-duration events are involved in an audio clip and needed to be classified. In this paper, we propose a cross-scale attention (CSA) model, which explicitly integrates features from different scales to form the final representation. Moreover, we propose the adoption of the attention mechanism to specify the weights of local and global features based on the spatial and temporal characteristics of acoustic events. Using mathematic formulations, we further reveal that the proposed CSA model can be regarded as a weighted residual CNN (ResCNN) model when the ResCNN is used as a backbone model. We tested the proposed model on two AEC datasets: one is an urban AEC task, and the other is an AEC task in smart car environments. Experimental results show that the proposed CSA model can effectively improve the performance of current state-of-the-art deep learning algorithms.

preprint2020arXiv

Effect of isotope disorder on the Raman spectra of cubic boron arsenide

Boron arsenide (c-BAs) is at the forefront of research on ultrahigh thermal conductivity materials. We present a Raman scattering study of isotopically tailored cubic boron arsenide single crystals for 11 isotopic compositions spanning the range from nearly pure c-$^{10}$BAs to nearly pure c-$^{11}$BAs. Our results provide insights on the effects of strong mass disorder on optical phonons and the appearance of two-mode behavior in the Raman spectra of mixed crystals. Strong isotope disorder also relaxes the one-phonon Raman selection rules, resulting in disorder-activated Raman scattering by acoustic phonons.

preprint2020arXiv

Learning Robust Data Representation: A Knowledge Flow Perspective

It is always demanding to learn robust visual representation for various learning problems; however, this learning and maintenance process usually suffers from noise, incompleteness or knowledge domain mismatch. Thus, robust representation learning by removing noisy features or samples, complementing incomplete data, and mitigating the distribution difference becomes the key. Along this line of research, low-rank modeling has been widely-applied to solving representation learning challenges. This survey covers the topic from a knowledge flow perspective in terms of: (1) robust knowledge recovery, (2) robust knowledge transfer, and (3) robust knowledge fusion, centered around several major applications. First of all, we deliver a unified formulation for robust knowledge discovery given single dataset. Second, we discuss robust knowledge transfer and fusion given multiple datasets with different knowledge flows, followed by practical challenges, model variations, and remarks. Finally, we highlight future research of robust knowledge discovery for incomplete, unbalance, large-scale data analysis. This would benefit AI community from literature review to future direction.

preprint2020arXiv

Non-Abelian Aharonov-Bohm Caging in Photonic Lattices

Aharonov-Bohm (AB) caging is the localization effect in translational-invariant lattices due to destructive interference induced by penetrated magnetic fields. While current research focuses mainly on the case of Abelian AB caging, here we go beyond and develop the non-Abelian AB caging concept by considering the particle localization in a 1D multi-component rhombic lattice with non-Abelian background gauge field. In contrast to its Abelian counterpart, the non-Abelian AB cage depends on both the form of the nilpotent interference matrix and the initial state of the lattice. This phenomena is the consequence of the non-Abelian nature of the gauge potential and thus has no Abelian analog. We further propose a circuit quantum electrodynamics realization of the proposed physics, in which the required non-Abelian gauge field can be synthesized by the parametric conversion method, and the non-Abelian AB caging can be unambiguously demonstrated through the pumping and the steady-state measurements of only a few sites on the lattice. Requiring only currently available technique, our proposal can be readily tested in experiment and may pave a new route towards the investigation of exotic photonic quantum fluids.

preprint2020arXiv

Novel polymorphic phase of BaCu2As2: impact of flux for new phase formation in crystal growth

In this work, we have thoroughly studied the effects of flux composition and temperature on the crystal growth of the BaCu2As2 compound. While Pb and CuAs self-flux produce the well-known α-phase ThCr2Si2-type structure (Z=2), a new polymorphic phase of BaCu2As2 (\b{eta} phase) with a much larger c lattice parameter (Z=10), which could be considered an intergrowth of the ThCr2Si2- and CaBe2Ge2-type structures, has been discovered via Sn flux growth. We have characterized this structure through single-crystal X-ray diffraction, transmission electron microscopy (TEM), and scanning transmission electron microscopy (STEM) studies. Furthermore, we compare this new polymorphic intergrowth structure with the α-phase BaCu2As2 (ThCr2Si2 type with Z=2) and the \b{eta}-phase BaCu2Sb2 (intergrowth of ThCr2Si2 and CaBe2Ge2 types with Z=6), both with the same space group I4/mmm. Electrical transport studies reveal p-type carriers and magnetoresistivity up to 22% at 5 K and under a magnetic field of 7 T. Our work suggests a new route for the discovery of new polymorphic structures through flux and temperature control during material synthesis.

preprint2020arXiv

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

3D vehicle detection based on point cloud is a challenging task in real-world applications such as autonomous driving. Despite significant progress has been made, we observe two aspects to be further improved. First, the semantic context information in LiDAR is seldom explored in previous works, which may help identify ambiguous vehicles. Second, the distribution of point cloud on vehicles varies continuously with increasing depths, which may not be well modeled by a single model. In this work, we propose a unified model SegVoxelNet to address the above two problems. A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view. Suspicious regions could be highlighted while noisy regions are suppressed by this module. To better deal with vehicles at different depths, a novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range. Extensive experiments on the KITTI dataset show that the proposed method outperforms the state-of-the-art alternatives in both accuracy and efficiency with point cloud as input only.

preprint2020arXiv

Stereotype-Free Classification of Fictitious Faces

Equal Opportunity and Fairness are receiving increasing attention in artificial intelligence. Stereotyping is another source of discrimination, which yet has been unstudied in literature. GAN-made faces would be exposed to such discrimination, if they are classified by human perception. It is possible to eliminate the human impact on fictitious faces classification task by the use of statistical approaches. We present a novel approach through penalized regression to label stereotype-free GAN-generated synthetic unlabeled images. The proposed approach aids labeling new data (fictitious output images) by minimizing a penalized version of the least squares cost function between realistic pictures and target pictures.

preprint2020arXiv

Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint privacy protection do not provide either a meaningful privacy-utility trade-off or a formal and rigorous definition of privacy. In this study, we design a novel and rigorous privacy metric for voiceprint privacy, which is referred to as voice-indistinguishability, by extending differential privacy. We also propose mechanisms and frameworks for privacy-preserving speech data release satisfying voice-indistinguishability. Experiments on public datasets verify the effectiveness and efficiency of the proposed methods.