Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
36works
0followers
25topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

36 published item(s)

preprint2026arXiv

Generalization Bounds for Transformer Channel Decoders

Transformer channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical performance in channel decoding, yet their generalization behavior remains theoretically unclear. This paper studies the generalization performance of ECCT from a learning-theoretic perspective. By establishing a connection between multiplicative noise estimation errors and bit-error-rate (BER), we derive an upper bound on the generalization gap via bit-wise Rademacher complexity. The resulting bound characterizes the dependence on code length, model parameters, and training set size, and applies to both single-layer and multi-layer ECCTs. We further show that parity-check-based masked attention induces sparsity that reduces the covering number, leading to a tighter generalization bound. To the best of our knowledge, this work provides the first theoretical generalization guarantees for this class of decoders.

preprint2026arXiv

Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity. Existing methods such as Self-Route adopt failure-driven fallback from RAG to LC, but remain passive, inefficient, and hard to interpret. We propose Pre-Route, a proactive routing framework that performs structured reasoning before answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, producing explainable and cost-efficient routing decisions. Our study shows three key findings: (i) LLMs possess latent routing ability that can be reliably elicited with guidelines, allowing single-sample performance to approach that of multi-sample (Best-of-N) results; (ii) linear probes reveal that structured prompts sharpen the separability of the "optimal routing dimension" in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment. Experiments on LaRA (in-domain) and LongBench-v2 (OOD) confirm that Pre-Route outperforms Always-RAG, Always-LC, and Self-Route baselines, achieving superior overall cost-effectiveness.

preprint2026arXiv

STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading

In financial trading, factor models are widely used to price assets and capture excess returns from mispricing. Recently, we have witnessed the rise of variational autoencoder-based latent factor models, which learn latent factors self-adaptively. While these models focus on modeling overall market conditions, they often fail to effectively capture the temporal patterns of individual stocks. Additionally, representing multiple factors as single values simplifies the model but limits its ability to capture complex relationships and dependencies. As a result, the learned factors are of low quality and lack diversity, reducing their effectiveness and robustness across different trading periods. To address these issues, we propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders, named STORM, which extracts features of stocks from temporal and spatial perspectives, then fuses and aligns these features at the fine-grained and semantic level, and represents the factors as multi-dimensional embeddings. The discrete codebooks cluster similar factor embeddings, ensuring orthogonality and diversity, which helps distinguish between different factors and enables factor selection in financial trading. To show the performance of the proposed factor model, we apply it to two downstream experiments: portfolio management on two stock datasets and individual trading tasks on six specific stocks. The extensive experiments demonstrate STORM's flexibility in adapting to downstream tasks and superior performance over baseline models.

preprint2026arXiv

WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning

Large Language Model(LLM)-based agents have shown strong capabilities in web information seeking, with reinforcement learning (RL) becoming a key optimization paradigm. However, planning remains a bottleneck, as existing methods struggle with long-horizon strategies. Our analysis reveals a critical phenomenon, plan anchor, where the first reasoning step disproportionately impacts downstream behavior in long-horizon web reasoning tasks. Current RL algorithms, fail to account for this by uniformly distributing rewards across the trajectory. To address this, we propose Anchor-GRPO, a two-stage RL framework that decouples planning and execution. In Stage 1, the agent optimizes its first-step planning using fine-grained rubrics derived from self-play experiences and human calibration. In Stage 2, execution is aligned with the initial plan through sparse rewards, ensuring stable and efficient tool usage. We evaluate Anchor-GRPO on four benchmarks: BrowseComp, BrowseComp-Zh, GAIA, and XBench-DeepSearch. Across models from 3B to 30B, Anchor-GRPO outperforms baseline GRPO and First-step GRPO, improving task success and tool efficiency. Notably, WebAnchor-30B achieves 46.0% pass@1 on BrowseComp and 76.4% on GAIA. Anchor-GRPO also demonstrates strong scalability, getting higher accuracy as model size and context length increase.

preprint2025arXiv

AODDiff: Probabilistic Reconstruction of Aerosol Optical Depth via Diffusion-based Bayesian Inference

High-quality reconstruction of Aerosol Optical Depth (AOD) fields is critical for Atmosphere monitoring, yet current models remain constrained by the scarcity of complete training data and a lack of uncertainty quantification.To address these limitations, we propose AODDiff, a probabilistic reconstruction framework based on diffusion-based Bayesian inference. By leveraging the learned spatiotemporal probability distribution of the AOD field as a generative prior, this framework can be flexibly adapted to various reconstruction tasks without requiring task-specific retraining. We first introduce a corruption-aware training strategy to learns a spatiotemporal AOD prior solely from naturally incomplete data. Subsequently, we employ a decoupled annealing posterior sampling strategy that enables the more effective and integration of heterogeneous observations as constraints to guide the generation process. We validate the proposed framework through extensive experiments on Reanalysis data. Results across downscaling and inpainting tasks confirm the efficacy and robustness of AODDiff, specifically demonstrating its advantage in maintaining high spatial spectral fidelity. Furthermore, as a generative model, AODDiff inherently enables uncertainty quantification via multiple sampling, offering critical confidence metrics for downstream applications.

preprint2024arXiv

WFTNet: Exploiting Global and Local Periodicity in Long-term Time Series Forecasting

Recent CNN and Transformer-based models tried to utilize frequency and periodicity information for long-term time series forecasting. However, most existing work is based on Fourier transform, which cannot capture fine-grained and local frequency structure. In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting. WFTNet utilizes both Fourier and wavelet transforms to extract comprehensive temporal-frequency information from the signal, where Fourier transform captures the global periodic patterns and wavelet transform captures the local ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to adaptively balance the importance of global and local frequency patterns. Extensive experiments on various time series datasets show that WFTNet consistently outperforms other state-of-the-art baseline. Code is available at https://github.com/Hank0626/WFTNet.

preprint2023arXiv

Non-Equilibrium Spark Plasma Reactive Doping Enables Highly Adjustable Metal to Insulator Transitions and Improved Mechanical Stability for VO2

Although vanadium dioxide (VO2) exhibits the most abrupt metal to insulator transition (MIT) properties near room-temperature, the present regulation of their MIT functionalities is insufficient owing to the high complexity and susception associated with V4+. Herein, we demonstrate a spark plasma assisted reactive sintering (SPARS) approach to simultaneously achieve in situ doping and sintering of VO2 within largely short period (~10 minutes). This enables high convenience and flexibility in regulating the electronic structure of VO2 via dopant elements covering Ti, W, Nb, Mo, Cr and Fe, leading to a wide adjustment in their metal to insulator transition temperature (TMIT) and basic resistivity. Furthermore, the mechanical strengths of the doped-VO2 were meanwhile largely improved via the compositing effect of high melting-point dopant oxide. The high adjustability in MIT properties and improved mechanical properties further paves the way towards practical applications of VO2 in power electronics, thermochromism and infrared camouflage.

preprint2022arXiv

A Benchmark and Comprehensive Survey on Knowledge Graph Entity Alignment via Representation Learning

In the last few years, the interest in knowledge bases has grown exponentially in both the research community and the industry due to their essential role in AI applications. Entity alignment is an important task for enriching knowledge bases. This paper provides a comprehensive tutorial-type survey on representative entity alignment techniques that use the new approach of representation learning. We present a framework for capturing the key characteristics of these techniques, propose two datasets to address the limitation of existing benchmark datasets, and conduct extensive experiments using the proposed datasets. The framework gives a clear picture of how the techniques work. The experiments yield important results about the empirical performance of the techniques and how various factors affect the performance. One important observation not stressed by previous work is that techniques making good use of attribute triples and relation predicates as features stand out as winners.

preprint2022arXiv

A Large-Scale Chinese Short-Text Conversation Dataset

The advancements of neural dialogue generation models show promising results on modeling short-text conversations. However, training such models usually needs a large-scale high-quality dialogue corpus, which is hard to access. In this paper, we present a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8million dialogues) and a large version (12.0 million dialogues). The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built based on a set of rules and a classifier that is trained on manually annotated 110K dialogue pairs. We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively. The cleaned dataset and the pre-training models will facilitate the research of short-text conversation modeling. All the models and datasets are available at https://github.com/thu-coai/CDial-GPT.

preprint2022arXiv

Adaptive Frequency Learning in Two-branch Face Forgery Detection

Face forgery has attracted increasing attention in recent applications of computer vision. Existing detection techniques using the two-branch framework benefit a lot from a frequency perspective, yet are restricted by their fixed frequency decomposition and transform. In this paper, we propose to Adaptively learn Frequency information in the two-branch Detection framework, dubbed AFD. To be specific, we automatically learn decomposition in the frequency domain by introducing heterogeneity constraints, and propose an attention-based module to adaptively incorporate frequency features into spatial clues. Then we liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers. Extensive experiments show that AFD generally outperforms.

preprint2022arXiv

Backdoor Learning: A Survey

Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if the hidden backdoor is activated by attacker-specified triggers. This threat could happen when the training process is not fully controlled, such as training on third-party datasets or adopting third-party models, which poses a new and realistic threat. Although backdoor learning is an emerging and rapidly growing research area, its systematic review, however, remains blank. In this paper, we present the first comprehensive survey of this realm. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based backdoor attacks. Besides, we also analyze the relation between backdoor attacks and relevant fields ($i.e.,$ adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline certain future research directions relying upon reviewed works. A curated list of backdoor-related resources is also available at \url{https://github.com/THUYimingLi/backdoor-learning-resources}.

preprint2022arXiv

CausPref: Causal Preference Learning for Out-of-Distribution Recommendation

In spite of the tremendous development of recommender system owing to the progressive capability of machine learning recently, the current recommender system is still vulnerable to the distribution shift of users and items in realistic scenarios, leading to the sharp decline of performance in testing environments. It is even more severe in many common applications where only the implicit feedback from sparse data is available. Hence, it is crucial to promote the performance stability of recommendation method in different environments. In this work, we first make a thorough analysis of implicit recommendation problem from the viewpoint of out-of-distribution (OOD) generalization. Then under the guidance of our theoretical analysis, we propose to incorporate the recommendation-specific DAG learner into a novel causal preference-based recommendation framework named CausPref, mainly consisting of causal learning of invariant user preference and anti-preference negative sampling to deal with implicit feedback. Extensive experimental results from real-world datasets clearly demonstrate that our approach surpasses the benchmark models significantly under types of out-of-distribution settings, and show its impressive interpretability.

preprint2022arXiv

DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition

The MultiCoNER shared task aims at detecting semantically ambiguous and complex named entities in short and low-context settings for multiple languages. The lack of contexts makes the recognition of ambiguous named entities challenging. To alleviate this issue, our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia to provide related context information to the named entity recognition (NER) model. Given an input sentence, our system effectively retrieves related contexts from the knowledge base. The original input sentences are then augmented with such context information, allowing significantly better contextualized token representations to be captured. Our system wins 10 out of 13 tracks in the MultiCoNER shared task.

preprint2022arXiv

Few-Shot Backdoor Attacks on Visual Object Tracking

Visual object tracking (VOT) has been widely adopted in mission-critical applications, such as autonomous driving and intelligent surveillance systems. In current practice, third-party resources such as datasets, backbone networks, and training platforms are frequently used to train high-performance VOT models. Whilst these resources bring certain convenience, they also introduce new security threats into VOT models. In this paper, we reveal such a threat where an adversary can easily implant hidden backdoors into VOT models by tempering with the training process. Specifically, we propose a simple yet effective few-shot backdoor attack (FSBA) that optimizes two losses alternately: 1) a \emph{feature loss} defined in the hidden feature space, and 2) the standard \emph{tracking loss}. We show that, once the backdoor is embedded into the target model by our FSBA, it can trick the model to lose track of specific objects even when the \emph{trigger} only appears in one or a few frames. We examine our attack in both digital and physical-world settings and show that it can significantly degrade the performance of state-of-the-art VOT trackers. We also show that our attack is resistant to potential defenses, highlighting the vulnerability of VOT models to potential backdoor attacks.

preprint2022arXiv

Improving Adversarial Robustness via Channel-wise Activation Suppressing

The study of adversarial examples and their activation has attracted significant attention for secure and robust learning with deep neural networks (DNNs). Different from existing works, in this paper, we highlight two new characteristics of adversarial examples from the channel-wise activation perspective: 1) the activation magnitudes of adversarial examples are higher than that of natural examples; and 2) the channels are activated more uniformly by adversarial examples than natural examples. We find that the state-of-the-art defense adversarial training has addressed the first issue of high activation magnitudes via training on adversarial examples, while the second issue of uniform activation remains. This motivates us to suppress redundant activation from being activated by adversarial perturbations via a Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train a model that inherently suppresses adversarial activation, and can be easily applied to existing defense methods to further improve their robustness. Our work provides a simple but generic training strategy for robustifying the intermediate layer activation of DNNs.

preprint2021arXiv

$t$-$k$-means: A Robust and Stable $k$-means Variant

$k$-means algorithm is one of the most classical clustering methods, which has been widely and successfully used in signal processing. However, due to the thin-tailed property of the Gaussian distribution, $k$-means algorithm suffers from relatively poor performance on the dataset containing heavy-tailed data or outliers. Besides, standard $k$-means algorithm also has relatively weak stability, $i.e.$ its results have a large variance, which reduces its credibility. In this paper, we propose a robust and stable $k$-means variant, dubbed the $t$-$k$-means, as well as its fast version to alleviate those problems. Theoretically, we derive the $t$-$k$-means and analyze its robustness and stability from the aspect of the loss function and the expression of the clustering center, respectively. Extensive experiments are also conducted, which verify the effectiveness and efficiency of the proposed method. The code for reproducing main results is available at \url{https://github.com/THUYimingLi/t-k-means}.

preprint2021arXiv

Backdoor Attack against Speaker Verification

Speaker verification has been widely and successfully adopted in many mission-critical areas for user identification. The training of speaker verification requires a large amount of data, therefore users usually need to adopt third-party data ($e.g.$, data from the Internet or third-party data company). This raises the question of whether adopting untrusted third-party data can pose a security threat. In this paper, we demonstrate that it is possible to inject the hidden backdoor for infecting speaker verification models by poisoning the training data. Specifically, we design a clustering-based attack scheme where poisoned samples from different clusters will contain different triggers ($i.e.$, pre-defined utterances), based on our understanding of verification tasks. The infected models behave normally on benign samples, while attacker-specified unenrolled triggers will successfully pass the verification even if the attacker has no information about the enrolled speaker. We also demonstrate that existing backdoor attacks cannot be directly adopted in attacking speaker verification. Our approach not only provides a new perspective for designing novel attacks, but also serves as a strong baseline for improving the robustness of verification methods. The code for reproducing main results is available at \url{https://github.com/zhaitongqing233/Backdoor-attack-against-speaker-verification}.

preprint2021arXiv

Hydrogen induced electronic transition within correlated perovskite nickelates with heavy rare-earth composition

Although discovery in hydrogen induced electronic transition within perovskite family of rare-earth nickelate (ReNiO3) opens up a new paradigm in exploring both the new materials functionality and device applications, the existing research stays at ReNiO3 with light rare-earth compositions. To further extend the cognition towards heavier rare-earth, herein we demonstrate the hydrogen induced electronic transitions for quasi-single crystalline ReNiO3/LaAlO3 (001) heterostructures, covering a large variety of the rare-earth composition from Nd to Er. The hydrogen induced elevations in the resistivity of ReNiO3 (RH/R0) show an unexpected non-monotonic tendency with the atomic number of the rare-earth composition, e.g., firstly increase from Nd to Dy and afterwards decreases from Dy to Er. Although ReNiO3 with heavy rare-earth composition (e.g. DyNiO3) exhibits large RH/R0 up to 107, their hydrogen induced electronic transition is not reversible. Further probing the electronic structures via near edge X-ray absorption fine structure analysis clearly demonstrates the respective transition in electronic structures of ReNiO3 from Ni3+ based electron itinerant orbital configurations towards the Ni2+ based electron localized state. Balancing the hydrogen induced transition reversibility with the abruption in the variations of material resistivity, we emphasize that the ReNiO3 with middle rare-earth compositions (e.g. Sm) to be most suitable that caters for the potential applications in correlated electronic devices.

preprint2021arXiv

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models. For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges. Inspired by the recent progress in network architecture search, it is interesting to explore the possibility of discovering new loss function formulations via directly searching the primitive operation combinations. So that the learned losses not only fit for diverse object detection challenges to alleviate huge human efforts, but also have better alignment with evaluation metric and good mathematical convergence property. Beyond the previous auto-loss works on face recognition and image classification, our work makes the first attempt to discover new loss functions for the challenging object detection from primitive operation levels. We propose an effective convergence-simulation driven evolutionary search algorithm, called CSE-Autoloss, for speeding up the search progress by regularizing the mathematical rationality of loss candidates via convergence property verification and model optimization simulation. CSE-Autoloss involves the search space that cover a wide range of the possible variants of existing losses and discovers best-searched loss function combination within a short time (around 1.5 wall-clock days). We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of searched losses across diverse architectures and datasets. Our experiments show that the best-discovered loss function combinations outperform default combinations by 1.1% and 0.8% in terms of mAP for two-stage and one-stage detectors on COCO respectively. Our searched losses are available at https://github.com/PerdonLiu/CSE-Autoloss.

preprint2021arXiv

Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities

The optimization problems associated with training generative adversarial neural networks can be largely reduced to certain {\em non-monotone} variational inequality problems (VIPs), whereas existing convergence results are mostly based on monotone or strongly monotone assumptions. In this paper, we propose {\em optimistic dual extrapolation (OptDE)}, a method that only performs {\em one} gradient evaluation per iteration. We show that OptDE is provably convergent to {\em a strong solution} under different coherent non-monotone assumptions. In particular, when a {\em weak solution} exists, the convergence rate of our method is $O(1/{ε^{2}})$, which matches the best existing result of the methods with two gradient evaluations. Further, when a {\em $σ$-weak solution} exists, the convergence guarantee is improved to the linear rate $O(\log\frac{1}ε)$. Along the way--as a byproduct of our inquiries into non-monotone variational inequalities--we provide the near-optimal $O\big(\frac{1}ε\log \frac{1}ε\big)$ convergence guarantee in terms of restricted strong merit function for monotone variational inequalities. We also show how our results can be naturally generalized to the stochastic setting, and obtain corresponding new convergence results. Taken together, our results contribute to the broad landscape of variational inequality--both non-monotone and monotone alike--by providing a novel and more practical algorithm with the state-of-the-art convergence guarantees.

preprint2021arXiv

Rethinking the Trigger of Backdoor Attack

Backdoor attack intends to inject hidden backdoor into the deep neural networks (DNNs), such that the prediction of the infected model will be maliciously changed if the hidden backdoor is activated by the attacker-defined trigger, while it performs well on benign samples. Currently, most of existing backdoor attacks adopted the setting of \emph{static} trigger, $i.e.,$ triggers across the training and testing images follow the same appearance and are located in the same area. In this paper, we revisit this attack paradigm by analyzing the characteristics of the static trigger. We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training. We further explore how to utilize this property for backdoor defense, and discuss how to alleviate such vulnerability of existing attacks.

preprint2021arXiv

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In both general convex and strongly convex settings, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ number of stochastic gradient evaluations which improves the best-known result $O(n\log n)$, where $n$ is the number of samples. Meanwhile, VRADA matches the lower bound of the general convex setting up to a $\log\log n$ factor and matches the lower bounds in both regimes $n\le Θ(κ)$ and $n\gg κ$ of the strongly convex setting, where $κ$ denotes the condition number. Besides improving the best-known results and matching all the above lower bounds simultaneously, VRADA has more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. The underlying novel approaches such as the novel initialization strategy in VRADA may be of independent interest. Through experiments on real datasets, we show the good performance of VRADA over existing methods for large-scale machine learning problems.

preprint2021arXiv

Visual Privacy Protection via Mapping Distortion

Privacy protection is an important research area, which is especially critical in this big data era. To a large extent, the privacy of visual classification data is mainly in the mapping between the image and its corresponding label, since this relation provides a great amount of information and can be used in other scenarios. In this paper, we propose the mapping distortion based protection (MDP) and its augmentation-based extension (AugMDP) to protect the data privacy by modifying the original dataset. In the modified dataset generated by MDP, the image and its label are not consistent ($e.g.$, a cat-like image is labeled as the dog), whereas the DNNs trained on it can still achieve good performance on benign testing set. As such, this method can protect privacy when the dataset is leaked. Extensive experiments are conducted, which verify the effectiveness and feasibility of our method. The code for reproducing main results is available at \url{https://github.com/PerdonLiu/Visual-Privacy-Protection-via-Mapping-Distortion}.

preprint2020arXiv

AC-frequency switchable correlated transports in rare-earth perovskite nickelates

Whilst electron correlations were previously recognized to trigger beyond conventional direct current (DC) electronic transportations (e.g. metal-to-insulator transitions, bad metal, thermistors), their respective influences to the alternation current (AC) transport are largely overlooked. Herein, we demonstrate active regulations in the electronic functionalities of d-band correlated rare-earth nickelate (ReNiO3) thin films, by simply utilizing their electronic responses to AC-frequencies (fAC). Assisted by temperature dependent near edge X-ray absorption fine structure analysis, we discovered positive temperature dependences in Coulomb viscosity of ReNiO3 that moderates their AC impedance. Distinguished crosslinking among R(Real)-fAC measured in nearby temperatures is observed that differs to conventional oxides. It enables active adjustability in correlated transports of ReNiO3, among NTCR-, TDelta- and PTCR- thermistors, via fAC from the electronic perspective without varying materials or device structures. The TDelta-fAC relationship can be further widely adjusted via Re composition and interfacial strains. The AC-frequency sensitivity discovered in ReNiO3 brings in a new freedom to regulating and switching the device working states beyond the present semiconductor technologies. It opens a new paradigm for enriching novel electronic applications catering automatic transmission or artificial intelligence in sensing temperatures and frequencies.

preprint2020arXiv

An ACE/CRIS-observation-based Galactic Cosmic Rays heavy nuclei spectra model II

An observation-based Galactic Cosmic Ray (GCR) spectral model for heavy nuclei is developed. Zhao and Qin (J. Geophys. Res. Space Phys.118, 1837(2013)) proposed an empirical elemental GCR spectra model for nuclear charge 5-28 over the energy range from 30 to 500 MeV/nuc, which is proved to be successful in predicting yearly averaged GCR heavy nuclei spectra.Based on the latest highly statistically precise measurements from ACE/CRIS,a further elemental GCR model with monthly averaged spectra is presented. The model can reproduce the past and predict the futureGCR intensity monthly by correlating model parameters with thecontinuous sunspot number (SSN) record. The effects of solar activity on GCR modulation are considered separately for odd and even solar cycles. Compared with other comprehensive GCR models, our modeling results are satisfyingly consistent with the GCR spectral measurements from ACE/SIS and IMP-8, and have comparable prediction accuracy as the Badhwar & O'Neill 2014 model.A detailed error analysis is also provided.Finally, the GCR carbon and iron nuclei fluxes for the subsequent two solar cycles (SC 25 and 26) are predicted and they show a potential trend in reduced flux amplitude, which is suspected to be relevant to possible weak solar cycles.

preprint2020arXiv

Breaking the $O(1/ε)$ Optimal Rate for a Class of Minimax Problems

It is known that for convex optimization $\min_{\mathbf{w}\in\mathcal{W}}f(\mathbf{w})$, the best possible rate of first order accelerated methods is $O(1/\sqrtε)$. However, for the bilinear minimax problem: $\min_{\mathbf{w}\in\mathcal{W}}\max_{\mathbf{v}\in\mathcal{V}}$ $f(\mathbf{w})$ $+\langle\mathbf{w}, \boldsymbol{A}\mathbf{v}\rangle$ $-h(\mathbf{v})$ where both $f(\mathbf{w})$ and $h(\mathbf{v})$ are convex, the best known rate of first order methods slows down to $O(1/ε)$. It is not known whether one can achieve the accelerated rate $O(1/\sqrtε)$ for the bilinear minimax problem without assuming $f(\mathbf{w})$ and $h(\mathbf{v})$ being strongly convex. In this paper, we fill this theoretical gap by proposing a bilinear accelerated extragradient (BAXG) method. We show that when $\mathcal{W}=\mathbb{R}^d$, $f(\mathbf{w})$ and $h(\mathbf{v})$ are convex and smooth, and $\boldsymbol{A}$ has full column rank, then the BAXG method achieves an accelerated rate $O(1/\sqrtε\log \frac{1}ε)$, within a logarithmic factor to the likely optimal rate $O(1/\sqrtε)$. As result, a large class of bilinear convex concave minimax problems, including a few problems of practical importance, can be solved much faster than previously known methods.

preprint2020arXiv

Detecting Suspected Epidemic Cases Using Trajectory Big Data

Emerging infectious diseases are existential threats to human health and global stability. The recent outbreaks of the novel coronavirus COVID-19 have rapidly formed a global pandemic, causing hundreds of thousands of infections and huge economic loss. The WHO declares that more precise measures to track, detect and isolate infected people are among the most effective means to quickly contain the outbreak. Based on trajectory provided by the big data and the mean field theory, we establish an aggregated risk mean field that contains information of all risk-spreading particles by proposing a spatio-temporal model named HiRES risk map. It has dynamic fine spatial resolution and high computation efficiency enabling fast update. We then propose an objective individual epidemic risk scoring model named HiRES-p based on HiRES risk maps, and use it to develop statistical inference and machine learning methods for detecting suspected epidemic-infected individuals. We conduct numerical experiments by applying the proposed methods to study the early outbreak of COVID-19 in China. Results show that the HiRES risk map has strong ability in capturing global trend and local variability of the epidemic risk, thus can be applied to monitor epidemic risk at country, province, city and community levels, as well as at specific high-risk locations such as hospital and station. HiRES-p score seems to be an effective measurement of personal epidemic risk. The accuracy of both detecting methods are above 90\% when the population infection rate is under 20\%, which indicates great application potential in epidemic risk prevention and control practice.

preprint2020arXiv

Does the price of strategic commodities respond to U.S. Partisan Conflict?

A noteworthy feature of U.S. politics in recent years is serious partisan conflict, which has led to intensifying polarization and exacerbating high policy uncertainty. The US is a significant player in oil and gold markets. Oil and gold also form the basis of important strategic reserves in the US. We investigate whether U.S. partisan conflict affects the returns and price volatility of oil and gold using a parametric test of Granger causality in quantiles. The empirical results suggest that U.S. partisan conflict has an effect on the returns of oil and gold, and the effects are concentrated at the tail of the conditional distribution of returns. More specifically, the partisan conflict mainly affects oil returns when the crude oil market is in a bearish state (lower quantiles). By contrast, partisan conflict matters for gold returns only when the gold market is in a bullish scenario (higher quantiles). In addition, for the volatility of oil and gold, the predictability of partisan conflict index virtually covers the entire distribution of volatility.

preprint2020arXiv

Multinomial Random Forest: Toward Consistency and Privacy-Preservation

Despite the impressive performance of random forests (RF), its theoretical properties have not been thoroughly understood. In this paper, we propose a novel RF framework, dubbed multinomial random forest (MRF), to analyze the \emph{consistency} and \emph{privacy-preservation}. Instead of deterministic greedy split rule or with simple randomness, the MRF adopts two impurity-based multinomial distributions to randomly select a split feature and a split value respectively. Theoretically, we prove the consistency of the proposed MRF and analyze its privacy-preservation within the framework of differential privacy. We also demonstrate with multiple datasets that its performance is on par with the standard RF. To the best of our knowledge, MRF is the first consistent RF variant that has comparable performance to the standard RF.

preprint2020arXiv

Perpendicularly magnetized YIG films with small Gilbert damping constant and anomalous spin transport properties

The Y3Fe5O12 (YIG) films with perpendicular magnetic anisotropy (PMA) have recently attracted a great deal of attention for spintronics applications. Here, we report the induced PMA in the ultrathin YIG films grown on (Gd2.6Ca0.4)(Ga4.1Mg0.25Zr0.65)O12 (SGGG) substrates by epitaxial strain without preprocessing. Reciprocal space mapping shows that the films are lattice-matched to the substrates without strain relaxation. Through ferromagnetic resonance and polarized neutron reflectometry measurements, we find that these YIG films have ultra-low Gilbert damping constant with a magnetic dead layer as thin as about 0.3 nm at the YIG/SGGG interfaces. Moreover, the transport behavior of the Pt/YIG/SGGG films reveals an enhancement of spin mixing conductance and a large non-monotonic magnetic field dependence of anomalous Hall effect as compared with the Pt/YIG/Gd3Ga5O12 (GGG) films. The non-monotonic anomalous Hall signal is extracted in the temperature range from 150 to 350 K, which has been ascribed to the possible non-collinear magnetic order at the Pt/YIG interface induced by uniaxial strain.

preprint2020arXiv

Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning

Interpretability and effectiveness are two essential and indispensable requirements for adopting machine learning methods in reality. In this paper, we propose a knowledge distillation based decision trees extension, dubbed rectified decision trees (ReDT), to explore the possibility of fulfilling those requirements simultaneously. Specifically, we extend the splitting criteria and the ending condition of the standard decision trees, which allows training with soft labels while preserving the deterministic splitting paths. We then train the ReDT based on the soft label distilled from a well-trained teacher model through a novel jackknife-based method. Accordingly, ReDT preserves the excellent interpretable nature of the decision trees while having a relatively good performance. The effectiveness of adopting soft labels instead of hard ones is also analyzed empirically and theoretically. Surprisingly, experiments indicate that the introduction of soft labels also reduces the model size compared with the standard decision trees from the aspect of the total nodes and rules, which is an unexpected gift from the `dark knowledge' distilled from the teacher model.

preprint2020arXiv

Rectified Decision Trees: Towards Interpretability, Compression and Empirical Soundness

How to obtain a model with good interpretability and performance has always been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge distillation based decision trees rectification with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree to propose a decision tree extension that allows the use of soft labels generated by a well-trained teacher model in training and prediction process. It is worth noting that for the acquisition of soft labels, we propose a new multiple cross-validation based method to reduce the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and even achieves fewer nodes than the decision tree in the aspect of compression while having relatively good performance. Besides, in contrast to traditional knowledge distillation, back propagation of the student model is not necessarily required in ReDT, which is an attempt of a new knowledge distillation approach. Extensive experiments are conducted, which demonstrates the superiority of ReDT in interpretability, compression, and empirical soundness.

preprint2020arXiv

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

Multilingual sequence labeling is a task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefit of a smaller model size, easier in online serving, and generalizability to low-resource languages. However, current multilingual models still underperform individual monolingual models significantly due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) to the unified multilingual model (student). We propose two novel KD methods based on structure-level information: (1) approximately minimizes the distance between the student's and the teachers' structure level probability distributions, (2) aggregates the structure-level knowledge to local distributions and minimizes the distance between two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.

preprint2020arXiv

Toward Adversarial Robustness via Semi-supervised Robust Training

Adversarial examples have been shown to be the severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT) through minimizing the adversarial risk $R_{adv}$, which encourages both the benign example $x$ and its adversarially perturbed neighborhoods within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, the robust training (RT), by jointly minimizing two separated risks ($R_{stand}$ and $R_{rob}$), which is with respect to the benign example and its neighborhoods respectively. The motivation is to explicitly and jointly enhance the accuracy and the adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has similar effect as AT. Intuitively, minimizing the standard risk enforces the benign example to be correctly predicted, and the robust risk minimization encourages the predictions of the neighbor examples to be consistent with the prediction of the benign example. Besides, since $R_{rob}$ is independent of the ground-truth label, RT is naturally extended to the semi-supervised mode ($i.e.$, SRT), to further enhance the adversarial robustness. Moreover, we extend the $\ell_{p}$-bounded neighborhood to a general case, which covers different types of perturbations, such as the pixel-wise ($i.e.$, $x + δ$) or the spatial perturbation ($i.e.$, $ AX + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method to state-of-the-art methods for defensing pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.

preprint2020arXiv

Unconventional charge-spin conversion in Weyl-semimetal WTe2

An outstanding feature of topological quantum materials is their novel spin topology in the electronic band structures with an expected large charge-to-spin conversion efficiency. Here, we report a charge current-induced spin polarization in the type-II Weyl semimetal candidate WTe2 and efficient spin injection and detection in a graphene channel up to room temperature. Contrary to the conventional spin Hall and Rashba-Edelstein effects, our measurements indicate an unconventional charge-to-spin conversion in WTe2, which is primarily forbidden by the crystal symmetry of the system. Such a large spin polarization can be possible in WTe2 due to a reduced crystal symmetry combined with its large spin Berry curvature, spin-orbit interaction with a novel spin-texture of the Fermi states. We demonstrate a robust and practical method for electrical creation and detection of such a spin polarization using both charge-to-spin conversion and its inverse phenomenon and utilized it for efficient spin injection and detection in a graphene channel up to room temperature. These findings open opportunities for utilizing topological Weyl materials as non-magnetic spin sources in allelectrical van der Waals spintronic circuits and for low-power and high-performance non-volatile spintronic technologies.

preprint2019arXiv

Observation of Spin Hall Effect in Weyl Semimetal WTe2 at Room Temperature

Discovery of topological Weyl semimetals has revealed the opportunities to realize several extraordinary physical phenomena in condensed matter physics. Specifically, these semimetals with strong spin-orbit coupling, broken inversion symmetry and novel spin texture are predicted to exhibit a large spin Hall effect that can efficiently convert the charge current to a spin current. Here we report the direct experimental observation of a large spin Hall and inverse spin Hall effects in Weyl semimetal WTe2 at room temperature obeying Onsager reciprocity relation. We demonstrate the detection of the pure spin current generated by spin Hall phenomenon in WTe2 by making van der Waals heterostructures with graphene, taking advantage of its long spin coherence length and spin transmission at the heterostructure interface. These experimental findings well supported by ab initio calculations show a large charge-spin conversion efficiency in WTe2; which can pave the way for utilization of spin-orbit induced phenomena in spintronic memory and logic circuit architectures.