Source author record

Hua Wu

Hua Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computation and Language, cond-mat.str-el, Artificial Intelligence, cond-mat.mtrl-sci, Machine Learning, Computer Vision, cond-mat.soft, quant-ph, Biomolecules, cond-mat, cond-mat.other, cond-mat.stat-mech, eess.AS, Information Retrieval, Molecular Networks, Neural and Evolutionary Computing, physics.chem-ph, Quantitative Methods, Sound

Catalog footprint

What is connected

67 works

19 topics

4 close collaborators

preprint 2026 arXiv

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

Language model families exhibit striking disparity in their capacity to benefit from reinforcement learning: under identical training, models like Qwen achieve substantial gains, while others like Llama yield limited improvements. Complementing data-centric approaches, we reveal that this disparity reflects a hidden structural property: distributional clarity in probability space. Through a three-stage analysis (from phenomenon to mechanism to interpretation), we uncover that RL-friendly models exhibit intra-class compactness and inter-class separation in their probability assignments to correct vs. incorrect responses. We quantify this clarity using the Silhouette Coefficient ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance; (2) low $S$ is associated with severe logic errors and reasoning instability. To confirm this property, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments across six mathematical benchmarks show consistent improvements across all model families, with gains up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-Friendliness.
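As a rough illustration of the metric named in this abstract, the sketch below computes the standard silhouette coefficient for two labeled groups of scalar values, standing in for the probabilities a model assigns to correct and incorrect responses. The function name, the use of absolute difference as the distance, and the sample numbers are assumptions made for illustration; the abstract does not specify how the paper constructs its feature space.

```python
from statistics import mean


def silhouette_score_1d(correct, incorrect):
    """Mean silhouette coefficient for two labeled groups of scalar values."""
    groups = [list(correct), list(incorrect)]
    if any(len(g) < 2 for g in groups):
        raise ValueError("each group needs at least two values")
    scores = []
    for own_idx, own in enumerate(groups):
        other = groups[1 - own_idx]
        for i, x in enumerate(own):
            # a: mean distance to the other members of the same group
            a = mean(abs(x - y) for j, y in enumerate(own) if j != i)
            # b: mean distance to the members of the other group
            b = mean(abs(x - y) for y in other)
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical probabilities: well-separated groups vs. overlapping groups.
clear = silhouette_score_1d([0.90, 0.88, 0.92], [0.10, 0.12, 0.08])
blurred = silhouette_score_1d([0.55, 0.40, 0.62], [0.50, 0.45, 0.58])
```

Values near 1 indicate compact, well-separated groups; values near 0 or below indicate overlap, which corresponds to the low-$S$ regime the abstract associates with reasoning instability.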

preprint 2026 arXiv

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

Extending the input modality of Large Language Models (LLMs) to the audio domain is essential for achieving comprehensive multimodal perception. However, it is well-known that acoustic information is intrinsically heterogeneous, entangling attributes such as speech, music, and environmental context. Existing research is limited to a dense, parameter-shared adapter to model these diverse patterns, which induces gradient conflict during optimization, as parameter updates required for distinct attributes contradict each other. To address this limitation, we introduce the MoE-Adapter, a sparse Mixture-of-Experts (MoE) architecture designed to decouple acoustic information. Specifically, it employs a dynamic gating mechanism that routes audio tokens to specialized experts capturing complementary feature subspaces while retaining shared experts for global context, thereby mitigating gradient conflicts and enabling fine-grained feature learning. Comprehensive experiments show that the MoE-Adapter achieves superior performance on both audio semantic and paralinguistic tasks, consistently outperforming dense linear baselines with comparable computational costs. Furthermore, we will release the related code and models to facilitate future research.
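The routing idea described in this abstract can be sketched generically as top-k gating over routed experts plus an always-active shared expert. This is a minimal sketch of that general pattern, not the authors' architecture: the function names, the linear experts, the softmax gate, and the toy weights are all assumptions for illustration.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a dense layer given as a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_adapter(token, gate_w, routed_experts, shared_expert, top_k=1):
    """Route one token to its top-k experts and add the shared expert output."""
    gate_probs = softmax(linear(gate_w, token))
    # Sparse routing: keep only the k experts with the highest gate probability.
    chosen = sorted(range(len(gate_probs)),
                    key=lambda i: gate_probs[i], reverse=True)[:top_k]
    norm = sum(gate_probs[i] for i in chosen)
    out = linear(shared_expert, token)  # the shared expert always runs
    for i in chosen:
        expert_out = linear(routed_experts[i], token)
        weight = gate_probs[i] / norm  # renormalize over the chosen experts
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, chosen


# Toy 2-dimensional setup with two routed experts (hypothetical weights).
gate_w = [[1.0, 0.0], [0.0, 1.0]]
routed_experts = [
    [[2.0, 0.0], [0.0, 2.0]],    # expert 0 doubles the input
    [[-1.0, 0.0], [0.0, -1.0]],  # expert 1 negates the input
]
shared_expert = [[1.0, 0.0], [0.0, 1.0]]  # identity
out, chosen = moe_adapter([1.0, 0.0], gate_w, routed_experts, shared_expert)
```

Because only the selected experts receive a given token, their parameters are updated only by the tokens routed to them, which is the intuition behind reducing conflicting updates compared with a single dense adapter.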

preprint 2026 arXiv

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale. In this work, we introduce VideoAR, the first large-scale Visual Autoregressive (VAR) framework for video generation that combines multi-scale next-frame prediction with autoregressive modeling. VideoAR disentangles spatial and temporal dependencies by integrating intra-frame VAR modeling with causal next-frame prediction, supported by a 3D multi-scale tokenizer that efficiently encodes spatio-temporal dynamics. To improve long-term consistency, we propose Multi-scale Temporal RoPE, Cross-Frame Error Correction, and Random Frame Mask, which collectively mitigate error propagation and stabilize temporal coherence. Our multi-stage pretraining pipeline progressively aligns spatial and temporal learning across increasing resolutions and durations. Empirically, VideoAR achieves new state-of-the-art results among autoregressive models, improving FVD on UCF-101 from 99.5 to 88.6 while reducing inference steps by over 10x, and reaching a VBench score of 81.74, competitive with diffusion-based models an order of magnitude larger. These results demonstrate that VideoAR narrows the performance gap between autoregressive and diffusion paradigms, offering a scalable, efficient, and temporally consistent foundation for future video generation research.

preprint 2023 arXiv

ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Task-agnostic knowledge distillation attempts to address the problem of deploying large pretrained language models in resource-constrained scenarios by compressing a large pretrained model, called the teacher, into a smaller one, called the student, such that the student can be directly finetuned on downstream tasks and retains comparable performance. However, we empirically find that there is a generalization gap between the student and the teacher in existing methods. In this work, we show that we can leverage multi-task learning in task-agnostic distillation to advance the generalization of the resulting student. In particular, we propose Multi-task Infused Task-agnostic Knowledge Distillation (MITKD). We first enhance the teacher by multi-task training it on multiple downstream tasks and then perform distillation to produce the student. Experimental results demonstrate that our method yields a student with much better generalization, significantly outperforms existing baselines, and establishes a new state-of-the-art result on in-domain, out-domain, and low-resource datasets in the setting of task-agnostic distillation. Moreover, our method even exceeds an 8x larger BERT$_{\text{Base}}$ on SQuAD and four GLUE tasks. In addition, by combining ERNIE 3.0, our method achieves state-of-the-art results on 10 Chinese datasets.

preprint 2023 arXiv

Universal Information Extraction as Unified Semantic Matching

The challenge of information extraction (IE) lies in the diversity of label schemas and the heterogeneity of structures. Traditional methods require task-specific model design and rely heavily on expensive supervision, making them difficult to generalize to new schemas. In this paper, we decouple IE into two basic abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework, which introduces three unified token linking operations to model the abilities of structuring and conceptualizing. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand. Empirical evaluation on 4 IE tasks shows that the proposed method achieves state-of-the-art performance in the supervised experiments and shows strong generalization ability in zero/few-shot transfer settings.

preprint 2022 arXiv

An Interpretability Evaluation Benchmark for Pre-trained Language Models

While pre-trained language models (LMs) have brought great improvements in many NLP tasks, there is increasing attention to exploring the capabilities of LMs and interpreting their predictions. However, existing works usually focus only on a certain capability with some downstream tasks. There is a lack of datasets for directly evaluating the masked word prediction performance and the interpretability of pre-trained LMs. To fill the gap, we propose a novel evaluation benchmark providing both English and Chinese annotated data. It tests LMs' abilities in multiple dimensions, i.e., grammar, semantics, knowledge, reasoning and computation. In addition, it provides carefully annotated token-level rationales that satisfy sufficiency and compactness. It contains perturbed instances for each original instance, so as to use the rationale consistency under perturbations as the metric for faithfulness, a perspective of interpretability. We conduct experiments on several widely-used pre-trained LMs. The results show that they perform very poorly on the dimensions of knowledge and computation, and their plausibility in all dimensions is far from satisfactory, especially when the rationale is short. In addition, the pre-trained LMs we evaluated are not robust on syntax-aware data. We will release this evaluation benchmark at http://xyz, and hope it can facilitate the research progress of pre-trained LMs.

preprint 2022 arXiv

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut, a simple regularization method that enforces consistency between the output distributions of the original and the cutoff sentence pairs. Without leveraging extra datasets via back-translation or integrating a large-scale pretrained model, Bi-SimCut achieves strong translation performance across five translation benchmarks (data sizes range from 160K to 20.2M): BLEU scores of 31.16 for en -> de and 38.37 for de -> en on the IWSLT14 dataset, 30.78 for en -> de and 35.15 for de -> en on the WMT14 dataset, and 27.17 for zh -> en on the WMT17 dataset. SimCut is not a new method, but a version of Cutoff (Shen et al., 2020) simplified and adapted for NMT, and it could be considered a perturbation-based method. Given the universality and simplicity of SimCut and Bi-SimCut, we believe they can serve as strong baselines for future NMT research.
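The consistency regularization described in this abstract can be illustrated with a small sketch: perturb the input by zeroing out token embeddings, then penalize the divergence between the model's output distribution on the original input and on the perturbed input. The choice of KL divergence, the weighting factor `alpha`, and all function names below are assumptions for illustration; the abstract does not state the exact form of the loss.

```python
import math
import random


def token_cutoff(embeddings, drop_prob, rng):
    """Zero out whole token embeddings, each with probability drop_prob."""
    return [
        [0.0] * len(vec) if rng.random() < drop_prob else list(vec)
        for vec in embeddings
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(ce_loss, p_original, p_cutoff, alpha=1.0):
    """Cross-entropy plus a weighted consistency term between the two outputs."""
    return ce_loss + alpha * kl_divergence(p_original, p_cutoff)


# Hypothetical values: two token embeddings and two next-token distributions.
rng = random.Random(0)
embeddings = [[0.2, -0.1], [0.5, 0.3]]
perturbed = token_cutoff(embeddings, drop_prob=0.5, rng=rng)
loss = consistency_loss(1.25, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], alpha=3.0)
```

In a real training loop the two distributions would come from two forward passes of the same model, one on the original sentence pair and one on the perturbed pair.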

preprint 2022 arXiv

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

Pre-trained language models (PLMs), such as BERT and GPT, have revolutionized the field of NLP, not only in the general domain but also in the biomedical domain. Most prior efforts in building biomedical PLMs have resorted simply to domain adaptation and focused mainly on English. In this work we introduce eHealth, a Chinese biomedical PLM built from scratch with a new pre-training framework. This new framework pre-trains eHealth as a discriminator through both token- and sequence-level discrimination. The former is to detect input tokens corrupted by a generator and recover their original identities from plausible candidates, while the latter is to further distinguish corruptions of the same original sequence from those of others. As such, eHealth can learn language semantics at both token and sequence levels. Extensive experiments on 11 Chinese biomedical language understanding tasks of various forms verify the effectiveness and superiority of our approach. We release the pre-trained model at https://github.com/PaddlePaddle/Research/tree/master/KG/eHealth and will also release the code later.

preprint 2022 arXiv

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Effective molecular representation learning is of great importance to facilitate molecular property prediction, which is a fundamental task for the drug and material industry. Recent advances in graph neural networks (GNNs) have shown great promise in applying GNNs for molecular representation learning. Moreover, a few recent studies have also demonstrated successful applications of self-supervised learning methods to pre-train the GNNs to overcome the problem of insufficient labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data without fully utilizing the molecular geometry information. Yet the three-dimensional (3D) spatial structure of a molecule, a.k.a. molecular geometry, is one of the most critical factors for determining molecular physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule. To be specific, we devise two graphs for a molecule: the first encodes the atom-bond relations; the second encodes the bond-angle relations. Moreover, on top of the devised GNN architecture, we propose several novel geometry-level self-supervised learning strategies to learn spatial knowledge by utilizing the local and global molecular 3D structures. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM can significantly outperform all baselines in both regression and classification tasks. For example, the experimental results show an overall improvement of 8.8% on average compared to SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.

preprint 2022 arXiv

DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training

Due to the limitations of the model structure and pre-training objectives, existing vision-and-language generation models cannot utilize pair-wise images and text through bi-directional generation. In this paper, we propose DU-VLG, a framework which unifies vision-and-language generation as sequence generation problems. DU-VLG is trained with novel dual pre-training tasks: multi-modal denoising autoencoder tasks and modality translation tasks. To bridge the gap between image understanding and generation, we further design a novel commitment loss. We compare pre-training objectives on image captioning and text-to-image generation datasets. Results show that DU-VLG yields better performance than variants trained with uni-directional generation objectives or the variant without the commitment loss. We also obtain higher scores compared to previous state-of-the-art systems on three vision-and-language generation tasks. In addition, human judges further confirm that our model generates real and relevant images as well as faithful and informative captions.

preprint 2022 arXiv

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

In this paper, we focus on studying robustness evaluation of Chinese question matching. Most previous work on analyzing robustness issues focuses on just one or a few types of artificial adversarial examples. Instead, we argue that it is necessary to formulate a comprehensive evaluation of the linguistic capabilities of models on natural texts. For this purpose, we create a Chinese dataset, DuQM, which contains natural questions with linguistic perturbations to evaluate the robustness of question matching models. DuQM contains 3 categories and 13 subcategories with 32 linguistic perturbations. Extensive experiments demonstrate that DuQM has a better ability to distinguish different models. Importantly, the detailed breakdown of evaluation by linguistic phenomenon in DuQM helps us easily diagnose the strengths and weaknesses of different models. Additionally, our experimental results show that the effect of artificial adversarial examples does not carry over to natural texts.

preprint 2022 arXiv

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval

Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on the task of open-domain question answering (QA). Their effectiveness can be pushed to a new state of the art by incorporating cross-architecture knowledge distillation. However, most of the existing studies just directly apply conventional distillation methods. They fail to consider the particular situation where the teacher and student have different structures. In this paper, we propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to a vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher. Extensive experiments are conducted to validate that our proposed solution outperforms strong baselines and establishes a new state-of-the-art on open-domain QA benchmarks.

preprint 2022 arXiv

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention

Sparse Transformer has recently attracted a lot of attention because of its ability to reduce the quadratic dependency on the sequence length. We argue that two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer. This paper proposes a well-designed model named ERNIE-Sparse. It consists of two distinctive parts: (i) a Hierarchical Sparse Transformer (HST) that sequentially unifies local and global information, and (ii) a Self-Attention Regularization (SAR) method, a novel regularization designed to minimize the distance between transformers with different attention topologies. To evaluate the effectiveness of ERNIE-Sparse, we perform extensive evaluations. Firstly, we perform experiments on a multi-modal long sequence modeling task benchmark, Long Range Arena (LRA). Experimental results demonstrate that ERNIE-Sparse significantly outperforms a variety of strong baseline methods, including the dense attention and other efficient sparse attention methods, and achieves an improvement of 2.77% (57.78% vs. 55.01%). Secondly, to further show the effectiveness of our method, we pretrain ERNIE-Sparse and verify it on 3 text classification and 2 QA downstream tasks, achieving improvements of 0.83% on the classification benchmark (92.46% vs. 91.63%) and 3.24% on the QA benchmark (74.67% vs. 71.43%). Experimental results continue to demonstrate its superior performance.

preprint 2022 arXiv

Exploring Contextual Word-level Style Relevance for Unsupervised Style Transfer

Unsupervised style transfer aims to change the style of an input sentence while preserving its original content without using parallel training data. Current dominant approaches, owing to the lack of fine-grained control over the influence of the target style, are unable to yield desirable output sentences. In this paper, we propose a novel attentional sequence-to-sequence (Seq2seq) model that dynamically exploits the relevance of each output word to the target style for unsupervised style transfer. Specifically, we first pretrain a style classifier, where the relevance of each input word to the original style can be quantified via layer-wise relevance propagation. In a denoising auto-encoding manner, we train an attentional Seq2seq model to reconstruct input sentences and repredict the previously quantified word-level style relevance simultaneously. In this way, this model is endowed with the ability to automatically predict the style relevance of each output word. Then, we equip the decoder of this model with a neural style component to exploit the predicted word-level style relevance for better style transfer. In particular, we fine-tune this model using a carefully designed objective function involving style transfer, style relevance consistency, content preservation and fluency modeling loss terms. Experimental results show that our proposed model achieves state-of-the-art performance in terms of both transfer accuracy and content preservation.

preprint 2022 arXiv

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models. This advancement has resulted in more fluent, coherent, and even property-controllable (e.g., style, sentiment, length) generation, naturally leading to development in downstream tasks such as abstractive summarization, dialogue generation, machine translation, and data-to-text generation. However, the faithfulness problem, namely that the generated text usually contains unfaithful or non-factual information, has become the biggest challenge, which makes the performance of text generation unsatisfactory for practical applications in many real-world scenarios. Many studies on analysis, evaluation, and optimization methods for faithfulness problems have been proposed for various tasks, but have not been organized, compared and discussed in a combined manner. In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods. We organize the evaluation and optimization methods for different tasks into a unified taxonomy to facilitate comparison and learning across tasks. Several research trends are discussed further.

preprint 2022 arXiv

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting the various demands of drug research and development.

preprint 2022 arXiv

Long Time No See! Open-Domain Conversation with Long-Term Persona Memory

Most of the open-domain dialogue models tend to perform poorly in the setting of long-term human-bot conversations. The possible reason is that they lack the capability of understanding and memorizing long-term dialogue history information. To address this issue, we present a novel task of Long-term Memory Conversation (LeMon) and then build a new dialogue dataset DuLeMon and a dialogue generation framework with Long-Term Memory (LTM) mechanism (called PLATO-LTM). This LTM mechanism enables our system to accurately extract and continuously update long-term persona memory without requiring multiple-session dialogue datasets for model training. To our knowledge, this is the first attempt to conduct real-time dynamic management of persona information of both parties, including the user and the bot. Results on DuLeMon indicate that PLATO-LTM can significantly outperform baselines in terms of long-term dialogue consistency, leading to better dialogue engagingness.

preprint 2022 arXiv

Magnetic frustration in the cubic double perovskite Ba2NiIrO6

Hybrid transition metal oxides continue to attract attention due to their multiple degrees of freedom (e.g., lattice, charge, spin, and orbital) and versatile properties. Here we investigate the magnetic and electronic properties of the newly synthesized double perovskite Ba$_2$NiIrO$_6$, using crystal field theory, superexchange model analysis, density functional calculations, and parallel tempering Monte Carlo (PTMC) simulations. Our results indicate that Ba$_2$NiIrO$_6$ has the Ni$^{2+}$ ($t_{2g}^{6}e_{g}^{2}$)-Ir$^{6+}$ ($t_{2g}^{3}$) charge states. The first nearest-neighboring (1NN) Ni$^{2+}$-Ir$^{6+}$ ions prefer a ferromagnetic (FM) coupling as expected from the Goodenough-Kanamori-Anderson rules, which contradicts the experimental antiferromagnetic (AF) order in Ba$_2$NiIrO$_6$. We find that the strong 2NN AF couplings are frustrated in the fcc sublattices, and they play a major role in determining the observed AF ground state. We also prove that the $J_{\rm eff}$ = 3/2 and $J_{\rm eff}$ = 1/2 states induced by spin-orbit coupling, which would be manifested in low-dimensional (e.g., layered) iridates, are not realized in cubic Ba$_2$NiIrO$_6$. Our PTMC simulations show that when the long-range (2NN and 3NN) AF interactions are included, an AF transition with $T_{\rm N}$ = 66 K is obtained, which is comparable with the experimental 51 K. Meanwhile, we propose a possible 2$\times$2$\times$2 noncollinear AF structure for Ba$_2$NiIrO$_6$.

preprint 2022 arXiv

PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation

Despite recent progress of pre-trained language models on generating fluent text, existing methods still suffer from incoherence problems in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework leveraging autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent representations to maintain sentence-level semantic plans grounded by bag-of-words. Moreover, we introduce a new coherence-based contrastive learning objective to further improve the coherence of output. Extensive experiments are conducted on two challenging long-form text generation tasks including counterargument generation and opinion article generation. Both automatic and human evaluations show that our method significantly outperforms strong baselines and generates more coherent texts with richer contents.

preprint 2022 arXiv

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

As the first session-level Chinese dataset, CHASE contains two separate parts, i.e., 2,003 sessions manually constructed from scratch (CHASE-C), and 3,456 sessions translated from English SParC (CHASE-T). We find the two parts are highly discrepant and incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL dataset in Chinese, consisting of 5,028 sessions all manually constructed from scratch. In order to guarantee data quality, we adopt an iterative annotation workflow to facilitate intense and in-time review of previous-round natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, allowing SeSQL to be used as the largest dataset for single-round multi-DB text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL by employing three competitive session-level parsers, and present detailed analysis.

preprint 2022 arXiv

Spin-Orbital States and Strong Antiferromagnetism of Layered Eu$_2$SrFe$_2$O$_6$ and Sr$_3$Fe$_2$O$_4$Cl$_2$

The insulating iron compounds Eu$_2$SrFe$_2$O$_6$ and Sr$_3$Fe$_2$O$_4$Cl$_2$ have high-temperature antiferromagnetic (AF) order despite their different layered structures. Here we carry out density functional calculations and Monte Carlo simulations to study their electronic structures and magnetic properties, aided by analyses of the crystal field, magnetic anisotropy, and superexchange. We find that both compounds are Mott insulators and in the high-spin (HS) Fe$^{2+}$ state ($S$ = 2) accompanied by the weakened crystal field. Although they have different local coordination and crystal fields, the Fe$^{2+}$ ions have the same level sequence and ground-state configuration $(3z^2-r^2)^2(xz,yz)^2(xy)^1(x^2-y^2)^1$. Then, the multiorbital superexchange produces strong AF couplings, and the $(3z^2-r^2)/(xz,yz)$ mixing via the spin-orbit coupling (SOC) yields a small in-plane orbital moment and anisotropy. Indeed, by tracing a set of different spin-orbital states, our density functional calculations confirm the strong AF couplings and the easy planar magnetization for both compounds. Moreover, using the derived magnetic parameters, our Monte Carlo simulations give the Néel temperature $T_{\rm N}$ = 420 K (372 K) for the former (the latter), which reproduces the experimental results well. Therefore, the present study provides a unified picture for Eu$_2$SrFe$_2$O$_6$ and Sr$_3$Fe$_2$O$_4$Cl$_2$ concerning their electronic and magnetic properties.

preprint 2022 arXiv

Towards Boosting the Open-Domain Chatbot with Human Feedback

Many open-domain dialogue models pre-trained with social media comments can generate coherent replies but have difficulties producing engaging responses when interacting with real users. This phenomenon might mainly result from the deficiency of annotated human-human conversations and the misalignment with human preference. In this paper, we propose a novel and efficient approach Diamante to boost the open-domain chatbot, where two kinds of human feedback (including explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend the model-generated candidate responses, Diamante efficiently collects the human demonstrated responses and constructs a Chinese chit-chat dataset. To enhance the alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces the generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and joint training paradigm can significantly boost the performance of Chinese pre-trained dialogue models.

preprint 2022 arXiv

Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation

Emotional support is a crucial skill for many real-world scenarios, including caring for the elderly, mental health support, and customer service chats. This paper presents a novel task of empathetic dialog generation with positive emotion elicitation to promote users' positive emotions, similar to that of emotional support between humans. In this task, the agent conducts empathetic responses along with the target of eliciting the user's positive emotions in the multi-turn dialog. To facilitate the study of this task, we collect a large-scale emotional dialog dataset with positive emotion elicitation, called PosEmoDial (about 820k dialogs, 3M utterances). In these dialogs, the agent tries to guide the user from any possible initial emotional state, e.g., sadness, to a positive emotional state. Then we present a positive-emotion-guided dialog generation model with a novel loss function design. This loss function encourages the dialog model to not only elicit positive emotions from users but also ensure smooth emotional transitions along with the whole dialog. Finally, we establish benchmark results on PosEmoDial, and we will release this dataset and related source code to facilitate future studies.

preprint2022arXiv

Unified Structure Generation for Universal Information Extraction

Information extraction suffers from its varying targets, heterogeneous structures, and demand-specific schemas. In this paper, we propose a unified text-to-structure generation framework, namely UIE, which can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources. Specifically, UIE uniformly encodes different extraction structures via a structured extraction language, adaptively generates target extractions via a schema-based prompt mechanism - structural schema instructor, and captures the common IE abilities via a large-scale pre-trained text-to-structure model. Experiments show that UIE achieved the state-of-the-art performance on 4 IE tasks, 13 datasets, and on all supervised, low-resource, and few-shot settings for a wide range of entity, relation, event and sentiment extraction tasks and their unification. These results verified the effectiveness, universality, and transferability of UIE.

preprint2022arXiv

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpora. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a shared grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment, improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

preprint2022arXiv

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Existing pre-training methods focus on either single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e. text or image) or limited multi-modal data (i.e. image-text pairs). In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free text corpora and image collections can be utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space over a corpus of image-text pairs. As the non-paired single-modal data is very rich, our model can utilize a much larger scale of data to learn more generalizable representations. Moreover, the textual knowledge and visual knowledge can enhance each other in the unified semantic space. The experimental results show that UNIMO significantly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at the UNIMO project page https://unimo-ptm.github.io/
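
As a rough illustration of how a contrastive objective can pull paired text and image embeddings together in one space, here is a minimal sketch of a generic InfoNCE-style loss. It is not the UNIMO implementation: the function name `contrastive_loss`, the `temperature` value, and the toy two-dimensional vectors are assumptions made for this example, and it uses only the other items in the batch as negatives.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(text_vecs, image_vecs, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's code).

    text_vecs[k] and image_vecs[k] are assumed to be a matched pair; every
    other image in the batch acts as a negative for text k.
    """
    texts = [normalize(v) for v in text_vecs]
    images = [normalize(v) for v in image_vecs]
    total = 0.0
    for k, t in enumerate(texts):
        logits = [dot(t, img) / temperature for img in images]
        # Numerically stable log-sum-exp over all candidate images.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matched image.
        total += log_z - logits[k]
    return total / len(texts)


# Toy embeddings: correctly paired vs. deliberately swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy vectors the loss is close to zero when each text sits next to its own image and large when the pairs are swapped, which is the behavior a contrastive alignment objective relies on.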

preprint2022arXiv

Unique electronic state in ferromagnetic semiconductor FeCl$_{2}$ monolayer

Two-dimensional (2D) van der Waals (vdW) magnetic materials could be an ideal platform for ultracompact spintronic applications. Among them, FeCl$_{2}$ monolayer in the triangular lattice is subject to a strong debate. Thus, we critically examine its spin-orbital state, electronic structure, and magnetic properties, using a set of delicate first-principles calculations, crystal field level analyses, and Monte Carlo simulations. Our work reveals that FeCl$_{2}$ monolayer is a ferromagnetic (FM) semiconductor in which the electron correlation of the narrow Fe $3d$ bands determines the band gap of about 1.2 eV. Note that only when the spin-orbit coupling (SOC) is properly handled is the unique $d^{5\uparrow}l^{\downarrow}_{z+}$ electronic ground state achieved. Then, both the orbital and spin contributions (0.59 $\mu_{\rm B}$ plus 3.56 $\mu_{\rm B}$) to the total magnetic moment well account for, for the first time, the experimental perpendicular moment of 4.3 $\mu_{\rm B}$/Fe. Moreover, we find that a compressive strain further stabilizes the $d^{5\uparrow}l^{\downarrow}_{z+}$ ground state, and that the enhanced magnetic anisotropy and exchange coupling would boost the Curie temperature ($T_{\rm C}$) from 25 K for the pristine FeCl$_{2}$ monolayer to 69-102 K under 3%-5% compressive strain. Therefore, FeCl$_{2}$ monolayer is indeed an appealing 2D FM semiconductor.

preprint2022arXiv

Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals

Most dialog systems posit that users have figured out clear and specific goals before starting an interaction. For example, users have determined the departure, the destination, and the travel time for booking a flight. However, in many scenarios, limited by experience and knowledge, users may know what they need, but still struggle to figure out clear and specific goals by determining all the necessary slots. In this paper, we identify this challenge and make a step forward by collecting a new human-to-human mixed-type dialog corpus. It contains 5k dialog sessions and 168k utterances for 4 dialog types and 5 domains. Within each session, an agent first provides user-goal-related knowledge to help figure out clear and specific goals, and then helps achieve them. Furthermore, we propose a mixed-type dialog model with a novel prompt-based continual learning mechanism. Specifically, the mechanism enables the model to continually strengthen its ability on any specific type by utilizing existing dialog corpora effectively.

preprint2021arXiv

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

Conventional methods for image-text generation mainly tackle the naturally bidirectional generation tasks separately, focusing on designing task-specific frameworks to improve the quality and fidelity of the generated samples. Recently, Vision-Language Pre-training models have greatly improved the performance of image-to-text generation tasks, but large-scale pre-training models for the text-to-image synthesis task are still under-developed. In this paper, we propose ERNIE-ViLG, a unified generative pre-training framework for bidirectional image-text generation with a transformer model. Based on the image quantization models, we formulate both image generation and text generation as autoregressive generative tasks conditioned on the text/image input. The bidirectional image-text generative modeling eases the semantic alignments across vision and language. For the text-to-image generation process, we further propose an end-to-end training method to jointly learn the visual sequence generator and the image reconstructor. To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs, which achieves state-of-the-art performance for both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and best results on COCO-CN and AIC-ICC for image captioning.

preprint2021arXiv

Learning to Select External Knowledge with Multi-Scale Negative Sampling

The Track-1 of DSTC9 aims to effectively answer user requests or questions during task-oriented dialogues, which are out of the scope of APIs/DB. By leveraging external knowledge resources, relevant information can be retrieved and encoded into the response generation for these out-of-API-coverage queries. In this work, we have explored several advanced techniques to enhance the utilization of external knowledge and boost the quality of response generation, including schema guided knowledge decision, negatives enhanced knowledge selection, and knowledge grounded response generation. To evaluate the performance of our proposed method, comprehensive experiments have been carried out on the publicly available dataset. Our approach was ranked as the best in human evaluation of DSTC9 Track-1.

preprint2021arXiv

Syntactic and Semantic-driven Learning for Open Information Extraction

One of the biggest bottlenecks in building accurate, high coverage neural open IE systems is the need for large labelled corpora. The diversity of open domain corpora and the variety of natural language expressions further exacerbate this problem. In this paper, we propose a syntactic and semantic-driven learning approach, which can learn neural open IE models without any human-labelled data by leveraging syntactic and semantic knowledge as noisier, higher-level supervisions. Specifically, we first employ syntactic patterns as data labelling functions and pretrain a base model using the generated labels. Then we propose a syntactic and semantic-driven reinforcement learning algorithm, which can effectively generalize the base model to open situations with high accuracy. Experimental results show that our approach significantly outperforms the supervised counterparts, and can even achieve competitive performance to the supervised state-of-the-art (SoA) model.

preprint2020arXiv

CoKE: Contextualized Knowledge Graph Embedding

Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow a single static embedding for each entity or relation, ignoring their intrinsic contextual nature, i.e., entities and relations may appear in different graph contexts, and accordingly, exhibit different properties. This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are hence naturally adaptive to the input, capturing contextual meanings of entities and relations therein. Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least equally well as current state-of-the-art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering. Our code is available at https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE.

preprint2020arXiv

Discovering Dialog Structure Graph for Open-Domain Dialog Generation

Learning interpretable dialog structure from human-human dialogs yields basic insights into the structure of conversation, and also provides background knowledge to facilitate dialog generation. In this paper, we conduct unsupervised discovery of dialog structure from chitchat corpora, and then leverage it to facilitate dialog generation in downstream systems. To this end, we present a Discrete Variational Auto-Encoder with Graph Neural Network (DVAE-GNN), to discover a unified human-readable dialog structure. The structure is a two-layer directed graph that contains session-level semantics in the upper-layer vertices, utterance-level semantics in the lower-layer vertices, and edges among these semantic vertices. In particular, we integrate GNN into DVAE to fine-tune utterance-level semantics for more effective recognition of session-level semantic vertex. Furthermore, to alleviate the difficulty of discovering a large number of utterance-level semantics, we design a coupling mechanism that binds each utterance-level semantic vertex with a distinct phrase to provide prior semantics. Experimental results on two benchmark corpora confirm that DVAE-GNN can discover meaningful dialog structure, and the use of dialog structure graph as background knowledge can facilitate a graph grounded conversational system to conduct coherent multi-turn dialog generation.

preprint2020arXiv

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).

preprint2020arXiv

Leveraging Graph to Improve Abstractive Multi-Document Summarization

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents such as similarity graph and discourse graph, to more effectively process multiple input documents and produce abstractive summaries. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries. Furthermore, pre-trained language models can be easily combined with our model, which further improves the summarization performance significantly. Empirical results on the WikiSum and MultiNews datasets show that the proposed architecture brings substantial improvements over several strong baselines.

preprint2020arXiv

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

Pre-training models have been proved effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge grounded dialogues, and conversational question answering. In this framework, we adopt flexible attention mechanisms to fully leverage the bi-directional context and the uni-directional characteristic of language generation. We also introduce discrete latent variables to tackle the inherent one-to-many mapping problem in response generation. Two reciprocal tasks of response generation and latent act recognition are designed and carried out simultaneously within a shared network. Comprehensive experiments on three publicly available datasets verify the effectiveness and superiority of the proposed framework.

preprint2020arXiv

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis

Recently, sentiment analysis has seen remarkable advances with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that it is widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baselines, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.

preprint2020arXiv

Towards Conversational Recommendation over Multi-Type Dialogs

We propose a new task of conversational recommendation over multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog (e.g., QA) to a recommendation dialog, taking into account the user's interests and feedback. To facilitate the study of this task, we create a human-to-human Chinese dialog dataset DuRecDial (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot). In each dialog, the recommender proactively leads a multi-type dialog to approach recommendation targets and then makes multiple recommendations with rich interaction behavior. This dataset allows us to systematically investigate different parts of the overall problem, e.g., how to naturally lead a dialog, how to interact with users for recommendation. Finally we establish baseline results on DuRecDial for future studies. Dataset and codes are publicly available at https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2020-DuRecDial.

preprint2020arXiv

VI3: a 2D Ising ferromagnet

Two-dimensional (2D) magnetic materials are of great current interest for their promising applications in spintronics. Here we propose the van der Waals (vdW) material VI3 to be a 2D Ising ferromagnet (FM), using density functional calculations, crystal field level diagrams, superexchange model analyses, and Monte Carlo simulations. The $a_{1g}^{1}{e'_{-}}^{1}$ ground state in the trigonal crystal field gives rise to the 2D Ising FM due to a significant single ion anisotropy (SIA) and enhanced FM superexchange, both associated with the $S_z=1$ and $L_z=-1$ state of V$^{3+}$ ions. We find that a tensile strain on the VI3 monolayer further stabilizes the $a_{1g}^{1}{e'_{-}}^{1}$ ground state, and its Curie temperature ($T_{\rm C}$) would increase from 70 K to 90-110 K under a 2.5-5% tensile strain. Moreover, we suggest a group of spin-orbital states with a strong SIA which may help in the search for more 2D Ising magnets.

preprint2016arXiv

Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation

The attentional mechanism has proven to be effective in improving end-to-end neural machine translation. However, due to the intricate structural divergence between natural languages, unidirectional attention-based models might only capture partial aspects of attentional regularities. We propose agreement-based joint training for bidirectional attention-based end-to-end neural machine translation. Instead of training source-to-target and target-to-source translation models independently, our approach encourages the two complementary models to agree on word alignment matrices on the same training data. Experiments on Chinese-English and English-French translation tasks show that agreement-based joint training significantly improves both alignment and translation quality over independent training.

preprint2016arXiv

Minimum Risk Training for Neural Machine Translation

We propose minimum risk training for end-to-end neural machine translation. Unlike conventional maximum likelihood estimation, minimum risk training is capable of optimizing model parameters directly with respect to arbitrary evaluation metrics, which are not necessarily differentiable. Experiments show that our approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs. Transparent to architectures, our approach can be applied to more neural networks and potentially benefit more NLP tasks.
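
To make the idea of optimizing an expected, possibly non-differentiable metric more concrete, the sketch below computes an expected risk over a small set of candidate translations. It is an illustrative reading of the abstract, not the paper's code: the function name `expected_risk`, the sharpness parameter `alpha`, and the toy costs (which could stand for something like 1 minus a sentence-level score) are all assumptions.

```python
import math


def expected_risk(log_probs, costs, alpha=1.0):
    """Expected cost over a sampled candidate set (hypothetical helper).

    log_probs: model log-probabilities of each candidate translation.
    costs:     per-candidate cost from any evaluation metric; the metric
               itself never needs to be differentiable.
    alpha:     assumed sharpness knob applied before renormalizing over
               the candidate set.
    """
    scaled = [alpha * lp for lp in log_probs]
    # Renormalize over the candidates with a stable softmax.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    probs = [w / z for w in weights]
    # The risk is the probability-weighted average cost.
    return sum(p * c for p, c in zip(probs, costs))


costs = [0.2, 0.5, 0.9]  # toy costs, lower is better
before = expected_risk([math.log(0.5), math.log(0.3), math.log(0.2)], costs)
after = expected_risk([math.log(0.8), math.log(0.1), math.log(0.1)], costs)
```

In this toy case, shifting probability mass toward the lowest-cost candidate lowers the expected risk from about 0.43 to about 0.30. Because the risk depends on the model only through the candidate probabilities, gradients can flow through them while the costs are treated as constants.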

preprint2016arXiv

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

With the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Knowledge base-based question answering (KB-QA) is one of the most promising approaches to access the substantial knowledge. Meanwhile, as the neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put emphasis on question representation, and the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy is unable to express the proper information of the question. Hence, we present a neural attention-based model to represent the questions dynamically according to the different focuses of various candidate answer aspects. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. This also alleviates the out-of-vocabulary (OOV) problem, which helps the attention model to represent the question more precisely. The experimental results on WEBQUESTIONS demonstrate the effectiveness of the proposed approach.

preprint2016arXiv

Semi-Supervised Learning for Neural Machine Translation

While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.

preprint2015arXiv

Giant magnetic anisotropy of Co, Ru, and Os adatoms on MgO (001) surface

Large magnetic anisotropy energy (MAE) is desirable and critical for nanoscale magnetic devices. Here, using ligand-field level diagrams and density functional calculations, we well explain the very recent discovery [I. G. Rau et al., Science 344, 988 (2014)] that an individual Co adatom on a MgO (001) surface has a large MAE of more than 60 meV. More importantly, we predict that a giant MAE up to 110 meV could be realized for Ru adatoms on MgO (001), and even more for the Os adatoms (208 meV). This is a joint effect of the special ligand field, orbital multiplet, and significant spin-orbit interaction, in the intermediate-spin state of the Ru or Os adatoms on top of the surface oxygens. The giant MAE could provide a route to atomic scale memory.

preprint2014arXiv

Black phosphorus field-effect transistors

Two-dimensional crystals have emerged as a new class of materials with novel properties that may impact future technologies. Experimentally identifying and characterizing new functional two-dimensional materials in the vast material pool is a tremendous challenge, and at the same time potentially rewarding. In this work, we succeed in fabricating field-effect transistors based on few-layer black phosphorus crystals with thickness down to a few nanometers. Drain current modulation on the order of $10^5$ is achieved in samples thinner than 7.5 nm at room temperature, with well-developed current saturation in the IV characteristics, both of which are important for reliable transistor performance of the device. Sample mobility is also found to be thickness dependent, with the highest value up to ~1000 cm$^2$/Vs obtained at thickness ~10 nm. Our results demonstrate the potential of black phosphorus thin crystal as a new two-dimensional material for future applications in nano-electronic devices.

preprint2014arXiv

Ferrimagnetism in the double perovskite Ca2FeOsO6: a density functional study

Using density functional calculations, we find that the newly synthesized Ca$_2$FeOsO$_6$ has the high-spin Fe$^{3+}$ ($3d^5$)-Os$^{5+}$ ($5d^3$) state. The octahedral Os$^{5+}$ ion has a large intrinsic exchange splitting, and its $t_{2g\uparrow}^3$ configuration makes the spin-orbit coupling ineffective. Moreover, there is a strong antiferromagnetic (AF) coupling between the neighboring Fe$^{3+}$ ($S$ = 5/2) and Os$^{5+}$ ($S$ = -3/2), but the AF couplings within both the fcc Fe$^{3+}$ and Os$^{5+}$ sublattices are one order of magnitude weaker. Therefore, magnetic frustration is suppressed and a stable ferrimagnetic (FiM) ground state appears. This FiM order is due to the virtual hopping of the $t_{2g}$ electrons from Os$^{5+}$ ($t_{2g\downarrow}^3$) to Fe$^{3+}$ ($t_{2g\uparrow}^3e_{g\uparrow}^2$). However, if the experimentally bent Fe$^{3+}$-O$^{2-}$-Os$^{5+}$ exchange path becomes straight, the $e_g$ hopping from Fe$^{3+}$ ($t_{2g\uparrow}^3e_{g\uparrow}^2$) to Os$^{5+}$ ($t_{2g\uparrow}^3$) would be facilitated and then a ferromagnetic (FM) coupling would occur.

preprint2014arXiv

Long-range magnetic interaction and frustration in double perovskite Sr$_{2}$NiIrO$_{6}$

Sr$_{2}$NiIrO$_{6}$ would be a ferromagnetic (FM) insulator in terms of the common superexchange mechanism between the first nearest neighboring (1NN) magnetic ions Ni$^{2+}$ ($t_{2g}^{6}e_{g}^{2}$) and Ir$^{6+}$ ($t_{2g}^{3}$). However, the observed antiferromagnetic (AF) order questions this viewpoint. In this work, we present first-principles calculations and find that while the 1NN Ni$^{2+}$-Ir$^{6+}$ exchange is indeed FM, the 2NN and 3NN couplings in the fcc Ir (and Ni) sublattice are AF. Moreover, the 2NN AF Ir-Ir coupling turns out to be even stronger than the 1NN FM Ni-Ir coupling, thus giving rise to a magnetic frustration. Sr$_{2}$NiIrO$_{6}$ hence becomes a distorted low-temperature antiferromagnet. Naturally, a very similar magnetic property in Sr$_{2}$ZnIrO$_{6}$ can be explained by the frustrated AF coupling in the fcc Ir$^{6+}$ sublattice. This work highlights the long-range magnetic interaction of the delocalized $5d$ electrons, and also addresses why the spin-orbit coupling is ineffective here.

preprint2014arXiv

Rare case of magnetic Ag$^{3+}$ ion: double perovskite Cs$_{2}$KAgF$_{6}$

Normally $4d$ or $5d$ transition metals are in a low-spin state. Here, using first-principles calculations, we report on a rare case of a high-spin $S$=1 magnetic state for the Ag$^{3+}$ ion in the double perovskite Cs$_{2}$KAgF$_{6}$. We also explore the possibility of a conventional low-spin $S$=0 ground state and find an associated tetragonal distortion to be 0.29 Å. However, the lattice elastic energy cost and the Hund exchange loss exceed the $e_{g}$ crystal-field energy gain, thus making the low-spin tetragonal structure less favorable than the high-spin cubic structure. We conclude that the compact perovskite structure of Cs$_{2}$KAgF$_{6}$ is an important factor in stabilizing the unusual high-spin ground state of Ag$^{3+}$.

preprint2013arXiv

Geometric Phase Gates with Adiabatic Control in Electron Spin Resonance

High-fidelity quantum operations are a key requirement for fault-tolerant quantum information processing. In electron spin resonance, manipulation of the quantum spin is usually achieved with time-dependent microwave fields. In contrast to the conventional dynamic approach, adiabatic geometric phase operations are expected to be less sensitive to certain kinds of noise and field inhomogeneities. Here, we investigate such phase gates applied to electron spins both through simulations and experiments, showing that the adiabatic geometric phase gate is indeed inherently robust against inhomogeneity in the applied microwave field strength. While only little advantage is offered over error-correcting composite pulses for modest inhomogeneities <=10%, the adiabatic approach reveals its potential for situations where field inhomogeneities are unavoidably large.

preprint2013arXiv

Impact of spin-orbit coupling on the magnetism of Sr3MIrO6 (M = Ni, Co)

Using density functional calculations, we demonstrate that the spin-orbit coupling (SOC) of the Ir4+ ion plays an essential role in determining the antiferromagnetism of the hexagonal spin-chain system Sr3MIrO6 (M = Ni, Co) by tuning the crystal-field level sequence and altering the Ir-M inter-orbital interactions. The SOC splits the e'_{g} doublet of the octahedral Ir4+ ion (t_{2g}^5) in a trigonal crystal field, and the single t_{2g} hole resides on the e'_{g} upper branch and gives rise to the antiferromagnetic superexchange. In the absence of the SOC, however, the single t_{2g} hole would occupy the a_{1g} singlet instead, which would mediate an unreal ferromagnetic exchange due to a direct a_{1g} hopping along the Ir-M chain. We also find that the Ni2+ and Co2+ ions are both in a high-spin state and moreover that the Co2+ ion carries a huge orbital moment. This work accounts well for the recent experiments and again underscores the significance of the SOC in iridates.

preprint2012arXiv

Charge order at the frontier between the molecular and solid states in Ba3NaRu2O9

We show that the valence electrons of Ba3NaRu2O9, which has a quasi-molecular structure, completely crystallize below 210 K. Using an extended Hubbard model, we show that the charge ordering instability results from long-range Coulomb interactions. However, orbital ordering, metal-metal bonding and formation of a partial spin gap enforce the magnitude of the charge separation. The striped charge order and frustrated hcp lattice of Ru2O9 dimers lead to competition with a quasi-degenerate charge-melted phase under photo-excitation at low temperature. Our results establish a broad class of simple metal oxides as models for emergent phenomena at the border between the molecular and solid states.

preprint2012arXiv

Is N-doped SrO magnetic? A first-principles view

N-doped SrO seems to be one of the model systems for d^0 magnetism, in which magnetism (or ideally, ferromagnetism) was ascribed to the localized N 2p spins mediated by delocalized O 2p holes. Here we offer a different view, using density functional calculations. We find that N-doped SrO with solely substitutional N impurities, as widely assumed in the literature, is unstable, and that a pairing state of substitutional and interstitial N impurities is instead significantly more stable, with a formation energy lower than the former by 6.7 eV. The stable (N_{sub}-N_{int})^{2-} dimers behave like a charged (N_2)^{2-} molecule and each have a molecular spin of 1. However, their spin-polarized molecular levels lie well inside the wide band gap of SrO and thus the exchange interaction is negligibly weak. As a consequence, N-doped SrO could not be ferromagnetic but rather paramagnetic.

preprint2012arXiv

Local correlations, non-local screening, multiplets, and band formation in NiO

We report on a comparative study of the valence band electronic structure of NiO as bulk material and of NiO as impurity in MgO. From the impurity we have been able to determine reliably the parameters which describe the local correlations, thereby establishing the compensated-spin character of the first ionization state or the state created by hole doping. Using bulk-sensitive x-ray photoemission we identify pronounced satellite features in the valence band of bulk NiO which cannot be explained by single-site many body approaches nor by mean field calculations. We infer the presence of screening processes involving local quasi-core states in the valence band and non-local coherent many body states. These processes are strong and the propagation of an extra hole in the valence band of NiO will therefore be accompanied by a range of high energy excitations. This in turn will make the observation of the dispersion relations in the Ni 3d bands difficult, also because the effective band width is no more than 0.25 eV as estimated from multi-site calculations.

preprint2012arXiv

Metal-insulator transition in Sr2-xLaxCoO4 driven by spin-state transition

We sought the origin of the metal-insulator transition in Sr2-xLaxCoO4, using electron-correlation corrected density functional calculations. Our results show that Sr2CoO4 is in an intermediate-spin (IS) state and a strong Co4+ 3d-O 2p hybridization is responsible for its ferromagnetic metallicity. Upon La doping, however, a spin-state transition occurs in Sr1.5La0.5CoO4: IS-Co4+ x 2 + 1e --> LS-Co4+ + HS-Co3+ (LS: low spin; HS: high spin). Then the spin-state transition suppresses an electron hopping via a spin-blockade and gives rise to the insulating behavior of Sr1.5La0.5CoO4. A corresponding superexchange accounts for its ferromagnetism. Thus, spin state could provide a way to tune materials properties.

preprint2011arXiv

Ab initio study of the giant ferroelectric distortion and pressure induced spin-state transition in BiCoO3

Using configuration-state-constrained electronic structure calculations based on the generalized gradient approximation plus Hubbard U method, we sought the origin of the giant tetragonal ferroelectric distortion in the ambient phase of the potentially multiferroic material BiCoO3 and identified the nature of the pressure induced spin-state transition. Our results show that a strong Bi-O covalency drives the giant ferroelectric distortion, which is further stabilized by an xy-type orbital ordering of the high-spin (HS) Co3+ ions. For the orthorhombic phase under 5.8 GPa, we find that a mixed HS and low-spin (LS) state is more stable than both LS and intermediate-spin (IS) states, and that the former well accounts for the available experimental results. Thus, we identify that the pressure induced spin-state transition is via a mixed HS+LS state, and we predict that the HS-to-LS transition would be complete upon a large volume decrease of about 20%.

preprint2011arXiv

Electron spin ensemble strongly coupled to a three-dimensional microwave cavity

We demonstrate the strong coupling between an electron spin ensemble and a three-dimensional cavity in a reflection geometry. We also find that an anticrossing in the cavity/spin spectrum can be observed under conditions that the collective coupling strength $g_c$ is smaller than the spin linewidth $γ_s$ or the cavity linewidth. We identify a ratio of $g_c$ to $γ_s$ ($g_c/γ_s >$ 0.64) as a condition to observe a splitting in the cavity frequency. Finally, we confirm that $g_c$ scales with $\sqrt{N}$, where $N$ is the number of polarized spins.
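The $\sqrt{N}$ scaling and the appearance of an anticrossing can be illustrated with a minimal sketch. It assumes the textbook model of two coupled lossy modes, not the reflection-geometry analysis of the paper, so it does not reproduce the $g_c/γ_s > 0.64$ criterion quoted above. The function names, the single-spin coupling `g_single`, and the convention that `kappa` and `gamma_s` are full linewidths are all assumptions made for illustration.

```python
import cmath
import math


def collective_coupling(g_single: float, n_spins: int) -> float:
    """Collective coupling g_c = g_single * sqrt(N) for N identical polarized spins."""
    return g_single * math.sqrt(n_spins)


def normal_modes(omega_c: float, omega_s: float, g_c: float,
                 kappa: float, gamma_s: float) -> tuple[complex, complex]:
    """Complex eigenfrequencies of two coupled lossy modes.

    Assumption: kappa (cavity) and gamma_s (spins) are full linewidths,
    so each amplitude decays at half its linewidth.
    """
    a = omega_c - 0.5j * kappa      # bare cavity mode
    b = omega_s - 0.5j * gamma_s    # bare spin mode
    mean = 0.5 * (a + b)
    root = cmath.sqrt((0.5 * (a - b)) ** 2 + g_c ** 2)
    return mean - root, mean + root


def splitting(omega_c: float, omega_s: float, g_c: float,
              kappa: float, gamma_s: float) -> float:
    """Frequency separation (real part) between the two normal modes."""
    lower, upper = normal_modes(omega_c, omega_s, g_c, kappa, gamma_s)
    return (upper - lower).real


if __name__ == "__main__":
    # Hypothetical numbers in arbitrary frequency units.
    g_c = collective_coupling(g_single=0.005, n_spins=10_000)
    print("g_c =", g_c)
    print("splitting on resonance =", splitting(10.0, 10.0, g_c, 0.2, 0.3))
```

In this simplified eigenvalue picture the on-resonance splitting is real only when $g_c$ exceeds a quarter of the linewidth mismatch, $|κ - γ_s|/4$. That is a different quantity from the threshold for a splitting seen in a measured reflection spectrum, which is the one the abstract addresses.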

preprint2011arXiv

Orbital order in La0.5Sr1.5MnO4: beyond a common local Jahn-Teller picture

The standard way to find the orbital occupation of Jahn-Teller (JT) ions is to use structural data, with the assumption of a one-to-one correspondence between the orbital occupation and the associated JT distortion, e.g. in an O6 octahedron. We show, however, that this approach in principle does not work for layered systems. Specifically, using the layered manganite La0.5Sr1.5MnO4 as an example, we found from our x-ray absorption measurements and theoretical calculations that the type of orbital ordering strongly contradicts the standard local distortion approach for the Mn3+O6 octahedra, and that the generally ignored long-range crystal field effect and anisotropic hopping integrals are actually crucial to determine the orbital occupation. Our findings may open a pathway to control of the orbital state in multilayer systems and thus of their physical properties.

preprint2011arXiv

Shear-driven solidification of dilute colloidal suspensions

We show that the shear-induced solidification of dilute charge-stabilized (DLVO) colloids is due to the interplay between the shear-induced formation and breakage of large non-Brownian clusters. While their size is limited by breakage, their number density increases with the shearing-time. Upon flow cessation, the dense packing of clusters interconnects into a rigid state by means of grainy bonds, each involving a large number of primary colloidal bonds. The emerging picture of shear-driven solidification in dilute colloidal suspensions combines the gelation of Brownian systems with the jamming of athermal systems.

preprint2010arXiv

Magnetism in C or N-doped MgO and ZnO: density-functional study of impurity pairs

It is shown that substitution of C or N for O, recently proposed as a way to create ferromagnetism in otherwise nonmagnetic oxide insulators, is curtailed by formation of impurity pairs, and the resultant C2 spin=1 dimers as well as the isoelectronic N2^{2+} interact antiferromagnetically in p-type MgO. For C-doped ZnO, however, we demonstrate using the HSE hybrid functional that a resonance of the spin-polarized C2 ppπ* states with the host conduction band results in a long-range ferromagnetic interaction. Magnetism of open-shell impurity molecules is proposed as a possible route to d0-ferromagnetism in oxide spintronic materials.

preprint2010arXiv

Orbital occupation and magnetic moments of tetrahedrally coordinated iron in CaBaFe4O7

CaBaFe4O7 is a mixed-valent transition metal oxide having both Fe2+ and Fe3+ ions in tetrahedral coordination. Here we characterize its magnetic properties by magnetization measurements and investigate its local electronic structure using soft x-ray absorption spectroscopy at the Fe L2,3 edges, in combination with multiplet cluster and spin-resolved band structure calculations. We found that the Fe2+ ion in the unusual tetrahedral coordination is Jahn-Teller active with the high-spin e^2 (up) t2^3 (up) e^1 (down) configuration having a x^2-y^2-like electron for the minority spin. We deduce that there is an appreciable orbital moment of about L_z=0.36 caused by multiplet interactions, thereby explaining the observed magnetic anisotropy. CaBaFe4O7, a member of the '114' oxide family, offers new opportunities to explore charge, orbital and spin physics in transition metal oxides.

preprint2010arXiv

Shear-induced reaction-limited aggregation kinetics of Brownian particles at arbitrary concentrations

The aggregation of interacting Brownian particles in sheared concentrated suspensions is an important issue in colloid and soft matter science per se. Also, it serves as a model to understand biochemical reactions occurring in vivo where both crowding and shear play an important role. We present an effective medium approach within the Smoluchowski equation with shear which allows one to calculate the encounter kinetics through a potential barrier under shear at arbitrary colloid concentrations. Experiments on a model colloidal system in simple shear flow support the validity of the model in the range considered. By generalizing Kramers' rate theory to the presence of collective hydrodynamics, our model explains the significant increase in the shear-induced reaction-limited aggregation kinetics upon increasing the colloid concentration.

preprint2010arXiv

Storage of multiple coherent microwave excitations in an electron spin ensemble

Electron and nuclear spins have good coherence times and an ensemble of spins is a promising candidate for a quantum memory. By employing holographic techniques via field gradients a single ensemble may be used to store many bits of information. Here we present a coherent memory using a pulsed magnetic field gradient, and demonstrate the storage and retrieval of up to 100 weak 10 GHz coherent excitations in collective states of an electron spin ensemble. We further show that such collective excitations in the electron spin can then be stored in nuclear spin states, which offer coherence times in excess of seconds.

preprint2010arXiv

The spin states of Co ions in La1.5Ca0.5CoO4 from first-principles

The spin states and electronic structure of layered perovskite La1.5Ca0.5CoO4 are investigated using the full-potential linearized augmented plane-wave method. All the computational results indicate that the Co2+ ion is in a high-spin state and the Co3+ in a low-spin state. The Co2+ t2g orbitals with a small crystal-field splitting are mixed by spin-orbit coupling, which accounts for the observed easy in-plane magnetism. The nonmagnetic LS-Co3+ state, which is stabilized by a strong crystal field, provides a natural explanation for the observed low magnetic ordering temperature and a spin-blockade phenomenon of the electron hopping. Furthermore, we find that the intermediate-spin state of Co3+ has a large multiplet splitting. But the lowest-lying IS state of Co3+ is still higher in energy than the LS ground state by a few hundred millielectron volts and the HS state of Co3+ is even less stable, both in sharp contrast to a recent experimental study which suggested the HS+IS mixed Co3+ ground state. We note that either the IS-Co3+ or HS-Co3+ states or their mixture would produce a wrong out-of-plane magnetic anisotropy and a much higher magnetic-ordering temperature than observed. Thus, the present work sheds light on this material concerning its electronic and magnetic structure, and it would stimulate different experiments to settle this intriguing spin-state issue.

preprint2010arXiv

Theory of activated-rate processes under shear with application to shear-induced aggregation of colloids

Using a novel approximation scheme within the convective diffusion (two body Smoluchowski) equation framework, we unveil the shear-driven aggregation mechanism at the origin of structure-formation in sheared colloidal systems. The theory, verified against numerics and experiments, explains the induction time followed by explosive (irreversible) rise of viscosity observed in charge-stabilized colloidal and protein systems under steady shear. The Arrhenius-type equation with shear derived here, extending Kramers theory in the presence of shear, is the first analytical result clearly showing the important role of shear-drive in activated-rate processes as they are encountered in soft condensed matter.

preprint2009arXiv

Elasticity of arrested short-ranged attractive colloids: homogeneous and heterogeneous glasses

We evaluate the elasticity of arrested short-ranged attractive colloids by combining an analytically solvable elastic model with a hierarchical arrest scheme into a new approach, which allows one to discriminate the microscopic (primary particle-level) from the mesoscopic (cluster-level) contribution to the macroscopic shear modulus. The results quantitatively predict experimental data in a wide range of volume fractions and indicate in which cases the relevant contribution is due to mesoscopic structures. On this basis we propose that different arrested states of short-ranged attractive colloids can be meaningfully distinguished as homogeneous or heterogeneous colloidal glasses in terms of the length-scale which controls their elastic behavior.

preprint2005arXiv

Electronic structure of $RE$AuMg and $RE$AgMg ($RE$ = Eu, Gd, Yb)

We have investigated the electronic structure of the equiatomic EuAuMg, GdAuMg, YbAuMg and GdAgMg intermetallics using x-ray photoelectron spectroscopy. The spectra revealed that the Yb and Eu are divalent while the Gd is trivalent. The spectral weight in the vicinity of the Fermi level is dominated by the mix of Mg $s$, Au/Ag $sp$ and $RE$ $spd$ bands, and not by the $RE$ $4f$. We also found that the Au and Ag $d$ bands are extraordinarily narrow, as if the noble metal atoms were impurities submerged in a low density $sp$ metal host. The experimental results were compared with band structure calculations, and we found good agreement provided that the spin-orbit interaction in the Au and Ag $d$ bands is included and correlation effects in an open $4f$ shell are accounted for using the local density approximation + Hubbard $U$ scheme. Nevertheless, limitations of such a mean-field scheme to explain excitation spectra are also evident.

preprint1994arXiv

Ballistic transport: A view from the quantum theory of motion

Ballistic transport of electrons through a quantum wire with a constriction is studied in terms of Bohm's interpretation of quantum mechanics, in which the concept of a particle orbit is permitted. The classical bouncing ball trajectories, which justify the name "ballistic transport", are established in the large wave number limit. The formation and the vital role of quantum vortices are investigated.

67 published item(s)

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization

Universal Information Extraction as Unified Semantic Matching

An Interpretability Evaluation Benchmark for Pre-trained Language Models

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention

Exploring Contextual Word-level Style Relevance for Unsupervised Style Transfer

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Long Time No See! Open-Domain Conversation with Long-Term Persona Memory

Magnetic frustration in the cubic double perovskite Ba2NiIrO6

PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

Spin-Orbital States and Strong Antiferromagnetism of Layered Eu$_2$SrFe$_2$O$_6$ and Sr$_3$Fe$_2$O$_4$Cl$_2$

Towards Boosting the Open-Domain Chatbot with Human Feedback

Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation

Unified Structure Generation for Universal Information Extraction

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Unique electronic state in ferromagnetic semiconductor FeCl$_{2}$ monolayer

Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

Learning to Select External Knowledge with Multi-Scale Negative Sampling

Syntactic and Semantic-driven Learning for Open Information Extraction

CoKE: Contextualized Knowledge Graph Embedding

Discovering Dialog Structure Graph for Open-Domain Dialog Generation

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Leveraging Graph to Improve Abstractive Multi-Document Summarization

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis

Towards Conversational Recommendation over Multi-Type Dialogs

VI3: a 2D Ising ferromagnet

Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation

Minimum Risk Training for Neural Machine Translation

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

Semi-Supervised Learning for Neural Machine Translation

Giant magnetic anisotropy of Co, Ru, and Os adatoms on MgO (001) surface

Black phosphorus field-effect transistors

Ferrimagnetism in the double perovskite Ca2FeOsO6: a density functional study

Long-range magnetic interaction and frustration in double perovskite Sr$_{2}$NiIrO$_{6}$

Rare case of magnetic Ag$^{3+}$ ion: double perovskite Cs$_{2}$KAgF$_{6}$

Geometric Phase Gates with Adiabatic Control in Electron Spin Resonance

Impact of spin-orbit coupling on the magnetism of Sr3MIrO6 (M = Ni, Co)

Charge order at the frontier between the molecular and solid states in Ba3NaRu2O9

Is N-doped SrO magnetic? A first-principles view

Local correlations, non-local screening, multiplets, and band formation in NiO

Metal-insulator transition in Sr2-xLaxCoO4 driven by spin-state transition

Ab initio study of the giant ferroelectric distortion and pressure induced spin-state transition in BiCoO3

Electron spin ensemble strongly coupled to a three-dimensional microwave cavity

Orbital order in La0.5Sr1.5MnO4: beyond a common local Jahn-Teller picture

Shear-driven solidification of dilute colloidal suspensions

Magnetism in C or N-doped MgO and ZnO: density-functional study of impurity pairs

Orbital occupation and magnetic moments of tetrahedrally coordinated iron in CaBaFe4O7

Shear-induced reaction-limited aggregation kinetics of Brownian particles at arbitrary concentrations

Storage of multiple coherent microwave excitations in an electron spin ensemble

The spin states of Co ions in La1.5Ca0.5CoO4 from first-principles

Theory of activated-rate processes under shear with application to shear-induced aggregation of colloids

Elasticity of arrested short-ranged attractive colloids: homogeneous and heterogeneous glasses

Electronic structure of $RE$AuMg and $RE$AgMg ($RE$ = Eu, Gd, Yb)

Ballistic transport: A view from the quantum theory of motion