Source author record

Haifeng Wang

Haifeng Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

58 works

33 topics

4 close collaborators

Preprint · 2026 · arXiv

Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR

Current role-playing agents (RPAs) are typically built by imitating surface-level behaviors, but this approach lacks internal cognitive consistency and often causes out-of-character errors in complex situations. To address this, we propose Character-R1, a framework designed to provide comprehensive verifiable reward signals for effective role-aware reasoning, which are missing in recent studies. Specifically, our framework comprises three core designs: (1) Cognitive Focus Reward, which enforces explicit label-based analysis of 10 character elements (e.g., worldview) to structure internal cognition; (2) Reference-Guided Reward, which uses overlap-based metrics against reference responses as optimization anchors to enhance exploration and performance; and (3) Character-Conditioned Reward Normalization, which adjusts reward distributions based on character categories to ensure robust optimization across heterogeneous roles. Extensive experiments demonstrate that Character-R1 significantly outperforms existing methods on knowledge, memory, and other dimensions.
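One way to picture Character-Conditioned Reward Normalization is to normalize rewards within each character category instead of across the whole batch. The sketch below assumes exactly that: a per-category z-score. The function name `normalize_by_category` and the category labels are hypothetical, and the abstract does not specify the actual normalization Character-R1 uses.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_category(rewards, categories, eps=1e-6):
    """Z-score each reward against rewards from the same character category.

    Hypothetical sketch: a per-category mean/std normalization is assumed here;
    it is not claimed to be the exact scheme used by Character-R1.
    """
    if len(rewards) != len(categories):
        raise ValueError("rewards and categories must have the same length")

    # Collect the rewards that belong to each character category.
    groups = defaultdict(list)
    for reward, category in zip(rewards, categories):
        groups[category].append(reward)

    # Per-category statistics (population std; a singleton group gets std 0).
    stats = {c: (mean(vals), pstdev(vals)) for c, vals in groups.items()}

    # Normalize each reward with the statistics of its own category.
    return [(r - stats[c][0]) / (stats[c][1] + eps) for r, c in zip(rewards, categories)]


rewards = [0.9, 0.7, 0.2, 0.4]
categories = ["historical", "historical", "fictional", "fictional"]
print(normalize_by_category(rewards, categories))  # approximately [1.0, -1.0, -1.0, 1.0]
```

Under this assumption, a category whose raw rewards are systematically higher does not dominate the optimization signal for other roles, because each reward is judged only relative to its own category.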

Preprint · 2026 · arXiv

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

Language model families differ strikingly in their capacity to benefit from reinforcement learning (RL): under identical training, models like Qwen achieve substantial gains, while others like Llama yield limited improvements. Complementing data-centric approaches, we reveal that this disparity reflects a hidden structural property: distributional clarity in probability space. Through a three-stage analysis (from phenomenon to mechanism to interpretation), we find that RL-friendly models exhibit intra-class compactness and inter-class separation in the probabilities they assign to correct vs. incorrect responses. We quantify this clarity using the Silhouette Coefficient ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance, and (2) low $S$ is associated with severe logic errors and reasoning instability. To confirm this property, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments across six mathematical benchmarks show consistent improvements across all model families, with gains of up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-friendliness.
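To build intuition for the Silhouette Coefficient $S$, the sketch below scores a set of scalar probabilities split into correct and incorrect responses. It assumes that each response is summarized by a single number (for example, a sequence-level probability) and that distance is absolute difference. The abstract does not state the paper's exact featurization or distance, and `silhouette_two_groups` is a hypothetical helper. For multi-dimensional features, scikit-learn's `sklearn.metrics.silhouette_score` computes the same mean coefficient.

```python
from statistics import mean


def silhouette_two_groups(correct, incorrect):
    """Mean silhouette coefficient S for 1-D scores split into two labelled groups.

    For each point: a = mean distance to its own group, b = mean distance to the
    other group, s = (b - a) / max(a, b). Singleton groups get s = 0 by convention.
    """
    groups = [list(correct), list(incorrect)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("both groups need at least one score")
    scores = []
    for gi, group in enumerate(groups):
        other = groups[1 - gi]
        for i, x in enumerate(group):
            if len(group) == 1:
                scores.append(0.0)
                continue
            a = mean(abs(x - y) for j, y in enumerate(group) if j != i)
            b = mean(abs(x - y) for y in other)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Well-separated probabilities give S near 1; overlapping ones give S near or below 0.
print(silhouette_two_groups([0.92, 0.88, 0.95], [0.10, 0.15, 0.05]))
print(silhouette_two_groups([0.5, 0.6], [0.55, 0.45]))
```

In this toy setting, a model that assigns compact, well-separated probabilities to correct vs. incorrect answers has high $S$, which matches the "intra-class compactness and inter-class separation" described above.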

Preprint · 2026 · arXiv

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

While Graphical User Interface (GUI) agents have shown promising performance in automated device interaction, they depend primarily on static parametric knowledge from pre-training or instruction tuning. This reliance fundamentally limits their ability to handle long-tailed tasks that require explicit procedural knowledge absent from model parameters, often forcing agents into inefficient and brittle trial-and-error exploration. To mitigate this limitation, we introduce Proactive Document-Guided Action for GUI agents in dynamic, open-web environments, a novel paradigm that mirrors human problem-solving by enabling agents to autonomously search for relevant documentation to resolve long-tailed tasks. To evaluate agents' capability in this paradigm, we propose DocOS, a benchmark designed to assess document-guided problem solving in fully interactive environments. DocOS requires agents to autonomously navigate a web browser, locate relevant online documentation, comprehend procedural instructions, and faithfully ground them into executable GUI actions. Extensive experiments reveal that progress is strictly constrained by two bottlenecks: agents struggle to reliably locate relevant information during proactive search, and they frequently fail to ground retrieved instructions faithfully into precise actions. These findings point toward document-guided interaction as a crucial pathway for enabling self-evolving GUI agents in dynamic environments.

Preprint · 2026 · arXiv

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

Extending the input modality of Large Language Models (LLMs) to the audio domain is essential for achieving comprehensive multimodal perception. However, acoustic information is well known to be intrinsically heterogeneous, entangling attributes such as speech, music, and environmental context. Existing research is limited to a dense, parameter-shared adapter for modeling these diverse patterns, which induces gradient conflict during optimization because the parameter updates required for distinct attributes contradict each other. To address this limitation, we introduce the MoE-Adapter, a sparse Mixture-of-Experts (MoE) architecture designed to decouple acoustic information. Specifically, it employs a dynamic gating mechanism that routes audio tokens to specialized experts capturing complementary feature subspaces, while retaining shared experts for global context, thereby mitigating gradient conflicts and enabling fine-grained feature learning. Comprehensive experiments show that the MoE-Adapter achieves superior performance on both audio semantic and paralinguistic tasks, consistently outperforming dense linear baselines at comparable computational cost. Furthermore, we will release the related code and models to facilitate future research.
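To make the routing idea concrete, here is a toy, framework-free sketch of top-k gating with an always-on shared expert. The router weights, the top-k softmax gate, the value k=2, and the expert functions are all illustrative assumptions, not the MoE-Adapter's published configuration. A real adapter would use learned linear experts over audio-token embeddings in a tensor library.

```python
import math


def top_k_gates(logits, k):
    """Pick the k largest router logits and softmax-normalize only those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_adapter(x, router, experts, shared_expert, k=2):
    """Route one token vector x: shared expert output plus gated routed experts."""
    # Router logits: one dot product per routed expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = list(shared_expert(x))  # the shared expert always contributes (global context)
    for idx, gate in top_k_gates(logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy experts that just rescale the input (stand-ins for learned linear layers).
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0)]


def shared_identity(v):
    return list(v)


router = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
print(moe_adapter([1.0, 0.0], router, experts, shared_identity))  # approximately [2.538, 0.0]
```

Only experts 0 and 2 receive non-zero gates here, so the sparse update touches a subset of experts per token. This is the property the abstract relies on to reduce interference between attributes.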

Preprint · 2026 · arXiv

SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization

Pretraining large language models (LLMs) with next-token prediction has led to remarkable advances, yet the context-dependent nature of token embeddings in such models results in high intra-class variance and inter-class similarity, hindering the efficiency of representation learning. While similarity-based regularization has demonstrated benefits in supervised fine-tuning and classification tasks, its application and efficacy in large-scale LLM pretraining remain underexplored. In this work, we propose SimReg, an embedding similarity regularization loss that explicitly encourages token representations with the same ground-truth label within each sequence to be more similar, while enforcing separation from different-label tokens via a contrastive loss. Our analysis reveals that this mechanism yields gains by enlarging multi-class classification margins, thereby enabling more efficient classification. Extensive experiments across dense and Mixture-of-Experts (MoE) architectures demonstrate that SimReg consistently accelerates training convergence by over 30% and improves average zero-shot downstream performance by over 1% across standard benchmarks. Further ablation studies and analyses offer practical insights into hyperparameter tuning and loss effectiveness.
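As described, SimReg pulls together token representations that share a ground-truth (next-token) label within a sequence and pushes apart those with different labels. Below is a minimal sketch in the style of a supervised contrastive loss. The cosine similarity, the temperature `tau`, and the averaging over anchors are assumptions chosen for illustration. The paper's exact loss, and how it is weighted against the next-token loss, may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def same_label_contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised-contrastive-style loss over the tokens of one sequence.

    Tokens sharing a ground-truth label are positives for each other; every other
    token in the sequence appears in the denominator. Anchors without positives are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: cosine(embeddings[i], embeddings[j]) / tau for j in range(n) if j != i}
        m = max(logits.values())  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = [7, 7, 3, 3]  # hypothetical next-token ids for four positions
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(same_label_contrastive_loss(clustered, labels), same_label_contrastive_loss(mixed, labels))
```

The clustered embeddings get a much lower loss than the mixed ones. This is the intended effect: same-label tokens sit close together, and different-label tokens sit apart.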

Preprint · 2026 · arXiv

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale. In this work, we introduce VideoAR, the first large-scale Visual Autoregressive (VAR) framework for video generation that combines multi-scale next-frame prediction with autoregressive modeling. VideoAR disentangles spatial and temporal dependencies by integrating intra-frame VAR modeling with causal next-frame prediction, supported by a 3D multi-scale tokenizer that efficiently encodes spatio-temporal dynamics. To improve long-term consistency, we propose Multi-scale Temporal RoPE, Cross-Frame Error Correction, and Random Frame Mask, which collectively mitigate error propagation and stabilize temporal coherence. Our multi-stage pretraining pipeline progressively aligns spatial and temporal learning across increasing resolutions and durations. Empirically, VideoAR achieves new state-of-the-art results among autoregressive models, improving FVD on UCF-101 from 99.5 to 88.6 while reducing inference steps by over 10×, and reaching a VBench score of 81.74, competitive with diffusion-based models an order of magnitude larger. These results demonstrate that VideoAR narrows the performance gap between autoregressive and diffusion paradigms, offering a scalable, efficient, and temporally consistent foundation for future video generation research.

Preprint · 2023 · arXiv

Revisiting mass estimates of the Milky Way

We use the rotation curve from Gaia data release (DR) 3 to estimate the mass of the Milky Way. We adopt an Einasto density profile to model the dark matter component. By extrapolation, we obtain a dynamical mass of $M=2.75^{+3.11}_{-0.48}\times 10^{11} M_\odot$ at $112$ kpc. This lower-mass Milky Way is consistent with the significantly declining rotation curve and can provide new insights into our Galaxy and its halo inhabitants.
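For reference, the Einasto profile is conventionally written as below. Here $r_{-2}$ is the radius at which the logarithmic density slope equals $-2$, $\rho_{-2}$ is the density at that radius, and $\alpha$ controls the curvature of the profile. The enclosed mass and circular velocity follow under the assumption of spherical symmetry. The abstract does not say how the paper combines this halo term with baryonic components when fitting the Gaia DR3 rotation curve.

```latex
\rho(r) = \rho_{-2}\,\exp\!\left\{-\frac{2}{\alpha}\left[\left(\frac{r}{r_{-2}}\right)^{\alpha} - 1\right]\right\}

M(<r) = 4\pi \int_{0}^{r} \rho(r')\, r'^{2}\, dr'

v_c^{2}(r) = \frac{G\, M(<r)}{r}
```

Because $v_c^2 \propto M(<r)/r$, a rotation curve that declines with radius implies that enclosed mass grows more slowly than $r$. This is why a declining curve points toward a lower total mass.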

Preprint · 2022 · arXiv

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut, a simple regularization method that enforces consistency between the output distributions of the original and the cutoff sentence pairs. Without leveraging extra datasets via back-translation or integrating large-scale pretrained models, Bi-SimCut achieves strong translation performance across five translation benchmarks (data sizes ranging from 160K to 20.2M): BLEU scores of 31.16 for en→de and 38.37 for de→en on the IWSLT14 dataset, 30.78 for en→de and 35.15 for de→en on the WMT14 dataset, and 27.17 for zh→en on the WMT17 dataset. SimCut is not a new method, but a version of Cutoff (Shen et al., 2020) simplified and adapted for NMT, and it can be considered a perturbation-based method. Given the universality and simplicity of SimCut and Bi-SimCut, we believe they can serve as strong baselines for future NMT research.
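SimCut's consistency term compares the model's output distribution on an original sentence pair with its distribution on the cutoff version. Below is a minimal sketch of such a term, using a symmetric KL divergence between two discrete token distributions. The symmetric form and the weight `alpha` are illustrative assumptions, since the abstract gives neither the exact divergence direction nor the weighting. The cutoff perturbation itself is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(p_original, p_cutoff, alpha=1.0):
    """Symmetric KL between output distributions, scaled by a hypothetical weight alpha."""
    return alpha * 0.5 * (kl_divergence(p_original, p_cutoff) + kl_divergence(p_cutoff, p_original))


p_orig = [0.7, 0.2, 0.1]  # e.g. next-token distribution on the original pair
p_cut = [0.5, 0.3, 0.2]   # distribution on the cutoff (perturbed) pair
print(consistency_loss(p_orig, p_cut))
```

During training, a term like this would be added to the usual cross-entropy loss. The model is then penalized whenever the cutoff perturbation changes its predictions.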

Preprint · 2022 · arXiv

Broadening and redward asymmetry of H$\alpha$ line profiles observed by LAMOST during a stellar flare on an M-type star

Stellar flares are characterized by a sudden enhancement of electromagnetic radiation in stellar atmospheres. So far, much of our understanding of stellar flares comes from photometric observations, which cannot detect plasma motions in flare regions. From the spectroscopic data of LAMOST DR7, we have found a stellar flare on an M4-type star characterized by an impulsive increase followed by a gradual decrease in H$\alpha$ line intensity, and the total energy radiated through H$\alpha$ is estimated to be on the order of $10^{33}$ erg. The H$\alpha$ line appears to have a Voigt profile during the flare, which is likely caused by Stark pressure broadening due to the dramatic increase in electron density and/or opacity broadening due to strong non-thermal heating. Obvious enhancement has been identified in the red wing of the H$\alpha$ line profile after the impulsive increase of the H$\alpha$ line intensity. The red wing enhancement corresponds to plasma moving away from the Earth at a velocity of 100$-$200 km s$^{-1}$. According to current knowledge of solar flares, this red wing enhancement may originate from (1) flare-driven coronal rain, (2) chromospheric condensation, or (3) a filament/prominence eruption with either a non-radial backward propagation or strong magnetic suppression. The total mass of the moving plasma is estimated to be on the order of $10^{15}$ kg.
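To connect the reported 100$-$200 km s$^{-1}$ velocities with the observed line profile, the non-relativistic Doppler relation $v = c\,\Delta\lambda/\lambda_0$ converts a red-wing shift into a line-of-sight velocity. The sketch below assumes a rest wavelength of about 6562.8 Angstrom for H$\alpha$ (air). The helper names are hypothetical.

```python
C_KM_S = 299_792.458            # speed of light in km/s
H_ALPHA_REST_ANGSTROM = 6562.8  # approximate air rest wavelength of H-alpha


def doppler_velocity(observed_angstrom, rest_angstrom=H_ALPHA_REST_ANGSTROM):
    """Non-relativistic line-of-sight velocity in km/s; positive means receding (redshift)."""
    return C_KM_S * (observed_angstrom - rest_angstrom) / rest_angstrom


def wavelength_shift(velocity_km_s, rest_angstrom=H_ALPHA_REST_ANGSTROM):
    """Wavelength offset in Angstrom produced by a given line-of-sight velocity."""
    return rest_angstrom * velocity_km_s / C_KM_S


for v in (100, 200):
    print(f"{v} km/s -> {wavelength_shift(v):.2f} Angstrom")
# 100 km/s -> 2.19 Angstrom
# 200 km/s -> 4.38 Angstrom
```

So the reported velocity range corresponds to red-wing excess roughly 2 to 4.5 Angstrom from line center. Resolving a shift of this size requires spectroscopy rather than photometry.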

Preprint · 2022 · arXiv

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

Pre-trained language models (PLMs), such as BERT and GPT, have revolutionized NLP, not only in the general domain but also in the biomedical domain. Most prior efforts to build biomedical PLMs have relied simply on domain adaptation and focused mainly on English. In this work, we introduce eHealth, a Chinese biomedical PLM built from scratch with a new pre-training framework. This framework pre-trains eHealth as a discriminator through both token- and sequence-level discrimination. The former detects input tokens corrupted by a generator and recovers their original identities from plausible candidates, while the latter further distinguishes corruptions of the same original sequence from those of others. As such, eHealth can learn language semantics at both the token and sequence levels. Extensive experiments on 11 Chinese biomedical language understanding tasks of various forms verify the effectiveness and superiority of our approach. We release the pre-trained model at https://github.com/PaddlePaddle/Research/tree/master/KG/eHealth and will also release the code later.

Preprint · 2022 · arXiv

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Effective molecular representation learning is of great importance for molecular property prediction, a fundamental task for the drug and materials industries. Recent advances in graph neural networks (GNNs) have shown great promise for molecular representation learning. Moreover, several recent studies have demonstrated successful applications of self-supervised learning to pre-train GNNs and overcome the scarcity of labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data without fully utilizing molecular geometry information, even though the three-dimensional (3D) spatial structure of a molecule, a.k.a. molecular geometry, is one of the most critical factors determining its physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule. Specifically, we construct two graphs for each molecule: the first encodes atom-bond relations, and the second encodes bond-angle relations. On top of this GNN architecture, we propose several novel geometry-level self-supervised learning strategies that learn spatial knowledge from local and global molecular 3D structures. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM significantly outperforms all baselines on both regression and classification tasks. For example, the experimental results show an overall improvement of 8.8% on average over SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.
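The two-graph construction can be illustrated with a small sketch. The atom-bond graph is the usual molecular graph. The bond-angle graph treats each bond as a node and links two bonds whenever they share an atom, so each link corresponds to one bond angle, whose value comes from 3D coordinates. The function names, the (bond, bond, shared atom) edge format, and the toy coordinates are hypothetical choices for illustration, not the ChemRL-GEM code.

```python
import math
from itertools import combinations


def bond_angle_edges(bonds):
    """Edges of the bond-angle graph: (bond_i, bond_j, shared_atom) for bonds sharing an atom."""
    incident = {}
    for b, (u, v) in enumerate(bonds):
        incident.setdefault(u, []).append(b)
        incident.setdefault(v, []).append(b)
    edges = set()
    for atom, bond_ids in incident.items():
        for b1, b2 in combinations(sorted(bond_ids), 2):
            edges.add((b1, b2, atom))
    return sorted(edges)


def bond_angle_degrees(coords, center, a, b):
    """Angle a-center-b in degrees, computed from 3D coordinates."""
    u = [coords[a][k] - coords[center][k] for k in range(3)]
    v = [coords[b][k] - coords[center][k] for k in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


bonds = [(1, 0), (1, 2)]  # two bonds sharing atom 1 (a toy three-atom molecule)
coords = [(0.96, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # toy coordinates
for b1, b2, atom in bond_angle_edges(bonds):
    a = bonds[b1][0] if bonds[b1][1] == atom else bonds[b1][1]
    c = bonds[b2][0] if bonds[b2][1] == atom else bonds[b2][1]
    print(b1, b2, atom, round(bond_angle_degrees(coords, atom, a, c), 1))
```

A GNN over the second graph can then carry the angle as an edge feature, which is how geometry enters a model that would otherwise see only topology.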

Preprint · 2022 · arXiv

DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps

Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we focus on modeling traffic congestion propagation patterns to improve ETA performance. Modeling these patterns is challenging because it requires accounting for the regions impacted over time and the cumulative effect of delay variations caused by traffic events on the road network. We present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns, and we develop a route-aware graph transformer to directly learn the long-distance correlations of road segments. This design enables DuETA to capture interactions between road segment pairs that are spatially distant but highly correlated in traffic conditions. Extensive experiments are conducted on large-scale, real-world datasets collected from Baidu Maps. Experimental results show that ETA prediction can significantly benefit from the learned traffic congestion propagation patterns. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.

Preprint · 2022 · arXiv

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

In this paper, we study the robustness evaluation of Chinese question matching. Most previous work on analyzing robustness focuses on just one or a few types of artificial adversarial examples. Instead, we argue that it is necessary to comprehensively evaluate the linguistic capabilities of models on natural texts. For this purpose, we create DuQM, a Chinese dataset of natural questions with linguistic perturbations for evaluating the robustness of question matching models. DuQM contains 3 categories and 13 subcategories with 32 linguistic perturbations. Extensive experiments demonstrate that DuQM is better at distinguishing different models. Importantly, the detailed breakdown of evaluation by linguistic phenomenon in DuQM helps diagnose the strengths and weaknesses of different models. Additionally, our experimental results show that the effect of artificial adversarial examples does not carry over to natural texts.

Preprint · 2022 · arXiv

ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps

Pre-trained models (PTMs) have become a fundamental backbone for downstream tasks in natural language processing and computer vision. Despite initial gains that were obtained by applying generic PTMs to geo-related tasks at Baidu Maps, a clear performance plateau over time was observed. One of the main reasons for this plateau is the lack of readily available geographic knowledge in generic PTMs. To address this problem, in this paper, we present ERNIE-GeoL, which is a geography-and-language pre-trained model designed and developed for improving the geo-related tasks at Baidu Maps. ERNIE-GeoL is elaborately designed to learn a universal representation of geography-language by pre-training on large-scale data generated from a heterogeneous graph that contains abundant geographic knowledge. Extensive quantitative and qualitative experiments conducted on large-scale real-world datasets demonstrate the superiority and effectiveness of ERNIE-GeoL. ERNIE-GeoL has already been deployed in production at Baidu Maps since April 2021, which significantly benefits the performance of various downstream tasks. This demonstrates that ERNIE-GeoL can serve as a fundamental backbone for a wide range of geo-related tasks.

Preprint · 2022 · arXiv

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval

Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on open-domain question answering (QA). Their effectiveness can reach new state-of-the-art levels by incorporating cross-architecture knowledge distillation. However, most existing studies directly apply conventional distillation methods, failing to consider the particular situation where the teacher and student have different structures. In this paper, we propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) into a vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve performance with a cross-encoder teacher. Extensive experiments validate that our proposed solution outperforms strong baselines and establishes a new state of the art on open-domain QA benchmarks.

Preprint · 2022 · arXiv

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention

Sparse Transformers have recently attracted much attention because of their ability to reduce the quadratic dependency on sequence length. We argue that two factors, information bottleneck sensitivity and inconsistency between different attention topologies, can affect the performance of Sparse Transformers. This paper proposes ERNIE-Sparse, which consists of two distinctive parts: (i) a Hierarchical Sparse Transformer (HST) that sequentially unifies local and global information, and (ii) Self-Attention Regularization (SAR), a novel regularization designed to minimize the distance between transformers with different attention topologies. To evaluate the effectiveness of ERNIE-Sparse, we perform extensive evaluations. First, we run experiments on Long Range Arena (LRA), a multi-modal long-sequence modeling benchmark. Experimental results demonstrate that ERNIE-Sparse significantly outperforms a variety of strong baselines, including dense attention and other efficient sparse attention methods, with an improvement of 2.77% (57.78% vs. 55.01%). Second, to further show the effectiveness of our method, we pretrain ERNIE-Sparse and evaluate it on 3 text classification and 2 QA downstream tasks, achieving improvements of 0.83% on the classification benchmark (92.46% vs. 91.63%) and 3.24% on the QA benchmark (74.67% vs. 71.43%). These results further demonstrate its superior performance.

Preprint · 2022 · arXiv

Evolutionary Game-Theoretical Analysis for General Multiplayer Asymmetric Games

Evolutionary game theory has been a successful tool for combining classical game theory with learning-dynamical descriptions of multiagent systems. Assuming symmetric structures among interacting players, many studies have used a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even the state-of-the-art method has two limitations. First, analysing the simplified payoff table introduces inaccuracy. Second, no existing work can handle 2-population multiplayer asymmetric games. In this paper, we bridge the gap between the heuristic payoff table and dynamic analysis without any inaccuracy. In addition, we propose a general framework for $m$ versus $n$ 2-population multiplayer asymmetric games. We then compare our method with the state of the art on several classic games. Finally, to illustrate our method, we perform empirical game-theoretical analysis on Wolfpack and StarCraft II, both of which involve complex multiagent interactions.
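As background for the dynamic analysis, the classical two-population replicator dynamics for a two-player asymmetric (bimatrix) game are $\dot x_i = x_i[(Ay)_i - x^\top A y]$ and $\dot y_j = y_j[(x^\top B)_j - x^\top B y]$, where $A$ and $B$ are the two populations' payoff matrices. The sketch below integrates these equations with a simple Euler step. It covers only this two-player special case, not the paper's general $m$ versus $n$ multiplayer framework or its heuristic-payoff-table method, and the payoff values are made up.

```python
def replicator_step(x, y, A, B, dt):
    """One Euler step of two-population replicator dynamics.

    A[i][j]: payoff to population 1 playing i against population 2 playing j.
    B[i][j]: payoff to population 2 playing j against population 1 playing i.
    """
    fx = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]  # (A y)_i
    fy = [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))]  # (x^T B)_j
    avg_x = sum(xi * f for xi, f in zip(x, fx))  # x^T A y
    avg_y = sum(yj * g for yj, g in zip(y, fy))  # x^T B y
    x_next = [xi + dt * xi * (f - avg_x) for xi, f in zip(x, fx)]
    y_next = [yj + dt * yj * (g - avg_y) for yj, g in zip(y, fy)]
    return x_next, y_next


def simulate(x, y, A, B, dt=0.01, steps=2000):
    """Integrate the dynamics for steps * dt time units."""
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
    return x, y


# Made-up payoffs in which strategy 0 strictly dominates for both populations.
A = [[2.0, 4.0], [1.0, 3.0]]
B = [[3.0, 1.0], [4.0, 2.0]]
x_final, y_final = simulate([0.5, 0.5], [0.5, 0.5], A, B)
print(x_final, y_final)  # both mixtures move toward strategy 0
```

Each population evolves against the other's current mixture rather than against itself. This is what distinguishes the asymmetric case from the single-population, symmetric setting mentioned in the abstract.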

Preprint · 2022 · arXiv

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, these ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint-extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new, customised ADMET endpoints, meeting the varied requirements of drug research and development.

Preprint · 2022 · arXiv

K-UNN: k-Space Interpolation With Untrained Neural Network

Recently, untrained neural networks (UNNs) have shown satisfactory performances for MR image reconstruction on random sampling trajectories without using additional full-sampled training data. However, the existing UNN-based approach does not fully use the MR image physical priors, resulting in poor performance in some common scenarios (e.g., partial Fourier, regular sampling, etc.) and the lack of theoretical guarantees for reconstruction accuracy. To bridge this gap, we propose a safeguarded k-space interpolation method for MRI using a specially designed UNN with a tripled architecture driven by three physical priors of the MR images (or k-space data), including sparsity, coil sensitivity smoothness, and phase smoothness. We also prove that the proposed method guarantees tight bounds for interpolated k-space data accuracy. Finally, ablation experiments show that the proposed method can more accurately characterize the physical priors of MR images than existing traditional methods. Additionally, under a series of commonly used sampling trajectories, experiments also show that the proposed method consistently outperforms traditional parallel imaging methods and existing UNNs, and even outperforms the state-of-the-art supervised-trained k-space deep learning methods in some cases.

Preprint · 2022 · arXiv

Long Time No See! Open-Domain Conversation with Long-Term Persona Memory

Most open-domain dialogue models tend to perform poorly in long-term human-bot conversations. A possible reason is that they lack the capability to understand and memorize long-term dialogue history. To address this issue, we present a novel task of Long-term Memory Conversation (LeMon), and then build a new dialogue dataset, DuLeMon, and a dialogue generation framework with a Long-Term Memory (LTM) mechanism (called PLATO-LTM). This LTM mechanism enables our system to accurately extract and continuously update long-term persona memory without requiring multiple-session dialogue datasets for model training. To our knowledge, this is the first attempt to conduct real-time dynamic management of persona information of both parties, the user and the bot. Results on DuLeMon indicate that PLATO-LTM significantly outperforms baselines in long-term dialogue consistency, leading to more engaging dialogue.

Preprint · 2022 · arXiv

Multi-Weight Respecification of Scan-specific Learning for Parallel Imaging

Parallel imaging is widely used in magnetic resonance imaging as an acceleration technology. Traditional linear reconstruction methods in parallel imaging often suffer from noise amplification. Recently, a non-linear robust artificial-neural-network for k-space interpolation (RAKI) has exhibited superior noise resilience over linear methods. However, RAKI performs poorly at high acceleration rates and needs a large number of autocalibration signals as training samples. To tackle these issues, we propose a multi-weight method, named MW-RAKI, that applies multiple weighting matrices to the undersampled data. Enforcing multiple weighting matrices on the measurements can effectively reduce the influence of noise and increase the data constraints. Furthermore, we incorporate the strategy of multiple weighting matrices into a residual version of RAKI to form MW-rRAKI. Experimental comparisons with alternative methods demonstrated noticeably better reconstruction performance, particularly at high acceleration rates.

Preprint · 2022 · arXiv

Rice Diseases Detection and Classification Using Attention Based Neural Network and Bayesian Optimization

In this research, an attention-based depthwise separable neural network with Bayesian optimization (ADSNN-BO) is proposed to detect and classify rice diseases from rice leaf images. Rice diseases frequently cause 20 to 40% crop production loss and are closely tied to the global economy. Rapid disease identification is critical for planning treatment promptly and reducing crop losses, yet rice disease diagnosis is still mainly performed manually. To achieve AI-assisted rapid and accurate disease detection, we propose the ADSNN-BO model, based on the MobileNet structure and an augmented attention mechanism. Moreover, a Bayesian optimization method is applied to tune the model's hyper-parameters. Cross-validated classification experiments are conducted on a public rice disease dataset with four categories in total. The experimental results demonstrate that our mobile-compatible ADSNN-BO model achieves a test accuracy of 94.65%, outperforming all of the state-of-the-art models tested. To check the interpretability of the proposed model, feature analyses, including activation maps and filter visualization, are also conducted. Results show that the proposed attention-based mechanism can more effectively guide the ADSNN-BO model to learn informative features. The outcome of this research will promote the implementation of artificial intelligence for fast plant disease diagnosis and control in the agricultural field.

Preprint · 2022 · arXiv

Searching for multiple populations in star clusters using the China Space Station Telescope

Multiple stellar populations (MPs) in most star clusters older than 2 Gyr, as revealed by many spectroscopic and photometric studies, pose a significant challenge to the traditional view of star formation. In this field, space-based instruments, in particular the Hubble Space Telescope (HST), have made a breakthrough, as they significantly improved the efficiency of detecting MPs in crowded stellar fields through imaging. The China Space Station Telescope (CSST) is sensitive to a wavelength interval similar to that of HST, but it covers a field of view about 5-8 times wider. One of its instruments, the Multi-Channel Imager (MCI), will have multiple filters covering a wide wavelength range from NUV to NIR, making the CSST a potentially powerful tool for studying MPs in clusters. In this work, we evaluate the efficiency of the designed MCI/CSST filters in revealing MPs in different color-magnitude diagrams (CMDs). We find that CMDs made with MCI/CSST photometry in appropriate UV filters are powerful tools for disentangling stellar populations with different abundances of He, C, N, O, and Mg. In contrast, traditional CMDs are blind to multiple populations in globular clusters (GCs). We show that CSST has the potential to be the spearhead instrument for investigating MPs in GCs in the coming decades.

preprint2022arXiv

Self-Score: Self-Supervised Learning on Score-Based Models for MRI Reconstruction

Recently, score-based diffusion models have shown satisfactory performance in MRI reconstruction. Most of these methods require a large amount of fully sampled MRI data as a training set, which is sometimes difficult to acquire in practice. This paper proposes a fully-sampled-data-free score-based diffusion model for MRI reconstruction, which learns the fully sampled MR image prior in a self-supervised manner from undersampled data. Specifically, we first infer the fully sampled MR image distribution from the undersampled data by Bayesian deep learning, then perturb the data distribution and approximate its probability density gradient by training a score function. Using the learned score function as a prior, we reconstruct the MR image by performing conditioned Langevin Markov chain Monte Carlo (MCMC) sampling. Experiments on a public dataset show that the proposed method outperforms existing self-supervised MRI reconstruction methods and achieves performance comparable to conventional score-based diffusion methods trained on fully sampled data.
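The Langevin sampling step can be illustrated in one dimension under heavy simplification. In this sketch, the exact score of a hypothetical Gaussian target stands in for the learned score network. The function `gaussian_score` uses a mean of 2 and a standard deviation of 0.5, chosen only for illustration. The data-consistency conditioning on undersampled measurements used for MR reconstruction is omitted. The function names, step size, and step count are assumptions, not the authors' implementation.

```python
import math
import random

def gaussian_score(x, mu=2.0, sigma=0.5):
    """Exact score d/dx log p(x) of N(mu, sigma^2); a stand-in for a learned score network."""
    return -(x - mu) / sigma ** 2

def langevin_sample(score, x0=0.0, step=2e-3, n_steps=3000, rng=None):
    """Unadjusted Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z."""
    if rng is None:
        rng = random.Random(0)
    x = x0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal noise
        x = x + 0.5 * step * score(x) + math.sqrt(step) * z
    return x

# Draw 200 independent chains from a shared, seeded generator.
rng = random.Random(42)
samples = [langevin_sample(gaussian_score, rng=rng) for _ in range(200)]
mean = sum(samples) / len(samples)
```

With a small step size, the chain's stationary distribution approximates the target, so the sample mean should land near 2. In the reconstruction setting, the score would instead come from the trained network, and each step would also enforce agreement with the measured data.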

preprint2022arXiv

Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Natural language processing (NLP) has achieved excellent performance in many areas, including semantic understanding, automatic summarization, image recognition and so on. However, most neural network models for NLP extract text features in a fine-grained way, which makes it hard to grasp the meaning of the text from a global perspective. To alleviate this problem, this paper combines traditional statistical methods with a deep learning model and proposes a novel model based on multi-model nonlinear fusion. The model measures sentence similarity separately with the part-of-speech-based Jaccard coefficient, Term Frequency-Inverse Document Frequency (TF-IDF), and a word2vec-CNN algorithm. Normalized weight coefficients are derived from the accuracy of each model, and the results of the individual models are compared. The weighted vector is then fed into a fully connected neural network to produce the final classification. Because the statistical sentence similarity algorithms reduce the granularity of feature extraction, they can capture sentence features globally. Experimental results show that the sentence similarity method based on multi-model nonlinear fusion achieves a matching rate of 84% and an F1 value of 75%.
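The sketch below illustrates two of the statistical similarity measures and accuracy-based weight normalization. It rests on several assumptions made for illustration:

- Input is whitespace-tokenized.
- The part-of-speech filtering applied to the Jaccard coefficient is skipped.
- The word2vec-CNN score is a caller-supplied placeholder number.
- The "normalized weight coefficient" is read as each model's accuracy divided by the sum of all accuracies.

The function names and example accuracies are hypothetical, and the final fully connected network is not shown.

```python
import math
from collections import Counter

def jaccard(tokens_a, tokens_b):
    """Jaccard coefficient: size of the token-set intersection over size of the union."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def tfidf_cosine(tokens_a, tokens_b, corpus):
    """Cosine similarity of TF-IDF vectors; corpus is a list of token lists."""
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((1 + n_docs) / (1 + df)) + 1.0  # one common smoothed IDF variant

    def vectorize(tokens):
        if not tokens:
            return {}
        counts = Counter(tokens)
        return {t: (c / len(tokens)) * idf(t) for t, c in counts.items()}

    va, vb = vectorize(tokens_a), vectorize(tokens_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def accuracy_weights(accuracies):
    """Normalize per-model accuracies so the weights sum to 1 (assumed scheme)."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]

def fused_features(scores, weights):
    """Weighted score vector that would be fed to the fully connected classifier."""
    return [s * w for s, w in zip(scores, weights)]

corpus = [s.split() for s in ["the cat sat", "a dog ran", "the dog sat"]]
s1, s2 = "the cat sat".split(), "the dog sat".split()
w2v_cnn_score = 0.8  # hypothetical placeholder for the word2vec-CNN similarity
scores = [jaccard(s1, s2), tfidf_cosine(s1, s2, corpus), w2v_cnn_score]
weights = accuracy_weights([0.70, 0.75, 0.80])  # hypothetical per-model accuracies
features = fused_features(scores, weights)
```

Keeping each measure as a separate score lets the learned network weigh coarse statistical overlap against the neural estimate, rather than fixing a single similarity formula.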

preprint2022arXiv

Towards Boosting the Open-Domain Chatbot with Human Feedback

Many open-domain dialogue models pre-trained on social media comments can generate coherent replies but have difficulty producing engaging responses when interacting with real users. This phenomenon may result mainly from the scarcity of annotated human-human conversations and misalignment with human preferences. In this paper, we propose a novel and efficient approach, Diamante, to boost the open-domain chatbot, in which two kinds of human feedback (explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend model-generated candidate responses, Diamante efficiently collects human-demonstrated responses and constructs a Chinese chit-chat dataset. To enhance alignment with human preferences, Diamante leverages the implicit preference in the data collection process and introduces generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and joint training paradigm can significantly boost the performance of Chinese pre-trained dialogue models.

preprint2022arXiv

Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation

Emotional support is a crucial skill in many real-world scenarios, including caring for the elderly, mental health support, and customer service chats. This paper presents a novel task of empathetic dialog generation with positive emotion elicitation, which aims to promote users' positive emotions in a way similar to emotional support between humans. In this task, the agent responds empathetically while aiming to elicit the user's positive emotions over a multi-turn dialog. To facilitate the study of this task, we collect a large-scale emotional dialog dataset with positive emotion elicitation, called PosEmoDial (about 820k dialogs, 3M utterances). In these dialogs, the agent tries to guide the user from any initial emotional state, e.g., sadness, to a positive emotional state. We then present a positive-emotion-guided dialog generation model with a novel loss function. This loss function encourages the dialog model not only to elicit positive emotions from users but also to ensure smooth emotional transitions throughout the whole dialog. Finally, we establish benchmark results on PosEmoDial, and we will release this dataset and related source code to facilitate future studies.

preprint2022arXiv

Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data

The contained outbreak of COVID-19 in Mainland China has recently been regarded as a successful example of fighting this highly contagious virus. Both the short transmission period (about three months) and the sub-exponential increase in confirmed cases in Mainland China show that the Chinese authorities took effective epidemic prevention measures, such as case isolation, travel restrictions, closing recreational venues, and banning public gatherings. These measures can, of course, effectively control the spread of the COVID-19 pandemic. Meanwhile, they may dramatically change human mobility patterns, such as the public's daily transportation-related behaviors. To better understand the impact of COVID-19 on transportation-related behaviors and to provide more targeted anti-epidemic measures, we use a large volume of human mobility data collected from Baidu Maps, a widely used Web mapping service in China, to examine in detail how people there reacted during the pandemic. Specifically, we conduct a data-driven analysis of transportation-related behaviors during the pandemic from the perspectives of 1) means of transportation, 2) types of visited venues, 3) check-in times at venues, 4) preference for "origin-destination" distance, and 5) "origin-transportation-destination" patterns. For each topic, we also give specific insights and policy-making suggestions. Given that the COVID-19 pandemic is still spreading in more than 200 countries and territories worldwide and has infected millions of people, the insights and suggestions provided here may help fight COVID-19.

preprint2022arXiv

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpora. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a shared grounded space, which helps bridge unaligned images and texts and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment and thereby improve performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

preprint2022arXiv

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Existing pre-training methods focus on either single-modal tasks or multi-modal tasks and cannot effectively adapt to both. They can only utilize single-modal data (i.e., text or images) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free-text corpora and image collections can be utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space over a corpus of image-text pairs. As non-paired single-modal data is very rich, our model can utilize much larger-scale data to learn more generalizable representations. Moreover, the textual knowledge and visual knowledge can enhance each other in the unified semantic space. The experimental results show that UNIMO significantly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at the UNIMO project page https://unimo-ptm.github.io/

preprint2022arXiv

Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals

Most dialog systems assume that users have figured out clear and specific goals before starting an interaction. For example, users have already determined the departure, the destination, and the travel time for booking a flight. However, in many scenarios, limited by experience and knowledge, users may know what they need but still struggle to form clear and specific goals by determining all the necessary slots. In this paper, we identify this challenge and take a step forward by collecting a new human-to-human mixed-type dialog corpus. It contains 5k dialog sessions and 168k utterances covering 4 dialog types and 5 domains. Within each session, an agent first provides user-goal-related knowledge to help the user figure out clear and specific goals, and then helps achieve them. Furthermore, we propose a mixed-type dialog model with a novel prompt-based continual learning mechanism. Specifically, the mechanism enables the model to continually strengthen its ability on any specific type by utilizing existing dialog corpora effectively.

preprint2021arXiv

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

Conventional methods for the image-text generation tasks mainly tackle the naturally bidirectional generation tasks separately, focusing on designing task-specific frameworks to improve the quality and fidelity of the generated samples. Recently, Vision-Language Pre-training models have greatly improved the performance of the image-to-text generation tasks, but large-scale pre-training models for text-to-image synthesis task are still under-developed. In this paper, we propose ERNIE-ViLG, a unified generative pre-training framework for bidirectional image-text generation with transformer model. Based on the image quantization models, we formulate both image generation and text generation as autoregressive generative tasks conditioned on the text/image input. The bidirectional image-text generative modeling eases the semantic alignments across vision and language. For the text-to-image generation process, we further propose an end-to-end training method to jointly learn the visual sequence generator and the image reconstructor. To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs which achieves state-of-the-art performance for both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and best results on COCO-CN and AIC-ICC for image captioning.

preprint2021arXiv

Learning to Select External Knowledge with Multi-Scale Negative Sampling

Track 1 of DSTC9 aims to effectively answer user requests or questions during task-oriented dialogues that fall outside the scope of the available APIs/DB. By leveraging external knowledge resources, relevant information can be retrieved and encoded into the response generation for these out-of-API-coverage queries. In this work, we explore several advanced techniques to enhance the utilization of external knowledge and boost the quality of response generation, including schema-guided knowledge decision, negatives-enhanced knowledge selection, and knowledge-grounded response generation. To evaluate the performance of our proposed method, comprehensive experiments have been carried out on the publicly available dataset. Our approach was ranked first in the human evaluation of DSTC9 Track 1.

preprint2021arXiv

Role of a fractal shape of the inclusions on acoustic attenuation in a nanocomposite

Nanophononic materials are promising for controlling the transport of sound in the GHz range and heat in the THz range. Here we are interested in the influence of a dendritic inclusion shape on acoustic attenuation. We perform finite element numerical simulations of the transient propagation of an acoustic wave packet in 2D nanophononic materials with circular or dendritic inclusions periodically distributed in a matrix. By measuring the penetration length, diffusivity, and instantaneous wave velocity, we find that the multi-branching, tree-like form of the dendrites provides a continuous source of phonon-interface scattering, leading to increased acoustic attenuation. When the wavelength is much smaller than the inter-inclusion distance, we report a strong attenuation process in the dendritic case, which can be fitted by a compressed exponential function with $\beta > 1$.
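The compressed-exponential law can be made concrete with a small fitting sketch. It assumes a decay profile of the form $\exp[-(x/\ell)^{\beta}]$ with a hypothetical length scale $\ell$. It recovers $\beta$ by linear least squares on $\ln(-\ln y)$ versus $\ln x$. This linearization is one standard way to fit stretched or compressed exponentials, not necessarily the procedure used in the paper. The synthetic data and function names are illustrative only.

```python
import math

def compressed_exp(x, ell, beta):
    """Decay profile exp(-(x / ell) ** beta); beta > 1 means 'compressed'."""
    return math.exp(-(x / ell) ** beta)

def fit_compressed_exp(xs, ys):
    """Fit (ell, beta) by least squares on ln(-ln y) = beta * ln x - beta * ln ell."""
    # Only points with x > 0 and 0 < y < 1 have a finite double logarithm.
    pts = [(math.log(x), math.log(-math.log(y)))
           for x, y in zip(xs, ys) if x > 0 and 0 < y < 1]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    beta = s_uv / s_uu                  # slope of the linearized model
    intercept = mean_v - beta * mean_u  # equals -beta * ln(ell)
    ell = math.exp(-intercept / beta)
    return ell, beta

# Synthetic, noise-free profile with hypothetical ell = 2.0 and beta = 1.5.
xs = [0.5 * k for k in range(1, 15)]
ys = [compressed_exp(x, 2.0, 1.5) for x in xs]
ell_fit, beta_fit = fit_compressed_exp(xs, ys)
```

On noise-free data, the linearized fit recovers the parameters exactly. With simulated attenuation data, points where $y$ approaches 0 or 1 amplify noise in the double logarithm and are often down-weighted or excluded.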

preprint2020arXiv

An Investigation of Containment Measures Against the COVID-19 Pandemic in Mainland China

As the recent COVID-19 outbreak rapidly spreads around the world, various containment measures have been carried out to fight the COVID-19 pandemic. In Mainland China, the containment measures are of three types: the Wuhan travel ban, intra-city quarantine and isolation, and inter-city travel restrictions. In carrying out these measures, local economy and information acquisition play an important role. In this paper, we investigate the correlation between local economy, information acquisition, and the execution of containment measures against the COVID-19 pandemic in Mainland China. First, we use a parsimonious model, the SIR-X model, to estimate the parameters that represent the execution of intra-city quarantine and isolation in major cities of Mainland China. To understand the execution of intra-city quarantine and isolation, we analyze the correlation between these parameters and representative indicators, including local economy, mobility, and information acquisition. To this end, we collect data on Gross Domestic Product (GDP), the inflows from and outflows to Wuhan, and the frequency of COVID-19-related searches from Baidu Maps, a widely used Web mapping service, and the Baidu search engine in Mainland China. Based on the analysis, we confirm a strong correlation between the local economy and the execution of information acquisition in major cities of Mainland China. We further show that, although cities with high GDP per capita attract larger inflows from Wuhan, people there are more likely to follow the quarantine measures and to reduce travel to other cities. Finally, the correlation analysis using search data shows that well-informed individuals are more likely to carry out containment measures.
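For readers unfamiliar with the SIR-X model, the sketch below integrates it with a forward Euler scheme. It uses the equations as we understand them from Maier and Brockmann (2020). There, $\kappa_0$ represents general containment: it shields susceptible individuals (moving them to $R$) and removes infected individuals into the confirmed compartment $X$. $\kappa$ represents quarantine of infected individuals, also into $X$. The rates and initial condition are hypothetical, chosen only to exercise the code. They are not the per-city estimates from this study.

```python
def simulate_sirx(alpha, beta, kappa, kappa0, i0, days, dt=0.01):
    """Forward-Euler integration of the normalized SIR-X equations.

    dS/dt = -alpha*S*I - kappa0*S
    dI/dt =  alpha*S*I - beta*I - kappa0*I - kappa*I
    dR/dt =  beta*I + kappa0*S
    dX/dt = (kappa + kappa0)*I
    Returns the final (S, I, R) and the daily trajectory of X.
    """
    s, i, r, x = 1.0 - i0, i0, 0.0, 0.0
    steps_per_day = round(1 / dt)
    x_daily = [x]
    for _ in range(days):
        for _ in range(steps_per_day):
            ds = -alpha * s * i - kappa0 * s
            di = alpha * s * i - beta * i - kappa0 * i - kappa * i
            dr = beta * i + kappa0 * s
            dx = (kappa + kappa0) * i
            s, i, r, x = s + dt * ds, i + dt * di, r + dt * dr, x + dt * dx
        x_daily.append(x)
    return (s, i, r), x_daily

# Hypothetical daily rates, for illustration only.
(s, i, r), x_daily = simulate_sirx(alpha=0.6, beta=0.3, kappa=0.05,
                                   kappa0=0.05, i0=1e-6, days=60)
```

The four derivatives sum to zero, so $S + I + R + X$ stays at 1. Fitting $\kappa$ and $\kappa_0$ to a city's confirmed-case curve $X(t)$ is what lets such parameters serve as a proxy for how strictly containment was executed.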

preprint2020arXiv

CoKE: Contextualized Knowledge Graph Embedding

Knowledge graph embedding, which projects symbolic entities and relations into continuous vector spaces, is gaining increasing attention. Previous methods allow only a single static embedding for each entity or relation, ignoring their intrinsic contextual nature: entities and relations may appear in different graph contexts and, accordingly, exhibit different properties. This work presents Contextualized Knowledge Graph Embedding (CoKE), a novel paradigm that takes this contextual nature into account and learns dynamic, flexible, and fully contextualized entity and relation embeddings. Two types of graph contexts are studied: edges and paths, both formulated as sequences of entities and relations. CoKE takes a sequence as input and uses a Transformer encoder to obtain contextualized representations. These representations are therefore naturally adaptive to the input, capturing the contextual meanings of the entities and relations therein. Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction and path query answering. It performs consistently better than, or at least as well as, the current state of the art in almost every case, in particular offering an absolute improvement of 21.0% in H@10 on path query answering. Our code is available at https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE.

preprint2020arXiv

Deep Low-rank Prior in Dynamic MR Imaging

Deep learning methods have achieved attractive performance in dynamic MR cine imaging. However, existing methods are driven only by the sparse prior of MR images, while the important low-rank (LR) prior of dynamic MR cine images has not been explored, which limits further improvements in dynamic MR reconstruction. In this paper, a learned singular value thresholding (Learned-SVT) operation is proposed to explore a deep low-rank prior in dynamic MR imaging and obtain improved reconstruction results. In particular, we propose two novel and distinct schemes for introducing the learnable low-rank prior into deep network architectures, one in an unrolling manner and one in a plug-and-play manner. In the unrolling manner, we put forward a model-based unrolled sparse and low-rank network for dynamic MR imaging, dubbed SLR-Net. SLR-Net is defined over a deep network flow graph, which is unrolled from the iterative procedures of the Iterative Shrinkage-Thresholding Algorithm (ISTA) for optimizing a sparse and low-rank dynamic MRI model. In the plug-and-play manner, we present a plug-and-play LR network module that can be easily embedded into any other dynamic MR neural network without changing the network paradigm. Experimental results show that both schemes can further improve on state-of-the-art CS methods, such as k-t SLR, and on sparsity-driven deep learning-based methods, such as DC-CNN and CRNN, both qualitatively and quantitatively.

preprint2020arXiv

Discovering Dialog Structure Graph for Open-Domain Dialog Generation

Learning interpretable dialog structure from human-human dialogs yields basic insights into the structure of conversation and also provides background knowledge to facilitate dialog generation. In this paper, we conduct unsupervised discovery of dialog structure from chitchat corpora and then leverage it to facilitate dialog generation in downstream systems. To this end, we present a Discrete Variational Auto-Encoder with Graph Neural Network (DVAE-GNN) to discover a unified, human-readable dialog structure. The structure is a two-layer directed graph that contains session-level semantics in the upper-layer vertices, utterance-level semantics in the lower-layer vertices, and edges among these semantic vertices. In particular, we integrate a GNN into the DVAE to fine-tune utterance-level semantics for more effective recognition of session-level semantic vertices. Furthermore, to alleviate the difficulty of discovering a large number of utterance-level semantics, we design a coupling mechanism that binds each utterance-level semantic vertex to a distinct phrase to provide prior semantics. Experimental results on two benchmark corpora confirm that DVAE-GNN can discover meaningful dialog structure and that using the dialog structure graph as background knowledge can help a graph-grounded conversational system conduct coherent multi-turn dialog generation.

preprint2020arXiv

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).

preprint2020arXiv

Fast Cadzow's Algorithm and a Gradient Variant

Cadzow's algorithm is a signal denoising and recovery method designed for signals corresponding to low-rank Hankel matrices. In this paper, we first introduce a Fast Cadzow's algorithm, which incorporates a novel subspace projection to reduce the high computational cost of the SVD in Cadzow's algorithm. Then a Gradient method and a Fast Gradient method are proposed to address the non-decreasing MSE issue that arises when applying Cadzow's or Fast Cadzow's algorithm to signal denoising. Extensive empirical performance comparisons demonstrate that the proposed algorithms complete the denoising and recovery tasks more efficiently and effectively.
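The sketch below makes the basic (not the fast) Cadzow iteration concrete. It alternates between a low-rank approximation of the Hankel matrix and anti-diagonal averaging back to a signal. It is restricted to rank 1 so that the dominant singular pair can be found with plain power iteration in pure Python. A practical implementation would use a truncated SVD for general rank $r$. The paper's subspace-projection speed-up and its gradient variants are not reproduced. All function names and parameters are illustrative.

```python
import math
import random

def hankel(x, rows):
    """Hankel matrix with H[i][j] = x[i + j]."""
    cols = len(x) - rows + 1
    return [[x[i + j] for j in range(cols)] for i in range(rows)]

def rank1_approx(h, iters=100):
    """Rank-1 approximation u (u^T H), with u the dominant left singular vector (power iteration)."""
    rows, cols = len(h), len(h[0])
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(iters):
        u = [sum(h[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(a * a for a in u)) or 1.0
        u = [a / norm_u for a in u]
        v = [sum(h[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_v = math.sqrt(sum(b * b for b in v)) or 1.0
        v = [b / norm_v for b in v]
    ut_h = [sum(u[i] * h[i][j] for i in range(rows)) for j in range(cols)]
    return [[u[i] * ut_h[j] for j in range(cols)] for i in range(rows)]

def dehankel(h):
    """Average each anti-diagonal to map a matrix back to the nearest Hankel signal."""
    rows, cols = len(h), len(h[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += h[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]

def cadzow_rank1(x, rows, n_iter=20):
    """Alternate rank-1 projection and Hankel (anti-diagonal) averaging."""
    y = list(x)
    for _ in range(n_iter):
        y = dehankel(rank1_approx(hankel(y, rows)))
    return y

# A decaying exponential yields an exactly rank-1 Hankel matrix, so it is a fixed point.
clean = [0.9 ** n for n in range(20)]
restored = cadzow_rank1(clean, rows=10)

# Denoising a noisy exponential (synthetic data, fixed seed).
rng = random.Random(1)
truth = [0.95 ** n for n in range(40)]
noisy = [t + rng.gauss(0.0, 0.05) for t in truth]
denoised = cadzow_rank1(noisy, rows=20, n_iter=5)
```

Each SVD in the loop is the cost the Fast Cadzow variant targets. Because the iteration is an alternating projection rather than a direct MSE minimization, the error is not guaranteed to decrease monotonically across iterations.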

preprint2020arXiv

Leveraging Graph to Improve Abstractive Multi-Document Summarization

Graphs that capture relations between textual units are highly beneficial for detecting salient information in multiple documents and generating coherent overall summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model that can leverage well-known graph representations of documents, such as similarity graphs and discourse graphs, to process multiple input documents more effectively and produce abstractive summaries. Our model uses graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. It can also use graphs to guide the summary generation process, which helps generate coherent and concise summaries. Furthermore, pre-trained language models can easily be combined with our model, which further improves summarization performance significantly. Empirical results on the WikiSum and MultiNews datasets show that the proposed architecture brings substantial improvements over several strong baselines.

preprint2020arXiv

Mapping the Galactic disk with the LAMOST and Gaia Red clump sample: I: precise distances, masses, ages and 3D velocities of $\sim$ 140000 red clump stars

We present a sample of $\sim$140,000 primary red clump (RC) stars with spectral signal-to-noise ratios higher than 20 from the LAMOST Galactic spectroscopic surveys, selected based on their positions in the metallicity-dependent effective temperature--surface gravity and color--metallicity diagrams, supervised by high-quality $Kepler$ asteroseismology data. The stellar masses and ages of these stars are further determined from the LAMOST spectra using the Kernel Principal Component Analysis method, trained with thousands of RCs in the LAMOST-$Kepler$ fields with accurate asteroseismic mass measurements. The purity and completeness of our primary RC sample are generally higher than 80 per cent. For mass and age, a variety of tests show typical uncertainties of 15 and 30 per cent, respectively. Using over ten thousand primary RCs with accurate distance measurements from Gaia DR2 parallaxes, we re-calibrate the $K_{\rm s}$ absolute magnitudes of primary RCs by, for the first time, considering both metallicity and age dependencies. With the new calibration, distances are derived for all the primary RCs with a typical uncertainty of 5--10 per cent, which is even better than the values yielded by Gaia parallax measurements for stars beyond 3--4 kpc. The sample covers a significant volume of the Galactic disk: $4 \leq R \leq 16$ kpc, $|Z| \leq 5$ kpc, and $-20^{\circ} \leq \phi \leq 50^{\circ}$. Stellar atmospheric parameters, line-of-sight velocities and elemental abundances derived from the LAMOST spectra, as well as Gaia DR2 proper motions, are also provided for the sample stars. Finally, the selection function of the sample is carefully evaluated in the color-magnitude plane for different sky areas. The sample is publicly available.
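Once a star's $K_{\rm s}$ absolute magnitude is calibrated, its distance follows from the standard distance modulus. The sketch below shows that step and the first-order propagation of a magnitude error into a fractional distance error. The calibrated absolute magnitude is taken as a given input, since the metallicity- and age-dependent calibration itself is not reproduced. The extinction term, example magnitudes, and function names are hypothetical.

```python
import math

def distance_pc(apparent_mag, absolute_mag, extinction=0.0):
    """Distance in parsecs from the distance modulus m - M - A = 5 log10(d / 10 pc)."""
    return 10.0 ** ((apparent_mag - absolute_mag - extinction + 5.0) / 5.0)

def fractional_distance_error(sigma_mag):
    """First-order error propagation: sigma_d / d = (ln 10 / 5) * sigma_mag."""
    return math.log(10.0) / 5.0 * sigma_mag

# Hypothetical star: Ks = 12.0, calibrated M_Ks = -1.6, negligible extinction.
d = distance_pc(12.0, -1.6)
frac = fractional_distance_error(0.1)  # about 4.6 per cent for a 0.1 mag error
```

By the same relation, a 5--10 per cent distance uncertainty corresponds to a combined magnitude error of roughly 0.11--0.22 mag. This is why a tight absolute-magnitude calibration translates directly into precise distances.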

preprint2020arXiv

Picosecond-precision optical time transfer in free space using flexible binary offset carrier modulation

Free-space optical time transfer with high precision and flexibility will play a crucial role in near-future ground-to-satellite and inter-satellite clock networks and outdoor timing services. Here we propose a free-space optical time transfer method using flexible binary offset carrier (FlexBOC) modulation. FlexBOC modulation yields comparable precision while reducing the occupied bandwidth by at least 97.5% compared with optical binary phase modulation. Meanwhile, the adoption of optical techniques eliminates the multi-path effect, a major limitation of current microwave satellite time transfer systems. Moreover, the time interval measurement avoids the need for a continuous link, which may be routinely broken by physical obstructions. For verification, a time transfer experiment with our home-built system was conducted between two sites separated by a 30-m free-space path outside the laboratory. Over a 15-h period, the time deviation is 2.3 ps at a 1-s averaging time and averages down to 1.0 ps at about 60 s. The fractional frequency instability is 4.0E-12 at a gate time of 1 s and approaches 2.6E10-15 at 10000 s.
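The abstract reports stability as time deviation and fractional frequency instability but does not name the estimators. As a hedged illustration, the sketch below computes the overlapping Allan deviation, one common measure of fractional frequency instability. It works from a series of time-offset samples taken every `tau0` seconds. The authors may have used a different estimator (for example, the modified Allan deviation). The function name and inputs are illustrative.

```python
import math

def overlapping_adev(x, tau0, m):
    """Overlapping Allan deviation at tau = m * tau0 from time-offset samples x (seconds).

    sigma_y^2(tau) = sum_i (x[i+2m] - 2 x[i+m] + x[i])^2 / (2 tau^2 (N - 2m))
    """
    n = len(x)
    if n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 samples")
    tau = m * tau0
    total = sum((x[i + 2 * m] - 2.0 * x[i + m] + x[i]) ** 2 for i in range(n - 2 * m))
    return math.sqrt(total / (2.0 * tau * tau * (n - 2 * m)))

# A constant frequency offset gives a linear time-offset ramp (1 ps per second here).
ramp = [1e-12 * i for i in range(100)]
adev_ramp = overlapping_adev(ramp, tau0=1.0, m=1)
```

A linear ramp has vanishing second differences, so its Allan deviation is essentially zero. Only fluctuations around a constant frequency offset contribute, which is what makes the measure suited to characterizing link instability rather than a fixed clock offset.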

preprint2020arXiv

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

Pre-training models have proven effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge-grounded dialogues, and conversational question answering. In this framework, we adopt flexible attention mechanisms to fully leverage the bi-directional context and the uni-directional characteristic of language generation. We also introduce discrete latent variables to tackle the inherent one-to-many mapping problem in response generation. Two reciprocal tasks, response generation and latent act recognition, are designed and carried out simultaneously within a shared network. Comprehensive experiments on three publicly available datasets verify the effectiveness and superiority of the proposed framework.

preprint2020arXiv

Positive Contrast Susceptibility MR Imaging Using GPU-based Primal-Dual Algorithm

The susceptibility-based positive contrast MR technique has been applied to estimate arbitrary magnetic susceptibility distributions of metallic devices using a kernel deconvolution algorithm with regularized L-1 minimization. Previously, the first-order primal-dual (PD) algorithm was shown to provide faster reconstruction than other methods for solving the L-1 minimization. Here, we propose to accelerate the PD algorithm for positive contrast imaging using the multi-core, multi-thread capabilities of graphics processing units (GPUs). Experimental results showed that the GPU-based PD algorithm achieves comparable accuracy for metallic interventional devices in positive contrast imaging with less computational time, running 4 to 15 times faster than the previous CPU-based scheme.

preprint2020arXiv

Quantifying the Economic Impact of COVID-19 in Mainland China Using Human Mobility Data

To contain the coronavirus (COVID-19) pandemic in Mainland China, the authorities have put in place a series of measures, including quarantines, social distancing, and travel restrictions. While these strategies have effectively dealt with critical outbreak situations, the combination of the pandemic and mobility controls has slowed China's economic growth, resulting in the first quarterly decline in Gross Domestic Product (GDP) since GDP began to be calculated in 1992. To characterize the potential shrinkage of the domestic economy from the perspective of mobility, we use data from Baidu Maps to propose two new economic indicators, New Venues Created (NVC) and Volumes of Visits to Venue (V^3), as complementary measures of domestic investment and consumption activities. The historical records of these two indicators show strong correlations with past figures of Chinese GDP, while the status quo has changed dramatically this year due to the pandemic. We present a quantitative analysis that projects the impact of the pandemic on the economy using the recent trends of NVC and V^3. We find that the most affected sectors are travel-dependent businesses, such as hotels, educational institutes, and public transportation, while sectors essential to daily life, such as workplaces, residential areas, restaurants, and shopping sites, have been recovering rapidly. Analysis at the provincial level shows that self-sufficient and self-sustaining economic regions, with internal supply, production, and consumption, have recovered faster than regions that rely on global supply chains.

preprint2020arXiv

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis

Recently, sentiment analysis has seen remarkable advances with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the pre-training process, even though it is widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect levels into the pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between the words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baselines and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.

preprint2020arXiv

Towards Conversational Recommendation over Multi-Type Dialogs

We propose a new task of conversational recommendation over multi-type dialogs, in which the bot can proactively and naturally lead a conversation from a non-recommendation dialog (e.g., QA) to a recommendation dialog, taking into account the user's interests and feedback. To facilitate the study of this task, we create a human-to-human Chinese dialog dataset, \emph{DuRecDial} (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot). In each dialog, the recommender proactively leads a multi-type dialog to approach recommendation targets and then makes multiple recommendations with rich interaction behavior. This dataset allows us to systematically investigate different parts of the overall problem, e.g., how to naturally lead a dialog and how to interact with users for recommendation. Finally, we establish baseline results on DuRecDial for future studies. The dataset and code are publicly available at https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2020-DuRecDial.

preprint2020arXiv

Ultra-high Hydrogen Storage Capacity of Holey Graphyne

Holey graphyne (HGY), a novel 2D single-crystalline carbon allotrope, was recently synthesized by the Castro-Stephens coupling reaction. The naturally occurring uniform periodic holes in its 2D carbon-carbon network indicate its tremendous potential for energy storage applications. Herein, we conducted density functional theory calculations to predict the hydrogen storage capacity of the HGY sheet. We find that Li-decorated single-layer HGY can serve as a promising candidate for hydrogen storage. Our numerical calculations show that Li atoms bind strongly to the HGY sheet without forming Li clusters, and each Li atom can anchor four H2 molecules with an average adsorption energy of about -0.22 eV/H2. The hydrogen storage capacity of the doped HGY sheet reaches up to 12.8 wt%, far surpassing the U.S. Department of Energy target (9 wt%), which shows that the Li/HGY complex is an ideal hydrogen storage material under ambient conditions. In addition, we investigate the polarization mechanism of the storage medium and find that the polarization, which stems both from the electric field induced by the ionic Li decorated on HGY and from the weakly polarized hydrogen molecules, dominates the H2 adsorption process.
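
The two headline quantities in this abstract follow standard definitions: the gravimetric capacity is the mass of stored H2 divided by the total mass of host plus H2, and the average adsorption energy is the total-energy difference per adsorbed molecule, negative when binding is favorable. The Python sketch below encodes these definitions; the host mass and energies used with it are hypothetical placeholders, not the paper's Li/HGY cell or DFT energies.

```python
M_H2 = 2.016  # molar mass of H2 in g/mol
DOE_TARGET_WT = 9.0  # U.S. DOE target quoted in the abstract, wt%


def gravimetric_capacity(host_mass: float, n_h2: int) -> float:
    """Hydrogen storage capacity in wt%: m(H2) / (m(host) + m(H2)) * 100.

    host_mass is the molar mass (g/mol) of the decorated sheet (carbon
    framework plus Li atoms) in whatever cell is being considered.
    """
    h2_mass = n_h2 * M_H2
    return 100.0 * h2_mass / (host_mass + h2_mass)


def average_adsorption_energy(e_total: float, e_host: float,
                              e_h2: float, n_h2: int) -> float:
    """Average adsorption energy per H2 in eV/H2; negative means binding.

    E_ads = (E_total - E_host - n * E_H2) / n, where E_H2 is the energy of
    an isolated H2 molecule and E_host that of the Li-decorated sheet.
    """
    return (e_total - e_host - n_h2 * e_h2) / n_h2
```

For a hypothetical 100 g/mol host holding four H2 molecules, `gravimetric_capacity(100.0, 4)` gives about 7.46 wt%, below the 9 wt% target; the capacity rises with the number of H2 molecules each Li atom anchors relative to the host mass.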

preprint2017arXiv

Coherent anti-Stokes Raman Scattering Lidar Using Slow Light: A Theoretical Study

We theoretically investigate a scheme in which backward coherent anti-Stokes Raman scattering (CARS) is significantly enhanced by slow light. Specifically, we reduce the group velocity of the Stokes excitation pulse by introducing a coupling laser that causes electromagnetically induced transparency (EIT). When the Stokes pulse has a spatial length shorter than the CARS wavelength, the backward CARS emission is significantly enhanced. We also investigate the possibility of applying this scheme as a CARS lidar with O2 or N2 as the EIT medium. We find that if the lidar system is equipped with a nanosecond laser of large pulse energy (>1 J) and a large-aperture telescope (~10 m), a CARS lidar could become much more sensitive than a spontaneous Raman lidar.
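
The enhancement condition, a Stokes pulse whose spatial length is shorter than the CARS wavelength, can be turned into a rough requirement on the group velocity, since the spatial length of a pulse is its group velocity times its duration. The Python sketch below does that arithmetic; the 1 ns duration and 500 nm wavelength used with it are hypothetical round numbers, not values from the paper.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def pulse_spatial_length(group_velocity: float, duration: float) -> float:
    """Spatial extent (m) of a pulse moving at group_velocity (m/s) for duration (s)."""
    return group_velocity * duration


def max_group_velocity(cars_wavelength: float, duration: float) -> float:
    """Largest group velocity (m/s) at which the pulse is shorter than cars_wavelength (m)."""
    return cars_wavelength / duration


def group_index(group_velocity: float) -> float:
    """Group index n_g = c / v_g corresponding to the given group velocity."""
    return C / group_velocity
```

In vacuum a 1 ns pulse is about 0.3 m long. To fit inside a hypothetical 500 nm wavelength, the group velocity must fall below 500 m/s, a group index of roughly 6 x 10^5, which is the kind of reduction EIT-based slow light is meant to supply.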

preprint2017arXiv

In-Plane Anisotropies of Polarized Raman Response and Electrical Conductivity in Layered Tin Selenide

The group IV-VI compound SnSe, with an orthorhombic lattice structure, has recently attracted particular interest due to its unexpectedly low thermal conductivity and high power factor, showing great promise for thermoelectric applications. SnSe displays intriguing anisotropic properties owing to its puckered, low-symmetry in-plane lattice structure. Low-dimensional materials have potential advantages in improving the efficiency of thermoelectric conversion, due to their increased power factor and decreased thermal conductivity. A complete study of the optical and electrical anisotropies of SnSe nanostructures is a necessary prerequisite for exploiting the material's properties in high-performance devices. Here, we synthesize single-crystal SnSe nanoplates (NPs) by chemical vapor deposition. The angular dependence of the polarized Raman spectra of SnSe NPs shows an anomalous anisotropic light-matter interaction. The angle-resolved charge transport of the SnSe NPs exhibits strongly anisotropic conductivity. These studies elucidate the anisotropic interactions, which will be useful for future ultrathin SnSe electronic, thermoelectric and optoelectronic devices.
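
Angle-resolved polarized Raman data of this kind are commonly analysed with a Raman tensor whose in-plane elements along the two crystal axes may be complex. A phase difference between those elements is one standard way to model anomalous anisotropy in absorbing layered crystals. The Python sketch below evaluates I(θ) ∝ |e_s · R · e_i|² for a diagonal in-plane tensor diag(a, b·e^{iφ}) in parallel and crossed configurations. The tensor form, parameter names and values are illustrative assumptions, not values fitted in the paper.

```python
import cmath
import math


def raman_intensity(theta_deg: float, a: float, b: float,
                    phase_deg: float = 0.0, parallel: bool = True) -> float:
    """Relative intensity |e_s . R . e_i|^2 for in-plane R = diag(a, b * exp(i*phi)).

    theta_deg is the angle between the incident polarization and the first
    crystal axis; parallel=False analyses scattered light rotated by 90 degrees.
    """
    t = math.radians(theta_deg)
    r_xx = complex(a)
    r_yy = b * cmath.exp(1j * math.radians(phase_deg))
    e_i = (math.cos(t), math.sin(t))
    e_s = e_i if parallel else (-math.sin(t), math.cos(t))
    amplitude = e_s[0] * r_xx * e_i[0] + e_s[1] * r_yy * e_i[1]
    return abs(amplitude) ** 2
```

Sweeping `theta_deg` over 0 to 360 degrees traces the polar pattern. With `a != b` the parallel pattern is two-fold symmetric, and a nonzero `phase_deg` reshapes the intensity at intermediate angles, which a purely real tensor cannot reproduce.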

preprint2016arXiv

Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task Learning

Various treebanks have been released for dependency parsing. Although treebanks may belong to different languages or have different annotation schemes, they contain syntactic knowledge that can potentially benefit one another. This paper presents a universal framework for exploiting these multi-typed treebanks to improve parsing with deep multi-task learning. We consider two kinds of treebanks as sources: the multilingual universal treebanks and the monolingual heterogeneous treebanks. Multiple treebanks are trained jointly and interact through multi-level parameter sharing. Experiments on several benchmark datasets in various languages demonstrate that our approach can make effective use of arbitrary source treebanks to improve target parsing models.

preprint2016arXiv

Sudden gap-closure across the topological phase transition in Bi$_{2-x}$In$_{x}$Se$_{3}$

The phase transition from a topological insulator to a trivial band insulator is studied by angle-resolved photoemission spectroscopy on Bi$_{2-x}$In$_{x}$Se$_{3}$ single crystals. We first report the complete evolution of the bulk band structures throughout the transition. The robust surface state and the bulk gap size ($\sim$ 0.50 eV) show no significant change upon doping for $x$ = 0.05, 0.10 and 0.175. At $x$ $\geq$ 0.225, the surface state completely disappears and the bulk gap size increases, suggesting a sudden gap closure and topological phase transition around $x \sim$ 0.175$-$0.225. We discuss the underlying mechanism of the phase transition, proposing that it is governed by the combined effect of spin-orbit coupling and interactions upon band hybridization. Our study provides a new avenue to investigate the mechanism of the topological phase transition induced by non-magnetic impurities.

preprint2016arXiv

The Ideal Tensile Strength and Phonon Instability of Borophene

Very recently, two-dimensional (2D) boron sheets (borophene) with a rectangular structure have been grown successfully on single-crystal Ag(111) substrates. The fabricated borophene is predicted to have unusual mechanical properties. We performed first-principles calculations to investigate the mechanical properties of monolayer borophene, including its ideal tensile strength and critical strain. We found that monolayer borophene can withstand stresses up to 20.26 N/m and 12.98 N/m in the a and b directions, respectively. However, its critical strain is small: in the a direction, the critical value is only 8%, which, to the best of our knowledge, is the lowest among all studied 2D materials. Our numerical results show that tensile strain applied in the b direction increases the buckling height of borophene, resulting in an out-of-plane negative Poisson's ratio, which gives the boron sheet superior mechanical flexibility along the b direction. The failure mechanism and phonon instability of monolayer borophene were also explored.
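
The two mechanical quantities in this abstract are read off a computed stress-strain curve: the ideal tensile strength is the peak stress, and the critical strain is the strain at which that peak occurs. The out-of-plane Poisson's ratio compares the relative change in buckling height with the applied in-plane strain, so it is negative when the buckling height grows under tension. The Python sketch below works on a hypothetical curve and hypothetical heights, not the paper's DFT data.

```python
from typing import Sequence, Tuple


def ideal_strength(strains: Sequence[float],
                   stresses: Sequence[float]) -> Tuple[float, float]:
    """Return (ideal tensile strength, critical strain) at the peak of the curve."""
    i = max(range(len(stresses)), key=lambda k: stresses[k])
    return stresses[i], strains[i]


def out_of_plane_poisson(h0: float, h: float, applied_strain: float) -> float:
    """nu_out = -(delta h / h0) / strain; negative if the buckling height increases."""
    return -((h - h0) / h0) / applied_strain
```

With the hypothetical curve strains (0, 0.05, 0.10, 0.15) and stresses (0, 5, 9, 7) N/m, the sketch reports a strength of 9 N/m at a critical strain of 10%. A buckling height that rises from 0.90 to 0.95 under 5% strain yields a negative ratio.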

preprint2015arXiv

Accelerate Single-shot Data Acquisitions Using Compressed Sensing and FRONSAC Imaging

Nonlinear spatial encoding magnetic (SEM) fields have been studied to complement multichannel RF encoding and accelerate MRI scans. Published schemes include PatLoc, O-Space, Null Space, 4D-RIO, and others, but the large variety of possible approaches to exploiting nonlinear SEMs remains mostly unexplored. We previously presented a new approach, Fast ROtary Nonlinear Spatial ACquisition (FRONSAC) imaging, in which the nonlinear fields provide a small rotating perturbation to standard linear trajectories. While FRONSAC encoding greatly improves image quality, some undersampling artifacts remain at the highest accelerations or weakest FRONSAC fields. However, the undersampling artifacts that occur with FRONSAC encoding are relatively incoherent and well suited to compressed sensing (CS) reconstruction. CS provides a sparsity-promoting convex strategy to reconstruct images from highly undersampled datasets. The work presented here combines the benefits of FRONSAC and CS. Simulations illustrate that this combination can further improve image reconstruction with FRONSAC gradients of low amplitude and frequency.
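
To make the sparsity-promoting convex strategy concrete, the Python sketch below runs the iterative shrinkage-thresholding algorithm (ISTA), one standard solver for the l1-regularized least-squares problem min_x ½‖Ax − y‖² + λ‖x‖₁. It is a generic illustration, not the paper's reconstruction: the matrix A stands in for the FRONSAC encoding operator, the data are real-valued, and no sparsifying transform (such as wavelets) is applied. The step size should not exceed 1/‖A‖₂² for convergence.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(a: Matrix, x: Vector) -> Vector:
    """Compute A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a: Matrix, r: Vector) -> Vector:
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def ista(a: Matrix, y: Vector, lam: float, step: float, iters: int = 500) -> Vector:
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by gradient step + soft threshold."""
    x = [0.0] * len(a[0])
    for _ in range(iters):
        residual = [ax - yi for ax, yi in zip(matvec(a, x), y)]
        grad = rmatvec(a, residual)
        x = [soft_threshold(xi - step * g, step * lam) for xi, g in zip(x, grad)]
    return x
```

With A = diag(2, 1), y = (4, 0.2), λ = 0.5 and step 0.25, ISTA settles at (1.875, 0): the large coefficient is kept but shrunk, while the small one is zeroed. That zeroing is the mechanism that suppresses incoherent undersampling artifacts.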

preprint2012arXiv

Optimal coherent control of CARS: signal enhancement and background elimination

The ability to enhance resonant signals and eliminate the non-resonant background is analyzed for coherent anti-Stokes Raman scattering (CARS). The analysis is done at a specific frequency as well as for broadband excitation using femtosecond pulse-shaping techniques. An appropriate objective functional is employed to balance resonant signal enhancement against non-resonant background suppression. Optimal enhancement of the signal and minimization of the background can be achieved by shaping the probe pulse alone while keeping the pump and Stokes pulses in transform-limited form (TLF). In some cases, analytical forms for the probe pulse can be found; in others, numerical simulations are carried out. A good approximate solution for the optimal pulse in two-pulse CARS is found to be a superposition of linear and arctangent-type phases for the pump. The well-known probe delay method is shown to be a quasi-optimal scheme for background suppression. The results should provide a basis for improving the performance of CARS spectroscopy and microscopy.

preprint2010arXiv

Joint Relay Selection and Link Adaptation for Distributed Beamforming in Regenerative Cooperative Networks

Relay selection enhances the performance of cooperative networks by selecting the links with higher capacity. Meanwhile, link adaptation improves the spectral efficiency of wireless data-centric networks by adapting the modulation and coding schemes (MCS) to the current link condition. In this paper, relay selection is combined with link adaptation for distributed beamforming in a two-hop regenerative cooperative system. A novel signaling mechanism and related optimal algorithms are proposed for joint relay selection and link adaptation. In the proposed scheme, there is no need to feed back the relay selection results to each relay. Instead, when the destination broadcasts the link adaptation results, each relay automatically determines whether it has been selected. The lower and upper bounds on the throughput of the proposed scheme are derived. The analysis and simulation results indicate that the proposed scheme provides synergistic gains compared with pure relay selection and pure link adaptation schemes.
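
One way to see why a relay can infer its own selection from the broadcast link adaptation result is sketched below in Python. The simplifying assumptions are ours, not the paper's: a single hypothetical MCS table shared by both hops, decode-and-forward relays that participate only if they can decode the first hop at the chosen MCS, and ideal distributed beamforming in which signal amplitudes add coherently. Because adding a decoding relay can only raise the beamformed SNR, the destination keeps every relay that decodes, so the broadcast MCS index alone tells each relay whether it is in the set.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical MCS table: (spectral efficiency in bit/s/Hz, minimum linear SNR).
MCS_TABLE: List[Tuple[float, float]] = [
    (1.0, 2.0),
    (2.0, 6.0),
    (4.0, 30.0),
]


def beamformed_snr(rd_snrs: Sequence[float]) -> float:
    """Coherent combining with ideal phase alignment, each relay at full power."""
    return sum(math.sqrt(g) for g in rd_snrs) ** 2


def select(relays: Sequence[Tuple[float, float]]) -> Optional[Tuple[int, List[int]]]:
    """relays[i] = (source->relay SNR, relay->destination SNR).

    Return (MCS index, selected relay indices) for the highest feasible rate,
    or None if no MCS can be supported.
    """
    by_rate = sorted(range(len(MCS_TABLE)), key=lambda m: MCS_TABLE[m][0], reverse=True)
    for m in by_rate:
        threshold = MCS_TABLE[m][1]
        # Every relay that decodes the first hop at this MCS helps the second hop.
        chosen = [i for i, (sr, _) in enumerate(relays) if sr >= threshold]
        if chosen and beamformed_snr([relays[i][1] for i in chosen]) >= threshold:
            return m, chosen
    return None


def relay_infers_selected(sr_snr: float, broadcast_mcs: int) -> bool:
    """A relay needs only the broadcast MCS and its own first-hop SNR."""
    return sr_snr >= MCS_TABLE[broadcast_mcs][1]
```

For relays with SNR pairs (40, 4), (35, 9) and (10, 20), the top MCS fails on the second hop, so the destination broadcasts MCS index 1 with all three relays active. The third relay, checking only its own first-hop SNR, concludes that it is selected.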

preprint2010arXiv

Joint Uplink and Downlink Relay Selection in Cooperative Cellular Networks

We consider relay selection in a cooperative cellular network where user terminals act as mobile relays to help the communications between the base station (BS) and the mobile station (MS). A novel relay selection scheme, called Joint Uplink and Downlink Relay Selection (JUDRS), is proposed in this paper. Specifically, we generalize JUDRS in two key aspects: (i) the relay is selected jointly for uplink and downlink, so that the relay selection overhead can be reduced, and (ii) we minimize the weighted total energy consumption of the MS, relay and BS by taking into account the channel quality and traffic load conditions of the uplink and downlink. Information-theoretic analysis of the diversity-multiplexing tradeoff demonstrates that the proposed scheme achieves full spatial diversity in the number of cooperating terminals in this network. Numerical results further confirm a significant energy efficiency gain of the proposed algorithm compared with the previous best worse channel selection and best harmonic mean selection algorithms.
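
The contrast between a weighted-energy rule and the channel-only baselines can be illustrated with a toy model, written in Python below. The model is a hypothetical simplification, not the paper's formulation: a fixed rate with transmit power obtained by Shannon inversion, reciprocal channels so that one relay serves both directions, and illustrative energy weights `w_ms`, `w_relay` and `w_bs`. The best-worse-channel and harmonic-mean rules look only at the two channel gains, whereas the weighted-energy rule also reflects the uplink and downlink traffic loads.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Relay:
    g_mr: float  # MS <-> relay channel gain (assumed reciprocal for UL/DL)
    g_rb: float  # relay <-> BS channel gain (assumed reciprocal for UL/DL)


def link_energy(load_bits: float, gain: float, rate: float = 1.0) -> float:
    """Energy to send load_bits at a fixed rate (bit/s/Hz, unit bandwidth).

    Power from Shannon inversion, (2^R - 1) / g, times airtime load / R.
    """
    power = (2.0 ** rate - 1.0) / gain
    return power * load_bits / rate


def judrs_cost(r: Relay, load_ul: float, load_dl: float, w_ms: float,
               w_relay: float, w_bs: float, rate: float = 1.0) -> float:
    """Weighted total energy when one relay serves both uplink and downlink."""
    uplink = w_ms * link_energy(load_ul, r.g_mr, rate) + w_relay * link_energy(load_ul, r.g_rb, rate)
    downlink = w_bs * link_energy(load_dl, r.g_rb, rate) + w_relay * link_energy(load_dl, r.g_mr, rate)
    return uplink + downlink


def select_judrs(relays: Sequence[Relay], **kw: float) -> int:
    """Index of the relay with the lowest weighted total energy."""
    return min(range(len(relays)), key=lambda i: judrs_cost(relays[i], **kw))


def select_best_worse(relays: Sequence[Relay]) -> int:
    """Baseline: maximize the weaker of the two hop gains."""
    return max(range(len(relays)), key=lambda i: min(relays[i].g_mr, relays[i].g_rb))


def select_harmonic_mean(relays: Sequence[Relay]) -> int:
    """Baseline: maximize the harmonic mean of the two hop gains."""
    return max(range(len(relays)),
               key=lambda i: 2 * relays[i].g_mr * relays[i].g_rb / (relays[i].g_mr + relays[i].g_rb))
```

Take two relays, one with a weak MS link and a strong BS link (gains 1 and 10) and one with the opposite balance (gains 4 and 2). Under downlink-heavy traffic (`load_ul=1`, `load_dl=10`) and cheap relay energy (`w_relay=0.1`), the weighted-energy rule picks the first relay, while both channel-only rules pick the second.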


58 published items

Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Revisiting mass estimates of the Milky Way

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

Broadening and redward asymmetry of H$α$ line profiles observed by LAMOST during a stellar flare on an M-type star

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention

Evolutionary Game-Theoretical Analysis for General Multiplayer Asymmetric Games

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

K-UNN: k-Space Interpolation With Untrained Neural Network

Long Time No See! Open-Domain Conversation with Long-Term Persona Memory

Multi-Weight Respecification of Scan-specific Learning for Parallel Imaging

Rice Diseases Detection and Classification Using Attention Based Neural Network and Bayesian Optimization

Searching for multiple populations in star clusters using the China Space Station Telescope

Self-Score: Self-Supervised Learning on Score-Based Models for MRI Reconstruction

Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Towards Boosting the Open-Domain Chatbot with Human Feedback

Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation

Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning

Where to Go for the Holidays: Towards Mixed-Type Dialogs for Clarification of User Goals

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

Learning to Select External Knowledge with Multi-Scale Negative Sampling

Role of a fractal shape of the inclusions on acoustic attenuation in a nanocomposite

An Investigation of Containment Measures Against the COVID-19 Pandemic in Mainland China

CoKE: Contextualized Knowledge Graph Embedding

Deep Low-rank Prior in Dynamic MR Imaging

Discovering Dialog Structure Graph for Open-Domain Dialog Generation

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Fast Cadzow's Algorithm and a Gradient Variant

Leveraging Graph to Improve Abstractive Multi-Document Summarization

Mapping the Galactic disk with the LAMOST and Gaia Red clump sample: I: precise distances, masses, ages and 3D velocities of $\sim$ 140000 red clump stars

Picosecond-precision optical time transfer in free space using flexible binary offset carrier modulation

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

Positive Contrast Susceptibility MR Imaging Using GPU-based Primal-Dual Algorithm

Quantifying the Economic Impact of COVID-19 in Mainland China Using Human Mobility Data

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis

Towards Conversational Recommendation over Multi-Type Dialogs

Ultra-high Hydrogen Storage Capacity of Holey Graphyne

Coherent anti-Stokes Raman Scattering Lidar Using Slow Light: A Theoretical Study

In-Plane Anisotropies of Polarized Raman Response and Electrical Conductivity in Layered Tin Selenide

Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task Learning

Sudden gap-closure across the topological phase transition in Bi$_{2-x}$In$_{x}$Se$_{3}$

The Ideal Tensile Strength and Phonon Instability of Borophene

Accelerate Single-shot Data Acquisitions Using Compressed Sensing and FRONSAC Imaging

Optimal coherent control of CARS: signal enhancement and background elimination

Joint Relay Selection and Link Adaptation for Distributed Beamforming in Regenerative Cooperative Networks

Joint Uplink and Downlink Relay Selection in Cooperative Cellular Networks