Source author record

Xin Jiang

Xin Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

55 works

25 topics

4 close collaborators

preprint · 2026 · arXiv

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

Accurate and effective discrete image tokenization is crucial for long image sequence processing. However, current methods rigidly compress all content at a fixed rate, ignoring the variable information density of images and leading to either redundancy or information loss. Inspired by information entropy, we propose TaTok, a Theoretically grounded adaptive image Tokenization framework. We rigorously identify two key drawbacks in existing methods: information insufficiency when reconstructing images with patch tokens alone, and information redundancy among patch tokens. To address these, we introduce global tokens that model mutual information across patch tokens, and a Dynamic Token Filtering (DTF) algorithm based on cumulative conditional entropy to eliminate redundancy. Experiments confirm TaTok's state-of-the-art performance, delivering a 1.3x gFID improvement and 8.7x inference speedup. By allocating tokens according to information richness, TaTok enables more compressed yet accurate image tokenization, offering valuable insights for future research.

preprint · 2026 · arXiv

Predictive and feedback signals differently shape the formation of group-level and individualized language representations

Adults vary greatly in how effectively they learn a new language, but the signals driving the learning processes and individual differences remain unclear. Over seven days, we tracked behavioral learning and collected fMRI data from 102 adults as they learned an artificial language with corrective feedback. We trained matched transformer models with prediction, feedback, or combined objectives and compared their internal representations to brain activity. Representations derived from the prediction-focused model accounted for the largest share of unique neural variance at the group level, despite the human task being feedback-based. Throughout model training, both objectives showed a shift in brain-model alignment from sensory to higher-order language and associative networks, indicating abstraction processing. Conversely, neural patterns related to the feedback model were most useful for predicting individual generalization outcomes on Day 7. These findings support a multi-signal model of adult language learning, in which prediction shapes a common neural learning architecture across learners, whereas feedback-related mechanisms better explain individual differences over time.

preprint · 2026 · arXiv

ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning

Tool learning, which allows Large Language Models (LLMs) to leverage external tools for solving complex user tasks, has emerged as a promising avenue for extending model capabilities. However, existing approaches primarily focus on data synthesis for fine-tuning LLMs to invoke tools effectively, largely ignoring how to fully stimulate the potential of the model. In this paper, we propose ToolACE-R, a novel framework that includes both model-aware iterative training and adaptive refinement for tool learning. ToolACE-R features a model-aware iterative training procedure that progressively adjusts training samples based on the model's evolving capabilities to maximize its potential. Additionally, it incorporates a self-refinement training corpus that emphasizes the LLM's ability to iteratively refine its tool calls, optimizing performance without requiring external feedback. Furthermore, we introduce an adaptive self-refinement mechanism for efficient test-time scaling, where the trained model can autonomously determine when to stop the process based on iterative self-refinement. We conduct extensive experiments across several benchmark datasets, showing that ToolACE-R achieves competitive performance compared to advanced API-based models. The performance of tool invocation can be further improved efficiently through adaptive self-refinement. These results highlight the effectiveness and generalizability of ToolACE-R, offering a promising direction for more efficient and scalable tool learning.

preprint · 2025 · arXiv

Entanglement Entropy of Conformal Field Theory in All Dimensions

We provide a field-theoretic method to calculate entanglement entropy of CFT in all dimensions. This method works for entangling surfaces of arbitrary shape. The formalism manifests a field-theoretic proof of the Ryu-Takayanagi formula.

preprint · 2025 · arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

preprint · 2024 · arXiv

Random-coupled Neural Network

Improving the efficiency of current neural networks and modeling them on biological neural systems have become popular research directions in recent years. The pulse-coupled neural network (PCNN) is a widely applied model for imitating the computational characteristics of the human brain in the computer vision and neural network fields. However, differences between the PCNN and biological neural systems remain: limited neural connections, high computational cost, and a lack of stochastic properties. In this study, the random-coupled neural network (RCNN) is proposed. It overcomes these difficulties in PCNN's neuromorphic computing via a random inactivation process. This process randomly closes some neural connections in the RCNN model, realized by the random inactivation weight matrix of the link input. This relieves the computational burden of PCNN, making it affordable to achieve vast neural connections. Furthermore, the image and video processing mechanisms of RCNN are studied. It encodes constant stimuli as periodic spike trains and periodic stimuli as chaotic spike trains, consistent with the encoding characteristics of biological neural information. Finally, the RCNN is applied to image segmentation, fusion, and pulse shape discrimination subtasks. It is demonstrated to be robust, efficient, and highly noise-resistant, with outstanding performance in all applications mentioned above.
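The random inactivation step described in this abstract can be pictured with a small sketch. It is not the authors' implementation: the function name `random_inactivation`, the `keep_prob` parameter, and the example weights are all hypothetical, and the abstract does not say how the inactivation matrix is actually sampled. The sketch only assumes that each link weight is independently kept or zeroed.

```python
import random


def random_inactivation(weights, keep_prob, seed=0):
    """Zero out each link weight independently with probability 1 - keep_prob.

    Hypothetical illustration of a random inactivation mask; the sampling
    scheme is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)  # seeded so the mask is reproducible
    return [
        [w if rng.random() < keep_prob else 0.0 for w in row]
        for row in weights
    ]


# Illustrative 2x2 link-weight matrix (made-up values).
link_weights = [[0.5, 0.2], [0.1, 0.4]]
sparse_weights = random_inactivation(link_weights, keep_prob=0.5, seed=42)
```

With `keep_prob=1.0` every connection survives, and with `keep_prob=0.0` every connection is closed; values in between trade connectivity for computation.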

preprint · 2024 · arXiv

Timelike entanglement entropy and $T\bar{T}$ deformation

In a previous work arXiv:1811.07758 about the $T\bar{T}$ deformed CFT$_2$, from the consistency requirement of the entanglement entropy theory, we found that in addition to the usual spacelike entanglement entropy, a timelike entanglement entropy must be introduced and treated equally. Inspired by the recent explicit constructions of the timelike entanglement entropy and its bulk dual, we provide a comprehensive analysis of the timelike and spacelike entanglement entropies in the $T\bar{T}$ deformed finite size system and finite temperature system. The results confirm our prediction that in the finite size system only the timelike entanglement entropy receives a correction, while in the finite temperature system only the usual spacelike entanglement entropy gets a correction. These findings affirm the necessity of a complete measure including both spacelike and timelike entanglement entropies.

preprint · 2024 · arXiv

Timelike entanglement entropy in dS$_3$/CFT$_2$

In the context of dS$_3$/CFT$_2$, we propose a timelike entanglement entropy defined by the renormalization group flow. This timelike entanglement entropy is calculated in CFT by using the Callan-Symanzik equation. We find an exact match between this entanglement entropy and the length of a timelike geodesic connecting two different spacelike surfaces in dS$_3$. The counterpart of this entanglement entropy in AdS$_3$ is a spacelike one, which is also induced by RG flow and extends all the way into the bulk of AdS$_3$. As a result, in both AdS$_3$/CFT$_2$ and dS$_3$/CFT$_2$, there exist exactly three entanglement entropies, providing precisely sufficient information to reconstruct the three-dimensional bulk geometry.

preprint · 2022 · arXiv

AutoBERT-Zero: Evolving BERT Backbone from Scratch

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking manually designed global self-attention layers, which introduces inductive bias and thus leads to sub-optimal performance. In this work, we make the first attempt to automatically discover a novel pre-trained language model (PLM) backbone from scratch on a flexible search space containing the most fundamental operations. Specifically, we propose a well-designed search space which (i) contains primitive math operations at the intra-layer level to explore novel attention structures, and (ii) leverages convolution blocks as a supplement to attention at the inter-layer level to better learn local dependency. To enhance the efficiency of finding promising architectures, we propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm, which optimizes both the search algorithm and the evaluation of candidate models. Specifically, we propose an Operation-Priority (OP) evolution strategy to facilitate model search by balancing exploration and exploitation. Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation. Extensive experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks, proving the architecture's transfer and scaling abilities. Remarkably, AutoBERT-Zero-base outperforms RoBERTa-base (using much more data) and BERT-large (with much larger model size), with scores 2.4 and 1.4 higher on the GLUE test set.

preprint · 2022 · arXiv

Boosting Graph Structure Learning with Dummy Nodes

With the development of graph kernels and graph representation learning, many superior methods have been proposed to handle scalability and oversmoothing issues in graph structure learning. However, most of those strategies are designed based on practical experience rather than theoretical analysis. In this paper, we use a particular dummy node connecting to all existing vertices without affecting original vertex and edge properties. We further prove that such a dummy node can help build an efficient monomorphic edge-to-vertex transform and an epimorphic inverse to recover the original graph. It also indicates that adding dummy nodes can preserve local and global structures for better graph representation learning. We extend graph kernels and graph neural networks with dummy nodes and conduct experiments on graph classification and subgraph isomorphism matching tasks. Empirical results demonstrate that taking graphs with dummy nodes as input significantly boosts graph structure learning, and using their edge-to-vertex graphs can also achieve similar results. We also discuss the gain of expressive power from the dummy node in neural networks.
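The construction in this abstract, one dummy node joined to every existing vertex, can be sketched on a plain adjacency dictionary. This is an illustration only, not the authors' code: the function names, the `"_dummy"` label, and the triangle example are hypothetical, and the sketch assumes an undirected graph in which every vertex appears as a key. It does not implement the edge-to-vertex transform itself.

```python
def add_dummy_node(adjacency, dummy="_dummy"):
    """Return a copy of an undirected graph with one extra node linked to every vertex."""
    if dummy in adjacency:
        raise ValueError("dummy label collides with an existing vertex")
    # Copy the neighbor sets so the original graph is left untouched.
    augmented = {v: set(neighbors) for v, neighbors in adjacency.items()}
    augmented[dummy] = set(adjacency)  # dummy is adjacent to all original vertices
    for v in adjacency:
        augmented[v].add(dummy)
    return augmented


def remove_dummy_node(augmented, dummy="_dummy"):
    """Drop the dummy node and its edges, recovering the original graph."""
    return {v: neighbors - {dummy} for v, neighbors in augmented.items() if v != dummy}


# Made-up example graph: a triangle on vertices a, b, c.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
augmented = add_dummy_node(triangle)
```

Removing the dummy node returns exactly the input graph, which is the sense in which the original vertex and edge structure is left unaffected in this toy version.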

preprint · 2022 · arXiv

CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems

As labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge in practice is to learn different tasks with the least amount of labeled data. Recently, prompting methods over pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. To better utilize the power of PLMs, this paper proposes Comprehensive Instruction (CINS) that exploits PLMs with extra task-specific instructions. We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, i.e. intent classification, dialog state tracking, and natural language generation. A sequence-to-sequence model (T5) is adopted to solve these three tasks in a unified framework. Extensive experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data. Empirical results demonstrate that the proposed CINS approach consistently improves techniques that finetune PLMs with raw input or short prompts.

preprint · 2022 · arXiv

CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant latency, especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD. We utilize a conv-transformer structure to encode input speech in a streaming manner. A coupled cross-attention (CoCA) mechanism is proposed to integrate frame-level acoustic features with encoded reference linguistic features. CoCA also enables our model to perform mispronunciation classification with whole utterances. The proposed model allows system fusion between the streaming output and mispronunciation classification output for further performance enhancement. We evaluate CoCA-MDD on publicly available corpora. CoCA-MDD achieves F1 scores of 57.03% and 60.78% for streaming and fusion modes respectively on L2-ARCTIC. For phone-level pronunciation scoring, CoCA-MDD achieves a Pearson correlation coefficient (PCC) of 0.58 on SpeechOcean762.

preprint · 2022 · arXiv

Compilable Neural Code Generation with Compiler Feedback

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in decoder, or driven by pre-trained language models on large-scale code corpus (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for compilability of the generated programs. To improve compilability of the generated programs, this paper proposes COMPCODER, a three-stage pipeline utilizing compiler feedback for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, respectively, when comparing with the state-of-the-art CodeGPT.
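The idea of using compiler feedback to judge generated programs can be illustrated with a minimal sketch. It is not COMPCODER's pipeline: it uses Python's built-in `compile` as a stand-in for whatever compiler the authors used, and the helper names and candidate strings are hypothetical. The sketch covers only a yes/no compilability signal, not the fine-tuning, reinforcement, or discrimination stages.

```python
def compiles(source):
    """Return True if Python can parse and byte-compile the source string.

    compile() only builds a code object; the candidate is never executed.
    """
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def filter_compilable(candidates):
    """Keep only the candidate programs that pass the compilability check."""
    return [c for c in candidates if compiles(c)]


# Made-up candidates standing in for model outputs; the second has a syntax error.
candidates = ["x = 1 + 2", "def f(:\n    pass", "print('ok')"]
kept = filter_compilable(candidates)
```

A boolean signal like this could serve as a reward or as a filter over sampled outputs; which of those roles it plays in the paper's three stages is not something this sketch tries to reproduce.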

preprint · 2022 · arXiv

Compression of Generative Pre-trained Language Models via Quantization

The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the demand for model compression. Despite various methods to compress BERT or its variants, there are few attempts to compress generative PLMs, and the underlying difficulty remains unclear. In this paper, we compress generative PLMs by quantization. We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and the varied distribution of weights. Correspondingly, we propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules. Empirical results on various tasks show that our proposed method outperforms the state-of-the-art compression methods on generative PLMs by a clear margin. With comparable performance with the full-precision models, we achieve 14.4x and 13.4x compression rates on GPT-2 and BART, respectively.

preprint · 2022 · arXiv

DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling

Incorporating lexical knowledge into deep learning models has been proved to be very effective for sequence labeling tasks. However, previous works commonly have difficulty dealing with large-scale dynamic lexicons which often cause excessive matching noise and problems of frequent updates. In this paper, we propose DyLex, a plug-in lexicon incorporation approach for BERT based sequence labeling tasks. Instead of leveraging embeddings of words in the lexicon as in conventional methods, we adopt word-agnostic tag embeddings to avoid re-training the representation while updating the lexicon. Moreover, we employ an effective supervised lexical knowledge denoising method to smooth out matching noise. Finally, we introduce a col-wise attention based knowledge fusion mechanism to guarantee the pluggability of the proposed framework. Experiments on ten datasets of three tasks show that the proposed framework achieves new SOTA, even with very large scale lexicons.

preprint · 2022 · arXiv

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

The recent large-scale vision-language pre-training (VLP) of dual-stream architectures (e.g., CLIP) with a tremendous amount of image-text pair data, has shown its superiority on various multimodal alignment tasks. Despite its success, the resulting models are not capable of multimodal generative tasks due to the weak text encoder. To tackle this problem, we propose to augment the dual-stream VLP model with a textual pre-trained language model (PLM) via vision-language knowledge distillation (VLKD), enabling the capability for multimodal generation. VLKD is pretty data- and computation-efficient compared to the pre-training from scratch. Experimental results show that the resulting model has strong zero-shot performance on multimodal generation tasks, such as open-ended visual question answering and image captioning. For example, it achieves 44.5% zero-shot accuracy on the VQAv2 dataset, surpassing the previous state-of-the-art zero-shot model with $7\times$ fewer parameters. Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

preprint · 2022 · arXiv

Exploring Extreme Parameter Compression for Pre-trained Language Models

Recent work explored the potential of large-scale Transformer-based pre-trained models, especially Pre-trained Language Models (PLMs) in natural language processing. This raises many concerns from various perspectives, e.g., financial costs and carbon emissions. Compressing PLMs like BERT with negligible performance loss for faster inference and cheaper deployment has attracted much attention. In this work, we aim to explore larger compression ratios for PLMs, among which tensor decomposition is a potential but under-investigated one. Two decomposition and reconstruction protocols are further proposed to improve the effectiveness and efficiency during compression. Our compressed BERT with $1/7$ of the parameters in Transformer layers performs on par with, and sometimes slightly better than, the original BERT on the GLUE benchmark. A tiny version achieves $96.7\%$ of the performance of BERT-base with $1/48$ of the encoder parameters (i.e., less than 2M parameters excluding the embedding layer) and is $2.7\times$ faster at inference. To show that the proposed method is orthogonal to existing compression methods like knowledge distillation, we also explore the benefit of the proposed method on a distilled BERT.

preprint · 2022 · arXiv

Gravitational Lensing by Black Holes with Multiple Photon Spheres

We study gravitational lensing of light by hairy black holes, which, in a certain parameter regime, can possess two photon spheres of different size outside the event horizon. In particular, we focus on higher-order images of a point-like light source and a luminous celestial sphere produced by strong gravitational lensing near photon spheres. Two photon spheres usually triple the number of high-order images of a point-like light source. When a hairy black hole is illuminated by a celestial sphere, two photon spheres would give rise to two critical curves in the black hole image, and the smaller critical curve coincides with the shadow edge. In addition to a set of higher-order images of the celestial sphere outside the shadow edge, two more sets of higher-order images are observed inside and outside the larger critical curve, respectively.

preprint · 2022 · arXiv

How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis

Recently, there has been a trend to investigate the factual knowledge captured by Pre-trained Language Models (PLMs). Many works show the PLMs' ability to fill in the missing factual words in cloze-style prompts such as "Dante was born in [MASK]." However, it is still a mystery how PLMs generate the results correctly: relying on effective clues or shortcut patterns? We try to answer this question by a causal-inspired analysis that quantitatively measures and evaluates the word-level patterns that PLMs depend on to generate the missing words. We check the words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurred. Our analysis shows: (1) PLMs generate the missing factual words more by the positionally close and highly co-occurred words than the knowledge-dependent words; (2) the dependence on the knowledge-dependent words is more effective than the positionally close and highly co-occurred words. Accordingly, we conclude that the PLMs capture the factual knowledge ineffectively because of depending on the inadequate associations.

preprint · 2022 · arXiv

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR). However, there still remains a large discrepancy between the provided upstream signals and the downstream question-passage relevance, which leads to less improvement. To bridge this gap, we propose the HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents. We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training that better facilitate downstream passage retrieval. We investigate the effectiveness of our approach across a wide range of open-domain QA datasets under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The experiments show our HLP outperforms the BM25 by up to 7 points as well as other pre-training methods by more than 10 points in terms of top-20 retrieval accuracy under the zero-shot scenario. Furthermore, HLP significantly outperforms other pre-training methods under the other scenarios.

preprint · 2022 · arXiv

HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks

The workflow of pretraining and fine-tuning has emerged as a popular paradigm for solving various NLP and V&L (Vision-and-Language) downstream tasks. With the capacity of pretrained models growing rapidly, how to perform parameter-efficient fine-tuning has become fairly important for quick transfer learning and deployment. In this paper, we design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks. In particular, we use a shared hypernetwork that takes trainable hyper-embeddings as input, and outputs weights for fine-tuning different small modules in a pretrained language model, such as tuning the parameters inserted into multi-head attention blocks (i.e., prefix-tuning) and feed-forward blocks (i.e., adapter-tuning). We define a set of embeddings (e.g., layer, block, task and visual embeddings) as the key components to calculate hyper-embeddings, which thus can support both pure language and V&L tasks. Our proposed framework adds fewer trainable parameters in multi-task learning while achieving superior performances and transfer ability compared to state-of-the-art methods. Empirical results on the GLUE benchmark and multiple V&L tasks confirm the effectiveness of our framework on both textual and visual modalities.

preprint · 2022 · arXiv

JABER and SABER: Junior and Senior Arabic BERt

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER and SABER, Junior and Senior Arabic BERt respectively, our pre-trained language model prototypes dedicated to Arabic. We conduct an empirical study to systematically evaluate the performance of models across a diverse set of existing Arabic NLU tasks. Experimental results show that JABER and SABER achieve state-of-the-art performance on ALUE, a new benchmark for Arabic Language Understanding Evaluation, as well as on a well-established NER benchmark.

preprint · 2022 · arXiv

LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework

Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream task training data. Training PLMs has incurred significant cost, but utilizing the few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: How to make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be utilized to train models that solve the task well and are small enough to be deployable in practical scenarios. Active learning is integrated into LMTurk to reduce the amount of queries made to PLMs, minimizing the computational cost of running PLM inference passes. Altogether, LMTurk is an important step towards making effective use of current PLMs.
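The abstract says active learning is used to cut the number of queries sent to the PLM but does not name the selection strategy. The sketch below therefore assumes entropy-based uncertainty sampling, a common textbook choice, purely for illustration. The function names, the `budget` parameter, and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return indices of the `budget` examples whose predictions are most uncertain.

    Assumed strategy: rank by predictive entropy, highest first.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


# Made-up class probabilities from a small task model for four unlabeled examples.
small_model_probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
chosen = select_for_annotation(small_model_probs, budget=2)
```

Only the selected examples would then be sent to the expensive few-shot annotator, so the query count is bounded by `budget` per round in this toy setup.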

preprint · 2022 · arXiv

Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation

Real human conversation data are complicated, heterogeneous, and noisy, which makes building open-domain dialogue systems from them a challenging task. In fact, such dialogue data still contain a wealth of information and knowledge; however, they are not fully explored. In this paper, we show that existing open-domain dialogue generation methods, which memorize context-response paired data with autoregressive or encoder-decoder language models, underutilize the training data. Different from current approaches that use external knowledge, we explore a retrieval-generation training framework that can take advantage of the heterogeneous and noisy training data by considering them as "evidence". In particular, we use BERTScore for retrieval, which yields higher-quality evidence and generation. Experiments over publicly available datasets demonstrate that our method can help models generate better responses, even though such training data are usually regarded as low-quality. The performance gain is comparable to, or even better than, that obtained by enlarging the training set. We also found that the model performance has a positive correlation with the relevance of the retrieved evidence. Moreover, our method performed well in zero-shot experiments, which indicates that it can be more robust to real-world data.

preprint · 2022 · arXiv

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on a large pre-trained language model (PLM), PANGU-alpha (Zeng et al., 2021). Different from other pre-trained dialogue models trained over a massive amount of dialogue data from scratch, we aim to build a powerful dialogue model with relatively less data and lower computation costs by inheriting valuable language capabilities and knowledge from PLMs. To this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been shown to perform well on a variety of Chinese natural language tasks. We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety. We show that PanGu-Bot outperforms state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA (Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects. We also demonstrate that PanGu-Bot can be easily deployed to generate emotional responses without further training. Throughout our empirical analysis, we also point out that the PanGu-Bot response quality, knowledge correctness, and safety are still far from perfect, and further explorations are indispensable to building reliable and smart dialogue systems. Our model and code will be available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot soon.

preprint · 2022 · arXiv

PERT: A New Solution to Pinyin to Character Conversion Task

The Pinyin to Character conversion (P2C) task is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. It is usually treated as a sequence labelling task and solved with a language model, i.e., an n-gram model or an RNN. However, the low capacity of the n-gram model or RNN limits its performance. This paper introduces a new solution named PERT, which stands for bidirectional Pinyin Encoder Representations from Transformers. It achieves a significant performance improvement over baselines. Furthermore, we combine PERT with an n-gram model under a Markov framework and improve performance further. Lastly, an external lexicon is incorporated into PERT to resolve the OOD issue of IME.

preprint · 2022 · arXiv

Read before Generate! Faithful Long Form Question Answering with Machine Reading

Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question. While current work on LFQA using large pre-trained models for generation is effective at producing fluent and somewhat relevant content, one primary challenge lies in how to generate a faithful answer that has less hallucinated content. We propose a new end-to-end framework that jointly models answer generation and machine reading. The key idea is to augment the generation model with fine-grained, answer-related salient information, which can be viewed as an emphasis on faithful facts. State-of-the-art results on two LFQA datasets, ELI5 and MS MARCO, demonstrate the effectiveness of our method in comparison with strong baselines on automatic and human evaluation metrics. A detailed analysis further proves the competency of our methods in generating fluent, relevant, and more faithful answers.

preprint2022arXiv

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

There is a growing body of work in recent years on developing pre-trained language models (PLMs) for the Arabic language. This work addresses two major problems in existing Arabic PLMs that constrain progress in the Arabic NLU and NLG fields. First, existing Arabic PLMs are not well explored, and their pre-training can be improved significantly using a more methodical approach. Second, there is a lack of systematic and reproducible evaluation of these models in the literature. In this work, we revisit both the pre-training and evaluation of Arabic PLMs. In terms of pre-training, we explore improving Arabic LMs from three perspectives: quality of the pre-training data, size of the model, and incorporation of character-level information. As a result, we release three new Arabic BERT-style models (JABER, Char-JABER, and SABER) and two T5-style models (AT5S and AT5B). In terms of evaluation, we conduct a comprehensive empirical study to systematically evaluate the performance of existing state-of-the-art models on ALUE, a leaderboard-powered benchmark for Arabic NLU tasks, and on a subset of the ARGEN benchmark for Arabic NLG tasks. We show that our models significantly outperform existing Arabic PLMs and achieve new state-of-the-art performance on discriminative and generative Arabic NLU and NLG tasks. Our models and the source code to reproduce the results will be made available shortly.

preprint2022arXiv

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

We introduce a new approach for speech pre-training named SPIRAL, which works by learning a denoising representation of perturbed data in a teacher-student framework. Specifically, given a speech utterance, we first feed the utterance to a teacher network to obtain the corresponding representation. Then the same utterance is perturbed and fed to a student network. The student network is trained to output a representation resembling that of the teacher. At the same time, the teacher network is updated as a moving average of the student's weights over training steps. In order to prevent representation collapse, we apply an in-utterance contrastive loss as the pre-training objective and impose position randomization on the input to the teacher. SPIRAL achieves competitive or better results compared to the state-of-the-art speech pre-training method wav2vec 2.0, with a significant reduction in training cost (80% for the BASE model, 65% for the LARGE model). Furthermore, we address the problem of noise robustness, which is critical to real-world speech applications. We propose multi-condition pre-training by perturbing the student's input with various types of additive noise. We demonstrate that multi-condition pre-trained SPIRAL models are more robust to noisy speech (9.0% - 13.3% relative word error rate reduction on real noisy test data) compared to applying multi-condition training solely in the fine-tuning stage. Source code is available at https://github.com/huawei-noah/Speech-Backbones/tree/main/SPIRAL.

preprint2022arXiv

UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation

With the rapid increase of multimedia data, a large body of literature has emerged on multimodal summarization, the majority of which aims at refining salient information from textual and visual modalities to output a pictorial summary with the most relevant images. Existing methods mostly focus on either extractive or abstractive summarization and rely on qualified image captions to build image references. We are the first to propose a Unified framework for Multimodal Summarization grounded on BART, UniMS, that integrates extractive and abstractive objectives as well as image output selection. Specifically, we adopt knowledge distillation from a vision-language pretrained model to improve image selection, which avoids any requirement on the existence and quality of image captions. In addition, we introduce a visual guided decoder to better integrate textual and visual modalities in guiding abstractive text generation. Results show that our best model achieves a new state-of-the-art result on a large-scale benchmark dataset. The newly introduced extractive objective and the knowledge distillation technique are shown to bring a noticeable improvement to the multimodal summarization task.

preprint2021arXiv

Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations

We construct a Virtual Kinematic Chain (VKC) that readily consolidates the kinematics of the mobile base, the arm, and the object to be manipulated in mobile manipulations. Accordingly, a mobile manipulation task is represented by altering the state of the constructed VKC, which can be converted to a motion planning problem, formulated, and solved by trajectory optimization. This new VKC perspective of mobile manipulation allows a service robot to (i) produce well-coordinated motions, suitable for complex household environments, and (ii) perform intricate multi-step tasks while interacting with multiple objects without an explicit definition of intermediate goals. In simulated experiments, we validate these advantages by comparing the VKC-based approach with baselines that solely optimize individual components. The results show that VKC-based joint modeling and planning improve task success rates and produce more efficient trajectories.

preprint2021arXiv

Personalized Graph Neural Networks with Attention Mechanism for Session-Aware Recommendation

The problem of session-aware recommendation aims to predict users' next click based on their current session and historical sessions. Existing session-aware recommendation methods have defects in capturing complex item transition relationships. Moreover, most of them fail to explicitly distinguish the effects of different historical sessions on the current session. To this end, we propose a novel method, named Personalized Graph Neural Networks with Attention Mechanism, or A-PGNN for brevity. A-PGNN mainly consists of two components. One is the Personalized Graph Neural Network (PGNN), which extracts the personalized structural information in each user behavior graph and, unlike the traditional Graph Neural Network (GNN) model, considers the role of the user when the node embedding is updated. The other is the Dot-Product Attention mechanism, which draws on the Transformer net to explicitly model the effect of historical sessions on the current session. Extensive experiments conducted on two real-world data sets show that A-PGNN clearly outperforms the state-of-the-art personalized session-aware recommendation methods.

preprint2020arXiv

An Investigation of Few-Shot Learning in Spoken Term Classification

In this paper, we investigate the feasibility of applying few-shot learning algorithms to a speech task. We formulate a user-defined scenario of spoken term classification as a few-shot learning problem. In most few-shot learning studies, it is assumed that all the N classes are new in an N-way problem. We suggest that this assumption can be relaxed and define an N+M-way problem, where N and M are the numbers of new classes and fixed classes, respectively. We propose a modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem. Experiments on the Google Speech Commands dataset show that our approach outperforms the conventional supervised learning approach and the original MAML.

preprint2020arXiv

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Collecting supporting evidence from large corpora of text (e.g., Wikipedia) is a great challenge for open-domain Question Answering (QA). In particular, for multi-hop open-domain QA, scattered pieces of evidence need to be gathered together to support answer extraction. In this paper, we propose a new retrieval target, the hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering. Specifically, the hop in this paper is defined as the combination of a hyperlink and the corresponding outbound link document. The hyperlink is encoded as the mention embedding, which models the structured knowledge of how the outbound link entity is mentioned in the textual context, and the corresponding outbound link document is encoded as the document embedding, representing the unstructured knowledge within it. Accordingly, we build HopRetriever, which retrieves hops over Wikipedia to answer complex questions. Experiments on the HotpotQA dataset demonstrate that HopRetriever outperforms previously published evidence retrieval methods by large margins. Moreover, our approach also yields quantifiable interpretations of the evidence collection process.

preprint2020arXiv

Learning to Detect Unacceptable Machine Translations for Downstream Tasks

The field of machine translation has progressed tremendously in recent years. Even though the translation quality has improved significantly, current systems are still unable to produce uniformly acceptable machine translations for the variety of possible use cases. In this work, we put machine translation in a cross-lingual pipeline and introduce downstream tasks to define task-specific acceptability of machine translations. This allows us to leverage parallel data to automatically generate acceptability annotations on a large scale, which in turn help to learn acceptability detectors for the downstream tasks. We conduct experiments to demonstrate the effectiveness of our framework for a range of downstream tasks and translation models.

preprint2020arXiv

Neural Subgraph Isomorphism Counting

In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms. Unlike traditional graph learning problems such as node classification and link prediction, subgraph isomorphism counting is NP-complete and requires more global inference to oversee the whole graph. To make it scalable for large-scale graphs and patterns, we propose a learning framework which augments different representation learning architectures and iteratively attends to pattern and target data graphs to memorize subgraph isomorphisms for the global counting. We develop both small-graph (<= 1,024 subgraph isomorphisms in each) and large-graph (<= 4,096 subgraph isomorphisms in each) sets to evaluate different models. A mutagenic compound dataset, MUTAG, is also used to evaluate neural models and demonstrate the success of transfer learning. While the learning-based approach is inexact, we are able to generalize to count large patterns and data graphs in linear time, compared to the exponential time of the original NP-complete problem. Experimental results show that learning-based subgraph isomorphism counting can speed up the traditional algorithm, VF2, by 10-1,000 times with acceptable errors. Domain adaptation based on fine-tuning also shows the usefulness of our approach in real-world applications.

preprint2020arXiv

On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification

Implicit discourse relation classification is one of the most difficult parts of shallow discourse parsing, as relation prediction without explicit connectives requires language understanding at both the text span level and the sentence level. Previous studies mainly focus on the interactions between two arguments. We argue that a powerful contextualized representation module, a bilateral multi-perspective matching module, and a global information fusion module are all important to implicit discourse analysis. We propose a novel model that combines these modules. Extensive experiments show that our proposed model outperforms BERT and other state-of-the-art systems by around 8% on the PDTB dataset and by around 16% on the CoNLL 2016 datasets. We also analyze the effectiveness of different modules in the implicit discourse relation classification task and demonstrate how different levels of representation learning can affect the results.

preprint2020arXiv

Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order

Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications over traditional unidirectional generation. In addition, the pretrained u-PMLM also outperforms BERT on a set of downstream NLU tasks.

preprint2020arXiv

Progressive Memory Banks for Incremental Domain Adaptation

This paper addresses the problem of incremental domain adaptation (IDA) in natural language processing (NLP). We assume that domains arrive one after another and that we can only access data in the current domain. The goal of IDA is to build a unified model that performs well on all the domains we have encountered. We adopt the recurrent neural network (RNN) widely used in NLP, but augment it with a directly parameterized memory bank, which is retrieved by an attention mechanism at each step of the RNN transition. The memory bank provides a natural way of performing IDA: when adapting our model to a new domain, we progressively add new slots to the memory bank, which increases the number of parameters and thus the model capacity. We learn the new memory slots and fine-tune existing parameters by back-propagation. Experimental results show that our approach achieves significantly better performance than fine-tuning alone. Compared with expanding hidden states, our approach is more robust for old domains, as shown by both empirical and theoretical results. Our model also outperforms previous work on IDA, including elastic weight consolidation and progressive neural networks, in the experiments.

preprint2020arXiv

Unsupervised Text Generation by Learning from Search

In this work, we present TGLS, a novel framework for unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results and meanwhile smooths out the noise of the search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, paraphrase generation and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. In particular, it achieves performance comparable to state-of-the-art supervised methods in paraphrase generation.

preprint2019arXiv

Towards third-order parametric down-conversion in optical fibers

Optical fibers have been considered an optimal platform for third-order parametric down-conversion since they can potentially overcome the weak third-order nonlinearity by their long interaction length. Here we present, in the first part, a theoretical derivation for the conversion rate both in the case of spontaneous generation and in the presence of a seed beam. Then we review three types of optical fibers and we examine their properties in terms of conversion efficiency and practical feasibility.

preprint2016arXiv

Coherence and incoherence collective behavior in financial market

Financial markets have been extensively studied as highly complex evolving systems. In this paper, we quantify financial price fluctuations through a coupled dynamical system composed of phase oscillators. We find that a Financial Coherence and Incoherence (FCI) coexistence collective behavior emerges as the system evolves into the stable state, in which the stocks split into two groups: one is represented by coherent, phase-locked oscillators, and the other is composed of incoherent, drifting oscillators. It is demonstrated that the size of the coherent stock groups fluctuates during the economic periods according to real-world financial instabilities or shocks. Further, we introduce the coherent characteristic matrix to characterize the involvement dynamics of stocks in the coherent groups. Clustering results on the matrix provide a novel manifestation of the correlations among stocks in the economic periods. Our analysis of the components of the groups is consistent with the Global Industry Classification Standard (GICS) classification and can also identify features of newly developed industries. These results have potential implications for characterizing the inner dynamical structure of financial markets and for devising optimal investment strategies.

preprint2016arXiv

Incorporating Semantic Knowledge into Latent Matching Model in Search

The relevance between a query and a document in search can be represented as the matching degree between the two objects. Latent space models, which are often trained with click-through data, have proven effective for the task. One technical challenge with the approach is that it is hard to train a model for tail queries and tail documents, for which there are not enough clicks. In this paper, we propose to address the challenge by learning a latent matching model using not only click-through data but also semantic knowledge. The semantic knowledge can be categories of queries and documents as well as synonyms of words, manually or automatically created. Specifically, we incorporate semantic knowledge into the objective function by including regularization terms. We develop two methods to solve the learning task, based on coordinate descent and gradient descent respectively, which can be employed in different settings. Experimental results on two datasets from an app search engine demonstrate that our model can make effective use of semantic knowledge and thus can significantly enhance the accuracy of latent matching models, particularly for tail queries.

preprint2016arXiv

Neural Generative Question Answering

This paper presents an end-to-end neural network model, named Neural Generative Question Answering (GENQA), that can generate answers to simple factoid questions based on the facts in a knowledge-base. More specifically, the model is built on the encoder-decoder framework for sequence-to-sequence learning, is equipped with the ability to query the knowledge-base, and is trained on a corpus of question-answer pairs with their associated triples in the knowledge-base. An empirical study shows that the proposed model can effectively deal with the variations of questions and answers, and can generate correct and natural answers by referring to the facts in the knowledge-base. The experiment on question answering demonstrates that the proposed model can outperform an embedding-based QA model as well as a neural dialogue model trained on the same data.

preprint2016arXiv

Online Data Thinning via Multi-Subspace Tracking

In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices associated with the Gaussian components are associated with low-rank models. According to this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. Furthermore, the proposed method allows subsampling, is robust to missing data, and uses a mini-batch online optimization approach. The resulting algorithms are scalable, efficient, and capable of operating in real time. Experiments on wide-area motion imagery and e-mail databases illustrate the efficacy of the proposed approach.

preprint2016arXiv

Symmetry-broken states on networks of coupled oscillators

When identical oscillators are coupled together in a network, dynamical steady states are often assumed to reflect network symmetries. Here we show that alternative persistent states may also exist that break the symmetries of the underlying coupling network. We further show that these symmetry-broken coexistent states are analogous to those dubbed "chimera states," which can occur when identical oscillators are coupled to one another in identical ways.

preprint2015arXiv

Contagion processes on the static and activity driven coupling networks

The evolution of network structure and the spreading of epidemics are common coexistent dynamical processes. In most cases, network structure is treated as either static or time-varying, supposing the whole network is observed in the same time window. In this paper, we consider epidemic spreading on a network consisting of both static and time-varying structures. Meanwhile, the time-varying part and the epidemic spreading are supposed to be of the same time scale. We introduce a static and activity driven coupling (SADC) network model to characterize the coupling between static (strong) structure and dynamic (weak) structure. Epidemic thresholds of the SIS and SIR models are studied on SADC both analytically and numerically with various coupling strategies, where the strong structure has a homogeneous or heterogeneous degree distribution. Theoretical thresholds obtained from the SADC model can both recover and generalize the classical results for static and time-varying networks. It is demonstrated that weak structures can make epidemics break out much more easily in homogeneous coupling but with more difficulty in heterogeneous coupling when keeping the same average degree in SADC networks. Furthermore, we show that there exists a threshold ratio for the weak structure to have substantive effects on the outbreak of epidemics. This promotes our understanding of why epidemics can still break out in some social networks even when we restrict the flow of the population.

preprint2015arXiv

Role of transparency of platinum-ferromagnet interface in determining intrinsic magnitude of spin Hall effect

The spin Hall effect (SHE) converts charge current to pure spin currents in orthogonal directions in materials that have significant spin-orbit coupling. The efficiency of the conversion is described by the spin Hall angle (SHA). The SHA can most readily be inferred by using the generated spin currents to excite or rotate the magnetization of ferromagnetic films or nano-elements via spin-transfer torques. Some of the largest spin torque derived spin Hall angles (ST-SHA) have been reported in platinum. Here we show, using spin torque ferromagnetic resonance (ST-FMR) measurements, that the transparency of the Pt-ferromagnet interface to the spin current plays a central role in determining the magnitude of the ST-SHA. We measure a much larger ST-SHA in Pt/cobalt (~0.11) compared to Pt/permalloy (~0.05) bilayers when the interfaces are assumed to be completely transparent. Taking into account the transparency of these interfaces, as derived from spin-mixing conductances, we find that the intrinsic SHA in platinum has a much higher value of 0.19 ± 0.04 as compared to the ST-SHA. The importance of the interface transparency is further exemplified by the insertion of atomically thin magnetic layers at the Pt/permalloy interface that we show strongly modulates the magnitude of the ST-SHA.

preprint2014arXiv

A unified phase transition picture of the charged topological black hole in Horava-Lifshitz gravity

Aiming at a unified phase transition picture of the charged topological black hole in Hořava-Lifshitz gravity, we investigate this issue not only in the canonical ensemble with fixed charge but also in the grand-canonical ensemble with fixed potential. We first perform the standard analysis of the specific heat, the free energy and the Gibbs potential, and then study its geometrothermodynamics. It is shown that the local phase transition points witness not only the divergence of the specific heat, but also the minimum temperature and the maximum free energy or Gibbs potential. They also witness the divergence of the corresponding thermodynamic scalar curvature. No matter which ensemble is chosen, the metric constructed can successfully reproduce the behavior of the thermodynamic interaction and the phase transition structure, whereas other metrics in the earlier literature failed to predict the phase transition point of the charged topological black hole. In the grand-canonical ensemble, we have discovered a phase transition which has not been reported before. It is similar to that of the canonical ensemble in that the phase transition only takes place when $k=-1$. But it also has the unique characteristic that the location of the phase transition point depends on the value of the potential, which is different from the canonical ensemble, where the phase transition point is independent of the parameters. After an analytical check of the Ehrenfest scheme, we find that the new phase transition is a second order one. It is also found that the thermodynamics of the black hole in Hořava-Lifshitz gravity is quite different from that in Einstein gravity.

preprint2014arXiv

Minimax Optimal Rates for Poisson Inverse Problems with Physical Constraints

This paper considers fundamental limits for solving sparse inverse problems in the presence of Poisson noise with physical constraints. Such problems arise in a variety of applications, including photon-limited imaging systems based on compressed sensing. Most prior theoretical results in compressed sensing and related inverse problems apply to idealized settings where the noise is i.i.d., and do not account for signal-dependent noise and physical sensing constraints. Prior results on Poisson compressed sensing with signal-dependent noise and physical constraints provided upper bounds on mean squared error performance for a specific class of estimators. However, it was unknown whether those bounds were tight or if other estimators could achieve significantly better performance. This work provides minimax lower bounds on mean-squared error for sparse Poisson inverse problems under physical constraints. Our lower bounds are complemented by minimax upper bounds. Our upper and lower bounds reveal that due to the interplay between the Poisson noise model, the sparsity constraint and the physical constraints: (i) the mean-squared error does not depend on the sample size $n$ other than to ensure the sensing matrix satisfies RIP-like conditions and the intensity $T$ of the input signal plays a critical role; and (ii) the mean-squared error has two distinct regimes, a low-intensity and a high-intensity regime and the transition point from the low-intensity to high-intensity regime depends on the input signal $f^*$. In the low-intensity regime the mean-squared error is independent of $T$ while in the high-intensity regime, the mean-squared error scales as $\frac{s \log p}{T}$, where $s$ is the sparsity level, $p$ is the number of pixels or parameters and $T$ is the signal intensity.

preprint2014arXiv

Rolling, sliding & torsion of micron-sized silica particles - Experimental, numerical and theoretical analysis

The contact mechanics of individual, very small particles with other particles and walls is studied using a nanoindenter setup that allows normal and lateral displacement control and measurement of the respective forces. The sliding, rolling and torsional forces and torques are tested with borosilicate microspheres, featuring radii of about 10 μm. The contacts are with flat silicon substrates of different roughness for pure sliding and rolling and with silicon-based, ion-beam crafted rail systems for combined rolling and torsion. The experimental results are discussed and compared to various analytical predictions and contact models, allowing for two concurrent interpretations of the effects of surface roughness, plasticity and adhesion. This enables us to determine both rolling and torsion friction coefficients together with their associated length scales. Interestingly, even though normal contacts behave elastically (Hertzian), all other modes of motion display effects due to surface roughness and consequent plastic deformation. The influence of adhesion is interpreted in the framework of different models and is very different for different degrees of freedom, being largest for rolling.

preprint2013arXiv

Spin injection and detection in lanthanum- and niobium-doped SrTiO3 using the Hanle technique

There has been much interest in the injection and detection of spin polarized carriers in semiconductors for the purposes of developing novel spintronic devices. Here we report the electrical injection and detection of spin-polarized carriers into Nb-doped strontium titanate (STO) single crystals and La-doped STO epitaxial thin films using MgO tunnel barriers and the three-terminal Hanle technique. Spin lifetimes of up to ~100 ps are measured at room temperature and vary little as the temperature is decreased to low temperatures. However, the mobility of the STO has a strong temperature dependence. This behavior and the carrier doping dependence of the spin lifetime suggest that the spin lifetime is limited by spin-dependent scattering at the MgO/STO interfaces, perhaps related to the formation of doping induced Ti3+. Our results reveal a severe limitation of the three-terminal Hanle technique for measuring spin lifetimes within the interior of the subject material.

preprint2012arXiv

A Hierarchical Bayesian Approach for Aerosol Retrieval Using MISR Data

Atmospheric aerosols can cause serious damage to human health and life expectancy. Using the radiances observed by NASA's Multi-angle Imaging SpectroRadiometer (MISR), the current MISR operational algorithm retrieves Aerosol Optical Depth (AOD) at a spatial resolution of 17.6 km x 17.6 km. A systematic study of aerosols and their impact on public health, especially in highly-populated urban areas, requires a finer-resolution estimate of the spatial distribution of AOD values. We embed MISR's operational weighted least squares criterion and its forward simulations for AOD retrieval in a likelihood framework and further expand it into a Bayesian hierarchical model to adapt to a finer spatial scale of 4.4 km x 4.4 km. To take advantage of AOD's spatial smoothness, our method borrows strength from data at neighboring pixels by postulating a Gaussian Markov Random Field prior for AOD. Our model considers both AOD and aerosol mixing vectors as continuous variables. The inference of AOD and mixing vectors is carried out using Metropolis-within-Gibbs sampling methods. Retrieval uncertainties are quantified by posterior variabilities. We also implement a parallel MCMC algorithm to reduce computational cost. We assess our retrieval performance using ground-based measurements from the AErosol RObotic NETwork (AERONET), a hand-held sunphotometer and satellite images from Google Earth. Based on case studies in the greater Beijing area, China, we show that a 4.4 km resolution can improve the accuracy and coverage of remotely-sensed aerosol retrievals, as well as our understanding of the spatial and seasonal behaviors of aerosols. This improvement is particularly important during high-AOD events, which often indicate severe air pollution.

preprint · 2009 · arXiv

Detecting Structure of Complex Network by Quantum Bosonic Dynamics

We introduce a non-interacting boson model to investigate the topological structure of complex networks. By exactly solving this model, we show that it provides a powerful analytical tool for uncovering important properties of real-world networks. We find that the ground-state degeneracy of this model equals the number of connected components in the network, and that the squared coefficients in the expansion of the ground state give the average time a random walker spends at each node in the infinite-time limit. Furthermore, the first excited state always appears on the largest connected component. To show the practical usefulness of this approach, we also carry out numerical simulations on some concrete complex networks. Our results are completely consistent with previous conclusions derived by graph-theoretic methods.
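Two of the claims in this abstract can be checked numerically on small graphs. The sketch below assumes that the single-particle Hamiltonian is the symmetric normalized graph Laplacian, I - D^(-1/2) A D^(-1/2); the abstract does not give the Hamiltonian, so this choice is an assumption made because it reproduces both stated properties. The function names are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^(-1/2) A D^(-1/2).

    Used here as an assumed stand-in for the single-particle Hamiltonian.
    Requires every node to have at least one edge.
    """
    adj = np.asarray(adj, dtype=float)
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]


def ground_state_degeneracy(adj, tol=1e-9):
    """Count zero-energy levels; this equals the number of connected components."""
    energies = np.linalg.eigvalsh(normalized_laplacian(adj))
    return int(np.sum(np.abs(energies) < tol))


def ground_state_occupation(adj):
    """Squared ground-state coefficients for a connected graph."""
    _, states = np.linalg.eigh(normalized_laplacian(adj))
    return states[:, 0] ** 2            # eigh sorts energies in ascending order


# A 3-node path (0-1-2) plus a separate edge (3-4): two connected components.
two_components = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

# The connected 3-node path on its own.
path = two_components[:3, :3]

if __name__ == "__main__":
    print(ground_state_degeneracy(two_components))
    print(ground_state_occupation(path))
    print(path.sum(axis=1) / path.sum())   # random-walk stationary distribution
```

For the connected path, the squared ground-state coefficients coincide with degree divided by total degree, which is the long-time fraction of steps an unbiased random walker spends at each node. On a disconnected graph the ground states are degenerate, so the occupation check is only meaningful one component at a time.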

preprint · 2009 · arXiv

Thermal-magnetic noise measurement of spin-torque effects on ferromagnetic resonance in MgO-based magnetic tunnel junctions

Thermal-magnetic noise at ferromagnetic resonance (T-FMR) can be used to measure the perpendicular magnetic anisotropy of nanoscale magnetic tunnel junctions (MTJs). For this purpose, T-FMR measurements were conducted with an external magnetic field of up to 14 kOe applied perpendicular to the film surface of MgO-based MTJs under a dc bias. The observed frequency-field relationship suggests that a 20 Å CoFeB free layer has an effective demagnetization field much smaller than the intrinsic bulk value of CoFeB, with 4πM_eff = (6.1 ± 0.3) kOe. This value is consistent with the saturation field obtained from magnetometry measurements on extended films of the same CoFeB thickness. In-plane T-FMR, on the other hand, shows less consistent results for the effective demagnetization field, presumably due to excitations of more complex modes. These experiments suggest that perpendicular T-FMR is preferred for quantitative magnetic characterization of nanoscale MTJs.
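A short sketch of how an effective demagnetization field can be read off a frequency-field relationship. It assumes the standard Kittel relation for a thin film saturated along its normal, f = (γ/2π)(H − 4πM_eff), with γ/2π ≈ 2.8 GHz/kOe (g ≈ 2); the abstract does not state which expression or g-factor the authors used, so both are assumptions. The data points are synthetic, generated from the reported 6.1 kOe purely to show the fitting step, and the function names are hypothetical.

```python
import numpy as np

GAMMA_OVER_2PI = 2.8  # GHz per kOe; assumes g ≈ 2 (not taken from the abstract)


def perpendicular_fmr_frequency(field_koe, four_pi_meff_koe,
                                gamma_over_2pi=GAMMA_OVER_2PI):
    """Kittel relation for a film saturated out of plane (valid for H > 4πM_eff)."""
    field_koe = np.asarray(field_koe, dtype=float)
    return gamma_over_2pi * (field_koe - four_pi_meff_koe)


def fit_four_pi_meff(field_koe, freq_ghz):
    """Fit f = slope * H + intercept and return (4πM_eff, slope).

    The field-axis intercept of the line, -intercept / slope, is 4πM_eff.
    """
    slope, intercept = np.polyfit(field_koe, freq_ghz, 1)
    return -intercept / slope, slope


if __name__ == "__main__":
    fields = np.linspace(8.0, 14.0, 7)                  # kOe, above saturation
    freqs = perpendicular_fmr_frequency(fields, 6.1)    # synthetic, noise-free
    four_pi_meff, slope = fit_four_pi_meff(fields, freqs)
    print(f"4πM_eff ≈ {four_pi_meff:.2f} kOe, γ/2π ≈ {slope:.2f} GHz/kOe")
```

In this geometry the resonance frequency is linear in the applied field, so the fit needs only two parameters. That simplicity is one plausible reason a perpendicular-field measurement gives a cleaner estimate than the in-plane case, where the relation is nonlinear.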

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2403.17512:author:5:xin-jiang

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2512.24157:author:51:xin-jiang

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.16384:author:2:xin-jiang

Imported May 20, 2026 · Synced May 20, 2026

arxiv · confidence 95%

external id: arxiv:2605.09409:author:3:xin-jiang

Imported May 20, 2026 · Synced May 20, 2026

23 works

Qun Liu

Researcher

Qun Liu contributes to research discovery and scholarly infrastructure.

13 works

Lifeng Shang

Researcher

Lifeng Shang contributes to research discovery and scholarly infrastructure.

11 works

Yasheng Wang

Researcher

Yasheng Wang contributes to research discovery and scholarly infrastructure.

5 works

Fei Mi

Researcher

Fei Mi contributes to research discovery and scholarly infrastructure.

55 published items

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

Predictive and feedback signals differently shape the formation of group-level and individualized language representations

ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning

Entanglement Entropy of Conformal Field Theory in All Dimensions

Training Report of TeleChat3-MoE

Random-coupled Neural Network

Timelike entanglement entropy and $T\bar{T}$ deformation

Timelike entanglement entropy in dS$_3$/CFT$_2$

AutoBERT-Zero: Evolving BERT Backbone from Scratch

Boosting Graph Structure Learning with Dummy Nodes

CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems

CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

Compilable Neural Code Generation with Compiler Feedback

Compression of Generative Pre-trained Language Models via Quantization

DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Exploring Extreme Parameter Compression for Pre-trained Language Models

Gravitational Lensing by Black Holes with Multiple Photon Spheres

How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks

JABER and SABER: Junior and Senior Arabic BERt

LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework

Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

PERT: A New Solution to Pinyin to Character Conversion Task

Read before Generate! Faithful Long Form Question Answering with Machine Reading

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation

Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations

Personalized Graph Neural Networks with Attention Mechanism for Session-Aware Recommendation

An Investigation of Few-Shot Learning in Spoken Term Classification

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Learning to Detect Unacceptable Machine Translations for Downstream Tasks

Neural Subgraph Isomorphism Counting

On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification

Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order

Progressive Memory Banks for Incremental Domain Adaptation

Unsupervised Text Generation by Learning from Search

Towards third-order parametric down-conversion in optical fibers

Coherence and incoherence collective behavior in financial market

Incorporating Semantic Knowledge into Latent Matching Model in Search

Neural Generative Question Answering

Online Data Thinning via Multi-Subspace Tracking

Symmetry-broken states on networks of coupled oscillators

Contagion processes on the static and activity driven coupling networks

Role of transparency of platinum-ferromagnet interface in determining intrinsic magnitude of spin Hall effect

A unified phase transition picture of the charged topological black hole in Horava-Lifshitz gravity

Minimax Optimal Rates for Poisson Inverse Problems with Physical Constraints

Rolling, sliding & torsion of micron-sized silica particles - Experimental, numerical and theoretical analysis

Spin injection and detection in lanthanum- and niobium-doped SrTiO3 using the Hanle technique

A Hierarchical Bayesian Approach for Aerosol Retrieval Using MISR Data

Detecting Structure of Complex Network by Quantum Bosonic Dynamics

Thermal-magnetic noise measurement of spin-torque effects on ferromagnetic resonance in MgO-based magnetic tunnel junctions