Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
58 works
0 followers
20 topics
4 close collaborators

Published work

58 published items

preprint, 2026, arXiv

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks. However, they fall short in comprehending context involving multiple images. A primary reason for this shortcoming is that the visual features for each image are encoded individually by frozen encoders before being fed into the LLM backbone, lacking awareness of other images and the multimodal instructions. We term this issue prior-LLM modality isolation and propose a two-phase paradigm, browse-and-concentrate, to enable in-depth multimodal context fusion prior to feeding the features into LLMs. This paradigm initially "browses" through the inputs for essential insights, and then revisits the inputs to "concentrate" on crucial details, guided by these insights, to achieve a more comprehensive understanding of the multimodal inputs. Additionally, we develop training strategies specifically to enhance the understanding of multi-image inputs. Our method markedly boosts the performance on 7 multi-image scenarios, increasing average accuracy by 2.13% and 7.60% against strong MLLM baselines with 3B and 11B LLMs, respectively.

preprint, 2026, arXiv

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

Policy entropy has emerged as a fundamental measure for understanding and controlling exploration in reinforcement learning with verifiable rewards (RLVR) for LLMs. However, existing entropy-aware methods mainly regulate entropy through global objectives, while the token-level mechanism by which sampled policy updates reshape policy entropy remains underexplored. In this work, we develop a theoretical framework of entropy mechanics in RLVR. Our analysis yields a first-order approximation of the entropy change, giving rise to entropy polarity, a signed token-level quantity that predicts how much a sampled update expands or contracts entropy. This analysis further reveals a structural asymmetry: reinforcing frequent high-probability tokens triggers contraction tendencies, whereas expansive tendencies typically require lower-probability samples or stronger distributional correction. Empirically, we show that entropy polarity reliably predicts entropy changes, and that positive and negative polarity branches play complementary roles in preserving exploration while strengthening exploitation. Building on these insights, we propose Polarity-Aware Policy Optimization (PAPO), which preserves both polarity branches and implements entropy control through advantage reweighting. With the empirical entropy trajectory as an online phase signal, PAPO adaptively reallocates optimization pressure between entropy-expanding and entropy-contracting updates. Experiments on mathematical reasoning and agentic benchmarks show that PAPO consistently outperforms competitive baselines, while delivering superior training efficiency and substantial reward improvements.

preprint, 2026, arXiv

EvoRoute: Experience-Driven Self-Routing LLM Agent Systems

Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical, yet underexplored, trade-off. We formalize this challenge as the Agent System Trilemma: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.

preprint, 2026, arXiv

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

As LLM-powered agents are increasingly deployed in edge-cloud environments, personalized memory has become a key enabler of long-term adaptation and user-centric interaction. However, cloud-assisted memory management exposes sensitive user information, while existing privacy protection methods typically rely on aggressive masking that removes task-relevant semantics and consequently degrades memory utility and personalization quality. To address this challenge, we propose MemPrivacy, which identifies privacy-sensitive spans on edge devices, replaces them with semantically structured type-aware placeholders for cloud-side memory processing, and restores the original values locally when needed. By decoupling privacy protection from semantic destruction, MemPrivacy minimizes sensitive data exposure while retaining the information required for effective memory formation and retrieval. We also construct MemPrivacy-Bench for systematic evaluation, a dataset covering 200 users and over 155k privacy instances, and introduce a four-level privacy taxonomy for configurable protection policies. Experiments show that MemPrivacy achieves strong performance in privacy information extraction, substantially surpassing strong general-purpose models such as GPT-5.2 and Gemini-3.1-Pro, while also reducing inference latency. Across multiple widely used memory systems, MemPrivacy limits utility loss to within 1.6%, outperforming baseline masking strategies. Overall, MemPrivacy offers an effective balance between privacy protection and personalized memory utility for edge-cloud agents, enabling secure, practical, and user-transparent deployment.

preprint, 2026, arXiv

Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

Large language models have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque, limiting our ability to inspect, control, and systematically improve them. This opacity motivates a growing body of research in mechanistic interpretability, with sparse autoencoders (SAEs) emerging as one of the most promising tools for decomposing model activations into sparse, interpretable feature representations. We introduce Qwen-Scope, an open-source suite of SAEs built on the Qwen model family, comprising 14 groups of SAEs across 7 model variants from the Qwen3 and Qwen3.5 series, covering both dense and mixture-of-experts architectures. Built on top of these SAEs, we show that SAEs can go beyond post-hoc analysis to serve as practical interfaces for model development along four directions: (i) inference-time steering, where SAE feature directions control language, concepts, and preferences without modifying model weights; (ii) evaluation analysis, where activated SAE features provide a representation-level proxy for benchmark redundancy and capability coverage; (iii) data-centric workflows, where SAE features support multilingual toxicity classification and safety-oriented data synthesis; and (iv) post-training optimization, where SAE-derived signals are incorporated into supervised fine-tuning and reinforcement learning objectives to mitigate undesirable behaviors such as code-switching and repetition. Together, these results demonstrate that SAEs can serve not only as post-hoc analysis tools, but also as reusable representation-level interfaces for diagnosing, controlling, evaluating, and improving large language models. By open-sourcing Qwen-Scope, we aim to support mechanistic research and accelerate practical workflows that connect model internals to downstream behavior.

preprint, 2026, arXiv

STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading

In financial trading, factor models are widely used to price assets and capture excess returns from mispricing. Recently, we have witnessed the rise of variational autoencoder-based latent factor models, which learn latent factors self-adaptively. While these models focus on modeling overall market conditions, they often fail to effectively capture the temporal patterns of individual stocks. Additionally, representing multiple factors as single values simplifies the model but limits its ability to capture complex relationships and dependencies. As a result, the learned factors are of low quality and lack diversity, reducing their effectiveness and robustness across different trading periods. To address these issues, we propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders, named STORM, which extracts features of stocks from temporal and spatial perspectives, then fuses and aligns these features at the fine-grained and semantic level, and represents the factors as multi-dimensional embeddings. The discrete codebooks cluster similar factor embeddings, ensuring orthogonality and diversity, which helps distinguish between different factors and enables factor selection in financial trading. To show the performance of the proposed factor model, we apply it to two downstream experiments: portfolio management on two stock datasets and individual trading tasks on six specific stocks. The extensive experiments demonstrate STORM's flexibility in adapting to downstream tasks and superior performance over baseline models.

preprint, 2026, arXiv

Study the property of $W^{\prime}$ at future $e^-p$ collider

As a strong candidate for new physics beyond the Standard Model, the exotic charged gauge boson $W^{\prime}$ has attracted extensive research interest. In this work we investigate the interactions of the $W^{\prime}$ boson at electron-proton colliders. The processes $e^- u \to ν_e d$ and $e^- u \to e^\pm jjj$ with $t$-channel $W^{\prime}$ exchange are studied. The polarization of the initial-state electrons has a significant impact on the cross sections of the studied processes, while the angular distribution of the final-state leptons serves as an important observable for the interactions of the $W^{\prime}$ boson. In some specific regions of the parameter space, the detectable mass range for the $W^{\prime}$ boson can reach around 10 TeV, and the coupling strength can achieve a precision of approximately 1% relative to the interaction strength of the Standard Model. In particular, the $e^- u \to e^+ jjj$ process is forbidden within the Standard Model, so its observation would constitute important evidence in the search for the Left-Right Symmetric Model.

preprint, 2026, arXiv

ToolRM: Towards Agentic Tool-Use Reward Modeling

Reward models (RMs) play a critical role in aligning large language models (LLMs) with human preferences. Yet in the domain of tool learning, the lack of RMs specifically designed for function-calling tasks has limited progress toward more capable agentic AI. We introduce ToolRM, a family of lightweight reward models tailored for general tool-use scenarios. To build these models, we propose a novel pipeline that constructs high-quality pairwise preference data using rule-based scoring and multidimensional sampling. This yields ToolPref-Pairwise-30K, a diverse, balanced, and challenging preference dataset that supports both generative and discriminative reward modeling. We also introduce TRBench$_{BFCL}$, a benchmark built on the agent evaluation suite BFCL to evaluate RMs on tool calling tasks. Trained on our constructed data, models from the Qwen3-4B/8B series achieve up to 17.94% higher accuracy, substantially outperforming frontier LLMs and RMs in pairwise reward judgments. Beyond training objectives, generative ToolRM generalizes to broader critique tasks, including Best-of-N sampling and self-correction. Experiments on ACEBench highlight its effectiveness and efficiency, enabling inference-time scaling while reducing output token usage by over 66%. Its support for downstream RL training further validates its practical utility. We release data to facilitate future research.

preprint, 2026, arXiv

XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition

Large Language Models (LLMs) are increasingly deployed for knowledge synthesis, yet their capacity for compositional generalization in scientific knowledge remains under-characterized. Existing benchmarks primarily focus on single-turn restricted scenarios, failing to capture the capability boundaries exposed by real-world interactive scientific workflows. To address this, we introduce XDomainBench, a diagnostic benchmark for interactive interdisciplinary scientific reasoning. We formalize the composition order and mixture structure to enable systematic stress-testing from single-discipline to inter-disciplinary, comprising 8,598 interactive sessions across 20 domains and 4 task categories, with 8 realistic trajectory patterns covering difficulty and domain-mixture dynamics, simulating real AI4S scenarios. Large-scale evaluation of LLMs reveals a systematic reasoning collapse as composition order increases, stemming from two root causes: (i) direct difficulty increases induced by domain composition, and (ii) indirect interaction-amplified failures where trajectory patterns trigger error accumulation, reasoning breaks, and domain confusion, ultimately leading to session collapse.

preprint, 2024, arXiv

DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever

Recently, substantial advancements in pre-trained vision-language models have greatly enhanced the capabilities of multi-modal dialog systems. These models have demonstrated significant improvements by fine-tuning on downstream tasks. However, the existing pre-trained models primarily focus on effectively capturing the alignment between vision and language modalities, often ignoring the intricate nature of dialog context. In this paper, we propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval. Specifically, our approach introduces a multi-modal context prompt generator to learn context features which are subsequently distilled into prompts within the pre-trained vision-language model CLIP. Besides, we introduce a domain prompt to mitigate the discrepancy from the downstream dialog data. To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space, with each expert being responsible for one specific retrieval type. Extensive experiments show that DialCLIP achieves state-of-the-art performance on two widely recognized benchmark datasets (i.e., PhotoChat and MMDialog) by tuning a mere 0.04% of the total parameters. These results highlight the efficacy and efficiency of our proposed approach, underscoring its potential to advance the field of multi-modal dialog retrieval.

preprint, 2024, arXiv

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the corresponding markup sequences from the table images. However, existing approaches either count on additional heuristic rules to recover the table structures, or face challenges in capturing long-range dependencies within tables, resulting in increased complexity. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network. Our proposed LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR. Moreover, inspired by the persuasive success of pre-trained models on a number of computer vision and natural language processing tasks, we propose two pre-training tasks to enrich the spatial and logical representations at the feature level of LORE, resulting in an upgraded version called LORE++. The incorporation of pre-training in LORE++ has proven to enjoy significant advantages, leading to a substantial enhancement in terms of accuracy, generalization, and few-shot capability compared to its predecessor. Experiments on standard benchmarks against methods of previous paradigms demonstrate the superiority of LORE++, which highlights the potential and promising prospect of the logical location regression paradigm for TSR.

preprint, 2024, arXiv

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Recently, the strong text creation ability of Large Language Models (LLMs) has given rise to many tools for assisting paper reading or even writing. However, the weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their application scenarios, especially for scientific academic paper writing. In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs. By parsing LaTeX source files of high-quality papers, we carefully build a multi-modal diagram understanding dataset M-Paper. By aligning diagrams in the paper with related paragraphs, we construct professional diagram analysis samples for training and evaluation. M-Paper is the first dataset to support joint comprehension of multiple scientific diagrams, including figures and tables in the format of images or LaTeX code. Besides, to better align the copilot with the user's intention, we introduce the "outline" as the control signal, which could be directly given by the user or revised based on auto-generated ones. Comprehensive experiments with a state-of-the-art Multimodal LLM demonstrate that training on our dataset yields stronger scientific diagram understanding performance, including diagram captioning, diagram analysis, and outline recommendation. The dataset, code, and model are available at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/PaperOwl.

preprint, 2024, arXiv

Unifying Structured Data as Graph for Data-to-Text Pre-Training

Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.

preprint, 2023, arXiv

Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQL queries. To this end, we propose a new architecture GRAPHIX-T5, a mixed model with the standard pre-trained transformer model augmented by some specially-designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large reaches performance superior to the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX). This even outperforms the T5-3B by 1.2% on EM and 1.5% on EX.

preprint, 2022, arXiv

A global analysis of charmless two body hadronic decays for anti-triplet charmed baryons

Recently, the Belle collaboration reported new measurements of branching fractions, including the first observation of two processes, $\mathcal{B}(Ξ_c^0\toΛK^0_S)$ and $\mathcal{B}(Ξ_c^0\toΣ^0 K^0_S)$, and updated data for $\mathcal{B}(Ξ_c^0\toΣ^+ K^-)$. Combined with other known data on charmless two body decays of anti-triplet charmed baryons, a lot of information can be derived with the assistance of $SU(3)$ flavour symmetry. Using $SU(3)$ relations between different decay modes, we can give some predictions based on the new measurements which can be tested with the high luminosity experiments in the future. More interestingly, we find that a global fit is now possible with the addition of new Belle data. In general, there are 18 complex $SU(3)$ invariant amplitudes. We find that a scenario with all amplitudes being real can fit the data well, with a $χ^2/d.o.f$ of only $0.773$. This indicates that neglecting the phases of the amplitudes is a reasonable assumption. When more data become available, one may be able to get more information on the phases of the amplitudes. We give several comments on the features of the global fit regarding the branching fractions, relations between different decays, and decays involving $K^0$ and $\bar K^0$. Many of the unknown branching fractions and polarization asymmetry parameters of anti-triplet charmed baryons for charmless two body decays are predicted to be accessible by experiments at Belle, Belle II, BES-III, and LHCb. The validity of $SU(3)$ for charmless two body hadronic decays can be more accurately tested.

preprint, 2022, arXiv

A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by relational databases. Early text-to-SQL parsing systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task through neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, large pre-trained language models have taken the state of the art of the text-to-SQL parsing task to a new level. In this survey, we present a comprehensive review of deep learning approaches for text-to-SQL parsing. First, we introduce the text-to-SQL parsing corpora, which can be categorized as single-turn and multi-turn. Second, we provide a systematic overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present readers with the challenges faced by text-to-SQL parsing and explore some potential future directions in this field.

preprint, 2022, arXiv

AISHELL-NER: Named Entity Recognition from Chinese Speech

Named Entity Recognition (NER) from speech is among Spoken Language Understanding (SLU) tasks, aiming to extract semantic information from the speech signal. NER from speech is usually made through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR outputs. Recent works have shown the capability of the End-to-End (E2E) approach for NER from English and French speech, which is essentially entity-aware ASR. However, due to the many homophones and polyphones that exist in Chinese, NER from Chinese speech is effectively a more challenging task. In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. Extensive experiments are conducted to explore the performance of several state-of-the-art methods. The results demonstrate that the performance could be improved by combining entity-aware ASR and a pretrained NER tagger, which can be easily applied to the modern SLU pipeline. The dataset is publicly available at github.com/Alibaba-NLP/AISHELL-NER.

preprint, 2022, arXiv

DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition

The MultiCoNER shared task aims at detecting semantically ambiguous and complex named entities in short and low-context settings for multiple languages. The lack of contexts makes the recognition of ambiguous named entities challenging. To alleviate this issue, our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia to provide related context information to the named entity recognition (NER) model. Given an input sentence, our system effectively retrieves related contexts from the knowledge base. The original input sentences are then augmented with such context information, allowing significantly better contextualized token representations to be captured. Our system wins 10 out of 13 tracks in the MultiCoNER shared task.

preprint, 2022, arXiv

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of the WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model that achieves competitive results with autoregressive Transformers without relying on knowledge distillation.

preprint, 2022, arXiv

Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex in telecommunication to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. Besides, we propose semi-supervised learning with multimodal data augmentation to leverage unlabeled data to increase model generalization. Experimental results on three sub-tasks show that the proposed method achieves consistent improvements compared with baselines. We deploy the Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.

preprint, 2022, arXiv

Evaluating Lyman-$α$ Constraints for General Dark-Matter Velocity Distributions: Multiple Scales and Cautionary Tales

The Lyman-$α$ absorption spectrum associated with photons traversing the intergalactic medium allows us to probe the linear matter power spectrum down to relatively small distance scales. Finding ways of accurately evaluating Lyman-$α$ constraints across large classes of candidate models of dark-matter physics is thus of paramount importance. While such constraints have been evaluated for dark-matter models with relatively simple dark-matter velocity distributions, more complex models -- particularly those with dark-matter velocity distributions stretching across multiple scales -- are receiving increasing attention. In this paper, we undertake a study of the Lyman-$α$ constraints associated with general dark-matter velocity distributions. Although these constraints are difficult to evaluate in principle, in practice there exist two ways of recasting them into forms which are easier to evaluate and which therefore allow a more rapid determination of whether a given dark-matter model is ruled in or out. We utilize both of these recasts in order to determine the Lyman-$α$ bounds on different classes of dark-matter velocity distributions. We also develop a general method by which the results of these different recasts can be compared. For relatively simple dark-matter velocity distributions, we find that these two classes of recasts tend to align and give similar results. However, the situation is far more complex for distributions involving multiple velocity scales: while these two recasts continue to yield similar results within certain regions of parameter space, they nevertheless yield dramatically different results within precisely those regions of parameter space which are likely to be phenomenologically relevant. This, then, serves as a cautionary tale regarding the use of such recasts for complex dark-matter velocity distributions.

preprint, 2022, arXiv

Field-free spin-orbit torque-induced switching of perpendicular magnetization at room temperature in WTe2/ferromagnet heterostructures

Spin-orbit torque (SOT) provides an efficient way to achieve charge-to-spin conversion and can switch perpendicular magnetization, which is essential for designing novel energy-efficient spintronic devices. An out-of-plane SOT could directly switch perpendicular magnetization. Encouragingly, field-free perpendicular magnetization switching of a two-dimensional (2D) material WTe2/ferromagnet (FM) bilayer has been reported recently, but the working temperature (200 K) is below room temperature. Here, we report field-free perpendicular magnetization switching carried out at room temperature on a WTe2/Pt/Co/Pt multilayer film. Controlled experiments confirm that the field-free switching is caused by the in-plane antidamping SOT generated in the Pt/Co/Pt multilayer and the out-of-plane SOT generated in the a-axis WTe2 thin film. This work offers a potential method for using spintronic devices made of two-dimensional materials at room temperature.

preprint2022arXiv

First Lattice QCD determination of semileptonic decays of charmed-strange baryons $Ξ_c$

While the standard model is the most successful theory to describe all interactions and constituents in elementary particle physics, it has been constantly examined for over four decades. Weak decays of charm quarks can measure the coupling strength of quarks in different families and serve as an ideal probe for CP violation. As the lowest charm-strange baryons with three different flavors, $Ξ_c$ baryons (made of $csu$ or $csd$) have been extensively studied in experiments at the Large Hadron Collider and in electron-positron collisions. However, the lack of reliable theoretical knowledge has been an unavoidable obstacle. In this work, we use state-of-the-art lattice QCD techniques and generate 2+1 clover fermion ensembles with two lattice spacings, $a=(0.108{\rm fm},0.080{\rm fm})$. We then present the first ab initio lattice QCD determination of the form factors governing $Ξ_{c}\to Ξ\ell^+ν_{\ell}$, analogous to the notable $β$-decay of nuclei. Our theoretical results for decay widths are consistent with and about two times more precise than the latest measurements by the ALICE and Belle collaborations. Together with experimental measurements, we independently determine the quark-mixing matrix element $|V_{cs}|$, which is found to be in good agreement with other determinations.

preprint2022arXiv

Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition

Nested entities, which are observed in many domains due to their compositionality, cannot be easily recognized by the widely used sequence labeling framework. A natural solution is to treat the task as a span classification problem. To learn better span representations and increase classification performance, it is crucial to effectively integrate heterogeneous factors, including inside tokens, boundaries, labels, and related spans, which could contribute to nested entity recognition. To fuse these heterogeneous factors, we propose a novel triaffine mechanism including triaffine attention and scoring. Triaffine attention uses boundaries and labels as queries and uses inside tokens and related spans as keys and values for span representations. Triaffine scoring interacts with boundaries and span representations for classification. Experiments show that our proposed method outperforms previous span-based methods, achieves state-of-the-art $F_1$ scores on the nested NER datasets GENIA and KBP2017, and shows comparable results on ACE2004 and ACE2005.
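As a rough illustration of what "triaffine" means, the sketch below contracts a third-order weight tensor with three vectors to produce one scalar score. The function name `triaffine_score`, the toy dimensions, and the weights are assumptions made for illustration only; the abstract does not specify the paper's actual parameterization (bias terms, per-label tensors, and so on).

```python
def triaffine_score(u, v, w, W):
    """Trilinear form: sum over i, j, k of W[i][j][k] * u[i] * v[j] * w[k].

    u, v, w are plain lists of floats (hypothetical stand-ins for, e.g.,
    two boundary representations and a span representation).
    W is a nested list of shape len(u) x len(v) x len(w).
    """
    return sum(
        W[i][j][k] * u[i] * v[j] * w[k]
        for i in range(len(u))
        for j in range(len(v))
        for k in range(len(w))
    )


# Toy inputs (assumed values, not taken from the paper).
u = [1.0, 2.0]
v = [0.5, -1.0]
w = [2.0, 1.0]
# A "diagonal" tensor: only entries with i == j == k are non-zero.
W = [[[1.0 if i == j == k else 0.0 for k in range(2)]
      for j in range(2)]
     for i in range(2)]

score = triaffine_score(u, v, w, W)
```

The score is linear in each of the three inputs separately, which is the property that distinguishes a triaffine form from a biaffine one that only couples two vectors.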

preprint2022arXiv

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.

preprint2022arXiv

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, thus not fully exploiting the unique characteristic of video, i.e., its temporal nature. In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training framework, HiTeA, with two novel pre-training tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations. In addition, the inherent temporal relations are captured by aligning video-text pairs as a whole at different time resolutions with a multi-modal temporal relation exploration task. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively. HiTeA also demonstrates strong generalization ability when directly transferred to downstream tasks in a zero-shot manner. Models and demo will be available on ModelScope.

preprint2022arXiv

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorporating various techniques into the CNN-RNN encoder-decoder architecture. However, since the CNN and the RNN do not share a basic network component, such a heterogeneous pipeline is hard to train end-to-end, and the visual encoder learns nothing from the caption supervision. This drawback has inspired researchers to develop a homogeneous architecture that facilitates end-to-end training. The Transformer is a natural fit: it has proven its potential in both the vision and language domains and can thus serve as the basic component of both the visual encoder and the language decoder in an IC pipeline. Meanwhile, self-supervised learning unleashes the power of the Transformer architecture, so that a large-scale pre-trained model can be generalized to various tasks including IC. The success of these large-scale models seems to weaken the importance of the single IC task. However, we demonstrate that IC still has its specific significance in this age by analyzing the connections between IC and some popular self-supervised learning paradigms. Due to the page limit, we refer only to highly important papers in this short survey; more related work can be found at https://github.com/SjokerLily/awesome-image-captioning.

preprint2022arXiv

Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting

We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve this problem, we introduce the meta-learning setting, where a set of tasks are sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them. Experimental results show that our proposed method can effectively embed unseen components and outperforms models that consider inductive settings for KGs and baselines that directly use conventional KG embedding methods.

preprint2022arXiv

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources. Each sentence is corrected by three annotators, and their corrections are carefully reviewed by a senior annotator, resulting in 2.3 references per sentence. We conduct experiments with two mainstream CGEC models, i.e., the sequence-to-sequence model and the sequence-to-edit model, both enhanced with large pretrained language models, achieving competitive benchmark performance on previous datasets and on ours. We also discuss CGEC evaluation methodologies, including the effect of multiple references and of using a char-based metric. Our annotation guidelines, data, and code are available at https://github.com/HillZhang1999/MuCGEC.

preprint2022arXiv

Nucleon and $Δ$ resonances in $γn \to K^+Σ^-$ photoproduction

The most recent data on beam asymmetries, $Σ$, and beam-target asymmetries, $E$, from the CLAS Collaboration, together with the previous data on differential cross sections and beam asymmetries from the CLAS and LEPS Collaborations, for the $γn \to K^+Σ^-$ reaction are studied based on an effective Lagrangian approach in the tree-level Born approximation. The $t$-channel $K$ and $K^\ast(892)$ exchanges, the $u$-channel $Σ$ exchange, the interaction current, and the exchanges of $N$, $Δ$, and their excited states in the $s$ channel are considered in constructing the reaction amplitudes to describe the available experimental data. The reaction mechanisms of $γn \to K^+Σ^-$ are analyzed, and the associated resonances' parameters are extracted. The numerical results show that the $Δ$ exchange and the $N(1710)1/2^+$, $N(1880)1/2^+$, $N(1900)3/2^+$, and $Δ(1920)3/2^+$ resonance exchanges in the $s$-channel dominate the $γn \to K^+Σ^-$ reaction in the lower energy region, and the $t$-channel $K^\ast(892)$ exchange plays a crucial role at forward angles in the higher energy region.

preprint2022arXiv

Parallel Instance Query Network for Named Entity Recognition

Named entity recognition (NER) is a fundamental task in natural language processing. Recent works treat named entity recognition as a reading comprehension task, constructing type-specific queries manually to extract entities. This paradigm suffers from three issues. First, type-specific queries can only extract one type of entity per inference, which is inefficient. Second, the extraction of different types of entities is isolated, ignoring the dependencies between them. Third, query construction relies on external knowledge and is difficult to apply to realistic scenarios with hundreds of entity types. To address these issues, we propose the Parallel Instance Query Network (PIQN), which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner. Each instance query predicts one entity, and by feeding all instance queries simultaneously, we can query all entities in parallel. Instead of being constructed from external knowledge, instance queries can learn their different query semantics during training. For training the model, we treat label assignment as a one-to-many Linear Assignment Problem (LAP) and dynamically assign gold entities to instance queries with minimal assignment cost. Experiments on both nested and flat NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
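To make the label-assignment step more concrete, here is a minimal sketch of a minimum-cost assignment between instance queries and gold entities. It is deliberately simplified: it solves the one-to-one case by brute force, whereas the abstract describes a one-to-many formulation whose cost function is not given here. The function name `min_cost_assignment` and the cost values are hypothetical.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, queries), where queries[g] is the query index
    assigned to gold entity g and each query is used at most once.

    cost[q][g] is the (assumed) cost of letting query q predict gold
    entity g. Requires at least as many queries as gold entities.
    Brute force over all choices, so only suitable for tiny examples.
    """
    n_queries = len(cost)
    n_gold = len(cost[0])
    best_total, best_queries = float("inf"), None
    for queries in permutations(range(n_queries), n_gold):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total, best_queries = total, queries
    return best_total, best_queries


# Three instance queries (rows) and two gold entities (columns); toy costs.
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
total, assignment = min_cost_assignment(cost)
# assignment[g] is the query chosen for gold entity g; the query left
# unassigned would typically be trained to predict "no entity".
```

In practice a polynomial-time solver such as the Hungarian algorithm would replace the brute-force loop; the enumeration is used here only to keep the sketch dependency-free.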

preprint2022arXiv

Photoproduction $γp \to K^+Λ(1690)$ in an effective Lagrangian approach

A gauge-invariant model is constructed for the $γp \to K^+Λ(1690)$ reaction within a tree-level effective Lagrangian approach with the purpose of understanding the underlying production mechanisms and studying the resonance contributions in this reaction. In addition to the $t$-channel $K$ and $K^\ast$ exchanges, the $s$-channel nucleon exchange, and the interaction current, the $s$-channel nucleon resonance exchanges are also included in constructing the reaction amplitudes to describe the data. It is found that the contributions from the $s$-channel $N(2570)5/2^-$ exchange are required to describe the most recently measured total cross-section data for $γp \to K^+Λ(1690)$ from the CLAS Collaboration. Further analysis shows that the interaction current dominates the $γp \to K^+Λ(1690)$ reaction near the threshold as a result of gauge invariance. The $t$-channel $K$ exchange contributes significantly, while the contributions from the $t$-channel $K^\ast$ exchange as well as the $s$-channel nucleon exchange turn out to be negligible. The contributions from the $s$-channel $N(2570)5/2^-$ exchange are found to be responsible for the bump structure shown in the CLAS total cross-section data above the center-of-mass energy $W \approx 2.7$ GeV. The predictions of the differential cross sections for $γp \to K^+Λ(1690)$ are shown and discussed, which can provide theoretical guidance for future experiments.

preprint2022arXiv

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency

Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts. This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency. Experiments on nine downstream tasks show several counter-intuitive phenomena: for settings, individually pruning for each language does not induce a better result; for algorithms, the simplest method performs the best; for efficiency, a fast model does not imply that it is also small. To facilitate the comparison on all sparsity levels, we present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference. We hope this work fills the gap in the study of structured pruning on multilingual pre-trained models and sheds light on future research.

preprint2022arXiv

Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQL queries. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly used rule-based methods for schema linking, we find that probing relations can robustly capture semantic correspondences, even when surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework sets new state-of-the-art performance on three benchmarks. We empirically verify that our probing procedure can indeed find desired relational structures through qualitative analysis. Our code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI.

preprint2022arXiv

Resurrecting Low-Mass Axion Dark Matter Via a Dynamical QCD Scale

In the framework where the strong coupling is dynamical, the QCD sector may confine at a much higher temperature than it would in the Standard Model, and the temperature-dependent mass of the QCD axion evolves in a non-trivial way. We find that, depending on the evolution of $Λ_{\mathrm{QCD}}$, the axion field may undergo multiple distinct phases of damping and oscillation leading generically to a suppression of its relic abundance. Such a suppression could therefore open up a wide range of parameter space, resurrecting in particular axion dark-matter models with a large Peccei-Quinn scale $f_a\gg 10^{12}~\mathrm{GeV}$, i.e., with a lighter mass than the standard QCD axion.
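For orientation, the link between a large Peccei-Quinn scale and a light axion can be read off from the standard zero-temperature QCD axion mass relation. It is quoted here as general background, not as a result of this paper, and the numerical prefactor is approximate:

$$
m_a \simeq 5.7~\mu\mathrm{eV} \left( \frac{10^{12}~\mathrm{GeV}}{f_a} \right)
$$

Under this relation, $f_a \gg 10^{12}~\mathrm{GeV}$ corresponds to $m_a \ll 5.7~\mu\mathrm{eV}$, which is the low-mass regime the abstract refers to.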

preprint2022arXiv

Revisiting Dark Matter Freeze-in and Freeze-out through Phase-Space Distribution

We revisit dark-matter production through freeze-in and freeze-out by solving the Boltzmann equations at the level of the phase-space distribution $f(p,t)$. Using the $2\to2$ annihilation and the $1\to2$ decay processes for illustration, we compare the resulting dark-matter relic abundance with that from the number-density approach. In the transition regime between freeze-in and freeze-out, we find that the difference can be quite significant, reaching orders of magnitude if the annihilation of dark-matter particles or the decaying mediator is neglected. The freeze-in production in the $2\to2$ and the $1\to 2$ processes can also result in non-thermal phase-space distributions, or even multi-modal ones with out-of-equilibrium decay, which can potentially affect structure formation at late times. We also investigate how elastic scatterings can distort such non-thermal distributions.

preprint2022arXiv

SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding

Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers the self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training. Concretely, we first define a general semantic tree structure (STS) to unify the inconsistent annotation schema across different dialog datasets, so that the rich structural information stored in all labeled data can be exploited. Then we propose a novel multi-view score function to increase the relevance of all possible dialogs that share similar STSs and only push away other completely different dialogs during supervised contrastive pre-training. To fully exploit unlabeled dialogs, a basic self-supervised contrastive loss is also added to refine the learned representations. Experiments show that our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks. For reproducibility, we release the code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/space-2.

preprint2022arXiv

SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation

Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture the structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the L2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation model is pre-trained by language modeling. Results show that SPACE-3 achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE-3 has a stronger few-shot ability than existing models under the low-resource setting.

preprint2022arXiv

Stasis in an Expanding Universe: A Recipe for Stable Mixed-Component Cosmological Eras

One signature of an expanding universe is the time-variation of the cosmological abundances of its different components. For example, a radiation-dominated universe inevitably gives way to a matter-dominated universe, and critical moments such as matter-radiation equality are fleeting. In this paper, we point out that this lore is not always correct, and that it is possible to obtain a form of "stasis" in which the relative cosmological abundances $Ω_i$ of the different components remain unchanged over extended cosmological epochs, even as the universe expands. Moreover, we demonstrate that such situations are not fine-tuned, but are actually global attractors within certain cosmological frameworks, with the universe naturally evolving towards such long-lasting periods of stasis for a wide variety of initial conditions. The existence of this kind of stasis therefore gives rise to a host of new theoretical possibilities across the entire cosmological timeline, ranging from potential implications for primordial density perturbations, dark-matter production, and structure formation all the way to early reheating, early matter-dominated eras, and even the age of the universe.

preprint2022arXiv

Stochastic Gravitational Wave Background from PBH-ABH Mergers

The measurement of gravitational waves produced by binary black-hole mergers at Advanced LIGO has encouraged extensive studies on the stochastic gravitational wave background. Recent studies have focused on gravitational wave sources made of the same species, such as mergers from binary primordial black holes or those from binary astrophysical black holes. In this paper, we study a new possibility -- the stochastic gravitational wave background produced by mergers of one primordial black hole and one astrophysical black hole. Such systems are necessarily present if primordial black holes exist. We study the isotropic gravitational wave background produced throughout the history of the Universe. We find it is very challenging to detect such a signal. We also demonstrate that it is improper to treat the gravitational waves produced by such binaries in the Milky Way as a directional stochastic background, due to a very low binary formation rate.

preprint2022arXiv

Using 5G in Smart Cities: A Systematic Mapping Study

5G is the fifth generation wireless network, with a set of characteristics, e.g., high bandwidth and data rates. The scenarios of using 5G include enhanced Mobile Broadband (eMBB), massive Machine Type Communications (mMTC), and ultra-Reliable and Low-Latency Communications (uRLLC). 5G is expected to support a wide variety of applications. We conducted a systematic mapping study that covers the literature published between Jan 2012 and Dec 2019 regarding using 5G in smart cities. The scenarios, architecture, technologies, challenges, and lessons learned of using 5G in smart cities are summarized and further analyzed based on 32 selected studies, and the results are that: (1) The studies are distributed over 27 publication venues. 17 studies report results based on academic studies and 13 studies use demonstration or toy examples. Only 2 studies report using 5G in smart cities based on industrial studies. 16 studies include assumptions of 5G network design or smart city scenarios. (2) The most discussed smart city scenario is transportation, followed by public safety, healthcare, city tourism, entertainment, and education. (3) 28 studies propose and/or discuss the architecture of 5G-enabled smart cities, containing smart city architecture (treating 5G as a component), 5G network architecture in smart cities, and business architecture of using 5G in smart cities. (4) The most mentioned 5G-related technologies are radio access technologies, network slicing, and edge computing. (5) Challenges are mainly about complex context, challenging requirements, and network development of using 5G in smart cities. (6) Most of the lessons learned identified are benefits regarding 5G itself or the proposed 5G-related methods in smart cities. This work provides a reflection of the past eight years of the state of the art on using 5G in smart cities, which can benefit both researchers and practitioners.

preprint2021arXiv

Angular Distributions for Multi-body Semileptonic Charmed Baryon Decays

We perform an analysis of angular distributions in semileptonic decays of charmed baryons $B_1^{(\prime)}\to B_2^{(\prime)}(\to B_3^{(\prime)}B_4^{(\prime)})\ell^+ν_{\ell}$, where the $B_1=(Λ_c^+,Ξ_c^{(0,+)})$ are the SU(3)-antitriplet baryons and $B_1'=Ω_c^0$ is an SU(3) sextet. We first derive analytic expressions for the angular distributions using the helicity amplitude technique. Based on the lattice QCD results for the $Λ_c^+\toΛ$ and $Ξ_c^0\toΞ^-$ form factors and a model calculation of the $Ω_c^0\toΩ^-$ transition, we predict the branching fractions: $\mathcal{B}(Λ_{c}^{+} \rightarrow p π^{-} e^{+} ν_{e})=2.48(15)\%$, $\mathcal{B}(Λ_{c}^+\rightarrow p π^{-}μ^{+}ν_μ)=2.50(14)\%$, $\mathcal{B}(Ξ_{c}\rightarrow Λπ^{-}e^{+}ν_{e})=2.40(30)\%$, $\mathcal{B}(Ξ_{c}\rightarrow Λπ^{-}μ^{+}ν_μ)=2.41(30)\%$, $\mathcal{B}(Ω_{c}\rightarrow ΛK^{-}e^{+}ν_{e})=0.362(14)\%$, $\mathcal{B}(Ω_{c}\rightarrow ΛK^{-}μ^{+}ν_μ)=0.350(14)\%$. In addition, we predict the $q^2$-dependence and angular distributions of these processes, in particular the coefficients for the $\cos nθ_{\ell}$ ($\cos nθ_{h}$, $\cos nϕ$) $(n=0, 1, 2, \cdots)$ terms. This work can provide a theoretical basis for the ongoing experiments at BESIII, LHCb and Belle II.

preprint2021arXiv

Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing

Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging contextual information of both natural language utterance and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interaction as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, we demonstrate that the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains a 55.8% question-match and 30.8% interaction-match accuracy on SParC, and a 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.
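The two reported metrics differ in granularity, which the following sketch tries to make explicit: question-match accuracy counts individual questions, while interaction-match accuracy only credits an interaction if every question in it is answered correctly. The function name `match_accuracies` and the toy data are assumptions for illustration; the official benchmark scripts decide whether a single predicted query "matches", which is taken as given here.

```python
def match_accuracies(interactions):
    """Compute (question_match, interaction_match).

    interactions: list of interactions, each a list of booleans where
    True means the predicted query for that question matched the gold
    query (how a match is judged is outside this sketch).
    """
    questions = [ok for interaction in interactions for ok in interaction]
    question_match = sum(questions) / len(questions)
    # An interaction counts only if all of its questions are correct.
    interaction_match = (
        sum(all(interaction) for interaction in interactions)
        / len(interactions)
    )
    return question_match, interaction_match


# Toy example: three interactions with 3, 2, and 3 questions.
interactions = [
    [True, True, True],
    [True, False],
    [False, True, True],
]
qm, im = match_accuracies(interactions)
```

Because a single wrong turn invalidates the whole interaction, interaction-match accuracy is never higher than question-match accuracy, consistent with the gap between the two figures quoted above.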

preprint2021arXiv

Effects of $N(2000){5/2}^+$ on $γp \to K^+ Λ(1405)$

The photoproduction reaction of $γp \to K^+Λ(1405)$ is investigated based on an effective Lagrangian approach at the tree-level approximation with the purpose of understanding the reaction mechanism and extracting the resonance contents and the associated resonance parameters in this reaction. Apart from the $t$-channel $K$ and $K^\ast$ exchanges, $s$-channel nucleon ($N$) exchange, $u$-channel $Σ$, $Λ$, and $Λ(1405)$ exchanges, and generalized contact term, the exchanges of a minimum number of $N$ resonances in the $s$ channel are taken into account in constructing the reaction amplitudes to describe the experimental data. It is found that by introducing the $N(2000){5/2}^+$ resonance exchange in the $s$ channel, one can reproduce the most recent differential cross-section data from the CLAS Collaboration quite well. Further analysis shows that the cross sections of $γp \to K^+Λ(1405)$ at high energies are dominated by the $t$-channel $K$ exchange, while the contributions from the $s$-channel $N$ and $N(2000){5/2}^+$ exchanges are rather significant to the cross sections in the near-threshold energy region. Predictions for the beam and target asymmetries for $γp \to K^+Λ(1405)$ are given.

preprint2021arXiv

High-performance green and blue quantum-dot light-emitting diodes with eliminated charge leakage

Quantum-dot light-emitting diodes (QD-LEDs) promise a new generation of efficient, low-cost, large-area, and flexible electroluminescent devices. However, the inferior performance of green and blue QD-LEDs is hindering the commercialization of QD-LEDs in display and solid-state lighting. Here, we demonstrate best-performing green and blue QD-LEDs with ~100% conversion of the injected charge carriers into emissive excitons. Key to this success is eliminating electron leakage at the organic/inorganic interface by using hole-transport polymers with low electron affinity and reduced energetic disorder. Our devices exhibit record-high peak external quantum efficiencies (28.7% for green, 21.9% for blue), exceptionally high efficiencies in wide ranges of luminance, and unprecedented stability (T95 lifetime: 580,000 h for green, 4,400 h for blue). The overall performance surpasses previously reported solution-processed green and blue LEDs.

preprint2021arXiv

Mortality Forecasting using Factor Models: Time-varying or Time-invariant Factor Loadings?

Many existing mortality models follow the framework of classical factor models, such as the Lee-Carter model and its variants. Latent common factors in factor models are defined as time-related mortality indices (such as $κ_t$ in the Lee-Carter model). Factor loadings, which capture the linear relationship between age variables and latent common factors (such as $β_x$ in the Lee-Carter model), are assumed to be time-invariant in the classical framework. This assumption is usually too restrictive in reality as mortality datasets typically span a long period of time. Driving forces such as medical improvement of certain diseases, environmental changes and technological progress may significantly influence the relationship of different variables. In this paper, we first develop a factor model with time-varying factor loadings (time-varying factor model) as an extension of the classical factor model for mortality modelling. Two forecasting methods to extrapolate the factor loadings, the local regression method and the naive method, are proposed for the time-varying factor model. From the empirical data analysis, we find that the new model can capture the empirical feature of time-varying factor loadings and improve mortality forecasting over different horizons and countries. Further, we propose a novel approach based on change point analysis to estimate the optimal 'boundary' between short-term and long-term forecasting, which are favoured by the local regression method and the naive method, respectively. Additionally, simulation studies are provided to show the performance of the time-varying factor model under various scenarios.
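For readers unfamiliar with the notation, the classical Lee-Carter model referred to above is usually written as follows, with $m_{x,t}$ the central death rate at age $x$ in year $t$ and the usual identification constraints:

$$
\ln m_{x,t} = \alpha_x + \beta_x \kappa_t + \varepsilon_{x,t}, \qquad \sum_x \beta_x = 1, \quad \sum_t \kappa_t = 0
$$

A time-varying-loading extension can be pictured, loosely, as letting the loading depend on time as well:

$$
\ln m_{x,t} = \alpha_x + \beta_{x,t} \kappa_t + \varepsilon_{x,t}
$$

The second display is only a schematic assumption to illustrate the idea; the abstract does not state the paper's exact specification or estimation procedure.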

preprint2021arXiv

Photoproduction $γp \to K^+Λ(1520)$ in an effective Lagrangian approach

The data on differential cross sections and photon-beam asymmetries for the $γp \to K^+Λ(1520)$ reaction have been analyzed within a tree-level effective Lagrangian approach. In addition to the $t$-channel $K$ and $K^\ast$ exchanges, the $u$-channel $Λ$ exchange, the $s$-channel nucleon exchange, and the interaction current, a minimal number of nucleon resonances in the $s$ channel are introduced in constructing the reaction amplitudes to describe the data. The results show that the experimental data can be well reproduced by including either the $N(2060)5/2^-$ or the $N(2120)3/2^-$ resonance. In both cases, the contact term and the $K$ exchange are found to make significant contributions, while the contributions from the $K^\ast$ and $Λ$ exchanges are negligible in the former case and considerable in the latter case. Measurements of target asymmetries are called for to further pin down the resonance contents and to clarify the roles of the $K^\ast$ and $Λ$ exchanges in this reaction.

preprint2021arXiv

Photoproduction reaction $γn \to K^{\ast 0}Λ$ in an effective Lagrangian approach

In our previous work [Phys. Rev. C 101, 014003 (2020)], the photoproduction reaction $γp \to K^{\ast +} Λ$ was investigated within an effective Lagrangian approach. There, the reaction amplitudes were constructed by including the $t$-channel $K$, $K^\ast$, and $κ$ exchanges, the $u$-channel $Λ$, $Σ$, and $Σ^\ast$ exchanges, the $s$-channel $N$, $N(2000)5/2^+$, and $N(2060)5/2^-$ exchanges, and the interaction current. It was shown that the data on both the differential cross sections and the spin density matrix elements were simultaneously and satisfactorily described. In this paper, we study the photoproduction reaction $γn \to K^{\ast 0} Λ$ based on the same reaction mechanism as that of $γp \to K^{\ast +} Λ$, with the purpose of obtaining a unified description of the data for both $γp \to K^{\ast +} Λ$ and $γn \to K^{\ast 0} Λ$ within the same model. All hadronic coupling constants, form factor cutoffs, and the resonance masses and widths in the present calculations remain the same as in our previous work for $γp \to K^{\ast +} Λ$. The available differential cross-section data for $γn \to K^{\ast 0} Λ$ are well reproduced. Further analysis shows that the cross sections of $γn \to K^{\ast 0} Λ$ are dominated by the contributions of the $t$-channel $K$ exchange, while the $s$-channel $N(2000)5/2^+$ and $N(2060)5/2^-$ exchanges provide considerable contributions as well.

preprint2020arXiv

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

preprint2020arXiv

Analysis of the data on spin density matrix elements for $γp \to K^{*+}Λ$

In our previous work [Phys. Rev. C 96, 035206 (2017)], the high-precision differential cross-section data for $γp \to K^{*+}Λ$ reported by the CLAS Collaboration were analyzed within an effective Lagrangian approach. It was found that apart from the $t$-channel $K$, $K^*$, and $κ$ exchanges, the $u$-channel $Λ$, $Σ$, and $Σ^*$ exchanges, the $s$-channel $N$ exchange, and the interaction current, one needs to introduce at least two nucleon resonances in the $s$ channel in constructing the reaction amplitudes to describe the cross-section data. One of the needed resonances is $N(2060)5/2^-$, and the other could be any one of the $N(2000)5/2^+$, $N(2040)3/2^+$, $N(2100)1/2^+$, $N(2120)3/2^-$, and $N(2190)7/2^-$ resonances. In this paper, we further include in our analysis the data on spin density matrix elements for the $K^*$ meson reported recently by the CLAS Collaboration, in order to impose further constraints on extracting the resonance contents and to gain a better understanding of the reaction mechanism. It turns out that with the new data on spin density matrix elements taken into account, only the set with the $N(2060)5/2^-$ and $N(2000)5/2^+$ resonances among those five possible solutions extracted from the analysis of the differential cross-section data can satisfactorily describe the data on both the differential cross sections and the spin density matrix elements. Further analysis shows that this reaction is dominated by the $t$-channel $K$ exchange and the $s$-channel $N(2060)5/2^-$ and $N(2000)5/2^+$ exchanges.

preprint2020arXiv

CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored but lead to unfair evaluation and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fast development and fair evaluation of text generation. In model development, CoTK helps handle the cumbersome issues, such as data processing, metric implementation, and reproduction. It standardizes the development steps and reduces human errors which may lead to inconsistent experimental settings. In model evaluation, CoTK provides implementation for many commonly used metrics and benchmark models across different experimental settings. As a unique feature, CoTK can signify when and which metric cannot be fairly compared. We demonstrate that it is convenient to use CoTK for model development and evaluation, particularly across different experimental settings.

preprint2020arXiv

Deciphering the Archaeological Record: Cosmological Imprints of Non-Minimal Dark Sectors

Many proposals for physics beyond the Standard Model give rise to a dark sector containing many degrees of freedom. In this work, we explore the cosmological implications of the non-trivial dynamics which may arise within such dark sectors, focusing on decay processes which take place entirely among the dark constituents. First, we demonstrate that such decays can leave dramatic imprints on the resulting dark-matter phase-space distribution. In particular, this distribution need not be thermal -- it can even be multi-modal, exhibiting a non-trivial pattern of peaks and troughs as a function of momentum. We then proceed to show how these features can induce modifications to the matter power spectrum. Finally, we assess the extent to which one can approach the archaeological "inverse" problem of deciphering the properties of an underlying dark sector from the matter power spectrum. Indeed, one of the main results of this paper is a remarkably simple conjectured analytic expression which permits the reconstruction of many of the important features of the dark-matter phase-space distribution directly from the matter power spectrum. Our results therefore provide an interesting toolbox of methods for learning about, and potentially constraining, the features of non-minimal dark sectors and their dynamics in the early universe.

preprint2020arXiv

Freeze-in Dark Matter from Secret Neutrino Interactions

We investigate a simplified freeze-in dark-matter model in which the dark matter only interacts with the standard-model neutrinos via a light scalar. The extremely small coupling for the freeze-in mechanism is naturally realized in several neutrino-portal scenarios with secret neutrino interactions. We study the possible evolution history of the dark sector: the dark sector would undergo pure freeze-in production if the interactions between the dark-sector particles are negligible, while thermal equilibrium within the dark sector could occur if the reannihilation of the dark matter and the scalar mediator is rapid enough. We investigate the relic abundance in the freeze-in and dark freeze-out regimes, calculate the evolution of the dark temperature, and study the phenomenological implications for BBN and CMB constraints, the indirect-detection signature, and the potential to solve the small-scale structure problem.

preprint2020arXiv

Nucleon and $Δ$ resonances in $γp \to K^+ Σ^0(1385)$ photoproduction

The photoproduction of $γp \to K^+ Σ^0(1385)$ is investigated based on an effective Lagrangian approach using the tree-level Born approximation, with the purpose of understanding the reaction mechanisms and resonance contents and their associated parameters in this reaction. In addition to the $t$-channel $K$ and $K^\ast(892)$ exchanges, $s$-channel nucleon ($N$) exchange, $u$-channel $Λ$ exchange, and generalized contact term, the exchanges of a minimum number of $N$ and $Δ$ resonances in the $s$ channel are taken into account in constructing the reaction amplitudes to describe the experimental data. It is found that the most recent differential cross-section data from the CLAS Collaboration can be well reproduced by including one of the $N(1895){1/2}^-$, $Δ(1900){1/2}^-$, and $Δ(1930){5/2}^-$ resonances. The reaction mechanisms of $γp \to K^+ Σ^0(1385)$ are discussed in detail, and the predictions of the beam and target asymmetries for this reaction are given. The cross sections of $γp \to K^0 Σ^+(1385)$ are shown to be able to further constrain the theoretical models and pin down the resonance contents for $γp \to K^+ Σ^0(1385)$.

preprint2020arXiv

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Self-supervised pre-training, such as BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. The training goals of existing techniques are often inconsistent with the goals of many language generation tasks, such as generative question answering and conversational response generation, which produce new text given context. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. The new scheme alleviates the mismatch introduced by the existing denoising scheme between pre-training and fine-tuning where generation is more than reconstructing original text. An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks covering generative question answering (Rank 1 on the official MARCO leaderboard), abstractive summarization on CNN/DailyMail as well as Gigaword, question generation on SQuAD, and conversational response generation on Cornell Movie Dialogues.

preprint2020arXiv

Photoproduction $γp \to K^{*+} Λ$ in a Reggeized model

The high-precision differential cross-section data for the reaction $γp \to K^{*+}Λ$ are reanalyzed within a Regge-inspired effective Lagrangian approach. The model adopts Regge phenomenology to constrain the $t$-channel contributions from the $κ$, $K$, and $K^*$ exchanges. A minimal number of resonances in the $s$ channel are introduced in constructing the reaction amplitudes in order to describe the data. It is shown that the differential cross-section data for $γp \to K^{*+}Λ$ can be satisfactorily described by introducing only the $N(2060){5/2}^-$ resonance in the $s$ channel. This is quite different from our earlier work performed in an effective Lagrangian approach [A. C. Wang et al., Phys. Rev. C 96, 035206 (2017)], where the amplitudes were computed by evaluating Feynman diagrams and it was found that introducing at least one additional resonance apart from the $N(2060){5/2}^-$ is indispensable for reproducing the data. The roles of individual contributions from meson and baryon exchanges on the angular distributions are found to be highly model dependent. The extracted mass of $N(2060){5/2}^-$ turns out to be well determined, independent of how the $t$-channel amplitudes are constructed, whereas the width does not.

preprint2020arXiv

Selected strong decays of pentaquark State $P_c(4312)$ in a chiral constituent quark model

The newly confirmed pentaquark state $P_c(4312)$ has been treated as a weakly bound $(Σ_c\bar{D})$ state by a well-established chiral constituent quark model and by a dynamical calculation on quark degrees of freedom where the quark exchange effect is accounted for. The obtained mass of $4308$ MeV agrees with the data. In this work, the selected strong decays of the $P_c(4312)$ state are studied with the obtained wave function. It is shown that the width of the $Λ_c\bar{D}^*$ decay is dominant and the branching ratios of the $p\,η_c$ and $p\,J/ψ$ decays are both less than 1 percent.

preprint2020arXiv

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

Multilingual sequence labeling is the task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefits of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still underperform individual monolingual models significantly due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) to the unified multilingual model (student). We propose two novel KD methods based on structure-level information: the first approximately minimizes the distance between the student's and the teachers' structure-level probability distributions, and the second aggregates the structure-level knowledge to local distributions and minimizes the distance between two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
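
The last step of the second method, minimizing the distance between two local probability distributions, can be illustrated with a minimal sketch. It assumes the teacher's per-token label scores are already available and uses KL divergence as the distance; the function names `log_softmax` and `kd_loss` are hypothetical, and this is not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's label scores."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_sum for v in logits]


def kd_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over the tokens of one sentence.

    Both arguments are lists of per-token label score lists. Treating the
    teacher scores as given is an assumption of this sketch; how they are
    aggregated from structure-level knowledge is not shown here.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row)  # teacher log-probabilities
        log_q = log_softmax(s_row)  # student log-probabilities
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution at every token and grows as the two diverge, so it can be added to the usual supervised objective during training.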