Researcher profile

Jiajun Chen

Jiajun Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
19works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

19 published item(s)

preprint2026arXiv

A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$Δ$ Integration into Upcycled MoE

Expanding Large Language Models~(LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training~(CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by fusing a multilingual CPT-enhanced model with its instruct counterpart, they are plagued by a critical trade-off: mitigating parameter conflicts to preserve original abilities inevitably dilutes new language acquisition, and vice-versa. To resolve this conflict, we introduce \method, which upcycles a dense model into a Mixture-of-Experts~(MoE) architecture, allocating different experts to different languages. Alignment ability is then transferred by grafting a MoE-expanded parameter delta~($Δ_{\text{post}}$) to the CPT-enhanced base model, bypassing the complex alignment phase. Experiments demonstrate \method's superiority even against baselines with similar FLOPs or number of parameters; it improves performance on expanded languages while effectively preserving original capabilities. We further show our approach is highly applicable across different models and Post-training deltas.

preprint2022arXiv

$\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation

Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques are proposed to improve its generation quality, they still need the help of an autoregressive model for training to overcome the one-to-many multi-modal phenomenon in the dataset, limiting their applications. In this paper, we propose $\textit{latent}$-GLAT, which employs the discrete latent variables to capture word categorical information and invoke an advanced curriculum learning technique, alleviating the multi-modality problem. Experiment results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm.

preprint2022arXiv

A Numerical Reasoning Question Answering System with Fine-grained Retriever and the Ensemble of Multiple Generators for FinQA

The numerical reasoning in the financial domain -- performing quantitative analysis and summarizing the information from financial reports -- can greatly increase business efficiency and reduce costs of billions of dollars. Here, we propose a numerical reasoning question answering system to answer numerical reasoning questions among financial text and table data sources, consisting of a retriever module, a generator module, and an ensemble module. Specifically, in the retriever module, in addition to retrieving the whole row data, we innovatively design a cell retriever that retrieves the gold cells to avoid bringing unrelated and similar cells in the same row to the inputs of the generator module. In the generator module, we utilize multiple generators to produce programs, which are operation steps to answer the question. Finally, in the ensemble module, we integrate multiple programs to choose the best program as the output of our system. In the final private test set in FinQA Competition, our system obtains 69.79 execution accuracy.

preprint2022arXiv

Alleviating the Inequality of Attention Heads for Neural Machine Translation

Recent studies show that the attention heads in Transformer are not equal. We relate this phenomenon to the imbalance training of multi-head attention and the model dependence on specific heads. To tackle this problem, we propose a simple masking method: HeadMask, in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.

preprint2022arXiv

Analyzing the Intensity of Complaints on Social Media

Complaining is a speech act that expresses a negative inconsistency between reality and human expectations. While prior studies mostly focus on identifying the existence or the type of complaints, in this work, we present the first study in computational linguistics of measuring the intensity of complaints from text. Analyzing complaints from such perspective is particularly useful, as complaints of certain degrees may cause severe consequences for companies or organizations. We create the first Chinese dataset containing 3,103 posts about complaints from Weibo, a popular Chinese social media platform. These posts are then annotated with complaints intensity scores using Best-Worst Scaling (BWS) method. We show that complaints intensity can be accurately estimated by computational models with the best mean square error achieving 0.11. Furthermore, we conduct a comprehensive linguistic analysis around complaints, including the connections between complaints and sentiment, and a cross-lingual comparison for complaints expressions used by Chinese and English speakers. We finally show that our complaints intensity scores can be incorporated for better estimating the popularity of posts on social media.

preprint2022arXiv

Catalytic growth of ultralong graphene nanoribbons on insulating substrates

Graphene nanoribbons (GNRs) with widths of a few nanometres are promising candidates for future nano-electronic applications due to their structurally tunable bandgaps, ultrahigh carrier mobilities, and exceptional stability. However, the direct growth of micrometre-long GNRs on insulating substrates, which is essential for the fabrication of nano-electronic devices, remains an immense challenge. Here, we report the epitaxial growth of GNRs on an insulating hexagonal boron nitride (h-BN) substrate through nanoparticle-catalysed chemical vapor deposition (CVD). Ultra-narrow GNRs with lengths of up to 10 μm are synthesized. Remarkably, the as-grown GNRs are crystallographically aligned with the h-BN substrate, forming one-dimensional (1D) moiré superlattices. Scanning tunnelling microscopy reveals an average width of 2 nm and a typical bandgap of ~1 eV for similar GNRs grown on conducting graphite substrates. Fully atomistic computational simulations support the experimental results and reveal a competition between the formation of GNRs and carbon nanotubes (CNTs) during the nucleation stage, and van der Waals sliding of the GNRs on the h-BN substrate throughout the growth stage. Our study provides a scalable, single-step method for growing micrometre-long narrow GNRs on insulating substrates, thus opening a route to explore the performance of high-quality GNR devices and the fundamental physics of 1D moiré superlattices.

preprint2022arXiv

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Sequence-to-sequence learning naturally has two directions. How to effectively utilize supervision signals from both directions? Existing approaches either require two separate models, or a multitask-learned model but with inferior performance. In this paper, we propose REDER (Reversible Duplex Transformer), a parameter-efficient model and apply it to machine translation. Either end of REDER can simultaneously input and output a distinct language. Thus REDER enables reversible machine translation by simply flipping the input and output ends. Experiments verify that REDER achieves the first success of reversible machine translation, which helps outperform its multitask-trained baselines by up to 1.3 BLEU.

preprint2022arXiv

Exploiting Global Semantic Similarities in Knowledge Graphs by Relational Prototype Entities

Knowledge graph (KG) embedding aims at learning the latent representations for entities and relations of a KG in continuous vector spaces. An empirical observation is that the head (tail) entities connected by the same relation often share similar semantic attributes -- specifically, they often belong to the same category -- no matter how far away they are from each other in the KG; that is, they share global semantic similarities. However, many existing methods derive KG embeddings based on the local information, which fail to effectively capture such global semantic similarities among entities. To address this challenge, we propose a novel approach, which introduces a set of virtual nodes called \textit{\textbf{relational prototype entities}} to represent the prototypes of the head and tail entities connected by the same relations. By enforcing the entities' embeddings close to their associated prototypes' embeddings, our approach can effectively encourage the global semantic similarities of entities -- that can be far away in the KG -- connected by the same relation. Experiments on the entity alignment and KG completion tasks demonstrate that our approach significantly outperforms recent state-of-the-arts.

preprint2022arXiv

Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation

Recently, $k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval to achieve domain adaptation without retraining. Despite being conceptually attractive, it heavily relies on high-quality in-domain parallel corpora, limiting its capability on unsupervised domain adaptation, where in-domain parallel corpora are scarce or nonexistent. In this paper, we propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval. To this end, we first introduce an autoencoder task based on the target language, and then insert lightweight adapters into the original NMT model to map the token-level representation of this task to the ideal representation of translation task. Experiments on multi-domain datasets demonstrate that our proposed approach significantly improves the translation accuracy with target-side monolingual data, while achieving comparable performance with back-translation.

preprint2022arXiv

Relaxation times for Bose-Einstein condensation by self-interaction and gravity

In this paper, we study the Bose-Einstein condensation of a scalar field with an attractive self-interaction, with or without gravitational interactions. We confirm through full dynamical simulation that the condensation timescale due to self-interaction is inversely proportional to the square of the number density $n$ and the self-coupling constant $g$ : $τ\propto n^{-2} g^{-2}$. We also investigate the condensation timescale when self-interaction and gravity are both important by solving the Gross-Pitaevskii-Poisson equations, and find that the condensation time scales according to an additive model for the cross section. We discuss the relevance of our results to theoretical models of boson star formation by condensation.

preprint2022arXiv

Rethinking Document-level Neural Machine Translation

This paper does not aim at introducing a novel model for document-level neural machine translation. Instead, we head back to the original Transformer model and hope to answer the following question: Is the capacity of current models strong enough for document-level translation? Interestingly, we observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words. We evaluate this model and several recent approaches on nine document-level datasets and two sentence-level datasets across six languages. Experiments show that document-level Transformer models outperforms sentence-level ones and many previous methods in a comprehensive set of metrics, including BLEU, four lexical indices, three newly proposed assistant linguistic indicators, and human evaluation.

preprint2022arXiv

What Makes for Automatic Reconstruction of Pulmonary Segments

3D reconstruction of pulmonary segments plays an important role in surgical treatment planning of lung cancer, which facilitates preservation of pulmonary function and helps ensure low recurrence rates. However, automatic reconstruction of pulmonary segments remains unexplored in the era of deep learning. In this paper, we investigate what makes for automatic reconstruction of pulmonary segments. First and foremost, we formulate, clinically and geometrically, the anatomical definitions of pulmonary segments, and propose evaluation metrics adhering to these definitions. Second, we propose ImPulSe (Implicit Pulmonary Segment), a deep implicit surface model designed for pulmonary segment reconstruction. The automatic reconstruction of pulmonary segments by ImPulSe is accurate in metrics and visually appealing. Compared with canonical segmentation methods, ImPulSe outputs continuous predictions of arbitrary resolutions with higher training efficiency and fewer parameters. Lastly, we experiment with different network inputs to analyze what matters in the task of pulmonary segment reconstruction. Our code is available at https://github.com/M3DV/ImPulSe.

preprint2021arXiv

Topology-Aware Correlations Between Relations for Inductive Link Prediction in Knowledge Graphs

Inductive link prediction -- where entities during training and inference stages can be different -- has been shown to be promising for completing continuously evolving knowledge graphs. Existing models of inductive reasoning mainly focus on predicting missing links by learning logical rules. However, many existing approaches do not take into account semantic correlations between relations, which are commonly seen in real-world knowledge graphs. To address this challenge, we propose a novel inductive reasoning approach, namely TACT, which can effectively exploit Topology-Aware CorrelaTions between relations in an entity-independent manner. TACT is inspired by the observation that the semantic correlation between two relations is highly correlated to their topological structure in knowledge graphs. Specifically, we categorize all relation pairs into several topological patterns, and then propose a Relational Correlation Network (RCN) to learn the importance of the different patterns for inductive link prediction. Experiments demonstrate that TACT can effectively model semantic correlations between relations, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the inductive link prediction task.

preprint2020arXiv

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

Neural machine translation systems tend to fail on less decent inputs despite its significant efficacy, which may significantly harm the credibility of this systems-fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance. Instead of collecting and analyzing bad cases using limited handcrafted error features, here we investigate this issue by generating adversarial examples via a new paradigm based on reinforcement learning. Our paradigm could expose pitfalls for a given performance metric, e.g., BLEU, and could target any given neural machine translation architecture. We conduct experiments of adversarial attacks on two mainstream neural machine translation architectures, RNN-search, and Transformer. The results show that our method efficiently produces stable attacks with meaning-preserving adversarial examples. We also present a qualitative and quantitative analysis for the preference pattern of the attack, demonstrating its capability of pitfall exposure.

preprint2020arXiv

Adding A Filter Based on The Discriminator to Improve Unconditional Text Generation

The autoregressive language model (ALM) trained with maximum likelihood estimation (MLE) is widely used in unconditional text generation. Due to exposure bias, the generated texts still suffer from low quality and diversity. This presents statistically as a discrepancy between the real text and generated text. Some research shows a discriminator can detect this discrepancy. Because the discriminator can encode more information than the generator, discriminator has the potentiality to improve generator. To alleviate the exposure bias, generative adversarial networks (GAN) use the discriminator to update the generator's parameters directly, but they fail by being evaluated precisely. A critical reason for the failure is the difference between the discriminator input and the ALM input. We propose a novel mechanism by adding a filter which has the same input as the discriminator. First, discriminator detects the discrepancy signals and passes to filter directly (or by learning). Then, we use the filter to reject some generated samples with a sampling-based method. Thus, the original generative distribution is revised to reduce the discrepancy. Two ALMs, RNN-based and Transformer-based, are experimented. Evaluated precisely by three metrics, our mechanism consistently outperforms the ALMs and all kinds of GANs across two benchmark data sets.

preprint2020arXiv

GRET: Global Representation Enhanced Transformer

Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.

preprint2020arXiv

Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction

Target-oriented opinion words extraction (TOWE) is a new subtask of ABSA, which aims to extract the corresponding opinion words for a given opinion target in a sentence. Recently, neural network methods have been applied to this task and achieve promising results. However, the difficulty of annotation causes the datasets of TOWE to be insufficient, which heavily limits the performance of neural models. By contrast, abundant review sentiment classification data are easily available at online review sites. These reviews contain substantial latent opinions information and semantic patterns. In this paper, we propose a novel model to transfer these opinions knowledge from resource-rich review sentiment classification datasets to low-resource task TOWE. To address the challenges in the transfer process, we design an effective transformation method to obtain latent opinions, then integrate them into TOWE. Extensive experimental results show that our model achieves better performance compared to other state-of-the-art methods and significantly outperforms the base model without transferring opinions knowledge. Further analysis validates the effectiveness of our model.

preprint2020arXiv

Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring

Cross-prompt automated essay scoring (AES) requires the system to use non target-prompt essays to award scores to a target-prompt essay. Since obtaining a large quantity of pre-graded essays to a particular prompt is often difficult and unrealistic, the task of cross-prompt AES is vital for the development of real-world AES systems, yet it remains an under-explored area of research. Models designed for prompt-specific AES rely heavily on prompt-specific knowledge and perform poorly in the cross-prompt setting, whereas current approaches to cross-prompt AES either require a certain quantity of labelled target-prompt essays or require a large quantity of unlabelled target-prompt essays to perform transfer learning in a multi-step manner. To address these issues, we introduce Prompt Agnostic Essay Scorer (PAES) for cross-prompt AES. Our method requires no access to labelled or unlabelled target-prompt data during training and is a single-stage approach. PAES is easy to apply in practice and achieves state-of-the-art performance on the Automated Student Assessment Prize (ASAP) dataset.

preprint2020arXiv

Towards Making the Most of Context in Neural Machine Translation

Document-level machine translation manages to outperform sentence level models by a small margin, but have failed to be widely adopted. We argue that previous research did not make a clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with the awareness of the global context of the document in both source and target languages. We specifically design the model to be able to deal with documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without needing to train on sentence and document level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models with substantial margins of up to 2.1 BLEU on state-of-the-art baselines. We also provide analyses which show the benefit of context far beyond the neighboring two or three sentences, which previous studies have typically incorporated.