Researcher profile

Xuemei Tang

Xuemei Tang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

CHisAgent: A Multi-Agent Framework for Event Taxonomy Construction in Ancient Chinese Cultural Systems

Despite strong performance on many tasks, large language models (LLMs) show limited ability in historical and cultural reasoning, particularly in non-English contexts such as Chinese history. Taxonomic structures offer an effective mechanism to organize historical knowledge and improve understanding. However, manual taxonomy construction is costly and difficult to scale. Therefore, we propose \textbf{CHisAgent}, a multi-agent LLM framework for historical taxonomy construction in ancient Chinese contexts. CHisAgent decomposes taxonomy construction into three role-specialized stages: a bottom-up \textit{Inducer} that derives an initial hierarchy from raw historical corpora, a top-down \textit{Expander} that introduces missing intermediate concepts using LLM world knowledge, and an evidence-guided \textit{Enricher} that integrates external structured historical resources to ensure faithfulness. Using the \textit{Twenty-Four Histories}, we construct a large-scale, domain-aware event taxonomy covering politics, military, diplomacy, and social life in ancient China. Extensive reference-free and reference-based evaluations demonstrate improved structural coherence and coverage, while further analysis shows that the resulting taxonomy supports cross-cultural alignment.

preprint2026arXiv

Do Large Language Models Plan Answer Positions? Position Bias in Multiple-Choice Question Generation

Large language models (LLMs) are increasingly used to generate multiple-choice questions (MCQs), where correct answers should ideally be uniformly distributed across options. However, we observe that LLMs exhibit systematic position biases during generation. Through extensive experiments with 10 LLMs and 5 vision-language models (VLMs) on three MCQ generation tasks, we show that these biases are structured, with similar patterns emerging within model families. To investigate the underlying mechanisms, we conduct probing experiments and find that hidden representations in the question stem encode predictive signals of the correct answer position, suggesting that answer position may be implicitly planned during generation. Building on this insight, we apply activation steering to manipulate internal representations and influence answer position. Our results show that steering can partially control positional preferences and substantially shift answer position distributions. Our findings provide a practical framework for studying implicit positional planning in LLMs and highlight the importance of controllable generation for reliable MCQ construction and evaluation.

preprint2022arXiv

Chinese Word Segmentation with Heterogeneous Graph Neural Network

In recent years, deep learning has achieved significant success in the Chinese word segmentation (CWS) task. Most of these methods improve the performance of CWS by leveraging external information, e.g., words, sub-words, syntax. However, existing approaches fail to effectively integrate the multi-level linguistic information and also ignore the structural feature of the external information. Therefore, in this paper, we proposed a framework to improve CWS, named HGNSeg. It exploits multi-level external information sufficiently with the pre-trained language model and heterogeneous graph neural network. The experimental results on six benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008) validate that our approach can effectively improve the performance of Chinese word segmentation. Importantly, in cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.

preprint2022arXiv

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantic shifts take place over time, resulting in a diachronic linguistic gap. As such, a considerable amount of texts are written in languages of different eras, which creates obstacles for natural language processing tasks, such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era. Therefore, we propose a cross-era learning framework for Chinese word segmentation (CWS), CROSSWISE, which uses the Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that the performance of each corpus significantly improves. Further analyses also demonstrate that the SM can effectively integrate the knowledge of the eras into the neural network.