Source author record

Xin Pei

Xin Pei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: astro-ph.IM, Information Retrieval, Social and Information Networks, Artificial Intelligence, Computation and Language, cs.CY, Machine Learning

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

Preprint · 2026 · arXiv

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three unique challenges: (1) the gap between prior-less Semantic ID (SID) symbols and continuous latent reasoning, since SIDs lack pre-trained semantics, which hinders joint optimization; (2) representation drift due to a lack of reasoning-chain supervision; and (3) the suboptimality of applying a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge this gap via two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment. Step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts per-sample reasoning depth. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to dynamically allocate steps. This nearly halves the average latent step count while simultaneously improving recommendation quality. Experiments on three real-world datasets demonstrate that LASAR outperforms all baselines. It adds only marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
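The step-wise bidirectional KL constraint mentioned in the LASAR abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each latent reasoning step and its CoT-derived anchor have already been projected to logit vectors of equal size; the abstract does not say how these distributions are formed, so the softmax projection, the per-step averaging, and all function names below are illustrative assumptions rather than LASAR's implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing, q_i is clamped to eps."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def stepwise_bidirectional_kl(
    latent_logits: Sequence[Sequence[float]],
    anchor_logits: Sequence[Sequence[float]],
) -> float:
    """Mean over steps of KL(p_t || q_t) + KL(q_t || p_t).

    latent_logits[t] and anchor_logits[t] are hypothetical per-step logit
    vectors for the latent reasoning state and its CoT-derived anchor.
    """
    if not latent_logits or len(latent_logits) != len(anchor_logits):
        raise ValueError("expected one anchor per latent step")
    total = 0.0
    for z, a in zip(latent_logits, anchor_logits):
        p, q = softmax(z), softmax(a)
        # Summing both directions makes the penalty symmetric in p and q.
        total += kl(p, q) + kl(q, p)
    return total / len(latent_logits)


if __name__ == "__main__":
    latent = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
    anchors = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    print(stepwise_bidirectional_kl(latent, anchors))
```

The abstract notes that the RL phase switches to terminal-only KL alignment; in this sketch that would amount to passing only the final latent/anchor pair instead of the whole trajectory.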

Preprint · 2022 · arXiv

Multi-dimensional Racism Classification during COVID-19: Stigmatization, Offensiveness, Blame, and Exclusion

Transcending the binary categorization of racist texts, our study takes cues from social science theories to develop a multi-dimensional model for racism detection covering four dimensions: stigmatization, offensiveness, blame, and exclusion. With the aid of BERT and topic modeling, this categorical detection offers insight into the underlying subtlety of racist discussion on digital platforms during COVID-19. Our study enriches the scholarly discussion on deviant racist behaviours on social media. First, a stage-wise analysis captures the dynamics of topic changes across the early stages of COVID-19, as it transformed from a domestic epidemic into an international public health emergency and later into a global pandemic. Furthermore, mapping this trend enables a more accurate prediction of how public opinion on racism evolves in the offline world and, in turn, the enactment of targeted intervention strategies to combat the upsurge of racism during a global public health crisis like COVID-19. In addition, this interdisciplinary research points out a direction for future studies on social network analysis and mining: integrating social science perspectives into the development of computational methods provides insights for more accurate data detection and analytics.

Preprint · 2020 · arXiv

#Coronavirus or #Chinesevirus?!: Understanding the negative sentiment reflected in Tweets with racist hashtags across the development of COVID-19

Set against the global outbreak of COVID-19, our study enriches the discussion concerning the emergent racism and xenophobia on social media. Using big data extracted from Twitter, we analyze the negative sentiment reflected in tweets marked with racist hashtags, since racism and xenophobia are more likely to be expressed through negative sentiment. In particular, we propose a stage-based approach to capture how negative sentiment changes across the three development stages of COVID-19, during which it transformed from a domestic epidemic into an international public health emergency and later into a global pandemic. At each stage, sentiment analysis lets us identify negative sentiment in tweets with racist hashtags, and keyword extraction reveals the themes through which these tweets express it. Amid this global public health crisis, the stage-based approach allows us to offer policy suggestions for enacting stage-specific intervention strategies that combat racism and xenophobia on social media more effectively.

Preprint · 2020 · arXiv

First SETI Observations with China's Five-hundred-meter Aperture Spherical radio Telescope (FAST)

The Search for Extraterrestrial Intelligence (SETI) attempts to address the possibility that technological civilizations exist beyond the Earth. Benefiting from the high sensitivity, large sky coverage, and innovative feed cabin of China's Five-hundred-meter Aperture Spherical radio Telescope (FAST), we performed the first SETI observations with FAST's newly commissioned 19-beam receiver, and we report preliminary results in this paper. Using the data stream produced by the SERENDIP VI real-time multibeam SETI spectrometer installed at FAST, together with its off-line data processing pipelines, we apply the Nebula SETI software pipeline, combined with machine learning algorithms, to identify and remove four kinds of radio frequency interference (RFI): zone, broadband, multi-beam, and drifting. After RFI mitigation, the Nebula pipeline identifies and ranks interesting narrow-band candidate ET signals, scoring candidates by the number of times candidate signals have been seen at roughly the same sky position and frequency, their signal strength, and their proximity to a nearby star or object of interest, along with several other criteria. We show four example candidate groups that demonstrate this RFI mitigation and candidate selection. This preliminary testing on FAST data helps to validate our SETI instrumentation techniques as well as our data processing pipeline.
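The recurrence criterion in the abstract above (how often a signal reappears at roughly the same sky position and frequency) can be sketched as a simple grid-binning count. This is a simplified illustration, not the Nebula pipeline: the `(ra_deg, dec_deg, freq_mhz)` hit format, the tolerance values, and the function names are hypothetical, and the other scoring criteria such as signal strength and proximity to stars are left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical hit format: (right ascension deg, declination deg, frequency MHz).
Hit = Tuple[float, float, float]
BinKey = Tuple[int, int, int]


def bin_key(hit: Hit, pos_tol_deg: float, freq_tol_mhz: float) -> BinKey:
    """Snap a hit onto a coarse grid so nearby hits share a key."""
    ra, dec, freq = hit
    return (
        round(ra / pos_tol_deg),
        round(dec / pos_tol_deg),
        round(freq / freq_tol_mhz),
    )


def rank_by_recurrence(
    hits: Iterable[Hit], pos_tol_deg: float = 0.05, freq_tol_mhz: float = 0.001
) -> List[Tuple[BinKey, int]]:
    """Return (bin, count) pairs, most frequently re-detected bins first."""
    counts = Counter(bin_key(h, pos_tol_deg, freq_tol_mhz) for h in hits)
    return counts.most_common()


if __name__ == "__main__":
    hits = [
        (10.00, 20.00, 1420.0000),
        (10.01, 20.00, 1420.0002),
        (10.00, 20.01, 1419.9999),
        (50.00, -5.00, 1100.5000),
    ]
    for key, count in rank_by_recurrence(hits):
        print(key, count)
```

A fixed grid is the simplest choice but splits hits that straddle a bin edge, and it ignores right-ascension wrap-around and the cos(dec) compression of RA; a real pipeline would likely need a tolerance-based clustering step instead.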

Preprint · 2020 · arXiv

Opportunities to Search for Extra-Terrestrial Intelligence with the Five-hundred-meter Aperture Spherical radio Telescope

The discovery of ubiquitous habitable extrasolar planets, combined with revolutionary advances in instrumentation and observational capabilities, has ushered in a renaissance in the search for extra-terrestrial intelligence (SETI). Large-scale SETI activities are now underway at numerous international facilities. The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the largest single-aperture radio telescope in the world and is well positioned to conduct sensitive searches for radio emission indicative of exo-intelligence. SETI is one of the five key science goals specified in the original FAST project plan. A collaboration with the Breakthrough Listen Initiative was initiated in 2016 with a joint statement signed by both Dr. Jun Yan, then director of the National Astronomical Observatories, Chinese Academy of Sciences (NAOC), and Dr. Peter Worden, Chairman of the Breakthrough Prize Foundation. In this paper, we highlight some of the unique features of FAST that will allow for novel SETI observations. We identify and describe three different signal types indicative of a technological source, namely narrow-band, wide-band artificially dispersed, and modulated signals. Here we propose observations with FAST to achieve sensitivities never before explored.

Preprint · 2020 · arXiv

Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking. During ad retrieval, the number of ad candidates grows exponentially. A query with high commercial value might retrieve more ad candidates than the ranking module can afford to process. Because latency and computing resources are limited, the candidates have to be pruned earlier. Suppose we set a pruning line that cuts the SSS into two parts: upstream and downstream. The problem we address is how to pick the best $K$ items from the $N$ candidates provided by the upstream so as to maximize the total system's revenue. Since the industrial downstream is very complicated and updated frequently, a crucial restriction in this problem is that the selection scheme must adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to solving this problem. Our approach treats the downstream as a black-box environment: the agent sequentially selects items and finally feeds them into the downstream, where revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time system optimization has been considered from a downstream-adaptation view. It is also the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully implemented in Baidu's sponsored search system, and a long-running online A/B test shows remarkable improvements in revenue.
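The downstream-as-black-box formulation above maps naturally onto a REINFORCE-style policy gradient. The following Python sketch is hypothetical and is not Baidu's system: it assumes a linear softmax policy over per-item feature vectors, samples $K$ of $N$ items sequentially without replacement, and uses `reward_fn` as a stand-in for the black-box downstream revenue estimate. The moving-average baseline and the greedy serving rule are added assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sample_k(
    w: Sequence[float], items: Sequence[Sequence[float]], k: int, rng: random.Random
) -> Tuple[List[int], Vector]:
    """Sequentially sample k distinct items from a linear softmax policy.

    Returns the chosen indices and the gradient, with respect to w, of the
    log-probability of that ordered selection.
    """
    remaining = list(range(len(items)))
    chosen: List[int] = []
    grad = [0.0] * len(w)
    for _ in range(k):
        scores = [dot(w, items[i]) for i in remaining]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        pos = rng.choices(range(len(remaining)), weights=probs)[0]
        c = remaining[pos]
        # Gradient of log-softmax for a linear policy: x_c - E_pi[x].
        for j in range(len(w)):
            expected = sum(p * items[i][j] for p, i in zip(probs, remaining))
            grad[j] += items[c][j] - expected
        chosen.append(c)
        remaining.pop(pos)
    return chosen, grad


def train(
    items: Sequence[Sequence[float]],
    k: int,
    reward_fn: Callable[[List[int]], float],
    steps: int = 500,
    lr: float = 0.05,
    seed: int = 0,
) -> Vector:
    """REINFORCE with a moving-average baseline; reward_fn is the black box."""
    rng = random.Random(seed)
    w = [0.0] * len(items[0])
    baseline = 0.0
    for _ in range(steps):
        chosen, grad = sample_k(w, items, k, rng)
        reward = reward_fn(chosen)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        w = [wi + lr * advantage * gi for wi, gi in zip(w, grad)]
    return w


def greedy_top_k(w: Sequence[float], items: Sequence[Sequence[float]], k: int) -> List[int]:
    """At serving time, keep the k highest-scoring candidates."""
    return sorted(range(len(items)), key=lambda i: dot(w, items[i]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy example: one feature per candidate; the hidden revenue equals that feature.
    items = [[i / 10] for i in range(10)]

    def revenue(idx: List[int]) -> float:
        return sum(items[i][0] for i in idx)

    w = train(items, k=3, reward_fn=revenue)
    print(w, greedy_top_k(w, items, 3))
```

Because the policy only sees the scalar reward, the downstream can change without retraining code changes; only the collected rewards shift, which is the adaptation property the abstract emphasizes.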