Researcher profile

Bowen Xu

Bowen Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2026arXiv

La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation

Activation sparsity can reduce the computational overhead and memory transfers during the forward pass of Large Language Model (LLM) inference. Existing methods face limitations, either demanding time-consuming recovery training that hinders real-world adoption, or relying on empirical magnitude-based pruning, which causes fluctuating sparsity and unstable inference speed-up. This paper introduces LaRoSA (Layerwise Rotated Sparse Activation), a novel method for activation sparsification designed to improve LLM efficiency without requiring additional training or magnitude-based pruning. We leverage layerwise orthogonal rotations to transform input activations into rotated forms that are more suitable for sparsification. By employing a Top-K selection approach within the rotated activations, we achieve consistent model-level sparsity and reliable wall-clock time speed-up. LaRoSA is effective across various sizes and types of LLMs, demonstrating minimal performance degradation and robust inference acceleration. Specifically, for LLaMA2-7B at 40% sparsity, LaRoSA achieves a mere 0.17 perplexity gap with a consistent 1.30x wall-clock time speed-up, and reduces the accuracy gap in zero-shot tasks compared to the dense model to just 0.54%, while surpassing TEAL by 1.77% and CATS by 17.14%.

preprint2022arXiv

Accurate Polygonal Mapping of Buildings in Satellite Imagery

This paper studies the problem of polygonal mapping of buildings by tackling the issue of mask reversibility that leads to a notable performance gap between the predicted masks and polygons from the learning-based methods. We addressed such an issue by exploiting the hierarchical supervision (of bottom-level vertices, mid-level line segments and the high-level regional masks) and proposed a novel interaction mechanism of feature embedding sourced from different levels of supervision signals to obtain reversible building masks for polygonal mapping of buildings. As a result, we show that the learned reversible building masks take all the merits of the advances of deep convolutional neural networks for high-performing polygonal mapping of buildings. In the experiments, we evaluated our method on the two public benchmarks of AICrowd and Inria. On the AICrowd dataset, our proposed method obtains unanimous improvements on the metrics of AP, APboundary and PoLiS. For the Inria dataset, our proposed method also obtains very competitive results on the metrics of IoU and Accuracy. The models and source code are available at https://github.com/SarahwXU.

preprint2022arXiv

Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?

APIs (Application Programming Interfaces) are reusable software libraries and are building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on the mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches related to process API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), which is called the aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on the traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural languages, BERTOverflow that is pre-trained on text corpus extracted from posts on Stack Overflow, and CosSensBERT that is designed for handling imbalanced data. The results show that all the six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement on the F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The result also suggests that CosSensBERT also does not exhibit better performance than BERT in terms of F1, but it is still worthy of being considered as it achieves better performance on MCC and AUC.

preprint2022arXiv

Can Identifier Splitting Improve Open-Vocabulary Language Model of Code?

Statistical language models on source code have successfully assisted software engineering tasks. However, developers can create or pick arbitrary identifiers when writing source code. Freely chosen identifiers lead to the notorious out-of-vocabulary (OOV) problem that negatively affects model performance. Recently, Karampatsis et al. showed that using the Byte Pair Encoding (BPE) algorithm to address the OOV problem can improve the language models' predictive performance on source code. However, a drawback of BPE is that it cannot split the identifiers in a way that preserves the meaningful semantics. Prior researchers also show that splitting compound identifiers into sub-words that reflect the semantics can benefit software development tools. These two facts motivate us to explore whether identifier splitting techniques can be utilized to augment the BPE algorithm and boost the performance of open-vocabulary language models considered in Karampatsis et al.'s work. This paper proposes to split identifiers in both constructing vocabulary and processing model inputs procedures, thus exploiting three different settings of applying identifier splitting to language models for the code completion task. We contrast models' performance under these settings and find that simply inserting identifier splitting into the pipeline hurts the model performance, while a hybrid strategy combining identifier splitting and the BPE algorithm can outperform the original open-vocabulary models on predicting identifiers by 3.68% of recall and 6.32% of Mean Reciprocal Rank. The results also show that the hybrid strategy can improve the entropy of language models by 2.02%.

preprint2022arXiv

PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Stack Overflow is often viewed as the most influential Software Question Answer (SQA) website with millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant contents. Poorly selected tags often introduce extra noise and redundancy, which leads to tag synonym and tag explosion problems. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate the problems mentioned above. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilize PTMs with a triplet architecture, which models the components of a post, i.e., Title, Description, and Code with independent language models. To the best of our knowledge, this is the first work that leverages PTMs in the tag recommendation task of SQA sites. We comparatively evaluate the performance of PTM4Tag based on five popular pre-trained models: BERT, RoBERTa, ALBERT, CodeBERT, and BERTOverflow. Our results show that leveraging the software engineering (SE) domain-specific PTM CodeBERT in PTM4Tag achieves the best performance among the five considered PTMs and outperforms the state-of-the-art deep learning (Convolutional Neural Network-based) approach by a large margin in terms of average $Precision@k$, $Recall@k$, and $F1$-$score@k$. We conduct an ablation study to quantify the contribution of a post's constituent components (Title, Description, and Code Snippets) to the performance of PTM4Tag. Our results show that Title is the most important in predicting the most relevant tags, and utilizing all the components achieves the best performance.

preprint2021arXiv

4D Attention-based Neural Network for EEG Emotion Recognition

Electroencephalograph (EEG) emotion recognition is a significant task in the brain-computer interface field. Although many deep learning methods are proposed recently, it is still challenging to make full use of the information contained in different domains of EEG signals. In this paper, we present a novel method, called four-dimensional attention-based neural network (4D-aNN) for EEG emotion recognition. First, raw EEG signals are transformed into 4D spatial-spectral-temporal representations. Then, the proposed 4D-aNN adopts spectral and spatial attention mechanisms to adaptively assign the weights of different brain regions and frequency bands, and a convolutional neural network (CNN) is utilized to deal with the spectral and spatial information of the 4D representations. Moreover, a temporal attention mechanism is integrated into a bidirectional Long Short-Term Memory (LSTM) to explore temporal dependencies of the 4D representations. Our model achieves state-of-the-art performance on the SEED dataset under intra-subject splitting. The experimental results have shown the effectiveness of the attention mechanisms in different domains for EEG emotion recognition.