Source author record

Bohan Liu

Bohan Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computation and Language Computer Vision Software Engineering

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models

Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reasoning in large language models. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with LLM reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally activated, while CoT prompting is one effective, but not unique, way of activating this mechanism rather than its necessary cause.

preprint2025arXiv

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Spurious bias, a tendency to exploit spurious correlations between superficial input attributes and prediction targets, has revealed a severe robustness pitfall in classical machine learning problems. Multimodal Large Language Models (MLLMs), which leverage pretrained vision and language models, have recently demonstrated strong capability in joint vision-language understanding. However, both the presence and severity of spurious biases in MLLMs remain poorly understood. In this work, we address this gap by analyzing the spurious biases in the multimodal setting and uncovering the specific inference-time data patterns that can manifest this problem. To support this analysis, we introduce MM-SpuBench, a comprehensive, human-verified benchmark dataset consisting of image-class pairs annotated with core and spurious attributes, grounded in our taxonomy of nine distinct types of spurious correlations. The benchmark is constructed using human-interpretable attribute information to capture a wide range of spurious patterns reflective of real-world knowledge. Leveraging this benchmark, we conduct a comprehensive evaluation of the state-of-the-art open-source and proprietary MLLMs with both standard accuracy and the proposed Conditional Generation Likelihood Advantage (CGLA). Our findings highlight the persistence of reliance on spurious correlations and the difficulty of mitigation on our benchmark. We hope this work can inspire new technical strides to mitigate these biases. Our benchmark is publicly available at https://huggingface.co/datasets/mmbench/MM-SpuBench.

preprint2023arXiv

Metrics for Software Process Simulation Modeling

Background: Software Process Simulation (SPS) has become an effective tool for software process management and improvement. However, its adoption in industry is less than what the research community expected due to the burden of measurement cost and the high demand for domain knowledge. The difficulty of extracting appropriate metrics with real data from process enactment is one of the great challenges. Objective: We aim to provide evidence-based support of the process metrics for software process (simulation) modeling. Method: A systematic literature review was performed by extending our previous review series to draw a comprehensive understanding of the metrics for process modeling following a meta-model of ontology of metrics in SPS. Results: We identified 145 process modeling studies that collectively involve 2130 metrics and classified them using the coding technique. Two diagrams which illustrate the high frequency causal relationships used between metrics are proposed in terms of two hierarchical levels of modeling purposes. We revisited the data issues encountered in SPS data preparing phases, as well as identified the corresponding strategies. Conclusion: The results of this study provide process modelers with an evidence-based reference of the identification and the use of metrics in SPS modeling, and further contribute to the development of the body of knowledge on software metrics in the context of process modeling. Furthermore, this study is not limited to process simulation but can be extended to software process modeling, in general. Taking simulation metrics as standards and references can further motivate and guide software developers to improve the collection, governance, and application of process data in practice.

preprint2015arXiv

Random Subspace Learning Approach to High-Dimensional Outliers Detection

We introduce and develop a novel approach to outlier detection based on adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like minimum covariance determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in high-dimensional low-sample size, and often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.