Source author record

Mingwei Liu

Mingwei Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Artificial Intelligence Computation and Language cond-mat.mes-hall cond-mat.mtrl-sci

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Issue resolution, a complex Software Engineering (SWE) task integral to real-world development, has emerged as a compelling challenge for artificial intelligence. The establishment of benchmarks like SWE-bench revealed this task as profoundly difficult for large language models, thereby significantly accelerating the evolution of autonomous coding agents. This paper presents a systematic survey of this emerging domain. We begin by examining data construction pipelines, covering automated collection and synthesis approaches. We then provide a comprehensive analysis of methodologies, spanning training-free frameworks with their modular components to training-based techniques, including supervised fine-tuning and reinforcement learning. Subsequently, we discuss critical analyses of data quality and agent behavior, alongside practical applications. Finally, we identify key challenges and outline promising directions for future research. An open-source repository is maintained at https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution to serve as a dynamic resource in this field.

preprint2026arXiv

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

Large language models (LLMs) frequently generate defective outputs in code generation tasks, ranging from logical bugs to security vulnerabilities. While these generation failures are often treated as model-level limitations, empirical evidence increasingly traces their root causes to imperfections within the training corpora. Yet, the specific mechanisms linking training data quality issues to generated code quality issues remain largely unmapped. This paper presents a systematic literature review of 114 primary studies to investigate how training data quality issues propagate into code generation. We establish a unified taxonomy that categorizes generated code quality issues across nine dimensions and training data quality issues into code and non-code attributes. Based on this taxonomy, we formalize a causal framework detailing 18 typical propagation mapping mechanisms. Furthermore, we synthesize state-of-the-art detection and mitigation techniques across the data, model, and generation lifecycles. The reviewed literature reveals a clear methodological shift: quality assurance is transitioning from reactive, heuristic-based post-generation filtering toward proactive, data-centric governance and closed-loop repair. Finally, we identify open challenges and outline research directions for developing reliable LLMs for code through integrated data curation and continuous evaluation. Our repository is available at https://github.com/SYSUSELab/From-Data-to-Code.

preprint2026arXiv

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development efforts and enhancing software productivity. The emergence of large language models (LLMs) has significantly advanced code generation, though their efficiency is still impacted by certain inherent architectural constraints. Each token generation necessitates a complete inference pass, requiring persistent retention of contextual information in memory and escalating resource consumption. While existing research prioritizes inference-phase optimizations such as prompt compression and model quantization, the generation phase remains underexplored. To tackle these challenges, we propose a knowledge-infused framework named ShortCoder, which optimizes code generation efficiency while preserving semantic equivalence and readability. In particular, we introduce: (1) ten syntax-level simplification rules for Python, derived from AST-preserving transformations, achieving 18.1% token reduction without functional compromise; (2) a hybrid data synthesis pipeline integrating rule-based rewriting with LLM-guided refinement, producing ShorterCodeBench, a corpus of validated tuples of original code and simplified code with semantic consistency; (3) a fine-tuning strategy that injects conciseness awareness into the base LLMs. Extensive experimental results demonstrate that ShortCoder consistently outperforms state-of-the-art methods on HumanEval, achieving an improvement of 18.1%-37.8% in generation efficiency over previous methods while ensuring the performance of code generation.

preprint2021arXiv

Dynamics of Polar Skyrmion Bubbles under Electric Fields

Room-temperature polar skyrmion bubbles that are recently found in oxide superlattice, have received enormous interests for their potential applications in nanoelectronics due to the nanometer size, emergent chirality, and negative capacitance. For practical applications, the ability to controllably manipulate them by using external stimuli is prerequisite. Here, we study the dynamics of individual polar skyrmion bubbles at the nanoscale by using in situ biasing in a scanning transmission electron microscope. The reversible electric field-driven phase transition between topological and trivial polar states are demonstrated. We create, erase and monitor the shrinkage and expansion of individual polar skyrmions. We find that their transition behaviors are substantially different from that of magnetic analogue. The underlying mechanism is discussed by combing with the phase-field simulations. The controllable manipulation of nanoscale polar skyrmions allows us to tune the dielectric permittivity at atomic scale and detailed knowledge of their phase transition behaviors provides fundamentals for their applications in nanoelectronics.

Mingwei Liu

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Dynamics of Polar Skyrmion Bubbles under Electric Fields