Source author record

Lin Shi

Lin Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Software Engineering · Artificial Intelligence · cond-mat.mtrl-sci · eess.IV · Machine Learning · physics.comp-ph

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators


arXiv preprint, 2026

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, a human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/ .
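To make the task structure described above concrete, here is a minimal sketch of a benchmark task record and a pass-rate score. Everything in it is hypothetical: the `Task` fields simply mirror the abstract's wording (environment, human-written solution, verification tests), the agent's end state is simplified to a captured string, and this is not the actual Terminal-Bench file layout or harness API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    """Hypothetical task record mirroring the abstract's description;
    not the real Terminal-Bench format."""
    name: str
    environment: str          # assumed: identifier of the task's container image
    reference_solution: str   # assumed: the human-written solution script
    tests: List[Callable[[str], bool]] = field(default_factory=list)

def task_passes(task: Task, final_state: str) -> bool:
    """A task counts as solved only if it has tests and every one passes."""
    return bool(task.tests) and all(test(final_state) for test in task.tests)

def resolution_rate(tasks: List[Task], final_states: List[str]) -> float:
    """Percentage of tasks solved, given the agent's final state for each task."""
    if len(tasks) != len(final_states):
        raise ValueError("need exactly one final state per task")
    solved = sum(task_passes(t, s) for t, s in zip(tasks, final_states))
    return 100.0 * solved / len(tasks)

# Two made-up tasks; the final states stand in for captured terminal output.
tasks = [
    Task("build-fix", "env-a", "make && make test", [lambda s: "BUILD OK" in s]),
    Task("log-parse", "env-b", "python parse.py", [lambda s: "42 errors" in s]),
]
print(resolution_rate(tasks, ["BUILD OK", "no output"]))
```

Requiring every test to pass (rather than giving partial credit) reflects the idea that a task is either verified as solved or not; a real harness may aggregate differently, for example over repeated trials.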

arXiv preprint, 2022

Automatic Comment Generation via Multi-Pass Deliberation

Deliberation is a common and natural behavior in human daily life. For example, when writing papers or articles, we usually first write drafts and then iteratively polish them until satisfied. In light of this human cognitive process, we propose DECOM, a multi-pass deliberation framework for automatic comment generation. DECOM consists of multiple Deliberation Models and one Evaluation Model. Given a code snippet, we first extract keywords from the code and retrieve a similar code fragment from a pre-defined corpus. Then, we treat the comment of the retrieved code as the initial draft and feed it, together with the code and keywords, into DECOM to start the iterative deliberation process. At each deliberation, the deliberation model polishes the draft and generates a new comment. The evaluation model measures the quality of the newly generated comment to determine whether to end the iterative process. When the iterative process terminates, the best generated comment is selected as the target comment. Our approach is evaluated on two real-world datasets in Java (87K) and Python (108K), and experimental results show that it outperforms the state-of-the-art baselines. A human evaluation study also confirms that the comments generated by DECOM tend to be more readable, informative, and useful.
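The deliberation loop described above can be sketched as a generic refine-and-score loop. This is an illustration under stated assumptions, not DECOM's implementation: the deliberation models and the evaluation model are passed in as plain callables (a hypothetical interface), one model is applied per pass, and the stopping rule (stop as soon as a new draft scores no higher than the best so far) is an assumed stand-in for the paper's learned termination decision.

```python
from typing import Callable, List

# Hypothetical signatures: a deliberation model maps (code, keywords, draft) -> new draft;
# the evaluation model maps (code, comment) -> quality score (higher is better).
DeliberationModel = Callable[[str, List[str], str], str]
Evaluator = Callable[[str, str], float]

def deliberate_comment(code: str, keywords: List[str], initial_draft: str,
                       deliberation_models: List[DeliberationModel],
                       evaluate: Evaluator) -> str:
    """Polish the retrieved draft pass by pass and return the best generated comment.

    Assumed stopping rule: end deliberation when a pass fails to improve the score.
    If no pass runs, the retrieved draft is returned unchanged.
    """
    best_comment, best_score = initial_draft, float("-inf")
    draft = initial_draft
    for model in deliberation_models:          # one deliberation pass per model
        draft = model(code, keywords, draft)   # polish the previous draft
        score = evaluate(code, draft)          # evaluation model scores the new comment
        if score <= best_score:                # no improvement: terminate the loop
            break
        best_comment, best_score = draft, score
    return best_comment
```

With a toy model that appends two characters per pass and a scorer that prefers seven-character comments, the loop keeps improving until the third pass overshoots, then returns the seven-character draft.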

arXiv preprint, 2022

BugListener: Identifying and Synthesizing Bug Reports from Collaborative Live Chats

In community-based software development, developers frequently rely on live chat to discuss emergent bugs/errors they encounter in daily development tasks. However, accurately recording such knowledge remains challenging due to the noisy nature of interleaved dialogs in live chat data. In this paper, we first formulate the task of identifying and synthesizing bug reports from community live chats, and propose a novel approach, named BugListener, to address the challenges. Specifically, BugListener automates three sub-tasks: 1) disentangling the dialogs from massive chat logs using a feed-forward neural network; 2) identifying the bug-report dialogs among the separated dialogs by modeling each original dialog as a graph-structured dialog and leveraging a graph neural network to learn the contextual information; 3) synthesizing the bug reports by using a TextCNN model and a transfer learning network to classify sentences into three groups: observed behaviors (OB), expected behaviors (EB), and steps to reproduce the bug (SR). BugListener is evaluated on six open-source projects. The results show that, for bug report identification, BugListener achieves an average F1 of 74.21%, improving on the best baseline by 10.37%; and for the bug report synthesis task, BugListener classifies OB, EB, and SR sentences with F1 scores of 67.37%, 87.14%, and 65.03%, improving on the best baselines by 7.21%, 7.38%, and 5.30%, respectively. A human evaluation also confirms the effectiveness of BugListener in generating relevant and accurate bug reports. These results demonstrate the significant potential of applying BugListener in community-based software development to promote bug discovery and quality improvement.
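The three-stage data flow described above (disentangle, identify, synthesize) can be sketched with deliberately simple stand-ins. This is only a skeleton of the pipeline shape: the paper's feed-forward network, graph neural network, and TextCNN/transfer-learning classifier are replaced here by hypothetical heuristics (an `@mention` reply rule, an error/crash keyword check, and cue-word sentence labels), and each chat message is treated as one sentence.

```python
from typing import Dict, List

def disentangle(messages: List[Dict[str, str]]) -> List[List[str]]:
    """Stage 1 stand-in: a message mentioning '@author' joins that author's
    latest dialog; any other message starts a new dialog."""
    dialogs: List[List[str]] = []
    latest_dialog: Dict[str, int] = {}   # author -> index of their latest dialog
    for msg in messages:
        target = next((idx for author, idx in latest_dialog.items()
                       if "@" + author in msg["text"]), None)
        if target is None:
            dialogs.append([])
            target = len(dialogs) - 1
        dialogs[target].append(msg["text"])
        latest_dialog[msg["author"]] = target
    return dialogs

def is_bug_dialog(dialog: List[str]) -> bool:
    """Stage 2 stand-in: flag dialogs that mention an error or a crash."""
    return any(cue in line.lower() for line in dialog for cue in ("error", "crash"))

def classify_sentence(sentence: str) -> str:
    """Stage 3 stand-in: label a sentence OB, EB or SR from cue words."""
    words = [w for w in sentence.lower().split() if not w.startswith("@")]
    text = " ".join(words)
    if text.startswith(("steps", "to reproduce", "run ")):
        return "SR"                      # steps to reproduce
    if "should" in words or "expected" in words:
        return "EB"                      # expected behavior
    return "OB"                          # observed behavior (default)

def synthesize_reports(messages: List[Dict[str, str]]) -> List[Dict[str, List[str]]]:
    """Run all three stages and return one OB/EB/SR report per bug dialog."""
    reports = []
    for dialog in disentangle(messages):
        if not is_bug_dialog(dialog):
            continue
        report: Dict[str, List[str]] = {"OB": [], "EB": [], "SR": []}
        for sentence in dialog:
            report[classify_sentence(sentence)].append(sentence)
        reports.append(report)
    return reports
```

For a chat where one user reports a crash, another replies to them with the expected behavior, and the first replies back with reproduction steps, the sketch yields a single report with one OB, one EB, and one SR sentence, while an unrelated message ends up in its own non-bug dialog.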

arXiv preprint, 2022

Where is Your App Frustrating Users?

User reviews of mobile apps provide a communication channel for developers to perceive user satisfaction. Many app features that users have problems with are expressed by key phrases such as "upload pictures", which can be buried in the review texts. The lack of a fine-grained view of problematic features can obscure developers' understanding of where the app is frustrating users and postpone improvement of the apps. Existing pattern-based approaches to extracting target phrases suffer from low accuracy due to insufficient semantic understanding of the reviews, and thus can only summarize the high-level topics/aspects of the reviews. This paper proposes a semantic-aware, fine-grained app review analysis approach (SIRA) to extract, cluster, and visualize the problematic features of apps. The main component of SIRA is a novel BERT+Attr-CRF model for fine-grained problematic feature extraction, which combines textual descriptions and review attributes to better model the semantics of reviews and boost the performance of the traditional BERT-CRF model. SIRA also clusters the extracted phrases based on their semantic relations and presents a visualization of the summaries. Our evaluation on 3,426 reviews from six apps confirms the effectiveness of SIRA in problematic feature extraction and clustering. We further conduct an empirical study with SIRA on 318,534 reviews of 18 popular apps to explore its potential application and examine its usefulness in real-world practice.
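The clustering step described above groups extracted phrases by how related they are. The sketch below shows only the general shape of such a step, with a plainly swapped-in technique: it uses token-level Jaccard similarity and greedy single-pass clustering with an assumed threshold of 0.5, whereas SIRA clusters on semantic relations. The function names are hypothetical.

```python
from typing import List, Set

def tokens(phrase: str) -> Set[str]:
    """Lower-cased word set of a phrase."""
    return set(phrase.lower().split())

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two word sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_phrases(phrases: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: each phrase joins the first cluster whose representative
    (its first phrase) is similar enough; otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for phrase in phrases:
        for cluster in clusters:
            if jaccard(tokens(phrase), tokens(cluster[0])) >= threshold:
                cluster.append(phrase)
                break
        else:  # no sufficiently similar cluster found
            clusters.append([phrase])
    return clusters

print(cluster_phrases(["upload pictures", "upload photos", "upload pictures fails",
                       "login screen", "login screen freezes"]))
```

On this input, "upload photos" lands in its own cluster because it shares only one word with "upload pictures". That is exactly the kind of case where a semantic similarity measure, as SIRA uses, would be expected to do better than lexical overlap.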

arXiv preprint, 2020

QC-SPHRAM: Quasi-conformal Spherical Harmonics Based Geometric Distortions on Hippocampal Surfaces for Early Detection of the Alzheimer's Disease

We propose a disease classification model, called QC-SPHARM, for the early detection of Alzheimer's Disease (AD). QC-SPHARM can distinguish between normal control (NC) subjects and AD patients, as well as between amnestic mild cognitive impairment (aMCI) patients with a high likelihood of progressing to AD and those without. Using spherical harmonics (SPHARM) based registration, hippocampal surfaces segmented from the ADNI data are individually registered to a template surface constructed from the NC subjects using SPHARM. Local geometric distortions of the deformation from the template surface to each subject are quantified in terms of conformality distortions and curvature distortions. These measurements are combined with the spherical harmonics coefficients and the total volume change of the subject from the template. Afterwards, a t-test based feature selection method incorporating a bagging strategy is applied to extract the local regions with high power to discriminate between the two classes. A disease diagnosis classifier is then built from these features using a Support Vector Machine (SVM). Using 110 NC subjects and 110 AD patients from the ADNI database, the proposed algorithm achieves 85.2% testing accuracy with 80 randomly selected subjects used for testing, when surface geometry is incorporated into the classifier. Using 20 aMCI patients who advanced to AD within a two-year period and another 20 aMCI patients who remained non-AD over the next two years, the algorithm achieves 81.2% accuracy with 10 randomly selected subjects used for testing. The proposed method is 6% to 15% better than other classification models that do not incorporate surface geometry. The results demonstrate the advantages of using local geometric distortions as the discriminating criterion for early AD diagnosis.
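The t-test feature selection with bagging mentioned above can be sketched generically. This is a swapped-in, textbook form under stated assumptions, not the paper's exact procedure: it uses the Welch two-sample t statistic, draws bootstrap resamples ("bags") of each class, lets each bag vote for its `top_k` features by |t|, and keeps features voted for in at least half of the bags. The parameter names and defaults are hypothetical, and the SVM training step that follows is not shown.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t statistic; each sample needs at least two values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def bagged_t_test_selection(X0: List[List[float]], X1: List[List[float]],
                            top_k: int, n_bags: int = 50,
                            min_freq: float = 0.5, seed: int = 0) -> List[int]:
    """Indices of features whose |t| ranks in the top_k in at least
    min_freq of the bootstrap resamples (bags)."""
    rng = random.Random(seed)
    n_features = len(X0[0])
    votes: Counter = Counter()
    for _ in range(n_bags):
        # Bootstrap each class separately: sample rows with replacement.
        b0 = [rng.choice(X0) for _ in X0]
        b1 = [rng.choice(X1) for _ in X1]
        scores = [abs(welch_t([r[j] for r in b0], [r[j] for r in b1]))
                  for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        votes.update(ranked[:top_k])       # each bag votes for its top_k features
    return [j for j in range(n_features) if votes[j] / n_bags >= min_freq]
```

On synthetic data where only feature 0 differs between the classes (for example, a mean shift of three standard deviations with 30 subjects per class), the selection returns `[0]`. Bagging makes the choice less sensitive to which subjects happen to be in the sample; the selected indices would then form the SVM input.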

arXiv preprint, 2019

Anharmonic corrections to the multiphonon deep-level charge capture ab initio calculations for semiconductors

Nonradiative carrier recombination at semiconductor deep centers is of great importance to both fundamental physics and device engineering. In this letter, we provide a revised analysis of K. Huang's original nonradiative multiphonon (NMP) theory with ab initio calculations. First, we show at the first-principles level that Huang's concise formula gives the same results as the matrix-based formula, and that Huang's high-temperature formula provides an analytical expression for the coupling constant in Marcus theory. Second, anharmonic effects are corrected by taking into account the variation of local phonon modes between different charge states of the defect. The corrected capture rates for defects in GaN and SiC agree well with experiments.
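The abstract connects Huang's high-temperature formula to Marcus theory. For orientation only, the textbook high-temperature Marcus rate is shown below. This is the standard semiclassical form, not the paper's revised expression. Here $W$ is the electronic coupling matrix element, $\Delta E$ the energy released in the capture, $\lambda$ the reorganization (relaxation) energy, $k_B$ Boltzmann's constant and $T$ the temperature. The relation $\lambda = S\hbar\omega$, with $S$ the Huang-Rhys factor and $\hbar\omega$ an effective phonon energy, is an assumption of the simplest single-mode model. In that model, at high temperature the multiphonon line shape reduces to a Gaussian of variance $2\lambda k_B T$ centered at $\Delta E = \lambda$, which gives this form.

```latex
k = \frac{2\pi}{\hbar}\,|W|^{2}\,
    \frac{1}{\sqrt{4\pi\lambda k_{B}T}}\,
    \exp\!\left[-\frac{(\Delta E-\lambda)^{2}}{4\lambda k_{B}T}\right],
\qquad \lambda = S\,\hbar\omega
```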

arXiv preprint, 2015

A comparative study of ab initio nonradiative recombination rate calculations under different formalisms

Nonradiative carrier recombination is of great applied and fundamental importance, but the correct ab initio approach to calculating it remains unsettled. Here we used five different formalisms to calculate the nonradiative carrier recombination of two complex defect structures, GaP:Zn_Ga-O_P and GaN:Zn_Ga-V_N, and compared the results with experiments. To apply the different multiphonon-assisted electron transition formalisms, we calculated the electron-phonon coupling constants for all phonon modes using ab initio density functional theory. Among the methods compared, the capture coefficients calculated by the static coupling theory are 4.30×10^-8 and 1.46×10^-7 cm^3/s for GaP:Zn_Ga-O_P and GaN:Zn_Ga-V_N, respectively, in good agreement with the experimental results of 4×10^-8 and 3.0×10^-7 cm^3/s. We also provide arguments for why the static coupling theory should be used to calculate the nonradiative decay rates of semiconductors.
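A quick way to read "good agreement" above is to compute the calculated-to-measured ratio for each defect. The snippet below uses only the capture coefficients quoted in the abstract; the dictionary and helper names are illustrative.

```python
# Capture coefficients in cm^3/s quoted in the abstract:
# (static coupling theory result, experimental value) for each defect.
coefficients = {
    "GaP:Zn_Ga-O_P": (4.30e-8, 4.0e-8),
    "GaN:Zn_Ga-V_N": (1.46e-7, 3.0e-7),
}

def agreement_ratios(data):
    """Return the calculated/measured ratio for each defect."""
    return {name: calc / meas for name, (calc, meas) in data.items()}

for name, ratio in agreement_ratios(coefficients).items():
    print(f"{name}: calculated/measured = {ratio:.3f}")
```

The ratios come out at about 1.08 for GaP:Zn_Ga-O_P and about 0.49 for GaN:Zn_Ga-V_N, so both calculated values lie within roughly a factor of two of experiment.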
