Researcher profile

Lin Qiu

Lin Qiu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
13 topics
4 close collaborators


Published work

8 published items

preprint, 2026, arXiv

AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

As LLM-based agents increasingly rely on external tools, it is important to evaluate their ability to sustain tool-grounded reasoning beyond familiar workflows and short-range interactions. We introduce AgentEscapeBench, an escape-room-style benchmark that tests whether agents can infer, execute, and revise novel tool-use procedures under explicit long-range dependency constraints. Each task defines a directed acyclic dependency graph over tools and items, requiring agents to invoke real external functions, track hidden state revealed incrementally, propagate intermediate results, and submit a deterministically verifiable final answer. AgentEscapeBench includes 270 instances across five difficulty tiers and supports fully automated evaluation. Experiments with sixteen LLM agents and human participants show that performance drops sharply as dependency depth increases: humans decline from 98.3% success at difficulty-5 to 80.0% at difficulty-25, while the best model drops from 90.0% to 60.0%. Trajectory analysis attributes model failures mainly to breakdowns in long-range state tracking, clue adherence, and intermediate-result propagation. These findings suggest that current agents can often handle local tool use but still struggle with deep contextual dependencies. We hope AgentEscapeBench can serve as a diagnostic testbed for measuring current agent capabilities and informing future training efforts toward more robust general-purpose reasoning, action, and adaptation.
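The dependency-graph idea in this abstract can be illustrated with a minimal sketch. It assumes a task is given as a mapping from each step to the steps it depends on. The step names, the `execution_order` and `dependency_depth` helpers, and the definition of depth as the longest prerequisite chain are all hypothetical. The abstract does not say how the benchmark encodes tasks or defines its difficulty tiers.

```python
from graphlib import TopologicalSorter

# Hypothetical task: each step maps to the set of steps it depends on.
# Step names are illustrative only and do not come from the benchmark.
deps = {
    "read_note": set(),
    "open_drawer": {"read_note"},
    "decode_cipher": {"read_note"},
    "unlock_safe": {"open_drawer", "decode_cipher"},
    "submit_answer": {"unlock_safe"},
}


def execution_order(deps):
    """Return one valid order in which every step follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def dependency_depth(deps):
    """Return the number of steps on the longest prerequisite chain."""
    depth = {}
    # Visiting steps in topological order guarantees that the depth of
    # every prerequisite is known before the step itself is processed.
    for step in execution_order(deps):
        depth[step] = 1 + max((depth[d] for d in deps[step]), default=0)
    return max(depth.values(), default=0)


order = execution_order(deps)
max_depth = dependency_depth(deps)
print(order)
print(max_depth)
```

In this toy graph the longest chain runs from `read_note` through `unlock_safe` to `submit_answer`, so an agent has to carry intermediate results across four dependent steps before it can submit an answer.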

preprint, 2026, arXiv

CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks

The evaluation of Large Language Models (LLMs) for software engineering has shifted towards complex, repository-level tasks. However, existing benchmarks predominantly rely on coarse-grained pass rates that treat programming proficiency as a monolithic capability, obscuring specific cognitive bottlenecks. Furthermore, the static nature of these benchmarks renders them vulnerable to data contamination and performance saturation. To address these limitations, we introduce CoreCodeBench, a configurable repository-level benchmark designed to dissect coding capabilities through atomized tasks. Leveraging our automated framework, CorePipe, we extract and transform Python repositories into a comprehensive suite of tasks that isolate distinct cognitive demands within identical code contexts. Unlike static evaluations, CoreCodeBench supports controllable difficulty scaling to prevent saturation and ensures superior data quality. It achieves a 78.55% validity yield, significantly surpassing the 31.7% retention rate of SWE-bench-Verified. Extensive experiments with state-of-the-art LLMs reveal a significant capability misalignment, evidenced by distinct ranking shifts across cognitive dimensions. This indicates that coding proficiency is non-monolithic, as strength in one aspect does not necessarily translate to others. These findings underscore the necessity of our fine-grained taxonomy in diagnosing model deficiencies and offer a sustainable, rigorous framework for evolving code intelligence. The code for CorePipe is available at https://github.com/AGI-Eval-Official/CoreCodeBench, and the data for CoreCodeBench can be accessed at https://huggingface.co/collections/tubehhh/corecodebench-68256d2faabf4b1610a08caa.

preprint, 2024, arXiv

Spectral integrated neural networks (SINNs) for solving forward and inverse dynamic problems

This paper proposes a novel neural network framework, denoted as spectral integrated neural networks (SINNs), for resolving three-dimensional forward and inverse dynamic problems. In the SINNs, the spectral integration method is applied to perform temporal discretization, and then a fully connected neural network is adopted to solve resulting partial differential equations (PDEs) in the spatial domain. Specifically, spatial coordinates are employed as inputs in the network architecture, and the output layer is configured with multiple outputs, each dedicated to approximating solutions at different time instances characterized by Gaussian points used in the spectral method. By leveraging the automatic differentiation technique and spectral integration scheme, the SINNs minimize the loss function, constructed based on the governing PDEs and boundary conditions, to obtain solutions for dynamic problems. Additionally, we utilize polynomial basis functions to expand the unknown function, aiming to enhance the performance of SINNs in addressing inverse problems. The conceived framework is tested on six forward and inverse dynamic problems, involving nonlinear PDEs. Numerical results demonstrate the superior performance of SINNs over the popularly used physics-informed neural networks in terms of convergence speed, computational accuracy and efficiency. It is also noteworthy that the SINNs exhibit the capability to deliver accurate and stable solutions for long-time dynamic problems.

preprint, 2023, arXiv

Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction

The integration of multi-modal data, such as pathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions. Despite the progress made in integrating pathology and genomic data, most existing methods cannot mine the complex inter-modality relations thoroughly. Additionally, identifying explainable features from these models that govern preclinical discovery and clinical prediction is crucial for cancer diagnosis, prognosis, and therapeutic response studies. We propose PONET, a novel biological pathway-informed pathology-genomic deep model that integrates pathological images and genomic data not only to improve survival prediction but also to identify genes and pathways that cause different survival rates in patients. Empirical results on six of The Cancer Genome Atlas (TCGA) datasets show that our proposed method achieves superior predictive performance and reveals meaningful biological interpretations. The proposed method establishes insight into how to train biologically informed deep networks on multimodal biomedical data, which will have general applicability for understanding diseases and predicting response and resistance to treatment.

preprint, 2022, arXiv

Towards the establishment of the light $J^{P(C)}=1^{-(+)}$ hybrid nonet

The observation of the light hybrid candidate $η_1(1855)$ by the BESIII Collaboration brings great opportunities for advancing our knowledge of exotic hadrons in the light flavor sector. We show that this observation provides a crucial clue for establishing the $J^{P(C)}=1^{-(+)}$ hybrid nonet. Based on the flux tube model picture, the production and decay mechanisms for the $J^{P(C)}=1^{-(+)}$ hybrid nonet in the $J/ψ$ radiative decays into two pseudoscalar mesons are investigated. In the $I=0$ sector, we find that the SU(3) flavor octet and singlet mixing is non-negligible and apparently deviates from the flavor ideal mixing. Since only signals for one isoscalar $η_1(1855)$ are observed in the $ηη'$ channel, we investigate two schemes of the nonet structure in which $η_1(1855)$ can be either the higher or lower mass state that strongly couples to $ηη'$. Possible channels for detecting the multiplets are suggested. In particular, a combined analysis of the hybrid production in $J/ψ\to VH$, where $V$ and $H$ stand for the light vector mesons and $1^{-(+)}$ hybrid states, may provide further evidence for this nonet structure and finally establish these mysterious exotic species in experiment.

preprint, 2022, arXiv

Variational Interpretable Learning from Multi-view Data

The main idea of canonical correlation analysis (CCA) is to map different views onto a common latent space with maximum correlation. We propose a deep interpretable variational canonical correlation analysis (DICCA) for multi-view learning. The developed model extends the existing latent variable model for linear CCA to nonlinear models through the use of deep generative networks. DICCA is designed to disentangle both the shared and view-specific variations for multi-view data. To further make the model more interpretable, we place a sparsity-inducing prior on the latent weight with a structured variational autoencoder that is comprised of view-specific generators. Empirical results on real-world datasets show that our methods are competitive across domains.

preprint, 2021, arXiv

NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy

Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines.
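For readers unfamiliar with FDR control, the classical Benjamini-Hochberg step-up procedure is a useful reference point. The sketch below is that classical procedure only, not NeurT-FDR: it uses no covariates, no hierarchy, and no neural network. The function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg rule."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Illustrative p-values, not taken from any dataset in the paper.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger p-value further down the sorted list meets its threshold. Covariate-aware methods such as the one described above aim to make more discoveries than this baseline by using side information about each test.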

preprint, 2020, arXiv

Probabilistic Canonical Correlation Analysis for Sparse Count Data

Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of continuous variables. CCA has applications in many fields, such as genomics and neuroimaging. It can extract meaningful features as well as use these features for subsequent analysis. Although some sparse CCA methods have been developed to deal with high-dimensional problems, they are designed specifically for continuous data and do not consider the integer-valued data from next-generation sequencing platforms that exhibit very low counts for some important features. We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets (PSCCA). PSCCA demonstrates that correlations and canonical correlations estimated at the natural parameter level are more appropriate than traditional estimation methods applied to the raw data. We demonstrate through simulation studies that PSCCA outperforms other standard correlation approaches and sparse CCA approaches in estimating the true correlations and canonical correlations at the natural parameter level. We further apply the PSCCA method to study the association of miRNA and mRNA expression data sets from a squamous cell lung cancer study, finding that PSCCA can uncover a larger number of strongly correlated pairs than standard correlation and other sparse CCA approaches.