Source author record

Shijie Li

Shijie Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision astro-ph.CO astro-ph.GA Artificial Intelligence Computation Computation and Language Machine Learning Methodology Programming Languages Robotics Software Engineering

Catalog footprint

What is connected

17works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

AI for Auto-Research: Roadmap & User Guide

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Studying developments through April 2026, we present an end-to-end analysis of AI across the complete research lifecycle, organized into four epistemological phases: Creation (idea generation, literature review, coding & experiments, tables & figures), Writing (paper writing), Validation (peer review, rebuttal & revision), and Dissemination (posters, slides, videos, social media, project pages, and interactive agents). We identify a sharp, stage-dependent boundary between reliable assistance and unreliable autonomy: AI excels at structured, retrieval-grounded, and tool-mediated tasks, but remains fragile for genuinely novel ideas, research-level experiments, and scientific judgment. Generated ideas often degrade after implementation, research code lags far behind pattern-matching benchmarks, and end-to-end autonomous systems have not yet consistently reached major-venue acceptance standards. We further show that greater automation can obscure rather than eliminate failure modes, making human-governed collaboration the most credible deployment paradigm. Finally, we provide a structured taxonomy, benchmark suite, and tool inventory, cross-stage design principles, and a practitioner-oriented playbook, with resources maintained at our project page.

preprint2026arXiv

Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering

Parameter-Preserving Knowledge Editing (PPKE) enables updating models with new information without retraining or parameter adjustment. Recent PPKE approaches used knowledge graphs (KG) to extend knowledge editing (KE) capabilities to multi-hop question answering (MHQA). However, these methods often lack consistency, leading to knowledge contamination, unstable updates, and retrieval behaviors that are misaligned with the intended edits. Such inconsistencies undermine the reliability of PPKE in multi-hop reasoning. We present CAPE-KG, Consistency-Aware Parameter-Preserving Editing with Knowledge Graphs, a novel consistency-aware framework for PPKE on MHQA. CAPE-KG ensures KG construction, update, and retrieval are always aligned with the requirements of the MHQA task, maintaining coherent reasoning over both unedited and edited knowledge. Extensive experiments on the MQuAKE benchmark show accuracy improvements in PPKE performance for MHQA, demonstrating the effectiveness of addressing consistency in PPKE.

preprint2026arXiv

Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

Embodied agents in household environments must plan under partial observation: they need to remember objects, track state changes, and recover when actions fail. Existing benchmarks only partially test this ability. Egocentric video datasets capture realistic human activities but remain passive, while interactive simulators support execution but rely on synthetic scenes and hand-crafted dynamics, introducing a sim-to-real gap and often assuming fully observable state. We introduce Ego2World, an executable benchmark that turns egocentric cooking videos into executable symbolic worlds governed by graph-transition rules. Built on HD-EPIC, Ego2World derives reusable transition rules from video annotations and executes them in a hidden symbolic world graph. During evaluation, the simulator maintains the hidden world graph, while the agent plans over its own partial belief graph using only local observations and execution feedback. This separation forces agents to update memory and replan without observing the true world state. Experiments show that action-overlap scores overestimate physical-state success, and that persistent belief memory improves task completion while reducing repeated visual exploration -- suggesting that belief maintenance should be a first-class target of embodied-agent evaluation.

preprint2026arXiv

From Priors to Perception: Grounding Video-LLMs in Physical Reality

While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only in anti-physics anomalies but also in counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we propose the Unified Attribution Theory: this dual failure stems not from perception deficiency, but from Semantic Prior Dominance -- the reasoning mechanism is deeply hijacked by internal narrative scripts. To address this, we construct the Programmatic Adversarial Curriculum (PACC), the first high-fidelity adversarial video dataset synthesized based on physical laws, thoroughly decoupling visual artifacts from logical errors. Concurrently, we design the Visual-Anchored Reasoning Chain (VARC) to force models to explicitly ground their judgments in low-level visual facts prior to logical adjudication. Experiments demonstrate that without invasive architectural modifications, standard LoRA fine-tuning with the PACC curriculum effectively neutralizes prior interference in state-of-the-art (SOTA) models, yielding a substantial leap in physical reasoning capabilities.

preprint2026arXiv

Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances

Functional affordance grounding requires more than recognizing an object: an agent must localize the specific region that supports an interaction, such as the handle to pull or the button to press. This is difficult for training-free vision-language pipelines because actionable regions are often small, visually ambiguous, and repeated across multiple same-category instances in a scene. We propose AFFORDMEM, a framework that grounds 3D functional affordances by remembering geometry at two levels. The first is cross-scene affordance memory: the agent maintains a category-level memory bank of RGB images with affordance regions rendered as overlays, and recalls the most informative examples at query time to guide a frozen VLM toward small operable subregions that text-only prompting consistently misses. The second is in-scene spatial memory: as the agent processes the scene, it organizes candidate instances and their 3D spatial relations into a structured scene graph, enabling the language model to resolve references over distant or currently unobserved candidates such as "the second handle from the top." AFFORDMEM requires no model fine-tuning and no target-scene annotation, using a reusable memory bank built from source scenes. On SceneFun3D, our method improves AP50 over the prior training-free state of the art by 3.23 on Split 0 and 3.7 on Split 1. Ablation studies support complementary benefits: cross-scene affordance memory improves fine-grained localization, while in-scene spatial memory provides the larger gain on spatially qualified queries. The project homepage is available at the project page.

preprint2024arXiv

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, support scanning over 10 billion lines of code daily and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces tasks types specially for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, utilizing a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a distinctive language, CodeFuse-Query enables formulation of complex tasks as logical expressions, harnessing Datalog's declarative prowess. This paper provides empirical evidence of CodeFuse-Query's transformative approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, emphasizing its potential to reshape the landscape of static code analysis in the context of large-scale software development.Furthermore, in the spirit of collaboration and advancing the field, our project is open-sourced and the repository is available for public access

preprint2022arXiv

Improved Counting and Localization from Density Maps for Object Detection in 2D and 3D Microscopy Imaging

Object counting and localization are key steps for quantitative analysis in large-scale microscopy applications. This procedure becomes challenging when target objects are overlapping, are densely clustered, and/or present fuzzy boundaries. Previous methods producing density maps based on deep learning have reached a high level of accuracy for object counting by assuming that object counting is equivalent to the integration of the density map. However, this model fails when objects show significant overlap regarding accurate localization. We propose an alternative method to count and localize objects from the density map to overcome this limitation. Our procedure includes the following three key aspects: 1) Proposing a new counting method based on the statistical properties of the density map, 2) optimizing the counting results for those objects which are well-detected based on the proposed counting method, and 3) improving localization of poorly detected objects using the proposed counting method as prior information. Validation includes processing of microscopy data with known ground truth and comparison with other models that use conventional processing of the density map. Our results show improved performance in counting and localization of objects in 2D and 3D microscopy data. Furthermore, the proposed method is generic, considering various applications that rely on the density map approach. Our code will be released post-review.

preprint2022arXiv

Toward An Optimal Selection of Dialogue Strategies: A Target-Driven Approach for Intelligent Outbound Robots

With the growth of the economy and society, enterprises, especially in the FinTech industry, have increasing demands of outbound calls for customers such as debt collection, marketing, anti-fraud calls, and so on. But a large amount of repetitive and mechanical work occupies most of the time of human agents, so the cost of equipment and labor for enterprises is increasing accordingly. At the same time, with the development of artificial intelligence technology in the past few decades, it has become quite common for companies to use new technologies such as Big Data and artificial intelligence to empower outbound call businesses. The intelligent outbound robot is a typical application of the artificial intelligence technology in the field of outbound call businesses. It is mainly used to communicate with customers in order to accomplish a certain target. It has the characteristics of low cost, high reuse, and easy compliance, which has attracted more attention from the industry. At present, there are two kinds of intelligent outbound robots in the industry but both of them still leave large room for improvement. One kind of them is based on a finite state machine relying on the configuration of jump conditions and corresponding nodes based on manual experience. This kind of intelligent outbound robot is also called a flow-based robot. For example, the schematic diagram of the working model of a flow-based robot for debt collection is shown in Fig.\ref{fig:label}. In each round, the robot will reply to the user with the words corresponding to each node.

preprint2021arXiv

Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition

With the advances in capturing 2D or 3D skeleton data, skeleton-based action recognition has received an increasing interest over the last years. As skeleton data is commonly represented by graphs, graph convolutional networks have been proposed for this task. While current graph convolutional networks accurately recognize actions, they are too expensive for robotics applications where limited computational resources are available. In this paper, we therefore propose a highly efficient graph convolutional network that addresses the limitations of previous works. This is achieved by a parallel structure that gradually fuses motion and spatial information and by reducing the temporal resolution as early as possible. Furthermore, we explicitly address the issue that human poses can contain errors. To this end, the network first refines the poses before they are further processed to recognize the action. We therefore call the network Pose Refinement Graph Convolutional Network. Compared to other graph convolutional networks, our network requires 86\%-93\% less parameters and reduces the floating point operations by 89%-96% while achieving a comparable accuracy. It therefore provides a much better trade-off between accuracy, memory footprint and processing time, which makes it suitable for robotics applications.

preprint2020arXiv

MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation

With the success of deep learning in classifying short trimmed videos, more attention has been focused on temporally segmenting and classifying activities in long untrimmed videos. State-of-the-art approaches for action segmentation utilize several layers of temporal convolution and temporal pooling. Despite the capabilities of these approaches in capturing temporal dependencies, their predictions suffer from over-segmentation errors. In this paper, we propose a multi-stage architecture for the temporal action segmentation task that overcomes the limitations of the previous approaches. The first stage generates an initial prediction that is refined by the next ones. In each stage we stack several layers of dilated temporal convolutions covering a large receptive field with few parameters. While this architecture already performs well, lower layers still suffer from a small receptive field. To address this limitation, we propose a dual dilated layer that combines both large and small receptive fields. We further decouple the design of the first stage from the refining stages to address the different requirements of these stages. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our models achieve state-of-the-art results on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.

preprint2016arXiv

An empirical model to form and evolve galaxies in dark matter halos

Based on the star formation histories (SFH) of galaxies in halos of different masses, we develop an empirical model to grow galaxies in dark mattet halos. This model has very few ingredients, any of which can be associated to observational data and thus be efficiently assessed. By applying this model to a very high resolution cosmological $N$-body simulation, we predict a number of galaxy properties that are a very good match to relevant observational data. Namely, for both centrals and satellites, the galaxy stellar mass function (SMF) up to redshift $z\simeq4$ and the conditional stellar mass functions (CSMF) in the local universe are in good agreement with observations. In addition, the 2-point correlation is well predicted in the different stellar mass ranges explored by our model. Furthermore, after applying stellar population synthesis models to our stellar composition as a function of redshift, we find that the luminosity functions in $^{0.1}u$, $^{0.1}g$, $^{0.1}r$, $^{0.1}i$ and $^{0.1}z$ bands agree quite well with the SDSS observational results down to an absolute magnitude at about -17.0. The SDSS conditional luminosity functions (CLF) itself is predicted well. Finally, the cold gas is derived from the star formation rate (SFR) to predict the HI gas mass within each mock galaxy. We find a remarkably good match to observed HI-to-stellar mass ratios. These features ensure that such galaxy/gas catalogs can be used to generate reliable mock redshift surveys.

preprint2016arXiv

ELUCID - Exploring the Local Universe with reConstructed Initial Density field III: Constrained Simulation in the SDSS Volume

A method we developed recently for the reconstruction of the initial density field in the nearby Universe is applied to the Sloan Digital Sky Survey Data Release 7. A high-resolution N-body constrained simulation (CS) of the reconstructed initial condition, with $3072^3$ particles evolved in a 500 Mpc/h box, is carried out and analyzed in terms of the statistical properties of the final density field and its relation with the distribution of SDSS galaxies. We find that the statistical properties of the cosmic web and the halo populations are accurately reproduced in the CS. The galaxy density field is strongly correlated with the CS density field, with a bias that depend on both galaxy luminosity and color. Our further investigations show that the CS provides robust quantities describing the environments within which the observed galaxies and galaxy systems reside. Cosmic variance is greatly reduced in the CS so that the statistical uncertainties can be controlled effectively even for samples of small volumes.

preprint2016arXiv

Galaxy groups in the 2MASS Redshift Survey

A galaxy group catalog is constructed from the 2MASS Redshift Survey (2MRS) with the use of a halo-based group finder. The halo mass associated with a group is estimated using a `GAP' method based on the luminosity of the central galaxy and its gap with other member galaxies. Tests using mock samples shows that this method is reliable, particularly for poor systems containing only a few members. On average 80% of all the groups have completeness >0.8, and about 65% of the groups have zero contamination. Halo masses are estimated with a typical uncertainty $\sim 0.35\,{\rm dex}$. The application of the group finder to the 2MRS gives 29,904 groups from a total of 43,246 galaxies at $z \leq 0.08$, with 5,286 groups having two or more members. Some basic properties of this group catalog is presented, and comparisons are made with other groups catalogs in overlap regions. With a depth to $z\sim 0.08$ and uniformly covering about 91% of the whole sky, this group catalog provides a useful data base to study galaxies in the local cosmic web, and to reconstruct the mass distribution in the local Universe.

preprint2016arXiv

Mapping the real space distributions of galaxies in SDSS DR7: I. Two Point Correlation Functions

Using a method to correct redshift space distortion (RSD) for individual galaxies, we mapped the real space distributions of galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7). We use an ensemble of mock catalogs to demonstrate the reliability of our method. Here as the first paper in a series, we mainly focus on the two point correlation function (2PCF) of galaxies. Overall the 2PCF measured in the reconstructed real space for galaxies brighter than $^{0.1}{\rm M}_r-5\log h=-19.0$ agrees with the direct measurement to an accuracy better than the measurement error due to cosmic variance, if the reconstruction uses the correct cosmology. Applying the method to the SDSS DR7, we construct a real space version of the main galaxy catalog, which contains 396,068 galaxies in the North Galactic Cap with redshifts in the range $0.01 \leq z \leq 0.12$. The Sloan Great Wall, the largest known structure in the nearby Universe, is not as dominant an over-dense structure as appears to be in redshift space. We measure the 2PCFs in reconstructed real space for galaxies of different luminosities and colors. All of them show clear deviations from single power-law forms, and reveal clear transitions from 1-halo to 2-halo terms. A comparison with the corresponding 2PCFs in redshift space nicely demonstrates how RSDs boost the clustering power on large scales (by about $40-50\%$ at scales $\sim 10 h^{-1}{\rm {Mpc}}$) and suppress it on small scales (by about $70-80\%$ at a scale of $0.3 h^{-1}{\rm {Mpc}}$).

preprint2016arXiv

Robust Orthogonal Complement Principal Component Analysis

Recently, the robustification of principal component analysis has attracted lots of attention from statisticians, engineers and computer scientists. In this work we study the type of outliers that are not necessarily apparent in the original observation space but can seriously affect the principal subspace estimation. Based on a mathematical formulation of such transformed outliers, a novel robust orthogonal complement principal component analysis (ROC-PCA) is proposed. The framework combines the popular sparsity-enforcing and low rank regularization techniques to deal with row-wise outliers as well as element-wise outliers. A non-asymptotic oracle inequality guarantees the accuracy and high breakdown performance of ROC-PCA in finite samples. To tackle the computational challenges, an efficient algorithm is developed on the basis of Stiefel manifold optimization and iterative thresholding. Furthermore, a batch variant is proposed to significantly reduce the cost in ultra high dimensions. The paper also points out a pitfall of a common practice of SVD reduction in robust PCA. Experiments show the effectiveness and efficiency of ROC-PCA in both synthetic and real data.

preprint2014arXiv

Joint Association Graph Screening and Decomposition for Large-scale Linear Dynamical Systems

This paper studies large-scale dynamical networks where the current state of the system is a linear transformation of the previous state, contaminated by a multivariate Gaussian noise. Examples include stock markets, human brains and gene regulatory networks. We introduce a transition matrix to describe the evolution, which can be translated to a directed Granger transition graph, and use the concentration matrix of the Gaussian noise to capture the second-order relations between nodes, which can be translated to an undirected conditional dependence graph. We propose regularizing the two graphs jointly in topology identification and dynamics estimation. Based on the notion of joint association graph (JAG), we develop a joint graphical screening and estimation (JGSE) framework for efficient network learning in big data. In particular, our method can pre-determine and remove unnecessary edges based on the joint graphical structure, referred to as JAG screening, and can decompose a large network into smaller subnetworks in a robust manner, referred to as JAG decomposition. JAG screening and decomposition can reduce the problem size and search space for fine estimation at a later stage. Experiments on both synthetic data and real-world applications show the effectiveness of the proposed framework in large-scale network topology identification and dynamics estimation.

preprint2013arXiv

Constraining the Star Formation Histories in Dark Matter Halos: I. Central Galaxies

Using the self-consistent modeling of the conditional stellar mass functions across cosmic time by Yang et al. (2012), we make model predictions for the star formation histories (SFHs) of {\it central} galaxies in halos of different masses. The model requires the following two key ingredients: (i) mass assembly histories of central and satellite galaxies, and (ii) local observational constraints of the star formation rates of central galaxies as function of halo mass. We obtain a universal fitting formula that describes the (median) SFH of central galaxies as function of halo mass, galaxy stellar mass and redshift. We use this model to make predictions for various aspects of the star formation rates of central galaxies across cosmic time. Our main findings are the following. (1) The specific star formation rate (SSFR) at high $z$ increases rapidly with increasing redshift [$\propto (1+z)^{2.5}$] for halos of a given mass and only slowly with halo mass ($\propto M_h^{0.12}$) at a given $z$, in almost perfect agreement with the specific mass accretion rate of dark matter halos. (2) The ratio between the star formation rate (SFR) in the main-branch progenitor and the final stellar mass of a galaxy peaks roughly at a constant value, $\sim 10^{-9.3} h^2 {\rm yr}^{-1}$, independent of halo mass or the final stellar mass of the galaxy. However, the redshift at which the SFR peaks increases rapidly with halo mass. (3) More than half of the stars in the present-day Universe were formed in halos with $10^{11.1}\msunh < M_h < 10^{12.3}\msunh$ in the redshift range $0.4 < z < 1.9$. (4) ... [abridged]

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Computer Vision astro-ph.CO astro-ph.GA Artificial Intelligence Computation Computation and Language Machine Learning Methodology Programming Languages Robotics Software Engineering

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2605.04515:author:3:shijie-li

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.11616:author:5:shijie-li

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.13335:author:6:shijie-li

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.18661:author:15:shijie-li

Imported May 20, 2026Synced May 20, 2026

5 works

Xiaohu Yang

Researcher

Xiaohu Yang contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

H. J. Mo

Researcher

H. J. Mo contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Huiyuan Wang

Researcher

Huiyuan Wang contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Yi Lu

Researcher

Yi Lu contributes to research discovery and scholarly infrastructure.

Open to collaborate

Shijie Li

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

AI for Auto-Research: Roadmap & User Guide

Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering

Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

From Priors to Perception: Grounding Video-LLMs in Physical Reality

Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

Improved Counting and Localization from Density Maps for Object Detection in 2D and 3D Microscopy Imaging

Toward An Optimal Selection of Dialogue Strategies: A Target-Driven Approach for Intelligent Outbound Robots

Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition

MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation

An empirical model to form and evolve galaxies in dark matter halos

ELUCID - Exploring the Local Universe with reConstructed Initial Density field III: Constrained Simulation in the SDSS Volume

Galaxy groups in the 2MASS Redshift Survey

Mapping the real space distributions of galaxies in SDSS DR7: I. Two Point Correlation Functions

Robust Orthogonal Complement Principal Component Analysis

Joint Association Graph Screening and Decomposition for Large-scale Linear Dynamical Systems

Constraining the Star Formation Histories in Dark Matter Halos: I. Central Galaxies