Source author record

Yibo Liu

Yibo Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, Computer Vision, cond-mat.quant-gas, eess.IV, Graphics, Neural and Evolutionary Computing, quant-ph, Robotics

Catalog footprint

What is connected

5 works

9 topics

4 close collaborators

preprint, 2026, arXiv

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks of collaborative effort from multidisciplinary teams to produce minutes of polished content. In this work, we present Cutscene Agent, an LLM agent framework for automated end-to-end cutscene generation. The framework makes three contributions: (1) a Cutscene Toolkit built on the Model Context Protocol (MCP) that establishes bidirectional integration between LLM agents and the game engine: agents not only invoke engine operations but also continuously observe real-time scene state, enabling closed-loop generation of editable engine-native cinematic assets; (2) a multi-agent system where a director agent orchestrates specialist subagents for animation, cinematography, and sound design, augmented by a visual reasoning feedback loop for perception-driven refinement; and (3) CutsceneBench, a hierarchical evaluation benchmark for cutscene generation. Unlike typical tool-use benchmarks that evaluate short, isolated function calls, cutscene generation requires long-horizon, multi-step orchestration of dozens of interdependent tool invocations with strict ordering constraints, a capability dimension that existing benchmarks do not cover. We evaluate a range of LLMs on CutsceneBench and analyze their performance on this challenging task.

preprint, 2022, arXiv

Endowing Language Models with Multimodal Knowledge Graph Representations

We propose a method to make natural language understanding models more parameter efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the recently released VisualSem KG as our external knowledge repository, which covers a subset of Wikipedia and WordNet entities, and compare a mix of tuple-based and graph-based algorithms to learn entity and relation representations that are grounded on the KG multimodal information. We demonstrate the usefulness of the learned entity representations on two downstream tasks, and show improved performance on the multilingual named entity recognition task by 0.3% to 0.7% F1, while we achieve up to 2.5% improvement in accuracy on the visual sense disambiguation task. All our code and data are available at: https://github.com/iacercalixto/visualsem-kg.
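The retrieval step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a brute-force cosine-similarity search over a small in-memory matrix; the entity names, the toy vectors, and the function `retrieve_entities` are hypothetical and are not taken from the VisualSem code base, which may use a different index and similarity measure.

```python
import numpy as np

def retrieve_entities(query_vec, entity_matrix, entity_ids, k=2):
    """Return the k (entity_id, score) pairs most cosine-similar to the query."""
    # Normalise the query and every entity embedding to unit length.
    q = query_vec / np.linalg.norm(query_vec)
    e = entity_matrix / np.linalg.norm(entity_matrix, axis=1, keepdims=True)
    # Cosine similarity is the dot product of unit vectors.
    scores = e @ q
    # Indices of the k highest scores, best first.
    top = np.argsort(-scores)[:k]
    return [(entity_ids[i], float(scores[i])) for i in top]

# Hypothetical toy "knowledge graph" with three entity embeddings.
entity_ids = ["dog", "cat", "car"]
entity_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical sentence embedding used as the query.
query = np.array([1.0, 0.0, 0.0])
results = retrieve_entities(query, entity_matrix, entity_ids, k=2)
print(results)
```

In a setup like the one the abstract describes, the retrieved entities' representations would then be passed to the downstream model; that step is omitted here.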

preprint, 2022, arXiv

Floquet analysis of extended Rabi models based on high-frequency expansion

The extended quantum Rabi models contribute significantly to understanding the quantum nature of the atom-light interaction. We transform two kinds of extended quantum Rabi model, the anisotropic Rabi model and the asymmetric Rabi model, into a rotating frame, and regard them as periodically driven quantum systems. The analytical solutions of the quasi-energy spectrum as well as the Floquet modes for both models are constructed by applying Floquet theory and the high-frequency expansion, and are applied to the non-stroboscopic dynamics of physical observables such as atomic inversion, transverse magnetization, atom-field correlation, etc. For the anisotropic Rabi model, the quasi-energy fits well with the numerical results even when the rotating-wave coupling is in the deep-strong coupling regime $g\simeq\hbar\omega$, provided the counter-rotating terms are small enough compared to the driving frequency. Avoided level crossings may occur for quasi-energies with the same parity when the positive-branch spectrum lines for the total excitation number $N$ cross the negative-branch lines for $N+2$, while the high-frequency expansion fails to predict this due to the conservation of the total excitation number. Furthermore, we present an analytical and numerical study of the long-time evolution of the population and find that the analytical method is reliable for the population dynamics. For the asymmetric Rabi model, we find that the external bias field, which breaks the parity symmetry of the total excitation number, tends to cluster the upper and lower branches into two bundles, and the detuning-induced gap in the first temporal Brillouin zone shows a quadratic dependence on the bias. Both models show that treating the Hamiltonian in the rotating frame with Floquet theory provides an alternative tool for studying the interaction between atom and light.
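For orientation, the high-frequency expansion named in this abstract has a standard generic form. The block below states only that textbook form, assuming a time-periodic Hamiltonian with Fourier components $H_m$ and driving frequency $\omega$; it is not the specific effective Hamiltonian derived in the paper, and the symbols $H_m$ and $H_{\mathrm{eff}}$ are notation introduced here.

```latex
% Fourier decomposition of a Hamiltonian with period T = 2\pi/\omega
H(t) = \sum_{m=-\infty}^{\infty} H_m \, e^{i m \omega t}

% Effective (Floquet) Hamiltonian to first order in 1/\omega
H_{\mathrm{eff}} = H_0
  + \frac{1}{\hbar\omega} \sum_{m=1}^{\infty} \frac{1}{m} \left[ H_m , H_{-m} \right]
  + \mathcal{O}\!\left( \frac{1}{\omega^{2}} \right)
```

The quasi-energies are the eigenvalues of $H_{\mathrm{eff}}$, defined modulo $\hbar\omega$, which is why the spectrum can be folded into a temporal Brillouin zone as the abstract mentions.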

preprint, 2022, arXiv

Intensity Image-based LiDAR Fiducial Marker System

Fiducial marker systems for LiDAR are crucial for robotic applications but remain rare to date. In this paper, an Intensity Image-based LiDAR Fiducial Marker (IILFM) system is developed. This system only requires an unstructured point cloud with intensity as the input and places no restriction on marker placement or shape. A marker detection method that locates the predefined 3D fiducials in the point cloud through the intensity image is introduced. Then, an approach is developed that uses the detected 3D fiducials to estimate the LiDAR 6-DOF pose, which describes the transformation from the world coordinate system to the LiDAR coordinate system. Moreover, all these processes run in real time (approximately 40 Hz on Livox Mid-40 and approximately 143 Hz on VLP-16). Qualitative and quantitative experiments are conducted to demonstrate that the proposed system offers convenience and accuracy similar to those of conventional visual fiducial marker systems. The code and results are available at: https://github.com/York-SDCNLab/IILFM.
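One common way to recover a 6-DOF pose from matched 3D points is a least-squares rigid alignment (the Kabsch/SVD method). The sketch below illustrates that generic technique under the assumption that the marker corners are known in the world frame and have been detected in the LiDAR frame; it is not claimed to be the solver used by IILFM, and `estimate_pose` and the toy coordinates are hypothetical.

```python
import numpy as np

def estimate_pose(world_pts, lidar_pts):
    """Least-squares rigid transform (R, t) such that lidar ≈ R @ world + t."""
    # Centre both point sets on their centroids.
    cw = world_pts.mean(axis=0)
    cl = lidar_pts.mean(axis=0)
    # 3x3 cross-covariance between the centred sets.
    H = (world_pts - cw).T @ (lidar_pts - cl)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cl - R @ cw
    return R, t

# Hypothetical fiducial points in the world frame (metres).
world = np.array([
    [0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.2, 0.2, 0.0],
    [0.0, 0.2, 0.5],
])

# Ground-truth pose used to synthesise the "detected" LiDAR-frame points:
# a 30 degree rotation about z plus a translation.
theta = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([1.0, -0.5, 0.3])
lidar = world @ R_true.T + t_true

R_est, t_est = estimate_pose(world, lidar)
print(np.round(R_est, 3), np.round(t_est, 3))
```

With noise-free correspondences the estimate matches the synthetic pose up to floating-point error; real detections would be noisy and might call for outlier rejection, which is omitted here.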

preprint, 2022, arXiv

Multi-view Point Cloud Registration based on Evolutionary Multitasking with Bi-Channel Knowledge Sharing Mechanism

Multi-view point cloud registration is fundamental in 3D reconstruction. Since there are close connections between point clouds captured from different viewpoints, registration performance can be enhanced if these connections are harnessed properly. Therefore, this paper models the registration problem as multi-task optimization, and proposes a novel bi-channel knowledge sharing mechanism for effective and efficient problem solving. The modeling of multi-view point cloud registration as multi-task optimization is twofold. First, by simultaneously considering the local accuracy of two point clouds as well as the global consistency imposed by all the point clouds involved, a fitness function with an adaptive threshold is derived. Second, a framework for the co-evolutionary search process is defined for the concurrent optimization of multiple fitness functions belonging to related tasks. The proposed bi-channel knowledge sharing mechanism is used to enhance solution quality and convergence speed. The intra-task knowledge sharing introduces aiding tasks that are much simpler to solve, and useful information is shared between the aiding tasks and the original tasks, accelerating the search process. The inter-task knowledge sharing explores commonalities among the original tasks, aiming to prevent tasks from getting stuck in local optima. Comprehensive experiments conducted on model object as well as scene point clouds show the efficacy of the proposed method.