Source author record

Yue Pan

Yue Pan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Robotics, physics.optics

Catalog footprint

What is connected

6 works

3 topics

4 close collaborators

preprint, 2026, arXiv

EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

Despite extensive efforts on egocentric video datasets and benchmarks, understanding users' internal states, which is crucial for enabling seamless AI assistant experiences, remains largely overlooked. In this work, we introduce EgoIntrospect, the first egocentric dataset captured in user-driven scenarios with self-annotations that explicitly reveal users' interactive intentions with AI assistants. EgoIntrospect was collected using a cross-device setup, providing synchronized video, audio, gaze, motion, and physiological signals. It consists of 180 hours of recordings from 60 subjects, with an average recording duration of 3 hours per subject. Leveraging EgoIntrospect, we formalize a suite of tasks centered on user internal states, including affective experience, interactive intent, and cognitive memory. We further process the annotations to construct benchmarks that evaluate the ability of modern multimodal large language models to reason about users' internal states from egocentric observations. Experiments on our benchmark suggest that existing multimodal large language models struggle to effectively leverage multimodal signals to infer users' subjective internal states. The dataset and annotations will be made publicly available to advance research in egocentric vision and wearable AI assistants. Project page: https://ego-introspect.github.io/

preprint, 2026, arXiv

VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes

3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGBD SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation in large-scale and long-sequence scenes. This allows us to scale up to arbitrary scenes and improves robustness (even under pose drift). In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes. Furthermore, we design a 2D-3D Gaussian loop closure method to eliminate pose drift. We further propose a submap fusion method with online distillation to achieve global consistency in large-scale scenes when detecting a loop. Experiments on various indoor and outdoor datasets demonstrate the superiority and generalizability of the proposed framework. The code will be open-sourced at https://github.com/dtc111111/vpgs-slam.

preprint, 2026, arXiv

What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models

In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance functions (SDF), and scene graphs, as well as more recent neural representations like Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and the emerging Foundation Models. While current SLAM and localization systems predominantly rely on sparse representations like point clouds and voxels, dense scene representations are expected to play a critical role in downstream tasks such as navigation and obstacle avoidance. Moreover, neural representations such as NeRF, 3DGS, and foundation models are well-suited for integrating high-level semantic features and language-based priors, enabling more comprehensive 3D scene understanding and embodied intelligence. In this paper, we categorize the core modules of robotics into five parts (Perception, Mapping, Localization, Navigation, Manipulation). We start by presenting the standard formulation of different scene representation methods and comparing the advantages and disadvantages of scene representation across different modules. This survey is centered around the question: What is the best 3D scene representation for robotics? We then discuss the future development trends of 3D scene representations, with a particular focus on how the 3D Foundation Model could replace current methods as the unified solution for future robotic applications. The remaining challenges in fully realizing this model are also explored. We aim to offer a valuable resource for both newcomers and experienced researchers to explore the future of 3D scene representations and their application in robotics. We have published an open-source project on GitHub and will continue to add new works and technologies to this project.

preprint, 2020, arXiv

Remote sensing image fusion based on Bayesian GAN

Remote sensing image fusion technology (pan-sharpening) is an important means to improve the information capacity of remote sensing images. Inspired by the efficient parameter space posterior sampling of Bayesian neural networks, in this paper we propose a Bayesian Generative Adversarial Network based on Preconditioned Stochastic Gradient Langevin Dynamics (PGSLD-BGAN) to improve pan-sharpening tasks. Unlike many traditional generative models that consider only one optimal solution (which might be only locally optimal), the proposed PGSLD-BGAN performs Bayesian inference on the network parameters and explores the generator posterior distribution, which assists in selecting the appropriate generator parameters. First, we build a two-stream generator network with PAN and MS images as input, which consists of three parts: feature extraction, feature fusion and image reconstruction. Then, we leverage a Markov discriminator to enhance the ability of the generator to reconstruct the fusion image, so that the resulting image can retain more details. Finally, introducing the Preconditioned Stochastic Gradient Langevin Dynamics policy, we perform Bayesian inference on the generator network. Experiments on QuickBird and WorldView datasets show that the model proposed in this paper can effectively fuse PAN and MS images, and is competitive with, or even superior to, state-of-the-art methods in terms of subjective and objective metrics.

preprint, 2019, arXiv

Target-less registration of point clouds: A review

Point cloud registration is one of the basic steps of point cloud processing and has many applications in remote sensing and robotics. In this report, we summarize the basic workflow of target-less point cloud registration, namely correspondence determination and transformation estimation. We then review three commonly used groups of registration approaches, namely feature matching based methods, the iterative closest point algorithm, and random hypothesize-and-verify based methods. In addition, we analyze the advantages and disadvantages of these methods and introduce their common application scenarios. Finally, we discuss the challenges of current point cloud registration methods and propose several open questions for the future development of automatic registration approaches.

preprint, 2015, arXiv

Arbitrary orbital angular momentum of photons

Orbital angular momentum (OAM) of photons, as a new fundamental degree of freedom, has excited a great diversity of interest because of a variety of emerging applications. Arbitrarily tunable OAM has gained much attention, but its creation remains a tremendous challenge. We demonstrate the realization of well-controlled arbitrary OAM in both theory and experiment. We present the concept of general OAM, which extends the OAM carried by the scalar vortex field to the OAM carried by the azimuthally varying polarized vector field. The arbitrary OAM has the same characteristics as the well-defined integer OAM: intrinsic OAM, uniform local OAM and intensity ring, and propagation stability. The arbitrary OAM also has unique properties: it can be flexibly tailored, and the radius of the focusing ring can take various values for a desired OAM, which are of great significance for enabling surprising applications of the arbitrary OAM.
