Source author record

Bin Li

Bin Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Computation and Language Machine Learning cond-mat.mtrl-sci eess.SP eess.SY physics.app-ph Systems and Control

Catalog footprint

What is connected

10works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

An Efficient Streaming Video Understanding Framework with Agentic Control

Streaming video requires handling dynamic information density under strict latency budgets. Yet, existing methods typically employ static strategies, such as fixed memory compression or reliance on a single model, forcing a trade-off: fast models fail on complex queries, while always-on heavy models violate real-time constraints and overcomplicate simple queries. Rather than fixing these decisions upfront, we propose R3-Streaming (Remember, Respond, Reason), which formulates streaming video understanding as a cascaded control problem: for each query, the system compresses memory, judges response readiness, and routes computation sequentially, so that each downstream decision builds on progressively refined information states. To optimize this pipeline, we introduce an age-aware forgetting policy for memory compression, as aggressively compressing historical frames can yield substantial performance gains. For compute routing, we propose TB-GRPO, a target-balanced reinforcement learning objective that routes hard queries to a stronger model while preventing mode collapse. Extensive evaluations demonstrate that R3-Streaming achieves state-of-the-art results among streaming MLLMs, reaching 57.92 on OVO-Bench and 76.36 on StreamingBench, while reducing visual token usage by 95 to 96 percent.

preprint2026arXiv

Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering

Parameter-Preserving Knowledge Editing (PPKE) enables updating models with new information without retraining or parameter adjustment. Recent PPKE approaches used knowledge graphs (KG) to extend knowledge editing (KE) capabilities to multi-hop question answering (MHQA). However, these methods often lack consistency, leading to knowledge contamination, unstable updates, and retrieval behaviors that are misaligned with the intended edits. Such inconsistencies undermine the reliability of PPKE in multi-hop reasoning. We present CAPE-KG, Consistency-Aware Parameter-Preserving Editing with Knowledge Graphs, a novel consistency-aware framework for PPKE on MHQA. CAPE-KG ensures KG construction, update, and retrieval are always aligned with the requirements of the MHQA task, maintaining coherent reasoning over both unedited and edited knowledge. Extensive experiments on the MQuAKE benchmark show accuracy improvements in PPKE performance for MHQA, demonstrating the effectiveness of addressing consistency in PPKE.

preprint2026arXiv

Distributionally Robust Game for Proof-of-Work Blockchain Mining Under Resource Uncertainties

Blockchain plays a crucial role in ensuring the security and integrity of decentralized systems, with the proof-of-work (PoW) mechanism being fundamental for achieving distributed consensus. As PoW blockchains see broader adoption, an increasingly diverse set of miners with varying computing capabilities participate in the network. In this paper, we consider the PoW blockchain mining, where the miners are associated with resource uncertainties. To characterize the uncertainty computing resources at different mining participants, we establish an ambiguous set representing uncertainty of resource distributions. Then, the networked mining is formulated as a non-cooperative game, where distributionally robust performance is calculated for each individual miner to tackle the resource uncertainties. We prove the existence of the equilibrium of the distributionally robust mining game. To derive the equilibrium, we propose the conditional value-at-risk (CVaR)-based reinterpretation of the best response of each miner. We then solve the individual strategy with alternating optimization, which facilitates the iteration among miners towards the game equilibrium. Furthermore, we consider the case that the ambiguity of resource distribution reduces to Gaussian distribution and the case that another uncertainties vanish, and then characterize the properties of the equilibrium therein along with a distributed algorithm to achieve the equilibrium. Simulation results show that the proposed approaches effectively converge to the equilibrium, and effectively tackle the uncertainties in blockchain mining to achieve a robust performance guarantee.

preprint2026arXiv

Electric field switching of altermagnetic spin-splitting in multiferroic skyrmions

Magnetic skyrmions are localized magnetic structures that retain their shape and stability over time, thanks to their topological nature. Recent theoretical and experimental progress has laid the groundwork for understanding magnetic skyrmions characterized by negligible net magnetization and ultrafast dynamics. Notably, skyrmions emerging in materials with altermagnetism, a novel magnetic phase featuring lifted Kramers degeneracy-have remained unreported until now. In this study, we demonstrate that BiFeO3, a multiferroic renowned for its strong coupling between ferroelectricity and magnetism, can transit from a spin cycloid to a Neel-type skyrmion under antidamping spin-orbit torque at room temperature. Strikingly, the altermagnetic spin splitting within BiFeO3 skyrmion can be reversed through the application of an electric field, revealed via the Circular photogalvanic effect. This quasiparticle, which possesses a neutral topological charge, holds substantial promise for diverse applications-most notably, enabling the development of unconventional computing systems with low power consumption and magnetoelectric controllability.

preprint2026arXiv

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training

GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present InfiniteWeb, a system that automatically generates functional web environments at scale for GUI agent training. While LLMs perform well on generating a single webpage, building a realistic and functional website with many interconnected pages faces challenges. We address these challenges through unified specification, task-centric test-driven development, and a combination of website seed with reference design image to ensure diversity. Our system also generates verifiable task evaluators enabling dense reward signals for reinforcement learning. Experiments show that InfiniteWeb surpasses commercial coding agents at realistic website construction, and GUI agents trained on our generated environments achieve significant performance improvements on OSWorld and Online-Mind2Web, demonstrating the effectiveness of proposed system.

preprint2026arXiv

On the Robustness of Age for Learning-Based Wireless Scheduling in Unknown Environments

The constrained combinatorial multi-armed bandit model has been widely employed to solve problems in wireless networking and related areas, including the problem of wireless scheduling for throughput optimization under unknown channel conditions. Most work in this area uses an algorithm design strategy that combines a bandit learning algorithm with the virtual queue technique to track the throughput constraint violation. These algorithms seek to minimize the virtual queue length in their algorithm design. However, in networks where channel conditions change abruptly, the resulting constraints may become infeasible, leading to unbounded growth in virtual queue lengths. In this paper, we make the key observation that the dynamics of the head-of-line age, i.e. the age of the oldest packet in the virtual queue, make it more robust when used in algorithm design compared to the virtual queue length. We therefore design a learning-based scheduling policy that uses the head-of-line age in place of the virtual queue length. We show that our policy matches state-of-the-art performance under i.i.d. network conditions. Crucially, we also show that the system remains stable even under abrupt changes in channel conditions and can rapidly recover from periods of constraint infeasibility.

preprint2026arXiv

Service Provisioning and Path Planning with Obstacle Avoidance for Low-Altitude Wireless Networks

This paper investigates the three-dimensional (3D) deployment of uncrewed aerial vehicles (UAVs) as aerial base stations in heterogeneous communication networks under constraints imposed by diverse ground obstacles. Given the diverse data demands of user equipments (UEs), a user satisfaction model is developed to provide personalized services. In particular, when a UE is located within a ground obstacle, the UAV must approach the obstacle boundary to ensure reliable service quality. Considering constraints such as UAV failures due to battery depletion, heterogeneous UEs, and obstacles, we aim to maximize overall user satisfaction by jointly optimizing the 3D trajectories of UAVs, transmit beamforming vectors, and binary association indicators between UAVs and UEs. To address the complexity and dynamics of the problem, a block coordinate descent method is adopted to decompose it into two subproblems. The beamforming subproblem is efficiently addressed via a bisection-based water-filling algorithm. For the trajectory and association subproblem, we design a deep reinforcement learning algorithm based on proximal policy optimization to learn an adaptive control policy. Simulation results demonstrate that the proposed scheme outperforms baseline schemes in terms of convergence speed and overall system performance. Moreover, it achieves efficient association and accurate obstacle avoidance.

preprint2026arXiv

TimeGMM: Single-Pass Probabilistic Forecasting via Adaptive Gaussian Mixture Models with Reversible Normalization

Probabilistic time series forecasting is crucial for quantifying future uncertainty, with significant applications in fields such as energy and finance. However, existing methods often rely on computationally expensive sampling or restrictive parametric assumptions to characterize future distributions, which limits predictive performance and introduces distributional mismatch. To address these challenges, this paper presents TimeGMM, a novel probabilistic forecasting framework based on Gaussian Mixture Models (GMM) that captures complex future distributions in a single forward pass. A key component is GMM-adapted Reversible Instance Normalization (GRIN), a novel module designed to dynamically adapt to temporal-probabilistic distribution shifts. The framework integrates a dedicated Temporal Encoder (TE-Module) with a Conditional Temporal-Probabilistic Decoder (CTPD-Module) to jointly capture temporal dependencies and mixture distribution parameters. Extensive experiments demonstrate that TimeGMM consistently outperforms state-of-the-art methods, achieving maximum improvements of 22.48\% in CRPS and 21.23\% in NMAE.

preprint2026arXiv

Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification

With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially partition the noise-filtered tracklets into "sub-tracklets". Then, we cluster and further merge sub-tracklets using the self-supervised signal from the tracklet partition, which is enhanced through a progressive strategy to generate reliable pseudo labels, facilitating intra-class cross-tracklet aggregation. Moreover, we propose the Class Smoothing Classification (CSC) loss to efficiently promote model learning. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate that our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods. The code is available at https://github.com/Darylmeng/SSRC-Reid.

preprint2026arXiv

VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network

The rapid advances in deep learning have significantly enhanced the accuracy of multimodal 3D human pose estimation (HPE). However, the state-of-the-art (SOTA) HPE pipelines still rely on Transformers, whose quadratic complexity makes real-time processing for long sequences impractical. Mamba addresses this issue through selective state-space modeling, enabling efficient sequence processing without sacrificing representational power. Nevertheless, it struggles to capture complex spatial dependencies in multimodal settings. To bridge this gap, we propose VIMCAN, a hybrid architecture that combines the efficient sequence modeling of Mamba with the spatial reasoning of Cross-Attention, and performs robust visual-inertial fusion and human pose estimation between RGB keypoints and wearable IMU data. By leveraging Mamba's dynamic parameterization for temporal modeling and Attention for spatial dependency extraction, VIMCAN achieves superior accuracy, with mean per-joint position errors (MPJPE) of 17.2 mm on TotalCapture and 45.3 mm on 3DPW. VIMCAN outperforms prior Transformer-based and other SOTA approaches while supporting real-time inference at over 60 frames per second on consumer-grade hardware. The source code is available at \href{https://github.com/Eddieyzp/VIMCAN}{this GitHub repository}.

Bin Li

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

An Efficient Streaming Video Understanding Framework with Agentic Control

Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering

Distributionally Robust Game for Proof-of-Work Blockchain Mining Under Resource Uncertainties

Electric field switching of altermagnetic spin-splitting in multiferroic skyrmions

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training

On the Robustness of Age for Learning-Based Wireless Scheduling in Unknown Environments

Service Provisioning and Path Planning with Obstacle Avoidance for Low-Altitude Wireless Networks

TimeGMM: Single-Pass Probabilistic Forecasting via Adaptive Gaussian Mixture Models with Reversible Normalization

Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification

VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network