Researcher profile

Kai Lu

Kai Lu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
13 topics
4 close collaborators

Published work

6 published items

Preprint, 2026, arXiv

CycleVLA: Proactive Self-Correcting Vision-Language-Action Models via Subtask Backtracking and Minimum Bayes Risk Decoding

Current work on robot failure detection and correction typically operates in a post hoc manner, analyzing errors and applying corrections only after failures occur. This work introduces CycleVLA, a system that equips Vision-Language-Action models (VLAs) with proactive self-correction, the capability to anticipate incipient failures and recover before they fully manifest during execution. CycleVLA achieves this by integrating a progress-aware VLA that flags critical subtask transition points where failures most frequently occur, a VLM-based failure predictor and planner that triggers subtask backtracking upon predicted failure, and a test-time scaling strategy based on Minimum Bayes Risk (MBR) decoding to improve retry success after backtracking. Extensive experiments show that CycleVLA improves performance for both well-trained and under-trained VLAs, and that MBR serves as an effective zero-shot test-time scaling strategy for VLAs. Project Page: https://dannymcy.github.io/cyclevla/
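To illustrate the general idea behind MBR decoding mentioned in the abstract above, here is a minimal sketch. It is not the CycleVLA implementation: the abstract does not specify the risk function, so Euclidean distance between candidate action vectors is an assumption, and `mbr_select` and the sample data are hypothetical names chosen for illustration. The sketch picks the sampled candidate whose average distance to all other candidates is lowest, that is, the candidate that agrees most with the rest of the sample set.

```python
import math


def mbr_select(candidates):
    """Return the index of the candidate with the lowest expected risk.

    Risk is estimated as the mean Euclidean distance to every other
    candidate (an assumed risk function, chosen only for illustration).
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best_idx, best_risk = -1, float("inf")
    n_others = max(len(candidates) - 1, 1)
    for i, cand in enumerate(candidates):
        # Average distance from this candidate to all the others.
        risk = sum(
            dist(cand, other)
            for j, other in enumerate(candidates)
            if j != i
        ) / n_others
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return best_idx


# Hypothetical sampled actions: three similar candidates and one outlier.
samples = [(0.10, 0.20), (0.12, 0.18), (0.11, 0.21), (0.90, -0.50)]
chosen = mbr_select(samples)
```

With this risk function the outlier at index 3 is never selected, because its average distance to the clustered candidates is much larger than theirs to each other.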

Preprint, 2026, arXiv

WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization

Recently, video language models (VLMs) have been applied in various fields. However, the visual token sequence of a VLM is very long, which can cause intolerable inference latency and GPU memory usage. Existing methods apply mixed-precision quantization to the key-value (KV) cache in VLMs at token granularity, which makes the search process time-consuming and the computation hardware-inefficient. This paper introduces a novel approach called WindowQuant, which employs window-adaptive mixed-precision quantization to optimize the KV cache. WindowQuant consists of two modules: window-level quantization search and window-level KV cache computation. Window-level quantization search quickly determines the optimal bit-width configuration of the KV cache windows based on the similarity scores between the corresponding visual token windows and the text prompt, maintaining the model accuracy. Furthermore, window-level KV cache computation reorders the KV cache windows before quantization, avoiding the hardware inefficiency caused by mixed-precision quantization in inference computation. Extensive experiments demonstrate that WindowQuant outperforms state-of-the-art VLM models and KV cache quantization methods on various datasets.

Preprint, 2022, arXiv

Design and control analysis of a deployable clustered hyperbolic paraboloid cable net

This paper presents an analytical and experimental design and deployment control analysis of a hyperbolic paraboloid cable net based on clustering actuation strategies. First, the dynamics and statics for clustered tensegrity structures (CTS) are given. Then, we propose the topology design of the deployable hyperbolic paraboloid cable net. The deployability of the cable net is achieved by using clustered cables. It is shown that the clustered cables significantly reduce the number of actuators required for control. The deployment trajectory and actuation prestress in the cables are designed to ensure the tensions are feasible during the deployment process. Then, we compare open-loop and closed-loop control strategies for the deployment. Finally, a lab-scale model is constructed to validate the actuation laws. We test the static performance and deployment process of the experimental model. Results show that the closed-loop control approach is more stable and smoother than the open-loop one in the deployment process. The approaches developed in this paper can also be used for various deployable tensegrity structures.

Preprint, 2022, arXiv

Hadro-chemistry effects on charm decayed leptons in heavy-ion collisions

Charm-hadrons possess versatile hadro-chemistry as characterized by various transverse-momentum-dependent ratios between their different species. In particular, the charm hadro-chemistry may be modified in relativistic heavy-ion collisions with respect to proton-proton collisions at the same energy, as caused by novel diffusion and hadronization mechanisms of charm quarks in the environment of the created quark-gluon plasma (QGP) in the former. Inspired by recent measurements of leptons from charm-hadron decays (separated from bottom decays) in Pb-Pb and Au-Au collisions, we investigate the effects of the charm hadro-chemistry on the leptonic observables. We find that full consideration of charm hadro-chemistry in both proton-proton and heavy-ion collisions causes only a mild change in the charm-leptons' suppression factor with respect to previous calculations hadronizing charm quarks into $D$ mesons only, whereas the resulting change (increase) in the charm-leptons' elliptic flow turns out to be more pronounced as a consequence of the larger collectivity of $\Lambda_c$ baryons than of $D$ mesons.

Preprint, 2022, arXiv

Large-scale full-programmable quantum walk and its applications

With photonics, the quantum computational advantage has been demonstrated on the task of boson sampling. Next, developing quantum-enhanced approaches for practical problems becomes one of the top priorities for photonic systems. Quantum walks are powerful kernels for developing new and useful quantum algorithms. Here we realize large-scale quantum walks using a fully programmable photonic quantum computing system. The system integrates a silicon quantum photonic chip, enabling the simulation of quantum walk dynamics on graphs with up to 400 vertices and possessing full programmability over quantum walk parameters, including the particle property, initial state, graph structure, and evolution time. In the 400-dimensional Hilbert space, the average fidelity of random entangled quantum states after the whole on-chip circuit evolution reaches as high as $94.29 \pm 1.28\%$. With the system, we demonstrated exponentially faster hitting and quadratically faster mixing performance of quantum walks over classical random walks, achieving more than two orders of magnitude of enhancement in the experimental hitting efficiency and a reduction of almost half in the experimental evolution time for mixing. We utilize the system to implement a series of quantum applications, including measuring the centrality of scale-free networks, searching targets on Erdős-Rényi networks, distinguishing non-isomorphic graph pairs, and simulating the topological phase of higher-order topological insulators. Our work shows one feasible path for quantum photonics to address applications of practical interest in the near future.

Preprint, 2022, arXiv

Tree-based Search Graph for Approximate Nearest Neighbor Search

Nearest neighbor search supports important applications in many domains, such as databases, machine learning, and computer vision. Since the computational cost of exact search is too high, the community has turned to research on approximate nearest neighbor search (ANNS). Among ANNS methods, graph-based algorithms are one of the most important branches. Research by Fu et al. shows that the algorithms based on Monotonic Search Network (MSNET), such as NSG and NSSG, have achieved state-of-the-art search efficiency. The MSNET is dedicated to achieving monotonic search with minimal out-degree of nodes to pursue high efficiency. However, current MSNET designs do not optimize the probability of the monotonic search, and the lower bound of the probability is only 50%. If they fail in the monotonic search stage, they suffer a tremendous backtracking cost to achieve the required accuracy, which hurts search efficiency. To address this problem, we propose (r,p)-MSNET, which achieves a guaranteed probability of monotonic search. Due to the high building complexity of a strict (r,p)-MSNET, we propose TBSG, which is an approximation with low complexity. Experiments conducted on four million-scale datasets show that TBSG outperforms existing state-of-the-art graph-based algorithms in search efficiency. Our code has been released on GitHub.
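The notion of monotonic search in the abstract above can be pictured with a generic greedy walk over a proximity graph. The sketch below is not TBSG, NSG, or (r,p)-MSNET; it is a minimal illustration assuming a hand-built adjacency list, and `greedy_search`, `points`, and `graph` are hypothetical names. At each step the walk moves to the neighbor closest to the query and stops when no neighbor is closer, so every hop strictly reduces the distance to the query.

```python
def greedy_search(graph, points, query, start):
    """Greedy walk over a proximity graph toward the query.

    Returns the final node and the visited path. Each hop strictly
    decreases the distance to the query, so the path is monotonic.
    """
    def d2(a, b):
        # Squared Euclidean distance (sufficient for comparisons).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    current = start
    path = [current]
    while True:
        # Pick the neighbor closest to the query (stay put if isolated).
        best = min(
            graph[current],
            key=lambda n: d2(points[n], query),
            default=current,
        )
        if d2(points[best], query) >= d2(points[current], query):
            # No neighbor improves on the current node: local optimum.
            return current, path
        current = best
        path.append(current)


# Hypothetical toy data: four points on a line, linked as a chain.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nearest, path = greedy_search(graph, points, (2.9, 0.0), start=0)
```

On a graph where the walk gets stuck in a local optimum that is not the true nearest neighbor, practical systems fall back on backtracking through a candidate queue, which is the cost the abstract refers to.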