Source author record

Bin Liu

Bin Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Artificial Intelligence, Human-Computer Interaction, Machine Learning, astro-ph.CO, Information Retrieval, quant-ph, Robotics

Catalog footprint

What is connected

10 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method

High-quality point cloud data is a critical foundation for tasks such as autonomous driving and 3D reconstruction. However, LiDAR-based point cloud acquisition is often affected by various disturbances, resulting in a large number of noise points that degrade the accuracy of subsequent point cloud object detection and recognition. Moreover, existing point cloud denoising methods typically sacrifice computational efficiency in pursuit of higher denoising accuracy, or, conversely, improve processing speed at the expense of preserving object boundaries and fine structural details, making it difficult to simultaneously achieve high denoising accuracy, strong edge preservation, and real-time performance. To address these limitations, this paper proposes an adaptive dual-weighted gravitational point cloud denoising method. First, an octree is employed to perform spatial partitioning of the global point cloud, enabling parallel acceleration. Then, within each leaf node, adaptive voxel-based occupancy statistics and k-nearest neighbor (kNN) density estimation are applied to rapidly remove clearly isolated and low-density noise points, thereby reducing the effective candidate set. Finally, a gravitational scoring function that combines density weights with adaptive distance weights is constructed to finely distinguish noise points from object points. Experiments conducted on the Stanford 3D Scanning Repository, the Canadian Adverse Driving Conditions (CADC) dataset, and in-house RUBY PLUS LiDAR point clouds acquired in our laboratory demonstrate that, compared with existing methods, the proposed approach achieves consistent improvements in F1, PSNR, and Chamfer Distance (CD) across various noise conditions while reducing the single-frame processing time, thereby validating its high accuracy, robustness, and real-time performance in multi-noise scenarios.
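The kNN density step in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain statistical outlier removal (mean distance to the k nearest neighbours, thresholded at mean plus `alpha` standard deviations), and the function names, `k`, and `alpha` are all hypothetical choices made for illustration. The brute-force neighbour search is quadratic, so it only suits small inputs.

```python
import math
import statistics


def knn_mean_distances(points, k):
    """Mean distance from each point to its k nearest neighbours (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(sum(dists[:k]) / k)
    return result


def remove_low_density(points, k=3, alpha=1.0):
    """Keep points whose kNN mean distance is at most mean + alpha * stdev.

    Assumption: a large mean kNN distance stands in for "low density".
    """
    scores = knn_mean_distances(points, k)
    threshold = statistics.mean(scores) + alpha * statistics.pstdev(scores)
    return [p for p, s in zip(points, scores) if s <= threshold]


# Eight corners of a unit cube plus one far-away point.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
kept = remove_low_density(cube + [(10, 10, 10)], k=3, alpha=1.0)
```

In this toy input the far-away point is the only one dropped. A real pipeline would replace the brute-force search with a spatial index such as the octree the abstract mentions.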

preprint · 2026 · arXiv

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models

Vision-language-action (VLA) models remain constrained by the scarcity of action-labeled robot data, whereas action-free videos provide abundant evidence of how the physical world changes. Latent action models offer a promising way to extract such priors from videos, but reconstruction-trained latent codes are not necessarily suitable for policy generation: they may predict future observations while lacking the structure needed to be reused or generated coherently with robot actions. We introduce ALAM (Algebraic Latent Action Model), an Algebraically Consistent Latent Action Model that turns temporal relations in action-free video into structural supervision. Given frame triplets, ALAM learns latent transitions that are grounded by reconstruction while being regularized by composition and reversal consistency, encouraging a locally additive transition space. For downstream VLA learning, we freeze the pretrained encoder and use its latent transition sequences as auxiliary generative targets, co-generated with robot actions under a joint flow-matching objective. This couples structured latent transitions with flow-based policy generation, allowing the policy to exploit ALAM's locally consistent transition geometry without requiring latent-to-action decoding. Representation probes show that ALAM reduces additivity and reversibility errors by 25-85 times over unstructured latent-action baselines and improves long-horizon cumulative reconstruction. When transferred to VLA policies, ALAM raises the average success rate from 47.9% to 85.0% on MetaWorld MT50 and from 94.1% to 98.1% on LIBERO, with consistent gains on real-world manipulation tasks. Ablations further confirm that the strongest improvements arise from the synergy between algebraically structured latent transitions and joint flow matching.
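The composition and reversal consistency described above can be written out under one plausible reading. The notation is an assumption, not taken from the paper: $z_{i \to j}$ denotes the latent transition the encoder produces for the frame pair $(o_i, o_j)$, and a squared-error penalty is used purely for illustration.

```latex
% Composition: two consecutive steps should add up to the direct two-step transition.
\mathcal{L}_{\mathrm{comp}} = \bigl\| z_{t \to t+2} - \bigl( z_{t \to t+1} + z_{t+1 \to t+2} \bigr) \bigr\|_2^2
% Reversal: the backward transition should be the negative of the forward one.
\mathcal{L}_{\mathrm{rev}} = \bigl\| z_{t \to t+1} + z_{t+1 \to t} \bigr\|_2^2
```

Both terms vanish exactly when transitions behave additively on the frame triplet, which is one way to read "a locally additive transition space". The actual losses and their weighting are not given in the abstract.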

preprint · 2026 · arXiv

Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition

In this paper, we propose GesFi, a novel WiFi-based gesture recognition system that introduces WiFi latent domain mining to redefine domains directly from the data itself. GesFi first processes raw sensing data collected from WiFi receivers using CSI-ratio denoising, Short-Time Fast Fourier Transform, and visualization techniques to generate standardized input representations. It then employs class-wise adversarial learning to suppress gesture semantics and leverages unsupervised clustering to automatically uncover latent domain factors responsible for distributional shifts. These latent domains are then aligned through adversarial learning to support robust cross-domain generalization. Finally, the system is applied to the target environment for robust gesture inference. We deployed GesFi under both single-pair and multi-pair settings using commodity WiFi transceivers, and evaluated it across multiple public datasets and real-world environments. Compared to state-of-the-art baselines, GesFi achieves up to 78% and 50% performance improvements over existing adversarial methods, and consistently outperforms prior generalization approaches across most cross-domain tasks.

preprint · 2026 · arXiv

Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation

WiFi-based 3D human pose estimation offers a low-cost and privacy-preserving alternative to vision-based systems for smart interaction. However, existing approaches rely on visual 3D poses as supervision and directly regress CSI to a camera-based coordinate system. We find that this practice leads to coordinate overfitting: models memorize deployment-specific WiFi transceiver layouts rather than only learning activity-relevant representations, resulting in severe generalization failures. To address this challenge, we present PerceptAlign, the first geometry-conditioned framework for WiFi-based cross-layout pose estimation. PerceptAlign introduces a lightweight coordinate unification procedure that aligns WiFi and vision measurements in a shared 3D space using only two checkerboards and a few photos. Within this unified space, it encodes calibrated transceiver positions into high-dimensional embeddings and fuses them with CSI features, making the model explicitly aware of device geometry as a conditional variable. This design forces the network to disentangle human motion from deployment layouts, enabling robust and, for the first time, layout-invariant WiFi pose estimation. To support systematic evaluation, we construct the largest cross-domain 3D WiFi pose estimation dataset to date, comprising 21 subjects, 5 scenes, 18 actions, and 7 device layouts. Experiments show that PerceptAlign reduces in-domain error by 12.3% and cross-domain error by more than 60% compared to state-of-the-art baselines. These results establish geometry-conditioned learning as a viable path toward scalable and practical WiFi sensing.

preprint · 2026 · arXiv

Correct and Weight: A Simple Yet Effective Loss for Implicit Feedback Recommendation

Learning from implicit feedback has become the standard paradigm for modern recommender systems. However, this setting is fraught with the persistent challenge of false negatives, where unobserved user-item interactions are not necessarily indicative of negative preference. To address this issue, this paper introduces a novel and principled loss function, named Corrected and Weighted (CW) loss, that systematically corrects for the impact of false negatives within the training objective. Our approach integrates two key techniques. First, inspired by Positive-Unlabeled learning, we debias the negative sampling process by re-calibrating the assumed negative distribution. By theoretically approximating the true negative distribution (p^-) using the observable general data distribution (p) and the positive interaction distribution (p^+), our method provides a more accurate estimate of the likelihood that a sampled unlabeled item is truly negative. Second, we introduce a dynamic re-weighting mechanism that modulates the importance of each negative instance based on the model's current prediction. This scheme encourages the model to enforce a larger ranking margin between positive items and confidently predicted (i.e., easy) negative items, while simultaneously down-weighting the penalty on uncertain negatives that have a higher probability of being false negatives. A key advantage of our approach is its elegance and efficiency; it requires no complex modifications to the data sampling process or significant computational overhead, making it readily applicable to a wide array of existing recommendation models. Extensive experiments conducted on four large-scale, sparse benchmark datasets demonstrate the superiority of our proposed loss. The results show that our method consistently and significantly outperforms a suite of state-of-the-art loss functions across multiple ranking-oriented metrics.
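The relation between the three distributions named above follows from the standard Positive-Unlabeled mixture identity. The class prior $\tau^{+}$ (the fraction of unlabeled items that are actually positive) is an assumed symbol here and does not appear in the abstract; how the paper estimates it or plugs the result into the loss is not stated.

```latex
% The general data distribution is a mixture of positives and negatives:
p(x) = \tau^{+} \, p^{+}(x) + (1 - \tau^{+}) \, p^{-}(x)
% Solving for the unobservable negative distribution:
p^{-}(x) = \frac{p(x) - \tau^{+} \, p^{+}(x)}{1 - \tau^{+}}
```

For example, with $\tau^{+} = 0.1$, an expectation over true negatives equals $(\mathbb{E}_{p}[\cdot] - 0.1\,\mathbb{E}_{p^{+}}[\cdot]) / 0.9$, so it can be estimated from unlabeled and positive samples alone.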

preprint · 2026 · arXiv

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet how such modules should be parameterized on top of a pretrained VLA remains an open design question. Existing world-model-augmented VLAs typically pass the per-frame visual stream into the world module at high visual bandwidth and treat its rollout as a side product of action prediction; under a constrained adaptation budget on a frozen backbone, this leaves both the per-frame representation and the latent action coupling under-examined. We introduce OneWM-VLA, which compresses each view into a single semantic token per frame through Adaptive Attention Pooling, and produces the resulting latent stream and the action trajectory under a single flow-matching objective rather than connecting them through a separate decoder. Empirically, we find that per-frame visual bandwidth can be reduced to a single token without compromising long-horizon performance under our setup. Trained with 14.71M LoRA parameters on a $π_0$ (2B) backbone, OneWM-VLA improves the average success rate from 47.9% to 61.3% on MetaWorld MT50, reaches 95.6% on LIBERO-Long (vs. 85.2% for $π_0$), and reaches 60.0% on the long-horizon deformable task Fold Cloth on a real Piper arm (vs. 20.0% for $π_0$).

preprint · 2026 · arXiv

Pervasive Vulnerability Analysis and Defense for QKD-based Quantum Private Query

Quantum Private Query (QPQ) based on Quantum Key Distribution (QKD) is among the most practically viable quantum communication protocols, with application value second only to QKD itself. However, prevalent security vulnerabilities in the post-processing stages of most existing QKD-based QPQ protocols have been severely overlooked. This study focuses on hidden information extraction under undetermined signal bits, revealing that most such QPQ protocols face severe security threats even without complex quantum resources. Specifically, the direct observation attack causes incremental information leakage, while the minimum error discrimination attack efficiently steals additional database information. To address these critical flaws, the proposed multi-encryption defense scheme is compatible with existing QPQ protocols. The study demonstrates the necessity of the multi-encryption strategy for the security of databases in QPQ, providing key theoretical and technical support for constructing practical QPQ protocols resistant to real-world attacks.

preprint · 2026 · arXiv

SM3D: Mitigating Spectral Bias and Semantic Dilution in Point Cloud State Space Models

Point clouds are a fundamental 3D data representation that underpins various computer vision tasks. Recently, Mamba has demonstrated strong potential for 3D point cloud understanding. However, existing approaches primarily focus on point serialization, overlooking a more fundamental limitation: State Space Models (SSMs) inherently exhibit a spectral low-pass bias arising from their recursive formulation. In serialized point clouds, this bias is particularly detrimental, as it suppresses high-frequency geometric structures and progressively dilutes semantic discriminability across deep layers. To address these limitations, we propose SM3D, a spectral-aware framework designed to jointly preserve geometric fidelity and semantic consistency. First, a Geometric Spectral Compensator (GSC) is introduced to counteract the low-pass bias by explicitly injecting graph-guided high-frequency components through local Laplacian analysis, thereby restoring structural sensitivity. Second, we design a Semantic Coherence Refiner (SCR) to rectify semantic drift through frequency-aware channel recalibration. To balance theoretical precision and computational efficiency, SCR is instantiated via two pathways: an exact Laplacian eigendecomposition (SCR-L) and a linear-complexity Chebyshev polynomial approximation (SCR-C). Extensive experiments demonstrate that SM3D achieves state-of-the-art performance, including 96.0% accuracy on ModelNet40 and 86.5% mIoU on ShapeNetPart, validating its effectiveness in mitigating spectral low-pass bias and semantic dilution (Code: https://github.com/L1277471578/SM3D).
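The Chebyshev pathway mentioned above rests on a standard trick: a polynomial filter of the graph Laplacian can be applied through the three-term recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with no eigendecomposition. The sketch below shows only that generic recurrence. It is not SM3D's SCR-C module; the function names are hypothetical, and the input matrix is assumed to be a Laplacian already rescaled so its eigenvalues lie in $[-1, 1]$.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def chebyshev_filter(lap_scaled, signal, coeffs):
    """Apply sum_k coeffs[k] * T_k(lap_scaled) to signal via the recurrence.

    lap_scaled: rescaled Laplacian (assumed spectrum in [-1, 1]).
    coeffs: filter coefficients; hypothetical values, learned in practice.
    """
    t_prev = list(signal)                      # T_0(L) x = x
    out = [coeffs[0] * v for v in t_prev]
    if len(coeffs) == 1:
        return out
    t_curr = matvec(lap_scaled, signal)        # T_1(L) x = L x
    out = [o + coeffs[1] * v for o, v in zip(out, t_curr)]
    for c in coeffs[2:]:
        lx = matvec(lap_scaled, t_curr)
        t_next = [2 * a - b for a, b in zip(lx, t_prev)]  # T_k = 2 L T_{k-1} - T_{k-2}
        out = [o + c * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Diagonal toy "Laplacian" with eigenvalues 1 and -1, so the result is easy to check by hand.
filtered = chebyshev_filter([[1, 0], [0, -1]], [1, 1], [1, 1, 1])
```

Each term costs one matrix-vector product. The dense product here is quadratic in the number of points; the linear complexity the abstract refers to would require a sparse Laplacian, where each product costs time proportional to the number of edges.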

preprint · 2026 · arXiv

Testing supermassive primordial black holes with lensing signals of binary black hole merges

Next-generation ground-based gravitational wave (GW) detectors are expected to observe millions of binary black hole mergers, a fraction of which will be strongly lensed by intervening galaxies or clusters, producing multiple images with a characteristic distribution of time delays. Importantly, the predicted rate and properties of such events are sensitive to the abundance and distribution of strong lensing objects, which depend directly on cosmological models. One such scenario posits the existence of supermassive primordial black holes (SMPBHs) in the early universe, which would enhance the formation of dark matter halos. This mechanism has been proposed to explain the abundance of high-redshift galaxies observed by the James Webb Space Telescope. Crucially, the same cosmological model with SMPBHs would also leave a distinct imprint on the population of strongly lensed GWs. It predicts both an increased event rate and a modified distribution of time delays between the multiple images. Therefore, we propose statistical measurements of the rate and time delay distribution of strong lensing GW events as a powerful probe to directly constrain the abundance of SMPBHs. Considering $Λ$CDM cosmology with (non-)clustered SMPBHs, we find that the abundance of SMPBHs $f_{\rm PBH}$ with masses above $10^8~M_{\odot}$ is constrained to be $\sim10^{-4}$ at $95\%$ confidence level. This constraint will be comparable and complementary to the currently available constraint from large scale structure observations.

preprint · 2025 · arXiv

MICACL: Multi-Instance Category-Aware Contrastive Learning for Long-Tailed Dynamic Facial Expression Recognition

Dynamic facial expression recognition (DFER) faces significant challenges due to long-tailed category distributions and the complexity of spatio-temporal feature modeling. While existing deep learning-based methods have improved DFER performance, they often fail to address these issues, resulting in severe model induction bias. To overcome these limitations, we propose a novel multi-instance learning framework called MICACL, which integrates spatio-temporal dependency modeling and long-tailed contrastive learning optimization. Specifically, we design the Graph-Enhanced Instance Interaction Module (GEIIM) to capture intricate spatio-temporal relationships between adjacent instances through adaptive adjacency matrices and multiscale convolutions. To enhance instance-level feature aggregation, we develop the Weighted Instance Aggregation Network (WIAN), which dynamically assigns weights based on instance importance. Furthermore, we introduce a Multiscale Category-aware Contrastive Learning (MCCL) strategy to balance training between major and minor categories. Extensive experiments on in-the-wild datasets (i.e., DFEW and FERV39k) demonstrate that MICACL achieves state-of-the-art performance with superior robustness and generalization.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2509.04344:author:8:bin-liu

Imported May 21, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.07931:author:7:bin-liu

Imported May 20, 2026 · Synced May 20, 2026

arxiv · confidence 95%

external id: arxiv:2605.10819:author:7:bin-liu

Imported May 20, 2026 · Synced May 20, 2026

2 works

Chunyang Wang

Researcher

2 works

Huan Yan

Researcher

2 works

Jinyang Huang

Researcher

2 works

Xiang Zhang

Researcher

