Source author record

Shuyang Jiang

Shuyang Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Information Theory Machine Learning math.IT

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency

Contrastive language-image pretraining (CLIP) suffers from two structural weaknesses: the symmetric InfoNCE loss discards the relative ordering among unmatched in-batch pairs, and global pooling collapses the visual representation into a semantic bottleneck that is poorly sensitive to fine-grained local structure. RANKCLIP partially addresses the first issue with a list-wise Plackett-Luce ranking-consistency loss, but its model is strictly first-order and inherits the second weakness untouched. We propose DINORANKCLIP, a pretraining framework that addresses both jointly. Our principal contribution is injecting a frozen DINOv3 teacher into the contrastive trunk through a dual-branch lightweight student and a multi-scale fusion module with channel-spatial attention, a self-attention refiner, and a conflict-aware gate that preserves the cross-modal alignment up to first order. Complementarily, we introduce a high-order Plackett-Luce ranking model in which the per-position utility is augmented with attention-parameterised pairwise and tuple-wise transition terms; the family contains CLIP and RANKCLIP as nested zero-order and first-order special cases, and the optimal order on every benchmark is $R^*=3$. The full empirical study -- order sweep, Fine-grained Probe on five datasets, four-node Modality-Gap analysis, six-variant Fusion ablation -- fits in 72 hours on a single eight-GPU H100 node and trains entirely on Conceptual Captions 3M. DINORANKCLIP consistently outperforms CLIP, CyCLIP, ALIP, and RANKCLIP under matched compute, with the largest relative gains on the fine-grained and out-of-distribution evaluations that most directly stress local structural reasoning.

preprint2022arXiv

Efficient 3D Deep LiDAR Odometry

An efficient 3D point cloud learning architecture, named EfficientLO-Net, for LiDAR odometry is first proposed in this paper. In this architecture, the projection-aware representation of the 3D point cloud is proposed to organize the raw 3D point cloud into an ordered data form to achieve efficiency. The Pyramid, Warping, and Cost volume (PWC) structure for the LiDAR odometry task is built to estimate and refine the pose in a coarse-to-fine approach. A projection-aware attentive cost volume is built to directly associate two discrete point clouds and obtain embedding motion patterns. Then, a trainable embedding mask is proposed to weigh the local motion patterns to regress the overall pose and filter outlier points. The trainable pose warp-refinement module is iteratively used with embedding mask optimized hierarchically to make the pose estimation more robust for outliers. The entire architecture is holistically optimized end-to-end to achieve adaptive learning of cost volume and mask, and all operations involving point cloud sampling and grouping are accelerated by projection-aware 3D feature learning methods. The superior performance and effectiveness of our LiDAR odometry architecture are demonstrated on KITTI, M2DGR, and Argoverse datasets. Our method outperforms all recent learning-based methods and even the geometry-based approach, LOAM with mapping optimization, on most sequences of KITTI odometry dataset. We open sourced our codes at: https://github.com/IRMVLab/EfficientLO-Net.

preprint2014arXiv

Block-Sparsity-Induced Adaptive Filter for Multi-Clustering System Identification

In order to improve the performance of least mean square (LMS)-based adaptive filtering for identifying block-sparse systems, a new adaptive algorithm called block-sparse LMS (BS-LMS) is proposed in this paper. The basis of the proposed algorithm is to insert a penalty of block-sparsity, which is a mixed \$l_{2, 0}\$ norm of adaptive tap-weights with equal group partition sizes, into the cost function of traditional LMS algorithm. To describe a block-sparse system response, we first propose a Markov-Gaussian model, which can generate a kind of system responses of arbitrary average sparsity and arbitrary average block length using given parameters. Then we present theoretical expressions of the steady-state misadjustment and transient convergence behavior of BS-LMS with an appropriate group partition size for white Gaussian input data. Based on the above results, we theoretically demonstrate that BS-LMS has much better convergence behavior than \$l_0\$-LMS with the same small level of misadjustment. Finally, numerical experiments verify that all of the theoretical analysis agrees well with simulation results in a large range of parameters.