Source author record

Hongxiang Li

Hongxiang Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Multimedia Artificial Intelligence cond-mat.mtrl-sci Information Theory Machine Learning math.IT physics.data-an

Catalog footprint

What is connected

7works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation

Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models, by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we identify three overlooked limitations in them: 1) Angular dynamics distortion: The radial-angular coupling induces non-uniform speed on the angular sub-manifold, leading to regression training difficulty and extra truncation errors. 2) Radial dynamics neglect: Feature normalization discards modality confidence, failing to distinguish out-of-distribution and in-distribution data, and abandoning crucial radial dynamics. 3) Context-agnostic unconditional flow: Dataset-specific information loss during pre-trained cross-modal feature extraction remains unrecovered. To resolve these issues, we propose warped product flow matching (WP-FM), a unified Riemannian framework that reformulates alignment on a warped product manifold. Within this framework, we derive direct product flow matching (DP-FM) by introducing a constant-warping metric, which yields a decoupled cylindrical manifold (i.e., direct product manifold). DP-FM enables independent radial evolution and constant-speed angular geodesic transport, effectively eliminating angular dynamics distortion while preserving radial consistency. Meanwhile, we incorporate classifier-free guidance by conditioning the flow on the pre-trained VLMs' hidden states to inject missing dataset-specific information. Extensive results across 11 benchmarks have demonstrated that DP-FM achieves a new state-of-the-art for multi-step few-shot adaptation.

preprint2024arXiv

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

The recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) \emph{alignment} of features of similar samples, and (2) \emph{uniformity} of the induced distribution of the normalized features on the hypersphere. Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both ground truth and other moments, \ie semantic overlapping; (2) only a few moments in the video are annotated, \ie sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learned inconsistent video representations. Both characteristics lead to vanilla contrastive learning being unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. We quantify the correlations among moments leveraging the geodesic distance that guides the model to learn the correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose semantic Shapley interaction based on geodesic distance sampling to learn fine-grained semantic alignment in similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method.

preprint2021arXiv

Convolutional neural network-assisted recognition of nanoscale L12 ordered structures in face-centred cubic alloys

Nanoscale L12-type ordered structures are widely used in face-centred cubic (FCC) alloys to exploit their hardening capacity and thereby improve mechanical properties. These fine-scale particles are typically fully coherent with matrix with the same atomic configuration disregarding chemical species, which makes them challenging to be characterized. Spatial distribution maps (SDMs) are used to probe local order by interrogating the three-dimensional (3D) distribution of atoms within reconstructed atom probe tomography (APT) data. However, it is almost impossible to manually analyse the complete point cloud ($>10$ million) in search for the partial crystallographic information retained within the data. Here, we proposed an intelligent L12-ordered structure recognition method based on convolutional neural networks (CNNs). The SDMs of a simulated L12-ordered structure and the FCC matrix were firstly generated. These simulated images combined with a small amount of experimental data were used to train a CNN-based L12-ordered structure recognition model. Finally, the approach was successfully applied to reveal the 3D distribution of L12-type $δ^\prime$-Al3(LiMg) nanoparticles with an average radius of 2.54 nm in a FCC Al-Li-Mg system. The minimum radius of detectable nanodomain is even down to 5 Å. The proposed CNN-APT method is promising to be extended to recognize other nanoscale ordered structures and even more-challenging short-range ordered phenomena in the near future.

preprint2016arXiv

Opportunistic Scheduling for Network Coded Data in Wireless Multicast Networks

In this paper queue stability in a single-hop wireless multicast networks over erasure channels is analyzed. First, a queuing model consisting of several sub-queues is introduced. Under the queueing stability constraint, we adopt Lyapunov optimization model and define decision variables to derive a network coding based packet scheduling algorithm, which has significantly less complexity and shorter queue size compared with the existing solutions. Further, the proposed algorithm is modified to meet the requirements of time-critical data. Finally, the simulation results verify the effectiveness of our proposed algorithm.

preprint2015arXiv

A Fast Sub-Pixel Motion Estimation Algorithm for H.264/AVC Video Coding

Motion Estimation (ME) is one of the most time-consuming parts in video coding. The use of multiple partition sizes in H.264/AVC makes it even more complicated when compared to ME in conventional video coding standards. It is important to develop fast and effective sub-pixel ME algorithms since (a) The computation overhead by sub-pixel ME has become relatively significant while the complexity of integer-pixel search has been greatly reduced by fast algorithms, and (b) Reducing sub-pixel search points can greatly save the computation for sub-pixel interpolation. In this paper, a novel fast sub-pixel ME algorithm is proposed which performs a 'rough' sub-pixel search before the partition selection, and performs a 'precise' sub-pixel search for the best partition. By reducing the searching load for the large number of non-best partitions, the computation complexity for sub-pixel search can be greatly decreased. Experimental results show that our method can reduce the sub-pixel search points by more than 50% compared to existing fast sub-pixel ME methods with negligible quality degradation.

preprint2015arXiv

A new network-based algorithm for human activity recognition in video

In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a patch of the scene and each edge represents the activity correlation between the corresponding patches. Based on this network, we further model people in the scene as packages while human activities can be modeled as the process of package transmission in the network. By analyzing these specific "package transmission" processes, various activities can be effectively detected. The implementation of our NTB algorithm into abnormal activity detection and group activity recognition are described in detail in the paper. Experimental results demonstrate the effectiveness of our proposed algorithm.

preprint2015arXiv

Macroblock Classification Method for Video Applications Involving Motions

In this paper, a macroblock classification method is proposed for various video processing applications involving motions. Based on the analysis of the Motion Vector field in the compressed video, we propose to classify Macroblocks of each video frame into different classes and use this class information to describe the frame content. We demonstrate that this low-computation-complexity method can efficiently catch the characteristics of the frame. Based on the proposed macroblock classification, we further propose algorithms for different video processing applications, including shot change detection, motion discontinuity detection, and outlier rejection for global motion estimation. Experimental results demonstrate that the methods based on the proposed approach can work effectively on these applications.

Hongxiang Li

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

Convolutional neural network-assisted recognition of nanoscale L12 ordered structures in face-centred cubic alloys

Opportunistic Scheduling for Network Coded Data in Wireless Multicast Networks

A Fast Sub-Pixel Motion Estimation Algorithm for H.264/AVC Video Coding

A new network-based algorithm for human activity recognition in video

Macroblock Classification Method for Video Applications Involving Motions