Source author record

Yunbin Zhao

Yunbin Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Computation and Language, Computer Vision, Information Theory, Machine Learning, math.IT, math.NA, math.OC

Catalog footprint

What is connected

4 works

8 topics

4 close collaborators


preprint, 2026, arXiv

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

Policy entropy has emerged as a fundamental measure for understanding and controlling exploration in reinforcement learning with verifiable rewards (RLVR) for LLMs. However, existing entropy-aware methods mainly regulate entropy through global objectives, while the token-level mechanism by which sampled policy updates reshape policy entropy remains underexplored. In this work, we develop a theoretical framework of entropy mechanics in RLVR. Our analysis yields a first-order approximation of the entropy change, giving rise to entropy polarity, a signed token-level quantity that predicts how much a sampled update expands or contracts entropy. This analysis further reveals a structural asymmetry: reinforcing frequent high-probability tokens triggers contraction tendencies, whereas expansive tendencies typically require lower-probability samples or stronger distributional correction. Empirically, we show that entropy polarity reliably predicts entropy changes, and that positive and negative polarity branches play complementary roles in preserving exploration while strengthening exploitation. Building on these insights, we propose Polarity-Aware Policy Optimization (PAPO), which preserves both polarity branches and implements entropy control through advantage reweighting. With the empirical entropy trajectory as an online phase signal, PAPO adaptively reallocates optimization pressure between entropy-expanding and entropy-contracting updates. Experiments on mathematical reasoning and agentic benchmarks show that PAPO consistently outperforms competitive baselines, while delivering superior training efficiency and substantial reward improvements.
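The abstract's first-order view of entropy change can be illustrated with a toy calculation. The sketch below is not the paper's definition of entropy polarity or of PAPO. It assumes a single softmax distribution whose logits are updated directly by a policy-gradient step `lr * advantage * (onehot(y) - p)`, and all function names are hypothetical. For such a policy the entropy gradient with respect to logit k is `-p_k * (log p_k + H)`, which gives a signed per-sample prediction of the entropy change.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def logit_step(p, y, advantage, lr):
    """Assumed update: policy-gradient step on the logits for sampled token y."""
    return [lr * advantage * ((1.0 if k == y else 0.0) - q)
            for k, q in enumerate(p)]


def first_order_entropy_change(p, y, advantage, lr):
    """Linearized entropy change: sum_k (dH/dz_k) * dz_k."""
    h = entropy(p)
    # For a softmax policy, dH/dz_k = -p_k * (log p_k + H).
    grad = [-q * (math.log(q) + h) for q in p]
    dz = logit_step(p, y, advantage, lr)
    return sum(g * d for g, d in zip(grad, dz))


def actual_entropy_change(z, y, advantage, lr):
    """Exact entropy change after applying the same logit step."""
    p = softmax(z)
    dz = logit_step(p, y, advantage, lr)
    new_z = [a + b for a, b in zip(z, dz)]
    return entropy(softmax(new_z)) - entropy(p)


if __name__ == "__main__":
    z = [math.log(0.7), math.log(0.2), math.log(0.1)]
    p = softmax(z)
    for y in range(len(p)):
        pred = first_order_entropy_change(p, y, advantage=1.0, lr=1e-3)
        act = actual_entropy_change(z, y, advantage=1.0, lr=1e-3)
        print(f"token {y} (p={p[y]:.1f}): predicted {pred:+.3e}, actual {act:+.3e}")
```

In this toy setting, reinforcing the most probable token gives a negative predicted change (contraction) and reinforcing the least probable token gives a positive one (expansion), which matches the asymmetry the abstract describes. In a real LLM the update acts on shared weights rather than on the logits directly, so the numbers here are illustrative only.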

preprint, 2022, arXiv

Short Range Correlation Transformer for Occluded Person Re-Identification

Occluded person re-identification is a challenging area of computer vision that suffers from inefficient feature representation and low recognition accuracy. Convolutional neural networks focus on extracting local features, so they struggle to extract features of occluded pedestrians and their results are unsatisfactory. Recently, the vision transformer has been introduced into re-identification and achieves state-of-the-art results by modeling global relationships between patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. Therefore, we design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor with the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important part of the obtained patch sequences and fuse it with the original patch sequence to reconstruct the original patch sequences. (3) Spatial slicing module. We slice and group patch sequences along the spatial direction, which effectively improves the short-range correlation of patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.

preprint, 2013, arXiv

RSP-Based Analysis for Sparsest and Least $\ell_1$-Norm Solutions to Underdetermined Linear Systems

Recently, worst-case analysis, probabilistic analysis and empirical justification have been employed to address the fundamental question: When does $\ell_1$-minimization find the sparsest solution to an underdetermined linear system? In this paper, a deterministic analysis, rooted in classic linear programming theory, is carried out to further address this question. We first identify a necessary and sufficient condition for the uniqueness of least $\ell_1$-norm solutions to linear systems. From this condition, we deduce that a sparsest solution coincides with the unique least $\ell_1$-norm solution to a linear system if and only if the so-called \emph{range space property} (RSP) holds at this solution. This yields a broad understanding of the relationship between $\ell_0$- and $\ell_1$-minimization problems. Our analysis indicates that the RSP truly lies at the heart of the relationship between these two problems. Through RSP-based analysis, several important questions in this field can be largely addressed. For instance, how can the gap between the current theory and the actual numerical performance of $\ell_1$-minimization be efficiently interpreted by a deterministic analysis, and if a linear system has multiple sparsest solutions, when is $\ell_1$-minimization guaranteed to find one of them? Moreover, new matrix properties (such as the \emph{RSP of order $K$} and the \emph{Weak-RSP of order $K$}) are introduced in this paper, and a new theory for sparse signal recovery based on the RSP of order $K$ is established.
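As background for the linear programming connection mentioned in the abstract, the standard reformulation of $\ell_1$-minimization as a linear program can be written as follows. The symbols $A$, $b$, $x$ and the auxiliary vector $t$ are notation assumed here for illustration; the abstract does not fix any notation, and this is the textbook reformulation rather than the paper's own derivation.

```latex
\min_{x \in \mathbb{R}^n} \ \|x\|_1 \quad \text{s.t.} \quad Ax = b
\qquad \Longleftrightarrow \qquad
\min_{x,\, t \in \mathbb{R}^n} \ \sum_{i=1}^{n} t_i \quad \text{s.t.} \quad Ax = b, \quad -t \le x \le t .
```

At an optimum of the right-hand problem each $t_i$ equals $|x_i|$, so both problems share the same optimal $x$ and the same optimal value.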

preprint, 2012, arXiv

Rank-one Solutions for Homogeneous Linear Matrix Equations over the Positive Semidefinite Cone

The problem of finding a rank-one solution to a system of linear matrix equations arises in many practical applications. Given a system of linear matrix equations, however, such a low-rank solution does not always exist. In this paper, we aim to develop sufficient conditions for the existence of a rank-one solution to a system of homogeneous linear matrix equations (HLME) over the positive semidefinite cone. First, we prove that an existence condition for a rank-one solution can be established by a homotopy invariance theorem. The derived condition is closely related to the so-called $P_\emptyset$ property of the function defined by quadratic transformations. Second, we prove that the existence condition for a rank-one solution can also be established through the maximum rank of the (positive semidefinite) linear combination of given matrices. It is shown that an upper bound for the rank of the solution to a system of HLME over the positive semidefinite cone can be obtained efficiently by solving a semidefinite programming (SDP) problem. Moreover, a sufficient condition for the nonexistence of a rank-one solution to the system of HLME is also established in this paper.
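One way to see why quadratic transformations appear in this problem is sketched below. The form $\langle A_i, X \rangle = 0$ with symmetric matrices $A_1, \dots, A_m$ is an assumption made here for illustration, since the abstract does not spell out the system. The sketch uses the fact that a nonzero positive semidefinite matrix has rank one exactly when it can be written as $xx^T$ for some nonzero vector $x$.

```latex
\text{find } X \succeq 0, \ \operatorname{rank}(X) = 1, \ \langle A_i, X \rangle = \operatorname{tr}(A_i X) = 0, \quad i = 1, \dots, m
\qquad \Longleftrightarrow \qquad
\text{find } x \neq 0 \ \text{ with } \ x^{T} A_i x = 0, \quad i = 1, \dots, m .
```

Substituting $X = xx^T$ gives $\operatorname{tr}(A_i x x^T) = x^T A_i x$, so under this assumed form a rank-one solution corresponds to a common nonzero root of the quadratic forms $x^T A_i x$.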