Source author record

Jianshu Zhang

Jianshu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Computer Vision, Information Theory, math.IT, Artificial Intelligence, Computation and Language, math.OC

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) elicits long chain-of-thought reasoning in large language models (LLMs), but outcome-based rewards lead to coarse-grained advantage estimation. While existing approaches improve RLVR via token-level entropy or sequence-level length control, they lack a semantically grounded, step-level measure of reasoning progress. As a result, LLMs fail to distinguish necessary deduction from redundant verification: they may continue checking after reaching a correct solution and, in extreme cases, overturn a correct trajectory into an incorrect final answer. To remedy the lack of process supervision, we introduce a training-free probing mechanism that extracts intermediate confidence and correctness and combines them into a Step Potential signal that explicitly estimates the reasoning state at each step. Building on this signal, we propose Step Potential Advantage Estimation (SPAE), a fine-grained credit assignment method that amplifies potential gains, penalizes potential drops, and applies a penalty after the potential saturates to encourage timely termination. Experiments across multiple benchmarks show that SPAE consistently improves accuracy while substantially reducing response length, outperforming strong RL baselines and recent efficient reasoning and token-level advantage estimation methods. The code is available at https://github.com/cii030/SPAE-RL.
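The abstract describes the credit assignment only in words, so the following is a minimal sketch of the general idea rather than the paper's formula. The function name `step_advantages`, the parameters `gain_weight`, `saturation` and `overrun_penalty`, and the additive form of the adjustment are all assumptions made for illustration; the per-step potentials are taken as given.

```python
from typing import List


def step_advantages(
    potentials: List[float],
    outcome_advantage: float,
    gain_weight: float = 1.0,      # hypothetical scale for potential changes
    saturation: float = 0.95,      # hypothetical threshold for "saturated"
    overrun_penalty: float = 0.1,  # hypothetical penalty for steps after saturation
) -> List[float]:
    """Illustrative step-level advantages built from a per-step potential signal.

    Assumption: each step's advantage is the outcome advantage plus a term
    proportional to the change in potential, so gains are amplified and drops
    are penalized. Steps taken after the potential has saturated receive a
    flat penalty instead, which discourages redundant verification.
    """
    advantages: List[float] = []
    previous = 0.0
    saturated = False
    for potential in potentials:
        if saturated:
            advantage = outcome_advantage - overrun_penalty
        else:
            advantage = outcome_advantage + gain_weight * (potential - previous)
        advantages.append(advantage)
        if potential >= saturation:
            saturated = True
        previous = potential
    return advantages
```

With potentials `[0.2, 0.6, 0.4, 1.0, 1.0]` and an outcome advantage of `1.0`, the third step (a drop) is credited less than the outcome advantage, the fourth step (a large gain) is credited more, and the fifth step (taken after saturation) is penalized.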

preprint, 2026, arXiv

Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery

Scientific discovery is increasingly constrained by costly experiments and limited resources, underscoring the need for efficient optimization in AI for science. Bayesian Optimization (BO), though widely adopted for balancing exploration and exploitation, often exhibits slow cold-start performance and poor scalability in high-dimensional settings, limiting its applicability in real-world scientific problems. To overcome these challenges, we propose LLM-Guided Bayesian Optimization (LGBO), the first LLM preference-guided BO framework that continuously integrates the semantic reasoning of large language models (LLMs) into the optimization loop. Unlike prior works that use LLMs only for warm-start initialization or candidate generation, LGBO introduces a region-lifted preference mechanism that embeds LLM-driven preferences into every iteration, shifting the surrogate mean in a stable and controllable way. Theoretically, we prove that LGBO does not perform significantly worse than standard BO in the worst case, while achieving significantly faster convergence when preferences align with the objective. Empirically, LGBO consistently outperforms existing methods across diverse dry benchmarks in physics, chemistry, biology, and materials science. Most notably, in a new wet-lab optimization of Fe-Cr battery electrolytes, LGBO attains 90% of the best observed value within 6 iterations, whereas standard BO and existing LLM-augmented baselines require more than 10. Together, these results suggest that LGBO offers a promising direction for integrating LLMs into scientific optimization workflows.
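To make "shifting the surrogate mean" concrete, here is a small sketch under stated assumptions, not the LGBO algorithm itself. It assumes a one-dimensional search space, a preferred region given as an interval, a constant bounded `lift` added to the posterior mean inside that region, and an upper-confidence-bound acquisition with weight `beta`. All of these names and choices are hypothetical.

```python
from typing import Sequence, Tuple


def pick_next(
    candidates: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    preferred_region: Tuple[float, float],
    lift: float = 0.5,  # hypothetical bounded shift applied inside the region
    beta: float = 2.0,  # hypothetical exploration weight
) -> int:
    """Return the index of the candidate with the highest shifted UCB score.

    Assumption: the preference is an interval (low, high); candidates inside
    it have their surrogate mean raised by `lift` before the acquisition
    score mean + beta * std is computed.
    """
    low, high = preferred_region
    best_index, best_score = -1, float("-inf")
    for index, (x, mean, std) in enumerate(zip(candidates, means, stds)):
        shifted_mean = mean + (lift if low <= x <= high else 0.0)
        score = shifted_mean + beta * std
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

Because the shift is bounded by `lift`, a misleading preference can change the ranking only among candidates whose scores are within `lift` of each other, which is one simple way to keep the influence of the preference controllable.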

preprint, 2022, arXiv

Split, embed and merge: An accurate table structure recognizer

Table structure recognition is an essential part of making machines understand tables. Its main task is to recognize the internal structure of a table. However, due to the complexity and diversity of table structures and styles, it is very difficult to parse tabular data into a structured format that machines can understand easily, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. Our model takes table images as input and can correctly recognize the structure of tables, whether they are simple or complex. SEM is mainly composed of three parts: a splitter, an embedder and a merger. In the first stage, we apply the splitter to predict the potential regions of the table row (column) separators, and obtain the fine grid structure of the table. In the second stage, by taking full account of the textual information in the table, we fuse the output features for each table grid from both vision and language modalities. Moreover, we achieve a higher precision in our experiments by adding semantic features. Finally, we process the merging of these basic table grids in a self-regression manner. The corresponding merging results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset, which outperforms other methods by a large margin. We also won first place on complex tables and third place on all tables in the ICDAR 2021 Competition on Scientific Literature Parsing, Task-B. Extensive experiments on other publicly available datasets demonstrate that our model achieves state-of-the-art performance.
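The split-then-merge data flow can be illustrated with a small sketch. It covers only the geometry: in SEM the separators are predicted by the splitter and the merge decisions are learned with attention, whereas here both are simply passed in. The names `Box`, `grid_cells` and `merge_cells` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def grid_cells(row_separators: List[int], col_separators: List[int]) -> List[List[Box]]:
    """Build the fine grid of basic cells from separator positions.

    Assumption: separators are given as pixel coordinates and include the
    table borders, so n separators along an axis yield n - 1 cells.
    """
    rows = sorted(row_separators)
    cols = sorted(col_separators)
    return [
        [(cols[c], rows[r], cols[c + 1], rows[r + 1]) for c in range(len(cols) - 1)]
        for r in range(len(rows) - 1)
    ]


def merge_cells(cells: List[Box]) -> Box:
    """Merge basic cells into one spanning cell by taking their bounding box."""
    return (
        min(box[0] for box in cells),
        min(box[1] for box in cells),
        max(box[2] for box in cells),
        max(box[3] for box in cells),
    )
```

For example, with row separators `[0, 10, 20]` and column separators `[0, 5, 15]`, the grid has two rows and two columns, and merging the two cells of the first row gives a header cell that spans both columns.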

preprint, 2020, arXiv

Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition

In this paper, we propose a novel stroke constrained attention network (SCAN) which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. The proposed SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and employs a CNN encoder to extract pixel-level features from input images in offline mode, then uses stroke constrained information to convert them into online and offline stroke-level features. Using stroke-level features explicitly groups points or pixels belonging to the same stroke, and therefore reduces the difficulty of symbol segmentation and recognition via the decoder with attention mechanism. For multi-modal HMER, in addition to fusing multi-modal information in the decoder, SCAN can also fuse multi-modal information in the encoder by utilizing the stroke based alignments between online and offline modalities. Encoder fusion is a better way of combining multi-modal information, as it implements the information interaction one step before the decoder fusion, so that the advantages of multiple modalities can be exploited earlier and more adequately when training the encoder-decoder model. Evaluated on a benchmark published by the CROHME competition, the proposed SCAN achieves state-of-the-art performance.
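The conversion from point-level to stroke-level features can be sketched as a grouping step. The abstract does not say how the grouped features are aggregated, so mean pooling is an assumption here, and the function name `stroke_pool` is hypothetical.

```python
from typing import Dict, List


def stroke_pool(point_features: List[List[float]], stroke_ids: List[int]) -> List[List[float]]:
    """Average point-level feature vectors that belong to the same stroke.

    Assumption: stroke_ids[i] names the stroke that point i belongs to, and
    mean pooling is used as the aggregation. Strokes are returned in order
    of first appearance.
    """
    if len(point_features) != len(stroke_ids):
        raise ValueError("each point needs exactly one stroke id")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feature, stroke in zip(point_features, stroke_ids):
        if stroke not in sums:
            sums[stroke] = [0.0] * len(feature)
            counts[stroke] = 0
        sums[stroke] = [total + value for total, value in zip(sums[stroke], feature)]
        counts[stroke] += 1
    return [[total / counts[stroke] for total in sums[stroke]] for stroke in sums]
```

The decoder's attention then ranges over one vector per stroke instead of one per point or pixel, which is the sense in which the stroke becomes the basic unit.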

preprint, 2013, arXiv

Sum-Rate Maximization with Minimum Power Consumption for MIMO DF Two-Way Relaying: Part I - Relay Optimization

The problem of power allocation is studied for a multiple-input multiple-output (MIMO) decode-and-forward (DF) two-way relaying system consisting of two source nodes and one relay. It is shown that achieving maximum sum-rate in such a system does not necessarily demand the consumption of all available power at the relay. Instead, the maximum sum-rate can be achieved through efficient power allocation with minimum power consumption. Deriving such power allocation, however, is nontrivial due to the fact that it generally leads to a nonconvex problem. In Part I of this two-part paper, a sum-rate maximizing power allocation with minimum power consumption is found for MIMO DF two-way relaying, in which the relay optimizes its own power allocation strategy given the power allocation strategies of the source nodes. An algorithm is proposed for efficiently finding the optimal power allocation of the relay based on the proposed idea of relative water-levels. The considered scenario features low complexity due to the fact that the relay optimizes its power allocation without coordinating with the source nodes. As a trade-off for the low complexity, it is shown that power can be wasted at the source nodes because there is no coordination between the relay and the source nodes. Simulation results demonstrate the performance of the proposed algorithm and the effect of asymmetry on the considered system.
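As background for the term "water-levels", the sketch below implements the textbook water-filling allocation for a single link with parallel channels. It is not the relative water-level algorithm proposed in the paper, which handles the two-way relay setting and may leave power unused. The function name and the normalization (unit noise power, so channel i has rate log(1 + gains[i] * power[i])) are assumptions.

```python
from typing import List


def water_filling(gains: List[float], total_power: float) -> List[float]:
    """Classical water-filling over parallel channels with a sum-power budget.

    Each channel receives max(0, level - 1 / gain), where the water level is
    chosen so that the allocated powers sum to total_power.
    """
    if not gains or any(g <= 0 for g in gains) or total_power <= 0:
        raise ValueError("need positive gains and a positive power budget")
    inverse = sorted(1.0 / g for g in gains)  # channel "floor heights", best first
    active = len(inverse)
    level = 0.0
    while active > 0:
        # Water level if only the `active` strongest channels are used.
        level = (total_power + sum(inverse[:active])) / active
        if level > inverse[active - 1]:
            break  # the weakest active channel still gets positive power
        active -= 1
    return [max(0.0, level - 1.0 / g) for g in gains]
```

In this classical setting the whole budget is always spent. The abstract's point is that, for the relay in two-way relaying, the maximum sum-rate can be reached without consuming all available power, which is why a different allocation rule is needed there.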

preprint, 2012, arXiv

Sum-Rate Maximization with Minimum Power Consumption for MIMO DF Two-Way Relaying: Part II - Network Optimization

In Part II of this two-part paper, a sum-rate-maximizing power allocation with minimum power consumption is found for multiple-input multiple-output (MIMO) decode-and-forward (DF) two-way relaying (TWR) in a network optimization scenario. In this scenario, the relay and the source nodes jointly optimize their power allocation strategies to achieve network optimality. Unlike the relay optimization scenario considered in Part I, which features low complexity but does not achieve network optimality, the network-level optimal power allocation can be achieved in the network optimization scenario at the cost of higher complexity. The network optimization problem is considered in two cases, each with several subcases. It is shown that the considered problem, which is originally nonconvex, can be transformed into different convex problems for all but two subcases. For the remaining two subcases, one for each case, it is proved that the optimal strategies for the source nodes and the relay must satisfy certain properties. Based on these properties, an algorithm is proposed for finding the optimal solution. The effect of asymmetry in the number of antennas, power limits, and channel statistics is also considered. Such asymmetry is shown to have a negative effect on both the achievable sum-rate and the power allocation efficiency in MIMO DF TWR. Simulation results demonstrate the performance of the proposed algorithm and the effect of asymmetry in the system.