Source author record

Zhen-Yu Zhang

Zhen-Yu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning cond-mat.mtrl-sci quant-ph

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

Data-dependent Exploration for Online Reinforcement Learning from Human Feedback

Online reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs) by continuously collecting new preference feedback during training. A foundational challenge in this setting is exploration: algorithms must enable the LLM to generate informative comparisons that improve sample efficiency in online RLHF. Existing exploration strategies often derive bonuses via on-policy expectations, which are difficult to estimate reliably from the limited historical preference data available during training; as a result, the policy can prematurely down-weight under-explored regions that may contain high-value behaviors. In this paper, we propose data-dependent exploration for preference optimization (DEPO), a simple and scalable method that leverages historical data to construct an extra uncertainty bonus for high-uncertainty regions, encouraging exploration toward potentially high-value data. Theoretically, we provide a data-dependent regret bound for the proposed algorithm, showing that it adapts to the hardness of the learning task itself and can be tighter than worst-case bounds in practice. Empirically, the proposed method consistently outperforms strong baselines across benchmarks, demonstrating improved sample efficiency.

Preprint, 2022, arXiv

Early Abnormal Detection of Sewage Pipe Network: Bagging of Various Abnormal Detection Algorithms

Abnormalities in a sewage pipe network affect the normal operation of the whole city, so it is important to detect them early. This paper proposes an early abnormality-detection method. Abnormalities are detected with conventional algorithms, such as the isolation forest algorithm, and two innovations are introduced: (1) The current and historical data measured by sensors placed in the sewage pipe network (such as ultrasonic Doppler flowmeters) are taken as the overall dataset, and a conventional anomaly detection method is then applied to this overall dataset to diagnose anomalies in the data. An anomaly is a sample that differs from the other samples in the whole dataset. Because an anomaly is defined not by the algorithm but by the whole dataset, the construction of that dataset is the key to the proposed early abnormality-detection algorithms. (2) A bagging strategy over a variety of conventional anomaly detection algorithms is proposed to achieve early detection of anomalies with high precision and recall. The results show that this method achieves early anomaly detection with a highest precision of 98.21%, a recall of 63.58%, and an F1-score of 0.774.
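
The two ideas in this abstract (scoring current readings against a dataset that also contains the historical readings, and combining several conventional detectors) can be illustrated with a minimal sketch. This is not the paper's implementation. The three detectors here (z-score, median absolute deviation, interquartile range) are simple stand-ins chosen so the example needs only the Python standard library; the abstract names isolation forest but does not list the full detector set. The majority-vote rule, the thresholds, and all function names and readings are assumptions made for illustration.

```python
from statistics import mean, median, pstdev, quantiles

def zscore_flags(xs, thresh=3.0):
    """Flag samples more than `thresh` population standard deviations from the mean."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return [False] * len(xs)
    return [abs(x - mu) / sd > thresh for x in xs]

def mad_flags(xs, thresh=3.5):
    """Flag samples using the modified z-score based on the median absolute deviation."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    if mad == 0:
        return [False] * len(xs)
    return [0.6745 * abs(x - med) / mad > thresh for x in xs]

def iqr_flags(xs, k=1.5):
    """Flag samples outside the interquartile fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < lo or x > hi for x in xs]

def ensemble_flags(xs, detectors, min_votes=2):
    """Combine detectors by voting (an assumed rule): flag if at least `min_votes` agree."""
    votes = [detector(xs) for detector in detectors]
    return [sum(column) >= min_votes for column in zip(*votes)]

def precision_recall_f1(pred, truth):
    """Compute precision, recall and F1 from boolean predictions and labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical flow readings: historical values plus one current value.
historical = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0]
current = [25.0]
dataset = historical + current  # the "overall dataset" is history plus current data

flags = ensemble_flags(dataset, [zscore_flags, mad_flags, iqr_flags])
print(flags[-1])  # whether the current reading is flagged relative to the whole dataset
```

Requiring agreement between detectors trades recall for precision, which is one plausible way to read the reported gap between the two metrics, although the abstract does not say which combination rule the authors used.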

Preprint, 2010, arXiv

Equivalent Circuit Description of Non-compensated n-p Codoped TiO2 as Intermediate Band Solar Cells

The novel concept of non-compensated n-p codoping has made it possible to create tunable intermediate bands in the intrinsic band gap of TiO2, making the codoped TiO2 a promising material for developing intermediate band solar cells (IBSCs). Here we investigate the quantum efficiency of such IBSCs within two scenarios: with and without current extracted from the extended intermediate band. Using the ideal equivalent circuit model, we find that the maximum efficiencies of 57% in the first scenario and 53% in the second are both much higher than the Shockley-Queisser limit for single-gap solar cells. We also obtain various key quantities of the circuits, a useful step in the realistic development of TiO2-based solar cells invoking device integration. These equivalent circuit results are also compared with the efficiencies obtained directly from consideration of electron transitions between the energy bands, and both approaches reveal the intriguing existence of double peaks in the maximum quantum efficiency as a function of the relative location of the intermediate bands (IBs).