Source author record

Han Zhong

Han Zhong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Computer Science and Game Theory, cond-mat.mes-hall, quant-ph

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

Skyrmion Quantum Diode Prototype: Bridging Micromagnetic Simulations and Quantum Models

Magnetic skyrmions are topologically protected spin textures known for their robustness against perturbations. Their topological stability makes them reliable information carriers, which suits them to a key challenge in quantum computing: creating reliable, one-way links between different types of qubits. In this proof-of-concept study, we introduce a novel device, the skyrmion quantum diode, based on skyrmion qubits. Our approach combines classical micromagnetic simulations, achieving skyrmion diameters as small as 3 nm, with quantum circuit models inspired by superconducting qubits. In this work, we demonstrate: (i) unidirectional skyrmion transport via the skyrmion Hall effect in asymmetric junctions, spanning length scales from 20 nm down to 3 nm; (ii) potential compatibility with flux-tunable quantum architectures; and (iii) preliminary insights into anharmonicity in skyrmion-based qubit systems. These results establish both the operational feasibility and the scaling behavior necessary for a hybrid skyrmion-quantum platform. Our work outlines a path toward integrating skyrmion-based quantum components into practical device architectures, enabling low-dissipation, unidirectional quantum information transport. This capability is crucial for scalable quantum computing, spintronic logic, and hybrid quantum systems. It also opens opportunities for chip-scale, pump-free isolators and directional quantum links that enhance readout fidelity, reduce cryogenic load, and support modular skyrmion-superconducting processors.
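The abstract attributes the one-way transport to the skyrmion Hall effect. The minimal sketch below shows the transverse deflection itself, not the junction design. It assumes the standard rigid-particle Thiele equation G ẑ × v − αD v + F = 0 with a scalar gyrocoupling G and an isotropic dissipative factor D. Sign conventions differ between papers. Solving that equation gives a Hall angle with tan θ = G/(αD). The function names and the dimensionless parameter values are hypothetical and are not the paper's micromagnetic settings.

```python
import math


def thiele_velocity(force, gyro, alpha, dissipative):
    """Steady-state drift velocity of a rigid skyrmion.

    Assumes the Thiele equation  G z_hat x v - alpha * D * v + F = 0
    with scalar gyrocoupling G (sign set by the topological charge) and an
    isotropic dissipative factor D. Component-wise this is the 2x2 system
        [[a, G], [-G, a]] @ v = F,  where a = alpha * D,
    which is inverted in closed form below.
    """
    fx, fy = force
    a = alpha * dissipative
    det = a * a + gyro * gyro
    vx = (a * fx - gyro * fy) / det
    vy = (gyro * fx + a * fy) / det
    return vx, vy


def skyrmion_hall_angle(force, gyro, alpha, dissipative):
    """Signed angle (radians) from the driving force to the drift velocity."""
    fx, fy = force
    vx, vy = thiele_velocity(force, gyro, alpha, dissipative)
    cross = fx * vy - fy * vx
    dot = fx * vx + fy * vy
    return math.atan2(cross, dot)


# Hypothetical dimensionless parameters: alpha * D = 1 equals |G|, so the
# deflection is 45 degrees, and flipping the sign of G flips the side.
for g in (1.0, -1.0):
    angle = skyrmion_hall_angle((1.0, 0.0), gyro=g, alpha=0.5, dissipative=2.0)
    print(f"G = {g:+.0f}: Hall angle = {math.degrees(angle):.1f} deg")
```

The sign of the transverse component follows the sign of G. This is the kind of built-in handedness that an asymmetric geometry could turn into direction-dependent transport. How the paper's junctions use it is not reproduced here.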

Preprint, 2022, arXiv

A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning

In this paper, we present a reduction-based framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy. For lower bounds, we improve the existing lower bound for conservative multi-armed bandits and obtain new lower bounds for conservative linear bandits, tabular RL, and low-rank MDPs, through a black-box reduction that turns a certain lower bound in the nonconservative setting into a new lower bound in the conservative setting. For upper bounds, in multi-armed bandits, linear bandits, and tabular RL, our new upper bounds tighten or match existing ones with significantly simpler analyses. We also obtain a new upper bound for conservative low-rank MDPs.
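The "budget obtained from running the baseline policy" can be made concrete with a minimal sketch. It assumes the conservative constraint commonly used in this literature: cumulative reward must stay at least (1 − α) times what the baseline would have earned. It further assumes a known baseline mean, rewards in [0, 1], and a worst-case exploratory reward of 0. All function names are hypothetical. This is not the paper's black-box reduction. It only shows how banked slack decides when exploration is affordable.

```python
def conservative_budget(rewards, baseline_mean, alpha):
    """Slack in the (assumed) conservative constraint after len(rewards) rounds:
        sum_{s <= t} r_s >= (1 - alpha) * t * baseline_mean
    A positive value is budget that can be spent on exploration.
    """
    t = len(rewards)
    return sum(rewards) - (1 - alpha) * t * baseline_mean


def safe_to_explore(rewards, baseline_mean, alpha, worst_case_reward=0.0):
    """True if one more exploratory round keeps the constraint satisfied
    even when that round returns only worst_case_reward."""
    t = len(rewards) + 1
    return sum(rewards) + worst_case_reward >= (1 - alpha) * t * baseline_mean


def choose_arm(rewards, baseline_mean, alpha, exploratory_arm, baseline_arm=0):
    """Play the exploratory arm only when the budget can absorb a worst case."""
    if safe_to_explore(rewards, baseline_mean, alpha):
        return exploratory_arm
    return baseline_arm


def baseline_rounds_needed(baseline_mean, alpha, max_rounds=1_000_000):
    """Rounds of baseline play (assumed to pay exactly baseline_mean each
    round) before one worst-case exploratory round becomes affordable."""
    rewards = []
    while not safe_to_explore(rewards, baseline_mean, alpha):
        if len(rewards) >= max_rounds:
            raise ValueError("constraint never admits exploration")
        rewards.append(baseline_mean)
    return len(rewards)


# Under these assumptions the learner must first bank (1 - alpha) / alpha
# baseline rounds; with alpha = 0.25 that is 3.
print(baseline_rounds_needed(baseline_mean=1.0, alpha=0.25))
```

Solving t ≥ (1 − α)(t + 1) for t gives t ≥ (1 − α)/α. A smaller tolerance α therefore forces a longer stretch of baseline play before any exploration.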

Preprint, 2022, arXiv

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer. The goal of the agent is to learn the optimal policy, i.e., the policy most preferred by the human overseer. Despite its empirical successes, the theoretical understanding of preference-based RL (PbRL) is limited to the tabular case. In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. Our algorithm achieves a regret of $\tilde{O} (\operatorname{poly}(d H) \sqrt{K} )$, where $d$ is a complexity measure of the transition and preference models that depends on the Eluder dimension and log-covering numbers, $H$ is the planning horizon, $K$ is the number of episodes, and $\tilde O(\cdot)$ omits logarithmic terms. Our lower bound indicates that our algorithm is near-optimal when specialized to the linear setting. Furthermore, we extend the PbRL problem by formulating a novel problem called RL with $n$-wise comparisons, and provide the first sample-efficient algorithm for this new setting. To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.
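The abstract keeps the preference model general. As a concrete, swapped-in illustration, the sketch below makes several assumptions. Preferences follow a Bradley-Terry (logistic) link on linear trajectory returns. A human overseer is simulated from known weights. Those weights are then recovered by maximum likelihood. The feature vectors, weights and function names are hypothetical. The paper's algorithm uses value-targeted regression and optimistic planning, and this sketch does not implement either.

```python
import math
import random


def trajectory_return(weights, features):
    """Assumed linear return R(tau) = <w, phi(tau)> for a feature vector phi."""
    return sum(w * f for w, f in zip(weights, features))


def preference_prob(weights, phi_a, phi_b):
    """P(tau_a preferred over tau_b) under an assumed Bradley-Terry link."""
    diff = trajectory_return(weights, phi_a) - trajectory_return(weights, phi_b)
    return 1.0 / (1.0 + math.exp(-diff))


def simulate_preferences(true_weights, n_pairs, rng):
    """Hypothetical overseer: labels y = 1 when tau_a is preferred."""
    dim = len(true_weights)
    data = []
    for _ in range(n_pairs):
        phi_a = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        phi_b = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        p = preference_prob(true_weights, phi_a, phi_b)
        data.append((phi_a, phi_b, 1 if rng.random() < p else 0))
    return data


def fit_preferences(data, dim, lr=1.0, epochs=300):
    """Maximum-likelihood weights via full-batch gradient ascent on the
    Bradley-Terry log-likelihood (a logistic regression on phi_a - phi_b)."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi_a, phi_b, y in data:
            p = preference_prob(w, phi_a, phi_b)
            for i in range(dim):
                grad[i] += (y - p) * (phi_a[i] - phi_b[i])
        w = [wi + lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


rng = random.Random(0)
data = simulate_preferences([2.0, -1.0], n_pairs=500, rng=rng)
print(fit_preferences(data, dim=2))  # should land near [2.0, -1.0]
```

Only comparisons, not numeric rewards, enter the likelihood. The reward scale is identified here only because of the assumed logistic link.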

Preprint, 2022, arXiv

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state spaces, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving NEs based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality which recovers a sublinear rate without the assumption of uniform coverage of the dataset. We also prove an information-theoretic lower bound, which suggests that the data-dependent term in the upper bound is intrinsic. Our theoretical results also highlight a notion of "relative uncertainty", which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.
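The PMVI step described above (pessimistic estimates for both players, then an equilibrium solve on each) can be sketched for a single state where each player has two actions. Several assumptions apply. The Q-estimates and uncertainty bonuses are given as hypothetical 2x2 tables; in the paper they come from the offline dataset and function approximation. Values are clipped to [0, H]. The matrix game is solved in closed form, which only works for 2x2 games; larger games would need a linear program. The max player plans against the lower estimate and the min player against the upper estimate, so each is pessimistic from its own side.

```python
def solve_2x2_zero_sum(A):
    """Nash equilibrium of a 2x2 zero-sum game where the row player maximizes A.

    Returns (row_probs, col_probs, value).
    """
    (a, b), (c, d) = A
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Pure saddle point: the maximin row and the minimax column.
        i, j = row_mins.index(maximin), col_maxs.index(minimax)
        row = [1.0, 0.0] if i == 0 else [0.0, 1.0]
        col = [1.0, 0.0] if j == 0 else [0.0, 1.0]
        return row, col, maximin
    # No saddle point: the unique mixed equilibrium makes each side indifferent.
    denom = a - b - c + d
    p = (d - c) / denom  # probability of row 0
    q = (d - b) / denom  # probability of column 0
    return [p, 1 - p], [q, 1 - q], (a * d - b * c) / denom


def pessimistic_policies(q_hat, bonus, horizon):
    """One-state PMVI-style step: the max player uses Q - bonus, the min
    player uses Q + bonus, both clipped to [0, horizon]."""
    def clip(x):
        return max(0.0, min(float(horizon), x))

    q_low = [[clip(q_hat[i][j] - bonus[i][j]) for j in range(2)] for i in range(2)]
    q_up = [[clip(q_hat[i][j] + bonus[i][j]) for j in range(2)] for i in range(2)]
    max_policy, _, v_low = solve_2x2_zero_sum(q_low)
    _, min_policy, v_up = solve_2x2_zero_sum(q_up)
    return max_policy, min_policy, v_low, v_up


# Hypothetical estimates: only the (0, 0) entry is uncertain.
print(pessimistic_policies([[3, 1], [1, 3]], [[1, 0], [0, 0]], horizon=5))
```

The two estimates bracket the value: v_low ≤ v_up whenever the bonuses are nonnegative. The gap between them reflects how poorly the data cover the relevant action pairs.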
