Source author record

Zhiyuan Zhou

Zhiyuan Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Machine Learning, physics.optics, quant-ph, cond-mat.mtrl-sci, physics.app-ph, Robotics

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

Preprint, 2025, arXiv

Learning Spatial-Aware Manipulation Ordering

Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context. Our architecture integrates a spatial context encoder with a temporal priority structuring module. We construct a spatial graph using k-Nearest Neighbors to aggregate geometric information from the local layout and encode both object-object and object-manipulator interactions to support accurate manipulation ordering in real time. To generate physically and semantically plausible supervision signals, we introduce a spatial prior labeling method that guides a vision-language model to produce reasonable manipulation orders for distillation. We evaluate OrderMind on our Manipulation Ordering Benchmark, comprising 163,222 samples of varying difficulty. Extensive experiments in both simulation and real-world environments demonstrate that our method significantly outperforms prior approaches in effectiveness and efficiency, enabling robust manipulation in cluttered scenes.
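
The k-Nearest Neighbors graph mentioned in the abstract can be illustrated with a minimal sketch. This is not OrderMind's implementation: the function name `knn_graph`, the use of object centroids as nodes, the Euclidean metric, and the sample coordinates are all assumptions made for illustration.

```python
import math

def knn_graph(points, k):
    """Return directed edges (i, j) from each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        # Keep only the k closest neighbors of point i.
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Hypothetical object centroids (x, y, z); not data from the paper.
centroids = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.3, 0.0), (1.0, 1.0, 0.0)]
edges = knn_graph(centroids, k=2)
print(edges)
```

Each node ends up with exactly k outgoing edges, so an encoder that aggregates over these edges sees only the local layout around each object.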

Preprint, 2022, arXiv

Characterizing the Action-Generalization Gap in Deep Q-Learning

We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to use knowledge learned from past experiences on new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply to discrete action outputs. And yet, surprisingly, our experiments indicate that Deep Q-Networks (DQN), which use exactly this type of function approximator, are still able to achieve modest action generalization. Our main contribution is twofold: first, we propose a method of evaluating action generalization using expert knowledge of action similarity, and empirically confirm that action generalization leads to faster learning; second, we characterize the action-generalization gap (the difference in learning performance between DQN and the expert) in different domains. We find that DQN can indeed generalize over actions in several simple domains, but that its ability to do so decreases as the action space grows larger.

Preprint, 2022, arXiv

Designing Rewards for Fast Learning

To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions induce the same optimal behavior (Ng et al., 1999), in practice, some of them result in faster learning than others. In this paper, we look at how reward-design choices impact learning speed and seek to identify principles of good reward design that quickly induce target behavior. This reward-identification problem is framed as an optimization problem. First, we advocate choosing state-based rewards that maximize the action gap, making optimal actions easy to distinguish from suboptimal ones. Second, we propose minimizing a measure of the horizon, something we call the "subjective discount", over which rewards need to be optimized to encourage agents to make optimal decisions with less lookahead. To solve this optimization problem, we propose a linear-programming-based algorithm that efficiently finds a reward function that maximizes action gap and minimizes subjective discount. We test the rewards generated with the algorithm in tabular environments with Q-Learning, and empirically show they lead to faster learning. Although we only focus on Q-Learning because it is perhaps the simplest and most well understood RL algorithm, preliminary results with R-max (Brafman and Tennenholtz, 2000) suggest our results are much more general. Our experiments support three principles of reward design: 1) Consistent with existing results, penalizing each step taken induces faster learning than rewarding the goal. 2) When rewarding subgoals along the target trajectory, rewards should gradually increase as the goal gets closer. 3) Dense reward that's nonzero on every state is only good if designed carefully.
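
The action-gap idea can be made concrete with a small sketch. The abstract does not give the paper's exact definition, so the version below assumes a common one: the smallest difference, over all states, between the best and second-best action value. The function name `action_gap` and the Q-value tables are hypothetical.

```python
def action_gap(q_values):
    """Smallest best-minus-second-best action value across all states."""
    gaps = []
    for values in q_values.values():
        ranked = sorted(values, reverse=True)
        gaps.append(ranked[0] - ranked[1])
    return min(gaps)

# Hypothetical optimal Q-values under two reward functions that induce
# the same greedy policy (action 0 is best in both states).
q_reward_a = {"s0": [1.0, 0.9], "s1": [0.5, 0.1]}
q_reward_b = {"s0": [1.0, 0.2], "s1": [0.5, 0.1]}

print(action_gap(q_reward_a), action_gap(q_reward_b))
```

Under this assumed definition, the second reward function has the larger gap, so a learner with noisy value estimates would find it easier to tell the optimal action apart from the alternatives.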

Preprint, 2021, arXiv

Magnon-mediated interlayer coupling in an all-antiferromagnetic junction

The interlayer coupling mediated by fermions in ferromagnets brings about parallel and anti-parallel magnetization orientations of two magnetic layers, resulting in giant magnetoresistance, which forms the foundation of spintronics and accelerates the development of information technology. However, interlayer coupling mediated by another kind of quasi-particle, the boson, is still lacking. Here we demonstrate such a static interlayer coupling at room temperature in an antiferromagnetic junction Fe2O3/Cr2O3/Fe2O3, where the two antiferromagnetic Fe2O3 layers are functional materials and the antiferromagnetic Cr2O3 layer serves as a spacer. The Néel vectors in the top and bottom Fe2O3 layers are strongly orthogonally coupled, and this coupling is bridged by a typical bosonic excitation (magnon) in the Cr2O3 spacer. Such orthogonal coupling goes beyond the category of traditional collinear interlayer coupling via fermions in the ground state, reflecting the fluctuating nature of the magnons, as supported by our magnon quantum well model. Besides its fundamental significance for quasi-particle-mediated interactions, the strong coupling in an antiferromagnetic magnon junction makes it a realistic candidate for practical antiferromagnetic spintronics and magnonics with ultrahigh-density integration.

Preprint, 2020, arXiv

Increasing two-photon entangled dimensions by shaping input beam profiles

Photon pairs entangled in the high-dimensional orbital angular momentum (OAM) degree of freedom (DOF) have been widely regarded as a possible source for improving the capacity of quantum information processing. The generation of a high-dimensional maximally entangled state in the OAM DOF is therefore much desired. In this work, we demonstrate a simple method to generate a broader and flatter OAM spectrum, i.e. a larger spiral bandwidth (SB), of entangled photon pairs generated through spontaneous parametric down-conversion by modifying the pump beam profile. Through both experimental and theoretical investigation, we have found that an exponential pump profile that is roughly the inverse of the mode profiles of the single-mode fibers used for OAM detection will provide a much larger SB when compared to a Gaussian-shaped pump.
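
To see why a flatter OAM spectrum corresponds to more usable entangled dimensions, the sketch below computes an effective dimensionality from a set of mode weights. The abstract does not say how the spiral bandwidth is quantified; the Schmidt-number formula K = 1 / sum(p_l^2) is used here as an assumed stand-in, and both spectra are made-up illustrative values, not measured data.

```python
def effective_dimension(weights):
    """Schmidt number K = 1 / sum(p^2) of a normalized mode spectrum."""
    total = sum(weights)
    probs = [w / total for w in weights]  # normalize so probabilities sum to 1
    return 1.0 / sum(p * p for p in probs)

# Illustrative coincidence weights for OAM modes l = -2..2 (hypothetical).
peaked_spectrum = [0.05, 0.2, 1.0, 0.2, 0.05]  # dominated by l = 0
flat_spectrum = [1.0, 1.0, 1.0, 1.0, 1.0]      # equal weight on every mode

print(effective_dimension(peaked_spectrum), effective_dimension(flat_spectrum))
```

A perfectly flat spectrum over five modes gives K equal to the number of modes, while the peaked spectrum gives a value close to 2, which is the sense in which flattening the spectrum increases the entangled dimension.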

Preprint, 2019, arXiv

A high-dimensional quantum frequency converter

In high-dimensional quantum communication networks, a quantum frequency converter (QFC) is indispensable as an interface in the frequency domain. For example, many QFCs have been built to link atomic memories and fiber channels. However, almost all QFCs work in a two-dimensional space. It is still a pivotal challenge to construct a high-quality QFC for some complex quantum states, e.g., a high-dimensional single-photon state, i.e., a qudit. Here, we propose for the first time a high-dimensional QFC for an orbital angular momentum qudit via sum frequency conversion with a flat-top beam pump. As a proof-of-principle demonstration, we realize quantum frequency conversions for a qudit from the infrared to the visible range. Based on qudit quantum state tomography, the fidelities of the converted state are 98.29(95.02)%, 97.42(91.74)%, and 86.75(67.04)% for a qudit without (with) dark counts in 2, 3, and 5 dimensions, respectively. The demonstration is very promising for constructing a high-capacity quantum communication network.