Source author record

Dong Zheng

Dong Zheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cond-mat.str-el, Information Retrieval, Information Theory, Machine Learning, math.IT, astro-ph.HE, cond-mat.mtrl-sci, cond-mat.quant-gas, hep-ph, quant-ph

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators

preprint, 2024, arXiv

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users sequentially interact with the system and provide complex and multi-faceted responses, including watch time and various types of interactions with multiple videos. On the one hand, the platforms aim to optimize the users' cumulative watch time (main goal) in the long term, which can be effectively optimized by Reinforcement Learning. On the other hand, the platforms also need to satisfy the constraint of accommodating the responses of multiple user interactions (auxiliary goals) such as like, follow, share, etc. In this paper, we formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP). We find that traditional constrained reinforcement learning algorithms cannot work well in this setting. We propose a novel two-stage constrained actor-critic method: At stage one, we learn individual policies to optimize each auxiliary signal. At stage two, we learn a policy to (i) optimize the main signal and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive offline evaluations, we demonstrate the effectiveness of our method over alternatives in both optimizing the main goal and balancing the others. We further show the advantage of our method in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of both watch time and interactions. Our approach has been fully launched in the production system to optimize user experiences on the platform.
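The stage-two idea (optimize the main signal while staying close to the stage-one policies) can be illustrated with a toy, single-state example. The sketch below is not the authors' algorithm: it assumes a discrete action set, known main-signal advantages, and a KL penalty toward each auxiliary policy, all of which are illustrative choices. Under those assumptions, maximizing the expected advantage minus the weighted KL penalties has a closed-form solution, which the hypothetical function `stage_two_policy` computes.

```python
import numpy as np

def stage_two_policy(main_advantage, aux_policies, lambdas):
    """Toy stage-two policy for a single state with discrete actions.

    Maximizes  sum_a pi(a) * A(a)  -  sum_i lambda_i * KL(pi || pi_i)
    over probability vectors pi. The maximizer is
        pi(a)  proportional to  exp(A(a) / L) * prod_i pi_i(a) ** (lambda_i / L),
    with L = sum_i lambda_i. All names here are illustrative assumptions.
    """
    main_advantage = np.asarray(main_advantage, dtype=float)   # shape (n_actions,)
    aux_policies = np.asarray(aux_policies, dtype=float)       # shape (m, n_actions), entries > 0
    lambdas = np.asarray(lambdas, dtype=float)                 # shape (m,), entries > 0
    total = lambdas.sum()
    # Log of the unnormalized closed-form solution.
    logits = (main_advantage + lambdas @ np.log(aux_policies)) / total
    logits -= logits.max()                                     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Two hypothetical auxiliary policies (e.g. tuned for "like" and "share").
aux = [[0.6, 0.3, 0.1],
       [0.2, 0.5, 0.3]]
# The main-signal advantage favors the third action.
policy = stage_two_policy([0.0, 0.0, 1.0], aux, lambdas=[1.0, 1.0])
print(policy)
```

Larger `lambdas` pull the result toward the auxiliary policies, while smaller values let the main-signal advantage dominate.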

preprint, 2022, arXiv

Constrained Reinforcement Learning for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users provide complex and multi-faceted responses towards recommendations, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that concern a single objective are not adequate to meet this new demand of optimizing comprehensive user experiences. In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in the long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos. To solve the constrained MDP, we propose a two-stage reinforcement learning approach based on the actor-critic framework. At stage one, we learn individual policies to optimize each auxiliary response. At stage two, we learn a policy to (i) optimize the main response and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive simulations, we demonstrate the effectiveness of our approach over alternatives in both optimizing the main goal and balancing the others. We further show the advantage of our approach in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of watch time and interactions from video views. Our approach has been fully launched in the production system to optimize user experiences on the platform.
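For orientation, a constrained MDP of the kind described in this abstract is commonly written in the generic textbook form below. The symbols are assumptions for illustration and are not taken from the paper: $r^{\text{main}}_t$ stands for the watch-time reward, $r^{(i)}_t$ for the $i$-th auxiliary response, $C_i$ for its required level, and $\gamma$ for the discount factor.

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r^{\text{main}}_{t}\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r^{(i)}_{t}\Big] \ge C_i,
\qquad i = 1, \dots, m .
```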

preprint, 2021, arXiv

X-ray and Radio Studies of the candidate Millisecond Pulsar Binary 4FGL J0935.3+0901

4FGL J0935.3+0901, a $\gamma$-ray source recently identified as a candidate redback-type millisecond pulsar (MSP) binary, shows an interesting feature of having double-peaked emission lines in its optical spectrum. The feature would further suggest the source as a transitional MSP system in the sub-luminous disk state. We have observed the source with XMM-Newton and the Five-hundred-meter Aperture Spherical radio Telescope (FAST) at X-ray and radio frequencies, respectively, for further studies. From the X-ray observation, a bimodal count-rate distribution, which is a distinctive feature of the transitional MSP systems, is not detected, while the properties of X-ray variability and power-law spectrum are determined for the source. These results help establish the consistency of it being a redback in the radio pulsar state. However, no radio pulsation signals are found in the FAST observation, resulting in an upper limit on the flux density of $\sim 4\,\mu$Jy. Implications of these results are discussed.

preprint, 2013, arXiv

Pomeranchuk cooling of the SU($2N$) ultra-cold fermions in optical lattices

We investigate the thermodynamic properties of a half-filled SU($2N$) Fermi-Hubbard model in the two-dimensional square lattice using the determinantal quantum Monte Carlo simulation, which is free of the fermion "sign problem". The large number of hyperfine-spin components enhances spin fluctuations, which facilitates the Pomeranchuk cooling to temperatures comparable to the superexchange energy scale in the case of SU(6). Various quantities including entropy, charge fluctuation, and spin correlations have been calculated.

preprint, 2011, arXiv

Continuous quantum phase transition between two topologically distinct valence bond solid states associated with the same spin value

We propose a simple one-dimensional spin-2 Hamiltonian, which exhibits two topologically distinct valence bond solid states in different exactly solvable limits. We then construct the phase diagram and study the quantum phase transition between these states using the infinite time-evolving block decimation (iTEBD) algorithm. From the scaling relation between the entanglement entropy and correlation length, we find that the central charge for the underlying critical conformal field theory is $c=2$.
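The scaling relation mentioned in this abstract is usually taken to be the standard conformal field theory result for a half-infinite cut of an infinite chain near criticality, $S \simeq \frac{c}{6}\ln\xi + \text{const}$, so the central charge is six times the slope of entropy against $\ln\xi$. The sketch below shows that fit. The function name is hypothetical, and the data are synthetic, generated with $c = 2$ only to check that the fit recovers the input value; they are not results from the paper.

```python
import numpy as np

def central_charge_from_scaling(xi, entropy):
    """Estimate c from S = (c / 6) * ln(xi) + const by a linear least-squares fit."""
    slope, _intercept = np.polyfit(np.log(xi), entropy, 1)
    return 6.0 * slope

# Synthetic data (NOT from the paper): built with c = 2 and an arbitrary offset.
xi = np.array([10.0, 20.0, 40.0, 80.0])
entropy = (2.0 / 6.0) * np.log(xi) + 0.7

c_estimate = central_charge_from_scaling(xi, entropy)
print(c_estimate)
```

In practice each pair of correlation length and entropy would come from a simulation at a different bond dimension, and the fit would carry an error bar.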

preprint, 2011, arXiv

Particle-hole symmetry and interaction effects in the Kane-Mele-Hubbard model

We prove that the Kane-Mele-Hubbard model with purely imaginary next-nearest-neighbor hoppings has a particle-hole symmetry at half-filling. Such a symmetry has interesting consequences including the absence of charge and spin currents along open edges, and the absence of the sign problem in the determinant quantum Monte Carlo simulations. Consequently, the interplay between band topology and strong correlations can be studied with high numerical precision. The process by which the topological band insulator evolves into the antiferromagnetic Mott insulator with increasing interaction strength is studied by calculating both the bulk and edge electronic properties. In agreement with previous theoretical analyses, the numerical simulations show that the Kane-Mele-Hubbard model exhibits three phases with increasing correlation effects: the topological band insulating phase with stable helical edges, the bulk paramagnetic phase with unstable edges, and the bulk antiferromagnetic phase.

preprint, 2008, arXiv

Distributed Opportunistic Scheduling For Ad-Hoc Communications Under Noisy Channel Estimation

Distributed opportunistic scheduling is studied for wireless ad-hoc networks, where many links contend for one channel using random access. In such networks, distributed opportunistic scheduling (DOS) involves a process of joint channel probing and distributed scheduling. It has been shown that under perfect channel estimation, the optimal DOS for maximizing the network throughput is a pure threshold policy. In this paper, this formalism is generalized to explore DOS under noisy channel estimation, where the transmission rate needs to be backed off from the estimated rate to reduce the outage. It is shown that the optimal scheduling policy remains threshold-based, and that the rate threshold is a function of the variance of the estimation error and a functional of the backoff rate function. Since the optimal backoff rate is intractable, a suboptimal linear backoff scheme is proposed that backs off the estimated signal-to-noise ratio (SNR) and hence the rate. The corresponding optimal backoff ratio and rate threshold can be obtained via an iterative algorithm. Finally, simulation results are provided to illustrate the tradeoff caused by increasing training time to improve channel estimation at the cost of probing efficiency.
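The decision rule described in this abstract can be sketched as follows: back off the estimated SNR by a linear factor, compute the resulting rate, and transmit only if that rate reaches the threshold. This is a minimal illustration, not the paper's algorithm. The function names, the Shannon-rate mapping from SNR to rate, and the parameter values are assumptions, and the backoff ratio and threshold are taken as given rather than computed by the iterative procedure the abstract mentions.

```python
import math

def backed_off_rate(snr_estimate, backoff_ratio):
    """Rate in bits/s/Hz at the linearly backed-off SNR (Shannon formula assumed)."""
    if not 0.0 < backoff_ratio <= 1.0:
        raise ValueError("backoff_ratio must lie in (0, 1]")
    return math.log2(1.0 + backoff_ratio * snr_estimate)

def should_transmit(snr_estimate, backoff_ratio, rate_threshold):
    """Pure threshold policy: transmit if the backed-off rate reaches the threshold.

    A link that returns False would give up the channel so that the links
    can contend and probe again.
    """
    return backed_off_rate(snr_estimate, backoff_ratio) >= rate_threshold

# Hypothetical numbers: estimated SNR of 3 (linear scale), threshold of 2 bits/s/Hz.
print(should_transmit(3.0, backoff_ratio=1.0, rate_threshold=2.0))
print(should_transmit(3.0, backoff_ratio=0.5, rate_threshold=2.0))
```

A smaller backoff ratio lowers the outage risk from estimation error but also makes it less likely that a probed link clears the threshold.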

preprint, 2008, arXiv

Distributed Opportunistic Scheduling for MIMO Ad-Hoc Networks

Distributed opportunistic scheduling (DOS) protocols are proposed for multiple-input multiple-output (MIMO) ad-hoc networks with contention-based medium access. The proposed scheduling protocols distinguish themselves from other existing works by their explicit design for system throughput improvement through exploiting spatial multiplexing and diversity in a distributed manner. As a result, multiple links can be scheduled to simultaneously transmit over the spatial channels formed by transmit/receive antennas. Taking into account the tradeoff between feedback requirements and system throughput, we propose and compare protocols with different levels of feedback information. Furthermore, in contrast to the conventional random access protocols that ignore the physical channel conditions of contending links, the proposed protocols implement a pure threshold policy derived from optimal stopping theory, i.e., only links with threshold-exceeding channel conditions are allowed for data transmission. Simulation results confirm that the proposed protocols can achieve impressive throughput performance by exploiting spatial multiplexing and diversity.
