Researcher profile

Yifei Ma

Yifei Ma contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified | Verification L1 | Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

preprint | 2026 | arXiv

General in situ feedback control of cascaded liquid crystal spatial light modulators for structured field generation

Cascaded liquid crystal spatial light modulators provide a versatile strategy for the generation of structured light and matter fields, with applications including optical communications, photonic computing, and topological field engineering. However, experimental imperfections, such as temperature-dependent liquid crystal response, variations between individual pixels, and alignment errors, present significant engineering challenges in generating high-quality fields. Moreover, changes in experimental conditions over time mean that calibrating each component once is insufficient for maintaining long-term, high-quality field generation. To address this, we present a general engineering approach based on a bespoke, physically informed, and manifold-constrained gradient-descent scheme that enables in situ feedback control, compensating for such errors in real time without the need to alter the experimental setup. We further demonstrate the correction efficacy of our proposed strategy through experiments in both spatially varying light and matter field generation, including scenarios in which complex vectorial aberrations are artificially introduced into the setup. Together, these demonstrations underscore the practicality of our method and its suitability for deployment in real-world experimental environments, paving the way for robust operation of cascaded architectures for structured field generation.

preprint | 2026 | arXiv

Resolving topological obstructions to vectorial structured field control

The use of structured matter, such as optical retarders, for vectorial control is a well-established and widely employed technique in modern optics, and has driven continued advances in the manipulation of complex, spatially varying vectorial fields. However, achieving arbitrary field conversion typically requires the use of cascaded elements, as intrinsic physical and fabrication constraints fundamentally limit individual devices to a restricted subset of transformations. This results in an overall continuous transformation potentially failing to be continuous at the level of the parameters of the cascade, leading to detrimental engineering consequences such as the introduction of complex, discontinuous aberrations that disrupt important topological properties of the underlying matter field. In this work, we establish a novel mathematical framework for analyzing the topological difficulties that emerge in the decomposition of an overall transformation into individual layers, and for determining the minimal depth required to overcome them. The strategy introduced provides a general pathway for optimizing designs for vectorial field control and matter field generation, with particular significance for the manipulation of topological phases in optical polarization fields, such as Stokes skyrmions, where continuity is of vital importance.

preprint | 2022 | arXiv

Context Uncertainty in Contextual Bandits with Applications to Recommender Systems

Recurrent neural networks have proven effective in modeling sequential user feedback for recommender systems. However, they usually focus solely on item relevance and fail to explore diverse items effectively, which harms system performance in the long run. To address this problem, we propose a new type of recurrent neural network, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space. REN tries to balance relevance and exploration while taking into account the uncertainty in the representations. Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there is uncertainty in the learned representations. Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.
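To make the relevance-versus-exploration trade-off concrete, here is a minimal Python sketch of an upper-confidence-style item score. It is not the REN architecture from the paper: the function name `select_item`, the exact form of the score, and the weights `lam_d` and `lam_u` are all assumptions chosen for illustration. Each item gets a relevance term, a bonus for directions of the latent space that have been tried rarely, and a bonus for uncertainty in the item's own representation.

```python
import numpy as np

def select_item(mu, sigma, theta, A_inv, lam_d=1.0, lam_u=1.0):
    """Pick the item with the highest optimistic score (illustrative only).

    mu:    (K, d) mean latent representation of each candidate item
    sigma: (K, d) per-dimension uncertainty of each representation
    theta: (d,)   current user/context vector
    A_inv: (d, d) inverse of the accumulated design matrix
    lam_d, lam_u: hypothetical exploration weights
    """
    relevance = mu @ theta
    # Classic linear-bandit bonus sqrt(mu_k^T A^{-1} mu_k), one value per item.
    direction_bonus = np.sqrt(np.einsum("kd,de,ke->k", mu, A_inv, mu))
    # Extra bonus for items whose representation is itself uncertain.
    representation_bonus = np.abs(sigma).max(axis=1)
    scores = relevance + lam_d * direction_bonus + lam_u * representation_bonus
    return int(np.argmax(scores)), scores

# Toy data: item 0 is the most relevant, item 2 has the most uncertain embedding.
mu = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 0.5]])
sigma = np.array([[0.0, 0.0], [0.0, 0.0], [2.0, 0.1]])
theta = np.array([1.0, 0.0])
A_inv = np.eye(2)

greedy_choice, _ = select_item(mu, sigma, theta, A_inv, lam_d=0.0, lam_u=0.0)
exploratory_choice, _ = select_item(mu, sigma, theta, A_inv, lam_d=0.0, lam_u=1.0)
```

With both weights at zero the rule is purely greedy and picks the most relevant item; turning on the representation-uncertainty weight shifts the choice toward the item whose embedding is least certain.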

preprint2020arXiv

Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE), the problem of evaluating a new policy using historical data obtained by different behavior policies, under the model of nonstationary episodic Markov Decision Processes (MDP) with a long horizon and a large action space. Existing importance sampling (IS) methods often suffer from large variance that depends exponentially on the RL horizon $H$. To solve this problem, we consider a marginalized importance sampling (MIS) estimator that recursively estimates the state marginal distribution for the target policy at every step. MIS achieves a mean-squared error of $$ \frac{1}{n} \sum\nolimits_{t=1}^H\mathbb{E}_\mu\left[\frac{d_t^\pi(s_t)^2}{d_t^\mu(s_t)^2} \mathrm{Var}_\mu\left[\frac{\pi_t(a_t|s_t)}{\mu_t(a_t|s_t)}\big( V_{t+1}^\pi(s_{t+1}) + r_t\big) \middle| s_t\right]\right] + \tilde{O}(n^{-1.5}) $$ where $\mu$ and $\pi$ are the logging and target policies, $d_t^\mu(s_t)$ and $d_t^\pi(s_t)$ are the marginal distributions of the state at step $t$, $H$ is the horizon, $n$ is the sample size, and $V_{t+1}^\pi$ is the value function of the MDP under $\pi$. The result matches the Cramér-Rao lower bound in Jiang and Li (2016) up to a multiplicative factor of $H$. To the best of our knowledge, this is the first OPE estimation error bound with a polynomial dependence on $H$. Besides theory, we show empirical superiority of our method in time-varying, partially observable, and long-horizon RL environments.
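The recursive idea in the abstract can be sketched for a small tabular MDP. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mis_estimate`, the array layout (`states`, `actions`, `rewards` of shape `(n, H)`, policies of shape `(H, S, A)`), and the toy simulation are all hypothetical. At each step it reweights only the current action by the one-step ratio of target to logging policy, estimates the reward and transition under the target policy per state, and pushes the estimated state marginal forward one step.

```python
import numpy as np

def mis_estimate(states, actions, rewards, pi, mu, n_states):
    """Tabular marginalized importance sampling sketch (illustrative only)."""
    n, H = states.shape
    # Initial state marginal: the empirical distribution of first states.
    d = np.bincount(states[:, 0], minlength=n_states) / n
    value = 0.0
    for t in range(H):
        s, a = states[:, t], actions[:, t]
        rho = pi[t, s, a] / mu[t, s, a]          # one-step importance ratio
        counts = np.bincount(s, minlength=n_states)
        visited = counts > 0
        # Importance-weighted mean reward per visited state.
        r_sum = np.bincount(s, weights=rho * rewards[:, t], minlength=n_states)
        r_hat = np.zeros(n_states)
        r_hat[visited] = r_sum[visited] / counts[visited]
        value += d @ r_hat
        if t + 1 < H:
            # Importance-weighted transition estimate under the target policy.
            P_hat = np.zeros((n_states, n_states))
            np.add.at(P_hat, (s, states[:, t + 1]), rho)
            P_hat[visited] /= counts[visited][:, None]
            d = d @ P_hat                         # recursive marginal update
    return value

# Toy logged data from a random MDP (all quantities here are made up).
rng = np.random.default_rng(0)
n, H, S, A = 500, 5, 3, 2
mu_policy = rng.dirichlet(np.ones(A), size=(H, S))    # logging policy
pi_policy = rng.dirichlet(np.ones(A), size=(H, S))    # target policy
P_true = rng.dirichlet(np.ones(S), size=(S, A))       # transition kernel
states = np.zeros((n, H), dtype=int)
actions = np.zeros((n, H), dtype=int)
rewards = np.zeros((n, H))
for i in range(n):
    s = rng.integers(S)
    for t in range(H):
        a = rng.choice(A, p=mu_policy[t, s])
        states[i, t], actions[i, t] = s, a
        rewards[i, t] = float(s == a)
        s = rng.choice(S, p=P_true[s, a])

off_policy_value = mis_estimate(states, actions, rewards, pi_policy, mu_policy, S)
on_policy_value = mis_estimate(states, actions, rewards, mu_policy, mu_policy, S)
empirical_return = rewards.sum(axis=1).mean()
```

A useful sanity check on the sketch: when the target policy equals the logging policy, every ratio is 1, the estimated marginals coincide with the empirical state frequencies, and `on_policy_value` reduces to the average logged return `empirical_return`. Because each step uses only a one-step ratio rather than a product of ratios over the whole trajectory, the weights do not compound with the horizon.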