Researcher profile

Mingzhong Wang

Mingzhong Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
3 topics
4 close collaborators

Published work

3 published items

Preprint, 2026, arXiv

Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization

Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient knowledge acquisition by RL agents. However, OMRL still suffers from extrapolation errors due to out-of-distribution (OOD) actions, which are compounded by broad task distributions and Markov Decision Process (MDP) ambiguity in meta-RL setups. Existing research indicates that the generalization of the $Q$ network affects the extrapolation error in offline RL. This paper investigates this relationship by decomposing the $Q$ value into feature and weight components, observing that while decomposition enhances adaptability and convergence in the case of high-quality data, it often leads to policy degeneration or collapse in complex tasks. We observe that decomposed $Q$ values introduce a large estimation bias when the feature encounters OOD samples, a phenomenon we term "feature overgeneralization". To address this issue, we propose FLORA, which identifies OOD samples by modeling feature distributions and estimating their uncertainties. FLORA integrates a return feedback mechanism to adaptively adjust feature components. Furthermore, to learn precise task representations, FLORA explicitly models the complex task distribution using a chain of invertible transformations. We theoretically and empirically demonstrate that FLORA achieves rapid adaptation and meta-policy improvement compared to baselines across various environments.

Preprint, 2022, arXiv

SimSR: Simple Distance-based State Representation for Deep Reinforcement Learning

This work explores how to learn robust and generalizable state representations from image-based observations with deep reinforcement learning methods. Addressing the computational complexity, stringent assumptions and representation collapse challenges in existing work on bisimulation metrics, we devise the Simple State Representation (SimSR) operator. SimSR enables us to design a stochastic approximation method that can practically learn the mapping functions (encoders) from observations to the latent representation space. In addition to the theoretical analysis and comparison with existing work, we ran experiments comparing our work with recent state-of-the-art solutions on visual MuJoCo tasks. The results show that our model generally achieves better performance, with better robustness and good generalization.

Preprint, 2020, arXiv

Smart metastructure method for increasing TC of Bi(Pb)SrCaCuO high-temperature superconductors

Improving the critical transition temperature ($T_C$) of Bi(Pb)SrCaCuO (B(P)SCCO) high-temperature superconductors is important; however, considerable challenges exist. In this study, on the basis of the metamaterial structure and the idea that injecting energy will promote the formation of Cooper pairs, a smart meta-superconductor B(P)SCCO consisting of B(P)SCCO microparticles and Y2O3:Eu3+ + Ag or Y2O3:Eu3+ luminophor was designed. In an applied electric field, the Y2O3:Eu3+ + Ag or Y2O3:Eu3+ luminophor generates electroluminescence (EL), thereby raising $T_C$ via EL energy injection. A series of Y2O3:Eu3+ + Ag topological luminophor-doped B(P)SCCO samples was prepared. Results showed that Y2O3:Eu3+ + Ag was dispersed around B(P)SCCO particles, forming a metastructure. Accordingly, the onset transition temperature ($T_{C,\mathrm{on}}$) and zero-resistance transition temperature ($T_{C,0}$) of B(P)SCCO increased. Meanwhile, B(P)SCCO samples doped with 0.2 wt% Y2O3 or Y2O3:Sm3+ as a nonluminous inhomogeneous phase were also prepared to further confirm that the change in $T_C$ is due to EL rather than a rare-earth effect. Results indicated that the $T_C$ of the Y2O3 or Y2O3:Sm3+ doped samples decreased, whereas the $T_C$ of the 0.2 wt% Y2O3:Eu3+ + Ag or Y2O3:Eu3+ luminophor-doped samples improved. This outcome further demonstrated that the smart metastructure method can improve the $T_C$ of B(P)SCCO.