Researcher profile

Yu Meng

Yu Meng contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
9 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innovation is Hint-$δ$, an intrinsic reward that quantifies the predictive shift between a Generator model's unassisted response and its response conditioned on a self-generated hint. Using this signal, a Proposer model is trained via GRPO to continuously target the Generator's blind spots by synthesizing challenging queries and informative hints. The Generator is concurrently optimized via DPO to internalize these hint-guided improvements. Theoretically, we prove a best-iterate suboptimality guarantee for an idealized standard-DPO version of G-Zero, provided that the Proposer induces sufficient exploration coverage and the data filtering keeps pseudo-label score noise low. By deriving supervision entirely from internal distributional dynamics, G-Zero bypasses the capability ceilings of external judges, providing a scalable, robust pathway for continuous LLM self-evolution across unverifiable domains.

preprint · 2023 · arXiv

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
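The abstract does not spell out how the ensemble ranking fuses the three context signals. As a rough illustration only, one common way to combine several ranked term lists is mean reciprocal rank; the sketch below assumes that choice. The function name `ensemble_rank` and the example term lists are hypothetical and are not taken from SeedTopicMine.

```python
from collections import defaultdict


def ensemble_rank(rankings, top_k=None):
    """Fuse several ranked term lists by mean reciprocal rank (an assumed fusion rule)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for position, term in enumerate(ranking, start=1):
            # A term absent from a list simply contributes nothing for that list.
            scores[term] += 1.0 / position
    n_lists = len(rankings)
    # Sort by descending mean reciprocal rank, breaking ties alphabetically.
    fused = sorted(scores, key=lambda t: (-scores[t] / n_lists, t))
    return fused[:top_k] if top_k is not None else fused


# Hypothetical candidate lists for one seed, one list per context type.
embedding_rank = ["football", "tennis", "league", "coach"]
plm_rank = ["tennis", "football", "athlete", "league"]
sentence_rank = ["football", "league", "tennis", "stadium"]

fused = ensemble_rank([embedding_rank, plm_rank, sentence_rank], top_k=3)
print(fused)
```

Terms ranked highly by several context types rise to the top, while a term favored by only one list is pushed down, which is the complementary behavior the abstract describes.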

preprint · 2022 · arXiv

Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation

We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type. Recently, prompt-based tuning has demonstrated superior performance to standard fine-tuning in few-shot scenarios by formulating the entity type classification task as a "fill-in-the-blank" problem. This allows effective utilization of the strong language modeling capability of Pre-trained Language Models (PLMs). Despite the success of current prompt-based tuning approaches, two major challenges remain: (1) the verbalizer in prompts is either manually designed or constructed from external knowledge bases, without considering the target corpus and label hierarchy information, and (2) current approaches mainly utilize the representation power of PLMs, but have not explored their generation power acquired through extensive general-domain pre-training. In this work, we propose a novel framework for few-shot FET consisting of two modules: (1) an entity type label interpretation module automatically learns to relate type labels to the vocabulary by jointly leveraging few-shot instances and the label hierarchy, and (2) a type-based contextualized instance generator produces new instances based on given instances to enlarge the training set for better generalization. On three benchmark datasets, our model outperforms existing methods by significant margins. Code can be found at https://github.com/teapot123/Fine-Grained-Entity-Typing.

preprint · 2022 · arXiv

Lattice calculation of $χ_{c0} \rightarrow 2γ$ decay width

We perform a lattice QCD calculation of the $χ_{c0} \rightarrow 2γ$ decay width using a model-independent method which does not require a momentum extrapolation of the corresponding off-shell form factors. The simulation is performed on ensembles of $N_f=2$ twisted mass lattice QCD gauge configurations with three different lattice spacings. After a continuum extrapolation, the decay width is obtained to be $Γ_{γγ}(χ_{c0})=3.65(83)_{\mathrm{stat}}(21)_{\mathrm{lat.syst}}(66)_{\mathrm{syst}}\, \textrm{keV}$. Despite the large statistical error, our result is compatible with the experimental results within 1.3$σ$. Potential improvements of the lattice calculation in the future are also discussed.
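To read the quoted uncertainty budget as a single number, the three errors can be combined. The sketch below assumes they are independent and adds them in quadrature, which is a common convention but is not stated in the abstract; the helper name `combine_in_quadrature` is hypothetical.

```python
import math


def combine_in_quadrature(*errors):
    """Combine independent uncertainties as the square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))


# Errors quoted in the abstract, in keV: statistical, lattice systematic, other systematic.
total = combine_in_quadrature(0.83, 0.21, 0.66)
print(f"total uncertainty ~ {total:.2f} keV")
```

Under that independence assumption the total is about 1.08 keV on a central value of 3.65 keV, with the statistical error as the largest single contribution.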

preprint · 2022 · arXiv

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

We present a new framework, AMOS, that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Unlike ELECTRA, which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.

preprint · 2022 · arXiv

Reflective Dielectric Cavity Enhanced Emission from Hexagonal Boron Nitride Spin Defect Arrays

Among the various kinds of spin defects in hBN, the negatively charged boron vacancy ($\rm V_B^-$) spin defect that can be deterministically generated is undoubtedly a potential candidate for quantum sensing, but its low quantum efficiency restricts its use in practical applications. Here, we demonstrate a robust enhancement structure with advantages including easy on-chip integration, convenient processing, low cost and suitable broad-spectrum enhancement for $\rm V_B^-$ defects. In the experiment, we used a metal reflective layer under the hBN flakes, filled with a transition dielectric layer in the middle, and adjusted the thickness of the dielectric layer to achieve the best coupling between the reflective dielectric cavity and the hBN spin defect. Using a reflective dielectric cavity, we achieved a photoluminescence (PL) enhancement of approximately 7-fold, and the corresponding optically detected magnetic resonance (ODMR) contrast reached 18%. Additionally, the oxide layer of the reflective dielectric cavity can be used as an integrated material for micro-nano photonic devices for secondary processing, which means that it can be combined with other enhancement structures to achieve stronger enhancement. This work has guiding significance for realizing the on-chip integration of spin defects in two-dimensional materials.

preprint · 2022 · arXiv

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations including the inability to model word ordering information in documents, the difficulty of incorporating external linguistic knowledge, and the lack of both accurate and efficient inference methods for approximating the intractable posterior. Recently, pretrained language models (PLMs) have brought astonishing performance improvements to a wide variety of tasks due to their superior representations of text. Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models. In this paper, we begin by analyzing the challenges of using PLM representations for topic discovery, and then propose a joint latent space learning and clustering framework built upon PLM embeddings. In the latent space, topic-word and document-topic distributions are jointly modeled so that the discovered topics can be interpreted by coherent and distinctive terms and meanwhile serve as meaningful summaries of the documents. Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery, and is conceptually simpler than topic models. On two benchmark datasets in different domains, our model generates significantly more coherent and diverse topics than strong topic models, and offers better topic-wise document representations, based on both automatic and human evaluations.

preprint · 2020 · arXiv

A Lattice Study of the Two-photon Decay Widths for Scalar and Pseudo-scalar Charmonium

In this exploratory study, two-photon decay widths of pseudo-scalar ($η_c$) and scalar ($χ_{c0}$) charmonium are computed using two ensembles of $N_f=2$ twisted mass lattice QCD gauge configurations. The simulation is performed on two lattice ensembles, with lattice spacings $a=0.067$ fm with size $32^3\times{64}$ and $a=0.085$ fm with size $24^3\times{48}$, respectively. The resulting decay widths for the two charmonia are in the right ballpark but smaller than the experimental ones. Possible reasons for these discrepancies are discussed.

preprint · 2020 · arXiv

Discriminative Topic Mining via Category-Name Guided Text Embedding

Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generates topics that do not fit users' particular needs and yields suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.

preprint · 2020 · arXiv

Experimental observation of coherent-information superadditivity in a dephrasure channel

We present an experimental approach to construct a dephrasure channel, which contains both dephasing and erasure noises, and can be used as an efficient tool to study the superadditivity of coherent information. By using a three-fold dephrasure channel, the superadditivity of coherent information is observed, and a substantial gap is found between the zero single-letter coherent information and zero quantum capacity. In particular, we find that when the coherent information of n channel uses is zero, it becomes positive for a larger number of channel uses. These phenomena exhibit a more obvious superadditivity of coherent information than previous works, and demonstrate a higher threshold for non-zero quantum capacity. Such novel channels built in our experiment can also provide a useful platform to study the non-additive properties of coherent information and quantum channel capacity.

preprint · 2020 · arXiv

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of the user's interest, and then performs multiple sets co-expansion that extracts discriminative features by comparing the target set with auxiliary sets, to form multiple cohesive sets that are distinctive from one another, thus resolving the semantic drift issue. In this paper we demonstrate that by generating auxiliary sets, we can guide the expansion process of the target set to avoid touching those ambiguous areas around the border with auxiliary sets, and we show that Set-CoExpan outperforms strong baseline methods significantly.

preprint · 2020 · arXiv

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help a user comprehend his/her topics of interest. We develop a novel joint tree and text embedding method along with a principled optimization procedure that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.

preprint · 2020 · arXiv

Three Photon Decay of $J/ψ$ from Lattice QCD

The three-photon decay rate of $J/ψ$ is studied using two $N_f=2$ twisted mass gauge ensembles with lattice spacings $a\simeq 0.085$ fm (I) and $0.067$ fm (II). Using a new method, only the correlation functions directly related to the physical decay width are computed with all polarizations of the initial and final states summed over. Our results for this rare decay on the two ensembles are: $\mathcal{B}_{I,II}(J/ψ\rightarrow 3γ)=(1.614 \pm 0.016 \pm 0.261)\times 10^{-5},(1.809 \pm 0.051 \pm 0.295)\times 10^{-5}$ where the first errors are statistical and the second are estimates from systematics. We also propose a method to analyze the Dalitz plot of the corresponding process based on the lattice data which can provide direct information for the experiments.

preprint · 2020 · arXiv

Ward Identity of the Vector Current and the Decay Rate of $η_c\rightarrowγγ$ in Lattice QCD

Using a recently proposed method arXiv:1910.11597 (Yu Meng et al.), we study the two-photon decay rate of $η_c$ using two $N_f=2$ twisted mass gauge ensembles with lattice spacings $0.067$ fm and $0.085$ fm. The results obtained from these two ensembles can be extrapolated in a naive fashion to the continuum limit, yielding a result that is consistent with the experimental one within two standard deviations. To be specific, we obtain the results for two-photon decay of $η_c$ as $\mathcal{B}(η_c\rightarrow 2γ)= 1.29(3)(18)\times 10^{-4}$ where the first error is statistical and the second is our estimate for the systematic error caused by the finite lattice spacing. It turns out that the Ward identity for the vector current is of vital importance within this new method. We find that the Ward identity is violated for the local current at finite lattice spacing, but it is restored after the continuum limit is taken.
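The abstract does not give the functional form of the naive continuum extrapolation. A minimal sketch, assuming the leading lattice artifacts scale as $a^2$ so that two ensembles determine a straight line in $a^2$, might look like the following. The function name `continuum_extrapolate` and the input values are hypothetical placeholders, not results from the paper.

```python
def continuum_extrapolate(a1, y1, a2, y2):
    """Extrapolate to a = 0 along a straight line in a^2 through two points.

    Assumes y(a) = y0 + c * a^2, an O(a^2) ansatz chosen for illustration.
    """
    x1, x2 = a1 ** 2, a2 ** 2
    slope = (y2 - y1) / (x2 - x1)
    # Intercept of the line at a^2 = 0.
    return y1 - slope * x1


# Hypothetical observable values at the two lattice spacings (in fm).
a_fine, a_coarse = 0.067, 0.085
y_fine, y_coarse = 1.10, 1.00

y_continuum = continuum_extrapolate(a_fine, y_fine, a_coarse, y_coarse)
print(f"continuum estimate: {y_continuum:.3f}")
```

With only two lattice spacings the line is fixed exactly and leaves no degrees of freedom to test the ansatz, which is one reason such an extrapolation is called naive.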