Researcher profile

Yu Lu

Yu Lu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
9 topics
4 close collaborators

Published work

14 published items

preprint · 2026 · arXiv

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

Recently, post-training methods based on reinforcement learning, with a particular focus on Group Relative Policy Optimization (GRPO), have emerged as a robust paradigm for further advancement of text-to-image (T2I) models. However, these methods are often prone to reward hacking, wherein models exploit biases in imperfect reward functions rather than yielding genuine performance gains. In this work, we identify that normalization can lead to miscalibration, and that directly removing the prompt-level standard deviation term yields an optimal policy ascent direction that is linear in the advantage but still limits the separation of genuine signals from noise. To mitigate the above issues, we propose Super-Linear Advantage Shaping (SLAS) by revisiting the functional update from an information geometry perspective. By extending the Fisher-Rao information metric with advantage-dependent weighting, SLAS introduces a non-linear geometric structure that reshapes the local policy space. This design relaxes constraints along high-advantage directions to amplify informative updates, while tightening those in low-advantage regions to suppress illusory gradients. In addition, batch-level normalization is applied to stabilize training under varying reward scales. Extensive evaluations demonstrate that SLAS consistently surpasses the DanceGRPO baseline across multiple backbones and benchmarks. In particular, it yields faster training dynamics, improved out-of-domain performance on GenEval and UniGenBench++, and enhanced robustness to model scaling, while mitigating reward hacking and preserving semantic and compositional fidelity in generations.
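The normalization choices named in this abstract can be illustrated with a minimal sketch. It shows only the standard group-relative advantage (reward minus the prompt-level mean, optionally divided by the prompt-level standard deviation) and a batch-level rescaling; it does not implement SLAS itself, whose weighting is not specified here. The function names `group_advantages` and `batch_normalize` and the `eps` constant are hypothetical, chosen for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, use_std=True, eps=1e-8):
    """Group-relative advantages for the samples drawn from one prompt.

    With use_std=False only the prompt-level mean is subtracted, which is
    the "remove the standard deviation term" variant.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not use_std:
        return centered
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


def batch_normalize(groups, eps=1e-8):
    """Mean-center each prompt group, then rescale by one batch-wide std.

    This is an assumed reading of "batch-level normalization", not the
    authors' confirmed procedure.
    """
    centered = [group_advantages(g, use_std=False) for g in groups]
    flat = [a for g in centered for a in g]
    sigma = pstdev(flat)
    return [[a / (sigma + eps) for a in g] for g in centered]


# Hypothetical rewards: one prompt with near-ties, one with a wide spread.
near_ties = [0.50, 0.51, 0.49, 0.50]
wide_spread = [0.1, 0.9, 0.2, 0.8]

per_prompt = [group_advantages(g) for g in (near_ties, wide_spread)]
per_batch = batch_normalize([near_ties, wide_spread])
```

With prompt-level scaling, the near-tie group is stretched to roughly the same magnitude as the widely spread group, which is one plausible way small reward noise can be amplified. With a single batch-wide scale, the near-tie group keeps small advantages.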

preprint · 2026 · arXiv

PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling

LLM-based client simulation has emerged as a promising tool for training novice counselors and evaluating automated counseling systems. However, existing client simulation approaches face three key challenges: (1) limited diversity and realism in client profiles, (2) the lack of a principled framework for modeling realistic client behaviors, and (3) a scarcity of resources for Chinese-language settings. To address these limitations, we propose PsyCLIENT, a novel simulation framework grounded in conversational trajectory modeling. By conditioning LLM generation on predefined real-world trajectories that incorporate explicit behavior labels and content constraints, our approach ensures diverse and realistic interactions. We further introduce PsyCLIENT-CP, the first open-source Chinese client profile dataset, covering 60 distinct counseling topics. Comprehensive evaluations involving licensed professional counselors demonstrate that PsyCLIENT significantly outperforms baselines in terms of authenticity and training effectiveness. Notably, the simulated clients are nearly indistinguishable from human clients, achieving an expert confusion rate of about 95% in discrimination tasks. These findings indicate that conversational trajectory modeling effectively bridges the gap between theoretical client profiles and dynamic, realistic simulations, offering a robust solution for mental health education and research. Code and data will be released to facilitate future research in mental health counseling.

preprint · 2022 · arXiv

Coupled Channel Effects for the Charmed-Strange Mesons

We make a systematic calculation of the spectra and hadronic decays of the $D_s$ system in a coupled channel framework, where the unquenched effects are induced by the $^3P_0$ model. In the calculation, the wave functions are obtained by using a nonrelativistic potential model and are handled precisely with the Gaussian Expansion Method. Even though the fitting mainly focuses on the spectrum, our model agrees well with the experiments on both the spectra and the hadronic decays, suggesting that the coupled channel effect could result in a reasonable and coherent description of the $D_s$ mesons. Based on the calculation, we give a detailed analysis of various aspects of the excited states, especially $D_s(2317), D_s(2460), D_s(2536), D_s(2860), D_s(3040)$. We also predict that $D_s(2{}^3P_0)$ should be a $D^*K^*$ dominant molecule with mass $2894$ MeV, which is only $5$ MeV below the $D^*K^*$ threshold.

preprint · 2022 · arXiv

CRIS: CLIP-Driven Referring Image Segmentation

Referring image segmentation aims to segment a referent via a natural linguistic expression. Due to the distinct data properties of text and image, it is challenging for a network to align text and pixel-level features well. Existing approaches use pretrained models to facilitate learning, yet separately transfer the language/vision knowledge from pretrained models, ignoring the multi-modal corresponding information. Inspired by the recent advance in Contrastive Language-Image Pretraining (CLIP), in this paper, we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). To transfer the multi-modal knowledge effectively, CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment. More specifically, we design a vision-language decoder to propagate fine-grained semantic information from textual representations to each pixel-level activation, which promotes consistency between the two modalities. In addition, we present text-to-pixel contrastive learning to explicitly enforce the text feature to be similar to the related pixel-level features and dissimilar to the irrelevant ones. The experimental results on three benchmark datasets demonstrate that our proposed framework significantly outperforms the state-of-the-art performance without any post-processing. The code will be released.
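The text-to-pixel contrastive idea described above can be sketched as follows. This is one common formulation, assumed here for illustration: the similarity between the text feature and each pixel feature is a dot product, squashed by a sigmoid, and compared to the referent mask with binary cross-entropy. The abstract does not state the exact loss, so `text_to_pixel_loss`, its arguments, and the `eps` constant are hypothetical and should not be read as the CRIS implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def text_to_pixel_loss(text_feat, pixel_feats, mask, eps=1e-12):
    """Mean binary cross-entropy between sigmoid(text . pixel) and the mask.

    text_feat:   list of floats, one embedding for the referring expression
    pixel_feats: list of lists of floats, one embedding per pixel
    mask:        list of 0/1 labels, 1 where the pixel belongs to the referent
    """
    total = 0.0
    for feat, label in zip(pixel_feats, mask):
        score = sum(t * p for t, p in zip(text_feat, feat))
        prob = sigmoid(score)
        # Pull referent pixels toward the text feature, push the rest away.
        total -= label * math.log(prob + eps)
        total -= (1 - label) * math.log(1.0 - prob + eps)
    return total / len(mask)


# Toy example: two pixels, one aligned with the text feature, one opposed.
text = [1.0, 0.0]
pixels = [[3.0, 0.0], [-3.0, 0.0]]
loss_matching = text_to_pixel_loss(text, pixels, mask=[1, 0])
loss_swapped = text_to_pixel_loss(text, pixels, mask=[0, 1])
```

In this toy case the loss is small when the mask marks the aligned pixel as the referent and large when the labels are swapped, which is the behavior the contrastive objective is meant to reward.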

preprint · 2022 · arXiv

Interpretable Fault Diagnosis of Rolling Element Bearings with Temporal Logic Neural Network

Machine learning-based methods have achieved successful applications in machinery fault diagnosis. However, the main limitation of these methods is that they operate as a black box and are generally not interpretable. This paper proposes a novel neural network structure, called temporal logic neural network (TLNN), in which the neurons of the network are logic propositions. More importantly, the network can be described and interpreted as a weighted signal temporal logic. TLNN not only keeps the nice properties of traditional neural networks but also provides a formal interpretation of itself with formal language. Experiments with real datasets show the proposed neural network can obtain highly accurate fault diagnosis results with good computation efficiency. Additionally, the embedded formal language of the neural network can provide explanations about the decision process, thus achieving interpretable fault diagnosis.

preprint · 2022 · arXiv

Proposal for the search for exotic spin-spin interactions at the micrometer scale using functionalized cantilever force sensors

Spin-dependent exotic interactions can be generated by exchanging hypothetical bosons, which were introduced to solve some puzzles in physics. Many precision experiments have been performed to search for such interactions, but no confirmed observation has been made. Here, we propose new experiments to search for the exotic spin-spin interactions that can be mediated by axions or Z$^\prime$ bosons. A sensitive functionalized cantilever is utilized as a force sensor to measure the interactions between the spin-polarized electrons in a periodic magnetic source structure and a closed-loop magnetic structure integrated on the cantilever. The source is set to oscillate during data acquisition to modulate the exotic force signal to high harmonics of the oscillating frequency. This helps to suppress the spurious signals at the signal frequency. Different magnetic source structures are designed for different interaction detections. A magnetic stripe structure is designed for Z$^\prime$-mediated interaction, which is insensitive to the detection of axion-mediated interaction. This allows us to measure the coupling constant of both if we assume both exist. With the force sensitivity achievable at low temperature, the proposed experiments are expected to search for the parameter spaces with much smaller coupling constant than the current stringent constraints from micrometer to millimeter range. Specifically, the lower bound of the parameter space will be seven orders of magnitude lower than the stringent constraints for Z$^\prime$-mediated interaction, and an order of magnitude lower for axion-mediated interaction, at the interaction range of $10\, μ$m.

preprint · 2021 · arXiv

Development of a GPU-accelerated Monte Carlo dose calculation module for nuclear medicine, ARCHER-NM: Demonstration for a PET/CT imaging procedure

This paper describes the development and validation of a Monte Carlo (MC) dose computing module dedicated to organ dose calculations of patients undergoing nuclear medicine (NM) internal radiation exposures involving 18F-FDG PET/CT examination. This new module extends the more-than-10-years-long ARCHER project that developed a GPU-accelerated MC dose engine by adding dedicated NM source-definition features. To validate the code, we compared dose distributions from the 0.511-MeV point photon source calculated for a water phantom as well as a patient PET/CT phantom against a well-tested MC code, GATE. The water-phantom results show excellent agreement, suggesting that the radiation physics module in the new NM code is adequate. To demonstrate the clinical utility and advantage of ARCHER-NM, one set of PET/CT data for an adult male NM patient is calculated using the new code. Radiosensitive organs in the CT dataset are segmented using a CNN-based tool called DeepViewer. The PET image intensity maps are converted to radioactivity distributions to allow for MC radiation transport dose calculations at the voxel level. The dose rate maps and corresponding statistical uncertainties were calculated for the duration of PET image acquisition. The dose rate results of the 18F-FDG PET imaging patient show that ARCHER-NM's results agree very well with those of GATE, within 0.58% to 4.11%. Most impressively, ARCHER-NM obtains such results in less than 0.5 minutes while it takes GATE as much as 376 minutes. This is the first study presenting GPU-accelerated patient-specific MC internal radiation dose rate calculations for clinically realistic 18F-FDG PET/CT imaging cases involving auto-segmentation of whole-body PET/CT images. This study suggests that modern computing tools -- ARCHER-NM and DeepViewer -- are accurate and fast enough for routine internal dosimetry in NM clinics.

preprint · 2020 · arXiv

A random-walk model for dark matter halo spins

We extend the random-walk model of Vitvitska et al. for predicting the spins of dark matter halos from their merger histories. Using updated merger rates, orbital parameter distributions, and N-body constraints we show that this model can accurately reproduce the distribution of spin parameters measured in N-body simulations when we include a weak correlation between the spins of halos and the angular momenta of infalling subhalos. We further show that this model is in approximate agreement with the correlation of the spin magnitude over time as determined from N-body simulations, while it slightly underpredicts the correlation in the direction of the spin vector measured from the same simulations. This model is useful for predicting spins from merger histories derived from non-N-body sources, thereby circumventing the need for very high resolution simulations to permit accurate measurements of spins. It may be particularly relevant to modeling systems which accumulate angular momentum from halos over time (such as galactic discs) - we show that this model makes small but significant changes in the distribution of galactic disc sizes computed using the Galacticus semi-analytic galaxy formation model.

preprint · 2020 · arXiv

C-DLinkNet: considering multi-level semantic features for human parsing

Human parsing is an essential branch of semantic segmentation: a fine-grained semantic segmentation task that identifies the constituent parts of the human body. The challenge of human parsing is to extract effective semantic features to resolve deformation and multi-scale variations. In this work, we propose an end-to-end model called C-DLinkNet based on LinkNet, which contains a new module named Smooth Module to combine the multi-level features in the decoder. C-DLinkNet is capable of producing competitive parsing performance compared with the state-of-the-art methods with smaller input sizes and no additional information, achieving mIoU=53.05 on the validation set of the LIP dataset.

preprint · 2020 · arXiv

GINet: Graph Interaction Network for Scene Parsing

Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit is capable of enhancing feature representations of convolution networks over high-level semantics and learning the semantic coherency adaptively to each sample. Specifically, the dataset-based linguistic knowledge is first incorporated in the GI unit to promote context reasoning over the visual graph, then the evolved representations of the visual graph are mapped to each local representation to enhance the discriminative capability for scene parsing. The GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component in our approach. Particularly, the proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.

preprint · 2020 · arXiv

Object Instance Mining for Weakly Supervised Object Detection

Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because such a mechanism tends to learn from the most discriminative object in an image for each category. Therefore, these methods suffer from missing object instances, which degrades the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances existing in each image by introducing information propagation on the spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training. In addition, we design an object instance reweighted loss to learn a larger portion of each object instance to further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.

preprint · 2020 · arXiv

Quark structure of the $χ_{\rm c}(3P)$ and $X(4274)$ resonances and their strong and radiative decays

We calculate the masses of $χ_{\rm c}(3P)$ states with threshold corrections in a coupled-channel model. The model was recently applied to the description of the properties of $χ_{\rm c}(2P)$ and $χ_{\rm b}(3P)$ multiplets [Phys. Lett. B 789, 550 (2019)]. We also compute the open-charm strong decay widths of the $χ_{\rm c}(3P)$ states and their radiative transitions. According to our predictions, the $χ_{\rm c}(3P)$ states should be dominated by the charmonium core, but they may also show small meson-meson components. The $X(4274)$ is interpreted as a $c \bar c$ $χ_{\rm c1}(3P)$ state. More information on the other members of the $χ_{\rm c}(3P)$ multiplet and a more rigorous analysis of the $X(4274)$'s decay modes are needed to provide further indications on the quark structure of this resonance.

preprint · 2020 · arXiv

Towards Interpretable Deep Learning Models for Knowledge Tracing

As an important technique for modeling the knowledge states of learners, the traditional knowledge tracing (KT) models have been widely used to support intelligent tutoring systems and MOOC platforms. Driven by the fast advancements of deep learning techniques, deep neural networks have recently been adopted to design new KT models for achieving better prediction performance. However, the lack of interpretability of these models has painfully impeded their practical applications, as their outputs and working mechanisms suffer from an opaque decision process and complex inner structures. We thus propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model by backpropagating the relevance from the model's output layer to its input layer. The experiment results show the feasibility of using the LRP method for interpreting the DLKT model's predictions, and partially validate the computed relevance scores at both the question level and the concept level. We believe it can be a solid step towards fully interpreting the DLKT models and promoting their practical applications in the education domain.

preprint · 2020 · arXiv

Unveiling the secrets of the mid-infrared Moon

The Moon's optical characteristics in visible and long-wavelength infrared (LWIR) have long been observed with our eyes or with instruments. What the mid-infrared (MIR) Moon looks like is still a mystery. For the first time we present the detailed appearance of the MIR Moon observed by a high-resolution geostationary satellite and reveal the essence behind its appearance. The appearance of the MIR Moon is opposite to its normal visible appearance. In addition, the MIR Moon shows limb darkening. Both the absolute and the relative brightness distribution of the MIR lunar disk change with the solar incidence angle. The signatures of the MIR Moon are controlled by both the reflection and emission of the lunar surface. We also show first-ever brightness temperature maps of the lunar disk without needing a mosaic, which better show the temperature variation across the lunar disk. They reveal that the relationship between brightness temperature and solar incidence angle $i$ is $\cos^{1/b} i$, and the power parameter is smaller than that of the Lambertian temperature model of $\cos^{1/4} i$ observed for lunar orbit-based measurements. The slower decrease of the brightness temperature when moving away from the sub-solar point than the Lambertian model is due to topographic effects. The brightness temperature is dominated by albedo and the solar incidence angle and influenced by the topography. Our results indicate that the Moon in the MIR exhibits many interesting phenomena which were previously unknown, and contains abundant information about lunar reflection and thermal emission for future study.