Researcher profile

Zhenyu Yang

Zhenyu Yang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

FedHPro: Federated Hyper-Prototype Learning via Gradient Matching

Federated Learning (FL) enables collaborative training of distributed clients while protecting privacy. To enhance generalization capability in FL, prototype-based FL is in the spotlight, since shared global prototypes offer semantic anchors for aligning client-specific local prototypes. However, existing methods update global prototypes at the prototype-level via averaging local prototypes or refining global anchors, which often leads to semantic drift across clients and subsequently yields a misaligned global signal. To alleviate this issue, we introduce hyper-prototypes, defined by a set of learnable global class-wise prototypes to preserve underlying semantic knowledge across clients. The hyper-prototypes are optimized via gradient matching to align with class-relevant characteristics distilled directly from clients' real samples, rather than prototype-level descriptors. We further propose FedHPro, a Federated Hyper-Prototype Learning framework, to leverage hyper-prototypes to promote inter-class separability via mutual-contrastive learning with client-specific margin, while encouraging intra-class uniformity through a consistency penalty. Comprehensive experiments under diverse heterogeneous scenarios confirm that 1) hyper-prototypes produce a more semantically consistent global signal, and 2) FedHPro achieves state-of-the-art performance on several benchmark datasets. Code is available at https://github.com/mala-lab/FedHPro.

preprint · 2025 · arXiv

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

Among existing online mobile-use benchmarks, AndroidWorld has emerged as the dominant benchmark due to its reproducible environment and deterministic evaluation; however, recent agents achieving over 90% success rates indicate its saturation and motivate the need for a more challenging benchmark. In addition, its environment lacks key application categories, such as e-commerce and enterprise communication, and does not reflect realistic mobile-use scenarios characterized by vague user instructions and hybrid tool usage. We introduce MobileWorld, a substantially more challenging benchmark designed to reflect real-world usage through 201 tasks across 20 applications. MobileWorld derives its difficulty from an emphasis on long-horizon, cross-application workflows, requiring nearly twice as many completion steps on average (27.8 vs. 14.3) and featuring a significantly higher proportion of multi-app tasks (62.2% vs. 9.5%) than AndroidWorld. To overcome the limitations of existing environments, MobileWorld achieves a balance between production-grade utility and reproducible evaluation by utilizing open-source alternatives to industry standards (e.g., Mattermost for Slack). This approach enables a fully observable and controlled environment through source code modification and direct backend database access for precise verification. MobileWorld also introduces novel task categories, including agent-user interaction and Model Context Protocol (MCP)-augmented tasks, for evaluating agents in user-aware, hybrid-tool scenarios. To facilitate evaluation, we develop a planner-executor agentic framework with extended action spaces to support user interactions and MCP calls. Our results reveal a sharp performance drop compared to AndroidWorld, with the best agentic framework and end-to-end model achieving 51.7% and 20.9% success rates, respectively, highlighting ample headroom for future research.

preprint · 2022 · arXiv

A Deep Learning Model with Radiomics Analysis Integration for Glioblastoma Post-Resection Survival Prediction

Purpose: To develop a novel deep-learning model that integrates radiomics analysis in a multi-dimensional feature fusion workflow for glioblastoma (GBM) post-resection survival prediction. Methods: A cohort of 235 GBM patients with complete surgical resection was divided into short-term/long-term survival groups with a 1-yr survival time threshold. Each patient received a pre-surgery multi-parametric MRI exam, and three tumor subregions were segmented by neuroradiologists. The developed model comprises three data source branches: in the 1st radiomics branch, 456 radiomics features (RF) were extracted from each patient; in the 2nd deep learning branch, an encoding neural network architecture was trained for survival group prediction using each single MR modality, and high-dimensional parameters of the last two network layers were extracted as deep features (DF). The extracted radiomics features and deep features were processed by a feature selection procedure to reduce the dimension size of each feature space. In the 3rd branch, non-image-based patient-specific clinical features (PSCF) were collected. Finally, data sources from all three branches were fused as an integrated input for a support vector machine (SVM) execution for survival group prediction. Different strategies of model design, including 1) 2D/3D-based image analysis, and 2) different data source combinations in SVM input design, were investigated in comparison studies. Results: The model achieved 0.638 prediction accuracy when using PSCF only, which was higher than the results using RF or DF only in both 2D and 3D analysis. The joint use of RF/PSCF improved accuracy results to 0.681 in 3D analysis. The most accurate models in 2D/3D analysis reached the highest accuracy 0.745 with different combinations of RF/DF/PSCF, and the corresponding ROC AUC results were 0.69 (2D) and 0.71 (3D), respectively.

preprint · 2022 · arXiv

A Neural Ordinary Differential Equation Model for Visualizing Deep Neural Network Behaviors in Multi-Parametric MRI based Glioma Segmentation

Purpose: To develop a neural ordinary differential equation (ODE) model for visualizing deep neural network (DNN) behavior during multi-parametric MRI (mp-MRI) based glioma segmentation as a method to enhance deep learning explainability. Methods: By hypothesizing that deep feature extraction can be modeled as a spatiotemporally continuous process, we designed a novel deep learning model, neural ODE, in which deep feature extraction was governed by an ODE without explicit expression. The dynamics of 1) MR images after interactions with DNN and 2) segmentation formation can be visualized after solving ODE. An accumulative contribution curve (ACC) was designed to quantitatively evaluate the utilization of each MRI by DNN towards the final segmentation results. The proposed neural ODE model was demonstrated using 369 glioma patients with a 4-modality mp-MRI protocol: T1, contrast-enhanced T1 (T1-Ce), T2, and FLAIR. Three neural ODE models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT). The key MR modalities with significant utilization by DNN were identified based on ACC analysis. Segmentation results by DNN using only the key MR modalities were compared to the ones using all 4 MR modalities. Results: All neural ODE models successfully illustrated image dynamics as expected. ACC analysis identified T1-Ce as the only key modality in ET and TC segmentations, while both FLAIR and T2 were key modalities in WT segmentation. Compared to the U-Net results using all 4 MR modalities, Dice coefficient of ET (0.784->0.775), TC (0.760->0.758), and WT (0.841->0.837) using the key modalities only had minimal differences without significance. Conclusion: The neural ODE model offers a new tool for optimizing the deep learning model inputs with enhanced explainability. The presented methodology can be generalized to other medical image-related deep learning applications.
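
The Dice coefficients quoted above measure the overlap between a predicted segmentation and a reference segmentation. As a minimal sketch of that metric (the function name `dice_coefficient`, the flat-list mask representation, and the toy masks are illustrative assumptions, not the authors' evaluation code):

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count voxels that are foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical flattened masks, for illustration only.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, truth))
```

A value of 1 means the two masks coincide and 0 means they share no foreground voxels, so the reported changes of less than 0.01 correspond to nearly identical overlap.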

preprint · 2022 · arXiv

Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

Despite the success of text-to-text pre-trained models in various natural language generation (NLG) tasks, the generation performance is largely restricted by the amount of labeled data in downstream tasks, particularly in data-to-text generation tasks. Existing works mostly utilize abundant unlabeled structured data to conduct unsupervised pre-training for task adaptation, which fail to model the complex relationship between source structured data and target texts. Thus, we introduce self-training as a better few-shot learner than task-adaptive pre-training, which explicitly captures this relationship via pseudo-labeled data generated by the pre-trained model. To alleviate the side-effect of low-quality pseudo-labeled data during self-training, we propose a novel method called Curriculum-Based Self-Training (CBST) to effectively leverage unlabeled data in a rearranged order determined by the difficulty of text generation. Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.

preprint · 2022 · arXiv

High fill factor confocal compound eyes fabricated by direct laser writing for better imaging quality

We fabricate two kinds of 100% fill factor compound eye structures using direct laser writing, including conventional compound eyes (CVCEs) with the same focal length of each microlens unit, and specially designed confocal compound eyes (CFCEs). For CFCEs, the focal length of each microlens unit is determined by its position and is equal to the distance between the microlens unit and the image sensor. In this letter, the optical properties of CVCEs and CFCEs are tested and compared. It is found that compared with CVCEs, CFCEs can improve the focusing efficiency by about 7%, enlarge the imaging area by about 25%, and have better imaging quality at the edge of the field of view.

preprint · 2022 · arXiv

LaMemo: Language Modeling with Look-Ahead Memory

Although Transformers with fully connected self-attentions are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment that encodes contexts in a uni-directional way. As a result, this prevents the memory from dynamically interacting with the current context that provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens, and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.

preprint · 2022 · arXiv

Phase-Only Holographic Assisted Planar Printing for Massively Multiplexed Optical Display and Encryption

Multiplexed planar printings, made of single or few layer micro and nano optical platforms, are essential for high capacity display, information storage and encryption. Although they have developed rapidly, the demonstrated channels are still limited and also lack instantaneity. Here, holograms and printings, always regarded as two independent information coding domains with totally different principles, are combined through our proposed angle multiplexing framework, leading to multiplexed printings with hundreds of channels. Based on this approach, we experimentally encode 25 gray scale printings into 25 angles and even 8 gray scale videos into 8 angles with a phase-only spatial light modulator. As a bridge between printings and holograms, our method allows printings to be generated by combining various holographic methods. Benefiting from this, we demonstrate a gradient metasurface based 324 channel printing which multiplexes angles, polarizations and wavelengths simultaneously. Our work paves the way to flexible angle-dependent printing display and massively multiplexed encryption systems.

preprint · 2022 · arXiv

Quantification of lung function on CT images based on pulmonary radiomic filtering

Purpose: To develop a radiomics filtering technique for characterizing spatial-encoded regional pulmonary ventilation information on lung CT. Methods: The lung volume was segmented on 46 CT images, and a 3D sliding window kernel was implemented across the lung volume to capture the spatial-encoded image information. Fifty-three radiomic features were extracted within the kernel, resulting in a 4th order tensor object. As such, each voxel coordinate of the original lung was represented as a 53-dimensional feature vector, such that radiomic features could be viewed as feature maps within the lungs. To test the technique as a potential pulmonary ventilation biomarker, the radiomic feature maps were compared to paired functional images (Galligas PET or DTPA-SPECT) based on Spearman correlation (r) analysis. Results: The radiomic feature map GLRLM-based Run-Length Non-Uniformity and GLCOM-based Sum Average are found to be highly correlated with the functional imaging. The achieved r (median [range]) for the two features are 0.46 [0.05, 0.67] and 0.45 [0.21, 0.65] across 46 patients and 2 functional imaging modalities, respectively. Conclusions: The results provide evidence that local regions of sparsely encoded heterogeneous lung parenchyma on CT are associated with diminished radiotracer uptake and measured lung ventilation defects on PET/SPECT imaging. These findings demonstrate the potential of radiomics to serve as a complementary tool to the current lung quantification techniques and provide hypothesis-generating data for future studies.
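
The comparison above relies on Spearman correlation, which is the Pearson correlation computed on ranks rather than raw values. The following is a minimal pure-Python sketch of that statistic; the helper names (`average_ranks`, `pearson`, `spearman`) and the sample values are hypothetical and are not taken from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-voxel values: one radiomic feature map vs. one functional image.
feature_map = [0.12, 0.40, 0.33, 0.90, 0.51]
functional = [10.0, 35.0, 20.0, 80.0, 60.0]
print(spearman(feature_map, functional))
```

Because only ranks enter the calculation, the statistic is insensitive to any monotonic rescaling of either image, which suits a comparison between CT-derived features and tracer uptake measured in different units.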

preprint · 2022 · arXiv

Semantic-Enhanced Explainable Finetuning for Open-Domain Dialogues

This paper proposes to combine pretrained language models with the modular dialogue paradigm for open-domain dialogue modeling. Our method, semantic-enhanced finetuning, instantiates conversation understanding, planning, and response generation as a language model finetuning task. At inference, we disentangle semantic and token variations by specifying sampling methods and constraints for each module separately. For training and evaluation, we present X-Weibo, a Chinese multi-turn open-domain dialogue dataset with automatic annotation for emotions, DAs, and topical words. Experiments show that semantic-enhanced finetuning outperforms strong baselines on non-semantic and semantic metrics, improves the human-evaluated relevance, coherence, and informativeness, and exhibits considerable controllability over semantic variables.

preprint · 2021 · arXiv

The distance between the weights of the neural network is meaningful

In the application of neural networks, we need to select a suitable model based on the problem complexity and the dataset scale. To analyze the network's capacity, quantifying the information learned by the network is necessary. This paper proves that the distance between the neural network weights in different training stages can be used to directly estimate the information accumulated by the network in the training process. The experimental results verify the utility of this method. An application of this method related to label corruption is shown at the end.
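
To make the idea of a distance between weights at different training stages concrete, here is a minimal sketch. It assumes each checkpoint is flattened into a list of floats and uses the Euclidean distance; the abstract does not say which distance the paper uses, so that choice, the function name `weight_distance`, and the toy checkpoints are all assumptions made for illustration.

```python
import math


def weight_distance(w_a, w_b):
    """Euclidean distance between two flattened weight vectors (assumed metric)."""
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))


# Hypothetical flattened weights saved at three training steps.
checkpoints = {
    0: [0.0, 0.0, 0.0],
    10: [0.3, -0.4, 0.0],
    20: [0.3, -0.4, 1.2],
}

steps = sorted(checkpoints)
for prev, curr in zip(steps, steps[1:]):
    d = weight_distance(checkpoints[prev], checkpoints[curr])
    print(f"step {prev} -> {curr}: distance {d:.3f}")
```

Tracking this quantity between consecutive checkpoints gives one simple per-stage signal of how far training has moved the weights.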

preprint · 2020 · arXiv

Compact optical polarization-insensitive zoom metalens-doublet

Metasurface-based lenses (metalenses) offer specific conceptual advantages compared to ordinary refractive lenses. For example, it is possible to tune the focal length of a metalens doublet by varying the relative angle between the two metalenses while fixing their distance, leading to an extremely compact zoom lens. An improved polarization-insensitive design based on silicon-nanocylinders on silica substrates is presented. This design is realized and characterized experimentally at 1550 nm wavelength. By varying the relative angle between the metalenses in steps of 10 degrees, tuning of the doublet focal length is demonstrated from -54 mm to ∓3 mm to +54 mm. This results in a zoom factor of an imaging system varying between 1 and 18. For positive focal lengths, the doublet focusing efficiency has a minimum of 34% and a maximum of 83%. Experiment and theory are in very good agreement.
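
The quoted zoom range of 1 to 18 is consistent with the focal lengths given, if the zoom factor is taken as the ratio of the largest to the smallest focal-length magnitude and the smallest magnitude is read as 3 mm. Both of those readings are assumptions here, as is the function name `zoom_factor`; the sketch below only checks the arithmetic.

```python
def zoom_factor(focal_lengths_mm):
    """Ratio of largest to smallest focal-length magnitude (assumed definition)."""
    magnitudes = [abs(f) for f in focal_lengths_mm]
    if min(magnitudes) == 0:
        raise ValueError("a zero focal length gives an undefined zoom factor")
    return max(magnitudes) / min(magnitudes)


# End points of the tuning range as read from the abstract, in millimetres.
print(zoom_factor([-54.0, -3.0, 3.0, 54.0]))  # 54 / 3 = 18.0
```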