Source author record

Zihan Chen

Zihan Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Artificial Intelligence; Computer Vision; Information Theory; math.IT; astro-ph.HE; Distributed, Parallel, and Cluster Computing; eess.IV; eess.SP; astro-ph.SR; cond-mat.mtrl-sci; eess.SY; gr-qc; Networking and Internet Architecture; Quantitative Methods; Systems and Control

Catalog footprint

What is connected

12 works

16 topics

4 close collaborators

preprint · 2026 · arXiv

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

Memory plays a central role in enabling large language models (LLMs) to operate over sequential tasks by accumulating and reusing experience over time. However, existing evaluations of LLM memory mostly rely on aggregate metrics such as final hold-out accuracy or cumulative online performance, which can obscure critical failure modes such as forgetting and negative transfer. In this paper, we introduce SeqMem-Eval, a diagnostic evaluation framework for sequentially evolving LLM memory. Drawing inspiration from continual learning, it targets a test-time setting in which memory is external, prompt-mediated, and updated without modifying model parameters. Rather than focusing only on final performance, SeqMem-Eval evaluates how memory states evolve, generalize, consolidate experience, and retain useful information during sequential inference. Specifically, it measures online utility, hold-out generalization, backward transfer, and forgetting, providing a finer-grained view of memory quality. Through extensive experiments across diverse tasks and memory methods, we show that higher final or cumulative accuracy does not necessarily imply better memory quality: many methods exhibit strong performance gains while suffering from substantial forgetting or negative transfer. Moreover, different memory designs exhibit distinct trade-offs between adaptability and stability that remain invisible under standard evaluation metrics.
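Backward transfer and forgetting, two of the quantities named above, have common definitions in continual learning. The sketch below computes them from an accuracy matrix using those common definitions. Whether SeqMem-Eval uses exactly these formulas is an assumption, and the function names and the `acc[t][i]` layout are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def backward_transfer(acc: Matrix) -> float:
    """Mean change in accuracy on earlier tasks after the full sequence.

    acc[t][i] is the accuracy on task i measured after the memory has
    processed task t (0-indexed). Negative values indicate that later
    updates hurt earlier tasks.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)


def forgetting(acc: Matrix) -> float:
    """Mean drop from the best earlier accuracy on each task to its final accuracy.

    Assumption: the best earlier accuracy is taken from the step at which
    the task was first seen up to the step before the last one.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("need at least two tasks")
    drops = []
    for i in range(T - 1):
        best_earlier = max(acc[t][i] for t in range(i, T - 1))
        drops.append(best_earlier - acc[T - 1][i])
    return sum(drops) / (T - 1)


# Toy accuracy matrix (made-up numbers, not results from the paper).
toy_acc = [
    [0.80, 0.00, 0.00],
    [0.70, 0.90, 0.00],
    [0.60, 0.85, 0.75],
]
print("backward transfer:", backward_transfer(toy_acc))
print("forgetting:", forgetting(toy_acc))
```

In this toy matrix the final-row accuracies look reasonable on their own, yet backward transfer is negative because accuracy on the first two tasks fell after later updates. That is the kind of effect a single aggregate score can hide.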

preprint · 2026 · arXiv

Lean Clients, Full Accuracy: Hybrid Zeroth- and First-Order Split Federated Learning

Split Federated Learning (SFL) enables collaborative training between resource-constrained edge devices and a compute-rich server. Communication overhead is a central issue in SFL and can be mitigated with auxiliary networks. Yet, the fundamental client-side computation challenge remains, as back-propagation requires substantial memory and computation costs, severely limiting the scale of models that edge devices can support. To enable more resource-efficient client computation and reduce the client-server communication, we propose HERON-SFL, a novel hybrid optimization framework that integrates zeroth-order (ZO) optimization for local client training while retaining first-order (FO) optimization on the server. With the assistance of auxiliary networks, ZO updates enable clients to approximate local gradients using perturbed forward-only evaluations per step, eliminating memory-intensive activation caching and avoiding explicit gradient computation in the traditional training process. Leveraging the low effective rank assumption, we theoretically prove that HERON-SFL's convergence rate is independent of model dimensionality, addressing a key scalability concern common to ZO algorithms. Empirically, on ResNet training and language model (LM) fine-tuning tasks, HERON-SFL matches benchmark accuracy while reducing client peak memory by up to 64% and client-side compute cost by up to 33% per step, substantially expanding the range of models that can be trained or adapted on resource-limited devices.
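The idea of approximating a gradient from perturbed forward-only evaluations can be illustrated with a generic two-point zeroth-order estimator. This is a minimal sketch of that general technique, not HERON-SFL itself: the auxiliary networks, the client/server split, and the paper's exact estimator are not reproduced, and `zo_gradient`, `zo_sgd`, and the toy `quadratic` loss are hypothetical names.

```python
import random
from typing import Callable, List

Vector = List[float]


def zo_gradient(loss_fn: Callable[[Vector], float], params: Vector,
                mu: float, rng: random.Random) -> Vector:
    """Two-point zeroth-order gradient estimate.

    Uses two forward evaluations of loss_fn along a random Gaussian
    direction u; no back-propagation and no stored activations.
    """
    u = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + mu * ui for p, ui in zip(params, u)]
    minus = [p - mu * ui for p, ui in zip(params, u)]
    # Finite-difference estimate of the directional derivative along u.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def zo_sgd(loss_fn: Callable[[Vector], float], params: Vector,
           lr: float = 0.05, steps: int = 500, mu: float = 1e-3,
           seed: int = 0) -> Vector:
    """Plain gradient descent driven by the zeroth-order estimate."""
    rng = random.Random(seed)
    x = list(params)
    for _ in range(steps):
        g = zo_gradient(loss_fn, x, mu, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x: Vector) -> float:
    """Toy loss with its minimum at the all-ones vector."""
    return sum((xi - 1.0) ** 2 for xi in x)


solution = zo_sgd(quadratic, [0.0] * 5)
print("final loss:", quadratic(solution))
```

Each step costs two loss evaluations and keeps only the parameter vector and one random direction in memory, which is the property that makes this family of methods attractive for memory-limited clients.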

preprint · 2026 · arXiv

Machine Learning-Driven Creep Law Discovery Across Alloy Compositional Space

High-temperature creep characterization of structural alloys traditionally relies on serial uniaxial tests, which are highly inefficient for exploring the large search space of alloy compositions and for material discovery. Here, we introduce a machine-learning-assisted, high-throughput framework for creep law identification based on a dimple array bulge instrument (DABI) configuration, which enables parallel creep testing of 25 dimples, each fabricated from a different alloy, in a single experiment. Full-field surface displacements of dimples undergoing time-dependent creep-induced bulging under inert gas pressure are measured by 3D digital image correlation. We train a recurrent neural network (RNN) as a surrogate model, mapping creep parameters and loading conditions to the time-dependent deformation response of DABI. Coupling this surrogate with a particle swarm optimization scheme enables rapid, global inverse identification of creep parameters, with sparsity regularization, from experimental displacement-time histories. In addition, we propose a phenomenological creep law with a time-dependent stress exponent that captures the sigmoidal primary creep observed in wrought INCONEL 625, and we extract its temperature dependence from DABI tests at multiple temperatures. Furthermore, we employ a general creep law combining several conventional forms together with regularized inversion to identify the creep laws for 47 additional Fe-, Ni-, and Co-rich alloys and to automatically select the dominant functional form for each alloy. Combined with DABI experiments, this workflow provides a quantitative, high-throughput creep characterization platform that is compatible with data mining, composition-property modeling, and nonlinear structural optimization with creep behavior across a large alloy design space.

preprint · 2023 · arXiv

Imperceptible Adversarial Attack via Invertible Neural Networks

Two common approaches for generating adversarial examples are adding perturbations using auxiliary gradient information and discarding existing details of the benign images. Although visual imperceptibility is a desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN takes full advantage of the information preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method produces less perceptible adversarial images than state-of-the-art methods and yields more robust adversarial examples with high confidence compared to other adversarial attacks.

preprint · 2023 · arXiv

Personalized Federated Learning with Attention-based Client Selection

Personalized Federated Learning (PFL) relies on collective data knowledge to build customized models. However, non-IID data between clients poses significant challenges, as collaborating with clients who have diverse data distributions can harm local model performance, especially with limited training data. To address this issue, we propose FedACS, a new PFL algorithm with an Attention-based Client Selection mechanism. FedACS integrates an attention mechanism to enhance collaboration among clients with similar data distributions and mitigate the data scarcity issue. It prioritizes and allocates resources based on data similarity. We further establish the theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST validate FedACS's superiority, showcasing its potential to advance personalized federated learning. By tackling non-IID data challenges and data scarcity, FedACS offers promising advances in the field of personalized federated learning.
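One way to picture attention-style weighting among similar clients is a softmax over pairwise similarity scores. The abstract does not specify how FedACS computes its attention, so everything below is an assumption: cosine similarity between flattened client vectors, the `temperature` parameter, and the function name `attention_average` are hypothetical stand-ins rather than the authors' algorithm.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_average(clients: List[Vector], target: int,
                      temperature: float = 0.1) -> Tuple[List[float], Vector]:
    """Weight every client by its similarity to the target client.

    Returns the attention weights and the weighted average vector that
    would serve as the target client's personalized aggregate.
    """
    sims = [cosine(clients[target], v) for v in clients]
    weights = softmax([s / temperature for s in sims])
    dim = len(clients[target])
    mixed = [sum(w * v[k] for w, v in zip(weights, clients)) for k in range(dim)]
    return weights, mixed


# Toy client vectors (made-up values): clients 0 and 1 are similar, client 2 is not.
toy_clients = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
weights, mixed = attention_average(toy_clients, target=0)
print("weights:", weights)
print("personalized aggregate:", mixed)
```

With these toy vectors the dissimilar client receives a weight close to zero, so it barely influences the target client's aggregate. A lower temperature sharpens this selection; a higher one moves the result toward a plain average.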

preprint · 2022 · arXiv

FedCorr: Multi-Stage Federated Learning for Label Noise Correction

Federated learning (FL) is a privacy-preserving distributed learning paradigm that enables clients to jointly train a global model. In real-world FL implementations, client data could have label noise, and different clients could have vastly different label noise levels. Although there exist methods in centralized learning for tackling label noise, such methods do not perform well on heterogeneous label noise in FL settings, due to the typically smaller sizes of client datasets and data privacy requirements in FL. In this paper, we propose $\texttt{FedCorr}$, a general multi-stage framework to tackle heterogeneous label noise in FL, without making any assumptions on the noise models of local clients, while still maintaining client data privacy. In particular, (1) $\texttt{FedCorr}$ dynamically identifies noisy clients by exploiting the dimensionalities of the model prediction subspaces independently measured on all clients, and then identifies incorrect labels on noisy clients based on per-sample losses. To deal with data heterogeneity and to increase training stability, we propose an adaptive local proximal regularization term that is based on estimated local noise levels. (2) We further finetune the global model on identified clean clients and correct the noisy labels for the remaining noisy clients after finetuning. (3) Finally, we apply the usual training on all clients to make full use of all local data. Experiments conducted on CIFAR-10/100 with federated synthetic label noise, and on a real-world noisy dataset, Clothing1M, demonstrate that $\texttt{FedCorr}$ is robust to label noise and substantially outperforms the state-of-the-art methods at multiple noise levels.

preprint · 2022 · arXiv

Optimize Deep Learning Models for Prediction of Gene Mutations Using Unsupervised Clustering

Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images (WSIs). It is commonly assumed that tumor regions carry most predictive information. In this paper, we propose an unsupervised clustering-based multiple-instance learning method and apply it to develop deep-learning models for prediction of gene mutations using WSIs from three cancer types in The Cancer Genome Atlas (TCGA) studies (CRC, LUAD, and HNSCC). We show that unsupervised clustering of image patches helps identify predictive patches and exclude patches lacking predictive information, and therefore improves prediction of gene mutations in all three cancer types, compared with the WSI-based method without selection of image patches and with models based only on tumor regions. Additionally, our proposed algorithm outperforms two recently published baseline algorithms that leverage unsupervised clustering to assist model prediction. The unsupervised-clustering-based approach for mutation prediction allows identification of the spatial regions related to mutation of a specific gene via the resolved probability scores, highlighting the heterogeneity of a predicted genotype in the tumor microenvironment. Finally, our study also demonstrates that selecting tumor regions of WSIs is not always the best way to identify patches for prediction of gene mutations, and that other tissue types in the tumor microenvironment may provide better prediction ability for gene mutations than tumor tissues.

preprint · 2022 · arXiv

Server Free Wireless Federated Learning: Architecture, Algorithm, and Analysis

We demonstrate that analog transmissions and matched filtering alone can realize the function of an edge server in federated learning (FL). Therefore, a network with massively distributed user equipments (UEs) can achieve large-scale FL without an edge server. We also develop a training algorithm that allows UEs to continuously perform local computing without being interrupted by the global parameter uploading, which exploits the full potential of UEs' processing power. We derive convergence rates for the proposed schemes to quantify their training efficiency. The analyses reveal that when the interference obeys a Gaussian distribution, the proposed algorithm retrieves the convergence rate of a server-based FL. But if the interference distribution is heavy-tailed, then the heavier the tail, the slower the algorithm converges. Nonetheless, the system run time can be largely reduced by enabling computation in parallel with communication, and the gain is particularly pronounced when communication latency is high. These findings are corroborated via extensive simulations.

preprint · 2022 · arXiv

Towards Federated Long-Tailed Learning

Data privacy and class imbalance are the norm rather than the exception in many machine learning tasks. Recent work has addressed, on the one hand, learning from pervasive private data and, on the other, learning from long-tailed data. However, both conditions may hold at once in practical applications, and an effective method to alleviate both issues simultaneously is still under development. In this paper, we focus on learning with long-tailed (LT) data distributions in the context of the popular privacy-preserving federated learning (FL) framework. We characterize three scenarios with different local or global long-tailed data distributions in the FL framework, and highlight the corresponding challenges. The preliminary results under different scenarios reveal that substantial future work is needed to better resolve the characterized federated long-tailed learning tasks.

preprint · 2020 · arXiv

DeepFacePencil: Creating Face Images from Freehand Sketches

In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches. Existing image-to-image translation methods require a large-scale dataset of paired sketches and images for supervision. They typically utilize synthesized edge maps of face images as training data. However, these synthesized edge maps strictly align with the edges of the corresponding face images, which limits their generalization ability to real hand-drawn sketches with vast stroke diversity. To address this problem, we propose DeepFacePencil, an effective tool that is able to generate photo-realistic face images from hand-drawn sketches, based on a novel dual generator image translation network during training. A novel spatial attention pooling (SAP) module is designed to adaptively handle spatially varying stroke distortions, supporting various stroke styles and different levels of detail. We conduct extensive experiments, and the results demonstrate the superiority of our model over existing methods in both image quality and model generalization to hand-drawn sketches.

preprint · 2015 · arXiv

The Mass of the Black Hole in the X-ray Binary Nova Muscae 1991

The optical counterpart of the black-hole soft X-ray transient Nova Muscae 1991 has brightened by $\Delta V \approx 0.8$ mag since its return to quiescence 23 years ago. We present the first clear evidence that the brightening of soft X-ray transients in quiescence occurs at a nearly linear rate. This discovery, and our precise determination of the disk component of emission obtained using our simultaneous photometric and spectroscopic data, have allowed us to identify and accurately model archival ellipsoidal light curves of the highest quality. The simultaneity, and the strong constraint it provides on the component of disk emission, is a key element of our work. Based on our analysis of the light curves, and our earlier measurements of the mass function and mass ratio, we have obtained for Nova Muscae 1991 the first accurate estimates of its systemic inclination $i=43.2^{+2.1}_{-2.7}$ deg, and black hole mass $M=11.0^{+2.1}_{-1.4}\ M_\odot$. Based on our determination of the radius of the secondary, we estimate the distance to be $D=4.95^{+0.69}_{-0.65}$ kpc. We discuss the implications of our work for future dynamical studies of black-hole soft X-ray transients.
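For context, the standard binary mass-function relation shows how a mass function, a mass ratio, and an inclination combine into a black hole mass. The notation is an assumption, since the abstract does not define symbols: $P$ is the orbital period, $K_2$ the radial-velocity semi-amplitude of the secondary, $M_2$ the secondary mass, and $q$ the mass ratio. The abstract does not quote the mass function or mass ratio values, so none are filled in here.

```latex
f(M) \equiv \frac{P K_2^{3}}{2\pi G} = \frac{M \sin^{3} i}{(1+q)^{2}}
\quad\Longrightarrow\quad
M = \frac{f(M)\,(1+q)^{2}}{\sin^{3} i},
\qquad q \equiv \frac{M_2}{M}.
```

Because $M \propto 1/\sin^{3} i$, the mass estimate is sensitive to the inclination: at $i = 43.2$ deg the factor $1/\sin^{3} i$ is about 3.1, so a tight constraint on $i$ from the light-curve modeling translates directly into a tighter mass estimate.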

preprint · 2015 · arXiv

The Spin of The Black Hole in the X-ray Binary Nova Muscae 1991

The bright soft X-ray transient Nova Muscae 1991 was intensively observed during its entire 8-month outburst using the Large Area Counter (LAC) onboard the Ginga satellite. Recently, we obtained accurate estimates of the mass of the black hole primary, the orbital inclination angle of the system, and the distance. Using these crucial input data and Ginga X-ray spectra, we have measured the spin of the black hole using the continuum-fitting method. For four X-ray spectra of extraordinary quality we have determined the dimensionless spin parameter of the black hole to be $a/M = 0.63^{+0.16}_{-0.19}$ ($1\sigma$ confidence level), a result that we confirm using eleven additional spectra of lower quality. Our spin estimate challenges two published results: It is somewhat higher than the value predicted by a proposed relationship between jet power and spin; and we find that the spin of the black hole is decidedly prograde, not retrograde as has been claimed.
