Researcher profile

Fen Xia

Fen Xia contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

The scaling behavior, in which test performance often improves as model size and data increase, is a central empirical phenomenon in modern deep learning, yet its theoretical basis remains incomplete. In this paper, we study depth expansion in normalized residual networks: starting from a trained model in an old hypothesis class, we insert a new residual block at an intermediate layer and ask when such an expansion can yield a provable improvement in test risk. We develop a unified framework that decomposes this question into representational gain, optimization gain, and generalization transfer. First, under a first-order descent condition near zero initialization, we prove that the expanded hypothesis class contains an auxiliary jumpboard model with strictly smaller population risk than the original model. Second, under norm control tailored to post-normalized residual architectures, we establish a norm-based Rademacher complexity bound for the expanded model class. These ingredients lead to two complementary test-risk guarantees: one route passes through population risk and is tighter when a positive population margin is available, while the other works directly at the train/test level, avoids Hoeffding transfer, and is more robust in degenerate regimes. Together, these results provide a theorem-driven mechanism under which residual depth expansion can improve test performance in normalized residual networks. More broadly, they suggest that scaling is inherently joint: depth creates new improving directions, width enhances the finite-sample observability of weak signals, and data determines whether the statistical cost of expansion can be controlled.

preprint2020arXiv

The Scalability for Parallel Machine Learning Training Algorithm: Dataset Matters

To gain a better performance, many researchers put more computing resource into an application. However, in the AI area, there is still a lack of a successful large-scale machine learning training application: The scalability and performance reproducibility of parallel machine learning training algorithm are limited and there are a few pieces of research focusing on why these indexes are limited but there are very few research efforts explaining the reasons in essence. In this paper, we propose that the sample difference in dataset plays a more prominent role in parallel machine learning algorithm scalability. Dataset characters can measure sample difference. These characters include the variance of the sample in a dataset, sparsity, sample diversity and similarity in sampling sequence. To match our proposal, we choose four kinds of parallel machine learning training algorithms as our research objects: (1) Asynchronous parallel SGD algorithm (Hogwild! algorithm) (2) Parallel model average SGD algorithm (Mini-batch SGD algorithm) (3) Decenterilization optimization algorithm, (4) Dual Coordinate Optimization (DADM algorithm). These algorithms cover different types of machine learning optimization algorithms. We present the analysis of their convergence proof and design experiments. Our results show that the characters datasets decide the scalability of the machine learning algorithm. What is more, there is an upper bound of parallel scalability for machine learning algorithms.