Source author record

Zhou Fang

Zhou Fang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Methodology, math.OC, Quantitative Methods, Artificial Intelligence, Computation, Computation and Language, cond-mat.supr-con, Hardware Architecture, math.DS, Multiagent Systems, Performance

Catalog footprint

What is connected

9 works

12 topics

4 close collaborators

Preprint · 2026 · arXiv

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, doing so presents three key system challenges: model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve, an agentic kernel coding framework, to tackle heterogeneity at scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate kernel generation and optimization for recommendation models across heterogeneous hardware architectures. It does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware-agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is formulated as a graph-based search with a selection policy, a universal operator, a fitness function, and a termination rule, and it adapts dynamically to the runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly available KernelBench suite, achieving a 100% pass rate on all 250 problems across three difficulty levels, and on 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and heterogeneous AI systems at scale. Beyond performance and efficiency improvements, KernelEvolve significantly lowers the programmability barrier for new AI hardware by enabling automated kernel generation for in-house AI hardware.
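The four search ingredients named above (selection policy, operator, fitness function, termination rule) can be illustrated with a generic graph search loop. This is a minimal sketch under stated assumptions, not KernelEvolve's implementation: the parameter names `block` and `warps`, the toy cost model in `fitness`, and the random `mutate` operator are all hypothetical stand-ins. The abstract describes an agentic operator driven by retrieval-augmented prompt synthesis and fitness measured on real hardware, neither of which is reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One candidate kernel configuration in the search graph (hypothetical)."""
    params: dict
    fitness: float
    parent: Optional["Node"] = None


def fitness(params: dict) -> float:
    # Hypothetical toy cost model standing in for measured kernel speed.
    # Higher is better; the optimum is block=128, warps=4 (fitness 0).
    return -((params["block"] - 128) ** 2 / 1024 + (params["warps"] - 4) ** 2)


def select(graph: list, rng: random.Random, k: int = 3) -> Node:
    # Selection policy: tournament over k random nodes, keep the fittest.
    contenders = rng.sample(graph, min(k, len(graph)))
    return max(contenders, key=lambda n: n.fitness)


def mutate(params: dict, rng: random.Random) -> dict:
    # Operator: a random edit standing in for an LLM-proposed kernel rewrite.
    child = dict(params)
    if rng.random() < 0.5:
        b = child["block"] * 2 if rng.random() < 0.5 else child["block"] // 2
        child["block"] = min(1024, max(16, b))
    else:
        child["warps"] = min(16, max(1, child["warps"] + rng.choice([-1, 1])))
    return child


def search(seed: dict, max_iters: int = 200, target: float = 0.0, rng=None):
    rng = random.Random(0) if rng is None else rng
    root = Node(seed, fitness(seed))
    graph, best = [root], root
    for _ in range(max_iters):
        if best.fitness >= target:  # termination rule: target reached, stop early
            break
        parent = select(graph, rng)
        child_params = mutate(parent.params, rng)
        child = Node(child_params, fitness(child_params), parent)
        graph.append(child)  # new node linked to the node it was derived from
        if child.fitness > best.fitness:
            best = child
    return best, graph
```

Tournament selection keeps the search biased toward strong candidates while still occasionally expanding weaker nodes, and because every node stores its parent, the lineage of the best candidate can be traced back through the graph.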

Preprint · 2022 · arXiv

Efficacy of regularized multi-task learning based on SVM models

This paper investigates the efficacy of a regularized multi-task learning (MTL) framework based on SVM (M-SVM) to answer whether MTL always provides reliable results and how MTL outperforms independent learning. We first find that M-SVM is Bayes risk consistent in the limit of large sample size. This implies that, regardless of task dissimilarities, M-SVM always produces a reliable decision rule for each task in terms of misclassification error when the data size is large enough. Furthermore, we find that the task interaction vanishes as the data size goes to infinity, and that the convergence rates of M-SVM and its single-task counterpart have the same upper bound. The former suggests that M-SVM cannot improve the limit classifier's performance; based on the latter, we conjecture that the optimal convergence rate is not improved when the number of tasks is fixed. As a novel insight into MTL, our theoretical and experimental results agree closely that the benefit of MTL methods lies in improving the pre-convergence-rate factor (PCR, defined in Section III) rather than the convergence rate. Moreover, this improvement in the PCR factor is more significant when the data size is small.
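For orientation, one widely used regularized multi-task SVM objective is written out below. It is an assumption that M-SVM takes this form; the paper's exact formulation may differ. Each task's weight vector is split into a shared part and a task-specific offset, w_t = w_0 + v_t, where there are T tasks, task t has n_t labelled pairs (x_ti, y_ti) with y_ti in {-1, +1}, and λ1, λ2 > 0 are regularization weights:

```latex
\min_{w_0,\,v_1,\dots,v_T}\;
\sum_{t=1}^{T}\sum_{i=1}^{n_t}\max\!\bigl(0,\;1 - y_{ti}\,(w_0 + v_t)^{\top}x_{ti}\bigr)
\;+\;\frac{\lambda_1}{T}\sum_{t=1}^{T}\lVert v_t\rVert^{2}
\;+\;\lambda_2\,\lVert w_0\rVert^{2}
```

A large λ1 pushes every v_t toward zero, so all tasks share w_0; a large λ2 pushes w_0 toward zero, so the tasks are effectively learned independently. In this formulation the coupling through w_0 is what plays the role of task interaction.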

Preprint · 2022 · arXiv

Stochastic filtering for multiscale stochastic reaction networks based on hybrid approximations

Over the past few decades, advances in fluorescent technologies and microscopy have greatly improved scientists' ability to observe single-cell activity in real time. In this paper, we consider the filtering problem associated with these technologies, i.e., how to estimate the latent dynamic states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. A good solution to this problem can further improve scientists' ability to extract information about intracellular systems from time-course experiments. A straightforward approach is to use a particle filter in which particles are generated by simulating the full model and weighted according to the observations. However, exact simulation of the full dynamic model usually takes an impractical amount of computational time, which prevents this type of particle filter from being used in real-time applications, such as transcription regulation networks. Inspired by recent developments in hybrid approximations of multiscale chemical reaction networks, we approach the filtering problem differently. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction exploits the time-scale separations in the original network and can therefore greatly reduce the computational effort required to simulate the dynamics. As a result, we can develop efficient particle filters for the original model by applying particle filters to the reduced model. We illustrate the accuracy and computational efficiency of our approach with several numerical examples.
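The "straightforward approach" described above, a particle filter whose particles are propagated by exact simulation of the full model, can be sketched for a toy one-species network. All modelling choices here are hypothetical: the network is a birth-death process (0 -> X at rate `k_birth`, X -> 0 at rate `k_death * x`), the reporter reading is the molecule count plus Gaussian noise with known standard deviation, the initial state is known to be zero, and resampling is multinomial at every step. This is the baseline whose simulation cost motivates the paper, not the hybrid reduced-model filter.

```python
import math
import random


def ssa_birth_death(x, t_span, k_birth, k_death, rng):
    """Exact Gillespie simulation over t_span of 0 -> X (rate k_birth) and X -> 0 (rate k_death * x)."""
    t = 0.0
    while True:
        a_birth = k_birth
        a_total = a_birth + k_death * x
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_span:
            return x
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def bootstrap_filter(observations, dt, n_particles, k_birth, k_death, noise_sd, rng):
    """Particle filter whose particles are propagated by simulating the full model."""
    particles = [0] * n_particles  # assumed known initial state: zero molecules
    estimates = []
    for y in observations:
        # 1. Predict: push every particle through the exact stochastic dynamics.
        particles = [ssa_birth_death(x, dt, k_birth, k_death, rng) for x in particles]
        # 2. Weight: Gaussian likelihood of the reporter reading, shifted by the
        #    largest log-weight before exponentiating to avoid underflow.
        log_w = [-0.5 * ((y - x) / noise_sd) ** 2 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3. Estimate the posterior mean of the latent count.
        estimates.append(sum(wi * x for wi, x in zip(w, particles)))
        # 4. Resample (multinomial) to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates
```

Nearly all of the runtime is spent inside `ssa_birth_death`; the hybrid approach described above swaps this full-model simulation for simulation of a reduced hybrid model, which is where its computational savings come from.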

Preprint · 2022 · arXiv

Type-enriched Hierarchical Contrastive Strategy for Fine-Grained Entity Typing

Fine-grained entity typing (FET) aims to deduce specific semantic types of entity mentions in text. Modern methods for FET mainly focus on learning what a certain type looks like, and few works directly model type differences, that is, let models know the extent to which one type differs from others. To address this gap, we propose a type-enriched hierarchical contrastive strategy for FET. Our method directly models the differences between hierarchical types and improves the ability to distinguish similar types at multiple granularities. On the one hand, we embed type information into entity contexts to make it directly perceptible. On the other hand, we design a constrained contrastive strategy on the type hierarchy to directly model type differences, which can simultaneously capture the distinguishability between types at different granularities. Experimental results on three benchmarks, BBN, OntoNotes, and FIGER, show that our method achieves strong performance on FET by effectively modeling type differences.
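The abstract does not specify the loss, so the following is only a generic InfoNCE-style contrastive loss, offered as an assumption of what directly modeling type differences can look like in code: a mention embedding (the anchor) is pulled toward its gold type embedding and pushed away from other types, for instance sibling types in the hierarchy. The 2-D vectors and the temperature value are hypothetical, and the paper's constrained hierarchical strategy is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / T) / sum of exp(s / T) over the positive and all negatives ), s = cosine."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

The loss is small when the anchor is much closer to its gold type than to any negative and grows as a negative (for example a confusable sibling type) becomes closer than the gold type.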

Preprint · 2020 · arXiv

Stochastic filters based on hybrid approximations of multiscale stochastic reaction networks

We consider the problem of estimating the dynamic latent states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction is based on exploiting the time-scale separations in the original network, and it can greatly reduce the computational effort required to simulate the dynamics. This enables us to develop efficient particle filters to solve the filtering problem for the original model by applying particle filters to the reduced model. We illustrate the accuracy and the computational efficiency of our approach using a numerical example.

Preprint · 2016 · arXiv

Observation of Ising spin-nematic order and its close relationship to the superconductivity in FeSe single crystals

Superconducting FeSe single crystals of (001) orientation are synthesized via a hydrothermal ion-release route. An Ising spin-nematic order is identified through systematic measurements of the in-plane angular-dependent magnetoresistance (AMR) and the static magnetization. The onset temperature of the anisotropic AMR marks the Ising spin-nematic ordering temperature Tsn, below which a two-fold rotational symmetry is observed in the iron plane. As reported previously, a downward curvature appears below Tsn in the temperature dependence of the static magnetization under a weak in-plane magnetic field. Remarkably, we find a universal linear relationship between Tc and Tsn across various superconducting samples, indicating that spin nematicity and superconductivity in FeSe share a common microscopic origin.

Preprint · 2016 · arXiv

Stochastic Weak Passivity Based Stabilization of Stochastic Systems with Nonvanishing Noise

For stochastic systems with nonvanishing noise, i.e., systems whose noise port does not vanish at the desired state, it is impossible to achieve global stability of the desired state in probability. This property also causes the loss of stochastic passivity at the desired state if a radially unbounded Lyapunov function is expected to serve as the storage function. To characterize a certain (globally) stable behavior for this class of systems, this paper proposes stochastic asymptotic weak stability, which requires the transition measure of the state to converge and the state to be ergodic. By defining stochastic weak passivity, which requires stochastic passivity only outside a ball centered at the desired state rather than on the whole state space, we develop stochastic weak passivity theorems ensuring that stochastic systems with nonvanishing noise can be globally or locally stabilized in the weak sense through a negative feedback law. Applications to stochastic linear systems and a nonlinear process system are presented, and simulations are carried out for the latter.
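A scalar linear example illustrates the nonvanishing-noise phenomenon described above. Assume, hypothetically, dX = (aX + u) dt + σ dW with the negative feedback law u = -kX and k > a. The noise term σ dW does not vanish at X = 0, so the state never settles at the origin, yet the distribution of X converges to a Gaussian with mean 0 and variance σ²/(2(k - a)). The sketch below simulates many closed-loop paths with the Euler-Maruyama scheme; it illustrates weak (distributional) convergence only and does not implement the paper's passivity theorems.

```python
import random
import statistics


def simulate_closed_loop(a, k, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = (a*X + u) dt + sigma dW under feedback u = -k*X."""
    x = x0
    sqrt_dt = dt ** 0.5
    for _ in range(n_steps):
        u = -k * x  # negative feedback law
        # The noise increment does not depend on x, so it persists even at x = 0.
        x += (a * x + u) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
finals = [simulate_closed_loop(0.5, 1.5, 1.0, 2.0, 0.01, 500, rng) for _ in range(2000)]
print(statistics.mean(finals), statistics.pvariance(finals))
```

With a = 0.5, k = 1.5 and σ = 1 the predicted stationary variance is 0.5. The discretized chain itself has stationary variance σ²/(2c - c²·dt) with c = k - a, which approaches the continuous value as dt shrinks.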

Preprint · 2011 · arXiv

Sparse Group Selection Through Co-Adaptive Penalties

Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. A useful tool in this setting is to assume that the model can be well approximated by a fit involving only a small number of covariates, a so-called sparsity assumption, which leads to the Lasso and other methods. In many situations, however, the covariates can be considered structured, in that the selection of some variables favours the selection of others, with variables organised into groups entering or leaving the model simultaneously as a special case. This structure creates a different form of sparsity. In this paper, we propose the Co-adaptive Lasso to fit models accommodating this form of 'group sparsity'. The Co-adaptive Lasso is fast and simple to compute, and we show that it holds theoretical advantages over the Lasso, performs well under a broad set of conditions, and is highly competitive in simulations against previously proposed algorithms such as the Group Lasso and the Adaptive Lasso.
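The Co-adaptive Lasso's penalty is not given in the abstract, so the sketch below shows only the plain Lasso baseline it is compared against, fitted by cyclic coordinate descent with soft-thresholding on the objective (1/(2n))·||y - Xβ||² + λ·||β||₁. The function names, the fixed iteration count, and the list-of-lists data layout are illustrative assumptions.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iters=200):
    """Cyclic coordinate descent for (1/(2n))||y - X beta||^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j's term added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X beta
                beta[j] = new
    return beta
```

Group-structured methods such as the Group Lasso replace this scalar soft-threshold with a threshold on the norm of a whole block of coefficients, so that grouped variables enter or leave the model together.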

Preprint · 2010 · arXiv

LASSO ISOtone for High Dimensional Additive Isotonic Regression

Additive isotonic regression attempts to determine the relationship between a multi-dimensional observation variable and a response, under the constraint that the estimate is the additive sum of univariate component effects that are monotonically increasing. In this article, we present a new method for such regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear modelling to additive isotonic regression. It is therefore viable in many situations with high-dimensional predictor variables, where selection of significant versus insignificant variables is required. We suggest an algorithm involving a modification of the backfitting algorithm CPAV. We give a numerical convergence result and examine some of the method's properties through simulations. We also suggest possible extensions that improve performance and allow computation when the direction of monotonicity is unknown.
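As background, CPAV-style backfitting rests on the pool-adjacent-violators (PAV) algorithm, which fits a single monotonically increasing component by least squares; backfitting applies it cyclically to the partial residuals of each component, sorted by that component's predictor. The sketch below shows the PAV step only; LISO's sparsity modification is not included, and the function name `pava` is hypothetical.

```python
def pava(y, weights=None):
    """Weighted least-squares fit of a non-decreasing sequence to y (pool adjacent violators)."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([value, w, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each block's mean back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean, giving `[1, 2.5, 2.5, 4]`. Because each point is pushed once and every merge removes a block, one pass runs in linear time overall.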
