Source author record

Hongtao Yu

Hongtao Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Machine Learning · Computation and Language · Hardware Architecture · Multiagent Systems · Performance · physics.chem-ph

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators


Preprint · 2026 · arXiv

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges: model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve, an agentic kernel coding framework, to tackle heterogeneity at scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate kernel generation and optimization for recommendation models across heterogeneous hardware architectures. It does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware-agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is formulated as a graph-based search with a selection policy, a universal operator, a fitness function, and a termination rule, and it dynamically adapts to the runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly available KernelBench suite, achieving a 100% pass rate on all 250 problems across three difficulty levels, and on 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and heterogeneous AI systems at scale. Beyond efficiency improvements, KernelEvolve significantly lowers the programmability barrier for new AI hardware by enabling automated kernel generation for in-house AI hardware.
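The search formulation named in the abstract (selection policy, operator, fitness function, termination rule) can be illustrated with a toy sketch. This is not KernelEvolve's implementation: here a candidate is a single hypothetical number standing in for a kernel variant, the operator is a random perturbation rather than an LLM-driven rewrite, and the fitness function is a made-up score rather than measured correctness and runtime. The names `graph_search`, `Node`, `score` and `perturb` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """One explored candidate; `parent` links it to the node it was derived from."""
    node_id: int
    candidate: float
    fitness: float
    parent: Optional[int] = None


def graph_search(
    seed: float,
    fitness: Callable[[float], float],
    operator: Callable[[float, random.Random], float],
    max_nodes: int = 200,
    target_fitness: float = 0.0,
    rng_seed: int = 0,
) -> List[Node]:
    """Toy graph-based search (hypothetical sketch); returns every explored node."""
    rng = random.Random(rng_seed)
    nodes = [Node(0, seed, fitness(seed))]
    # Termination rule: stop when the node budget is spent or the target is reached.
    while len(nodes) < max_nodes and max(n.fitness for n in nodes) < target_fitness:
        # Selection policy: tournament over up to 3 random nodes, keep the fittest.
        contenders = rng.sample(nodes, k=min(3, len(nodes)))
        parent = max(contenders, key=lambda n: n.fitness)
        # Operator: derive a child candidate from the selected parent.
        child = operator(parent.candidate, rng)
        # Fitness function: score the child and attach it to the search graph.
        nodes.append(Node(len(nodes), child, fitness(child), parent.node_id))
    return nodes


if __name__ == "__main__":
    # Hypothetical objective: pretend the best "kernel" uses a tile size of 37.
    def score(x: float) -> float:
        return -abs(x - 37.0)

    def perturb(x: float, rng: random.Random) -> float:
        return x + rng.gauss(0.0, 4.0)

    explored = graph_search(0.0, score, perturb, max_nodes=500, target_fitness=-0.5)
    best = max(explored, key=lambda n: n.fitness)
    print(len(explored), round(best.candidate, 2), best.parent)
```

Because each child records its parent, the explored candidates form a search tree, a special case of a search graph; trying a different selection policy or termination rule only changes the corresponding line of the loop.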

Preprint · 2022 · arXiv

NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

NeuralKG is an open-source, Python-based library for diverse representation learning of knowledge graphs. It implements three families of Knowledge Graph Embedding (KGE) methods: conventional KGEs, GNN-based KGEs, and rule-based KGEs. Within a unified framework, NeuralKG reproduces the link prediction results of these methods on benchmarks, freeing users from the laborious task of reimplementing them, especially methods originally written in languages other than Python. In addition, NeuralKG is highly configurable and extensible. It provides various decoupled modules that can be mixed and adapted to each other. Thus, with NeuralKG, developers and researchers can quickly implement their own models and find the optimal training methods to achieve the best performance efficiently. We built a website at http://neuralkg.zjukg.cn to organize an open and shared KG representation learning community. All source code is publicly released at https://github.com/zjukg/NeuralKG.
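As an illustration of the "conventional KGE" family and of the link prediction evaluation the abstract mentions, the sketch below scores triples with TransE (a standard translational model, picked here only as an example; this is not NeuralKG's API) and computes the raw rank of the correct tail entity. The two-dimensional embeddings and the names `transe_score`, `tail_rank`, `ENTITIES` and `CAPITAL_OF` are hypothetical, hand-picked toy values rather than trained ones.

```python
import math
from typing import Dict, List

Vector = List[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility -||h + r - t||_2: closer to 0 means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(h: Vector, r: Vector, true_tail: str, entities: Dict[str, Vector]) -> int:
    """Raw (unfiltered) rank of the true tail among all entities; 1 is best."""
    true_score = transe_score(h, r, entities[true_tail])
    better = sum(
        1
        for name, emb in entities.items()
        if name != true_tail and transe_score(h, r, emb) > true_score
    )
    return better + 1


# Hypothetical toy embeddings: "capital_of" translates a city onto its country.
ENTITIES: Dict[str, Vector] = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "berlin": [0.0, 0.0],
    "germany": [0.0, 1.0],
}
CAPITAL_OF: Vector = [0.0, 1.0]

if __name__ == "__main__":
    print(tail_rank(ENTITIES["paris"], CAPITAL_OF, "france", ENTITIES))  # 1
    print(tail_rank(ENTITIES["paris"], CAPITAL_OF, "berlin", ENTITIES))  # 4
```

The usual link prediction metrics, mean reciprocal rank and Hits@k, average 1/rank and the indicator rank ≤ k over test triples; the filtered setting additionally skips other known true triples when counting better-scored entities.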

Preprint · 2020 · arXiv

Enhanced Catalytic Activity of Gold@Polydopamine Nanoreactors with Multi-compartment Structure Under NIR Irradiation

Photothermal conversion (PTC) nanostructures have great potential for applications in many fields and have therefore attracted tremendous attention. However, constructing a PTC nanoreactor with a multi-compartment structure that combines unique chemical properties with structural features is still challenging because of synthetic difficulties. Herein, we designed and synthesized a catalytically active PTC gold (Au)@polydopamine (PDA) nanoreactor driven by infrared irradiation, using assembled PS-b-P2VP nanospheres as soft templates. The particles exhibit a multi-compartment structure, revealed by 3D electron tomography. They feature permeable shells with tunable thickness. The full kinetics of the reduction of 4-nitrophenol were investigated using these particles as nanoreactors and compared with those of other reported systems. Notably, a remarkable acceleration of the catalytic reaction upon near-infrared irradiation is demonstrated, which reveals for the first time the importance of the synergy between photothermal conversion and the complex inner structure for the kinetics of the catalytic reduction. The ease of synthesis and the fresh insights into catalysis will provide a new platform for novel nanoreactor studies.
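Kinetics of 4-nitrophenol reduction are often summarized by an apparent rate constant k_app, assuming pseudo-first-order behavior in 4-nitrophenol (typically with a large excess of reductant) and tracking the 4-nitrophenolate absorbance near 400 nm, so that ln(A_t / A_0) = -k_app · t. The abstract does not say which kinetic model the authors used, so the sketch below only illustrates this common simplification; the function name `apparent_rate_constant` and all numbers in it are hypothetical, not data from the paper.

```python
import math
from typing import Sequence


def apparent_rate_constant(times_s: Sequence[float], absorbances: Sequence[float]) -> float:
    """Fit ln(A_t / A_0) = -k_app * t by least squares through the origin.

    Assumes times_s[0] == 0 so that absorbances[0] is A_0, and that the decay is
    pseudo-first-order (any induction period already removed). Returns k_app in 1/s.
    """
    a0 = absorbances[0]
    ys = [-math.log(a / a0) for a in absorbances]  # -ln(A_t / A_0) = k_app * t
    num = sum(t * y for t, y in zip(times_s, ys))
    den = sum(t * t for t in times_s)
    return num / den


if __name__ == "__main__":
    # Synthetic, noise-free decay curves (hypothetical values, not measurements).
    times = [0.0, 60.0, 120.0, 180.0, 240.0]
    dark = [1.2 * math.exp(-0.002 * t) for t in times]
    nir = [1.2 * math.exp(-0.005 * t) for t in times]
    k_dark = apparent_rate_constant(times, dark)
    k_nir = apparent_rate_constant(times, nir)
    print(f"k_dark={k_dark:.4f} 1/s, k_nir={k_nir:.4f} 1/s, ratio={k_nir / k_dark:.1f}")
```

Fitting through the origin enforces A_t = A_0 at t = 0; comparing k_app with and without irradiation is one simple way to quantify an acceleration like the one the abstract reports.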