Researcher profile

Kai Jiang

Kai Jiang contributes to research discovery and scholarly infrastructure.

Researcher, affiliation not imported, open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, unclaimed author
12 works
0 followers
11 topics
4 close collaborators

Published work

12 published item(s)

preprint, 2026, arXiv

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBenchX, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than method design. Category explains nearly three times more variance in semantic correctness than method (9.4% vs 3.3% explained deviance), and 72% of Fusion tasks fail across all five methods while Math tasks are solved consistently. Second, iterative refinement improves correctness, but not performance. Across GEAK iterations, compile rate rises from 52.3% to 68.8% while average speedup declines from $1.58\times$ to $1.44\times$; newly rescued kernels consistently underperform persistently correct ones ($1.16\times$ vs $1.58\times$ speedup in round $0 \to 1$). Third, correctness does not imply efficiency. 46.6% of correct kernels are slower than the PyTorch eager baseline, and cross-hardware speedup variance reaches $21.4\times$. In addition, quantization remains completely unsolved (0/30 successes) despite non-trivial compilation rates, revealing systematic misunderstanding of numerical computation contracts rather than surface-level syntax errors. These findings suggest that future progress depends on handling global coordination, explicitly modeling numerical precision, and incorporating hardware efficiency into generation. The code is available at https://github.com/BonnieW05/KernelBenchX

preprint, 2026, arXiv

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions. First, we leverage the new FP4 Tensor Cores in Blackwell GPUs to accelerate attention computation. Our implementation achieves 1038 TOPS on RTX5090, which is a 5x speedup over the fastest FlashAttention on RTX5090. Experiments show that our FP4 attention can accelerate inference of various models in a plug-and-play way. Second, we pioneer the application of low-bit attention to training tasks. Existing low-bit attention works such as FlashAttention3 and SageAttention focus only on inference. However, the efficiency of training large models is also important. To explore whether low-bit attention can be effectively applied to training tasks, we design an accurate and efficient 8-bit attention for both forward and backward propagation. Experiments indicate that 8-bit attention achieves lossless performance in fine-tuning tasks but exhibits slower convergence in pretraining tasks. The code is available at https://github.com/thu-ml/SageAttention.

preprint, 2026, arXiv

Spectral Distribution of one-dimensional Photonic Quasicrystals: The Role of Irrational Numbers

In this paper, we construct a one-dimensional photonic quasicrystal by combining two incommensurate spatial harmonics, where the ratio of their periods is the irrational number β. We evaluate the photonic quasicrystal accurately by a generalized spectral method that embeds the quasiperiodic structure into a higher-dimensional periodic system. We study the spectral distribution of one-dimensional photonic quasicrystals and find some interesting phenomena. As the computational resolution N increases, there are more eigenvalues within finite frequency bandwidths, and the maximum localization always occurs at spectral gap edges for states near index N + 1. By varying β within the range (0, 1), we present a butterfly-shaped spectral structure with abundant band gaps. We find that the spectral structure factor Q (defined as I_{mg}/N, where I_{mg} is the maximum gap index) exhibits different linear patterns as β changes: Q = 1 - β when β < β_c, while Q = β when β > β_c, where β_c ≈ 0.424 is the transition point. This linear relationship holds robustly in the strong quasiperiodic regime (β away from 0 or 1) and is independent of the specific type of irrational number used. The relationship disappears (weak quasiperiodic regime) near β = 0 or β = 1. It demonstrates that the intrinsic spectral properties of one-dimensional photonic quasicrystals are fundamentally governed by the magnitude of the irrational parameter β.
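
For reference, the piecewise linear relation stated in this abstract can be written compactly as below. This only restates the abstract's claim in the strong quasiperiodic regime; the value of Q exactly at the transition point is not specified there and is left open here.

```latex
\[
Q(\beta) \;=\; \frac{I_{mg}}{N} \;=\;
\begin{cases}
1 - \beta, & \beta < \beta_c, \\[2pt]
\beta,     & \beta > \beta_c,
\end{cases}
\qquad \beta_c \approx 0.424 .
\]
```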

preprint, 2024, arXiv

Accurately recover global quasiperiodic systems by finite points

Quasiperiodic systems, related to irrational numbers, are space-filling structures with neither decay nor translation invariance. How to accurately recover these systems, especially in non-smooth cases, presents a big challenge in numerical computation. In this paper, we propose a new algorithm, the finite points recovery (FPR) method, which is applicable to both smooth and non-smooth cases, to address this challenge. The FPR method first establishes a homomorphism between the lower-dimensional definition domain of the quasiperiodic function and the higher-dimensional torus, then recovers the global quasiperiodic system by employing an interpolation technique with finite points in the definition domain without dimensional lifting. Furthermore, we develop accurate and efficient strategies for selecting finite points according to the arithmetic properties of irrational numbers. The corresponding mathematical theory, convergence analysis, and computational complexity analysis on choosing finite points are presented. Numerical experiments demonstrate the effectiveness and superiority of the FPR approach in recovering both smooth quasiperiodic functions and piecewise constant Fibonacci quasicrystals, whereas existing spectral methods encounter difficulties in accurately recovering non-smooth quasiperiodic functions.

preprint, 2024, arXiv

Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the novel FY-4A-Himawari-8 (FYH) dataset, which includes nine distinct cloud categories and uses precise domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution, thereby facilitating the training of supervised deep learning networks. Given the complexity and diversity of cloud formations, we have thoroughly analyzed the challenges inherent to cloud recognition tasks, examining the intricate characteristics and distribution of the data. To effectively address these challenges, we designed a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel multi-resolution cross-branch. We also integrated a distribution-aware loss (DAL) to mitigate the imbalance across cloud categories. An Interactive Attention Module (IAM) further enhances the robustness of feature extraction combined with spatial and channel information. Empirical evaluations on the FYH dataset demonstrate that our method outperforms other cloud recognition networks, achieving superior performance in terms of mean Intersection over Union (mIoU). The code for implementing DIAnet is available at https://github.com/icey-zhang/DIAnet.

preprint, 2022, arXiv

Adaptive Fairness-Aware Online Meta-Learning for Changing Environments

The fairness-aware online learning framework has arisen as a powerful tool for the continual lifelong learning setting. The goal for the learner is to sequentially learn new tasks as they arrive one after another over time, while ensuring statistical parity on each newly arriving task across different protected sub-populations (e.g. race and gender). A major drawback of existing methods is that they make heavy use of the i.i.d. assumption for data and hence provide static regret analysis for the framework. However, low static regret does not imply good performance in changing environments where tasks are sampled from heterogeneous distributions. To address the fairness-aware online learning problem in changing environments, in this paper, we first construct a novel regret metric FairSAR by adding long-term fairness constraints onto a strongly adapted loss regret. Furthermore, to determine a good model parameter at each round, we propose a novel adaptive fairness-aware online meta-learning algorithm, namely FairSAOML, which is able to adapt to changing environments in both bias control and model precision. The problem is formulated in the form of a bi-level convex-concave optimization with respect to the model's primal and dual parameters that are associated with the model's accuracy and fairness, respectively. The theoretical analysis provides sub-linear upper bounds for both loss regret and violation of cumulative fairness constraints. Our experimental evaluation on different real-world datasets with settings of changing environments suggests that the proposed FairSAOML significantly outperforms alternatives based on the best prior online learning approaches.
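
As a rough illustration of the statistical parity notion mentioned in this abstract, the sketch below computes the gap in positive-prediction rates between two protected groups. It uses the common textbook definition and is not taken from the paper; the function name `statistical_parity_gap`, the 0/1 group encoding, and the toy data are all assumptions made for illustration, and FairSAOML's actual constraint may be a relaxed or differently scaled variant.

```python
def statistical_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between groups 0 and 1.

    `predictions` holds binary model outputs (0 or 1); `groups` holds the
    protected-attribute value (0 or 1) of each sample. Both names and the
    binary encoding are illustrative assumptions.
    """
    rates = []
    for g in (0, 1):
        preds = [p for p, s in zip(predictions, groups) if s == g]
        if not preds:
            raise ValueError(f"no samples for group {g}")
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


# Hypothetical toy data: four samples per group.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = statistical_parity_gap(preds, groups)
print(gap)
```

A gap of zero would mean both groups receive positive predictions at the same rate; a fairness constraint of this kind typically bounds the gap by a small tolerance.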

preprint, 2021, arXiv

Transition pathways connecting crystals and quasicrystals

Due to structural incommensurability, the emergence of a quasicrystal from a crystalline phase represents a challenge to computational physics. Here the nucleation of quasicrystals is investigated by using an efficient computational method applied to a Landau free-energy functional. Specifically, transition pathways connecting different local minima of the Lifshitz-Petrich model are obtained by using the high-index saddle dynamics. Saddle points on these paths are identified as the critical nuclei of the 6-fold crystals and 12-fold quasicrystals. The results reveal that phase transitions between the crystalline and quasicrystalline phases could follow two possible pathways, corresponding to a one-stage phase transition and a two-stage phase transition involving a metastable lamellar quasicrystalline state, respectively.

preprint, 2020, arXiv

Complete integrability of diffeomorphisms and their local normal forms

In this paper, we consider the normal form problem of a commutative family of germs of diffeomorphisms at a fixed point, say the origin, of $\mathbb{K}^n$ ($\mathbb{K}=\mathbb{C}$ or $\mathbb{R}$). We define a notion of integrability of such a family. We give sufficient conditions which ensure that such an integrable family can be transformed into a normal form by an analytic (resp. a smooth) transformation if the initial diffeomorphisms are analytic (resp. smooth).

preprint, 2020, arXiv

High-order energy stable schemes of incommensurate phase-field crystal model

This article focuses on the development of high-order energy stable schemes for the multi-length-scale incommensurate phase-field crystal model which is able to study the phase behavior of aperiodic structures. These high-order schemes based on the scalar auxiliary variable (SAV) and spectral deferred correction (SDC) approaches are suitable for the $L^2$ gradient flow equation, i.e., the Allen-Cahn dynamic equation. Concretely, we propose a second-order Crank-Nicolson (CN) scheme of the SAV system, prove the energy dissipation law, and give the error estimate in the almost periodic function sense. Moreover, we use the SDC method to improve the computational accuracy of the SAV/CN scheme. Numerical results demonstrate the advantages of high-order numerical methods in numerical computations and show the influence of length-scales on the formation of ordered structures.

preprint, 2020, arXiv

Reinforcement Learning with Goal-Distance Gradient

Reinforcement learning usually trains agents on reward feedback from the environment. In practice, however, rewards are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or reward-free environments. Although shaped rewards are effective for sparse-reward tasks, they are limited to specific problems, and learning with them is susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the sparse-reward problem in general environments. Our method uses the minimum number of transitions between states as a distance that replaces the environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method based on the characteristics of our method to improve exploration efficiency, thereby solving more complex tasks. Experiments show that our method performs better than previous work on sparse-reward and local-optimum problems in complex environments.
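
To make the distance notion in this abstract concrete, the sketch below computes the minimum number of transitions between states on a small, fully known transition graph using breadth-first search. This is only an illustration of the quantity itself: the paper describes a model-free method, which would have to estimate such distances from experience rather than from a known graph. The function `min_transitions` and the toy graph are hypothetical.

```python
from collections import deque


def min_transitions(graph, start):
    """Minimum number of transitions from `start` to every reachable state.

    `graph` maps each state to the states reachable in one transition.
    Breadth-first search visits states in order of increasing distance,
    so the first time a state is seen its distance is minimal.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


# Hypothetical toy environment with five states.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
distances = min_transitions(toy_graph, "A")
print(distances["E"])
```

With a goal state fixed, such a distance gives a dense signal in every state (fewer remaining transitions is better), which is the role the abstract assigns to it in place of sparse environmental rewards.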

preprint, 2019, arXiv

Stability of three-dimensional icosahedral quasicrystals in multi-component systems

The relative stability of three-dimensional icosahedral quasicrystals in multi-component systems has been investigated based on a coupled-mode Swift-Hohenberg model with two length scales. A recently developed projection method, which provides a unified numerical framework to study periodic crystals and quasicrystals, is used to compute free energies to high accuracy. The advantages of the projection method over traditional approaches are also discussed in detail. A rigorous and systematic computation demonstrates that three-dimensional icosahedral quasicrystals and two-dimensional decagonal quasicrystals are stable phases in such a simple multi-component coupled-mode Swift-Hohenberg model. The result extends the multiple-length-scale interaction mechanism, which can stabilize quasicrystals, from single-component to multi-component systems.