Source author record

Kai Jiang

Kai Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: cond-mat.soft, Machine Learning, Artificial Intelligence, math.NA, Numerical Analysis, physics.comp-ph, Computer Vision, cond-mat.mtrl-sci, Performance, cond-mat.mes-hall, Hardware Architecture, math-ph, math.DS, math.MP, physics.optics

Catalog footprint

What is connected

19 works

15 topics

4 close collaborators

preprint, 2026, arXiv

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBenchX, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than method design. Category explains nearly three times more variance in semantic correctness than method (9.4% vs 3.3% explained deviance), and 72% of Fusion tasks fail across all five methods while Math tasks are solved consistently. Second, iterative refinement improves correctness, but not performance. Across GEAK iterations, compile rate rises from 52.3% to 68.8% while average speedup declines from $1.58\times$ to $1.44\times$; newly rescued kernels consistently underperform persistently correct ones ($1.16\times$ vs $1.58\times$ speedup in round 0$\to$1). Third, correctness does not imply efficiency. 46.6% of correct kernels are slower than the PyTorch eager baseline, and cross-hardware speedup variance reaches $21.4\times$. In addition, quantization remains completely unsolved (0/30 successes) despite non-trivial compilation rates, revealing systematic misunderstanding of numerical computation contracts rather than surface-level syntax errors. These findings suggest that future progress depends on handling global coordination, explicitly modeling numerical precision, and incorporating hardware efficiency into generation. The code is available at https://github.com/BonnieW05/KernelBenchX

preprint, 2026, arXiv

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions. First, we leverage the new FP4 Tensor Cores in Blackwell GPUs to accelerate attention computation. Our implementation achieves 1038 TOPS on RTX5090, which is a 5x speedup over the fastest FlashAttention on RTX5090. Experiments show that our FP4 attention can accelerate inference of various models in a plug-and-play way. Second, we pioneer the application of low-bit attention to training tasks. Existing low-bit attention works like FlashAttention3 and SageAttention focus only on inference. However, the efficiency of training large models is also important. To explore whether low-bit attention can be effectively applied to training tasks, we design an accurate and efficient 8-bit attention for both forward and backward propagation. Experiments indicate that 8-bit attention achieves lossless performance in fine-tuning tasks but exhibits slower convergence in pretraining tasks. The code is available at https://github.com/thu-ml/SageAttention.

preprint, 2026, arXiv

Spectral Distribution of one-dimensional Photonic Quasicrystals: The Role of Irrational Numbers

In this paper, we construct a one-dimensional photonic quasicrystal by combining two incommensurate spatial harmonics, where the ratio of their periods is the irrational number β. We evaluate the photonic quasicrystal accurately by a generalized spectral method that embeds the quasiperiodic structure into a higher-dimensional periodic system. We study the spectral distribution of one-dimensional photonic quasicrystals and find some interesting phenomena. As the computational resolution N increases, there are more eigenvalues within finite frequency bandwidths, and the maximum localization always occurs at spectral gap edges for states near index N + 1. By varying β within the range (0,1), we present a butterfly-shaped spectral structure with abundant band gaps. We find that the spectral structure factor Q (defined as I_{mg}/N, where I_{mg} is the maximum gap index) exhibits different linear patterns as β changes: Q = 1 - β when β < βc, while Q = β when β > βc, where βc ≈ 0.424 is the transition point. This linear relationship holds robustly in the strong quasiperiodic regime (β away from 0 or 1) and is independent of the specific type of irrational number used. The relationship disappears (weak quasiperiodic regime) near β = 0 or β = 1. It demonstrates that the intrinsic spectral properties of one-dimensional photonic quasicrystals are fundamentally governed by the magnitude of the irrational parameter β.
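
For readability, the two linear regimes reported in this abstract can be collected into one piecewise expression. This only restates the abstract's claim in LaTeX; the symbols Q, β, I_{mg}, N and the transition value β_c ≈ 0.424 are taken from the abstract, and the behaviour exactly at β = β_c is not specified there, so it is left out.

```latex
Q(\beta) \;=\; \frac{I_{mg}}{N} \;=\;
\begin{cases}
  1 - \beta, & \beta < \beta_c, \\[2pt]
  \beta,     & \beta > \beta_c,
\end{cases}
\qquad \beta_c \approx 0.424 .
```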

preprint, 2024, arXiv

Accurately recover global quasiperiodic systems by finite points

Quasiperiodic systems, related to irrational numbers, are space-filling structures with neither decay nor translation invariance. Accurately recovering these systems, especially in non-smooth cases, is a major challenge in numerical computation. In this paper, we propose a new algorithm, the finite points recovery (FPR) method, which is applicable to both smooth and non-smooth cases, to address this challenge. The FPR method first establishes a homomorphism between the lower-dimensional definition domain of the quasiperiodic function and the higher-dimensional torus, then recovers the global quasiperiodic system by employing an interpolation technique with finite points in the definition domain without dimensional lifting. Furthermore, we develop accurate and efficient strategies for selecting finite points according to the arithmetic properties of irrational numbers. The corresponding mathematical theory, convergence analysis, and computational complexity analysis on choosing finite points are presented. Numerical experiments demonstrate the effectiveness and superiority of the FPR approach in recovering both smooth quasiperiodic functions and piecewise constant Fibonacci quasicrystals. By contrast, existing spectral methods encounter difficulties in accurately recovering non-smooth quasiperiodic functions.

preprint, 2024, arXiv

Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the novel FY-4A-Himawari-8 (FYH) dataset, which includes nine distinct cloud categories and uses precise domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution, thereby facilitating the training of supervised deep learning networks. Given the complexity and diversity of cloud formations, we have thoroughly analyzed the challenges inherent to cloud recognition tasks, examining the intricate characteristics and distribution of the data. To effectively address these challenges, we designed a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel multi-resolution cross-branch. We also integrated a distribution-aware loss (DAL) to mitigate the imbalance across cloud categories. An Interactive Attention Module (IAM) further enhances the robustness of feature extraction combined with spatial and channel information. Empirical evaluations on the FYH dataset demonstrate that our method outperforms other cloud recognition networks, achieving superior performance in terms of mean Intersection over Union (mIoU). The code for implementing DIAnet is available at https://github.com/icey-zhang/DIAnet.

preprint, 2022, arXiv

Adaptive Fairness-Aware Online Meta-Learning for Changing Environments

The fairness-aware online learning framework has arisen as a powerful tool for the continual lifelong learning setting. The goal for the learner is to sequentially learn new tasks as they arrive one after another over time, while ensuring the statistical parity of each newly arriving task across different protected sub-populations (e.g. race and gender). A major drawback of existing methods is that they make heavy use of the i.i.d. assumption for data and hence provide static regret analysis for the framework. However, low static regret does not imply good performance in changing environments where tasks are sampled from heterogeneous distributions. To address the fairness-aware online learning problem in changing environments, in this paper, we first construct a novel regret metric FairSAR by adding long-term fairness constraints onto a strongly adapted loss regret. Furthermore, to determine a good model parameter at each round, we propose a novel adaptive fairness-aware online meta-learning algorithm, namely FairSAOML, which is able to adapt to changing environments in both bias control and model precision. The problem is formulated in the form of a bi-level convex-concave optimization with respect to the model's primal and dual parameters that are associated with the model's accuracy and fairness, respectively. The theoretical analysis provides sub-linear upper bounds for both loss regret and violation of cumulative fairness constraints. Our experimental evaluation on different real-world datasets with settings of changing environments suggests that the proposed FairSAOML significantly outperforms alternatives based on the best prior online learning approaches.
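
The parity notion mentioned in this abstract can be made concrete with a small sketch. The code below computes the standard statistical (demographic) parity gap between two protected groups for binary predictions. It is an illustration of the general fairness criterion only, not the FairSAR metric or the FairSAOML algorithm; the function name and the toy data are hypothetical.

```python
def statistical_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between groups 0 and 1.

    predictions: iterable of 0/1 model outputs (hypothetical toy input).
    groups:      iterable of 0/1 protected-attribute labels, same length.
    """
    rates = {}
    for g in (0, 1):
        # Collect the predictions that belong to protected group g.
        members = [p for p, s in zip(predictions, groups) if s == g]
        if not members:
            raise ValueError(f"group {g} has no samples")
        rates[g] = sum(members) / len(members)
    return abs(rates[0] - rates[1])


# Toy example: group 0 receives a positive prediction half the time,
# group 1 every time.
toy_predictions = [1, 0, 1, 1]
toy_groups = [0, 0, 1, 1]
gap = statistical_parity_gap(toy_predictions, toy_groups)
```

A gap of zero means both groups receive positive predictions at the same rate; a long-term fairness constraint of the kind the abstract describes would bound a quantity like this across rounds.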

preprint, 2021, arXiv

Transition pathways connecting crystals and quasicrystals

Due to structural incommensurability, the emergence of a quasicrystal from a crystalline phase represents a challenge to computational physics. Here the nucleation of quasicrystals is investigated by using an efficient computational method applied to a Landau free-energy functional. Specifically, transition pathways connecting different local minima of the Lifshitz-Petrich model are obtained by using the high-index saddle dynamics. Saddle points on these paths are identified as the critical nuclei of the 6-fold crystals and 12-fold quasicrystals. The results reveal that phase transitions between the crystalline and quasicrystalline phases could follow two possible pathways, corresponding to a one-stage phase transition and a two-stage phase transition involving a metastable lamellar quasicrystalline state, respectively.

preprint, 2020, arXiv

Complete integrability of diffeomorphisms and their local normal forms

In this paper, we consider the normal form problem of a commutative family of germs of diffeomorphisms at a fixed point, say the origin, of $\mathbb{K}^n$ ($\mathbb{K}=\mathbb{C}$ or $\mathbb{R}$). We define a notion of integrability of such a family. We give sufficient conditions which ensure that such an integrable family can be transformed into a normal form by an analytic (resp. a smooth) transformation if the initial diffeomorphisms are analytic (resp. smooth).

preprint, 2020, arXiv

High-order energy stable schemes of incommensurate phase-field crystal model

This article focuses on the development of high-order energy stable schemes for the multi-length-scale incommensurate phase-field crystal model which is able to study the phase behavior of aperiodic structures. These high-order schemes based on the scalar auxiliary variable (SAV) and spectral deferred correction (SDC) approaches are suitable for the $L^2$ gradient flow equation, i.e., the Allen-Cahn dynamic equation. Concretely, we propose a second-order Crank-Nicolson (CN) scheme of the SAV system, prove the energy dissipation law, and give the error estimate in the almost periodic function sense. Moreover, we use the SDC method to improve the computational accuracy of the SAV/CN scheme. Numerical results demonstrate the advantages of high-order numerical methods in numerical computations and show the influence of length-scales on the formation of ordered structures.

preprint, 2020, arXiv

Reinforcement Learning with Goal-Distance Gradient

Reinforcement learning usually uses reward feedback from the environment to train agents. But rewards in real environments are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or reward-free environments. Although using shaped rewards is effective for sparse-reward tasks, it is limited to specific problems, and learning is also susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the problem of sparse rewards in general environments. Our method uses the minimum number of transitions between states as a distance to replace environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method based on the characteristics of our method to improve exploration efficiency, thereby solving more complex tasks. Experiments show that our method performs better on sparse reward and local optimal problems in complex environments than previous work.
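
The distance notion used in this abstract, the minimum number of transitions between two states, can be illustrated on a small, fully known state graph with breadth-first search. This is only a sketch of the quantity itself under the assumption that the transition graph is available as a dictionary; the paper's model-free method does not have such a graph, and the function name and toy environment below are hypothetical.

```python
from collections import deque


def transition_distance(transitions, start, goal):
    """Minimum number of transitions from start to goal, or None if unreachable.

    transitions: dict mapping a state to the list of states reachable
                 in one step (a hypothetical, fully known toy model).
    """
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy environment: a small directed graph of states.
toy_transitions = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}
steps_to_goal = transition_distance(toy_transitions, "A", "E")
```

Because the distance to the goal shrinks by one along any shortest path, it gives a dense signal even where the environment itself returns no reward, which is the motivation stated in the abstract.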

preprint, 2019, arXiv

Stability of three-dimensional icosahedral quasicrystals in multi-component systems

The relative stability of three-dimensional icosahedral quasicrystals in multi-component systems has been investigated based on a coupled-mode Swift-Hohenberg model with two length scales. A recently developed projection method, which provides a unified numerical framework to study periodic crystals and quasicrystals, is used to compute free energies to high accuracy. The advantages of the projection method over traditional approaches are also discussed in detail. A rigorous and systematic computation demonstrates that the three-dimensional icosahedral quasicrystal and the two-dimensional decagonal quasicrystal are stable phases in such a simple multi-component coupled-mode Swift-Hohenberg model. The result extends the multiple-length-scale interaction mechanism, which can stabilize quasicrystals, from single-component to multi-component systems.

preprint, 2016, arXiv

Dirac Fermions induced in strained zigzag phosphorus nanotubes and the applications in field effect transistors

In this work, Dirac fermions have been obtained and engineered in one-dimensional (1D) zigzag phosphorus nanotubes (ZPNTs). We have performed a comprehensive first-principles computational study of the electronic properties of ZPNTs with various diameters. The results indicate that as the lattice parameter (Lc) along the axial direction increases, ZPNTs undergo transitions from metal to semimetal and semimetal to semiconductor, whereas Dirac fermions appear at Lc ranging from 3.90 Å to 4.10 Å. In particular, a field effect transistor (FET) based on a 12-ZPNT (with 12 unit cells in the transverse direction) exhibits semiconductor behavior with efficient gate-effect modulation at Lc = 4.60 Å. However, only weak gate modulation is demonstrated when the nanotube becomes a semimetal at Lc = 4.10 Å. This study indicates that ZPNTs are highly appealing for strain-sensor applications. Our findings pave the way for the development of high-performance strain-engineered electronics based on Dirac fermions in 1D materials.

preprint, 2016, arXiv

Strain effect engineered in α-Al2O3/monolayer MoS2 interface by first principle calculations

With the advances in metal oxide semiconductor field effect transistors (MOSFETs) based on low-dimensional transition metal dichalcogenides (TMDCs), the interface between semiconductors and dielectrics has received considerable attention due to its dramatic effects on the morphology and charge transport of semiconductors. In this study, first-principles calculations were utilized to investigate the strain effect induced by the interface between Al2O3 (0001) and monolayer MoS2. The results indicate that Al2O3 with a thickness of 1.3 nm can apply a strain of 0.3% to the MoS2 monolayer. The strain effect increases monotonically with the thickness of the dielectric layer. Also, the study of the temperature effect indicates a monotonic lattice expansion induced by higher temperature. Our study proposes that dielectric engineering can be an effective tool for strain engineering in nanotechnology.

preprint, 2015, arXiv

Self-Assembly of Asymmetrically Interacting ABC Star Triblock Copolymer Melts

The phase behavior of asymmetrically interacting ABC star triblock copolymer melts is investigated by the self-consistent field theory (SCFT). Motivated by the experimental systems, in this study, we focus on the systems in which the Flory-Huggins interaction parameters satisfy $\chi_{AC}>\chi_{BC}\approx \chi_{AB}$. Using various initialization strategies, a large number of periodic structures have been obtained in our calculations. A fourth-order pseudospectral algorithm combined with the Anderson mixing method is used to compute the free energy of candidate structures carefully. The stability has been analyzed in detail by splitting the free energy into internal and entropic parts. A complete and complex triangular phase diagram is presented for a model with $\chi_{AC}>\chi_{BC}= \chi_{AB}$ in which fifteen ordered phases, including two- and three-dimensional structures, have been predicted to be stable from the SCFT calculations. Generally speaking, with the asymmetrical interactions, the hierarchical structures tend to be formed near the B-rich corner of the triangular phase diagram. This work broadens the previous theoretical results from equal interaction systems to unequal interaction systems. The predicted phase behavior is in good agreement with experimental observations and previous theoretical results.

preprint, 2015, arXiv

Stability of Soft Quasicrystals in a Coupled-Mode Swift-Hohenberg Model for Three-Component Systems

In this article, we discuss the stability of soft quasicrystalline phases in a coupled-mode Swift-Hohenberg model for three-component systems, where the characteristic length scales are governed by the positive-definite gradient terms. The classic two-mode approximation method and direct numerical minimization are applied to the model. In the latter approach, we apply the projection method to deal with the potentially quasiperiodic ground states. A variable cell method of optimizing the shape and size of the higher-dimensional periodic cell is developed to minimize the free energy with respect to the order parameters. Based on the developed numerical methods, we rediscover decagonal and dodecagonal quasicrystalline phases, and find diverse periodic phases and complex modulated phases. Furthermore, phase diagrams are obtained in various phase spaces by comparing the free energies of different candidate structures. The results show not only the important roles of the system parameters but also the effect of optimizing the computational domain. In particular, the optimization of the computational cell allows us to capture the ground states and phase behavior with higher fidelity. We also discuss our results and show the potential of applying our numerical methods to a larger class of mean-field free energy functionals.

preprint, 2015, arXiv

Stability of Two-Dimensional Soft Quasicrystals

The relative stability of two-dimensional soft quasicrystals is examined using a recently developed projection method which provides a unified numerical framework to compute the free energy of periodic crystal and quasicrystals. Accurate free energies of numerous ordered phases, including dodecagonal, decagonal and octagonal quasicrystals, are obtained for a simple model, i.e. the Lifshitz-Petrich free energy functional, of soft quasicrystals with two length-scales. The availability of the free energy allows us to construct phase diagrams of the system, demonstrating that, for the Lifshitz-Petrich model, the dodecagonal and decagonal quasicrystals can become stable phases, whereas the octagonal quasicrystal stays as a metastable phase.

preprint, 2013, arXiv

Analytic Structure of the SCFT Energy Functional of Multicomponent Block Copolymers

This paper concerns the analytic structure of the self-consistent field theory (SCFT) energy functional of multicomponent block copolymer systems which contain more than two chemically distinct blocks. The SCFT has enjoyed considerable success and wide usage in the investigation of the complex phase behavior of block copolymers. It is well known that the physical solutions of the SCFT equations are saddle points; however, the analytic structure of the SCFT energy functional has received little attention over the years. A recent work by Fredrickson and collaborators [see the monograph by Fredrickson, The Equilibrium Theory of Inhomogeneous Polymers, (2006), pp. 203-209] has analysed the mathematical structure of the field energy functional for polymeric systems, and clarified the index-1 saddle point nature of the problem produced by the incompressibility constraint. In this paper, our goal is to draw further attention to multicomponent block copolymers utilizing the Hubbard-Stratonovich transformation used by Fredrickson and co-workers. We first show that the saddle point character of the SCFT energy functional of multicomponent block copolymer systems may be high index, produced not only by the incompressibility constraint but also by the Flory-Huggins interaction parameters. Our analysis will be beneficial to many theoretical studies, such as the nucleation theory of ordered phases and mesoscopic dynamics. As an application, we then utilize this discovery to develop gradient-based iterative schemes to solve the SCFT equations, and illustrate their performance through several numerical experiments taking ABC star triblock copolymers as an example.

preprint, 2013, arXiv

Numerical Methods for Quasicrystals

Quasicrystals are one kind of space-filling structure. The traditional crystalline approximant method utilizes periodic structures to approximate quasicrystals. The errors of this approach come from two parts: the numerical discretization, and the approximation error of Simultaneous Diophantine Approximation, which also determines the size of the domain necessary for an accurate solution. As the approximation error decreases, the computational complexity grows rapidly, and moreover, the approximation error always exists unless the computational region is the full space. In this work we focus on the development of a numerical method to compute quasicrystals with high accuracy. With the help of higher-dimensional reciprocal space, a new projection method is developed to compute quasicrystals. The approach enables us to calculate quasicrystals rather than crystalline approximants. Compared with the crystalline approximant method, the projection method overcomes the restrictions of the Simultaneous Diophantine Approximation, and can also use periodic boundary conditions conveniently. Meanwhile, the proposed method efficiently reduces the computational complexity by working in a unit cell and using the pseudospectral method. For illustrative purposes we work with the Lifshitz-Petrich model, though our present algorithm will apply to more general systems including quasicrystals. We find that the projection method can maintain the rotational symmetry accurately. More significantly, the algorithm can calculate the free energy density to high precision.
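
The idea underlying the higher-dimensional viewpoint in this abstract is that a quasiperiodic function can be read off as a slice of a periodic function in more dimensions. The sketch below shows this for the simplest case: a one-dimensional function with two incommensurate frequencies obtained by restricting a two-dimensional periodic function to an irrational line. It is a minimal illustration of the embedding only, not the paper's projection method, which works in reciprocal space with a pseudospectral discretization; the function names and the choice of frequency ratio are hypothetical.

```python
import math

# Hypothetical irrational frequency ratio (inverse golden mean).
BETA = (math.sqrt(5) - 1) / 2


def parent_function(y1, y2):
    """Two-dimensional function that is 2*pi-periodic in each variable."""
    return math.cos(y1) + math.cos(y2)


def quasiperiodic(x):
    """One-dimensional quasiperiodic function: the parent function
    restricted to the irrational line (y1, y2) = (x, BETA * x)."""
    return parent_function(x, BETA * x)


def direct(x):
    """The same function written directly with two incommensurate frequencies."""
    return math.cos(x) + math.cos(BETA * x)


# The slice and the direct formula agree at every sample point.
sample_points = [0.0, 0.7, 3.1, 12.5, 100.0]
max_mismatch = max(abs(quasiperiodic(x) - direct(x)) for x in sample_points)
```

Because the parent function is periodic, it can be handled in a single unit cell with periodic boundary conditions, which is the practical advantage the abstract attributes to working in the higher-dimensional setting instead of using a large periodic approximant.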
