State-Space NTK Collapse Near Bifurcations
Rich feature learning in tasks that unfold over time often requires the model to pass through bifurcations, which are qualitative changes in the underlying model dynamics. We develop a local theory of gradient descent near these transitions using the empirical state-space neural tangent kernel (sNTK). Our central finding is that bifurcations both dominate and simplify learning dynamics: near a bifurcation, the sNTK can be reduced to a rank-one operator corresponding to learning in a classical normal form system, which gives an analytically tractable description of the local learning geometry even for high-dimensional recurrent systems. Concretely, we give a procedure for decomposing the sNTK into bifurcation-relevant and residual channels, and show that near common codimension-1 bifurcations the relevant channel is a highly amplified rank-one operator. This amplification causes the bifurcation channel to dominate the full sNTK. Thus, bifurcations locally warp the learning landscape, funneling gradient descent into a few critical dynamical directions and making the nearby kernel and loss geometry predictable from classical normal forms. We illustrate this in a student-teacher recurrent neural network: the first learned bifurcation coincides with a sharp collapse in the sNTK effective rank and the emergence of a dominant parameter direction whose restricted sNTK closely matches the landscape predicted by the scalar pitchfork normal form. Finally, we show that low-rank natural gradient methods resolve the resulting learning instability near bifurcations with very little overhead over SGD.
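
The link between amplification and rank collapse can be illustrated with a toy calculation. The sketch below is not the decomposition procedure described above; it is a minimal NumPy example built on assumptions introduced here for illustration. It assumes the supercritical pitchfork normal form dx/dt = mu*x - x**3, whose stable branch x* = sqrt(mu) has sensitivity dx*/dmu = 1/(2*sqrt(mu)), and it assumes the gain of the rank-one channel scales as that sensitivity squared. The residual channel is a random positive semidefinite matrix, and "effective rank" is taken to be the participation ratio of the eigenvalues, which may differ from the definition used in the paper. The names `effective_rank`, `toy_kernel`, and `pitchfork_sensitivity` are hypothetical.

```python
import numpy as np


def effective_rank(K):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Assumed definition of effective rank; equals n for the n x n identity
    and 1 for a rank-one matrix.
    """
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # K is symmetric PSD
    return eig.sum() ** 2 / (eig ** 2).sum()


def toy_kernel(gain, dim=20, seed=0):
    """Toy kernel: gain * u u^T (bifurcation channel) + random PSD residual."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((dim, dim))
    residual = A @ A.T / dim  # full-rank residual channel, trace close to dim
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)  # unit vector spanning the rank-one channel
    return gain * np.outer(u, u) + residual


def pitchfork_sensitivity(mu):
    """d x*/d mu on the stable branch x* = sqrt(mu) of dx/dt = mu*x - x**3."""
    return 1.0 / (2.0 * np.sqrt(mu))


# Approach the bifurcation at mu = 0 from the ordered side (mu > 0).
for mu in (1.0, 1e-1, 1e-2, 1e-3):
    gain = pitchfork_sensitivity(mu) ** 2  # assumed scaling of the channel gain
    rank = effective_rank(toy_kernel(gain))
    print(f"mu={mu:g}  gain={gain:g}  effective rank={rank:.2f}")
```

In this toy setting the sensitivity diverges as mu approaches 0, so the rank-one term eventually outweighs the residual and the participation ratio falls toward 1. This mirrors, in a deliberately simplified form, the collapse in effective rank described above.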