Source author record

Mengxi Wu

Mengxi Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.atom-ph, Machine Learning, physics.optics, quant-ph, Artificial Intelligence, Computation and Language, cond-mat.mes-hall

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths

Designing a unified neural network to efficiently and inherently process sequential data with arbitrary lengths is a central and challenging problem in sequence modeling. The design choices in the Transformer, including quadratic complexity and weak length extrapolation, have limited its ability to scale to long sequences. In this work, we propose Gecko, a neural architecture that inherits the design of Mega and Megalodon (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability to capture long-range dependencies, including timestep decay normalization, a sliding chunk attention mechanism, and adaptive working memory. In a controlled pretraining comparison with Llama2 and Megalodon at the scale of 7 billion parameters and 2 trillion training tokens, Gecko achieves better efficiency and long-context scalability. Gecko reaches a training loss of 1.68, significantly outperforming Llama2-7B (1.75) and Megalodon-7B (1.70), and landing close to Llama2-13B (1.67). Notably, without relying on any context-extension techniques, Gecko exhibits inherent long-context processing and retrieval capabilities, stably handling sequences of up to 4 million tokens and retrieving information from contexts up to 4× longer than its attention window. Code: https://github.com/XuezheMax/gecko-llm

preprint · 2026 · arXiv

GQA-μP: The maximal parameterization update for grouped query attention

Hyperparameter transfer across model architectures dramatically reduces the amount of compute necessary for tuning large language models (LLMs). The maximal update parameterization (μP) ensures transfer through principled mathematical analysis but can be challenging to derive for new model architectures. Building on the spectral feature-learning view of Yang et al. (2023a), we make two advances. First, we promote spectral norm conditions on the weights from a heuristic to the definition of feature learning, and as a consequence arrive at the Complete-P depth and weight-decay scalings without recourse to lazy-learning. Second, we consider a modified spectral norm that preserves the valid scaling law of network weights when weight matrices are not full rank. This enables (to our knowledge, the first) derivation of μP scalings for grouped-query attention (GQA). We demonstrate the efficacy of our theoretical derivations by showing learning rate transfer across the GQA repetition hyperparameter as well as experiments regarding transfer over weight decay.

preprint · 2022 · arXiv

Conditional Seq2Seq model for the time-dependent two-level system

We apply the deep learning neural network architecture to the two-level system in quantum optics to solve the time-dependent Schrödinger equation. By carefully designing the network structure and tuning parameters, above 90 percent accuracy in super long-term predictions can be achieved in the case of random electric fields, which indicates a promising new method to solve the time-dependent equation for two-level systems. By slightly modifying this network, we think that this method can solve the two- or three-dimensional time-dependent Schrödinger equation more efficiently than traditional approaches.

preprint · 2020 · arXiv

Attosecond synchronization of extreme ultraviolet high harmonics from crystals

The interaction of strong near-infrared (NIR) laser pulses with wide-bandgap dielectrics produces high harmonics in the extreme ultraviolet (XUV) wavelength range. These observations have opened up the possibility of attosecond metrology in solids, which would benefit from a precise measurement of the emission times of individual harmonics with respect to the NIR laser field. Here we show that, when high harmonics are detected from the input surface of a magnesium oxide crystal, a bichromatic probing of the XUV emission shows a clear synchronization largely consistent with a semiclassical model of electron-hole recollisions in bulk solids. On the other hand, the bichromatic spectrogram of harmonics originating from the exit surface of the 200 μm-thick crystal is strongly modified, indicating the influence of laser field distortions during propagation. Our tracking of sub-cycle electron and hole recollisions at XUV energies is relevant to the development of solid-state sources of attosecond pulses.

preprint · 2016 · arXiv

Multi-level perspective on high-order harmonic generation in solids

We investigate high-order harmonic generation in a solid, modeled as a multi-level system dressed by a strong infrared laser field. We show that the cutoff energies and the relative strengths of the multiple plateaus that emerge in the harmonic spectrum can be understood both qualitatively and quantitatively by considering a combination of adiabatic and diabatic processes driven by the strong field. Such a model was recently used to interpret the multiple plateaus exhibited in harmonic spectra generated by solid argon and krypton [Ndabashimiye et al., Nature 534, 520 (2016)]. We also show that when the multi-level system originates from the Bloch state at the Γ point of the band structure, the laser-dressed states are equivalent to the Houston states [Krieger et al., Phys. Rev. B 33, 5494 (1986)] and will therefore map out the band structure away from the Γ point as the laser field increases. This leads to a semi-classical three-step picture in momentum space that describes the high-order harmonic generation process in a solid.

preprint · 2015 · arXiv

High harmonic generation from Bloch electrons in solids

We study the generation of high harmonic radiation by Bloch electrons in a model transparent solid driven by a strong mid-infrared laser field. We solve the single-electron time-dependent Schrödinger equation (TDSE) using a velocity-gauge method [New J. Phys. 15, 013006 (2013)] that is numerically stable as the laser intensity and number of energy bands are increased. The resulting harmonic spectrum exhibits a primary plateau due to the coupling of the valence band to the first conduction band, with a cutoff energy that scales linearly with field strength and laser wavelength. We also find a weaker second plateau due to coupling to higher-lying conduction bands, with a cutoff that is also approximately linear in the field strength. To facilitate the analysis of the time-frequency characteristics of the emitted harmonics, we also solve the TDSE in a time-dependent basis set, the Houston states [Phys. Rev. B 33, 5494 (1986)], which allows us to separate inter-band and intra-band contributions to the time-dependent current. We find that the inter-band and intra-band contributions display very different time-frequency characteristics. We show that solutions in these two bases are equivalent under a unitary transformation but that, unlike the velocity-gauge method, the Houston state treatment is numerically unstable when more than a few low-lying energy bands are used.

preprint · 2014 · arXiv

Multiphoton transitions for delay-zero calibration in attosecond spectroscopy

The exact delay-zero calibration in an attosecond pump-probe experiment is important for the correct interpretation of experimental data. In attosecond transient absorption spectroscopy the determination of the delay-zero exclusively from the experimental results is not straightforward and may introduce significant errors. Here, we report the observation of quarter-laser-cycle (4ω) oscillations in a transient absorption experiment in helium using an attosecond pulse train overlapped with a precisely synchronized, moderately strong infrared pulse. We demonstrate how to extract and calibrate the delay-zero with the help of the highly nonlinear 4ω signal. A comparison with the solution of the time-dependent Schrödinger equation is used to confirm the accuracy and validity of the approach. Moreover, we study the mechanisms behind the quarter-laser-cycle and the better-known half-laser-cycle oscillations as a function of experimental parameters. This investigation yields an indication of the robustness of our delay-zero calibration approach.

preprint · 2013 · arXiv

Quantum interference in attosecond transient absorption of laser-dressed helium atoms

We calculate the transient absorption of an isolated attosecond pulse by helium atoms subject to a delayed infrared (IR) laser pulse. With the central frequency of the broad attosecond spectrum near the ionization threshold, the absorption spectrum is strongly modulated at the sub-IR-cycle level. Given that the absorption spectrum results from a time-integrated measurement, we investigate the extent to which the delay-dependence of the absorption yields information about the attosecond dynamics of the atom-field energy exchange. We find two configurations in which this is possible. The first involves multiphoton transitions between bound states that result in interference between different excitation pathways. The other involves the modification of the bound-state absorption lines by the IR field, which we find can result in a sub-cycle time dependence only when ionization limits the duration of the strong field interaction.