Source author record

Yuqing Cheng

Yuqing Cheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, physics.optics, Sound, cond-mat.mes-hall

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2026, arXiv

Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation

A common design pattern in high-quality music generation is to handle structure and fidelity in different representation spaces: a generator first models high-level structure, followed by diffusion-based or neural decoding stages that reconstruct fine details. In this work, we explore an alternative view: both may be progressively modeled within a single deep acoustic-token hierarchy. To study this, we build a 64-layer residual vector quantization (RVQ) acoustic representation and propose a two-stage coarse-to-fine generation framework. A backbone model first generates coarse acoustic tokens for the full track, and a super-resolution model then completes finer tokens within the same acoustic token space. The super-resolution stage works at full-track scale and refines tokens layer by layer while running in parallel over time, leading to a fixed 62-step inference process. To jointly improve lyric alignment and fine-detail reconstruction, we further introduce hybrid-attention training: the alignment objective uses causal attention, while layer-wise refinement uses full attention. A key finding is that text-vocal alignment can emerge within pure acoustic-token language modeling, without requiring a separate semantic token stage. Moreover, initializing the super-resolution model from the trained backbone significantly improves convergence and final quality. Taken together, our results suggest that high-quality music generation can be effectively pursued without separating structure and fidelity into heterogeneous representation spaces. Instead, both can be progressively modeled within a unified acoustic-token hierarchy, pointing toward a simpler and more unified path to high-quality music generation.
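For readers unfamiliar with residual vector quantization, the coarse-to-fine idea in this abstract can be illustrated with a toy sketch. It is not the Khala implementation: the codebooks are random, there are only 4 layers instead of 64, and every function name is hypothetical. Each layer quantizes whatever residual the previous layers left behind, so decoding with more layers adds finer detail to the same vector.

```python
import random

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Quantize layer by layer; each layer codes the residual left so far."""
    residual = list(vec)
    codes = []
    for book in codebooks:
        idx = nearest(book, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes

def rvq_decode(codes, codebooks, num_layers=None):
    """Sum the chosen codewords of the first num_layers layers (default: all)."""
    used = codes if num_layers is None else codes[:num_layers]
    out = [0.0] * len(codebooks[0][0])
    for idx, book in zip(used, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out

def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_codebooks(num_layers, size, dim, seed=0):
    """Hypothetical random codebooks whose scale halves at each layer.
    Entry 0 is the zero vector, so an extra layer never increases the error."""
    rng = random.Random(seed)
    books = []
    for layer in range(num_layers):
        scale = 0.5 ** layer
        book = [[0.0] * dim]
        book += [[rng.uniform(-scale, scale) for _ in range(dim)]
                 for _ in range(size - 1)]
        books.append(book)
    return books

codebooks = make_codebooks(num_layers=4, size=16, dim=3)
frame = [0.3, -0.7, 0.5]                     # one toy acoustic frame
codes = rvq_encode(frame, codebooks)         # one token per layer
# Reconstruction error when decoding with 1, 2, 3, then 4 layers.
errors = [sq_error(frame, rvq_decode(codes, codebooks, n)) for n in range(1, 5)]
```

In this toy setup, `codes[:1]` plays the role of the coarse tokens and the remaining entries play the role of the finer tokens that a second stage would fill in. The `errors` list is non-increasing here only because each codebook contains the zero vector, which is an assumption of the sketch rather than a property claimed by the paper.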

preprint, 2026, arXiv

Modeling Music as a Time-Frequency Image: A 2D Tokenizer for Music Generation

Autoregressive music generation depends strongly on the audio tokenizer. Existing high-fidelity codecs often use residual multi-codebook quantization, which preserves reconstruction quality but complicates language modeling after sequence flattening, as the residual hierarchy imposes strong sequential dependencies and can amplify error accumulation. We propose BandTok, a generation-oriented 2D Mel-spectrogram tokenizer that represents each frame with Mel-frequency band tokens from a single shared codebook. This design yields a physically interpretable time-frequency token grid with a more independent token structure, making it better suited for autoregressive modeling. BandTok improves reconstruction with a multi-scale PatchGAN objective and EMA codebook updates. We further introduce an autoregressive language model with 2D Rotary Position Embedding (2D RoPE) to preserve temporal and frequency-band structure during generation. Experiments show that BandTok improves over residual-codebook tokenizers and achieves strong results in a data-limited setting. The source code and generation demos for this work are publicly available.
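The 2D RoPE idea mentioned in this abstract can be sketched as follows. This is one common way to build a two-axis rotary embedding and is an assumption here, not a description of BandTok's code: the first half of the feature vector is rotated by the time index and the second half by the frequency-band index. The names `rope_1d`, `rope_2d`, `t` and `band` are hypothetical.

```python
import math

def rope_1d(vec, pos, base=10000.0):
    """Standard rotary embedding: rotate consecutive pairs by pos-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def rope_2d(vec, t, band):
    """Assumed split: first half encodes the time step, second half the band index.
    The vector length must be divisible by 4."""
    half = len(vec) // 2
    return rope_1d(vec[:half], t) + rope_1d(vec[half:], band)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.2, -1.0, 0.5, 0.3, 0.7, -0.4, 0.1, 0.9]   # toy query vector
k = [1.0, 0.4, -0.6, 0.8, -0.2, 0.5, 0.3, -0.7]  # toy key vector

# Both pairs of positions differ by 3 time steps and 2 bands.
score_a = dot(rope_2d(q, 10, 3), rope_2d(k, 7, 1))
score_b = dot(rope_2d(q, 25, 12), rope_2d(k, 22, 10))
```

Because each half is a pure rotation, the attention score depends only on the relative offset along each axis, so `score_a` and `score_b` agree up to floating-point error. That relative-position property is what makes rotary embeddings a natural fit for a time-frequency token grid.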

preprint, 2022, arXiv

Absorption and photoluminescence properties of coupled plasmon-exciton (plexciton) systems

A plexciton is a new hybridized energy state that originates from the coupling between a plasmon and an exciton. To reveal the optical properties of both the exciton and the plexciton, we develop a classical oscillator model to describe their behavior. In particular, the coupled case, i.e., the plexciton, is investigated theoretically in detail. In strong coupling, electromagnetically induced transparency is achieved in the absorption spectra; the splitting behavior of the modes is carefully analyzed, and the splitting depends largely on the effective number of electrons and the resonance coupling; the photoluminescence spectra show that the spectral shapes remain almost unchanged for weak coupling and change substantially for strong coupling; the emission intensity of the exciton is strongly enhanced by the plasmon and can reach the order of $10^{10}$ for a general case. We also compare our model with published experiments to validate it. This work may be useful for understanding the mechanism of the plexciton and for the development of new applications.
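The mode splitting discussed in this abstract can be illustrated with the textbook two-coupled-oscillator formula. This is a generic sketch, not the model developed in the paper: the function names, the coupling strength `g`, and all numerical values are hypothetical, and the linewidths are taken as full widths, so amplitudes decay at half that rate. All quantities share one arbitrary unit, for example eV.

```python
import cmath

def hybrid_modes(w_pl, w_ex, gamma_pl, gamma_ex, g):
    """Complex eigenfrequencies of two coupled damped oscillators.
    w_pl, w_ex: plasmon and exciton resonance frequencies
    gamma_pl, gamma_ex: full linewidths
    g: coupling strength (hypothetical parameter)"""
    mean = (w_pl + w_ex) / 2 - 1j * (gamma_pl + gamma_ex) / 4
    detuning = (w_pl - w_ex) - 1j * (gamma_pl - gamma_ex) / 2
    root = cmath.sqrt(g * g + detuning * detuning / 4)
    return mean + root, mean - root

def mode_splitting(w_pl, w_ex, gamma_pl, gamma_ex, g):
    """Spectral separation between the two hybrid modes (real parts only)."""
    upper, lower = hybrid_modes(w_pl, w_ex, gamma_pl, gamma_ex, g)
    return abs(upper.real - lower.real)

# Resonant case, equal linewidths: the splitting equals 2 * g.
strong = mode_splitting(2.0, 2.0, 0.1, 0.1, 0.05)

# Resonant case, coupling smaller than a quarter of the linewidth difference:
# the square root becomes purely imaginary and the modes do not split.
weak = mode_splitting(2.0, 2.0, 0.2, 0.0, 0.02)
```

In this simplified picture, a real splitting appears at resonance only when `g` exceeds a quarter of the linewidth difference, which is one conventional way to separate the weak and strong coupling regimes.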

preprint, 2015, arXiv

Plasmonic nano-resonator enhanced one-photon luminescence from single gold nanorods

Strong Stokes and anti-Stokes one-photon luminescence from single gold nanorods is measured experimentally. The intensity and polarization of the Stokes and anti-Stokes emissions are found to be strongly correlated. Our experimental observations reveal a coherent process in light emission from single gold nanorods. We present a theoretical model, based on the concept of cavity resonance, for consistently understanding both Stokes and anti-Stokes photoluminescence. Our theory is in good agreement with all our measurements.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint