Source author record

Jiamin Chen

Jiamin Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language astro-ph.GA

Catalog footprint

What is connected

3works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text--a property absent in chunk-based simulation approaches that split contiguous context. We provide a theoretical analysis grounded in Rotary Position Embedding and the Bernstein inequality, showing that position interpolation induces a rigorous smoothness constraint over the attention function, with shared Transformer parameters further suppressing unstable extrapolation to unobserved intermediate distances. Applied to LLaMA-family models extending the context window from 8K to 64K, EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while requiring substantially less computation. These results demonstrate that long-context generalization can be induced from sparse positional supervision, challenging the prevailing assumption that dense long-sequence training is necessary for reliable context-window extension. The code is available at https://github.com/clx1415926/EndPrompt.

preprint2026arXiv

Measuring Maximum Activations in Open Large Language Models

The dynamic range of activations is a first-order constraint for low-bit quantization, activation scaling, and stable LLM inference. Prior work characterized outlier features and massive activations on pre-2024 LLaMA-style models, and the downstream activation-quantization stack inherits that picture without revisiting it for the post-LLaMA open-model boom. We ask the deployment-oriented question: how large can activations get in modern open LLMs, and how does this magnitude vary across families, generations, and training stages? Under a unified pipeline (5,000-sample multi-domain corpus, family-specific tokenization, identical hooks across embeddings, hidden states, attention, MLP/MoE, SwiGLU gates, and final norm), we measure global and layerwise maxima on 27 checkpoints from 8 open families spanning dense, MoE, vision-language, intermediate-training, and instruction-tuned variants. We find that (i) global maxima span over nearly four orders of magnitude at comparable parameter counts, with Qwen3.5 and MoE checkpoints in the 10^2 to 10^3 range and Gemma3-27B-it reaching ~7 x 10^5; (ii) cross-family and cross-generation comparisons break simple monotonic scaling; and (iii) MoE checkpoints exhibit 14.0-23.4x lower peaks than matched-scale dense counterparts, while the residual stream carries the global maximum in 22/24 checkpoints. A lightweight INT-8 sanity check shows that measured maxima co-vary with low-bit reconstruction error via activation-scale selection. We conclude that maximum activation magnitude is a model property tied to family, architecture, and training stage - not a simple byproduct of size - and should be measured and reported alongside any open-weight release before low-bit deployment. The code is publicly available at https://github.com/clx1415926/Max_act_llm.

preprint2022arXiv

Identification of new M31 star cluster candidates from PAndAS images using convolutional neural networks

Context.Identification of new star cluster candidates in M31 is fundamental for the study of the M31 stellar cluster system. The machine-learning method convolutional neural network (CNN) is an efficient algorithm for searching for new M31 star cluster candidates from tens of millions of images from wide-field photometric surveys. Aims.We search for new M31 cluster candidates from the high-quality $g$- and $i$-band images of 21,245,632 sources obtained from the Pan-Andromeda Archaeological Survey (PAndAS) through a CNN. Methods.We collected confirmed M31 clusters and noncluster objects from the literature as our training sample. Accurate double-channel CNNs were constructed and trained using the training samples. We applied the CNN classification models to the PAndAS $g$- and $i$-band images of over 21 million sources to search new M31 cluster candidates. The CNN predictions were finally checked by five experienced human inspectors to obtain high-confidence M31 star cluster candidates. Results.After the inspection, we identified a catalogue of 117 new M31 cluster candidates. Most of the new candidates are young clusters that are located in the M31 disk. Their morphology, colours, and magnitudes are similar to those of the confirmed young disk clusters. We also identified eight globular cluster candidates that are located in the M31 halo and exhibit features similar to those of confirmed halo globular clusters. The projected distances to the M31 centre for three of them are larger than 100\,kpc.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint