Source author record

Yongjin Yang

Yongjin Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, hep-ex, nucl-ex, physics.ins-det

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

preprint, 2026, arXiv

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because fine-tuning targets from humans or external systems diverge from the model's autoregressive distribution, forcing the optimizer to imitate low-probability token sequences. To address this problem, we propose MixSD, a simple external-teacher-free method for distribution-aligned knowledge injection. Instead of training on fixed targets, MixSD constructs supervision dynamically by mixing tokens from two conditionals of the base model itself: an expert conditional that observes the injected fact in context, and a naive conditional that reflects the model's original prior. The resulting supervision sequences preserve the factual learning signal while remaining substantially closer to the base model's distribution. We evaluate MixSD on two synthetic corpora that we construct to study factual recall and arithmetic function acquisition in a controlled setting, together with established benchmarks for open-domain factual question answering and knowledge editing. Across multiple model scales and settings, MixSD consistently achieves a better memorization-retention trade-off than SFT and on-policy self-distillation baselines, retaining up to 100% of the base model's held-out capability while maintaining near-perfect training accuracy, whereas standard SFT retains as little as 1%. We further show that MixSD produces substantially lower-NLL supervision targets under the base model and reduces harmful movement along Fisher-sensitive parameter directions. These results suggest that aligning supervision with the model's native generation distribution is a simple and effective principle for knowledge injection that mitigates catastrophic forgetting.
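The token-mixing idea in the abstract above can be illustrated with a toy sketch. The abstract does not state the actual mixing rule, so everything here is an assumption made for illustration: the per-position coin flip with probability `expert_prob`, the greedy token choice, and the two hypothetical stand-in functions `expert_conditional` and `naive_conditional`, which replace real queries to the base model with and without the injected fact in context.

```python
import random

# Hypothetical stand-ins for the two conditionals of the base model.
# A real setup would query the same model twice per step: once with the
# injected fact in the prompt (expert) and once without it (naive).
EXPERT_ANSWER = ["The", "capital", "is", "Zorvia", "<eos>"]
NAIVE_ANSWER = ["The", "capital", "is", "unknown", "<eos>"]


def expert_conditional(prefix):
    """Toy next-token distribution that 'knows' the injected fact."""
    target = EXPERT_ANSWER[min(len(prefix), len(EXPERT_ANSWER) - 1)]
    return {target: 0.9, "<unk>": 0.1}


def naive_conditional(prefix):
    """Toy next-token distribution reflecting the model's original prior."""
    target = NAIVE_ANSWER[min(len(prefix), len(NAIVE_ANSWER) - 1)]
    return {target: 0.9, "<unk>": 0.1}


def mixed_sequence(expert, naive, expert_prob, max_len, seed=0):
    """Build one supervision sequence by choosing, at every position,
    which conditional supplies the next token (assumed mixing rule)."""
    rng = random.Random(seed)
    tokens, sources = [], []
    for _ in range(max_len):
        use_expert = rng.random() < expert_prob
        dist = expert(tuple(tokens)) if use_expert else naive(tuple(tokens))
        token = max(dist, key=dist.get)  # greedy pick, for determinism
        tokens.append(token)
        sources.append("expert" if use_expert else "naive")
        if token == "<eos>":
            break
    return tokens, sources


if __name__ == "__main__":
    tokens, sources = mixed_sequence(
        expert_conditional, naive_conditional, expert_prob=0.5, max_len=8
    )
    print(list(zip(tokens, sources)))
```

Setting `expert_prob` to 1.0 reproduces the expert sequence and 0.0 reproduces the naive one; intermediate values interpolate between the two, which is the trade-off the abstract describes between the factual learning signal and closeness to the base model's distribution.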

preprint, 2022, arXiv

First operation of undoped CsI directly coupled with SiPMs at 77 Kelvin

The light yield of a small undoped cesium iodide (CsI) crystal directly coupled with two silicon photomultipliers (SiPMs) at about 77 Kelvin was measured to be $43.0 \pm 1.1$ photoelectrons (PE) per keV electron-equivalent (keV$_\text{ee}$) using X-ray and $\gamma$-ray peaks from an $^{241}$Am radioactive source from 18 to 60 keV. The high light yield, together with some other technical advantages, illustrates the great potential of this novel combination for neutrino and low-mass dark matter detection, particularly at accelerator-based neutrino sources, where random background can be highly suppressed by requiring coincident triggers between SiPMs and beam pulse timing signals. Some potential drawbacks of using cryogenic SiPMs instead of photomultiplier tubes (PMTs) were identified, such as worse energy resolution and optical cross-talk between SiPMs. Their influence on rare-event detection was discussed and possible solutions were provided.
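As a rough illustration of what the quoted light yield means in practice, the sketch below converts a deposited energy into an expected photoelectron count. Only the values 43.0 and 1.1 PE per keV and the 18 to 60 keV range come from the abstract. The function name `expected_pe` is hypothetical, and the sketch assumes the light yield is constant over that range and that its uncertainty scales linearly with energy.

```python
# Values quoted in the abstract (PE per keV electron-equivalent).
LIGHT_YIELD = 43.0
LIGHT_YIELD_UNC = 1.1


def expected_pe(energy_kev_ee):
    """Return (mean, uncertainty) of the expected photoelectron count.

    Assumes a constant light yield and simple linear scaling of its
    uncertainty; the measurement itself may not behave this way.
    """
    if energy_kev_ee < 0:
        raise ValueError("energy must be non-negative")
    mean = LIGHT_YIELD * energy_kev_ee
    unc = LIGHT_YIELD_UNC * energy_kev_ee
    return mean, unc


if __name__ == "__main__":
    # End points of the energy range quoted in the abstract.
    for energy in (18.0, 60.0):
        mean, unc = expected_pe(energy)
        print(f"{energy:.0f} keV_ee -> {mean:.0f} +/- {unc:.0f} PE")
```

Under these assumptions, a deposit at the upper end of the measured range corresponds to a few thousand photoelectrons.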