Source author record

Yiran Cheng

Yiran Cheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.AG · Computation and Language · Cryptography and Security · math.CV

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs

Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step toward privacy-preserving continual pretraining by proposing an entity-based framework that synthesizes encrypted training data to protect personally identifiable information (PII). Our approach constructs a weighted entity graph to guide data synthesis and applies deterministic encryption to PII entities, enabling LLMs to encode new knowledge through continual pretraining while granting authorized access to sensitive data through decryption keys. Our results on limited-scale datasets demonstrate that our pretrained models outperform base models and ensure PII security, while exhibiting a modest performance gap compared to models trained on unencrypted synthetic data. We further show that increasing the number of entities and leveraging graph-based synthesis improves model performance, and that encrypted models retain instruction-following capabilities with long retrieved contexts. We discuss the security implications and limitations of deterministic encryption, positioning this work as an initial investigation into the design space of encrypted data pretraining for privacy-preserving LLMs. Our code is available at https://github.com/DataArcTech/SoE.
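The abstract above combines two mechanisms: a weighted entity graph that guides data synthesis, and deterministic encryption of PII entities. The following is a minimal sketch of both ideas under stated assumptions, not the SoE implementation. The toy documents, the co-occurrence edge weighting, the PII labels and every name (`build_entity_graph`, `sample_pairs`, `DeterministicTokenizer`) are hypothetical. HMAC-SHA256 plus a key-holder lookup table is swapped in for a real deterministic cipher (for example AES-SIV), because the Python standard library provides no block cipher.

```python
import hashlib
import hmac
import itertools
import random
from collections import Counter

# Hypothetical toy corpus: each document is reduced to the set of entities it mentions.
# Entity extraction is assumed to have happened already and is out of scope here.
DOCS = [
    {"Alice Smith", "Acme Bank", "loan"},
    {"Alice Smith", "Acme Bank"},
    {"Bob Lee", "Acme Bank", "loan"},
]
PII = {"Alice Smith", "Bob Lee"}  # assumed PII labels for this toy example


def build_entity_graph(docs):
    """Weighted undirected graph stored as {(a, b): weight} with a < b.
    Assumed weighting: number of documents in which the two entities co-occur."""
    weights = Counter()
    for entities in docs:
        for a, b in itertools.combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return weights


def sample_pairs(weights, k, seed=0):
    """Sample k entity pairs with probability proportional to edge weight.
    Each sampled pair could seed one synthetic passage relating the two entities."""
    rng = random.Random(seed)
    edges = list(weights)
    return rng.choices(edges, weights=[weights[e] for e in edges], k=k)


class DeterministicTokenizer:
    """Stand-in for deterministic encryption: under a fixed key, the same entity
    always maps to the same token. HMAC is one-way, so reversal uses a lookup table
    held only by authorized parties; a real system would use a deterministic cipher."""

    def __init__(self, key: bytes):
        self.key = key
        self.vault = {}  # token -> plaintext entity, kept on the key-holder side

    def encrypt(self, entity: str) -> str:
        digest = hmac.new(self.key, entity.encode(), hashlib.sha256).hexdigest()
        token = f"<PII_{digest[:12]}>"
        self.vault[token] = entity
        return token

    def decrypt(self, token: str) -> str:
        return self.vault[token]


graph = build_entity_graph(DOCS)
tok = DeterministicTokenizer(key=b"demo-key-not-secret")
for a, b in sample_pairs(graph, k=3):
    # Only PII entities are replaced; other entities stay in plaintext.
    a_out = tok.encrypt(a) if a in PII else a
    b_out = tok.encrypt(b) if b in PII else b
    print(f"{a_out} -- {b_out}")
```

The design point is determinism: because `Alice Smith` always becomes the same token, a model trained on the synthetic text can still learn consistent facts about that entity, while only holders of the key-side table can map tokens back. The same property leaks equality and frequency information, a well-known limitation of deterministic encryption.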

Preprint · 2022 · arXiv

Mukai's program for non-primitive curves on K3 surfaces

Mukai's program seeks to recover a K3 surface $X$ from any curve $C$ on it by exhibiting $X$ as a Fourier-Mukai partner of a Brill-Noether locus of vector bundles on the curve. When $X$ has Picard number one and the curve $C\in |H|$ is primitive, this was confirmed by Feyzbakhsh for $g\geq 11$ and $g\neq 12$. More recently, Feyzbakhsh has shown that certain moduli spaces of stable bundles on $X$ are isomorphic to the Brill-Noether locus of curves in $|H|$ if $g$ is sufficiently large. In this paper, we work with irreducible curves in a non-primitive ample linear system $|mH|$ and prove that Mukai's program is valid for any irreducible curve when $g\neq 2$, $mg\geq 11$ and $mg\neq 12$. Furthermore, we introduce destabilising regions to improve Feyzbakhsh's analysis. We also show that hyper-Kähler varieties arising as Brill-Noether loci of curves exist in every dimension.
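For orientation, the arithmetic genus of a curve in the non-primitive system $|mH|$ follows from adjunction on a K3 surface, where the canonical class $K_X$ is trivial. The computation below assumes, since the abstract does not say so explicitly, that $g$ denotes the genus of curves in the primitive class, so that $H^2 = 2g-2$; the symbol $p_a(C)$ is introduced here for illustration.

```latex
% Adjunction on a K3 surface X (K_X = 0): 2 p_a(C) - 2 = C^2.
% Assumption: H^2 = 2g - 2, i.e. curves in |H| have genus g.
2p_a(C) - 2 = C^2 = (mH)^2 = m^2 H^2 = m^2(2g - 2)
\quad\Longrightarrow\quad
p_a(C) = m^2(g - 1) + 1 .
```

For example, under this assumption $m = 2$ and $g = 7$ (so $mg = 14$) give $p_a(C) = 4 \cdot 6 + 1 = 25$.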

Preprint · 2020 · arXiv

Hyperplane Sections of Hypersurfaces

We compute some numerical invariants of the lines on hyperplane sections of a smooth cubic threefold over the complex numbers. We also prove that for any smooth hypersurface $X\subset \mathbb P^{n+1}$ of degree $d$ over an algebraically closed field of characteristic zero, if $d>n>1$ and $(n,d)\neq (2,3),(3,4)$, then a general hyperplane section of $X$ is isomorphic to only finitely many other hyperplane sections.
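The finiteness statement can be restated in terms of the family of all hyperplane sections of $X$. The notation $Y_H$ below is introduced only for illustration.

```latex
% For a hyperplane H \subset \mathbb{P}^{n+1}, write Y_H = X \cap H.
% Y_H is a degree-d hypersurface in H \cong \mathbb{P}^n, and hyperplanes
% are parametrised by the dual space (\mathbb{P}^{n+1})^\vee, of dimension n+1.
\text{for general } H \in (\mathbb{P}^{n+1})^\vee:\qquad
\#\{\, H' \in (\mathbb{P}^{n+1})^\vee \;:\; Y_{H'} \cong Y_H \,\} < \infty .
```

A dimension count shows why a case such as $(n,d) = (2,3)$ must be excluded: there $X$ is a cubic surface, its smooth hyperplane sections are plane cubic curves whose isomorphism classes form a one-parameter family (the $j$-line), and the hyperplanes form a three-dimensional family, so a general section is isomorphic to infinitely many others.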

Preprint · 2018 · arXiv

Drawing cone spherical metrics via Strebel differentials

Cone spherical metrics are conformal metrics with constant curvature one and finitely many conical singularities on compact Riemann surfaces. Using Strebel differentials as a bridge, we construct a new class of cone spherical metrics on compact Riemann surfaces by drawing a class of connected metric ribbon graphs on the surfaces.
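To make the definition concrete: near a conical singularity of angle $2\pi\alpha$ ($\alpha > 0$), a cone spherical metric can be written in a suitable local coordinate $z$ in the standard form below, obtained by pulling back the round metric of curvature one on the Riemann sphere along $w = z^\alpha$. This is textbook background for orientation only, not a description of the Strebel-differential construction itself.

```latex
% Round metric of curvature one on the sphere, in a stereographic coordinate w:
g_{\mathbb{S}^2} = \frac{4\,|dw|^2}{(1 + |w|^2)^2}.
% Pull back along w = z^\alpha, using |dw|^2 = \alpha^2 |z|^{2\alpha - 2} |dz|^2:
ds^2 = \frac{4\alpha^2\, |z|^{2\alpha - 2}\, |dz|^2}{(1 + |z|^{2\alpha})^2}.
% Writing ds^2 = e^{2u}|dz|^2, curvature one means \Delta u + e^{2u} = 0 away from z = 0.
```

The metric depends only on $|z|$, so it is single-valued even though $z^\alpha$ is not, and $\alpha = 1$ recovers a smooth point.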