Researcher profile

Yiran Cheng

Yiran Cheng contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified | Verification L1 | Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published item(s)

Preprint | 2026 | arXiv

Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs

Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step toward privacy-preserving continual pretraining by proposing an entity-based framework that synthesizes encrypted training data to protect personally identifiable information (PII). Our approach constructs a weighted entity graph to guide data synthesis and applies deterministic encryption to PII entities, enabling LLMs to encode new knowledge through continual pretraining while granting authorized access to sensitive data through decryption keys. Our results on limited-scale datasets demonstrate that our pretrained models outperform base models and ensure PII security, while exhibiting a modest performance gap compared to models trained on unencrypted synthetic data. We further show that increasing the number of entities and leveraging graph-based synthesis improves model performance, and that encrypted models retain instruction-following capabilities with long retrieved contexts. We discuss the security implications and limitations of deterministic encryption, positioning this work as an initial investigation into the design space of encrypted data pretraining for privacy-preserving LLMs. Our code is available at https://github.com/DataArcTech/SoE.
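
As an illustration of the deterministic-encryption idea in the abstract above, here is a minimal Python sketch. It is not the authors' implementation (their code is at the repository linked in the abstract). Every name here (`encrypt_entity`, `decrypt_entity`, `encrypt_text`, `decrypt_text`, the `<ENC:...>` token format) is hypothetical, and the cipher is a toy SIV-style construction built from HMAC-SHA256 with the standard library only, chosen for readability rather than security. It shows the two properties the abstract relies on: the same entity always maps to the same token, and a key holder can recover the original text.

```python
import hashlib
import hmac
import re


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive independent subkeys for the IV and the keystream (toy key separation).
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode, truncated to the requested length.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_entity(key: bytes, entity: str) -> str:
    # The IV is derived from the plaintext itself, so equal inputs give equal tokens.
    data = entity.encode("utf-8")
    iv = hmac.new(_subkey(key, b"mac"), data, hashlib.sha256).digest()[:16]
    stream = _keystream(_subkey(key, b"enc"), iv, len(data))
    body = bytes(a ^ b for a, b in zip(data, stream))
    return (iv + body).hex()


def decrypt_entity(key: bytes, token: str) -> str:
    raw = bytes.fromhex(token)
    iv, body = raw[:16], raw[16:]
    stream = _keystream(_subkey(key, b"enc"), iv, len(body))
    data = bytes(a ^ b for a, b in zip(body, stream))
    # Recompute the IV to detect a wrong key or a corrupted token.
    expected = hmac.new(_subkey(key, b"mac"), data, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected):
        raise ValueError("wrong key or corrupted token")
    return data.decode("utf-8")


def encrypt_text(key: bytes, text: str, entities: list[str]) -> str:
    # Replace each listed PII entity with an <ENC:...> token in a single pass.
    if not entities:
        return text
    ordered = sorted(entities, key=len, reverse=True)  # prefer longer matches
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    return pattern.sub(lambda m: f"<ENC:{encrypt_entity(key, m.group(0))}>", text)


def decrypt_text(key: bytes, text: str) -> str:
    # Restore every <ENC:...> token; raises ValueError if the key is wrong.
    return re.sub(
        r"<ENC:([0-9a-f]+)>",
        lambda m: decrypt_entity(key, m.group(1)),
        text,
    )
```

Because equal plaintexts yield equal tokens, anyone holding the encrypted text can tell when two mentions refer to the same entity, even without the key. This is the kind of trade-off the abstract refers to when it discusses the limitations of deterministic encryption.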

Preprint | 2022 | arXiv

Mukai's program for non-primitive curves on K3 surfaces

Mukai's program seeks to recover a K3 surface $X$ from any curve $C$ on it by exhibiting it as a Fourier-Mukai partner to a Brill-Noether locus of vector bundles on the curve. In the case $X$ has Picard number one and the curve $C\in |H|$ is primitive, this was confirmed by Feyzbakhsh for $g\geq 11$ and $g\neq 12$. More recently, Feyzbakhsh has shown that certain moduli spaces of stable bundles on $X$ are isomorphic to the Brill-Noether locus of curves in $|H|$ if $g$ is sufficiently large. In this paper, we work with irreducible curves in a non-primitive ample linear system $|mH|$ and prove that Mukai's program is valid for any irreducible curve when $g\neq 2$, $mg\geq 11$ and $mg\neq 12$. Furthermore, we introduce the destabilising regions to improve Feyzbakhsh's analysis. We show that there are hyper-Kähler varieties as Brill-Noether loci of curves in every dimension.

Preprint | 2020 | arXiv

Hyperplane Sections of Hypersurfaces

We compute some numerical invariants of the lines on hyperplane sections of a smooth cubic threefold over the complex numbers. We also prove that for any smooth hypersurface $X\subset \mathbb P^{n+1}$ of degree $d$ over an algebraically closed field of characteristic zero, if $d>n>1$ and $(n,d)\neq (2,3),(3,4)$, then a general hyperplane section only admits finitely many others which are isomorphic to it.