Source author record

Yitian Chen

Yitian Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: gr-qc, Machine Learning, Artificial Intelligence, astro-ph.CO, Computation and Language

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

HearSay Benchmark: Do Audio LLMs Leak What They Hear?

While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications remain largely unexplored. This paper takes the first step toward investigating whether ALLMs inadvertently leak user privacy solely through acoustic voiceprints and introduces HearSay, a comprehensive benchmark constructed from over 22,000 real-world audio clips. To ensure data quality, the benchmark is meticulously curated through a rigorous pipeline involving automated profiling and human verification, guaranteeing that all privacy labels are grounded in factual records. Extensive experiments on HearSay yield three critical findings. (1) Significant privacy leakage: ALLMs inherently extract private attributes from voiceprints, reaching 92.89% accuracy on gender and effectively profiling social attributes. (2) Insufficient safety mechanisms: alarmingly, existing safeguards are severely inadequate; most models fail to refuse privacy-intruding requests, exhibiting near-zero refusal rates for physiological traits. (3) Reasoning amplifies risk: Chain-of-Thought (CoT) reasoning exacerbates privacy risks in capable models by uncovering deeper acoustic correlations. These findings expose critical vulnerabilities in ALLMs and underscore the urgent need for targeted privacy alignment. The code and dataset are available at https://github.com/JinWang79/HearSay_Benchmark

preprint · 2026 · arXiv

Towards Mitigating Excessive Forgetting in LLM Unlearning via Entanglement-Guidance with Proxy Constraint

Large language models (LLMs) are trained on massive datasets that may include private or copyrighted content. Due to growing privacy and ownership concerns, data owners may request the removal of their data from trained models. Machine unlearning provides a practical solution by removing the influence of specific data without full retraining. However, most existing methods still suffer from over-unlearning because they lack a principled mechanism to regulate the forgetting boundary, leading to unnecessary utility degradation and heightened privacy and robustness risks. In this work, we propose EGUP (Entanglement-Guided Unlearning with Proxy Constraint), a novel framework that leverages entanglement and a proxy constraint to guide the unlearning process while mitigating over-unlearning. Within each iteration, EGUP employs inter-sample entanglement to adaptively reweight the unlearning strength, assigning greater unlearning effort to forget samples (the data targeted for removal) that are semantically closer to retained knowledge. Across iterations, EGUP leverages intra-sample entanglement to track the representation shift of each forget sample and dynamically adjust its unlearning effort. In addition, we incorporate a proxy constraint that approximates the model's expected outputs after unlearning, forming a reference boundary that softly regularizes the unlearning process. EGUP is compatible with existing gradient-based objectives and serves as a plug-and-play enhancement. We evaluate EGUP on the TOFU and MUSE benchmarks, demonstrating consistent improvements in the unlearning-utility trade-off across multiple LLMs. Moreover, EGUP achieves performance close to that of the retrained model while remaining scalable and robust.
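The abstract does not give EGUP's weighting formula. As a rough illustration of the inter-sample idea only, the sketch below assumes that "semantic closeness to retained knowledge" is measured as the cosine similarity between each forget sample's embedding and the centroid of the retain-set embeddings, and turns those similarities into per-sample weights with a temperature-scaled softmax. The function names, the centroid, the softmax, and the `temperature` parameter are hypothetical choices, not the paper's method; the intra-sample tracking and the proxy constraint are not modeled.

```python
import numpy as np

def entanglement_weights(forget_emb, retain_emb, temperature=0.1):
    """Hypothetical inter-sample weights: forget samples closer (by cosine similarity)
    to the retain-set centroid receive larger weights. Weights are rescaled to average 1."""
    forget_emb = np.asarray(forget_emb, dtype=float)
    retain_emb = np.asarray(retain_emb, dtype=float)
    f = forget_emb / np.linalg.norm(forget_emb, axis=1, keepdims=True)
    centroid = retain_emb.mean(axis=0)
    c = centroid / np.linalg.norm(centroid)
    sim = f @ c                                # cosine similarity of each forget sample to the centroid
    logits = (sim - sim.max()) / temperature   # shift before exp for numerical stability
    w = np.exp(logits)
    return w / w.sum() * len(w)

def weighted_forget_loss(per_sample_loss, weights):
    """Scale a per-sample gradient-based forget objective by the weights, then average."""
    return float(np.mean(weights * np.asarray(per_sample_loss, dtype=float)))

# Toy embeddings (hypothetical): the retain set points roughly along the first axis.
retain = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, -0.1, 0.0]])
forget = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
w = entanglement_weights(forget, retain)
# The first forget sample is aligned with the retain centroid, so it gets the larger weight.
loss = weighted_forget_loss([0.5, 0.5], w)
```

Rescaling the weights to average 1 keeps the overall magnitude of the forget objective unchanged, so only the relative effort spent on each sample shifts.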

preprint · 2024 · arXiv

Nonlinear Effects In Black Hole Ringdown From Scattering Experiments I: spin and initial data dependence of quadratic mode coupling

We investigate quadratic quasinormal mode coupling in black hole spacetimes through numerical simulations of single perturbed black holes, using both numerical relativity and second-order black hole perturbation theory. Focusing on the dominant ℓ = |m| = 2 quadrupolar modes, we find good agreement (within ~10%) between these approaches, with the discrepancies attributed to truncation error and uncertainties from mode fitting. Our results agree with earlier studies that extracted the coupling coefficients from selected binary black hole merger simulations, showing consistency for the same remnant spins. Notably, the coupling coefficient is insensitive to a diverse range of initial data, including configurations that led to a significant (up to 5%) increase in the remnant black hole mass. These findings present opportunities for testing the nonlinear dynamics of general relativity with ground-based gravitational wave observatories. Lastly, we provide evidence of a bifurcation in coupling coefficients between counter-rotating and co-rotating quasinormal modes as black hole spin increases.
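For readers unfamiliar with the term, one common way to define the quadratic coupling coefficient (an assumed convention here; the paper's normalization and choice of harmonic may differ) is as the ratio of the amplitude of the quadratic mode sourced by two ℓ = m = 2 fundamental modes to the square of the linear amplitude. The quadratic mode oscillates at the sum of its parent frequencies, 2ω_220, and appears mainly in the ℓ = m = 4 harmonic:

```latex
\begin{aligned}
h_{22}(t) &\approx A_{220}\, e^{-i\omega_{220}(t-t_0)} + \dots, \\
h_{44}(t) &\approx A_{220\times220}\, e^{-2i\omega_{220}(t-t_0)} + \dots, \\
\mathcal{R} &\equiv \frac{A_{220\times220}}{A_{220}^{2}}.
\end{aligned}
```

Moving the reference time from t_0 to t_0' multiplies A_220 by e^{-iω_220(t_0'-t_0)} and A_{220×220} by e^{-2iω_220(t_0'-t_0)}, so the two factors cancel and \mathcal{R} is unchanged. Because the ratio is also normalized by A_220^2, it factors out how strongly the initial data excite the linear mode, which is what makes its insensitivity to initial data a meaningful check of the nonlinear dynamics.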

preprint · 2022 · arXiv

Multipole moments on the common horizon in a binary-black-hole simulation

We construct the covariantly defined multipole moments on the common horizon of an equal-mass, non-spinning, quasicircular binary-black-hole system. We observe a strong correlation between these multipole moments and the gravitational waveform. We find that the multipole moments are well described by the fundamental quasinormal modes at sufficiently late times. For each multipole moment, at least two fundamental modes of different ℓ are detectable in the best model. These models provide faithful estimates of the true mass and spin of the remnant black hole. We also show that, by including overtones, the ℓ = m = 2 mass multipole moment admits an excellent quasinormal-mode description at all times after the merger. This demonstrates the perhaps surprising power of perturbation theory near the merger.
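Fitting a multipole moment (or a waveform mode) with quasinormal modes is commonly done by fixing the complex QNM frequencies for an assumed remnant mass and spin and then solving a linear least-squares problem for the complex amplitudes. The sketch below shows only that linear step. The function name `fit_qnm_amplitudes`, the e^{-iω(t - t0)} sign convention, and the example frequencies (approximate Schwarzschild values for the ℓ = 2 fundamental mode and first overtone, in units of the remnant mass, used only to generate synthetic data) are assumptions, not the paper's pipeline; the paper's models are also used to estimate the remnant mass and spin, which this sketch does not do.

```python
import numpy as np

def fit_qnm_amplitudes(t, h, omegas, t0):
    """Fit complex amplitudes A_n in h(t) ~ sum_n A_n * exp(-1j * omega_n * (t - t0)) for t >= t0.

    The complex frequencies omega_n are held fixed (Im(omega_n) < 0 means decay), so the
    model is linear in the amplitudes and can be solved by least squares.
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=complex)
    omegas = np.asarray(omegas, dtype=complex)
    mask = t >= t0
    tau = t[mask] - t0
    basis = np.exp(-1j * np.outer(tau, omegas))  # one column per damped mode
    amps, *_ = np.linalg.lstsq(basis, h[mask], rcond=None)
    mismatch = np.linalg.norm(basis @ amps - h[mask]) / np.linalg.norm(h[mask])
    return amps, mismatch

# Synthetic check: approximate Schwarzschild l=2 frequencies (n=0 and n=1), remnant-mass units.
omegas = np.array([0.3737 - 0.0890j, 0.3467 - 0.2739j])
true_amps = np.array([1.0 + 0.5j, -0.3 + 0.2j])
t = np.linspace(0.0, 60.0, 601)
h = np.exp(-1j * np.outer(t, omegas)) @ true_amps
amps, mismatch = fit_qnm_amplitudes(t, h, omegas, t0=0.0)
```

In practice one repeats such a fit over a grid of start times t0 and candidate mode sets (for example, adding overtones) and compares the residual mismatch between models.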

preprint · 2020 · arXiv

Probabilistic Forecasting with Temporal Convolutional Neural Network

We present a probabilistic forecasting framework based on a convolutional neural network for forecasting multiple related time series. The framework can estimate probability densities in both parametric and non-parametric settings. More specifically, stacked residual blocks built from dilated causal convolutions capture the temporal dependencies of the series. Combined with representation learning, our approach learns complex patterns such as seasonality and holiday effects, within and across series, and leverages those patterns for more accurate forecasts, especially when historical data are sparse or unavailable. Extensive empirical studies are performed on several real-world datasets, including datasets from JD.com, China's largest online retailer. The results show that our framework outperforms other state-of-the-art methods in both accuracy and efficiency.
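A dilated causal convolution computes each output from the current and past inputs only, with `dilation` steps between kernel taps; stacking layers with growing dilation widens the history each forecast can see without adding many parameters. The single-channel NumPy sketch below illustrates that building block and the receptive-field arithmetic. It is not the paper's network: the simplified residual block (ReLU plus an identity skip, with no normalization, 1×1 convolutions, or learned weights) and all function names are assumptions, and the probabilistic output heads are not modeled.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j*dilation], with x treated as 0 before t = 0."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding only: y[t] never uses x[t+1:]
    y = np.zeros_like(x)
    for t in range(len(x)):
        for j, w in enumerate(kernel):
            y[t] += w * xp[t + pad - j * dilation]
    return y

def residual_block(x, kernel, dilation):
    """Simplified residual block: ReLU(dilated causal conv) plus an identity skip connection."""
    x = np.asarray(x, dtype=float)
    return np.maximum(dilated_causal_conv1d(x, kernel, dilation), 0.0) + x

def receptive_field(kernel_size, dilations):
    """Number of past-and-present inputs one output can depend on after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

y = dilated_causal_conv1d([1, 2, 3, 4], kernel=[1.0, 1.0], dilation=2)  # y[t] = x[t] + x[t-2]
print(y)                                  # [1. 2. 4. 6.]
print(receptive_field(2, [1, 2, 4, 8]))   # 16
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is why four layers with kernel size 2 already cover 16 time steps.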