Source author record

Shi Feng

Shi Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Artificial Intelligence, cond-mat.str-el, Machine Learning

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning

Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics: benign fine-tuning causes parameters to drift cumulatively toward danger-aligned directions, progressively undermining the model's safety. This finding suggests that samples contributing more to this drift carry greater fine-tuning risk. Based on this insight, we propose Sample-Level Quantification of Safety Degradation (SQSD), a method that quantifies the influence of each training sample on safety degradation. Specifically, SQSD assigns a continuous risk score to each sample by measuring the difference between the projections of its induced parameter update onto the danger and safety directions. Extensive experiments across multiple models and datasets demonstrate that SQSD effectively quantifies sample-level fine-tuning risks and exhibits strong transferability across model architectures, parameter scales, and parameter-efficient methods.
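The projection-difference idea in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `risk_score`, the use of flat Python lists for parameter updates, and the normalization of the two directions to unit length are all assumptions made for illustration only.

```python
import math


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    """Return v scaled to unit length (assumption: directions are normalized)."""
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    return [x / norm for x in v]


def risk_score(update, danger_dir, safety_dir):
    """Hypothetical score: projection of a sample's parameter update onto the
    danger direction minus its projection onto the safety direction."""
    return _dot(update, _unit(danger_dir)) - _dot(update, _unit(safety_dir))


# Toy example: an update pointing along the danger direction scores positive.
print(risk_score([1.0, 0.0], [2.0, 0.0], [0.0, 3.0]))
```

Under these assumptions a larger score marks a sample whose update leans more toward the danger direction, so samples could be ranked by score before fine-tuning.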

preprint · 2022 · arXiv

Gapless to gapless phase transitions in quantum spin chains

We investigate spin chains with bilinear-biquadratic (BLBQ) spin interactions as a function of an applied magnetic field $h$. At the Uimin-Lai-Sutherland (ULS) critical point we find a gapless to gapless transition revealed by the dynamical structure factor $S(q,ω)$ as a function of $h$. At $h=0$, the envelope of the lowest energy excitations goes soft at two points, $q_1=2π/3$ and $q_2=4π/3$; we dub this phase A. With increasing field, the spectral peaks at each of the gapless points bifurcate, giving four soft modes in total, and combine to form a new set of excitations that soften at a single point $q=π$ at $h_{c1}\approx 0.94$. Beyond $h_{c1}$ the system enters another gapless phase, phase B, until the transition at $h_{c2}=4$ to the fully polarized phase. We compare the ULS model results with those for the AKLT model as a representative of the gapped Haldane phase. We explain the mechanism of the gapless to gapless transition in the ULS model using its conserved charges and a spinon band picture. We also discuss the universality of central charges of the BLBQ family of models subjected to a magnetic field.

preprint · 2022 · arXiv

MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation

Building dialogue generation systems in a zero-shot scenario remains a huge challenge, since typical zero-shot approaches in dialogue generation rely heavily on large-scale pre-trained language generation models such as GPT-3 and T5. Research on zero-shot dialogue generation without cumbersome language models is limited by the lack of corresponding parallel dialogue corpora. In this paper, we propose a simple but effective Multilingual learning framework for Zero-shot Dialogue Generation (dubbed MulZDG) that can effectively transfer knowledge from an English corpus with large-scale training samples to a non-English corpus with zero samples. In addition, MulZDG can be viewed as a multilingual data augmentation method that improves the performance of the resource-rich language. First, we construct multilingual code-switching dialogue datasets by translating utterances randomly selected from monolingual English datasets. Then we employ MulZDG to train a unified multilingual dialogue model on the code-switching datasets. MulZDG can conduct implicit semantic alignment between different languages. Experiments on the DailyDialog and DSTC7 datasets demonstrate that MulZDG not only achieves competitive performance in the zero-shot case compared to training with sufficient examples but also greatly improves the performance of the source language.
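The dataset construction step described here (translating randomly selected utterances of an English dialogue) might look roughly like the sketch below. The function name `code_switch`, the `ratio` and `seed` parameters, and the stand-in `translate` callable are hypothetical; the abstract does not specify the selection rate or the translation system.

```python
import random


def code_switch(dialogue, translate, ratio=0.5, seed=0):
    """Return a copy of `dialogue` in which each utterance is replaced by its
    translation with probability `ratio` (hypothetical selection rule)."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    return [translate(u) if rng.random() < ratio else u for u in dialogue]


# Stand-in translator for illustration; a real system would call an MT model.
def fake_translate(utterance):
    return "[de] " + utterance


dialogue = ["Hi, how are you?", "Fine, thanks.", "Any plans today?"]
print(code_switch(dialogue, fake_translate, ratio=0.5, seed=1))
```

Setting `ratio` to 0.0 leaves the dialogue untouched and 1.0 translates every utterance, which makes the mixing rate easy to vary in experiments.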

preprint · 2021 · arXiv

A Graph Reasoning Network for Multi-turn Response Selection via Customized Pre-training

We investigate response selection for multi-turn conversation in retrieval-based chatbots. Existing studies focus on matching utterances and responses by calculating a matching score from learned features, which leaves models with insufficient reasoning ability. In this paper, we propose a graph-reasoning network (GRN) to address the problem. GRN first conducts pre-training based on ALBERT using next utterance prediction and utterance order prediction tasks specifically devised for response selection. These two customized pre-training tasks enable our model to capture semantic and chronological dependencies between utterances. We then fine-tune the model on an integrated network with sequence reasoning and graph reasoning structures. The sequence reasoning module conducts inference based on the highly summarized context vector of utterance-response pairs from the global perspective. The graph reasoning module conducts reasoning on the utterance-level graph neural network from the local perspective. Experiments on two conversational reasoning datasets show that our model can dramatically outperform strong baseline methods and can achieve performance close to human level.

preprint · 2021 · arXiv

Quizbowl: The Case for Incremental Question Answering

Scholastic trivia competitions test knowledge and intelligence through mastery of question answering. Modern question answering benchmarks are one variant of the Turing test. Specifically, answering a set of questions as well as a human is a minimum bar towards demonstrating human-like intelligence. This paper makes the case that the format of one competition -- where participants can answer in the middle of hearing a question (incremental) -- better differentiates the skill between (human or machine) players. Additionally, merging a sequential decision-making sub-task with question answering (QA) provides a good setting for research in model calibration and opponent modeling. Thus, embedded in this task are three machine learning challenges: (1) factoid QA over thousands of Wikipedia-like answers, (2) calibration of the QA model's confidence scores, and (3) sequential decision-making that incorporates knowledge of the QA model, its calibration, and what the opponent may do. We make two contributions: (1) collecting and curating a large factoid QA dataset and an accompanying gameplay dataset, and (2) developing a model that addresses these three machine learning challenges. In addition to offline evaluation, we pitted our model against some of the most accomplished trivia players in the world in a series of exhibition matches spanning several years. Throughout this paper, we show that collaborations with the vibrant trivia community have contributed to the quality of our dataset, spawned new research directions, and doubled as an exciting way to engage the public with research in machine learning and natural language processing.

preprint · 2021 · arXiv

Universal Adversarial Triggers for Attacking and Analyzing NLP

Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of "why" questions in SQuAD to be answered "to kill american people", and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts. Furthermore, although the triggers are optimized using white-box access to a specific model, they transfer to other models for all tasks we consider. Finally, since triggers are input-agnostic, they provide an analysis of global model behavior. For instance, they confirm that SNLI models exploit dataset biases and help to diagnose heuristics learned by reading comprehension models.

preprint · 2019 · arXiv

Magnetic phase transitions in quantum spin-orbital liquids

We investigate the spin and orbital correlations of a superexchange model with spin $S=1$ and orbital $L=1$ relevant for $5d^4$ transition metal Mott insulators, using exact diagonalization and density matrix renormalization group (DMRG). For spin-orbit coupling $λ=0$, the orbitals are in an entangled state that is decoupled from the spins. We find two phases with increasing $λ$: (I) the S2 phase with two peaks in the structure factor for $λ\leλ_{c1}\approx 0.34 J$, where $J$ is the ferromagnetic exchange; and (II) the S1 phase for $λ_{c1}<λ\leλ_{c2}\approx 1.2 J$ with emergent antiferromagnetic correlations. Both the S1 and S2 phases are shown to exhibit power law correlations, indicative of a gapless spectrum. Increasing $λ$ beyond $λ_{c2}$ leads to a product state of local spin-orbital singlets that exhibits exponential decay of correlations, indicative of a gapped phase. We obtain insights into the phases from the well-known Uimin-Lai-Sutherland (ULS) model in an external field, which provides an approximate description of our model within mean field theory.

preprint · 2018 · arXiv

Pathologies of Neural Models Make Interpretations Difficult

One way to interpret neural model predictions is to highlight the most important input features, for example with a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word. To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. This exposes pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods. As we confirm with human experiments, the reduced examples lack information to support the prediction of any label, but models still make the same predictions with high confidence. To explain these counterintuitive results, we draw connections to adversarial examples and confidence calibration: pathological behaviors reveal difficulties in interpreting neural models trained with maximum likelihood. To mitigate their deficiencies, we fine-tune the models by encouraging high entropy outputs on reduced examples. Fine-tuned models become more interpretable under input reduction without accuracy loss on regular examples.
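The input reduction loop described in this abstract can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes a hypothetical `predict` callable that returns a `(label, confidence)` pair, uses leave-one-out confidence as the importance measure (the abstract also mentions gradients), and stops as soon as every further removal would change the predicted label.

```python
def input_reduction(words, predict):
    """Iteratively drop the least important word while the predicted label
    stays the same. Importance is measured by leave-one-out confidence."""
    label, _ = predict(words)
    current = list(words)
    while len(current) > 1:
        best = None  # (confidence, reduced input) of the best removal so far
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            cand_label, cand_conf = predict(candidate)
            if cand_label != label:
                continue  # this removal would flip the prediction
            if best is None or cand_conf > best[0]:
                best = (cand_conf, candidate)
        if best is None:
            break  # no word can be removed without changing the label
        current = best[1]
    return current


# Toy stand-in model: predicts "pos" whenever the word "good" is present.
def toy_predict(words):
    return ("pos", 0.9) if "good" in words else ("neg", 0.6)


print(input_reduction(["the", "movie", "was", "good"], toy_predict))
```

With the toy model the input shrinks to the single word the prediction depends on; the abstract's point is that real neural models often keep their prediction on reduced inputs that look nonsensical to humans.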

preprint · 2016 · arXiv

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model

Neural machine translation (NMT) has shown very promising results lately. Most NMT models follow the encoder-decoder framework. To make encoder-decoder models more flexible, the attention mechanism was introduced to machine translation as well as to other tasks such as speech recognition and image captioning. We observe that the quality of translation produced by attention-based encoder-decoder models can be significantly damaged when the alignment is incorrect. We attribute these problems to the lack of distortion and fertility models. To resolve these problems, we propose new variations of the attention-based encoder-decoder and compare them with other models on machine translation. Our proposed method achieves an improvement of 2 BLEU points over the original attention-based encoder-decoder.
