Source author record

Peng Fu

Peng Fu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Logic in Computer Science Computation and Language Machine Learning Computer Vision eess.IV math.CT Programming Languages

Catalog footprint

What is connected

11works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Co-Evolving Policy Distillation

RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first training experts and then performing OPD, though avoiding divergence, fails to fully absorb teacher capabilities due to large behavioral pattern gaps between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which encourages parallel training of experts and introduces OPD during each expert's ongoing RLVR training rather than after complete expert training, with experts serving as mutual teachers (making OPD bidirectional) to co-evolve. This enables more consistent behavioral patterns among experts while maintaining sufficient complementary knowledge throughout. Experiments validate that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The model parallel training pattern offered by CoPD may inspire a novel training scaling paradigm.

preprint2022arXiv

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream tasks. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.

preprint2022arXiv

Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge

Conversational Causal Emotion Entailment aims to detect causal utterances for a non-neutral targeted utterance from a conversation. In this work, we build conversations as graphs to overcome implicit contextual modelling of the original entailment style. Following the previous work, we further introduce the emotion information into graphs. Emotion information can markedly promote the detection of causal utterances whose emotion is the same as the targeted utterance. However, it is still hard to detect causal utterances with different emotions, especially neutral ones. The reason is that models are limited in reasoning causal clues and passing them between utterances. To alleviate this problem, we introduce social commonsense knowledge (CSK) and propose a Knowledge Enhanced Conversation graph (KEC). KEC propagates the CSK between two utterances. As not all CSK is emotionally suitable for utterances, we therefore propose a sentiment-realized knowledge selecting strategy to filter CSK. To process KEC, we further construct the Knowledge Enhanced Directed Acyclic Graph networks. Experimental results show that our method outperforms baselines and infers more causes with different emotions from the targeted utterance.

preprint2022arXiv

On the Lambek embedding and the category of product-preserving presheaves

It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits, while still enjoying most of the other useful properties of the Yoneda embedding. We call this modified embedding the Lambek embedding. The category of limit-preserving functors is known to be a reflective subcategory of the category of all functors, i.e., there is a left adjoint for the inclusion functor. In the literature, the existence of this left adjoint is often proved non-constructively, e.g., by an application of Freyd's adjoint functor theorem. In this paper, we provide an alternative, more constructive proof of this fact. We first explain the Lambek embedding and why it preserves coproducts. Then we review some concepts from multi-sorted algebras and observe that there is a one-to-one correspondence between product-preserving presheaves and certain multi-sorted term algebras. We provide a construction that freely turns any presheaf functor into a product-preserving one, hence giving an explicit definition of the left adjoint functor of the inclusion. Finally, we sketch how to extend our method to prove that the subcategory of limit-preserving functors is also reflective.

preprint2020arXiv

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which is supposed to consider not only the semantic information of text but also the influences from speakers. The current method models speakers' interactions by building a relation between every two speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To better achieve the simplified interaction modeling of speakers in Transformer, which shows excellent ability to settle long-distance dependency, we design three types of masks and respectively utilize them in three independent Transformer blocks. The designed masks respectively model the conventional context modeling, Intra-Speaker dependency, and Inter-Speaker dependency. Furthermore, different speaker-aware information extracted by Transformer blocks diversely contributes to the prediction, and therefore we utilize the attention mechanism to automatically weight them. Experiments on two ERC datasets indicate that our model is efficacious to achieve better performance.

preprint2020arXiv

A lateral semicircular canal segmentation based geometric calibration for human temporal bone CT Image

Computed Tomography (CT) of the temporal bone has become an important method for diagnosing ear diseases. Due to the different posture of the subject and the settings of CT scanners, the CT image of the human temporal bone should be geometrically calibrated to ensure the symmetry of the bilateral anatomical structure. Manual calibration is a time-consuming task for radiologists and an important pre-processing step for further computer-aided CT analysis. We propose an automatic calibration algorithm for temporal bone CT images. The lateral semicircular canals (LSCs) are segmented as anchors at first. Then, we define a standard 3D coordinate system. The key step is the LSC segmentation. We design a novel 3D LSC segmentation encoder-decoder network, which introduces a 3D dilated convolution and a multi-pooling scheme for feature fusion in the encoding stage. The experimental results show that our LSC segmentation network achieved a higher segmentation accuracy. Our proposed method can help to perform calibration of temporal bone CT images efficiently.

preprint2016arXiv

Operational Semantics of Resolution and Productivity in Horn Clause Logic

This paper presents a study of operational and type-theoretic properties of different resolution strategies in Horn clause logic. We distinguish four different kinds of resolution: resolution by unification (SLD-resolution), resolution by term-matching, the recently introduced structural resolution, and partial (or lazy) resolution. We express them all uniformly as abstract reduction systems, which allows us to undertake a thorough comparative analysis of their properties. To match this small-step semantics, we propose to take Howard's System H as a type-theoretic semantic counterpart. Using System H, we interpret Horn formulas as types, and a derivation for a given formula as the proof term inhabiting the type given by the formula. We prove soundness of these abstract reduction systems relative to System H, and we show completeness of SLD-resolution and structural resolution relative to System H. We identify conditions under which structural resolution is operationally equivalent to SLD-resolution. We show correspondence between term-matching resolution for Horn clause programs without existential variables and term rewriting.

preprint2015arXiv

A Type-Theoretic Approach to Resolution

We propose a new type-theoretic approach to SLD-resolution and Horn-clause logic programming. It views Horn formulas as types, and derivations for a given query as a construction of the inhabitant (a proof-term) for the type given by the query. We propose a method of program transformation that allows to transform logic programs in such a way that proof evidence is computed alongside SLD-derivations. We discuss two applications of this approach: in recently proposed productivity theory of structural resolution, and in type class inference.

preprint2015arXiv

A Type-Theoretic Approach to Structural Resolution

Structural resolution (or S-resolution) is a newly proposed alternative to SLD-resolution that allows a systematic separation of derivations into term-matching and unification steps. Productive logic programs are those for which term-matching reduction on any query must terminate. For productive programs with coinductive meaning, finite term-rewriting reductions can be seen as measures of observation in an infinite derivation. Ability of handling corecursion in a productive way is an attractive computational feature of S-resolution. In this paper, we make first steps towards a better conceptual understanding of operational properties of S-resolution as compared to SLD-resolution. To this aim, we propose a type system for the analysis of both SLD-resolution and S-resolution. We formulate S-resolution and SLD-resolution as reduction systems, and show their soundness relative to the type system. One of the central methods of this paper is realizability transformation, which makes logic programs productive and non-overlapping. We show that S-resolution and SLD-resolution are only equivalent for programs with these two properties.

preprint2015arXiv

Proof Relevant Corecursive Resolution

Resolution lies at the foundation of both logic programming and type class context reduction in functional languages. Terminating derivations by resolution have well-defined inductive meaning, whereas some non-terminating derivations can be understood coinductively. Cycle detection is a popular method to capture a small subset of such derivations. We show that in fact cycle detection is a restricted form of coinductive proof, in which the atomic formula forming the cycle plays the role of coinductive hypothesis. This paper introduces a heuristic method for obtaining richer coinductive hypotheses in the form of Horn formulas. Our approach subsumes cycle detection and gives coinductive meaning to a larger class of derivations. For this purpose we extend resolution with Horn formula resolvents and corecursive evidence generation. We illustrate our method on non-terminating type class resolution problems.

preprint2012arXiv

Irrelevance, Heterogeneous Equality, and Call-by-value Dependent Type Systems

We present a full-spectrum dependently typed core language which includes both nontermination and computational irrelevance (a.k.a. erasure), a combination which has not been studied before. The two features interact: to protect type safety we must be careful to only erase terminating expressions. Our language design is strongly influenced by the choice of CBV evaluation, and by our novel treatment of propositional equality which has a heterogeneous, completely erased elimination form.

Peng Fu

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

Co-Evolving Policy Distillation

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge

On the Lambek embedding and the category of product-preserving presheaves

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

A lateral semicircular canal segmentation based geometric calibration for human temporal bone CT Image

Operational Semantics of Resolution and Productivity in Horn Clause Logic

A Type-Theoretic Approach to Resolution

A Type-Theoretic Approach to Structural Resolution

Proof Relevant Corecursive Resolution

Irrelevance, Heterogeneous Equality, and Call-by-value Dependent Type Systems