Source author record

Yunpu Ma

Yunpu Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision, Information Retrieval, Multiagent Systems, quant-ph

Catalog footprint

What is connected

13 works

7 topics

4 close collaborators


preprint, 2026, arXiv

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
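The four structured operations named in the abstract can be illustrated with a toy memory store. This is only a minimal sketch: `MemoryBank`, `MemoryOp`, and the `apply` method are hypothetical names introduced here, and the learned policy that chooses which operation to issue (the part trained with RL in the paper) is not modeled at all.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryOp(Enum):
    """The four operations listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical id -> text store; not the paper's data structure."""
    entries: dict = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op, entry_id=None, text=None):
        """Apply one operation and return the id of the affected entry, if any."""
        if op is MemoryOp.ADD:
            entry_id = self.next_id
            self.entries[entry_id] = text
            self.next_id += 1
            return entry_id
        if op is MemoryOp.UPDATE:
            if entry_id not in self.entries:
                raise KeyError(f"no entry {entry_id} to update")
            self.entries[entry_id] = text
            return entry_id
        if op is MemoryOp.DELETE:
            self.entries.pop(entry_id, None)
            return entry_id
        return None  # NOOP leaves the bank unchanged


bank = MemoryBank()
first = bank.apply(MemoryOp.ADD, text="User adopted a dog named Buddy.")
bank.apply(MemoryOp.UPDATE, entry_id=first, text="User adopted two dogs, Buddy and Scout.")
bank.apply(MemoryOp.NOOP)
print(bank.entries)
```

In this sketch the caller picks the operation by hand; in the framework described above, that decision would instead come from the trained Memory Manager.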

preprint, 2026, arXiv

PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

Visual instruction tuning adapts pre-trained Multimodal Large Language Models (MLLMs) to follow human instructions for real-world applications. However, the rapid growth of these datasets introduces significant redundancy, leading to increased computational costs. Existing methods for selecting instruction data aim to prune this redundancy, but predominantly rely on computationally demanding techniques such as proxy-based inference or training-based metrics. Consequently, the substantial computational costs incurred by these selection processes often exacerbate the very efficiency bottlenecks they are intended to resolve, posing a significant challenge to the scalable and effective tuning of MLLMs. To address this challenge, we first identify a critical, yet previously overlooked, factor: the anisotropy inherent in visual feature distributions. We find that this anisotropy induces a Global Semantic Drift, and overlooking this phenomenon is a key factor limiting the efficiency of current data selection methods. Motivated by this insight, we devise PRISM, the first training-free framework for efficient visual instruction selection. PRISM surgically removes the corrupting influence of global background features by modeling the intrinsic visual semantics via implicit re-centering. Empirically, PRISM reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines. More remarkably, it achieves this efficiency while simultaneously enhancing performance, surpassing models fine-tuned on the full dataset across eight multimodal and three language understanding benchmarks, culminating in a 101.7% relative improvement over the baseline. The code is available at https://github.com/bibisbar/PRISM.

preprint, 2026, arXiv

Select to Think: Unlocking SLM Potential with Local Sufficiency

Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by the capacity limitation, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose SELECT TO THINK (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying the supervision signal to discrete candidate rankings. Leveraging this, we introduce S2T-LOCAL, which distills the selection logic into the SLM, empowering it to perform autonomous re-ranking without inference-time LLM dependency. Empirically, we demonstrate that a 1.5B SLM's top-8 candidates capture the 32B LLM's choice with 95% hit rate. Translating this potential into performance, S2T-LOCAL improves greedy decoding by 24.1% on average across benchmarks, effectively matching the efficacy of 8-path self-consistency while operating with single-trajectory efficiency.
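The "local sufficiency" observation is a top-K hit rate: how often the LLM's preferred token appears among the SLM's K highest-scoring candidates. The sketch below shows that measurement on made-up scores. The function names and the toy data are assumptions for illustration, and nothing here reproduces the paper's models or its reported 95% figure.

```python
def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def hit_rate(slm_scores_per_step, llm_choice_per_step, k):
    """Fraction of steps where the LLM's preferred token is in the SLM's top-k."""
    hits = sum(
        choice in top_k_indices(scores, k)
        for scores, choice in zip(slm_scores_per_step, llm_choice_per_step)
    )
    return hits / len(llm_choice_per_step)


# Toy data: three divergence points over a 5-token vocabulary.
slm_scores = [
    [0.1, 2.0, 1.5, -0.3, 0.0],
    [1.2, 0.4, 0.9, 3.1, -1.0],
    [0.0, 0.1, 0.2, 0.3, 5.0],
]
llm_choices = [2, 0, 0]  # token ids the (hypothetical) LLM would pick

for k in (1, 2, 5):
    print(k, hit_rate(slm_scores, llm_choices, k))
```

On this toy data the LLM's choice is never the SLM's top-1 token, yet it falls inside the top-2 in two of three steps, which is the kind of gap between top-1 and top-K that the abstract describes.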

preprint, 2024, arXiv

Differentiable Quantum Architecture Search For Job Shop Scheduling Problem

The job shop scheduling problem (JSSP) plays a pivotal role in industrial applications, such as signal processing (SP) and steel manufacturing, involving sequencing machines and jobs to maximize scheduling efficiency. Previously, JSSP was solved with variational quantum algorithms (VQAs) using manually defined circuits. Finding a good circuit architecture is task-specific and time-consuming. Differentiable quantum architecture search (DQAS) is a gradient-based framework that can automatically design circuits. However, DQAS has only been tested on the quantum approximate optimization algorithm (QAOA) and error mitigation tasks. Whether DQAS applies to optimization problems such as JSSP when based on a more flexible algorithm, such as the variational quantum eigensolver (VQE), remains open. In this work, we redefine the operation pool and extend DQAS to a framework, JSSP-DQAS, that evaluates circuits in order to generate circuits for JSSP automatically. Experiments show that JSSP-DQAS can automatically find noise-resilient circuit architectures that perform much better than manually designed circuits. This helps improve the efficiency of solving JSSP.

preprint, 2022, arXiv

APPTeK: Agent-Based Predicate Prediction in Temporal Knowledge Graphs

In temporal Knowledge Graphs (tKGs), the temporal dimension is attached to facts in a knowledge base, resulting in quadruples between entities such as (Nintendo, released, Super Mario, Sep-13-1985), where the predicate holds within a time interval or at a timestamp. We propose a reinforcement learning agent that simultaneously gathers temporally relevant information about the query entities' neighborhoods. We refer to the encodings of the explored graph structures as fingerprints, which are used as input to a Q-network. Our agent decides sequentially which relation type needs to be explored next to expand the local subgraphs of the query entities. Our evaluation shows that the proposed method yields competitive results compared to state-of-the-art embedding algorithms for tKGs, and we additionally gain information about the relevant structures between subjects and objects.

preprint, 2022, arXiv

Open-domain Dialogue Generation Grounded with Dynamic Multi-form Knowledge Fusion

Open-domain multi-turn conversations normally face the challenge of how to enrich and expand the content of the conversation. Recently, many approaches based on external knowledge have been proposed to generate semantically rich and informative conversations. Two types of knowledge have been studied for knowledge-aware open-domain dialogue generation: structured triples from knowledge graphs and unstructured texts from documents. To take advantage of both the abundant unstructured latent knowledge in documents and the information expansion capabilities of the structured knowledge graph, this paper presents a new dialogue generation model, Dynamic Multi-form Knowledge Fusion based Open-domain Chatting Machine (DMKCM). In particular, DMKCM applies an indexed text (a virtual Knowledge Base) to locate relevant documents as the 1st hop and then expands the content of the dialogue and its 1st hop using a commonsense knowledge graph to get apposite triples as the 2nd hop. To merge these two forms of knowledge into the dialogue effectively, we design a dynamic virtual knowledge selector and a controller that help to enrich and expand the knowledge space. Moreover, DMKCM adopts a novel dynamic knowledge memory module that effectively uses historical reasoning knowledge to generate better responses. Experimental results indicate the effectiveness of our method in terms of dialogue coherence and informativeness.

preprint, 2022, arXiv

Temporal Knowledge Graph Forecasting with Neural ODE

There has been an increasing interest in inferring future links on temporal knowledge graphs (KG). While links on temporal KGs vary continuously over time, the existing approaches model the temporal KGs in discrete state spaces. To this end, we propose a novel continuum model by extending the idea of neural ordinary differential equations (ODEs) to multi-relational graph convolutional networks. The proposed model preserves the continuous nature of dynamic multi-relational graph data and encodes both temporal and structural information into continuous-time dynamic embeddings. In addition, a novel graph transition layer is applied to capture the transitions on the dynamic graph, i.e., edge formation and dissolution. We perform extensive experiments on five benchmark datasets for temporal KG reasoning, showing our model's superior performance on the future link forecasting task.

preprint, 2022, arXiv

TLogic: Temporal Logical Rules for Explainable Link Forecasting on Temporal Knowledge Graphs

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting -- event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.
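A temporal random walk of the kind mentioned above can be sketched on a tiny set of (subject, relation, object, timestamp) quadruples. This is a simplified illustration that assumes each step may only follow an edge whose timestamp is no later than the previous one. The function name, the toy graph, and the uniform edge choice are assumptions, and the paper's actual sampling scheme and rule extraction are not reproduced.

```python
import random


def temporal_walk(quads, start, start_time, length, rng):
    """Walk backward in time: each edge's timestamp is <= the previous one."""
    walk, node, t_max = [], start, start_time
    for _ in range(length):
        # Outgoing edges of the current node that do not move forward in time.
        candidates = [q for q in quads if q[0] == node and q[3] <= t_max]
        if not candidates:
            break  # dead end: no time-consistent edge to follow
        step = rng.choice(candidates)  # uniform choice, an assumption of this sketch
        walk.append(step)
        node, t_max = step[2], step[3]
    return walk


# Toy temporal knowledge graph as (subject, relation, object, timestamp).
quads = [
    ("A", "visits", "B", 5),
    ("B", "consults", "C", 3),
    ("B", "consults", "D", 7),
    ("C", "meets", "A", 1),
]
walk = temporal_walk(quads, "A", 10, 3, random.Random(0))
print(walk)
```

Because the edge from B to D has timestamp 7, later than the preceding edge at time 5, the walk cannot take it, so the sampled path stays consistent in time.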

preprint, 2021, arXiv

Contrastive Learning for Recommender System

Recommender systems, which analyze users' preference patterns to suggest potential targets, are indispensable in today's society. Collaborative Filtering (CF) is the most popular recommendation model. Specifically, Graph Neural Network (GNN) has become a new state-of-the-art for CF. In the GNN-based recommender system, message dropout is usually used to alleviate the selection bias in the user-item bipartite graph. However, message dropout might deteriorate the recommender system's performance due to the randomness of dropping out the outgoing messages based on the user-item bipartite graph. To solve this problem, we propose a graph contrastive learning module for a general recommender system that learns the embeddings in a self-supervised manner and reduces the randomness of message dropout. Besides, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Pairwise Ranking (BPR) based on a negative sampling strategy. However, BPR has the following problems: suboptimal sampling and sample bias. We introduce a new debiased contrastive loss to solve these problems, which provides sufficient negative samples and applies a bias correction probability to alleviate the sample bias. We integrate the proposed framework, including the graph contrastive module and the debiased contrastive module, with several Matrix Factorization (MF) and GNN-based recommendation models. Experimental results on three public benchmarks demonstrate the effectiveness of our framework.
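For reference, the pairwise BPR objective that the abstract treats as the baseline scores one observed (positive) item against one sampled negative item, with per-pair loss -ln σ(x_pos - x_neg). The sketch below implements only that standard loss; the function name and the example scores are illustrative, and the paper's debiased contrastive loss is not shown.

```python
import math


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (user, positive item, negative item) triple.

    Computes -log(sigmoid(pos_score - neg_score)) in a numerically stable way.
    """
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite log(1 + exp(-diff)) to avoid overflow.
    return -diff + math.log1p(math.exp(diff))


print(bpr_loss(3.0, 0.0))  # positive item ranked well above the negative
print(bpr_loss(0.0, 3.0))  # negative item ranked above the positive
```

The loss shrinks toward zero as the positive item's score exceeds the negative's, and it depends on a single sampled negative per pair, which is where the sampling issues raised in the abstract come in.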

preprint, 2020, arXiv

Debate Dynamics for Human-comprehensible Fact-checking on Knowledge Graphs

We propose a novel method for fact-checking on knowledge graphs based on debate dynamics. The underlying idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal to justify the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, referred to as the judge, decides whether the fact is true or false. The two agents can be considered as sparse feature extractors that present interpretable evidence for either the thesis or the antithesis. In contrast to black-box methods, the arguments enable the user to gain an understanding for the decision of the judge. Moreover, our method allows for interactive reasoning on knowledge graphs where the users can raise additional arguments or evaluate the debate taking common sense reasoning and external information into account. Such interactive systems can increase the acceptance of various AI applications based on knowledge graphs and can further lead to higher efficiency, robustness, and fairness.

preprint, 2020, arXiv

Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs

The Hawkes process has become a standard method for modeling self-exciting event sequences with different event types. A recent work has generalized the Hawkes process to a neurally self-modulating multivariate point process, which enables the capturing of more complex and realistic impacts of past events on future events. However, this approach is limited by the number of possible event types, making it impossible to model the dynamics of evolving graph sequences, where each possible link between two nodes can be considered as an event type. The number of event types increases even further when links are directional and labeled. To address this issue, we propose the Graph Hawkes Neural Network that can capture the dynamics of evolving graph sequences and can predict the occurrence of a fact in a future time instance. Extensive experiments on large-scale temporal multi-relational databases, such as temporal knowledge graphs, demonstrate the effectiveness of our approach.
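As background for the "self-exciting" behavior mentioned above, the classical univariate Hawkes process with an exponential kernel has conditional intensity λ(t) = μ + Σ α·exp(-β(t - t_i)), summed over past events t_i < t. The sketch below evaluates that textbook form. The parameter values are arbitrary assumptions, and this is not the Graph Hawkes Neural Network itself, which replaces the fixed kernel with a learned model.

```python
import math


def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a classical exponential-kernel Hawkes process.

    mu is the base rate; each past event adds alpha, decaying at rate beta.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )


events = [1.0, 1.5]
for t in (0.5, 1.6, 5.0):
    print(t, round(hawkes_intensity(t, events), 4))
```

The intensity sits at the base rate before any event, jumps after each event, and decays back toward the base rate, which is the self-excitation pattern the abstract builds on.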

preprint, 2020, arXiv

Reasoning on Knowledge Graphs with Debate Dynamics

We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.

preprint, 2020, arXiv

The Tensor Brain: Semantic Decoding for Perception and Memory

We analyse perception and memory, using mathematical models for knowledge graphs and tensors, to gain insights into the corresponding functionalities of the human mind. Our discussion is based on the concept of propositional sentences consisting of subject-predicate-object (SPO) triples for expressing elementary facts. SPO sentences are the basis for most natural languages but might also be important for explicit perception and declarative memories, as well as intra-brain communication and the ability to argue and reason. A set of SPO sentences can be described as a knowledge graph, which can be transformed into an adjacency tensor. We introduce tensor models, where concepts have dual representations as indices and associated embeddings, two constructs we believe are essential for the understanding of implicit and explicit perception and memory in the brain. We argue that a biological realization of perception and memory imposes constraints on information processing. In particular, we propose that explicit perception and declarative memories require a semantic decoder, which, in a simple realization, is based on four layers: first, a sensory memory layer, as a buffer for sensory input; second, an index layer representing concepts; third, a memoryless representation layer for the broadcasting of information (the "blackboard", or the "canvas" of the brain); and fourth, a working memory layer as a processing center and data buffer. We discuss the operations of the four layers and relate them to the global workspace theory. In a Bayesian brain interpretation, semantic memory defines the prior for observable triple statements. We propose that, in evolution and during development, semantic memory, episodic memory, and natural language evolved as emergent properties in agents' process to gain a deeper understanding of sensory information.
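The step from a set of SPO sentences to an adjacency tensor can be made concrete with a small example. The sketch below assumes a binary tensor indexed as [subject][predicate][object]. The function name, index ordering, and toy triples are illustrative choices, not taken from the paper.

```python
def build_adjacency_tensor(triples):
    """Turn (subject, predicate, object) triples into a binary 3-way tensor."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    predicates = sorted({p for _, p, _ in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    p_idx = {p: i for i, p in enumerate(predicates)}
    n, m = len(entities), len(predicates)
    # tensor[s][p][o] == 1 exactly when the triple (s, p, o) is in the graph.
    tensor = [[[0] * n for _ in range(m)] for _ in range(n)]
    for s, p, o in triples:
        tensor[e_idx[s]][p_idx[p]][e_idx[o]] = 1
    return tensor, e_idx, p_idx


triples = [
    ("Sparky", "is_a", "Dog"),
    ("Sparky", "chases", "Ball"),
    ("Dog", "is_a", "Animal"),
]
tensor, e_idx, p_idx = build_adjacency_tensor(triples)
print(tensor[e_idx["Sparky"]][p_idx["is_a"]][e_idx["Dog"]])
```

Each concept here has an integer index. A tensor model in the sense of the abstract would additionally attach a learned embedding vector to each index, which this sketch leaves out.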

