Researcher profile

Wenjie Tang

Wenjie Tang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 (Unverified) | Verification L1 | Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction

In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the continuously expanding and evolving patterns in streaming traffic networks. To address this challenge, we propose a simple yet efficient dual-branch continual learning framework for traffic prediction, named CoMemNet. The fast-converging Online branch undertakes the primary prediction tasks, while the momentum-updated Target branch extracts historical information using Wasserstein Distance features to create a Dynamic Contrastive Sampler (DC Sampler). This sampler selects a node set with significant dynamic network feature changes for training, effectively mitigating the issue of catastrophic forgetting. Additionally, the backbone incorporates a lightweight Node-Adaptive Temporal Memory Buffer (TMRB-N) to consolidate old knowledge through memory replay and address the risk of memory explosion. Finally, we provide two newly curated open-source datasets. Experimental results demonstrate that CoMemNet achieves state-of-the-art (SOTA) performance across all three large-scale real-world datasets. The code is available at: https://github.com/meiwu5/CoMemNet.
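The abstract names two mechanisms that a short sketch may make more concrete: a momentum-updated Target branch, and a sampler that picks nodes whose features changed most under a Wasserstein distance. The code below is only an illustration under stated assumptions and is not taken from the CoMemNet repository. It assumes each node has a 1-D window of feature values of equal length before and after an update, uses the closed-form 1-D Wasserstein-1 distance between equal-size samples, and uses hypothetical names (`wasserstein_1d`, `select_changed_nodes`, `momentum_update`) and a hypothetical momentum value.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples this equals the mean absolute difference
    between the sorted values.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(a - b)))


def select_changed_nodes(old_feats, new_feats, k):
    """Return the indices of the k nodes with the largest feature shift.

    old_feats, new_feats: arrays of shape (num_nodes, window_length).
    This ranking rule is an assumption; the abstract does not specify it.
    """
    dists = np.array(
        [wasserstein_1d(o, n) for o, n in zip(old_feats, new_feats)]
    )
    order = np.argsort(-dists, kind="stable")  # largest shift first
    return order[:k], dists


def momentum_update(target, online, m=0.99):
    """Exponential moving average of online parameters into target ones."""
    return m * target + (1.0 - m) * online
```

In this sketch, only the selected node indices would be passed to the next training step, while the target parameters track the online ones slowly through `momentum_update`.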

Preprint, 2026, arXiv

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscure the causal impact of intermediate decisions, exacerbating temporal credit assignment challenges. To address this, we propose ReBel (Reward Belief), a process-level reinforcement learning algorithm that explicitly models structured belief states to summarize interaction history and guide subsequent policy learning. ReBel introduces belief-consistency supervision, converting discrepancies between predicted beliefs and observed feedback into dense self-supervised signals without requiring external step-wise annotations or verifiers. It also employs belief-aware grouping to compare trajectories under similar belief states, yielding more robust and lower-variance advantage estimates. We evaluate ReBel on challenging long-horizon benchmarks, including ALFWorld and WebShop. ReBel improves task success by up to $20.4$ percentage points over the episode-level baseline GRPO and increases sample efficiency by $2.1\times$. These results suggest that belief-aware self-supervision is a promising direction for reliable long-horizon decision-making under partial observability. Code is available at: https://github.com/Fateyetian/Rebel.git.
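The belief-aware grouping described in the abstract can be pictured as a group-relative advantage in the style of GRPO, where the groups are formed by belief state rather than by episode. The sketch below is an assumption-laden illustration, not the ReBel implementation: it treats a belief state as a hashable key (`belief_key`, hypothetical), groups samples with identical keys, and normalizes each reward by its group mean and standard deviation. How ReBel actually measures belief similarity is not stated in the abstract.

```python
from collections import defaultdict
import statistics


def belief_grouped_advantages(samples, eps=1e-8):
    """Compute group-normalized advantages, grouping by belief key.

    samples: list of (belief_key, reward) pairs; belief_key is any
    hashable stand-in for a belief state (an assumption of this sketch).
    Returns a list of advantages in the same order as the input.
    """
    groups = defaultdict(list)
    for idx, (key, reward) in enumerate(samples):
        groups[key].append((idx, reward))

    advantages = [0.0] * len(samples)
    for members in groups.values():
        rewards = [r for _, r in members]
        mean = statistics.fmean(rewards)
        std = statistics.pstdev(rewards)  # population std within the group
        for idx, r in members:
            # eps avoids division by zero when a group has a single
            # member or when all of its rewards are identical
            advantages[idx] = (r - mean) / (std + eps)
    return advantages
```

A sample that is alone in its group receives an advantage of zero here, since there is nothing to compare it against; a real system would need some rule for merging nearby belief states.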