Workspace feed

Open the papers and discussions carrying the strongest momentum right now

Signed-out visitors get a high-signal public ranking pass. Sign in to unlock following mode and behavior-aware personalization.

8 papers in this pass
0 followed authors in graph
0 saved works shaping context
0 explicit preference signals

Ranking snapshot

Why these papers surface first

Ranking balances graph proximity, likes, dislikes, trust, reviews, citations, recency and anti-repeat controls.

Ranking pass feed-v432 candidates | 0 personalized item(s) | Global best-of-stream | Trust 12

Trending now

8 ranked item(s)

Rank #1 | preprint | 2026 | arXiv

The Abel--Jacobi map over the twistor-$\mathbb{P}^1$ and real local class field theory

We study the Abel--Jacobi map over the twistor-$\mathbb{P}^1$ in the context of Scholze's geometrisation of the real local Langlands correspondence. In a similar spirit to a result of Fargues over the Fargues--Fontaine curve, we prove that pullback along the Abel--Jacobi map induces an equivalence on Picard groupoids and use this to recover local class field theory for archimedean local fields.

Score 38.3
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +17.1 Catalog momentum | +1.7 Reason fit | +1.5 Research signal
0 Citations | 0 Reviews | +0 Signal | 17.1 Trend | 0 Trust

Rank #2 | preprint | 2026 | arXiv

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

Autonomous agents have rapidly matured as task executors and seen widespread deployment via harnesses such as OpenClaw. Safety concerns have rightly drawn growing research attention, and beneath them lie the values silently steering agent behavior. Existing value benchmarks, however, remain confined to LLMs, leaving agent values largely uncharted. From intuitive, empirical, and theoretical vantage points, we show that an agent's values diverge from those of its underlying LLM, and the agentic modality further introduces dataset-, evaluation-, and system-level challenges absent from text-only protocols. We close this gap with Agent-ValueBench, the first benchmark dedicated to agent values. It features 394 executable environments across 16 domains, offering 4,335 value-conflict tasks that cover 28 value systems and 332 dimensions. Every instance is co-synthesized through our purpose-built end-to-end pipeline and curated per-instance by professional psychologists. Each task ships with two pole-aligned golden trajectories whose checkpoints anchor a trajectory-level rubric-based judge. Benchmarking 14 frontier proprietary and open-weights models across 4 mainstream harnesses, we uncover three concerted findings. Agent values first manifest as a Value Tide of cross-model homogeneity beneath interpretable counter-currents. This tide bends non-additively under harness pull, and yet more decisively under deliberate steering via embedded skills. Together these results signal that the agent-alignment lever is shifting from classical model alignment and prompt steering toward harness alignment and skill steering.

Score 36.9
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +15.8 Catalog momentum | +1.6 Reason fit | +1.5 Research signal
0 Citations | 0 Reviews | +0 Signal | 15.8 Trend | 0 Trust

Rank #3 | preprint | 2026 | arXiv

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

Mixture policies theoretically offer greater flexibility than unimodal policies in continuous action reinforcement learning, but the practical benefits of this complexity remain elusive. Mixture policies are notably absent from most state-of-the-art algorithms, raising a fundamental question: Is the added representational overhead useful? We show that increased flexibility can theoretically enhance solution quality and entropy robustness. Yet standard algorithms like SAC do not leverage these advantages. A core issue is the lack of a low-variance reparameterization trick for mixtures, a luxury Gaussian policies enjoy. We propose a marginalized reparameterization (MRP) estimator to address this, proving it offers lower variance than the standard likelihood-ratio (LR) approach. Our experiments across Gym MuJoCo, DeepMind Control Suite, and MetaWorld show that MRP mixture policies significantly outperform their LR-based versions and reach parity with (sometimes surpassing) their Gaussian counterparts. In addition, we find several cases where MRP mixture policies exhibit clear empirical advantages. In this paper, we provide a clearer understanding of the trade-offs involved, elevating MRP mixture policies from theoretical curiosity to a practical tool.

Score 36.9
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +15.8 Catalog momentum | +1.6 Reason fit | +1.5 Research signal
0 Citations | 0 Reviews | +0 Signal | 15.8 Trend | 0 Trust

Rank #4 | preprint | 2026 | arXiv

Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning

Unified Multimodal Models (UMMs) excel in general tasks but struggle to bridge the gap between personalized understanding and generation. Prior works largely rely on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method utilizing an ensemble reward system. Furthermore, we propose Dynamic Group Scaling (DGS), which adaptively filters low-potential trajectories to reduce gradient variance and accelerate convergence. To better reflect real-world complexity, we introduce UnifyBench++, featuring denser textual descriptions and richer user contexts. Experimental results demonstrate that Sync-R1 achieves state-of-the-art performance, showcasing superior cross-task reasoning and robust personalization without requiring complex cold-start procedures. The code and the UnifyBench++ dataset will be released at: https://github.com/arctanxarc/UniCTokens.

Score 36.9
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +15.8 Catalog momentum | +1.6 Reason fit | +1.5 Research signal
0 Citations | 0 Reviews | +0 Signal | 15.8 Trend | 0 Trust

Rank #5 | preprint | 2026 | arXiv

Toward an Origin of Human Randomness: Interaction-Driven Enhancement in the Rock-Paper-Scissors Game

Human-generated randomness is constrained by cognitive, motor, and strategic biases. This study examines how these constraints appear in individual behavior and how they may be modified through interaction with another human. We analyzed repeated rock-paper-scissors data from 9 participants, yielding 108 human-human matches and 216 individual player sequences. Using Lempel-Ziv complexity (LZC), we compared human-human sequences with the RNG-opponent condition. In the RNG-opponent condition, the maximum human LZC value was 84, which we used as an empirical reference. In the human-human condition, most sequences remained below this value, but a small number exceeded it, producing a small high-complexity tail that was not present in the RNG-opponent condition. We introduced a sensitivity measure that captures whether a player responds to the opponent's recent frequency bias by choosing the move that beats the opponent's most frequent recent move. Partial regression showed that focal-player sensitivity positively predicted future entropy in the opponent's move sequence after controlling for the opponent's current entropy. Circular-shift surrogate analyses indicated that this relation was most clearly interaction-specific when the opponent was in a low-entropy state, where the recent move distribution contained a clear frequency bias. These results suggest that human randomness is not only an isolated individual capacity, but can be shaped by interaction in a state-dependent manner. The findings identify a local mechanism by which interaction may destabilize biased behavior and increase entropy, providing a concrete basis for future causal experiments and generative models of high-complexity human behavior.

Score 35.3
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +14.4 Catalog momentum | +1.5 Research signal | +1.4 Reason fit
0 Citations | 0 Reviews | +0 Signal | 14.4 Trend | 0 Trust

Rank #6 | preprint | 2026 | arXiv

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

Spatial intelligence requires multimodal large language models (MLLMs) to move beyond single-view perception and reason consistently about objects, visibility, geometry, and interactions across multiple viewpoints. However, progress in cross-view reasoning remains limited by three major gaps: the scarcity of large-scale well-annotated training data, the lack of comprehensive benchmarks for systematic evaluation, and the absence of explicit alignment mechanisms that establish object-level consistency across views. To address these gaps, we develop CrossView Suite, which comprises three coordinated components: CrossViewSet, CrossViewBench, and CrossViewer. First, we introduce a multi-agent data engine to curate a large-scale, high-quality cross-view instruction dataset, termed CrossViewSet, covering 17 fine-grained task types with 1.6M samples. Second, we create a scene-disjoint CrossViewBench to comprehensively assess the cross-view spatial understanding capability of an MLLM, evaluating it across various aspects. Finally, we propose CrossViewer, a progressive three-stage framework for cross-view spatial reasoning in MLLMs, following a Perception -> Alignment -> Reasoning paradigm. Our method uses an adaptive spatial region tokenizer to capture fine-grained object representations, explicitly aligns objects across views, and then fuses the aligned features to boost the cross-view inference capacity of MLLMs. Extensive experiments and analyses show that large-scale training data, systematic evaluation, and explicit cross-view alignment are all critical for advancing MLLMs from single-view perception toward real-world spatial intelligence. The project page is available at https://github.com/Thinkirin/Crossview-Suite.

Score 35.3
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +14.4 Catalog momentum | +1.5 Research signal | +1.4 Reason fit
0 Citations | 0 Reviews | +0 Signal | 14.4 Trend | 0 Trust

Rank #7 | preprint | 2026 | arXiv

A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability

Agents built on large language models (LLMs) rely on a range of reliability techniques, including retry, majority voting, and self-consistency, that have been developed in parallel rather than within a common analytical framework. We observe that an LLM sampled at temperature $T$ is a discrete stochastic channel $p(y \mid x)$ in the sense of Shannon's coding theory, and use this identity as the entry point for such a framework grounded in communication theory. Each of these techniques is a special case of one of six classical reliability operators: diversity combining, hybrid retransmission, iterative generator-critic decoding, rateless sampling, structured redundant verification, and difficulty-adaptive routing. Within the framework we give two closed-form results: a noise-variance threshold above which uniform averaging beats quality-weighted averaging, and a contractivity criterion for generator-critic refinement, consistent with a contractive-to-divergent transition we observe between 3B- and 14B-parameter models. We further introduce a cost-aware semantic-nearest-neighbor router whose single Lagrangian knob traverses the quality-cost frontier without retraining. Across six channel configurations spanning local and cloud models on 69 hard tasks, no fixed model-technique-budget choice dominates, motivating per-task allocation. On a 300-item hard split of MMLU, GSM8K, and HumanEval, our router occupies the full empirical Pareto frontier: at matched quality, its normalized cost is ${\approx}56$\% lower than the strongest fixed technique; at matched normalized cost, it improves quality by ${\approx}7$\% ($26$\% over single-shot decoding). These results argue for consolidating these reliability techniques into a single tunable layer informed by channel coding.

Score 35.3
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +14.4 Catalog momentum | +1.5 Research signal | +1.4 Reason fit
0 Citations | 0 Reviews | +0 Signal | 14.4 Trend | 0 Trust

Rank #8 | preprint | 2026 | arXiv

AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play

Poker is an imperfect information game that has served as a long-standing benchmark for decision-making under uncertainty. To maximize utility beyond the Nash equilibrium, an agent can deviate from Nash-equilibrium policies to exploit suboptimal play. We introduce AlphaExploitem, which extends the competitive RL poker agent AlphaHoldem by using a hierarchical transformer encoder that enables reasoning over previously played hands and modifying the training procedure with the inclusion of a diverse pool of exploitable opponents to facilitate learning to exploit. We train and evaluate AlphaExploitem on two standard benchmarks for imperfect-information games. Empirically, AlphaExploitem successfully exploits weak play by both in- and out-of-distribution opponents, without losing performance against NE opponents.

Score 35.3
Catalog momentum | Open access
Why this is here

Ranked mainly because of Catalog momentum.

Advanced score breakdown
+18 Recency | +14.4 Catalog momentum | +1.5 Research signal | +1.4 Reason fit
0 Citations | 0 Reviews | +0 Signal | 14.4 Trend | 0 Trust
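
The feed does not state how the displayed score is computed. In every entry above, though, the four signed components in the advanced score breakdown (Recency, Catalog momentum, Reason fit, Research signal) appear to add up to the score shown. The sketch below checks that reading against the listed values. The additive rule is an assumption inferred from the numbers, and the names `FEED_ITEMS`, `component_sum` and `matches_displayed_score` are hypothetical, not part of the feed.

```python
# Values copied from the score lines in the feed above.
# ASSUMPTION: displayed score = sum of the four signed components.
# The feed itself does not document its scoring formula.

# rank -> (displayed score, recency, catalog momentum, reason fit, research signal)
FEED_ITEMS = {
    1: (38.3, 18.0, 17.1, 1.7, 1.5),
    2: (36.9, 18.0, 15.8, 1.6, 1.5),
    3: (36.9, 18.0, 15.8, 1.6, 1.5),
    4: (36.9, 18.0, 15.8, 1.6, 1.5),
    5: (35.3, 18.0, 14.4, 1.4, 1.5),
    6: (35.3, 18.0, 14.4, 1.4, 1.5),
    7: (35.3, 18.0, 14.4, 1.4, 1.5),
    8: (35.3, 18.0, 14.4, 1.4, 1.5),
}


def component_sum(row):
    """Add the four component values (everything after the displayed score)."""
    return sum(row[1:])


def matches_displayed_score(row, tol=0.05):
    """True if the components sum to the displayed score within rounding."""
    return abs(component_sum(row) - row[0]) < tol


for rank, row in sorted(FEED_ITEMS.items()):
    print(rank, round(component_sum(row), 1), matches_displayed_score(row))
```

Under this reading, the second breakdown row (Citations, Reviews, Signal, Trend, Trust) does not enter the sum. Trend repeats the Catalog momentum value and the other entries are zero for all eight items.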