Source author record

Lei Hou

Lei Hou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computation and Language, Artificial Intelligence, math.AG, math.CV

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

A Second Main Theorem for Entire Curves Intersecting Three Conics

We establish a Second Main Theorem for entire holomorphic curves $ f: \mathbb{C} \to \mathbb{P}^2 $ intersecting a generic configuration of three conics $\mathcal{C}= \mathcal{C}_1+ \mathcal{C}_2+ \mathcal{C}_3 $ in the complex projective plane $\mathbb{P}^2$. Using invariant logarithmic $2$-jet differentials with negative twists, we prove the estimate \[ T_f(r) \leqslant 5 \sum_{i=1}^3 N_f^{[1]}(r, \mathcal{C}_i) + o\big(T_f(r)\big)\quad\parallel, \] where $ T_f(r) $ is the Nevanlinna characteristic function, and $ N_f^{[1]}(r, \mathcal{C}_i) $ is the $1$-truncated counting function. The key innovation of our approach is establishing new vanishing lemmas of the form \[ H^0\bigl(\mathbb{P}^2,\, E_{2,m}T_{\mathbb{P}^2}^*(\log \mathcal{C}) \otimes \mathcal{O}_{\mathbb{P}^2}(-t)\bigr) = 0 \] for specific pairs $(m, t)$, achieved by combining algebro-geometric arguments with computer-assisted computations through a mod-$p$ reduction technique. This yields a systematic method for proving vanishing results for negatively twisted jet differentials -- a key component in complex hyperbolic geometry.
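The abstract does not describe how the mod-$p$ computations are carried out. As a toy illustration of the general principle only: if the space of sections is modelled as the kernel of an integer matrix (an assumption made here for illustration), then full column rank modulo a single prime $p$ certifies that the kernel over $\mathbb{Q}$ is zero, because the rank over $\mathbb{F}_p$ never exceeds the rank over $\mathbb{Q}$. The function names below are hypothetical and are not taken from the paper.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the finite field F_p (p must be prime)."""
    m = [[x % p for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        # Clear the column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def kernel_vanishes_over_q(rows, p):
    """True certifies a trivial kernel over Q; False is inconclusive for this p."""
    return rank_mod_p(rows, p) == len(rows[0])
```

A `False` result only means that the chosen prime gives no certificate; another prime may still succeed, which is why such checks are typically repeated for several primes.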

Preprint, 2026, arXiv

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Reinforcement learning (RL) has emerged as a critical technique for enhancing LLM-based deep search agents. However, existing approaches primarily rely on binary outcome rewards, which fail to capture the comprehensiveness and factuality of agents' reasoning process, and often lead to undesirable behaviors such as shortcut exploitation and hallucinations. To address these limitations, we propose Citation-aware Rubric Rewards (CaRR), a fine-grained reward framework for deep search agents that emphasizes reasoning comprehensiveness, factual grounding, and evidence connectivity. CaRR decomposes complex questions into verifiable single-hop rubrics and requires agents to satisfy these rubrics by explicitly identifying hidden entities, supporting them with correct citations, and constructing complete evidence chains that link to the predicted answer. We further introduce Citation-aware Group Relative Policy Optimization (C-GRPO), which combines CaRR and outcome rewards for training robust deep search agents. Experiments show that C-GRPO consistently outperforms standard outcome-based RL baselines across multiple deep search benchmarks. Our analysis also validates that C-GRPO effectively discourages shortcut exploitation, promotes comprehensive, evidence-grounded reasoning, and exhibits strong generalization to open-ended deep research tasks. Our code and data are available at https://github.com/THUDM/CaRR.
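The abstract states that C-GRPO combines rubric rewards with outcome rewards but does not give the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the rubric reward is the fraction of rubrics satisfied, that the two signals are mixed additively with a hypothetical weight `alpha`, and that advantages are normalized within a group of rollouts in the usual group-relative way. All function and parameter names are illustrative.

```python
from statistics import mean, pstdev


def rubric_score(satisfied):
    """Fraction of single-hop rubrics a rollout satisfied (assumed definition)."""
    return sum(satisfied) / len(satisfied) if satisfied else 0.0


def combined_reward(outcome, satisfied, alpha=0.5):
    """Outcome reward plus a weighted rubric reward; alpha is a hypothetical weight."""
    return outcome + alpha * rubric_score(satisfied)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three rollouts for one question: (binary outcome, per-rubric pass/fail).
rollouts = [
    (1.0, [True, True]),    # correct answer, full evidence chain
    (1.0, [False, False]),  # correct answer reached by a shortcut
    (0.0, [True, False]),   # wrong answer, partial evidence
]
rewards = [combined_reward(outcome, rubrics) for outcome, rubrics in rollouts]
advantages = group_relative_advantages(rewards)
```

Under these assumptions the shortcut rollout receives a lower advantage than the fully grounded one even though both answers are correct, which is the behavior the abstract attributes to the rubric signal.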

Preprint, 2026, arXiv

StoryAlign: Evaluating and Training Reward Models for Story Generation

Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains $1,133$ high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only $66.3\%$ accuracy. To address this limitation, we construct roughly $100,000$ high-quality story preference pairs across diverse domains and develop StoryReward, an advanced reward model for story preference trained on this dataset. StoryReward achieves state-of-the-art (SoTA) performance on StoryRMB, outperforming much larger models. We also adopt StoryReward in downstream test-time scaling applications for best-of-n (BoN) story selection and find that it generally chooses stories better aligned with human preferences. We will release our dataset, model, and code to facilitate future research. Related code and data are available at https://github.com/THU-KEG/StoryReward.
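To make the evaluation setup concrete, here is a minimal sketch of how accuracy on instances with one chosen and three rejected stories, and best-of-n selection, could be computed. It assumes an instance counts as correct only when the chosen story scores strictly higher than every rejected story; the abstract does not specify the exact metric. The `toy_score` function is a placeholder stand-in for a reward model and is not StoryReward.

```python
def preference_accuracy(instances, score):
    """Share of instances where the chosen story outscores all rejected stories."""
    correct = 0
    for inst in instances:
        chosen = score(inst["prompt"], inst["chosen"])
        if all(chosen > score(inst["prompt"], r) for r in inst["rejected"]):
            correct += 1
    return correct / len(instances)


def best_of_n(prompt, candidates, score):
    """Return the candidate story with the highest reward-model score."""
    return max(candidates, key=lambda story: score(prompt, story))


def toy_score(prompt, story):
    """Placeholder scorer: number of distinct words (not a real reward model)."""
    return len(set(story.split()))


instances = [
    {"prompt": "p1", "chosen": "the fox outwits the crow",
     "rejected": ["fox fox fox", "the crow", "a fox"]},
    {"prompt": "p2", "chosen": "rain",
     "rejected": ["rain fell all night", "it rained", "wet"]},
]
accuracy = preference_accuracy(instances, toy_score)
pick = best_of_n("p1", ["a a", "a b c", "a b"], toy_score)
```

With four stories per instance, a scorer that ranks at random would be correct about a quarter of the time under this metric, which gives some context for the reported accuracy figure.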
