Researcher profile

Lei Hou

Lei Hou contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 15 - Unverified. Verification L1. Unclaimed author.
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2026, arXiv

A Second Main Theorem for Entire Curves Intersecting Three Conics

We establish a Second Main Theorem for entire holomorphic curves \( f: \mathbb{C} \to \mathbb{P}^2 \) intersecting a generic configuration of three conics \( \mathcal{C} = \mathcal{C}_1 + \mathcal{C}_2 + \mathcal{C}_3 \) in the complex projective plane \( \mathbb{P}^2 \). Using invariant logarithmic \(2\)-jet differentials with negative twists, we prove the estimate \[ T_f(r) \leqslant 5 \sum_{i=1}^3 N_f^{[1]}(r, \mathcal{C}_i) + o\big(T_f(r)\big)\quad\parallel, \] where \( T_f(r) \) is the Nevanlinna characteristic function, and \( N_f^{[1]}(r, \mathcal{C}_i) \) is the \(1\)-truncated counting function. The key innovation of our approach is establishing new vanishing lemmas of the form \[ H^0\bigl(\mathbb{P}^2,\, E_{2,m}T_{\mathbb{P}^2}^*(\log \mathcal{C}) \otimes \mathcal{O}_{\mathbb{P}^2}(-t)\bigr) = 0 \] for specific pairs \((m, t)\), achieved by combining algebro-geometric arguments with computer-assisted computations through a mod-\(p\) reduction technique. This yields a systematic method for proving vanishing results for negatively twisted jet differentials, a key component in complex hyperbolic geometry.

Preprint, 2026, arXiv

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Reinforcement learning (RL) has emerged as a critical technique for enhancing LLM-based deep search agents. However, existing approaches primarily rely on binary outcome rewards, which fail to capture the comprehensiveness and factuality of agents' reasoning process, and often lead to undesirable behaviors such as shortcut exploitation and hallucinations. To address these limitations, we propose Citation-aware Rubric Rewards (CaRR), a fine-grained reward framework for deep search agents that emphasizes reasoning comprehensiveness, factual grounding, and evidence connectivity. CaRR decomposes complex questions into verifiable single-hop rubrics and requires agents to satisfy these rubrics by explicitly identifying hidden entities, supporting them with correct citations, and constructing complete evidence chains that link to the predicted answer. We further introduce Citation-aware Group Relative Policy Optimization (C-GRPO), which combines CaRR and outcome rewards for training robust deep search agents. Experiments show that C-GRPO consistently outperforms standard outcome-based RL baselines across multiple deep search benchmarks. Our analysis also validates that C-GRPO effectively discourages shortcut exploitation, promotes comprehensive, evidence-grounded reasoning, and exhibits strong generalization to open-ended deep research tasks. Our code and data are available at https://github.com/THUDM/CaRR.
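
As a rough illustration of the idea in this abstract, the sketch below mixes a binary outcome reward with a rubric-based reward and then normalizes rewards within a group of rollouts, as group-relative policy optimization does. The abstract does not say how CaRR and outcome rewards are actually combined, so the additive form, the weight `alpha`, and the function names `rubric_reward`, `combined_reward`, and `group_relative_advantages` are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev
from typing import Sequence


def rubric_reward(satisfied: Sequence[bool]) -> float:
    """Fraction of single-hop rubrics a rollout satisfied (hypothetical scoring)."""
    return sum(satisfied) / len(satisfied) if satisfied else 0.0


def combined_reward(outcome_correct: bool,
                    satisfied: Sequence[bool],
                    alpha: float = 0.5) -> float:
    """Outcome reward plus a weighted rubric reward.

    The additive mix and the weight alpha are assumptions; the abstract
    only states that the two reward types are combined.
    """
    return float(outcome_correct) + alpha * rubric_reward(satisfied)


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> list:
    """Normalize rewards within one group of rollouts for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two rollouts that both reach the right answer: one with a full evidence
# chain, one that skipped every rubric (a "shortcut").
rewards = [
    combined_reward(True, [True, True, True]),
    combined_reward(True, [False, False, False]),
]
advantages = group_relative_advantages(rewards)
```

Under a purely binary outcome reward both rollouts would score the same and receive zero advantage. With the rubric term added, the evidence-grounded rollout gets a positive advantage and the shortcut rollout a negative one.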

Preprint, 2026, arXiv

StoryAlign: Evaluating and Training Reward Models for Story Generation

Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains 1,133 high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy. To address this limitation, we construct roughly 100,000 high-quality story preference pairs across diverse domains and develop StoryReward, an advanced reward model for story preference trained on this dataset. StoryReward achieves state-of-the-art (SoTA) performance on StoryRMB, outperforming much larger models. We also adopt StoryReward in downstream test-time scaling applications for best-of-n (BoN) story selection and find that it generally chooses stories better aligned with human preferences. We will release our dataset, model, and code to facilitate future research. Related code and data are available at https://github.com/THU-KEG/StoryReward.
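
The two uses of a reward model mentioned in this abstract, best-of-n selection and benchmark accuracy over instances with one chosen and three rejected stories, can be sketched as follows. This is a minimal sketch under assumptions: the abstract does not define how StoryRMB accuracy is computed, so counting an instance as correct only when the chosen story strictly outscores every rejected story is a guess. The scorer `toy_score` is a hypothetical stand-in for a real reward model, and all function names are illustrative.

```python
from typing import Callable, Sequence, Tuple

# A scorer maps (prompt, story) to a scalar preference score.
Scorer = Callable[[str, str], float]
# One benchmark instance: (prompt, chosen story, rejected stories).
Instance = Tuple[str, str, Sequence[str]]


def best_of_n(prompt: str, candidates: Sequence[str], score: Scorer) -> str:
    """Return the candidate story with the highest reward-model score."""
    return max(candidates, key=lambda story: score(prompt, story))


def benchmark_accuracy(instances: Sequence[Instance], score: Scorer) -> float:
    """Share of instances where the chosen story outscores all rejected ones.

    Assumed metric: the source does not spell out the exact definition.
    """
    hits = 0
    for prompt, chosen, rejected in instances:
        chosen_score = score(prompt, chosen)
        if all(chosen_score > score(prompt, r) for r in rejected):
            hits += 1
    return hits / len(instances)


def toy_score(prompt: str, story: str) -> float:
    """Hypothetical placeholder scorer: prefers longer stories."""
    return float(len(story.split()))


picked = best_of_n("a prompt", ["one", "one two three", "one two"], toy_score)
accuracy = benchmark_accuracy(
    [
        ("p1", "a b c d", ["a", "a b", "a b c"]),  # chosen ranks first
        ("p2", "a", ["a b", "a", "a b c"]),        # chosen does not rank first
    ],
    toy_score,
)
```

Under this assumed metric, a scorer that ranks four stories at random would pick the chosen one about a quarter of the time.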