Researcher profile

Daniel George

Daniel George contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific criteria and aggregating them into a scalar reward. Yet standard static aggregations conflate a criterion's human-assigned importance with its current usefulness as an optimization signal. We show that this assumption breaks down in rubric RL: many important criteria are already saturated or currently unreachable, while criteria that distinguish rollouts are not necessarily those with the largest human weights. We introduce POW3R, a policy-aware rubric reward framework that preserves human weights and category balance as the rubric objective while adapting criterion-level reward weights during training. POW3R uses rollout-level contrast to emphasize criteria that currently separate the policy's outputs, making the GRPO reward more informative without changing the underlying evaluation target. Across three base policies on two datasets spanning multimodal and text-only settings, POW3R wins $24$ of $30$ base-policy/metric comparisons, improving both mean rubric reward and strict completion (the fraction of prompts whose response satisfies every required rubric criterion) over vanilla GRPO with rubric rewards, and reaches the same plateau in $2.5$--$4\times$ fewer training steps. Rubric rewards should therefore distinguish what should matter in the final answer from what can teach the current policy.

preprint2017arXiv

Denoising Gravitational Waves using Deep Learning with Recurrent Denoising Autoencoders

Gravitational wave astronomy is a rapidly growing field of modern astrophysics, with observations being made frequently by the LIGO detectors. Gravitational wave signals are often extremely weak and the data from the detectors, such as LIGO, is contaminated with non-Gaussian and non-stationary noise, often containing transient disturbances which can obscure real signals. Traditional denoising methods, such as principal component analysis and dictionary learning, are not optimal for dealing with this non-Gaussian noise, especially for low signal-to-noise ratio gravitational wave signals. Furthermore, these methods are computationally expensive on large datasets. To overcome these issues, we apply state-of-the-art signal processing techniques, based on recent groundbreaking advancements in deep learning, to denoise gravitational wave signals embedded either in Gaussian noise or in real LIGO noise. We introduce SMTDAE, a Staired Multi-Timestep Denoising Autoencoder, based on sequence-to-sequence bi-directional Long-Short-Term-Memory recurrent neural networks. We demonstrate the advantages of using our unsupervised deep learning approach and show that, after training only using simulated Gaussian noise, SMTDAE achieves superior recovery performance for gravitational wave signals embedded in real non-Gaussian LIGO noise.