Researcher profile

Soo Min Kwon

Soo Min Kwon contributes to research discovery and scholarly infrastructure.

Researcher; Affiliation not imported; Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified; Verification L1; Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

CoDistill-GRPO: A Co-Distillation Recipe for Efficient Group Relative Policy Optimization

Group Relative Policy Optimization (GRPO) has emerged as a powerful algorithm for improving the reasoning capabilities of language models, but often fails to improve small models due to sparse rewards on difficult tasks. Existing works mitigate this issue by leveraging a larger model, either to provide hints for rollouts or to provide dense reward signals through knowledge distillation (KD). However, this assumes the existence of such an oracle, and training one can significantly increase total training time. In this work, we propose CoDistill-GRPO, a co-distillation algorithm that simultaneously trains a large and a small model by maximizing carefully designed GRPO objectives. The two models learn from each other: the small model uses an on-policy KD reward to learn from the large model's distribution, while the large model is updated using rollouts generated by the small model with importance reweighting, reducing the computational overhead of rollout generation. We show that CoDistill-GRPO substantially improves small model performance over standard GRPO on mathematical benchmarks across both Qwen and Llama models. Specifically, with Qwen2.5-Math-1.5B, we observe an accuracy increase of over 11.6 percentage points over the base model and an additional 6.0 percentage points over GRPO on the Minerva dataset. Interestingly, the larger model (Qwen2.5-Math-7B) trained with CoDistill-GRPO nearly matches standard GRPO performance despite training on small-model rollouts. This highlights CoDistill-GRPO as a cost-effective alternative to GRPO for larger models, yielding an approximate 18% speedup, which may be of independent interest.
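The abstract does not state the exact objectives, so the following is only a minimal sketch of two standard ingredients it mentions: the group-relative advantage commonly used in GRPO (rewards standardized within a group of rollouts for one prompt) and an importance weight for reusing rollouts drawn from a different policy. The function names, the `eps` constant, and the toy numbers are illustrative assumptions, not the paper's implementation.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts (GRPO-style).

    eps is an assumed small constant that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def importance_weights(logp_target, logp_behavior):
    """Per-rollout ratio pi_target / pi_behavior from sequence log-probs.

    Here the behavior policy stands in for the small model that generated
    the rollouts and the target policy for the large model being updated.
    """
    return [math.exp(lt - lb) for lt, lb in zip(logp_target, logp_behavior)]


# Sparse rewards: if every rollout in the group fails, all advantages are
# zero and the group contributes no learning signal.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))

# One success in the group yields a positive advantage for that rollout.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))

# Hypothetical log-probabilities of the same rollouts under two policies.
print(importance_weights([-1.0, -2.0], [-1.0, -2.5]))
```

The all-zero group illustrates why sparse rewards can stall a small model under plain GRPO: without any reward variation inside a group, a denser signal (such as a KD-based reward) is needed to produce non-zero advantages.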

Preprint, 2022, arXiv

Low-Rank Phase Retrieval with Structured Tensor Models

We study the low-rank phase retrieval problem, where the objective is to recover a sequence of signals (typically images) given the magnitude of linear measurements of those signals. Existing solutions involve recovering a matrix constructed by vectorizing and stacking each image. These algorithms model this matrix as low-rank and leverage the low-rank property to decrease the sample complexity required for accurate recovery. However, when the number of available measurements is more limited, these low-rank matrix models can often fail. We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR) that models the sequence of images as a tensor, rather than a matrix, and factorizes it using the Tucker decomposition. This factorization reduces the number of parameters that need to be estimated, allowing for a more accurate reconstruction in the under-sampled regime. Interestingly, we observe that this structure also improves performance in the over-determined setting when the Tucker ranks are chosen appropriately. We demonstrate the effectiveness of our approach on real video datasets under several different measurement models.
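To make the parameter-reduction claim concrete, the sketch below counts the free parameters of a rank-r matrix factorization of the stacked, vectorized images against those of a Tucker factorization of the same data kept as a third-order tensor. The image size, sequence length, and ranks are hypothetical values chosen for illustration, and the counts ignore the redundancy from factor rescaling and rotation.

```python
def low_rank_matrix_params(n1, n2, q, r):
    """Parameters of U (n1*n2 x r) and V (q x r) for the stacked matrix."""
    return r * (n1 * n2 + q)


def tucker_params(n1, n2, q, ranks):
    """Parameters of a Tucker model: core tensor plus three factor matrices."""
    r1, r2, r3 = ranks
    core = r1 * r2 * r3
    factors = n1 * r1 + n2 * r2 + q * r3
    return core + factors


# Hypothetical setting: 100 frames of 64 x 64 pixels.
n1, n2, q = 64, 64, 100

print(low_rank_matrix_params(n1, n2, q, r=5))       # 20980
print(tucker_params(n1, n2, q, ranks=(5, 5, 5)))    # 1265
```

In this toy setting the Tucker model has roughly 16 times fewer unknowns, which is consistent with the abstract's point that fewer parameters need to be estimated when measurements are scarce.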