Source author record

Su-Hyeon Kim

Su-Hyeon Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language Formal Languages and Automata Theory Machine Learning Programming Languages

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

Large language models from different families use different hidden dimensions, tokenizers, and training procedures, making behavioral directions difficult to compare or transfer across models. We introduce an anchor-projection framework that maps hidden representations from each model into a shared anchor coordinate space (ACS). Behavioral directions extracted from source models are projected into ACS and averaged into a canonical direction. For a new model, the canonical direction is reconstructed into its native hidden space using only anchor activations, without fine-tuning or target-specific direction extraction. We evaluate five instruction-tuned model families and ten behavioral axes. We find that same-axis directions align tightly across the Llama-Qwen-Mistral-Phi (LQMP) cluster in ACS. This shared structure transfers to downstream tasks. For the aligned LQMP cluster, held-out targets achieve (0.83) ten-way detection accuracy and (0.95) mean binary AUROC, while canonical steering induces refusal-rate shifts of up to +0.46% under distribution shift. Sensitivity analyses show that two source models and small anchor pools already suffice to approximate transferable directions. Overall, ACS provides a novel perspective on cross-family interpretability, revealing that representation-level transfer remains robust across model families.

preprint2026arXiv

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs

Large Reasoning Models (LRMs) achieve remarkable success through explicit thinking steps, yet the thinking steps introduce a novel risk by potentially amplifying unsafe behaviors. Despite this vulnerability, conventional defense mechanisms remain ineffective as they overlook the unique reasoning dynamics of LRMs. In this work, we find that the emergence of safe-reminding phrases within thinking steps plays a pivotal role in ensuring LRM safety. Motivated by this finding, we propose SafeRemind, a decoding-time defense method that dynamically injects safe-reminding phrases into thinking steps. By leveraging entropy triggers to intervene at decision-locking points, SafeRemind redirects potentially harmful trajectories toward safer outcomes without requiring any parameter updates. Extensive evaluations across five LRMs and six benchmarks demonstrate that SafeRemind substantially enhances safety, achieving improvements of up to 45.5%p while preserving core reasoning utility.

preprint2026arXiv

KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

Online communication increasingly amplifies toxic language, and recent research actively explores methods for detecting and rewriting such content. Existing studies primarily focus on non-obfuscated text, which limits robustness in the situation where users intentionally disguise toxic expressions. In particular, Korean allows toxic expressions to be easily disguised through its agglutinative characteristic. However, obfuscation in Korean remains largely unexplored, which motivates us to introduce a KOTOX: Korean toxic dataset for deobfuscation and detoxification. We categorize Korean obfuscation patterns into linguistically grounded classes and define transformation rules derived from real-world examples. Using these rules, we provide paired neutral and toxic sentences alongside their obfuscated counterparts. Models trained on our dataset better handle obfuscated text without sacrificing performance on non-obfuscated text. This is the first dataset that simultaneously supports deobfuscation and detoxification for the Korean language. We expect it to facilitate better understanding and mitigation of obfuscated toxic content in LLM for Korean. Our code and data are available at https://github.com/leeyejin1231/KOTOX.

preprint2022arXiv

Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting

Due to the practical importance of regular expressions (regexes, for short), there has been a lot of research to automatically generate regexes from positive and negative string examples. We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called `neural example splitting'. Our approach essentially split up each example string into multiple parts using a neural network trained to group similar substrings from positive strings. This helps to learn a regex faster and, thus, more accurately since we now learn from several short-length strings. We propose an effective regex synthesis framework called `SplitRegex' that synthesizes subregexes from `split' positive substrings and produces the final regex by concatenating the synthesized subregexes. For the negative sample, we exploit pre-generated subregexes during the subregex synthesis process and perform the matching against negative strings. Then the final regex becomes consistent with all negative strings. SplitRegex is a divided-and-conquer framework for learning target regexes; split (=divide) positive strings and infer partial regexes for multiple parts, which is much more accurate than the whole string inferring, and concatenate (=conquer) inferred regexes while satisfying negative strings. We empirically demonstrate that the proposed SplitRegex framework substantially improves the previous regex synthesis approaches over four benchmark datasets.

Su-Hyeon Kim

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs

KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting