Source author record

Jungseul Ok

Jungseul Ok appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision, Networking and Internet Architecture, eess.AS, eess.SP, Information Theory, math.IT, Multimedia, Robotics, Sound

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators


preprint · 2026 · arXiv

EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation

Recent text-to-image (T2I) generators can synthesize realistic images, but still struggle with compositional prompts involving multiple objects, counts, attributes, and relations. We introduce EPIC (Efficient Predicate-Guided Inference-Time Control), a training-free inference-time refinement framework for compositional T2I generation. EPIC casts refinement as predicate-guided search: it parses the original prompt once into a fixed visual program of object variables and typed predicates, covering checkable conditions such as object presence, counts, attributes, and relations. Each generated or edited image is verified against this program using visual evidence extracted from that image. An image is judged to satisfy the prompt only when all predicates are satisfied; otherwise, failed predicates decide the next step, routing local failures to targeted editing and global failures to resampling while the fixed visual program remains unchanged. On GenEval2, EPIC improves prompt-level accuracy from 34.16% for single-pass generation with the base generator to 71.46%. Under the same generator/editor setting and maximum image-model execution budget, EPIC outperforms the strongest prior refinement baseline by 19.23 points while reducing realized cost by 31% in image-model executions, 72% in MLLM calls, and 81% in MLLM tokens per prompt.
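
The control loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Predicate` type, its `scope` field, and the `generate`, `edit`, and `check` callables are hypothetical stand-ins for the generator, the editor, and the verifier, and the budget counts image-model executions in the simplest possible way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Predicate:
    name: str   # hypothetical label, e.g. "count(dog) == 2"
    scope: str  # "local" or "global" (assumed failure taxonomy)


def refine(program: List[Predicate],
           generate: Callable[[], str],
           edit: Callable[[str, List[Predicate]], str],
           check: Callable[[str, Predicate], bool],
           budget: int) -> Optional[str]:
    """Return an image passing every predicate, or None once the budget is spent."""
    image = generate()
    used = 1  # image-model executions so far
    while True:
        # The program is fixed; only the image changes between rounds.
        failed = [p for p in program if not check(image, p)]
        if not failed:
            return image
        if used >= budget:
            return None
        if any(p.scope == "global" for p in failed):
            image = generate()          # global failure: resample from scratch
        else:
            image = edit(image, failed)  # local failure: targeted edit
        used += 1
```

In this sketch an image is accepted only when the list of failed predicates is empty, which mirrors the all-predicates-satisfied rule stated in the abstract.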

preprint · 2026 · arXiv

Exploring Iterative Controllable Summarization with Large Language Models

Large language models (LLMs) have demonstrated remarkable performance in abstractive summarization tasks. However, their ability to precisely control summary attributes (e.g., length or topic) remains underexplored, limiting their adaptability to specific user preferences. In this paper, we systematically explore the controllability of LLMs. To this end, we revisit summary attribute measurements and introduce iterative evaluation metrics, failure rate and average iteration count, to precisely evaluate the controllability of LLMs, rather than merely assessing errors. Our findings show that LLMs struggle more with numerical attributes than with linguistic attributes. To address this challenge, we propose a guide-to-explain framework (GTE) for controllable summarization. Our GTE framework enables the model to identify misaligned attributes in the initial draft and guides it in self-explaining errors in the previous output. By allowing the model to reflect on its misalignment, GTE generates well-adjusted summaries that satisfy the desired attributes with robust effectiveness, requiring surprisingly fewer iterations than other iterative approaches.
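
The abstract names two iterative metrics, failure rate and average iteration count, without giving formulas. One plausible reading is sketched below. The definitions are assumptions: each sample records the number of iterations needed to meet the requested attribute, or `None` if it never did within the iteration cap, and the average is taken over successful samples only.

```python
from typing import List, Optional, Tuple


def iterative_metrics(iterations: List[Optional[int]]) -> Tuple[float, float]:
    """Return (failure_rate, average_iteration_count) under the assumed definitions."""
    if not iterations:
        raise ValueError("need at least one sample")
    successes = [n for n in iterations if n is not None]
    # Assumed: a failure is a sample that never satisfied the attribute.
    failure_rate = 1 - len(successes) / len(iterations)
    # Assumed: the average covers only samples that eventually succeeded.
    avg_iters = sum(successes) / len(successes) if successes else float("nan")
    return failure_rate, avg_iters


# Four samples: three succeed after 1, 2, and 3 iterations; one never succeeds.
failure_rate, avg_iters = iterative_metrics([1, 2, None, 3])
```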

preprint · 2026 · arXiv

Interaction-Aware Influence Functions for Group Attribution

Influence functions approximate how removing a training example changes a quantity of interest, called the target function, such as a held-out loss. To estimate the influence of a group of examples, the standard practice is to sum the individual influences of its members. However, this sum does not capture how examples jointly affect the target: a pair of examples may be redundant or complementary, but the sum cannot distinguish these cases. We propose an interaction-aware influence function that characterizes how interactions between examples influence the target. By expanding the target to second order around the trained parameters, we obtain an estimator that augments the standard sum with a pairwise interaction term that captures the alignment between two examples' effects on the target. We empirically evaluate our estimator in two settings. First, on six dataset-model pairs spanning logistic regression, MLPs, and ResNet-9, our estimator tracks leave-group-out retraining substantially better than first-order influence across all settings. Second, when used as a greedy selection rule for instruction-tuning data on Llama-3.1-8B, it beats prior influence-based and representation-similarity baselines on five of seven downstream tasks, in a regime where standard influence-based selection underperforms random selection.
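
Why a plain sum misses pairwise structure can be seen in a toy second-order expansion. The sketch below is not the paper's estimator. It assumes that removing examples 1 and 2 shifts the parameters by `d1` and `d2`, that these shifts add, and that the target has gradient `g` and symmetric Hessian `H` at the trained parameters. Under those assumptions the group effect equals the two individual second-order effects plus a cross term `d1^T H d2`.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(H: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in H]


def second_order(g: Vector, H: Matrix, d: Vector) -> float:
    """Second-order Taylor change of the target for a parameter shift d."""
    return dot(g, d) + 0.5 * dot(d, matvec(H, d))


def interaction(H: Matrix, d1: Vector, d2: Vector) -> float:
    """Cross term d1^T H d2, which a sum of individual effects leaves out."""
    return dot(d1, matvec(H, d2))


g = [1.0, 0.0]
H = [[2.0, 0.0], [0.0, 2.0]]
aligned = interaction(H, [1.0, 0.0], [1.0, 0.0])     # shifts point the same way
orthogonal = interaction(H, [1.0, 0.0], [0.0, 1.0])  # shifts do not overlap
```

Here the aligned pair has a nonzero cross term and the orthogonal pair has none, even though a sum of individual effects treats both pairs the same way.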

preprint · 2026 · arXiv

MMTB: Evaluating Terminal Agents on Multimedia-File Tasks

Terminals provide a powerful interface for AI agents by exposing diverse tools for automating complex workflows, yet existing terminal-agent benchmarks largely focus on tasks grounded in text, code, and structured files. However, many real-world workflows require practitioners to work directly with audio and video files. Working with such multimedia files calls for terminal agents not only to understand multimedia content, but also to convert auditory and visual evidence across related files into appropriate actions. To evaluate terminal agents on multimedia-file tasks, we introduce MultiMedia-TerminalBench (MMTB), a benchmark of 105 tasks across 5 meta-categories where terminal agents directly operate with audio and video files. Alongside MMTB, we propose Terminus-MM, a multimedia harness that extends Terminus-KIRA with audio and video perception for terminal agents. Together, MMTB and Terminus-MM support a controlled study of multimedia terminal agents, revealing how different forms of multimedia access shape task outcomes and determine which evidence agents rely on to construct executable terminal workflows. MMTB media and metadata are released at https://huggingface.co/datasets/mm-tbench/mmtb-media

preprint · 2026 · arXiv

PaT: Planning-after-Trial for Efficient Test-Time Code Generation

Beyond training-time optimization, scaling test-time computation has emerged as a key paradigm to extend the reasoning capabilities of Large Language Models (LLMs). However, most existing methods adopt a rigid Planning-before-Trial (PbT) policy, which inefficiently allocates test-time compute by incurring planning overhead even on directly solvable problems. We propose Planning-after-Trial (PaT), an adaptive policy for code generation that invokes a planner only upon verification failure. This adaptive policy naturally enables a heterogeneous model configuration: a cost-efficient model handles generation attempts, while a powerful model is reserved for targeted planning interventions. Empirically, across multiple benchmarks and model families, our approach significantly advances the cost-performance Pareto frontier. Notably, our heterogeneous configuration achieves performance comparable to a large homogeneous model while reducing inference cost by approximately 69%.
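
The policy of planning only after a failed trial can be sketched as a short loop. This is an illustrative reading of the abstract, not the released method: `generate` (the cheap model), `verify` (for example, running tests), and `plan` (the stronger model) are hypothetical callables, and `max_attempts` is an assumed cap.

```python
from typing import Callable, Optional, Tuple


def plan_after_trial(problem: str,
                     generate: Callable[[str, Optional[str]], str],
                     verify: Callable[[str], bool],
                     plan: Callable[[str, str], str],
                     max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Return (accepted code or None, number of planner calls)."""
    hint: Optional[str] = None
    planner_calls = 0
    for attempt in range(max_attempts):
        code = generate(problem, hint)  # first attempt runs with no plan at all
        if verify(code):
            return code, planner_calls
        if attempt + 1 < max_attempts:
            # The planner is consulted only after a verification failure.
            hint = plan(problem, code)
            planner_calls += 1
    return None, planner_calls
```

A directly solvable problem returns on the first attempt with zero planner calls, which is where the savings over a plan-first policy would come from in this sketch.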

preprint · 2026 · arXiv

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.
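
The pool-level questioning step can be illustrated with a small sketch. It is not the code from the linked repository. Candidates are modelled as attribute dictionaries, the `answer` callable stands in for the user, and the rule for choosing a question (the most even split of the pool) is an assumption, since the abstract only says that the chosen pair splits the current pool.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Dict[str, str]


def pick_question(pool: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Pick an attribute-value pair that splits the pool (assumed: most evenly)."""
    best, best_gap = None, None
    pairs = sorted({(k, v) for c in pool for k, v in c.items()})
    for k, v in pairs:
        yes = sum(1 for c in pool if c.get(k) == v)
        if yes == 0 or yes == len(pool):
            continue  # this pair does not separate any candidates
        gap = abs(2 * yes - len(pool))
        if best_gap is None or gap < best_gap:
            best, best_gap = (k, v), gap
    return best


def disambiguate(pool: List[Candidate],
                 answer: Callable[[str, str], bool]) -> Tuple[List[Candidate], int]:
    """Ask yes/no questions until one candidate remains or nothing splits the pool."""
    pool = list(pool)
    questions = 0
    while len(pool) > 1:
        q = pick_question(pool)
        if q is None:
            break
        k, v = q
        said_yes = answer(k, v)
        # One answer prunes every inconsistent candidate at once.
        pool = [c for c in pool if (c.get(k) == v) == said_yes]
        questions += 1
    return pool, questions
```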

preprint · 2026 · arXiv

When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-k router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute alternatives for the same token and score each by the next-token probability it assigns to the realized token in a verified reasoning trajectory. The result is sharply token-conditional: the standard router is well-aligned with route utility on confident tokens but uninformative on the fragile tokens that drive hard reasoning, where lower-loss equal-compute routes consistently exist inside the frozen model but are not selected. The same pattern holds across Qwen3-30B-A3B, GPT-OSS-20B, DeepSeek-V2-Lite, and OLMoE-1B-7B, and follows structurally from how standard top-k training evaluates routing decisions: the language modeling loss scores only the executed route, and load balancing depends only on aggregate routing statistics. A minimal router-only update to the final-layer router, leaving every expert and every other router frozen, is sufficient to shift pass@K on AIME 2024+2025 and HMMT 2025 for both Qwen3-30B-A3B and GPT-OSS-20B, suggesting that at least part of the failure reflects router-reachable misallocation rather than expert capacity alone.
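
For readers unfamiliar with top-k routing, a generic sketch follows. It shows one common variant (select the k largest router logits, then apply a softmax over the selected ones) and is not specific to any model named in the abstract. Implementations differ on whether the softmax comes before or after selection.

```python
import math
from typing import List, Tuple


def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert index, mixing weight) for the k highest-scoring experts."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts (shifted by the max for stability).
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


route = top_k_route([0.1, 2.0, 1.0, -1.0], k=2)
```

An equal-compute alternative in the sense of the abstract would be a different set of k experts for the same token. How such alternatives are sampled is not specified here.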

preprint · 2022 · arXiv

Combating Label Distribution Shift for Active Domain Adaptation

We consider the problem of active domain adaptation (ADA) to unlabeled target data, a subset of which is actively selected and labeled given a budget constraint. Inspired by recent analysis of a critical issue arising from label distribution mismatch between source and target in domain adaptation, we devise a method that addresses the issue for the first time in ADA. At its heart lies a novel sampling strategy, which seeks target data that best approximate the entire target distribution while also being representative, diverse, and uncertain. The sampled target data are then used not only for supervised learning but also for matching label distributions of source and target domains, leading to remarkable performance improvement. On four public benchmarks, our method substantially outperforms existing methods in every adaptation scenario.

preprint · 2022 · arXiv

Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution

Audio super resolution aims to predict the missing high resolution components of low resolution audio signals. While audio is by nature a continuous signal, current approaches treat it as discrete data (i.e., the input is defined on a discrete time domain), and consider super resolution over a fixed scale factor (i.e., a new neural network must be trained to change the output resolution). To obtain a continuous representation of audio and enable super resolution for an arbitrary scale factor, we propose a method of implicit neural representation, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA). Our method locally parameterizes a chunk of audio as a function of continuous time, and represents each chunk with the local latent codes of neighboring chunks so that the function can extrapolate the signal at any time coordinate, i.e., infinite resolution. To learn a continuous representation for audio, we design a self-supervised learning strategy to practice super resolution tasks up to the original resolution by stochastic selection. Our numerical evaluation shows that LISA not only outperforms the previous fixed-scale methods with a fraction of the parameters, but is also capable of arbitrary scale super resolution even beyond the resolution of the training data.

preprint · 2022 · arXiv

MetaSSD: Meta-Learned Self-Supervised Detection

Deep learning-based symbol detectors are gaining increasing attention because their algorithm design is simpler than that of traditional model-based algorithms such as Viterbi and BCJR. The supervised learning framework is often employed to predict the input symbols, where training symbols are used to train the model. There are two major limitations in the supervised approaches: a) a model needs to be retrained from scratch when new training symbols arrive, in order to adapt to a new channel status, and b) the length of the training symbols needs to be longer than a certain threshold for the model to generalize well to unseen symbols. To overcome these challenges, we propose a meta-learning-based self-supervised symbol detector named MetaSSD. Our contribution is two-fold: a) meta-learning helps the model adapt to a new channel environment based on experience with various meta-training environments, and b) self-supervised learning helps the model use relatively less supervision than previously suggested learning-based detectors. In experiments, MetaSSD outperforms OFDM-MMSE with noisy channel information and shows results comparable to BCJR. Further ablation studies show the necessity of each component in our framework.

preprint · 2022 · arXiv

Robust Deep Learning from Crowds with Belief Propagation

Crowdsourcing systems enable us to collect large-scale datasets, but inherently suffer from the noisy labels of low-paid workers. We address the inference and learning problems using such a noisy crowdsourced dataset. Due to the sparsity inherent in crowdsourcing, it is critical to exploit both a probabilistic model to capture worker priors and a neural network to extract task features, despite the practical risks of wrong priors and overfitted features. We hence establish a neural-powered Bayesian framework, from which we devise deepMF and deepBP using different variational approximation methods, mean field (MF) and belief propagation (BP), respectively. This provides a unified view of existing methods, which are special cases of deepMF with different priors. In addition, our empirical study suggests that deepBP is a new approach that is more robust against wrong priors, feature overfitting, and extreme workers, thanks to BP being more sophisticated than MF.

preprint · 2013 · arXiv

Optimal Rate Sampling in 802.11 Systems

In 802.11 systems, Rate Adaptation (RA) is a fundamental mechanism allowing transmitters to adapt the coding and modulation scheme as well as the MIMO transmission mode to the radio channel conditions, and in turn, to learn and track the (mode, rate) pair providing the highest throughput. So far, the design of RA mechanisms has been mainly driven by heuristics. In contrast, in this paper, we rigorously formulate such design as an online stochastic optimisation problem. We solve this problem and present ORS (Optimal Rate Sampling), a family of (mode, rate) pair adaptation algorithms that provably learn the best pair for transmission as fast as possible. We study the performance of ORS algorithms both in stationary radio environments, where the successful packet transmission probabilities at the various (mode, rate) pairs do not vary over time, and in non-stationary environments, where these probabilities evolve. We show that under ORS algorithms, the throughput loss due to the need to explore sub-optimal (mode, rate) pairs does not depend on the number of available pairs, which is a crucial advantage as evolving 802.11 standards offer an increasingly large number of (mode, rate) pairs. We illustrate the efficiency of ORS algorithms (compared to the state-of-the-art algorithms) using simulations and traces extracted from 802.11 test-beds.
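
The quantity a rate-adaptation scheme tries to maximise can be made concrete with a small sketch. This is not the ORS algorithm. It only computes the best (mode, rate) pair when the success probabilities are already known, under the common modelling assumption that expected throughput equals the rate times the packet success probability. The mode names, rates, and probabilities are made up for illustration.

```python
from typing import Dict, Tuple

Pair = Tuple[str, float]  # (mode, rate in Mbit/s); hypothetical values below


def expected_throughput(rate: float, success_prob: float) -> float:
    """Assumed model: throughput is the rate scaled by the success probability."""
    return rate * success_prob


def best_pair(success_probs: Dict[Pair, float]) -> Pair:
    """Return the pair with the highest expected throughput."""
    if not success_probs:
        raise ValueError("need at least one (mode, rate) pair")
    return max(success_probs,
               key=lambda pair: expected_throughput(pair[1], success_probs[pair]))


# Made-up success probabilities for three pairs.
probs = {("SISO", 54.0): 0.5, ("SISO", 36.0): 0.9, ("MIMO", 108.0): 0.2}
choice = best_pair(probs)
```

In practice the probabilities are unknown and must be learned from transmissions, which is the exploration problem the abstract describes.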

preprint · 2012 · arXiv

Embedding of Virtual Network Requests over Static Wireless Multihop Networks

Network virtualization is a technology for running multiple heterogeneous network architectures on a shared substrate network. One of the crucial components in network virtualization is virtual network embedding, which provides a way to allocate physical network resources (CPU and link bandwidth) to virtual network requests. Despite significant research efforts on virtual network embedding in wired and cellular networks, little attention has been paid to embedding in wireless multi-hop networks, which are becoming more important due to their rapid growth and the need to share these networks among different business sectors and users. In this paper, we first study the root causes of the new challenges of virtual network embedding in wireless multi-hop networks, and propose a new embedding algorithm that efficiently uses the resources of the physical substrate network. We examine our algorithm's performance through extensive simulations under various scenarios. Due to the lack of competitive algorithms, we compare the proposed algorithm to five other algorithms, mainly borrowed from wired embedding or artificially constructed by us, partially with or without the key algorithmic ideas, to assess the impact of those ideas.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.


Source provenance

Where this author record came from

arxiv · confidence 95%

external id: arxiv:2605.07260:author:4:jungseul-ok

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.07248:author:6:jungseul-ok

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.11722:author:3:jungseul-ok

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.15675:author:5:jungseul-ok

Imported May 20, 2026 · Synced May 21, 2026

arxiv · confidence 95%

external id: arxiv:2605.06223:author:5:jungseul-ok

Imported May 20, 2026 · Synced May 20, 2026

arxiv · confidence 95%

external id: arxiv:2605.10966:author:7:jungseul-ok

Imported May 20, 2026 · Synced May 20, 2026

4 works

Dongwoo Kim

Researcher


2 works

Donggyu Yun

Researcher


2 works

Hoyoung Kim

Researcher


2 works

Jaechang Kim

Researcher

