Source author record

Hyunsoo Cho

Hyunsoo Cho appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.CO, Artificial Intelligence, Computation and Language, Computer Vision, Machine Learning

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators


Preprint, 2026, arXiv

A More Word-like Image Tokenization for MLLMs

Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a visual projector that maps the pixels into a sequence of tokens in its embedding space, so that images can be presented in essentially the same form as text. However, the language model has been optimized to operate on discrete, semantically meaningful tokens, while prevailing visual projectors transform an image into a long stream of continuous and highly correlated embeddings. This causes the visual tokens to behave differently from the word-like units that LLMs were originally trained to understand. We propose a novel Disentangled Visual Tokenization (DiVT) that clusters patch embeddings into coherent semantic units, so each token corresponds to a distinct visual concept instead of a rigid grid cell. DiVT further adapts its token budget to image complexity, providing an explicit accuracy-compute trade-off while modifying neither the vision encoder nor the language model. Across diverse multimodal benchmarks, DiVT matches or surpasses baselines with significantly fewer visual tokens, demonstrating robustness under limited token budgets and significantly reducing memory cost and latency while making visual inputs more compatible with LLMs. Our code is available at https://github.com/snuviplab/DiVT.
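The idea of replacing grid-cell tokens with clustered patch embeddings can be illustrated with a toy sketch. This is not the DiVT algorithm, whose details are not given in the abstract. It substitutes a simple greedy cosine-similarity clustering, and the names `cluster_patches` and `threshold` are hypothetical. It only shows how merging correlated patch embeddings yields a token count that varies with the input.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_patches(patches, threshold=0.9):
    """Greedy stand-in for semantic clustering (an assumption, not DiVT itself).

    Each patch joins the most similar existing cluster if the similarity to
    that cluster's centroid is at least `threshold`; otherwise it starts a
    new cluster. Returns one mean embedding ("token") per cluster.
    """
    sums, counts = [], []
    for patch in patches:
        best, best_sim = None, threshold
        for k, total in enumerate(sums):
            centroid = [x / counts[k] for x in total]
            sim = cosine(patch, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            sums.append(list(patch))
            counts.append(1)
        else:
            sums[best] = [a + b for a, b in zip(sums[best], patch)]
            counts[best] += 1
    return [[x / c for x in total] for total, c in zip(sums, counts)]

# Four toy 2-D "patch embeddings" forming two visually coherent groups.
patches = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.02, 0.98]]
tokens = cluster_patches(patches, threshold=0.9)
print(len(patches), "patches ->", len(tokens), "tokens")
```

In this sketch a lower `threshold` merges more patches and produces fewer tokens, which is a crude analogue of trading accuracy for compute.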

Preprint, 2022, arXiv

Combinatorics on bounded free Motzkin paths and its applications

In this paper, we construct a bijection from a set of bounded free Motzkin paths to a set of bounded Motzkin prefixes that induces a bijection from a set of bounded free Dyck paths to a set of bounded Dyck prefixes. We also give bijections between a set of bounded cornerless Motzkin paths and a set of $t$-core partitions, and a set of bounded cornerless symmetric Motzkin paths and a set of self-conjugate $t$-core partitions. As an application, we get explicit formulas for the number of ordinary and self-conjugate $t$-core partitions with a fixed number of corners.

Preprint, 2022, arXiv

Results on bar-core partitions, core shifted Young diagrams, and doubled distinct cores

Simultaneous bar-cores, core shifted Young diagrams (or CSYDs), and doubled distinct cores have been studied since Morris and Yaseen introduced the concept of bar-cores. In this paper, our goal is to give a formula for the number of these core partitions on $(s,t)$-cores and $(s,s+d,s+2d)$-cores for the remaining cases that are not covered yet. In order to achieve this goal, we observe a characterization of $\bar{s}$-core partitions to obtain characterizations of doubled distinct $s$-core partitions and $s$-CSYDs. By using them, we construct $NE$ lattice path interpretations of these core partitions on $(s,t)$-cores. Also, we give free Motzkin path interpretations of these core partitions on $(s,s+d,s+2d)$-cores.

Preprint, 2022, arXiv

Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

Large-scale pre-trained language models (PLMs) are well-known for being capable of solving a task simply by conditioning on a prompt containing a few input-label pairs, dubbed demonstrations, without being explicitly tuned for the desired downstream task. Such a process (i.e., in-context learning), however, naturally leads to high reliance on the demonstrations, which are usually selected from external datasets. In this paper, we propose self-generated in-context learning (SG-ICL), which generates demonstrations for in-context learning from the PLM itself to minimize the reliance on external demonstrations. We conduct experiments on four different text classification tasks and show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples. Moreover, our generated demonstrations show more consistent performance with low variance compared to randomly selected demonstrations from the training dataset.
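A rough sketch of how self-generated demonstrations might be assembled into a prompt is given here. The abstract does not specify the prompt templates or how generation is conditioned, so everything in this block is an assumption: the function `build_sg_icl_prompt`, the template wording, and the choice to generate one demonstration per label conditioned on the test input are all hypothetical, and `stub_generate` merely stands in for a real language model call.

```python
def build_sg_icl_prompt(test_input, labels, generate):
    """Build an in-context prompt from model-generated demonstrations.

    `generate` is any callable mapping a prompt string to generated text
    (hypothetical interface; a real PLM call would go here).
    """
    demonstrations = []
    for label in labels:
        # Assumed template: ask the model for an input that fits the label.
        request = (
            f"Write a text similar to: {test_input}\n"
            f"It should have the label: {label}\n"
            "Text:"
        )
        demonstrations.append((generate(request).strip(), label))

    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demonstrations]
    # The test input comes last, with the label left for the model to fill in.
    blocks.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(blocks)

def stub_generate(prompt):
    """Canned replacement for a language model, for illustration only."""
    if "label: positive" in prompt:
        return " A delightful film from start to finish."
    return " A dull film with nothing to say."

prompt = build_sg_icl_prompt(
    "The plot was gripping.", ["positive", "negative"], stub_generate
)
print(prompt)
```

No external dataset is touched at any point: the only inputs are the test example, the label set, and the model itself.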

Preprint, 2020, arXiv

Self-conjugate $(s,s+d,\dots,s+pd)$-core partitions and free rational Motzkin paths

A partition is called an $(s_1,s_2,\dots,s_p)$-core partition if it is simultaneously an $s_i$-core for all $i=1,2,\dots,p$. Simultaneous core partitions have been actively studied in various directions. In particular, researchers have been concerned with the properties of such partitions when the sequence of $s_i$ is an arithmetic progression. In this paper, for $p\geq 2$ and relatively prime positive integers $s$ and $d$, we propose the $(s+d,d;a)$-abacus of a self-conjugate partition and establish a bijection between the set of self-conjugate $(s,s+d,\dots,s+pd)$-core partitions and the set of free rational Motzkin paths with appropriate conditions. For $p=2,3$, we give formulae for the number of self-conjugate $(s,s+d,\dots,s+pd)$-core partitions and the number of self-conjugate $(s,s+1,\dots,s+p)$-core partitions with $m$ corners.
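The definition above can be checked by brute force on small cases. The sketch below uses the standard fact that a partition is an $s$-core exactly when none of its hook lengths is divisible by $s$. It does not implement the abacus or the Motzkin path bijection from the paper. All function names are illustrative, and the search bound relies on the known result that the largest $(s,t)$-core for coprime $s,t$ has size $(s^2-1)(t^2-1)/24$.

```python
from math import comb

def partitions(n, max_part=None):
    """Yield all partitions of n as weakly decreasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def conjugate(partition):
    """Return the conjugate (transposed) partition as a tuple."""
    if not partition:
        return ()
    return tuple(sum(1 for part in partition if part > j)
                 for j in range(partition[0]))

def hook_lengths(partition):
    """Hook length of every cell: arm + leg + 1."""
    conj = conjugate(partition)
    return [partition[i] - j + conj[j] - i - 1
            for i in range(len(partition))
            for j in range(partition[i])]

def is_simultaneous_core(partition, moduli):
    """True if the partition is an s-core for every s in moduli."""
    return all(h % s != 0 for h in hook_lengths(partition) for s in moduli)

def st_cores(s, t):
    """List all (s,t)-cores for coprime s, t by exhaustive search."""
    bound = (s * s - 1) * (t * t - 1) // 24  # size of the largest (s,t)-core
    return [lam for n in range(bound + 1) for lam in partitions(n)
            if is_simultaneous_core(lam, (s, t))]

cores = st_cores(3, 4)
self_conjugate = [lam for lam in cores if lam == conjugate(lam)]
# Anderson's theorem: the number of (s,t)-cores is C(s+t, s) / (s+t).
print(len(cores), comb(7, 3) // 7)  # prints 5 5
print(self_conjugate)
```

Exhaustive search is only practical for small parameters, which is one reason closed formulas and lattice path bijections such as those in the abstract are of interest.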

Preprint, 2020, arXiv

The $(s,s+d,\dots,s+pd)$-core partitions and the rational Motzkin paths

In this paper, we propose an $(s+d,d)$-abacus for $(s,s+d,\dots,s+pd)$-core partitions and establish a bijection between the $(s,s+d,\dots,s+pd)$-core partitions and the rational Motzkin paths of type $(s+d,-d)$. This result not only gives a lattice path interpretation of the $(s,s+d,\dots,s+pd)$-core partitions but also counts them with a closed formula. We also enumerate $(s,s+1,\dots,s+p)$-core partitions with $k$ corners and self-conjugate $(s,s+1,\dots,s+p)$-core partitions.
