Source author record

Sundong Kim

Sundong Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Machine Learning · Computer Science and Game Theory · Computer Vision · cs.CY · Information Retrieval · Multiagent Systems · Neural and Evolutionary Computing · physics.soc-ph

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

From Noise to Diversity: Random Embedding Injection in LLM Reasoning

Recent soft prompt research has tried to improve reasoning by inserting trained vectors into LLM inputs, yet whether the gain comes from the learned content or from the act of injection itself has not been carefully separated. We study Random Soft Prompts (RSPs), which drop the training step entirely and append a freshly drawn sequence of random embedding vectors to the input. Each RSP vector is sampled from an isotropic Gaussian fitted to the entrywise mean and variance of the pretrained embedding table; the sequence carries no learned content, and yet reaches accuracy comparable to optimized soft prompts on math reasoning benchmarks in several settings. The mechanism unfolds in two stages: because attention has to absorb a never-seen-before random position, the distribution over the first few generated tokens flattens and reasoning trajectories branch, and as generation continues this influence dilutes naturally so the response commits to a single completion. We show that during inference RSPs lift early-stage token diversity and, combined with temperature sampling, widen Pass@N, the probability that at least one out of N attempts is correct. Beyond inference, we carry the same effect into DAPO training and demonstrate practical gains. Our contributions are: (i) RSP isolates the simplest form of soft prompt -- training-free, freshly resampled -- providing a unified lens for the structural effect of injection that variants otherwise differing in training and form all share; (ii) a theoretical and empirical validation of the underlying mechanism; and (iii) an extension from inference to training.
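A minimal numpy sketch of the RSP sampling step described above, assuming the pretrained embedding table is available as a [vocab, dim] array. The function names are hypothetical and the toy table only stands in for a real one. The Pass@k helper uses the common unbiased estimator from the Codex paper (Chen et al., 2021); that is an assumption about measurement, not necessarily how the authors compute Pass@N.

```python
import math
import numpy as np


def sample_random_soft_prompt(embedding_table, length, rng):
    """Draw `length` random vectors from an isotropic Gaussian fitted to the
    entrywise mean and variance of `embedding_table` (shape [vocab, dim])."""
    mu = float(embedding_table.mean())     # one scalar mean over every entry
    sigma = float(embedding_table.std())   # one scalar std over every entry
    dim = embedding_table.shape[1]
    # Isotropic: every coordinate is drawn independently with the same mean/std.
    return rng.normal(loc=mu, scale=sigma, size=(length, dim))


def append_rsp(input_embeds, rsp):
    """Append the random soft prompt after the input embeddings ([seq, dim])."""
    return np.concatenate([input_embeds, rsp], axis=0)


def pass_at_n(n, c, k):
    """Unbiased Pass@k estimate from n samples with c correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(0.01, 0.02, size=(1000, 64))  # toy stand-in for a pretrained table
    prompt_ids = np.array([5, 17, 42])
    x = table[prompt_ids]                            # [3, 64] input embeddings
    rsp = sample_random_soft_prompt(table, length=8, rng=rng)  # fresh draw per query
    print(append_rsp(x, rsp).shape)                  # (11, 64)
    print(pass_at_n(n=16, c=2, k=4))
```

Because the prompt carries no learned content, a new draw would be made for every query rather than cached.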

preprint · 2022 · arXiv

Active Learning for Human-in-the-Loop Customs Inspection

We study the human-in-the-loop customs inspection scenario, where an AI-assisted algorithm supports customs officers by recommending a set of imported goods to be inspected. If the inspected items are fraudulent, the officers can levy extra duties. The resulting logs are then used as additional training data for successive iterations. Choosing to inspect suspicious items first leads to an immediate gain in customs revenue, yet such inspections may not bring new insights for learning dynamic traffic patterns. On the other hand, inspecting uncertain items can help acquire new knowledge, which serves as a supplementary training resource for updating the selection systems. Based on multiyear customs datasets obtained from three countries, we demonstrate that some degree of exploration is necessary to cope with domain shifts in trade data. The results show that a hybrid strategy of selecting likely fraudulent and uncertain items eventually outperforms the exploitation-only strategy.
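A hedged sketch of a hybrid selection policy of the kind the abstract compares with exploitation only. The `explore_ratio` split and the use of closeness to 0.5 as the uncertainty measure are assumptions for illustration, not the paper's exact strategies; the function name is hypothetical.

```python
import numpy as np


def hybrid_selection(fraud_prob, budget, explore_ratio):
    """Pick `budget` item indices: the likeliest frauds (exploitation) plus
    a share of the most uncertain remaining items (exploration)."""
    fraud_prob = np.asarray(fraud_prob, dtype=float)
    n_explore = int(round(budget * explore_ratio))
    n_exploit = budget - n_explore

    # Exploitation: highest predicted fraud probability first.
    order = np.argsort(-fraud_prob, kind="stable")
    exploit = [int(i) for i in order[:n_exploit]]

    # Exploration: among the remaining items, those closest to 0.5.
    rest = order[n_exploit:]
    uncertainty = np.abs(fraud_prob[rest] - 0.5)
    explore = [int(i) for i in rest[np.argsort(uncertainty, kind="stable")[:n_explore]]]
    return exploit + explore


if __name__ == "__main__":
    probs = [0.95, 0.10, 0.52, 0.80, 0.47, 0.05]
    print(hybrid_selection(probs, budget=4, explore_ratio=0.5))  # [0, 3, 2, 4]
```

Setting `explore_ratio=0.0` recovers the exploitation-only baseline.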

preprint · 2022 · arXiv

FedX: Unsupervised Federated Learning with Cross Knowledge Distillation

This paper presents FedX, an unsupervised federated learning framework. Our model learns unbiased representations from decentralized and heterogeneous local data. It employs a two-sided knowledge distillation with contrastive learning as a core component, allowing the federated system to function without requiring clients to share any data features. Furthermore, its adaptable architecture can be used as an add-on module for existing unsupervised algorithms in federated settings. Experiments show that our model improves performance significantly (by 1.58 to 5.52 percentage points) on five unsupervised algorithms.
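The abstract names contrastive learning as FedX's core component but does not give its objective. As a point of reference only, here is a generic InfoNCE-style contrastive loss in numpy. It is not FedX's two-sided distillation loss, and the function name and temperature default are hypothetical.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss: row i of z1 and row i of z2 are two views of the same
    sample (positive pair); all other rows of z2 act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalise rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal


if __name__ == "__main__":
    z = np.eye(4)
    print(info_nce(z, z), info_nce(z, z[[1, 2, 3, 0]]))  # aligned pairs score lower
```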

preprint · 2022 · arXiv

Knowledge Sharing via Domain Adaptation in Customs Fraud Detection

Knowledge of changing trade traffic is critical in risk management. Customs offices worldwide have traditionally relied on local resources to accumulate knowledge and detect tax fraud. This naturally leaves countries with weak infrastructure at risk of becoming tax havens for potentially illicit trade. The current paper proposes DAS, a memory bank platform that facilitates knowledge sharing across multinational customs administrations so that they can support each other. We propose a domain adaptation method that shares transferable knowledge of frauds as prototypes while safeguarding local trade information. Data encompassing over 8 million import declarations were used to test the feasibility of this new system, which shows that participating countries may improve their fraud detection by up to 2 to 11 times with the help of shared knowledge. We discuss implications for substantial tax revenue potential and strengthened policy against illicit trade.
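A minimal sketch of the prototype-sharing idea: each country reduces its known frauds to a mean embedding, only these vectors are pooled, and new declarations are scored against the pooled prototype. It omits the domain adaptation step entirely; the function names, the per-country weights (for example, fraud counts), and cosine scoring are assumptions, not DAS itself.

```python
import numpy as np


def fraud_prototype(embeddings, is_fraud):
    """Local step: summarise known frauds as one mean vector, so raw
    declarations never leave the country."""
    return embeddings[np.asarray(is_fraud, dtype=bool)].mean(axis=0)


def aggregate_prototypes(prototypes, weights):
    """Shared memory bank: weighted average of the prototypes sent by countries."""
    w = np.asarray(weights, dtype=float)
    return (np.stack(prototypes) * w[:, None]).sum(axis=0) / w.sum()


def fraud_score(embeddings, prototype):
    """Cosine similarity of each declaration to the shared fraud prototype."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return e @ p
```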

preprint · 2021 · arXiv

Customs Fraud Detection in the Presence of Concept Drift

Capturing changing trade patterns is critical in customs fraud detection. As new goods are imported and novel frauds arise, a drift-aware fraud detection system is needed to detect both known and unknown frauds within a limited budget. The current paper proposes ADAPT, an adaptive selection method that controls the balance between exploitation and exploration strategies used for customs fraud detection. ADAPT uses model performance trends and the amount of concept drift to determine the best exploration ratio at each time step. Experiments on data from four countries over several years show that each country requires a different amount of exploration to maintain its fraud detection system. We find that a system using ADAPT gradually adapts to the dataset and finds an appropriate exploration ratio while maintaining high performance.
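ADAPT's exact update rule is not given in the abstract. The function below is a hypothetical stand-in that raises the exploration ratio when drift is high or performance is falling and lowers it otherwise, clipped to a fixed range; the drift threshold, step size, and bounds are illustrative assumptions.

```python
def update_exploration_ratio(ratio, perf_trend, drift, step=0.05, lo=0.0, hi=0.5):
    """Hypothetical rule: explore more under strong concept drift or falling
    performance, exploit more when the model is stable and improving.
    `perf_trend` is the change in a performance metric since the last period;
    `drift` is a drift score assumed to lie in [0, 1]."""
    if drift > 0.5 or perf_trend < 0:
        ratio += step           # shifting data or declining performance: explore more
    else:
        ratio -= step           # stable data and improving performance: exploit more
    return min(hi, max(lo, ratio))
```

Called once per inspection period, the returned ratio would decide how much of the inspection budget goes to exploratory picks in the next period.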

preprint · 2016 · arXiv

Automatic Knowledge Base Evolution by Learning Instances

A knowledge base is a way to store structured and unstructured data from across the web. Since the web is growing rapidly, there is a strong need to structure this knowledge in a fully automated way. However, fully automated knowledge-base evolution on the Semantic Web remains a major challenge, although many ontology evolution techniques are available. Learning ontologies automatically could therefore contribute significantly to the Semantic Web community. In this paper, we propose a fully automated ontology learning algorithm that generates a refined knowledge base from an incomplete knowledge base and RDF triples. Our algorithm is a data-driven approach based on the properties of each instance. Each ontology class is refined by generalizing the frequent properties of its instances. Using the refined class information, each instance can then find its best-matching class. By repeating these two steps, we achieve fully automated ontology evolution from an incomplete basic knowledge base.
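The two alternating steps (generalize frequent instance properties into a class description, then reassign each instance to its best-matching class) can be sketched as below, assuming each instance is given as a set of property names already extracted from RDF triples. The support threshold, the Jaccard similarity, and all function names are hypothetical choices, not the paper's algorithm.

```python
from collections import Counter


def class_profiles(assignment, properties, min_support=0.5):
    """Generalisation step: a class keeps the properties held by at least
    `min_support` of its currently assigned instances."""
    members = {}
    for inst, cls in assignment.items():
        members.setdefault(cls, []).append(inst)
    profiles = {}
    for cls, insts in members.items():
        counts = Counter(p for i in insts for p in properties[i])
        profiles[cls] = {p for p, n in counts.items() if n / len(insts) >= min_support}
    return profiles


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def evolve(assignment, properties, max_iter=10, min_support=0.5):
    """Alternate generalisation and reassignment until assignments stop changing.
    `assignment` may cover only some instances (an incomplete knowledge base)."""
    for _ in range(max_iter):
        profiles = class_profiles(assignment, properties, min_support)
        new = {i: max(sorted(profiles), key=lambda c: jaccard(properties[i], profiles[c]))
               for i in properties}
        if new == assignment:
            break
        assignment = new
    return assignment, profiles
```

In this sketch an untyped instance such as a city with only `population` and `mayor` properties would be pulled into whichever class profile it overlaps most.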

preprint · 2016 · arXiv

Behavior of Self-Motivated Agents in Complex Networks

Traditional evolutionary game theory describes how a strategy spreads through a system in which each player imitates the most successful strategy in its neighborhood. Players therefore have no authority of their own to change their state. In human society, however, people do not simply follow the strategies of others; they choose their own. To observe each agent's decisions over time and to compare different network structures, we conducted multi-agent based modeling and simulation. In this paper, an agent decides its own strategy by comparing payoffs, and we call such an agent a "self-motivated agent". To explain the behavior of self-motivated agents, we consider a prisoner's dilemma game with cooperators, defectors, loners, and punishers as an illustrative example. We ran simulations varying the participation rate, mutation rate, and network degree, and identified special conditions under which strategies coexist.
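A toy version of the self-motivated update rule described above: each agent compares the payoff every strategy would earn against its neighbours' current strategies and adopts the best one, occasionally mutating at random. All payoff values, the fine and punishment cost, the synchronous update, and the tie-breaking order are hypothetical; participation rate and network generation are not modelled.

```python
import random

STRATEGIES = ("C", "D", "L", "P")  # cooperator, defector, loner, punisher


def payoff(me, other):
    """Hypothetical pairwise payoffs: T=1.5 > R=1 > loner 0.3 > P=0.1 > S=0,
    with a fine of 0.6 on defectors and a cost of 0.2 to punishers."""
    if me == "L" or other == "L":
        return 0.3                               # loners opt out: fixed payoff for both
    base = {("C", "C"): 1.0, ("C", "D"): 0.0, ("D", "C"): 1.5, ("D", "D"): 0.1}
    m = "C" if me == "P" else me                 # punishers cooperate...
    o = "C" if other == "P" else other
    value = base[(m, o)]
    if me == "P" and other == "D":
        value -= 0.2                             # ...and pay to punish defectors
    if me == "D" and other == "P":
        value -= 0.6                             # defectors are fined by punishers
    return value


def self_motivated_step(strategies, neighbors, mutation_rate, rng):
    """Synchronous update: each agent adopts the strategy that would have earned
    the most against its neighbours' current strategies, except that with
    probability `mutation_rate` it picks a strategy uniformly at random."""
    new = {}
    for agent, nbrs in neighbors.items():
        if rng.random() < mutation_rate:
            new[agent] = rng.choice(STRATEGIES)
        else:
            new[agent] = max(STRATEGIES,
                             key=lambda s: sum(payoff(s, strategies[n]) for n in nbrs))
    return new


if __name__ == "__main__":
    nbrs = {0: [1, 2], 1: [0], 2: [0]}
    start = {0: "C", 1: "D", 2: "D"}
    print(self_motivated_step(start, nbrs, 0.0, random.Random(0)))  # {0: 'L', 1: 'D', 2: 'D'}
```

Unlike imitation dynamics, no agent copies a neighbour here: the cooperator surrounded by defectors withdraws as a loner because that is its own best payoff.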