Source author record

Jinho D. Choi

Jinho D. Choi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Artificial Intelligence, Machine Learning, Human-Computer Interaction

Catalog footprint

What is connected

19 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

Behavioral logs provide rich signals for user modeling, but are noisy and interleaved across diverse intents. Recent work uses LLMs to generate interpretable natural-language personas from user logs, yet evaluation often emphasizes downstream utility, providing limited assurance of persona quality itself. We propose a hierarchical framework that aggregates user actions into intent memories and induces multiple evidence-grounded personas by clustering and labeling these memories. We formulate persona induction as an optimization problem over persona quality (captured by cluster cohesion, persona-evidence alignment, and persona truthfulness) and train the persona model using a groupwise extension of Direct Preference Optimization (DPO). Experiments on a large-scale service log and two public datasets show that our method induces more coherent, evidence-grounded, and trustworthy personas, while also improving future interaction prediction.

preprint · 2026 · arXiv

PersonaKit (PK): A Plug-and-Play Platform for User Testing Diverse Roles in Full-Duplex Dialogue

As spoken dialogue systems expand beyond traditional assistant roles to encompass diverse personas (such as authoritative instructors, uncooperative merchants, or distracted workers), they require distinct, human-like turn-taking behaviors to maintain psychological immersion. However, current full-duplex systems often default to a rigid, overly accommodating "always-yield" policy during overlapping speech, which severely undermines character consistency for non-submissive roles. Evaluating alternative, persona-specific turn-taking strategies through empirical user studies is challenging because building real-time full-duplex test environments requires substantial engineering overhead. To address this, we present PersonaKit (PK), an open-source, low-latency web platform for the rapid prototyping and evaluation of conversational agents. Using intuitive JSON configurations, researchers can define personas, specify probabilistic interruption-handling behaviors (e.g., yield, hold, bridge, or override), and automatically deploy comparative A/B surveys. Through an in-the-wild evaluation with 8 distinct personas, we demonstrate that PersonaKit provides an extensible, end-to-end framework for studying complex sociolinguistic behaviors in next-generation spoken agents.

preprint · 2022 · arXiv

Modeling Task Interactions in Document-Level Joint Entity and Relation Extraction

We target document-level relation extraction in an end-to-end setting, where the model needs to jointly perform mention extraction, coreference resolution (COREF), and relation extraction (RE) at once, and is evaluated in an entity-centric way. In particular, we address the two-way interaction between COREF and RE, which has not been the focus of previous work, and propose an explicit interaction, namely Graph Compatibility (GC), that is specifically designed to leverage task characteristics, bridging decisions of the two tasks for direct task interference. Our experiments are conducted on DocRED and DWIE; in addition to GC, we implement and compare different multi-task settings commonly adopted in previous work, including pipeline, shared encoders, and graph propagation, to examine the effectiveness of different interactions. The results show that GC achieves the best performance, with up to 2.3/5.1 F1 improvement over the baseline.

preprint · 2022 · arXiv

Online Coreference Resolution for Dialogue Processing: Improving Mention-Linking on Real-Time Conversations

This paper suggests a direction of coreference resolution for online decoding on actively generated input such as dialogue, where the model accepts an utterance and its past context, then finds mentions in the current utterance as well as their referents, upon each dialogue turn. A baseline and four incrementally updated models adapted from the mention-linking paradigm are proposed for this new setting, which address different aspects including singletons, speaker-grounded encoding, and cross-turn mention contextualization. Our approach is assessed on three datasets: Friends, OntoNotes, and BOLT. Results show that each aspect brings steady improvement, and our best models outperform the baseline by over 10%, presenting an effective system for this setting. Further analysis highlights the task characteristics, such as the significance of addressing mention recall.

preprint · 2022 · arXiv

Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-sentence Dependency Graph

We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting by incorporating syntactic features from Universal Dependencies (UD); the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, trained only on English, is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path.

preprint · 2020 · arXiv

Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean

In this paper, we first raise important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address these issues by revising the entire corpus manually with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility with the rest of the UD corpora, we follow the UDv2 guidelines, and extensively revise the part-of-speech tags and the dependency relations to reflect morphological features and flexible word-order aspects in Korean. The original and the revised versions of PKT-UD are used in experiments with transformer-based parsing models using biaffine attention. The parsing model trained on the revised corpus shows a significant improvement of 3.0% in labeled attachment score over the model trained on the previous corpus. Our error analysis demonstrates that this revision allows the parsing model to learn relations more robustly, reducing several critical errors that used to be made by the previous model.

preprint · 2020 · arXiv

Emora STDM: A Versatile Framework for Innovative Dialogue System Development

This demo paper presents Emora STDM (State Transition Dialogue Manager), a dialogue system development framework that provides novel workflows for rapid prototyping of chat-based dialogue managers as well as collaborative development of complex interactions. Our framework caters to a wide range of expertise levels by supporting interoperability between two popular approaches to dialogue management, state machine and information state. Our Natural Language Expression package allows seamless integration of pattern matching, custom NLP modules, and database querying, which makes the workflows much more efficient. As a user study, we adopt this framework in an interdisciplinary undergraduate course where students with both technical and non-technical backgrounds are able to develop creative dialogue managers in a short period of time.

preprint · 2020 · arXiv

Emora: An Inquisitive Social Chatbot Who Cares For You

Inspired by studies on the overwhelming presence of experience-sharing in human-human conversations, Emora, the social chatbot developed by Emory University, aims to bring such experience-focused interaction to the current field of conversational AI. The traditional approach of information-sharing topic handlers is balanced with a focus on opinion-oriented exchanges that Emora delivers, and new conversational abilities are developed that support dialogues that consist of a collaborative understanding and learning process of the partner's life experiences. We present a curated dialogue system that leverages highly expressive natural language templates, powerful intent classification, and ontology resources to provide an engaging and interesting conversational experience to every user.

preprint · 2020 · arXiv

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and simplify the current state-of-the-art approach to enhance its model efficiency. We then evaluate our simplified approaches on those three tasks using token embeddings generated by BERT. 12 datasets in both English and Chinese are used for our experiments. The BERT models outperform the previously best-performing models by 2.5% on average (7.5% for the most significant case). Moreover, an in-depth analysis on the impact of BERT embeddings is provided using self-attention, which helps understanding in this rich yet representation. All models and source codes are available in public so that researchers can improve upon and utilize them to establish strong baselines for the next decade.

preprint · 2020 · arXiv

Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings

We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural-based language models, Flair, ELMo, and BERT, are used to generate contextualized word embeddings such that different embeddings are generated for the same word type, which are grouped by their senses manually annotated in the SemCor dataset. We hypothesize that not all dimensions are equally important for downstream tasks, so that our algorithm can detect unessential dimensions and discard them without hurting the performance. To verify this hypothesis, we first mask dimensions determined unessential by our algorithm, apply the masked word embeddings to a word sense disambiguation (WSD) task, and compare its performance against the one achieved by the original embeddings. Several KNN approaches are tested to establish strong baselines for WSD. Our results show that the masked word embeddings do not hurt the performance and can improve it by 3%. Our work can be used to conduct future research on the interpretability of contextualized embeddings.

preprint · 2020 · arXiv

Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning

This paper presents a reinforcement learning approach to extract noise in long clinical documents for the task of readmission prediction after kidney transplant. We face the challenges of developing robust models on a small dataset where each document may consist of over 10K tokens and is full of noise, including tabular text and task-irrelevant sentences. We first experiment with four types of encoders to empirically decide the best document representation, and then apply reinforcement learning to remove noisy text from the long documents, which models the noise extraction process as a sequential decision problem. Our results show that the old bag-of-words encoder outperforms deep learning-based encoders on this task, and reinforcement learning is able to improve upon the baseline while pruning out 25% of text segments. Our analysis shows that reinforcement learning is able to identify both typical noisy tokens and task-specific noisy text.

preprint · 2020 · arXiv

Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols

As conversational AI-based dialogue management has increasingly become a trending topic, the need for a standardized and reliable evaluation procedure grows even more pressing. The current state of affairs suggests various evaluation protocols to assess chat-oriented dialogue management systems, rendering it difficult to conduct fair comparative studies across different approaches and gain an insightful understanding of their values. To foster this research, a more robust evaluation protocol must be set in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods on dialogue systems, identifying their shortcomings while accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation on the system-user dialogue data collected from the Alexa Prize 2020.

preprint · 2020 · arXiv

Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media

We present a transformer-based sarcasm detection model that accounts for the context from the entire conversation thread for more robust predictions. Our model uses deep transformer layers to perform multi-head attention among the target utterance and the relevant context in the thread. The context-aware models are evaluated on two datasets from social media, Twitter and Reddit, and show 3.1% and 7.0% improvements over their baselines. Our best models achieve F1-scores of 79.0% and 75.0% for the Twitter and Reddit datasets respectively, becoming one of the highest performing systems among 36 participants in this shared task.

preprint · 2020 · arXiv

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering

We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks (token-level language modeling, utterance-level language modeling, and utterance order prediction) are used to pre-train the transformers, which learn both token and utterance embeddings for better understanding in dialogue contexts. Then, multi-task learning between the utterance prediction and the token span prediction is applied to fine-tune for span-based question answering (QA). Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models, BERT and RoBERTa, respectively.

preprint · 2020 · arXiv

XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders

This paper presents six document classification models using the latest transformer encoders and a high-performing ensemble model for the task of offensive language identification in social media. For the individual models, deep transformer layers are applied to perform multi-head attention. For the ensemble model, the utterance representations taken from those individual models are concatenated and fed into a linear decoder to make the final decisions. Our ensemble model outperforms the individual models, with up to 8.6% improvement on the development set. On the test set, it achieves a macro-F1 of 90.9% and becomes one of the high-performing systems among 85 participants in sub-task A of this shared task. Our analysis shows that although the ensemble model significantly improves the accuracy on the development set, the improvement is not as evident on the test set.

preprint · 2016 · arXiv

Multi-Field Structural Decomposition for Question Answering

This paper presents a precursory yet novel approach to the question answering task using structural decomposition. Our system first generates linguistic structures such as syntactic and semantic trees from text, decomposes them into multiple fields, then indexes the terms in each field. For each question, it decomposes the question into multiple fields, measures the relevance score of each field to the indexed ones, then ranks all documents by their relevance scores and weights associated with the fields, where the weights are learned through statistical modeling. Our final model gives an absolute improvement of over 40% over the baseline approach, which uses simple search for detecting documents containing answers.

preprint · 2016 · arXiv

SelQA: A New Benchmark for Selection-based Question Answering

This paper presents a new selection-based question answering dataset, SelQA. The dataset consists of questions generated through crowdsourcing and sentence-length answers that are drawn from the ten most prevalent topics in the English Wikipedia. We introduce a corpus annotation scheme that enhances the generation of large, diverse, and challenging datasets by explicitly aiming to reduce word co-occurrences between the question and answers. Our annotation scheme is composed of a series of crowdsourcing tasks with a view to more effectively utilizing crowdsourcing in the creation of question answering datasets in various domains. Several systems are compared on the tasks of answer sentence selection and answer triggering, providing strong baseline results for future work to improve upon.

preprint · 2014 · arXiv

Targetable Named Entity Recognition in Social Media

We present a novel approach for recognizing what we call targetable named entities; that is, named entities in a targeted set (e.g., movies, books, TV shows). Unlike many other NER systems that need to retrain their statistical models as new entities arrive, our approach does not require such retraining, which makes it more adaptable for types of entities that are frequently updated. For this preliminary study, we focus on one entity type, movie title, using data collected from Twitter. Our system is tested on two evaluation sets, one including only entities corresponding to movies in our training set, and the other excluding any of those entities. Our final model shows F1-scores of 76.19% and 78.70% on these evaluation sets, which gives strong evidence that our approach is completely unbiased to any particular set of entities found during training.

preprint · 2013 · arXiv

Preparing Korean Data for the Shared Task on Parsing Morphologically Rich Languages

This document gives a brief description of Korean data prepared for the SPMRL 2013 shared task. A total of 27,363 sentences with 350,090 tokens are used for the shared task. All constituent trees are collected from the KAIST Treebank and transformed to the Penn Treebank style. All dependency trees are converted from the transformed constituent trees using heuristics and labeling rules designed specifically for the KAIST Treebank. In addition to the gold-standard morphological analysis provided by the KAIST Treebank, two sets of automatic morphological analysis are provided for the shared task: one generated by the HanNanum morphological analyzer, and the other generated by the Sejong morphological analyzer.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Computation and Language, Artificial Intelligence, Machine Learning, Human-Computer Interaction

Source provenance

Where this author record came from

arXiv · confidence 95%

external id: arxiv:2604.26120:author:5:jinho-d-choi

Imported May 20, 2026 · Synced May 20, 2026

arXiv · confidence 95%

external id: arxiv:2605.06007:author:2:jinho-d-choi

Imported May 20, 2026 · Synced May 20, 2026

4 works

Liyan Xu

Researcher

Liyan Xu contributes to research discovery and scholarly infrastructure.

3 works

Xiangjue Dong

Researcher

Xiangjue Dong contributes to research discovery and scholarly infrastructure.

2 works

Changmao Li

Researcher

Changmao Li contributes to research discovery and scholarly infrastructure.

2 works

Han He

Researcher

Han He contributes to research discovery and scholarly infrastructure.
