Source author record

Jong C. Park

Jong C. Park appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Machine Learning Information Retrieval

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Natural Language Processing for Tigrinya: Current State and Future Directions

Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 50 studies from 2011 to 2025. We systematically review the current state of computational resources, models, and applications across fifteen downstream tasks, including morphological processing, part-of-speech tagging, named entity recognition, machine translation, question-answering, speech recognition, and synthesis. Our analysis reveals a clear trajectory from foundational, rule-based systems to modern neural architectures, with progress consistently driven by milestones in resource creation. We identify key challenges rooted in Tigrinya's morphological properties and resource scarcity, and highlight promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves both as a reference for researchers and as a roadmap for advancing Tigrinya NLP. An anthology of surveyed studies and resources is publicly available.

preprint2022arXiv

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

Dense retrieval models, which aim at retrieving the most relevant document for an input query on a dense representation space, have gained considerable attention for their remarkable success. Yet, dense models require a vast amount of labeled training data for notable performance, whereas it is often challenging to acquire query-document pairs annotated by humans. To tackle this problem, we propose a simple but effective Document Augmentation for dense Retrieval (DAR) framework, which augments the representations of documents with their interpolation and perturbation. We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.

preprint2022arXiv

ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls

Online trolls increase social costs and cause psychological damage to individuals. With the proliferation of automated accounts making use of bots for trolling, it is difficult for targeted individual users to handle the situation both quantitatively and qualitatively. To address this issue, we focus on automating the method to counter trolls, as counter responses to combat trolls encourage community users to maintain ongoing discussion without compromising freedom of expression. For this purpose, we propose a novel dataset for automatic counter response generation. In particular, we constructed a pair-wise dataset that includes troll comments and counter responses with labeled response strategies, which enables models fine-tuned on our dataset to generate responses by varying counter responses according to the specified strategy. We conducted three tasks to assess the effectiveness of our dataset and evaluated the results through both automatic and human evaluation. In human evaluation, we demonstrate that the model fine-tuned on our dataset shows a significantly improved performance in strategy-controlled sentence generation.

preprint2022arXiv

Non-Autoregressive Sign Language Production via Knowledge Distillation

Sign Language Production (SLP) aims to translate expressions in spoken language into corresponding ones in sign language, such as skeleton-based sign poses or videos. Existing SLP models are either AutoRegressive (AR) or Non-Autoregressive (NAR). However, AR-SLP models suffer from regression to the mean and error propagation during decoding. NSLP-G, a NAR-based model, resolves these issues to some extent but engenders other problems. For example, it does not consider target sign lengths and suffers from false decoding initiation. We propose a novel NAR-SLP model via Knowledge Distillation (KD) to address these problems. First, we devise a length regulator to predict the end of the generated sign pose sequence. We then adopt KD, which distills spatial-linguistic features from a pre-trained pose encoder to alleviate false decoding initiation. Extensive experiments show that the proposed approach significantly outperforms existing SLP models in both Frechet Gesture Distance and Back-Translation evaluation.

Jong C. Park

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Natural Language Processing for Tigrinya: Current State and Future Directions

Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation

ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls

Non-Autoregressive Sign Language Production via Knowledge Distillation