Researcher profile

Huy Nghiem

Huy Nghiem contributes to research discovery and scholarly infrastructure.

Published work

2 published items

Preprint, 2026, arXiv

VietMix: A Naturally-Occurring Parallel Corpus and Augmentation Framework for Vietnamese-English Code-Mixed Machine Translation

Machine translation (MT) systems universally degrade when faced with code-mixed text. This problem is more acute for low-resource languages that lack dedicated parallel corpora. This work directly addresses this gap for Vietnamese-English, a language context characterized by challenges including orthographic ambiguity and the frequent omission of diacritics in informal text. We introduce VietMix, the first expert-translated, naturally occurring parallel corpus of Vietnamese-English code-mixed text. We establish VietMix's utility by developing a data augmentation pipeline that leverages iterative fine-tuning and targeted filtering. Experiments show that models augmented with our data outperform strong back-translation baselines by up to +3.5 xCOMET points and improve zero-shot models by up to +11.9 points. Our work delivers a foundational resource for a challenging language pair and provides a validated, transferable framework for building and augmenting corpora in other low-resource settings.
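The diacritic omission mentioned in the abstract can be illustrated with a minimal sketch. It is not part of VietMix or its pipeline; the function name `strip_diacritics` and the sample words are assumptions chosen for illustration. It shows how several distinct Vietnamese words collapse to the same string once tone and vowel marks are dropped, which is one source of the orthographic ambiguity described above.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Simulate informal Vietnamese typing by removing diacritics (hypothetical helper)."""
    # "đ"/"Đ" have no canonical decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (tones, circumflex, breve, horn).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


# Six different words that differ only in their diacritics.
words = ["ma", "má", "mà", "mả", "mã", "mạ"]
collapsed = {strip_diacritics(w) for w in words}
print(collapsed)  # {'ma'}
```

Once the marks are gone, a translation system has to recover the intended word from context alone.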

Preprint, 2022, arXiv

"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic

Content warning: This work displays examples of explicit and/or strongly offensive language. Fueled by a surge of anti-Asian xenophobia and prejudice during the COVID-19 pandemic, many have taken to social media to express these negative sentiments. Identifying these posts is crucial for moderation and for understanding the nature of hate in online spaces. In this paper, we create and annotate a corpus of tweets to explore anti-Asian hate speech at a finer level of granularity. Our analysis reveals that this emergent form of hate speech often eludes established approaches. To address this challenge, we develop a model and an accompanying efficient training regimen that incorporates agreement between annotators. Our approach yields up to an 8.8% improvement in macro F1 scores over a strong established baseline, indicating its effectiveness even in settings where consensus among annotators is low. We demonstrate that we are able to identify hate speech that is systematically missed by established hate speech detectors.
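For readers unfamiliar with the macro F1 metric reported above, the following is a minimal sketch of how it is computed. It is not the authors' evaluation code; the function name `macro_f1` and the labels `"hate"` and `"none"` are illustrative assumptions. Macro F1 averages the per-class F1 scores with equal weight, so a rare class counts as much as a common one.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: a classifier that never predicts the rare class.
y_true = ["hate", "none", "none", "none"]
y_pred = ["none", "none", "none", "none"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                   # 0.75
print(macro_f1(y_true, y_pred))   # 3/7, roughly 0.43
```

In this toy case accuracy is 0.75, while macro F1 drops to about 0.43 because the missed minority class contributes an F1 of 0. That sensitivity is why the metric is common in imbalanced tasks such as hate speech detection.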