Source author record

Kyumin Lee

Kyumin Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks, Information Retrieval, Artificial Intelligence, cs.CY, physics.soc-ph, Computation and Language, Human-Computer Interaction, Machine Learning, Neural and Evolutionary Computing

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) effectively reduce computational costs, they often leave a noticeable performance gap relative to full fine-tuning when post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework that applies full fine-tuning to a small subset of modules that are less suited to low-rank adaptation and adapts the remaining components with LoRA. We introduce a novel Hybrid-LoRA Score that ranks candidate modules by their sensitivity to low-rank adaptation under a fixed parameter budget. Experiments show that, with a 10% full fine-tuning module budget and the remaining candidate modules adapted by LoRA, Hybrid-LoRA closely matches full fine-tuning performance. It consistently outperforms four state-of-the-art PEFT post-training baselines, improving on the best baseline by up to 5.65% and by 4.36% on average.
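A minimal sketch of the module-selection step described above, under stated assumptions: the per-module scores are given as inputs (the Hybrid-LoRA Score itself is not reproduced here), the 10% budget is interpreted as a fraction of candidate-module parameters, and modules are taken greedily in score order. `Module`, `split_modules` and `lora_params` are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Module:
    name: str        # hypothetical module identifier, e.g. "layers.0.mlp.up_proj"
    num_params: int  # parameter count of the module's weight matrix
    score: float     # hypothetical sensitivity score; assumed higher = less suited to LoRA


def split_modules(modules: List[Module], budget_fraction: float = 0.10) -> Tuple[List[str], List[str]]:
    """Walk modules in descending score order and assign each one to full
    fine-tuning if it still fits in the parameter budget; all other candidate
    modules are adapted with LoRA. (Greedy rule is an assumption.)"""
    budget = budget_fraction * sum(m.num_params for m in modules)
    full_ft: List[str] = []
    lora: List[str] = []
    used = 0
    for m in sorted(modules, key=lambda m: m.score, reverse=True):
        if used + m.num_params <= budget:
            full_ft.append(m.name)
            used += m.num_params
        else:
            lora.append(m.name)
    return full_ft, lora


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r LoRA update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)
```

With ten equally sized modules and a 10% budget, only the top-scoring module is fully fine-tuned. For a 4096 × 4096 projection, `lora_params(4096, 4096, 16)` gives 131,072 trainable parameters versus 16,777,216 for full fine-tuning, which illustrates the cost gap a hybrid split trades against.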

preprint · 2022 · arXiv

Extracting and Visualizing Wildlife Trafficking Events from Wildlife Trafficking Reports

Experts combating wildlife trafficking manually sift through articles about seizures and arrests, which is time-consuming and makes identifying trends difficult. We apply natural language processing techniques to automatically extract data from reports published by the Eco Activists for Governance and Law Enforcement (EAGLE). We extended spaCy's pre-trained Python pipeline with a custom named entity ruler, which identified 15 fully correct and 36 partially correct events in 15 reports, whereas an existing baseline identified no fully correct events. The extracted wildlife trafficking events were inserted into a database. We then created visualizations that display trends over time and across regions to support domain experts. These are accessible on our website, Wildlife Trafficking in Africa (https://wildlifemqp.github.io/Visualizations/).

preprint · 2021 · arXiv

Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection

The widespread dissemination of fake news and misinformation in domains ranging from politics and economics to public health has created an urgent need to automatically fact-check information. A recent trend in fake news detection is to use evidence from external sources. However, existing evidence-aware fake news detection methods focus on either word-level attention or evidence-level attention alone, which may result in suboptimal performance. In this paper, we propose a Hierarchical Multi-head Attentive Network to fact-check textual claims. Our model jointly combines multi-head word-level attention and multi-head document-level attention, which aid explanation at both the word level and the evidence level. Experiments on two real-world datasets show that our model outperforms seven state-of-the-art baselines, with improvements over the baselines ranging from 6% to 18%. Our source code and datasets are released at https://github.com/nguyenvo09/EACL2021.
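To make the two attention levels concrete, here is a minimal, untrained sketch of hierarchical multi-head attention pooling in plain Python: word-level heads pool each evidence document into a vector, and document-level heads then pool the document vectors. The fixed query vectors are a hypothetical stand-in for the learned, claim-conditioned parameters of an actual model; this illustrates the general technique, not the paper's architecture.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def attend(vectors: List[Vec], query: Vec) -> Tuple[Vec, Vec]:
    """Scaled dot-product attention pooling: weights = softmax(q . v / sqrt(d))."""
    d = len(query)
    weights = softmax([dot(v, query) / math.sqrt(d) for v in vectors])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
    return pooled, weights


def multi_head_attend(vectors: List[Vec], queries: List[Vec]) -> Tuple[Vec, List[Vec]]:
    """One query per head; head outputs are concatenated, per-head weights kept."""
    out: Vec = []
    all_weights: List[Vec] = []
    for q in queries:
        pooled, w = attend(vectors, q)
        out.extend(pooled)
        all_weights.append(w)
    return out, all_weights


def hierarchical_encode(evidence_docs: List[List[Vec]], word_queries: List[Vec], doc_queries: List[Vec]):
    """evidence_docs: documents, each a list of word vectors of dim d.
    Word-level heads turn each document into a vector of dim d * len(word_queries);
    document-level queries must have that dimension."""
    doc_vecs: List[Vec] = []
    word_weights: List[List[Vec]] = []
    for words in evidence_docs:
        vec, w = multi_head_attend(words, word_queries)
        doc_vecs.append(vec)
        word_weights.append(w)
    evidence_vec, doc_weights = multi_head_attend(doc_vecs, doc_queries)
    return evidence_vec, word_weights, doc_weights
```

The per-head weights returned at each level are what supports explanation: they show which words within a document, and which evidence documents, the pooled representation relied on.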

preprint · 2020 · arXiv

Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

To combat fake news, researchers have mostly focused on detecting it, while journalists have built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, we propose a deep-learning-based fact-checking URL recommender system to mitigate the impact of fake news on social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network that learn complex semantic relationships among user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, with improvements of at least 3% to 5.3%.

preprint · 2020 · arXiv

Quaternion-Based Self-Attentive Long Short-Term User Preference Encoding for Recommendation

Quaternion space offers several benefits over traditional Euclidean space: Quaternions (i) consist of one real and three imaginary components, encouraging richer representations; (ii) use the Hamilton product, which better encodes inter-latent interactions across multiple Quaternion components; and (iii) yield models with fewer degrees of freedom that are less prone to overfitting. Unfortunately, most current recommender systems rely on real-valued representations in Euclidean space to model either a user's long-term or short-term interests. In this paper, we fully utilize Quaternion space to model both long-term and short-term user preferences. We first propose a QUaternion-based self-Attentive Long term user Encoding (QUALE) to study the user's long-term intents. Then, we propose a QUaternion-based self-Attentive Short term user Encoding (QUASE) to learn the user's short-term interests. To enhance our models' capability, we fuse QUALE and QUASE into one model, QUALSE, using a Quaternion-based gating mechanism. We further develop Quaternion-based Adversarial learning along with Bayesian Personalized Ranking (QABPR) to improve our model's robustness. Extensive experiments on six real-world datasets show that our fused QUALSE model outperforms 11 state-of-the-art baselines, improving HIT@1 by 8.43% and NDCG@1 by 10.27% on average compared with the best baseline.
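The Hamilton product mentioned in point (ii) can be written out directly. A minimal sketch in plain Python, where the `Quaternion` type and the function names are illustrative rather than taken from the paper: `hamilton` multiplies two quaternions using i² = j² = k² = ijk = −1, and `left_matrix` shows that left-multiplication by a quaternion weight is a 4 × 4 real linear map built from only 4 free parameters, the kind of parameter sharing behind point (iii).

```python
from typing import List, NamedTuple


class Quaternion(NamedTuple):
    r: float  # real part
    i: float  # coefficient of i
    j: float  # coefficient of j
    k: float  # coefficient of k


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q (not commutative)."""
    return Quaternion(
        p.r * q.r - p.i * q.i - p.j * q.j - p.k * q.k,
        p.r * q.i + p.i * q.r + p.j * q.k - p.k * q.j,
        p.r * q.j - p.i * q.k + p.j * q.r + p.k * q.i,
        p.r * q.k + p.i * q.j - p.j * q.i + p.k * q.r,
    )


def left_matrix(p: Quaternion) -> List[List[float]]:
    """4x4 real matrix M such that M @ q equals hamilton(p, q); 16 entries, 4 parameters."""
    return [
        [p.r, -p.i, -p.j, -p.k],
        [p.i, p.r, -p.k, p.j],
        [p.j, p.k, p.r, -p.i],
        [p.k, -p.j, p.i, p.r],
    ]


def matvec(m: List[List[float]], v) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]
```

For example, `hamilton(Quaternion(0, 1, 0, 0), Quaternion(0, 0, 1, 0))` gives k, while reversing the arguments gives −k, so the order of the factors matters.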

preprint · 2016 · arXiv

How to Succeed in Crowdfunding: a Long-Term Study in Kickstarter

Crowdfunding platforms have become important sites where people can create projects to seek funds for turning their ideas into products, and back other people's projects. As news media have reported on successfully funded projects (e.g., Pebble Time, Coolest Cooler), more people have joined crowdfunding platforms and launched projects. But despite rapid growth in the number of users and projects, the overall project success rate has been decreasing because projects are launched without enough preparation and experience. Little is known about how project creators reacted when their projects failed (e.g., giving up or improving the failed projects), and what types of successful projects exist. To address these problems, in this manuscript we (i) collect the largest datasets from Kickstarter, consisting of all project profiles, corresponding user profiles, projects' temporal data and users' social media information; (ii) analyze the characteristics of successful projects and the behaviors of users, and study the dynamics of the crowdfunding platform; (iii) propose novel statistical approaches to predict whether a project will be successful and the range of pledged money it can expect; (iv) develop predictive models and evaluate their performance; (v) analyze how project creators reacted when their projects failed and, if they did not give up, how they made the failed projects successful; and (vi) cluster successful projects by how their pledged money evolved over time, to understand what efforts project creators should make to attract more pledged money. Our experimental results show that the predictive models can effectively predict project success and the range of expected pledged money.

preprint · 2016 · arXiv

Understanding Citizen Reactions and Ebola-Related Information Propagation on Social Media

In severe outbreaks such as Ebola, bird flu and SARS, people share news, as well as their thoughts and responses regarding the outbreaks, on social media. Understanding how people perceive severe outbreaks, what their responses are, and what factors affect these responses has become important. In this paper, we conduct a comprehensive study of the spread of Ebola-related information on social media. In particular, we (i) conduct a large-scale data-driven analysis of geotagged social media messages to understand citizen reactions regarding Ebola; (ii) build information propagation models that measure the locality of information; and (iii) analyze spatial, temporal and social properties of Ebola-related information. Our work provides new insights into the Ebola outbreak by characterizing citizen reactions and topic-based information propagation, and provides a foundation for analyzing and responding to future public health crises.

preprint · 2014 · arXiv

The Dark Side of Micro-Task Marketplaces: Characterizing Fiverr and Automatically Detecting Crowdturfing

As human computation on crowdsourcing systems has become popular and powerful for performing tasks, malicious users have started misusing these systems by posting malicious tasks, propagating manipulated content, and targeting popular web services such as online social networks and search engines. Recently, these malicious users have moved to Fiverr, a fast-growing micro-task marketplace, where workers can post crowdturfing tasks (i.e., astroturfing campaigns run by crowd workers) and malicious customers can purchase those tasks for only $5. In this paper, we present a comprehensive analysis of Fiverr. First, we identify the most popular types of crowdturfing tasks found in this marketplace and conduct case studies of them. Then, we build crowdturfing task detection classifiers to filter out these tasks and prevent them from becoming active in the marketplace. Our experimental results show that the proposed classification approach effectively detects crowdturfing tasks, achieving 97.35% accuracy. Finally, we analyze the real-world impact of crowdturfing tasks by purchasing active Fiverr tasks and quantifying their impact on a target site. As part of this analysis, we show that current security systems inadequately detect crowdsourced manipulation, which confirms the necessity of our proposed crowdturfing task detection approach.

preprint · 2014 · arXiv

Who Will Retweet This? Automatically Identifying and Engaging Strangers on Twitter to Spread Information

There has been much effort on studying how social media sites such as Twitter help propagate information in different situations, including spreading alerts and SOS messages in an emergency. However, existing work has not addressed how to actively identify and engage the right strangers at the right time on social media to help propagate intended information within a desired time frame. To address this problem, we have developed two models: (i) a feature-based model that leverages people's exhibited social behavior, including the content of their tweets and their social interactions, to characterize their willingness and readiness to propagate information on Twitter via retweeting; and (ii) a wait-time model that uses a user's previous retweeting wait times to predict her next retweeting time when asked. Based on these two models, we build a recommender system that predicts the likelihood that a stranger will retweet information when asked, within a specific time window, and recommends the top-N qualified strangers to engage with. Our experiments, including live studies in the real world, demonstrate the effectiveness of our work.
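One way to combine a willingness model and a wait-time model when ranking strangers is sketched below. It assumes, as a simplification that is not claimed by the paper, that willingness and timing are independent; it takes the feature-based model's output as a given retweet probability, and it replaces the wait-time model with the empirical fraction of a user's past wait times that fell within the window. `Candidate`, `p_within` and `recommend` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    user: str         # hypothetical user handle
    p_retweet: float  # assumed output of a feature-based model, in [0, 1]
    past_wait_times: List[float] = field(default_factory=list)  # seconds from request to retweet


def p_within(wait_times: List[float], window: float) -> float:
    """Empirical probability that a wait time is <= window (0.0 with no history)."""
    if not wait_times:
        return 0.0
    return sum(1 for t in wait_times if t <= window) / len(wait_times)


def recommend(candidates: List[Candidate], window: float, top_n: int) -> List[str]:
    """Rank by P(retweet) * P(wait <= window), assuming independence; return the top-N users."""
    scored = [(c.p_retweet * p_within(c.past_wait_times, window), c.user) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in scored[:top_n]]
```

With a 60-second window, a willing user with one slow past response (0.9 × 2/3 = 0.6) can still outrank a consistently fast but less willing user (0.5 × 1 = 0.5), while a user with no wait-time history scores zero under this simple estimate.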
