Source author record

Jinqiang Wang

Jinqiang Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Human-Computer Interaction, Artificial Intelligence, Computation and Language

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint, 2025, arXiv

LLM-Driven Preference Data Synthesis for Proactive Prediction of the Next User Utterance in Human-Machine Dialogue

Proactively predicting a user's next utterance in human-machine dialogue can streamline interaction and improve user experience. Existing commercial API-based solutions raise privacy concerns, while deploying general-purpose LLMs locally remains computationally expensive. As such, training a compact, task-specific LLM provides a practical alternative. Although user simulator methods can predict a user's next utterance, they mainly imitate the user's speaking style rather than advancing the dialogue. Preference data synthesis has been investigated to generate data for proactive next utterance prediction and to help align LLMs with user preferences. Yet existing methods cannot explicitly model the intent reasoning that leads to the user's next utterance, nor do they define and synthesize preference and non-preference reasoning processes for predicting it. To address these challenges, we propose ProUtt, an LLM-driven preference data synthesis method for proactive next utterance prediction. ProUtt converts dialogue history into an intent tree and explicitly models intent reasoning trajectories by predicting the next plausible path from both exploitation and exploration perspectives. It then constructs preference and non-preference reasoning processes by perturbing or revising intent tree paths at different future turns. Extensive evaluations using LLM-as-a-judge and human judgments demonstrate that ProUtt consistently outperforms existing data synthesis methods, user simulators, and commercial LLM APIs across four benchmark datasets. We release both the code and the synthesized datasets to facilitate future research.
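As a rough illustration of the idea in this abstract, the sketch below represents dialogue intents as a small tree, enumerates root-to-leaf paths, and builds a preference pair by perturbing one step of a path. It is not the released ProUtt code: the names `IntentNode`, `root_to_leaf_paths`, and `make_preference_pair`, the example intents, and the single-step replacement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IntentNode:
    """One intent in a hypothetical intent tree built from dialogue history."""
    intent: str
    children: list = field(default_factory=list)


def root_to_leaf_paths(node, prefix=()):
    """Return every root-to-leaf intent path as a tuple of intent strings."""
    path = prefix + (node.intent,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


def make_preference_pair(chosen_path, distractor_intent, turn):
    """Build a (chosen, rejected) pair by replacing the intent at index `turn`.

    Assumption: the "non-preference" path differs from the preferred one
    in exactly one step. The abstract only says paths are perturbed or revised.
    """
    if not 0 <= turn < len(chosen_path):
        raise IndexError("turn is outside the path")
    if chosen_path[turn] == distractor_intent:
        raise ValueError("distractor must differ from the original intent")
    rejected = chosen_path[:turn] + (distractor_intent,) + chosen_path[turn + 1:]
    return {"chosen": chosen_path, "rejected": rejected}


# Hypothetical dialogue about planning a trip.
tree = IntentNode("book trip", [
    IntentNode("flights", [IntentNode("baggage")]),
    IntentNode("hotels"),
])
pair = make_preference_pair(root_to_leaf_paths(tree)[0], "weather", turn=2)
```

Pairs of this shape match the chosen/rejected layout commonly used for preference-based fine-tuning; how ProUtt turns paths into reasoning text is not specified in the abstract and is left out here.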

Preprint, 2022, arXiv

Negative Selection by Clustering for Contrastive Learning in Human Activity Recognition

Contrastive learning has been applied to sensor-based Human Activity Recognition (HAR) because it can achieve performance comparable to supervised learning using a large amount of unlabeled data and a small amount of labeled data. The pre-training task for contrastive learning is generally instance discrimination, which treats each instance as its own class and therefore treats other samples of the same class as negative examples. Such a pre-training task is not well suited to human activity recognition, which is mainly a classification task. To address this problem, we follow SimCLR and propose ClusterCLHAR, a new contrastive learning framework for HAR that selects negatives by clustering. Compared with SimCLR, it redefines the negative pairs in the contrastive loss function: unsupervised clustering generates soft labels, which are used to mask other samples of the same cluster so that they are not regarded as negative samples. We evaluate ClusterCLHAR on three benchmark datasets, USC-HAD, MotionSense, and UCI-HAR, using mean F1-score as the evaluation metric. The experimental results show that it outperforms all the state-of-the-art methods applied to HAR in self-supervised learning and semi-supervised learning.
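The masking idea described in this abstract can be sketched as a variant of the NT-Xent loss used by SimCLR in which samples sharing a cluster with the anchor are dropped from the denominator. This is a minimal sketch, not the ClusterCLHAR implementation: the function names, the use of hard cluster ids in place of the paper's soft labels, and the temperature value are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def masked_nt_xent(z1, z2, cluster_ids, temperature=0.1):
    """NT-Xent-style loss that skips same-cluster samples as negatives.

    z1, z2: unit-length embeddings of two augmented views (same order).
    cluster_ids: one cluster id per sample, from any unsupervised
    clustering (assumption: hard assignments).
    """
    n = len(z1)
    views = z1 + z2                 # 2n embeddings
    labels = list(cluster_ids) * 2  # cluster id for each embedding

    def sim(a, b):
        return sum(x * y for x, y in zip(a, b)) / temperature

    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)     # index of the other view of sample i
        pos_term = math.exp(sim(views[i], views[pos]))
        denom = pos_term
        for k in range(2 * n):
            if k == i or k == pos:
                continue
            if labels[k] == labels[i]:
                continue            # same cluster: masked out, not a negative
            denom += math.exp(sim(views[i], views[k]))
        total += -math.log(pos_term / denom)
    return total / (2 * n)


# Toy 2-D embeddings for three samples and their second views.
view_a = [l2_normalize(v) for v in ([1.0, 0.1], [0.9, 0.2], [0.1, 1.0])]
view_b = [l2_normalize(v) for v in ([1.0, 0.2], [0.8, 0.3], [0.2, 1.0])]
loss_masked = masked_nt_xent(view_a, view_b, cluster_ids=[0, 0, 1])
loss_plain = masked_nt_xent(view_a, view_b, cluster_ids=[0, 1, 2])
```

Giving every sample a distinct cluster id recovers plain instance discrimination, so the two calls above compare the masked loss with the unmasked baseline on the same embeddings.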

Preprint, 2022, arXiv

Sensor Data Augmentation by Resampling for Contrastive Learning in Human Activity Recognition

While deep learning has contributed to the advancement of sensor-based Human Activity Recognition (HAR), it is usually a costly and challenging supervised task that requires a large amount of labeled data. To alleviate this issue, contrastive learning has been applied to sensor-based HAR. Data augmentation is an essential part of contrastive learning and has a significant impact on the performance of downstream tasks. However, current popular augmentation methods do not achieve competitive performance in contrastive learning for sensor-based HAR. Motivated by this issue, we propose a new sensor data augmentation method based on resampling, which simulates more realistic activity data by varying the sampling frequency to maximize the coverage of the sampling space. In addition, we extend MoCo, a popular contrastive learning framework, to MoCoHAR for HAR. The resampling augmentation method is evaluated on two contrastive learning frameworks, SimCLRHAR and MoCoHAR, using the UCI-HAR, MotionSense, and USC-HAD datasets. The experimental results show that, with a small amount of labeled data, the resampling augmentation method outperforms all state-of-the-art methods on both SimCLRHAR and MoCoHAR, with mean F1-score as the evaluation metric. The results also demonstrate that not all data augmentation methods have positive effects in the contrastive learning framework.
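One way to picture augmentation by varying the sampling frequency is the sketch below, which upsamples a single-axis sensor window by linear interpolation and then crops it back to the original length. It is an illustrative assumption rather than the paper's procedure: the function names, the linear interpolation, the restriction to factors of at least 1, and the random crop are all choices made here.

```python
import random


def resample_linear(signal, factor):
    """Resample a 1-D series as if the sampling frequency were scaled by `factor`."""
    n = len(signal)
    if n < 2 or factor <= 0:
        raise ValueError("need at least two samples and a positive factor")
    m = max(2, round(n * factor))         # number of output samples
    out = []
    for j in range(m):
        t = j * (n - 1) / (m - 1)         # position on the original time axis
        lo = int(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def resample_augment(window, rng, max_factor=2.0):
    """Return an augmented window with the same length as the input.

    Assumption: a factor is drawn uniformly from [1, max_factor], the window
    is upsampled, and a random contiguous crop restores the original length.
    """
    factor = rng.uniform(1.0, max_factor)
    stretched = resample_linear(window, factor)
    start = rng.randint(0, len(stretched) - len(window))
    return stretched[start:start + len(window)]


# Hypothetical accelerometer axis: a short ramp stands in for real data.
window = [float(i) for i in range(10)]
augmented = resample_augment(window, random.Random(0))
```

For multi-axis sensors the same factor and crop offset would need to be shared across axes so the channels stay aligned; that detail is omitted here.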
