Researcher profile

Jey Han Lau

Jey Han Lau contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

Beyond Long Tail POIs: Transition-Centered Generalization for Human Mobility Prediction

Human mobility prediction forecasts a user's next Point of Interest (POI) from historical trajectories, supporting applications from recommendation to urban planning. Recent studies have recognized the problem with long-tail POIs in human mobility prediction, which are POIs with few visit records, making new visits to such POIs difficult to predict. Our analysis shows that many predictions fail even for visits to popular POIs. The underlying cause is often transition-level sparsity: the corresponding source-destination transition appears rarely, or never appears, in the training set. We therefore argue that a core bottleneck in human mobility prediction lies in transition-level long-tail generalization. We formulate this problem as compositional generalization and propose a tRansition rEconstruction framework for Compositional generAlization in next-POI prediction (RECAP). RECAP reconstructs long-tail transitions from two generalizable signals: multi-hop transitivity in the global transition graph and revisit evidence from a user's historical trajectory. It further uses warm-transition holdout training to discourage memorization of frequent transitions and encourage generalization from transferable signals. Experiments on multiple real-world datasets show that RECAP consistently improves prediction accuracy, with clear gains on tail transitions.

preprint2026arXiv

PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering

Knowledge Graph Question Answering (KGQA) aims to answer user questions by reasoning over Knowledge Graphs (KGs). Recent KGQA methods mainly follow the retrieval-augmented generation paradigm to ground Large Language Models~(LLMs) with structured knowledge from KGs. However, training effective models to retrieve question-relevant evidence from KGs typically requires high-quality intermediate supervision signals, such as question-relevant paths or subgraphs, which are time- and resource-intensive to obtain. We propose PathISE, a novel framework for learning high-quality intermediate supervision from answer-level labels. PathISE introduces a lightweight transformer-based estimator that estimates the informativeness of relation paths to construct pseudo path-level supervision. This supervision is then distilled into an LLM path generator, whose generated paths are grounded in the KG to provide compact evidence for inductive answer reasoning. ExtensiveISE experiments on three KGQA benchmarks show that PathISE achieves competitive or state-of-the-art KGQA performance, and provides reusable supervision signals that can enhance existing KGQA models, without relying on costly LLM-refined supervision signals. Our source code is available at https://anonymous.4open.science/r/PathISE-2F87.

preprint2022arXiv

An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation

We study the interpretability issue of task-oriented dialogue systems in this paper. Previously, most neural-based task-oriented dialogue systems employ an implicit reasoning strategy that makes the model predictions uninterpretable to humans. To obtain a transparent reasoning process, we introduce neuro-symbolic to perform explicit reasoning that justifies model decisions by reasoning chains. Since deriving reasoning chains requires multi-hop reasoning for task-oriented dialogues, existing neuro-symbolic approaches would induce error propagation due to the one-phase design. To overcome this, we propose a two-phase approach that consists of a hypothesis generator and a reasoner. We first obtain multiple hypotheses, i.e., potential operations to perform the desired task, through the hypothesis generator. Each hypothesis is then verified by the reasoner, and the valid one is selected to conduct the final prediction. The whole system is trained by exploiting raw textual dialogues without using any reasoning chain annotations. Experimental studies on two public benchmark datasets demonstrate that the proposed approach not only achieves better results, but also introduces an interpretable decision process.

preprint2022arXiv

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences). We construct a novel dataset for focus, coverage, and inter-sentential coherence, and develop automatic methods for evaluating each of the four dimensions of FFCI based on cross-comparison of evaluation metrics and model-based evaluation methods, including question answering (QA) approaches, semantic textual similarity (STS), next-sentence prediction (NSP), and scores derived from 19 pre-trained language models. We then apply the developed metrics in evaluating a broad range of summarization models across two datasets, with some surprising findings.

preprint2022arXiv

ITTC @ TREC 2021 Clinical Trials Track

This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track. The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes. We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic. The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.

preprint2022arXiv

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation of the world, we provide an overview of the current state of NLP research for Indonesia's 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages.

preprint2022arXiv

Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering

Data artifacts incentivize machine learning models to learn non-transferable generalizations by taking advantage of shortcuts in the data, and there is growing evidence that data artifacts play a role for the strong results that deep learning models achieve in recent natural language processing benchmarks. In this paper, we focus on task-oriented dialogue and investigate whether popular datasets such as MultiWOZ contain such data artifacts. We found that by only keeping frequent phrases in the training examples, state-of-the-art models perform similarly compared to the variant trained with full data, suggesting they exploit these spurious correlations to solve the task. Motivated by this, we propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns. We also experiment with adversarial filtering to remove "easy" training instances so that the model would focus on learning from the "harder" instances. We conduct a number of generalization experiments -- e.g., cross-domain/dataset and adversarial tests -- to assess the robustness of our approach and found that it works exceptionally well.

preprint2020arXiv

COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research

We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest. We developed this system over COVID-19 literature to help medical professionals and researchers explore the literature evidence, and improve findability of relevant information. COVID-SEE is available at http://covid-see.com.

preprint2020arXiv

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While there are a number of methods proposed to generate adversarial examples for text data, it is not trivial to assess the quality of these adversarial examples, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in their meaning, readability and classification label. In this paper, we propose an evaluation framework consisting of a set of automatic evaluation metrics and human evaluation guidelines, to rigorously assess the quality of adversarial examples based on the aforementioned properties. We experiment with six benchmark attacking methods and found that some methods generate adversarial examples with poor readability and content preservation. We also learned that multiple factors could influence the attacking performance, such as the length of the text inputs and architecture of the classifiers.

preprint2020arXiv

Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?

As part of growing NLP capabilities, coupled with an awareness of the ethical dimensions of research, questions have been raised about whether particular datasets and tasks should be deemed off-limits for NLP research. We examine this question with respect to a paper on automatic legal sentencing from EMNLP 2019 which was a source of some debate, in asking whether the paper should have been allowed to be published, who should have been charged with making such a decision, and on what basis. We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.

preprint2020arXiv

How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context

We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect which uniformly raises acceptability. Next, we test unidirectional and bidirectional language models in their ability to predict acceptability ratings. The bidirectional models show very promising results, with the best model achieving a new state-of-the-art for unsupervised acceptability prediction. The two sets of experiments provide insights into the cognitive aspects of sentence processing and central issues in the computational modelling of text and discourse.

preprint2020arXiv

Less is More: Rejecting Unreliable Reviews for Product Question Answering

Promptly and accurately answering questions on products is important for e-commerce applications. Manually answering product questions (e.g. on community question answering platforms) results in slow response and does not scale. Recent studies show that product reviews are a good source for real-time, automatic product question answering (PQA). In the literature, PQA is formulated as a retrieval problem with the goal to search for the most relevant reviews to answer a given product question. In this paper, we focus on the issue of answerability and answer reliability for PQA using reviews. Our investigation is based on the intuition that many questions may not be answerable with a finite set of reviews. When a question is not answerable, a system should return nil answers rather than providing a list of irrelevant reviews, which can have significant negative impact on user experience. Moreover, for answerable questions, only the most relevant reviews that answer the question should be included in the result. We propose a conformal prediction based framework to improve the reliability of PQA systems, where we reject unreliable answers so that the returned results are more concise and accurate at answering the product question, including returning nil answers for unanswerable questions. Experiments on a widely used Amazon dataset show encouraging results of our proposed framework. More broadly, our results demonstrate a novel and effective application of conformal methods to a retrieval task.

preprint2020arXiv

You are right. I am ALARMED -- But by Climate Change Counter Movement

The world is facing the challenge of climate crisis. Despite the consensus in scientific community about anthropogenic global warming, the web is flooded with articles spreading climate misinformation. These articles are carefully constructed by climate change counter movement (cccm) organizations to influence the narrative around climate change. We revisit the literature on climate misinformation in social sciences and repackage it to introduce in the community of NLP. Despite considerable work in detection of fake news, there is no misinformation dataset available that is specific to the domain.of climate change. We try to bridge this gap by scraping and releasing articles with known climate change misinformation.