Source author record

Patrick Ng

Patrick Ng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Computer Vision cs.CY Information Retrieval

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual information. This limits their ability to solve problems that demand accurate visual perception, such as visual puzzles. We show that visual perception is the key bottleneck in such tasks: converting images into textual descriptions significantly improves performance, yielding gains of 26.7% for Claude 3.5 and 23.6% for Claude 3.7. To address this, we investigate reward-driven RL as a mechanism to unlock long visual reasoning in open-source MLLMs without requiring costly supervision. We design and evaluate six reward functions targeting different reasoning aspects, including image understanding, thinking steps, and answer accuracy. Using group relative policy optimization (GRPO), our approach explicitly incentivizes longer, structured reasoning and mitigates bypassing of visual information. Experiments on Qwen-2.5-VL-7B achieve 5.56% improvements over the base model, with consistent gains across both in-domain and out-of-domain settings.

preprint2026arXiv

MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation

Multimodal Retrieval-Augmented Generation (Visual RAG) significantly advances question answering by integrating visual and textual evidence. Yet, current evaluations fail to systematically account for query difficulty and ambiguity. We propose MRAG-Suite, a diagnostic evaluation platform integrating diverse multimodal benchmarks (WebQA, Chart-RAG, Visual-RAG, MRAG-Bench). We introduce difficulty-based and ambiguity-aware filtering strategies, alongside MM-RAGChecker, a claim-level diagnostic tool. Our results demonstrate substantial accuracy reductions under difficult and ambiguous queries, highlighting prevalent hallucinations. MM-RAGChecker effectively diagnoses these issues, guiding future improvements in Visual RAG systems.

preprint2022arXiv

REKnow: Enhanced Knowledge for Joint Entity and Relation Extraction

Relation extraction is an important but challenging task that aims to extract all hidden relational facts from the text. With the development of deep language models, relation extraction methods have achieved good performance on various benchmarks. However, we observe two shortcomings of previous methods: first, there is no unified framework that works well under various relation extraction settings; second, effectively utilizing external knowledge as background information is absent. In this work, we propose a knowledge-enhanced generative model to mitigate these two issues. Our generative model is a unified framework to sequentially generate relational triplets under various relation extraction settings and explicitly utilizes relevant knowledge from Knowledge Graph (KG) to resolve ambiguities. Our model achieves superior performance on multiple benchmarks and settings, including WebNLG, NYT10, and TACRED.

preprint2020arXiv

Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grows. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14%, and 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.

preprint2016arXiv

Do Price Ranges Increase Click-throughs?

An important goal of online comparison shopping services is to "convert" a viewer from general product category pages (for example product groups such as "smartphones" or "air-conditioners") to detailed product pages and ultimately to order pages. Comparison shopping websites provide a familiar web interface as well as a chance for consumers to purchase items at competitive prices. In return for providing access to a large market of potential consumers, the comparison shopping service usually receives financial compensation for product clicks and orders. This study looked at 2.5 million product listing visits at price.com.hk to determine whether a modification in the way prices are displayed on general category pages resulted in more "conversions" to product detail pages. We found a statistically significant improvement over-all as a result of the new price display resulting in 3.6% more product clicks over all categories. Additional analysis showed that the effect is heterogeneous among different categories, and in a few cases there may be some categories negatively affected by the display modification.