Source author record

Raymond Ng

Raymond Ng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: astro-ph.HE, astro-ph.IM, Computation and Language, gr-qc

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

Adapting Natural Language Processing Models Across Jurisdictions: A pilot Study in Canadian Cancer Registries

Population-based cancer registries depend on pathology reports as their primary diagnostic source, yet manual abstraction is resource-intensive and contributes to delays in the availability of cancer data. While transformer-based NLP systems have improved registry workflows, their ability to generalize across jurisdictions with differing reporting conventions remains poorly understood. We present the first cross-provincial evaluation of adapting BCCRTron, a domain-adapted transformer model developed at the British Columbia Cancer Registry, alongside GatorTron, a biomedical transformer model, for cancer surveillance in Canada. Our training datasets consisted of approximately 104,000 and 22,000 de-identified pathology reports from the Newfoundland & Labrador Cancer Registry (NLCR) for the Tier 1 (cancer vs. non-cancer) and Tier 2 (reportable vs. non-reportable) tasks, respectively. Both models were fine-tuned using complementary input pipelines built from synoptic and diagnosis-focused report sections. Across NLCR test sets, the adapted models maintained high performance, demonstrating that transformers pretrained in one jurisdiction can be localized to another with modest fine-tuning. To improve sensitivity, we combined the two models using a conservative OR-ensemble, which achieved a Tier 1 recall of 0.99 and reduced missed cancers to 24, compared with 48 and 54 for the standalone models. For Tier 2, the ensemble achieved 0.99 recall and reduced missed reportable cancers to 33, compared with 54 and 46 for the individual models. These findings demonstrate that an ensemble combining complementary text representations substantially reduces missed cancers and improves error coverage in cancer-registry NLP. We implement a privacy-preserving workflow in which only model weights are shared between provinces, supporting interoperable NLP infrastructure and a future pan-Canadian foundation model for cancer pathology and registry workflows.
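As a rough illustration of the conservative OR-ensemble rule in the abstract above, the sketch below flags a report as positive when either model flags it, then counts missed positives and recall. The function names, the 0/1 label encoding and the toy data are hypothetical and not taken from the study. The sketch makes one property explicit: under an OR rule a positive is missed only when both models miss it, so the ensemble can never miss more cases than the better standalone model (consistent with 24 versus 48 and 54). The price is that the ensemble's false positives are the union of both models' false positives.

```python
from typing import List, Sequence, Tuple


def or_ensemble(preds_a: Sequence[int], preds_b: Sequence[int]) -> List[int]:
    """Positive (1) if either model predicts positive, otherwise 0 (hypothetical helper)."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return [1 if (a == 1 or b == 1) else 0 for a, b in zip(preds_a, preds_b)]


def missed_and_recall(labels: Sequence[int], preds: Sequence[int]) -> Tuple[int, float]:
    """Count true positives predicted as 0 (missed) and compute recall = TP / (TP + FN)."""
    positives = sum(1 for y in labels if y == 1)
    missed = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    recall = (positives - missed) / positives if positives else float("nan")
    return missed, recall


# Toy example (hypothetical data): 1 = cancer, 0 = non-cancer.
labels = [1, 1, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [0, 1, 1, 0, 0, 0]
ensemble = or_ensemble(model_a, model_b)
print(missed_and_recall(labels, model_a))   # (2, 0.5)
print(missed_and_recall(labels, model_b))   # (2, 0.5)
print(missed_and_recall(labels, ensemble))  # (1, 0.75)
```

In the toy data the ensemble recovers the two cases that only one model catches, but it also inherits model A's false positive at the last-but-one... rather, at the final position, which is the sensitivity-for-specificity trade-off implied by calling the rule "conservative".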

Preprint, 2022, arXiv

UniMAP: Model-free detection of unclassified noise transients in LIGO-Virgo data using the Temporal Outlier Factor

Data from current gravitational wave detectors contains a high rate of transient noise (glitches) that can trigger false detections and obscure true astrophysical events. Existing noise-detection algorithms largely rely on model-based methods that may miss noise transients unwitnessed by auxiliary sensors or with exotic morphologies. We propose the Unicorn Multi-window Anomaly-detection Pipeline (UniMAP): a model-free algorithm to identify and characterize transient noise leveraging the Temporal Outlier Factor (TOF) via a multi-window data-resampling scheme. We show this windowing scheme extends the anomaly detection capabilities of the TOF algorithm to resolve noise transients of arbitrary morphology and duration. We demonstrate the efficacy of this pipeline in detecting glitches during LIGO and Virgo's third observing run, and discuss potential applications.
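The core Temporal Outlier Factor idea can be sketched in a few lines. This is a simplified reading of TOF, not UniMAP itself: it assumes a time-delay embedding with dimension `m` and lag `tau`, brute-force Euclidean k-nearest neighbours, and a score equal to the q-mean of the absolute time offsets to those neighbours. States that the signal visits only once (such as a glitch) then have neighbours close in time and get small scores, while recurring states have neighbours far away in time. The multi-window resampling scheme is not implemented, and the function names, parameter defaults and toy signal are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Time-delay embedding: row t is (x[t], x[t+tau], ..., x[t+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return [[x[t + i * tau] for i in range(m)] for t in range(n)]


def temporal_outlier_factor(x: Sequence[float], k: int = 4, m: int = 3,
                            tau: int = 1, q: float = 2.0) -> List[float]:
    """TOF-style score per embedded time index (low score = unique event).

    For each embedded state, find its k nearest neighbours in embedding space
    (brute force, Euclidean, excluding the state itself) and return the q-mean
    of the absolute time offsets to those neighbours.
    """
    states = delay_embed(x, m, tau)
    n = len(states)
    if k >= n:
        raise ValueError("k must be smaller than the number of embedded states")
    scores = []
    for t, v in enumerate(states):
        # Squared distances to every other state; sorting puts the kNN first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(v, w)), j)
            for j, w in enumerate(states) if j != t
        )
        offsets = [abs(t - j) for _, j in dists[:k]]
        scores.append((sum(d ** q for d in offsets) / k) ** (1.0 / q))
    return scores


# Toy signal (hypothetical): a period-20 sine with a one-off offset at t = 100..119.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(100, 120):
    signal[t] += 3.0
scores = temporal_outlier_factor(signal, k=4, m=3, tau=1)
```

On this toy signal, windows well inside the one-off segment find all their neighbours within that segment and score lower than every window in the early periodic stretch, whose nearest states are exact repeats at least one period (20 samples) away.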