Source author record

James Allan

James Allan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Retrieval Computation and Language physics.ao-ph

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

European Aerosol Phenomenology -- 8: Harmonised Source Apportionment of Organic Aerosol using 22 Year-long ACSM/AMS Datasets

Organic aerosol (OA) is a key component to total submicron particulate matter (PM1), and comprehensive knowledge of OA sources across Europe is crucial to mitigate PM1 levels. Europe has a well-established air quality research infrastructure from which yearlong datasets using 21 aerosol chemical speciation monitors (ACSMs) and 1 aerosol mass spectrometer (AMS) were gathered during 2013-2019. It includes 9 non-urban and 13 urban sites. This study developed a state-of-the-art source apportionment protocol to analyse long-term OA mass spectrum data by applying the most advanced source apportionment strategies (i.e., rolling PMF, ME-2, and bootstrap). This harmonised protocol enables the quantifications of the most common OA components such as hydrocarbon-like OA (HOA), biomass burning OA (BBOA), cooking-like OA (COA), more oxidised-oxygenated OA (MO-OOA), and less oxidised-oxygenated OA (LO-OOA). Other components such as coal combustion OA (CCOA), solid fuel OA (SFOA: mainly mixture of coal and peat combustion), cigarette smoke OA (CSOA), sea salt (mostly inorganic but part of the OA mass spectrum), coffee OA, and ship industry OA could also be separated at a few specific sites. Oxygenated OA (OOA) components make up most of the submicron OA mass (average = 71.1%, a range of 43.7-100%). Solid fuel combustion-related OA components (i.e., BBOA, CCOA, and SFOA) are still considerable with in total 16.0% yearly contribution to the OA, yet mainly during winter months (21.4%). Overall, this comprehensive protocol works effectively across all sites governed by different sources and generates robust and consistent source apportionment results. Our work presents a comprehensive overview of OA sources in Europe with a unique combination of high time resolution and long-term data coverage (9-36 months), providing essential information to improve/validate air quality, health impact, and climate models.

preprint2021arXiv

CEQE: Contextualized Embeddings for Query Expansion

In this work we leverage recent advances in context-sensitive language models to improve the task of query expansion. Contextualized word representation models, such as ELMo and BERT, are rapidly replacing static embedding models. We propose a new model, Contextualized Embeddings for Query Expansion (CEQE), that utilizes query-focused contextualized embedding vectors. We study the behavior of contextual representations generated for query expansion in ad-hoc document retrieval. We conduct our experiments on probabilistic retrieval models as well as in combination with neural ranking models. We evaluate CEQE on two standard TREC collections: Robust and Deep Learning. We find that CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning on average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. We further find that multiple passes of expansion and reranking result in continued gains in effectiveness with CEQE-based approaches outperforming other approaches. The final model incorporating neural and CEQE-based expansion score achieves gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.

preprint2020arXiv

A Study of Neural Matching Models for Cross-lingual IR

In this study, we investigate interaction-based neural matching models for ad-hoc cross-lingual information retrieval (CLIR) using cross-lingual word embeddings (CLWEs). With experiments conducted on the CLEF collection over four language pairs, we evaluate and provide insight into different neural model architectures, different ways to represent query-document interactions and word-pair similarity distributions in CLIR. This study paves the way for learning an end-to-end CLIR system using CLWEs.