Researcher profile

James Allan

James Allan contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
3 topics
4 close collaborators

Published work

3 published items

Preprint | 2022 | arXiv

European Aerosol Phenomenology – 8: Harmonised Source Apportionment of Organic Aerosol using 22 Year-long ACSM/AMS Datasets

Organic aerosol (OA) is a key component of total submicron particulate matter (PM1), and comprehensive knowledge of OA sources across Europe is crucial for mitigating PM1 levels. Europe has a well-established air quality research infrastructure from which year-long datasets using 21 aerosol chemical speciation monitors (ACSMs) and 1 aerosol mass spectrometer (AMS) were gathered during 2013-2019. The datasets cover 9 non-urban and 13 urban sites. This study developed a state-of-the-art source apportionment protocol to analyse long-term OA mass spectrum data by applying the most advanced source apportionment strategies (i.e., rolling PMF, ME-2, and bootstrap). This harmonised protocol enables the quantification of the most common OA components such as hydrocarbon-like OA (HOA), biomass burning OA (BBOA), cooking-like OA (COA), more oxidised-oxygenated OA (MO-OOA), and less oxidised-oxygenated OA (LO-OOA). Other components such as coal combustion OA (CCOA), solid fuel OA (SFOA: mainly a mixture of coal and peat combustion), cigarette smoke OA (CSOA), sea salt (mostly inorganic but part of the OA mass spectrum), coffee OA, and ship industry OA could also be separated at a few specific sites. Oxygenated OA (OOA) components make up most of the submicron OA mass (average 71.1%, range 43.7-100%). Solid fuel combustion-related OA components (i.e., BBOA, CCOA, and SFOA) remain considerable, contributing 16.0% of OA in total over the year, with most of that contribution falling in the winter months (21.4%). Overall, this comprehensive protocol works effectively across all sites governed by different sources and generates robust and consistent source apportionment results. Our work presents a comprehensive overview of OA sources in Europe with a unique combination of high time resolution and long-term data coverage (9-36 months), providing essential information to improve and validate air quality, health impact, and climate models.

Preprint | 2021 | arXiv

CEQE: Contextualized Embeddings for Query Expansion

In this work we leverage recent advances in context-sensitive language models to improve the task of query expansion. Contextualized word representation models, such as ELMo and BERT, are rapidly replacing static embedding models. We propose a new model, Contextualized Embeddings for Query Expansion (CEQE), that utilizes query-focused contextualized embedding vectors. We study the behavior of contextual representations generated for query expansion in ad-hoc document retrieval. We conduct our experiments on probabilistic retrieval models as well as in combination with neural ranking models. We evaluate CEQE on two standard TREC collections: Robust and Deep Learning. We find that CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. We further find that multiple passes of expansion and reranking result in continued gains in effectiveness, with CEQE-based approaches outperforming other approaches. The final model, incorporating neural and CEQE-based expansion scores, achieves gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.

Preprint | 2020 | arXiv

A Study of Neural Matching Models for Cross-lingual IR

In this study, we investigate interaction-based neural matching models for ad-hoc cross-lingual information retrieval (CLIR) using cross-lingual word embeddings (CLWEs). With experiments conducted on the CLEF collection over four language pairs, we evaluate and provide insight into different neural model architectures, different ways to represent query-document interactions and word-pair similarity distributions in CLIR. This study paves the way for learning an end-to-end CLIR system using CLWEs.