Source author record

Alan F. Smeaton

Alan F. Smeaton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Multimedia Human-Computer Interaction Artificial Intelligence Machine Learning eess.IV Computational Complexity eess.SP Computation and Language Information Retrieval

Catalog footprint

What is connected

20works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

An Analysis of Conversational Volatility During Telecollaboration Sessions for Second Language Learning

Tandem telecollaboration is a pedagogy used in second language learning where mixed groups of students meet online in videoconferencing sessions to practice their conversational skills in their target language. We have built and deployed a system called L2 Learning to support post-session review and self-reflection on students participation in such meetings. We automatically compute a metric called Conversational Volatility which quantifies the amount of interaction among participants, indicating how dynamic or flat the conversations were. Our analysis on more than 100 hours of video recordings involving 28 of our students indicates that conversations do not get more dynamic as meetings progress, that there is a wide variety of levels of interaction across students and student groups, and the speaking in French appears to have more animated conversations than speaking in English, though the reasons for that are not clear.

preprint2022arXiv

Playback-centric visualisations of video usage using weighted interactions to guide where to watch in an educational context

The increase in use of online educational tools has led to a large amount of educational video materials made available for students. Finding the right video content is usually supported by the overarching learning management system and its interface that organises video items by course, categories and weeks, and makes them searchable. However, once a video is found, students are left without further guidance as to what parts in that video they should focus on. In this article, an additional timeline visualisation to augment the conventional playback timeline is introduced which employs a novel playback weighting strategy in which the history of different video interactions generate scores based on the context of each playback. The resultant scores are presented on the additional timeline, making it in effect a playback-centric usage graph nuanced by how each playback was executed. Students can selectively watch those portions which the contour of the usage visualisation suggests. The visualisation was implemented and deployed in an undergraduate course at a university for two full semesters. 270 students used the system throughout both semesters watching 52 videos, guided by visualisations on what to watch. Analysis of playback logs revealed students selectively watched corresponding to the most important portions of the videos as assessed by the instructor who created the videos. The characteristics of this as a way of guiding students as to where to watch as well as a complementary tool for playback analysis, are discussed. Further insights into the potential values of this visualisation and its underlying playback weighting strategy are also discussed.

preprint2021arXiv

Attention Based Video Summaries of Live Online Zoom Classes

This paper describes a system developed to help University students get more from their online lectures, tutorials, laboratory and other live sessions. We do this by logging their attention levels on their laptops during live Zoom sessions and providing them with personalised video summaries of those live sessions. Using facial attention analysis software we create personalised video summaries composed of just the parts where a student's attention was below some threshold. We can also factor in other criteria into video summary generation such as parts where the student was not paying attention while others in the class were, and parts of the video that other students have replayed extensively which a given student has not. Attention and usage based video summaries of live classes are a form of personalised content, they are educational video segments recommended to highlight important parts of live sessions, useful in both topic understanding and in exam preparation. The system also allows a Professor to review the aggregated attention levels of those in a class who attended a live session and logged their attention levels. This allows her to see which parts of the live activity students were paying most, and least, attention to. The Help-Me-Watch system is deployed and in use at our University in a way that protects student's personal data, operating in a GDPR-compliant way.

preprint2021arXiv

Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding

Neural networks can be successfully used to improve several modules of advanced video coding schemes. In particular, compression of colour components was shown to greatly benefit from usage of machine learning models, thanks to the design of appropriate attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to be complex and computationally intense, and may be difficult to deploy in a practical video coding pipeline. This work focuses on reducing the complexity of such methodologies, to design a set of simplified and cost-effective attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. The resulting simplified architecture is still capable of outperforming state-of-the-art methods. Moreover, a collection of simplifications is presented in this paper, to further reduce the complexity overhead of the proposed prediction architecture. Thanks to these simplifications, a reduction in the number of parameters of around 90% is achieved with respect to the original attention-based methodologies. Simplifications include a framework for reducing the overhead of the convolutional operations, a simplified cross-component processing model integrated into the original architecture, and a methodology to perform integer-precision approximations with the aim to obtain fast and hardware-aware implementations. The proposed schemes are integrated into the Versatile Video Coding (VVC) prediction pipeline, retaining compression efficiency of state-of-the-art chroma intra-prediction methods based on neural networks, while offering different directions for significantly reducing coding complexity.

preprint2021arXiv

Image Segmentation to Identify Safe Landing Zones for Unmanned Aerial Vehicles

There is a marked increase in delivery services in urban areas, and with Jeff Bezos claiming that 86% of the orders that Amazon ships weigh less than 5 lbs, the time is ripe for investigation into economical methods of automating the final stage of the delivery process. With the advent of semi-autonomous drone delivery services, such as Irish startup `Manna', and Malta's `Skymax', the final step of the delivery journey remains the most difficult to automate. This paper investigates the use of simple images captured by a single RGB camera on a UAV to distinguish between safe and unsafe landing zones. We investigate semantic image segmentation frameworks as a way to identify safe landing zones and demonstrate the accuracy of lightweight models that minimise the number of sensors needed. By working with images rather than video we reduce the amount of energy needed to identify safe landing zones for a drone, without the need for human intervention.

preprint2021arXiv

Using a GAN to Generate Adversarial Examples to Facial Image Recognition

Images posted online present a privacy concern in that they may be used as reference examples for a facial recognition system. Such abuse of images is in violation of privacy rights but is difficult to counter. It is well established that adversarial example images can be created for recognition systems which are based on deep neural networks. These adversarial examples can be used to disrupt the utility of the images as reference examples or training data. In this work we use a Generative Adversarial Network (GAN) to create adversarial examples to deceive facial recognition and we achieve an acceptable success rate in fooling the face recognition. Our results reduce the training time for the GAN by removing the discriminator component. Furthermore, our results show knowledge distillation can be employed to drastically reduce the size of the resulting model without impacting performance indicating that our contribution could run comfortably on a smartphone

preprint2020arXiv

A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often require large sample sizes for evaluation and do not directly reflect human perception of image quality. In this work, we introduce an evaluation metric called Neuroscore, for evaluating the performance of GANs, that more directly reflects psychoperceptual image quality through the utilization of brain signals. Our results show that Neuroscore has superior performance to the current evaluation metrics in that: (1) It is more consistent with human judgment; (2) The evaluation process needs much smaller numbers of samples; and (3) It is able to rank the quality of images on a per GAN basis. A convolutional neural network (CNN) based neuro-AI interface is proposed to predict Neuroscore from GAN-generated images directly without the need for neural responses. Importantly, we show that including neural responses during the training phase of the network can significantly improve the prediction capability of the proposed model. Codes and data can be referred at this link: https://github.com/villawang/Neuro-AI-Interface.

preprint2020arXiv

Chroma Intra Prediction with attention-based CNN architectures

Neural networks can be used in video coding to improve chroma intra-prediction. In particular, usage of fully-connected networks has enabled better cross-component prediction with respect to traditional linear models. Nonetheless, state-of-the-art architectures tend to disregard the location of individual reference samples in the prediction process. This paper proposes a new neural network architecture for cross-component intra-prediction. The network uses a novel attention module to model spatial relations between reference and predicted samples. The proposed approach is integrated into the Versatile Video Coding (VVC) prediction pipeline. Experimental results demonstrate compression gains over the latest VVC anchor compared with state-of-the-art chroma intra-prediction methods based on neural networks.

preprint2020arXiv

Investigating Memorability of Dynamic Media

The Predicting Media Memorability task in MediaEval'20 has some challenging aspects compared to previous years. In this paper we identify the high-dynamic content in videos and dataset of limited size as the core challenges for the task, we propose directions to overcome some of these challenges and we present our initial result in these directions.

preprint2020arXiv

Keystroke Dynamics as Part of Lifelogging

In this paper we present the case for including keystroke dynamics in lifelogging. We describe how we have used a simple keystroke logging application called Loggerman, to create a dataset of longitudinal keystroke timing data spanning a period of more than 6 months for 4 participants. We perform a detailed analysis of this data by examining the timing information associated with bigrams or pairs of adjacently-typed alphabetic characters. We show how there is very little day-on-day variation of the keystroke timing among the top-200 bigrams for some participants and for others there is a lot and this correlates with the amount of typing each would do on a daily basis. We explore how daily variations could correlate with sleep score from the previous night but find no significant relation-ship between the two. Finally we describe the public release of this data as well including as a series of pointers for future work including correlating keystroke dynamics with mood and fatigue during the day.

preprint2020arXiv

Leveraging Audio Gestalt to Predict Media Memorability

Memorability determines what evanesces into emptiness, and what worms its way into the deepest furrows of our minds. It is the key to curating more meaningful media content as we wade through daily digital torrents. The Predicting Media Memorability task in MediaEval 2020 aims to address the question of media memorability by setting the task of automatically predicting video memorability. Our approach is a multimodal deep learning-based late fusion that combines visual, semantic, and auditory features. We used audio gestalt to estimate the influence of the audio modality on overall video memorability, and accordingly inform which combination of features would best predict a given video's memorability scores.

preprint2020arXiv

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

Multi-word expressions (MWEs) are a hot topic in research in natural language processing (NLP), including topics such as MWE detection, MWE decomposition, and research investigating the exploitation of MWEs in other NLP fields such as Machine Translation. However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpora that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU Project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. Our collections are 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively after filtering. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performances on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora which are available online. Researchers can use this free corpus for their own models or use them in a knowledge base as model features.

preprint2020arXiv

Optimised Convolutional Neural Networks for Heart Rate Estimation and Human Activity Recognition in Wrist Worn Sensing Applications

Wrist-worn smart devices are providing increased insights into human health, behaviour and performance through sophisticated analytics. However, battery life, device cost and sensor performance in the face of movement-related artefact present challenges which must be further addressed to see effective applications and wider adoption through commoditisation of the technology. We address these challenges by demonstrating, through using a simple optical measurement, photoplethysmography (PPG) used conventionally for heart rate detection in wrist-worn sensors, that we can provide improved heart rate and human activity recognition (HAR) simultaneously at low sample rates, without an inertial measurement unit. This simplifies hardware design and reduces costs and power budgets. We apply two deep learning pipelines, one for human activity recognition and one for heart rate estimation. HAR is achieved through the application of a visual classification approach, capable of robust performance at low sample rates. Here, transfer learning is leveraged to retrain a convolutional neural network (CNN) to distinguish characteristics of the PPG during different human activities. For heart rate estimation we use a CNN adopted for regression which maps noisy optical signals to heart rate estimates. In both cases, comparisons are made with leading conventional approaches. Our results demonstrate a low sampling frequency can achieve good performance without significant degradation of accuracy. 5 Hz and 10 Hz were shown to have 80.2% and 83.0% classification accuracy for HAR respectively. These same sampling frequencies also yielded a robust heart rate estimation which was comparative with that achieved at the more energy-intensive rate of 256 Hz.

preprint2020arXiv

Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?

This paper describes the MediaEval 2020 \textit{Predicting Media Memorability} task. After first being proposed at MediaEval 2018, the Predicting Media Memorability task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video-to-Text dataset, containing more action rich video content as compared with the 2019 task. In this paper a description of some aspects of this task is provided, including its main characteristics, a description of the collection, the ground truth dataset, evaluation metrics and the requirements for participants' run submissions.

preprint2020arXiv

Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. Arguably the most striking results have been in the area of image synthesis. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often require large sample sizes for evaluation and do not directly reflect human perception of image quality. In this work, we describe an evaluation metric we call Neuroscore, for evaluating the performance of GANs, that more directly reflects psychoperceptual image quality through the utilization of brain signals. Our results show that Neuroscore has superior performance to the current evaluation metrics in that: (1) It is more consistent with human judgment; (2) The evaluation process needs much smaller numbers of samples; and (3) It is able to rank the quality of images on a per GAN basis. A convolutional neural network (CNN) based neuro-AI interface is proposed to predict Neuroscore from GAN-generated images directly without the need for neural responses. Importantly, we show that including neural responses during the training phase of the network can significantly improve the prediction capability of the proposed model. Materials related to this work are provided at https://github.com/villawang/Neuro-AI-Interface.

preprint2020arXiv

Tracking Skin Colour and Wrinkle Changes During Cosmetic Product Trials Using Smartphone Images

Background: To explore how the efficacy of product trials for skin cosmetics can be improved through the use of consumer-level images taken by volunteers using a conventional smartphone. Materials and Methods: 12 women aged 30 to 60 years participated in a product trial and had close-up images of the cheek and temple regions of their faces taken with a high-resolution Antera 3D CS camera at the start and end of a 4-week period. Additionally, they each had ``selfies'' of the same regions of their faces taken regularly throughout the trial period. Automatic image analysis to identify changes in skin colour used three kinds of colour normalisation and analysis for wrinkle composition identified edges and calculated their magnitude. Results: Images taken at the start and end of the trial acted as baseline ground truth for normalisation of smartphone images and showed large changes in both colour and wrinkle magnitude during the trial for many volunteers. Conclusions: Results demonstrate that regular use of selfie smartphone images within trial periods can add value to interpretation of the efficacy of the trial.

preprint2020arXiv

TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval

The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks: 1. Ad-hoc Video Search (AVS) 2. Instance Search (INS) 3. Activities in Extended Video (ActEV) 4. Video to Text Description (VTT) This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.

preprint2020arXiv

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a \emph{subjectness} attention or saliency map and an \emph{objectness} attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects as and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of $\approx$8\% in object detection which effectively increased tracking performance by $\approx$4\% on KITTI dataset.

preprint2015arXiv

Exploring EEG for Object Detection and Retrieval

This paper explores the potential for using Brain Computer Interfaces (BCI) as a relevance feedback mechanism in content-based image retrieval. We investigate if it is possible to capture useful EEG signals to detect if relevant objects are present in a dataset of realistic and complex images. We perform several experiments using a rapid serial visual presentation (RSVP) of images at different rates (5Hz and 10Hz) on 8 users with different degrees of familiarization with BCI and the dataset. We then use the feedback from the BCI and mouse-based interfaces to retrieve localized objects in a subset of TRECVid images. We show that it is indeed possible to detect such objects in complex images and, also, that users with previous knowledge on the dataset or experience with the RSVP outperform others. When the users have limited time to annotate the images (100 seconds in our experiments) both interfaces are comparable in performance. Comparing our best users in a retrieval task, we found that EEG-based relevance feedback outperforms mouse-based feedback. The realistic and complex image dataset differentiates our work from previous studies on EEG for image retrieval.

preprint2014arXiv

Object Segmentation in Images using EEG Signals

This paper explores the potential of brain-computer interfaces in segmenting objects from images. Our approach is centered around designing an effective method for displaying the image parts to the users such that they generate measurable brain reactions. When an image region, specifically a block of pixels, is displayed we estimate the probability of the block containing the object of interest using a score based on EEG activity. After several such blocks are displayed, the resulting probability map is binarized and combined with the GrabCut algorithm to segment the image into object and background regions. This study shows that BCI and simple EEG analysis are useful in locating object boundaries in images.

Alan F. Smeaton

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

An Analysis of Conversational Volatility During Telecollaboration Sessions for Second Language Learning

Playback-centric visualisations of video usage using weighted interactions to guide where to watch in an educational context

Attention Based Video Summaries of Live Online Zoom Classes

Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding

Image Segmentation to Identify Safe Landing Zones for Unmanned Aerial Vehicles

Using a GAN to Generate Adversarial Examples to Facial Image Recognition

A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Chroma Intra Prediction with attention-based CNN architectures

Investigating Memorability of Dynamic Media

Keystroke Dynamics as Part of Lifelogging

Leveraging Audio Gestalt to Predict Media Memorability

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

Optimised Convolutional Neural Networks for Heart Rate Estimation and Human Activity Recognition in Wrist Worn Sensing Applications

Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?

Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Tracking Skin Colour and Wrinkle Changes During Cosmetic Product Trials Using Smartphone Images

TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval

Utilising Visual Attention Cues for Vehicle Detection and Tracking

Exploring EEG for Object Detection and Retrieval

Object Segmentation in Images using EEG Signals