Source author record

Jordan J. Bird

Jordan J. Bird appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Machine Learning, Artificial Intelligence, eess.AS, Sound, Computation and Language, Graphics, Human-Computer Interaction, Robotics

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators


preprint · 2022 · arXiv

Continuation of Famous Art with AI: A Conditional Adversarial Network Inpainting Approach

Much of the state of the art in image synthesis inspired by real artwork is either generated entirely from filtered random noise or based on style transfer. This work explores the application of image inpainting to continue famous artworks and produce generative art with a Conditional GAN. During the training stage, the borders of images are cropped, leaving only the centre. An inpainting GAN is then tasked with learning to reconstruct the original image from the centre crop by minimising both adversarial and absolute difference losses. Results are analysed through both Fréchet Inception Distances and manual observations, which are presented. Once the network is trained, images are resized rather than cropped and presented as input to the generator, which then creates new images by continuing from the edges of the original piece. Three experiments are performed with datasets of 4766 landscape paintings (impressionism and romanticism), 1167 Ukiyo-e works from the Japanese Edo period, and 4968 abstract artworks. Results show that geometry and texture (including canvas and paint) as well as scenery such as sky, clouds, water, land (including hills and mountains), grass, and flowers are produced by the generator when extending real artworks. In the Ukiyo-e experiments, features such as written text were generated even where the original image did not have any, due to the presence of an unpainted border within the input image.
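As a rough illustration of the training setup this abstract describes, the sketch below blanks out the border of an image so that only the centre remains, then evaluates the absolute-difference (L1) term of the loss. The border width, the fill value of 0, the nested-list image representation and the function names are all assumptions made for illustration. The adversarial term and the GAN itself are omitted.

```python
def mask_border(image, border):
    """Return a copy of `image` (a list of rows) with a `border`-pixel frame set to 0.

    Assumption: the abstract only says the borders are cropped, leaving the centre.
    The border width and the zero fill value are illustrative choices.
    """
    height, width = len(image), len(image[0])
    if 2 * border >= min(height, width):
        raise ValueError("border leaves no centre region")
    masked = []
    for r, row in enumerate(image):
        new_row = []
        for c, value in enumerate(row):
            inside = border <= r < height - border and border <= c < width - border
            new_row.append(value if inside else 0.0)
        masked.append(new_row)
    return masked


def mean_absolute_difference(predicted, target):
    """Mean absolute (L1) difference between two equally sized images."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


# Tiny hypothetical 4x4 "painting" with pixel values in [0, 1].
original = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 0.9, 0.8],
    [0.7, 0.6, 0.5, 0.4],
]
generator_input = mask_border(original, border=1)

# A trained generator would output a reconstruction. Here the masked input
# stands in for an untrained one so that the L1 term can be evaluated.
l1_term = mean_absolute_difference(generator_input, original)
```

In a full training loop this L1 term would be combined with the adversarial loss. How the two are weighted is not stated in the abstract.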

preprint · 2022 · arXiv

Improving Customer Service Chatbots with Attention-based Transfer Learning

With growing societal acceptance and increasing cost efficiency due to mass production, service robots are beginning to cross from the industrial to the social domain. Currently, customer service robots tend to be digital and emulate social interactions through on-screen text, but state-of-the-art research points towards physical robots soon providing customer service in person. This article explores two possibilities: first, whether transfer learning between business domains can improve customer service chatbots; second, the implementation of a framework for physical robots for in-person interaction. Modelled on social interaction with Twitter customer support accounts, transformer-based chatbot models are initially assigned to learn one domain from an initial random weight distribution. Given shared vocabulary, each model is then tasked with learning another domain by transferring knowledge from the previous one. Following studies on 19 different businesses, results show that the majority of models are improved when transferring weights from at least one other domain, in particular those that are more data-scarce than others. General language transfer learning occurs, as well as higher-level transfer of similar domain knowledge, in several cases. The chatbots are finally implemented on Temi and Pepper robots; feasibility issues are encountered and solutions are proposed to overcome them.

preprint · 2022 · arXiv

Reducing Overconfidence Predictions for Autonomous Driving Perception

In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can harm the decision-making of 'critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated from the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDAR (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be applied to existing trained networks; that is, the approach works from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase, since the ML and MAP functions are used in the test/prediction phase.
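The general idea of ML and MAP prediction over logit scores can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes each class's logit score is modelled by a Gaussian fitted to scores collected from a pre-trained network, and the abstract does not specify the form of the distributions. The function names, the example scores and the priors are all hypothetical.

```python
import math
from statistics import mean, stdev


def fit_gaussian(scores):
    """Estimate the mean and sample standard deviation of one class's logit scores."""
    return mean(scores), stdev(scores)


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ml_and_map(logits, params, priors):
    """Return (ML, MAP) normalised scores for one logit vector.

    Assumption: class c is scored by the density of its own logit under a
    Gaussian fitted beforehand. ML normalises the likelihoods; MAP weights
    them by class priors before normalising. No retraining is involved.
    """
    likelihoods = [gaussian_pdf(x, mu, sigma) for x, (mu, sigma) in zip(logits, params)]
    ml_total = sum(likelihoods)
    ml_scores = [v / ml_total for v in likelihoods]
    weighted = [v * p for v, p in zip(likelihoods, priors)]
    map_total = sum(weighted)
    map_scores = [v / map_total for v in weighted]
    return ml_scores, map_scores


# Hypothetical logit scores gathered per class from a pre-trained network.
params = [
    fit_gaussian([4.0, 5.0, 6.0]),  # class 0, e.g. "car"
    fit_gaussian([2.0, 3.0, 4.0]),  # class 1, e.g. "pedestrian"
]
priors = [0.8, 0.2]  # hypothetical class priors

# Each logit sits exactly at its own class mean, so the likelihoods tie
# and the MAP scores reduce to the priors.
ml_scores, map_scores = ml_and_map([5.0, 3.0], params, priors)
```

With uniform priors the two score vectors coincide, which is why MAP is only informative when class frequencies are known or can be estimated.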

preprint · 2022 · arXiv

Robotic and Generative Adversarial Attacks in Offline Writer-independent Signature Verification

This study explores how robots and generative approaches can be used to mount successful false-acceptance adversarial attacks on signature verification systems. Initially, a convolutional neural network topology and data augmentation strategy are explored and tuned, producing an 87.12% accurate model for the verification of 2,640 human signatures. Two robots are then tasked with forging 50 signatures, where 25 are used for the verification attack and the remaining 25 are used for tuning the model to defend against them. Adversarial attacks on the system show that there exists an information security risk; the Line-us robotic arm can fool the system 24% of the time and the iDraw 2.0 robot 32% of the time. A conditional GAN finds similar success, with around 30% of forged signatures misclassified as genuine. Following fine-tuning transfer learning on robotic and generative data, adversarial attacks are reduced below the model threshold by both robots and the GAN. It is observed that tuning the model reduces the risk of attack by robots to 8% and 12%, and that conditional generative adversarial attacks can be reduced to 4% when 25 images are presented and 5% when 1000 images are presented.

preprint · 2022 · arXiv

Statistical and Spatio-temporal Hand Gesture Features for Sign Language Recognition using the Leap Motion Sensor

In modern society, people should not be identified based on their disability; rather, it is environments that can disable people with impairments. Improvements to automatic Sign Language Recognition (SLR) will lead to more enabling environments via digital technology. Many state-of-the-art approaches to SLR focus on the classification of static hand gestures, but communication is a temporal activity, which is reflected by the many dynamic gestures present. Despite this, temporal information during the delivery of a gesture is not often considered within SLR. The experiments in this work consider the problem of SL gesture recognition with regard to how dynamic gestures change during their delivery, and this study aims to explore how single types of features as well as mixed features affect the classification ability of a machine learning model. 18 common gestures recorded via a Leap Motion Controller sensor provide a complex classification problem. Two sets of features are extracted from a 0.6 second time window: statistical descriptors and spatio-temporal attributes. Features from each set are compared by their ANOVA F-Scores and p-values, arranged into bins grown by 10 features per step to a limit of the 250 highest-ranked features. Results show that the best statistical model selected 240 features and scored 85.96% accuracy, the best spatio-temporal model selected 230 features and scored 80.98%, and the best mixed-feature model selected 240 features from each set, leading to a classification accuracy of 86.75%. When all three sets of results are compared (146 individual machine learning models), the overall distribution shows that the minimum results are increased when inputs are any number of mixed features compared to any number of either of the two single sets of features.
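The feature-selection procedure in this abstract (rank by ANOVA F-score, then grow the selected set by 10 features per step up to a limit) can be sketched as follows. This is a minimal sketch in plain Python: the function names and the toy data are hypothetical, p-values are omitted because they require the F distribution, and dropping a final partial step is an assumption.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_total = len(values)
    n_groups = len(groups)
    grand_mean = sum(values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)
    if ss_within == 0.0:
        return float("inf")  # classes perfectly separated by this feature
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))


def rank_features(columns, labels):
    """Return feature indices ordered from highest to lowest F-score."""
    scores = [anova_f_score(column, labels) for column in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


def growing_subsets(ranked, step=10, limit=250):
    """Yield the top 10, 20, 30, ... ranked features up to `limit`.

    Assumption: a final partial step (fewer than `step` new features) is dropped.
    """
    top = min(limit, len(ranked))
    for size in range(step, top + 1, step):
        yield ranked[:size]


# Toy data: six samples, two classes, one noisy and one informative feature.
labels = [0, 0, 0, 1, 1, 1]
noisy = [1.0, 5.0, 3.0, 2.0, 4.0, 6.0]
informative = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ranked = rank_features([noisy, informative], labels)

# One candidate feature set per step; each would be used to train one model.
subset_sizes = [len(s) for s in growing_subsets(list(range(35)), step=10, limit=30)]
```

Each yielded subset would then be used to train and score one model, so that accuracy can be compared across subset sizes and feature types.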

preprint · 2020 · arXiv

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We first extract video frames and accompanying audio at one-second intervals. The image and audio datasets are then classified independently, using a fine-tuned VGG16 and an evolutionary optimised deep neural network, with accuracies of 89.27% and 93.72%, respectively. This is followed by late fusion of the two neural networks to enable a higher order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips. The tertiary neural network implemented for late fusion outperforms classical state-of-the-art classifiers by around 3% when the two primary networks are considered as feature generators. We show that situations where a single modality may be confused by anomalous data points are now corrected through an emerging higher order integration. Prominent examples include a water feature in a city misclassified as a river by the audio classifier alone and a densely crowded street misclassified as a forest by the image classifier alone. Both examples are correctly classified by our multi-modality approach.
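The data flow of late fusion can be illustrated with a toy example in which the outputs of two single-modality classifiers are concatenated and passed to a third layer. Everything here is hypothetical: the three class names, the probability vectors, and the hand-picked weights that stand in for a trained tertiary network. Which layer of each primary network feeds the fusion network is not stated in the abstract, so the use of class probabilities as features is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def late_fusion(image_probs, audio_probs, weights, biases):
    """Concatenate both modality outputs and apply one dense layer plus softmax."""
    features = list(image_probs) + list(audio_probs)
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


classes = ["city", "river", "forest"]

# Hypothetical outputs for a city scene containing a water feature:
# the image model sees a city, the audio model hears running water.
image_probs = [0.90, 0.05, 0.05]
audio_probs = [0.40, 0.55, 0.05]

# Hand-picked weights standing in for a trained fusion layer: each output
# class sums the evidence for that class from both modalities.
weights = [
    [2.0, 0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0, 2.0],
]
biases = [0.0, 0.0, 0.0]

fused = late_fusion(image_probs, audio_probs, weights, biases)
audio_only = classes[audio_probs.index(max(audio_probs))]
fused_prediction = classes[fused.index(max(fused))]
```

In this toy case the audio model alone picks "river", while the fused output picks "city" because the image evidence outweighs it. A trained fusion network would learn such weights from data rather than have them set by hand.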

preprint · 2020 · arXiv

LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity

In speech recognition problems, data scarcity often poses an issue due to the limited willingness of humans to provide large amounts of data for learning and classification. In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes. Using character-level LSTMs (supervised learning) and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated by learning from the data provided on a per-subject basis. A neural network is trained to classify the data against a large dataset of Flickr8k speakers and is then compared to a transfer learning network performing the same task but with an initial weight distribution dictated by learning from the synthetic data generated by the two models. For all 7 subjects, the best results came from networks that had been exposed to synthetic data; the model pre-trained with LSTM-produced data achieved the best result 3 times and the GPT-2 equivalent 5 times (one subject's best result was a draw between both models). Through these results, we argue that speaker classification can be improved by utilising a small amount of user data together with exposure to synthetically generated MFCCs, which then allow the networks to achieve near maximum classification scores.
