Researcher profile

Adrian Muscat

Adrian Muscat contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Face2Text revisited: Improved data set and baseline results

Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description generator trained on it, which explores the feasibility of using transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines for future work in this area.

preprint2021arXiv

Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively-studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained and rely on attributes extracted from the image, rather than objects and relations. Given that no data exists for this task, we present an ongoing crowdsourcing study to collect a corpus of descriptions of face images taken `in the wild'. To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus. Primarily, we found descriptions to refer to a mixture of attributes, not only physical, but also emotional and inferential, which is bound to create further challenges for current image-to-text methods.

preprint2021arXiv

One vs Previous and Similar Classes Learning -- A Comparative Study

When dealing with multi-class classification problems, it is common practice to build a model consisting of a series of binary classifiers using a learning paradigm which dictates how the classifiers are built and combined to discriminate between the individual classes. As new data enters the system and the model needs updating, these models would often need to be retrained from scratch. This work proposes three learning paradigms which allow trained models to be updated without the need of retraining from scratch. A comparative analysis is performed to evaluate them against a baseline. Results show that the proposed paradigms are faster than the baseline at updating, with two of them being faster at training from scratch as well, especially on larger datasets, while retaining a comparable classification performance.

preprint2021arXiv

Predicting Relative Depth between Objects from Semantic Features

Vision and language tasks such as Visual Relation Detection and Visual Question Answering benefit from semantic features that afford proper grounding of language. The 3D depth of objects depicted in 2D images is one such feature. However it is very difficult to obtain accurate depth information without learning the appropriate features, which are scene dependent. The state of the art in this area are complex Neural Network models trained on stereo image data to predict depth per pixel. Fortunately, in some tasks, its only the relative depth between objects that is required. In this paper the extent to which semantic features can predict course relative depth is investigated. The problem is casted as a classification one and geometrical features based on object bounding boxes, object labels and scene attributes are computed and used as inputs to pattern recognition models to predict relative depth. i.e behind, in-front and neutral. The results are compared to those obtained from averaging the output of the monodepth neural network model, which represents the state-of-the art. An overall increase of 14% in relative depth accuracy over relative depth computed from the monodepth model derived results is achieved.

preprint2020arXiv

VJAGG -- A Thick-Client Smart-Phone Journey Detection Algorithm

In this paper we describe $Vja\dot{g}\dot{g}$, a battery-aware journey detection algorithm that executes on the mobile device. The algorithm can be embedded in the client app of the transport service provider or in a general purpose mobility data collector. The thick client setup allows the customer/participant to select which journeys are transferred to the server, keeping customers in control of their personal data and encouraging user uptake. The algorithm is tested in the field and optimised for both accuracy in registering complete journeys and battery power consumption. Typically the algorithm can run for a full day without the need of recharging and more than 88% of journeys are correctly detected from origin to destination, whilst 12% would be missing part of the journey.