Source author record

Scott Barnett

Scott Barnett appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Artificial Intelligence cond-mat.mtrl-sci cs.CY Information Retrieval Machine Learning physics.app-ph

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset

Document-based Question-Answering (QA) tasks are crucial for precise information retrieval. While some existing work focus on evaluating large language models performance on retrieving and answering questions from documents, assessing the LLMs performance on QA types that require exact answer selection from predefined options and numerical extraction is yet to be fully assessed. In this paper, we specifically focus on this underexplored context and conduct empirical analysis of LLMs (GPT-4 and GPT-3.5) on question types, including single-choice, yes-no, multiple-choice, and number extraction questions from documents in zero-shot setting. We use the CogTale dataset for evaluation, which provide human expert-tagged responses, offering a robust benchmark for precision and factual grounding. We found that LLMs, particularly GPT-4, can precisely answer many single-choice and yes-no questions given relevant context, demonstrating their efficacy in information retrieval tasks. However, their performance diminishes when confronted with multiple-choice and number extraction formats, lowering the overall performance of the model on this task, indicating that these models may not yet be sufficiently reliable for the task. This limits the applications of LLMs on applications demanding precise information extraction from documents, such as meta-analysis tasks. These findings hinge on the assumption that the retrievers furnish pertinent context necessary for accurate responses, emphasizing the need for further research. Our work offers a framework for ongoing dataset evaluation, ensuring that LLM applications for information retrieval and document analysis continue to meet evolving standards.

preprint2022arXiv

MLSmellHound: A Context-Aware Code Analysis Tool

Meeting the rise of industry demand to incorporate machine learning (ML) components into software systems requires interdisciplinary teams contributing to a shared code base. To maintain consistency, reduce defects and ensure maintainability, developers use code analysis tools to aid them in identifying defects and maintaining standards. With the inclusion of machine learning, tools must account for the cultural differences within the teams which manifests as multiple programming languages, and conflicting definitions and objectives. Existing tools fail to identify these cultural differences and are geared towards software engineering which reduces their adoption in ML projects. In our approach we attempt to resolve this problem by exploring the use of context which includes i) purpose of the source code, ii) technical domain, iii) problem domain, iv) team norms, v) operational environment, and vi) development lifecycle stage to provide contextualised error reporting for code analysis. To demonstrate our approach, we adapt Pylint as an example and apply a set of contextual transformations to the linting results based on the domain of individual project files under analysis. This allows for contextualised and meaningful error reporting for the end-user.

preprint2020arXiv

A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects. Aims: This study investigates the extent to which Data Science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity. Results: Data Science projects suffer from a significantly higher rate of functions that use an excessive numbers of parameters and local variables. Data Science projects also follow different variable naming conventions to non-Data Science projects. Conclusions: The differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. Our conjecture is that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects.

preprint2020arXiv

Beware the evolving 'intelligent' web service! An integration architecture tactic to guard AI-first components

Intelligent services provide the power of AI to developers via simple RESTful API endpoints, abstracting away many complexities of machine learning. However, most of these intelligent services-such as computer vision-continually learn with time. When the internals within the abstracted 'black box' become hidden and evolve, pitfalls emerge in the robustness of applications that depend on these evolving services. Without adapting the way developers plan and construct projects reliant on intelligent services, significant gaps and risks result in both project planning and development. Therefore, how can software engineers best mitigate software evolution risk moving forward, thereby ensuring that their own applications maintain quality? Our proposal is an architectural tactic designed to improve intelligent service-dependent software robustness. The tactic involves creating an application-specific benchmark dataset baselined against an intelligent service, enabling evolutionary behaviour changes to be mitigated. A technical evaluation of our implementation of this architecture demonstrates how the tactic can identify 1,054 cases of substantial confidence evolution and 2,461 cases of substantial changes to response label sets using a dataset consisting of 331 images that evolve when sent to a service.

preprint2020arXiv

Interpreting Cloud Computer Vision Pain-Points: A Mining Study of Stack Overflow

Intelligent services are becoming increasingly more pervasive; application developers want to leverage the latest advances in areas such as computer vision to provide new services and products to users, and large technology firms enable this via RESTful APIs. While such APIs promise an easy-to-integrate on-demand machine intelligence, their current design, documentation and developer interface hides much of the underlying machine learning techniques that power them. Such APIs look and feel like conventional APIs but abstract away data-driven probabilistic behaviour - the implications of a developer treating these APIs in the same way as other, traditional cloud services, such as cloud storage, is of concern. The objective of this study is to determine the various pain-points developers face when implementing systems that rely on the most mature of these intelligent services, specifically those that provide computer vision. We use Stack Overflow to mine indications of the frustrations that developers appear to face when using computer vision services, classifying their questions against two recent classification taxonomies (documentation-related and general questions). We find that, unlike mature fields like mobile development, there is a contrast in the types of questions asked by developers. These indicate a shallow understanding of the underlying technology that empower such systems. We discuss several implications of these findings via the lens of learning taxonomies to suggest how the software engineering community can improve these services and comment on the nature by which developers use them.

preprint2020arXiv

Ranking Computer Vision Service Issues using Emotion

Software developers are increasingly using machine learning APIs to implement 'intelligent' features. Studies show that incorporating machine learning into an application increases technical debt, creates data dependencies, and introduces uncertainty due to non-deterministic behaviour. However, we know very little about the emotional state of software developers who deal with such issues. In this paper, we do a landscape analysis of emotion found in 1,245 Stack Overflow posts about computer vision APIs. We investigate the application of an existing emotion classifier EmoTxt and manually verify our results. We found that the emotion profile varies for different question categories.

preprint2020arXiv

The oxygen partial pressure in solid oxide electrolysis cells with two layer electrolytes

A number of degradation mechanisms have been observed during the long-term operation of solid oxide electrolysis cells (SOEC). Using an electrolyte charge carrier transport model, we quantify the oxygen potentials across the electrolyte and thereby provide insights into these degradation mechanisms. Our model describes the transport of charge carriers in the electrolyte when the oxygen partial pressure is extremely low by accounting for the spatial variation of the concentration of oxygen vacancies in the electrolyte. Moreover, we identify four quantities that characterize the distribution of oxygen partial pressure in the electrolyte, which are directly related to the degradation mechanisms in the electrolyte as well: the two oxygen partial pressures at the interfaces of the electrodes and the electrolyte, the oxygen partial pressure at the interface of YSZ/GDC, and the position of the abrupt change in oxygen potential near the p-n junction that develops in YSZ when one side of the cell is exposed to fuel (low oxygen potential, n-type conduction) and the other side is exposed to oxidant (high oxygen potential, p-type conduction). We give analytical estimates for all of these quantities. These analytical expressions provide guidance on the parameters that need to be controlled to suppress the degradation observed in the electrolyte. In addition, the effects of operating conditions, particularly current density and operating temperature, on degradation are discussed.

preprint2020arXiv

Threshy: Supporting Safe Usage of Intelligent Web Services

Increased popularity of `intelligent' web services provides end-users with machine-learnt functionality at little effort to developers. However, these services require a decision threshold to be set which is dependent on problem-specific data. Developers lack a systematic approach for evaluating intelligent services and existing evaluation tools are predominantly targeted at data scientists for pre-development evaluation. This paper presents a workflow and supporting tool, Threshy, to help software developers select a decision threshold suited to their problem domain. Unlike existing tools, Threshy is designed to operate in multiple workflows including pre-development, pre-release, and support. Threshy is designed for tuning the confidence scores returned by intelligent web services and does not deal with hyper-parameter optimisation used in ML models. Additionally, it considers the financial impacts of false positives. Threshold configuration files exported by Threshy can be integrated into client applications and monitoring infrastructure. Demo: https://bit.ly/2YKeYhE.

Scott Barnett

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset

MLSmellHound: A Context-Aware Code Analysis Tool

A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

Beware the evolving 'intelligent' web service! An integration architecture tactic to guard AI-first components

Interpreting Cloud Computer Vision Pain-Points: A Mining Study of Stack Overflow

Ranking Computer Vision Service Issues using Emotion

The oxygen partial pressure in solid oxide electrolysis cells with two layer electrolytes

Threshy: Supporting Safe Usage of Intelligent Web Services