Source author record

Robert Feldt

Robert Feldt appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Software Engineering, Artificial Intelligence, Human-Computer Interaction, Machine Learning, Applications, General Literature, Information Theory, math.IT, Neural and Evolutionary Computing

Catalog footprint

What is connected

23 works

9 topics

4 close collaborators

preprint, 2023, arXiv

Robust web element identification for evolving applications by considering visual overlaps

Fragile (i.e., non-robust) test execution is a common challenge for automated GUI-based testing of web applications as they evolve. Despite recent progress, there is still room for improvement since test execution failures caused by technical limitations result in unnecessary maintenance costs that limit its effectiveness and efficiency. One of the most reported technical challenges for web-based tests concerns how to reliably locate a web element used by a test script. This paper proposes the novel concept of Visually Overlapping Nodes (VON) that reduces fragility by utilizing the phenomenon that visual web elements (observed by the user) are constructed from multiple web elements in the Document Object Model (DOM) that overlap visually. We demonstrate the approach in a tool, VON Similo, which extends the state-of-the-art multi-locator approach (Similo) that is also used as the baseline for an experiment. In the experiment, a ground truth set of 1163 manually collected web element pairs, from different releases of the 40 most popular websites on the internet, is used to compare the approaches' precision, recall, and accuracy. Our results show that VON Similo provides 94.7% accuracy in identifying a web element in a new release of the same SUT. In comparison, Similo provides 83.8% accuracy. These results demonstrate the applicability of the visually overlapping nodes concept/tool for web element localization in evolving web applications and contribute a novel way of thinking about web element localization in future research on GUI-based testing.
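
To make the idea of visually overlapping DOM nodes concrete, the sketch below groups nodes whose rendered bounding boxes nearly coincide. It is an illustration only: the intersection-over-union measure, the 0.85 threshold, the `Box` type and the node names are all assumptions made for this example, not the criterion used by VON Similo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a rendered DOM node, in pixels."""
    x: float
    y: float
    width: float
    height: float


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 = disjoint, 1.0 = identical)."""
    ix = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    intersection = ix * iy
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union if union > 0 else 0.0


def visually_overlapping(target: Box, nodes: dict[str, Box],
                         threshold: float = 0.85) -> list[str]:
    """Names of nodes whose box overlaps the target box by at least `threshold`.

    The threshold is a hypothetical value chosen for this example.
    """
    return [name for name, box in nodes.items()
            if overlap_ratio(target, box) >= threshold]


# Hypothetical page: a button rendered from a wrapper link and an inner label.
button = Box(100, 200, 120, 40)
nodes = {
    "a.btn": Box(100, 200, 120, 40),       # wrapper link, same rectangle
    "span.label": Box(102, 202, 116, 36),  # inner label, nearly the same rectangle
    "div.sidebar": Box(0, 0, 80, 600),     # unrelated element elsewhere on the page
}
print(visually_overlapping(button, nodes))  # ['a.btn', 'span.label']
```

A locator could then treat every node in the returned group as an acceptable match for the element the user sees, which is the intuition behind reducing fragility.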

preprint, 2022, arXiv

A Taxonomy of Information Attributes for Test Case Prioritisation: Applicability, Machine Learning

Most software companies have extensive test suites and re-run parts of them continuously to ensure recent changes have no adverse effects. Since test suites are costly to execute, industry needs methods for test case prioritisation (TCP). Recently, TCP methods use machine learning (ML) to exploit the information known about the system under test (SUT) and its test cases. However, the value added by ML-based TCP methods should be critically assessed with respect to the cost of collecting the information. This paper analyses two decades of TCP research, and presents a taxonomy of 91 information attributes that have been used. The attributes are classified with respect to their information sources and the characteristics of their extraction process. Based on this taxonomy, TCP methods validated with industrial data and those applying ML are analysed in terms of information availability, attribute combination and definition of data features suitable for ML. Relying on a high number of information attributes, assuming easy access to SUT code and simplified testing environments are identified as factors that might hamper industrial applicability of ML-based TCP. The TePIA taxonomy provides a reference framework to unify terminology and evaluate alternatives considering the cost-benefit of the information attributes.

preprint, 2022, arXiv

Automated Black-Box Boundary Value Detection

The input domain of software systems can typically be divided into sub-domains for which the outputs are similar. To ensure high quality it is critical to test the software on the boundaries between these sub-domains. Consequently, boundary value analysis and testing have long been part of the software tester's toolbox and are typically taught early to students. However, despite its many argued benefits, boundary value analysis for a given specification or piece of software is typically described in abstract terms which allow for variation in how testers apply it. Here we propose an automated, black-box boundary value detection method to support software testers in systematic boundary value analysis with consistent results. The method builds on a metric to quantify the level of boundariness of test inputs: the program derivative. By coupling it with search algorithms we find and rank pairs of inputs as good boundary candidates, i.e. inputs close together but with outputs far apart. We implement our AutoBVA approach and evaluate it on a curated dataset of example programs. Our results indicate that even with a simple and generic program derivative variant in combination with broad sampling over the input space, interesting boundary candidates can be identified.
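
The notion of "inputs close together but with outputs far apart" can be sketched as a ratio of output distance to input distance. The code below is a minimal illustration under stated assumptions: the `ticket_price` program, the 0/1 output distance and the exhaustive scan over adjacent integers are all hypothetical stand-ins, and the scan replaces the search algorithms that AutoBVA actually couples with the metric.

```python
from typing import Callable


def program_derivative(sut: Callable[[int], str], a: int, b: int,
                       out_dist: Callable[[str, str], float]) -> float:
    """Output distance divided by input distance for one pair of inputs."""
    return out_dist(sut(a), sut(b)) / abs(a - b)


def boundary_candidates(sut: Callable[[int], str], lo: int, hi: int,
                        out_dist: Callable[[str, str], float],
                        top: int = 3) -> list[tuple[int, int]]:
    """Rank adjacent input pairs in [lo, hi] by their program derivative."""
    scored = [(program_derivative(sut, x, x + 1, out_dist), x, x + 1)
              for x in range(lo, hi)]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps input order on ties
    return [(a, b) for score, a, b in scored[:top] if score > 0]


def ticket_price(age: int) -> str:
    """Hypothetical program under test with two boundaries."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


def differs(x: str, y: str) -> float:
    """Simplest possible output distance: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


print(boundary_candidates(ticket_price, 0, 100, differs))  # [(17, 18), (64, 65)]
```

The program is only called, never inspected, which is what makes the approach black-box. A richer output distance (for example a string distance) would let the ranking distinguish between boundaries instead of only detecting them.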

preprint, 2022, arXiv

Cognition in Software Engineering: A Taxonomy and Survey of a Half-Century of Research

Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and errors, and procedures to assess them both qualitatively and quantitatively. The taxonomy provides a useful tool to filter existing studies, classify new studies, and support researchers in getting familiar with a (sub) area. In the literature survey, we systematically collected and analysed 311 scientific papers spanning five decades and classified them using the cognitive concepts from the taxonomy. Our analysis shows that the most developed areas of research correspond to the four life-cycle stages, software requirements, design, construction, and maintenance. Most research is quantitative and focuses on knowledge, cognitive load, memory, and reasoning. Overall, the state of the art appears fragmented when viewed from the perspective of cognition. There is a lack of use of cognitive concepts that would represent a coherent picture of the cognitive processes active in specific tasks. Accordingly, we discuss the research gap in each cognitive concept and provide recommendations for future research.

preprint, 2022, arXiv

Similarity-based web element localization for robust test automation

Non-robust (fragile) test execution is a commonly reported challenge in GUI-based test automation, despite much research and several proposed solutions. A test script needs to be resilient to (minor) changes in the tested application but, at the same time, fail when detecting potential issues that require investigation. Test script fragility is a multi-faceted problem, but one crucial challenge is reliably identifying and locating the correct target web elements when the website evolves between releases or otherwise fails and reports an issue. This paper proposes and evaluates a novel approach called similarity-based web element localization (Similo), which leverages information from multiple web element locator parameters to identify a target element using a weighted similarity score. The experimental study compares Similo to a baseline approach for web element localization. To get an extensive empirical basis, we target 40 of the most popular websites on the Internet in our evaluation. Robustness is considered by counting the number of web elements found in a recent website version compared to how many of these existed in an older version. Results of the experiment show that Similo outperforms the baseline representing the current state-of-the-art; it failed to locate the correct target web element in 72 out of 598 considered cases compared to 146 failed cases for the baseline approach. This study presents evidence that quantifying the similarity between multiple attributes of web elements when trying to locate them, as in our proposed Similo approach, is beneficial. With acceptable efficiency, Similo gives significantly higher effectiveness (i.e., robustness) than the baseline web element localization approach.
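
A weighted similarity score over several locator parameters might look like the sketch below. The attribute names, the weights and the use of `difflib.SequenceMatcher` for string similarity are assumptions chosen for illustration; they are not the parameters or comparison functions used by Similo.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical locator parameters and weights, not Similo's actual values.
WEIGHTS = {"tag": 1.5, "id": 1.5, "name": 1.5, "class": 0.5, "text": 1.0}


def attribute_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Similarity in [0, 1]; a missing attribute contributes nothing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def similarity_score(target: dict, candidate: dict) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(weight * attribute_similarity(target.get(key), candidate.get(key))
               for key, weight in WEIGHTS.items())


def locate(target: dict, candidates: list[dict]) -> dict:
    """Pick the candidate in the new release most similar to the old element."""
    return max(candidates, key=lambda c: similarity_score(target, c))


# Element as recorded in the old release (hypothetical data).
old = {"tag": "button", "id": "submit-btn", "class": "btn primary", "text": "Sign in"}

# Elements found in the new release, where the id and class have changed.
new_release = [
    {"tag": "a", "id": "help", "class": "link", "text": "Need help?"},
    {"tag": "button", "id": "submit-button", "class": "btn btn-primary", "text": "Sign in"},
    {"tag": "button", "id": "cancel", "class": "btn", "text": "Cancel"},
]

print(locate(old, new_release)["id"])  # submit-button
```

A single-locator script keyed on `id="submit-btn"` would fail here, whereas the combined score still singles out the renamed button because the tag, text and most of the id agree.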

preprint, 2022, arXiv

Test2Vec: An Execution Trace Embedding for Test Case Prioritization

Most automated software testing tasks can benefit from the abstract representation of test cases. Traditionally, this is done by encoding test cases based on their code coverage. Specification-level criteria can replace code coverage to better represent test cases' behavior, but they are often not cost-effective. In this paper, we hypothesize that execution traces of the test cases can be a good alternative to abstract their behavior for automated testing tasks. We propose a novel embedding approach, Test2Vec, that maps test execution traces to a latent space. We evaluate this representation in the test case prioritization (TP) task. Our default TP method is based on the similarity of the embedded vectors to historical failing test vectors. We also study an alternative based on the diversity of test vectors. Finally, we propose a method to decide which TP to choose, for a given test suite. The experiment is based on several real and seeded faults with over a million execution traces. Results show that our proposed TP improves best alternatives by 41.80% in terms of the median normalized rank of the first failing test case (FFR). It outperforms traditional code coverage-based approaches by 25.05% and 59.25% in terms of median APFD and median normalized FFR.
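
The similarity-based prioritization described above can be sketched as ranking tests by how close their embedded vectors are to historically failing ones. Everything concrete here is assumed for illustration: the three-dimensional vectors stand in for learned trace embeddings, cosine similarity is one plausible similarity measure, and the normalization of the first-failing rank by suite size is a guess at what "normalized" means.

```python
import math
from typing import Optional


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prioritize(tests: dict[str, list[float]],
               failing_history: list[list[float]]) -> list[str]:
    """Order tests by their best similarity to any historically failing vector."""
    def score(name: str) -> float:
        return max(cosine(tests[name], f) for f in failing_history)
    return sorted(tests, key=score, reverse=True)


def normalized_first_failing_rank(order: list[str],
                                  failing: set[str]) -> Optional[float]:
    """Rank of the first failing test divided by suite size (assumed definition)."""
    for rank, name in enumerate(order, start=1):
        if name in failing:
            return rank / len(order)
    return None  # no failing test in this run


# Hypothetical embeddings of three tests and one past failing test.
tests = {
    "test_login": [0.9, 0.1, 0.0],
    "test_search": [0.1, 0.9, 0.2],
    "test_checkout": [0.7, 0.3, 0.1],
}
failing_history = [[1.0, 0.0, 0.0]]

order = prioritize(tests, failing_history)
print(order)  # ['test_login', 'test_checkout', 'test_search']
print(normalized_first_failing_rank(order, {"test_checkout"}))
```

A lower normalized rank means the first failure is reached sooner, which is the sense in which a prioritization can be said to improve on alternatives.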

preprint, 2021, arXiv

Ahead of Time Mutation Based Fault Localisation using Statistical Inference

Mutation analysis can effectively capture the dependency between source code and test results. This has been exploited by Mutation Based Fault Localisation (MBFL) techniques. However, MBFL techniques suffer from the need to expend the high cost of mutation analysis after the observation of failures, which may present a challenge for its practical adoption. We introduce SIMFL (Statistical Inference for Mutation-based Fault Localisation), an MBFL technique that allows users to perform the mutation analysis in advance before a failure is observed, allowing the amortisation of the analysis cost. SIMFL uses mutants as artificial faults and aims to learn the failure patterns among test cases against different locations of mutations. Once a failure is observed, SIMFL requires either almost no or very small additional cost for analysis, depending on the used inference model. An empirical evaluation using Defects4J shows that SIMFL can successfully localise up to 113 out of 203 studied faults (55%) at the top, and 159 (78%) faults within the top five, significantly outperforming existing MBFL techniques while using the results of mutation analysis that has been undertaken before the test failure. The amortised cost of mutation analysis can be further reduced by mutation sampling: SIMFL retains 80% of its localisation accuracy at the top rank when using only 10% of generated mutants, compared to results obtained without sampling.
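
The ahead-of-time idea can be illustrated with a deliberately simple inference model: record which tests each mutant breaks before any real failure occurs, then look up the observed failing set when one does. The exact-match lookup, the data layout and the location names below are assumptions for this sketch and should not be read as SIMFL's actual inference models.

```python
from collections import Counter, defaultdict
from typing import Iterable


def build_model(mutants: Iterable[tuple[str, set[str]]]) -> dict:
    """Ahead of time: index mutant locations by the set of tests each mutant fails."""
    model: dict = defaultdict(Counter)
    for location, killed_by in mutants:
        if killed_by:  # mutants that no test detects carry no signal
            model[frozenset(killed_by)][location] += 1
    return model


def localise(model: dict, failing_tests: set[str]) -> list[tuple[str, float]]:
    """After a failure: rank locations whose mutants caused the same failing set."""
    counts = model.get(frozenset(failing_tests), Counter())
    total = sum(counts.values())
    return [(location, n / total) for location, n in counts.most_common()]


# Hypothetical mutation analysis results: (mutated location, tests that failed).
mutants = [
    ("Cart.total", {"t_total", "t_checkout"}),
    ("Cart.total", {"t_total", "t_checkout"}),
    ("Cart.add", {"t_add"}),
    ("Tax.rate", {"t_total", "t_checkout"}),
    ("Tax.rate", {"t_tax"}),
]
model = build_model(mutants)  # the expensive step, done before any failure

# Later, a real failure is observed with these failing tests.
ranking = localise(model, {"t_checkout", "t_total"})
print(ranking[0][0])  # Cart.total
```

The lookup after the failure is a dictionary access, which mirrors the claim that almost no additional analysis cost is needed once the mutation results exist. An unseen failing set returns an empty ranking here; a practical model would need some form of partial matching.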

preprint, 2021, arXiv

Empirical Standards for Software Engineering Research

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

preprint, 2021, arXiv

Towards Human-Like Automated Test Generation: Perspectives from Cognition and Problem Solving

Automated testing tools typically create test cases that are different from what human testers create. This often makes the tools less effective, the created tests harder to understand, and thus results in tools providing less support to human testers. Here, we propose a framework based on cognitive science and, in particular, an analysis of approaches to problem-solving, for identifying cognitive processes of testers. The framework helps map test design steps and criteria used in human test activities and thus to better understand how effective human testers perform their tasks. Ultimately, our goal is to be able to mimic how humans create test cases and thus to design more human-like automated test generation systems. We posit that such systems can better augment and support testers in a way that is meaningful to them.

preprint, 2020, arXiv

Bayesian data analysis in empirical software engineering: The case of missing data

Bayesian data analysis (BDA) is today used by a multitude of research disciplines. These disciplines use BDA as a way to embrace uncertainty by using multilevel models and making use of all available information at hand. In this chapter, we first introduce the reader to BDA and then provide an example from empirical software engineering, where we also deal with a common issue in our field, i.e., missing data. The example we make use of presents the steps done when conducting state of the art statistical analysis. First, we need to understand the problem we want to solve. Second, we conduct causal analysis. Third, we analyze non-identifiability. Fourth, we conduct missing data analysis. Finally, we do a sensitivity analysis of priors. All this before we design our statistical model. Once we have a model, we present several diagnostics one can use to conduct sanity checks. We hope that through these examples, the reader will see the advantages of using BDA. This way, we hope Bayesian statistics will become more prevalent in our field, thus partly avoiding the reproducibility crisis we have seen in other disciplines.

preprint, 2020, arXiv

Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving

Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. Object segmentation is one such task: its aim is to precisely locate boundaries of objects and classify the identified objects, helping autonomous cars to recognise the road environment and the traffic situation. Not only is this task safety critical, but developing a DNN based object segmentation module presents a set of challenges that are significantly different from traditional development of safety critical software. The development process in use consists of multiple iterations of data collection, labelling, training, and evaluation. Among these stages, training and evaluation are computation intensive while data collection and labelling are manual labour intensive. This paper shows how development of DNN based object segmentation can be improved by exploiting the correlation between Surprise Adequacy (SA) and model performance. The correlation allows us to predict model performance for inputs without manually labelling them. This, in turn, enables understanding of model performance, more guided data collection, and informed decisions about further training. In our industrial case study the technique allows cost savings of up to 50% with negligible evaluation inaccuracy. Furthermore, engineers can trade off cost savings versus the tolerable level of inaccuracy depending on different development phases and scenarios.

preprint, 2020, arXiv

SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation

The testing of Deep Neural Networks (DNNs) has become increasingly important as DNNs are widely adopted by safety critical systems. While many test adequacy criteria have been suggested, automated test input generation for many types of DNNs remains a challenge because the raw input space is too large to randomly sample or to navigate and search for plausible inputs. Consequently, current testing techniques for DNNs depend on small local perturbations to existing inputs, based on the metamorphic testing principle. We propose new ways to search not over the entire image space, but rather over a plausible input space that resembles the true training distribution. This space is constructed using Variational Autoencoders (VAEs), and navigated through their latent vector space. We show that this space helps efficiently produce test inputs that can reveal information about the robustness of DNNs when dealing with realistic tests, opening the field to meaningful exploration through the space of highly structured images.

preprint, 2018, arXiv

Guiding Deep Learning System Testing using Surprise Adequacy

Deep Learning (DL) systems are rapidly being adopted in safety and security critical domains, urgently calling for ways to test their correctness and robustness. Testing of DL systems has traditionally relied on manual collection and labelling of data. Recently, a number of coverage criteria based on neuron activation values have been proposed. These criteria essentially count the number of neurons whose activation during the execution of a DL system satisfied certain properties, such as being above predefined thresholds. However, existing coverage criteria are not sufficiently fine grained to capture subtle behaviours exhibited by DL systems. Moreover, evaluations have focused on showing correlation between adversarial examples and proposed criteria rather than evaluating and guiding their use for actual testing of DL systems. We propose a novel test adequacy criterion for testing of DL systems, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data. We measure the surprise of an input as the difference in DL system's behaviour between the input and the training data (i.e., what was learnt during training), and subsequently develop this as an adequacy criterion: a good test input should be sufficiently but not overtly surprising compared to training data. Empirical evaluation using a range of DL systems from simple image classifiers to autonomous driving car platforms shows that systematic sampling of inputs based on their surprise can improve classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
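
As a rough illustration of measuring surprise relative to training data, the sketch below scores an input by the distance from its activation trace to the nearest training trace, then keeps inputs inside a surprise band. This is a simplified stand-in and not one of the SADL definitions from the paper; the two-dimensional traces, the Euclidean distance and the band limits are all assumptions.

```python
import math
from typing import Sequence


def surprise(trace: Sequence[float],
             training_traces: Sequence[Sequence[float]]) -> float:
    """Distance from an input's activation trace to the closest training trace."""
    return min(math.dist(trace, t) for t in training_traces)


def select_tests(candidates: dict[str, Sequence[float]],
                 training_traces: Sequence[Sequence[float]],
                 low: float, high: float) -> list[str]:
    """Keep inputs that are surprising enough to be useful but not extreme."""
    return [name for name, trace in candidates.items()
            if low <= surprise(trace, training_traces) <= high]


# Hypothetical activation traces observed on the training data.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical activation traces of candidate test inputs.
candidates = {
    "familiar": (0.1, 0.0),  # almost identical to a training trace
    "novel": (1.5, 1.0),     # moderately far from anything seen in training
    "outlier": (9.0, 9.0),   # far outside the training behaviour
}

print(select_tests(candidates, training, low=0.5, high=3.0))  # ['novel']
```

The band expresses the adequacy intuition stated in the abstract: inputs that look just like training data add little, while extreme inputs may not be meaningful tests.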

preprint, 2016, arXiv

A Conceptual UX-aware Model of Requirements

User eXperience (UX) is becoming increasingly important for the success of software products. Yet, many companies still face various challenges in their work with UX. Some of these challenges relate to inadequate knowledge and awareness of UX, and to current UX models commonly being neither practical nor well integrated into existing Software Engineering (SE) models and concepts. Therefore, we present a conceptual UX-aware model of requirements for software development practitioners. This layered model shows the interrelation between UX and functional and quality requirements. The model is developed based on current models of UX and software quality characteristics. Through the model we highlight the main differences between various requirement types, in particular essentially subjective and accidentally subjective quality requirements. We also present the result of an initial validation of the model through interviews with 12 practitioners and researchers. Our results show that the model can raise practitioners' knowledge and awareness of UX, in particular in relation to requirement and testing activities. It can also facilitate UX-related communication among stakeholders with different backgrounds.

preprint, 2016, arXiv

Cross-Section Evidence-based Timelines for Software Process Improvement Retrospectives: A Case Study of User eXperience Integration

Although integrating UX practices into software development processes is a type of Software Process Improvement (SPI) activity, this has not yet been taken into account in UX publications. In this study, we approach UX integration in a software development company in Sweden from an SPI perspective. Following the guidelines in SPI literature, we performed a retrospective meeting at the company to reflect on their decade of SPI activities for enhancing UX integration. The aim of the meeting was to reflect on, learn from, and coordinate various activities spanning various organizational units and projects. We therefore supported the meeting with a pre-generated timeline of the main activities in the organization, which differs from common project retrospective meetings in SPI. This approach is a refinement of a similar approach that is used in Agile projects, and is shown to improve the effectiveness of retrospectives and to decrease memory bias. We hypothesized that this method can be useful in the context of UX integration, and in this broader scope. To evaluate the method we gathered practitioners' views through a questionnaire. The findings showed our hypothesis to be plausible. Here, we show that UX integration research and practice can benefit from the SPI body of knowledge. We also show that such cross-section evidence-based timeline retrospective meetings are useful for UX integration, and on a larger scale than one project, especially for identifying and reflecting on 'organizational issues'. This approach also provides a cross-section longitudinal overview of the SPI activities that cannot easily be gained in other common SPI learning approaches.

preprint, 2016, arXiv

Integrating User eXperience Practices into Software Development Processes: Implications of Subjectivity and Emergent Nature of UX

Many software companies face challenges in their work with User eXperience (UX) and how to integrate UX practices into existing development processes. A better understanding of these challenges can help researchers and practitioners better address them. Existing research does not analyse UX challenges in relation to other software quality characteristics including usability. In this empirical study, we have interviewed 17 practitioners from eight software development companies. Their responses are coded and analysed with thematic analysis. We report 11 challenges that practitioners face in their work with UX. Some of these challenges partly overlap with those reported in existing literature about usability or software quality characteristics. In contrast to these overlaps, the participants of our study view many of the challenges either as unique to UX or as more severe than for usability or other quality characteristics. Although at a superficial level challenges with UX and other quality characteristics overlap, we differentiate these challenges at a deeper level through two main aspects of UX: subjectivity and emergent nature. In particular, we identify at least five issues that are essential to the very nature of UX, and add at least seven extra difficulties to the work of practitioners. These difficulties can explain why practitioners perceive the challenges to be more severe than for other quality characteristics. Our findings can be useful for researchers in identifying industrially relevant research areas and for practitioners to learn from empirically investigated challenges and base their improvement efforts on such knowledge. Investigating the overlaps can help find research areas not only for enhancing the practice of UX but also software quality in general. It also makes it easier for practitioners to spot, better understand, and find mitigation strategies for UX challenges.

preprint, 2016, arXiv

Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing

Context: Verification and validation (V&V) activities make up 20 to 50 percent of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs but available research only provides limited empirical data from industrial practice about the maintenance costs of automated tests and what factors affect these costs. In particular, these costs and factors are unknown for automated GUI-based testing. Objective: This paper addresses this lack of knowledge through analysis of the costs and factors associated with the maintenance of automated GUI-based tests in industrial practice. Method: An empirical study at two companies, Siemens and Saab, is reported where interviews about, and empirical work with, Visual GUI Testing were performed to acquire data about the technique's maintenance costs and feasibility. Results: 13 factors are observed that affect maintenance, e.g. tester knowledge/experience and test case complexity. Further, statistical analysis shows that developing new test scripts is costlier than maintenance but also that frequent maintenance is less costly than infrequent, big bang maintenance. In addition, a cost model, based on previous work, is presented that estimates the time to positive return on investment (ROI) of test automation compared to manual testing. Conclusions: It is concluded that test automation can lower overall software development costs of a project whilst also having positive effects on software quality. However, maintenance costs can still be considerable and the less time a company currently spends on manual testing, the more time is required before positive economic ROI is reached after automation.

preprint, 2016, arXiv

Software Engineers' Attitudes Towards Organizational Change - an Industrial Case Study

In order to cope with a complex and changing environment, industries seek to find new and more efficient ways to conduct their business. According to previous research, many of these change efforts fail to achieve their intended aims. Researchers have therefore sought to identify factors that increase the likelihood of success and found that employees' attitude towards change is one of the most critical. The ability to manage change is especially important in software engineering organizations, where rapid changes in influential technologies and constantly evolving methodologies create a turbulent environment. Nevertheless, to the best of our knowledge, no studies exist that explore attitude towards change in a software engineering organization. In this case study, we have used industry data to examine if the knowledge about the intended change outcome, the understanding of the need for change, and the feelings of participation affect software engineers' openness to change and readiness for change respectively, two commonly used attitude constructs. The results of two separate multiple regression analyses showed that openness to change is predicted by all three concepts, while readiness for change is predicted by need for change and participation. In addition, our research also provides a hierarchy with respect to the three predictive constructs' degree of impact. Ultimately, our result can help managers in software engineering organizations to increase the likelihood of successfully implementing change initiatives that result in a changed organizational behavior. However, the first-order models we propose are to be recognized as early approximations that capture the most significant effects and should therefore, in future research, be extended to include additional factors unique to software engineering.

preprint, 2016, arXiv

Stakeholder Involvement: A Success Factor for Achieving Better UX Integration

Stakeholder involvement is one of the major success factors in integrating user experience (UX) practices into software development processes and organizations. It is also a necessity for agile software development. However, practitioners still have limited access to guidelines on successful involvement of UX stakeholders in agile settings. Moreover, agile UX literature does not well address the specific characteristics of UX and it does not clearly differentiate between UX and usability work. This paper presents two guidelines for supporting stakeholder involvement in both UX integration and the daily UX work. In particular, we focus on the special characteristics of UX: being dynamic, subjective, holistic, and context-dependent. The guidelines clarify practical implications of these characteristics for practitioners. In addition, they can help researchers in addressing these characteristics better in agile UX research.

preprint, 2015, arXiv

Test Set Diameter: Quantifying the Diversity of Sets of Test Cases

A common and natural intuition among software testers is that test cases need to differ if a software system is to be tested properly and its quality ensured. Consequently, much research has gone into formulating distance measures for how test cases, their inputs and/or their outputs differ. However, common to these proposals is that they are data type specific and/or calculate the diversity only between pairs of test inputs, traces or outputs. We propose a new metric to measure the diversity of sets of tests: the test set diameter (TSDm). It extends our earlier, pairwise test diversity metrics based on recent advances in information theory regarding the calculation of the normalized compression distance (NCD) for multisets. An advantage is that TSDm can be applied regardless of data type and on any test-related information, not only the test inputs. A downside is the increased computational time compared to competing approaches. Our experiments on four different systems show that the test set diameter can help select test sets with higher structural and fault coverage than random selection even when only applied to test inputs. This can enable early test design and selection, prior to even having a software system to test, and complement other types of test automation and analysis. We argue that this quantification of test set diversity creates a number of opportunities to better understand software quality and provides practical ways to increase it.
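
A compression-based diversity measure for a whole set of tests can be sketched as follows. The formula used, (C(X) - min C(x)) / max C(X without x), is assumed here as one form of the normalized compression distance for multisets and should be checked against the paper before being relied on; `zlib` as the compressor, the function name `set_diameter` and the sample payloads are likewise choices made only for this example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate information content by the zlib-compressed length."""
    return len(zlib.compress(data, 9))


def set_diameter(tests: list[bytes]) -> float:
    """Diversity of a set of tests under the assumed multiset NCD form."""
    if len(tests) < 2:
        raise ValueError("need at least two tests to measure diversity")
    whole = compressed_size(b"".join(tests))
    smallest = min(compressed_size(t) for t in tests)
    largest_rest = max(
        compressed_size(b"".join(tests[:i] + tests[i + 1:]))
        for i in range(len(tests))
    )
    return (whole - smallest) / largest_rest


# Three identical, highly repetitive inputs: little new information per test.
similar = [b"abc" * 50] * 3

# Three unrelated pseudo-random inputs (seeded so the example is repeatable).
rng = random.Random(0)
diverse = [bytes(rng.getrandbits(8) for _ in range(200)) for _ in range(3)]

print(f"similar set: {set_diameter(similar):.2f}")
print(f"diverse set: {set_diameter(diverse):.2f}")
```

Because the measure only needs bytes, the same function applies to inputs, outputs or traces, which reflects the data-type independence claimed in the abstract. The repeated compression of subsets also hints at the computational cost noted as a downside.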

preprint, 2015, arXiv

Tester Interactivity makes a Difference in Search-Based Software Testing: A Controlled Experiment

Context: Search-based software testing promises to provide users with the ability to generate high-quality test cases, and hence increase product quality, with a minimal increase in the time and effort required. One result that emerged out of a previous study to investigate the application of search-based software testing (SBST) in an industrial setting was the development of the Interactive Search-Based Software Testing (ISBST) system. ISBST allows users to interact with the underlying SBST system, guiding the search and assessing the results. An industrial evaluation indicated that the ISBST system could find test cases that are not created by testers employing manual techniques. The validity of the evaluation was threatened, however, by the low number of participants. Objective: This paper presents a follow-up study, to provide a more rigorous evaluation of the ISBST system. Method: To assess the ISBST system a two-way crossover controlled experiment was conducted with 58 students taking a Verification and Validation course. The NASA Task Load Index (NASA-TLX) is used to assess the workload experienced by the participants in the experiment. Results: The experimental results validated the hypothesis that the ISBST system generates test cases that are not found by the same participants employing manual testing techniques. A follow-up laboratory experiment also investigates the importance of interaction in obtaining the results. In addition to this main result, the subjective workload was assessed for each participant by means of the NASA-TLX tool. The evaluation showed that, while the ISBST system required more effort from the participants, they achieved the same performance. Conclusions: The paper provides evidence that the ISBST system develops test cases that are not found by manual techniques, and that interaction plays an important role in achieving that result.

preprint 2013 arXiv

Do System Test Cases Grow Old?

Companies increasingly use either manual or automated system testing to ensure the quality of their software products. As a system evolves and is extended with new features, the test suite also typically grows as new test cases are added. To ensure software quality throughout this process, the test suite is continuously executed, often on a daily basis. It seems likely that newly added tests would be more likely to fail than older tests, but this has not been investigated in any detail on large-scale, industrial software systems. Also, it is not clear which methods should be used to conduct such an analysis. This paper proposes three main concepts that can be used to investigate aging effects in the use and failure behavior of system test cases: test case activation curves, test case hazard curves, and test case half-life. To evaluate these concepts and the type of analysis they enable, we apply them to an industrial software system containing more than one million lines of code. The data sets come from a total of 1,620 system test cases executed a total of more than half a million times over a time period of two and a half years. For the investigated system we find that system test cases stay active as they age but really do grow old; they go through an infant mortality phase with higher failure rates which then decline over time. The test case half-life is between 5 and 12 months for the two studied data sets.

preprint 2011 arXiv

A Factorial Experiment on Scalability of Search Based Software Testing

Software testing is an expensive process that is vital in industry. Constructing the test data accounts for the major part of the cost, so deciding which method to use to generate the test data is important. This paper discusses the efficiency of search-based algorithms (preferably a genetic algorithm) versus random testing in software test-data generation. This study differs from all previous studies in the sample programs (SUTs) that are used. Since we want to increase the complexity of the SUTs gradually, and the program generation is automatic as well, Grammatical Evolution is used to guide the program generation. SUTs are generated according to the grammar we provide, with different levels of complexity. Each SUT is first tested with the genetic algorithm and then with random testing. Based on the test results, this paper recommends one method to use for automation of software testing.

23 published item(s)

Robust web element identification for evolving applications by considering visual overlaps

A Taxonomy of Information Attributes for Test Case Prioritisation: Applicability, Machine Learning

Automated Black-Box Boundary Value Detection

Cognition in Software Engineering: A Taxonomy and Survey of a Half-Century of Research

Similarity-based web element localization for robust test automation

Test2Vec: An Execution Trace Embedding for Test Case Prioritization

Ahead of Time Mutation Based Fault Localisation using Statistical Inference

Empirical Standards for Software Engineering Research

Towards Human-Like Automated Test Generation: Perspectives from Cognition and Problem Solving

Bayesian data analysis in empirical software engineering: The case of missing data

Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving

SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation

Guiding Deep Learning System Testing using Surprise Adequacy

A Conceptual UX-aware Model of Requirements

Cross-Section Evidence-based Timelines for Software Process Improvement Retrospectives: A Case Study of User eXperience Integration

Integrating User eXperience Practices into Software Development Processes: Implications of Subjectivity and Emergent Nature of UX

Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing

Software Engineers' Attitudes Towards Organizational Change - an Industrial Case Study

Stakeholder Involvement: A Success Factor for Achieving Better UX Integration

Test Set Diameter: Quantifying the Diversity of Sets of Test Cases

Tester Interactivity makes a Difference in Search-Based Software Testing: A Controlled Experiment

Do System Test Cases Grow Old?

A Factorial Experiment on Scalability of Search Based Software Testing