Researcher profile

Stephen G. MacDonell

Stephen G. MacDonell contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
37 works
0 followers
2 topics
4 close collaborators

Published work

37 published items

Preprint · 2022 · arXiv

A Systematic Mapping Study Addressing the Reliability of Mobile Applications: The Need to Move Beyond Testing Reliability

Intense competition in the mobile apps market means it is important to maintain high levels of app reliability to avoid losing users. Yet despite its importance, app reliability is underexplored in the research literature. To address this need, we identify, analyse, and classify the state-of-the-art in the field of mobile apps' reliability through a systematic mapping study. From the results of such a study, researchers in the field can identify pressing research gaps, and developers can gain knowledge about existing solutions, to potentially leverage them in practice. We found 87 relevant papers which were then analysed and classified based on their research focus, research type, contribution, research method, study settings, data, quality attributes and metrics used. Results indicate that there is a lack of research on understanding reliability with regard to context-awareness, self-healing, ageing and rejuvenation, and runtime event handling. These aspects have rarely been studied, or if studied, there is limited evaluation. We also identified several other research gaps including the need to conduct more research in real-world industrial projects. Furthermore, little attention has been paid towards quality standards while conducting research. Outcomes here show numerous opportunities for greater research depth and breadth on mobile app reliability.

Preprint · 2022 · arXiv

An Empirical Study on the Effectiveness of Data Resampling Approaches for Cross-Project Software Defect Prediction

Cross-project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with defect-prediction datasets is class imbalance, that is, highly skewed datasets in which non-buggy modules dominate buggy modules. In the past, data resampling approaches have been applied to within-project defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderline-SMOTE, Random Oversampling, and ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links, and One-Sided Selection) is investigated, and results are compared to approaches without data resampling. The authors examined six defect prediction models on 34 datasets extracted from the PROMISE repository. The authors' results show that there is a significant positive effect of data resampling on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches for improved recall (pd) and g-measure prediction performance. However, if the goal is to improve precision and reduce false alarms (pf), then data resampling approaches should be avoided.
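
The abstract reports a trade-off between recall (pd) and g-measure on one side and precision and false alarm rate (pf) on the other. A minimal Python sketch of how these measures are commonly computed from a confusion matrix follows; it assumes the usual defect-prediction definitions (g-measure as the harmonic mean of pd and 1 - pf), and the function name and counts are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # buggy modules predicted buggy
    fn: int  # buggy modules predicted clean
    fp: int  # clean modules predicted buggy
    tn: int  # clean modules predicted clean

def defect_metrics(c: Confusion) -> dict:
    """Return pd, pf, precision and g-measure for one confusion matrix."""
    pd = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0   # recall, probability of detection
    pf = c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0   # probability of false alarm
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    spec = 1.0 - pf                                      # share of clean modules left alone
    g = 2 * pd * spec / (pd + spec) if pd + spec else 0.0  # harmonic mean of pd and 1 - pf
    return {"pd": pd, "pf": pf, "precision": precision, "g_measure": g}

# Hypothetical counts for one model without and with resampling
before = defect_metrics(Confusion(tp=20, fn=30, fp=10, tn=140))
after = defect_metrics(Confusion(tp=35, fn=15, fp=40, tn=110))
```

With these made-up counts the second model finds more buggy modules (higher pd and g-measure) while raising pf and lowering precision, which is the shape of the trade-off the abstract describes.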

Preprint · 2021 · arXiv

A Baseline Model for Software Effort Estimation

Software effort estimation (SEE) is a core activity in all software processes and development lifecycles. A range of increasingly complex methods has been considered in the past 30 years for the prediction of effort, often with mixed and contradictory results. The comparative assessment of effort prediction methods has therefore become a common approach when considering how best to predict effort over a range of project types. Unfortunately, these assessments use a variety of sampling methods and error measurements, making comparison with other work difficult. This article proposes an automatically transformed linear model (ATLM) as a suitable baseline model for comparison against SEE methods. ATLM is simple yet performs well over a range of different project types. In addition, ATLM may be used with mixed numeric and categorical data and requires no parameter tuning. It is also deterministic, meaning that results obtained are amenable to replication. These and other arguments for using ATLM as a baseline model are presented, and a reference implementation described and made available. We suggest that ATLM should be used as a baseline of effort prediction quality for all future model comparisons in SEE.

Preprint · 2021 · arXiv

A Perspective-Based Understanding of Project Success

Answering the call for alternative approaches to researching project management, we explore the evaluation of project success from a subjectivist perspective. An in-depth, longitudinal case study of information systems development in a large manufacturing company was used to investigate how various project stakeholders subjectively perceived the project outcome and what evaluation criteria they drew on in doing so. A conceptual framework is developed for understanding and analyzing evaluations of project success, both formal and informal. The framework highlights how different stakeholder perspectives influence the perceived outcome(s) of a project, and how project evaluations may differ between stakeholders and across time.

Preprint · 2021 · arXiv

A Systematic Mapping Study on Dynamic Metrics and Software Quality

Several important aspects of software product quality can be evaluated using dynamic metrics that effectively capture and reflect the software's true runtime behavior. While the extent of research in this field is still relatively limited, particularly when compared to research on static metrics, the field is growing, given the inherent advantages of dynamic metrics. The aim of this work is to systematically investigate the body of research on dynamic software metrics to identify issues associated with their selection, design and implementation. Mapping studies are being increasingly used in software engineering to characterize an emerging body of research and to identify gaps in the field under investigation. In this study we identified and evaluated 60 works based on a set of defined selection criteria. These studies were further classified and analyzed to identify their relevance to future dynamic metrics research. The classification was based on three different facets: research focus, research type and contribution type. We found a strong body of research related to dynamic coupling and cohesion metrics, with most works also addressing the abstract notion of software complexity. Specific opportunities for future work relate to a much broader range of quality dimensions.

Preprint · 2021 · arXiv

A Visual Analysis Approach to Update Systematic Reviews

Context: In order to preserve the value of Systematic Reviews (SRs), they should be frequently updated to incorporate new evidence produced since the completion of the previous version of the review. However, updating an SR is a time-consuming, manual task. Thus, many SRs have not been updated as they should be and, therefore, they are currently outdated. Objective: The main contribution of this paper is to support the update of SRs. Method: We propose USR-VTM, an approach based on Visual Text Mining (VTM) techniques, to support selection of new evidence in the form of primary studies. We then present a tool, named Revis, which supports our approach. Finally, we evaluate our approach through a comparison of outcomes achieved using USR-VTM versus the traditional (manual) approach. Results: Our results show that USR-VTM increases the number of studies correctly included compared to the traditional approach. Conclusions: USR-VTM effectively supports the update of SRs.

Preprint · 2021 · arXiv

Analysing the use of graphs to represent the results of Systematic Reviews in Software Engineering

The presentation of results from Systematic Literature Reviews (SLRs) is generally done using tables. Prior research suggests that results summarized in tables are often difficult for readers to understand. One alternative to improve results' comprehensibility is to use graphical representations. The aim of this work is twofold: first, to investigate whether graph representations result in better comprehensibility than tables when presenting SLR results; second, to investigate whether interpretation using graphs impacts on performance, as measured by the time consumed to analyse and understand the data. We selected an SLR published in the literature and used two different formats to represent its results (tables and graphs) in three different combinations: (i) table format only; (ii) graph format only; and (iii) a mixture of tables and graphs. We conducted an experiment that compared the performance and capability of experts in SLR, as well as doctoral and masters students, in analysing and understanding the results of the SLR, as presented in one of the three different forms. We were interested in examining whether there is a difference between the performance of participants using tables and graphs. The graphical representation of SLR data led to a reduction in the time taken for its analysis, without any loss in data comprehensibility. For our sample the analysis of graphical data proved to be faster than the analysis of tabular data. However, we found no evidence of a difference in comprehensibility whether using tables, graphical format or a combination. Overall we argue that graphs are a suitable alternative to tables when it comes to representing the results of an SLR.

Preprint · 2021 · arXiv

Analyzing Confidentiality and Privacy Concerns: Insights from Android Issue Logs

Context: Post-release user feedback plays an integral role in improving software quality and informing new features. Given its growing importance, feedback concerning security enhancements is particularly noteworthy. In considering the rapid uptake of Android we have examined the scale and severity of Android security threats as reported by its stakeholders. Objective: We systematically mine Android issue logs to derive insights into stakeholder perceptions and experiences in relation to certain Android security issues. Method: We employed contextual analysis techniques to study issues raised regarding confidentiality and privacy in the last three major Android releases, considering covariance of stakeholder comments, and the level of consistency in user preferences and priorities. Results: Confidentiality and privacy concerns varied in severity, and were most prevalent over Jelly Bean releases. Issues raised in regard to confidentiality related mostly to access, user credentials and permission management, while privacy concerns were mainly expressed about phone locking. Community users also expressed divergent preferences for new security features, ranging from more relaxed to very strict. Conclusion: Strategies that support continuous corrective measures for both old and new Android releases would likely maintain stakeholder confidence. An approach that provides users with basic default security settings, but with the power to configure additional security features if desired, would provide the best balance for Android's wide cohort of stakeholders.

Preprint · 2021 · arXiv

Catching up with Method and Process Practice: An Industry-Informed Baseline for Researchers

Software development methods are usually not applied by the book. Companies are under pressure to continuously deploy software products that meet market needs and stakeholders' requests. To implement efficient and effective development processes, companies utilize multiple frameworks, methods and practices, and combine these into hybrid methods. A common combination contains a rich management framework to organize and steer projects complemented with a number of smaller practices providing the development teams with tools to complete their tasks. In this paper, based on 732 data points collected through an international survey, we study the software development process use in practice. Our results show that 76.8% of the companies implement hybrid methods. Company size as well as the strategy in devising and evolving hybrid methods affect the suitability of the chosen process to reach company or project goals. Our findings show that companies that combine planned improvement programs with process evolution can increase their process' suitability by up to 5%.

Preprint · 2021 · arXiv

Categorising Software Contexts: Research-in-Progress

A growing number of researchers suggest that software process must be tailored to a project's context to achieve maximal performance. Researchers have studied 'context' in an ad-hoc way, with focus on those contextual factors that appear to be of significance. The result is that we have no useful basis upon which to contrast and compare studies. We are currently researching a theoretical basis for software context for the purpose of tailoring and note that a deeper consideration of the meaning of the term 'context' is required before we can proceed. In this paper, we examine the term and present a model based on insights gained from our initial categorisation of contextual factors from the literature. We test our understanding by analysing a further six documents. Our contribution thus far is a model that we believe will support a theoretical operationalisation of software context for the purpose of process tailoring.

Preprint · 2021 · arXiv

Causal Factors, Benefits and Challenges of Test-Driven Development: Practitioner Perceptions

This report describes the experiences of one organization's adoption of Test Driven Development (TDD) practices as part of a medium-term software project employing Extreme Programming as a methodology. Three years into this project the team's TDD experiences are compared with their non-TDD experiences on other ongoing projects. The perceptions of the benefits and challenges of using TDD in this context are gathered through five semi-structured interviews with key team members. Their experiences indicate that use of TDD has generally been positive and the reasons for this are explored to deepen the understanding of TDD practice and its effects on code quality, application quality and development productivity. Lessons learned are identified to aid others with the adoption and implementation of TDD practices, and some potential further research areas are suggested.

Preprint · 2021 · arXiv

Combining Text Mining and Visualization Techniques to Study Teams' Behavioral Processes

There is growing interest in mining software repository data to understand, and predict, various aspects of team processes. In particular, text mining and natural-language processing (NLP) techniques have supported such efforts. Visualization may also supplement text mining to reveal unique multi-dimensional insights into software teams' behavioral processes. We demonstrate the utility of combining these approaches in this study. Future application of these methods to the study of teams' behavioral processes offers promise for both research and practice.

Preprint · 2021 · arXiv

Communication and Personality Profiles of Global Software Developers

Context: Prior research has established that a small proportion of individuals dominate team communication during global software development. It is not known, however, how these members' contributions affect their teams' knowledge diffusion process, or whether their personality profiles are responsible for their dominant presence. Objective: We set out to address this gap through the study of repository artifacts. Method: Artifacts from ten teams were mined from the IBM Rational Jazz repository. We employed social network analysis (SNA) to group practitioners into two clusters, Top Members and Others, based on the numbers of messages they communicated and their engagement in task changes. SNA metrics (density, in-degree and closeness) were then used to study practitioners' importance in knowledge diffusion. Thereafter, we performed psycholinguistic analysis on practitioners' messages using linguistic dimensions that had been previously correlated with the Big Five personality profiles. Results: For our sample of 146 practitioners we found that Top Members occupied critical roles in knowledge diffusion, and demonstrated more openness to experience than the Others. Additionally, all personality profiles were represented during teamwork, although openness to experience, agreeableness and extroversion were particularly evident. However, no specific personality predicted members' involvement in knowledge diffusion. Conclusion: Task assignment that promotes highly connected team communication networks may mitigate tacit knowledge loss in global software teams. Additionally, while members expressing openness to experience are likely to be particularly driven to perform, this is not entirely responsible for a global team's success.
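
The three SNA metrics named in the method (density, in-degree and closeness) can be illustrated on a small directed message graph. This is a plain-Python sketch, not the authors' analysis: the team, the set of message ties and the closeness variant (reachable members divided by the summed shortest-path distances to them) are assumptions, and the split into Top Members and Others is not reproduced.

```python
from collections import deque

def density(nodes, edges):
    """Share of possible directed ties that are present: |E| / (n * (n - 1))."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

def in_degree(nodes, edges):
    """Number of distinct members who sent at least one message to each member."""
    deg = {v: 0 for v in nodes}
    for _sender, receiver in edges:
        deg[receiver] += 1
    return deg

def closeness(nodes, edges, source):
    """Reachable members divided by the sum of shortest-path distances to them
    (0.0 if no one is reachable); one common variant among several."""
    adj = {v: [] for v in nodes}
    for sender, receiver in edges:
        adj[sender].append(receiver)
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search along outgoing message ties
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = len(dist) - 1
    total = sum(dist.values())
    return reached / total if total else 0.0

# Hypothetical four-person team; (sender, receiver) means at least one message was sent
members = ["A", "B", "C", "D"]
ties = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "A"), ("D", "C")}
```

In this toy graph, A reaches everyone in one step and so has the highest closeness, the kind of central position the abstract associates with Top Members in knowledge diffusion.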

Preprint · 2021 · arXiv

Designing Actively Secure, Highly Available Industrial Automation Applications

Programmable Logic Controllers (PLCs) execute critical control software that drives Industrial Automation and Control Systems (IACS). PLCs can become easy targets for cyber-adversaries as they are resource-constrained and are usually built using legacy, less-capable security measures. Security attacks can significantly affect system availability, which is an essential requirement for IACS. We propose a method to make PLC applications more security-aware. Based on the well-known IEC 61499 function blocks standard for developing IACS software, our method allows designers to annotate critical parts of an application during design time. On deployment, these parts of the application are automatically secured using appropriate security mechanisms to detect and prevent attacks. We present a summary of availability attacks on distributed IACS applications that can be mitigated by our proposed method. Security mechanisms are implemented using IEC 61499 Service-Interface Function Blocks (SIFBs) that embed an Intrusion Detection and Prevention System (IDPS) and are added to the application at compile time. This method is more amenable to providing active security protection from attacks on previously unknown (zero-day) vulnerabilities. We test our solution on an IEC 61499 application executing on Wago PFC200 PLCs. Experiments show that we can successfully log and prevent attacks at the application level as well as help the application to gracefully degrade into safe mode, thereby improving availability.

Preprint · 2021 · arXiv

Evaluating prediction systems in software project estimation

Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results with a particular focus on continuous prediction systems. Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the result likelihood relative to the baseline technique of random 'predictions', that is guessing, and (3) calculation of effect sizes. Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing. Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast this new empirical validation framework leads to meaningful results. Such steps will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners.
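
Standardised Accuracy compares a predictor's mean absolute residual (MAR) with that of random guessing, in which each case is "predicted" using the actual value of another randomly chosen case. A minimal Python sketch, assuming SA = (1 - MAR_Pi / mean MAR_P0) × 100 and Glass's Δ = (mean MAR_P0 - MAR_Pi) / sd(MAR_P0) as the effect size; the number of random runs, the seed and the data are illustrative choices, not values from the paper.

```python
import random
import statistics

def mar(actual, predicted):
    """Mean absolute residual between actual and predicted values."""
    return statistics.fmean(abs(a - p) for a, p in zip(actual, predicted))

def random_guessing_mar(actual, runs=1000, seed=0):
    """MAR of the random-guessing baseline P0 for each of `runs` repetitions.
    Each case is predicted by the actual value of another randomly chosen case
    (requires at least two cases)."""
    rng = random.Random(seed)
    n = len(actual)
    results = []
    for _ in range(runs):
        guesses = []
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # shift past case i so a case never predicts itself
            guesses.append(actual[j])
        results.append(mar(actual, guesses))
    return results

def standardised_accuracy(actual, predicted, runs=1000, seed=0):
    """Return (SA as a percentage, Glass's delta relative to random guessing)."""
    p0 = random_guessing_mar(actual, runs, seed)
    mar_p0 = statistics.fmean(p0)
    mar_pi = mar(actual, predicted)
    sa = (1 - mar_pi / mar_p0) * 100
    delta = (mar_p0 - mar_pi) / statistics.stdev(p0)
    return sa, delta

# Hypothetical effort values and a naive predictor that always guesses 30
actual = [10, 20, 30, 40, 50]
sa, delta = standardised_accuracy(actual, [30] * len(actual))
```

An SA near 0 means the predictor does no better than guessing, and a negative SA means it does worse; fixing the seed keeps the baseline reproducible across reruns.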

Preprint · 2021 · arXiv

Factors that Affect Software Systems Development Project Outcomes: A Survey of Research

Determining the factors that have an influence on software systems development and deployment project outcomes has been the focus of extensive and ongoing research for more than 30 years. We provide here a survey of the research literature that has addressed this topic in the period 1996-2006, with a particular focus on empirical analyses. On the basis of this survey we present a new classification framework that represents an abstracted and synthesized view of the types of factors that have been asserted as influencing project outcomes.

Preprint · 2021 · arXiv

Finding faults: A scoping study of fault diagnostics for Industrial Cyber-Physical Systems

Context: As Industrial Cyber-Physical Systems (ICPS) become more connected and widely-distributed, often operating in safety-critical environments, we require innovative approaches to detect and diagnose the faults that occur in them. Objective: We profile fault identification and diagnosis techniques employed in the aerospace, automotive, and industrial control domains. By examining both theoretical presentations and case studies from production environments, we present a profile of the current approaches being employed and identify gaps. Methodology: A scoping study was used to identify and compare fault detection and diagnosis methodologies that are presented in the current literature. Results: Fault identification and analysis studies from 127 papers published from 2004 to 2019 reveal a wide diversity of promising techniques, both emerging and in use. These range from traditional Physics-based Models to Data-Driven Artificial Intelligence (AI) and Knowledge-Based approaches. Predictive diagnostics or prognostics featured prominently across all sectors, along with discussions of techniques including Fault trees, Petri nets and Markov approaches. We also profile some of the techniques that have reached the highest Technology Readiness Levels, showing how those methods are being applied in real-world environments beyond the laboratory. Conclusions: Our results suggest that the continuing wide use of both Model-Based and Data-Driven AI techniques across all domains, especially when they are used together in hybrid configurations, reflects the complexity of the current ICPS application space. While creating sufficiently-complete models is labor-intensive, Model-free AI techniques were shown to be a viable way of addressing aspects of this challenge, demonstrating the increasing sophistication of current machine learning systems. (Abridged)

Preprint · 2021 · arXiv

Investigating a Conceptual Construct for Software Context

A growing number of empirical software engineering researchers suggest that a complementary focus on theory is required if the discipline is to mature. A first step in theory-building involves the establishment of suitable theoretical constructs. For researchers studying software projects, the lack of a theoretical construct for context is problematic for both experimentation and effort estimation. For experiments, insufficiently understood contextual factors confound results, and for estimation, unstated contextual factors affect estimation reliability. We have earlier proposed a framework that we suggest may be suitable as a construct for context, i.e. one that represents a minimal, spanning set for the space of software contexts. The framework has six dimensions, described as Who, Where, What, When, How and Why. In this paper, we report the outcomes of a pilot study to test its suitability by categorising contextual factors from the software engineering literature into the framework. We found that one of the dimensions, Why, does not represent context, but rather is associated with objectives. We also identified some factors that do not clearly fit into the framework and require further investigation. Our contributions are the pursuit of a theoretical approach to understanding software context, the initial establishment and evaluation of a construct for context and the exposure of a lack of clarity of meaning in many 'contexts' currently applied as factors for estimating project outcomes.

Preprint · 2021 · arXiv

Onshore to Near-Shore Outsourcing Transitions: Unpacking Tensions

This study is directed towards highlighting tensions involving incoming and outgoing vendors during outsourcing in a near-shore context. The replacement of an outgoing vendor by an incoming one generates a complex form of relationship in which the participating organizations cooperate and compete simultaneously. It is of great importance to develop knowledge about this kind of relationship, particularly in the current GSE-related multi-sourcing environment. We carried out a longitudinal case study and utilized data from the 'Novopay' project, which is available in the public domain. This project involved an outgoing New Zealand based vendor and an incoming Australian based vendor. The results show that the demand for the same human resources, dependency upon cooperation and collaboration between vendors, reliance on each other's system configurations and the client's use of strategies similar to those that had worked for the previous vendor generated a set of tensions which needed to be continuously managed throughout the project.

Preprint · 2021 · arXiv

Personality Profiles of Global Software Developers

Context: Individuals' personality traits have been shown to influence their behavior during team work. In particular, positive group attitudes are said to be essential for distributed and global software development efforts where collaboration is critical to project success. Objective: Given this, we have sought to study the influence of global software practitioners' personality profiles from a psycholinguistic perspective. Method: Artifacts from ten teams were selected from the IBM Rational Jazz repository and mined. We employed social network analysis (SNA) techniques to identify and group practitioners into two clusters based on the numbers of messages they communicated, Top Members and Others, and used standard statistical techniques to assess practitioners' engagement in task changes associated with work items. We then performed psycholinguistic analysis on practitioners' messages using linguistic dimensions of the LIWC tool that had been previously correlated with the Big Five personality profiles. Results: For our sample of 146 practitioners, we found that the Top Members demonstrated more openness to experience than the Other practitioners. Additionally, practitioners involved in usability-related tasks were found to be highly extroverted, and coders were most neurotic and conscientious. Conclusion: High levels of organizational and inter-personal skills may be useful for those operating in distributed settings, and personality diversity is likely to boost team performance.

Preprint · 2021 · arXiv

Progress Report on a Proposed Theory for Software Development

There is growing acknowledgement within the software engineering community that a theory of software development is needed to integrate the myriad methodologies that are currently popular, some of which are based on opposing perspectives. We have been developing such a theory for a number of years. In this position paper, we overview our theory along with progress made thus far. We suggest that, once fully developed, this theory, or one similar to it, may be applied to support situated software development, by providing an overarching model within which software initiatives might be categorised and understood. Such understanding would inevitably lead to greater predictability with respect to outcomes.

Preprint · 2021 · arXiv

Qualitative Research on Software Development: A Longitudinal Case Study Methodology

This paper reports the use of a qualitative methodology for conducting longitudinal case study research on software development. We provide a detailed description and explanation of appropriate methods of qualitative data collection and analysis that can be utilized by other researchers in the software engineering field. Our aim is to illustrate the utility of longitudinal case study research, as a complement to existing methodologies for studying software development, so as to enable the community to develop a fuller and richer understanding of this complex, multi-dimensional phenomenon. We discuss the insights gained and lessons learned from applying a longitudinal qualitative approach to an empirical case study of a software development project in a large multi-national organization. We evaluate the methodology used to emphasize its strengths and to address the criticisms traditionally made of qualitative research.

Preprint · 2021 · arXiv

Relating IS Developers' Attitudes to Engagement

Increasing effort is being directed to understanding the personality profiles of highly engaged information systems (IS) developers and the impact of such profiles on development outcomes. However, there has been a lesser degree of attention paid to studying attitudes at a fine-grained level, and relating such attitudes to developers' in-process activities, in spite of the fact that social motivation theory notes the importance of such a relationship in general group work. We have therefore applied linguistic analysis, text mining and visualization, and statistical analysis techniques to artefacts developed by 474 developers to study these issues. Our results indicate that our sample of IS developers conveyed a range of attitudes while working to deliver systems features, and those practitioners who communicated the most were also the most engaged. Additionally, of eight linguistic dimensions considered, expressions regarding work and achievement, as well as insightful attitudes, were most closely related to developers' engagement. Accordingly, team diversity and the provision of active support for outcome-driven developers may contribute positively to maintaining team balance and performance.

Preprint · 2021 · arXiv

The Impact of Sampling and Rule Set Size on Generated Fuzzy Inference System Predictive Accuracy: Analysis of a Software Engineering Data Set

Software project management makes extensive use of predictive modeling to estimate product size, defect proneness and development effort. Although uncertainty is acknowledged in these tasks, fuzzy inference systems, designed to cope well with uncertainty, have received only limited attention in the software engineering domain. In this study we empirically investigate the impact of two choices on the predictive accuracy of generated fuzzy inference systems when applied to a software engineering data set: sampling of observations for training and testing; and the size of the rule set generated using fuzzy c-means clustering. Over ten samples we found no consistent pattern of predictive performance for a given rule set size. We did find, however, that a rule set compiled from multiple samples generally resulted in more accurate predictions than single-sample rule sets. More generally, the results provide further evidence of the sensitivity of empirical analysis outcomes to specific model-building decisions.
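
Fuzzy c-means, the clustering step used here to generate rule sets, gives each observation a degree of membership in every cluster rather than a single label. A plain-Python sketch follows, assuming the standard alternating updates for centres and memberships with fuzzifier m = 2; the data are hypothetical, and the step that turns clusters into fuzzy rules is not shown (one rule per cluster is a common convention, under which c would set the rule set size, but that is an assumption here).

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster `points` (list of equal-length tuples) into c fuzzy clusters.
    Returns (centres, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, and each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each point's row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m
        centres = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centres.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / total
                                 for d in range(dim)))
        # Membership update: inverse relative distance raised to 2 / (m - 1)
        power = 2.0 / (m - 1.0)
        new_u = []
        for k in range(n):
            dists = [math.dist(points[k], centres[i]) for i in range(c)]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a centre: give it full membership there
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((dists[i] / dists[j]) ** power for j in range(c))
                       for i in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centres, u

# Hypothetical one-dimensional observations with two obvious groups
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.3,), (4.9,)]
centres, memberships = fuzzy_c_means(points, c=2)
```

Because the result depends on the random initialisation and on which observations are supplied, rerunning with different samples or seeds can change the centres, which is one route by which sampling choices can affect the generated rules.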

preprint2021arXiv

The Many Facets of Distance and Space: the Mobility of Actors in Globally Distributed Project Teams

Global software development practices are shaped by the challenges of time and 'distance', notions perceived to separate sites in a multi-site collaboration. Yet while sites may be fixed, the actors in global projects are mobile, so distance becomes a dynamic spatial dimension rather than a static concept. This empirical study applies grounded theory to unpack the nature of mobility within a three-site globally distributed team setting. We develop a model for mapping the movements of team members in local and global spaces, and demonstrate its operation through static snapshots and dynamic patterns evolving over time. Through this study we highlight the complexity of 'mobility' as one facet of 'space' in globally distributed teams and illuminate its tight coupling with the accompanying dimensions of accessibility and context awareness.

preprint2021arXiv

The significance of user-defined identifiers in Java source code authorship identification

When writing source code, programmers have varying degrees of freedom in the creation and use of identifiers. Do they habitually use the same identifiers, names that differ from those used by others? Is it then possible to tell who wrote a piece of code by examining these identifiers? If so, can the presence or absence of identifiers help in correctly classifying programs by author? Is it possible to hide the provenance of programs by renaming identifiers? In this study, we assess the importance of three types of identifiers in source code author classification for two different Java program data sets. We do this through a sequence of experiments in which we disguise one type of identifier at a time, using the Source Code Author Profiles (SCAP) method as the classification tool. The results show that, although identifiers examined as a whole do not seem to reflect program authorship for these data sets, when examined separately there is evidence that class names do signal the author of the program. In contrast, simple variable and method names used in Java programs do not appear to reflect program authorship; rather, our analysis suggests that such identifiers are so common as to mask authorship. We believe these results are relevant to the robustness of code plagiarism analysis, and that the underlying methods could be valuable in litigation arising from disputes over program authorship.

preprint2021arXiv

They'll Know It When They See It: Analyzing Post-Release Feedback from the Android Community

It is known that user involvement and user-centered design enhance system acceptance, particularly when end-users' views are considered early in the process. However, the increasingly common method of system deployment, through frequent releases via an online application distribution platform, relies more on post-release feedback from a virtual community. Such feedback may be received from large and diverse communities of users, posing challenges to developers in terms of extracting and identifying the most pressing requests to address. In seeking to tackle these challenges we have used natural language processing techniques to study enhancement requests logged by the Android community. We observe that features associated with a specific subset of topics were most frequently requested for improvement, and that end-users expressed particular discontent with the Jellybean release. End-users also tended to request improvements to specific issues together, potentially posing a prioritization challenge to Google.

preprint2021arXiv

Understanding Technology Use in Global Virtual Teams: Research Methodologies and Methods

Context: The globalisation of activities associated with software development and use has introduced many challenges for both practice and research. While the predominant approach to research in software engineering has followed a positivist science model, this approach may be sub-optimal when addressing problems with a dominant social or cultural dimension, such as those frequently encountered when studying work practices in a globally distributed team setting. The investigation of such a team reported in this paper provides one example of an alternative approach to research in a global context: a longitudinal interpretive field study seeking to understand how global virtual teams mediated the use of technology. Objective: Our focus in this paper is on the conduct of research in the context of global software activities, particularly as applied to the actions and interactions of global virtual teams. Method: We describe how we undertook a substantial field study of global virtual teams, and highlight how adopting structuration theory enabled us to meet our goals effectively. Results: We believe that the approach taken suited a research context in which situated practices were occurring over time in a highly complex domain, ensuring that our results were both strongly grounded and relevant to practice. The study generated substantive theory and techniques that have been adapted and applied on a pilot basis in further field settings. Conclusion: We conclude that globally distributed teamwork presents a complex context which demands new research approaches, beyond the limited set customarily applied by software engineering researchers. We advocate experimenting with different research methodologies and methods so that we have a more rounded repertoire for addressing the most important and relevant issues in global software development research. (Abridged)

preprint2021arXiv

Understanding the attitudes, knowledge sharing behaviors and task performance of core developers: A longitudinal study

Context: Prior research has established that a few individuals generally dominate project communication and source code changes during software development, regardless of task assignments at project initiation. Objective: While this phenomenon has been noted, prior research has not sought to understand these dominant individuals. Previous work has found that core communicators are the gatekeepers of their teams' knowledge, and that the performance of these members was correlated with their teams' success. Building on this work, we employed a longitudinal approach to study how core developers' attitudes, knowledge sharing behaviors and task performance change over the course of their project. Method: We first used Social Network Analysis (SNA) and standard statistical analysis techniques to identify and select artifacts and central practitioners from ten different software development teams. We then applied psycholinguistic analysis and directed content analysis (CA) techniques to interpret the content of these practitioners' messages. Finally, we inspected core developers' activities at various points in time during system development. Results: Among our findings, we observe that core developers' attitudes and knowledge sharing behaviors were linked to their involvement in actual software development and the demands of their wider project teams. However, core developers appeared to naturally possess high levels of insightful characteristics. Conclusion: Project performance would likely benefit from strategies aimed at surrounding core developers with other competent communicators. Core developers should also be supported by a wider team who are willing to ask questions and challenge their ideas. Finally, the availability of adequate communication channels would help maintain a positive team climate, especially in distributed development. (Abridged)

preprint2021arXiv

Using Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews

Background: A systematic literature review (SLR) is a methodology used to aggregate all relevant existing evidence to answer a research question of interest. Although crucial, the process used to select primary studies can be arduous and time-consuming, and it often must be conducted manually. Objective: We propose a novel approach, known as 'Systematic Literature Review based on Visual Text Mining' or simply SLR-VTM, to support the primary study selection activity using visual text mining (VTM) techniques. Method: We conducted a case study to compare the performance and effectiveness of four doctoral students in selecting primary studies manually and using the SLR-VTM approach. To enable the comparison, we also developed a VTM tool that implemented our approach. We hypothesized that students using SLR-VTM would show improved selection performance and effectiveness. Results: Our results show that incorporating VTM into the SLR study selection activity reduced the time spent in this activity and also increased the number of studies correctly included. Conclusions: Our pilot case study presents promising results suggesting that the use of VTM may indeed be beneficial during the study selection activity when performing an SLR.

preprint2021arXiv

Valuing Evaluation: Methodologies to Bridge Research and Practice

The potential disconnect between research and practice in software engineering (SE) means that the uptake of research outcomes has at times been limited. In this paper we seek to identify research approaches that are rigorous in terms of method but that are also relevant to software engineering practitioners. After considering the correspondence of several approaches to software systems research and practice we recommend a framework for applying grounded theory in SE research, as a means of delivering both robust and useful outcomes.

preprint2021arXiv

Walking Through the Method Zoo: Does Higher Education really meet Software Industry Demands?

Software engineering educators are continually challenged by rapidly evolving concepts, technologies, and industry demands. Given the omnipresence of software in a digitalized society, higher education institutions (HEIs) must educate students so that they learn how to learn, and so that they are equipped with sound foundational knowledge as well as up-to-date knowledge of modern software and system development. Since industry demands change constantly, HEIs are challenged to meet current and future demands in a timely manner. This paper analyzes the current state of practice in software engineering education. Specifically, we compare contemporary education with industrial practice to understand whether the frameworks, methods and practices for software and system development taught at HEIs reflect industrial practice. To this end, we conducted an online survey and collected information about 67 software engineering courses. Our findings show that the development approaches taught at HEIs reflect industrial practice quite closely. We also found that the choice of which process to teach is sometimes driven by the wish to make a course successful. Especially when this happens in project courses, it could be beneficial to put more emphasis on building learning sequences with other courses.

preprint2021arXiv

What Affects Team Behavior? Preliminary Linguistic Analysis of Communications in the Jazz Repository

There is a growing belief that understanding and addressing the human processes employed during software development is likely to provide substantially more value to industry than yet more recommendations for the implementation of various methods and tools. To this end, considerable research effort has been dedicated to studying human issues as represented in software artifacts, owing to the relatively unobtrusive nature of such study. We have followed this line of research and conducted a preliminary study of team behaviors using data mining techniques and linguistic analysis. Our data source, the IBM Rational Jazz repository, was mined and data from three different project areas were extracted. Communications in these projects were then analyzed using the LIWC linguistic analysis tool. We found that although there are some variations in language use among teams working on project areas dedicated to different software outcomes, project type and the mix (and number) of individuals involved did not affect team behaviors as evident in their communications. These assessments are initial conjectures, however; we plan further exploratory analysis to validate these results. We explain these findings and discuss their implications for software engineering practice.

preprint2021arXiv

What are Hybrid Development Methods Made Of? An Evidence-based Characterization

Among the multitude of software development processes available, hardly any is used by the book. Regardless of company size or industry sector, a majority of project teams and companies use customized processes that combine different development methods -- so-called hybrid development methods. Even though such hybrid development methods are highly individualized, a common understanding of how to systematically construct synergetic practices is missing. In this paper, we make a first step towards devising such guidelines. Grounded in 1,467 data points from a large-scale online survey among practitioners, we study the current state of practice in process use to answer the question: What are hybrid development methods made of? Our findings reveal that only eight methods and a few practices form the core of modern software development. This small set allows hybrid development methods to be constructed statistically. Using an 85% agreement level in the participants' selections, we provide two examples illustrating how hybrid development methods are characterized by the practices they are made of. Our evidence-based analysis approach lays the foundation for devising hybrid development methods.

preprint2020arXiv

A Model for Software Contexts

It is widely acknowledged by researchers and practitioners that software development methodologies are generally adapted to suit specific project contexts. Research into practices-as-implemented has been fragmented and has tended to focus either on the strength of adherence to a specific methodology or on how contextual factors affect the efficacy of specific practices. We argue for a more holistic, integrated approach to investigating context-related best practice. We propose a six-dimensional model of the problem space, with the dimensions organisational drivers (why), space and time (where), culture (who), product life-cycle stage (when), product constraints (what) and engagement constraints (how). We test our model by using it to describe and explain a reported implementation study. Our contributions are a novel approach to understanding situated software practices and a preliminary model for software contexts.

preprint2020arXiv

Consolidating a Model for Describing Situated Software Practices

Many prescriptive approaches to developing software-intensive systems have been advocated, but each is based on assumptions about context. It has been found that practitioners do not follow prescribed methodologies, but rather select and adapt specific practices according to local needs. As researchers, we would like to be in a position to support such tailoring. However, at present we simply do not have sufficient evidence relating practice and context to make this possible. We have long understood that a deeper understanding of situated software practices is crucial for progress in this area, and have been exploring this problem from a number of perspectives. In this position paper, we draw together the various aspects of our work into a holistic model and discuss how the model might be applied to support the long-term goal of evidence-based decision support for practitioners. The contribution specific to this paper is a discussion of model evaluation, including a proof-of-concept demonstration of model utility. We map Kernel elements from the Essence system to our model and discuss gaps and limitations exposed in the Kernel. Finally, we outline our plans for further refining and evaluating the model.

preprint2020arXiv

Research in Global Software Engineering: A Systematic Snapshot

This paper reports our extended analysis of the recent literature addressing global software engineering (GSE), using a new Systematic Snapshot Mapping (SSM) technique. The primary purpose of this work is to understand what issues are being addressed and how research is being carried out in GSE -- and comparatively, what work is not being conducted. We carried out the analysis in two stages. In the first stage we analyzed 275 papers published between January 2011 and June 2012, and in the second stage we augmented our analysis by considering a further 26 papers from the 2013 International Conference on Global Software Engineering (ICGSE'13). Our results reveal that, currently, GSE studies are focused on management- and infrastructure-related factors, principally using evaluative research approaches. Most of the studies are conducted at the organizational level, mainly using methods such as interviews, surveys, field studies and case studies. The USA, India and China are major players in GSE, with USA-India collaborations being the most frequently studied, followed by USA-China. While a considerable number of GSE-related studies have been published since January 2011, they are currently quite narrowly focused on exploratory research and explanatory theories, and the critical research paradigm has been untouched. The absence of formulative research, experimentation and simulation, together with the related focus on evaluative approaches, suggests that existing tools, methods and approaches from related fields are being tested in the GSE context, even though these may not be inherently applicable to the additional scale and complexity of GSE.