Source author record

Mohamed Soliman

Mohamed Soliman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Artificial Intelligence Databases Computation and Language

Catalog footprint

What is connected

6works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

Technical debt is a metaphor indicating sub-optimal solutions implemented for short-term benefits by sacrificing the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g. using a TODO comment); this is called Self-Admitted Technical Debt or SATD. Most work on automatically identifying SATD focuses on source code comments. In addition to source code comments, issue tracking systems have shown to be another rich source of SATD, but there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (that break down to 23,180 sections of issues) from seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and Google Monorail). We then propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin with regard to the F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) extracted SATD keywords are intuitive and potentially indicating types and indicators of SATD; 4) projects using different issue tracking systems have less common SATD keywords compared to projects using the same issue tracking system; 5) a small amount of training data is needed to achieve good accuracy.

preprint2022arXiv

Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale

We introduce Saga, a next-generation knowledge construction and serving platform for powering knowledge-based applications at industrial scale. Saga follows a hybrid batch-incremental design to continuously integrate billions of facts about real-world entities and construct a central knowledge graph that supports multiple production use cases with diverse requirements around data freshness, accuracy, and availability. In this paper, we discuss the unique challenges associated with knowledge graph construction at industrial scale, and review the main components of Saga and how they address these challenges. Finally, we share lessons-learned from a wide array of production use cases powered by Saga.

preprint2022arXiv

Symptoms of Architecture Erosion in Code Reviews: A Study of Two OpenStack Projects

The phenomenon of architecture erosion can negatively impact the maintenance and evolution of software systems, and manifest in a variety of symptoms during software development. While erosion is often considered rather late, its symptoms can act as early warnings to software developers, if detected in time. In addition to static source code analysis, code reviews can be a source of detecting erosion symptoms and subsequently taking action. In this study, we investigate the erosion symptoms discussed in code reviews, as well as their trends, and the actions taken by developers. Specifically, we conducted an empirical study with the two most active Open Source Software (OSS) projects in the OpenStack community (i.e., Nova and Neutron). We manually checked 21,274 code review comments retrieved by keyword search and random selection, and identified 502 code review comments (from 472 discussion threads) that discuss erosion. Our findings show that (1) the proportion of erosion symptoms is rather low, yet notable in code reviews and the most frequently identified erosion symptoms are architectural violation, duplicate functionality, and cyclic dependency; (2) the declining trend of the identified erosion symptoms in the two OSS projects indicates that the architecture tends to stabilize over time; and (3) most code reviews that identify erosion symptoms have a positive impact on removing erosion symptoms, but a few symptoms still remain and are ignored by developers. The results suggest that (1) code review provides a practical way to reduce erosion symptoms; and (2) analyzing the trend of erosion symptoms can help get an insight about the erosion status of software systems, and subsequently avoid the potential risk of architecture erosion.

preprint2022arXiv

Understanding Software Architecture Erosion: A Systematic Mapping Study

Architecture erosion (AEr) can adversely affect software development and has received significant attention in the last decade. However, there is an absence of a comprehensive understanding of the state of research about the reasons and consequences of AEr, and the countermeasures to address AEr. This work aims at systematically investigating, identifying, and analyzing the reasons, consequences, and ways of detecting and handling AEr. With 73 studies included, the main results are as follows: (1) AEr manifests not only through architectural violations and structural issues but also causing problems in software quality and during software evolution; (2) non-technical reasons that cause AEr should receive the same attention as technical reasons, and practitioners should raise awareness of the grave consequences of AEr, thereby taking actions to tackle AEr-related issues; (3) a spectrum of approaches, tools, and measures has been proposed and employed to detect and tackle AEr; and (4) three categories of difficulties and five categories of lessons learned on tackling AEr were identified. The results can provide researchers a comprehensive understanding of AEr and help practitioners handle AEr and improve the sustainability of their architecture. More empirical studies are required to investigate the practices of detecting and addressing AEr in industrial settings.

preprint2020arXiv

Identification and Remediation of Self-Admitted Technical Debt in Issue Trackers

Technical debt refers to taking shortcuts to achieve short-term goals, which might negatively influence software maintenance in the long-term. There is increasing attention on technical debt that is admitted by developers in source code comments (termed as self-admitted technical debt or SATD). But SATD in issue trackers is relatively unexplored. We performed a case study, where we manually examined 500 issues from two open source projects (i.e. Hadoop and Camel), which contained 152 SATD items. We found that: 1) eight types of technical debt are identified in issues, namely architecture, build, code, defect, design, documentation, requirement, and test debt; 2) developers identify technical debt in issues in three different points in time, and a small part is identified by its creators; 3) the majority of technical debt is paid off, 4) mostly by those who identified it or created it; 5) the median time and average time to repay technical debt are 872.3 and 25.0 hours respectively.

preprint2011arXiv

Automatic Wrappers for Large Scale Web Extraction

We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based techniques require, we are able to perform information extraction at web-scale, with accuracy unattained with existing unsupervised extraction techniques. Our system is used in production at Yahoo! and powers live applications.

Mohamed Soliman

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale

Symptoms of Architecture Erosion in Code Reviews: A Study of Two OpenStack Projects

Understanding Software Architecture Erosion: A Systematic Mapping Study

Identification and Remediation of Self-Admitted Technical Debt in Issue Trackers

Automatic Wrappers for Large Scale Web Extraction