Source author record

Dror G. Feitelson

Dror G. Feitelson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Computer Science and Game Theory Distributed, Parallel, and Cluster Computing Machine Learning

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

How Developers Extract Functions: An Experiment

Creating functions is at the center of writing computer programs. But there has been little empirical research on how this is done and what are the considerations that developers use. We design an experiment in which we can compare the decisions made by multiple developers under exactly the same conditions. The experiment is based on taking existing production code, "flattening" it into a single monolithic function, and then charging developers with the task of refactoring it to achieve a better design by extracting functions. The results indicate that developers tend to extract functions based on structural cues, such as 'if' or 'try' blocks. And while there are significant correlations between the refactorings performed by different developers, there are also significant differences in the magnitude of refactoring done. For example, the number of functions that were extracted was between 3 and 10, and the amount of code extracted into functions ranged from 37% to 95%.

preprint2022arXiv

When Are Names Similar Or the Same? Introducing the Code Names Matcher Library

Program code contains functions, variables, and data structures that are represented by names. To promote human understanding, these names should describe the role and use of the code elements they represent. But the names given by developers show high variability, reflecting the tastes of each developer, with different words used for the same meaning or the same words used for different meanings. This makes comparing names hard. A precise comparison should be based on matching identical words, but also take into account possible variations on the words (including spelling and typing errors), reordering of the words, matching between synonyms, and so on. To facilitate this we developed a library of comparison functions specifically targeted to comparing names in code. The different functions calculate the similarity between names in different ways, so a researcher can choose the one appropriate for his specific needs. All of them share an attempt to reflect human perceptions of similarity, at the possible expense of lexical matching.

preprint2020arXiv

The Corrective Commit Probability Code Quality Metric

We present a code quality metric, Corrective Commit Probability (CCP), measuring the probability that a commit reflects corrective maintenance. We show that this metric agrees with developers' concept of quality, informative, and stable. Corrective commits are identified by applying a linguistic model to the commit messages. Corrective commits are identified by applying a linguistic model to the commit messages. We compute the CCP of all large active GitHub projects (7,557 projects with at least 200 commits in 2019). This leads to the creation of a quality scale, suggesting that the bottom 10% of quality projects spend at least 6 times more effort on fixing bugs than the top 10%. Analysis of project attributes shows that lower CCP (higher quality) is associated with smaller files, lower coupling, use of languages like JavaScript and C# as opposed to PHP and C++, fewer developers, lower developer churn, better onboarding, and better productivity. Among other things these results support the "Quality is Free" claim, and suggest that achieving higher quality need not require higher expenses.

preprint2015arXiv

Using Students as Experimental Subjects in Software Engineering Research -- A Review and Discussion of the Evidence

Should students be used as experimental subjects in software engineering? Given that students are in many cases readily available and cheap it is no surprise that the vast majority of controlled experiments in software engineering use them. But they can be argued to constitute a convenience sample that may not represent the target population (typically "real" developers), especially in terms of experience and proficiency. This causes many researchers (and reviewers) to have reservations about the external validity of student-based experiments, and claim that students should not be used. Based on an extensive review of published works that have compared students to professionals, we find that picking on "students" is counterproductive for two main reasons. First, classifying experimental subjects by their status is merely a proxy for more important and meaningful classifications, such as classifying them according to their abilities, and effort should be invested in defining and using these more meaningful classifications. Second, in many cases using students is perfectly reasonable, and student subjects can be used to obtain reliable results and further the research goals. In particular, this appears to be the case when the study involves basic programming and comprehension skills, when tools or methodologies that do not require an extensive learning curve are being compared, and in the initial formative stages of large industrial research initiatives -- in other words, in many of the cases that are suitable for controlled experiments of limited scope.

preprint2011arXiv

No justified complaints: On fair sharing of multiple resources

Fair allocation has been studied intensively in both economics and computer science, and fair sharing of resources has aroused renewed interest with the advent of virtualization and cloud computing. Prior work has typically focused on mechanisms for fair sharing of a single resource. We provide a new definition for the simultaneous fair allocation of multiple continuously-divisible resources. Roughly speaking, we define fairness as the situation where every user either gets all the resources he wishes for, or else gets at least his entitlement on some bottleneck resource, and therefore cannot complain about not getting more. This definition has the same desirable properties as the recently suggested dominant resource fairness, and also handles the case of multiple bottlenecks. We then prove that a fair allocation according to this definition is guaranteed to exist for any combination of user requests and entitlements (where a user's relative use of the different resources is fixed). The proof, which uses tools from the theory of ordinary differential equations, is constructive and provides a method to compute the allocations numerically.