Source author record

Hideaki Hata

Hideaki Hata appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Computer Science and Game Theory Artificial Intelligence Digital Libraries

Catalog footprint

What is connected

11works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

"My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors

GitHub Sponsors was launched in 2019, enabling donations to open-source software developers to provide financial support, as per GitHub's slogan: "Invest in the projects you depend on". However, a 2022 study on GitHub Sponsors found that only two-fifths of developers who were seeking sponsorship received a donation. The study found that, other than internal actions (such as offering perks to sponsors), developers had advertised their GitHub Sponsors profiles on social media, such as Twitter (also known as X). Therefore, in this work, we investigate the impact of tweets that contain links to GitHub Sponsors profiles on sponsorship, as well as their reception on Twitter/X. We further characterize these tweets to understand their context and find that (1) such tweets have the impact of increasing the number of sponsors acquired, (2) compared to other donation platforms such as Open Collective and Patreon, GitHub Sponsors has significantly fewer interactions but is more visible on Twitter/X, and (3) developers tend to contribute more to open-source software during the week of posting such tweets. Our findings are the first step toward investigating the impact of social media on obtaining funding to sustain open-source software.

preprint2022arXiv

GitHub Sponsors: Exploring a New Way to Contribute to Open Source

GitHub Sponsors, launched in 2019, enables donations to individual open source software (OSS) developers. Financial support for OSS maintainers and developers is a major issue in terms of sustaining OSS projects, and the ability to donate to individuals is expected to support the sustainability of developers, projects, and community. In this work, we conducted a mixed-methods study of GitHub Sponsors, including quantitative and qualitative analyses, to understand the characteristics of developers who are likely to receive donations and what developers think about donations to individuals. We found that: (1) sponsored developers are more active than non-sponsored developers, (2) the possibility to receive donations is related to whether there is someone in their community who is donating, and (3) developers are sponsoring as a new way to contribute to OSS. Our findings are the first step towards data-informed guidance for using GitHub Sponsors, opening up avenues for future work on this new way of financially sustaining the OSS community.

preprint2022arXiv

Release as a Contract: A Concept of Meta-Maintenance for the Entire FLOSS Ecosystem

We advocate for a paradigm shift in supporting free/libre and open source software (FLOSS) ecosystem maintenance, from focusing on individual projects to monitoring a whole organic system of the entire FLOSS ecosystem, which we call software meta-maintenance. We discuss challenges of building a global source code management system, a global issue management system, and FLOSS human capital index, based on the blockchain technologies.

preprint2022arXiv

Software Supply Chain Map: How Reuse Networks Expand

Clone-and-own is a typical code reuse approach because of its simplicity and efficiency. Cloned software components are maintained independently by a new owner. These clone-and-own operations can be occurred sequentially, that is, cloned components can be cloned again and owned by other new owners on the supply chain. In general, code reuse is not documented well, consequently, appropriate changes like security patches cannot be propagated to descendant software projects. However, the OpenChain Project defined identifying and tracking source code reuses as responsibilities of FLOSS software staffs. Hence supporting source code reuse awareness is in a real need. This paper studies software reuse relations in FLOSS ecosystem. Technically, clone-and-own reuses of source code can be identified by file-level clone set detection. Since change histories are associated with files, we can determine origins and destinations in reusing across multiple software by considering times. By building software supply chain maps, we find that clone-and-own is prevalent in FLOSS development, and set of files are reused widely and repeatedly. These observations open up future challenges of maintaining and tracking global software genealogies.

preprint2021arXiv

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub

Online collaboration platforms such as GitHub have provided software developers with the ability to easily reuse and share code between repositories. With clone-and-own and forking becoming prevalent, maintaining these shared files is important, especially for keeping the most up-to-date version of reused code. Different to related work, we propose the concept of meta-maintenance -- i.e., tracking how the same files evolve in different repositories with the aim to provide useful maintenance opportunities to those files. We conduct an exploratory study by analyzing repositories from seven different programming languages to explore the potential of meta-maintenance. Our results indicate that a majority of active repositories on GitHub contains at least one file which is also present in another repository, and that a significant minority of these files are maintained differently in the different repositories which contain them. We manually analyzed a representative sample of shared files and their variants to understand which changes might be useful for meta-maintenance. Our findings support the potential of meta-maintenance and open up avenues for future work to capitalize on this potential.

preprint2020arXiv

Ammonia: An Approach for Deriving Project-specific Bug Patterns

Finding and fixing buggy code is an important and cost-intensive maintenance task, and static analysis (SA) is one of the methods developers use to perform it. SA tools warn developers about potential bugs by scanning their source code for commonly occurring bug patterns, thus giving those developers opportunities to fix the warnings (potential bugs) before they release the software. Typically, SA tools scan for general bug patterns that are common to any software project (such as null pointer dereference), and not for project specific patterns. However, past research has pointed to this lack of customizability as a severe limiting issue in SA. Accordingly, in this paper, we propose an approach called Ammonia, which is based on statically analyzing changes across the development history of a project, as a means to identify project-specific bug patterns. Furthermore, the bug patterns identified by our tool do not relate to just one developer or one specific commit, they reflect the project as a whole and compliment the warnings from other SA tools that identify general bug patterns. Herein, we report on the application of our implemented tool and approach to four Java projects: Ant, Camel, POI, and Wicket. The results obtained show that our tool could detect 19 project specific bug patterns across those four projects. Next, through manual analysis, we determined that six of those change patterns were actual bugs and submitted pull requests based on those bug patterns. As a result, five of the pull requests were merged.

preprint2020arXiv

From Academia to Software Development: Publication Citations in Source Code Comments

Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve 0.90 in $F_1$ as detection accuracy. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithm is the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.

preprint2020arXiv

Optimizing Affine Maximizer Auctions via Linear Programming: an Application to Revenue Maximizing Mechanism Design for Zero-Day Exploits Markets

Optimizing within the affine maximizer auctions (AMA) is an effective approach for revenue maximizing mechanism design. The AMA mechanisms are strategy-proof and individually rational (if the agents' valuations for the outcomes are nonnegative). Every AMA mechanism is characterized by a list of parameters. By focusing on the AMA mechanisms, we turn mechanism design into a value optimization problem, where we only need to adjust the parameters. We propose a linear programming based heuristic for optimizing within the AMA family. We apply our technique to revenue maximizing mechanism design for zero-day exploit markets. We show that due to the nature of the zero-day exploit markets, if there are only two agents (one offender and one defender), then our technique generally produces a near optimal mechanism: the mechanism's expected revenue is close to the optimal revenue achieved by the optimal strategy-proof and individually rational mechanism (not necessarily an AMA mechanism).

preprint2020arXiv

Predicting Defective Lines Using a Model-Agnostic Technique

Defect prediction models are proposed to help a team prioritize source code areas files that need Software QualityAssurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole filewhile only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e.,LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds including model construction and defective line identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defectswhile requiring a smaller amount of inspection effort and a manageable computation cost.

preprint2020arXiv

Revenue Maximizing Markets for Zero-Day Exploits

Markets for zero-day exploits (software vulnerabilities unknown to the vendor) have a long history and a growing popularity. We study these markets from a revenue-maximizing mechanism design perspective. We first propose a theoretical model for zero-day exploits markets. In our model, one exploit is being sold to multiple buyers. There are two kinds of buyers, which we call the defenders and the offenders. The defenders are buyers who buy vulnerabilities in order to fix them (e.g., software vendors). The offenders, on the other hand, are buyers who intend to utilize the exploits (e.g., national security agencies and police). Our model is more than a single-item auction. First, an exploit is a piece of information, so one exploit can be sold to multiple buyers. Second, buyers have externalities. If one defender wins, then the exploit becomes worthless to the offenders. Third, if we disclose the details of the exploit to the buyers before the auction, then they may leave with the information without paying. On the other hand, if we do not disclose the details, then it is difficult for the buyers to come up with their private valuations. Considering the above, our proposed mechanism discloses the details of the exploit to all offenders before the auction. The offenders then pay to delay the exploit being disclosed to the defenders.

preprint2019arXiv

Towards Generation of Visual Attention Map for Source Code

Program comprehension is a dominant process in software development and maintenance. Experts are considered to comprehend the source code efficiently by directing their gaze, or attention, to important components in it. However, reflecting the importance of components is still a remaining issue in gaze behavior analysis for source code comprehension. Here we show a conceptual framework to compare the quantified importance of source code components with the gaze behavior of programmers. We use "attention" in attention models (e.g., code2vec) as the importance indices for source code components and evaluate programmers' gaze locations based on the quantified importance. In this report, we introduce the idea of our gaze behavior analysis using the attention map, and the results of a preliminary experiment.

Hideaki Hata

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

"My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors

GitHub Sponsors: Exploring a New Way to Contribute to Open Source

Release as a Contract: A Concept of Meta-Maintenance for the Entire FLOSS Ecosystem

Software Supply Chain Map: How Reuse Networks Expand

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub

Ammonia: An Approach for Deriving Project-specific Bug Patterns

From Academia to Software Development: Publication Citations in Source Code Comments

Optimizing Affine Maximizer Auctions via Linear Programming: an Application to Revenue Maximizing Mechanism Design for Zero-Day Exploits Markets

Predicting Defective Lines Using a Model-Agnostic Technique

Revenue Maximizing Markets for Zero-Day Exploits

Towards Generation of Visual Attention Map for Source Code