Automatic calculation of process metrics and their bug prediction capabilities

Péter Gyimesi

doi:10.14232/actacyb.23.2.2017.7

Authors

Péter Gyimesi

DOI:

https://doi.org/10.14232/actacyb.23.2.2017.7

Abstract

Identifying fault-prone code parts is useful for the developers to help reduce the time required for locating bugs. It is usually done by characterizing the already known bugs with certain kinds of metrics and building a predictive model from the data. For the characterization of bugs, software product and process metrics are the most popular ones. The calculation of product metrics is supported by many free and commercial software products. However, tools that are capable of computing process metrics are quite rare. In this study, we present a method of computing software process metrics in a graph database. We describe the schema of the database created and we present a way to readily get the process metrics from it. With this technique, process metrics can be calculated at the file, class and method levels. We used GitHub as the source of the change history and we selected 5 open-source Java projects for processing. To retrieve positional information about the classes and methods, we used SourceMeter, a static source code analyzer tool. We used Neo4j as the graph database engine, and its query language - cypher - to get the process metrics. We published the tools we created as open-source projects on GitHub. To demonstrate the utility of our tools, we selected 25 release versions of the 5 Java projects and calculated the process metrics for all of the source code elements (files, classes and methods) in these versions. Using our previous published bug database, we built bug databases for the selected projects that contain the computed process metrics and the corresponding bug numbers for files and classes. (We published these databases as an online appendix.) Then we applied 13 machine learning algorithms on the database we created to find out if it is feasible for bug prediction purposes. We achieved F-measure values on average of around 0.7 at the class level, and slightly better values of between 0.7 and 0.75 at the file level. The best performing algorithm was the RandomForest method for both cases.

Downloads

Download data is not yet available.

Automatic calculation of process metrics and their bug prediction capabilities

Authors

DOI:

Abstract

Downloads

Downloads

Published

How to Cite

Issue

Section

Developed By

Information

Make a Submission

Current Issue