Understanding and predicting method-level source code changes using commit history data
Abstract
Software development and maintenance require a large number of source
code changes to be made to software repositories. Any change to a repository can
introduce new resource needs, which cost the repository owners additional time
and money. Therefore, it is useful to predict future code changes in an effort to help
determine and allocate resources. We propose a technique that predicts
whether elements within a repository will change in the near future, given the development
history of the repository. The development history is collected from source
code management tools such as GitHub and stored locally in a PostgreSQL database. The predictions
are developed using the machine learning approaches Support Vector Machine
and Random Forest. Furthermore, we investigate which factors have the most
impact on prediction performance when either Support Vector Machines or
Random Forest is used to predict future code changes from commit history. Visualizations were
used as part of the approach to gain a deeper understanding of each repository prior
to making predictions. To validate the results, we analyzed open source Java software
repositories including acra, storm, fresco, dagger, and deeplearning4j.