As evidenced by Arcuri and Briand's paper "A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering", the field of search-based software engineering (SBSE) relies on statistical methods to support the empirical comparison of different techniques. Yet the statistical analysis code behind these comparisons is often bespoke and rarely made available, which makes it difficult for other researchers to replicate the analyses.
As a means of improving the maturity of the data analysis methods used in the SBSE field, I think that it would be useful to have shared repositories of well-documented statistical analysis code and replication data. That is, the SBSE community would advance if its "hitchhikers" had access to "free vehicles" in the form of GitHub repositories containing the data sets and statistical analysis code used for published papers.
To learn more about the benefits of using shared repositories of statistical code in SBSE, you can read the suggestions of Kapfhammer, McMinn, and Wright (2016) for improving the study of data arising from experiments with randomized algorithms. If you would like to examine the source code of that paper, then you can visit its GitHub repository at gkapfham/sbst2016-paper. Do you have ideas about how the SBSE community should create, share, and apply statistical software? If so, then please contact me to share your thoughts!
Interested in learning more about this topic? Since this blog post was written, my colleagues, students, and I have published a paper (McMinn, Kapfhammer, & Wright, 2016) and released a replication package for it. If you are interested in replicating the analyses in that paper, then I encourage you to visit gkapfham/virtualmutationanalysis on GitHub. I would also appreciate your feedback on the approach that we took to create and release this replication package, so please contact me to share your insights!