Exploring Issue Tracker Data for Ohloh

In the summer of 2013, I spent 6 months as an intern within the Ohloh.net team at Black Duck’s Burlington office. During my internship I conducted extensive research on the literature around issue trackers, prepared strategies, and developed prototypes for using issue tracker data as an additional data source for Ohloh’s analyses. It is my pleasure to share the end result of my internship work with you.

Today, Ohloh’s analyses are based on projects’ source code management systems. These analyses tell us a lot about each project’s specifics, its activity, maturity and its user base. However, only taking source code management systems into account does not convey enough about the level of community engagement on the project. We aim to solve this by extending our data sources – beginning with data from issue trackers.

Issue trackers are an important hub for social collaboration within each Open Source project. Discussions about new features and complex technical solutions emerge out of issue trackers. They are also central for defect reports and fixes as well as for discussion of over-arching project issues. A better understanding of the activities in a project’s issue tracker leads to a better understanding of the project.

In its rawest form, the data gathered from issue trackers is very diverse. To make it accessible for our analysis, we developed a uniform data model and a toolkit for fetching issue tracker data from Open Source projects’ issue trackers. The model is designed to represent issues from most issue trackers, with a sharp focus on Jira, Bugzilla and the GitHub.com issue tracker. Unlike commit data, issues are mutable: their fields can change after creation. We used the Temporal Attribute pattern to record these changes to an issue over time.
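To make the idea concrete, here is a minimal sketch of a temporal attribute in Python. It is not the code used in the prototype; the class name `TemporalAttribute` and its `set`/`get` methods are assumptions chosen for illustration. Instead of overwriting a field such as an issue's status, every value is stored together with the time it took effect, so the state of the issue at any past moment can be looked up.

```python
from bisect import bisect_right
from datetime import datetime


class TemporalAttribute:
    """Illustrative only: keeps every value an attribute has had over time."""

    def __init__(self):
        self._times = []   # effective timestamps, kept sorted
        self._values = []  # value that became effective at the same index

    def set(self, when, value):
        # Insert in timestamp order so late-arriving history is still handled.
        i = bisect_right(self._times, when)
        self._times.insert(i, when)
        self._values.insert(i, value)

    def get(self, when):
        # Value in effect at `when`, or None before the first recorded value.
        i = bisect_right(self._times, when)
        return self._values[i - 1] if i else None


# Made-up history of one issue's status.
status = TemporalAttribute()
status.set(datetime(2012, 3, 1), "open")
status.set(datetime(2012, 9, 14), "closed")

status_in_june = status.get(datetime(2012, 6, 1))             # "open"
status_in_october = status.get(datetime(2012, 10, 1))         # "closed"
status_before_creation = status.get(datetime(2012, 1, 1))     # None
```

Keeping the full history rather than only the latest value is what allows charts to be drawn for any point in the past.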

Also, the issues’ mutability forces us to constantly perform incremental updates of our data. As issues are not versioned, we used the modification times of the issues and – where applicable – the filtering mechanisms of the issue tracker to execute the updates in a performant way.
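The update strategy can be sketched roughly as follows. This is an assumption-laden illustration, not the toolkit's code: `fetch_modified_since` is a hypothetical stand-in for whatever server-side filter a given tracker offers, and the in-memory `tracker` list stands in for the remote system. The idea is to remember the newest modification time seen so far and to ask only for issues changed after it.

```python
from datetime import datetime


def fetch_modified_since(tracker, since):
    """Hypothetical stand-in for a tracker query filtered by modification time."""
    return [i for i in tracker if since is None or i["updated_at"] > since]


def incremental_update(store, tracker, last_sync):
    """Merge changed issues into `store`; return the new high-water mark."""
    newest = last_sync
    for issue in fetch_modified_since(tracker, last_sync):
        store[issue["id"]] = issue  # insert new issues, overwrite changed ones
        if newest is None or issue["updated_at"] > newest:
            newest = issue["updated_at"]
    return newest


# Made-up tracker contents for the first, full synchronization.
tracker = [
    {"id": 1, "state": "open", "updated_at": datetime(2013, 6, 1)},
    {"id": 2, "state": "open", "updated_at": datetime(2013, 6, 5)},
]
store = {}
last_sync = incremental_update(store, tracker, None)

# Issue 1 is closed later; only that issue is fetched on the next run.
tracker[0] = {"id": 1, "state": "closed", "updated_at": datetime(2013, 6, 9)}
changed = fetch_modified_since(tracker, last_sync)
last_sync = incremental_update(store, tracker, last_sync)
```

A real implementation would probably overlap the time window slightly, since the strict comparison used here could miss an issue modified at exactly the last recorded timestamp.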

Based on the data we developed 6 visualizations. Each is intended to convey a different piece of information about an Open Source project. Two of these are discussed below.

The time needed to resolve an issue – the so-called lag time – is a key piece of meta information. We visualize the lag times within a project with the Lag Time Overview.

Lag Time Overview

The Lag Time Overview as shown above is composed of two series of box plots. The red box plots show the lag times grouped by the month in which an issue was opened; the green box plots show the lag times grouped by the month in which an issue was closed. The above figure shows the Lag Time Overview for the jQuery-File-Upload project (a plugin that delivers a jQuery-style file uploader). It shows a typical pattern for a project that accumulated a lot of old issues and closed them all at once in September 2012.
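As a rough sketch of the data behind such a chart, the snippet below groups lag times by opening month and by closing month. The issue dates are made up for illustration and the function name `lag_days_by_month` is an assumption; a box plot would then be drawn from each month's list of values.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Made-up (opened, closed) pairs; not data from any real project.
issues = [
    (datetime(2012, 1, 10), datetime(2012, 1, 20)),
    (datetime(2012, 1, 25), datetime(2012, 9, 3)),
    (datetime(2012, 5, 2), datetime(2012, 9, 3)),
    (datetime(2012, 9, 1), datetime(2012, 9, 4)),
]


def lag_days_by_month(issues, key_index):
    """Group lag times in days by opening (0) or closing (1) month."""
    groups = defaultdict(list)
    for pair in issues:
        opened, closed = pair
        month = pair[key_index].strftime("%Y-%m")
        groups[month].append((closed - opened).days)
    return dict(groups)


by_opened = lag_days_by_month(issues, 0)  # input for the red box plots
by_closed = lag_days_by_month(issues, 1)  # input for the green box plots
median_closed_sept = median(by_closed["2012-09"])
```

With this sample data, the long-lived issues all land in the September 2012 group of the closing-month series, which is the kind of spike the figure shows for a bulk clean-up.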

Another key piece of meta-data about issues is the number of issues that remain open over time. This data is plotted in a Backlog Chart. The example below shows the Backlog Chart of the jQuery-File-Upload project from the last example.

Backlog Chart

It shows a constantly increasing backlog of issues. Then, after September 2012 the backlog remained small. This can be an indicator for a change in the project community’s processes or working habits.
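The backlog at a given moment can be computed as the number of issues that were already opened but not yet closed. The following sketch uses made-up dates and a hypothetical helper `backlog_at`; sampling it at regular intervals yields the points of a backlog curve.

```python
from datetime import datetime


def backlog_at(issues, when):
    """Count issues opened on or before `when` and still open at `when`."""
    return sum(
        1
        for opened, closed in issues
        if opened <= when and (closed is None or closed > when)
    )


# Made-up (opened, closed) pairs; None means the issue is still open.
issues = [
    (datetime(2012, 1, 10), datetime(2012, 1, 20)),
    (datetime(2012, 1, 25), datetime(2012, 9, 3)),
    (datetime(2012, 5, 2), datetime(2012, 9, 3)),
    (datetime(2012, 9, 1), None),
]

sample_dates = [datetime(2012, 2, 1), datetime(2012, 6, 1), datetime(2012, 10, 1)]
backlog = [backlog_at(issues, d) for d in sample_dates]
```

In this toy data the backlog grows until the bulk closing in September and drops afterwards, mirroring the shape described above on a very small scale.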

For a lot of projects one will see interesting patterns or gain surprising insights. In addition to Lag Time Overview and Backlog Chart, we have implemented a number of additional visual representations:

  • An Activity Chart and a User Activity Chart visualizing the activity within an issue tracker.
  • A Collaboration Matrix that illustrates who works together with whom.
  • An Efficiency Chart designed to estimate a project community’s efficiency.

Please play around with the charts in our demo application!

We limited the number of projects within the demo application to around 1600. Currently only the issue trackers of GitHub projects that have moderate activity, according to Ohloh’s Project Activity Indicator, are shown. Projects with fewer than 200 issues were excluded. For future iterations we plan to also analyze projects that do not use the GitHub.com forge. We will cover projects using other popular issue trackers such as Bugzilla and Jira. This will enable us to analyze a broader range of projects including the Linux Kernel, Eclipse, Apache projects and more.

We believe the information from issue trackers will make Ohloh’s analyses more valuable. We have many more ideas to enhance the visualizations and metrics regarding issue trackers. But as with everything on Ohloh, this site is devoted to you! So please tell us what you think, and let us know what works and what could be improved. The demo web application has Disqus comments set up where you can give your feedback. Have fun!

About Maximillian Capraro

Max was a summer intern with the Ohloh.net team in 2013. While on the Ohloh team he wrote his master’s thesis on data analyses of issue tracker data. Currently, Max is working on Inner Source in a joint research project of the Open Source research group at the University of Erlangen-Nürnberg and Black Duck.
