New Subversion Downloader

Beginning today, we’re trying out a new method of importing Subversion repositories. We’ll be using this new method on a limited number of projects, and as the code proves itself we’ll gradually phase it in for all Subversion projects.

If you see an enlistment on your project marked “Subversion (Sync Beta)”, then your project is part of this experiment. Please let us know if you notice anything unusual. We expect the new project reports to be identical to the previous ones.

For those who are curious, here’s a description of what’s changing.

Internally, Ohloh currently stores all project source in Git repositories, and project reports are all prepared based on these Git repositories. This means that source code stored in other formats (Subversion or CVS) must be converted to Git. Ohloh uses an in-house tool to convert a single Subversion or CVS branch to Git.

While this converter has been extremely reliable for us, it has a lot of limitations in its design, and it’s also painfully slow.

We’re trying a new strategy now: we will begin storing Subversion repositories in their native format. We’ll be using the svnsync command to create local mirrors of entire Subversion repositories.

This has a lot of great benefits:

  1. It’s much faster, which means more frequent project report updates for you and less server maintenance for us.
  2. Ohloh’s Subversion converter follows only a single branch, and it cannot follow directory renames (the infamous --stop-on-copy limitation that causes so much forum traffic). The use of svnsync removes these limitations. Importantly, this does not mean that all of those projects on Ohloh with missing history will suddenly fix themselves; this is just the first step towards that goal.
  3. This code introduces a new abstraction layer in the Ohloh architecture. We now allow multiple native source code formats on our servers. This opens up the ability (finally) for us to add additional source control systems, like Mercurial.
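
For readers who want to experiment with the same idea, here is a minimal sketch of mirroring a repository with svnsync, written in Python. It is not Ohloh's actual downloader code, which is not shown in this post. The function names, the example URL, and the mirror path are all hypothetical, and the hook script assumes a Unix-like host. The sketch relies only on standard behaviour: `svnadmin create` makes an empty repository, svnsync needs a `pre-revprop-change` hook that permits revision property changes on the mirror, and `svnsync initialize` followed by `svnsync synchronize` copies the history.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(source_url: str, mirror_dir: str) -> List[List[str]]:
    """Build the commands that create and fill a local svnsync mirror.

    Nothing is executed here; the caller decides when to run the plan.
    """
    # svnsync addresses the destination by URL, so use a file:// URL.
    mirror_url = Path(mirror_dir).resolve().as_uri()
    return [
        ["svnadmin", "create", mirror_dir],
        ["svnsync", "initialize", mirror_url, source_url],
        ["svnsync", "synchronize", mirror_url],
    ]


def allow_revprop_changes(mirror_dir: str) -> Path:
    """Install a permissive pre-revprop-change hook in the mirror.

    svnsync stores bookkeeping data in revision properties, so the
    mirror must accept revprop changes. Assumes a Unix-like host.
    """
    hooks_dir = Path(mirror_dir) / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook = hooks_dir / "pre-revprop-change"
    hook.write_text("#!/bin/sh\nexit 0\n")
    hook.chmod(0o755)
    return hook


def run_mirror(source_url: str, mirror_dir: str) -> None:
    """Create the mirror, install the hook, then initialize and sync."""
    create, initialize, synchronize = mirror_commands(source_url, mirror_dir)
    subprocess.run(create, check=True)
    allow_revprop_changes(mirror_dir)
    subprocess.run(initialize, check=True)
    subprocess.run(synchronize, check=True)
```

Calling `run_mirror("http://svn.example.org/repo", "/var/mirrors/repo")` (both values are placeholders) would build the mirror once. Rerunning only the `synchronize` command later pulls just the new revisions, which is where the speed benefit described above comes from.
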
  • +1 I like it. 🙂

  • Our project Omnium falls into the “--stop-on-copy” category.

    It might be a good one to beta test on: it has only a short history (approx. 1200 commits over less than a year) and only one accidental move on the trunk about 2 months ago.

  • Tobu (Tobu)

    Were you using git-svn?

    The issue of not following trunk moves appears there too.

  • robin (Robin Luckey)

    No, we did not use git-svn. We used a tool we developed internally, partly because in our earliest days we were not using Git, but mostly because all of the off-the-shelf conversion tools we tried were not robust enough.

    I’m not surprised that git-svn also has the trunk moves problem, because it’s a pretty nasty problem 🙂

  • jnareb (Jakub Narębski)

    Why don’t you use git-svn for importing a Subversion repository to Git? If I understand correctly, it does support importing multiple branches.

    If it is not fast enough, you could contribute to it to use git-fast-import (http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html).

  • robin (Robin Luckey)

    IIRC, git-svn does allow you to follow a single branch if it is simply renamed from one place to another. It doesn’t support branching universally with full fidelity. I could be wrong; git-svn has been a moving target over the last 2 years.

    Our main issue is download speed, as anyone who has become frustrated with the length of our download queue can attest. Any solution which must convert from one format to another has disappointed in this regard, either being slow to download or requiring intense CPU or disk resources.

    Far and away the fastest way to pull changes from Subversion is to use svnsync. This has the beautiful side effect of also pulling the repository in full fidelity, with all branches intact. Now the problem simply becomes parsing through it all to make a nice report.

  • ray315 (ray315)

    Thanks for the interesting information. I’m downloading.

    Alex Серебро

  • Well, I’d like to help accelerate Mercurial support. If you need any help, just contact me.

  • How about letting people enter in version control details for systems you don’t support? It might help in deciding what systems are worth putting effort into supporting.

  • jnareb (Jakub Narębski)

    That (providing space to enter the name of a “foreign” version control system, and a link to the repository) would be a very good idea. It would give you some stats on which SCM to start supporting first.

    Perhaps if there was also a place to give “unofficial” repositories…

    You could always get partial (only main branch / trunk) support for various SCMs using Tailor, e.g. by transforming foreign repositories into Git repositories internally…

  • AFrisby (AFrisby)

    Any chance of getting OpenSim (4753) enrolled in this beta downloader? (We are missing most of our revision history due to us doing some development in branches and moving it back)

  • Can ACE/TAO/CIAO be added to the list of beta projects? We have about 14 years of history that is now not recognised.

  • Yeah, Agavi (5907) would be a good candidate to test this 🙂 I have even removed all enlistments because the entire info (activity etc) is completely wrong. See http://www.ohloh.net/forums/8/topics/385

  • robin (Robin Luckey)

    Ohloh reports will be identical no matter which downloader we use. The new svnsync downloader only changes the method we use to download the code to our servers; the reports that we then generate for the website will be unchanged.

    Better support for branching is coming, but it is not implemented yet. Using svnsync for download makes import of the branch history possible, but not “free”.

    Unfortunately, it also looks like not all forges support svnsync, and we need to verify svnsync support on a server-by-server basis. For now, we are only using svnsync with SourceForge and Google Code.

  • PHPUnit falls into the “--stop-on-copy” category, too. And svn.phpunit.de supports svnsync.