Migrating the Moshi Monsters backend from SVN to Git

Currently at Mindcandy we use a combination of SVN and Git for all our code. This is because storing lots of frequently changed binary Flash assets in Git is a pretty bad idea. There is also some legacy code that would benefit from moving to Git, but finding the time to do anything about it has been difficult.

Thanks to some recent cleanup and changes, some of those barriers have been disappearing, so over the last couple of days I’ve been getting the Moshi Monsters backend migrated over. It ended up being quite involved, and as we’ll hopefully be migrating more code over, I decided to write up how I did it.

Problem

Migrating a large SVN repository to Git can be slow and painful when it contains a lot of history, tags and branches. This is primarily due to the differences in the way that SVN and Git handle commits and branches.

In short, when using git-svn to migrate, it’s necessary to pull down each commit from SVN, have Git calculate a commit hash for it, then re-commit that to the local repository. Furthermore, because SVN works by copying files for branches and doesn’t merge changes back into trunk in the same way as Git, it is also necessary to track back through every commit in a branch and calculate the commit information. Tags are awkward for similar reasons.

In a repository like the Moshi backend, with a little over 6 years of history and plenty of old branches and tags, this can leave Git spending a lot of time and CPU calculating this information, much of which is so old that it isn’t actually needed.

Interestingly though, if we ignore all the branches and tags and just pull down trunk into Git then the process takes about 10 minutes.

Solution

The decision was made to not migrate across all the branches and tags, but instead to get the entirety of the trunk history, and just a select number of the recent branches and tags. Unfortunately, this is a little fiddly to do with git-svn and requires a bit of config magic.

I’ll cover the commands necessary for doing this, but there is some other useful information for an SVN migration that this post won’t go over, primarily how to transform commit author names. The “Migrating to Git” article at http://git-scm.com/book/en/Git-and-Other-Systems-Migrating-to-Git covers that in more detail.

Commands

The first step is simple enough, thankfully, and just requires cloning the SVN repository’s trunk folder, making sure to tell git-svn that this is just trunk. You can do this with the -T (or --trunk) flag, which makes sure Git knows that there could be other folders containing the tags or branches.

git svn clone -T trunk http://svn.url/svn/repo/project/ project

It is worth pointing out that there may be multiple remotes with the same name, but followed by “@” and a revision number. This happens when a branch was copied from a subfolder in the repository rather than being a copy of the whole repository. For example, when cloning our backend project I got this:

remotes/trunk 0f6ddda [maven-release-plugin] prepare for next development iteration
remotes/trunk@8127 cbef06a Fixing Bug

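Going back through the SVN history, it’s possible to see that revision 8128 was where /TRUNK was copied to /trunk. These extra remotes should be safe to remove, because once Git has pulled everything it will track the history in its own commits. We’ll cover getting rid of them later. (On Git 2.0 and later, git-svn puts these refs under an origin/ prefix by default, so you will see remotes/origin/trunk and remotes/origin/trunk@8127 instead; cloning with --prefix "" gives the unprefixed names used in this post.)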

Branches

Once we have this, we need to manually add each branch we want to pull down by adding an svn-remote to our Git config. This needs to have a URL and a fetch ref so Git knows what to get from where.

git config --add svn-remote.mybranch.url http://svn.url/svn/repo/
git config --add svn-remote.mybranch.fetch branches/mybranch:refs/remotes/mybranch
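If you end up adding more than a couple of branches, the two config calls are easy to wrap in a shell function. This is only a sketch: add_svn_remote and its argument order are my own invention rather than anything git-svn provides, and all it does is write the same two config entries as the commands above.

```shell
# Hypothetical helper, not part of git-svn: register one SVN path as its own
# svn-remote by writing the same two config entries as the commands above.
#   $1  svn-remote name, e.g. mybranch
#   $2  path relative to the repository URL, e.g. branches/mybranch
#   $3  local ref to fetch into, e.g. refs/remotes/mybranch
#   $4  SVN repository URL, e.g. http://svn.url/svn/repo/
add_svn_remote() {
    local name=$1 path=$2 ref=$3 url=$4
    git config --add "svn-remote.$name.url" "$url"
    git config --add "svn-remote.$name.fetch" "$path:$ref"
}
```

For example, add_svn_remote mybranch branches/mybranch refs/remotes/mybranch http://svn.url/svn/repo/ should leave things ready for git svn fetch mybranch, and git config --get-regexp '^svn-remote\.mybranch\.' is a handy way to check what was written.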

With that done we can fetch it from SVN and create a local branch.

git svn fetch mybranch
git checkout -b local-mybranch remotes/mybranch
git svn rebase mybranch

The fetch may also take a while but once the above is done you have a normal-looking Git branch, ready to be pushed to our new remote Git repository.

Tags

Adding specific tags is pretty similar to adding branches. In fact, git-svn treats SVN tags like branches, because in SVN they really are just copies of the entire project at a certain revision. This means that once they’ve been fetched, we’re going to have to convert them to Git tags.

git config --add svn-remote.4.9.9.url http://svn.url/svn/repo/
git config --add svn-remote.4.9.9.fetch tags/4.9.9:refs/remotes/tags/4.9.9
git svn fetch 4.9.9

So now we need to turn this into a real Git tag. We’ll make this an annotated tag and mention that it’s been ported from SVN as well. If you were going to continue working with this repo against SVN then you’d probably want to delete the remote branch, but since we’re just doing a migration I won’t bother.

git tag -a 4.9.9 tags/4.9.9 -m "importing tag from svn"

At this point, if you go back and look at the tag in the Git history, you’ll see that it is actually pointing to a commit that’s sitting off on its own, and not part of the branch history. This is because SVN created a new revision just for the tag copy, unlike Git, which creates tags against existing commits. If you really don’t like this, you could create the tag against the previous commit instead:

git tag -a 4.9.9 tags/4.9.9^ -m "importing tag from svn"
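Moving the tag back one commit is only safe if the SVN tag commit didn’t change any files. That is normally true for a plain svn copy, but SVN doesn’t stop anyone committing changes into a tag afterwards. Here’s a hedged sketch of a helper that checks first; import_svn_tag is a hypothetical name, and it assumes the tag was fetched to refs/remotes/tags/<name> as above.

```shell
# Hypothetical helper: turn an SVN tag fetched to refs/remotes/tags/<name>
# into an annotated Git tag. If the SVN tag commit has a parent and changed no
# files (a plain copy), tag the parent so the tag sits on the branch history;
# otherwise keep the SVN tag commit so no changes are lost.
import_svn_tag() {
    local name=$1
    local ref="refs/remotes/tags/$name"
    local target=$ref

    # git diff --quiet exits 0 only when the two trees are identical.
    if git rev-parse --verify --quiet "$ref^" >/dev/null &&
       git diff --quiet "$ref^" "$ref"; then
        target="$ref^"
    fi

    git tag -a "$name" "$target" -m "importing tag from svn"
}
```

Running import_svn_tag 4.9.9 would then tag the parent commit when the copy was clean, and fall back to the SVN tag commit otherwise.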

Pushing to Git

With that done, we can add our Git host’s repository URL as the origin remote, push everything up to it and not have to worry about SVN again.

git remote add origin 
git push origin --all
git push origin --tags

Now we have a Git repository with all of the trunk history in it and only the branches and tags we specifically wanted. At this point you probably want to make the old SVN repo read-only and get everybody moved over to Git.
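Before you do that, it’s worth double-checking that everything actually made it to the new remote. A rough sketch (check_pushed is a hypothetical function, not part of Git) that compares each local branch and tag with what git ls-remote reports for the same ref on the remote:

```shell
# Hypothetical pre-flight check, not part of Git: compare every local branch
# and tag with the same ref on the remote. Prints each ref that is missing or
# points somewhere else, and returns non-zero if any were found.
check_pushed() {
    local remote=${1:-origin} status=0 ref local_sha remote_sha
    # Ref names cannot contain whitespace, so splitting this list is safe.
    for ref in $(git for-each-ref --format='%(refname)' refs/heads/ refs/tags/); do
        local_sha=$(git rev-parse "$ref")
        # ls-remote prints "<sha><TAB><ref>"; keep the first matching line.
        remote_sha=$(git ls-remote "$remote" "$ref" | head -n 1 | cut -f 1)
        if [ "$local_sha" != "$remote_sha" ]; then
            echo "not on $remote: $ref"
            status=1
        fi
    done
    return "$status"
}
```

Running check_pushed origin after the two pushes should print nothing and return 0; anything it lists still needs pushing.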

Cleaning up

If you aren’t using this repo just for a migration, and instead want to keep using git-svn to work against your SVN repository, then you will probably want to clean up the remotes a little. As I mentioned earlier, when Git pulls everything out of SVN, it creates extra remotes for tags and branches at revisions where only part of the repository was copied. Once the data is in Git you don’t need these, so we can safely remove them.

To get a list of them we can use the Git plumbing command for-each-ref.

git for-each-ref --format="%(refname:short)" refs/remotes/ | grep "@"

With this we can iterate through and delete them.

git for-each-ref --format="%(refname:short)" refs/remotes/ | grep "@" | while read -r ref
do
  git branch -rd "$ref"
done
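If you’d rather avoid short ref names altogether, a variant of the same loop can work on full ref names and delete them with the update-ref plumbing command. This is just an alternative sketch; remove_svn_revision_refs is a made-up name, and like the loop above it treats any remote ref containing “@” as one of the extra name@revision refs.

```shell
# Hypothetical variant of the loop above: list the full ref names and delete
# them with the update-ref plumbing command, so no short-name lookup is needed.
# Ref names cannot contain whitespace, so splitting the list is safe.
remove_svn_revision_refs() {
    local ref
    for ref in $(git for-each-ref --format='%(refname)' refs/remotes/ | grep '@'); do
        echo "deleting $ref"
        git update-ref -d "$ref"
    done
}
```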

Other options

There are a few other options to git-svn that can be useful when migrating over, though it’s worth investigating them before setting a script running for two days so you don’t end up with a repository that doesn’t contain what you were expecting.

The --no-follow-parent option can be passed when cloning or fetching so that git-svn won’t follow a branch’s history back to the path it was copied from. This makes things much quicker, but it also means that, according to the git-svn docs:

branches created by git-svn will all be linear and not share any history

In practice I found that this gave me a linear Git history with nothing in the places I expected. On the plus side, it was way quicker! Worth looking at but use with caution.
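A quick way to see which kind of history you ended up with is to ask Git for a merge base between trunk and an imported branch: if there isn’t one, the branch doesn’t share any history with trunk. A sketch, where shares_history is just an illustrative name and the refs are whatever your trunk and branch ended up as:

```shell
# Hypothetical helper: report whether an imported branch shares any history
# with trunk. git merge-base exits non-zero when the two commits have no
# common ancestor, which is what you would expect for branches fetched with
# --no-follow-parent.
shares_history() {
    if git merge-base "$1" "$2" >/dev/null 2>&1; then
        echo "$2 shares history with $1"
    else
        echo "$2 has no common ancestor with $1"
        return 1
    fi
}
```

For example: shares_history remotes/trunk remotes/mybranch.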

The other option worth knowing about is --no-metadata, which stops git-svn adding the git-svn-id metadata to each commit. This results in cleaner commit logs, but means you won’t be able to commit back to the SVN repository. That’s fine if you’re making a clean break from SVN, but dangerous otherwise. I’m also not sure how well it works with pulling down separate branches from SVN to merge into Git. That investigation is left as an exercise for the reader! :)
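To check whether a converted repository kept its metadata, you can count the commits whose messages still contain a git-svn-id line. Another sketch; count_svn_ids is a made-up name, and it simply greps the messages printed by git log:

```shell
# Hypothetical helper: count the commits reachable from a ref (HEAD by
# default) whose message still contains a git-svn-id line. grep -c prints 0
# and exits 1 when nothing matches, so "|| true" keeps the status clean.
count_svn_ids() {
    git log --format=%B "${1:-HEAD}" | grep -c '^git-svn-id: ' || true
}
```

On a normal git-svn import you would expect every imported commit to be counted; with --no-metadata the count should be 0.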

Automating

So it’s all well and good being able to add our branches and tags, but we don’t want to do this by hand for each one when we can write a script to do it for us.

Combining everything we’ve done so far, this shell script should do the job for us and leave us with a nice-looking, ready-to-push Git repository. I’m doing the cleanup step in the middle just to make sure there’s no ambiguity with which branches and tags are being created, and also so it’s easier to see what’s been created once all the dust settles.

#! /bin/bash

SVNURL='http://svn.url/svn/repo/'
FOLDER_NAME='gittosvn'
BRANCH_FOLDER='branches'
TAG_FOLDER='tags'

BRANCHES='branch1
branch2
branch3'

TAGS='tag1
tag2
tag3'

# Clone just trunk; the branches and tags we want are added one by one below
git svn clone -T trunk "$SVNURL" "$FOLDER_NAME"

cd "$FOLDER_NAME" || exit 1

for bname in $BRANCHES; do

    git config --add svn-remote.svn-$bname.url $SVNURL
    git config --add svn-remote.svn-$bname.fetch $BRANCH_FOLDER/$bname:refs/remotes/svn-$bname

    git svn fetch svn-$bname

done

for tname in $TAGS; do

    git config --add svn-remote.$tname.url $SVNURL
    git config --add svn-remote.$tname.fetch $TAG_FOLDER/$tname:refs/remotes/tags/$tname

    git svn fetch $tname

done

# Remove the extra name@revision remotes git-svn creates for partial copies
git for-each-ref --format="%(refname:short)" refs/remotes/ | grep "@" | while read -r ref; do

    git branch -rd "$ref"

done

for bname in $BRANCHES; do

    git checkout -b $bname remotes/svn-$bname
    git svn rebase svn-$bname

done

for tname in $TAGS; do

    git tag -a "$tname" "refs/remotes/tags/$tname" -m "importing tag from svn"

done

Conclusion

So migrating SVN to Git isn’t too tricky, but there are a few things worth knowing and it can certainly take a long time if you have a lot of history and branches. There are probably some mistakes, and useful things I’ve missed, so feel free to get in touch if you spot any.
