Git repositories and big binary files

07 February 2015

Update: Use Git Large File Storage

GitHub and Bitbucket both now support Git Large File Storage, or Git LFS. This implementation works much like the workarounds below (git-bin, git-annex, git-fat) but has official support from two major cloud providers of Git version control.


I’ve been using Bitbucket to host my code repositories for some time now so when I started game development, it made a lot of sense to me to continue using Bitbucket and Git. At the time, there was no size limit on Bitbucket repositories and Git worked fine for my desktop and phone application development, so why not game development?

The problem

As it turns out, game development can involve far more binary content than application development, depending on what you're building. The most recent game I'm working on, "Dungeons of Rune", is a 3D mobile game built with Unity. I decided that, as a mobile game, the 2GB limit would be sufficient. Unfortunately, I was wrong. Textures and audio are binary files that take up a significant amount of space on their own. On top of that, any time you update a binary file, Git needs to store the entire file again. Alone, this would be enough to cause trouble in the long term, but I thought it would be manageable since I don't update binary files very often; they're mostly sourced from the Asset Store. What I didn't consider is that Asset Store packages often contain many superfluous textures (and other files) that simply weren't relevant to my project.

If I'd thought of this earlier, I could have simply not imported or committed these additional files, but that sounds like a lot of manual filtering and would really slow down development.

The solutions

Note: There’s a nice writeup from Atlassian on handling big repositories with Git.

There are a lot of potential solutions to this problem. All have pros and cons and take time to investigate and set up, so there's no silver bullet along the lines of "download and run suspicious-package.exe to fix all your computer problems".

Move to a different Git repo host with no size limit

Visual Studio Online currently allows you to create free Git repositories with no size limit but there’s no guarantee they’ll stay that way.

This would be a trivially easy change to implement. Sign up for an account. Define a new Git origin. Send the code to Visual Studio Online and keep my repository history!
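The migration itself is a handful of commands. In this sketch a local bare repository plays the part of the new host so the commands can be tried offline; substitute the URL Visual Studio Online gives you.

```shell
# Stand-in for the new remote host (use your real VSO URL instead):
git init --bare new-host.git
git init mygame && cd mygame
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "Initial commit"
git remote add vso ../new-host.git   # define the new Git origin
git push vso --all                   # every branch, history intact
git push vso --tags                  # tags travel separately
```

Once the push succeeds you can drop the old remote, or rename the new one to `origin`.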

Despite solving the immediate problem, this doesn't solve the long-term problem of having a massive repository. Checking out a fresh repository will take approximately "exactly one million years" due to the multiple versions of binary files, though you can get around this with a shallow clone using the --depth parameter.
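A shallow clone in action, demonstrated here against a throwaway local repository (substitute your real remote URL). Only the newest commit is fetched, so old revisions of those binary files never hit your disk.

```shell
# Build a small repo with some history to clone from:
git init source && cd source
for i in 1 2 3; do
  git -c user.name=demo -c user.email=demo@example.com \
      commit --allow-empty -m "commit $i"
done
cd ..
# Fetch only the most recent commit:
git clone --depth 1 "file://$PWD/source" shallow
git -C shallow rev-list --count HEAD   # reports 1: just the latest commit
```

If you later need the full history after all, `git fetch --unshallow` converts the clone back into a complete one.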

If Microsoft (Visual Studio Online) decide to impose a size limit, this could cause problems. They may take the same approach as Atlassian (Bitbucket) and “grandfather in” existing repositories so they have no size limit but there’s no guarantee.

Separate Asset Store folders out into submodules

Git Submodules are a viable option, and it makes sense not to have a monolithic repository when you have modular assets. In theory, asset packages could live in separate repositories since they're modular. Unfortunately, assets aren't that straightforward in Unity, and their code doesn't always live under a single folder that would make a neat little submodule. So you could take this approach for some packages, or for parts of some packages, but it wouldn't be as clean as you might hope. The other problem is that an initial clone that pulls in every submodule will still take as long as cloning one monolithic repository.
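For a package that does live neatly under one folder, the workflow would look something like this. The repository names and the Assets/ArtPack path are hypothetical, and local paths stand in for real remote URLs.

```shell
# The asset package in its own repository:
git init -b main art-pack && cd art-pack
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "Asset package"
cd ..
# The game repository, with the package mounted as a submodule:
git init -b main game && cd game
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "Initial commit"
# Newer Git blocks file-protocol submodules by default, hence the override:
git -c protocol.file.allow=always submodule add ../art-pack Assets/ArtPack
git -c user.name=demo -c user.email=demo@example.com \
    commit -m "Add ArtPack as a submodule"
# Collaborators fetch submodule content after cloning with:
# git submodule update --init --recursive
```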

Change to a different version control system

There’s a lot of talk about Perforce’s P4D and using Subversion for version control of binary files. Everything I’ve read about P4D suggests it’s ideal for game development and handles large repositories very well. It’s also free to host a server for up to 20 users. If you’re willing to run your own server, this may be a good solution for you.

If you’re not willing to run your own server, the options for hosted P4D repositories are relatively minimal. Assembla offer free Perforce hosting for up to 1GB repository size and 1000 files which isn’t helpful for this use-case. Their paid plans do solve the space and file limit issues but they begin to get expensive if you only use source control ($24 per month for 5GB + extras as at 8-Feb-2015).

Subversion is also an option. It's not a distributed version control system like Git, so when you update to the latest changes you only pull the files you need, not the entire history. This means your server can still bloat, but your local copy will stay relatively lean. Hosted Subversion repositories are also readily available. Some people find merges difficult with Subversion, so make sure that isn't a problem for your workflow.

Mercurial deals with large files in much the same way the Git extensions do as described below.

There are various pros and cons of using different version control systems and much of these are personal preference. You’ll have to do some more reading and decide for yourself.

Use one of the many Git workarounds to store binary files somewhere else

Using one of the tools below, you can continue to version your binary files with Git but the actual binary files will be stored somewhere else. What this means is you have a single Git repository that doesn’t get bloated. What’s the catch?

First, there's the setup. You need somewhere to host the files, though there are plenty of options: some people synchronise via Dropbox or similar client apps, while others use hosted storage such as Amazon S3. You'll also need to set up a .gitattributes file in your repository telling it which files should be stored elsewhere, and then there may be further configuration depending on which tool you choose.
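As an illustration, here is what the setup might look like for git-fat. The .gitattributes rules route chosen binary types through the "fat" filter so only small text stubs are committed, and a .gitfat file names the rsync remote. The file types and the remote host/path are made up for the example.

```shell
# Hypothetical git-fat setup: run inside your repository.
cat > .gitattributes <<'EOF'
*.psd filter=fat -text
*.wav filter=fat -text
*.fbx filter=fat -text
EOF
# git-fat reads its storage location from a .gitfat file:
cat > .gitfat <<'EOF'
[rsync]
remote = storage.example.com:/backup/game-assets
EOF
```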

Second, there may be some additional workflow steps, such as pushing the binary files separately from pushing to your Git repository. You may be able to use Git hooks to automate some of this, but the tools' repositories don't explain how to set that up.
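One way to automate the extra step would be a pre-push hook that uploads the binary payloads before the commits leave your machine. This is a hypothetical sketch using git-fat; none of the tools document this officially.

```shell
# Install a pre-push hook in a repository (a fresh one here for the demo):
git init hook-demo && cd hook-demo
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# Upload git-fat payloads first; abort the git push if that fails,
# so the remote never references binaries that were never uploaded.
git fat push || exit 1
EOF
chmod +x .git/hooks/pre-push
```

Hooks live only in your local clone, so each collaborator has to install this themselves.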

Last, these tools have varying degrees of support and usage. They seem to work fine but you may run into edge cases down the road. I would suggest ensuring you have multiple clones/checkouts of your latest code+binaries at the very least.

  • git-annex - Probably the most comprehensive solution, but also a little more complicated. Files are stored locally and need to be synchronised remotely; the git-annex assistant can automate some of this.
  • git-fat - Remote storage of binary files via Rsync
  • git-bin - Remote storage of binary files on Amazon S3
