Binary Files in Git

Amadeus Stadler

Last updated December 29, 2021

Whenever you or one of your team members uploads a binary file (such as an image, PDF, or video) to the Media Library, it is added to the Git repository of your project, since Mattrbld is Git-based. Adding binary files (i.e. files that do not consist of individual lines of text) directly to a version control system such as Git has long been considered bad practice.

This is because Git keeps every version of each file it manages, so that any change can easily be reverted. For simple text files, Git is able to determine which lines changed and store those changes compactly. For binary files, this is generally not possible. Instead, every time a binary file is changed, both the entire original file and the entire new file are kept, quickly bloating the repository.

In this article, we’ll take a look at why this shouldn’t be an issue for most Mattrbld projects if you follow a set of best practices.

A Simple Example

Let’s say you upload an image of yourself to use on the “About Me” page of your blog to Mattrbld. This image has a size of 100KB, meaning the size of your repository has now grown by those 100KB.

A week later you decide you want to remove a nasty spot on your shirt that you hadn’t caught earlier, so you change the file and use the Replace function in the Media Library to swap out the images. The new image is also 100KB, but suddenly your repository is another 100KB larger!

This is because, as mentioned before, Git doesn’t know that you only changed part of the image. Instead of saving just those changes, it saved the entire new image, while also keeping the old one so you can revert the change at any point in the future.

Worse than that, even deleting the picture will keep both versions in the repository, just in case you might want to restore one of them at a later point in time.

This example is highly simplified. Git does try to compress some revisions if possible and also removes binary files if they are no longer referenced by any commits, but those technical details are beyond the scope of this article.

Multiply these changes over many different types of binary files and suddenly you and your team members might run out of storage or face exceedingly long cloning times without even knowing why, since the files might be invisible to users.
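The example above can be sketched in a few lines of Python. This is a toy model, not Git itself: the `ToyObjectStore` class, the `fake_image` helper and the `about/me.jpg` path are hypothetical names made up for this illustration. The sketch only borrows the way Git names a file's contents (a SHA-1 hash over a short header plus the bytes) and ignores the additional compression mentioned above.

```python
import hashlib
import random
import zlib


class ToyObjectStore:
    """A toy content-addressed store, loosely modelled on Git's object store."""

    def __init__(self):
        self.objects = {}  # object id -> zlib-compressed bytes

    def add_blob(self, content: bytes) -> str:
        # Git hashes a short header plus the file content to get the object id.
        data = b"blob " + str(len(content)).encode() + b"\0" + content
        oid = hashlib.sha1(data).hexdigest()
        # Identical content maps to the same id and is stored only once.
        self.objects.setdefault(oid, zlib.compress(data))
        return oid

    def size(self) -> int:
        return sum(len(obj) for obj in self.objects.values())


def fake_image(seed: int, n_bytes: int) -> bytes:
    # Random bytes stand in for already-compressed image data.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


store = ToyObjectStore()
working_tree = {}  # path -> object id of the current version

# Upload a 100 KB image.
original = fake_image(seed=1, n_bytes=100_000)
working_tree["about/me.jpg"] = store.add_blob(original)
size_after_upload = store.size()

# Retouch a small region (5% of the bytes) and replace the image.
retouched = bytearray(original)
retouched[40_000:45_000] = fake_image(seed=2, n_bytes=5_000)
working_tree["about/me.jpg"] = store.add_blob(bytes(retouched))
size_after_replace = store.size()

# Delete the image: only the working tree changes, the history keeps both blobs.
del working_tree["about/me.jpg"]
size_after_delete = store.size()

print(size_after_upload, size_after_replace, size_after_delete)
```

Run as written, the store should hold roughly 100 KB after the upload and roughly twice that after the replacement, even though only a small part of the image changed. Deleting the file does not shrink it.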

Ordinary Solutions

The usual way to avoid these problems is a tool called git-lfs (Git Large File Storage), which replaces large files in the repository with small pointer files and stores the actual contents elsewhere. Unfortunately, Mattrbld currently does not support this tool, as outlined in the article on Limitations.
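To illustrate the pointer idea, here is a minimal Python sketch that builds the text of a pointer file in the format described by the git-lfs specification (a version line, a SHA-256 object id, and the size in bytes). The `lfs_pointer` function is a hypothetical helper written for this article. It is not part of git-lfs or Mattrbld, and real git-lfs does considerably more, such as uploading the contents to a separate store.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a git-lfs style pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


image = bytes(100_000)  # placeholder bytes standing in for a 100 KB image
pointer = lfs_pointer(image)

print(pointer)
print(len(pointer.encode()), "bytes tracked by Git instead of", len(image))
```

The pointer is a few lines of plain text no matter how large the original file is, so replacing the image only adds another tiny text file to the repository's history.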

Git itself also offers sparse checkouts and partial clones, which help to fetch only those files from the repository that are actually needed, reducing bandwidth usage and the time necessary for cloning operations. Mattrbld may be able to make use of them in the future.

Binary Files in Mattrbld

Until these options become available to Mattrbld, however, there is no other way to handle binary files than to add them straight to the repository. For most use cases this should be perfectly fine, since Mattrbld isn’t meant to be used for large, media-heavy projects in the first place, and images and other binary content usually don’t change much by the time they get approved for use in a CMS.

Following the best practices below will also improve your experience.

Best Practices

  1. Keep file sizes reasonable. There is no need to upload a 2000px image, when it will never be displayed at 2000px in size. Image fields have a maximum size option that can be used to ensure that users don’t accidentally upload un-cropped and un-optimised images.

  2. Make sure that files are as final as possible before syncing them. As long as a binary file isn’t synced, it is not part of the version control system—so use the live preview if you need to iterate before syncing the final version.

  3. Remove unused binary files before syncing, so that they never become part of the repository’s history.

  4. Use SVG files whenever possible, since SVG is a text-based format rather than a binary one.

  5. Consider hosting large videos, archives and PDF documents on an external hosting provider and simply reference them via an external link field.

  6. Restrict which user roles are allowed to upload media files using Permissions to avoid accidents.
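Some of these practices can be checked before anything is uploaded. The following Python sketch walks a local folder and reports binary files above a size budget. The `audit` and `looks_binary` functions and the 200 KB budget are assumptions made for this illustration and not part of Mattrbld. The binary check is a heuristic similar to the one Git uses, which looks for a NUL byte near the start of a file, so text-based formats such as SVG are not flagged.

```python
from pathlib import Path
from typing import List, Tuple

MAX_BYTES = 200 * 1024  # hypothetical size budget per file
SNIFF_BYTES = 8000      # only the start of each file is inspected


def looks_binary(data: bytes) -> bool:
    # Heuristic: a NUL byte near the start of the data means "binary".
    return b"\0" in data[:SNIFF_BYTES]


def audit(media_dir: Path, max_bytes: int = MAX_BYTES) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for binary files over the budget."""
    oversized = []
    for path in sorted(media_dir.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        with path.open("rb") as handle:
            head = handle.read(SNIFF_BYTES)
        if looks_binary(head) and size > max_bytes:
            oversized.append((path.relative_to(media_dir).as_posix(), size))
    return oversized
```

Calling `audit(Path("media"))` on a local folder named `media` (a placeholder) returns a list of files worth resizing, optimising or hosting externally before they are uploaded and synced.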

Closing Thoughts

Storing binary files in Git isn’t inherently bad. Modern Git is capable of handling binary files, although it cannot (yet) do so as efficiently as it can with text files. As long as you are aware of what’s happening in the background, you should be fine for those types of projects Mattrbld is intended for.

The next article will cover some security concerns regarding SVG images, since Mattrbld supports SVG out of the box.