Copying (branching and forking) should be the exception, not the rule

Git has been wildly successful, partly due to its distributed nature and the ease with which we can create copies of projects. Copies come in several forms: branches within a repository, and clones or forks of an entire repository. The distributed nature allows us to copy an entire project locally and branch to our hearts’ content to carefully craft a set of changes to share with others. This ability to commit changes locally, without needing to immediately share work with others, is one of Git’s strong points. We can build up a series of small changes, with confidence that each change is exactly what we need, like checking items off a to-do list. Then, when the list is done, we can review everything, polish it, and share it with everyone else.

Unfortunately, with the ease of copying comes an onslaught of bad habits. GitHub, and many other Git providers, make it easy to fork a project. A fork is a copy of a project in the cloud. Many providers also make it easy to integrate changes back into the source with pull requests. This is fantastic for some models of development, like open source projects where people often work out of sync. Another variant of this model is creating a feature branch for every change to an application. Both styles involve making a copy to make changes.

But with the benefits of this model come drawbacks, and those drawbacks should be considered before the same techniques are applied to other models of software development, like teams working full time on internal projects. After all, open source projects are widely visible, so their practices are easy to imitate. We must be careful not to leap to the conclusion that those practices are universally applicable, or best practices.

Centralized version control

Let’s step back a minute and talk about old school version control. No, I’m not talking about not using version control at all. I mean what came after: centralized version control. In this model, there is a single source of truth, the central repository. This is the baseline from which everyone makes changes and through which they share them with others.

Working on a project requires copying the latest version from the central repository, making changes and then pushing those changes back. When two or more people work on a project, it’s possible that changes will overlap. Depending on who finishes first, someone else will have to reconcile the differences.

The longer a set of changes sits on someone’s computer, the greater the chance of conflict. Making a copy of anything and changing it, while others do the same, is risky. As days, weeks and months pass, it’s increasingly difficult to merge the copies together, especially if multiple people are working full time on the same project. To avoid this, many teams adopt the practice of frequently sharing and merging their work, also known as continuous integration.

But for some reason, with the ability to make copies in the cloud, whether by forking or feature branching, it’s easy to forget the integration dilemma. It’s as if the magic of the cloud and Git will just take care of everything for us. Sure, Git makes copying easy. And it makes merging easier too. But it can’t take away the fact that the longer our copies remain out of sync, the greater the likelihood of conflict. Even if we’re no longer sitting on changes for weeks and months locally, if we do the same thing in the cloud, we’re going to have the same problems.
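
To make the "longer out of sync, more conflict" point concrete, here is a toy model in Python. It is only a sketch built on assumptions of my own: every other developer independently touches the same area of code as us with a fixed daily probability (the hypothetical p_daily), which real teams certainly don’t obey. The function name and the numbers are made up for illustration.

```python
def overlap_probability(days: int, others: int, p_daily: float) -> float:
    """Chance that at least one other developer has touched the same
    area of code while our copy stayed unmerged for `days` days.

    Assumption (not a measurement): each of the `others` developers
    overlaps with us independently with probability `p_daily` per day.
    """
    # Probability that nobody overlaps on any of the days.
    no_overlap = (1.0 - p_daily) ** (days * others)
    return 1.0 - no_overlap


if __name__ == "__main__":
    # Hypothetical team: 4 other developers, 2% daily overlap chance each.
    for days in (1, 5, 20, 60):
        chance = overlap_probability(days, others=4, p_daily=0.02)
        print(f"{days:>2} days out of sync: {chance:.0%} chance of overlap")
```

Whatever numbers we plug in, the curve only goes one way: the chance of overlap never shrinks as the days add up, and it makes no difference whether the unmerged copy lives on a laptop or in the cloud.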

Copies in the cloud can be worse

But it gets worse. If we mandate that all changes must flow through a copy in the cloud, whether by feature branch or forking, then we’ve added an extra hoop to jump through. To work on a project, first, we must make a copy in the cloud. Then, we must copy from our copy in the cloud to make the changes locally. To share our work, we have to push changes to our copy in the cloud, and then from the copy in the cloud, push to the central copy. Sometimes, we go further and mandate the use of pull requests too, where someone else on the team must approve a change before it can be shared with others. That’s a lot of added hoops to jump through.

Whatever the benefits, this model is going to decrease the frequency with which people share their work. This model makes it even more likely for people to work isolated on their own islands. Not just for days, but weeks and months at a time. This is not a consequence to take lightly.

When working from copies, I’ve always recommended pulling from the central repository at least daily. This way you’re reconciling your changes with the work that others have recently shared. But, this assumes copies are the exception and the majority of people will continue to frequently share their work with the central repository.

A policy where everyone makes a copy (feature branch or fork) to make any change, taken to its logical conclusion, has everyone sitting on islands, sharing infrequently. Ironically, even if people are frequently pulling changes from the central repository to their island, it won’t matter because nobody is frequently pushing their work to the central repository.

Problems with working isolated

The longer we spend on islands, the greater the chance we’ll stomp on each other’s work. Because we’re aware of this risk, when we work on islands, we’re much less likely to make significant changes. We know the risk of conflict grows with the amount of change. Even changes like renaming something that’s frequently used will be avoided. Without the ability to fix things as we stumble upon them, the system will quickly accumulate cruft.

Furthermore, we’ll be less likely to know what other people are doing. The act of frequently sharing to a central source helps alert us to others approaching the boundaries of the changes we’re making. When we do overlap changes, we won’t find out about it for potentially weeks and months. With changes scattered across copies, distractions may lead to lost changes. Integration will be delayed. And, our ability to release our application will decline.

These and many other consequences are all familiar problems we encounter when we sit on islands, whether local or in the cloud.

Going too far

Because of the extra hoops to jump through, some turn to other means of finding out about problems. Anything painful should just be automated, right? Often, the next step is to automatically perform merges of individual islands back to the central repository and then run automated builds and tests on the result. The idea is that this will tell us when we’ve gone too far and need to integrate our work.

Even if these automated builds are set up, they’re only merging with the central repository. The longer people work on their own islands, the less valuable this will be, as people aren’t frequently sharing their work with the central repository. Pulling from the central source only helps if copying is the exception, not the rule. Ironically, the more we need this automation, the less likely it can actually help us.

If we continue down this path, I wouldn’t be surprised to see someone propose automatically merging permutations of islands. This is going to get messy quickly and be difficult to have confidence in. The solution will be more complex than the original problem. No matter what we do, we’re stuck trying to remedy the symptom of a more fundamental problem. We have to share and merge our work frequently if we want to minimize conflict.
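
To see how quickly "merging permutations of islands" gets out of hand, here is a small counting sketch in Python. It assumes, purely for illustration, that each island is one unmerged branch or fork and that a trial merge is any combination of two or more islands merged together and tested. The function name merge_combinations is made up for this example.

```python
from math import comb


def merge_combinations(islands: int) -> dict:
    """Count the trial merges available for a number of unmerged islands."""
    # Every pair of islands could be merged and tested against each other.
    pairs = comb(islands, 2)
    # Every subset of two or more islands is its own combination to test:
    # all 2**n subsets, minus the n single islands, minus the empty set.
    subsets = 2 ** islands - islands - 1
    return {"pairs": pairs, "subsets_of_two_or_more": subsets}


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, "islands:", merge_combinations(n))
```

With ten islands there are already 45 pairs and 1,013 combinations of two or more, and each one needs its own build and test run before we can trust it. Merging frequently into one central repository keeps that number at one.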

Copies should be the exception, not the rule

Knowing these consequences, working on islands, especially for changes that will remain isolated for long periods of time, should be avoided. It definitely shouldn’t be a policy for many internal projects. The added obstacle is almost always unnecessary.

Of course, there are benefits to copying in the cloud. For example, it lets us restrict who can merge to a central repository, to allow for reviews or to impose an approval process. But there are other ways to accomplish these ends without discouraging people from frequently sharing work. Perhaps, instead of jumping through the hoops of copying to their copy in the cloud, individuals can spend time reviewing each other’s work: leaving comments, checking a box that they reviewed a commit, pairing with another developer, or building the trust necessary to not need extra layers of approval to make changes to an application.

When copying becomes the exception, not the rule, it can once again become a tool that developers can leverage as necessary. For example, a developer may want a place to share work so they can work from multiple locations or devices. If that’s the case, let them add a copy in the cloud as they feel necessary to help them be more productive. After all, there are other ways to share changes between devices that may fit another developer’s style. Like keeping a copy on a jump drive. In no way does this mediating copy need to be part of the official workflow to reap the benefits of sharing changes.