Git branching suggestions for our environment

Unperverted Vixen

We are currently using Team Foundation Server, and are looking at Git, because we'd like to get some of the features it offers like feature branches and pull requests.

We currently have two long-lived branches, call them master and develop. Developers make changes in develop; the branch is built to QA and tested; and then changes that are ready to go are merged into master and built to UAT for testing before moving to production. (Not all changesets are necessarily merged. Some can live in develop for months or years before moving.)

We are not currently in a place where we're able to spin up environments for feature branches, meaning that two long-lived branches will continue to be a necessity.

Everyone seems to say "don't cherry-pick with Git", but I don't see another option here.

Is cherry-picking really that bad? And if so, any suggestions for how else we can handle this?

error

@Unperverted-Vixen said in Git branching suggestions for our environment:

We are not currently in a place where we're able to spin up environments for feature branches, meaning that two long-lived branches will continue to be a necessity.

I don't see how that follows. For development, we have the following workflow.

master - contains code deployed to production
[feature] - (branched from master) contains work on a developer workstation
int - unstable, basically all the feature branches get merged here to make sure they play nice with others, and for other devs to test
qa - finished feature branches get merged here for testing, suitable for QA folks to test
uat - tested branches approved for release, suitable for business folks to test
sat - staging, basically release candidates live here until they go live
hotfix - emergency patches that can skip the queue

This is a little overkill for average-scale projects. But importantly...

New development happens ONLY on feature branches. Everything else is just merges. Different things get merged different places depending on various approvals. Note that feature branches don't have a corresponding environment, besides localhost.

Edit: I guess I should also mention that we don't merge int to qa or qa to uat, or uat to sat... The feature branches get merged to multiple places depending on where they are in the process. This is sort of analogous to cherry-picking, but with branches not commits.
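Here is a minimal sketch of that rule. It assumes the stages form a strict order and that a feature approved for a given stage has already been merged into every earlier one. The stage list and feature names are hypothetical. Every merge takes the feature branch as its source, and no environment branch is ever merged into the next.

```python
# Environment branches in promotion order (assumed; hotfix is left out because it skips the queue).
STAGES = ["int", "qa", "uat", "sat", "master"]


def merge_targets(stage: str) -> list[str]:
    """Environment branches a feature branch should have been merged into
    once it is approved for `stage` (assumes it passed every earlier stage)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return STAGES[: STAGES.index(stage) + 1]


# Hypothetical feature branches and the stage each one has reached.
features = {"story/101": "qa", "bug/202": "master", "story/303": "int"}
for name, stage in features.items():
    # Each target receives a merge from the feature branch itself.
    print(f"{name}: merge into {', '.join(merge_targets(stage))}")
```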

Unperverted Vixen

@error I think qa maps to what I described as develop, and then we would jump right to master rather than having separate uat/sat branches. But those all seem like exactly the type of long-lived branches I was talking about.

When do you merge from [feature] to master? After it's been approved in sat?

I'd been assuming we'd want to make our feature branches from develop (or qa in your list). The idea of doing it from master is intriguing. Reduces the likelihood of merge conflicts (missed dependencies), and punishes not merging stuff in a timely fashion.

JBert

@Unperverted-Vixen said in Git branching suggestions for our environment:

Everyone seems to say "don't cherry-pick with Git", but I don't see another option here.

Is cherry-picking really that bad? And if so, any suggestions for how else we can handle this?

In itself there is nothing wrong with cherry-picking; it's just that it can cause merge hell when you do it between branches that at some point need to reintegrate. If you do need to reintegrate, you want to create a merge commit as soon as possible so that you can resolve the conflicts while you can still remember what the changes in the cherry-picked commit were.

In a lot of git repositories I'm trying to do away with duplicated long-lived branches for development of new releases. Branches are just pointers to a specific commit, and our build process in Azure DevOps (formerly Visual Studio Online and Visual Studio Team Services) has no trouble building any git commit ID, so in general our master branch pointer just lags behind the development branch. I would prefer to have all the action happen on master (because that's the git convention), but as long as development is actively in use I'd rather not disturb the rest of the development team.

Mind you, development is not used to push daily changes. Instead we work with feature branches which branch off it, and those only get merged to develop once a code review is done as part of a pull request. When the code review is done we complete the pull request using the "squash merge" strategy (basically git starts a merge with the target branch, asks you to resolve all conflicts and then creates a single commit as if you had made all the changes at once on the target branch). Once that's done, the feature branch must be discarded or you hit the "merge hell" problem I mentioned before.
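This toy commit graph shows why the squashed branch has to go. It uses plain Python objects rather than git's real storage, and all commit names are hypothetical. A squash commit has only one parent, the target branch tip, so the feature branch's own commits never become ancestors of the target. A later merge of that branch would treat them as new. A normal merge commit records the feature tip as a second parent, so nothing is left over.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    name: str
    parents: list["Commit"] = field(default_factory=list)


def ancestors(commit: Commit) -> set[str]:
    """Names of `commit` and every commit reachable through its parents."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        cur = stack.pop()
        if cur.name not in seen:
            seen.add(cur.name)
            stack.extend(cur.parents)
    return seen


base = Commit("base")                           # develop before the feature
f1 = Commit("feature-1", [base])
f2 = Commit("feature-2", [f1])                  # tip of the feature branch
squash = Commit("squash(feature)", [base])      # squash merge: one parent only
merge = Commit("merge(feature)", [base, f2])    # regular merge: two parents

# Feature commits the target branch does not "know" about after each strategy
print(sorted(ancestors(f2) - ancestors(squash)))
print(sorted(ancestors(f2) - ancestors(merge)))
```

After the squash, the merge base between the feature branch and develop is still `base`. Merging the branch again would therefore replay feature-1 and feature-2 against content that already contains them, and conflicts appear as soon as either side has moved on.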

In case there are features which take a week or more, or which are dangerous for the release schedule (e.g. delayed due to sudden requirements which would make QA miss the release window), we park the code aside on separate branches. Sadly, those tend to be chaotic because people might not do a squash merge, leaving a history with all their daily work. The reason is that Azure DevOps only enforces pull request processes on branches where you've configured a policy. When somebody adds a new branch, PRs to that branch have no policy assigned and there's no default.

Those separate longer-lived branches should be kept to a strict minimum. Just like in any other VCS they will grow stale quite quickly.

Finally, because I work in product development we do have to keep track of bugfixes. Here we use plain git merges: a fix gets made on the 5.4.x-maintenance branch, this branch gets merged to the 5.5.x-maintenance branch, that branch gets merged into development and so on. Hotfixes which are made on development are backported using a cherry-pick, after which we then merge 5.4.x into 5.5.x into development, and so on.
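The forward-merge order above can be sketched as a small helper. It assumes maintenance branches are named like 5.4.x-maintenance and the trunk is called development; both names come from the post, but the helper itself is hypothetical. Versions are compared numerically, so a 5.10.x branch sorts after 5.9.x instead of sorting as text.

```python
import re


def forward_merge_chain(branches: list[str], trunk: str = "development") -> list[tuple[str, str]]:
    """(source, target) merges from the oldest maintenance branch up to the trunk."""

    def version(name: str) -> tuple[int, ...]:
        m = re.fullmatch(r"(\d+(?:\.\d+)*)\.x-maintenance", name)
        if m is None:
            raise ValueError(f"unexpected branch name: {name}")
        # Numeric comparison, so 5.10 sorts after 5.9 (a plain string sort would not).
        return tuple(int(part) for part in m.group(1).split("."))

    ordered = sorted(branches, key=version) + [trunk]
    return list(zip(ordered, ordered[1:]))


for source, target in forward_merge_chain(
    ["5.5.x-maintenance", "5.10.x-maintenance", "5.4.x-maintenance"]
):
    print(f"merge {source} into {target}")
```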

Of course, this is just how we work. The most extreme opposite view I've ever read about is Trunk Based Development: basically all new changes must be made on the same branch (though they can live very briefly on a separate branch to facilitate code review). Got a thing to fix? The change must first be made on the "trunk" or "master" or whatever you call it; hotfixes can then only be made by cherry-picking the fix back onto a hotfix branch. In this scenario, new features are constantly merged into the same branch and guarded by feature flags.

While this way of working has its merits (stale branches can't exist), it does mean you need to watch out for feature flag sprawl and dead code paths. It could work OK for maintenance, but there too you need to watch how much work is necessary to backport fixes: I know that with Entity Framework code-first migrations you constantly need to regenerate the "database model" snapshot if you don't want any conflicts.
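For readers who haven't used feature flags, here is a minimal sketch of the idea. Unfinished work is merged into the trunk but only runs when its flag is switched on. The flag name, the FLAG_ environment-variable override and the invoice example are all hypothetical. Each such `if` is a second code path that has to be deleted once the feature ships, and that is where the flag sprawl and dead code paths mentioned above come from.

```python
import os

# Default flag values (hypothetical); real setups often read these from config or a flag service.
FLAGS = {"new_invoice_layout": False}


def is_enabled(flag: str) -> bool:
    """Flag value, overridable per environment with e.g. FLAG_NEW_INVOICE_LAYOUT=1."""
    override = os.environ.get(f"FLAG_{flag.upper()}")
    if override is not None:
        return override == "1"
    return FLAGS.get(flag, False)


def render_invoice(total: float) -> str:
    if is_enabled("new_invoice_layout"):
        return f"Invoice total: {total:,.2f}"  # new path, merged to trunk but still dark
    return f"TOTAL {total:.2f}"               # current production behavior


print(render_invoice(1234.5))
```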

Seems a lot of effort though to sidestep merge hell...

Unperverted Vixen

@JBert said in Git branching suggestions for our environment:

In itself there is nothing wrong with cherry-picking; it's just that it can cause merge hell when you do it between branches that at some point need to reintegrate. If you do need it, you want to create a merge commit as soon as possible so that you can resolve the conflicts while you can still remember what the changes in the cherry-picked commit were.

So... the exact same problem we already have, then. That makes me feel better about the idea. 😅

In case there are features which take a week or more, or which are dangerous for the release schedule (e.g. delayed due to sudden requirements which would make QA miss the release window), we park the code aside on separate branches.

The problem with doing that in our environment is that there's no way to test anything in those separate branches, other than a developer's laptop. Sure, the code is there; but there's no environment for it to be deployed to for QA to do any testing, so it's not going to go anywhere.

I like Trunk-Based Development - for web applications where you only have one release in production, it works great. Unfortunately we decided to build tooling around branches instead of learning how to use feature flags.

izzion

@Unperverted-Vixen
What we do in my work project is:

Developers do all work in feature branches named based on the card they're associated with (e.g. bug/12345 or story/23456)
When a feature is ready to be integrated with the main project, a PR is generated against master.
When the PR is merged, master deploys to the staging/testing environment automatically via the CI/CD pipeline
When testing is done and code is ready for production, we deploy the validated build of master to production (manual trigger, but the same general pipeline as the automatic pipeline to test).

If your develop branch has to be deployed somewhere central for people to work on (because they can't really run their feature branch locally well enough), then you're probably best off having feature branches merge to develop and then generating PRs from develop to master by cherry-picking the correct commits (e.g. create a branch off master at release 0.9.2, cherry-pick the desired commits from develop into it, then PR that back to master).

JBert

@Unperverted-Vixen said in Git branching suggestions for our environment:

(Not all changesets are necessarily merged. Some can live in develop for months or years before moving.)

@Unperverted-Vixen said in Git branching suggestions for our environment:

Everyone seems to say "don't cherry-pick with Git", but I don't see another option here.
Is cherry-picking really that bad? And if so, any suggestions for how else we can handle this?

@Unperverted-Vixen said in Git branching suggestions for our environment:

@JBert said in Git branching suggestions for our environment:

In itself there is nothing wrong with cherry-picking; it's just that it can cause merge hell when you do it between branches that at some point need to reintegrate. If you do need it, you want to create a merge commit as soon as possible so that you can resolve the conflicts while you can still remember what the changes in the cherry-picked commit were.

So... the exact same problem we already have, then. That makes me feel better about the idea. 😅

Oops, only now do I see what you want to do (the parts quoted above).

Well, I think git isn't great at what you want to do here; most of its automatic conflict resolution lives in the git merge command (and its helper code), and not much of it applies to git cherry-pick. Basically, what cherry-pick automates is generating a "patch" between a given commit and its parent, and git then tries to apply that patch to the branch you are targeting. When files are near-identical it might work, but if some features are routinely left out, or depend on other features which were left out, cherry-picking will likely be just as painful as whatever you are doing now: every cherry-pick of a commit which "skips" a feature might have to be analyzed again to see what the merge conflicts mean and whether they were already resolved in the past.
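This is a deliberately simplified model of the "patch" view of cherry-pick. Files are lists of lines, and a patch is a list of (old line, new line) pairs. The sketch leans on assumptions: real git replays the change with a three-way merge that uses the picked commit's parent as the base, and it also handles insertions, deletions and context. What the sketch does show is what happens when the target branch has drifted because some other feature was left out. The lines the patch expects are missing, and a human has to decide.

```python
def make_patch(parent: list[str], commit: list[str]) -> list[tuple[str, str]]:
    """(old_line, new_line) pairs for lines changed in place.
    Simplification: assumes both versions have the same number of lines."""
    return [(old, new) for old, new in zip(parent, commit) if old != new]


def apply_patch(target: list[str], patch: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Replay the patch on another branch; returns (new_lines, conflicts)."""
    result, conflicts = list(target), []
    for old, new in patch:
        hits = [i for i, line in enumerate(result) if line == old]
        if len(hits) == 1:
            result[hits[0]] = new
        else:
            # Expected line missing or ambiguous: this is where manual analysis starts.
            conflicts.append(old)
    return result, conflicts


# develop: a commit changes two settings
parent = ["a = 1", "b = 2", "c = 3"]
commit = ["a = 1", "b = 20", "c = 30"]
# master never received the (skipped) feature that changed c from 4 to 3
master = ["a = 1", "b = 2", "c = 4"]

new_master, conflicts = apply_patch(master, make_patch(parent, commit))
print(new_master, conflicts)
```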

The differences which git brings compared to TFVC:

speed (because everything's local, and since it stores files based on a hash of their content it can quickly know when files are identical and thus skip traversing their contents)
far easier to back out or store work in progress, because you can create a local branch first, commit the cherry-pick, test, then run commit --amend to clean up your work in case you erred
far easier and faster to inspect history or file contents. Your choice of git client might matter here, but the TortoiseGit repo browser is handy for quickly peeking at the contents of a file in a different commit, and you can automate this from the command line, so it beats Visual Studio's Source Control Explorer on nearly any day of the week

I guess a lot depends on how you have been merging all those changes and dealing with conflicts in the past. Do you use something like Semantic Merge? How else have you been able to keep this up for so long?

Mind you, the one who is going to do the cherry-picking to master then better know git inside and out. That means the index, working tree, branches, reflogs, ...

Thinking about this whole "can't we leave out features" question reminds me of darcs' big selling point: since it stores changes to files and not snapshots of files, it knows which lines got modified, when and how. That increased granularity might help you. It also supports "unresolved merges" (basically: fix some of the conflicts and leave others up in the air), which you can resolve later, so that you can save your work in the meantime.

All in all, though, it might not work: I don't know how to work with it despite proposing it (I've only read about it in comparisons), and it is unlikely that anyone in your company does. Because changes matter more than files, it might also change the way you commit code; no idea how much of a slog that is. Finally, making things so granular comes with a performance impact. I don't know whether they have built in some "snapshot caching" yet, but 5 years back it was said to be slower than most VCSs (no idea how it measures up now).

EDIT: I actually forgot about the conversion process... I wonder if you can even automate that...

robo2

@JBert yeah, your mode of operation is how we work as well. Mostly. Of course there are people on the team who like to cherry-pick, but they are also the people spending time fixing merge conflicts, and somehow they know all the git commands by heart.

Unperverted Vixen

@JBert said in Git branching suggestions for our environment:

Well, I think git isn't great at what you want to do here; most of its automatic conflict resolution lives in the git merge command (and its helper code), and not much of it applies to git cherry-pick. Basically, what cherry-pick automates is generating a "patch" between a given commit and its parent, and git then tries to apply that patch to the branch you are targeting. When files are near-identical it might work, but if some features are routinely left out, or depend on other features which were left out, cherry-picking will likely be just as painful as whatever you are doing now: every cherry-pick of a commit which "skips" a feature might have to be analyzed again to see what the merge conflicts mean and whether they were already resolved in the past.

I think that when this happens, stuff is normally isolated. If it's not, well, we're already having to pay that price. As painful as what we're doing now is okay, although we'd obviously like to improve if we can.

I guess a lot depends on how you have been merging all those changes and dealing with conflicts in the past. Do you use something like Semantic Merge? How else have you been able to keep this up for so long?

I'd say that TFS is able to automerge successfully about 80% of the time, with us using Beyond Compare for the remaining 20%. Most of the time I'm just rubber-stamping BC's automerges; it's really only csproj files where I expect to need manual work.

Mind you, the one who is going to do the cherry-picking to master then better know git inside and out. That means the index, working tree, branches, reflogs, ...

Well, that rules out Git for us.

EDIT: I actually forgot about the conversion process... I wonder if you can even automate that...

Conversion to Git? ADO supports converting the last 180 days of history, but running it on master shows the person who did the conversion. I did a test run with git-tfs and was happy with the results - it brought the entire history in, and even managed to make it so that blame shows the original changeset author rather than the person who did the merge. (For C# at least. Not so much for changesets only containing SQL scripts, for some reason.)

In the end, I don't think we'll be able to delete the historical TFVC repos, so any migration might as well just be a tip migration anyways.

dcon

@Unperverted-Vixen said in Git branching suggestions for our environment:

Beyond Compare

Enough so that I spent my own money on it.

dkf

@Unperverted-Vixen said in Git branching suggestions for our environment:

We currently have two long-lived branches, call them master and develop. Developers make changes in develop; the branch is built to QA and tested; and then changes that are ready to go are merged into master and built to UAT for testing before moving to production. (Not all changesets are necessarily merged. Some can live in develop for months or years before moving.)

What happens at the moment if someone puts something in develop that it turns out you don't want? (For avoidance of argument, let's say that the reason for that is because you've taken a strategic decision to not do it that way. Not something that could have been predicted perfectly when the code was put in the branch.) The code is D.E.A.D and you really shouldn't be looking to keep the corpse around forever (except in the historic commits, of course; it did exist) so how do you clean up?

With git, either you've not merged the origin feature branch (trivial case) or you have merged it and need a commit to remove it again. In the latter case, you can pretty easily get the delta between the commits where the change got introduced to the receiving branch and reverse-apply that (it's the latter part that bamboozles some). Or you can just rewind the tip of the branch if it was a recent enough change... but that requires luck and cooperation from everyone using the develop branch who might've pulled it.
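The "reverse-apply the delta" step can be sketched as follows. A branch snapshot is modeled as a dict mapping each path to its file content. That is an assumption: git works with trees and line-level diffs, and `git revert` automates the real thing (with `-m 1` when the change arrived as a merge commit). Files that were changed again after the unwanted change are reported instead of overwritten, because those are the cases that need a human.

```python
from typing import Optional

Snapshot = dict[str, str]                                 # path -> file content
Change = dict[str, tuple[Optional[str], Optional[str]]]   # path -> (before, after); None = absent


def delta(before: Snapshot, after: Snapshot) -> Change:
    """Every path whose content differs between two snapshots."""
    paths = before.keys() | after.keys()
    return {p: (before.get(p), after.get(p)) for p in paths if before.get(p) != after.get(p)}


def revert(current: Snapshot, change: Change) -> tuple[Snapshot, list[str]]:
    """Undo `change` on top of `current`; returns (new_snapshot, conflicting_paths)."""
    result, conflicts = dict(current), []
    for path, (old, new) in change.items():
        if result.get(path) != new:
            conflicts.append(path)   # modified again since the change: resolve by hand
        elif old is None:
            del result[path]         # the change added this file, so remove it
        else:
            result[path] = old       # restore the content from before the change
    return result, conflicts


before = {"app.cs": "v1"}
after = {"app.cs": "v2", "feature.cs": "new"}                         # unwanted feature lands
develop_now = {"app.cs": "v2", "feature.cs": "new", "other.cs": "x"}  # later work on top

print(revert(develop_now, delta(before, after)))
```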

Is cherry-picking really that bad? And if so, any suggestions for how else we can handle this?

I don't mind cherry-picking, but I'm also happy with resolving failed merges by hand (and remember this, sometimes the problem changes are on the receiving branch, not the one you're taking the delta from).

Unperverted Vixen

@dkf said in Git branching suggestions for our environment:

What happens at the moment if someone puts something in develop that it turns out you don't want?

We have to create additional changesets to undo it. Our tool for picking changesets to merge to master will see that neither the original nor undo changesets are linked to the current sprint and won't pick them for merging. (And hopefully nothing in-between relied on those changes. Most of the time that's the case, but that's why we test the master branch before releasing it...)

With git, either you've not merged the origin feature branch (trivial case) or you have merged it and need a commit to remove it again.

That's part of why I want feature branches - so we can be more systematic about doing code review and hopefully have less of that.

bobjanova

You shouldn't need to cherry-pick with a setup like this. It sounds pretty similar to ours (although typically we do merge everything onto master before a release).

The trick is that you can merge the same feature branch into both master and develop (as long as they haven't diverged so far that you have merge issues trying to do so). This is only relevant for us when we have to patch a bug fix onto a previous version, but perhaps for you you will want to do it for features too.

Let's imagine long-term branches master (the released version) and develop (current development), and that you want to fix a bug and patch the release. Develop is always ahead of master, i.e. there are no features we want on master that aren't also on develop.

Make a new branch off master (so it isn't based on commits in develop which are not in master), let's call it bugfix/bad-bug.
Do the work
Submit a PR from bugfix/bad-bug to master
Also submit a PR from bugfix/bad-bug to develop

Now, assuming those merges are clean, the feature branch is merged into both, with the same commits. So merging between them should still be fine.

The important part of this working pattern is that you must create these branches off master, not develop, because otherwise it's impossible to merge them onto master without pulling everything else that's been put into develop.
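A toy commit graph shows why branching off master matters. The graph is a dict mapping each commit to its parents, and the commit names are hypothetical. A merge brings in everything reachable from the branch tip that the target can't already reach. So a fix branched off master carries only itself, while the same fix branched off develop drags develop's unreleased commits along.

```python
# Toy commit graph: commit -> parents (all names hypothetical).
GRAPH = {
    "m1": [],                      # last release on master
    "d1": ["m1"],                  # unreleased work on develop
    "d2": ["d1"],
    "fix": ["m1"],                 # bugfix/bad-bug, branched off master
    "fix-off-develop": ["d2"],     # the same fix, branched off develop instead
}


def reachable(tip: str) -> set[str]:
    """All commits reachable from `tip`, including itself."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(GRAPH[commit])
    return seen


def brought_in(target_tip: str, branch_tip: str) -> list[str]:
    """Commits that merging `branch_tip` into `target_tip` would add to the target."""
    return sorted(reachable(branch_tip) - reachable(target_tip))


print(brought_in("m1", "fix"))               # PR to master: just the fix
print(brought_in("d2", "fix"))               # PR to develop: just the fix
print(brought_in("m1", "fix-off-develop"))   # drags d1 and d2 onto master
```

Once both PRs are completed, `fix` is an ancestor of both master and develop, so later merges between the two never see it as new.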

dfdub

@Unperverted-Vixen said in Git branching suggestions for our environment:

Everyone seems to say "don't cherry-pick with Git", but I don't see another option here.
Is cherry-picking really that bad? And if so, any suggestions for how else we can handle this?

Rule of thumb: if your development model is "keep a linear history in master, don't share any other branches (except release branches) and use rebase + fast-forward merges for feature branches", then cherry-picking is perfectly fine, as the duplicate commits eventually disappear anyway. But if your history is spaghetti and you merge branches into each other frequently, you should probably avoid cherry-picks. The situation you absolutely want to avoid is having to merge two branches with copies of the same commits.

cherry-pick + rebase = awesome
cherry-pick + merge = gitthulhu
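Cherry-pick + rebase works out because git rebase skips local commits whose change is already upstream. It compares them by content (effectively by patch ID; see `git patch-id`, which hashes a diff with whitespace and line numbers ignored), not by commit ID. Below is a rough sketch of that filter. A change is modeled as (old line, new line) pairs, and the normalization only approximates git's.

```python
import hashlib

Change = list[tuple[str, str]]  # (old_line, new_line) pairs


def patch_id(change: Change) -> str:
    """ID that depends only on what a change does, not on its parent or message.
    Loosely modeled on `git patch-id`; the normalization is an approximation."""
    digest = hashlib.sha1()
    for old, new in change:
        # Drop all whitespace so re-indented copies of the same change still match.
        digest.update(f"-{''.join(old.split())}\n+{''.join(new.split())}\n".encode())
    return digest.hexdigest()


def commits_to_replay(local: dict[str, Change], upstream: dict[str, Change]) -> list[str]:
    """Local commits a rebase onto `upstream` would still apply, in order."""
    already_upstream = {patch_id(change) for change in upstream.values()}
    return [name for name, change in local.items() if patch_id(change) not in already_upstream]


local = {
    "A": [("x = 1", "x = 2")],
    "B": [("y = 1", "y = 3")],
}
upstream = {
    "A-picked": [("x = 1", "x  = 2")],   # cherry-picked copy of A, new commit ID
}
print(commits_to_replay(local, upstream))
```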

JBert

@dfdub said in Git branching suggestions for our environment:

The situation you absolutely want to avoid is having to merge two branches with copies of the same commits.

cherry-pick + rebase = awesome
cherry-pick + merge = gitthulhu

Well, I do know the incantations to do a rebase and then make an identical merge commit... But that's history rewriting which has gone so far that it's become history writing again.

One more thing which I might not have stressed enough: even when you do a baseless merge in TFVC it will remember which changesets have been merged where (I hope I still get the terminology right, it's been a while). When you start a fresh merge it should not suggest those changesets again.

When cherry-picking with git you really do get duplicate commits, so all that change tracking needs to be done externally. Git's data model really isn't rich; it's just optimized for a particular use case and alien to the majority of people (which is why its CLI is so byzantine). Should you switch to git, I hope you can keep track of it outside the git repo...
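One way to do that bookkeeping is to let git write it into the commit message: `git cherry-pick -x` appends a "(cherry picked from commit <sha>)" line. The sketch below recovers a TFVC-style "what hasn't been merged yet" list from those lines. It assumes the messages were collected with something like `git log --format=%B master`; the helper names and sample SHAs are hypothetical. `git cherry`, which matches commits by patch ID, is an alternative, but it can stop matching once a pick needed conflict resolution that changed the diff.

```python
import re

# Line added by `git cherry-pick -x`; 40 hex digits for SHA-1 repos, 64 for SHA-256.
PICKED_FROM = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40}|[0-9a-f]{64})\)$", re.MULTILINE
)


def picked_sources(master_messages: list[str]) -> set[str]:
    """SHAs of develop commits that already have a cherry-picked copy on master."""
    return {sha for msg in master_messages for sha in PICKED_FROM.findall(msg)}


def pending(develop_shas: list[str], master_messages: list[str]) -> list[str]:
    """Develop commits (in order) that have not been cherry-picked to master yet."""
    done = picked_sources(master_messages)
    return [sha for sha in develop_shas if sha not in done]


a, b, c = "a" * 40, "b" * 40, "c" * 40   # hypothetical commit IDs on develop
master_log = [
    "Fix login timeout\n\n(cherry picked from commit " + a + ")",
    "Release 5.4.1",
]
print(pending([a, b, c], master_log))
```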

dfdub

@JBert said in Git branching suggestions for our environment:

Well, I do know the incantations to do a rebase and then make an identical merge commit... But that's history rewriting which has gone so far that it's become history writing again.

Well, attitude towards history rewriting always depends heavily on the development model. (See what I wrote above.) Depending on what your branches are and mean, the information you lose is either useful or useless clutter.

If you start from a clean slate with git, with developers who already worked with git before, I'd strongly suggest adopting a rebase-based model if possible, since that makes things easier to reason about in the long run and prevents huge merge conflicts. History rewriting is actually a good thing if it's done on private branches. But I know that's not always possible, since some projects need a lot of shared branches that are deployed in different places. In those cases, you're right to be sceptical about rewriting any history.

TheCPUWizard

Many ways to do things. I do not believe any are universally better or worse than the others... That being said.

We have switched to Feature Flags. That eliminated the need for any branching besides Develop->Master, with ONLY validated (heavily auto-tested) pull requests being able to kick off the merge.

Unperverted Vixen

@JBert said in Git branching suggestions for our environment:

One more thing which I might not have stressed enough: even when you do a baseless merge in TFVC it will remember which changesets have been merged where (I hope I still get the terminology right, it's been a while). When you start a fresh merge it should not suggest those changesets again.
When cherry-picking with git you really do get duplicate commits, so all that change tracking needs to be done externally. Git's data model really isn't rich; it's just optimized for a particular use case and alien to the majority of people (which is why its CLI is so byzantine). Should you switch to git, I hope you can keep track of it outside the git repo...

You got the TFVC terminology correct. That's... unfortunate and probably a big red flag against cherry-picking.

@TheCPUWizard said in Git branching suggestions for our environment:

We have switched to Feature Flags. That eliminated the need for any branching besides Develop->Master, with ONLY validated (heavily auto-tested) pull requests being able to kick off the merge.

That would be the right way to do it, but I don't have any reasonable expectation that we can get from where we are to there.

I'm intrigued by the idea of making a feature branch from master and then merging it to both master and develop (per @error and @bobjanova). I'll spend some time tomorrow testing that out.

robo2

@Unperverted-Vixen there's a set of The Old New Thing blog posts from a while back that goes into depth on all the things they (Microsoft) do to have a common base for merges (and to never cherry-pick anything). The branch-features-off-master mechanism would probably avoid all the shenanigans described in that series, although it was an interesting read anyway.

TheCPUWizard

@Unperverted-Vixen said in Git branching suggestions for our environment:

I don't have any reasonable expectation that we can get from where we are to there.

Feel free to ping me privately. Been helping companies get there for many a year.....

dkf

@JBert said in Git branching suggestions for our environment:

When cherry-picking with git you really do get duplicate commits, so all that change tracking needs to be done externally.

Really? Huh, I didn't know that. Other DVCSs (e.g., fossil) have no problem with it, and fossil at least will actually show an arc in the GUI to show that a cherry pick happened (it's a dashed arc so it's clear it's not a standard merge). It's pretty neat. (Fossil is, of course, utterly philosophically opposed to rebasing as a workflow strategy.)

JBert

@dkf It's not called git - the stupid content tracker for nothing!

Git does not store any history information other than a graph of commit objects. Rename tracking, merge tracking, all of that is calculated only when that information is needed. The reason that it works so fast is because everything is addressed by a content hash, which makes checking if things are identical very fast, and so merging can use lots of little tricks to discard identical content without even having to load said content.

However, detecting duplicate commits is not one of its strengths... Because a cherry-picked commit has a new parent, the content hash of the commit object is also different. Only when the files in the new commit are identical to the files in the branch to merge will git realize that nothing is to be done. Any change to the file -> merge conflict, where it is up to a merge tool to find the identical lines in the file and figure out the changes.
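The hashing part can be reproduced in a few lines. Git's object ID is the SHA-1 of a "<type> <size>\0" header followed by the object's bytes. The commit text below is simplified: real commits also carry author and committer lines with timestamps, and a cherry-pick changes the committer line too. The tree and parent IDs are made up. Even with an identical tree, a different parent gives a different commit ID, so git has no cheap way to recognize the copy.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID as git computes it: sha1(b"<kind> <len>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    # Simplified: real commit objects also contain author/committer lines.
    return f"tree {tree}\nparent {parent}\n\n{message}\n".encode()


print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)

tree = "1" * 40                    # hypothetical tree ID, identical for both commits
original = git_object_id("commit", commit_body(tree, "a" * 40, "Fix rounding"))
picked = git_object_id("commit", commit_body(tree, "b" * 40, "Fix rounding"))
print(original == picked)          # different parent, different commit ID
```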

Gąska

@JBert said in Git branching suggestions for our environment:

Any change to the file -> merge conflict, where it is up to a merge tool to find the identical lines in the file and figure out the changes.

I agree that interns often leave much to be desired, but calling them tools is a bit too much.

dkf

@JBert said in Git branching suggestions for our environment:

Git does not store any history information other than a graph of commit objects. Rename tracking, merge tracking, all of that is calculated only when that information is needed. The reason that it works so fast is because everything is addressed by a content hash, which makes checking if things are identical very fast, and so merging can use lots of little tricks to discard identical content without even having to load said content.

Fossil also keeps the ID of the origin commit when you cherry-pick (in the core ledger data model, it might keep more derived stuff in the database) so it is very good at getting merges right. And its GUI is better than all the ones for git I've seen, and everyone uses the same one (simple web-based GUIs have that sort of advantage).

The real advantage that git has comes in the extended workflow, with stuff like CI/CD services.

I use both, on different projects.

HardwareGeek

@dkf said in Git branching suggestions for our environment:

Fossil

Despite being an old fossil myself, I've never used it.