Blakey

xkcd and the years and years of struggling with broken software that just so happens to use git

Kian

What's wrong with editing a commit in the past? You edit it, everyone that pulls from you is told "this source has a different history, do you want to pull it?", they can analyse the change the same way they do new commits, and the change spreads the same way new commits spread.

You think it's wrong because you know how the underlying implementation handles it, and it's a pain to get it right. But the user doesn't care about the implementation. A good interface makes things that are difficult easy for the user.

asdf

NOBODY KNOWS WHAT THE FUCK A DAG IS!

Seriously? I mean, I get your problem with git, and I mostly agree with your criticism of it. But knowing what a directed acyclic graph is falls under data structures 101. There are many algorithms/data structures you don't need to know because you can easily look them up. But this is different, this is something really basic.
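For readers in Blakey's position, a commit history as a DAG fits in a few lines. The sketch below is a simplified model, not git's implementation: the commit names and helper functions are hypothetical. "Directed" means each commit points at its parents; "acyclic" means no commit can be its own ancestor, since a commit's parents must already exist when it is created. The "merge base" it computes is the idea behind `git merge-base`, the common ancestor a three-way merge starts from.

```python
from collections import deque

# Each commit maps to the list of its parents, so edges point from child to parent.
# A merge commit has two parents; the root commit has none.
history = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # more work on the main line
    "D": ["B"],       # a feature branch that forked at B
    "E": ["C", "D"],  # a merge commit joining both lines of work
}


def ancestors(commit, graph):
    """Return every commit reachable from `commit` via parent edges, including itself."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in graph[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(a, b, graph):
    """Return the best common ancestors of a and b: the common ancestors that
    are not themselves ancestors of some other common ancestor."""
    common = ancestors(a, graph) & ancestors(b, graph)
    best = [
        c for c in common
        if not any(c != other and c in ancestors(other, graph) for other in common)
    ]
    return sorted(best)


print(merge_bases("C", "D", history))  # ['B']
print(merge_bases("E", "C", history))  # ['C']
```

Note that a merge commit like E is simply a node with two parents; nothing about the structure is more exotic than that.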

LB_

@Kian said:

What's wrong with editing a commit in the past?

The same thing that's wrong with editing a bitcoin transaction in the past. History is a permanent thing, you know. You can only mutate it by creating a copy with changes. Java does this with its immutable strings, I'm not sure that it's a hard thing to understand. I wouldn't consider immutable history to be an implementation detail.
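LB_'s "copy with changes" point can be made concrete with content addressing. In git, a commit's id is a hash over its contents, including its parents' ids and its message, so "editing" an old message necessarily produces a new commit, and every descendant gets a new id too. The sketch below assumes a deliberately simplified record format (real git objects also carry author, committer and timestamps); the tree names are placeholders.

```python
import hashlib


def commit_id(tree, parents, message):
    """Hash a simplified commit record.

    Real git hashes a specific object format that also includes author,
    committer and timestamps; this sketch keeps only tree, parents and message.
    """
    record = "tree " + tree + "\n"
    record += "".join("parent " + p + "\n" for p in parents)
    record += "\n" + message
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


def build_history(commits):
    """Given (tree, message) pairs oldest-first, return the ids of a linear history."""
    ids = []
    for tree, message in commits:
        parents = [ids[-1]] if ids else []
        ids.append(commit_id(tree, parents, message))
    return ids


original = [("t1", "Fix bug #132"), ("t2", "Add feature"), ("t3", "Refactor")]
corrected = [("t1", "Fix bug #123"), ("t2", "Add feature"), ("t3", "Refactor")]

old_ids = build_history(original)
new_ids = build_history(corrected)

# Fixing one typo'd ticket number in the oldest message changes every id after it,
# even though no file contents (trees) changed.
print([old == new for old, new in zip(old_ids, new_ids)])  # [False, False, False]
```

This is why a message edit in git looks like a history rewrite to everyone who already has the old ids.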

asdf

There are just other tools that make it easier to work with a DAG of commits.

I don't doubt that, although I've never used one that was easier to use while being as powerful as git. I really wish git had a better user interface. Hell, even its command line interface is inconsistent and crappy.

JBert

@LB_ said:

The problem is that what blakey wants to do is change history.

You're looking at commit messages the way Git treats them, as part of the history, whereas Blakey looks at them as just a label for a particular commit, which should be changeable at a later point. The latter is what TFS and Subversion do; committed history cannot be changed, but you can override the message.

That's why Blakey was ranting that you need to use different VCSs to get different perspectives.

LB_

Woah, I didn't realize that some VCS don't treat the commit messages as part of the commit. I'll have to think for a while about whether or not I think that's a good idea. Thanks for making it more obvious for me, it wasn't clear from blakey's shouting.

boomzilla

@asdf said:

Maybe the only thing the git is doing wrong is not forcing you to create a branch immediately when you commit with a detached head. Because I cannot think of a single reason why you would want to commit with a detached HEAD and not create a new branch.

I vastly prefer the way hg handles it, which is to create an anonymous branch. Then you can decide what to do with it after the fact: merge with the previous head of the branch (or another anonymous branch), close a branch, etc.

boomzilla

@LB_ said:

Woah, I didn't realize that some VCS don't treat the commit messages as part of the commit. I'll have to think for a while about whether or not I think that's a good idea.

Like blakey, I've had to go back and update a message in svn because I (or someone else) typo'd something like a ticket reference, which breaks other stuff that relies on those being correct. I get the attitude that "history is history" and all that, but the end result in this case is ideology getting in the way of getting shit done.

HardwareGeek

@boomzilla said:

"history is history"

I'm not a git user, but what I've read here suggests to me that git gets this wrong both ways. Editing a commit comment should be easy (probably with a permanent indication that it's been edited) without going through a complicated process that risks corrupting your repo, but git makes this risky. Changing any other history also shouldn't be error-prone; it should be impossible, but git allows it (in a risky way).

boomzilla

@asdf said:

Because I cannot think of a single reason why you would want to commit with a detached HEAD and not create a new branch.

I think a common case for me doing this with hg was...I'm working on something, and I've pulled down stuff that someone else has done. I finish my work, test it, etc, commit it and then go on to merge my anonymous branch with the newly pulled down stuff.

FrostCat

@blakeyrat said:

NOBODY KNOWS WHAT THE FUCK A DAG IS!

Man, no wonder you're down on that college education you got, it was worthless.

asdf

@boomzilla said:

I finish my work, test it, etc, commit it and then go on to merge my anonymous branch with the newly pulled down stuff.

HTFY

In the git world, that means: You need to create a branch. So your use case is not a counter-example. ;)

boomzilla

@asdf said:

In the git world, that means: You need to create a branch. So your use case is not a counter-example.

Yes, I know. Because git hates everyone.

asdf

@boomzilla said:

Because git hates everyone.

I'm pretty sure evil, self-aware software does not have feelings. At least that's what Hollywood taught me.

boomzilla

@asdf said:

I'm pretty sure evil, self-aware software does not have feelings.

I don't think git is any of that. But it still hates everyone.

asdf

@boomzilla said:

I don't think git is any of that.

Well, duh, you're obviously not supposed to think that…

HardwareGeek

@boomzilla said:

But it still hates everyone.

But not as much as Discourse does. Just trying to see who liked @asdf's reply jellypotatoed me 78 posts up-thread.

LB_

@boomzilla said:

I vastly prefer the way hg handles it, which is to create an anonymous branch.

That's how it works in git too, though? HEAD is an anonymous branch.

@boomzilla said:

Like blakey, I've had to go back and update a message in svn because I (or someone else) typo'd something like a ticket reference, which breaks other stuff that relies on those being correct. I get the attitude that "history is history" and all that, but the end result in this case is ideology getting in the way of getting shit done.

I don't think it's that straightforward: what if some asshole nukes all your commit messages (or changes random ones in subtle but destructive ways)? Even though git lets you force the master branch to an empty commit, the commits are all still there and you can recover state. By separating the message from the commit, you're replacing one set of problems with another set of problems.

@HardwareGeek said:

Editing a commit comment should be easy (probably with a permanent indication that it's been edited) without going through a complicated process that risks corrupting your repo, but git makes this risky. Changing any other history also shouldn't be error-prone; it should be impossible, but git allows it (in a risky way).

So you would both add overhead of versioning commit messages separately, and make it impossible to remove accidentally published sensitive data? I disagree.

EDIT: Thinking about it, versioning commit messages separately wouldn't be so bad. It just seems awkward.

Dreikin

@boomzilla said:

Like blakey, I've had to go back and update a message in svn because I (or someone else) typo'd something like a ticket reference, which breaks other stuff that relies on those being correct. I get the attitude that "history is history" and all that, but the end result in this case is ideology getting in the way of getting shit done.

On a related topic, the way to do bug-tracking stuff like that in git should be to use tags, right? Something like git tag tickets/{ticket-id} [1]. This would make those types of fixes (relatively) easy, and seems easier to work with than scanning commit messages for ticket IDs. But I don't think I've ever heard of a tool doing it that way, while I've heard lots about tools doing it the commit-message way.

That still doesn't address the general case of rewriting messages further back than the previous commit, though.
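To compare Dreikin's tag idea with the usual commit-message scanning, here is a small sketch. It assumes input lines shaped like `git show-ref --tags` output (`<sha> refs/tags/<name>`) and lightweight tags; for annotated tags that command lists the tag object's id rather than the commit's. The function names and the `tickets/` prefix are hypothetical, taken from the suggestion above.

```python
import re

TAG_PREFIX = "refs/tags/tickets/"


def tickets_from_tags(show_ref_output):
    """Map ticket id -> commit id from lines like '<sha> refs/tags/tickets/<id>'.
    Assumes lightweight tags; annotated tags would list the tag object's id."""
    tickets = {}
    for line in show_ref_output.splitlines():
        sha, _, ref = line.partition(" ")
        if ref.startswith(TAG_PREFIX):
            tickets[ref[len(TAG_PREFIX):]] = sha
    return tickets


def tickets_from_messages(log):
    """The commit-message approach: scan (sha, message) pairs for '#<digits>'."""
    tickets = {}
    for sha, message in log:
        for ticket in re.findall(r"#(\d+)", message):
            tickets.setdefault(ticket, []).append(sha)
    return tickets


refs = "aaa111 refs/tags/tickets/123\nbbb222 refs/tags/v1.0\nccc333 refs/tags/tickets/456"
print(tickets_from_tags(refs))  # {'123': 'aaa111', '456': 'ccc333'}
```

The trade-off: a tag name is unique, so each ticket points at one commit, but fixing a typo'd ticket is a matter of deleting and recreating a tag (`git tag -d`) rather than rewriting any commit.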

@boomzilla said:

I don't think git is any of that. But it still hates everyone.

The rat vs. the git - THERE CAN BE ONLY ONE!

@LB_ said:

So you would both add overhead of versioning commit messages separately

Wouldn't be that hard, actually. Just two trees, with all the same commands varying only by target, with the message tree holding refs to the normal tree.
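A rough sketch of the "two trees" idea, with hypothetical class names: commits stay immutable, and a separate append-only log of message revisions refers to them by id, which also gives the "permanent indication that it's been edited" HardwareGeek asked for. Git itself has something in this spirit: `git notes` attaches extra text to a commit under a separate ref (`refs/notes/commits` by default) without changing the commit's id, although it adds to the message rather than replacing it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """An immutable commit; nothing in the message layer can change its id."""
    id: str
    parents: tuple
    original_message: str


@dataclass
class MessageLayer:
    """Hypothetical second history: an append-only log of message revisions,
    keyed by commit id. Earlier revisions are never discarded."""
    revisions: dict = field(default_factory=dict)

    def edit(self, commit, new_message, author):
        self.revisions.setdefault(commit.id, []).append((author, new_message))

    def message(self, commit):
        revs = self.revisions.get(commit.id)
        return revs[-1][1] if revs else commit.original_message

    def was_edited(self, commit):
        # A permanent marker: once edited, always flagged as edited.
        return commit.id in self.revisions

    def history(self, commit):
        return [commit.original_message] + [msg for _, msg in self.revisions.get(commit.id, [])]


c = Commit("a1b2c3", (), "Fix bug #132")
layer = MessageLayer()
layer.edit(c, "Fix bug #123", author="admin")
print(layer.message(c), layer.was_edited(c))  # Fix bug #123 True
print(layer.history(c))  # ['Fix bug #132', 'Fix bug #123']
```

Because the layer only ever appends, LB_'s "asshole nukes all the messages" scenario remains recoverable: the original text is still there.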

@LB_ said:

make it impossible to remove accidentally published sensitive data?

That's already pretty hard, if you've put it onto the internet and anyone outside of your control has pulled it.

Also, I'm too lazy to scroll back up and see if you addressed this yet: do you understand the value of selective locking of undiffable files now?

1: If you want to stick a message on the tag, do git tag -a {tag} -m 'message'.

LB_

@Dreikin said:

Wouldn't be that hard, actually.

Yeah, thinking about it now it wouldn't be hard, it just seems kind of awkward to me. If there's an existing VCS where each commit has its own commit message versioned, let me know so I can see how it works.

@Dreikin said:

Do you understand the value of selective locking of undiffable files now?

Nobody gave me an example that showed me the light, and I haven't looked it up myself yet.

Dreikin

@LB_ said:

If there's an existing VCS where each commit has its own commit message versioned, let me know so I can see how it works.

Don't know of any; like you most of my experience is with git.

@LB_ said:

Nobody gave me an example that showed me the light, and I haven't looked it up myself yet.

The spreadsheets one wasn't bad conceptually, though it may not fit in with what you think of source control as being for. A (probably?) more common case is graphics assets for games: you don't want two people updating the same graphics file at the same time because the work of one of them is almost certain to be lost. Same for simpler graphics like sprite sheets for a web app.

LB_

Still not seeing it; the only case where I could see that being wasteful is if they both made the exact same pixel-for-pixel changes, or if they both tried to replace the image with an entirely different image. Those are pretty rare though and are usually a result of miscommunication. In all other cases, you can merge the two different versions with decent image editing software. You might even be able to find a git plugin that can help git make the merge automatically, especially in the sprite sheet example.
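LB_'s claim that only overlapping pixel edits truly collide can be illustrated with a naive per-pixel three-way merge. This is a hypothetical sketch, not how any particular image tool or git plugin works: images are assumed to be same-sized lists of rows of (r, g, b) tuples, and a pixel conflicts only when both sides changed it to different values.

```python
def merge_images(base, ours, theirs):
    """Three-way merge of same-sized images, each a list of rows of (r, g, b) tuples.

    Returns (merged, conflicts), where conflicts lists the (x, y) positions that
    both sides changed to different values; those keep the base pixel.
    """
    merged, conflicts = [], []
    for y, (b_row, o_row, t_row) in enumerate(zip(base, ours, theirs)):
        row = []
        for x, (b, o, t) in enumerate(zip(b_row, o_row, t_row)):
            if o == t or t == b:
                row.append(o)   # both sides agree, or only "ours" changed it
            elif o == b:
                row.append(t)   # only "theirs" changed it
            else:
                row.append(b)   # both changed it differently: a real conflict
                conflicts.append((x, y))
        merged.append(row)
    return merged, conflicts


W, R, B, G = (255, 255, 255), (255, 0, 0), (0, 0, 255), (0, 255, 0)
base = [[W, W], [W, W]]
ours = [[R, W], [W, G]]    # we painted the top-left red and the bottom-right green
theirs = [[W, W], [W, B]]  # they painted the bottom-right blue
merged, conflicts = merge_images(base, ours, theirs)
print(conflicts)  # [(1, 1)]
```

The catch is that many ordinary edits (resizing, colour grading, re-exporting through a lossy format) touch every pixel, and under this rule every one of those pixels then conflicts. That is the situation Dreikin's locking argument is about.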

I agree with what someone else said: your VCS shouldn't be your project manager.

@rc4 said:

Maybe it's just me, but don't most companies do things like assign specific tasks to employees? If they did things like "be a manager" and "delegate" (which have been proven to be effective at distributing workload across groups of people) then maybe they wouldn't have to worry about having many people do the same thing at once, unaware that someone else is also working on said task. Maybe it's the company's fault for relying on software that isn't designed to manage employees to manage their employees.

HardwareGeek

@LB_ said:

So you would both add overhead of versioning commit messages separately, and make it impossible to remove accidentally published sensitive data? I disagree.

EDIT: Thinking about it, versioning commit messages separately wouldn't be so bad. It just seems awkward.

I agree that full-blown versioning of commit messages seems like overkill. I was thinking of a flag that would indicate it had been modified, maybe combined with a policy that required review and approval of such changes to ensure they were legitimate before they could be committed and/or pushed upstream.

That might overcome your objection to removing sensitive data. Any change that modifies history requires extraordinary review and approval, and must be committed by an admin that knows WTF he/she is doing and won't screw it up. (Not such a big problem with a hypothetical VCS that doesn't intentionally (or otherwise) make it a difficult, error-prone process.)

Although I find it hard to imagine a scenario in which accidentally publishing sensitive data is even possible unless your whole process is seriously . Really sensitive data — HIPAA/PCI/DoD kind of sensitive — you're already totally fucked, and deleting it isn't going to unfuck you. Customer/personnel data, why are you putting that in your VCS at all? Your proprietary source code is sensitive, but the whole point of a VCS is to store that, and it's on your internal, corporate network, not published, right? OSS that you're putting on a public VCS, what sort of sensitive data could you manage to accidentally put out there?

LB_

@HardwareGeek said:

That might overcome your objection to removing sensitive data. Any change that modifies history requires extraordinary review and approval, and must be committed by an admin that knows WTF he/she is doing and won't screw it up. (Not such a big problem with a hypothetical VCS that doesn't intentionally (or otherwise) make it a difficult, error-prone process.)

GitHub allows you to protect branches so that force pushes are rejected. Protecting or unprotecting a branch requires admin access and retyping your password. With vanilla git, it's a bit different due to the whole distributed thing, but for GitHub this problem is solved.

@HardwareGeek said:

Although I find it hard to imagine a scenario in which accidentally publishing sensitive data is even possible unless your whole process is seriously .

Many repos, especially Google repos, have a config file in the root of the source tree that you are expected to edit to insert a private key. Considering Google's C++ standards, I'd say yes, this is TRWTF, but it's still a possibility, and GitHub has instructions for how to deal with it, meaning it has come up before or was seemingly likely enough for them to write the instructions.

dse

@Dreikin said:

A (probably?) more common case is graphics assets for games: you don't want two people updating the same graphics file at the same time because the work of one of them is almost certain to be lost. Same for simpler graphics like sprite sheets for a web app.

Beyond Compare has a nice visual diff for images; I am not sure if it can merge them, though. I have set BC3 as my 3-way merge and diff tool, and it makes things so easy that I am sure even Blakey will not hate it too much.

In general, binary files make git slow and bulky; I do not think git without an extension is the right tool for them. The only time I have had to rewrite history was when I had to remove some huge video files (and later, some huge zip files) that some idiot had committed.

LB_

Ah yeah, that reminds me of Git-LFS.

dse

@LB_ said:

GitHub allows you to protect branches so that force pushes are rejected. Protecting or unprotecting a branch requires admin access and retyping your password. With vanilla git, it's a bit different due to the whole distributed thing, but for GitHub this problem is solved.

HardwareGeek:

We use GitLab, and the master branch is for us admins only :) Everyone else commits to develop, then I merge to master. We use Gitflow, so this makes sense.

boomzilla

@LB_ said:

what if some asshole nukes all your commit messages (or changes random ones in subtle but destructive ways)?

Meh. I've yet to be out asshole'd.

blakeyrat

ScholRLEA

@asdf said:

> NOBODY KNOWS WHAT THE FUCK A DAG IS!

Seriously? I mean, I get your problem with git, and I mostly agree with your criticism of it. But knowing what a directed acyclic graph is falls under data structures 101. There are many algorithms/data structures you don't need to know because you can easily look them up. But this is different, this is something really basic.

More to the point, every VCS that allows you to merge different branches operates on a DAG. However, the term itself rarely comes up in the documentation for any of them (including git), and is mostly used for internal design. I am not surprised that Blakey wouldn't have run into it in this context before, but I expect he recognized it and is just making a point about using technical terms when describing something non-technical, or something like that.

Though as usual Blakey is (deliberately) ignoring the fact that the users of any current VCS would have enough technical background to understand what a directed acyclic graph is - for better or worse, no VCS today, CLI or GUI, is designed for use by casual users, because hardly anyone thinks of things like word-processing documents or spreadsheets as things that can and should have version management.

<rant>
TRWTF isn't that git sucks, it is that all of them suck, because the idea of revisioning hasn't been generalized enough. This is stuff that the file system ought to be handling, not some add-on that is only of interest to turbo-nerds. Until that is the case, and it has a user interface that anyone can handle with minimal effort, then revisioning will remain a across the board.
</rant>

ScholRLEA

I think Blakey uses Git-TFO.

lolwhat

@blakeyrat said:

NOBODY KNOWS WHAT THE FUCK A DAG IS!

What, you don't like DAGs?

https://www.youtube.com/watch?v=zH64dlgyydM

Yamikuronue

@ScholRLEA said:

the users of any current VCS would have enough technical background to understand what a directed acyclic graph is

False.

How many webdevs use VC but have never had any formal education in computer science or mathematics? How many testers are able to pull down builds from VC but don't have any technical knowledge?

ScholRLEA

That's a good point, actually. Still, most VCS systems these days are designed as if it were the case, and this is one place @blakeyrat is right - the designs of those which I am familiar with at least (CVS, Subversion, Git, and back in the mists of time, the old version of MKS Integrity that was based on RCS; I have some exposure to Mercurial, but not enough to really judge) assume that you have such knowledge, and they can be a pain in the ass to use even with a nice front end on top of the CLI tools. I don't actually disagree with Blakey about Git being crap, I just don't see it as all that much worse than its competitors.

Pharylon

As someone who's heavily invested in the .NET world, I've got some opinions here. I've used both TFVC (Team Foundation Version Control) and SVN quite a bit. Recently, a couple of us at my current job got to do a bit of greenfield website development, and we had to create a new repo. We decided to give Git a try for a couple of reasons, mostly just because it's so dang popular. I'd only ever used GitHub for Windows before, which I found to be a universally bad experience, but it seems everyone is moving to Git these days (even TFS supports it now), so we decided to give it a try.

Even though Visual Studio has perfectly good Git support now, I decided to "go native" and live on the command line. Six months later, I've drunk the Git Kool-Aid. If I have my way, I'll never go back to TFVC or SVN. Git - once you understand it - really is very powerful. I especially love rebasing: being able to squash and rewrite my commits so that my history looks nice and easily readable before I do any merges. By the way, I can't recommend Posh Git enough if you love the CLI on Windows.

LB_

It's funny to me that within the world of people who like git, there are two kinds: those who love rebasing because it makes their DAG cleaner, and those who hate it because it makes the dates on their commits go in a crazy order. I'm the latter kind: I prefer just merging everything with git merge --no-ff --no-commit so I can make sure the code does what it is supposed to before concluding the merge. I only use the rebase command when other developers ask me to squash my commits.
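The "crazy order" of dates after a rebase has a simple cause: by default, git keeps each replayed commit's author date and gives it a new committer date, while `git log` shows author dates. The sketch below models that with hypothetical names and day numbers instead of timestamps; it is not git's actual rebase machinery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"]
    author_date: int     # when the change was originally written (a day number here)
    committer_date: int  # when this particular commit object was created


def rebase(branch, new_base, today):
    """Replay `branch` (oldest first) on top of `new_base`. Like git's default
    behaviour, each copy keeps its author date but gets a new committer date."""
    tip = new_base
    for c in branch:
        tip = Commit(c.name + "'", tip, c.author_date, today)
    return tip


def log(tip):
    """Follow parent links from `tip` back to the root, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = tip.parent
    return out


m1 = Commit("m1", None, 1, 1)
f1 = Commit("f1", m1, 2, 2)   # feature work starts on day 2...
f2 = Commit("f2", f1, 3, 3)
m2 = Commit("m2", m1, 5, 5)   # ...while someone else pushes to main on day 5

tip = rebase([f1, f2], m2, today=6)
print([c.name for c in log(tip)])         # ["f2'", "f1'", 'm2', 'm1']
print([c.author_date for c in log(tip)])  # [3, 2, 5, 1]
```

The history is linear, which is what rebase fans want, but the author dates along it are no longer sorted, which is what LB_ dislikes.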

asdf

git merge --no-ff --no-commit

But… but… A clean, linear history without all the old branches makes bisecting so much easier!

Kian

That's only true if you can guarantee that your new history can be built and passes all tests at each commit. Rebasing and messing with the order of commits can't guarantee that. You could insert a subtle error, and then bisecting becomes useless.
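Both asdf's and Kian's positions show up in a simplified model of `git bisect`: a binary search for the first bad commit, where a commit that cannot be tested is skipped, as `git bisect skip` (or exit code 125 under `git bisect run`) does. This sketch is hypothetical and works on a plain list; real git bisects a DAG and chooses midpoints differently.

```python
def bisect(commits, test):
    """Find the first bad commit in `commits` (oldest first).

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    `test(commit)` returns "good", "bad" or "skip" (untestable, e.g. it does not build).
    Returns the candidates: one element if the culprit is pinned down, more if
    skipped commits leave it ambiguous.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while True:
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            # The first bad commit is `bad` itself or any skipped commit before it.
            return [commits[i] for i in range(good + 1, bad + 1)]
        mid = untested[len(untested) // 2]
        result = test(commits[mid])
        if result == "good":
            good = mid
        elif result == "bad":
            bad = mid
        else:
            skipped.add(mid)


commits = list("abcdefgh")
FIRST_BAD = commits.index("e")


def clean_test(c):
    return "bad" if commits.index(c) >= FIRST_BAD else "good"


def broken_build_test(c):
    if c == "d":
        return "skip"   # e.g. a rebased commit that no longer compiles
    return clean_test(c)


print(bisect(commits, clean_test))         # ['e']
print(bisect(commits, broken_build_test))  # ['d', 'e']
```

On a clean linear history, the search pins down the culprit in a handful of steps. One untestable commit next to it, and you are left with a range instead of an answer.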

dse

It also requires banning WIP commits. That means enforcing a policy; git-flow is something to look into for that. Even then, it needs a proper code review process to make bisect possible.

Pharylon

@dse said:

It also requires banning WIP commits. That means enforcing a policy; git-flow is something to look into for that. Even then, it needs a proper code review process to make bisect possible.

A ton of my commits are "WIP on Subfeature" types that I end up squashing together.

LB_

As others have said, I value having every commit be able to compile much more than having a 'clean' history. Also I hate losing history, I'm perfectly happy with all the messiness of merges and parallel work if it means I can see what happened when at a glance. Also, being able to easily patch old or specific versions is a must-have for some people - rebasing makes this troublesome.

@Pharylon said:

that I end up squashing together.

I use merge commits for this purpose - no reason to toss out all the history of your thinking through the code. I just make sure the code is able to compile at each commit.

I only squash or rebase when another developer asks me to do it in order to get a PR accepted.

Dreikin

@LB_ said:

Also, being able to easily patch old or specific versions is a must-have for some people - rebasing makes this troublesome.

I can't think of anyone advocating rebasing quite that much. Once it's been released to others, it shouldn't be rebased, both because it messes with downstream repos and because of the issue you mentioned.

JBert

@LB_ said:

I'm the latter kind: I prefer just merging everything with git merge --no-ff --no-commit so I can make sure the code does what it is supposed to before concluding the merge.

It happens that this is what hg and other VCSs encourage by making a rebase / replay slightly harder to do. On top of that, they do tend to store branch information as part of the history so that you can always find out which branch originated the commit (especially hg). Git on the other hand is built to forget such "minor" details.
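JBert's contrast comes down to where branch names live. In git, a branch is a movable name pointing at a commit, so once a merged branch is deleted, the history alone can only tell you which current branches contain a commit (roughly what `git branch --contains` answers). Mercurial's named branches instead record the branch name in each changeset. The sketch below models both with hypothetical commit names.

```python
def reachable(tip, parents):
    """All commits reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def branches_containing(commit, refs, parents):
    """Which branch tips can reach `commit`: all a ref-only model can answer."""
    return sorted(name for name, tip in refs.items() if commit in reachable(tip, parents))


# The same history in both models: c4 and c5 were made on a feature branch,
# then merged into main as c6.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"], "c5": ["c4"], "c6": ["c3", "c5"]}

# git-style: branches are movable names; after the merge, "feature" is deleted.
refs = {"main": "c6"}

# hg-style named branches: every commit permanently records its branch.
branch_of = {"c1": "default", "c2": "default", "c3": "default",
             "c4": "feature", "c5": "feature", "c6": "default"}

print(branches_containing("c4", refs, parents))  # ['main']
print(branch_of["c4"])                            # feature
```

LB_'s tags suggestion amounts to recreating `branch_of` by hand, which is exactly boomzilla's complaint.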

LB_

@JBert said:

Git on the other hand is built to forget such "minor" details.

What do you think tags are for? IMO the branch a commit was pushed to is no more part of its identity than the repository it was pushed to.

boomzilla

@LB_ said:

What do you think tags are for?

So you have to do that stuff manually? Ugh.

@LB_ said:

IMO the branch a commit was pushed to is no more part of its identity than the repository it was pushed to.

It's part of the history. It makes life easier to look at later and figure out what happened, in my experience.

fbmac

This post is deleted!

powerlord

@fbmac said:

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

You know you're supposed to rewrite history before pushing upstream, right?

~and rewrite history, Git Tales, oo-oo!~

TimeBandit

@blakeyrat said:

My (normal) brain

fbmac

If you think of your commit history as a way to separate different changes in an organized way, there are valid reasons for changing it before sending upstream.

GitLab has a cool feature that prevents you from screwing up the master branch on the server. It's pretty safe with this feature on.

HardwareGeek

@ScholRLEA said:

hardly anyone thinks of things like word-processing documents or spreadsheets as things that can and should have version management.

Although I suppose it depends on your definition of "hardly anyone," I'm going to have to disagree with this. Lots of business specifications, product data sheets, and stuff are written as Word docs, and they're version controlled. What changed between Rev. D and Rev. E of the Froobistat9000 datasheet? The document change history gives you a summary, and you can get both versions from version control and compare them if you need specifics. At my last job, a lot of the nitty-gritty details were in spreadsheets, some of which had dozens, if not hundreds, of versions in the VCS.