50 Versions of Shed Control Wars

accalia

well, for some formats it's going to be impossible to do merges sensibly.

how do you merge a PNG logo that two different people have changed?

How do you merge a SQLITE database?

what does it mean to merge a MP3 or an AVI?

if the format has sensible semantics for merging multiple changes, then you should provide a merge tool, or at least provide sensible documentation so that one can be created, but not every format can be sensibly merged.

boomzilla

@tarunik said:

binary-blob formats

How many of these are the sorts of things that you'd really want something like that for? And what's the demand for such a thing?

flabdablet

@blakeyrat said:

classical Boomzilla mode where if you don't agree with something I say, instead of just saying so, you just ask 573,342 pedantic clarifying questions until I say "fuck that" and leave

https://www.youtube.com/watch?v=CtvgxuU9kJ8

tarunik

Is there a reason elementwise (pixel-by-pixel, frame-by-frame) merges wouldn't work for, say, a PNG or an AVI? (Databases are a semi-special case, though -- blind elementwise merging won't work so well b/c of schema changes, etc.)
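A minimal sketch of the elementwise idea, assuming both edited versions and their common ancestor have already been decoded into equal-sized pixel arrays (here flat lists of RGB tuples; decoding the PNG, or splitting an AVI into frames, is left out). Each pixel is taken from whichever side changed it, and flagged as a conflict only when both sides changed it differently. `merge_pixels` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]

def merge_pixels(
    base: List[Pixel], ours: List[Pixel], theirs: List[Pixel]
) -> Tuple[List[Optional[Pixel]], List[int]]:
    """Three-way merge of flat pixel lists; returns (merged, conflict_indices).

    Conflicting positions are left as None for a human to resolve.
    """
    if not (len(base) == len(ours) == len(theirs)):
        # Crops and resizes break the positional correspondence entirely.
        raise ValueError("elementwise merge needs equal-sized images")
    merged: List[Optional[Pixel]] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (changed identically, or untouched)
            merged.append(o)
        elif o == b:      # only "theirs" changed this pixel
            merged.append(t)
        elif t == b:      # only "ours" changed this pixel
            merged.append(o)
        else:             # both changed it, differently: needs a human
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts
```

The sketch also shows where the approach stops working: any edit that shifts content (a crop, a resize, an inserted frame) changes every index after it, so a positional merge would report nearly everything as conflicting.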

Kian

@dkf said:

That totally doesn't work with “distributed” precisely because there isn't fundamentally a single location that is authoritative and which everyone can make requests to lock to

The idea of implementing locking is because you already have a centralized system in place. You have a centralized system in place because your assets don't support working in parallel. The workflow is already centralized and synchronous. Complaining that you can't do it in a distributed fashion doesn't matter, because the central location already exists if you are working on an unmergeable format.

Jaloopa

Can't database backups be stored as incremental? That's a good part of the way there. There are also tools like SQLCompare which can create a script to get from one database to another, effectively a diff.

tarunik

@Jaloopa said:

There are also tools like SQLCompare which can create a script to get from one database to another, effectively a diff

Yep -- those tools must have some awareness of what's been done to the schema, though. So, merging databases shouldn't be as impossible as @accalia is implying here...

accalia

@tarunik said:

So, merging databases shouldn't be as impossible as @accalia is implying here...

i wasn't trying to imply impossible.... i was trying to imply extremely difficult and the assumption about how to do the merge is likely to change between different databases. (a database of census data will have different merge semantics than the database for a blog.)

it's probably too difficult to write a general, automated db merge program. but one could develop merge tools that make it possible to manually reconcile differences.

boomzilla

@accalia said:

it's probably too difficult to write a general, automated db merge program.

For the general case, you probably have to record the SQL statements (or equivalent, for whatever underlies your DB), consider them "source code", and merge based on that. However, this would get really hairy once you have things like sequences involved. Not impossible, but probably not something anyone would want to do.

tarunik

@accalia said:

it's probably too difficult to write a general, automated db merge program. but one could develop merge tools that make it possible to manually reconcile differences.

I generally assume that a merge tool will have to fall back to manual operations in at least some cases -- fully automatic merging isn't a realistic goal, even for text files.

accalia

@boomzilla said:

For the general case, you probably have to record the SQL statements (or equivalent, for whatever underlies your DB), consider them "source code", and merge based on that.

oh you certainly could. but that wouldn't preserve semantics well without a human guiding it. you would get a structurally correct database out the far side, but it might have corrupted semantic meaning. (think configuration stored in database. cromulent, sure, but common. what happens if the config records are duplicated by the merge?)

boomzilla

@accalia said:

what happens if the config records are duplicated by the merge?

Presumably a merge conflict, just like if you did it to your text based config.

@accalia said:

but that wouldn't preserve semantics well without a human guiding it. you would get a structurally correct database out the far side, but it might have corrupted semantic meaning.

This sort of thing happens with code, too, if you aren't careful.

tarunik

@accalia said:

oh you certainly could. but that wouldn't preserve semantics well without a human guiding it. you would get a structurally correct database out the far side, but it might have corrupted semantic meaning. (think configuration stored in database. cromulent, sure, but common. what happens if the config records are duplicated by the merge?)

Hrm -- don't we see this case with text file merges, too? (In fact, this'd be easier to deal with in a database situation -- we simply say 'oh, both sides have the same PK and the same data for this row, so we grab it')
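A sketch of that rule, assuming each version of a table has been loaded as a dict from primary key to row tuple; schema changes, sequences and foreign keys are deliberately out of scope. A row that both sides inserted with the same key and identical data is kept once, which is exactly the duplicated-config-record case a line-based merge can get wrong. `merge_tables` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Tuple

Table = Dict[Hashable, tuple]

_MISSING = object()  # sentinel meaning "no row with this key in this version"

def merge_tables(base: Table, ours: Table, theirs: Table) -> Tuple[Table, List[Hashable]]:
    """Three-way merge keyed by primary key; returns (merged, conflicting_keys).

    Conflicting keys are reported and left out of `merged` for manual resolution.
    """
    merged: Table = {}
    conflicts: List[Hashable] = []
    for pk in sorted(set(base) | set(ours) | set(theirs), key=repr):
        b = base.get(pk, _MISSING)
        o = ours.get(pk, _MISSING)
        t = theirs.get(pk, _MISSING)
        if o == t:
            result = o      # same PK, same data on both sides: take it once
        elif o == b:
            result = t      # only theirs touched this row
        elif t == b:
            result = o      # only ours touched this row
        else:
            conflicts.append(pk)
            continue
        if result is not _MISSING:  # a one-sided deletion just drops the key
            merged[pk] = result
    return merged, conflicts
```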

@boomzilla said:

This sort of thing happens with code, too, if you aren't careful.

I have seen SVN's merge algorithm duplicate an if statement...so it's not a concern unique to databases or even binary files.

accalia

@tarunik said:

Hrm -- don't we see this case with text file merges, too?

yes, we do.

@tarunik said:

In fact, this'd be easier to deal with in a database situation -- we simply say 'oh, both sides have the same PK and the same data for this row, so we grab it

yes, but then we're back to writing that BLOB parser/merger. and databases aren't even the worst contenders for BLOB merging. could you sensibly merge an executable? (ok, you shouldn't have those in the repo anyway, but they do get in there and they're not always in there as build artifacts that can just be regenerated)

tarunik

@accalia said:

could you sensibly merge an executable?

Given a fully-fledged executable and a known executable file format (say an ELF SO with full symbol tables) -- I wouldn't be surprised if it was possible. Sections and symbols impose a surprising amount of structure on executables...;)

accalia

@tarunik said:

I wouldn't be surprised if it was possible.

true, but would you trust the human doing the merge to get it right 100% of the time?

dkf

@tarunik said:

I have seen SVN's merge algorithm duplicate an if statement...so it's not a concern unique to databases or even binary files.

SVN doesn't have a particularly good implementation of a merge algorithm, FWIW. It's one of the things that really sucks about it. Other VCSs (including git) do a better job.

@accalia said:

true, but would you trust the human doing the merge to get it right 100% of the time?

Belgium that! I urge you to not trust me to get it right all the time! ;)

accalia

@dkf said:

■■■■■■■ that! I urge you to not trust me to get it right all the time!

i don't trust me to get it right 100% of the time, why should i trust you to get it right 100% of the time?

boomzilla

@dkf said:

SVN doesn't have a particularly good implementation of a merge algorithm, FWIW. It's one of the things that really sucks about it. Other VCSs (including git) do a better job.

Seems like it's gotten better recently, though.

tar

Most of my experience with SVN is on a single-person project, which is probably the ideal situation as far as svn diff is concerned, but with 1.8, I've managed moderately complex branch merges (e.g. third party library upgrade) without hitting an unreasonable number of conflicts.

PJH

@dkf said:

SVN doesn't have a particularly good implementation of a merge algorithm, FWIW

I don't seem to have had many problems with it personally - about the only thing it seems to consistently throw up a conflict for, for me, is the date line on kernel .config files (while successfully merging the actual changes in the config.)

Weng

@blakeyrat said:

Blakeyrat is not a person

This saddens me. Particularly because I've been having an email fight with my alleged boss (HR says he isn't so I don't give a fuck) all day and channeling blakeyrat

dkf

@boomzilla said:

Seems like it's gotten better recently, though.

Good to hear. It used to be utterly miserable.

I think one of the things that helped was that they got better at recording metadata about merges. AIUI, having that sort of thing allows advanced merge algorithms to do much better at synchronizing, tracking how things have moved around, etc. I don't know the details, but I've talked to people who have developed SCMs and they say this sort of thing is very useful.

I wonder if they “borrowed” the algorithm from somewhere else? That's what I'd do…

FrostCat

@blakeyrat said:

That is the part that would have to be added, you fucking piece of shit. We've been over this like twice already.

Listen, you pansy, I was speaking rhetorically. I can't help it if you're not smart enough to be able to pretend you understand that.

sloosecannon

@TheCPUWizard said:

First one does a ReSharper to re-org the file into one format and then begins making changes. Second does the same thing but a different re-org style....Now without a lock, things are hosed.

Having seen that exact thing happen in our repo (not ReSharper, but IntelliJ, which does the same thing), nothing gets hosed at all. Merges are kinda painful, but Git and IntelliJ are usually smart enough to figure out when automated or whitespace changes are the only changes and automerge. Then you just go through the conflicts individually and you're done.
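The whitespace case is the easy one to automate. A minimal sketch of how a tool might recognise a change that only reformats whitespace, comparing the two versions with all whitespace removed (roughly what git diff's `-w`/`--ignore-all-space` option ignores). `is_whitespace_only_change` is a hypothetical name; member reordering is a much harder problem and is not covered.

```python
def is_whitespace_only_change(old: str, new: str) -> bool:
    """True if `old` and `new` differ, but only in spaces, tabs or newlines.

    Caveat: this also ignores whitespace inside string literals, so a real
    tool would tokenize the source language first.
    """
    return old != new and "".join(old.split()) == "".join(new.split())
```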

To throw my two cents in - the only time I've found locks to be useful is when we're doing something massive like refactoring the ENTIRE project and we don't want anyone throwing changes into the mix (note that this was with SVN when I did it)... Even then, in Git, I could just lock the repo down so only I had write access.

I could see a use in some kind of an external git addon that marks files RO based on whether or not an external server says they've been locked. But it wouldn't actually block any changes, just be a friendly reminder. Honestly, that's a small enough tool that anyone who might really need locks could probably write it.
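A rough sketch of that add-on, assuming the external server's answer is already available as a set of locked paths (fetching it is left out, and `apply_advisory_locks` is a hypothetical name). It only toggles write permission bits in the working copy; nothing stops someone from chmod-ing a file back, which fits the "friendly reminder, not enforcement" idea.

```python
import os
import stat
from typing import Iterable, Set

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def apply_advisory_locks(root: str, tracked: Iterable[str], locked: Set[str]) -> None:
    """Make locked files read-only and give the owner write access to the rest.

    `tracked` and `locked` are paths relative to `root`; `locked` stands in
    for the response of a hypothetical lock server.
    """
    for rel in tracked:
        path = os.path.join(root, rel)
        mode = os.stat(path).st_mode
        if rel in locked:
            os.chmod(path, mode & ~WRITE_BITS)   # someone else holds the lock
        else:
            os.chmod(path, mode | stat.S_IWUSR)  # lock released: owner may write
```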

dkf

@sloosecannon said:

Even then, in Git, I could just lock the repo down so only I had write access.

And even then, you could just do it on a branch and then swap the branch labels around once you're done. Or do it in a separate repository. (That's what we do with big refactorings.) Once you think “each change makes its own version on its own branch”, the need for locking reduces a lot. It is a different mindset.

sloosecannon

@dkf said:

And even then, you could just do it on a branch and then swap the branch labels around once you're done. Or do it in a separate repository. (That's what we do with big refactorings.) Once you think “each change makes its own version on its own branch”, the need for locking reduces a lot. It is a different mindset.

Heh. Yeah, if only the guys I work with would try that concept. Everything on master!!!!!!!!!!!

Luckily this is just a mod team so it's a bunch of guys working on it in their spare time. Unfortunately, it's a mod team so I can't tell them "I'm the repo owner, my way or the highway"

dkf

@sloosecannon said:

Everything on master!!!!!!!!!!!

WellTheresYourProblem.mp4

TheCPUWizard

@sloosecannon.... For simple "whitespace" type changes, sure... but run a tool (I just used ReSharper as an example) that changes the order of the members within the file (perhaps reverse alphabetical order), Changes all of the casing (ALL CAPS for privates <evil grin>), Order of parameters to method calls, and a few more things.....and I have yet to find any single compare or merge tool that will properly deal with two conflicting sets (totally different rules).

sloosecannon

@TheCPUWizard said:

@sloosecannon.... For simple "whitespace" type changes, sure... but run a tool (I just used ReSharper as an example) that changes the order of the members within the file (perhaps reverse alphabetical order), Changes all of the casing (ALL CAPS for privates <evil grin>), Order of parameters to method calls, and a few more things.....and I have yet to find any single compare or merge tool that will properly deal with two conflicting sets (totally different rules).

But locking doesn't really fix that... I seem to recall IntelliJ figuring that kind of stuff out too. I could very possibly be wrong though.

@dkf said:

WellTheresYourProblem.mp4

Mhmm. So much that. But "Merging is confusing, and the last time we tried to merge, we almost deleted the whole repo, and this and that"
Someone tried to merge something in and ended up FORCE PUSHING, obliterating the repo. Fortunately I had an up-to-date local copy..... I removed history-rewriting privileges after that debacle...

tar

@sloosecannon said:

the last time we tried to merge, we almost deleted the whole repo,

:fail.xlsx:

dkf

@sloosecannon said:

Someone tried to merge something in and ended up FORCE PUSHING, obliterating the repo. Fortunately I had an up-to-date local copy..... I removed history-rewriting privileges after that debacle...

That's why history rewriting is a terrible thing. I know it has use-cases, but it's such a catastrophe in the hands of an authorized ass-hat…

EvanED

@dkf said:

That's why history rewriting is a terrible thing.

I know! I actually deleted a bunch of important files at one point, and now I rail against the stupidity that is rm whenever I get the opportunity!

;-)

(Actually the funny thing is that's actually more true than you might think; one of these days I want to write a well-behaved trash command and then alias rm to it, but I've not gotten around to it. In this context, "well-behaved" means it does not move data across mount points; this means you need a separate trash folder on each partition/network drive.)

TheCPUWizard

"But locking doesn't fix that"

Sure it does by forcing serialization of operations on the file.

Kian

@sloosecannon said:

"Merging is confusing, and the last time we tried to merge, we almost deleted the whole repo, and this and that"

No one has been complaining about merging text. Merging is good, and it's why Git and other distributed systems are awesome. Two people can work on their own, then merge their changes. Yay!

The problem is with file types that can't be merged because there is no tool available, or perhaps the format itself makes it impossible. Like images, excel sheets, etc. If two people take the same file, each of them does their own independent changes, and then commit, you get an unresolvable conflict. One of the persons has to take the file modified by the other and redo their work on that version. Notice that the nature of the file itself forces changes to be sequential. The time they spent was wasted.

With a locking mechanism, when the second person starts working, they'd see that the file is being modified by another person, and instead of wasting their time they'd be able to get started on some other task.

Scarlet_Manuka

@blakeyrat said:

What's the "one thing" Microsoft Excel does well, for example?

Keeping lists turned out to be far more popular than any other activity with Excel. And this led us to invent a whole slew of features that make it easier to keep lists. (http://www.joelonsoftware.com/uibook/chapters/fog0000000065.html)

dkf

@EvanED said:

Actually the funny thing is that's actually more true than you might think; one of these days I want to write a well-behaved trash command and then alias rm to it, but I've not gotten around to it. In this context, "well-behaved" means it does not move data across mount points; this means you need a separate trash folder on each partition/network drive.

That's tricky to get right since you also need to manage when to not move stuff to the trash at all (perhaps when it's below /tmp?) and you need to make sure you've got enough space left on the drive for whatever you want to write. That last part is not quite as critical as it used to be (drives are pretty damn big now), but it's a real concern. You also need to deal with writability by different users. Probably means doing something like putting a /.trash directory in the root of each filesystem and giving it permissions like /tmp has. I'd guess that falling back to the old behaviour when there's no trash folder would also work. And you'd also want to integrate with the desktop environment's trash system if that's present.

Fiddly work. I'm usually happy with plain rm, but I use OSX anyway…
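A minimal sketch of the per-filesystem trash described above, assuming a `.trash` directory at the root of each mount (the layout suggested above; desktop environments on Linux follow the freedesktop.org Trash specification instead, which this does not implement). `os.rename` is used because it never copies data: across mount points it fails with an `OSError` (EXDEV) instead of silently moving gigabytes. The /tmp exemption, free-space checks, shared permissions and versioning are all left out; `mount_point` and `trash` are hypothetical names.

```python
import os
from typing import Optional

def mount_point(path: str) -> str:
    """Walk up from `path` until reaching the root of the filesystem holding it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def trash(path: str, trash_root: Optional[str] = None) -> Optional[str]:
    """Move `path` into a trash directory on its own filesystem.

    Returns the new location, or None when no trash directory exists, in which
    case the caller can fall back to the old behaviour (plain deletion).
    """
    if trash_root is None:
        # Assumed layout: one .trash directory at the root of every mount.
        trash_root = os.path.join(mount_point(path), ".trash")
    if not os.path.isdir(trash_root):
        return None
    dest = os.path.join(trash_root, os.path.basename(path))
    if os.path.exists(dest):
        # On POSIX os.rename would silently replace the older trashed file.
        raise FileExistsError(dest)
    # rename only relinks within one filesystem; it never copies across mounts.
    os.rename(path, dest)
    return dest
```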

Jaloopa

When I was learning Unix and Bash scripting I had a project to create a recycle bin type thing. Didn't think of any of the more edge-casey concerns you've brought up. I think it handled overwriting if you trashed two files with the same name (possibly versioning, don't remember after all this time), restore and emptying.

tar

@dkf said:

I'm usually happy with plain rm,

I'm the kind of person who is always holding down Shift when I delete files in a UI, to save having to then perform an unrelated task to get the disk space back.

dkf

@TheCPUWizard said:

Sure it does by forcing serialization of operations on the file.

Fortunately, my checkout contains a different file to yours because my home directory has a different name to yours and is located on a different filesystem. ;-)

More seriously, locks relative to a particular revision of a file — and hence to a particular collection of states of the file tree — can most certainly work, provided they're set when the file is committed. They'd be trivial to create. However, they're nothing like as effective as people seem to think, nor would they be as useful as people think either. There's also the question of whether, if someone deliberately creates a branch, any locks that were in place at the point where the branch was created should be inherited. You can't have locks propagating across the whole history for the file, because one relatively common scenario is to have several production branches at once (e.g., a branch for each supported release, and a current development mainline branch) and there's no good reason for a lock to be meaningful between those.

All operations on a file on a particular timeline (= branch) are serialized already.
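A minimal sketch of locks scoped to a branch rather than to a file's whole history, as described above: a lock is keyed by (branch, path), so a lock on the development mainline has no effect on a release branch. The class and method names are hypothetical, the table lives in memory as a stand-in for a server, and whether locks are inherited when a branch is created (the open question above) is exposed as a flag rather than decided.

```python
from typing import Dict, Optional, Tuple

class BranchLocks:
    """In-memory lock table keyed by (branch, path) -> owner."""

    def __init__(self) -> None:
        self._locks: Dict[Tuple[str, str], str] = {}

    def lock(self, branch: str, path: str, owner: str) -> bool:
        """Take the lock; returns False if someone else already holds it."""
        holder = self._locks.setdefault((branch, path), owner)
        return holder == owner

    def unlock(self, branch: str, path: str, owner: str) -> None:
        """Release the lock, but only if `owner` actually holds it."""
        if self._locks.get((branch, path)) == owner:
            del self._locks[(branch, path)]

    def holder(self, branch: str, path: str) -> Optional[str]:
        return self._locks.get((branch, path))

    def create_branch(self, parent: str, child: str, inherit: bool = False) -> None:
        """Called when `child` is branched from `parent`.

        The parent's locks are copied onto the child only if `inherit` is set.
        """
        if inherit:
            for (branch, path), owner in list(self._locks.items()):
                if branch == parent:
                    self._locks[(child, path)] = owner
```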

@Kian said:

The problem is with file types that can't be merged because there is no tool available, or perhaps the format itself makes it impossible.

It would seem to me that the most productive approach for everyone would be to stop bitching about this and write diff tools that can do the job. If they care.

It should be quite easy to do diffing/merging of things like documents and spreadsheets, as they're typically principally a collection of XML files now. All that's needed to get started would be a ZIP comparator (typically by content/location of extracted files) and an XML structure diff (because doing the diff on the text form would be a miserable failure). The main complexity would be doing the same for images (including embedded images). I don't know those algorithms at all…
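A rough sketch of the two pieces named above, using only Python's standard `zipfile` and `xml.etree.ElementTree`: archive members are matched by their path inside the ZIP and compared by content, and a member that is XML can then be compared as an element tree rather than as text. It assumes an Office Open XML/ODF-style package (a ZIP of XML parts); the member names in the usage are illustrative. The XML walk is purely positional, so a real tool would still need to align inserted or reordered siblings.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List

def zip_members(data: bytes) -> Dict[str, bytes]:
    """Read every member of an in-memory ZIP archive into a dict."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

def diff_zip(old: bytes, new: bytes) -> Dict[str, List[str]]:
    """Compare two archives member-by-member, matching members by path."""
    a, b = zip_members(old), zip_members(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }

def diff_xml(old: bytes, new: bytes) -> List[str]:
    """Positional structural diff: reports tag, attribute and text changes."""
    changes: List[str] = []

    def walk(x: ET.Element, y: ET.Element, path: str) -> None:
        if x.tag != y.tag:
            changes.append(f"{path}: tag {x.tag!r} -> {y.tag!r}")
            return
        if x.attrib != y.attrib:
            changes.append(f"{path}: attributes {x.attrib} -> {y.attrib}")
        if (x.text or "").strip() != (y.text or "").strip():
            changes.append(f"{path}: text {x.text!r} -> {y.text!r}")
        for i, (cx, cy) in enumerate(zip(x, y)):
            walk(cx, cy, f"{path}/{cx.tag}[{i}]")
        if len(x) != len(y):
            changes.append(f"{path}: {len(x)} -> {len(y)} children")

    root_old, root_new = ET.fromstring(old), ET.fromstring(new)
    walk(root_old, root_new, root_old.tag)
    return changes
```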

@Kian said:

With a locking mechanism, when the second person starts working, they'd see that the file is being modified by another person, and instead of wasting their time they'd be able to get started on some other task.

In many workplaces, that wouldn't work out exactly like that. One person would lock the file for far too long (because they're adjusting the colour of each pixel by hand or something, I don't know) and the other person would just sit on their hands, possibly while bitching to their manager, waiting for the person with the lock to finish their slow and (in this scenario) largely pointless task. In other words, you've gotta account for asshats, and some asshats love locking as it lets them prevent other people from working while letting them seem to be busy themselves.

People are too terrified of merging. People are too terrified of branching.

Jaloopa

@dkf said:

It should be quite easy to do diffing/merging of things like documents and spreadsheets, as they're typically principally a collection of XML files now. All that's needed to get started would be a ZIP comparator (typically by content/location of extracted files) and an XML structure diff (because doing the diff on the text form would be a miserable failure). The main complexity would be doing the same for images (including embedded images). I don't know those algorithms at all…

There are also things like embedded documents. My word document might have an Excel spreadsheet in it, which could also have changed. The diff tool would have to be recursive, which starts getting... messy

dkf

@Jaloopa said:

There are also things like embedded documents. My word document might have an Excel spreadsheet in it, which could also have changed. The diff tool would have to be recursive, which starts getting... messy

You have to deal with that anyway, otherwise you're stuck with diffing effectively undifferentiated binary data, and who would want to do that?!
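The recursion can be handled by treating any changed member that is itself a ZIP container as a nested archive and diffing it the same way. A minimal sketch with the standard `zipfile` module, assuming embedded objects are stored as ZIP-based packages (as modern embedded spreadsheets in a Word document typically are; older OLE-embedded objects would need a different reader). The `!/` path separator and the name `diff_nested` are arbitrary choices for illustration.

```python
import io
import zipfile
from typing import List

def diff_nested(old: bytes, new: bytes, prefix: str = "") -> List[str]:
    """Recursively diff two ZIP containers, descending into embedded ZIPs."""
    with zipfile.ZipFile(io.BytesIO(old)) as za, zipfile.ZipFile(io.BytesIO(new)) as zb:
        a = {n: za.read(n) for n in za.namelist()}
        b = {n: zb.read(n) for n in zb.namelist()}
    report: List[str] = []
    for name in sorted(a.keys() | b.keys()):
        path = prefix + name
        if name not in a:
            report.append(f"added {path}")
        elif name not in b:
            report.append(f"removed {path}")
        elif a[name] != b[name]:
            both_zips = zipfile.is_zipfile(io.BytesIO(a[name])) and zipfile.is_zipfile(io.BytesIO(b[name]))
            if both_zips:
                # Embedded package (e.g. a spreadsheet inside a document): recurse.
                report.extend(diff_nested(a[name], b[name], path + "!/"))
            else:
                report.append(f"changed {path}")
    return report
```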

DoctorJones

@gestahl said:

Gets turned into this using R# 8.2:
public string Foo {
    get { return this.context ?? (this.context = "foo"); }
}

Filed under: That's fucking hideous

JBert

@EvanED said:

Git has you covered with (1) (believe it or not, it never stores deltas).

Actually, that's not really true. It does store deltas if it thinks it can compress a file in the "packed" history that way.

See http://git-scm.com/book/en/v2/Git-Internals-Packfiles

JBert

Also, I think nobody mentioned Perforce's P4Sandbox yet?

It's a centralized VCS which allows you to run a local node so that you can work offline. It won't allow local nodes to communicate directly like Git does, so any conflicts (and locks) are always enforced by the central node.

boomzilla

@Kian said:

The time they spent was wasted.

It depends on the nature of the changes, of course. The cognitive work to decide what changes to make might not be lost. Especially if the two changes don't actually overlap.

tarunik

@Kian said:

perhaps the format itself makes it impossible. Like images, excel sheets, etc

Are you sure about this? I can understand some niche format having no mergetool, but merging changes in things as common as images and Excel files should definitely be doable....

@dkf said:

It would seem to me that the most productive approach for everyone would be to stop bitching about this and write diff tools that can do the job. If they care.

Exactly! Even for a binary format -- doesn't, say, a spreadsheet have enough structure in it that you can use that as the basis for diffing and merging?

@Jaloopa said:

There are also things like embedded documents. My word document might have an Excel spreadsheet in it, which could also have changed. The diff tool would have to be recursive, which starts getting... messy

This is mildly annoying, but certainly nothing that can't be done!

Kian

@tarunik said:

Exactly! Even for a binary format -- doesn't, say, a spreadsheet have enough structure in it that you can use that as the basis for diffing and merging?

A good merge tool has to provide the full functionality that the format supports, because merging two versions might require editing the file (not just picking changes). In fact, you need to be able to display all four versions (original, yours, theirs, and merged). At which point, you need to reimplement the entire program that produced the file, with the additional requirement of showing the other three reference files.

So sure, it is not physically impossible to reimplement Office, Photoshop, AutoCAD, etc, with the diff-functionality tacked on. Most projects don't have infinite resources to reimplement entire other programs, however. Not when the payoff is that two guys don't need to coordinate their access to a file.

tarunik

@Kian said:

reimplement

Isn't this what add-ins/extensions are for? Making the merge-tool an Excel/Photoshop/... plugin would be the sensible way out of this mess...

JBert

That still assumes that you can easily make a plugin to provide the needed functionality. In this case you need custom UI to show differences and extra tooling to merge. Most plugins for, e.g., Photoshop are likely made to operate on just a single image, like a filter.