Well-reasoned anti-Git article

dkf

Now I'll admit to not using submodules a lot, but in my experience with both submodules and svn:externals, that sort of splitting up of repositories sucks.

FWIW, we don't use them at all. You simply don't need them when working with Java (unless you're strongly committed to Doing It Wrong)…

Jaime

@dkf said:

The more you tell us, the more I think you guys have dug yourselves into a massive hole. The rest of us really don't want to jump down there after you!

The only problem I have is an endless stream of twenty-somethings telling me I need to switch to a DVCS. What I did previously in Subversion and do currently in TFS works great and I'm very happy with it.

Unfortunately, I know how it can be, so when someone tries to sell me git, I just sit there and say "Yeah, I have that... have that too... that sounds nice, but I don't want to trade it for this".

One of the big trade-offs is at the very heart of DVCS vs CVCS - whole repo cloning. I understand the benefits, and I think they're important for a few people (none of whom I've ever met). But I'm not willing to trade a smaller client footprint for a feature I don't care about. Whole repo cloning is also why granular security will never work in git. There's even this above:

@dkf said:

The sane thing to do is to share the authN problem and use simple authZ on each repo (i.e., everyone is either no access, read access or read-write access)

That's you telling me "no one would ever need granular security, you're TRWTF for even asking for it". As if that's some benefit of git - even though reasonable security has been in every VCS I've ever seen before git.

blakeyrat

@dkf said:

You're putting build artefacts in the repo?

No, we're explicitly not doing that, and I have no idea how you came to that conclusion based on what I typed.

If we were, we'd be able to put our business layer in another repo, and just copy the DLL over as needed to update its version in the other projects. But we aren't, so we can't. (And we shouldn't: 90% of fixes and code changes end up touching the business layer, so separating it would be a huge time-sink for zero benefit. Plus there'd be the merge nightmares of 20 bug/feature branches all going at once, each with a different version of the same binary DLL file.)

EvanED

@dkf said:

FWIW, we don't use them at all. You simply don't need them when working with Java (unless you're strongly committed to Doing It Wrong)…

Ah, that explains a lot.

Based on the fact that you think that's reasonable, I have some other activities I think you might enjoy:

Dumping a lot of Lego™ bricks on the floor, turning out the light, and walking around barefoot
Going to a biker bar and picking a fight
A January ascent of Mt. Everest

--
OK, I'll put facetiousness aside. Now, I only used CVS for a short time, but one of the reasons it is horrible is that you can't really even name a revision, because it doesn't know what that is. With Subversion, I can say "hey make sure you're on at least revision 1234" or "this problem repros on revision 23418". With CVS you can't do that for anything that isn't tagged, because every file is tracked separately. That single change was a tremendous boon to usability.

And... to my sensibilities, just sticking stuff in a few different repositories without even being connected by something analogous to submodules is taking that change and... maybe not completely throwing it away, but dramatically diminishing its benefit. What do you do when you want to communicate something like my statements above? Refer to the clock time of the commits you made? Give a bunch of different version numbers for all the repositories involved in your project?

I honestly don't see how you don't view this as a problem.
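
To make the question concrete, here is a minimal Python sketch of the workaround it implies: naming a cross-repository state means carrying one revision per repository, which at best can be folded into a single derived label. The repository names and revision IDs are hypothetical, and this is an illustration, not something any of the tools discussed does for you.

```python
import hashlib


def snapshot_id(revisions: dict[str, str]) -> str:
    """Fold one revision per repository into a single short label.

    `revisions` maps a repository name to the revision checked out there.
    Sorting makes the label independent of the order repos are listed in.
    """
    canonical = "\n".join(f"{repo}={rev}" for repo, rev in sorted(revisions.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical state of a project split across three unconnected repositories.
state = {"program-a": "r1234", "library-1": "r872", "build-tools": "r55"}
label = snapshot_id(state)
```

The label only means something to a person who also has the full manifest, which is exactly the bookkeeping a single repository-wide revision number makes unnecessary.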

tarunik

@blakeyrat said:

Except all the projects have to be in the same repo because they all refer to the same business logic layer, implemented as a C# library which has to be included in each project's solution to ensure each is using identical business logic.

External projects are not some brand new invention...

@dkf said:

In the Java world, you'd just have the CI service (e.g., Jenkins) trigger on the repository changes and push verified builds straight to the build repository (e.g., Artifactory) and then developers can just pull everything they're not working on from the shared store (which itself can be rebuilt because it's all built from known versions of everything) by the power of Maven. While it's a little tricky to set up to start out with, it makes for really effective working after that (reliable too) and it's a technique that's been used for years now. Release builds are no more difficult either (though they often have more complex packaging steps).

For binary dependencies, having a company binary dependency repo can be quite handy :) Externals/submodules deal with the source code case acceptably well, in my experience.

@EvanED said:

And... to my sensibilities, just sticking stuff in a few different repositories without even being connected by something analogous to submodules is taking that change and... maybe not completely throwing it away, but dramatically diminishing its benefit. What do you do when you want to communicate something like my statements above? Refer to the clock time of the commits you made? Give a bunch of different version numbers for all the repositories involved in your project?

I honestly don't see how you don't view this as a problem.

QFT and +1 because I'm out of Likes.

@blakeyrat said:

Right; that's why corporations use systems like Stash, which work around this misfeature by creating a branch for each Jira ticket.

Branch-per-ticket is fine by me -- branches can start off life anywhere you want them to in the DVCS world, so there's nothing stopping you from having local branch-per-ticket even if the rest of your world parties on master, and no need to have a centralized system like Stash simply to do branch-per-ticket branching.

blakeyrat

@tarunik said:

External projects are not some brand new invention...

It's not "external". The business layer project is included in every solution that makes use of it.

cartman82

Serious question for the people who said git submodules suck - why?

I'm currently starting to experiment with splitting up one of my larger repos into submodules. The commands are a bit finicky (need to make sure everything is properly pulled and brought to the latest version etc.), but it seems like I'll be able to alias the usual commands so things will work more or less transparently. What am I missing?

dkf

@tarunik said:

For binary dependencies, having a company binary dependency repo can be quite handy :) Externals/submodules deal with the source code case acceptably well, in my experience.

With a build system like maven and a binary artefact repository (following the maven rules) you don't need to keep any of that stuff in your source. I just list the versions of the dependencies that are required and they're fetched as needed. It integrates with the IDE too. We have submodules, sure, but not git submodules. They're there for the organisation of what we're doing into sane pieces, not because of how we choose to store and version things. There's a strong case to be made that, at least with Java, if you're using git submodules then you're Jeffing.

(There's a lot wrong with Java, but its build systems aren't one of those things.)
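
As a rough illustration of "list the versions and they're fetched as needed", here is a minimal Python sketch of looking up declared versions in a shared artefact store. It is a simplified model and not Maven's actual resolution (no transitive dependencies, no version ranges); the artefact names, versions and locations are hypothetical.

```python
def resolve(declared: dict[str, str], store: dict[tuple[str, str], str]) -> dict[str, str]:
    """Map each declared dependency to the stored artefact for that exact version.

    `declared` maps a dependency name to the version the project asks for.
    `store` maps (name, version) to an artefact location.
    Raises LookupError if any declared version has not been published.
    """
    missing = sorted(
        f"{name}:{version}"
        for name, version in declared.items()
        if (name, version) not in store
    )
    if missing:
        raise LookupError("not in artefact store: " + ", ".join(missing))
    return {name: store[(name, version)] for name, version in declared.items()}


# Hypothetical artefact names, versions and locations.
store = {
    ("business-layer", "2.3.0"): "store/business-layer-2.3.0.jar",
    ("business-layer", "2.4.0"): "store/business-layer-2.4.0.jar",
}
declared = {"business-layer": "2.3.0"}
```

The source tree only carries the `declared` mapping; the binaries themselves live in the store and are looked up by exact version.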

@blakeyrat said:

The business layer project is included in every solution that makes use of it.

That'd be some sort of dependency, yes? It'd be a bit of a waste to have to rebuild that all the time; using an official (yes, that includes internal official) build is just sane, and knowing what version that is makes it possible to say whether things are compatible or not.

Or maybe it's some sort of source dependency where you've got to generate some code from the API description and then edit it? Those sorts of things are dreadfully fragile. Much better to generate the API and then derive from it by reference or inheritance; at least then you don't have to have automated tools trying to edit the same files as programmers (so eliminating a whole class of dumb-ass errors).

blakeyrat

@cartman82 said:

Serious question for the people who said git submodules suck - why?

2 of the 3 Git clients I have can't deal with them at all.

The third constantly thinks the folder containing the submodule is full of "changed" files, which I have to hit "ignore" on every fucking time I create a commit.

blakeyrat

@dkf said:

That'd be some sort of dependency, yes?

Yes.

@dkf said:

It'd be a bit of a waste to have to rebuild that all the time; using an official (yes, that includes internal official) build is just sane, and knowing what version that is makes it possible to say whether things are compatible or not.

You people don't read my posts at all, do you. I already talked about why we do it.

cartman82

@blakeyrat said:

2 of the 3 Git clients I have can't deal with them at all.

The third constantly thinks the folder containing the submodule is full of "changed" files, which I have to hit "ignore" on every fucking time I create a commit.

Sounds like the Git GUIs are broken, not submodules themselves.

blakeyrat

@cartman82 said:

Sounds like the Git GUIs are broken, not submodules themselves.

I am not one who enjoys pedantic dickweedery.

Submodules are broken. While they are broken, there is no solution to this problem. This moron problem that only exists because the developers of Git are incompetent retards who didn't look at what other SCM products did before building their own.

cartman82

@blakeyrat said:

I am not one who enjoys pedantic dickweedery.

Submodules are broken. While they are broken, there is no solution to this problem. This moron problem that only exists because the developers of Git are incompetent retards who didn't look at what other SCM products did before building their own.

OK, so you're basically like a Windows hater who's blaming the OS because some crappy driver caused a BSOD.

I was hoping for a more sophisticated argument.

blakeyrat

I don't recall blaming anybody. I'm just saying that it's useless until it works. Right now, it doesn't.

I really don't know what you're looking for, here. Would you feel better if I specifically said it doesn't work in Visual Studio, and that's Visual Studio's fault? Ok. There you go. Some blame spread around. Whee.

But that doesn't change the fact that it does not work.

JazzyJosh

Or, as an alternative, use Maven or Gerrit like a normal person.

EvanED

@cartman82 said:

Serious question for the people who said git submodules suck - why?

I'll try to give my perspective on this in the next couple of days. It's been a while since I used them, and I want to do a few experiments first to make sure I won't be lying. But the short version is that, at least from what I could tell, the abstraction they purport to project of one monolithic repository is leakier than the Titanic's hull.

dcon

@dkf said:

the biggest DVCS repository I've got is about 220MB. That's freaking huge,

That's freaking huge? That's peanuts. Our repo is 3GB. And that's only the Windows side of things. Other platforms have their own.

tar

@EvanED said:

I'll try to give my perspective on [git submodules] in the next couple of days.

I'm interested to know what problem they're intended to solve. Presumably there's some problem with git which prevents people from just having all the source code for a single project in a single repository.

(Disclaimer: about all I know about submodules is what I read in the OP, namely that they're a DVCS-specific hack which is not necessary in the CVCS world...)

Jaime

@tar said:

I'm interested to know what problem they're intended to solve. Presumably there's some problem with git which prevents people from just having all the source code for a single project in a single repository.

Look at the discussion above; you'll see that the solution to most people's problems with git is "split your code into the smallest repositories possible". Submodules address the newly created problem of "help, my crap is everywhere".

EvanED

@tar said:

Presumably there's some problem with git which prevents people from just having all the source code for a single project in a single repository.

The problem is that "project" is not well-defined.

For example, suppose I am writing a program that has both a Program A portion and a Library 1 portion, which are sort of moderately coupled. Are they part of the same project or not? It'll often be the case that you say "I need to add Feature X" and decide that the logic should be implemented in Library 1, so you add it there, but then you also add calls to it in Program A.

If you consider Program A and Library 1 separate projects and put them in entirely disconnected repositories, you have the problem I mention in a different post above; you often will need to talk about both repositories at the same time, but because they're different repositories you lose the ability to specify a revision, you lose atomicity, you lose the ability to move files from one repository to the other, etc.

If all you have is Program A and Library 1, the solution will often be to put them into the same repository. But a problem arises if you create Program B, which also wants to use Library 1. Program A and Program B (in this example) are quite distinct, but both need to be in the same repo as Library 1 or you run into the problems above. And then you add Program C, which also needs to go into the same repo...

If you're in a company situation or something similar and using something like Subversion, putting everything into one big repository is actually really easy to do and works well. If I'm working on Program A, I check out trunk/Program A and trunk/Library 1. If you're working on Program C, you check out trunk/Program C and trunk/Library 1. The administrator can, if desired, say that I have no access to trunk/Program C and you have read-only access to trunk/Program A.
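
A minimal Python sketch of that kind of per-path permission check, assuming a "most specific rule wins" policy. This is not Subversion's actual authorization format or algorithm; the rule table below is hypothetical and only mirrors the example in the paragraph above.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2


def effective_access(rules: dict[str, Access], path: str) -> Access:
    """Return the access granted by the most specific rule covering `path`.

    A rule for "trunk/Program A" covers that directory and everything under it.
    Paths that no rule covers get no access.
    """
    best_len = -1
    best = Access.NONE
    for prefix, access in rules.items():
        covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if covers and len(prefix) > best_len:
            best_len, best = len(prefix), access
    return best


# Hypothetical rules for the "you" in the paragraph above.
your_rules = {
    "trunk": Access.NONE,
    "trunk/Program A": Access.READ,
    "trunk/Program C": Access.WRITE,
    "trunk/Library 1": Access.WRITE,
}
```

With these rules, `effective_access(your_rules, "trunk/Program A/src/Main.cs")` is `Access.READ`, while anything under a directory with no rule of its own falls back to the `trunk` rule and gets `Access.NONE`.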

The reason this solution is crappy for things like Git is because you can't do that: when you clone a repository, you get the whole repository. Applying access controls at a finer granularity than an entire repo is... probably doable, but probably fragile. This means you're wasting tons of extra time, bandwidth, and storage space on stuff that you don't care about.

This is why if you look at large projects like KDE and Boost, they (I think) went from a monolithic Subversion repository to a bunch of independent Git repos.

Submodules are trying to find a middle ground between these -- let you have the multiple repos from a physical structuring point of view, but still be able to view it as one big one.
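
As a rough sketch of that middle ground (a simplified model, not Git's actual implementation): each commit of the outer repository records one pinned commit per nested repository, so a single outer revision names the whole state. All paths and commit IDs here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SuperCommit:
    """One commit of the outer repository.

    Besides its own files, it records which commit each nested repository
    was at, keyed by the directory the nested repository is mounted on.
    """
    commit_id: str
    pins: dict[str, str] = field(default_factory=dict)


def drifted(commit: SuperCommit, checked_out: dict[str, str]) -> list[str]:
    """List mounted directories whose checked-out commit differs from the pin."""
    return sorted(
        path
        for path, pinned in commit.pins.items()
        if checked_out.get(path) != pinned
    )


# Hypothetical IDs: Program A's repository pins Library 1 at a known commit.
head = SuperCommit("a1b2c3", {"libs/library1": "9f8e7d"})
working_copy = {"libs/library1": "0c0ffee"}  # someone pulled the library forward
```

Here `drifted(head, working_copy)` reports `libs/library1`: the nested repository has moved on, but the outer commit still points at the old pin until someone records a new one.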

dkf

@dcon said:

That's freaking huge? That's peanuts. Our repo is 3GB. And that's only the Windows side of things. Other platforms have their own.

What are you doing to get it up to that size? Putting binary files in?

dkf

@tar said:

(Disclaimer: about all I know about submodules is what I read in the OP, namely that they're a DVCS-specific hack which is not necessary in the CVCS world...)

They're a way to map one repository as a subdirectory of another one, which is a (crude) way of getting library inclusion. The general concept is most certainly not specific to DVCSs; I can remember doing very similar things back when I used CVS (in the dim and distant past).

FrostCat

@Bulb said:

Simple and easy are not synonyms.

It's interesting how often this kind of thing comes up in multiple places more or less at the same time.

FrostCat

@boomzilla said:

Whenever you try to use something someone else built, it seems like a case of, "It mostly does what I need, but not quite, and we'll have to hack some things in to make it work for what we need."

And then some idiot says "so rather than that, let's just write our own from scratch", which nicely kills the idea of introducing SCM into your company.