Wtfbuild: A build system for the next decade.



  • @Kian said:

    I don't think this is a solvable problem.

    Not by automated means. If the scenario you describe is happening often enough to slow things down, that's a project management issue.



  • @Kian

    $ git log --oneline  | grep 'remove all references to Discourse'
    730d125 remove all references to Discourse. add NodeBB support.
    $ git worktree add ../Kian 730d125
    Enter ../Kian (identifier Kian)
    Checking out files: 100% (245/245), done.
    HEAD is now at 730d125... remove all references to Discourse. add NodeBB support.
    $ ls -la ../Kian
    total 31
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 ./
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 ../
    -rw-r--r-- 1 Owner 197609   82 Feb 14 21:55 .git
    -rw-r--r-- 1 Owner 197609   45 Feb 14 21:55 .gitignore
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 Docs/
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 ImportDiscourseComments/
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 Lib/
    -rw-r--r-- 1 Owner 197609  485 Feb 14 21:55 README.md
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 SqlScripts/
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 TheDailyWtf/
    -rw-r--r-- 1 Owner 197609 2330 Feb 14 21:55 TheDailyWtf.sln
    $ ls -la
    total 46
    drwxr-xr-x 1 Owner 197609    0 Feb 13 10:57 ./
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 ../
    drwxr-xr-x 1 Owner 197609    0 Feb 14 21:55 .git/
    -rw-r--r-- 1 Owner 197609   45 Jan 25 13:20 .gitignore
    drwxr-xr-x 1 Owner 197609    0 Feb 13 10:57 .vs/
    drwxr-xr-x 1 Owner 197609    0 Jan 25 13:20 Docs/
    drwxr-xr-x 1 Owner 197609    0 Feb 13 10:57 ImportDiscourseComments/
    drwxr-xr-x 1 Owner 197609    0 Feb 13 10:57 Lib/
    drwxr-xr-x 1 Owner 197609    0 Feb 13 10:24 packages/
    -rw-r--r-- 1 Owner 197609  485 Jan 25 13:20 README.md
    drwxr-xr-x 1 Owner 197609    0 Jan 25 13:20 SqlScripts/
    drwxr-xr-x 1 Owner 197609    0 Feb 14 08:56 TheDailyWtf/
    -rw-r--r-- 1 Owner 197609 2330 Feb 13 10:57 TheDailyWtf.sln
    $ git status
    On branch tdwtf-crp
    Your branch is up-to-date with 'origin/tdwtf-crp'.
    nothing to commit, working directory clean
    $ (cd ../Kian/ && git status)
    HEAD detached at 730d125
    nothing to commit, working directory clean
    

    Is that not what you wanted?



  • @accalia said:

    [git clone] is insufficient?

    Yes. It looks like it works in a small repo with just text files, building on a single machine. But imagine you have a multi-GB sized repo (as is common in the real world, with binary resources, documentation and build scripts living with the code so that getting set up is as simple as checking out a copy and typing build), and several slaved build machines.

    Each build machine will need its own copy of the source files (it makes the design easier if I don't have them share). If they each have to download the whole repo just to build something, that's a lot of unnecessary network activity. One nice thing about the archive command in this scenario (which is why I mostly forgive it the extra unpacking step) is that I can specify just the source files to be archived, and I can do it to a remote repo. Sending them over a network also benefits from compressing the text files. So instead of multi-GB downloads, the build machines can check out just the source, build, and place the object files wherever the one doing the linking will grab them from.

    Clone doesn't play nice with this design. And I don't want to redesign the whole build system around git's idiosyncrasies, since svn and mercurial, for example, do exactly what I want. Mercurial's archive command even lets you choose between extracting just the files or packing them into an archive of some type, which is perfect (I didn't check the docs for svn, but I imagine it probably does too).
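
    Concretely, the flow I have in mind for a build slave is something like this (the remote URL and paths here are made up, and git archive --remote only works if the server allows git-upload-archive):

    # grab just the sources of one revision from the central repo,
    # without creating a clone on the build machine
    mkdir -p /tmp/build-1234
    git archive --remote=ssh://buildmaster/srv/repos/project.git \
        --format=tar HEAD src/ include/ \
      | tar -x -C /tmp/build-1234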

    @ben_lubar said:

    Is that not what you wanted?

    While it looks like it works better than clone, it still presents issues when you want a distributed build system. For example, I don't know if you can create a worktree against a remote repo. But even if you can, the repo has to keep track of the worktrees, which means you're modifying the state of the repo. Considering how prone git is to losing its HEAD (haha), I don't want multiple machines trying to make changes at the same time. Reading a repository should be a const operation; it shouldn't change the thing being read.



  • @Kian said:

    Yes. It looks like it works in a small repo with just text files, building on a single machine. But imagine you have a multi-GB sized repo (as is common in the real world, with binary resources, documentation and build scripts living with the code so that getting set up is as simple as checking out a copy and typing build),

    OMG Gaska is gonna yell at you, you are WAY DOING IT WRONG!1!!


  • I survived the hour long Uno hand

    I keep build tools in a separate repository (you could use a branch in git) so you don't have to check them out if you don't want to.
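
    For example (repo name and branch name are made up), either variant lets people opt in only when they actually need the tools:

    # separate repository: only machines that need the tools clone it
    git clone ssh://git.example.com/project-build-tools.git tools/

    # branch variant: materialize the tools branch in a side directory on demand
    git worktree add ../build-tools build-tools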


  • Notification Spam Recipient

    @Yamikuronue said:

    I keep build tools in a separate repository (you could use a branch in git) so you don't have to check them out if you don't want to.

    Gaaaahhh I wanted our build engineer to do this and for them to manage it themselves. Instead those scripts lived in a plugin in our repository and the build engineer had to be involved in every rebase because you never knew what was going on in that folder.



  • @Kian said:

    imagine you have a multi-GB sized repo... with binary resources,

    Large binary datas? Git's greatest weakness!



  • why is that better than just keeping a copy of the repo on each machine and just checking out the desired commit each time you build?



  • @Kian said:

    @accalia said:
    [git clone] is insufficient?

    Yes. It looks like it works in a small repo with just text files, building on a single machine. But imagine you have a multi-GB sized repo (as is common in the real world, with binary resources, documentation and build scripts living with the code so that getting set up is as simple as checking out a copy and typing build), and several slaved build machines.

    Actually, that's precisely where you must do git clone.

    With git archive or similar, you'll be copying the data to each build directory over and over again, because you don't have the metadata there. With git clone, you can update it, which means you only need to pull the couple of new revisions and write the couple of changed files to disk.

    You can save additional data by using the --reference argument to git clone. You can create one mirror of each configured repository on each build machine, update it periodically and make all other clones reference it. Then the older history will only exist in that one mirror and each working directory will have only the most recent changes.

    You can also avoid copying over old history by using the --depth argument to git clone. You don't need the old history on a build server, right?
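
    For example (URL and paths made up):

    # one mirror per build machine, created once and updated periodically
    git clone --mirror ssh://git.example.com/project.git /srv/mirrors/project.git

    # per-build clones borrow their objects from the local mirror
    git clone --reference /srv/mirrors/project.git \
        ssh://git.example.com/project.git build-1234

    # or, if the build really only ever needs the tip:
    git clone --depth 1 ssh://git.example.com/project.git build-tip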

    And in most projects, the .git directory is not much larger than the extracted source even if you don't employ any of the above space-saving measures.

    And last but not least, don't forget that most build scripts query version control metadata. So the build directory requires a clone anyway.

    @Kian said:

    One nice thing about the archive command in this scenario (which is why I mostly forgive it the extra unpacking step) is that I can specify just the source files to be archived, and I can do it to a remote repo.

    That will usually be all of the checkout anyway. If it isn't, you should be using submodules.



  • @Buddy said:

    why is that better than just keeping a copy of the repo on each machine and just checking out the desired commit each time you build?

    The goal is a distributed build system that many developers might be using simultaneously to build and test various versions of the application at the same time. One dev might be working on a new feature and building and running tests against the latest dev version on his branch. Another might be doing bug fixes and searching where a bug was introduced, doing a bisect and thus building many previous versions. Another might be doing a hotfix and working on a previously released branch.

    The slaved build machines need to simultaneously be compiling source for each of those, so having a single repo and checking out the appropriate version is a synchronization bottleneck. I could have a repo on each build machine, so that I don't have to hit the central repo every time, but I still have the problem that I need to then extract the source files for a specific revision into a build directory and compile those, while simultaneously extracting the files for a different revision and building those.

    So cloning and checking out is insufficient for my needs.

    @Bulb said:

    Actually, that's precisely where you must do git clone.

    Dock yourself five points for starting a post with "Actually". Also, as I've described above, git clone is insufficient. I realize I never explicitly stated the requirement of being able to build any revision in the past simultaneously, but that capability is implied by the stated requirement of having multiple developers running builds at the same time. Devs may be working on any point in the history, so the build system can't just focus on building the last revision of a given branch.

    @Bulb said:

    And last but not least, don't forget that most build scripts query version control metadata. So the build directory requires a clone anyway.

    The build master is smart and makes all those decisions. The build master has the official history and master repo. The build slaves just provide the muscle. You give them a set of files and tell them "compile this subset of sources, with these parameters, and give me the output". Some may then be asked to perform additional operations ("run these unit tests, tell me if it passes", "build this installer", etc), but it's just the build master doling out units of work to the slaves.

    @Bulb said:

    That will usually be all of the checkout anyway. If it isn't, you should be using submodules.

    Usually where? Most projects I've seen keep everything related to a single project in the repo of that project, and only use "submodules" (or the equivalent outside of git) for dependencies.



  • @Kian said:

    I realize I never explicitly stated the requirement of being able to build any revision in the past simultaneously, but that capability is implied by the stated requirement of having multiple developers running builds at the same time.

    git clone --no-checkout --reference /path/to/mirror url:/to/repository dir
    cd dir
    git checkout abcdef01234567…
    

    Transfers about, um, nothing, if the mirror is up-to-date.

    You create the mirror once. Then you transfer just the updates and those will be small.
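
    Keeping the mirror fresh is a one-liner you could run from cron (the path is made up):

    git --git-dir=/srv/mirrors/project.git fetch --prune origin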

    @Kian said:

    The build master is smart and makes all those decisions. The build master has the official history and master repo. The build slaves just provide the muscle. You give them a set of files and tell them "compile this subset of sources, with these parameters, and give me the output".

    And “compile this subset of sources” is quite likely to start with:

    version=$(git describe)
    revision=${version#*-g}
    echo "#define VERSION \"$version\"" > version.h
    …
    

    The logic must be in the build script, so it runs when developers build locally as well. So if you don't have checkouts, you are forcing the developer to create two versions of the logic, one for checkout and one for build server. Which is pretty much exactly what you don't want to do.

    @Kian said:

    Most projects I've seen keep everything related to a single project in the repo of that project,

    Yes, of course. But to build a project, you usually need everything related to it.

    Our project does have a heap of graphic resources in the repo. Granted, I wouldn't need them in every workspace. I do need them in some, though, because they are needed to build the package, and saving that 242 MiB just isn't worth sorting out what exactly I need where.



  • @Kian said:

    but I still have the problem that I need to then extract the source files for a specific revision into a build directory and compile those, while simultaneously extracting the files for a different revision and building those.

    So cloning and checking out is insufficient for my needs.

    I believe that you are incorrect, though I have never worked with a multi-GB git repo. My understanding is that local clones use hard links, which makes making a clone of an already-checked-out repo highly efficient. No files would need to be transferred, no files would need to be copied; only the files that have changed between master and the checked-out revision in the cloned dir would take up extra space on the disk.



  • @Bulb said:

    And “compile this subset of sources” is quite likely to start with:

    That's one way of doing it. Another which is also quite widespread is to have version.h in the repo and update it manually. Like here: https://github.com/SFML/SFML/commit/01d72438debdf0ecc75260a3e7d7201c130537d5

    But more generally, running scripts is not compiling. When I say compile, I specifically mean running gcc [options] -c file.cpp -o file.o. "Compile this subset of sources" would mean running that command for file #1.cpp through file #n.cpp out of a total of m files (with n < m). Running scripts can be a step before compiling, and the output would then be handed to anyone that needs it (caching requires knowing when an included header changes, so you'd already know which source files include version.h).

    There's a pretty tight coupling between scripts and build system, so of course I'll have to look into how compatible I want to be with existing practices (being able to import a cmake project, for example, would be nifty) but I'm fine with some scripts needing to be rewritten to fit my system.

    @Bulb said:

    The logic must be in the build script, so it runs when developers build locally as well. So if you don't have checkouts, you are forcing the developer to create two versions of the logic, one for checkout and one for build server. Which is pretty much exactly what you don't want to do.

    Once you've solved the problem for n build machines, a local build is the same thing with n = 1.

    @Bulb said:

    Our project does have a heap of graphic resources in the repo. Granted, I wouldn't need them in every workspace. I do need them in some, though, because they are needed to build the package, and saving that 242 MiB just isn't worth sorting out what exactly I need where.

    As I said, the master doles out units of work as required. A unit of work can be running a script, compiling a file, processing some resource, zipping files together, running unit tests, etc. Units of work might depend on other units of work (whoever includes version.h has to wait for whoever is going to build it to finish and pass it along) or might not (compiling two source files shouldn't require any synchronization). The one in charge of keeping track of that is the master.

    I'm not worried about saving space, I just want to be able to give a slaved build machine exactly what it needs to perform a single task, when it needs it. Cloning creates too much state that has to be tracked. State is bad. Copying files creates no state at all beyond the existence of the files in a directory.



  • @Buddy said:

    I believe that you are incorrect, though I have never worked with a multi-GB git repo. My understanding is that local clones use hard links, which makes making a clone of an already-checked-out repo highly efficient. No files would need to be transferred, no files would need to be copied; only the files that have changed between master and the checked-out revision in the cloned dir would take up extra space on the disk.

    Hmm, I take it a "local clone" is something different from a regular clone? That could work.



  • Well, it's basically the same thing. Git just recognizes when the repo that you have asked it to clone is on the same filesystem as the directory you are cloning into and behaves more efficiently.
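
    For example (paths made up):

    # source given as a plain local path: objects get hard-linked, not copied
    git clone /srv/repos/project.git /srv/builds/build-1234

    # force a real copy instead, e.g. when crossing filesystems
    git clone --no-hardlinks /srv/repos/project.git /mnt/scratch/build-1234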



  • @Bulb said:

    Transfers about, um, nothing, if the mirror is up-to-date.

    Or you could use worktrees and get the same effect with no disk overhead.



  • @Bulb said:

    submodules

    Don't work.


  • FoxDev

    @blakeyrat said:

    @Bulb said:
    submodules

    Don't work.

    okay, obligatory question.

    why don't they work?

    Under what circumstances have you used them and what failures have you encountered?



  • @accalia said:

    why don't they work?

    No GUIs I've encountered support them.



  • GitHub Desktop supports them, and it's the only git GUI I've used recently.


  • FoxDev

    @blakeyrat said:

    No GUIs I've encountered support them.

    ah. so by "they don't work" you really mean

    "I hate GIT and the command line! hate! hate! hate! GUI is life! my precious GUI doesn't support it so it doesn't work at all! If the GUI doesn't support it, and hold my hand walking me through all its functionality then it id bad and broken and no one can possibly believe that it works!"

    got it.

    Submodules work just fine if you use the officially supported interface to GIT, and if you learned that interface 90% of your problems with GIT would disappear.

    but then you don't want to hear that so just replace what i'm saying here with incomprehensible gibberish about how healthcare should be a constitutional right or something else appropriately inflammatory.



  • That must be brand new, then. It didn't support them 6 months ago.



  • Ok, I just looked and apparently it supports loading submodules but not changing them.



  • @blakeyrat said:

    [git submodules] Don't work.

    I'm not really sure I can think of a usecase for them anyway. If I want to manage external dependencies in my project, I already have a local git repository to store them in. Depending on other repos outside my control is just a disaster waiting to happen.

    (Also, it's one whole subsection of the git "manual" and "interface" I don't need to bash my head against, but that's a side benefit...)



  • @ben_lubar said:

    Ok, I just looked and apparently it supports loading submodules but not changing them.

    If it's AWARE of them, that's better than most tools.

    Usually what happens is I use the CLI to sync a submodule in the proper location, then the Git GUI tools go nuts thinking those are all new files that ought to be committed in the main repo.

    @tar said:

    I'm not really sure I can think of a usecase for them anyway.

    The use-case is "Git is really shitty at handling large repos and also really shitty at handling binary files, so what's the shittiest way we could attempt to remediate those issues?"

    Bonus WTF: Chrome's spellchecker contains "remediation" but not "remediate".



  • @tar said:

    I'm not really sure I can think of a usecase for [git submodules] anyway. If I want to manage external dependencies in my project, I already have a local git repository to store them in.

    If those external dependencies are in a git repository somewhere (like on GitHub or another repo on your company git server), there is no compelling reason to also include a copy of those sources with your own project.

    @tar said:

    Depending on other repos outside my control is just a disaster waiting to happen.

    Why?

    Keep in mind that a git submodule is a pointer to a specific commit in an external git repository.

    Unless history rewriting happens in the remote repo, I can't think of reasonable failure modes. If history rewriting happens in the remote repo, someone needs to be cluebatted.
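
    For illustration (URL and path are made up), wiring one up and fetching it later is just:

    # in the parent project: record libfoo as a pointer to its current HEAD commit
    git submodule add https://github.com/example/libfoo.git deps/libfoo
    git commit -m "Add libfoo submodule"

    # in a fresh clone of the parent project: fetch the pinned commit
    git submodule update --init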



  • @OffByOne said:

    If those external dependencies are in a git repository somewhere (like on GitHub or another repo on your company git server), there is no compelling reason to also include a copy of those sources with your own project.

    You don't have any control over the continued availability of that server.
    That's compelling, according to my definition of the term. Sure, on the balance of probability, github's probably not going to vanish permanently, but even a few hours of downtime at the wrong time could cause you to miss a deadline.

    The point of version control is to store the entirety of your project, as far as I'm concerned. Anything less than that is adding unwarranted complexity.



  • The way I've used submodules is to store within the same repo what revision of your dependencies you currently build with. That said, your submodule would point at a repo you control, not to an external one.


  • Winner of the 2016 Presidential Election

    @Kian said:

    The way I've used submodules is to store within the same repo what revision of your dependencies you currently build with.

    I prefer to let the build system fetch the dependencies. Build systems are way better at this than submodules.



  • I kind of cackle with glee at the thought that someday GitHub might just unplug their servers and instantly fuck-over every hipster JS-loving developer on Earth.

    Meanwhile us old codgers who keep our products ENTIRELY on our own servers will laugh and laugh.


  • Winner of the 2016 Presidential Election

    That's why smart people always clone their open-source dependencies' repositories, whether they actually want to fork the code or not.



  • @asdf said:

    I prefer to let the build system fetch the dependencies. Build systems are way better at this than submodules.

    Out of curiosity - what build system are you using for this? I've seen a couple of attempts at this with e.g., cmake, but those haven't really convinced me that that's the way to go. (Of course, the general setup of those projects convinced me to run far and fast, so that might be related.)

    Right now I manage with svn:externals and a few scripts that help with svn:externals when using "git svn" (i.e., svn backend, git frontend).



  • I have my entire project history on my computer because I use something called git.


  • FoxDev

    I did that once with something called SVN



  • @asdf said:

    I prefer to let the build system fetch the dependencies. Build systems are way better at this than submodules.

    It is interesting that in the land of "do one thing only, and do it well", the source management program tries to expand into the build system's territory, managing dependencies for example.

    Would you say that's because the build tools fail at their job, or is the source management software that's overstepping its bounds? Or is the philosophy itself broken?


  • Winner of the 2016 Presidential Election

    @cvi said:

    Out of curiosity - what build system are you using for this? I've seen a couple of attempts at this with e.g., cmake

    Unfortunately, I don't know any good solution for C++. (I assume you're talking about C++ since you mentioned cmake.) Gradle has promised that they'll be working on C++ dependency management for a while now, but they still don't have anything either.

    For pretty much every other language, there are multiple decent build tools which can fetch your dependencies from an internal Artifactory/PyPI/… server.

    @Kian said:

    Would you say that's because the build tools fail at their job, or is the source management software that's overstepping its bounds?

    I'd say it's because people use crappy build tools and/or don't know how to use their build tool correctly. Dependency management is definitely not the VCS's concern.



  • @asdf said:

    Dependency management is definitely not the VCS's concern.

    What system do you control which version of the dependencies get used with?



  • @ben_lubar said:

    system do you control which version

    An SCV? Isn't that a starcraft unit?



  • @swayde said:

    Isn't that a starcraft unit?

    A shit collection vessel? 🚽


  • Winner of the 2016 Presidential Election

    @ben_lubar said:

    What system do you control which version of the dependencies get used with?

    Nice trolling attempt, 8/10.



  • @ben_lubar said:

    What system do you control which version of the dependencies get used with?

    Well, wtfbuild will manage that for me. In ten years, give or take.



  • @Kian said:

    Another which is also quite widespread is to have version.h in the repo and update it manually.

    I suppose you understand why there is no way to put the commit id into anything in the commit. Git also does not support writing it to anything upon checkout (you could have a smudge filter, but it only runs on files it is changing, so it might not update the version anyway). If you want repeatable builds, you need to know which revision they are from; for that you need the revision ID, and the only way to get it is to find it out at the start of the build.
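
    For illustration (the tag name is made up), git describe output has the form <nearest tag>-<commits since that tag>-g<abbreviated commit id>:

    git describe      # prints something like: v1.2.3-14-g1a2b3c4
    # so ${version#*-g} in the snippet above strips everything up to and
    # including the first "-g", leaving just the abbreviated commit id, 1a2b3c4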

    @Kian said:

    I'm not worried about saving space, I just want to be able to give a slaved build machine exactly what it needs to perform a single task, when it needs it. Cloning creates too much state that has to be tracked. State is bad. Copying files creates no state at all beyond the existence of the files in a directory.

    Oh, so you are trying to create a new distcc, not a build server. Ok, you don't need everything on each slave for that.

    I am just not sure how many projects that would actually help. See, I come from a different background. I need my build server to hand out tasks like "build this thing for Android", "build this thing for iOS", "build this thing for Windows Phone 8.1", "build this thing for Windows CE". And each needs a cmake run, because each needs a completely different toolchain that runs on a different OS.



  • @asdf said:

    Unfortunately, I don't know any good solution for C++. (I assume you're talking about C++ since you mentioned cmake.) Gradle has promised that they'll be working on C++ dependency management for a while now, but they still don't have anything either.

    Yeah, sorry, should have mentioned C++.

    As I said, I've seen hacks on top of cmake. Should be doable with other systems, especially those built on top of general-purpose languages (like scons/waf/premake), but it does mean spending some effort on writing and maintaining build scripts that do stuff they weren't really meant to do in the first place.

    Something with built-in support would have been nice to try out. 😄



  • @Bulb said:

    Oh, so you are trying to create a new distcc, not a build server. Ok, you don't need everything on each slave for that.

    I'm not doing either. I'm writing a distributed build system. Not one build server, and not many build servers working in tandem. One logical system split over many physical machines. Kind of like SETI@Home but useful.

    @Bulb said:

    And each needs a cmake run, because each needs a completely different toolchain that runs on a different OS.

    You don't need a different cmake run for each target. You just have a clunky collection of tools. Building for a different architecture with different tools is logically no different from switching between debug and release modes. The build master would just need to be aware that some units of work can only be performed by some machines and not others.



  • @Kian said:

    You don't need a different cmake run for each target.

    Yes, I do. Perhaps if there was a different tool that could configure several completely different compilers with completely different options in one run, I wouldn't. But cmake can only configure one compiler in one run, so I do.
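
    So in practice it looks something like this (the toolchain file names are made up), one configure run and one build tree per target:

    (mkdir -p build-android && cd build-android && \
        cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/android.cmake ..)
    (mkdir -p build-ios && cd build-ios && \
        cmake -G Xcode -DCMAKE_TOOLCHAIN_FILE=../toolchains/ios.cmake ..)
    (mkdir -p build-wince && cd build-wince && \
        cmake -G "Visual Studio 9 2008" ..)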



  • Yes, that's why I said that you had a clunky collection of tools. And writing a tool that can configure several completely different compilers with completely different options in one run is one of the goals of this project.



  • @Bulb said:

    Perhaps if there was a different tool that could configure several completely different compilers with completely different options in one run, I wouldn't [need cmake].

    Hmm. Now I'm wondering if there's value in writing an alternative to CMake that isn't horrible...


  • Winner of the 2016 Presidential Election

    @Bulb said:

    Perhaps if there was a different tool that could configure several completely different compilers with completely different options in one run,

    Yes, there is:

    http://gradle.org/getting-started-native/


  • @asdf said:

    http://gradle.org/getting-started-native/

    Yuck! Which documentation describes how it works and what it can and can't do? Videos don't count; I ain't gonna watch a video for an hour just to find out whether they actually mention the things I want to know.


    Basically, building for Linux, Windows NT, macOS and to a large extent Android is a trivial affair.

    What is a non-trivial affair is building for Windows RT/Metro/Universal/whatever-they-want-to-call-it-today, building for Windows CE/Embedded/whatever-they-want-to-call-that-today and building for iOS. As far as I can tell, there are no practical solutions that wouldn't involve the native build systems, i.e. MSBuild + Visual Studio 2012 or newer, Visual Studio 2003-2008 and Xcode respectively for the three platforms. And that is where CMake comes in: it does not actually attempt to build anything itself; it generates the projects for those native tools, the only ones fully supported for targeting those platforms.

    I would certainly not mind ditching those tools for something consistent. But I simply don't see that on the horizon and I can't imagine Microsoft and/or Apple supporting anything but their Integrated Development Excrements.


  • Winner of the 2016 Presidential Election

    They recently added support for building Visual Studio solutions. A quick Google search also showed that there's an Xcode plugin.

    I still don't understand what exactly it is you're trying to do, though. A better explanation might certainly help me answer your questions. ;)

