Version control done the WTF way



  • I work at a research-focused institute. What we actually have is one medium-to-largish software product written in C++, made up of a lot of small libraries and multiple executables that communicate using DBUS. That this DBUS communication involves a real-time video stream being passed around via shared memory is a WTF of its own; here I'll focus on our version control scheme.

    A full build of the system builds about 150 libraries and executables. Every single one of these 150 "subprojects" lives in its own git repository, and quite a few of them contain only a single class (#1). "To avoid people messing up each other's code", every repository has exactly one developer who is allowed to push changes to it (#2). Branches are regarded with suspicion and as too difficult to handle, because the number of repositories would make coordination difficult (#3). For vaguely specified security reasons, contractors (yes, we have those) and interns are not allowed even read access to our source code (not even the basic libraries), instead of simply being asked to sign an NDA (#4).

    Since only one person is allowed to push changes to each subproject, making a change that involves multiple libraries (very frequent, see #1) works as follows:

    1. Make the changes locally.

    2. Commit to a local branch in each of the subprojects.

    3. Run git format-patch to create a patch folder for each of the subprojects.

    4. Make a tarball of the patches.

    5. E-mail the tarballs to the people with commit rights for each repository, or attach them to JIRA tasks created for the purpose.

    6. Push the changes to the subprojects that you have push rights for.

    7. Wait until every developer has manually applied the patches, committed them and pushed them to their repositories. The build stays broken for hours or days at a time whenever one of them doesn't get around to it.
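    Steps 3 and 4 are mechanical and could at least be scripted. A minimal sketch, assuming each subproject is a local git checkout and that the changes to send are the commits between a base ref (hypothetically `origin/master`) and `HEAD`; the function names and directory layout are illustrative, not the team's actual tooling:

```python
import subprocess
import tarfile
from pathlib import Path


def format_patch_commands(subprojects, base_ref, out_dir):
    """One `git format-patch` command per subproject, each writing to its own folder."""
    cmds = []
    for repo in subprojects:
        target = Path(out_dir) / Path(repo).name  # hypothetical layout: out_dir/<subproject>
        cmds.append(["git", "-C", str(repo), "format-patch",
                     f"{base_ref}..HEAD", "-o", str(target)])
    return cmds


def bundle_patches(patch_dir, tarball):
    """Pack every per-subproject patch folder under patch_dir into one .tar.gz."""
    with tarfile.open(tarball, "w:gz") as tar:
        for sub in sorted(Path(patch_dir).iterdir()):
            if sub.is_dir():
                tar.add(sub, arcname=sub.name)
    return tarball


def export_patches(subprojects, base_ref, out_dir, tarball):
    """Steps 3 and 4: create the patch folders, then the tarball."""
    for cmd in format_patch_commands(subprojects, base_ref, out_dir):
        subprocess.run(cmd, check=True)
    return bundle_patches(out_dir, tarball)
```

    On the receiving end, each maintainer would apply their folder with `git am`. Scripting this only removes typing; it does nothing about step 7.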

    So far I've proposed consolidating into a few repositories by some still-to-be-defined grouping (not that hard in this case). Rejected: "not granular enough rights management, people are going to be messing up each other's code".

    Oh by the way, our "build system":

    There's a folder with shell scripts that build everything in a fixed order and check out / pull changes for every subproject (#5). Since I'm the one who complained, I get to try to find a replacement for this. So far I'm thinking of using git-slave to handle that insane number of repositories for pulling changes and branching, and rake to build it. Since the dependency structure is not documented anywhere (#6), I'm now looking for some way to at least semi-automate the discovery of the dependency structure, maybe some static code analysis tool that works on C++ on Linux.
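    For the dependency discovery, even a crude include scanner gets surprisingly far before reaching for a real static analysis tool. A minimal sketch (Python 3.9+ for `graphlib`), assuming every subproject is one top-level directory and that header file names are unique across subprojects; collisions would need the include path taken into account. It maps `#include` lines to the subproject that owns the header and derives a build order:

```python
import re
from graphlib import TopologicalSorter
from pathlib import Path

HEADER_EXT = {".h", ".hh", ".hpp", ".hxx"}
SOURCE_EXT = HEADER_EXT | {".c", ".cc", ".cpp", ".cxx"}
# Matches both #include "foo.h" and #include <lib/foo.h> at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def load_sources(root):
    """Read every C/C++ file under root; each top-level directory is one subproject."""
    sources = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            sources[sub.name] = {
                str(f.relative_to(sub)): f.read_text(errors="ignore")
                for f in sub.rglob("*")
                if f.is_file() and f.suffix in SOURCE_EXT
            }
    return sources


def dependency_graph(sources):
    """Return {subproject: set of other subprojects whose headers it includes}."""
    # Header file name -> owning subproject (assumes names are unique).
    owners = {
        Path(path).name: sub
        for sub, files in sources.items()
        for path in files
        if Path(path).suffix in HEADER_EXT
    }
    graph = {}
    for sub, files in sources.items():
        deps = set()
        for text in files.values():
            for inc in INCLUDE_RE.findall(text):
                owner = owners.get(Path(inc).name)
                if owner is not None and owner != sub:
                    deps.add(owner)
        graph[sub] = deps
    return graph


def build_order(graph):
    """Dependencies first; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(graph).static_order())
```

    Macros and conditional includes are ignored, so treat the result as a starting point rather than the truth. A `CycleError` is useful information in itself: it means two subprojects include each other.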

    I've also broached the idea of branching with two other developers, and the response wasn't too bad, just a bit of surprise that people actually manage to do it. To help with the paranoia, I mentioned that gitolite can set access rights per branch on the repositories it serves, a piece of information that no sane development environment would ever have reason to know. I'm still not completely pessimistic, because there has to be a way to fix this mess.

    As a bit of clarification: The system has 16 people working on it, 4 have a CS degree, the others are engineers and scientists from different fields.



  • @witchdoctor said:

    I work at a research-focused institute. What actually happens is we have one medium-to-largish software product written in C++ made up of a lot of small libraries and multiple executables that communicate using DBUS. That this DBUS communication involves a real-time video stream being passed around via shared memory is another WTF on its own, here I'll get more into our version control scheme.

    Congratulations, you just wrote the most fucked-up paragraph in TDWTF history.



  • @morbiuswilters said:

    @witchdoctor said:
    I work at a research-focused institute. What actually happens is we have one medium-to-largish software product written in C++ made up of a lot of small libraries and multiple executables that communicate using DBUS. That this DBUS communication involves a real-time video stream being passed around via shared memory is another WTF on its own, here I'll get more into our version control scheme.

    Congratulations, you just wrote the most fucked-up paragraph in TDWTF history.

    Thank you very much, what do I win?

    Some clarification: When an executable has finished processing an image it writes its output image to a shared memory buffer. It then sends a DBUS message to a repeater executable that broadcasts it to all other executables. Executables that are interested in that output read the image from the buffer. The race condition between multiple processes reading from the buffer triggered by an asynchronous message and one process writing its output there is actually intentional. It's supposed to be an implementation of frame dropping when the CPU load goes too high.
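    For contrast, frame dropping does not have to rely on an unguarded race. A minimal sketch of one common alternative, a sequence counter around the shared buffer (the "seqlock" pattern); this is not what the poster's system does, and the class and method names are hypothetical. It only models the protocol in single-process Python: across real processes the counter would have to live in the shared memory segment and be accessed with atomic operations and memory fences (e.g. `std::atomic` in C++).

```python
class FrameSlot:
    """One shared frame buffer guarded by a sequence counter (seqlock pattern).

    The writer makes the counter odd before writing and even again afterwards.
    A reader that sees an odd counter, or a counter that changed while it was
    copying, knows the frame may be torn and drops it instead of using it.
    """

    def __init__(self, size):
        self.seq = 0
        self.data = bytearray(size)

    def begin_write(self):
        self.seq += 1  # odd: write in progress

    def end_write(self):
        self.seq += 1  # even: frame complete

    def write(self, frame):
        self.begin_write()
        self.data[: len(frame)] = frame
        self.end_write()

    def try_read(self):
        """Return (frame_number, bytes), or None if the frame must be dropped."""
        start = self.seq
        if start % 2:
            return None  # writer is mid-frame
        snapshot = bytes(self.data)
        if self.seq != start:
            return None  # overwritten while copying
        return start // 2, snapshot
```

    The returned frame number also lets a reader see how many frames it skipped, so dropping becomes something you can count rather than something that just happens.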



  • @witchdoctor said:

    @morbiuswilters said:
    @witchdoctor said:
    I work at a research-focused institute. What actually happens is we have one medium-to-largish software product written in C++ made up of a lot of small libraries and multiple executables that communicate using DBUS. That this DBUS communication involves a real-time video stream being passed around via shared memory is another WTF on its own, here I'll get more into our version control scheme.

    Congratulations, you just wrote the most fucked-up paragraph in TDWTF history.

    Thank you very much, what do I win?

    I was going to say "freedom from the insanity", but I can't really give you that; only my friends Smith and Wesson can do that.


    How about a dancing Kirby?

    <('_'<)

     (>'_')>

    <('_'<)



  • @witchdoctor said:

    The race condition between multiple processes reading from the buffer triggered by an asynchronous message and one process writing its output there is actually intentional. It's supposed to be an implementation of frame dropping when the CPU load goes too high.

    Ow, my head hurts.



  • It seems people wanted to prove that they could do worse with source control than without it.


  • Discourse touched me in a no-no place

    @witchdoctor said:

    Thank you very much, what do I win?
    An internet. Or a purple dildo. I find it hard to keep track these days...



  • @witchdoctor said:

    The race condition between multiple processes reading from the buffer triggered by an asynchronous message and one process writing its output there is actually intentional. It's supposed to be an implementation of frame dropping when the CPU load goes too high.
     

    Ah, good ol' Brick Wall implementation of car brakes.



  • @dhromed said:

    @witchdoctor said:

    The race condition between multiple processes reading from the buffer triggered by an asynchronous message and one process writing its output there is actually intentional. It's supposed to be an implementation of frame dropping when the CPU load goes too high.
     

    Ah, good ol' Brick Wall implementation of car brakes.

    It stops the car, so no need to fix it?



  • @witchdoctor said:

    @dhromed said:

    @witchdoctor said:

    The race condition between multiple processes reading from the buffer triggered by an asynchronous message and one process writing its output there is actually intentional. It's supposed to be an implementation of frame dropping when the CPU load goes too high.
     

    Ah, good ol' Brick Wall implementation of car brakes.

    It stops the car, so no need to fix it?

    Bug report: program runs too quickly



  • Truth be told, this kind of non-intelligent, mechanically intrinsic behaviour is how logic gates are implemented in reality. But once you're in software, you don't actually need to do that anymore.



  • @Ben L. said:

    Bug report: program runs too quickly

    Solved by the input being real-time, i.e. arriving at a fixed frame rate. When reading from disk, this is done via a timer. It kind of feels like the opposite of the normal architectural choice was made at every point.

    Edit:

    But I did manage to cause something like this by speeding up the import from disk. I needed a workaround so it wouldn't barf when fed 180 fps.
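    One possible shape for such a workaround, purely hypothetical since the post doesn't say what was actually done: drop frames by timestamp whenever the source delivers them faster than the pipeline's nominal rate. The function name and the 30 fps target used below are assumptions for illustration:

```python
EPS = 1e-9  # tolerance for floating-point timestamps


def decimate(timestamps, target_fps):
    """Return indices of frames to keep so that kept frames are at least
    1/target_fps seconds apart; everything in between is dropped."""
    interval = 1.0 / target_fps
    kept = []
    next_due = None
    for i, t in enumerate(timestamps):
        if next_due is None or t >= next_due - EPS:
            kept.append(i)
            next_due = t + interval
    return kept
```

    Fed 180 fps timestamps with a 30 fps target, this keeps every sixth frame, so the output rate stays bounded no matter how fast the disk import gets.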



  • @Ben L. said:

    Bug report: program runs too quickly
     

    Fix: do something heavier in the loop, like throwing and swallowing an error.

    Program runs slow again and is ready for the next generation of CPUs.



  • Every time some moron tries to sell "branching is too hard" or any other misconception about git/hg/bzr, I point out that the Linux kernel is developed by thousands of developers all over the world who still manage to do it in a sane way, so why can't a 16-person team?

    I really think that branching and merging are concepts that, like regexps, encodings and bitwise operations, set apart good developers from useless ones.



  • @ubersoldat said:

    Every time some moron tries to sell the "branching is too hard" or any other misconception about git/hg/bzr I point out that the Linux kernel is developed by thousands of developers all over the world and they're still capable of doing it in a sane way, so why a 16 people team can't do it?

    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Well, if your software project is spread over 150 separate repositories branching becomes slightly more difficult.



  • @dhromed said:

    Ah, good ol' Brick Wall implementation of car brakes.

    +1



  • @ubersoldat said:

    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.



  • @morbiuswilters said:

    @ubersoldat said:
    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.

    +1 internets



  • @morbiuswilters said:

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.
     

    I would quote you, but I'd just have two posts.


  • @witchdoctor said:

    Well, if your software project is spread over 150 separate repositories branching becomes slightly more difficult.
    In that situation, branching is extremely easy, even inevitable. Merging can be trickier though.



  • WTF are you using Git for when people don't understand basic VCS in the first place? I don't blame them for finding branching in a DVCS confusing!

    They'd be better off with VSS or something much less powerful!



  • @morbiuswilters said:

    @ubersoldat said:
    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.


    I lol'd.



  • @morbiuswilters said:

    @ubersoldat said:
    I really think that branching [u]and merging[/u] are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching [u]and merging[/u]." Now they [b]still have one codebase[/b].



  • @English Man said:

    They'd be better off with VSS or something much less powerful!

    No one is ever, EVER, "better off" with VSS.


  • Trolleybus Mechanic

    @mikeTheLiar said:

    @morbiuswilters said:
    @ubersoldat said:
    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.


    I lol'd.<< >>I laughed out loud in order to fully and unambiguously communicate with the joke's originator

     



  • @Lorne Kates said:

    @mikeTheLiar said:

    @morbiuswilters said:
    @ubersoldat said:
    I really think that branching and merging are concepts that like regexp, encoding and bitwise operations, set apart good developers from useless ones.

    Some people, when confronted with a codebase, think "I know, I'll use branching." Now they have two codebases.


    I lol'd.<< >>I laughed out loud breathed through my nose slightly more forcefully than usual in order to fully and unambiguously communicate with the joke's originator

     



  • I felt a slight increase in muscular tension around the lips and cheekbones, which died down within seconds.



  • @English Man said:

    WTF are you using Git for when people don't understand basic VCS in the first place? I don't blame them for finding branching in a DVCS confusing!

    They'd be better off with VSS or something much less powerful!

    It's a 1:1 conversion of the same structure in SVN. It used to be a folder shared via NFS containing about 150 SVN repositories. The "how to add a submodule" documentation used to contain instructions for creating a new SVN repository. Now it's at least on a server, and you need to talk to someone who knows the codebase before a new one gets added. It's really hard to believe, but the current situation is better than it was a year ago.

    Personally, I'd prefer two Mercurial repositories for the system: two instead of one because some of the code has a security clearance rating (the lowest one, but still).



  • @English Man said:

    WTF are you using Git for when people don't understand basic VCS in the first place?

    Where - or when - should this be taught? Many programming courses focus on the how-to of writing code but cover few design principles. Would this be a topic tackled in software engineering?

    So, throw that out to the code monkey experienced developers here: where (and when) did you learn VCS principles?

    I still don't fully know them; I've only had a brush with Subversion+Trac and TortoiseSVN, but I was aware of their use, saw their benefit and found them easy to use. I've never forked or merged; I just updated, modified, tested, then committed.

    @dhromed said:

    I felt a slight increase in muscular tension around the lips and cheekbones, which died down within seconds.
     

    That usually indicates the purple lubricant is running dry.


  • @Cassidy said:

    @English Man said:

    WTF are you using Git for when people don't understand basic VCS in the first place?

    Where - or when - should this be taught?

    Should? On the programming courses.



    Of course the sky in my world is blue, without the flying pigs.



    My first brush with CVS was on the work placement I did while doing my (still unfinished) BSc. And it was VSS, and I was the only person using it because the shop was primarily C and OS-9, and the software concerned was the Windows UI and written in some ancient version (even at the time) of MSVC C++; making me best placed to mess around with it since I'd done C++ in the 2nd year.



    Current job uses svn, and we do branching, which, while it wasn't entirely new to me when we started doing it, is currently overused. There are too many branches 'open' at the moment, and stuff that gets fixed in one branch usually needs porting to the other 10 (I exaggerate, but only slightly) branches. A problem fixed in 6.18.3b4 which is still out there? Well, that fix needs to go into 6.18.4b7, 6.18.5rc and 6.18.6a4, which are also all still current. Oh, and all the 6.19.x branches. And trunk.
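    To make the fan-out concrete: a small sketch that, given the branch a fix was made on and the list of still-open branches, picks every later branch plus trunk as a backport target. It assumes branch names start with `major.minor[.patch]` (with `x` meaning a whole series) and ignores the a/b/rc suffixes; the helper names are hypothetical, and the example list uses the branches from the post plus a hypothetical older 6.17.9.

```python
import re

# major.minor, optionally followed by .patch or .x; suffixes like b4/rc are ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+|x))?")


def version_key(branch):
    """Sortable key for a release branch name, or None for names like trunk."""
    m = VERSION_RE.match(branch)
    if m is None:
        return None
    major, minor, patch = m.groups()
    # A whole-series branch such as 6.19.x sorts before any concrete 6.19.N.
    return (int(major), int(minor), -1 if patch in (None, "x") else int(patch))


def backport_targets(fixed_in, open_branches):
    """Open branches at or after fixed_in (excluding it) that need the fix; trunk always does."""
    base = version_key(fixed_in)
    targets = []
    for branch in open_branches:
        key = version_key(branch)
        if key is None or (branch != fixed_in and key >= base):
            targets.append(branch)
    return targets


branches = ["6.17.9", "6.18.3b4", "6.18.4b7", "6.18.5rc", "6.18.6a4", "6.19.x", "trunk"]
print(backport_targets("6.18.3b4", branches))
```

    With the branch list from the post, one fix already fans out to five targets: 6.18.4b7, 6.18.5rc, 6.18.6a4, 6.19.x and trunk.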



  • Sometimes it is actually hard to decide whether to have the code in a single repo/project or separate repos/projects.


    I usually consider these things:



    1. Does it have to be used by multiple separate other projects?
    2. Does it have a clearly defined interface and test program, so that it can be developed independently?



      Then:



    1. False, 2. False: They go in the same repo.
    1. False, 2. True: They go in the same repo, unless 1. is expected to become true soon.
    1. True, 2. False: Make 2. true. Otherwise you are in for trouble.
    1. True, 2. True: Separate repos.
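    The decision table above is small enough to write down as a function, which makes it easy to run over a list of modules. A sketch, with hypothetical parameter names standing in for questions 1 and 2, plus `shared_soon` for the "unless 1. is expected to become true soon" caveat:

```python
def repo_placement(shared: bool, independent: bool, shared_soon: bool = False) -> str:
    """shared: question 1 (used by multiple separate other projects?).
    independent: question 2 (clear interface and test program?)."""
    if shared and independent:
        return "separate repo"
    if shared:
        # Shared but not independently developable: fix that before splitting.
        return "make it independently developable first"
    if independent and shared_soon:
        return "separate repo"
    return "same repo"
```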


  • @jpa said:

    Sometimes it is actually hard to decide whether to have the code in a single repo/project or separate repos/projects.


    I usually consider these things:



    1. Does it have to be used by multiple separate other projects?
    2. Does it have a clearly defined interface and test program, so that it can be developed independently?



      Then:



    1. False, 2. False: They go in the same repo.
    1. False, 2. True: They go in the same repo, unless 1. is expected to become true soon.
    1. True, 2. False: Make 2. true. Otherwise you are in for trouble.
    1. True, 2. True: Separate repos.

    At a glance, about 10 of the 150 modules have 1. True; of those, about 4 have 2. True as well. All others are 1. False, and a lot of them are also 2. False.

    The way I see it is that there needs to be a good reason for something to live in a separate repo (like being used by multiple different projects) because living in a separate repo means that it needs to be versioned and tested separately.



  • @Cassidy said:

    Where - or when - should this be taught?

    For me it was an elective class on QA during my bachelor's. I took it as an additional class and picked a different elective.



  • @dhromed said:

    I felt a slight increase in muscular tension around the lips and cheekbones

    You know those weren't the only places.

    Don't be shy. You're among friends here.



  • @morbiuswilters said:

    Don't be shy. You're among friends here.
     




  • @PJH said:

    My first brush with CVS [...] And it was VSS,
     

    Is that a typo of Version Control System, or do you mean that VSS is secretly a Concurrent Version System?


  • @dhromed said:

    @PJH said:

    My first brush with CVS [...] And it was VSS,
     

    Is that a typo of Version Control System, or do you mean that VSS is secretly a Concurrent Version System?

    Sadly[1], neither. The fact it's been talked about on here recently does you no favours....



    [1] With hindsight, it wasn't that bad - but as I said, I was the only one changing code in it - I have no experience (nor, given what I've read since, would like to experience) using it with multiple users.



  • @Cassidy said:

    @English Man said:

    WTF are you using Git for when people don't understand basic VCS in the first place?

    Where - or when - should this be taught? Usually many programming courses focus upon how-to with the code but feature little design principles. Would this be a topic tackled in software engineering?

    At my university, we were required to fix a buggy, extraordinarily WTFy, abandoned OSS project in our (mandatory) software engineering course. IIRC, we had to mail a tarball of our clone of the repository after completing each problem set and they specifically checked whether we were using version control appropriately.

    I hated that course. In retrospect, on the other hand, it taught me more than all others combined.



  • @anonymous_guy said:

    IIRC, we had to mail a tarball of our clone of the repository after completing each problem set and they specifically checked whether we were using version control appropriately.
     

    But if they really wanted to check whether you were using version control, they could have just checked out the latest code from your repo once you'd confirmed you were finished, and assessed you against that version.



  • Partly to check whether I'm missing anything and partly as ammunition for when I bring this up again at work, here's a list of the drawbacks of the source control setup described in the OP (most of it is obvious, but I tend to miss the obvious sometimes):

    • Controlled branching/merging and tagging becomes extremely complicated.
    • Every repository has its own version instead of one version of the whole system with every commit.
    • Changes that span more than one submodule can break the build.
    • Nobody runs the pull-everything script regularly because pulling from 150 repos takes too long and is harder to control.
    • Single committers for each submodule means multiple single points of failure for every change.
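    The pull-time point can at least be softened while the bigger argument is going on. A minimal sketch, assuming plain local checkouts: pull all repositories in parallel with a thread pool and report the ones that failed. `--ff-only` makes git refuse to create surprise merge commits; the `runner` parameter is only there so the function can be exercised without real repositories. It treats a symptom, not the cause.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pull_all(repos, workers=8, runner=subprocess.run):
    """Run `git pull --ff-only` in every repo concurrently; return the repos that failed."""

    def pull(repo):
        result = runner(
            ["git", "-C", repo, "pull", "--ff-only"],
            capture_output=True,
            text=True,
        )
        return repo, result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pull, repos))  # map preserves input order
    return [repo for repo, code in results if code != 0]
```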

    I fully expect very strong resistance. Does anyone have any advice?



  • @witchdoctor said:

    I fully expect very strong resistance. Does anyone have any advice?

    Scopolamine.



  • @witchdoctor said:

    Does anyone have any advice?
     

    Those five points you mentioned aren't a drawback of a source control system, they're more a symptom of a badly-used VCS.

    A VCS is an organisational product like any other - so will require someone to champion it, define procedures and policy, then educate people into its correct use in order to gain the maximum benefits from it. From what I've read, many WTFs emerging aren't with the VCS itself, but how it's poorly-used.

    You'll get the nay-sayers who claim it's too complicated or not needed. A simple way of tackling this head-on is to ask: are we doing things in the best way we can - and if not, where can they be improved? I'd normally kick off the meeting by trying to draw out all the issues that currently frustrate people, having them unload the baggage of the present methods. A VCS won't magically make them go away, but if things piss people off and you show willingness to implement changes, some will go along with it.

    Only other advice I can give is to watch out for those that look for fault in the (change|new way) and use that as a reason not to proceed with the change. I usually hit this with "so should we keep things the way they are and not try out this new stuff".



  • @Cassidy said:

    @witchdoctor said:

    Does anyone have any advice?
     

    Those five points you mentioned aren't a drawback of a source control system, they're more a symptom of a badly-used VCS.

    A VCS is an organisational product like any other - so will require someone to champion it, define procedures and policy, then educate people into its correct use in order to gain the maximum benefits from it. From what I've read, many WTFs emerging aren't with the VCS itself, but how it's poorly-used.

    You'll get the nay-sayers that claim it to be too complicated, that it's not needed. A simple way of tackling this head on is the simple question: are we doing things in the best way we are - and if not then where can they be improved? I'd normally kick off the meeting by trying to draw out all the issues that currently frustrate people, have them unload the baggage of the present methods. A VCS won't magically make them go away, but if things piss people off and you show willing to implement changes, some will go along with it.

    Only other advice I can give is to watch out for those that look for fault in the (change|new way) and use that as a reason not to proceed with the change. I usually hit this with "so should we keep things the way they are and not try out this new stuff".

    By system I meant both the software and the usage pattern. I get that the software is not the problem here.

    Thanks for the advice, I think I'll do that but talk to people individually first. I'm not in a position to make the decision by the way, I need to convince our technical lead (who had the idea of splitting things like this in the first place) and our team leader first.



  • @Cassidy said:

    @anonymous_guy said:

    IIRC, we had to mail a tarball of our clone of the repository after completing each problem set and they specifically checked whether we were using version control appropriately.
     

    But if they really wanted to check if you were using version control, they could have just checked out the latest code form your repo once you'd confirmed you're all finished and you'd be assessed against that version.

    They could have done that, but that would have required too much work on their part, I guess. (Like setting up 500 repos on the university network.)



  • @Vanders said:

    @English Man said:
    They'd be better off with VSS or something much less powerful!
    No one is ever, EVER, "better off" with VSS.
     

    WRONG. I was better off (in some respects)... I had a fairly good business unit FIXING corrupted VSS repositories; now the modern systems tend to work reliably and I have lost that particular income stream.



  • @TheCPUWizard said:

    @Vanders said:

    @English Man said:
    They'd be better off with VSS or something much less powerful!

    No one is ever, EVER, "better off" with VSS.
     

    WRONG. I was better off (in some respects)...I had a fairly good business unit FIXING corrupted VSS repositories, now the modern systems end to work reliably and I have lost that particular income stream.


    Had you been fixing some saner VCS, like, say, renaming files with the number of nanoseconds since the last checkout, you would have been much better off. So the point still stands.



  • @PJH said:

    Current job uses svn, and we do branching, which while it wasn't entirely new to me when we started doing it, is currently overused. Because there are too many branches 'open' at the moment, and stuff that gets fixed in one branch usually needs porting to the other 10 (I exaggerate, but only slightly) branches. A problem fixed in 6.18.3b4 which is still out there? Well that fix needs to go into 6.18.4b7, 6.18.5rc, 6.18.6a4 which are also all still current. Oh, and all the 6.19.x branches. And with trunk.

    We do feature branches, but always merge back and build and release from the trunk. It makes multiple concurrent work streams much easier. Once a fix is on the trunk, every release is guaranteed to get it.

    I'm not even sure how you guys do what you do... how do you know which branch gets named 6.18.4 and which gets named 6.18.5 before the work has started? Is the development lead psychic or just hopeful that they'll be released in that order?



  • @Ben L. said:

    @TheCPUWizard said:

    @Vanders said:

    @English Man said:
    They'd be better off with VSS or something much less powerful!

    No one is ever, EVER, "better off" with VSS.
     

    WRONG. I was better off (in some respects)...I had a fairly good business unit FIXING corrupted VSS repositories, now the modern systems end to work reliably and I have lost that particular income stream.


    Had you been fixing some saner VCS, like, say, renaming files with the number of nanoseconds since the last checkout, you would have been much better off. So the point still stands.

    I don't see why a client would pay me to "rename files with...". On the other hand, when a VSS problem occurred and the client was at risk of losing their information, they were willing to pay handsomely to recover the data back to a usable state. Once word spread that this service was available, it reached a fairly wide (albeit regional) audience and there was a near-constant flow of billable work. Since the recovery tools had already been developed, the jobs were often straightforward (even if tedious) and profitable. What could be better for a company?


  • @Jaime said:

    @PJH said:
    Current job uses svn, and we do branching, which while it wasn't entirely new to me when we started doing it, is currently overused. Because there are too many branches 'open' at the moment, and stuff that gets fixed in one branch usually needs porting to the other 10 (I exaggerate, but only slightly) branches. A problem fixed in 6.18.3b4 which is still out there? Well that fix needs to go into 6.18.4b7, 6.18.5rc, 6.18.6a4 which are also all still current. Oh, and all the 6.19.x branches. And with trunk.

    We do feature branches, but always merge back and build and release from the trunk. It makes multiple concurrent work streams much easier. Once a fix is on the trunk, every release is guaranteed to get it.

    I'm not even sure how you guys do what you do... how do you know which branch gets named 6.18.4 and which gets named 6.18.5 before the work has started? Is the development lead psychic or just hopeful that they'll be released in that order?

    They get named on release. 6.18.5 should supersede 6.18.4, and if someone on 6.18.4 has a problem, they are advised to move either to 6.18.5 if it fixes their problem, or to the 6.19 stream if it requires work. Problems come when project managers don't want to move to 6.19, so we have to back-port a fix to 6.18.5 and re-release that branch, whereupon the release would be known as 6.18.6. *That's* where the idiocy is - instead of simply retiring the 6.18 branch, we continue to support it. Along with 6.17, 6.16 etc., if they have the bug in them (they usually will if they have the feature to begin with).

    We don't do 'feature branches' in the way that I'm assuming you're talking about. At least not 'publicly' - I've used a couple myself while trying to integrate LDAP for example, but it's not something we generally do as a department.
