Git for noobs. Or something else?



  • I was tasked with figuring out how a bunch of photography/video-producer types can synchronize their work with a corporate server and with each other. The files can be massive (gigabytes), it's all binary photo/video content, and they are mostly working on Windows.

    Right now, they are copying files manually through FTP, which is slow and error-prone. Also, a bunch of projects end up trapped on their private computers, with mismatched versions, etc.

    All the usual pre-version control problems. Therefore, the most obvious solution is version control.

    My first thought was to set up some kind of noob-friendly git system: they use a simple GUI to add files, write a message, and push them to the server. That's all they need in 99% of cases. In case of problems, there's still git underneath, with all its powerful features.
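    Under the hood, that 99% workflow boils down to three commands, which any thin GUI would just be wrapping (a sketch; the commit message is made up):

        git add -A                                      # stage everything new or changed
        git commit -m "Added photos from Italy shoot"   # record it with a message
        git push                                        # send it to the server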

    Unfortunately, all the git GUIs I've seen are way too complicated and too steeped in git terminology for non-programmers to use.

    The best one I've seen so far is GitHub for Windows. It would be ideal, but it is tied to using GitHub as the host. Since we already maintain our own servers (and we'd like to keep FTP access), that's not in the cards.

    What do you think?
    Do I keep looking for a git-based solution? Any suggestions?
    Try an alternate VCS (SVN)?
    Try to hack the GitHub GUI to talk to our custom server (I know it's possible, but it's user-unfriendly)?

    Look into some altogether different solution?
    Like self-hosted file sync (OwnCloud)?
    Or maybe even some custom-coded solution that could be tied into our other software (appealing to me!)?


  • Discourse touched me in a no-no place

    @cartman82 said in Git for noobs. Or something else?:

    Unfortunately, all the git GUIs I've seen are way too complicated and too steeped in git terminology for non-programmers to use.

    The usual problem with Git is that it is extremely hard to keep the gory bits out of sight. The developers of it believe that it is important that people have access to that sort of thing and control over it. Yes, there are scenarios where that's true, but you're not in one of them. 🙂 One of the other VCSs would be a better choice, and given that you're dealing with binary content where you usually want to either update to current or be the sole editing party, probably SVN.

    The biggest difference of all is going from not version controlled to version controlled. Everything after that is gravy.


  • Fake News

    In general I would not think of Git as fit for massive content - remember that each and every file version is kept locally and in an indivisible tree structure:

    • Somebody added a file they didn't need anyway? Sucks, it remains on everyone's disk.
    • Someone needs only the files for Trip_Italy? Sucks, either you wait all day to clone the entire repo or you ask someone to copy the files you need out of version control.

    You could add something like Git Large File Storage (LFS) to alleviate many of these problems, but it roughly doubles the number of cryptic failure modes, and I haven't used it myself, so I have no idea whether the GUIs support it at all.
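    If you did go the LFS route, the client-side setup is only a few commands; the file patterns below are just examples, and the server also has to offer an LFS endpoint (a sketch):

        git lfs install                                 # enable the LFS hooks for this user
        git lfs track "*.psd" "*.mov" "*.raw"           # keep these types out of the normal object store
        git add .gitattributes
        git commit -m "Track large binaries with LFS"

    After that, tracked files are stored as small pointers in the repo, while the real content lives on (and is pulled from) the LFS server.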

    Subversion will likely be a better alternative: it's simple, fast enough, and it addresses some of the problems I mentioned above. The only drawbacks are that it will save all current files on your disk twice, uncompressed: the "working version" and the "repo version from last update". You won't be able to sync between different computers without both having access to the central repo, but then you also don't have to deal with the possible differences between distributed repositories or local branches.

    The only impossible thing when using any VCS is FTP write access - how would you deal with conflicts or broken connections? Or are you thinking about having only read-only access through FTP with a background job continuously updating some checked out SVN working copy in a folder shared through FTP?


  • kills Dumbledore

    SVN and the TortoiseSVN shell extension was my first thought. Nice and easy to use once someone knowledgeable has set up the server side, you can check out only the files you want, and the more complicated stuff is there if you need it.
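    For example, pulling down just one project from the central repo is a sparse checkout, which TortoiseSVN exposes in its checkout dialog; on the command line it looks roughly like this (server URL and folder names are made up):

        svn checkout --depth empty https://svn.example.com/photos/trunk photos   # empty working copy
        cd photos
        svn update --set-depth infinity Trip_Italy                               # pull in just this project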

    Or a shared Dropbox account, since that has its own versioning, which one would assume is more targeted towards binaries given what people store in DB; but you did mention having your own server, and multi-gig files would cost a lot.

    But don't use Git. You're asking for something simple and non-technical for large binary files. That's basically everything Git isn't. You don't have a nail here, so drop the Git-shaped hammer.



  • @JBert said in Git for noobs. Or something else?:

    Somebody added a file they didn't need anyway? Sucks, it remains on everyone's disk.
    Someone needs only the files for Trip_Italy? Sucks, either you wait all day to clone the entire repo or you ask someone to copy the files you need out of version control.

    Also, someone accidentally committed 10 GB of files into their repo, then deleted them? Too bad, those 10 GB remain in the repo, unless you know the obscure command to prune crap from history.
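    For the record, the obscure incantation is roughly the following, with a hypothetical folder name; the BFG Repo-Cleaner does the same job with less ceremony, and either way the server needs a force-push afterwards and every clone has to be thrown away, as noted below:

        git filter-branch --index-filter \
            'git rm -r --cached --ignore-unmatch raw_footage/' \
            --prune-empty -- --all                    # rewrite every branch without the folder
        git reflog expire --expire=now --all          # drop the old history references
        git gc --prune=now --aggressive               # actually reclaim the disk space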



  • We used a system with (I think) BitTorrent Sync to sync files across user machines and a small compute cluster at a startup. It seems that it's been renamed to Resilio Sync since; can't tell you if it still works the same.

    Essentially, it uses the BitTorrent protocol to sync files onto different machines. From what I remember, it's actually fairly smart about it, i.e., distributing different parts of the file to different machines, which can then send on their parts, and so on. It was relatively painless to set up on a LAN (some 40 nodes or so), and even to involve a machine or two outside of the LAN. Locally, you point it to a directory, and it will simply pick up files in that directory.

    What you don't get compared to SVN or Git is the history. In our case, we didn't really want that anyway, since we didn't have the storage to keep all data around; many of the files were synthetic and only required for a specific run.

    FWIW, when looking up BitTorrent Sync, I stumbled across this list on Wikipedia. Maybe worth a look if you decide to go for sync software instead of a VCS.


  • Discourse touched me in a no-no place

    @cartman82 said in Git for noobs. Or something else?:

    Also, someone accidentally committed 10 GB of files into their repo, then deleted them? Too bad, those 10 GB remain in the repo, unless you know the obscure command to prune crap from history.

    That's probably going to happen anyway, and you can just know that if you prune the history, they'll shortly afterwards want to undelete the files because they're important after all. Even if you wait for 18 months between the deletion and the pruning.

    Lusers gotta luse.



  • @JBert said in Git for noobs. Or something else?:

    Subversion will likely be a better alternative: it's simple, fast enough, and it addresses some of the problems I mentioned above. The only drawbacks are that it will save all current files on your disk twice, uncompressed: the "working version" and the "repo version from last update". You won't be able to sync between different computers without both having access to the central repo, but then you also don't have to deal with the possible differences between distributed repositories or local branches.

    @Jaloopa said in Git for noobs. Or something else?:

    SVN and the TortoiseSVN shell extension was my first thought. Nice and easy to use once someone knowledgeable has set up the server side, you can check out only the files you want, and the more complicated stuff is there if you need it.

    Yeah, SVN seems like the better choice in some regards.

    Downsides:

    • Everything else in our network is git: all the tooling, experience...
    • It doesn't seem to have much of a future; all the development and mindshare is in git.
    • It's still version control, e.g. you need to worry about changesets, conflicts, commits...

    @JBert said in Git for noobs. Or something else?:

    The only impossible thing when using any VCS is FTP write access - how would you deal with conflicts or broken connections? Or are you thinking about having only read-only access through FTP with a background job continuously updating some checked out SVN working copy in a folder shared through FTP?

    Yeah, that's a strike against any VCS solution.

    @Jaloopa said in Git for noobs. Or something else?:

    Or a shared Dropbox account, since that has its own versioning, which one would assume is more targeted towards binaries given what people store in DB; but you did mention having your own server, and multi-gig files would cost a lot.

    @cvi said in Git for noobs. Or something else?:

    We used a system with (I think) BitTorrent Sync to sync files across user machines and a small compute cluster at a startup. It seems that it's been renamed to Resilio Sync since; can't tell you if it still works the same.

    Essentially, it uses the BitTorrent protocol to sync files onto different machines. From what I remember, it's actually fairly smart about it, i.e., distributing different parts of the file to different machines, which can then send on their parts, and so on. It was relatively painless to set up on a LAN (some 40 nodes or so), and even to involve a machine or two outside of the LAN. Locally, you point it to a directory, and it will simply pick up files in that directory.

    What you don't get compared to SVN or Git is the history. In our case, we didn't really want that anyway, since we didn't have the storage to keep all data around; many of the files were synthetic and only required for a specific run.

    This kind of solution is actually looking better and better.

    I spoke with some of the guys. They don't really need a lot of the benefits of a VCS (history, collaboration). Also, VCSes are kind of optimized to work with text-based data; they lose a lot of their benefits with the binary data these guys will be producing.

    Right now, I have a test setup of OwnCloud, so I'll promote that to production and see how it goes.
    We also looked into BitTorrent Sync; I don't remember why we went with OwnCloud instead. I'll give it another look.


  • SockDev

    @cartman82 said in Git for noobs. Or something else?:

    unless you know the obscure command to prune crap from history.

    which, if you do it, requires EVERYONE with a clone of the repo to throw their clone away and clone a fresh one. Otherwise they'll merge your edited history into theirs, and not only will those 10 GB of files come back, but all the rewritten commits will get duplicated, because they're no longer the same commits once your edit created a divergent history.


  • I survived the hour long Uno hand

    I mean, it sounds like you really just want Dropbox. Do they have an on-premises solution? Or any of their competitors, of which there are like four now?



  • @cartman82 said in Git for noobs. Or something else?:

    We also looked into BitTorrent Sync; I don't remember why we went with OwnCloud instead. I'll give it another look.

    Looks like OwnCloud uses a central server. BT Sync had (for us, back then) the advantage of being purely peer-to-peer, so nodes could swap data directly and we didn't need a central server that would potentially get swamped. Then again, backups are way easier if you do have the central server.



  • @cartman82 said in Git for noobs. Or something else?:

    This kind of solution [BTSync] is actually looking better and better.

    In that case, may I point your interest towards SyncThing? I've tried both BTSync and Syncthing. BTSync failed when I fed it many small files (1 000 000 of them; I was simulating my photo collection), whereas Syncthing sailed through, with a much higher transfer rate both there and in my other tests.

    ... Also, in Syncthing you set up, per host, how many versions of each file it should keep

    Plus, it's open source 🚎



  • @Mikael_Svahnberg said in Git for noobs. Or something else?:

    In that case, may I point your interest towards SyncThing? I've tried both BTSync and Syncthing. BTSync failed when I fed it many small files (1 000 000 of them; I was simulating my photo collection), whereas Syncthing sailed through, with a much higher transfer rate both there and in my other tests.
    ... Also, in Syncthing you set up, per host, how many versions of each file it should keep
    Plus, it's open source

    Hmm, Syncthing vs OwnCloud

    Seems everyone's bitching about OwnCloud's speed. From my personal tests, I can confirm it's awfully slow, but I thought that was due to my home internet. I guess not.

    That's it. I'll go with Syncthing for now.

    Thanks @Mikael_Svahnberg !



  • @Jaloopa said in Git for noobs. Or something else?:

    But don't use Git. You're asking for something simple and non-technical for large binary files. That's basically everything Git isn't. You don't have a nail here, so drop the Git-shaped ~~hammer~~ spanner.

    FTFY.


  • Impossible Mission - B

    @Jaloopa said in Git for noobs. Or something else?:

    But don't use Git. You're asking for something simple and non-technical for large binary files. That's basically everything Git isn't. You don't have a nail here, so drop the Git-shaped hammer.

    [Embedded video: "Thor Puts His Hammer Down" - The Avengers HD, 00:31, posted by bahrom7893]


  • kills Dumbledore

    @masonwheeler Massively unwieldy, technically powerful, and those who can pick it up claim it's because they're worthy and you're not.

    Sounds like a good analogy, yeah



  • Once upon a time we called these image or content management systems. Then the Web folks co-opted the term for website builder things, though it appears that "digital asset management" is the more generic term now.

    Unfortunately, my knowledge of them is some 15 years out of date; when I worked for a desktop publishing company the one we made was designed for Mac OS 7. Poking around the Net I see Canto Cumulus is still around, so there's a name for you to look into in case the file sync solution doesn't work out.

    Feels great to be helpful. 🙂



  • We're just about to start using the locally hosted SeaFile Pro https://www.seafile.com/en/home/

    Seems to do everything we want for sharing internally and also externally with clients. Might be worth a look for you.


  • Winner of the 2016 Presidential Election

    @Mikael_Svahnberg said in Git for noobs. Or something else?:

    In that case, may I point your interest towards SyncThing?

    [three reaction images, including fry.png]


  • Winner of the 2016 Presidential Election

    OT: I like git, but I'm definitely on-board with "not for this use". I actually tried that with my photo collection once (several hundred GBs). I knew it was gonna be painful*, especially the initial import, but something later on re-emphasized the bad fit: every time git checks for changes, it may scan all the working files to see what changed, which can mean checking the checksum of every single file, depending on settings and changes made.
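    If you do end up with a big working tree in git anyway, a few real settings soften the rescans somewhat (a sketch; they reduce the cost rather than remove it):

        git config core.untrackedCache true       # cache untracked-file scan results (git 2.8+)
        git config core.fscache true              # Git for Windows only: cache file stat calls
        git status --untracked-files=no           # skip scanning untracked files for this one status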

    I've seen Perforce mentioned for this type of workload (esp. in regard to game assets during development), but I've never looked into it further than that.


    *: "Why did you do it then?" Curiosity. I wanted to see how usable, if at all, it was despite definitely being an inappropriate load.

