Git for noobs. Or something else?



  • I was tasked with figuring out how a bunch of photography/video producer type people can synchronize their work with a corporate server and with each other. The files can be massive (gigabytes), it's all binary/photography type content, and they are mostly working on Windows.

    Right now, they are copying files manually through FTP, which is slow and error prone. Also, a bunch of projects end up trapped in their private computers, with mismatched versions, etc.

    All the usual pre-version control problems. Therefore, the most obvious solution is version control.

    My first thought was to set up some kind of noob-friendly git system. They'd use a simple GUI to add files, write a message and push everything to the server. That's all they need in 99% of cases. If problems crop up, there's still git underneath, with all its powerful features.

    Unfortunately, all the git GUIs I've seen are way too complicated and too steeped in git terminology for non-programmers to use.

    The best one I've seen so far is GitHub for Windows. It would be ideal, but it is tied to using GitHub as host. Since we already maintain our own servers (and we'd like to keep FTP access), that's not in the cards.

    What do you think?
    Do I keep looking for a git-based solution? Any suggestions?
    Try an alternate VCS (SVN)?
    Try to hack GitHub GUI to talk with our custom server (I know it's possible, but user unfriendly)?

    Look into some altogether different solution?
    Like self-hosted file sync (OwnCloud)?
    Or maybe even some custom-coded solution that could be tied into our other software (appealing to me!)?


  • Discourse touched me in a no-no place

    @cartman82 said in Git for noobs. Or something else?:

    Unfortunately, all the git GUIs I've seen are way too complicated and too steeped in git terminology for non-programmers to use.

    The usual problem with Git is that it is extremely hard to keep the gory bits out of sight. The developers of it believe that it is important that people have access to that sort of thing and control over it. Yes, there are scenarios where that's true, but you're not in one of them. :) One of the other VCSs would be a better choice, and given that you're dealing with binary content where you usually want to either update to current or be the sole editing party, probably SVN.
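    For binaries, the relevant SVN feature is the lock-modify-unlock workflow: mark a file as needing a lock, and SVN keeps working copies read-only until someone explicitly becomes the sole editing party. A sketch (the file name is made up, and this only runs inside a real working copy):

```shell
# Hypothetical file name; requires an existing SVN working copy.
svn propset svn:needs-lock yes shoot_042.psd
svn commit -m "Require a lock before editing" shoot_042.psd

svn lock shoot_042.psd -m "Retouching"   # others now can't commit this file
# ...edit the file...
svn commit -m "Retouched"                # committing releases the lock
```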

    The biggest difference of all is going from not version controlled to version controlled. Everything after that is gravy.


  • Fake News

    In general I would not think of Git as fit for massive content - remember that each and every file version is kept locally and in an indivisible tree structure:

    • Somebody added a file they didn't need anyway? Sucks, it remains on everyone's disk.
    • Someone needs only the files for Trip_Italy? Sucks: either you wait all day to clone the entire repo, or you ask someone to copy the files you need out of version control.

    You could add something like Git Large File Storage to alleviate many of these problems, but it can double the number of cryptic failure modes, and I haven't used it before so I have no idea whether GUIs support it at all.
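    For the record, the LFS opt-in is just patterns in a .gitattributes file, which `git lfs track` writes for you; a sketch (the patterns are made up):

```
# .gitattributes -- route matching binaries through LFS pointer files
*.psd filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.cr2 filter=lfs diff=lfs merge=lfs -text
```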

    Subversion will likely be a better alternative: it's simple, fast enough and can do some of the things I mentioned above. Only drawbacks are that it will save all current files on your disk twice, uncompressed: the "working version" and the "repo version from last update". You won't be able to sync between different computers without both having access to the central repo, but then you also don't have to deal with the possible differences between distributed repositories or local branches.

    The only impossible thing when using any VCS is FTP write access - how would you deal with conflicts or broken connections? Or are you thinking about having only read-only access through FTP with a background job continuously updating some checked out SVN working copy in a folder shared through FTP?
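    That background job could be as simple as a cron entry keeping a read-only working copy fresh inside the FTP root (the path is hypothetical):

```shell
# crontab entry: refresh the read-only FTP mirror from the central
# SVN repo every 5 minutes. /srv/ftp/media is a checked-out working copy.
*/5 * * * * svn update --non-interactive --quiet /srv/ftp/media
```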


  • kills Dumbledore

    SVN and the TortoiseSVN shell extension was my first thought. Nice and easy to use once someone knowledgeable has set up the server side, you can check out only the files you want, and the more complicated stuff is there if you need it.
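    The "check out only the files you want" part is SVN's sparse checkout; roughly like this (the server URL is made up):

```shell
# Hypothetical URL: start from an empty working copy, then pull in
# just the one project folder you actually need.
svn checkout --depth empty http://svn.example.com/media/trunk media
cd media
svn update --set-depth infinity Trip_Italy   # only this folder is fetched
```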

    Or a shared Dropbox account, since that has its own versioning, which one would assume is more targeted towards binaries given what people store in Dropbox; but you did mention having your own server, and multi-gig files would cost a lot.

    But don't use Git. You're asking for something simple and non-technical for large binary files. That's basically everything Git isn't. You don't have a nail here, so drop the Git-shaped hammer.



  • @JBert said in Git for noobs. Or something else?:

    Somebody added a file they didn't need anyway? Sucks, it remains on everyone's disk.
    Someone needs only the files for Trip_Italy? Sucks: either you wait all day to clone the entire repo, or you ask someone to copy the files you need out of version control.

    Also, someone accidentally committed 10 GB of files into their repo, then deleted them? Too bad, those 10GB remain in the repo, unless you know the obscure command to prune crap from history.
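    For the curious, the "obscure command" dance looks roughly like this; a self-contained sketch using filter-branch, which ships with git itself (git-filter-repo is the nicer modern alternative):

```shell
# Commit a binary, delete it, then actually purge it from history.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo
dd if=/dev/zero of=huge.bin bs=1024 count=10 2>/dev/null
git add huge.bin && git commit -q -m "oops, committed a big file"
git rm -q huge.bin && git commit -q -m "deleted it again"
# The plain deletion leaves the blob reachable from the first commit:
git rev-list --objects --all | grep huge.bin
# The actual purge: rewrite every commit without the file, then drop
# the backup refs and reflog entries that still point at old history.
git filter-branch -f --index-filter \
    'git rm --cached --ignore-unmatch -q huge.bin' HEAD >/dev/null 2>&1
git for-each-ref --format='%(refname)' refs/original |
    xargs -r -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now -q
```

    After the rewrite, every clone has to be thrown away and re-cloned, which is exactly why this is a last resort.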



  • We used a system with (I think) BitTorrent Sync to sync files across user machines and a small compute cluster at a startup. It seems that it's been renamed to Resilio Sync since; can't tell you if it still works the same.

    Essentially, it uses the BitTorrent protocol to sync files onto different machines. From what I remember, it's actually fairly smart about it, i.e. distributing different parts of the file to different machines, which can then send on their parts, and so on. It was relatively painless to set up on a LAN (some 40 nodes or so), and even to involve a machine or two outside of the LAN. Locally, you point it to a directory, and it will simply pick up files in that directory.

    What you don't get compared to SVN or Git is the history. In our case, we didn't really want that anyway, since we didn't have the storage to keep all data around; many of the files were synthetic and only required for a specific run.

    FWIW, when looking up BitTorrent Sync, I stumbled across this list on Wikipedia. Maybe worth a look if you decide to go for sync software instead of a VCS.


  • Discourse touched me in a no-no place

    @cartman82 said in Git for noobs. Or something else?:

    Also, someone accidentally committed 10 GB of files into their repo, then deleted them? Too bad, those 10GB remain in the repo, unless you know the obscure command to prune crap from history.

    That's probably going to happen anyway, and you can just know that if you prune the history, they'll shortly afterwards want to undelete the files because they're important after all. Even if you wait for 18 months between the deletion and the pruning.

    Lusers gotta luse.



  • @JBert said in Git for noobs. Or something else?:

    Subversion will likely be a better alternative: it's simple, fast enough and can do some of the things I mentioned above. Only drawbacks are that it will save all current files on your disk twice, uncompressed: the "working version" and the "repo version from last update". You won't be able to sync between different computers without both having access to the central repo, but then you also don't have to deal with the possible differences between distributed repositories or local branches.

    @Jaloopa said in Git for noobs. Or something else?:

    SVN and the TortoiseSVN shell extension was my first thought. Nice and easy to use once someone knowledgable has set up the server side, you can check out only the files you want, and the more complicated stuff is there if you need it.

    Yeah, SVN seems like the better choice in some regards.

    Downsides:

    • Everything else in our network is git: all the tooling, the experience...
    • It doesn't seem to have much of a future; all the development and mindshare is in git.
    • It's still version control, e.g. you still need to worry about changesets, conflicts, commits...

    @JBert said in Git for noobs. Or something else?:

    The only impossible thing when using any VCS is FTP write access - how would you deal with conflicts or broken connections? Or are you thinking about having only read-only access through FTP with a background job continuously updating some checked out SVN working copy in a folder shared through FTP?

    Yeah, that's a strike against any VCS solution.

    @Jaloopa said in Git for noobs. Or something else?:

    Or a shared Dropbox account, since that has its own versioning, which one would assume is more targeted towards binaries given what people store in Dropbox; but you did mention having your own server, and multi-gig files would cost a lot.

    @cvi said in Git for noobs. Or something else?:

    We used a system with (I think) BitTorrent Sync to sync files across user machines and a small compute cluster at a startup. It seems that it's been renamed to Resilio Sync since; can't tell you if it still works the same.

    Essentially, it uses the BitTorrent protocol to sync files onto different machines. From what I remember, it's actually fairly smart about it, i.e. distributing different parts of the file to different machines, which can then send on their parts, and so on. It was relatively painless to set up on a LAN (some 40 nodes or so), and even to involve a machine or two outside of the LAN. Locally, you point it to a directory, and it will simply pick up files in that directory.

    What you don't get compared to SVN or Git is the history. In our case, we didn't really want that anyway, since we didn't have the storage to keep all data around; many of the files were synthetic and only required for a specific run.

    This kind of solution is actually looking better and better.

    I spoke with some of the guys. They don't really need a lot of the benefits of a VCS (history, collaboration). Also, VCSes are optimized to work with text-based data; they lose a lot of their benefits with the binary data these guys will be producing.

    Right now, I have a test setup of OwnCloud, so I'll promote that to production and see how it goes.
    We also looked into BitTorrent Sync, don't remember why we went with OwnCloud. I'll give them another look.


  • FoxDev

    @cartman82 said in Git for noobs. Or something else?:

    unless you know the obscure command to prune crap from history.

    which, if you do, requires EVERYONE with a clone of the repo to drop their clone and make a fresh one. Otherwise they will merge your edited history into theirs, and not only will those 10 gigs of files come back, but all the rewritten commits will get duplicated, because they're no longer the same commits after your edit created a divergent history.


  • I survived the hour long Uno hand

    I mean, it sounds like you really just want Dropbox. Do they have an on-premises solution? Or any of their competitors, of which there are like four now?



  • @cartman82 said in Git for noobs. Or something else?:

    We also looked into BitTorrent Sync, don't remember why we went with OwnCloud. I'll give them another look.

    Looks like OwnCloud uses a central server. BT Sync had (for us, back then) the advantage of being purely peer-to-peer, so nodes could swap data directly, and that we didn't need a central server that would potentially get swamped. Then again, backups are way easier if you do have the central server.



  • @cartman82 said in Git for noobs. Or something else?:

    This kind of solution [BTSync] is actually looking better and better.

    In that case, may I point your interest towards Syncthing? I've tried both BTSync and Syncthing. BTSync failed when I fed it many small files (1 000 000 of them; I was simulating my photo collection), whereas Syncthing sailed through -- with a much higher transfer rate there and in my other tests.

    ... Also, in Syncthing you set up per host how many versions of the files it should keep

    Plus, it's open source 🚎



  • @Mikael_Svahnberg said in Git for noobs. Or something else?:

    In that case, may I point your interest towards Syncthing? I've tried both BTSync and Syncthing. BTSync failed when I fed it many small files (1 000 000 of them; I was simulating my photo collection), whereas Syncthing sailed through -- with a much higher transfer rate there and in my other tests.
    ... Also, in Syncthing you set up per host how many versions of the files it should keep
    Plus, it's open source

    Hmm, Syncthing vs OwnCloud

    https://www.reddit.com/r/sysadmin/comments/40226g/owncloud_vs_seafile_vs_pydio/

    Seems everyone's bitching about OwnCloud's speed. From my personal tests, I can confirm it's awfully slow, but I thought that was due to my home internet. I guess not.

    That's it. I'll go with Syncthing for now.

    Thanks @Mikael_Svahnberg !


  • :belt_onion:

    @Jaloopa said in Git for noobs. Or something else?:

    But don't use Git. You're asking for something simple and non technical for large binary files. That's basically everything Git isn't. You don't have a nail here so drop the Git shaped ~~hammer~~ spanner

    FTFY.


  • Impossible Mission - B

    @Jaloopa said in Git for noobs. Or something else?:

    But don't use Git. You're asking for something simple and non technical for large binary files. That's basically everything Git isn't. You don't have a nail here so drop the Git shaped hammer

    Thor Puts His Hammer Down - The Avengers – 00:31
    — bahrom7893


  • kills Dumbledore

    @masonwheeler massively unwieldy, technically powerful, those who can pick it up claim it's because they're worthy and you're not.

    Sounds like a good analogy, yeah



  • Once upon a time we called these image or content management systems. Then the Web folks co-opted the term for website builder things, though it appears that "digital asset management" is the more generic term now.

    Unfortunately, my knowledge of them is some 15 years out of date; when I worked for a desktop publishing company the one we made was designed for Mac OS 7. Poking around the Net I see Canto Cumulus is still around, so there's a name for you to look into in case the file sync solution doesn't work out.

    Feels great to be helpful. :)


  • 🚽 Regular

    We're just about to start using the locally hosted SeaFile Pro https://www.seafile.com/en/home/

    Seems to do everything we want for sharing internally and also externally with clients. Might be worth a look for you.


  • Winner of the 2016 Presidential Election

    @Mikael_Svahnberg said in Git for noobs. Or something else?:

    In that case, may I point your interest towards SyncThing ?

    :headdesk:


  • Winner of the 2016 Presidential Election

    OT: I like git, but I'm definitely on-board with "not for this use". I actually tried that with my photo collection once (several hundred GBs). I knew it was gonna be painful*, especially the initial import, but something later on re-emphasized the bad fit: every time git checks for changes, it may scan all the working files to see what changed, which can mean checking the checksum of every single file, depending on settings and changes made.

    I've seen Perforce mentioned for this type of workload (esp. in regard to game assets during development), but I've never looked into it further than that.


    *: "Why did you do it then?" Curiosity. I wanted to see how usable, if at all, it was despite definitely being an inappropriate load.



    @cartman82, get out of that summoning circle this instant! Precentor Waterly will come back when he's no longer feeling so butthurt about being called out for dissing @ben_lubar in his own good time!



  • @ScholRLEA said in Git for noobs. Or something else?:

    @cartman, get out of that summoning circle this instant! Preceptor Waterly will come back when he's no longer feeling so butthurt about being called out for dissing @ben_lubarin his own good time!

    Who's @cartman? Do you mean @cartman82?



    @ben_lubar Oh, bugger, I fscked that up, yeah. Let me go fix it, thanks for the heads-up.


  • Notification Spam Recipient

    @ben_lubar said in Git for noobs. Or something else?:

    Who's @cartman?

    0_1472763091661_upload-4f4066cd-8bd0-4a1c-ae28-bb749d62f90d

    Mind setting that up as a bot puppet? I'm sure some hilarity could ensue...



  • @Tsaukpaetra its first post simply needs to say "I HATE YOU GUYS, SO FUCKIN' MUCH."



  • @Dreikin said in Git for noobs. Or something else?:

    I've seen perforce mentioned for this type of workload (esp. in regard to game assets during development), but I've never looked into further than that.

    FWIW: Back with the DTP company we used Perforce and it worked very well for source and binaries. I ended up setting up a Perforce server at home for my own use and it worked great for my own needs, which included a lot of binary files. (I make lots of game help documents.)

    Unfortunately, again, this was years ago. I switched away from it when I retired my Windows 2000 Server because I wanted to try a distributed VCS setup. Hopefully when Perforce moved to their new engine they didn't hurt their binary advantage.


  • kills Dumbledore

    @Arantor said in Git for noobs. Or something else?:

    @Tsaukpaetra its first post simply needs to say "I HATE YOU GUYS, SO FUCKIN' MUCH."

    No, one post only.

    screw you guys, I'm going home!



  • @cartman82 said in Git for noobs. Or something else?:

    That's it. I'll go with Syncthing for now.

    I have an update on this.

    Syncthing turned out to be meh.

    • Unclear and/or outdated install instructions for the server. Even though I used a Debian repo install, I had to set up my own systemd service
    • Listens on localhost:5858 or something, so I had to set up a reverse proxy on the server before I could see the interface
    • Interface is bare-bones, but reasonable enough
    • On Windows, you download and run a console application, which works as a self-hosted server for the web interface. The app is only provided as a zipped console app. No installer to speak of. If you want a background service, you're on your own. This is a crap setup for desktop users, especially compared with Dropbox.
    • Actually, there is no notion of a "server" with syncthing. Each instance is presumed to be a userspace program running behind a firewall
    • There is no notion of a user. You share particular directories within the system and that's it.
    • Instances find each other using a public network of well known relays. So even though I have a public static IP available, my PC had to go through the relay to find the server.
    • To connect two machines, you have to copy the "public ID" of machine 1 into the web interface of machine 2 and vice versa. Clunky.
    • Once two machines know about each other, you watch the little indicator and wait for it to turn green, meaning the machines have found each other. That happens slowly and unreliably.
    • Once connection is established, upload speed is nothing to write home about.
    • The whole process is reminiscent of the eMule network. Slow to start, slow to transfer, the impatient need not apply.
    • Restarting the syncthing worker apps is a common thing. This sometimes helps the instances find each other. Other times, not so much.
    • Sometimes you get a red "server has crashed" popup and are forced to restart the app. After restart, the server is always there, but some of the settings you were playing with might be reverted.
    • If you are transferring through local network, everything works much better

    Overall, this seems like a nice plaything for a nerd family, but is ill suited for what I need.
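    For anyone repeating the reverse-proxy step from the list above: the GUI binds to loopback only, so the proxy is a plain pass-through. A sketch (the hostname is made up; 8384 is the documented default GUI port, check your instance):

```nginx
# Hypothetical nginx vhost in front of Syncthing's loopback-only web GUI.
server {
    listen 80;
    server_name sync.example.com;

    location / {
        proxy_pass http://127.0.0.1:8384;    # verify your actual GUI address
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```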

    This morning, I installed OwnCloud. Impressions:

    • Server installation is a typical Apache + MySQL setup. Painless if you're used to it (that is, you are a server admin)
    • There is a user management system. Each person that will use your cloud sync gets their own user, with permissions, logging etc. Well suited for corporate use.
    • Client apps are on the same level as Dropbox. Windows installer, wizard, seamless background service. Nice and easy.
    • Once you add stuff to your local folder, OwnCloud immediately starts uploading to the server. No looking at green dot, hoping it blinks.
    • Upload speed is bad. Less than 10 MB/min.

    So far, OwnCloud has been a great experience, except the upload speed. I think I'll create accounts for the producers and let them try it out. If the speed becomes an issue, I'll look into some other solution.


  • I survived the hour long Uno hand

    @cartman82 I'm very tempted to move to OwnCloud for managing my media server; I've been using a network share to what is essentially a NAS, but it's not always reliable. Keep me posted on how that works out for you?



  • @Yamikuronue Sure.


  • area_deu

    @cartman82 said in Git for noobs. Or something else?:

    • On Windows, you download and run a console application, which works as a self-hosted server for the web interface. The app is only provided as a zipped console app. No installer to speak of. If you want a background service, you're on your own. This is a crap setup for desktop users, especially compared with dropbox.

    True. There is a GUI add-on somewhere, but usually I just drop an NSSM one-liner via GPO and am done with it. Nothing for the average end user, though.
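    For reference, the NSSM one-liner is roughly this (the install path is made up):

```bat
REM Hypothetical path: wrap the console app as a Windows service.
nssm install Syncthing "C:\Tools\syncthing\syncthing.exe" -no-browser -no-restart
nssm start Syncthing
```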

    • Actually, there is no notion of a "server" with syncthing. Each instance is presumed to be a userspace program running behind a firewall

    I actually like that design. No central server as a SPOF and/or bandwidth and/or disk space bottleneck. Two users want to share another folder with their multi-terabyte porn collection just between them? I don't care.

    • There is no notion of a user. You share particular directories within the system and that's it.

    True, but you can control which devices (or, more correctly, syncthing instances) get which directories.

    • Instances find each other using a public network of well known relays. So even though I have a public static IP available, my PC had to go through the relay to find the server.

    No. You can add the device together with a static IP. The discovery servers only get involved if you use "dynamic" as the IP/hostname. Plus you can run your own discovery server, if you want to.
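    To spell out the static-IP bit: when adding the device, you replace "dynamic" with an explicit address, which in Syncthing's config XML looks roughly like this (the device ID and IP are made up):

```xml
<device id="AAAAAAA-BBBBBBB-CCCCCCC-DDDDDDD" name="office-server">
    <!-- a fixed address skips the public discovery/relay network -->
    <address>tcp://203.0.113.10:22000</address>
</device>
```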

    • To connect two machines, you have to copy "public ID" of machine 1 into the web interface of machine 2 and vice versa. Clunky.

    Newer versions also give you a popup-style message in the web GUI asking if you want to approve the incoming device. But yes, one ID has to be pasted manually.

    • Once two machine know about each other, you watch the little green indicator and wait for it to turn green, meaning the machines have found each other. That happens slowly and unreliably.

    I use it as a fire-and-forget solution for background sync. Hasn't let me down yet.

    • Sometimes you get a red "server has crashed" popup and are forced to restart the app. After restart, the server is always there, but some of the settings you were playing with might be reverted.

    Never had that happen on my ~15 devices (Windows and Android). shrug

    So far, OwnCloud has been a great experience, except the upload speed.

    I would happily trade upload speed in the long run for a slightly more complex one-time setup. YMMV. My experiences with OwnCloud (admittedly a few years ago and with a Windows server) were nothing but a world of suck.


  • I survived the hour long Uno hand

    @ChrisH said in Git for noobs. Or something else?:

    No central server as a SPOF and/or bandwidth and/or disk space bottleneck.

    @ChrisH said in Git for noobs. Or something else?:

    you can control which devices (or, more correctly, syncthing instances) get which directories.

    @ChrisH said in Git for noobs. Or something else?:

    Never had that happen on my ~15 devices (Windows and Android). shrug

    @ChrisH said in Git for noobs. Or something else?:

    I would happily trade upload speed in the long run for a slightly more complex one-time setup

    This really sounds like it's the Git of Dropboxes....


  • area_deu

    @Yamikuronue said in Git for noobs. Or something else?:

    This really sounds like it's the Git of Dropboxes....

    No, it actually has a usable GUI in addition to the command line. 🚎



  • @ChrisH said in Git for noobs. Or something else?:

    My experiences with OwnCloud (admittedly a few years ago and with a Windows server) were nothing but a world of suck.

    And you didn't expect it to suck? :wtf:


  • area_deu

    @TimeBandit said in Git for noobs. Or something else?:

    And you didn't expect it to suck? :wtf:

    Why should I? I have other PHP applications running there without problems.



  • I suspect "Git for Noobs" would be a single-page book with just a few lines making fun of the reader for not knowing what the hell a rebase is.

    The End.



  • @ChrisH said in Git for noobs. Or something else?:

    No. You can add the device together with a static IP. Only if you use "dynamic" as IP/hostname, the discovery servers get involved. Plus you can run your own discovery server, if you want to.

    I presumed this was the case. I was hoping you could tell the "server" instance "this is your public IP", and then, once it found the other instances, it could relay that information. But that didn't work; it just kept crashing as I was trying to set the IP binding.

    Expecting users to enter IPs themselves could work, but again... clunky.

    @ChrisH said in Git for noobs. Or something else?:

    Newer versions also give you a popup-style message in the web GUI asking if you want to approve the incoming device. But yes, one ID has to be pasted manually.

    My experience with this:

    • (share folder on the "server")
    • There. It should show up on the client now.
    • (wait)
    • Why isn't it working?
    • (wait)
    • Restart client
    • (wait)
    • Restart server
    • (wait)
    • Restart client and server
    • (wait)
    • (go to do other things)
    • (come back much later)
    • Oooh, there is the prompt
    • Click
    • Client crashes

    @ChrisH said in Git for noobs. Or something else?:

    I would happily trade upload speed in the long run for a slighty more complex one-time setup. YMMV. My experiences with OwnCloud (admittedly a few years ago and with a Windows server) were nothing but a world of suck.

    We'll see if the users can live with OwnCloud speed. I'm a bit worried about the clunky PHP setup and backups. But the client apps have so far been excellent, and that's mostly what the users will see.


  • :belt_onion:

    @cartman82 said in Git for noobs. Or something else?:

    Syncthing turned out to be meh. [...] So far, OwnCloud has been a great experience, except the upload speed.

    Try SyncTrayzor. Makes the setup much simpler.


  • area_deu

    @sloosecannon said in Git for noobs. Or something else?:

    SyncTrayzor

    Oooh, that looks nice. They should deliver that as standard instead of their shitty command line thing.

