What is the Truck-Factor of Popular GitHub Applications?



  • @boomzilla said:

    Ah, yes, figured that out....now I get:

    Welcome to Node, where nobody tells you how to get things working.

    Try just typing in random trendy JS build systems and see how far you get:

    npm update
    npm install
    npm upgrade
    grunt
    grunt run
    grunt upgrade
    grunt update
    gulp
    gulp upgrade

    etc. you'll hit on it eventually.



  • Free software is awesome and benefits everyone, but unfortunately without a reliable source of income it's hard to get good developers working on it.

    The only way I can think of is to get the government to fund it. I've been meaning to start a thread about that but I'm too lazy.


  • ♿ (Parody)

    I hope you're happy. I now have a github account.

    I probably shouldn't have included the change to async.eachSeries since that slows things down, but meh...


  • I survived the hour long Uno hand

    PR accepted :)

    WTF Discourse toaster? Yes, the topic is important to me, but I'm directly responding to people already in the fucking conversation. Gah!


  • I survived the hour long Uno hand

    @blakeyrat said:

    Welcome to Node, where nobody tells you how to get things working.

    It's actually baked into the package.json when you init a project: "main": "index.js" tells you what file to run to kick off the program. But I also added it to my readme because I'm nice like that.


  • Java Dev

    A quick google suggests async.eachLimit should be able to limit the number of open files. Though really, in an async environment, you'd expect a file open to defer if you're running into too many open files - maybe you can handle that error by just retrying at a later time?


  • ♿ (Parody)

    Yes! eachLimit works. I set it to 200 and definitely seemed faster. I agree that deferring would be a better solution.
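
A minimal sketch of the idea behind capping concurrency, assuming nothing about how BusFactor or the async library implements it. `forEachLimited` is a hypothetical helper name, not the async API; it starts at most `limit` workers, and each worker pulls the next item only after finishing its current one.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at any moment. Not the async library's implementation.
async function forEachLimited(items, limit, worker) {
  let next = 0; // index of the next item nobody has claimed yet

  async function runner() {
    // Each runner claims items one at a time until none are left.
    while (next < items.length) {
      const item = items[next++];
      await worker(item);
    }
  }

  const runners = [];
  const count = Math.min(limit, items.length);
  for (let i = 0; i < count; i++) {
    runners.push(runner());
  }
  // Rejects as soon as any worker rejects.
  await Promise.all(runners);
}
```

With a limit of 200, as in the post above, no more than 200 files would be open at once regardless of how many are queued.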


  • Discourse touched me in a no-no place

    @anonymous234 said:

    Free software is awesome [...] hard to get good developers

    :wtf: Are you saying software made by bad developers is awesome?


  • I survived the hour long Uno hand

    Bookmarked for when I have time to look into it


  • FoxDev

    ideally you should also include a script named start so you can npm start the project as well.

    see sockbot2.0 for example



  • No, it's hard to consistently get good developers to work on them, which is why so many of those projects are maintained by just one or two.


  • I survived the hour long Uno hand

    @cartman82 said:

    Slightly anonymized.

    How slightly?

    I found what I believe to be the bug. However, my test input lines look like:

    ^91f49a0 (Yami 2015-07-13 16:19:46 -0400 1)

    While yours look like

    add7a76f scripts/inspect.sh (cartmans.name 2014-02-26 13:17:57 +0100 1) #!/bin/bash

    Note the script name? I'm not sure why it's showing up there but not in my saved git blame output. I don't know git enough to know if that's a real thing.

    That said, I found one source of the bug!
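
For illustration, here is one way to parse both line shapes quoted above, with and without the file-name column. The regular expression and `parseBlameLine` are assumptions written for this sketch, not BusFactor's actual parser, and a file name containing spaces would not match. `git blame --line-porcelain` produces a format that is easier to parse reliably.

```javascript
// Assumed pattern for default `git blame` output:
//   [^]hash [file] (author date time tz lineno) text
const BLAME_LINE = new RegExp(
  '^\\^?([0-9a-f]+)\\s+' +            // commit hash, optional ^ boundary marker
  '(?:(\\S+)\\s+)?' +                 // optional file name (no spaces)
  '\\((.+?)\\s+' +                    // author, may contain spaces
  '(\\d{4}-\\d{2}-\\d{2})\\s+' +      // date
  '(\\d{2}:\\d{2}:\\d{2})\\s+' +      // time
  '([+-]\\d{4})\\s+' +                // timezone offset
  '(\\d+)\\)\\s?(.*)$'                // line number, then the source text
);

// Returns null when the line does not look like blame output.
function parseBlameLine(line) {
  const m = BLAME_LINE.exec(line);
  if (!m) return null;
  return {
    commit: m[1],
    file: m[2] || null,
    author: m[3],
    date: m[4],
    lineNumber: Number(m[7]),
    text: m[8],
  };
}
```

Capturing the author lazily up to the date is what keeps the whole name instead of only its first word.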


  • I survived the hour long Uno hand

    Hrm. I got my copy of git to add the extra data to blame as well. And a few other flags I didn't know existed.

    It's not entirely gone now, but my latest push is much better. Plus, it grabs the whole username, not just the first word!


  • I survived the hour long Uno hand

    So now it also condenses duplicate names. It will key off your email, not your name, and will pick the most-used name for that email address to report out on.
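
A rough sketch of that condensing step, assuming each blamed line has already been reduced to a `{ name, email }` record. The record shape and the function name `canonicalNames` are hypothetical, and lower-casing the email is an extra assumption, not something the post states.

```javascript
// Hypothetical: map each email address to the name most often seen with it.
function canonicalNames(records) {
  const counts = new Map(); // email -> Map(name -> occurrences)
  for (const { name, email } of records) {
    const key = email.toLowerCase(); // assumption: emails compared case-insensitively
    if (!counts.has(key)) counts.set(key, new Map());
    const byName = counts.get(key);
    byName.set(name, (byName.get(name) || 0) + 1);
  }

  const result = new Map(); // email -> most-used name
  for (const [email, byName] of counts) {
    let best = null;
    let bestCount = 0;
    for (const [name, n] of byName) {
      // Ties keep the name that was seen first.
      if (n > bestCount) {
        best = name;
        bestCount = n;
      }
    }
    result.set(email, best);
  }
  return result;
}
```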



  • Hmmm, this can't be right:

    ^Ckane@kane-TECRA-Z40-B:~/projects/BusFactor$ node index.js -r ../site-monitor/ -t git
    File owners:
    ========================
    		Gemfile
     ---- SNIP FOR BREVITY, SEE THE NEXT POST ---
    		lib/site_monitor/site_check.rb
    
    
    
    Authors:
    ========================
    (15) [100%]
    ---------Bus has killed project----------
    
    
    
    Bus factor: 1
    Total files: 15
    
    

    Yep, username detection is failing me for some reason.

    kane@kane-TECRA-Z40-B:~/projects/BusFactor$ git --version
    git version 2.1.0
    kane@kane-TECRA-Z40-B:~/projects/BusFactor$ node --version
    v0.10.25
    kane@kane-TECRA-Z40-B:~/projects/BusFactor$ npm --version
    1.4.21
    


  • @Yamikuronue your latest 2 commits introduced a regression.

    kane@kane-TECRA-Z40-B:~/projects/BusFactor$ git checkout HEAD^^
    kane@kane-TECRA-Z40-B:~/projects/BusFactor$ node index.js -r ../site-monitor/ -t git
    File owners:
    ========================
    Kane York		Gemfile
    Kane York		Gemfile.lock
    Arpit Jalan		README.md
    Arpit Jalan		asset_monitor.rb
    Kane York		env.rb
    Arpit Jalan		monitor_helper.rb
    Kane York		notification_helper.rb
    Kane York		prod_changes.patch
    Kane York		site_monitor.rb
    Kane York		extra-certificates/Gandi Standard SSL CA.pem
    Kane York		extra-certificates/ca-bundle.pem
    Kane York		lib/site_monitor/all.rb
    Kane York		lib/site_monitor/result.rb
    Kane York		lib/site_monitor/site.rb
    Kane York		lib/site_monitor/site_check.rb
    
    
    
    Authors:
    ========================
    Kane York(12) [80%]
    ---------Bus has killed project----------
    Arpit Jalan(3) [20%]
    
    
    
    Bus factor: 1
    Total files: 15
    

    Hmm, I don't think that assessment is particularly fair. Here's a command on the repo, output modified to be RLE:

    kane@kane-TECRA-Z40-B:~/projects/site-monitor$ git log --format=%aN
    37 Kane York
    33 Arpit Jalan
    01 Sam
    


  • @Yamikuronue said:

    How slightly?

    Just the name search/replaced.


  • I survived the hour long Uno hand

    Can I get a sample git blame?

    @riking said:

    Hmm, I don't think that assessment is particularly fair.

    The methodology accounts for people making frequent commits to the same few files by counting the number of lines in a file that were actually altered by that person. If someone comes in and rewrites an entire file in one commit, they own it more than the guy who did 12 commits to it before then.
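
A sketch of that methodology as described here, not the project's actual code. `fileOwner` and `busFactor` are hypothetical names. The stopping rule (count authors until more than half of the files have lost their owner) is an assumption that matches the sample output earlier in the thread, where an author holding 80% of the files gives a bus factor of 1.

```javascript
// Owner of one file: the author credited with the most lines in blame.
// `lineAuthors` holds one author name per line of the file.
function fileOwner(lineAuthors) {
  const counts = new Map();
  for (const author of lineAuthors) {
    counts.set(author, (counts.get(author) || 0) + 1);
  }
  let owner = null;
  let max = 0;
  for (const [author, n] of counts) {
    if (n > max) {
      owner = author;
      max = n;
    }
  }
  return owner;
}

// `owners` maps file path -> owning author.
// Assumed rule: remove the biggest owners first and stop once more
// than 50% of the files are orphaned.
function busFactor(owners) {
  const filesPerAuthor = new Map();
  for (const owner of owners.values()) {
    filesPerAuthor.set(owner, (filesPerAuthor.get(owner) || 0) + 1);
  }
  const ranked = [...filesPerAuthor.values()].sort((a, b) => b - a);

  let orphaned = 0;
  let factor = 0;
  for (const fileCount of ranked) {
    orphaned += fileCount;
    factor += 1;
    if (orphaned > owners.size / 2) break;
  }
  return factor;
}
```

Because ownership is decided by lines rather than commits, someone who rewrites a file in a single commit ends up as its owner even if another author made a dozen earlier commits to it.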


  • FoxDev

    although to be fair, and i've pointed it out before, it would be better to assign partial ownership of the file.

    as it stands if you have a file where:

    • A owns 25% of the lines
    • B owns 35% of the lines
    • C owns 40% of the lines

    C would own the file, despite actually owning a minority of the file.

    proportionate ownership by lines rather than files would be a better metric.
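
To make the suggestion concrete, a small sketch of fractional credit, assuming per-file line counts are already available as plain objects. `proportionalOwnership` is a hypothetical name and this is not something BusFactor is stated to do.

```javascript
// `files` is an array of objects mapping author -> lines owned in one file.
// Each file hands out a total credit of 1, split by share of lines.
function proportionalOwnership(files) {
  const credit = new Map(); // author -> summed fractional ownership
  for (const lineCounts of files) {
    const total = Object.values(lineCounts).reduce((sum, n) => sum + n, 0);
    if (total === 0) continue; // empty file: nobody gets credit
    for (const [author, n] of Object.entries(lineCounts)) {
      credit.set(author, (credit.get(author) || 0) + n / total);
    }
  }
  return credit;
}
```

For the file above, C would be credited with 0.4 of a file instead of the whole file, and A and B would keep 0.25 and 0.35.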


  • I survived the hour long Uno hand

    Yeah, but that would need to have a completely different implementation. And it's not strictly better; it's better in that case, but in the case where

    • A owns 10% from making a clever PR
    • B owns 10% ditto
    • C owns 10% ditto
    • D owns 5% from doing some global documentation
    • E owns 35% remaining from having originally written the file
    • F-K each own 5% from small tweaks and so on, hit-and-run type contributions

    It's fair to say E is the one who really understands the file the most.


  • FoxDev

    well yes, and you would want to ideally ignore comment and blank lines, as well as lines that encode no logic (} and }); alone on a line spring to mind.)

    and yes, BusFactor will be a heuristic no matter what we do. because you could have someone code-reviewing everything but never actually committing anything who understands 100% of the code
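
A rough filter along those lines, as a sketch only. `isTrivialLine` is a hypothetical name, it only knows `//` line comments, and block comments or other languages' comment syntax are not handled.

```javascript
// Returns true for lines that carry no logic worth attributing:
// blank lines, // comments, and lines made only of brackets and punctuation.
function isTrivialLine(text) {
  if (/^\s*$/.test(text)) return true;           // blank or whitespace only
  if (/^\s*\/\//.test(text)) return true;        // line comment (JS/C style)
  if (/^[\s{}()\[\];,]*$/.test(text)) return true; // e.g. "}" or "});"
  return false;
}
```

Lines flagged this way could be skipped before counting how many lines each author owns.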


  • Discourse touched me in a no-no place

    @accalia said:

    C would own the file, despite actually owning a minority of the file.

    Well, we do have a term, plurality, for that.


  • I survived the hour long Uno hand

    I made a whole bunch of updates to the git algorithm over the last few days. I think I've got all the blank author bugs worked out now. Let me know if I haven't :)



  • I ran commit cbbac against https://github.com/torvalds/linux.git and got the following error after 23 hours:

    Error while scanning repository!
    [Error: Documentation/blockdev/drbd/conn-states-8.dot is not a Word Document.
    ]

    Looking at this file, it appears to be a plain text [DOT file](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) rather than a Word template. I briefly looked over your code, but I couldn't see anywhere you're trying to open the file using the OS's default program.

    This is on Windows 8.1.


  • I survived the hour long Uno hand

    @Choonster said:

    I couldn't see anywhere you're trying to open the file using the OS's default program.

    I'm.... not. At all. WTF?

    That's gotta be a problem with git blame. All I'm doing is running blame and parsing the output. Errors encountered while scanning are reported like that output you showed, but they're all coming from the underlying source control system. Errors with my code usually will show up as either a filesystem problem during checkout or authors being listed as "unknown!".

    Can you run a git blame on that file?



  • Running git blame Documentation/blockdev/drbd/conn-states-8.dot > blame.txt on the local clone of the repo results in the same "not a Word Document" error in the command prompt, but does produce this in blame.txt:

    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  1) digraph conn_states {
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  2) 	StandAllone  -> WFConnection   [ label = "ioctl_set_net()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  3) 	WFConnection -> Unconnected    [ label = "unable to bind()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  4) 	WFConnection -> WFReportParams [ label = "in connect() after accept" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  5) 	WFReportParams -> StandAllone  [ label = "checks in receive_param()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  6) 	WFReportParams -> Connected    [ label = "in receive_param()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  7) 	WFReportParams -> WFBitMapS    [ label = "sync_handshake()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  8) 	WFReportParams -> WFBitMapT    [ label = "sync_handshake()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700  9) 	WFBitMapS -> SyncSource        [ label = "receive_bitmap()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 10) 	WFBitMapT -> SyncTarget        [ label = "receive_bitmap()" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 11) 	SyncSource -> Connected
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 12) 	SyncTarget -> Connected
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 13) 	SyncSource -> PausedSyncS
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 14) 	SyncTarget -> PausedSyncT
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 15) 	PausedSyncS -> SyncSource
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 16) 	PausedSyncT -> SyncTarget
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 17) 	Connected   -> WFConnection    [ label = "* on network error" ]
    b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 18) }
    
    

  • I survived the hour long Uno hand

    Yeah, so all Bus Factor does is report the error back up the chain. I don't know enough about git to figure that out, anyone? @accalia?



  • I decided to report the error to the Windows fork of Git (link), since it seems to be a problem in Git itself.


  • FoxDev

    i think that's a bug in the windows port. will look into it.



  • Apparently this is expected behaviour, since Git for Windows is configured to use its own Word helper to visualise .dot files by default (see the issue for more details). You can override this by adding *.dot diff=-astextplain to the repo's .gitattributes or .git/info/attributes file.

    I'm not sure if reporting warnings on stderr is appropriate. Should BusFactor explicitly ignore this particular warning or just expect the user to add the attributes file?
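
One possible middle ground, sketched under assumptions: decide success from the exit status and keep anything on stderr as a warning instead of an error. `runTool` is a hypothetical wrapper, not BusFactor's code, and the thread does not confirm that git exits with status 0 in the "not a Word Document" case.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical wrapper: a non-zero exit status is an error;
// stderr output alongside a zero status is returned as a warning.
function runTool(command, args, options = {}) {
  const result = spawnSync(command, args, { encoding: 'utf8', ...options });
  if (result.error) {
    throw result.error; // the process could not be started at all
  }
  if (result.status !== 0) {
    throw new Error(`${command} exited with ${result.status}: ${result.stderr}`);
  }
  return { stdout: result.stdout, warnings: result.stderr.trim() };
}
```

A caller could then invoke something like `runTool('git', ['blame', file], { cwd: repoPath })`, parse `stdout` as usual, and print `warnings` without aborting the scan.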


  • Discourse touched me in a no-no place

    @anonymous234 said:

    The only way I can think of is to get the government to fund it.

    You don't want that. Governments are fad-prone and inclined to thunder off to the next big thing just at the point where they're needed to keep the support going. What you should be looking for is small businesses building themselves on top of the language stack that you're wanting to take large, since once they're going, they're committed to keeping things working.



  • But I have it on good authority that government is the solution to all our problems. Like having too much money, and too many choices for who pays for healthcare, and... :trollface:


  • Discourse touched me in a no-no place

    @izzion said:

    But I have it on good authority that government is the solution to all our problems. Like having too much money, and too many choices for who pays for healthcare, and... :trollface:

    Well, there is that, but I was thinking much more specifically. Governments are large organisations, and as such they tend to switch from technology to technology in tune with whatever some know-nothing leader has read in Popular Astrology that week. I've seen this sort of thing happen from non-governments too: it's the same stupid pattern, and as a good software engineer I recognise such things.

    Stuff survives when people actually commit to it. Small businesses are best for that, especially if it lets them grow a bit bigger but not so much that they become able to start on the road to becoming big businesses.



