What is the Truck-Factor of Popular GitHub Applications?
-
Ah, yes, figured that out....now I get:
Welcome to Node, where nobody tells you how to get things working.
Try just typing in random trendy JS build systems and see how far you get:
npm update
npm install
npm upgrade
grunt
grunt run
grunt upgrade
grunt update
gulp
gulp upgrade
etc. You'll hit on it eventually.
-
Free software is awesome and benefits everyone, but unfortunately without a reliable source of income it's hard to get good developers working on it.
The only way I can think of is to get the government to fund it. I've been meaning to start a thread about that but I'm too lazy.
-
I hope you're happy. I now have a github account.
I probably shouldn't have included the change to async.eachSeries since that slows things down, but meh...
-
PR accepted :)
WTF Discourse toaster? Yes, the topic is important to me, but I'm directly responding to people already in the fucking conversation. Gah!
-
Welcome to Node, where nobody tells you how to get things working.
It's actually baked into the package.json when you init a project:
"main": "index.js",
tells you what file to run to kick off the program. But I also added it to my readme because I'm nice like that.
-
A quick google suggests async.eachLimit should be able to limit the number of open files. Though really, in an async environment, you'd expect a file open to defer if you're running into too many open files. Maybe you can handle that error by just retrying at a later time?
-
Yes! eachLimit works. I set it to 200 and it definitely seemed faster. I agree that deferring would be a better solution.
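For anyone wondering what the limit actually does, here is a rough sketch of the idea. It is a hand-rolled stand-in written for illustration, not the async library's code, and its promise-based signature is an assumption that differs from the callback style async uses:

```javascript
// Illustrative stand-in for the idea behind async.eachLimit:
// run `worker` over `items`, never more than `limit` at a time.
function eachLimit(items, limit, worker) {
  if (limit < 1) {
    return Promise.reject(new RangeError('limit must be at least 1'));
  }
  return new Promise((resolve, reject) => {
    let next = 0;       // index of the next item to start
    let active = 0;     // workers currently in flight
    let failed = false; // stop launching after the first error

    function launch() {
      if (failed) return;
      if (next >= items.length && active === 0) {
        resolve();
        return;
      }
      while (active < limit && next < items.length) {
        const item = items[next++];
        active++;
        Promise.resolve(item).then(worker).then(
          () => { active--; launch(); },
          (err) => { failed = true; reject(err); }
        );
      }
    }

    launch();
  });
}
```

With a worker that opens and reads one file, a limit of 200 would keep at most 200 files open at once, which is the effect described above.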
-
Free software is awesome [...] hard to get good developers
Are you saying software made by bad developers is awesome?
-
Bookmarked for when I have time to look into it
-
ideally you should also include a script named start so you can
npm start
the project as well. see sockbot2.0 for example
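For reference, a minimal package.json fragment with both the entry point and a start script might look like the following. The name and version values are placeholders, not taken from any project in this thread:

```json
{
  "name": "example-project",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  }
}
```

With that in place, npm start runs the start script, so nobody has to guess which file kicks things off.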
-
No, it's hard to consistently get good developers to work on them, which is why so many of those projects are maintained by just one or two.
-
Slightly anonymized.
How slightly?
I found what I believe to be the bug. However, my test input lines look like:
^91f49a0 (Yami 2015-07-13 16:19:46 -0400 1)
While yours look like
add7a76f scripts/inspect.sh (cartmans.name 2014-02-26 13:17:57 +0100 1) #!/bin/bash
Note the script name? I'm not sure why it's showing up there but not in my saved git blame output. I don't know git enough to know if that's a real thing.
That said, I found one source of the bug!
-
Hrm. I got my copy of git to add the extra data to blame as well. And a few other flags I didn't know existed.
It's not entirely gone now, but my latest push is much better. Plus, it grabs the whole username, not just the first word!
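A rough sketch of a parser that copes with both shapes of blame line quoted above (with and without the file name) and keeps the whole author name. The regular expression is an assumption derived only from the sample lines in this thread, not the project's actual code; the machine-readable `git blame --line-porcelain` output would likely be sturdier to parse:

```javascript
// Assumed layout, inferred from the sample lines in this thread:
//   [^]<hash> [<file>] (<author> <date> <time> <tz> <lineno>) <content>
const BLAME_LINE = /^(\^?)([0-9a-f]+)\s+(?:(.+?)\s+)?\((.+?)\s+(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+([+-]\d{4})\s+(\d+)\)\s?(.*)$/;

// Returns null for lines that do not match the assumed layout.
function parseBlameLine(line) {
  const m = BLAME_LINE.exec(line);
  if (!m) return null;
  return {
    boundary: m[1] === '^',  // leading ^ marks a boundary commit
    commit: m[2],
    file: m[3] || null,      // only present in some blame output
    author: m[4],            // full name, spaces included
    lineNumber: Number(m[8]),
    content: m[9],
  };
}
```

Anchoring the author field on the date that follows it is what lets a name like "Kane York" survive intact rather than being cut at the first space.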
-
So now it also condenses duplicate names. It will key off your email, not your name, and will pick the most-used name for that email address to report out on.
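One way that condensing step could work, as a sketch only. The function name, the input shape, and the choice to lowercase emails before comparing are all assumptions made for illustration:

```javascript
// entries: one { name, email } object per blamed line or commit.
// Returns a Map from email to the name used most often with it.
function canonicalNames(entries) {
  const counts = new Map(); // email -> Map(name -> occurrences)
  for (const { name, email } of entries) {
    const key = email.trim().toLowerCase(); // assumption: case-insensitive
    if (!counts.has(key)) counts.set(key, new Map());
    const byName = counts.get(key);
    byName.set(name, (byName.get(name) || 0) + 1);
  }

  const result = new Map();
  for (const [email, byName] of counts) {
    let best = null;
    let bestCount = 0;
    for (const [name, count] of byName) {
      // Strict > means the first-seen name wins a tie.
      if (count > bestCount) {
        best = name;
        bestCount = count;
      }
    }
    result.set(email, best);
  }
  return result;
}
```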
-
Hmmm, this can't be right:
^Ckane@kane-TECRA-Z40-B:~/projects/BusFactor$ node index.js -r ../site-monitor/ -t git
File owners:
========================
 Gemfile
---- SNIP FOR BREVITY, SEE THE NEXT POST ---
 lib/site_monitor/site_check.rb
Authors:
========================
(15) [100%]
---------Bus has killed project----------
Bus factor: 1
Total files: 15
Yep, username detection is failing me for some reason.
kane@kane-TECRA-Z40-B:~/projects/BusFactor$ git --version
git version 2.1.0
kane@kane-TECRA-Z40-B:~/projects/BusFactor$ node --version
v0.10.25
kane@kane-TECRA-Z40-B:~/projects/BusFactor$ npm --version
1.4.21
-
@Yamikuronue your latest 2 commits introduced a regression.
kane@kane-TECRA-Z40-B:~/projects/BusFactor$ git checkout HEAD^^
kane@kane-TECRA-Z40-B:~/projects/BusFactor$ node index.js -r ../site-monitor/ -t git
File owners:
========================
Kane York Gemfile
Kane York Gemfile.lock
Arpit Jalan README.md
Arpit Jalan asset_monitor.rb
Kane York env.rb
Arpit Jalan monitor_helper.rb
Kane York notification_helper.rb
Kane York prod_changes.patch
Kane York site_monitor.rb
Kane York extra-certificates/Gandi Standard SSL CA.pem
Kane York extra-certificates/ca-bundle.pem
Kane York lib/site_monitor/all.rb
Kane York lib/site_monitor/result.rb
Kane York lib/site_monitor/site.rb
Kane York lib/site_monitor/site_check.rb
Authors:
========================
Kane York(12) [80%]
---------Bus has killed project----------
Arpit Jalan(3) [20%]
Bus factor: 1
Total files: 15
Hmm, I don't think that assessment is particularly fair. Here's a command on the repo, output modified to be RLE:
kane@kane-TECRA-Z40-B:~/projects/site-monitor$ git log --format=%aN
37 Kane York
33 Arpit Jalan
01 Sam
-
Can I get a sample git blame?
Hmm, I don't think that assessment is particularly fair.
The methodology accounts for people making frequent commits to the same few files by counting the number of lines in a file that were actually altered by that person. If someone comes in and rewrites an entire file in one commit, they own it more than the guy who did 12 commits to it before then.
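To make that concrete, here is a minimal sketch of per-file ownership by blamed lines plus a bus factor count on top of it. The function names are made up for illustration, and the cut-off (the project counts as dead once more than half the files have lost their owner) is an assumption; the thread does not state the threshold the tool uses:

```javascript
// lineAuthors: the author of each line of one file, as reported by blame.
// The owner is whoever has the most lines (a plurality, not a majority).
function fileOwner(lineAuthors) {
  const counts = new Map();
  for (const author of lineAuthors) {
    counts.set(author, (counts.get(author) || 0) + 1);
  }
  let owner = null;
  let best = 0;
  for (const [author, n] of counts) {
    if (n > best) {
      owner = author;
      best = n;
    }
  }
  return owner;
}

// fileOwners: one owner per file. Removes authors, biggest owner first,
// until more than `threshold` of the files are orphaned (assumed 50%).
function busFactor(fileOwners, threshold = 0.5) {
  const owned = new Map();
  for (const owner of fileOwners) {
    owned.set(owner, (owned.get(owner) || 0) + 1);
  }
  const ranked = [...owned.values()].sort((a, b) => b - a);
  let orphaned = 0;
  let removed = 0;
  for (const n of ranked) {
    orphaned += n;
    removed++;
    if (orphaned > threshold * fileOwners.length) return removed;
  }
  return removed;
}
```

Under that assumed threshold, the 12-versus-3 split posted earlier gives a bus factor of 1: losing the owner of 12 of the 15 files orphans 80% of them.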
-
although to be fair, and i've pointed it out before, it would be better to assign partial ownership of the file.
as it stands if you have a file where:
- A owns 25% of the lines
- B owns 35% of the lines
- C owns 40% of the lines
C would own the file, despite actually owning a minority of the file.
proportionate ownership by lines rather than files would be a better metric.
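A sketch of what the proportional variant could look like, with each author's share counted in lines across every file instead of in whole files. The function name and input shape are assumptions for illustration:

```javascript
// blamedFiles: one array per file, holding the author of each line.
// Returns a Map from author to their fraction of all blamed lines.
function lineShares(blamedFiles) {
  const lines = new Map();
  let total = 0;
  for (const lineAuthors of blamedFiles) {
    for (const author of lineAuthors) {
      lines.set(author, (lines.get(author) || 0) + 1);
      total++;
    }
  }
  const shares = new Map();
  for (const [author, n] of lines) {
    shares.set(author, n / total);
  }
  return shares;
}
```

For the single 100-line file described above, this reports A at 0.25, B at 0.35 and C at 0.4, rather than handing the whole file to C.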
-
Yeah, but that would need to have a completely different implementation. And it's not strictly better; it's better in that case, but in the case where
- A owns 10% from making a clever PR
- B owns 10% ditto
- C owns 10% ditto
- D owns 5% from doing some global documentation
- E owns 35% remaining from having originally written the file
- F-K each own 5% from small tweaks and so on, hit-and-run type contributions
It's fair to say E is the one who really understands the file the most.
-
well yes, and you would want to ideally ignore comment and blank lines, as well as lines that encode no logic (
}
and
});
alone on a line spring to mind.)
and yes, BusFactor will be a heuristic no matter what we do, because you could have someone code-reviewing everything but never actually committing anything who understands 100% of the code
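a crude sketch of such a filter, for illustration only. it assumes JavaScript-style comments, and the rule that treats a leading * as a block-comment continuation can misfire on real code such as a pointer dereference in C:

```javascript
// Returns false for lines that should not count toward ownership:
// blank lines, lone closers such as "}" or "});", and comment lines.
function isLogicLine(text) {
  const t = text.trim();
  if (t === '') return false;                       // blank
  if (/^[{}()\[\];,]+$/.test(t)) return false;      // only brackets and punctuation
  if (/^(\/\/|\/\*|\*)/.test(t)) return false;      // assumed JS-style comments
  return true;
}
```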
-
C would own the file, despite actually owning a minority of the file.
Well, we do have a term, plurality, for that.
-
I made a whole bunch of updates to the git algorithm over the last few days. I think I've got all the blank author bugs worked out now. Let me know if I haven't :)
-
I ran commit cbbac against https://github.com/torvalds/linux.git and got the following error after 23 hours:
Error while scanning repository!
[Error: Documentation/blockdev/drbd/conn-states-8.dot is not a Word Document.
]
Looking at this file, it appears to be a plain text [DOT file](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) rather than a Word template. I briefly looked over your code, but I couldn't see anywhere you're trying to open the file using the OS's default program.
This is on Windows 8.1.
-
I couldn't see anywhere you're trying to open the file using the OS's default program.
I'm.... not. At all. WTF?
That's gotta be a problem with git blame. All I'm doing is running blame and parsing the output. Errors encountered while scanning are reported like that output you showed, but they're all coming from the underlying source control system. Errors with my code usually will show up as either a filesystem problem during checkout or authors being listed as "unknown!".
Can you run a git blame on that file?
-
Running
git blame Documentation/blockdev/drbd/conn-states-8.dot > blame.txt
on the local clone of the repo results in the same "not a Word Document" error in the command prompt, but does produce this in blame.txt:
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 1) digraph conn_states {
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 2) StandAllone -> WFConnection [ label = "ioctl_set_net()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 3) WFConnection -> Unconnected [ label = "unable to bind()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 4) WFConnection -> WFReportParams [ label = "in connect() after accept" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 5) WFReportParams -> StandAllone [ label = "checks in receive_param()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 6) WFReportParams -> Connected [ label = "in receive_param()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 7) WFReportParams -> WFBitMapS [ label = "sync_handshake()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 8) WFReportParams -> WFBitMapT [ label = "sync_handshake()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 9) WFBitMapS -> SyncSource [ label = "receive_bitmap()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 10) WFBitMapT -> SyncTarget [ label = "receive_bitmap()" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 11) SyncSource -> Connected
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 12) SyncTarget -> Connected
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 13) SyncSource -> PausedSyncS
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 14) SyncTarget -> PausedSyncT
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 15) PausedSyncS -> SyncSource
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 16) PausedSyncT -> SyncTarget
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 17) Connected -> WFConnection [ label = "* on network error" ]
b411b363 (Philipp Reisner 2009-09-25 16:07:19 -0700 18) }
-
Yeah, so all Bus Factor does is report the error back up the chain. I don't know enough about git to figure that out, anyone? @accalia?
-
I decided to report the error to the Windows fork of Git (link), since it seems to be a problem in Git itself.
-
i think that's a bug in the windows port. will look into it.
-
Apparently this is expected behaviour, since Git for Windows is configured to use its own Word helper to visualise .dot files by default (see the issue for more details). You can override this by adding
*.dot diff=-astextplain
to the repo's .gitattributes or .git/info/attributes file.
I'm not sure if reporting warnings on stderr is appropriate. Should BusFactor explicitly ignore this particular warning or just expect the user to add the attributes file?
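One possible middle ground, sketched under the assumption that git exits with status 0 when it only warns (the blame output was still produced here, but the exit status was not posted): treat a non-zero exit as a failure and collect anything on stderr as warnings. The runTool helper is hypothetical, not BusFactor's code:

```javascript
const { spawnSync } = require('child_process');

// Runs a command and separates "it printed something on stderr"
// from "it actually failed" by looking at the exit status.
function runTool(command, args, options = {}) {
  const result = spawnSync(command, args, { encoding: 'utf8', ...options });
  if (result.error) throw result.error; // could not run, or output too large

  const warnings = result.stderr
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '');

  if (result.status !== 0) {
    const err = new Error(
      `${command} exited with status ${result.status}: ${warnings.join('; ')}`
    );
    err.warnings = warnings;
    throw err;
  }
  return { stdout: result.stdout, warnings };
}
```

A call such as runTool('git', ['blame', file], { cwd: repoPath }) would then hand back the blame text along with the "not a Word Document" line as a warning instead of aborting the scan. Note that spawnSync buffers all output, so very large blames may need a bigger maxBuffer option.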
-
The only way I can think of is to get the government to fund it.
You don't want that. Governments are fad-prone and inclined to thunder off to the next big thing just at the point where they're needed to keep the support going. What you should be looking for is small businesses building themselves on top of the language stack that you're wanting to take large, since once they're going, they're committed to keeping things working.
-
But I have it on good authority that government is the solution to all our problems. Like having too much money, and too many choices for who pays for healthcare, and...
-
But I have it on good authority that government is the solution to all our problems. Like having too much money, and too many choices for who pays for healthcare, and...
Well, there is that, but I was thinking much more specifically. Governments are large organisations, and as such they tend to switch from technology to technology in tune with whatever some know-nothing leader has read in Popular Astrology that week. I've seen this sort of thing happen from non-governments too: it's the same stupid pattern, and as a good software engineer I recognise such things.
Stuff survives when people actually commit to it. Small businesses are best for that, especially if it lets them grow a bit bigger but not so much that they become able to start on the road to becoming big businesses.