More Go Chum In The WTF Ocean
-
@morbiuswilters said:
@drurowin said:
A statically-linked build of p7zip is a whopping 275% faster at compression on my machine...
I call bullshit. There's no way static linking is going to give performance gains like that. If you're not: 1) lying; or 2) unable to do a simple benchmark accurately, then I'd say it's a result of something else you did.
In fact, this illustrates just how clueless you are. Anybody who understands compilers, linking and PIC would never claim a 275% increase in performance from static linking. Even a knowledgeable person in favor of static linking (if such a thing exists) would instantly realize that that large of a performance gain is ridiculous and something must be wrong. This just shows that, once again, you have no fucking clue what you are talking about.
The methodology was "time 7z a test.file /path/to/10/gb/of/data; time 7zstatic a test2.file /path/to/10/gb/of/data". I swear I am not making this up. I'll rerun the test later and even post a video.
-
@drurowin said:
I'll rerun the test later and even post a video.
Why don't you finish drawing and coloring your avatar first.
-
@Ronald said:
Why don't you finish drawing and coloring your avatar first.
He probably has too much trouble gripping the pens with this noodle-y right arm there.
-
@Ronald said:
@drurowin said:
I'll rerun the test later and even post a video.
Why don't you finish drawing and coloring your avatar first.
I haven't got my tablet working with Solaris yet.
-
@Ronald said:
@drurowin said:
I'll rerun the test later and even post a video.
Why don't you finish drawing and coloring your avatar first.
Dear Strong Bad,
Why don't you creat a montage?
-
@Ben L. said:
@Ronald said:
@drurowin said:
I'll rerun the test later and even post a video.
Why don't you finish drawing and coloring your avatar first.
Dear Strong Bad,
Why don't you creat a montage?
I'd like to see ya twy.
-
@drurowin said:
The methodology was "time 7z a test.file /path/to/10/gb/of/data; time 7zstatic a test2.file /path/to/10/gb/of/data". I swear I am not making this up.
You fucking idiot, the first run would prime the disk cache and result in the second run being significantly faster. This is what I meant when I said "unable to do a simple benchmark accurately."
If you wanted to test the performance of static linking, you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
Once again, a person with any competence or experience or sense at all would have seen a 275% increase in performance and said "Yep, I fucked up the test somehow", even if they were rabidly pro-static-linking. The problem is you had no clue what you were doing, which isn't shocking considering the opinions you hold.
-
@morbiuswilters said:
@drurowin said:
The methodology was "time 7z a test.file /path/to/10/gb/of/data; time 7zstatic a test2.file /path/to/10/gb/of/data". I swear I am not making this up.
You fucking idiot, the first run would prime the disk cache and result in the second run being significantly faster. This is what I meant when I said "unable to do a simple benchmark accurately."
If you wanted to test the performance of static linking, you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
Once again, a person with any competence or experience or sense at all would have seen a 275% increase in performance and said "Yep, I fucked up the test somehow", even if they were rabidly pro-static-linking. The problem is you had no clue what you were doing, which isn't shocking considering the opinions you hold.
Well at least knowing what you're doing isn't a requirement for posting here. I'll try it later with disk caching disabled.
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
-
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
@morbiuswilters said:
you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
-
@Salamander said:
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
@morbiuswilters said:
you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
He'll still say I didn't do it right, and it irks me that I can't just disable caching. But I'll give that a try tonight when I'm done with real work.
-
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
Why would you disable disk caching?
Look: write a script that does this:
1) Run command
2) Run command (store time in new variable)
3) Repeat previous 10 times
4) Output median of 10 stored times
Then run that for both commands and compare the results.
It's not fucking hard.
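The steps above can be sketched as a small bash function. This is a sketch under assumptions: the function name is made up, GNU coreutils is assumed for `date +%s.%N`, and the first, untimed run doubles as the cache-priming run.

```shell
# median_time CMD [RUNS] -- time CMD several times, print the median
# of the wall-clock times.
median_time() {
    local cmd=$1 runs=${2:-10} s e
    eval "$cmd" > /dev/null 2>&1          # step 1: untimed run to prime the disk cache
    for _ in $(seq "$runs"); do           # steps 2-3: timed runs
        s=$(date +%s.%N)
        eval "$cmd" > /dev/null 2>&1
        e=$(date +%s.%N)
        awk -v s="$s" -v e="$e" 'BEGIN { printf "%.4f\n", e - s }'
    done | sort -n | awk -v n="$runs" 'NR == int((n + 1) / 2)'   # step 4: lower median
}

# Usage: run the same thing for both binaries and compare the medians.
median_time "gzip -9 < /dev/null > /dev/null" 5
```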
-
@blakeyrat said:
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
Why would you disable disk caching?
Look: write a script that does this:
1) Run command
2) Run command (store time in new variable)
3) Repeat previous 10 times
4) Output median of 10 stored times
Then run that for both commands and compare the results.
It's not fucking hard.
Well, then I'm going to have to use a set smaller than 10 GB of data; otherwise your parents will have made all of you go to bed before I get done.
-
@Salamander said:
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
@morbiuswilters said:
you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
Right, you don't want to disable the disk cache for a test like this. You're not testing the throughput of your I/O controller or the performance of ZFS' I/O scheduler. You want to test this all in-memory.
In fact, whatever file you're zipping (and the zip it outputs) should fit entirely in memory. Maybe make a tmpfs to hold the uncompressed input and the compressed output. You only need a few GB. What you want to avoid is testing the performance of the I/O system.
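A minimal sketch of that setup (Linux syntax; the mount point and size are made up, it needs root, and on Solaris the spelling would be `mount -F tmpfs` instead):

```shell
mkdir -p /mnt/bench
mount -t tmpfs -o size=12g tmpfs /mnt/bench   # RAM-backed; 12g leaves room for input + output
cp -r /path/to/10/gb/of/data /mnt/bench/input
time 7z a /mnt/bench/test.7z /mnt/bench/input
time 7zstatic a /mnt/bench/test2.7z /mnt/bench/input
umount /mnt/bench                             # everything vanishes with the mount
```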
-
@morbiuswilters said:
@Salamander said:
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
@morbiuswilters said:
you should have done: 1) several run-throughs so you can throw out any outliers; and 2) do a pre-run before each real run to prime the disk cache.
Right, you don't want to disable the disk cache for a test like this. You're not testing the throughput of your I/O controller or the performance of ZFS' I/O scheduler. You want to test this all in-memory.
In fact, whatever file you're zipping (and the zip it outputs) should fit entirely in memory. Maybe make a tmpfs to hold the uncompressed input and the compressed output. You only need a few GB. What you want to avoid is testing the performance of the I/O system.
I have 32 GB RAM, my existing 10 GB test data should fit. I'll try a tmpfs.
-
@drurowin said:
@blakeyrat said:
@drurowin said:
Edit: Hmm, it looks like I can't disable ZFS's caching. Any suggestions for how to do this without being accused of "fucking up the test", to show that static linking has definite solid performance advantages?
Why would you disable disk caching?
Look: write a script that does this:
1) Run command
2) Run command (store time in new variable)
3) Repeat previous 10 times
4) Output median of 10 stored times
Then run that for both commands and compare the results.
It's not fucking hard.
Well, then I'm going to have to use a set smaller than 10 GB of data; otherwise your parents will have made all of you go to bed before I get done.
Yes, you do. You want the set (input and output) to fit in memory. Also, instead of storing the median, output all 10 times. The reason is you might have some really crazy outliers that need to be thrown out. You could probably get away with something like throwing out the top and bottom results, then taking the median of the remaining 8.
If the performance increase you see is anything more than 5% you probably fucked up somewhere. Seriously, PIC isn't quite as fast as statically-linked code, but the penalty is not nearly what you seem to think. Do you really think every OS on the planet has been using a linking strategy which made programs 4 times slower and you're the first person to figure out the truth?
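That throw-out-the-extremes step can be sketched in a few lines of shell (a sketch: the helper name is made up, and it assumes one time sample per line on stdin):

```shell
# Drop the single fastest and slowest samples, then average the rest
# (with 10 samples in, that's the mean of the remaining 8).
trimmed_mean() {
    sort -n | awk '{ a[NR] = $1 }
        END { for (i = 2; i < NR; i++) s += a[i]; printf "%.4f\n", s / (NR - 2) }'
}

printf '%s\n' 1.0 1.1 1.2 1.3 9.9 | trimmed_mean   # the 9.9 outlier is discarded; prints 1.2000
```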
-
@morbiuswilters said:
If the performance increase you see is anything more than 5% you probably fucked up somewhere. Seriously, PIC isn't quite as fast as statically-linked code, but the penalty is not nearly what you seem to think. Do you really think every OS on the planet has been using a linking strategy which made programs 4 times slower and you're the first person to figure out the truth?
Performance vs your (valid) argument about maintainability.
-
@morbiuswilters said:
You could probably get away with something like throwing out the top and bottom results, then taking the median of the remaining 8.
... yes. That would be a completely different result than taking the median of all 10 values.
(Morbs, it's not often I can call you an idiot. But today! Today! You are the idiot. Sorry.)
-
@blakeyrat said:
@morbiuswilters said:
You could probably get away with something like throwing out the top and bottom results, then taking the median of the remaining 8.
... yes. That would be a completely different result than taking the median of all 10 values.
(Morbs, it's not often I can call you an idiot. But today! Today! You are the idiot. Sorry.)
What if the top result is really fat and you can only take away a third of it?
-
@drurowin said:
@morbiuswilters said:
If the performance increase you see is anything more than 5% you probably fucked up somewhere. Seriously, PIC isn't quite as fast as statically-linked code, but the penalty is not nearly what you seem to think. Do you really think every OS on the planet has been using a linking strategy which made programs 4 times slower and you're the first person to figure out the truth?
Performance vs your (valid) argument about maintainability.
Of course there's some performance gain to avoiding PIC, but if it was 4-fold, then no OS would use dynamic linking.
-
@blakeyrat said:
@morbiuswilters said:
You could probably get away with something like throwing out the top and bottom results, then taking the median of the remaining 8.
... yes. That would be a completely different result than taking the median of all 10 values.
(Morbs, it's not often I can call you an idiot. But today! Today! You are the idiot. Sorry.)
Dammit, I meant "mean", not "median". But, yeah, taking the median of all 10 should be fine, too. This isn't the most precise test in the world, anyway.
-
ben@loads foo$ ls -l
total 1932
-rw-rw-r--. 1 ben ben    1018 May 27 22:12 foo.go
-rwxrwxr-x. 1 ben ben   36583 May 27 22:12 foo-shared
-rwxrwxr-x. 1 ben ben 1934936 May 27 22:12 foo-static
ben@loads foo$ file *
foo.go:     C source, ASCII text
foo-shared: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x0a6d08210da35d09861b705dda0680ae9c00d242, not stripped
foo-static: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
ben@loads foo$ ldd foo-*
foo-shared:
    linux-vdso.so.1 => (0x00007fff04ffe000)
    libgo.so.0 => /lib64/libgo.so.0 (0x00007f73057dd000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003625e00000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003627200000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003625a00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003625600000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003626200000)
foo-static:
    not a dynamic executable
ben@loads foo$ cat foo.go
package main

import (
    "bytes"
    "fmt"
    "crypto/sha1"
    "compress/gzip"
)

func main() {
    var buf bytes.Buffer
    defer func() {
        sha := sha1.New()
        sha.Write(buf.Bytes())
        fmt.Printf("% x\n", sha.Sum(nil))
    }()
    w1, _ := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    defer w1.Close()
    w2, _ := gzip.NewWriterLevel(w1, gzip.BestCompression)
    defer w2.Close()
    w3, _ := gzip.NewWriterLevel(w2, gzip.BestCompression)
    defer w3.Close()
    w4, _ := gzip.NewWriterLevel(w3, gzip.BestCompression)
    defer w4.Close()
    w5, _ := gzip.NewWriterLevel(w4, gzip.BestCompression)
    defer w5.Close()
    w6, _ := gzip.NewWriterLevel(w5, gzip.BestCompression)
    defer w6.Close()
    w7, _ := gzip.NewWriterLevel(w6, gzip.BestCompression)
    defer w7.Close()
    w8, _ := gzip.NewWriterLevel(w7, gzip.BestCompression)
    defer w8.Close()
    w9, _ := gzip.NewWriterLevel(w8, gzip.BestCompression)
    defer w9.Close()
    var b []byte
    sha := sha1.New()
    for i := 0; i < 1000; i++ {
        fmt.Printf("%d ", i)
        b = sha.Sum(b)
        sha.Write(b)
        w9.Write(b)
    }
}
Won't be a jiffy.
I have a Pentium 4, so it will be quite a few jiffies.
Edit: I'm changing that loop to 10k iterations instead of 1k.
-
@morbiuswilters said:
If the performance increase you see is anything more than 5% you probably fucked up somewhere.
Reran it 15 times; the average is 7.7% faster. The outliers were 284% faster and 0.3% faster, which I removed because the 284% was WAY the fuck out there. The next highest one was in the 20-percent range. Still, though, it's a measurable gain.
-
+ for i in '{0..4}'
+ /usr/bin/time -v ./foo-shared
    Command being timed: "./foo-shared"
    User time (seconds): 3387.62
    System time (seconds): 24.01
    Percent of CPU this job got: 84%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:07:28
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2285928
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 46559
    Minor (reclaiming a frame) page faults: 1218681
    Voluntary context switches: 49327
    Involuntary context switches: 1101848
    Swaps: 0
    File system inputs: 2664640
    File system outputs: 560
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ /usr/bin/time -v ./foo-static
    Command being timed: "./foo-static"
    User time (seconds): 2107.56
    System time (seconds): 17.13
    Percent of CPU this job got: 80%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 44:05.69
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2351512
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 46519
    Minor (reclaiming a frame) page faults: 978369
    Voluntary context switches: 283808
    Involuntary context switches: 774157
    Swaps: 0
    File system inputs: 2821440
    File system outputs: 704
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ for i in '{0..4}'
+ /usr/bin/time -v ./foo-shared
    Command being timed: "./foo-shared"
    User time (seconds): 3669.83
    System time (seconds): 27.48
    Percent of CPU this job got: 80%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:16:11
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2345112
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 43531
    Minor (reclaiming a frame) page faults: 1034713
    Voluntary context switches: 45576
    Involuntary context switches: 1590604
    Swaps: 0
    File system inputs: 2477144
    File system outputs: 872
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ /usr/bin/time -v ./foo-static
    Command being timed: "./foo-static"
    User time (seconds): 2307.04
    System time (seconds): 22.42
    Percent of CPU this job got: 74%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 52:21.97
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2394384
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 73384
    Minor (reclaiming a frame) page faults: 1268979
    Voluntary context switches: 367508
    Involuntary context switches: 1035768
    Swaps: 0
    File system inputs: 4471096
    File system outputs: 632
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ for i in '{0..4}'
+ /usr/bin/time -v ./foo-shared
    Command being timed: "./foo-shared"
    User time (seconds): 3890.41
    System time (seconds): 31.58
    Percent of CPU this job got: 78%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:23:32
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2293712
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 41250
    Minor (reclaiming a frame) page faults: 1121608
    Voluntary context switches: 43162
    Involuntary context switches: 1921375
    Swaps: 0
    File system inputs: 2349232
    File system outputs: 232
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ /usr/bin/time -v ./foo-static
    Command being timed: "./foo-static"
    User time (seconds): 2488.11
    System time (seconds): 25.71
    Percent of CPU this job got: 70%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 59:13.04
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2415208
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 72966
    Minor (reclaiming a frame) page faults: 1165381
    Voluntary context switches: 406809
    Involuntary context switches: 1508734
    Swaps: 0
    File system inputs: 4456288
    File system outputs: 1160
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ for i in '{0..4}'
+ /usr/bin/time -v ./foo-shared
    Command being timed: "./foo-shared"
    User time (seconds): 4139.62
    System time (seconds): 34.59
    Percent of CPU this job got: 70%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:38:58
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2299820
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 52247
    Minor (reclaiming a frame) page faults: 1017613
    Voluntary context switches: 55211
    Involuntary context switches: 2499290
    Swaps: 0
    File system inputs: 2825400
    File system outputs: 1280
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
+ /usr/bin/time -v ./foo-static
    Command being timed: "./foo-static"
    User time (seconds): 2571.57
    System time (seconds): 29.90
    Percent of CPU this job got: 62%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:08:51
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2331084
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 90333
    Minor (reclaiming a frame) page faults: 1165370
    Voluntary context switches: 476299
    Involuntary context switches: 1960679
    Swaps: 0
    File system inputs: 5391832
    File system outputs: 992
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
-
Ben L.: Master of pointless information.
Seriously, damn near half of that is just the same lines repeated over and over again.
-
@Salamander said:
Ben L.: Master of pointless information.
Seriously, damn near half of that is just the same lines repeated over and over again.
Fine, since you're obviously incapable of reading information and discarding the parts you deem unimportant, here's a slimmed-down version:
SHARED User time (seconds): 3387.62 System time (seconds): 24.01
STATIC User time (seconds): 2107.56 System time (seconds): 17.13
SHARED User time (seconds): 3669.83 System time (seconds): 27.48
STATIC User time (seconds): 2307.04 System time (seconds): 22.42
SHARED User time (seconds): 3890.41 System time (seconds): 31.58
STATIC User time (seconds): 2488.11 System time (seconds): 25.71
SHARED User time (seconds): 4139.62 System time (seconds): 34.59
STATIC User time (seconds): 2571.57 System time (seconds): 29.90
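For what it's worth, a bit of awk over the user-time figures above (just arithmetic on the numbers quoted; no new measurements):

```shell
awk 'BEGIN {
    split("3387.62 3669.83 3890.41 4139.62", shared)   # SHARED user times, seconds
    split("2107.56 2307.04 2488.11 2571.57", static)   # STATIC user times, seconds
    for (i = 1; i <= 4; i++) { s += shared[i]; t += static[i] }
    printf "shared mean: %.2f s, static mean: %.2f s, shared/static: %.2f\n", s/4, t/4, s/t
}'
```

So on this machine the dynamically linked binary burned roughly 1.59x the CPU time of the static one: a real gap, but still nowhere near the original 275% claim.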
-
A pretty dramatic difference. Also, if I'm following correctly, both binaries were compiled from Go? So I guess it's a myth then that Go produces only statically linked binaries.
-
And if you can't read:
-
@joe.edwards said:
A pretty dramatic difference. Also, if I'm following correctly, both binaries were compiled from Go? So I guess it's a myth then that Go produces only statically linked binaries.
There's a compiler called gccgo, which I used for the shared one.
-
@Ben L. said:
Fine, since you're obviously incapable of reading information and discarding the parts you deem unimportant...
Some of us have jobs that we're trying to avoid doing. We don't want to spend time sifting through your console output.
As for your results.. is this some Go program? I thought Go didn't support dynamic linking..
-
@Ben L. said:
@joe.edwards said:
A pretty dramatic difference. Also, if I'm following correctly, both binaries were compiled from Go? So I guess it's a myth then that Go produces only statically linked binaries.
There's a compiler called gccgo, which I used for the shared one.
But then you're testing something significantly different than static vs. dynamic linking. (How do you people not understand how to do a simple experiment??) You're testing a completely different compiler for a language that supposedly doesn't support anything but static linking (so who knows what weird hoops the dynamic-linking compiler is jumping through?).
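For context, the two binaries were presumably built along these lines (the exact flags are my guess, not from the thread; the point is only that two different compilers are involved):

```shell
gccgo -O2 -o foo-shared foo.go   # gccgo links against libgo.so and libc dynamically by default
go build -o foo-static foo.go    # the gc toolchain of that era always produced static binaries
```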