Why is my program slow?
-
user@tuesday:~/csgo$ ruby --version ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux] user@tuesday:~/csgo$ /usr/bin/time ruby stats.rb > /dev/null 99.20user 6.85system 1:46.18elapsed 99%CPU (0avgtext+0avgdata 12616maxresident)k 1752inputs+0outputs (5major+3237minor)pagefaults 0swaps
That's almost two minutes to process 28MB of logs. And the size of my log collection is only going to get bigger.
Here's the code:
require 'json' # an unquoted string string_inner = /((?:[^\\"]|\\[\\"])*)/ # a quoted string string = /"#{string_inner}"/ # a name-id-steam-team combo # served with a side of fries name = /"#{string_inner}<#{string_inner}><#{string_inner}><#{string_inner}>"/ # a name-id-steam combo # hold the team name_1 = /"#{string_inner}<#{string_inner}><#{string_inner}>"/ # an integer coordinate coord = /\[(-?[0-9]+) (-?[0-9]+) (-?[0-9]+)\]/ stats = {} map = "" Dir['logs/*.log'].sort.each do |fn| open fn do |f| cvars = false bomb_planted = false hostage_reached = false f.each_line do |l| if l =~ /^L ([0-9]{2})\/([0-9]{2})\/([0-9]{4,}) - ([0-9]{2}):([0-9]{2}):([0-9]{2}): / time = Time.new $3.to_i, $1.to_i, $2.to_i, $4.to_i, $5.to_i, $6.to_i l = l[$&.size..-1] else p l raise "malformed log line" end if cvars next unless l =~ /^server cvars end$/ end case l when /^Log file started \(file #{string}\) \(game #{string}\) \(version #{string}\)$/ # ignore when /^Log file closed$/ # ignore when /^Loading map #{string}$/ map = $1.split('/').last when /^Started map #{string} \(CRC #{string}\)$/ map = $1.split('/').last when /^server cvars start$/ cvars = true when /^server cvars end$/ cvars = false when /^server_cvar: #{string} #{string}$/ # ignore when /^server_message: #{string}$/ # ignore when /^Your server is out of date and will be shutdown during hibernation or changelevel, whichever comes first\.$/ # ignore when /^#{name} entered the game$/ # ignore when /^#{name} connected, address #{string}$/ # ignore when /#{name} disconnected \(reason #{string}\)$/ # ignore when /^#{name_1} switched from team <#{string_inner}> to <#{string_inner}>$/ # ignore when /^#{name} changed name to #{string}$/ # ignore when /^#{name} purchased #{string}$/ # ignore when /^World triggered #{string}$/ case event = $1 when "Game_Commencing" # ignore when "Round_Start" bomb_planted = false hostage_reached = false when "Round_End" # ignore else p l raise "unprocessed line" end when /^World triggered "killlocation" \(attacker_position #{string}\) \(victim_position #{string}\)$/ # ignore when /^#{name} triggered "weaponstats2?" .*$/ # ignore when /^\[META\] .*$/ # ignore when /^#{name} #{coord} attacked #{name} #{coord} with #{string} \(damage #{string}\) \(damage_armor #{string}\) \(health #{string}\) \(armor #{string}\) \(hitgroup #{string}\)$/ # ignore when /^#{name} #{coord} committed suicide with #{string}$/ # ignore when /^#{name} #{coord} killed #{name} #{coord} with #{string}( \(headshot\))?$/ # ignore when /^#{name} assisted killing #{name}$/ # ignore when /^#{name} triggered #{string}$/ case event = $5 when "headshot" # ignore when "Got_The_Bomb" # ignore when "Dropped_The_Bomb" # ignore when "Planted_The_Bomb" bomb_planted = true when "Begin_Bomb_Defuse_Without_Kit" # ignore when "Defused_The_Bomb" # ignore when "Touched_A_Hostage" hostage_reached = true when "Rescued_A_Hostage" # ignore when "round_mvp" # ignore else p l raise "unprocessed line" end when /^Team #{string} triggered #{string} \(CT #{string}\) \(T #{string}\)$/ stats[map] = { ct_win: 0, t_win: 0, hostage_reached: 0, hostage_rescued: 0, bomb_planted: 0, bomb_detonated: 0, bomb_defused: 0, time_ran_out: 0 } if stats[map].nil? if bomb_planted stats[map][:bomb_planted] += 1 bomb_planted = false end if hostage_reached stats[map][:hostage_reached] += 1 hostage_reached = false end case event = $2 when "SFUI_Notice_CTs_Win" stats[map][:ct_win] += 1 when "SFUI_Notice_Terrorists_Win" stats[map][:t_win] += 1 when "SFUI_Notice_Target_Saved" stats[map][:time_ran_out] += 1 when "SFUI_Notice_All_Hostages_Rescued" stats[map][:hostage_rescued] += 1 when "SFUI_Notice_Hostages_Not_Rescued" stats[map][:time_ran_out] += 1 when "SFUI_Notice_Bomb_Defused" stats[map][:bomb_defused] += 1 when "SFUI_Notice_Target_Bombed" stats[map][:bomb_detonated] += 1 else p l raise "unprocessed line" end when /^Team #{string} scored #{string} with #{string} players$/ # ignore when /^Molotov projectile spawned at .*$/ # ignore when /^#{name} threw (molotov|hegrenade|decoy|flashbang|smokegrenade) #{coord}$/ # ignore else p l raise "unprocessed line" end end end end puts JSON.pretty_generate stats
This isn't a disk IO issue.
cat log/*
finishes in less than 0.01 seconds.
-
Because you're using Ruby, which isn't noted for being very fast even for a scripting language?
Snark aside, you're matching a long list of regular expressions against a string, so you've got plenty of potential there for problems. In particular, you need to check whether you're doing a lot of recompilation of regular expressions (REs are processed by first compiling them to an internal form and then using a special interpreter to run against that form and do the match — I don't think the PCRE library compiles all the way to binary code, but I could be wrong) as that would really hurt a lot. It would also help if you order the REs so events that are more common are checked for first. Another thing you could try would be if you just extracted the first word of the action string and matched against just that for literals (“Log”, “server”, “Team”, “Molotov”, etc.) to try to clear out those cases first; ideally the log would have had a token at the start (well, after the timestamp) that indicated the event type so you could just extract that and use it to look up in a hash table what RE would be needed to match the line's content instead of trying each one in order, but we're not that lucky. It's also not clear to me that you should check for unprocessed lines; not doing that would let you get rid of a whole load of REs that have
ignore
bodies right now.You also seem to be parsing the timestamp and creating an object there which you never use. Date parsing is always ridiculously complicated (because people) and should be avoided if you don't need it.
-
Ok, I commented out the Time.new line and inlined the regex constants. Now my program finishes in 6 seconds instead of 120 seconds. I think that's mostly from the regex compiler not being as aggressive with caching as I thought it would be. Thanks for pointing that out!