Why is my program slow?



  • user@tuesday:~/csgo$ ruby --version
    ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux]
    user@tuesday:~/csgo$ /usr/bin/time ruby stats.rb > /dev/null
    99.20user 6.85system 1:46.18elapsed 99%CPU (0avgtext+0avgdata 12616maxresident)k
    1752inputs+0outputs (5major+3237minor)pagefaults 0swaps

    That's almost two minutes to process 28MB of logs. And the size of my log collection is only going to get bigger.

    Here's the code:

    require 'json'
    
    # an unquoted string
    string_inner = /((?:[^\\"]|\\[\\"])*)/
    
    # a quoted string
    string = /"#{string_inner}"/
    
    # a name-id-steam-team combo
    # served with a side of fries
    name = /"#{string_inner}<#{string_inner}><#{string_inner}><#{string_inner}>"/
    
    # a name-id-steam combo
    # hold the team
    name_1 = /"#{string_inner}<#{string_inner}><#{string_inner}>"/
    
    # an integer coordinate
    coord = /\[(-?[0-9]+) (-?[0-9]+) (-?[0-9]+)\]/
    
    stats = {}
    
    map = ""
    
    Dir['logs/*.log'].sort.each do |fn|
            open fn do |f|
                    cvars = false
                    bomb_planted = false
                    hostage_reached = false
    
                    f.each_line do |l|
    if l =~ /^L ([0-9]{2})\/([0-9]{2})\/([0-9]{4,}) - ([0-9]{2}):([0-9]{2}):([0-9]{2}): /
                                    time = Time.new $3.to_i, $1.to_i, $2.to_i, $4.to_i, $5.to_i, $6.to_i
                                    l = l[$&.size..-1]
                            else
                                    p l
                                    raise "malformed log line"
                            end
    
    if cvars
                                    next unless l =~ /^server cvars end$/
                            end
    
    case l
                            when /^Log file started \(file #{string}\) \(game #{string}\) \(version #{string}\)$/
                                    # ignore
                            when /^Log file closed$/
                                    # ignore
                            when /^Loading map #{string}$/
                                    map = $1.split('/').last
                            when /^Started map #{string} \(CRC #{string}\)$/
                                    map = $1.split('/').last
                            when /^server cvars start$/
                                    cvars = true
                            when /^server cvars end$/
                                    cvars = false
                            when /^server_cvar: #{string} #{string}$/
                                    # ignore
                            when /^server_message: #{string}$/
                                    # ignore
                            when /^Your server is out of date and will be shutdown during hibernation or changelevel, whichever comes first\.$/
                                    # ignore
                            when /^#{name} entered the game$/
                                    # ignore
                            when /^#{name} connected, address #{string}$/
                                    # ignore
                            when /#{name} disconnected \(reason #{string}\)$/
                                    # ignore
                            when /^#{name_1} switched from team <#{string_inner}> to <#{string_inner}>$/
                                    # ignore
                            when /^#{name} changed name to #{string}$/
                                    # ignore
                            when /^#{name} purchased #{string}$/
                                    # ignore
                            when /^World triggered #{string}$/
    case event = $1
                                    when "Game_Commencing"
                                            # ignore
                                    when "Round_Start"
                                            bomb_planted = false
                                            hostage_reached = false
                                    when "Round_End"
                                            # ignore
                                    else
                                            p l
                                            raise "unprocessed line"
                                    end
                            when /^World triggered "killlocation" \(attacker_position #{string}\) \(victim_position #{string}\)$/
                                    # ignore
                            when /^#{name} triggered "weaponstats2?" .*$/
                                    # ignore
                            when /^\[META\] .*$/
                                    # ignore
                            when /^#{name} #{coord} attacked #{name} #{coord} with #{string} \(damage #{string}\) \(damage_armor #{string}\) \(health #{string}\) \(armor #{string}\) \(hitgroup #{string}\)$/
                                    # ignore
                            when /^#{name} #{coord} committed suicide with #{string}$/
                                    # ignore
                            when /^#{name} #{coord} killed #{name} #{coord} with #{string}( \(headshot\))?$/
                                    # ignore
                            when /^#{name} assisted killing #{name}$/
                                    # ignore
                            when /^#{name} triggered #{string}$/
    case event = $5
                                    when "headshot"
                                            # ignore
                                    when "Got_The_Bomb"
                                            # ignore
                                    when "Dropped_The_Bomb"
                                            # ignore
                                    when "Planted_The_Bomb"
                                            bomb_planted = true
                                    when "Begin_Bomb_Defuse_Without_Kit"
                                            # ignore
                                    when "Defused_The_Bomb"
                                            # ignore
                                    when "Touched_A_Hostage"
                                            hostage_reached = true
                                    when "Rescued_A_Hostage"
                                            # ignore
                                    when "round_mvp"
                                            # ignore
                                    else
                                            p l
                                            raise "unprocessed line"
                                    end
                            when /^Team #{string} triggered #{string} \(CT #{string}\) \(T #{string}\)$/
                                    stats[map] = {
                                            ct_win: 0,
                                            t_win: 0,
                                            hostage_reached: 0,
                                            hostage_rescued: 0,
                                            bomb_planted: 0,
                                            bomb_detonated: 0,
                                            bomb_defused: 0,
                                            time_ran_out: 0
                                    } if stats[map].nil?
    
    if bomb_planted
                                            stats[map][:bomb_planted] += 1
                                            bomb_planted = false
                                    end
    if hostage_reached
                                            stats[map][:hostage_reached] += 1
                                            hostage_reached = false
                                    end
    
    case event = $2
                                    when "SFUI_Notice_CTs_Win"
                                            stats[map][:ct_win] += 1
                                    when "SFUI_Notice_Terrorists_Win"
                                            stats[map][:t_win] += 1
                                    when "SFUI_Notice_Target_Saved"
                                            stats[map][:time_ran_out] += 1
                                    when "SFUI_Notice_All_Hostages_Rescued"
                                            stats[map][:hostage_rescued] += 1
                                    when "SFUI_Notice_Hostages_Not_Rescued"
                                            stats[map][:time_ran_out] += 1
                                    when "SFUI_Notice_Bomb_Defused"
                                            stats[map][:bomb_defused] += 1
                                    when "SFUI_Notice_Target_Bombed"
                                            stats[map][:bomb_detonated] += 1
                                    else
                                            p l
                                            raise "unprocessed line"
                                    end
                            when /^Team #{string} scored #{string} with #{string} players$/
                                    # ignore
                            when /^Molotov projectile spawned at .*$/
                                    # ignore
                            when /^#{name} threw (molotov|hegrenade|decoy|flashbang|smokegrenade) #{coord}$/
                                    # ignore
                            else
                                    p l
                                    raise "unprocessed line"
                            end
                    end
            end
    end
    
    puts JSON.pretty_generate stats
    

    This isn't a disk IO issue. cat log/* finishes in less than 0.01 seconds.


  • Discourse touched me in a no-no place

    Because you're using Ruby, which isn't noted for being very fast even for a scripting language?

    Snark aside, you're matching a long list of regular expressions against a string, so you've got plenty of potential there for problems. In particular, you need to check whether you're doing a lot of recompilation of regular expressions (REs are processed by first compiling them to an internal form and then using a special interpreter to run against that form and do the match — I don't think the PCRE library compiles all the way to binary code, but I could be wrong) as that would really hurt a lot. It would also help if you order the REs so events that are more common are checked for first. Another thing you could try would be if you just extracted the first word of the action string and matched against just that for literals (“Log”, “server”, “Team”, “Molotov”, etc.) to try to clear out those cases first; ideally the log would have had a token at the start (well, after the timestamp) that indicated the event type so you could just extract that and use it to look up in a hash table what RE would be needed to match the line's content instead of trying each one in order, but we're not that lucky. It's also not clear to me that you should check for unprocessed lines; not doing that would let you get rid of a whole load of REs that have ignore bodies right now.

    You also seem to be parsing the timestamp and creating an object there which you never use. Date parsing is always ridiculously complicated (because people) and should be avoided if you don't need it.



  • Ok, I commented out the Time.new line and inlined the regex constants. Now my program finishes in 6 seconds instead of 120 seconds. I think that's mostly from the regex compiler not being as aggressive with caching as I thought it would be. Thanks for pointing that out!


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.