But it's in the logs



  • I just discovered that our application uses log4j with rolling file appenders. So far so good. Then I discovered that we have numerous log files; one for each major piece of the application. Ok, sort of tolerable. Except that a single logical transaction spans numerous pieces of the application. On numerous servers. That are not time synchronized. And each component of the application uses a different date-time format in the log4j configs. So you can't just: grep thingYouNeedToResearch *.log | sort.

    First you need to get special permission to log in to production servers (45 minutes).

    Next you need to go to each machine and: grep thing *.log > tmp

    Next you need to scp (push) the tmp file to an intermediate server.

    Next you need to log into the dev server and scp (pull) the tmp file (from each of the prod servers) to your dev server.

    Next you need to log into the intermediate server and delete the tmp files.

    Next you need to go to your dev server, and run scripts (that don't exist until you write them because nobody on the team knows how to use sed) on each file in order to reformat the date-time fields.

    Next you need to combine all the tmp files.

    Next you need to sort the single combined tmp file.

    Now you can finally grep to see what the hell happened.

    And if you discover that you need to find something else that you didn't grep for, go back to step 1.

    Oh, and you can't just ftp the log files because the log4j appenders are set to allow the files to grow to 1GB, and we keep the last 10 of them, but we don't start a new one each day, and some of our transactions can span 10 days. That'd be 1GB * 10 days * 18 different logs = 180GB per server  for 16 servers.

    Fuck.
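
The reformat, combine and sort steps above could be sketched roughly like this in Python. This is only an illustration: the component names, the three timestamp formats and the per-server clock skew values are made up, since the real formats live in each log4j config and the real skew would have to be measured. It also assumes every line starts with a timestamp, so multi-line stack traces would need extra handling.

```python
from datetime import datetime, timedelta

# Hypothetical per-component timestamp formats (assumptions, not the real configs).
FORMATS = {
    "orders": "%Y-%m-%d %H:%M:%S,%f",   # log4j-style ISO8601 with milliseconds
    "billing": "%d/%m/%Y %H:%M:%S",
    "gateway": "%Y%m%dT%H%M%S",
}


def normalize(line, fmt, skew=timedelta(0)):
    """Split one log line into (corrected timestamp, message).

    The timestamp is assumed to occupy as many space-separated fields
    at the start of the line as the format string itself does.
    """
    fields = fmt.count(" ") + 1
    parts = line.rstrip("\n").split(" ", fields)
    stamp = datetime.strptime(" ".join(parts[:fields]), fmt) - skew
    message = parts[fields] if len(parts) > fields else ""
    return stamp, message


def merge(sources):
    """Merge lines from several components into one time-ordered list.

    sources: iterable of (component, lines, skew), where skew is how far
    that server's clock runs ahead of the reference clock.
    """
    entries = []
    for component, lines, skew in sources:
        fmt = FORMATS[component]
        for line in lines:
            stamp, message = normalize(line, fmt, skew)
            entries.append((stamp, component, message))
    entries.sort(key=lambda entry: entry[0])  # stable sort on timestamp only
    return [
        f"{stamp.isoformat(sep=' ', timespec='milliseconds')} [{component}] {message}"
        for stamp, component, message in entries
    ]
```

Each grep'd tmp file would be fed in as one `(component, lines, skew)` tuple. Correcting for skew only helps if someone actually knows how far off each server's clock is, which is the real problem here.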



  • @snoofle said:

    ...

    And if you discover that you need to find something else that you didn't grep for, go back to step 1.

    Oh, and you can't just ftp the log files because the log4j appenders are set to allow the files to grow to 1GB, and we keep the last 10 of them, but we don't start a new one each day, and some of our transactions can span 10 days. That'd be 1GB * 10 days * 18 different logs = 180GB per server  for 16 servers.

    Fuck.

    I would just take my sweet time grep'ing through the logs then. The longer that it takes, the more likely a higher-up will complain. When the higher-up does complain, you can basically just walk him through the process that you outlined to demonstrate how badly this needs to be changed. Hopefully then you can get the green light to fix it.



  • A simple rolling file appender is not an adequate way to do logging on a distributed system. A short-term solution would be to move to a SocketAppender (http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SocketAppender.html), although in the end you're probably going to want something like Chainsaw (http://logging.apache.org/chainsaw/index.html) to be able to manage all of these systems.

    Regardless, I suggest that you describe the awful scenarios that can happen with multiple, non-syncing servers that cooperate within a single transaction. Don't forget that a concrete doomsday scenario will do wonders to motivate management to action.

    Yes, I do realize that this does not solve your immediate problem.



  • @Sock Puppet 5 said:

    A simple rolling file appender is not an adequate way to do logging on a distributed system. A short-term solution would be to move to a SocketAppender (http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SocketAppender.html)

    I was about to suggest similar. And I'm sure you'd be able to just modify the log4j config files without 3 levels of manager approval, right? Right?



  • @snoofle said:

     [ . . . ] numerous servers. That are not time synchronized.

    Is that not perhaps the underlying WTF? 

    @snoofle said:

    Oh, and you can't just ftp the log files because the log4j appenders are set to allow the files to grow to 1GB, and we keep the last 10 of them, but we don't start a new one each day, and some of our transactions can span 10 days. That'd be 1GB * 10 days * 18 different logs = 180GB per server  for 16 servers.

    Are the log files actually circular buffers, or could you write an FTP script that uses REST commands to selectively receive just the tail of the logs?
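
A minimal sketch of the REST idea using Python's standard `ftplib`, assuming the logs are append-only (not circular buffers) and the server supports the SIZE and REST commands. The helper names `tail_offset` and `fetch_tail` are made up for illustration.

```python
from ftplib import FTP


def tail_offset(size, tail_bytes):
    """Byte offset to restart from so that only the last tail_bytes are sent."""
    return max(0, size - tail_bytes)


def fetch_tail(ftp: FTP, remote_path, tail_bytes):
    """Download only the last tail_bytes of remote_path over FTP."""
    ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
    size = ftp.size(remote_path)
    if size is None:
        raise RuntimeError("server did not report a size for " + remote_path)

    offset = tail_offset(size, tail_bytes)
    chunks = []
    # rest=offset makes ftplib send a REST command before RETR.
    ftp.retrbinary("RETR " + remote_path, chunks.append, rest=offset)
    data = b"".join(chunks)

    if offset > 0:
        # The offset usually lands mid-line, so drop everything up to the
        # first newline (this may discard one complete line).
        _, newline, remainder = data.partition(b"\n")
        if newline:
            data = remainder
    return data
```

Usage would look something like `fetch_tail(FTP("prod-host"), "app.log", 50 * 1024 * 1024)` after logging in, where the host, file name and tail size are placeholders.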

     



  •  Add a 17th server:

    the log server that gathers all logs :)



  • @tchize said:

     Add a 17th server:

    the log server that gathers all logs :)

    One log to rule them all,

    One log to find them,

    One log to bring them all,

    and in the server bind them



  • @Megaman22 said:

    @tchize said:

     Add a 17th server:

    the log server that gathers all logs :)

    One log to rule them all,

    One log to find them,

    One log to bring them all,

    and in the server bind them


    +1



  • @Sock Puppet 5 said:

    (...)although in the end you're probably going to want something like Chainsaw

    Yeah, that was my impression, too. Either that or a lot of explosives.



  • @Megaman22 said:

    @tchize said:

     Add a 17th server:

    the log server that gathers all logs :)

    One log to rule them all,

    One log to find them,

    One log to bring them all,

    and in the server bind them

     

    Rolls over your neighbour's dog
    Log Log Log Log
    It's better than bad
    It's log.

     


  • 🚽 Regular

    @dhromed said:

    @Megaman22 said:

    @tchize said:

     Add a 17th server:

    the log server that gathers all logs :)

    One log to rule them all,

    One log to find them,

    One log to bring them all,

    and in the server bind them

     

    Rolls over your neighbour's dog
    Log Log Log Log
    It's better than bad
    It's log.

     

     

    +2

     



  • @dhromed said:

    @Megaman22 said:
    @tchize said:

    Add a 17th server:

    the log server that gathers all logs :)

    One log to rule them all,

    One log to find them,

    One log to bring them all,

    and in the server bind them

    Rolls over your neighbour's dog
    Log Log Log Log
    It's better than bad
    It's log.

    Do you guys not know YouTube exists? Worst transcription ever.

    Besides, what's a "neighbour"?



  • @blakeyrat said:

    Besides, what's a "neighbour"?
    Same as a neibor, only corrcetly spelled.

     


  • ♿ (Parody)

    @Ilya Ehrenburg said:

    @blakeyrat said:
    Besides, what's a "neighbour"?

    Same as a neibor, only corrcetly spelled.

    What's a neibor? Is it anything like a neighbor?



  • @blakeyrat said:

    Worst transcription ever.

    Besides, what's a "neighbour"?

     

    I couldn't be arsed to look it up.

     



  • @boomzilla said:

    @Ilya Ehrenburg said:
    @blakeyrat said:
    Besides, what's a "neighbour"?
    Same as a neibor, only corrcetly spelled.

    What's a neibor? Is it anything like a neighbor?

     

    Indeed. It's a misspelled neighbor. Intentionally, in this case. I hoped everybody could correct it themselves and maybe come to the conclusion that there's more than one way to skin a cat to correctly spell neighbour.



  • @blakeyrat said:

    Besides, what's a "neighbour"?

    A bour that neigh



  • @Ilya Ehrenburg said:

    Intentionally, in this case.
     

    It's just how I spell it.



  •  Neighbour. Yes, so do I.



  • @blakeyrat said:

    Besides, what's a "neighbour"?

    Jesus' answer to this question is legendary.

     



  • @Sock Puppet 5 said:

    Jesus' answer to this question is legendary.
    Not just that answer, but His whole pattern of responding to questions was to not answer the question that was asked, but the question that should have been asked.

    This would get Him downvoted on StackOverflow.



  • @Sock Puppet 5 said:

    @blakeyrat said:

    Besides, what's a "neighbour"?

    Jesus' answer to this question is legendary.

    At first I was confused because I did not know to which Jesus you were referring, then I saw the tags.... I did not know that in Spanish it was Jesus Caminacielo instead (just like Bruce Wayne is Bruno Diaz), however it seems your time code for the movie (I assumed you mashed them up together) is wrong, there is nothing relevant there.



  • I know you're not looking for solutions to someone's conflated logging configuration, but splunk (www.splunk.com) can be amazing.



  • @serguey123 said:

    @Sock Puppet 5 said:

    @blakeyrat said:

    Besides, what's a "neighbour"?

    Jesus' answer to this question is legendary.

    At first I was confused because I did not know to which Jesus you were referring, then I saw the tags.... I did not know that in Spanish it was Jesus Caminacielo instead (just like Bruce Wayne is Bruno Diaz), however it seems your time code for the movie (I assumed you mashed them up together) is wrong, there is nothing relevant there.

     

    What in God's name are you talking about?



  • @Justice said:

    @serguey123 said:

    @Sock Puppet 5 said:

    @blakeyrat said:

    Besides, what's a "neighbour"?

    Jesus' answer to this question is legendary.

    At first I was confused because I did not know to which Jesus you were referring, then I saw the tags.... I did not know that in Spanish it was Jesus Caminacielo instead (just like Bruce Wayne is Bruno Diaz), however it seems your time code for the movie (I assumed you mashed them up together) is wrong, there is nothing relevant there.

     

    What in God's name are you talking about?

     

    Hint 1: Tag on Sock Puppet 5's post

    Hint 2: http://translate.google.com/#es|en|camina%20cielo

