Beyond CodeSOD



  • Software does not exist for itself in many cases. We work in business. Or in government agencies, which might be another level of WTF.
    So, a lot of WTF may happen around the software, not directly in the software.

    Customer: Are your devices XYZ certified?
    Sales: Not yet. We are working with Certification Agency to get them certified.
    Customer: Well, then let's add the Certificate to your deliverables.
    Sales: OK.

    Now, the system is up and running.
    Billing: Dear Customer, why don't you pay our invoice?
    Customer: We are still waiting for the Certificate.

    And he's right.
    Never will the devices he received be certified.
    We are working on getting the next generation of devices certified (i.e. the successor of the devices he got). And the next generation is still in development. Not even a prototype is available yet...

    :phb: Where's our €€€€?

    Well, 💩


  • Considered Harmful

    @BernieTheBernie got refs lined up?


  • kills Dumbledore

    This sounds positively utopian. Sales people promising things that are actually on the roadmap, rather than only in their own minds? Wonderful.



  • A different kind of "Beyond CodeSOD":

    Our hotline guy Tony often writes long emails with many screenshots to some developer, typically to Bernie (who else?), showing the faulty situation on the customer's system.

    Virtually never included are: log files.
    Because Bernie always asks for log files in order to find out what went wrong under which circumstances.

    Tony: "C'mon Bernie, you are no baby anymore. You can log into the customer's system and get the files yourself."
    And guess what: Fritz, the head of development, quality, service, then told Bernie to just do the needful and get the information he needs himself.

    Yeah great!
    For several long years Bernie unsuccessfully tried to teach his cow-orkers how to do things correctly.
    They just refused.
    And succeeded.

    The absolute value (i.e. Math.Abs()) of Bernie's morale keeps moving towards previously unknown high levels.

    Fortunately, there's the Daily What The Fuck where Bernie can slack.



  • @BernieTheBernie I'll bet my phone team is worse than yours.

    A few days ago I attended a meeting where the manager of the call center was presenting ideas of how we could improve the "reset password" feature of our site. During the meeting, I found out we are here because the most common cause of a call is that someone needs help resetting their password.

    Within five minutes, I discovered that the "text me a password reset code" part of our system stopped working in November of 2020.

    So, we are in a meeting where the call center isn't aware that no one, and I mean no one, is getting a code to continue with the reset process. They are actually out there doing usability research to find out how to make a "better process" while being oblivious to the fact that the current problem isn't a bad process - it's a broken process.

    BTW, we never get log files either. We also get problems reported days or weeks after they happen, with no time or even date of occurrence to go look up in a log file.



  • @Jaime said in Beyond CodeSOD:

    I'll bet my phone team is worse than yours.

    I'll bet there's someone here on TDWTF who has a worse phone team than yours.
    And after he told us here, I'll bet ...

    How many levels are possible?

    Infinite.
    There are absofuckinglutely no limits to incompetence.
    :trwtf:


  • Considered Harmful

    @BernieTheBernie said in Beyond CodeSOD:

    A different kind of "Beyond CodeSOD":

    Our hotline guy Tony often writes long emails with many screenshots to some developer, typically to Bernie (who else?), showing the faulty situation on the customer's system.

    Virtually never included are: log files.
    Because Bernie always asks for log files in order to find out what went wrong under which circumstances.

    Tony: "C'mon Bernie, you are no baby anymore. You can log into the customer's system and get the files yourself."
    And guess what: Fritz, the head of development, quality, service, then told Bernie to just do the needful and get the information he needs himself.

    Yeah great!
    For several long years Bernie unsuccessfully tried to teach his cow-orkers how to do things correctly.
    They just refused.
    And succeeded.

    The absolute value (i.e. Math.Abs()) of Bernie's morale keeps moving towards previously unknown high levels.

    Fortunately, there's the Daily What The Fuck where Bernie can slack.

    Tony needs a log file to keep around, so he has one when Bernie asks.



  • :phb: received an email which said that some business partner wants to share his location with him. :phb: was clever enough to ask the partner if that was intended; the partner responded that he never sent that, and it could be spam.

    So, :phb: decided to warn us all that there might be a spam mail to us, and not to click the links in it. :phb: included a screenshot (so we cannot click the link): (I anonymized it a little)
    GoogleLocationSharing.png

    Tony decided to investigate. He opened Google with his private account, and shared his location with his business account, and received an email which looked exactly like that on his business email address. He clicked the link, and saw that he is at the office.

    So, Tony decided to write an all-clear email to us: look, I did it myself, and it really looks like that. Just don't know why the business partner wants to share his location with :phb: .
    :surprised-pikachu:

    By the way, Tony has network admin privileges in our company. Are you interested in sharing your location with him?


  • Discourse touched me in a no-no place

    @BernieTheBernie said in Beyond CodeSOD:

    Tony has network admin privileges in our company.

    :surprised-pikachu:



  • What happens when the "Manager Development, Quality Assurance, Training" uses source control?
    Old-Tmp.png
    Yeah, you cannot be really sure: keep an old folder, several .old files and several .tmp files. Just in case you'll need one of them later again.





  • @Steve_The_Cynic Needs updating for the glitchy Git.



  • Last Friday, our newest version was released.
    The test quality was ... as usual (though Fritz, head of QA, told me, it was better than ever before).

    Now a terrible issue was discovered:
    Our application does many things in many different background threads. They can send "sign of life" messages, which are then checked for their age, and if they are too old, a repair may be started. Previously, there was only one setting for the maximum age of all sign of life messages. That did not make a lot of sense anymore after a component designed by Kevin considered itself happily alive while sending them some 3 minutes apart. For Kevin's component that may be acceptable, but others have to take care of fire, and there a couple of seconds should be the maximum. So I changed that to a setting per component.
    Of course the thread checking those issues also gets checked, and its maximum age was hard-coded to two times its interval (typically 3 seconds) - what a bad decision by ... myself: confiteor peccatus sum.
    That thread also has to ask other (external) components for their issues. And one of them was not reachable (switched off), and that takes some 20 s to discover. Far beyond those 2*3s maximum age...
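
    A minimal sketch of the idea in Python (not the actual application code; the class, function and component names are hypothetical, and the safety margin is an assumption): each component carries its own maximum age, and the checker's own limit is derived from its interval plus the worst-case time one check round may block on an unreachable external component, rather than from a hard-coded 2 * interval.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Last sign of life of one component (all names here are hypothetical)."""
    component: str
    last_seen: float   # timestamp in seconds
    max_age: float     # per-component limit in seconds


def stale_components(heartbeats, now):
    """Return the names of components whose last sign of life is too old."""
    return [hb.component for hb in heartbeats if now - hb.last_seen > hb.max_age]


def checker_max_age(interval, slowest_external_timeout, margin=2.0):
    """Maximum age for the checking thread itself.

    Budgets one interval plus the worst-case time a single check round may
    block on an unreachable external component, times a safety margin.
    """
    return margin * (interval + slowest_external_timeout)


if __name__ == "__main__":
    now = 1000.0
    beats = [
        Heartbeat("fire-detection", last_seen=995.0, max_age=3.0),
        Heartbeat("kevins-component", last_seen=850.0, max_age=200.0),
    ]
    print(stale_components(beats, now))   # only the fire component is stale
    print(checker_max_age(3.0, 20.0))     # well above the 20 s external timeout
```

    With a 3 s interval and a 20 s timeout for the switched-off component, the checker's limit ends up comfortably above 20 s, so a slow round no longer looks like a dead checker.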

    During the last several months, nobody discovered that. Of course, all tests just tested simple scenarios, why should failure scenarios be included? :wtf_owl:

    OK. Some patch is required.

    Developed it in the current version, ran the unit tests, and voilà: two more failures than usual. Which tests did I just destroy? Failed to see it.
    Checked out the "tagged" version of the release, opened it in VS, tried to run the unit tests: it does not even compile.

    Cluster fuck.



  • One of our Windows Services has a memory leak. Hence, Johnny (the developer of that service) and Fritz (head of Development & Quality) decided to implement a hack: a script will stop and then start it again once per day. That hack was used with the previous version several years ago, too.

    That seemed to work until...

    Sometime in September, the service at one customer remained in status "Stopping" and was still so 3 months later. And the security critical (!) service was not doing what was expected of it during all that time.



  • @BernieTheBernie said in Beyond CodeSOD:

    One of our Windows Services has a memory leak. Hence, Johnny (the developer of that service) and Fritz (head of Development & Quality) decided to implement a hack: a script will stop and then start it again once per day. That hack was used with the previous version several years ago, too.

    That's the proper :kneeling_warthog: solution. It is much simpler than fixing the app and works around a whole bunch of similar issues at once, with relatively little effort.

    That seemed to work until...

    Sometime in September, the service at one customer remained in status "Stopping" and was still so 3 months later. And the security critical (!) service was not doing what was expected of it during all that time.

    I see two problems here:

    • The stop service should have a timeout. If the service takes more than x (20s, 1min, whatever) to shut down, the host process should be unceremoniously eliminated. Systemd does it—I've seen it plenty of times on our embedded devices when debugging some problem.
    • If the service is security critical, it should have a watchdog checking that it sends a ping every y (40s, 2min, whatever) and either forcefully restart it, raise an alarm, or both, if it does not. Of course it's the responsibility of the programmer to put that notification in the main processing loop to make sure it will stop being sent if some processing locks up. Systemd has that implemented too and we use it on our devices as well.
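
    A rough sketch of the first point in Python, using a child process as a stand-in for the service (the function name and the timeout value are hypothetical; this is not how the Windows service control manager or systemd is implemented): ask politely, wait up to a timeout, then kill.

```python
import subprocess
import sys


def stop_with_timeout(proc, timeout):
    """Ask a child process to stop; kill it if it does not exit in time.

    Returns "graceful" if the process exited within `timeout` seconds after
    the polite request, "killed" if it had to be forced.
    """
    # Polite request: SIGTERM on POSIX. On Windows, Popen.terminate() already
    # calls TerminateProcess, so the distinction only matters on POSIX.
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()   # unceremonious elimination (SIGKILL on POSIX)
        proc.wait()   # reap the process so it does not linger
        return "killed"


if __name__ == "__main__":
    # Stand-in for the service: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_timeout(child, timeout=5.0))
```

    The daily restart script would then call something like this before starting the service again, so a hung shutdown cannot leave it in "Stopping" indefinitely.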


  • @Bulb Well, there is some other service querying some status from that failed service. I haven't seen the log files of that service, because ... my cow-orkers think I am a magician.
    But more likely, that communication item still worked: thanks to great configuration, it is not explicitly stopped at service shutdown (but normally that does not prevent the shutdown, it just may take a little longer).



  • @Bulb said in Beyond CodeSOD:

    The stop service should have a timeout. If the service takes more than x (20s, 1min, whatever) to shut down, the host process should be unceremoniously eliminated. Systemd does it—I've seen it plenty of times on our embedded devices when debugging some problem.

    Oddly, I just saw on StackOverflow:

    And now comes the question: how was it possible for the service to stay in "Stopping" for 3 months?
    The wonders of Windows...


  • Java Dev

    @BernieTheBernie said in Beyond CodeSOD:

    One of our Windows Services has a memory leak.

    I love valgrind for leaks. Though that requires keeping your code clean - a couple of binaries I cannot reasonably check because the Oracle Instant Client spams errors like crazy.

    Is there a Windows equivalent of valgrind?



  • @PleegWat said in Beyond CodeSOD:

    Is there a Windows equivalent of valgrind?

    There is a similar tool with somewhat fewer features called Dr. Memory, that does work on Windooze and does track memory leaks.



  • @PleegWat said in Beyond CodeSOD:

    @BernieTheBernie said in Beyond CodeSOD:

    One of our Windows Services has a memory leak.

    I love valgrind for leaks. Though that requires keeping your code clean - a couple of binaries I cannot reasonably check because the Oracle Instant Client spams errors like crazy.

    Is there a Windows equivalent of valgrind?

    Anyway, I don't like to poke my nose into Johnny's ugly code... (though the leak could actually be in a third-party library, e.g. for CAN bus communication).



  • @BernieTheBernie said in Beyond CodeSOD:

    @Bulb Well, there is some other service querying some status from that failed service. I haven't seen the log files of that service, because ... my cow-orkers think I am a magician.

    Logs don't count. If a critical service isn't working for some amount of time, the operations must start getting mails, then urgent mails, then SMSs. That's called monitoring.

    But more likely, that communication item still worked: thanks to great configuration, it is not explicitly stopped at service shutdown (but normally that does not prevent the shutdown, it just may take a little longer).

    Well, if it was still working, and didn't eat all memory in those three months, then it wasn't actually a problem.

    Otherwise the thing that really needs to be fixed is the monitoring. Someone needs to know they should go and power-cycle the machine. Things can always go south far enough that it will be required.
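
    As a rough sketch of that escalation in Python (the function name, level names and thresholds are all made up for illustration; real values depend on the service):

```python
def escalation_level(seconds_down, mail_after=60, urgent_after=600, sms_after=1800):
    """Map how long a critical service has been down to a notification level.

    The thresholds are hypothetical defaults, not taken from any real setup.
    """
    if seconds_down >= sms_after:
        return "sms"
    if seconds_down >= urgent_after:
        return "urgent-mail"
    if seconds_down >= mail_after:
        return "mail"
    return "none"


if __name__ == "__main__":
    # A service stuck in "Stopping" for three months is far past every threshold.
    three_months = 90 * 24 * 3600
    print(escalation_level(three_months))
```

    The point is only that the level is a function of how long the service has been down, so somebody gets bothered long before three months have passed.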


  • Java Dev

    @BernieTheBernie said in Beyond CodeSOD:

    @PleegWat said in Beyond CodeSOD:

    @BernieTheBernie said in Beyond CodeSOD:

    One of our Windows Services has a memory leak.

    I love valgrind for leaks. Though that requires keeping your code clean - a couple of binaries I cannot reasonably check because the Oracle Instant Client spams errors like crazy.

    Is there a Windows equivalent of valgrind?

    Anyway, I don't like to poke my nose into Johnny's ugly code... (though the leak could actually be in a third-party library, e.g. for CAN bus communication).

    Knowing where the memory which was eventually leaked was originally allocated is only half the problem.


  • Discourse touched me in a no-no place

    @PleegWat said in Beyond CodeSOD:

    Knowing where the memory which was eventually leaked was originally allocated is only half the problem.

    Yes. You also need to log the places where it was not deallocated. :tro-pop:



  • @BernieTheBernie said in Beyond CodeSOD:

    the service at one customer remained in status "Stopping" and was still so 3 months later. And the security critical (!) service was not doing what was expected of it during all that time.

    If it takes 3 months to notice that the service doesn't work, it's definitely not critical. 🚎

