In other news today...



  • @remi or just the fact that GMT!=UTC?
    Sunday, 31 March 2024, 2:30 AM GMT isn't a valid date time combination.



  • @remi said in In other news today...:
    going over UK airspace […] NAN to NUL at 12:00

    An NFFN--PANU flight wouldn't get anywhere near UK airspace, being completely over the vast expanses of the Pacific Ocean. </:faxbarrierjoker:>

    @BernieTheBernie said in In other news today...:

    @remi or just the fact that GMT!=UTC?
    Sunday, 31 March 2024, 2:30 AM GMT isn't a valid date time combination.

    GMT does equal either UTC (if you take it as the current standard time-zone of UK) or UT1 (if you take it in the original meaning of being a mean solar time as measured in Greenwich).

    Sunday, 31 March 2024, 2:30 AM GMT is a valid combination. It won't ever appear on a clock using the Europe/London time zone, not because the time never occurs in GMT, but because Europe/London will be on BST (British Summer Time) at that point.

    By the way, you should have used 1:30, because Europe/London goes from 0:59:59 GMT to 2:00:00 BST, so the transition occurs at the same moment as the transition from 1:59:59 CET to 3:00:00 CEST.

    Also :kneeling_warthog: checking whether any of the countries in western Africa that use the UTC+0 time-zone use the GMT abbreviation, because if they do, that time will show on their clocks—because countries near the equator do not observe summer time.
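
    For anyone who wants to poke at this, here is a minimal sketch using Python's standard `zoneinfo` module. It assumes the system has IANA time zone data installed; the helper name `exists_on_wall_clock` is made up for illustration. It checks whether a wall-clock reading survives a round trip through UTC, which is one common way to detect the skipped hour.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def exists_on_wall_clock(naive_local: datetime, tz: ZoneInfo) -> bool:
    """Return True if this wall-clock reading survives a round trip via UTC.

    A reading that falls in the skipped hour of a spring-forward transition
    comes back shifted, so the comparison fails.
    """
    aware = naive_local.replace(tzinfo=tz)
    round_trip = aware.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive_local


# 02:30 GMT (taken here as UTC) is an ordinary instant; a London clock
# simply shows it with the summer-time offset applied.
instant = datetime(2024, 3, 31, 2, 30, tzinfo=timezone.utc)
london_view = instant.astimezone(LONDON)
print(london_view.strftime("%H:%M %Z"))

# The reading that never shows up on a London clock that night is 01:30.
print(exists_on_wall_clock(datetime(2024, 3, 31, 1, 30), LONDON))
print(exists_on_wall_clock(datetime(2024, 3, 31, 2, 30), LONDON))
```

    The round-trip trick relies on how `zoneinfo` resolves nonexistent times (it applies the pre-transition offset when `fold=0`), so treat it as a sketch rather than a general validator.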


  • Java Dev

    @Bulb said in In other news today...:

    Also checking whether any of the countries in western Africa that use the UTC+0 time-zone use the GMT abbreviation, because if they do, that time will show on their clocks—because countries near the equator do not observe summer time.

    I know Portugal is in UTC+0, but they call it WET. I do not know whether they observe summer time and :kneeling_warthog: to look it up.



  • @Bulb said in In other news today...:

    An NFFN--PANU flight wouldn't get anywhere near UK airspace, being completely over the vast expanses of the Pacific Ocean. </:faxbarrierjoker:>

    I was already quite surprised that there are both a NAN and a NUL airport, and even more that at least one of them makes some tiny amount of sense rather than being just a strip of grass barely large enough to get a code (Fiji isn't a big travel destination but Nadi is apparently the main airport there, so not a totally unlikely destination). Since the outage was in the UK I had to shoehorn it in some way and... well, you've gotta do what you've gotta do. :mlp_shrug:



  • @remi said in In other news today...:

    a flight plan submitted by a French airline could be behind the problem

    Aha: there were some ÀÁÂ or like that in a string? :trollface:
    Or were those people exceptionally bad and used an Ü?


  • Java Dev

    @BernieTheBernie said in In other news today...:

    @remi said in In other news today...:

    a flight plan submitted by a French airline could be behind the problem

    Aha: there were some ÀÁÂ or like that in a string? :trollface:
    Or were those people exceptionally bad and used an Ü?

    Nah, I bet they did not put in the timezones correctly, leading to a flight which would arrive before it departed.
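
    A rough sketch of how that kind of mistake looks in code. The schedule is entirely hypothetical (an evening departure from Fiji landing in Honolulu the same calendar morning), and the time zone names are the only real identifiers. Comparing bare wall-clock times makes the flight look like it lands before it leaves; attaching the zones gives a sane duration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical schedule, wall-clock times at each airport.
dep_local = datetime(2024, 6, 10, 21, 0)   # departure, local time in Fiji
arr_local = datetime(2024, 6, 10, 7, 30)   # arrival, local time in Honolulu

# Naive comparison: ignores the zones, so arrival appears to precede departure.
naive_looks_backwards = arr_local < dep_local

# Zone-aware comparison: Python converts both sides to UTC before subtracting.
dep = dep_local.replace(tzinfo=ZoneInfo("Pacific/Fiji"))
arr = arr_local.replace(tzinfo=ZoneInfo("Pacific/Honolulu"))
flight_time = arr - dep

print(naive_looks_backwards)
print(flight_time)
```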



  • @BernieTheBernie said in In other news today...:

    Or were those people exceptionally bad and used an Ü?

    Or a ÿ, for maximum :trollface:.

    Yes, it is a real thing that does occur in French, in a couple of place names. But none of those has an airport, so no, that's not possible for real.


  • Java Dev

    @remi Not to be confused with ij



  • @PleegWat said in In other news today...:

    Nah, I bet they did not put in the timezones correctly, leading to a flight which would arrive before it departed.

    If that's the case, I don't know which is more astonishing: that such a basic error could cause the whole system to keel over, or that it did not happen earlier.

    Maybe if that caused the flight to arrive the day before it departed? But even then...


  • Discourse touched me in a no-no place

    @remi said in In other news today...:

    I don't know which is more astonishing: that such a basic error could cause the whole system to keel over, or that it did not happen earlier.

    The problem is not that the data is bad, but that the system refuses to do anything until that bad data item is fixed. Presumably the problem's spread because of an index on the timestamp with that timestamp not being properly validated (or the validation rules having changed a little bit).

    Which is why I think timestamps should be stored as either Unix seconds or Julian day numbers (depending on effective resolution desired).
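
    To make the two representations concrete, a small sketch follows. The function names are mine, and it assumes timestamps arrive as timezone-aware `datetime` objects; naive ones are rejected instead of guessed at. The constant 2440587.5 is the Julian date of the Unix epoch.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
UNIX_EPOCH_JULIAN_DATE = 2440587.5  # Julian date of 1970-01-01T00:00:00Z


def to_unix_seconds(moment: datetime) -> int:
    """Whole seconds since the Unix epoch; refuses naive datetimes."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the UTC offset is unknown")
    return int(moment.timestamp())


def to_julian_date(moment: datetime) -> float:
    """Fractional Julian date (days, with the day boundary at noon UTC)."""
    return to_unix_seconds(moment) / SECONDS_PER_DAY + UNIX_EPOCH_JULIAN_DATE


def to_julian_day_number(moment: datetime) -> int:
    """Integer Julian day number conventionally assigned to the UTC date."""
    return int(to_julian_date(moment) + 0.5)


stamp = datetime(2024, 3, 31, 2, 30, tzinfo=timezone.utc)
print(to_unix_seconds(stamp), to_julian_day_number(stamp))
```

    Either form is a plain number with no calendar or zone rules attached, which is the appeal: there is no such thing as an "invalid" value in the way there is an invalid date string.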



  • @BernieTheBernie said in In other news today...:

    @remi said in In other news today...:

    a flight plan submitted by a French airline could be behind the problem

    Aha: there were some ÀÁÂ or like that in a string? :trollface:

    Flight plans are a jumbled mess of abbreviations—four letter airport codes, five letter waypoint codes, one to three letter radio beacon codes, one or two letters and three digits route codes—that all use just basic telegraph alphabet. Something like NFFN - NN A579 CARRP - CDB Q59 BET - PANU.

    On the other hand, that leaves a lot of room for being invalid: a code that does not reference anything, a code for something too far away, a code that matches two things (because the codes are not unique), etc.

    Now the dispatcher submitting it will have another application to create it, but if the databases get out of sync for some wrong reason (the codes do change over time), all sorts of funny things may happen.
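
    As an illustration only, here is a toy classifier for route strings like the one above. It guesses each token's kind purely from its shape, which is an assumption on my part and certainly not how a real flight data system resolves codes. The digit count for route codes is loosened to one to three because the sample contains `Q59`. The `known` lookup table is hypothetical and stands in for a navigation database that may be out of date.

```python
import re

# Token shapes roughly as described above; order matters because a
# four-letter token must be tried as an airport before the beacon pattern.
TOKEN_SHAPES = [
    ("route", re.compile(r"[A-Z]{1,2}\d{1,3}")),
    ("waypoint", re.compile(r"[A-Z]{5}")),
    ("airport", re.compile(r"[A-Z]{4}")),
    ("beacon", re.compile(r"[A-Z]{1,3}")),
]


def classify(token: str) -> str:
    """Guess the kind of a route token from its shape alone."""
    for kind, pattern in TOKEN_SHAPES:
        if pattern.fullmatch(token):
            return kind
    return "invalid"


def parse_route(text: str) -> list:
    """Split a route string on whitespace, dropping the '-' separators."""
    return [(tok, classify(tok)) for tok in text.split() if tok != "-"]


def unresolved(parsed: list, known: dict) -> list:
    """Tokens that are malformed or missing from the (hypothetical) database."""
    return [tok for tok, kind in parsed
            if kind == "invalid" or tok not in known.get(kind, set())]


SAMPLE = "NFFN - NN A579 CARRP - CDB Q59 BET - PANU"
parsed = parse_route(SAMPLE)

# A database that has fallen out of sync and no longer lists CARRP.
stale_db = {
    "airport": {"NFFN", "PANU"},
    "beacon": {"NN", "CDB", "BET"},
    "route": {"A579", "Q59"},
    "waypoint": set(),
}
print(unresolved(parsed, stale_db))
```

    Note that a well-formed token can still be wrong: the shape check passes `CARRP`, and only the lookup notices that nothing by that name exists.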



  • @dkf Sure, but my point is that such an error is so basic that it seems almost impossible that it did not happen before. So many airlines and so many flights and so many timezones (and changes in timezones with DST etc.), and for years and years that kind of error would never have happened?

    Though maybe the error did happen from time to time, and controllers just manually fixed it and because it never happened at a moment of high traffic it only caused minor delays and nobody cared until now. :thonking:

    Long-standing bug, recurs regularly, users know a workaround so don't bother, maybe a ticket in bug tracker exists but for some reason was never fixed, until one day the stars align and instead of a minor annoyance it becomes a huge shit storm. Yeah, that sounds quite likely, actually. :dumpster-fire: :this_is_fine:


  • 🚽 Regular

    @PleegWat said in In other news today...:

    I know Portugal is in UTC+0, but they call it WET. I do not know whether they observe summer time and :kneeling_warthog: to look it up.

    We do. We're currently on it. So it's currently UTC+1. (so-called WEST)


  • Java Dev

    @Zecc Right, of course, should have known. You're in the EU. Summer time is mandatory per EU directive. They tried to get rid of it a few years ago but it got stranded in bureaucracy.


  • Discourse touched me in a no-no place



  • @remi said in In other news today...:

    Though maybe the error did happen from time to time

    only with the latest system update could it cause a crash.
    As users, aren't we used to that?



  • @remi said in In other news today...:

    Long-standing bug

    Or not... Could be a side effect regression of something else that was just updated. We all know that never happens!



  • @dcon oh, come on, such a critical system is obviously never updated without thorough testing, that would just...

    :spittake: :rofl:

    Sorry, I couldn't keep a straight face saying that.


  • Discourse touched me in a no-no place

    @remi said in In other news today...:

    @dkf Sure, but my point is that such an error is so basic that it seems almost impossible that it did not happen before. So many airlines and so many flights and so many timezones (and changes in timezones with DST etc.), and for years and years that kind of error would never have happened?

    Though maybe the error did happen from time to time, and controllers just manually fixed it and because it never happened at a moment of high traffic it only caused minor delays and nobody cared until now. :thonking:

    Long-standing bug, recurs regularly, users know a workaround so don't bother, maybe a ticket in bug tracker exists but for some reason was never fixed, until one day the stars align and instead of a minor annoyance it becomes a huge shit storm. Yeah, that sounds quite likely, actually. :dumpster-fire: :this_is_fine:

    The problem with the index is, I suspect, how it goes from being a problem with one row (meh...) to being a problem that stops all work (panic!). Ordering many query results by various timestamps makes a ton of sense; you don't normally want to have air traffic controllers dealing with flight legs that landed a few days ago. I'm guessing that the DBA somehow forgot that real data usually contains junk. (That sounds both likely and very unlikely. Did the old DBA retire and a noob take up the reins?)

    It's possible that the problem is that the index doesn't build and so many of the queries do full table scans, with the massive increase in resource consumption that implies. That would be almost as bad as things just failing. Maybe worse.



  • @dkf the CEO of the air traffic control thing said in one interview that they can't just discard an erroneous piece of data (and keep processing the rest) because they have to understand what that piece of data was supposed to mean otherwise they might end up with one flight that is there but not on their maps or in the wrong place. He said this is (part of) the reason why a single piece of bad data caused the whole system to grind to a stop.

    Which does make some amount of sense, in the abstract general sense. You can't just assume that because a plane made a typo in their flight plan they don't exist at all.

    But it also sounds very much like a desperate flailing about and an attempt at obfuscating things ("don't try to understand, it's complicated!"), which you would expect from a CEO doing crisis management while they probably themselves don't really know what happened.

    So I guess the best we can do is keep making wild suppositions based on absolutely zero information. 🎆
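
    In that spirit, here is what "don't discard it, but don't stop either" could look like. This is a pure guess at the shape of the problem, with made-up field names (`callsign`, `departure_utc`, `arrival_utc`) and made-up validation rules. Bad records are set aside with the reasons attached so a human can work out what they meant, while everything else keeps flowing.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    needs_manual_review: list = field(default_factory=list)


def validate(plan: dict) -> list:
    """Return a list of problems found; an empty list means the plan passes."""
    problems = []
    if not plan.get("callsign"):
        problems.append("missing callsign")
    if plan.get("arrival_utc", 0) <= plan.get("departure_utc", 0):
        problems.append("arrival not after departure")
    return problems


def ingest(plans: list) -> IngestResult:
    """Accept good plans and quarantine bad ones instead of halting or dropping."""
    result = IngestResult()
    for plan in plans:
        problems = validate(plan)
        if problems:
            # Keep the record and the reasons: the flight still exists.
            result.needs_manual_review.append((plan, problems))
        else:
            result.accepted.append(plan)
    return result


batch = [
    {"callsign": "AAA111", "departure_utc": 1000, "arrival_utc": 5000},
    {"callsign": "BBB222", "departure_utc": 9000, "arrival_utc": 4000},
    {"callsign": "CCC333", "departure_utc": 2000, "arrival_utc": 6000},
]
outcome = ingest(batch)
print(len(outcome.accepted), len(outcome.needs_manual_review))
```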


  • Java Dev

    @remi The one thing we can assume is that the flight plan was previously accepted by probably France, which is why the UK cannot reject it outright.



  • @PleegWat well :technically-correct: we can't even assume that (and even less so as we (I, at least) don't have the slightest idea of how flight plans and air traffic control works).

    The only tiny piece of rumour (not even information!) is that a "French airline" is involved, but that doesn't mean the flight involved the French airspace (or air traffic control) (unless a flight plan from an airline registered in country X somehow always involves air traffic control from country X?). The flight could have been between two non-French airports.

    Especially since the biggest French airline is Air France, which is also KLM and thus probably has a significant number of flights operating to/from AMS.

    So now it's your fault (you're from 🇳🇱, right?)! :whistling:


  • Java Dev

    @remi Hence why I wrote 'probably France'.


  • Discourse touched me in a no-no place

    @remi said in In other news today...:

    So I guess the best we can do is keep making wild suppositions based on absolutely zero information. 🎆

    Yep. I wasn't trying to figure out why the data was originally bad (shit happens, OK?) but rather why one piece of bad data brought everything to a halt. Wild supposition, of course, but without an index it's hard to see why one record would cause so much chaos.

    Points to problems with the DBA and the process of data ingestion.



  • @remi said in In other news today...:

    @dkf Sure, but my point is that such an error is so basic that it seems almost impossible that it did not happen before. So many airlines and so many flights and so many timezones (and changes in timezones with DST etc.), and for years and years that kind of error would never have happened?

    I am fairly confident a flight plan does not involve any timezones at all. The times are printed in local time on the tickets and the departure and arrival boards and such, but on the operations side, everything is in UTC.

    Though maybe the error did happen from time to time, and controllers just manually fixed it and because it never happened at a moment of high traffic it only caused minor delays and nobody cared until now. :thonking:

    If it happened occasionally, they'd probably know how to fix it. Because …

    Long-standing bug, recurs regularly, users know a workaround so don't bother, maybe a ticket in bug tracker exists but for some reason was never fixed, until one day the stars align and instead of a minor annoyance it becomes a huge shit storm. Yeah, that sounds quite likely, actually. :dumpster-fire: :this_is_fine:

    … in critical systems like this, publishing workarounds is actually vastly preferred to fixing the bugs. Because every time you fix a bug, you risk introducing another one, and you have to re-run all the tests and then you can still miss it.



  • @remi said in In other news today...:

    @dkf the CEO of the air traffic control thing said in one interview that they can't just discard an erroneous piece of data (and keep processing the rest) because they have to understand what that piece of data was supposed to mean otherwise they might end up with one flight that is there but not on their maps or in the wrong place. He said this is (part of) the reason why a single piece of bad data caused the whole system to grind to a stop.

    Which does make some amount of sense, in the abstract general sense. You can't just assume that because a plane made a typo in their flight plan they don't exist at all.

    Except … that's pretty much exactly what you do. Because it's not an issue. The computers don't control anything, they just pass the information around so the controllers don't need to spend as much time on the phone with each other and can manage more aircraft. If an aircraft calls and the flight plan isn't there, they can just re-enter whatever is needed, or if worst comes to worst, jot it down on strips of paper and carry on like it's 1930 … except they can only do that with so many flights.

    It is also quite common to not accept the flight plan for various reasons, expecting the pilot or the dispatcher will just try filing a new version with suitable modifications. So no, a flight plan with an error could be safely deleted and not bring the system down.

    So it had to be some larger problem causing some system-level integrity check failure. But what … yeah, we know approximately nothing about the system.


    @PleegWat said in In other news today...:

    @remi The one thing we can assume is that the flight plan was previously accepted by probably France, which is why the UK cannot reject it outright.

    It's … more complicated than that. In the “single European sky”, the flight plans are accepted (after being rejected and modified a couple of times due to various sectors being expected to be too busy at the intended time) by the Eurocontrol “network manager operations centre”, and all the individual ATC units have their systems connected to, and through, that.

    Which suggests some inconsistency between those systems could have been the problem.


  • Notification Spam Recipient

    @remi said in In other news today...:

    otherwise they might end up with one flight that is there but not on their maps or in the wrong place.

    I'd rather have a plane attempting to gain clearance for their flight to get told "Hold on, we don't have you on the list, stand by" rather than the entire fleet conglomeration halt....

    But that's just me.



  • Mass-reply to a few of the comments above. Obviously I'm just talking without knowing anything, not even what @Bulb said (thanks!).

    @dkf said in In other news today...:

    without an index it's hard to see why one record would cause so much chaos.

    If processing explicitly stops (or triggers some other process) when a faulty record is encountered, then no index is involved and chaos can still happen if that triggered process for some reason was way too slow.

    (of course stopping everything for any sort of faulty record would be dumb, but maybe the record was subtly wrong in a way that never happened before (see my joke about NAN to NUL at exactly 0:00) and that fell through the usual safeguards)

    @Bulb said in In other news today...:

    on the operations side, everything is in UTC.

    I'm not surprised it is, that is the only thing that makes sense. But since we're talking about an error happening, it's possible someone messed up and entered something in the wrong time. That should have been caught before, obviously, but again the ATC shouldn't have failed anyway, and it did, so... not impossible. But in any case, my own view is that such an error would have happened before already, so I'm dubious this would be the cause here.

    If an aircraft calls and the flight plan isn't there, they can just re-enter whatever is needed, or if worst comes to worst, jot it down on strips of paper and carry on like it's 1930 … except they can only do that with so many flights.

    I gather from a couple of articles that this is basically what happened. The faulty piece of data had to be handled manually (either that piece of data directly, or it was rejected and the aircraft resubmitted something through a different process or whatever intermediate steps), which took time. And likely, in addition to the manual processing itself being slower than the automatic one, this would slow down the processing of other things because they have to be cross-referenced with the data entered manually.

    Then, as I said, if (when) that happens in a normal period, this just causes a couple of minor delays. But here it happened at the worst possible moment, with very busy skies (and maybe some random other factor we have no idea about such as one controller being out sick or whatever) and instead of being smoothed out in a few minutes, this cascaded into a major failure. A bit like a minor accident on the road can either be unnoticeable or turn into a major roadblock. Or a log file that grows a bit too much can either be nothing, or cause a system with a disk 99.9% full to fall down.

    @Tsaukpaetra said in In other news today...:

    I'd rather have a plane attempting to gain clearance for their flight to get told "Hold on, we don't have you on the list, stand by" rather than the entire fleet conglomeration halt....

    I don't expect the answer was "freeze everything," that would indeed be stupid. But "hold on" then causes the controller to spend 30 more seconds handling that flight, which means 30 s during which they're not handling another flight. And it also means if you've got another flight getting clearance for more or less the same spot, you've got to spend 30 more seconds manually checking if they're going to interact with the first flight or not, possibly telling them to "hold on" as well and so on.
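
    A toy model of that cascade, with every number invented: one controller, flights handled first come first served, each taking a fixed number of seconds, and one flight needing 30 extra seconds of manual handling. The function name and parameters are mine. The point is only that the same 30 seconds vanishes when there is slack and sticks to every later flight when there is none.

```python
def knock_on_delay(interarrival_s: int, service_s: int, extra_s: int,
                   flights: int = 20, disrupted: int = 0) -> int:
    """Total seconds that flights spend waiting for a single controller.

    Flights arrive every `interarrival_s` seconds and each takes `service_s`
    seconds to handle; the flight at index `disrupted` takes `extra_s` more.
    """
    free_at = 0      # time at which the controller next becomes available
    total_wait = 0
    for i in range(flights):
        arrival = i * interarrival_s
        start = max(arrival, free_at)
        total_wait += start - arrival
        free_at = start + service_s + (extra_s if i == disrupted else 0)
    return total_wait


# Quiet period: plenty of slack, the extra 30 s is absorbed completely.
print(knock_on_delay(interarrival_s=120, service_s=60, extra_s=30))

# Nearly saturated: the delay shrinks by 5 s per flight until it is gone.
print(knock_on_delay(interarrival_s=65, service_s=60, extra_s=30))

# Fully saturated: every one of the 19 later flights inherits the 30 s.
print(knock_on_delay(interarrival_s=60, service_s=60, extra_s=30))
```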


  • Notification Spam Recipient

    @remi said in In other news today...:

    @Tsaukpaetra said in In other news today...:

    I'd rather have a plane attempting to gain clearance for their flight to get told "Hold on, we don't have you on the list, stand by" rather than the entire fleet conglomeration halt....

    I don't expect the answer was "freeze everything," that would indeed be stupid. But "hold on" then causes the controller to spend 30 more seconds handling that flight, which means 30 s during which they're not handling another flight. And it also means if you've got another flight getting clearance for more or less the same spot, you've got to spend 30 more seconds manually checking if they're going to interact with the first flight or not, possibly telling them to "hold on" as well and so on.

    Well, yeah, that's one of the reasons flights get delayed so damn often, actually. Everything cascades. It's normal, expected, and halfway accounted for in the system of operations as a built-in result.


  • Discourse touched me in a no-no place


  • BINNED

    @loopback0 said in In other news today...:

    :take_my_money:

    d04da5b6-3fe1-497d-b4ea-70419e259ec6-grafik.png

    ae520abd-514c-492c-9db8-2d76a9e7d51a-grafik.png

    RTX ON

    Explain to me how obviously higher poly meshes and higher quality textures are the result of "raytracing"?
    Fucking liars!

    Half-Life 2 Remade Assets: a project setting out to recreate assets used across Half-Life 2 with high fidelity graphics and physically accurate properties.

    Thought so.


  • Discourse touched me in a no-no place

    @topspin Nvidia's marketing department redefined RTX ON to mean "remastered but also with RTX" when they moved on from just sticking fancy lights into old games and announced RTX Remix.


  • BINNED

    @topspin said in In other news today...:

    Fucking liars!

    @loopback0 said in In other news today...:

    marketing department

    :same_picture.pptx:



  • @topspin no, no, “fucking liars” is what is done by the spouses/significant others/bits on the side of the marketing department.


  • BINNED

    @Arantor Tsaukpaetra, is that you?
    🍹





  • @Zecc said in In other news today...:

    @PleegWat said in In other news today...:

    I know Portugal is in UTC+0, but they call it WET. I do not know whether they observe summer time and :kneeling_warthog: to look it up.

    We do. We're currently on it. So it's currently UTC+1. (so-called WEST)

    there is no DRY time there?



  • @remi said in In other news today...:

    such a critical system is obviously never updated without thorough testing

    unless it's a hotfix, so we spend the entire year using hotfixes to ship new features



  • @sockpuppet7 There's plenty of good wine in 🇵🇹 , hence they won't suffer a dry time. 🍷



  • @jinpa said in In other news today...:

    Stop leaking :trolley-garage: members' pictures! :doing_it_wrong:

    The first one (in the onebox) is clearly @boomzilla

    (inb4: "so are all the others" of course)



  • @jinpa said in In other news today...:

    That is some literal trolling. And of course it's done by a Dane.



  • A funny and nice to read story on measuring programmer productivity:
    https://dannorth.net/2023/09/02/the-worst-programmer/



  • @BernieTheBernie

    Just don’t try to measure the individual contribution of a unit in a complex adaptive system, because the premise of the question is flawed.

    QFT. I wish more people would understand that.


  • Considered Harmful

    @BernieTheBernie said in In other news today...:

    A funny and nice to read story on measuring programmer productivity:
    https://dannorth.net/2023/09/02/the-worst-programmer/

    Funny, but that's all there is to it. The number of unproductive morons vastly exceeds the number of Tims.



  • @Applied-Mediocrity said in In other news today...:

    The number of unproductive morons vastly exceeds the number of Tims.

    The point is that every simple-ish “objective” metric has cases it does not properly cover. The actual unproductive morons may even be closing tickets, looking like they are moderately productive, but if they amass enough technical debt in the process, the end result may easily be that they aren't.


  • Considered Harmful

    @Bulb And that is precisely why it's merely funny and not very useful. It states a truism and does not help to address the actual problem of bad programmers.


  • 🚽 Regular

    @BernieTheBernie said in In other news today...:

    A funny and nice to read story on measuring programmer productivity:
    https://dannorth.net/2023/09/02/the-worst-programmer/

    I saw that elsewhere before and someone linked to this then:

    It's a short one, I can quote the whole thing here:

    In early 1982, the Lisa software team was trying to buckle down for the big push to ship the software within the next six months. Some of the managers decided that it would be a good idea to track the progress of each individual engineer in terms of the amount of code that they wrote from week to week. They devised a form that each engineer was required to submit every Friday, which included a field for the number of lines of code that were written that week.

    Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementer, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code.

    He recently was working on optimizing Quickdraw's region calculation machinery, and had completely rewritten the region engine using a simpler, more general algorithm which, after some tweaking, made region operations almost six times faster. As a by-product, the rewrite also saved around 2,000 lines of code.

    He was just putting the finishing touches on the optimization when it was time to fill out the management form for the first time. When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000.

    I'm not sure how the managers reacted to that, but I do know that after a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied.



  • @Zecc said in In other news today...:

    the rewrite also saved around 2,000 lines of code.

    The original algorithm took 2,001 lines of code, didn't it?


  • Considered Harmful

    @Zecc said in In other news today...:

    -2000
    ...
    I'm not sure how the managers reacted to that

    Negative numbers broke the reporting software and the vendor refused to fix it 🍹



  • @Applied-Mediocrity said in In other news today...:

    Negative numbers broke the ~~reporting software~~ ~~excel~~ ~~lotus-1-2-3~~ visicalc spreadsheet and ~~the vendor refused~~ its author no longer remembered how to fix it

    🔧

