Airbus A350 must be rebooted every 149 hours
-
In a mandatory airworthiness directive (AD) reissued earlier this week, EASA urged operators to turn their A350s off and on again to prevent "partial or total loss of some avionics systems or functions".
Operators need to completely power the airliner down before it reaches 149 hours of continuous power-on time.
The original 2017 AD was brought about by "in-service events where a loss of communication occurred between some avionics systems and avionics network" (sic). The impact of the failures ranged from "redundancy loss" to "complete loss on a specific function hosted on common remote data concentrator and core processing input/output modules".
Filed Under: I just flew in from Las Vegas, and boy are my arms tired.
-
@El_Heffe Even Windows 95 managed 49 days.
-
@El_Heffe I tried to find some critical power of 2 that could be correlated with 149 hours, but I couldn't really find anything. The best I've got is 2^29 milliseconds, which is about 149 hours and 8 minutes, but that feels a bit too low to be problematic. But who knows, maybe they're using one of those ancient weird languages that repurpose the upper 2 bits of an integer for something else...
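For anyone who wants to check the arithmetic, here is a minimal sketch that converts a few power-of-two millisecond counts into hours. The function name is made up for illustration, and nothing in the AD says the avionics actually count milliseconds; this only reproduces the guess above.

```python
# Hours until a free-running millisecond counter of a given width wraps.
# Assumption: one tick per millisecond, unsigned counter (not stated in the AD).
MS_PER_HOUR = 3_600_000


def hours_until_wrap_ms(bits: int) -> float:
    """Return the hours it takes to count 2**bits milliseconds."""
    return (2 ** bits) / MS_PER_HOUR


if __name__ == "__main__":
    for bits in (28, 29, 30, 32):
        hours = hours_until_wrap_ms(bits)
        print(f"2^{bits} ms = {hours:.2f} hours ({hours / 24:.2f} days)")
```

The 29-bit row is the one that lands just above 149 hours, and the 32-bit row is the familiar Windows 95 figure of roughly 49.7 days.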
-
@Gąska As the Airworthiness Directive just refers to the issue possibly occurring "after 149 hours", it might be that 149 is a number deliberately lower than the actual number to allow a buffer.
-
@loopback0 yeah, but I can't find anything else that's even in the ballpark of that. And 149 is a weirdly specific number...
Filed under: Here be dragons, ancient programming jokes, ancient Pokemon jokes
-
@Gąska said in Airbus A350 must be rebooted every 149 hours:
The best I've got is 2^29 milliseconds, which is about 149 hours and 8 minutes, but that feels a bit too low to be problematic
I was first going to guess they count tenths of milliseconds in a 32-bit counter, but that doesn't quite add up (you'd only get 119 hours). Eighths of a millisecond would kinda match.
-
@Gąska Well they're talking about data concentrators that send ARINC 429 (an ancient serial-like protocol that operates at 100 Kbps in "High Speed Mode") labels over ARINC 664 (Ethernet, with alterations) links. Depending on the implementation, an ARINC 429 frame can contain 19 to 23 bits of data information, which might amount to something but I'm too lazy to think more on it.
-
@Gąska said in Airbus A350 must be rebooted every 149 hours:
@loopback0 yeah, but I can't find anything else that's even in the ballpark of that. And 149 is a weirdly specific number...
Yes, that was my first thought. For anything computer related, 149 is a weird number.
-
@cvi said in Airbus A350 must be rebooted every 149 hours:
@Gąska said in Airbus A350 must be rebooted every 149 hours:
The best I've got is 2^29 milliseconds, which is about 149 hours and 8 minutes, but that feels a bit too low to be problematic
I was first going to guess they count tenths of milliseconds, but that doesn't quite add up (you'd only get 119 hours). Eighths of a millisecond would kinda match.
The article says they need to reboot before reaching 149 hours.
149 hours is 536,400 seconds, which is a little more than 512k.
Maybe they just need more RAM.
-
@El_Heffe said in Airbus A350 must be rebooted every 149 hours:
The article says they need to reboot before reaching 149 hours.
Eighths of a millisecond would give you ~149.13 hours, so that should be enough, right? (Or am I failing at basic arithmetic here?)
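The numbers do hold up. A rough sketch, assuming an unsigned 32-bit counter (an assumption of this thread, not something the AD states) and a hypothetical helper name:

```python
# How long a 32-bit unsigned counter lasts at different sub-millisecond resolutions.
# Assumption: the counter is 32 bits wide and free-running from power-on.
COUNTER_STATES = 2 ** 32
MS_PER_HOUR = 3_600_000


def hours_to_wrap_32bit(ticks_per_ms: int) -> float:
    """Hours until a 32-bit counter wraps when it advances ticks_per_ms times per ms."""
    return COUNTER_STATES / ticks_per_ms / MS_PER_HOUR


if __name__ == "__main__":
    for label, ticks_per_ms in (("1 ms", 1), ("1/8 ms", 8), ("1/10 ms", 10)):
        print(f"tick = {label:>7}: wraps after {hours_to_wrap_32bit(ticks_per_ms):.2f} hours")
```

Eighths of a millisecond come out just over the 149-hour limit, and tenths fall well short of it at about 119 hours.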
-
@El_Heffe said in Airbus A350 must be rebooted every 149 hours:
149 hours is 536,400 seconds, which is a little more than 512k.
Maybe they just need more RAM.
For an ancient ARINC 429-based LRU, that's quite possible. (Let's just say my employer is currently working on new ideas for something that allows them to update these old LRUs without requiring 8" floppy disks...)
-
@mott555 Like a serial port?
-
@Captain As I understand it, the flight LRUs are updated by plugging an "updater" LRU into them with ARINC 429 connections, and blasting specially formatted ARINC 429 labels at them that wipe their memory and replace their software. So we don't really touch the flight LRUs, but we have to emulate the "updater" LRUs, most of which are older than I am, poorly documented, require floppy disks, and were made by companies that ceased to exist decades ago. But this is a side of our business that I have very little hand in, so I don't know a whole lot about it.
-
@mott555 There's also [[[E]E]P]ROM replacement*, which I'm sure was the path for one box I worked on.
* Changes are likely tiny, but we thank the airline industry for paying for them. Replacement chips may come on entire boards or in entire replacement devices. Please allow 4-6 months for delivery.
-
Sounds like a memory leak.
-
@Gąska said in Airbus A350 must be rebooted every 149 hours:
But who knows, maybe they're using one of those ancient weird languages that repurpose the upper 2 bits of an integer for something else...
First bit is the replicated master/slave status, and second is the sync status.
-
@boomzilla: those are the kind of systems where dynamic memory allocation is usually frowned upon or forbidden entirely, so I doubt it.
-
@Zerosquare said in Airbus A350 must be rebooted every 149 hours:
those are the kind of systems where
~~dynamic memory allocation~~ good code is usually frowned upon or forbidden entirely
FTFY
-
But that's true almost everywhere, isn't it?
And avionics is definitely not the worst for code quality.
I mean, imagine if it was developed by typical NodeJS developers. The plane crash rate would drop to zero... because planes would never take off in the first place.
-
@Zerosquare said in Airbus A350 must be rebooted every 149 hours:
And avionics is definitely not the worst for code quality.
Nobody says it is. Even excluding the shit stuff that is 99% of what is written by students (which is never intended to hit production) there's a lot worse than avionics code.
-
@Gąska said in Airbus A350 must be rebooted every 149 hours:
@El_Heffe I tried to find some critical power of 2 that could be correlated with 149 hours, but I couldn't really find anything. The best I've got is 2^29 milliseconds, which is about 149 hours and 8 minutes, but that feels a bit too low to be problematic. But who knows, maybe they're using one of those ancient weird languages that repurpose the upper 2 bits of an integer for something else...
@mott555 said in Airbus A350 must be rebooted every 149 hours:
@Gąska Well they're talking about data concentrators that send ARINC 429 (an ancient serial-like protocol that operates at 100 Kbps in "High Speed Mode") labels over ARINC 664 (Ethernet, with alterations) links. Depending on the implementation, an ARINC 429 frame can contain 19 to 23 bits of data information, which might amount to something but I'm too lazy to think more on it.
So probably 2^19 seconds then?
-
@PleegWat said in Airbus A350 must be rebooted every 149 hours:
So probably 2^19 seconds then?
That's 145 (and a bit) hours. So not that.
-
You reboot it by typing 4 8 15 16 23 42, don't you
-
You know, I do recall an event one time I flew where the plane apparently lost power for a full minute. Wonder if this was related...?
-
@Zerosquare said in Airbus A350 must be rebooted every 149 hours:
@boomzilla: those are the kind of systems where dynamic memory allocation is usually frowned upon or forbidden entirely, so I doubt it.
Yeah, to avoid this sort of problem. So it's probably not a traditional memory leak, because they're not allocating, but it sure sounds like they're using something up over time and then aren't recycling it properly or whatever.
-
@boomzilla TRWTF is using up all file handles in avionics software?
-
@mott555 said in Airbus A350 must be rebooted every 149 hours:
100 Kbps in "High Speed Mode"
And this is why I love serial protocols and don't care what anyone says.
-
To be honest, I don't expect planes to be operated continuously for a very long time. Certainly no longer than a few days.
-
@_P_ said in Airbus A350 must be rebooted every 149 hours:
@boomzilla TRWTF is using up all file handles in avionics software?
My WAG is that it's a rollover of the tick source used for the RTOS. I've seen this happen: everything works fine until rollover and then suddenly everything grinds to a halt. Embarrassing, and if you didn't spot the error during development then your testing regime has to have the device powered long enough to trigger it, which it probably won't.
If the counter is 32-bit then 4294967295/536400 = ~8 kHz
That's not unreasonable as a tick rate if it's a cooperative RTOS, which is a standard config for embedded things.
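To make the guess concrete, here is a small sketch of both halves: the tick rate implied by a 32-bit counter wrapping at 149 hours, and the classic way a timeout check misbehaves at rollover. It emulates a 32-bit counter with a mask; the function names and the scenario are hypothetical and not taken from any real RTOS or from the A350.

```python
# Emulate a 32-bit unsigned tick counter with a mask.
MASK32 = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def implied_tick_rate_hz(hours: float, bits: int = 32) -> float:
    """Tick rate at which a counter of `bits` bits wraps after `hours` hours."""
    return (2 ** bits) / (hours * 3600)


def naive_expired(now: int, deadline: int) -> bool:
    """Buggy check: plain comparison ignores that the counter wraps to zero."""
    return now >= deadline


def wrap_safe_expired(now: int, deadline: int) -> bool:
    """Compare the modular difference, treated as a signed 32-bit distance."""
    return ((now - deadline) & MASK32) < HALF_RANGE


if __name__ == "__main__":
    print(f"implied tick rate: {implied_tick_rate_hz(149):.0f} Hz")

    # A deadline just before the wrap, checked 10 ticks late (after the wrap).
    deadline = MASK32 - 5
    now = (deadline + 10) & MASK32  # the counter has rolled over to a small value

    print("naive check says expired:    ", naive_expired(now, deadline))
    print("wrap-safe check says expired:", wrap_safe_expired(now, deadline))
```

In this scenario the naive check reports the deadline as not yet reached even though it passed 10 ticks ago, so a task waiting on it would stall until the counter comes all the way around again. The wrap-safe version works as long as timeouts are shorter than half the counter range.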
-
@Cursorkeys Aha, the number of days was not the days until failure, just the safe interval to reboot at:
The FAA said:
This AD was prompted by the determination that a Model 787 airplane that has
been powered continuously for 248 days can lose all alternating current (AC) electrical
power due to the generator control units (GCUs) simultaneously going into failsafe mode.
This condition is caused by a software counter internal to the GCUs that will overflow
after 248 days of continuous power. We are issuing this AD to prevent loss of all AC
electrical power, which could result in loss of control of the airplane.
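The quoted AD only says "a software counter" and "248 days". As a sanity check, the sketch below assumes a signed 32-bit counter incremented 100 times per second; that is an assumption (it is not in the AD text above), chosen because it happens to reproduce the figure.

```python
# Days until a counter overflows, for a given width, signedness and tick rate.
# Assumption: signed 32-bit counter ticking at 100 Hz (not stated in the quoted AD).
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int, ticks_per_second: int, signed: bool = True) -> float:
    """Days of continuous counting before the counter exceeds its positive range."""
    positive_states = 2 ** (bits - 1) if signed else 2 ** bits
    return positive_states / ticks_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"signed 32-bit @ 100 Hz:   {days_until_overflow(32, 100):.2f} days")
    print(f"unsigned 32-bit @ 100 Hz: {days_until_overflow(32, 100, signed=False):.2f} days")
```

Under that assumption the signed case gives about 248.55 days, which lines up with the AD's number.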
-
@Cursorkeys said in Airbus A350 must be rebooted every 149 hours:
failsafe
Is there some alternate definition of this word of which I am unaware?
-
@error said in Airbus A350 must be rebooted every 149 hours:
@Cursorkeys said in Airbus A350 must be rebooted every 149 hours:
failsafe
Is there some alternate definition of this word of which I am unaware?
Failsafe in the sense that an uncontrolled generator might pop a shear pin or cause an IDG disconnect maybe? You could reset the GCU using the circuit-breakers in-flight.
On the engines I'm familiar with that have IDGs, if you have a disconnect then there is no way to reconnect while in the air, you literally have to twist a physical handle to reconnect the generator to the gearbox.
-
@Cursorkeys said in Airbus A350 must be rebooted every 149 hours:
uncontrolled generator might pop a shear pin or cause an IDG disconnect maybe? You could reset the GCU
Ah, yes. *smiles blankly and nods* Of course.
-
@error said in Airbus A350 must be rebooted every 149 hours:
@Cursorkeys said in Airbus A350 must be rebooted every 149 hours:
uncontrolled generator might pop a shear pin or cause an IDG disconnect maybe? You could reset the GCU
Ah, yes. *smiles blankly and nods* Of course.
Apologies, I should have defined the terms.
Shear pins are bits of metal that are designed to snap at a particular mechanical load. Very useful for, say, protecting the rest of a gearbox from a runaway generator by snapping, and thus physically separating the drive shafts before everything gets trashed. Obviously you'd have to take everything apart and put a new unbroken one in when it goes.
IDG - Integrated Drive Generator - Basically a constant-speed-drive for the generator, all the ones I've seen have a disconnect function where they mechanically sever the gear-train in-between the generator and the engine gearbox. The disconnect is reversible, but not while you are in the air.
GCU - Generator Control Unit - Deals with things like exciting the generator (basically you choose how much power comes out of the generator by putting some into it, management of this is pretty important) and protection (overload, over temperature etc...).
-
@No_1 said in Airbus A350 must be rebooted every 149 hours:
@Cursorkeys said in Airbus A350 must be rebooted every 149 hours:
WAG
Wife and girlfriend?
I hope for him they're the same person.
-
@PleegWat
Where is the fun in that?