WTF of my day (ok, one of them).
This is something I'd seen before, which was good. Because that meant that it only took me 45 minutes to diagnose the issue and provide a solution to my coworker whose ticket it was. But still.
We have a piece of software that makes some requests and opens a pair of UDP sockets. Those sockets are authenticated by a janky homegrown protocol (literally called the Janky Session Protocol). This relies on a HLO (hello) message that the server replies to with a hashed token. The client then sends back a LGN (login) with that same token and a salt agreed on earlier. If it matches, the session begins. But it has to match for the entire socket--if I HLO on ephemeral port A and LGN on ephemeral port B, that's a no-go.
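To make the "same socket" rule concrete, here is a minimal sketch of the server-side check. The real Janky Session Protocol wire format isn't shown here, so everything in it is an assumption for illustration: the `HandshakeTable` class, the SHA-256 token, and keying on `address:port` are all hypothetical.

```javascript
const crypto = require("node:crypto");

// Hypothetical per-backend handshake state, keyed by the client's "ip:port".
class HandshakeTable {
  constructor(salt) {
    this.salt = salt;          // salt agreed on earlier (assumed to be a string)
    this.pending = new Map();  // "ip:port" -> token issued in reply to HLO
  }

  key(rinfo) {
    return `${rinfo.address}:${rinfo.port}`;
  }

  // HLO: mint a hashed token tied to this exact source address and port.
  hello(rinfo) {
    const token = crypto
      .createHash("sha256")
      .update(this.salt)
      .update(crypto.randomBytes(16))
      .digest("hex");
    this.pending.set(this.key(rinfo), token);
    return token;
  }

  // LGN: succeeds only if this backend saw an HLO from the same address and port.
  login(rinfo, token) {
    const expected = this.pending.get(this.key(rinfo));
    return expected !== undefined && expected === token;
  }
}
```

Under these assumptions, a LGN from a different ephemeral port fails, and so does a LGN that lands on a different backend, because that backend never saw the HLO.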
The server has a CNAME DNS record pointed at a load balancer, which has 3 IPs backing it. Of course different connections could get any of those IPs, depending on whether they re-resolve the DNS name each time.
All of this is set up and, while janky, mostly works.
Now the WTF.
On Macs, the default DNS resolution in Node.js is to do the lookup once and then cache it at the OS level, meaning the client always gets the same remote IP and is happy for the duration of the connection.
On Linux, the default DNS resolution in Node.js is to do the lookup each and every time a connection attempt is made. Which makes some sense--that's why load balancers exist. But for UDP sockets this is bad news--each part of the handshake gets a different IP.
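A quick way to see whether a given machine rotates IPs is to resolve the name a few times and count the distinct answers. This is just a diagnostic sketch: `distinctAddresses` is a made-up helper, and the `lookup` parameter is only injectable so it can be tried without touching real DNS. It uses `dns.promises.lookup`, which goes through the OS resolver.

```javascript
const dns = require("node:dns");

// Resolve `hostname` `attempts` times and return the distinct IPv4 addresses seen.
// One address back suggests something is caching; several suggests per-call resolution.
async function distinctAddresses(hostname, attempts = 10, lookup = dns.promises.lookup) {
  const seen = new Set();
  for (let i = 0; i < attempts; i++) {
    const { address } = await lookup(hostname, { family: 4 });
    seen.add(address);
  }
  return [...seen];
}
```

Calling `distinctAddresses("your-server.example")` (placeholder hostname) on both a Mac and a Linux box should show the difference, if the behavior described above holds on your setup.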
As a result, when we're testing the thing even dockerized, it will always succeed on our developer machines and always fail when run from a Linux-based EC2 instance. With extremely unhelpful error messages. I didn't notice it until I actually did a tcpdump on the traffic and noticed that the remote IPs were changing (and thus the sockets weren't the same).
And no, there's no toggle. You have to do the lookup manually and force the connection to use the same IP (basically override the DNS name and connect to a specific IP, resolved once before you start the socket).
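Roughly, the workaround looks like the sketch below. It is not our actual code: `resolveOnce` and `openPinnedPair` are hypothetical names, and it assumes IPv4 and plain `dgram` sockets. The point is only that the name is resolved once and every `send()` then gets the IP literal, so no further lookup happens.

```javascript
const dns = require("node:dns");
const dgram = require("node:dgram");

// Resolve the hostname a single time and return one IPv4 address.
// `lookup` is injectable only so the sketch can be exercised without a network.
async function resolveOnce(hostname, lookup = dns.promises.lookup) {
  const { address } = await lookup(hostname, { family: 4 });
  return address;
}

// Open a pair of UDP sockets that both talk to the same, already-resolved IP.
async function openPinnedPair(hostname, port, lookup) {
  const ip = await resolveOnce(hostname, lookup);
  const sockets = [dgram.createSocket("udp4"), dgram.createSocket("udp4")];

  // Send on socket `index` to the pinned IP, never to the hostname.
  const send = (index, payload) =>
    new Promise((resolve, reject) => {
      sockets[index].send(payload, port, ip, (err) => (err ? reject(err) : resolve()));
    });

  const close = () => sockets.forEach((s) => s.close());
  return { ip, send, close };
}
```

With something like this, the HLO and the LGN both go through `send()` on the same socket index, so they leave from the same local port and hit the same backend.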