Any Python multiprocessing gurus around here?


  • Impossible Mission - B

    Trying to track down a bizarre problem at work.

    We've got a Python script that's throwing a bunch of errors in a piece of the code where they shouldn't be coming from. Digging into it, I've managed to trace it down to the multiprocessing.Pool() constructor.

    All that call is supposed to do is set up an MP pool object, but somewhere, somehow, when I call that method, a bunch of other code ends up running that's not supposed to run until much later in the script! A ton of error messages get printed before the multiprocessing.Pool() call returns.

    Does anyone have any idea how that can happen? :wtf: :wtf: :wtf:


  • Discourse touched me in a no-no place

    I don't claim to be a guru (in Python), but it seems that making a pool makes a bunch of other objects, notably including several threads, several queues, and several processes. It also depends on the pickling of objects, forking (on non-Windows) and other tricks. The way in which these coordinate is non-trivial, and not necessarily stable internally between releases of Python as a bunch of it involves undocumented support classes. In fact, there's a whole fucking lot of them; it's a damn snake pit down there.
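    The pool sends functions and arguments to its workers by pickling them, so one quick sanity check is whether what you hand to pool.map() pickles at all. A minimal sketch; is_picklable() and square() are hypothetical helpers, not part of multiprocessing:

```python
import math
import pickle


def is_picklable(obj: object) -> bool:
    """Return True if pickle.dumps() accepts obj, i.e. it could be sent to a pool worker."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas and nested functions have no importable name; some objects
        # (open files, locks) refuse to pickle at all.
        return False
    return True


def square(n: int) -> int:
    # Module-level functions are pickled by reference: module name plus function name.
    return n * n


if __name__ == "__main__":
    print(is_picklable(square))           # True
    print(is_picklable(math.sqrt))        # True
    print(is_picklable(lambda n: n * n))  # False
```

    Passing a lambda straight to pool.map() fails for the same reason, even on platforms that fork, because tasks still travel to the workers through a pickling queue.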

    Can you be a little more precise about the nature of what's detonating? Some things I would expect to be extremely problematic in concurrence with such trickiness (such as signal handling), and “it runs lots of code that it shouldn't do yet and prints lots of errors” really doesn't make for an actionable issue. 😉



  • I'm doing a bunch of multiprocessing.Pool stuff now.

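    If you have code running that you don't think should be running, have you checked that all top-level code is either inside classes or functions, or behind this guard?

    if __name__ == '__main__':

    With the "spawn" start method (the default on Windows, and on macOS since Python 3.8), every worker process re-imports your main script as it starts, so any unguarded top-level code runs again inside each worker.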
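    A minimal sketch of that layout, assuming the pool work can be wrapped in a function; square() and main() are illustrative names. Only definitions live at module level, so a worker that imports the script finds square() without re-running anything else:

```python
import multiprocessing


def square(n: int) -> int:
    # Defined at module level so workers can look it up by name after unpickling.
    return n * n


def main() -> list[int]:
    # Creating the Pool starts the worker processes; the with-block terminates them on exit.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Runs only when the file is executed directly, never when a worker imports it.
    print(main())  # [0, 1, 4, 9, 16]
```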

    If you think it's the constructor, is it possible you're shadowing the Pool class? If you have a local file with the same name as a standard-library module (for example, a multiprocessing.py next to your script), Python will import that instead of the system one.
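    One way to rule shadowing out is to ask the import system where multiprocessing actually comes from. A sketch; module_origin() and comes_from_stdlib() are hypothetical helpers, and the standard-library directory is taken from sysconfig:

```python
import importlib.util
import sysconfig
from pathlib import Path


def module_origin(name: str) -> Path:
    """Return the file the import system resolves `name` to."""
    spec = importlib.util.find_spec(name)
    # Built-in modules have no file location, so they cannot be checked this way.
    if spec is None or not spec.has_location:
        raise ImportError(f"cannot locate a file for module {name!r}")
    return Path(spec.origin).resolve()


def comes_from_stdlib(name: str) -> bool:
    """True if `name` is loaded from the interpreter's standard-library directory."""
    paths = sysconfig.get_paths()
    origin = module_origin(name)
    stdlib_dir = Path(paths["stdlib"]).resolve()
    # site-packages can live inside the stdlib directory, so exclude it explicitly.
    site_dirs = {Path(paths[key]).resolve() for key in ("purelib", "platlib")}
    return stdlib_dir in origin.parents and not any(d in origin.parents for d in site_dirs)


if __name__ == "__main__":
    print(f"multiprocessing resolves to {module_origin('multiprocessing')}")
    if not comes_from_stdlib("multiprocessing"):
        print("Not the standard-library copy: look for a local multiprocessing.py or multiprocessing/ package")
```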

    You can also debug the code and step through it. It's easy in Eclipse with PyDev, but remote debugging is possible too.
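    If stepping through is awkward, a cheaper trick is to make the misbehaving code report which process it is running in. A sketch; describe_execution_context() is a hypothetical helper built on multiprocessing.current_process() and multiprocessing.get_start_method():

```python
import multiprocessing
import os


def describe_execution_context(module_name: str) -> str:
    """Describe the process that is executing the calling module's code."""
    proc = multiprocessing.current_process()
    return (
        f"[pid {os.getpid()}] {proc.name} is running module {module_name!r} "
        f"(start method: {multiprocessing.get_start_method()})"
    )


# Deliberately unguarded: paste this next to the code that fires too early.
# "MainProcess" means the script itself ran it; a worker name (typically
# something like "SpawnPoolWorker-1") means a pool worker executed it. Under
# spawn, the re-imported main script shows up as module '__mp_main__'.
print(describe_execution_context(__name__), flush=True)
```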



  • @masonwheeler Not specific to multiprocessing, but here's a bug that runs unexpected code: if your code uses the 'future' package (not the built-in __future__), in certain cases it can import unintended modules.

    For example, if your app starts from main.py, a test.py sits in the same directory, and some of your modules import future, then in certain cases that test.py gets imported.

    Great surprise.

    There are some other problematic names besides test.py.

    Bug report: https://github.com/PythonCharmers/python-future/issues/268


