@dkf said in Stream compress s3 file in python:
@Bulb It's the exact same thing as always, as I said up thread:
@dkf said in Stream compress s3 file in python:
Getting good speed out of I/O is, as usual, all about minimising buffer copies and memory management. It was ever thus.
That applies to all languages. Python has some particular nuances (it's got a brutally thick layer between what happens at user level and the base level of system calls or even just the real bytecode engine) but getting buffer management right is always important for IO. Understanding how to handle streaming of things matters a lot too (as then you can avoid a lot of memory churn) but that's often harder in Python than you might think unless you dive down to the bottom layer of its networking libraries.
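To make the "minimising buffer copies" point concrete, here's a minimal sketch (my own illustration, not from the post) of reusing one preallocated buffer with `readinto` and `memoryview` instead of allocating a fresh `bytes` object on every read:

```python
import io

def copy_stream(src, dst, bufsize=64 * 1024):
    """Copy src to dst in fixed-size chunks, reusing one buffer."""
    buf = bytearray(bufsize)      # allocated once, reused every iteration
    view = memoryview(buf)
    while True:
        n = src.readinto(buf)     # fills the existing buffer, no new allocation
        if not n:
            break
        dst.write(view[:n])       # memoryview slice avoids copying the chunk

src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copy_stream(src, dst)
assert dst.getvalue() == src.getvalue()
```

The same pattern works for sockets and files; the win is that memory churn stays constant no matter how large the stream is.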
Still, nothing quite beats the server-side networking problems I was finding in Ruby a decade ago, with common libraries setting up huge caches only for them to be used exactly once... I've not looked since, but Ruby always was famous for being resource hungry so I suspect that's not gone away.
So I ended up subclassing io.RawIOBase and implementing a buffer using a queue.Queue with one chunk of data per item. Seems to be working fairly well. I was able to create the object, hand it to zstandard to write the compressed data to and to the upload thread to read the data from, then give the compressor stream to the download thread to write to, and all seems well.
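For anyone curious, the shape of that object looks something like this. This is a hypothetical sketch of the approach described (class and method names are mine), using `zlib` as a stand-in for `zstandard` so it runs with only the stdlib:

```python
import io
import queue
import threading
import zlib

class QueuePipe(io.RawIOBase):
    """File-like object backed by a bounded queue: the compressor thread
    writes chunks in, the upload thread iterates chunks out."""

    _EOF = object()  # sentinel marking end of stream

    def __init__(self, maxsize=16):
        super().__init__()
        # Bounded queue applies backpressure if the uploader falls behind.
        self._q = queue.Queue(maxsize=maxsize)

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        if data:
            self._q.put(data)
        return len(data)

    def close(self):
        if not self.closed:
            self._q.put(self._EOF)  # wake the reader and signal EOF
            super().close()

    def chunks(self):
        """Yield chunks until EOF; meant to be consumed by the uploader."""
        while True:
            item = self._q.get()
            if item is self._EOF:
                return
            yield item

pipe = QueuePipe()
received = []

def uploader():
    for chunk in pipe.chunks():
        received.append(chunk)  # real code would upload a part to S3 here

t = threading.Thread(target=uploader)
t.start()

comp = zlib.compressobj()
payload = b"hello world " * 1000
pipe.write(comp.compress(payload))
pipe.write(comp.flush())
pipe.close()
t.join()
assert zlib.decompress(b"".join(received)) == payload
```

With the real libraries you'd hand `pipe` to `zstandard`'s stream writer and have the upload thread feed the chunks into a multipart upload; the bounded queue keeps memory flat regardless of object size.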