SQLite in Python



  • @HardwareGeek said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    lightyears better than whatever you're using.

    Not to get into a dick-measuring contest, but I'm using an i9-11900 (8 cores — par with M1, half of M2), 64 GB of RAM (4x the max, 8x the default for M1 Air, and 2.67x the max for M2) and 7.5 TB of storage (6.5 internal, 1 external; 2.5 SSD, 4.5 spinning rust — the best M1 or M2 Air can do is 2TB internal), driving dual 4k monitors. So the M2 processor might be better, but the whole system is lightyears short of lightyears better than mine, and I sure as hell couldn't get an Air that's even a little better than my system for $700.

    I have been a very vocal critic of Apple for many years. All I can say is that my i9 laptop was running slow AF with the fan running all the time when I was barely running anything.

    On my M1 Max, I have trouble getting it to go over 50% CPU. I'm running Windows in Parallels and I don't even feel it. Somehow it uses less memory and less storage.

    I get that Macs suck, they're expensive, not upgradeable, but somehow they just work better.



  • @dangeRuss said in SQLite in Python:

    I think you just install a new one. You can have multiple WSLs. Not sure you can just upgrade from one to another.

    Thank you. That was actually helpful. It turned out I already had a long-forgotten browser tab that explained how to do that (same one that explained how to install it in the first place). Tab hoarding FTW!


  • Discourse touched me in a no-no place

    @dangeRuss said in SQLite in Python:

    I have been a very vocal critic of Apple for many years. All I can say is that my i9 laptop was running slow AF with the fan running all the time when I was barely running anything.
    On my M1 Max, I have trouble getting it to go over 50% CPU. I'm running Windows in Parallels and I don't even feel it. Somehow it uses less memory and less storage.

    Those Macs have supreme memory bandwidth. That's by far and away their best feature. They also run reasonably cool.

    A tricked out PC desktop can probably still outshine them for sheer grunt power. That will have much better cooling and a far bigger power budget. As long as you're not having to dedicate a large part of it to McAfee and Windows Update.



  • @dkf said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    I have been a very vocal critic of Apple for many years. All I can say is that my i9 laptop was running slow AF with the fan running all the time when I was barely running anything.
    On my M1 Max, I have trouble getting it to go over 50% CPU. I'm running Windows in Parallels and I don't even feel it. Somehow it uses less memory and less storage.

    Those Macs have supreme memory bandwidth. That's by far and away their best feature. They also run reasonably cool.

    A tricked out PC desktop can probably still outshine them for sheer grunt power. That will have much better cooling and a far bigger power budget. As long as you're not having to dedicate a large part of it to McAfee and Windows Update.

    True, with water cooling you can probably beat them, but at the cost of turning PC into a whole house heater, and at probably many times the cost.

    For a laptop I don't think you can beat it right now. I think the only time I've even seen the fans turn on was when I was running some crazy multiprocessing python stuff. I don't think they have ever been on during normal usage.



  • @dkf said in SQLite in Python:

    As long as you're not having to dedicate a large part of it to McAfee and Windows Update.

    No McAfee, but a huge :fu: to Windows Update. It rebooted my PC last night. I didn't lose any unsaved files, but I lost my interactive Python and SQLite sessions, and I had to rummage through the window stack for the browser window with all my SQLite help searches. And that's why I haven't written a single line of Python this morning. Wait a minute, no it's not. I haven't written a single line of Python this morning because I've been busy working on work stuff. That's right, working. 👀:seye::whistling:


  • Discourse touched me in a no-no place

    @dangeRuss said in SQLite in Python:

    at the cost of turning PC into a whole house heater

    Probably no more than a 1kW heater or thereabouts. Modern power supplies are about 90% efficient, according to the box of the one I've got.



  • @dkf 2 or 3 desktops running 24/7 make a very effective heater for a smallish bedroom, that's for sure.


  • Discourse touched me in a no-no place

    @HardwareGeek That would depend on your application load profile too, but I get your point. I have mine go to sleep overnight to cut noise and light.



  • @dkf said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    at the cost of turning PC into a whole house heater

    Probably no more than a 1kW heater or thereabouts. Modern power supplies are about 90% efficient, according to the box of the one I've got.

    Yeah, I guess you can't pull more than 1500 watts from an outlet, so probably just below a space heater.

    I'm just amazed at how efficient this ARM stuff is. The Windows I'm running in Parallels is an ARM version; wonder when we will see ARM-based laptops for Windows.



  • @dangeRuss said in SQLite in Python:

    wonder when we will see ARM-based laptops for Windows.

    https://www.xda-developers.com/best-windows-on-arm/



  • ARGH!!! After installing the newer WSL VM, the script won't run. Python is throwing an error from the html package (supplying HTMLParser).

    File "/usr/lib/python3.10/html/parser.py", line 109, in feed
    self.rawdata = self.rawdata + data
    AttributeError: 'NameDataParser' object has no attribute 'rawdata'. Did you mean: 'road_data'?

    Note this is coming from the package, not my script. There is no rawdata in my script. But there definitely is a self.rawdata in HTMLParser, the base class of NameDataParser.

    And it's failing identically (except for the line number) in the older VM, which was running previously. And trying to update html with pip install html -U fails with the error

    Collecting html
    Downloading html-1.16.tar.gz (7.6 kB)
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [6 lines of output]
    Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-install-a83fnxym/html_fefcd7ce148b42cf90ea6d90f66ac3a8/setup.py", line 12, in <module>
    long_description = __doc__.decode('utf8'),
    AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
    [end of output]

    Well, that's true. str doesn't have a decode() method. Since lots of people use pip, it's hard to believe such a glaring error could get into the wild, much less an LTS distro, so presumably Python doesn't usually think __doc__ is a str (presumably, it's supposed to be a bytes object). So WhyTF is Python suddenly deciding it's a str???

    Clearly, fate does not want me to finish this project.



  • @HardwareGeek said in SQLite in Python:

    ARGH!!! After installing the newer WSL VM, the script won't run. Python is throwing an error from the html package (supplying HTMLParser).

    File "/usr/lib/python3.10/html/parser.py", line 109, in feed
    self.rawdata = self.rawdata + data
    AttributeError: 'NameDataParser' object has no attribute 'rawdata'. Did you mean: 'road_data'?

    Note this is coming from the package, not my script. There is no rawdata in my script. But there definitely is a self.rawdata in HTMLParser, the base class of NameDataParser.

    And it's failing identically (except for the line number) in the older VM, which was running previously. And trying to update html with pip install html -U fails with the error

    Collecting html
    Downloading html-1.16.tar.gz (7.6 kB)
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [6 lines of output]
    Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-install-a83fnxym/html_fefcd7ce148b42cf90ea6d90f66ac3a8/setup.py", line 12, in <module>
    long_description = __doc__.decode('utf8'),
    AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
    [end of output]

    Well, that's true. str doesn't have a decode() method. Since lots of people use pip, it's hard to believe such a glaring error could get into the wild, much less an LTS distro, so presumably Python doesn't usually think __doc__ is a str (presumably, it's supposed to be a bytes object). So WhyTF is Python suddenly deciding it's a str???

    Clearly, fate does not want me to finish this project.

    Not sure what the issue is here, but it's often a Python version or dependency problem. I don't remember WSL so well, but should it be pip3 and not pip? Sometimes a package you installed overshadows a bundled dependency.


  • Discourse touched me in a no-no place

    @HardwareGeek said in SQLite in Python:

    presumably Python doesn't usually think __doc__ is a str (presumably, it's supposed to be a bytes object). So WhyTF is Python suddenly deciding it's a str???

    It is supposed to be a str in all versions of Python. It contains a human-readable documentation string (if present; programmers remain bad at documenting things everywhere). Sounds like you have a mix-up between Python 2 and 3.



  • @HardwareGeek said in SQLite in Python:

    Well, that's true. str doesn't have a decode() method. Since lots of people use pip, it's hard to believe such a glaring error could get into the wild, much less an LTS distro, so presumably Python doesn't usually think __doc__ is a str (presumably, it's supposed to be a bytes object). So WhyTF is Python suddenly deciding it's a str???

    str used to have a decode() method back in Python 2. That means the code running is expecting to be run under Python 2. A new Linux distro will not have it, because it's been out of support for a couple of years, but you may have some lying around in your home directory.

    Unfortunately you stripped out the most important part of that backtrace, the file locations, but I'd wager a guess they started with your home/lib/python/. That directory unfortunately didn't change from python 2 to python 3, and is first on the search path, so any code there—installed there by pip or just raw setuptools some time long, long ago—will get loaded in preference to the distributed content.

    At the moment I suggest you just delete that directory (look in the backtrace which it is; you might have some extra configuration to share it with Windows or something) or move it out of the way and install the dependencies again, preferably with dpkg.

    Clearly, fate does not want me to finish this project.

    My suggestion to use python sanely is to:

    • Use the system package manager in very, very strong preference to pip.
    • Whenever you need to install with pip or manually, create a venv for the use-case.
    • Prefer using pip as the-venv-python3 -m pip install -r requirements.txt, where you first write what you need in that file (there's a rough sketch of this workflow right after this list).
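
    A rough sketch of that workflow, scripted from Python for the sake of the example; the directory and file names here are made up, not anything from this thread:

    import pathlib
    import subprocess
    import venv

    # create a throwaway venv for this one use-case (hypothetical location)
    env_dir = pathlib.Path("the-venv")
    venv.EnvBuilder(with_pip=True).create(env_dir)

    # install exactly what requirements.txt says, using that venv's own interpreter
    venv_python = env_dir / "bin" / "python3"
    subprocess.check_call([str(venv_python), "-m", "pip", "install", "-r", "requirements.txt"])

    The same thing in a shell is of course just python3 -m venv the-venv followed by the-venv/bin/python3 -m pip install -r requirements.txt.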

    I installed azure cli with pip on my work machine a while ago¹ and already regret it. The problem is that the hairball of Microsoft Azure libraries has cross-dependencies that prevent just upgrading them all and pip doesn't have any command to solve an upgrade like dpkg has.


    ¹ because, for some :wtf: reason, the Microsoft azure-cli package carries its own python and I wanted to use some of the libraries directly.



  • @dangeRuss said in SQLite in Python:

    I don't remember WSL so well, but should it be pip3 and not pip?

    The current Ubuntu package (already as of Jammy; WSL is just a VM with a normal Linux distro in it) installs both, and both are for python3, because python2 is long out of support. But any time you want to be sure, the official way to call it is

    python3 -m pip
    

    That can be used when you have multiple versions of python installed and in venvs (which is the preferred way to use it anyway).


  • Discourse touched me in a no-no place

    @Bulb said in SQLite in Python:

    My suggestion to use python sanely is to:

    • Use the system package manager in very, very strong preference to pip.

    My general recommendation is to avoid the system Python as much as possible precisely because it is so often a broken old version. The system package manager should only be used for handling changes to the system packages.

    • Whenever you need to install with pip or manually, create a venv for the use-case.

    That is good advice. Once in the virtual environment (which is really just a directory structure and some environment variables) you can install with tools like pip. Install from wheels instead of from source. Unless you have a lot of spare time.

    Being "in the virtual environment" means you've done:

    . theEnv/bin/activate
    

    in your shell. It changes your shell prompt so you know what's going on.

    I usually recommend doing python -m pip install -U pip as one of the first things in preparing a virtual environment for use. Because the version of pip you bootstrap with is usually outdated. :rolleyes:

    • Prefer using pip as the-venv-python3 -m pip install -r requirements.txt, where you first write what you need in that file.

    There are other ways. So. Many. Other. Ways... :doing_it_wrong:



  • @dkf said in SQLite in Python:

    My general recommendation is to avoid the system Python as much as possible precisely because it is so often a broken old version. The system package manager should only be used for handling changes to the system packages.

    It depends on what system it is and who administers it. If you administer it, and are using something reasonably new (like the latest LTS Ubuntu), the distribution version is much easier to keep up-to-date than anything you install manually, because the distribution package manager is going to handle the dependency resolution for you.

    If for some reason you are on something like two versions old RHEL administered by somebody else, well, then you'll totally need to install newer python yourself.


  • Discourse touched me in a no-no place

    @Bulb said in SQLite in Python:

    If for some reason you are on something like two versions old RHEL administered by somebody else, well, then you'll totally need to install newer python yourself.

    IME the system Python is usually of a very uncertain version. It might be up to date, but probably won't be. If you're using macOS you can definitely count on it not being a good plan to use it; Apple do some weird things to software versions under the covers (and updates are messy to apply). On Linux, versions are all over the place and users don't normally know the first thing about what's going on.

    Windows handles this by not having a system version of Python at all. It's a solution I suppose...



  • @dangeRuss said in SQLite in Python:

    If all you're doing is IO, you're probably fine with a single CPU.

    If all you're doing is IO, you could just do non-blocking/vectorized IO, no need to waste resources on threads. I'm sure even Python supports that somehow (and it'll be less of a wtf than actual threading in it).


  • Discourse touched me in a no-no place

    @cvi said in SQLite in Python:

    I'm sure even Python supports that somehow (and it'll be less of a wtf than actual threading in it).

    Yep. I've not used it.



  • @dkf Apparently not for local file access, though. (Did the Linux kernel ever make that work for regular files?)

    Edit: Nevermind, I take my above post back. I just remembered the clusterfuck that async file IO is, and it appears that it hasn't significantly improved. *sigh*



  • @cvi said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    If all you're doing is IO, you're probably fine with a single CPU.

    If all you're doing is IO, you could just do non-blocking/vectorized IO, no need to waste resources on threads. I'm sure even Python supports that somehow (and it'll be less of a wtf than actual threading in it).

    Yeah, you could do async, but isn't that basically threads?



  • @dangeRuss said in SQLite in Python:

    but isn't that basically threads?

    Ideally, no. Though if you don't have proper support, it might be emulated with threads.



  • @cvi said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    but isn't that basically threads?

    Ideally, no. Though if you don't have proper support, it might be emulated with threads.

    So what does it use? Some sort of lightweight threads, no?


  • Java Dev

    @dangeRuss said in SQLite in Python:

    @cvi said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    If all you're doing is IO, you're probably fine with a single CPU.

    If all you're doing is IO, you could just do non-blocking/vectorized IO, no need to waste resources on threads. I'm sure even Python supports that somehow (and it'll be less of a wtf than actual threading in it).

    Yeah, you could do async, but isn't that basically threads?

    It doesn't need threads in userspace, if the kernel offers the necessary APIs.
    A blocking IO call will read as much data as the buffer size you are presenting, or will write the entire buffer you are providing to it.
    However, a non-blocking IO call only reads any data the kernel already loaded, and only writes as much data as the kernel has buffer space for. This means the call never blocks waiting for disk or network IO.
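
    In Python terms, that difference looks roughly like this little sketch with a pipe (nothing here is from the thread, it's just the minimal demo):

    import os

    # a pipe: reading from the empty read end would normally block
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)    # switch the read end to non-blocking mode

    try:
        os.read(read_fd, 4096)         # nothing has been written yet
    except BlockingIOError:
        print("no data ready; a blocking read would have sat here waiting")

    os.write(write_fd, b"hello")
    print(os.read(read_fd, 4096))      # the kernel already has the data, so this returns immediately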



  • @dangeRuss said in SQLite in Python:

    So what does it use? Some sort of lightweight threads, no?

    What @PleegWat said. Kernel deals with it. The caveat is that "standard" non-blocking IO doesn't really work for regular files/block devices. It's mainly for network sockets (I tested it on a network FS at one point, but don't remember the results).

    There are additional APIs for async IO (which is separate from non-blocking IO). With async IO, you queue reads/writes actively and then wait for them to complete. Windows' overlapped IO has always been more of the async IO flavour rather than the non-blocking one.

    With non-blocking IO, you get the data in the buffer you provided to the call (or it's been transferred out of the buffer in case of writes) when the call returns. With async IO, that's not the case. So, async calls are more akin to "here's a buffer, put/grab data [from] there, and tell me when it's done". This could be pushed all the way down to the hardware level. It could even end up being DMA'd from some device without any active CPU involvement. Though that obviously depends on the hardware/drivers etc. (and probably needs special APIs to make sure the destination memory is DMAable and so on).



  • @cvi said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    So what does it use? Some sort of lightweight threads, no?

    What @PleegWat said. Kernel deals with it. The caveat is that "standard" non-blocking IO doesn't really work for regular files/block devices. It's mainly for network sockets (I tested it on a network FS at one point, but don't remember the results).

    There are additional APIs for async IO (which is separate from non-blocking IO). With async IO, you queue reads/writes actively and then wait for them to complete. Windows' overlapped IO has always been more of the async IO flavour rather than the non-blocking one.

    With non-blocking IO, you get the data in the buffer you provided to the call (or it's been transferred out of the buffer in case of writes) when the call returns. With async IO, that's not the case. So, async calls are more akin to "here's a buffer, put/grab data [from] there, and tell me when it's done". This could be pushed all the way down to the hardware level. It could even end up being DMA'd from some device without any active CPU involvement. Though that obviously depends on the hardware/drivers etc. (and probably needs special APIs to make sure the destination memory is DMAable and so on).

    So if I wanted to compress a bunch of files in S3 by getting a file stream, putting it through a compressor, then giving the file stream to boto3 to upload the file, would it make a difference if I used asyncio?


  • Discourse touched me in a no-no place

    @cvi said in SQLite in Python:

    Apparently not for local file access, though. (Did the Linux kernel ever make that work for regular files?)

    It doesn't work for regular files on any OS. They all assume regular files are immediately readable and writable always. (Yes, this is true on Windows as well.) That's not really a problem; regular files really are that available, as there's nowhere in the OS that delays things. The problem is that that assumption falls apart for files on a network mount; they're still not selectable/waitable on, yet you can't always read or write them promptly, and so sometimes block. Yay. 😒



  • @dkf said in SQLite in Python:

    That's not really a problem; regular files really are that available, as there's nowhere in the OS that delays things.

    There's a lot of waiting on common IO devices. SATA has a relatively modest bandwidth and if there's something like spinning rust involved, seek times are non-negligible (can be more than a network mount on a local network). I'd argue that non-blocking IO could help here, but a true async IO interface is probably better (and doesn't mess with existing assumptions as much; though, not sure if people commonly set O_NONBLOCK or something on their regular files...).


  • Discourse touched me in a no-no place

    @dangeRuss said in SQLite in Python:

    So if I wanted to compress a bunch of files in S3 by getting a file stream, putting it through a compressor, then giving the file stream to boto3 to upload the file, would it make a difference if I used asyncio?

    Well, you've got the transfers which are I/O-bound (and so are good candidates for asyncio) and the compressor (CPU-bound, so sucktastic in Python, though the compressor is probably in C because it is likely zlib). The answer is "maybe".
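
    As a sketch of what that "maybe" could look like: the S3 helpers below are made-up blocking stand-ins (boto3 itself is synchronous, so the real calls would be wrapped the same way), and zlib stands in for whichever compressor is actually in use.

    import asyncio
    import zlib

    def download_from_s3(key):            # stand-in for a blocking boto3 get_object call
        return b"example data " * 100_000

    def upload_to_s3(key, data):          # stand-in for a blocking boto3 put_object call
        print(f"uploaded {key}: {len(data)} bytes")

    async def compress_and_upload(key):
        loop = asyncio.get_running_loop()
        data = await loop.run_in_executor(None, download_from_s3, key)
        # CPU-bound step; zlib tends to release the GIL on larger buffers, so a
        # worker thread keeps the event loop responsive while it crunches
        compressed = await loop.run_in_executor(None, zlib.compress, data, 6)
        await loop.run_in_executor(None, upload_to_s3, key + ".gz", compressed)

    async def main(keys):
        await asyncio.gather(*(compress_and_upload(k) for k in keys))

    asyncio.run(main(["a.parquet", "b.parquet"]))   # made-up keys

    Whether that beats a plain thread pool or multiprocessing depends entirely on where the time actually goes, as everyone keeps saying.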



  • @dangeRuss said in SQLite in Python:

    So if I wanted to compress a bunch of files in S3 by getting a file stream, putting it through a compressor, then giving the file stream to boto3 to upload the file, would it make a difference if I used asyncio?

    Compression sounds like it's more likely to be compute bound rather than IO bound. Either way, if whatever is your bottleneck (CPU or IO) is saturated, there's not much for you to do.

    Edit: :hanzo:


  • Java Dev

    @dangeRuss said in SQLite in Python:

    So if I wanted to compress a bunch of files in S3 by getting a file stream, putting it through a compressor, then giving the file stream to boto3 to upload the file, would it make a difference if I used asyncio?

    Depends on where your bottlenecks are. As others mentioned, compression is a CPU-heavy workload. Though zlib isn't all that expensive these days.



  • @PleegWat said in SQLite in Python:

    Though zlib isn't all that expensive these days.

    This is a few years old (2017), but back then zlib would go at about ~100MB/s. A more modern CPU will likely perform quite a bit better; a quick Google didn't show any very definitive results, though. Cloudflare has a more recent benchmark, but they also seem to be on old CPUs, so the numbers are similar. The ~100MB/s is with the lowest compression setting as well; turning the compression up makes the speed drop off rapidly.

    Either way, 100-200MB/s wouldn't saturate the IO bandwidth on a modern system. (SATA = 600 MB/s ideally; anything NVMe is going to be faster.)

    That would be one of the reasons for looking at something like Zstd or brotli or whatever, especially if it's just for transport compression (e.g., you get better compression than zlib and can still saturate something like a Gigabit link from a single core).
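
    If anyone wants numbers from their own CPU instead of 2017-era charts, a crude stdlib-only measurement looks something like this (zstd would need the third-party zstandard package, so it's left out):

    import os
    import time
    import zlib

    # ~6 MB of half-incompressible, half-repetitive data; real inputs will differ a lot
    data = os.urandom(3 * 1024 * 1024) + b"row,of,boring,csv,data\n" * 131072

    for level in (1, 6, 9):
        start = time.perf_counter()
        out = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(data) / elapsed / 1e6:7.1f} MB/s, "
              f"ratio {len(data) / len(out):.2f}x")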


  • Java Dev

    @cvi I'm remembering that it's at least 10 years ago that we switched from popen("zcat myfile.gz","r") to gzopen("myfile.gz","r") because reading from a pipe was proving more expensive than gzip-decompressing a plain file. Though I admit decompressing is generally faster than compressing.


  • BINNED

    @PleegWat said in SQLite in Python:

    popen

    Eww. :half-trolleybus-br:


  • Discourse touched me in a no-no place

    @cvi said in SQLite in Python:

    100-200MB/s wouldn't saturate the IO bandwidth on a modern system

    But will it saturate what bandwidth is available for reaching a remote system? In theory, no, but in practice? It depends on what others are doing, people who have no idea about your download/upload tasks.



  • @dkf said in SQLite in Python:

    @cvi said in SQLite in Python:

    100-200MB/s wouldn't saturate the IO bandwidth on a modern system

    But will it saturate what bandwidth is available for reaching a remote system? In theory, no, but in practice? It depends on what others are doing, people who have no idea about your download/upload tasks.

    I have not often seen speeds over 200MB/s on an S3 transfer. And this is going to be running in a lambda.



  • @cvi said in SQLite in Python:

    That would be one of the reasons for looking at something like Zstd or brotli or whatever, especially if it's just for transport compression (e.g., you get better compression than zlib and can still saturate something like a Gigabit link from a single core).

    I am actually using zstd, although it's not doing much for parquet files, which are already gzipped.


  • Discourse touched me in a no-no place

    @dangeRuss said in SQLite in Python:

    I have not often seen speeds over 200MB/s on an S3 transfer. And this is going to be running in a lambda.

    Exactly. That's about what I'd expect unless you do something really clever with reserving network bandwidth... but you don't get that sort of smartness with most cloud providers. Their whole cost model depends on not having that sort of thing going on so they can share out hardware.



  • @dkf said in SQLite in Python:

    @dangeRuss said in SQLite in Python:

    I have not often seen speeds over 200MB/s on an S3 transfer. And this is going to be running in a lambda.

    Exactly. That's about what I'd expect unless you do something really clever with reserving network bandwidth... but you don't get that sort of smartness with most cloud providers. Their whole cost model depends on not having that sort of thing going on so they can share out hardware.

    No, I mean you could get network-optimized instances. Probably not on a lambda, though.


  • Java Dev

    @topspin said in SQLite in Python:

    @PleegWat said in SQLite in Python:

    popen

    Eww. :half-trolleybus-br:

    Which is the other reason I wasn't sad to see it go. Though I still have infrastructure I use to build and start pipelines, at least that is all backed by execve().



  • @dkf said in SQLite in Python:

    But will it saturate what bandwidth is available for reaching a remote system? In theory, no, but in practice?

    It depends. The bandwidth is on the input end of the equation, so it depends on where on the system the input data lives. The graphs show a compression ratio of roughly 3×, so 100-200MB/s in translates into 30-70MB/s of compressed data to be transferred.

    Can you push that much to the remote system?



  • @cvi said in SQLite in Python:

    @dkf Apparently not for local file access, though. (Did the Linux kernel ever make that work for regular files?)

    Edit: Nevermind, I take my above post back. I just remembered the clusterfuck that async file IO is, and it appears that it hasn't significantly improved. *sigh*

    The situation has improved a lot at the system interface level with io_uring(7), but few async frameworks have caught up so far. But we're talking about Python, and the issue there is completely different.

    On the system level, there are three “async” interfaces:

    1. Non-blocking operations. These are the normal read/write on a file descriptor with the O_NONBLOCK flag set, together with poll(2)/epoll(7) to tell you which file descriptors are ready for use. This only works for sockets, pipes and character devices (serial lines, pseudoterminals), not for regular files. The reason is that sockets and pipes become ‘ready’ through an external event (data arrived, data acknowledged so the send buffer could be released), so poll can tell you whether the event happened and read/write can return with EAGAIN if it did not. But for regular files, the call to read/write triggers, and drives, the disk operation, so the file is always “ready” even if the kernel sometimes has to wait for the periphery inside the operation. (There's a small readiness-loop sketch in Python right after this list.)
    2. aio(7): The older interface for true async I/O, where the operation is driven by tasklets or kernel threads, so you don't need separate threads for the operations on your side. As far as I can tell, this should work for regular files too. But the way it signals completion with signals is a pain in the back end to handle, especially in a multi-threaded program, so few ever use it.
    3. io_uring(7): The new interface for true async I/O. In this case you set up a pair of buffers. In one you put the requests and in the other the kernel tells you which are done. You still need a system call to tell the kernel you've placed new requests and a system call to ask whether there are new completions, but you can batch it, so you are finally saving some kernel entry/exit pairs. But it's fairly low level, so you really need a library on top of it to take care of all the synchronization, and few have caught up so far.
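
    Sticking to Python for a sketch of the first style: the stdlib selectors module wraps poll/epoll, so a toy readiness loop on a pipe (regular files wouldn't work here, per the above) looks roughly like this.

    import os
    import selectors

    sel = selectors.DefaultSelector()      # epoll on Linux, poll/kqueue elsewhere
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    sel.register(read_fd, selectors.EVENT_READ)

    os.write(write_fd, b"ping")            # the external event that makes the read end ready

    for key, events in sel.select(timeout=1.0):
        print(os.read(key.fd, 4096))       # guaranteed not to block at this point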

    But so long as we are talking about Python, the issue is somewhere else: the Global Interpreter Lock (GIL).

    Threads in Python are a joke. It's a problem for all high-level languages, really. In C++, you just tell the programmer to lock anything that is shared between threads, and if they fail to do so, let things crash and burn. But in high-level languages like Perl, Python or JavaScript, there is a lot of internal bookkeeping going on, and it's the interpreter's responsibility to lock it properly. That's why Perl only shares what you explicitly tell it to, while Python just locks everything while interpreting the bytecode. Basically the only way to use Python threads that makes sense is to start long-running natively implemented functions in them, and hope the GIL gets properly released for them.

    JavaScript punted and refused to implement threads at all, introducing async functions instead. Python is finally catching up to that with asyncio, but a brief look through the documentation suggests it does not actually come with async versions of plain read and write that would at least use worker threads.

    There are some bindings for libuv, the event handling library NodeJS uses, and its documentation even says it uses io_uring since 1.45 … but they don't seem to integrate with Python's async machinery, so they are probably a pain to use. Python's not a good language for that level of performance, really.
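
    The usual workaround for the missing async file read/write is to shove the blocking call onto a worker thread yourself; a minimal sketch (the path is arbitrary):

    import asyncio

    def read_file(path):                   # ordinary blocking read
        with open(path, "rb") as f:
            return f.read()

    async def main():
        # asyncio.to_thread (3.9+) runs the blocking call in the default thread pool,
        # which is about as "async" as stdlib file IO gets right now
        data = await asyncio.to_thread(read_file, "/etc/hostname")
        print(len(data), "bytes")

    asyncio.run(main())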



  • @Bulb said in SQLite in Python:

    But so long as we are talking about Python, the issue is somewhere else: the Global Interpreter Lock (GIL).

    An io_uring-like interface (or even something like Windows' overlapped IO, IIRC) wouldn't suffer from this, though. There are no (Python) threads, thus no interpreter lock.

    I'll trust you that there are no good Python interfaces for this at the moment. :-)



  • @HardwareGeek said in SQLite in Python:

    ARGH!!! After installing the newer WSL VM, the script won't run. Python is throwing an error from the html package (supplying HTMLParser).

    File "/usr/lib/python3.10/html/parser.py", line 109, in feed
    self.rawdata = self.rawdata + data
    AttributeError: 'NameDataParser' object has no attribute 'rawdata'. Did you mean: 'road_data'?

    Note this is coming from the package, not my script. There is no rawdata in my script. But there definitely is a self.rawdata in HTMLParser, the base class of NameDataParser.

    I tried again after 3 weeks of ignoring this. It turns out I am :trwtf:.

    I neglected to call HTMLParser.__init__() from NameDataParser.__init__(). Therefore, none of the base class members got created. There really wasn't a rawdata in my class.

    Fixed a few other typos, and it's now chewing on a lot of data.
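
    For anyone who lands here with the same AttributeError: the base initializer has to run, or HTMLParser never creates rawdata and friends. A minimal sketch of the shape of the fix (road_data is just borrowed from the error message's hint; the real class obviously does more):

    from html.parser import HTMLParser

    class NameDataParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)   # the missing call; without it, rawdata never exists
            self.road_data = []

        def handle_data(self, data):
            self.road_data.append(data)

    parser = NameDataParser()
    parser.feed("<p>hello</p>")         # raised the AttributeError before that call was added
    print(parser.road_data)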


  • Discourse touched me in a no-no place

    @HardwareGeek Usually better to use super().__init__(). I have my IDE set to warn when I miss out an important one of those.



  • @dkf I was reading somewhere (SO?) that super() is deprecated and BaseClassName is now the preferred way of accessing the base class functions. But 🤷‍♂️; I'm far from an expert in Python.


  • Discourse touched me in a no-no place

    @HardwareGeek I think I found that post, and it was talking about the Python 2 version of the syntax. You don't want that, I hope!

    The time to avoid it and do explicit calls is when you're doing non-trivial multiple inheritance... but that is probably best avoided anyway. (Trivial MI might contribute methods, but not fields or an __init__; it is like implementing an interface in some other languages.)



  • @dkf said in SQLite in Python:

    Python 2 version of the syntax. You don't want that, I hope!

    :eek: No!

    non-trivial multiple inheritance

    No multiple inheritance at all, unless there's some within the library itself.



  • @dkf said in SQLite in Python:

    The time to avoid it and do explicit calls is when you're doing non-trivial multiple inheritance...

    Au contraire. super() is what implements the multiple inheritance.

    Of course the tricky part is that when you are doing multiple inheritance, the arguments to all the __init__ functions all have to match, because the C3 MRO means the super() might point to some place in the inheritance tree other than your immediate base class.

    That is normally solved by passing around **kwargs and picking out what each class needs, but classes that were not designed with multiple inheritance in mind can break if you multiply inherit them.
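
    A tiny sketch of that cooperative pattern (class names invented for illustration):

    class Base:
        def __init__(self, **kwargs):
            super().__init__(**kwargs)          # keep the chain going past Base

    class Logged(Base):
        def __init__(self, *, log_name, **kwargs):
            self.log_name = log_name            # take what this class needs...
            super().__init__(**kwargs)          # ...and pass the rest along the MRO

    class Persisted(Base):
        def __init__(self, *, db_path, **kwargs):
            self.db_path = db_path
            super().__init__(**kwargs)

    class Widget(Logged, Persisted):
        pass

    w = Widget(log_name="widgets", db_path="widgets.db")
    print(Widget.__mro__)   # Widget, Logged, Persisted, Base, object

    Here super() in Logged actually calls Persisted.__init__, not Base.__init__, which is exactly the C3 MRO behaviour described above.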

