I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad
-
Hi everyone, long time lurker, first time poster. I'm going to play it safe and start by saying something everyone agrees with.
I needed a script for quickly downloading, parsing and filtering a particular JSON file. Very simple stuff. I'm fairly new to Python, but I know my way around somewhat.
So, first I test things on a local file. Everything works flawlessly, and the script completes in the blink of an eye. So let's actually download live data. Just replace the file open with urlopen. I run the script and... it hangs. 10 seconds and still nothing. Repeated several times. The link works fine in Firefox, so it's definitely something wrong with the script. I start googling, and find out how to set a proper user agent to avoid being detected as a bot, which might be the issue.
But then I look back at the terminal I left open... and the script finished successfully! It didn't hang, it just took that long to run! Like, over 20 seconds! Come on, it's just 1 meg! curl can do that in under a second! So I rewrote the script to spawn curl instead of calling urlopen. Conventional wisdom says that spawning a process and reading its stdout is slower than pretty much anything else you could try. Not this time. I saved over 90% of the run time by switching to curl. It wasn't easy, though. The documentation for the Python stdlib is by far the absolute worst stdlib documentation I've ever seen. I've worked with in-house languages used for just one project that had better documentation than Python. If it wasn't for StackOverflow I'd never have figured out how to pass curl's stdout stream directly to the JSON parser without collecting it all in a string first. It's obvious in hindsight. But nowhere in the docs does it say that Popen.stdout has a read() method.
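For the curious, the whole trick boils down to something like this (a sketch, not my actual script; the URL is made up, and `echo` stands in for curl here so the example runs without a network):

```python
import json
import subprocess

def load_json_from(cmd):
    """Spawn cmd and feed its stdout straight into the JSON parser.

    Popen.stdout is an ordinary file-like object with a read() method,
    so json.load() can consume the stream directly instead of collecting
    it all in a string first.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        return json.load(proc.stdout)
    finally:
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}")

# Real usage would be along the lines of:
#   data = load_json_from(["curl", "-fsS", "https://example.com/data.json"])
# Offline stand-in so the sketch runs as-is:
data = load_json_from(["echo", '{"items": [1, 2, 3]}'])
print(data["items"])  # [1, 2, 3]
```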
Of course I didn't have to use Python. I could use anything else. In fact, Python wasn't even my first choice. I tried JS (Node) first. The code was 10 times shorter and I wrote it 10 times faster than the Python version. And I wrote it before the Python version. But I couldn't use it because in this project we use Node 9, and Node didn't have Array.prototype.map until version 11. What the fucking fuck! How could a language interpreter released in the last decade not have the most basic container functions! It didn't even occur to me to check! Even C++, the king of having nothing useful in its stdlib, had it since at least 2003! (In a very roundabout way but still.) How is this even possible!?
Yes, I can use npm run or nvm run or whatever the right one is called, to run the script with the right version, and put it in a shell script for easy access. But I wanted to share the script with the rest of the team, and pinning it to a particular random Node version completely unrelated to the project would be at least annoying. No problem, I thought. I'm just going to run with the latest installed Node, whatever it is, I thought. Oh my how wrong I was. You see, there is no way to tell n?m to use the latest installed version. There is no way to check what the latest version even is. It's either a specific number, or default. You can't even get a list of installed versions. I mean, technically you can. There's n?m ls (amazing command name BTW) that returns a list of what's installed. But the output is a twisted maze of aliases that's neither human- nor machine-readable. It would take me longer to write a parser for that than it took me to write a parser for the actual thing I wanted to parse. Both the JS and Python versions twice over.
-
Welcome @Gustav.
I see by the thread's tags you'll feel right at home here.
-
@Gustav welcome.
The answer, of course, is that you should have used PHP, which has curl enabled by default, has array mapping built in (and has done for 20 years), and has remarkably fewer version gotchas than Nodeland does. (because then you'd have PHP on you)
That said, the Python ecosystem as a whole feels your pain, and made the requests package for this sort of nonsense. But that's not helpful when writing a script you intend to share…
-
There's deno now as an alternative to node. It's a single executable requiring no installation and accepting TypeScript.
I don't have any experience with it, but it might be worth experimenting with. Also the other one whose name I forget? Bun? https://bun.sh/ I know even less about that one.
-
@Arantor said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
That said, the Python ecosystem as a whole feels your pain, and made the requests package for this sort of nonsense.
Looking through GitHub, requests is just a wrapper around urllib3. And urllib3 is the same shit as urllib2 - still implemented in pure Python - that somehow isn't so inconceivably slow. Makes one wonder what the hell went wrong with urllib on their SECOND attempt.
-
@Zecc said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
There's deno now as an alternative to node.
Let me guess. Its package manager is called mpn?
-
@Gustav not sure it's quite the "second attempt" as much as "second group to give it a go" and/or "this is for Python 2 vs that is for Python 3" because that schism is still ongoing.
-
@Arantor actually urllib2 was added in Python 2.1. (source 1) (source 2)
-
@Gustav I don't really pay attention to the Python ecosystem, I just had a vague memory of this being one of the examples of the 2 vs 3 schism, and that this was also one of those "we're calling it v2 despite it having little relation to v1" dramas. Happy to be corrected though.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
But I couldn't use it because in this project we use Node 9, and Node didn't have Array.prototype.map until version 11.
Wouldn't that be like a three line polyfill to add?
Even C++, the king of having nothing useful in its stdlib, had it since at least 2003!
Do you mean `std::transform`? Ew, while "best practice", that's probably harder to use than just a loop.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Let me guess. Its package manager is called mpn?
Jokes aside, I think it's one of these two:
- there is no package manager
- its package manager is called deno
-
@Zecc said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
There's deno now as an alternative to node. It's a single executable requiring no installation and accepting TypeScript.
I don't have any experience with it, but it might be worth experimenting with. Also the other one whose name I forget? Bun? https://bun.sh/ I know even less about that one.
I was about to suggest finding a VS plugin that vomits a bundled runtime and node modules as an InstallShield package to conveniently ship your 5-line script as a 300 MB package. "Bundler" reads like someone made that already, but I'll be damned if I'm going to find out.
-
@LaoC you did just describe what it feels like to write JS the rest of the time.
-
@Arantor meh. Just don't look too closely at what's in `node_modules` and you should be fine (provided your disk is big enough).
-
@boomzilla I never go in there, I'm not nearly that crazy.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Looking through Github, requests is just a wrapper around urllib3. And urllib3 is the same shit as urlib2 - still implemented in pure Python - that somehow isn't so inconceivably slow. Makes one wonder what the hell went wrong with urllib on their SECOND attempt.
I vaguely remember being forced to use `urllib` in one script (CentOS 4 in 2014 is wtf by itself) and it was kinda "not user-friendly". I can't remember exactly though. Something along the lines of "you need to open a TCP connection to the correct IP and port first"... but I might be confusing it with some other WTF platform.
-
@boomzilla said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
provided your disk is big enough
My computer is hung down to here *slaps knee*
-
I just would have used curl and jq.
-
@ObjectMike I tried that but
$ jq
Command 'jq' not found, but can be installed with:
sudo apt install jq
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@ObjectMike I tried that but
$ jq
Command 'jq' not found, but can be installed with:
sudo apt install jq
Install it?
-
@ObjectMike the whole point is to have something that doesn't need random crap installed on your system to work. If it's not available by default in both of the last two LTSes of all major Linux distros, or if it's not already one of the dependencies of the project this script is supposed to be used for, it might as well not exist. I write scripts to make life easier, not harder.
-
So much for
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
I could use anything else
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
The documentation for Python stdlib is by far the absolutely worst stdlib documentation I've ever seen.
It's very patchy. Some is good, some is awful. The I/O capabilities are among the worst bits.
You're also right that Python is slow. Painfully so. (We have to use numpy to get any real speed at work; numpy is pretty fast on large matrices.) But a download of a few MB of data should be quick enough; something else is going on there, something really dumb and annoyingly bad. You weren't trying to download one byte at a time, were you?
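To illustrate: reading a stream in sensible chunks versus one byte at a time is the difference between a handful of read calls and over a million of them (a toy sketch, with an in-memory buffer standing in for the HTTP response):

```python
import io

def read_all(stream, chunk_size):
    """Drain a stream chunk_size bytes at a time.

    Returns (data, number of read calls made).
    """
    chunks, calls = [], 0
    while True:
        chunk = stream.read(chunk_size)
        calls += 1
        if not chunk:  # empty read signals end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks), calls

payload = b"x" * (1 << 20)  # 1 MiB, about the size of the file in question
data, calls = read_all(io.BytesIO(payload), 64 * 1024)
print(calls)  # 17: 16 full 64 KiB chunks plus the final empty read
# With chunk_size=1 the same megabyte takes over a million read calls;
# on a real socket each of those is a trip down the whole I/O stack.
```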
-
@dkf I literally just did `json.load(urlopen(...))`. That was enough to break it. And it's not a fault of the JSON parser, because `json.load(Popen(['curl', '-fsS', ...]).stdout)` works completely fine.

Speaking of curl. I want to personally congratulate the fuckwit who designed the -s and -S flags. For those who don't know, the former means "disable two unrelated features with completely different purposes, one of which is used much more often than the other" and the latter means "disable disabling the less used feature when disabling the two unrelated features". This isn't just not user-friendly, it's actively user-hostile.
-
https://manpages.org/curl says:
-s, --silent
Silent or quiet mode. Don't show progress meter or error messages. Makes Curl mute. It will still output the data you ask for, potentially even to the terminal/stdout unless you redirect it.

-S, --show-error
When used with -s it makes curl show an error message if it fails.

:-|
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@ObjectMike the whole point is to have something that doesn't need random crap installed on your system to work. If it's not available by default in both of the last two LTSes of all major Linux distros, or if it's not already one of the dependencies of the project this script is supposed to be used for, it might as well not exist. I write scripts to make life easier, not harder.
What do you mean, "exist by default"? It doesn't make any sense to require something to be installed as part of a distro's default package selection. The whole point of a package repository is that you can easily and automatically install and update stuff. If your software has a dependency, just specify it and the package manager will pull it in, case closed. It's worth giving some thought to the question of whether you need a whole Node ecosystem of your own (or anything Node for that matter) but something like jq is just a no-brainer.
-
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
The whole point of a package repository is that you can easily and automatically install and update stuff.
I've bricked more OS installations by automatically installing and updating stuff from package repository than through all other methods combined. Both Linux and Windows, if we were to count Windows Update as a package repository.
That aside, yes, it might be available in MY repository, but no guarantee it'll be available for other team members. People here set up the OS on their own, and we have a mix of several distros and several versions of each distro. Each of them is going to have a potentially different version of jq, with potentially different set of supported features. On the other hand, everyone is and will always be able to run Python 3.5 scripts.
We recently went through a rewrite of some build scripts because one of the tools we used had breaking changes in Ubuntu 21.04 compared to 18.04, neither repo's build would run on the other Ubuntu because system libraries were different, and the only alternative to those was building from source ourselves. We decided it was easier to just stop using that tool.
It's worth giving some thought to the question of whether you need a whole Node ecosystem of your own (or anything Node for that matter) but something like jq is just a no-brainer.
We already depend on Node for critical parts of the project, so (a) we already need to install it anyway, and (b) it's never going away for us. On the other hand, jq isn't part of our world at all at the moment. Also, Node (from v11 upward) can do everything jq can, and then some - without weird ad-hoc command syntax. If we allow telling people to install jq as a viable solution, why wouldn't I tell them instead to install Node 16? It's actually a more stable solution because Node is outside the Linux repo ecosystem, which means it's always equally available to everyone and always in the same version (at a given point in time).
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
On the other hand, everyone is and will always be able to run Python 3.5 scripts.
Which is now badly out of date. 3.7 is the oldest version still receiving security fixes.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
The whole point of a package repository is that you can easily and automatically install and update stuff.
I've bricked more OS installations by automatically installing and updating stuff from package repository than through all other methods combined. Both Linux and Windows, if we were to count Windows Update as a package repository.
Bricked
I've bricked about two machines in 25 years of Linux: my first SuSE, where I managed to break some essential driver on a laptop that couldn't boot from CD, and a Gentoo that I tried to switch from 32-bit to 64 on the fly. With a binary package manager? Not that I could remember (inb4 ). Hell, even the one time where I confused terminal windows and pasted the ready-made "switch this MF to CentOS-8-Stream" command line into an Oracle Linux shell didn't break anything that couldn't be fixed with a single reboot.

That aside, yes, it might be available in MY repository, but no guarantee it'll be available for other team members. People here set up the OS on their own, and we have a mix of several distros and several versions of each distro. Each of them is going to have a potentially different version of jq, with potentially different set of supported features. On the other hand, everyone is and will always be able to run Python 3.5 scripts.
Breaking changes in Python are not unheard of, OTOH I don't know of anything that would have stopped working in `jq`. If people can't keep up with major distros in terms of common utilities (i.e. if it's not in your mainline repo, find a backport or overlay or COPR or whatever it's called in your neck of the woods), maybe they shouldn't be setting up their own systems.

We recently went through a rewrite of some build scripts because one of the tools we used had breaking changes in Ubuntu 21.04 compared to 18.04, neither repo's build would run on the other Ubuntu because system libraries were different, and the only alternative to those was building from source ourselves. We decided it was easier to just stop using that tool.
Shit happens, but not even keeping to the basic tool set that's guaranteed to be available on every distro will protect you from that. I recently had a patch upgrade (3.1.3-14 to 3.1.3-16 or something) of rsync, on enterprisey, "stable" CentOS no less, introduce some different default that completely broke connectivity to the version that came with Oracle Linux. Easily fixed with a command line option, but not if you have hundreds of rsyncs hidden in dozens of different fragile macros spread over several 10k lines of shell scripts written by ~~monkeys~~ DBAs.

It's worth giving some thought to the question of whether you need a whole Node ecosystem of your own (or anything Node for that matter) but something like jq is just a no-brainer.
We already depend on Node for critical parts of the project, so (a) we already need to install it anyway, and (b) it's never going away for us. On the other hand, jq isn't part of our world at all at the moment. Also, Node (from v11 upward) can do everything jq can, and then some - without weird ad-hoc command syntax. If we allow telling people to install jq as a viable solution, why wouldn't I tell them instead to install Node 16? It's actually a more stable solution because Node is outside the Linux repo ecosystem, which means it's always equally available to everyone and always in the same version (at a given point in time).
If you don't yet see a plethora of options for fucking it up anyway, I'm sure your cow-orcers will find them.
-
@Gustav it's a half joke, but it can make sense depending on your situation: it's possible to make a docker run one-liner that mounts your script's directory as a volume and runs the script with a Node version available as a container on Docker Hub.

edit: And now I'm considering actually doing it, because I have a similar problem with some build scripts.

edit: and before someone points it out, my build script can't run entirely in a container, because of legacy things that load Windows GUI DLLs that nobody wants to rewrite.
-
@ObjectMike said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
I just would have used curl and jq.
Nice, that's installed on our RHEL machines. Need to remember to check that out, should I ever need it.
-
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Breaking changes in Python are not unheard of
They don't break things between versions much, except for the godawful 2 → 3 change, but that was a major version change so nobody can claim that they weren't warned.
Breaking division was an awful thing to do to any code that was using math quite a bit, and things took a long time to revalidate afterwards. The other changes were larger but much easier to handle the migration for as tooling was good at finding them and/or they were easy to search for. But if anyone suggests changing division semantics, put their pulped body in the dumpster before setting it on fire.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
But if anyone suggests changing division semantics, put their pulped body in the dumpster before setting it on fire.
So you're saying it would be a divisive change?
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Breaking changes in Python are not unheard of
They don't break things between versions much,
Not much but not zero either is what I was trying to say.
Breaking division was an awful thing to do to any code that was using math quite a bit, and things took a long time to revalidate afterwards.
TBF the 2.x behavior was moronic to begin with in a weakly typed language. Not that always using bigint was much better.
-
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
TBF the 2.x behavior was moronic to begin with in a weakly typed language.
But changing it was terrible because it was so damn difficult to track down all the places where changes needed to happen. The problem wasn't finding all the `/` symbols, but rather auditing each of them to figure out whether that should change to `//` or not, when the operands were often parameters or fields and nobody'd really documented what the types were. Nobody really cares if the operator changes to become platonically more correct, but they do care a lot if their shit breaks weirdly.

Contrast this with, say, the changes to `print`. Those were obnoxious, but a simple syntax checker would point out exactly everywhere there was a problem and you could just sit down and grind through fixing them all with the help of a large coffee.
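For reference, a quick sketch of the Python 3 semantics being argued about (note that `//` floors toward negative infinity, unlike C's truncation toward zero, so the two disagree on negative operands):

```python
# True division: always a float in Python 3, even for int operands.
print(5 / 2)    # 2.5
print(4 / 2)    # 2.0, not 2

# Floor division: the explicit "I really meant integer math" operator.
print(5 // 2)   # 2
print(-7 // 2)  # -4: floors toward -inf, where C's truncation gives -3

# The audit problem in miniature: which behavior did the old code intend?
def midpoint(lo, hi):
    # An index calculation almost certainly wants the integer result,
    # so under Python 3 it must be spelled // explicitly.
    return (lo + hi) // 2
```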
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
This isn't just not user-friendly, it's actively user-hostile.
*nix.
-
@dcon said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
This isn't just not user-friendly, it's actively user-hostile.
*nix.
User friendly, but choosy about who it is friends with.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@dcon said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
This isn't just not user-friendly, it's actively user-hostile.
*nix.
User friendly, but choosy about who it is friends with.
Wdym, it showereth its users with grandmotherly kindness.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Breaking division was an awful thing to do to any code that was using math quite a bit
Well, who would do that in an ML frontend.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@LaoC said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
TBF the 2.x behavior was moronic to begin with in a weakly typed language.
But changing it was terrible because it was so damn difficult to track down all the places where changes needed to happen. The problem wasn't finding all the `/` symbols, but rather auditing each of them to figure out whether that should change to `//` or not, when the operands were often parameters or fields and nobody'd really documented what the types were. Nobody really cares if the operator changes to become platonically more correct, but they do care a lot if their shit breaks weirdly.

"We can't clarify if this division should round or not because we don't know what the intended semantics of this code was to begin with."
Yeah, that's certainly a problem.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Of course I didn't have to use Python. I could use anything else. In fact, Python wasn't even my first choice. I tried JS (Node) first.
Dynamic languages are hot garbage. We've known this for decades. That knowledge fell out of fashion for a while in the early part of the 21st century, and then a lot of developers got burned and have painfully returned to what we've known all along.
How long does it take to run this in something actually designed for programming, rather than a toy scripting system designed to make the monkey dance when you mouse over it?
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
This isn't just not user-friendly, it's actively user-hostile.
I see you've met the *nix ecosystem.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
I've bricked more OS installations by automatically installing and updating stuff from package repository than through all other methods combined.
What in the world have you been doing?
In my entire career, in my entire life, I have "bricked" a grand total of zero OS installations. By any means. The closest I came involved a buggy patch to a piece of consumer software that accidentally started executing a "delete the entire hard drive" command, which I caught and killed before it had erased more than half the system. It was repairable. And I was in my teens at the time.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
But if anyone suggests changing division semantics, put their pulped body in the dumpster before setting it on fire.
That was one of the few things Python unambiguously did right in its update. Having 5/2=2 has always been an ugly, error-prone mistake from the day Dennis Ritchie first coded it up. And like so many things in C and later C++, he got it wrong after earlier languages had already gotten it right, so it's not like he even has the excuse of being a pioneer breaking unfamiliar ground.

Python was absolutely correct to abandon the garbage that is C division.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
But changing it was terrible because it was so damn difficult to track down all the places where changes needed to happen. The problem wasn't finding all the `/` symbols, but rather auditing each of them to figure out whether that should change to `//` or not, when the operands were often parameters or fields and nobody'd really documented what the types were.

This isn't a division semantics problem; it's a dynamic typing problem. See above, re: don't use toy scripting languages to do non-trivial work.
-
@dkf said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
*nix.
User friendly, but choosy about who it is friends with.
A friend who makes you work to be one.
-
@Mason_Wheeler said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
I've bricked more OS installations by automatically installing and updating stuff from package repository than through all other methods combined.
What in the world have you been doing?
Half of it can be attributed to installing Nvidia drivers. The rest is mostly, but not only, distro upgrades.
-
@Mason_Wheeler said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Having 5/2=2 has always been an ugly, error-prone mistake from the day Dennis Ritchie first coded it up.
In statically typed languages, what's the alternative? Make the return type float? Disallow integer divisions entirely?
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@Mason_Wheeler said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Having 5/2=2 has always been an ugly, error-prone mistake from the day Dennis Ritchie first coded it up.
In statically typed languages, what's the alternative? Make the return type float? Disallow integer divisions entirely?
The same thing that Python eventually did: make the return type a float and create a distinct integer division operator for when that's what you want. Pascal did that before C even existed, and it's the most correct solution.
-
@Gustav said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
@Mason_Wheeler said in I knew Python was slow, but not THIS slow. And I knew JS was bad, but not THIS bad:
Having 5/2=2 has always been an ugly, error-prone mistake from the day Dennis Ritchie first coded it up.
In statically typed languages, what's the alternative? Make the return type float? Disallow integer divisions entirely?
Make the return type float. If you expressly want an integer, use a different operator to signify the difference. I've seen languages use \ for this.

The trick is you have to get this in early so you don't get shafted with the 'it was this before, now it's that' problem.