Article Discourse developers badly need to read (but will not)



  • This is a dependency visualization of every Rails app I’ve ever used.

    In other words, this gem wasn’t used by your app. It wasn’t used by Rails directly. It wasn’t used by ActionMailer directly. It was used deep in the bowels of the ActionMailer implementation and it was using far too much memory. Every single Rails app in existence was using 10MB too much due to this issue.



  • BTW, C# is great at avoiding this problem by virtue of having a HUGE standard library that contains about 99.9% of what any app needs.

    Sadly, it's not perfect. I wish I had worked here before our app got hooked to that beast named "WebAPI2". We use maybe 10% of its capability, but have spent hundreds of hours working around its bugs and flaws.



  • That looks like one of those bizarre creatures that lives deep on the ocean floor around thermal vents and stuff.


  • FoxDev

    Either that, or something that would kill you in three days if you contracted it



  • @blakeyrat said:

    In other words, this gem wasn’t used by your app. It wasn’t used by Rails directly. It wasn’t used by ActionMailer directly. It was used deep in the bowels of the ActionMailer implementation and it was using far too much memory. Every single Rails app in existence was using 10MB too much due to this issue.

    The only WTF here is that this MIME type class library (or "gem", whatever) used too much RAM.

    I don't see the point of OMG EVERYONE'S USING THIS WITHOUT EVEN KNOWING IT!!!1!

    Well, since every web app needs some kind of MIME functionality, of course every fucking app ends up using this gem. I don't see why that's so surprising. Whether it's packaged as a third-party library or as part of a monolithic framework, the code you need somehow ends up inside your app. Otherwise, you don't get the functionality.

    The article has some valid points. Don't include the code you don't need. Don't require an entire library just to get one function you could easily copy/paste on its own. Design your libraries to use as few dependencies as possible.

    Too bad the example they use to make that point is flawed.



  • @cartman82 said:

    Too bad the example they use to make that point is flawed.

    It's Ruby. What did you expect?



  • This is how I feel about software that tries to move everything to a plugin system, for example Firefox.

    Sure, you're adding a lot of flexibility, but now you have to trust hundreds of random developers to do their part correctly, and each of them will want to use different libraries to do the same thing (like the HTTP example in the article).



  • @anonymous234 said:

    (like the HTTP example in the article).

    Yeah, to me the HTTP example is FAR more egregious than the MIME handler example.

    Why are there so many HTTP classes in Ruby? ... why is there more than literally one?


  • I survived the hour long Uno hand

    Node has the same problem.

    For my latest toy project I used json for the config files instead of yaml, because json can be read natively by node, so I had fewer dependencies.


  • Grade A Premium Asshole

    @blakeyrat said:

    Every single Rails app in existence was using 10MB too much due to this issue.

    So what? I mean, sure, if I were deploying 10K instances of it, I might care. But even at 100 instances deployed as inefficiently as possible it is only ~1GB of memory. Memory is cheap, I couldn't give a shit less about an excess 10MB of RAM.



  • I was expecting an article titled "How not to be terrible at application design and implementation and QA and public relations". Or is that too weirdly specific?



  • @blakeyrat said:

    Why are there so many HTTP classes in Ruby? ... why is there more than literally one?

    Do you really even need to ask?



  • The .net ecosystem has a million monkeys running around, and only two HTTP clients in it. WebClient and HttpClient. And WebClient is deprecated (I believe-- too lazy to check.)



  • @blakeyrat said:

    BTW, C# is great at avoiding this problem by virtue of having a HUGE standard library that contains about 99.9% of what any app needs.

    PHP also brags about this. It sucks profoundly.



  • Which means that the .NET one is written really well, if people don't say it sucks profoundly and there aren't a bunch of attempts to make better ones.



  • It does not mean it is written really well, just that it does not suck profoundly.

    There are many languages where people don't say the standard library sucks profoundly and where there aren't a bunch of attempts to make replacements. PHP is a notable example, possibly because it has a long history of suck; while the maintainers are trying to fix it, they are also restricted by backward compati(de)bility, and the replacements, which also suck, already exist and can't easily be undone. Another notable example is D, which underwent a Great Standard Library Schism, and that was a big part of the reason it failed to get much traction. And I suspect Node is another, possibly because Continuation Passing Style leads to much less readable code and everybody is trying to fix it, not seeing that it can't be done without proper syntactic sugar (async/yield).


  • FoxDev

    @Bulb said:

    Continuation Passing Style leads to much less readable code

    Only if you don't know how to write it properly (like any other language tool)



    Oh, I am sure you can write readable CPS code. But Joe The Bold Brash Greenhorn can't, because it takes some learning, experience, and thought, and he has none of those. But because he's bold and brash, instead of learning he thinks he can fix it. And so another library is born. And it solves nothing, because it would take the same amount of learning and experience to even know what there is to fix, and, well, the bold brash greenhorn isn't in it for the learning.



  • @RaceProUK said:

    Only if you don't know how to write it properly (like any other language tool)

    That's kind of the point of this discussion, isn't it?

    Node.JS's job is to teach users how to write it properly. To make writing it properly significantly easier than writing it wrong, so even the laziest most ignorant developer won't go out of their way to write it wrong.


  • FoxDev

    @blakeyrat said:

    Node.JS's job is to teach users how to write it properly. To make writing it properly significantly easier than writing it wrong, so even the laziest most ignorant developer won't go out of their way to write it wrong

    I like Node, but I'll be the first to admit it's failing in this respect. Then again, it's not helped by CPS being tricky to understand in the first place; thankfully, the new Promises are a lot easier to get right. Or they would be, if the core libraries had been updated to use them...



    I get the distinct sense that nobody involved in the creation or maintenance of Node.JS has more than 6 months' experience in any other programming environment. There certainly ain't any C#/.net users up in there.



  • Node is an awesome environment, it's just held back by the fact that JavaScript sucks. If they were to fork JavaScript or something it could be much much better. Have an option to hide the async stuff and make things appear to be blocking, and that will help with newbies. Also add real classes, though I'd leave the prototype stuff in because it is useful in some cases. And give the programmer access to threads. Bam, perfection.



  • @RaceProUK said:

    Promises

    I don't work in node, so I don't know, but I wonder what they actually are, as it does not seem they could be what they are in other languages: a way to synchronously wait for the result of an asynchronous call.

    @mott555 said:

    If they were to fork JavaScript or something it could be much much better.

    The whole reason for node's existence is that it uses JavaScript. Without that, there would be no point as there are much better environments in other languages already.

    @mott555 said:

    And give the programmer access to threads.

    Yes, in the end, event loop and async IO can only get you so far, because the most efficient way to do IO is in page faults (via mmap), but that for rather obvious reasons can't be done async. Plus even if you are not trying, some major page faults are going to happen anyway and without more threads to schedule, you can't utilize that CPU time.

    Unfortunately threading and dynamic languages don't go together very well. Dynamic languages are generally expected not to be able to cause hard crashes, but that's very hard to guarantee in the presence of data races. So these languages need to avoid races in their metadata even if the user fails to lock properly, and that leads to things like the global interpreter lock (in Python, and I believe Ruby as well). Even Java takes some penalty for making pointer writes atomic on platforms where they natively are not (on x86 they are, but not on ARM and others). The only languages that can utilize the full power of threads are the ones that embrace Undefined Behaviour (C++), the purely functional ones (Haskell), and the ones that can guard against data races with static analysis (Rust). The rest needs to rely on multiple processes (which is, fortunately, perfectly fine for web servers).



  • @Bulb said:

    the most efficient way to do IO is in page faults (via mmap), but that for rather obvious reasons can't be done async.

    Never heard of non-blocking IO? That's essentially the same (with some sugar).

    Moar:

    No second onebox for me 😤



  • @Bulb said:

    Yes, in the end, event loop and async IO can only get you so far, because the most efficient way to do IO is in page faults (via mmap), but that for rather obvious reasons can't be done async. Plus even if you are not trying, some major page faults are going to happen anyway and without more threads to schedule, you can't utilize that CPU time.

    If a web forum relies on IO enough that that would be necessary, I'm not sure how you got a billion concurrent users to not melt your raspberry pi.



  • @Bulb said:

    The only languages that can utilize the full power of threads are the ones that embrace Undefined Behaviour (C++), the purely functional ones (Haskell), and the ones that can guard against data races with static analysis (Rust).

    There's a fourth type: languages that have something like valgrind for threading. Go has -race as a build flag that makes it detect data races at runtime.
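
    Something like this toy program (made up on the spot, not from any real codebase) is the kind of thing it catches; build or run it with -race and the runtime prints a WARNING: DATA RACE report with the stacks of both conflicting accesses:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        counter := 0 // shared by two goroutines with no synchronization
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                counter++ // unsynchronized read-modify-write: a data race
            }()
        }
        wg.Wait()
        fmt.Println(counter)
    }

    // go run -race racy.go   =>   WARNING: DATA RACE (plus goroutine stacks)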



  • @ben_lubar said:

    Go has -race as a build flag that makes it detect data races at runtime.

    Then what happens? It just crashes and leaves users fucked?



  • If you're releasing a build of your program with debugging instrumentation in it, your users are already fucked. It's for developers who want to know where the data races are.



  • Well, first of all, you didn't say that was a debugging mode, you only said it was a compiler flag. PROTIP: as I type here like 57 times a day, I AM NOT TELEPATHIC.

    Secondly, does that actually do analysis to find potential data races, or does it only "detect" them if they happened to occur while the developer is debugging?

    Thirdly, you still haven't answered the question: what happens after they're detected? I assumed the program crashed, but you didn't say and I don't know.



  • Article blakeyrat developers badly need to read (but will not)



  • @swayde said:

    Never heard of non-blocking IO? That's essentially the same (with some sugar).

    Page faults are, by their very nature, blocking.

    Whether you use async or non-blocking does not make that much difference. It is mmap that does.

    @ben_lubar said:

    There's a fourth type: languages that have something like valgrind for threading. Go has -race as a build flag that makes it detect data races at runtime.

    No, there isn't. Providing tools to debug races is nice, but you can't rely on the developer doing it. So the language will still either have undefined behaviour, or pay the price of some limited synchronization of everything to avoid the worst problems.



  • Having your program crash because you fucked up synchronization is no different from having your program crash because you fucked up any other part of it. A compiler author can only promise that your program will do what you wrote it to do, not necessarily what you want it to do.

    However, if you use channels for cross-thread communication and don't hold the same pointers in multiple threads simultaneously, you can guarantee that your program will not have synchronization-related problems.
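
    Roughly this pattern (the types and names below are made up for illustration): the sender hands a pointer over a channel and stops touching it, so at any moment only one goroutine owns the memory behind it:

    package main

    import "fmt"

    type job struct {
        id    int
        items []string
    }

    func worker(jobs <-chan *job, done chan<- int) {
        for j := range jobs {
            // this goroutine is the sole owner of j now,
            // so it can mutate it without any locking
            j.items = append(j.items, "processed")
            done <- j.id
        }
    }

    func main() {
        jobs := make(chan *job)
        done := make(chan int)
        go worker(jobs, done)

        j := &job{id: 1, items: []string{"raw"}}
        jobs <- j // ownership of j transfers to the worker;
        // the sender must not touch j again until it's handed back
        fmt.Println("finished job", <-done)
        close(jobs)
    }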


  • Discourse touched me in a no-no place

    @Bulb said:

    Whether you use async or non-blocking does not make that much difference. It is mmap that does.

    FYI, mmap only ever helps with loading plain files, whereas async I/O is mainly for other types of device. It particularly helps with sockets, though it's very much not the only use; it's applicable to anything you can select/poll on.

    You can't mmap a socket. The OS will tell you to bugger off if you try (probably with ENODEV).

    Windows has similar capabilities, but labels the API calls and errors differently.

    @blakeyrat said:

    Secondly, does that actually do analysis to find potential data races, or does it only "detect" them if they happened to occur while the developer is debugging?

    Probably only finds ones that actually occur while running. Detecting all potential ones is a… different class of problem entirely. That sort of thing requires non-trivial analysis tools like model checkers, and those things really don't scale well at all. (I used to write this sort of thing professionally. It took a supercomputer of the time to analyse even a pretty small program.)

    People round here are used to going WTF when an algorithm is merely quadratic. The true general analysis tools, unless someone's made an amazing breakthrough, tend to be at least EXPSPACE, and often much worse. Scaling with them is a joke. Fortunately, most programs are largely tractable in practice.


  • Discourse touched me in a no-no place

    @Bulb said:

    Unfortunately threading and dynamic languages don't go together very well.

    I see you've only ever seen one model in use. There are others that do scale well and which don't crash. They do this by not trying to use a single flat memory space shared across all threads. Partitioning adds overhead to some algorithms, but makes everything much easier to analyse.

    Most code is mostly bound to a single thread most of the time. Even in C++, the language you propose as the “super thread-aware high performance wunderkind god's gift to languages”, most algorithms are mostly single threaded and most memory is only used from a single thread. This is how most people comprehend algorithms; thinking in parallel is difficult.

    @Bulb said:

    The rest needs to rely on multiple processes (which is, fortunately, perfectly fine for web servers).

    Multiprocess is good in some ways — the OS can help you with security partitioning! — but comes with quite a bit of overhead. Can't have one without the other.


  • Java Dev

    @dkf said:

    You can't mmap a socket. The OS will tell you to bugger off if you try (probably with ENODEV).

    Sure you can. You just have to setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &params, sizeof(params)); and/or setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &params, sizeof(params)); first, with appropriately populated parameters. See https://www.kernel.org/doc/Documentation/networking/netlink_mmap.txt.



  • Use typescript or ES6.



  • @dkf said:

    ; thinking in parallel is difficult.

    https://en.wikipedia.org/wiki/Amdahl's_law
    And often futile, unless you have embarrassingly parallel problems.



  • @Bulb said:

    Oh, I am sure you can write readable CPS code

    I'm not. CPS is (or can be) useful as a compiler intermediate language, but it's pretty much impenetrable to use as a normal written language. It's also ridiculously verbose.

    @ben_lubar said:

    However, if you use channels for cross-thread communication and don't hold the same pointers in multiple threads simultaneously, you can guarantee that your program will not have synchronization-related problems.

    That's a rather bold* statement.

    * Well, perhaps "brave" is a better wording. [b]After all, this is a bold statement.[/b] Or, at least, it is [i]until you try quoting it in Discourse[/i].



  • @tufty said:

    That's a rather bold* statement.

    Sorry, I should have said "you will not have data races". Because it's impossible to have data races within a thread and each location in memory would only be accessible to whatever thread currently owns it.


  • Trolleybus Mechanic

    @tufty said:

    That's a rather bold*strong text statement.

    💾🐎TFY



  • @dkf said:

    FYI, mmap only ever helps with loading plain files

    I know. But most server applications also need some local data, and since these days disk and network throughput are comparable, the disk can easily become a bottleneck if you do it synchronously in a single thread. And non-blocking does not work for disk. AIO does, but I am not sure how efficient it is. Also, I don't know whether there is an asynchronous alternative to sendfile.

    @dkf said:

    Probably only finds ones that actually occur while running.

    I can't speak for Go, but what valgrind does is detect when a memory location is accessed by more than one thread without a lock that is consistently held around each access. That's just a heuristic that might point out problems, and at the same time it has a lot of false positives.

    @dkf said:

    I see you've only ever seen one model in use. There are others that do scale well and which don't crash. They do this by not trying to use a single flat memory space shared across all threads. Partitioning adds overhead to some algorithms, but makes everything much easier to analyse.

    Yes, I've only seen the common languages (python, ruby, perl, groovy). Which language does this? I would like to look at that.

    @dkf said:

    Even in C++, the language you propose as the “super thread-aware high performance wunderkind god's gift to languages”

    No, I don't. C++ allows everything you need, but leaves everything up to you, making it hard and error-prone.

    @ben_lubar said:

    However, if you use channels for cross-thread communication and don't hold the same pointers

    … a thing Go does absolutely nothing to prevent you from doing. Rust does, though.



  • @blakeyrat said:

    The .net ecosystem has a million monkeys running around, and only two HTTP clients in it. WebClient and HttpClient. And WebClient is deprecated (I believe-- too lazy to check.)

    Don't forget about good ol' HttpRequest, HttpWebRequest and WebRequest.

    @Bulb said:

    @RaceProUK said:
    Promises

    I don't work in node, so I don't know, but I wonder what they actually are, as it does not seem they could be what they are in other languages: a way to synchronously wait for the result of an asynchronous call.

    Promises are best compared to callbacks, but with a slightly different syntax:

    myAsyncMethod().then(function (successResult) {
        // runs if the promise resolves (the async call succeeded)
        alert("yay it worked");
    }, function (failureResult) {
        // runs if the promise rejects (the async call failed)
        alert("boo it failed");
    });

  • Discourse touched me in a no-no place

    @Bulb said:

    Also, I don't know whether there is an asynchronous alternative to sendfile.

    The shipping of bytes from a file directly to a socket tends to be a small part of overall processing. It's only ever really applicable to CDNs (or crufty old protocols like FTP 😱) and everyone just uses them for real instead, and once you've got an encrypted connection, you can't use sendfile (unless you're able to offload the encryption with the session key into the network hardware, which is not most folks). The few people left who need it tend to either not care about performance or use nginx. 😉

    Apart from that, the main technique of speeding up I/O to files is apparently to use scatter/gather. I've yet to see any application that was running so tight to the limit that merely reducing the number of system calls (as opposed to IOPS) made a huge difference.

    @Bulb said:

    Yes, I've only seen the common languages (python, ruby, perl, groovy). Which language does this? I would like to look at that.

    Tcl. The script-level threading API should be shipped with any 8.6 version, and is available as a third-party package for 8.1 through 8.5. Disclosure: I'm one of the developers, though I mostly stay out of the threading side of things. ;)

    @Bulb said:

    making it hard and error-prone

    :giggity:

    I think the latest C++ standard provides a little bit more than C, but not hugely more. It's still effectively not much more than the brain-damaged capabilities available in POSIX threads…



  • @ben_lubar said:

    Sorry, I should have said "you will not have data races". Because it's impossible to have data races within a thread and each location in memory would only be accessible to whatever thread currently owns it.

    There's always going to be a bigger idiot.
    And today, that idiot is going to be me:



  • @AlexMedia said:

    Don't forget about good ol' HttpRequest, HttpWebRequest and WebRequest.

    Those aren't three different HTTP clients; those are three additional entry-points to the two HTTP clients we've already discussed.



  • @ben_lubar said:

    if you use channels for cross-thread communication and don't hold the same pointers in multiple threads simultaneously, you can guarantee that your program will not have synchronization-related problems

    Deadlock is still possible.



  • At least Go crashes when it deadlocks instead of just sitting there forever. Although it's pretty hard to deadlock, say, every single thread in a web server.



  • @ben_lubar said:

    At least Go crashes when it deadlocks

    Is that by design? How does it know?



  • If the scheduler sees that every thread is waiting on a synchronization primitive, it crashes with a stack trace.
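
    Something as trivial as this trips it (the snippet is a made-up toy, not real code); the only goroutine blocks on a channel nothing will ever send on:

    package main

    func main() {
        ch := make(chan int)
        <-ch // nothing can ever send on ch and no other goroutine exists,
        // so the runtime aborts with
        // "fatal error: all goroutines are asleep - deadlock!" and a stack trace
    }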



  • @ben_lubar said:

    it crashes with a stack trace.

    Do you mean several stack traces? That'd be preferable.

