Astronomically bad



  • Modern theoretical astrophysics relies heavily on simulations to model astrophysical processes that unfold over enormous timescales. The people who write these simulations are held in high esteem by the community at large. The code inside them... well, see for yourself. (Anonymized to protect the criminally stupid.)

    file_stuff.c

    #include "globals.h"

    void set_labels(void) {
        enum field_names i;

        for (i = 0; i < NUM_FIELD_NAMES; i++) {
            switch (i) {
            case FIELD_1:
                strncpy(labels[FIELD_1], "AAA ", 4);
                break;
            case FIELD_2:
                strncpy(labels[FIELD_2], "BBB ", 4);
                break;
            case FIELD_3:
                strncpy(labels[FIELD_3], "CC  ", 4);
                break;
            case FIELD_4:
                strncpy(labels[FIELD_4], "DDDD", 4);
                break;
            case FIELD_5:
                strncpy(labels[FIELD_5], "E   ", 4);
                break;
            case FIELD_6:
                strncpy(labels[FIELD_6], "FFF ", 4);
                break;
            case FIELD_7:
                strncpy(labels[FIELD_7], "GGGG", 4);
                break;
            case FIELD_8:
                strncpy(labels[FIELD_8], "HHH ", 4);
                break;
            case FIELD_9:
                strncpy(labels[FIELD_9], "III", 4);
                break;
            case FIELD_10:
                strncpy(labels[FIELD_10], "JJJJ", 4);
                break;
            case FIELD_11:
                strncpy(labels[FIELD_11], "KKKK", 4);
                break;
            }
        }
    }

    globals.h

    typedef enum field_names {FIELD_1, /*snip*/, FIELD_11} field_names;

    #define NUM_FIELD_NAMES 11

    extern char labels[NUM_FIELD_NAMES][4];

    globals.c

    #include "globals.h"

    char labels[NUM_FIELD_NAMES][4]; /*!< This table holds four-byte character tags used for file output */

     

    Given this setup, you might assume that labels is used throughout the program. It is used exactly once -- where set_labels() is called.

     I can't help but think that the person who wrote this (a most esteemed theoretical astrophysicist) had a thought process that went something like: "I need to populate an array of string literals. I know! I'll use an enum to make it easily accessible! But how do I put them together? I know! I'll use my trusty for-loop!"

    And this isn't the worst coding offense I have seen in astronomy software. At least they are using Doxygen comments.



  • They write software like that in C? Dear God.



  • It's our old pal the for-switch loop!



  • @blakeyrat said:

    They write software like that in C? Dear God.

    Most numerical software in astronomy is written in C or FORTRAN 77. I prefer bad C over good FORTRAN 77, personally. There are a few brave souls who have ventured into using C++. Remember how I mentioned this isn't the worst that I have seen? Yeah, C++ is a little too technical for some folks. My favorite WTF was using member templates to pass boolean parameters to a method in lieu of formal parameters. It's C++, we have to use templates!



  • @communist_goatboy said:

    Most numerical software in astronomy is written in C or FORTRAN 77.

    But why? I have to admit I have no knowledge of FORTRAN, but I can't think of a worse language than C for this use.

    Is it just the same as the video games industry where they used C because they had a good reason to 15 years ago, and now that they no longer do they can't switch because they're all dinosaurs?



  • @communist_goatboy said:

    At least they are using Doxygen comments.
     

    This does not earn points in my book.  But then again, that's probably due to the particular way in which our projects (mis|ab)use Doxygen.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    But why?
    Because that's what they have expertise in. As someone new comes in, they've got to learn the crusty old languages and libraries (they need to work with existing code after all) and after that they're typically part of the problem, producing more code in those crusty old languages. There's no focus on redoing things right, because the code isn't what anyone's career advancement is tied to. (Well, not historically.)

    It could be worse. It could've involved IDL. (Never used it? It's like Fortran decided to rape a scripting language. And I use that word deliberately. I hate IDL!)



  • Right, which is why "we've always done it that way" is the most harmful sentence in the English language.



  • Here's the thing about academic-y types coding: The code is a means to an end, rather than the product itself. I'm guessing most people on this forum have a software engineering background, and probably care more than the average programmer about writing quality code and avoiding WTFs. You're steeped in code through school, and writing code is integral to a CS degree in the way math is integral to a physics degree. (Yes, I know CS != software engineering, but that still seems like a common enough degree->career path).

    I'm not in academia, but I'm more scientist* than software engineer. I try to write sane, elegant, comprehensible code, but sometimes time pressures force me to do some ugly, ugly things to meet deadlines. I care about my code quality more than most of my peers, but the analysis I'm doing is always more important than code I might only use for one specific project.

    I'm not going to defend that code as good code. However, the time of a scientist is finite, much like everyone else's. If you suggest giving them stronger programming backgrounds, then you'll have to forego some other knowledge and skills. "Because we've always done it that way" isn't a great rationale, but "Because it's time-consuming to transition to a more modern language which few astrophysicists have even a passing familiarity with, for marginal benefits" sums up the situation pretty darned well.

    *Statistician, AKA "data scientist", as the current buzzword goes.



  • @thatmushroom said:

    I'm not going to defend that code as good code. However, the time of a scientist is finite, much like everyone else's. If you suggest giving them stronger programming backgrounds, then you'll have to forego some other knowledge and skills. "Because we've always done it that way" isn't a great rationale, but "Because it's time-consuming to transition to a more modern language which few astrophysicists have even a passing familiarity with, for marginal benefits" sums up the situation pretty darned well.

    Be that as it may, I cannot help but believe that the kind of thought process responsible for writing


    enum field_names i;
    for (i = 0; i < NUM_FIELD_NAMES; i++) {
        switch (i) {
        case FIELD_1:
            strncpy(labels[FIELD_1], "AAA ", 4);
            break;
        case FIELD_2:
            strncpy(labels[FIELD_2], "BBB ", 4);
            break;
        case FIELD_3:
            strncpy(labels[FIELD_3], "CC  ", 4);
            break;
        case FIELD_4:
            strncpy(labels[FIELD_4], "DDDD", 4);
            break;
        case FIELD_5:
            strncpy(labels[FIELD_5], "E   ", 4);
            break;
        case FIELD_6:
            strncpy(labels[FIELD_6], "FFF ", 4);
            break;
        case FIELD_7:
            strncpy(labels[FIELD_7], "GGGG", 4);
            break;
        case FIELD_8:
            strncpy(labels[FIELD_8], "HHH ", 4);
            break;
        case FIELD_9:
            strncpy(labels[FIELD_9], "III", 4);
            break;
        case FIELD_10:
            strncpy(labels[FIELD_10], "JJJJ", 4);
            break;
        case FIELD_11:
            strncpy(labels[FIELD_11], "KKKK", 4);
            break;
        }
    }

    rather than

    strncpy(labels[FIELD_1], "AAA ", 4);
    strncpy(labels[FIELD_2], "BBB ", 4);
    strncpy(labels[FIELD_3], "CC  ", 4);
    strncpy(labels[FIELD_4], "DDDD", 4);
    strncpy(labels[FIELD_5], "E   ", 4);
    strncpy(labels[FIELD_6], "FFF ", 4);
    strncpy(labels[FIELD_7], "GGGG", 4);
    strncpy(labels[FIELD_8], "HHH ", 4);
    strncpy(labels[FIELD_9], "III", 4);
    strncpy(labels[FIELD_10], "JJJJ", 4);
    strncpy(labels[FIELD_11], "KKKK", 4);

    has surely got to cause complicator's-gloves bogosity in everything its owner touches. Sure, a non-programmer might not be sufficiently across the use of C initializers and the relationships between C arrays, strings, and pointers to get the whole table neatly set up at compile time (a sketch of what that could look like follows below), but honestly the for-switch antipattern is not a matter of unfamiliarity with programming languages or even of defensible site conventions -- it has to reflect some kind of fundamental cognitive deficiency.
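
    For the record, here is roughly what that compile-time version could look like. This is only a sketch: the enum, the label text, and the padding are hypothetical stand-ins for the anonymized originals.

    #include <stdio.h>

    /* Hypothetical stand-ins for the anonymized field names. */
    typedef enum { FIELD_1, FIELD_2, FIELD_3, NUM_FIELD_NAMES } field_names;

    /* The whole table is set up at compile time with C99 designated
       initializers: no loop, no switch, no strncpy at runtime.  Any
       bytes not covered by a literal are zero-filled. */
    static const char labels[NUM_FIELD_NAMES][4] = {
        [FIELD_1] = "AAA",
        [FIELD_2] = "BBB",
        [FIELD_3] = "CC",
    };

    int main(void) {
        for (int i = 0; i < NUM_FIELD_NAMES; i++)
            printf("%.4s\n", labels[i]);
        return 0;
    }

    If the real code genuinely needs space-padded four-byte tags, the literals just get written out with their trailing blanks ("AAA ", "CC  ", and so on); the point is only that no runtime loop is needed at all.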



  • @thatmushroom said:

    You're steeped in code through school, and writing code is integral to a CS degree in the way math is integral to a physics degree. (Yes, I know CS != software engineering, but that still seems like a common enough degree->career path).

    To call it "integral" would be to exaggerate the relationship between writing code and the CS degree course I studied. In the first year, it was necessary to write a total of about 200 lines. In the second year there was a group project; I wrote maybe 400 lines, but there was one person on my team allocated to documentation who didn't code anything, and I can't remember whether the project manager wrote any code either. In the third year, I chose to do a software project, but other options were hardware, pure theory, or probably even psychology (UX). It was definitely possible to graduate with a good grade having written only about 200 lines of code.

    Maths is integral to a CS degree in the way it's integral to a physics degree, but getting any experience at writing code is a bonus.


  • ♿ (Parody)

    @blakeyrat said:

    @communist_goatboy said:
    Most numerical software in astronomy is written in C or FORTRAN 77.

    But why? I have to admit I have no knowledge of FORTRAN, but I can't think of a worse language than C for this use.

    FORTRAN has traditionally been strong in numerical simulation sort of stuff. There is a lot of code out there. I assume someone modernized at some point and started using C. But that's a fairly big deal. Worse than simply rewriting in the same language. They'd have a lot of extra validation and verification to do before anyone would trust the new stuff. And they'd probably end up with a heap of crap no matter what language they used. I'm sure it's not an easy case to make to your funding board (or whatever).



  • @blakeyrat said:

    @communist_goatboy said:
    Most numerical software in astronomy is written in C or FORTRAN 77.

    But why? I have to admit I have no knowledge of FORTRAN, but I can't think of a worse language than C for this use.

    Is it just the same as the video games industry where they used C because they had a good reason to 15 years ago, and now that they no longer do they can't switch because they're all dinosaurs?

    From what I remember from my couple of years dallying with a career in astronomy, it's a combination of those languages being very fast and of their being the languages everyone else uses. The second point will, I'm sure, raise plenty of hackles around here, but these people in many cases learned programming from other physicists/astronomers who only know and work with those languages. In my graduate program there was a computational physics course in which you could do the work in FORTRAN or C/C++. It was taught by a member of the physics department, and for a lot of people it was their first time writing code. Outside of that, if their research required programming it often meant working with their adviser's code, which again tends to be FORTRAN or C, and again it's often their first exposure, or one of their first exposures, to coding.

    To be fair they are fast languages with good support for writing massively parallel software (this last is kind of circular of course; the good support exists because physicists and astronomers knowing mostly FORTRAN and C needed to be able to write code to run their massive simulations on supercomputers and so people wrote it and improved it over time because of that need). And having a good base of existing code which has been used, tested, and fixed is not something which should be thrown away. But coding is a means to an end for most of these people and it's not high on most of their lists of priorities to improve their software skills or to find new languages or environments to work with.

    Of course this is all a great recipe for WTF code, but at least in this case they are not only the developers but also the users, and the pain they inflict is only on themselves and not on paying (or non-paying) customers.



  • @boomzilla said:

    ... someone *modernized* at some point and started using *C* ...

    Ha, that gave me a chuckle.



  • @blakeyrat said:

    @communist_goatboy said:
    Most numerical software in astronomy is written in C or FORTRAN 77.

    But why? I have to admit I have no knowledge of FORTRAN, but I can't think of a worse language than C for this use.

    Is it just the same as the video games industry where they used C because they had a good reason to 15 years ago, and now that they no longer do they can't switch because they're all dinosaurs?

    A lot of it has to do with legacy code and with people who were doing this stuff in the 70s/80s, when FORTRAN77 and C were the only real options for scientific computing. Why people today start new projects in these languages boils down to familiarity with the old tools and a complete lack of knowledge about using the right tool for the job. FORTRAN77 should die in a fire. FORTRAN95 and 2003 are actually pretty good languages for scientific computing, as they have nice array syntax and several parallel computing platforms support them (OpenMP, MPI, and so on; a minimal OpenMP example is sketched below). The newer parallel platforms like TBB, OpenCL, and CUDA are oriented around C++, which I think is the language with the best balance of abstraction and number-crunching power. The complexity of modern simulations really requires multiple languages (compiled and interpreted!) layered and connected properly. But we both know that is never going to happen. Although it might keep TDWTF open for a few more years...
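
    For what it's worth, the shared-memory side of this is not exotic. A minimal sketch in C with OpenMP (the array and the loop body are made up purely for illustration; compile with something like gcc -fopenmp example.c -lm):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    int main(void) {
        double *x = malloc(N * sizeof *x);
        double sum = 0.0;

        if (x == NULL) return 1;
        for (long i = 0; i < N; i++)
            x[i] = sin((double)i);

        /* Each thread gets a chunk of iterations; reduction(+:sum) gives
           every thread a private partial sum and combines them at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += x[i] * x[i];

        printf("sum of squares = %g\n", sum);
        free(x);
        return 0;
    }

    The MPI version is the same idea spread across processes instead of threads, with an explicit reduction at the end; the FORTRAN95 version is, if anything, shorter thanks to the array syntax.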



  • Two points:

    1)

    @thatmushroom said:

    Here's the thing about academic-y types coding: The code is a means to an end, rather than the product itself.

    Duh? That's always true. The only people who think otherwise are those open source Git-using types who couldn't write a functional application to save their own lives.

    2)

    @thatmushroom said:

    I'm not going to defend that code as good code. However, the time of a scientist is finite, much like everyone else's. If you suggest giving them stronger programming backgrounds, then you'll have to forego some other knowledge and skills.

    Why don't they make use of this amazing invention known as the "job market" and hire someone to write the code? That would save even more of their time, and the only skill they'd need is being able to explain what the fuck they're doing to another human being (which they'd have to do sooner or later anyway.)


  • Discourse touched me in a no-no place

    @blakeyrat said:

    Why don't they make use of this amazing invention known as the "job market" and hire someone to write the code?
    Some do, but they tend to be the exception and not the rule. What's more, they have a point: most programmers cannot write numeric-heavy software, even on pain of… well, a lot of pain. Making a complex algorithm be numerically stable so you actually get meaningful answers out is non-trivial. (Game programmers mostly use tricks and fudges to avoid having to do these really hard bits, but academic physics simulations really have to be exactly right. There's a number of stories about careers being ruined when it was found out that someone got things horribly wrong.) A small example of the kind of numerical trap involved is sketched below.

    Most of the time these days, I work with biologists. They make physicists look like coding gods.
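
    To make that concrete, here is just the standard textbook illustration, not any real simulation code. Both functions compute the same quantity on paper; only one of them survives contact with floating point when the mean dwarfs the spread.

    #include <stdio.h>

    /* Textbook one-pass variance, E[x^2] - E[x]^2.  Mathematically fine,
       numerically disastrous when the values sit far from zero, because
       two nearly equal large numbers get subtracted at the end. */
    static double variance_naive(const double *x, int n) {
        double s = 0.0, s2 = 0.0;
        for (int i = 0; i < n; i++) {
            s  += x[i];
            s2 += x[i] * x[i];
        }
        return s2 / n - (s / n) * (s / n);
    }

    /* Welford's streaming update: algebraically the same variance, but
       it never forms the huge intermediate terms. */
    static double variance_welford(const double *x, int n) {
        double mean = 0.0, m2 = 0.0;
        for (int i = 0; i < n; i++) {
            double d = x[i] - mean;
            mean += d / (i + 1);
            m2   += d * (x[i] - mean);
        }
        return m2 / n;
    }

    int main(void) {
        /* Small spread around a large offset: the true variance is 2/3. */
        double x[] = { 1e9 - 1.0, 1e9, 1e9 + 1.0 };
        printf("naive:   %g\n", variance_naive(x, 3));
        printf("welford: %g\n", variance_welford(x, 3));
        return 0;
    }

    The naive version returns garbage here (something wildly different from 0.666667), and the only "bug" is the order in which exact mathematics was translated into finite-precision operations.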



  • @dkf said:

    What's more, they have a point: most programmers cannot write numeric-heavy software, even on pain of… well, a lot of pain. Making a complex algorithm be numerically stable so you actually get meaningful answers out is non-trivial.

    Right, I get that 100%. But by using C, they're taking that amount of pain and cranking it up to 11.

    @dkf said:

    There's a number of stories about careers being ruined when it was found out that someone got things horribly wrong

    Well they can do what they like, but from my perspective I'd be a hell of a lot more worried that I accidentally wrote code that exposed one of the nasty warts in C and destroyed my career that way. At least in modern, memory-managed languages your algorithm is (by and large) just your algorithm and not "your algorithm and the 50,000 lines of boilerplate and bullshit required to manage your own memory and make up for C's lack of any useful data types for scientific work in a big congealed lump of crap".

    @dkf said:

    Most of the time these days, I work with biologists. They make physicists look like coding gods.

    I read an article a few years back about a bunch of genetic studies that had to be re-done because they were using Excel to store the gene sequences and Excel's dumb auto-type code thought some of the sequences were actually dates. Or something like that.



  • @blakeyrat said:

    Why don't they make use of this amazing invention known as the "job market" and hire someone to write the code? That would save even more of their time, and the only skill they'd need is being able to explain what the fuck they're doing to another human being (which they'd have to do sooner or later anyway.)
     

    Because transaction costs are huge, and because almost everybody who is able to understand what the fuck they're doing is another scientist of the same speciality, and those normally know the same language and create the same kinds of WTFs when coding.



  • @dkf said:

    @blakeyrat said:
    But *why*?
    It could be worse. It could've involved IDL. (Never used it? It's like Fortran decided to rape a scripting language. And I use that word deliberately. I hate IDL!)
     

    Haha, that is the best description of IDL I have heard. I am glad that I have finally found someone else who absolutely loathes that heinous language.

     



  •  @thatmushroom said:

    Here's the thing about academic-y types coding: The code is a means to an end, rather than the product itself.

    Granted, much of the software written in astronomy is one-off code that needs to perform some specific task and will likely never be used again. In that case, you can use Brainfuck for all I care. The code example I presented comes from a very widely-used simulation package with 1800+ citations. A means to an end, it is not.

     



  • @blakeyrat said:

    @dkf said:
    What's more, they have a point: most programmers cannot write numeric-heavy software, even on pain of… well, a lot of pain. Making a complex algorithm be numerically stable so you actually get meaningful answers out is non-trivial.

    Right, I get that 100%. But by using C, they're taking that amount of pain and cranking it up to 11.

    Eeehh, C isn't that bad. If you want a language with surprise typing, no memory management for file streams, a bastardized array interface (with even more surprise typing!), and one that is sold as a vectorized language but doesn't support vectorization beyond a single dimension and truncates mismatched dimensions, then look to IDL. Jesus fuck, someone kill that language.

    @blakeyrat said:

    @dkf said:
    There's a number of stories about careers being ruined when it was found out that someone got things horribly wrong

    Well they can do what they like, but from my perspective I'd be a hell of a lot more worried that I accidentally wrote code that exposed one of the nasty warts in C and destroyed my career that way. At least in modern, memory-managed languages your algorithm is (by and large) just your algorithm and not "your algorithm and the 50,000 lines of boilerplate and bullshit required to manage your own memory and make up for C's lack of any useful data types for scientific work in a big congealed lump of crap".

    Accuracy, speed, and scalability are the three rungs of the hierarchy of scientific computing. Maintainability is a distant and fleeting thought for most code-writers in science. I'm definitely not arguing this should be the case, but without those three fundamental characteristics, no one is going to care what you wrote. I am firmly in the camp that maintainable code is much easier to make accurate, fast, and scalable. But that camp is a very tiny fraction of the people who write scientific code. For the sake of speed, managing your own memory is really not that big of an obstacle. Yes, something like FORTRAN* or C++ can be of great help here, but sometimes doing it yourself is necessary to squeeze the last drop out of the silicon. For example, there is a team at Argonne writing a cosmological simulation designed to model a trillion particles using 100k+ CPU cores. They wrote their own memory pool allocator simply because new/delete were too slow (a toy sketch of the pool-allocator idea is below). They were getting such heavy CPU utilization that they managed to discover a defect in the fab process of the CPU sockets when they melted under water cooling. Even in less intensive applications, using a managed language has performance penalties that are too large to cope with. One thing I would like to see is the use of managed languages to sketch out programs before porting them to the heavy-lifting languages, so that people focus on algorithms rather than implementation details.

    *FORTRAN90/95/2003/2008 only
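
    Not the Argonne allocator, obviously, but a toy bump/pool allocator in C shows why this can beat general-purpose malloc/free or new/delete for huge numbers of fixed-size records: allocation is a pointer bump, and "freeing" everything is resetting a single counter. All the names here are hypothetical.

    #include <stdlib.h>

    /* A made-up particle record, purely for illustration. */
    typedef struct { double pos[3], vel[3], mass; } particle;

    /* Dead-simple arena: grab one big slab up front, hand out fixed-size
       slots by advancing an index, and release everything at once. */
    typedef struct {
        particle *slab;
        size_t    capacity;
        size_t    used;
    } particle_pool;

    static int pool_init(particle_pool *p, size_t capacity) {
        p->slab = malloc(capacity * sizeof *p->slab);
        p->capacity = capacity;
        p->used = 0;
        return p->slab != NULL;
    }

    static particle *pool_alloc(particle_pool *p) {
        if (p->used == p->capacity) return NULL;  /* pool exhausted */
        return &p->slab[p->used++];               /* just a bump */
    }

    static void pool_reset(particle_pool *p)   { p->used = 0; }
    static void pool_destroy(particle_pool *p) { free(p->slab); }

    int main(void) {
        particle_pool pool;
        if (!pool_init(&pool, 1u << 20)) return 1;

        for (int step = 0; step < 10; step++) {
            /* Per-step scratch particles come straight out of the slab... */
            particle *q = pool_alloc(&pool);
            if (q) q->mass = 1.0;
            /* ...and the whole step's allocations vanish in one call. */
            pool_reset(&pool);
        }

        pool_destroy(&pool);
        return 0;
    }

    There is no per-object bookkeeping, no locking, and no fragmentation, which is essentially the point; the trade-off is that individual objects can't be freed early.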



  • @dkf said:

    (Game programmers mostly use tricks and fudges to avoid having to do these really hard bits, but academic physics simulations really have to be exactly right. There's a number of stories about careers being ruined when it was found out that someone got things horribly wrong.)
    Six Words


  • Discourse touched me in a no-no place

    @communist_goatboy said:

    For the sake of speed, managing your own memory is really not that big of an obstacle.
    It's not really a big obstacle at all. The usual approach is to just allocate a big honking array and then just use that without changing it. Memory requirements? Allocate once and deallocate by making the process exit. Trivial. (Fortran's big advantage over C and C++ is that it's got better rules for handling aliasing, which apparently permits more aggressive optimisations, but I suspect that that would be true for many higher-level languages too; they've just not had nearly as much effort poured into optimisation yet.)

    It's getting the numerics right that is hard, and known to be so. The order of operations is extremely important, and mathematical operations cease to be associative in the normal sense, since you're managing the error as well as the result; this means that knowing what you get, and when you get it, matters ever so much. (The little demo below shows how quickly addition stops being associative.)
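
    The associativity point takes about ten lines of C to demonstrate; nothing here is specific to any particular simulation code.

    #include <stdio.h>

    int main(void) {
        /* Summing the same three numbers in two different orders.  In real
           arithmetic the results are identical; in double precision the
           small term is wiped out when it meets the big one first. */
        double big = 1e16, small = 1.0;

        double left_to_right = (big + small) + small;  /* small lost twice   */
        double regrouped     = big + (small + small);  /* small terms survive */

        printf("(big + s) + s = %.1f\n", left_to_right);
        printf("big + (s + s) = %.1f\n", regrouped);
        /* Typically prints 10000000000000000.0 vs 10000000000000002.0. */
        return 0;
    }

    Scale that difference up to a parallel reduction over a few billion terms, where the summation order depends on thread scheduling, and "the order of operations is extremely important" stops being an abstract warning.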

