Why things are done a certain way



  • A junior coworker was having trouble debugging through some code that looked like this (long method names replaced with single characters for brevity, spaces added by me):

    a().b().c().d(    e().f().g(      h().i().j()     )      );
    

    While stepping through the code, he couldn't understand why the debugger kept highlighting the same line over and over again. I showed him how to make it a bunch of individual function calls:

    HResult hr = h();
    IResult ir = hr.i();
    JResult jr = ir.j();
    

    EResult er = e();
    FResult fr = er.f();
    GResult gr = fr.g( jr );

    AResult ar = a();
    BResult br = ar.b();
    CResult cr = br.c();
    DResult dr = cr.d(gr); // net result of big statement

    so that you could step through it more easily. "But the compiler should be able to handle this code!" Um, yes it can, and so can the debugger; it's YOU who can't handle it: KISS!

    It turns out that one of the nested methods was returning a null object on which the subsequent hard coded method call caused a null pointer exception.
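
    For illustration, a minimal Java sketch of that breakdown, with made-up stand-ins for the real types (EResult, FResult and GResult here are hypothetical, and f() returns null on purpose). Wrapping each link in Objects.requireNonNull is one way to make the failing link name itself:

```java
import java.util.Objects;

class ChainNullCheck {
    // Hypothetical stand-ins for the real result types; f() returns null on purpose.
    static class GResult { }
    static class FResult { GResult g() { return new GResult(); } }
    static class EResult { FResult f() { return null; } }

    static EResult e() { return new EResult(); }

    /** Walks e().f().g() one link at a time and names the link that produced null. */
    static GResult stepwise() {
        EResult er = Objects.requireNonNull(e(), "e() returned null");
        FResult fr = Objects.requireNonNull(er.f(), "e().f() returned null");
        return Objects.requireNonNull(fr.g(), "e().f().g() returned null");
    }

    public static void main(String[] args) {
        try {
            stepwise();
        } catch (NullPointerException ex) {
            System.out.println(ex.getMessage()); // e().f() returned null
        }
    }
}
```

    Each temporary also gives the debugger a separate line to stop on, which was the whole point of splitting the statement.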

    Ya gotta love rookies.



  •  Well, to be fair, he has a point. The compiler should be able to determine precisely where the error is occurring in the chain and point to that precise location instead of just giving a line number, saving you from the tedious process of splitting the line up in order to facilitate debugging.



  • Check the state of the call stack when each method is executed, that'll help to pinpoint the error.



  • @Bumble Bee Tuna said:

     Well, to be fair, he has a point. The compiler should be able to determine precisely where the error is occurring in the chain and point to that precise location instead of just giving a line number, saving you from the tedious process of splitting the line up in order to facilitate debugging.

    I agree. Ideally debuggers should be able to deal with any sequence point, not just line numbers (and every function call and return is a sequence point).

    Need better debugger UI.



  • Whatever. TRWTF is having a line like a().b().c().d(e().f().g(h().i().j()));



  • @Zecc said:

    Whatever. TRWTF is having a line like a().b().c().d(e().f().g(h().i().j()));

    Bullshit.  Function chaining is a beautiful, magical thing.



  •  the wtf here is that bumble bee tuna thinks that an ideal debugger would solve all of his logic problems



  • @snoofle said:

    so that you could step through it more easily. "But the compiler should be able to handle this code!" Um, yes it can, and so can the debugger; it's YOU who can't handle it: KISS!

     

    Honestly, the person who wrote that code is the one who needs to consider KISS.  That code is a disaster, and it would be a disaster to debug: to find out exactly where the null is causing you problems, you would have to step into and then return from each function call.  Furthermore, you would have to see the return value, which isn't always straightforward in every debugger.

    Lesson 1:  Refactor this shit to put it on multiple lines

    Lesson 2:  Maybe rework the OO of the whole design



  • At least he spaced it to make the parentheses more obvious. So it's not completely stupid... just really stupid.



  • @Bumble Bee Tuna said:

     Well, to be fair, he has a point. The compiler should be able to determine precisely where the error is occurring in the chain and point to that precise location instead of just giving a line number, saving you from the tedious process of splitting the line up in order to facilitate debugging.

    It's not possible. By the time the error is detectable by an automated process, the damage is already done and it's too late to figure out where it came from. The debugger will stop on a line like:

    foo->bar=baz;

    And you'll see that 'foo' is NULL. So the question now is -- how did 'foo' get to be NULL? And to determine that, you have to go back to whatever set 'foo' in the first place. That's very hard if the code is something like:

    x(y(t), p(q));

    Especially if the function 'y' contains some code like 'if(t==NULL) return NULL; ...' and the 'if' is 'just in case' code.  There's no way to tell if that 'if' got triggered unless you put a breakpoint in it or step through the code manually.

    The OP is completely correct. It is the person who came to him for help that was confused by the code, not the compiler or the debugger. If they understood the flow of the code, they could easily have stepped to the precise point, but breaking the code up as the OP suggested (though perhaps not that radically) does make it significantly easier to troubleshoot.

    If you look at x(y(t),p(q)); you may really need a breakpoint to see the return value of 'y', and there's no good place to put it. You can open up the code for 'y', but unless it has only one return statement, you'll have a hard time ensuring you catch its return value.
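
    One workaround that needs no debugger support is a pass-through helper. The sketch below is hypothetical (trace, x, y and p are made-up names, and it is in Java rather than C): wrap each sub-expression, and a single breakpoint or log line inside trace sees every intermediate value, whichever return statement produced it.

```java
import java.util.ArrayList;
import java.util.List;

class TraceDemo {
    static final List<String> seen = new ArrayList<>();

    /** Records a labelled intermediate value and passes it through unchanged. */
    static <T> T trace(String label, T value) {
        seen.add(label + " = " + value); // a breakpoint here catches every value
        return value;
    }

    // Hypothetical y, p and x, mirroring x(y(t), p(q)).
    static String y(String t) {
        if (t == null) return null; // the "just in case" branch
        return t.trim();
    }

    static int p(int q) { return q * 2; }

    static String x(String a, int b) { return a + ":" + b; }

    public static void main(String[] args) {
        String result = x(trace("y(t)", y(null)), trace("p(q)", p(21)));
        seen.forEach(System.out::println); // y(t) = null, then p(q) = 42
        System.out.println(result);        // null:42
    }
}
```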

    DS




  • @Renan "C#" Sousa said:

    Check the state of the call stack when each method is executed, that'll help to pinpoint the error.

    These are not nested calls. The call stack will only have the current scope in it.



  •  I'm gonna go with the "He has got a point" faction here. As far as I know, the concept of functions in programming has been in widespread use since C, probably earlier. So why the hell are debuggers still operating on lines? Shouldn't they operate on expressions?

    @joelkatz said:

    If you look at x(y(t),p(q)); you may really need a breakpoint to see the return value of 'y', and there's no good place to put it. You can open up the code for 'y', but unless it has only one return statement, you'll have a hard time ensuring you catch its return value.

    DS

     

    Which is why such a debugger should let you set a breakpoint on the y(t) part of that expression. I don't know how you could do that UI-wise, though maybe it would work if you specified breakpoints as a line/column pair, not just a line. Alternatively, you could introduce a "step through expression" feature that would work like "step into", only even more fine-grained.

    (In before "we should all be rich, too": I know I can't change the ways debuggers work. But you can still note that a behavior is stupid.)

     



  • @PSWorx said:

      Alternatively, you could introduce a "step through expression" feature that would work like "step into", only even more fine-grained.

    (In before "we should all be rich, too": I know I can't change the ways debuggers work. But you can still note that a behavior is stupid.)

    You can already do this by doing a "step into" and then a "step out of."  




  •  While true, that can get really annoying after about 10 calls.



  • @DescentJS said:

     While true, that can get really annoying after about 10 calls.

     

    Yeah, It gets pretty old.



  • @PSWorx said:

    I'm gonna go with the "He has got a point" faction here. As far as I know, the concept of functions in programming has been in widespread use since C, probably earlier. So why the hell are debuggers still operating on lines? Shouldn't they operate on expressions?

    Not disagreeing with you, but I think it's a band-aid for the real problem.  If you have 10 function calls on a single line, you are writing some code that is really painful to read, debug and modify.



  • @bstorer said:

    @Zecc said:

    Whatever. TRWTF is having a line like a().b().c().d(e().f().g(h().i().j()));

    Bullshit.  Function chaining is a beautiful, magical thing.

    Sure. I like it and use it too.

    But there is function chaining and there is giving yourself enough chain to hang.

    There is also the ability for line breaks.

    @lolwtf said:

    At least he spaced it to make the parentheses more obvious. So it's not completely stupid... just really stupid.

    snoofle did it, not the complete stupid.

    @DescentJS said:

    While true, that can get really annoying after about 10 calls.

    Did you just create an infinite loop?



  • @Zecc said:

    Whatever. TRWTF is having a line like a().b().c().d(e().f().g(h().i().j()));

    Gonna have to go with this. Sure, compiler/debugger capabilities could be better, but ultimately this is unreadable crap. I'm all for nesting, but there is a limit. Imagine trying to read the above with the actual function names.

    And if you're trying to modify something instead of debugging, your tools, no matter how advanced, won't help you.



  • @joelkatz said:

    @Bumble Bee Tuna said:

     Well, to be fair, he has a point. The compiler should be able to determine precisely where the error is occurring in the chain and point to that precise location instead of just giving a line number, saving you from the tedious process of splitting the line up in order to facilitate debugging.

    It's not possible. By the time the error is detectable by an automated process, the damage is already done and it's too late to figure out where it came from. The debugger will stop on a line like:

    foo->bar=baz;

    And you'll see that 'foo' is NULL. So the question now is -- how did 'foo' get to be NULL? And to determine that, you have to go back to whatever set 'foo' in the first place.

    I think the point here is that all snoofle's modification did was to force the debugger to reveal what 'foo' was - it didn't begin to explore why the function which made it null did so. A better debugger might tell you something like "Can't call method "bar" on a null value" in the first instance, which would have obviated the whole anecdote. But you need to work with the tools you have, or explain to your boss why you need to spend the time it takes to make better ones.
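
    For what it's worth, later JVMs do roughly this. JDK 14 added "helpful NullPointerExceptions" (JEP 358), enabled by default since JDK 15, whose message names the call that returned null. A small sketch with hypothetical types; the exact message text depends on the JVM, and older ones may give no message at all:

```java
class HelpfulNpeDemo {
    // Hypothetical stand-ins for two links of the chain; f() is the null link.
    static class FResult { String g() { return "g"; } }
    static class EResult { FResult f() { return null; } }

    static EResult e() { return new EResult(); }

    /** Runs the chain on one line and hands back the exception, or null if nothing failed. */
    static NullPointerException captureNpe() {
        try {
            e().f().g();
            return null; // the chain completed
        } catch (NullPointerException ex) {
            return ex;
        }
    }

    public static void main(String[] args) {
        NullPointerException ex = captureNpe();
        // On JVMs with helpful NPE messages this names f() as the culprit;
        // on older JVMs getMessage() may simply be null.
        System.out.println(ex == null ? "no failure" : String.valueOf(ex.getMessage()));
    }
}
```

    That only tells you which call produced the null, of course, not why it did.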



  • @snoofle said:

    a().b().c().d(    e().f().g(      h().i().j()     )      );

    @snoofle said:
    It turns out that one of the nested methods was returning a null object on which the subsequent hard coded method call caused a null pointer exception.

    So wait a minute: those are all dots, not arrows, which implies that one of our rookie's routines must have been doing something like

    EResult &e(void) {
      return *(EResult *) NULL;
    }

    ... doesn't it?  In which case it's thoroughly undefined behaviour that nobody has a right to expect either compiler or debugger to handle gracefully.  It knows that that pointer can't possibly be NULL because it's a reference, goddammit!

    Of course I'm talking C++, maybe this is C# or some other baroque creation that makes C++ look actually sane ...

     



  • @DaveK said:


    Of course I'm talking C++, maybe this is C# or some other baroque creation that makes C++ look actually sane ...
    I'm going to assume this is another one of your unfunny jokes.  Nothing can make C++ look sane.  Not BrainFuck, not INTERCAL, not SSDS; nothing.



  • @DaveK said:

    @snoofle said:

    a().b().c().d(    e().f().g(      h().i().j()     )      );

    @snoofle said:
    It turns out that one of the nested methods was returning a null object on which the subsequent hard coded method call caused a null pointer exception.

    So wait a minute: those are all dots, not arrows, which implies that one of our rookie's routines must have been doing something like

    EResult &e(void) {
      return *(EResult *) NULL;
    }

    ... doesn't it?  In which case it's thoroughly undefined behaviour that nobody has a right to expect either compiler or debugger to handle gracefully.  It knows that that pointer can't possibly be NULL because it's a reference, goddammit!

    Of course I'm talking C++, maybe this is C# or some other baroque creation that makes C++ look actually sane ...

     

    Erm, it was java


  • @bstorer said:

    But there is function chaining and there is giving yourself enough chain to hang.

    Very true, which was the point I was making to the rookie. The source line in question was almost 800 characters long - on one line.

    There's an art to striking a balance between using this kind of construct because you can, and readability/supportability. Now this kid has the seeds of a clue.

     



  • @tster said:

    Lesson 2:  Maybe rework the OO of the whole design

     

    Some guy named Demeter had something to say about that, I think.



  • @bstorer said:

    I'm going to assume this is another one of your unfunny jokes.  Nothing can make C++ look sane.  Not BrainFuck, not INTERCAL
     

    INTERCAL's "COME FROM" operator is about the only ill-conceived programming concept C++ *doesn't* contain.



  • TRWTF is a debugger that can't tell you exactly which pixel in your source code caused the error.



  • The historical problem is that the compilation process broke up into separate source/object/target phases, which broke the link between source and target. Therefore the only symbol information available to the debugger was that associated with the object. To get past this problem, you either have to (a) invent a new object standard, or (b) not be standards compliant. Also, it's a lot easier if you are using a language that was properly designed for easy compilation: with a language like C, it's just rather more difficult to maintain a simple relationship between source and target.



  • @savar said:

    @tster said:

    Lesson 2:  Maybe rework the OO of the whole design

     

    Some guy named Demeter had something to say about that, I think.

    Um... guy?



  • TRWTF is that they couldn't add line breaks and let the debugger do their work for them. I've had lines like

    a.b(c().d.e(f).g().q(zz));

    that, when simply expanded out to

    a.b(
        c(
          ).d.e(
           f
          ).g(
               ).q(
                   zz)
       );

    with appropriate debug points worked just fine in the debugger.


  • Garbage Person

    @Zecc said:

    Whatever. TRWTF is having a line like a().b().c().d(e().f().g(h().i().j()));
    Sometimes, when you're stuck in API Hell, those lines become alright because they represent one single discrete "Do shit" step. 

     

    Plus all the subresults will be automatically dereferenced. There's REALLY no point in writing 45 lines of code to carry out one logical step and then dispose of all the objects created in carrying out that step. That just makes for hideous code.

     

    It is naturally important to make sure your one clusterfuck line is properly written, however, and uses the API's resources in the proper combination, and that you know how to use your debugger properly.



  • @morbiuswilters said:

    Not disagreeing with you, but I think it's a band-aid for the real problem.  If you have 10 function calls on a single line, you are writing some code that is really painful to read, debug and modify.

    Depends on the problem you're trying to solve, I'd say. There are some problems that are really better represented with a functional programming style, in which case chained/nested functions can actually be easier to comprehend.

    As an example, I've recently had to parse a custom XML format in java, with XPath or similar technologies not available (Yes, I know, the Real WTF). So I've had the choice between

    Node rootNode = doc.getChildren()[1];

    Node firstRecordNode = null;
    for (Node i : rootNode.getChildren()) {
     if ("person".equals(i.getNodeName())) {
      firstRecordNode = i;
      break;
     }
    }

    Node dateOfBirthNode = null;
    for (Node i : firstRecordNode.getChildren()) {
     if ("dateOfBirth".equals(i.getNodeName())) {
      dateOfBirthNode = i;
      break;
     }
    }

    Node dayOfBirthNode = dateOfBirthNode.getChildren()[1];

    String dayOfBirth = dayOfBirthNode.textValue();

    return dayOfBirth;

    or writing a utility function and doing

    return doc.findFirst("root").findFirst("person").findFirst("dateOfBirth").findFirst("day").textValue();

     

    I'd like to note however, that this only works for (practically) pure functions. I agree that chaining functions with side effects is a huge WTF.



  • @PSWorx said:

    @morbiuswilters said:

    Not disagreeing with you, but I think it's a band-aid for the real problem.  If you have 10 function calls on a single line, you are writing some code that is really painful to read, debug and modify.

    Depends on the problem you're trying to solve, I'd say. There are some problems that are really better represented with a functional programming style, in which case chained/nested functions can actually be easier to comprehend.

    As an example, I've recently had to parse a custom XML format in java, with XPath or similar technologies not available (Yes, I know, the Real WTF). So I've had the choice between

    Node rootNode = doc.getChildren()[1];

    Node firstRecordNode = null;
    for (Node i : rootNode.getChildren()) {
     if ("person".equals(i.getNodeName())) {
      firstRecordNode = i;
      break;
     }
    }

    Node dateOfBirthNode = null;
    for (Node i : firstRecordNode.getChildren()) {
     if ("dateOfBirth".equals(i.getNodeName())) {
      dateOfBirthNode = i;
      break;
     }
    }

    Node dayOfBirthNode = dateOfBirthNode.getChildren()[1];

    String dayOfBirth = dayOfBirthNode.textValue();

    return dayOfBirth;

    or writing a utility function and doing

    return doc.findFirst("root").findFirst("person").findFirst("dateOfBirth").findFirst("day").textValue();

     

    I'd like to note however, that this only works for (practically) pure functions. I agree that chaining functions with side effects is a huge WTF.

    That's only 5 function calls, which is about the limit I would adhere to when using chaining.  Better to split it into multiple lines and use temp variables with understandable names.  In any sensible language (read: not Java or C++) the overhead is going to be negligible.



  • @morbiuswilters said:

     

    That's only 5 function calls, which is about the limit I would adhere to when using chaining.  Better to split it into multiple lines and use temp variables with understandable names.  In any sensible language (read: not Java or C++) the overhead is going to be negligible.

    Which is why I disagree with you.  Since the difference in any worthwhile language is negligible, why not just chain the functions?  That's not to say I'd want them all on a single line, but I don't see the advantage of

    a = foo.a();
    b = a.b();
    ...
    z = y.z();
    over something like
    foo.a()
       .b()
       .c()
    ...
       .z();

    Personally I find the latter a little more transparent than the former, but your mileage may vary.



  • @bstorer said:

    @morbiuswilters said:

     

    That's only 5 function calls, which is about the limit I would adhere to when using chaining.  Better to split it into multiple lines and use temp variables with understandable names.  In any sensible language (read: not Java or C++) the overhead is going to be negligible.

    Which is why I disagree with you.  Since the difference in any worthwhile language is negligible, why not just chain the functions?  That's not to say I'd want them all on a single line, but I don't see the advantage of

    a = foo.a();
    b = a.b();
    ...
    z = y.z();
    over something like
    foo.a()
       .b()
       .c()
    ...
       .z();

    Personally I find the latter a little more transparent than the former, but your mileage may vary.

    I'm not against all chaining, but I would limit it somewhat.  Something like:

    bar = foo.a().b().c().d().e();

    baz = bar.f().g().h().i().j();

     

    Of course, I do have to wonder if there's a case where chaining 10 function calls together is possible without creating a massive WTF.  The "one function per line" approach you advocate is fine, too, but I prefer breaking it up a bit to make later modifications easier.  It also lets you make clearer code as the temp variables can be given meaningful names.  Whatever, though, it's mostly a matter of personal taste.



  • @morbiuswilters said:

    I agree that chaining functions with side effects is a huge WTF.
    Have you met jQuery?

     

    Anyways, people seem to be missing the fact that there are three chains in the original post's code.

    You could simply change

    a().b().c().d(e().f().g(h().i().j()));

    to:

    j = h().i().j();
    g = e().f().g(j);
    a().b().c().d(g);

    and that alone would gain you a ton of readability.

    EDIT: I forgot to say that I agree with @morbiuswilters said:

    Of course, I do have to wonder if there's a case where chaining 10 function calls together is possible without creating a massive WTF.  The "one function per line" approach you advocate is fine, too, but I prefer breaking it up a bit to make later modifications easier.  It also lets you make clearer code as the temp variables can be given meaningful names.  Whatever, though, it's mostly a matter of personal taste.



  • @morbiuswilters said:

    Of course, I do have to wonder if there's a case where chaining 10 function calls together is possible without creating a massive WTF.  The "one function per line" approach you advocate is fine, too, but I prefer breaking it up a bit to make later modifications easier.  It also lets you make clearer code as the temp variables can be given meaningful names.  Whatever, though, it's mostly a matter of personal taste.
    While we're on the topic, the JodaTime library for Java, which attempts to fix the massive WTF that is Java's Date/Time system, uses function chaining to generate formatters.  For example, something like this could be used to format a period of time into "w days, x hours, y minutes, z seconds":

    PeriodFormatter format = new PeriodFormatterBuilder().appendDays().appendSuffix(" day", " days").appendSeparator(", ").appendHours().appendSuffix(" hour", " hours").appendSeparator(", ").appendMinutes().appendSuffix(" minute", " minutes").appendSeparator(", ").appendSeconds().appendSuffix(" second", " seconds").toFormatter();

    Ain't it pretty?  Eclipse cannot manage to format it in any reasonable way.  Ultimately, I went with this, which I think is pretty readable:

     

    PeriodFormatter format = new PeriodFormatterBuilder()
            .appendDays().appendSuffix(" day", " days").appendSeparator(", ")
            .appendHours().appendSuffix(" hour", " hours").appendSeparator(", ")
            .appendMinutes().appendSuffix(" minute", " minutes").appendSeparator(", ")
            .appendSeconds().appendSuffix(" second", " seconds")
            .toFormatter();
    


  • @bstorer said:

    While we're on the topic, the JodaTime library for Java, which attempts to fix the massive WTF that is Java's Date/Time system, uses function chaining to generate formatters.  For example, something like this could be used to format a period of time into "w days, x hours, y minutes, z seconds":

    PeriodFormatter format = new PeriodFormatterBuilder().appendDays().appendSuffix(" day", " days").appendSeparator(", ").appendHours().appendSuffix(" hour", " hours").appendSeparator(", ").appendMinutes().appendSuffix(" minute", " minutes").appendSeparator(", ").appendSeconds().appendSuffix(" second", " seconds").toFormatter();

    Ain't it pretty?  Eclipse cannot manage to format it in any reasonable way.  Ultimately, I went with this, which I think is pretty readable:

     

    PeriodFormatter format = new PeriodFormatterBuilder()
            .appendDays().appendSuffix(" day", " days").appendSeparator(", ")
            .appendHours().appendSuffix(" hour", " hours").appendSeparator(", ")
            .appendMinutes().appendSuffix(" minute", " minutes").appendSeparator(", ")
            .appendSeconds().appendSuffix(" second", " seconds")
            .toFormatter();
    

    Goddammit, what's wrong with strftime()?  That reminds me of the libraries that use function chaining to write SQL and regexes.




  • @PSWorx said:

    return doc.findFirst("root").findFirst("person").findFirst("dateOfBirth").findFirst("day").textValue();

    Doesn't this blow up in your face the first time you try to read a malformed record that, e.g., doesn't contain a dateOfBirth node?



  • @morbiuswilters said:

    Goddammit, what's wrong with strftime()?

    Well that's not very enterprisey, is it?  This is Java, after all.



  •  Question to op: Is this an intern from first year of college? Or a graduate? If so what school because I'd like to prevent any employment from that one :P



  • @Zecc said:

    You could simply change

    a().b().c().d(e().f().g(h().i().j()));

    to:

    j = h().i().j();
    g = e().f().g(j);
    a().b().c().d(g);

    and that alone would gain you a ton of readability.

    At least in C++, though, that's a semantic change. For example, previously, the variable you called 'j' would be destroyed as soon as 'g' returned. Now it will not. Previously the variable you called 'g' would be destroyed when the function 'g' returned, now it will not. This can break the code if the destructors do something important.

    As a contrived example, assume that the variable you called 'j'  is of a type that only one such object can exist at a time and that the line as a whole returns an object of that type too. In the original line, 'j' and the return of the entire line do not have overlapping lifespans. With your change, they do.

    One can argue this is a WTF in C++. However, nobody has yet designed a language feature that couldn't be abused.

    DS



  • @joelkatz said:

    @Zecc said:

    You could simply change

    a().b().c().d(e().f().g(h().i().j()));

    to:

    j = h().i().j();
    g = e().f().g(j);
    a().b().c().d(g);

    and that alone would gain you a ton of readability.

    At least in C++, though, that's a semantic change. For example, previously, the variable you called 'j' would be destroyed as soon as 'g' returned. Now it will not. Previously the variable you called 'g' would be destroyed when the function 'g' returned, now it will not. This can break the code if the destructors do something important.

    As a contrived example, assume that the variable you called 'j'  is of a type that only one such object can exist at a time and that the line as a whole returns an object of that type too. In the original line, 'j' and the return of the entire line do not have overlapping lifespans. With your change, they do.

    One can argue this is a WTF in C++. However, nobody has yet designed a language feature that couldn't be abused.

    DS

     

    The bigger problem is that it's changed the order of execution of the functions.  While this shouldn't be an issue if the functions are pure, this might cause problems in other cases.
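
    A rough Java sketch of the reordering, assuming every method in the chain just logs its own name (Link and the log are made up for the demonstration). Java evaluates the target of a call before its arguments, so the one-liner runs a, b, c first, while the split version runs h, i, j first:

```java
import java.util.ArrayList;
import java.util.List;

class CallOrderDemo {
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for every object in the chain; each method logs its own name. */
    static class Link {
        Link b() { log.add("b"); return this; }
        Link c() { log.add("c"); return this; }
        Link f() { log.add("f"); return this; }
        Link i() { log.add("i"); return this; }
        Link j() { log.add("j"); return this; }
        Link g(Link arg) { log.add("g"); return this; }
        Link d(Link arg) { log.add("d"); return this; }
    }

    static Link a() { log.add("a"); return new Link(); }
    static Link e() { log.add("e"); return new Link(); }
    static Link h() { log.add("h"); return new Link(); }

    /** The original one-liner. */
    static String chained() {
        log.clear();
        a().b().c().d(e().f().g(h().i().j()));
        return String.join(" ", log);
    }

    /** The three-statement version. */
    static String split() {
        log.clear();
        Link jr = h().i().j();
        Link gr = e().f().g(jr);
        a().b().c().d(gr);
        return String.join(" ", log);
    }

    public static void main(String[] args) {
        System.out.println(chained()); // a b c e f h i j g d
        System.out.println(split());   // h i j e f g a b c d
    }
}
```

    If a(), b() or c() has a side effect that e() or h() depends on, the split version is no longer the same program.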



  • @Sir Twist said:

    @PSWorx said:
    return doc.findFirst("root").findFirst("person").findFirst("dateOfBirth").findFirst("day").textValue();

    Doesn't this blow up in your face the first time you try to read a malformed record that, e.g., doesn't contain a dateOfBirth node?
     

    Well, I cleaned this up for readability. In that particular example, it did blow up, but in a controlled way. findFirst would throw a special user-defined exception that is later caught and turned into a "Document xyz is invalid" message. The idea being "I don't want to check after each node step if my document is still valid. (Or at least I don't want to write code for each node step)"
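
    Roughly like this, as a sketch: Node and InvalidDocumentException below are simplified, hypothetical stand-ins, not the real classes from that project.

```java
import java.util.ArrayList;
import java.util.List;

class FindFirstDemo {
    /** Hypothetical unchecked exception meaning "the document does not have the expected shape". */
    static class InvalidDocumentException extends RuntimeException {
        InvalidDocumentException(String message) { super(message); }
    }

    /** Minimal stand-in for an XML node; not a real DOM API. */
    static class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) { this.name = name; this.text = text; }

        Node add(Node child) { children.add(child); return this; }

        /** Returns the first child with the given name, or throws if there is none. */
        Node findFirst(String childName) {
            for (Node child : children) {
                if (child.name.equals(childName)) return child;
            }
            throw new InvalidDocumentException("missing <" + childName + "> under <" + name + ">");
        }

        String textValue() { return text; }
    }

    /** One catch for the whole chain instead of a check after every step. */
    static String dayOfBirth(Node doc) {
        try {
            return doc.findFirst("root").findFirst("person")
                      .findFirst("dateOfBirth").findFirst("day").textValue();
        } catch (InvalidDocumentException ex) {
            return "Document is invalid: " + ex.getMessage();
        }
    }

    /** Builds a tiny test document, optionally leaving out the dateOfBirth node. */
    static Node sample(boolean withDateOfBirth) {
        Node person = new Node("person", null);
        if (withDateOfBirth) {
            person.add(new Node("dateOfBirth", null).add(new Node("day", "17")));
        }
        return new Node("#document", null).add(new Node("root", null).add(person));
    }

    public static void main(String[] args) {
        System.out.println(dayOfBirth(sample(true)));  // 17
        System.out.println(dayOfBirth(sample(false))); // Document is invalid: missing <dateOfBirth> under <person>
    }
}
```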

    If you feel REALLY enterprisey and have a contractual obligation never to use exceptions, you could also make it return a dummy object that overrides findFirst as { return this; } and then at the end check against this dummy object. I don't take any responsibility for possible front page mentions after that, though.
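
    A sketch of that dummy-object variant, with a simplified, hypothetical Node class (MISSING is a made-up name for the dummy):

```java
import java.util.ArrayList;
import java.util.List;

class NullObjectDemo {
    /** Minimal stand-in for an XML node; not a real DOM API. */
    static class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) { this.name = name; this.text = text; }

        Node add(Node child) { children.add(child); return this; }

        /** Returns the first child with the given name, or the MISSING dummy. */
        Node findFirst(String childName) {
            for (Node child : children) {
                if (child.name.equals(childName)) return child;
            }
            return MISSING;
        }

        String textValue() { return text; }
    }

    /** The dummy object: once a lookup fails, every further findFirst stays on it. */
    static final Node MISSING = new Node("#missing", null) {
        @Override
        Node findFirst(String childName) { return this; }
    };

    /** No exceptions: a single identity check at the end of the chain. */
    static String dayOfBirth(Node doc) {
        Node day = doc.findFirst("root").findFirst("person")
                      .findFirst("dateOfBirth").findFirst("day");
        return day == MISSING ? "Document is invalid" : day.textValue();
    }

    /** Builds a tiny test document, optionally leaving out the dateOfBirth node. */
    static Node sample(boolean withDateOfBirth) {
        Node person = new Node("person", null);
        if (withDateOfBirth) {
            person.add(new Node("dateOfBirth", null).add(new Node("day", "17")));
        }
        return new Node("#document", null).add(new Node("root", null).add(person));
    }

    public static void main(String[] args) {
        System.out.println(dayOfBirth(sample(true)));  // 17
        System.out.println(dayOfBirth(sample(false))); // Document is invalid
    }
}
```

    The price is that you no longer know which step failed, only that one of them did.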

    @Zecc said:

    Have you met jQuery?

    Guuuuhh... This is about the same difference as between adding spice to your meal and emptying the whole salt shaker into it.

    I can already imagine the hilarity that ensues when you try to debug this thing.

    @morbiuswilters said:

    Goddammit, what's wrong with strftime()?  That reminds me of the libraries that use function chaining to write SQL and regexes.

     

    Well, in theory this would allow the formatter/sql/regex already to be built at compile-time to a certain degree. After all, if you pass your pattern as a string argument, it has to be parsed and compiled again at least each time you run your program. Then again, I've no idea how much difference it really makes.
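
    In Java terms the nearest thing is compiling the pattern once and reusing it, which still happens at run time (at class load), not at compile time. A small sketch using java.util.regex; the class and method names are made up:

```java
import java.util.regex.Pattern;

class PrecompiledRegexDemo {
    // Compiled once when the class is initialised, then reused for every call.
    private static final Pattern DOUBLE_SLASH_AT_END = Pattern.compile("//$");

    /** Collapses a trailing "//" into a single "/". */
    static String normalize(String url) {
        return DOUBLE_SLASH_AT_END.matcher(url).replaceAll("/");
    }

    /** Same result, but String.replaceAll compiles the pattern on every call. */
    static String normalizeRecompiling(String url) {
        return url.replaceAll("//$", "/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com//"));            // http://example.com/
        System.out.println(normalizeRecompiling("http://example.com//")); // http://example.com/
    }
}
```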

    I think JS and Pearl have the cleanest solution to this problem. They simply make the whole regex syntax a part of the programming language itself. That doesn't just give the compiler the maximum potential for optimisations, it also gives you a (wait for it...) escape from escape hell. *cough*java regexes*cough*
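
    To show what I mean by escape hell, here is a small Java sketch (class and method names are made up). A regex matching one literal backslash is two backslashes, and each of those has to be escaped again inside a Java string literal, so you end up typing four. Pattern.quote helps when the whole needle is literal:

```java
import java.util.regex.Pattern;

class EscapeHellDemo {
    /** Replaces every literal backslash with a forward slash. */
    static String toForwardSlashes(String path) {
        // The regex is \\ (one literal backslash); the string literal doubles it again.
        return path.replaceAll("\\\\", "/");
    }

    /** Pattern.quote treats the whole argument as a literal, so "." needs no escaping. */
    static String[] splitOnDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(toForwardSlashes("C:\\temp\\file.txt")); // C:/temp/file.txt
        System.out.println(splitOnDot("a.b.c").length);             // 3
    }
}
```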

    E4X even went a step further and did the same thing with XPath.



  • @PSWorx said:

    I think JS and Pearl have the cleanest solution to this problem. They simply make the whole regex syntax a part of the programming language itself. That doesn't just give the compiler the maximum potential for optimisations, it also gives you a (wait for it...) escape from escape hell. *cough*java regexes*cough*
     

    The idea is good, but Javascript's using /regexhere/ syntax instead of quoting regex causes no end of pain for people who have to parse it.

    You gotta love seeing lines like:

    (location.href + '/').replace(/\/\/$/, '/')

     /\/\/\/\/\/\/\/\/\/\/\/ indeed.



  • @PSWorx said:

    I think JS and Pearl have the cleanest solution to this problem. They simply make the whole regex syntax a part of the programming language itself. That doesn't just give the compiler the maximum potential for optimisations, it also gives you a (wait for it...) escape from escape hell. *cough*java regexes*cough*

    I wasn't aware that PEARL 90 had regexes.  Or are you one of those confused people who can't spell Perl?  Be careful, because Pearl is also a programming language - but it's very different than Perl.

    That said, wait till you get a load of Perl6.  All your old escape hell is back again, with a vengeance: any character in a regex must either be a member of \w, be escaped, or perform a special function.  So, if you have forward slash characters, you have to escape them, even if you did make the start/stop characters for the regex something other than forward slash.  (Note: I still find it useful to have start/stop characters for the regex be characters which are not in the regex.  But I also find that regexes with a lot of literal special characters look better when surrounded by non-slash characters anyway - it just seems to me to be easier to distinguish {} or # from \ than it is to distinguish / from \.  Not that the latter's hard, but why make programming harder than it has to be?)

    (For the curious, the reason Perl6 does this is forward compatibility: they've added a few new special characters to regexes, and found it breaks quite a bit of old code that was trusting those to be literals.  This requirement catches a lot of those routines (because most of them used multiple special characters as literals), and it protects against such changes in the future.)



  • @tgape said:

    That said, wait till you get a load of Perl6.  All your old escape hell is back again, with a vengeance: any character in a regex must either be a member of \w, be escaped, or perform a special function.  So, if you have forward slash characters, you have to escape them, even if you did make the start/stop characters for the regex something other than forward slash.
    Good thing Perl6 will never actually exist.



  • @bstorer said:

    Good thing Perl6 will never actually exist.

    It does exist, at least in beta.  It's just that it's unlikely to exceed 100 downloads to unique persons for any version this decade.

    Personally, I'm hoping all the cool features get ported to 5.10+, and then, around maybe 2020 or so, we all jump to Perl7, which admits that all of the bits that didn't get ported back to perl5 were dross.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    The idea is good, but Javascript's using /regexhere/ syntax instead of quoting regex causes no end of pain for people who have to parse it.

    You gotta love seeing lines like:

    (location.href + '/').replace(/\/\/$/, '/')

     /\/\/\/\/\/\/\/\/\/\/\/ indeed.

    JS doesn't  have the ability to change the separator character?


  • @blakeyrat said:

    You gotta love seeing lines like:

    (location.href + '/').replace(/\/\/$/, '/')


    Why would that even be necessary? I've yet to see a webserver balk at doubled-up "/"'s in the path.



  • @tgape said:

    I wasn't aware that PEARL 90 had regexes.

     

    Argh! Touché.

    @blakeyrat said:

    The idea is good, but Javascript's using /regexhere/ syntax instead of quoting regex causes no end of pain for people who have to parse it.

     

    Well, I agree, JS took a good idea and implemented it in the worst possible way. I don't know why exactly the designers found it a good idea to use / as the delimiting characters, but I guess there were drugs involved.

    Another fun consequence of this is that the empty regex // is identical to JS' comment marker. So as not to confuse the parser, you have to write weird stuff like /()/ if you want an empty regex.

