Mongo gone zalgo



  • A stupid node microlibrary and a common mistake was all it took for this person to summon zalgo in a mongo database.



  • @wharrgarbl that appears to be a long-winded way to describe the pitfalls of extended object prototypes combined with the incautious use of for...in.



  • @anotherusername incautious use of it in relation to other libraries including ones that have methods that have little real business being around?



  • @Arantor no, incautious use of it in relation to objects whose prototypes have been extended to add enumerable properties that you didn't expect them to have.

    edit: oh, I see your point. But it doesn't matter: they could've been bitten just as hard by methods that did have business being around.



  • @anotherusername I blame typing on mobile while at work leading me to be more brief than I would normally like.

    Yes, there are problems with the whole extending base prototypes, which is why I have long had the habit of:

    for (var i in my_object) {
      if (my_object.hasOwnProperty(i)) {
        // whatevs
      }
    }
    

    There's probably a nicer way to do this if you have libraries around (jQuery.each comes to mind, or probably something in ES6). But I've done enough in my time without libraries that I often avoid them unless they would legitimately get me some benefit, and I still occasionally work in environments where I don't trust whatever the libraries may have done to the current environment.
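
On the "probably something in ES6" point: Object.keys (ES5) and Object.entries (ES2017) only visit an object's own enumerable properties, so they need no hasOwnProperty guard. A minimal sketch, with object and property names made up for illustration:

```javascript
// An object that inherits one enumerable property from its prototype.
const parent = { inherited: 'from prototype' };
const myObject = Object.create(parent);
myObject.own = 'mine';

// for...in walks the prototype chain, so it sees both keys.
const viaForIn = [];
for (const key in myObject) {
  viaForIn.push(key);
}

// Object.keys only returns own enumerable keys.
const viaKeys = Object.keys(myObject);

console.log(viaForIn); // [ 'own', 'inherited' ]
console.log(viaKeys);  // [ 'own' ]
```

This only helps when the thing being iterated really is the object you think it is; it does nothing about what other code has bolted onto shared prototypes.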

    I'm not saying that avoiding something that spits out ANSI-esque codes would have solved the problem, but it would certainly have made it less worrisome - because that crap and the resulting what-looks-like-mojibake looks like a hack at first glance if you don't know what is going on.

    And if you have a cluster of libraries that you don't know inside out, and more importantly whose authors don't know (and thus don't write for) such environments, you get this kind of thing happening.



  • @Arantor said in Mongo gone zalgo:

    Yes, there are problems with the whole extending base prototypes, which is why I have long had the habit of:

    for (var i in my_object) {
      if (my_object.hasOwnProperty(i)) {
        // whatevs
      }
    }
    

    Putting this at the very top of your project would work pretty well, too. 🚎

    Object.getOwnPropertyNames(window).forEach(function (propName) {
      try {
        if (window[propName] && window[propName].prototype) {
          Object.freeze(window[propName].prototype);
        }
      } catch (err) {
      }
    });
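
What the freeze buys you, sketched on a stand-in constructor (Thing is hypothetical) rather than a real built-in, so the sketch does not disturb the rest of the program. Plain assignment to a frozen prototype fails silently in sloppy mode and throws in strict mode; Object.defineProperty throws a TypeError either way, which is what this uses:

```javascript
// Hypothetical stand-in for a built-in such as String.
function Thing() {}
Thing.prototype.describe = function () {
  return 'thing';
};

// Lock the prototype the same way the snippet above locks every global's prototype.
Object.freeze(Thing.prototype);

// A later "library" tries to bolt an extra member onto the prototype.
let extensionBlocked = false;
try {
  Object.defineProperty(Thing.prototype, 'zalgo', {
    value: function () { return 'he comes'; },
    enumerable: true,
  });
} catch (err) {
  extensionBlocked = err instanceof TypeError;
}

console.log(extensionBlocked);           // true
console.log('zalgo' in Thing.prototype); // false
```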
    

  • I survived the hour long Uno hand

    @Arantor said in Mongo gone zalgo:

    which is why I have long had the habit of:

    Most linters will enforce that pattern these days. Eslint calls it "guard-for-in".
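
For reference, turning the rule on is short. A minimal sketch assuming ESLint's flat config format in a CommonJS eslint.config.js; older setups would put the same rule entry in their .eslintrc file instead:

```javascript
// eslint.config.js (assumed file name for the flat config format)
module.exports = [
  {
    rules: {
      // Report for...in loops whose body is not wrapped in an if guard.
      'guard-for-in': 'error',
    },
  },
];
```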


  • Discourse touched me in a no-no place

    @wharrgarbl said in Mongo gone zalgo:

    A stupid node microlibrary and a common mistake was all it took for this person to summon zalgo in a mongo database.

    All production databases axiomatically contain zalgo problems. Why? Because there's always someone who fucks up when inserting the data…



  • @dkf but requiring a library that does color stuff, and suddenly there is zalgo in your database, is something you only hear about in JavaScript


  • Discourse touched me in a no-no place

    @wharrgarbl I'd expect it in Ruby too, TBH. 😒



  • @dkf but guess what?

    Not in PHP. That's right, for once PHP doesn't entirely fuck you over.



  • @Arantor We knew for years that zalgo was coming, now we know how.


  • Discourse touched me in a no-no place

    @Arantor said in Mongo gone zalgo:

    Not in PHP.

    Either that's because it only ever claims to handle bytes (and so lets developers get it right, or — more likely, TBQH — fuck users over without being aware of it) or it is because it is unhealthily smart about autodetecting what shit is coming out of the DB. The problem is that, no matter how much you might want otherwise, bad data gets into databases. There's lots of reasons it can happen, but the usual effect is just painful anyway. I guess you can use statistical techniques (and iconv) to fix it, but the data's garbled before it gets to you.

    I've dealt with this in the past (handling species occurrence data from a biodiversity database) and the problem was that each row in the DB could and did have a different encoding. Some were UTF-8. Some ISO-8859-1 (or -15; they're nearly impossible to distinguish if you're not working with currencies). Some were one of the old Windows code pages (I forget which). Some were Shift JIS. Some were KOI8-R. There were probably others too. We could only figure out what was really going on because we knew these fields were actually names of researchers that we could google for properly once we'd got a candidate decoding… and yes, the data coming over the wire from the DB (which claimed it was all perfect UTF-8; TOTAL LIES!) really had this mash of pre-zalgo-d crap.
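
The "candidate decoding" step can be sketched roughly like this in Node. It is a heavy simplification: it only distinguishes valid UTF-8 from a single-byte fallback (latin1 here), the function name is hypothetical, and the real job described above involved more encodings plus a human check against known names:

```javascript
// Decode as UTF-8 when the bytes are valid UTF-8, otherwise fall back to latin1.
function decodeCandidate(bytes) {
  const buf = Buffer.from(bytes);
  const asUtf8 = buf.toString('utf8');
  // Invalid sequences decode to U+FFFD, so they do not re-encode to the same bytes.
  const isValidUtf8 = Buffer.from(asUtf8, 'utf8').equals(buf);
  return isValidUtf8
    ? { encoding: 'utf8', text: asUtf8 }
    : { encoding: 'latin1', text: buf.toString('latin1') };
}

// The same name stored two different ways in two different rows.
const utf8Row = decodeCandidate([0x4a, 0x6f, 0x73, 0xc3, 0xa9]); // UTF-8 bytes
const latin1Row = decodeCandidate([0x4a, 0x6f, 0x73, 0xe9]);     // ISO-8859-1 bytes

console.log(utf8Row.encoding, utf8Row.text);
console.log(latin1Row.encoding, latin1Row.text);
```

A single-byte fallback never fails to decode, which is exactly why the result still needs checking against something you know to be right.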

    And some of our users insisted on checking whether their own work was in there correctly. It wasn't. Of course. :headdesk:



  • @dkf It's not just a case of bad data entering the database, it happened with almost no action from the application developer:

    • a library that is a dependency of a dependency injected a method "zalgo" on the built-in string type
    • the developer passed a string to a db function that expects a dictionary object
    • mongo understands the method zalgo as an attribute and inserts its return value as an attribute in the database object

    None of these 3 are possible in Java or C#, and only the second one is possible in PHP.
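
The chain above can be reproduced in a few lines. This is a sketch under assumptions: the getter stands in for whatever the library actually did (the thread only says a "zalgo" member was injected on strings), and copyFields is a hypothetical stand-in for a driver that enumerates the fields of whatever it is handed, not MongoDB's real code:

```javascript
// Stand-in for the library: an enumerable getter on String.prototype (assumed form).
Object.defineProperty(String.prototype, 'zalgo', {
  get() { return 'ZALGO(' + this + ')'; },
  enumerable: true,
  configurable: true, // lets the sketch clean up after itself
});

// Hypothetical driver-style helper: expects a plain object, copies every field it can enumerate.
function copyFields(value) {
  const doc = {};
  for (const key in value) {
    doc[key] = value[key];
  }
  return doc;
}

// Same helper with the hasOwnProperty guard.
function copyOwnFields(value) {
  const doc = {};
  for (const key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      doc[key] = value[key];
    }
  }
  return doc;
}

// The developer's mistake: passing a string where an object was expected.
const polluted = copyFields('hi');
const guarded = copyOwnFields('hi');

// Remove the stand-in so nothing else sees it.
delete String.prototype.zalgo;

console.log(JSON.stringify(polluted)); // {"0":"h","1":"i","zalgo":"ZALGO(hi)"}
console.log(JSON.stringify(guarded));  // {"0":"h","1":"i"}
```

Even the guarded version happily stores a string as a character-indexed object, so the guard only removes the zalgo, not the original mistake.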



  • @dkf said in Mongo gone zalgo:

    @Arantor said in Mongo gone zalgo:

    Not in PHP.

    Either that's because it only ever claims to handle bytes (and so lets developers get it right, or — more likely, TBQH — fuck users over without being aware of it)

    It's because it only ever claims to handle bytes. Strings are really just byte arrays, which means whatever you do you have to remember to use the mb functions if you explicitly want to handle text as characters rather than bytes.

    And when bad data gets into databases, typically ISO-8859-something being brute forced into a UTF-8 DB, it usually gets truncated because of invalid byte sequences.

    Most people in the PHP world know to SET NAMES utf8 and then just pretend everything is UTF-8 thereafter and oddly enough that's usually close enough - right until you try to do things like truncation on bytes rather than characters...
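
The bytes-versus-characters truncation problem is not PHP-specific. A minimal JavaScript sketch (the function name is made up) that shows a naive byte cut splitting a character, and a cut that backs off to a character boundary:

```javascript
// Truncate to at most maxBytes of UTF-8 without cutting a character in half.
function truncateUtf8(text, maxBytes) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) {
    return text;
  }
  let end = maxBytes;
  // Continuation bytes look like 10xxxxxx; back up until the cut sits on a boundary.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return new TextDecoder().decode(bytes.subarray(0, end));
}

const word = 'na\u00efve'; // "naïve": 5 characters, 6 bytes in UTF-8

// Naive cut after 3 bytes lands in the middle of the two-byte "ï".
const naiveCut = new TextDecoder().decode(new TextEncoder().encode(word).subarray(0, 3));
const safeCut = truncateUtf8(word, 3);

console.log(JSON.stringify(naiveCut)); // "na\ufffd" once printed, i.e. "na" plus a replacement character
console.log(JSON.stringify(safeCut));  // "na"
```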


  • kills Dumbledore

    @wharrgarbl said in Mongo gone zalgo:

    a library that is a dependency of a dependency injected a method "zalgo" on the built-in string type

    @wharrgarbl said in Mongo gone zalgo:

    None of these 3 are possible in java or c#,

        using System;

        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("test".Zalgo());
                Console.Read();
            }
        }
    
        static class Extensions
        {
            public static string Zalgo (this string input)
            {
                return  "ť͇̞̳͋̏̈ͣ̄̄̈͆é̸̛̙͙̟̩̍ͬs̞̱̣̩͓͕̲ͮ̆̈́͌̊t̢̺͓̗̞̥̮̆̓̋ͫ ̡̩̼͎̱̉̈ͯ̅t̴̩̟̹̩͖̰͛̅̏̋ͮ͊e̷̱̯͙͗̌͗̎ͩs̨̲͈̀̃ͧͦͨ͒́t͕̤̟̹̩̠̞̒̈́͜ ̩̰̰̲̱̟͑ͬ̐ͥ͑̏ͥ̈t̗̩͆̅͘ḙ̶ͦ̆͂̄͘s̡͚̝̤̟̹̜͉̬ͨ̈́ͬ͞͡t͖͓̣̰͖ͪ͊̀";
            }
        }
    


  • @Jaloopa extension methods won't appear in reflection on the string type, and won't affect code that doesn't reference them directly


  • Discourse touched me in a no-no place

    @wharrgarbl There are deeper levels of fuckery possible if you're really determined. OTOH, the zalgo that is deliberate is not the true zalgo.



  • @Arantor said in Mongo gone zalgo:

    Most people in the PHP world know to SET NAMES utf8 and then just pretend everything is UTF-8 thereafter and oddly enough that's usually close enough - right until you try to do things like truncation on bytes rather than characters...

    Or you try working with emoji. MySQL's utf8 only supports 3 bytes per char, so they 🤔 added utf8mb4.
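
To make the 3-byte limit concrete: characters outside the Basic Multilingual Plane (code points above U+FFFF, which includes most emoji) take 4 bytes in UTF-8. A small sketch with hypothetical helper names that checks whether a string would fit a 3-bytes-per-character column:

```javascript
// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// True when every code point encodes in at most 3 UTF-8 bytes (U+0000 to U+FFFF).
function fitsInThreeByteUtf8(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xffff) {
      return false;
    }
  }
  return true;
}

console.log(utf8ByteLength('\u20ac'));            // 3 (the euro sign)
console.log(utf8ByteLength('\u{1F914}'));         // 4 (the thinking-face emoji)
console.log(fitsInThreeByteUtf8('caf\u00e9'));    // true
console.log(fitsInThreeByteUtf8('hm \u{1F914}')); // false
```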



  • @DCoder been there, done that, wrote a patch for one system to convert to numeric entities as part of the stock htmlspecialchars call that system did for everything.

    But Toby Faire, not PHP's fault.


  • FoxDev

    @DCoder said in Mongo gone zalgo:

    MySQL's utf8 only supports 3 bytes per char

    [image]

