WTF Bites



  • @Arantor said in WTF Bites:

    PHP is fuckin’ oldschool, man. We were there, Gandalf, 3000 years ago, etc.

    The :belt_onion: thread is :arrows:, but please leave your PHP elephant at the door.



  • @Zerosquare said in WTF Bites:

    @Arantor said in WTF Bites:

    PHP is fuckin’ oldschool, man. We were there, Gandalf, 3000 years ago, etc.

    please leave your PHP elephant at the door.

    Which one?



  • @Bulb said in WTF Bites:

    very occasionally by the user to allow hyphenating a long word if the software does not know the rules, or does not know them for that word.

    It’s a common enough character in DTP that InDesign has an easy keyboard shortcut for it:

    Discretionary hyphen.png

    But this is the kind of program that gets used by people who (should) know when to use a soft hyphen, and do so on purpose, like you say. Compared to the world’s most popular word processor:

    Word soft hyphen.png


  • BINNED

    @Gurth said in WTF Bites:

    Compared to the world’s most popular word processor:

    I discovered the other week that it doesn’t even kern by default, at any size. I knew approximately every other opentype feature comes disabled, but kerning?



  • @Arantor said in WTF Bites:

    Which one?

    Both of them.
    30cf9871-7c78-482e-b7da-5fc4189c818f-image.png


  • BINNED

    @Gurth said in WTF Bites:

    Compared to the world’s most popular word processor:

    Word soft hyphen.png

    Ctrl+- inserts a soft hyphen for me.
    At least on Windows. On mac I can't get it to do that with either Cmd+- (zooms out) or Ctrl+- (can't tell if it does anything at all).



  • @Zerosquare said in WTF Bites:

    @Arantor said in WTF Bites:

    Which one?

    Both of them.
    30cf9871-7c78-482e-b7da-5fc4189c818f-image.png

    Mine don’t look quite like that.

    IMG_0904.jpeg



  • @Arantor said in WTF Bites:

    Mine don’t look quite like that.

    Maybe they are the legacy, deprecated versions?



  • @Zerosquare well, one is the Laravel edition and one is just the classic edition without the stupid 8 on it.


  • Considered Harmful

    @Arantor Where's Postgres? 🐘



  • @Applied-Mediocrity said in WTF Bites:

    @Arantor Where's Postgres? 🐘

    That would require me to go to a Postgres conference, which I have not had the fortune to yet attend.


  • Considered Harmful

    @Arantor said in WTF Bites:

    @Applied-Mediocrity said in WTF Bites:

    @Arantor Where's Postgres? 🐘

    That would require me to go to a Postgres conference, which I have not had the fortune to not yet attend.

    🔧:trollface:



  • @kazitor said in WTF Bites:

    @HardwareGeek said in WTF Bites:

    Incidentally, autocarrot on my phone seems to have just learned the word autocarrot. I wonder if it will continue suggesting it as a potential completion if I haven't just typed it recently.

    My phone still suggests made-up words I haven’t used in five years.

    Like rhamphorhynchus? :tro-pop-wave:



  • @Arantor said in WTF Bites:

    @Zerosquare said in WTF Bites:

    @Arantor said in WTF Bites:

    PHP is fuckin’ oldschool, man. We were there, Gandalf, 3000 years ago, etc.

    please leave your PHP elephant at the door.

    Which one?

    real_php_elephant_v2_no_really_this_is_the_one


  • Notification Spam Recipient

    @Arantor said in WTF Bites:

    @Zerosquare said in WTF Bites:

    @Arantor said in WTF Bites:

    Which one?

    Both of them.
    30cf9871-7c78-482e-b7da-5fc4189c818f-image.png

    Mine don’t look quite like that.

    IMG_0904.jpeg

    You have the gay edition. 🎉



  • @Tsaukpaetra they’re hetero life mates, like Jay and Silent Bob.



  • @Bulb the question: is it actually PHP or is it Windows fucking this up and PHP not correctly unfucking it?

    Note that this doesn’t apply on non-Windows which makes me think part of the deal is Windows’ blasé attitude to command line things in the first place.

    (NB, haven’t looked at the code to verify, but also, running PHP in prod on Windows is fucking :laugh-harder: territory anyway, especially if you’re not running it hanging off the Apache integration in the first place)



  • @Arantor said in WTF Bites:

    Mine don’t look quite like that.

    Just put them into the washing machine, together. After a couple of washes, they may have reached the correct colour version.
    For more tips on internet connected elephpant washing, ask ✈ member.



  • @Arantor Ok, I bit the :doing_it_wrong: and skimmed through the article. The problem is, as usual, failure to work around Windows :wtf:s.

    See, in more cultured operating systems, command-line arguments are passed as a good old array of strings, and the system only uses one representation of those that has long since changed from whatever legacy encoding to utf-8. But as Microsoft jumped on the Unicode bandwagon before things settled, and had too many engineers at hand, you have four options on Windows: you can accept the arguments as either a single string or a list of strings, and you can accept them either as utf-16 or as the pre-unicode 8-bit encoding.

    For backward compatidebility reasons, programs that just use the C standard main function get the arguments as a list of strings in the legacy 8-bit encoding. So the internal utf-16 string that goes through the internal API has to be converted to the legacy encoding, which is where this “best fit” thing happens.

    Now any self-respecting application that wants to deal with unicode properly implements the utf-16 interface, and converts it to whatever internal encoding it wants to work with (usually utf-8) and then back to utf-16 to pass to the utf-16 variants of the Windows API. Most portability layers do that.

    It seems like something in PHP is still using the 8-bit interfaces somewhere that in Windows are still the legacy pre-unicode encoding. Either for backward compatidebility, or because of a lack of knowledge/understanding/manpower.
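
A rough way to picture that conversion step, as a Python simulation rather than the real Windows API. The `BEST_FIT` table and the `to_legacy` helper are made up for illustration (the real tables are per code page and far larger); only the soft-hyphen entry comes from this thread.

```python
# Simulation only: a hypothetical one-entry "best fit" table.
BEST_FIT = {"\u00ad": "-"}  # soft hyphen -> hyphen-minus


def to_legacy(arg: str) -> bytes:
    """Lossily convert a Unicode argument to 8-bit (plain ASCII here, for simplicity)."""
    out = bytearray()
    for ch in arg:
        ch = BEST_FIT.get(ch, ch)  # substitute a "close enough" character
        out += ch.encode("ascii", errors="replace")  # anything else unmappable becomes '?'
    return bytes(out)


# Two different inputs collapse into the same bytes after conversion.
print(to_legacy("\u00add") == to_legacy("-d"))
```

Once the conversion has happened, nothing downstream can tell which of the two inputs it was given.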


  • BINNED

    @Bulb ugh, nah. That definitely sounds like a case of “not a PHP problem, CLOSED WONTFIX.”

    Just don’t pass command line parameters that your OS can’t handle correctly. Who needs soft hyphens in the command line anyway.

    Oh, you’re dynamically creating command lines from untrusted user input? Well, there’s your real problem.



  • @topspin Except … well, it's part of the CGI specification. Yeah, using CGI on Windows is dumb anyway, starting processes on Windows is slow, you really don't want to start one for each request.


  • Java Dev

    @Bulb said in WTF Bites:

    See, in more cultured operating systems, command-line arguments are passed as a good old array of strings, and the system only uses one representation of those that has long since changed from whatever legacy encoding to utf-8.

    I'm pretty sure utf-8 is the convention, not the rule, and the OS just deals with bytes.

    Without reading TFA, part of the problem has to be command line string splitting. Else standard hyphens on the input end would cause the same problem.
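
To illustrate the "just bytes" point: nothing forces an argument or file name on such a system to be valid utf-8, so software that wants text has to cope with stray bytes somehow. Python, for instance, offers the `surrogateescape` error handler (PEP 383) for this. A minimal sketch, with a made-up byte string standing in for an argument:

```python
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# surrogateescape smuggles the undecodable byte through str and back unchanged
as_text = raw.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(round_trip == raw)
```

The point of the design is that the bytes survive the round trip untouched, instead of being replaced with a look-alike.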



  • @topspin said in WTF Bites:

    @Bulb ugh, nah. That definitely sounds like a case of “not a PHP problem, CLOSED WONTFIX.”

    The process gets a single UCS-2 string; any and all conversion, splitting, parsing, and hyphen substitution is being done by PHP (or by whatever C runtime it uses). Checking the source code, the Visual C Runtime is doing the conversion to ISO 8859-whatever and splitting on spaces, but PHP has its own custom special sauce getopt that's screwing it up.


  • BINNED

    @TwelveBaud I didn’t RTFA :doing_it_wrong:, just going by what was posted here. In this case, it does sound like a bug in PHP.


  • BINNED

    Status: I tried to type ”pennies”. The iPhone didn’t like that and wanted to correct the word to Pennie’s instead. So I tapped the little x on the autocorrect pop up to make it go away. Then I tapped on the key to close the quote. It then applied the autocorrect that I had explicitly told it I do not want exactly one character ago. :angry:

    Since the PowerPoint incident I’ve contemplated a bit and have come to the following conclusion.

    • On a computer with a real keyboard, I do not want autocorrect, ever. 9 times out of 10, it does the opposite of what I want. Which does not make up for the one time I typo’d something and would’ve corrected it manually anyway. Sometimes, especially in Word/PPT, I do want auto-format, i.e. I enter something like --> and it formats it to a real arrow glyph. But that’s a conscious short-cut, not autocorrect, and often enough even that gets in the way when I don’t want it.
    • On a phone, autocorrect is an unfortunate necessity. I actually rely on it to capitalize things, that is, I’m almost always too lazy to press shift manually, hoping that it’ll do the right thing. (And manually correcting if it doesn’t.) But that only applies to the immediate autocorrect that it displays while I type a word. This “smart” context sensitive shit that corrects stuff up to a few words back that they introduced several years ago is the devil. Not once has that not pissed me off. I wish I could selectively turn it off instead of just turning autocorrect on/off completely.

  • Discourse touched me in a no-no place

    @topspin said in WTF Bites:

    I wish I could selectively turn it off instead of just turning autocorrect on/off completely.

    You can at least keep auto capitalisation on with auto correct turned off.


  • BINNED

    @loopback0 yeah, I just saw that when I checked again. Maybe that’d work as a compromise, but I think I still want normal autocorrect on



  • @topspin Have turned it off. It gets too confused with multiple languages. Takes a bit longer to type something out, given that phone keyboards suck (I'm sure they've gotten worse...). Then again, I don't really type that much on the phone.

    Autocarrot on a computer with a real keyboard = 💀.



  • @TwelveBaud that code seems to be buttuming the hyphens it’s fed are real hyphens - there’s nothing in that function about normalising anything funky.

    Eh, I don’t know. There’s enough stupid all around.



  • @Arantor said in WTF Bites:

    There’s enough stupid all around.

    PHP, so yes. 🎺



  • I'm not sure what's up with my (Android) phone's swipe-to-spell feature, but it often ignores the first word I swipe. When I try to replicate it intentionally, it works fine, though.



  • @PleegWat said in WTF Bites:

    @Bulb said in WTF Bites:

    See, in more cultured operating systems, command-line arguments are passed as a good old array of strings, and the system only uses one representation of those that has long since changed from whatever legacy encoding to utf-8.

    I'm pretty sure utf-8 is the convention, not the rule, and the OS just deals with bytes.

    It depends on the filesystem. Most Unix filesystems deal with bytes, but some deal with Unicode in various forms, so the kernel needs to convert appropriately to unify it. MacOS HFS+ goes so far as to do normalization, picking the less common decomposed form, and not preserving the normalization¹, forcing software on MacOS to deal with it.

    Without reading TFA, part of the problem has to be command line string splitting. Else standard hyphens on the input end would cause the same problem.

    1. Not splitting, but interpretation. Hyphens have special meaning on command-line.
    2. Input does not get converted like this in the bowels of standard library. It is encoded exactly as the client sent it.

    ¹ If reading the directory returns exactly the same string under which the file was created, most software does not need to care what exactly the equivalence rules are. Which is the case on Windows, but not on MacOS, where it will be normalized to the decomposed form.
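
For anyone who hasn't run into the composed/decomposed distinction, a small Python sketch. It uses standard NFD from `unicodedata` as a stand-in (the filesystem's own decomposition rules may differ in details), and `same_name` is a hypothetical helper, not anything a real API provides.

```python
import unicodedata

composed = "\u00e9"  # 'é' as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining acute accent


def same_name(a: str, b: str) -> bool:
    """Compare two names regardless of which normalization form each one uses."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(composed == decomposed, same_name(composed, decomposed))
```

A program that created a file under the composed name and then looks for exactly that string in a directory listing would need something like `same_name` to find it again.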



  • @Bulb said in WTF Bites:

    @Zerosquare said in WTF Bites:

    PHP will apply what’s known as a ‘best fit’ mapping, and helpfully assume that, when the user entered a soft hyphen, they actually intended to type a real hyphen, and interpret it as such.

    Warum, kurwa, just warum‽

    Hyphen-minus, U+002D, is right there on every keyboard, while soft-hyphen, U+00AD, requires some kind of shortcut or compose sequence or perusing an insert widget. If the user went through the trouble of using that, they almost certainly did so because they meant a soft-hyphen and not a plain hyphen-minus.

    Copy-paste from an application that automatically "helpfully" converts dashes to the "typographically correct" ones. These days, Slack is the biggest offender with its symmetric quotes.

    In (pre-)2005 the offender was obviously Word.



  • @TwelveBaud said in WTF Bites:

    @topspin said in WTF Bites:

    @Bulb ugh, nah. That definitely sounds like a case of “not a PHP problem, CLOSED WONTFIX.”

    The process gets a single UCS-2 string; any and all conversion, splitting, parsing, and hyphen substitution is being done by PHP (or by whatever C runtime it uses).

    NO. By the PHP process, but by the C runtime before any actual PHP code runs.

    Checking the source code, the Visual C Runtime is doing the conversion to ISO 8859-whatever and splitting on spaces, but PHP has its own custom special sauce getopt that's screwing it up.

    The PHP getopt is fine. It is taking the C standard int argc, char* const *argv arguments, which means after the Visual C Runtime screwed it up.

    Of course Windows is just being its usual obnoxious self, and the correct fix would be for PHP to add a version of that getopt taking int argc, wchar_t* const *argv to parse the arguments without letting Windows convert them. Unfortunately nobody in the PHP project realized just how obnoxious Windows is.



  • @cvi said in WTF Bites:

    @topspin Have turned it off. It gets too confused with multiple languages. Takes a bit longer to type something out, given that phone keyboards suck (I'm sure they've gotten worse...). Then again, I don't really type that much on the phone.

    Autocarrot on a computer with a real keyboard = 💀.

    On a related note, the keyboard on latest Android started inserting space when selecting a word from the completion list. That pissed off a lot of people, because:

    • Even in English, often you want to continue with a period, comma, dash, closing parenthesis or something.
    • In languages with flexing, like Czech, it is often useful to complete a stem of the word and add appropriate suffix which, of course, would not have a space before it.

    Overall it means the space should not be there half of the time.

    Of course, Google ain't listening to feedback and isn't adding an option to turn it off.



  • @Kamil-Podlesak said in WTF Bites:

    Copy-paste from an application that automatically "helpfully" converts dashes to the "typographically correct" ones.

    :doubt: The soft hyphen is not the typographically correct variant of any other symbol, dash or otherwise; it's a special character which should only be displayed right before a line break, and not be rendered at all otherwise. So the "typographically correct" way of dealing with soft hyphens, if anything, would be to entirely remove them from the copied text. (or maybe insert soft hyphens wherever a word might possibly be hyphenated, but that would be so egregiously problematic that I don't think anyone went there)
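
The "entirely remove them" option is about as simple as text processing gets. A minimal Python sketch, with `strip_soft_hyphens` as a made-up helper name:

```python
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Remove invisible hyphenation hints; real hyphen-minus characters are kept."""
    return text.replace(SOFT_HYPHEN, "")


print(strip_soft_hyphens("hy\u00adphen\u00ada\u00adtion"))
```

Note that this drops the character rather than turning it into a visible hyphen, which is the opposite of what the conversion discussed in this thread does.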



  • @Kamil-Podlesak said in WTF Bites:

    @Bulb said in WTF Bites:

    @Zerosquare said in WTF Bites:

    PHP will apply what’s known as a ‘best fit’ mapping, and helpfully assume that, when the user entered a soft hyphen, they actually intended to type a real hyphen, and interpret it as such.

    Warum, kurwa, just warum‽

    Hyphen-minus, U+002D, is right there on every keyboard, while soft-hyphen, U+00AD, requires some kind of shortcut or compose sequence or perusing an insert widget. If the user went through the trouble of using that, they almost certainly did so because they meant a soft-hyphen and not a plain hyphen-minus.

    Copy-paste from an application that automatically "helpfully" converts dashes to the "typographically correct" ones. These days, Slack is the biggest offender with its symmetric quotes.

    In (pre-)2005 the offender was obviously Word.

    It turns out that the misfeature is not related to any actual use-case. It is simply the Windows default UTF-16→CP12¹whatever conversion function trying to map as many characters as possible to something, sensible or not, which in the process lets them sneak past sanitization.


    ¹ … actually I think it's converting to 8-something; there are two legacy charsets in Windows for different levels of backward compatidebility and I don't remember the exact rules which is used when.
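
The sanitization angle comes down to ordering: the check runs on one string and the consumer sees another. A Python sketch of that, where `best_fit` and `is_safe` are both hypothetical stand-ins and only the soft-hyphen mapping from this thread is modelled:

```python
def best_fit(arg: str) -> str:
    # Hypothetical stand-in for the lossy conversion; models one mapping only.
    return arg.replace("\u00ad", "-")


def is_safe(arg: str) -> bool:
    # Hypothetical sanitizer: reject anything that would be parsed as an option.
    return not arg.startswith("-")


payload = "\u00add"  # soft hyphen followed by 'd'

checked_before = is_safe(payload)           # the check sees a soft hyphen
checked_after = is_safe(best_fit(payload))  # the consumer sees "-d"

print(checked_before, checked_after)
```

Validating after the conversion, or never converting at all, closes the gap; validating before it does not.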


  • Considered Harmful

    @topspin said in WTF Bites:

    Status: I tried to type ”pennies”. The iPhone didn’t like that and wanted to correct the word to Pennie’s instead. So I tapped the little x on the autocorrect pop up to make it go away. Then I tapped on the key to close the quote. It then applied the autocorrect that I had explicitly told it I do not want exactly one character ago. :angry:

    Just imagine what Tsaukpaetra's autocarrot would have done to your pennies.



  • @ixvedeusi said in WTF Bites:

    maybe insert soft hyphens wherever a word might possibly be hyphenated, but that would be so egregiously problematic that I don't think anyone went there

    About the only use-case I can think of for doing that, would be if you have a text that must be shown in a program that doesn’t know about hyphenation, but which you do want hyphenated.

    In general it would be far easier to just use the built-in hyphenation functionality of programs like word processors and web browsers, and accept that some words may not be hyphenated correctly, than to pre-process a whole text to insert soft hyphens.

    But your comment does have me wondering if word processors etc. insert a hyphen-minus character or a soft hyphen character when they automatically hyphenate a word … Lacking a font editor at the moment, though, I can’t whip up something to test that, unfortunately.

    Edit: Oh, there turns out to exist this. After struggling with its UI a bit, I managed to make a font which has an open square for the A, a dash for the hyphen-minus and a disc for the soft hyphen. Let’s see what happens if I use this in InDesign …

    Interesting. First, a short line that doesn’t wrap:

    Soft hyphen, no break.png

    Above my trial font, below Helvetica for comparison (with non-printing characters visible, as here, it shows the discretionary hyphen as a short blue dash). Oddly, in my test font it makes the soft hyphen into a visible rectangle, slightly bigger than what I used for the A — but not the disc I drew it as.

    Add an extra A to each line, and:—

    Soft hyphen, line break.png

    it shows the hyphen-minus character (with a blue ! over it to show it’s a discretionary hyphen that’s now visible).

    And the same in Pages:

    Soft hyphen, no break (Pages).png
    Soft hyphen, line break (Pages).png

    I don’t have Word to also test it in, and though I could fire up some other apps to try, I feel I’ve satisfied my curiosity now so I’m not going to bother :kneeling_warthog:



  • @Gurth said in WTF Bites:

    About the only use-case I can think of for doing that, would be if you have a text that must be shown in a program that doesn’t know about hyphenation, but which you do want hyphenated.

    If the text layout engine a program uses doesn't understand hyphenation, sticking a bunch of hyphens of any kind in the text won't get it to put in line breaks at them. Soft hyphens only designate a place where the engine can hyphenate if it wants, regardless of what its dictionary says for that word.

    There are no perfect answers when guessing intent. :(



  • @Parody said in WTF Bites:

    If the text layout engine a program uses doesn't understand hyphenation, sticking a bunch of hyphens of any kind in the text won't get it to put in line breaks at them. Soft hyphens only designate a place where the engine can hyphenate if it wants, regardless of what its dictionary says for that word.

    The case is when the layout engine can hyphenate, but does not have the language-specific data to tell it where. So some other component before it that does have that data inserts the soft hyphens and then the layout engine breaks lines at those that end up at end of line.
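
That split of responsibilities could look roughly like this in Python. The `BREAKS` table is invented hyphenation data for a single word and `add_soft_hyphens` is a made-up name; a real preprocessor would use proper per-language patterns.

```python
SOFT_HYPHEN = "\u00ad"

# Hypothetical hyphenation data: word -> offsets where a break is allowed.
BREAKS = {"hyphenation": [2, 6, 7]}


def add_soft_hyphens(word: str) -> str:
    """Insert a soft hyphen at every known break point; unknown words pass through."""
    pieces, last = [], 0
    for pos in BREAKS.get(word.lower(), []):
        pieces.append(word[last:pos])
        last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


marked = add_soft_hyphens("hyphenation")
print(marked.replace(SOFT_HYPHEN, "|"))  # make the invisible hints visible
```

The layout engine then only has to know that a soft hyphen is a legal break point, not where the breaks of any particular language go.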

    Anyway, it still don't matter. The codepage conversion code doesn't do it because it would actually be useful, it does it because someone thought it should convert all unicode hyphens to the one available in ascii.



  • @Bulb said in WTF Bites:

    The codepage conversion code doesn't do it because it would actually be useful, it does it because muh it has always worked like this!

    FTFY after finding this: https://www.unicode.org/L2/L2003/03155r-kuhn-soft-hyphen.pdf

    :headdesk:

    E: replaced huge onebox with a link to save some of that expensive thread real-estate


  • Java Dev

    @cvi said in WTF Bites:

    I'm sure they've gotten worse...

    To enter an e, press the 3 key twice.
    To enter two consecutive e's, press the 3 key twice, wait for at least a full second, then press the 3 key another two times.


  • Java Dev

    @Bulb said in WTF Bites:

    @cvi said in WTF Bites:

    @topspin Have turned it off. It gets too confused with multiple languages. Takes a bit longer to type something out, given that phone keyboards suck (I'm sure they've gotten worse...). Then again, I don't really type that much on the phone.

    Autocarrot on a computer with a real keyboard = 💀.

    On a related note, the keyboard on latest Android started inserting space when selecting a word from the completion list. That pissed off a lot of people, because:

    • Even in English, often you want to continue with a period, comma, dash, closing parenthesis or something.
    • In languages with flexing, like Czech, it is often useful to complete a stem of the word and add appropriate suffix which, of course, would not have a space before it.

    Overall it means the space should not be there half of the time.

    Of course, Google ain't listening to feedback and isn't adding an option to turn it off.

    In my experience, if autocomplete inserts a space and the next character you manually enter is a period, comma, &c the autocompleted space is removed again. Of course that still leaves your flexing case.
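
The behaviour described here is easy to model. A toy Python sketch, not any actual keyboard's code; the `Editor` class and the `PUNCTUATION` set are invented for illustration:

```python
PUNCTUATION = set(".,;:!?)")


class Editor:
    def __init__(self):
        self.text = ""
        self.auto_space = False  # True while the last character is a completion-added space

    def accept_completion(self, word: str) -> None:
        self.text += word + " "
        self.auto_space = True

    def type_char(self, ch: str) -> None:
        if self.auto_space and ch in PUNCTUATION:
            self.text = self.text[:-1]  # take the automatic space back
        self.text += ch
        self.auto_space = False


e = Editor()
e.accept_completion("Hello")
e.type_char(",")
print(repr(e.text))
```

A suffix typed after a completed stem is not punctuation, so the space stays and the flexing case remains broken.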



  • @PleegWat That's how SwiftKey does it. But it's still not perfect, and often I have to manually go back with the arrows to fix quotation marks and parentheses, because they get associated with the wrong side of spaces



  • @hungrier said in WTF Bites:

    go back with the arrows

    656a6e96-0b75-4f67-a2d6-dcf81cc8b05d-image.png

    I do too, but it's not easily discoverable, so I forget they exist.



  • @HardwareGeek Keep the on-screen space bar/key pressed for a second, then swipe.



  • @Gurth Holding the spacebar just brings up the menu to switch keyboard languages, and swiping does nothing.


  • Discourse touched me in a no-no place

    @Bulb said in WTF Bites:

    Unfortunately nobody in the PHP project realized just how obnoxious Windows is.

    Or they realised, but didn't want to deal with all that bullshit for a non-primary-target platform.


  • Discourse touched me in a no-no place

    @Bulb said in WTF Bites:

    Of course, Google ain't listening to feedback

    They never ever did.

    and isn't adding an option to turn it off.

    You can install a different keyboard app; it's a normal app with some extra permissions and which is called at the right time.

