Twitter links with double underscores in username breaks onebox
-
Repro:
https://twitter.com/_Chloeann/status/1017861208345645056
Raw:
https://twitter.com/__Chloeann_/status/1017861208345645056
-
I'm sure we've had this before...
FakeEdit: Ah - here we go:
https://what.thedailywtf.com/topic/23511/url-s-with-underscores-in-them
-
https://twitter.com/__Chloeann_/status/1017861208345645056
https://twitter.com/_Chloeann/status/1017861208345645056
<https://twitter.com/__Chloeann_/status/1017861208345645056>
-
@doctorjones Is there some reason people who build URL parsers never bother to fucking read the list of characters that can appear in URLs? I feel like there's a new version of this same bug every fucking month.
-
@blakeyrat said in Twitter links with double underscores in username breaks onebox:
@doctorjones Is there some reason people who build URL parsers never bother to fucking read the list of characters that can appear in URLs? I feel like there's a new version of this same bug every fucking month.
It's literally unpossible to know which special junk characters can be in the box thing on top of my Facebook viewer.
-
@blakeyrat said in Twitter links with double underscores in username breaks onebox:
@doctorjones Is there some reason people who build URL parsers never bother to fucking read the list of characters that can appear in URLs? I feel like there's a new version of this same bug every fucking month.
and this particular incarnation of the bug has been sat there for 12 months since the original bug report
-
@ben_lubar said in URL's with underscores in them:
This is not a NodeBB bug. NodeBB is following the CommonMark spec.
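For anyone wondering why the spec chews up this particular URL: CommonMark decides whether a run of `_` can open or close emphasis by looking at the characters on either side of it (the "left-flanking" and "right-flanking" rules). Here's a simplified sketch of those rules. It assumes ASCII-only whitespace and punctuation, whereas the real spec uses Unicode classes, and `underscoreRuns` is a made-up helper, not part of NodeBB or any parser.

```javascript
// Simplified, ASCII-only take on CommonMark's delimiter-run rules for "_".
const ASCII_PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
// Start and end of text count as whitespace, as in the spec.
const isSpace = (ch) => ch === undefined || /\s/.test(ch);
const isPunct = (ch) => ch !== undefined && ASCII_PUNCT.includes(ch);

// Returns every run of "_" with whether it may open and/or close emphasis.
function underscoreRuns(text) {
  const runs = [];
  for (let i = 0; i < text.length; ) {
    if (text[i] !== "_") { i++; continue; }
    let j = i;
    while (text[j] === "_") j++;      // j is one past the end of the run
    const before = text[i - 1];       // undefined at start of text
    const after = text[j];            // undefined at end of text
    const leftFlanking = !isSpace(after) &&
      (!isPunct(after) || isSpace(before) || isPunct(before));
    const rightFlanking = !isSpace(before) &&
      (!isPunct(before) || isSpace(after) || isPunct(after));
    runs.push({
      index: i,
      length: j - i,
      // "_" has the extra intraword restrictions on top of flanking
      canOpen: leftFlanking && (!rightFlanking || isPunct(before)),
      canClose: rightFlanking && (!leftFlanking || isPunct(after)),
    });
    i = j;
  }
  return runs;
}
```

Run against the raw link from the Repro, the `__` after `twitter.com/` comes out as an opener and the lone `_` after `Chloeann` as a closer. A conforming parser pairs them as regular emphasis and leaves one literal `_` behind, roughly `https://twitter.com/_<em>Chloeann</em>/status/1017861208345645056`, which would account for the `_Chloeann` in the rendered post. With a single leading underscore there's no closer, so nothing happens.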
-
@zecc quoted in Twitter links with double underscores in username breaks onebox:
NodeBB is following the CommonMark spec.
Is there anywhere a bug report could be filed for CommonMark?</rhetorical>
The current version of the CommonMark spec is complete, and quite robust after a year of public feedback … but not quite final.
With your help, we plan to announce a finalized 1.0 spec and test suite in 2018.
<3
™®©
-
Protip: you can fix most CommonMark link parsing problems with angle brackets:
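Why the angle brackets help, going by the CommonMark spec: `<scheme:...>` is an autolink, and autolinks bind more tightly than emphasis, so the underscores inside are never treated as delimiters. Below is a rough check for the URI form of an autolink: a scheme of 2 to 32 characters, then `:`, then anything except ASCII control characters, spaces, `<` and `>`. `isUriAutolink` is a hypothetical name used only for illustration.

```javascript
// Rough check for a CommonMark URI autolink: "<", scheme (letter followed by
// 1-31 letters, digits, "+", "." or "-"), ":", then no ASCII control
// characters, spaces, "<" or ">", and finally ">".
const AUTOLINK = /^<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20<>\x7f]*>$/;

function isUriAutolink(s) {
  return AUTOLINK.test(s);
}
```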
-
@ben_lubar said in Twitter links with double underscores in username breaks onebox:
Protip: you can fix most CommonMark link parsing problems with angle brackets:
s9e\TextFormatter (Fatdown/PHP):
-
@pjh Good thing we're not using that one, then, because it violates the spec.
-
@ben_lubar said in Twitter links with double underscores in username breaks onebox:
Protip: you can fix most CommonMark link parsing problems with angle brackets:
Hey, here's another idea: why don't we stop using that shitty broken technology altogether!??!?!?
What kind of weird ant-carried fungus infects the brain of open source fans and makes them think Markdown is a good idea? Why do shitty technologies spread so fast?
-
@ben_lubar said in Twitter links with double underscores in username breaks onebox:
Protip: you can fix most CommonMark link parsing problems with angle brackets:
Thanks Ben, I will try to remember that tip. It's just a shame that I need to.
(The following is not directed at Ben) I don't think shrugging and saying "it's the spec's fault" is a great way of dealing with this problem. There's got to be a way of working around this, so the user doesn't have to be familiar with the finer points of markdown parsing. @julianlam are you guys aware of this issue? You could fix this by implicitly inserting angled brackets when you see a raw URL in a post.
-
@doctorjones said in Twitter links with double underscores in username breaks onebox:
You could fix this by implicitly inserting angled brackets when you see a raw URL in a post.
That's what's already happening - it just isn't working for URLs with some random types of punctuation in them, and no, those types aren't documented.
-
@ben_lubar it sounds like we need a step running in the editor before we pass the content to the markdown parser.
Something like:
// put angled brackets around any raw URLs so the markdown parser doesn't fuck them up
post = post.replace(/(?!([^\s]*[='"\(\)]))(?<url>(?<protocol>[^\s<>]*:\/\/)[^\s<>]*)/g, "<$<url>>");
Edit: I've tested the regex here; feel free to use it. I've probably not thought of all the edge cases, but it's a start.
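A slightly more defensive sketch of the same pre-render idea, under a few assumptions: `wrapBareUrls` is a hypothetical name, it relies on regex lookbehind (ES2018), and it only skips URLs whose preceding character is `<`, `(`, a quote, `=` or a word character. It doesn't know about code blocks or inline code, and it keeps trailing punctuation such as `.` or `)` as part of the URL.

```javascript
// Hypothetical pre-render step: wrap bare URLs in <...> so a CommonMark parser
// treats them as autolinks instead of running emphasis rules over "_" and "*".
// A URL is left alone when the character right before it is "<", "(", a quote,
// "=" or a word character: it already looks like an autolink, a [text](url)
// target or an HTML attribute, or the match would start mid-word.
const BARE_URL = /(?<![<("'=\w])[a-z][a-z0-9+.-]*:\/\/[^\s<>]+/gi;

function wrapBareUrls(post) {
  return post.replace(BARE_URL, (url) => `<${url}>`);
}
```

Applied only to the copy handed to the markdown renderer, the stored raw post would keep exactly what the user typed.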
-
@doctorjones I'd be wary of doing this because it means that performing a search for a specific URL may or may not return the correct result.
-
@bb36e we're talking about raw, not rendered content.
-
@ben_lubar said in Twitter links with double underscores in username breaks onebox:
That's what's already happening - it just isn't working for URLs with some random types of punctuation in them, and no, those types aren't documented.
So what's stopping you or anybody from fixing this, again? I'm confused.
-
@blakeyrat the same answer as always:
-
@ben_lubar So you've given up on the "it would break the standard that's not really a standard and nobody cares about it" argument?
-
@doctorjones from my understanding, search doesn't run on rendered content. Try searching for a post based on a YouTube onebox, for example, and you won't get the right results, whereas searching for the YouTube URL will give you results.
-
@blakeyrat said in Twitter links with double underscores in username breaks onebox:
@ben_lubar So you've given up on the "it would break the standard that's not really a standard and nobody cares about it" argument?
That's a pretty standard argument.
-
@blakeyrat said in Twitter links with double underscores in username breaks onebox:
@ben_lubar said in Twitter links with double underscores in username breaks onebox:
Protip: you can fix most CommonMark link parsing problems with angle brackets:
Hey, here's another idea: why don't we stop using that shitty broken technology altogether!??!?!?
What kind of weird ant-carried fungus infects the brain of open source fans and makes them think Markdown is a good idea? Why do shitty technologies spread so fast?
PRs are accepted
-
@bb36e said in Twitter links with double underscores in username breaks onebox:
@doctorjones from my understanding, search doesn't run on rendered content. Try searching for a post based on a YouTube onebox, for example, and you won't get the right results, whereas searching for the YouTube URL will give you results.
Sorry for the confusion; I'm talking about a render step before we pass it off to the markdown engine. The whole point of raw is that it remains an unmolested version of what the user typed. I'm not suggesting changing the raw; otherwise it wouldn't be raw any more.
-
Why does the markdown parser even look at active urls anyway? We already have some method to tell it when it **shouldn't** parse, and clickable/oneboxable urls should be one of these cases.
-
This.
-
@pie_flavor said in Twitter links with double underscores in username breaks onebox:
https://twitter.com/\_\_Chloeann_/status/1017861208345645056
Twatter really didn't like that link:
-
@hungrier said in Twitter links with double underscores in username breaks onebox:
Why does the markdown parser even look at active urls anyway? We already have some method to tell it when it **shouldn't** parse, and clickable/oneboxable urls should be one of these cases.
The oneboxing is a separate plugin from the markdown processor. I'm not sure in which order those happen or how exactly they interact any more (it's been a while since I looked at all of that). But the point is that the markdown stuff has no awareness of iframely or the YouTube plugin or anything else.
-
@boomzilla Maybe we need to MD5-hash URLs before the markdown pass, then restore them afterwards.
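A rough sketch of that protect-and-restore idea, with one technique swapped in: instead of an MD5 digest, each URL becomes a numbered token made of letters and digits only, which is the property that actually matters (there's nothing in it for the emphasis rules to grab). `protectUrls` and `restoreUrls` are hypothetical names, not anything NodeBB provides, and reusing a skip rule for URLs already inside `<...>` or `(...)` is an assumption.

```javascript
// Hypothetical protect/restore pass around a markdown renderer.
// Skips URLs right after "<", "(", a quote, "=" or a word character.
const URL_RE = /(?<![<("'=\w])[a-z][a-z0-9+.-]*:\/\/[^\s<>]+/gi;

// Swap every URL for URLTOKEN<n>END; the token contains no markdown syntax.
function protectUrls(text) {
  const urls = [];
  const protectedText = text.replace(URL_RE, (url) => {
    urls.push(url);
    return `URLTOKEN${urls.length - 1}END`;
  });
  return { protectedText, urls };
}

// The original URL lands in rendered HTML, so escape it on the way back.
const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

function restoreUrls(html, urls) {
  return html.replace(/URLTOKEN(\d+)END/g, (token, n) => {
    const url = urls[Number(n)];
    return url === undefined ? token : escapeHtml(url); // leave unknown tokens alone
  });
}
```

The restored URL is plain escaped text, so linkifying or oneboxing it would still have to happen after the restore step, and a real version would need a token format users can't type by accident.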