The "fa-spin" Testing Thread
-
XSS awards all around?
Nah, fuck it, make that a "spin award".
-
And so the fail continues. This is why you neuter everything and then only parse what you know to be safe.
I'm half willing to bet that if I were to submit posts with <script> tags in, whereby the s and the cript part (or any other arbitrary break in the word) were split by a 0x00 byte, half the browsers would still parse it and Discourse would probably ignore it... but I can't be arsed to simulate such a post.
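For the record, "neuter everything, then only parse what you know to be safe" fits in a few lines of Python. This is just a sketch of the idea, not what Discourse actually does: the function name, the SAFE_TAGS whitelist, and the decision to allow only bare tags with no attributes are all made up for illustration.

```python
import html

# Hypothetical whitelist: bare tags only, no attributes, so nothing
# user-controlled ever ends up inside a live tag.
SAFE_TAGS = ("b", "i", "code")


def neuter_then_allow(raw: str) -> str:
    # Drop NUL bytes first so "<s\x00cript>" cannot hide from later steps.
    text = raw.replace("\x00", "")
    # Neuter everything: <, >, & and quotes all become entities.
    text = html.escape(text, quote=True)
    # Re-enable only the exact tokens we know to be safe.
    for tag in SAFE_TAGS:
        text = text.replace(f"&lt;{tag}&gt;", f"<{tag}>")
        text = text.replace(f"&lt;/{tag}&gt;", f"</{tag}>")
    return text
```

Anything not on the list, including a tag carrying a class attribute, stays escaped and shows up as literal text.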
-
This is why you neuter everything and then only parse what you know to be safe.
I think that's what they're trying to do. But somehow they're only looking at the first class. Or something?
Filed Under: Fucking Users
-
No, they're not, they're really not. It's the reason that invalid tags like <aaa> don't necessarily turn up in actual source properly. <aaa></aaa> And why we had/have to manually escape < as an entity ourselves!
-
This is why you neuter everything and then only parse what you know to be safe.
Or you only support something like BBCode, avoiding the whole capability for users to enter HTML, styles, and classes and make a mess of things.
Filed under: [Of course, you have to be sure your parser for BBCode won't incidentally inject something that messes things up](#tag2), [unless there is an official spec I'm not aware of](#tag2), [not like I've looked](#tag2)
-
And no, there's no official bbcode spec. Everyone does it differently, and half of them use regex!
-
Just in case the posts do get rebaked (whatever that means), here's a visual backup:
http://gfycat.com/ExcellentGenerousAsianpiedstarling
http://gfycat.com/DetailedWeightyFlyinglemur
-
Everyone does it differently, and half of them use regex!
Is there something wrong with using regex to parse HTML?
-
I didn't think there was, but I wasn't sure since I don't work on forums or anything else that would use BBCode.
-
rebaked (whatever that means),
TLDR: it's a cache.
Essentially they have two copies of each post. The 'raw' code that someone typed in, and a cached version of parsing that raw code so that it doesn't have to be parsed every time someone wants to view the post.
The cached version is the 'baked' version, and it only gets rebaked under certain conditions. The usual one is when a post gets edited, but I'm fairly certain there'll be a facility whereby every post gets rebaked.
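A toy model of that raw/baked split, in Python. Every name here (Post, Forum, rebake_all, the two renderers) is invented for the example; it only shows why a parser fix doesn't touch old posts until something rebakes them.

```python
import html
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Post:
    raw: str          # what the user typed
    baked: str = ""   # cached HTML produced from raw


class Forum:
    def __init__(self, render: Callable[[str], str]):
        self.render = render
        self.posts: List[Post] = []

    def create(self, raw: str) -> Post:
        post = Post(raw=raw, baked=self.render(raw))
        self.posts.append(post)
        return post

    def edit(self, post: Post, raw: str) -> None:
        post.raw = raw
        post.baked = self.render(raw)  # editing is the usual rebake trigger

    def view(self, post: Post) -> str:
        return post.baked              # no parsing happens at view time

    def rebake_all(self) -> None:
        # The "rebake every post" facility: re-run the current renderer.
        for post in self.posts:
            post.baked = self.render(post.raw)


def old_render(raw: str) -> str:
    return "<p>" + raw + "</p>"               # stand-in for a buggy parser


def new_render(raw: str) -> str:
    return "<p>" + html.escape(raw) + "</p>"  # stand-in for the fixed one
```

Swap Forum.render from old_render to new_render and existing posts keep serving the old baked HTML until edit or rebake_all runs. That is also why a rebake would wipe out whatever an old exploit post currently looks like.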
-
And no, there's no official bbcode spec. Everyone does it differently, and half of them use regex!
What the hell is with the regex fascination? I'm known to be a bit regex happy and use it at times when some form of string.replace would do, but why in the hell are so many people trying to parse irregular languages with regexes? Then again, I did write this abomination:
http://what.thedailywtf.com/t/asterisk-pbx/1235/31?u=onyx
But in my defense AEL doesn't have a proper substr, let alone strpos, so I might be semi-vindicated. Maybe.
-
XSS awards all around?
http://what.thedailywtf.com/t/badges/1494/9?u=pjh
... Sheesh - what a fucking mess... expand it/go to the OP to see it properly.
-
The problem with regexes on user content like this is that you can find yourself dealing with pathological cases. Like we saw recently on here, in fact, where previewing posts could effectively lock the browser if they were suitably twisted and nasty.
All the forum software that uses regex parsers has had at least one major vulnerability in the 'denial of service' category from intentionally badly formed posts.
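If you want to see a pathological case for yourself, here is a small Python sketch. The pattern is a textbook backtracking example picked for illustration, not one taken from any real forum parser, and time_match is a made-up helper.

```python
import re
import time

# Hypothetical "evil" pattern: nested quantifiers over the same character.
EVIL = re.compile(r"(a+)+$")


def time_match(n: int) -> float:
    """Seconds taken to fail to match n 'a's followed by a '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' means it can never match
    return elapsed


if __name__ == "__main__":
    # Each extra character roughly doubles the work in a backtracking engine.
    for n in (8, 12, 16, 20):
        print(n, f"{time_match(n):.4f}s")
```

The input is tiny and perfectly innocent-looking, which is the whole problem: whoever writes the post controls how long the matcher runs.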
-
What the hell is with the regex fascination?
Because "They Can Do Everything"™
Except they have some failings, syntax and feature differences between platforms, and can be a nightmare to maintain the more complex the regex.
-
but I'm fairly certain there'll be a facility whereby every post gets rebaked.
Just provide one of these nifty button things for it next to the like button once the facility has arrived. I'm sure "The Official 'Rebake All Posts' Thread" will spring into existence shortly after.
-
OK...I'll admit to not following the exploits or looking very deep into their code.
One weird thing is that they seem to have no distinction between whitelisting what the markdown stuff generates vs what the user types in. It seems like they should be simply sanitizing user input and trust the markdown, since that's their code.
-
It seems like they should be aggressively sanitizing user input and trust the markdown, since that's their code.
FTFY, and yes, that is probably what they should be doing. I really don't understand why they don't trust their own code either.
-
I really don't understand why they don't trust their own code either.
Perhaps because one of us will find a way to make their markdown parser emit something unsanitary.
-
I assume they never trusted it (not looking into the revision history for their posting mechanism), not just after we started using it.
I would have to assume, actually, that they first translate Markdown and BBCode into HTML, then try to strip out things from the HTML that aren't in the whitelist. But then, why not do it in reverse order? Less to potentially have to check against the whitelist, and again, you[1] should be able to trust whatever you translate into HTML.
[1] Royal "you", but not the Discourse devs.
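Here is roughly what the reverse order looks like, as a Python sketch: escape the user's input first, then translate a tiny markup subset whose output you control. The TAGS table and the render function are invented for the example, and the two-tag BBCode subset is an assumption, not Discourse's actual pipeline.

```python
import html

# Hypothetical BBCode subset mapped to the HTML we are willing to emit.
TAGS = {"b": "strong", "i": "em"}


def render(raw: str) -> str:
    # Step 1: sanitize user input. After this no user-typed < or > survives.
    text = html.escape(raw)
    out, stack, i = [], [], 0
    # Step 2: translate trusted markup with a small stack-based scanner.
    while i < len(text):
        matched = False
        if text[i] == "[":
            for bb, tag in TAGS.items():
                if text.startswith(f"[{bb}]", i):
                    stack.append(bb)
                    out.append(f"<{tag}>")
                    i += len(bb) + 2
                    matched = True
                    break
                # A closing tag only counts if it closes the innermost open tag.
                if text.startswith(f"[/{bb}]", i) and stack and stack[-1] == bb:
                    stack.pop()
                    out.append(f"</{tag}>")
                    i += len(bb) + 3
                    matched = True
                    break
        if not matched:
            out.append(text[i])
            i += 1
    # Close anything the user left open so the output stays balanced.
    while stack:
        out.append(f"</{TAGS[stack.pop()]}>")
    return "".join(out)
```

The only angle brackets in the output are ones the translator itself wrote, so there is nothing left to check against a whitelist afterwards. Mis-nested closers are simply left as literal text.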
-
I really don't understand why they don't trust their own code either
Because they know their code?
-
Interesting fact, other systems do sometimes allow 'safe' HTML intermixed in the bbcode, but invariably silently convert it to bbcode, and then run it through the bbc parser. Some do it at post-save time, some do it at display time, some do a mixture depending on the situation (e.g. accepting content with WYSIWYG editors where it might be intermixed anyway, for raw HTML for simpler stuff and bbc for complex stuff)
-
I really don't understand why they don't trust their own code either.
Cause they are following the two basic rules of life
- Don't trust things made by incompetents
- Never assume you are competent
-
And so the fail continues. This is why you neuter everything and then only parse what you know to be safe.
I'm half willing to bet that if I were to submit posts with <script> tags in, whereby the s and the cript part (or any other arbitrary break in the word) were split by a 0x00 byte, half the browsers would still parse it and Discourse would probably ignore it... but I can't be arsed to simulate such a post.
I've read through their sanitizer pretty thoroughly, and they do strip NULs. Actually it seems to be some Apache-licensed thing. But they've made a few modifications, and my best hope for a breakthrough is in their changes.
I've found a few very promising leads, but I don't want to share them yet. The gold badge will be mine!
Filed under: Hint: html.sanitize
-
Meanwhile the developers are having a hard time stopping this little 'featurette':
https://meta.discourse.org/t/classes-are-not-being-sanitized-in-cooked-markdown/17367/17?u=pjh
Three "fixes" at the time of posting, and even I can still make it happen.
-
Spoke too soon - they have an update since that post. Just going to update...
-
Updated and applied up to
Hard-refresh your browser before relying on the preview pane showing anything remotely accurate.
Go wild people...
-
Hmm. I'm fairly certain I'll get short shrift if I go and report that one over in meta..
-
So, business as usual?
-
It's like you bastards always patch the sploits while I'm not around to play with them :(
-
I know
This one looked like fun
-
So... if I steal this code...
:moon:
Well, carp! Foiled again!
-
Get up earlier then! :)
-
Well I posted it anyway, and got two likes within eight minutes, from the usual suspects...
-
Keeps the orbits balanced so that the earth doesn't wobble and the sun doesn't move.
-
Is there something wrong with using regex to parse HTML
Yes. The same answer applies to the truncated version of that question, viz: Is there something wrong with using regex to parse
Regular expressions are not a parser. Regular expressions are not a replacement for a parser. If you think they are, and rely on regular expressions for parsing, you are a fucking idiot.
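A quick Python illustration of the difference, using nested tags. The sample markup and the helper names (regex_inner, DepthTracker, max_div_depth) are made up for the example; html.parser is the standard library's HTML tokenizer.

```python
import re
from html.parser import HTMLParser

NESTED = "<div>outer <div>inner</div> tail</div>"


def regex_inner(markup: str):
    # Naive attempt: grab "the contents of the div" with a regex.
    m = re.search(r"<div>(.*?)</div>", markup)
    return m.group(1) if m else None


class DepthTracker(HTMLParser):
    """Tracks how deeply <div> elements are nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1


def max_div_depth(markup: str) -> int:
    parser = DepthTracker()
    parser.feed(markup)
    parser.close()
    return parser.max_depth
```

regex_inner stops at the first closing tag and hands back a fragment with an unclosed div in it, because the pattern has no idea which opener that closer belongs to. The parser keeps state, so it can tell.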
-
The thread so far doesn't even break Chrome on mobile
Discourse (mis)renders like an angry bag of weasels for me on mobile. Making it crash would be a big improvement.
-
+1, would troll again.
-
I'm not that pathological arantor. I just like to break things.
-
Good job. @sam's bugfixing is a barrier to bug preservation.
-
Huh, @faoileag's avatar is now a broken image in my quoted reply.
I'm assuming that's because he's changed his avatar and it hasn't cached the old one.
I thought it was stated that the WHOLE POINT of Discourse's Borg approach to other people's images was so that broken image links wouldn't happen on Discourse. Is this a known bug, @sam?
-
Huh, @faoileag's avatar is now a broken image in my quoted reply.
Interesting. It could mean that the avatar is baked into the reply but for the usual reasons (we are talking about Discourse here) I doubt that. After all, when I visit this forum from mobile, I see that greenish fractal avatar next to my posts, but the avatar I used before the one I used before the greenish fractal avatar in the title bar.
I'm assuming that's because he's changed his avatar and it hasn't cached the old one.
A sensible approach would be to give a new avatar a new filename on upload... I wouldn't be surprised if Discourse didn't.
Edit: Might indeed be baked in, the url of my avatar in your reply is different from the one in my post. :surprise:
-
Yep, seems to be baked in. But they DO seem to have unique identifiers.
Your current set of avatars are:
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/20/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/32/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/45/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/120/3811.png
So the first folder is for size in pixels and the second looks like an ID/GUID as you had '3595' before.
None of the expected images are served under the old ID of 3595.
Edit:
Yeah after checking all possible images in the space [0000:9999] the only active images are with ID/GUID 3811 and 0000 (which is empty).
So, they do have a unique identifier but then destroy the old ones for some reason which breaks linkage :brillant:
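For anyone wanting to repeat the check, the URL layout observed above is easy to generate in Python. This only builds the URLs and fetches nothing. The helper names are invented, and whether IDs are zero-padded is a guess on my part (the sketch leaves them unpadded).

```python
# Base path copied from the URLs listed above.
BASE = "http://what.thedailywtf.com/user_avatar/what.thedailywtf.com"
SIZES = (20, 32, 45, 120)  # sizes in pixels seen so far


def avatar_url(username: str, size: int, avatar_id: int) -> str:
    # Layout as observed: /<username>/<size>/<id>.png
    # Assumption: IDs are not zero-padded.
    return f"{BASE}/{username}/{size}/{avatar_id}.png"


def candidate_urls(username: str, size: int, ids=range(10000)):
    """Yield one candidate URL per ID in the given range."""
    for avatar_id in ids:
        yield avatar_url(username, size, avatar_id)
```

Feed the results to whatever HTTP client you like and see which IDs answer.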
-
Interesting. It could mean that the avatar is baked into the reply but for the usual reasons (we are talking about Discourse here) I doubt that.
This reply is for testing purposes only.