The "fa-spin" Testing Thread

Onyx

XSS awards all around?

Nah, fuck it, make that a "spin award".

Arantor

And so the fail continues. This is why you neuter everything and then only parse what you know to be safe.

I'm half willing to bet that if I were to submit posts with <script> tags in, whereby the s and the cript part (or any other arbitrary break in the word) were split by a 0x00 byte, half the browsers would still parse it and Discourse would probably ignore it... but I can't be arsed to simulate such a post.
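As a rough illustration of "neuter everything, then only parse what you know to be safe", here is a minimal Python sketch. The allowlist and function name are made up for the example, and this is not how Discourse's sanitizer works. It strips NUL bytes, escapes the whole post, and then re-enables only a handful of bare, attribute-free tags. The regex only matches fixed, already-escaped tokens and makes no attempt to balance tags.

```python
import html
import re

# Hypothetical allowlist: bare tags only, no attributes allowed.
SAFE_TAGS = ("b", "i", "em", "strong", "code")

# After escaping, an allowed tag looks like "&lt;b&gt;" or "&lt;/b&gt;".
_SAFE_TAG_RE = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(SAFE_TAGS), re.IGNORECASE)


def neuter_then_allow(raw: str) -> str:
    # Drop NUL bytes so "<s\x00cript>" cannot hide a tag name in two halves.
    cleaned = raw.replace("\x00", "")
    # Step 1: neuter everything; <, >, & and quotes all become entities.
    escaped = html.escape(cleaned, quote=True)
    # Step 2: turn only the known-safe, attribute-free tags back into markup.
    return _SAFE_TAG_RE.sub(
        lambda m: "<%s%s>" % (m.group(1), m.group(2).lower()), escaped
    )
```

Anything carrying attributes, such as `<b class="fa-spin">`, stays escaped because it never matches the bare-tag pattern.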

boomzilla

@Arantor said:

This is why you neuter everything and then only parse what you know to be safe.

I think that's what they're trying to do. But somehow they're only looking at the first class. Or something?

Filed Under: Fucking Users
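If the bug really is "only the first class gets checked" (a guess, as above), the fix would be to check every whitespace-separated token in the `class` attribute. A minimal Python sketch, with a made-up allowlist:

```python
# Hypothetical allowlist; the real set of permitted classes is not known here.
ALLOWED_CLASSES = {"emoji", "quote", "mention"}


def filter_classes(class_attr: str) -> str:
    # split() with no argument splits on any run of whitespace,
    # so every token is checked, not just the first one.
    kept = [name for name in class_attr.split() if name in ALLOWED_CLASSES]
    return " ".join(kept)
```

With this, `filter_classes("emoji fa fa-spin")` keeps only `emoji`, and a class list with no allowed entries comes back empty.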

Arantor

No, they're not, they're really not. It's the reason that invalid tags like <aaa> don't necessarily turn up in actual source properly. <aaa></aaa> And why we had/have to manually escape < as an entity ourselves!

ChaosTheEternal

@Arantor said:

This is why you neuter everything and then only parse what you know to be safe.

Or you only support something like BBCode, avoiding the whole capability for users to enter HTML, styles, and classes and make a mess of things.

Filed under: [Of course, you have to be sure your parser for BBCode won't incidentally inject something that messes things up](#tag2), [unless there is an official spec I'm not aware of](#tag2), [not like I've looked](#tag2)

Arantor

And no, there's no official bbcode spec. Everyone does it differently, and half of them use regex!

Zecc

Just in case the posts do get rebaked (whatever that means), here's a visual backup:

http://gfycat.com/ExcellentGenerousAsianpiedstarling
http://gfycat.com/DetailedWeightyFlyinglemur

Keith

@Arantor said:

Everyone does it differently, and half of them use regex!

Is there something wrong with using regex to parse HTML?

ChaosTheEternal

I didn't think there was, but I wasn't sure since I don't work on forums or anything else that would use BBCode.

PJH

@Zecc said:

rebaked (whatever that means),

TLDR: it's a cache.

Essentially they have two copies of each post. The 'raw' code that someone typed in, and a cached version of parsing that raw code so that it doesn't have to be parsed every time someone wants to view the post.

The cached version is the 'baked' version, and it only gets rebaked under certain conditions. The usual one is when a post gets edited, but I'm fairly certain there'll be a facility whereby every post gets rebaked.
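The raw/baked split described above can be sketched in a few lines of Python. The names (`Post`, `cook`, `rebake_all`) are invented for illustration and `cook` is only a stand-in for the real parser; this is not Discourse's code.

```python
import html
from dataclasses import dataclass
from typing import Iterable, Optional


def cook(raw: str) -> str:
    # Stand-in for the real (expensive) markup parser.
    return "<p>" + html.escape(raw) + "</p>"


@dataclass
class Post:
    raw: str                      # what the user typed
    cooked: Optional[str] = None  # cached ("baked") rendering

    def render(self) -> str:
        # Parse only when there is no cached copy.
        if self.cooked is None:
            self.cooked = cook(self.raw)
        return self.cooked

    def edit(self, new_raw: str) -> None:
        # Editing invalidates the cache, so the next view rebakes.
        self.raw = new_raw
        self.cooked = None


def rebake_all(posts: Iterable[Post]) -> None:
    # The "every post gets rebaked" facility: useful after a parser fix.
    for post in posts:
        post.cooked = cook(post.raw)
```

The catch is that a fix to the parser does not show up in old posts until something triggers a rebake.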

Onyx

@Arantor said:

And no, there's no official bbcode spec. Everyone does it differently, and half of them use regex!

What the hell is with the regex fascination? I'm known to be a bit regex happy and use it at times when some form of string.replace would do, but why in the hell are so many people trying to parse irregular languages with regexes?

Then again, I did write this abomination:

http://what.thedailywtf.com/t/asterisk-pbx/1235/31?u=onyx

But in my defense AEL doesn't have a proper substr, let alone strpos, so I might be semi-vindicated. Maybe.

PJH

@Onyx said:

XSS awards all around?

http://what.thedailywtf.com/t/badges/1494/9?u=pjh

... Sheesh - what a fucking mess... expand it/go to the OP to see it properly.

Arantor

The problem with regexes on user content like this is that you can find yourself dealing with pathological cases. We saw that recently on here, in fact, where previewing posts could effectively lock the browser if they were suitably twisted and nasty.

All the forum software that uses regex parsers has had at least one major vulnerability in the 'denial of service' category from intentionally badly formed posts.
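A small Python sketch of the kind of pathological case meant here. The pattern is a textbook example of catastrophic backtracking, not one taken from any forum software. Both patterns accept exactly the same strings, but the nested quantifier gives a backtracking engine exponentially many ways to fail on a near-miss input.

```python
import re
import time

EVIL = re.compile(r"^(a+)+$")  # nested quantifier: prone to catastrophic backtracking
SAFE = re.compile(r"^a+$")     # accepts exactly the same strings


def time_match(pattern, text):
    """Return (matched, seconds taken) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: each extra "a" roughly doubles the work for EVIL.
    for n in (10, 14, 18):
        text = "a" * n + "!"  # almost matches, then fails at the last character
        _, evil_seconds = time_match(EVIL, text)
        _, safe_seconds = time_match(SAFE, text)
        print(n, f"{evil_seconds:.4f}s", f"{safe_seconds:.6f}s")
```

On a backtracking engine such as CPython's `re`, the first timing column should climb steeply while the second stays flat. A post with a few dozen such characters is enough to hang a preview.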

ChaosTheEternal

@Onyx said:

What the hell is with the regex fascination?

Because "They Can Do Everything"™

Except they have some failings, syntax and feature differences between platforms, and can be a nightmare to maintain the more complex the regex.

faoileag

@PJH said:

but I'm fairly certain there'll be a facility whereby every post gets rebaked.

Just provide one of these nifty button things for it next to the like button once the facility has arrived. I'm sure "The Official 'Rebake All Posts' Thread" will spring into existence shortly after.

boomzilla

OK...I'll admit to not following the exploits or looking very deep into their code.

One weird thing is that they seem to have no distinction between whitelisting what the markdown stuff generates vs what the user types in. It seems like they should be simply sanitizing user input and trust the markdown, since that's their code.

ChaosTheEternal

@boomzilla said:

It seems like they should be aggressively sanitizing user input and trust the markdown, since that's their code.

FTFY, and yes, that is probably what they should be doing. I really don't understand why they don't trust their own code either.

HardwareGeek

@ChaosTheEternal said:

I really don't understand why they don't trust their own code either.

Perhaps because one of us will find a way to make their markdown parser emit something unsanitary.

ChaosTheEternal

I assume they never trusted it (not looking into the revision history for their posting mechanism), not just after we started using it.

I would have to assume, actually, that they first translate Markdown and BBCode into HTML, then try to strip out things from the HTML that aren't in the whitelist. But then, why not do it in reverse order? Less to potentially have to check against the whitelist, and again, you¹ should be able to trust whatever you translate into HTML.

¹ Royal "you", but not the Discourse devs.
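The "reverse order" idea can be sketched like this in Python: escape whatever the user typed first, then let the Markdown step emit HTML that is trusted because it only ever produces known tags. The single bold rule below is a toy stand-in for a Markdown translator, and the function name is made up.

```python
import html
import re

# Toy Markdown rule: **text** becomes <strong>text</strong>.
_BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_post(raw: str) -> str:
    # 1. Sanitize user input: any HTML the user typed becomes inert text.
    escaped = html.escape(raw)
    # 2. Translate markup; only this step is allowed to emit real tags.
    return _BOLD.sub(r"<strong>\1</strong>", escaped)
```

The trust in step 2 is only as good as the translator. A rule that copies user text into an attribute (a link URL, say) would need its own checks, which is the concern raised earlier about making the parser emit something unsanitary.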

faoileag

@ChaosTheEternal said:

I really don't understand why they don't trust their own code either

Because they know their code?

Arantor

Interesting fact, other systems do sometimes allow 'safe' HTML intermixed in the bbcode, but invariably silently convert it to bbcode, and then run it through the bbc parser. Some do it at post-save time, some do it at display time, some do a mixture depending on the situation (e.g. accepting content with WYSIWYG editors where it might be intermixed anyway, for raw HTML for simpler stuff and bbc for complex stuff).

locallunatic

@ChaosTheEternal said:

I really don't understand why they don't trust their own code either.

Cause they are following the two basic rules of life:

Don't trust things made by incompetents
Never assume you are competent

error

@Arantor said:

And so the fail continues. This is why you neuter everything and then only parse what you know to be safe.

I'm half willing to bet that if I were to submit posts with <script> tags in, whereby the s and the cript part (or any other arbitrary break in the word) were split by a 0x00 byte, half the browsers would still parse it and Discourse would probably ignore it... but I can't be arsed to simulate such a post.

I've read through their sanitizer pretty thoroughly, and they do strip NULs. Actually it seems to be some Apache-licensed thing. But they've made a few modifications, so my best hope for a breakthrough is in their changes.

I've found a few very promising leads, but I don't want to share them yet. The gold badge will be mine!

Filed under: Hint: html.sanitize

abarker

Wheee!

Thanks @fatbull!

the_dragon

Behold, I give you celestial motion (originally posted here):

PJH

Meanwhile the developers are having a hard time stopping this little 'featurette':

https://meta.discourse.org/t/classes-are-not-being-sanitized-in-cooked-markdown/17367/17?u=pjh

Three "fixes" at the time of posting, and even I can still make it happen.

PJH

Spoke too soon - they have an update since that post. Just going to update...

PJH

Updated and applied up to

Jul 9, 2014

FIX: better whitelisting · discourse/discourse@d54c28a

Hard-refresh your browser before relying on the preview pane showing anything remotely accurate.

Go wild people...

ben_lubar

PJH

Hmm. I'm fairly certain I'll get short shrift if I go and report that one over in meta...

Arantor

So, business as usual?

PJH

darkmatter

It's like you bastards always patch the sploits while I'm not around to play with them :(

RTapeLoadingError

I know

This one looked like fun

Onyx

So... if I steal this code...

:moon:

Well, carp! Foiled again!

PJH

Get up earlier then! :)

PJH

Well I posted it anyway, and got two likes within eight minutes, from the usual suspects...

dhromed

@the_dragon said:

celestial motion

What's the point of all the nested spin and flip?

the_dragon

Keeps the orbits balanced so that the earth doesn't wobble and the sun doesn't move.

tufty

@Keith said:

Is there something wrong with using regex to parse HTML

Yes. The same answer applies to the truncated version of that question, viz:

Is there something wrong with using regex to parse

Regular expressions are not a parser. Regular expressions are not a replacement for a parser. If you think they are, and rely on regular expressions for parsing, you are a fucking idiot.
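For contrast, a minimal sketch of what handing the job to a real tokenizer looks like, using Python's standard `html.parser`. The allowlist and class name are invented for the example, and this is not a production sanitizer. It keeps a stack of open tags so that mis-nested input still comes out balanced, drops every attribute, and escapes all text.

```python
import html
from html.parser import HTMLParser

SAFE_TAGS = {"b", "i", "em", "strong", "code"}  # hypothetical allowlist


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of sanitized output
        self.open_tags = []  # stack of allowed tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in SAFE_TAGS:
            self.out.append("<%s>" % tag)  # attributes are dropped entirely
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened since the matching start tag.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append("</%s>" % top)
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(html.escape(data))


def sanitize(raw: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(raw)
    parser.close()
    # Close anything the author left open.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)
```

Nesting is exactly the part a regular expression cannot track, and here it is handled by the explicit stack.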

Cursorkeys

@faoileag said:

The thread so far doesn't even break Chrome on mobile

Discourse (mis)renders like an angry bag of weasels for me on mobile. Making it crash would be a big improvement.

Keith

+1, would troll again.

Matches

I'm not that pathological, Arantor. I just like to break things.

ender

Entire thread immortalized:

LoremIpsum

Good job. @sam's bugfixing is a barrier to bug preservation.

Cursorkeys

Huh, @faoileag's avatar is now a broken image in my quoted reply.

I'm assuming that's because he's changed his avatar and it hasn't cached the old one.

I thought it was stated that the WHOLE POINT of Discourse's Borg approach to other people's images was so that broken image links wouldn't happen on Discourse. Is this a known bug, @sam?

faoileag

@Cursorkeys said:

Huh, @faoileag's avatar is now a broken image in my quoted reply.

Interesting. It could mean that the avatar is baked into the reply but for the usual reasons (we are talking about Discourse here) I doubt that.

After all, when I visit this forum from mobile, I see that greenish fractal avatar next to my posts, but the title bar still shows the avatar from two changes before the greenish fractal one.

@Cursorkeys said:

I'm assuming that's because he's changed his avatar and it hasn't cached the old one.

A sensible approach would be to give a new avatar a new filename on upload... I wouldn't be surprised if Discourse didn't.

Edit: Might indeed be baked in, the url of my avatar in your reply is different from the one in my post. :surprise:

Cursorkeys

Yep, seems to be baked in. But they DO seem to have unique identifiers.

Your current set of avatars are:
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/20/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/32/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/45/3811.png
http://what.thedailywtf.com/user_avatar/what.thedailywtf.com/faoileag/120/3811.png

So the first folder is for size in pixels and the second looks like an ID/GUID as you had '3595' before.

None of the expected images are served under the old ID of 3595.

Edit:

Yeah, after checking all possible images in the space [0000:9999], the only active images are with ID/GUID 3811 and 0000 (which is empty).
So, they do have a unique identifier, but then destroy the old ones for some reason, which breaks linkage :brillant:
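The URL layout guessed at above (size folder, then an ID) can be pulled apart with a short Python sketch. The field names, including `upload_id`, are assumptions based only on the four URLs listed, not on any documented Discourse scheme.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class AvatarRef(NamedTuple):
    username: str
    size: int        # folder that appears to be the size in pixels
    upload_id: str   # hypothetical name for the trailing ID


def parse_avatar_url(url: str) -> AvatarRef:
    # Assumed layout: /user_avatar/<host>/<username>/<size>/<id>.png
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "user_avatar" or not parts[4].endswith(".png"):
        raise ValueError("unexpected avatar URL layout: %r" % url)
    return AvatarRef(parts[2], int(parts[3]), parts[4][: -len(".png")])
```

Running it over the avatar URL baked into a quote and the one on the live post would show whether the two differ only in the ID.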

faoileag

@faoileag said:

Interesting. It could mean that the avatar is baked into the reply but for the usual reasons (we are talking about Discourse here) I doubt that.

This reply is for testing purposes only.