Reply as linked topic Doesn't Perform HTML Sanitization

Apparently they got on it quick:
https://meta.discourse.org/t/links-in-topic-titles-override-topic-link/26837/5

EDIT: though I didn't actually look at what they did

They sanitize the display(on the client) but not the storage of the bad titles. wtf

RaceProUK

@delfinom said:

They sanitize the display(on the client) where it matters

delfinom

@RaceProUK said:

Until you add some new view and forget to sanitize again

RaceProUK

Yeah, but that can happen in all sorts of scenarios; it's not limited to this specific issue ;)

riking

Well, if you lay out your views in an environment where automatic HTML escaping is the default....

Kuro

@RaceProUK said:

Until you realize once more that things are / were not sanitized when displayed in the sidebar-area on the frontpage...

Filed Under: Then again, you might blame the frontpage coding for that... It really depends on where you stand with

RaceProUK

I'd say that's a front page issue, not a Discourse issue.

My view on data sanitisation is that you do it where it's needed, rather than where the data is. If that makes sense.

tar

@RaceProUK said:

My view on data sanitisation is that you do it where it's needed, rather than where the data is. If that makes sense.

I'm not really hearing a compelling argument against sanitizing your data as early as possible.

delfinom

This post is deleted!

RaceProUK

@tar said:

I'm not really hearing a compelling argument against sanitizing your data as early as possible.

I'm not really making one

tar

The mystery is resolved!

tarunik

@tar said:

I'm not really hearing a compelling argument against sanitizing your data as early as possible.

The problem with early sanitization is that you don't know what context your data is being put into until you go to put it into that context, but you need to sanitize the data differently depending on what context your data is going into.

tar

So you're arguing there might be a legitimate case for storing the user's name as ; DROP TABLE STUDENTS; -- Bob or <script src='zombo.com/xss.js'> Terry because we don't know at the point of entry if we'll store it in a database or render it on a webpage?

tarunik

@tar said:

So you're arguing there might be a legitimate case for storing the user's name as ; DROP TABLE STUDENTS; -- Bob or <script src='zombo.com/xss.js'> Terry because we don't know at the point of entry if we'll store it in a database or render it on a webpage?

Yes! What can happen if you try to sanitize prematurely is that you say, sanitize for HTML and then wind up dropping it into a JS context without further, JS-appropriate sanitization, leading to an injection attack.

sloosecannon

@tarunik said:

Yes! What can happen if you try to sanitize prematurely is that you say, sanitize for HTML and then wind up dropping it into a JS context without further, JS-appropriate sanitization, leading to an injection attack.

I'm currently attempting to exploit that exact type of bug to resurrect .fa-spin (again). Making it even worse, your attempted santitzation might cause an injection in a different language...

@tar said:

So you're arguing there might be a legitimate case for storing the user's name as ; DROP TABLE STUDENTS; -- Bob or <script src='zombo.com/xss.js'> Terry because we don't know at the point of entry if we'll store it in a database or render it on a webpage?

I don't care if someone has a username of ;DROP TABLE STUDENTS;-- as long as it never gets executed.

tar

@tarunik said:

Yes! What can happen if you try to sanitize prematurely is that you say, sanitize for HTML and then wind up dropping it into a JS context without further, JS-appropriate sanitization, leading to an injection attack.

But I guess we could agree that once you've decided to render stuff, it makes sense to have your HTML/JS sanitization in one place, say in some hypothetical ViewBase::pre_render() function, rather than scatterered around through out all of the inherited view classes?

I suppose it really depends on what the range of user input is as well. Nobody has characters such as <*@#%^& as part of their name, so in a Real Name: field we could reject anything other than [a-zA-Z'-]*^† at the point of entry and run the appropriate sanitization later when we want to put the username on the screen.

†. this regex is illustrative, not authoritative, you pedants.

Filed under: a pox on ye, missing {{link value}}!

tarunik

So far, in my (very limited) Webapp experience, I decided to sanitize at the view template level -- Mako provides a handy {blablabla | h} type of functionality for applying filtering to expressions. While it sounds error-prone -- all the actual sanitization logic itself is centralized into the filter functions, and the context makes it clear what kind of sanitization is actually needed vs. having to guess at what derived classes and templates will do with your variables.

@tar said:

But I guess we could agree that once you've decided to render stuff, it makes sense to have your HTML/JS sanitization in one place, say in some hypothetical ViewBase::pre_render() function, rather than scatterered around through out all of the inherited view classes?

Still too early -- also, I'm sure someone's built a pathological case where HTML sanitization causes a JS injection, or vice versa...

sloosecannon

@tar said:

But I guess we could agree that once you've decided to render stuff, it makes sense to have your HTML/JS sanitization in one place, say in some hypothetical ViewBase::pre_render() function, rather than scatterered around through out all of the inherited view classes?

Oh definitely. Doing the same sanitization in different places is a terrible idea.

Filed Under: DRY

tar

@sloosecannon said:

Oh definitely. Doing the same sanitization in different places is a terrible idea.

Although doing a variety of slightly different sanitizations in different places is arguably worse... ;)

sloosecannon

@tar said:

Although doing a variety of slightly different sanitizations in different places is arguably worse... <tt>;)

And that generally happens when you do
@sloosecannon said:

the same sanitization in different places

sloosecannon

That's essentially the same thing as @tar is saying, just done in a different way, since all your sanitization is going through the same code.

You get Discourse when you sanitize lazily with different code...

PleegWat

There's sanitization, and there's escaping.

Sanitization is removing stuff that has no business being there in the first place - like letters in phone numbers. These you want to remove at the earliest (trusted) point, and report back as user error. Note trusted - if you're doing that in JS on the browser side you want to do it again on the server, and keep the rules consistent.

Escaping is making the values safe for display. Stuff should not go into the database entity encoded. Stuff should not be magically backslash-escaped as it goes into your scripting language. These should happen as the data enters the context where escaping is needed - being inserted into a SQL query, or injected into a HTML page.

tar

@PleegWat said:

Note trusted - if you're doing that in JS on the browser side you want to do it again on the server, and keep the rules consistent.

To go off on a mild tangent, doesn't Perl have a notion of 'tainted' input data for precisely these kind of reasons—so you can't display user input unless it's been run through an input processor? I may have to go off and refresh my Perl knowledge...

PleegWat

Not sure if that's that helpful. A string doesn't become safer for display in a HTML page by having its quotes backslash-escaped.

tar

Is that all it does? Hmm...

EDIT: ah, found it

While in this mode, Perl takes special precautions called taint checks to prevent both obvious and subtle traps. Some of these checks are reasonably simple, such as verifying that path directories aren't writable by others; careful programmers have always used checks like these. Other checks, however, are best supported by the language itself, and it is these checks especially that contribute to making a set-id Perl program more secure than the corresponding C program.

You may not use data derived from outside your program to affect something else outside your program--at least, not by accident. All command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with the following exceptions:

PleegWat

No clue, that's just me assuming. But if you're saying 'any input processor'.

As mentioned before, I differentiate between input sanitization and output escaping. Whether either of them are needed depends on the input method and the data. (EG, a plain integer input may not need any sanitization or escaping at all)

tar

@PleegWat said:

I differentiate between input sanitization and output escaping.

Yeah, I think that is a useful distinction to make.