How much of a RWTF am I?



  • Long-time reader, first-time poster. Yes, I have read the entire 🥑 thread.

    I made a static website generator for a specific purpose. I know there are many out there, but I needed one to build practice problem sets for my classes (I teach physics and chemistry at the high school level). The code can be found on GitHub here.

    Now the question: How bad is the PHP? Will putting this on a server open me up to hackers, crackers, malicious kiddies, etc.? I plan to couple it with a cron job to delete all output files older than X. Is this exploitable? In other words--I know I'm TRWTF. The question is to what degree? I especially need help with the create_assignment.php file, since the client can change, and that shouldn't be able to break the server.

    Known issues:

    • The CSS is ugly.
    • The JavaScript and PHP are probably horribly inefficient. Suggestions there would be beautiful.
    • The (hand-rolled) templating structure is limited. I'd rather not have to learn a whole templating framework for a bitty site like this.

    Please, WTF'ers, be your usual selves and do what you do best--break things and find problems.


  • @Benjamin-Hall
    Could you explain a bit more about how this works? (Who is interacting with the site, and in what way?)

    static website generator

    Will putting this on a server open me up to hackers, crackers, malicious kiddies, etc?

    If it is a static website, then why are you afraid of hackers?

    php

    So it is not a static website?

    Also, you have some trash in the repo; it may be worth deleting it and adding it to .gitignore.
    [Screenshot of the stray files in the repo]



  • @Benjamin-Hall said in How much of a RWTF am I?:

    Will putting this on a server open me up to hackers, crackers, malicious kiddies, etc?

    At a glance, yes. You chdir into a directory that is completely determined by user input, then glob("*.*") everything in it into a zip file. That gives anyone the ability to read whatever folder your PHP code can read, which may well include server configuration or your actual source code. You really don't want that. A more robust (though admittedly more complex) solution would be to do things on the fly:

    • use tempnam to create a temporary file
    • open it with ZipArchive
    • use ZipArchive::addFromString to add the generated "files" without actually writing them to your filesystem; just pass the string returned by Template::output as the file content.
    • when the zip is complete, use readfile to immediately deliver it to the browser
    • delete the temporary file at the end of the request
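    If part of an output path really does have to come from the request, the usual defence is to resolve it against a fixed base directory and refuse anything that lands outside it. Here is a minimal Node.js sketch of that check (JavaScript rather than the repo's PHP; safeResolve and the /srv/out directory are hypothetical). The on-the-fly approach above avoids the problem entirely, so treat this as a fallback, not a replacement.

```javascript
const path = require('path');

// Hypothetical helper: map a user-supplied name to a path inside baseDir,
// throwing if it would resolve to baseDir itself or anywhere outside it.
function safeResolve(baseDir, userSegment) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userSegment);
  const rel = path.relative(base, target);
  const escapes =
    rel === '' ||                       // resolved to the base directory itself
    rel === '..' ||                     // exactly one level up
    rel.startsWith('..' + path.sep) ||  // climbed out via ../
    path.isAbsolute(rel);               // different root or drive (Windows)
  if (escapes) {
    throw new Error('Rejected path: ' + JSON.stringify(userSegment));
  }
  return target;
}

// Usage on a POSIX system:
// safeResolve('/srv/out', 'job123')      -> '/srv/out/job123'
// safeResolve('/srv/out', '../../etc')   -> throws
// safeResolve('/srv/out', '/etc/passwd') -> throws
```

    This is purely a string check and does not follow symlinks; if links can exist under the base directory, compare the fs.realpathSync results instead.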

    @Benjamin-Hall said in How much of a RWTF am I?:

    The (hand-rolled) templating structure

    Your templating solution and the inline HTML are both dangerous because they don't encode HTML entities in their output properly. Even if you are not worried about someone making questions/answers like This is <script>alert("not a drill")</script> and getting browser alerts, it doesn't work right with legitimate content like a<b but b<c. (This is one of the most common security issues with handcrafted templating systems.)



  • @Adynathos This is a generator for static sites. The sites themselves are static, but the generator takes input and builds a zip file containing the necessary files for the static web site. It's designed for other teachers (who may not know HTML at all) to be able to build mini sites that interface with our Learning Management System so that the students can practice. Basically, the sites built are a quiz of sorts. I know the sites themselves work--I've been creating them with non-web-based tools for a while now. It's the automatic generation of the sites that I'm not sure of.


  • @Benjamin-Hall said in How much of a RWTF am I?:

    It's designed for other teachers

    Then one way to improve security is to either host it on the internal network only or, if it's publicly available, protect it with a password.



  • @DCoder Ah. Thanks. I hadn't thought of that chdir problem. I couldn't get the readfile solution to work when I tried it earlier--I think I had something wrong in my requests (the client makes a POST request after gathering the JSON from the input).
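    One common reason a download like this appears not to work: when the zip comes back in response to an XHR/fetch call, the browser hands the bytes to your script instead of offering them as a download, even if the server sends Content-Disposition: attachment. The script has to save the Blob itself (alternatively, the client can submit an ordinary HTML form, which navigates and so does trigger the download). A hedged sketch of the fetch route follows; requestAssignmentZip, saveBlob and the file name are hypothetical, and this is a guess at the cause rather than a diagnosis of your code.

```javascript
// Hypothetical helper: POST the assignment JSON and receive the zip as a Blob.
// fetchImpl defaults to the browser's fetch and can be swapped out for testing.
async function requestAssignmentZip(endpoint, assignment, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(assignment),
  });
  if (!response.ok) {
    throw new Error('Server returned HTTP ' + response.status);
  }
  // A fetch response is never turned into a download automatically.
  return response.blob();
}

// Browser-only: offer the Blob to the user as a file download.
function saveBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Give the browser a moment to start the download before releasing the URL.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}

// Usage in the page (endpoint name from the thread, file name made up):
// requestAssignmentZip('create_assignment.php', assignment)
//   .then((zip) => saveBlob(zip, 'assignment.zip'))
//   .catch((err) => alert(err.message));
```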

    I'll have to consider how best to adjust for the templating security issue. I want at least basic markup to be available (especially for things like subscripts and superscripts). Suggestions on how to (easily) allow for those without being insecure would be nice.



  • @Adynathos said in How much of a RWTF am I?:

    @Benjamin-Hall said in How much of a RWTF am I?:

    It's designed for other teachers

    Then one way to improve security is to either host it on the internal network only or, if it's publicly available, protect it with a password.

    I don't have access to the internal servers. I'd like for it to be available publicly (on my hosted site) but will have to investigate how the hosting system deals with password-protected portions of the site.



  • That's... well, debatable, though the terms 'static' and 'dynamic' have always been a bit loosey-goosey. I assume you mean that the content is pre-generated and stored as complete, ready-to-serve HTML files rather than being processed by PHP at the time it is served to the browser. (I mean the content as opposed to the validation, the two different versions of Bootstrap, full and minified, and the various bits of chrome imported through npm, which appear to include a carousel effect and some tooltip effect handlers.)

    I don't want to be the pedant here, but, well, pedantry is needed in this case. We need to know what you are doing and what you are trying to accomplish, and we need a firehose's worth of context to really help.

    Now, here's the $1 million question: is it in fact the case that the templates are going to be processed separately from being served by the HTTP daemon, or not? If so, then the security issues all rest on you and the teachers who are using this tool. However, it isn't entirely clear whether that is really true.

    Mind you, even if it is, you should still be sanitizing the data being filtered through the templates, if only to ensure that you aren't passing something like 1 <x and giving the browser's parser a fit. At minimum, you need to make sure that less-than signs (left angle brackets) like the one above, greater-than signs (right angle brackets), ampersands, and anything else that has a corresponding HTML entity get escaped as those entities.

    This applies both to the data as entered by the user and to data coming from the database or data files you are storing it in, if any.

    Now, another point to be made is, if these are getting prepared ahead of time and stored as cleartext HTML files, then there is no real reason to use PHP aside from the fact that it's what you'd been using already. If all you are doing is generating the HTML code, then any language you feel comfortable with for doing string manipulation will do; if you have some other language you'd prefer (and I get the impression you do), then so long as the string library and file I/O are adequate, you can use that and be fine.

    (Technically this is true even for serve-time generation, but since most languages would require an old-fashioned CGI gateway or an add-on module for the HTTP server, PHP is just the easy way out most of the time. Oh, and some hosting services will demand that you upgrade to a more expensive hosting plan if you choose something like Python/Django, ASP.NET, or RoR, and won't even discuss allowing things like Luminus or Revel. I am looking at you, LunarPages, and don't think you are going to sneak out either, InMotion. So there's a lot of pressure to use the cheaper PHP solutions they already have set up, especially when they will also pressure you to pick some off-the-shelf CMS such as Turdpress or Magento, and have their support staff sing the praises of Softaculous and tell you that you should never, ever use something you can't get through them. Bitter? Me? Naaah.)

    Notice the use of the word 'cleartext' earlier? Yeah, that was deliberate. If you are generating the HTML ahead of time and then serving it as mostly-static content, you aren't going to keep the answers hidden. Anyone who can use 'View Page Source' can see everything on the page, and if that includes the correct answer (or a fetch from a page for the right answer that differs from the one the wrong answers use), then you aren't giving a test, you're giving the answers away. They wouldn't even need to compromise the server; it would already be compromised. Looking at the repo, I am not sure whether that is the case, but if it is, that's a serious problem.



  • @DCoder said in How much of a RWTF am I?:

    @Benjamin-Hall said in How much of a RWTF am I?:

    The (hand-rolled) templating structure

    Your templating solution and the inline HTML are both dangerous because they don't encode HTML entities in their output properly. Even if you are not worried about someone making questions/answers like This is <script>alert("not a drill")</script> and getting browser alerts, it doesn't work right with legitimate content like a<b but b<c. (This is one of the most common security issues with handcrafted templating systems.)

    Would it be enough to use something like htmlspecialchars() on the input and then convert the few whitelisted tags (really just <sup> and <sub>) back before inserting it into the template?
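    That approach can work, as long as the "convert back" step only matches the exact escaped forms of the bare tags, so that anything with attributes (<sup onmouseover=...>) or any other tag stays escaped. A JavaScript sketch of the idea; escapeHtml mirrors what htmlspecialchars($s, ENT_QUOTES) does with PHP's default HTML 4.01 flag (single quotes become &#039;), and the function names are hypothetical.

```javascript
// Escape the same five characters htmlspecialchars($s, ENT_QUOTES) escapes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // runs first so the & in later entities isn't re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Tags allowed through, and only in their bare, attribute-free form.
const ALLOWED_TAGS = ['sup', 'sub'];

function escapeWithWhitelist(text) {
  let out = escapeHtml(text);
  for (const tag of ALLOWED_TAGS) {
    // Undo the escaping only for the exact strings <tag> and </tag>.
    out = out
      .split(`&lt;${tag}&gt;`).join(`<${tag}>`)
      .split(`&lt;/${tag}&gt;`).join(`</${tag}>`);
  }
  return out;
}

// escapeWithWhitelist('H<sub>2</sub>O')        -> 'H<sub>2</sub>O'
// escapeWithWhitelist('a<b but b<c')           -> 'a&lt;b but b&lt;c'
// escapeWithWhitelist('<sup onclick="x()">hi') -> '&lt;sup onclick=&quot;x()&quot;&gt;hi'
```

    One gap worth closing: an unclosed <sup> would carry superscript into the rest of the page, so you might also reject input whose opening and closing tag counts don't match.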



  • @ScholRLEA said in How much of a RWTF am I?:

    Now, here's the $1 million question: is it in fact the case that the templates are going to be processed separately from being served by the HTTP daemon, or not? [...]

    A few responses: The accidental inclusion of multiple Bootstrap chunks (and the associated JavaScript, etc.) is due to me slapping that part together in a hurry.

    The actual output (the practice modules) is not designed to be secure. I have other ways of giving graded assessments. This is entirely for the students to practice. They can try as many times as they like--if they use "view page source" (which is difficult from the iPads they mostly use) they're only hurting themselves. The static sites don't use any Bootstrap or other dynamic content--everything is in the zip file that's uploaded and served from the LMS.

    The dynamic part is the generation of the modules themselves. I have a set of Python scripts to do this locally, which work fine. I've been doing it for most of a year now. I wanted to extend this as a web service for a couple of reasons:

    • possibly allow my fellow teachers access without having to install stuff on their machines
    • not have to write the JSON (which the python scripts consume) by hand.

    I have (for other purposes) a cheap web host that runs PHP but not Python (as far as I can tell), so that's why I picked PHP. This repo is a client and an API endpoint (if I'm using the terminology correctly). The PHP portion should simply take incoming, properly built JSON and spit back a zip file containing an assignment with the information from the JSON.

    Is this a total WTF way of doing things? Is there a better way?


  • @Benjamin-Hall said in How much of a RWTF am I?:

    I have read the entire 🥑 thread.

    You are very much TRWTF

    Filed Under: you are welcome
    Also Filed Under: I will actually read the rest of the posts now
    Also also Filed Under: yeah, yeah, no trolling in Coding help, yada yada



  • @Kuro: TBH, I was tempted to take that particular cheap shot myself, but thought better of it, as I was sure someone else would do it anyway. Thank you for helping WTDWTF live down to everyone's expectations.

    Getting back to the actual problem, I would check with the host's tech support and/or customer service and see what they say about using Python (or whatever solution you feel most comfortable with), but for something like this, I wouldn't fight too hard over it. As I said earlier, a number of hosting companies really resist any attempt to use anything they can't give you a turnkey fix for, especially one they already have installed.

    If the assessments are only for practice, then yes, that part of it isn't terribly serious, I would say. The issue of vetting the data is still important, for a number of reasons, but not too dreadful. To make it easier for both you and your users, you might want to give the users something like BBCode or (sigh) Markdown rather than HTML tags for the superscripts and subscripts: perhaps carets ^like this^ for superscripts, and something like backticks or vertical bars for subscripts.
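    A lightweight markup like that is easy to make safe if you escape first and only then turn the delimiters into tags, because the text between them has already been escaped. A JavaScript sketch, assuming ^...^ for superscripts and |...| for subscripts (the delimiter choice and the function names are illustrative, not from the repo):

```javascript
// Escape HTML special characters (same five as htmlspecialchars with ENT_QUOTES).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Convert ^sup^ and |sub| markup to HTML. Escaping happens first, so the
// only tags that can appear in the output are the ones added here.
function markupToHtml(text) {
  return escapeHtml(text)
    .replace(/\^([^^]+)\^/g, '<sup>$1</sup>')
    .replace(/\|([^|]+)\|/g, '<sub>$1</sub>');
}

// markupToHtml('x^2^ + H|2|O') -> 'x<sup>2</sup> + H<sub>2</sub>O'
// markupToHtml('a<b, 2 ^ 3')   -> 'a&lt;b, 2 ^ 3'  (a lone caret is left alone)
```

    One design caveat: vertical bars clash with absolute-value notation like |v|, which a physics class will use, so backticks may be the safer pick for subscripts.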

    The actual project almost sounds less like a web page per se, and more like a RESTful web service, though I may be reading too much into it. You might find it useful to separate it into a front end and a service in that way, but I am guessing it would be more trouble than it would be worth.

    I will say that your reasoning about not wanting to use an existing framework is flawed, though: a small project like this is exactly when you want to grab a shrink-wrap solution (if there is one); it is on larger projects that those tend to break down. Off-hand, I am not sure whether there is one you could use for your purposes, but if there is, I would check it out and see if it is adequate. Turnkey, shrink-wrap systems are generally meant to be 75-90% solutions (meaning they cover most of the use cases practically out of the box, but may not be very good for any that they don't), and for most smaller uses, that's more than enough.



  • @ScholRLEA said in How much of a RWTF am I?:

    @Kuro: TBH, I was tempted to take that particular cheap shot myself [...]

    The actual project almost sounds less like a web page per se, and more like a RESTful web service [...]

    I will say that your reasoning about not wanting to use an existing framework is flawed [...]

    As a meta point--I included the 🥑 part specifically to open up those cheap shots. Call me a masochist (oh right, I read that whole thread. Of course I'm a masochist).

    You're right that the project is more of a web service. I've tried to separate it already (client.html is the front end, while create_assignment.php is the service). Visiting create_assignment.php directly in a browser would only render an error page, as it requires JSON to be sent via POST in order to function and exits immediately if that data is malformed or absent.
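    That early exit is worth making as strict as possible, since anyone can bypass the client and POST whatever they like. A JavaScript sketch of that kind of up-front check; the field names title, questions, prompt and answer are hypothetical stand-ins, because the actual JSON schema isn't spelled out in this thread:

```javascript
// Hypothetical request check for the service: reject anything that is not a
// POST carrying a JSON object of the expected (assumed) shape.
function parseAssignmentRequest(method, rawBody) {
  if (method !== 'POST') {
    throw new Error('Only POST is accepted');
  }
  let data;
  try {
    data = JSON.parse(rawBody);
  } catch (err) {
    throw new Error('Body is not valid JSON');
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('Top-level JSON must be an object');
  }
  if (typeof data.title !== 'string' || data.title.trim() === '') {
    throw new Error('Missing title');
  }
  if (!Array.isArray(data.questions) || data.questions.length === 0) {
    throw new Error('questions must be a non-empty array');
  }
  data.questions.forEach((q, i) => {
    const ok = q !== null && typeof q === 'object' &&
      typeof q.prompt === 'string' && typeof q.answer === 'string';
    if (!ok) {
      throw new Error(`Question ${i} needs a string prompt and answer`);
    }
  });
  return data;
}
```

    Whatever passes this check still needs escaping at output time; validation and escaping do different jobs.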

    I'll admit that not using a framework partly comes down to wanting to learn things from the ground up. I'm self-taught in all my languages and want to add more HTML/CSS/JS/PHP to the arsenal. As I teach my students, I think it's important to understand the groundwork before using pre-built toolkits. If this were a serious production site that I were getting paid for, of course I'd use a framework.

    Thanks for your input.



  • @ScholRLEA said in How much of a RWTF am I?:

    or an add-on module for the HTTP server then PHP

    To be fair, that setup is falling out of favor and being replaced by FastCGI bridges, primarily for security reasons.



  • @Benjamin-Hall said in How much of a RWTF am I?:

    Would it be enough to use something like htmlspecialchars() on the input and then converting the few white-listed tags (<sup> and <sub> really) back before inserting it into the template?

    htmlspecialchars($input, ENT_QUOTES, 'utf-8') is a good approach. @ScholRLEA's suggestion of a different markup language is also reasonable, if a bit more complex.

    @ScholRLEA said in How much of a RWTF am I?:

    off-the-shelf CMS such as Turdpress or Magento

    Magento is not a CMS. It's an e-Commerce solution that includes an (impotent) CMS module.



  • storedAssignment = localStorage['assignment'];
    ...
    localStorage.assigment = JSON.stringify(document.assignment);

    Not exactly TRWTF, but a minor WTF is that work in progress is stored to 'assigment' and loaded from 'assignment'.
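    One way to make that class of typo impossible is to keep the key in a single constant and route both reads and writes through it. A sketch; saveDraft, loadDraft and STORAGE_KEY are hypothetical names, and the storage argument is window.localStorage in the browser (or any object with the same getItem/setItem methods):

```javascript
// The key lives in exactly one place, so save and load cannot drift apart.
const STORAGE_KEY = 'assignment';

function saveDraft(storage, assignment) {
  storage.setItem(STORAGE_KEY, JSON.stringify(assignment));
}

function loadDraft(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) {
    return null; // nothing saved yet
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    return null; // corrupted entry: treat it as no draft
  }
}

// In the page:
// saveDraft(window.localStorage, document.assignment);
// const storedAssignment = loadDraft(window.localStorage);
```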



  • @Benjamin-Hall said in How much of a RWTF am I?:

    Is this a total WTF way of doing things? Is there a better way?

    Yes, it is, and yes, there is


  • @Benjamin-Hall said in How much of a RWTF am I?:

    The actual output (the practice modules) is not designed to be secure. I have other ways of giving graded assessments. This is entirely for the students to practice.

    Speaking from rather a lot of experience (not all of which I want to remember, TBH): if your system is even slightly successful with anyone other than you, that will change, because it will end up being used by people who don't understand why it is important not to put the answers on a system controlled by students. It's just human nature. It's easy (when one isn't security-minded) to conflate doing a quiz to informally check progress with a formal test, since both of them are just answering questions on a computer, right? The only ways to avoid the problems are to either be completely unsuccessful at getting anyone else to use it, or to make things highly robust from the beginning.

    Is there a reason for not just using something like Moodle? Yes, it's big and complicated, but it's got far more effort applied to it than you're ever likely to do, and it handles all sorts of cases that you've never thought of (and also that I've never thought of too).



  • @dkf said in How much of a RWTF am I?:

    Is there a reason for not just using something like Moodle? [...]

    We (as a school) already have an LMS: PowerSchool Learning. It has flaws, but it's what the kids and the other faculty know. My efforts are to create modules that will plug into that LMS. More specifically, it can act as a very limited web server: you hand it a zip file containing a site (no server-side logic, only HTML/CSS/JS or Flash), and it serves it up and loads it into an iframe on the class page. The output of my project is designed to be plug-and-play for the teachers--simply hit the upload button and it takes care of the rest.

    @clatter said in How much of a RWTF am I?:

    storedAssignment = localStorage['assignment'];
    ...
    localStorage.assigment = JSON.stringify(document.assignment);

    Not exactly TRWTF, but a minor WTF is that work in progress is stored to 'assigment' and loaded from 'assignment'.

    Derp. I seem to be prone to 1-character errors. They're the bane of my existence. You might say that I'm a disciple of @accalia in that regard, but mostly only in code.

    @tufty said in How much of a RWTF am I?:

    @Benjamin-Hall said in How much of a RWTF am I?:

    Is this a total WTF way of doing things? Is there a better way?

    Yes, it is, and yes, there is

    I curse your very soul for even suggesting it 😜



  • If you're doing chemistry and physics, I'd also suggest looking into integrating https://www.mathjax.org/

    Pure HTML simply doesn't cut it when it comes to formulas 🙂

    Shouldn't be too big of a problem if this Haiku thing simply embeds your uploads as an iframe: plonk the reference to MathJax into the <head>, and then you can choose between ASCIIMath, MathML, LaTeX, or all of them at the same time.
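    For a generator, that mostly means adding one script tag to the page template. A JavaScript sketch, assuming MathJax 3 loaded from the jsDelivr CDN (the tex-mml-chtml component, which handles TeX and MathML input; AsciiMath needs a different setup, so check the MathJax docs). buildPage is a hypothetical helper, not the repo's Template class, and whether the LMS iframe lets the script load is untested:

```javascript
// Script tag for MathJax 3 from jsDelivr; pin an exact version in real use.
const MATHJAX_SCRIPT =
  '<script id="MathJax-script" async ' +
  'src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>';

// Hypothetical page builder. Both arguments must already be HTML-escaped
// (apart from any markup you deliberately allow).
function buildPage(titleHtml, bodyHtml) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<meta charset="utf-8">',
    `<title>${titleHtml}</title>`,
    MATHJAX_SCRIPT,
    '</head>',
    '<body>',
    bodyHtml,
    '</body>',
    '</html>',
  ].join('\n');
}

// With MathJax 3's defaults, inline math goes between \( and \):
// buildPage('Kinematics', '<p>Solve \\(v = v_0 + at\\) for \\(t\\).</p>');
```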



  • @Rhywden said in How much of a RWTF am I?:

    If you're doing Chemistry and Physics, I'd also suggest looking into integrating https://www.mathjax.org/ [...]

    Oooh, that's something I hadn't considered but now want. I'll have to explore how the LMS will handle it, but I'm pretty sure it won't care. Typing equations as HTML sucks and is ugly.

    Thanks!



  • @DCoder said in How much of a RWTF am I?:

    Magento is not a CMS. It's an e-Commerce solution that includes an (impotent) CMS module.

    That's a good point. However, IME Softaculous does present it as a whipped topping and a floor wax, which is technically true but far from usable if all you want is a CMS.

    OTOH, WordPus isn't really a CMS either; it is a blogging system which people have hacked around to serve as a CMS.

    What packages served up on Softaculous are Content Management Systems, or can be used as one without mangling them in weird ways? Uhm... dunno. Drupal and Joomla! would be the usual answers, but even they lack some of the features usually seen as critical for that (such as version control that actually works), and they are far from turnkey solutions.

    To be honest, the term "CMS" seems to be defined as "any program that the devs say is a CMS". Not really helping, there. And holy fuck, there sure are a lot of them, but that just makes me question the term even more, as the only thing I see that all of them seem to have in common is that they run on computers. At that point you could call valgrind a CMS and I doubt anyone would bat an eye.

    TL;DR fuck marketing terms, focus on what it actually does well.

