Performing regular expressions on context-sensitive languages



  • Continuing the discussion from Completely empty pages on SeaMonkey:

    @hhaamu said:

    The 'img-reorder' filter reorders the attributes into a 'canonical' order so the other regexes (that are searching for some more serious things) can match better. And the filter is matching JS code. Lovely.

    @hhaamu said:

    Turned the filter off. Everything seems to be working now.

    I previously tested without the proxy, but it seems I only pressed Ctrl-R instead of Ctrl-Shift-R to reload. The page was still empty, so I discounted the proxy as the cause. So basically Privoxy was corrupting the JS file, the browser still had the corrupted version cached, and this happened.

    The offending regexps in question are:

    s|<img\s+?([^>]*)\ssrc\s*=\s*(['"])([^>\\\2]+)\2|<img src=$2$3$2 $1|siUg
    s|<img\s+?([^>]*)\ssrc\s*=\s*([^'">\\\s]+)|<img src=$2 $1|sig
    s|(<img[^>]+height)\s*=\s*|$1=|sig
    
    s|<img (src=(?:(['"])[^>\\\\2]*\2\|[^'">\\\s]+?))([^>]*)\s+width\s*=\s*((["']?)\d+?\5)(?=[\s>])|<img $1 width=$4$3|siUg
    

    It was somewhat of a misconfiguration on my end, but you can probably defend against this by not having an <img string anywhere in the JS code. Use the DOM to insert them or something. If you want. Not sure it's worth it. (Really, this is the first time Privoxy has turned out to be the culprit for me, in six+ years of use.)

    This deserves its own topic, really.


    Filed under: Cloudflare's JS optimizations break something, too



  • Also, my guess as to what happened:

    The replacement found a Handlebars template -- which is supposed to be HTML code -- and screwed up one of the {{bind-attr}} calls, turning the template into a syntax error, which killed the page.
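
    For illustration only, here is a rough Python sketch of how that could play out. It ports the second of the quoted expressions (the unquoted-src one) to Python's re module, which is a stand-in for the PCRE engine Privoxy actually uses. The template line, the avatarUrl property and the reorder_img_attrs helper are all made up for the example, not taken from the real page source.

```python
import re

# Python port of the second img-reorder expression quoted earlier in the thread.
# Perl-style flags: s -> re.DOTALL, i -> re.IGNORECASE, g -> re.sub replaces every match.
IMG_REORDER_UNQUOTED = re.compile(
    r'''<img\s+?([^>]*)\ssrc\s*=\s*([^'">\\\s]+)''',
    re.IGNORECASE | re.DOTALL,
)


def reorder_img_attrs(text: str) -> str:
    """Move an unquoted src attribute to the front of each <img tag (hypothetical helper)."""
    return IMG_REORDER_UNQUOTED.sub(r"<img src=\2 \1", text)


# Plain HTML: the attributes are reordered as intended.
print(reorder_img_attrs("<img alt=x src=a.png>"))
# -> <img src=a.png alt=x>

# Hypothetical Handlebars template line: the filter cannot tell it apart from HTML.
template = '<img {{bind-attr src=avatarUrl}} class="avatar">'
mangled = reorder_img_attrs(template)
print(mangled)
# -> <img src=avatarUrl}} {{bind-attr class="avatar">
```

    The closing }} travels with the src value while the opening {{bind-attr stays behind, so the helper call is no longer balanced. That is consistent with the "syntax error" guess, though whether this exact expression was the one that fired on the real file is an assumption.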

