Yet another DiscoHTBBCoMLParser bug
-
*easily not damage anything, not easily type
Continuing the discussion from Why is Jeff everywhere:
@loopback0 said:
@mott555 said:
I can type 120 WPM.
Do you type with a hammer?!
Since I mostly use mechanical keyboards I could easily* type with a hammer without damaging anything.
* easily not damage anything, not easily type
* easily not damage anything, not easily type
Check raw. See where "small" tags are, notice what text is actually small.
-
Nice one!
-
Erm, why are there two quotes? I didn't do that...not intentionally anyway.
-
-
I wonder if any other nested tags are broken...
-
I found this documentary of the DiscoDevs writing the parser:
-
_*) abcdefghijklmnopqrstuvwxyz1234567890
**View raw**
-
_*what
**_*\*<small>what*</small>**
-
The MD5 fuckery topic is
-
You mean
-
The MD5 fuckery knows no bounds.
-
Why not write a program that finds all these parser bugs for us? Oh wait, nevermind, I guess unit tests are
-
Once you remember they are using RegEx to parse, this is easy.
First let's take a look at what you typed. Since the first asterisk is at the start of a word block, the parser expects it to be the start of an italics sequence, the second gets ignored, and the third, being at the end of a word block, is used as the close of the italics sequence. Because the <small> tag is now inside the generated <em> tag, the parser "helpfully" closes it before closing the <em>, which is what generates the results you see.
*<small> *easily* not damage anything, not easily type </small>
^        ^      ^
Start    |      End
italics  |      italics
         Ignore
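To make the pairing concrete, here is a minimal sketch of a regex-only italics rule. The pattern is a toy written for illustration, not the pattern Discourse actually uses, and `naive_italics` is a hypothetical name. It only assumes the rule described above: an opening asterisk must sit at the start of a word block, a closing one at the end. The later step where the parser force-closes the <small> tag is not modelled.

```python
import re

# Toy rule (an assumption, NOT Discourse's real pattern): an opening "*" must
# be followed by a non-space character, a closing "*" must be preceded by one.
ITALICS = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")


def naive_italics(text: str) -> str:
    """Wrap matching asterisk pairs in <em> tags, knowing nothing about HTML."""
    return ITALICS.sub(r"<em>\1</em>", text)


source = "*<small> *easily* not damage anything, not easily type </small>"
rendered = naive_italics(source)
print(rendered)
```

The first asterisk opens the italics, the second is skipped because a space precedes it, and the third closes them. The opening <small> ends up inside the <em> while its closing tag is left outside, so the markup is no longer properly nested.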
Now, let's try the first solution of putting a space after the first asterisk. This method produces the results you want, but converts the first asterisk into a bullet:
* *easily* not damage anything, not easily type
Given sufficient knowledge of markup, this shouldn't be difficult to understand. So what is the solution? Why, escape the first asterisk, of course!
\*<small> *easily* not damage anything, not easily type </small>
\* *easily* not damage anything, not easily type
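A rough sketch of why the escape helps, again using a toy regex rather than anything taken from Discourse. The pattern and the `render` helper are assumptions made for illustration: a backslash in front of an asterisk stops it from opening or closing italics, and the backslash is dropped at the end.

```python
import re

# Toy rule (an assumption, not Discourse's real pattern): same word-boundary
# checks as a naive italics regex, but a "*" preceded by a backslash never
# opens or closes an italics run.
ITALICS = re.compile(r"(?<!\\)\*(?=\S)(.+?)(?<=\S)(?<!\\)\*")


def render(text: str) -> str:
    """Apply the toy italics rule, then turn each escaped asterisk into a literal one."""
    text = ITALICS.sub(r"<em>\1</em>", text)
    return text.replace("\\*", "*")


with_small = render(r"\*<small> *easily* not damage anything, not easily type </small>")
without_small = render(r"\* *easily* not damage anything, not easily type")
print(with_small)
print(without_small)
```

With the first asterisk out of the running, the only pair left is the one around "easily", and the <small> tag stays properly nested.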
Honestly, this one should probably be classified as user error. Trying to add a fix to the parser would add layers of complications and introduce lots of edge cases and bugs.
-
-
Can you do mine next?
-
No, that's a legit bug.
-
That MD5 thing is the gift that keeps on giving.
-
Which is why we want you to dissect it!
-
Trying to add a fix to the parser would add additional layers of complications and introduce <s>lots of</s> even more edge cases and bugs.
FTFD
-
Which is why we want you to dissect it!
All right, let's see what we can find.
So @hungrier started with:
**_*\*<small>what*</small>**
to get: _*what
So what happens if we drop the HTML?
**_*\*what***
to get: _*what
Well, that's no fun. So keep the HTML. Let's try dropping the underscore instead:
***\*<small>what*</small>**
to get: *what
That's boring, too. Ok, underscore stays. Hmmm. Remove the bold level asterisks?
_*\*<small>what*</small>
to get: _*what
That fracks up the fun too … Put the bold back in. Time to try removing the italics:
**_\*<small>what</small>**
to get: _*what
Well belgium. There's only two things left to remove, the escaped asterisk and the "what". I can't imagine that the fun would work without either of those, but just to be thorough, let's remove the "what".
**_*\*<small>*</small>**
to get: _*
Well, now. That provides some interesting insight (I should really have picked up on this earlier). Note that more of the hash is visible in the version without the "what" than the one with the "what". Let's try adding "what" back in, one letter at a time:
**_*\*<small>w*</small>**
to get: _*w
**_*\*<small>wh*</small>**
to get: _*wh
**_*\*<small>wha*</small>**
to get: _*wha
**_*\*<small>what*</small>**
to get: _*what
**_*\*<small>what *</small>**
to get: *_**what
**_*\*<small>what t*</small>**
to get: _*what t
**_*\*<small>what th*</small>**
to get: _*what th
**_*\*<small>what the*</small>**
to get: _*what the
**_*\*<small>what the *</small>**
to get: *_**what the
**_*\*<small>what the b*</small>**
to get: _*what the b
**_*\*<small>what the be*</small>**
to get: _*what the be
**_*\*<small>what the bel*</small>**
to get: _*what the bel
**_*\*<small>what the belg*</small>**
to get: _*what the belg
**_*\*<small>what the belgi*</small>**
to get: _*what the belgi
**_*\*<small>what the belgiu*</small>**
to get: _*what the belgiu
**_*\*<small>what the belgium*</small>**
to get: _*what the belgium
**_*\*<small>what the belgium *</small>**
to get: *_**what the belgium
**_*\*<small>what the belgium n*</small>**
to get: _*what the belgium n
**_*\*<small>what the belgium no*</small>**
to get: _*what the belgium no
**_*\*<small>what the belgium now*</small>**
to get: _*what the belgium now
**_*\*<small>what the belgium now?*</small>**
to get: _*what the belgium now?
Interestingly, there are additional complications noted in this progression:
- Sequences ending in spaces completely hide the hash, cause an extra asterisk to appear in the baked text, and eliminate the duplicate text (see #5, #9, and #17).
- Sequences containing censored words completely hide the hash, likely because they are using the HTML escape sequence for the censoring blocks (■). This can result in the doubled sequence getting overwritten in addition to the hash, and the <small> tag being escaped early (see #16 and #18 - #21).
- Only the first item in the list above is actually numbered.
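For anyone who wants to repeat the one-letter-at-a-time experiment, the test inputs can be generated rather than typed. This is only a sketch: `prefix_cases` is a hypothetical helper, and pasting the output into a post and reading the baked result is still a manual step.

```python
def prefix_cases(payload: str, template: str = r"**_*\*<small>{}*</small>**") -> list[str]:
    """Build one test input per prefix of payload, shortest first."""
    return [template.format(payload[:n]) for n in range(1, len(payload) + 1)]


payload = "what the belgium now?"
cases = prefix_cases(payload)

# 1-based numbers of the cases whose text ends in a space.
space_cases = [n for n, ch in enumerate(payload, start=1) if ch == " "]

for number, case in enumerate(cases, start=1):
    print(f"#{number}: {case}")
```

Numbering the cases this way lines up with the references above: the inputs ending in a space come out as #5, #9, and #17.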
-
That's...amazing. Where did the ; come from?
-
Looks like a partially-eaten &9632;.
-
That's what I was wondering...would make sense, I suppose.
-
Looks like a partially-eaten &9632;.
You missed a # in there. But yes, you would appear to be correct.
-
I wish I could like this post more than once.
-
Discourse wanted me to think I could
-
It knows you've been breaking the DiscoHTBBCoMLParser.
-
-
I thought this was well known by now. Dickcorpse uses RegEx to parse DiscoHTBBCoML.
-
Once you remember they are using RegEx to parse, this is easy.
Exactly. Anything you type is liable to produce something that is almost, but not entirely, unlike what you wanted to produce.
Honestly, this one should probably be classified as <s>user error</s> shit design. Trying to add a fix to the parser rather than doing the job properly would add layers of complications and introduce lots of edge cases and bugs.
"Trying to add a fix to the parser" is a large part of why the parser is already chock-full of edge cases and bugs.
-
Not sure if insane ideas thread or stupid things that people have actually done...
-
Don't worry, they "escape" special characters (* and _, for example) with MD5!
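For the curious, here is a guess at what "escaping with MD5" could look like. Everything in it is an assumption based on this thread, not on the actual parser source: the helper names `token_for`, `protect`, and `restore` are made up, and the real substitution may work differently. The idea is to swap each escaped character for its MD5 hex digest before formatting and swap it back afterwards.

```python
import hashlib
import re


def token_for(ch: str) -> str:
    """Placeholder for an escaped character: its MD5 hex digest (assumed scheme)."""
    return hashlib.md5(ch.encode("utf-8")).hexdigest()


def protect(text: str) -> tuple[str, dict[str, str]]:
    """Replace backslash-escaped * and _ with placeholders; return text and lookup table."""
    table: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        token = token_for(match.group(1))
        table[token] = match.group(1)
        return token

    return re.sub(r"\\([*_])", swap, text), table


def restore(text: str, table: dict[str, str]) -> str:
    """Put the original characters back wherever a whole placeholder survived."""
    for token, ch in table.items():
        text = text.replace(token, ch)
    return text


protected, table = protect(r"\*what")
print(restore(protected, table))

# If a later pass drops markup into the middle of a placeholder, the lookup
# no longer matches and pieces of the hash stay in the output.
damaged = protected[:10] + "</small>" + protected[10:]
print(restore(damaged, table))
```

The second print shows the failure mode: once anything splits the placeholder, the restore step cannot find it and the hash fragments leak into the rendered text.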