Regular Expression to look for [b]tags[/b]
-
I'm trying to write a regular expression that looks for special tags such as [ b ]bold[ /b ] (spaces added to prevent formatting) and replaces them with <b>bold</b>. Specifically, I'm looking for a way to match anything that isn't a specific character sequence, so that the match starts with [ b ], ends with [ /b ] and contains anything that isn't [ b ].
So far I have...
Regex reBoldTags = new Regex(@"\[b\].+\[\/b\]", RegexOptions.IgnoreCase);
...the problem is that the dot (.) matches absolutely everything, including other [ b ] tags, but I want it to match anything that isn't another [ b ] tag. How do I use negation on specific character sequences? Cheers!
-
W00t!!! I think I've solved it!! It turns out I have to use lazy (non-greedy) matching, which matches as few characters as possible. So rather than .+ I use .*?, e.g.
Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
Take the following character sequence: "this is in [ b ]bold[ /b ] and is in [ b ]bold[ /b ] also."
The regex in the previous post would have matched "[ b ]bold[ /b ] and is in [ b ]bold[ /b ]" as a single match, whereas the lazy version matches "[ b ]bold[ /b ]" and "[ b ]bold[ /b ]" separately. From there it's just a matter of stripping the tags off the ends and adding <b></b> tags instead. I'm trying to prevent malicious users sticking [ b ] and [ i ] tags in willy-nilly and thus screwing up the formatting of the rest of the page instead of keeping it to their comment/blog post. Here's the code in full...
public static string Bold(string text)
{
    Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
    foreach (Match match in reBoldTags.Matches(text))
    {
        string oldValue = match.Value;
        string value = match.Value;
        // Strip the [b] and [/b] tags off the ends...
        value = value.Substring("[b]".Length);
        value = value.Substring(0, value.Length - "[/b]".Length);
        // ...and wrap what is left in <b></b> instead.
        text = text.Replace(oldValue, "<b>" + value + "</b>");
    }
    return text;
}
Now all I need is someone to turn this into a WTF! :-) How does one insert code snippets into this thing?
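For anyone who wants to see the greedy/lazy difference for themselves, here is a minimal sketch. The class name GreedyVsLazy and the helper FindAll are made up for illustration; only Regex.Matches and RegexOptions.IgnoreCase come from .NET itself.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns every substring matched by the pattern,
    // ignoring case, in the order the matches occur.
    public static List<string> FindAll(string pattern, string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

With the sample sentence from this post (spaces removed), FindAll(@"\[b\].+\[/b\]", text) gives back one match that runs from the first [b] to the last [/b], while FindAll(@"\[b\].*?\[/b\]", text) gives back two separate "[b]bold[/b]" matches.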
-
There's already a WTF there. If the language let the characters in a string be changed you could change [ ] to < > in-place and it would only be an O(n) algorithm, but since you have to make a new copy on each match it potentially becomes O(n^2).
-
Not sure I follow?
-
Every time you do the "text = text.Replace(...)" it makes a copy of the entire text. This is very slow if there are a lot of bolds; if someone sends a message consisting of 100000 [b][/b] pairs in a row, it will take 274 seconds to process. That's a major vulnerability for DoS attacks.
I'm not a .NET guy but here's my shot at making a faster version. It uses StringBuilder so that the string only gets copied twice, and it takes 0.13 seconds on the aforementioned string.
// Needs: using System.Text; using System.Text.RegularExpressions;
public static string Bold2(string text)
{
    StringBuilder newText = new StringBuilder(text.Length);
    int lastEnd = 0;
    Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
    for (Match match = reBoldTags.Match(text); match.Success; match = match.NextMatch())
    {
        // Copy the untouched text between the previous match and this one.
        newText.Append(text.Substring(lastEnd, match.Index - lastEnd));
        string value = match.Value;
        // Keep only what sits between the [b] and [/b] tags.
        value = value.Substring("[b]".Length, value.Length - "[b][/b]".Length);
        newText.Append("<b>");
        newText.Append(value);
        newText.Append("</b>");
        lastEnd = match.Index + match.Length;
    }
    // Copy whatever follows the last match; without this the tail is lost.
    newText.Append(text.Substring(lastEnd));
    return newText.ToString();
}
-
That's an awful lot of code. How about a variation on this:
Regex.Replace(inputString, @"\[b\](.*?)\[/b\]", "<b>$1</b>");
Use the case-insensitive option and make it global, and voilà!
-
Mmmkay, real .NET C# version here:
static string ReplaceBBCode(string input)
{
    return Regex.Replace(input, "\\[b\\](.*?)\\[\\/b\\]", "<b>$1</b>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
}
Tested and working.
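One caveat worth checking against your own input: in .NET, RegexOptions.Multiline only changes what ^ and $ match, and it is RegexOptions.Singleline that lets the dot match a newline. So if bold text can span a line break, something along these lines may be closer to what you want. This is only a sketch; the class and method names are invented for the comparison.

```csharp
using System.Text.RegularExpressions;

public static class BoldAcrossLines
{
    // Same pattern as in the post, written as a verbatim string.
    private const string Pattern = @"\[b\](.*?)\[/b\]";

    // Multiline only affects ^ and $, so '.' still stops at '\n'
    // and a tag pair split across two lines is left untouched.
    public static string WithMultiline(string input)
    {
        return Regex.Replace(input, Pattern, "<b>$1</b>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
    }

    // Singleline makes '.' match '\n' as well, so the pair can span lines.
    public static string WithSingleline(string input)
    {
        return Regex.Replace(input, Pattern, "<b>$1</b>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

For input that never contains a newline between [b] and [/b], the two behave the same.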
-
I could kiss you!! Thank you very much! :-)
BTW, BB Code stands for Bulletin Board Code, correct? I know it's used in bulletin boards and forums a lot, and it would give me a handy name for my class that does all this BBCode stuff.
-
You're welcome.
-
<pooper what="party">
Note that this won't work correctly with nested tags. That might not be a problem with [b] tags because people don't usually nest them (though it [ b ]will look [ b ]like this[ /b ] and that's not what's intended[ /b ]), but it becomes a serious problem once you start introducing tags like [ quote ].
The only solution in that case is to write yourself a stack-based parser.
</pooper>
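A rough sketch of what such a stack-based pass could look like, assuming a whitelist of just [b], [i] and [u], and assuming the text has already been HTML-encoded elsewhere. The class StackBBCode and its ToHtml method are hypothetical names, not anything from the posts above.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class StackBBCode
{
    // Assumed whitelist for this sketch: [b], [i] and [u] only.
    private static readonly Regex TagPattern =
        new Regex(@"\[(/?)(b|i|u)\]", RegexOptions.IgnoreCase);

    public static string ToHtml(string text)
    {
        var output = new StringBuilder(text.Length);
        var open = new Stack<string>();   // tags opened but not yet closed
        int lastEnd = 0;

        foreach (Match m in TagPattern.Matches(text))
        {
            // Copy the plain text between the previous tag and this one.
            output.Append(text, lastEnd, m.Index - lastEnd);
            lastEnd = m.Index + m.Length;

            bool isClosing = m.Groups[1].Value == "/";
            string name = m.Groups[2].Value.ToLowerInvariant();

            if (!isClosing)
            {
                open.Push(name);
                output.Append('<').Append(name).Append('>');
            }
            else if (open.Count > 0 && open.Peek() == name)
            {
                open.Pop();
                output.Append("</").Append(name).Append('>');
            }
            else
            {
                // Stray or mis-nested closing tag: leave it as literal text.
                output.Append(m.Value);
            }
        }

        // Copy whatever follows the last tag.
        output.Append(text, lastEnd, text.Length - lastEnd);

        // Close anything still open so it cannot leak into the rest of the page.
        while (open.Count > 0)
        {
            output.Append("</").Append(open.Pop()).Append('>');
        }
        return output.ToString();
    }
}
```

The design choice here is that every opening tag is emitted immediately and any tag left open is closed at the end of the post, so a forgotten [/b] stays confined to the user's own comment. A real parser for [quote] and friends would need more than this, for example attributes and per-tag nesting rules.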
-
You could also pull a crappy "close enough" version by comparing the number of matches of opening and closing tags, replacing only as many tags as the lower count. I did that once and it turned out alright, though it's probably terribly inefficient.
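Roughly what I mean, as a sketch for [b] only. The names CountBalancedBold and Apply are just for illustration; the three-argument Regex.Replace(input, replacement, count) overload is the real .NET one that caps the number of replacements.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CountBalancedBold
{
    private static readonly Regex Open = new Regex(@"\[b\]", RegexOptions.IgnoreCase);
    private static readonly Regex Close = new Regex(@"\[/b\]", RegexOptions.IgnoreCase);

    public static string Apply(string text)
    {
        // Only as many pairs as the scarcer of the two tags allows.
        int pairs = Math.Min(Open.Matches(text).Count, Close.Matches(text).Count);
        if (pairs == 0) return text;

        // Replace at most 'pairs' opening tags, then at most 'pairs' closing
        // tags, working from the left; leftover tags stay as literal text.
        string result = Open.Replace(text, "<b>", pairs);
        return Close.Replace(result, "</b>", pairs);
    }
}
```

It only compares counts, not order, so input like "[/b]text[b]" still comes out with the closing tag before the opening one. That is the "close enough" part.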
-
Here is a simplified version of the one I use, its usefulness becomes more apparent when dealing with tags such as [ quote ].
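<?php
function processBBCodeInline($in) {
    // When called back by preg_replace_callback, $in is the match array:
    // $in[1] is the tag name and $in[2] is everything inside the tag.
    if (is_array($in)) {
        $in = '<' . $in[1] . '>' . $in[2] . '</' . $in[1] . '>';
    }
    // Run the pattern again so tags nested inside the contents are converted too.
    return preg_replace_callback(
        '@\[(b|i|u|s)\]((?:[^[]|\[(?!/?\1\])|(?R))+)\[/\1\]@Si',
        'processBBCodeInline',
        $in
    );
}
echo processBBCodeInline('[b]Test![/b]');
?>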