Regular Expression to look for [b]tags[/b]



  • I'm trying to write a regular expression that looks for special tags such as [ b ]bold[/ b ] (spaces added to prevent formatting) and replaces them with <b>bold</b>. Specifically, I'm looking for a way to match anything that isn't a specific character sequence, so that the match starts with [ b ], ends with [ /b ] and contains anything that isn't [ b ].

    So far I have...

    Regex reBoldTags = new Regex(@"\[b\].+\[\/b\]", RegexOptions.IgnoreCase);

    ...problem is the dot (.) matches absolutely everything, including other [ b ] tags, but I want it to match anything that isn't another [ b ] tag. How do I use negation on specific character sequences? Cheers!
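
    One way to express "anything that isn't another [b] tag" directly is a negative lookahead checked before every character. The sketch below is only an illustration under that assumption; the class and method names are made up for the example and are not from this thread.

```csharp
using System.Text.RegularExpressions;

public static class NegatedSequenceDemo
{
    // (?:(?!\[b\]).)*? consumes one character at a time, but only while the
    // text at the current position is not the start of another "[b]" tag.
    private static readonly Regex InnermostBold = new Regex(
        @"\[b\]((?:(?!\[b\]).)*?)\[/b\]",
        RegexOptions.IgnoreCase);

    // Returns the first match that contains no nested "[b]", or "" if none.
    public static string FirstMatch(string text)
    {
        Match m = InnermostBold.Match(text);
        return m.Success ? m.Value : string.Empty;
    }
}
```

    Given "[b]one [b]two[/b] three[/b]", this should skip the outer opening tag and match only the inner pair, since the lookahead refuses to step over a second [b].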



  • W00t!!! I think I've solved it!! It turns out I have to use lazy (non-greedy) matching to match as few characters as possible. So rather than match .+ I match .*? e.g.

    Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);

    Take the following character sequence: "this is in [ b ]bold[ /b ] and is in [ b ]bold[ /b ] also."

    The regex in the previous post would have matched "[ b ]bold[ /b ] and is in [ b ]bold[ /b ]", whereas lazy matching matches "[ b ]bold[ /b ]" and "[ b ]bold[ /b ]" separately. From there it's just a matter of stripping the tags off the ends and adding <b></b> tags instead. I'm trying to prevent malicious users from sticking [ b ] and [ i ] tags in willy-nilly and thus screwing up the formatting of the rest of the page instead of keeping it to their comment/blog post. Here's the code in full...

     

    public static string Bold(string text)
    {
        // Lazy .*? stops at the first closing tag instead of the last one.
        Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
        foreach (Match match in reBoldTags.Matches(text))
        {
            string oldValue = match.Value;
            string value = match.Value;
            // Strip the opening and closing tags, leaving only the inner text.
            value = value.Substring("[b]".Length);
            value = value.Substring(0, value.Length - "[/b]".Length);

            text = text.Replace(oldValue, "<b>" + value + "</b>");
        }

        return text;
    }
    


    Now all I need is someone to turn this into a WTF! :-) 

     

    * The spaces in [ b ], < b > etc. are only there because this form editor keeps eating the tags; strip any that remain if you use this code. How does one insert code snippets into this thing?
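
    The greedy-versus-lazy difference described in this post can be checked with a small helper that lists every match for a pattern. This is just a sketch for experimenting; the class and method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Collects the text of every match of `pattern` in `text`, ignoring case.
    public static List<string> MatchesOf(string pattern, string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

    Calling it on "this is in [b]bold[/b] and is in [b]bold[/b] also." with the greedy pattern \[b\].+\[/b\] should give a single match spanning both tags, while the lazy pattern \[b\].*?\[/b\] should give two separate matches.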



  • There's already a WTF there. If the language let the characters in a string be changed you could change [ ] to < > in-place and it would only be an O(n) algorithm, but since you have to make a new copy on each match it potentially becomes O(n^2).



  • Not sure I follow?



  • Every time you do the "text = text.Replace(...)" it makes a copy of the entire text. This is very slow if there are a lot of bolds; if someone sends a message consisting of 100000 [b][/b] in a row it will take 274 seconds to process - major vulnerability for DoS attacks.

    I'm not a .NET guy but here's my shot at making a faster version. It uses StringBuilder so that the string only gets copied twice, and it takes 0.13 seconds on the aforementioned string.

    // Needs: using System.Text; using System.Text.RegularExpressions;
    public static string Bold2(string text)
    {
        StringBuilder newText = new StringBuilder(text.Length);
        int lastEnd = 0;
        Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
        for (Match match = reBoldTags.Match(text); match.Success; match = match.NextMatch())
        {
            // Copy the untouched text between the previous match and this one.
            newText.Append(text.Substring(lastEnd, match.Index - lastEnd));
            string value = match.Value;
            // Keep only the inner text between the opening and closing tags.
            value = value.Substring("[b]".Length, value.Length - "[b][/b]".Length);
            newText.Append("<b>");
            newText.Append(value);
            newText.Append("</b>");
            lastEnd = match.Index + match.Length;
        }

        // Copy whatever follows the last match.
        newText.Append(text.Substring(lastEnd));
        return newText.ToString();
    }




  • @Goplat said:

    public static string Bold2(string text)
    {
        StringBuilder newText = new StringBuilder(text.Length);
        int lastEnd = 0;
        Regex reBoldTags = new Regex(@"\[b\].*?\[\/b\]", RegexOptions.IgnoreCase);
        for (Match match = reBoldTags.Match(text); match.Success; match = match.NextMatch())
        {
            newText.Append(text.Substring(lastEnd, match.Index - lastEnd));
            string value = match.Value;
            value = value.Substring("[b]".Length, value.Length - "[b][/b]".Length);
            newText.Append("<b>");
            newText.Append(value);
            newText.Append("</b>");
            lastEnd = match.Index + match.Length;
        }
                                                                                   
        return newText.ToString();
    }


    That's an awful lot of code. How about a variation on this:



    Regex.Replace(inputString, @"\[b\](.*?)\[/b\]", "<b>$1</b>");



    Use the case-insensitive option and make it global, and voila!



  • Mmmkay, real .NET C# version here:

    static string ReplaceBBCode(string input)
    {
        return Regex.Replace(input, "\\[b\\](.*?)\\[\\/b\\]", "<b>$1</b>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    }
    

    Tested and working.
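
    A note on the options: in .NET, RegexOptions.Multiline only changes what ^ and $ match, so it does not let the dot cross line breaks. If bold text is expected to span several lines, RegexOptions.Singleline is the option that makes . match "\n". The helper below is a sketch for comparing the two; its name and signature are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class MultiLineBoldDemo
{
    // Same replacement as above, with the options left to the caller
    // so that Multiline and Singleline can be compared side by side.
    public static string ReplaceBold(string input, RegexOptions options)
    {
        return Regex.Replace(input, @"\[b\](.*?)\[/b\]", "<b>$1</b>", options);
    }
}
```

    With the input "[b]line one\nline two[/b]", IgnoreCase | Multiline should leave the text unchanged, whereas IgnoreCase | Singleline should wrap both lines in <b>...</b>.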



  • [quote user="djork"]Mmmkay, real .NET C# version here:

    static string ReplaceBBCode(string input)
    {
    return Regex.Replace(input, "\\[b\\](.*?)\\[\\/b\\]", "<b>$1</b>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    }

    Tested and working.[/quote]

    I could kiss you!! Thank you very much! :-)

     

    BTW, BB Code stands for Bulletin Board Code, correct? I know it's used in bulletin boards and forums a lot, and it would give me a handy name for my class that does all this BBCode stuff.
     



  • @Sunday Ironfoot said:

    I could kiss you!! Thank you very much! :-)

    You're welcome.



  • <pooper what="party">

    Note that this won't work correctly with nested tags. That might not be a problem with [b] tags because people don't usually nest them (though it [ b ]will look [ b ]like this[ /b ] and that's not what's intended[ /b ]), but it becomes a serious problem once you start introducing tags like [ quote ].

    The only solution in that case is to write yourself a stack-based parser.

    </pooper>
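
    For what it's worth, a stack-based pass does not have to be large. The sketch below is one possible shape, not a full BBCode parser: the tag set (b, i, u), the class name and the policy for stray or unclosed tags are all assumptions made for the example, and it does no HTML encoding of the surrounding text.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class StackBbCodeSketch
{
    // Matches one opening or closing tag from the assumed tag set b, i, u.
    private static readonly Regex Tag = new Regex(
        @"\[(/?)(b|i|u)\]", RegexOptions.IgnoreCase);

    public static string ToHtml(string text)
    {
        var output = new StringBuilder(text.Length);
        var open = new Stack<string>();   // names of the tags currently open
        int lastEnd = 0;

        for (Match m = Tag.Match(text); m.Success; m = m.NextMatch())
        {
            // Copy the plain text between the previous tag and this one.
            output.Append(text, lastEnd, m.Index - lastEnd);
            lastEnd = m.Index + m.Length;

            bool closing = m.Groups[1].Length > 0;
            string name = m.Groups[2].Value.ToLowerInvariant();

            if (!closing)
            {
                open.Push(name);
                output.Append('<').Append(name).Append('>');
            }
            else if (open.Count > 0 && open.Peek() == name)
            {
                open.Pop();
                output.Append("</").Append(name).Append('>');
            }
            else
            {
                // Stray or mis-nested closing tag: keep it as literal text.
                output.Append(m.Value);
            }
        }

        // Copy whatever follows the last tag.
        output.Append(text, lastEnd, text.Length - lastEnd);

        // Close anything left open so it cannot leak into the rest of the page.
        while (open.Count > 0)
        {
            output.Append("</").Append(open.Pop()).Append('>');
        }
        return output.ToString();
    }
}
```

    Because every opening tag is pushed and only a matching closing tag pops it, nested input keeps its nesting, and an unclosed [b] gets closed at the end of the post instead of bolding everything after it.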



  • [quote user="RiX0R"]

    <pooper what="party">

    Note that this won't work correctly with nested tags. That might not be a problem with [b] tags because people don't usually nest them (though it [ b ]will look [ b ]like this[ /b ] and that's not what's intended[ /b ]), but it becomes a serious problem once you start introducing tags like [ quote ].

    The only solution in that case is to write yourself a stack-based parser.

    </pooper>

    [/quote]

     

    You could also pull a crappy "close enough" version by comparing the number of matches of opening and closing tags, replacing only as many tags as the lower count. I did that once and it turned out alright, though it's probably terribly inefficient.
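
    Roughly, the counting idea could look like the sketch below. It is a reconstruction under my own assumptions (only [b] tags, invented names), not the code I actually used back then.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCountSketch
{
    private static readonly Regex OpenTag = new Regex(@"\[b\]", RegexOptions.IgnoreCase);
    private static readonly Regex CloseTag = new Regex(@"\[/b\]", RegexOptions.IgnoreCase);

    public static string ReplaceBalanced(string text)
    {
        // Only convert as many tags as have a partner of the other kind.
        int pairs = Math.Min(OpenTag.Matches(text).Count, CloseTag.Matches(text).Count);

        // The three-argument Replace overload stops after `pairs` replacements.
        text = OpenTag.Replace(text, "<b>", pairs);
        text = CloseTag.Replace(text, "</b>", pairs);
        return text;
    }
}
```

    It only balances the counts, so a closing tag that appears before its opening tag still gets converted; that is the "close enough" part.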



  • Here is a simplified version of the one I use, its usefulness becomes more apparent when dealing with tags such as [ quote ].


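    <?php

    function processBBCodeInline($in) {
        // When invoked as the callback, $in is the match array:
        // $in[1] is the tag name, $in[2] the text between the tags.
        if (is_array($in)) {
            $in = '<' . $in[1] . '>' . $in[2]
                . '</' . $in[1] . '>';
        }
        // Run the replacement again so tags nested inside the match are converted too.
        return preg_replace_callback(
            '@\[(b|i|u|s)\]((?:[^[]|\[(?!/?\1\])|(?R))+)\[/\1\]@Si',
            'processBBCodeInline',
            $in
        );
    }

    echo processBBCodeInline('[b]Test![/b]');

    ?>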
     

