1 (edited by Miles 2005-11-30 22:56)

Topic: BBCode bug with improper nesting: [quote][url][/quote][/url]

The BBCode parser does all sorts of weird things with improperly nested quote and url tags (especially when the automagical linkshortenermajigger comes into effect).  I know this isn't really that bad of a bug, because I'm passing it bad BBCode to begin with, but could it return some sort of BBCode parse error instead of vomiting strange markup? smile

Credit for the find goes to ScifilterX at MacAddict Forums.  Sorry if this has already been posted about, I didn't see anything obvious in the first few pages of this forum.

The code:

[quote][url]http://www.example.com/foo/bar/[/quote]
[/url]Here is some more text.

The result:

<blockquote><div class="incqbox"><p><a href="http://www.example.com/foo/bar/</p></div></blockquote><p>">http://www.example.com/foo/bar/</p></di … kquote><p></a><br /><br />Here is some more text.

See it in action:
Removed by me, Rickard, because it was screwing up the format of the page.

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

I think this has been reported before...

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

It was (or something similar) and it was supposed to be fixed hmm

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

well one thing you can do is input the code correctly. like :

[quote][url]http://www.example.com/foo/bar/[/url][/quote]
Here is some more text.

maybe there should be a check to see if it nestled correctly and if it isnt give an error

5 (edited by Miles 2005-12-01 05:09)

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

Gizzmo wrote:

well one thing you can do is input the code correctly. like :

[quote][url]http://www.example.com/foo/bar/[/url][/quote]
Here is some more text.

maybe there should be a check to see if it nestled correctly and if it isnt give an error

Yeah, I realize that.

I wrote:

I know this isn't really that bad of a bug, because I'm passing it bad BBCode to begin with, but could it return some sort of BBCode parse error instead of vomiting strange markup?

It isn't that bad most of the time when you misnest the other tags.  It's only really annoying when the link shortner comes into play and truncates some of the HTML, sometimes badly screwing up the rest of the page.

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

It was reported earlier and it was fixed, but I believe it was the reintroduced again due to some other bug. Regular expressions are tricky smile I will look into it when I'm feeling better (flu).

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

Probably the same bug: entering

[url=http://punbb.org/forums/][quote]test[/quote]
[/url]

outputs:

<a href="http://punbb.org/forums/"></p><blockquote><div class="incqbox"><p>test</p></div></blockquote><p></a></p>

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

not so same bug.

the output HTML is clean, BUT it creates block element in the inline element, which is bad...

but how to check this sort of code? maybe the SAX integration?

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

I would appreciate any help on this matter. I really hate digging around in the bbcode regexes. Change one character and ten other things break.

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

Would it be easiest to add more checks when the post is made rather than trying to make the parser sort it, i'm thinking adding a check that a quote tag cannot be within any tag except code and code tag cannot be in any except quote, and any tag opened within quote or code must be closed within it. Infact, would it be an idea to check that the last open tag is always closed before its parent tag that would stop things like

[b][/i]Writing[/b][/i]

being allowed?

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

Ok, here we go i wrote a new function to replace check_tag_order and do aload of other stuff

function preparse_tags($text, &$error)
{
    // Start off by making some arrays of bbcode tags and what we need to do with each one
    
    // List of all the tags
    $tags = array('quote','code','b','i','u','color','url','email','img');
    // List of tags that we need to check are open (You could not put b,i,u in here then illegal nesting like [b][i][/b][/i] would be allowed)
    $tags_opened = $tags;
    // and tags we need to check are closed (the same as above, added it just in case)
    $tags_closed = $tags;
    // Tags we can nest and the depth they can be nested to (only quotes )
    $tags_nested = array('quote');
    $tags_nested_depth = array('quote' => 3);
    // Tags to ignore the contents of completely (just code)
    $tags_ignore = array('code');
    // Block tags, block tags can only go within another block tag, they cannot be in a normal tag
    $tags_block = array('quote','code');

    $split_text = preg_split("/(\[.*?\])/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    
    $open_tags = array('post');
    $opened_tag = 0;
    $new_text = '';
    $current_ignore = '';
    $current_nest = '';
    $current_depth = array();
    
    foreach($split_text as $current)
    {
        if (strpos($current, '[') === 0 && strpos($current, ']') === 0)
        {
            // Its not a bbcode tag so we put it on the end and continue
            if (!$current_nest)
                $new_text .= $current;
            continue;
        }
        
        preg_match("/\[\/?([a-z]+).*\]/", $current, $matches);
        $current_tag = $matches[1];
        if (!in_array($current_tag, $tags))
        {
            // Its not a bbcode tag so we put it on the end and continue
            if (!$current_nest)
                $new_text .= $current;
            continue;
        }
        
        // We definitely have a bbcode tag.
        
        if ($current_ignore)
        {
            //This is if we are currently in a tag which escapes other bbcode such as code
            if ($current_ignore == $current_tag && $current == '[/'.$current_ignore.']') 
                $current_ignore = '';
            $new_text .= $current;
            continue;
        }

        if ($current_nest)
        {
            // We are currently too deeply nested so lets see if we are closing the tag or not.
            if ($current_tag != $current_nest)
                continue;
                
            if (substr($current, 1, 1) == '/')
                $current_depth[$current_nest]--;
            else
                $current_depth[$current_nest]++;
            
            if ($current_depth[$current_nest] <= $tags_nested_depth[$current_nest])
                $current_nest = '';

            continue;
        }

        if (substr($current, 1, 1) == '/')
        {
            //This is if we are closing a tag
            if ($opened_tag == 0 || !in_array($current_tag,$open_tags))
            {
                //We tried to close a tag which is not open 
                if (in_array($current_tag, $tags_opened))
                {
                    $error = "$current_tag was not open!";
                    return false;
                }
            }
            else
            {
                while (true)
                {
                    if ($open_tags[$opened_tag] == $current_tag)
                    {
                        array_pop($open_tags);
                        $opened_tag--;
                        break;
                    }
                    
                    if (in_array($open_tags[$opened_tag],$tags_closed) && in_array($current_tag,$tags_closed))
                    {
                        $error = "$current_tag cannot be closed until $open_tags[$opened_tag] is!";
                        return false;
                    }
                    elseif (in_array($open_tags[$opened_tag],$tags_closed))
                        break;
                    else
                    {
                        array_pop($open_tags);
                        $opened_tag--;
                    }
                }
            }
            if (in_array($current_tag,$tags_nested))
            {
                if (isset($current_depth[$current_tag]))
                    $current_depth[$current_tag]--;
            }
            $new_text .= $current;
            continue;
        }
        else
        {
            // We are opening a tag
            if (in_array($current_tag,$tags_block) && !in_array($open_tags[$opened_tag],$tags_block) && $opened_tag != 0)
            {
                // We tried to open a block tag within a non-block tag
                $error = "$current_tag cannot be within $open_tags[$opened_tag]!";
                return false;
            }
            if (in_array($current_tag,$tags_ignore))
            {
                // Its an ignore tag so we don't need to worry about whats inside it,
                $current_ignore = $current_tag;
                $new_text .= $current;
                continue;
            }
            if (in_array($current_tag,$open_tags) && !in_array($current_tag,$tags_nested))
            {
                // We tried to open a tag within itself that shouldn't be allowed.
                $error = "$current_tag cannot be within itself!";
                return false;
            }
            if (in_array($current_tag,$tags_nested))
            {
                if (isset($current_depth[$current_tag]))
                    $current_depth[$current_tag]++;
                else
                    $current_depth[$current_tag] = 1;
                    
                if ($current_depth[$current_tag] > $tags_nested_depth[$current_tag])
                {
                    $current_nest = $current_tag;
                    continue;
                }
            }
            $open_tags[] = $current_tag;
            $opened_tag++;
            $new_text .= $current;
            continue;
        }
    }
    // Check we closed all the tags we needed to
    foreach($tags_closed as $check)
    {
        if (in_array($check,$open_tags))
        {
            // We left an important tag open
            $error = "$check must be closed!";
            return false;
        }
    }
    if ($current_ignore)
    {
        // We left an ignore tag open
        $error = "$current_ignore must be closed!";
        return false;
    }
    return $new_text;
}

The errors would need tiding up and put in the language file, but it basically checks for the following things:

* Tag not open (if you try to close one thats not open)
* tag already open e.g. [b][b] unless its a [quote ]tag (other "nestable" tags can be added)
* stuff like [b][i][/b][/i] e.g. incorrect nesting (if you want to allow some slightly bad nesting you can as it might annoy users)
* tag not closed if you don't close a tag you open
* block tag inside inline tag e.g. [b]
[/b]

It also replaces the overflow code and lets you set how deep any tag can be nested, it returns the text with the overflows removed already. and since 1.3 is having hooks i wrote it with that in mind so by simply adding to the arrays new bbcode can be added and can be given properties of [ quote] or [ code]

I have tried to test it but it might have bugs, to use it as it is, add it to parser.php and:
find

    // Run this baby!
    $text = preg_replace($a, $b, $text);

and after add

    $text = preparse_tags($text, $error);
    if ($error)
        // A BBCode error was spotted in check_tag_order()
        $errors[] = $error;

and replace

    if (!$is_signature)
    {
        $overflow = check_tag_order($text, $error);

        if ($error)
            // A BBCode error was spotted in check_tag_order()
            $errors[] = $error;
        else if ($overflow)
            // The quote depth level was too high, so we strip out the inner most quote(s)
            $text = substr($text, 0, $overflow[0]).substr($text, $overflow[1], (strlen($text) - $overflow[0]));
    }
    else
    {
        global $lang_prof_reg;

        if (preg_match('#\[quote=("|"|\'|)(.*)\\1\]|\[quote\]|\[/quote\]|\[code\]|\[/code\]#i', $text))
            message($lang_prof_reg['Signature quote/code']);
    }

with

    if ($is_signature)
    {
        global $lang_prof_reg;

        if (preg_match('#\[quote=("|"|\'|)(.*)\\1\]|\[quote\]|\[/quote\]|\[code\]|\[/code\]#i', $text))
            message($lang_prof_reg['Signature quote/code']);
    }

Re: BBCode bug with improper nesting: [quote][url][/quote][/url]

Very nice Connor. I'll have a look at it when I get home from work and when I have installed MySQL, Apache etc (my computer broke down again!).

"Programming is like sex: one mistake and you have to support it for the rest of your life."