How not to parse BBcode



  • This one is the same situation as the last time I posted code from this program.

            $match_count_news = preg_match_all("#\[news:\"(.*?)\",(.*?)\]#si", $text, $matches_news);
    
            if ($match_count_news > 0 AND $news_array === NULL) {
                    $news_array = array();
                    $result = $db->sql_query('SELECT * FROM '.TABLE_NEWS.' WHERE `news_date` != 0'
                              .' ORDER BY `news_id` DESC;', __FILE__, __LINE__);
                    while ($row = $db->fetch_array($result)) {
                            $news_array[ ] = $row;
                    }
            }
    
            for ($i = 0; $i < $match_count_news; $i++) {
                    $news_cats = $matches_news[1][$i];
                    $news_id = $matches_news[2][$i];
                    $text_in = '';
                    $j = 1;
    
                    $cat_array = explode(",", $news_cats);
    
                    foreach ($news_array as $item) {
                            if (in_array($item['news_cat'], $cat_array)) {
                                    if ($j == $news_id) {
                                            $text_in = '<a '.href('', 'news', array('id' => $item['news_id'])).' class="news_title_link">'.$item['news_name'].'</a>';
                                            break;
                                    } else {
                                            $j++;
                                    }
                            }
                    }
    
                    $text = str_replace('[news:"'.$news_cats.'",'.$news_id.']', $text_in, $text);
            }
    

    This happend every single page-load.



  • It's not the worst PHP I've ever seen, but it does burn my eyes, I'll give you that. 



  • Come one. It's perfectly reasonable code... It just shouldn't get whole table. And shouldn't order the result when it doesn't matter anyways. And shouldn't iterate whole news table on every match. And shouldn't iterate whole categories table on every match. And could use 0'th group instead of reconstructing bbcode string in str_replace. And could use callback on replace so it doesn't create a new text in memory on every replace. And...

    @Lingerance said:

    This happend every single page-load.

    Oh yes - and shouldn't do that when person reads a page, but rather when he posts the message.

    Apart from those small issues, it's perfectly reasonable code...



  • Looks like yet another bbcode engine that uses hacks to get output that happens to pass the test cases instead of breaking it up into logical pieces and outputting them later.



  • @viraptor said:

    It just shouldn't get whole table.

    Indeed; however I have no idea where to set a limit as some categories will undoubtably be not full if I do so.
    @viraptor said:
    And shouldn't order the result when it doesn't matter anyways.

    Actually it does matter; the code will replace the BBcode with the latest news (it's used in a front page display); the table is ordered by nature in the opposite order.
    @viraptor said:
    shouldn't do that when person reads a page, but rather when he posts the message.

    That also alters the behavior of the tag.


    In reality it should be an ajax call or it should grab the data from a static file that's updated when someone posts news.



  •  @Lingerance said:

    @viraptor said:
    And shouldn't order the result when it doesn't matter anyways.
    Actually it does matter; the code will replace the BBcode with the _latest_ news (it's used in a front page display); the table is ordered by nature in the opposite order.

    So... news_id is not unique in that table? Because preg matches something like [news:"news_cat1,news_cat2",news_id]. That says pretty much - "no need to sort" for me... It also says that you could substitute them to static links after writing. Did I miss the meaning of "news_id"? Is it something else than "global, unique news id in database"?



  • @viraptor said:

    So... news_id is not unique in that table?
    It's auto-increment; ORDER BY news_id DESC will actually give 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, where naturally it will give 1, 2, 3, ...

    $news_id on the other hand is mislabled, it is an index of the paticular news tag. So [news:15,1] will return a link to the newest news post in the DB for category 15. My tagging might be off, this code hasn't been touched since 2005, so I've forgotten its use. If you still don't get what exactly $news_id is look at how $j behaves.



  • @Lingerance said:

    @viraptor said:
    It just shouldn't get whole table.
    Indeed; however I have no idea where to set a limit as some categories will undoubtably be not full if I do so.

    How many categories are there, how many results are being shown per category and how many rows are in the table? 



  • @morbiuswilters said:

    How many categories are there, how many results are being shown per category and how many rows are in the table? 
    ~20; 7; 2200. The main reason I was concerned is some categories can go a year without a new post.



  • @Lingerance said:

    ~20; 7; 2200. The main reason I was concerned is some categories can go a year without a new post.

    You're probably better just looping and querying for each category rather than returning the entire result set each time.  If the category ID is indexed ~20 queries returning 7 rows each will almost certainly be faster, especially if the old news isn't purged and your total number of rows will continue to grow.  You could also loop and build a big UNION query, but that will probably be slightly slower than looping in PHP.


Log in to reply