Topic: RSS newsbot not working so well

OK... I wrote this little script, but the HTML is not parsed well.
I am bypassing the HTML special chars the way elbecko told me here.

This is just the beginning, and it is not working so well... I guess I also forgot to update some post counts, but I'll get into that later. For now I would like to have the HTML parsed properly.

I picked a feed with a lot of HTML on purpose.

Can anybody point out which parsing step I am missing, and where?

thx

<?php
require('magpierss/rss_fetch.inc');

$db_host = 'localhost';
$db_name = 'my_db';
$db_username = 'imacooluser';
$db_password = 'mypass';

mysql_connect($db_host, $db_username, $db_password);
mysql_select_db($db_name);

$url="http://chrisgoes.blogspot.com/rss.xml";
$rss = fetch_rss($url);


foreach ( $rss->items as $item ) {
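            // Feed values go straight into SQL strings, so escape the title as well as the body
            $topic_subject        =    mysql_escape_string($item['title']);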
            $poster                =     "postbot";
            $timenow            =    time();
            $topic_forum_id        =    2;
            $post_message        =     mysql_escape_string($item['description']);
            $poster_id            =    3;

            
            $sql="
                insert into topics
                    (subject,
                    poster,
                    posted,
                    forum_id,
                    last_post_id,
                    last_poster)
                values
                    ('$topic_subject',
                    '$poster',
                    $timenow,
                    $topic_forum_id,
                    $timenow,
                    '$poster')";
                    
            mysql_query($sql);
            
            $topic_id = mysql_insert_id();
            
            $sql="
                insert into posts                
                    (poster,
                    poster_id,
                    message,
                    html,
                    hide_smilies,
                    posted,
                    topic_id)
                values
                    ('$poster',
                    $poster_id,
                    '$post_message',
                    1,
                    1,
                    $timenow,
                    $topic_id)";
            
            mysql_query($sql);
            $last_post_id = mysql_insert_id();
     
            $sql="update topcics set last_post_id=$last_post_id where id=$topic_id";
            mysql_query($sql);
    }
    ?>

Re: RSS newsbot not working so well

It outputs something like this:

<p><div xmlns="http://www.w3.org/1999/xhtml"><a href="http://photos1.blogger.com/blogger/1179/1741/1600/Aztecs%20Live%20-%20Front.0.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1179/1741/400/Aztecs%20Live%20-%20Front.0.jpg" border="0" /></a><strong><span style="font-size:85%;">Size: 80.6 MB</span></strong><br /><strong><span style="font-size:85%;">Bitrade: 256</span></strong><br /><strong><span style="font-size:85%;">mp3</span></strong><br

Re: RSS newsbot not working so well

The problem is that the MySQL query on line 186 isn't including p.html.  You need to add ', p.html' somewhere in the SELECT part of the query so the html field will be present in the array.  Additionally, you spelled 'topics' incorrectly in your last query, and it might be a good idea to use $db->prefix.'tablename' instead.

Furthermore, every time this executes it will add a new topic for each item in the RSS feed, regardless of whether you've already posted it.  You should implement a method to store the last RSS timestamp or store all the used GUIDs so you won't repost messages.
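A minimal sketch of the "store all the used GUIDs" idea, assuming each MagpieRSS item array may carry a 'guid' key (falling back to 'link' when it doesn't). The function names are hypothetical, and where the seen keys live (for example, a hypothetical extra table) is left open:

```php
<?php
// Hypothetical helpers: skip feed items that were already posted.
// Assumes items look like MagpieRSS item arrays with an optional 'guid'
// key; when 'guid' is missing we fall back to 'link'.

function item_key(array $item)
{
    if (!empty($item['guid'])) {
        return $item['guid'];
    }
    return isset($item['link']) ? $item['link'] : null;
}

// Returns only the items whose key is not in $seen_keys.
// Items with neither guid nor link are skipped, since they can't be tracked.
function filter_new_items(array $items, array $seen_keys)
{
    $seen  = array_flip($seen_keys);   // key => index, for fast isset() lookups
    $fresh = array();
    foreach ($items as $item) {
        $key = item_key($item);
        if ($key !== null && !isset($seen[$key])) {
            $fresh[] = $item;
            $seen[$key] = true;        // also drops duplicates within one fetch
        }
    }
    return $fresh;
}
```

In the bot you would load the stored keys before the loop, iterate over `filter_new_items($rss->items, $seen_keys)` instead of `$rss->items`, and save `item_key($item)` after each successful insert.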

Re: RSS newsbot not working so well

First of all, thanks for the help.
Though I didn't understand some stuff.

Line 186? My script only has about 70 lines... which query are you talking about? The first? The second? The third?

I also didn't understand the p.html hint... are you referring to the new field I created to check whether the message has HTML or not? I am not using that field in any SELECT SQL command... where should I have a SELECT, and what for?

Re: RSS newsbot not working so well

In viewtopic.php, include p.html in the post fetch query. Only then will the check for HTML work.

Re: RSS newsbot not working so well

Oops, yeah, forgot to say that that was line 186 in viewtopic.php.  Your modification was checking an array index which didn't exist, so the condition always evaluated to false.
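To illustrate why the check silently failed: if the SELECT doesn't list p.html, the fetched row simply has no 'html' key, and a test like `!empty($post['html'])` is false, so every post falls back to escaping. The sketch below uses a hypothetical `render_message()` function, not PunBB's actual viewtopic.php code; `$post` stands in for a fetched row and 'html' is the custom column the bot sets to 1. Note that the raw branch trusts whatever markup the feed sends.

```php
<?php
// Hypothetical display-side check. $post stands in for a row fetched in
// viewtopic.php; 'html' is the custom column the bot sets to 1.
function render_message(array $post)
{
    if (!empty($post['html'])) {
        return $post['message'];                          // bot post: output raw HTML
    }
    return htmlspecialchars($post['message'], ENT_QUOTES); // normal post: escape it
}

$row_without_html = array('message' => '<b>hi</b>');               // p.html not selected
$row_with_html    = array('message' => '<b>hi</b>', 'html' => '1'); // p.html selected

echo render_message($row_without_html), "\n"; // &lt;b&gt;hi&lt;/b&gt;
echo render_message($row_with_html), "\n";    // <b>hi</b>
```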

Re: RSS newsbot not working so well

Thanks guys, I actually had updated viewtopic.php, but I had a comma out of place...

I'll leave the date check for last... let's see what else I need to update.
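For the date check, a minimal sketch under two assumptions: the item carries an RSS 2.0 'pubdate' string (MagpieRSS lowercases element names), and the bot stores the newest timestamp it has posted somewhere between runs. Items without a parseable date are skipped here, so a feed without dates would need the GUID approach instead. Both function names are hypothetical.

```php
<?php
// Hypothetical date check: keep only items published after $last_posted
// (a Unix timestamp saved from the previous run).
function items_newer_than(array $items, $last_posted)
{
    $fresh = array();
    foreach ($items as $item) {
        $ts = isset($item['pubdate']) ? strtotime($item['pubdate']) : false;
        if ($ts !== false && $ts > $last_posted) {   // undated items are skipped
            $fresh[] = $item;
        }
    }
    return $fresh;
}

// Newest parseable timestamp among $items, or $fallback if none is newer;
// this is the value to store for the next run.
function newest_timestamp(array $items, $fallback)
{
    $max = $fallback;
    foreach ($items as $item) {
        $ts = isset($item['pubdate']) ? strtotime($item['pubdate']) : false;
        if ($ts !== false && $ts > $max) {
            $max = $ts;
        }
    }
    return $max;
}
```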

(edited by pedrotuga 2006-09-11 00:32)

Re: RSS newsbot not working so well

It's working!

It's a pity that a lot of sites say "RSS 2.0" but then each one uses a different structure...
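One way to cope with feeds that put the body in different places is to try several fields in order. This is a sketch only: the key names are assumptions about how MagpieRSS stores elements (`<content:encoded>` as `$item['content']['encoded']`, Atom content as `$item['atom_content']`, plus plain 'description' and 'summary'), and `item_body()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: return the first non-empty body field a feed provides.
// Key names are assumptions about MagpieRSS's item arrays.
function item_body(array $item)
{
    $candidates = array(
        isset($item['content']['encoded']) ? $item['content']['encoded'] : null,
        isset($item['atom_content']) ? $item['atom_content'] : null,
        isset($item['description']) ? $item['description'] : null,
        isset($item['summary']) ? $item['summary'] : null,
    );
    foreach ($candidates as $body) {
        if (is_string($body) && trim($body) !== '') {
            return $body;   // first field that actually has content wins
        }
    }
    return '';              // nothing usable in this item
}
```

The order puts full-content fields ahead of 'description', since many blogs send only a short teaser in 'description'.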

So far it's working OK... I just need to fix some of the related updates; there are more than I thought...

I'll stop and do some school stuff now... I'll get back to it within a week or so.

Thanks for the help... appreciated.