What bothers me is that when you simply average votes or ratings, one or two party poopers can pull a product's score down from five stars to four, even when the vast majority of voters think the product deserves five stars.

I tried a "weighted average" algorithm, like the ones used on many sites (IMDB.com, for example), but it doesn't do what I want. So I came up with this:

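<?php

// Number of votes cast for each star rating (rating => count).
$votes = array(
    1 => 1,
    2 => 1,
    3 => 0,
    4 => 10,
    5 => 20
);

$weight = 0;    // sum of the squared counts (the "weighted" total)
$numvotes = 0;  // plain total number of votes

foreach ($votes as $vote => $count) {
    $weight += $count*$count;
    $numvotes += $count;
    echo "$count people voted $vote<br />";
}

echo "total vote weighs $weight<br />";

$score = 0;
$avg = 0;

foreach ($votes as $vote => $count) {
    // Each rating is weighted by the square of its vote count, so heavily
    // voted ratings dominate and isolated outliers count for very little.
    $score += $vote * ($count*$count / $weight);
    // Plain average, for comparison.
    $avg += $vote * ($count / $numvotes);
}

echo "average is: $avg<br />";
echo "weighted score is: $score<br />";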

The output of this little script is as follows:

1 people voted 1
1 people voted 2
0 people voted 3
10 people voted 4
20 people voted 5
total vote weighs 502
average is: 4.46875
weighted score is: 4.7868525896414

As you can see, with my weighted score the two party poopers who rated this product 1 and 2 are almost entirely discounted. I also display the plain average for comparison: with a plain average, this product would come out at 4 stars once rounded. The vast majority of voters (30 out of 32) thought this should be a 4 or 5 star product, and with my calculation, the majority rules.
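For reference, here are the two numbers the script prints, written out as formulas, where n_v is the number of people who voted v. Plugging in the example counts reproduces the output above:

```latex
\[
\text{average} = \frac{\sum_{v=1}^{5} v\, n_v}{\sum_{v=1}^{5} n_v}
= \frac{1 + 2 + 0 + 40 + 100}{32} = \frac{143}{32} = 4.46875
\]
\[
\text{score} = \frac{\sum_{v=1}^{5} v\, n_v^2}{\sum_{v=1}^{5} n_v^2}
= \frac{1\cdot 1^2 + 2\cdot 1^2 + 3\cdot 0^2 + 4\cdot 10^2 + 5\cdot 20^2}{1^2 + 1^2 + 0^2 + 10^2 + 20^2}
= \frac{2403}{502} \approx 4.787
\]
```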

What do you think of this sort of averaging? Is it fair? Does it make sense?

(Note: I'm using a power of two for the weight. I've tried other powers: a power of 1.5 doesn't seem to tilt the scores enough, while a power of 3 or higher seems to cancel out the effect of any negative votes altogether, which isn't fair either. A power of 2 seems to work best.)
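Since the exponent is the thing being tuned, it can be made a parameter. Below is a minimal sketch of that generalization; the function name weighted_score() is hypothetical and not part of the script above. With a power of 1 it reduces to the plain average, and with a power of 2 it gives the same weighted score as the script.

```php
<?php
// Hypothetical generalization of the script above: each rating is weighted
// by (number of votes for that rating) raised to $power.
//   $power = 1 gives the plain average
//   $power = 2 gives the squared weighting used in the script
function weighted_score(array $votes, float $power = 2.0): float
{
    $weighted_sum = 0.0;
    $total_weight = 0.0;
    foreach ($votes as $vote => $count) {
        $w = pow($count, $power);   // weight for this rating
        $weighted_sum += $vote * $w;
        $total_weight += $w;
    }
    // Guard against an item that has no votes at all.
    return $total_weight > 0 ? $weighted_sum / $total_weight : 0.0;
}

$votes = array(1 => 1, 2 => 1, 3 => 0, 4 => 10, 5 => 20);
foreach (array(1, 1.5, 2, 3) as $p) {
    printf("power %.1f: %.3f\n", $p, weighted_score($votes, $p));
}
```

This makes it easy to compare powers side by side on your own vote data before settling on one.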

I'd like to share an idea I've been toying with: meta-programming in PHP. Let's start directly with an example:

file: "test.meta.php"

<#

function complicated($number) {
    echo floor((10 / $number*123 + 5) * $number);
}

#>

#for($i=1; $i<10; $i++) {
    echo "<# complicated($i) #><br />";
#}

In the example above, code blocks enclosed in <# and #> are "meta" code, and so are lines beginning with #. Both mean the same thing; I chose two syntaxes purely for readability: use # when you want to write a single line of meta-code, and <# and #> when you want to write an entire block of meta-code, such as the "meta function" complicated at the top of the script. Everything that isn't meta-code is plain text that the meta-script outputs when it runs, and that output becomes the generated "real" PHP script. In this example, the loop outputs nine echo statements, each with the result of complicated($i) already filled in.

The code above won't run on its own, so we need a "real" PHP script in front of it, like the one below.

file: "test.php"

<?php

require "_meta.php";

require __meta("test.meta.php");

?>

The __meta function "compiles" the meta-script and returns the path of the generated "real" PHP script, which test.php then includes with require.

The include file is provided below.

file: "_meta.php"

<?php

// Runs the prepared meta-script and returns everything it printed.
// That printed text is the generated "real" PHP code.
function __meta_exec($path) {
    
    ob_start();
    require $path;
    return ob_get_clean();
    
}

// Converts the meta-code delimiters into ordinary PHP tags:
//   <# ... #>   a block of meta-code
//   # ...       a single line of meta-code (# at the start of the line)
function __meta_prep($inpath, $outpath) {
    
    $code = file_get_contents($inpath);
    
    $search = array(
        '/<#(.*?)#>/s',
        '/^[ \t]*#([^\n\r]*)/m'
    );
    
    $replace = array(
        '<'.'?php \1 ?>',
        '<'.'?php \1 ?>'
    );
    
    file_put_contents($outpath, preg_replace($search, $replace, $code));
    
}

// "Compiles" name.meta.php into name.build.php, but only when the build
// file is missing or older than the meta-script, and returns its path.
function __meta($path) {
    
    $out_path = str_replace(".meta.php", ".build.php", $path);
    $temp_path = str_replace(".meta.php", ".temp.php", $path);
    
    if (!file_exists($out_path) || filemtime($out_path) < filemtime($path)) {
        __meta_prep($path, $temp_path);
        $generated = __meta_exec($temp_path);
        file_put_contents($out_path, '<'."?php\n".$generated."\n?".'>');
    }
    
    return $out_path;
    
}

?>

If you save all three files with the filenames given, you should have a working program. When it runs, two new files are created in your folder: "test.temp.php", which is your meta-script with the meta-code delimiters turned into ordinary <?php ... ?> tags, and "test.build.php", which is the "real" PHP script generated by your meta-script.
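To make the delimiter translation concrete, here is a small sketch that applies the two rules to a string. The function name meta_to_php() is hypothetical, and the single-line rule is anchored to the start of a line, as described above, so a # elsewhere on a line (for example in href="#top") is left alone.

```php
<?php
// Hypothetical helper: translates meta-code delimiters into PHP tags.
//   <# ... #>   a block of meta-code (may span several lines)
//   # ...       a single line of meta-code, with # at the start of the line
function meta_to_php(string $code): string
{
    $search = array(
        '/<#(.*?)#>/s',            // block form: non-greedy, dot matches newlines
        '/^[ \t]*#([^\r\n]*)/m',   // line form: only at the start of a line
    );
    $replace = array(
        '<?php \1 ?>',
        '<?php \1 ?>',
    );
    return preg_replace($search, $replace, $code);
}

$meta = "#for(\$i=1; \$i<10; \$i++) {\n"
      . "    echo \"<# complicated(\$i) #><br />\";\n"
      . "#}\n";
echo meta_to_php($meta);
```

The block rule runs first, so by the time the line rule runs, no <# ... #> pairs are left for it to misinterpret.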

So how is this useful?

Well, in the example above, "complicated" could be a genuinely complicated function, with lots of database lookups, deep recursion and other heavy processing. The meta-code runs only once; its output is "cached" in the "*.build.php" file, which is simply included when the script runs again (unless, of course, the original "*.meta.php" file has changed, in which case it is built again).
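The cache check boils down to comparing two file modification times. Here is a minimal sketch of that check on its own; needs_rebuild() is a hypothetical helper, not part of _meta.php, and the usage fakes the timestamps with touch() to simulate editing the meta-script.

```php
<?php
// Hypothetical helper, not part of _meta.php: true when the build file is
// missing or older than the meta-script it was generated from.
function needs_rebuild(string $meta_path, string $build_path): bool
{
    clearstatcache();   // filemtime() results are cached within a request
    if (!file_exists($build_path)) {
        return true;
    }
    return filemtime($build_path) < filemtime($meta_path);
}

// Usage: simulate an edit by giving the meta-script a newer timestamp.
$meta  = tempnam(sys_get_temp_dir(), 'meta');
$build = tempnam(sys_get_temp_dir(), 'build');
touch($meta, 1000);
touch($build, 2000);
var_dump(needs_rebuild($meta, $build));   // build is newer: bool(false)
touch($meta, 3000);
var_dump(needs_rebuild($meta, $build));   // meta-script changed: bool(true)
```

Checking file_exists() first avoids the warning filemtime() would otherwise raise for a build file that hasn't been created yet.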

Now picture, for example, a database-interface wrapper class written in meta-code instead of regular PHP. Normally your code calls a set-value method in your table-abstraction class, which calls a method in a parent database-abstraction class, which in turn calls a database wrapper class, which finally calls mysql_query (or whatever interface your wrapper is currently configured to use) ... *whew* ... Instead of all those function calls, you make *one* call to a meta-function of a class, and it generates *one* clean PHP function call. That means considerably less overhead at runtime.

Or picture an entire application framework in meta-code ... a form generator/validator library, for example, could be implemented so that the code it generates makes no calls into the library at all. Not only that, but PHP would no longer have to parse generator/validator code for 20 different types of form controls when you only happen to be using a dropdown and a text input.

One last thought: picture a template engine with no required includes or parsing of any kind. With a few modifications, the script I provided above could be used as a powerful template engine; you would have no overhead from any real "engine", and you wouldn't have to learn a template syntax to get started, because it would simply be PHP's native syntax.

...

Well, just thought I'd share the idea with everyone ... enjoy! :)

(72 replies, posted in PunBB 1.2 discussion)

that's what I thought ... don't really see how it's supposed to make a patching tool obsolete, then...

(72 replies, posted in PunBB 1.2 discussion)

oh, sounds cool ... so these extensions will let you change/add anything? it's a patching system, then?

or just a limited plugin architecture like most other forum systems have?

PS: eliminating the current template system and replacing it with Smarty would already speed things up tremendously - Smarty compiles its templates to PHP, so it's far faster than the "search and replace" kind of template system currently used in PunBB ;)

I too would like to see PunBB made more flexible - in my case, a template system would cover 95% of the changes I normally make, which are usually simple things like showing/hiding various details ... changes that really don't take much time to make, but that make it hard or impossible to upgrade to newer versions without starting over every time.

I personally would like to see PunBB using the Smarty template engine:

http://smarty.php.net/

it's the best template engine there is, and it transparently "compiles" all templates into native PHP scripts, which means near-zero overhead for general use. I've been using it for years, and it's much more efficient to work with, and even simpler, than plain HTML with embedded PHP statements.

using a system like this, anyone would be able to add/remove information in seconds - and upgrading would be as simple as unpacking the updated source files without unpacking the default templates.

I would get on the case myself and perform the integration, but I don't have time ... plus the fact that, if I went ahead and did this, there would soon be an updated version of PunBB, and I may have to start over to integrate the changes, same way I have to start over on customized PunBB installations now...

I really would be thrilled to see PunBB supporting a "real" template system - total separation of layout and content would, imho, take PunBB to the "next level" :)

I thought of that - just if there was already something available...

Anyone know of any "file library" mods?

What I mean is, a mod where users can upload files to various categories, rate them, and download them - I imagine the file description would be a post, and comments on the posted files would be replies ... so you could build an organized community where users can exchange files.

Anything like that around? I searched and didn't find anything...

Thanks :)

(72 replies, posted in PunBB 1.2 discussion)

zaher: no - that just won't work .. by the time you try to replace the line the second time, it will have already been replaced by the first one - you'll get an error message.

(72 replies, posted in PunBB 1.2 discussion)

Jansson was taking a look at it, but he wants to finish the new PunRez first ... any news, Jansson?

(51 replies, posted in Feature requests)

wouldn't a universal username/password (or even an OpenID identity) mean that spam bots would only need to register one account to be able to spam every participating site?

(51 replies, posted in Feature requests)

btw, Shii, what's a TypeKey?

(51 replies, posted in Feature requests)

I agree, this thread started out on a bad vibe.

Registration is already very simple with PunBB, certainly simpler than on most other forums. But forcing people to give up their email address - isn't it pointless? You can just use a spam address - why put users through the hassle of setting up a spam address first, so they can feel secure when they register at your forum? I would rather give my users the option to register with nothing but a username and password. Then they can use the forum for a while, and once they're comfortable there and feel secure, maybe they'll want to enter their email address, so they can benefit from features like subscriptions and password recovery.

Anyways, these are just my personal conclusions - I'm not trying to change anyone's mind, I don't know what gave you that impression; I'm just sharing my thoughts.

(51 replies, posted in Feature requests)

I don't see what's pointless about weighing all the pros and cons - if you think it's pointless, you didn't have to participate, but you did anyway, so it was at least important enough for you to bother writing a reply ;)

What sort of forum do you run, where you expect 10 dumbass kids for every user? ... on our forums, we have about 600 users and 15,000 posts, and it's been running for about 2 years - we've never been hit by trolls or spam bots yet. I think that really only happens to very large forums? I doubt anyone would bother hitting a forum as specialized as ours with spam...

I look at the statistics, and see the same 4-5 users answering most of the questions. I would like for every passer-by to be able to contribute their knowledge, with a minimum amount of hassle - not only are many of the more skilled and experienced users too busy to bother with a registration, they're also smart enough to know that entering your real email address on a forum potentially means stepping into a spamtrap; if not by forum admins collecting your email addresses, then by email harvesters picking it up.

But every forum has its own needs - we're merely trying to figure out what ours are :)

(51 replies, posted in Feature requests)

CodeXP wrote:

While true, there is one thing you're forgetting: Automated spam bots. Requiring an e-mail address (as long as users have to verify it) will stop most of those.

I'm sure there are other solutions to this minor issue. Any bit of client-side JavaScript should already confuse 90% of them sufficiently, for example.

Or if you're really paranoid, use the "repeat this code" system, where you display a warped GIF version of some text that the user has to repeat. (btw, I personally think this is a rather annoying system, and I'm sure the "human check" could be done in much simpler and more efficient ways)

(51 replies, posted in Feature requests)

ext wrote:

You want to post as "Guest" without entering a username because you find it stupid.

I agree. I think that's what Shii wanted, but it's not what I want - we don't even allow guest posts on our sites ... but I would like to make the registration process as "instant" as possible.

(51 replies, posted in Feature requests)

Dr.Jeckyl wrote:

at least an email addy is one more tool for their fight.

you explained yourself how easy it is to get a free spam address at Yahoo or MSN - thus the email address is no tool for the fight ... trolls are never stupid enough to use their real email addresses.

the only sure information you have on forum users, whether they post anonymously or have to register, is their IP addresses - although most forum admins probably won't know what to do with those. everything else can be (and is) easily faked.

in my opinion it does more harm than good, hassling 1000 users for information which for the most part will be useless to you anyway, in the hopes that you'll be able to catch the one troll in the flock of a thousand - which never happens anyway, unless your troll is dumb enough to register with his real name or email address.

(51 replies, posted in Feature requests)

Dr.Jeckyl wrote:

life is too short to bicker over simple things like anonymity on the net, move on, get a hemet.

did you actually read the arguments on the page?

the primary issue here is not anonymity, but the fact that "people with lives", who have important knowledge, often are too busy to bother with the registration process, just so they can post a single useful remark on a forum that they happened to come across. I know this happens, and I do it myself all the time.

I have a spam account, of course, but I don't bother with forum registrations unless I need an answer myself; I'm sure this is true for others too, and I think that, in the long run, this means most of the users who register are users who are looking for answers ... the users who have answers, in my experience, are usually few and precious on a forum, which is often indicated by the fact that most of the questions on the forum are answered by the same few people - "people with no lives".

I'm certain that this is a valid concern and a very real issue.

I am unsure of the solution, though - I'm not sure if total anonymity is the answer.

An "instant registration" feature would, in my opinion, be a better choice - that is, a tiny form with only username and password inputs that lets you register instantly, without entering your email address and without email validation; both of which are pointless, since so many people just use a "spam address" to register and never check incoming email on that address in the first place.

This would be a pretty simple mod to implement, and in my opinion a much better solution than the "shii" style forums, which essentially work the same way and just present the information differently. I would prefer a tiny box that actually says "instant registration", since that is what's really going on. The exception, of course, is posting completely anonymously with just a username, but I don't see the point of that in the first place - if you let users register "instantly" without any hassle, that should be just as good.

Rickard wrote:

Ah, maybe that's not such a bad idea. Have you measured any noticeable speed increase?

I haven't timed the queries to see which is faster, no - I'm out of time for today, sorry :)

As you noted earlier, a major bottleneck at the moment seems to be the PHP script time itself, perhaps more so than the actual database queries, so the above change may give a barely noticeable performance increase for most searches...

So I'm looking at your search code, and there is a major difference between the way you do it, and the way it's done in wordindex.

PunBB walks through the wordlinks one at a time (the while-loop starting at line 167 in search.php).

Wordindex gets all of the wordlinks, sticks them in one array per word, and then uses one function call to combine those arrays.

Clearly, if you can get away with one function call instead of a whole bunch of function calls, this is going to give you a massive speed advantage in those cases where a word has lots of wordlinks, because looping through them in your script is a slow and heavy process compared to what a native, compiled function can achieve.

So what you might want to do, is try to avoid using loops - use single calls instead, and you should get a considerable performance increase.

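Strangely enough, the MySQL extension (the mysql_* functions) apparently doesn't provide a function that fetches all rows of a result into an array in one call, so you still have to fetch them one at a time and push them onto an array. (The newer mysqli extension does offer mysqli_fetch_all() when it is built against mysqlnd.)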

Once you have all of your wordlink arrays (one array per word) filled, you can then join them with a single function call, which should give a notable speed increase.

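Try array_intersect() for "AND" searches (posts that contain every word), array_merge() followed by array_unique() for "OR" searches, and array_diff() for excluding words ... these should be considerably faster than doing similar operations manually.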
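A minimal sketch of combining per-word arrays with single built-in calls. The post IDs are made up for illustration; in practice each array would be filled from the wordlink query for one word. Note that array_intersect() keeps only IDs present in every array, which is what an AND search needs, while OR needs a union.

```php
<?php
// Hypothetical post IDs per search word; in practice each array would be
// filled from that word's wordlink query, one row at a time.
$posts_for = array(
    'php'   => array(3, 7, 12, 15, 21),
    'mysql' => array(7, 9, 15, 30),
    'perl'  => array(12, 15),
);

// AND: posts containing every word. array_intersect() accepts any number of
// arrays; array_values() just renumbers the preserved keys.
$and = array_values(array_intersect($posts_for['php'], $posts_for['mysql']));
$and_all = array_values(array_intersect(...array_values($posts_for)));

// OR: posts containing any of the words (union without duplicates).
$or = array_values(array_unique(array_merge($posts_for['php'], $posts_for['mysql'])));

// NOT: posts with "php" but without "perl".
$not = array_values(array_diff($posts_for['php'], $posts_for['perl']));

echo implode(', ', $and), "\n";
echo implode(', ', $or), "\n";
echo implode(', ', $not), "\n";
```

The spread form (...) needs PHP 5.6 or later; on older versions, call_user_func_array('array_intersect', $arrays) does the same job.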

I have been doing a closer examination of the data structures.

You have chosen to make `search_words`.`word` a VARCHAR(20), which doesn't really seem to make sense ... a fixed-length CHAR should give faster searches. Of course, this comes at the cost of more disk space, but in my database with about 15,000 words, it only grew from 590 KB to 648 KB: the word records themselves take up more space, but the search index also takes up less space. So this seems like a sensible change - using VARCHAR instead of CHAR for strings as short as 20 characters doesn't really make sense.

This seems to be the only relevant difference between your table structures and the ones used in wordindex.

(51 replies, posted in Feature requests)

shii, I read the arguments on that page of yours .. I follow some of these points, and I wouldn't object to the idea, if it were an optional feature (or a mod) and not the default setting - we have to respect the fact that most people will probably still want their forums to work in the same way a traditional forum works.

but freedom and privacy are important values, I can understand why forums of this type would be attractive ... your most important point, I think, is the fact that a lot of people with valuable information simply won't bother with the registration process, if they just want to quickly share some important information they might have - I have held back important information myself in the past, simply because I couldn't be bothered to register just for that.

however, one thing that I don't like ... if it's completely anonymous, what's to stop me from putting "shii" as my username and posting malicious fake posts in your name?

Rickard: upgraded to PunBB v1.2.5 - I wanted to do some search testing ... in common.php I enabled the following:

define('PUN_DEBUG', 1);
define('PUN_SHOW_QUERIES', 1);

but it doesn't seem to work properly, it doesn't display all the queries? namely the interesting queries - word lookups and wordlink lookups - are not shown in the output??

If you don't have time, let me know, and I'll see if I can find time to do some tests with PunBB :)

Rickard, take a look here:

http://wordindex.sourceforge.net/

This search engine is incredibly fast - and as far as I can see, the only major difference between this system and yours is that the wordlinks (links from word IDs to post IDs) are split into multiple tables. Despite your faith in MySQL's ability to search a single enormous table quickly enough, that does not seem to be the case.

The indexer is written in Perl, but as I said, it's not really that different from yours, as far as I can tell. The table structures are very similar as well, but you may want to examine them for key differences; for one, a double-key is used in the wordlink tables, and I think this may be one of the keys to its high performance.

The example on the website indexes 22,000 posts with 3 million wordlinks split across 100 tables, i.e. roughly 20-50 thousand wordlinks per table (30,000 on average). A search typically takes less than one second. Now, I know you may have forums with 100 times that number of posts - but according to the author of this search engine, it doesn't have to get any slower: you can just split across an even greater number of tables. He has successfully split 14 GB of posts over 1000 tables and still been able to search them in less than one second, even on a "modest server".

The technique used to determine which table a word's links are stored in is simple: take the word ID modulo the number of tables. For example, for word 12345:

12345 % 100 = table 45
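The same rule as a tiny PHP function. The "wordlinks_" table-name prefix is a hypothetical naming scheme for illustration, not necessarily the one wordindex uses.

```php
<?php
// Sketch of the table-selection rule: a word's links live in table number
// (word ID mod number of tables). The "wordlinks_" prefix is hypothetical.
function wordlink_table(int $word_id, int $num_tables = 100): string
{
    return 'wordlinks_' . ($word_id % $num_tables);
}

echo wordlink_table(12345), "\n";   // word 12345 maps to table 45
```

Because consecutive word IDs land in different tables, the wordlinks are spread fairly evenly, and growing the index is a matter of raising the table count (and re-distributing the existing links once).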

The search engine itself is done in PHP, so it should be easy for you to compare it to yours.

Just try the demo on the website, and you'll see, it's incredibly fast :)

I don't know if splitting across multiple tables is the only key to its speed; there may be other key factors in the PHP script...