(edited by Kyle 2008-03-01 10:08)

Topic: How to make convert_to_utf8 way faster

I was just working on my phpbb2 to punbb1.3 conversion script, and even with my tiny database (only about 38k posts) it would fall over while trying to convert everything. In short, it hit PHP's 32MB memory limit pretty quickly. Now, I could just raise the limit, but on some hosts you don't have that control, and at what point do you stop raising it?

So... I think these changes work, but can someone please confirm them against some real non-UTF-8 content?

Change these lines:

$str = preg_replace_callback('/&#([0-9]+);/', create_function('$s', 'return dcr2utf8($s[1]);'), $str);
$str = preg_replace_callback('/&#x([a-f0-9]+);/i', create_function('$s', 'return dcr2utf8(hexdec($s[1]));'), $str);

to:

$str = preg_replace_callback('/&#([0-9]+);/', 'callback1', $str);
$str = preg_replace_callback('/&#x([a-f0-9]+);/i', 'callback2', $str);

and add these two functions:

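// Callback for decimal character references such as &#233;
function callback1($matches) {
    return dcr2utf8($matches[1]);
}

// Callback for hexadecimal character references such as &#xE9;
function callback2($matches) {
    return dcr2utf8(hexdec($matches[1]));
}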

And now I can convert my whole phpbb2 database in one go (no stupid page refreshing) in about 30 seconds without going over 1MB of memory.

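create_function() creates a brand new function every time it is called, and PHP never frees those functions until the script ends. With two calls every time a string was converted, it was doing that 152,000 times for the posts table alone, so memory just kept growing. Defining two ordinary named functions once means nothing new gets created per call, which makes things much better.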
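For anyone finding this on a newer PHP: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, and since PHP 5.3 you can hand preg_replace_callback() a closure directly. Below is a minimal sketch of that version. It is not the converter's own code: dcr2utf8_sketch() is a hypothetical stand-in for the script's real dcr2utf8(), included only so the example runs on its own, and the decode_ncrs() wrapper is likewise just for illustration.

```php
<?php
// Hypothetical stand-in for the converter's dcr2utf8(): encodes one Unicode
// code point as a UTF-8 byte string. No range checking; sketch only.
function dcr2utf8_sketch($cp)
{
    $cp = (int) $cp;
    if ($cp < 0x80)
        return chr($cp);
    if ($cp < 0x800)
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    if ($cp < 0x10000)
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Decodes decimal (&#233;) and hexadecimal (&#xE9;) character references.
// Each closure body is compiled once with the script, and the Closure object
// made on each call is released afterwards, so memory does not pile up the
// way it does with create_function().
function decode_ncrs($str)
{
    $str = preg_replace_callback('/&#([0-9]+);/', function ($m) {
        return dcr2utf8_sketch($m[1]);
    }, $str);
    $str = preg_replace_callback('/&#x([a-f0-9]+);/i', function ($m) {
        return dcr2utf8_sketch(hexdec($m[1]));
    }, $str);
    return $str;
}

echo decode_ncrs('caf&#233; &#x263A;'), "\n";
```

Run from the command line, this should print "café ☺". In the real script you would keep calling dcr2utf8() inside the closures; the named functions above work just as well and also run on PHP older than 5.3.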

Re: How to make convert_to_utf8 way faster

You're right, I'll make those changes now.