Re: The migration to utf-8 delete text, data loss!

Slavok wrote:

I can't reproduce this data loss. I created a db with latin1_spanish_ci collation, wrote a post with message "123á456", and after the migration I do see it. What did I miss?

Hi Slavok,
have you configured your MySQL as shown in these screenshots?
http://punbb.informer.com/forums/post/124662/#p124662

Parpalak,
I'm glad to know you will look at the bug next week.

Many thanks to you both
Oliver

http://tinymailto.com/oliversl <-- my email after a captcha

Re: The migration to utf-8 delete text, data loss!

I got past this bug by backing up the topics, posts and forums tables and importing them back, with some modifications, after the upgrade. Result: http://futurama-france.fr/forum/index.php

28

Re: The migration to utf-8 delete text, data loss!

Parpalak wrote:

Posts and topics processing is required because 1.3 uses UTF-8. Also its parser works in a different way.

Yes, of course. What I am trying to say is that there is no reason to convert the database during upgrade. MySQL will accept and return data in UTF8 regardless of what encoding is used on the tables, as long as SET NAMES utf8 is used.
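To make the SET NAMES point concrete: MySQL stores each string as bytes in the column's declared charset and transcodes it to the connection charset chosen by SET NAMES when sending results back. The following is only a Python model of that behaviour, not MySQL code. The function name `read_with_set_names` is hypothetical, and Python's cp1252 codec stands in for MySQL's latin1 (the MySQL manual describes its latin1 as essentially Windows cp1252).

```python
def read_with_set_names(stored: bytes, column_charset: str,
                        connection_charset: str = "utf-8") -> bytes:
    """Model of what the server sends back: decode the stored bytes with the
    column's declared charset, then re-encode them in the connection charset."""
    return stored.decode(column_charset).encode(connection_charset)


# Slavok's test message as it would sit in a latin1 column: one byte for 'á'
stored = "123á456".encode("cp1252")
sent = read_with_set_names(stored, "cp1252")

print(stored)                # b'123\xe1456'
print(sent)                  # b'123\xc3\xa1456'  ('á' is now two UTF-8 bytes)
print(sent.decode("utf-8"))  # 123á456
```

The table itself never has to change: the conversion happens on the way out, which is why the stored encoding can stay as it is.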

As for reproducing the problem, the steps in my post should do it just fine:
http://punbb.informer.com/forums/post/125628/#p125628

Re: The migration to utf-8 delete text, data loss!

pepak wrote:

You can perform the fixing using a sequence of ALTER TABLE's:
1) Convert all character fields to either BLOB or BINARY without changing charset:
2) Convert all character fields back to the correct type with correct charset:
3) When all fields are converted, change the declaration of the table itself:

As far as I understand, the update script works just as you have described:
http://punbb.informer.com/trac/browser/ … e.php#L338

Before the 1.3 release we tested the update process and added SET NAMES:
http://punbb.informer.com/trac/changese … update.php

Maybe this SET NAMES call is wrong; we'll continue testing.

30

Re: The migration to utf-8 delete text, data loss!

Parpalak wrote:

As far as I understand, the update script works just as you have described:
http://punbb.informer.com/trac/browser/ … e.php#L338

The difference is that the script does not know the correct charset and I see no easy way for it to recognize it.
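A rough illustration of why recognising the charset is hard. A script can cheaply test whether stored bytes are valid UTF-8, but once they are not, nothing in the bytes says which single-byte charset they came from. The helper `looks_like_utf8` is a hypothetical name and the byte string is an invented example:

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8. Plain ASCII passes too, and a few legacy
    strings can pass by accident, so this is a heuristic, not a proof."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


legacy = b"\xe8aj"                # three bytes from some single-byte charset
print(looks_like_utf8(legacy))    # False: clearly not UTF-8...
print(legacy.decode("cp1252"))    # èaj  (Western European reading)
print(legacy.decode("cp1250"))    # čaj  (Central European reading)
```

Both legacy readings decode without error, so only the user (or the language pack setting) can say which one is right.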

31

Re: The migration to utf-8 delete text, data loss!

No wonder the users are losing their data!

What this function does is assume that the table already contains UTF8 data and modify the structure to match. If the assumption is wrong (and it will often be wrong), it simply takes the data in its current encoding and tells MySQL that it is in fact UTF8. That alone leads to data loss, and if the source data happen to contain byte sequences not permitted in UTF8, the string will likely be truncated at that point. When I was upgrading to 1.3, for example, my data was in cp1250...
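The truncation can be modelled in a few lines of Python. This is a sketch of the behaviour described above (keeping only the part of the string that is valid UTF-8), not of MySQL's internals; `relabel_as_utf8` is a hypothetical name. The example uses Slavok's test message stored as single-byte latin1:

```python
def relabel_as_utf8(stored: bytes) -> str:
    """Treat legacy bytes as UTF-8 without converting them, keeping only the
    valid UTF-8 prefix (the truncation described above)."""
    try:
        return stored.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that breaks UTF-8
        return stored[:err.start].decode("utf-8")


post = "123á456".encode("cp1252")   # b'123\xe1456': 'á' is the single byte 0xE1
print(relabel_as_utf8(post))        # 123  (everything from 'á' onward is lost)
```

This matches the symptom reported later in the thread: text cut off at the first non-English letter.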

Re: The migration to utf-8 delete text, data loss!

Actually, the update script asks for the encoding of the language pack before updating. Then it converts posts to UTF8 by calling this function for every post:
http://punbb.informer.com/trac/browser/ … e.php#L231
And then it tells MySQL that the data encoding is UTF8.

33

Re: The migration to utf-8 delete text, data loss!

Parpalak wrote:

Actually, the update script asks for the encoding of the language pack before updating. Then it converts posts to UTF8 by calling this function for every post:
http://punbb.informer.com/trac/browser/ … e.php#L231
And then it tells MySQL that the data encoding is UTF8.

And if the old charset is NOT ISO-8859-1 and neither iconv nor mb_convert_encoding is available, it leaves the string unchanged but still tells MySQL that it is UTF8.

What the upgrade should do, and what WOULD be foolproof provided the user supplies the correct encoding, is a sequence of ALTER TABLE statements:

1) ALTER TABLE ... MODIFY [string_field] BLOB
2) ALTER TABLE ... MODIFY [string_field] [original_type] CHARACTER SET [user's_encoding]
3) ALTER TABLE ... MODIFY [string_field] [original_type] CHARACTER SET utf8

Or even just steps 1 and 2; those would suffice and might be even safer. Conversion to UTF8 can then be done on request thanks to SET NAMES utf8.
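A small Python helper that prints the three steps above for one column might look like this. It is a sketch under assumptions: it uses MySQL's MODIFY clause, maps each text type to its binary counterpart (VARCHAR to VARBINARY, TEXT to BLOB, and so on), and ignores NOT NULL, DEFAULT and index definitions, which a real MODIFY must restate. The function names and the `posts`/`message` example are illustrative only; real PunBB tables may also carry a table prefix.

```python
import re

# Binary counterparts of MySQL string types; converting to these keeps the
# bytes untouched, whatever charset the column was declared with.
BINARY_TYPE = {
    "CHAR": "BINARY", "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB", "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB", "LONGTEXT": "LONGBLOB",
}


def to_binary(col_type: str) -> str:
    """Map e.g. 'VARCHAR(255)' to 'VARBINARY(255)' and 'TEXT' to 'BLOB'."""
    m = re.fullmatch(r"(\w+)(\(\d+\))?", col_type.strip().upper())
    if m is None or m.group(1) not in BINARY_TYPE:
        raise ValueError(f"not a supported string type: {col_type}")
    return BINARY_TYPE[m.group(1)] + (m.group(2) or "")


def fix_column(table: str, column: str, col_type: str,
               real_charset: str, full: bool = True) -> list[str]:
    stmts = [
        # 1) drop the (possibly wrong) charset label, keep the raw bytes
        f"ALTER TABLE {table} MODIFY {column} {to_binary(col_type)}",
        # 2) label the bytes with the charset they are really in (no conversion)
        f"ALTER TABLE {table} MODIFY {column} {col_type} CHARACTER SET {real_charset}",
    ]
    if full:
        # 3) optional: an actual conversion from real_charset to utf8
        stmts.append(f"ALTER TABLE {table} MODIFY {column} {col_type} CHARACTER SET utf8")
    return stmts


for sql in fix_column("posts", "message", "TEXT", "cp1250"):
    print(sql + ";")
```

Passing `full=False` stops after step 2, the variant suggested above as possibly safer, leaving the conversion to SET NAMES utf8.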

Re: The migration to utf-8 delete text, data loss!

pepak wrote:

And if the old charset is NOT ISO-8859-1 and neither iconv nor mb_convert_encoding is available, it leaves the string unchanged but still tells MySQL that it is UTF8.

No, a message is displayed in this case:
http://punbb.informer.com/trac/browser/ … e.php#L426

Are you sure that these ALTER queries will work on PostgreSQL and SQLite?

It was not me who designed db_update.php, so I can't explain its logic in detail. To tell you the truth, I'm still a little confused by all these encodings and collations in databases. But I want to fix bugs if they exist and will continue investigating.

35

Re: The migration to utf-8 delete text, data loss!

Parpalak wrote:

Are you sure that these ALTER queries will work on PostgreSQL and SQLite?

I fail to see how that is relevant: database conversion code will almost certainly need to be hard-coded for every database separately. That is, if you want reliable code.

It was not me who designed db_update.php, so I can't explain its logic in detail. To tell you the truth, I'm still a little confused by all these encodings and collations in databases. But I want to fix bugs if they exist and will continue investigating.

Well, my main database is Firebird so I can't really tell you details about PostgreSQL and SQLite.

With MySQL, you don't need to care which encoding the data is stored in. All you need to do to get UTF8 output, regardless of the encoding actually used by the database, is:

1) Make sure the table structure matches the table data. This is NOT the case with many PunBB 1.2 installations, including mine: PunBB 1.2 did not create the tables correctly.

2) Make sure SET NAMES utf8 is called before any other SQL command.

Even if #1 is not satisfied, this approach will not lose any old data: old posts will simply display incorrectly, but as soon as the table structure is fixed to match the data, everything will be fine.
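A Python model of this claim: if the declared charset does not match the bytes, a SET NAMES utf8 client sees the wrong characters, but the stored bytes are untouched, so fixing the declaration fixes the output. `displayed` is a hypothetical name, and the example assumes cp1250 data sitting in a column declared latin1 (modelled with Python's cp1252 codec):

```python
def displayed(stored: bytes, declared_charset: str) -> str:
    """Text a SET NAMES utf8 client receives: the server reads the bytes in the
    column's declared charset (re-encoding the result as UTF-8 loses nothing)."""
    return stored.decode(declared_charset)


stored = "čaj".encode("cp1250")      # b'\xe8aj', written by a cp1250 forum
print(displayed(stored, "cp1252"))   # èaj  (wrong: column declared latin1)
print(displayed(stored, "cp1250"))   # čaj  (right once the declaration is fixed)
# The stored bytes are the same in both cases; only the label changed.
```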

The upgrade script uses a much more dangerous approach: reading all the data, converting it to UTF8 and writing it back.

36 (edited by gorsan 2009-04-08 05:32)

Re: The migration to utf-8 delete text, data loss!

Mr.Awesome wrote:

I got past this bug by backing up the topics, posts and forums tables and importing them back, with some modifications, after the upgrade. Result: http://futurama-france.fr/forum/index.php

Could you please explain in more detail what you did?

I tried what has been suggested here, but with no success.

Or does anyone have an idea how to import the data back with modifications?

37 (edited by targetinthebox 2011-03-06 10:45)

Re: The migration to utf-8 delete text, data loss!

Thanks for your help.

38 (edited by toni0 2012-06-14 22:29)

Re: The migration to utf-8 delete text, data loss!

Possible workaround

I recently decided to migrate my forum from PunBB 1.2 to PunBB 1.3, but ran into this issue too, as my database's charset was latin1 (installed in French originally, but later used for international purposes).

I tried manually converting my columns to binary (BLOB) before converting them to UTF8, as suggested; unfortunately this led to the same data loss because of special characters in topic names or messages (ä, ö, é, ´, ...).

I finally managed to deal with it simply by using MODIFY statements on all the columns whose charset was latin1.
This works as long as the columns are declared as latin1 and the data really are stored in latin1, which is usually the case (it also works with other charsets under the same condition).
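The difference between the two attempts can be sketched in Python (a model, not MySQL code; both function names are hypothetical). A MODIFY from a latin1 column to utf8 really converts each character, so it is lossless when the data truly are latin1. Going to BLOB and then straight to a utf8 text type only relabels the bytes, because the middle step of labelling them with their real charset was skipped, so the first accented byte breaks the string (truncation modelled as described earlier in the thread):

```python
def modify_to_utf8(stored: bytes, declared_charset: str) -> bytes:
    """MODIFY ... CHARACTER SET utf8 on a correctly declared column:
    every character is converted from declared_charset to UTF-8."""
    return stored.decode(declared_charset).encode("utf-8")


def blob_then_utf8(stored: bytes) -> bytes:
    """BLOB followed directly by a utf8 text type: the bytes are kept as-is
    and relabelled; here anything from the first invalid byte on is cut off."""
    try:
        stored.decode("utf-8")
        return stored
    except UnicodeDecodeError as err:
        return stored[:err.start]


title = "Café à Noël".encode("cp1252")                  # genuine latin1 data
print(modify_to_utf8(title, "cp1252").decode("utf-8"))  # Café à Noël
print(blob_then_utf8(title).decode("utf-8"))            # Caf
```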

I have written a script that converts all the standard PunBB 1.2 database columns to UTF8, keeping their default parameters. Please note that it won't touch manually added tables or columns (some 1.2 plugins, CountryFlags for example, require you to create new columns by hand), so you will have to adapt the script accordingly if that applies to you.

You MUST NOT tick the "Enable conversion" box when prompted by db_update.php, as the conversion has already been done by the script. Ticking it will lead to data loss.
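One way running the conversion a second time damages text, sketched in Python: bytes that are already UTF-8 get read as latin1 and encoded again, so an accented letter such as é turns into two wrong characters. `convert_to_utf8` is a hypothetical stand-in for the update script's conversion, and Python's latin-1 codec approximates MySQL's latin1:

```python
def convert_to_utf8(data: bytes, assumed_charset: str = "latin-1") -> bytes:
    """Read data as assumed_charset and re-encode it as UTF-8."""
    return data.decode(assumed_charset).encode("utf-8")


original = "é".encode("latin-1")   # b'\xe9' in the old latin1 table
once = convert_to_utf8(original)   # what the conversion script already did
twice = convert_to_utf8(once)      # what ticking "Enable conversion" would add
print(once.decode("utf-8"))        # é
print(twice.decode("utf-8"))       # Ã©
```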

>> PunBB DATABASE CHARSET CONVERSION SCRIPT <<
<!> Read the instructions inside the script before using it.

Tested under real conditions and working. Hope this helps, even in July 2011... :)

39 (edited by Gotipe 2011-11-19 11:35)

Re: The migration to utf-8 delete text, data loss!

Has PunBB even fixed this? I have tried every possible method to upgrade from 1.2.22 to PunBB 1.3, and every single time the text in topics, posts, etc. gets cut off at the first non-English letter (å ä ö Å Ä Ö).

I don't understand much about the solutions offered by non-staff in this topic, other than that they are possible solutions and come from non-staff.
What I can tell is that 99% of all data gets erased whenever I try to upgrade.

If there is a fix I have missed, such as a new version of db_update.php, please tell me.

EDIT: I can confirm that toni0's solution does indeed work. If you get an error message about the bans table or similar, just run each section separately; that worked for me.

Darth Vader will chase you away with his lightsaber!
GalacticEmpire.se Forum

40

Re: The migration to utf-8 delete text, data loss!

YOU SAVED MY LIFE WITH THIS:
http://punbb.informer.com/forums/post/140765/#p140765
Your script stopped with an error after each section, but I copied and pasted it section by section and ran it in phpMyAdmin. It took a while, but my, oh my!!!!
It worked wonders. THANK YOU.
Now, it's amazing that this solution is so hard to find in the forums or in the installation/upgrade instructions. How do we let everybody else know?

New Friendly web-shop! • SO happy with PunBB! • Now punBB 1.4.x on ALL forums (won't tell how many or their addresses to avoid spam-regs)