And if you are using unicode, and it eats the long string incorrectly while inserting index words, you will get an error.

Yes, I'm using the Feed Aggregator (http://www.punres.org/desc.php?pid=373), the only thing is that it takes the whole feed. What I need is for it to filter the feeds for keywords and only post those which contain the keyword.

Working on it.

3

(26 replies, posted in PunBB 1.3 troubleshooting)

I hope the PM remains an extension and does not become part of punBB.

Good idea to add this.

Thank you so much for your work!

Jérémie wrote:

I was going to answer point by point, especially with that amount of wrapping around my meaning, I even read again the appropriate RFC. Then, I saw that I'm closed minded and xenophobic, so I won't bother.

Yeah, and you just proved it.

Thank you, Jérémie, for taking the time to support internationalization and other languages in PunBB!

Thank you!

orlandu63 wrote:

Just give up, Solovey. You won't get anything through these guys.

I run several dozen websites in different languages, and deal with these issues every day, as they mean a lot to the bottom line.

The dashes vs. underscores argument is a sideshow to the real issue:

The anti-internationalization attitude of the developers of PunBB!

I've looked through numerous threads where people requesting changes for the benefit of other languages, changes that would make this a much better piece of software, were shunned or shot down.

"There is only so much we can do for ... Chinese... etc." pretty much sums this up.

Unfortunately, that means that PunBB will need some extra modding and hacking before I can deploy it in a real multilingual environment. BECAUSE FEATURES THAT SHOULD BE INCLUDED BY DEFAULT ARE NOT INCLUDED.

8

(69 replies, posted in News)

Congratulations!

Jérémie wrote:

For example, the Wikipedia way of systematically using % encoding is the right way. It's a very simple point of entry: you want the article about "Jérémie", you add this to the last part of the URL and it works.

On the other hand, it renders the URL unreadable by humans and by some older machines. For a forum software such as PunBB, for French, I can clearly state that romanization is the way to go.

You may want to do romanization for French, but that is not applicable for every language.

Nope. There's a huge, colossal, phishing issue about unicode URL.

This pertains to domain names themselves, and has absolutely nothing to do with the URL or rewriting, and as such is beyond the scope of the URL rewriting feature.


And there's also an issue with human interaction. How would I go to punbb.org/??????????? with my azerty (or qwerty, or whatever) keyboard?

How would you type in the address of a website in Hebrew? Well, that really is for YOU to figure out. The solution is simple: learn Hebrew! Or, use copy and paste.

The point of URL rewriting is not to let people type-in URLs. The point is to make them friendly for search engines to score higher on keyword to URL matching in search results.

I doubt Firefox 2 qualify as an old browser, just to point one example.

Firefox qualifies as a browser that is marginal and has low internationalization support.

If I want to buy a TV spot for jérémie.fr, I *know* it will create some confusion and dispersion, and I *know* that I will have to buy jeremie.fr or someone will steal my prospects.

Once again you are confusing IDN (internationalized domain names) with URLs.

We are talking about test.com/THIS_PART_OF_THE_URL not the domain name.
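To make the distinction concrete, here is a minimal sketch in Python (PunBB itself is PHP, so this is only an illustration). The function name encode_parts and the sample URL are made up for the example. It shows that the host name and the path go through two different mechanisms: IDNA for the domain, % encoding for everything after it.

```python
from urllib.parse import quote, urlsplit

def encode_parts(url):
    """Return (ascii_host, encoded_path) for a URL containing non-ASCII text.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # The host name is converted with IDNA (punycode); this is the IDN mechanism.
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    # The path is percent-encoded as UTF-8 bytes; this is the part URL rewriting touches.
    encoded_path = quote(parts.path, safe="/")
    return ascii_host, encoded_path

host, path = encode_parts("http://jérémie.fr/wiki/É")
print(host)  # an "xn--" punycode form of the domain
print(path)  # /wiki/%C3%89
```

The phishing concern applies to the first value only; the rewriting feature only ever produces the second.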

Google doesn't design web standards. The W3C does, and the RFC. I don't know what the standard is

That's absolutely right. Google is merely following W3C standards and RFC.

Here is the RFC: http://www.ietf.org/rfc/rfc2396.txt

Uniform Resource Identifiers (URI): Generic Syntax
August 1998

Notice the date? 1998? 10 years ago? And you are suggesting it is too early for PunBB to support it?

It is regrettable that closed-minded and possibly xenophobic Western developers like you are so blockheaded about standards that support languages of the world.

It is unfortunate that you still resist internationalization, even when you promise us utf-8 support in the new PunBB.

This is not a small issue. Do it right the first time, and you won't have complaints in the future.

From a pure, very basic seo point of view that's nonsense. Google sees - as a word separator, not _.

I suggest you do some reading on Google's latest thoughts on this.

July 23, 2007 10:24 PM PDT
Underscores are now word separators, proclaims Google

http://www.news.com/8301-10784_3-9748779-7.html

10

(38 replies, posted in PunBB 1.3 troubleshooting)

Rickard wrote:

Correct about lowercase.

Solovey wrote:

We need valid language characters in the URL, rather than symbols. And that is not limited to A-Z. A-Z is only good for English.

Well, there's only so much we can do for Russian, Chinese etc. We can replace stuff like é with e and ä with a, and after that, enforce a-z0-9-, but that's about it. As far as I know, you can't put non-ascii characters into a URL. If you do, they'll get URL encoded by the browser.
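A rough sketch of the accent-stripping approach described in the quote, written in Python for illustration (this is not PunBB's actual code, and the name ascii_slug is hypothetical). It works for French, but for a Russian title nothing is left:

```python
import re
import unicodedata

def ascii_slug(title):
    """Strip accents, then keep only a-z, 0-9 and '-' (hypothetical helper)."""
    # NFKD splits 'é' into 'e' plus a separate combining accent mark.
    decomposed = unicodedata.normalize("NFKD", title.lower())
    # Dropping everything non-ASCII removes the accent marks and all non-Latin letters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of remaining disallowed characters into a single dash.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")

print(ascii_slug("Café été"))    # cafe-ete
print(ascii_slug("Привет мир"))  # empty string: every letter was discarded
```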

All that needs to be done is conversion of the entire URL (except the domain name) to % encoding.

The better browsers will display it as real Unicode rather than % encoding, but URLs should be issued in % encoding (http://www.php.net/manual/en/function.urlencode.php). This is the right way to do this, rather than stripping accents etc. And it works for all languages.

Only older browsers will actually display this as %bla.

Google correctly indexes both real unicode and % encoded URLs in the index, so you get SEO power with URL rewriting even of non-ascii URLs. Google then displays them as real unicode in search results.

Just take a look at what Wikipedia is doing. They do it correctly with 100% non-ascii URLs for all wikis.

Example in English:

1) Regular URL:  http://en.wikipedia.org/wiki/É
2) % encoded URL: http://en.wikipedia.org/wiki/%C3%89   <--- even this will display as 1 in Google.
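The same round trip can be sketched in a few lines of Python (used here only as an illustration; in PHP the linked urlencode function plays this role). The variable names are made up for the example:

```python
from urllib.parse import quote, unquote

title = "É"
# The character is encoded as UTF-8 (bytes C3 89) and each byte is written as %XX.
encoded = quote(title)
# Decoding restores the original character, which is what a Unicode-aware
# browser or search engine can display instead of the %XX form.
decoded = unquote(encoded)

print(encoded)  # %C3%89
print(decoded)  # É
```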

Also, the official word from Google is to use _ instead of %20 (space) in URLs, so that should be done too.

The other way is to support an optional numbers-only URL rewriting scheme, in the style http://test.com/10001/ for forum post 10001.

Here is a scheme that works for all languages:

1) Lowercase.

2) Strip stop words (can be obtained from any localized search list for any language).

3) Shorten to max valid URL length (but we should be short enough with a message title).

4) Replace space with _ and then get rid of double _

5) If the result is empty or only spaces, get the first sentence of the message and go back to step 1.

6) Convert to % encoding.
http://www.php.net/manual/en/function.urlencode.php
string urlencode ( string $str )
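
The steps above could be sketched roughly like this. This is Python for illustration only, not PunBB code: the stop-word list, the length limit and the names make_slug and first_sentence are all assumptions, and a real version would load a localized stop-word list per language.

```python
import re
from urllib.parse import quote

STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical; use a localized list
MAX_LENGTH = 80                               # hypothetical length limit

def first_sentence(message):
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"[.!?]", message, maxsplit=1)[0]

def make_slug(title, message=""):
    """Build a URL part from the title, falling back to the message text."""
    for text in (title, first_sentence(message)):
        # Steps 1 and 2: lowercase, then drop stop words.
        words = [w for w in text.lower().split() if w not in STOP_WORDS]
        # Step 3: shorten.
        shortened = " ".join(words)[:MAX_LENGTH]
        # Step 4: spaces become _, and runs of _ collapse to one.
        slug = re.sub(r"_+", "_", shortened.replace(" ", "_")).strip("_")
        # Step 5: if nothing is left, the loop retries with the first sentence.
        if slug:
            # Step 6: percent-encode as UTF-8.
            return quote(slug)
    return ""

print(make_slug("The History of PunBB"))           # history_punbb
print(make_slug("The", "Hello world. More text"))  # hello_world
print(make_slug("Привет мир"))                     # %D0%BF... percent-encoded Cyrillic
```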

Now updated: a fresh punbb-1.2.17 release with UTF-8 support enabled:

punbb-1.2.17+utf-8+1.0.zip

Creates fresh utf-8 enabled databases. Fixes search index and other misc errors related to unicode. Requires mbstring.

WHAT WAS CHANGED IN THIS UTF-8 VERSION:

Guided by the UTF-8 FAQ http://punbb.ru/viewtopic.php?id=1222
the following has been changed from the official version of PunBB:

Requirements: mbstring extension must be enabled in php setup.
http://www.php.net/mbstring

1. Default English language pack changed to utf-8.

2. To support unicode text, all cases of strlen, strtolower, strtoupper changed to mb_strlen, mb_strtolower, mb_strtoupper, except inside the function pun_strlen in functions.php.

3. Optional multibyte searching turned off in search.php because regular search with indexing now supports unicode.

4. Length of the search_words.word field changed from 20 to 200 to accommodate the longer words required by unicode, updated in install.php.
Fixes this issue:
Indexing is not unicode-safe:
http://forums.punbb.org/viewtopic.php?id=4509

5. search_idx.php updated to accommodate the 200 length of search words.

6. install.php updated to force creation of utf8 tables in initial database creation.

7. Known issues: Chinese and other languages with no spaces between words are still an issue. More work in include/search_idx.php needs to be done to support this.
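
To illustrate items 2, 4 and 7, here is a small Python sketch (not PunBB code; the sample words are arbitrary). It assumes that a byte-oriented length function counts UTF-8 bytes while the mb_ functions count characters. Whether a database column limit is measured in bytes or characters depends on the database and charset, so treat the comparison with 20 as an assumption.

```python
word = "интернационализация"  # a single Russian word

char_count = len(word)                  # characters, as a multibyte-aware length reports
byte_count = len(word.encode("utf-8"))  # bytes, as a byte-oriented length reports

# Each Cyrillic letter takes two bytes in UTF-8, so the byte count is double.
print(char_count, byte_count)

# Item 7: splitting on whitespace finds no word boundaries in Chinese text,
# so a whole sentence is indexed as one "word".
tokens = "今天天气很好".split()
print(len(tokens))
```

A word that fits within 20 characters can still need more than 20 bytes, which is why the longer field helps.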

vigya wrote:

Is any quick fix available to add the unicode char in the english setting. I am able to see the words properly when we use the same language but if i change the language then the previous post in other language looks in unreadable format. Is there anyway so it can show it in proper way?

1. Fresh install.
2. Install English utf-8 language pack.
3. Install other language packs.

The English utf-8 language pack is available on the same download page as the others.

Edit: See next post.

13

(38 replies, posted in PunBB 1.3 troubleshooting)

Rickard wrote:

I agree that sef_friendly() needs a bit more work. Essentially, we only want A-Za-z0-9 in the URL.

We need valid language characters in the URL, rather than symbols. And that is not limited to A-Z. A-Z is only good for English.

14

(9 replies, posted in PunBB 1.2 bug reports)

Rickard wrote:

One of the reasons we do not support UTF-8 in 1.2. PunBB 1.3 will have full support though.

One more reason you should roll in the changes that fix utf-8 in 1.2 right away as an update.

Rickard wrote:

As much as I appreciate your work, since PunBB 1.2 does not properly support UTF-8, I don't believe we will be adding these to the download page. PunBB 1.3 on the other hand, which is coming shortly, supports UTF-8 out of the box.

Well, you've got *some* 1.2 language packs available for download in utf-8, and not others, so I don't really see what the issue is here.

What's the right way for these to be included in the main download page?

Please just make the English, we can localize it ourselves, eventually everyone can share the rules.

Thanks! Changed.

Get a blank page when trying to load admin.

pheldens wrote:

Hi I solved it in the end by replacing iso-8859-1 with utf-8 in the english language file. I realise this may not be the root cause.

I believe it is the root cause. PunBB needs to move to utf-8 ASAP.

I think the label "not recommended" should be removed. It makes it seem as if something will go wrong if the time is set to zero, whereas in reality, it is simply a user interface and aesthetics issue, and forum operation is not affected.

Not recommended warnings should be put on things that have serious reason not to be recommended.

Many people have asked to do without this redirect page, enough so that the option of turning it off is a valid user interface choice, as valid as having it on, and is in no way detrimental to the operation of the forum.

The reason this question comes up so many times in the support forums is the "not recommended" text, which leads people to ask what, indeed, is behind the dire warning.

If you just took that warning text out of there, less time would be wasted answering support questions about this.

Please consider it.

That's not a feature, that's a hindrance to the end-user-developer who needs to simply edit the styles.

I think this was done on purpose, so that editing the style would be difficult, i.e. to keep a consistent "approved" PunBB style intact.

The CSS code is very arcane, and this is the most annoying part of using punBB.

Lao, Khmer, Burmese, Nepali, Bengali, Sindhi, Gujarati, Tamil, Urdu, Sanskrit translations wanted.

Please PM if you can do it.

Thank you.

25

(9 replies, posted in PunBB 1.2 troubleshooting)

Rickard wrote:

Solovey: PunBB 1.2.* does not support UTF-8. You can try to use it, but there's no "official" support. We need to make a bunch of changes to the codebase in order to support UTF-8.

Please do make those changes. UTF-8 should be standard by default.