The better browsers will display it as real Unicode rather than % encoding, but URLs should be issued in % encoding (http://www.php.net/manual/en/function.urlencode.php). This is the right way to do it, rather than stripping accents etc.
No it's not.
Or, to be more precise, it really depends on the context: the language, the application, the website, etc.
For example, the Wikipedia way of systematically using % encoding is the right way. It's a very simple point of entry: you want the article about "Jérémie", you add this to the last part of the URL and it works.
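To make the % encoding point concrete, here is a minimal sketch in Python (not PunBB's or Wikipedia's actual code) of what happens to the title when it is appended to a URL. The `example.org/wiki/` base is a made-up placeholder.

```python
from urllib.parse import quote, unquote

# "Jérémie", written with escapes so the precomposed é (U+00E9) is unambiguous.
title = "J\u00e9r\u00e9mie"

# quote() encodes the text as UTF-8 and writes each non-ASCII byte as %XX.
encoded = quote(title, safe="")

# Hypothetical base URL, for illustration only.
url = "https://example.org/wiki/" + encoded

# unquote() reverses the operation, which is what a browser does for display.
decoded = unquote(encoded)

print(url)
print(decoded)
```

Each é becomes two bytes in UTF-8, hence the two %XX groups per accented letter in the result.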
On the other hand, it renders the URL unreadable by humans and by some older machines. For forum software such as PunBB, for French, I can clearly state that romanization is the way to go.
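For French, romanization can be roughly sketched like this in Python. It is only an illustration under the assumption that stripping combining accents is enough; the `romanize` name is mine, and it is not what PunBB ships.

```python
import unicodedata


def romanize(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(romanize("J\u00e9r\u00e9mie"))
```

This only handles letters that decompose into a base letter plus accents. Ligatures such as œ and non-Latin scripts need a real transliteration table, which is one more reason the choice depends on the language.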
And it works for all languages.
Nope. There's a huge, colossal phishing issue with Unicode URLs. Some characters are almost impossible to distinguish (and that's for an expert; there is a large number of characters the general public won't identify as non-Roman), and right now it's a real security threat. It has to be solved, and it will be, because there are no grounds for pure basic-English-only domain names and URLs; but we're not there yet.
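To show how hard the lookalike problem is, here is a crude Python sketch that flags a name mixing several scripts. It leans on the first word of each character's Unicode name as a stand-in for its script, which is an assumption of this example and not a proper script database; the function names are hypothetical.

```python
import unicodedata


def scripts_used(label: str) -> set[str]:
    """Return a rough set of scripts found among the letters of a label."""
    found = set()
    for ch in label:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            found.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return found


def looks_mixed(label: str) -> bool:
    """True when letters from more than one script appear in the same label."""
    return len(scripts_used(label)) > 1


# Second letter is CYRILLIC SMALL LETTER A (U+0430), visually identical to "a".
spoof = "p\u0430ypal"
print(scripts_used(spoof), looks_mixed(spoof))
```

A check like this misses a name written entirely in one lookalike script, so it illustrates the problem more than it solves it.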
And there's also an issue with human interaction. How would I go to punbb.org/??????????? with my azerty (or qwerty, or whatever) keyboard? Yes, the general public Googles it, and almost never types a URL by hand. But I say no to ignoring an issue and lowering usability to the lowest common denominator; as a web user I _demand_ to keep my own usability.
Only older browsers will actually display this as %bla.
I doubt Firefox 2 qualifies as an old browser, just to give one example.
And remember that domains & URLs are used by UAs, not just browsers. Googlebot is a UA, for example, and so are wget and zillions of similar tools that have very little in common with a web browser as you may know it.
Google correctly indexes both real unicode and % encoded URLs in the index, so you get SEO power with URL rewriting even of non-ascii URLs. Google then displays them as real unicode in search results.
That's good. But I think it's a minority right now.
As a user, I would advocate such encoding in the future, but I think it's counterproductive right now. If I want to buy a TV spot for jérémie.fr, I *know* it will create some confusion and dispersion, and I *know* that I will have to buy jeremie.fr or someone will steal my prospects.
Also, the official word from Google is to use _ instead of %20 (space) in URLs, so that should be done too.
Google doesn't design web standards. The W3C does, and the RFCs. I don't know what the standard is, and I know that if it exists it needs to be perfected to handle phishing, human error, and human usability.
The other way is to support an optional numbers only URL rewriting scheme
The only right way of doing this is to let everyone choose which URL scheme they want. And PunBB 1.3 already does this with its extension system.
Here is a scheme that works for all languages:
By definition, this doesn't exist. Let's take it apart:
Nope. In some languages, the same word with or without uppercase doesn't mean the same thing. If you want to advocate the "right" thing, and the "future right now", then respecting the language is the only way to go, and that means going all the way.
Lowercase is the best option for the current romanized scheme, yup. But that's it.
) Strip stop words (can be obtained from any localized search list for any language).
Again, stop words may be useless, or in the way, for some applications, but not for others. "You" is a stop word; try telling that to YouTube.
Again, there is no "one rule to rule them all". Each one, if competent, chooses what's best, and can hook into PunBB to make it work that way.
) Replace space with _ and then get rid of double _
From a pure, very basic SEO point of view, that's nonsense. Google sees - as a word separator, not _.
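The steps discussed above (lowercase, strip stop words, replace spaces, collapse doubled separators) can be sketched in Python with every choice left to the site owner. This is a hypothetical helper for illustration, not PunBB's extension API; `make_slug` and its parameters are names I made up.

```python
import re


def make_slug(title, stop_words=frozenset(), separator="-", lowercase=True):
    """Build a URL slug; every step is optional so each site can pick its own rules."""
    text = title.lower() if lowercase else title
    # Drop stop words only if the caller supplied a list for their language.
    words = [w for w in text.split() if w.lower() not in stop_words]
    slug = separator.join(words)
    # Collapse runs of the separator into a single one.
    slug = re.sub("(?:" + re.escape(separator) + ")+", separator, slug)
    return slug.strip(separator)


print(make_slug("The Future of   URL Rewriting", stop_words={"the", "of"}))
print(make_slug("The Future of   URL Rewriting", stop_words={"the", "of"}, separator="_"))
print(make_slug("You Tube", stop_words={"you"}))
```

The last call shows the stop word problem: with "you" in the list, half of the name is gone. Passing `lowercase=False` keeps case for languages where it carries meaning.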