1

Topic: XHTML Mime type

I'm just curious: has anybody apart from me ever tried serving PunBB with the correct MIME type, i.e. "application/xhtml+xml" as opposed to "text/html"?

I'm also curious as to whether anybody feels they need/want to.

Re: XHTML Mime type

doesn't that just not work in IE?

3

Re: XHTML Mime type

There is more to it than that. Most sites that validate as XHTML Strict won't actually run as application/xhtml+xml. For example, named character entities such as &nbsp; are not really valid; they only work because pages are parsed as HTML. Some fairly common pieces of JavaScript don't work either, and Google ads go splat unless workarounds are employed. Also, stylesheets should be served as XML stylesheets.

Re: XHTML Mime type

What's the advantage of using the XHTML MIME type, except being more "1337"? wink

Re: XHTML Mime type

lots of practice at workarounds. big_smile

~James
FluxBB - Less is more

6

Re: XHTML Mime type

Theoretically it should render pages slightly faster because everything doesn't have to be converted to HTML, though I don't think anybody has actually proved it yet.
It's more standards compliant: the specs say that it's the correct MIME type for XHTML.
It can work in conjunction with other apps which require XML.

In reality, I would rather not be bothered with it which is why I asked if anybody had a need for it.

7 (edited by badrad 2005-05-14 00:34)

Re: XHTML Mime type

Hey, I just came to the forums to post about this. You can read up on why XHTML sent as text/html is bad here:

http://www.hixie.ch/advocacy/xhtml

Basically, sending xhtml as text/html makes the browsers interpret it as tag soup anyway.

Reasons XHTML is better:
1. XHTML 1.0 has been a W3C Recommendation since January 2000, as the successor to HTML 4.01. That's over 5 years.
2. It's a simple transition to learn as the Web moves toward increasingly more XML.
3. It's far more flexible than HTML, being accessible to wireless devices, screen readers and other devices for the disabled (which helps by leaps and bounds with meeting accessibility guidelines and the U.S. Section 508 guidelines).
4. XHTML is a cleaner, more logical markup.
5. XHTML is faster to parse (much faster), and thus display.
6. I like the idea of giving the browser what it prefers.
7. It's extensible in lots of fancy ways none of us need.

And you are right, sending pages as application/xhtml+xml to IE causes it to prompt to download, which is why it is best to use content negotiation.

Content negotiation gives browsers what they prefer: old browsers like IE get HTML and new browsers get XHTML. It not only sends the right header, but also converts the content.

See here for a nice thing on content negotiation:
http://www.autisticcuckoo.net/archive.p … egotiation

Delivering documents as XHTML causes lazily written JavaScript to break (due to reliance on innerHTML and document.write rather than DOM methods like createElementNS()) and can cause CSS to be displayed differently.

Anyway, punbb does not work in xhtml mode. The forums function just fine, but the styles are all broken. I don't have the time right now to check out why. I just registered here on the forums to point this out, in hopes it will get fixed. Punbb is claiming XHTML support, but it isn't really smile

Anyway, I don't have too much time right now (which is why I haven't looked through the CSS files myself to find out why they are breaking - something I will attempt later tonight), so I gotta wrap this post up.

Here are some easy steps to add content negotiation to Punbb.

1. In header.php:

After the "Send no-cache headers" add:

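$xhtml = false;
// The Accept header may be missing entirely, so default to an empty string
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
// Serve XHTML only if the client lists application/xhtml+xml and rates it at least as highly as text/html
if (preg_match('/application\/xhtml\+xml(\s*;\s*q=(\d+(\.\d+)?))?/i', $accept, $matches)) {
    $xhtmlQ = isset($matches[2]) ? (float) $matches[2] : 1.0;
    if (preg_match('/text\/html(\s*;\s*q=(\d+(\.\d+)?))?/i', $accept, $matches)) {
        $htmlQ = isset($matches[2]) ? (float) $matches[2] : 1.0;
        $xhtml = ($xhtmlQ >= $htmlQ);
    } else {
        $xhtml = true;
    }
}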
// Here we can sniff the UA and override the negotiated value if we want. While sniffing the UA is generally not a good idea, there are cases where it
// is necessary or actually is a good idea. For example, here I am sniffing for the W3C and WDG validators, because they do not properly declare XHTML support.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stristr($ua, 'W3C_Validator')     ||
    stristr($ua, 'W3C_CSS_Validator') ||
    stristr($ua, 'WDG_Validator')) {
    $xhtml = true;
}

Find this line in header.php:

<link rel="stylesheet" type="text/css" href="style/<?php echo $pun_user['style'].'.css' ?>" />

and change it to:

<link xmlns="http://www.w3.org/1999/xhtml" rel="stylesheet" type="text/css" href="style/<?php echo $pun_user['style'].'.css' ?>" />

2. Create xhtml.php in the include folder in punbb:

<?php
/*  Here we give the browser the XHTML or HTML headers. We also give the correct XML declaration, document type declaration, and language attribute  */
if (isset($xhtml) && $xhtml) {
    $headerstring = 'Content-Type: application/xhtml+xml; charset=' . $lang_common['lang_encoding'];
    header($headerstring);
    // The encoding in the XML declaration has to match the charset sent in the header
    echo '<?xml version="1.0" encoding="' . $lang_common['lang_encoding'] . '"?>', "\n";
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">', "\n";
    echo '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"';
} else {
    $headerstring = 'Content-Type: text/html; charset=' . $lang_common['lang_encoding'];
    header($headerstring);
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">', "\n";
    echo '<html lang="en"';
}
?>

3. Edit the template files (given the stock templates). Take everything before pun_head, remove it, and replace it with:

<pun_include "include/xhtml.php"> dir="<pun_content_direction>">

4. Edit footer.php. Near the end, just before "// Spit out the page", add:

// If page is being delivered as HTML, take the buffer and convert it to HTML
if (!$xhtml) {
    $xml = array('/>', 'xml:lang=', 'xmlns="http://www.w3.org/1999/xhtml"');
    $html = array('>', 'lang=', '');
    $tpl_main = str_replace($xml, $html, $tpl_main);
}

Upload these changed files, and now, in XHTML-capable browsers, the page will be delivered with the correct XHTML headers. For HTML browsers, the page will be converted to HTML and sent with the appropriate headers. (This conversion is all I need; your XHTML app might require additional conversion, although probably not unless you are using XHTML-specific extensions like MathML.)

If your site is like mine, you will see that the styles of punbb are no longer properly applied.

I like to add a query string that lets me force either html or xhtml for testing.  In header.php AFTER the content stuff we added, add:

/*  I want to give a query string option to override this content negotiation. So, let's check for two force queries and set our output to match    */
if (isset($forcexhtml) and $forcexhtml == "true") {
    $xhtml = true;
}
if (isset($forcehtml) and $forcehtml == "true") {
    $xhtml = false;
}

That will give you query strings to force the output either way (if you have register_globals off, you'll have to compensate).
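With register_globals off (the setting was removed entirely in PHP 5.4), the overrides can be read from $_GET instead. A minimal sketch, assuming the same forcexhtml and forcehtml query parameters as above; the function name apply_force_override() is made up for illustration:

```php
<?php
// Hypothetical helper: applies ?forcexhtml=true / ?forcehtml=true on top of
// the negotiated value. $query is normally $_GET.
function apply_force_override(bool $negotiated, array $query): bool
{
    $xhtml = $negotiated;
    if (isset($query['forcexhtml']) && $query['forcexhtml'] === 'true') {
        $xhtml = true;
    }
    // forcehtml is checked last, so it wins if both parameters are given,
    // the same order as the two checks above.
    if (isset($query['forcehtml']) && $query['forcehtml'] === 'true') {
        $xhtml = false;
    }
    return $xhtml;
}
```

In header.php this would be called as $xhtml = apply_force_override($xhtml, $_GET); right after the negotiation code.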

So, anyway I gotta run, and I'll look into this more later, but this is the only thing I've run into with punbb. Hopefully we can fix it so punbb has *real* xhtml support.

8

Re: XHTML Mime type

Thanks but I have it running on a test bed with content negotiation already except I see no reason to change the doctype for IE, just send it text/html.

The reason the stylesheet fails is that it's partly uppercase, plus the imports might be causing difficulty. It really should be sent as an XML stylesheet as well. If you have PunBB running successfully as application/xhtml+xml then it is only because the browsers are not being as strict as they should. It really shouldn't work. The good news is that it is quite easy to fix everything. The real question is whether it is worth the bother, but fortunately that's not my call. Of course, what might tip the balance is if IE7 can accept the correct MIME type, though I somehow doubt that will happen.

BTW: In my view Hickson is wrong.

Re: XHTML Mime type

Paul wrote:

Thanks but I have it running on a test bed with content negotiation already except I see no reason to change the doctype for IE, just send it text/html.

Content NEGOTIATION goes beyond just sending a mime type.

Take the doctype for example. You know that sending xhtml as text/html causes it to be parsed as html, yet you see nothing wrong with still declaring it as xhtml? Because there is something wrong there.

And saying "well, it works" isn't a good response. The crappy output of PhpBB "works", but obviously you and I would agree punbb's is far, far better.

You're giving a browser xhtml, knowing that it is parsed as html, and still calling it xhtml.

Proper content negotiation involves more than that. First, it involves determining what a browser PREFERS based on its requests. Not just saying "oh i know i should give browser x this and browser y that", but actually obeying the Accept header.

Second, it involves actually giving the browser what it prefers. Just sending the mime type isn't enough; you must send your document as html, not xhtml. That means taking xhtml (ex: <br />) and making it html (ex: <br>). Sending the "/>" to a browser that is parsing as html is wrong, because it is NOT VALID html. It should be interpreted as "tag terminator followed by a greater-than sign", but most browsers ignore the / and treat it as a >. That doesn't mean we should keep doing it the wrong way.

Try validating an xhtml page as html. Doesn't work, does it?

Proper content negotiation involves giving a browser what it prefers based on its accept header, and actually transforming that content to be the proper media.
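To make the "obey the Accept header" part concrete, here is a rough sketch of reading q-values properly instead of pattern-matching the whole header. The function names accept_q() and prefers_xhtml() are made up for illustration and are not part of PunBB; the rule that a missing q parameter means q=1 comes from the HTTP spec:

```php
<?php
// Hypothetical helper: returns the q-value a client assigns to $type in an
// Accept header, or null when the type is not listed at all.
function accept_q(string $accept, string $type): ?float
{
    foreach (explode(',', $accept) as $range) {
        $parts = array_map('trim', explode(';', $range));
        if (strcasecmp($parts[0], $type) !== 0) {
            continue;
        }
        $q = 1.0; // a missing q parameter means q=1
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        return $q;
    }
    return null;
}

// Hypothetical helper: true when the client lists application/xhtml+xml with
// a non-zero q that is at least as high as the one it gives text/html.
function prefers_xhtml(string $accept): bool
{
    $xhtmlQ = accept_q($accept, 'application/xhtml+xml');
    if ($xhtmlQ === null || $xhtmlQ <= 0.0) {
        return false;
    }
    $htmlQ = accept_q($accept, 'text/html');
    return $htmlQ === null || $xhtmlQ >= $htmlQ;
}
```

A client that only sends */* (as IE does) never lists application/xhtml+xml explicitly, so it gets HTML. Wildcards are deliberately ignored here, which is a design choice of this sketch rather than something HTTP requires.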

Paul wrote:

The reason the stylesheet fails is that it's partly uppercase, plus the imports might be causing difficulty. It really should be sent as an XML stylesheet as well.

No, imports work fine in xhtml. They have the added benefit of not being supported in old browsers (IE and Netscape 4.x), which are browsers that are not likely to understand even moderately complex css anyway.

As far as "should be sent" as an XML stylesheet, are you sure about that? Does it say that anywhere in the CSS or XHTML specs? I was led to believe that adding the XML namespace to the LINK element was perfectly OK. This may be wrong though. So, which is it?

As far as uppercase, I just now tried taking a style and converting it entirely to lowercase. That fixed a ton of the display issues, but not all of them; it's still unusable.

Paul wrote:

If you have PunBB running successfully as application/xhtml+xml then it is only because the browsers are not being as strict as they should. It really shouldn't work. The good news is that it is quite easy to fix everything.

When I said "The forums function just fine" in my post I meant that they FUNCTION fine; the style/layout is not correct. Which you knew smile

So is this gonna be tackled then?

Paul wrote:

BTW: In my view Hickson is wrong.

I won't say sending XHTML as HTML is bad, but I think he has a point. All over the internet are people writing what they think is xhtml, and being happy that it works because some validator says it does. But when the day comes that it is actually sent AS xhtml, a strict parser is gonna spit errors at them, and they'll blame xhtml. That's what Hickson was getting at.

The point is, if we are gonna send it as html, then let's write it as html and declare it as html.

If we are gonna send it as xhtml, then let's write it as xhtml and declare it as xhtml.

Anything in between is not correct.

Are you a punbb developer? What happens here? What route is punbb going to take? Switch to html? Obviously using straight XHTML is out (thanks again IE). Or use content negotiation to have the pages be written in xhtml and delivered as xhtml to UAs that prefer it and HTML to UAs that prefer that?

With the above changes I've posted, I've got a great little content negotiation system going. Until I get the xhtml displaying right though, I've got the output forced to html right now.

At the very least, we should get punbb's display to work the same in xhtml as in html. If you don't want to add in content negotiation and keep sending xhtml as html, then that's your priority, but we should at least get it displaying right for those who set it up properly. Punbb has very nice CSS/XHTML output. To get this far, 95%, and say "screw it, it's good enough" would seem silly.

10

Re: XHTML Mime type

Oops, I hit reply hours ago and so I didn't pick up your edit.

Paul wrote:

The real question is whether it is worth the bother, but fortunately that's not my call.

Whose call is it? Let's get them in this discussion!

Paul wrote:

Of course, what might tip the balance is if IE7 can accept the correct mime type though I somehow doubt that will happen.

I highly doubt that as well. I don't expect much at all in the realm of standards from them. All they've done so far is fix a few css bugs, ones that we had already figured out btw.

And one more thing: there is a reason XHTML 1.1 and the 2.0 draft don't have the HTML backwards compatibility clauses that xhtml 1.0 had in its transitional spec. That is because, in retrospect, it was a bad idea that caused this mess xhtml is in now.

And one last thing: after rereading my last post, this is all meant to be positive feedback, people smile I LOVE punbb, and thank everyone who has worked on it.

11

Re: XHTML Mime type

Found the xml stylesheet stuff. Gonna see if it makes a difference.

http://www.w3.org/TR/xml-stylesheet/
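For anyone following along, the xml-stylesheet route means emitting a processing instruction in the prolog (after the XML declaration, before the root element) rather than relying only on the link element. A rough sketch, assuming the same style/<name>.css layout PunBB already uses; the function name and the Oxygen style name are only examples:

```php
<?php
// Hypothetical helper: builds an xml-stylesheet processing instruction for
// a given stylesheet URL.
function xml_stylesheet_pi(string $href): string
{
    // Pseudo-attribute values are escaped the same way attribute values are.
    $safe = htmlspecialchars($href, ENT_QUOTES, 'UTF-8');
    return '<?xml-stylesheet href="' . $safe . '" type="text/css"?>' . "\n";
}

echo xml_stylesheet_pi('style/Oxygen.css');
```

This would only be sent on the application/xhtml+xml branch; the text/html branch keeps the ordinary link element.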

12

Re: XHTML Mime type

Okay, I fully switched over to using xml-stylesheets; it still didn't fix it. Making the css lowercase helps a lot, but the pages still flow wrong. Stuff that should be to the left is not.

I won't post my updated content negotiation code unless I get asked for it. I don't mind though, if it's wanted.

So let's get this working in xhtml mode!

Re: XHTML Mime type

I don't think there's much point, since like 80%+ of people won't be able to use it hmm

14

Re: XHTML Mime type

badrad: PunBB is developed by Rickard (see copyright). I write the markup and css but Rickard decides what goes in and what doesn't.

I just posted the question because I wanted to see what level of interest, if any, there was, in part because I saw a similar discussion somewhere regarding TextPattern. There is no possibility of going back to html. Neither is there any possibility of content negotiation (unless I'm told different). The notion I was toying with was whether the markup and css should be put into a state where it could be served as xhtml. That way anybody who needed or wanted to serve it as xhtml would still have to mod PunBB, but they wouldn't have the tedious job of having to sort out a lot of error-generating markup and css as well. Put another way, going from xhtml 1.0 Strict to xhtml 1.0 Really Bloody Strict.

15

Re: XHTML Mime type

Paul wrote:

I write the markup and css.

Cool. Well, well done! Punbb has very slick markup. When I was looking at message board software I was blown away when I found this.

Paul wrote:

I just posted the question because I wanted to see what level of interest, if any, there was, in part because I saw a similar discussion somewhere regarding TextPattern. There is no possibility of going back to html. Neither is there any possibility of content negotiation (unless I'm told different). The notion I was toying with was whether the markup and css should be put into a state where it could be served as xhtml. That way anybody who needed or wanted to serve it as xhtml would still have to mod PunBB, but they wouldn't have the tedious job of having to sort out a lot of error-generating markup and css as well. Put another way, going from xhtml 1.0 Strict to xhtml 1.0 Really Bloody Strict.

Sounds good. I pretty much figured that was the situation.

Well, count my vote on getting the xhtml working under very strict parsing smile. Have you asked Rickard about it?

16

Re: XHTML Mime type

badrad wrote:

Well, count my vote on getting the xhtml working under very strict parsing smile. Have you asked Rickard about it?

Not yet. I wanted to see how much or how little work was involved and whether there were any knock-on effects. I'm tidying up the markup at the moment (tidying up = removing stuff) and some things will get done as part of that anyway. It may well be the case that the changes needed to the final markup for 1.3 will be very trivial. In many ways it's all part of the standard design process. First make it work, then make it work better and finally make it work better with less and cleaner markup.

17

Re: XHTML Mime type

Alright. Thanks again for your work Paul.

Re: XHTML Mime type

Even though I am the lead developer, I know very little about markup, style sheets and W3C standards in general. I am willing to go either way based on your expertise.

I have a question. Does sending stuff as application/xhtml+xml affect what happens when a page is not well formed? I'm thinking primarily of all the people that integrate forum setups into their existing website designs. Also, we must not forget that PunBB is currently not UTF-8 safe. You can still break a page's "well-formedness" by posting characters that are not part of that page's content type.

I'm seriously considering going all UTF-8 in PunBB 1.3 to avoid the content type soup. This will lead to some problems for people running MySQL 4 and earlier and it will slow some things down a bit, but I really have no idea how much.

"Programming is like sex: one mistake and you have to support it for the rest of your life."

19

Re: XHTML Mime type

There is no way I would suggest serving PunBB as application/xhtml+xml for the reason you gave. A non-valid page served as application/xhtml+xml will simply fail to output completely though you might get a polite xml error message.

What I was contemplating is auditing the markup and css so that it is capable of being served as application/xhtml+xml if somebody wants to mod it that way, without them having to trawl through all the php files sorting out trivial markup glitches. For example, xml only predefines 5 character entities (&amp;, &lt;, &gt;, &quot; and &apos;), which don't include &nbsp;, so these would be replaced with the numeric references (or the literal character if you go the UTF-8 route). Things like html comments would also be removed and care taken to ensure empty elements were not generated. As far as a user serving PunBB as text/html is concerned it should make no difference to them whatsoever. If somebody wants to serve it as application/xhtml+xml then they will certainly know what they are doing as far as site integration is concerned. The best way to think about it is as a tighter form of quality control.
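As an illustration of the entity part of that audit, here is a small sketch that rewrites named HTML entities as numeric references while leaving the five XML-safe ones alone. Both function names are invented for this example, and it assumes every entity decodes to a single character, which holds for the HTML 4.01 set:

```php
<?php
// Hypothetical helper: decodes one UTF-8 encoded character into its code point.
function utf8_code_point(string $char): int
{
    $b = array_values(unpack('C*', $char));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: turns e.g. &nbsp; into &#160; so the markup no longer
// depends on the HTML entity set being known to the parser.
function named_to_numeric_entities(string $markup): string
{
    $xmlSafe = ['amp', 'lt', 'gt', 'quot', 'apos'];
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlSafe) {
            if (in_array($m[1], $xmlSafe, true)) {
                return $m[0]; // predefined in XML, keep as-is
            }
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                return $m[0]; // not a known HTML 4.01 entity, leave untouched
            }
            return '&#' . utf8_code_point($char) . ';';
        },
        $markup
    );
}

echo named_to_numeric_entities('Posted by&nbsp;Paul &amp; Rickard &copy; 2005'), "\n";
```

In practice the replacement would be done once in the source files rather than at run time, so it costs nothing per page view.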

As regards UTF-8, my only concern is that PunBB's major selling point is its speed and I'm sure many people who use it accept the absence of some features only because they think it's a good trade-off to get a blisteringly fast forum. Anything that lessens that speed advantage really needs to be thought about very carefully.

20

Re: XHTML Mime type

Rickard wrote:

I have a question. Does sending stuff as application/xhtml+xml affect what happens when a page is not well formed?

A (usually ugly) XML error message is displayed.
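If you want to see what a strict parser would complain about before a browser does, something along these lines works as a rough check. It is only a sketch: the function name is invented, and it assumes PHP's DOM/libxml extension is loaded:

```php
<?php
// Hypothetical helper: returns a list of parser error messages, empty when
// the markup is well-formed XML.
function xml_wellformedness_errors(string $markup): array
{
    if ($markup === '') {
        return ['empty document'];
    }
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok && !$messages) {
        $messages[] = 'document could not be parsed';
    }
    return $messages;
}

foreach (xml_wellformedness_errors('<p><b>unclosed bold</p>') as $message) {
    echo $message, "\n";
}
```

Running this over the page buffer on a test install would flag a broken post or a badly integrated site template without taking the page down for visitors.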

Paul wrote:

There is no way I would suggest serving PunBB as application/xhtml+xml for the reason you gave. A non-valid page served as application/xhtml+xml will simply fail to output completely though you might get a polite xml error message.

As much as I hate to say it, I agree here. By DEFAULT punbb should not do content negotiation, because when people who think they are writing xhtml (and aren't really) install punbb and their pages break, they'll blame punbb.

What I suggest is getting punbb's markup to be capable of being served as xhtml+xml. I also don't see a problem with adding content negotiation to punbb, but defaulting it to off.

Paul wrote:

What I was contemplating is auditing the markup and css so that it is capable of being served as application/xhtml+xml...

Speak of the devil, just what I was saying smile. Some of the UTF-8 pages I link to below have some cool functions for converting character references and entities to UTF-8.

Rickard wrote:

I'm seriously considering going all UTF-8 in PunBB 1.3 to avoid the content type soup. This will lead to some problems for people running MySQL 4 and earlier and it will slow some things down a bit, but I really have no idea how much.

I am all for full UTF-8 support, I think it would be a great idea.

How were you planning to do it? Why do you think it will have a noticeable speed impact? There is a lot of great info out on the web about converting stuff to UTF-8:
http://annevankesteren.nl/archives/2004/06/utf-8
http://annevankesteren.nl/2005/05-character-references
http://annevankesteren.nl/archives/2005 … mment-3988

Anyway, the only real pain is the MySQL 4 thing. Many people with shared hosting are probably still on 4.0.x MySQL installations. If you switched to full UTF-8 support, what would happen to those people with older 4.0.x MySQL set ups?

Paul wrote:

As regards UTF-8, my only concern is that PunBB's major selling point is its speed and I'm sure many people who use it accept the absence of some features only because they think it's a good trade-off to get a blisteringly fast forum. Anything that lessens that speed advantage really needs to be thought about very carefully.

I agree that anything that lessens the speed needs to be thought about carefully, to a point though. Yes obviously I am willing to not have a PM system (isn't that why we have email?) for a speed increase, but here we aren't talking about a feature, we are talking about the quality (and usefulness) of the code output.

21

Re: XHTML Mime type

If you guys make these changes, do you think they'll make it in 1.3?

I ask simply because if they will I'm not gonna bother with some CSS styling I was gonna do.

22

Re: XHTML Mime type

My aim is to get the markup as efficient and cruft-free as possible for 1.3 and then freeze it for a good long time so the stylers can be sure their work isn't going to be rendered useless in the near future. This should be perfectly possible: once the markup is reduced to strictly what is necessary, there isn't really anywhere else to go anyway, save for a major design change.

Re: XHTML Mime type

badrad wrote:

How were you planning to do it? Why do you think it will have a noticeable speed impact?

Well, the main issue is to make all string functions UTF-8 aware. As it is now, PunBB utilizes a bunch of PHP's string functions, for example strlen() to count the number of characters in a string. This function is not UTF-8 aware: it counts bytes, so a UTF-8 string containing 5 non-US-ASCII characters that each take two bytes is considered to be 10 characters long. A possible fix for strlen() is to first run utf8_decode() on the string and then run strlen(). utf8_decode() simply converts the UTF-8 string to ISO-8859-1 (and replaces any characters it cannot find in ISO-8859-1 with question marks) so that the length of the string can be checked. A different approach is to use a UTF-8 aware regular expression to count the number of characters. There are lots of alternatives. Some functions are a bit trickier to fix. substr() is one of them.
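To illustrate the regular expression approach, here is a minimal sketch. The names utf8_strlen() and utf8_substr() are invented for this example, and the substr variant only handles non-negative offsets:

```php
<?php
// Hypothetical helper: counts characters (code points) rather than bytes.
// Returns false if $s is not valid UTF-8.
function utf8_strlen(string $s)
{
    return preg_match_all('/./us', $s);
}

// Hypothetical helper: UTF-8 aware substr() for non-negative $start/$length.
// PCRE caps a {n} quantifier at 65535, so this only suits short strings.
function utf8_substr(string $s, int $start, int $length): string
{
    $pattern = '/^.{' . $start . '}(.{0,' . $length . '})/us';
    return preg_match($pattern, $s, $m) === 1 ? $m[1] : '';
}

$s = "\xC3\xA5\xC3\xA4\xC3\xB6\xC3\xA9\xC3\xBC"; // five two-byte characters
echo strlen($s), ' bytes, ', utf8_strlen($s), " characters\n";
```

The /u modifier makes PCRE treat the subject as UTF-8 and /s lets the dot match newlines too, so every code point is counted exactly once.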

An alternative I did not mention above is to make PunBB rely on the PHP extension mbstring. mbstring contains unicode safe versions of pretty much all PHP string functions. The problem with mbstring is that it is not available in all PHP installs.
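One way around the availability problem is to use mbstring when it is there and fall back otherwise. A rough sketch with an invented function name, not a proposal for the actual PunBB API:

```php
<?php
// Hypothetical wrapper: prefers mbstring when the extension is loaded and
// falls back to a UTF-8 aware regular expression otherwise.
function pun_strlen_utf8(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // preg_match_all() returns false on invalid UTF-8; the cast turns that into 0.
    return (int) preg_match_all('/./us', $s);
}

echo pun_strlen_utf8("na\xC3\xAFve"), "\n";
```

The function_exists() check is cheap, but it could also be done once at startup and the result cached if this ends up being called per post.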

I mentioned earlier that speed might be an issue. I really have no idea to what degree the speed would be affected. I think that as long as one is aware of the potential speed caveats, it shouldn't be a problem. I mean, if strlen() is run on the username when registering to make sure it is between 2 and 25 characters, it doesn't matter if the UTF-8 version is 20 times slower. On the other hand, if some string function is called once or several times per post in viewtopic, there are reasons for concern.

badrad wrote:

Anyway, the only real pain is the MySQL 4 thing. Many people with shared hosting are probably still on 4.0.x MySQL installations. If you switched to full UTF-8 support, what would happen to those people with older 4.0.x MySQL set ups?

The problem with going from ISO-8859-1 (or whatever) to UTF-8 is first of all that all data in the database needs to be converted to UTF-8. I believe the only way to do this is to create a script that fetches all the data in the database, runs utf8_encode() on it and then inserts it back in. This sounds a lot easier than it is. Maybe MySQL 4.1 and later has an easier way of migrating to a new character set, but I'm not sure. Anyone? The second problem is that since MySQL 4.0 and previous are not UTF-8 aware, sorting and string functions will not work as expected. One side-effect of this is that the user list will not be sorted properly. Another side-effect is that some data might get truncated. The latter is not something I am all that concerned about though. Most of the string fields in the PunBB database structure are a lot larger than what they need to be (e.g. username is 200 chars even though usernames can only be 25 chars).
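The per-row part of such a conversion script could look roughly like this. It is a sketch only: the function name and the sample row are made up, the fetch/update loop around it is left out, and it uses iconv() (assumed to be available, as it is by default) where the post mentions utf8_encode():

```php
<?php
// Hypothetical helper: converts every string field of one database row from
// ISO-8859-1 to UTF-8 and leaves other values untouched.
function row_to_utf8(array $row): array
{
    foreach ($row as $field => $value) {
        if (is_string($value)) {
            $row[$field] = iconv('ISO-8859-1', 'UTF-8', $value);
        }
    }
    return $row;
}

$row = ['id' => 7, 'username' => "Ren\xE9e"]; // Latin-1 bytes
$converted = row_to_utf8($row);
echo strlen($row['username']), ' -> ', strlen($converted['username']), " bytes\n";
```

The byte count grows for every non-ASCII character, which is where the truncation concern comes from. Running the conversion twice would double-encode the data, so a real script needs some way to mark rows as done.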

If anything, I think this would be a feature for 1.3. When that happens, I have no idea.

"Programming is like sex: one mistake and you have to support it for the rest of your life."

24

Re: XHTML Mime type

Well, sounds good to me. I think moving to UTF-8, if reasonably doable, is a good idea, and I think making the output xhtml+xml is a VERY good idea.

Thanks again for your hard work you two. Oh, and I like both your signatures.

Anyway... I almost hate to ask, I know how working on projects like this is, but could you perhaps give me an ultra rough guess on the release date for 1.3? Literally choosing between these two would be nice for me to know at this point:

A. Punbb 1.3 might be out in less than two months.
B. Punbb 1.3 will probably take more than that.

Because if it's going to be out in a few months or less, I'm not gonna go all out customizing my current install. You don't have to answer though smile

25

Re: XHTML Mime type

I would have said B (if I'm wrong a panic attack will shortly ensue)