Convert an HTML Site to XHTML
Because I write the Web Design/HTML site for About, I am often asked why my source code does not conform to Web standards. There are many problems with it, ranging from the minor (the code isn't standards compliant) to the more severe (missing alt text and no quotes around many attributes). The reason is that I only write the articles for About; I don't control any of the surrounding HTML.
Since About uses a content management system, I don't even have a lot of control over the HTML that is inside my articles once they are published.
I had a chance to speak with one of the About developers and I asked why they insisted on writing what is, in my opinion, really bad HTML code. They told me that they do it for two reasons:
1. Bandwidth costs
With over 700 sites on About receiving millions of pageviews a day, even 1KB of extra characters on a page can result in huge transfer costs. By cutting out things like alt text and attribute quotes, they can save a lot of money.
2. Compatibility
The audience for About Web pages is huge and uses many different browsers, so moving to more modern variants like XHTML 1.0 and CSS positioning would make the pages harder to view in older browsers.
While I can't do anything about the second point, it is possible to streamline the code that is sent across the wire by using XHTML and CSS to create a standards-compliant Web page.
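To put the bandwidth argument in rough numbers, here is a back-of-the-envelope sketch. The developers only said "millions of pageviews a day", so the 5,000,000 figure below is a hypothetical placeholder, not About's real traffic, and the function name is made up for illustration.

```python
KIB = 1024  # bytes in one kilobyte, counted as 1024


def extra_transfer_bytes(extra_bytes_per_page, pageviews_per_day, days):
    """Bytes sent over `days` days if every page view carries `extra_bytes_per_page` more."""
    return extra_bytes_per_page * pageviews_per_day * days


# Hypothetical traffic figure; the article only says "millions of pageviews a day".
pageviews_per_day = 5_000_000
monthly = extra_transfer_bytes(1 * KIB, pageviews_per_day, days=30)
print(f"About {monthly / KIB**3:.0f} GiB of extra transfer per month")
```

With these made-up inputs, a single extra kilobyte per page works out to roughly 143 GiB of additional transfer every 30 days, and the total scales linearly with real traffic.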
So I Decided to Make Web Design/HTML Standards Compliant
First I took a snapshot of the front page so I would have a copy to work from. This way, if About launched a redesign in the middle of my changes, I would still have a basis for comparison. About uses a lot of JavaScript for its ads, and that JavaScript is often built into the layout, look, and feel of the page, so to get a clear picture of the underlying markup I had to remove all the JavaScript.
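The article doesn't say how the JavaScript was removed. As a rough illustration, here is a minimal Python sketch using the standard library's html.parser that drops script elements and on* event-handler attributes (onclick, onload and so on). The names ScriptStripper and strip_scripts are hypothetical, it assumes every attribute starting with "on" is an event handler, and it is not a security sanitizer.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML without <script> elements or on* event-handler attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities exactly as written
        self.out = []
        self.in_script = False

    def _emit_tag(self, tag, attrs, self_closing):
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        if len(kept) == len(attrs):
            # Nothing removed: copy the original start tag verbatim.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag] + [k if v is None else f'{k}="{escape(v)}"' for k, v in kept]
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            self._emit_tag(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        # <tag ... /> form; a self-closed <script /> has no body to skip.
        if tag != "script" and not self.in_script:
            self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append(f"</{tag}>")  # note: end tags come back lowercased

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html_text):
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Anything the scripts wrote into the page at load time (such as ad markup) simply disappears, which matches the goal of seeing the page without its JavaScript.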
In order to start fresh with the page, I then converted it to valid XHTML. I did this in the following steps:
1. Converted all the tags to lowercase (I used Homesite's code sweeper for this)
2. Put double quotes around all attributes (again code sweeper)
3. Added a DOCTYPE and the XHTML namespace
4. Searched for all the img tags and added alt text and a trailing slash to close the element
5. Searched for all the link tags and added a trailing slash
6. Searched for all li and option tags and added the closing tags
7. Removed all font tags and other non-XHTML tags
8. Searched for special characters and converted them to character entities (such as &amp; for &)
9. When I thought it was ready, I sent it through the W3C Validator. I got about 600 errors, but I went through them one at a time and corrected them. For more help, see my article on Using an HTML Validator.
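For readers without HomeSite, several of the steps above can be approximated in Python with the standard library's html.parser. This is a minimal sketch, not the process used here. Assumptions: the XHTML 1.0 Transitional DOCTYPE (the article doesn't say which DTD was used), an empty alt="" as a placeholder where alt text is missing, and font as the only tag being dropped. It does not add missing li or option closing tags (step 6), and is_well_formed checks XML well-formedness only, so it is no substitute for the W3C Validator. All names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: XHTML 1.0 Transitional; the article does not say which DTD was used.
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
XHTML_NS = "http://www.w3.org/1999/xhtml"
# HTML elements that never have content and must be written as <tag />.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
DROPPED = {"font"}  # presentational tags removed, keeping their content


def _ascii(text):
    """Turn non-ASCII characters into numeric character references (e.g. &#169;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


class XhtmlConverter(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True: &copy; arrives as a real character
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names (step 1).
        if tag in DROPPED:
            return
        attrs = [(k, k if v is None else v) for k, v in attrs]  # checked -> checked="checked"
        if tag == "html" and not any(k == "xmlns" for k, _ in attrs):
            attrs.append(("xmlns", XHTML_NS))
        if tag == "img" and not any(k == "alt" for k, _ in attrs):
            attrs.append(("alt", ""))  # placeholder only; real alt text must be written by hand
        quoted = "".join(f' {k}="{_ascii(escape(v))}"' for k, v in attrs)  # step 2
        self.out.append(f"<{tag}{quoted}{' /' if tag in VOID else ''}>")  # steps 4 and 5

    def handle_endtag(self, tag):
        if tag not in VOID and tag not in DROPPED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(_ascii(escape(data, quote=False)))  # step 8

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        pass  # drop any old HTML DOCTYPE; the XHTML one is prepended below


def to_xhtml(html_text):
    parser = XhtmlConverter()
    parser.feed(html_text)
    parser.close()
    return DOCTYPE + "\n" + "".join(parser.out)


def is_well_formed(xhtml_text):
    """Rough pre-check for step 9: XML well-formedness only, not validity against the DTD."""
    try:
        ET.fromstring(xhtml_text)
        return True
    except ET.ParseError:
        return False
```

For example, to_xhtml('<HTML><BODY><FONT face=Arial>Tom &amp; Jerry</FONT><BR><IMG SRC=logo.gif></BODY></HTML>') lowercases and quotes everything, drops the font tag, and writes br and img as self-closed elements. Running is_well_formed first catches unclosed or mismatched tags cheaply before the remaining errors go to the validator.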
Once I had valid XHTML, I needed to divide the page into logical blocks so that they could be moved to their correct locations on the page. The result was a page with a "semantic layout": essentially every page element listed one after another. It's not very pretty, because all the layout and decorative tags have been removed.
Then the Fun Starts
Once I had the page laid out semantically, I could start positioning its blocks to match the About design. I created a separate CSS file just for the layout. Keeping layout in its own file means that if you want to change the fonts, colors, or other design aspects of a page, you can do so easily without worrying about changing where the elements sit on the page.
As you can see, the page is starting to look like About Web Design/HTML, just without the colors or fonts.
Giving the page the correct fonts and colors took more CSS, in a second CSS file. This markup CSS tells the page how it should look and act, and the result looks awfully close to the original About Web page. (There are at least two errors that I know of, not counting the browser-specific links and the ads, both of which are built with JavaScript.)
This page is only 19KB, compared to 25KB for the original page and 22KB for the page without JavaScript. Yes, there are two separate CSS files (5KB and 2KB), but these CSS files would be shared by every site on About, so a browser could download them once and reuse its cached copies on later pages.
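The size figures above can be turned into a quick break-even check for a single visitor. A minimal sketch, assuming the two stylesheets (7KB combined) are downloaded on the first page view and served from the browser cache afterwards, and that every page is the same size as the measured front page; the function names are hypothetical.

```python
import math


def break_even_views(old_page_kb, new_page_kb, shared_css_kb):
    """Page views a visitor needs before the CSS version has transferred no more data.

    Assumes the shared CSS files are downloaded once and then served from the
    browser cache, so they cost shared_css_kb only on the first view.
    """
    saving_per_view = old_page_kb - new_page_kb
    if saving_per_view <= 0:
        raise ValueError("the new page must be smaller than the old one")
    # Smallest n >= 1 with new_page_kb * n + shared_css_kb <= old_page_kb * n.
    return max(1, math.ceil(shared_css_kb / saving_per_view))


def total_kb(page_kb, views, shared_css_kb=0):
    """Total data sent for `views` page views, plus any one-time CSS download."""
    return page_kb * views + shared_css_kb


CSS_KB = 5 + 2  # the two stylesheets
print(break_even_views(25, 19, CSS_KB))  # vs. the original page: prints 2
print(break_even_views(22, 19, CSS_KB))  # vs. the page without JavaScript: prints 3
print(total_kb(25, 10), total_kb(19, 10, CSS_KB))  # prints 250 197
```

Under these assumptions, a visitor comes out ahead of the original page from the second page view, and ahead of the JavaScript-free page from the third; over ten views that is 197KB instead of 250KB. Because the stylesheets are shared across all About sites, the one-time cost is spread even further in practice.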
And what's really neat is that if About wants to change the look of the site, it can be done by changing one CSS file. I created this Halloween skin in about 15 minutes, changing only the header graphic and the markup.css file.