Just Say No to Using Word for Web Publishing

One of the surest ways to hurt the performance (and possibly the readability) of your website is to use a program such as Microsoft Word to create your web pages. While MS Word is useful for creating and editing documents that will be opened on the desktop, it generates horrific HTML code. But before we get into just how bad it is, I would like to start with a very quick overview of what HTML is and why cleanly formatted HTML code is so important.

Before the web came along, there was a standard way of marking up documents called Standard Generalized Markup Language (SGML). As the web was in the process of being “born”, it became apparent that a standard markup language was needed so that documents could be exchanged between users. You can thank CERN for the genesis of what would become HyperText Markup Language (HTML), which is what allows you to use web sites such as Facebook, Twitter, YouTube, and Google. Many of the tags in the very first draft of HTML still existed as of HTML 4.

HTML at its most basic is a standard way of creating content that web browsers (such as Internet Explorer, Chrome, etc.) can understand. Over the years, we’ve gone from the first HTML specifications to the current pre-release HTML5. XHTML is also in use. Without going into a detailed discussion of the difference between SGML and XML document parsing, XHTML is a stricter version of HTML: you must close every tag, certain characters (such as & and <) have to be written as named or numeric character references, and tag names are case-sensitive and must be lowercase.
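To make that strictness concrete, here is a small sketch in Python. It uses the standard library's generic XML parser as a stand-in for an XHTML parser, which is an assumption made for illustration: it only checks that markup is well-formed XML, not that it is valid XHTML. The is_well_formed helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


samples = {
    # Tolerated by HTML parsers, but <br> is never closed.
    "unclosed tag": "<p>Hello<br>World</p>",
    # Same content with the empty element closed.
    "closed tag": "<p>Hello<br />World</p>",
    # A bare ampersand is not allowed in XML.
    "bare ampersand": "<p>Salt & pepper</p>",
    # The ampersand written as a character reference.
    "escaped ampersand": "<p>Salt &amp; pepper</p>",
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

The unclosed tag and the bare ampersand are rejected, while their corrected forms are accepted. A browser parsing ordinary HTML would quietly accept all four.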

Well-formed, concise HTML matters not only for performance, but also for readability and maintainability. With regard to performance, every browser that reaches your page has to “read” the document you create with Word and parse every line. This may not seem like a big deal, but it will be as your website grows and generates more traffic (more on that later).

Secondly, you need to ensure that the document you create can be read not only by Internet Explorer, but by other web browsers as well. Microsoft Word generates HTML code with Microsoft-specific extensions, so you risk that the web page (or parts of its content) may not display correctly in non-IE browsers such as Firefox, Chrome, and Safari, limiting the audience of your content from the very start.
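A rough way to spot Word-specific markup in a saved page is to search it for a few telltale strings. The Python sketch below does that. The marker list is drawn from the Word 2010 output excerpted later in this post and is not exhaustive, and the function name find_word_markup is made up for illustration.

```python
import re

# Telltale strings taken from the Word 2010 output shown in this post.
# Illustrative only; Word emits many more Microsoft-specific constructs.
WORD_MARKERS = {
    "office namespace": re.compile(r"urn:schemas-microsoft-com", re.IGNORECASE),
    "mso conditional comment": re.compile(r"\[if\s+gte\s+mso", re.IGNORECASE),
    "Mso style class": re.compile(r"\bMso[A-Z]\w*"),
    "Word generator tag": re.compile(r"Microsoft Word \d+"),
}


def find_word_markup(html: str) -> dict:
    """Count how often each marker appears; markers with no hits are omitted."""
    counts = {}
    for label, pattern in WORD_MARKERS.items():
        hits = len(pattern.findall(html))
        if hits:
            counts[label] = hits
    return counts


clean = "<html><head><title>Hello World</title></head><body>Hello World.</body></html>"
wordy = '<meta name=Generator content="Microsoft Word 14"><p class=MsoNormal>Hello World</p>'

print(find_word_markup(clean))
print(find_word_markup(wordy))
```

An empty result does not prove a page is clean; it only means none of these particular markers turned up.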

Finally, any content you create should be easy for you to maintain. If you can’t easily make a change (say, because you have to wade through several thousand lines of HTML code), it will cost you time and money to change even a single sentence.

Don’t believe me that Word is bad for HTML code?  Let’s get to an example, and provide proof of just how poorly Word works for this purpose.  I’m going to use the “Hello World” that most programmers use for their very first programming experience.

At the most basic, an HTML document for “Hello World” should look like:

<html>
<head>
<title>Hello World</title>
</head>
<body>
Hello World.
</body>
</html>

Saved from Microsoft Word 2010 as “unfiltered” HTML, the same page becomes a 424-line, 21 KB file (I’m not going to paste the entire code here). If you have Word 2010, you can see this for yourself pretty easily. To give you an idea of what it generates, here is an excerpt from the first 30 lines or so (“Hello World” doesn’t show up until line 421):

<html xmlns:v="urn:schemas-microsoft-com:vml"
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 14">
<meta name=Originator content="Microsoft Word 14">
<link rel=File-List href="Hello%20World%20(unfiltered)_files/filelist.xml">
<!--[if gte mso 9]><xml>
<o:Author>Michael D. Viron</o:Author>
<o:LastAuthor>Michael D. Viron</o:LastAuthor>

Word 2010 “filtered” does a better job, but it still produces a 39-line, 860-byte file. An excerpt:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 14 (filtered)">
/* Font Definitions */
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
<body lang=EN-US>
<p>Hello World</p>

Compare that to the 8 lines in the most basic format (saved as a file, it’s 80 bytes). So now that we’ve established that Word generates poor HTML code, why should you care? Imagine your website traffic grows to 1,000 hits per month (easily doable): your visitors will use 20.5 MB of bandwidth for just a “Hello World” page (vs. the 0.08 MB from the well-formed code). As your traffic grows, this becomes a much larger problem, because many web hosts include only so much bandwidth per month on their plans, and that allowance covers your uploads and downloads of files from your website. Many plans also limit your available disk space. While this is less of a problem now, wouldn’t you rather not have to worry that your content is eating up valuable space that could be put to other uses?
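The arithmetic behind those bandwidth figures is simple enough to sketch in a few lines of Python. The file sizes are the ones quoted in this post. The function name is made up for illustration, and the sketch assumes 1 MB = 1,024 × 1,024 bytes and that every hit downloads the whole file once, with no caching or compression.

```python
def monthly_bandwidth_mb(file_size_bytes: int, hits_per_month: int) -> float:
    """Bandwidth in MB for serving one file, assuming each hit is a full download."""
    return file_size_bytes * hits_per_month / (1024 * 1024)


pages = {
    "Word 2010 unfiltered": 21 * 1024,  # the 21 KB file
    "Word 2010 filtered": 860,          # the 860-byte file
    "hand-written": 80,                 # the 80-byte file
}

for label, size in pages.items():
    usage = monthly_bandwidth_mb(size, hits_per_month=1000)
    print(f"{label}: {usage:.2f} MB per month")
```

Changing hits_per_month shows how the gap scales: the ratio between the unfiltered and hand-written pages stays the same, but the absolute cost grows with every visitor.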

While we’ll look at better options for writing HTML in the next post, in the meantime, please don’t use Microsoft Word to publish web pages. Or if you do, please use the “filtered” option if it is available in your version of Word. You’ll save yourself and your staff a lot of headaches.
