How To Clean Up HTML Pages Created by Microsoft Word

Do you use Microsoft Word to create HTML pages for your website? If you do, you are probably costing yourself money in bandwidth charges. This is because Microsoft, in its wisdom, has decided that what your HTML document “needs” is a whole bunch of extra “stuff” added to the final code. We’re talking stuff like your name and your company; heck, anything that’s listed in the “Properties” dialog in Microsoft Word will show up in your HTML document.

Microsoft Word also enlarges HTML pages unnecessarily by adding tags that let Word reopen and edit the HTML document later. That is useful if you plan to do so, but it’s quite unnecessary for a page destined to be read by web browsers.

How does this cost you money? Every web host I’ve ever encountered charges a monthly fee that lets you store a certain amount of data on its servers and serve a certain number of MB or GB each month to people viewing your website. Think of it this way: each time someone views a web page, they are in effect downloading it. A huge page means a huge download for every visitor; a smaller page means a smaller one. A smaller page also takes up less of your storage allowance and uses less of your monthly transfer allowance.
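To put rough numbers on that, here is a small Python sketch of the arithmetic. The page sizes are the ones from the test described later in this article; the figure of 10,000 views per month is purely hypothetical, and the helper name is made up for illustration.

```python
def monthly_transfer_mb(page_kb: float, views_per_month: int) -> float:
    """Approximate data served per month, in MB, for a single page."""
    # Every view downloads the page once; 1024 KB = 1 MB.
    return page_kb * views_per_month / 1024


if __name__ == "__main__":
    views = 10_000  # hypothetical traffic figure
    for label, size_kb in (("Word HTML", 44), ("cleaned HTML", 27)):
        print(f"{label}: about {monthly_transfer_mb(size_kb, views):.0f} MB per month")
```

This only counts the HTML itself. Images, stylesheets and scripts add to the total, so treat the result as a lower bound.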

What to do, then, since Microsoft Word wants to cram all this extra stuff into your HTML pages? And we’re talking stuff that doesn’t affect how the page looks at all… it’s all just… there, visible only to someone who reads the page’s HTML source rather than the rendered page. Is there any way to shrink those web pages?

There is. A couple, actually. One way (the long way) is to go through the HTML code line by line, removing everything you don’t need. This is the most effective way, but it can also be really tedious, especially if the web page is long.

It is painstaking work that takes a tremendous amount of patience and perseverance, so most people will want to avoid it at all costs, important as clean HTML is to a website and even though the cost of a website is not as high as you might expect.
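If you would rather not do this by hand, part of the manual cleanup can be scripted. The following is a minimal Python sketch, not a complete cleaner. It assumes the kinds of markup Word typically emits (conditional comments, `<o:p>`-style Office tags, `Mso...` class names, `mso-` style declarations and a few `<meta>` tags). The function name is invented for this example, and regular expressions are a blunt tool for HTML, so check the result in a browser.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove some common Word-specific markup from an HTML string."""
    flags = re.IGNORECASE | re.DOTALL
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=flags)
    # Office namespace tags, e.g. <o:p> and </o:p>
    html = re.sub(r"</?[ovwm]:[^>]*>", "", html, flags=flags)
    # <meta> tags that identify the generating program
    html = re.sub(r'<meta\s+name="?(?:ProgId|Generator|Originator)"?[^>]*>',
                  "", html, flags=flags)
    # class=MsoNormal and similar class attributes
    html = re.sub(r'\s+class="?Mso\w*"?', "", html, flags=flags)
    # Individual mso-* declarations inside style attributes
    html = re.sub(r"""mso-[a-z-]+\s*:[^;"']*;?\s*""", "", html, flags=flags)
    # style attributes left empty by the previous step
    html = re.sub(r"""\s+style=(["'])\1""", "", html, flags=flags)
    return html


if __name__ == "__main__":
    sample = ("<p class=MsoNormal style='mso-margin-top-alt:auto;color:red'>"
              "Hi<o:p></o:p></p>")
    print(strip_word_markup(sample))
```

The sketch deliberately leaves ordinary comments and `<style>` blocks alone, because Word stores the page's CSS there and removing it could change how the page looks.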

The next way, and the easiest, if you have a fairly small page, is to take the HTML page you created with Microsoft Word, and upload it to the Word HTML Cleaner (located at the Textism website). This is what happens to your web page, according to the converter: “Typographer’s quotes and dashes, and other non-ascii characters, are converted to HTML entities to increase their portability amongst browsers and operating systems. Basic styling and structure, as well as links, image references and tables, should come through intact. Everything else is stripped.”
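The entity conversion the tool describes is easy to picture with a few lines of Python. This is only a sketch of the general idea, not the Textism tool's actual code: it uses Python's built-in `xmlcharrefreplace` error handler, which produces numeric entities, and the function name is invented for this example.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    # Characters that cannot be encoded as ASCII become &#NNNN; references.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Curly quotes and an en dash, as Word would insert them.
    print(to_html_entities("\u201cHi\u201d \u2013 ok"))
```

Plain ASCII text, including existing tags, passes through unchanged, so the function can be run over a whole HTML file.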

If that doesn’t work for you, or you need to strip down some HTML that’s more than 20 KB (the Textism tool’s limit, unless you want to purchase a subscription), Google actually offers another option.

Everyone, it seems, uses Gmail these days, so take the actual Microsoft Word document (don’t convert it to HTML, just use the .doc file itself) and email it to your Gmail account. Now log into Gmail and, instead of downloading the attachment, click to view it as HTML. It should then open in a new window or tab.

Now use your browser’s view-source feature to see the page’s HTML, which should pop up in its own window. Simply copy the HTML and paste it into your favorite text editor (Notepad on Windows, TextEdit on Mac, or Gedit or Kate on Linux). There is one line Google adds that you should remove:

Download the original attachment

After removing that, simply save your new text document with the .html extension, and you’re set! I did a bit of testing, and it turns out that the Gmail method gives really good results. A basic Word document I had lying around weighed in at 47 KB. Using Word to convert it to HTML produced a document that was a bit smaller, around 44 KB. The Gmail method, on the other hand, produced HTML that measured only 27 KB! I compared the two pages side by side and saw no visible difference; the Gmail version simply had the unnecessary tags and information stripped out.
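Both the line removal and the size comparison can be sketched in a few lines of Python. This assumes Gmail puts the link on a line of its own in the page source; if it shares a line with real content, remove it by hand instead. The function names are made up for illustration, and the sizes passed in are the ones from the test above.

```python
def drop_gmail_link_line(html: str,
                         marker: str = "Download the original attachment") -> str:
    """Return the HTML without any line that contains the marker text."""
    kept = [line for line in html.splitlines() if marker not in line]
    return "\n".join(kept)


def percent_smaller(before_kb: float, after_kb: float) -> float:
    """How much smaller the second size is, as a percentage of the first."""
    return round(100 * (before_kb - after_kb) / before_kb, 1)


if __name__ == "__main__":
    # Sizes from the test above: Word's HTML (44 KB) vs. the Gmail method (27 KB).
    print(percent_smaller(44, 27), "percent smaller than Word's HTML")
```

Measured this way, the Gmail version is a little under 40 percent smaller than the HTML Word produced.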

So, if you have a website and are used to typing up content in Microsoft Word and saving it as HTML, think again. There are more efficient ways of getting the same result, ways that will save you money, so why not give them a try?

Catherine

Catherine Han founded Murals Plus in 2017 and is currently the managing editor of the media website. She is also a content writer, editor, blogger and a photographer.