Codeistry

Pasting from Microsoft Word into Wordpress or MODx

Just in case anyone else gets caught out by this, here's the deal:

When you copy & paste from Microsoft Word into web-based systems like Wordpress or MODx, MS Word tries to bring its internal formatting along for the ride. All text in MS Word is styled in some way (fonts, font sizes, alignment, etc.), so when you copy & paste it into your CMS, blog or whatever, this formatting comes along too, like an invisible hitch-hiker.

Because these styles are applied at a low level, i.e. as inline styles on the actual text itself, these hitch-hiking styles tend to override your website's own visual styles. So everything looks normal, then suddenly you've got a couple of paragraphs in Arial 10pt when they're not supposed to be. This is almost never what you want: normally you want the website's own styles to be used for all content.

To check whether this is what's causing your formatting problems, look for the proprietary mark-up that MS Word sneaks in. In Wordpress, edit the post in question and switch to the HTML tab. If you see XML tags starting with w: or mso, then this is content pasted from Word.

This also works if you view the source of the actual webpage with the problem, which in most browsers you can do by pressing Ctrl+U. If you see tags like these, or lots of stuff like this:

<p class="MsoNormal"....

i.e. lots of MsoNormal classes, then this is text that was copied over from MS Word.
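If you've got a lot of posts or pages to check, you could automate the eyeballing. Here's a minimal Python sketch (not part of Wordpress or MODx) that counts the Word fingerprints mentioned above in a chunk of HTML, such as the contents of the HTML tab or a saved page source. The function names and the exact regular expressions are assumptions for illustration, and Word's output varies between versions, so treat a zero count as "probably clean" rather than proof.

```python
import re

# Fingerprints of Word-generated HTML. These patterns are an assumption based
# on the markup described in this post; Word's exact output varies by version.
WORD_SIGNATURES = {
    # XML tags in Word's namespace, e.g. <w:WordDocument> or </w:View>
    "w: tags": re.compile(r"</?w:[A-Za-z]"),
    # "mso" followed by a hyphen or space, e.g. mso-bidi-font-family or [if gte mso 9]
    "mso markup": re.compile(r"\bmso[-\s]", re.IGNORECASE),
    # Word's default paragraph class, e.g. <p class="MsoNormal">
    "MsoNormal class": re.compile(r"\bMsoNormal\b"),
}


def find_word_markup(html: str) -> dict:
    """Count how many times each Word fingerprint appears in an HTML string."""
    return {name: len(pattern.findall(html)) for name, pattern in WORD_SIGNATURES.items()}


def looks_like_word_paste(html: str) -> bool:
    """Return True if any Word fingerprint appears at least once."""
    return any(count > 0 for count in find_word_markup(html).values())


# Example: a typical fragment pasted straight from Word
sample = '<p class="MsoNormal"><span style="mso-bidi-font-family: Arial">Hello</span></p>'
print(find_word_markup(sample))
# {'w: tags': 0, 'mso markup': 1, 'MsoNormal class': 1}
```

Plain regular expressions are used rather than a full HTML parser because we only need to spot the fingerprints, not understand the document structure.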

The easiest way to sort this out is to delete the offending text and redo the copy & paste from Word, this time using the 'Paste as Plain Text' button. In Wordpress, this is on the 'Kitchen Sink' toolbar.

[Screenshot: the 'Kitchen Sink' toolbar, with the Paste as Plain Text button]

The Paste as Plain Text button is the fourth one from the left on the default Kitchen Sink toolbar.

There's an identical button on the editor toolbar in MODx. When you use the Paste as Plain Text button to paste your content in from MS Word, the extra formatting gets stripped off and your text formatting will regain its harmony.
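To get a feel for what pasting as plain text amounts to, here's a rough Python sketch that keeps only the text of a pasted HTML fragment and throws away every tag, class and inline style. This is an illustration under assumptions, not how Wordpress or MODx actually implement the button. `PlainTextExtractor` and `to_plain_text` are hypothetical names, and treating paragraphs, divs, headings, list items and line breaks as paragraph boundaries is a simplification of my own.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment, dropping all markup."""

    # Elements whose contents are never visible text (Word can include
    # <style> definitions and <xml> blocks in pasted HTML).
    SKIP_TAGS = {"style", "script", "xml"}
    # Elements treated as paragraph boundaries (a simplifying assumption).
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &nbsp; etc. arrive decoded
        self.paragraphs = [[]]
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # attrs (class="MsoNormal", style="...") are simply ignored
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.paragraphs.append([])

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.paragraphs.append([])

    def handle_data(self, data):
        if not self.skip_depth:
            self.paragraphs[-1].append(data)


def to_plain_text(html):
    """Return the text of `html` with one blank line between paragraphs."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) inside each paragraph
    lines = (" ".join("".join(parts).split()) for parts in parser.paragraphs)
    return "\n\n".join(line for line in lines if line)


sample = ('<p class="MsoNormal"><span style="font-family: Arial; font-size: 10pt">'
          'Hello <b>world</b></span><o:p></o:p></p>'
          '<p class="MsoNormal">Second&nbsp;para</p>')
print(repr(to_plain_text(sample)))
# 'Hello world\n\nSecond para'
```

Note that this also drops formatting you might want to keep, such as the bold on "world". That's the trade-off with pasting as plain text: you re-apply any bold, italics or headings using the editor's own buttons, so they pick up your site's styles.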