We’ve been exploring ebook production here at Sherprog, and it seems like the best way to produce a high-quality EPUB or MOBI ebook using consumer tools involves starting from HTML. I’ve written about the inadequacies of InDesign’s export to HTML and EPUB, which is a problem many small publishers undoubtedly face. But individuals who publish or self publish ebooks are probably working from Microsoft Word or another common word processor. Unfortunately, Word also tends to produce messy HTML via its native save-as HTML function. So how do you get clean HTML from Word? I would like to present a life hack.
When you export HTML from Microsoft Word, what you tend to get is a class and a span with styling info for every paragraph. This is highly unnecessary, and frustrating if you are trying to control your text’s formatting for web or ebook publication. In order to clean up this HTML to any web developer’s reasonable standards, you would have to remove all these tiresome span tags and CSS declarations just so you could do the same work with a handful of human-designed CSS declarations for paragraph style. I have done this before, and manually. It took several hours to work through a novel-sized manuscript using search and replace to knock out these span tags one related group at a time in a text/code editor.
Necessity is the mother of hack.
That is when I noticed that I routinely pasted Microsoft Word content into WordPress and could hit publish and magically get a reasonable webpage every time. I went to a blog post and used my browser’s “view source” option to take a look at exactly what was happening and, bingo-automattico, there was beautiful, simple HTML with every paragraph in a nice <p> tag and not a lot else going on! Continue Reading →