Clean HTML from Word: a Hack

We’ve been exploring ebook production here at Sherprog, and it seems like the best way to produce a high-quality EPUB or MOBI ebook using consumer tools involves starting from HTML. I’ve written about the inadequacies of InDesign’s export to HTML and EPUB, which is a problem many small publishers undoubtedly face. But individuals who publish or self-publish ebooks are probably working from Microsoft Word or another common word processor. Unfortunately, Word also tends to produce messy HTML via its native save-as-HTML function. So how do you get clean HTML from Word? I would like to present a life hack.

The Problem

Not very clean HTML from Word

Classes and spans and styles oh MY

When you export HTML from Microsoft Word, what you tend to get is a class and a span with styling info for every paragraph. This is highly unnecessary, and frustrating if you are trying to control your text’s formatting for web or ebook publication. In order to clean up this HTML to any web developer’s reasonable standards, you would have to remove all these tiresome span tags and CSS declarations just so you could do the same work with a handful of human-designed CSS declarations for paragraph style. I have done this before, and manually. It took several hours to work through a novel-sized manuscript using search and replace to knock out these span tags one related group at a time in a text/code editor.
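For what it's worth, the manual search-and-replace pass described above can be roughly scripted. The following is only a minimal sketch, not the approach this post goes on to recommend: the function name `strip_word_markup` and the sample markup are made up for illustration, and the regular expressions assume reasonably well-formed tags. A pattern like this can also touch body text that happens to look like an attribute, so check the output by eye.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove <span> wrappers and class/style attributes from HTML text."""
    # Drop opening and closing span tags but keep the text inside them.
    html = re.sub(r"</?span\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class=... and style=... attributes (quoted or unquoted).
    html = re.sub(
        r"""\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html


# Hypothetical sample in the style of Word's exported HTML.
sample = (
    '<p class="MsoNormal" style="margin:0in">'
    '<span style="font-size:12.0pt">Hello, world.</span></p>'
)
print(strip_word_markup(sample))  # prints <p>Hello, world.</p>
```

A real HTML parser would be sturdier than regular expressions, but this mirrors the "knock out one group of tags at a time" routine closely enough to show the idea.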

Necessity is the mother of hack.

That is when I noticed that I routinely pasted Microsoft Word content into WordPress and could hit publish and magically get a reasonable webpage every time. I went to a blog post and used my browser’s “view source” option to take a look at exactly what was happening and, bingo-automattico, there was beautiful, simple HTML with every paragraph in a nice <p> tag and not a lot else going on!

Testing and Tweaking Your Ebook Using Calibre Conversion

In the previous post, we talked about choosing the right Calibre conversion settings for a general-purpose, reasonably-formatted ebook. In this continuation, I present a time-saving tip for using Calibre to reconvert your ebook during testing and tweaking. In programmer lingo, you might call it iterative testing. The publisher’s translation could be galley proofing, I guess.

Chapter Three: Reconverting Using Merge Book Records 

In which the protagonist saves time and frustration while device testing and tweaking his ebook, including how to use the Calibre merge function without screwing @*#( up. 

In my experience, when you first bring a book into Calibre to convert it, you should consider yourself squarely in the testing/tweaking phase and not in the publishing phase. You know that the moment you open your converted book you will find that typo that’s been there staring at you the whole time but which has somehow eluded your editorial eye until now. Or you will try the ebook on various devices and find it’s not working like you want. Whatever the case is, odds are you will have to change a few things and put it through Calibre conversion again–probably more than once.

You will quickly note that if you delete a book from your library and then re-import your source HTML/CSS, you will need to enter all of the metadata and settings over again. This can grow very aggravating if you have to re-convert your book more than a few times. By the fifteenth attempt, you may find yourself staring at the metadata fields glassy-eyed and wondering just what you named that book again.

HTML to EPUB: Calibre Conversion Settings and How to Preserve Indents

Chapter One: HTML

In which the protagonist prepares an HTML document for Calibre conversion to EPUB format. Our story begins as the hero has completed the treacherous journey from InDesign, which is not designed to design ebooks at all, to clean, ebook-worthy HTML and CSS. And what a journey it has been, with action and romance and stylesheet declarations and dragons.

The first step to any good ebook is a clean, well-formatted source document. In my previous post, which was specifically about creating an EPUB from InDesign, I make the argument that the best way to generate an ebook is from HTML. This is because EPUB files are, at the core, HTML, and Kindle MOBI markup is supposedly not far removed from HTML either. This facilitates good, clean conversions, and when you do find formatting problems, they are easy to troubleshoot–if you know just a little HTML/CSS.
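To make the "EPUB is HTML at the core" point concrete: an EPUB file is a ZIP archive, and its chapters are stored inside as HTML/XHTML files. The sketch below lists those files using only Python's standard library. The function name `list_markup_files` and the file names are assumptions for illustration, and the tiny in-memory archive is a stand-in, not a valid EPUB.

```python
import io
import zipfile


def list_markup_files(epub_file):
    """Return the names of HTML/XHTML files inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        ]


# Build a small stand-in archive in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter1.xhtml", "<p>Once upon a time.</p>")
    archive.writestr("OEBPS/style.css", "p { text-indent: 1em; }")
buffer.seek(0)

print(list_markup_files(buffer))  # prints ['OEBPS/chapter1.xhtml']
```

Pointing the same function at a real `.epub` path should show the markup files you would open when troubleshooting formatting.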

Here we will use Calibre to create an ebook from an HTML document and its resources. Calibre is a free tool and you can get it from http://calibre-ebook.com. Though Calibre was designed primarily for personal use, I think it is very handy for professional ebook creation when it is not being done in bulk. (However, Calibre does offer command-line tools for automated conversions–but we’re not getting into that today.)
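For the curious, here is a rough sketch of what an automated conversion might look like using Calibre's `ebook-convert` command-line tool, wrapped in Python. It assumes Calibre is installed and `ebook-convert` is on your PATH. The file names, title, and author are hypothetical placeholders, and the helper names `build_convert_command` and `run_conversion` are mine, not Calibre's.

```python
import shlex
import subprocess


def build_convert_command(source_html, output_epub, title, authors):
    """Build the argument list for one ebook-convert run."""
    return [
        "ebook-convert",
        source_html,   # input file
        output_epub,   # output file; the extension selects the format
        "--title", title,
        "--authors", authors,
    ]


def run_conversion(command):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(command, check=True)


# Hypothetical file names and metadata, shown but not executed here.
command = build_convert_command("book.html", "book.epub", "My Book", "A. Author")
print(shlex.join(command))
```

Calling `run_conversion(command)` would perform the actual conversion. Because the metadata lives in the script, it does not have to be retyped on each run.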

InDesign EPUB Export Sucks (and How to Get Around It)

The Mystery of Ebooks

Ebooks are still frequently a quandary to the small-to-mid-sized publisher. But with rising sales of ebooks and the popularity of mobile devices, there comes a time when you must look the ebook in the eye and face the future–or face the fad, at least. (I don’t know if ebooks as we know them will be around in ten years. I just know they’re around now so we better deal with them.) Anyway, realizing this, maybe you give in and poke the InDesign EPUB export button, just to see. InDesign chugs and spurts and gurtles a little bit and then spits out an ebook. You think, “Oh! How easy was that?” But then…the book is opened and the nightmare begins.

The nightmare is the slow-learned revelation that ebooks are not necessarily easy, despite the existence of tools that claim to be able to produce them from other file types at the push of a button. Only if your page layout is like that of a straightforward novel with no illustrations or special formatting is there a blessed chance in heck that any automated ebook export will produce a book that looks halfway good with no additional labor. This is especially true of many InDesign layouts, because you will have layers and graphic frames and fonts and style overrides up the wazoo. All this fancy formatting that is great for print will not translate. Here is why in a nutshell: Ebooks are basically HTML, and not advanced web2.0/webapp/skynet HTML, but stripped-down, carved in a stone slab as Cuneiform kind of HTML–no layers, limited positioning, tricky-to-non-existent font embedding…It’s barbarian by web design standards.

However, there is a solution to this, and that is to make it TAO. EPUB books don’t allow a full array of styling control, and so you must relinquish control. The solution to the issue is “Simplify, simplify, simplify.” Thoroughly simplify–or Thoreau-ly simplify, as in Henry David Thoreau-ly, since I believe that is his quote. In simplicity, your document will find the elegance of ebook beauty. Embrace white space, embrace the inability to layer or position, and embrace the fact that ereaders may or may not substitute their plain defaults for your special fonts. Most of all, you must embrace the concept of flowing text. You have almost no control over how any given page will look.

Ada Lovelace Day 2013

This is Ada Lovelace Day. I got the best possible reminder of that this morning from my dear friend Catherine Tuxbury. The folks who organize and promote this celebration encourage all of us to post today “about a woman in science, technology, engineering or maths whose achievements you admire.” And Cathy chose me for her contribution this year. I am flattered beyond belief.

This got me thinking about my own technology mentors and realizing that none of them were female.

Don’t get me wrong. I certainly have had strong female role models in my life including my mom, Elizabeth Gunn. And I have learned many lessons from the women with whom I have worked over the years.

But when it comes to mentors in the field of technology, the most influential ones happen to have been men.

Goal setting: update on picking five things

In June, I kicked off my Summer of Product with a post describing how I intended to use the Zig Ziglar / Seth Godin Pick Four goal setting program as a framework for making the transition from mostly-freelance-programming to full-time startup founder.

This was a bit of a risk, making a public declaration. I’m just not a methodical, program-following sort of person. I was afraid of embarrassing myself by having to report, about now, that I’d filled out the workbook pages for a week or so and then just let the whole thing fade away. That has certainly been the fate of every attempt at journalling I’ve ever made.

An iBook achievement

Apple Inc. has such a gift for making small incremental achievements seem like huge victories. Our latest victory? All three of our ERG editions (English, French, and Spanish) are now for sale in the iTunes iBook marketplace. It’s really just a small huzzah but it feels like a big deal to me.

Frankly, each of the ebook marketplaces presents its own set of challenges to the small publisher or self-published author.

Apple makes everything hard. Or, at least, it sure seems that way as you get started. Only part of the toolset you must use to create and administer your ebooks in the iTunes marketplace is browser based. The other part has to be installed on a local machine and that machine has to run OS X. So you have to own and use a Mac in order to publish an iBook. Doesn’t that seem just a bit narrow-minded and self-serving to you? It does to me.

Most folks also have some trouble with the very strict epub validation step that Apple puts each epub file through. We did. But I have to say that the work we did to clean up the three ERGs so they would pass validation (and there was rather a lot of it) also made the books more readable. I try not to complain, even about Apple, when the result is an improved product for our mutual customers.

Ebook production: chaos and opportunity

August was a busy month both in and out of the office. In house, we published the French Kindle edition of the ERG 2012, GMU 2012 : Guide Facile, and we successfully shepherded the English iBook edition through Apple’s tedious review process. We also learned, surprise!, that it takes at least as long for a change in the product description to get through Apple’s review process as it does for the actual book to be reviewed. No quick fixes with Apple, ever.

Out on the road, I had the opportunity to meet face-to-face with a few specialty publishers who are either currently publishing ebook editions or are evaluating that option. Along with the phone conversations I had earlier, I’ve now had a whirlwind tour of the current state-of-the-art for small publishing houses.

The good news, for readers, is that most publishers are either on-board the ebook train or making their reservations as we speak. They know their readers want ebook editions. They are finding ways to supply those editions.  And they aren’t letting marketplace or technical uncertainties hold them back.

The bad news, for publishers, is that there’s no consensus yet on a best practice for producing those ebooks. Each publishing house has had to, essentially, put together a DIY ebook production process.

In my notes from the last five specialty publishers I’ve talked with at any length, I count a total of six separate production strategies. One house has experimented with two different processes and isn’t particularly satisfied with either yet.  Another has a ‘single’ strategy but it involves separate, parallel processes for producing their Kindle and Google Play editions.


Rails Routing with Phusion Passenger to support multiple apps on a single domain

Late last year, we published a custom Ruby on Rails web application for a college instructor. She wanted to provide her students with a specific learning activity and then monitor if, and how, it improved their mastery of some critical material. The app had two equally important parts: the UI front-end for students and the data collection backend for the instructor.

After the project was completed, our customer began using the app in her own teaching and showing it to colleagues. Several expressed interest in integrating the app into their own courses. This was good news; the customer had always hoped she might be able to license the app to others and, perhaps, build up a revenue stream from it. She was eager to explore the business opportunity but wanted to have a strict separation of the data collected for each instructor, both for simplicity and for privacy considerations.

The ‘right’ way to provide this separation, if the current interest turns into a real opportunity, is for us to develop a more robust administrative back-end for the app, with role-based administration that would let each instructor manage his or her own students and their data.  But, in the spirit of providing a minimum viable product to allow the customer to explore the opportunity without sinking a lot of money into development, we came up with the idea of creating a single site with multiple copies of the application, each with its own database.  We figured that for a reasonably stable codebase, the duplication of installed code would create minimal problems for the first 2, 3, maybe even 5 customers.  After that, she’d have to invest in admin code if she wanted to support more.

And, at first glance, the approach seemed simple enough from a technical perspective. My famous last words.

In reality, this was a complex problem containing several hidden pitfalls, and requiring a high degree of knowledge in both Phusion Passenger and Rails configuration. I was surprised by how little documentation I could find on more advanced topics in the areas of Rails configuration, Rails deployment, and Rails routing. I found my ‘simple approach’ was actually a recipe for a few hard days of research, experimentation, and work.


The Ebook Cover: Graphic Design for the Under-Confident

Part two of two in a mini-series on ebook covers. The first part was about meeting marketplace specifications in the simplest way possible. Here we’re going to talk about the ebook cover design itself.

Graphic Design

Books get judged by their covers. One cannot over-stress how important visuals are for making a sale. Are you a graphic designer? How many clients have contracted your services? Unless your answer to this question is a non-zero positive integer, you might want to find someone else to help you.

Of all the aspects of producing an ebook as an individual who is self-publishing or as a small business, this is the one thing you really should consider outsourcing if you do not have the skills ready at your disposal. The human beast is a visual animal. It doesn’t matter what the inside of your book is like–nobody will see it if the outside screams unprofessional and low-quality product.

That’s an exaggeration. Actually, professional publishers put out a lot of mediocre covers, and those books still sell. The cover is less-than-optimal because they are keeping production costs down, while the book still makes it into consumers’ hands because they have a great big marketing machine at their disposal. The difference between you and them is that you don’t have the goliath marketing machine. A great cover can only help you overcome this handicap.

The Blue Fox by Sjón (Bjartur, 2003)

A fine cover in my opinion. Anybody else get the sense Sjón is attempting to make a brand of himself? (Image: Bjartur)

For the bold DIYer who is not frightened by this attempt at intimidation, a few basic design guidelines can help you out. Like guidelines in any artistic medium, they are meant to be broken and fudged, but in general: