Originally published on Digital Book World's "Expert Publishing Blog" on 13th Feb 2015.
Many digital publishers agree HTML5 is one of the best mediums not only for outputting eBook content but for authoring it, too. Personally, I’m not so sure.
At this year’s Digital Book World conference, I attended a panel session titled, “In Publishing’s Multi-Tech Future, Is HTML5 the ‘Magic Bullet’?” The main debate was centered on how HyperText Markup Language (HTML), and in particular HTML5, can revolutionize the process of producing a book. I say ‘debate,’ but in reality the entire panel was in favor of using HTML5 not only as an output medium for eBooks but also for authoring them in the first place. But I’m not convinced HTML5 is really the best way for authors to write content.
Let me preface this entire argument by saying that I am a huge fan of HTML5. As a software engineer, I use HTML5 and its associated standards and languages pretty much every day. I have written blog posts, articles and even a book about it. I think HTML5 has improved the Web enormously in recent years and led to a much better experience for everyone who uses it.
There are many benefits to using HTML in the production process as well, particularly when compared with tools like Microsoft Word or standards like DocBook XML. It’s relatively easy to understand, it’s heavily standardized and, most important, it’s easy for machines to parse and easy to customize and style using Cascading Style Sheets (CSS). I fully agree that HTML and associated standards should be used in the production process.
What I don’t agree with is that the manuscript itself should be produced in HTML by an author. Yet it appears that’s what some publishers are touting as the future.
As an author, I don’t want to write a book in HTML. Inevitably, if authors are asked to use HTML, they will avoid typing the markup by hand (imagine starting to write a paragraph by typing …) and will instead use a What-You-See-Is-What-You-Get (WYSIWYG) editor to write their book as easily as they compose a Word document. These tools generate HTML code automatically in the background as you type. Unfortunately, the result is likely to be messy HTML, full of custom formatting, that causes pain later in the production process.
HTML is also too flexible as it stands. It is less complex than the likes of DocBook, but it also has a vast array of highly flexible elements, many of which serve no purpose in a book, whether in print or digital. It’s also very easy to create custom styles using CSS, which may not fit in with the publisher’s own style. As someone who leads a team of software engineers who work extensively with HTML, I’ve seen just how different markup can look when written by two or more different people.
This problem leads to publishers imposing restrictions or limited schemas on how authors use HTML to produce book content. Generally, there are no tools to enforce these restrictions, leaving publishers to come up with their own or to manually validate that authors have adhered to them in generating their manuscripts.
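In the absence of ready-made tools, a publisher could write a basic checker of its own. The sketch below uses Python's standard-library html.parser module to flag any element outside a house whitelist, as well as inline style attributes that would bypass the publisher's CSS. The whitelist, the style rule and the names ManuscriptValidator and validate are hypothetical illustrations, not any publisher's actual schema; a real check would also need to enforce nesting and attribute-value rules.

```python
from html.parser import HTMLParser

# Hypothetical house schema: the only elements an author may use.
ALLOWED_TAGS = {
    "section", "h1", "h2", "h3", "p", "em", "strong", "blockquote",
    "ul", "ol", "li", "a", "img", "code", "pre",
}
# Hypothetical rule: inline styling is rejected so CSS stays under the publisher's control.
FORBIDDEN_ATTRS = {"style"}


class ManuscriptValidator(HTMLParser):
    """Collects violations of the (hypothetical) house schema while parsing."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, offset) of the tag being processed; lines start at 1.
        line, _ = self.getpos()
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"line {line}: <{tag}> is not allowed")
        for name, _value in attrs:
            if name in FORBIDDEN_ATTRS:
                self.violations.append(
                    f"line {line}: attribute '{name}' on <{tag}> is not allowed"
                )
    # Self-closing tags such as <img/> reach handle_starttag through
    # HTMLParser's default handle_startendtag, so they are checked too.


def validate(html_text):
    """Return a list of human-readable schema violations (empty if none)."""
    parser = ManuscriptValidator()
    parser.feed(html_text)
    parser.close()
    return parser.violations


if __name__ == "__main__":
    sample = '<p>Chapter one <span style="color:red">begins</span>.</p>'
    for problem in validate(sample):
        print(problem)
```

Run against the sample, this reports both the disallowed span element and its inline style attribute, the kind of WYSIWYG residue described above.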
One of the speakers on the panel at DBW was Sanders Kleinfeld, Director of Publishing Technology at O’Reilly Media. O’Reilly created HTMLBook, a standard for writing books in XHTML5, a stricter XML-based variant of HTML5. HTMLBook is a subset of HTML5, containing only elements that are relevant to books, with data-* attributes used to add functionality that lies outside the scope of existing HTML elements. It’s an impressive standard, but I struggle to see it being embraced by authors unless they can use a WYSIWYG editor to produce it.
So what’s the solution here? For me, it’s not HTML, but rather a lightweight markup language like Markdown. Markdown is a straightforward syntax, readable by both humans and machines, for producing documents with a limited set of formatting options. Instead of elements and classes, it uses simple punctuation-based conventions. To italicize text, you wrap it in asterisks, *like this*. To produce paragraphs, you just leave a blank line between them.
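To show how little machinery those two rules need, here is a minimal Python sketch that turns blank-line-separated paragraphs and single-asterisk emphasis into HTML. It is a toy that handles only these two constructs (a real Markdown processor also deals with backslash escapes, nested emphasis, lists and much more), and the function name markdown_subset_to_html is hypothetical.

```python
import html
import re

# Matches *text* whose first and last characters are not whitespace,
# approximating Markdown's rule that "2 * 3 * 4" is not emphasis.
EMPHASIS = re.compile(r"\*(?=\S)([^*]+?)(?<=\S)\*")


def markdown_subset_to_html(source: str) -> str:
    """Convert paragraphs and *emphasis* (and nothing else) to HTML."""
    paragraphs = []
    # A blank line, possibly containing whitespace, separates paragraphs.
    for block in re.split(r"\n\s*\n", source.strip()):
        # Lines within one paragraph are joined with single spaces.
        text = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if not text:
            continue
        # Escape HTML special characters before inserting our own tags.
        text = html.escape(text)
        text = EMPHASIS.sub(r"<em>\1</em>", text)
        paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    doc = "To italicize text, wrap it in asterisks, *like this*.\n\nA second paragraph."
    print(markdown_subset_to_html(doc))
```

The source a writer types stays readable on its own, while the tags are produced mechanically, which is the division of labor argued for here.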
The image below shows the source of a Markdown document on the left-hand side and how that document looks when rendered in a Web browser. A major benefit of Markdown is that the source itself is very easy to read.
HTML was designed as an output language. Markdown and similar languages were designed as input languages, meant to be converted into other output formats. Markdown is typically translated into HTML for display on Web pages, but it can be translated into many other document formats as well. Pandoc, for example, takes a document written in Markdown (or a similar language) as input and produces HTML, Microsoft Word, EPUB, DocBook, InDesign (ICML), OPML, LaTeX, PDF and more. You can also supply custom templates so that the output matches your own style.
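As an illustration of the one-source, many-outputs workflow, the following Python sketch builds one Pandoc command per target format from a single manuscript. It only constructs the commands; running them assumes Pandoc is installed and on the PATH. The file names, the house.html template and the helper pandoc_commands are hypothetical. Pandoc infers the output format from the file extension given to -o, and the --template option (which implies --standalone) is applied only to the HTML build here.

```python
import shlex
from pathlib import Path


def pandoc_commands(source, formats, template=None):
    """Build one pandoc command (as an argument list) per output format.

    Each output file shares the source's stem; pandoc picks the writer
    from the -o extension. A custom template, if given, is passed to
    the HTML build only.
    """
    stem = Path(source).stem
    commands = []
    for ext in formats:
        cmd = ["pandoc", source, "-o", f"{stem}.{ext}"]
        if template and ext == "html":
            cmd.append(f"--template={template}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in pandoc_commands("manuscript.md", ["html", "docx", "epub"], template="house.html"):
        # shlex.join (Python 3.8+) renders the list as a shell-ready string.
        print(shlex.join(cmd))
```

Each list can be handed to subprocess.run as-is. For Word output, Pandoc takes its styling from a reference document passed with --reference-doc rather than from a template.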
Markdown itself is very limited in terms of the types of formatting it allows:
paragraphs and line breaks
headers
block quotes
lists (bulleted and numbered)
code blocks
horizontal rules
links
emphasis
inline code
images
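To make the point about how small this syntax is, the block-level constructs in the list above can be recognized, line by line, with a handful of regular expressions. The Python sketch below is a simplification under stated assumptions: it ignores setext-style (underlined) headers, nested lists and lazy continuation lines, and it leaves the inline constructs (links, emphasis, inline code and images) to a separate pass. The classify function and its labels are hypothetical.

```python
import re

# Simplified, line-level patterns for Markdown's block constructs.
# Order matters: "* * *" is a horizontal rule, so it is tested before
# bullet lists, and four-space indentation means a code block.
BLOCK_PATTERNS = [
    ("code block", re.compile(r"^(?: {4}|\t)")),
    ("horizontal rule", re.compile(r"^ {0,3}([-*_])(?: *\1){2,} *$")),
    ("header", re.compile(r"^#{1,6}(?: |$)")),
    ("block quote", re.compile(r"^ {0,3}>")),
    ("bulleted list", re.compile(r"^ {0,3}[-*+] ")),
    ("numbered list", re.compile(r"^ {0,3}\d+\. ")),
]


def classify(line: str) -> str:
    """Return the (simplified) block type a single Markdown line starts."""
    if not line.strip():
        return "blank"  # blank lines separate paragraphs
    for name, pattern in BLOCK_PATTERNS:
        if pattern.match(line):
            return name
    return "paragraph"


if __name__ == "__main__":
    for line in ["# Chapter One", "> A quotation", "* * *", "1. First step", "Plain prose."]:
        print(f"{line!r:20} -> {classify(line)}")
```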
For many types of book content, this is likely more than enough. If you need things like tables, footnotes, citations, math formatting, captions, definition lists and document metadata, you can use either a similar language that supports them or an extended version of Markdown itself, such as MultiMarkdown. If you ever need some raw HTML within your Markdown files, many Markdown processors pass it through as-is, and others let you enable it. Of course, there are some books where this won’t work (highly interactive eBooks, fixed-format books and the like), but arguably HTML itself is not the answer there, either.
As an author, I have used Microsoft Word and XML to produce books and articles in the past. I’ve also used WYSIWYG HTML tools to produce blog posts, and I’ve even handcrafted entire pieces using raw HTML code. But over the past eighteen months or so, I have written almost exclusively using Markdown. Not only does it offer ultimate flexibility in terms of output, but it also removes a lot of distractions like formatting options from the writing process. When I set out to produce new content, I use a basic text editor (with Markdown syntax highlighting enabled), and I write. I love it, my editors love it and my publishers love it.
--Joe Lennon, Chief Technology Officer at Vearsa