Inside the Open eBook Publication Structure
Fuelling the eBook revolution
Should all go as planned for the Open eBook (OEB) Authoring Group, the Open eBook Publication Structure (OEPS) will be the catalyst that fuels the eBook revolution. The 50-company-strong group has devised a non-proprietary specification that details an XML-based eBook file format and structure. Files in the new format, commonly known as OEB documents, are available for use by all purveyors of electronic book content. The major reason for adopting the OEB format rests on the premise that "...in order for electronic-book technology to achieve widespread success in the marketplace, reading systems must have convenient access to a large number and variety of titles...". Vendors seem to be off to a solid start, with software-based eBook readers such as the Microsoft Reader and MobiPocket Reader already providing OEB compliance. Time will tell how this format compares with its more print-oriented competitor, Adobe's Portable Document Format (PDF). The remainder of this article gives an outline of this new file format. (Note: Planet Publish's sister site will soon publish all its free eBooks for download in EPUB and PDF formats.)
Overview
Publishers and authors, referred to as 'content providers' by the OEPS, provide publications to one or more reading systems in a form defined by the specification. A publication is a set of files comprising the various media types, text, and graphics to be published, and a reading system is a combination of hardware and software used to view OEB documents. OEB documents are essentially XML documents that conform to the OEPS. The specification distinguishes two kinds of OEB document:
- Basic OEB Document
- Extended OEB Document
The OEB specification is based on XML and hence ensures that any basic OEB document has a syntactic form that:
- is a valid XML document
- conforms fully to the OEB document DTD
- is expected to conform to XHTML 1.0 when that specification is issued
- is effectively previewable in typical version 4 HTML browsers
A publication that conforms to the above specification should include exactly one OEB package file. This is necessary for the reading system to recognize the objects within the publication.
OEB Package
An OEB package is a file that describes an OEB publication, namely its associated files and the information needed to access them. Simply stated, the OEB package specifies the OEB documents, images, bookmarks and other objects that make up the OEB publication and how they relate to each other.
The specification highly recommends that all package files use the extension ".OPF", to distinguish them from the other files making up a publication. Package files have the MIME (Multipurpose Internet Mail Extensions) media type "text/xml". The specification does not define a means of physically bundling files together into a single data transfer object (such as a zip or tar archive).
Whilst an OEB package must be a valid XML document conforming to the OEB package Document Type Definition (DTD), a publication is not required to physically include the OEB package DTD.
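As a rough illustration of the two rules above (exactly one package file per publication, conventionally named with the ".OPF" extension), the following Python sketch looks for the package file in a directory of publication files. The function names find_package_file and is_well_formed are hypothetical, the flat directory layout is an assumption, and the check covers XML well-formedness only: it does not validate against the OEB package DTD.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_package_file(publication_dir):
    """Return the single package file in a publication directory.

    Assumes the recommended ".opf" extension is used (matched
    case-insensitively) and that all files sit in one flat directory.
    """
    candidates = sorted(
        p for p in Path(publication_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".opf"
    )
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one package file, found {len(candidates)}"
        )
    return candidates[0]


def is_well_formed(path):
    """Check XML well-formedness only; this is NOT DTD validation."""
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True
```

A reading system or packaging tool would typically run a check like this first, since every later step depends on locating the package file.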
Inside the package file (".OPF")
The major parts of the OEB package file include:
- Package Identity—a unique identifier for the OEB publication as a whole.
- Metadata—Publication metadata (title, author, publisher, etc.).
- Manifest—A list of files (documents, images, style sheets, etc.) that make up the publication. The manifest also includes fallback declarations for files of types not supported by this specification.
- Spine—An arrangement of documents providing a linear reading order.
- Tours—A set of alternate reading sequences through the publication, such as selective views for various reading purposes, reader expertise levels, etc.
- Guide—A set of references to fundamental structural features of the publication, such as table of contents, foreword, bibliography, etc.
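To make the parts listed above more concrete, here is a minimal Python sketch that parses a small, made-up package file and derives the linear reading order from the manifest and spine. The element and attribute names (package, dc-metadata, item, itemref, reference), the Dublin Core namespace URI and the "text/x-oeb1-document" media type reflect our reading of OEB 1.0 and should be checked against the specification itself. The sample content and the function names reading_order and guide_entries are purely illustrative, and the optional tours section is omitted.

```python
import xml.etree.ElementTree as ET

# Illustrative package file; names follow our reading of OEB 1.0 (assumption).
SAMPLE_OPF = """<?xml version="1.0"?>
<package unique-identifier="bookid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Title>Sample Book</dc:Title>
      <dc:Identifier id="bookid">sample-0001</dc:Identifier>
    </dc-metadata>
  </metadata>
  <manifest>
    <item id="toc" href="toc.html" media-type="text/x-oeb1-document"/>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="ch2" href="chapter2.html" media-type="text/x-oeb1-document"/>
    <item id="style" href="book.css" media-type="text/x-oeb1-css"/>
  </manifest>
  <spine>
    <itemref idref="toc"/>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
  <guide>
    <reference type="toc" title="Table of Contents" href="toc.html"/>
  </guide>
</package>
"""


def reading_order(opf_text):
    """Map each spine itemref to the href of the manifest item it names."""
    root = ET.fromstring(opf_text)
    manifest = {item.get("id"): item.get("href") for item in root.iter("item")}
    return [manifest[ref.get("idref")] for ref in root.iter("itemref")]


def guide_entries(opf_text):
    """Return (type, href) pairs for the structural features in the guide."""
    root = ET.fromstring(opf_text)
    return [(ref.get("type"), ref.get("href")) for ref in root.iter("reference")]


if __name__ == "__main__":
    print(reading_order(SAMPLE_OPF))
    print(guide_entries(SAMPLE_OPF))
```

Note that the style sheet appears in the manifest but not in the spine: the manifest lists every file in the publication, while the spine lists only the documents that form the reading sequence.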
Cascading Style Sheets (CSS)
Cascading Style Sheets (CSS) are a mechanism that enables both authors and readers to attach style (e.g. fonts, colors, spacing) to HTML and XML documents. That is, CSS is used to define the appearance of XML documents. It uses common desktop publishing terminology, which should make it easy for professional as well as untrained designers to make use of its features. There are two ways to create a style sheet. First, you can use a normal text editor to write it entirely by yourself. Second, you may prefer to use a tool that assists you in creating the CSS, such as SoftQuad's XMetaL. To stay in the eBook race and produce OEB-compliant eBooks, understanding CSS is a definite necessity.
How it applies to OEB
For the highly technical user who wants to know exactly how cascading styles apply to Open eBook: the specification defines a style language based on the CSS1 and CSS2 style sheet mechanisms, with a MIME media type of "text/x-oeb1-css". Stylesheets of other MIME media types may be substituted for text/x-oeb1-css stylesheets at the discretion of the reading system.
Not all properties of CSS1 and CSS2 have been included in the OEB format. The CSS-based stylesheet constructs are included to define a baseline rendering functionality. Beyond the properties taken from CSS1 and CSS2, a few other properties and values have been added to support page layout, headers, and footers.
Additionally, the specification supports the inline style attribute, the style element, and externally linked stylesheets. When processing stylesheets, the reading system is not required to handle XML namespaces. Reading systems that implement only the OEB CSS subset may ignore any stylesheets using other style languages, whereas those that support extended stylesheet functionality may choose among any of the other external stylesheets. Non-OEB elements may also be added to an OEB document, as long as such elements are given style definitions in accompanying style sheets.
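The stylesheet selection rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour: the Stylesheet class, the usable_stylesheets function and the example file names are hypothetical, and the only value taken from the specification is the "text/x-oeb1-css" media type.

```python
from dataclasses import dataclass

OEB_CSS = "text/x-oeb1-css"  # media type named by the specification


@dataclass(frozen=True)
class Stylesheet:
    href: str
    media_type: str


def usable_stylesheets(stylesheets, supported_types=frozenset({OEB_CSS})):
    """Keep only the stylesheets whose style language the reader supports.

    A baseline reading system supports the OEB CSS subset alone and simply
    ignores the rest; an extended one passes a larger set of types.
    """
    return [s for s in stylesheets if s.media_type in supported_types]


if __name__ == "__main__":
    sheets = [
        Stylesheet("book.css", OEB_CSS),
        Stylesheet("book.xsl", "text/xsl"),
    ]
    print(usable_stylesheets(sheets))
    print(usable_stylesheets(sheets, frozenset({OEB_CSS, "text/xsl"})))
```

The point of the design is that a publication can ship richer stylesheets for capable readers without breaking baseline ones, provided an OEB CSS stylesheet is also present.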
Not (immediately) a deliverable format
This early version of the specification does not address issues such as Digital Rights Management (DRM) and compressed distribution packaging (an intentional exclusion from the first release). This means that OEB is unlikely to be seen as suited to secure and timely delivery over the Internet. In practice, software developers and e-reading device manufacturers are still likely to use their own digital wrapper for their end-user distributable file. These wrappers include formats such as Microsoft's ".LIT" and MobiPocket's ".PRC".
For a publisher to make its OEB-compliant content available to different target devices, it must 'wrap' that content to comply with each reader-specific format. For example, if the publication is intended to reach a large number of recipients using a wide variety of readers, then a separate file must be created for each of Microsoft Reader, MobiPocket Reader, REB 1100 and REB 1200, not to mention Primer, goReader and Cybook, which are also due to release OEB-compliant readers in the near future.
What about PDF and OEB?
Technically, the Open eBook specification allows PDF files (or any other non-OEB files) to be embedded in a publication, as long as the publication contains an alternate representation of the content for reading systems that lack support for that file type. However, since Open eBook reading systems are not required to support PDF, this is technically possible but highly unlikely to be implemented in the real world. It seems much more likely that PDF content will be converted to Open eBook documents by format converters such as BCL's GoHTM. The same applies to other file formats, such as Quark and PageMaker (see our article on avenue.quark for more information on converting Quark to XML/OEB).
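The alternate-representation rule can be illustrated with a short Python sketch that follows a chain of fallbacks until it reaches a file type the reading system supports. The manifest is modelled here as a plain dictionary rather than parsed XML. The "fallback" key and the list of core media types reflect our reading of OEB 1.0 and should be treated as assumptions, and the resolve function and file names are hypothetical.

```python
# Media types every reading system is assumed to support (our reading of
# OEB 1.0; verify against the specification before relying on this list).
CORE_TYPES = frozenset({
    "text/x-oeb1-document",
    "text/x-oeb1-css",
    "image/jpeg",
    "image/png",
})


def resolve(item_id, manifest, supported=CORE_TYPES):
    """Follow fallback links until an item of a supported type is found.

    Returns the href of the first usable representation, or None when the
    chain ends, points at a missing item, or loops back on itself.
    """
    seen = set()
    current = item_id
    while current is not None and current not in seen:
        seen.add(current)
        item = manifest.get(current)
        if item is None:
            return None
        if item["media_type"] in supported:
            return item["href"]
        current = item.get("fallback")
    return None


if __name__ == "__main__":
    manifest = {
        "brochure": {
            "href": "brochure.pdf",
            "media_type": "application/pdf",
            "fallback": "brochure-html",
        },
        "brochure-html": {
            "href": "brochure.html",
            "media_type": "text/x-oeb1-document",
        },
    }
    print(resolve("brochure", manifest))
```

A reader with PDF support would pass a larger supported set and receive the PDF itself; every other reader ends up with the OEB rendition.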
The Future of ePublishing
Electronic books are the next logical step for the publishing industry over the coming years. The Open eBook Publication Structure details a non-proprietary content format that may well provide a mechanism to facilitate this transformation. The major threat to the success of this initiative will be the integration of distribution formats and DRM initiatives from vendors seeking to provide what the specification does not yet offer. This challenge must be met if eBooks are in fact to be as interoperable as their paper cousins.