OEBPS Container Format (OCF) 1.0

Recommended Specification

September 11, 2006

The official HTML version of this specification was produced by simply saving a Microsoft Word document as unfiltered HTML. This version is one-fifth the size and has internal hyperlinks. It also has some minor guessed-at corrections to punctuation, and much shorter titles for subsections 2.3.1 and 2.3.2.

Table of Contents

1 Overview

1.1 Purpose and Scope
1.2 Definitions
1.3 Relationship to Other Specifications
1.4 Conformance
1.4.1 Conforming Containers
1.4.2 Conforming Reading Systems
1.5 Accessibility
1.6 Future Directions

2 OCF Overview

2.1 OCF: A General Container Technology
2.2 “Abstract Container” vs. “Physical Container”
2.3 Examples
2.3.1 Example: a simple Publication
2.3.2 Example: alternate renditions

3 OCF Container Contents

3.1 File and directory structure
3.2 Relative IRIs for referencing other components
3.3 File Names
3.4 Container media type identification
3.5 META-INF
3.5.1 Container – META-INF/container.xml (Required)
3.5.2 Manifest – META-INF/manifest.xml (Optional)
3.5.3 Metadata – META-INF/metadata.xml (Optional)
3.5.4 Digital Signatures – META-INF/signatures.xml (Optional)
3.5.5 Encryption – META-INF/encryption.xml (Optional)
3.5.6 Rights Management – META-INF/rights.xml (Optional)

4 ZIP Container

Appendix A: RELAX NG OCF Schema

Appendix B: Example

Appendix C: Contributors

1 Overview

1.1 Purpose and Scope

This specification defines the OEBPS Container Format (OCF). OCF is a general-purpose container technology. This specification describes the general-purpose container technology in the context of encapsulating OEBPS publications and OPTIONAL alternate renditions thereof. It is however anticipated that the general-purpose container technology described herein will ultimately be used in other bundling applications.

As a general container format, OCF collects a related set of files into a single-file container. OCF can be used to collect files in various document formats and for classes of applications. The single-file container enables easy transport of, management of, and random access to, the collection.

OCF defines the rules for mapping an abstract collection of files (the “abstract container”) to a physical representation within a ZIP archive (the “physical container”). The rules for ZIP containers build upon and are backward compatible with the ZIP technologies used by Open Document Format (ODF) 1.0.

OCF is the RECOMMENDED single-file container technology for OEBPS publications. OCF MAY play a role in the following workflows:

1.2 Definitions

ASCII
American Standard Code for Information Interchange – a 7-bit character encoding based on the English alphabet (ANSI X3.4-1986). When used in this document, ASCII refers to the printable graphic characters in the range 33 (decimal) through 126 (decimal) and the nonprintable space character 32 (decimal).
IRI
Internationalized Resource Identifier (http://www.ietf.org/rfc/rfc3987.txt).
OCF Container
A container file that is compliant with the format defined in this specification.
OCF
The OEBPS Container Format defined by this specification.
Content Provider
A publisher, author, individual, or other information source that provides a publication to distribution or sales channels or directly to one or more OCF Reading Systems using OCF as described in this specification.
ODF
Open Document Format (http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf).
OEBPS
Open eBook Publication Structure (http://www.idpf.org/oebps/oebps1.2/index.htm).
OEBPS Document
An XML markup document that conforms to the OEBPS 1.2 specification – generally containing textual content of an OEBPS Publication.
OEBPS Package
An XML file that describes an OEBPS Publication as defined by the OEBPS 1.2 specification. It identifies all other files in the publication and provides descriptive information about them.
OEBPS Publication
A collection of OEBPS Documents, an OEBPS Package file, and other files, typically in a variety of media types, including structured text and graphics that constitute a cohesive unit for publication, as defined by the OEBPS 1.2 specification.
OCF Reading System
A combination of hardware and/or software that accepts OEBPS Publications (packaged in an OCF Container) and makes them available to consumers of the content. Great variety is possible in the architecture of OCF Reading Systems. An OCF Reading System MAY be implemented entirely on one device, or it MAY be split among several computers. In particular, a reading device that is a component of an OCF Reading System need not directly accept OCF Packaged OEBPS Publications, but all OCF Reading Systems MUST do so. OCF Reading Systems MAY include additional processing functions, such as compression, indexing, encryption, rights management, and distribution.
MIME
Multipurpose Internet Mail Extensions (http://www.isi.edu/in-notes/rfc2045.txt). “MIME media types” provide a standard methodology for specifying the content type of objects.
RFC
Literally “Request For Comments”, but more generally a document published by the Internet Engineering Task Force (IETF). See http://www.ietf.org/rfc.html.
Relax NG
A schema language for XML (http://www.relaxng.org/).
Rootfile
The top-level file of a rendition of a publication; either the “root” from which all other components can be found or the lone file encapsulating the rendition. The OEBPS rootfile is the OEBPS Package file. A PDF file containing the PDF rendition could also be a rootfile.
XML
Extensible Markup Language (http://www.w3.org/TR/xml11/).
ZIP
A de facto industry standard bundling and compression format (http://www.pkware.com/business_and_developers/developer/appnote).

1.3 Relationship to Other Specifications

This specification combines subsets and applications of other specifications. Together, these facilitate the construction, organization, presentation, and unambiguous interchange of electronic documents:

  1. The XML 1.1 Extensible Markup Language specification (http://www.w3.org/TR/xml11/); and
  2. The OEBPS 1.2 Open eBook Publication Structure specification (http://www.idpf.org/oebps/oebps1.2/index.htm); and
  3. The XML 1.1 namespace specification (http://www.w3.org/TR/xml-names11/); and
  4. The Unicode Standard, Version 4.0 (Addison-Wesley, 2003), as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database); and
  5. Particular MIME media types (http://www.ietf.org/rfc/rfc4288.txt and http://www.iana.org/assignments/media-types/index.html); and
  6. Open Document Format for Office Applications (Open Document) v1.0 (http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf); and
  7. ZIP format (http://www.pkware.com/business_and_developers/developer/appnote); and
  8. XML-Signature Syntax and Processing (http://www.w3.org/TR/2002/REC-xmldsig-core-20020212); and
  9. XML Encryption Syntax and Processing (http://www.w3.org/TR/2002/REC-xmlenc-core-20021210); and
  10. Web Content Accessibility Guidelines 1.0 (http://www.w3.org/TR/WCAG10/).

1.4 Conformance

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document MUST be interpreted as described in http://www.ietf.org/rfc/rfc2119.txt.

This section defines conformance requirements for OCF.

1.4.1 Conforming Containers

The term “Conforming OCF Abstract Container” indicates an OCF Abstract Container (see Section 2.2) that conforms to all of the relevant conformance criteria defined in this specification. The term “Conforming OCF ZIP Container” indicates a ZIP archive that conforms to the relevant ZIP container conformance criteria (see Section 4) and whose contents constitute a Conforming OCF Abstract Container.

In addition to other conformance criteria defined in this specification, a Conforming OCF Abstract Container MUST meet the following conditions:

1.4.2 Conforming Reading Systems

The term “Conforming OCF Reading System” indicates an OCF Reading System that supports all of the mandatory features defined by this specification.

An OCF Reading System that does not support all of the features defined in this specification MUST NOT claim to be a Conforming OCF Reading System and SHOULD provide readily available documentation of the subset of features it supports.

An OCF Reading System SHOULD provide readily available documentation of the accessibility features it supports. This documentation SHOULD conform to the relevant version of the W3C’s Web Content Accessibility Guidelines. (See Section 1.5.)

1.5 Accessibility

E-books MAY provide an accessible reading experience for users with disabilities provided authors and publishers conform to accepted industry standards for the creation of accessible electronic materials. OEBPS publications packaged or delivered using OCF SHOULD conform to the accessibility standards set forth by the OEBPS Working Group (http://www.idpf.org/idpf_groups/oebpswg.htm) to ensure that the broadest possible set of users will have access to books delivered in this format. This includes adherence to the W3C’s Web Content Accessibility Guidelines 1.0 (http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/) or, if it is released while the Working Group is active, the Web Content Accessibility Guidelines 2.0 (the current draft is available at http://www.w3.org/TR/WCAG20/). OEBPS publications packaged or delivered using OCF MUST NOT interfere with any features intended to deliver accessible content, regardless of how that content is rendered (e.g., using an OCF Reading System-specific delivery format or OEBPS as the delivery format from within the container).

1.6 Future Directions

It is the intent of the contributors to this specification that subsequent versions of this specification continue in the directions established by the 1.0 release. Specifically:

Future versions of the OCF specification MAY include:

2 OCF Overview

2.1 OCF: A General Container Technology

OCF is purposely designed as a general container technology that can be used by other file formats, not just OEBPS. In particular, OCF is purposely designed to be upwardly compatible with the container technology used in ODF 1.0 such that a future version of ODF might use OCF.

2.2 “Abstract Container” vs. “Physical Container”

An “Abstract Container” defines a file system model for the contents of the container. The file system model MUST have a single common root directory for all of the contents of the container. The special files REQUIRED by OCF MUST be included within the META-INF directory that is a direct child of the root directory. All (non-remote) electronic assets for embedded publications MUST be located within the directory tree headed by the container’s root directory.

A “Physical Container” holds the physical manifestation of an abstract container. This specification defines how an abstract container MUST be mapped to the following two physical container technologies:

Publications MUST render the same no matter whether using a File System Container or a ZIP Container. In both cases, the OCF Reading System ultimately opens the rootfile for the Publication, from which it can determine how to render the Publication.

2.3 Examples

(This section is informative.)

This section includes an example of a single rendition and a multiple rendition container. See Section 3.5.1 for normative descriptions.

2.3.1 Example: a simple Publication

To illustrate the concepts from the previous section, let’s assume we have a single OEBPS 1.2 Publication of Dickens’ “Great Expectations” which consists of an OEBPS 1.2 package file (“Great Expectations.opf”) and a large number of HTML files, one for the cover page (e.g., “cover.html”) and one for each chapter (e.g., “chapter01.html”). The contents of the publication might be as follows:

OEBPS 1.2 Publication:
Great Expectations.opf
cover.html
chapters/
   chapter01.html
   chapter02.html
   … other HTML files for the remaining chapters

The contents of the Abstract Container include all of the assets from the Publication, plus a small number of files defined by OCF within the META-INF directory. Note that container.xml is REQUIRED in all circumstances. See Section 3 for descriptions of the files within the META-INF directory.

Abstract Container:
META-INF/
   container.xml
   [manifest.xml]
   [metadata.xml]
   [signatures.xml]
   [encryption.xml]
   [rights.xml]
OEBPS/
   Great Expectations.opf
   cover.html
   chapters/
      chapter01.html
      chapter02.html
      … other HTML files for the remaining chapters …

When the above abstract container is mapped to a File System Container, the directory structure within the file system exactly matches the OCF’s Abstract Container directory structure shown above:

File System Container: some directory within the file system…/
   META-INF/
      container.xml
      [manifest.xml]
      [metadata.xml]
      [signatures.xml]
      [encryption.xml]
      [rights.xml]
   OEBPS/
      Great Expectations.opf
      cover.html
      chapters/
         chapter01.html
         chapter02.html
         … other HTML files for the remaining chapters …

When the above Abstract Container is stored within a ZIP container, the contents of the ZIP archive will match the directory structure shown above, but MUST also contain a “mimetype” file as the first file in the ZIP archive to aid in the easy identification of the media type of the container. (See section 3.4.)

ZIP Container:
mimetype
META-INF/
   container.xml
   [manifest.xml]
   [metadata.xml]
   [signatures.xml]
   [encryption.xml]
   [rights.xml]
OEBPS/
   Great Expectations.opf
   cover.html
   chapters/
      chapter01.html
      chapter02.html
      … other HTML files for the remaining chapters
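
The ZIP layout above can be produced with a short Python sketch that uses the standard `zipfile` module. The helper name `write_ocf_zip` and the placeholder file contents are assumptions made for illustration; the media type string is the one given in Section 3.4. The sketch only demonstrates the ordering and storage of the “mimetype” entry and does not attempt to cover every ZIP container requirement.

```python
import io
import zipfile

OCF_MEDIA_TYPE = "application/epub+zip"  # value given in Section 3.4


def write_ocf_zip(target, files):
    """Write `files` (a dict of path name -> bytes) as a ZIP container.

    `write_ocf_zip` is a hypothetical helper, not part of any OCF API.
    The "mimetype" entry is written first and stored uncompressed.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # A bare ZipInfo defaults to ZIP_STORED (no compression).
        zf.writestr(zipfile.ZipInfo("mimetype"), OCF_MEDIA_TYPE.encode("ascii"))
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
write_ocf_zip(buf, {
    "META-INF/container.xml": b"<container/>",      # placeholder content
    "OEBPS/Great Expectations.opf": b"<package/>",  # placeholder content
})
names = zipfile.ZipFile(buf).namelist()
```

Here `names` lists the entries in archive order, so its first element can be compared against “mimetype”.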

The corresponding META-INF/container.xml file might appear as follows:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml" />
  </rootfiles>
</container>

N.B. The use of the specific namespace string “urn:oasis:names:tc:opendocument:xmlns:container” should be considered provisional until approved by an OASIS technical committee.

2.3.2 Example: alternate renditions

In some circumstances, an OCF container might hold multiple renditions of the same publication. An example is a container that has an OEBPS Publication as the primary rendition for viewing but includes an alternate PDF for printing. To avoid name conflicts, it is RECOMMENDED that each rendition be placed within its own subdirectory and that multiple <rootfile> elements be defined within container.xml. Here is an example:

Abstract Container:
META-INF/
   container.xml – Note: includes multiple <rootfile> elements
   [manifest.xml]
   [metadata.xml]
   [signatures.xml]
   [encryption.xml]
   [rights.xml]
OEBPS/
   Great Expectations.opf
   cover.html
   chapters/
      chapter01.html
      chapter02.html
      … other HTML files for the remaining chapters …
PDF/
   Great Expectations.pdf

The corresponding META-INF/container.xml file might appear as follows:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/Great Expectations.pdf"
              media-type="application/pdf" />
  </rootfiles>
</container>

3 OCF Container Contents

3.1 File and directory structure

The virtual file system for the OCF “Abstract Container” MUST have a single common root directory for all of the contents of the container.

The following file names in the root directory are reserved:

The “mimetype” file is discussed in Section 4. The META-INF/ directory contains the reserved files used by OCF. These reserved files are described in the following sections. All other files used by the publication rendition(s) within the Abstract Container MAY be in any location descendant from the root directory except for “mimetype” at the root level or within the META-INF directory.

It is RECOMMENDED that the contents of individual publications be stored within dedicated sub-directories to minimize potential file name collisions in the event that multiple renditions are used or that multiple publications per container are supported in future versions of this Specification.

3.2 Relative IRIs for referencing other components

Files within the Abstract Container reference each other via Relative IRI References (http://www.ietf.org/rfc/rfc3987.txt and http://www.ietf.org/rfc/rfc3986.txt), no matter what is used for the physical container (e.g., File System Container or ZIP Container). For example, if a file named “chapter1.html” references an image file named “image1.jpg” that is located in the same directory, then “chapter1.html” might contain the following as part of its content:

<img src="image1.jpg" alt="…" …/>

For Relative IRI References, the Base IRI (see RFC3986) is determined by the relevant language specifications for the given file formats. For example, the CSS specification defines how relative IRI references work in the context of CSS style sheets and property declarations. Note that some language specifications reference RFCs that preceded RFC 3987, in which case the earlier RFC applies for content in that particular language.

Unlike most language specifications, the Base IRIs for all files within the META-INF/ directory use the root folder for the Abstract Container as the default Base IRI. For example, if META-INF/container.xml has the following content:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml" />
  </rootfiles>
</container>
the path “OEBPS/Great Expectations.opf” is relative to the root directory for the Abstract Container and not relative to the META-INF/ directory.
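
The difference between the two base rules can be sketched in Python. The helper `resolve_in_container` is hypothetical. It assumes that files outside META-INF/ resolve references against their own directory, which is the usual rule but is ultimately set by each format's specification, and it handles plain relative paths only (no scheme, query, or fragment).

```python
import posixpath


def resolve_in_container(referrer, reference):
    """Resolve a relative reference to a path from the container root.

    Hypothetical helper. Files under META-INF/ resolve against the root
    directory; other files are assumed to resolve against the directory
    that contains them.
    """
    if referrer.startswith("META-INF/"):
        base = ""  # root directory of the Abstract Container
    else:
        base = posixpath.dirname(referrer)
    return posixpath.normpath(posixpath.join(base, reference))


from_container_xml = resolve_in_container(
    "META-INF/container.xml", "OEBPS/Great Expectations.opf")
from_chapter = resolve_in_container(
    "OEBPS/chapters/chapter01.html", "../cover.html")
```

With these inputs, the reference from container.xml stays relative to the root directory, while the reference from the chapter file is resolved against OEBPS/chapters/.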

3.3 File Names

The term File Name represents the name of any type of file, either a directory or an ordinary file within a directory within an Abstract Container. For a given directory within the Abstract Container, the Path Name is a string holding all directory names in the full path concatenated together with a “/” character separating the directory names. For a given file within the Abstract Container, the Path Name is the string holding all directory names concatenated together with a “/” character separating the directory names, followed by a “/” character and then the name of the file. The File Name restrictions described below are designed to allow directory names and file names to be used without modification on most commonly used operating systems. This specification does not specify how an OCF Reading System that is unable to represent OCF conforming File Names would compensate for this incompatibility.

The following statements apply to Conforming OCF Content:

Note that some commercial ZIP tools do not support the full Unicode range and may only support the ASCII range for File Names. Content creators who want to use ZIP tools that have these restrictions MAY find it is best to restrict their File Names to the ASCII range. If the names of files cannot be preserved during the unzipping process, it will be necessary to compensate for any name translation which took place when the files are referenced by URI from within the content.

3.4 Container media type identification

It is frequently necessary for applications to determine the media type of a file. This is usually accomplished by looking at the file extension of the file. This gives applications a quick way to determine the type of the file without looking inside the file. OCF Container files SHOULD use an extension “.epub” to identify to processing applications that they are OCF Containers.

In order to translate a file extension into a media type, typically a processing agent will register the relationship between file extension and media type with the operating system. Applications that are interested in OCF Container files SHOULD register the media type of “application/epub+zip” as corresponding to the file extension of “.epub”.

Unfortunately, the identification of files through the use of file extensions is notoriously unreliable. As a result, it is desirable to have a more robust way of identifying files independent of their file names or extensions. One mechanism that has evolved for doing this is to require the placement of specific information at specific file offsets. A processing agent can then check a fixed location to determine if the file is an OCF Container.

The method that has evolved for doing this in ZIP archives is the inclusion of an uncompressed, unencrypted file called “mimetype” as the first file in the ZIP archive. The contents of this file are the media type of the file. OCF Containers MUST place the ASCII string “application/epub+zip” in the “mimetype” file as the first file in the ZIP archive. See Section 4 for more detail on this mechanism.
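
As an illustration of the fixed-location check, the following Python sketch inspects the first local file header of a ZIP archive. The helper name `looks_like_ocf_zip` is an assumption. The field offsets follow the ZIP local file header layout (compression method at byte 8, file name length at byte 26, extra field length at byte 28, file name at byte 30). When the extra field is empty, the media type string therefore begins at byte offset 38. This is a quick screening test, not a full conformance check.

```python
import io
import struct
import zipfile

OCF_MEDIA_TYPE = b"application/epub+zip"


def looks_like_ocf_zip(data):
    """Return True if the leading bytes match an OCF ZIP Container.

    Hypothetical helper: checks that the first entry is an uncompressed
    file named "mimetype" whose content starts with the OCF media type.
    """
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return False
    method, = struct.unpack_from("<H", data, 8)  # 0 means stored
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]
    start = 30 + name_len + extra_len
    body = data[start:start + len(OCF_MEDIA_TYPE)]
    return method == 0 and name == b"mimetype" and body == OCF_MEDIA_TYPE


# Build a tiny archive in memory to exercise the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("mimetype"), OCF_MEDIA_TYPE)
is_ocf = looks_like_ocf_zip(buf.getvalue())
```

Because the check reads only the first few dozen bytes, a processing agent can apply it without parsing the ZIP central directory.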

3.5 META-INF

All valid OCF Containers MUST include a directory called “META-INF” at the root level of the container file system. This directory contains the files specified below that describe the contents, metadata, signatures, encryption, rights and other information about the contained publication.

The semantics of the following files that MAY be present at the “META-INF/” level are specified. All other files found at the “META-INF/” level MUST be ignored by conformant OCF Reading Systems.

3.5.1 Container – META-INF/container.xml (Required)

(This is normative.)

All valid OCF Containers MUST include a file called “container.xml” within the “META-INF” directory at the root level of the container file system. The container.xml file MUST identify the MIME type of, and path to, the rootfile for the OEBPS version of the publication and any OPTIONAL alternate renditions included within the container.

The container.xml file MUST NOT be encrypted.

The container.xml file contains XML that uses the “urn:oasis:names:tc:opendocument:xmlns:container” namespace for all of its elements and attributes. The “version="1.0"” attribute MUST be included for all containers that conform to this version of the specification.

A RELAX NG OCF schema describing the <container> element that MUST be the root element of container.xml can be found in Appendix A.

The <rootfiles> element MUST contain at least one <rootfile> element that has a media-type of “application/oebps-package+xml”. Only one <rootfile> element with a media-type of “application/oebps-package+xml” SHOULD be included. The file referenced by the first <rootfile> element that has a media-type of “application/oebps-package+xml” will be considered the OEBPS rootfile. The OEBPS rootfile (the OEBPS package file) MUST NOT be encrypted.

Each <rootfile> element specifies the rootfile of a single rendition of the contained publication. A rootfile often includes an enumeration of the other files needed by the rendition. In the case of OEBPS, the root will be the OEBPS Package file for the OEBPS rendition of the publication, whose <manifest> element enumerates the other files used by the OEBPS rendition. In other cases, the rootfile MAY be the only file needed by the rendition.

(This example is informative.)

The following example shows a sample container.xml for an OCF container inside of which is an OEBPS Publication with the root file “OEBPS/My Crazy Life.opf” (the OEBPS package file):

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/My Crazy Life.opf"
              media-type="application/oebps-package+xml" />
  </rootfiles>
</container>

(This example is informative.)

The following example adds an alternate PDF version of the Publication:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/My Crazy Life.opf"
              media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/My Crazy Life.pdf"
              media-type="application/pdf" />
  </rootfiles>
</container>
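
(This example is informative.)

The rule that the first <rootfile> element with a media-type of “application/oebps-package+xml” is the OEBPS rootfile might be implemented along the following lines. This is a minimal Python sketch using the standard `xml.etree.ElementTree` module; the function name `find_oebps_rootfile` and the sample document (which deliberately lists the PDF rendition first) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OEBPS_PACKAGE = "application/oebps-package+xml"

SAMPLE = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="PDF/My Crazy Life.pdf"
              media-type="application/pdf" />
    <rootfile full-path="OEBPS/My Crazy Life.opf"
              media-type="application/oebps-package+xml" />
  </rootfiles>
</container>"""


def find_oebps_rootfile(container_xml):
    """Return the full-path of the first OEBPS rootfile, or None."""
    root = ET.fromstring(container_xml)
    # ElementTree writes qualified names as "{namespace}localname".
    path = f"{{{OCF_NS}}}rootfiles/{{{OCF_NS}}}rootfile"
    for rootfile in root.iterfind(path):
        if rootfile.get("media-type") == OEBPS_PACKAGE:
            return rootfile.get("full-path")
    return None


oebps_rootfile = find_oebps_rootfile(SAMPLE)
```

The sketch skips the PDF rootfile because its media-type does not match, and returns the path of the package file.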

(This is normative.)

The <manifest> element contained within the OEBPS root package file specifies the one and only manifest used for OEBPS processing. Ancillary manifest information contained in the ZIP archive or in the OPTIONAL “manifest.xml” file MUST NOT be used for OEBPS processing purposes. Any extra files in the ZIP archive (i.e., files within the ZIP archive that are not listed within the package files’ <manifest> element, such as META-INF files or alternate derived renditions of the publication) MUST NOT be used in the processing of the OEBPS publication.

The values of the full-path attributes MUST contain a “path component” (as defined by RFC3986) which MUST only take the form of a “path-rootless” (as defined by RFC3986). The path components are relative to the root of the container in which they are used.
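
(This example is informative.)

A structural check for this constraint might look like the following Python sketch. The helper `is_rootless_path` is hypothetical. It verifies only the overall shape of the value (no scheme, authority, query, or fragment, and a non-empty first segment) and does not validate individual characters against the RFC3986 grammar.

```python
from urllib.parse import urlsplit


def is_rootless_path(full_path):
    """Structural check that `full_path` is a bare, rootless path.

    Hypothetical helper; character-level validation is out of scope.
    """
    parts = urlsplit(full_path)
    if parts.scheme or parts.netloc or parts.query or parts.fragment:
        return False
    # A rootless path must not start with "/" and must not be empty,
    # so its first segment has to be non-empty.
    first_segment = parts.path.split("/", 1)[0]
    return first_segment != ""


ok = is_rootless_path("OEBPS/Great Expectations.opf")
rooted = is_rootless_path("/OEBPS/Great Expectations.opf")
```

(This is normative.)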

Conforming OCF User Agents MUST ignore unrecognized elements (and their contents) and unrecognized attributes within a container.xml file, including unrecognized elements and unrecognized attributes from other namespaces.

Conforming container.xml files MUST be valid according to the RELAX NG OCF schema with the <container> element as the root element after removing all elements (and child nodes of these elements) and attributes from other namespaces.

(This example is informative.)

For example:

<?xml version="1.0"?>
<container version="1.0"
              xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
              xmlns:foo="..."
              xmlns:foozle="...">
  <foo:bar />
  <rootfiles foozle:identifier="bar">
  ...
  </rootfiles>
</container>
is conformant, but:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <foo />
  <rootfiles>
  ...
  </rootfiles>
</container>
is non-conformant due to the non-namespace-qualified use of the <foo> element.
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles identifier="bar">
  ...
  </rootfiles>
</container>
is also non-conformant due to the non-namespace-qualified use of the “identifier” attribute on the <rootfiles> element.
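
The removal step that precedes schema validation can be sketched in Python with the standard `xml.etree.ElementTree` module. The helper `strip_foreign` and the namespace names `urn:example:foo` and `urn:example:foozle` are hypothetical, chosen only so that the sample parses. Elements and attributes that are unqualified or in the OCF namespace are left in place, so that the schema can accept or reject them.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OCF_PREFIX = "{" + OCF_NS + "}"

SAMPLE = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:foo="urn:example:foo" xmlns:foozle="urn:example:foozle">
  <foo:bar />
  <rootfiles foozle:identifier="bar" />
</container>"""


def is_foreign(name):
    """True for a qualified name whose namespace is not the OCF one."""
    return name.startswith("{") and not name.startswith(OCF_PREFIX)


def strip_foreign(elem):
    """Remove, in place, elements and attributes from other namespaces."""
    for child in list(elem):
        if is_foreign(child.tag):
            elem.remove(child)  # drops the element and its contents
        else:
            strip_foreign(child)
    for name in list(elem.attrib):
        if is_foreign(name):
            del elem.attrib[name]


root = ET.fromstring(SAMPLE)
strip_foreign(root)
remaining = [child.tag for child in root]
```

After the call, only the <rootfiles> element remains under <container>, without its foozle:identifier attribute; the result is what would then be handed to a RELAX NG validator.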

3.5.2 Manifest – META-INF/manifest.xml (Optional)

An OPTIONAL file with the reserved name “manifest.xml” within the “META-INF” directory at the root level of the container may appear in a valid OCF container. If present, the file’s content MUST be as defined in the ODF 1.0 manifest schema (http://www.oasis-open.org/committees/download.php/12570/OpenDocument-manifest-schema-v1.0-os.rng).

The manifest.xml file, if present, MUST NOT be encrypted.

3.5.3 Metadata – META-INF/metadata.xml (Optional)

A file with the reserved name “metadata.xml” within the “META-INF” directory at the root level of the container file system may appear in a valid OCF container. This file, if present, MUST be used for container-level metadata. In version 1.0 of OCF, no such container-level metadata is specified. It is in this file that future innovation and extension SHOULD occur.

If the “META-INF/metadata.xml” file exists, its contents MUST be valid XML with namespace-qualified elements to avoid collision with future versions of OCF that MAY specify a particular grammar and namespace for elements and attributes within this file.

The metadata.xml file, if present, MUST NOT be encrypted.

3.5.4 Digital Signatures – META-INF/signatures.xml (Optional)

An OPTIONAL “signatures.xml” file within the “META-INF” directory at the root level of the container file system holds digital signatures of the container and its contents. This file is an XML document whose root element is <signatures>. The <signatures> element contains child elements of type <Signature> as defined by “XML-Signature Syntax and Processing” (http://www.w3.org/TR/2002/REC-xmldsig-core-20020212). Signatures can be applied to the publication and any alternate renditions as a whole or to parts of the publication and renditions. XML Signature can specify the signing of any kind of data, not just XML.

The signatures.xml file MUST NOT be encrypted.

When the signatures.xml file is not present, the OCF container provides no information indicating any part of the container is digitally signed at the container level. It is however possible that digital signing exists within any optional alternate contained renditions.

A RELAX NG OCF schema describing the <signatures> element that MUST be the root element of signatures.xml can be found in Appendix A.

When an OCF agent creates a signature of data in a container, it SHOULD add the new signature as the last child <Signature> element of the <signatures> element in the signatures.xml file.

Each <Signature> in the signatures.xml file identifies by IRI the data to which the signature applies, using the XML Signature <Manifest> element and its <Reference> sub-elements. Individual contained files MAY be signed separately or together. Separately signing each file creates a digest value for the resource that can be validated independently. This approach MAY make a Signature element larger. If files are signed together, the set of signed files can be listed in a single XML Signature <Manifest> element and referenced by one or more <Signature> elements.

Any or all files in the container can be signed in their entirety with the exception of the signatures.xml file since that file will contain the computed signature information. Whether and how the signatures.xml file SHOULD be signed depends on the objective of the signer.

XML-Signature does not associate any semantics with a signature; however, an agent MAY include semantic information, for example, by adding information to the Signature element that describes the signature. XML Signature describes how additional information can be added to a signature (for example, by using the SignatureProperties element).

(This example is informative.)

The following XML expression shows the content of an example “signatures.xml” file, and is based on the examples found in Section 2 of “XML-Signature Syntax and Processing.” It contains one signature, and the signature applies to two resources, OEBFPS/book.xml and OEBFPS/images/cover.jpeg, in the container.

<signatures>
  <Signature Id="MyFirstSignature" xmlns="http://www.w3.org/2000/09/xmldsig#">
    <SignedInfo>
      <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
      <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
      <Reference URI="#Manifest1">
        <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
        <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
      </Reference>
    </SignedInfo>
    <SignatureValue>MC0CFFrVLtRlk=...</SignatureValue>
    <KeyInfo>
      <KeyValue>
        <DSAKeyValue>
          <P>...</P><Q>...</Q><G>...</G><Y>...</Y>
        </DSAKeyValue>
      </KeyValue>
    </KeyInfo>
    <Object>
      <Manifest Id="Manifest1">
        <Reference URI="OEBFPS/book.xml">
          <Transforms>
            <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
          </Transforms>
        </Reference>
        <Reference URI="OEBFPS/images/cover.jpeg"/>
      </Manifest>
    </Object>
  </Signature>
</signatures>

3.5.5 Encryption – META-INF/encryption.xml (Optional)

An OPTIONAL “encryption.xml” file within the “META-INF” directory at the root level of the container file system holds all encryption information on the contents of the container. This file is an XML document whose root element is <encryption>. The <encryption> element contains child elements of type <EncryptedKey> and <EncryptedData> as defined by “XML Encryption Syntax and Processing” (http://www.w3.org/TR/2002/REC-xmlenc-core-20021210). Each EncryptedKey element describes how one or more container files are encrypted. Consequently, if any resource within the container is encrypted, “encryption.xml” MUST be present to indicate that the resource is encrypted and provide information on how it is encrypted.

An <EncryptedKey> element describes each encryption key used in the container, while an <EncryptedData> element describes each encrypted file. Each <EncryptedData> element refers to an <EncryptedKey> element, as described in XML Encryption.

A RELAX NG OCF schema describing the <encryption> element, which MUST be the root element of encryption.xml, can be found in Appendix A.

When the encryption.xml file is not present, the OCF container provides no information indicating any part of the container is encrypted.

OCF encrypts individual files independently, trading off some security for improved performance, allowing the container contents to be incrementally decrypted. Encryption in this way still exposes the directory structure and file naming of the whole package.

OCF uses XML Encryption to provide a framework for encryption, allowing a variety of algorithms to be used. XML Encryption specifies a process for encrypting arbitrary data and representing the result in XML. Even though an OCF container MAY contain non-XML data, XML Encryption can be used to encrypt all data in an OCF container. OCF encryption supports only encryption of whole files. The encryption.xml file, if present, MUST NOT be encrypted.

Encrypted data replaces unencrypted data in an OCF container. For example, if an image named “photo.jpeg” is encrypted, the contents of the photo.jpeg resource SHOULD be replaced by its encrypted contents. When stored in a ZIP container, streams of data MUST be compressed before they are encrypted; Flate compression MUST be used. Within the ZIP directory, encrypted files SHOULD be stored rather than Flate-compressed.
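(This sketch is informative.) The ordering described above (compress, then encrypt, then store without further compression) could be sketched in Python as follows. Several things here are assumptions rather than requirements of this specification: the helper names deflate, add_encrypted and toy_cipher are hypothetical; “Flate” is taken to mean a raw Deflate stream like the one used inside ZIP entries; and toy_cipher is a placeholder that provides no security and merely stands in for a real cipher.

```python
import io
import zipfile
import zlib


def deflate(data: bytes) -> bytes:
    """Compress with raw Deflate (negative wbits: no zlib header or trailer)."""
    comp = zlib.compressobj(level=9, wbits=-15)
    return comp.compress(data) + comp.flush()


def add_encrypted(zf: zipfile.ZipFile, name: str, data: bytes, encrypt) -> None:
    """Compress first, then encrypt, then store the result uncompressed."""
    ciphertext = encrypt(deflate(data))
    zf.writestr(zipfile.ZipInfo(name), ciphertext,
                compress_type=zipfile.ZIP_STORED)


def toy_cipher(data: bytes) -> bytes:
    # Placeholder only: XOR with a constant is NOT encryption.
    # A real packager would apply the algorithm named in encryption.xml.
    return bytes(b ^ 0x5A for b in data)


ORIGINAL = b"example image bytes" * 50

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_encrypted(zf, "photo.jpeg", ORIGINAL, toy_cipher)

# Reading back: the entry is stored, and undoing the placeholder cipher
# followed by inflating recovers the original bytes.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("photo.jpeg")
    stored = zf.read("photo.jpeg")
plain = zlib.decompress(toy_cipher(stored), wbits=-15)
```

Storing the entry rather than compressing it again reflects the fact that well-encrypted data is effectively incompressible, so a second Flate pass would cost time without saving space.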

The following files MUST never be encrypted (regardless of whether default or specific encryption is requested):

Signed resources MAY subsequently be encrypted by using the Decryption Transform for XML Signature. This feature enables an application such as an OCF agent to distinguish data that was encrypted before signing from data that was encrypted after signing. Only data that was encrypted after signing MUST be decrypted before computing the digest used to validate the signature.

(This example is informative.)

In the following example, adapted from Section 2.2.1 of “XML Encryption Syntax and Processing,” the resource image.jpeg is encrypted using a symmetric key algorithm (AES), and the symmetric key is in turn encrypted using an asymmetric key algorithm (RSA) with John Smith’s key.

<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
            xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <enc:EncryptedKey Id="EK">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
    <ds:KeyInfo>
      <ds:KeyName>John Smith</ds:KeyName>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherValue>xyzabc</enc:CipherValue>
    </enc:CipherData>
  </enc:EncryptedKey>
  <enc:EncryptedData Id="ED1">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
    <ds:KeyInfo>
      <ds:RetrievalMethod URI="#EK"
              Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherReference URI="image.jpeg"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>
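(This sketch is informative.) A reading system needs to know which files are encrypted before it tries to interpret them. The following Python sketch, with the hypothetical function name encrypted_resources, maps each CipherReference URI in an encryption.xml document to the Id of the EncryptedKey it points at. It performs no decryption and assumes the RetrievalMethod URI is a same-document fragment such as “#EK”.

```python
import xml.etree.ElementTree as ET

ENC = "http://www.w3.org/2001/04/xmlenc#"
DS = "http://www.w3.org/2000/09/xmldsig#"


def encrypted_resources(encryption_xml: str) -> dict[str, str]:
    """Map each encrypted file path to the Id of the key it refers to."""
    root = ET.fromstring(encryption_xml)
    result = {}
    for data in root.iterfind(f"{{{ENC}}}EncryptedData"):
        ref = data.find(f"{{{ENC}}}CipherData/{{{ENC}}}CipherReference")
        if ref is None:
            continue  # no external file referenced by this entry
        key = data.find(f"{{{DS}}}KeyInfo/{{{DS}}}RetrievalMethod")
        key_id = key.get("URI", "").lstrip("#") if key is not None else ""
        result[ref.get("URI")] = key_id
    return result


# Trimmed stand-in for the example above (EncryptedKey details omitted).
SAMPLE = """\
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
            xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <enc:EncryptedKey Id="EK"/>
  <enc:EncryptedData Id="ED1">
    <ds:KeyInfo>
      <ds:RetrievalMethod URI="#EK"
              Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherReference URI="image.jpeg"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>
"""

print(encrypted_resources(SAMPLE))
```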

3.5.6 Rights Management – META-INF/rights.xml (Optional)

An OPTIONAL file with the name “rights.xml” within the “META-INF” directory at the root level of the container file system is a reserved name in a valid OCF container. This location is reserved for digital rights management (DRM) information for trusted exchange of Publications among rights holders, intermediaries, and users. In version 1.0 of OCF, there is not a REQUIRED format for DRM information, but a future version of this specification MAY specify a particular format for DRM information.

If the “META-INF/rights.xml” file exists, it MUST be a well-formed XML document that conforms to XML Namespaces, and its contents SHOULD be valid XML with namespace-qualified elements to avoid collision with future versions of OCF that MAY specify a particular format for this file.

The rights.xml file MUST NOT be encrypted.

When the rights.xml file is not present, the OCF container provides no information indicating any part of the container is rights governed.

4 ZIP Container

OCF’s ZIP Container supports the ZIP format as specified by the application note at http://www.pkware.com/business_and_developers/developer/appnote/, but with the following constraints and clarifications:

Here are some details about particular fields in the ZIP archive:

The first file in the ZIP Container MUST be a file by the ASCII name of ‘mimetype’ which holds the MIME type for the ZIP Container (i.e., “application/epub+zip” as an ASCII string; no padding, white-space or case change). The file MUST be neither compressed nor encrypted and there MUST NOT be an extra field in its ZIP header. If this is done, then the ZIP Container offers convenient “magic number” support as described in RFC 2048 and the following will hold true:
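(This sketch is informative.) The effect of the mimetype rule can be observed with a short Python sketch that writes the mimetype entry first, stored and without an extra field, and then inspects the raw bytes. The function name new_container and the placeholder second entry are assumptions made for this illustration. The offsets follow from the ZIP local file header, whose fixed part is 30 bytes long and is followed immediately by the file name.

```python
import io
import struct
import zipfile

MIMETYPE = b"application/epub+zip"


def new_container(buf: io.BytesIO) -> zipfile.ZipFile:
    """Open a ZIP for writing with the uncompressed mimetype entry first."""
    zf = zipfile.ZipFile(buf, "w")
    zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                compress_type=zipfile.ZIP_STORED)
    return zf


buf = io.BytesIO()
with new_container(buf) as zf:
    # Placeholder content only; a real container.xml lists the rootfiles.
    zf.writestr("META-INF/container.xml", "<container/>")

raw = buf.getvalue()
# Length of the extra field, a little-endian 16-bit value at offset 28.
extra_len = struct.unpack_from("<H", raw, 28)[0]

print(raw[:2])                         # ZIP signature starts with "PK"
print(raw[30:38])                      # file name of the first entry
print(raw[38:38 + len(MIMETYPE)])      # its uncompressed content
```

Because the entry is neither compressed nor encrypted and carries no extra field, a tool can identify the container by reading a fixed range of bytes at the start of the file, without parsing the ZIP directory.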

APPENDIX A: RELAX NG OCF Schema

<?xml version="1.0" encoding="UTF-8"?>
<choice xmlns="http://relaxng.org/ns/structure/1.0">

  <element name="container">
    <attribute name="version">
      <value>1.0</value>
    </attribute>
    <attribute name="xmlns">
      <value>urn:oasis:names:tc:opendocument:xmlns:container</value>
    </attribute>
    <element name="rootfiles">
      <oneOrMore>
        <element name="rootfile">
          <attribute name="full-path">
            <text/>
          </attribute>
          <attribute name="media-type">
            <text/>
          </attribute>
        </element>
      </oneOrMore>
    </element>
  </element>

  <element name="signatures">
    <attribute name="xmlns">
      <value>urn:oasis:names:tc:opendocument:xmlns:container</value>
    </attribute>
    <oneOrMore>
      <element name="Signature" ns="http://www.w3.org/2000/09/xmldsig#">
        <externalRef 
          href="http://www.w3.org/Signature/2002/07/xmldsig-core-schema.rng"/>
      </element>
    </oneOrMore>
  </element>

  <element name="encryption">
    <attribute name="xmlns">
      <value>urn:oasis:names:tc:opendocument:xmlns:container</value>
    </attribute>
    <oneOrMore>
      <choice>
        <element name="EncryptedData" ns="http://www.w3.org/2001/04/xmlenc#">
          <externalRef 
           href="http://www.w3.org/Encryption/2002/07/xenc-schema.rng"/>
        </element>
        <element name="EncryptedKey" ns="http://www.w3.org/2001/04/xmlenc#">
          <externalRef 
           href="http://www.w3.org/Encryption/2002/07/xenc-schema.rng"/>
        </element>
      </choice>
    </oneOrMore>
  </element>

</choice>

APPENDIX B: Example

The following example demonstrates the use of this OCF format to contain a signed and encrypted OEBPS publication with an alternate PDF rendition within a ZIP Container.

Ordered list of files in the ZIP Container:

mimetype
META-INF/container.xml
META-INF/signatures.xml
META-INF/encryption.xml
OEBPS/As You Like It.opf
OEBPS/book.html
OEBPS/images/cover.png
PDF/As You Like It.pdf

The mimetype file:

application/epub+zip

The META-INF/container.xml file:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/As You Like It.opf"
              media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/As You Like It.pdf"
              media-type="application/pdf" />
  </rootfiles>
</container>

The META-INF/signatures.xml file:

<?xml version="1.0"?>
<signatures xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <Signature Id="AsYouLikeItSignature" xmlns="http://www.w3.org/2000/09/xmldsig#">

    <!-- SignedInfo is the information that is actually signed. In this case -->
    <!-- the SHA1 algorithm is used to sign the canonical form of the XML    -->
    <!-- documents enumerated in the Object element below                    -->
    <SignedInfo>
      <CanonicalizationMethod
          Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
      <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
      <Reference URI="#AsYouLikeIt">
        <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
        <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
      </Reference>
    </SignedInfo>

    <!-- The signed value of the digest above using the DSA algorithm -->
    <SignatureValue>MC0CFFrVLtRlk=...</SignatureValue>

    <!-- The key to use to validate the signature -->
    <KeyInfo>
      <KeyValue>
        <DSAKeyValue>
          <P>...</P><Q>...</Q><G>...</G><Y>...</Y>
        </DSAKeyValue>
      </KeyValue>
    </KeyInfo>

    <!-- The list of documents to sign. Note that the canonical form of XML -->
    <!-- documents is signed while the binary form of the other documents   -->
    <!-- is used -->
    <Object>
      <Manifest Id="AsYouLikeIt">
        <Reference URI="OEBPS/As You Like It.opf">
          <Transforms>
            <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
          </Transforms>
        </Reference>
        <Reference URI="OEBPS/book.html">
          <Transforms>
            <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
          </Transforms>
        </Reference>
        <Reference URI="OEBPS/images/cover.png" />
        <Reference URI="PDF/As You Like It.pdf" />
      </Manifest>
    </Object>

  </Signature>
</signatures>

The META-INF/encryption.xml file:

<?xml version="1.0"?>
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
            xmlns:ds="http://www.w3.org/2000/09/xmldsig#">

  <!-- The RSA encrypted AES-128 symmetric key used to encrypt the data -->
  <enc:EncryptedKey Id="EK">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5"/>
    <ds:KeyInfo>
      <ds:KeyName>John Smith</ds:KeyName>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherValue>xyzabc...</enc:CipherValue>
    </enc:CipherData>
  </enc:EncryptedKey>

  <!-- Each EncryptedData block identifies a single document that has been    -->
  <!-- encrypted using the AES-128 algorithm. The data remains stored in its  -->
  <!-- encrypted form in the original file within the container.              -->
  <enc:EncryptedData Id="ED1">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
    <ds:KeyInfo>
      <ds:RetrievalMethod URI="#EK"
                 Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherReference URI="OEBPS/book.html"/>
    </enc:CipherData>
  </enc:EncryptedData>

  <enc:EncryptedData Id="ED2">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
    <ds:KeyInfo>
      <ds:RetrievalMethod URI="#EK"
                 Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherReference URI="OEBPS/images/cover.png"/>
    </enc:CipherData>
  </enc:EncryptedData>

  <enc:EncryptedData Id="ED3">
    <enc:EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#kw-aes128"/>
    <ds:KeyInfo>
      <ds:RetrievalMethod URI="#EK"
                 Type="http://www.w3.org/2001/04/xmlenc#EncryptedKey"/>
    </ds:KeyInfo>
    <enc:CipherData>
      <enc:CipherReference URI="PDF/As You Like It.pdf"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>

The OEBPS/As You Like It.opf file:

<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN"
                         "http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd">
<package unique-identifier="Package-ID">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0"
                 xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0">
      <dc:Identifier id="Package-ID">ebook:guid-6B2DF0030656ED9D8</dc:Identifier>
      <dc:Title>As You Like It</dc:Title>
      <dc:Creator role="aut">William Shakespeare</dc:Creator>
      <dc:Identifier>0-7410-1455-6</dc:Identifier>
      <dc:Subject></dc:Subject>
      <dc:Type></dc:Type>
      <dc:Date event="publication">3/24/2000</dc:Date>
      <dc:Date event="copyright">1/1/9999</dc:Date>
      <dc:Identifier scheme="ISBN">0-7410-1455-6</dc:Identifier>
      <dc:Publisher>Project Gutenberg</dc:Publisher>
      <dc:Language></dc:Language>
    </dc-metadata>
  </metadata>
  <manifest>
    <item id="4915" href="book.html" media-type="text/x-oeb1-document"/>
    <item id="7184" href="images/cover.png" media-type="image/png" />
  </manifest>
  <spine>
    <itemref idref="4915"/>
  </spine>
</package>
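As an informative aside, the href values in the manifest are relative to the location of the OPF file, which is how “book.html” corresponds to the container entry OEBPS/book.html in the file list. A minimal Python sketch of that resolution follows. The function name manifest_paths is an assumption of this illustration, and the sketch deliberately ignores percent-encoding and fragment identifiers that a full IRI resolver would handle.

```python
import posixpath
import xml.etree.ElementTree as ET


def manifest_paths(opf_path: str, opf_xml: str) -> dict[str, str]:
    """Map each manifest item id to its path from the container root."""
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)  # directory holding the OPF file
    return {
        item.get("id"): posixpath.normpath(posixpath.join(base, item.get("href")))
        for item in root.iterfind("manifest/item")
    }


# Trimmed stand-in for the package file (metadata omitted).
SAMPLE = """\
<package unique-identifier="Package-ID">
  <manifest>
    <item id="4915" href="book.html" media-type="text/x-oeb1-document"/>
    <item id="7184" href="images/cover.png" media-type="image/png" />
  </manifest>
  <spine>
    <itemref idref="4915"/>
  </spine>
</package>
"""

print(manifest_paths("OEBPS/As You Like It.opf", SAMPLE))
```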

The OEBPS/book.html file:

This file would be stored in encrypted (binary) form. Its decrypted contents might look something like:
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Document//EN"
                      "http://openebook.org/dtds/oeb-1.2/oebdoc12.dtd">
<html>
<head>
  ...
</head>
<body>
  ...
  <img src="images/cover.png" alt="Cover image: a picture of the Bard of Avon" />
  ...
</body>
</html>

The OEBPS/images/cover.png file:

This file contains the encrypted binary bytes of the cover.png file.

The PDF/As You Like It.pdf file:

This file contains the encrypted binary bytes of the PDF file.

APPENDIX C: Contributors

This specification has been developed through a cooperative effort, bringing together publishers, vendors, software developers, and experts in the relevant standards.

Version 1.0 of this specification was prepared by the International Digital Publishing Forum’s Unified OEBPS Container Format Working Group. Active members of the working group at the time of publication of revision 1.0 were:

Kelley L. Allen (Random House)
Angel Ancin (iRex Technologies)
Ryan Bandy (Random House)
Richard Bellaver (Ball State University)
Nick Bogaty (IDPF) - Working Group Secretary
Thierry Brethes (Mobipocket)
Janice Carter (Benetech/Bookshare.org)
Richard Cohn (Adobe Systems Inc.)
Garth Conboy (eBook Technologies) - Working Group Co-Chair
Jon Ferraiolo (IBM) - Working Group Vice-Chair
Neil De Young (Hachette Book Group USA)
Linh N. Do (Random House, Inc.)
Geoff Freed (WGBH)
Liang Gang (TriWorks Asia)
Peter Ghali (Motricity, ereader.com)
Markku T. Hakkinen (DAISY Consortium)
Gillian Harrison (NetLibrary)
Jonathan Hevenstone (Publishing Dimensions)
Theresa Horner (HarperCollins)
Karen Iannone (Houghton Mifflin)
Claire Israel (Simon & Schuster)
Mattias Karlsson (Dolphin Computer Access)
Bill Kasdorf (Apex Publishing)
George Kerscher (DAISY Consortium)
Steve Kotrch (Simon & Schuster)
Bill McCoy (Adobe Systems, Inc.)
Bill McKenna (Follett)
Bonnie Melton (Houghton Mifflin College Division)
Jon Noring (OpenReader Consortium) - Invited Expert
Sayu Osayande (Motricity, ereader.com)
Lee Passey - Invited Expert
Steve Potash (OverDrive)
John Rivlin (eBook Technologies) - Working Group Co-Chair
Tyler Ruse (Codemantra)
Mike Smith (Harlequin)
Kimi Sugeno (John Wiley & Sons)
Gary Varnell (Osoft.com)
Xin Wang, Ph.D. (ContentGuard, Inc.)
Andrew Weinstein (Lightning Source)
Tom Whitcomb (NetLibrary)
Andy Williams (Cambridge University Press)
Eli Willner (Green Point Technology Services)