A quick reference note from BookNet Canada. Below are suggestions on the options for embedding HTML: escaping the markup, wrapping it in CDATA, or (the best option) using XHTML tags directly.
These general best practices are worth following because they are part of XHTML:
The following tags are recommended (others are problematic or prohibited). This applies both to HTML, where the markup is wrapped in CDATA or the < character is escaped, and to XHTML, where the tags are placed directly within text blocks and XML processing rules apply to them. XHTML is the recommended option, but it works best when you routinely validate against the XML schema to ensure your markup is accurate.
<p> and <br /> – paragraphs and line breaks
<sub> and <sup> – sub- and superscript
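To make the three options concrete, here is a minimal Python sketch that embeds the same two-paragraph note escaped, in CDATA, and as XHTML, then shows what a standard XML parser sees in each case. The fragments are simplified assumptions: the surrounding ONIX record and namespace declaration are omitted, and the paragraph text is a placeholder.

```python
import xml.etree.ElementTree as ET

# The same note embedded three ways. Simplified fragments (assumption):
# no surrounding ONIX record and no namespace declaration.
ESCAPED = (
    '<BiographicalNote textformat="02">'
    '&lt;p&gt;First paragraph.&lt;/p&gt;&lt;p&gt;Second paragraph.&lt;/p&gt;'
    '</BiographicalNote>'
)
CDATA = (
    '<BiographicalNote textformat="02">'
    '<![CDATA[<p>First paragraph.</p><p>Second paragraph.</p>]]>'
    '</BiographicalNote>'
)
XHTML = (
    '<BiographicalNote textformat="05">'
    '<p>First paragraph.</p><p>Second paragraph.</p>'
    '</BiographicalNote>'
)

def describe(fragment):
    """Return (text, child tag names) as an XML parser sees the fragment."""
    elem = ET.fromstring(fragment)
    return elem.text, [child.tag for child in elem]

for name, fragment in [("escaped", ESCAPED), ("CDATA", CDATA), ("XHTML", XHTML)]:
    print(name, describe(fragment))
```

Escaped and CDATA HTML both reach the recipient as one opaque string of text, while XHTML arrives as real child elements. That is why XHTML can be checked by XML tooling, including schema validation.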
In ONIX, whether version 2.1 or 3.0, many common issues arise when data providers embed HTML within the various textual data elements. Data providers deliver HTML in a variety of ways, some of which match the standard and many of which don’t. For data recipients, receiving so many unpredictable variations forces a choice: either ignore all HTML, whether it matches the standard or not, or treat each data file as unique, which adds unnecessary cost and time. This isn’t good for either senders or recipients.
Option 1 (XHTML) is preferred in all cases. If you cannot use XHTML, Option 2 is preferred over Option 3 in ONIX 3.0; in ONIX 2.1, Options 2 and 3 are equally acceptable.
The first common error is simply omitting the required textformat attribute. When embedding HTML, always include textformat="02"; if you omit the attribute, the text is treated as plain text by default. If you’re using XHTML, use textformat="05" instead.
The ONIX 3.0 Implementation and Best Practice Guide contains a complete list of XHTML and HTML tags, both allowed and disallowed.
Only a small number of ONIX data elements accept HTML or XHTML markup. For example, in ONIX 3.0, <BiographicalNote> can contain markup, but <ProductFormDescription> cannot. Here’s the complete list of data elements where markup can be used:
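A check for this error could look like the following Python sketch. It assumes each text element is handled as a standalone fragment, treats child elements as XHTML and tag-like text as escaped or CDATA HTML, and spots HTML tags with a simple regular expression (an assumption, not an ONIX rule). The function name textformat_problem is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rough test for "this text contains an HTML tag" (an assumption; a stricter
# checker might run the text through an HTML parser instead).
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>")

def textformat_problem(fragment):
    """Return a description of a textformat mismatch, or None if none is found."""
    elem = ET.fromstring(fragment)
    declared = elem.get("textformat")  # None when the attribute is omitted
    if len(elem):                      # child elements: XHTML markup
        expected = "05"
    elif elem.text and HTML_TAG.search(elem.text):  # escaped or CDATA HTML
        expected = "02"
    else:
        return None                    # no markup detected
    if declared is None:
        return f"textformat missing; expected {expected} (default is plain text)"
    if declared != expected:
        return f"textformat={declared}; expected {expected}"
    return None
```

For example, an escaped note with no attribute, such as `<BiographicalNote>&lt;p&gt;Hi&lt;/p&gt;</BiographicalNote>`, is reported as missing textformat="02".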
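When auditing outgoing files, a sketch like the one below can flag tags that fall outside an allowlist. The ALLOWED set is hypothetical and holds only the four tags named in this note; populate it from the Guide's list before relying on it. It uses Python's standard xml.etree.ElementTree for XHTML content and html.parser for escaped or CDATA HTML, and the function name disallowed_tags is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical allowlist: only the tags named in this note. Extend it from the
# ONIX 3.0 Implementation and Best Practice Guide before relying on it.
ALLOWED = {"p", "br", "sub", "sup"}

class _TagCollector(HTMLParser):
    """Collect start-tag names from an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br />;
        # HTMLParser lower-cases tag names.
        self.tags.add(tag)

def disallowed_tags(fragment):
    """Return the sorted tag names inside an ONIX text element that are not allowed."""
    elem = ET.fromstring(fragment)
    if len(elem):  # XHTML: the markup was parsed into child elements
        tags = {node.tag for node in elem.iter() if node is not elem}
    else:          # escaped or CDATA HTML: the markup is inside the text
        collector = _TagCollector()
        collector.feed(elem.text or "")
        collector.close()
        tags = collector.tags
    return sorted(tags - ALLOWED)
```

Checking both forms with one function reflects the point above that the same tag guidance applies whether the markup is embedded as HTML or as XHTML.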