DIY Schema Validation for Workmanlike ONIX


NOTE: This is dated documentation that has been superseded by much better EDItEUR documentation available at:

https://www.editeur.org/93/Release-3.0-Downloads/#How%20to

The information below may still be of interest.  As a procedure there is no fundamental difference between validating ONIX 2.1 and ONIX 3.0, except that the actual XSD schemas are completely distinct.  The following is focused on 2.1 and doesn't include any information on EDItEUR's strict schema for ONIX 3.0, which is an excellent tool for ensuring your ONIX follows best practices.




You can't really understand anything done in XML unless you know how to do a proper XML validation of the file. For validation you need three things:

  • A file that defines what's allowed in the XML file -- in this case an XML schema file with an .xsd extension.
  • An XML document that needs testing -- here an ONIX file.
  • XML software that can compare one to the other using the rules of the XML standard. It's sometimes called a parser and sometimes called an XML editor, but what you need is XML software capable of doing a validation.

Conceptually it's very simple. A programmer, or somebody who really understands XML, writes the "rules" – the schema or DTD – and any XML document designed to follow that specific rule set references the rule file in the document. All XML documents are self-defining – the recipient knows the rule set used because the document must carry that information (or it's useless). These references are in the declaration, and in an ONIX file that's the section at the start before the <Header> tag.  The declaration contains standard references to various things, which version of the XML standard for instance, and one of them defines what's allowed specifically in this document.

There are two types of defining documents written for ONIX (and variations): the DTD (Document Type Definition) and the Schema. This can cause confusion because when ONIX was first developed in the late 1990s the DTD was the definition of choice and since then, as XML developed as a standard, schemas have become the definition typically used. The ONIX manual for 2.1 shows you a declaration using the DTD, while for ONIX 3.0 the declaration given is one for a schema.  Each simply follows the practice current at the time it was written (and because standards aren't allowed to change willy-nilly, ONIX 2.1 remains as it was then), but there has always been a schema for 2.1 and there is a DTD for ONIX 3.0. The difference? Practically, the schema can contain more rules and definitions than the DTD, and that makes a validation against the schema more strict and much more useful.

BookNet Canada strongly recommends that everyone use the ONIX 2.1 schema today – if nothing else it will make the transition to ONIX 3.0 easier and it is simply better at making a good ONIX document. And that's what is described here – how to change the declaration to make it schema appropriate, get a schema and then use it in a validation.

The ONIX manual is an explanation of the rules. It tells you what's been defined in the schema: associate this tag with that codelist, or that tag with a value. Using the manual is a great help in troubleshooting validation problems. The schema file is the same thing written for a computer. So validation is a routine that uses software to compare the two computer documents, the schema and the ONIX file, to find mistakes.  What counts as a mistake is defined by the ONIX developers in how they create the rule set: wrong codes, missing values and empty values are the most typical problems in ONIX. Other XML documents, EPUB for example, would have a different schema and validation would find quite different things appropriate to the needs of ebooks.  For ONIX it's things that go wrong in data exchange and would cause database problems, and while not perfect at finding every issue, XML validation for ONIX files finds real data problems.

The ultimate point is a file that can pass a schema validation can be loaded by a computer, and a file that fails schema validation would need to be fixed before it can be loaded.

ONIX 3.0 vs ONIX 2.1:  ONIX 2.1 remains permanently associated with Code List Issue 36 (January 2017)

Which ONIX version you use – 3.0 or 2.1 – doesn't matter to the software as XML validation is XML validation.  It does matter to the company you send it to though – so you can't just change from one standard to another.  Ask your data recipients first.  Currently BookNet Canada's data aggregation service BiblioShare wants companies to send BOTH 3.0 and 2.1 if they can OR to maintain their 2.1 feed to us if it's either / or.  More here.

You'll find that this document is written primarily for ONIX 2.1 – which still remains the main ONIX version used for print in North America, though the digital supply chain is using ONIX 3.0.  From the point of view of validation the information here applies equally – only the file names change.  The two versions are very similar, and if you're validating ONIX 3.0 and can't work out what to do based on this document let us know at the email at the bottom.  We hope to update this document to be ONIX 3.0 centred soon.

There is one very important difference between ONIX 2.1 and ONIX 3.0:  EDItEUR has stopped supporting ONIX 2.1.   Within the published code lists from EDItEUR used in ONIX there are

  • code lists unique to ONIX 2.1 and code lists unique to ONIX 3.0.  They come in three general types:
    • code lists for ONIX 2.1 where a different code list exists for ONIX 3.0
      • Product Form / Product Form Detail are examples:  ONIX 2.1 uses Lists 7 and 78 and ONIX 3.0 uses Lists 150 and 175
    • code lists for ONIX 2.1 where the 3.0 approach is very different and there's no direct equivalent (in most cases ONIX 3.0 has an equivalent for any ONIX 2.1 list, but not always)
      • Compare 2.1's PR.16 Links to 3.0's P.16 Links as an example.
    • ONIX 3.0 only lists – there are many 3.0 lists with no 2.1 equivalent because 3.0 supports new features unsupported in 2.1
  • code lists shared by both ONIX 2.1 and 3.0

It's the shared code lists that are most affected by EDItEUR dropping support.  The schemas for ONIX 2.1 and 3.0 share one file for code lists – for the XSD version that's ONIX_BookProduct_CodeLists.xsd.   All development of 2.1 stopped several years ago (no additions to its codes or changes to its format) but a back door type of update remained in that the shared code lists continued to be updated for ONIX 3.0.  As well, the code lists unique to ONIX 2.1 remained in the schema's code list file.

That's changed:  As of Code List Issue 37 (April 2017) the ONIX 2.1 only code lists are removed from the schema file.  EDItEUR's board has determined that, from Issue 37 on, no new "shared" code can be considered to be part of ONIX 2.1.    ONIX 2.1 is now static and should permanently use the code lists for Issue 36.  This doesn't really affect ONIX 2.1 – its functionality and anyone's ability to use it remain.   This just recognizes what's been true for a while – it's no longer being developed.

For the purposes of this document, all that means is that if you're doing an ONIX 2.1 validation you shouldn't replace your ONIX_BookProduct_CodeLists.xsd with an ONIX 3.0 version, because the validation won't work.  It checks files against codes, and the ONIX 2.1 only codes will be missing.

ONIX 3.0 users should update their files regularly.  Code list updates are created every 3 or 4 months, and you need the right file to use new developments.

Locally Installing your files

Until recently EDItEUR supported ONIX 2.1 in a way that allowed you to do a standard DTD validation over the internet using files stored on EDItEUR's server.  That put a huge demand on EDItEUR's bandwidth and resources, so they've long asked all users to install their reference files locally. With support for ONIX 2.1 ended, they have now gleefully taken away that option.  ONIX 3.0 has never offered it.  So like the change to Schemas from DTDs, local file installation is now just a normal part of the ONIX 2.1 and 3.0 world.

There's a bonus for ONIX 2.1 users who are using old style entities like &eacute; or &uuml;.  Locally installing the DTD in addition to the schema will allow this entity style to process through a schema validation, so you can both validate against the schema and keep these older style entities in your files.

Two BIG caveats:  You have to understand that in ONIX 3.0 you can't – CANNOT – use this old style of HTML entity because EDItEUR has made no allowances for it.  Like the change from DTD to Schema it's just progress, and the old style HTML text-based entities like &eacute; are not recognized in current XML schema languages. They are incomplete and superseded by better systems.  So while I'll show this neat workaround that Graham Bell of EDItEUR let me in on, you have to see it as a way to buy time while you start using current character sets and, if necessary, current style entities.  Ideally you simply use UTF-8, which can represent all the characters you need directly, but if you find you need to use entities to display special characters, then use the "decimal" (&#233; for example – this is the most common choice) or "hex" (&#x17e; for example) styles.

Please consult the ONIX 3.0 manual Section 2 "Character sets and special characters" for a better explanation.
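
If you have a lot of old style entities to convert, the swap to the "decimal" style can be scripted. What follows is only a minimal sketch in Python (my choice of language for illustration, not anything EDItEUR supplies). The function name named_to_decimal is made up for this example, and it assumes the entity names in your file are the standard HTML ones that Python's html.entities table knows about. The five entities that XML itself defines (&amp; &lt; &gt; &quot; &apos;) are deliberately left alone, as are any names the table doesn't recognize.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity such as &eacute; (numeric ones like &#233; do not match).
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_decimal(text):
    """Replace HTML named entities such as &eacute; with decimal references."""

    def swap(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own entities and unknown names untouched
        return "&#%d;" % name2codepoint[name]

    return ENTITY.sub(swap, text)


print(named_to_decimal("<TitleText>Caf&eacute; M&uuml;ller &amp; Co.</TitleText>"))
# prints: <TitleText>Caf&#233; M&#252;ller &amp; Co.</TitleText>
```

As with everything else here, run it on a copy of your ONIX file and validate the result afterwards.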

And: These instructions affect the XML declarations, which in turn affect the end user's data processing.  It's a good practice to ship files using the standard ONIX declaration given in each manual – the one the end users are told to expect – as not all end users will have programmed to accept the "localized" declarations I'm suggesting.  So while I'm showing you how to change the declaration to do a better validation, replacing that declaration with the "standard" ONIX manual declaration before distribution is a good practice – and you may be required to do so by end users.

Summary:  

    • Everyone should install their ONIX schemas (and DTD) locally (ONIX 2.1 or ONIX 3.0).
    • In the best of all worlds use the UTF-8 character set, but if necessary use "decimal" or "hex" styled entities for special characters (a must for ONIX 3.0)
    • For that subset of users who have ONIX 2.1 files containing old style HTML entities, you can also store the ONIX 2.1 DTD locally in order to allow the file to pass a schema validation.
    • Everyone should use standard declarations for data export.

Tools to help

An XML file is a specialized text file – there's nothing in it except characters as <tagged>text</tagged>. Computers are programmed to read the <tags>and are programmed on how to use the contents of the</tags>.  So while an XML file is a file that a computer can read, any text editor can open an XML file. It's just text. You can take a really basic text editor like Notepad and character by character create a working XML file – and if you did you would need to regularly use validation to test your work, just as described here.  Most of you though are going to be using a file from some other system that has either created or partly created the ONIX file – when you open an XML file you are trying to preserve the integrity of the XML document, fix current problems and not create new ones.  Knowing a bit of XML can help:

http://www.w3schools.com/xml/ 

but largely it's three things:

  • nesting rules apply (<b><i>bob</i></b> but not <i><b>bob</i></b>),
  • all open tags need to be closed (a tag "opened" <ONIXMessage> or <p> must be "closed" </ONIXMessage> or </p>)
  • and case matters in tags (<tag>bob</Tag> isn't "well formed" because the closing tag doesn't match the opening tag -- tag and Tag are different).
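
Those three rules are what XML people call being "well formed", and any XML parser will check them for you before it gets anywhere near a schema. As a small illustration (a sketch, not part of any ONIX tooling), here is the parser that ships with Python applied to the examples from the list above. The function name is_well_formed is just something I've made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<b><i>bob</i></b>",   # properly nested
    "<i><b>bob</i></b>",   # nesting rule broken
    "<p>bob",              # opened but never closed
    "<tag>bob</Tag>",      # case mismatch between open and close
]
for sample in samples:
    print(sample, "->", is_well_formed(sample))
```

Only the first sample passes. Note that this checks well-formedness only: a file can be perfectly well formed and still fail schema validation because of a wrong code or a missing value.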

Not all text editors are the same. For a start, never use a word processor. Many word processors will happily open an ONIX file and modify the contents to match what they are designed to expect – and it won't be ONIX that you save at the end of the session.  This advice can apply to some XML parsers as well – they can be designed for specific tasks and may be programmed with, from your and EDItEUR's point of view, erroneous expectations.  It never hurts to be careful when using a new tool with XML – work with a copy of your ONIX file.

All XML documents should declare the file's encoding in the opening line (<?xml version="1.0" encoding="utf-8"?>), so a text editor that can say something about the encoding of the characters you use is a help. Encoding is hard to explain, but if I needed to use Chinese characters in an XML document I'd need to use a character encoding appropriate to them. The standard English you are looking at is no different -- at some level it's machine code and the "encoding" is just a standard that defines each character at that level.  There's more than one way, different encodings, to tell a computer to create a character. This isn't a primary worry in validation, but the point is that the XML document declares its encoding and what you want to do is not screw it up.  Some text editors will read that opening line and set their encoding to match it (a big help) – or at least give you a chance to notice that the XML declaration is UTF-8 while the text editor lists its encoding (what it types in) as ISO-8859-1.  I recommend not messing around with encoding, but it's good to use tools that set or can be set to the encoding declared for the document.
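
If your text editor won't tell you, you can read the declared encoding off the opening line yourself. This is a minimal Python sketch under a couple of assumptions: the declaration sits at the very start of the file (optionally after a UTF-8 byte order mark), and the file is in an ASCII-compatible encoding such as UTF-8, ISO-8859-1 or Windows-1252, so the first line can be read as bytes. The function names are made up for this example. It only reports what the file claims; it doesn't verify that the content really is in that encoding.

```python
import re

# Looks for encoding="..." inside the <?xml ... ?> declaration at the start of the data.
DECLARATION = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(first_bytes):
    """Return the encoding named in the XML declaration, or None if none is given."""
    # Skip a UTF-8 byte order mark if the file starts with one.
    if first_bytes.startswith(b"\xef\xbb\xbf"):
        first_bytes = first_bytes[3:]
    match = DECLARATION.match(first_bytes)
    return match.group(1).decode("ascii") if match else None


def declared_encoding_of_file(path):
    """Read only the start of the file, in binary mode so nothing gets re-encoded."""
    with open(path, "rb") as handle:
        return declared_encoding(handle.read(200))


print(declared_encoding(b'<?xml version="1.0" encoding="utf-8"?>'))  # prints: utf-8
print(declared_encoding(b'<?xml version="1.0"?>'))                    # prints: None
```

Opening the file in binary mode matters here: it means the check itself can't change the encoding of anything.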

XML validation messages are cryptic and will normally include a line reference to the problem, so a text editor that can show line numbers and has a "Go To" line function is a huge help.  So what you want to have is a text editor that doesn't change the file, gives you information about encoding, and includes a go-to line number function.  Here are three:

PC only: Notepad++:  http://notepad-plus-plus.org/

PC only: Notepad2: http://www.flos-freeware.ch/

Mac only: TextWrangler http://www.barebones.com/products/textwrangler/

For PC: I usually recommend Notepad2 – Notepad++ is arguably more capable, but Notepad2 is what you need without much extra fuss about it and that can lessen confusion. 

XML Software

There are many freeware XML parsers out there, and in general they all work well.  However most of them are command line parsers that you work by using command strings. xmllint is a good choice and is detailed below, but there's no one-size-fits-all answer. As noted above in my word processor warning, not every freeware XML parser (or commercially available one either) plays nice with an ONIX file.  As you might imagine, each developer will have written their parser based on what s/he needs from the XML standard, so each is somewhat the same as it's based on the XML standard, but with slightly different interpretations.  ONIX is a very structured form of XML – and intended to be – so in its way it's an atypical use. If you're trying out a new parser, experiment on a copy of your ONIX file!

Here, I'm recommending several beginner friendly software choices and I'll provide detailed instructions on some as examples:

Windows only

There is a lovely generic XML software product called XML Notepad 2007. It's free and written by a Microsoft programmer, Chris Lovett, so the freeware is from a safe source, it's easy to set up for a schema validation and it's robust with files as large as 20,000 records.  Not surprisingly it only works with Windows operating systems.  The software requires that you have .NET Framework version 4.0 installed (note that .NET is likely already on your computer, but it's another Microsoft product and safe if you need to download it).  The download site has moved about recently, but as of 2016 Lovett was still updating it.  The current site, now offering clone or download, is here:

https://github.com/Microsoft/XmlNotepad

The installer for Windows can be found on the Wiki tab.

MAC

There are fewer free choices for Macs, but there is a good one that only costs $1.99 at the Mac App Store – XML Nanny:

https://itunes.apple.com/ca/app/xml-nanny/id423791387?mt=12

XML Nanny is excellent, simple software, but some users have reported a line limit where the software reports an error at line 65,535 when the actual error is many lines further down the file.  It appears that the report itself is accurate – an error exists – but the line reporting stops working.  If XML Nanny reports the file is fine, it is.  It's only if it reports an error after line 65,535 that this crops up.  On the assumption that a working solution to an issue is better than nothing and that you might have access to a PC, if you think you've encountered this issue and need a workaround try the ONIX file splitter:

Free ONIXEdit Splitter

It's excellent software and reliably splits any size of ONIX file of either version, into same sized pieces which can be individually validated if you think you've run into this issue.

Linux and MAC

xmllint comes installed in most Linux versions and on many Macs as well.  It's a command line XML parser, which means you'll need to write small scripts to do validations. I'm not going to illustrate any more than the below (and don't use this unless you're more comfortable using command line tools than me), but this example is intended for a Mac, with the username "Bob", with a folder called "schema" on his "Desktop" containing (in this case) the ONIX 2.1 reference schema "ONIX_BookProduct_Release2.1_reference.xsd".  The Desktop contains an ONIX file "TestOnix.xml" which has been set up for schema as noted below.

Note that there are some instructions in the "Reference or Short Tag" section below on how to confirm the file location on a MAC – you have to be absolutely right in your location or it doesn't work. And you need to understand the difference between Reference or Short tags as well.  All that is discussed below in more detail.

This worked for me using Terminal:

xmllint --schema /Users/Bob/Desktop/schema/ONIX_BookProduct_Release2.1_reference.xsd /Users/Bob/Desktop/TestOnix.xml
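
If you validate often, the same command can be wrapped in a few lines of Python so you don't have to retype it. This is only a sketch: the function names are invented for this example, and it assumes xmllint is installed and on your PATH. The one thing added to the command above is xmllint's --noout option, which stops xmllint from echoing the whole ONIX file back to the screen so that only the validation messages remain.

```python
import shutil
import subprocess


def xmllint_command(schema_path, onix_path):
    """Build the xmllint command line as a list of arguments."""
    # --noout suppresses the echo of the parsed document; only messages are printed.
    return ["xmllint", "--noout", "--schema", schema_path, onix_path]


def validate(schema_path, onix_path):
    """Run xmllint and return True if the file validates against the schema."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint is not installed or not on the PATH")
    result = subprocess.run(
        xmllint_command(schema_path, onix_path),
        capture_output=True,
        text=True,
    )
    # xmllint writes its validation messages to standard error.
    print(result.stderr.strip())
    return result.returncode == 0


# Example call, using the same hypothetical locations as the command above:
# validate("/Users/Bob/Desktop/schema/ONIX_BookProduct_Release2.1_reference.xsd",
#          "/Users/Bob/Desktop/TestOnix.xml")
```

Passing the arguments as a list rather than one long string means a space in a folder name won't split the path in two.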

You can find a full listing of commands here:

Java

Java based programs should work on any system but are seldom free.  I'll illustrate oXygen as it's often recommended for working with EPUB.  It's expensive but you can get it on a 30 day trial.  Any of the versions (Editor / Developer / Author) will work for validation.

www.oxygenxml.com



There are lots more possible tools out there, but to use any of them you'll need to have the actual ONIX documents:

Now get the ONIX Schema

You'll need the ONIX Schema on your computer, and it is available from www.editeur.org.  This is where you get all ONIX information from, so it's a good site to know.

  • From the main page, select Standards
  • Under ONIX for Books, select Previous releases
  • Scroll down to "Download Release 2.1 XML Schema" and click on it.
  • Click on the "Release 2.1 (revision XX) XML Schema" and save the zip to your hard drive.
  • Extra: if you want to use the ONIX 2.1 DTD, download it at the same time.
  • Extra: if you want the ONIX 3.0 Schema, get it from the "Release 3.0 Downloads" off the main Standards menu. There are more choices and if you're unsure use the .xsd one.

Unpacking the 2.1 XML schema zip will give you a directory with 7 files in it, 6 xsd 'schema' files and a 'read me'. You'll need to put a location reference to these files into your declaration, so if you're on a PC make it easy on yourself and store this on your computer in an easily named location.  On a MAC it's probably easiest to do it off the desktop. On a PC or Linux avoid spaces in your directory names (spaces can confuse the XML software's ability to find the file). As an example, if you were to create a directory in your top level C: drive named XML with a subdirectory XSD and you were to put the contents of the zip there, then naming the local file reference would not only be easy but have a long tradition behind it:

C:/XML/XSD/

However you choose to name the location, put the 7 files into that location. The schema file ONIX_BookProduct_CodeLists.xsd includes the ONIX codes, so every time the code list is updated, this file should be updated as well (except for ONIX 2.1, which stays on the Issue 36 code lists as explained earlier). There may be minor corrections to the other schema files, so replacing them all with every code list issue from EDItEUR is normal.  (These happen regularly – every 3 or 4 months – and if you're not updating them then you're missing out on new developments and codes.  Note that this is where "backward compatibility" comes into play:  an old schema doesn't stop working and it won't be contradicted by later ones, but an outdated schema will not contain all it could.  You could have a client's file failing because they use new codes not in your dated schema.)

Declare yourself!

You'll recall I mentioned that the ONIX 2.1 declaration is typically based on the DTD, so chances are that's what's in your ONIX 2.1 file: a DTD based declaration. And to do a schema validation we're going to do two things:

  • re-write the declaration to reference the schema file instead of the DTD
  • and do it so that validation is done locally, not using the internet to reference remote documents but to do it with the schema on your computer.

That just means replacing the first few lines of the ONIX file with a different bit of script -- everything before the <Header> tag -- a simple cut and paste that only takes a few seconds. Note that you probably want to restore the original ONIX DTD declaration before you send the ONIX file to trading partners (some may accept the schema declaration -- some may not).

What I find easier is to create a file just for schema validation and then to copy the ONIX data into it. So instead of replacing the declaration I leave it alone, and I copy everything from <Header> down to the </ONIXMessage> close tag at the file bottom and paste that into my schema declaration file (not overwriting the declaration of course).  That way I preserve the original file with its original DTD declaration intact.  Then I do my validation work and fix the file in my "schema" declaration file, and when all is done I do the reverse:  cut and paste from the <Header> down to </ONIXMessage> back into the original document. Do yourself a favour and use unique file names when you save; if there are problems you'll still have the original file.

So here are two starter files – use these instead of cut and paste from this document – you'll save time and eliminate a source of errors:


The trick is never to overwrite the declarations.  Problems in the declaration are especially tricky to troubleshoot as they prevent the XML software from working at all.  That makes the error message especially vague – or absent.  See troubleshooting below.

Including a DTD reference?

Remember that you should do this only if your ONIX file contains old style HTML entities (example: &eacute; ) and you want a workaround to leave them alone in your current ONIX 2.1 file.  You'll need to have downloaded a DTD specific to either Reference or Short tags. Please note that both the Reference and Short tag versions have a file named "onix-international.dtd" but these are NOT identical files – one cites "short.elt" and the other cites "reference.elt".  Don't think about it.  Just download the one you need and put it in a directory of its own – C:/XML/DTD/ONIX2_ref/ is what I used below.  If you need both, then put them in separate directories.

And here are two starter files:


Reference or Short Tag?

Hopefully you know this already. ONIX comes in two tag types: Reference and Short. There's no functional difference between the two sets -- but in the schemas you downloaded you can see some are labelled for Short and others for Reference. The tag names in each are a unique set (no overlap), but there is a one-to-one correspondence between them, so the two tag sets carry the same meaning and only look different. Why? Well, if you had a very big ONIX file of thousands of records, you'd find that a "short" tag file might be as much as 30% smaller. It can help processing.  It doesn't matter which you use, but EDItEUR and BookNet recommend Reference tags as the "preferred choice".

Short tags look like:
<header>  <!-- lower case -->
<m174>Tags for FromCompany</m174>  <!-- codes expressed by a letter and 3 numbers -->

Reference tags look like:
<Header>  <!-- mixed case with caps at the start of words -->
<FromCompany>Tags for FromCompany</FromCompany>  <!-- mixed case and descriptive -->

All my examples will be in Reference tags, but I've attached a sample schema file for both Short and Reference. They are different so Short Tag users must use the right file.  Let us know if this is confusing and we can re-write the documentation to be more explicit for Short Tags.
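
If you're handed a file and aren't sure which tag type it uses, the header tag gives it away, as in the samples above. Here is a minimal Python sketch of that check. The names tag_style and SCHEMA_FOR are made up for this example, and the check relies only on the <Header> versus <header> difference shown above.

```python
import re

# Which ONIX 2.1 schema file goes with which tag type.
SCHEMA_FOR = {
    "reference": "ONIX_BookProduct_Release2.1_reference.xsd",
    "short": "ONIX_BookProduct_Release2.1_short.xsd",
}


def tag_style(onix_text):
    """Return "reference", "short", or None if no header tag is found."""
    if re.search(r"<Header[\s>]", onix_text):
        return "reference"
    if re.search(r"<header[\s>]", onix_text):
        return "short"
    return None


style = tag_style("<ONIXMessage>\n<Header>\n<FromCompany>Example</FromCompany>")
print(style, "->", SCHEMA_FOR[style])
# prints: reference -> ONIX_BookProduct_Release2.1_reference.xsd
```

The search is case sensitive on purpose: in XML, Header and header are different tags.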


You should open your ONIX file and find something like this at the top:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">
<ONIXMessage>

What comes next is the <Header> tag.  What you need to do is replace all of the above (up to but NOT including the <Header> tag) with the following. (Alternatively you can take everything from the <Header> to the file bottom and paste it into my starter files mentioned above.)

<?xml version="1.0" encoding="utf-8"?>
<ONIXMessage 
xmlns="http://www.editeur.org/onix/2.1/reference" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="C:\XML\XSD\ONIX_BookProduct_Release2.1_reference.xsd">

or if you're using the DTD workaround for ONIX 2.1, what's before <Header> needs to be replaced with the following

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ONIXMessage SYSTEM "C:/XML/DTD/ONIX2_ref/onix-international.dtd">
<ONIXMessage 
xmlns="http://www.editeur.org/onix/2.1/reference" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="C:\XML\XSD\ONIX_BookProduct_Release2.1_reference.xsd">

Note that it has two distinct declarations, the first invoking the .dtd, and a second one that cites the .xsd file.  
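
The cut and paste can also be scripted if you do it often. The sketch below, in Python, keeps everything from <Header> down and puts a new declaration in front of it. The function name swap_declaration is made up for this example, the declaration text is copied from the schema example above (so it carries the same C:\XML\XSD location, which you'd change to your own), and it assumes a Reference tag file; Short tag users would search for <header> and use the short declaration instead.

```python
# The schema declaration from the example above; adjust the file location to your own.
SCHEMA_DECLARATION = r'''<?xml version="1.0" encoding="utf-8"?>
<ONIXMessage
xmlns="http://www.editeur.org/onix/2.1/reference"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="C:\XML\XSD\ONIX_BookProduct_Release2.1_reference.xsd">
'''


def swap_declaration(onix_text, new_declaration):
    """Replace everything before <Header> with new_declaration."""
    start = onix_text.find("<Header>")
    if start == -1:
        raise ValueError("no <Header> tag found; is this a Reference tag file?")
    return new_declaration + onix_text[start:]


original = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE ONIXMessage SYSTEM '
    '"http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">\n'
    "<ONIXMessage>\n"
    "<Header><FromCompany>Example</FromCompany></Header>\n"
    "</ONIXMessage>\n"
)
print(swap_declaration(original, SCHEMA_DECLARATION))
```

Write the result to a new file name rather than over the original, so the file with its DTD declaration stays intact, and keep the encoding in the first line matched to the original file.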


Have a look at the original declaration – particularly the first line:

What if my ONIX file's DTD declaration is different than what you show here?  
It should not be a concern (unless the companies you send the file to complain that it's wrong).  What's here is the simplest version possible but there are variations.

I have a different encoding in the first line, what now?  
The expected encoding in North America is UTF-8, but it's common to see ISO-8859-1 and Windows-1252, and these are also accepted by most companies.  You might also see the encoding in lower or UPPER case.  None of that should matter, but you should make sure your schema file matches what your ONIX file has for the encoding statement. There are lots of encodings and the expected ones are almost certainly what you'll see, but like any other end user you need to assume that the ONIX file creator used the encoding statement accurately.  Don't mess with it.

I don't have an encoding statement!  All I see is <?xml version="1.0"?>
The convention in XML is to assume the encoding is UTF-8 if it's not supplied.  It's not a good practice to not supply it though, so my advice is to add the encoding as UTF-8 (use the above example exactly).  

Made changes?  Always save the file.  If you close and re-open your file in the text editor it should open in the stated encoding automatically.

Beyond changing the encoding to match the original ONIX file, you'll need to change your schema file to match where you've put the files on your local computer:

C:/XML/XSD/ONIX_BookProduct_Release2.1_reference.xsd

The directory part of that path (C:/XML/XSD/ in this example) should match the location on your computer where you put the schema file (ONIX_BookProduct_Release2.1_reference.xsd in this example).  This simple seeming step is usually where a mistake is made and the source of problems when you first do a validation.  If the XML software can't find the reference file it won't work.

So, stop, and check:

Reference tag people, you're looking for:  ONIX_BookProduct_Release2.1_reference.xsd
Short  tag people, you're looking for:  ONIX_BookProduct_Release2.1_short.xsd

On a PC, use Windows Explorer to search for the file, and when you've located the file, right click on it and look at the Properties.  What it says there should match what you have before the file name in your declaration.

On a MAC, use Finder to search for the file, right click on it and under Info you'll find the location.  What it says there should match what you have before the file name in your declaration.
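
You can also let the computer do this check. The Python sketch below pulls the location out of the xsi:schemaLocation line of a declaration and reports whether a file really exists there. The function names are invented for this example, and it assumes the location is a plain file path with no spaces in it, as recommended earlier; if the attribute holds more than one value it takes the last one.

```python
import os
import re

SCHEMA_LOCATION = re.compile(r'xsi:schemaLocation\s*=\s*"([^"]*)"')


def schema_path_from(onix_text):
    """Return the file location given in xsi:schemaLocation, or None if absent."""
    match = SCHEMA_LOCATION.search(onix_text)
    if not match:
        return None
    # Assumption: the path is the last whitespace-separated value in the attribute.
    tokens = match.group(1).split()
    return tokens[-1] if tokens else None


def check_schema_location(onix_text):
    """Describe whether the schema file named in the declaration can be found."""
    path = schema_path_from(onix_text)
    if path is None:
        return "no xsi:schemaLocation found in the declaration"
    if os.path.isfile(path):
        return "found: " + path
    return "NOT found: " + path


declaration = (
    '<ONIXMessage xmlns="http://www.editeur.org/onix/2.1/reference" '
    'xsi:schemaLocation="C:/XML/XSD/ONIX_BookProduct_Release2.1_reference.xsd">'
)
print(check_schema_location(declaration))
```

A "NOT found" answer means the path in your declaration and the real location of the file disagree, which is exactly the mistake this section is warning about.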

If you're using the DTD version you'll need to do this twice, and also change the path in the line <!DOCTYPE ONIXMessage SYSTEM "C:/XML/DTD/ONIX2_ref/onix-international.dtd"> so that

C:/XML/DTD/ONIX2_ref/onix-international.dtd

points to the directory where you've stored the 25 files found in the DTD zip from EDItEUR.



That's it. Now: make sure you save the file, as the XML software works with the last saved version – always save the file before you validate it.  Here are some specific instructions for different software – though the most detail is in the first so read it.  All XML software has quirks and needs; the ones listed in the first are not typical but representative of the types of issues you might find:

PC: Setting up to use Schemas using XML Notepad

Start up XML Notepad 2007. One quirk in the software is that you'll need to let the software know where to find the schema by creating an internal link to the XSD schema files on your computer. So:

  • Top menu bar: Click on View
  • From the dropdown: Click on Schema
  • From the dialog box: Click on the box with the ellipsis (three dots) at the far right

This opens a standard Windows file finding user interface—use it to navigate to where the schemas are stored and select:

  • For Reference tags: ONIX_BookProduct_Release2.1_reference.xsd
  • For Short tags: ONIX_BookProduct_Release2.1_short.xsd

(You can do this step twice and put both reference and short links here. The program will work fine if you do, but it's only necessary if you use both short and reference tag files.)

A second quirk:  Whenever you first start XML Notepad you'll need to let it know where the local schemas are – it's just a matter of opening the Schema dialog (as above) and clicking “OK."

Your First Validation using XML Notepad

After opening XML Notepad, get in the habit of opening the schema dialogue as above.  You just need to do it once at the beginning of each session, but the software won't find the schema unless you first "OK" the location using this dialogue.  So, just open the dialogue and click OK.  If you forget to do this, when you validate a file you'll get error messages but without line references.  No line numbers?  Do this step and re-do the validation.

Using XML Notepad go to its top menu bar: Click on File / Open and from the dropdown choose the file you changed to include the ONIX data and the schema declaration.  Open it.  Small files will be very fast but a large file may take a minute or two.  Wait for a response (but no more than 3 minutes).    

Above: An error free validation

Above: Errors found in the Audience Range composite

Look at the top of your ONIX file and the screen – note that what's in the declaration is in the upper right.  Expand some of the tree on the left.  What you're looking at is an XML Editor using the information in the schema to give you editing options!  Sweet and potentially useful, but not what we are doing.  For what is being done here you'll want to go to the line given for the first error in the actual file – the ONIX file itself – and figure out the error.  With luck you can change all the entries with a simple problem at once using find and replace.  More on that below.

MACs: Setting up to use Schemas using XML Nanny

Start up XML Nanny. 

  • In Source – use Browse to navigate to your ONIX file (be sure that its declaration is modified as above to be a "schema" declaration with the correct file locations for a MAC)
  • In Validation, select XML Schema and use Browse to navigate to the appropriate schema file
    • For Reference tags: ONIX_BookProduct_Release2.1_reference.xsd
    • For Short tags: ONIX_BookProduct_Release2.1_short.xsd
  • Click on parse.
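If you're not sure which of the two schema files applies, the root tag of the ONIX file tells you: ONIX 2.1 reference-tag files start with <ONIXMessage> and short-tag files start with <ONIXmessage>.  Here is a minimal Python sketch of that check, using only the standard library.  The function name pick_schema is made up for illustration, and the schema file names are the ones listed above.

```python
import io
import xml.etree.ElementTree as ET

# Schema file names as given in the list above.
SCHEMA_FOR_ROOT = {
    "ONIXMessage": "ONIX_BookProduct_Release2.1_reference.xsd",  # reference tags
    "ONIXmessage": "ONIX_BookProduct_Release2.1_short.xsd",      # short tags
}

def pick_schema(xml_bytes):
    """Return the 2.1 schema file name matching the root tag, or None."""
    # "start" events let us stop at the root element as soon as it is seen.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any {namespace} prefix
        return SCHEMA_FOR_ROOT.get(local_name)
    return None
```

To use it on a real file you could pass in open("yourfile.xml", "rb").read(); a result of None means the root tag is neither of the two ONIX 2.1 forms.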

There are a couple of windows – the bottom one shows the file, and the top the results.  Check the results all the way to the bottom of the top window as it starts off with several standard responses.

This illustrates an XML Nanny response with problems – note the line number followed by a cryptic message, in this case concerning finding <b243> but not the expected "short" tags <b079> or <b241>.  XML error messages usually read this way – it's not 100% clear, but if you go to the line referenced in this file you'll find an incomplete <imprint> composite.

Setting up to use Schemas using other software

Overall, the instructions for XML Notepad will work for most software – each system will have some quirks unique to it but the principles are the same.

For oXygen it's pretty straightforward:

  • Load the ONIX file set up for Schema work by using File / Open
  • In the top bar there is a button with a big check mark on it with a side drop down list, open the drop down.  
  • Click on the Validate With option and navigate to the location of the ONIX_BookProduct_Release2.1_reference.xsd (or the short one as appropriate) and select.
  • Then click on the check mark button – errors appear in the bottom pane (be sure to create errors to check it's working).

Clearly oXygen can do a lot more than XML Notepad can – and has better ways to associate the schema with what you're working on – but that's the quick and dirty method, and it works.  One trick here:  Make your changes to the original document (as with XML Notepad above) and not through the edit pane.  If you use the edit pane oXygen will try to enforce the encoding, whereas if you change and save the original file and then reload it, you fool the software into doing less for you...  then it just validates the file against the schema.

XMLSpy has a similar interface, and will generate the schema declaration for you...  

Troubleshooting problems

Well, if you're looking at a screen and you don't think it's showing any errors – the first question is: are you sure it works at all?  XML is a "fail" based standard – when it encounters a problem, processing should stop.  Validation is testing the file for things that make processing stop, but sometimes XML software will simply stop – do nothing – for some types of errors.  So it's a good idea, even if you're sure the software is working, to do a simple test of it.  What I usually do is remove a value from the header or the first record and create an "empty set":

From: <FromCompany>Company Name</FromCompany>  remove the company name, leaving: <FromCompany></FromCompany>.  That's an empty set: a tag without data, and a validation error.  If you make that change, save the file (remember that you must always save the file before doing the validation) and re-run the validation, you'll see an error message.  Reinstate the Company Name between the tags, save the file, and re-validate: the error should be gone.  If the software can find an error you created, then it is working.
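If you'd rather not hand-edit your real file, the same test error can be made in a copy.  The following is a minimal Python sketch, assuming reference tags and a <FromCompany> element that contains plain text.  The function name blank_element is made up for illustration.  It edits the text directly, so the declaration at the top of the file is left exactly as it was.

```python
import re

def blank_element(xml_text, tag="FromCompany"):
    """Return xml_text with the first <tag>...</tag> emptied; nothing else changes."""
    name = re.escape(tag)
    pattern = re.compile(r"(<%s>)[^<]*(</%s>)" % (name, name))
    new_text, count = pattern.subn(r"\1\2", xml_text, count=1)
    if count == 0:
        raise ValueError("no <%s> element with plain text content found" % tag)
    return new_text

sample = "<Header><FromCompany>Company Name</FromCompany></Header>"
print(blank_element(sample))  # prints <Header><FromCompany></FromCompany></Header>
```

Write the result to a separate file and validate that copy.  If the software reports the empty tag, it's working, and your original file was never touched.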

Creating errors is a good way to learn how the software works:  If you made a mistake doing this and removed the start of the closing tag as well, say leaving:  <FromCompany>FromCompany> – that makes the document not well formed, and the software will not even load the file; it will only give you a line reference and a message saying the file is broken.  Here you'd have to look at the line reference, check it, realize that the tag isn't closed properly, and fix that – and the empty set – to return to an error free state.
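The difference between "not valid" and "not well formed" can be seen without any schema at all.  Here is a minimal Python sketch using the standard library's parser, which checks only well-formedness.  The function name well_formed_problem and the two tiny sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def well_formed_problem(xml_text):
    """Return None if xml_text is well formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# An empty tag is still well formed; only a schema validation objects to it.
empty_but_well_formed = "<Header>\n<FromCompany></FromCompany>\n</Header>"
# A damaged closing tag is not well formed; the parser gives up with a position.
broken_end_tag = "<Header>\n<FromCompany>FromCompany>\n</Header>"

print(well_formed_problem(empty_but_well_formed))
print(well_formed_problem(broken_end_tag))
```

The first document passes this check even though it would fail validation.  The second is rejected outright, which is why validation software refuses to load such a file before it ever consults the schema.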

But it's not even working!

It's sad but common. Once you've got the software working you'll seldom have trouble thereafter, but the first time things often don't go well.  Take a deep breath and go over the basics.

First: (For XML Notepad or similar editors): Did you open the Schema dialogue and click OK? See Setting up above, but if you see errors with "0" as line references that's the likely problem.  

After checking your software, start with the basics: You need 3 things working together:

  • The ONIX schemas on your computer (Did you unpack them from the zip? Are the files in the right directory? – If you unpack the zip in the target directory they may be in a subdirectory of it.).  
  • The ONIX file (Did you reference the right schema file, short or reference, and is the location correct, back slashes right?  Did you base your schema declaration on the files you downloaded above? Do not cut and paste from this page, but use the downloadable, tested, files to base your declaration on.)
  • And the software.  

The most common problem is that the software is not finding the schemas, either because the internal reference location in the schema declaration is wrong – or because you haven't unpacked them, or you haven't left them where you think.
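One way to see exactly what the software is being told is to read the schema location straight out of the file's root tag.  Here is a minimal Python sketch, using only the standard library and assuming the declaration uses the standard xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute.  The function name declared_schema_locations is made up, and the path in the sample is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def declared_schema_locations(xml_bytes):
    """Return the schema locations named by the xsi attributes on the root tag."""
    for _event, root in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        # xsi:schemaLocation holds whitespace-separated pairs: namespace, location, ...
        pairs = root.get("{%s}schemaLocation" % XSI, "").split()
        locations = pairs[1::2]
        # xsi:noNamespaceSchemaLocation holds a single location.
        single = root.get("{%s}noNamespaceSchemaLocation" % XSI)
        if single:
            locations.append(single)
        return locations
    return []

# Hypothetical declaration for illustration; the path is not from a real setup.
sample = (b'<ONIXMessage xmlns="http://www.editeur.org/onix/2.1/reference" '
          b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
          b'xsi:schemaLocation="http://www.editeur.org/onix/2.1/reference '
          b'C:/onix/ONIX_BookProduct_Release2.1_reference.xsd"><Header/></ONIXMessage>')
print(declared_schema_locations(sample))
```

Compare each location it prints against where the unpacked .xsd files really are.  An empty list means the root tag carries no schema reference at all.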

Finally, the other common issue is that you've introduced an error in the declaration.  All those declaration bits must be perfect or nothing will happen – it may not make easy reading but your computer enjoys it – or not – and smart quotes (the curly kind a word processor substitutes for straight ones) make the computer dumb.
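Curly quotes are hard to spot by eye, so a quick scan of the top of the file can help.  Here is a minimal Python sketch that looks only at the first few lines, where the declaration lives, since curly quotes further down in the book data may be perfectly legitimate.  The function name find_smart_quotes and the 20-line default are arbitrary choices for illustration.

```python
SMART_QUOTES = "\u201c\u201d\u2018\u2019"  # curly double and single quotes

def find_smart_quotes(xml_text, max_lines=20):
    """Return (line, column, character) for each curly quote in the first max_lines lines."""
    hits = []
    for line_no, line in enumerate(xml_text.splitlines()[:max_lines], start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((line_no, col, ch))
    return hits

# A declaration that was pasted through a word processor.
damaged = "<?xml version=\u201c1.0\u201d?>\n<ONIXMessage/>"
for line_no, col, ch in find_smart_quotes(damaged):
    print("line %d, column %d: %r" % (line_no, col, ch))
```

Any hit in the declaration should be replaced with a plain straight quote before you try the validation again.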

Very systematically eliminate all those.  If it still doesn't work, do it again, calmly.

If it still doesn't work, check the software and maybe try another parser – that's seldom the issue – but it could be.  Finally you can contact biblioshare@booknetcanada.ca and send us the schema file and ONIX file you're using and we can see if we can help.  We can at least confirm that you have an ONIX file and a working schema.

Bonus ONIX 3.0 Schema starter files!

Catch up to those go-getters already learning ONIX 3.0!  Experiment!  Have fun being the first kid in your bibliographic group to experiment with ONIX 3.0.  Here are equivalent starter files for it:
