| Table of Contents:
General overview
The definition
Simple rules
    How to reference a DTD from a document
    Declaring elements
    Declaring attributes
Some examples
How to validate
Other resources
Well, what is validation and what is a DTD? DTD is the acronym for Document Type Definition. It is a
description of the content for a family of XML files. It is part of the
XML 1.0 specification, and allows one to describe and verify that a
given document instance conforms to the set of rules detailing its structure
and content. Validation is the process of checking a document against a
DTD (more generally against a set of construction rules).

The validation process and building DTDs are the two most difficult parts of
the XML life cycle. Briefly, a DTD defines all the possible elements to be found
within your document and the formal shape of your document tree (by
defining the allowed content of each element: either text, a regular expression
for the allowed list of children, or mixed content, i.e. both text and
children). The DTD also defines the valid attributes for all elements and the
types of those attributes.

The W3C XML Recommendation (Tim Bray's annotated
version of Rev1): (unfortunately) all this is inherited from the SGML world, the
syntax is ancient...

Writing DTDs can be done in many ways. The rules to build them can be
radically different depending on whether you need something permanent or
something which can evolve over time. Really complex DTDs like the DocBook ones
are flexible but quite a bit harder to design. I will just focus on DTDs for formats
with a fixed, simple structure. What follows is just a set of basic rules, definitely
neither exhaustive nor usable for complex DTD design.

Assuming the top element of the document is spec and the
DTD is placed in the file mydtd in the
subdirectory dtds of the directory from where the document was
loaded:
<!DOCTYPE spec SYSTEM "dtds/mydtd">
Notes:
    The system string is actually a URI-Reference (as defined in RFC 2396) so you can
    use a full URL string indicating the location of your DTD on the Web. This
    is a really good thing to do if you want others to validate
    your document.
    It is also possible to associate a PUBLIC identifier (a magic
    string) so that the DTD is looked up in catalogs on the client side without
    having to locate it on the web.
    A DTD contains a set of element and attribute declarations,
    but they don't define what the root of the document should be. This
    is explicitly told to the parser/validator as the first element
    of the DOCTYPE declaration.
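
As a minimal sketch of the notes above, a document can carry both a PUBLIC identifier and a system URL. The public identifier string and the example.org URL below are hypothetical, chosen only for illustration; the point is that the name after DOCTYPE (spec) names the root element, and that the system part may be a full URL.

```xml
<?xml version="1.0"?>
<!-- Hypothetical public identifier and URL, for illustration only -->
<!DOCTYPE spec PUBLIC "-//Example//DTD Spec V1.0//EN"
                      "http://www.example.org/dtds/mydtd">
<!-- The root element name must match the name given after DOCTYPE -->
<spec>
  <!-- content as allowed by the DTD goes here -->
</spec>
```

A catalog-aware parser would presumably try to resolve the public identifier locally first and only fall back to the URL if no catalog entry matches.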
The following declares an element spec:
<!ELEMENT spec (front, body, back?)>
It also expresses that the spec element contains one front, one body and one optional back child element, in this
order. An element of the structure and its content are declared
in a single declaration. Similarly the following
declares div1 elements:
<!ELEMENT div1 (head, (p | list | note)*, div2?)>
which means div1 contains one head, then a series
of optional p, list and note elements, and
then an optional div2. And last but not least an element
can contain text:
<!ELEMENT b (#PCDATA)>
b contains text. An element can also be of mixed content (text and
elements in no particular order):
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
p can contain text or a, ul, b, i or em elements in no particular order.
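
To make the element declarations concrete, here is a small self-contained sketch that places them in an internal DTD subset together with a conforming instance. The div1, b and p declarations come from the text; the declarations for head, list, note, div2, a, ul, i and em are assumptions made only so the example is complete (they are simply given text content).

```xml
<?xml version="1.0"?>
<!DOCTYPE div1 [
  <!-- Declarations taken from the text -->
  <!ELEMENT div1 (head, (p | list | note)*, div2?)>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
  <!-- Assumed for this sketch only: plain text content for the rest -->
  <!ELEMENT head (#PCDATA)>
  <!ELEMENT list (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT div2 (#PCDATA)>
  <!ELEMENT a (#PCDATA)>
  <!ELEMENT ul (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
  <!ELEMENT em (#PCDATA)>
]>
<div1>
  <!-- exactly one head, and it must come first -->
  <head>Element declarations</head>
  <!-- mixed content: text and child elements in any order -->
  <p>Text mixed with <b>bold</b> and <em>emphasized</em> children.</p>
  <!-- p, list and note may repeat and interleave freely -->
  <note>A note between two paragraphs.</note>
  <p>A second paragraph.</p>
  <!-- at most one div2, and only at the end -->
  <div2>An optional trailing div2.</div2>
</div1>
```

Moving head after the first p, or adding a second div2, would break the content model declared for div1.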
Again the declaration of attributes includes their content definition:
<!ATTLIST termdef name CDATA #IMPLIED>
means that the element termdef can have
a name attribute containing text (CDATA) and which
is optional (#IMPLIED). The attribute value can also be
defined within a set:
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
means that the list element has a type attribute
with 3 allowed values, "bullets", "ordered" or "glossary", and which
defaults to "ordered" if the attribute is not explicitly specified.

The content type of an attribute can be
text (CDATA), anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES)
or name(s) (NMTOKEN/NMTOKENS). The following
defines that a chapter element can have an
optional id attribute of type ID, usable for reference
from attributes of type IDREF:
<!ATTLIST chapter id ID #IMPLIED>
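
The attribute declarations can be sketched in one small document with an internal DTD subset. The list and chapter attribute declarations are the ones from the text; the doc and ref elements, the target attribute, and the text content given to chapter and list are hypothetical additions made only so that the ID has something referring to it.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!-- "doc", "ref" and "target" are hypothetical names for this sketch -->
  <!ELEMENT doc (chapter+, list*, ref*)>
  <!ELEMENT chapter (#PCDATA)>
  <!ATTLIST chapter id ID #IMPLIED>
  <!ELEMENT list (#PCDATA)>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
  <!ELEMENT ref EMPTY>
  <!-- an IDREF value must match some ID value in the same document -->
  <!ATTLIST ref target IDREF #REQUIRED>
]>
<doc>
  <chapter id="intro">Introduction</chapter>
  <!-- id is #IMPLIED, so it may be left out -->
  <chapter>A chapter without an id</chapter>
  <!-- no type given: the default "ordered" applies -->
  <list>An ordered list by default</list>
  <list type="glossary">An explicit value from the allowed set</list>
  <ref target="intro"/>
</doc>
```

Changing target to a value that no chapter declares as its id, or giving two chapters the same id, should make the document invalid even though it stays well-formed.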
The last value of an attribute definition can
be #REQUIRED, meaning that the attribute has to be
given, #IMPLIED, meaning that it is optional, or the default
value (possibly prefixed by #FIXED if it is the only value allowed).

Notes: The directory test/valid/dtds/ in the
libxml2 distribution contains some complex DTD examples. The example in
the file test/valid/dia.xml shows an XML file where the simple
DTD is directly included within the document.

The simplest way to validate is to use the xmllint program included with
libxml. The --valid option turns on validation of the files given
as input. For example the following validates a copy of the first revision of
the XML 1.0 specification:
xmllint --valid --noout test/valid/REC-xml-19980210.xml
The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd option allows validation of the
document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation, check the associated description.

DTDs are as old as SGML. So there may be a number of examples
on-line, I will just list one for now, other pointers welcome:

I suggest looking at the examples found under test/valid/dtd and any
of the large number of books available on XML. The dia example in
test/valid should be both simple and complete enough to allow you to build your
own.

Daniel Veillard |