There are two levels of correctness of an XML document:
- Well-formed XML documents conform to the XML syntax rules, and nothing else. “Conforming parsers” must report a fatal error for a document that is not well-formed and must not continue normal processing of it.
- Valid XML documents are well-formed and also conform to additional rules, typically user-defined in a DTD or an XML schema. “Validating parsers” must be able to report documents that are not valid. Unlike well-formedness errors, validity errors do not require the parser to stop processing.
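Python's standard library can illustrate the first level. This is a minimal sketch that assumes Python 3 and the built-in `xml.etree.ElementTree` module. That module is built on the non-validating expat parser, so it reports well-formedness errors as `ParseError` but never checks a DTD or schema. Checking validity would need a validating parser (for example the third-party lxml package), which is not covered here. The helper name `is_well_formed` is only illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document.

    ElementTree uses expat, a non-validating parser: this checks syntax
    (well-formedness) only, not conformance to a DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Fatal error: the parser refuses to build a tree.
        return False
    return True


print(is_well_formed("<note><to>Ann</to></note>"))  # True
print(is_well_formed("<note><to>Ann</note>"))       # False: mismatched end tag
```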
The most significant rules that an XML document must follow to qualify as being “well-formed” are:
- There must be one, and only one, root (top-level) element.
- Non-empty elements must be delimited with matching start and end tags.
- Empty elements may be written with a self-closing tag such as `<br/>`, which is equivalent to the start/end tag pair `<br></br>`.
- Attribute values must be delimited with matching single *or* double quotes.
- Tags may be nested but must not overlap.
- Element names must follow naming conventions:
  - Names can start with letters (including non-Latin characters) or the “_” character, but not with digits or other punctuation characters.
  - After the first character, digits are allowed, as are the characters “-” and “.”.
  - Names can’t contain spaces.
  - Names can’t contain the reserved character “:”, unless namespaces are being used.
  - Names starting with the letters “xml”, in any case (upper, lower, or mixed), are reserved for XML standards and should not be used.
- There can’t be a space after the opening “<” character; the element name must come immediately after it. However, whitespace is allowed before the closing “>” character.
- The document must conform to its declared character encoding, if any. If no encoding is declared and there is no byte order mark, the document is assumed to be UTF-8.
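- Unlike in HTML, whitespace in XML content is preserved and passed to the application. When a web browser displays XML, it often transforms it to HTML first (for example with XSLT), and HTML rendering collapses whitespace, so the whitespace can appear to have been stripped.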
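Several of the structural rules above can be tried out directly. The sketch below assumes Python 3's built-in `xml.etree.ElementTree`, which uses expat underneath. It feeds small hand-written snippets to the parser and compares each result with the verdict the corresponding rule predicts. The helper `parses` and the `CASES` table are hypothetical illustrations, not part of any library API.

```python
import xml.etree.ElementTree as ET


def parses(snippet: str) -> bool:
    """Return True if expat, via ElementTree, accepts the snippet as well-formed."""
    try:
        ET.fromstring(snippet)
    except ET.ParseError:
        return False
    return True


# (rule exercised, snippet, expected to be well-formed?)
CASES = [
    ("one root element", "<a><b/></a>", True),
    ("two root elements", "<a/><b/>", False),
    ("overlapping tags", "<a><b></a></b>", False),
    ("end tag differs in case", "<a></A>", False),
    ("single-quoted attribute", "<a x='1'/>", True),
    ("unquoted attribute", "<a x=1/>", False),
    ("mismatched quotes", "<a x=\"1'/>", False),
    ("name starts with a digit", "<1a/>", False),
    ("'-' and '.' inside a name", "<a-b.c/>", True),
    ("space after '<'", "< a/>", False),
    ("space before '>'", "<a ></a >", True),
]

for rule, snippet, expected in CASES:
    actual = parses(snippet)
    status = "ok" if actual == expected else "UNEXPECTED"
    print(f"{rule:28} {snippet!r:20} well-formed={actual} [{status}]")
```

Each line of output pairs a rule with the parser's verdict. Because expat checks only well-formedness, every failure here is a syntax error, never a validity error.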
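The encoding and whitespace rules can also be observed from Python. This is a minimal sketch that again assumes the built-in `xml.etree.ElementTree` module. It shows three things: character data keeps its spaces and newlines, bytes with no encoding declaration are decoded as UTF-8, and bytes that contradict the declared encoding are rejected. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Whitespace inside element content is handed to the application unchanged.
para = ET.fromstring("<p>  two   spaces\n  and a newline </p>")
print(repr(para.text))  # '  two   spaces\n  and a newline '

# No encoding declaration and no byte order mark: the bytes are read as UTF-8.
utf8_doc = "<name>Zoë</name>".encode("utf-8")
print(ET.fromstring(utf8_doc).text == "Zoë")  # True

# The declaration says UTF-8, but the bytes are Latin-1 ("ë" becomes 0xEB),
# which is not valid UTF-8 here, so the parser refuses the document.
mislabeled = '<?xml version="1.0" encoding="UTF-8"?><name>Zoë</name>'.encode("latin-1")
try:
    ET.fromstring(mislabeled)
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)
```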