Serialized DOM Format (SDF) is a format that can represent arbitrary DOM trees, including illegal DOM trees, in text form. It is primarily intended for test suites, but might be useful for other applications as well.
A node is represented by an identifier character followed by one or more strings which carry information about the node. The first string must always be specified, but further strings may be omitted. Unless otherwise stated, omitted strings default to the empty string.
The identifier characters and their meanings are as follows:
Element node. It has three strings representing the
namespaceURI, respectively. The namespaceURI defaults to "http://www.w3.org/1999/xhtml".
Attr node. It has four strings representing the
Text node. It has one string representing the
Comment node. It has one string representing the
CDATASection node. It has one string representing the
ProcessingInstruction node. It has two strings representing the
DocumentType node. It has three strings representing the
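The string counts and defaulting rules above can be sketched in Python roughly as follows. This is only an illustration: the names `STRING_COUNTS` and `fill_defaults` are hypothetical, node types are keyed by name rather than by identifier character, and the sketch assumes that namespaceURI is the third (last) string of an Element node.

```python
# Number of strings each node type carries, as listed above.
STRING_COUNTS = {
    "Element": 3,
    "Attr": 4,
    "Text": 1,
    "Comment": 1,
    "CDATASection": 1,
    "ProcessingInstruction": 2,
    "DocumentType": 3,
}

XHTML_NS = "http://www.w3.org/1999/xhtml"


def fill_defaults(node_type, strings):
    """Pad omitted trailing strings with their default values."""
    expected = STRING_COUNTS[node_type]
    if not 1 <= len(strings) <= expected:
        raise ValueError(
            "%s takes between 1 and %d strings" % (node_type, expected)
        )
    # Omitted strings default to the empty string.
    filled = list(strings) + [""] * (expected - len(strings))
    # Assumption: namespaceURI is the third Element string; when it is
    # omitted it defaults to the XHTML namespace instead.
    if node_type == "Element" and len(strings) < 3:
        filled[2] = XHTML_NS
    return filled
```

For example, `fill_defaults("Element", ["p"])` yields `["p", "", "http://www.w3.org/1999/xhtml"]`, while an Element written with an explicit empty third string stays in no namespace.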
To express that a node is a child node of another node, or that an Attr node is part of another node, its line is indented by 2 spaces relative to that node's line. An Attr node must not be a top-level node. The nodes must be indented appropriately so that they form a tree. There may be zero or more top-level nodes.
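As a rough sketch of how a reader might turn indentation into a tree, the following Python function tracks the most recent node at each depth. The function name `build_tree` and the dictionary layout are hypothetical, and the sketch does not distinguish Attr nodes from child nodes or check that Attr nodes are not top-level.

```python
def build_tree(lines):
    """Group SDF lines into a forest based on 2-space indentation."""
    roots = []
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in lines:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if indent % 2:
            raise ValueError("indentation must be a multiple of 2 spaces")
        depth = indent // 2
        if depth > len(stack):
            raise ValueError("node is indented deeper than any possible parent")
        node = {"line": stripped, "children": []}
        # Discard nodes at this depth or deeper; they can no longer be parents.
        del stack[depth:]
        if depth == 0:
            roots.append(node)
        else:
            stack[depth - 1]["children"].append(node)
        stack.append(node)
    return roots
```

An empty input produces an empty list, which matches the rule that there may be zero top-level nodes.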
Should it be required that attributes come before the actual child nodes? Should attributes be required to be sorted?
A node is written as follows:
A string is a JSON string. [JSON]
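Because each string is a JSON string, a single node line can be tokenized with an ordinary JSON decoder. The sketch below is one possible approach, not a normative parser: `parse_node` is a hypothetical name, and it assumes that the identifier character and the strings are separated by spaces, as in the examples in this document.

```python
import json

_decoder = json.JSONDecoder()


def parse_node(text):
    """Split an unindented node line into (identifier, list of strings)."""
    identifier, rest = text[:1], text[1:]
    if not identifier or identifier.isspace():
        raise ValueError("missing identifier character")
    strings = []
    pos = 0
    while True:
        # Assumption: strings are separated by one or more spaces.
        while pos < len(rest) and rest[pos] == " ":
            pos += 1
        if pos == len(rest):
            break
        # raw_decode parses one JSON value and reports where it ended.
        value, pos = _decoder.raw_decode(rest, pos)
        if not isinstance(value, str):
            raise ValueError("only JSON strings are allowed")
        strings.append(value)
    if not strings:
        raise ValueError("the first string must always be specified")
    return identifier, strings
```

With this sketch, `parse_node('e "foo" "" ""')` returns `('e', ['foo', '', ''])`.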
In the following example, a DOM tree is represented in XML and SDF, respectively:
<foo><![CDATA[bar]]>baz<!--quux--></foo>
e "foo" "" ""
  s "bar"
  t "baz"
  c "quux"
Since HTML doesn't support CDATA sections, the above DOM can't be represented in HTML.
In the following example, the DOM tree cannot be represented in XML, but can be in HTML and SDF:
<!-- -- -->
c " -- "
In the following example, the DOM tree cannot be represented in either XML or HTML, but can be in SDF:
c " --> "
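The XML side of the two comment examples above can be illustrated with Python's `xml.dom.minidom`, whose serializer refuses comment data containing "--". This only demonstrates the behavior of that one library, and the helper name `xml_serializable` is hypothetical; it says nothing about how HTML parsers treat the same data.

```python
from xml.dom.minidom import Document


def xml_serializable(comment_data):
    """Report whether minidom is willing to serialize a Comment node."""
    comment = Document().createComment(comment_data)
    try:
        comment.toxml()
    except ValueError:
        # minidom rejects comment data that contains "--".
        return False
    return True
```

Under this check, a comment such as "quux" serializes, while " -- " and " --> " are both rejected, even though all three can be written as SDF `c` lines.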
In the following example, the DOM tree is not legal per the DOM specification, but can still be represented with SDF:
t "foo"
  e "bar"
Trying to build a DOM like this with the standard DOM methods will raise a HIERARCHY_REQUEST_ERR exception.
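As an illustration, and assuming the example above describes a Text node with an Element child, the following sketch attempts the same construction with Python's `xml.dom.minidom`. The variable names are arbitrary, and other DOM implementations may report the error differently.

```python
import xml.dom
from xml.dom.minidom import Document

doc = Document()
text = doc.createTextNode("foo")
element = doc.createElement("bar")

try:
    # Text nodes cannot have children, so this is expected to fail.
    text.appendChild(element)
except xml.dom.HierarchyRequestErr as exc:
    error_code = exc.code
else:
    error_code = None
```

In minidom the resulting `error_code` equals `xml.dom.HIERARCHY_REQUEST_ERR`, which is why such a tree can only be described in SDF rather than built through the standard DOM methods.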
In the following example, the U+000C FORM FEED and U+1047E SHAVIAN LETTER IAN characters are escaped. Note that the latter is represented as a UTF-16 surrogate pair.
t "form feed: \u000C, ian: \uD801\uDC7E"
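The surrogate pair in this example follows from the usual UTF-16 arithmetic, which can be sketched as below. The function names `to_surrogate_pair` and `json_escape` are hypothetical helpers for illustration only.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def json_escape(code_point):
    """Return the JSON \\uXXXX escape(s) for a single code point."""
    if code_point < 0x10000:
        return "\\u%04X" % code_point
    return "".join("\\u%04X" % unit for unit in to_surrogate_pair(code_point))
```

For U+1047E the offset is 0x047E, giving the high surrogate 0xD801 and the low surrogate 0xDC7E, which matches the escape sequence in the example.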
Thanks to Henri Sivonen, Lachlan Hunt and Philip Taylor for their contributions.