© Copyright 2008-2009 Opera Software ASA. All rights reserved.
This specification defines parsing rules for XML processing instructions with pseudo-attributes and the processing of xml-stylesheet processing instructions, which allow style sheets to be associated with XML documents.
This document is a proposal from Opera Software, intended to replace the Associating Style Sheets with XML documents Version 1.0 specification. [ASSOCIATING] Should this draft be accepted by the W3C, Opera Software will offer Copyright and any other Intellectual Property Rights in accordance with standard practice under the W3C's Royalty-Free Patent Policy.
In the following example, a CSS style sheet is associated with an SVG document:
<?xml-stylesheet type="text/css" href="style.css"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  ...
</svg>
In the following example, a transformation expressed in XSLT is associated with an XML document:
<?xml-stylesheet type="application/xml" href="transform.xml"?>
<doc>
  <item> ... </item>
  ...
</doc>
The creation of this specification was motivated by the following:
The existing specification was unclear at best with regard to error handling, and required draconian error handling at worst.
The existing specification did not address dynamic changes to the DOM.
The existing specification did not match contemporary implementations.
At the time there were two additional specifications reusing the parsing rules for xml-stylesheet processing instructions. [XBL2] [CORS]
Define reusable parsing rules for processing instructions with pseudo-attributes that are compatible with deployed content and implementations.
Define processing rules for xml-stylesheet processing instructions in terms of the DOM, taking DOM changes into account.
Error handling should be defined and should not be draconian.
The parsing rules are the result of reverse engineering browsers. They could quite possibly be simplified considerably while still supporting existing content.
The charset pseudo-attribute is intended to be in sync with the charset attribute for style sheet links in HTML5. [HTML5]
This draft tries not to step on the toes of other specifications (in particular HTTP and XSLT), but browsers largely ignore various requirements in those specifications, for example by ignoring Content-Type metadata and by not supporting multiple xml-stylesheet processing instructions for XSLT.
The term root element means the first child of a Document node that is an Element node.
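The following non-normative sketch illustrates this definition in Python, using the standard library's xml.dom.minidom as a stand-in DOM. The function name root_element and the sample document are assumptions made for illustration only.

```python
from xml.dom import minidom, Node


def root_element(document):
    """Return the first child of the Document node that is an Element node.

    Returns None when the Document has no Element child.
    """
    for child in document.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            return child
    return None


# A processing instruction and a comment precede the root element here,
# so the root element is the third child of the Document node.
doc = minidom.parseString(
    '<?xml-stylesheet href="style.css"?><!-- note --><doc><item/></doc>'
)
root = root_element(doc)
```

In this sample, root is the doc element even though two other nodes come before it in the Document's child list.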
All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
This specification relies on the following specifications:
This specification is defined in terms of the DOM. For the purposes of the requirements in this specification, implementations must act as if they supported some version of DOM Core. [DOM3CORE]
Implementations that support XSLT must support some version of XML with namespaces. [XML] [XMLNS]
Implementations that support CSS must support some version of the Media Queries language. [MQ]
Processing instructions that are said to follow the rules for parsing processing instructions with pseudo-attributes must have the data DOM attribute match the PIData production below.
[1] | PIData | ::= | (S* PseudoAtt (S PseudoAtt)*)? S* | |
[2] | S | ::= | (#x9 | #xA | #xD | #x20)+ | |
[3] | PseudoAtt | ::= | Name Eq (SingleQuoted | DoubleQuoted) | [CC: Unique PseudoAtt Spec] |
[4] | Name | ::= | (Char - ('=' |S))+ | |
[5] | Eq | ::= | S* '=' S* | |
[6] | SingleQuoted | ::= | "'" (AttContent - "'")* "'" | |
[7] | DoubleQuoted | ::= | '"' (AttContent - '"')* '"' | |
[8] | AttContent | ::= | Char - ('<' | '&') | EntityRef | CharRef | |
[9] | EntityRef | ::= | '&amp;' | '&lt;' | '&gt;' | '&quot;' | '&apos;' | |
[10] | CharRef | ::= | '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' | [CC: Legal Character] |
[11] | Char | ::= | [#x0-#x10FFFF] | /* Any Unicode code point */ |
A pseudo-attribute name must not appear more than once.
Characters referred to using character references must not refer to U+0000 or U+D800..U+DFFF.
These constraints are not well-formedness constraints.
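As a non-normative illustration, the grammar and the two constraints above can be approximated with regular expressions. The sketch below is in Python; the function name conforms and the constant names are assumptions, and the entity references are taken to be the five predefined ones (amp, lt, gt, quot, apos) listed in the entity table later in this specification. It checks conformance of the data string only and is not the error-tolerant parsing algorithm.

```python
import re

S = r"[\t\n\r ]"                            # one character of production S
NAME = r"[^=\t\n\r ]+"                      # Name: any characters except '=' and S
ENTITY_REF = r"&(?:amp|lt|gt|quot|apos);"   # EntityRef
CHAR_REF = r"&#(?:[0-9]+|x[0-9a-fA-F]+);"   # CharRef


def _quoted(quote):
    # SingleQuoted / DoubleQuoted: AttContent minus the quote character itself.
    return rf"{quote}(?:[^<&{quote}]|{ENTITY_REF}|{CHAR_REF})*{quote}"


VALUE = "(?:" + _quoted("'") + "|" + _quoted('"') + ")"
ATT = rf"{NAME}{S}*={S}*{VALUE}"            # PseudoAtt
PI_DATA = re.compile(rf"(?:{S}*{ATT}(?:{S}+{ATT})*)?{S}*")
ATT_PARTS = re.compile(rf"({NAME}){S}*={S}*({VALUE})")
NUMERIC = re.compile(r"&#(x[0-9a-fA-F]+|[0-9]+);")


def conforms(data):
    """Return True if data matches PIData and satisfies both constraints."""
    if not PI_DATA.fullmatch(data):
        return False
    parts = [(m.group(1), m.group(2)) for m in ATT_PARTS.finditer(data)]
    names = [name for name, _ in parts]
    if len(names) != len(set(names)):
        return False                        # a name appears more than once
    for _, value in parts:
        for ref in NUMERIC.finditer(value):
            digits = ref.group(1)
            code = int(digits[1:], 16) if digits.startswith("x") else int(digits)
            if code == 0 or 0xD800 <= code <= 0xDFFF:
                return False                # U+0000 or a surrogate
    return True
```

For example, conforms('type="text/css" href="style.css"') is true, while a repeated href, an unquoted value, or a bare ampersand in a value makes the function return false.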
The rules for parsing processing instructions with pseudo-attributes are defined in this section. The user agent must follow these rules whenever the processing instruction's data DOM attribute changes, and whenever the processing instruction is inserted into the DOM or moved in the DOM (which might be a result of changing nearby nodes).
When the user agent hits a parse error, it must act as described below, and may also inform the user that there was an error (e.g. in the error console).
When the user agent is to stop parsing, it must stop the state machine so that pseudo-attributes can be processed. pseudo-attributes is the output of the algorithm.
A pseudo-attribute has a name and a value. When a new pseudo-attribute is created, its name and value must be set to the empty string.
A pseudo-attribute can be marked as being in error. This will result in the pseudo-attribute being ignored when it has been completely parsed.
The next input character is the first character in the processing instruction's data DOM attribute that has not yet been consumed. Initially the next input character is the first character in the attribute.
"EOF" is a conceptual character representing the end of the processing instruction's data DOM attribute.
Let pseudo-attributes be the empty array. Start in the before name state.
The state machine is as follows:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
This section defines how to consume an entity.
The behavior depends on the identity of the next character (the one immediately after the U+0026 AMPERSAND character):
Consume the U+0023 NUMBER SIGN.
The behavior further depends on the character after the U+0023 NUMBER SIGN:
Consume the U+0078 LATIN SMALL LETTER X.
Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A through to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A through to U+0046 LATIN CAPITAL LETTER F (in other words, 0-9, a-f, and A-F).
When it comes to interpreting the number, interpret it as a hexadecimal number.
Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE (i.e. just 0-9).
When it comes to interpreting the number, interpret it as a decimal number.
Consume as many characters as match the range of characters given above.
If no characters match the range, then this is a parse error; mark the pseudo-attribute as being in error.
Otherwise, if the next character is a U+003B SEMICOLON, consume that too. If it isn't, there is a parse error; mark the pseudo-attribute as being in error.
If one or more characters match the range, then take them all and interpret the string of characters as a number (either hexadecimal or decimal as appropriate).
If the number is zero, if the number is higher than 0x10FFFF, or if it's one of the surrogate characters (characters in the range 0xD800 to 0xDFFF), then this is a parse error; mark the pseudo-attribute as being in error.
Otherwise, append the Unicode character whose code point is that number to the pseudo-attribute's value.
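The numeric branch described above can be sketched as follows (non-normative, in Python). The function name consume_numeric_reference and its return shape, a tuple of the new position, the character to append (or None), and an "in error" flag, are assumptions made for this illustration. The sketch follows one reading of the steps in which a missing semicolon marks the pseudo-attribute as being in error but the character is still appended.

```python
def consume_numeric_reference(data, pos):
    """Consume a numeric character reference.

    data[pos] is expected to be the U+0023 NUMBER SIGN that follows the
    U+0026 AMPERSAND. Returns (new_pos, char_or_None, in_error).
    """
    pos += 1                                  # consume '#'
    if data[pos:pos + 1] == "x":
        pos += 1                              # consume 'x'
        allowed, base = "0123456789abcdefABCDEF", 16
    else:
        allowed, base = "0123456789", 10

    start = pos
    while pos < len(data) and data[pos] in allowed:
        pos += 1                              # consume as many digits as match
    digits = data[start:pos]
    if not digits:
        return pos, None, True                # parse error: no digits

    in_error = False
    if data[pos:pos + 1] == ";":
        pos += 1                              # consume ';'
    else:
        in_error = True                       # parse error: missing ';'

    code = int(digits, base)
    if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return pos, None, True                # parse error: illegal code point
    return pos, chr(code), in_error
```

Called on "#x41;" at position 0, the function consumes all five characters and yields "A"; on "#0;" it consumes the reference but reports an error and yields no character.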
Consume the maximum number of characters possible, with the consumed characters case-sensitively matching one of the identifiers in the first column of the following table:
Entity name | Character |
---|---|
amp; | U+0026 |
apos; | U+0027 |
gt; | U+003E |
lt; | U+003C |
quot; | U+0022 |
If no match can be made, then this is a parse error; mark the pseudo-attribute as being in error.
Otherwise, append the character corresponding to the entity name (as given by the second column of the table above) to the pseudo-attribute's value.
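The named branch can be sketched in the same non-normative way. The dictionary below mirrors the table above; the names ENTITIES and consume_named_entity, and the returned tuple of new position, character (or None), and "in error" flag, are assumptions made for this illustration.

```python
# Entity names exactly as in the table, including the trailing semicolon.
ENTITIES = {
    "amp;": "\u0026",
    "apos;": "\u0027",
    "gt;": "\u003E",
    "lt;": "\u003C",
    "quot;": "\u0022",
}


def consume_named_entity(data, pos):
    """Consume a named entity starting at data[pos] (just after the '&').

    Matching is case-sensitive. No entity name is a prefix of another,
    so the first name that matches is also the longest possible match.
    Returns (new_pos, char_or_None, in_error).
    """
    for name, char in ENTITIES.items():
        if data.startswith(name, pos):
            return pos + len(name), char, False
    return pos, None, True                    # parse error: no match
```

For instance, "amp;" yields "&", whereas "AMP;" and "lt" without a semicolon both fail to match and are reported as errors.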
xml-stylesheet processing instruction
A processing instruction must not have the target DOM attribute set to "xml-stylesheet" if its parent is not a Document node, or if its parent is a Document node but the processing instruction comes after the root element.
For xml-stylesheet processing instructions that are children of the Document object and are before the root element (if any), the user agent must use the rules for parsing processing instructions with pseudo-attributes to obtain the pseudo-attributes.
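The placement condition can be illustrated with a short non-normative sketch in Python, again using xml.dom.minidom as a stand-in DOM. The function name is_candidate_stylesheet_pi and the sample document are assumptions made for this example.

```python
from xml.dom import minidom, Node


def is_candidate_stylesheet_pi(pi):
    """Return True if pi is an xml-stylesheet processing instruction that is
    a child of the Document node and comes before the root element (if any).
    """
    parent = pi.parentNode
    if pi.target != "xml-stylesheet":
        return False
    if parent is None or parent.nodeType != Node.DOCUMENT_NODE:
        return False
    for sibling in parent.childNodes:
        if sibling is pi:
            return True                       # reached pi before any element
        if sibling.nodeType == Node.ELEMENT_NODE:
            return False                      # root element comes first
    return False


doc = minidom.parseString(
    '<?xml-stylesheet href="a.css"?>'
    '<doc><?xml-stylesheet href="b.css"?></doc>'
    '<?xml-stylesheet href="c.css"?>'
)
before_root = doc.childNodes[0]
inside_root = doc.documentElement.firstChild
after_root = doc.childNodes[2]
```

Only before_root satisfies the condition; inside_root has an Element parent and after_root follows the root element.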
The type pseudo-attribute represents a hint about the resource's MIME type, and must consist of a valid MIME type, optionally with parameters. The user agent may opt to abort processing the processing instruction if the MIME type given in the type pseudo-attribute is known to be unsupported. For the purposes of this pseudo-attribute, text/xsl must be assumed to be an XML MIME type.
The href pseudo-attribute gives the address of the resource. The pseudo-attribute must be present and must consist of an IRI reference. If the pseudo-attribute is present, the user agent should begin to download the resource (subject to user agent specific downloading policies, e.g. security). [IRI]
The title pseudo-attribute defines alternative style sheet sets. [CSSOM]
The alternate pseudo-attribute must either have the literal value "yes" or "no". If the value is "yes", then the referenced resource is an alternative style sheet. [CSSOM]
The media pseudo-attribute says which media the referenced resource applies to. The value must be a valid media query. [MQ] The user agent must only apply the styles to views while their state matches the listed media. [DOM2VIEWS]
The previous version of this specification had a charset pseudo-attribute, which has been dropped in this version. [ASSOCIATING]
If the MIME type of the resource given in the href pseudo-attribute is text/css (ignoring parameters), then the resource must be processed according to the rules in CSS. [CSS21]
If the processing instruction being processed was inserted in the DOM by the XML parser, and the resource given in the href pseudo-attribute is XML (text/xml, application/xml, or any MIME type that ends with +xml (ignoring parameters)), and the root element (or, if the fragment identifier is present, the first (pre-order, depth-first) element that has that ID) is in the namespace http://www.w3.org/1999/XSL/Transform or has a version attribute in that namespace, then that document (or element) must be processed according to the rules in XSLT. [XSLT] In this case, the title, alternate and media pseudo-attributes do not apply.
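The two processing rules above can be summarised in a non-normative Python sketch. All function names and the "css" / "xslt" return labels are assumptions made for illustration. The sketch compares MIME types case-insensitively, inspects only the root element, and does not handle fragment identifiers.

```python
from xml.dom import minidom

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def mime_essence(mime_type):
    """Strip parameters and normalise case, e.g. 'Text/CSS; charset=x' -> 'text/css'."""
    return mime_type.split(";", 1)[0].strip().lower()


def is_xml(mime_type):
    essence = mime_essence(mime_type)
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")


def is_xslt_root(element):
    """True if the element is in the XSLT namespace or carries a version
    attribute in that namespace."""
    return (element.namespaceURI == XSLT_NS
            or element.hasAttributeNS(XSLT_NS, "version"))


def processing_model(mime_type, body, inserted_by_parser=True):
    """Return 'css', 'xslt', or None for a fetched resource.

    mime_type is the resource's MIME type and body its text content.
    """
    if mime_essence(mime_type) == "text/css":
        return "css"
    if inserted_by_parser and is_xml(mime_type):
        root = minidom.parseString(body).documentElement
        if is_xslt_root(root):
            return "xslt"
    return None


sheet = ('<xsl:stylesheet version="1.0" '
         'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>')
simplified = ('<html xsl:version="1.0" '
              'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>')
```

Here sheet is recognised through its root element's namespace, and simplified through its version attribute in the XSLT namespace; an XML resource with neither is not treated as XSLT.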
Thanks to Anne van Kesteren, Charles McCathieNevile, Daniel Bratell, George Chavchanidze, Ian Hickson, Jens Lindström, Maciej Stachowiak, Max Froumentin and Philip Taylor for their useful and substantial comments.