Associating Style Sheets with XML documents

Version 1.1

Work in Progress — Last Update 4 February 2009

This version:: ...
Latest version:: ...
Previous versions:: ...
Editor:: Simon Pieters, Opera Software, simonp@opera.com

Abstract

This specification defines parsing rules for XML processing instructions with pseudo-attributes and the processing of xml-stylesheet processing instructions, which allow style sheets to be associated with XML documents.

Status of this document

This document is a proposal from Opera Software, intended to replace the Associating Style Sheets with XML documents Version 1.0 specification. [ASSOCIATING] Should this draft be accepted by the W3C, Opera Software will offer Copyright and any other Intellectual Property Rights in accordance with standard practice under the W3C's Royalty-Free Patent Policy.

1 Introduction
2 Common infrastructure
  2.1 Terminology
  2.2 Conformance requirements
    2.2.1 Dependencies
3 Writing processing instructions with pseudo-attributes
  3.1 Conformance constraints
4 Parsing processing instructions with pseudo-attributes
  4.1 Tokenizing entities
5 The xml-stylesheet processing instruction
References
Acknowledgments

1 Introduction

1.1 Examples

In the following example, a CSS style sheet is associated with an SVG document:

<?xml-stylesheet type="text/css" href="style.css"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
 ...
</svg>

In the following example, a transformation expressed in XSLT is associated with an XML document:

<?xml-stylesheet type="application/xml" href="transform.xml"?>
<doc>
 <item> ... </item>
 ...
</doc>

1.2 Motivation

The creation of this specification was motivated by the following:

The existing specification was unclear at best with regard to error handling, and required draconian error handling at worst.
The existing specification did not address dynamic changes to the DOM.
The existing specification did not match contemporary implementations.
At the time there were two additional specifications reusing the parsing rules for xml-stylesheet processing instructions. [XBL2] [CORS]

1.3 Goals and constraints

Define reusable parsing rules for processing instructions with pseudo-attributes that are compatible with deployed content and implementations.
Define processing rules for xml-stylesheet processing instructions in terms of the DOM, taking DOM changes into account.
Error handling should be defined and should not be draconian.

1.4 Issues

The parsing rules are a result of reverse engineering browsers. They could probably be simplified considerably while still supporting existing content.
The charset pseudo-attribute is intended to be in sync with the charset attribute for style sheet links in HTML5. [HTML5]
This draft tries not to step on the toes of other specifications (in particular HTTP and XSLT), but browsers largely ignore various requirements in those specifications: for example, they ignore Content-Type metadata and do not support multiple xml-stylesheet processing instructions for XSLT.

2 Common infrastructure

2.1 Terminology

The term root element means the first child of a Document node that is an Element node.

2.2 Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.

2.2.1 Dependencies

This specification relies on the following specifications:

DOM: This specification is defined in terms of the DOM. For the purposes of the requirements in this specification, implementations must act as if they supported some version of DOM Core. [DOM3CORE]
XML: Implementations that support XSLT must support some version of XML with namespaces. [XML] [XMLNS]
Media Queries: Implementations that support CSS must support some version of the Media Queries language. [MQ]

3 Writing processing instructions with pseudo-attributes

Processing instructions that are said to follow the rules for parsing processing instructions with pseudo-attributes must have the data DOM attribute match the PIData production below.

[1]	PIData	::=	(S* PseudoAtt (S PseudoAtt)*)? S*
[2]	S	::=	(#x9 | #xA | #xD | #x20)+
[3]	PseudoAtt	::=	Name Eq (SingleQuoted | DoubleQuoted)	[CC: Unique PseudoAtt Spec]
[4]	Name	::=	(Char - ('=' | S))+
[5]	Eq	::=	S* '=' S*
[6]	SingleQuoted	::=	"'" (AttContent - "'")* "'"
[7]	DoubleQuoted	::=	'"' (AttContent - '"')* '"'
[8]	AttContent	::=	Char - ('<' | '&') | EntityRef | CharRef
[9]	EntityRef	::=	'&amp;' | '&lt;' | '&gt;' | '&quot;' | '&apos;'
[10]	CharRef	::=	'&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'	[CC: Legal Character]
[11]	Char	::=	[#x0-#x10FFFF]	/* Any Unicode code point */

3.1 Conformance constraints

Unique PseudoAtt Spec: A pseudo-attribute name must not appear more than once.
Legal Character: Characters referred to using character references must not refer to U+0000 or U+D800..U+DFFF.

These constraints are not well-formedness constraints.
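
As an illustration, the grammar and the two conformance constraints can be approximated with a small checker. This is a sketch only, not part of the specification. It reads production [1] as permitting any number of whitespace-separated pseudo-attributes with optional surrounding whitespace, and EntityRef as the five predefined XML entity references. The names `conforms` and `legal_char_ref` are hypothetical, and the upper bound of 0x10FFFF on character references is borrowed from the Char production rather than from the Legal Character constraint itself.

```python
import re

WS = "\t\n\r "
# Assumed reading of EntityRef and CharRef (productions [9] and [10]).
REF = r"&(?:amp|lt|gt|quot|apos);|&#[0-9]+;|&#x[0-9a-fA-F]+;"
SQ = rf"'(?:[^<&']|{REF})*'"      # SingleQuoted
DQ = rf'"(?:[^<&"]|{REF})*"'      # DoubleQuoted
# PseudoAtt ::= Name Eq (SingleQuoted | DoubleQuoted)
PSEUDO_ATT = re.compile(
    rf"(?P<name>[^=\t\n\r ]+)[\t\n\r ]*=[\t\n\r ]*(?P<value>{SQ}|{DQ})"
)
CHAR_REF = re.compile(r"&#(x?)([0-9a-fA-F]+);")


def legal_char_ref(hex_flag: str, digits: str) -> bool:
    """Legal Character: not U+0000, not a surrogate, and within Char's range."""
    digits = digits.lstrip("0")
    if len(digits) > 7:  # more than 7 significant digits always exceeds 0x10FFFF
        return False
    number = int(digits or "0", 16 if hex_flag else 10)
    return number != 0 and number <= 0x10FFFF and not 0xD800 <= number <= 0xDFFF


def conforms(data: str) -> bool:
    """Return True if data matches PIData and satisfies both constraints."""
    pos = len(data) - len(data.lstrip(WS))  # optional leading whitespace
    seen = set()
    while pos < len(data):
        m = PSEUDO_ATT.match(data, pos)
        if m is None or m["name"] in seen:  # Unique PseudoAtt Spec
            return False
        seen.add(m["name"])
        refs = CHAR_REF.findall(m["value"])
        if not all(legal_char_ref(x, d) for x, d in refs):
            return False
        rest = data[m.end():]
        stripped = rest.lstrip(WS)
        if rest and len(stripped) == len(rest):
            return False  # the next pseudo-attribute must be preceded by S
        pos = len(data) - len(stripped)
    return True
```

Under these assumptions, `conforms('type="text/css" href="style.css"')` holds, while an unquoted value such as `href=style.css` or a repeated pseudo-attribute name does not conform.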

4 Parsing processing instructions with pseudo-attributes

The rules for parsing processing instructions with pseudo-attributes are defined in this section. The user agent must follow these rules whenever the processing instruction's data DOM attribute changes, and whenever the processing instruction is inserted into the DOM or moved within the DOM (which might be a result of changing nearby nodes).

When the user agent hits a parse error, it must act as described below, and may also inform the user that there was an error (e.g. in the error console).

When the user agent is to stop parsing, it must stop the state machine so that pseudo-attributes can be processed. The pseudo-attributes list is the output of the algorithm.

A pseudo-attribute has a name and a value. When a new pseudo-attribute is created, its name and value must be set to the empty string.

A pseudo-attribute can be marked as being in error. This will result in the pseudo-attribute being ignored when it has been completely parsed.

The next input character is the first character in the processing instruction's data DOM attribute that has not yet been consumed. Initially the next input character is the first character in the attribute.

"EOF" is a conceptual character representing the end of the processing instruction's data DOM attribute.

Let pseudo-attributes be the empty array. Start in the before name state.

The state machine is as follows:

Before name state

Consume the next input character:

U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000D CARRIAGE RETURN (CR)
U+0020 SPACE: Stay in the before name state.
U+003D EQUALS SIGN (=): Parse error. Stop parsing.
EOF: Stop parsing.
Anything else: Create a new pseudo-attribute and append the input character to its name. Switch to the name state.

Name state

Consume the next input character:

U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000D CARRIAGE RETURN (CR)
U+0020 SPACE: If there is a pseudo-attribute in pseudo-attributes that has the same name as this pseudo-attribute, then this is a parse error; mark the pseudo-attribute as being in error. In any case, switch to the after name state.
U+003D EQUALS SIGN (=): If there is a pseudo-attribute in pseudo-attributes that has the same name as this pseudo-attribute, then this is a parse error; mark the pseudo-attribute as being in error. In any case, switch to the before value state.
EOF: Parse error. Stop parsing.
Anything else: Append the input character to the pseudo-attribute's name. Stay in the name state.

After name state

Consume the next input character:

U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000D CARRIAGE RETURN (CR)
U+0020 SPACE: Stay in the after name state.
U+003D EQUALS SIGN (=): Switch to the before value state.
Anything else: Parse error. Stop parsing.

Before value state

Consume the next input character:

U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000D CARRIAGE RETURN (CR)
U+0020 SPACE: Stay in the before value state.
U+0022 QUOTATION MARK ("): Switch to the value (double-quoted) state.
U+0027 APOSTROPHE ('): Switch to the value (single-quoted) state.
Anything else: Parse error. Stop parsing.

Value (double-quoted) state

Consume the next input character:

U+0022 QUOTATION MARK ("): If the pseudo-attribute is not in error, then append the pseudo-attribute to pseudo-attributes. In any case, switch to the after value state.
U+0026 AMPERSAND (&): Attempt to consume an entity.
U+003C LESS-THAN SIGN (<): Parse error. Mark the pseudo-attribute as being in error.
EOF: Parse error. Stop parsing.
Anything else: Append the character to the pseudo-attribute's value.

Value (single-quoted) state

Consume the next input character:

U+0027 APOSTROPHE ('): If the pseudo-attribute is not in error, then append the pseudo-attribute to pseudo-attributes. In any case, switch to the after value state.
U+0026 AMPERSAND (&): Attempt to consume an entity.
U+003C LESS-THAN SIGN (<): Parse error. Mark the pseudo-attribute as being in error.
EOF: Parse error. Stop parsing.
Anything else: Append the character to the pseudo-attribute's value.

After value state

Consume the next input character:

U+0009 CHARACTER TABULATION
U+000A LINE FEED (LF)
U+000D CARRIAGE RETURN (CR)
U+0020 SPACE: Switch to the before name state.
EOF: Stop parsing.
Anything else: Parse error. Reconsume the current input character in the name state.
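
The state machine above can be sketched in Python as follows. This is an illustrative reading, not a normative implementation. The function name `parse_pseudo_attributes` is hypothetical, the two value states are merged into one state that remembers the opening quote, and entity handling is delegated to a caller-supplied `consume_entity(data, pos)` hook (also hypothetical) that returns the character to append or None, the new position, and whether the entity was well formed. One further assumption: when the after value state reconsumes a character in the name state, the sketch starts a fresh pseudo-attribute first, since the name state presupposes a current one.

```python
WS = "\t\n\r "


def parse_pseudo_attributes(data, consume_entity):
    """Return a list of (name, value) pairs parsed from a PI's data string."""
    attrs = []                       # the "pseudo-attributes" output
    name = value = quote = ""
    in_error = False
    state, pos = "before name", 0
    while True:
        ch = data[pos] if pos < len(data) else None   # None stands for EOF
        pos += 1
        if state == "before name":
            if ch is None or ch == "=":
                break                # EOF, or parse error on "="
            if ch not in WS:
                name, value, in_error = ch, "", False  # new pseudo-attribute
                state = "name"
        elif state == "name":
            if ch is None:
                break                # parse error
            if ch in WS or ch == "=":
                if any(n == name for n, _ in attrs):
                    in_error = True  # parse error: duplicate name
                state = "after name" if ch in WS else "before value"
            else:
                name += ch
        elif state == "after name":
            if ch == "=":
                state = "before value"
            elif ch is None or ch not in WS:
                break                # parse error
        elif state == "before value":
            if ch == '"' or ch == "'":
                quote, state = ch, "value"
            elif ch is None or ch not in WS:
                break                # parse error
        elif state == "value":
            if ch is None:
                break                # parse error
            if ch == quote:
                if not in_error:
                    attrs.append((name, value))
                state = "after value"
            elif ch == "&":
                char, pos, ok = consume_entity(data, pos)
                value += char or ""
                in_error = in_error or not ok
            elif ch == "<":
                in_error = True      # parse error
            else:
                value += ch
        elif state == "after value":
            if ch is None:
                break
            if ch in WS:
                state = "before name"
            else:
                # Parse error; reconsume in the name state.
                # Assumption: a fresh pseudo-attribute is started here.
                name, value, in_error = "", "", False
                pos -= 1
                state = "name"
    return attrs
```

Passing the entity rules in as a hook keeps the state machine readable on its own; any function with the described return shape can be supplied.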

4.1 Tokenizing entities

This section defines how to consume an entity.

The behavior depends on the identity of the next character (the one immediately after the U+0026 AMPERSAND character):

U+0023 NUMBER SIGN (#)

Consume the U+0023 NUMBER SIGN.

The behavior further depends on the character after the U+0023 NUMBER SIGN:

U+0078 LATIN SMALL LETTER X

Consume the U+0078 LATIN SMALL LETTER X.

Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A through to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A through to U+0046 LATIN CAPITAL LETTER F (in other words, 0-9, a-f, and A-F).

When it comes to interpreting the number, interpret it as a hexadecimal number.

Anything else

Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE (i.e. just 0-9).

When it comes to interpreting the number, interpret it as a decimal number.

Consume as many characters as match the range of characters given above.

If no characters match the range, then this is a parse error; mark the pseudo-attribute as being in error.

Otherwise, if the next character is a U+003B SEMICOLON, consume that too. If it isn't, there is a parse error; mark the pseudo-attribute as being in error.

If one or more characters match the range, then take them all and interpret the string of characters as a number (either hexadecimal or decimal as appropriate).

If the number is zero, if the number is higher than 0x10FFFF, or if it's one of the surrogate characters (characters in the range 0xD800 to 0xDFFF), then this is a parse error; mark the pseudo-attribute as being in error.

Otherwise, append the Unicode character whose code point is that number to the pseudo-attribute's value.

Anything else

Consume the maximum number of characters possible, with the consumed characters case-sensitively matching one of the identifiers in the first column of the following table:

Entity name	Character
`amp;`	U+0026
`apos;`	U+0027
`gt;`	U+003E
`lt;`	U+003C
`quot;`	U+0022

If no match can be made, then this is a parse error; mark the pseudo-attribute as being in error.

Otherwise, append the character corresponding to the entity name (as given by the second column of the table above) to the pseudo-attribute's value.
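
The entity rules in this section can be sketched as a single function. This is a hedged illustration rather than a reference implementation. The name `consume_entity` and its return shape (character to append or None, new position, and a flag that is False when a parse error marks the pseudo-attribute as being in error) are assumptions made for the example. `pos` is taken to point at the character immediately after the ampersand.

```python
NAMED = {"amp;": "&", "apos;": "'", "gt;": ">", "lt;": "<", "quot;": '"'}


def consume_entity(data, pos):
    """Consume an entity whose ampersand sits at data[pos - 1]."""
    if data[pos:pos + 1] == "#":
        pos += 1                                  # consume "#"
        if data[pos:pos + 1] == "x":              # lowercase x only
            pos += 1
            digits, base = "0123456789abcdefABCDEF", 16
        else:
            digits, base = "0123456789", 10
        start = pos
        while pos < len(data) and data[pos] in digits:
            pos += 1
        if pos == start:
            return None, pos, False               # no digits: parse error
        end = pos
        ok = data[end:end + 1] == ";"
        if ok:
            pos += 1                              # consume ";"
        text = data[start:end].lstrip("0")
        # More than 7 significant digits always exceeds 0x10FFFF in either
        # base, so skip the conversion for such runs.
        number = 0x110000 if len(text) > 7 else int(text or "0", base)
        if number == 0 or number > 0x10FFFF or 0xD800 <= number <= 0xDFFF:
            return None, pos, False               # parse error
        # Without ";" the character is still appended, but the
        # pseudo-attribute is marked as being in error.
        return chr(number), pos, ok
    for name, char in NAMED.items():
        if data.startswith(name, pos):            # case-sensitive match
            return char, pos + len(name), True
    return None, pos, False                       # unknown name: parse error
```

For instance, `consume_entity("#x41;", 0)` yields the character "A" with the flag set, whereas `consume_entity("#65", 0)` yields "A" with the flag cleared because the semicolon is missing.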

5 The xml-stylesheet processing instruction

A processing instruction must not have its target DOM attribute set to "xml-stylesheet" if its parent is not a Document node, or if its parent is a Document node but the processing instruction comes after the root element.

For xml-stylesheet processing instructions that are children of the Document object and are before the root element (if any), the user agent must use the rules for parsing processing instructions with pseudo-attributes to obtain the pseudo-attributes.
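
The positional rule can be illustrated with Python's `xml.dom.minidom`. This is a sketch under the assumption that a minidom Document is an acceptable stand-in for the DOM described here; the function name `stylesheet_instructions` is hypothetical.

```python
from xml.dom import Node, minidom


def stylesheet_instructions(document):
    """Return xml-stylesheet PIs that are Document children before the root element."""
    found = []
    for child in document.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            break  # the root element: anything after it is out of scope
        if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and child.target == "xml-stylesheet"):
            found.append(child)
    return found


doc = minidom.parseString(
    '<?xml-stylesheet href="a.css"?>'
    '<doc><?xml-stylesheet href="b.css"?></doc>'
    '<?xml-stylesheet href="c.css"?>'
)
# Only the instruction that precedes the root element is selected.
print([pi.data for pi in stylesheet_instructions(doc)])
```

The `data` of each selected node is the string that the parsing rules for pseudo-attributes would then be applied to.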

The type pseudo-attribute represents a hint about the resource's MIME type, and must consist of a valid MIME type, optionally with parameters. The user agent may opt to abort processing the processing instruction if the MIME type given in the type pseudo-attribute is known to be unsupported. For the purposes of this pseudo-attribute, text/xsl must be assumed to be an XML MIME type.

The href pseudo-attribute gives the address of the resource. The pseudo-attribute must be present and must consist of an IRI reference. If the pseudo-attribute is present, the user agent should begin to download the resource (subject to user agent specific downloading policies, e.g. security). [IRI]

The title pseudo-attribute defines alternative style sheet sets. [CSSOM]

The alternate pseudo-attribute must have either the literal value "yes" or the literal value "no". If the value is "yes", then the referenced resource is an alternative style sheet. [CSSOM]

The media pseudo-attribute says which media the referenced resource applies to. The value must be a valid media query. [MQ] The user agent must apply the styles to views only while their state matches the listed media. [DOM2VIEWS]

The previous version of this specification had a charset pseudo-attribute, which has been dropped in this version. [ASSOCIATING]

If the MIME type of the resource given in the href pseudo-attribute is text/css (ignoring parameters), then the resource must be processed according to the rules in CSS. [CSS21]

If the processing instruction being processed was inserted in the DOM by the XML parser, and the resource given in the href pseudo-attribute is XML (text/xml, application/xml, or any MIME type that ends with +xml (ignoring parameters)), and the root element (or, if the fragment identifier is present, the first (pre-order, depth-first) element that has that ID) is in the namespace http://www.w3.org/1999/XSL/Transform or has a version attribute in that namespace, then that document (or element) must be processed according to the rules in XSLT. [XSLT] In this case, the title, alternate and media pseudo-attributes do not apply.
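
The choice between CSS and XSLT processing described in the last two paragraphs can be sketched as below. This is a simplified illustration: it ignores the fragment identifier case and the condition that the processing instruction was inserted by the XML parser. The names `essence`, `is_xml_type` and `processing_model` are hypothetical, and lowercasing the MIME type before comparison is an assumption based on MIME types being case-insensitive.

```python
from typing import Optional
from xml.dom import minidom

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def essence(mime_type: str) -> str:
    """Drop any parameters and normalise case (normalisation is an assumption)."""
    return mime_type.split(";", 1)[0].strip().lower()


def is_xml_type(mime_type: str) -> bool:
    """text/xml, application/xml, or any type ending in +xml."""
    e = essence(mime_type)
    return e in ("text/xml", "application/xml") or e.endswith("+xml")


def processing_model(mime_type: str, element) -> Optional[str]:
    """Return "css", "xslt", or None for a fetched resource.

    element is the resource's root element, or None if it is not XML.
    """
    if essence(mime_type) == "text/css":
        return "css"
    if is_xml_type(mime_type) and element is not None and (
        element.namespaceURI == XSLT_NS
        or element.hasAttributeNS(XSLT_NS, "version")
    ):
        return "xslt"
    return None


sheet = minidom.parseString(
    f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSLT_NS}"/>'
).documentElement
print(processing_model("application/xml; charset=utf-8", sheet))
```

The `hasAttributeNS` branch covers a resource whose root element is outside the XSLT namespace but carries a version attribute in that namespace.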

References

[ASSOCIATING]: (Non-normative) http://www.w3.org/1999/06/REC-xml-stylesheet-19990629
[XBL2]: (Non-normative) http://www.w3.org/TR/2007/CR-xbl-20070316/
[CORS]: (Non-normative) http://www.w3.org/TR/2008/WD-access-control-20080912/
[HTML5]: (Non-normative) http://www.w3.org/TR/2008/WD-html5-20080610/
[RFC2119]: http://www.ietf.org/rfc/rfc2119.txt
[DOM3CORE]: http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
[XML]: http://www.w3.org/TR/2008/REC-xml-20081126/
[XMLNS]: http://www.w3.org/TR/2006/REC-xml-names-20060816/
[MQ]: http://www.w3.org/TR/2008/WD-css3-mediaqueries-20081015/
[IRI]: http://www.ietf.org/rfc/rfc3987.txt
[CSSOM]: http://dev.w3.org/csswg/cssom/
[DOM2VIEWS]: http://www.w3.org/TR/2000/REC-DOM-Level-2-Views-20001113/
[CSS21]: http://www.w3.org/TR/2007/CR-CSS21-20070719/
[XSLT]: http://www.w3.org/TR/1999/REC-xslt-19991116

Acknowledgments

Thanks to Anne van Kesteren, Charles McCathieNevile, Daniel Bratell, George Chavchanidze, Ian Hickson, Jens Lindström, Maciej Stachowiak, Max Froumentin and Philip Taylor for their useful and substantial comments.
