Associating Style Sheets with XML documents

Version 1.1

Work in Progress — Last Update [DATE: 01 Jan 1901]

This version: ...
Latest version: ...
Previous versions: ...
Editor: Simon Pieters, Opera Software, simonp@opera.com

Abstract

This specification defines parsing rules for XML processing instructions with pseudo-attributes and the processing of xml-stylesheet processing instructions, which allow style sheets to be associated with XML documents.

Status of this document

This document is a proposal from Opera Software, intended to replace the Associating Style Sheets with XML documents Version 1.0 specification. [ASSOCIATING] Should this draft be accepted by the W3C, Opera Software will offer Copyright and any other Intellectual Property Rights in accordance with standard practice under the W3C's Royalty-Free Patent Policy.

Introduction

Examples

In the following example, a CSS style sheet is associated with an SVG document:

<?xml-stylesheet type="text/css" href="style.css"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
 ...
</svg>

In the following example, a transformation expressed in XSLT is associated with an XML document:

<?xml-stylesheet type="application/xml" href="transform.xml"?>
<doc>
 <item> ... </item>
 ...
</doc>
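As an illustration of how such a processing instruction appears in the DOM, the following minimal sketch parses the first example and prints the target and data of each processing instruction that is a child of the Document node. Python's xml.dom.minidom is used here only as a convenient DOM implementation; it is an assumption of this sketch, not something this specification requires.

```python
from xml.dom import minidom

# The first example above, with the elided content removed.
SOURCE = (
    '<?xml-stylesheet type="text/css" href="style.css"?>\n'
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"/>'
)

doc = minidom.parseString(SOURCE)

# Collect the processing instructions that are children of the Document node.
pis = [
    node for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

for pi in pis:
    # "data" is the DOM attribute that the parsing rules operate on.
    print(pi.target, "->", pi.data)
```

The data string, not the original markup, is what the pseudo-attribute parsing rules consume.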

Motivation

This specification was motivated by the following:

The existing specification was unclear at best with regard to error handling, and required draconian error handling at worst.
The existing specification did not address dynamic changes to the DOM.
The existing specification did not match contemporary implementations.
At the time there were two additional specifications reusing the parsing rules for xml-stylesheet processing instructions. [XBL2] [CORS]

Goals and constraints

Define reusable parsing rules for processing instructions with pseudo-attributes that are compatible with deployed content and implementations.
Define processing rules for xml-stylesheet processing instructions in terms of the DOM, taking DOM changes into account.
Error handling should be defined and should not be draconian.

Issues

The parsing rules are a result of reverse engineering browsers. They could probably be simplified considerably while still supporting existing content.
The charset pseudo-attribute is intended to be in sync with the charset attribute for style sheet links in HTML5. [HTML5]
This draft tries not to step on the toes of other specifications (in particular HTTP and XSLT), but browsers largely ignore various requirements in those specifications: for example, they ignore Content-Type metadata and do not support multiple xml-stylesheet processing instructions for XSLT.

Common infrastructure

Terminology

The term root element means the first child of a Document node that is an Element node.
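The definition can be sketched as a short helper. This is an illustration only: xml.dom.minidom and the function name root_element are assumptions of the sketch, not part of this specification.

```python
from xml.dom import minidom


def root_element(document):
    """Return the first child of the Document node that is an Element, or None."""
    for child in document.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            return child
    return None


# A processing instruction and a comment precede the element; both are skipped.
doc = minidom.parseString(
    '<?xml-stylesheet href="a.css"?><!-- note --><doc><item/></doc>'
)
print(root_element(doc).tagName)
```

Processing instructions and comments that precede the element are children of the Document node too, which is why the helper filters by node type instead of taking the first child.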

Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Dependencies

This specification relies on the following specifications:

DOM: This specification is defined in terms of the DOM. For the purposes of the requirements in this specification, implementations must act as if they supported some version of DOM Core. [DOM3CORE]
XML: Implementations that support XSLT must support some version of XML with namespaces. [XML] [XMLNS]
Media Queries: Implementations that support CSS must support some version of the Media Queries language. [MQ]

Writing processing instructions with pseudo-attributes

Processing instructions that are said to follow the rules for parsing processing instructions with pseudo-attributes must have the data DOM attribute match the PIData production below.

[1]	PIData	::=	(S* PseudoAtt (S PseudoAtt)*)? S*
[2]	S	::=	(#x9 | #xA | #xD | #x20)+
[3]	PseudoAtt	::=	Name Eq (SingleQuoted | DoubleQuoted)	[CC: Unique PseudoAtt Spec]
[4]	Name	::=	(Char - ('=' | S))+
[5]	Eq	::=	S* '=' S*
[6]	SingleQuoted	::=	"'" (AttContent - "'")* "'"
[7]	DoubleQuoted	::=	'"' (AttContent - '"')* '"'
[8]	AttContent	::=	Char - ('<' | '&') | EntityRef | CharRef
[9]	EntityRef	::=	'&amp;' | '&lt;' | '&gt;' | '&quot;' | '&apos;'
[10]	CharRef	::=	'&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'	[CC: Legal Character]
[11]	Char	::=	[#x0-#x10FFFF]	/* Any Unicode code point */
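The productions can be approximated with a regular expression, which may help authors check their data strings. The sketch below is non-normative and rests on two stated assumptions: that PIData allows any number of whitespace-separated pseudo-attributes with optional leading and trailing whitespace, and that EntityRef denotes the five predefined XML entity references (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;). The name matches_pi_data is hypothetical. The conformance constraints marked [CC: ...] are not checked here.

```python
import re

WS = r"[\t\n\r ]"                        # one whitespace character allowed by S
NAME = r"[^=\t\n\r ]+"                   # Name: any characters except '=' and whitespace
EQ = rf"{WS}*={WS}*"                     # Eq
# Assumed EntityRef (five predefined entities) plus CharRef.
REF = r"&(?:amp|lt|gt|quot|apos);|&#[0-9]+;|&#x[0-9a-fA-F]+;"
SINGLE = rf"'(?:[^<&']|{REF})*'"         # SingleQuoted
DOUBLE = rf'"(?:[^<&"]|{REF})*"'         # DoubleQuoted
PSEUDO_ATT = rf"{NAME}{EQ}(?:{SINGLE}|{DOUBLE})"

# Assumed shape of PIData: (S* PseudoAtt (S PseudoAtt)*)? S*
PI_DATA = re.compile(rf"(?:{WS}*{PSEUDO_ATT}(?:{WS}+{PSEUDO_ATT})*)?{WS}*")


def matches_pi_data(data):
    """Return True if the whole data string matches the assumed PIData grammar."""
    return PI_DATA.fullmatch(data) is not None


print(matches_pi_data('type="text/css" href="style.css"'))
print(matches_pi_data('href=style.css'))
```

A string that fails this check is not necessarily ignored by user agents: the parsing rules define error handling separately from the authoring grammar.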

Conformance constraints

Unique PseudoAtt Spec: A pseudo-attribute name must not appear more than once.
Legal Character: Characters referred to using character references must not refer to U+0000 or U+D800..U+DFFF.

These constraints are not well-formedness constraints.
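The two constraints can be checked independently of the grammar. The following non-normative sketch assumes that pseudo-attribute names are compared as exact strings and that only the code points listed in the Legal Character constraint are rejected; the function names are hypothetical.

```python
import re

# Matches decimal and hexadecimal character references.
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def illegal_char_refs(text):
    """Return the character references in text that violate Legal Character."""
    bad = []
    for match in CHAR_REF.finditer(text):
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
        else:
            code_point = int(dec_digits)
        # U+0000 and the surrogate range U+D800..U+DFFF are disallowed.
        if code_point == 0 or 0xD800 <= code_point <= 0xDFFF:
            bad.append(match.group(0))
    return bad


def duplicate_names(names):
    """Return names that appear more than once, in order of first repetition."""
    seen, dupes = set(), []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


print(illegal_char_refs("a&#0;b&#xD800;c&#65;"))
print(duplicate_names(["href", "type", "href"]))
```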

Parsing processing instructions with pseudo-attributes

The rules for parsing processing instructions with pseudo-attributes are defined in this section. The user agent must follow these rules whenever the processing instruction's data DOM attribute changes, and whenever the processing instruction is inserted into the DOM or moved in the DOM (which might be a result of changing nearby nodes).

When the user agent hits a parse error, it must act as described below, and may also inform the user that there was an error (e.g. in the error console).

When the user agent is to stop parsing, it must stop the state machine so that pseudo-attributes can be processed. pseudo-attributes is the output of the algorithm.

A pseudo-attribute has a name and a value. When a new pseudo-attribute is created, its name and value must be set to the empty string.

A pseudo-attribute can be marked as being in error. This will result in the pseudo-attribute being ignored when it has been completely parsed.

The next input character is the first character in the processing instruction's data DOM attribute that has not yet been consumed. Initially the next input character is the first character in the attribute.

"EOF" is a conceptual character representing the end of the processing instruction's data DOM attribute.

Let pseudo-attributes be the empty array. Start in the before name state.
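The definitions above can be sketched as data structures. This non-normative sketch covers only the setup: the pseudo-attribute record, the "next input character" lookup, the conceptual EOF marker, and the initial values. The class names, the consume method, and the in_error field are hypothetical names chosen for illustration, and the state transitions themselves are not shown.

```python
from dataclasses import dataclass

# Conceptual end-of-data marker; deliberately not a real character.
EOF = object()


@dataclass
class PseudoAttribute:
    name: str = ""          # set to the empty string on creation
    value: str = ""         # set to the empty string on creation
    in_error: bool = False  # when True, the attribute is ignored once fully parsed


class InputStream:
    """Tracks which characters of the data string have been consumed."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def next_input_character(self):
        """Return the first character not yet consumed, or EOF at the end."""
        if self._pos < len(self._data):
            return self._data[self._pos]
        return EOF

    def consume(self):
        """Consume and return the next input character; EOF is returned repeatedly."""
        ch = self.next_input_character()
        if ch is not EOF:
            self._pos += 1
        return ch


# Initial conditions before the state machine runs.
stream = InputStream('href="style.css"')
pseudo_attributes = []
state = "before name"
print(stream.next_input_character(), state)
```

Using a distinct sentinel object for EOF keeps it from being confused with any character that can occur in the data.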

The state machine is as follows:

Before name state