Understanding HTML and MIME

I have dropped the differentiation of HTML into a sequence of conformance levels. Many people confused levels with versions. The different levels also invite interoperability problems! Let's encourage full conformance with HTML 2.0 or HTML 3.0 rather than perpetuating intermediate levels of support.

HTML as an Internet Media Type

This specification, together with any upward-compatible specifications, defines the Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) called "text/html". The type "text/html" accepts the following parameters:

Version
To help avoid future compatibility problems, the version parameter may be used to give the version number of the specification to which the document conforms. The version number appears at the front of this document and within the public identifier for the SGML DTD. This specification defines version 3.0.
Character sets
The charset parameter (as defined in section 7.1.1 of RFC 1521) may be used with the text/html content type to specify the encoding used to represent the HTML document as a sequence of bytes. Normally, text/* media types specify a default of US-ASCII for the charset parameter. However, for text/html, if the byte stream contains data that is not in the 7-bit US-ASCII set, the HTML interpreting agent should assume a default charset of ISO-8859-1.
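
The defaulting rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name `effective_charset` is hypothetical, and returning "us-ascii" for a pure 7-bit body with no charset parameter is an assumption drawn from the general text/* default mentioned above.

```python
from email.message import Message


def effective_charset(content_type: str, body: bytes) -> str:
    """Pick the charset an agent might assume for a text/html body.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    the body is treated as US-ASCII unless it contains bytes outside
    the 7-bit range, in which case ISO-8859-1 is assumed.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_param("charset")  # None when the parameter is absent
    if declared:
        return str(declared).lower()
    if any(byte > 0x7F for byte in body):
        return "iso-8859-1"
    return "us-ascii"


if __name__ == "__main__":
    # The version parameter is carried in the same header as charset.
    print(effective_charset("text/html; version=3.0", b"caf\xe9"))
    print(effective_charset("text/html; version=3.0", b"cafe"))
```

The sketch leaves the version parameter uninterpreted; it only shows that an explicitly labelled document never triggers the ISO-8859-1 fallback.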

When an HTML document is encoded using US-ASCII, the mechanisms of numeric character references and character entity references may be used to encode additional characters from ISO-8859-1. Character entity references are needed for symbols, such as mathematical symbols and Greek letters, that come from other unspecified character sets.
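
As a rough illustration of the two mechanisms, the following Python sketch rewrites ISO-8859-1 text so that it can be carried in a US-ASCII document. The function name `to_ascii_html` and its `use_names` flag are hypothetical, and the entity names come from Python's `html.entities` table rather than from this specification's DTD, so treat the name mapping as an assumption.

```python
from html.entities import codepoint2name


def to_ascii_html(text: str, use_names: bool = False) -> str:
    """Replace ISO-8859-1 characters above 127 with character references.

    Hypothetical helper: emits numeric references (&#233;) by default,
    or character entity references (&eacute;) when use_names is True.
    Markup characters such as '<' and '&' are passed through unchanged.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)  # already representable in US-ASCII
        elif code <= 255:
            name = codepoint2name.get(code) if use_names else None
            pieces.append(f"&{name};" if name else f"&#{code};")
        else:
            # Outside ISO-8859-1: numeric references do not cover it here.
            raise ValueError(f"character U+{code:04X} is outside ISO-8859-1")
    return "".join(pieces)


if __name__ == "__main__":
    print(to_ascii_html("caf\u00e9"))
    print(to_ascii_html("caf\u00e9", use_names=True))
```

Characters outside ISO-8859-1 are rejected rather than guessed at, which mirrors the point that such symbols need named entity references defined elsewhere.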

Other values for the charset parameter are not defined in this specification, but may be specified in future versions of HTML. It is envisioned that HTML will use the charset parameter to support characters from non-Latin scripts such as Arabic, Hebrew, Cyrillic and Japanese, rather than relying on any SGML mechanism for doing so.

What about Unicode and its assorted encodings? This section would benefit from an explanation of the issues underlying support for multiple character sets and the problems arising from bidirectionality.