Parsing XHTML documents with .NET 4.0 and XmlPreloadedResolver

When I looked at "What's new in System.Xml in .NET 4.0/Visual Studio 2010" with the beta 1 release, I presented an example showing how parsing an XHTML document that references one of the W3C XHTML 1.0 DTDs can be sped up by setting the new XmlReaderSettings.DtdProcessing property to DtdProcessing.Ignore. The drawback I mentioned is that any entity reference in the document then causes an exception.

What I overlooked at the time of the beta 1 release, but have now found in the recent beta 2 release, is the new class XmlPreloadedResolver in System.Xml.Resolvers. It lets you avoid any network access to the W3C's server for the XHTML DTDs and nevertheless parse any XHTML document containing entity references, as it uses copies of those DTDs stored in an assembly deployed with the .NET framework.

If I use that class with an adaptation of the older example, the code looks as follows:

// Needs: using System; using System.Diagnostics; using System.IO;
//        using System.Xml; using System.Xml.Resolvers;
Stopwatch watch = new Stopwatch();
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;

string xhtml = @"<!DOCTYPE html
PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
<html xml:lang=""en"">
<head>
<title>Example</title>
</head>
<body>
<p>Price is: 100 &euro;</p>
</body>
</html>";

// First parse: default resolver, the DTD is fetched from the W3C server.
watch.Start();
using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Text)
        {
            Console.WriteLine(reader.Value);
        }
    }
}
watch.Stop();
Console.WriteLine("First parse: elapsed time: {0}", watch.Elapsed);

watch.Reset();

// Second parse: the preloaded resolver supplies the XHTML 1.0 DTDs locally.
settings.XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

watch.Start();
using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Text)
        {
            Console.WriteLine(reader.Value);
        }
    }
}
watch.Stop();
Console.WriteLine("Second parse: elapsed time: {0}", watch.Elapsed);

Running that code here with Visual Studio 2010 Beta 2 in a virtual machine outputs numbers clearly showing the speed gained by parsing with the XmlPreloadedResolver:

First parse: elapsed time: 00:00:04.6378648
Second parse: elapsed time: 00:00:00.0441933

Posted by Martin Honnen

Exploiting covariance with LINQ to XML

In my last post I showed how the new contravariance feature in .NET 4.0/Visual Studio 2010 for type parameters of generic interfaces makes coding with LINQ to XML easier and more straightforward. In this post I will show how the covariance of the type parameter T of IEnumerable<T> also allows us to write LINQ to XML queries in a more straightforward way.

Let's assume we have the following XML document:

<?xml version="1.0" encoding="utf-8" ?>
<root>
<!-- comment 1 -->
<foo>foo 1</foo>
<bar>bar 1</bar>
<!-- comment 2 -->
<foo>foo 2</foo>
<bar>2</bar>
<!-- comment 3
-->
</root>

and we want to transform that document into a second one with the same root element and the same child nodes, except that all 'bar' child elements of the 'root' element are removed. So the result should look as follows:

<?xml version="1.0" encoding="utf-8" ?>
<root>
<!-- comment 1 -->
<foo>foo 1</foo>
<!-- comment 2 -->
<foo>foo 2</foo>
<!-- comment 3
-->
</root>

A first attempt to achieve that with LINQ to XML could look as follows:

XDocument doc1 = XDocument.Load(@"XMLFile1.xml");

XDocument doc2 =
    new XDocument(
        new XElement(doc1.Root.Name,
            doc1
                .Root
                .Nodes()
                .Except(
                    doc1
                        .Root
                        .Elements("bar"))));

doc2.Save(Console.Out);

So we create a new XDocument with a new root XElement that has the same name as the Root of the first XDocument and into which all child nodes except the 'bar' child elements are copied.

That looks nice and straightforward, but if you try to compile it with Visual Studio 2008/.NET 3.5 you get the following error: "Argument '2': cannot convert from 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XElement>' to 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XNode>'".

The problem is that the Nodes() call returns an IEnumerable<XNode> and then the following Except() call also needs an IEnumerable<XNode> as its argument while Elements("bar") gives us an IEnumerable<XElement>. With generic interfaces being invariant in .NET 3.5 we can't pass that IEnumerable<XElement> in for an IEnumerable<XNode>, although XElement is a class derived from XNode.

As a workaround we can first cast the IEnumerable<XElement> to an IEnumerable<XNode>:

XDocument doc1 = XDocument.Load(@"XMLFile1.xml");

XDocument doc2 =
    new XDocument(
        new XElement(doc1.Root.Name,
            doc1
                .Root
                .Nodes()
                .Except(
                    doc1
                        .Root
                        .Elements("bar")
                        .Cast<XNode>())));

doc2.Save(Console.Out);

That way it compiles fine and produces the wanted result with .NET 3.5, but it would be nicer not to need that Cast<XNode>() call.

The good news is that starting with .NET 4.0 the type parameter T of IEnumerable<T> is covariant, meaning that where an IEnumerable<T> of a certain type T is expected we can always pass in an IEnumerable<T2> where T2 is a type derived from T, as in our example where XElement is a subclass of XNode (or a sub-subclass, to be precise).

Thus with .NET 4.0 the following compiles and works fine:

XDocument doc1 = XDocument.Load(@"XMLFile1.xml");

XDocument doc2 =
    new XDocument(
        new XElement(doc1.Root.Name,
            doc1
                .Root
                .Nodes()
                .Except(
                    doc1
                        .Root
                        .Elements("bar"))));

doc2.Save(Console.Out);

Exploiting contravariance with LINQ to XML

Covariance and contravariance for generic interfaces are new features of C# and VB.NET in Visual Studio 2010 and the .NET framework 4.0. Generic interfaces like IEnumerable<T> or IEqualityComparer<T> in the .NET framework 4.0 use these new features. Starting with .NET 4.0 the type parameter T in IEqualityComparer<T> is contravariant. That can make coding with LINQ to XML easier, as the class XNodeEqualityComparer implements IEqualityComparer<XNode>, where XNode is a common base class for other LINQ to XML classes like XElement.

Let's look at an example. Assume we have the following XML document:

<?xml version="1.0" encoding="utf-8" ?>
<root>
<items>
<item>
<foo>a</foo>
<bar>1</bar>
</item>
<item>
<foo>b</foo>
<bar>2</bar>
</item>
<item>
<foo>a</foo>
<bar>1</bar>
</item>
<item>
<foo>c</foo>
<bar>3</bar>
</item>
<item>
<foo>c</foo>
<bar>3</bar>
</item>
</items>
</root>

and we want to use LINQ to XML to extract distinct items where we use XNodeEqualityComparer to compare the 'item' elements in the XML document.

You could be tempted to try it as follows:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
        .Root
        .Element("items")
        .Elements("item")
        .Distinct(new XNodeEqualityComparer())
        .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}

but with .NET 3.5 that does not compile, complaining "Instance argument: cannot convert from 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XElement>' to 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XNode>'" on the Distinct(new XNodeEqualityComparer()) call. That happens because Elements("item") gives us an IEnumerable<XElement> and subsequently the Distinct method wants an IEqualityComparer<XElement> to be passed in while we only pass in an IEqualityComparer<XNode>.

With .NET 3.5 to work around that problem we first have to cast IEnumerable<XElement> up to IEnumerable<XNode> before we call Distinct(new XNodeEqualityComparer()) and then down again after the Distinct() call:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
        .Root
        .Element("items")
        .Elements("item")
        .Cast<XNode>()
        .Distinct(new XNodeEqualityComparer())
        .Cast<XElement>()
        .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}

That compiles fine and nicely returns only distinct items:

{ foo = a, bar = 1 }
{ foo = b, bar = 2 }
{ foo = c, bar = 3 }

With .NET 4.0, however, the type parameter T of IEqualityComparer<T> is contravariant, meaning that where a method expects an IEqualityComparer<XElement> it suffices to pass in a comparer for a base type of XElement, such as an IEqualityComparer<XNode>. Thus with .NET 4.0 our original attempt compiles and runs fine:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
        .Root
        .Element("items")
        .Elements("item")
        .Distinct(new XNodeEqualityComparer())
        .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}

What is new in System.Xml in .NET 4.0/Visual Studio 2010

Beta 1 of the .NET framework 4.0 and of Visual Studio 2010 was released a few days ago. Although the "What's new" document does not list any new features in System.Xml or LINQ to XML, I am browsing through the documentation to find new features or changes in APIs.

So far I have found the following:

With LINQ to XML the SaveOptions enumeration has a new flag named OmitDuplicateNamespaces. That is particularly useful with VB.NET XML literals, as using them you might end up with more namespace declaration attributes than you want, resulting in superfluous namespace declarations on child or descendant elements when you save/serialize a LINQ to XML XDocument or XElement.

Here is an example in VB.NET with .NET 3.5:

Imports System
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1

    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>

        html.<body>(0).Add(GetParagraphs())

        html.Save(Console.Out)

    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function

End Module

Its output is as follows:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<p xmlns="http://www.w3.org/1999/xhtml">Paragraph 1.</p>
<p xmlns="http://www.w3.org/1999/xhtml">Paragraph 2.</p>
</body>
</html>

As you can see, the namespace declarations on the 'p' elements are redundant as the namespace is already defined on the 'html' root element.

With .NET 4.0 and the SaveOptions.OmitDuplicateNamespaces flag you can avoid them as follows:

Imports System
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1

    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>

        html.<body>(0).Add(GetParagraphs())

        html.Save(Console.Out, SaveOptions.OmitDuplicateNamespaces)

    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function

End Module

Now the output is fine:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<p>Paragraph 1.</p>
<p>Paragraph 2.</p>
</body>
</html>

Have you ever wondered why an XDocument or XElement in .NET 3.5 could be saved to a TextWriter or a file or an XmlWriter but not directly to a Stream? In .NET 3.5 you need to construct an XmlWriter or TextWriter over a Stream but now in .NET 4.0 you can save directly to a Stream: XDocument.Save(Stream), XElement.Save(Stream). No functionality gain but a convenient addition, for instance when you want to send the serialization of an XDocument or XElement to the request stream of an HttpWebRequest. There are also corresponding Load methods taking a Stream as the input, XDocument.Load(Stream), XElement.Load(Stream).

There is also a new enumeration ReaderOptions in System.Xml.Linq but so far I have not found any method or property using that enumeration.

XmlReaderSettings has a new property DtdProcessing that replaces the now obsolete ProhibitDtd property. With the boolean property ProhibitDtd you could choose to either allow DTD parsing/processing or to prohibit it. With the new DtdProcessing property you now have three choices: prohibit, parse, or ignore. Ignore can give you performance benefits over parse. For instance, the following parses the W3C home page twice, once ignoring the DTD and once parsing/processing it:

Stopwatch watch = new Stopwatch();
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Ignore: Elapsed time: {0}", watch.Elapsed);

settings.DtdProcessing = DtdProcessing.Parse;
// Reset so that the second measurement does not include the first run.
watch.Reset();
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Parse: Elapsed time: {0}", watch.Elapsed);

The output for me here is

DtdProcessing.Ignore: Elapsed time: 00:00:01.5245222
DtdProcessing.Parse: Elapsed time: 00:00:09.1677892

so going by those numbers ignoring the DTD is about six times faster for that sample document. On the other hand, if the DTD defines any entities that are then referenced in the XML document, ignoring the DTD gives you an exception:

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;

string xhtml = @"<!DOCTYPE html
PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
<html xml:lang=""en"">
<head>
<title>Example</title>
</head>
<body>
<p>Price is: 100 &euro;</p>
</body>
</html>";

using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
{
    while (reader.Read()) { }
}

That code throws the exception "Reference to undeclared entity 'euro'".

That's all I have found so far; I will edit this post when I find more.

[edit 2009-05-26] I have now found a page "What's new in System.Xml" in the .NET framework 4 Beta 1 documentation. Oddly enough it lists LINQ to XML and the XSLT compiler as new features although both were introduced in .NET 3.5. It also mentions new methods in the XmlConvert class.

Creating XML with namespaces with JavaScript and MSXML

In my previous post I showed how to use the W3C DOM API to create XML with namespaces, using namespace aware methods like createElementNS or setAttributeNS of the W3C DOM Level 2 and 3 Core API. While MSXML (all versions including the latest, MSXML 6) implements the W3C DOM Level 1 Core API it does not implement any of the methods used in the previous post, like createElementNS or setAttributeNS. Instead it has a single method createNode that takes as its first argument the node type, as its second argument the qualified name and as its third argument the namespace you want to create a node in. This post shows how to use that method to create XML with namespaces with MSXML and JavaScript. The examples use MSXML 3 but later versions of MSXML (e.g. 4, 5, 6) expose exactly the same API.

Let's assume you want to create the following XML document with JavaScript and the MSXML DOM API:

<root xmlns="http://example.com/ns1"><foo><bar>foobar</bar></foo></root>

The key to doing that properly is to understand the following: in the XML markup there is an XML default namespace declaration attribute xmlns="http://example.com/ns1" on the "root" element that is in scope for the "root" element and all its descendant elements (i.e. the "foo" and the "bar" element), meaning all three elements, the "root" element, the "foo" element and the "bar" element, are in that namespace http://example.com/ns1. To create that XML document programmatically you have to create all three elements in that namespace. Thus to create those three elements, you need to call createNode three times, each time passing in the node type (1 for element node), the element name (e.g. 'root') and the namespace (e.g. 'http://example.com/ns1'):

var doc = new ActiveXObject('Msxml2.DOMDocument.3.0');

var ns1 = 'http://example.com/ns1';

var root = doc.createNode(1, 'root', ns1);
var foo = doc.createNode(1, 'foo', ns1);
var bar = doc.createNode(1, 'bar', ns1);
bar.appendChild(doc.createTextNode('foobar'));
foo.appendChild(bar);
root.appendChild(foo);
doc.appendChild(root);

That creates an XML DOM document that, when serialized, looks like the document above. Here is an example showing that. As you can see, the DOM code did not need to create the default namespace declaration attribute at all; nevertheless, when the document is serialized by accessing the xml property, the serialization adds it.
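
The node type number passed to createNode is easy to mistype, so the three calls above could be wrapped in small helpers. The following is only a sketch: the names NODE_ELEMENT, createElementInNs and buildSample are made up for this illustration and are not part of MSXML; only createNode, createTextNode and appendChild are the real DOM methods used above.

```javascript
// Node type value that createNode expects for an element node (see above).
var NODE_ELEMENT = 1;

// Hypothetical helper (not an MSXML method): create an element in a namespace.
function createElementInNs(doc, qualifiedName, namespaceURI) {
  return doc.createNode(NODE_ELEMENT, qualifiedName, namespaceURI);
}

// Hypothetical helper: build the root/foo/bar sample with all three
// elements created in the same namespace.
function buildSample(doc, namespaceURI) {
  var root = createElementInNs(doc, 'root', namespaceURI);
  var foo = createElementInNs(doc, 'foo', namespaceURI);
  var bar = createElementInNs(doc, 'bar', namespaceURI);
  bar.appendChild(doc.createTextNode('foobar'));
  foo.appendChild(bar);
  root.appendChild(foo);
  doc.appendChild(root);
  return doc;
}

// Usage where MSXML is available (Internet Explorer, Windows Script Host):
//   var doc = buildSample(new ActiveXObject('Msxml2.DOMDocument.3.0'),
//                         'http://example.com/ns1');
//   // doc.xml then holds the serialized markup
```

Passing the namespace URI once to buildSample makes it harder to forget it on one of the descendant elements, which is the mistake discussed at the end of this post.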

Let's look at a further example that includes elements and attributes in two namespaces:

<root xmlns="http://example.com/ns1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/n1 schema.xsd">
<foo>
<bar>foobar</bar>
</foo>
</root>

[Note: for better reading I have inserted whitespace in the XML markup but the code I will show will focus on creating the elements and attributes only, not the whitespace.] In the above XML sample we now have two additional attributes, one namespace declaration attribute xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and one attribute in that namespace, the xsi:schemaLocation attribute. Again we do not have to create any namespace declaration attributes at all, it suffices to use the previous code, add a call to createNode and pass in 2 as the node type for attribute, the qualified name of the attribute (e.g. 'xsi:schemaLocation') and the namespace (e.g. 'http://www.w3.org/2001/XMLSchema-instance') to create that attribute and use setAttributeNode to add the attribute to the 'root' element:

var doc = new ActiveXObject('Msxml2.DOMDocument.3.0');

var ns1 = 'http://example.com/ns1';
var xsi = 'http://www.w3.org/2001/XMLSchema-instance';

var root = doc.createNode(1, 'root', ns1);
var schemaLocation = doc.createNode(2, 'xsi:schemaLocation', xsi);
schemaLocation.nodeValue = 'http://example.com/n1 schema.xsd';
root.setAttributeNode(schemaLocation);

var foo = doc.createNode(1, 'foo', ns1);
var bar = doc.createNode(1, 'bar', ns1);
bar.appendChild(doc.createTextNode('foobar'));
foo.appendChild(bar);
root.appendChild(foo);
doc.appendChild(root);

Here is an example showing the result. Again, serialization (i.e. accessing the xml property) creates all necessary namespace declaration attributes, as long as elements and attributes have been created in the namespaces they belong to. The only reason to create a namespace declaration attribute explicitly is to enforce its output on an element where the namespace is not used. Let's look at a further example:

<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<script type="text/ecmascript" xlink:href="foo.js"/>
</svg>

In that XML document the XLink namespace http://www.w3.org/1999/xlink is defined on the root element, the 'svg' element, although the namespace is only used on the 'script' child element's attribute xlink:href. This time, if we want to ensure the namespace declaration attribute appears on the 'svg' element, we have to explicitly set it on that element (and need to know that namespace declaration attributes are per definition in the namespace http://www.w3.org/2000/xmlns/):

var doc = new ActiveXObject('Msxml2.DOMDocument.3.0');

var svgNs = 'http://www.w3.org/2000/svg';
var xlinkNs = 'http://www.w3.org/1999/xlink';

var root = doc.createNode(1, 'svg', svgNs);
var xlink = doc.createNode(2, 'xmlns:xlink', 'http://www.w3.org/2000/xmlns/');
xlink.nodeValue = xlinkNs;
root.setAttributeNode(xlink);

var script = doc.createNode(1, 'script', svgNs);
script.setAttribute('type', 'text/ecmascript');
var href = doc.createNode(2, 'xlink:href', xlinkNs);
href.nodeValue = 'foo.js';
script.setAttributeNode(href);

root.appendChild(script);
doc.appendChild(root);

Here is an example showing the result. If we did not set the namespace declaration attribute on the 'svg' element then the resulting serialized document would nevertheless be namespace well-formed XML, only serialization would add the namespace declaration on the 'script' element.

So keep two things in mind when creating XML with namespaces programmatically with the MSXML DOM API: you need to create each element and attribute in the namespace it belongs to, passing in the node type, the qualified name and the namespace URI to the createNode method. And you do not need to create namespace declaration attributes explicitly, unless you want to enforce their appearance on an element where the namespace is not used (like the root element of your document).

The first rule is also of importance when you want to add elements or attributes to an already loaded document. Assuming we have loaded the following document

<root xmlns="http://example.com/ns1">
<foo/>
</root>

and want to add a 'bar' element in the same namespace as the other elements, then people often assume they can create a 'bar' element with createElement('bar'), add it to the root element, and have it take on the namespace of the root element. That is not the case, however: createElement('bar') creates a 'bar' element in no namespace, and when you insert that as a child of the above 'root' element and serialize, the serialization will add <bar xmlns=""/> to ensure the created element is serialized in no namespace. So to properly add a 'bar' element in the same namespace as the 'root' element you again need to use createNode and pass in the namespace URI:

var bar = doc.createNode(1, 'bar', doc.documentElement.namespaceURI);
doc.documentElement.appendChild(bar);
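
Why the xmlns="" shows up can be illustrated with a toy serializer. This is a deliberately simplified model written for illustration, not MSXML's actual serialization code: it works on plain JavaScript objects instead of DOM nodes, handles only unprefixed elements and the default namespace, and the names el, escapeXml and serialize are invented here.

```javascript
// Toy model of namespace fixup during serialization (not MSXML's real code).
// Elements are plain objects; only the default namespace is modelled.
function el(namespaceURI, name, children) {
  return { namespaceURI: namespaceURI || '', name: name, children: children || [] };
}

function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

// inScopeNs is the default namespace declared by the nearest ancestor ('' = none).
function serialize(node, inScopeNs) {
  if (typeof node === 'string') {
    return escapeXml(node);
  }
  var scope = inScopeNs || '';
  var markup = '<' + node.name;
  if (node.namespaceURI !== scope) {
    // The element's namespace differs from what is in scope: declare it.
    markup += ' xmlns="' + escapeXml(node.namespaceURI) + '"';
  }
  if (node.children.length === 0) {
    return markup + '/>';
  }
  markup += '>';
  for (var i = 0; i < node.children.length; i++) {
    markup += serialize(node.children[i], node.namespaceURI);
  }
  return markup + '</' + node.name + '>';
}

var ns1 = 'http://example.com/ns1';

// 'bar' in no namespace, as createElement('bar') would create it.
var withoutNs = serialize(el(ns1, 'root', [el(ns1, 'foo'), el(null, 'bar')]));
// 'bar' in the namespace of the root, as createNode(1, 'bar', ns1) would create it.
var withNs = serialize(el(ns1, 'root', [el(ns1, 'foo'), el(ns1, 'bar')]));

console.log(withoutNs); // <root xmlns="http://example.com/ns1"><foo/><bar xmlns=""/></root>
console.log(withNs);    // <root xmlns="http://example.com/ns1"><foo/><bar/></root>
```

The point of the model is that the namespace is a property of each node fixed at creation time; the serializer only compares it with what is in scope and writes a declaration wherever the two differ.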

 

Creating XML with namespaces with JavaScript and the W3C DOM

Let's assume you want to create the following XML document with JavaScript and the W3C DOM API:

<root xmlns="http://example.com/ns1"><foo><bar>foobar</bar></foo></root>

The key to doing that properly is to understand the following: in the XML markup there is an XML default namespace declaration attribute xmlns="http://example.com/ns1" on the "root" element that is in scope for the "root" element and all its descendant elements (i.e. the "foo" and the "bar" element), meaning all three elements, the "root" element, the "foo" element and the "bar" element, are in that namespace http://example.com/ns1. To create that XML document programmatically you have to create all three elements in that namespace. Thus to create the "root" element you use the createDocument method and pass in the namespace and the name, and to create the descendant elements you use the createElementNS method and each time pass in the namespace and the name:

var ns1 = 'http://example.com/ns1';

var doc = document.implementation.createDocument(ns1, 'root', null);
var foo = doc.createElementNS(ns1, 'foo');
var bar = doc.createElementNS(ns1, 'bar');
bar.appendChild(doc.createTextNode('foobar'));
foo.appendChild(bar);
doc.documentElement.appendChild(foo);

That creates an XML DOM document that, when serialized, looks like the document above. Here is an example showing that. As you can see, the DOM code did not need to create the default namespace declaration attribute at all; nevertheless, the serializer, when serializing the DOM tree, adds it.

Let's look at a further example that includes elements and attributes in two namespaces:

<root xmlns="http://example.com/ns1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/n1 schema.xsd">
<foo>
<bar>foobar</bar>
</foo>
</root>

[Note: for better reading I have inserted whitespace in the XML markup but the code I will show will focus on creating the elements and attributes only, not the whitespace.] In the above XML sample we now have two additional attributes, one namespace declaration attribute xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and one attribute in that namespace, the xsi:schemaLocation attribute. Again we do not have to create any namespace declaration attributes at all, it suffices to use the previous code and add a call to setAttributeNS and pass in the namespace and the qualified name and the value to create the xsi:schemaLocation attribute:

var ns1 = 'http://example.com/ns1';
var xsi = 'http://www.w3.org/2001/XMLSchema-instance';

var doc = document.implementation.createDocument(ns1, 'root', null);
doc.documentElement.setAttributeNS(xsi, 'xsi:schemaLocation', 'http://example.com/n1 schema.xsd');

var foo = doc.createElementNS(ns1, 'foo');
var bar = doc.createElementNS(ns1, 'bar');
bar.appendChild(doc.createTextNode('foobar'));
foo.appendChild(bar);
doc.documentElement.appendChild(foo);

Here is an example showing the result. Again, the serializer creates all necessary namespace declaration attributes, as long as elements and attributes have been created in the namespaces they belong to. The only reason to create a namespace declaration attribute explicitly is to enforce its output on an element where the namespace is not used. Let's look at a further example:

<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<script type="text/ecmascript" xlink:href="foo.js"/>
</svg>

In that XML document the XLink namespace http://www.w3.org/1999/xlink is defined on the root element, the 'svg' element, although the namespace is only used on the 'script' child element's attribute xlink:href. This time, if we want to ensure the namespace declaration attribute appears on the 'svg' element, we have to explicitly set it on that element (and need to know that namespace declaration attributes are per definition in the namespace http://www.w3.org/2000/xmlns/):

var svgNs = 'http://www.w3.org/2000/svg';
var xlinkNs = 'http://www.w3.org/1999/xlink';

var doc = document.implementation.createDocument(svgNs, 'svg', null);
doc.documentElement.setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:xlink', xlinkNs);

var script = doc.createElementNS(svgNs, 'script');
script.setAttributeNS(null, 'type', 'text/ecmascript');
script.setAttributeNS(xlinkNs, 'xlink:href', 'foo.js');

doc.documentElement.appendChild(script);

Here is an example showing the result. If we did not set the namespace declaration attribute on the 'svg' element then the resulting serialized document would nevertheless be namespace well-formed XML, only the serializer would add the namespace declaration on the 'script' element.

So keep two things in mind when creating XML with namespaces programmatically with the W3C DOM API: you need to create each element and attribute in the namespace it belongs to, passing in the namespace URI to methods like createElementNS or setAttributeNS. And you do not need to create namespace declaration attributes explicitly, unless you want to enforce their appearance on an element where the namespace is not used (like the root element of your document).

The first rule is also of importance when you want to add elements or attributes to an already loaded document. Assuming we have loaded the following document

<root xmlns="http://example.com/ns1">
<foo/>
</root>

and want to add a 'bar' element in the same namespace as the other elements, then people often assume they can create a 'bar' element with createElement('bar'), add it to the root element, and have it take on the namespace of the root element. That is not the case, however: createElement('bar') creates a 'bar' element in no namespace, and when you insert that as a child of the above 'root' element and serialize, the serializer will add <bar xmlns=""/> to ensure the created element is serialized in no namespace. So to properly add a 'bar' element in the same namespace as the 'root' element you again need to use createElementNS and pass in the namespace URI:

var bar = doc.createElementNS(doc.documentElement.namespaceURI, 'bar');
doc.documentElement.appendChild(bar);
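
If you add elements to loaded documents often, the two lines above could be wrapped in a small helper so that the namespace of the parent is never forgotten. This is only a sketch: appendChildInParentNamespace is a name made up for this post, not a DOM method, and the usage part assumes a browser that provides DOMParser and XMLSerializer.

```javascript
// Hypothetical helper (not part of the DOM API): create a child element in
// the same namespace as its parent and append it.
function appendChildInParentNamespace(parent, localName) {
  var child = parent.ownerDocument.createElementNS(parent.namespaceURI, localName);
  parent.appendChild(child);
  return child;
}

// Browser-only usage; skipped where DOMParser/XMLSerializer do not exist.
if (typeof DOMParser !== 'undefined' && typeof XMLSerializer !== 'undefined') {
  var loaded = new DOMParser().parseFromString(
    '<root xmlns="http://example.com/ns1"><foo/></root>', 'application/xml');
  appendChildInParentNamespace(loaded.documentElement, 'bar');
  console.log(new XMLSerializer().serializeToString(loaded));
}
```

The helper only suits the case where the new element belongs to the same namespace as its parent; for an element in a different namespace, call createElementNS directly with that namespace URI.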