May 2009 - Posts

Exploiting contravariance with LINQ to XML

Covariance and contravariance for generic interfaces are new features of C# and VB.NET in Visual Studio 2010 and the .NET Framework 4.0. Generic interfaces such as IEnumerable<T> and IEqualityComparer<T> in the .NET Framework 4.0 use these new features. Starting with .NET 4.0, the type parameter T in IEqualityComparer<T> is contravariant. That can make coding with LINQ to XML easier, because the class XNodeEqualityComparer implements IEqualityComparer<XNode>, where XNode is a common base class of other LINQ to XML classes such as XElement.

Let's look at an example. Assume we have the following XML document

<?xml version="1.0" encoding="utf-8" ?>
<root>
  <items>
    <item>
      <foo>a</foo>
      <bar>1</bar>
    </item>
    <item>
      <foo>b</foo>
      <bar>2</bar>
    </item>
    <item>
      <foo>a</foo>
      <bar>1</bar>
    </item>
    <item>
      <foo>c</foo>
      <bar>3</bar>
    </item>
    <item>
      <foo>c</foo>
      <bar>3</bar>
    </item>
  </items>
</root>

and we want to use LINQ to XML to extract the distinct items, using XNodeEqualityComparer to compare the 'item' elements in the XML document.

You could be tempted to try it as follows:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
    .Root
    .Element("items")
    .Elements("item")
    .Distinct(new XNodeEqualityComparer())
    .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}

but with .NET 3.5 that does not compile; the compiler complains "Instance argument: cannot convert from 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XElement>' to 'System.Collections.Generic.IEnumerable<System.Xml.Linq.XNode>'" on the Distinct(new XNodeEqualityComparer()) call. That happens because Elements("item") gives us an IEnumerable<XElement>, so the Distinct method wants an IEqualityComparer<XElement> to be passed in, while we only pass in an IEqualityComparer<XNode>.

To work around that problem with .NET 3.5, we first have to cast the IEnumerable<XElement> up to IEnumerable<XNode> before we call Distinct(new XNodeEqualityComparer()), and then cast down again after the Distinct() call:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
    .Root
    .Element("items")
    .Elements("item")
    .Cast<XNode>()
    .Distinct(new XNodeEqualityComparer())
    .Cast<XElement>()
    .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}

That compiles fine and nicely returns only distinct items:

{ foo = a, bar = 1 }
{ foo = b, bar = 2 }
{ foo = c, bar = 3 }

With .NET 4.0, however, the type parameter T of IEqualityComparer<T> is contravariant. That means that where a method expects an IEqualityComparer<XElement>, it suffices to pass an IEqualityComparer for a base type of XElement, such as XNode. Thus with .NET 4.0 our original attempt compiles and runs fine:

XDocument doc = XDocument.Load("XMLFile1.xml");

var distinctItems =
    doc
    .Root
    .Element("items")
    .Elements("item")
    .Distinct(new XNodeEqualityComparer())
    .Select(i => new { foo = (string)i.Element("foo"), bar = (int)i.Element("bar") });

foreach (var item in distinctItems)
{
    Console.WriteLine(item);
}
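
The conversion that contravariance permits can also be looked at in isolation. The following is only a minimal sketch, assuming C# 4 and .NET 4.0 or later; the class VarianceDemo, its method CountDistinct and the in-memory sample elements are made up for illustration and are not part of the example above. It assigns an XNodeEqualityComparer to a variable of type IEqualityComparer<XElement>, which is exactly the conversion the Distinct call relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original example.
public static class VarianceDemo
{
    public static int CountDistinct()
    {
        // Allowed since .NET 4.0: IEqualityComparer<in T> is contravariant,
        // so a comparer for XNode can be used where a comparer for XElement is expected.
        IEqualityComparer<XElement> comparer = new XNodeEqualityComparer();

        XElement[] items =
        {
            new XElement("item", new XElement("foo", "a")),
            new XElement("item", new XElement("foo", "a")),
            new XElement("item", new XElement("foo", "b"))
        };

        // The first two elements are deep-equal, so two distinct elements remain.
        return items.Distinct(comparer).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct()); // prints 2
    }
}
```

With a C# 3 compiler the assignment to IEqualityComparer<XElement> should be rejected, which is the same limitation the Cast<XNode>() workaround addresses.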


What is new in System.Xml in .NET 4.0/Visual Studio 2010

Beta 1 of the .NET Framework 4.0 and of Visual Studio 2010 was released a few days ago. Although the "What's new" document does not list any new features in System.Xml or LINQ to XML, I am browsing through the documentation to find new features or changes in the APIs.

So far I have found the following:

With LINQ to XML, the SaveOptions enumeration has a new flag named OmitDuplicateNamespaces. That is particularly useful with VB.NET XML literals, because with them you can end up with more namespace declaration attributes than you want, resulting in superfluous namespace declarations on child or descendant elements when you save/serialize a LINQ to XML XDocument or XElement.

Here is an example in VB.NET with .NET 3.5:

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1

    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>

        html.<body>(0).Add(GetParagraphs())

        html.Save(Console.Out)

    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function

End Module

Its output is as follows:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<p xmlns="http://www.w3.org/1999/xhtml">Paragraph 1.</p>
<p xmlns="http://www.w3.org/1999/xhtml">Paragraph 2.</p>
</body>
</html>

As you can see, the namespace declarations on the 'p' elements are redundant as the namespace is already defined on the 'html' root element.

With .NET 4.0 and the SaveOptions.OmitDuplicateNamespaces flag you can avoid them as follows:

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1

    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>

        html.<body>(0).Add(GetParagraphs())

        html.Save(Console.Out, SaveOptions.OmitDuplicateNamespaces)

    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function

End Module

Now the output is fine:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<p>Paragraph 1.</p>
<p>Paragraph 2.</p>
</body>
</html>
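
The flag is not tied to VB.NET. As a rough C# sketch, assuming .NET 4.0 or later: the class NamespaceDemo and its method BuildHtml are made up for illustration, and the explicit xmlns attribute on the 'p' element is my stand-in for what the VB.NET XML literal produces. It uses XNode.ToString(SaveOptions) to compare the serialization with and without the flag.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original example.
public static class NamespaceDemo
{
    public static XElement BuildHtml()
    {
        XNamespace xhtml = "http://www.w3.org/1999/xhtml";

        // Both the root and the 'p' element carry an explicit namespace
        // declaration attribute, so the one on 'p' is a duplicate.
        return new XElement(xhtml + "html",
            new XAttribute("xmlns", xhtml.NamespaceName),
            new XElement(xhtml + "body",
                new XElement(xhtml + "p",
                    new XAttribute("xmlns", xhtml.NamespaceName),
                    "Paragraph 1.")));
    }

    public static void Main()
    {
        XElement html = BuildHtml();

        // Default serialization keeps the duplicate declaration on 'p'.
        Console.WriteLine(html.ToString());

        // With the flag, the duplicate declaration is dropped.
        Console.WriteLine(html.ToString(SaveOptions.OmitDuplicateNamespaces));
    }
}
```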

Have you ever wondered why an XDocument or XElement in .NET 3.5 could be saved to a TextWriter, a file, or an XmlWriter, but not directly to a Stream? In .NET 3.5 you need to construct an XmlWriter or TextWriter over the Stream, but in .NET 4.0 you can save directly to a Stream with XDocument.Save(Stream) and XElement.Save(Stream). There is no gain in functionality, but it is a convenient addition, for instance when you want to send the serialization of an XDocument or XElement to the request stream of an HttpWebRequest. There are also corresponding Load methods that take a Stream as input: XDocument.Load(Stream) and XElement.Load(Stream).
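
Here is a small sketch of the new Stream overloads, assuming .NET 4.0 or later. The class StreamDemo and its method RoundTrip are made up for illustration, and a MemoryStream stands in for whatever stream you actually have, such as the request stream mentioned above.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original post.
public static class StreamDemo
{
    public static XDocument RoundTrip(XDocument doc)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // New in .NET 4.0: no XmlWriter or TextWriter is needed.
            doc.Save(stream);

            // Rewind so that Load starts reading at the beginning.
            stream.Position = 0;

            // New in .NET 4.0: load directly from the Stream.
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        XDocument doc = new XDocument(
            new XElement("root", new XElement("item", "a")));

        XDocument copy = RoundTrip(doc);
        Console.WriteLine((string)copy.Root.Element("item"));
    }
}
```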

 

There is also a new enumeration ReaderOptions in System.Xml.Linq but so far I have not found any method or property using that enumeration.

 

XmlReaderSettings has a new property DtdProcessing that replaces the now obsolete ProhibitDtd property. With the boolean property ProhibitDtd you could choose either to allow DTD parsing/processing or to prohibit it. With the new DtdProcessing property you now have three choices: Prohibit, Parse, or Ignore. Ignore can give you performance benefits over Parse. For instance, the following reads the W3C home page twice, once ignoring the DTD and once parsing/processing it:

Stopwatch watch = new Stopwatch();
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Ignore: Elapsed time: {0}", watch.Elapsed);

settings.DtdProcessing = DtdProcessing.Parse;
// Reset first; otherwise Elapsed would also include the time of the first run.
watch.Reset();
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Parse: Elapsed time: {0}", watch.Elapsed);

 

The output for me here is

DtdProcessing.Ignore: Elapsed time: 00:00:01.5245222
DtdProcessing.Parse: Elapsed time: 00:00:09.1677892

so ignoring the DTD is roughly five to six times faster for that sample document. On the other hand, if the DTD defines any entities that are then referenced in the XML document, ignoring the DTD gives you an exception:

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;

string xhtml = @"<!DOCTYPE html
PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
<html xml:lang=""en"">
<head>
<title>Example</title>
</head>
<body>
<p>Price is: 100 &euro;</p>
</body>
</html>";

using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
{
    while (reader.Read()) { }
}

throws the exception "Reference to undeclared entity 'euro'".
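
To compare all three settings without any network access, here is a rough sketch, assuming .NET 4.0 or later. The class DtdDemo, its Read method and the tiny sample document are made up for illustration; the document declares the euro entity in an internal DTD subset instead of referencing the XHTML DTD. As far as I can tell, Prohibit rejects the DOCTYPE, Ignore skips the declaration and then fails on the entity reference, and Parse expands the entity.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class DtdDemo
{
    // Sample document whose internal DTD subset declares the entity it uses.
    const string Xml =
        "<!DOCTYPE p [<!ENTITY euro \"&#8364;\">]><p>Price is: 100 &euro;</p>";

    // Returns the text of the 'p' element, or "XmlException" if reading fails.
    public static string Read(DtdProcessing mode)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = mode;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(Xml), settings))
            {
                reader.MoveToContent(); // skips the DOCTYPE node if it is allowed
                return reader.ReadElementContentAsString();
            }
        }
        catch (XmlException)
        {
            return "XmlException";
        }
    }

    public static void Main()
    {
        Console.WriteLine("Prohibit: {0}", Read(DtdProcessing.Prohibit));
        Console.WriteLine("Ignore:   {0}", Read(DtdProcessing.Ignore));
        Console.WriteLine("Parse:    {0}", Read(DtdProcessing.Parse));
    }
}
```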

 

That's all I have found so far; I will edit this post when I find more.

[edit 2009-05-26] I have now found a page "What's new in System.Xml" in the .NET framework 4 Beta 1 documentation. Oddly enough it lists LINQ to XML and the XSLT compiler as new features although both were introduced in .NET 3.5. It also mentions new methods in the XmlConvert class.