What is new in System.Xml in .NET 4.0/Visual Studio 2010
Beta 1 of the .NET framework 4.0 and of Visual Studio 2010 was released a few days ago. Although the "What's new" document does not list any new features in System.Xml or LINQ to XML, I am browsing through the documentation to find new features or changes in APIs.
So far I have found the following:
With LINQ to XML, the SaveOptions enumeration has a new flag named OmitDuplicateNamespaces. That is particularly useful with VB.NET XML literals: using them you can end up with more namespace declaration attributes than you want, resulting in superfluous namespace declarations on child or descendant elements when you save/serialize a LINQ to XML XDocument or XElement.
Here is an example in VB.NET with .NET 3.5:
Imports System
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1
    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>
        html.<body>(0).Add(GetParagraphs())
        html.Save(Console.Out)
    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function
End Module
Its output is as follows:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p xmlns="http://www.w3.org/1999/xhtml">Paragraph 1.</p>
    <p xmlns="http://www.w3.org/1999/xhtml">Paragraph 2.</p>
  </body>
</html>
As you can see, the namespace declarations on the 'p' elements are redundant as the namespace is already defined on the 'html' root element.
With .NET 4.0 and the SaveOptions.OmitDuplicateNamespaces flag you can avoid them as follows:
Imports System
Imports System.Xml.Linq
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module Module1
    Sub Main()
        Dim html As XElement = _
            <html>
                <head>
                    <title>Example</title>
                </head>
                <body>
                </body>
            </html>
        html.<body>(0).Add(GetParagraphs())
        html.Save(Console.Out, SaveOptions.OmitDuplicateNamespaces)
    End Sub

    Function GetParagraphs() As IEnumerable(Of XElement)
        Dim ps() As String = {"Paragraph 1.", "Paragraph 2."}
        Return (From p In ps Select <p><%= p %></p>)
    End Function
End Module
Now the output is fine:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>Paragraph 1.</p>
    <p>Paragraph 2.</p>
  </body>
</html>
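The flag is not tied to VB.NET. As far as I can tell, the same duplication shows up in C# whenever a child element carries its own namespace declaration attribute. Here is a minimal sketch; the class NamespaceDemo and its helper Declared are made up for illustration, and the explicit xmlns attributes are added by hand to mimic what the XML literals produce:

```csharp
using System;
using System.Xml.Linq;

static class NamespaceDemo
{
    static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: builds an element that declares the XHTML
    // namespace on itself, as each VB.NET XML literal does.
    static XElement Declared(string localName, params object[] content)
    {
        return new XElement(Xhtml + localName,
            new XAttribute("xmlns", Xhtml.NamespaceName), content);
    }

    public static XElement BuildHtml()
    {
        return Declared("html",
            new XElement(Xhtml + "body",
                Declared("p", "Paragraph 1."),
                Declared("p", "Paragraph 2.")));
    }

    static void Main()
    {
        XElement html = BuildHtml();
        // Default serialization keeps the declaration on every 'p' element.
        Console.WriteLine(html.ToString());
        // With the new flag the repeated declarations are dropped.
        Console.WriteLine(html.ToString(SaveOptions.OmitDuplicateNamespaces));
    }
}
```

Only the serialization changes; the XElement tree itself still holds the namespace declaration attributes on the 'p' elements.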
Have you ever wondered why an XDocument or XElement in .NET 3.5 could be saved to a TextWriter or a file or an XmlWriter but not directly to a Stream? In .NET 3.5 you need to construct an XmlWriter or TextWriter over a Stream but now in .NET 4.0 you can save directly to a Stream: XDocument.Save(Stream), XElement.Save(Stream). No functionality gain but a convenient addition, for instance when you want to send the serialization of an XDocument or XElement to the request stream of an HttpWebRequest. There are also corresponding Load methods taking a Stream as the input, XDocument.Load(Stream), XElement.Load(Stream).
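To illustrate the new overloads, here is a minimal sketch that saves an XDocument to a stream and loads it back. The MemoryStream only stands in for any writable stream (such as a request stream), and the class and method names are made up for the example:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class StreamRoundTrip
{
    // Saves the document directly to a stream, rewinds it, and loads a copy.
    public static XDocument RoundTrip(XDocument doc)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            doc.Save(stream);               // new in .NET 4.0
            stream.Position = 0;            // rewind before reading
            return XDocument.Load(stream);  // new in .NET 4.0
        }
    }

    static void Main()
    {
        XDocument doc = new XDocument(
            new XElement("root", new XElement("item", "value")));
        XDocument copy = RoundTrip(doc);
        Console.WriteLine(copy.Root.Element("item").Value);
    }
}
```

In .NET 3.5 the same round trip needs an XmlWriter or TextWriter created over the stream for saving and an XmlReader or TextReader for loading.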
There is also a new enumeration ReaderOptions in System.Xml.Linq but so far I have not found any method or property using that enumeration.
XmlReaderSettings has a new property DtdProcessing that replaces the now obsolete ProhibitDtd property. With the boolean property ProhibitDtd you could choose either to allow DTD parsing/processing or to prohibit it. With the new DtdProcessing property you now have three choices: prohibit, parse, or ignore. Ignore can give you performance benefits over parse. For instance, the following reads the W3C home page twice, once ignoring the DTD and once parsing/processing it:
Stopwatch watch = new Stopwatch();
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Ignore: Elapsed time: {0}", watch.Elapsed);
settings.DtdProcessing = DtdProcessing.Parse;
watch.Start();
using (XmlReader reader = XmlReader.Create(@"http://www.w3.org/", settings))
{
    while (reader.Read())
    {
    }
}
watch.Stop();
Console.WriteLine("DtdProcessing.Parse: Elapsed time: {0}", watch.Elapsed);
The output for me here is
DtdProcessing.Ignore: Elapsed time: 00:00:01.5245222
DtdProcessing.Parse: Elapsed time: 00:00:09.1677892
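so ignoring the DTD is clearly faster for that sample document. Note that the stopwatch is not reset between the two runs, so the second figure includes the first: the parsing run alone took about 7.6 seconds, roughly five times as long as the ignoring run rather than nine. On the other hand, if the DTD defines any entities that are then referenced in the XML document, ignoring the DTD gives you an exception: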
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
string xhtml = @"<!DOCTYPE html
  PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
  ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
<html xml:lang=""en"">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>Price is: 100 &euro;</p>
  </body>
</html>";
using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
{
    while (reader.Read()) { }
}
throws the exception "Reference to undeclared entity 'euro'".
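To compare the three settings side by side without any network access, here is a small sketch that reads one document under each DtdProcessing value. The document with its internal DTD subset, the class name, and the ReadWith helper are assumptions made up for this example, not taken from the framework documentation:

```csharp
using System;
using System.IO;
using System.Xml;

static class DtdProcessingDemo
{
    // Made-up sample: the entity is declared in an internal DTD subset,
    // so nothing has to be fetched from the network.
    const string Xml =
        "<!DOCTYPE p [<!ENTITY euro \"&#8364;\">]>" +
        "<p>Price is: 100 &euro;</p>";

    // Returns the concatenated text content, or the exception message
    // if the reader rejects the document under the given setting.
    public static string ReadWith(DtdProcessing mode)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = mode;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(Xml), settings))
            {
                string text = "";
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Text)
                    {
                        text += reader.Value;
                    }
                }
                return text;
            }
        }
        catch (XmlException ex)
        {
            return "XmlException: " + ex.Message;
        }
    }

    static void Main()
    {
        DtdProcessing[] modes =
            { DtdProcessing.Parse, DtdProcessing.Ignore, DtdProcessing.Prohibit };
        foreach (DtdProcessing mode in modes)
        {
            Console.WriteLine("{0}: {1}", mode, ReadWith(mode));
        }
    }
}
```

With Parse the entity should be expanded into the text, with Ignore the reference is undeclared because the DTD was skipped, and with Prohibit the reader rejects the document type declaration itself.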
That's all I have found so far, I will edit this post when I find more.
[edit 2009-05-26] I have now found a page "What's new in System.Xml" in the .NET framework 4 Beta 1 documentation. Oddly enough it lists LINQ to XML and the XSLT compiler as new features although both were introduced in .NET 3.5. It also mentions new methods in the XmlConvert class.