One of the major improvements in XSLT 2.0 is grouping: the xsl:for-each-group element supports four ways of grouping, namely group-by, group-starting-with, group-ending-with and group-adjacent. Unfortunately Microsoft does not support XSLT 2.0, so if you need to group some XML input and don't want to write XSLT 1.0, you will need to use a third-party XSLT 2.0 implementation like Saxon 9 or the AltovaXML tools, or you will need to use the grouping support in LINQ together with LINQ to XML.
In this blog post I will work through the grouping examples in the XSLT 2.0 specification with LINQ to XML to find out whether, and how, LINQ to XML lets you solve the same problems that group-by, group-starting-with, group-ending-with and group-adjacent solve in XSLT 2.0.
The first example is rather straightforward: the group-by in XSLT 2.0 can be "translated" into a group by in LINQ. So with cities.xml being
<cities>
<city name="Milano" country="Italia" pop="5"/>
<city name="Paris" country="France" pop="7"/>
<city name="München" country="Deutschland" pop="4"/>
<city name="Lyon" country="France" pop="2"/>
<city name="Venezia" country="Italia" pop="1"/>
</cities>
we can use the following C# code
// Assumes: using System.Linq; using System.Xml.Linq;
XDocument input = XDocument.Load(@"cities.xml");
XDocument output = new XDocument(
    new XElement("table",
        // Header row, closed before the data rows are added.
        new XElement("tr",
            new XElement("th", "Position"),
            new XElement("th", "Country"),
            new XElement("th", "List of Cities"),
            new XElement("th", "Population")),
        (from city in input.Root.Elements("city")
         group city by (string)city.Attribute("country"))
        .Select((g, i) =>
            new XElement("tr",
                new XElement("td", i + 1),
                new XElement("td", g.Key),
                new XElement("td", string.Join(", ", g.Select(c => (string)c.Attribute("name")).ToArray())),
                new XElement("td", g.Sum(c => (decimal)c.Attribute("pop")))))));
output.Save(@"cities.html");
to get cities.html as follows:
<table>
<tr>
<th>Position</th>
<th>Country</th>
<th>List of Cities</th>
<th>Population</th>
</tr>
<tr>
<td>1</td>
<td>Italia</td>
<td>Milano, Venezia</td>
<td>6</td>
</tr>
<tr>
<td>2</td>
<td>France</td>
<td>Paris, Lyon</td>
<td>9</td>
</tr>
<tr>
<td>3</td>
<td>Deutschland</td>
<td>München</td>
<td>4</td>
</tr>
</table>
The only nuisance with the LINQ solution is that, to get the positional index, we need to switch from query expression syntax to method syntax, because only the Select overload taking a two-parameter lambda exposes the index. And the position in XSLT/XPath is one-based whereas the index in .NET is zero-based, so we need to add one to get the same result.
The second example is called "A Composite Grouping Key" as it computes the average population value for each (country, name) combination. However, the XSLT 2.0 solution then suggests nesting two xsl:for-each-group elements, as XSLT 2.0 does not allow a grouping key consisting of two separate values. LINQ does allow that, by grouping on an anonymous type, so with LINQ to XML we can use the approach shown below. Assuming the XML input cities2.xml is
<cities>
<city name="Milano" country="Italia" year="1950" pop="5.23"/>
<city name="Milano" country="Italia" year="1960" pop="5.29"/>
<city name="Padova" country="Italia" year="1950" pop="0.69"/>
<city name="Padova" country="Italia" year="1960" pop="0.93"/>
<city name="Paris" country="France" year="1951" pop="7.2"/>
<city name="Paris" country="France" year="1961" pop="7.6"/>
</cities>
then this C# code
// Assumes: using System.Globalization; using System.Linq; using System.Xml.Linq;
XDocument input = XDocument.Load(@"cities2.xml");
XDocument output = new XDocument(
    new XElement("div",
        from city in input.Root.Elements("city")
        // The anonymous type acts as a composite (country, name) key.
        group city by new { country = (string)city.Attribute("country"), name = (string)city.Attribute("name") } into g
        select new XElement("p",
            g.Key.name + ", "
            + g.Key.country + ": "
            + g.Average(c => (decimal)c.Attribute("pop")).ToString(CultureInfo.InvariantCulture))));
output.Save(@"cities2.html");
produces the following output snippet:
<div>
<p>Milano, Italia: 5.26</p>
<p>Padova, Italia: 0.81</p>
<p>Paris, France: 7.4</p>
</div>
The last group-by example in the XSLT 2.0 specification is called "Adding an Element to Several Groups". Here XSLT's "for-each-group group-by" excels, as it simply allows the group-by expression to evaluate to a sequence of grouping keys, so that an item can belong to several groups. A LINQ group by on its own assigns each item to exactly one group, so here we follow a different approach: we first find the distinct keys, then for each distinct key we process the elements containing that key. So assuming the XML input is
<titles>
<title>A Beginner's Guide to <ix>Java</ix></title>
<title>Learning <ix>XML</ix></title>
<title>Using <ix>XML</ix> with <ix>Java</ix></title>
</titles>
the following C# code
XDocument input = XDocument.Load(@"titles.xml");
XDocument output = new XDocument(
    new XElement("div",
        from key in input.Root.Elements("title").Elements("ix").Select(i => i.Value).Distinct()
        select
            // One heading per key, followed by every title that contains the key.
            new[] { new XElement("h2", key) }.Concat(
                from title in input.Root.Elements("title")
                where title.Elements("ix").Any(ix => (string)ix == key)
                select new XElement("p", title.Value))));
output.Save("titles.html");
produces the output fragment
<div>
<h2>Java</h2>
<p>A Beginner's Guide to Java</p>
<p>Using XML with Java</p>
<h2>XML</h2>
<p>Learning XML</p>
<p>Using XML with Java</p>
</div>
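As an aside, a single group by can also cover this case if each title is first paired with every key it contains. The following is only a sketch of that idea, not the approach used above; the class name TitleIndex and the method name Build are made up for illustration. It relies on a second from clause (which the compiler turns into SelectMany) to produce one (title, key) pair per 'ix' element before grouping.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TitleIndex
{
    // Hypothetical helper: builds the <div> index from a <titles> element.
    public static XElement Build(XElement titles)
    {
        return new XElement("div",
            from title in titles.Elements("title")
            from ix in title.Elements("ix")      // one (title, key) pair per <ix>
            group title by (string)ix into g     // a title can now land in several groups
            select new object[]
            {
                new XElement("h2", g.Key),
                g.Select(t => new XElement("p", t.Value))
            });
    }
}
```

It would be called as new XDocument(TitleIndex.Build(input.Root)). Note that this sketch lists a title twice under a key if the title contains the same 'ix' value twice; applying Distinct() to the keys of each title would avoid that.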
Now let's look at group-starting-with. With the sample input being
<body>
<h2>Introduction</h2>
<p>XSLT is used to write stylesheets.</p>
<p>XQuery is used to query XML databases.</p>
<h2>What is a stylesheet?</h2>
<p>A stylesheet is an XML document used to define a transformation.</p>
<p>Stylesheets may be written in XSLT.</p>
<p>XSLT 2.0 introduces new grouping constructs.</p>
</body>
the XSLT 2.0 solution has a template for the body element doing an <xsl:for-each-group select="*" group-starting-with="h2">, so it selects all child elements of the body and specifies a pattern 'h2' to identify the element type starting a group. The LINQ grouping construct does not offer a direct equivalent, but we can use a different approach instead: we process the 'p' child elements and group them by the immediately preceding 'h2' sibling. We can select that with LINQ to XML as p.NodesBeforeSelf().OfType<XElement>().LastOrDefault(e => e.Name == "h2"), which takes the preceding sibling nodes (NodesBeforeSelf()), restricts them to XElement element nodes (OfType<XElement>()) and then takes the last one in document order that has the name 'h2'. For a 'p' with no preceding 'h2' this yields null, which the code maps to an empty title. With that approach the C# code looks as follows:
XDocument input = XDocument.Load("input.xml");
XDocument output = new XDocument(
    new XElement("chapter",
        from p in input.Root.Elements("p")
        // Key: the nearest preceding h2 sibling (null if there is none).
        group p by p.NodesBeforeSelf().OfType<XElement>().LastOrDefault(e => e.Name == "h2") into g
        select new XElement("section",
            new XAttribute("title", g.Key != null ? g.Key.Value : ""),
            from p2 in g
            select new XElement("para", p2.Value))));
output.Save("output.xml");
and creates the following XML output:
<chapter>
<section title="Introduction">
<para>XSLT is used to write stylesheets.</para>
<para>XQuery is used to query XML databases.</para>
</section>
<section title="What is a stylesheet?">
<para>A stylesheet is an XML document used to define a transformation.</para>
<para>Stylesheets may be written in XSLT.</para>
<para>XSLT 2.0 introduces new grouping constructs.</para>
</section>
</chapter>
which is the same result the XSLT stylesheet creates. So basically, as LINQ allows the grouping key to be an object, we simply need to identify the XElement object (or more generally the XNode object) starting a group, and then we can make use of the LINQ group by construct.
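The query above looks up the preceding siblings again for every 'p' element. A single pass over the children is another option. The sketch below assumes a hypothetical extension method named GroupStartingWith (it is not part of LINQ or LINQ to XML) that starts a new run whenever the predicate matches; the class name StartingWithSketch and the method ToChapter are likewise made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StartingWithSketch
{
    // Hypothetical helper: splits a sequence into runs, opening a new run
    // at the first item and at every item for which startsGroup is true.
    public static IEnumerable<List<T>> GroupStartingWith<T>(
        this IEnumerable<T> source, Func<T, bool> startsGroup)
    {
        List<T> current = null;
        foreach (T item in source)
        {
            if (current == null || startsGroup(item))
            {
                if (current != null) yield return current;
                current = new List<T>();
            }
            current.Add(item);
        }
        if (current != null) yield return current;
    }

    // Builds the <chapter> element from a <body> element.
    public static XElement ToChapter(XElement body)
    {
        return new XElement("chapter",
            from g in body.Elements().GroupStartingWith(e => e.Name == "h2")
            select new XElement("section",
                // A run that does not begin with h2 gets an empty title.
                new XAttribute("title", g[0].Name == "h2" ? g[0].Value : ""),
                from p in g.Where(e => e.Name == "p")
                select new XElement("para", p.Value)));
    }
}
```

Called as new XDocument(StartingWithSketch.ToChapter(input.Root)), this should give the same sections for the sample input, while reading each child element only once.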
Not surprisingly a similar approach can be applied for the group-ending-with example. It has the following input
<doc>
<page continued="yes">Some text</page>
<page continued="yes">More text</page>
<page>Yet more text</page>
<page continued="yes">Some words</page>
<page continued="yes">More words</page>
<page>Yet more words</page>
</doc>
and then has a template matching the 'doc' element with an xsl:for-each-group select="*" group-ending-with="page[not(@continued='yes')]", meaning it processes all child elements and provides a pattern to identify the element node ending a group. With LINQ to XML we can select the 'page' elements having the continued="yes" attribute and group them by the immediately following sibling 'page' element not having that attribute. That can be found using page.NodesAfterSelf().OfType<XElement>().FirstOrDefault(p => !((string)p.Attribute("continued") == "yes")), which selects the following sibling nodes, restricts them to XElement nodes and then takes the first one in document order that does not have the continued="yes" attribute. So the C# code looks as follows:
XDocument input = XDocument.Load("input.xml");
XDocument output = new XDocument(
    new XElement(input.Root.Name,
        from page in input.Root.Elements("page")
        where (string)page.Attribute("continued") == "yes"
        // Key: the next sibling page that ends the group.
        group page by page.NodesAfterSelf().OfType<XElement>().FirstOrDefault(p => !((string)p.Attribute("continued") == "yes")) into g
        select new XElement("pageset",
            from page2 in g
            select new XElement("page", page2.Value),
            g.Key)));
output.Save("output.xml");
and produces the following output:
<doc>
<pageset>
<page>Some text</page>
<page>More text</page>
<page>Yet more text</page>
</pageset>
<pageset>
<page>Some words</page>
<page>More words</page>
<page>Yet more words</page>
</pageset>
</doc>
Again, LINQ allowing us to have an object (e.g. XElement node) as the grouping key provides us with a way to use the LINQ group by to solve that problem.
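One limitation of the query above is that it only forms groups from the continued="yes" pages, so a closing page that directly follows another closing page would not appear in the output. A single-pass alternative is sketched below under the assumption of a hypothetical extension method named GroupEndingWith (not part of LINQ or LINQ to XML), which closes a run at every item matching the predicate; the names EndingWithSketch and ToPageSets are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EndingWithSketch
{
    // Hypothetical helper: splits a sequence into runs, closing the current
    // run after every item for which endsGroup is true.
    public static IEnumerable<List<T>> GroupEndingWith<T>(
        this IEnumerable<T> source, Func<T, bool> endsGroup)
    {
        var current = new List<T>();
        foreach (T item in source)
        {
            current.Add(item);
            if (endsGroup(item))
            {
                yield return current;
                current = new List<T>();
            }
        }
        // Trailing items without a closing item still form a final run.
        if (current.Count > 0) yield return current;
    }

    // Wraps each run of <page> elements in a <pageset> element.
    public static XElement ToPageSets(XElement doc)
    {
        return new XElement(doc.Name,
            from g in doc.Elements("page")
                         .GroupEndingWith(p => (string)p.Attribute("continued") != "yes")
            select new XElement("pageset",
                from page in g
                select new XElement("page", page.Value)));
    }
}
```

Called as new XDocument(EndingWithSketch.ToPageSets(input.Root)), this should produce the same two page sets for the sample input, and it also keeps page sets that consist of a single page.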
That leaves us with the group-adjacent XSLT example to be solved with LINQ to XML. As that looks harder, and I am thinking of implementing it with an extension method, I will end this article here and treat group-adjacent in another article.