November 2009 - Posts

Data oriented WCF service Agile with EF4 and a “sort of” provider LINQ

I use a different approach than Julie’s one for EF Agile development. Indeed, I use here Self-Tracking Entities and I massively use T4.

The first step is to create a project DAL in which we will add an edmx.

In my sample, I use Northwind DataBase with only the Customers, Orders and [Order Details] tables.

image

When it’s done, we will add a new item of type ADO.NET Self-Tracking Entity Generator to our project.

It will add two T4 templates to our project:

  • one with entities + some classes and interfaces for Self-Tracking

image

  • one with the context + a static class with extension methods.

We will move our entities template into another project Entities.

Then we have to add the Entities reference in our DAL project. We will also set the edmx path in our templates.

Then, we will create a new project Repositories. In this project, we will define an interface INorthwindRepository.

image

We can see the redundancy of INorthwindRepository. That’s why, in my sample, these interfaces (INorthwindRepository and IEntitySet) are also generated with a T4 template which uses my edmx. What is great here is the fact that if we change our data model (my edmx), we have no more code to write. We just have to regenerate all my T4 templates.

Now, we will go back to my context. It has to implement INorthwindRepository. For this, we will update my T4 template to have this:

image

For WCF, we need a contract. So I will add two new projects: Services and ServiceContracts (with T4).

image

image

So now, if you already have the T4 templates (which are independent of the project), you wrote 0 lines of code!

We can see the assemblies independence with EF (except for the DAL of course) :

image

I wanted to use Unity. To do it, I use Alexey Zakharov approach. So I integrate his WCFFacility project and classes Bootstrapper and UnityServiceLocatorAdapter in my WCFService (web project host). After creating the svc and the config file, my service is finished!

 

Now in the client tier, I wanted to use a “sort of” provider LINQ. For this, I inspired from one of my old demo made for ALT.NET French community.

We will create a Client.LINQ project in which we will define two classes: ClientLINQ and MyQueryable.

public static class ClientLINQ
{
    public static MyQueryable<T> Where<T>(this MyQueryable<T> source, Expression<Func<T, bool>> where)
    {
        source.WhereValue = string.Concat(source.WhereValue ?? "", where.Body.ToString().Replace(string.Format("{0}.", where.Parameters[0].Name), "it.").Replace("\"", "'").Replace("||", " OR ").Replace("&&", " AND "));
        return source;
    }

 

    public static MyQueryable<T> OrderBy<T, T2>(this MyQueryable<T> source, Expression<Func<T, T2>> orderBy)
    {
        source.OrderByValue = orderBy.Body.ToString().Replace(string.Format("{0}.", orderBy.Parameters[0].Name), "it.");
        return source;
    }

 

    public static MyQueryable<T> Include<T>(this MyQueryable<T> source, string include)
    {
        source.IncludeValues.Add(include);
        return source;
    }

 

    public static MyQueryable<T> Skip<T>(this MyQueryable<T> source, int number)
    {
        source.SkipValue = number;
        return source;
    }

 

    public static MyQueryable<T> Take<T>(this MyQueryable<T> source, int number)
    {
        source.TakeValue = number;
        return source;
    }

 

    public static T FirstOrDefault<T>(this MyQueryable<T> source)
    {
        source.TakeValue = 1;
        return source.AsEnumerable().FirstOrDefault();
    }

 

    public static T First<T>(this MyQueryable<T> source)
    {
        source.TakeValue = 1;
        return source.AsEnumerable().First();
    }

 

    public static MyQueryable<T> ToMyQueryable<T>(this IEnumerable<T> source)
    {
        var value = source as MyQueryable<T>;
        if (value == null)
            value = new MyQueryable<T>(source);
        return value;
    }
}

public class MyQueryable<T> : IEnumerable<T>
{
    public MyQueryable()
    {
    }
    public MyQueryable(IEnumerable<T> enumerable)
    {
        Enumerable = enumerable;
    }

 

    public IEnumerable<T> Enumerable { get; set; }

 

    public IEnumerator<T> GetEnumerator()
    {
        return Enumerable.GetEnumerator();
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

 

    private bool _allEntities = true;
    public bool AllEntities
    {
        get { return _allEntities; }
        internal set
        {
            _allEntities = value;
            if (value)
            {
                IncludeValues.Clear();
                WhereValue = null;
                OrderByValue = null;
                SkipValue = null;
                TakeValue = null;
            }
        }
    }
    private ObservableCollection<string> _includeValues;
    public ObservableCollection<string> IncludeValues
    {
        get
        {
            if (_includeValues == null)
            {
                _includeValues = new ObservableCollection<string>();
                _includeValues.CollectionChanged += (sender, e) =>
                    {
                        if (e.Action == NotifyCollectionChangedAction.Add)
                            AllEntities = false;
                    };
            }
            return _includeValues;
        }
    }
    private string _whereValue;
    public string WhereValue
    {
        get { return _whereValue; }
        internal set
        {
            if (value != null)
                AllEntities = false;
            _whereValue = value;
        }
    }
    private string _orderByValue;
    public string OrderByValue
    {
        get { return _orderByValue; }
        internal set
        {
            if (value != null)
                AllEntities = false;
            _orderByValue = value;
        }
    }
    private int? _skipValue;
    public int? SkipValue 
    {
        get { return _skipValue; }
        internal set
        {
            if (value.HasValue)
                AllEntities = false;
            _skipValue = value;
        }
    }
    private int? _takeValue;
    public int? TakeValue 
    {
        get { return _takeValue; }
        internal set
        {
            if (value.HasValue)
                AllEntities = false;
            _takeValue = value;
        }
    }
}

Then, we will create our client project (a Test project in my sample). In this project, I add Entities project and Client.LINQ references. Client.LINQ is not mandatory. I can also duplicate my entities code but in order to have Self-Tracking, I can’t use the default generated code. Then, we will add the WCF service reference. To use my “sort of” provider LINQ, I need some MyQueryable of my entities. Here also, (I hope that you guess it), we can use a T4 template.

public partial class NorthwindClientContext
{     
      private INorthwindService _service;

      public NorthwindClientContext(INorthwindService service)
      {
            _service = service;
     
  
      public MyQueryable<Customer> Customers
      {
            get
            {
                  var value = new MyQueryable<Customer>();
                  value.Enumerable = GetCustomers(value);
                  return value;
            }
      }
      private IEnumerable<Customer> GetCustomers(MyQueryable<Customer> myQueryable)
      {
            IEnumerable<Customer> value;
            if (myQueryable.AllEntities)
                  value = _service.GetAllCustomers();
            else
                  value = _service.GetCustomers(myQueryable.IncludeValues.ToList(), myQueryable.WhereValue, myQueryable.OrderByValue, myQueryable.SkipValue, myQueryable.TakeValue);
            foreach (var entity in value)
                  yield return entity;
     

      public MyQueryable<Order> Orders
      {
            get
            {
                  var value = new MyQueryable<Order>();
                  value.Enumerable = GetOrders(value);
                  return value;
            }
      }
      private IEnumerable<Order> GetOrders(MyQueryable<Order> myQueryable)
      {
            IEnumerable<Order> value;
            if (myQueryable.AllEntities)
                  value = _service.GetAllOrders();
            else
                  value = _service.GetOrders(myQueryable.IncludeValues.ToList(), myQueryable.WhereValue, myQueryable.OrderByValue, myQueryable.SkipValue, myQueryable.TakeValue);
            foreach (var entity in value)
                  yield return entity;
     

      public MyQueryable<OrderDetail> OrderDetails
      {
            get
            {
                  var value = new MyQueryable<OrderDetail>();
                  value.Enumerable = GetOrderDetails(value);
                  return value;
            }
      } 
      private IEnumerable<OrderDetail> GetOrderDetails(MyQueryable<OrderDetail> myQueryable)
      {
            IEnumerable<OrderDetail> value;
            if (myQueryable.AllEntities)
                  value = _service.GetAllOrderDetails();
            else
                  value = _service.GetOrderDetails(myQueryable.IncludeValues.ToList(), myQueryable.WhereValue, myQueryable.OrderByValue, myQueryable.SkipValue, myQueryable.TakeValue);
            foreach (var entity in value)
                  yield return entity;
      }
}

The use of the yield return will postpone the execution and allow MyQueryable properties to be set when GetCustomers / GetOrders / GetOrderDetails methods are called.

So the following code:

var order = (from o in new NorthwindClientContext(service).Orders.Include("Customer").Include("OrderDetails")
             where o.ShipCity == "PARIS"
             orderby o.OrderDate
             select o).Skip(2).First();

will call this method

_service.GetOrders(myQueryable.IncludeValues.ToList(), myQueryable.WhereValue, myQueryable.OrderByValue, myQueryable.SkipValue, myQueryable.TakeValue)

with these parameters:

image

wich will generate the following SQL query:

SELECT
[Project1].[OrderID] AS [OrderID],
[Project1].[CustomerID] AS [CustomerID],
[Project1].[EmployeeID] AS [EmployeeID],
[Project1].[OrderDate] AS [OrderDate],
[Project1].[RequiredDate] AS [RequiredDate],
[Project1].[ShippedDate] AS [ShippedDate],
[Project1].[ShipVia] AS [ShipVia],
[Project1].[Freight] AS [Freight],
[Project1].[ShipName] AS [ShipName],
[Project1].[ShipAddress] AS [ShipAddress],
[Project1].[ShipCity] AS [ShipCity],
[Project1].[ShipRegion] AS [ShipRegion],
[Project1].[ShipPostalCode] AS [ShipPostalCode],
[Project1].[ShipCountry] AS [ShipCountry],
[Project1].[CustomerID1] AS [CustomerID1],
[Project1].[CompanyName] AS [CompanyName],
[Project1].[ContactName] AS [ContactName],
[Project1].[ContactTitle] AS [ContactTitle],
[Project1].[Address] AS [Address],
[Project1].[City] AS [City],
[Project1].[Region] AS [Region],
[Project1].[PostalCode] AS [PostalCode],
[Project1].[Country] AS [Country],
[Project1].[Phone] AS [Phone],
[Project1].[Fax] AS [Fax],
[Project1].[C1] AS [C1],
[Project1].[OrderID1] AS [OrderID1],
[Project1].[ProductID] AS [ProductID],
[Project1].[UnitPrice] AS [UnitPrice],
[Project1].[Quantity] AS [Quantity],
[Project1].[Discount] AS [Discount]
FROM ( SELECT 
      [Limit1].[OrderID] AS [OrderID], 
      [Limit1].[CustomerID1] AS [CustomerID], 
      [Limit1].[EmployeeID] AS [EmployeeID], 
      [Limit1].[OrderDate] AS [OrderDate], 
      [Limit1].[RequiredDate] AS [RequiredDate], 
      [Limit1].[ShippedDate] AS [ShippedDate], 
      [Limit1].[ShipVia] AS [ShipVia], 
      [Limit1].[Freight] AS [Freight], 
      [Limit1].[ShipName] AS [ShipName], 
      [Limit1].[ShipAddress] AS [ShipAddress], 
      [Limit1].[ShipCity] AS [ShipCity], 
      [Limit1].[ShipRegion] AS [ShipRegion], 
      [Limit1].[ShipPostalCode] AS [ShipPostalCode], 
      [Limit1].[ShipCountry] AS [ShipCountry], 
      [Limit1].[CustomerID2] AS [CustomerID1], 
      [Limit1].[CompanyName] AS [CompanyName], 
      [Limit1].[ContactName] AS [ContactName], 
      [Limit1].[ContactTitle] AS [ContactTitle], 
      [Limit1].[Address] AS [Address], 
      [Limit1].[City] AS [City], 
      [Limit1].[Region] AS [Region], 
      [Limit1].[PostalCode] AS [PostalCode], 
      [Limit1].[Country] AS [Country], 
      [Limit1].[Phone] AS [Phone], 
      [Limit1].[Fax] AS [Fax], 
      [Extent3].[OrderID] AS [OrderID1], 
      [Extent3].[ProductID] AS [ProductID], 
      [Extent3].[UnitPrice] AS [UnitPrice], 
      [Extent3].[Quantity] AS [Quantity], 
      [Extent3].[Discount] AS [Discount], 
      CASE WHEN ([Extent3].[OrderID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
      FROM   (SELECT TOP (1) [Filter1].[OrderID], [Filter1].[CustomerID1], [Filter1].[EmployeeID], [Filter1].[OrderDate], [Filter1].[RequiredDate], [Filter1].[ShippedDate], [Filter1].[ShipVia], [Filter1].[Freight], [Filter1].[ShipName], [Filter1].[ShipAddress], [Filter1].[ShipCity], [Filter1].[ShipRegion], [Filter1].[ShipPostalCode], [Filter1].[ShipCountry], [Filter1].[CustomerID2], [Filter1].[CompanyName], [Filter1].[ContactName], [Filter1].[ContactTitle], [Filter1].[Address], [Filter1].[City], [Filter1].[Region], [Filter1].[PostalCode], [Filter1].[Country], [Filter1].[Phone], [Filter1].[Fax]
            FROM ( SELECT [Extent1].[OrderID] AS [OrderID], [Extent1].[CustomerID] AS [CustomerID1], [Extent1].[EmployeeID] AS [EmployeeID], [Extent1].[OrderDate] AS [OrderDate], [Extent1].[RequiredDate] AS [RequiredDate], [Extent1].[ShippedDate] AS [ShippedDate], [Extent1].[ShipVia] AS [ShipVia], [Extent1].[Freight] AS [Freight], [Extent1].[ShipName] AS [ShipName], [Extent1].[ShipAddress] AS [ShipAddress], [Extent1].[ShipCity] AS [ShipCity], [Extent1].[ShipRegion] AS [ShipRegion], [Extent1].[ShipPostalCode] AS [ShipPostalCode], [Extent1].[ShipCountry] AS [ShipCountry], [Extent2].[CustomerID] AS [CustomerID2], [Extent2].[CompanyName] AS [CompanyName], [Extent2].[ContactName] AS [ContactName], [Extent2].[ContactTitle] AS [ContactTitle], [Extent2].[Address] AS [Address], [Extent2].[City] AS [City], [Extent2].[Region] AS [Region], [Extent2].[PostalCode] AS [PostalCode], [Extent2].[Country] AS [Country], [Extent2].[Phone] AS [Phone], [Extent2].[Fax] AS [Fax], row_number() OVER (ORDER BY [Extent1].[OrderDate] ASC) AS [row_number]
                  FROM  [dbo].[Orders] AS [Extent1]
                  LEFT OUTER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerID] = [Extent2].[CustomerID]
                  WHERE [Extent1].[ShipCity] = 'PARIS'
            )  AS [Filter1]
            WHERE [Filter1].[row_number] > 2
            ORDER BY [Filter1].[OrderDate] ASC ) AS [Limit1]
      LEFT OUTER JOIN [dbo].[Order Details] AS [Extent3] ON [Limit1].[OrderID] = [Extent3].[OrderID]
)  AS [Project1]

ORDER BY [Project1].[OrderDate] ASC, [Project1].[OrderID] ASC, [Project1].[CustomerID1] ASC, [Project1].[C1] ASC

If we comment the Include, the SQL query is more readable Smile :

SELECT TOP (1)
[Filter1].[OrderID] AS [OrderID],
[Filter1].[CustomerID] AS [CustomerID],
[Filter1].[EmployeeID] AS [EmployeeID],
[Filter1].[OrderDate] AS [OrderDate],
[Filter1].[RequiredDate] AS [RequiredDate],
[Filter1].[ShippedDate] AS [ShippedDate],
[Filter1].[ShipVia] AS [ShipVia],
[Filter1].[Freight] AS [Freight],
[Filter1].[ShipName] AS [ShipName],
[Filter1].[ShipAddress] AS [ShipAddress],
[Filter1].[ShipCity] AS [ShipCity],
[Filter1].[ShipRegion] AS [ShipRegion],
[Filter1].[ShipPostalCode] AS [ShipPostalCode],
[Filter1].[ShipCountry] AS [ShipCountry]
FROM ( SELECT [Extent1].[OrderID] AS [OrderID], [Extent1].[CustomerID] AS [CustomerID], [Extent1].[EmployeeID] AS [EmployeeID], [Extent1].[OrderDate] AS [OrderDate], [Extent1].[RequiredDate] AS [RequiredDate], [Extent1].[ShippedDate] AS [ShippedDate], [Extent1].[ShipVia] AS [ShipVia], [Extent1].[Freight] AS [Freight], [Extent1].[ShipName] AS [ShipName], [Extent1].[ShipAddress] AS [ShipAddress], [Extent1].[ShipCity] AS [ShipCity], [Extent1].[ShipRegion] AS [ShipRegion], [Extent1].[ShipPostalCode] AS [ShipPostalCode], [Extent1].[ShipCountry] AS [ShipCountry], row_number() OVER (ORDER BY [Extent1].[OrderDate] ASC) AS [row_number]
      FROM [dbo].[Orders] AS [Extent1]
      WHERE [Extent1].[ShipCity] = 'PARIS'
)  AS [Filter1]
WHERE [Filter1].[row_number] > 2

ORDER BY [Filter1].[OrderDate] ASC

We can find our WHERE City = ‘Paris’, our WHERE row_number > 2 (for the Skip), out ORDER BY OrderDate and our TOP  1 (for the First).

 

Take care, this “sort of” provider LINQ is a POC. It is not finished. For example, it doesn’t support variables or the new operator (for date for example).

 

What happens if we integrate methods non supported by ClientLINQ?

It’s ok! Smile

LINQ To Object will be used.

For example, the following LINQ query:

var customerInfos = (from o in new NorthwindClientContext(service).Orders.Include("Customer")
                     where o.ShipCity == "PARIS"
                     orderby o.OrderDate
                     group o by o.Customer into
                     select new { g.Key.CompanyName, g.Key.ContactName, OrdersCount = g.Count() }).ToList();

will call this method

_service.GetOrders(myQueryable.IncludeValues.ToList(), myQueryable.WhereValue, myQueryable.OrderByValue, myQueryable.SkipValue, myQueryable.TakeValue)

with these parameters:

image

which will generate the following SQL query:

SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent1].[CustomerID] AS [CustomerID],
[Extent1].[EmployeeID] AS [EmployeeID],
[Extent1].[OrderDate] AS [OrderDate],
[Extent1].[RequiredDate] AS [RequiredDate],
[Extent1].[ShippedDate] AS [ShippedDate],
[Extent1].[ShipVia] AS [ShipVia],
[Extent1].[Freight] AS [Freight],
[Extent1].[ShipName] AS [ShipName],
[Extent1].[ShipAddress] AS [ShipAddress],
[Extent1].[ShipCity] AS [ShipCity],
[Extent1].[ShipRegion] AS [ShipRegion],
[Extent1].[ShipPostalCode] AS [ShipPostalCode],
[Extent1].[ShipCountry] AS [ShipCountry],
[Extent2].[CustomerID] AS [CustomerID1],
[Extent2].[CompanyName] AS [CompanyName],
[Extent2].[ContactName] AS [ContactName],
[Extent2].[ContactTitle] AS [ContactTitle],
[Extent2].[Address] AS [Address],
[Extent2].[City] AS [City],
[Extent2].[Region] AS [Region],
[Extent2].[PostalCode] AS [PostalCode],
[Extent2].[Country] AS [Country],
[Extent2].[Phone] AS [Phone],
[Extent2].[Fax] AS [Fax]
FROM  [dbo].[Orders] AS [Extent1]
LEFT OUTER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerID] = [Extent2].[CustomerID]
WHERE [Extent1].[ShipCity] = 'PARIS'
ORDER BY [Extent1].[OrderDate] ASC

As you can see, no trace of group by. However, the result integrates it (with LINQ To Object).

 

In this post, I wanted to present my “sort of'” LINQ provider but the real goal of this post is to persuade you (and I’m sure you are Smile) of the EF / T4 couple productivity gain. Indeed,

  • your templates are reusable in different projects. You just have to update the edmx path and to let Visual Studio regenerate T4 templates code.
  • If you haven't already written the template you need, the dev time is not proportional to the number of entities which implies very quickly a productivity gain compare to a “classic” dev.

4 in 1: EF won’t fire DBAs, Velocity, SQL CLR and SQL Dependency

Imagine that we have a table Cars with a PK Id (nvarchar(8)) based on the following regex pattern [1-9][0-9]{1,2}[A-Z]{2,3}[0-9]{2}. The last two numbers are a region code.

The increasement is done like this: 10AA[region], 11AA[region], …, 999AA[region], 10AB[region], …, 999ZZ[region], 10AAA[region], …, 999ZZZ[region]

This table can be huge.

In a first time, we will write a LINQ To Entities query to get regions with 3 letters:

var q1 = (from c in context.Cars
          where c.Id.Contains("AAA")
          select c.Id.Substring(c.Id.Length - 2)).Distinct();

The SQL generated query is the following:

SELECT
[Distinct1].[C1] AS [C1]
FROM ( SELECT DISTINCT

      SUBSTRING([Extent1].[ Id ], ((LEN([Extent1].[ Id ])) - 2) + 1, (LEN([Extent1].[ Id ])) - ((LEN([Extent1].[ Id ])) - 2)) AS [C1]
      FROM [dbo].[Cars] AS [Extent1]
      WHERE [Extent1].[ Id ] LIKE N'%AAA%'
)  AS [Distinct1]

If I write it myself, I would probably write this:

SELECT DISTINCT
      RIGHT(Id, 2) AS [C1]
      FROM [Cars]
      WHERE [ Id ] LIKE N'%AAA%'

Note that the execution plan is the same!

Imagine that the DBA wants that I use the RIGHT function. With EF4, we can do it.

The Right doesn’t exist in C# but it does in ESQL. So we will use CSDL Functions.

In the CSDL, we will add the following code:

<Function Name="GetRegion" ReturnType="String">
  <Parameter Name="car" Type="Self.Car" />
  <DefiningExpression>
    Right(car.Id, 2)
  </DefiningExpression>
</Function>

Then,we will add an extension method to be able to use it in our LINQ To Entities queries:

public static class CarExtension
{
    [EdmFunction("CarsModel", "GetRegion")] 
    public static string GetRegion(this Car car)
    {
        throw new NotImplementedException("Only used by LINQ To Entities");
    }
}

I now can write my query like this:

var q1 = (from c in context.Cars
          where c.Id.Contains("AAA")
          select c.GetRegion()).Distinct();

This LINQ query is translated as the following SQL query:

SELECT
[Distinct1].[C1] AS [C1]
FROM ( SELECT DISTINCT
      RIGHT([Extent1].[ Id ], 2) AS [C1]
      FROM [dbo].[Cars] AS [Extent1]
      WHERE [Extent1].[ Id ] LIKE N'%AAA%'
)  AS [Distinct1]

Cool!

Now I want to get the last Id per region. I don’t think that there is a clean way to do it with SQL. Indeed, I don’t think that we can use Regex in SQL Server and we need it to determine which is the last id.

So to do this, I will use a LINQ To Object query. The “normal” way can be to write this:

var qL2E = from c in context.Cars
           orderby c.GetRegion()
           select c.Id;
var qL2O = from id in qL2E.AsEnumerable()
           group id by id.Substring(id.Length - 2) into g
           select new
           {
               Region = g.Key,
               Id = (from id in g
                     let letters = Regex.Match(id, "[A-Z]{2,3}")
                     orderby letters.Length descending, letters descending, Regex.Match(id, "^[0-9]{2,3}") descending
                     select id).FirstOrDefault()
           }.ToDictionary(id => id.Region, id => id.Id);

However, with this way, we will probably have an OutOfMemoryException. Indeed, with this query, all the Cars records are loaded in memory and we can have hundreds millions of rows.

In a first time, we will try a dichotomous approach.

var ids = (from c in context.Cars
           select c.GetRegion()).Distinct().AsEnumerable().ToDictionary(region => region, region => GetLastId(region, "", (new[] { "" }.Union(Enumerable.Range('A', 26).Select(i => char.ConvertFromUtf32(i)))), 0));

private static string GetLastId(string region, string letters, IEnumerable<string> possibleChars, int index)
{
    if (!possibleChars.Skip(1).Any()) // Count() == 1
    {
        letters += possibleChars.First();
        if (++index == 3)
        {
            using (var context = new CarsContainer())
            {
                return
                    (from c in context.Cars
                     where c.Id.EndsWith(letters + region)
                     orderby c.Id.IndexOf(letters) descending, c.Id descending
                     select c.Id).FirstOrDefault();
            }
        }
        return GetLastId(region, letters, Enumerable.Range('A', 26).Select(i => char.ConvertFromUtf32(i)), index);
    }
    else
    {
        var possibleCharsList = possibleChars.ToList();
        int middle = possibleCharsList.Count / 2;
        using (var context = new CarsContainer())
        {
            string idBeginning = string.Format("10{0}{1}{2}", letters, possibleCharsList[middle], "AA".Substring(index));
            if ((from c in context.Cars
                 where c.Id.StartsWith(idBeginning) && c.Id.EndsWith(region)
                 select c.Id).Any())
                return GetLastId(region, letters, possibleCharsList.Skip(middle), index);
            return GetLastId(region, letters, possibleCharsList.Take(middle), index);
        }
    }
}

The execution of this query is very slow (26 minutes and 18 seconds in my test with 562 012 347 records).

So what can we do?

The CPU is used by the DB at 100% and the slowly is only because of the DB. So it useless to imagine to parallelize our code.

We can imagine to resize the Data Base server but it is not the goal of this post.

In our case, the DBA (I specify that I am not a DBA) will probably want to change the Cars table schema in order to split into 3 columns the Id. However, in a lot of cases, it can be interesting to keep the Id column, which keeps the PK. Indeed, imagine that we have some existing applications. The idea is to add these columns with no impact on them.

So we will have four columns (all not nullable) :

  • Id
  • Number
  • Letters
  • Region

To set them, we will use CLR functions:

public class CarsFunction
{
    [SqlFunction]
    public static SqlInt16 GetNumber(SqlString id)
    {
        return new SqlInt16(short.Parse(Regex.Match(id.Value, "^[0-9]{2,3}").Value));
   

    [SqlFunction]
    public static SqlString GetLetters(SqlString id)
    {
        return new SqlString(Regex.Match(id.Value, "[A-Z]{2,3}").Value);
   

    [SqlFunction]
    public static SqlInt16 GetRegion(SqlString id)
    {
        return new SqlInt16(short.Parse(Regex.Match(id.Value, "[0-9]{2}$").Value));
    }
}

Then, in the DB, we will register the assembly and we will create some SQL functions from previous ones:

CREATE ASSEMBLY CarsFunctionsAssembly
FROM 'D:\documents\visual studio 2010\Projects\CarsFunctions\CarsFunctions\bin\Debug\CarsFunctions.dll'
GO   

CREATE FUNCTION GetNumber(@id AS nvarchar(8)) RETURNS smallint AS EXTERNAL NAME CarsFunctionsAssembly.CarsFunction.GetNumber
GO

CREATE FUNCTION GetLetters(@id AS nvarchar(8)) RETURNS nvarchar(3) AS EXTERNAL NAME CarsFunctionsAssembly.CarsFunction.GetLetters
GO

CREATE
FUNCTION GetRegion(@id AS nvarchar(8)) RETURNS smallint AS EXTERNAL NAME CarsFunctionsAssembly.CarsFunction.GetRegion

Ok, we have the functions, so we can be back on the new columns creation. Instead of creating “normal” columns, we will use computed columns. We can do it with the SSMS designer by setting the Computed Column Specification Formula to dbo.GetNumber(Id) (resp dbo.GetLetters(Id), dbo.GetRegion(Id)). In our case, the idea isn’t to calculate them each time but to keep their values. So we will set the “Is Persisted” property to true.

But there is an issue: the function has to be deterministic. Our function is. However, because it is a CLR function, SQL Server can’t know it. So we will tell it explicitly.

To do this, we will change the CLR functions code like this:

[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static SqlInt16 GetNumber(SqlString id)
{
    return new SqlInt16(short.Parse(Regex.Match(id.Value, "^[0-9]{2,3}").Value));
}

[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static SqlString GetLetters(SqlString id)
{
    return new SqlString(Regex.Match(id.Value, "[A-Z]{2,3}").Value);
}

[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static SqlInt16 GetRegion(SqlString id)
{
    return new SqlInt16(short.Parse(Regex.Match(id.Value, "[0-9]{2}$").Value));
}

Now, we can persist the three columns.

Be careful, if we create an edmx from the DB, we will have a StoreGeneratedPattern="Computed" on the columns Number, Letters and Region. It implies that in each update, EF, we reload the properties Number, Letters et Region. In our case, the calculation is only done on the key. With Entity Framework, it is not possible to change the value of the key. So, it is useful to change the StoreGeneratedPattern from Computed to Identity.

Cool, we have fixed the INSERT issue.

Let’s now concentrate on the goal: our query.

We don’t need a dichotomous approach, a simple LINQ To Entities query is enough:

var ids = (from c in context.Cars
           group c by c.Region into g
           select new
           {
               Region = g.Key,
               LastId = (from c in g
                         orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                         select c.Id).FirstOrDefault() 
           }).AsEnumerable().ToDictionary(region => region.Region, region => region.LastId);

With this, our query runs in 1 minute and 42 seconds instead of 26 minutes and 18 seconds!

It’s better but it isn’t finished yet.

In the DB, we will set the Cars table cluster on the Region column (instead of the Id column).

In this case, the query execution falls to 12 seconds!

However, be careful! It’s really great in our sample but in the real life, don’t forget to study all the DB query on the table Cars before changing the cluster.

So the DBA role is important in the success of the project.

 

Now, imagine that we want to keep in cache the last id per region. To do it, we can use a static Dictionary. But there are two issues:

  • Imagine a N-Tiers scenario. If our application is deployed on more than one frontal server, it’s a shame to have to initialize the cache for each of them. Moreover, we have to maintain each of them with the last values
  • If another application adds or removes a car in the DB, how to maintain our cache with the last values?

To fix the first issue, we will Velocity. For the second, we will use SQLDependency.

Velocity is a distributed cache which allows a lot of very interesting things like load balancing. This distributed cache will be shared by all the frontal servers. It means that we won”t have to manage one cache per server.

What do we have to do to use Velocity? First point: download it and install it.Smile

When the installation is done, we will add the following references in our project:

  • CacheBaseLibrary.dll
  • CASBase.dll
  • CASMain.dll
  • ClientLibrary.dll
  • FabricCommon.dll

Then, we have to start the Velocity cluster. In this CTP, all administration commands are written with the console “Administration Tool - Microsoft Distributed Cache”. Then we write the following command line:

start-cachecluster

Then, in the config file, we wiil add these lines:

<configSections>
  <
section name="dataCacheClient" type="Microsoft.Data.Caching.DataCacheClientSection, CacheBaseLibrary" allowLocation="true" allowDefinition="Everywhere"/>
</
configSections>
<
dataCacheClient deployment="routing">
  <
localCache isEnabled="true" sync="TTLBased" ttlValue="60000"/>
  <
hosts>
    <
host name="MATTHIEU-PRO" cachePort="22233" cacheHostName="DistributedCacheService"/>
    <
host name="MATTHIEU-LAPTOP1" cachePort="22233" cacheHostName="DistributedCacheService"/>
    <
host name="MATTHIEU-LAPTOP2" cachePort="22233" cacheHostName="DistributedCacheService"/>
    <
host name="MATTHIEU-PC1" cachePort="22233" cacheHostName="DistributedCacheService"/>
    <
host name="MATTHIEU-PC2" cachePort="22233" cacheHostName="DistributedCacheService"/>
  </
hosts>
</
dataCacheClient>

Now we can imagine to query the cache in our application (server tier).

return new DataCacheFactory().GetDefaultCache().GetObjectsInRegion("LastImmatPerRegion").ToDictionary(keyValuePair => short.Parse(keyValuePair.Key), keyValuePair => (string)keyValuePair.Value);

With the data in the cache, the query executes in 30 ms!

Now, we have to fix the last point: initializing the cache and maintaining it.

To initialize it, it’s very easy. We just have to use the above query:

var cache = new DataCacheFactory().GetDefaultCache();

try
{
  
cache.RemoveRegion(LAST_IMMAT_PER_REGION);
}
catch
{
} 
cache.CreateRegion(LAST_IMMAT_PER_REGION, false);
foreach (var car in from c in context.Cars
                    group c by c.Region into g
                    select new
                    {
                        Region = g.Key,
                        LastId = (from c in g
                                  orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                                  select c.Id).FirstOrDefault()
                    })
    cache.Put(car.Region.ToString(), car.LastId, LAST_IMMAT_PER_REGION);

Now to maintain the cache with the last values, it’s harder. My idea is to use a SQLDependency. The issue is the fact that we can’t, a priori, get the modifications with SQLDependency and, of course, I don’t want to refresh all my cache. So my idea is to add a Trigger on the Cars table which will fill a new table (for updates).

To begin, we will create this new table: CarsModificationsTmp with six columns:

  • TmpId (int Identity(1,1), PK),
  • CarId (nvarchar(8), not nullable)
  • Number (smallint, not nullable)
  • Letters (nvarchar(3), not nullable)
  • Region (smallint, not nullable),
  • Deleted (bit, not nullable, default=0)

Then, we have to define triggers on Cars table:

CREATE TRIGGER CarsInserted
ON Cars
FOR INSERT
AS
BEGIN
  DECLARE @Id AS nvarchar(8)
  DECLARE @Number AS smallint
  DECLARE @Letters AS nvarchar(3)
  DECLARE @Region AS smallint
  SELECT @Id = Id, @Number = Number, @Letters = Letters, @REGION = Region FROM Inserted
  INSERT INTO CarsModificationsTmp(CarId, Number, Letters, Region) VALUES(@Id, @Number, @Letters, @Region)
END
GO
 
CREATE TRIGGER CarsDeleted
ON Cars
FOR DELETE
AS
BEGIN
  DECLARE @Region AS smallint
  DECLARE @Id AS nvarchar(8)
  DECLARE @Number AS smallint
  DECLARE @Letters AS nvarchar(3)
  SELECT @Id = Id, @Number = Number, @Letters = Letters, @REGION = Region FROM Deleted
  INSERT INTO CarsModificationsTmp(CarId, Number, Letters, Region, Deleted) VALUES(@Id, @Number, @Letters, @Region, 1)
END

Last, we will modify our code to use SQL Dependency.

private void Load()
{
    using (var context = new CarsContainer())
    {
        _cache = new DataCacheFactory().GetDefaultCache();
        try
        {
            _cache.RemoveRegion(LAST_IMMAT_PER_REGION);
        }
        catch
        {
        }
        _cache.CreateRegion(LAST_IMMAT_PER_REGION, false);
        foreach (var car in from c in context.Cars
                            group c by c.Region into g
                            select new
                            {
                                Region = g.Key,
                                LastId = (from c in g
                                          orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                                          select c.Id).FirstOrDefault()
                            })
            _cache.Put(car.Region.ToString(), car.LastId, LAST_IMMAT_PER_REGION);

 

        _connectionString = ((EntityConnection)context.Connection).StoreConnection.ConnectionString;
        SqlDependency.Stop(_connectionString);
        SqlDependency.Start(_connectionString);
        DefineCarsNotification();
    }
}

 

private
void DefineCarsNotification()
{
    var connection = new SqlConnection(_connectionString);
    var command = connection.CreateCommand();
    command.CommandText = "SELECT TmpId, Region, CarId, Deleted FROM CarsModificationsTmp";
    command.CommandType = CommandType.Text;
    var sqlDependency = new SqlDependency(command);
    sqlDependency.OnChange += SqlDependency_OnChange;
    connection.Open();
    command.ExecuteNonQuery();
    connection.Close();
}

 

private
void SqlDependency_OnChange(object sender, SqlNotificationEventArgs e)
{
    using (var contextModifications = new CarsContainer())
    {
        bool any = false;
        foreach (var cm in contextModifications.CarsModificationsTmps)
        {
            any = true;
            var cacheModifications = new DataCacheFactory().GetDefaultCache();
            var cacheItem = cacheModifications.GetCacheItem(cm.Region.ToString(), LAST_IMMAT_PER_REGION);
            if (cm.Deleted)
            {
                if (cacheItem != null && cacheItem.Value.ToString() == cm.CarId)
                {
                    var newLastId = (from c in contextModifications.Cars
                                     where c.Region == cm.Region
                                     orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                                     select c.Id).FirstOrDefault();
                    if (newLastId == null)
                        _cache.Remove(cm.Region.ToString(), LAST_IMMAT_PER_REGION);
                    else
                        _cache.Put(cm.Region.ToString(), newLastId, LAST_IMMAT_PER_REGION);
                }
            }
            else
            {
                string letters;
                int lettersCompare = 0;
                if (cacheItem == null || (letters = Regex.Match(cacheItem.Value.ToString(), "[A-Z]{2,3}").Value).Length < cm.Letters.Length || (letters.Length == cm.Letters.Length && ((lettersCompare = String.Compare(letters, cm.Letters)) < 0 || lettersCompare == 0 && short.Parse(Regex.Match(cacheItem.Value.ToString(), "^[0-9]{2,3}").Value) < cm.Number)))
                    _cache.Put(cm.Region.ToString(), cm.CarId, LAST_IMMAT_PER_REGION);
            }
            contextModifications.DeleteObject(cm);
        }
        if (any)
            contextModifications.SaveChanges();
    }
    DefineCarsNotification();

}

We have a possible bug with this. Indeed, SqlDependency_OnChange is asynchronous and so we can have some parallels calls which implies some concurrent issues. Velocity DataCache is thread-safe. But, with the previous code, the concurrency doesn’t allow us to be sure we have the last id in the cache. To fix this, we will use a lock.

if (cm.Deleted)
{
    lock (_lockObject)
    {
        if (cacheItem != null && cacheItem.Value.ToString() == cm.CarId)
        {
            var newLastId = (from c in contextModifications.Cars
                             where c.Region == cm.Region
                             orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                             select c.Id).FirstOrDefault();
            if (newLastId == null)
                _cache.Remove(cm.Region.ToString(), LAST_IMMAT_PER_REGION);
            else
                _cache.Put(cm.Region.ToString(), newLastId, LAST_IMMAT_PER_REGION);
        }
    }
}
else
{
    string letters;
    int lettersCompare = 0;
    lock (_lockObject)
    {
        if (cacheItem == null || (letters = Regex.Match(cacheItem.Value.ToString(), "[A-Z]{2,3}").Value).Length < cm.Letters.Length || (letters.Length == cm.Letters.Length && ((lettersCompare = String.Compare(letters, cm.Letters)) < 0 || lettersCompare == 0 && short.Parse(Regex.Match(cacheItem.Value.ToString(), "^[0-9]{2,3}").Value) < cm.Number)))
            _cache.Put(cm.Region.ToString(), cm.CarId, LAST_IMMAT_PER_REGION);
    }
}

We have another thing like this with the SaveChanges. Indeed, with concurrency, we can try to delete an already deleted DataRow which will throw an OptimisticConcurrencyException. My idea is: if the SaveChanges throws an exception, we will detach all the entities which implies the exception (ie: which are already deleted).

if (any)
    for (; ; )
        try
        {
            contextModifications.SaveChanges();
            break;
        }
        catch (OptimisticConcurrencyException ex)
        {
            foreach (var ose in ex.StateEntries)
                contextModifications.Detach(ose.Entity);
        }

Our solution is now thread-safe and is (I hope so Smile) bug free. However, we can improve our code. If we have two update notifications in parallel which aren’t in the same region, it’s a shame to wait that the first ends before treating the second (what we do with our single lock). In fact, my idea is to use a dictionnary of locks instead of a single one. However, the Dictionary class is not thread-safe! We can use another object to lock the Dictionary but with .NET 4, we have a very useful new class: ConcurrentDictionary.

private ConcurrentDictionary<short, object> _concurrentDictionary = new ConcurrentDictionary<short,object>();

Now, we will modify our code to use a lock per region:

object lockRegion = _concurrentDictionary.GetOrAdd(cm.Region, new object());
if (cm.Deleted)
{
    lock (lockRegion)
    {
        if (cacheItem != null && cacheItem.Value.ToString() == cm.CarId)
        {
            var newLastId = (from c in contextModifications.Cars
                             where c.Region == cm.Region
                             orderby c.Letters.Length descending, c.Letters descending, c.Number descending
                             select c.Id).FirstOrDefault();
            if (newLastId == null)
                _cache.Remove(cm.Region.ToString(), LAST_IMMAT_PER_REGION);
            else
                _cache.Put(cm.Region.ToString(), newLastId, LAST_IMMAT_PER_REGION);
        }
    }
}
else
{
    string letters;
    int lettersCompare = 0;
    lock (lockRegion)
    {
        if (cacheItem == null || (letters = Regex.Match(cacheItem.Value.ToString(), "[A-Z]{2,3}").Value).Length < cm.Letters.Length || (letters.Length == cm.Letters.Length && ((lettersCompare = String.Compare(letters, cm.Letters)) < 0 || lettersCompare == 0 && short.Parse(Regex.Match(cacheItem.Value.ToString(), "^[0-9]{2,3}").Value) < cm.Number)))
            _cache.Put(cm.Region.ToString(), cm.CarId, LAST_IMMAT_PER_REGION);
    }

}

It’s now finish. Enjoy Smile

ADO.NET Data Services Client context

One of the great points with EF is the eco-system around it. Even if it can be used without EF, I think that ADO .NET Data Services is one of the technologies in this eco-system. Indeed, using ADO.NET Data Services with EF implies almost no code to write some LINQ queries in the client tier or to save the changes.

However, when we use the generated proxy in the client tier there are some bad points:

  • For example with Northwind, if we add an orderDetail to the order.OrderDetails collection, the orderDetail.Order is still null. In reverse, if we set the orderDetail.Order, the orderDetail isn’t added to the order.OrderDetails collection.
  • Moreover, if we load the orders and then the orderDetails, the OrderDetails collection of the orders is empty and the orderDetails order is null. However, because OrderID is a part of the orderDetails key, we should be able to get automatically the relation as EF do.
    //To have this relation with ADO.NET Data Services, we have to use the Expand method in the query or to use the LoadProperty method.
    With EF4, we can have FK in the model (with EF v1, we have only the FKs which are included in the PK) which is really great because, in theory, it’s now possible to reform the relations.
  • Another point: if you want to add an order with some (new) orderDetails, you need to call the Add for the order, then the Add for all orderDetails (the MS client context ignore the navigation properties) and then you also need to call the methods SetLink and AddLink (because the order key is an Identity). So in fact this is the code you have to write:
context.AddToOrders(o);
foreach (var od in o.OrderDetails)
{
    context.AddToOrderDetails(od);
    context.AddLink(o, "OrderDetails", od);
    context.SetLink(od, "Order", o);
}
  • One more point, the context doesn’t track changes alone. You have to call yourself the UpdateObject method.
  • Finally, if we add an orderDetail to the context and only then the associated order, when we call the SaveChanges, we have an exception because the context doesn’t change the Add order

So my idea was to code another client tier context. But of course, I wanted a generic solution. It means that I wanted to code something usable with every model. To realize it, I used a T4 template based on the edmx (in the server tier). This implies two things:

  • We need to use EF in the server tier
  • We need to have the server tier edmx when we code the client tier

My template fixes all the above issues.

You can download it here.

You also can download all the solution (with unit tests) here and the DB creation script here.

//I admit that some parts of the code aren’t great but ADO.NET Data Services is an extremely “closed” technology so it isn’t so easy and I would probably have some better ideas if I didn’t write it so late in the night Wink