December 2012 - Posts
In this podcast Maurice de Beijer talks with Marcel Meijer about the current state of Windows Azure. They briefly look back at the first version of Azure and what you could and could not do with it at the time. Most of the podcast covers how Azure has changed and what you can do with it today. Windows Azure is a dynamic platform, so there is a lot of news to report. That means not everything gets covered, and we will record a follow-up podcast soon.
Thanks to our sponsor RedGate.
Some time ago I published a blog post on Tincr and live reloading of CSS/JavaScript in Google Chrome. This works really well with one exception: on Windows 8 it will not install. When you try, Chrome shows the following error message:
This application is not supported on this computer. Installation has been disabled.
The Chromium team has acknowledged this as a bug but it still needs to be fixed.
The interim solution
Fortunately Lauricio Su came up with a nice workaround and posted it in the Tincr discussion group. Basically his solution is to run Chrome in Windows 7 compatibility mode and install Tincr. After that Chrome can be started in the normal way and Tincr will work just fine.
This worked just fine for me with one addition. I had to keep one instance of Chrome running before starting a new instance in compatibility mode. Without that, Chrome would not get a connection to the Internet.
Enjoy!
In this podcast Maurice de Beijer talks with Willem Meints about what exactly you can do with Mono for Android. They also discuss the history of the open source Mono framework.
Thanks to our sponsor RedGate.
As we have seen in previous blog posts, getting data from a RavenDB database is easy. Either use the IDocumentSession.Query<T>() function to load a series of documents or the IDocumentSession.Load<T>() function to load a document using its identity. However, sometimes we want more control over what we load. It turns out this is rather easy as the IDocumentSession.Query<T>() function returns an IQueryable<T>. Actually it returns an IRavenQueryable<T> to be exact, but more about that another time.
Querying the database
As IDocumentSession.Query<T>() returns an IQueryable<T> we can start composing queries just as we can with EntityFramework or another ORM. In my original online example you might have noticed that the books aren't ordered. As with any query this is easy to correct. All we need to do is add an order by clause.
    public ActionResult Index()
    {
        using (var session = MvcApplication.DocumentStore.OpenSession())
        {
            var books = session.Query<Book>()
                .OrderBy(b => b.Title)
                .ToList();

            return View(books);
        }
    }
If you prefer the full LINQ query syntax that is no problem, as it produces exactly the same result:
    public ActionResult Index()
    {
        using (var session = MvcApplication.DocumentStore.OpenSession())
        {
            var books = from book in session.Query<Book>()
                        orderby book.Title
                        select book;

            return View(books.ToList());
        }
    }
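The equivalence of the two syntaxes can be checked without a database. The following is a minimal sketch that runs against an in-memory list instead of a RavenDB session; the Book class, the OrderingSketch class and the sample data are assumptions made up for this illustration and are not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Book document used in the post.
public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
}

public static class OrderingSketch
{
    // Method (fluent) syntax.
    public static List<string> TitlesWithMethodSyntax(IEnumerable<Book> books)
    {
        return books
            .OrderBy(b => b.Title)
            .Select(b => b.Title)
            .ToList();
    }

    // Query syntax; the compiler translates this into OrderBy/Select calls.
    public static List<string> TitlesWithQuerySyntax(IEnumerable<Book> books)
    {
        var query = from book in books
                    orderby book.Title
                    select book.Title;

        return query.ToList();
    }

    // Sample data, invented for this sketch.
    public static List<Book> SampleBooks()
    {
        return new List<Book>
        {
            new Book { Title = "The RavenDB Book", Author = "Sample Author" },
            new Book { Title = "A Sample Title", Author = "Sample Author" },
            new Book { Title = "The Hitchhiker's Guide to the Galaxy", Author = "Douglas Adams" }
        };
    }

    public static void Main()
    {
        var books = SampleBooks();
        var fluent = TitlesWithMethodSyntax(books);
        var query = TitlesWithQuerySyntax(books);

        // Both lists contain the same titles in the same order.
        Console.WriteLine(fluent.SequenceEqual(query));
    }
}
```

Which syntax to use is mostly a matter of taste. The method syntax tends to read better for short chains, the query syntax for queries with several from or group clauses.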
Filtering data
In this case I am only ordering the result set but you can also use other operators if you like. Just want books from an author starting with “A”? No problem, just add the filter and it works. The following action searches for books where the title or author starts with the passed string.
    public ActionResult BooksByAuthor(string author)
    {
        using (var session = MvcApplication.DocumentStore.OpenSession())
        {
            var books = session.Query<Book>()
                .Where(b => b.Author.StartsWith(author) || b.Title.StartsWith(author))
                .OrderBy(b => b.Title)
                .ToList();

            return View("Index", books);
        }
    }
Lucene.Net indexes
If you try this you might notice one big difference from EntityFramework or in-memory LINQ queries. If you search for books starting with a lowercase “the”, both “The Hitchhiker's Guide to the Galaxy” and “The RavenDB Book” are returned, even though the query does nothing to make the search case insensitive. It turns out RavenDB does so by default because it uses Lucene.Net to create its indexes.
As we didn’t create any indexes you might be wondering where this comes from. It turns out that RavenDB always uses an index when you do a query. You can explicitly create them if you want to. However if you do not, as is the case here, RavenDB will try to find a matching index or create a temporary index if a matching one isn’t found. If the same temporary index is used frequently RavenDB will automatically promote it to a permanent index.
That might sound like a bad idea; after all, in SQL Server indexes help with querying but can also slow updates down. In RavenDB this isn’t the case as indexes are updated asynchronously, so we get the query benefits without the update penalties. Of course there is no such thing as a free lunch and in the case of RavenDB this can mean we use a stale index for the query. That might sound bad but isn’t quite as bad as it sounds: the data is never stale, just the index, and we can easily control when we want to wait for a non-stale index.
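For contrast, here is what the same filter does in plain in-memory LINQ. This is a minimal sketch that only illustrates the LINQ to Objects side of the comparison; it does not touch RavenDB or Lucene.Net, and the CaseSensitivitySketch class and its method names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaseSensitivitySketch
{
    // The two titles mentioned in the post.
    public static readonly string[] Titles =
    {
        "The Hitchhiker's Guide to the Galaxy",
        "The RavenDB Book"
    };

    // In memory, string.StartsWith(string) is case sensitive,
    // so a lowercase "the" does not match "The ...".
    public static List<string> DefaultStartsWith(string prefix)
    {
        return Titles.Where(t => t.StartsWith(prefix)).ToList();
    }

    // To get a case-insensitive match in memory you have to ask for it explicitly.
    public static List<string> IgnoreCaseStartsWith(string prefix)
    {
        return Titles
            .Where(t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(DefaultStartsWith("the").Count);    // 0
        Console.WriteLine(IgnoreCaseStartsWith("the").Count); // 2
    }
}
```

So the same Where clause can return different results depending on where it is executed, which is worth keeping in mind when moving a query from in-memory test data to a real database.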
Want to try? I have this example running on Windows Azure here.
Enjoy!
In a previous post I explored various options of hosting RavenDB. While using RavenDB as a Windows Service or hosted in IIS is probably the best way to go in most cases there are a number of cases where the embedded option is great. And one of the places where I really like the embedded option of hosting RavenDB is when doing simple standalone websites. And with the new Azure website option that is a great way to host them.
The code is really straightforward and just like before. I am using the DataDirectory option when creating the EmbeddableDocumentStore here, but that does the same thing as the connection string I used before.
    public static DocumentStore DocumentStore { get; private set; }

    private void CreateRavenDB()
    {
        DocumentStore = new EmbeddableDocumentStore()
        {
            DataDirectory = @"~\App_Data\db"
        };
        DocumentStore.Initialize();
    }
RavenDB Configuration
The example code from my previous blog posts will just work except for one gotcha. By default RavenDB will search for a port to use by itself. However that means scanning the available ports and that isn’t permitted by default. The result is an exception:
System.Net.NetworkInformation.NetworkInformationException: Access is denied
With a stack trace like this:
[NetworkInformationException (0x5): Access is denied]
System.Net.NetworkInformation.SystemIPGlobalProperties.GetAllTcpConnections() +1570717
System.Net.NetworkInformation.SystemIPGlobalProperties.GetActiveTcpListeners() +74
Raven.Database.Util.PortUtil.FindPort() +79
Raven.Database.Util.PortUtil.GetPort(String portStr) +133
Raven.Database.Config.InMemoryRavenConfiguration.Initialize() +4355
Raven.Database.Config.RavenConfiguration.LoadConfigurationAndInitialize(IEnumerable`1 values) +307
Raven.Database.Config.RavenConfiguration..ctor() +196
Raven.Client.Embedded.EmbeddableDocumentStore.get_Configuration() +87
Raven.Client.Embedded.EmbeddableDocumentStore.set_DataDirectory(String value) +40
MvcApplication3.MvcApplication.CreateRavenDB() +57
MvcApplication3.MvcApplication.Application_Start() +169
The fix is simple. Instead of letting RavenDB search for a port by itself just specify one in the web.config. The setting required is "Raven/Port".
    <configuration>
      <appSettings>
        <add key="Raven/Port"
             value="9090"/>
        ....
      </appSettings>
      ....
That is all there is to it :-)
Want to see for yourself?
I have a simple page using RavenDB as the backend store hosted on Azure right here.
Enjoy!
In the previous blog posts about RavenDB I used Raven.Server.exe to create a database server. Just running Raven.Server.exe and connecting to it is fine for development but it is certainly not our only option.
Running RavenDB as a Windows Service
Once you have the RavenDB.Server NuGet package you can install it as a Windows Service. This means it is always available and running whenever you need it. Certainly a nice and simple way of using RavenDB if you do so more often.
Installing is easy, just run Raven.Server.exe /install and it installs.
Running as an IIS application
Another option for hosting RavenDB is creating an IIS application and letting IIS take care of things. There is a little bit more work to it and I consider it more of a deployment option, but it's a great way of hosting RavenDB in production.
Running RavenDB in embedded mode
Yet one more option is running RavenDB in embedded mode. This is a great option if you don’t want to, or can’t, deploy an extra IIS application. A number of my websites run on a budget hosting infrastructure that makes it harder to deploy extra IIS applications. Deploying to Azure Web Sites is another of those places where embedded mode is really useful.
As the client also contains the server the setup and capabilities are a little different. The most obvious difference is using the RavenDB.Embedded NuGet package and the EmbeddableDocumentStore class. The initialization code in the Global.asax now looks like this:
    public static DocumentStore DocumentStore { get; private set; }

    private void CreateRavenDB()
    {
        DocumentStore = new EmbeddableDocumentStore()
        {
            ConnectionStringName = "ravenDB"
        };
        DocumentStore.Initialize();
    }
And in the web.config we need the following configuration:
    <connectionStrings>
      <add name="ravenDB"
           connectionString="DataDir = ~\App_Data\RavenDB"/>
    </connectionStrings>
In this case the RavenDB folder inside the special App_Data folder is where all data is stored.
One thing to keep in mind is that this package uses the original 4.0.8 version of Newtonsoft.Json making it difficult to use with some of the newer releases of the ASP.NET team that ship with Newtonsoft.Json version 4.5.6. The upcoming RavenDB 2.0 should take care of that.
Conclusion
There are quite a few options when it comes to hosting RavenDB. Just choose whatever works best for your situation. In most business applications I would opt for hosting as an IIS application if possible and otherwise as a Windows Service. When running on a budget web hosting environment or in other special cases the embedded option is really nice.
Enjoy!
Quite a few people seem to be intimidated by the concept of Map-Reduce. As it turns out Map-Reduce is actually quite simple and straightforward when you get to understand the basic principle.
Basic principle
The basic Map-Reduce consists of two steps. I guess you are not going to be very surprised when I tell you that these steps are called Map and Reduce.
The Map process gets the raw data as input and discards anything in the data we are not interested in. Basically each input has a corresponding output so we end up with the same number of simplified data elements. Simple, right?
The Reduce process takes the output of the Map process and combines/discards data to produce the result. Again pretty simple, right? There is one catch with the Reduce though. It should output exactly the same data structure as it took as its input. The point is that you can run the same Reduce process multiple times if you would like to. I will come back to why you might want to do so.
A simple example
The following code loads a few orders and does a Map-Reduce job on them.
    private static void MapReduce()
    {
        var orders = LoadOrders();
        Print(orders);

        var mapped = Map(orders);
        Print(mapped);

        var reduced = Reduce(mapped);
        Print(reduced);
    }
The Orders data structure looks like this:
    public class Order
    {
        public Order()
        {
            Lines = new List<OrderLine>();
        }

        public int CustomerId { get; set; }
        public List<OrderLine> Lines { get; set; }
    }

    public class OrderLine
    {
        public string Book { get; set; }
        public int Quantity { get; set; }
        public decimal Price { get; set; }
    }
As you can see we have an order with a customer and a number of order lines. Each order line contains the book's title, the quantity and the amount it was sold for. During the Map process we just extract the data we are interested in. In this case that is the book's title and the price it was sold for. We store that in the following structure:
    public class ReducedTo
    {
        public string Book { get; set; }
        public decimal Amount { get; set; }
    }
The Map Process
So the Map function takes a collection of Orders as input and returns a collection of ReducedTo items.
    private static IEnumerable<ReducedTo> Map(IEnumerable<Order> orders)
    {
        var result = new List<ReducedTo>();

        foreach (var order in orders)
        {
            foreach (var line in order.Lines)
            {
                result.Add(new ReducedTo()
                {
                    Book = line.Book,
                    Amount = line.Price
                });
            }
        }

        return result;
    }
In fact you could easily rewrite this into a LINQ query like this:
    private static IEnumerable<ReducedTo> Map(IEnumerable<Order> orders)
    {
        return from order in orders
               from line in order.Lines
               select new ReducedTo
               {
                   Book = line.Book,
                   Amount = line.Price
               };
    }
And in fact that is all there is to a Map process, it is just a LINQ select clause. And when you look at it that way it becomes a lot simpler and easier to understand :-)
The Reduce process
With the Map done we can focus on the Reduce part. The important part is that the Reduce uses the result type of the Map function as both its input and its output. As long as we take this into account the function is quite simple to write.
    private static IEnumerable<ReducedTo> Reduce(IEnumerable<ReducedTo> input)
    {
        var result = new List<ReducedTo>();

        foreach (var reduced in input)
        {
            var existing = result.FirstOrDefault(r => r.Book == reduced.Book);
            if (existing != null)
            {
                existing.Amount += reduced.Amount;
            }
            else
            {
                // Add a copy so that summing does not modify the items in the input.
                result.Add(new ReducedTo()
                {
                    Book = reduced.Book,
                    Amount = reduced.Amount
                });
            }
        }

        return result;
    }
Pretty simple and if we think about it it is really just a grouping. So just as with the Map process we could easily rewrite this as a LINQ group by query like this:
    private static IEnumerable<ReducedTo> Reduce(IEnumerable<ReducedTo> input)
    {
        return from sale in input
               group sale by sale.Book into grp
               select new ReducedTo()
               {
                   Book = grp.Key,
                   Amount = grp.Sum(item => item.Amount)
               };
    }
So where the Map process is just a LINQ select, the Reduce process is just a LINQ group by with the additional condition that the input type is also the output type. Makes things a lot simpler, right :-)
Running the sample code produces output like this.
Advantages of Map-Reduce
We have seen that Map-Reduce is just another way of doing a LINQ group by query, so what is the big deal? One of the biggest advantages of Map-Reduce is that you can parallelize it. You can do so over multiple threads inside a single process or, if you want to scale out, over many machines. The key to this is being able to repeat the Reduce step multiple times.
Suppose we want to analyze a lot of sales data and compute the sales amount per article. We could partition the data into as many partitions as we want. Let's assume that we split the sales orders into three partitions. For each group of orders we map the sales into a collection of article and sales amount pairs, where each article can be repeated multiple times. After this is done the Reduce step groups the data so we have a single entry for each article with the total sales amount for that partition. When that has been done for each partition we combine the three reduced sets into a single set. This is much like the set after the first Map process and contains duplicate entries for each article, so all we need to do is run the same Reduce process over this much smaller set of data to get the final result. So in reality it is just divide and conquer, where we divide the original data into as many partitions as we want to. In other words, a really easy system for scale out scenarios :-)
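The claim that reducing per partition and then reducing the combined results gives the same answer as one big Reduce can be checked with a small standalone sketch. The PartitionSketch class, the round-robin partitioning and the sample amounts are assumptions made up for this illustration. The Reduce here also sorts by book title, which the Reduce in this post does not do, purely so the two results can be compared item by item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReducedTo
{
    public string Book { get; set; }
    public decimal Amount { get; set; }
}

public static class PartitionSketch
{
    // Group by book and sum; input type equals output type so it can be repeated.
    public static List<ReducedTo> Reduce(IEnumerable<ReducedTo> input)
    {
        var query = from sale in input
                    group sale by sale.Book into grp
                    orderby grp.Key // only here to make results easy to compare
                    select new ReducedTo { Book = grp.Key, Amount = grp.Sum(s => s.Amount) };

        return query.ToList();
    }

    // Split the mapped data round-robin, reduce each partition,
    // then run the same Reduce over the combined partial results.
    public static List<ReducedTo> ReduceInPartitions(IEnumerable<ReducedTo> mapped, int partitionCount)
    {
        var partials = mapped
            .Select((sale, index) => new { sale, index })
            .GroupBy(x => x.index % partitionCount, x => x.sale)
            .Select(partition => Reduce(partition))
            .ToList();

        return Reduce(partials.SelectMany(p => p));
    }

    // Invented sample of already mapped sales.
    public static List<ReducedTo> MappedSales()
    {
        return new List<ReducedTo>
        {
            new ReducedTo { Book = "Book A", Amount = 10m },
            new ReducedTo { Book = "Book B", Amount = 20m },
            new ReducedTo { Book = "Book A", Amount = 5m },
            new ReducedTo { Book = "Book C", Amount = 7.5m },
            new ReducedTo { Book = "Book B", Amount = 1m },
            new ReducedTo { Book = "Book A", Amount = 2m }
        };
    }

    public static void Main()
    {
        var direct = Reduce(MappedSales());
        var partitioned = ReduceInPartitions(MappedSales(), 3);

        for (var i = 0; i < direct.Count; i++)
        {
            Console.WriteLine("{0}: {1} / {2}", direct[i].Book, direct[i].Amount, partitioned[i].Amount);
        }
    }
}
```

This works because summing is associative: it does not matter in which order, or in how many steps, the amounts for a book are added together. A Reduce that computes something like an average directly would not survive being repeated, which is why the rule about matching input and output types matters.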
Parallelizing the work is quite easy to do with the same Map and Reduce functions we used before. The code below loads 25 different sets of orders and runs the Map and Reduce jobs on each of them. Once that is done it combines the different output sets into one new input and runs another Reduce over that to get the end result.
    private static void ParallelMapReduce()
    {
        var tasks = new List<Task<IEnumerable<ReducedTo>>>();
        for (int i = 0; i < 25; i++)
        {
            var task = Task<IEnumerable<ReducedTo>>.Factory.StartNew(() =>
            {
                var orders = LoadOrders();
                var mapped = Map(orders);
                var reduced = Reduce(mapped);
                return reduced;
            });
            tasks.Add(task);
        }

        Task.WaitAll(tasks.ToArray());

        foreach (var task in tasks)
        {
            Print(task.Result);
        }

        var result = Reduce(tasks.Select(t => t.Result));
        Print(result);
    }
To combine the different results I created a simple overload of the Reduce function that just combines multiple inputs and calls the original Reduce function like this:
    private static IEnumerable<ReducedTo> Reduce(IEnumerable<IEnumerable<ReducedTo>> args)
    {
        var input = new List<ReducedTo>();

        foreach (var item in args)
        {
            input.AddRange(item);
        }

        return Reduce(input);
    }
Running this will produce output like below. The white Map/Reduce output shows the different reduced sets of output from the 25 tasks and the yellow output is the second Reduce, done with the output from the first 25 Reduce calls.
Conclusion
Map-Reduce is just another way of doing a LINQ group by, but one that is very scalable because of a few simple rules. Once you understand that, there is no longer a reason to be intimidated by Map-Reduce and you can start embracing its power.
Enjoy!