• A little of RavenDB

    Published by on February 1st, 2013 6:44 pm under c#, DB, Distributed databases, MapReduce, NoSQL, ORM, RavenDB

    Hi, a few days ago I published an article showing how easily you can use MongoDB from C#. In this article I plan to show mostly the same operations and features, but using RavenDB. As this database is written in C#, it comes with a native C# client, which makes integration easier.

    Although both databases have very similar features, RavenDB also supports ACID transactions under the unit of work pattern. For the full set of features each database supports, and a comparison between them, I suggest you visit their respective home sites.

    So, in order to try some of its features, I’ve created a console application using RavenDB (after installing the server, the client driver is very simple to add to your solution – just using NuGet).

    First of all you need to get a document store instance and configure it before using sessions. After you have downloaded the client NuGet package, you can do it like this:

    var documentStore = new DocumentStore { Url = "http://localhost:8080/" };
    documentStore.DefaultDatabase = "mauro";
    documentStore.Conventions.FindTypeTagName = (t) => t.Equals(typeof(City)) ? "Cities" : string.Format("{0}s", t.Name);
    documentStore.Initialize();
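
    The FindTypeTagName convention above is just a function from a type to a collection name. Here is a minimal sketch of that same lambda in isolation, so you can see what it returns without a server. City and Book are empty stand-in classes and TypeTagNamingSketch is a name I made up for the example:

```csharp
using System;

// Empty stand-ins for the domain classes; only their type names matter here.
public class City { }
public class Book { }

public static class TypeTagNamingSketch
{
    // Same lambda shape as the FindTypeTagName convention above:
    // an irregular plural for City, a naive "+s" plural for everything else.
    public static readonly Func<Type, string> FindTypeTagName =
        t => t.Equals(typeof(City)) ? "Cities" : string.Format("{0}s", t.Name);
}
```

    Calling TypeTagNamingSketch.FindTypeTagName(typeof(Book)) gives "Books", while typeof(City) gives "Cities" instead of the wrong "Citys".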
    

    Now you can, for example, add a book. You have to do it within a session, which tracks all the changes made to your objects, as an ORM does, and commits or discards them as a whole:

    using (var session = documentStore.OpenSession())
    {
     session.Store(new Book
     {
       Title = "For Whom the Bell Tolls",
       Authors = new string[] { "Ernest Hemingway" },
       Price = 15.64M
     });
     session.SaveChanges();
    }
    

    Where your Book class is defined as:

    public class Book
    {
     public string Title { get; set; }
     public string[] Authors { get; set; }
     public decimal Price { get; set; }
    }
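
    If the unit of work idea is new to you, the following in-memory sketch may help. It is NOT how RavenDB implements its session: InMemorySessionSketch is a hypothetical class that only mimics the behavior described above, where Store() merely tracks an object and nothing reaches the "database" until SaveChanges() is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a document session; not RavenDB's implementation.
public sealed class InMemorySessionSketch : IDisposable
{
    private readonly List<object> committed;                    // plays the role of the database
    private readonly List<object> pending = new List<object>(); // tracked but not yet saved

    public InMemorySessionSketch(List<object> database)
    {
        committed = database;
    }

    // Store only registers the object with the session.
    public void Store(object entity)
    {
        pending.Add(entity);
    }

    // SaveChanges flushes every tracked object in one step.
    public void SaveChanges()
    {
        committed.AddRange(pending);
        pending.Clear();
    }

    // Disposing without calling SaveChanges discards whatever was tracked.
    public void Dispose()
    {
        pending.Clear();
    }
}
```

    Leaving the using block without calling SaveChanges() leaves the list passed to the constructor untouched, which is the "discard the whole thing" half of the pattern.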
    

    How can we iterate over our whole book collection? Well, RavenDB is ‘Safe by Default‘, meaning, among other things, that you can NOT do a ‘SELECT * from Books‘. You will need to paginate through your collection or take another approach for custom queries and/or aggregation operations.

    This is why RavenDB uses indexes for everything. And what is an index in this DB? Well, in the simplest terms an index here is a MapReduce operation. If you don’t have an existing index (a static index, in RavenDB terminology) for a specific query, the server will create a temporary one for you when it needs to resolve that query (I recommend reading more about this in the documentation on the home site, as indexes are at the core of how queries work).

    So, how can we define a static index to resolve a specific query? This is pretty simple from the C# client using LINQ. For example, you can define this index to query all the books in the book collection (always keeping the ‘safe’ behavior in mind):
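
    Paginating usually comes down to a Skip/Take loop. The sketch below shows that loop on its own. PagingSketch, ReadAll and FromList are names I made up, and fetchPage is a hypothetical callback: with a real session it would wrap something like session.Query<Book>().Skip(skip).Take(take).ToList(), while here an in-memory list plays the server so the code runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Reads a whole collection page by page through the fetchPage(skip, take) callback.
    public static List<T> ReadAll<T>(Func<int, int, IList<T>> fetchPage, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        var all = new List<T>();
        var skip = 0;
        while (true)
        {
            var page = fetchPage(skip, pageSize);
            all.AddRange(page);
            if (page.Count < pageSize) break; // a short page means we reached the end
            skip += pageSize;
        }
        return all;
    }

    // In-memory stand-in for the server, so the sketch runs without a database.
    public static Func<int, int, IList<T>> FromList<T>(IList<T> source)
    {
        return (skip, take) => source.Skip(skip).Take(take).ToList();
    }
}
```

    For example, PagingSketch.ReadAll(PagingSketch.FromList<int>(numbers), 3) walks a seven-item list in three round trips. Keep in mind that, as far as I know, a RavenDB session also limits how many requests it will issue, so a loop like this is meant for modest collections and not as a way around the ‘safe’ behavior.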

    using (var session = documentStore.OpenSession())
    {
     if (session.Advanced.DocumentStore.DatabaseCommands.GetIndex("allBooks") == null)
     {
       session.Advanced.DocumentStore.DatabaseCommands.PutIndex("allBooks", new IndexDefinitionBuilder<Book>
       {
          Map = documents => documents.Select(entity => new { })
       });
     }
    }
    

    You can either create an IndexDefinition with your map/reduce functions written as strings, or use the IndexDefinitionBuilder helper (as here), which makes things easier because you get code completion while writing the LINQ queries and have all your domain objects at hand. As we return an empty object for each book, we are creating a kind of ‘mock index’ over the whole collection.

    Tip: you can use this index to delete the whole book collection with the DeleteByIndex() command, as shown next:

    session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex("allBooks", new IndexQuery());
    

    You can create any index you want this way, using the PutIndex() command shown above and the IndexDefinitionBuilder class. But what is the best practice for creating an index?

    Well, another way to define and create a static index within the DB is to inherit from the AbstractIndexCreationTask<TDocument> class and set the index properties and behavior in the constructor of the new class. Let’s use this approach to create an index on the book’s title:

    public class BooksByTitleIndex : AbstractIndexCreationTask<Book>
    {
     public BooksByTitleIndex()
     {
       this.Map = books => books.Select(book => new { Title = book.Title });
    
       // analyze the title so it is available to search operations
       this.Indexes.Add(x => x.Title, FieldIndexing.Analyzed);
     }
    }
    

    Besides being the recommended approach for creating static indexes, this one is also convenient when you use them later in queries, where you can either tell the engine explicitly which index to use or leave the choice to the query resolver. The following code has two queries: the first does not specify which index to use and the second does. In this case both queries will use the same index anyway: ‘BooksByTitleIndex‘.

    Why? Because the resolver sees that you are using the Title property of your class and that there is an index for it.

    var book1 = session.Query<Book>()
          .Where(b => b.Title.Equals("Seven Databases in Seven Weeks"))
          .FirstOrDefault();
    
    var book2 = session.Query<Book, BooksByTitleIndex>()
          .Where(b => b.Title.Equals("Seven Databases in Seven Weeks"))
          .FirstOrDefault();
    

    But how do we create the index in the DB before executing the queries? Well, you can simply instantiate your index and call its Execute() method, or use a static index creation helper class that takes all the types that inherit from AbstractIndexCreationTask and creates the indexes for you if they don’t exist (doing this at initialization time is recommended):

    // create from index classes
    IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);
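
    To give an idea of what "takes all the types in an assembly" can look like, here is a small reflection sketch. I am not claiming this is how IndexCreation.CreateIndexes is implemented; IndexTaskSketch and the other types are hypothetical stand-ins, and the sketch only finds and instantiates the subclasses where the real helper would go on to create the indexes on the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical base class standing in for AbstractIndexCreationTask.
public abstract class IndexTaskSketch
{
    public abstract string IndexName { get; }
}

public class BooksByTitleSketch : IndexTaskSketch
{
    public override string IndexName { get { return "BooksByTitle"; } }
}

public class SumBookPricesSketch : IndexTaskSketch
{
    public override string IndexName { get { return "SumBookPrices"; } }
}

public static class IndexDiscoverySketch
{
    // Finds every concrete subclass that has a public parameterless constructor
    // and instantiates it.
    public static List<IndexTaskSketch> Discover(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => !t.IsAbstract
                        && typeof(IndexTaskSketch).IsAssignableFrom(t)
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IndexTaskSketch)Activator.CreateInstance(t))
            .ToList();
    }
}
```

    Calling IndexDiscoverySketch.Discover(typeof(IndexTaskSketch).Assembly) returns one instance per index class, which is why adding a new index class to the project is enough for it to be picked up at startup.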
    

    Now let’s say that you want to sum the prices of all the books in your collection. That is something you can do very easily with a couple of lines of C# and LINQ, working around the ‘safe’ behavior by retrieving pages of books (or by changing the ‘safe’ behavior settings, although this is NOT recommended!).

    The problem with this? You transfer all the data to the client and have to page through every book in your collection just to compute one number. The cost will be prohibitive if the collection is huge, and you will be working against the ‘safe’ behavior of the DB, which is certainly a very bad design decision.

    So let’s create a new index for this using the MapReduce features RavenDB offers. Again, we are going to define a class for the new index, both to follow best practices and to simplify our code later:

    public class SumBookPricesIndex: AbstractIndexCreationTask<Book, SumBookPricesIndex.ReduceResult>
    {
     public class ReduceResult
     {
       public string SetOfBooks { get; set; }
       public decimal SumOfPrices { get; set; }
     }
    
     public SumBookPricesIndex()
     {
       this.Map = books => books.Select(book => new ReduceResult { SetOfBooks = "all", SumOfPrices = book.Price });
    
       this.Reduce = results => results
          .GroupBy(result => result.SetOfBooks)
          .Select(group => new ReduceResult { SetOfBooks = group.Key, SumOfPrices = group.Sum(groupCount => groupCount.SumOfPrices) });
     }
    }
    

    Notice how a new ReduceResult class is defined inside the index class. This makes things easier later on, and it is also needed because the MapReduce engine can call reduce with either mapped or already reduced items, so your logic has to handle both kinds (sharing one type between the map output and the reduce output is the simplest way). Refer to the documentation for more information.

    How do you get the sum of all the book prices using MapReduce then? Once the index shown above has been created, it is as simple as this:
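
    The "reduce may receive already reduced items" point is easier to see with plain LINQ to Objects. The sketch below assumes nothing about the server: PriceTotalSketch and ReReduceSketch are made-up names that mirror the index above, and the code reduces two batches separately and then reduces the partial results again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PriceTotalSketch
{
    public string SetOfBooks { get; set; }
    public decimal SumOfPrices { get; set; }
}

public static class ReReduceSketch
{
    // Map: one entry per book price, already in the reduce output shape.
    public static IEnumerable<PriceTotalSketch> Map(IEnumerable<decimal> prices)
    {
        return prices.Select(p => new PriceTotalSketch { SetOfBooks = "all", SumOfPrices = p });
    }

    // Reduce: input and output share a type, so it can consume its own output.
    public static IEnumerable<PriceTotalSketch> Reduce(IEnumerable<PriceTotalSketch> results)
    {
        return results
            .GroupBy(r => r.SetOfBooks)
            .Select(g => new PriceTotalSketch { SetOfBooks = g.Key, SumOfPrices = g.Sum(r => r.SumOfPrices) });
    }

    // Reduces two non-empty batches separately, then reduces the partial results again.
    public static decimal TotalInTwoSteps(IEnumerable<decimal> batch1, IEnumerable<decimal> batch2)
    {
        var partials = Reduce(Map(batch1)).Concat(Reduce(Map(batch2)));
        return Reduce(partials).Single().SumOfPrices;
    }
}
```

    Reducing 15.64 and 10 in one batch and 4.36 in another gives a total of 30.00, the same as summing the three prices at once. That only works because Reduce returns the same shape it accepts.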

    var sumOfBooksPrices = session
       .Query<SumBookPricesIndex.ReduceResult, SumBookPricesIndex>()
       .FirstOrDefault();
    

    Here we are also using the ReduceResult class to tell the query the type of the returned objects.

    What if, for example, we want to group the authors in our DB and count their books? The driver does not support the GroupBy() operation (at least not through the Query() method exposed by the LINQ interface, although it is supported by the advanced LuceneQuery() method)… So? Just create another index!

    public class GroupBookAuthorsIndex : AbstractIndexCreationTask<Book, GroupBookAuthorsIndex.ReduceResult>
    {
     public class ReduceResult
     {
       public string Author { get; set; }
       public int NumberOfBooks { get; set; }
     }
    
     public GroupBookAuthorsIndex()
     {
       // be careful with LINQ queries here: SelectMany(authors, new) is not the same as SelectMany(authors).Select(new)
       this.Map = books => books
          .SelectMany(book => book.Authors, (book, author) => new ReduceResult { Author = author, NumberOfBooks = 1 });
    
       this.Reduce = results => results
          .GroupBy(result => result.Author)
          .Select(group => new ReduceResult { Author = group.Key, NumberOfBooks = group.Sum(groupCount => groupCount.NumberOfBooks) });
     }
    }
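
    To see what this index computes, you can run the same map and reduce over an in-memory list with LINQ to Objects. This is only a sketch to check the logic without a server; BookSketch, AuthorCountSketch and AuthorCountingSketch are names invented for the example, and the final ordering is there just to make the result easy to read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BookSketch
{
    public string Title { get; set; }
    public string[] Authors { get; set; }
}

public class AuthorCountSketch
{
    public string Author { get; set; }
    public int NumberOfBooks { get; set; }
}

public static class AuthorCountingSketch
{
    public static List<AuthorCountSketch> CountBooksPerAuthor(IEnumerable<BookSketch> books)
    {
        // Map: SelectMany with a result selector emits one entry per (book, author) pair.
        var mapped = books.SelectMany(
            book => book.Authors,
            (book, author) => new AuthorCountSketch { Author = author, NumberOfBooks = 1 });

        // Reduce: group the entries by author and add up the counts.
        return mapped
            .GroupBy(r => r.Author)
            .Select(g => new AuthorCountSketch { Author = g.Key, NumberOfBooks = g.Sum(r => r.NumberOfBooks) })
            .OrderBy(r => r.Author, StringComparer.Ordinal)
            .ToList();
    }
}
```

    A book with two authors produces two mapped entries, so each author gets credit for it once the groups are summed.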
    

    To finish this article I’ll show you a cool thing you can do by using RavenDB and taking advantage of the Lucene engine it relies on (in this particular case Lucene.NET): the “Did you mean?” feature (suggestions).

    Let’s say that you want to look for a book title but only remember an approximate word, or you made a typo when entering your search:

    var query = session.Query<Book, BooksByTitleIndex>().Where(book => book.Title.Equals("sells"));
    var searchedBook = query.FirstOrDefault();
    

    And you have the following titles in your DB:

    • For Whom the Bell Tolls
    • Cat’s Cradle
    • Slaughterhouse-Five
    • Seven Databases in Seven Weeks

    What will be the result (searchedBook variable)? Well, as there is no match for ‘sells‘ the result will be null.

    At this point you can show suggestions to the user (using several comparison algorithms, the user’s input and your index’s tokens [BooksByTitleIndex]) calling the Suggest() method from the LinqExtensions class:

    if (searchedBook == null)
    {
       Console.WriteLine("Did you mean:");
       foreach (var suggestion in query.Suggest(new SuggestionQuery { Accuracy = 0.4f, MaxSuggestions = 5 }).Suggestions)
       {
          Console.WriteLine("\t. {0}?", suggestion);
       }
    }
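
    If you are curious about how a suggestion like this can be produced, here is a rough in-memory sketch based on edit distance. It is an assumption-laden toy, not what RavenDB or Lucene.NET actually run: SuggestionSketch, its similarity formula and its letter-only tokenizer are all my own simplifications, and the real engine has several algorithms to choose from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuggestionSketch
{
    // Classic dynamic-programming Levenshtein (edit) distance.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            var current = new int[b.Length + 1];
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                var insertion = current[j - 1] + 1;
                var deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            previous = current;
        }
        return previous[b.Length];
    }

    // Similarity in [0, 1]: 1 means identical, 0 means every position had to change.
    public static double Similarity(string a, string b)
    {
        var longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)EditDistance(a, b) / longest;
    }

    // Lowercases the titles, splits them into letter-only tokens, and keeps the
    // tokens whose similarity to the term reaches the requested accuracy.
    public static List<string> Suggest(IEnumerable<string> titles, string term, double accuracy, int maxSuggestions)
    {
        return titles
            .SelectMany(t => new string(t.ToLowerInvariant().Select(c => char.IsLetter(c) ? c : ' ').ToArray())
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Distinct()
            .Select(token => new { Token = token, Score = Similarity(term, token) })
            .Where(x => x.Score >= accuracy)
            .OrderByDescending(x => x.Score)
            .Take(maxSuggestions)
            .Select(x => x.Token)
            .ToList();
    }
}
```

    With the four titles above, the term "sells" and an accuracy of 0.5, this sketch suggests "bell" and "tolls", each two edits away from the input. Lowering the accuracy lets more distant tokens through.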
    

    The output will be something like this:

    Did you mean:

    • tolls?
    • bell?
    • seven?

    You can find the sample code here. Anyway, this is just the tip of the iceberg: RavenDB offers many features not used in this article (document relationships, polymorphic indexes, transformation of results, lazy operations, and many more, not all of them related to querying). I recommend you download the DB and give it a try.
