Linq Index (part3 speeding up a linq query)

In this post i will first show some more complex LinqIndex queries. I want to start with some unequal where queries. In this example i want to query all Customers with a number that is bigger than a specific value.

IEnumerable<Customer> customers = Customer.SampleData();
IndexDefinition<Customer> indexPart = customers.CreateIndexKey(c => c.Number).
    AddIndexKey(c => c.CustomerID);
var list = indexPart.Where(c => c.Number > 10).ToList();

When we look at this query we see that generating a hashtable and using it as index is useless here. If we have an equal where query (like c => c.Number == 10) we generate a hashtable and look in this hashtable for the value 10. But if we have the unequal query (like c => c.Number > 10) we have to lookup many values an we have only the starting value (10). To speed up such a query we need a tree as index. To be more specific we need a balanced tree to speed up the query. There are a lot of algorithms to build balanced trees. (avl trees, splay trees, red-black-trees …) I have choosen a avl tree because i implemented one some times ago and so i could use it for LinqIndex. The problem i had was that my implementation was too slow so i had to reimplement it. It is very importand that the insertion speed in the balanced tree is as fast a possible because i am always generating a new avl tree when an index is needed. So here is an example to show the performance with and without LinqIndex when querying on 100010 Customers.

[TestMethod]
public void UnequalIndexSpecification()
{
    IEnumerable<Customer> customers = Customer.SampleData();
    IndexDefinition<Customer> indexPart = customers.CreateIndexKey(c => c.Number);                            
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    var list = customers.Where(c => c.Number > 1000000).ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where for unequal condition without index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = customers.Where(c => c.Number > 1000000).ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where for unequal condition without index second time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(c => c.Number > 1000000).ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where for unequal condition with unequal index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(c => c.Number > 1000000).ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where for unequal condition with unequal index second time " + stopWatch.ElapsedMilliseconds);
}

As we see this test executes the same query (with one million Customers) two times without an Index and then two times with an unequal index. This are the result times:

Where for unequal condition without index first time 231
Where for unequal condition without index second time 224
Where for unequal condition with unequal index first time 14738
Where for unequal condition with unequal index second time 0

So while the execution time of the standard Linq query is constant the execution time with the index drops from 14738 to 0 milliseconds. Here we see how long it takes to generate the avl tree. So it is only useful to use LinqIndex for such queries if it is possible to precreate the index or if the index gets used very often.

So now i have described the use of LinqIndex with simple where queries (equal and unequal compare), Join and GroupJoin. But what is with more complicated queries that uses and (&&)  and/or or (||) expressions. Implimenting such queries and gaining speed from the generated Index is very difficult because when combinating expressions (like c.Number == 10 && c.LastName == “Gottshall”) we have to decide if we use an existing index, generate a new index or don’t use an index. If we can speed up the query by using an index for the first part of the expression (c.Number == 10) that does not mean that it is a good decision to use an index for the second part of the expression (c.LastName == “Gottshall”). So complicated queries that uses LinqIndex are slower than the standard linq queries most of the time. Now i will show one example with a logical and (&&) and one with a logical or (||). So in this first example i will use a where on two properties. For both properties (c.Number and c.LastName) a index is defined.

[TestMethod]
public void EqualIndexSpecification()
{
    IEnumerable<Customer> customers = Customer.SampleData();
    IndexDefinition<Customer> indexPart = customers.CreateIndexKey(c => c.Number)
        .AddIndexKey(c => c.LastName);                
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    var list = customers.Where(i => i.Number > 1000000 && i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where without index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = customers.Where(i => i.Number > 1000000 && i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where without index second time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(i => i.Number > 1000000 && i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where with equal index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(i => i.Number > 1000000 && i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where with equal index second time " + stopWatch.ElapsedMilliseconds);
}

Again i am executing the same query two times without an Index and then two times with an index (one unequal and one equal). This are the result times:

Where without index first time 206
Where without index second time 199
Where with equal index first time 19859
Where with equal index second time 0

Again it was possible to bring the execution time with the index from 19859 to 0 milliseconds. If you execute such a query with LinqIndex it will first build a avl tree search for the Number in it, take the result build a hashtable out of it and search for the LastName in it. If you define no index for the LastName, LinqIndex would build the avl tree, search for Number in it and then it would use a standard linq where to find the LastName. If i use a logical or instead of the logical and the query would look as the following.

[TestMethod]
public void EqualIndexSpecification()
{
    IEnumerable<Customer> customers = Customer.SampleData();
    IndexDefinition<Customer> indexPart = customers.CreateIndexKey(c => c.Number)
        .AddIndexKey(c => c.LastName);
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    var list = customers.Where(i => i.Number > 1000000 || i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where without index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = customers.Where(i => i.Number > 1000000 || i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where without index second time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(i => i.Number > 1000000 || i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where with equal index first time " + stopWatch.ElapsedMilliseconds);
    stopWatch.Reset();
    stopWatch.Start();
    list = indexPart.Where(i => i.Number > 1000000 || i.LastName == "Gottshall").ToList();
    stopWatch.Stop();
    Debug.WriteLine("Where with equal index second time " + stopWatch.ElapsedMilliseconds);
}

If we execute this query query two times without an Index and then two times with an index (one unequal and one equal) the execution times are different.

Where without index first time 337
Where without index second time 337
Where with equal index first time 24102
Where with equal index second time 742

Here LinqIndex is also slower than the standard where query the second time when executing. That happens because when using a logical or i have to find both values in both indexes and then i have to create a hashset to combine the results. That costs more time than traversing all items and check for both conditions. So when building where queries with logical or in it LinqIndex gains no speed and should not be used.

LinqIndex: (part1) (part2) (part3) (part4) (part5)

Advertisements