building a simple spam filter with naive bayes classifier and c#

Have you also often wondered how a filter works? I have and last month I got a book in my fingers that deals with machine learning. In that book there is a chapter about naive bayer classifier, and that naive bayer classifiers are used to implement spam filtering. All programming examples in that book are in python and I do not speak python, so i calculated the naive bayer classifier myself, wrote a c# program and put it on codeplex (naivebayes.codeplex.com). I made the application only do demonstrate the principle of spam filtering, it is not for practical use. To understand how naive bayes classifiers work the best example is the cookie problem. Lets assume we have two bowls with cookies in it.

Bowl 1: 5 chocolate cookies and 35 vanilla cookies

Bowl 2: 20 chocolate cookies and 20 vanilla cookies

Now we take one cookie (it is a vanilla cookie) and we have 50 percent chance that we take it from bowl 1 or from bowl2. Now we have two hypothesis:

1. It is from bowl 1

H1 = 1/2

2. It is from bowl 2

H2 = 1/2

That means that we have a 50 to 50 chance to pick from bowl 1 or from bowl 2

P(h1) = P(h2) = 1/2

After defining our hypothesis and taking one vanilla cookie, now we have one evidence, and we can calculate the probability that we took it from bowl 1 or bowl 2. So now we can define the hypothesis under the evidence that we took a vanilla cookie.

P(E/h1) = 35/40 (because we took one cookie that is vanilla the chance is 35/40)
P(E/h2) = 20/40 = 1/2 (because we took one cookie that is vanilla but the chance is 1/2)

P(E/h1) is 35/40 because we have 35 vanilla cookies out of 40
P(E/h2) is 1/2 because we have 20 vanilla cookies out of 40
At this point we need bayes theorem:

P(h1/E) = P(h1) * P(E/h1) / P(E)

Bayes theorem says that you can calculate the evidence under the hypothesis one P(h1/E) with multiplying the P(h1) with the calculated evidence under the hypothesis P(E/h1) divided by a base P(E). To calculate P(h1/E) that is the value that tells us how big the chance is that our hypothesis one is correct. What we need to calculate that value is P(E). In this case it is very easy to find. Is is the sum of all vanilla cookies divided by the sum of all cookies.

P(E) = sum(vanilla cookies) / sum(all cookies) -> 55 / 80 

So now we have all values we need for bayes theorem.

P(h1/E) = P(h1) * P(E/h1) / P(E) -> P(h1/E) = 
(1/2 * 35/40) / 55/80 = 0.5 * 0,875 / 0,6875 = 0.6363636363 = 63.6 %

That means after taking one vanilla cookie we can say that the chance that we take cookies from bowl 1 is 63.6 percent.

So that’s it with the cookie problem but how can us help that by implementing a spam filter? We can calculate the possibility if a mail we get is spam or not spam with the same formula, with bayes theorem.
But instead of cookies we use mails we have classified as spam mails or as not spam mails. So if we get a new mail, we take every word and calculate a value Q. If Q is greater than 1 the word comes more likely from a spam mail, if Q is smaller than 1 the word comes more likely not from a spam mail. So the rule will be

Q > 1 = spam
Q < 0 = not spam

Now we will calculate Q for every word w.
Our hypothesis h1 and h2 are:

H1 = HSpam = 1/2
H2 = HNotSpam = 1/2

So we have again have a 50 to 50 chance

P(h1) = P(h2) = 1/2

Now we need some evidence data, in our case we have 10 spam mails and 10 not spam mails. That mails will give us the evidence we need to decide if the word comes from a spam mail or not. So now we take one word we have in the mail we want to check. Lets say the word is mother. We look in the spam mails and we find it one time. Then we look in the not spam mails and we find it five times. With that information we can calculate P(E/h1) and P(E/h2):

P(E/h1) = 1 / 10 (one mail out of ten)
P(E/h2) = 5 / 10 (five mails out of ten)

Now we use bayes theorem to calculate P(h1/E) and P(h2/E).

P(h1/E) = P(h1) * P(E/h1) / P(E)
P(h2/E) = P(h2) * P(E/h2) / P(E)

Normally we have to figure out what P(E) is but in this case P(E) refers only to the actual mail that means it is one and we can remove it. Here is the calculation:

P(h1/E) = 1/2 * 1/10 / 1 = 1/20
P(h2/E) = 1/2 * 5/10 / 1 = 5/20

So we have two values that tell us how big the chance is that a mail is a spam mail or not. (based on the spam and not spam mails we already got)
Now we calculate Q and check if it is greater or smaller that 1.

Q = P(h1/E) / P(h2/E) = 1/20 / 5/20 = 1/5 = 0.2

Q is smaller that one that means that the word mother occurs more often in one of our not spam mails. To check all the text in a mail we only have to calculate Q for every word sum all Q values up and divide it through the number of all words in the email.

Q overall = sum(calculate Q for every word w) / count(every word w)
Q overall > 1 = spam
Q overall < 0 = not spam

Now all we have to do is to implement a program that takes the emails and checks a given email for spam. To check if the program does what it should do, i implemented two tests that check phrases. The NaiveBayersSpamTrueTest method uses phrases from spam mails and the NaiveBayersNotSpamFalseTest method uses phrases from not spam mails.
That is the spam true test.

[TestMethod]
public void NaiveBayersSpamTrueTest()
{
    NaiveBayes.NaiveBayes naiveBayes = new NaiveBayes.NaiveBayes();
    var result = naiveBayes.CheckEmail("Buy Cheap cialis Online");
    Assert.AreEqual(true, result);
    result = naiveBayes.CheckEmail("Enlarge Your Penis");
    Assert.AreEqual(true, result);
    result = naiveBayes.CheckEmail("accept VISA, MasterCard");
    Assert.AreEqual(true, result);
}

That is the spam not true test.

[TestMethod]
public void NaiveBayersNotSpamFalseTest()
{
    NaiveBayes.NaiveBayes naiveBayes = new NaiveBayes.NaiveBayes();
    var result = naiveBayes.CheckEmail("Sincerely Mathias");
    Assert.AreEqual(false, result);
    result = naiveBayes.CheckEmail("Dear Graduates");
    Assert.AreEqual(false, result);
    result = naiveBayes.CheckEmail("Thanks in advance for your support");
    Assert.AreEqual(false, result);
    result = naiveBayes.CheckEmail("for it with my Mastercard");
    Assert.AreEqual(false, result);
}

Now that we have the tests we start with the part that reads the evidence data. What i did is, i took 10 normal mails and 10 spam mails from my inbox and my spam folder and made 10 text files out of them. Then i implemented a method that read all the data from the text files. (the text files with the text from the spam mails and not spam mails are at the naivebayes.codeplex.com project)

public class EmailReader
{
    public List<string> Read(string pathToFiles)
    {
        List<string> list = new List<string>();
        DirectoryInfo currentdirectoryInfo = 
            new DirectoryInfo(Environment.CurrentDirectory);
        DirectoryInfo parent = 
            currentdirectoryInfo.Parent.Parent.Parent;
        DirectoryInfo directoryInfo = 
            new DirectoryInfo(parent.FullName + @"\NaiveBayes" + pathToFiles);
        if (directoryInfo.Exists)
        {
            foreach (var file in directoryInfo.EnumerateFiles())
            {
                var text = String.Empty;
                using(FileStream filestream = file.OpenRead())
                {
                    System.IO.StreamReader streamReader = 
                        new System.IO.StreamReader(filestream);
                    text = streamReader.ReadToEnd();
                }
                list.Add(text);
            }
        }
        return list;
    }
}

The methode Read is called two times, one time for the spam mails and one time for the not spam mails. After reading the text from the text files i need a class that parses all words from the text into a dictionary and count how often a word occurs. So that parser tells me how often a word occurs in all my spam mails and in all my not spam mails. So for simplicity the parser takes the text of every mail and splits it into word by using the string.Split(‘ ‘) method, that means every times a blank occurs a new word starts.

public class Parser
{
    private Dictionary<string, int> wordCounter = new Dictionary<string,int>();

    public Dictionary<string, int> WordCounter
    {
        get { return wordCounter; }
        set { wordCounter = value; }
    }

    public void Parse(string textToParse)
    {
        var stringArray = textToParse.Split(' ');
        foreach (var str in stringArray)
        {
            if (wordCounter.ContainsKey(str.ToLower()))
            {
                wordCounter[str.ToLower()] += 1;
            }
            else
            {
                wordCounter.Add(str.ToLower(), 1);
            }
        }
    }
}

Now all the groundwork is done. What we need now is the method that calculates Q for every word and calculates out of all Q values of all words of our email, if it is spam or not spam. In the constructor of the NaiveBayes class the mails are read and saved in the spamMails and notSpamMails variable. The CheckMail method is the starting point of the calculation. First it parses the spamMails and the notSpamMails string and fills two dictionaries with the spam and the not spam mails words. Then the CheckIfSpam method is called that splits the mail text in words and calls for every word the CalculateQ method. Then it sums up all q values and divided it by the count of all tested words. If the calculated value is bigger one it returns true that means spam if not it returns false that means no spam. In the CalculateQ method we calculate Q with the bayes theorem.

P(h1/E) = P(h1) * P(E/h1) / P(E)

We calculate Ph1e by dividing wordCountSpam by countSpamMails and Ph2e by dividing wordCountNotSpam with countNotSpamMails. Then we divide Ph1e by Ph2e and the result is q.

public class NaiveBayes
{
    private List<string> spamMails;
    private List<string> notSpamMails;
    private int countSpamMails;
    private int countNotSpamMails;

    public NaiveBayes()
    {
        EmailReader emailReader = new EmailReader();
        spamMails = emailReader.Read(@"\Mails\Spam\");
        notSpamMails = emailReader.Read(@"\Mails\NotSpam\");
        countSpamMails = spamMails.Count();
        countNotSpamMails = notSpamMails.Count();
    }

    public bool CheckEmail(string text)
    {            
        Parser parser = new Parser();
        foreach(var spamMail in spamMails)
        {
            parser.Parse(spamMail);
        }
        var spamWords = parser.WordCounter;
        parser = new Parser();
        foreach (var notSpamMail in notSpamMails)
        {
            parser.Parse(notSpamMail);
        }
        var notSpamWords = parser.WordCounter;
        return CheckIfSpam(text, countSpamMails,
            spamWords, countNotSpamMails, notSpamWords);
    }

    private bool CheckIfSpam(string text, 
        int countSpamMails, Dictionary<string,int> spamWordList,
        int countNotSpamMails, Dictionary<string,int> notSpamWordList)
    {
        var stringArray = text.Split(' ');
        var sumQ = 0.0;
        var wordCounter = 0;
        foreach (var item in stringArray)
        {
            var q = CalculateQ(item.ToLower(), 
                countSpamMails, spamWordList, 
                countNotSpamMails, notSpamWordList);
            sumQ += q;
            wordCounter++;
        }
        if (sumQ / wordCounter > 1)
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    private double CalculateQ(string word, 
        int countSpamMails, Dictionary<string,int> spamWordList,
        int countNotSpamMails, Dictionary<string,int> notSpamWordList)
    {
        double wordCountSpam = 1;
        if (spamWordList.ContainsKey(word))
        {
            wordCountSpam = spamWordList[word];
        }
        double Ph1e = 0.5 * wordCountSpam / countSpamMails;
        double wordCountNotSpam = 1;
        if (notSpamWordList.ContainsKey(word))
        {
            wordCountNotSpam = notSpamWordList[word];
        }
        double Ph2e = 0.5 * wordCountNotSpam / countNotSpamMails;
        double q = Ph1e / Ph2e;
        return q;
    }
}

So that’s it, if we now give the CheckEmail method a phrase (e.g. “Thanks in advance for your support”) it recognises that this is not from a spam mail. The complete source code including the text from spam and not spam mails is in the naivebayes.codeplex.com project. If you need more information about naive bayes classification i would recommend the open book Think Stats: Probability and Statistics for Programmers. I have to say that I really enjoyed implementing the naive bayes classifier, because i always wanted to know how a spam filter works and it is surprising how good it works.


little or big endian in c#

If you want to check the endianess of your operating system in c# you have two possibilities, a safe and a unsafe way. But first of all i want to explain what endianess is and what big and little endian means. If you have an Int32 value in c#, that Int32 is saved in 4 bytes. But the question is which part of the Int32 is saved in which byte with which address. If the system is using little endian (like Windows does) the last byte (=least significant byte) is saved in the lowest address. That means if you have the hex value 0A0B0C0D and you have the Int32 has the address 00000001, the bytes are saved in the following order:

0D at address 00000001

0C at address 00000002

0B at address 00000003

0A at address 00000004

The following images shows the insertion order:

LittleEndian

If the system has big endian order the last byte (=least significant byte) is saved in the highest address. That means the Int32 value 0A0B0C0D with the address 00000001 the bytes are inserted the following way:

0A at address 00000001

0B at address 00000002

0C at address 00000003

0D at address 00000004

The following image shows the insertion order:

BigEndian

Now if we want to check if the system uses big or little endian in c# there is a very easy way, you can use the BitConverter class that has an IsLittleEndian property.

if(BitConverter.IsLittleEndian)
   return "Little Endian";

If you want to implement the check yourself the simplest method is the following one:

public enum BigLittleEndianEnum
{
    Big,
    Little
}
public BigLittleEndianEnum TestSystemIsBigOrLittleEndianSafe()
{
    Int32 intValue = 0x0001;
    byte[] bytes = BitConverter.GetBytes(intValue);
    var safeBigLittleEndian = (bytes[0] == 1) ? 
        BigLittleEndianEnum.Little : BigLittleEndianEnum.Big;
    return safeBigLittleEndian;
}

What we do is we take the Int32 type intValue and assign 1 to it. Then we use the BitConverter class to generate a byte array out of that Int32 value. If the zero byte of the array (byte[0]) is one the we know that the system uses little endian because the 1 is in the lowest address (byte[0]).
If you search for check little or big endian in google you often find the c/c++ solution. In c# you have to use the unsafe keyword, because you need pointer syntax to implement the c/c++ solution.

public BigLittleEndianEnum TestSystemIsBigOrLittleEndianUnsafe()
{
    Int32 intValue = 0x0001;
    unsafe
    {        
        Int32* intValueAddress = &intValue;                
        byte* byteValueAddress = (byte*) intValueAddress;
        byte byteValue = *byteValueAddress;
        var unsafeBigLittleEndian = (byteValue == 1) ? 
            BigLittleEndianEnum.Little : BigLittleEndianEnum.Big;        
        return unsafeBigLittleEndian;                        
    }            
}

Here we use a pointer of the Int32 intValue and cast it to a byte pointer. That byte pointer points on the lowest address (byte[0]) of the Int32 value and you can check if the value the pointer points on is zero or one, if it is one its little endian. A real c/c++ programmer would write that whole method in two lines:

public BigLittleEndianEnum TestSystemIsBigOrLittleEndianUnsafe()
{
    Int32 intValue = 0x0001;
    return *((byte*) &intValue) == 1 ? 
        BigLittleEndianEnum.Little : BigLittleEndianEnum.Big;
}

For a more detailed explanation please look at the wickipedia article about Endianess.
 


creating generic object or generic method with reflection.

Some times i am in a situation that i needed to create a generic Type or a generic Method with a changing type at runtime. I will start with the generation of a generic method. Lets asume i have a method ConvertValue defined in a class Converter:

public class Converter
{
    public TResult ConvertValue<TResult>(int value)
    {            
        TResult resultValue = (TResult)Convert.ChangeType(value, typeof(TResult));
        return resultValue;         
    }
}
var intValue = 10;
Converter converter = new Converter();
double doubleValue = converter.ConvertValue(intValue);

As we see that method converts an int value to a generic type, in our case in a double. But what is if we dont know the type we want to convert in the intValue at compile type. What if the type is choosen at runtime. Then we need the MakeGenericMethod method from the System.Reflection namespace. That call builds a generic method and then we can invoke that method with the correct generic type.

public class CreateGenericTypeAndMethod
{
    public static object MakeGenericMethod(object obj, string methodName, 
                                           Type innerType, params object[] args)
    {            
        MethodInfo method = obj.GetType().GetMethod(methodName, 
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static);
        MethodInfo generic = method.MakeGenericMethod(innerType);
        return generic.Invoke(obj, args);
    }
}
var intValue = 10;
Type type = typeof(double);
object obj = MakeGenericMethod(new Converter(), "ConvertValue", type, 10);
// obj is now from type double and has the value 10.0
var str = String.Format("{0:0.0}",((double)obj));
Console.WriteLine("obj has the type {0} and the value {1}", obj.GetType(), str);

After the MakeGenericMethod call the obj object is now from type double and has the value 10.0 .
objAsDouble
But what is if you want to generate a class with a generic type T. In the next example i show how to generate a Stack of type T. The type T is not known at compile type but at runtime.

public class Stack<T>
{
    private List<T> list = new List<T>();

    public void Push(T value)
    {
        list.Add(value);
    }

    public T Pop()
    {
        if (list.Count > 0)
        {
            T value = list[list.Count - 1];
            list.RemoveAt(list.Count - 1);
            return value;
        }
        return default(T);
    }
}
public class CreateGenericTypeAndMethod
{
    public static object MakeGenericType(Type generic, Type innerType, 
        params object[] args)
    {
        Type specificType = generic.MakeGenericType(new Type[] { innerType });
        return Activator.CreateInstance(specificType, args);
    }
}
Type type = typeof(double);
object stack = CreateGenericTypeAndMethod.MakeGenericType
    (typeof(Stack<>), type, new object[] { });

Here i generate a Stack for double values.

stackAsDouble

I call the MakeGenericType method with the type of the object (Stack) and the type of the generic type (double). In the MakeGenericType method i generate the generic type of the Stack (Stack). For that generation i use the MakeGenericType method of the Type class. With that type i instanciate the object by calling Activator.CreateInstance. That is all i have to do to create the Stack.

Here is the complete implementation of the CreateGenericTypeAndMethod class. There are two overloads for generic type generation and generic methode generation with one generic parameter or multible generic parameters.


public class CreateGenericTypeAndMethod
{
    public static object MakeGenericType(Type generic, Type innerType, 
        params object[] args)
    {
        Type specificType = generic.MakeGenericType(new Type[] { innerType });
        return Activator.CreateInstance(specificType, args);
    }

    public static object MakeGenericType(Type generic, Type[] innerTypes, 
        params object[] args)
    {
        Type specificType = generic.MakeGenericType(innerTypes);
        return Activator.CreateInstance(specificType, args);
    }

    public static object MakeGenericMethod(object obj, string methodName, 
        Type innerType, params object[] args)
    {            
        MethodInfo method = obj.GetType().GetMethod(methodName, 
            BindingFlags.Public | BindingFlags.NonPublic | 
            BindingFlags.Instance | BindingFlags.Static);
        MethodInfo generic = method.MakeGenericMethod(innerType);
        return generic.Invoke(obj, args);
    }

    public static object MakeGenericMethod(object obj, string methodName, 
        Type[] innerTypes, params object[] args)
    {
        MethodInfo method = obj.GetType().GetMethod(methodName, 
            BindingFlags.Public | BindingFlags.NonPublic | 
            BindingFlags.Instance | BindingFlags.Static);
        MethodInfo generic = method.MakeGenericMethod(innerTypes);
        return generic.Invoke(obj, args);
    }
}

dynamic dispatch in c#

In this post i want to write about dynamic dispatch. So dynamic dispatch is used for code invocation and describes if a language uses compile time information or run time information to invoke a method. The two main concepts with dynamic dispatch are single dispatch and multible dispatch. I wanted to start with the simpler one and most common in object orientated languages, with single dispatch.

  • Single dispatch in c# is polymorphistic overriding of methods. The following example shows the c# code for overriding.
    public class Animal
    {       
        public virtual void VirtualShout()
        {
            Console.WriteLine("Animal virtual shouts");
        }
        public void Shout()
        {
            Console.WriteLine("Animal shouts");
        }
    }
    
    public class Cat : Animal
    {        
        public new void Shout()
        {
            Console.WriteLine("Cat shouts");
        }
    }
    
    public class SingleDispatch
    {
        public void Dispatch()
        {
            Animal animal = new Cat();
            animal.VirtualShout();
            animal.Shout();
        }
    }

    Single dispatch is normally implemented with virtual tables. That means the compiler produces a table with entries depending on the methods of the classes. In the Animal example the virtual table for the Animal and the Cat class looks the following way.

       
   Animal:
   vtableAnimal 

   Cat:
   vtableCat

Here we see the output of the single dispatch example:
SingleDispatchOutput
As we see if we call animal.VirtualShout() the decision which VirtualShout method is called (Animal or Cats one) is made at runtime. When the call gets executed the runtime looks at the virtual table and calls the method that stands in that virtual table. In our case the VirtualShout method of the Animal class is called because it is not ovverriden in the Cat class. In c# that mechanismus works only for virtual methodes (in java every method is automatic virtual). The Shout() method is not virtual and so the decision which method to call is choosen at compile time and does not change at runtime. The importand point why this virtual call mechanismus is called single dispatch is because the decision which method to call is made by the single object (If it is a Animal object it has a pointer to the Animal virtual table if it is a Cat it has a pointer to the Cat virtual table).

  • multible dispatch works with the dynamic keyword in c# the following example shows multible dispatch
    public class Owner
    {
        public virtual string Name
        {
            get { return "Owner"; }
        }
    }
    
    public class Mathias : Owner
    {
        public override string Name
        {
            get { return "Mathias"; }
        }
    }
    
    public class Animal
    {       
        public virtual void Shout(Owner owner)
        {
            Console.WriteLine("Animal Shouts to Owner object {0}", owner.Name);
        }
    
        public virtual void Shout(Mathias mathias)
        {
            Console.WriteLine("Animal Shouts to Mathias object {0}", mathias.Name);
        }
    }
    
    public class Cat : Animal
    {        
        public override void Shout(Owner owner)
        {
            Console.WriteLine("Cat Shouts to Owner object {0}", owner.Name);
        }
    
        public override void Shout(Mathias mathias)
        {
            Console.WriteLine("Cat Shouts to Mathias object {0}", mathias.Name);
        }
    }   
    
    public class MultibleDispatch
    {
        public void Dispatch()
        {
            Animal animal = new Animal();
            Owner owner = new Mathias();
            animal.Shout(owner);
            dynamic dynamicOwner = new Mathias();
            animal.Shout(dynamicOwner);
        }
    }

    Multible dispatch means that the decision which method should be called is made by the type of all the methodes parameter. In c# that decision is normally met at compile time, that means the compiler looks at the parameter types of the methodes and chooses the method that fits best. In the example we have two classes Owner and Mathias. Mathias is derifed from Owner. If we declare an Owner and make it an instance of Mathias, and call the Shout(owner) method the binding is made at compile time. That means the Shout method with the first method (Shout(Owner owner)) is called because the compiler uses the type of owner (for the compiler that type is Owner) to call the method. So that call is single dispatch because it uses only the instance objects type Animal to decide which method to call. If we want a call that recognises the correct type of the parameter (the type is Mathias not Owner) we have to move the decision which method to call to runtime. We do that with the dynamic keyword. If we build a dynamic object dynamicOwner that object is from type object and if the runtime makes the Shout(dynamicOwner) call it knows what type dynamicOwner really is (Mathias) and it chooses the correct method (Shout(Mathias mathias)). Here is the output of the Dispatch() method.
    MultibleDispatchOutput
    So now the decision is made by the parameter type and if we have more than one parameter, the decision is made by all the parameter types as we see in the following example.

    public class Animal
    {  
        public virtual void ShoutTwoOwners(Owner ownerOne, Owner ownerTwo)
        {
            Console.WriteLine(
                "Animal Shouts to Owner one object with name {0}, Owner two object with name {1}",
                ownerOne.Name, ownerTwo.Name);        
        }
    
        public virtual void ShoutTwoOwners(Owner ownerOne, Mathias mathiasTwo)
        {
            Console.WriteLine(
                "Animal Shouts to Owner one object with name {0} Mathias two object with name {1}",
                ownerOne.Name, mathiasTwo.Name);
        }
    
        public virtual void ShoutTwoOwners(Mathias mathiasOne, Owner ownerTwo)
        {
            Console.WriteLine(
                "Animal Shouts to Mathias one object with name {0} Owner two object with name {1}", 
                mathiasOne.Name, ownerTwo.Name);        
        }
    
        public virtual void ShoutTwoOwners(Mathias mathiasOne, Mathias mathiasTwo)
        {
            Console.WriteLine(
                "Animal Shouts to Mathias one object with name {0} Mathias two object with name {1}", 
                mathiasOne.Name, mathiasTwo.Name);        
        }
    }
    public class MultibleDispatch
    {
        public void Dispatch()
        {
            Animal animal = new Animal();        
            dynamic dynamicOwnerOne = new Mathias();
            dynamic dynamicOwnerTwo = new Owner();
            animal.ShoutTwoOwners(dynamicOwnerOne, dynamicOwnerTwo);
        }
    }

    Here is the output for that example that shows that the decision which method to call is made by multible Dispach (multible parameter are evaluated at runtime to make the correct method call)
    MultibleDispatchTwoParametersOutput


Building a rule engine in c# (part 5: bringing it all together)

In the first four post of these series i showed a lot of code parts. I became some mails about that posts that show me that my descriptions are not as good as i thought. For that reason i decided to write that fifth post to bring it all together. First of all i decided to bring the source code of the posts together and so i created the ruleengine project at ruleengine.codeplex.com. Here are the links to the four previous posts and to the description of the expression evaluator.

Now i want to describe the tests i added to the ruleengine.codeplex.com project to show how to use ist. First i want to show the Person and Adress classed i use to validate the rules against:

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public int Children { get; set; }
    public bool Married { get; set; }
    public Adresse Adresse_ { get; set; }
    private List<Adresse> adresses = new List<Adresse>();
    public List<Adresse> Adresses_ 
    { 
        get { return adresses; } 
        set { adresses = value; } 
    }
}

The Person class has some properties (Name, Age, Children, Married) and a collection of Adresses.

public class Adresse
{
    public string Street { get; set; }
    public int Plz { get; set; }
    public string City { get; set; }
    public bool ActiveState { get; set; }
}

The Adress class has some simple properties (Stree, Plz, City, ActiveState).
I will start with a simple test case to test a persons properties against a specific rule.

[TestMethod]
public void SimpleRuleLoaderPersonTest()
{
    Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
    RuleLoader ruleLoader = new RuleLoader();
    // new Rule("Age", Operator.LessThanOrEqual, 50);
    Rule rule = ruleLoader.Load(2);
    RuleEngine.RuleEngine ruleEngine = new RuleEngine.RuleEngine();
    var ruleFunc = ruleEngine.CompileRule<Person>(rule);
    var result = ruleFunc(person);
    Assert.AreEqual(result, true);
}

That test defines a Person object and instanciates a RuleLoader object. That ruleLoader object would load the rules from the database but in my special case i wanted to use no database so the rule Loader is just a big switch statement that creates rules:

public class RuleLoader
{
    public Rule Load(int id)
    {            
        switch(id)
        {
            case 1 : 
                return new Rule("Name", Operator.NotEqual, "test");
            case 2 : 
                return new Rule("Age", Operator.LessThanOrEqual, 50);
            case 3: 
                return new Rule("Children", Operator.GreaterThan, 0);
            case 4: 
                return new Rule("City", Operator.Equal, "New York");
            case 5: return 
                new Rule("ActiveState", Operator.Equal, true);                
            case 6: 
                return new Rule("DecimalValue", Operator.GreaterThanOrEqual, 1);
            case 7: 
                return new Rule("DecimalValue", Operator.GreaterThanOrEqual, 1);
            case 8: 
                return new Rule("Married", Operator.Equal, true);
            default :
                return null;
        }
    }        
}

After that i use the RuleEngine to build a Func delegate out of the defined rule. In this case it is a Func of Person, bool delegate. This delegate gets invoked with the Person object as parameter and returns true or false. The second test shows how to use the RuleValidator class. That class has defined different ValidatorRule methodes to validate objects against rules.

[TestMethod]
public void SimpleRuleLoaderPersonValidateRulesAny()
{
    Person person1 = new Person() { 
        Name = "Mathias", Age = 36, Children = 2, Married = true };
    Person person2 = new Person() { 
        Name = "Anna", Age = 32, Children = 2, Married = false };
    RuleLoader ruleLoader = new RuleLoader();
    // new Rule("Age", Operator.LessThanOrEqual, 50);
    Rule rule1 = ruleLoader.Load(2);
    // new Rule("Children", Operator.GreaterThan, 0);
    Rule rule2 = ruleLoader.Load(3);
    RuleEngine.RuleEngine ruleEngine = new RuleEngine.RuleEngine();
    var ruleFuncs = ruleEngine.CombineRules<Person>(
        new Rule[] { rule1, rule2 });
    RuleValidator ruleValidator = new RuleValidator();
    var result = ruleValidator.ValidateRulesAny(
        new[] { person1, person2 }, ruleFuncs);
    Assert.AreEqual(result, true);
}

In this case we use the CombineRules method of the RuleEngine to create an array of Func delegates. That delegates and the two Person objects are overtaken to the ValidateRulesAny method of the RuleValidator. That method returns true if every person fulfills at least one rule. The next test shows how to use the RuleEngine to validate collections of objects against rules.

[TestMethod]
public void SimpleRuleLoaderPersonAddressValidateRulesAll()
{
    Person person = new Person() { 
        Name = "Mathias", Age = 36, Children = 2, Married = true };
    Adresse adresseOne = new Adresse() { 
        Street = "Teststreet1", Plz = 3030, City = "New York", ActiveState = true };
    Adresse adresseTwo = new Adresse() { 
        Street = "Teststreet2", Plz = 1010, City = "London", ActiveState = false };
    Adresse adresseThree = new Adresse() { 
        Street = "Teststreet3", Plz = 2020, City = "Paris", ActiveState = false };        
    person.Adresses_.Add(adresseOne);
    person.Adresses_.Add(adresseTwo);
    person.Adresses_.Add(adresseThree);
    RuleLoader ruleLoader = new RuleLoader();
    // new Rule("City", Operator.Equal, "New York");
    Rule firstAdressRule = ruleLoader.Load(4);
    // new Rule("ActiveState", Operator.Equal, true);      
    Rule secondAdressRule = ruleLoader.Load(5);
    RuleEngine.RuleEngine ruleEngine = new RuleEngine.RuleEngine();
    var firstAdressRuleFunc = 
        ruleEngine.CompileRule<Adresse>(firstAdressRule);
    var secondAdressRuleFunc = 
        ruleEngine.CompileRule<Adresse>(secondAdressRule);
    RuleValidator ruleValidator = new RuleValidator();
    bool result = ruleValidator.
        ValidateValuesAny(person.Adresses_, firstAdressRuleFunc);
    Assert.AreEqual(result, true);
    result = ruleValidator.
        ValidateValuesAny(person.Adresses_, secondAdressRuleFunc);            
    Assert.AreEqual(result, true);
}

Here the Person objects owns a List of Adress objects. The RuleValidator class has some methods that take a List and validates every element of that list against a rule.(ruleValidator.ValidateValuesAny(person.Adresses_, firstAdressRuleFunc) ). The next test shows how the ruleEngine sums up a property and validates this sum against two rules.

[TestMethod]
public void SimpleRuleLoaderPersonValidateRulesSum()
{
    Person person1 = new Person() { 
        Name = "Mathias", Age = 35, Children = 2 };
    Person person2 = new Person() { 
        Name = "Anna", Age = 32, Children = 2 };
    RuleLoader ruleLoader = new RuleLoader();
    // new Rule("Age", Operator.LessThanOrEqual, 50);
    Rule rule1 = ruleLoader.Load(2);
    // new Rule("Children", Operator.GreaterThan, 0);
    Rule rule2 = ruleLoader.Load(3);
    RuleValidator ruleValidator = new RuleValidator();
    var result = ruleValidator.ValidateRulesSum(
        new Person[] { person1, person2 },
        new Rule[] { rule1, rule2 });            
    Assert.AreEqual(result, false);
}

Here is use the ValidateRulesSum method. This method takes an array of objects (person1, person2) and an array of rules (rule1, rule2). It takes the first rule checks what property of the Person object this rule uses, sums up that property value of all overtaken objects and checks that sum against the rule. It returns true if the sum of all properties pass all rules. The next test shows the usage of the expression evaluator rules. In this case the Rules are expression rules ( like ” Name = ‘mathias’ “).

[TestMethod]
public void SimpleRuleLoaderPersonEvaluateExpression()
{
    Person person1 = new Person() 
        { Name = "Mathias", Age = 35, Children = 2 };
    ExpressionRuleLoader expressionRuleLoader = new ExpressionRuleLoader();
    // new Rule(" Name = 'mathias' ");
    Rule rule1 = expressionRuleLoader.Load(1);
    // new Rule(" Age = 35 ");
    Rule rule2 = expressionRuleLoader.Load(2);           
    RuleValidator ruleValidator = new RuleValidator();
    var result = ruleValidator.ValidateExpressionRules(person1, rule1);
    Assert.AreEqual(result, true);
    result = ruleValidator.ValidateExpressionRules(person1, rule2);
    Assert.AreEqual(result, true);
}

The RuleValidator class has methodes (e.g. ValidateExpressionRules) to validate the Expression Rules against objects. In this case the rules are ” Name = ‘Mathias’ ” and ” Age = 35 “. To look what the expression evaluator can afford and what not please look at my the implementing expression evaluator in c# post.
The source code of the expression evaluator is part of the ruleengine.codeplex.com project and can also be downloaded from simpleexpeval.codeplex.com.


implementing expression evaluator in c#

In this post i want to write about the implementation an expression evaluator. This expression evaluator evaluates simple c style expressions. I wrote it to implement defined rules like in my building a rule engine in c# post. A description about the use of the expression evaluator can be found at building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions). You can download the expression evaluator and the source code from (http://simpleexpeval.codeplex.com/). Now i want to show the use of the expression evaluator and then i will describe the implementation of the parts of the expression evaluator. That is the list of operations the expression evaluator supports:

  • open bracket ‘(‘ and close bracket ‘)’
  • addition ‘+’
  • subtraction ‘-‘
  • multiblication ‘*’
  • division ‘/’
  • modulo ‘%’
  • equality ‘=’
  • unequality ‘!=’
  • bigger than ‘>’
  • bigger than or equal ‘>=’
  • smaller than ‘<‘
  • smaller than or equal ‘<=’
  • logical and ‘&&’
  • logical or ‘||’

and the following data types:

  • integer ’10’
  • double ‘10.2’
  • string ‘”test”‘
  • boolean ‘true’ or ‘false’

Here is a example of the use of the expression evaluator.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
var ruleText = " (Children = 2 && Married = true) || Age > 36 ";
Evaluator evaluator = new Evaluator();
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

As we see the expression evaluator takes a rule text and a object. In this case it is a person object with four properties. I use this properties in the rule text and the expression evaluator replaces the property name with the value of the overtaken object. That means that for the expression evaluator the rule text looks like ” (2 = 2 && true = true) || 36 > 36 “. Now the expression evaluator can evaluate that text and check if the expression is true or false. In this case it is true because 2 = 2 and true = true.
The expression evaluator consist of four parts. The Lexer that reads every character, the Parser that builds the expression parts the infix to postfix converter and the evaluator that takes the expression parts and evaluates it to a specific result. I want to start with a description of the lexer. It is a so called LL(k) lexer that means it left associative and starts with the first left character. The k means how many character the lexer can look ahead. In this case it is only one character. Here is the class diagram of the lexer.

SimpleExpressionEvaluatorLexer

In the Lexer class i have implementet the basis functionality for a LL(k) lexer. The ExpressionEvaluatorLexer class contains the specific implementation for the expression evaluator. The most importand part is the NextToken method that reads character after character and builds Tokens out of the characters. Here is the code for the NextToken method.

public override Token NextToken()
{
    while (lookahead[lookaheadIndex] != EOF)
    {
        switch (lookahead[lookaheadIndex])
        {
            case '.': ConsumeItem(); return new Token(TokenType.POINT, ".");
            case '+': ConsumeItem(); return Plus();
            case '-': ConsumeItem(); return Minus();
            case '*': ConsumeItem(); return Multibly();
            case '/': ConsumeItem(); return Divide();
            case '%': ConsumeItem(); return Modulo();
            case '<': ConsumeItem(); return Smaller();
            case '>': ConsumeItem(); return Greater();
            case '=': ConsumeItem(); return Equal();
            case '!': ConsumeItem(); return Unequal();
            case '"': ConsumeItem(); return String();
            case '\'': ConsumeItem(); return SingleQuoteString();
            case '|': ConsumeItem(); return Or();
            case '&': ConsumeItem(); return And();                    
            case '(': ConsumeItem(); return new Token(TokenType.OPENBRACKET, "(");
            case ')': ConsumeItem(); return new Token(TokenType.CLOSEBRACKET, ")");
            default:                        
                if (IsLetter())
                {
                    return Terminal();
                }
                if (IsNumber())
                {
                    return Number();
                }
                if (IsWhitespace())
                {
                    ConsumeItem();
                    continue;
                }
                throw new Exception("invalid character." + lookahead[lookaheadIndex]);
        }
    }
    return new Token(TokenType.EOF, "<EOF>");
}

As we see it is a loop that loops every character in the lookahead char array. If it finds a specific character for example a digit, the lexer assumes that it has an integer or an double. So i calls the Number method.

private Token Number()
{
    bool hasPoint = false;
    StringBuilder stringBuilder = new StringBuilder();
    while (IsNumber() || IsPoint() || IsExponent())
    {
        if (IsPoint())
        {
            hasPoint = true;
        }
        stringBuilder.Append(lookahead[lookaheadIndex]);
        ConsumeItem();
    }
    if (hasPoint)
        return new Token(TokenType.DOUBLE, stringBuilder.ToString());
    return new Token(TokenType.INTEGER, stringBuilder.ToString());
}

The Number method tries to find out if the text is a integer or a double and it returns an integer or an double Token with the value it read. The lexer does this for every character and the parser controls the lexer and tells it to read the next token. That means the lexer does not read every character in the character array, it reads it token by token. If it has identified one token it returns the control back to the parser. Here we see the class diagram of the parser.

SimpleExpressionEvaluatorParser

The ExpressionEvaluationParser reads the token from the lexer and builds a list of AbstractSyntaxTreeNodes. That nodes get rearranged by the InfixToPostfix class and then the result list is passed to the expression evaluator. That is necessary because the expression evaluator can handle operator precedence, that means that a multiblication is performed before and addition or substraction and that calculations are performed before comparison operations.

public AbstractSyntaxTreeNode BuildAST()
{                                    
    if (lookahead[lookaheadIndex].Type_ == TokenType.EQUAL)
    {
        return Equal();
    }
    else if (lookahead[lookaheadIndex].Type_ == TokenType.INTEGER)
    {
        return Integer(lookahead[lookaheadIndex].Text);
    }
    ...
public EqualNode Equal()
{
     EqualNode equalNode = new EqualNode();
     Match(TokenType.EQUAL);
     return equalNode;
}
public IntegerNode Integer(string value)
{
    IntegerNode integerNode = new IntegerNode();
    Match(TokenType.INTEGER);            
    integerNode.Value = int.Parse(value);
    return integerNode;
}

That is the parser code that perfoms the transformation from the tokens to the AbstractSyntaxTreeNodes. That means if it finds an Eqaul token it builds a EqualNode and consumes the next token, if it finds a Integer token it overtakes the value of the Integer, builds a IntegerNode with the overtaken value and consumes the next token with the match methode. The parser has a tranformation method for every token. We now have a list of AbstrancSyntaxTreeNodes that represent the overtaken text. After that we have to get the AbstractSyntaxTreeNodes in the correct order to handle operator precedence and open and closed brackets. The best way to do that is to convert the expression that is defined in infix into a postfix expression. The benefit of a postfix operation is that it is much easier to evaluate than an infix expression. All we need for the infix to postfix transformation is an operator stack. I will show an example for the transformation of a list of infix operations to a list of postfix operations. We start with the expression 5 + 3 * 2 – 1. To convert that infix expression to a postfix expression we take every part of the expression and tranform it. We need a stack for the operator precedence and a list that contains the postfix expressions after the transformation.

  • 5 => to List ( 5 )
  • + => to Operator Stack ( + )
  • 3 => to List ( 5, 3 )
  • * => to Operator Stack ( +, * )
  • 2 => to List ( 5, 3, 2 )
  • – => lower precedence than * on Stack => to List ( 5, 3, 2, * ) => Operator Stack ( +, – )
  • 1 => to List ( 5, 3, 2, *, 1 )
  • pop Operator Stack ( +, – ) => to List ( 5, 3, 2, *, 1, -, + )

Thats how the postfix list looks after the transformation: ( 5, 3, 2, *, 1, -, + )
Now we have one big benefit, we can take that list and evaluate it from the beginning to the end without mind of the operator precedence. But what is if we have brackets in out expression for example the expression ( 5 + 3 ) * 2. We can handle that also very easily.

  • ( => handle bracket, instanciate new Operator Stack
  • 5 => to List ( 5 )
  • + => to new Operator Stack ( + )
  • 3 => to List ( 5, 3 )
  • ) => close bracket pop Operator Stack ( + ) => to List ( 5, 3, + )
  • * => to Operator Stack ( * )
  • 2 => to List ( 5, 3, +, 2 )
  • pop Operator Stack ( * ) => to List ( 5, 3, +, 2, * )

As we see all brackets are gone and if we use postfix we dont need them anymore, if we evaluate that expression we can calculate the right result without any brackets.

What we now need to do is do define the operator precedence for all operators. That is necessary because at first the expression evaluator has to perform calculations (multiblication, division and modulo before addition and subtraction) then the expression evaluator has to check for comparison ( equal, unequal, greater than, smaller than). After he evaluated this it has to check for logical combinators (logical and, logical or). So that is the operator precedence of the expression evaluator.

  • *, /, % have the highest order
  • +, – have the second highest order
  • =, !=, >, >=, <, <= have the third highest order
  • &&, || have the lowest order

Here is the code for the ConvertInfixToPostfix expression that performs exact that transformation from an infix to a postfix expression.

public List<AbstractSyntaxTreeNode> ConvertInfixToPostfix(List<AbstractSyntaxTreeNode> postfixList)
{
    List<AbstractSyntaxTreeNode> returnList = new List<AbstractSyntaxTreeNode>();
    Stack<AbstractSyntaxTreeNode> operatorStack = new Stack<AbstractSyntaxTreeNode>();
    var position = 0;
    for (int i = position; i < postfixList.Count; i++)            
    {
        var item = postfixList[i];
        if (item is IntegerNode || item is DoubleNode || item is VariableNode ||
            item is BooleanNode || item is StringNode)
        {
            returnList.Add(item);
        }
        else if (item is OpenBracketNode)
        {
            i = ConvertInfixToPostfixBracket(i, postfixList, returnList);
        }
        else if (item is OrNode || item is AndNode)
        {
            while (operatorStack.Count > 0)
            {
                var abstractSyntaxTreeNode = operatorStack.Pop();
                returnList.Add(abstractSyntaxTreeNode);
            }
            operatorStack.Push(item);
        }
        else if (item is GreaterThenNode || item is GreaterThenOrEqualNode ||
            item is SmallerThenNode || item is SmallerThenOrEqualNode ||
            item is EqualNode || item is UnEqualNode)
        {
            if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || 
                operatorStack.Peek() is DivNode ||
                operatorStack.Peek() is ModuloNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            else if (operatorStack.Count() > 0 && (operatorStack.Peek() is AddNode || 
                operatorStack.Peek() is SubNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            operatorStack.Push(item);
        }
        else if (item is AddNode || item is SubNode)
        {
            if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || 
                operatorStack.Peek() is DivNode ||
                operatorStack.Peek() is ModuloNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            operatorStack.Push(item);
        }
        else if (item is MulNode || item is DivNode || item is ModuloNode)
        {
            operatorStack.Push(item);
        }                
        position++;
    }
    while (operatorStack.Count > 0)
    {
        var abstractSyntaxTreeNode = operatorStack.Pop();
        returnList.Add(abstractSyntaxTreeNode);
    }
    return returnList;
}

As we see the code is very simple it loops all AbstractSyntaxTreeNodes in the overtaken list and if it finds a Integer, Double, String or Boolean AbstractSyntaxTreeNode it adds it to the return list. If it finds a add or sub Node it checks the stack. If it finds a mul, div or mod Node it pops the Node (that has a higher precedence than the add or sub Node) and adds it to the list, then it pushes the actual operator to the stack. It works simular for comparison operators ( equal, unequal … ) and for logical operators ( and, or). If it comes to a open bracket it calls the ConvertInfixToPostfixOpenBracket method. That method instanciates a new operator stack and works the same as the ConvertInfixToPostfix method. It calls it self recursive for nested brackets. Now we have brought our AbstractSyntaxTreeNodes in the correct order and we can overhand our postfix list to the expression evaluator. That evaluator takes one node after the other and evaluates them. For that evaluation we need a stack that takes the result of the calculations. For example if we want to calculate the result of the expression ( 5, 3, 2, *, 1, -, + )

  • 5 => to the Stack ( 5 )
  • 3 => to the Stack ( 5, 3 )
  • 2 => to the Stack ( 5, 3, 2 )
  • * => take first two parameter from the Stack ( 2, 3 ) and interchange them because they are in wrong order. ( 3, 2 ) => calculate the multiblication 3 * 2 = 6 => push result to Stack ( 5, 6 )
  • 1 => to the Stack ( 5, 6, 1 )
  • – => take first two parameter from the Stack ( 1, 6 ) and interchange them because they are in wrong order. ( 6, 1 ) => calcualte the subtraction 6 – 1 = 5 => push result to Stack ( 5, 5 )
  • + => take first two parameter from the Stack ( 5, 5 ) and interchange them because they are in wrong order. ( 5, 5 ) => calculate the addition 5 + 5 = 10 => push result to Stack

Now we can take the result from the Stack ( 10 ) => that is the result. Here is the code that shows the Evaluate method that takes the list of the AbstractSyntaxTreeNode and evaluates them.

public bool Evaluate<T>(List<AbstractSyntaxTreeNode> postfixList, 
    Dictionary<string, AbstractSyntaxTreeNode> symbolTable, T objectValue)
{
    foreach (var item in postfixList)
    {
        if (item is IntegerNode || item is DoubleNode || 
            item is BooleanNode || item is StringNode)
        {
            valueStack.Push(item);
        }
        else if (item is AddNode)
        {
            var secondoperand = valueStack.Pop();
            var firstoperand = valueStack.Pop();
            if (secondoperand is IntegerNode && firstoperand is IntegerNode)
            {
                 IntegerNode integerNode = new IntegerNode();
                 integerNode.Value = ((IntegerNode)firstoperand).Value + 
                      ((IntegerNode)secondoperand).Value;
                 valueStack.Push(integerNode);
            }
         }
         else if (item is MulNode)
         {
              var secondoperand = valueStack.Pop();
              var firstoperand = valueStack.Pop();
              if (secondoperand is IntegerNode && firstoperand is IntegerNode)
              {
                   IntegerNode integerNode = new IntegerNode();
                        integerNode.Value = ((IntegerNode)firstoperand).Value *
                   ((IntegerNode)secondoperand).Value;
                   valueStack.Push(integerNode);
              }
         }
         else if (item is EqualNode)
         {
              var secondoperand = valueStack.Pop();
              var firstoperand = valueStack.Pop();
              if (secondoperand is IntegerNode && firstoperand is IntegerNode)
              {
                   BooleanNode booleanNode = new BooleanNode();
                   booleanNode.Value = ((IntegerNode)firstoperand).Value == 
                        ((IntegerNode)secondoperand).Value;
                   valueStack.Push(booleanNode);
         }
}
...

The evaluator part is really easy, it uses on stack for the values and the calculation results. So it takes on AbstractSyntaxTreeNode after the other and if it is data it puts it to the stack, if it is an operator it performs the operation and puts the result to the stack. If it gets the following postfixlist ( IntegerNode(4), IntegerNode(5), IntegerNode(3), MulNode(*),AddNode(+)) it starts with pushing 4 to the stack, pushing 5 to the stack, pushing 3 to the stack. Poping 3 and 5 from the stack perform the multiblication and push 15 to the stack. Then it pops 15 and 4 from the stack, performs the addition and pushes 19 to the stack. So the result would be 15. The problem is that the Evaluate methode returns bool. That is because the expression evaluator can only perform expression the have true or false as result. So if you evaluate ( 4 + 5 * 3 ) you get an exception. You have to write ( 4 + 5 * 3 = 19 ) that is a correct expression that returns true or false. The expression evaluator was written to perform rules against the properties of objects (building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions)) and a rule has to return always true or false. You can download the source code from http://simpleexpeval.codeplex.com/.


Building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions)

In the first three posts of building a rule engine in c# building a rule engine in c#, building a rule engine in c# (part 2: extending the rule engine to handle collections), building a rule engine in c# (part 3: extending the rule engine to handle aggregations of collections) the definition of the rules are very basic and did not look very clear. So i decided to implement an expression evaluator. With this expression evaluator the definition of the rules becomes much clearer. I will show the difference in the next three pictures. The first one shows the definition of the rules how i described it in the past three posts.

ProcessingEngineDatabase

This second picture shows the definition with a expression evaluator.

ComplexProcessingRule

If we use a expression evaluator we can also use logical combinators (and, or) to combine the defined rules. The third picture shows how the processing rules would look like if they are combined.

ComplexProcessingRuleCombinator2

The problem with that processing rules is that in the first three post i used expression trees to generate the rules but i had the seperate parts of the rules and so i needed no parser. Now we have to develop a parser that parses the expression and an evaluator that checks the overtaken object against the parsed expression. In this post i will show how to use the expression evaluator i posted at codeplex (https://simpleexpeval.codeplex.com/) to validate a given object against the defined rule. In the additional blog post implementing an expression evaluator in c# i describes how the expression evaluator works. But in this post i will describe how to use it to define the rules you need to check your object again. The expression evaluator support the following operators:

  • open bracket ‘(‘ and close bracket ‘)’
  • addition ‘+’
  • subtraction ‘-‘
  • multiblication ‘*’
  • division ‘/’
  • modulo ‘%’
  • equality ‘=’
  • unequality ‘!=’
  • bigger than ‘>’
  • bigger than or equal ‘>=’
  • smaller than ‘<‘
  • smaller than or equal ‘<=’
  • logical and ‘&&’
  • logical or ‘||’

and the following data types:

  • integer ’10’
  • double ‘10.2’
  • string ‘”test”‘
  • boolean ‘true’ or ‘false’

This is a example how to use the expression evaluator with a defined person that is evaluated against a rule. That is the definition of the person class:

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public int Children { get; set; }
    public bool Married { get; set; }
}

Now the defined rule text (” (Children = 2 && Married = true) || Age > 36 “) is evaluated against the defined person object.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
RuleLoader ruleLoader = new RuleLoader();
Rule processingRule = ruleLoader.LoadProcessingRule(5);
//processingRule.ProcessingRule = " (Children = 2 && Married = true) || Age > 36 ";
Evaluator evaluator = new Evaluator();
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

The evaluatorResult is true because the person object has two children and is married. Here is a second example that shows that it is possible to use properties at any possition in the rule text.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
RuleLoader ruleLoader = new RuleLoader();
Rule processingRule = ruleLoader.LoadProcessingRule(5);
//processingRule.ProcessingRule = " Name != 'test' && Children <= Age / 20 "; 
Evaluator evaluator = new Evaluator(); 
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

The evaluatorResult is also true because Age / 20 is smaller than Children and the name is unequal ‘test’. If you want to use the expression evaluation engine you can download it from codeplex (https://simpleexpeval.codeplex.com/) and add a reference in your visual studio project. Then you can use the Evaluator class to evaluate the defined rule text against an overtaken object. In the blog post implementing an expression evaluator in c# i describe the implementation of the expression evaluator and in the post building a rule engine in c# (part 5: bringing it all together) i try to give an overview of the usage of the ruleengine.codeplex.com project.