implementing expression evaluator in c#

In this post i want to write about the implementation an expression evaluator. This expression evaluator evaluates simple c style expressions. I wrote it to implement defined rules like in my building a rule engine in c# post. A description about the use of the expression evaluator can be found at building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions). You can download the expression evaluator and the source code from (http://simpleexpeval.codeplex.com/). Now i want to show the use of the expression evaluator and then i will describe the implementation of the parts of the expression evaluator. That is the list of operations the expression evaluator supports:

  • open bracket ‘(‘ and close bracket ‘)’
  • addition ‘+’
  • subtraction ‘-‘
  • multiblication ‘*’
  • division ‘/’
  • modulo ‘%’
  • equality ‘=’
  • unequality ‘!=’
  • bigger than ‘>’
  • bigger than or equal ‘>=’
  • smaller than ‘<‘
  • smaller than or equal ‘<=’
  • logical and ‘&&’
  • logical or ‘||’

and the following data types:

  • integer ’10’
  • double ‘10.2’
  • string ‘”test”‘
  • boolean ‘true’ or ‘false’

Here is a example of the use of the expression evaluator.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
var ruleText = " (Children = 2 && Married = true) || Age > 36 ";
Evaluator evaluator = new Evaluator();
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

As we see the expression evaluator takes a rule text and a object. In this case it is a person object with four properties. I use this properties in the rule text and the expression evaluator replaces the property name with the value of the overtaken object. That means that for the expression evaluator the rule text looks like ” (2 = 2 && true = true) || 36 > 36 “. Now the expression evaluator can evaluate that text and check if the expression is true or false. In this case it is true because 2 = 2 and true = true.
The expression evaluator consist of four parts. The Lexer that reads every character, the Parser that builds the expression parts the infix to postfix converter and the evaluator that takes the expression parts and evaluates it to a specific result. I want to start with a description of the lexer. It is a so called LL(k) lexer that means it left associative and starts with the first left character. The k means how many character the lexer can look ahead. In this case it is only one character. Here is the class diagram of the lexer.

SimpleExpressionEvaluatorLexer

In the Lexer class i have implementet the basis functionality for a LL(k) lexer. The ExpressionEvaluatorLexer class contains the specific implementation for the expression evaluator. The most importand part is the NextToken method that reads character after character and builds Tokens out of the characters. Here is the code for the NextToken method.

public override Token NextToken()
{
    while (lookahead[lookaheadIndex] != EOF)
    {
        switch (lookahead[lookaheadIndex])
        {
            case '.': ConsumeItem(); return new Token(TokenType.POINT, ".");
            case '+': ConsumeItem(); return Plus();
            case '-': ConsumeItem(); return Minus();
            case '*': ConsumeItem(); return Multibly();
            case '/': ConsumeItem(); return Divide();
            case '%': ConsumeItem(); return Modulo();
            case '<': ConsumeItem(); return Smaller();
            case '>': ConsumeItem(); return Greater();
            case '=': ConsumeItem(); return Equal();
            case '!': ConsumeItem(); return Unequal();
            case '"': ConsumeItem(); return String();
            case '\'': ConsumeItem(); return SingleQuoteString();
            case '|': ConsumeItem(); return Or();
            case '&': ConsumeItem(); return And();                    
            case '(': ConsumeItem(); return new Token(TokenType.OPENBRACKET, "(");
            case ')': ConsumeItem(); return new Token(TokenType.CLOSEBRACKET, ")");
            default:                        
                if (IsLetter())
                {
                    return Terminal();
                }
                if (IsNumber())
                {
                    return Number();
                }
                if (IsWhitespace())
                {
                    ConsumeItem();
                    continue;
                }
                throw new Exception("invalid character." + lookahead[lookaheadIndex]);
        }
    }
    return new Token(TokenType.EOF, "<EOF>");
}

As we see it is a loop that loops every character in the lookahead char array. If it finds a specific character for example a digit, the lexer assumes that it has an integer or an double. So i calls the Number method.

private Token Number()
{
    bool hasPoint = false;
    StringBuilder stringBuilder = new StringBuilder();
    while (IsNumber() || IsPoint() || IsExponent())
    {
        if (IsPoint())
        {
            hasPoint = true;
        }
        stringBuilder.Append(lookahead[lookaheadIndex]);
        ConsumeItem();
    }
    if (hasPoint)
        return new Token(TokenType.DOUBLE, stringBuilder.ToString());
    return new Token(TokenType.INTEGER, stringBuilder.ToString());
}

The Number method tries to find out if the text is a integer or a double and it returns an integer or an double Token with the value it read. The lexer does this for every character and the parser controls the lexer and tells it to read the next token. That means the lexer does not read every character in the character array, it reads it token by token. If it has identified one token it returns the control back to the parser. Here we see the class diagram of the parser.

SimpleExpressionEvaluatorParser

The ExpressionEvaluationParser reads the token from the lexer and builds a list of AbstractSyntaxTreeNodes. That nodes get rearranged by the InfixToPostfix class and then the result list is passed to the expression evaluator. That is necessary because the expression evaluator can handle operator precedence, that means that a multiblication is performed before and addition or substraction and that calculations are performed before comparison operations.

public AbstractSyntaxTreeNode BuildAST()
{                                    
    if (lookahead[lookaheadIndex].Type_ == TokenType.EQUAL)
    {
        return Equal();
    }
    else if (lookahead[lookaheadIndex].Type_ == TokenType.INTEGER)
    {
        return Integer(lookahead[lookaheadIndex].Text);
    }
    ...
public EqualNode Equal()
{
     EqualNode equalNode = new EqualNode();
     Match(TokenType.EQUAL);
     return equalNode;
}
public IntegerNode Integer(string value)
{
    IntegerNode integerNode = new IntegerNode();
    Match(TokenType.INTEGER);            
    integerNode.Value = int.Parse(value);
    return integerNode;
}

That is the parser code that perfoms the transformation from the tokens to the AbstractSyntaxTreeNodes. That means if it finds an Eqaul token it builds a EqualNode and consumes the next token, if it finds a Integer token it overtakes the value of the Integer, builds a IntegerNode with the overtaken value and consumes the next token with the match methode. The parser has a tranformation method for every token. We now have a list of AbstrancSyntaxTreeNodes that represent the overtaken text. After that we have to get the AbstractSyntaxTreeNodes in the correct order to handle operator precedence and open and closed brackets. The best way to do that is to convert the expression that is defined in infix into a postfix expression. The benefit of a postfix operation is that it is much easier to evaluate than an infix expression. All we need for the infix to postfix transformation is an operator stack. I will show an example for the transformation of a list of infix operations to a list of postfix operations. We start with the expression 5 + 3 * 2 – 1. To convert that infix expression to a postfix expression we take every part of the expression and tranform it. We need a stack for the operator precedence and a list that contains the postfix expressions after the transformation.

  • 5 => to List ( 5 )
  • + => to Operator Stack ( + )
  • 3 => to List ( 5, 3 )
  • * => to Operator Stack ( +, * )
  • 2 => to List ( 5, 3, 2 )
  • – => lower precedence than * on Stack => to List ( 5, 3, 2, * ) => Operator Stack ( +, – )
  • 1 => to List ( 5, 3, 2, *, 1 )
  • pop Operator Stack ( +, – ) => to List ( 5, 3, 2, *, 1, -, + )

Thats how the postfix list looks after the transformation: ( 5, 3, 2, *, 1, -, + )
Now we have one big benefit, we can take that list and evaluate it from the beginning to the end without mind of the operator precedence. But what is if we have brackets in out expression for example the expression ( 5 + 3 ) * 2. We can handle that also very easily.

  • ( => handle bracket, instanciate new Operator Stack
  • 5 => to List ( 5 )
  • + => to new Operator Stack ( + )
  • 3 => to List ( 5, 3 )
  • ) => close bracket pop Operator Stack ( + ) => to List ( 5, 3, + )
  • * => to Operator Stack ( * )
  • 2 => to List ( 5, 3, +, 2 )
  • pop Operator Stack ( * ) => to List ( 5, 3, +, 2, * )

As we see all brackets are gone and if we use postfix we dont need them anymore, if we evaluate that expression we can calculate the right result without any brackets.

What we now need to do is do define the operator precedence for all operators. That is necessary because at first the expression evaluator has to perform calculations (multiblication, division and modulo before addition and subtraction) then the expression evaluator has to check for comparison ( equal, unequal, greater than, smaller than). After he evaluated this it has to check for logical combinators (logical and, logical or). So that is the operator precedence of the expression evaluator.

  • *, /, % have the highest order
  • +, – have the second highest order
  • =, !=, >, >=, <, <= have the third highest order
  • &&, || have the lowest order

Here is the code for the ConvertInfixToPostfix expression that performs exact that transformation from an infix to a postfix expression.

public List<AbstractSyntaxTreeNode> ConvertInfixToPostfix(List<AbstractSyntaxTreeNode> postfixList)
{
    List<AbstractSyntaxTreeNode> returnList = new List<AbstractSyntaxTreeNode>();
    Stack<AbstractSyntaxTreeNode> operatorStack = new Stack<AbstractSyntaxTreeNode>();
    var position = 0;
    for (int i = position; i < postfixList.Count; i++)            
    {
        var item = postfixList[i];
        if (item is IntegerNode || item is DoubleNode || item is VariableNode ||
            item is BooleanNode || item is StringNode)
        {
            returnList.Add(item);
        }
        else if (item is OpenBracketNode)
        {
            i = ConvertInfixToPostfixBracket(i, postfixList, returnList);
        }
        else if (item is OrNode || item is AndNode)
        {
            while (operatorStack.Count > 0)
            {
                var abstractSyntaxTreeNode = operatorStack.Pop();
                returnList.Add(abstractSyntaxTreeNode);
            }
            operatorStack.Push(item);
        }
        else if (item is GreaterThenNode || item is GreaterThenOrEqualNode ||
            item is SmallerThenNode || item is SmallerThenOrEqualNode ||
            item is EqualNode || item is UnEqualNode)
        {
            if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || 
                operatorStack.Peek() is DivNode ||
                operatorStack.Peek() is ModuloNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            else if (operatorStack.Count() > 0 && (operatorStack.Peek() is AddNode || 
                operatorStack.Peek() is SubNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            operatorStack.Push(item);
        }
        else if (item is AddNode || item is SubNode)
        {
            if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || 
                operatorStack.Peek() is DivNode ||
                operatorStack.Peek() is ModuloNode))
            {
                AbstractSyntaxTreeNode node = operatorStack.Pop();
                returnList.Add(node);
            }
            operatorStack.Push(item);
        }
        else if (item is MulNode || item is DivNode || item is ModuloNode)
        {
            operatorStack.Push(item);
        }                
        position++;
    }
    while (operatorStack.Count > 0)
    {
        var abstractSyntaxTreeNode = operatorStack.Pop();
        returnList.Add(abstractSyntaxTreeNode);
    }
    return returnList;
}

As we see the code is very simple it loops all AbstractSyntaxTreeNodes in the overtaken list and if it finds a Integer, Double, String or Boolean AbstractSyntaxTreeNode it adds it to the return list. If it finds a add or sub Node it checks the stack. If it finds a mul, div or mod Node it pops the Node (that has a higher precedence than the add or sub Node) and adds it to the list, then it pushes the actual operator to the stack. It works simular for comparison operators ( equal, unequal … ) and for logical operators ( and, or). If it comes to a open bracket it calls the ConvertInfixToPostfixOpenBracket method. That method instanciates a new operator stack and works the same as the ConvertInfixToPostfix method. It calls it self recursive for nested brackets. Now we have brought our AbstractSyntaxTreeNodes in the correct order and we can overhand our postfix list to the expression evaluator. That evaluator takes one node after the other and evaluates them. For that evaluation we need a stack that takes the result of the calculations. For example if we want to calculate the result of the expression ( 5, 3, 2, *, 1, -, + )

  • 5 => to the Stack ( 5 )
  • 3 => to the Stack ( 5, 3 )
  • 2 => to the Stack ( 5, 3, 2 )
  • * => take first two parameter from the Stack ( 2, 3 ) and interchange them because they are in wrong order. ( 3, 2 ) => calculate the multiblication 3 * 2 = 6 => push result to Stack ( 5, 6 )
  • 1 => to the Stack ( 5, 6, 1 )
  • – => take first two parameter from the Stack ( 1, 6 ) and interchange them because they are in wrong order. ( 6, 1 ) => calcualte the subtraction 6 – 1 = 5 => push result to Stack ( 5, 5 )
  • + => take first two parameter from the Stack ( 5, 5 ) and interchange them because they are in wrong order. ( 5, 5 ) => calculate the addition 5 + 5 = 10 => push result to Stack

Now we can take the result from the Stack ( 10 ) => that is the result. Here is the code that shows the Evaluate method that takes the list of the AbstractSyntaxTreeNode and evaluates them.

public bool Evaluate<T>(List<AbstractSyntaxTreeNode> postfixList, 
    Dictionary<string, AbstractSyntaxTreeNode> symbolTable, T objectValue)
{
    foreach (var item in postfixList)
    {
        if (item is IntegerNode || item is DoubleNode || 
            item is BooleanNode || item is StringNode)
        {
            valueStack.Push(item);
        }
        else if (item is AddNode)
        {
            var secondoperand = valueStack.Pop();
            var firstoperand = valueStack.Pop();
            if (secondoperand is IntegerNode && firstoperand is IntegerNode)
            {
                 IntegerNode integerNode = new IntegerNode();
                 integerNode.Value = ((IntegerNode)firstoperand).Value + 
                      ((IntegerNode)secondoperand).Value;
                 valueStack.Push(integerNode);
            }
         }
         else if (item is MulNode)
         {
              var secondoperand = valueStack.Pop();
              var firstoperand = valueStack.Pop();
              if (secondoperand is IntegerNode && firstoperand is IntegerNode)
              {
                   IntegerNode integerNode = new IntegerNode();
                        integerNode.Value = ((IntegerNode)firstoperand).Value *
                   ((IntegerNode)secondoperand).Value;
                   valueStack.Push(integerNode);
              }
         }
         else if (item is EqualNode)
         {
              var secondoperand = valueStack.Pop();
              var firstoperand = valueStack.Pop();
              if (secondoperand is IntegerNode && firstoperand is IntegerNode)
              {
                   BooleanNode booleanNode = new BooleanNode();
                   booleanNode.Value = ((IntegerNode)firstoperand).Value == 
                        ((IntegerNode)secondoperand).Value;
                   valueStack.Push(booleanNode);
         }
}
...

The evaluator part is really easy, it uses on stack for the values and the calculation results. So it takes on AbstractSyntaxTreeNode after the other and if it is data it puts it to the stack, if it is an operator it performs the operation and puts the result to the stack. If it gets the following postfixlist ( IntegerNode(4), IntegerNode(5), IntegerNode(3), MulNode(*),AddNode(+)) it starts with pushing 4 to the stack, pushing 5 to the stack, pushing 3 to the stack. Poping 3 and 5 from the stack perform the multiblication and push 15 to the stack. Then it pops 15 and 4 from the stack, performs the addition and pushes 19 to the stack. So the result would be 15. The problem is that the Evaluate methode returns bool. That is because the expression evaluator can only perform expression the have true or false as result. So if you evaluate ( 4 + 5 * 3 ) you get an exception. You have to write ( 4 + 5 * 3 = 19 ) that is a correct expression that returns true or false. The expression evaluator was written to perform rules against the properties of objects (building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions)) and a rule has to return always true or false. You can download the source code from http://simpleexpeval.codeplex.com/.

Advertisements

Building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions)

In the first three posts of building a rule engine in c# building a rule engine in c#, building a rule engine in c# (part 2: extending the rule engine to handle collections), building a rule engine in c# (part 3: extending the rule engine to handle aggregations of collections) the definition of the rules are very basic and did not look very clear. So i decided to implement an expression evaluator. With this expression evaluator the definition of the rules becomes much clearer. I will show the difference in the next three pictures. The first one shows the definition of the rules how i described it in the past three posts.

ProcessingEngineDatabase

This second picture shows the definition with a expression evaluator.

ComplexProcessingRule

If we use a expression evaluator we can also use logical combinators (and, or) to combine the defined rules. The third picture shows how the processing rules would look like if they are combined.

ComplexProcessingRuleCombinator2

The problem with that processing rules is that in the first three post i used expression trees to generate the rules but i had the seperate parts of the rules and so i needed no parser. Now we have to develop a parser that parses the expression and an evaluator that checks the overtaken object against the parsed expression. In this post i will show how to use the expression evaluator i posted at codeplex (https://simpleexpeval.codeplex.com/) to validate a given object against the defined rule. In the additional blog post implementing an expression evaluator in c# i describes how the expression evaluator works. But in this post i will describe how to use it to define the rules you need to check your object again. The expression evaluator support the following operators:

  • open bracket ‘(‘ and close bracket ‘)’
  • addition ‘+’
  • subtraction ‘-‘
  • multiblication ‘*’
  • division ‘/’
  • modulo ‘%’
  • equality ‘=’
  • unequality ‘!=’
  • bigger than ‘>’
  • bigger than or equal ‘>=’
  • smaller than ‘<‘
  • smaller than or equal ‘<=’
  • logical and ‘&&’
  • logical or ‘||’

and the following data types:

  • integer ’10’
  • double ‘10.2’
  • string ‘”test”‘
  • boolean ‘true’ or ‘false’

This is a example how to use the expression evaluator with a defined person that is evaluated against a rule. That is the definition of the person class:

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public int Children { get; set; }
    public bool Married { get; set; }
}

Now the defined rule text (” (Children = 2 && Married = true) || Age > 36 “) is evaluated against the defined person object.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
RuleLoader ruleLoader = new RuleLoader();
Rule processingRule = ruleLoader.LoadProcessingRule(5);
//processingRule.ProcessingRule = " (Children = 2 && Married = true) || Age > 36 ";
Evaluator evaluator = new Evaluator();
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

The evaluatorResult is true because the person object has two children and is married. Here is a second example that shows that it is possible to use properties at any possition in the rule text.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true };
RuleLoader ruleLoader = new RuleLoader();
Rule processingRule = ruleLoader.LoadProcessingRule(5);
//processingRule.ProcessingRule = " Name != 'test' && Children <= Age / 20 "; 
Evaluator evaluator = new Evaluator(); 
var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

The evaluatorResult is also true because Age / 20 is smaller than Children and the name is unequal ‘test’. If you want to use the expression evaluation engine you can download it from codeplex (https://simpleexpeval.codeplex.com/) and add a reference in your visual studio project. Then you can use the Evaluator class to evaluate the defined rule text against an overtaken object. In the blog post implementing an expression evaluator in c# i describe the implementation of the expression evaluator and in the post building a rule engine in c# (part 5: bringing it all together) i try to give an overview of the usage of the ruleengine.codeplex.com project.


Implementing a regular expression to state machine parser. (part 1: introduction)

The idea behind this post is the generation of regular expressions. After writing a regular regular expression i read a book about data structures. In this book the author writes about implementing a regular expression parser with the help of a state maschine. The basic idea is to build a state maschine out of the regex expression, and then let the regex text flow through that state machine. If the text passes the state machine it corresponds with the regex expression, if not it throws an error. That principle can be transformed to a simple custom language. So in this post i first will show how to implement a subset of the regular expressions. In a second post i will describe the implementation of the RegexToStateMachine project. Then i will show how to transform a custom language that has control commands to send and receive data to a state machine. This control commands will be parsed and checked with the generated state maschine.

Now i will start with theoretical description of the process of generating a state machine out of a subset of the  regular expressions. This are the definied commands:

  1. c – Matches any literal character c
  2. . – Matches any single character
  3. * – Matches one or more occurrences of the previous character or characters
  4. () – Brackets to group the characters
  5. | – Logical or

First we have to think about how we could build a dynamic state machine out of a given set of characters. That is very simple. We parse each character and build a State object for it and we build a state transition from the current state to the next state. If we do so we get a state maschine where we have only one state transition from one state to another state.

For example if we have the regular expression:

aabba

we have the state maschine:

aabba

Now we have the text “aabba”, that text would flow through that state maschine and would throw now error. Every other text would fail and throw an error. But what is with regular expressions that contains repeating. For example the following expression.

a* (a can be repeated n times)

If we want a state machine that works for such a regular expression we need a different state transition. The following graphic shows how that state transition has to look.

singe-charcter closure

The state * has a jump back to the previous character. If we have the text “aaaaa” it would work with that state machine. But what is with the following regular expression.

(abc)*

For that expression the state machine has to look like the one on that following graphic. So we need to include a jump back to the last open bracket. .

closue expression

If we have the text “abcabcabc” it would work with that state machine. Now we have to check how the | operator that stands for logical or works. If we have the following regular expression.

(ab|cd)

We need the following state machine with the following transitions. So the state machine has to jump to the next character behind the pipe operator. .

or expression

If we have a program that can dynamical build a state machine for all that operations we have to ensure that we can combine all that operations so that the following regular expression would also work.

(ab|ef)*

The following graphic shows the state machine that combines the or (|) operation with the repeat (*) operation.

repeat and or
As we see there are many texts that would fit that regular espression. For example “abab” or “efefef”. Now we have definied the operations for the regular expression to state machine parser. So with this transformations we now know how the state machine has to look for the defined commands. What we now need is a program that converts the expression in the corresponding state machine and checks if a given text flows through the generated state machine.

The description of a possible implementation is shown in the second part of that post series. The source code is available at RegexToStateMachine.


Using JsonToStaticTypeConverter (part 1: introduction)

In this post i will write about the JsonToStaticTypeConverter project and how to use it. I created that project after implementing a Json parser. After finishing the parser i thought about where i could use this. So the result is an application that can generate c# classes out of a Json string. So with the  JsonToStaticTypeConverter you can generate c# classes and fill them with your Json data.

If we press generate a class Addresse im namespace MainTest in the file Addresse.cs in the folder C:\Projecte is generated out of the Json code (in this case { ‘street’ : ‘Teststreet’, ‘number’ : 1, ‘city’ : ‘Graz’ }. Here we see the generated c# file:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using JsonConverter;
namespace MainTest
{
public class Addresse : JsonStaticObject
{
public string street { get { return JsonSource.Find<string>("Addresse", "street"); } }
public double number { get { return JsonSource.Find<double>("Addresse", "number"); } }
public string city { get { return JsonSource.Find<string>("Addresse", "city"); } }
}
}

This c# file is inserted into the solution that wants to process the Json data. Here we see how the JsonConverter is used to generate the static structure of Addresse out of the Json data. We also need to reference the JsonConverter.dll from the JsonToStaticTypeConverter Project. Therefore we use the Deserialize method and overtake the Json string.

var jsonConverter = new JsonConverter<MainTest.Addresse>();
MainTest.Addresse address = jsonConverter.Deserialize(
    "{ 'street' : 'Gürtel', 'number' : 23, 'city' : 'Vienna' }");
var street = address.street;
var number = address.number;
var city = address.city;
Trace.WriteLine(street);
Trace.WriteLine(number);
Trace.WriteLine(city);

Here we see the trace result:

So with the JsonConverter it is possible to use the Json data due to a static structure. In the second post of this series i will write about the implementation of JsonToStaticTypeConverter and about the implementation of the Json ParserCombinator.