# implementing expression evaluator in c#

**Posted:**04/03/2013

**Filed under:**.net, c#, Lexer, Parser, Parsing |

**Tags:**ast, c#, expression evaluation, lexer, parser, Parsing 6 Comments

In this post i want to write about the implementation an expression evaluator. This expression evaluator evaluates simple c style expressions. I wrote it to implement defined rules like in my building a rule engine in c# post. A description about the use of the expression evaluator can be found at building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions). You can download the expression evaluator and the source code from (http://simpleexpeval.codeplex.com/). Now i want to show the use of the expression evaluator and then i will describe the implementation of the parts of the expression evaluator. That is the list of operations the expression evaluator supports:

- open bracket ‘(‘ and close bracket ‘)’
- addition ‘+’
- subtraction ‘-‘
- multiblication ‘*’
- division ‘/’
- modulo ‘%’
- equality ‘=’
- unequality ‘!=’
- bigger than ‘>’
- bigger than or equal ‘>=’
- smaller than ‘<‘
- smaller than or equal ‘<=’
- logical and ‘&&’
- logical or ‘||’

and the following data types:

- integer ’10’
- double ‘10.2’
- string ‘”test”‘
- boolean ‘true’ or ‘false’

Here is a example of the use of the expression evaluator.

Person person = new Person() { Name = "Mathias", Age = 36, Children = 2, Married = true }; var ruleText = " (Children = 2 && Married = true) || Age > 36 "; Evaluator evaluator = new Evaluator(); var evaluatorResult = evaluator.Evaluate<Person>(ruleText, person);

As we see the expression evaluator takes a rule text and a object. In this case it is a person object with four properties. I use this properties in the rule text and the expression evaluator replaces the property name with the value of the overtaken object. That means that for the expression evaluator the rule text looks like ” (2 = 2 && true = true) || 36 > 36 “. Now the expression evaluator can evaluate that text and check if the expression is true or false. In this case it is true because 2 = 2 and true = true.

The expression evaluator consist of four parts. The Lexer that reads every character, the Parser that builds the expression parts the infix to postfix converter and the evaluator that takes the expression parts and evaluates it to a specific result. I want to start with a description of the lexer. It is a so called LL(k) lexer that means it left associative and starts with the first left character. The k means how many character the lexer can look ahead. In this case it is only one character. Here is the class diagram of the lexer.

In the Lexer class i have implementet the basis functionality for a LL(k) lexer. The ExpressionEvaluatorLexer class contains the specific implementation for the expression evaluator. The most importand part is the NextToken method that reads character after character and builds Tokens out of the characters. Here is the code for the NextToken method.

public override Token NextToken() { while (lookahead[lookaheadIndex] != EOF) { switch (lookahead[lookaheadIndex]) { case '.': ConsumeItem(); return new Token(TokenType.POINT, "."); case '+': ConsumeItem(); return Plus(); case '-': ConsumeItem(); return Minus(); case '*': ConsumeItem(); return Multibly(); case '/': ConsumeItem(); return Divide(); case '%': ConsumeItem(); return Modulo(); case '<': ConsumeItem(); return Smaller(); case '>': ConsumeItem(); return Greater(); case '=': ConsumeItem(); return Equal(); case '!': ConsumeItem(); return Unequal(); case '"': ConsumeItem(); return String(); case '\'': ConsumeItem(); return SingleQuoteString(); case '|': ConsumeItem(); return Or(); case '&': ConsumeItem(); return And(); case '(': ConsumeItem(); return new Token(TokenType.OPENBRACKET, "("); case ')': ConsumeItem(); return new Token(TokenType.CLOSEBRACKET, ")"); default: if (IsLetter()) { return Terminal(); } if (IsNumber()) { return Number(); } if (IsWhitespace()) { ConsumeItem(); continue; } throw new Exception("invalid character." + lookahead[lookaheadIndex]); } } return new Token(TokenType.EOF, "<EOF>"); }

As we see it is a loop that loops every character in the lookahead char array. If it finds a specific character for example a digit, the lexer assumes that it has an integer or an double. So i calls the Number method.

private Token Number() { bool hasPoint = false; StringBuilder stringBuilder = new StringBuilder(); while (IsNumber() || IsPoint() || IsExponent()) { if (IsPoint()) { hasPoint = true; } stringBuilder.Append(lookahead[lookaheadIndex]); ConsumeItem(); } if (hasPoint) return new Token(TokenType.DOUBLE, stringBuilder.ToString()); return new Token(TokenType.INTEGER, stringBuilder.ToString()); }

The Number method tries to find out if the text is a integer or a double and it returns an integer or an double Token with the value it read. The lexer does this for every character and the parser controls the lexer and tells it to read the next token. That means the lexer does not read every character in the character array, it reads it token by token. If it has identified one token it returns the control back to the parser. Here we see the class diagram of the parser.

The ExpressionEvaluationParser reads the token from the lexer and builds a list of AbstractSyntaxTreeNodes. That nodes get rearranged by the InfixToPostfix class and then the result list is passed to the expression evaluator. That is necessary because the expression evaluator can handle operator precedence, that means that a multiblication is performed before and addition or substraction and that calculations are performed before comparison operations.

public AbstractSyntaxTreeNode BuildAST() { if (lookahead[lookaheadIndex].Type_ == TokenType.EQUAL) { return Equal(); } else if (lookahead[lookaheadIndex].Type_ == TokenType.INTEGER) { return Integer(lookahead[lookaheadIndex].Text); } ... public EqualNode Equal() { EqualNode equalNode = new EqualNode(); Match(TokenType.EQUAL); return equalNode; } public IntegerNode Integer(string value) { IntegerNode integerNode = new IntegerNode(); Match(TokenType.INTEGER); integerNode.Value = int.Parse(value); return integerNode; }

That is the parser code that perfoms the transformation from the tokens to the AbstractSyntaxTreeNodes. That means if it finds an Eqaul token it builds a EqualNode and consumes the next token, if it finds a Integer token it overtakes the value of the Integer, builds a IntegerNode with the overtaken value and consumes the next token with the match methode. The parser has a tranformation method for every token. We now have a list of AbstrancSyntaxTreeNodes that represent the overtaken text. After that we have to get the AbstractSyntaxTreeNodes in the correct order to handle operator precedence and open and closed brackets. The best way to do that is to convert the expression that is defined in infix into a postfix expression. The benefit of a postfix operation is that it is much easier to evaluate than an infix expression. All we need for the infix to postfix transformation is an operator stack. I will show an example for the transformation of a list of infix operations to a list of postfix operations. We start with the expression 5 + 3 * 2 – 1. To convert that infix expression to a postfix expression we take every part of the expression and tranform it. We need a stack for the operator precedence and a list that contains the postfix expressions after the transformation.

- 5 => to List ( 5 )
- + => to Operator Stack ( + )
- 3 => to List ( 5, 3 )
- * => to Operator Stack ( +, * )
- 2 => to List ( 5, 3, 2 )
- – => lower precedence than * on Stack => to List ( 5, 3, 2, * ) => Operator Stack ( +, – )
- 1 => to List ( 5, 3, 2, *, 1 )
- pop Operator Stack ( +, – ) => to List ( 5, 3, 2, *, 1, -, + )

Thats how the postfix list looks after the transformation: ( 5, 3, 2, *, 1, -, + )

Now we have one big benefit, we can take that list and evaluate it from the beginning to the end without mind of the operator precedence. But what is if we have brackets in out expression for example the expression ( 5 + 3 ) * 2. We can handle that also very easily.

- ( => handle bracket, instanciate new Operator Stack
- 5 => to List ( 5 )
- + => to new Operator Stack ( + )
- 3 => to List ( 5, 3 )
- ) => close bracket pop Operator Stack ( + ) => to List ( 5, 3, + )
- * => to Operator Stack ( * )
- 2 => to List ( 5, 3, +, 2 )
- pop Operator Stack ( * ) => to List ( 5, 3, +, 2, * )

As we see all brackets are gone and if we use postfix we dont need them anymore, if we evaluate that expression we can calculate the right result without any brackets.

What we now need to do is do define the operator precedence for all operators. That is necessary because at first the expression evaluator has to perform calculations (multiblication, division and modulo before addition and subtraction) then the expression evaluator has to check for comparison ( equal, unequal, greater than, smaller than). After he evaluated this it has to check for logical combinators (logical and, logical or). So that is the operator precedence of the expression evaluator.

- *, /, % have the highest order
- +, – have the second highest order
- =, !=, >, >=, <, <= have the third highest order
- &&, || have the lowest order

Here is the code for the ConvertInfixToPostfix expression that performs exact that transformation from an infix to a postfix expression.

public List<AbstractSyntaxTreeNode> ConvertInfixToPostfix(List<AbstractSyntaxTreeNode> postfixList) { List<AbstractSyntaxTreeNode> returnList = new List<AbstractSyntaxTreeNode>(); Stack<AbstractSyntaxTreeNode> operatorStack = new Stack<AbstractSyntaxTreeNode>(); var position = 0; for (int i = position; i < postfixList.Count; i++) { var item = postfixList[i]; if (item is IntegerNode || item is DoubleNode || item is VariableNode || item is BooleanNode || item is StringNode) { returnList.Add(item); } else if (item is OpenBracketNode) { i = ConvertInfixToPostfixBracket(i, postfixList, returnList); } else if (item is OrNode || item is AndNode) { while (operatorStack.Count > 0) { var abstractSyntaxTreeNode = operatorStack.Pop(); returnList.Add(abstractSyntaxTreeNode); } operatorStack.Push(item); } else if (item is GreaterThenNode || item is GreaterThenOrEqualNode || item is SmallerThenNode || item is SmallerThenOrEqualNode || item is EqualNode || item is UnEqualNode) { if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || operatorStack.Peek() is DivNode || operatorStack.Peek() is ModuloNode)) { AbstractSyntaxTreeNode node = operatorStack.Pop(); returnList.Add(node); } else if (operatorStack.Count() > 0 && (operatorStack.Peek() is AddNode || operatorStack.Peek() is SubNode)) { AbstractSyntaxTreeNode node = operatorStack.Pop(); returnList.Add(node); } operatorStack.Push(item); } else if (item is AddNode || item is SubNode) { if (operatorStack.Count() > 0 && (operatorStack.Peek() is MulNode || operatorStack.Peek() is DivNode || operatorStack.Peek() is ModuloNode)) { AbstractSyntaxTreeNode node = operatorStack.Pop(); returnList.Add(node); } operatorStack.Push(item); } else if (item is MulNode || item is DivNode || item is ModuloNode) { operatorStack.Push(item); } position++; } while (operatorStack.Count > 0) { var abstractSyntaxTreeNode = operatorStack.Pop(); returnList.Add(abstractSyntaxTreeNode); } return returnList; }

As we see the code is very simple it loops all AbstractSyntaxTreeNodes in the overtaken list and if it finds a Integer, Double, String or Boolean AbstractSyntaxTreeNode it adds it to the return list. If it finds a add or sub Node it checks the stack. If it finds a mul, div or mod Node it pops the Node (that has a higher precedence than the add or sub Node) and adds it to the list, then it pushes the actual operator to the stack. It works simular for comparison operators ( equal, unequal … ) and for logical operators ( and, or). If it comes to a open bracket it calls the ConvertInfixToPostfixOpenBracket method. That method instanciates a new operator stack and works the same as the ConvertInfixToPostfix method. It calls it self recursive for nested brackets. Now we have brought our AbstractSyntaxTreeNodes in the correct order and we can overhand our postfix list to the expression evaluator. That evaluator takes one node after the other and evaluates them. For that evaluation we need a stack that takes the result of the calculations. For example if we want to calculate the result of the expression ( 5, 3, 2, *, 1, -, + )

- 5 => to the Stack ( 5 )
- 3 => to the Stack ( 5, 3 )
- 2 => to the Stack ( 5, 3, 2 )
- * => take first two parameter from the Stack ( 2, 3 ) and interchange them because they are in wrong order. ( 3, 2 ) => calculate the multiblication 3 * 2 = 6 => push result to Stack ( 5, 6 )
- 1 => to the Stack ( 5, 6, 1 )
- – => take first two parameter from the Stack ( 1, 6 ) and interchange them because they are in wrong order. ( 6, 1 ) => calcualte the subtraction 6 – 1 = 5 => push result to Stack ( 5, 5 )
- + => take first two parameter from the Stack ( 5, 5 ) and interchange them because they are in wrong order. ( 5, 5 ) => calculate the addition 5 + 5 = 10 => push result to Stack

Now we can take the result from the Stack ( 10 ) => that is the result. Here is the code that shows the Evaluate method that takes the list of the AbstractSyntaxTreeNode and evaluates them.

public bool Evaluate<T>(List<AbstractSyntaxTreeNode> postfixList, Dictionary<string, AbstractSyntaxTreeNode> symbolTable, T objectValue) { foreach (var item in postfixList) { if (item is IntegerNode || item is DoubleNode || item is BooleanNode || item is StringNode) { valueStack.Push(item); } else if (item is AddNode) { var secondoperand = valueStack.Pop(); var firstoperand = valueStack.Pop(); if (secondoperand is IntegerNode && firstoperand is IntegerNode) { IntegerNode integerNode = new IntegerNode(); integerNode.Value = ((IntegerNode)firstoperand).Value + ((IntegerNode)secondoperand).Value; valueStack.Push(integerNode); } } else if (item is MulNode) { var secondoperand = valueStack.Pop(); var firstoperand = valueStack.Pop(); if (secondoperand is IntegerNode && firstoperand is IntegerNode) { IntegerNode integerNode = new IntegerNode(); integerNode.Value = ((IntegerNode)firstoperand).Value * ((IntegerNode)secondoperand).Value; valueStack.Push(integerNode); } } else if (item is EqualNode) { var secondoperand = valueStack.Pop(); var firstoperand = valueStack.Pop(); if (secondoperand is IntegerNode && firstoperand is IntegerNode) { BooleanNode booleanNode = new BooleanNode(); booleanNode.Value = ((IntegerNode)firstoperand).Value == ((IntegerNode)secondoperand).Value; valueStack.Push(booleanNode); } } ...

The evaluator part is really easy, it uses on stack for the values and the calculation results. So it takes on AbstractSyntaxTreeNode after the other and if it is data it puts it to the stack, if it is an operator it performs the operation and puts the result to the stack. If it gets the following postfixlist ( IntegerNode(4), IntegerNode(5), IntegerNode(3), MulNode(*),AddNode(+)) it starts with pushing 4 to the stack, pushing 5 to the stack, pushing 3 to the stack. Poping 3 and 5 from the stack perform the multiblication and push 15 to the stack. Then it pops 15 and 4 from the stack, performs the addition and pushes 19 to the stack. So the result would be 15. The problem is that the Evaluate methode returns bool. That is because the expression evaluator can only perform expression the have true or false as result. So if you evaluate ( 4 + 5 * 3 ) you get an exception. You have to write ( 4 + 5 * 3 = 19 ) that is a correct expression that returns true or false. The expression evaluator was written to perform rules against the properties of objects (building a rule engine in c# (part 4: extending the rule engine to evaluate defined expressions)) and a rule has to return always true or false. You can download the source code from http://simpleexpeval.codeplex.com/.