Reading a csv file into a DataTable with Linq

Some times ago i had the requirement to read a csv file into a DataTable and doint it the classical iterative way seems boring to me so i thought about the make all with one linq statement princip and tried to accomplish the whole processing with one linq statement. That are my results i want to share. I stated with the following statement.

public DataTable ImportFirstAdept(FileInfo fileInfo, char splitChar, Encoding encoding)
{
    var plainfileName = Path.GetFileName(fileInfo.FullName);
    DataTable dataTable = new DataTable(plainfileName);
    var allRows = ((Func)(() => File.ReadAllLines(fileInfo.FullName, encoding)))();
    allRows.
        Select(str => str.Split(splitChar)).
        First().
        Select(columnHeader => { return new DataColumn(columnHeader.Trim()); }).
        Foreach(column =>
        {
            dataTable.Columns.Add(column);
            return column;
        }).
        ToList();
    allRows.
        Select(str => str.Split(splitChar)).
        Skip(1).
        Select(strArray =>
        {
            var strings = new List();
            foreach (var s in strArray)
                strings.Add(s.Trim());
            return strings.ToArray();
        }).
        Select(st => dataTable.Rows.Add(st)).
        ToList();
    return dataTable;
}

The follwoing test shows how it works.

public void CsvToDataTableTest(string path)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(path);            
    CsvImport cvsImport = new CsvImport();
    DataSet dataSet = new DataSet();
    if (directoryInfo.Exists)
    {
        foreach (var file in directoryInfo.EnumerateFiles("*.csv"))
        {
            DataTable dataTable = cvsImport.ImportFirstAdept(file, '\t', Encoding.Default);
            dataSet.Tables.Add(dataTable);
        }
    }
}

[TestMethod]
public void CsvToDataTablePrequelsTest()
{
    CsvToDataTableTest(@"CsvToDataTableWithLinq\bin\Debug");
}

If we look at the ImportFirstAdept method it does not use one linq statement it uses two linq statements. So that was not sufficiently and i wrote some extension methodes to only use one Linq statement for the processing of the csv file.

public static class CsvImportExtension
{
    public static void If(this IEnumerable source, Func ifFunc, 
        Action ifAction, Action elseAction)
    {
        var counter = 1;
        List ifActions = new List();
        List elseActions = new List();
        foreach (var item in source)
        {
            if (ifFunc(item, counter))                                    
            {
                ifActions.Add(item);
            }                           
            else
                elseActions.Add(item);
            counter++;                
        }
        ifAction(ifActions.AsEnumerable());
        elseAction(elseActions.AsEnumerable());
    }

    public static void If(this IEnumerable source, Func ifFunc,
        Action ifAction, Action elseAction)
    {
        List ifActions = new List();
        List elseActions = new List();
        foreach (var item in source)
        {
            if (ifFunc(item))
            {
                ifActions.Add(item);
            }
            else
                elseActions.Add(item);
        }
        ifAction(ifActions.AsEnumerable());
        elseAction(elseActions.AsEnumerable());
    }

    public static IEnumerable Foreach(this IEnumerable source, Func foreachFunc)
    {
        foreach (var item in source)
        {
            yield return foreachFunc(item);
        }
    }

    public static void Foreach(this IEnumerable source, Action foreachAction)
    {
        foreach (var item in source)
        {
            foreachAction(item);
        }
    }
}

That is the ImportSecondAdept method that uses the if and foreach extension methodes to only use one linq statement to process the csv file.

public DataTable ImportSecondAdept(FileInfo fileInfo, char splitChar, Encoding encoding)
{
    var plainfileName = Path.GetFileName(fileInfo.FullName);
    var dataTable = new DataTable(plainfileName);
    ((Func)(() => File.ReadAllLines(fileInfo.FullName, encoding)))().
        Foreach(str => str.Split(splitChar)).
        If((t, i) => i == 1,
        t =>
        {
            t.First().
            Foreach(columnHeader => { return new DataColumn(columnHeader.Trim()); }).
            Foreach(column =>
            {
                dataTable.Columns.Add(column);
            });                    
        },
        t =>
        {
            t.
            Foreach(strArray =>
            {
                var strings = new List();
                foreach (var s in strArray)
                    strings.Add(s.Trim());
                return strings.ToArray();
            }).
            Foreach(str => dataTable.Rows.Add(str)).
            ToList();                    
        }
        );           
    return dataTable;
}

That is the test to try it and the result of the second adept method.

public void CsvToDataTableTest(string path)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(path);            
    CsvImport cvsImport = new CsvImport();
    DataSet dataSet = new DataSet();
    if (directoryInfo.Exists)
    {
        foreach (var file in directoryInfo.EnumerateFiles("*.csv"))
        {
            DataTable dataTable = cvsImport.ImportSecondAdept(file, '\t', Encoding.Default);
            dataSet.Tables.Add(dataTable);
        }
    }
}

[TestMethod]
public void CsvToDataTablePrequelsTest()
{
    CsvToDataTableTest(@"CsvToDataTableWithLinq\bin\Debug");
}

csvToDataSetWithLinq

So now it is only one linq statement and the one linq statement princip is suffused.

 

Advertisements

Building a rule engine in c# (part 6: extentions to the rule engine)

When i started implementing the rule engine project (ruleengine.codeplex.com) it was thought as an example project. But i became some comments on the project that show that some people use the project productive and so i was asked to make some extensions to the rule engine. I am glad that someone uses my work an so i implemented the additional features. But the problem is that all that the description of that new features are spread over the comment sections of different posts. So in this post i want to summerice that features.

The first feature i was asked for by martin is that the rule engine has an is in and is not in operator. The FoundIn operator returns true if a person is in a list of given persons. The NotFoundIn operator gives back true if a person is not in a list of given persons. Here is the test i wrote to see if the operator works.

[TestMethod]
public void SimpleRuleLoaderPersonEvaluateFoundInExpression()
{
    Person person1 = new Person() { Name = "Mathias", Age = 36, Children = 2 };
    Person person2 = new Person() { Name = "Anna", Age = 35, Children = 2 };
    Person person3 = new Person() { Name = "Emil", Age = 4, Children = 0 };            

    List<Person> persons = new List<Person>();
    persons.Add(person1);
    persons.Add(person2);

    RuleEngine.RuleEngine ruleEngine = new RuleEngine.RuleEngine();
    var ruleFunc =
        ruleEngine.CompileRule<Person>("Name", Operator.FoundIn, persons);
    var result = ruleFunc(person1);

    Rule<Person> rule = new Rule<Person>("Name", Operator.NotFoundIn, persons);
    var ruleFuncRule =
        ruleEngine.CompileRule<Person>(rule);

    var ruleFuncRuleResult = ruleFuncRule(person3);
    Assert.AreEqual(result, true); 
    Assert.AreEqual(ruleFuncRuleResult, false);
}

To implement that operator i had build a expression tree that is more complex that the simple one for Equal, Unequal, SmallerThen etc. Here is the code that builds the expression tree to check if a person with a given attribute is in a list of persons.

public Tuple<Expression, 
ParameterExpression> BuildExpression<T>(string propertyName, Operator ruleOperator, 
    ParameterExpression parameterExpression, List<T> values)
{                    
    ParameterExpression listExpression = Expression.Parameter(typeof(List<T>));
    ParameterExpression counterExpression = 
Expression.Parameter(typeof(int));
    ParameterExpression toExpression = Expression.Parameter(typeof(int));
    ParameterExpression arrayExpression = Expression.Parameter(typeof(T[]));
    ParameterExpression valueExpression = Expression.Parameter(typeof(T));
    ParameterExpression checkExpression = Expression.Parameter(typeof(T));
    ParameterExpression returnExpression = 
Expression.Parameter(typeof(bool));
    MemberExpression memberExpression = MemberExpression.Property(parameterExpression, propertyName);
    Expression expression = memberExpression.Expression;
    var type = memberExpression.Type;
    ParameterExpression propertyExpression = Expression.Parameter(type);
    ParameterExpression localPropertyExpression = Expression.Parameter(type);

    LabelTarget breakLabel = Expression.Label();
    PropertyInfo result = typeof(List<T>).GetProperty("Count");
    MethodInfo toArray = typeof(List<T>).GetMethod("ToArray");
    var toArrayName = toArray.Name;
    MethodInfo getGetMethod = result.GetGetMethod();
    ConstantExpression constantExpression = Expression.Constant(true);
    if (ruleOperator == Operator.NotFoundIn)
    {
        constantExpression = Expression.Constant(false);
    }
    Expression loop = Expression.Block(
        new ParameterExpression[] { 
toExpression, arrayExpression, valueExpression, counterExpression, 
        returnExpression, propertyExpression, localPropertyExpression, listExpression },
        Expression.Assign(listExpression, Expression.Constant(values)),
        Expression.Assign(toExpression, Expression.Call(listExpression, getGetMethod)),
        Expression.Assign(arrayExpression, Expression.Call(listExpression, toArray)),
        Expression.Assign(propertyExpression, MemberExpression.Property(checkExpression, propertyName)),
        Expression.Loop(
            Expression.IfThenElse(
                Expression.LessThan(counterExpression, toExpression),
                Expression.Block(
                    Expression.Assign(valueExpression, 
Expression.ArrayAccess(arrayExpression, counterExpression)),
                    Expression.Assign(localPropertyExpression, 
Expression.Property(valueExpression, propertyName)),
                    Expression.IfThen(
                        Expression.Equal(propertyExpression, localPropertyExpression),
                        Expression.Block(Expression.Assign(returnExpression, constantExpression),
                            Expression.Break(breakLabel))),
                    Expression.Assign(
Expression.ArrayAccess(arrayExpression, counterExpression), checkExpression),
                    Expression.PostIncrementAssign(counterExpression)),
                Expression.Break(breakLabel)
                ), breakLabel
            ),
            Expression.And(returnExpression, constantExpression)
        );
    return new Tuple<Expression, ParameterExpression>(Expression.Block(loop), checkExpression);
}

This code builds a loop that enumerates a list of objects and checks a specified property (localPropertyExpression specified by propertyName) has the same value as the propertyExpression. If the values are the same the returnExpression is set to true. (if it is the NotFoundIn operator it is set to false). The CompileRule call looks the following:

public Func<T, bool> CompileRule<T>(string propertyName, 
    Operator ruleOperator, List<T> values)
{
    ExpressionBuilder expressionBuilder = new ExpressionBuilder();
    var param = Expression.Parameter(typeof(T));
    Tuple<Expression, ParameterExpression> expression =
        expressionBuilder.BuildExpression<T>(propertyName, ruleOperator, param, values);
    Func<T, bool> compiledExpression = Expression.Lambda<Func<T, bool>>(
        expression.Item1, expression.Item2).Compile();
    return compiledExpression;
}

The differnece to all the other CompileRule methodes is that it takes a list of values. That list of values becomes iterated in the expression tree.

The second feature i was asked for by Mike is that he needed to call a method at the person object if a specific  property of the person has a specific value. The problem was that he needed that in the expression evaluator string. The following code shows what he needed.

var tuple = evaluator.Evaluate<Person>(
                " (Age < 10) then SetCanReceiveBenefits(true) else SetCancelBenefits(true) ");

To check if the evaluation works correct i have added two methodes (SetCanReceiveBenefits and SetCancelBenefits) to the person object.

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public int Children { get; set; }
    public bool Married { get; set; }
    public DateTime Birthdate { get; set; }
    public Adresse Adresse_ { get; set; }
    public bool ReceiveBenefits { get; set; }
    public bool CancelBenefits { get; set; }
    public void SetCanReceiveBenefits(bool receiveBenefits)
    {
        ReceiveBenefits = receiveBenefits;
    }
    public void SetCancelBenefits(bool cancelBenefits)
    {
        CancelBenefits = cancelBenefits;
    }
    private List<Adresse> adresses = new List<Adresse>();
    public List<Adresse> Adresses_ 
    { 
        get { return adresses; } 
        set { adresses = value; } 
    }
}

After that i had to change the lexer and the parser of the evaluation evaluator to handle the then and else keyword. Here is the complete test that shows how to use the then and else keyword.

[TestMethod]
public void SimpleExpressionEvaluatorWithThenMethod()
{
    Person person1 = new Person()
    {
        Name = "Mathias",
        Age = 36,
        Children = 2,
        Married = true,
        Birthdate = new DateTime(1976, 5, 9),
        CancelBenefits = false,
        ReceiveBenefits = false
    };            
    Evaluator evaluator = new Evaluator();
    bool result = evaluator.Evaluate(
        " (Age > 10) then SetCanReceiveBenefits(true) ", person1);
    Assert.AreEqual(person1.ReceiveBenefits, true);
}

Here is the test that shows how the else keyword works.

[TestMethod]
public void SimpleExpressionPreEvaluatorWithThenAndElseMethod()
{
    Person person1 = new Person()
    {
        Name = "Mathias",
        Age = 36,
        Children = 2,
        Married = true,
        Birthdate = new DateTime(1976, 5, 9),
        CancelBenefits = false,
        ReceiveBenefits = false
    };
    Evaluator evaluator = new Evaluator();
    bool result = evaluator.Evaluate(
         " (Age < 10) then SetCanReceiveBenefits(true) else SetCancelBenefits(true) ", person1);
    Assert.AreEqual(person1.CancelBenefits, true);
}

Mike also wrote that he needs the rule engine to work with many objects and asked if it is possible that the rule engine could preprocess the evaluation of the expression. So i splitted the evaluation process and added the PreEvalute method to preprocess the evaluation of the evaluation string and the building of the symbol table. The ExecuteEvaluate method takes the prepocessed abstract syntax tree and the symbol table and does only the evaluation with the given object. Her is a test that shows that preprocessing step.

[TestMethod]
public void SimpleExpressionPreEvaluatorWithThenElseMethod()
{
    Person person1 = new Person()
    {
        Name = "Mathias",
        Age = 36,
        Children = 2,
        Married = true,
        Birthdate = new DateTime(1976, 5, 9),
        CancelBenefits = false,
        ReceiveBenefits = false
    };
    Person person2 = new Person()
    {
        Name = "Anna",
        Age = 32,
        Children = 2,
        Married = false,
        Birthdate = new DateTime(2002, 2, 2),
        CancelBenefits = false,
        ReceiveBenefits = false
    };
    Evaluator evaluator = new Evaluator();
    var tuple = evaluator.PreEvaluate(
        " (Age < 10) then SetCanReceiveBenefits(true) else SetCancelBenefits(true) ");
    evaluator.ExecuteEvaluate(tuple, person1);
    evaluator.ExecuteEvaluate(tuple, person2);
    Assert.AreEqual(person1.CancelBenefits, true);
    Assert.AreEqual(person2.CancelBenefits, true);
}

If you use rule engine in a real project that could save a lot of time because the evaluation needs some time and repeats over and over again if you use the normal evaluate method.

The third feature i was asked for by Damian, and he needed to check if a specific property is null. So i implemented the is null operator. The problem is that in c# value types like int can not take null so this operator works only correct with strings. The following test shows how the is null operator works.

[TestMethod]
public void SimpleExpressionStringIsNullMethod()
{            
    Person person2 = new Person()
    {
        Name = null,
        Age = 32,
        Children = 2,
        Married = false,
        Birthdate = new DateTime(2002, 2, 2),
        CancelBenefits = false,
        ReceiveBenefits = false
    };
    Evaluator evaluator = new Evaluator();
    var result = evaluator.Evaluate(
        " Name is null ", person2);
    Assert.AreEqual(result, true);            
}

So if you want to use the rule engine in real world projects i hope that helps to check all rules you have to check.