Distinct a DataTable with c#

Today i want to write about the different possibilities to distinct the data in a DataTable. I had that problem some times ago and searched for a good solution. I found some different solutions and in that post i will show them. First i want to show the test i wrote to test the distinct of the DataTable and the data stored in the DataTable.

[TestMethod]
public void Distinct()
{
    DataTable dataTable = new DataTable("Distinct");
    var oldPK = new DataColumn() { ColumnName = "PK", DataType = typeof(int) };
    dataTable.Columns.Add(oldPK);
    var secondPK = new DataColumn() { ColumnName = "secondPK", DataType = typeof(int) };
    dataTable.Columns.Add(secondPK);
    dataTable.Columns.Add(new DataColumn() 
        { ColumnName = "firstname", DataType = typeof(string) });
    dataTable.Columns.Add(new DataColumn() 
        { ColumnName = "secondname", DataType = typeof(string) });
    dataTable.Columns.Add(new DataColumn() 
        { ColumnName = "socialSecurityNumber", DataType = typeof(int) });
    dataTable.PrimaryKey = new DataColumn[] { oldPK, secondPK };
    foreach (var number in Enumerable.Range(1, 3))
    {
        DataRow dataRow = dataTable.NewRow();
        dataRow["PK"] = number;
        dataRow["secondPK"] = number + 1;
        dataRow["firstname"] = "mathias";
        dataRow["secondname"] = "schober";
        dataRow["socialSecurityNumber"] = 4040;
        dataTable.Rows.Add(dataRow);
    }
    foreach (var number in Enumerable.Range(4, 2))
    {
        DataRow dataRow = dataTable.NewRow();
        dataRow["PK"] = number;
        dataRow["secondPK"] = number + 1;
        dataRow["firstname"] = "anna";
        dataRow["secondname"] = "schober";
        dataRow["socialSecurityNumber"] = 5050;
        dataTable.Rows.Add(dataRow);
    }
    var newDataTable = 
        dataTable.Distinct("socialSecurityNumber");
    var newDataViewDataTable = 
        dataTable.DistinctDataView("socialSecurityNumber");
    var newEqualityCompareDataTable = 
        dataTable.DistinctEqualityCompare("socialSecurityNumber");
    var newGenericEqualityCompareDataTable = 
        dataTable.DistinctGenericEqualityCompare("socialSecurityNumber");         
}

DistinctDataTable
As we see i wrote four different methods to distinct the DataTable. The first on Distinct is the classical way. I create a new DataTable, take the original DataTable, enumerate all rows and check if the value of the overtaken column (in the test i use socialSecurityNumber) is already in the new DataTable. Then i set the primary key of the new DataTable and reorder the columns to bring the primary key (overtaken column) to the first position. Here you see the code to do so.

public static class Distincting
{
    public static DataTable Distinct(this DataTable dataTable, 
        string columName)
    {
        DataColumn[] dataColumns = dataTable.PrimaryKey;
        if (dataColumns.Length >= 1)
        {
            foreach (DataColumn primaryKey in dataColumns)
            {
                if (primaryKey.ColumnName == columName)
                {
                    return dataTable;
                }                    
            }                
        }            
        return RebuildDataTable(dataTable, columName);
    }

    private static DataTable RebuildDataTable(DataTable dataTable, 
        string columName)
    {
        DataTable newDataTable = dataTable.Clone();
        DataColumn newDataColumn = newDataTable.Columns[columName];
        newDataTable.PrimaryKey = new DataColumn[] { newDataColumn };
        newDataTable.Columns[columName].SetOrdinal(0);
        dataTable.Rows.Foreach(dataRow =>
        {
            object pk = dataRow[columName];
            if (newDataTable.Rows.Find(pk) == null)
            {
                newDataTable.ImportRow(dataRow);
            }
        });
        ReorderPrimaryKey(dataTable, newDataTable);
        return newDataTable;
    }

    private static void ReorderPrimaryKey(DataTable dataTable, 
        DataTable newDataTable)
    {
        int maxOrdinal = 0;
        foreach (DataColumn column in newDataTable.Columns)
        {
            if (column.Ordinal > maxOrdinal)
                maxOrdinal = column.Ordinal;
        }
        DataColumn[] dataColumns = dataTable.PrimaryKey;
        if (dataColumns.Length >= 1)
        {
            foreach (DataColumn primaryKey in dataColumns)
            {
                newDataTable.Columns[primaryKey.ColumnName].
                    SetOrdinal(maxOrdinal);
            }
        }
    }
}
public static class ForEachEnumeration
{
    public static void Foreach(this DataRowCollection dataRows, 
Action action)
    {
        foreach(T dataRow in dataRows)
        {
            action(dataRow);
        }
    }
}

After distincting with the social sercurity number we have the following DataTable.
DistinctDataTableVersion1

The second possibility to distinct a DataTable is using the ToTable method of the DataView. In the next code part you can see how that works.

public static DataTable DistinctDataView(this DataTable dataTable,
        string columName)
{
    DataView view = new DataView(dataTable);
    return view.ToTable(true, new string[] { columName });
}

That is the resulting DataTable. As we see it contains only the social security number. The other columns are gone.
DistinctDataTableVersion2

After the classic version and the DataView version it is also possible to solve the problem with the linq Distinct Extension method.

public static DataTable DistinctEqualityCompare(this DataTable dataTable, 
    string columName)
{
    DataRowEqaulityComparer dataRowEqualityComparer = 
        new DataRowEqaulityComparer(columName);
    return dataTable.AsEnumerable().
        Distinct(dataRowEqualityComparer).CopyToDataTable();
}

To accomplish that we need a EqualityComparer.

public class DataRowEqualityComparer : IEqualityComparer
{
    private string distinctColumnName;

    public DataRowEqualityComparer(string distinctColumnName)
    {
        this.distinctColumnName = distinctColumnName;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (x.Field<object>(distinctColumnName).
            Equals(y.Field<object>(distinctColumnName)))
        {
            return true;
        }
        return false;
    }

    public int GetHashCode(DataRow obj)
    {
        return obj.Field<object>(distinctColumnName).GetHashCode();
    }
}

Or we could also use a Generic EqualityComparer. (implementing a generic IEqualityComparer and IComparer class)

public static DataTable DistinctGenericEqualityCompare(this DataTable dataTable,
    string columName)
{
    EqualityComparer dataRowEqualityComparer = 
        new EqualityComparer(
        (x, y) => x.Field<object>(columName).Equals(y.Field<object>(columName)),
        x => x.Field<object>(columName).GetHashCode()
        );
    return dataTable.AsEnumerable().
        Distinct(dataRowEqualityComparer).CopyToDataTable();
}
public class EqualityComparer : IEqualityComparer
{
    private Func<T, T, bool> equalsFunction;
    private Func<T, int> getHashCodeFunction;

    public EqualityComparer(Func<T, T, bool> equalsFunction)
    {
        this.equalsFunction = equalsFunction;
    }

    public EqualityComparer(Func<T, T, bool> equalsFunction,
        Func<T, int> getHashCodeFunction)
        : this(equalsFunction)
    {
        this.getHashCodeFunction = getHashCodeFunction;
    }

    public bool Equals(T a, T b)
    {
        return equalsFunction(a, b);
    }

    public int GetHashCode(T obj)
    {
        if (getHashCodeFunction == null)
            return obj.GetHashCode();
        return getHashCodeFunction(obj);
    }
}

That is the resulting DataTable.
DistinctDataTableVersion3
That are all possibilities i could think of to distinct a DataTable.

Advertisements

implementing a generic IEqualityComparer and IComparer class

Today i will write about the generic implementation of IComparer and IEqualityComparer. I read about this topic some times ago and now i wrote a linq join where i needed a specific EqualityComparer. So i decided to implement a generic version. The base of the Implementation is the generic IEqualityComparer<T> interface,

public interface IEqualityComparer<in T>
{
  bool Equals(T x, T y);
  bool int GetHashCode(T x);
}

and the generic IComparer<T> interface.

public interface IComparer<in T>
{
  int Compare(T x,T y)
}

If you use the IEqualityComparer<in T> interface you have to implement the Equals method that checks the equality of the two overtaken objects of type T and you have to implement the GetHashCode method that generates a customized  hashcode for the comparer object. This IEqualityComparer interface is widley used in Linq. For example at the Join, GroupJoin or Except extension methodes.

If you use the IComparer<in T> inferface you have to implement the Compare method that takes two objects of type T and compares them. If a is smaller than b it returns -1 if a and b are equal it returns 0 and if a is bigger than b it returns 1. The IComparer interface is used to sort datastructures. The List<T>.Sort method takes a comparer to decide how to sort a list.

The generic implemetation of the two interfaces is very easy. We use Func delegates to overtake the methodes dynmically and call that delegates if the Equals, GetHashCode and Compare methodes are called. Here is the implementation of the EqualityComparer.

public class EqualityComparer<T> : IEqualityComparer<T>
{
    private Func<T, T, bool> equalsFunction;
    private Func<T, int> getHashCodeFunction;

    public EqualityComparer(Func<T, T, bool> equalsFunction)
    {
        this.equalsFunction = equalsFunction;
    }

    public EqualityComparer(Func<T, T, bool> equalsFunction, 
        Func<T, int> getHashCodeFunction) : this(equalsFunction)
    {
        this.getHashCodeFunction = getHashCodeFunction;
    }

    public bool Equals(T a, T b)
    {
        return equalsFunction(a, b);
    }

    public int GetHashCode(T obj)
    {
        if (getHashCodeFunction == null)
            return obj.GetHashCode();
        return getHashCodeFunction(obj);
    }
}

And the use of the Equality Comparer:

List<Person> firstPersonList = new List<Person>() 
    { new Person() { Name = "mathias", Age = 36 }, 
        new Person() { Name = "karoline", Age = 35 } };

List<Person> secondPersonList = new List<Person>() { 
    new Person() { Name = "karl", Age = 36 }, 
    new Person() { Name = "sepp", Age = 36 }, 
    new Person() { Name = "emil", Age = 35 }, 
    new Person() { Name = "mia", Age = 35 }, 
    new Person() { Name = "hans", Age = 36 } };

EqualityComparer<Person> equalityComparer = new
    EqualityComparer<Person>((s1, s2) => s1.Age == s2.Age);

var groupJoinItems = firstPersonList.GroupJoin(
    secondPersonList, 
    p => p, 
    p => p,
    (p1, p2) => new { Name = p1.Name, Person = p2 }, 
    equalityComparer);

foreach (var item in groupJoinItems)
{
    Console.WriteLine("Name {0}", item.Name);
    foreach (var groupItem in item.Person)
    {
        var secondname = groupItem.Name;
        Console.WriteLine(" Name {0} / Age {1}", 
            groupItem.Name, groupItem.Age);
    }
}

Result:

Here is the implementation of the Comparer.

public class Comparer<T> : IComparer<T>
{
    private Func<T, T, int> comparerFunction;

    public Comparer(Func<T, T, int> comparerFunction)
    {
        this.comparerFunction = comparerFunction;
    }

    public int Compare(T a, T b)
    {
        if (object.ReferenceEquals(a, b))
            return 0;
        return comparerFunction(a, b);
    }
}

And the use of the Comparer<T>

List<Person> personList = new List<Person>() { 
    new Person() { Name = "karl", Age = 36 }, 
    new Person() { Name = "sepp", Age = 23 }, 
    new Person() { Name = "emil", Age = 3 }, 
    new Person() { Name = "mia", Age = 2 }, 
    new Person() { Name = "hans", Age = 26 } };

Comparer<Person> comparer =
    new Comparer<Person>(
        (p1, p2) => (p1.Age < p2.Age) ? -1 : (p1.Age == p2.Age) ? 0 : 1);        

personList.Sort(comparer);

foreach (var item in personList)
{
    Console.WriteLine("Name {0} / Age {1}", item.Name, item.Age);                
}

Result:

Using this generic version of IEqualityComparer<T> and IComparer<T> has two benefits. The first benefit is that you dont need to implement objects that implement the interfaces over and over again. The second benefit is that you can use lambda expression syntax to generate the body of the Equals and Compare methodes.