dimanche 4 février 2018

Poor performance comparing collections of objects using reflections and Linq Except/Intersect

I have written an app which compares two collections of objects (of the same type) and works out the similarities and the differences by comparing the objects using values of their properties (or combinations of their properties). This app was never intended to scale above 10000 objects in either of the collections and it was accepted that this a long running operation. The business requirement has now changed and we need to be able to compare up to 50000 (with a stretch target of up to 100000) of objects in either of the collections.

Below is minimal example of a type to be compared.

    internal class Employee
    {
        public string ReferenceCode { get; set; }
    }

For this purpose I have written a custom equality comparer for this type which takes a property name as a constructor parameter. Reason for parameterizing this was to avoid writing different equality comparers for each property of each type (which were was a fair amount and also this sounded like a neat solution).

   public class EmployeeComparerDynamic : IEqualityComparer<Employee>
    {
        string PropertyNameToCompare { get; set; }
        public EmployeeComparerDynamic(string propertyNameToCompare)
        {
            PropertyNameToCompare = propertyNameToCompare;
        }

        public bool Equals(Employee x, Employee y)
        {
            return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null 
                && x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
                .Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
        }

        public int GetHashCode(Employee x) 
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetHashCode();
                return hash;
            }
        }
    }

Using this equality comparer I have been comparing collections of objects using LINQ Intersect and Except functions.

        var intersectingEmployeesLinq = firstEmployeeList
            .Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();

        var deltaEmployeesLinq = firstEmployeeList
            .Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();

This was all working nicely until the scaling limit requirement increased and I have noticed that my app is performing very poorly with large collections of objects.

Initially, I thought that this is just normal and there is likely to be a significant increase in the overall time to complete, however, when I've tried looping through one list manually and comparing the item to check if such item exists in the other list - I have noticed that my own implementation of what LINQ Except and Intersect achieves in the context of my app is yielding the same results, but performing a lot better.

        var intersectingEmployeesManual = new List<Employee>();           

        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
                intersectingEmployeesManual.Add(employee);
        }

This was performing significantly better (about 30 times) compared to the implementation in the earlier snippet. Of course, the earlier snippet used reflection to get the value of the property, so I also tried that.

        var intersectingEmployeesManual = new List<Employee>();

        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList.Any(x => x.GetType()
                .GetProperty("ReferenceCode").GetValue(x)
                .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
                intersectingEmployeesManual.Add(employee);
        }

This was still performing about 2-3 times better. Lastly, I have then wrote another equality comparer, but instead of parameterizing the property, it was comparing against a predefined property of a type.

    public class EmployeeComparerManual : IEqualityComparer<Employee>
    {
        public bool Equals(Employee x, Employee y)
        {
            return y.ReferenceCode != null
                   && x.ReferenceCode.Equals(y.ReferenceCode);
        }

        public int GetHashCode(Employee x)
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.ReferenceCode.GetHashCode();
                return hash;
            }
        }
    }

And the corresponding code to work out the intersection and the delta objects.

        var intersectingEmployeesLinqManual = firstEmployeeList
            .Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();

        var deltaEmployeesLinqManual = firstEmployeeList
            .Except(secondEmployeeList, new EmployeeComparerManual()).ToList();

Finally, I started getting the scaling that I was looking for with this implementation, but additionally I have done some benchmarking using 10 different machines. The results are as per below (averaged, in milliseconds rounded to the nearest millisecond).

    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
    |       | List Items  | Intersect | Intersect Dynamic | Except | Except Dynamic | Intersect Linq | Intersect Linq Dynamic | Except Linq | Except Linq Dynamic |
    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
    | Run 1 | 5000/4000   |       479 |              7440 |    340 |           7439 |              1 |                  14583 |           2 |               15257 |
    | Run 2 | 10000/8000  |      2177 |             32489 |   1282 |          29290 |              1 |                  59154 |           2 |               74170 |
    | Run 3 | 20000/16000 |      6758 |            116266 |   4578 |         116720 |              5 |                 225960 |           3 |              295146 |
    | Run 4 | 50000/40000 |     34457 |            720023 |  30693 |         731690 |             14 |                1483084 |          14 |             1657832 |
    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+

So, my summary so far is:

  • Using reflections to get the value of a property adds overhead with a factor between 15-20
  • Using reflections in the equality comparer and LINQ Except or Intersect adds overhead with a factor of 2-3

My outstanding questions are:

  • Is using reflection to get the value of property really adds that much overhead or am I missing a piece of a puzzle here?
  • Why am I only getting the promised O(n+m) overall effort when using LINQ with an equality comparer which does not use reflection?
  • Is there a hope for me to find and approach where I can have an equality comparer per type and somehow parameterize the property I am comparing by instead of an equality comparer per type per property?
  • Side question - Why using reflections in the equality comparer combined with LINQ Except or Intersect added additional overhead compared to my own basic implementation of just iterating through the list comparing everything with everything?

Lastly, a full reproducible example below:

class Program
{
    static void Main(string[] args)
    {
        StackOverflow();
    }

    private static void StackOverflow()
    {
        var firstEmployeeList = CreateEmployeeList(5000);
        var secondEmployeeList = CreateEmployeeList(4000);

        var intersectingEmployeesManual = new List<Employee>();
        var sw = new Stopwatch();

        //Intersecting employees - comparing predefined property
        sw.Start();
        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
                intersectingEmployeesManual.Add(employee);
        }
        sw.Stop();
        Console.WriteLine("Intersecting Employees Manual: " + sw.ElapsedMilliseconds);
        intersectingEmployeesManual.Clear();
        sw.Reset();

        //Intersecting employees - comparing dynamic property
        sw.Start();
        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList.Any(x => x.GetType()
                .GetProperty("ReferenceCode").GetValue(x)
                .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
                intersectingEmployeesManual.Add(employee);
        }
        sw.Stop();
        Console.WriteLine("Intersecting Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        //Delta Employees - comparing predefined property
        var deltaEmployeesManual = new List<Employee>();
        sw.Start();
        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList.All(x => x.ReferenceCode != employee.ReferenceCode))
                deltaEmployeesManual.Add(employee);
        }
        sw.Stop();
        Console.WriteLine("Delta Employees Manual: " + sw.ElapsedMilliseconds);
        sw.Reset();
        deltaEmployeesManual.Clear();

        //Delta Employees - comparing dynamic property
        sw.Start();
        foreach (var employee in firstEmployeeList)
        {
            if (secondEmployeeList
                .All(x => !x.GetType().GetProperty("ReferenceCode").GetValue(x)
                .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
                deltaEmployeesManual.Add(employee);
        }
        sw.Stop();
        Console.WriteLine("Delta Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        //Intersecting employees Linq - dynamic property
        sw.Start();
        var intersectingEmployeesLinq = firstEmployeeList
            .Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
        sw.Stop();
        Console.WriteLine("Intersecting Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        //Intersecting employees Linq - manual property
        sw.Start();
        var intersectingEmployeesLinqManual = firstEmployeeList
            .Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();
        sw.Stop();
        Console.WriteLine("Intersecting Employees Linq (manual property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        //Delta employees Linq - dynamic property
        sw.Start();
        var deltaEmployeesLinq = firstEmployeeList
            .Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
        sw.Stop();
        Console.WriteLine("Delta Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        //Delta employees Linq - manual property
        sw.Start();
        var deltaEmployeesLinqManual = firstEmployeeList
            .Except(secondEmployeeList, new EmployeeComparerManual()).ToList();
        sw.Stop();
        Console.WriteLine("Delta Employees Linq (manual property): " + sw.ElapsedMilliseconds);
        sw.Reset();

        Console.WriteLine("Finished");
        Console.ReadLine();

    }

    private static List<Employee> CreateEmployeeList(int numberToCreate)
    {
        var employeList = new List<Employee>();
        for (var i = 0; i < numberToCreate; i++)
        {
            employeList.Add(new Employee
            {
                ReferenceCode = i.ToString()
            });
        }
        return employeList;
    }

    internal class Employee
    {
        public string ReferenceCode { get; set; }
    }

    public class EmployeeComparerDynamic : IEqualityComparer<Employee>
    {
        string PropertyNameToCompare { get; set; }
        public EmployeeComparerDynamic(string propertyNameToCompare)
        {
            PropertyNameToCompare = propertyNameToCompare;
        }

        public bool Equals(Employee x, Employee y)
        {
            return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null 
                && x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
                .Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
        }

        public int GetHashCode(Employee x) 
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetHashCode();
                return hash;
            }
        }
    }

    public class EmployeeComparerManual : IEqualityComparer<Employee>
    {
        public bool Equals(Employee x, Employee y)
        {
            return y.ReferenceCode != null
                   && x.ReferenceCode.Equals(y.ReferenceCode);
        }

        public int GetHashCode(Employee x)
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.ReferenceCode.GetHashCode();
                return hash;
            }
        }
    }
}





Aucun commentaire:

Enregistrer un commentaire