The System.Linq.Distinct extension returns a list (or more to the point, an IEnumerable<>) of ... surprise ... distinct items in a list.

This is all well and good for simple data types (e.g. integers). But what if you have more complex data types, and you want a distinct list based on a specific property?

Well - the Distinct extension compares the items using the default equality comparer, which relies on each item's GetHashCode and Equals methods, and returns a new sequence based on the result. For a class that overrides neither method, two references are only equal when they point to the same object. So running Distinct() on a List<T> would only exclude entries that are the very same instance, and not separate instances holding the same information.

Say you have a Student type:

public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

And a Class:

public class Class
{
    private List<Student> _students { get; } = new List<Student>() { new Student(1, "John Doe"), new Student(1, "John Doe"), new Student(2, "Jane Doe") };
}

Running 

IEnumerable<Student> distinctStudent = _students.Distinct();

would give you 3 items: the two "John Doe" entries hold the same data, but they are separate objects, so the default Equals (a reference comparison) treats them as different.
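To see the reference comparison at work, it can help to compare two "John Doe" instances directly. This is only a small sketch: it repeats the Student type from above so that it compiles on its own, and the EqualityDemo class and its method name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

// Hypothetical helper class, for illustration only.
public static class EqualityDemo
{
    public static int CountDistinctWithoutOverrides()
    {
        var first = new Student(1, "John Doe");
        var second = new Student(1, "John Doe");

        // Same data, but two separate objects: without an override,
        // Equals on a class compares references, so this prints False.
        Console.WriteLine(first.Equals(second));

        // Distinct() with no arguments uses the default equality comparer,
        // which ends up in that same reference comparison.
        Console.WriteLine(EqualityComparer<Student>.Default.Equals(first, second));

        var students = new List<Student> { first, second, new Student(2, "Jane Doe") };
        return students.Distinct().Count();
    }
}
```

Calling EqualityDemo.CountDistinctWithoutOverrides() returns 3, because nothing in the list is considered a duplicate.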

There are two ways of getting around this.

Either override Object.Equals(object obj) and Object.GetHashCode() on the Student type:

public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }

    public override bool Equals(object obj)
    {
        // Two students are equal when their IDs match; null or any other type is never equal.
        return obj is Student other && ID == other.ID;
    }
    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }
    public override string ToString()
    {
        return Name;
    }
}

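This executes fairly fast, but it bakes a single definition of equality into the type: Distinct, and anything else that relies on Equals (such as List<T>.Contains), will only ever compare students by their ID.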
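As a quick check of that behaviour, here is a minimal sketch using the overridden Student type (repeated in a compact form so the snippet stands alone). The OverrideDemo class and its method names are hypothetical and only there to make the results easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }

    // Equality is based on ID only; Name is ignored.
    public override bool Equals(object obj)
    {
        return obj is Student other && ID == other.ID;
    }

    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }
}

// Hypothetical helper class, for illustration only.
public static class OverrideDemo
{
    public static int CountDistinct()
    {
        var students = new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe")
        };

        // The two entries with ID 1 now count as the same student.
        return students.Distinct().Count();
    }

    public static bool ContainsById()
    {
        var students = new List<Student> { new Student(1, "John Doe") };

        // The names differ, but Contains also goes through Equals,
        // so a matching ID is enough.
        return students.Contains(new Student(1, "Somebody Else"));
    }
}
```

Here CountDistinct() returns 2, and ContainsById() returns true even though the names differ, which is the side effect to keep in mind before overriding Equals on a shared type.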

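If you want a more customisable way, you could write an extension (on .NET 6 and later, System.Linq already ships a DistinctBy method of the same shape, so you may not need your own there):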

public static class Extensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
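        // Keys that have already been returned once.
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            // HashSet<T>.Add returns false when the key is already present,
            // so only the first item for each key is yielded.
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }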
    }
}

and use it like this:

public class Class
{
    private List<Student> _students { get; } = new List<Student>() { new Student(1, "John Doe"), new Student(1, "John Doe"), new Student(2, "Jane Doe") };

    public IEnumerable<Student> GetDistinct()
    {
        return _students.Distinct();
    }

    public IEnumerable<Student> GetDistinctBy()
    {
        return _students.DistinctBy(s => s.ID);
    }

    public IEnumerable<Student> GetDistinctByMore()
    {
        return _students.DistinctBy(s => new { s.ID, s.Name });
    }
}
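To make the difference between the key selectors visible, the sketch below runs the extension over a slightly different list, where the two students with ID 1 have different names. The sample data, the DistinctByDemo class and its method names are assumptions made for this example; Student and DistinctBy are repeated from above (without the Equals override) so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

public static class Extensions
{
    // The DistinctBy extension from the post.
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

// Hypothetical helper class, for illustration only.
public static class DistinctByDemo
{
    private static List<Student> SampleStudents()
    {
        return new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "Johnny Doe"), // same ID, different name
            new Student(2, "Jane Doe")
        };
    }

    public static int CountById()
    {
        // Keyed on ID alone, both students with ID 1 collapse into one.
        return SampleStudents().DistinctBy(s => s.ID).Count();
    }

    public static int CountByIdAndName()
    {
        // Anonymous types compare by the values of their properties,
        // so the key only repeats when both ID and Name match.
        return SampleStudents().DistinctBy(s => new { s.ID, s.Name }).Count();
    }

    public static string FirstNameForIdOne()
    {
        // The extension keeps the first item it meets for each key.
        return SampleStudents().DistinctBy(s => s.ID).First().Name;
    }
}
```

With this data, CountById() gives 2, CountByIdAndName() gives 3, and FirstNameForIdOne() gives "John Doe". Note that the extension is lazy: nothing is enumerated until the result is iterated (here by Count() and First()).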

As always, feel free to comment, or ask.

