The Distinct extension method in System.Linq returns a sequence (an IEnumerable<T>) of ... surprise ... the distinct items in a collection.
This is all well and good for simple data types (e.g. integers). But what if you have more complex data types and want a distinct list based on a specific property?
Well, Distinct compares items using the default equality comparer, which calls GetHashCode and Equals on them, and yields only the first of each group of equal items. For a class that does not override those methods, two objects are equal only if they are the same instance (reference equality). So running Distinct() on a List<T> would only exclude items that are the very same object, not items that merely hold the same information.
Say you have a Student type:
public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}
And a Class:
public class Class
{
    private List<Student> _students { get; } = new List<Student>()
    {
        new Student(1, "John Doe"), new Student(1, "John Doe"), new Student(2, "Jane Doe")
    };
}
Running
IEnumerable<Student> distinctStudent = _students.Distinct();
would give you 3 items: the two "John Doe" entries hold the same data, but they are not the same object, so they are not considered equal.
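Here is a minimal console sketch of that behaviour, assuming the Student type above with no overrides. The DistinctDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Student type above: no Equals/GetHashCode overrides.
public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

// Hypothetical demo class, not part of the original code.
public static class DistinctDemo
{
    public static int CountDistinctStudents()
    {
        var students = new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe")
        };

        // The default comparer uses reference equality for this class,
        // so both "John Doe" instances survive.
        return students.Distinct().Count();
    }

    public static int CountDistinctNumbers()
    {
        // long compares by value, so the duplicate 1 is dropped.
        return new List<long> { 1, 1, 2 }.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctStudents()); // 3
        Console.WriteLine(CountDistinctNumbers());  // 2
    }
}
```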
There are two ways of getting around this.
Either override Object.Equals(object obj) and Object.GetHashCode() on the Student type:
public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }

    // Two students are equal when their IDs match; null or any other type is never equal.
    public override bool Equals(object obj)
    {
        return obj is Student other && ID == other.ID;
    }

    // Must agree with Equals: equal students produce the same hash code.
    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }

    public override string ToString()
    {
        return Name;
    }
}
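This executes fairly fast, but Distinct() will only ever deduplicate on ID, and every other equality check on Student (Contains, dictionary keys and so on) now compares by ID as well.
If you want a more customisable way, you could write an extension method (on .NET 6 and later, System.Linq already includes a DistinctBy method of this shape, so you only need your own on older frameworks):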
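To see what the overrides change, here is a small sketch with a condensed copy of the Student type above. EqualityOverrideDemo and its methods are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Student with ID-based equality, condensed from the version above.
public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }

    public override bool Equals(object obj) => obj is Student other && ID == other.ID;

    public override int GetHashCode() => ID.GetHashCode();
}

// Hypothetical demo class, not part of the original code.
public static class EqualityOverrideDemo
{
    private static List<Student> BuildStudents() => new List<Student>
    {
        new Student(1, "John Doe"),
        new Student(1, "John Doe"),
        new Student(2, "Jane Doe")
    };

    // Distinct() now treats students with the same ID as duplicates.
    public static int CountDistinct() => BuildStudents().Distinct().Count();

    // Side effect: lookups match on ID alone, whatever the name says.
    public static bool ContainsRenamedStudent() =>
        BuildStudents().Contains(new Student(2, "Someone Else"));

    public static void Main()
    {
        Console.WriteLine(CountDistinct());          // 2
        Console.WriteLine(ContainsRenamedStudent()); // True
    }
}
```

The second call is the trade-off to keep in mind: once equality lives on the type, it applies everywhere, not just in Distinct().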
public static class Extensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Keys already encountered; HashSet<T>.Add returns false for a repeat.
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            // Only the first item carrying each key is yielded.
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
and use it like this:
public class Class
{
    private List<Student> _students { get; } = new List<Student>()
    {
        new Student(1, "John Doe"), new Student(1, "John Doe"), new Student(2, "Jane Doe")
    };

    public IEnumerable<Student> GetDistinct()
    {
        return _students.Distinct();
    }

    public IEnumerable<Student> GetDistinctBy()
    {
        return _students.DistinctBy(s => s.ID);
    }

    public IEnumerable<Student> GetDistinctByMore()
    {
        return _students.DistinctBy(s => new { s.ID, s.Name });
    }
}
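As a rough check of how the key selector changes the result, the sketch below runs the extension over a slightly longer list. The fourth student, the DistinctByDemo namespace and the Demo class are assumptions added for illustration. Everything sits in one namespace so that this DistinctBy is the one the compiler picks, even on frameworks that ship their own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace DistinctByDemo
{
    // Plain Student, no equality overrides needed for DistinctBy.
    public class Student
    {
        public long ID { get; set; }
        public string Name { get; set; }

        public Student(long id, string name)
        {
            ID = id;
            Name = name;
        }
    }

    public static class Extensions
    {
        // Same extension as above: yields the first item seen for each key.
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
            this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        {
            var seenKeys = new HashSet<TKey>();
            foreach (TSource item in source)
            {
                if (seenKeys.Add(keySelector(item)))
                {
                    yield return item;
                }
            }
        }
    }

    // Hypothetical demo class, not part of the original code.
    public static class Demo
    {
        private static List<Student> BuildStudents() => new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe"),
            new Student(3, "John Doe") // extra entry: same name, different ID
        };

        public static int CountById() =>
            BuildStudents().DistinctBy(s => s.ID).Count();

        public static int CountByName() =>
            BuildStudents().DistinctBy(s => s.Name).Count();

        // Anonymous types compare by their property values, so they work as composite keys.
        public static int CountByIdAndName() =>
            BuildStudents().DistinctBy(s => new { s.ID, s.Name }).Count();

        public static void Main()
        {
            Console.WriteLine(CountById());        // 3
            Console.WriteLine(CountByName());      // 2
            Console.WriteLine(CountByIdAndName()); // 3
        }
    }
}
```

Because the method uses yield return, nothing is evaluated until the result is enumerated, and when several items share a key the first one in the source order is the one kept.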
As always, feel free to comment, or ask.