The Distinct extension method in System.Linq returns an IEnumerable<T> containing ... surprise ... the distinct items of a sequence.
This is all well and good for simple data types (e.g. integers). But what if you have more complex data types, and you want to get a distinct list based on a specific property?
Well, Distinct compares the items using the default equality comparer for their type, which calls GetHashCode and Equals on the items, and yields each item it has not seen before. For a class that does not override those two methods, an object is only equal to another object if both are the very same instance (reference equality). So running List<T>.Distinct() would only exclude items from List<T> that are the same object, and not items that merely hold the same information.
Say you have a Student type:
    public class Student
    {
        public long ID { get; set; }
        public string Name { get; set; }

        public Student(long id, string name)
        {
            ID = id;
            Name = name;
        }
    }
And a Class:
    public class Class
    {
        private List<Student> _students { get; } = new List<Student>()
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe")
        };
    }
Running
    IEnumerable<Student> distinctStudent = _students.Distinct();
would give you 3 items: the two "John Doe" entries hold the same data, but they are separate instances, so the default Equals and GetHashCode treat them as different objects.
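To see the default comparison in isolation, here is a minimal sketch. It repeats the Student type from above without any overrides; the DefaultEqualityDemo class and its Main method are just illustrative names, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Student type above, with no Equals/GetHashCode overrides.
public class Student
{
    public long ID { get; set; }
    public string Name { get; set; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

// Hypothetical demo class, only here to make the sketch runnable.
public static class DefaultEqualityDemo
{
    public static void Main()
    {
        // Integers compare by value, so the duplicate 1 collapses.
        int[] numbers = { 1, 1, 2 };
        Console.WriteLine(numbers.Distinct().Count()); // 2

        var first = new Student(1, "John Doe");
        var second = new Student(1, "John Doe");

        // Identical data, but two different instances: not equal by default.
        Console.WriteLine(first.Equals(second)); // False
        Console.WriteLine(EqualityComparer<Student>.Default.Equals(first, second)); // False

        var students = new List<Student> { first, second, new Student(2, "Jane Doe") };
        Console.WriteLine(students.Distinct().Count()); // 3
    }
}
```

EqualityComparer<Student>.Default is the comparer Distinct() falls back to when you do not pass one, which is why the last line prints 3.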
There are two ways of getting around this.
Either override Object.Equals(object obj) and Object.GetHashCode() on the Student type:
    public class Student
    {
        public long ID { get; set; }
        public string Name { get; set; }

        public Student(long id, string name)
        {
            ID = id;
            Name = name;
        }

        // Two students are equal when their IDs match; Name is ignored.
        public override bool Equals(object obj)
        {
            var student = obj as Student;
            return student != null && ID == student.ID;
        }

        // Must agree with Equals: equal students return the same hash code.
        public override int GetHashCode()
        {
            return ID.GetHashCode();
        }

        public override string ToString()
        {
            return Name;
        }
    }
This executes fairly fast, but Distinct() will then only ever give you unique items based on their ID, and the same definition of equality applies everywhere a Student is compared.
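As a quick check of what the overrides change, here is a small sketch. The Student type is a compact, read-only variant of the one above, and OverrideDemo is a hypothetical name used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact variant of the Student type above, with equality based on ID only.
public class Student
{
    public long ID { get; }
    public string Name { get; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }

    public override bool Equals(object obj)
    {
        var student = obj as Student;
        return student != null && ID == student.ID;
    }

    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }

    public override string ToString()
    {
        return Name;
    }
}

// Hypothetical demo class, only here to make the sketch runnable.
public static class OverrideDemo
{
    public static void Main()
    {
        var students = new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe")
        };

        // The two students with ID 1 now count as the same item.
        Console.WriteLine(students.Distinct().Count()); // 2

        // The override is global: every equality-based API uses it,
        // so a student with a different name but the same ID is "contained".
        Console.WriteLine(students.Contains(new Student(1, "Someone Else"))); // True
        Console.WriteLine(new HashSet<Student>(students).Count); // 2
    }
}
```

The Contains line shows the trade-off: once equality is defined on the type, it is not limited to Distinct().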
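If you want a more customisable way, you could write an extension (.NET 6 and later already ship an equivalent Enumerable.DistinctBy in System.Linq, so this is mainly useful on older frameworks):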
    public static class Extensions
    {
        public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        {
            HashSet<TKey> seenKeys = new HashSet<TKey>();
            foreach (TSource item in source)
            {
                if (seenKeys.Add(keySelector(item)))
                {
                    yield return item;
                }
            }
        }
    }
and use it like this:
    public class Class
    {
        private List<Student> _students { get; } = new List<Student>()
        {
            new Student(1, "John Doe"),
            new Student(1, "John Doe"),
            new Student(2, "Jane Doe")
        };

        public IEnumerable<Student> GetDistinct()
        {
            return _students.Distinct();
        }

        public IEnumerable<Student> GetDistinctBy()
        {
            return _students.DistinctBy(s => s.ID);
        }

        public IEnumerable<Student> GetDistinctByMore()
        {
            return _students.DistinctBy(s => new { s.ID, s.Name });
        }
    }
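To make the difference between the key selectors visible, here is a rough sketch with slightly different data: the second student shares ID 1 but has another name. The extension is the same logic as above, renamed to DistinctByKey (a hypothetical name) so that it cannot clash with the built-in DistinctBy on .NET 6 and later; DistinctByDemo is likewise just an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public long ID { get; }
    public string Name { get; }

    public Student(long id, string name)
    {
        ID = id;
        Name = name;
    }
}

public static class StudentExtensions
{
    // Same logic as the DistinctBy extension above, under a hypothetical name.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            // Add returns false when the key was already seen.
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}

// Hypothetical demo class, only here to make the sketch runnable.
public static class DistinctByDemo
{
    public static void Main()
    {
        var students = new List<Student>
        {
            new Student(1, "John Doe"),
            new Student(1, "Johnny Doe"), // same ID, different name
            new Student(2, "Jane Doe")
        };

        // Key = ID: the first student seen for each ID is kept.
        var byId = students.DistinctByKey(s => s.ID).Select(s => s.Name);
        Console.WriteLine(string.Join(", ", byId)); // John Doe, Jane Doe

        // Key = ID and Name: anonymous types compare property by property,
        // so all three students have different keys.
        Console.WriteLine(students.DistinctByKey(s => new { s.ID, s.Name }).Count()); // 3
    }
}
```

Because the method uses yield return, nothing is evaluated until the result is enumerated, and the items come out in source order with the first occurrence of each key winning.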
As always, feel free to comment, or ask.