Merge collections without duplicates in C#

Your task is to merge two lists of objects. The resulting collection should be without duplicates, based on a certain property on the objects.

The lists are populated with very simple Person objects.

1 2 3 4 5 class Person { public int Number { get; set; } public string Name { get; set; } }


C# programmers often turn to LINQ, and they should! With LINQ you could end up with something like this:

1 2 3 var merged = new List<Person>(list1); merged.AddRange(list2.Where(p2 => list1.All(p1 => p1.Number != p2.Number)));


It’s not all that obvious how it works and it also has performance problems. This solution has to iterate over list1 for every object in list2. Not every object every time but on average half of them (if the lists contains no duplicates within themselves). If the lists contains a 1000 objects each there’s a good chance that list1 will be iterated 500.000 times. That kind of work does not come for free.

We need to get rid of this nested looping. I was pretty sure a dictionary based solution would do the trick.

1 2 3 4 5 6 var dict = list2.ToDictionary(p => p.Number); foreach (var person in list1) { dict[person.Number] = person; } var merged = dict.Values.ToList();


This solution converts list2 to a dictionary and then it loops a single time over list1. It either adds to the dictionary or replaces the value of list2 if it was a duplicate. This seemed to be a much more efficient algorithm.

In C# there’s not just Dictionary<TKey, TValue> but we also have HashSet. A HashSet is a collection that contains no duplicates. It also has the UnionWithmethod and that is exactly what we are looking for. The problem is that it compares object to object and we want to compare based on a property.

Fortunately the HashSet methods honors the default equality comparer. Such a beast could look like this.

1 2 3 4 5 6 7 8 9 10 11 12 class PersonComparer : IEqualityComparer<Person> { public bool Equals(Person p1, Person p2) { return p1.Number == p2.Number; } public int GetHashCode(Person p) { return p.Number; } }


Then just add it as the second parameter to the constructor.

1 2 3 var hs = new HashSet<Person>(list1, new PersonComparer()); hs.UnionWith(list2); var merged = hs.ToList();


Set theory tells us that this merging is actually a union of the two sets (without duplicates) and LINQ happens to have a Union method.

var merged = list1.Union(list2, new PersonComparer());


By that we’ve come full circle and it’s time to see how the different solutions performs. I wrote a small test program that creates two lists of objects with 1000 objects in each. In each list 500 of the objects property, that we base the merge on, has the same value.

Lists and LINQ merge: 4820ms
Dictionary merge: 16ms
HashSet and IEqualityComparer: 20ms
LINQ Union and IEqualityComparer: 24ms

The first solution is, in this case, 300 times slower than the fastest one!

I prefer LINQ Union because it communicates intent very clearly.

https://alicebobandmallory.com/articles/2012/10/18/merge-collections-without-duplicates-in-c

Books

  • Code Complete, 2nd edition. It’s about the construction of software and the design that goes in the small parts of the code you write. The book is huge, so reading it might take a while, but it’s still very relevant and useful.
  • The Clean series (Code, Coder, Architecture), those 3 books deal with different aspects of the profession, but I enjoyed all 3 and learned many important lessons from them.
  • The pragmatic programmer, a book full of important lessons and ideas, it’s a relatively quick read compared to the other books on the list, but the lessons are as valuable.
  • The mythical man-month, it might be the most important book on the list, full of important lessons about the lifecycle of a software project and the profession itself. I believe Fred Brooks said everything decades ago and we still make the same mistakes.
  • Design Patterns: Elements of Reusable Object-Oriented Software(GOF): It’s the original design-patterns book, a bit old but still an excellent reference. Being familiar with design patterns gives you a good toolbox for tackling most design challenges, even if the pattern doesn’t match your problem perfectly, you have a solid base for building a solution.
  • Head First Design Patterns(Robson and Freeman): I think this one is much easier to digest than the GOF book. Read both, maybe start with this one and when you become familiar with the contents use the GOF book as a reference.
  • Practical Object-Oriented Design: An Agile Primer in Ruby(Sandi Metz): It’s more hands-on than the other ones, full of examples and refactoring. It starts with an introduction to design thinking and why it’s important to spend time and effort on good architecture and design. Don’t let the ruby part discourage you from using it, the lessons you’ll learn are applicable to all other OO languages.
  • Domain Driven Design(Evans): DDD is a very useful approach, and it pays off to be familiar with it. Reading this one will go a long way in helping you devise solutions that match well the domain problem and are easy to evolve.

Replace switch statement with strategy pattern

using NLog;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Web.Http.Filters;
using TaxMs.DTOs;
using TaxMs.Infrastructure.CustomExceptions;
using TaxMs.Infrastructure.Extensions;

namespace TaxMs.ApiNet.Filters
{
    public class AlertServiceFilter : ExceptionFilterAttribute
    {
        private readonly Logger _logger = LogManager.GetCurrentClassLogger();

        public HttpStatusCode GetStatusCodeFromException(Exception exc)
        {
            if (exc is ArgumentException || exc is FileNotFoundException)
                return HttpStatusCode.BadRequest;
            if (exc is InvalidOperationException || exc is NotFoundException)
                return HttpStatusCode.NotFound;
            if (exc is FormatException)
                return HttpStatusCode.ExpectationFailed;
            return HttpStatusCode.InternalServerError;
        }

        private List GetReasons(Exception exc)
        {
            var outList = new List();
            if (exc is ArgumentException)
            {
                var currExc = exc as ArgumentException;
                foreach (DictionaryEntry item in currExc.Data)
                    outList.Add(item.Value.ToString());
            }
            else
                outList.Add(exc.GetInnerMostException().Message);

            return outList;
        }

        public override void OnException(HttpActionExecutedContext cont)
        {
            //cont.Response = new HttpResponseMessage(GetStatusCodeFromException(cont.Exception));
            var a = new ExceptionStrategy();
            cont.Response = new HttpResponseMessage(a.GetStatusCode(cont.Exception.GetType()));


            cont.Response.Content = new ObjectContent(new ErrorInfo
            {
                Date = DateTime.Now,
                Status = cont.Response.StatusCode.ToInt(),
                Reasons = GetReasons(cont.Exception),
                Message = cont.Exception.Message
            }, new JsonMediaTypeFormatter());

            _logger.Error(cont.Exception, cont.Exception.Message);
        }
    }
    /////////////////////////////////////////////////////////////////////////////////////////////////
    #region Strategy Interface

    public interface IHttpResponseMessageStrategy
    {
        HttpStatusCode GetStatusCode();
    }

    #endregion

    #region Concrete Classes

    public class ExpectationFailedExceptionStrategy : IHttpResponseMessageStrategy
    {
        #region IHttpResponseMessageStrategy Members

        public HttpStatusCode GetStatusCode()
        {
            return HttpStatusCode.ExpectationFailed;
        }

        #endregion
    }

    public class NotFoundExceptionStrategy : IHttpResponseMessageStrategy
    {
        #region IHttpResponseMessageStrategy Members

        public HttpStatusCode GetStatusCode()
        {
            return HttpStatusCode.NotFound;
        }

        #endregion
    }

    public class BadRequestExceptionStrategy : IHttpResponseMessageStrategy
    {
        #region IHttpResponseMessageStrategy Members

        public HttpStatusCode GetStatusCode()
        {
            return HttpStatusCode.BadRequest;
        }

        #endregion
    }

    public class InternalServerErrorExceptionStrategy : IHttpResponseMessageStrategy
    {
        #region IHttpResponseMessageStrategy Members

        public HttpStatusCode GetStatusCode()
        {
            return HttpStatusCode.InternalServerError;
        }

        #endregion
    }
    #endregion

    #region Context

    public class ExceptionStrategy
    {
        #region Members

        private readonly Dictionary _strategies = new Dictionary();

        #endregion

        #region Ctor

        public ExceptionStrategy()
        {
            _strategies.Add(typeof(ArgumentException), new BadRequestExceptionStrategy());
            _strategies.Add(typeof(FileNotFoundException), new BadRequestExceptionStrategy());
            _strategies.Add(typeof(FormatException), new ExpectationFailedExceptionStrategy());
            _strategies.Add(typeof(NotFoundException), new NotFoundExceptionStrategy());
            _strategies.Add(typeof(InvalidOperationException), new NotFoundExceptionStrategy());
            _strategies.Add(typeof(Exception), new InternalServerErrorExceptionStrategy());
        }

        #endregion

        #region Methods

        public HttpStatusCode GetStatusCode(Type type)
        {
            return _strategies.ContainsKey(type) ? _strategies[type].GetStatusCode() : _strategies[typeof(Exception)].GetStatusCode();
        }

        #endregion
    }

    #endregion

}
using NUnit.Framework;
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using TaxMs.ApiNet.Filters;
using TaxMs.DTOs;
using TaxMs.Infrastructure.Extensions;

namespace TaxMs.ApiNet.Tests.Filters
{
    public class AlertFilterServiceTests
    {
        public AlertServiceFilter _sut;
        public AlertFilterServiceTests()
        {
            _sut = new AlertServiceFilter();
        }

        [Test]
        public void Should_Log_ArgumentException()
        {
            var argException = new ArgumentException("Model validation exception !");
            argException.Data.Add("Mail", "Mail is required");
            argException.Data.Add("Name", "Name is required");

            var fakeInput = new System.Web.Http.Filters.HttpActionExecutedContext
            {
                Exception = argException,
                ActionContext = new System.Web.Http.Controllers.HttpActionContext(),
                Response = new HttpResponseMessage()
            };

            _sut.OnException(fakeInput);

            var expectdInput = fakeInput.Response.Content as ObjectContent;
            var expectedBody = expectdInput.Value as ErrorInfo;

            Assert.IsNotNull(expectdInput);
            Assert.AreEqual(expectedBody.Status, HttpStatusCode.BadRequest.ToInt());
            Assert.AreEqual(expectedBody.Reasons.Count, 2);
            Assert.AreEqual(expectedBody.Reasons[0], "Mail is required");
            Assert.AreEqual(expectedBody.Reasons[1], "Name is required");
        }

        [Test]
        public void Should_Log_Generic_Exception()
        {
            var argException = new Exception("Generic Exception");

            var fakeInput = new System.Web.Http.Filters.HttpActionExecutedContext()
            {
                Exception = argException,
                ActionContext = new System.Web.Http.Controllers.HttpActionContext()
            };

            fakeInput.Response = new HttpResponseMessage();

            _sut.OnException(fakeInput);

            var expectdInput = fakeInput.Response.Content as ObjectContent;
            var expectedBody = expectdInput.Value as ErrorInfo;

            Assert.IsNotNull(expectdInput);
            Assert.AreEqual(expectedBody.Status, HttpStatusCode.InternalServerError.ToInt());
            Assert.AreEqual(expectedBody.Reasons.Count, 1);
            Assert.AreEqual(expectedBody.Reasons.First(),"Generic Exception");
        }
    }
}