Sometimes being Lazy is a good thing

...or "making ConcurrentDictionary.GetOrAdd a bit safer".

How many times have you written the "double-check locking" code when adding stuff to a dictionary? You know, the "does the dictionary contain this key? No, ok, so we take a lock. How about now, the key is still not present? No? OK, good, then let’s add it." drill. Code-wise, it would typically look something like this:

if (!dictionary.ContainsKey(key))
{
    lock (dictionaryLock)
    {
        if (!dictionary.ContainsKey(key))
        {
            dictionary.Add(key, value);
        }
    }
}

It’s not particularly pretty, is it? The signal-to-noise ratio of that code is pretty low, so it’s nice to see that something was done in the base class library to address this not uncommon scenario: the ConcurrentDictionary. Using one of those instead, the above code could be distilled down to this:

dictionary.TryAdd(key, value);

Yes, that’s it. This will check if the key exists in the dictionary and if it doesn’t the value will be inserted and associated with that key. All done in a manner that will guarantee that the value will be inserted only once.

In many cases, you will also want to use the value that is either already present in the dictionary, or the one that you just inserted. What I used to do with the old Dictionary was to simply add a line after all the double-check locking code that would pull out he value from the dictionary:

if (!dictionary.ContainsKey(key))
{
    lock (dictionaryLock)
    {
        if (!dictionary.ContainsKey(key))
        {
            dictionary.Add(key, value);
        }
    }
}

var valueToUse = dictionary[key];

While you can still do that with the ConcurrentDictionary (first add, then fetch the value), there is a more convenient way:

var valueToUse = dictionary.GetOrAdd(key, value);

This will check in the dictionary if the key is present. If the key was not present, the value that was provided in the second argument will be inserted. Finally the value associated with the key is returned. Again, this is done in a way that ensures that only one attempt to insert he key into the dictionary is done.

So, what’s the problem?

Now, let’s say that the second argument, the default value, is in fact a method call that will spawn some process of some kind, and that we would not want to start that task unless the key does not already exist. Well, luckily the designers of the ConcurrentDictionary thought about this, so they added an overload taking a Func that can provide the default value:

Task task = dictionary.GetOrAdd(key, k => CreateTaskForKey(k));

This means that the method CreateTaskForKey will not be executed unless the Func is called. That should work, right? Well, it might. And it might not. It is certainly not safe. You see, there is no guarantee that the Func will be executed only once for each key. The only guarantee is that each key (with an associated value) will be inserted into the dictionary only once.

It turns out that the designers have aimed for minimal locking. So the process goes roughly like this:

  1. Take a lock, check if the key is present, release the lock
  2. If the key was not present, call the Func producing the value
  3. Take a lock, check if the key is present and, if it’s not, insert the key and value

This means that if two parallel calls - A and B - came in, A could see that the key is not there, and while A is busy producing a value, B checks and also finds that the key is not there. So B also moves along to produce the value. Then A will have the value ready, check and find that the key is still not present and insert the key and value. Then B also gets the value ready, and goes to check the key only to find that now it’s there. So B throws away its value and GetOrAdd will return the value inserted by A.

As you see, the state of the dictionary itself is safe; only one value is inserted for each key, but there are side effects. So I wanted to solve this so that the task would be started only if it also inserted in the dictionary. I also found this to be an interesting case for some TDD exercise, since it offers a bit of a challenge when it comes to writing a test that would provoke the problem.

In order to test this, my thoughts went something like this:

  • First, I want to create a test where I can observe the behavior that the value factory method is invoked twice for the same key. The purpose of this test would be to prove the problem, and to prove that we can create the problematic state in code.
  • Then, I want to create another test that will expect that the value factory is invoked only once for each key. The purpose of this test is to prove the solution.

The first task was to write a test the would guarantee that this race condition occurred. In order to achieve that I would have to have some code that would call GetOrAdd with a value factory that would wait before returning its value, until it was told to continue. That way I can make two concurrent calls, wait until I know that both are inside the value factory, in parallel. When that happens, I have proven he problem. This is what I came up with:

[Test]
public void GetOrAdd_ValueFactoryCalledTwiceForSameKey()
{
    var valueFactoryWaitHandles = new[]
    {
        new ManualResetEvent(false),
        new ManualResetEvent(false)
    };

    var factoryReleaseWaitHandle = new ManualResetEvent(false);

    int waitCount = 0;

    // This value factory method will called twice, each time
    // signaling one of the factory wait handles, and then waiting
    // for the factory release wait handle to be signaled before
    // returning the value.
    Func valueFactory = number =>
    {
        int index = Interlocked.Increment(ref waitCount) - 1;
        valueFactoryWaitHandles[index].Set();
        factoryReleaseWaitHandle.WaitOne();
        return number;
    };

    int key = 42;
    var dictionary = new ConcurrentDictionary();

    // Start the first task that will attempt to add a value
    // using the value factory (that will wait for a signal before
    // returning the value).
    Task task1 = Task.Factory.StartNew(() =>
    {
        dictionary.GetOrAdd(key, valueFactory);
    });

    // Start the second task
    Task task2 = Task.Factory.StartNew(() =>
    {
        dictionary.GetOrAdd(key, valueFactory);
    });

    // Wait for the value factory wait handles to be signaled
    foreach (ManualResetEvent mre in valueFactoryWaitHandles)
    {
        mre.WaitOne();
    }

    // Signal the factory release handle...
    factoryReleaseWaitHandle.Set();

    // ...and wait for the tasks to finish
    task1.Wait();
    task2.Wait();

    // Verify that the factory was called twice, but the value
    // inserted only once.
    Assert.AreEqual(2, waitCount);
    Assert.AreEqual(1, dictionary.Count);
}

There, now I had a reliable way of producing the problem. This code clearly points out two things;

  1. The value factory method is invoked twice for the same key.
  2. The dictionary contains only one value afterwards.

That was the first step. Now on to...

The Solution

One prerequisite is that I could of course not alter he behavior of the ConcurrentDictionary as such, but instead needed to focus on how to handle the side effects. I needed to find a way to allow for the value factory to execute, but push the side effects forward until after the value has been inserted into the dictionary. As already mentioned; only one value for each key is inserted into the dictionary, even if the value factory is called several times. Sounds like a job for Lazy, right?

Consider these two cases:

// case A
Task task = Task.Factory.StartNew(() => DoStuff());

// case B
Lazy lazyTask = new Lazy(() => Task.Factory.StartNew(() => DoStuff()));
Task task = lazyTask.Value;
        

Both cases will produce a task that is started, but in case B, the task is not produced and started until we access the Value property of lazyTask. That's nice, since it means that we can set up everything we need around producing the task, but not actually create it and start it until we say so.

This means that if we create two identical Lazy objects, but fetch the Value from just one of them, only one of the tasks will be created and started. So, by changing our ConcurrentDictionary> into a ConcurrentDictionary>> we shield ourselves from he side effects caused by the GetOrAdd value factory to be invoked several times. The only thing that happens is that we create a bunch of Lazy objects, but only one of them will find its way into the dictionary.

Since GetOrAdd will return either the already present object, or the one that is inserted as a result of he call, all calls to GetOrAdd for a given key will return the same Lazy instance.

As often when I want to solve a situation like this, I want to find a solution that will require as little changes as possible in the code. In this case I decided to make an extension method. I started from the first test, and tried to make minimal changes for a new test that would verify the behavior where the task-producing method was invoked only once:

[Test]
public void LazyGetOrAdd_ValueFactoryCalledOnceForSameKey()
{
    ManualResetEvent valueFactoryWaitHandle = new ManualResetEvent(false);

    var factoryReleaseWaitHandle = new ManualResetEvent(false);

    int waitCount = 0;
    Func valueFactory = number =>
    {
        Interlocked.Increment(ref waitCount);
        valueFactoryWaitHandle.Set();
        factoryReleaseWaitHandle.WaitOne();
        return number;
    };

    int key = 42;
    var dictionary = new ConcurrentDictionary>();

    Task task1 = Task.Factory.StartNew(() =>
    {
        dictionary.LazyGetOrAdd(key, valueFactory);
    });
    valueFactoryWaitHandle.WaitOne();

    Task task2 = Task.Factory.StartNew(() =>
    {
        dictionary.LazyGetOrAdd(key, valueFactory);
    });

    factoryReleaseWaitHandle.Set();

    task1.Wait();
    task2.Wait();

    Assert.AreEqual(1, waitCount);
    Assert.AreEqual(1, dictionary.Count);
}

As you can see, I chose to call the extension method LazyGetOrAdd, and here is how I implemented it:

public static class ConcurrentDictionaryExtensions
{
    public static TValue LazyGetOrAdd(
        this ConcurrentDictionary> dictionary,
        TKey key,
        Func valueFactory)
    {
        if (dictionary == null) throw new ArgumentNullException("dictionary");
        var result = dictionary.GetOrAdd(key, new Lazy(() => valueFactory(key)));
        return result.Value;
    }
}