Why is there no ExInterlockedRemoveEntryList?

A long time ago, I promised an entry on why there is no ExInterlockedRemoveEntryList function. If you search the NTDEV archives (or if you got to hear Peter Viscarola from OSR discuss it at one of the Driver DevCons a while back), you know that Microsoft left the function out intentionally due to its potential for misuse.

To understand why this is, consider one of the nice properties of a doubly-linked list: constant-time removal of an item from the middle, if you already know the item’s address. The list entries look something like this:

typedef struct _LIST_ENTRY
{
	struct _LIST_ENTRY *Flink;
	struct _LIST_ENTRY *Blink;
}
LIST_ENTRY, *PLIST_ENTRY;

To do a remove operation, you would simply point the next item (Flink) to the previous item (Blink) and vice-versa. No need to walk a long list of items. There’s even a macro to do this for you: RemoveEntryList.

This process is subject to an obvious race condition and another less obvious one. The obvious race is that two different threads could try to mutate the list simultaneously. The naïve solution is to wrap the removal in locks:

LockList();             // Spin lock, mutex, whatever...
RemoveEntryList(item);
UnlockList();

That does indeed prevent two threads from making simultaneous updates, but it misses another important problem: What if the entry you’re trying to remove is no longer on the list? What if another thread has just finished removing the same item, right before your call to LockList above? You certainly have no idea if the item’s neighbors are still valid after the item has been removed from the list, so you could easily trash the list.

The only safe way to do this is to ensure that the item is still a part of the list at the time you remove it. And the only safe way to do that is to walk the list from a point that you know will always be on the list, namely, from the head.

There are, of course, situations in which you can be sure, due to other semantics of the program in question, that your item really is still on the list. In those cases, the pattern above is safe.

But in other situations, you have to walk the entire list. This can be expensive, and has to be done under the protection of whatever lock you’re using. For a list with thousands of entries on it, you would want to avoid this whenever possible, and you should probably try to set up whatever extra bookkeeping you need to take advantage of O(1) removal. But, you don’t get that bookkeeping automatically, so that’s why there’s no ExInterlockedRemoveEntryList.

4 Responses to “Why is there no ExInterlockedRemoveEntryList?”

  1. strik says:

    But an ExInterlockedRemoveEntryList() could set Flink and Blink to NULL after removal, and test both before removal. This would be safe (if I did not miss anything in the 2 seconds I thought about it).

  2. dispensa says:

    Yep, that would work. The problem is that RemoveEntryList does NOT do that, and you’d have to make sure that nobody removed anything either. And, you can imagine someone depending somehow on the fact that the addresses aren’t zero (“has this ever been on the list?”).

    None of these are unsolvable, obviously, so you’re right in the end.

  3. Raymond says:

    Even setting Flink and Blink to NULL after removal doesn’t fix it. What if somebody removed the item from the list **and added it to a different list**. Now you just removed the item from the wrong list.

  4. [...] To continue on yesterday’s discussion of interlocked lists, let’s explore the nature of the interlocking done by the ExInterlocked* APIs. The ExInterlockedRemoveHeadList documentation says the following about its spin lock argument: You must use this spin lock only with the ExInterlockedXxxList routines. [...]

Leave a Reply