I actually posted quiz #10 quite a while ago, but a comment with the correct solution came in so quickly that I wasn't very motivated to post a followup. There are excellent links in the comments (thank you, readers!), but now I'll have to make the quizzes harder :)
The problem was to see what overhead is associated with various methods of creating flexible thread local storage. I suggested two ways of having named storage.
I've posted a sample benchmark that expands on this and shows four different approaches (some less general than others).
On my machine I observed the following times:
Test1: Named Slot 7,991ms
Test2: Numbered Slot 4,136ms
Test3: Thread-local dictionary 2,006ms
Test4: Thread-local direct 704ms
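For readers without the benchmark at hand, here is a rough sketch of what the four access patterns named above might look like. It is not the posted benchmark: the class name `TlsSketch`, the slot name, and the method names are all made up for illustration, and it only shows the shape of each read path rather than timing anything.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: TlsSketch, SlotName and the method names are
// illustrative and are not taken from the posted benchmark.
public static class TlsSketch
{
    const string SlotName = "quiz10.counter";

    // Numbered-slot style: an unnamed slot allocated once and cached.
    static readonly LocalDataStoreSlot NumberedSlot = Thread.AllocateDataSlot();

    // Thread-local dictionary style. No field initializer: an initializer on
    // a [ThreadStatic] field runs only once, so other threads would see null.
    [ThreadStatic] static Dictionary<string, int> perThread;

    // Thread-local direct style: the value is itself a thread-static field.
    [ThreadStatic] static int direct;

    // Stores the same value all four ways for the calling thread.
    // Call this on a thread before using the Read methods on that thread.
    public static void Store(int value)
    {
        // GetNamedDataSlot allocates the slot if the name is not yet in use.
        Thread.SetData(Thread.GetNamedDataSlot(SlotName), value);
        Thread.SetData(NumberedSlot, value);
        if (perThread == null) perThread = new Dictionary<string, int>();
        perThread[SlotName] = value;
        direct = value;
    }

    // Named slot: a by-name slot lookup on every read, then GetData and an unbox.
    public static int ReadNamed()
    {
        return (int)Thread.GetData(Thread.GetNamedDataSlot(SlotName));
    }

    // Numbered slot: skips the name lookup but still goes through GetData.
    public static int ReadNumbered()
    {
        return (int)Thread.GetData(NumberedSlot);
    }

    // Thread-local dictionary: one thread-static read plus a dictionary lookup.
    public static int ReadDictionary()
    {
        return perThread[SlotName];
    }

    // Thread-local direct: just the thread-static read.
    public static int ReadDirect()
    {
        return direct;
    }
}

public static class Program
{
    public static void Main()
    {
        TlsSketch.Store(42);
        Console.WriteLine(TlsSketch.ReadNamed());
        Console.WriteLine(TlsSketch.ReadNumbered());
        Console.WriteLine(TlsSketch.ReadDictionary());
        Console.WriteLine(TlsSketch.ReadDirect());
    }
}
```

Each read method does strictly less work than the one before it, which is at least consistent with the ordering of the times above.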
So, what's going on? Well, I looked into it with our profiler and got these results, which show the extra costs pretty clearly. Have a look at all the helper functions under Test1 and Test2.
Exclusive | Inclusive | Depth | Function Name
0.39 % | 89.92 % | 0 | Quiz10.Program.Main (string[])
0.78 % | 53.07 % | ? | (name missing in source)
0.95 % | 25.19 % | 2 | System.LocalDataStoreMgr.GetNamedDataSlot (string)
0.18 % | 12.14 % | 3 | JIT_MonReliableEnter (class Object *,bool *)
5.76 % | 8.06 % | 3 | System.Collections.Hashtable.get_Item (object)
3.05 % | 3.11 % | ? | (name missing in source)
3.49 % | 22.31 % | 2 | NativeArrayMarshalerBase::NativeArrayMarshalerBase (class CleanupWorkList *)
0.43 % | 5.97 % | 3 | ThreadStore::LockDLSHash (void)
0.14 % | 5.41 % | 3 | CantAllocThreads::MarkThread (void)
0.04 % | 2.80 % | 3 | EEHashTableBase<int,class EEIntHashTableHelper,0>::FindItem (int)
0.77 % | 2.19 % | 3 | FrameWithCookie<class HelperMethodFrame_1OBJ>::FrameWithCookie<class HelperMethodFrame_1OBJ> (void *,struct LazyMachState *,unsigned int,class Object * *)
0.78 % | 1.59 % | 2 | System.Threading.Thread.get_LocalDataStoreManager ()
0.16 % | 1.22 % | 2 | ThreadNative::GetDomainLocalStore (void)
0.57 % | 1.16 % | 2 | System.LocalDataStore.GetData (class System.LocalDataStoreSlot)
0.66 % | 26.72 % | ? | (name missing in source)
3.73 % | 21.79 % | 2 | NativeArrayMarshalerBase::NativeArrayMarshalerBase (class CleanupWorkList *)
0.46 % | 5.79 % | 3 | ThreadStore::LockDLSHash (void)
0.18 % | 5.13 % | 3 | CantAllocThreads::MarkThread (void)
0.05 % | 3.13 % | 3 | EEHashTableBase<int,class EEIntHashTableHelper,0>::FindItem (int)
0.57 % | 1.62 % | 3 | FrameWithCookie<class HelperMethodFrame_1OBJ>::FrameWithCookie<class HelperMethodFrame_1OBJ> (void *,struct LazyMachState *,unsigned int,class Object * *)
0.11 % | 1.19 % | 2 | ThreadNative::GetDomainLocalStore (void)
0.44 % | 1.08 % | 2 | System.Threading.Thread.get_LocalDataStoreManager ()
0.53 % | 1.05 % | 2 | System.LocalDataStore.GetData (class System.LocalDataStoreSlot)
0.25 % | 8.43 % | ? | (name missing in source)
0.55 % | 7.07 % | 2 | System.Collections.Generic.Dictionary`2.get_Item (!0)
2.38 % | 6.52 % | 2 | System.Collections.Generic.Dictionary`2.FindEntry (!0)
0.20 % | 1.30 % | ? | (name missing in source)
The table above shows all functions starting from Main with an inclusive cost >= 1% and a depth of no more than 3 -- so things are missing, but it's good for discussion. Under Test1 there's a good deal of locking and marshalling... looks like there is a big oops here. The good news is that the contract is sound, so hopefully this could be addressed. But really I'm not sure why I would even bother. The other approach, using [ThreadStatic], is much cleaner and much faster. I don't know why anyone would ever want to use the slots.
For my part, rather than fix this I think I will ask that the relevant functions be deprecated -- the [ThreadStatic] approach seems better in every way. The slot methods hereby have my personal deprecation, for what that's worth.
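If you still want name-based lookup without the slot APIs, a minimal sketch along the lines of the thread-local dictionary test could look like the following. The `ThreadNamedStore` class and its `Set`/`Get` methods are hypothetical names invented here, not part of the framework or the benchmark, and the demo in `Main` only illustrates that each thread sees its own values.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: named per-thread storage built on [ThreadStatic]
// instead of Thread.GetNamedDataSlot. All names here are illustrative.
public static class ThreadNamedStore
{
    // One dictionary per thread, created lazily on first use by that thread.
    [ThreadStatic] static Dictionary<string, object> items;

    static Dictionary<string, object> Items
    {
        get
        {
            if (items == null) items = new Dictionary<string, object>();
            return items;
        }
    }

    // Stores a value under the given name for the calling thread only.
    public static void Set(string name, object value)
    {
        Items[name] = value;
    }

    // Returns the calling thread's value, or null if it never set one.
    public static object Get(string name)
    {
        object value;
        return Items.TryGetValue(name, out value) ? value : null;
    }
}

public static class Program
{
    public static void Main()
    {
        ThreadNamedStore.Set("user", "main");

        // The worker thread starts with its own empty dictionary, so it
        // does not see the value stored by the main thread.
        object seenByWorker = "unset";
        Thread worker = new Thread(() => { seenByWorker = ThreadNamedStore.Get("user"); });
        worker.Start();
        worker.Join();

        Console.WriteLine(seenByWorker == null);
        Console.WriteLine(ThreadNamedStore.Get("user"));
    }
}
```

Because the dictionary is a private static, other classes can't poke at these names by accident the way they can with a shared named slot.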
- Anonymous
July 18, 2006
If you're stuffing anything in thread local storage, you might be interested in the performance comparison...
- Anonymous
July 18, 2006
Rico,
Deprecation seems a bit harsh. It seems like some of the slot based methods could have application to dynamic languages and other interpreters especially(?).
- Anonymous
July 18, 2006
Seriously, I can't think of any cases where it wouldn't be better to just make your own personal Dictionary to hang on each thread. Such a thing is still discoverable in a dynamic language if you wish it to be.
Poking into other classes' named slots -- which may have been intended to be 'private' -- seems unwise at best.
So I think to myself, why have these methods at all?
I wouldn't worry though, when it comes to deprecation I don't usually get what I ask for :)
- Anonymous
August 03, 2006
Ever wonder how I get those nice looking HTML call trees with attributed costs like this one here in...