
A C++ dictionary like C#'s Dictionary


There is a surprisingly fast Dictionary in C#; I would like to know whether there is a similarly efficient container in C++. I tried unordered_map, hash_map, and map, but their performance is many times lower than that of C#'s Dictionary.
P.S.: The pair looks like <unsigned int, unsigned int>.


Answer 1, authority 100%

In fact, comparing languages is a thankless task. There will always be tests in which one of the languages beats the other, and there will always be people who believe that such a test is irrelevant and that such code would never occur in real life.

Nevertheless, I would not say that the OP's results are very unexpected: in .NET, memory allocation is usually faster than in native languages without a custom allocator, and small tests usually stress the allocator much more than, say, the function call machinery.

The reason for this difference in allocator performance is that C++ objects cannot be moved in memory, so the usual allocation algorithm (which, as you know, maintains a list of free blocks and searches it for a suitable one on every allocation) is rather slow and, worse, requires a global memory lock (which makes things even worse in multithreaded scenarios). In addition, objects in C++ tend to be deallocated as soon as possible, which produces extra deallocation work that also requires the global lock.

In the .NET environment, things are different. Objects are always allocated at the top of the heap, so allocation is no slower than an InterlockedIncrement. .NET does not need to maintain a list of free blocks because the garbage collector compacts the heap: objects are moved to fill in the "holes".
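To make that concrete, here is a minimal sketch (my own illustration, not code from this discussion) of a bump, or arena, allocator: handing out memory is a single pointer increment, which is roughly what the .NET allocator does at the top of the GC heap, while a general-purpose malloc has to search its free lists under a lock.

#include <cstdlib>
#include <cstddef>

// A fixed-size arena: allocation is just a pointer bump.
// Individual objects are never freed; the whole arena is released at once
// (alignment is ignored for brevity).
class Arena
{
  char* buffer;
  size_t offset;
  size_t capacity;
public:
  explicit Arena(size_t bytes)
    : buffer(static_cast<char*>(std::malloc(bytes))), offset(0), capacity(bytes) {}
  ~Arena() { std::free(buffer); }

  void* allocate(size_t bytes)
  {
    if (offset + bytes > capacity)
      return nullptr;          // a real allocator would grow or collect here
    void* p = buffer + offset; // the "top of the heap"
    offset += bytes;           // the entire allocation is one addition
    return p;
  }
};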


Besides, the fact that C++ code may well be slower than equivalent C# code has not been news for a long time. For example, here is a great series of articles about a simple application written by native-code experts, and a summary by Jeff Atwood:

To beat the C# version on performance, Raymond had to write his own I/O routines, rewrite the string class, use a custom allocator, and write his own code page conversion routine.

This is also confirmed by the benchmark below: the native containers out of the box are significantly slower than the .NET one, while (some) hand-written containers win.


Now for the fun part: measurements.

C#:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Sharp
{
  class Program
  {
    static void Main(string[] args)
    {
      var dict = new Dictionary<int, int>();
      int seed = 1;
      var timer = new Stopwatch();
      timer.Start();
      for (int i = 0; i < 10000000; i++)
      {
        seed = 1664525 * seed + 1013904223;
        dict.Add(seed, i);
      }
      timer.Stop();
      Console.WriteLine(
        "elapsed time = {0} ms, dictionary entries count = {1}",
        timer.ElapsedMilliseconds,
        dict.Count);
    }
  }
}

C++:

#include "stdafx.h"
#include <ctime>
#include <map>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
  map<int, int> dict;
  int seed = 1;
  auto begin = clock();
  for (int i = 0; i < 10000000; i++)
  {
    seed = 1664525 * seed + 1013904223;
    dict.insert(make_pair(seed, i));
  }
  auto end = clock();
  double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
  cout << "elapsed time = " << elapsedMs
       << " ms, dictionary entries count = " << dict.size()
       << endl;
  return 0;
}

Measurement results (release mode, 5 runs in a row without a debugger):

C#

elapsed time = 1138 ms, dictionary entries count = 10000000
elapsed time = 1127 ms, dictionary entries count = 10000000
elapsed time = 1133 ms, dictionary entries count = 10000000
elapsed time = 1134 ms, dictionary entries count = 10000000
elapsed time = 1129 ms, dictionary entries count = 10000000

C++

elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8408 ms, dictionary entries count = 10000000
elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8361 ms, dictionary entries count = 10000000

Average time: C# = 1132 ms, C++ = 8379 ms.

I am not claiming that my tests are perfect, and they are only relevant for my machine. If someone comes up with a better measurement methodology, I will be happy to apply it. Be that as it may, under my conditions System.Collections.Generic.Dictionary is substantially faster at adding elements than std::map.


Note that Dictionary uses a hash table, while std::map in my implementation uses a red-black tree as the underlying data structure. Hash tables are usually faster by themselves, so allocation speed is not the only reason Dictionary does better.


Replacing make_pair(seed, i) in the C++ code with pair<int, int>(seed, i), as suggested by @igumnov, did not make a big difference: 8361/8392/8361/8408/8361/8345.


Replacing std::map with std::unordered_map, as suggested by @Kotick, resulted in a significant speedup: 2230/2230/2230/2230/2246 (average 2233). However, C++ is still almost twice as slow.
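For reference, a minimal sketch of the unordered_map variant measured here (I am assuming the benchmark is identical to the map version except for the container type and header):

#include <ctime>
#include <unordered_map>
#include <utility>
#include <iostream>
using namespace std;

int main()
{
  unordered_map<int, int> dict; // hash table instead of a red-black tree
  int seed = 1;
  auto begin = clock();
  for (int i = 0; i < 10000000; i++)
  {
    seed = 1664525 * seed + 1013904223;
    dict.insert(make_pair(seed, i));
  }
  auto end = clock();
  double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
  cout << "elapsed time = " << elapsedMs
       << " ms, dictionary entries count = " << dict.size() << endl;
  return 0;
}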


Replaced std::unordered_map with uthash, as suggested by @igumnov. The result is slightly worse than std::unordered_map: 2963/2932/2948/2948/2932. Code:

// requires #include "uthash.h"
void testUhash()
{
  struct myint
  {
    int key;
    int value;
    UT_hash_handle hh;
  };
  struct myint* dict = NULL;
  int seed = 1;
  auto begin = clock();
  for (int i = 0; i < 10000000; i++)
  {
    seed = 1664525 * seed + 1013904223;
    struct myint* ps = (struct myint*) malloc(sizeof(struct myint));
    ps->key = seed;
    ps->value = i;
    HASH_ADD_INT(dict, key, ps);
  }
  auto end = clock();
  double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
  cout << "elapsed time = " << elapsedMs
       << " ms, dictionary entries count = " << HASH_COUNT(dict)
       << endl;
}

Added a capacity of 10000000 to the C++ version and, for a fair comparison, to the C# version as well. Changes:

C++:

unordered_map<int, int> dict(10000000);

C#:

var dict = new Dictionary<int, int>(10000000);

Indeed, it got faster:

  • C++: 1826/1856/1857/1841/1825, average 1841
  • C#: 790/786/801/790/791, average 792

C# is still more than twice as fast.


Following @KoVadim's advice, removed the seed computation (keeping the capacity); the working loop now looks like this:

C++:

for (int i = 0; i < 10000000; i++)
{
  // seed = 1664525 * seed + 1013904223;
  dict.insert(pair<int, int>(i, i));
}

C#:

for (int i = 0; i < 10000000; i++)
{
  // seed = 1664525 * seed + 1013904223;
  dict.Add(i, i);
}

Results:

  • C++: 1498/1514/1498/1498/1498, average 1501
  • C#: 129/129/135/133/132, average 132

On @igumnov's advice, added khash to the benchmark. Code:

// requires #include "khash.h"
KHASH_MAP_INIT_INT(32, int)

void testKhash()
{
  int seed = 1;
  khiter_t iter;
  khash_t(32)* dict = kh_init(32);
  int dummy;
  auto begin = clock();
  for (int i = 0; i < 10000000; i++)
  {
    seed = 1664525 * seed + 1013904223;
    iter = kh_put(32, dict, seed, &dummy);
    kh_value(dict, iter) = i;
  }
  auto end = clock();
  double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
  cout << "elapsed time = " << elapsedMs
       << " ms, dictionary entries count = " << kh_size(dict)
       << endl;
}

Result: 577/577/608/577/577, average 583, a massive win for the native code. Let me remind you that the best result for a standard .NET container is 792 ms.

Who will offer a custom container for .NET?


Tried the .NET FastDictionary implementation (from the MapReduce.NET project). It turned out slightly slower than the built-in Dictionary: 853/865/842/841/842, average 849.


Tested pure allocation speed to check @Dith's hypothesis: the constructor of an empty class is run 10 million times.

C#:

// same usings as above (System, System.Collections.Generic, System.Diagnostics)
static class Allocation
{
  class Foo
  {
  }

  public static void Test()
  {
    const int size = 10000000;
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < size; i++)
    {
      new Foo();
    }
    timer.Stop();
    Console.WriteLine("elapsed time = {0} ms", timer.ElapsedMilliseconds);
  }
}

C++:

// requires #include <windows.h> for QueryPerformanceCounter
class Foo // the empty class referred to in the text
{
};

void testAlloc()
{
  const int size = 10000000;
  LARGE_INTEGER li;
  if (!QueryPerformanceFrequency(&li))
    exit(1);
  double freq = double(li.QuadPart) / 1000.0;
  QueryPerformanceCounter(&li);
  auto begin = li.QuadPart;
  for (int i = 0; i < size; i++)
    new Foo();
  QueryPerformanceCounter(&li);
  auto end = li.QuadPart;
  double elapsedMs = double(end - begin) / freq;
  cout << "elapsed time = " << elapsedMs
       << " ms" << endl;
}

Results:

  • C#: 58/54/28/55/55 (average 50)
  • C++: 407.719/400.693/401.674/401.926/399.976 (average 402.4)

Answer 2, authority 9%

Continuation of a very interesting discussion:

As @VladD said, the .NET Dictionary is based on a hash map, so the following quote is relevant:

The worst implementation of the GetHashCode function in the .NET Framework is the default implementation for structures. The point is that for structures this function does the following. Using reflection, it walks over all the fields and tries to obtain a hash code. Having found the first field from which a hash code can be obtained, the function finishes its work by returning that value. In most cases it returns the hash code of the first field it finds. However, if the structure consists of reference types and all of them are set to null, then the function, like the one for classes, returns the ordinal number of the object.

….

This approach has two drawbacks. First, such a hash code may turn out to be, and often turns out to be, incorrect, since it is hard to identify the whole structure by a single field. Second, because of the use of reflection, this method is far from perfect in terms of performance. Therefore, if you need hash values for structures, it is better to implement the corresponding functions yourself.

source


Answer 3, authority 2%

First, remember that map's first parameter is the type of the key, and the second is the type of the value.
You add as the key, on every iteration,
seed = 1664525 * seed + 1013904223;
where seed = 1 is the initial value.
Accordingly, essentially the same key expression is inserted each time, which is the worst case for insertion.

Second, and most importantly:
map is efficient for finding a value by key and very inefficient for adding and removing elements.
Your test does nothing but insert. The main cost in such a test is not allocation but insertion into the tree (red-black, as you correctly noted).
If you want to compare Dictionary in .NET with map in the STL, you need to write a test that first fills in the values and then measures the access time to random values in a loop (a sketch of such a test is given at the end of this answer).
If your application mostly needs to look values up, use map, but then write a different test. If you mostly add values, use vector; if you add and remove a lot, use list. Then the test (for insertion) looks different:

vector<int> vec;
for (int i = 0; i < 1000000; i++)
{
  vec.push_back(i);
}

Your code with 1,000,000 elements runs in 8386 ms for me; the vector example takes 251 ms, 33 times faster.
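As an illustration of the lookup-oriented test mentioned above, here is a minimal sketch (my own code, reusing the generator from the question and the 1,000,000-element size of the vector example): the map is filled first, and only the lookups are timed.

#include <ctime>
#include <map>
#include <utility>
#include <iostream>
using namespace std;

int main()
{
  map<int, int> dict;
  // Fill phase: not timed.
  for (int i = 0; i < 1000000; i++)
    dict.insert(make_pair(i, i));

  // Lookup phase: time pseudo-random key accesses only.
  int seed = 1, found = 0;
  auto begin = clock();
  for (int i = 0; i < 1000000; i++)
  {
    seed = 1664525 * seed + 1013904223;
    if (dict.find((unsigned)seed % 1000000) != dict.end())
      ++found;
  }
  auto end = clock();
  double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
  cout << "lookup time = " << elapsedMs << " ms, hits = " << found << endl;
  return 0;
}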


Answer 4

I'll write this as an answer, since I've run out of comment space.

Tested pure allocation speed to check @Dith's hypothesis: the constructor of an empty class is run 10 million times.

This test betrays a lack of understanding of the compiler and the runtime environment. The C++ code leaks memory, while the C# code does not, since the GC cleans everything up. This is the first sign that the tests are not equivalent. At the very least, a destructor call should be added. (Although tests show this would not speed things up much.)

BUT! In the C# code the amount of allocated memory will not actually grow: the old object is destroyed, and the new one is placed in the same memory. Since the object is empty, initialization is fast, and the memory is already allocated. Formally it comes down to assigning a couple of pointers plus a memset().

Moreover, the JIT optimizer may see that although the object is created, it is never used and its constructor is empty, and throw the allocation away, effectively running an empty loop. Ideally the loop itself could then be thrown away too, but that needs to be checked separately.

I'm not that strong in C#, but I think that if you start printing the address of each newly created object, it will always be the same (or there will be a dozen different addresses cycled through in the loop).

That is, the equivalent C++ code would in effect look roughly like this (very simplified, back-of-the-envelope):

Foo* f; // this is a global variable
for (int i = 0; i < size; i++) {
  // this is effectively the constructor
  Foo* t;
  if (f == NULL) {  // if there is no cached object, create one
    t = alloc_memory();
    init(t);
  }
  else {
    t = f;
    init(t);  // also part of the constructor: the memory is already there, we only reset the fields
    f = NULL; // we took the object
  }
  // and this is the destructor
  f = t;      // put the object back
  t = NULL;
}

In effect, we get an object pool that holds a single object.

But you can do better, which will make the tests more correct.

char* t = new char[1000]; // grab some raw memory for ourselves (requires #include <new>)
for (int i = 0; i < 10000000; i++) {
  new (t) Foo(); // see the strange new operator? :)
}
delete[] t;

(This special form of the new operator is called placement new.)
The most interesting thing is that now there is no leak, and the code runs 10 times faster than the original. I think this code is the closest analogue of the C# code. On the other hand, it can be written even more simply:

for (int i = 0; i < 10000000; i++)
  Foo f; // the object is created here, and the constructor and destructor are both called

I would not be surprised if the C# optimizer also turned the allocation in the original code into a stack allocation (this is possible; I know for sure that Java 7 can do it). And allocating memory on the stack is very cheap in terms of speed.
