STXXL  1.4.1
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
STXXL Unordered Map (Hash Map)

This page introduces the EXPERIMENTAL stxxl::unordered_map which can be used in-lieu of std::unordered_map (for further information on the interface, refer to the API stxxl::unordered_map).

stxxl::unordered_map is an external memory hash map that stores elements formed by a combination of a unique key value and a data value, without any specific order. The main problem is that a hash map ITSELF IS NOT VERY EFFICIENT in external memory, since access to an element requires a random access to disk. PLEASE CHECK whether an ordered sequence, as provided by stxxl::map, may not be the better replacement for your application. However, if you are willing to provide a lot of buffer memory then the hash map can cache many items in internal memory, and direct hash-based access will be very fast. Also, with SSDs one may be able to reduce the block size.

The implementation of the unordered hash_map is experimental, and help for improving, fixing bugs and writing documentation in it is very welcome. If you have an application, please consider THROUGHLY TESTING the implementation and patching problems.

Creating a STXXL Unordered Map

To create a stxxl::unordered_map object, several template parameters are required. The first two parameters KeyType and MappedType, which are combined into a std::pair<int, char> in this example, are self-explanatory, the third parameter is a hasher class and the fourth has to be a comparator class which is used to determine whether a key is smaller than another one, the fifth and sixth parameters define the subblock- and block size (in subblock items).

#define SUB_BLOCK_SIZE 8192
#define SUB_BLOCKS_PER_BLOCK 256
// template parameter <KeyType, MappedType, HashType, CompareType, SubBlockSize, SubBlocksPerBlock>
int, char, HashFunctor, CompareLess, SUB_BLOCK_SIZE, SUB_BLOCKS_PER_BLOCK
> unordered_map_type;
// constructor: use defaults for all parameters
unordered_map_type my_map;

The hash function follows the standard std::hash signature, and returns a size_t:

struct HashFunctor
{
size_t operator () (int key) const
{
// a simple integer hash function
return (size_t)(key * 2654435761u);
}
};

Instead of the equality comparator as required by the C++ standard, we require a less comparator, because the unordered_map sorts bulk insertions by hash value. A simple comparator looks like:

struct CompareLess
{
bool operator () (const int& a, const int& b) const
{ return a < b; }
static int min_value() { return std::numeric_limits<int>::min(); }
static int max_value() { return std::numeric_limits<int>::max(); }
};

After construction, the standard operations of an unordered map are available as one would think, see below for a short example of some function.

Additional Implementation Notes

The implementation contains some TODO items very relevant to performance. A potential heavy user should consider fixing these.

As the btree, the unordered_map must keep an iterator map for updating items when they are swapped out to disk.

TODO: write more information.

A minimal working example on STXXL Unordered Map

(See examples/containers/unordered_map1.cpp for the sourcecode of the following example).

#include <iostream>
//! [hash]
struct HashFunctor
{
size_t operator () (int key) const
{
// a simple integer hash function
return (size_t)(key * 2654435761u);
}
};
//! [hash]
//! [comparator]
struct CompareLess
{
bool operator () (const int& a, const int& b) const
{ return a < b; }
static int min_value() { return std::numeric_limits<int>::min(); }
static int max_value() { return std::numeric_limits<int>::max(); }
};
//! [comparator]
int main()
{
//! [construction]
#define SUB_BLOCK_SIZE 8192
#define SUB_BLOCKS_PER_BLOCK 256
// template parameter <KeyType, MappedType, HashType, CompareType, SubBlockSize, SubBlocksPerBlock>
int, char, HashFunctor, CompareLess, SUB_BLOCK_SIZE, SUB_BLOCKS_PER_BLOCK
> unordered_map_type;
// constructor: use defaults for all parameters
unordered_map_type my_map;
//! [construction]
// insert some items and delete one
my_map.insert(std::make_pair(1, 'a'));
my_map.insert(std::make_pair(2, 'b'));
my_map.insert(std::make_pair(3, 'c'));
my_map.insert(std::make_pair(4, 'd'));
my_map.erase(3);
// iterate over all items in the unordered_map
unordered_map_type::iterator iter;
std::cout << "my_map contains:\n";
for (iter = my_map.begin(); iter != my_map.end(); ++iter)
{
std::cout << iter->first << " => " << iter->second << std::endl;
}
// direct operator[] access to items
std::cout << "my_map[2] = " << my_map[2] << std::endl;
// efficient bulk-insert into hash map by sorting by hash keys
std::vector<unordered_map_type::value_type> value_array;
for (int i = 0; i < 128; ++i)
value_array.push_back(std::make_pair(i, (char)i));
my_map.insert(value_array.begin(), value_array.end(), 8 * 1024 * 1024);
// check results of insertion
std::cout << "my_map[42] = " << my_map[42] << std::endl;
std::cout << "my_map.size() = " << my_map.size() << std::endl;
return 0;
}