STXXL
1.4.1
|
The most universal STXXL container is stxxl::vector. Vector is an array whose size can vary dynamically. The implementation of stxxl::vector is similar to the LEDA-SM array [25]. The content of a vector is striped block-wise over the disks, using an assignment strategy given as a template parameter. Some of the blocks are cached in a vector cache of fixed size (also a parameter). The replacement of cache blocks is controlled by a specified page-replacement strategy. STXXL has implementations of LRU and random replacement strategies. The user can provide his/her own strategy as well. The stxxl::vector has STL-compatible Random Access Iterators.
The stxxl::vector is organized as a collection of blocks residing on the external storage media (parallel disks). Access to the external blocks is organized through the fully associative cache which consist of some fixed amount of in-memory pages (The page is a collection of consecutive blocks. The number of blocks in the page is constant.). The schema of stxxl::vector is depicted in the following figure. When accessing an element the implementation of stxxl::vector access methods ([]
operator, push_back
, etc.) first checks whether the page to which the requested element belongs is in the vector's cache. If it is the case the reference to the element in the cache is returned. Otherwise the page is brought into the cache (If the page of the element has not been touched so far, this step is skipped. To keep an eye on such situations there is a special flag for each page.). If there was no free space in the cache, then some page is to be written out. Vector maintains a pager object, that tells which page to kick out. STXXL provides LRU and random paging strategies. The most efficient and default one is LRU. For each page vector maintains the dirty flag, which is set when non-constant reference to one of the page's elements was returned. The dirty flag is cleared each time when the page is read into the cache. The purpose of the flag is to track whether any element of the page is modified and therefore the page needs to be written to the disk(s) when it has to be evicted from the cache.
In the worst case scenario when vector elements are read/written in the random order each access takes 2 x blocks_per_page I/Os. The factor two shows up here because one has to write the replaced from cache page and read the required one). However the scanning of the array costs about I/Os using constant vector iterators or const reference to the vector, where n is the number of elements to read or write (read-only access). Using non-const vector access methods leads to I/Os because every page becomes dirty when returning a non const reference. If one needs only to sequentially write elements to the vector in I/Os the currently fastest method is stxxl::generate. Sequential writing to an untouched before vector (e.g. when created using stxxl::vector(size_type n) constructor.) or alone adding elements at the end of the vector, using the push_back(const T&) method, leads also to I/Os.
Besides the type of the elements stxxl::vector has many other template parameters (block size, number of blocks per page, pager class, etc.). To make the configuration of the vector type easier STXXL provides special type generator template meta programs for its containers.
The meta-program for stxxl::vector is called stxxl::VECTOR_GENERATOR.
The stxxl::VECTOR_GENERATOR has the following template parameters:
ValueType | element type of contained objects (POD with no references to internal memory) |
PageSize | number of blocks in a page, default: 4 (recommended >= D) |
CachePages | number of pages in cache, default: 8 (recommended >= 2) |
BlockSize | external block size B in bytes, default: 2 MiB |
AllocStr | parallel disk allocation strategies: striping , RC, SR, or FR. default: RC. |
Pager | pager type: random or lru , default: lru. |
strategy | identifier |
---|---|
striping | striping |
simple randomized | SR |
fully randomized | FR |
randomized cycling | RC |
strategy | identifier |
---|---|
random | random |
least recently used | lru |
The cache of stxxl::vector largely dominates in its internal memory consumption. Other members consume very small fraction of stxxl::vector's memory even when the vector size is large. Therefore, the internal memory consumption of stxxl::vector can be estimated as bytes.
operator
[] makes the page of the element dirty. This causes the page to be written back to the disks(s) when the page is to be kicked off from the cache (additional write I/Os). If you do not want this behavior, use const operator
[]. For that you need to access the vector via a const reference to it. Example: front()
and back()
methods.