STXXL  1.4-dev
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Vector
Author
Roman Dementiev (2006)

The most universal STXXL container is stxxl::vector. Vector is an array whose size can vary dynamically. The implementation of stxxl::vector is similar to the LEDA-SM array [25]. The content of a vector is striped block-wise over the disks, using an assignment strategy given as a template parameter. Some of the blocks are cached in a vector cache of fixed size (also a parameter). The replacement of cache blocks is controlled by a specified page-replacement strategy. STXXL has implementations of LRU and random replacement strategies. The user can provide his/her own strategy as well. The stxxl::vector has STL-compatible Random Access Iterators.

  • One random access costs $ \mathcal{O}(1) $ I/Os in the worst case. Same for insertion and removal at the end.
  • Sequential scanning of the vector costs $ \mathcal{O}(1/DB) $ amortized I/Os per vector element.

The Architecture of stxxl::vector

The stxxl::vector is organized as a collection of blocks residing on the external storage media (parallel disks). Access to the external blocks is organized through the fully associative cache which consist of some fixed amount of in-memory pages (The page is a collection of consecutive blocks. The number of blocks in the page is constant.). The schema of stxxl::vector is depicted in the following figure. When accessing an element the implementation of stxxl::vector access methods ([]operator, push_back, etc.) first checks whether the page to which the requested element belongs is in the vector's cache. If it is the case the reference to the element in the cache is returned. Otherwise the page is brought into the cache (If the page of the element has not been touched so far, this step is skipped. To keep an eye on such situations there is a special flag for each page.). If there was no free space in the cache, then some page is to be written out. Vector maintains a pager object, that tells which page to kick out. STXXL provides LRU and random paging strategies. The most efficient and default one is LRU. For each page vector maintains the dirty flag, which is set when non-constant reference to one of the page's elements was returned. The dirty flag is cleared each time when the page is read into the cache. The purpose of the flag is to track whether any element of the page is modified and therefore the page needs to be written to the disk(s) when it has to be evicted from the cache.

vector_architecture_small.png
The schema of stxxl::vector that consists of ten external memory pages and has a cache with the capacity of four pages. The first cache page is mapped to external page 1, the second page is mapped to external page 8, and the fourth cache page is mapped to page 5. The third page is not assigned to any external memory page.

In the worst case scenario when vector elements are read/written in the random order each access takes 2 x blocks_per_page I/Os. The factor two shows up here because one has to write the replaced from cache page and read the required one). However the scanning of the array costs about $ n/B $ I/Os using constant vector iterators or const reference to the vector, where n is the number of elements to read or write (read-only access). Using non-const vector access methods leads to $ 2 \times n/B $ I/Os because every page becomes dirty when returning a non const reference. If one needs only to sequentially write elements to the vector in $ n/B $ I/Os the currently fastest method is stxxl::generate. Sequential writing to an untouched before vector (e.g. when created using stxxl::vector(size_type n) constructor.) or alone adding elements at the end of the vector, using the push_back(const T&) method, leads also to $ n/B $ I/Os.

// Example of use
V.push_back(3);
assert(V.size() == 1 && V.capacity() >= 1 && V[0] == 3);

stxxl::VECTOR_GENERATOR

Besides the type of the elements stxxl::vector has many other template parameters (block size, number of blocks per page, pager class, etc.). To make the configuration of the vector type easier STXXL provides special type generator template meta programs for its containers.

The meta-program for stxxl::vector is called stxxl::VECTOR_GENERATOR.

// Example of use
vector_type V;
V.push_back(3);
assert(V.size() == 1 && V.capacity() >= 1 && V[0] == 3);

The stxxl::VECTOR_GENERATOR has the following template parameters:

Template Parameters
ValueTypeelement type of contained objects (POD with no references to internal memory)
PageSizenumber of blocks in a page, default: 4 (recommended >= D)
CachePagesnumber of pages in cache, default: 8 (recommended >= 2)
BlockSizeexternal block size B in bytes, default: 2 MiB
AllocStrparallel disk allocation strategies: striping, RC, SR, or FR. default: RC.
Pagerpager type: random or lru, default: lru.
Warning
Do not store references to the elements of an external vector. Such references might be invalidated during any following access to elements of the vector

Notes:

  • All blocks of a page are read and written from/to disks together. Therefore to increase the I/O bandwidth, it is recommended to set the PageSize parameter to multiple of D.
  • Memory consumption of constructed vector is BlockSize x CachePages x PageSize bytes (see below).
  • The configured vector type is available as VECTOR_GENERATOR<>::result.
  • Since there are defaults for the last five of the parameters, it is not necessary to specify them all.
  • Supported parallel disk assignment strategies:
    strategy identifier
    striping striping
    simple randomized SR
    fully randomized FR
    randomized cycling RC
  • Supported paging strategies:
    strategy identifier
    random random
    least recently used lru

    Examples:

  • VECTOR_GENERATOR<double>::result – external vector of double's with four blocks per page, the cache with eight pages, 2 MiB blocks, Random Allocation and lru cache replacement strategy.
  • VECTOR_GENERATOR<double,8>::result – external vector of double's , with eight blocks per page, the cache with eight pages, 2 MiB blocks, Random Allocation and lru cache replacement strategy
  • VECTOR_GENERATOR<double,8,2,524288,SR>::result – external vector of double's, with eight blocks per page, the cache with two pages, 512 KiB blocks, Simple Randomized allocation and lru cache replacement strategy

Internal Memory Consumption of stxxl::vector

The cache of stxxl::vector largely dominates in its internal memory consumption. Other members consume very small fraction of stxxl::vector's memory even when the vector size is large. Therefore, the internal memory consumption of stxxl::vector can be estimated as $ BlockSize \times CachePages \times PageSize $ bytes.

More Notes

  • In opposite to STL, stxxl::vector's iterators do not get invalidated when the vector is resized or reallocated.
  • Dereferencing a non-const iterator makes the page of the element to which the iterator points to dirty. This causes the page to be written back to the disks(s) when the page is to be kicked off from the cache (additional write I/Os). If you do not want this behavior, use const iterators instead. See following example:
    vector_type V;
    // ... fill the vector here
    vector_type::iterator iter = V.begin();
    // ... advance the iterator
    a = *iter; // causes write I/Os, although *iter is not changed
    vector_type::const_iterator citer = V.begin();
    // ... advance the iterator
    a = *citer; // read-only access, causes no write I/Os
    *citer = b; // does not compile, citer is const
  • Non const operator[] makes the page of the element dirty. This causes the page to be written back to the disks(s) when the page is to be kicked off from the cache (additional write I/Os). If you do not want this behavior, use const operator[]. For that you need to access the vector via a const reference to it. Example:
    vector_type V;
    // ... fill the vector here
    a = V[index]; // causes write I/Os, although V[index] is not changed
    const vector_type & CV = V; // const reference to V
    a = CV[index]; // read-only access, can cause no write I/Os
    CV[index] = b; // does not compile, CV is const
    This issue also concerns front() and back() methods.