STXXL: Standard Template Library for Extra Large Data Sets.
The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i. e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the closeness to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.
The key features of STXXL are:
Transparent support of parallel disks. The library provides implementations of basic parallel disk algorithms. STXXL is the only external memory algorithm library supporting parallel disks.
The library is able to handle problems of very large size (tested to up to dozens of terabytes).
Improved utilization of computer resources. STXXL implementations of external memory algorithms and data structures benefit from overlapping of I/O and computation.
Small constant factors in I/O volume. A unique library feature called "pipelining" can save more than half the number of I/Os, by streaming data between algorithmic components, instead of temporarily storing them on disk. A development branch supports asynchronous execution of the algorithmic components, enabling high-level task parallelism.
Shorter development times due to well known STL-compatible interfaces for external memory algorithms and data structures.
STL algorithms can be directly applied to STXXL containers; moreover, the I/O complexity of the algorithms remains optimal in most of the cases. [more info]
For internal computation, parallel algorithms from the MCSTL or the libstdc++ parallel mode are optionally utilized, making the algorithms inherently benefit from multi-core parallelism.
Current maintainers: Andreas Beckmann, Timo Bingmann Past contributors: Roman Dementiev (author), Peter Sanders, Johannes Singler, Raoul Steffen, Markus Westphal
Design paper (2005) - library overview [pdf] [html]
libstdc++ parallel mode homepage: [link] (optional, comes with GCC)
Support Channels
Questions concerning use and development can be search for and asked on Stack Overflow, and longer user-contributed solutions may also be shared via the Github Wiki..
Integrated support for kernel based asynchronous I/O on Linux (new file type "linuxaio"), which exploits Native Command Queuing (NCQ) if available.
Merged stxxl::unordered_map branch, which provides a hash map backed by external memory.
Replaced struct default_completion_handler with a NULL pointer, thus avoiding superfluous new/delete work for each I/O request
Added stxxl::external_shared_ptr which is a proxy class to allow use of shared_ptr classes inside stxxl containers
Fixing bugs and warnings on 32-bit systems (yes, they still exist).
Use atomic_counted_object in class file for request reference counting.
Adding support for MinGW-w64 (64-bit) systems with working SJLJ thread implementations.
Version 1.4.0 (December 12, 2013)
reorganized source hierarchy into include/ lib/ tests/ examples/ doc/ tools/
CMake build system for cross-platform compilation
greatly improved documentation with tutorials and examples
efficient external matrix operations
new containers stxxl::sequence and stxxl::sorter
improved .stxxl disk configuration files and additional options
combined stxxl_tool of disk benchmarks
simple examples and skew3 as real-world stream application
support for Visual Studio 2012 and 2013 _without_ Boost
important bug fixes in stxxl::queue and stxxl::priority_queue
Version 1.3.1 (March 10, 2011)
Contains memory management, disk virtualization, prefetching, and so on, as the lower layers, and as part of the higher layer (pipelined) sorting with SMP and multi-core processor support, (pipelined) scanning and containers (vectors, stacks, priority queues, maps (B+Tree), queues, deques). Currently that sums to about 35,000 lines of code.
See the current Changelog for detailed lists of changes.
Branches
Special features are maintained as Github forks until they are merged into master. Until inclusion into the master branch, the interface may change without further notice.
Asynchronous Pipelining/Streaming: parallel_pipelining_integration This contains an unfinished integration attempt of the async nodes and parallel sorting/pipelining described in the IPDPS 2009 paper. If someone has a stake or interest in this branch, please contact me (Timo) about further work on it. -- 2014-10-23
Publications, Ongoing and Completed Projects using STXXL
If you use STXXL and wish to appear in the following list, please provide a description line via email to one of the maintainers.
Thomas Keh: "Bulk-Parallel Priority Queue in External Memory", 2014: Bachelor Thesis, Supervisor: Timo Bingmann and Peter Sanders
[pdf]
Timo Bingmann, Johannes Fischer, Vitaly Osipov: "Inducing Suffix and LCP Arrays in External Memory", ALENEX 2013: Workshop on Algorithm Engineering and Experiments (Jan 2013, New Orleans, LA, USA)
[pdf] [project page]
Daniel Feist: "External Batched Range Minimum Queries and LCP Construction", 2013: Bachelor Thesis, Supervisor: Timo Bingmann and Peter Sanders
[pdf]
Dennis Luxen, Christian Vetter: "Real-Time Routing with OpenStreetMap Data", ACM GIS 2011: 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (Nov 2011, Chicago, IL, USA)
[pdf]
Mirko Rahn, Peter Sanders, Johannes Singler: "Scalable Distributed-Memory External Sorting", ICDE 2010: International Conference on Data Engineering (Mar 2010, Long Beach)
[arXiv]
Raoul Steffen: "Matrix Operations for STXXL", 2010: Bachelor Thesis, Supervisor: Johannes Singler and Peter Sanders
[pdf]
Andreas Beckmann, Roman Dementiev, Johannes Singler: "Building A Parallel Pipelined External Memory Algorithm Library", IPDPS 2009: International Parallel and Distributed Processing Symposium (May 2009, Rome, Italy)
[full text]
Roman Dementiev, Lutz Kettner, Peter Sanders: "STXXL: standard template library for XXL data sets", Software: Practice and Experience (August 2007, DOI: 10.1002/spe.844)
[link] [preprint]
D. Ajwani, Roman Dementiev, Ulrich Meyer: "A Computational Study of External-Memory BFS Algorithms", SODA 2006: ACM-SIAM Symposium on Discrete Algorithms (January 2006, Miami, Florida, USA)
[pdf]
Roman Dementiev, Lutz Kettner, Peter Sanders: "Stxxl: Standard Template Library for XXL Data Sets", ESA 2005: 13th Annual European Symposium on Algorithms (October 2005, Palma de Mallorca, Spain)
[pdf] [see also the extended version]
Roman Dementiev, Lutz Kettner, Peter Sanders: "Stxxl: Standard Template Library for XXL Data Sets", Technical Report 2005/18, Department of Informatics, University of Karlsruhe
[pdf] [html]
Roman Dementiev, Juha Kaerkkaeinen, Jens Mehnert, Peter Sanders: "Better External Memory Suffix Array Construction", ALENEX 2005: Algorithm Engineering and Experiments (January 2005, Vancouver, Canada):
[pdf] [input instances] [project page]
Roman Dementiev, Peter Sanders, Dominik Schultes, Jop Sibeyn: "Engineering an External Memory Minimum Spanning Tree Algorithm", TSC 04: 3rd IFIP International Conference on Theoretical Computer Science (August 24-26, 2004, Toulouse): [ps] [pdf] [project page]
Roman Dementiev, Peter Sanders: "Asynchronous Parallel Disk Sorting", SPAA 2003: 15th ACM Symposium on Parallelism in Algorithms and Architectures (June 7-9, 2003, San Diego, California, USA)
[pdf]