STXXL
1.4-dev
|
The intended audience of this tutorial are developers or researchers who develop applications or implement algorithms processing large data sets which do not fit into the main memory of a computer. They must have basic knowledge in the theory of external memory computing and have working knowledge of C++ and an experience with programming using STL. Familiarity with key concepts of generic programming and C++ template mechanism is assumed.
Let us start with a toy but pretty relevant problem: the phone call billing problem. You are given a sequence of event records. Each record has a time stamp (time when the event had happened), type of event ('call begin' or 'call end'), the callers number, and the destination number. The event sequence is time-ordered. Your task is to generate a bill for each subscriber that includes cost of all her calls. The solution is uncomplicated: sort the records by the callers number. Since the sort brings all records of a subscriber together, we scan the sorted result computing and summing up the costs of all calls of a particular subscriber. The phone companies record up to 300 million transactions per day. AT&T billing system Gecko [33] has to process databases with about 60 billion records, occupying 2.6 terabytes. Certainly this volume can not be sorted in the main memory of a single computer (Except may be in the main memory of an expensive supercomputer.) Therefore we need to sort those huge data sets out-of-memory. Now we show how STXXL can be useful here, since it can handle large volumes I/O efficiently.
If you are familiar with STL your the main
function of bill generation program will probably look like this:
To complete the code we need to define the log entry data type LogEntry
, input operator >>
for LogEntry
, comparison functor SortByCaller
, unary functor ProduceBills
used for computing bills, and the print_usage
function.
In order to make the program I/O efficient we will replace the STL internal memory data structures and algorithms by their STXXL counterparts. The changes are marked with //!
As you note the changes are minimal. Only the namespaces and some memory specific parameters had to be changed.
See examples/algo/phonebills.cpp for the full source code. The example program is automatically compiled when building STXXL, refer to Compilation and Configuration on how to build programs with STXXL.
The program examples/algo/phonebills_genlog.cpp can be used to generate logs for processing with the phonebills example.
Do not forget to configure you external memory space in file .stxxl
. See config.