STXXL 1.4.1
A main feature of STXXL is its ability to exploit parallel access to multiple disks. To use it, you must define the disk configuration in a text file, using the syntax described below. If no file is found at the locations listed below, STXXL by default creates a 1000 MiB file in /var/tmp/stxxl on Unix or in the user's temp directory on Windows.
On Linux/Unix systems, STXXL looks for a disk configuration file in the following locations, in order of precedence:
- If the environment variable STXXLCFG specifies a file, this is used.
- Then the current directory is checked:
  - first for .stxxl.$HOSTNAME (for host specific configuration),
  - then for .stxxl (for general configuration).
- Then the $HOME directory of the current user is checked (usual method):
  - first for $HOME/.stxxl.$HOSTNAME (for host specific configuration),
  - then for $HOME/.stxxl (for general configuration).

Note: the $HOSTNAME variable is often not exported. For the host specific configuration to work, you must add "export HOSTNAME" to your shell configuration (.bashrc).

On Windows systems, STXXL looks for a disk configuration file in the following directories:
- If the environment variable STXXLCFG specifies a file, this is used.
- Then the current directory is checked:
  - first for .stxxl.%COMPUTERNAME%.txt (for host specific configuration),
  - then for .stxxl.txt (for general configuration).
- Then the %APPDATA% directory of the current user is checked (usual method):
  - first for %APPDATA%/.stxxl.%COMPUTERNAME%.txt (for host specific configuration),
  - then for %APPDATA%/.stxxl.txt (for general configuration).

Note: %APPDATA% is typically C:\Users\<username>\AppData\Roaming. You can visit your %APPDATA% directory by simply entering "%APPDATA%" in the Windows Explorer address/location line.

Each line of the configuration file describes a disk. Lines starting with '#' are comments.
A disk description uses the following format:
disk=<path>,<capacity>,<fileio> <options>
Description of the parameters:
- <path> : full disk filename.
  - Each disk is represented as a file. For a disk mounted at /mnt/disk0/, for example, the full_disk_filename would be /mnt/disk0/some_file_name.
  - If the path contains "###" (three '#'), these symbols are replaced by the current process id.
- <capacity> : maximum capacity of the disk.
  - The following size suffixes are recognized: K, M, G, T, P (powers of 10), Ki, Mi, Gi, Ti, Pi (powers of 2).
  - If a number has no suffix, M (megabyte) is assumed.
- <fileio> : STXXL has a number of different file access implementations; choose one of them:
  - syscall : use read and write system calls, which perform disk transfers directly on user memory pages without superfluous copying (currently the fastest method).
  - wincall : on Windows, use direct calls to the Windows API.
  - linuxaio : on Linux, use direct syscalls to the native Linux AIO interface.
  - memory : keeps all data in RAM, for quicker testing.
  - mmap : use mmap and munmap system calls.
  - boostfd : access the file using a Boost file descriptor.
  - fileperblock_syscall, fileperblock_mmap, fileperblock_boostfd : same as above, but use a single file per block, with full_disk_filename as the file name prefix. These usually perform worse than the standard variants, but release freed blocks to the file system immediately.
  - simdisk : simulates timings of the IBM IC35L080AVVA07 disk; full_disk_filename must point to a file on a RAM disk partition with sufficient space.
  - wbtl : library-based write-combining (good for writing small blocks onto SSDs), based on syscall.
- <options> : additional options for the file access implementation. Not all are available for every fileio method. The option order is unimportant.
  - autogrow : enables automatic growth of the file beyond the specified capacity.
  - direct, nodirect, direct=[off/try/on] : disable buffering in the system cache by passing O_DIRECT or a similar flag to open. With direct or direct=on, STXXL will fail without direct access. With nodirect or direct=off, direct access is disabled. The default is direct=try, which first attempts to open with O_DIRECT and falls back to opening without it if that fails.
  - unlink (or unlink_on_open) : unlink the file from the file system immediately after creation.
  - delete (or delete_on_exit) : delete the file after the STXXL program exits.
  - raw_device : fail if the opened path is not a raw block device.
  - queue=# : assign the disk to a specific I/O request queue and thread.
  - devid=# : assign the disk entry a specific physical device id.
  - queue_length=# : for linuxaio, specify the desired queue length inside the Linux kernel.

Example:
disk=/data01/stxxl,500G,syscall unlink
disk=/data02/stxxl,300G,syscall unlink
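Capacity values such as 500G and 300G follow the suffix rules described for <capacity>. The following is a minimal sketch of how such a value could be converted to bytes. The function parse_capacity is a hypothetical helper, not part of the STXXL API. It follows the text literally in treating a bare number as M with M = 10^6 bytes, and it assumes that an optional trailing "B" (as in "1GiB") is tolerated; the library's own parser may differ in both respects.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an STXXL function): convert a capacity string such
// as "500G", "2Ki" or "300" into a number of bytes.
std::uint64_t parse_capacity(const std::string& text)
{
    // Read the leading decimal number.
    std::size_t pos = 0;
    std::uint64_t value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
        ++pos;
    }
    if (pos == 0)
        throw std::invalid_argument("capacity must start with a digit: " + text);

    std::string suffix = text.substr(pos);

    // Assumption taken from the text: no suffix means M (10^6 bytes here).
    if (suffix.empty())
        return value * 1000 * 1000;

    // Assumption: tolerate a trailing byte indicator, e.g. "GiB" or "GB".
    if (suffix.back() == 'B')
        suffix.pop_back();
    if (suffix.empty())
        return value;  // plain "B": the number is already in bytes

    // K, M, G, T, P select the exponent 1..5.
    const std::string units = "KMGTP";
    const std::size_t index = units.find(suffix[0]);
    if (index == std::string::npos)
        throw std::invalid_argument("unknown size suffix in: " + text);

    // One letter selects powers of 10, letter plus 'i' selects powers of 2.
    std::uint64_t base = 0;
    if (suffix.size() == 1)
        base = 1000;
    else if (suffix.size() == 2 && suffix[1] == 'i')
        base = 1024;
    else
        throw std::invalid_argument("unknown size suffix in: " + text);

    std::uint64_t multiplier = 1;
    for (std::size_t i = 0; i <= index; ++i)
        multiplier *= base;
    return value * multiplier;
}
```

With this sketch, parse_capacity("500G") yields 500 * 10^9 bytes, while parse_capacity("2Ki") yields 2048 bytes.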
On Windows, one usually uses different disk drives and wincall.
disk=c:\stxxl.tmp,700G,wincall delete
disk=d:\stxxl.tmp,200G,wincall delete
On Linux you can try to take advantage of NCQ + Kernel AIO queues:
disk=/data01/stxxl,500G,linuxaio unlink
disk=/data02/stxxl,300G,linuxaio unlink
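To make the structure of such entries concrete, here is a small sketch that splits one configuration line into its fields according to the format disk=<path>,<capacity>,<fileio> <options>. The DiskLine struct and the parse_disk_line function are hypothetical names introduced for illustration; they are not STXXL types, and the sketch assumes that the path itself contains no comma.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the fields of one "disk=" line (not an STXXL type).
struct DiskLine {
    std::string path;                  // e.g. "/data01/stxxl"
    std::string capacity;              // left unparsed, e.g. "500G"
    std::string fileio;                // e.g. "syscall"
    std::vector<std::string> options;  // e.g. {"unlink"}
};

// Returns true and fills `out` if `line` is a disk description.
// Returns false for empty lines and for comment lines starting with '#'.
// Throws std::invalid_argument if the line is malformed.
bool parse_disk_line(const std::string& line, DiskLine& out)
{
    if (line.empty() || line[0] == '#')
        return false;

    const std::string prefix = "disk=";
    if (line.compare(0, prefix.size(), prefix) != 0)
        throw std::invalid_argument("expected 'disk=' at start of: " + line);

    // <path> ends at the first comma, <capacity> at the second.
    const std::size_t comma1 = line.find(',', prefix.size());
    const std::size_t comma2 = (comma1 == std::string::npos)
                                   ? std::string::npos
                                   : line.find(',', comma1 + 1);
    if (comma2 == std::string::npos)
        throw std::invalid_argument("expected <path>,<capacity>,<fileio> in: " + line);

    out.path = line.substr(prefix.size(), comma1 - prefix.size());
    out.capacity = line.substr(comma1 + 1, comma2 - comma1 - 1);

    // The remainder is "<fileio> <options>", separated by whitespace.
    std::istringstream rest(line.substr(comma2 + 1));
    if (!(rest >> out.fileio))
        throw std::invalid_argument("missing <fileio> in: " + line);

    out.options.clear();
    std::string option;
    while (rest >> option)
        out.options.push_back(option);
    return true;
}
```

Applied to the line disk=/data01/stxxl,500G,linuxaio unlink, the sketch yields the path /data01/stxxl, the capacity string 500G, the fileio method linuxaio and the single option unlink. The capacity is deliberately kept as a string, so that size parsing stays a separate concern.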
The library benefits from direct transfers from user memory to disk, which saves superfluous copies. We recommend using the XFS file system, which gives good read and write performance for large files. Note that file creation on XFS is somewhat slower, so disk files should be precreated for optimal performance.
If the file system's only use is to store one large STXXL disk file, we also recommend adding the following options to the mkfs.xfs command to gain maximum performance:
$ mkfs.xfs -d agcount=1 -l size=512b
The following file systems have been reported not to support direct I/O: tmpfs, glusterfs. By default, STXXL will first try to use direct I/O (O_DIRECT open flag). If that fails, it will print a warning and open the file without O_DIRECT.
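The try-then-fall-back behavior described above can be illustrated with plain POSIX calls. This is only a sketch of the idea, not STXXL's actual implementation: open_direct_try is a hypothetical function, the file mode 0600 and the warning text are arbitrary choices, and on platforms whose headers do not define O_DIRECT the sketch simply opens the file normally.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Hypothetical helper (not STXXL code): open `path` for reading and writing,
// creating it if necessary. The first attempt uses O_DIRECT; if that fails,
// a warning is printed and the open is retried without O_DIRECT.
// `used_direct` reports whether the returned descriptor bypasses the cache.
// Returns a file descriptor, or -1 if both attempts fail.
int open_direct_try(const char* path, bool& used_direct)
{
    const int flags = O_RDWR | O_CREAT;
    used_direct = false;

#ifdef O_DIRECT
    const int fd = ::open(path, flags | O_DIRECT, 0600);
    if (fd >= 0) {
        used_direct = true;
        return fd;
    }
    std::fprintf(stderr,
                 "warning: open with O_DIRECT failed for %s, retrying without\n",
                 path);
#endif

    // Fallback: ordinary buffered access.
    return ::open(path, flags, 0600);
}
```

A caller would check the returned descriptor for -1 and could log used_direct to see which access mode is in effect for a given file system.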
Note: raw disk devices can also be used with syscall. Just use disk=/dev/sdb1 or similar. This will of course overwrite all data on the partitions! The I/O performance of raw disks is generally more stable and slightly higher than with file systems. The raw_device flag is only for verification; STXXL will automatically detect raw block devices and also their size.

STXXL produces two kinds of log files, a message log and an error log. By setting the environment variables STXXLLOGFILE and STXXLERRLOGFILE, you can configure the location of these files. The default values are stxxl.log and stxxl.errlog, respectively.
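As an illustration of how an application could mirror this lookup, for example to find the log files after a run, the following sketch resolves both paths from the environment. The helper names are hypothetical and not part of STXXL. Treating an empty variable like an unset one is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: return the value of environment variable `name`,
// or `fallback` if the variable is unset or empty (the latter is an assumption).
std::string env_or_default(const char* name, const std::string& fallback)
{
    const char* value = std::getenv(name);
    if (value != nullptr && *value != '\0')
        return std::string(value);
    return fallback;
}

// Paths of the message log and the error log, using the defaults named above.
std::string message_log_path()
{
    return env_or_default("STXXLLOGFILE", "stxxl.log");
}

std::string error_log_path()
{
    return env_or_default("STXXLERRLOGFILE", "stxxl.errlog");
}
```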
To get maximum performance, you can precreate the disk files described in the configuration file before running STXXL applications. A precreation utility is included in the set of STXXL utilities in stxxl_tool. Run this utility for each disk you have defined in the disk configuration file:
$ stxxl_tool create_files <capacity> <full_disk_filename...>
// for example:
$ stxxl_tool create_files 1GiB /data01/stxxl
With STXXL >= 1.4.0, the library can also be configured via the user application.
All disk configuration is managed by the stxxl::config class, which contains a list of stxxl::disk_config objects. Each stxxl::disk_config object encapsulates one disk= line from a config file, or one allocated disk.
The disk configuration must be supplied to the STXXL library before any other function calls, because the stxxl::config object must be filled before any external memory blocks are allocated by stxxl::block_manager.