IMPALA-3202, IMPALA-2079: rework scratch file I/O
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
- Scratch space allocation and recycling
- Read and write I/O
The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.
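As a rough illustration of the interface shape described above, here is a minimal in-memory sketch. ScratchSketch, Handle, and the round-robin file choice are hypothetical names and assumptions made for this example, not the actual TmpFileMgr API. It only shows how a client can write and read through an opaque handle without ever seeing which file or offset backs the data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a scratch manager (not Impala's API).
class ScratchSketch {
 public:
  // Opaque handle: clients cannot see which file or offset backs the data.
  class Handle {
   private:
    friend class ScratchSketch;
    size_t file_idx = 0;
    size_t offset = 0;
    size_t len = 0;
  };

  explicit ScratchSketch(size_t num_files) : files_(num_files) {
    assert(num_files > 0);
  }

  // Picks a backing file round-robin (an assumption of this sketch),
  // appends the data, and returns a handle for reading it back.
  Handle Write(const std::string& data) {
    Handle h;
    h.file_idx = next_file_;
    next_file_ = (next_file_ + 1) % files_.size();
    h.offset = files_[h.file_idx].size();
    h.len = data.size();
    files_[h.file_idx] += data;
    return h;
  }

  // Reads back exactly the bytes that were written for this handle.
  std::string Read(const Handle& h) const {
    return files_[h.file_idx].substr(h.offset, h.len);
  }

 private:
  std::vector<std::string> files_;  // In-memory stand-ins for scratch files.
  size_t next_file_ = 0;
};
```

Because the handle's fields are private, the file allocation policy can change without touching client code.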
Write error recovery:
Also implement write error recovery in TmpFileMgr.
If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
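The recovery policy above can be sketched roughly as follows. ScratchFileSketch, WriteWithRecovery, and the try_write callback are hypothetical names introduced for illustration, and the real code handles errors asynchronously rather than in a simple loop. The sketch only shows the idea: a failed file is blacklisted for the rest of the query and the write moves on to another directory's file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-query view of one scratch file.
struct ScratchFileSketch {
  std::string dir;   // Scratch directory that holds the file.
  bool blacklisted;  // Set after a write error; lasts for this query only.
};

// Tries each non-blacklisted file in turn until one write succeeds.
// Returns the index of the file that accepted the write, or -1 if every
// file failed (the point at which the query would be cancelled).
int WriteWithRecovery(
    std::vector<ScratchFileSketch>& files,
    const std::function<bool(const ScratchFileSketch&)>& try_write) {
  for (size_t i = 0; i < files.size(); ++i) {
    if (files[i].blacklisted) continue;  // Do not retry a known-bad file.
    if (try_write(files[i])) return static_cast<int>(i);
    files[i].blacklisted = true;         // File-level, not device-level.
  }
  return -1;
}
```

Since the blacklist lives on the per-query file object, a directory that failed in one query is still tried by later queries.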
To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.
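To illustrate the single error path, here is a small sketch in which Write() never fails synchronously and every outcome arrives through a callback. AsyncWriterSketch, ProcessPending(), and the simulate_failure flag are hypothetical and exist only for this example. A plain queue stands in for the I/O threads so that the example stays deterministic.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical sketch: all write outcomes are delivered via callback.
class AsyncWriterSketch {
 public:
  // An empty error string means the write succeeded.
  using Callback = std::function<void(const std::string& error)>;

  // Only enqueues the operation; it never reports an error to the caller.
  void Write(bool simulate_failure, Callback cb) {
    assert(cb != nullptr);
    pending_.push({simulate_failure, std::move(cb)});
  }

  // Stand-in for the I/O thread: runs queued writes and reports results.
  void ProcessPending() {
    while (!pending_.empty()) {
      Op op = std::move(pending_.front());
      pending_.pop();
      op.cb(op.fail ? "simulated I/O error" : "");
    }
  }

 private:
  struct Op {
    bool fail;
    Callback cb;
  };
  std::queue<Op> pending_;
};
```

With this shape, a caller needs error handling in one place (the callback) instead of at both the call site and the completion path.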
Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.
Not included in this patch: support for recycling variable-length scratch
file ranges. I omitted this to avoid making the patch even larger.
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
- Scratch space is no longer permanently associated with a block, and
  is remapped every time a new block is written to disk.
- Files are now blacklisted - updated the existing tests and enabled the
  previously disabled blacklisting test.
Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.
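For readers unfamiliar with range recycling, a minimal free-list sketch follows. RangeRecyclerSketch and its methods are hypothetical and are not the code under test. It assumes every range has the same fixed length, which matches the note that variable-length recycling is out of scope for this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical allocator for fixed-length ranges within one scratch file.
class RangeRecyclerSketch {
 public:
  explicit RangeRecyclerSketch(int64_t range_len) : range_len_(range_len) {
    assert(range_len > 0);
  }

  // Reuses a recycled range if one exists; otherwise extends the file.
  int64_t Allocate() {
    if (!free_offsets_.empty()) {
      int64_t offset = free_offsets_.back();
      free_offsets_.pop_back();
      return offset;
    }
    int64_t offset = file_len_;
    file_len_ += range_len_;
    return offset;
  }

  // Returns a previously allocated range to the free list.
  void Recycle(int64_t offset) {
    assert(offset >= 0 && offset < file_len_ && offset % range_len_ == 0);
    free_offsets_.push_back(offset);
  }

  int64_t file_len() const { return file_len_; }

 private:
  const int64_t range_len_;
  int64_t file_len_ = 0;               // Logical length of the file so far.
  std::vector<int64_t> free_offsets_;  // Offsets available for reuse.
};
```

A test along these lines would allocate two ranges, recycle the first, and check that the next allocation reuses its offset without growing the file.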
I also manually tested the code in two ways. First, by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out of space error is encountered for the tiny ramdisk and Impala spills
the remaining data (72M) to /tmp.
Reviewed-by: Tim Armstrong <email@example.com>
Tested-by: Impala Public Jenkins