Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2079

Don't fail when write to scratch dir results in error.

    Details

      Description

      Currently, a query fails when it tries to spill data to a disk and that write results in an error. This can happen for multiple reasons: a) no write permission, b) bad disk.

      The fix should do two things:

      • Check for write permissions on startup
      • Don't fail queries that hit an error at runtime (e.g. disk full) if there is another functional scratch disk that can be used instead.

        Issue Links

          Activity

          Hide
          tarmstrong Tim Armstrong added a comment -

          What's the desired behavior for the startup check? Should it result in an error or warning?

          Show
          tarmstrong Tim Armstrong added a comment - What's the desired behavior for the startup check? Should it result in an error or warning?
          Hide
          marcelk Marcel Kornacker added a comment -

          Just a warning: we should then blacklist that directory and continue. If that means we don't have any scratch dirs available when we finally run a query that needs it, we should abort that query.

          Show
          marcelk Marcel Kornacker added a comment - Just a warning: we should then blacklist that directory and continue. If that means we don't have any scratch dirs available when we finally run a query that needs it, we should abort that query.
          Hide
          alan@cloudera.com Alan Choi added a comment -

          At start up, if none of the scratch dir is available, the error message should explicitly say that "no scratch dir is available".

          Show
          alan@cloudera.com Alan Choi added a comment - At start up, if none of the scratch dir is available, the error message should explicitly say that "no scratch dir is available".
          Hide
          marcelk Marcel Kornacker added a comment -

          Exactly, that should be a (non-fatal) error.

          Show
          marcelk Marcel Kornacker added a comment - Exactly, that should be a (non-fatal) error.
          Hide
          tarmstrong Tim Armstrong added a comment -

          After looking at the code there are a few possible subgoals with 2):

          2a. Blacklist so that no future queries attempt to create tmp files on that disk
          2b. Prevent the current query from aborting by redirectly the failed write to a good disk. Cross fingers that any already-spilled blocks can be read back ok from the disk.
          2c. Notify other concurrently running queries about the bad disk so they can avoid using it going forward.

          I think 2a and 2b are definitely goals. I'm not sure if 2c is a goal or non-goal - it's probably ok but suboptimal if the other concurrent queries continue using the "bad" disk until they encounter write errors independently. 2c is a bit tricky since it requires interaction between query threads where there was none previously.

          Show
          tarmstrong Tim Armstrong added a comment - After looking at the code there are a few possible subgoals with 2): 2a. Blacklist so that no future queries attempt to create tmp files on that disk 2b. Prevent the current query from aborting by redirectly the failed write to a good disk. Cross fingers that any already-spilled blocks can be read back ok from the disk. 2c. Notify other concurrently running queries about the bad disk so they can avoid using it going forward. I think 2a and 2b are definitely goals. I'm not sure if 2c is a goal or non-goal - it's probably ok but suboptimal if the other concurrent queries continue using the "bad" disk until they encounter write errors independently. 2c is a bit tricky since it requires interaction between query threads where there was none previously.
          Hide
          tarmstrong Tim Armstrong added a comment -

          IMPALA-2079: Part 1: report non-writable scratch dirs at startup
          Previously Impala could erroneously decide to use non-writable scratch
          directories, e.g. if /tmp/impala-scratch already exists and is not
          writable by the current user.

          With this change, if we cannot remove and recreate a fresh scratch directory,
          it is not used. If we have no valid scratch directories, we log an
          error and continue startup.

          Add unit test for CreateDirectory to test behavior for success and
          failure cases.

          Add system tests to check logging and query execution in various
          scenarios where we do not have scratch available.

          Modify FilesystemUtil to use non-exception-throwing Boost functions to
          avoid unhandled exceptions escaping into the rest of the Impala
          codebase, which does not expect the use of exceptions.

          Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3
          Reviewed-on: http://gerrit.cloudera.org:8080/565
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          tarmstrong Tim Armstrong added a comment - IMPALA-2079 : Part 1: report non-writable scratch dirs at startup Previously Impala could erroneously decide to use non-writable scratch directories, e.g. if /tmp/impala-scratch already exists and is not writable by the current user. With this change, if we cannot remove and recreate a fresh scratch directory, it is not used. If we have no valid scratch directories, we log an error and continue startup. Add unit test for CreateDirectory to test behavior for success and failure cases. Add system tests to check logging and query execution in various scenarios where we do not have scratch available. Modify FilesystemUtil to use non-exception-throwing Boost functions to avoid unhandled exceptions escaping into the rest of the Impala codebase, which does not expect the use of exceptions. Change-Id: Icaa8429051942424e1d811c54bde10102ac7f7b3 Reviewed-on: http://gerrit.cloudera.org:8080/565 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins
          Hide
          tarmstrong Tim Armstrong added a comment -

          IMPALA-2079: Part 2: handling tmp device errors

          Tmp devices are blacklisted when a write error is encountered for that
          device. No more scratch space will be allocated on the blacklisted
          device, based on the assumption that the device is likely to be
          misconfigured or failing.

          This patch does not attempt to recover the query that experienced the
          write error. It also does not attempt to remap any existing blocks away
          from the temporary device.

          This behaviour is unit tested for several failure scenarios.

          This patch adds additional test infrastructure required for testing
          BufferedBlockMgr behavior in the presence of faults and in
          configurations with multiple tmp directories.

          Adds metrics tmp-file-mgr.active-scratch-dirs and
          tmp-file-mgr.active-scratch-dirs.list that track the number and set of
          active scratch dirs and expose it in the Impala web UI.

          Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746
          Reviewed-on: http://gerrit.cloudera.org:8080/579
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          tarmstrong Tim Armstrong added a comment - IMPALA-2079 : Part 2: handling tmp device errors Tmp devices are blacklisted when a write error is encountered for that device. No more scratch space will be allocated on the blacklisted device, based on the assumption that the device is likely to be misconfigured or failing. This patch does not attempt to recover the query that experienced the write error. It also does not attempt to remap any existing blocks away from the temporary device. This behaviour is unit tested for several failure scenarios. This patch adds additional test infrastructure required for testing BufferedBlockMgr behavior in the presence of faults and in configurations with multiple tmp directories. Adds metrics tmp-file-mgr.active-scratch-dirs and tmp-file-mgr.active-scratch-dirs.list that track the number and set of active scratch dirs and expose it in the Impala web UI. Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746 Reviewed-on: http://gerrit.cloudera.org:8080/579 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins
          Hide
          tarmstrong Tim Armstrong added a comment -

          The last change prevents additional blocks being allocated on tmp devices that previously experienced errors. We could further attempt to save a failing query by redirecting writes to a new tmp device, so I'll downgrade the priority and keep the issue open.

          Show
          tarmstrong Tim Armstrong added a comment - The last change prevents additional blocks being allocated on tmp devices that previously experienced errors. We could further attempt to save a failing query by redirecting writes to a new tmp device, so I'll downgrade the priority and keep the issue open.
          Hide
          tarmstrong Tim Armstrong added a comment -

          IMPALA-2305 revealed that it's possible for CANCELLATION errors to be passed into WriteComplete without the other cancellation flags being true - need to handle this also.

          Show
          tarmstrong Tim Armstrong added a comment - IMPALA-2305 revealed that it's possible for CANCELLATION errors to be passed into WriteComplete without the other cancellation flags being true - need to handle this also.
          Hide
          tarmstrong Tim Armstrong added a comment -

          Blacklisting was disabled for - it's undesirable to permanently disable scratch directories, especially for errors like out of disk space. Temporary blacklisting will be investigated.

          Show
          tarmstrong Tim Armstrong added a comment - Blacklisting was disabled for - it's undesirable to permanently disable scratch directories, especially for errors like out of disk space. Temporary blacklisting will be investigated.
          Hide
          tarmstrong Tim Armstrong added a comment -

          The remaining aspects of this aren't going to make 2.3.0, but we should get the retry logic and some form of blacklisting into 2.4.0.

          Show
          tarmstrong Tim Armstrong added a comment - The remaining aspects of this aren't going to make 2.3.0, but we should get the retry logic and some form of blacklisting into 2.4.0.
          Hide
          tarmstrong Tim Armstrong added a comment -

          I think we will want to tackle this in the context of the new buffer pool.

          Show
          tarmstrong Tim Armstrong added a comment - I think we will want to tackle this in the context of the new buffer pool.
          Hide
          tarmstrong Tim Armstrong added a comment -

          IMPALA-3202,IMPALA-2079: rework scratch file I/O

          Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
          TmpFileMgr, in anticipation of it being shared with BufferPool.
          TmpFileMgr now handles:

          • Scratch space allocation and recycling
          • Read and write I/O

          The interface is also greatly changed so that it is built around Write()
          and Read() calls, abstracting away the details of temporary file
          allocation from clients. This means the TmpFileMgr::File class can
          be hidden from clients.

          Write error recovery:
          Also implement write error recovery in TmpFileMgr.

          If an error occurs while writing to scratch and we have multiple
          scratch directories, we will try one of the other directories
          before cancelling the query. File-level blacklisting is used to
          prevent excessive repeated attempts to resize a scratch file during
          a single query. Device-level blacklisting is not implemented because
          it is problematic to permanently take a scratch directory out of use.

          To reduce the number of error paths, all I/O errors are now handled
          asynchronously. Previously errors creating or extending the file were
          returned synchronously from WriteUnpinnedBlock(). This required
          modifying DiskIoMgr to create the file if not present when opened.

          Also set the default max_errors value in the thrift definition file,
          so that it is in effect for backend tests.

          Future Work:

          • Support for recycling variable-length scratch file ranges. I omitted
            this to avoid making the patch even large.

          Testing:
          Updated BufferedBlockMgr unit test to reflect changes in behaviour:

          • Scratch space is no longer permanently associated with a block, and
            is remapped every time a new block is written to disk .
          • Files are now blacklisted - updated existing tests and enable the
            disable blacklisting test.

          Added some basic testing of recycling of scratch file ranges in
          the TmpFileMgr unit test.

          I also manually tested the code in two ways. First by removing permissions
          for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
          Second, by creating a tiny ramdisk (16M) and running with two scratch
          directories: one on /tmp and one on the tiny ramdisk. When spilling, an
          out of space error is encountered for the tiny ramdisk and impala spills
          the remaining data (72M) to /tmp.

          Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
          Reviewed-on: http://gerrit.cloudera.org:8080/5141
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          tarmstrong Tim Armstrong added a comment - IMPALA-3202 , IMPALA-2079 : rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: Scratch space allocation and recycling Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is not implemented because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Also set the default max_errors value in the thrift definition file, so that it is in effect for backend tests. Future Work: Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even large. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk . Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Reviewed-on: http://gerrit.cloudera.org:8080/5141 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins
          Hide
          jbapple Jim Apple added a comment -

          This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

          2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

          https://github.com/apache/incubator-impala/commits/2.8.0

          Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

          Thank you!

          Show
          jbapple Jim Apple added a comment - This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09. 2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that: https://github.com/apache/incubator-impala/commits/2.8.0 Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0? Thank you!

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              alan@cloudera.com Alan Choi
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development