Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4118

Implement disk spill encryption and integrity checking for new buffer pool

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      Ensure that the --disk_spill_encryption option is supported for the new buffer pool in the same way as it currently is for BufferedBlockMgr. This is an important security feature.

        Issue Links

          Activity

          Hide
          tarmstrong Tim Armstrong added a comment -

          Extracted out the common logic:

          Change subject: IMPALA-4118: extract encryption utils from BufferedBlockMgr
          ......................................................................

          IMPALA-4118: extract encryption utils from BufferedBlockMgr

          As groundwork for IMPALA-4118, extract encryption and integrity
          verification routines into a separate module that can be used
          by the new BufferPool.

          Simplify the logic in BufferedBlockMgr by not distinguishing between the
          integrity check and encryption options, which are controlled by the same
          command line flag anyway.

          This patch changes how the OpenSSL RNG is seeded. I consulted with the
          original author of the code (Michael Yoder), who suggested that the new
          approach would be appropriate: "I'd pick whatever implementation is
          easiest for you. You could add entropy once per query, or once every X
          keys (100 keys seems reasonable), or once every "convenient place". 4kb
          is probably overkill, you could use 512b for example."

          Testing:
          Added some unit tests for the utilities. We already have coverage of the
          BufferedBlockMgr with encryption in buffered-block-mgr-test. Also reduce
          the number of iterations in the buffered-block-mgr-test to save some
          testing time.

          Change-Id: Ibebe431f69c3c569cbff68171beaa32ef2ab03b6
          Reviewed-on: http://gerrit.cloudera.org:8080/4389
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          tarmstrong Tim Armstrong added a comment - Extracted out the common logic: Change subject: IMPALA-4118 : extract encryption utils from BufferedBlockMgr ...................................................................... IMPALA-4118 : extract encryption utils from BufferedBlockMgr As groundwork for IMPALA-4118 , extract encryption and integrity verification routines into a separate module that can be used by the new BufferPool. Simplify the logic in BufferedBlockMgr by not distinguishing between the integrity check and encryption options, which are controlled by the same command line flag anyway. This patch changes how the OpenSSL RNG is seeded. I consulted with the original author of the code (Michael Yoder), who suggested that the new approach would be appropriate: "I'd pick whatever implementation is easiest for you. You could add entropy once per query, or once every X keys (100 keys seems reasonable), or once every "convenient place". 4kb is probably overkill, you could use 512b for example." Testing: Added some unit tests for the utilities. We already have coverage of the BufferedBlockMgr with encryption in buffered-block-mgr-test. Also reduce the number of iterations in the buffered-block-mgr-test to save some testing time. Change-Id: Ibebe431f69c3c569cbff68171beaa32ef2ab03b6 Reviewed-on: http://gerrit.cloudera.org:8080/4389 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins —
          Hide
          tarmstrong Tim Armstrong added a comment -

          This is no longer a distinct task from IMPALA-3202, since the below commit pushed the handling down into TmpFileMgr, which will be used by BufferPool.

          commit 95ed4434f2f446e214934f7dc251b843c1d6b0a6
          Author: Tim Armstrong <tarmstrong@cloudera.com>
          Date: Fri Sep 4 10:54:11 2015 -0700

          IMPALA-3202,IMPALA-2079: rework scratch file I/O

          Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
          TmpFileMgr, in anticipation of it being shared with BufferPool.
          TmpFileMgr now handles:

          • Scratch space allocation and recycling
          • Read and write I/O

          The interface is also greatly changed so that it is built around Write()
          and Read() calls, abstracting away the details of temporary file
          allocation from clients. This means the TmpFileMgr::File class can
          be hidden from clients.

          Write error recovery:
          Also implement write error recovery in TmpFileMgr.

          If an error occurs while writing to scratch and we have multiple
          scratch directories, we will try one of the other directories
          before cancelling the query. File-level blacklisting is used to
          prevent excessive repeated attempts to resize a scratch file during
          a single query. Device-level blacklisting is not implemented because
          it is problematic to permanently take a scratch directory out of use.

          To reduce the number of error paths, all I/O errors are now handled
          asynchronously. Previously errors creating or extending the file were
          returned synchronously from WriteUnpinnedBlock(). This required
          modifying DiskIoMgr to create the file if not present when opened.

          Also set the default max_errors value in the thrift definition file,
          so that it is in effect for backend tests.

          Future Work:

          • Support for recycling variable-length scratch file ranges. I omitted
            this to avoid making the patch even large.

          Testing:
          Updated BufferedBlockMgr unit test to reflect changes in behaviour:

          • Scratch space is no longer permanently associated with a block, and
            is remapped every time a new block is written to disk .
          • Files are now blacklisted - updated existing tests and enable the
            disable blacklisting test.

          Added some basic testing of recycling of scratch file ranges in
          the TmpFileMgr unit test.

          I also manually tested the code in two ways. First by removing permissions
          for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
          Second, by creating a tiny ramdisk (16M) and running with two scratch
          directories: one on /tmp and one on the tiny ramdisk. When spilling, an
          out of space error is encountered for the tiny ramdisk and impala spills
          the remaining data (72M) to /tmp.

          Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
          Reviewed-on: http://gerrit.cloudera.org:8080/5141
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          tarmstrong Tim Armstrong added a comment - This is no longer a distinct task from IMPALA-3202 , since the below commit pushed the handling down into TmpFileMgr, which will be used by BufferPool. commit 95ed4434f2f446e214934f7dc251b843c1d6b0a6 Author: Tim Armstrong <tarmstrong@cloudera.com> Date: Fri Sep 4 10:54:11 2015 -0700 IMPALA-3202 , IMPALA-2079 : rework scratch file I/O Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into TmpFileMgr, in anticipation of it being shared with BufferPool. TmpFileMgr now handles: Scratch space allocation and recycling Read and write I/O The interface is also greatly changed so that it is built around Write() and Read() calls, abstracting away the details of temporary file allocation from clients. This means the TmpFileMgr::File class can be hidden from clients. Write error recovery: Also implement write error recovery in TmpFileMgr. If an error occurs while writing to scratch and we have multiple scratch directories, we will try one of the other directories before cancelling the query. File-level blacklisting is used to prevent excessive repeated attempts to resize a scratch file during a single query. Device-level blacklisting is not implemented because it is problematic to permanently take a scratch directory out of use. To reduce the number of error paths, all I/O errors are now handled asynchronously. Previously errors creating or extending the file were returned synchronously from WriteUnpinnedBlock(). This required modifying DiskIoMgr to create the file if not present when opened. Also set the default max_errors value in the thrift definition file, so that it is in effect for backend tests. Future Work: Support for recycling variable-length scratch file ranges. I omitted this to avoid making the patch even large. Testing: Updated BufferedBlockMgr unit test to reflect changes in behaviour: Scratch space is no longer permanently associated with a block, and is remapped every time a new block is written to disk . Files are now blacklisted - updated existing tests and enable the disable blacklisting test. Added some basic testing of recycling of scratch file ranges in the TmpFileMgr unit test. I also manually tested the code in two ways. First by removing permissions for /tmp/impala-scratch and ensuring that a spilling query fails cleanly. Second, by creating a tiny ramdisk (16M) and running with two scratch directories: one on /tmp and one on the tiny ramdisk. When spilling, an out of space error is encountered for the tiny ramdisk and impala spills the remaining data (72M) to /tmp. Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Reviewed-on: http://gerrit.cloudera.org:8080/5141 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development