
Add query option to limit scratch space usage

    Details

      Description

      The immediate motivation for this is to enable better testing of graceful failures when spilling is disabled (e.g. IMPALA-3670). Currently spilling can only be controlled via startup options, so these tests have to be implemented as custom cluster tests, even though nothing in principle requires starting a fresh cluster.

      The idea would be to add a query option 'scratch_limit' or similar, that limits the amount of scratch directory space that can be used. This would be useful to prevent runaway queries or to prevent queries from spilling when that is not desired.
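
To make the proposed behaviour concrete, here is a minimal sketch of per-query scratch accounting with a limit. It is only an illustration of the idea, not Impala's implementation: the names `ScratchBudget`, `ScratchLimitExceeded`, `limit_bytes` and `allocate` are hypothetical, and using `None` to mean "no limit" is an assumption made for the sketch.

```python
class ScratchLimitExceeded(Exception):
    """Raised when a query would exceed its scratch space budget."""


class ScratchBudget:
    """Hypothetical per-query scratch space accounting (not Impala code)."""

    def __init__(self, limit_bytes=None):
        # Assumption: None models "no limit"; 0 means the query may not spill.
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def allocate(self, num_bytes):
        """Reserve scratch space, or fail if the limit would be exceeded."""
        if num_bytes < 0:
            raise ValueError("num_bytes must be non-negative")
        new_total = self.used_bytes + num_bytes
        if self.limit_bytes is not None and new_total > self.limit_bytes:
            raise ScratchLimitExceeded(
                "scratch limit of %d bytes exceeded (requested total: %d)"
                % (self.limit_bytes, new_total)
            )
        # Only commit the reservation once it is known to fit.
        self.used_bytes = new_total
        return self.used_bytes
```

With a limit of 0, any attempt to reserve scratch space fails immediately, which is the "spilling disabled" case that the tests need to exercise without restarting the cluster.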

        Activity

        tarmstrong Tim Armstrong added a comment -

        IMPALA-3671: Add query option to limit scratch space usage

        Currently we can only disable spilling via a startup option, which means
        we need to restart the cluster to do so.
        This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of
        scratch directory space that can be used. This would be useful to prevent
        runaway queries or to prevent queries from spilling when that is not desired.
        This also adds a 'ScratchSpace' counter to the runtime profile of the
        BlockMgr that keeps track of the scratch space allocated.

        Valid values for the SCRATCH_LIMIT query option are:

        • unspecified or a limit of -1 means no limit
        • a limit of 0 (zero) means spilling is disabled
        • an int (= number of bytes)
        • a float followed by "M" (MB) or "G" (GB)

        Testing:
        A new test file "test_scratch_limit.py" was added for testing functionality.

        Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470
        Reviewed-on: http://gerrit.cloudera.org:8080/4497
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Tested-by: Internal Jenkins
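
The valid SCRATCH_LIMIT values listed in the commit message can be read as the following parsing rules. This is a hedged sketch rather than the parser used in the patch: `parse_scratch_limit` is a hypothetical helper, returning `None` for "no limit" is a choice made here, and treating "M" and "G" as binary multiples (1024-based) and accepting lowercase suffixes are assumptions.

```python
import re

# Assumption: "M" and "G" are binary multiples of bytes.
_MULTIPLIERS = {"M": 1024 ** 2, "G": 1024 ** 3}

_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)([MG]?)")


def parse_scratch_limit(value=None):
    """Return the limit in bytes, or None when there is no limit.

    Hypothetical helper mirroring the rules in the commit message:
      - unspecified or -1 -> no limit (None)
      - 0                 -> spilling disabled (0 bytes)
      - an int            -> number of bytes
      - a float + M or G  -> megabytes or gigabytes
    """
    if value is None:
        return None
    text = str(value).strip().upper()  # assumption: suffix is case-insensitive
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("invalid scratch limit: %r" % (value,))
    number, unit = match.groups()
    if unit == "":
        if "." in number:
            raise ValueError("a limit without a unit must be an integer")
        limit = int(number)
        if limit == -1:
            return None
    else:
        limit = int(float(number) * _MULTIPLIERS[unit])
    if limit < 0:
        raise ValueError("scratch limit must be -1, 0, or positive")
    return limit
```

For example, `parse_scratch_limit("0")` yields a zero-byte budget (no spilling), while `parse_scratch_limit("-1")` and an unspecified value both mean that no limit applies.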

        mmokhtar Mostafa Mokhtar added a comment -

        John Russell - Docs work required.


          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            tarmstrong Tim Armstrong