IMPALA-4611

Checking perms on S3 files is a very expensive no-op

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Catalog

      Description

      After IMPALA-4172 and IMPALA-3653 went in, we expected good gains in S3 metadata loading. However, as Mostafa Mokhtar found recently, we spend a lot of time sending requests to S3 for every file to check the ACL permissions associated with it.

      On S3, though, there are no Hadoop-style ACLs, only object-level ACLs tied to the AWS credentials used to access the files (objects), and we cannot yet check or set permissions through S3AFileSystem.

      This means that we waste a lot of time waiting on S3 only to get back the same standard response every time. We can skip these checks for S3 files, the same way we skip 'inherit permissions' when creating new S3 files.
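
The proposed short-circuit might look roughly like the sketch below. It is a minimal, self-contained illustration and not the actual patch: the class `S3PermissionSkipSketch`, the method `accessLevelFor`, the `AccessLevel` enum, and the list of S3 URI schemes are all hypothetical names chosen for this example, and returning read/write access for S3 is an assumption that follows from the "same standard response" reasoning above. The `remoteCheck` function stands in for whatever call contacts the file system.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the Impala code base. */
final class S3PermissionSkipSketch {
    /** Access levels a catalog might record for a directory (illustrative only). */
    enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

    // URI schemes treated as S3 in this sketch (an assumption).
    private static final List<String> S3_SCHEMES = Arrays.asList("s3", "s3a", "s3n");

    /** Returns true if the location's URI scheme is one of the S3 schemes above. */
    static boolean isS3Path(URI location) {
        String scheme = location.getScheme();
        return scheme != null && S3_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /**
     * Returns the access level for a location. For S3 the remote permission
     * check is skipped and READ_WRITE is assumed; for every other file system
     * the (possibly expensive) remoteCheck is invoked.
     */
    static AccessLevel accessLevelFor(URI location, Function<URI, AccessLevel> remoteCheck) {
        if (isS3Path(location)) {
            return AccessLevel.READ_WRITE; // no round trip to S3
        }
        return remoteCheck.apply(location);
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        // Stand-in for a real permission lookup; counts how often it is called.
        Function<URI, AccessLevel> remoteCheck = uri -> {
            remoteCalls.incrementAndGet();
            return AccessLevel.READ_ONLY;
        };

        System.out.println(accessLevelFor(URI.create("s3a://bucket/tbl/p=1"), remoteCheck));
        System.out.println(accessLevelFor(URI.create("hdfs://nn:8020/tbl/p=1"), remoteCheck));
        System.out.println("remote checks issued: " + remoteCalls.get());
    }
}
```

Passing the remote check in as a function keeps the sketch free of Hadoop dependencies and makes it easy to confirm that the S3 branch issues no remote request at all.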

      1. invalidate_cs_3.jfr
        1.90 MB
        Sailesh Mukil


          Activity

          jbapple Jim Apple added a comment -

          This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

          2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

          https://github.com/apache/incubator-impala/commits/2.8.0

          Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

          Thank you!

          sailesh Sailesh Mukil added a comment -

          https://github.com/apache/incubator-impala/commit/ffbdeda9469997dce6c6c3a80c78877486a3bdd9

          sailesh Sailesh Mukil added a comment -

          Also, thanks for clarifying that it was for every partition directory and not every file.

          sailesh Sailesh Mukil added a comment -

          bharath v, this is a TPC-DS dataset on a table with 2000 partitions. IMPALA-4172/IMPALA-3653 did help remove one hot path from metadata loading; however, the path mentioned above is another hot path that can be optimized for S3.

          bharathv bharath v added a comment -

          "we spend a lot of time sending requests to S3 for every file to check the ACL permissions associated with it."

          We seem to be calling getAvailableAccessLevel() for every partition directory path and not every file. How many partitions are we loading here?

          sailesh Sailesh Mukil added a comment -

          Attached the JFR that Mostafa Mokhtar attached to IMPALA-3482. It needs to be opened with Java Mission Control.


            People

            • Assignee: Unassigned
            • Reporter: Sailesh Mukil
