Hive / HIVE-467

Scratch data location should be on different filesystems for different types of intermediate data

Details

• Type: Bug
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Affects Version/s: None
• Fix Version/s: 0.4.0
• Component/s: Query Processor
• Labels: None
• Environment: S3/EC2

      Description

Currently Hive uses the same scratch directory/path for all sorts of temporary and intermediate data. This is problematic:

1. The temporary location for writing out DDL output should just be a temp file on the local file system. This removes the dependence of metadata and browsing operations on a functioning Hadoop cluster.
2. The temporary location for intermediate map-reduce data should be the default file system (typically the HDFS instance on the compute cluster).
3. The temporary location for data that needs to be 'moved' into tables should be on the same file system as the table's location (which may not be the same as the HDFS instance of the processing cluster).

i.e., local storage, map-reduce intermediate storage, and table storage should be distinguished. Without this distinction, using Hive in environments like S3/EC2 causes problems. In such an environment, I would like to be able to:

• do metadata operations without a provisioned Hadoop cluster (using data stored in S3 and a metastore on local disk)
• attach to a provisioned Hadoop cluster and run queries
• store data back into tables that are created over the S3 file system
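The three policies above can be sketched as one helper per kind of scratch data. This is an illustrative sketch, not Hive code; all class names, method names, and paths here are hypothetical.

```java
import java.net.URI;

// Illustrative sketch (not Hive code): one helper per kind of scratch
// data, following the three policies described in this issue.
public class ScratchLocation {

    // 1. DDL/browsing output: plain local filesystem, no cluster needed.
    public static String forDdlOutput() {
        return "file:" + System.getProperty("java.io.tmpdir") + "/hive-ddl";
    }

    // 2. Map-reduce intermediate data: the default (compute) filesystem.
    public static String forMapReduce(String defaultFsUri) {
        return defaultFsUri + "/tmp/hive-mr";
    }

    // 3. Data to be moved into a table: stage on the table's own
    //    filesystem so the final move is a same-filesystem rename.
    public static String forTableLoad(URI tableLocation) {
        return tableLocation.getScheme() + "://"
            + tableLocation.getAuthority() + "/tmp/hive-ext";
    }
}
```

The key property is that only case 2 touches the compute cluster's HDFS; cases 1 and 3 work even when the query never runs a map-reduce job.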
Attachments

1. hive-467.patch.2 (74 kB) - Joydeep Sen Sarma
2. hive-467.patch.1 (54 kB) - Joydeep Sen Sarma
3. hive-467.6.patch (77 kB) - Joydeep Sen Sarma
4. hive-467.5.patch (77 kB) - Joydeep Sen Sarma
5. hive-467.4.patch (77 kB) - Joydeep Sen Sarma
6. hive-467.3.patch (76 kB) - Joydeep Sen Sarma

        Activity

        Joydeep Sen Sarma added a comment -

First try. Need to add some tests (the only thing I can think of right now is making sure that explain and other statements work with an invalid HDFS file system setting). If minimr were available, better tests could be written.

I haven't been able to make select * work without HDFS (since the map operator/intermediate dir are optimized away later, after they have already been created). Need some ideas on this as well.

        Joydeep Sen Sarma added a comment -
• Takes care of the issues mentioned here - consolidates all temp file logic into one place, with different policies for local/MR/external storage:
  a) DDL/Explain operations only require local storage - tests for the same included.
  b) Insert operations write to tmp storage on the file system of the target, i.e. if the destination is an external table on S3, we will write to tmp storage on S3 and then move to the target. Unfortunately, I can't think of a way to test this, since we don't have two functioning file systems.
  c) Everything else gets MR intermediate storage.

Fixed a problem with external partitions, where insert overwrites into an external partition were not using the external location (but the default location). A test for this is included.

Review appreciated - I would like to publish a recipe for working against S3 storage and can't really do that without this.

        Joydeep Sen Sarma added a comment -

Retry - the previous attachment was bogus.

        Prasad Chakka added a comment -

Good catch about overwriting external partitions. But that might undo the fix where the partition dir can get created a little before the actual data is moved (in replaceFiles()), so downstream processes can start before the data is moved.

HiveMetaStore.java:~289 - remove System.out.println(tblPath.toString());
insertexternal1.q - the create and alter can be replaced with one statement: create external table texternal(key string, val string) partitioned by (insertdate string) location 'file:///tmp/texternal/2008-01-01';
Hive.java - replaceFiles(): look at the fix for HIVE-488.

I am going to let others review the scratch dir related stuff.

        Joydeep Sen Sarma added a comment -
• Contains one more fix in ExecDriver, where addInputPaths was not using the right filesystem.
• Removed the System.out.println.
• I don't want to change insertexternal - I wanted the test to specifically cover the case where the partition location is specified on an external fs (and the table's isn't - which means it has a default location). The suggested test would have passed even with the old code.
• Fixed Hive.java to not create the partition first.

I didn't read carefully through HIVE-488 and the other jiras. It seems that Hive.java's behavior is exactly as it is in trunk now, except that it handles the external partition case correctly. If there are other issues with the current code path, we can fix them through a separate jira?

        Prasad Chakka added a comment -

> with the current code path - we can fix them through a separate jira?

I just wanted the partition not to be created until the data has been moved.

The load part and the metastore part look good to me.

        Joydeep Sen Sarma added a comment -

> i just wanted partition to be not created until data has been moved.

Cool - that's taken care of (although I'm not sure we can test it without some kind of fault injection code).
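The move-then-register ordering agreed on here can be sketched as follows. This is hypothetical illustration code, not the actual Hive.java/replaceFiles() implementation; the class and field names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stage data in a tmp dir on the target filesystem,
// move it into the final partition location, and only then make the
// partition visible. If the move fails, no partition metadata exists, so
// downstream jobs cannot pick up an empty partition.
public class PartitionLoader {
    final List<String> registeredPartitions = new ArrayList<>();

    void load(Path tmpData, Path partitionDir) throws IOException {
        Files.createDirectories(partitionDir.getParent());
        // 1. Move the staged data into its final location (an atomic
        //    rename when source and target are on the same filesystem).
        Files.move(tmpData, partitionDir, StandardCopyOption.ATOMIC_MOVE);
        // 2. Only now register the partition (stands in for the
        //    metastore call), so readers never see it empty.
        registeredPartitions.add(partitionDir.toString());
    }
}
```

The design choice is simply ordering: data first, metadata second, so a crash between the two steps leaves an orphaned directory rather than a visible-but-empty partition.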

        Joydeep Sen Sarma added a comment -

Would be great if someone could review the rest of the patch. Some javadocs were missing on the new public APIs - attaching another one.

The changes are relatively trivial - they just consolidate all the tmp file logic in one place and provide four public API calls from Context.java:

public boolean isMRTmpFileURI(String uriStr)
public String getMRTmpFileURI()
public String getLocalTmpFileURI()
public String getExternalTmpFileURI(URI extURI)

The semantics are obvious. Context.java is also simplified to no longer be a reusable object. There are a small number of indent-only changes in BaseSemanticAnalyzer.java, and quite a bit of code reduction across the parts of the compiler that previously had their own tmp file allocation logic.
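The four calls could behave along these lines. This is a sketch under assumed naming conventions (session-scoped scratch dirs, "-mr-"/"-local-"/"-ext-" suffixes), not the actual Context.java from the patch.

```java
import java.net.URI;

// Hypothetical illustration of the four Context calls described above;
// all directory layouts and suffixes here are assumptions.
public class Context {
    private final String mrScratchDir;     // on the default (MR) filesystem
    private final String localScratchDir;  // on the local filesystem

    public Context(String defaultFsUri, String sessionId) {
        this.mrScratchDir = defaultFsUri + "/tmp/hive/" + sessionId;
        this.localScratchDir = "file:" + System.getProperty("java.io.tmpdir")
            + "/hive/" + sessionId;
    }

    // True iff the URI points into this session's MR scratch space.
    public boolean isMRTmpFileURI(String uriStr) {
        return uriStr.startsWith(mrScratchDir);
    }

    public String getMRTmpFileURI() {
        return mrScratchDir + "/-mr-10000";
    }

    public String getLocalTmpFileURI() {
        return localScratchDir + "/-local-10000";
    }

    // Tmp space on the same filesystem as the external target, so the
    // final move into the table stays within one filesystem.
    public String getExternalTmpFileURI(URI extURI) {
        return extURI.getScheme() + "://" + extURI.getAuthority()
            + "/tmp/hive-ext/-ext-10000";
    }
}
```

Callers then never build tmp paths themselves: the compiler asks for local, MR, or external scratch space and the Context picks the filesystem.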

        Raghotham Murthy added a comment -

The diff looks good. However, there are several spacing issues - there should be a space after 'if' and 'for'. Once you fix those, I can commit after running the tests.

        Joydeep Sen Sarma added a comment -

Fixed the spacing issues and resubmitted.

        Raghotham Murthy added a comment -

Looks good. Will commit once tests pass.

        Joydeep Sen Sarma added a comment -

The Trash constructor I was using is only available from Hadoop 0.19.2 onwards. Fixed, and made sure the patch compiles with Hadoop 0.17, 0.18, and 0.19.

        Raghotham Murthy added a comment -

        Committed. Thanks Joydeep!


People

• Assignee: Joydeep Sen Sarma
• Reporter: Joydeep Sen Sarma
• Votes: 0
• Watchers: 3
