  1. Hive
  2. HIVE-467

Scratch data location should be on different filesystems for different types of intermediate data


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels: None
    • Environment: S3/EC2

    Description

      Currently Hive uses the same scratch directory/path for all kinds of temporary and intermediate data. This is problematic because each kind belongs on a different file system:

      1. The temporary location for DDL output should just be a temp file on the local file system. This removes the dependence of metadata and browsing operations on a functioning Hadoop cluster.
      2. The temporary location for intermediate map-reduce data should be on the default file system (typically the HDFS instance on the compute cluster).
      3. The temporary location for data that needs to be 'moved' into tables should be on the same file system as the table's location (which may not be the HDFS instance of the processing cluster).

      That is, local storage, map-reduce intermediate storage, and table storage should be distinguished. Without this distinction, using Hive in environments like S3/EC2 causes problems. In such an environment, I would like to be able to:

      • do metadata operations without a provisioned Hadoop cluster (using data stored in S3 and a metastore on local disk)
      • attach to a provisioned Hadoop cluster and run queries
      • store data back in tables that are created over the S3 file system
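
      The three-way split requested above can be illustrated with a minimal sketch. This is not Hive's implementation or the attached patch: `ScratchKind`, `scratch_root`, and the `/tmp/hive-scratch` path are hypothetical names chosen for illustration, and the URIs in the usage note are made-up examples. The sketch only shows the idea of choosing a scratch file system by the kind of data being written.

```python
import tempfile
from enum import Enum
from urllib.parse import urlparse


class ScratchKind(Enum):
    """Hypothetical categories of temporary data (illustrative names only)."""
    LOCAL = "local"          # DDL output, metadata and browsing operations
    MAPREDUCE = "mapreduce"  # intermediate map-reduce data
    TABLE = "table"          # data staged before being moved into a table


def scratch_root(kind, default_fs, table_location=None,
                 scratch_path="/tmp/hive-scratch"):
    """Return a scratch directory URI on the file system suited to `kind`.

    default_fs     -- URI of the cluster's default file system (assumed input)
    table_location -- URI of the destination table, required for TABLE
    scratch_path   -- hypothetical scratch path used on non-local file systems
    """
    if kind is ScratchKind.LOCAL:
        # Local temp directory: no Hadoop cluster is needed to write here.
        return "file://" + tempfile.gettempdir()

    if kind is ScratchKind.MAPREDUCE:
        # Intermediate job data stays on the default file system.
        return default_fs.rstrip("/") + scratch_path

    if kind is ScratchKind.TABLE:
        if table_location is None:
            raise ValueError("table_location is required for TABLE scratch data")
        parsed = urlparse(table_location)
        if not parsed.scheme:
            # No scheme given: treat the table as living on the default file system.
            return default_fs.rstrip("/") + scratch_path
        # Stage on the table's own file system so the final move stays within it.
        return f"{parsed.scheme}://{parsed.netloc}{scratch_path}"

    raise ValueError(f"unknown scratch kind: {kind!r}")
```

      With a made-up default file system of `hdfs://nn:8020` and a table at `s3n://bucket/warehouse/t1`, the `MAPREDUCE` kind resolves to a path on `hdfs://nn:8020`, while the `TABLE` kind resolves to a path under `s3n://bucket`, so the final move does not cross file systems.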

      Attachments

        1. hive-467.patch.1
          54 kB
          Joydeep Sen Sarma
        2. hive-467.patch.2
          74 kB
          Joydeep Sen Sarma
        3. hive-467.3.patch
          76 kB
          Joydeep Sen Sarma
        4. hive-467.4.patch
          77 kB
          Joydeep Sen Sarma
        5. hive-467.5.patch
          77 kB
          Joydeep Sen Sarma
        6. hive-467.6.patch
          77 kB
          Joydeep Sen Sarma


          People

            Assignee: Joydeep Sen Sarma (jsensarma)
            Reporter: Joydeep Sen Sarma (jsensarma)
            Votes: 0
            Watchers: 3
