Hive / HIVE-2889

LOAD DATA IF NOT EXISTS functionality


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: Import/Export
    • Labels: None

    Description

      Background:
      The behavior of LOAD DATA LOCAL INPATH has changed. It used to return an error when you tried to load a log file with the same name as a file already in the table/partition. Now it renames the incoming file with a _copy_N suffix (e.g. test_b_copy_1.bz2), so the file always goes into HDFS.

      Original discussion:
      http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCB8D2849.14F69%25sean.mcnamara%40webtrends.com%3E

      Issue:
      There is no longer an atomic way to load files into Hive while guaranteeing that the same file is not loaded twice. Using OVERWRITE is not an alternative, because it deletes the other logs already in the table/partition.

      Example:
      /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
      /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
      /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"
      /usr/local/hive/bin/hive -e "LOAD DATA LOCAL INPATH 'test_b.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')"

      Result:
      test_a.bz2
      test_b.bz2
      test_b_copy_1.bz2
      test_b_copy_2.bz2

      The test_b data was loaded three times, which is not the desired behavior in this case.
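      To make the renaming concrete, the following bash sketch models the behavior shown in the Result listing, using a local directory as a stand-in for the table/partition. It is a hypothetical illustration, not Hive's code: the function name copy_into and the rule of splitting the file name at its last dot are assumptions chosen to reproduce the names above.

```bash
#!/usr/bin/env bash
# Hypothetical model of the observed LOAD DATA renaming (not Hive's code).
# A local directory stands in for the table/partition location.

# copy_into DIR FILE: copy FILE into DIR; if DIR already holds a file with the
# same name, use <base>_copy_<n><ext> with the smallest free n >= 1.
# Splitting the name at its last dot is an assumption.
copy_into() {
  local dir=$1 src=$2
  local name=${src##*/} base ext candidate n=1
  if [[ $name == *.* ]]; then
    base=${name%.*}
    ext=.${name##*.}
  else
    base=$name
    ext=
  fi
  candidate=$name
  while [[ -e $dir/$candidate ]]; do
    candidate=${base}_copy_${n}${ext}
    n=$((n + 1))
  done
  cp -- "$src" "$dir/$candidate"
  printf '%s\n' "$candidate"   # report the name the file landed under
}

# Replay the example: test_a once, test_b three times.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/partition"
printf 'a\n' > "$work/test_a.bz2"
printf 'b\n' > "$work/test_b.bz2"
copy_into "$work/partition" "$work/test_a.bz2"
for i in 1 2 3; do
  copy_into "$work/partition" "$work/test_b.bz2"
done
# prints test_a.bz2, test_b.bz2, test_b_copy_1.bz2, test_b_copy_2.bz2
```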

      Proposal:
      Add an IF NOT EXISTS flag to select the old copy semantics. If the log file does not already exist in the table/partition, it is loaded normally. If it does exist, Hive returns an error and a non-zero exit code.

      Proposed HiveQL Example:
      LOAD DATA LOCAL IF NOT EXISTS INPATH 'test_a.bz2' INTO TABLE logs PARTITION(ds='2012-03-19', hr='23')
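      The proposed semantics can likewise be sketched in bash against a local directory. This is a hypothetical illustration, not a Hive implementation: load_if_not_exists is an invented name, and bash's noclobber option (which makes '>' create a missing target with O_EXCL) stands in for whatever atomic check Hive would perform on HDFS. Because the existence check and the file creation happen in one step on a local filesystem, two concurrent loads of the same file cannot both succeed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the proposed LOAD DATA ... IF NOT EXISTS semantics
# (not Hive's code). A local directory stands in for the table/partition.

# load_if_not_exists DIR FILE: copy FILE into DIR under its own name.
# Returns 0 on success, 2 if FILE cannot be read, and 1 if the copy is
# refused (normally because a file with that name is already in DIR).
load_if_not_exists() {
  local dir=$1 src=$2
  local dest=$dir/${src##*/}
  if [[ ! -r $src ]]; then
    printf 'error: cannot read %s\n' "$src" >&2
    return 2
  fi
  # With noclobber, bash refuses to redirect onto an existing regular file and
  # opens a new one with O_EXCL, so checking and creating are a single step.
  if ! ( set -o noclobber; cat -- "$src" > "$dest" ) 2>/dev/null; then
    printf 'error: %s already exists\n' "$dest" >&2
    return 1
  fi
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/partition"
printf 'b\n' > "$work/test_b.bz2"
load_if_not_exists "$work/partition" "$work/test_b.bz2" && echo "first load: ok"
# The second call prints an error to stderr and returns 1.
load_if_not_exists "$work/partition" "$work/test_b.bz2" || echo "second load refused, status $?"
```

      A check like this only detects duplicates by file name; the same log loaded under a different name would still go in.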

          People

            Assignee: Unassigned
            Reporter: Sean McNamara (seanmcn)
            Votes: 0
            Watchers: 0
