  1. Hive
  2. HIVE-18350 load data should rename files consistent with insert statements
  3. HIVE-18391

load data should rename files consistent with insert statements (bucketed tables only)


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Insert statements create files whose names end with 0000_0, 0001_0, etc., whereas load data keeps the input file name. The result is an inconsistent naming convention, which makes SMB joins difficult in some scenarios and may cause trouble for other types of queries in the future.
      We need a consistent naming convention.

      For a bucketed table in non-strict mode, Hive relies on the user to name the files so that they match the buckets, and it assumes that all data in a file belongs to the same bucket. In strict mode, loading into a bucketed table is disabled.
      This change will likely affect most of the tests that load data, so its impact on the test suite is significant.
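
      The renaming described above can be illustrated with a small sketch. It is not the logic of the attached patch, which is not shown here. The helper name insert_style_names, the sorted-order assignment of files to buckets, and the six-digit zero padding are all assumptions made for illustration; the only detail taken from the description is that insert-created names end in 0000_0, 0001_0, and so on.

```python
from typing import Dict, List


def insert_style_names(input_files: List[str], width: int = 6) -> Dict[str, str]:
    """Map each input file name to an insert-style name such as 000000_0.

    Hypothetical helper, not Hive code: files are taken in sorted order and
    the i-th file is assumed to hold the data for bucket i.
    """
    mapping: Dict[str, str] = {}
    for bucket_id, name in enumerate(sorted(input_files)):
        # Zero-padded bucket id followed by "_0", matching the suffix
        # pattern quoted in the description. The padding width is an assumption.
        mapping[name] = f"{bucket_id:0{width}d}_0"
    return mapping


renames = insert_style_names(["part-b.txt", "part-a.txt"])
for source, target in renames.items():
    print(f"{source} -> {target}")
```

      With names normalized this way, a file produced by load data and a file produced by an insert statement for the same bucket would follow the same pattern, which is the consistency the issue asks for.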

      Attachments

        1. HIVE-18391.1.patch
          1.07 MB
          Deepak Jaiswal
        2. HIVE-18391.2.patch
          1.07 MB
          Deepak Jaiswal
        3. HIVE-18391.3.patch
          1.07 MB
          Deepak Jaiswal


            People

              Assignee: Deepak Jaiswal (djaiswal)
              Reporter: Deepak Jaiswal (djaiswal)
              Votes: 0
              Watchers: 2
