Hive: HIVE-18350

LOAD DATA should rename files consistently with INSERT statements


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Incompatible change

    Description

      INSERT statements create files whose names end in 0000_0, 0001_0, etc. However, LOAD DATA keeps the input file names. The resulting inconsistent naming convention makes SMB (sort-merge bucket) joins difficult in some scenarios and may cause trouble for other types of queries in the future.
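      The difference between the two conventions can be illustrated with a small Python sketch. It assumes INSERT-style names are a zero-padded bucket (task) number followed by `_0`, padded to six digits here; the helpers `hive_style_name` and `bucket_from_name` are hypothetical and not part of Hive. The point is that a bucket number can be recovered from an INSERT-style name but not from an arbitrary user file name kept by LOAD DATA.

```python
import re
from typing import Optional

# Assumed pattern (illustration only): zero-padded bucket/task number,
# an underscore, then an attempt number, e.g. "000000_0".
HIVE_STYLE_NAME = re.compile(r"^(\d+)_\d+$")


def hive_style_name(bucket: int, width: int = 6) -> str:
    """Build an INSERT-style file name such as 000001_0 for a bucket."""
    return f"{bucket:0{width}d}_0"


def bucket_from_name(file_name: str) -> Optional[int]:
    """Return the bucket number encoded in an INSERT-style name, or None
    when the name (e.g. a user-supplied 'sales.csv') carries no bucket."""
    match = HIVE_STYLE_NAME.match(file_name)
    return int(match.group(1)) if match else None


print([hive_style_name(i) for i in range(3)])  # ['000000_0', '000001_0', '000002_0']
print(bucket_from_name("000002_0"), bucket_from_name("sales.csv"))  # 2 None
```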

      We need a consistent naming convention.

      For non-bucketed tables, Hive renames all the files regardless of how the user named them.
      For bucketed tables in non-strict mode, Hive relies on the user to name the files so that they match their buckets, and it assumes that all the data in a given file belongs to the same bucket. In strict mode, loading into bucketed tables is disabled.
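      The rules above can be sketched as a function that plans what each loaded file ends up being called. This is a minimal sketch, not Hive's Java implementation: the helper `plan_load_renames` and the requirement that a bucketed file name start with `<bucket number>_` are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule: a bucketed file name must start with a bucket number
# followed by an underscore, e.g. "000001_0".
BUCKET_PREFIX = re.compile(r"(\d+)_")


def plan_load_renames(input_files: List[str],
                      num_buckets: Optional[int] = None,
                      strict: bool = False) -> Dict[str, str]:
    """Map each loaded file to its name in the table directory.

    num_buckets=None means the target table is not bucketed.
    """
    if num_buckets is None:
        # Non-bucketed: rename every file, ignoring the user's names.
        # Sorting only makes this sketch deterministic.
        return {src: f"{i:06d}_0" for i, src in enumerate(sorted(input_files))}

    if strict:
        raise ValueError("loading into a bucketed table is disabled in strict mode")

    plan = {}
    for src in input_files:
        # Bucketed, non-strict: rely on the user's name to identify the bucket
        # and assume every row in the file belongs to that bucket.
        match = BUCKET_PREFIX.match(src)
        if match is None or int(match.group(1)) >= num_buckets:
            raise ValueError(f"{src!r} does not name a bucket in 0..{num_buckets - 1}")
        plan[src] = src  # keep the user's bucket-matching name
    return plan


print(plan_load_renames(["b.csv", "a.csv"]))
print(plan_load_renames(["000001_0", "000003_0"], num_buckets=4))
```

      Raising an error for a name outside `0..num_buckets-1` is a choice made for this sketch; the description only says that Hive trusts the user's naming in non-strict mode.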

      This change will likely affect most of the tests that load data. Because the impact is significant, the work is divided into two subtasks for a smoother merge.

      For existing tables in customer databases, reloading bucketed tables is recommended. Otherwise, if a customer runs an SMB join and some bucket has no corresponding split, the join may return incorrect results. This is not a regression, since it would happen even without the patch.
      With this patch applied and the data reloaded, however, the results should be correct.
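      One way to spot the risk described above is to check which buckets have no file at all before relying on an SMB join. The sketch assumes bucket files follow the INSERT-style `<bucket>_<n>` naming; the function `missing_buckets` is hypothetical, and files whose names carry no bucket number (such as data loaded under the old behavior) are simply not counted.

```python
import re
from typing import Iterable, List

# Assumed INSERT-style naming: a bucket number, an underscore, then more text.
BUCKET_PREFIX = re.compile(r"(\d+)_")


def missing_buckets(file_names: Iterable[str], num_buckets: int) -> List[int]:
    """Return bucket numbers in 0..num_buckets-1 that have no matching file."""
    present = set()
    for name in file_names:
        match = BUCKET_PREFIX.match(name)
        if match:
            present.add(int(match.group(1)))
    return [b for b in range(num_buckets) if b not in present]


# In this sketch, buckets 1 and 3 have no file and therefore no split.
print(missing_buckets(["000000_0", "000002_0"], 4))  # [1, 3]
```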

      For non-bucketed tables and external tables, there is no difference in behavior and reloading data is not needed.

      Attachments

        1. HIVE-18350.16.patch (824 kB, Deepak Jaiswal)
        2. HIVE-18350.15.patch (823 kB, Deepak Jaiswal)
        3. HIVE-18350.14.patch (823 kB, Deepak Jaiswal)
        4. HIVE-18350.13.patch (322 kB, Deepak Jaiswal)
        5. HIVE-18350.12.patch (321 kB, Deepak Jaiswal)
        6. HIVE-18350.11.patch (321 kB, Deepak Jaiswal)
        7. HIVE-18350.10.patch (315 kB, Deepak Jaiswal)
        8. HIVE-18350.9.patch (354 kB, Deepak Jaiswal)
        9. HIVE-18350.8.patch (346 kB, Deepak Jaiswal)
        10. HIVE-18350.7.patch (347 kB, Deepak Jaiswal)
        11. HIVE-18350.6.patch (25 kB, Deepak Jaiswal)
        12. HIVE-18350.5.patch (31 kB, Deepak Jaiswal)
        13. HIVE-18350.4.patch (28 kB, Deepak Jaiswal)
        14. HIVE-18350.3.patch (28 kB, Deepak Jaiswal)
        15. HIVE-18350.2.patch (28 kB, Deepak Jaiswal)
        16. HIVE-18350.1.patch (1.47 MB, Deepak Jaiswal)

        Issue Links

          There are no Sub-Tasks for this issue.


            People

              Assignee: Deepak Jaiswal (djaiswal)
              Reporter: Deepak Jaiswal (djaiswal)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: