Hive / HIVE-19830

Inconsistent behavior when multiple partitions point to the same location

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 2.4.0
    • None
    • Hive
    • None

    Description

      // Create a table with two partitions that share the same location, then insert a single row into one of them.
      create table test (i int) partitioned by (j int) stored as parquet;
      alter table test add partition (j=1) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
      alter table test add partition (j=2) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
      insert into table test partition (j=1) values (1);

      // select * shows this single row in both partitions, as expected.
      select * from test;
      1 1
      1 2

      // However, sum() does not count the row once per partition: given the two rows above, the expected result is 2 3, but Hive returns 1 2. This is Issue #1.
      select sum(i), sum(j) from test;
      1 2

      // On the file system the two partitions share a single directory, as expected.
      hdfs dfs -ls hdfs://localhost:20500/test-warehouse/test/
      Found 1 items
      drwxr-xr-x - gaborkaszab supergroup 0 2018-06-08 10:54 hdfs://localhost:20500/test-warehouse/test/j=1

      // Let's drop one of the partitions now!
      alter table test drop partition (j=2);
      // Running the same hdfs dfs -ls command shows that the j=1 directory has been deleted. I think this is acceptable behavior; we just have to document that it is expected.
      // select * from test; returns zero rows, which is still as expected.

      // Even though the directory has been deleted, the j=1 partition is still listed by show partitions. This is Issue #2.
      show partitions test;
      j=1

      After Hive drops the directory, Impala reloads its partitions by asking Hive which partitions exist. Hive apparently returns a list that still includes the j=1 partition, so Impala treats it as existing and does not remove it from the Catalog's cache. Hive should not return that partition. This is Issue #3.
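
The state behind Issues #2 and #3 can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the dictionary and set below are a hypothetical in-memory stand-in for the metastore's partition list and for the directories on HDFS, and the helper names `shared_locations` and `dangling_partitions` are made up for illustration. None of this is Hive's or Impala's actual API.

```python
from collections import defaultdict


def shared_locations(partitions):
    """Group partition names by location; keep locations used by more than one partition."""
    by_location = defaultdict(list)
    for name, location in partitions.items():
        by_location[location].append(name)
    return {loc: sorted(names) for loc, names in by_location.items() if len(names) > 1}


def dangling_partitions(partitions, existing_dirs):
    """Return the partitions whose location no longer exists on the file system."""
    return sorted(name for name, loc in partitions.items() if loc not in existing_dirs)


# Hypothetical model of the repro steps: both partitions point at the j=1 directory.
LOC = "hdfs://localhost:20500/test-warehouse/test/j=1"
partitions = {"j=1": LOC, "j=2": LOC}
existing_dirs = {LOC}

# Before the drop, the shared location is detectable from the metadata alone.
print(shared_locations(partitions))

# Model "alter table test drop partition (j=2)" as observed in this report:
# the metadata entry for j=2 is removed and the shared directory is deleted.
del partitions["j=2"]
existing_dirs.discard(LOC)

# j=1 is still listed in the metadata, but its directory is gone.
print(dangling_partitions(partitions, existing_dirs))
```

In this model, a check like `shared_locations` run before the drop would flag that deleting the directory also affects j=1, and a check like `dangling_partitions` describes the partition that show partitions still lists afterwards.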

            People

              Assignee: Ádám Szita (szita)
              Reporter: Gabor Kaszab (gaborkaszab)
              Votes: 0
              Watchers: 4
