IMPALA / IMPALA-624

Impala does not use a partition's HDFS path as the sink location for INSERT queries and instead uses the parent table's location


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.2
    • Fix Version/s: Impala 1.3
    • Component/s: None
    • Labels: None

    Description

      Impala does not use a partition's HDFS location as the sink location for INSERT queries. Instead, it takes the parent table's HDFS base directory and builds the partition directory from the partition key values (for example, /test-warehouse/foo/j=1).
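The path construction described above can be sketched as follows. This is a minimal illustration, not Impala's actual code: `default_partition_path` is a hypothetical helper, and it assumes the usual Hive-style layout where each partition key contributes one `key=value` directory under the table's base directory, with no escaping of special characters in values.

```python
from typing import Sequence, Tuple


def default_partition_path(base_dir: str,
                           partition_spec: Sequence[Tuple[str, object]]) -> str:
    """Append one `key=value` component per partition key to base_dir.

    Hypothetical helper for illustration only. It never consults the
    partition's own stored location, which is the behaviour this issue
    describes for INSERT.
    """
    parts = [base_dir.rstrip("/")]
    for key, value in partition_spec:
        parts.append(f"{key}={value}")
    return "/".join(parts)


print(default_partition_path("/test-warehouse/foo", [("j", 1)]))
# prints /test-warehouse/foo/j=1
```

With several partition keys, the components are nested in key order, e.g. `[("year", 2013), ("month", 12)]` yields `.../year=2013/month=12`.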

      This means data will not get inserted into the expected location if you do something like:

      CREATE TABLE Foo (i int) PARTITIONED BY (j int) LOCATION '/test-warehouse/foo';
      ALTER TABLE Foo ADD PARTITION (j=1);
      -- ...
      ALTER TABLE Foo PARTITION (j=1) SET LOCATION '/test-warehouse/another_path/j=1';
      -- This row is written to /test-warehouse/foo/j=1 instead of the new partition location:
      INSERT INTO Foo PARTITION (j=1) SELECT 1;
      

      When scanning the table, however, we appear to use the correct partition path, so the inserted data will not be reflected when the user queries the table.

      This is because we do not pass the partition paths to the backend (BE) when executing the insert; we only set the HDFS base directory and the partition expressions.
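One way to picture the mismatch, and the general shape of a fix, is to resolve the insert's sink location from partition metadata rather than from the base directory alone. The sketch below assumes a hypothetical in-memory mapping (`partition_locations`) from a partition spec to its stored HDFS location, standing in for catalog metadata; the function names are illustrative and do not reproduce Impala's frontend or BE code. Partitions not yet in the metadata fall back to the default `key=value` path under the base directory.

```python
from typing import Dict, Tuple

# A partition spec such as (("j", 1),), kept as a tuple so it can be a dict key.
PartitionSpec = Tuple[Tuple[str, object], ...]


def default_location(base_dir: str, spec: PartitionSpec) -> str:
    # Hive-style default layout: base_dir/key1=value1/key2=value2/...
    return "/".join([base_dir.rstrip("/")] + [f"{k}={v}" for k, v in spec])


def buggy_sink_location(base_dir: str, spec: PartitionSpec,
                        partition_locations: Dict[PartitionSpec, str]) -> str:
    # Behaviour described in this issue: the stored location is ignored.
    return default_location(base_dir, spec)


def fixed_sink_location(base_dir: str, spec: PartitionSpec,
                        partition_locations: Dict[PartitionSpec, str]) -> str:
    # Prefer the partition's own location (e.g. one set with
    # ALTER TABLE ... SET LOCATION); fall back to the default path
    # only for partitions the metadata does not know about yet.
    return partition_locations.get(spec, default_location(base_dir, spec))


base = "/test-warehouse/foo"
locations = {(("j", 1),): "/test-warehouse/another_path/j=1"}
spec = (("j", 1),)

print(buggy_sink_location(base, spec, locations))          # /test-warehouse/foo/j=1
print(fixed_sink_location(base, spec, locations))          # /test-warehouse/another_path/j=1
print(fixed_sink_location(base, (("j", 2),), locations))   # /test-warehouse/foo/j=2
```

Since the scan reads from the stored location, rows written to the path returned by `buggy_sink_location` are not seen by later queries; resolving per-partition locations, as `fixed_sink_location` does, keeps reads and writes pointed at the same directory.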

            People

              Henry Robinson (henryr)
              Lenni Kuff (lskuff)
              Votes: 0
              Watchers: 3
