  1. Hive
  2. HIVE-22062

WriteId is not updated for a partitioned ACID table when schema changes

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Changing the schema (e.g. adding a new column) of a non-partitioned ACID table results in the table-level writeId being incremented. This is as expected.

      However, if you do the same on a partitioned ACID table, neither the table-level nor the partition-level writeId is updated. In this case I would expect the table-level writeId to be incremented to reflect that the table has changed.
      Note that get_valid_write_ids() shows that the high watermark is incremented even though the writeId isn't.

      Update: I'd like to extend the scope of this Jira a bit. A number of operations in Hive don't result in a writeId change on ACID tables, so other systems (like Impala) have no way to judge whether a refresh should be run on a table. The only option is to reload all the data for the table every time, which is expensive. For example, in addition to the use case above, compaction is not noticeable from outside Hive.
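
      To illustrate why this matters for an external consumer, here is a minimal Python sketch of a staleness check that relies on the table-level writeId. It is not Hive or Impala code: TableSnapshot, needs_refresh and all numeric values are hypothetical and only model the behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical per-table state an external engine might cache."""
    write_id: int        # table-level writeId last observed
    high_watermark: int  # high watermark as reported by get_valid_write_ids()


def needs_refresh(cached: TableSnapshot, current: TableSnapshot) -> bool:
    """Assumed policy: refresh only when the table-level writeId has advanced."""
    return current.write_id > cached.write_id


# Illustrative values, not taken from a real metastore.
cached = TableSnapshot(write_id=5, high_watermark=5)

# Non-partitioned table after a schema change: the writeId is incremented.
non_partitioned = TableSnapshot(write_id=6, high_watermark=6)

# Partitioned table after the same change, as reported in this issue:
# the high watermark moves, but the writeId stays where it was.
partitioned = TableSnapshot(write_id=5, high_watermark=6)

print(needs_refresh(cached, non_partitioned))  # True
print(needs_refresh(cached, partitioned))      # False: the schema change is missed
```

      Under this assumed policy the partitioned case goes undetected, which is why the consumer would have to fall back to reloading the table every time.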

            People

              lkovari Laszlo Kovari
              gaborkaszab Gabor Kaszab
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: