  1. Hive
  2. HIVE-27356

Hive should write name of blob type instead of table name in Puffin


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

    Description

      Currently Hive writes the name of the table plus snapshot id as blob type:

      https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422

      Instead, it should write the name of the blob type it writes. Table name and snapshot id are redundant information anyway, as they can be inferred from the location and filename of the Puffin file.

      Currently it writes a non-standard blob (the standard blob types are listed in the Iceberg Puffin spec). I think it would be better to write standard blobs for interoperability. But if Hive wants to write non-standard blobs anyway, it should still come up with a descriptive name for them, e.g. 'hive-column-statistics-v1'.
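
      To illustrate why a descriptive blob type helps readers, here is a minimal sketch in Python. It is not Hive's or Iceberg's code. It only builds blob metadata entries using the field names from the Puffin footer (`type`, `fields`, `snapshot-id`, `sequence-number`, `offset`, `length`) and filters them by type. The table name, snapshot id, offsets, lengths, and the exact format of the table-derived type string are assumptions made up for the example, as are the helper names `blob_metadata` and `find_blobs`.

```python
import json

# Not exhaustive: one blob type that the Puffin spec standardizes.
KNOWN_STANDARD_TYPES = {"apache-datasketches-theta-v1"}

# Name suggested in the issue for a Hive-specific (non-standard) blob.
HIVE_STATS_BLOB_TYPE = "hive-column-statistics-v1"


def blob_metadata(blob_type, fields, snapshot_id, sequence_number, offset, length):
    """Build one blob metadata entry using the Puffin footer field names."""
    return {
        "type": blob_type,
        "fields": fields,
        "snapshot-id": snapshot_id,
        "sequence-number": sequence_number,
        "offset": offset,
        "length": length,
    }


def find_blobs(footer, blob_type):
    """Return the blobs a reader would select when filtering by type."""
    return [b for b in footer["blobs"] if b["type"] == blob_type]


# Hypothetical values, for illustration only.
table_name, snapshot_id = "db.tbl", 1234567890

# Type derived from table name and snapshot id (exact format is assumed).
legacy = blob_metadata(f"{table_name}-{snapshot_id}", [1, 2], snapshot_id, 1, 4, 100)
# Type that names the blob format itself.
proposed = blob_metadata(HIVE_STATS_BLOB_TYPE, [1, 2], snapshot_id, 1, 104, 100)

# Round-trip through JSON, as a reader parsing a footer payload would.
footer_json = json.dumps({"blobs": [legacy, proposed]})
matches = find_blobs(json.loads(footer_json), HIVE_STATS_BLOB_TYPE)
```

      A reader from another engine can select `hive-column-statistics-v1` blobs with a fixed string, whereas a type built from the table name and snapshot id changes with every table and snapshot and repeats what the `snapshot-id` field already records.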

            People

              Assignee: Simhadri Govindappa (simhadri-g)
              Reporter: Zoltán Borók-Nagy (boroknagyz)
              Votes: 0
              Watchers: 4
