Hive / HIVE-11344

HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.2, 1.3.0, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      HIVE-9845 introduced a form of compression for HCatSplits: during serialization, it looks for commonalities between the PartInfo and TableInfo objects and, where a field is identical in both, nulls out that field in PartInfo so that the information is not repeated when PartInfo is then serialized.

      As a side effect, however, the PartInfo object is unusable once HCatSplit.write has been called.
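The side effect can be illustrated with a minimal sketch. Every class and field name below (SketchTableInfo, SketchPartInfo, MutatingSplitSketch, inputFormat) is a hypothetical stand-in invented for illustration, not the actual HCatalog code; the sketch only assumes that the write path nulls out a partition-level field when it equals the table-level one.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-ins for illustration; not the real HCatalog classes.
class SketchTableInfo {
    final String inputFormat;
    SketchTableInfo(String inputFormat) { this.inputFormat = inputFormat; }
}

class SketchPartInfo {
    String inputFormat; // often identical to the table-level value
    SketchPartInfo(String inputFormat) { this.inputFormat = inputFormat; }
}

class MutatingSplitSketch {
    final SketchTableInfo table;
    final SketchPartInfo part;

    MutatingSplitSketch(SketchTableInfo table, SketchPartInfo part) {
        this.table = table;
        this.part = part;
    }

    // Drops the partition-level value when it duplicates the table-level one,
    // then serializes. The null is left behind in the live object.
    void write(DataOutputStream out) throws IOException {
        if (Objects.equals(part.inputFormat, table.inputFormat)) {
            part.inputFormat = null;
        }
        out.writeBoolean(part.inputFormat != null);
        if (part.inputFormat != null) {
            out.writeUTF(part.inputFormat);
        }
    }

    // Serializes a split in memory and returns what the caller's
    // PartInfo stand-in holds afterwards.
    static String inputFormatAfterWrite() {
        SketchPartInfo part = new SketchPartInfo("orc");
        MutatingSplitSketch split =
            new MutatingSplitSketch(new SketchTableInfo("orc"), part);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            split.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return part.inputFormat;
    }

    public static void main(String[] args) {
        System.out.println(inputFormatAfterWrite());
    }
}
```

In this sketch the caller still holds a reference to the same part object after write returns, but the field it needs to build a reader is now null.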

      This does not affect M/R directly: M/R does not know about the PartInfo objects, and once the split is serialized, the HCatSplit object is recreated on the backend by deserialization, which restores the split and its PartInfo objects. It does, however, affect framework users of HCat that try to mimic M/R and then use the PartInfo objects to instantiate distinct readers.

      Thus, we need to ensure that PartInfo is still usable after HCatSplit.write is called.
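One possible way to meet this requirement is to save the original values before serializing and restore them afterwards. The following is only a sketch under that assumption; the attached patch may take a different approach, and all names here (RestoreTableInfo, RestorePartInfo, RestoringSplitSketch, inputFormat) are hypothetical stand-ins rather than HCatalog classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-ins for illustration; not the real HCatalog classes.
class RestoreTableInfo {
    final String inputFormat;
    RestoreTableInfo(String inputFormat) { this.inputFormat = inputFormat; }
}

class RestorePartInfo {
    String inputFormat;
    RestorePartInfo(String inputFormat) { this.inputFormat = inputFormat; }
}

class RestoringSplitSketch {
    final RestoreTableInfo table;
    final RestorePartInfo part;

    RestoringSplitSketch(RestoreTableInfo table, RestorePartInfo part) {
        this.table = table;
        this.part = part;
    }

    // Writes the compressed form, then puts the original value back so the
    // in-memory object stays usable even if serialization throws.
    void write(DataOutputStream out) throws IOException {
        String saved = part.inputFormat;
        try {
            if (Objects.equals(saved, table.inputFormat)) {
                part.inputFormat = null; // omit the duplicate from the stream
            }
            out.writeBoolean(part.inputFormat != null);
            if (part.inputFormat != null) {
                out.writeUTF(part.inputFormat);
            }
        } finally {
            part.inputFormat = saved; // restore the caller-visible state
        }
    }

    // Serializes the split in memory and returns the number of bytes written.
    static int serializedSize(RestoringSplitSketch split) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            split.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

With this shape the serialized form still omits the duplicated value, while callers that keep a reference to the partition object can continue to read from it after write returns.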

      Attachments

        1. HIVE-11344.patch
          2 kB
          Sushanth Sowmyan


            People

              Assignee: Sushanth Sowmyan
              Reporter: Sushanth Sowmyan
              Votes: 0
              Watchers: 2
