  1. Apache Gobblin
  2. GOBBLIN-877

Add column metadata for partition for inline hive registration

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Previously, we removed schema.literal from partitions because Avro schemas should be defined only at the table level. Hive overrides a table property when the same property is defined on a partition, so defining the schema at the partition level can lead to partitions with inconsistent schemas. Because column metadata is calculated from schema.literal, we removed the column metadata from partitions as well.

      We then found that Presto cannot read data from ORC files registered this way. ORC (and other Hive SerDes) need column metadata on the partition so that coercion can be done between the partition schema and the table schema.

      We therefore need to treat Avro and other formats separately during Hive registration, so that users can read correct data from Presto.
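
      A minimal sketch of the proposed split, not Gobblin's actual registration code. The class PartitionMetadataSketch, the method forPartition, and the string-based column list are hypothetical names used only for illustration. The sketch assumes that "schema.literal" above refers to Hive's avro.schema.literal property and that the format can be distinguished by the SerDe class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Gobblin.
class PartitionMetadataSketch {
    // Assumption: "schema.literal" in the description means this Hive property.
    static final String SCHEMA_LITERAL = "avro.schema.literal";
    static final String AVRO_SERDE = "org.apache.hadoop.hive.serde2.avro.AvroSerDe";
    static final String ORC_SERDE = "org.apache.hadoop.hive.ql.io.orc.OrcSerde";

    // What would be registered on a single partition.
    static final class PartitionSpec {
        final Map<String, String> serdeProps;
        final List<String> columns;

        PartitionSpec(Map<String, String> serdeProps, List<String> columns) {
            this.serdeProps = serdeProps;
            this.columns = columns;
        }
    }

    // Derive partition-level metadata from table-level metadata.
    static PartitionSpec forPartition(String serdeClass,
                                      Map<String, String> tableSerdeProps,
                                      List<String> tableColumns) {
        Map<String, String> props = new LinkedHashMap<>(tableSerdeProps);
        if (AVRO_SERDE.equals(serdeClass)) {
            // Avro: the schema lives at the table level only, so drop the
            // literal and the column metadata derived from it.
            props.remove(SCHEMA_LITERAL);
            return new PartitionSpec(props, new ArrayList<>());
        }
        // ORC and other formats: keep column metadata on the partition so
        // the reader can coerce between partition and table schemas.
        return new PartitionSpec(props, new ArrayList<>(tableColumns));
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new LinkedHashMap<>();
        tableProps.put(SCHEMA_LITERAL, "{\"type\":\"record\"}");
        List<String> cols = Arrays.asList("id:int", "name:string");

        PartitionSpec avro = forPartition(AVRO_SERDE, tableProps, cols);
        PartitionSpec orc = forPartition(ORC_SERDE, new LinkedHashMap<>(), cols);
        System.out.println("Avro partition columns: " + avro.columns);
        System.out.println("ORC partition columns: " + orc.columns);
    }
}
```

      Branching on the SerDe class is only one possible way to tell the formats apart; the fix might instead key off a table property or a registration policy setting.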

              People

              • Assignee:
                Unassigned
                Reporter:
                Zihan Li
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1h