Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-8687

Support Avro through HCatalog

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.14.0
    • 0.14.0
    • discovered in Pig, but it looks like the root cause impacts all non-Hive users

    • Hive Avro tables are now usable through HCatalog.

    Description

      Attempting to write to a HCatalog defined table backed by the AvroSerde fails with the following stacktrace:

      java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
      	at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
      	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
      	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
      	at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
      	at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
      

      The proximal cause of this failure is that the AvroContainerOutputFormat's signature mandates a LongWritable key and HCat's FileRecordWriterContainer forces a NullWritable. I'm not sure of a general fix, other than redefining HiveOutputFormat to mandate a WritableComparable.

      It looks like accepting WritableComparable is what's done in the other Hive OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also be changed, since it's ignoring the key. That way fixing things so FileRecordWriterContainer can always use NullWritable could get spun into a different issue?

      The underlying cause for failure to write to AvroSerde tables is that AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so fixing the above will just push the failure into the placeholder RecordWriter.

      Attachments

        1. HIVE-8687.branch-0.14.patch
          19 kB
          Sushanth Sowmyan
        2. HIVE-8687.patch
          19 kB
          Sushanth Sowmyan
        3. HIVE-8687.2.patch
          19 kB
          Sushanth Sowmyan
        4. HIVE-8687.branch-0.14.2.patch
          19 kB
          Sushanth Sowmyan
        5. HIVE-8687.branch-0.14.3.patch
          21 kB
          Sushanth Sowmyan
        6. HIVE-8687.3.patch
          21 kB
          Sushanth Sowmyan
        7. HIVE-8687.4.patch
          21 kB
          Sushanth Sowmyan

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            sushanth Sushanth Sowmyan Assign to me
            sushanth Sushanth Sowmyan
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment