Hive / HIVE-8120 Umbrella JIRA tracking Parquet improvements / HIVE-9333

Move parquet serialize implementation to DataWritableWriter to improve write speeds


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      The serialize process in ParquetHiveSerDe converts a Hive object to a Writable object by looping through all of the Hive object's children and creating a new Writable object per child. These Writable objects are then passed to the Parquet writing function and parsed again in the DataWritableWriter class, which loops through the ArrayWritable object. The two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) can be reduced to a single loop in DataWritableWriter.write() to increase write speed for Hive Parquet.

      To achieve this, ParquetHiveSerDe.serialize() can wrap the Hive object and its object inspector in an object that implements the Writable interface. This avoids the loop in serialize() and leaves the traversal to DataWritableWriter.write(). ORC does this with the OrcSerde.OrcSerdeRow class.
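
      The idea can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches. The names RowInspector, ArrayRowInspector, RowWrapper, SketchSerDe and SketchWriter are hypothetical, and the local Writable interface only mirrors the two methods of org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath. Rows are assumed to be plain Object[] values, and the writer collects strings instead of emitting Parquet records.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, heavily simplified stand-in for a Hive struct ObjectInspector.
interface RowInspector {
    List<String> fieldNames();
    Object fieldValue(Object row, int index);
}

// Inspector for rows stored as Object[] (an assumption made for this sketch).
final class ArrayRowInspector implements RowInspector {
    private final List<String> names;
    ArrayRowInspector(List<String> names) { this.names = names; }
    public List<String> fieldNames() { return names; }
    public Object fieldValue(Object row, int index) { return ((Object[]) row)[index]; }
}

// Wrapper that carries the row and its inspector. It is only handed from the
// SerDe to the writer in memory, so the Writable methods are unsupported.
final class RowWrapper implements Writable {
    final Object row;
    final RowInspector inspector;
    RowWrapper(Object row, RowInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }
    public void write(DataOutput out) {
        throw new UnsupportedOperationException("in-memory wrapper only");
    }
    public void readFields(DataInput in) {
        throw new UnsupportedOperationException("in-memory wrapper only");
    }
}

// Stand-in for the SerDe: constant-time wrapping, no loop over the fields.
final class SketchSerDe {
    static Writable serialize(Object row, RowInspector inspector) {
        return new RowWrapper(row, inspector);
    }
}

// Stand-in for the writer: the only place where the fields are traversed.
final class SketchWriter {
    final List<String> written = new ArrayList<>();
    void write(Writable record) {
        RowWrapper wrapper = (RowWrapper) record;
        List<String> names = wrapper.inspector.fieldNames();
        for (int i = 0; i < names.size(); i++) {
            Object value = wrapper.inspector.fieldValue(wrapper.row, i);
            if (value != null) { // null fields are skipped (a choice of this sketch)
                written.add(names.get(i) + "=" + value);
            }
        }
    }
}

final class SingleLoopSketch {
    public static void main(String[] args) {
        RowInspector inspector = new ArrayRowInspector(Arrays.asList("id", "name", "note"));
        SketchWriter writer = new SketchWriter();
        writer.write(SketchSerDe.serialize(new Object[] {1, "a", null}, inspector));
        System.out.println(writer.written);
    }
}
```

      In this sketch, serialize() allocates one wrapper per row and no per-field objects; each field is visited once, inside write().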

      Writable objects are organized differently in each storage format, so I don't think it is necessary to create and keep the Writable objects in serialize(); they are not used until the writing process starts (DataWritableWriter.write()).

      This performance issue was found using microbenchmark tests from HIVE-8121.

      Attachments

        1. HIVE-9333.5.patch
          75 kB
          Sergio Peña
        2. HIVE-9333.6.patch
          60 kB
          Sergio Peña
        3. HIVE-9333.7.patch
          61 kB
          Sergio Peña


          People

            Assignee: Sergio Peña (spena)
            Reporter: Sergio Peña (spena)
            Votes: 1
            Watchers: 6
