Project: Hive
Parent issue: HIVE-8120 Umbrella JIRA tracking Parquet improvements
Issue: HIVE-11131

Get row information on DataWritableWriter once for better writing performance


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

    Description

      DataWritableWriter is the class used to write Hive records to Parquet files. It currently retrieves all the information needed to parse a record, such as the schema and the object inspector, every time a record is written (that is, on every call to write()).

      We can improve this class's performance by initializing one writer per data type once and storing the corresponding object inspectors in each writer.

      The class can assume that subsequent records have the same object inspectors and schema, so no per-record checks for that are needed. When a new schema is written, Parquet creates a new DataWritableWriter.
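      The idea above can be illustrated with a minimal, self-contained Java sketch under a simplified model. All names here (CachedWriterSketch, Sink, FieldWriter, RecordWriter, FieldType) are hypothetical and are not the Hive or Parquet APIs: Sink stands in for Parquet's record consumer, and FieldType stands in for the type information Hive derives from the schema and object inspectors. The per-type dispatch happens once, when the writer is built, and the per-record write() only walks the cached writers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the HIVE-11131 idea; not Hive/Parquet code.
public class CachedWriterSketch {

    // Stand-in for the column type information derived from the schema.
    enum FieldType { INT, STRING, DOUBLE }

    // Hypothetical output sink (stand-in for Parquet's record consumer)
    // that just records what was written, for inspection.
    static final class Sink {
        final List<String> events = new ArrayList<>();
        void startRecord() { events.add("start"); }
        void endRecord() { events.add("end"); }
        void field(String name, String value) { events.add(name + "=" + value); }
    }

    // One writer per column, selected once from the column's type.
    interface FieldWriter {
        void write(Object value, Sink out);
    }

    static final class RecordWriter {
        private final FieldWriter[] writers;

        // Built once per schema: all type inspection happens here.
        RecordWriter(String[] names, FieldType[] types) {
            writers = new FieldWriter[names.length];
            for (int i = 0; i < names.length; i++) {
                writers[i] = createWriter(names[i], types[i]);
            }
        }

        private static FieldWriter createWriter(String name, FieldType type) {
            switch (type) {
                case INT:
                    return (v, out) -> out.field(name, Integer.toString((Integer) v));
                case DOUBLE:
                    return (v, out) -> out.field(name, Double.toString((Double) v));
                case STRING:
                    return (v, out) -> out.field(name, (String) v);
                default:
                    throw new IllegalArgumentException("Unsupported type: " + type);
            }
        }

        // Hot path: no schema lookups, only the cached per-column writers.
        void write(Object[] row, Sink out) {
            out.startRecord();
            for (int i = 0; i < writers.length; i++) {
                if (row[i] != null) { // in this sketch, null columns are simply skipped
                    writers[i].write(row[i], out);
                }
            }
            out.endRecord();
        }
    }

    public static void main(String[] args) {
        Sink sink = new Sink();
        RecordWriter writer = new RecordWriter(
            new String[] {"id", "name", "score"},
            new FieldType[] {FieldType.INT, FieldType.STRING, FieldType.DOUBLE});
        writer.write(new Object[] {1, "a", 0.5}, sink);
        writer.write(new Object[] {2, null, 1.5}, sink);
        System.out.println(sink.events);
    }
}
```

      Running main prints [start, id=1, name=a, score=0.5, end, start, id=2, score=1.5, end]. Because the sketch assumes every record shares the schema it was built with, a schema change means building a new RecordWriter, which mirrors the description's note that Parquet recreates DataWritableWriter for a new schema.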

      Attachments

        1. HIVE-11131.2.patch (28 kB, Sergio Peña)
        2. HIVE-11131.3.patch (29 kB, Sergio Peña)
        3. HIVE-11131.4.patch (29 kB, Sergio Peña)


          People

            Assignee: Sergio Peña (spena)
            Reporter: Sergio Peña (spena)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
