Apache Hudi / HUDI-1443

Remove record deserialization in RDDCustomColumnsSortPartitioner

Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • performance

    Description

      https://github.com/apache/hudi/pull/2263#discussion_r533653930 has the context. When sorting is specified as part of clustering, we use the custom partitioner RDDCustomColumnsSortPartitioner. It deserializes the schema to get the values of the sort columns. Check whether it is possible to avoid this and implement the suggestion in the PR.

      We tried another approach by adding SerializableSchema, but it does not work for nested schemas. See test failing here. Fix this serialization and use it in RDDCustomColumnsSortPartitioner.
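
      For illustration only, here is a minimal sketch of the serializable-wrapper idea, not the actual Hudi SerializableSchema. It assumes the parsed schema type is not itself Java-serializable, so the wrapper ships the schema as text and re-parses it once per deserialized instance instead of once per record. ParsedSchema is a hypothetical stand-in for the real schema class, and SerializableSchemaSketch and roundTrip are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

class SerializableSchemaSketch {

    /** Hypothetical stand-in for a parsed schema object that is NOT Serializable. */
    static final class ParsedSchema {
        // Counts parse calls so the cost of re-parsing is observable.
        static final AtomicInteger PARSE_COUNT = new AtomicInteger();
        private final String json;

        private ParsedSchema(String json) {
            this.json = json;
        }

        static ParsedSchema parse(String json) {
            PARSE_COUNT.incrementAndGet();
            return new ParsedSchema(json);
        }

        @Override
        public String toString() {
            return json;
        }
    }

    /** Wrapper that serializes the schema as text and re-parses it once on read. */
    static final class SerializableSchema implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default Java serialization skips this field.
        private transient ParsedSchema schema;

        SerializableSchema(ParsedSchema schema) {
            this.schema = schema;
        }

        ParsedSchema get() {
            return schema;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            // writeObject(String) avoids the 64 KB limit of writeUTF for large schemas.
            out.writeObject(schema.toString());
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            schema = ParsedSchema.parse((String) in.readObject());
        }
    }

    /** Serializes and deserializes the wrapper in memory, as shipping it to a task would. */
    static SerializableSchema roundTrip(SerializableSchema original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableSchema) in.readObject();
        }
    }
}
```

      With a wrapper like this, a sort-key function can call get() for every record without parsing the schema again. Whether this shape handles the nested-schema failure mentioned above is not something the sketch demonstrates.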

      Attachments

        1. image-2020-12-21-17-06-25-253.png
          15 kB
          Vinoth Chandar

          People

            Assignee: Unassigned
            Reporter: satish (satishkotha)
            Votes: 0
            Watchers: 2
