  1. Apache Hudi
  2. HUDI-5633

Fixing HoodieSparkRecord performance bottlenecks

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.13.0
    • None

    Description

      The current HoodieSparkRecord implementation has the following issues:

      1. It rewrites records using `rewriteRecord` and `rewriteRecordWithNewSchema`, which traverse the schema for every record. Instead, we should traverse the schema only once and produce a transformer that directly creates the new record from the old one.
      2. Records are currently copied for every Executor, even for the Simple one, which does not buffer any records and therefore does not require them to be copied.
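
      Both points can be illustrated with a small, language-neutral sketch. It is written in Python purely for brevity and is not Hudi code: the schemas are modeled as plain lists of field names, and `compile_transformer`, `reused_buffer_reader`, and `run_executor` are hypothetical names introduced here. The first function resolves field positions once and returns a closure that is applied per record; the last shows why a copy is only needed when an executor holds on to records that the reader may overwrite.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Sequence, Tuple


def compile_transformer(
    old_schema: Sequence[str], new_schema: Sequence[str]
) -> Callable[[Sequence[Any]], Tuple[Any, ...]]:
    """Traverse both schemas ONCE and return a per-record transformer.

    Hypothetical helper: fields missing from the old schema are filled with None.
    """
    old_positions = {name: i for i, name in enumerate(old_schema)}
    # Resolved a single time, not once per record.
    positions: List[Optional[int]] = [old_positions.get(name) for name in new_schema]

    def transform(record: Sequence[Any]) -> Tuple[Any, ...]:
        # Per-record work is pure positional access; no schema lookups here.
        return tuple(None if p is None else record[p] for p in positions)

    return transform


def reused_buffer_reader(rows: Iterable[Sequence[Any]]) -> Iterator[List[Any]]:
    """Simulates a reader that overwrites one mutable buffer for every row."""
    buf: List[Any] = []
    for row in rows:
        buf[:] = row  # same list object, new contents
        yield buf


def run_executor(
    reader: Iterable[List[Any]],
    transform: Callable[[Sequence[Any]], Tuple[Any, ...]],
    buffering: bool,
) -> List[Tuple[Any, ...]]:
    """Copy records only when the executor queues them before processing."""
    if buffering:
        # Records outlive the reader's buffer, so each one must be copied.
        queued = [list(record) for record in reader]
        return [transform(record) for record in queued]
    # Simple executor: each record is consumed immediately, no copy needed.
    return [transform(record) for record in reader]


old_schema = ["id", "name", "ts"]
new_schema = ["ts", "id", "partition"]
rows = [[1, "a", 100], [2, "b", 200]]

transform = compile_transformer(old_schema, new_schema)
simple_out = run_executor(reused_buffer_reader(rows), transform, buffering=False)
buffered_out = run_executor(reused_buffer_reader(rows), transform, buffering=True)
```

      In this sketch both executors produce the same output, but only the buffering path pays for the copy. Dropping the copy in the buffering path would leave every queued entry pointing at the same overwritten buffer.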

            People

              Assignee: alexey.kudinkin Alexey Kudinkin
              Reporter: alexey.kudinkin Alexey Kudinkin