Apache Hudi / HUDI-5261

Use proper parallelism for engine context APIs


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • None
    • None
    • performance

    Description

      Do a global search for usages of these APIs, and similar ones, that take a parallelism argument:

      • org.apache.hudi.common.engine.HoodieEngineContext#flatMap
      • org.apache.hudi.common.engine.HoodieEngineContext#map

      Many call sites pass the number of items as the parallelism, which hurts performance. Parallelism should instead be based on the number of cores available in the cluster and be settable by the user through parallelism configs.
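      A minimal Java sketch of the proposed direction, assuming a hypothetical helper `resolveParallelism` (not an existing Hudi API). Rather than passing the item count straight through as the parallelism to calls like `map`/`flatMap`, the value is capped by a user-configured parallelism, falling back to an available-core count when none is set. `Runtime.availableProcessors()` stands in for cluster cores here purely for illustration; in a real job that figure would have to come from the execution engine.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelismSketch {

    /**
     * Hypothetical helper (illustration only): picks the parallelism to pass to an
     * engine-context call such as map/flatMap instead of the raw item count.
     *
     * @param numItems              number of items to process
     * @param configuredParallelism user-supplied parallelism config; <= 0 means "not set"
     * @param availableCores        cores available to the job (assumed to be known)
     * @return a value between 1 and numItems, or 1 when there are no items
     */
    public static int resolveParallelism(int numItems, int configuredParallelism, int availableCores) {
        // Fall back to the core count when the user did not configure a value.
        int upperBound = configuredParallelism > 0 ? configuredParallelism : availableCores;
        // Never request more tasks than there are items, and never fewer than one.
        return Math.max(1, Math.min(numItems, upperBound));
    }

    public static void main(String[] args) {
        // Local core count, standing in for cluster cores in this sketch.
        int cores = Runtime.getRuntime().availableProcessors();

        List<String> partitionPaths = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            partitionPaths.add("partition-" + i);
        }

        // Pattern described in the issue: parallelism equals the number of items.
        int naive = partitionPaths.size();
        // Sketched alternative: bounded by the config (unset here) or by cores.
        int bounded = resolveParallelism(partitionPaths.size(), 0, cores);

        System.out.println("naive=" + naive + ", bounded=" + bounded);
    }
}
```

      With 100,000 items and no configured value, the naive call asks for 100,000 tasks, while the sketch asks for at most the core count; a small input (fewer items than cores) still gets one task per item.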


            People

              jonvex Jonathan Vexler
              xushiyan Raymond Xu