
Details

    Description

      I have identified three performance bottlenecks in the finalizeWrite function that become more prominent with the new bootstrap mechanism on S3.

       

      In a test with a 1 TB data set comprising 8,000 partitions and approximately 190,000 files, this whole process takes 35 minutes. There is scope to address these performance issues by parallelizing the work with Spark and by using more appropriate data structures.
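
      The two remedies named above can be illustrated with a minimal sketch. The issue text does not spell out the individual bottlenecks, so the task here (finding listed files that are absent from a collection of valid paths) is a hypothetical stand-in, and the names `find_stale_files`, `listing`, and `valid_paths` are assumptions made for illustration. A thread pool stands in for Spark so the example runs with the standard library alone; it is not the actual finalizeWrite code.

```python
from concurrent.futures import ThreadPoolExecutor


def find_stale_files(listing, valid_paths, max_workers=8):
    """Return {partition: [files not in valid_paths]}.

    listing     -- hypothetical input: dict of partition name -> list of file paths
    valid_paths -- hypothetical input: iterable of paths that should be kept
    """
    # Build the lookup structure once. Membership tests on a frozenset are
    # O(1) on average, whereas `path in some_list` scans the whole list.
    valid = frozenset(valid_paths)

    def stale_in(partition):
        # Per-partition work is independent, so it can run concurrently.
        return [f for f in listing[partition] if f not in valid]

    partitions = sorted(listing)
    # Thread pool used as a stand-in for distributing partitions with Spark.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(stale_in, partitions))
    return dict(zip(partitions, results))
```

      With roughly 190,000 files, checking each path against a list costs on the order of 190,000 comparisons per lookup, while a set lookup stays near constant time. Distributing the per-partition work matters most when each partition involves a slow remote call, such as an S3 listing.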


            People

              uditme Udit Mehrotra
