SPARK-39225: Support spark.history.fs.update.batchSize

Details

    Description

      The Spark History Server (SHS) currently struggles when there are a large number of event log files under eventLog.dir: when an SHS starts, the initial scan may take a long time, and new event log files are not scanned or parsed until that initial scan completes.

      For example, if the initial scan takes 1-2 days (not uncommon in large environments), newly finished Spark jobs do not show up in the SHS, because their event log files are not scanned or parsed until the initial scan finishes. Since newly finished jobs are the ones users are most likely to query, the SHS is effectively unusable for those 1-2 days.
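
To illustrate why a per-scan batch size could help, here is a minimal sketch (not Spark's actual implementation) of a scanner that parses at most `batch_size` unseen logs per cycle and re-lists the directory between cycles. The names `scan_in_batches`, `list_logs`, and `parse_log` are hypothetical, and the sketch assumes the listing is returned newest first.

```python
from typing import Callable, Iterable, List, Set


def scan_in_batches(
    list_logs: Callable[[], Iterable[str]],
    parse_log: Callable[[str], None],
    batch_size: int,
    max_cycles: int,
) -> List[List[str]]:
    """Run up to max_cycles scan cycles, parsing at most batch_size unseen logs each.

    list_logs (hypothetical) is assumed to return paths newest first. It is
    called again at the start of every cycle, so logs that appeared while an
    earlier batch was being parsed become candidates in the next cycle.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    seen: Set[str] = set()
    batches: List[List[str]] = []
    for _ in range(max_cycles):
        # Re-list the directory and keep only logs not parsed yet.
        pending = [path for path in list_logs() if path not in seen]
        if not pending:
            break
        # Cap the work done in this cycle.
        batch = pending[:batch_size]
        for path in batch:
            parse_log(path)
            seen.add(path)
        batches.append(batch)
    return batches


# Simulated backlog of five old logs, newest first.
logs = ["app-5", "app-4", "app-3", "app-2", "app-1"]


def list_logs() -> List[str]:
    return list(logs)


def parse_log(path: str) -> None:
    # Simulate a job finishing while the first batch is still being parsed.
    if path == "app-5":
        logs.insert(0, "app-6")


batches = scan_in_batches(list_logs, parse_log, batch_size=2, max_cycles=10)
print(batches)
```

In this simulation the newly arrived `app-6` is parsed in the second cycle, ahead of the older `app-2` and `app-1`, instead of waiting for the whole backlog to finish.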

          People

            hai_tao Hai Tao


              Time Tracking

                Original Estimate: 72h
                Remaining Estimate: 72h
                Time Spent: Not Specified