[SPARK-39225] Support spark.history.fs.update.batchSize - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: Spark Core
Labels:
- pull-request-available

Description

The current Spark History Server (SHS) performs poorly when there is a large number of event log files under eventLog.dir: when the SHS starts, the initial scan may take a long time, and new event log files are not scanned or parsed until that initial scan completes.

For example, if the initial scan takes 1-2 days (this is not uncommon in large environments), newly finished Spark jobs do not show up in the SHS, because their event log files are not scanned or parsed until the initial scan finishes. In effect, the SHS malfunctions for 1-2 days, and the jobs it fails to show are the newly finished ones, which are exactly the jobs users are most likely to query.
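
The effect of capping per-scan work can be illustrated with a toy Python model. This is a hypothetical sketch, not Spark's FsHistoryProvider code. It assumes that a setting such as spark.history.fs.update.batchSize limits how many event log files a single scan parses, that every log costs one time unit to parse, and that each scan orders unparsed logs newest first. The function name time_until_visible and its parameters are invented for illustration.

```python
def time_until_visible(backlog, new_log_arrival, batch_size=None):
    """Return the time at which a newly arrived event log finishes parsing.

    Hypothetical model (not Spark's implementation):
      * `backlog` old logs exist at time 0 (modification time 0).
      * One new log, named "new", appears at time `new_log_arrival`.
      * Parsing any log costs exactly 1 time unit.
      * Each scan lists the unparsed logs present at scan start, sorts them
        newest first (an assumption), and parses at most `batch_size` of
        them; batch_size=None means the scan parses everything it listed.
    """
    # Each log is a (modification_time, name) pair.
    logs = [(0, f"old-{i}") for i in range(backlog)]
    logs.append((new_log_arrival, "new"))
    parsed = set()
    now = 0
    while True:
        # Start a new scan: list every unparsed log that exists right now.
        visible = [log for log in logs if log[0] <= now and log[1] not in parsed]
        if not visible:
            # Nothing to parse yet; jump ahead to the next arrival.
            now = min(mtime for mtime, name in logs if name not in parsed)
            continue
        # Newest first, then cap the amount of work done in this scan.
        visible.sort(key=lambda log: log[0], reverse=True)
        batch = visible if batch_size is None else visible[:batch_size]
        for _, name in batch:
            now += 1
            parsed.add(name)
            if name == "new":
                return now
```

With a backlog of 1,000 logs and a new log arriving at time 10, the unbounded model returns 1001, because the new log has to wait for the whole initial scan; with batch_size=5 it returns 11, because the third scan starts at time 10 and picks up the new log first. Under these assumptions, a smaller batch trades more frequent directory listings for lower latency on newly finished jobs.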

Issue Links

links to

[Github] Pull Request #36597 (hai-tao-1)

People

Assignee: Hai Tao

Reporter: Hai Tao

Votes: 0

Watchers: 4

Dates

Created: 18/May/22 16:35

Updated: 19/May/22 20:10

Resolved: 19/May/22 20:09

Time Tracking

Estimated: 72h
Remaining: 72h
Logged: Not Specified