  1. Spark
  2. SPARK-23607

Use HDFS extended attributes to store application summary to improve the Spark History Server performance

Details

    Description

      Currently, the checkForLogs thread in the Spark History Server (SHS) creates a replay task for every log file whose size has changed. The replay task filters out most of the log content and keeps only the application summary: applicationId, user, attemptACL, start time, and end time. The summary is written to listing.ldb and serves the application list on the SHS home page. For a long-running application, the log file whose name ends with "inprogress" is replayed many times just to extract this summary. This wastes SHS compute and read capacity and delays the application's appearance on the home page.

      Internally, we have a patch that uses HDFS extended attributes to speed up retrieval of the application summary in SHS. With this patch, the driver writes the application summary into extended attributes as key/value pairs. SHS first tries to read the summary from the extended attributes; if that fails, it falls back to reading the log file content as usual. The feature can be enabled or disabled through configuration.
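
      The read path described above can be illustrated with a minimal sketch. This is not the patch itself: it is an in-memory model in which a plain dict stands in for the HDFS extended attributes of a log file. The attribute prefix, the class and function names, and the xattr_enabled flag are all hypothetical. On a real cluster the corresponding calls would presumably be Hadoop's FileSystem.setXAttr and FileSystem.getXAttrs, with attribute names in the "user." namespace.

```python
import json

# Hypothetical names throughout: the attribute prefix, the key spellings,
# and the enable flag are illustrative and not taken from the patch.
XATTR_PREFIX = "user.spark.summary."
SUMMARY_KEYS = ("applicationId", "user", "attemptACL", "startTime", "endTime")


class FakeLogFile:
    """In-memory stand-in for an event log on HDFS: content plus xattrs."""

    def __init__(self, events):
        self.events = list(events)  # event-log lines (JSON strings)
        self.xattrs = {}            # name -> bytes, like extended attributes
        self.replays = 0            # number of full scans of the content


def driver_write_summary(log, summary):
    """Driver side: store each summary field as one key/value attribute."""
    for key in SUMMARY_KEYS:
        if key in summary:
            log.xattrs[XATTR_PREFIX + key] = str(summary[key]).encode("utf-8")


def read_summary_from_xattrs(log):
    """Return the summary held in xattrs, or None if none is present."""
    found = {
        name[len(XATTR_PREFIX):]: value.decode("utf-8")
        for name, value in log.xattrs.items()
        if name.startswith(XATTR_PREFIX)
    }
    return found or None


def replay_log(log):
    """Fallback: scan the whole log and keep only the summary fields."""
    log.replays += 1
    summary = {}
    for line in log.events:
        event = json.loads(line)
        for key in SUMMARY_KEYS:
            if key in event:
                summary[key] = str(event[key])
    return summary


def load_summary(log, xattr_enabled=True):
    """SHS side: prefer xattrs, fall back to replaying the log content."""
    if xattr_enabled:
        summary = read_summary_from_xattrs(log)
        if summary is not None:
            return summary
    return replay_log(log)
```

      In this model, a log without summary attributes costs one full scan per call to load_summary, while a log whose attributes were written by driver_write_summary is answered without touching its content, which is the saving the proposal targets for "inprogress" files.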

      The patch has been running well internally for 4 months, and the last-updated timestamp on SHS stays within 1 minute, matching the 1-minute interval we configure. Before the patch, at our scale, with a large number of Spark applications running per day, the delay could be as long as 30 minutes.

      We would like to know whether this approach is acceptable to the community. Please comment. If it is, I will post a pull request with the changes. Thanks.

          People

            Unassigned
            zhouyejoe Ye Zhou
            Votes:
            1
            Watchers:
            8

            Dates

              Created:
              Updated: