SPARK-7889: Jobs progress of apps on complete page of HistoryServer shows uncompleted

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      When running SparkPi with 2000 tasks, clicking into the app on the incomplete page shows a job progress of 400/2000. After the app completes, it moves from the incomplete page to the complete page, but clicking into the app still shows a job progress of 400/2000.

          Activity

          apachespark Apache Spark added a comment -

          User 'XuTingjun' has created a pull request for this issue:
          https://github.com/apache/spark/pull/6545

          tsudukim Masayoshi TSUZUKI added a comment -

          As a workaround, try setting spark.history.retainedApplications to 0.
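
One plausible reading of why this workaround helps: if loaded application UIs are held in a bounded cache sized by this property, a size of 0 retains nothing, so every request reloads the application from the event log. The sketch below is a toy model of that idea only, not Spark's implementation; BoundedCache and every other name in it are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class BoundedCache:
    """Toy LRU cache; max_entries plays the role of the retained-applications setting."""

    def __init__(self, max_entries: int, load: Callable[[str], str]) -> None:
        self._max_entries = max_entries
        self._load = load
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> str:
        if key in self._entries:
            # Cache hit: serve whatever was loaded earlier, even if it is now stale.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = self._load(key)
        self._entries[key] = value
        # Evict oldest entries until the size limit holds; with a limit of 0
        # the entry that was just loaded is dropped again immediately.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return value


# Simulated job progress as the event log would report it.
progress = {"app-1": "400/2000"}
cached = BoundedCache(50, lambda app_id: progress[app_id])
uncached = BoundedCache(0, lambda app_id: progress[app_id])

cached.get("app-1")
uncached.get("app-1")

progress["app-1"] = "2000/2000"        # the application finishes

stale_view = cached.get("app-1")       # still the value loaded earlier
fresh_view = uncached.get("app-1")     # reloaded on every request
```

The cost of the workaround in this model is that every page view pays for a full reload.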

          srowen Sean Owen added a comment -

          I think the intent is to make job info refresh without refreshing the page, which is not a bug.

          stevel@apache.org Steve Loughran added a comment -

          Is this JIRA about

          (a) the status on the listing of complete/incomplete apps being wrong in some way, or
          (b) the actual job view (history/some-app-id) being stale when a job completes?

          (b) is consistent with what I observed in SPARK-8275.

          Looking at your patch, and comparing it with my proposal, I prefer mine. All I'm proposing is invalidating the cache on work in progress, so that it is retrieved again.

          Thinking about it some more, we can go one better: rely on the ApplicationHistoryInfo.lastUpdated field to tell us when the UI was last updated. If we cache the update time with the UI, then on any GET of an appUI we can check whether the previous UI was not completed and whether the lastUpdated time has changed. If so, that triggers a refresh.

          With this approach the entry you see will always be the one most recently published to the history store (of any implementation), and picked up by the history provider in its getListing()/background refresh operation.
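
The refresh rule proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of any of the pull requests: AppInfo, CachedUI, AppUICache and the lookup/load callbacks are hypothetical stand-ins for the history provider's listing entry and the cached application UI.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AppInfo:
    """Hypothetical listing entry, modelled on ApplicationHistoryInfo."""
    app_id: str
    last_updated: int   # timestamp published by the history provider
    completed: bool


@dataclass
class CachedUI:
    ui: str             # placeholder for the loaded application UI
    last_updated: int   # update time recorded when the UI was loaded
    completed: bool     # whether the UI was built from a completed app


class AppUICache:
    def __init__(self, lookup: Callable[[str], AppInfo],
                 load: Callable[[AppInfo], str]) -> None:
        self._lookup = lookup   # returns the current listing entry
        self._load = load       # builds a UI from a listing entry
        self._entries: Dict[str, CachedUI] = {}
        self.loads = 0

    def get(self, app_id: str) -> str:
        info = self._lookup(app_id)
        entry = self._entries.get(app_id)
        # Refresh only when the cached UI was incomplete and the listing
        # reports a different update time; completed UIs are never reloaded.
        stale = (entry is not None and not entry.completed
                 and entry.last_updated != info.last_updated)
        if entry is None or stale:
            entry = CachedUI(self._load(info), info.last_updated, info.completed)
            self._entries[app_id] = entry
            self.loads += 1
        return entry.ui


listing = {"app-1": AppInfo("app-1", last_updated=100, completed=False)}
cache = AppUICache(lambda app_id: listing[app_id],
                   lambda info: f"{info.app_id}@{info.last_updated}")

first = cache.get("app-1")      # loaded while the app is in progress
listing["app-1"] = AppInfo("app-1", last_updated=200, completed=True)
second = cache.get("app-1")     # update time changed, so the UI is reloaded
third = cache.get("app-1")      # completed UI is served from the cache
```

In this sketch a completed UI stays cached indefinitely, so the extra lookup on each GET only ever costs a reload for applications that are still running.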

          meiyoula meiyoula added a comment -

          Steve Loughran, can you implement your proposal in code? I think you could create a new pull request.

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/6935

          stevel@apache.org Steve Loughran added a comment -

          Added a new pull request; it notes which UIs do not contain a completed attempt, and will refresh those.

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/9913

          stevel@apache.org Steve Loughran added a comment -

          While I work on this, I suspect one of the issues is the reliance on FileStatus.getModificationTime() as the metric for a file being updated. It's not enough to track changes of incomplete apps:

          1. When data is appended to an in-progress app through an open output stream, the modification time does not change; hence, the log is not considered updated.
          2. When a log file is renamed from $name.inprogress to $name, the modtime is changed in HDFS, but not, apparently, in a POSIX fs.

          Issue #1 is stopping probes for incompleteness; the query for incomplete apps is going to have to probe for an update using file size.

          For issue #2, a call to FileSystem.setTimes() on the renamed file will guarantee that the change is picked up; HADOOP-12612 has been filed to cover the issue that the modtime semantics of rename are undefined and clearly inconsistent.
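
Both remedies can be illustrated on a local filesystem. The sketch below is only an analogue under stated assumptions: it uses Python's os module rather than the Hadoop FileSystem API, with os.utime standing in for FileSystem.setTimes(), and LogSnapshot, snapshot, has_changed and complete_log are hypothetical helper names.

```python
import os
import tempfile
from typing import NamedTuple

SUFFIX = ".inprogress"


class LogSnapshot(NamedTuple):
    size: int
    mtime_ns: int


def snapshot(path: str) -> LogSnapshot:
    st = os.stat(path)
    return LogSnapshot(st.st_size, st.st_mtime_ns)


def has_changed(previous: LogSnapshot, current: LogSnapshot) -> bool:
    # Issue #1: compare size as well as modification time, so appended
    # data is detected even when the modification time stays the same.
    return current.size != previous.size or current.mtime_ns != previous.mtime_ns


def complete_log(in_progress_path: str) -> str:
    # Issue #2: rename, then set the times explicitly so the change is
    # visible whatever the filesystem's rename semantics are.
    if not in_progress_path.endswith(SUFFIX):
        raise ValueError("expected a path ending in " + SUFFIX)
    final_path = in_progress_path[: -len(SUFFIX)]
    os.rename(in_progress_path, final_path)
    os.utime(final_path)  # sets access and modification time to now
    return final_path


with tempfile.TemporaryDirectory() as log_dir:
    log = os.path.join(log_dir, "app-1" + SUFFIX)
    with open(log, "w") as f:
        f.write("event-1\n")
    before = snapshot(log)

    with open(log, "a") as f:
        f.write("event-2\n")
    after = snapshot(log)
    grew = has_changed(before, after)

    final = complete_log(log)
    final_name = os.path.basename(final)
    final_exists = os.path.exists(final)
```

On a local filesystem the append also updates the modification time once the file is closed, so the size comparison only makes a difference on stores where the time lags behind the data.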

          stevel@apache.org Steve Loughran added a comment -

          I'm going to note something problematic about probing HDFS for changes: HDFS-5478 reports that flushing fs data doesn't immediately update the fs length, so detecting changes in log files saved to HDFS isn't likely to be that immediate. The SPARK-1537 timeline binding won't have this problem, nor will logs saved to other filesystems (object stores have different issues).

          apachespark Apache Spark added a comment -

          User 'squito' has created a pull request for this issue:
          https://github.com/apache/spark/pull/11118

          irashid Imran Rashid added a comment -

          Issue resolved by pull request 11118
          https://github.com/apache/spark/pull/11118


            People

            • Assignee: stevel@apache.org Steve Loughran
            • Reporter: meiyoula meiyoula
            • Votes: 1
            • Watchers: 10