Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.1
Fix Version/s: None
Environment
I used the spark-2.4.3-bin-hadoop2.7 and spark-2.3.1-bin-hadoop2.7 packages from https://spark.apache.org/downloads.html
The history server runs locally as-is (with default settings) on Ubuntu 16.04.4 under WSL (Windows Subsystem for Linux) on my Windows 10 machine.
The browser is Firefox 67.0.2 (64-bit) for Windows.
Description
Overview:
If you want to view the history of a specific Spark application attempt locally, viewing the first attempt (or any attempt other than the last one) fails because of a UI inconsistency.
The inconsistency is that in the Spark history server UI, the link in the "App ID" column always takes you to the application's last attempt, so if you downloaded only the first attempt, clicking it gives an "application not found" error.
How to reproduce:
- Open any Spark history server (if you are using Azure HDInsight, the address is https://<cluster name>.azurehdinsight.net/sparkhistory/).
- Look for an application that has multiple attempts (i.e., an attempt ID > 1).
- Find the first attempt of this application and download it using the "Download" button in the "Event Log" column. Save it in your local Spark history folder (default: /tmp/spark-events).
- Start a local Spark history server (typically with the sbin/start-history-server.sh script).
- Browse to the local history server (by default http://localhost:18080) and find the application whose history you downloaded.
- Click the link in the "App ID" column. You get the following error:
"Application <your application ID> not found."
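The download-and-save steps above can also be scripted. The following is a minimal sketch, assuming the remote server exposes the standard Spark monitoring REST API, whose `/api/v1/applications/[app-id]/[attempt-id]/logs` endpoint returns one attempt's event log as a zip archive; the sketch therefore unpacks the archive into the local events directory. The function names are hypothetical, any path prefix (such as `/sparkhistory` on HDInsight, presumably) must be part of `base_url`, and authentication is not handled.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def attempt_logs_url(base_url: str, app_id: str, attempt_id: str) -> str:
    """Build the REST URL that serves one attempt's event log as a zip file."""
    return f"{base_url.rstrip('/')}/api/v1/applications/{app_id}/{attempt_id}/logs"


def download_attempt_log(base_url: str, app_id: str, attempt_id: str,
                         events_dir: str = "/tmp/spark-events") -> list[str]:
    """Download a single attempt's event log and unpack it into events_dir.

    Returns the names of the files extracted from the zip archive.
    No authentication is performed (hypothetical helper, not a Spark API).
    """
    url = attempt_logs_url(base_url, app_id, attempt_id)
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    target = Path(events_dir)
    target.mkdir(parents=True, exist_ok=True)
    # The endpoint returns a zip; the history server expects the raw log file.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, `download_attempt_log("http://<remote history server>:18080", "<your application ID>", "1")` would fetch only the first attempt, which reproduces the situation described above.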
Why?
Because the remote history server assumes that the event logs of all attempts are present: the "App ID" column links to the application's latest attempt, while the "Attempt ID" column links to the specific attempt.
But suppose an application has two attempts and we want to investigate only the first one. We download that attempt, open it with the local history server, and intuitively click the link in the "App ID" column. That link actually points to the second attempt, which we never downloaded.
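As a workaround, one option is to skip the "App ID" link and open the attempt-specific page directly. A minimal sketch, assuming the local history server listens on the default port 18080 and serves attempt pages under `/history/<app-id>/<attempt-id>/`: it reads the REST listing at `/api/v1/applications` to see which attempts are actually present locally and builds one UI URL per attempt. The function names are hypothetical.

```python
import json
import urllib.request


def attempt_ui_urls(applications: list[dict], base_url: str) -> list[str]:
    """Map each locally available attempt to its attempt-specific UI page.

    `applications` is the parsed JSON returned by /api/v1/applications.
    Attempts without an attemptId get the plain application URL.
    """
    base = base_url.rstrip("/")
    urls = []
    for app in applications:
        for attempt in app.get("attempts", []):
            attempt_id = attempt.get("attemptId")
            if attempt_id is None:
                urls.append(f"{base}/history/{app['id']}/jobs/")
            else:
                urls.append(f"{base}/history/{app['id']}/{attempt_id}/jobs/")
    return urls


def fetch_applications(base_url: str) -> list[dict]:
    """Read the application listing from the history server's REST API."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/v1/applications") as resp:
        return json.load(resp)
```

Usage: `attempt_ui_urls(fetch_applications("http://localhost:18080"), "http://localhost:18080")` lists only the attempts whose logs were downloaded, so each URL should open without the "not found" error, provided the assumed URL layout holds.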