Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      There is a race condition in RemoteSparkJobMonitor. The info logged in the STARTED branch of RemoteSparkJobMonitor#startMonitor is sometimes printed and sometimes not. This is easy to verify by running a qtest on TestMiniSparkOnYarnCliDriver and comparing the number of times "Query Hive on Spark job" is printed with the number of times "Finished successfully in" is printed.

      The issue is that RemoteSparkJobMonitor polls the state of the JobHandle once per second and prints logging info depending on that state. The log content implicitly assumes that the STARTED-state logs are printed before the SUCCEEDED-state logs, but this isn't always the case. The state transitions are driven by how long the remote Spark job takes to run; if it finishes within one second, the monitor may never observe the STARTED state, and those logs are never printed.

      This can confuse users, and it drops key debugging information that is printed in the STARTED state.
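
      The race described above can be illustrated with a small, self-contained sketch. This is not the Hive code: the class MonitorRaceSketch, its State enum, and both methods are hypothetical, and the list of states stands in for what a once-per-second poll of the job handle would observe. The two log strings are the ones mentioned in this issue. The guarded variant shows one possible approach (emit the STARTED info on the first poll that sees STARTED or SUCCEEDED); it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a polling monitor; not the actual RemoteSparkJobMonitor.
final class MonitorRaceSketch {

    // Simplified job states (assumption: only the ones needed for the illustration).
    enum State { QUEUED, STARTED, SUCCEEDED }

    // Naive monitor: logs the STARTED info only if a poll happens to observe STARTED.
    static List<String> naiveMonitor(List<State> polledStates) {
        List<String> log = new ArrayList<>();
        boolean startedLogged = false;
        for (State s : polledStates) {
            if (s == State.STARTED && !startedLogged) {
                log.add("Query Hive on Spark job");
                startedLogged = true;
            } else if (s == State.SUCCEEDED) {
                log.add("Finished successfully in");
                break;
            }
        }
        return log;
    }

    // Guarded monitor: if SUCCEEDED is seen before STARTED was ever observed,
    // the STARTED info is still emitted first.
    static List<String> guardedMonitor(List<State> polledStates) {
        List<String> log = new ArrayList<>();
        boolean startedLogged = false;
        for (State s : polledStates) {
            if ((s == State.STARTED || s == State.SUCCEEDED) && !startedLogged) {
                log.add("Query Hive on Spark job");
                startedLogged = true;
            }
            if (s == State.SUCCEEDED) {
                log.add("Finished successfully in");
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // A job that finishes between two polls: STARTED is never observed.
        List<State> fastJob = Arrays.asList(State.QUEUED, State.SUCCEEDED);
        System.out.println(naiveMonitor(fastJob));   // [Finished successfully in]
        System.out.println(guardedMonitor(fastJob)); // [Query Hive on Spark job, Finished successfully in]
    }
}
```

      For a job that stays in STARTED across at least one poll, both variants produce the same two log lines, which is why the missing output only shows up intermittently.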

      Attachments

        1. HIVE-18684.3.patch
          16 kB
          Sahil Takiar
        2. HIVE-18684.2.patch
          15 kB
          Sahil Takiar
        3. HIVE-18684.1.patch
          12 kB
          Sahil Takiar

          People

            Assignee: stakiar Sahil Takiar
            Reporter: stakiar Sahil Takiar
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: