
Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      There is a race condition in RemoteSparkJobMonitor. The info logged for the STARTED state in RemoteSparkJobMonitor#startMonitor is sometimes printed and sometimes not. This is easy to verify by running a qtest on TestMiniSparkOnYarnCliDriver and comparing the number of times "Query Hive on Spark job" is printed with the number of times "Finished successfully in" is printed.

      The issue is that RemoteSparkJobMonitor polls once per second and checks the state of the JobHandle. Depending on the state, it prints some logging info. The content of the logs contains an implicit assumption that the logs for the STARTED state are printed before the logs for the SUCCEEDED state. However, this isn't always the case. The state transitions are driven by how long the remote Spark job takes to run, and if it finishes within one second, the monitor never observes the STARTED state, so the logs for that state are never printed.
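
      The polling behavior described above can be reproduced with a small, deterministic sketch. This is not the Hive code and not necessarily what the attached patches do: the State enum, the class name, and both monitor methods are hypothetical stand-ins, and each list element represents the state seen at one poll. Only the two log messages are taken from this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PollingMonitorSketch {
    // Hypothetical stand-in for the states a monitor might observe on a job handle.
    enum State { QUEUED, STARTED, SUCCEEDED }

    static final String STARTED_MSG = "Query Hive on Spark job";
    static final String FINISHED_MSG = "Finished successfully in";

    // Naive monitor: logs only what it happens to see at each poll.
    // If no poll lands while the job is in STARTED, STARTED_MSG is never logged.
    static List<String> naiveMonitor(List<State> polls) {
        List<String> log = new ArrayList<>();
        boolean startedLogged = false;
        for (State s : polls) {
            if (s == State.STARTED) {
                if (!startedLogged) {
                    log.add(STARTED_MSG);
                    startedLogged = true;
                }
            } else if (s == State.SUCCEEDED) {
                log.add(FINISHED_MSG);
                break;
            }
        }
        return log;
    }

    // Guarded monitor (one possible remedy, an assumption): treat SUCCEEDED as
    // implying that the job has started, and emit the STARTED logs first if
    // they were skipped.
    static List<String> guardedMonitor(List<State> polls) {
        List<String> log = new ArrayList<>();
        boolean startedLogged = false;
        for (State s : polls) {
            boolean hasStarted = (s == State.STARTED || s == State.SUCCEEDED);
            if (hasStarted && !startedLogged) {
                log.add(STARTED_MSG);
                startedLogged = true;
            }
            if (s == State.SUCCEEDED) {
                log.add(FINISHED_MSG);
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // A slow job is seen in STARTED by at least one poll.
        List<State> slowJob = Arrays.asList(State.QUEUED, State.STARTED, State.SUCCEEDED);
        // A fast job starts and finishes between two consecutive polls.
        List<State> fastJob = Arrays.asList(State.QUEUED, State.SUCCEEDED);

        System.out.println("naive, slow job:   " + naiveMonitor(slowJob));
        System.out.println("naive, fast job:   " + naiveMonitor(fastJob));
        System.out.println("guarded, fast job: " + guardedMonitor(fastJob));
    }
}
```

      For the fast job, the naive monitor logs only the "Finished successfully in" message, which matches the mismatch in counts described above. The guarded variant logs both messages in order, at the cost of printing the STARTED info after the job has already finished.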

      This can confuse users, and the STARTED state is where key debugging information is printed, so that information is lost whenever the state is skipped.

      Attachments

        1. HIVE-18684.1.patch
          12 kB
          Sahil Takiar
        2. HIVE-18684.2.patch
          15 kB
          Sahil Takiar
        3. HIVE-18684.3.patch
          16 kB
          Sahil Takiar


          People

            Assignee: Sahil Takiar (stakiar)
            Reporter: Sahil Takiar (stakiar)

            Dates

              Created:
              Updated:
