SPARK-41163: Spark 3.2.2 storage.ShuffleBlockFetcherIterator and TransportResponseHandler issue


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2
    • Fix Version/s: None
    • Component/s: Build, Deploy
    • Labels: None
    • Environment:
      • spark 3.2.2
      • hadoop 3.1.2
      • hive 3.1.1
      • scala 2.12

    Description

      Hello there.

      I've built Spark 3.2.2 for my cluster, which runs Hadoop 3.1.2 and Scala 2.12 (pom.xml is attached).

      Build script:

      cd spark && \
      ./build/mvn -Pyarn -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive -Phive-thriftserver -DskipTests clean package

      It was working fine, but a few applications hit strange errors and warnings from time to time.

      It always looks like a lost datanode connection followed by shuffle read failures.

      2022-11-16 22:18:25,423 ERROR server.TransportChannelHandler: Connection to s00abd02node9.company.com/10.x.y.163:35143 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.shuffle.io.connectionTimeout if this is wrong.
      2022-11-16 22:18:25,423 ERROR client.TransportResponseHandler: Still have 5 requests outstanding when connection from s00abd02node9.company.com/10.x.y.163:35143 is closed
      2022-11-16 22:18:25,423 WARN netty.NettyBlockTransferService: Error while trying to get the host local dirs for [16]
      2022-11-16 22:18:25,425 ERROR storage.ShuffleBlockFetcherIterator: Error occurred while fetching host local blocks 
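
      The first log line points at spark.shuffle.io.connectionTimeout, and the last two come from the host-local shuffle read path. As a starting point for narrowing this down, here is a minimal sketch of how those settings could be passed at submit time. The values are illustrative assumptions rather than tuned recommendations, my_app.jar is a hypothetical placeholder, and whether any of these settings resolves the failure is untested.

```shell
#!/bin/sh
# Sketch only: the values below are illustrative assumptions, not tuned recommendations.
# - spark.shuffle.io.connectionTimeout: the timeout named in the TransportChannelHandler
#   message (it falls back to spark.network.timeout, 120s by default).
# - spark.shuffle.io.maxRetries / retryWait: how often and how long shuffle fetches are retried.
# - spark.shuffle.readHostLocalDisk=false: fetch blocks from same-host executors over the
#   network instead of reading their local dirs directly (the "host local" path in the log).
SHUFFLE_CONF="--conf spark.shuffle.io.connectionTimeout=300s \
--conf spark.shuffle.io.maxRetries=6 \
--conf spark.shuffle.io.retryWait=10s \
--conf spark.shuffle.readHostLocalDisk=false"

# Print the command for review; remove the leading `echo` to actually submit.
# my_app.jar is a hypothetical placeholder for the failing application.
echo spark-submit --master yarn $SHUFFLE_CONF my_app.jar
```

      Changing one setting at a time should make it easier to see which of them, if any, affects the failure.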

      When this happens, the application goes into a retry and fails after the second start.

      Can anybody help?


          People

            Assignee: Unassigned
            Reporter: Dmitry Kravchuk (dishka_krauch)
