Spark / SPARK-41163

Spark 3.2.2 storage.ShuffleBlockFetcherIterator and TransportResponseHandler issue


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2
    • Fix Version/s: None
    • Component/s: Build, Deploy
    • Labels: None
    • Environment:
      • spark 3.2.2
      • hadoop 3.1.2
      • hive 3.1.1
      • scala 2.12

    Description

      Hello there.

      I've built Spark 3.2.2 for my cluster, which runs Hadoop 3.1.2 and Scala 2.12 (pom.xml is attached).

      Build script:

      cd spark && \
      ./build/mvn -Pyarn -Dhadoop.version=3.1.2 -Pscala-2.12 -Phive -Phive-thriftserver -DskipTests clean package

      It was working fine, but a few applications have hit strange errors and warnings from time to time.

      It always looks like a lost datanode connection followed by shuffle read failures:

      2022-11-16 22:18:25,423 ERROR server.TransportChannelHandler: Connection to s00abd02node9.company.com/10.x.y.163:35143 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.shuffle.io.connectionTimeout if this is wrong.
      2022-11-16 22:18:25,423 ERROR client.TransportResponseHandler: Still have 5 requests outstanding when connection from s00abd02node9.company.com/10.x.y.163:35143 is closed
      2022-11-16 22:18:25,423 WARN netty.NettyBlockTransferService: Error while trying to get the host local dirs for [16]
      2022-11-16 22:18:25,425 ERROR storage.ShuffleBlockFetcherIterator: Error occurred while fetching host local blocks 

      When this happens, the application is retried and then fails again after the second start.
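      The first ERROR line names spark.shuffle.io.connectionTimeout, and its 120000 ms matches the default behaviour of that setting falling back to spark.network.timeout (120s). A minimal spark-defaults.conf sketch for experimenting with a longer shuffle connection timeout and more fetch retries is below; the values are illustrative assumptions, not a confirmed fix for this issue:

      ```
      # spark-defaults.conf: illustrative values only (assumptions, tune per cluster)
      # Idle time before a shuffle connection with outstanding requests is declared dead
      # (falls back to spark.network.timeout, i.e. 120s, when unset)
      spark.shuffle.io.connectionTimeout   300s
      # Number of times a failed shuffle block fetch is retried (default 3)
      spark.shuffle.io.maxRetries          10
      # Wait between shuffle fetch retries (default 5s)
      spark.shuffle.io.retryWait           15s
      ```

      The same keys can also be passed per job with spark-submit --conf key=value. If the remote node is actually gone rather than just stalled, a longer timeout would mostly delay the same failure.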

      Can anybody help?

      Attachments

        1. pom.xml
          128 kB
          Dmitry Kravchuk
        2. container_1668606650061_0087_01_000057.txt
          39 kB
          Dmitry Kravchuk


          People

            Assignee: Unassigned
            Reporter: Dmitry Kravchuk (dishka_krauch)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: