Spark / SPARK-22526

Document closing of PortableDataInputStream in binaryFiles


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Documentation, Spark Core
    • Labels: None

    Description

      Hi,

      I am using Spark 2.2.0 (the current release) to read binary files from S3, via sc.binaryFiles.

      It works fine for roughly the first 100 file reads, but then it hangs indefinitely, for anywhere from 5 up to 40 minutes, much like the Avro file read issue (which was fixed in later releases).

      I tried raising fs.s3a.connection.maximum to larger values, but that did not help.
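      For reference, a minimal sketch of how the pool limit can be raised through the Hadoop configuration (the value 200 is illustrative, and sc is an already-created SparkContext):

        // Raise the S3A connection-pool cap before reading from S3.
        sc.hadoopConfiguration.set("fs.s3a.connection.maximum", "200")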

      Finally, I ended up enabling Spark's speculation setting, which again did not help much.
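      A minimal sketch of enabling speculation at application start (the app name is illustrative):

        import org.apache.spark.{SparkConf, SparkContext}

        // Enable speculative re-launch of slow tasks.
        val conf = new SparkConf()
          .setAppName("binary-read")          // illustrative name
          .set("spark.speculation", "true")
        val sc = new SparkContext(conf)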

      One thing I observed is that it is not closing the S3 connection after each binary file is read.

      Example: sc.binaryFiles("s3a://test/test123.zip")
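      Since binaryFiles returns (path, PortableDataStream) pairs and each stream only opens the underlying S3 object when open() is called, here is a minimal sketch of closing that stream explicitly after each file (the byte-counting body is an illustrative stand-in for real processing):

        import java.io.DataInputStream

        // Read each binary file and close the stream in a finally block
        // so the underlying S3 connection is released.
        val sizes = sc.binaryFiles("s3a://test/test123.zip")
          .map { case (path, portable) =>
            val in: DataInputStream = portable.open()  // opens the S3 object
            try {
              val buf = new Array[Byte](8192)
              var total = 0L
              var n = in.read(buf)
              while (n >= 0) { total += n; n = in.read(buf) }
              (path, total)                            // bytes per file
            } finally {
              in.close()                               // return the connection
            }
          }
          .collect()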

      Please look into this major issue!

      Attachments

        Issue Links

        Activity


          People

            Assignee: Unassigned
            Reporter: mohamed imran (imranece59)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 168h
                Remaining Estimate: 168h
                Time Spent: Not Specified
