Hadoop Common / HADOOP-12420

While trying to access Amazon S3 through hadoop-aws (Spark, basically) I was getting: Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      While trying to access data stored in Amazon S3 through Apache Spark, which internally uses the hadoop-aws jar, I was getting the following exception:

      Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V

      The probable reason is that the AWS Java SDK expects a long parameter for the setMultipartUploadThreshold(long multiPartThreshold) method, while hadoop-aws was calling it with a parameter of type int (multiPartThreshold).

      I tried both the downloaded hadoop-aws jar and the jar pulled in through its Maven dependency, and in both cases I encountered the same exception. Although I can see private long multiPartThreshold; in the hadoop-aws GitHub repo, it is not reflected in the downloaded jar or in the jar resolved from the Maven dependency.

      The following lines in the S3AFileSystem class show the difference:

      Build from trunk (S3AFileSystem.java line 267):
          private long multiPartThreshold;
          this.multiPartThreshold = conf.getLong("fs.s3a.multipart.threshold", 2147483647L);

      Build through the Maven dependency (S3AFileSystem.java line 249):
          private int multiPartThreshold;
          multiPartThreshold = conf.getInt(MIN_MULTIPART_THRESHOLD, DEFAULT_MIN_MULTIPART_THRESHOLD);
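      To make the failure concrete, here is a minimal sketch of the binary incompatibility, assuming an aws-java-sdk jar on the classpath (the class name ThresholdRepro is illustrative):

          import com.amazonaws.services.s3.transfer.TransferManagerConfiguration;

          public class ThresholdRepro {
              public static void main(String[] args) {
                  TransferManagerConfiguration tmc = new TransferManagerConfiguration();
                  // Compiled against SDK 1.7.4, this call is emitted as
                  // setMultipartUploadThreshold(I)V. SDK 1.10.x (before 1.10.56)
                  // ships only the long overload, (J)V, so running the same class
                  // file against the newer jar fails with NoSuchMethodError.
                  tmc.setMultipartUploadThreshold(16 * 1024 * 1024);
              }
          }

      Recompiling against the SDK that is actually on the runtime classpath makes javac emit the widened (J)V call, which is why keeping the build-time and runtime SDK versions consistent avoids the error.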


          Activity

          Kihwal Lee added a comment (edited):

          https://github.com/aws/aws-sdk-java/pull/637 in 1.10.56.
          Steve Loughran added a comment:

          The rule for s3a work, now and in the future: "use a consistent version of the Amazon libraries with which Hadoop was built". You should not be seeing this error with 2.7.2 + SDK 1.7.4. Try to use a later version of the AWS SDK and, yes, things will break. Sorry.

          Timeout/connection problems are unrelated to this JIRA. You may want to look at HADOOP-12346, change those config options locally (see the sketch below), and see if that helps. Otherwise, do grab hadoop branch-2.8, build Spark against it, and see if that fixes things. If it doesn't, now is the time to identify and fix the problems, before we get that 2.8.0 release out the door.
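          A minimal sketch of overriding those settings locally, using the fs.s3a option names documented for Hadoop 2.7 (fs.s3a.connection.timeout and fs.s3a.attempts.maximum, the kind of defaults HADOOP-12346 adjusts); the values and bucket name here are illustrative, not recommendations:

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              public class S3ATimeoutTuning {
                  public static void main(String[] args) throws Exception {
                      Configuration conf = new Configuration();
                      // Socket timeout, in milliseconds.
                      conf.setInt("fs.s3a.connection.timeout", 200000);
                      // How many times to retry a failing S3 request.
                      conf.setInt("fs.s3a.attempts.maximum", 20);
                      // my-bucket is a placeholder; valid AWS credentials are assumed.
                      FileSystem fs = new Path("s3a://my-bucket/").getFileSystem(conf);
                      System.out.println("S3A filesystem initialised: " + fs.getUri());
                  }
              }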

          Mitesh added a comment (edited):

          Hadoop 2.8 still hasn't even had an RC, from what I can tell. What workaround is everyone using? I'm seeing lots of intermittent timeout/connection issues in Spark that I am fairly sure are due to being stuck at hadoop-2.7.2/aws-java-sdk-1.7.4.

          Steve Loughran added a comment:

          No, that uses jets3t.

          Ashish Shrowty added a comment:

          Does this also break s3n-based access?

          Steve Loughran added a comment:

          Marking as fixed for Hadoop 2.8; the library change fixes that. If you see it in earlier versions, you've got an incompatible version of aws-java-sdk, as Amazon have broken their Java method signature.

          Thomas Demoor added a comment:

          OK, I just see now that it was backported to branch-2 already; disregard my previous comment.

          Thomas Demoor added a comment:

          I'm not sure I understand the issue completely. Spark with hadoop-2.7.1 and aws-java-sdk 1.7.4 should work. The upgrade to aws-sdk-s3 1.10.6 is only in hadoop-trunk. Are you building Spark against hadoop-trunk yourself? With hadoop-provided (http://spark.apache.org/docs/latest/building-spark.html)? From the error it seems you have the old Hadoop code but the updated aws-sdk.

          Steve Loughran, backporting HADOOP-12269 fixes some bugs on the AWS side, such as MultipartThreshold -> long, but also more serious ones (HADOOP-12267). These might be serious enough to justify the backport to branch-2.

          Steve Loughran added a comment:

          OK, this looks like an AWS library version change, the one handled in HADOOP-12269.

          Summary: Hadoop 2.7.1 needs aws-java-sdk version 1.7.4; the newer AWS release, aws-java-sdk-s3 1.10.6, has changed the signature and doesn't work.

          Right now there's not a lot we can do except wait for people to type the stack trace into their browser and end up here; for 2.7.2 we could think about backporting the library change.
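          A quick way to confirm which SDK is actually on the runtime classpath is to inspect the method's signature reflectively; a minimal sketch (the class name SdkSignatureCheck is illustrative):

              import java.lang.reflect.Method;

              public class SdkSignatureCheck {
                  public static void main(String[] args) throws Exception {
                      Class<?> c = Class.forName(
                          "com.amazonaws.services.s3.transfer.TransferManagerConfiguration");
                      for (Method m : c.getMethods()) {
                          if (m.getName().equals("setMultipartUploadThreshold")) {
                              // An int parameter indicates SDK 1.7.x; a long parameter
                              // indicates the 1.10.x signature that code compiled
                              // against 1.7.4 cannot link to.
                              System.out.println(m);
                          }
                      }
                  }
              }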


            People

            • Assignee: Tariq Mohammad
            • Reporter: Tariq Mohammad
            • Votes: 0
            • Watchers: 7

