Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      The hadoop-aws jar still depends on the very old 1.7.4 version of aws-java-sdk.
      Newer versions of the SDK contain incompatible API changes, which leads to the following error when the S3A class is used while a newer SDK is on the classpath.
      This happens because S3A calls the method with an "int" parameter, while the new SDK expects a "long". This makes it impossible to use Kinesis and S3A in the same process.
      It would be very helpful to upgrade hadoop-aws's aws-java-sdk version.

      java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
      at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:285)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
      at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
      at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:130)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
      at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
      at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
      at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
      at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
      at $iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
      at $iwC$$iwC$$iwC.<init>(<console>:42)
      at $iwC$$iwC.<init>(<console>:44)
      at $iwC.<init>(<console>:46)
      at <init>(<console>:48)
      at .<init>(<console>:52)
      at .<clinit>(<console>)
      at .<init>(<console>:7)
      at .<clinit>(<console>)
      at $print(<console>)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
      at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
      at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
      at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
      at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
      at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:655)
      at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:620)
      at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:613)
      at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
      at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
      at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
      at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
      at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
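The linkage failure above can be reproduced in miniature. The classes below are hypothetical stand-ins for the SDK's TransferManagerConfiguration, not the real types: a caller compiled against the old class references the setter by its exact JVM descriptor, setMultipartUploadThreshold(I)V, and once the parameter becomes long only the (J)V descriptor exists, so the old reference fails to link at runtime.

```java
// Illustrative sketch (hypothetical stand-in classes, not the AWS SDK):
// changing a setter parameter from int to long breaks binary compatibility.
public class SignatureDemo {
    // Stand-in for TransferManagerConfiguration in aws-java-sdk 1.7.4
    public static class OldConfig {
        public void setMultipartUploadThreshold(int threshold) { }
    }
    // Stand-in for the same class in newer SDK versions
    public static class NewConfig {
        public void setMultipartUploadThreshold(long threshold) { }
    }

    public static void main(String[] args) throws Exception {
        // The (int) overload that S3A 2.7.1 was compiled against is gone:
        try {
            NewConfig.class.getMethod("setMultipartUploadThreshold", int.class);
            System.out.println("int variant found");
        } catch (NoSuchMethodException e) {
            System.out.println("setMultipartUploadThreshold(int) is missing");
        }
        // Only the long variant remains, so recompiling against the new
        // SDK (or upgrading hadoop-aws) is required:
        NewConfig.class.getMethod("setMultipartUploadThreshold", long.class);
        System.out.println("setMultipartUploadThreshold(long) exists");
    }
}
```

At compile time the int-to-long change is invisible (widening is implicit), which is why the break only shows up as a NoSuchMethodError at runtime.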

        Issue Links

          Activity

          Thomas Demoor Thomas Demoor added a comment -

          Hi Yongjia,

          HADOOP-12269 upgraded aws-sdk-s3 to 1.10.6

          stevel@apache.org Steve Loughran added a comment -

          Linking to duplicate issues. Yonjia, please search JIRA first as you'll save us all time.

          stevel@apache.org Steve Loughran added a comment -

          sorry: Yongjia.


            People

            • Assignee:
              Unassigned
              Reporter:
              yongjiaw Yongjia Wang
            • Votes:
              0
            • Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved:

                Development