Spark / SPARK-13265

Refactoring of basic ML import/export for file systems other than HDFS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: ML
    • Labels: None

      Description

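      In Spark 1.6, we can't save a model to a file system other than HDFS, such as Amazon S3, because MLWriter.save is fixed to the cluster's default file system (HDFS here) instead of using the file system that the output path belongs to.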

      https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

      When I tried to export a KMeans model to Amazon S3, I got the following error:

      scala> val kmeans = new KMeans().setK(2)
      scala> val model = kmeans.fit(train)
      scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
      java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
              at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
              at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
              at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
              at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
              at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
              at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
              at $iwC$$iwC$$iwC.<init>(<console>:49)
              at $iwC$$iwC.<init>(<console>:51)
              at $iwC.<init>(<console>:53)
              at <init>(<console>:55)
              at .<init>(<console>:59)
              at .<clinit>(<console>)
              at .<init>(<console>:7)
              at .<clinit>(<console>)
              at $print(<console>)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
              at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
              at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
              at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
              at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
              at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
              at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
              at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
              at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
              at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
              at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
              at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
              at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
              at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
              at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
              at org.apache.spark.repl.Main$.main(Main.scala:31)
              at org.apache.spark.repl.Main.main(Main.scala)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
              at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
              at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
              at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
              at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      
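The trace shows MLWriter.save calling exists on DistributedFileSystem, the default file system, even though the output path uses the s3n:// scheme, so Hadoop rejects the path with "Wrong FS". The following is a minimal sketch of the general remedy, assuming hadoop-common is on the classpath (as it is in spark-shell): obtain the FileSystem from the path itself with Path.getFileSystem instead of FileSystem.get(conf). PathFileSystemSketch and its methods are hypothetical names, and this is not necessarily the exact patch merged for this issue.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper illustrating path-based file system resolution.
object PathFileSystemSketch {

  // FileSystem.get(conf) always returns the default file system (fs.defaultFS),
  // e.g. hdfs://namenode:9000, no matter which scheme the output path uses.
  // Calling exists() on it with an s3n:// path fails with "Wrong FS".
  def defaultFileSystem(conf: Configuration): FileSystem =
    FileSystem.get(conf)

  // Path.getFileSystem(conf) picks the file system matching the path's own
  // scheme and authority (s3n://bucket, hdfs://host:port, file:///, ...).
  // A path without a scheme still falls back to the default file system.
  def resolve(pathStr: String, conf: Configuration): (FileSystem, Path) = {
    val path = new Path(pathStr)
    val fs = path.getFileSystem(conf)
    // Fully qualify the path against the file system that owns it.
    (fs, fs.makeQualified(path))
  }

  // The existence check that MLWriter.save performs before writing (see the
  // FileSystem.exists frame in the trace), done on the correct file system.
  def outputExists(pathStr: String, conf: Configuration): Boolean = {
    val (fs, qualified) = resolve(pathStr, conf)
    fs.exists(qualified)
  }
}
```

In spark-shell, calling PathFileSystemSketch.outputExists("s3n://test-bucket/tmp/test-kmeans/", sc.hadoopConfiguration) would run the check against the S3 file system rather than HDFS, provided the s3n connector and credentials are configured.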

              People

              • Assignee: Yu Ishikawa (yuu.ishikawa@gmail.com)
              • Reporter: Yu Ishikawa (yuu.ishikawa@gmail.com)
              • Votes: 0
              • Watchers: 5