Spark / SPARK-5146

saveAsHadoopFiles does not work with checkpointing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

      Description

      Hello folks,
      Consider the following simple app that counts words read from a network socket:

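      WordCount.scala
          import org.apache.spark.{SparkConf, SparkContext}
          import org.apache.spark.streaming.{Seconds, StreamingContext}
          import org.apache.spark.streaming.StreamingContext._ // pair DStream implicits in 1.2.0

          val conf = new SparkConf().setAppName("Sample Application")
          val sc = new SparkContext(conf)
          val ssc = new StreamingContext(sc, Seconds(5)) // 5-second batches
          ssc.checkpoint("target/checkpointDir") // checkpointing is enabled here
          // Feed input from another terminal with: nc -lk 9999
          val lines = ssc.socketTextStream("localhost", 9999)
          val words = lines.flatMap(_.split(" "))
          val pairs = words.map(word => (word, 1))
          val wordCounts = pairs.reduceByKey(_ + _)
          wordCounts.saveAsHadoopFiles("target/prefix", "suffix")
          ssc.start()
          ssc.awaitTermination(60) // timeout is in milliseconds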
      

      When this is packaged and run on Spark, the following exception is thrown:

      java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf

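      The JobConf used inside the saveAsHadoopFiles methods seems to be the cause: it is not serializable, and with checkpointing enabled it appears to get serialized along with the rest of the streaming job.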
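
      To illustrate the suspected mechanism without a Spark or Hadoop dependency, here is a minimal sketch in plain Scala. FakeJobConf is a hypothetical stand-in for org.apache.hadoop.mapred.JobConf (all it shares with the real class is that it does not implement java.io.Serializable), and ClosureSerializationSketch, canSerialize, capturingClosure and selfContainedClosure are names invented for this example. It only shows how a closure that captures such an object behaves under Java serialization; it is not the actual saveAsHadoopFiles code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.hadoop.mapred.JobConf: a plain class
// that does not implement java.io.Serializable.
class FakeJobConf {
  def outputDir: String = "target/prefix"
}

object ClosureSerializationSketch {
  // Returns true if Java serialization of `obj` succeeds, false if it
  // fails with NotSerializableException.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
  }

  // The closure captures a conf that was created outside its body, so
  // serializing the closure also tries to serialize the conf.
  def capturingClosure(): String => String = {
    val conf = new FakeJobConf
    (time: String) => s"${conf.outputDir}-$time"
  }

  // The conf is created inside the closure body, so the closure itself
  // captures nothing that is non-serializable.
  def selfContainedClosure(): String => String =
    (time: String) => {
      val conf = new FakeJobConf
      s"${conf.outputDir}-$time"
    }

  def main(args: Array[String]): Unit = {
    println(s"capturing closure serializable: ${canSerialize(capturingClosure())}")
    println(s"self-contained closure serializable: ${canSerialize(selfContainedClosure())}")
  }
}
```

      If the same thing is happening in the streaming app, one possible workaround (untested here) would be to write the output from a foreachRDD body that builds its JobConf inside the closure instead of capturing one from outside.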

      Thanks

            People

            • Assignee: Unassigned
            • Reporter: Halit Olali (codemomentum)
            • Votes: 0
            • Watchers: 1

                Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified