Spark / SPARK-5146

saveAsHadoopFiles does not work with checkpointing


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • 1.2.0
    • 1.2.0
    • None
    • None

    Description

      Hello folks,
      Consider the following simple app that counts words received over a network socket:

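      WordCount.scala
          import org.apache.hadoop.mapred.TextOutputFormat
          import org.apache.spark.{SparkConf, SparkContext}
          import org.apache.spark.streaming.{Seconds, StreamingContext}
          import org.apache.spark.streaming.StreamingContext._ // pair DStream implicits (Spark 1.2)

          object WordCount {
            def main(args: Array[String]): Unit = {
              val conf = new SparkConf().setAppName("Sample Application")
              val sc = new SparkContext(conf)
              val ssc = new StreamingContext(sc, Seconds(5))
              ssc.checkpoint("target/checkpointDir") // enable DStream checkpointing
              // Feed input with: nc -lk 9999
              val lines = ssc.socketTextStream("localhost", 9999)
              val words = lines.flatMap(_.split(" "))
              val pairs = words.map(word => (word, 1))
              val wordCounts = pairs.reduceByKey(_ + _)
              wordCounts.saveAsHadoopFiles[TextOutputFormat[String, Int]]("target/prefix", "suffix")
              ssc.start()
              ssc.awaitTermination(60000) // timeout is in milliseconds
            }
          }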
      

      When this is packaged and submitted to Spark, the following exception is thrown:

      java.io.NotSerializableException: org.apache.hadoop.mapred.JobConf

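      The JobConf used inside the saveAsHadoopFiles methods seems to be the cause: the output closure captures it, and with checkpointing enabled that closure is serialized along with the DStream graph, but JobConf is not Serializable.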
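      The suspected cause can be illustrated without Spark. This is a minimal sketch under stated assumptions, not Spark code: FakeJobConf is a hypothetical stand-in for JobConf (neither implements java.io.Serializable), and plain ObjectOutputStream serialization stands in for what checkpointing does to the DStream graph. A closure that captures a conf instance fails to serialize; one that creates its conf on each call does not.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.hadoop.mapred.JobConf, which likewise
// does not implement java.io.Serializable.
class FakeJobConf {
  var outputPath: String = ""
}

// Serializable so that, if a closure ever references this enclosing object,
// serialization does not fail for an unrelated reason.
object ClosureSerializationDemo extends Serializable {
  // Java-serializes obj, roughly what checkpointing does to the DStream graph
  // (including output closures). Left = failing class name, Right = byte count.
  def trySerialize(obj: AnyRef): Either[String, Int] =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(obj)
      out.close()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(e.getMessage)
    }

  // Failing pattern: the conf is created once and captured by the closure.
  def capturingSaveFunc(): String => String = {
    val conf = new FakeJobConf
    path => { conf.outputPath = path; conf.outputPath }
  }

  // Serializable pattern: the conf is created inside the closure on each call.
  def recreatingSaveFunc(): String => String =
    path => { val conf = new FakeJobConf; conf.outputPath = path; conf.outputPath }

  def main(args: Array[String]): Unit = {
    println(trySerialize(capturingSaveFunc()))  // a Left naming FakeJobConf
    println(trySerialize(recreatingSaveFunc())) // a Right with the serialized size
  }
}
```

      In Spark terms, the second pattern would roughly correspond to building the JobConf inside a foreachRDD function instead of capturing one; that mapping is an assumption, not something confirmed in this issue.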

      Thanks


          People

            Assignee: Unassigned
            Reporter: Halit Olali (codemomentum)
            Votes: 0
            Watchers: 1


              Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified