Pig / PIG-4856 Optimization for pig on spark / PIG-4970

Remove the serialization and deserialization of JobConf in the code for Spark mode


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

      Currently we use KryoSerializer to serialize the jobConf in SparkLauncher, then
      deserialize it in ForEachConverter and StreamConverter. We serialize and deserialize the jobConf to make it available in the Spark executor threads.

      We can refactor it in the following ways:
      1. Let Spark broadcast the jobConf in sparkContext.newAPIHadoopRDD. Here, do not create a new jobConf and load properties from PigContext; directly use the jobConf from SparkLauncher.
      2. Get the jobConf in org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader.
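
      The difference between the two approaches can be illustrated with a minimal, library-free sketch. It is not the code from the patch. A plain java.util.Properties stands in for JobConf, Properties.store/load stands in for the Kryo round trip, and the class JobConfSketch, its methods, and the key "pig.example.key" are all hypothetical names chosen for illustration. The old path serializes the configuration once and deserializes it in every consumer; the new path captures the configuration once, when the record reader is created on the executor thread, and later code reads it from there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class JobConfSketch {
    // Hypothetical per-thread holder for the configuration on an executor.
    private static final ThreadLocal<Properties> EXECUTOR_CONF = new ThreadLocal<>();

    // Old approach, launcher side: turn the configuration into bytes.
    public static byte[] serialize(Properties conf) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.store(out, null);
        return out.toByteArray();
    }

    // Old approach, converter side: every consumer rebuilds the configuration.
    public static Properties deserialize(byte[] bytes) throws IOException {
        Properties conf = new Properties();
        conf.load(new ByteArrayInputStream(bytes));
        return conf;
    }

    // New approach: the configuration delivered by the framework is captured
    // once, at the point where the record reader is created.
    public static void createRecordReader(Properties confFromFramework) {
        EXECUTOR_CONF.set(confFromFramework);
    }

    // Later code on the same thread reads the captured configuration.
    public static String lookup(String key) {
        Properties conf = EXECUTOR_CONF.get();
        if (conf == null) {
            throw new IllegalStateException("configuration not initialised on this thread");
        }
        return conf.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("pig.example.key", "value"); // hypothetical key

        // Old path: explicit round trip through bytes.
        Properties copy = deserialize(serialize(conf));
        System.out.println("old path: " + copy.getProperty("pig.example.key"));

        // New path: no manual round trip in user code.
        createRecordReader(conf);
        System.out.println("new path: " + lookup("pig.example.key"));
    }
}
```

      In the sketch, the new path removes the manual serialize/deserialize pair from the consumers; in the real code, shipping the jobConf to the executors would be left to Spark, as item 1 describes.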

      Attachments

        1. PIG-4970.patch (33 kB, liyunzhang)
        2. PIG-4970_4.patch (41 kB, liyunzhang)
        3. PIG-4970_3.patch (40 kB, liyunzhang)
        4. PIG-4970_2.patch (37 kB, liyunzhang)

            People

              Assignee: kellyzly liyunzhang
              Reporter: kellyzly liyunzhang