OOZIE-3057

Custom Partitioner not working in Oozie Mapreduce action

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 4.1.0
    • Fix Version/s: None
    • Component/s: action, workflow
    • Labels: None

    Description

      I implemented a secondary sort in MapReduce using the old API (org.apache.hadoop.mapred.*) and am trying to run it through Oozie (from Hue).

      Although I set the partitioner class in the action properties, the partitioner is never invoked, so the output is not what I expect.

      The same code runs correctly when launched with the hadoop command from the CLI.

      Here is my workflow.xml:

      <workflow-app name="MyTriplets" xmlns="uri:oozie:workflow:0.5">
        <start to="mapreduce-598d"/>
        <kill name="Kill">
          <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="mapreduce-598d">
          <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
              <property>
                <name>mapred.output.dir</name>
                <value>/test_1109_3</value>
              </property>
              <property>
                <name>mapred.input.dir</name>
                <value>/apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0609_tx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0509_util/day=05-09-2017/hour=16/quarter=1/</value>
              </property>
              <property>
                <name>mapred.input.format.class</name>
                <value>org.apache.hadoop.hive.ql.io.RCFileInputFormat</value>
              </property>
              <property>
                <name>mapred.mapper.class</name>
                <value>PonRankMapper</value>
              </property>
              <property>
                <name>mapred.reducer.class</name>
                <value>PonRankReducer</value>
              </property>
              <property>
                <name>mapred.output.value.comparator.class</name>
                <value>PonRankGroupingComparator</value>
              </property>
              <property>
                <name>mapred.mapoutput.key.class</name>
                <value>PonRankPair</value>
              </property>
              <property>
                <name>mapred.mapoutput.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
              </property>
              <property>
                <name>mapred.reduce.output.key.class</name>
                <value>org.apache.hadoop.io.NullWritable</value>
              </property>
              <property>
                <name>mapred.reduce.output.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
              </property>
              <property>
                <name>mapred.reduce.tasks</name>
                <value>1</value>
              </property>
              <property>
                <name>mapred.partitioner.class</name>
                <value>PonRankPartitioner</value>
              </property>
              <property>
                <name>mapred.mapper.new-api</name>
                <value>False</value>
              </property>
            </configuration>
          </map-reduce>
          <ok to="End"/>
          <error to="Kill"/>
        </action>
        <end name="End"/>
      </workflow-app>

      When running with the hadoop jar command, I set the partitioner class through the JobConf.setPartitionerClass API.

      With the old API under Oozie, the partitioner is not executed in spite of adding this property:

      <property>
        <name>mapred.partitioner.class</name>
        <value>PonRankPartitioner</value>
      </property>

      I then implemented the same logic with the new API (org.apache.hadoop.mapreduce) and added the mapreduce.partitioner.class property to the workflow.

      In that case the partitioner was executed and the output was as expected.
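
      For comparison, the new-API configuration described above might look roughly like the fragment below. This is only a sketch under stated assumptions: the two new-api switches are the ones Oozie documents for enabling the new MapReduce API, mapreduce.partitioner.class is the property name given in this report, and PonRankPartitioner is reused as a placeholder class name because the new-API implementation is not attached. The remaining job properties (mapper, reducer, input and output settings) are omitted.

      ```xml
      <configuration>
        <!-- Switch both phases to the new API (org.apache.hadoop.mapreduce) -->
        <property>
          <name>mapred.mapper.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>mapred.reducer.new-api</name>
          <value>true</value>
        </property>
        <!-- Property name as given in this report; the class name is a placeholder -->
        <property>
          <name>mapreduce.partitioner.class</name>
          <value>PonRankPartitioner</value>
        </property>
      </configuration>
      ```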

      Attachments

        1. workflow.xml
          2 kB
          Raghavi Ravi
        2. PonRankPartitioner.java
          0.6 kB
          Raghavi Ravi
        3. Logs.zip
          127 kB
          Raghavi Ravi

          People

            Assignee: Unassigned
            Reporter: Raghavi Ravi (raghaviravi92)