[OOZIE-3057] Custom Partitioner not working in Oozie Mapreduce action - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 4.1.0
Fix Version/s: None
Component/s: action, workflow
Labels: None
Environment:

Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux version 3.10.0-327.10.1.el7.x86_64 (mockbuild@x86-021.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Sat Jan 23 04:54:55 EST 2016
oozie version - 4.1.0
cdh version - 5.10.1
Hue™ 3.11 - The Hadoop UI

Description

I implemented a secondary sort in MapReduce using the old API (org.apache.hadoop.mapred.*) and am trying to execute it with Oozie (from Hue).

Although I have set the partitioner class in the workflow configuration properties, the partitioner is not executed, so the output is not as expected.

The same code runs correctly when submitted with the hadoop jar command from the CLI.
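For readers unfamiliar with the pattern, the job a custom partitioner does in a secondary sort can be sketched without Hadoop. The following is a minimal, stdlib-only Java sketch, not the reporter's code: it assumes a composite key made of a natural key and a secondary key (the Pair class and its fields are hypothetical stand-ins for PonRankPair, whose source is only in the attachment). It sorts on the full composite key, partitions on the natural key alone using the same (hashCode & Integer.MAX_VALUE) % numPartitions formula as Hadoop's HashPartitioner, and then groups on the natural key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only simulation of secondary sort; no Hadoop classes are used.
final class SecondarySortSketch {

    // Hypothetical composite key standing in for PonRankPair; real fields are not shown in this issue.
    static final class Pair {
        final String naturalKey;
        final int secondaryKey;

        Pair(String naturalKey, int secondaryKey) {
            this.naturalKey = naturalKey;
            this.secondaryKey = secondaryKey;
        }
    }

    // Partition on the natural key only, using the same formula as Hadoop's HashPartitioner.
    static int partitionFor(String naturalKey, int numPartitions) {
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Returns partition -> (natural key -> secondary keys in the order a reducer would see them).
    static Map<Integer, Map<String, List<Integer>>> shuffle(List<Pair> mapOutput, int numPartitions) {
        // 1. Sort on the full composite key (natural key, then secondary key).
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing((Pair p) -> p.naturalKey)
                              .thenComparingInt(p -> p.secondaryKey));

        Map<Integer, Map<String, List<Integer>>> partitions = new TreeMap<>();
        for (Pair p : sorted) {
            // 2. Route by natural key so all records for one key reach the same reducer.
            int partition = partitionFor(p.naturalKey, numPartitions);
            // 3. Group by natural key only; values are already ordered by the secondary key.
            partitions.computeIfAbsent(partition, k -> new LinkedHashMap<>())
                      .computeIfAbsent(p.naturalKey, k -> new ArrayList<>())
                      .add(p.secondaryKey);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = Arrays.asList(
                new Pair("a", 3), new Pair("b", 1), new Pair("a", 1),
                new Pair("b", 2), new Pair("a", 2));
        System.out.println(shuffle(mapOutput, 2)); // {0={b=[1, 2]}, 1={a=[1, 2, 3]}}
    }
}
```

Because the partition ignores the secondary key, every record for a given natural key lands in one partition, and its grouped values arrive in secondary-key order. With numPartitions = 1, which matches the workflow's mapred.reduce.tasks value, the formula sends every key to partition 0.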

Here is my workflow.xml:

<workflow-app name="MyTriplets" xmlns="uri:oozie:workflow:0.5">
  <start to="mapreduce-598d"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="mapreduce-598d">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.output.dir</name>
          <value>/test_1109_3</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>/apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0609_tx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0509_util/day=05-09-2017/hour=16/quarter=1/</value>
        </property>
        <property>
          <name>mapred.input.format.class</name>
          <value>org.apache.hadoop.hive.ql.io.RCFileInputFormat</value>
        </property>
        <property>
          <name>mapred.mapper.class</name>
          <value>PonRankMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>PonRankReducer</value>
        </property>
        <property>
          <name>mapred.output.value.comparator.class</name>
          <value>PonRankGroupingComparator</value>
        </property>
        <property>
          <name>mapred.mapoutput.key.class</name>
          <value>PonRankPair</value>
        </property>
        <property>
          <name>mapred.mapoutput.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.reduce.output.key.class</name>
          <value>org.apache.hadoop.io.NullWritable</value>
        </property>
        <property>
          <name>mapred.reduce.output.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.reduce.tasks</name>
          <value>1</value>
        </property>
        <property>
          <name>mapred.partitioner.class</name>
          <value>PonRankPartitioner</value>
        </property>
        <property>
          <name>mapred.mapper.new-api</name>
          <value>False</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>

When running with the hadoop jar command, I set the partitioner class through the JobConf.setPartitionerClass API.

With the old API, the partitioner is not executed, in spite of adding this property:

<property>
  <name>mapred.partitioner.class</name>
  <value>PonRankPartitioner</value>
</property>

I executed the same logic with the new API (org.apache.hadoop.mapreduce) and added the mapreduce.partitioner.class property to the workflow.

The partitioner was executed and the expected output was produced.

Attachments

workflow.xml (2 kB), uploaded 14/Feb/18 07:00 by Raghavi Ravi
PonRankPartitioner.java (0.6 kB), uploaded 14/Feb/18 06:56 by Raghavi Ravi
Logs.zip (127 kB), uploaded 14/Feb/18 06:55 by Raghavi Ravi

People

Assignee: Unassigned

Reporter: Raghavi Ravi

Votes: 0

Watchers: 2

Dates

Created: 15/Sep/17 14:22

Updated: 14/Feb/18 07:28