[AVRO-1215] AvroMultipleOutputs not working when specifying baseOutputPath - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.7.2
Fix Version/s: 1.7.4
Component/s: java
Labels:
- avro
- mapreduce

Tags:
avro

Description

I'm calling the write() method of AvroMultipleOutputs that takes a baseOutputPath. The reducer appears to hang as soon as it tries to write to a baseOutputPath value it has not already encountered. It then fails with:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ... because current leaseholder is trying to recreate file.

I think the problem has to do with this line in AvroMultipleOutputs:

// get the record writer from context output format
//FileOutputFormat.setOutputName(taskContext, baseFileName);

This line is not commented out in the similar code from Hadoop, so I think the baseOutputPath is ignored. As a result, every record writer is created with the same path, which leads to the exception.

Uncommenting this line does not work because the method is not accessible from AvroMultipleOutputs. All the method does is set "mapreduce.output.basename", but setting that property directly doesn't work either.

After digging through the Avro code, I found that AvroOutputFormatBase uses "avro.mo.config.namedOutput" to create the path. If I replace the commented-out line with the following, it seems to work:

taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);
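The failure mode described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it does not use the Avro or Hadoop APIs, a plain Map stands in for the job configuration, and the class name, the method names, the "part" default, and the file-name pattern are all hypothetical. The only name taken from the report is the "avro.mo.config.namedOutput" key.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in model; this is NOT the Avro or Hadoop API.
class BaseOutputPathSketch {
    // Configuration key named in the issue description.
    static final String NAMED_OUTPUT_KEY = "avro.mo.config.namedOutput";

    // Models an output format that derives its file name from configuration.
    // The "part" default and the name pattern are assumptions for illustration.
    static String resolveFile(Map<String, String> conf, int partition) {
        String base = conf.getOrDefault(NAMED_OUTPUT_KEY, "part");
        return String.format(Locale.ROOT, "%s-r-%05d.avro", base, partition);
    }

    // Simulates opening one record writer per distinct base output path.
    // When propagateBaseName is false, the base path never reaches the
    // configuration, which models the commented-out line in the report.
    static Set<String> openWriters(String[] basePaths, boolean propagateBaseName) {
        Set<String> openFiles = new LinkedHashSet<>();
        for (String basePath : basePaths) {
            Map<String, String> conf = new HashMap<>();
            if (propagateBaseName) {
                conf.put(NAMED_OUTPUT_KEY, basePath);
            }
            String file = resolveFile(conf, 0);
            // Stands in for the file system rejecting a second create of the same file.
            if (!openFiles.add(file)) {
                throw new IllegalStateException("file is already being created: " + file);
            }
        }
        return openFiles;
    }

    public static void main(String[] args) {
        String[] basePaths = {"clicks/part", "views/part"};
        System.out.println(openWriters(basePaths, true));
        try {
            openWriters(basePaths, false);
        } catch (IllegalStateException e) {
            System.out.println("collision: " + e.getMessage());
        }
    }
}
```

With the base name propagated, each base output path resolves to its own file. Without it, the second writer resolves to the file the first writer already opened, which is the shape of the AlreadyBeingCreatedException reported above.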

Attachments

- AVRO-1215_final.patch (17/Feb/13 23:07, 9 kB, Ashish Nagavaram)
- AVRO-1215.patch (06/Feb/13 22:20, 9 kB, Ashish Nagavaram)
- AVRO-1215.patch (06/Feb/13 22:17, 10 kB, Ashish Nagavaram)
- AVRO-1215.patch (05/Feb/13 06:38, 29 kB, Ashish Nagavaram)
- AVRO-1215-v3.patch (22/Jan/13 23:02, 7 kB, Ashish Nagavaram)

Issue Links

duplicates
- AVRO-1236 AvroMultipleOutputs fails to close successfuly (Open)
- AVRO-1179 AvroMultipleOutputs does not seem to be generating different base output paths (Resolved)

is duplicated by
- AVRO-1239 AvroMultipleOutput ignores schemas (Resolved)

is related to
- AVRO-1106 AvroMultipleOutputs for new Hadoop Version (Closed)

relates to
- AVRO-1266 Fix mapred AvroMultipleOutputs class to write the schema to Jobconf rather than private Hashmap (Closed)

People

Assignee: Ashish Nagavaram

Reporter: Matthew Hayes

Votes: 0

Watchers: 8

Dates

Created: 07/Dec/12 04:44

Updated: 20/Aug/15 19:10

Resolved: 18/Feb/13 18:31
