Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.14.0, 0.15.0
-
None
-
None
-
Reviewed
Description
From a single Pig script I write two completely different and unrelated CSV files; both with the flag 'WRITE_OUTPUT_HEADER'.
The bug is that both files get the SAME header at the top of the output file even though the data is different.
Reproduction:
foo.txt
1
bar.txt (Tab separated)
1 a
WriteTwoCSV.pig
FOO = LOAD 'foo.txt' USING PigStorage('\t') AS (a:chararray); BAR = LOAD 'bar.txt' USING PigStorage('\t') AS (b:chararray, c:chararray); STORE FOO into 'Foo' USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER'); STORE BAR into 'Bar' USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
Command:
pig -x local WriteTwoCSV.pig
Result:
cat Bar/part-*
b c 1 a
cat Foo/part-*
b c 1
The error is that the Foo output has a the two column header from the Bar output.
One of the effects is that parsing the Foo data will probably fail due to the varying number of columns