Pig / PIG-4689

CSV Writes incorrect header if two CSV files are created in one script

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      From a single Pig script I write two completely different and unrelated CSV files, both with the 'WRITE_OUTPUT_HEADER' flag.

      The bug is that both files get the SAME header at the top of the output file even though the data is different.

      Reproduction:

      foo.txt
      1
      
      bar.txt (Tab separated)
      1	a
      
      WriteTwoCSV.pig
      FOO =
          LOAD 'foo.txt'
          USING PigStorage('\t')
          AS (a:chararray);
      
      BAR =
          LOAD 'bar.txt'
          USING PigStorage('\t')
          AS (b:chararray, c:chararray);
      
      STORE FOO into 'Foo'
      USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
      
      STORE BAR into 'Bar'
      USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
      

      Command:

      pig -x local WriteTwoCSV.pig

      Result:

      cat Bar/part-*

      b	c
      1	a
      

      cat Foo/part-*

      b	c
      1
      

      The error is that the Foo output has the two-column header from the Bar output.
      One effect is that parsing the Foo data will probably fail because the header and the data rows have different numbers of columns.
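
To illustrate the parsing problem, here is a minimal sketch (not part of the original report) that compares the width of the header with the width of each data row. The helper name `column_count_mismatches` is hypothetical, and the sample strings are copied from the Result listing above.

```python
import csv
import io


def column_count_mismatches(text, delimiter="\t"):
    """Return (line_number, width) for each data row whose width differs from the header's.

    Hypothetical helper for illustration only; it is not part of Pig or piggybank.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    header_width = len(rows[0])
    return [
        (line_no, len(row))
        for line_no, row in enumerate(rows[1:], start=2)
        if len(row) != header_width
    ]


# File contents copied from the Result listing in this report.
foo_output = "b\tc\n1\n"      # wrong header: two columns, but the data has one
bar_output = "b\tc\n1\ta\n"   # correct header: two columns, two data fields

print(column_count_mismatches(foo_output))  # [(2, 1)]
print(column_count_mismatches(bar_output))  # []
```

A check like this flags line 2 of the Foo output as having one field under a two-column header, which is the mismatch a strict CSV consumer would likely reject.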

      Attachments

        1. PIG-4689-2015-10-06.patch
          0.7 kB
          Niels Basjes
        2. PIG-4689-20151016.patch
          0.8 kB
          Niels Basjes
        3. PIG-4689-20151105.patch
          4 kB
          Niels Basjes

          People

            Assignee: Niels Basjes
            Reporter: Niels Basjes
            Votes: 0
            Watchers: 4
