  1. Pig
  2. PIG-4758

PigStorage: column order out of sync when using the -tagPath or -tagFile option

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.14.0, 0.15.0
    • None
    • internal-udfs, piggybank
    • None
    • PigStorage tagPath tagFile

    Description

      When running the following script:

      a = LOAD 'data.csv' USING PigStorage('\t','-tagPath') AS (filepath:chararray, f1:chararray, f2:chararray);
      b = FOREACH a GENERATE filepath, f2; 
      dump b; 
      

      The output contains the data from the filepath and f1 fields instead of the filepath and f2 fields.
      This is caused by a bug in PigStorage (it also occurs in CSVExcelStorage): the tagPath/tagFile column is not taken into account when the index into requiredColumns is calculated:

      PigStorage.java
      if (mRequiredColumns==null || (mRequiredColumns.length>fieldID && mRequiredColumns[fieldID])) 
      	addTupleValue(mProtoTuple, buf, start, i); 
      

      However, fieldID does not take the tagFile/tagPath column into account, so each lookup in mRequiredColumns is off by one.
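
The mismatch can be reproduced outside Pig with a minimal sketch. The class below is a hypothetical stand-alone model, not the actual PigStorage code: the name TagColumnOffsetSketch, the project method, and its offset parameter are invented for illustration, and it assumes the tag column is schema column 0 and is always emitted first. An offset of 0 mirrors the check quoted above; an offset of 1 shows one possible way to account for the tag column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the column check; this is not the real PigStorage class.
final class TagColumnOffsetSketch {

    // tag:      value of the tagPath/tagFile column (assumed to be schema column 0)
    // fields:   fields parsed from one input line, indexed by fieldID
    // required: projection mask over the full schema, tag column included
    // offset:   added to fieldID before the mask lookup (0 = check as quoted in the report)
    static List<String> project(String tag, String[] fields, boolean[] required, int offset) {
        List<String> tuple = new ArrayList<>();
        tuple.add(tag); // assumption: the tag column is always emitted first
        for (int fieldID = 0; fieldID < fields.length; fieldID++) {
            int column = fieldID + offset;
            if (required == null || (required.length > column && required[column])) {
                tuple.add(fields[fieldID]);
            }
        }
        return tuple;
    }

    public static void main(String[] args) {
        String[] fields = {"f1-value", "f2-value"};
        boolean[] required = {true, false, true}; // filepath and f2 requested, f1 pruned
        System.out.println(project("/in/data.csv", fields, required, 0)); // without offset
        System.out.println(project("/in/data.csv", fields, required, 1)); // with offset
    }
}
```

Without the offset the sketch keeps f1 and drops f2, matching the reported output; with the offset the mask lines up with the data fields again. Whether a fix in PigStorage itself should take this form is left open here.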

          People

            Assignee: Unassigned
            Reporter: xplenty (opensource@xplenty.com)